diff --git a/README.md b/README.md
index 38c831f87d5..4049000c9f8 100755
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-## Updated on 2024.11.04
+## Updated on 2024.11.05
 > Usage instructions: [here](./docs/README.md#usage)
@@ -13,16 +13,16 @@ |Publish Date|Title|Authors|PDF|Code|abstract| |---|---|---|---|---|---| -|**2024-10-31**|**From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents**|Nalin Tiwary et.al.|[2410.23555](http://arxiv.org/abs/2410.23555)|null|近年来,基于大型语言模型(LLM)的框架已经扩展到复杂的现实世界应用,如交互式网页导航。这些系统通过用户命令驱动,使用多轮对话在网页浏览器中完成任务,既提供了创新机会也带来了重大挑战。尽管引入了用于会话网页导航的基准测试,但影响这些代理性能的关键上下文组件的详细理解仍然不足。本研究旨在填补这一空白,通过分析网页导航代理功能的关键上下文元素。我们研究了上下文管理的优化,重点关注交互历史和网页表示的影响。我们的工作突出了通过有效的上下文管理,在分布外场景下的改进代理性能,包括未见过的网站、类别和地区。这些发现为LLM基代理的设计和优化提供了见解,使实际应用中的网页导航更加准确和有效。| -|**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|随着大型语言模型(LLMs)扩展到执行现实世界应用中的代理任务,超越传统NLP任务,评估其鲁棒性变得越来越重要。然而,现有的基准测试往往忽略了文化和社会意识等关键维度。为了解决这些问题,我们引入了CASA,这是一个旨在评估LLM代理在两个基于网络的任务(在线购物和社交讨论论坛)中对文化和社会规范的敏感性的基准。我们的方法评估了LLM代理检测并适当回应违反规范的用户查询和观察的能力。此外,我们提出了一种全面的评估框架,该框架测量意识覆盖率、管理用户查询的实用性以及面对误导性网络内容时的违规率。实验表明,当前的LLMs在非代理环境中的表现明显优于基于网络的代理环境,代理的意识覆盖率低于10%,违规率超过40%。为了提高性能,我们探索了两种方法:提示和微调,并发现这两种方法可以互补——针对特定文化的数据集进行微调显著提高了代理在不同地区的泛化能力,而提示则提升了代理处理复杂任务的能力。这些发现强调了在开发周期中不断基准测试LLM代理的文化和社会意识的重要性。| -|**2024-10-30**|**Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration**|Yanchu Guan et.al.|[2410.22916](http://arxiv.org/abs/2410.22916)|null|自主移动应用交互在移动应用复杂性不断增加的背景下变得日益重要。开发能够有效导航和与移动应用交互的智能代理仍然是一个重大挑战。本文提出了一种可解释的行为克隆大型语言模型代理(EBC-LLMAgent),该方法结合了大型语言模型(LLMs)和行为克隆通过学习演示来创建用于自主移动应用交互的智能且可解释的代理。EBC-LLMAgent 包含三个核心模块:演示编码、代码生成和UI映射,这些模块协同工作以捕捉用户演示、生成可执行代码,并建立代码与UI元素之间的准确对应关系。我们引入了行为克隆链融合技术来增强代理的泛化能力。对五个来自不同领域的流行移动应用进行的广泛实验表明,EBC-LLMAgent 的表现优越,在任务完成方面具有高成功率,在未见场景中表现出高效的泛化能力,并能生成有意义的解释。| -|**2024-10-30**|**$\textbf{EMOS}$: $\textbf{E}$mbodiment-aware Heterogeneous $\textbf{M}$ulti-robot $\textbf{O}$perating $\textbf{S}$ ystem with LLM Agents**|Junting Chen 
et.al.|[2410.22662](http://arxiv.org/abs/2410.22662)|null|异构多机器人系统(HMRS)已成为解决单个机器人无法单独处理的复杂任务的强大方法。当前基于大型语言模型的多智能体系统(LLM-based MAS)在软件开发和操作系统等领域取得了成功,但将其应用于机器人控制则呈现出独特的挑战。特别是,多机器人系统中每个代理的能力本质上与机器人的物理组成相关,而非预定义的角色。为了解决这一问题,我们介绍了一种新颖的多智能体框架,旨在促进具有不同形态和能力的异构机器人之间的有效协作,并引入了一个新的基准测试Habitat-MAS。我们设计的关键组成部分之一是“机器人简历”:不同于采用人为设计的角色扮演,我们提出了一种自我提示的方法,其中代理通过理解机器人的URDF文件并调用机器人运动学工具来生成描述其物理能力的信息,以指导任务规划和行动执行。Habitat-MAS基准测试旨在评估多智能体框架在需要形态感知推理的任务中的表现,这些任务包括1)操作,2)感知,3)导航以及4)全面的多楼层物体重排。实验结果表明,机器人的简历和我们多智能体系统的分层设计对于在这种复杂的任务环境中有效运行异构多机器人系统至关重要。| -|**2024-10-29**|**BENCHAGENTS: Automated Benchmark Creation with Agent Interaction**|Natasha Butt et.al.|[2410.22584](http://arxiv.org/abs/2410.22584)|null|评估受到基准可用性的限制。随着模型的发展,需要创建能够衡量新生成能力进展的基准。然而,通过人工注释创建新的基准既缓慢又昂贵,限制了对任何能力的全面评估。我们介绍了BENCHAGENTS框架,该框架系统地利用大型语言模型(LLMs)自动化复杂能力的基准创建过程,同时内在确保数据和度量的质量。BENCHAGENTS将基准创建过程分解为规划、生成、数据验证和评估四个阶段,每个阶段都由一个LLM代理执行。这些代理相互交互,并利用来自基准开发者的基于人类反馈来显式改进和灵活控制数据多样性和质量。我们使用BENCHAGENTS创建用于评估文本生成过程中与规划和约束满足相关能力的基准。然后,我们使用这些基准研究了七个最先进的模型,并提取了关于常见失败模式和模型差异的新见解。| -|**2024-10-29**|**Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents**|Jaekyeom Kim et.al.|[2410.22552](http://arxiv.org/abs/2410.22552)|null|在本文中,我们介绍了Auto-Intent方法,这是一种无需直接微调即可将预训练的大规模语言模型(LLM)作为目标领域代理的方法,特别关注网页导航任务。我们的方法首先从目标领域的演示中无监督地发现潜在的意图,以高度紧凑的形式(最多三个词)提取这些意图。通过提取的意图,我们训练意图预测器来预测给定代理过去观察和动作的下一个意图。特别是,我们提出了一种自我探索方法,其中最有可能的前k个意图预测被用作提示提供给预训练的LLM代理,从而增强其决策能力。Auto-Intent显著提高了GPT-3.5、GPT-4和Llama-3.1-70B、Llama-3.1-405B代理在来自Mind2Web的大规模真实网站导航基准测试以及来自WebArena的在线导航任务上的性能,并且它在Mind2Web基准测试中的跨基准泛化能力也得到了提升。| -|**2024-10-29**|**SceneGenAgent: Precise Industrial Scene Generation with Coding Agent**|Xiao Xia 
et.al.|[2410.21909](http://arxiv.org/abs/2410.21909)|**[link](https://github.com/thudm/scenegenagent)**|**工业场景的建模对于工业制造中的模拟至关重要。虽然大型语言模型(LLMs)在从文本描述生成通用3D场景方面已经取得了显著进展,但使用LLMs生成工业场景面临着独特的挑战,因为这些场景需要精确的尺寸和定位,要求在空间布局上进行复杂的规划。为了解决这一挑战,我们引入了SceneGenAgent,这是一种基于LLM的代理,通过C#代码生成工业场景。SceneGenAgent通过结构化和可计算的格式、布局验证和迭代优化来确保精确的布局规划,以满足工业场景的定量需求。实验结果表明,由SceneGenAgent驱动的LLMs超过了它们原有的性能,在现实世界的工业场景生成任务中达到了高达81.0%的成功率,并有效满足了大多数场景生成需求。为了进一步增强其易用性,我们构建了SceneInstruct数据集,用于微调开源LLMs以集成到SceneGenAgent中。实验显示,对开源LLMs在SceneInstruct上的微调可以显著提升性能,Llama3.1-70B接近GPT-4o的能力。我们的代码和数据可在https://github.com/THUDM/SceneGenAgent 获取。**| -|**2024-10-28**|**Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games**|Ji Ma et.al.|[2410.21359](http://arxiv.org/abs/2410.21359)|null|随着基于大型语言模型(LLM)的代理越来越多地承担现实世界任务并与人类社会互动,我们对它们的行为了解多少?本研究(1)调查了不同人格设定如何诱导LLM代理表现出亲社会行为——这一基本的社会规范,并将其与人类行为进行基准比较;(2)引入了一种行为方法来评估LLM代理在复杂决策场景中的表现。我们探索了不同人格设定和实验框架如何影响这些AI代理在独裁者博弈中的利他行为,并比较了同一LLM家族内、不同LLM家族之间以及与人类行为之间的差异。我们的发现揭示了LLM之间存在显著的差异和不一致,并且与人类行为相比也有明显的区别。仅仅赋予LLM一个人类般的身份并不能产生类似人类的行为。尽管这些AI代理是在大量由人类生成的数据上训练的,但它们无法准确预测人类的决策。LLM代理无法捕捉人类决策的内部过程,其与人类行为的一致性高度依赖于特定的模型架构和提示格式;更糟糕的是,这种依赖并没有遵循清晰的模式。| -|**2024-10-28**|**Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks**|Eitan Farchi et.al.|[2410.21071](http://arxiv.org/abs/2410.21071)|null|大语言模型(LLMs)可以用于多种与代码相关的任务,如从一种编程语言翻译到另一种编程语言、实现自然语言需求和代码总结。最先进的大语言模型技术生成的工件预计在用户只需进行少量简单修改后就可以使用。然而,量化这一模糊概念具有挑战性,因此很难确定与代码相关的LLM解决方案的质量。我们称利用LLM进行判断的评估方法为“LLM作为裁判”,简称LaaJ。在这项工作中,我们介绍了一种生成和评估LaaJ实现的方法论,并利用一个自动生成的基准来实现这一目标。该基准有两个目的,即用于开发和验证LaaJs,以及通过LaaJs来验证和测试与代码相关的LLM解决方案。为此,我们开发了一个自动基准生成引擎,它为多个代码相关任务生成多种编程语言的代码,并将其作为LaaJ评估的输入。我们利用一个图表示G来表示潜在的代码相关生成。图的顶点是生成的工件,边代表可能的生成,例如,从自然语言需求生成Java程序。利用一系列LLM代理和G,我们生成了与代码相关的工件。通过利用图G中的循环,我们制定了对生成工件的期望。利用这些制定的期望,我们可以开发和测试可靠的LLM判断,以评估生成工件的有用性。我们的方法能够创建高质量的代码任务解决方案。| -|**2024-10-28**|**Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments**|Sangmim 
Song et.al.|[2410.20666](http://arxiv.org/abs/2410.20666)|null|导航对于视力障碍人士(PVI)来说一直是一个重大挑战。虽然传统的辅助工具如白色手杖和导盲犬非常宝贵,但它们在提供详细的环境信息和精确引导至目的地方面仍显不足。近年来,大型语言模型(LLM)和视觉-语言模型(VLM)的发展为增强辅助导航提供了新的途径。在本文中,我们介绍了一种名为Guide-LLM的具身化LLM基代理,旨在帮助视力障碍人士在大型室内环境中导航。我们的方法采用了一种新颖的基于文本的拓扑图,使LLM能够使用简化的环境表示来规划全局路径,重点关注直线路径和直角转弯,以促进导航。此外,我们利用了LLM的常识推理来进行危险检测,并根据用户偏好进行个性化路径规划。模拟实验表明该系统在引导视力障碍人士方面的有效性,突显了其作为辅助技术显著进步的潜力。结果表明,Guide-LLM能够提供高效、适应性强且个性化的导航辅助,预示着该领域的有前景的发展。| +|**2024-10-31**|**From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents**|Nalin Tiwary et.al.|[2410.23555](http://arxiv.org/abs/2410.23555)|null|近年来,基于大规模语言模型(LLM)的框架已经扩展到复杂的现实世界应用,如交互式网页导航。这些系统通过用户命令驱动,使用浏览器完成任务,并通过多轮对话提供服务,既带来了创新机遇也带来了重大挑战。尽管已经引入了用于会话网页导航的基准测试,但对于影响这些代理性能的关键上下文组件的详细理解仍然不足。本研究旨在填补这一空白,通过分析网页导航代理功能所需的多种关键上下文要素来实现这一目标。我们研究了上下文管理的优化,重点关注交互历史和网页表示的影响。我们的工作突显了在分布外场景下(包括未见过的网站、类别和地区)通过有效的上下文管理提高了代理的性能。这些发现为LLM代理的设计和优化提供了见解,使实际应用中的网页导航更加准确和有效。| +|**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|随着大型语言模型(LLMs)扩展到执行现实世界的应用程序,超越传统的NLP任务,评估其稳健性变得越来越重要。然而,现有的基准测试往往忽视了文化和社会意识等关键维度。为了解决这些问题,我们引入了CASA,这是一个旨在评估LLM代理在两个基于网络的任务(在线购物和社交讨论论坛)中的文化和社会规范敏感性的基准。我们的方法评估了LLM代理检测并适当回应违反规范的用户查询和观察的能力。此外,我们提出了一种全面的评估框架,该框架测量意识覆盖率、管理用户查询的帮助性和面对误导性网络内容时的违规率。实验表明,当前的LLM在非代理环境中的表现显著优于基于网络的代理环境,其中代理的意识覆盖率不到10%,违规率超过40%。为了提高性能,我们探索了两种方法:提示和微调,并发现这两种方法可以互补——在特定文化数据集上进行微调可以显著增强代理在不同地区之间的泛化能力,而提示则提升了代理处理复杂任务的能力。这些发现强调了在开发周期中不断对LLM代理的文化和社会意识进行基准测试的重要性。| +|**2024-10-30**|**Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration**|Yanchu Guan et.al.|[2410.22916](http://arxiv.org/abs/2410.22916)|null|自主移动应用交互在移动应用程序复杂性日益增加的背景下变得越来越重要。开发能够有效导航和与移动应用互动的智能代理仍然是一个重大挑战。在本文中,我们提出了一种可解释的行为克隆大语言模型代理(EBC-LLMAgent),这是一种新颖的方法,通过结合大语言模型(LLMs)与行为克隆技术来学习演示,从而创建用于自主移动应用交互的智能且可解释的代理。EBC-LLMAgent 
包括三个核心模块:演示编码、代码生成和用户界面映射,这些模块协同工作以捕捉用户演示、生成可执行代码,并建立代码与用户界面元素之间的准确对应关系。我们引入了行为克隆链融合技术来增强代理的泛化能力。通过对来自不同领域的五款流行移动应用进行广泛的实验,证明了 EBC-LLMAgent 的优越性能,在任务完成方面具有高成功率、对未见过场景的高效泛化以及生成有意义的解释。| +|**2024-10-30**|**$\textbf{EMOS}$: $\textbf{E}$mbodiment-aware Heterogeneous $\textbf{M}$ulti-robot $\textbf{O}$perating $\textbf{S}$ ystem with LLM Agents**|Junting Chen et.al.|[2410.22662](http://arxiv.org/abs/2410.22662)|null|异构多机器人系统(HMRS)已成为解决单个机器人无法独立处理的复杂任务的强大方法。当前基于大型语言模型的多智能体系统(LLM-based MAS)在软件开发和操作系统等领域取得了成功,但将其应用于机器人控制则面临独特的挑战。特别是,多机器人系统中每个代理的能力本质上与机器人的物理结构相关,而不是预定义的角色。为了解决这一问题,我们引入了一种新颖的多智能体框架,旨在实现具有不同形态和能力的异构机器人之间的有效协作,并提出了一个新的基准测试名为Habitat-MAS。我们的一个关键设计是“机器人简历”:我们提出了一种自我提示的方法,而非采用人为设计的角色扮演,即代理通过理解机器人的URDF文件并调用机器人运动学工具来生成描述其物理能力的文档,以指导其在任务规划和动作执行中的行为。Habitat-MAS基准测试旨在评估一个多智能体框架如何处理需要体现感知推理的任务,包括1) 操纵、2) 感知、3) 导航以及4) 复杂的多楼层物体重排。实验结果表明,机器人的简历和我们多智能体系统的分层设计对于在这种复杂的任务环境中有效运行异构多机器人系统至关重要。| +|**2024-10-29**|**BENCHAGENTS: Automated Benchmark Creation with Agent Interaction**|Natasha Butt et.al.|[2410.22584](http://arxiv.org/abs/2410.22584)|null|评估受到基准测试可用性的限制。随着模型的发展,需要创建能够衡量新生成能力进展的基准测试。然而,通过人工注释创建新的基准测试既缓慢又昂贵,这限制了对任何能力的全面评估。我们引入了BENCHAGENTS框架,该框架系统地利用大型语言模型(LLMs)自动化创建复杂能力的基准测试,同时确保数据和度量的质量。BENCHAGENTS将基准测试创建过程分解为规划、生成、数据验证和评估四个步骤,每个步骤都由LLM代理执行。这些代理相互交互,并利用基准测试开发者的人机反馈来显式改进和灵活控制数据的多样性和质量。我们使用BENCHAGENTS创建用于评估文本生成过程中规划和约束满足能力的基准测试。然后,我们使用这些基准测试研究七种最先进的模型,并提取关于常见失败模式和模型差异的新见解。| +|**2024-10-29**|**Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents**|Jaekyeom Kim et.al.|[2410.22552](http://arxiv.org/abs/2410.22552)|null|在本文中,我们介绍了Auto-Intent方法,这是一种在不直接进行微调的情况下将预训练的大规模语言模型(LLM)作为目标领域代理的方法,特别关注网页导航任务。我们的方法首先从目标领域的演示中无监督地发现潜在的意图,以高度紧凑的形式(最多三个词)。通过提取的意图,我们训练意图预测器来根据代理过去的观察和行为预测下一个意图。特别是,我们提出了一种自我探索方法,其中概率最高的前k个意图预测被用作提示提供给预训练的LLM代理,从而增强其决策能力。Auto-Intent显著提高了GPT-3.5、GPT-4和Llama-3.1-70B、Llama-3.1-405B代理在大规模真实网站导航基准(来自Mind2Web)和在线导航任务(来自WebArena)上的性能,并且其跨基准的泛化能力也得到了验证。| +|**2024-10-29**|**SceneGenAgent: Precise Industrial Scene Generation 
with Coding Agent**|Xiao Xia et.al.|[2410.21909](http://arxiv.org/abs/2410.21909)|**[link](https://github.com/thudm/scenegenagent)**|**工业场景的建模对于工业制造中的模拟至关重要。尽管大型语言模型(LLMs)在从文本描述生成一般3D场景方面已经取得了显著进展,但使用LLMs生成工业场景面临着独特的挑战,因为这些场景需要精确的尺寸和定位,这要求对空间布局进行复杂的规划。为了解决这一挑战,我们引入了SceneGenAgent,这是一种基于LLM的代理,用于通过C#代码生成工业场景。SceneGenAgent通过结构化和可计算的格式、布局验证以及迭代优化来确保精确的布局规划,以满足工业场景的定量需求。实验结果表明,由SceneGenAgent驱动的LLMs超过了它们原有的性能,在实际工业场景生成任务中的成功率达到了81.0%,并有效地满足了大多数场景生成需求。为了进一步提高可访问性,我们构建了SceneInstruct,这是一个专门用于微调开源LLMs以集成到SceneGenAgent中的数据集。实验显示,基于SceneInstruct对开源LLMs进行微调可以获得显著的性能提升,Llama3.1-70B的性能接近GPT-4o。我们的代码和数据可在获取。**| +|**2024-10-28**|**Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games**|Ji Ma et.al.|[2410.21359](http://arxiv.org/abs/2410.21359)|null|随着基于大型语言模型(LLM)的代理越来越多地承担现实世界任务并与人类社会互动,我们对它们的行为了解多少?本研究(1)调查了不同人格如何诱导LLM代理的亲社会行为——一种基本的社会规范,并将其与人类行为进行基准测试;(2)引入了一种行为方法来评估LLM代理在复杂决策场景中的表现。我们探讨了不同人格和实验框架如何影响这些AI代理在独裁者博弈中的利他行为,并比较了同一LLM家族内、不同LLM家族之间以及与人类行为之间的差异。我们的发现揭示了LLM之间存在显著的差异和不一致性,并且与人类行为相比也有明显区别。仅仅赋予LLM类似人类的身份并不能产生类似人类的行为。尽管这些AI代理是在大量由人类生成的数据上训练的,但它们无法准确预测人类的决定。LLM代理无法捕捉到人类决策过程的内部机制,其与人类行为的一致性高度依赖于特定的模型架构和提示形式;更糟糕的是,这种依赖并不遵循明确的模式。| +|**2024-10-28**|**Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks**|Eitan Farchi et.al.|[2410.21071](http://arxiv.org/abs/2410.21071)|null|大语言模型(LLMs)可以用于多种与代码相关的任务,例如从一种编程语言翻译到另一种编程语言、实现自然语言需求和代码总结。最先进的大语言模型技术生成的工件有望在用户进行少量简单修改后即可使用。然而,量化这种模糊的概念具有挑战性,因此很难确定与代码相关的LLM解决方案的质量。我们称使用LLM判断来评估LLM解决方案的方法为“LLM作为裁判”,简称LaaJ。在这项工作中,我们介绍了一种生成和评估LaaJ实施的方法论,并利用自动产生的基准进行评估。该基准的目的是双重的,即用于开发和验证LaaJs,以及验证和测试使用LaaJs的大语言模型代码相关解决方案。为此,我们开发了一个自动基准生成引擎,该引擎为多种代码相关任务生成多种编程语言的代码,并将其作为LaaJ评估的输入。我们利用代码相关生成的图形表示G,其中图的顶点是生成的工件,边代表可能的生成,例如从自然语言需求生成Java程序。通过利用LLM代理链和G,我们生成与代码相关的工件。利用G中的循环,我们制定对生成工件的期望。利用这些制定的期望,可以开发和测试可靠的LLM判断,以衡量解决方案生成的工件的有用性。我们的方法能够创建高质量的代码任务解决方案。| +|**2024-10-28**|**Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments**|Sangmim Song 
et.al.|[2410.20666](http://arxiv.org/abs/2410.20666)|null|导航对于视觉障碍人士(PVI)来说是一个重大挑战。虽然传统的辅助工具如白色手杖和导盲犬非常宝贵,但它们在提供详细的环境信息和精确引导到目的地方面仍显不足。最近大型语言模型(LLM)和视觉-语言模型(VLM)的发展为增强辅助导航提供了新的途径。在本文中,我们介绍了一种名为Guide-LLM的具身化LLM基代理,旨在帮助视觉障碍人士在大型室内环境中导航。我们的方法采用了一种新颖的基于文本的拓扑图,使LLM能够使用简化的环境表示来规划全局路径,重点关注直线路径和直角转弯,以促进导航。此外,我们利用LLM的常识推理进行危险检测,并根据用户偏好进行个性化路径规划。模拟实验表明该系统在引导视觉障碍人士方面的有效性,突显了其作为辅助技术显著进步的潜力。结果表明,Guide-LLM能够提供高效、适应性强且个性化的导航辅助,指出了该领域有希望的发展前景。| |**2024-10-27**|**TrajAgent: An Agent Framework for Unified Trajectory Modelling**|Yuwei Du et.al.|[2410.20445](http://arxiv.org/abs/2410.20445)|**[link](https://github.com/tsinghua-fib-lab/trajagent)**|**轨迹建模,包括轨迹数据模式挖掘和未来预测的研究,在生活服务、城市交通和公共管理等领域有着广泛的应用。针对特定问题,已经提出了许多方法来解决轨迹建模中的各种问题。然而,由于数据的异质性和任务的多样性,实现统一的轨迹建模仍然是一个重要的挑战。在本文中,我们提出了一种基于大型语言模型的代理框架TrajAgent,以统一各种轨迹建模任务。在TrajAgent中,我们首先开发了UniEnv,这是一个具有统一数据和模型接口的执行环境,支持各种模型的执行和训练。在此基础上,我们引入了TAgent,这是一种针对各种轨迹任务自动进行轨迹建模的代理工作流程。具体来说,我们在TAgent中设计了AutOpt,一个系统性的优化模块,进一步提高了集成模型的性能。通过输入自然语言的不同轨迹任务,TrajAgent能够通过训练和执行适当的模型自动生成有竞争力的结果。在四个真实世界数据集上进行的四个任务的大量实验表明,TrajAgent在统一轨迹建模方面是有效的,与基线方法相比,平均性能提高了15.43%。**| |**2024-10-25**|**Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models**|Danqing Wang et.al.|[2410.20007](http://arxiv.org/abs/2410.20007)|null|提升大型语言模型(LLMs)的推理能力对于使其能够解决复杂的多步问题至关重要。多智能体框架在增强LLMs的推理能力方面显示出巨大潜力。然而,LLM智能体之间缺乏有效的合作限制了它们的表现,特别是在多步推理任务中。本文提出了一种新颖的合作多智能体推理框架(CoPlanner),通过分离推理步骤并将不同的任务分配给不同的智能体来实现。CoPlanner由两个LLM智能体组成:规划智能体和推理智能体。规划智能体提供高层次的战略提示,而推理智能体则遵循这些提示并推导出答案。通过通过近端策略优化(PPO)训练规划智能体的策略,基于LLaMA-3-8B的CoPlanner在LogiQA上比之前最好的方法提高了9.94%,在BBH上提高了3.09%。我们的结果表明,规划智能体的指导以及智能体之间的有效合作对CoPlanner在解决多步推理问题方面的优越性能起到了重要作用。| |**2024-10-29**|**Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting**|Mohamed Salim Aissi 
et.al.|[2410.19920](http://arxiv.org/abs/2410.19920)|null|强化学习(RL)是一种有前景的方法,可以将大型语言模型(LLMs)的知识应用于顺序决策任务。然而,很少有研究深入探讨在特定环境中使用RL微调这些模型对其能力的影响。本文提出了一种新颖的框架,用于分析在文本环境中进行RL训练后,LLM代理对提示格式的敏感性。我们的研究结果表明,当面对与RL训练阶段所使用的不同的提示格式时,LLM的性能会下降。此外,我们通过检查模型的内部表示和显著标记来分析这种敏感性的来源。最后,我们提出使用对比损失来减轻这种敏感性,并提高LLM的鲁棒性和泛化能力。| @@ -323,22 +323,22 @@ |**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|随着算法定价的兴起,人们担忧算法间的合谋问题。我们通过实验使用基于大型语言模型(LLMs)的定价代理,特别是GPT-4,进行了探究。研究发现:(1) LLM驱动的定价机制在定价任务上表现出色;(2) 在寡头竞争环境中,LLM定价代理会自发地进行合谋,从而损害消费者利益;(3) 对LLM指令(“提示”)看似微小的变化可能加剧这种合作行为。这些结果同样适用于拍卖场景。我们的研究结果强调了对算法定价进行反垄断监管的必要性,并揭示了针对LLM定价代理特有的监管挑战。| |**2024-03-31**|**"My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|在这个研究中,我们提出了一种创新的人类记忆架构,旨在提升基于大型语言模型的对话代理的认知能力。我们的设计使得这些代理能自主检索生成响应所需的必要记忆,从而解决LLMs在时间认知上的局限。我们借鉴了人类的记忆线索召回机制作为触发点,以实现精确且高效的回忆。此外,我们开发了一个数学模型,动态量化记忆巩固过程,考虑了诸如上下文相关性、时间流逝和回忆频率等因素。代理会从用户的交互历史中存储记忆,这些记忆被封装在数据库中,每个记忆都包含了内容和时间关联的语境。这样,通过类似人类识别和回忆过往经历的方式,系统能够战略性地存储记忆,并理解它们对用户在时间线上的重要性。| -

(back to top)
+(back to top)

## llm |Publish Date|Title|Authors|PDF|Code|abstract| |---|---|---|---|---|---| -|**2024-11-01**|**SelfCodeAlign: Self-Alignment for Code Generation**|Yuxiang Wei et.al.|[2410.24198](http://arxiv.org/abs/2410.24198)|**[link](https://github.com/bigcode-project/selfcodealign)**|**指令微调是一种监督微调方法,可以显著提高大型语言模型(LLMs)遵循人类指令的能力。我们提出了SelfCodeAlign,这是第一个完全透明且无限制的管道,用于自我对齐代码LLMs,而无需大量的人工注释或蒸馏。SelfCodeAlign在整个数据生成过程中使用相同的基模型进行推理。它首先从高质量的种子代码片段中提取多样化的编码概念以生成新任务。然后,它针对每个任务采样多个响应,并与测试用例配对,在沙盒环境中验证这些响应。最后,通过选择通过示例来进行指令微调。在我们的主要实验中,我们使用SelfCodeAlign与CodeQwen1.5-7B生成了一个包含74k个指令-响应对的数据集。在该数据集上进行微调后,模型在HumanEval+上的pass@1达到了67.1,超过了CodeLlama-70B-Instruct,尽管前者比后者小十倍。在所有基准测试中,经过微调的模型始终优于之前最先进的方法OctoPack,该方法用于无需人工注释或蒸馏的指令微调。此外,我们展示了SelfCodeAlign在各种大小的LLMs上都是有效的,从3B到33B,并且基模型可以从与自身数据分布的对齐中获益更多。我们进一步验证了我们管道中每个组件的有效性,表明SelfCodeAlign的表现优于直接从GPT-4蒸馏的方法以及领先的基于GPT-3.5的蒸馏方法,如OSS-Instruct和Evol-Instruct。SelfCodeAlign还促成了StarCoder2-Instruct的创建,这是第一个完全透明、许可宽松且自我对齐的代码LLM,实现了最先进的编码性能。**| -|**2024-10-31**|**Constraint Back-translation Improves Complex Instruction Following of Large Language Models**|Yunjia Qi et.al.|[2410.24175](http://arxiv.org/abs/2410.24175)|null|大型语言模型(LLMs)在遵循具有复杂格式、长度等约束的指令时存在困难。传统的方法是在复杂的指令-响应对上进行后训练,这些数据通过对复杂指令进行高级LLMs的输入生成。然而,即使先进的LLMs也无法很好地遵循复杂的指令,这限制了生成数据的质量。在这项工作中,我们发现现有的数据集内在地包含了隐含的复杂约束,并提出了一种新的数据生成技术,称为约束回译。具体来说,我们采用现有数据集中高质量的指令-响应对,并仅使用高级LLMs将响应已经满足的复杂约束添加到指令中,这种方法自然降低了成本和数据噪声。在实验中,我们使用Llama3-70B-Instruct进行回译并创建了一个高质量的复杂指令-响应数据集,命名为CRAB。我们展示了在CRAB上进行后训练可以提高多种基础LLMs的复杂指令遵循能力,在广泛的指令遵循基准上进行了评估。我们进一步发现,约束回译也可以作为后训练中的有用辅助训练目标。我们的代码、数据和模型将被公开发布,以促进未来的研究。| -|**2024-10-31**|**Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning**|Jinghan Zhang et.al.|[2410.24155](http://arxiv.org/abs/2410.24155)|null|近年来,大型语言模型(LLMs)在处理复杂推理任务方面展现出了巨大的潜力,通常通过构建思维链来引导模型进行多步推理解决问题。然而,现有方法往往局限于之前探索过的解决方案空间,从而忽视了LLMs认知范围内的关键盲点。为了解决这些问题,我们设计了Thought Space Explorer 
(TSE),这是一个新颖的框架,用于扩展和优化思维结构,以引导LLMs探索其思维盲点。通过基于原始思维结构生成新的推理步骤和分支,并采用多种设计策略,TSE拓宽了思维空间,减轻了盲点对LLM推理的影响。在多个级别的推理任务上的实验结果证明了TSE的有效性。我们还进行了广泛的分析,以理解结构化和扩展化的思维如何有助于释放LLM推理能力的潜力。| -|**2024-10-31**|**Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning**|Jiaqi Liu et.al.|[2410.24152](http://arxiv.org/abs/2410.24152)|null|合作驾驶技术在提升交通系统效率和安全性方面至关重要。基于学习的方法,如多智能体强化学习(MARL),在合作决策任务中展示了强大的能力。然而,现有的MARL方法仍然面临学习效率和性能方面的挑战。近年来,大规模语言模型(LLM)迅速发展,在各种序列决策任务中展现了卓越的能力。为了增强合作代理的学习能力,同时确保决策效率和成本效益,我们提出了一种名为LDPD的语言驱动策略蒸馏方法,用于指导MARL探索。在这个框架中,基于LLM的教师代理训练较小的学生代理通过自身的决策演示实现合作决策。教师代理增强了CAV的观察信息,并利用LLM进行复杂的合作决策推理,同时借助精心设计的决策工具实现专家级决策,提供高质量的教学经验。学生代理随后通过梯度策略更新将教师的先验知识提炼到自己的模型中。实验表明,学生代理可以在少量指导的情况下快速提高其能力,并最终超越教师的表现。广泛的实验表明,我们的方法在性能和学习效率方面优于基线方法。| -|**2024-10-31**|**Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing**|Akash Dhruv et.al.|[2410.24119](http://arxiv.org/abs/2410.24119)|**[link](https://github.com/neucol/llm-conversion-performance)**|**基础模型和生成式人工智能(GenAI)的出现有望变革科学计算中的生产力,特别是在代码开发、重构以及从一种编程语言转换到另一种编程语言方面。然而,由于GenAI的输出无法保证正确性,因此仍然需要人工干预。部分这种干预可以通过任务特定工具实现,并结合正确的验证方法和有效的提示开发方法。我们研究了GenAI在辅助代码转换、语言互操作性和代码库检查方面的应用,这些应用在一个用于模拟大型强子对撞机(LHC)中粒子相互作用的遗留Fortran代码库中进行了探索。在此过程中,我们开发了一款名为CodeScribe的工具,该工具结合了提示工程与用户监督,建立了一个高效的代码转换流程。本文展示了CodeScribe如何协助将Fortran代码转换为C++,生成Fortran-C API以集成遗留系统与现代C++库,并提供开发者支持以进行代码组织和算法实现。同时,我们也讨论了AI驱动的代码转换面临的挑战,并强调了它在提升科学计算工作流程生产力方面的优势。**| -|**2024-10-31**|**Repository-Level Compositional Code Translation and Validation**|Ali Reza Ibrahimzada et.al.|[2410.24117](http://arxiv.org/abs/2410.24117)|null|代码翻译是指将程序从一种编程语言转换成另一种编程语言。已有几种基于规则的转译器被设计出来,以实现不同编程语言之间的自动化代码转换。然而,随着编程语言的发展,这些规则可能会过时,并且无法推广到其他编程语言。近期的研究探索了使用大型语言模型(LLMs)来实现代码翻译的自动化。一个关键观察是,这样的技术可能在精心设计的基准测试中表现良好,但在实际项目中的规模和复杂度下,特别是在涉及依赖关系、自定义类型、特定于编程语言的功能等方面时,往往无法很好地泛化。 
我们提出了AlphaTrans,这是一种神经符号方法,用于实现存储库级别的代码翻译。AlphaTrans不仅翻译源代码,还翻译测试代码,并采用多级验证确保翻译后的代码保留原始程序的功能。为了分解问题以便LLMs处理,AlphaTrans利用程序分析将程序分解成片段,并按逆调用顺序翻译它们。我们使用AlphaTrans翻译了十个现实世界中的开源项目,这些项目包含少于836、8575和2719个类、方法和测试。AlphaTrans成功翻译了这些项目的整个代码库,其中包括6899个源代码片段。99.1%的翻译代码片段在语法上是正确的,AlphaTrans验证了其中25.8%的翻译在运行时行为和功能正确性。平均而言,集成的翻译和验证过程需要36小时才能完成一个项目的翻译,显示出其在实际应用中的可扩展性。对于那些在语法或语义上不正确的翻译,AlphaTrans会生成一份报告,包括现有翻译、堆栈跟踪、测试错误或断言失败。我们将这些结果提供给两位开发者,让他们修复四个项目中的翻译错误。他们平均花费20.1小时解决了这些问题,并实现了所有测试通过。| -|**2024-10-31**|**Matchmaker: Self-Improving Large Language Model Programs for Schema Matching**|Nabeel Seedat et.al.|[2410.24105](http://arxiv.org/abs/2410.24105)|null|schema匹配——即在具有不同表和层次结构的异构数据源之间找到属性之间的匹配关系——对于创建可用于机器学习(ML)的互操作性数据至关重要。解决这一基础性的以数据为中心的问题具有广泛的影响,特别是在医疗、金融和电子商务等领域,但也有可能更普遍地通过增加用于训练ML模型的数据量来使ML模型受益。然而,由于不同模式之间的结构/层次和语义异质性,schema匹配是一个具有挑战性的ML任务。先前的自动化schema匹配的ML方法要么需要大量的标注数据进行模型训练,这通常是不现实的,要么零样本性能较差。为此,我们提出了Matchmaker——一种用于schema匹配的组合式语言模型程序,该程序由候选生成、优化和置信度评分组成。Matchmaker还通过一种新颖的优化方法,在无需标注示例的情况下实现在零样本情况下的自我改进,该方法构建合成的上下文示例来指导语言模型的推理过程。实证研究表明,在真实世界的医学schema匹配基准上,Matchmaker优于以前的基于ML的方法,突显了其加速数据集成和ML就绪数据互操作性的潜力。| -|**2024-10-31**|**Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs**|Muhammed Saeed et.al.|[2410.24049](http://arxiv.org/abs/2410.24049)|null|大型语言模型(LLMs)被广泛使用,但因其内部嵌入的社会偏见而引发伦理问题。本研究在包括女性权利、恐怖主义和反犹太主义在内的八个领域评估了LLM对阿拉伯人与西方人的偏见,并评估了模型抵抗延续这些偏见的能力。为此,我们创建了两个数据集:一个用于评估LLM对阿拉伯人与西方人的偏见,另一个用于测试模型在面对夸大负面特征的提示(“越狱”)时的安全性。我们评估了六种LLM模型——GPT-4、GPT-4o、LlaMA 3.1 (8B & 405B)、Mistral 7B和Claude 3.5 Sonnet。我们发现,在79%的情况下,模型表现出对阿拉伯人的负面偏见,其中LlaMA 3.1-405B表现出最严重的偏见。我们的“越狱”测试显示,尽管GPT-4o是经过优化的版本,但它最容易受到攻击,在三个类别中的攻击成功率最高,其次是LlaMA 3.1-8B和Mistral 7B。除了Claude之外,所有LLM在三个类别中的攻击成功率均超过87%。我们还发现,尽管Claude 3.5 Sonnet是最安全的模型,但它仍然在八个类别中的七个类别中表现出偏见。尽管是GPT-4的优化版本,我们发现GPT-4o更容易出现偏见和“越狱”,这表明优化存在缺陷。我们的研究结果强调了需要更强大的偏见缓解策略和强化的安全措施。| -|**2024-10-31**|**Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks**|Yingzhe Peng 
et.al.|[2410.24032](http://arxiv.org/abs/2410.24032)|null|大规模语言模型(LLMs)的兴起已经彻底改变了用户与知识系统之间的交互方式,使得聊天机器人能够整合大量的信息并协助处理复杂的探索性任务。然而,基于LLM的聊天机器人在提供个性化支持方面往往存在困难,特别是在用户以模糊查询开始或缺乏足够的上下文信息时。本文介绍了一个名为“协作个性化探索助手”(CARE)的系统,该系统旨在通过结合多代理LLM框架和结构化的用户界面来增强个性化探索任务。CARE的界面包括聊天面板、解决方案面板和需求面板,实现了迭代查询细化和动态解决方案生成。多代理框架协同工作,识别用户的显性和隐性需求,从而提供量身定制的、可操作的解决方案。在一项涉及22名参与者的被试内用户研究中,CARE被一致认为优于基线LLM聊天机器人,用户称赞其能够减轻认知负担、激发创造力,并提供更加个性化的解决方案。我们的研究结果表明,CARE有可能将基于LLM的系统从被动的信息检索者转变为个性化问题解决和探索中的积极合作伙伴。| -|**2024-10-31**|**AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents**|Yifan Xu et.al.|[2410.24024](http://arxiv.org/abs/2410.24024)|null|自主代理在与现实世界交互方面变得越来越重要。特别是,Android代理作为一种交互方法被频繁提及。然而,现有的训练和评估Android代理的研究缺乏对开源和闭源模型的系统性研究。在这项工作中,我们提出了AndroidLab作为一套系统的Android代理框架。它包括一个具有不同模态、动作空间和可重复基准的操作环境。它支持在同一动作空间中的大型语言模型(LLMs)和多模态模型(LMMs)。AndroidLab基准包括预定义的Android虚拟设备和九个应用上的138个任务。通过使用AndroidLab环境,我们开发了一个Android指令数据集,并训练了六个开源LLMs和LMMs,将LLMs的平均成功率从4.59%提升到21.50%,LMMs的平均成功率从1.93%提升到13.28%。AndroidLab已开源并公开提供,网址为。| +|**2024-11-01**|**SelfCodeAlign: Self-Alignment for Code Generation**|Yuxiang Wei et.al.|[2410.24198](http://arxiv.org/abs/2410.24198)|**[link](https://github.com/bigcode-project/selfcodealign)**|**指令微调是一种监督微调方法,显著提高了大型语言模型(LLMs)遵循人类指令的能力。我们提出了SelfCodeAlign,这是首个完全透明且许可宽松的管道,用于自我对齐代码LLMs,而无需大量的手动标注或蒸馏。SelfCodeAlign在整个数据生成过程中使用相同的基模型进行推理。它首先从高质量的种子代码片段中提取多样化的编码概念以生成新任务。然后,它为每个任务采样多个响应,并将其与测试用例配对,在沙盒环境中进行验证。最后,通过选择通过测试的示例进行指令微调。在我们的主要实验中,我们使用SelfCodeAlign与CodeQwen1.5-7B一起生成了一个包含74k个指令-响应对的数据集。在此数据集上进行微调后,该模型在HumanEval+上的pass@1达到了67.1%,超过了CodeLlama-70B-Instruct,尽管其规模小了十倍。在所有基准测试中,这个经过微调的模型始终优于之前最先进的无需人工标注或蒸馏的指令微调方法OctoPack。此外,我们展示了SelfCodeAlign在各种规模的LLMs(从3B到33B)上都是有效的,并且基模型可以从与自身数据分布的对齐中受益更多。我们还验证了管道中每个组件的有效性,显示SelfCodeAlign在直接从GPT-4o蒸馏和领先的基于GPT-3.5的蒸馏方法(如OSS-Instruct和Evol-Instruct)方面均表现出色。SelfCodeAlign还促成了StarCoder2-Instruct的创建,这是首个完全透明、许可宽松且自我对齐的代码LLM,实现了最先进的编码性能。**| +|**2024-10-31**|**Constraint Back-translation Improves Complex Instruction Following of Large Language 
Models**|Yunjia Qi et.al.|[2410.24175](http://arxiv.org/abs/2410.24175)|null|大型语言模型(LLMs)在遵循具有复杂格式、长度等约束的指令时存在困难。传统上,先前的工作通过向先进的LLMs提供复杂的指令-响应对来进行后训练,以处理这些复杂指令。然而,即使是先进的LLMs也难以很好地遵循复杂的指令,从而限制了生成数据的质量。在这项工作中,我们发现现有的数据集内在地包含了隐含的复杂约束,并提出了一种新颖的数据生成技术——约束回译。具体来说,我们采用现有数据集中高质量的指令-响应对,并仅使用先进的LLMs将响应已满足的复杂约束添加到指令中,这自然降低了成本和数据噪声。在实验中,我们使用Llama3-70B-Instruct进行约束回译,创建了一个高质量的复杂指令-响应数据集,命名为CRAB。我们展示了在CRAB上进行后训练可以提高多种基础LLMs的复杂指令遵循能力,在广泛的指令遵循基准上进行了评估。我们进一步发现,约束回译也可以作为后训练中的有用辅助训练目标。我们的代码、数据和模型将被发布,以促进未来的研究。| +|**2024-10-31**|**Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning**|Jinghan Zhang et.al.|[2410.24155](http://arxiv.org/abs/2410.24155)|null|近年来,大型语言模型(LLMs)在处理复杂推理任务方面展现出了巨大的潜力,通常通过构建思维链来指导模型进行多步推理。然而,现有的方法往往局限于先前探索过的解决方案空间,从而忽略了LLMs认知范围内的关键盲点。为了解决这些问题,我们设计了Thought Space Explorer (TSE),这是一种新颖的框架,旨在扩展和优化思维结构,以引导LLMs探索其思维盲点。通过基于原始思维结构生成新的推理步骤和分支,并采用各种设计策略,TSE扩展了思维空间并减轻了盲点对LLM推理的影响。在多个级别的推理任务上的实验结果证明了TSE的有效性。我们还进行了广泛的分析,以理解结构化和扩展化的思维如何有助于释放LLM推理能力的潜力。| +|**2024-10-31**|**Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning**|Jiaqi Liu et.al.|[2410.24152](http://arxiv.org/abs/2410.24152)|null|合作驾驶技术对于提升交通系统的效率和安全性至关重要。基于学习的方法,如多智能体强化学习(MARL),在合作决策任务中展示了强大的能力。然而,现有的MARL方法仍然面临学习效率和性能方面的挑战。近年来,大规模语言模型(LLM)迅速发展,并在各种顺序决策任务中表现出色。为了增强合作代理的学习能力,同时确保决策效率和成本效益,我们提出了一种名为LDPD的语言驱动策略蒸馏方法来引导MARL探索。在这个框架中,基于LLM的教师代理训练较小的学生代理通过其自身的决策演示实现合作决策。教师代理增强了自动驾驶车辆的观察信息,并利用LLM进行复杂的合作决策推理,同时也利用精心设计的决策工具实现专家级决策,提供高质量的教学经验。学生代理通过梯度策略更新将教师的先验知识提炼到自己的模型中。实验表明,学生可以在最少的教师指导下快速提高其能力,并最终超越教师的表现。广泛的实验表明,我们的方法在性能和学习效率方面优于基线方法。| +|**2024-10-31**|**Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing**|Akash Dhruv 
et.al.|[2410.24119](http://arxiv.org/abs/2410.24119)|**[link](https://github.com/neucol/llm-conversion-performance)**|**基础模型和生成式人工智能(GenAI)的出现有望改变科学计算中的生产力,特别是在代码开发、重构以及从一种编程语言转换到另一种编程语言方面。然而,由于GenAI的输出不能保证正确性,因此仍然需要人工干预。部分这种干预可以通过任务特定工具以及用于正确性验证和有效提示开发的附加方法来自动化。我们研究了GenAI在辅助代码转换、语言互操作性和在用于模拟大型强子对撞机(LHC)粒子相互作用的遗留Fortran代码库中进行代码库检查方面的应用。在此过程中,我们开发了一款名为CodeScribe的工具,结合提示工程与用户监督,建立了一个高效的代码转换流程。在本文中,我们展示了CodeScribe如何帮助将Fortran代码转换为C++,生成Fortran-C API以集成遗留系统与现代C++库,并提供开发者支持以实现代码组织和算法实施。我们还讨论了AI驱动的代码转换面临的挑战,并强调其在提高科学计算工作流程生产力方面的优势。**| +|**2024-10-31**|**Repository-Level Compositional Code Translation and Validation**|Ali Reza Ibrahimzada et.al.|[2410.24117](http://arxiv.org/abs/2410.24117)|null|代码翻译是将程序从一种编程语言转换为另一种编程语言的过程。一些基于规则的转译器已经被设计出来,以实现不同编程语言对之间的自动化代码翻译。然而,这些规则可能会因编程语言的发展而变得过时,并且无法推广到其他编程语言。近期的研究探索了使用大型语言模型(LLMs)来自动化代码翻译。一个关键观察是,这样的技术可能在精心设计的基准测试中表现良好,但在真实世界的项目中,由于依赖关系、自定义类型、特定于编程语言的功能等因素的存在,它们可能难以泛化。 我们提出了AlphaTrans,这是一种神经符号方法,用于自动化整个代码仓库级别的代码翻译。AlphaTrans不仅翻译源代码,还翻译测试代码,并采用多级验证确保翻译后的代码保留了源程序的功能。为了分解问题以便让LLMs处理,AlphaTrans利用程序分析将程序分解成片段,并按逆调用顺序进行翻译。我们使用AlphaTrans翻译了十个现实世界中的开源项目,这些项目包含的类、方法和测试分别有<836, 8575, 2719>个。AlphaTrans成功翻译了这些项目的所有代码库,共包括6899个代码片段。99.1%的翻译代码片段在语法上是正确的,AlphaTrans验证了其中25.8%的运行时行为和功能正确性。平均而言,集成翻译和验证过程需要36小时来翻译一个项目,显示出其在实际应用中的可扩展性。对于那些在语法或语义上不正确的翻译,AlphaTrans生成一份报告,其中包括现有的翻译、堆栈跟踪、测试错误或断言失败。我们向两位开发者提供了这些辅助材料,帮助他们在四个项目中修复翻译错误。他们平均花费20.1小时解决了这些问题,并使所有测试通过。| +|**2024-10-31**|**Matchmaker: Self-Improving Large Language Model Programs for Schema Matching**|Nabeel Seedat et.al.|[2410.24105](http://arxiv.org/abs/2410.24105)|null|实体匹配——即在具有不同表和层次结构的异构数据源之间找到属性之间的匹配——对于创建可用于机器学习(ML)的数据至关重要。这一基础性的数据问题在医疗、金融和电子商务等领域尤为重要,同时也能够更广泛地通过增加用于训练ML模型的数据量来使ML模型受益。然而,由于不同模式之间的结构/层次和语义异质性,实体匹配是一个具有挑战性的ML任务。先前的自动化实体匹配的ML方法要么需要大量的标注数据进行模型训练,这通常是不现实的,要么零样本性能较差。为此,我们提出了Matchmaker——一种用于实体匹配的组合式语言模型程序,该程序由候选生成、优化和置信度评分组成。Matchmaker还通过一种新颖的优化方法实现在零样本情况下自我改进,该方法构建合成上下文演示以引导语言模型的推理过程。实证研究表明,在真实世界的医学实体匹配基准上,Matchmaker优于之前的基于ML的方法,突显了其加速数据集成和ML就绪数据互操作性的潜力。| 
+|**2024-10-31**|**Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs**|Muhammed Saeed et.al.|[2410.24049](http://arxiv.org/abs/2410.24049)|null|大型语言模型(LLMs)在广泛应用的同时引发了伦理问题,因为它们内置了社会偏见。本研究在包括女性权利、恐怖主义和反犹太主义在内的八个领域中考察了LLMs对阿拉伯人与西方人的偏见,并评估了这些模型抵抗延续这些偏见的能力。为此,我们创建了两个数据集:一个用于评估LLM对阿拉伯人与西方人的偏见,另一个用于测试模型对放大负面特征的提示的安全性(“越狱”)。我们评估了六种LLM——GPT-4、GPT-4o、LlaMA 3.1(8B & 405B)、Mistral 7B和Claude 3.5 Sonnet。我们发现79%的案例显示出对阿拉伯人的负面偏见,其中LlaMA 3.1-405B是最具偏见的模型。我们的“越狱”测试显示,尽管GPT-4o是经过优化的版本,但它却是最易受攻击的,其次是LlaMA 3.1-8B和Mistral 7B。除了Claude外,所有LLM在三个类别中的攻击成功率均超过87%。我们还发现Claude 3.5 Sonnet的安全性最高,但仍然在八个类别中的七个显示出偏见。尽管GPT-4o是GPT-4的一个优化版本,但我们发现它更容易受到偏见和“越狱”的影响,这表明优化存在缺陷。我们的研究结果强调了需要更强大的偏见缓解策略和强化安全措施的紧迫性。| +|**2024-10-31**|**Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks**|Yingzhe Peng et.al.|[2410.24032](http://arxiv.org/abs/2410.24032)|null|大规模语言模型(LLM)的兴起已经彻底改变了用户与知识系统之间的交互方式,使得聊天机器人能够整合大量的信息并协助处理复杂的探索性任务。然而,基于LLM的聊天机器人往往难以提供个性化支持,尤其是在用户以模糊查询开始或缺乏足够的上下文信息时。本文介绍了一种名为“个性化探索协作助理”(CARE)的系统,该系统通过结合多代理LLM框架和结构化的用户界面来增强个性化在探索性任务中的应用。CARE的界面包括聊天面板、解决方案面板和需求面板,使迭代式查询细化和动态解决方案生成成为可能。多代理框架协同工作,以识别显性和隐性用户需求,从而提供定制化的、可操作的解决方案。在一项涉及22名参与者的被试内用户研究中,CARE相对于基线LLM聊天机器人一直受到欢迎,用户称赞其能够减轻认知负担、激发创造力,并提供更加个性化的解决方案。我们的研究结果表明,CARE有可能将基于LLM的系统从被动的信息检索者转变为个性化问题解决和探索中的积极合作伙伴。| +|**2024-10-31**|**AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents**|Yifan Xu et.al.|[2410.24024](http://arxiv.org/abs/2410.24024)|null|自主代理在与现实世界互动中的重要性日益增加。特别是,安卓代理作为一种交互方法被频繁提及。然而,现有的用于训练和评估安卓代理的研究缺乏对开源和闭源模型系统的系统性研究。在这项工作中,我们提出了AndroidLab作为系统化的安卓代理框架。它包括一个具有不同模态的操作环境、动作空间以及可重复使用的基准测试。它支持在同一动作空间下的大型语言模型(LLMs)和多模态模型(LMMs)。AndroidLab基准测试包括预定义的安卓虚拟设备和九个应用上的138个任务。通过使用AndroidLab环境,我们开发了一个安卓指令数据集,并训练了六个开源的LLMs和LMMs,将LLMs的成功率从4.59%提升到21.50%,LMMs的成功率从1.93%提升到13.28%。AndroidLab已开源并公开提供,网址为。| |**2024-10-30**|**EMMA: End-to-End Multimodal Model for Autonomous Driving**|Jyh-Jing Hwang 
et.al.|[2410.23262](http://arxiv.org/abs/2410.23262)|null|我们介绍了EMMA,这是一种用于自动驾驶的端到端多模态模型。该模型基于多模态大型语言模型基础,直接将原始相机传感器数据映射到各种与驾驶相关的输出,包括规划轨迹、感知对象和道路图元素。EMMA通过将所有非传感器输入(例如导航指令和自车状态)和输出(例如轨迹和三维位置)表示为自然语言文本,最大限度地利用了预训练大型语言模型中的世界知识。这种方法使EMMA能够在统一的语言空间中联合处理各种驾驶任务,并使用特定任务提示生成每个任务的输出。实证研究表明,EMMA在nuScenes上的运动规划方面达到了最先进的性能,并在Waymo开放运动数据集(WOMD)上取得了具有竞争力的结果。此外,EMMA在Waymo开放数据集(WOD)上作为主要摄像头的三维目标检测也取得了具有竞争力的结果。我们展示了通过同时训练EMMA进行规划轨迹、目标检测和道路图任务可以在这三个领域都取得改进,突显了EMMA作为自动驾驶应用中的通用模型的潜力。然而,EMMA也表现出一些局限性:它只能处理少量图像帧,不包含准确的三维传感模态如激光雷达或雷达,并且计算成本较高。我们希望我们的结果能够激发进一步的研究,以解决这些问题并进一步发展自动驾驶模型架构。| |**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|随着大型语言模型(LLMs)扩展到执行现实世界应用中的代理任务,超越传统的自然语言处理任务,评估其鲁棒性变得越来越重要。然而,现有的基准测试往往忽视了文化和社会意识等关键维度。为了解决这些问题,我们引入了CASA,这是一个旨在评估LLM代理在两个基于网络的任务(在线购物和社交讨论论坛)中对文化和社会规范的敏感性的基准。我们的方法评估了LLM代理检测并适当回应违反规范的用户查询和观察的能力。此外,我们提出了一种全面的评估框架,该框架测量代理对文化和社会规范的意识覆盖率、在管理用户查询时的实用性以及面对误导性网络内容时的违规率。实验表明,当前的LLM在非代理环境中的表现显著优于在网络代理环境中,代理的意识覆盖率不到10%,违规率超过40%。为了提高性能,我们探索了两种方法:提示和微调,并发现这两种方法可以互补——针对特定文化的数据集进行微调可以显著增强代理在不同地区的泛化能力,而提示则能提升代理处理复杂任务的能力。这些发现突显了在开发周期中不断基准测试LLM代理的文化和社会意识的重要性。| |**2024-10-30**|**Carrot and Stick: Eliciting Comparison Data and Beyond**|Yiling Chen et.al.|[2410.23243](http://arxiv.org/abs/2410.23243)|null|比较数据通常来自于人们的主观判断,并且难以直接验证。这些数据对于许多机器学习任务至关重要,包括基于人类反馈的强化学习和排名模型估计。如何诚实地从理性个体那里获取这样的比较数据?我们设计了同伴预测机制来利用奖金-惩罚支付方式来获取比较数据。我们的设计依赖于比较数据的强随机传递性,从而创建对称的严格真实机制,使得说实话不仅形成严格的贝叶斯纳什均衡,而且在所有对称均衡中获得最高报酬。在我们的机制下,每个个体只需要评估一对项目并报告她的比较结果。 我们进一步将奖金-惩罚支付的概念扩展到网络化数据的获取上,设计了一种当代理人的私人信号根据Ising模型采样时,对称地严格真实的机制。我们提供了奖金-惩罚支付成为严格贝叶斯纳什均衡的必要和充分条件。在两个现实世界的数据集上的实验进一步支持了我们的理论发现。| @@ -1410,5 +1410,5 @@ |**2024-05-16**|**IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan 
et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|大型语言模型(LLMs)在根据自然语言描述自动生成可执行代码方面展现出巨大潜力,特别是通过互动功能,用户可以通过迭代反馈指导模型。然而,当前的互动方式往往假设用户具备调试源代码的专业知识,对非专业程序员不太友好。这使得使互动代码生成对不同编程水平的个体更易于使用成为一个挑战。为解决这个问题,我们提出了IntelliExplain,这是一种创新的人机交互范式,通过让用户通过自然语言解释与源代码互动,提升非专业人士的体验。用户通过提供他们发现错误的自然语言纠正反馈,来指导系统修订代码,直到用户对系统的代码解释感到满意。我们的用户研究显示,使用IntelliExplain的用户在Text-to-SQL和Python代码生成任务中的成功率分别比纯GPT-3.5提高了11.6%和25.3%,同时所需时间分别减少了39.0%和15.6%。| |**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|**[link](https://github.com/CAS-SIAT-XinHai/CPsyExam)**|在这篇论文中,我们提出了一种创新的心理学基准测试——CPsyExam,它源于中国语言考试的问题。CPsyExam旨在分别强调心理学知识和案例分析的重要性,认识到将心理学知识应用于实际情境的价值。从22,000个问题库中,我们精选了4,000个来构建该基准,确保了主题的均衡覆盖,并包含了各种案例分析方法的多样性。此外,我们对一系列现有的大型语言模型(LLMs)进行了评估,包括开源和API基础的模型。实验和分析结果显示,CPsyExam是一个有效的确立语言模型对心理学理解能力的基准,同时支持在不同粒度上比较这些模型。| -

(back to top)
+(back to top)

diff --git a/docs/agent-arxiv-daily.json b/docs/agent-arxiv-daily.json index b5af380e330..52852b731fa 100644 --- a/docs/agent-arxiv-daily.json +++ b/docs/agent-arxiv-daily.json @@ -1 +1 @@ -{"agent": {"2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u5d4c\u5165\u5f0f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7cfb\u7edf\u5728\u7a7a\u95f4\u8ba4\u77e5\u548c\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u96c6\u6210\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u8bba\u6587\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548
c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u63ed\u793a\u4e86\u660e\u663e\u7684\u8fdb\u5c55\uff0c\u4f46\u4e5f\u5f3a\u8c03\u4e86\u5f00\u53d1\u65b0\u65b9\u6cd5\u4ee5\u5145\u5206\u5229\u75283D-LLMs\u6f5c\u529b\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u6307\u660e\u9053\u8def\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u7efc\u8ff0\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.09935": "|**2024-05-24**|**DEBATE: Devil's Advocate-Based Assessment and Text Evaluation**|Alex Kim et.al.|[2405.09935](http://arxiv.org/abs/2405.09935)|**[link](https://github.com/gunny97/DEBATE)**|\u968f\u7740\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u6a21\u578b\u7684\u666e\u53ca\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u53d8\u5f97\u65e5\u76ca\u5173\u952e\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65e0\u53c2\u8003\u8bc4\u4ef7\u5668\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u5904\u7406\u65b0\u4efb\u52a1\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u91c7\u7528\u5355\u4ee3\u7406\u65b9\u6cd5\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u8868\u73b0\u3002\u56e0\u4e3aLLM\u4ee3\u7406\u7684\u56de\u7b54\u5b58\u5728\u504f\u89c1\uff0c\u6bd4\u5982\u5bf9\u7279\u5b9a\u6587\u672c\u7ed3\u6784\u6216\u5185\u5bb9\u7684\u504f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u5de5\u4f5c\u4e2d\u63d0\u51faDEBATE\uff0c\u4e00\u4e2a\u5efa\u7acb\u5728\u591a\u4ee3\u7406\u8bc4\u5206\u7cfb\u7edf\u57fa\u7840\u4e0a\u7684
NLG\u8bc4\u4ef7\u6846\u67b6\uff0c\u878d\u5165\u4e86\u201c\u6076\u9b54\u8fa9\u624b\u201d\u7684\u6982\u5ff5\u3002\u5728\u8be5\u6846\u67b6\u4e2d\uff0c\u4e00\u4e2a\u4ee3\u7406\u88ab\u6307\u4ee4\u6279\u8bc4\u5176\u4ed6\u4ee3\u7406\u7684\u8bba\u70b9\uff0c\u4ece\u800c\u53ef\u80fd\u6d88\u89e3LLM\u4ee3\u7406\u7b54\u6848\u4e2d\u7684\u504f\u89c1\u3002DEBATE\u5728\u4e24\u4e2aNLG\u8bc4\u4ef7\u5143\u8bc4\u4f30\u57fa\u51c6\u2014\u2014SummEval\u548cTopicalChat\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u4ee3\u7406\u4e4b\u95f4\u7684\u8fa9\u8bba\u5e7f\u5ea6\u4ee5\u53ca\u4ee3\u7406\u7684\u4eba\u683c\u7279\u8d28\u4f1a\u5f71\u54cd\u8bc4\u4ef7\u5668\u7684\u6027\u80fd\u3002|\n", "2405.05175": "|**2024-05-08**|**Air Gap: Protecting Privacy-Conscious Conversational Agents**|Eugene Bagdasaryan et.al.|[2405.05175](http://arxiv.org/abs/2405.05175)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u5f0f\u4ee3\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5904\u7406\u654f\u611f\u7528\u6237\u6570\u636e\u65f6\u5f15\u53d1\u4e86\u4e25\u91cd\u7684\u9690\u79c1\u95ee\u9898\u3002\u8fd9\u4e9b\u4ee3\u7406\u867d\u80fd\u7406\u89e3\u5e76\u5904\u7406\u4e0a\u4e0b\u6587\uff0c\u4f46\u4e5f\u53ef\u80fd\u88ab\u6076\u610f\u4e00\u65b9\u5229\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5a01\u80c1\u6a21\u578b\uff0c\u5373\u7b2c\u4e09\u65b9\u5e94\u7528\u901a\u8fc7\u64cd\u63a7\u4ea4\u4e92\u4e0a\u4e0b\u6587\uff0c\u8bef\u5bfcLLM\u4ee3\u7406\u6cc4\u9732\u4e0e\u5176\u4efb\u52a1\u65e0\u5173\u7684\u79c1\u4eba\u4fe1\u606f\u3002\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u6846\u67b6\u7684\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AirGapAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u6ce8\u91cd\u9690\u79c1\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u9650\u5236\u4ee3\u7406\u4ec5\u8bbf\u95ee\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u6570\u636e\uff0c\u9632
\u6b62\u610f\u5916\u7684\u6570\u636e\u6cc4\u6f0f\u3002\u5b9e\u9a8c\u4f7f\u7528Gemini\u3001GPT\u548cMistral\u6a21\u578b\u4f5c\u4e3a\u4ee3\u7406\uff0c\u7ed3\u679c\u663e\u793aAirGapAgent\u5728\u62b5\u5fa1\u57fa\u4e8e\u5355\u4e2a\u67e5\u8be2\u7684\u4e0a\u4e0b\u6587\u52ab\u6301\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8eGemini Ultra\u4ee3\u7406\uff0c\u8fd9\u79cd\u653b\u51fb\u4ece94%\u7684\u4fdd\u62a4\u80fd\u529b\u964d\u4f4e\u523045%\uff0c\u800cAirGapAgent\u53ef\u4ee5\u4fdd\u630197%\u7684\u9632\u62a4\u6548\u679c\uff0c\u4f7f\u540c\u6837\u7684\u653b\u51fb\u5931\u6548\u3002|\n", "2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325](http://arxiv.org/abs/2405.04325)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u867d\u4e3a\u6784\u5efa\u81ea\u7136\u8bed\u8a00\u4ee3\u7406\u63d0\u4f9b\u4e86\u5f3a\u5927\u57fa\u7840\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u5b83\u4eec\u53ca\u5176\u57fa\u4e8e\u5b83\u4eec\u6784\u5efa\u7684\u81ea\u4e3b\u4ee3\u7406\u7684\u5b89\u5168\u6027\u62c5\u5fe7\u3002\u7279\u522b\u662f\u6b3a\u9a97\u80fd\u529b\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u6211\u4eec\u5173\u6ce8\u7684\u662fAI\u4ee3\u7406\u901a\u8fc7\u6df7\u6dc6\u548c\u6a21\u68f1\u4e24\u53ef\u6765\u8bef\u5bfc\u3001\u9690\u85cf\u771f\u76f8\u6216\u63a8\u5e7f\u90e8\u5206\u4e0d\u771f\u5b9e\u7684\u4fe1\u5ff5\u7684\u884c\u4e3a\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80AI\u5b89\u5168\u7814\u7a76\u4e2d\u7684\u6492\u8c0e\u3001\u81ea\u79c1\u51b3\u7b56\u6216\u63d0\u4f9b\u865a\u5047\u4fe1\u606f\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u4e00\u7c7b\u7279\u6b8a\u7684\u6b3a\u9a97\uff1a\u7c7b\u4f3c\u4e8e\u9b54\u672f\u5e08\u5229\u7528\u969c\u773c\u6cd5\u8ba9\u5154\u5b50\u4ece\u5e3d\u5b50\u91cc\u51fa\u73b0\uff0c\u8981\u4e48\u901a\u8fc7\u9690\u85cf\u7684\u6697\u95e8\uff0c\u8981\u4e48\u901a\u8fc7\u8f6c\u79fb\u6ce8\u610f\u5
29b\u76f4\u63a5\u5c55\u793a\u3002 \u6211\u4eec\u7684\u65b0\u5b9e\u9a8c\u5e73\u53f0\u5728\u4e00\u4e2a\u6709\u76ee\u6807\u7684\u73af\u5883\u4e2d\u5c55\u793a\u4e86LLM\u4ee3\u7406\u5728\u5bf9\u6297\u6027\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u8fdb\u884c\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65f6\u7684\u6b3a\u9a97\u56fa\u6709\u80fd\u529b\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u7acb\u6cd5\u4efb\u52a1\u201c\u6e38\u8bf4\u201d\u8bae\u6848\u3002\u5728\u76ee\u6807\u9a71\u52a8\u7684\u73af\u5883\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\u6784\u5efa\u6b3a\u9a97\u80fd\u529b\uff0c\u7ed3\u5408\u8bed\u8a00\u54f2\u5b66\u548c\u8ba4\u77e5\u5fc3\u7406\u5b66\u7406\u8bba\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6e38\u8bf4\u4ee3\u7406\u5728\u5bf9\u6297\u4e92\u52a8\u7684\u540e\u7eed\u5f3a\u5316\u8bd5\u9a8c\u4e2d\u5176\u6b3a\u9a97\u80fd\u529b\u63d0\u9ad8\u4e86\u7ea640%\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6b3a\u9a97\u68c0\u6d4b\u673a\u5236\u80fd\u8fbe\u5230\u9ad8\u8fbe92%\u7684\u8bc6\u522b\u7387\u3002\u8fd9\u4e9b\u7ed3\u679c\u63ed\u793a\u4e86\u4eba\u673a\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u95ee\u9898\uff0c\u5373\u4ee3\u7406\u53ef\u80fd\u64cd\u7eb5\u4eba\u7c7b\u4ee5\u8fbe\u6210\u9884\u8bbe\u76ee\u6807\u3002|\n", "2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra 
et.al.|[2405.04324](http://arxiv.org/abs/2405.04324)|**[link](https://github.com/ibm-granite/granite-code-models)**|**\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u9886\u57df\u7684\u8bad\u7ec3\u6b63\u5728\u9769\u65b0\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u3002\u5982\u4eca\uff0c\u8fd9\u4e9b\u4ee3\u7801LLMs\u6b63\u9010\u6b65\u878d\u5165\u8f6f\u4ef6\u5f00\u53d1\u73af\u5883\uff0c\u4ee5\u63d0\u5347\u4eba\u7c7b\u7a0b\u5e8f\u5458\u7684\u6548\u7387\uff0c\u5e76\u5c55\u73b0\u51fa\u81ea\u4e3b\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u6f5c\u529b\u3002\u8981\u5145\u5206\u5229\u7528\u4ee3\u7801LLMs\u7684\u5168\u90e8\u6548\u80fd\uff0c\u9700\u8981\u5176\u5177\u5907\u751f\u6210\u4ee3\u7801\u3001\u4fee\u590dbug\u3001\u89e3\u91ca\u548c\u6ce8\u91ca\u4ee3\u7801\u3001\u7ef4\u62a4\u4ed3\u5e93\u7b49\u591a\u79cd\u529f\u80fd\u3002\u672c\u6587\u4ecb\u7ecdGranite\u7cfb\u5217\u7684\u89e3\u7801\u5668\u4ec5\u6709\u7684\u4ee3\u7801\u6a21\u578b\uff0c\u4e13\u4e3a\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u800c\u8bbe\u8ba1\uff0c\u8bad\u7ec3\u6570\u636e\u6db5\u76d6116\u79cd\u7f16\u7a0b\u8bed\u8a00\u3002Granite Code\u6a21\u578b\u5bb6\u65cf\u5305\u62ec\u4ece3\u4ebf\u5230340\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u4ece\u590d\u6742\u5e94\u7528\u73b0\u4ee3\u5316\u5230\u8bbe\u5907\u5185\u5b58\u53d7\u9650\u7684\u591a\u79cd\u5e94\u7528\u573a\u666f\u3002\u901a\u8fc7\u5168\u9762\u4efb\u52a1\u8bc4\u4f30\uff0cGranite Code\u6a21\u578b\u5728\u5f00\u6e90\u4ee3\u7801LLM\u4e2d\u7684\u6027\u80fd\u59cb\u7ec8\u5904\u4e8e\u9886\u5148\u6c34\u5e73\u3002\u8be5\u6a21\u578b\u5bb6\u65cf\u9488\u5bf9\u4f01\u4e1a\u8f6f\u4ef6\u5f00\u53d1\u5de5\u4f5c\u6d41\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8868\u73b0\u51fa\u8272\u4e8e\u5404\u79cd\u7f16\u7801\u4efb\u52a1\uff08\u5982\u4ee3\u7801\u751f\u6210\u3001\u4fee\u590d\u4e0e\u89e3\u91ca\uff09\uff0c\u662f\u4e00\u6b3e\u591a\u7528\u9014\u7684\u5168\u80fd\u4ee3\u7801\u6a21\u578b\u3002\u6211\u4eec\u4ee5Apache 
2.0\u8bb8\u53ef\u534f\u8bae\u53d1\u5e03\u6240\u6709Granite Code\u6a21\u578b\uff0c\u4f9b\u7814\u7a76\u548c\u5546\u4e1a\u4f7f\u7528\u3002**|\n", "2405.04219": "|**2024-05-07**|**Iterative Experience Refinement of Software-Developing Agents**|Chen Qian et.al.|[2405.04219](http://arxiv.org/abs/2405.04219)|null|### \u6982\u8ff0 \u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u573a\u666f\u4e2d\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u81ea\u4e3b\u6027\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5f53\u524d\u9759\u6001\u7ecf\u9a8c\u8303\u5f0f\u4f9d\u8d56\u4e8e\u901a\u8fc7\u542f\u53d1\u5f0f\u65b9\u6cd5\u83b7\u53d6\u7684\u56fa\u5b9a\u5386\u53f2\u7ecf\u9a8c\u96c6\uff0c\u8fd9\u9650\u5236\u4e86\u4ee3\u7406\u7684\u9002\u5e94\u6027\u548c\u6548\u7387\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u8fed\u4ee3\u7ecf\u9a8c\u4f18\u5316\u6846\u67b6\uff0c\u5141\u8bb8\u8bed\u8a00\u6a21\u578b\u5728\u6267\u884c\u4efb\u52a1\u8fc7\u7a0b\u4e2d\u52a8\u6001\u8c03\u6574\u548c\u4f18\u5316\u7ecf\u9a8c\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e24\u79cd\u6838\u5fc3\u6a21\u5f0f\uff1a\u987a\u5e8f\u6a21\u5f0f\uff0c\u6839\u636e\u4efb\u52a1\u6279\u6b21\u5185\u7684\u6700\u8fd1\u7ecf\u9a8c\u8fdb\u884c\u6539\u8fdb\uff1b\u7d2f\u8ba1\u6a21\u5f0f\uff0c\u79ef\u7d2f\u6240\u6709\u5148\u524d\u4efb\u52a1\u6279\u6b21\u7684\u7ecf\u9a8c\u3002\u901a\u8fc7\u5f15\u5165\u7ecf\u9a8c\u6dd8\u6c70\u7b56\u7565\uff0c\u8be5\u65b9\u6cd5\u4f18\u5148\u9009\u62e9\u9ad8\u8d28\u91cf\u548c\u5e38\u7528\u7684\u7ecf\u9a8c\uff0c\u6709\u6548\u5730\u7ba1\u7406\u7ecf\u9a8c\u7a7a\u95f4\uff0c\u63d0\u9ad8\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u987a\u5e8f\u6a21\u5f0f\u53ef\u80fd\u5e26\u6765\u66f4\u597d\u7684\u6027\u80fd\uff0c\u4f46\u7d2f\u8ba1\u6a21\u5f0f\u5728\u7a33\u5b9a\u6027\u65b9\u9762\u66f4\u4f18\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u6dd8\u6c70\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u9ad8\u8d28\u91cf\u7ecf\u9a8c\u5b50\u96c6\u768411.54%\uff0c\
u5c31\u80fd\u5b9e\u73b0\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2405.03813": "|**2024-05-06**|**Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control**|Yaqub Chaudhary et.al.|[2405.03813](http://arxiv.org/abs/2405.03813)|null|## \u7ffb\u8bd1 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u6a21\u4eff\u5404\u79cd\u4fee\u8f9e\u98ce\u683c\uff0c\u751f\u6210\u8868\u8fbe\u5e7f\u6cdb\u60c5\u611f\u7684\u6587\u672c\uff0c\u8fd9\u79cd\u80fd\u529b\u5728\u4f4e\u6210\u672c\u4e0b\u8fc5\u901f\u666e\u53ca\uff0c\u5e26\u6765\u4e86\u6f5c\u5728\u7684\u793e\u4f1a\u5371\u5bb3\u3002\u672c\u6587\u5e76\u672a\u5b64\u7acb\u770b\u5f85\u8fd9\u4e9b\u6a21\u578b\uff0c\u800c\u662f\u5173\u6ce8\u5b83\u4eec\u80cc\u540e\u5927\u89c4\u6a21\u8ba1\u7b97\u57fa\u7840\u8bbe\u65bd\u5728\u5404\u9886\u57df\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u63a2\u8ba8\u4e86LLMs\u5982\u4f55\u901a\u8fc7\u6c61\u67d3\u548c\u6807\u51c6\u5316\u4fe1\u606f\u73af\u5883\u6765\u5f71\u54cd\u793e\u4f1a\uff0c\u5e76\u6307\u51fa\u8fd9\u4e9b\u529f\u80fd\u53ef\u80fd\u88ab\u7528\u4f5c\u63a7\u5236\u624b\u6bb5\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u7126\u70b9\u8f6c\u5411\u51e0\u4e2a\u65b0\u5174\u7814\u7a76\u9886\u57df\uff0c\u8fd9\u4e9b\u9886\u57df\u589e\u5f3a\u4e86LLMs\u4f5c\u4e3a\u6743\u529b\u5de5\u5177\u7684\u80fd\u529b\uff1a 1. \u901a\u8fc7\u5b9e\u65f6\u8bbe\u8ba1\u5bf9\u8bdd\u754c\u9762\u4e2d\u7684\u9009\u62e9\u67b6\u6784\uff08\u5982\u201cAI\u89d2\u8272\u201d\uff09\uff0c\u8fdb\u884c\u8bf4\u670d\u7b56\u7565\u3002 2. \u5229\u7528LLM\u6784\u5efa\u4eba\u7c7b\u884c\u4e3a\u7684\u8ba1\u7b97\u6a21\u578b\uff08\u5982\u201c\u7845\u8d28\u4e3b\u4f53\u201d\uff09\u3002 3. \u5c06LLM\u5e94\u7528\u4e8e\u6a21\u62df\u4eba\u7c7b\u7fa4\u4f53\u884c\u4e3a\uff08\u5982\u201c\u7845\u8d28\u793e\u4f1a\u201d\uff09\u3002 4. 
\u7ed3\u5408\u5f3a\u5316\u5b66\u4e60\uff0c\u521b\u5efa\u53ef\u63a7\u5236\u548c\u5bfc\u5411\u7684\u6218\u7565\u5bf9\u8bdd\u6a21\u578b\u3002 \u7efc\u5408\u4ee5\u4e0a\u51e0\u70b9\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6280\u672f\u6784\u5efa\u57fa\u4e8eLLMs\u7684\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u901a\u8fc7\u6a21\u62df\u548c\u4f2a\u88c5\u7684\u201c\u9884\u6d4b\u201d\uff0c\u6210\u4e3a\u4e2a\u4f53\u3001\u793e\u4f1a\u548c\u653f\u6cbb\u63a7\u5236\u7684\u5f3a\u5927\u5de5\u5177\uff0c\u64cd\u63a7\u4eba\u7c7b\u7684\u884c\u4e3a\u3001\u610f\u56fe\u548c\u884c\u52a8\u3002|\n", "2405.06682": "|**2024-05-05**|**Self-Reflection in LLM Agents: Effects on Problem-Solving Performance**|Matthew Renze et.al.|[2405.06682](http://arxiv.org/abs/2405.06682)|**[link](https://github.com/matthewrenze/self-reflection)**|**\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u81ea\u6211\u53cd\u601d\u5bf9\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8ba9\u4e5d\u79cd\u6d41\u884c\u7684LLMs\u56de\u7b54\u4e00\u7cfb\u5217\u9009\u62e9\u9898\uff0c\u4ee5\u5efa\u7acb\u6027\u80fd\u57fa\u7ebf\u3002\u5bf9\u4e8e\u56de\u7b54\u9519\u8bef\u7684\u95ee\u9898\uff0c\u6211\u4eec\u6307\u5bfc\u516b\u79cd\u4e0d\u540c\u7c7b\u578b\u7684\u81ea\u6211\u53cd\u601dLLM\u4ee3\u7406\u53cd\u601d\u5176\u9519\u8bef\uff0c\u5e76\u4e3a\u81ea\u5df1\u63d0\u4f9b\u6539\u8fdb\u95ee\u9898\u89e3\u51b3\u7684\u6307\u5bfc\u3002\u7136\u540e\uff0c\u6839\u636e\u8fd9\u4e9b\u6307\u5bfc\uff0c\u6bcf\u4e2a\u53cd\u601d\u578b\u4ee3\u7406\u91cd\u65b0\u5c1d\u8bd5\u56de\u7b54\u540c\u6837\u7684\u95ee\u9898\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cLLM\u4ee3\u7406\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u663e\u8457\u63d0\u9ad8\u4e86\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff08$p < 
0.001$\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6bd4\u8f83\u4e86\u5404\u79cd\u81ea\u6211\u53cd\u601d\u65b9\u5f0f\u5bf9\u6027\u80fd\u7684\u5355\u72ec\u8d21\u732e\u3002\u6240\u6709\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728GitHub\u4e0a\u516c\u5f00\uff1ahttps://github.com/matthewrenze/self-reflection\u3002**|\n", "2405.02858": "|**2024-05-05**|**Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation**|Jinyu Cai et.al.|[2405.02858](http://arxiv.org/abs/2405.02858)|**[link](https://github.com/BlueLinkX/GA-MAS)**|**\u793e\u4ea4\u5a92\u4f53\u5e73\u53f0\u5982Twitter\u3001Reddit\u548c\u65b0\u6d6a\u5fae\u535a\u5728\u5168\u7403\u4ea4\u6d41\u4e2d\u626e\u6f14\u91cd\u8981\u89d2\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5730\u7f18\u653f\u6cbb\u654f\u611f\u533a\u57df\u5e38\u5e38\u53d7\u5230\u4e25\u683c\u76d1\u7ba1\u3002\u8fd9\u4fc3\u4f7f\u7528\u6237\u5728\u53d7\u9650\u7684\u793e\u4ea4\u5a92\u4f53\u73af\u5883\u4e2d\u5de7\u5999\u5730\u8c03\u6574\u6c9f\u901a\u65b9\u5f0f\uff0c\u7ecf\u5e38\u4f7f\u7528\u7f16\u7801\u8bed\u8a00\u3002\u8fd9\u79cd\u8bed\u8a00\u6a21\u5f0f\u7684\u53d8\u5316\u4e0d\u4ec5\u662f\u4e3a\u4e86\u5bf9\u6297\u76d1\u7ba1\uff0c\u4e5f\u662f\u8bed\u8a00\u6f14\u5316\u7684\u751f\u52a8\u4f8b\u8bc1\uff0c\u5c55\u793a\u4e86\u793e\u4f1a\u548c\u6280\u672f\u538b\u529b\u4e0b\u8bed\u8a00\u5982\u4f55\u81ea\u7136\u6f14\u53d8\u3002\u7814\u7a76\u53d7\u9650\u5236\u793e\u4ea4\u5a92\u4f53\u73af\u5883\u4e0b\u8bed\u8a00\u7684\u6f14\u53d8\u5bf9\u4e8e\u4fdd\u969c\u8a00\u8bba\u81ea\u7531\u3001\u4f18\u5316\u5185\u5bb9\u7ba1\u7406\u4ee5\u53ca\u63a8\u52a8\u8bed\u8a00\u5b66\u7814\u7a76\u81f3\u5173\u91cd\u8981\u3002\u672c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u591a\u4ee3\u7406\u6a21\u62df\u6846\u67b6\uff0c\u7528\u4e8e\u63a2\u7d22\u5728\u4e25\u683c\u76d1\u7ba1\u4e0b\u7684\u7528\u6237\u8bed\u8a00\u8fdb\u5316\u3002\u8be5\u6846\u67b6\u5305\u542b\u5bf9\u8bdd\u76d1\u7763\u7684LLM\u9a71\u52a8\u4ee
3\u7406\u548c\u53c2\u4e0e\u8005\u4ee3\u7406\uff0c\u5b83\u4eec\u5728\u4e92\u52a8\u4e2d\u53d1\u5c55\u8bed\u8a00\u7b56\u7565\uff0c\u6a21\u62df\u5728\u89c4\u907f\u793e\u4ea4\u5a92\u4f53\u89c4\u5219\u7684\u73af\u5883\u4e2d\u4ea4\u6d41\u65b9\u5f0f\u7684\u6f14\u53d8\u3002\u901a\u8fc7\u4ece\u62bd\u8c61\u573a\u666f\u5230\u73b0\u5b9e\u60c5\u5883\u7684\u591a\u79cd\u60c5\u666f\u8bc4\u4f30\uff0c\u7814\u7a76\u7ed3\u679c\u663e\u793aLLMs\u80fd\u591f\u6709\u6548\u6a21\u62df\u53d7\u9650\u73af\u5883\u4e2d\u7684\u590d\u6742\u8bed\u8a00\u52a8\u6001\u548c\u4ea4\u4e92\uff0c\u968f\u7740\u8fdb\u5316\uff0c\u5b83\u4eec\u5728\u89c4\u907f\u76d1\u7763\u548c\u4fe1\u606f\u51c6\u786e\u6027\u65b9\u9762\u8868\u73b0\u51fa\u63d0\u5347\u3002\u6b64\u5916\uff0c\u7814\u7a76\u53d1\u73b0LLM\u4ee3\u7406\u9488\u5bf9\u4e0d\u540c\u7684\u573a\u666f\u91c7\u7528\u4e86\u4e0d\u540c\u7684\u7b56\u7565\u3002**|\n", "2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang 
et.al.|[2405.01533](http://arxiv.org/abs/2405.01533)|**[link](https://github.com/nvlabs/omnidrive)**|**\u968f\u7740\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u8fdb\u6b65\uff0c\u4eba\u4eec\u5bf9\u4e8e\u57fa\u4e8e\u8fd9\u4e9b\u6a21\u578b\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u8868\u73b0\u51fa\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0c\u671f\u671b\u5229\u7528\u5b83\u4eec\u5f3a\u5927\u7684\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u5c06MLLMs\u7684\u5f3a\u9879\u5e94\u7528\u4e8e\u9a7e\u9a76\u4efb\u52a1\u7684\u89c4\u5212\u90e8\u5206\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u89c4\u5212\u9700\u8981\u5bf9\u4e09\u7ef4\u73af\u5883\u6709\u5168\u9762\u7684\u7406\u89e3\uff0c\u800c\u4e0d\u4ec5\u4ec5\u662f\u4e8c\u7ef4\u63a8\u7406\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u6a21\u578b\u4e0e3D\u9a7e\u9a76\u4efb\u52a1\u7684\u7d27\u5bc6\u5951\u5408\u3002\u6211\u4eec\u9996\u5148\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u9896\u76843D MLLM\u67b6\u6784\uff0c\u5b83\u5229\u7528\u7a00\u758f\u67e5\u8be2\u6280\u672f\u5c06\u89c6\u89c9\u8868\u793a\u63d0\u5347\u5e76\u538b\u7f29\u5230\u4e09\u7ef4\u7a7a\u95f4\uff0c\u7136\u540e\u5c06\u5176\u8f93\u5165\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\u3002\u8fd9\u79cd\u57fa\u4e8e\u67e5\u8be2\u7684\u8868\u793a\u65b9\u5f0f\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u540c\u65f6\u7f16\u7801\u52a8\u6001\u7269\u4f53\u548c\u9759\u6001\u5730\u56fe\u5143\u7d20\uff08\u5982\u9053\u8def\uff09\uff0c\u4e3a\u611f\u77e5\u548c\u884c\u52a8\u7684\u5bf9\u9f50\u63d0\u4f9b\u4e00\u4e2a\u7b80\u5316\u7684\u4e09\u7ef4\u4e16\u754c\u6a21\u578b\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u4e86OmniDrive-nuScenes\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u7684\u89c6\u89c9\u95ee\u7b54\u6570\u636e\u96c6\uff0c\u5b83\u901a\u8fc7\u5168\u9762\u7684\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\uff08\u5982\u573a\u666f\u63cf\u8ff0\u3001\u4ea4\u901a\u89c4\u5219\u7406\u89e3\u3001\u4e09\u7ef4\u5b9a\u4f4d\u3001\u53cd\u4e8b\u5b9e\u63a8\u7406\u3001\u51b3\u7b56\u5236\u5b9a\u548c\u89c4\u5212\uff09\u6765\u8003\u9a8c\u6a21\u578b\u5728\u590d\u6742\u4e09\u7ef4\u573a\u666f\u4e2d\u7684\u771f\u6b63\u60c5\u5883\u610f\u8bc6\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u63d0\u51fa\u7684\u67b6\u6784\u6709\u6548\uff0c\u5e76\u5f3a\u8c03\u4e86\u5728\u590d\u6742\u4e09\u7ef4\u73af\u5883\u4e2d\u8fdb\u884c\u63a8\u7406\u548c\u89c4\u5212\u65f6\uff0c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u7684\u91cd\u8981\u6027\u3002**|\n", "2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972](http://arxiv.org/abs/2405.00972)|**[link](https://github.com/pnnl/cactus)**|**\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCACTUS\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u7ed3\u5408\u4e86\u5316\u5b66\u4fe1\u606f\u5b66\u5de5\u5177\uff0c\u65e8\u5728\u63d0\u5347\u5728\u5316\u5b66\u548c\u5206\u5b50\u53d1\u73b0\u9886\u57df\u7684\u9ad8\u7ea7\u63a8\u7406\u4e0e\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002\u7814\u7a76\u8005\u4eec\u4f7f\u7528\u5305\u62ecGemma-7b\u3001Falcon-7b\u3001MPT-7b\u3001Llama2-7b\u548cMistral-7b\u5728\u5185\u7684\u591a\u6b3e\u5f00\u6e90\u5927\u8bed\u8a00\u6a21\u578b\uff0c\u5bf9CACTUS\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6027\u80fd\u8bc4\u4f30\uff0c\u901a\u8fc7\u6570\u5343\u4e2a\u5316\u5b66\u95ee\u9898\u7684\u57fa\u51c6\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0cCACTUS\u660e\u663e\u4f18\u4e8e\u57fa\u7840\u6a21\u578b\uff0c\u5176\u4e2dGemma-7b\u548cMistral-7b\u65e0\u8bba\u91c7\u7528\u4f55\u79cd\u63d0\u793a\u7b56\u7565\uff0
c\u8868\u73b0\u6700\u4e3a\u51fa\u8272\u3002\u8bba\u6587\u8fd8\u63a2\u8ba8\u4e86\u9886\u57df\u7279\u5b9a\u63d0\u793a\u548c\u786c\u4ef6\u914d\u7f6e\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5f3a\u8c03\u4e86\u63d0\u793a\u5de5\u7a0b\u7684\u91cd\u8981\u6027\uff0c\u5e76\u6307\u51fa\u5728\u6d88\u8d39\u7ea7\u786c\u4ef6\u4e0a\u90e8\u7f72\u8f83\u5c0f\u6a21\u578b\u53ef\u80fd\u4e0d\u4f1a\u663e\u8457\u727a\u7272\u51c6\u786e\u6027\u3002 CACTUS\u901a\u8fc7\u878d\u5408\u5f00\u6e90\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8ba4\u77e5\u529f\u80fd\u4e0e\u4e13\u4e1a\u5de5\u5177\uff0c\u80fd\u591f\u534f\u52a9\u7814\u7a76\u4eba\u5458\u8fdb\u884c\u5206\u5b50\u6027\u8d28\u9884\u6d4b\u3001\u76f8\u4f3c\u6027\u641c\u7d22\u548c\u836f\u7269\u9002\u7528\u6027\u8bc4\u4f30\u7b49\u4efb\u52a1\u3002\u4f5c\u4e3a\u5316\u5b66\u4fe1\u606f\u5b66\u9886\u57df\u7684\u91cd\u5927\u7a81\u7834\uff0cCACTUS\u4e3a\u5316\u5b66\u5bb6\u548c\u5206\u5b50\u63a2\u7d22\u8005\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7075\u6d3b\u7684\u5de5\u5177\uff0c\u6709\u671b\u52a0\u901f\u79d1\u5b66\u7814\u7a76\uff0c\u63a8\u52a8\u65b0\u578b\u6709\u6548\u3001\u5b89\u5168\u836f\u7269\u3001\u50ac\u5316\u5242\u548c\u6750\u6599\u7684\u53d1\u73b0\u3002\u6b64\u5916\uff0cCACTUS\u4e0e\u81ea\u52a8\u5316\u5b9e\u9a8c\u5e73\u53f0\u7684\u96c6\u6210\u4ee5\u53ca\u5b9e\u65f6\u6570\u636e\u9a71\u52a8\u51b3\u7b56\u7684\u80fd\u529b\uff0c\u4e3a\u81ea\u4e3b\u53d1\u73b0\u5f00\u8f9f\u4e86\u65b0\u7684\u53ef\u80fd\u3002**|\n", "2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr 
et.al.|[2404.18978](http://arxiv.org/abs/2404.18978)|null|\u968f\u7740\u6559\u80b2\u73af\u5883\u4e2d\u5bf9\u5b66\u4e60\u8005\u6a21\u578b\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0c\u7814\u7a76\u91cd\u70b9\u9010\u6e10\u8f6c\u5411\u5982\u4f55\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u76f8\u7ed3\u5408\uff0c\u63d0\u5347\u5728\u5f00\u653e\u6027\u6587\u672c\u5b66\u4e60\u73af\u5883\u4e2d\u7684\u901a\u7528\u80fd\u529b\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u4e09\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\uff1a\uff081\uff09\u57fa\u4e8eRL\u7684\u4ee3\u7406\uff0c\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u8868\u793a\u72b6\u6001\u548c\u884c\u52a8\u7b56\u7565\u4ee5\u5bfb\u627e\u6700\u4f73\u4e92\u52a8\u65b9\u5f0f\uff1b\uff082\uff09\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5229\u7528\u6a21\u578b\u7684\u5e7f\u6cdb\u77e5\u8bc6\u548c\u63a8\u7406\u80fd\u529b\u901a\u8fc7\u63d0\u793a\u8fdb\u884c\u64cd\u4f5c\uff1b\uff083\uff09\u6df7\u5408LLM\u8f85\u52a9RL\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u63d0\u9ad8\u6027\u80fd\u548c\u6cdb\u5316\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u4e9b\u4ee3\u7406\u7684\u53d1\u5c55\u548c\u8bc4\u4f30\uff0c\u6211\u4eec\u63d0\u51fa\u4e86PharmaSimText\uff0c\u8fd9\u662f\u4e00\u4e2a\u6e90\u81eaPharmaSim\u865a\u62df\u836f\u5e97\u73af\u5883\u7684\u65b0\u57fa\u51c6\uff0c\u4e13\u6ce8\u4e8e\u8bca\u65ad\u5bf9\u8bdd\u5b9e\u8df5\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cRL\u57fa\u7840\u7684\u4ee3\u7406\u5728\u4efb\u52a1\u5b8c\u6210\u65b9\u9762\u8868\u73b0\u4f18\u79c0\uff0c\u4f46\u5728\u63d0\u95ee\u8d28\u91cf\u4e0a\u6709\u6240\u6b20\u7f3a\uff1b\u800cLLM\u57fa\u7840\u7684\u4ee3\u7406\u5728\u63d0\u95ee\u80fd\u529b\u4e0a\u8f83\u5f3a\uff0c\u4f46\u4efb\u52a1\u5b8c\u6210\u5ea6\u4e0d\u9ad8\u3002\u6700\u540e\uff0c\u6df7\u5408LLM\u8f85\u52a9RL\u7684\u4ee3\u7406\u5c55\u793a\u4e86\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\u7684\u6f5c\u529b\uff0c\u8bc1\u5b9e\u4e86RL\u4e0eLLMs\u7ed3\u5408\u7528\u4e8e\u5f00\u53d1\u5f00\u653e\
u6027\u5b66\u4e60\u73af\u5883\u9ad8\u8868\u73b0\u4ee3\u7406\u7684\u53ef\u80fd\u6027\u3002|\n", "2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021](http://arxiv.org/abs/2404.18021)|null|\u968f\u7740\u57fa\u56e0\u7ec4\u5de5\u7a0b\u6280\u672f\u7684\u5174\u8d77\uff0c\u7cbe\u786e\u4fee\u6539\u9057\u4f20\u4fe1\u606f\u5df2\u6210\u4e3a\u53ef\u80fd\uff0c\u4f46\u9ad8\u6548\u57fa\u56e0\u7f16\u8f91\u7cfb\u7edf\u7684\u6784\u5efa\u9700\u8981\u6df1\u5165\u7406\u89e3CRISPR\u6280\u672f\u53ca\u5176\u590d\u6742\u5b9e\u9a8c\u80cc\u666f\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bf8\u591a\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5728\u751f\u7269\u8bbe\u8ba1\u95ee\u9898\u4e0a\u5f80\u5f80\u7f3a\u4e4f\u7279\u5b9a\u77e5\u8bc6\u3002\u672c\u6587\u4ecb\u7ecdCRISPR-GPT\uff0c\u4e00\u4e2a\u589e\u5f3a\u578bLLM\u4ee3\u7406\uff0c\u5b83\u7ed3\u5408\u4e86\u9886\u57df\u77e5\u8bc6\u548c\u5916\u90e8\u5de5\u5177\uff0c\u4ee5\u81ea\u52a8\u5316\u5e76\u63d0\u5347\u57fa\u4e8eCRISPR\u7684\u57fa\u56e0\u7f16\u8f91\u5b9e\u9a8c\u8bbe\u8ba1\u8fc7\u7a0b\u3002CRISPR-GPT\u5229\u7528LLMs\u7684\u63a8\u7406\u80fd\u529b\uff0c\u534f\u52a9\u9009\u62e9CRISPR\u7cfb\u7edf\u3001\u8bbe\u8ba1\u5f15\u5bfcRNA\u3001\u63a8\u8350\u7ec6\u80de\u9012\u9001\u65b9\u6cd5\u3001\u8d77\u8349\u534f\u8bae\u4ee5\u53ca\u8bbe\u8ba1\u9a8c\u8bc1\u5b9e\u9a8c\u4ee5\u786e\u8ba4\u7f16\u8f91\u7ed3\u679c\u3002\u6211\u4eec\u5c55\u793a\u4e86CRISPR-GPT\u5982\u4f55\u5e2e\u52a9\u975e\u4e13\u5bb6\u7814\u7a76\u4eba\u5458\u4ece\u5934\u5f00\u59cb\u8fdb\u884c\u57fa\u56e0\u7f16\u8f91\u5b9e\u9a8c\uff0c\u5e76\u901a\u8fc7\u5b9e\u9645\u6848\u4f8b\u9a8c\u8bc1\u5176\u6709\u6548\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u81ea\u52a8\u5316\u57fa\u56e0\u7f16\u8f91\u8bbe\u8ba1\u7684\u4f26\u7406\u548c\u76d1\u7ba1\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u8d1f\u8d23\u4efb\u548c\u900f\u660e\u4f7f\u7528\u6b64\u7c7b\u5de5\u5177\u7684\u91cd
\u8981\u6027\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u76ee\u6807\u662f\u5f25\u5408\u521d\u7ea7\u751f\u7269\u7814\u7a76\u8005\u4e0eCRISPR\u57fa\u56e0\u7ec4\u5de5\u7a0b\u6280\u672f\u4e4b\u95f4\u7684\u9e3f\u6c9f\uff0c\u5c55\u793aLLM\u4ee3\u7406\u5728\u4fc3\u8fdb\u590d\u6742\u751f\u7269\u53d1\u73b0\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\u3002|\n", "2404.17833": "|**2024-04-27**|**Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs**|Zhenlan Ji et.al.|[2404.17833](http://arxiv.org/abs/2404.17833)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u5404\u79cd\u5546\u4e1a\u5e94\u7528\u4e2d\uff0c\u7279\u522b\u662f\u5728\u5fc3\u7406\u5065\u5eb7\u652f\u6301\u3001\u5316\u5b66\u5408\u6210\u548c\u8f6f\u4ef6\u5f00\u53d1\u7b49\u9886\u57df\u5c55\u73b0\u6548\u7528\uff0c\u4eba\u4eec\u53d1\u73b0\u8fd9\u4e9b\u4ee3\u7406\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u957f\u671f\u89c4\u5212\u65f6\u5bb9\u6613\u4ea7\u751f\u9519\u8bef\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u81ea\u52a8\u5316\u65b9\u6cd5\u2014\u2014PDoctor\uff0c\u65e8\u5728\u68c0\u6d4b\u548c\u7406\u89e3LLM\u4ee3\u7406\u7684\u9519\u8bef\u89c4\u5212\u3002PDoctor\u9996\u5148\u5b9a\u4e49\u4e86\u4e00\u4e2a\u9886\u57df\u7279\u5b9a\u7684\u8bed\u8a00\uff08DSL\uff09\uff0c\u7528\u4e8e\u7528\u6237\u67e5\u8be2\uff0c\u5e76\u501f\u52a9Z3\u7ea6\u675f\u6c42\u89e3\u5668\u751f\u6210\u5404\u79cd\u8f93\u5165\uff0c\u8fd9\u4e9b\u8f93\u5165\u662f\u63cf\u8ff0\u4e00\u7cfb\u5217\u4efb\u52a1\u5b8c\u6210\u9700\u6c42\u7684\u81ea\u7136\u8bed\u8a00\u6bb5\u843d\u3002\u7136\u540e\uff0cPDoctor\u4ece\u8fd9\u4e9b\u9700\u6c42\u4e2d\u63d0\u53d6\u7ea6\u675f\uff0c\u5f62\u6210\u4e00\u4e2a\u6d4b\u8bd5\u57fa\u51c6\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u4e3b\u6d41\u7684\u4ee3\u7406\u6846\u67b6\u548c\u4e24\u4e2a\u5f3a\u5927\u7684LLMs\uff08GPT-3.5\u548cGPT-4\uff09\u5bf9PDoctor\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a
\u5b83\u80fd\u6709\u6548\u8bc6\u522b\u4ee3\u7406\u89c4\u5212\u4e2d\u7684\u5404\u79cd\u9519\u8bef\uff0c\u5e76\u4e3a\u5f00\u53d1\u8005\u548c\u7528\u6237\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u548c\u9519\u8bef\u7279\u6027\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u53ef\u80fd\u7684\u66ff\u4ee3\u8bbe\u8ba1\u548c\u6269\u5c55PDoctor\u7684\u65b9\u5411\u3002|\n", "2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662](http://arxiv.org/abs/2404.17662)|**[link](https://github.com/alickzhu/player)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u589e\u5f3a\u4e86\u4ee3\u7406\u95f4\u7684\u901a\u4fe1\u548c\u793e\u4f1a\u4ea4\u4e92\u80fd\u529b\u3002\u7136\u800c\uff0c\u5728\u6d89\u53ca\u7ade\u4e89\u4e0e\u5408\u4f5c\u7684\u52a8\u6001\u73af\u5883\u4e2d\uff0c\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u590d\u6742\u63a8\u7406\u7684\u6784\u5efa\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u5c24\u5176\u662f\u56e0\u4e3a\u57fa\u4e8e\u4fe1\u606f\u56fe\u7684\u641c\u7d22\u65b9\u6cd5\u5b58\u5728\u5c40\u9650\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPLAYER*\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u4efb\u610f\u91c7\u6837\u5f0f\u89c4\u5212\u5668\u7684\u65b0\u6846\u67b6\uff0c\u5b83\u7ed3\u5408\u4e86\u4f20\u611f\u5668\u548c\u526a\u679d\u6280\u672f\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u5b8c\u5168\u4f9d\u8d56\u4e8e\u95ee\u9898\u9a71\u52a8\u7684\u641c\u7d22\u6846\u67b6\uff0c\u9002\u7528\u4e8e\u9ad8\u96be\u5ea6\u7684\u63a8\u7406\u4efb\u52a1\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u79cd\u53ef\u91cf\u5316\u7684\u8bc4\u4f30\u65b9\u6cd5\uff0c\u901a\u8fc7\u591a\u9879\u9009\u62e9\u9898\u6765\u6d4b\u8bd5\uff0c\u5e76\u521b\u5efa\u4e86WellPlay\u6570\u636e\u96c6\uff0c\u5305\u542b1,482\u4e2a\u95ee\u7b54\u5bf9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPLAYER*\u5728\u590d\u6742\u52a8\u6001\u73af\u5883\u
4e2d\u7684\u6548\u7387\u548c\u6027\u80fd\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u5e76\u63d0\u4f9b\u4e86\u53ef\u91cf\u5316\u7684\u5bf9\u6bd4\u7ed3\u679c\u3002**|\n", "2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525](http://arxiv.org/abs/2404.17525)|null|\u4f20\u7edf\u7684\u673a\u68b0\u8bbe\u8ba1\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u4e13\u5bb6\u901a\u8fc7\u7ecf\u9a8c\u5f15\u5bfc\u7684\u4fee\u6539\u548c\u6709\u9650\u5143\u5206\u6790\uff08FEA\uff09\u6765\u6ee1\u8db3\u7279\u5b9a\u9700\u6c42\uff0c\u4f46\u8fd9\u4e2a\u8fc7\u7a0b\u8017\u65f6\u4e14\u9ad8\u5ea6\u4f9d\u8d56\u4e2a\u4eba\u77e5\u8bc6\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5f00\u53d1\u4e86\u8bb8\u591a\u673a\u5668\u5b66\u4e60\u6a21\u578b\u6765\u7b80\u5316\u7e41\u7410\u7684\u4e13\u5bb6\u9a71\u52a8\u8fed\u4ee3\u8fc7\u7a0b\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u9700\u8981\u5927\u91cf\u8bad\u7ec3\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u3002\u6df1\u5ea6\u5b66\u4e60\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u5176\u8bad\u7ec3\u9886\u57df\u548c\u4efb\u52a1\uff0c\u9650\u5236\u4e86\u8de8\u4efb\u52a1\u5e94\u7528\u3002\u8fd9\u5728\u81ea\u52a8\u5316\u6548\u7387\u4e0e\u8d44\u6e90\u9700\u6c42\u4e4b\u95f4\u5f62\u6210\u4e86\u6743\u8861\u3002 
\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5373\u5c06\u9884\u8bad\u7ec3\u7684\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u6709\u9650\u5143\u6a21\u5757\u7ed3\u5408\u3002\u6709\u9650\u5143\u6a21\u5757\u8bc4\u4f30\u6bcf\u4e2a\u8bbe\u8ba1\u5e76\u63d0\u4f9b\u5173\u952e\u53cd\u9988\uff0c\u5f15\u5bfcLLMs\u4e0d\u65ad\u5b66\u4e60\u3001\u89c4\u5212\u3001\u751f\u6210\u548c\u4f18\u5316\u8bbe\u8ba1\uff0c\u65e0\u9700\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u8fdb\u884c\u4e13\u95e8\u8bad\u7ec3\u3002\u6211\u4eec\u901a\u8fc7\u5728\u6841\u67b6\u7ed3\u6784\u7684\u8fed\u4ee3\u4f18\u5316\u4e2d\u5c55\u793a\u8fd9\u79cd\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u5b83\u80fd\u591f\u6839\u636e\u7ed3\u6784\u5316\u7684\u53cd\u9988\u548c\u6807\u51c6\u8c03\u6574\u8bbe\u8ba1\u3002\u7ed3\u679c\u663e\u793a\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6210\u529f\u751f\u6210\u7b26\u5408\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u6841\u67b6\u7ed3\u6784\u8bbe\u8ba1\uff0c\u6210\u529f\u7387\u9ad8\u8fbe90%\uff0c\u8fd9\u53d6\u51b3\u4e8e\u6240\u65bd\u52a0\u7684\u7ea6\u675f\u6761\u4ef6\u3002\u901a\u8fc7\u63d0\u793a\u5f0f\u4f18\u5316\u6280\u672f\uff0c\u6211\u4eec\u5c55\u793a\u4e86LLM\u4ee3\u7406\u5728\u63a5\u6536\u5230\u89e3-\u5f97\u5206\u5bf9\u540e\uff0c\u80fd\u591f\u6839\u636e\u5176\u5185\u5728\u63a8\u7406\u80fd\u529b\u8fed\u4ee3\u4f18\u5316\u8bbe\u8ba1\u4ee5\u6ee1\u8db3\u89c4\u683c\u8981\u6c42\u3002 LLM\u4ee3\u7406\u80fd\u591f\u4ea7\u751f\u53ef\u884c\u7684\u8bbe\u8ba1\u5e76\u6839\u636e\u5176\u56fa\u6709\u7684\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u4f18\u5316\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u6709\u6f5c\u529b\u81ea\u4e3b\u53d1\u5c55\u548c\u5b9e\u65bd\u6709\u6548\u7684\u8bbe\u8ba1\u7b56\u7565\u3002|\n", "2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker 
et.al.|[2404.17460](http://arxiv.org/abs/2404.17460)|null|\u672c\u6587\u8ba8\u8bba\u5e76\u8bc4\u4f30\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u5bf9\u8bdd\u5f0f\u8f85\u5bfc\u7cfb\u7edf\uff08Conversational Tutoring Systems\uff0cCTS\uff09\uff0c\u8be5\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u3002\u9996\u5148\uff0c\u7cfb\u7edf\u901a\u8fc7\u81ea\u52a8\u4ece\u8bfe\u7a0b\u6587\u672c\u4e2d\u751f\u6210\u6613\u4e8e\u7f16\u8f91\u7684\u6559\u5b66\u811a\u672c\uff0c\u5b9e\u73b0AI\u8f85\u52a9\u7684\u5185\u5bb9\u521b\u4f5c\u3002\u5176\u6b21\uff0c\u7cfb\u7edf\u901a\u8fc7\u4e24\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff08Ruffle\u548cRiley\uff09\u4ee5\u5b66\u4e60\u6559\u5b66\u6a21\u5f0f\u8fd0\u884c\uff0c\u5206\u522b\u626e\u6f14\u5b66\u751f\u548c\u6559\u6388\u89d2\u8272\uff0c\u8fdb\u884c\u81ea\u7531\u5f62\u5f0f\u7684\u5bf9\u8bdd\uff0c\u9075\u5faa\u5178\u578b\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\u7684\u5185\u73af\u548c\u5916\u73af\u7ed3\u6784\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5728\u7ebf\u7528\u6237\u7814\u7a76\uff08N=200\uff09\u4e2d\u5bf9\u6bd4\u4e86\u8be5\u7cfb\u7edf\u4e0e\u7b80\u5355\u7684\u95ee\u7b54\u804a\u5929\u673a\u5668\u4eba\u548c\u9605\u8bfb\u6d3b\u52a8\u5728\u652f\u6301\u751f\u7269\u5b66\u8bfe\u7a0b\u7684\u6548\u679c\u3002\u7814\u7a76\u5206\u6790\u4e86\u7cfb\u7edf\u4f7f\u7528\u6a21\u5f0f\u3001\u9884\u540e\u6d4b\u8bd5\u6210\u7ee9\u4ee5\u53ca\u7528\u6237\u4f53\u9a8c\u8c03\u67e5\uff0c\u7ed3\u679c\u663e\u793a\u7528\u6237\u5bf9Ruffle&Riley\u7684\u53c2\u4e0e\u5ea6\u9ad8\uff0c\u7406\u89e3\u529b\u5f3a\uff0c\u5e76\u8ba4\u4e3a\u63d0\u4f9b\u7684\u652f\u6301\u6709\u5e2e\u52a9\u3002\u5c3d\u7ba1Ruffle&Riley\u7528\u6237\u7684\u5b8c\u6210\u65f6\u95f4\u8f83\u957f\uff0c\u4f46\u5728\u77ed\u671f\u5b66\u4e60\u6210\u6548\u4e0a\u5e76\u672a\u53d1\u73b0\u663e\u8457\u5dee\u5f02\uff0c\u4f18\u4e8e\u9605\u8bfb\u6d3b\u52a8\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u67b6\u6784\u548c\u7528\u6237\u7814\u7a76\u
4e3a\u672a\u6765CTS\u8bbe\u8ba1\u8005\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u6e90\u6211\u4eec\u7684\u7cfb\u7edf\uff0c\u4ee5\u4fc3\u8fdb\u57fa\u4e8eLLM\u7684\u5b66\u4e60\u6280\u672f\u6709\u6548\u6559\u5b66\u8bbe\u8ba1\u7684\u7814\u7a76\u3002|\n", "2404.17153": "|**2024-04-26**|**A Unified Debugging Approach via LLM-Based Multi-Agent Synergy**|Cheryl Lee et.al.|[2404.17153](http://arxiv.org/abs/2404.17153)|**[link](https://github.com/acceptepapier/unidebugger)**|\u5728\u8f6f\u4ef6\u8c03\u8bd5\u8fd9\u4e2a\u8017\u65f6\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4eba\u4eec\u4e00\u76f4\u5728\u52aa\u529b\u5b9e\u73b0\u81ea\u52a8\u5316\uff0c\u5305\u62ec\u6545\u969c\u5b9a\u4f4d\u548c\u4fee\u590d\u751f\u6210\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u5316\u8c03\u8bd5\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u4f20\u7edf\u548c\u57fa\u4e8eLLM\u7684\u8c03\u8bd5\u5de5\u5177\u9762\u4e34\u4e09\u5927\u6311\u6218\uff1a1\uff09\u4e0a\u6e38\u7684\u6545\u969c\u5b9a\u4f4d\u4e0d\u51c6\u786e\u4f1a\u6ce2\u53ca\u4e0b\u6e38\u7684\u4fee\u590d\uff1b2\uff09\u5904\u7406\u590d\u6742\u903b\u8f91\u9519\u8bef\u7684\u80fd\u529b\u4e0d\u8db3\uff1b3\uff09\u5ffd\u89c6\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u81ea\u52a8\u5316\u7684\u3001\u7edf\u4e00\u7684\u8c03\u8bd5\u6846\u67b6\u2014\u2014FixAgent\uff0c\u901a\u8fc7LLM\u4ee3\u7406\u534f\u540c\u3002FixAgent\u80fd\u6267\u884c\u7aef\u5230\u7aef\u7684\u6545\u969c\u5b9a\u4f4d\u3001\u4fee\u590d\u548c\u5206\u6790\u3002 
\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\uff0cLLMs\u80fd\u591f\u4ece\u4eba\u7c7b\u5f00\u53d1\u8005\u8ba4\u53ef\u7684\u901a\u7528\u8f6f\u4ef6\u5de5\u7a0b\u539f\u5219\u4e2d\u83b7\u76ca\uff0c\u6bd4\u5982\u201c\u6a61\u76ae\u9e2d\u8c03\u8bd5\u201d\uff0c\u8fd9\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7a0b\u5e8f\u529f\u80fd\u548c\u903b\u8f91\u9519\u8bef\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u4e2a\u7075\u611f\u6765\u6e90\u4e8e\u201c\u6a61\u76ae\u9e2d\u201d\u7684\u89e3\u51b3\u65b9\u6848\uff1a\u4ee3\u7406\u4e13\u4e1a\u5316\u4e0e\u534f\u540c\u3001\u5173\u952e\u53d8\u91cf\u8ddf\u8e2a\u548c\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u7406\u89e3\uff0c\u4fc3\u4f7fLLMs\u63d0\u4f9b\u660e\u786e\u7684\u89e3\u91ca\uff0c\u5e76\u805a\u7126\u4e8e\u5173\u952e\u7684\u7a0b\u5e8f\u903b\u8f91\u4fe1\u606f\u3002\u5728\u5e7f\u6cdb\u4f7f\u7528\u7684QuixBugs\u6570\u636e\u96c6\u4e0a\uff0cFixAgent\u6210\u529f\u4fee\u590d\u4e8680\u4e2abug\u4e2d\u768479\u4e2a\uff0c\u5176\u4e2d9\u4e2a\u662f\u4e4b\u524d\u672a\u89e3\u51b3\u7684\u3002\u5b83\u8fd8\u5728CodeFlaws\u4e0a\u5408\u7406\u5730\u4fee\u590d\u4e861.9\u500d\u4e8e\u6700\u4f73\u4fee\u590d\u5de5\u5177\u7684\u7f3a\u9677\uff0c\u800c\u4e14\u65e0\u9700\u4f4d\u7f6e\u4fe1\u606f\uff0c\u91c7\u6837\u7387\u4f4e\u4e8e0.6%\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4e0e\u4f7f\u7528\u4e0d\u540cLLM\u7684\u57fa\u7ebf\u6a21\u578b\u76f8\u6bd4\uff0cFixAgent\u63d0\u9ad8\u4e86\u7ea620%\u7684\u5408\u7406\u4fee\u590d\u548c\u6b63\u786e\u4fee\u590d\u7387\uff0c\u663e\u793a\u51fa\u6211\u4eec\u8bbe\u8ba1\u7684\u6709\u6548\u6027\u3002 \u6b64\u5916\uff0cFixAgent\u7684\u6b63\u786e\u7387\u9ad8\u8fbe97.26%\uff0c\u8868\u660e\u5b83\u6709\u53ef\u80fd\u514b\u670d\u73b0\u6709\u65b9\u6cd5\u7684\u8fc7\u62df\u5408\u95ee\u9898\u3002\u603b\u7ed3\u6765\u8bf4\uff0cFixAgent\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u81ea\u52a8\u5316\u8c03\u8bd5\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u8f6f\u4ef6\u8c03\u8bd5\u7684\u6548\u7387\u548c\u51c6\u786e\u6027\u3002|\n", "2404.16698": 
"|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698](http://arxiv.org/abs/2404.16698)|**[link](https://github.com/giorgiopiatti/govsim)**|\u5728\u5feb\u901f\u53d1\u5c55\u7684\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51b3\u7b56\u5b89\u5168\u662f\u4e00\u9879\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cGovernance of the Commons Simulation\u201d\uff08GovSim\uff09\u7684\u6a21\u62df\u5e73\u53f0\uff0c\u65e8\u5728\u7814\u7a76LLMs\u4e2d\u7684\u6218\u7565\u4e92\u52a8\u548c\u5408\u4f5c\u51b3\u7b56\u3002\u901a\u8fc7\u8fd9\u4e2a\u73af\u5883\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u4e4b\u95f4\u8d44\u6e90\u5206\u4eab\u7684\u52a8\u6001\uff0c\u5f3a\u8c03\u4e86\u4f26\u7406\u8003\u91cf\u3001\u6218\u7565\u89c4\u5212\u548c\u8c08\u5224\u6280\u5de7\u7684\u91cd\u8981\u6027\u3002GovSim\u5177\u6709\u7075\u6d3b\u6027\uff0c\u652f\u6301\u6587\u672c\u578b\u4ee3\u7406\uff0c\u5305\u62ecLLMs\u3002\u5229\u7528\u751f\u6210\u5f0f\u4ee3\u7406\u6846\u67b6\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u901a\u7528\u4ee3\u7406\uff0c\u4fbf\u4e8e\u6574\u5408\u4e0d\u540c\u7684LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u5728GovSim\u4e2d\uff0c\u53ea\u670915\u4e2a\u6d4b\u8bd5\u6a21\u578b\u4e2d\u76842\u4e2a\u80fd\u591f\u5b9e\u73b0\u53ef\u6301\u7eed\u7ed3\u679c\uff0c\u8fd9\u8868\u660e\u6a21\u578b\u5728\u7ba1\u7406\u5171\u4eab\u8d44\u6e90\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u5982\u679c\u79fb\u9664\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u80fd\u529b\uff0c\u5b83\u4eec\u4f1a\u8fc7\u5ea6\u4f7f\u7528\u5171\u4eab\u8d44\u6e90\uff0c\u7a81\u51fa\u4e86\u5408\u4f5c\u4e2d\u6c9f\u901a\u7684\u5173\u952e\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u5927\u591a\u6570LLMs\u7f3a\u4e4f\u666e\u904d\u5316\u7684\u5047\u8bbe
\u80fd\u529b\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u63a8\u7406\u6280\u80fd\u7684\u4e00\u4e2a\u91cd\u8981\u5f31\u70b9\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u6240\u6709\u7814\u7a76\u7ed3\u679c\uff0c\u5305\u62ec\u6a21\u62df\u73af\u5883\u3001\u4ee3\u7406\u63d0\u793a\u4ee5\u53ca\u5168\u9762\u7684\u7f51\u7edc\u754c\u9762\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u8ba8\u8bba\u3002|\n", "2404.17605": "|**2024-04-24**|**Autonomous LLM-driven research from data to human-verifiable research papers**|Tal Ifargan et.al.|[2404.17605](http://arxiv.org/abs/2404.17605)|**[link](https://github.com/technion-kishony-lab/data-to-paper)**|**\u968f\u7740\u4eba\u5de5\u667a\u80fd\u63a8\u52a8\u79d1\u5b66\u53d1\u73b0\u7684\u6b65\u4f10\u52a0\u5feb\uff0c\u4eba\u4eec\u8fd8\u4e0d\u6e05\u695a\u5b8c\u5168\u7531AI\u9a71\u52a8\u7684\u7814\u7a76\u662f\u5426\u53ef\u884c\uff0c\u4ee5\u53ca\u5b83\u80fd\u5426\u9075\u5faa\u5173\u952e\u7684\u79d1\u5b66\u4ef7\u503c\u89c2\uff0c\u5982\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002\u4e3a\u4e86\u6a21\u62df\u4eba\u7c7b\u7684\u79d1\u5b66\u7814\u7a76\u5b9e\u8df5\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u201c\u6570\u636e\u5230\u8bba\u6587\u201d\uff08data-to-paper\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u5e73\u53f0\uff0c\u5f15\u5bfc\u76f8\u4e92\u534f\u4f5c\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\u901a\u8fc7\u5b8c\u6574\u7684\u5206\u6b65\u9aa4\u7814\u7a76\u6d41\u7a0b\uff0c\u540c\u65f6\u7a0b\u5e8f\u5316\u8ffd\u8e2a\u4fe1\u606f\u6d41\uff0c\u5e76\u5141\u8bb8\u4eba\u7c7b\u76d1\u7763\u548c\u4e92\u52a8\u3002\u5728\u81ea\u52a8\u6a21\u5f0f\u4e0b\uff0c\u4ec5\u63d0\u4f9b\u6807\u6ce8\u6570\u636e\uff0c\u8be5\u5e73\u53f0\u5c31\u80fd\u63d0\u51fa\u5047\u8bbe\uff0c\u8bbe\u8ba1\u7814\u7a76\u8ba1\u5212\uff0c\u7f16\u5199\u548c\u8c03\u8bd5\u5206\u6790\u4ee3\u7801\uff0c\u751f\u6210\u548c\u89e3\u8bfb\u7ed3\u679c\uff0c\u751a\u81f3\u521b\u5efa\u5b8c\u6574\u4e14\u4fe1\u606f\u53ef\u8ffd\u6eaf\u7684\u79d1\u7814\u8bba\u6587\u3002\u5c3d\u7ba1
\u7814\u7a76\u65b0\u9896\u6027\u6709\u9650\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u5c55\u793a\u4e86AI\u81ea\u4e3b\u4ece\u6570\u636e\u4e2d\u751f\u6210\u539f\u521b\u5b9a\u91cf\u6d1e\u5bdf\u7684\u80fd\u529b\u3002\u5bf9\u4e8e\u7b80\u5355\u7684\u7814\u7a76\u76ee\u6807\uff0c\u5168\u81ea\u52a8\u6d41\u7a0b\u80fd\u521b\u4f5c\u51fa\u5927\u7ea680-90%\u65e0\u9700\u91cd\u5927\u9519\u8bef\u7684\u7a3f\u4ef6\uff0c\u7136\u800c\u968f\u7740\u76ee\u6807\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u4eba\u7c7b\u7684\u5171\u540c\u53c2\u4e0e\u5bf9\u4e8e\u4fdd\u8bc1\u51c6\u786e\u6027\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u751f\u6210\u7684\u8bba\u6587\u672c\u8eab\u4e5f\u5177\u6709\u5185\u5728\u7684\u53ef\u9a8c\u8bc1\u6027\uff0c\u56e0\u4e3a\u4fe1\u606f\u8ffd\u8e2a\u4f7f\u5f97\u7ed3\u679c\u3001\u65b9\u6cd5\u548c\u6570\u636e\u7684\u94fe\u63a5\u53ef\u4ee5\u7a0b\u5e8f\u5316\u8fdb\u884c\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0cAI\u9a71\u52a8\u7684\u79d1\u7814\u53ef\u4ee5\u52a0\u901f\u79d1\u5b66\u53d1\u73b0\uff0c\u540c\u65f6\u589e\u5f3a\u800c\u975e\u5a01\u80c1\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002**|\n", "2404.16115": "|**2024-04-24**|**Online Personalizing White-box LLMs Generation with Neural Bandits**|Zekai Chen 
et.al.|[2404.16115](http://arxiv.org/abs/2404.16115)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u59cb\u751f\u6210\u4e2a\u6027\u5316\u7684\u6587\u672c\u5185\u5bb9\uff0c\u5982\u4f55\u5728\u4e0d\u4e3a\u6bcf\u4f4d\u7528\u6237\u521b\u5efa\u72ec\u7279\u6a21\u578b\u7684\u8d44\u6e90\u6d88\u8017\u4e0b\u5b9e\u73b0\u9ad8\u6548\u4e2a\u6027\u5316\u6210\u4e86\u65b0\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5728\u7ebf\u65b9\u6cd5\uff0c\u5229\u7528\u795e\u7ecf bandit \u7b97\u6cd5\u52a8\u6001\u4f18\u5316\u8f6f\u6307\u4ee4\u5d4c\u5165\uff0c\u6839\u636e\u7528\u6237\u53cd\u9988\u8c03\u6574\u5185\u5bb9\uff0c\u4ece\u800c\u63d0\u5347\u767d\u76d2LLMs\u5f00\u653e\u6027\u6587\u672c\u751f\u6210\u7684\u4e2a\u6027\u5316\u6c34\u5e73\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u7684\u4e25\u8c28\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u57fa\u7840\u7b56\u7565\u6709\u663e\u8457\u6027\u80fd\u63d0\u5347\u3002\u7279\u522b\u662f\u9488\u5bf9\u4e2a\u6027\u5316\u65b0\u95fb\u6807\u9898\u751f\u6210\uff0cNeuralTS\u5e26\u6765\u4e86\u9ad8\u8fbe62.9%\u7684\u6700\u4f73ROUGE\u5206\u6570\u63d0\u5347\u4ee5\u53ca2.76%\u7684LLM\u4ee3\u7406\u8bc4\u4f30\u5206\u6570\u589e\u957f\uff0c\u8fd9\u8868\u660e\u5176\u6548\u679c\u663e\u8457\u3002|\n", "2404.15974": "|**2024-04-24**|**A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples**|Lihang Pan et.al.|[2404.15974](http://arxiv.org/abs/2404.15974)|null|
\u5355\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fde\u63a5\u591a\u4e2aLLM\u4ee3\u7406\u6784\u5efa\u7684\u7f51\u7edc\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u6574\u4f53\u6027\u80fd\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4eba\u673a\u534f\u4f5c\u5de5\u5177\u2014\u2014EasyLAN\uff0c\u65e8\u5728\u5e2e\u52a9\u5f00\u53d1\u8005\u8f7b\u677e\u6784\u5efaLLM\u4ee3\u7406\u7f51\u7edc\uff08LAN\uff09\u3002EasyLAN\u9996\u5148\u6839\u636e\u4efb\u52a1\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u4ec5\u5305\u542b\u4e00\u4e2a\u4ee3\u7406\u7684\u521d\u59cb\u7f51\u7edc\u3002\u63a5\u7740\uff0c\u5b83\u5229\u7528\u5c11\u91cf\u8bad\u7ec3\u793a\u4f8b\u6765\u8c03\u6574\u7f51\u7edc\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u793a\u4f8b\uff0cEasyLAN\u5206\u6790\u8f93\u51fa\u4e0e\u771f\u5b9e\u7ed3\u679c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u5e76\u627e\u51fa\u9519\u8bef\u7684\u539f\u56e0\u3002EasyLAN\u4f1a\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u7b56\u7565\u6765\u4fee\u6b63\u8fd9\u4e9b\u95ee\u9898\u3002\u7528\u6237\u53ef\u4ee5\u4ecb\u5165EasyLAN\u7684\u5de5\u4f5c\u6d41\u7a0b\u6216\u76f4\u63a5\u4fee\u6539LAN\u3002\u6700\u7ec8\uff0cLAN\u4ece\u5355\u4e2a\u4ee3\u7406\u53d1\u5c55\u6210\u591a\u4ee3\u7406\u7684\u7f51\u7edc\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEasyLAN\u80fd\u591f\u5e2e\u52a9\u5f00\u53d1\u8005\u5feb\u901f\u6784\u5efa\u6027\u80fd\u826f\u597d\u7684LAN\u3002|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao 
et.al.|[2404.15269](http://arxiv.org/abs/2404.15269)|**[link](https://github.com/gao-g/prelude)**|**\u6211\u4eec\u7814\u7a76\u57fa\u4e8e\u7528\u6237\u5bf9\u8bed\u8a00\u6a21\u578b\u7f16\u8f91\u7684\u4e92\u52a8\u5b66\u4e60\u8bed\u8a00\u4ee3\u7406\u3002\u5728\u8bf8\u5982\u5199\u4f5c\u52a9\u624b\u7684\u5e38\u89c1\u573a\u666f\u4e2d\uff0c\u7528\u6237\u4e0e\u8bed\u8a00\u4ee3\u7406\u4ea4\u4e92\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u751f\u6210\u54cd\u5e94\uff0c\u5e76\u53ef\u80fd\u9009\u62e9\u6027\u5730\u7f16\u8f91\u4ee3\u7406\u7684\u54cd\u5e94\u4ee5\u53cd\u6620\u4ed6\u4eec\u7684\u6f5c\u5728\u504f\u597d\uff0c\u540c\u65f6\u63d0\u9ad8\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u7f16\u8f91\u53cd\u9988\u662f\u81ea\u7136\u4ea7\u751f\u7684\uff0c\u9002\u5408\u7528\u4e8e\u63d0\u5347\u4ee3\u7406\u4e0e\u7528\u6237\u504f\u597d\u7684\u5951\u5408\u5ea6\uff0c\u964d\u4f4e\u540e\u7eed\u7528\u6237\u7684\u7f16\u8f91\u6210\u672c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPRELUDE\u6846\u67b6\uff0c\u5b83\u6839\u636e\u5386\u53f2\u7f16\u8f91\u6570\u636e\u63a8\u65ad\u7528\u6237\u7684\u6f5c\u5728\u504f\u597d\uff0c\u5e76\u636e\u6b64\u8bbe\u8ba1\u4e00\u4e2a\u63d0\u793a\u7b56\u7565\uff0c\u5f15\u5bfc\u672a\u6765\u7684\u54cd\u5e94\u751f\u6210\uff0c\u907f\u514d\u4e86\u6602\u8d35\u4e14\u96be\u4ee5\u6269\u5c55\u7684\u5fae\u8c03\u8fc7\u7a0b\uff0c\u8fd8\u80fd\u4fdd\u6301\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002 
\u6b64\u5916\uff0c\u5b66\u4e60\u63cf\u8ff0\u6027\u7684\u504f\u597d\u6709\u52a9\u4e8e\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\uff0c\u7528\u6237\u53ef\u4ee5\u67e5\u770b\u548c\u8c03\u6574\u5b66\u4e60\u5230\u7684\u504f\u597d\u3002\u7136\u800c\uff0c\u7528\u6237\u504f\u597d\u53ef\u80fd\u590d\u6742\u591a\u53d8\uff0c\u53d7\u60c5\u5883\u5f71\u54cd\uff0c\u56e0\u6b64\u5b66\u4e60\u8d77\u6765\u5177\u6709\u6311\u6218\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCIPHER\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6839\u636e\u7528\u6237\u7f16\u8f91\u63a8\u65ad\u7ed9\u5b9a\u60c5\u5883\u4e0b\u7684\u7528\u6237\u504f\u597d\u3002\u672a\u6765\uff0cCIPHER\u4f1a\u4ece\u5386\u53f2\u4e2d\u7684k\u4e2a\u6700\u63a5\u8fd1\u7684\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u63a8\u65ad\u51fa\u7684\u504f\u597d\uff0c\u7efc\u5408\u751f\u6210\u54cd\u5e94\u3002\u6211\u4eec\u5728\u603b\u7ed3\u548c\u7535\u5b50\u90ae\u4ef6\u5199\u4f5c\u4e24\u4e2a\u4e92\u52a8\u73af\u5883\u4e2d\u4f7f\u7528GPT-4\u6a21\u62df\u7528\u6237\u8fdb\u884c\u8bc4\u4f30\uff0c\u4e0e\u76f4\u63a5\u4f7f\u7528\u7528\u6237\u7f16\u8f91\u4f46\u4e0d\u5b66\u4e60\u63cf\u8ff0\u6027\u504f\u597d\u7684\u7b97\u6cd5\uff0c\u4ee5\u53ca\u5b66\u4e60\u5168\u5c40\u65e0\u4e0a\u4e0b\u6587\u504f\u597d\u7684\u7b97\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002 \u5728\u4e24\u9879\u4efb\u52a1\u4e2d\uff0cCIPHER\u90fd\u5b9e\u73b0\u4e86\u6700\u4f4e\u7684\u7f16\u8f91\u8ddd\u79bb\u6210\u672c\uff0c\u5e76\u4e14\u5b66\u4e60\u5230\u7684\u504f\u597d\u4e0e\u771f\u5b9e\u504f\u597d\u663e\u793a\u51fa\u663e\u8457\u7684\u76f8\u4f3c\u6027\u3002**|\n", "2404.14387": "|**2024-04-22**|**A Survey on Self-Evolution of Large Language Models**|Zhengwei Tao et.al.|[2404.14387](http://arxiv.org/abs/2404.14387)|**[link](https://github.com/alibabaresearch/damo-convai)**|**
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u9886\u57df\u548c\u667a\u80fd\u4ee3\u7406\u5e94\u7528\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u4f9d\u8d56\u4eba\u7c7b\u6216\u5916\u90e8\u6a21\u578b\u76d1\u7763\u7684\u73b0\u6709LLMs\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u591a\u6837\u6027\u589e\u52a0\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6210\u672c\u9ad8\u6602\u548c\u6027\u80fd\u74f6\u9888\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u5e94\u8fd0\u800c\u751f\uff0c\u8fd9\u79cd\u7b56\u7565\u5141\u8bb8LLMs\u81ea\u4e3b\u83b7\u53d6\u3001\u7cbe\u70bc\u5e76\u4ece\u81ea\u8eab\u751f\u6210\u7684\u7ecf\u9a8c\u4e2d\u5b66\u4e60\uff0c\u501f\u9274\u4eba\u7c7b\u7ecf\u9a8c\u5b66\u4e60\u8fc7\u7a0b\uff0c\u6709\u671b\u63a8\u52a8LLMs\u5411\u8d85\u7ea7\u667a\u80fd\u53d1\u5c55\u3002\u672c\u6587\u5168\u9762\u7efc\u8ff0\u4e86LLMs\u4e2d\u7684\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u3002\u9996\u5148\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u6982\u5ff5\u6846\u67b6\uff0c\u5c06\u8fdb\u5316\u8fc7\u7a0b\u5212\u5206\u4e3a\u8fed\u4ee3\u5faa\u73af\u7684\u56db\u4e2a\u9636\u6bb5\uff1a\u7ecf\u9a8c\u83b7\u53d6\u3001\u7ecf\u9a8c\u7ec6\u5316\u3001\u66f4\u65b0\u548c\u8bc4\u4f30\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5206\u7c7b\u63a2\u8ba8LLMs\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u8fdb\u5316\u76ee\u6807\uff0c\u5e76\u5bf9\u76f8\u5173\u6587\u732e\u8fdb\u884c\u603b\u7ed3\uff0c\u63d0\u4f9b\u6bcf\u4e2a\u6a21\u5757\u7684\u5206\u7c7b\u548c\u89c1\u89e3\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5f53\u524d\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u4e3a\u52a0\u901f\u81ea\u6f14\u8fdbLLMs\u7684\u53d1\u5c55\u63d0\u4f9b\u5173\u952e\u6d1e\u89c1\u3002**|\n", "2404.13501": "|**2024-04-21**|**A Survey on the Memory Mechanism of Large Language Model based Agents**|Zeyu Zhang 
et.al.|[2404.13501](http://arxiv.org/abs/2404.13501)|**[link](https://github.com/nuster1128/llm_agent_memory_survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u79d1\u7814\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u57fa\u4e8eLLMs\u7684\u667a\u80fd\u4ee3\u7406\u56e0\u5176\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5bf9\u4e8e\u89e3\u51b3\u9700\u8981\u957f\u671f\u590d\u6742\u4ea4\u4e92\u7684\u73b0\u5b9e\u95ee\u9898\u81f3\u5173\u91cd\u8981\u3002\u652f\u6301agent-environment\u4ea4\u4e92\u7684\u5173\u952e\u8981\u7d20\u662f\u4ee3\u7406\u7684\u8bb0\u5fc6\u673a\u5236\u3002\u5c3d\u7ba1\u5df2\u6709\u4f17\u591a\u6709\u524d\u666f\u7684\u8bb0\u5fc6\u8bbe\u8ba1\u88ab\u63d0\u51fa\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u5206\u6563\u5728\u591a\u7bc7\u8bba\u6587\u4e2d\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7efc\u8ff0\u6765\u7cfb\u7edf\u6027\u5730\u603b\u7ed3\u548c\u6bd4\u8f83\uff0c\u672a\u80fd\u63d0\u70bc\u51fa\u901a\u7528\u4e14\u6709\u6548\u7684\u8bbe\u8ba1\u6a21\u5f0f\u4ee5\u542f\u53d1\u540e\u7eed\u7814\u7a76\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4efd\u5173\u4e8eLLM\u57fa\u4ee3\u7406\u8bb0\u5fc6\u673a\u5236\u7684\u5168\u9762\u8c03\u67e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8\u8bb0\u5fc6\u5728LLM\u4ee3\u7406\u4e2d\u7684\u201c\u662f\u4ec0\u4e48\u201d\u4ee5\u53ca\u201c\u4e3a\u4ec0\u4e48\u9700\u8981\u201d\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u56de\u987e\u4e86\u5173\u4e8e\u8bb0\u5fc6\u6a21\u5757\u7684\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b9\u6cd5\u7684\u7814\u7a76\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4f1a\u5c55\u793a\u8bb0\u5fc6\u6a21\u5757\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u626e\u6f14\u7684\u91cd\u8981\u89d2\u8272\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f1a\u5206\u6790\u73b0\u6709\u5de5\u4f5c\u7684\u5c40\u9650\uff0c\u5e76\u6307\u51fa\u91cd\u8981\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002
\u4e3a\u4e86\u8ddf\u8e2a\u8be5\u9886\u57df\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2aGitHub\u4ed3\u5e93\uff1ahttps://github.com/nuster1128/LLM_Agent_Memory_Survey\u3002**|\n", "2404.11964": "|**2024-04-18**|**From Language Models to Practical Self-Improving Computer Agents**|Alex Sheng et.al.|[2404.11964](http://arxiv.org/abs/2404.11964)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u76f4\u63a5\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u521b\u5efa\u80fd\u591f\u6267\u884c\u5404\u79cd\u8ba1\u7b97\u673a\u4efb\u52a1\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u5e76\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u6765\u53d1\u5c55\u5de5\u5177\u548c\u589e\u5f3a\u529f\u80fd\uff0c\u4ee5\u89e3\u51b3\u65e5\u76ca\u590d\u6742\u7684\u4efb\u52a1\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u663e\u793a\u51fa\u4ece\u975e\u53c2\u6570\u589e\u5f3a\u4e2d\u83b7\u76ca\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u5927\u91cf\u96c6\u4e2d\u5728\u5f00\u53d1\u8f6f\u4ef6\uff0c\u4ee5\u8d4b\u4e88LLMs\u5404\u79cd\u80fd\u529b\u3002\u6211\u4eec\u5efa\u8bae\uff0c\u901a\u8fc7\u9002\u5f53\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u4e00\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u7cfb\u7edf\u5730\u751f\u6210\u8f6f\u4ef6\u6765\u589e\u5f3a\u81ea\u8eab\uff0c\u800c\u4e0d\u662f\u4f9d\u8d56\u4eba\u7c7b\u5de5\u7a0b\u7684\u9759\u6001\u8f6f\u4ef6\u5f00\u53d1\u3002 
\u6211\u4eec\u901a\u8fc7\u4e00\u4e9b\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u8fd9\u4e00\u70b9\uff1a\u4ec5\u901a\u8fc7\u7ec8\u7aef\u8bbf\u95ee\uff0c\u6211\u4eec\u5f15\u5bfcLLM\u4ee3\u7406\u6dfb\u52a0\u4e86\u68c0\u7d22\u3001\u4e92\u8054\u7f51\u641c\u7d22\u3001\u7f51\u9875\u5bfc\u822a\u548c\u6587\u672c\u7f16\u8f91\u529f\u80fd\u3002\u8be5\u4ee3\u7406\u6709\u6548\u5730\u5229\u7528\u8fd9\u4e9b\u5de5\u5177\u89e3\u51b3\u4e86\u95ee\u9898\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u548c\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\u3002\u8fd9\u79cd\u65b9\u6cd5\u8868\u660e\uff0c\u901a\u8fc7\u8fde\u7eed\u63d0\u95ee\u548c\u5de7\u5999\u7684\u63d0\u793a\u8bbe\u8ba1\uff0cLLM\u80fd\u591f\u81ea\u4e3b\u6269\u5c55\u5176\u529f\u80fd\uff0c\u6267\u884c\u5b9e\u9645\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002|\n", "2404.11794": "|**2024-04-25**|**Automated Social Science: Language Models as Scientist and Subjects**|Benjamin S. Manning et.al.|[2404.11794](http://arxiv.org/abs/2404.11794)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u6784\u5efa\u548c\u6d4b\u8bd5\u793e\u4f1a\u79d1\u5b66\u5047\u8bbe\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u5173\u952e\u5728\u4e8e\u4f7f\u7528\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u3002\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9648\u8ff0\u5047\u8bbe\u7684\u8bed\u8a00\u3001\u6784\u5efaLLM\u57fa\u7840\u4ee3\u7406\u7684\u84dd\u56fe\u3001\u5b9e\u9a8c\u8bbe\u8ba1\u4ee5\u53ca\u6570\u636e\u5206\u6790\u8ba1\u5212\u3002\u62df\u5408\u540e\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u53ef\u4f9b\u9884\u6d4b\u6216\u89c4\u5212\u540e\u7eed\u5b9e\u9a8c\u3002\u6211\u4eec\u901a\u8fc7\u51e0\u4e2a\u573a\u666f\u8fdb\u884c\u4e86\u6f14\u793a\uff1a\u8c08\u5224\u3001\u4fdd\u91ca\u542c\u8bc1\u4f1a\u3001\u6c42\u804c\u9762\u8bd5\u548c\u62cd\u5356\u3002\u5728\u8fd9\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7cfb\u7edf\u65e2\u63d0\u51fa\u4e86\
u56e0\u679c\u5173\u7cfb\uff0c\u4e5f\u8fdb\u884c\u4e86\u68c0\u9a8c\uff0c\u53d1\u73b0\u4e86\u4e00\u4e9b\u8bc1\u636e\uff0c\u800c\u6709\u4e9b\u5219\u6ca1\u6709\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u4ece\u8fd9\u4e9b\u793e\u4f1a\u4e92\u52a8\u6a21\u62df\u4e2d\u83b7\u53d6\u7684\u6d1e\u5bdf\u5e76\u975e\u4ec5\u901a\u8fc7\u76f4\u63a5\u8be2\u95eeLLM\u5c31\u80fd\u83b7\u5f97\u3002\u5f53\u7ed9\u5b9a\u6bcf\u4e2a\u573a\u666f\u7684\u5efa\u8bae\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u65f6\uff0cLLM\u5728\u9884\u6d4b\u4f30\u8ba1\u6548\u5e94\u7684\u7b26\u53f7\u65b9\u9762\u8868\u73b0\u826f\u597d\uff0c\u4f46\u65e0\u6cd5\u53ef\u9760\u5730\u9884\u6d4b\u6548\u5e94\u7684\u5927\u5c0f\u3002\u5728\u62cd\u5356\u5b9e\u9a8c\u4e2d\uff0c\u6a21\u62df\u7ed3\u679c\u4e0e\u62cd\u5356\u7406\u8bba\u7684\u9884\u6d4b\u7d27\u5bc6\u543b\u5408\uff0c\u4f46LLM\u76f4\u63a5\u63d0\u53d6\u7684\u6e05\u7b97\u4ef7\u683c\u9884\u6d4b\u4e0d\u51c6\u786e\u3002\u7136\u800c\uff0c\u5982\u679c\u6a21\u578b\u80fd\u57fa\u4e8e\u62df\u5408\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u8fdb\u884c\u6761\u4ef6\u5316\uff0cLLM\u7684\u9884\u6d4b\u4f1a\u5927\u5e45\u6539\u8fdb\u3002\u7b80\u800c\u8a00\u4e4b\uff0cLLM\u77e5\u9053\u7684\u6bd4\u5b83\u80fd\u7acb\u5373\u8868\u8fbe\u7684\u8981\u591a\u3002|\n", "2404.11483": "|**2024-04-17**|**AgentKit: Flow Engineering with Graphs, not Coding**|Yue Wu 
et.al.|[2404.11483](http://arxiv.org/abs/2404.11483)|**[link](https://github.com/holmeswww/agentkit)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u76f4\u89c2\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u793a\u6846\u67b6\uff08AgentKit\uff09\uff0c\u65e8\u5728\u4e3a\u591a\u529f\u80fd\u4ee3\u7406\u63d0\u4f9b\u7edf\u4e00\u7684\u65b9\u6cd5\u3002AgentKit\u901a\u8fc7\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6784\u5efa\u590d\u6742\u7684\u201c\u601d\u7ef4\u8fc7\u7a0b\u201d\u3002\u5176\u57fa\u672c\u5355\u5143\u662f\u8282\u70b9\uff0c\u5305\u542b\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u3002\u7528\u6237\u53ef\u4ee5\u50cf\u62fc\u63a5\u4e50\u9ad8\u79ef\u6728\u4e00\u6837\u8fde\u63a5\u8fd9\u4e9b\u8282\u70b9\uff0c\u4ece\u800c\u660e\u786e\u8bbe\u8ba1\u51fa\u81ea\u7136\u7ed3\u6784\u5316\u7684\u201c\u601d\u8003\u6d41\u7a0b\u201d\u3002\u4f8b\u5982\uff0c\u5728\u64b0\u5199\u8bba\u6587\u65f6\uff0c\u53ef\u80fd\u7684\u6b65\u9aa4\u5305\u62ec\uff1a1\uff09\u786e\u5b9a\u6838\u5fc3\u4fe1\u606f\uff0c2\uff09\u8bc6\u522b\u7814\u7a76\u7a7a\u767d\u7b49\u3002AgentKit\u7684\u6a21\u5757\u5316\u7279\u6027\u4f7f\u5f97\u9ad8\u7ea7\u529f\u80fd\u5982\u5373\u5174\u7684\u5c42\u6b21\u5316\u89c4\u5212\u3001\u53cd\u601d\u548c\u4ece\u4e92\u52a8\u4e2d\u5b66\u4e60\u53d8\u5f97\u53ef\u80fd\u3002\u7531\u4e8e\u5176\u76f4\u89c2\u4e14\u6a21\u62df\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u7684\u8bbe\u8ba1\uff0c\u5373\u4f7f\u6ca1\u6709\u7f16\u7a0b\u7ecf\u9a8c\u7684\u4eba\u4e5f\u80fd\u521b\u5efa\u548c\u8c03\u6574\u57fa\u7840\u4ee3\u7406\u3002\u5b9a\u91cf\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528AgentKit\u8bbe\u8ba1\u7684\u4ee3\u7406\u5728WebShop\u548cCrafter\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u8fd9\u4e9b\u6210\u679c\u8868\u660eAgentKit\u6709\u6f5c\u529b\u4f7fLLM\u4ee3\u7406\u5728\u66f4\u5e7f\u6cdb\u7684\u573a\u666f\u4e0b\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728GitHub\uff1ahttps://gith
ub.com/holmeswww/AgentKit\u3002**|\n", "2404.09982": "|**2024-04-15**|**Memory Sharing for Large Language Model based Agents**|Hang Gao et.al.|[2404.09982](http://arxiv.org/abs/2404.09982)|**[link](https://github.com/ghupppp/memorysharingllm)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6267\u884c\u4efb\u52a1\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u91cd\u5927\u7a81\u7834\uff0c\u5b83\u51cf\u5c11\u4e86\u5bf9\u56fa\u5b9a\u7b54\u6848\u4efb\u52a1\uff08\u5982\u5e38\u8bc6\u95ee\u9898\u548c\u662f\u975e\u67e5\u8be2\uff09\u7684\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03\u9700\u6c42\u3002\u7136\u800c\uff0c\u5728\u5904\u7406\u5f00\u653e\u6027\u6311\u6218\u5982\u8bd7\u6b4c\u521b\u4f5c\u65f6\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5c40\u9650\uff0c\u4e3b\u8981\u6e90\u4e8e\u63d0\u4f9b\u7684\u793a\u4f8b\u5168\u9762\u6027\u4ee5\u53ca\u6a21\u578b\u7406\u89e3\u95ee\u9898\u5185\u5bb9\u7684\u80fd\u529b\u4e0d\u8db3\uff0c\u5bfc\u81f4\u8f93\u51fa\u5f80\u5f80\u4e0e\u9884\u671f\u7ed3\u679c\u5927\u76f8\u5f84\u5ead\u3002\u9488\u5bf9\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86Memory-Sharing\uff08MS\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9LLM\u591a\u4ee3\u7406\u7684\u5b9e\u65f6\u8bb0\u5fc6\u5b58\u50a8\u548c\u68c0\u7d22\u7cfb\u7edf\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u8fc7\u7a0b\u3002\u6bcf\u4e2a\u201c\u8bb0\u5fc6\u201d\u5355\u5143\u8bb0\u5f55\u4e86\u63d0\u51fa\u7684\u67e5\u8be2\u53ca\u5176\u6765\u81eaLLM\u4ee3\u7406\u7684\u5373\u65f6\u54cd\u5e94\uff0c\u4ece\u591a\u4e2a\u7c7b\u4f3c\u4ee3\u7406\u4e2d\u805a\u5408\u8fd9\u4e9b\u8bb0\u5fc6\uff0c\u5f62\u6210\u6240\u6709\u4ee3\u7406\u5171\u4eab\u7684\u4e30\u5bcc\u8bb0\u5fc6\u6c60\u3002MS\u6846\u67b6\u4e0d\u4ec5\u5e2e\u52a9\u4ee3\u7406\u627e\u5230\u7279\u5b9a\u4efb\u52a1\u7684\u76f8\u5173\u793a\u4f8b\uff0c\u8fd8\u8
bc4\u4f30\u5176\u8bb0\u5fc6\u7684\u6f5c\u5728\u5229\u7528\u4ef7\u503c\uff0c\u4f9b\u5176\u4ed6\u4ee3\u7406\u672a\u6765\u5e94\u7528\u3002\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u7684\u5b9e\u8bc1\u9a8c\u8bc1\u663e\u793a\uff0cMS\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u4ee3\u7406\u5904\u7406\u5f00\u653e\u6027\u95ee\u9898\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u54ea\u79cd\u8bb0\u5fc6\u6c60\u548c\u68c0\u7d22\u7b56\u7565\u80fd\u66f4\u597d\u5730\u652f\u6301\u4ee3\u7406\uff0c\u4e3aMS\u7684\u672a\u6765\u53d1\u5c55\u63d0\u4f9b\u4e86\u65b9\u5411\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\uff1ahttps://github.com/GHupppp/MemorySharingLLM \u83b7\u53d6\u3002**|\n", "2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127](http://arxiv.org/abs/2404.09127)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|**\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u5b83\u4eec\u901a\u5e38\u6821\u51c6\u4e0d\u826f\u4e14\u8fc7\u5ea6\u81ea\u4fe1\uff0c\u7279\u522b\u662f\u5728\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u4e2d\u3002\u4eba\u7c7b\u7684\u51b3\u7b56\u548c\u4fe1\u5fc3\u4e0d\u4ec5\u6e90\u4e8e\u5185\u5728\u4fe1\u5ff5\uff0c\u8fd8\u80fd\u901a\u8fc7\u65e5\u5e38\u89c2\u5bdf\u8fdb\u884c\u8c03\u6574\uff0c\u800c\u73b0\u6709LLM\u7684\u6821\u51c6\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u5355\u4e2a\u6a21\u578b\u7684\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u201c\u96c6\u4f53\u667a\u6167\u201d\uff1a\u591a\u4e2aLLM\u4e4b\u95f4\u7684\u534f\u4f5c\u8868\u8fbe\u80fd\u529b\uff0c\u8fd9\u53ef\u4ee5\u96c6\u4f53\u63d0\u9ad8\u51c6\u786e\u6027\u548c\u6821\u51c6\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u8bad\u7ec3\u540e\u5904\u7406\
u7684\u6821\u51c6\u7b56\u7565\u2014\u2014\u534f\u4f5c\u6821\u51c6\uff08Collaborative Calibration\uff09\uff0c\u5b83\u5229\u7528\u591a\u4ee3\u7406\u5de5\u5177\u589e\u5f3a\u7684LLMs\u5728\u6a21\u62df\u7684\u7fa4\u4f53\u8ba8\u8bba\u8fc7\u7a0b\u4e2d\uff0c\u5171\u540c\u63d0\u5347\u6821\u51c6\u80fd\u529b\u548c\u63a8\u7406\u5408\u7406\u6027\u3002 ### \u4efb\u52a1 \u6211\u4eec\u5728\u751f\u6210\u5f0f\u95ee\u7b54\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u534f\u4f5c\u6821\u51c6\u7684\u6709\u6548\u6027\uff0c\u8986\u76d6\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u6574\u5408\u96c6\u4f53\u6821\u51c6\u540e\u7684\u4fe1\u5fc3\u8bc4\u4f30\u548c\u63d0\u5347\u6a21\u578b\u9884\u6d4b\u53ef\u9760\u6027\u65b9\u9762\u7684\u6f5c\u529b\u3002**|\n", "2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077](http://arxiv.org/abs/2404.09077)|**[link](https://github.com/zukangy/kgp-curiousllm)**|**\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5916\u90e8\u6570\u636e\u5e93\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u6210\u6548\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65f6\u5f80\u5f80\u529b\u6709\u4e0d\u902e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5bf9\u4e00\u79cd\u540d\u4e3a\u77e5\u8bc6\u56fe\u8c31\u63d0\u793a\uff08KGP\uff09\u7684\u521b\u65b0\u65b9\u6cd5\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u77e5\u8bc6\u56fe\u8c31\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4ee5\u63d0\u5347\u63a8\u7406\u548c\u641c\u7d22\u7cbe\u5ea6\u3002\u7136\u800c\uff0c\u539f\u59cb\u7684KGP\u6846\u67b6\u9700\u8981\u6602\u8d35\u7684\u5927\u89c4\u6a21\u6570\u636e\u5fae\u8c03\uff0c\u5e76\u4e14\u4ecd\u5b58\u5728LLM\u7684\u9519\u8bef\u63a8\u65ad\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u878d\u5165\u63a8\u7406\u80fd\u529b\u7684LLM\u4ee3
\u7406\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u7684\u597d\u5947\u5fc3\uff0c\u901a\u8fc7\u63d0\u95ee\u6765\u66f4\u6709\u6548\u5730\u5bfc\u822a\u641c\u7d22\u8fc7\u7a0b\u3002\u8fd9\u4e2a\u7b80\u5355\u7684\u6539\u8fdb\u663e\u8457\u63d0\u9ad8\u4e86LLM\u5728QA\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u521d\u59cbKGP\u6846\u67b6\u7684\u9ad8\u6210\u672c\u548c\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u8fdb\u4e00\u6b65\u53d1\u5c55\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u7cbe\u786e\u3001\u66f4\u5feb\u6377\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684QA\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2404.09043": "|**2024-04-13**|**Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation**|Jia Gu et.al.|[2404.09043](http://arxiv.org/abs/2404.09043)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u98de\u901f\u53d1\u5c55\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u8bed\u8a00\u4efb\u52a1\u4e2d\u7684\u51fa\u8272\u8868\u73b0\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528LLMs\u6a21\u62df\u4eba\u7c7b\u7684\u884c\u4e3a\u51b3\u7b56\u8fc7\u7a0b\uff0c\u901a\u5e38\u8fd9\u4e9b\u8fc7\u7a0b\u88ab\u8868\u793a\u4e3a\u9a6c\u5c14\u53ef\u592b\u51b3\u7b56\u8fc7\u7a0b\uff08MDPs\uff09\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u52a8\u4f5c\u9075\u5faa\u7279\u5b9a\u7684\u6982\u7387\u5206\u5e03\uff0c\u5e76\u9700\u8981\u8fed\u4ee3\u91c7\u6837\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u63a2\u7a76LLM\u4ee3\u7406\u7406\u89e3\u6982\u7387\u5206\u5e03\u7684\u80fd\u529b\uff0c\u4ee5\u901a\u8fc7\u6982\u7387\u91c7\u6837\u6307\u5bfc\u884c\u4e3a\u51b3\u7b56\u5e76\u751f\u6210\u884c\u4e3a\u5e8f\u5217\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u65b9\u9762\uff1a\u4e00\u662f\u5df2\u77e5\u7cbe\u786e\u6982\u7387\u5206\u5e03\u7684\u6a21\u62df\uff0c\u4e8c\u662f\u6a21\u7cca\u6982\u7387\u5206\u5e03\u7684\u5e8f\u5217\u751f\u6210\u3002 
\u5728\u5df2\u77e5\u6982\u7387\u5206\u5e03\u7684\u60c5\u51b5\u4e0b\uff0c\u4ee3\u7406\u9700\u8981\u6839\u636e\u95ee\u9898\u63cf\u8ff0\u63d0\u4f9b\u6982\u7387\u5206\u5e03\u7684\u7c7b\u578b\u548c\u53c2\u6570\uff0c\u7136\u540e\u7ed9\u51fa\u91c7\u6837\u5e8f\u5217\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u7814\u7a76\u663e\u793a\uff0cLLM\u4ee3\u7406\u5728\u8fd9\u65b9\u9762\u7684\u6027\u80fd\u4e0d\u4f73\uff0c\u4f46\u901a\u8fc7\u7f16\u7a0b\u5de5\u5177\u53ef\u4ee5\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u63d0\u9ad8\u91c7\u6837\u6210\u529f\u7387\u3002\u800c\u5728\u5b9e\u9645\u60c5\u5883\u4e2d\uff0c\u6982\u7387\u5206\u5e03\u5f80\u5f80\u4e0d\u660e\u786e\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5728\u7b2c\u4e8c\u90e8\u5206\u8ba9\u4ee3\u7406\u8c03\u6574\u5728\u7ebf\u793e\u4ea4\u7f51\u7edc\u4e2d\u7684\u6d3b\u8dc3\u5ea6\uff0c\u5e76\u5206\u6790\u884c\u52a8\u9891\u7387\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4f7f\u501f\u52a9\u7f16\u7a0b\u5de5\u5177\uff0cLLM\u4ee3\u7406\u4f9d\u7136\u65e0\u6cd5\u6709\u6548\u5730\u91c7\u6837\u6982\u7387\u5206\u5e03\u3002\u8fd9\u610f\u5473\u7740\u5728\u76f4\u63a5\u5c06LLM\u4f5c\u4e3a\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u4ee3\u7406\u5e94\u7528\u4e4b\u524d\uff0c\u8fd8\u9700\u8981\u8c28\u614e\u5bf9\u5f85\u3002|\n", "2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492](http://arxiv.org/abs/2404.08492)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u535a\u5f08\u8bba\u6846\u67b6\u4e0b\u7684\u6e38\u620f\u884c\u4e3a\u7406\u89e3\u6f5c\u529b\u65e5\u76ca\u663e\u73b0\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u901a\u8fc7\u6a21\u62df\u5206\u6790\u4e0d\u540c\u7c7b\u578bLLM\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u7ecf\u5178 Beauty Contest 
\u6e38\u620f\u4e2d\u7684\u7b56\u7565\u4e92\u52a8\u3002\u501f\u9274\u4eba\u7c7b\u5b9e\u9a8c\uff0c\u6211\u4eec\u5bf9LLM\u4ee3\u7406\u7684\u7b56\u7565\u5c42\u6b21\u8fdb\u884c\u7c7b\u4f3c\u7684\u8bc4\u4f30\uff0c\u53d1\u73b0\u5b83\u4eec\u5c55\u73b0\u51fa\u4ece\u96f6\u7ea7\u5230\u4e00\u7ea7\u7684\u4e0d\u540c\u7a0b\u5ea6\u63a8\u7406\u80fd\u529b\uff0c\u5e76\u5728\u91cd\u590d\u6e38\u620f\u4e2d\u8868\u73b0\u51fa\u884c\u52a8\u8d8b\u540c\u3002\u6b64\u5916\uff0c\u6211\u8fd8\u63a2\u8ba8\u4e86\u4e0d\u540c\u7c7b\u578b\u7684\u4ee3\u7406\u7fa4\u4f53\u6784\u6210\u5982\u4f55\u5f71\u54cd\u6218\u7565\u884c\u4e3a\uff1a\u9ad8\u6bd4\u4f8b\u7684\u56fa\u5b9a\u7b56\u7565\u5bf9\u624b\u80fd\u4fc3\u8fdbLLM\u4ee3\u7406\u7684\u6536\u655b\uff0c\u800c\u6df7\u5408\u73af\u5883\u4e2d\u4e0d\u540c\u76f8\u5bf9\u7b56\u7565\u6c34\u5e73\u7684\u4ee3\u7406\u5171\u5b58\u4f1a\u52a0\u901f\u6240\u6709\u4ee3\u7406\u7684\u6536\u655b\u3002\u66f4\u667a\u80fd\u7684\u4ee3\u7406\u53ef\u80fd\u83b7\u5f97\u66f4\u9ad8\u7684\u5e73\u5747\u6536\u76ca\uff0c\u4f46\u8fd9\u662f\u4ee5\u8f83\u4f4e\u667a\u80fd\u4ee3\u7406\u7684\u727a\u7272\u4e3a\u4ee3\u4ef7\u7684\u3002\u8fd9\u4e9b\u7ed3\u679c\u4e0d\u4ec5\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u60c5\u666f\u4e0b\u6a21\u62df\u4ee3\u7406\u7684\u7ed3\u5c40\uff0c\u8fd8\u4e3a\u7406\u89e3\u7b97\u6cd5\u4e4b\u95f4\u7684\u6218\u7565\u4e92\u52a8\u63d0\u4f9b\u4e86\u91cd\u8981\u542f\u793a\u3002|\n", "2404.08144": "|**2024-04-17**|**LLM Agents can Autonomously Exploit One-day Vulnerabilities**|Richard Fang 
et.al.|[2404.08144](http://arxiv.org/abs/2404.08144)|null|As large language models (LLMs) grow more powerful, they are increasingly used for both benign and malicious purposes, and researchers have begun to examine their ability to exploit cybersecurity vulnerabilities. Recent work has explored whether LLMs can autonomously hack websites, but those studies are limited to simple vulnerabilities. This work shows that LLMs can autonomously exploit one-day vulnerabilities in real-world systems. We collect a dataset of 15 one-day vulnerabilities, including ones categorized as critical severity in the CVE description. Given the CVE description, GPT-4 successfully exploits 87% of these vulnerabilities, compared with 0% for every other model we test (GPT-3.5 and open-source LLMs) and for the open-source vulnerability scanners ZAP and Metasploit. However, GPT-4 depends heavily on the description: without it, GPT-4 exploits only 7% of the vulnerabilities. These findings raise questions about the widespread deployment of highly capable LLMs.|\n",
"2404.17586": "|**2024-04-11**|**The Future of Scientific Publishing: Automated Article Generation**|Jeremy R. 
Harper et.al.|[2404.17586](http://arxiv.org/abs/2404.17586)|null|This study introduces a novel software tool that uses large language model (LLM) prompts to automatically generate academic articles from Python code, a significant advance for biomedical informatics and computer science. Python was chosen as the base example because of its widespread use and strong data-analysis capabilities. The flexibility of the method and framework makes it adaptable to a variety of GitHub repositories, indicating the tool's broad application potential (Harper, 2024). By streamlining the traditionally time-consuming academic writing process, particularly the synthesis of complex datasets and code outputs, the tool speeds the dissemination of research results. It was developed without relying on advanced language models, and it ensures that the automatically generated content is coherent and complete. This exploration not only validates the software's successful application and efficiency but also anticipates that integrating more advanced LLMs will further extend its capabilities, leading to an era in which scientific findings are published more quickly and accessibly.|\n",
"2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang 
et.al.|[2404.07456](http://arxiv.org/abs/2404.07456)|null|Large language models (LLMs) have recently shown strong potential as intelligent agents. However, existing work mainly improves a model's reasoning or decision-making through carefully designed prompt engineering or task-specific fine-tuning, and overlooks the process of exploration and exploitation. These methods are limited when handling complex tasks in open-world interactive environments. First, lacking global information about the environment, the model tends to make greedy decisions, which leads to suboptimal solutions. Second, irrelevant information acquired from the environment not only introduces noise but also adds cost. This paper therefore proposes Weak Exploration to Strong Exploitation (WESE), a novel approach that improves LLM agents on open-world interactive tasks. Specifically, WESE decouples exploration from exploitation and uses a cost-effective 'weak' agent to perform exploration and acquire global knowledge. A knowledge-graph-based strategy then stores this knowledge and extracts the task-relevant information, improving the 'strong' agent's success rate and efficiency. The approach applies to a variety of tasks and achieves significant improvements in success rate and efficiency on four interactive benchmarks.|\n",
"2404.06921": "|**2024-04-10**|**GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications**|Shishir G. 
Patil et.al.|[2404.06921](http://arxiv.org/abs/2404.06921)|**[link](https://github.com/ShishirPatil/gorilla)**|**Large language models (LLMs) are evolving beyond their role of providing information within dialogue systems to actively engaging with real-world applications and services. Today, humans verify the correctness and appropriateness of LLM-generated outputs (e.g., code, functions, or actions) before putting them into real-world execution. This is challenging because code comprehension is widely regarded as very difficult. This paper studies how humans can efficiently collaborate with, delegate to, and supervise LLMs in the future. We argue that in many cases 'post-facto validation', meaning verifying the correctness of a proposed action after seeing the output, is much easier than 'pre-facto validation'. The core concept behind enabling this is to integrate an intuitive undo feature and to establish damage confinement for LLM-generated actions as effective strategies for mitigating the associated risks. With these, a human can either revert the effect of an LLM-generated output or be confident that the potential risk is bounded. We believe this is critical to letting LLMs interact with applications and services under limited human supervision. We describe the design and implementation of our open-source runtime for executing LLM actions, Gorilla Execution Engine (GoEX), and present open research questions toward the goal of LLMs and applications interacting with each other with minimal human intervention. The GoEX source code is released at https://github.com/ShishirPatil/gorilla/.**|\n",
"2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411](http://arxiv.org/abs/2404.06411)|**[link](https://github.com/nec-research/agentquest)**|**Advances in large language models (LLMs) have spurred the pursuit of LLM agents that can solve complex, multi-step reasoning tasks. Existing benchmarks, however, are often narrow and report only overall task success. To address these issues we propose AgentQuest, a framework in which (i) both benchmarks and metrics are modular and easily extensible through well-documented, easy-to-use APIs, and (ii) two new evaluation metrics reliably track an LLM agent's progress while it solves a task. We illustrate the utility of the metrics on two use cases, in which we identify common failure points and refine the agent architecture to obtain a significant performance gain. We hope to extend AgentQuest together with the research community and have open-sourced it at https://github.com/nec-research/agentquest.**|\n",
"2404.05427": "|**2024-04-15**|**AutoCodeRover: Autonomous Program Improvement**|Yuntong Zhang et.al.|[2404.05427](http://arxiv.org/abs/2404.05427)|**[link](https://github.com/nus-apr/auto-code-rover)**|**Researchers have made significant progress in automating software development over the past decades, and the use of large language models (LLMs) has greatly advanced automated programming assistance. Software engineering, however, involves more than coding: it also includes program improvement, namely maintenance (e.g., fixing bugs) and evolution (e.g., adding features). This paper proposes an automated approach to resolving GitHub issues in order to improve programs autonomously. The approach, called AutoCodeRover, combines LLMs with sophisticated code search to produce a program modification or patch. In contrast to recent work by AI researchers and practitioners that treats a software project as a mere collection of files, our work operates on a program representation (the abstract syntax tree). It exploits the program structure of classes and methods to strengthen the LLM's understanding of an issue's root cause and retrieves context through iterative search. When a test suite is available, spectrum-based fault localization further sharpens the context. On SWE-bench-lite, which consists of 300 real-life GitHub issues, AutoCodeRover shows improved efficacy, resolving about 22-23% of the issues. On the full SWE-bench of 2294 GitHub issues, AutoCodeRover resolves about 16% of the issues, which is higher than the efficacy of Devin, the recently reported AI software engineer from Cognition Labs, while taking time comparable to Devin. We believe this workflow advances autonomous software engineering, in which code generated automatically by LLMs can in future be autonomously improved.**|\n",
"2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang 
et.al.|[2404.05291](http://arxiv.org/abs/2404.05291)|null|We present a large language model (LLM) based system that equips quadrupedal robots with the ability to solve long-horizon tasks that go beyond short-term motions. Long-horizon tasks are very challenging for quadrupedal robots because they require both a high-level understanding of the task semantics and a broad range of locomotion and manipulation skills for interacting with the environment. Our system builds a high-level reasoning layer with LLMs that generates hybrid discrete-continuous plans as robot code from a task description. It comprises several LLM agents: a semantic planner that sketches the plan, a parameter calculator that predicts the arguments in the plan, and a code generator that converts the plan into executable robot code. At the low level, we use reinforcement learning to train a set of motion planning and control skills that give the quadruped the flexibility for rich environment interactions. We test the system on long-horizon tasks that cannot be completed with a single skill. Simulation and real-world experiments show that it successfully works out multi-step strategies and exhibits non-trivial behaviors, such as building tools or asking humans for help.|\n",
"2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667](http://arxiv.org/abs/2404.04667)|null|Multimodal artificial intelligence (AI) systems have the potential to improve clinical decision-making by interpreting many types of medical data. However, how effective such models are across medical fields remains unclear, and each field poses its own challenges. This paper proposes a new approach to multimodal medical AI that uses large language models (LLMs) as the central reasoning engine. The engine autonomously coordinates and deploys a set of specialized medical AI tools, including text interpretation, radiology and histopathology image analysis, genomic data processing, web search, and document retrieval from medical guidelines. We validate the system on a series of clinical oncology scenarios that simulate typical patient-care workflows. The system shows a high capability for selecting appropriate tools (97%), drawing correct conclusions (93.6%), providing complete (94%) and helpful (89.2%) treatment recommendations, and citing relevant literature when instructed (82.5%). This indicates that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information, and so can act as personalized clinical assistants. The architecture also simplifies regulatory compliance, because each component tool can be validated and approved individually. We believe this work serves as a proof of concept for more advanced LLM agents in medicine.|\n",
"2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237](http://arxiv.org/abs/2404.04237)|null|The rapid progress of large language models (LLMs) has seen them frequently surpass human performance on standard benchmarks, enabling many downstream applications such as LLM agents. Yet these models also perform unexpectedly poorly on seemingly simple tasks, which underscores the need for more comprehensive and diverse evaluation frameworks that measure their true capabilities. To this end, we focus on compositional and conditional reasoning, a cornerstone of human cognition, and introduce GroundCocoa, a lexically diverse benchmark grounded in the real-world problem of flight booking. The task is to match detailed user preferences against available flight options presented in multiple-choice format. Results show a significant performance gap among current state-of-the-art models: even the best model, GPT-4 Turbo, does not exceed 67% accuracy despite advanced prompting techniques.|\n",
"2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045](http://arxiv.org/abs/2404.16045)|null|Requirements elicitation, a critical stage of product development, often fails to capture the full range of user needs, so the final product may not meet expectations. This paper introduces a novel framework that uses large language models (LLMs) to automate and enhance this process. By generating a large number of simulated users (LLM agents), the framework can explore a much broader range of user needs and unforeseen use cases. The agents take part in product experience scenarios by describing their actions, observations, and challenges. Subsequent agent interviews and analysis uncover valuable user needs, including latent ones. We validate the framework with three experiments. First, we explore different ways of generating diverse agents, discuss their advantages and shortcomings, and show that context-aware agent generation leads to greater diversity of needs. Second, we show that the framework effectively mimics empathic lead-user interviews and identifies more latent needs than conventional human interviews. Third, we show that LLMs can be used to analyze the interviews, extract needs, and classify them as latent or not. Our work highlights the potential of LLM agents to accelerate early-stage product development, reduce costs, and foster innovation.|\n",
"2404.15317": "|**2024-04-03**|**Concept-Guided LLM Agents for Human-AI Safety Codesign**|Florian Geissler et.al.|[2404.15317](http://arxiv.org/abs/2404.15317)|null|As generative AI becomes increasingly important in software engineering, and in safety engineering in particular, the quality requirements placed on it rise as well. Relying on large language models (LLMs) alone is no longer sufficient to meet them. We therefore propose an efficient, hybrid strategy that uses LLMs for safety analysis and human-AI codesign in order to ensure the safety of software systems. We develop a customized LLM agent that combines prompt engineering, heuristic reasoning, and retrieval-augmented generation to solve tasks associated with predefined safety concepts while interacting with a system model graph. The decision process is guided by a cascade of micro-decisions that help preserve structured information. We further propose a graph verbalization that serves as an intermediate representation of the system model to facilitate LLM-graph interactions. Selected prompt-response pairs illustrate how the method applies to safety analysis, using a simplified automated driving system as the example.|\n",
"2404.02183": "|**2024-04-02**|**Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**|Yoichi Ishibashi et.al.|[2404.02183](http://arxiv.org/abs/2404.02183)|**[link](https://github.com/tsukushiai/self-organized-agent)**|**Recent advances in large language model (LLM) agents are bringing the future of automated software development into view. However, existing single-agent approaches are limited by context length when generating and improving large-scale, complex codebases. To address this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework. SoA is a scalable and efficient multi-agent system in which agents generate and modify code components independently while collaborating to construct the overall codebase. A key feature of SoA is that agents multiply automatically according to problem complexity, which allows dynamic scalability. As a result, the overall code volume can grow indefinitely with the number of agents while the amount of code managed by each agent stays constant. We evaluate SoA on the HumanEval benchmark and find that, compared with a single-agent system, each agent in SoA handles significantly less code yet the overall generated code is substantially larger. Moreover, SoA surpasses a strong single-agent baseline by 5% in Pass@1 accuracy.**|\n",
"2404.01602": "|**2024-04-02**|**Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game**|Silin Du et.al.|[2404.01602](http://arxiv.org/abs/2404.01602)|**[link](https://github.com/doslim/evaluate-the-opinion-leadership-of-llms)**|**Large language models (LLMs) have shown notable strategic behavior in social deduction games, but the significance of the opinion leadership they exhibit has received little attention, even though it is crucial for practical applications in multi-agent and human-AI interaction settings. An opinion leader is an individual who has a noticeable influence on the beliefs and behaviors of others within a social group. This work uses the Werewolf game as a simulation platform to assess the opinion leadership of LLMs when they play the Sheriff. The Sheriff is responsible for summarizing arguments and recommending decision options, and therefore serves as a credible proxy for an opinion leader. \u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6574\u5408Sheriff\u89d2\u8272\u7684\u6846\u67b6\uff0c\u5e76\u57fa\u4e8e\u610f\u89c1\u9886\u8896\u7684\u5173\u952e\u7279\u6027\u63d0\u51fa\u4e8
6\u4e24\u4e2a\u8bc4\u4f30\u6307\u6807\uff1a\u7b2c\u4e00\u4e2a\u8861\u91cf\u610f\u89c1\u9886\u8896\u7684\u53ef\u9760\u6027\uff0c\u7b2c\u4e8c\u4e2a\u8003\u5bdf\u5176\u5bf9\u5176\u4ed6\u73a9\u5bb6\u51b3\u7b56\u7684\u5f71\u54cd\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u8bc4\u4f30\u4e0d\u540c\u89c4\u6a21\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u5e76\u521b\u5efa\u4e86\u201c\u72fc\u4eba\u6740\u201d\u95ee\u9898\u56de\u7b54\u6570\u636e\u96c6\uff08WWQA\uff09\uff0c\u4ee5\u6d4b\u8bd5\u548c\u63d0\u5347\u6a21\u578b\u5bf9\u6e38\u620f\u89c4\u5219\u7684\u7406\u89e3\u3002\u6b64\u5916\uff0c\u8fd8\u5305\u542b\u4e86\u4eba\u7c7b\u53c2\u4e0e\u8005\u8fdb\u884c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u662f\u4e00\u4e2a\u6709\u6548\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u610f\u89c1\u9886\u5bfc\u529b\u7684\u8bd5\u9a8c\u573a\uff0c\u4f46\u76ee\u524d\u4ec5\u6709\u5c11\u6570\u8bed\u8a00\u6a21\u578b\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002**|\n", "2404.00806": "|**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|\u968f\u7740\u7b97\u6cd5\u5b9a\u4ef7\u7684\u5174\u8d77\uff0c\u4eba\u4eec\u62c5\u5fe7\u7b97\u6cd5\u95f4\u7684\u5408\u8c0b\u95ee\u9898\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u4f7f\u7528\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b9a\u4ef7\u4ee3\u7406\uff0c\u7279\u522b\u662fGPT-4\uff0c\u8fdb\u884c\u4e86\u63a2\u7a76\u3002\u7814\u7a76\u53d1\u73b0\uff1a(1) LLM\u9a71\u52a8\u7684\u5b9a\u4ef7\u673a\u5236\u5728\u5b9a\u4ef7\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff1b(2) \u5728\u5be1\u5934\u7ade\u4e89\u73af\u5883\u4e2d\uff0cLLM\u5b9a\u4ef7\u4ee3\u7406\u4f1a\u81ea\u53d1\u5730\u8fdb\u884c\u5408\u8c0b\uff0c\u4ece\u800c\u635f\u5bb3\u6d88\u8d39\u8005\u5229\u76ca\uff1b(3) 
\u5bf9LLM\u6307\u4ee4\uff08\u201c\u63d0\u793a\u201d\uff09\u770b\u4f3c\u5fae\u5c0f\u7684\u53d8\u5316\u53ef\u80fd\u52a0\u5267\u8fd9\u79cd\u5408\u4f5c\u884c\u4e3a\u3002\u8fd9\u4e9b\u7ed3\u679c\u540c\u6837\u9002\u7528\u4e8e\u62cd\u5356\u573a\u666f\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u7b97\u6cd5\u5b9a\u4ef7\u8fdb\u884c\u53cd\u5784\u65ad\u76d1\u7ba1\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u63ed\u793a\u4e86\u9488\u5bf9LLM\u5b9a\u4ef7\u4ee3\u7406\u7279\u6709\u7684\u76d1\u7ba1\u6311\u6218\u3002|\n", "2404.01343": "|**2024-04-15**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343](http://arxiv.org/abs/2404.01343)|**[link](https://github.com/jingzheshi/chops)**|**\u968f\u7740\u4f01\u4e1a\u548c\u8f6f\u4ef6\u5e73\u53f0\u8d8a\u6765\u8d8a\u591a\u5730\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-3.5\u3001GPT-4\u3001GLM-3\u548cLLaMa-2\uff09\u63d0\u4f9b\u804a\u5929\u8f85\u52a9\u6216\u5ba2\u6237\u670d\u52a1\u63a8\u7406\uff0c\u73b0\u6709\u7684\u57fa\u4e8eLLM\u7684\u5ba2\u6237\u670d\u52a1\u6a21\u578b\u5728\u4e0e\u5ba2\u6237\u8d44\u6599\u96c6\u6210\u548c\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u5b83\u4eec\u503e\u5411\u4e8e\u5f3a\u8c03\u591a\u6837\u6027\u800c\u975e\u7cbe\u786e\u6027\u548c\u9519\u8bef\u907f\u514d\uff0c\u8fd9\u5bf9\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u5ba2\u6237\u670d\u52a1\u573a\u666f\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHOPS\uff08\u7ed3\u5408\u5ba2\u6237\u8d44\u6599\u7684\u804a\u5929\u52a9\u624b\uff09\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\uff1a\uff081\uff09\u9ad8\u6548\u5229\u7528\u73b0\u6709\u6570\u636e\u5e93\u6216\u7cfb\u7edf\u67e5\u8be2\u7528\u6237\u4fe1\u606f\uff0c\u6216\u9075\u5faa\u65e2\u5b9a\u6307\u5357\u4e0e\u7cfb\u7edf\u4ea4\u4e92\uff1b\uff082\uff09\u63d0\u4f9b\u51c6\u786e\u5408\u7406\u7684\u54cd\u5e94\u5e76\u6267\u884c\u7cfb\u7edf\u5185\u7684\u5fc5\u8981
\u64cd\u4f5c\uff0c\u540c\u65f6\u907f\u514d\u6709\u5bb3\u64cd\u4f5c\uff1b\uff083\uff09\u901a\u8fc7\u7ed3\u5408\u5c0f\u578b\u548c\u5927\u578bLLM\u4ee5\u5b9e\u73b0\u6027\u80fd\u6ee1\u610f\u4e14\u6210\u672c\u5408\u7406\u7684\u63a8\u7406\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9e\u7528\u7684\u6570\u636e\u96c6\uff0c\u79f0\u4e3aCPHOS-dataset\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u6570\u636e\u5e93\u3001\u6307\u5bfc\u6587\u4ef6\u4ee5\u53ca\u6765\u81eaCPHOS\u5e73\u53f0\u7684\u6a21\u62df\u7269\u7406\u5965\u6797\u5339\u514b\u7ec4\u7ec7\u670d\u52a1\u7684\u95ee\u7b54\u5bf9\u3002CPHOS\u662f\u4e00\u4e2a\u9762\u5411\u9ad8\u4e2d\u6559\u5e08\u548c\u5b66\u751f\u7684\u5728\u7ebf\u5e73\u53f0\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528CPHOS-dataset\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86CHOPS\u67b6\u6784\u7684\u6027\u80fd\uff0c\u76ee\u6807\u662f\u5c55\u793aLLM\u5982\u4f55\u63d0\u5347\u6216\u66ff\u4ee3\u4eba\u5de5\u5ba2\u6237\u670d\u52a1\u3002\u5173\u4e8e\u6211\u4eec\u7684\u63d0\u6848\u67b6\u6784\u548c\u6570\u636e\u96c6\u7684\u4ee3\u7801\u53ef\u5728\u6b64\u5904\u83b7\u53d6\uff1a\u3002**|\n", "2404.01342": "|**2024-03-31**|**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**|Lirui Zhao 
et.al.|[2404.01342](http://arxiv.org/abs/2404.01342)|**[link](https://github.com/opengvlab/diffagent)**|**\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u751f\u6210\u6a21\u578b\u8fd1\u5e74\u6765\u5907\u53d7\u77a9\u76ee\uff0c\u5728\u5b66\u672f\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u5927\u653e\u5f02\u5f69\u3002\u4f8b\u5982\uff0cCivitai\u5e73\u53f0\uff0c\u4e00\u4e2aT2I\u521b\u65b0\u7684\u805a\u96c6\u5730\uff0c\u76ee\u524d\u6c47\u96c6\u4e8674,492\u79cd\u72ec\u7279\u7684\u6a21\u578b\uff0c\u8fd9\u5e26\u6765\u4e86\u9009\u62e9\u6700\u5408\u9002\u7684\u6a21\u578b\u548c\u53c2\u6570\u7684\u8270\u5de8\u4efb\u52a1\uff0c\u901a\u5e38\u9700\u8981\u591a\u6b21\u8bd5\u9a8c\u3002\u501f\u9274\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5de5\u5177\u4f7f\u7528\u7814\u7a76\u7684\u601d\u8def\uff0c\u6211\u4eec\u63a8\u51fa\u4e86DiffAgent\uff0c\u8fd9\u662f\u4e00\u4e2a\u901a\u8fc7API\u8c03\u7528\u6765\u5feb\u901f\u7b5b\u9009\u51c6\u786e\u9009\u9879\u7684LLM\u4ee3\u7406\u3002DiffAgent\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u8bad\u7ec3\u6846\u67b6\uff0c\u79f0\u4e3aSFTA\uff0c\u4f7f\u5176\u80fd\u591f\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7cbe\u786e\u5730\u5c06T2I API\u7684\u54cd\u5e94\u4e0e\u7528\u6237\u8f93\u5165\u5bf9\u9f50\u3002\u4e3a\u4e86\u8bad\u7ec3\u548c\u8bc4\u4f30DiffAgent\u7684\u80fd\u529b\uff0c\u6211\u4eec\u6784\u5efa\u4e86DABench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u5e93\uff0c\u6db5\u76d6\u4e86\u793e\u533a\u4e2d\u7684\u5404\u79cdT2I API\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDiffAgent\u4e0d\u4ec5\u5728\u9009\u62e9\u9002\u5f53\u7684T2I API\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u9a8c\u8bc1\u4e86SFTA\u8bad\u7ec3\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53ef\u5728https://github.com/OpenGVLab/DiffAgent\u83b7\u53d6\u3002**|\n", "2404.00573": "|**2024-03-31**|**\"My agent understands me better\": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based 
Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u7c7b\u8bb0\u5fc6\u67b6\u6784\uff0c\u65e8\u5728\u63d0\u5347\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5bf9\u8bdd\u4ee3\u7406\u7684\u8ba4\u77e5\u80fd\u529b\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u4f7f\u5f97\u8fd9\u4e9b\u4ee3\u7406\u80fd\u81ea\u4e3b\u68c0\u7d22\u751f\u6210\u54cd\u5e94\u6240\u9700\u7684\u5fc5\u8981\u8bb0\u5fc6\uff0c\u4ece\u800c\u89e3\u51b3LLMs\u5728\u65f6\u95f4\u8ba4\u77e5\u4e0a\u7684\u5c40\u9650\u3002\u6211\u4eec\u501f\u9274\u4e86\u4eba\u7c7b\u7684\u8bb0\u5fc6\u7ebf\u7d22\u53ec\u56de\u673a\u5236\u4f5c\u4e3a\u89e6\u53d1\u70b9\uff0c\u4ee5\u5b9e\u73b0\u7cbe\u786e\u4e14\u9ad8\u6548\u7684\u56de\u5fc6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6570\u5b66\u6a21\u578b\uff0c\u52a8\u6001\u91cf\u5316\u8bb0\u5fc6\u5de9\u56fa\u8fc7\u7a0b\uff0c\u8003\u8651\u4e86\u8bf8\u5982\u4e0a\u4e0b\u6587\u76f8\u5173\u6027\u3001\u65f6\u95f4\u6d41\u901d\u548c\u56de\u5fc6\u9891\u7387\u7b49\u56e0\u7d20\u3002\u4ee3\u7406\u4f1a\u4ece\u7528\u6237\u7684\u4ea4\u4e92\u5386\u53f2\u4e2d\u5b58\u50a8\u8bb0\u5fc6\uff0c\u8fd9\u4e9b\u8bb0\u5fc6\u88ab\u5c01\u88c5\u5728\u6570\u636e\u5e93\u4e2d\uff0c\u6bcf\u4e2a\u8bb0\u5fc6\u90fd\u5305\u542b\u4e86\u5185\u5bb9\u548c\u65f6\u95f4\u5173\u8054\u7684\u8bed\u5883\u3002\u8fd9\u6837\uff0c\u901a\u8fc7\u7c7b\u4f3c\u4eba\u7c7b\u8bc6\u522b\u548c\u56de\u5fc6\u8fc7\u5f80\u7ecf\u5386\u7684\u65b9\u5f0f\uff0c\u7cfb\u7edf\u80fd\u591f\u6218\u7565\u6027\u5730\u5b58\u50a8\u8bb0\u5fc6\uff0c\u5e76\u7406\u89e3\u5b83\u4eec\u5bf9\u7528\u6237\u5728\u65f6\u95f4\u7ebf\u4e0a\u7684\u91cd\u8981\u6027\u3002|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.11403": "|**2024-05-18**|**MapCoder: Multi-Agent Code Generation for Competitive Problem Solving**|Md. Ashraful Islam et.al.|[2405.11403](http://arxiv.org/abs/2405.11403)|**[link](https://github.com/md-ashraful-pramanik/mapcoder)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4ee3\u7801\u5408\u6210\u8fd9\u4e00\u590d\u6742\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u6df1\u5ea6\u7406\u89e3\u590d\u6742\u7684\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u63cf\u8ff0\u3001\u751f\u6210\u590d\u6742\u7684\u7b97\u6cd5\u548c\u6570\u636e\u7ed3\u6784\u4ee3\u7801\uff0c\u5e76\u6267\u884c\u5168\u9762\u7684\u5355\u5143\u6d4b\u8bd5\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5373\u591a\u4ee3\u7406\u63d0\u793a\u6846\u67b6MapCoder\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u5f00\u53d1\u8005\u7f16\u7a0b\u5408\u6210\u7684\u5b8c\u6574\u8fc7\u7a0b\uff0c\u5206\u4e3a\u56db\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684LLM\uff08\u5927\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406\uff1a\u56de\u5fc6\u76f8\u5173\u793a\u4f8b\u3001\u89c4\u5212\u3001\u4ee3\u7801\u751f\u6210\u548c\u8c03\u8bd5\u3002 
\u901a\u8fc7\u5728\u516b\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u7ade\u8d5b\u7ea7\u95ee\u9898\u89e3\u51b3\u548c\u7a0b\u5e8f\u5408\u6210\u57fa\u51c6\u4e0a\u8fdb\u884c\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u5305\u62ecHumanEval\uff0893.9%\uff09\u3001MBPP\uff0883.1%\uff09\u3001APPS\uff0822.0%\uff09\u3001CodeContests\uff0828.5%\uff09\u548cxCodeEval\uff0845.3%\uff09\u7b49\uff0cMapCoder\u5c55\u73b0\u4e86\u51fa\u8272\u7684\u4ee3\u7801\u751f\u6210\u80fd\u529b\uff0c\u5b9e\u73b0\u4e86\u591a\u9879\u65b0\u7684\u6700\u5148\u8fdb\u7684\u7ed3\u679c\u3002\u800c\u4e14\uff0c\u65e0\u8bba\u7f16\u7a0b\u8bed\u8a00\u8fd8\u662f\u95ee\u9898\u96be\u5ea6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u90fd\u8868\u73b0\u51fa\u6301\u7eed\u7684\u4f18\u8d8a\u6027\u80fd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8be5\u6846\u67b6\uff0c\u4f9b\u7814\u7a76\u8005\u53c2\u8003\uff1ahttps://github.com/Md-Ashraful-Pramanik/MapCoder\u3002**|\n", "2405.14751": "|**2024-05-23**|**AGILE: A Novel Framework of LLM Agents**|Peiyuan Feng et.al.|[2405.14751](http://arxiv.org/abs/2405.14751)|**[link](https://github.com/bytarnish/agile)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u79f0\u4e3aLLM\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406AGILE\uff08\u80fd\u591f\u4e0e\u7528\u6237\u4e92\u52a8\u5e76\u4ece\u73af\u5883\u4e2d\u5b66\u4e60\u7684\u4ee3\u7406\uff09\uff0c\u65e8\u5728\u6267\u884c\u590d\u6742\u7684\u5bf9\u8bdd\u4efb\u52a1\uff0c\u5229\u7528LLMs\u3001\u8bb0\u5fc6\u3001\u5de5\u5177\u548c\u4e13\u5bb6\u4ea4\u4e92\u3002\u8fd9\u79cd\u4ee3\u7406\u4e0d\u4ec5\u5177\u5907\u5bf9\u8bdd\u80fd\u529b\uff0c\u8fd8\u5177\u5907\u53cd\u601d\u3001\u5de5\u5177\u8fd0\u7528\u4ee5\u53ca\u54a8\u8be2\u4e13\u5bb6\u7684\u529f\u80fd\u3002\u6211\u4eec\u5c06\u6784\u5efa\u6b64\u7c7bLLM\u4ee3\u7406\u89c6\u4e3a\u5f3a\u5316\u5b66\u4e60\u95ee\u9898\uff0c\u5176\u4e2dLLM\u4f5c\u4e3a\u7b56\u7565\u6a21\u578b\u3002\u6211\u4eec\u4f7f\u7528\u6807\u6ce8\u7684\u884c\u4e3a\u6570\u636e\u548cPPO\u7b97\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae
\u8c03\u3002\u7279\u522b\u5173\u6ce8\u7684\u662f\u95ee\u7b54\u4efb\u52a1\uff0c\u4e3a\u6b64\u6211\u4eec\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aProductQA\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u5728\u7ebf\u8d2d\u7269\u4e2d\u7684\u96be\u9898\u3002\u6211\u4eec\u5728ProductQA\u548cMedMCQA\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8e130\u4ebf\u548c70\u4ebf\u53c2\u6570\u7684LLM\u8bad\u7ec3\u7684AGILE\u4ee3\u7406\u80fd\u591f\u8d85\u8d8aGPT-4\u4ee3\u7406\u7684\u8868\u73b0\u3002\u6211\u4eec\u7684 ablation\u7814\u7a76\u5f3a\u8c03\u4e86\u8bb0\u5fc6\u3001\u5de5\u5177\u3001\u54a8\u8be2\u3001\u53cd\u601d\u548c\u5f3a\u5316\u5b66\u4e60\u5728\u5b9e\u73b0\u4f18\u79c0\u6027\u80fd\u65b9\u9762\u7684\u91cd\u8981\u6027\u3002|\n", "2405.14744": "|**2024-05-23**|**Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View**|Xuan Liu et.al.|[2405.14744](http://arxiv.org/abs/2405.14744)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u53cd\u6620\u4e86\u4eba\u7c7b\u504f\u89c1\uff0c\u5b83\u4eec\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\u95ee\u9898\u3002\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u662f\uff1aLLMs\u662f\u5426\u80fd\u591f\u5229\u7528\u5e7b\u89c9\u6765\u6a21\u4eff\u4eba\u7c7b\u7684\u8ba4\u77e5\u504f\u89c1\uff0c\u4ece\u800c\u5c55\u73b0\u51fa\u975e\u7406\u6027\u4f46\u793e\u4f1a\u6027\u7684\u4e00\u9762\uff1f\u672c\u6587\u63a2\u8ba8\u4e86\u8fd9\u4e00\u95ee\u9898\uff0c\u901a\u8fc7\u7ed3\u5408\u5b9e\u7528\u7684\u793e\u4f1a\u79d1\u5b66\u5b9e\u9a8c\u548c\u7406\u8bba\u6d1e\u5bdf\uff0c\u63d0\u51faCogMir\uff0c\u4e00\u4e2a\u5f00\u653e\u5f0f\u591aLLM\u6846\u67b6\uff0c\u65e8\u5728\u5229\u7528LLMs\u7684\u5e7b\u89c9\u7279\u6027\u6765\u8bc4\u4f30\u548c\u63d0\u5347\u5176\u793e\u4f1a\u667a\u80fd\uff0c\u7279\u522b\u662f\u5728\u8ba4\u77e5\u504f\u5dee\u65b9\u9762\u3002\u6211\u4eec\u5728CogMir\u5b50\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u786e\u5b9
a\u60c5\u5883\u4e0b\uff0cLLMs\u548c\u4eba\u7c7b\u5728\u975e\u7406\u6027\u53ca\u4eb2\u793e\u4f1a\u51b3\u7b56\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u4e00\u81f4\u6027\uff0c\u8fd9\u8868\u660eLLMs\u4f5c\u4e3a\u793e\u4f1a\u5b9e\u4f53\u7684\u4eb2\u793e\u4f1a\u6027\uff0c\u5e76\u7a81\u663e\u4e86\u5e7b\u89c9\u7279\u6027\u7684\u5173\u952e\u4f5c\u7528\u3002\u6b64\u5916\uff0cCogMir\u6846\u67b6\u5c55\u793a\u4e86\u5176\u4f5c\u4e3a\u7814\u7a76LLMs\u793e\u4f1a\u667a\u80fd\u7684\u6709\u4ef7\u503c\u5e73\u53f0\u7684\u6f5c\u529b\u3002|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547](http://arxiv.org/abs/2405.13547)|null|## \u80cc\u666f \u81ea\u52a8\u9a7e\u9a76\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u5148\u8fdb\u7684\u51b3\u7b56\u548c\u63a7\u5236\u7b97\u6cd5\u3002\u7406\u89e3\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u51b3\u7b56\u7684\u4f9d\u636e\u5bf9\u4e8e\u786e\u4fdd\u5176\u5728\u9ad8\u901f\u516c\u8def\u9a7e\u9a76\u4e2d\u7684\u5b89\u5168\u4e0e\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3aHighwayLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\u6765\u9884\u6d4bego\u8f66\u8f86\u7684\u672a\u6765\u5bfc\u822a\u8def\u5f84\u70b9\u3002\u8be5\u65b9\u6cd5\u8fd8\u91c7\u7528\u9884\u8bad\u7ec3\u7684\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u6a21\u578b\u4f5c\u4e3a\u9ad8\u5c42\u6b21\u89c4\u5212\u5668\uff0c\u5bf9\u5408\u9002\u7684\u5143\u7ea7\u52a8\u4f5c\u8fdb\u884c\u51b3\u7b56\u3002HighwayLLM\u5c06RL\u6a21\u578b\u7684\u8f93\u51fa\u4e0e\u5f53\u524d\u72b6\u6001\u4fe1\u606f\u76f8\u7ed3\u5408\uff0c\u751f\u6210\u5b89\u5168\u3001\u65e0\u78b0\u649e\u4e14\u53ef\u89e3\u91ca\u7684\u672a\u6765\u72b6\u6001\u9884\u6d4b\uff0c\u4ece\u800c\u6784\u5efa\u51fa\u8f66\u8f86\u7684\u884c\u9a76\u8f68\u8ff9\u3002\u968f\u540e\u
ff0c\u57fa\u4e8ePID\u7684\u63a7\u5236\u5668\u5f15\u5bfc\u8f66\u8f86\u9075\u5faaLLM\u4ee3\u7406\u9884\u6d4b\u7684\u8def\u5f84\u70b9\u3002\u8fd9\u79cdLLM\u4e0eRL\u548cPID\u7684\u878d\u5408\u63d0\u5347\u4e86\u51b3\u7b56\u8fc7\u7a0b\uff0c\u5e76\u4e3a\u9ad8\u901f\u516c\u8def\u81ea\u52a8\u9a7e\u9a76\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002|\n", "2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050](http://arxiv.org/abs/2405.13050)|**[link](https://github.com/daniel-chin/flute-x-gpt)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09-\u5728-\u73af\u5e94\u7528\u5df2\u663e\u793a\u51fa\u6709\u6548\u7406\u89e3\u7528\u6237\u547d\u4ee4\u3001\u5236\u5b9a\u8ba1\u5212\u5e76\u76f8\u5e94\u5730\u64cd\u4f5c\u5916\u90e8\u5de5\u5177/\u7cfb\u7edf\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0cLLM\u4ee3\u7406\u7684\u64cd\u4f5c\u8303\u56f4\u5c40\u9650\u4e8e\u88ab\u52a8\u54cd\u5e94\u7528\u6237\uff0c\u9700\u8981\u7528\u6237\u6839\u636e\u5e95\u5c42\u5de5\u5177/\u7cfb\u7edf\u6765\u8868\u8ff0\u9700\u6c42\u3002\u6211\u4eec\u6ce8\u610f\u5230LLM\u4ee3\u7406\u7528\u6237\u754c\u9762\uff08LAUI\uff09\u7684\u6f5c\u529b\u8fdc\u672a\u5145\u5206\u5229\u7528\u3002\u7406\u60f3\u7684LAUI\u8bbe\u60f3\u4e2d\uff0c\u7528\u6237\u65e0\u9700\u6df1\u5165\u4e86\u89e3\u5de5\u5177/\u7cfb\u7edf\uff0c\u5c31\u80fd\u4e0e\u4e4b\u4ea4\u4e92\u4ee5\u63a2\u7d22\u65b0\u5174\u7684\u5de5\u4f5c\u6d41\u7a0b\u3002\u4e0d\u540c\u4e8e\u8bbe\u8ba1\u56fa\u5b9a\u7684\u53ef\u63a2\u7d22GUI\u6765\u6559\u6388\u7528\u6237\u4f7f\u7528\u7cfb\u7edf\u7684\u9884\u8bbe\u65b9\u5f0f\uff0cLAUI\u4e2d\u7684LLM\u4ee3\u7406\u4ece\u4e00\u5f00\u59cb\u5c31\u5bf9\u7cfb\u7edf\u719f\u7ec3\uff0c\u4e3b\u52a8\u5b66\u4e60\u7528\u6237\u53ca\u5176\u9700\u6c42\uff0c\u5e76\u5411\u7528\u6237\u63d0\u51fa\u65b0\u7684\u4e92\u52a8\u65b9\u6848\u3002\u4e3a\u4e86\u5c55\u793aLAUI\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5177\u4f53\u4f8b\u5b50\uff1aFlute X 
GPT\uff0c\u5b83\u7ed3\u5408\u4e86LLM\u4ee3\u7406\u3001\u63d0\u793a\u7ba1\u7406\u5668\u548c\u4e00\u4e2a\u652f\u6301\u590d\u6742\u5b9e\u65f6\u4f53\u9a8c\u7684\u7b1b\u5b50\u6559\u5b66\u591a\u5a92\u4f53\u8f6f\u786c\u4ef6\u7cfb\u7edf\uff0c\u65e8\u5728\u7b80\u5316\u5b66\u4e60\u5439\u594f\u7b1b\u5b50\u7684\u8fc7\u7a0b\u3002|\n", "2405.13009": "|**2024-05-13**|**METAREFLECTION: Learning Instructions for Language Agents using Past Reflections**|Priyanshu Gupta et.al.|[2405.13009](http://arxiv.org/abs/2405.13009)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e7f\u53d7\u6b22\u8fce\uff0c\u4f46\u4e3a\u5176\u6267\u884c\u7279\u5b9a\u4efb\u52a1\u8bbe\u8ba1\u7cbe\u786e\u7684\u63d0\u793a\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u7528\u6237\u901a\u5e38\u9700\u8981\u4e0e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u591a\u8f6e\u5bf9\u8bdd\u4ee5\u8fbe\u6210\u76ee\u6807\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u6a21\u578b\u81ea\u8eab\u7684\u53cd\u9988\uff0c\u5373\u81ea\u53cd\u601d\uff0c\u80fd\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u8d77\u5230\u5f3a\u5316\u4f5c\u7528\uff0c\u6709\u52a9\u4e8e\u66f4\u5feb\u5730\u8fbe\u5230\u671f\u671b\u7ed3\u679c\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014METAREFLECTION\uff0c\u5b83\u80fd\u4ece\u8bad\u7ec3\u9636\u6bb5\u6536\u96c6\u5230\u7684\u4e2a\u4f53\u81ea\u53cd\u601d\u4e2d\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u901a\u7528\u63d0\u793a\u6307\u4ee4\u3002\u6211\u4eec\u5728\u57fa\u7840\u8bbe\u65bd\u5373\u4ee3\u7801\uff08IAC\uff09\u6f0f\u6d1e\u68c0\u6d4b\u548c\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4f7f\u7528REACT\u548cCOT\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cMETAREFLECTION\u663e\u8457\u4f18\u4e8eGPT-4\uff0c\u5206\u522b\u5728IAC\u3001COT\u548cREACT\u4e2d\u7684\u6027\u80fd\u63d0\u5347\u5206\u522b\u4e3a16.82%\u300131.33%\u548c15.42%\uff0c\u8fd9\u8868\u660eMETAREFLECTION\u6709\u6f5c\u529b\u63d0\u5347LLMs
\u7684\u6548\u7387\uff0c\u662f\u4e00\u79cd\u503c\u5f97\u63a2\u7d22\u7684\u7b56\u7565\u3002|\n", "2405.15414": "|**2024-05-24**|**Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification**|Yuxuan Guo et.al.|[2405.15414](http://arxiv.org/abs/2405.15414)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7814\u7a76\u4e2d\uff0c\u6784\u5efa\u5f00\u653e\u578b\u4ee3\u7406\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u7ec8\u6781\u76ee\u6807\uff0c\u7279\u522b\u662f\u521b\u9020\u6027\u7684\u4ee3\u7406\u66f4\u5177\u5438\u5f15\u529b\u3002\u73b0\u6709\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u6709\u660e\u786e\u76ee\u6807\u7684\u957f\u5e8f\u5217\u4efb\u52a1\uff08\u5982\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u201c\u5f00\u91c7\u94bb\u77f3\u201d\uff09\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u5904\u7406\u5177\u6709\u5f00\u653e\u76ee\u6807\u548c\u62bd\u8c61\u6807\u51c6\u7684\u521b\u9020\u6027\u4efb\u52a1\u65f6\u9047\u5230\u56f0\u96be\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5f25\u5408\u8fd9\u4e9b\u4efb\u52a1\u4e4b\u95f4\u7684\u9e3f\u6c9f\uff0c\u4ece\u800c\u7f3a\u4e4f\u81ea\u6211\u6539\u8fdb\u6765\u89e3\u51b3\u95ee\u9898\u7684\u53cd\u9988\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5f15\u5165\u4e86\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\u6280\u672f\uff0c\u4ee5\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u4e3a\u521b\u9020\u6027\u4efb\u52a1\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Luban\u4ee3\u7406\uff0c\u4e13\u6ce8\u4e8e\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5b83\u914d\u5907\u4e86\u4e24\u7ea7\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u5b9e\u8df5\uff1a\uff081\uff09\u89c6\u89c9\u9a8c\u8bc13D\u7ed3\u6784\u63a8\u6d4b\uff0c\u901a\u8fc7\u4ee3\u7406\u81ea\u52a8\u751f\u6210\u7684CAD\u5efa\u6a21\u7a0b\u5e8f\u5b9e\u73b0\uff1b\uff082\uff09\u5b9e\u7528\u9a8c\u8bc1\
uff0c\u6839\u636e\u62bd\u8c61\u6807\u51c6\u751f\u6210\u5e76\u9a8c\u8bc1\u4e0e\u73af\u5883\u76f8\u5173\u7684\u529f\u80fd\u7a0b\u5e8f\u3002\u5e7f\u6cdb\u7684\u591a\u7ef4\u5ea6\u4eba\u7c7b\u7814\u7a76\u548cElo\u8bc4\u7ea7\u663e\u793a\uff0cLuban\u80fd\u591f\u5728\u6211\u4eec\u63d0\u51fa\u7684\u57fa\u51c6\u4e2d\u5b8c\u6210\u591a\u6837\u5316\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5e76\u5728\u53ef\u89c6\u5316\u548c\u5b9e\u7528\u6027\u65b9\u9762\u5206\u522b\u6bd4\u5176\u4ed6\u57fa\u7ebf\u63d0\u9ad8\u4e8633%\u5230100%\u3002\u6b64\u5916\uff0c\u5b9e\u73b0\u5728\u771f\u5b9e\u4e16\u754c\u673a\u5668\u4eba\u624b\u81c2\u4e0a\u7684\u6f14\u793a\u5c55\u793a\u4e86Luban\u5728\u7269\u7406\u4e16\u754c\u4e2d\u7684\u521b\u4f5c\u6f5c\u529b\u3002|\n", "2405.15145": "|**2024-05-24**|**CulturePark: Boosting Cross-cultural Understanding in Large Language Models**|Cheng Li et.al.|[2405.15145](http://arxiv.org/abs/2405.15145)|**[link](https://github.com/scarelette/culturepark)**|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u666e\u904d\u5b58\u5728\u6587\u5316\u504f\u89c1\uff0c\u4e3b\u8981\u6e90\u4e8e\u7f3a\u4e4f\u4ee3\u8868\u4e0d\u540c\u6587\u5316\u7684\u4ee3\u8868\u6027\u6570\u636e\u3002\u4f20\u7edf\u7684\u6587\u5316\u6570\u636e\u96c6\u548c\u57fa\u51c6\u901a\u5e38\u901a\u8fc7\u4ece\u73b0\u6709\u6570\u636e\u96c6\u4e2d\u63d0\u53d6\u6216\u805a\u5408\u6765\u81ea\u7ef4\u57fa\u767e\u79d1\u548c\u793e\u4ea4\u5a92\u4f53\u7684\u4fe1\u606f\u6784\u5efa\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u548c\u4eba\u5de5\u6807\u6ce8\uff0c\u6210\u672c\u9ad8\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u672c\u6587\u501f\u9274\u8ba4\u77e5\u793e\u4f1a\u4ea4\u6d41\u7406\u8bba\uff0c\u63d0\u51faCulturePark\uff0c\u4e00\u4e2a\u5229\u7528LLMs\u7684\u591a\u4ee3\u7406\u6c9f\u901a\u6846\u67b6\uff0c\u7528\u4e8e\u6587\u5316\u6570\u636e\u6536\u96c6\u3002CulturePark\u901a\u8fc7\u6a21\u62df\u4e0d\u540c\u6587\u5316\u80cc\u666f\u4e0b\u7684\u4eba\u7c7b\u4ea4\u
6d41\uff0c\u8ba9\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u89d2\u8272\u626e\u6f14\uff0c\u751f\u6210\u5305\u542b\u4eba\u7c7b\u4fe1\u5ff5\u3001\u89c4\u8303\u548c\u4e60\u4fd7\u7684\u9ad8\u8d28\u91cf\u8de8\u6587\u5316\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528CulturePark\u751f\u6210\u4e8641,000\u4e2a\u6587\u5316\u6837\u672c\uff0c\u5bf9\u516b\u79cd\u7279\u5b9a\u6587\u5316\u8fdb\u884c\u4e86\u6a21\u578b\u5fae\u8c03\u3002\u5728\u4e09\u9879\u4e0b\u6e38\u4efb\u52a1\u8bc4\u4f30\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u8868\u73b0\u4f18\u4e8eGPT-4\uff1a\u5185\u5bb9\u8fc7\u6ee4\u3001\u6587\u5316\u4e00\u81f4\u6027\uff08\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u8868\u4e0a\uff09\u548c\u6587\u5316\u6559\u80b2\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684GPT-3.5\u6a21\u578b\u5728\u5185\u5bb9\u8fc7\u6ee4\u4efb\u52a1\u4e0a\u4e0eGPT-4\u76f8\u5f53\u6216\u4f18\u4e8e\u5b83\uff1b\u5728\u6587\u5316\u4e00\u81f4\u6027\u65b9\u9762\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u886813\u6846\u67b6\u4e0a\u8d85\u8d8aGPT-4\uff1b\u5728\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u6587\u5316\u6559\u80b2\u6548\u679c\u548c\u7528\u6237\u4f53\u9a8c\u4e0a\uff0c\u6211\u4eec\u7684\u6a21\u578b\u4e5f\u8868\u73b0\u51fa\u8272\u3002CulturePark\u5bf9\u4e8e\u51cf\u5c11\u6587\u5316\u504f\u89c1\u548c\u63a8\u52a8AI\u7684\u6c11\u4e3b\u5316\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u5f3a\u8c03\u4e86\u6587\u5316\u5305\u5bb9\u6027\u6570\u636e\u5728\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u4f5c\u7528\u3002|\n", "2405.14918": "|**2024-05-23**|**AnalogCoder: Analog Circuit Design via Training-Free Code Generation**|Yao Lai et.al.|[2405.14918](http://arxiv.org/abs/2405.14918)|**[link](https://github.com/laiyao1/AnalogCoder)**|### \u7ffb\u8bd1 
\u5728\u73b0\u4ee3\u82af\u7247\u6280\u672f\u4e2d\uff0c\u6a21\u62df\u7535\u8def\u8bbe\u8ba1\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u7ec4\u4ef6\u9009\u62e9\u3001\u8fde\u63a5\u548c\u53c2\u6570\u8bbe\u7f6e\u4ee5\u786e\u4fdd\u7535\u8def\u529f\u80fd\u6b63\u5e38\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b57\u7535\u8def\u8bbe\u8ba1\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u4f46\u6a21\u62df\u7535\u8def\u7684\u590d\u6742\u6027\u548c\u6570\u636e\u7a00\u7f3a\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63a8\u51fa\u4e86AnalogCoder\uff0c\u8fd9\u662f\u9996\u4e2a\u65e0\u9700\u8bad\u7ec3\u7684LLM\u4ee3\u7406\uff0c\u4e13\u4e3a\u901a\u8fc7Python\u4ee3\u7801\u751f\u6210\u6765\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u9996\u5148\uff0cAnalogCoder\u91c7\u7528\u53cd\u9988\u589e\u5f3a\u6d41\u7a0b\uff0c\u5e76\u7ed3\u5408\u5b9a\u5236\u7684\u9886\u57df\u7279\u5b9a\u63d0\u793a\uff0c\u80fd\u591f\u81ea\u52a8\u4e14\u81ea\u6211\u6821\u6b63\u5730\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\uff0c\u6210\u529f\u7387\u9ad8\u3002\u5176\u6b21\uff0c\u5b83\u63d0\u51fa\u4e86\u4e00\u5957\u7535\u8def\u5de5\u5177\u5e93\uff0c\u7528\u4e8e\u5b58\u50a8\u6210\u529f\u7684\u7535\u8def\u8bbe\u8ba1\u4f5c\u4e3a\u53ef\u91cd\u7528\u7684\u6a21\u5757\u5316\u5b50\u7535\u8def\uff0c\u7b80\u5316\u4e86\u590d\u5408\u7535\u8def\u7684\u521b\u5efa\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAnalogCoder\u5728\u5e7f\u6cdb\u8986\u76d6\u6a21\u62df\u7535\u8def\u4efb\u52a1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u8d85\u8d8a\u4e86\u5176\u4ed6\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0c\u6210\u529f\u8bbe\u8ba1\u4e8620\u4e2a\u7535\u8def\uff0c\u6bd4\u6807\u51c6GPT-4o\u591a\u51fa5\u4e2a\u3002\u6211\u4eec\u76f8\u4fe1AnalogCoder\u80fd\u663e\u8457\u63d0\u5347\u82af\u7247\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u6548\u7387\uff0c\u8ba9\u975e\u4e13\u5bb6\u4e5f\u80fd\u9ad8\u6548\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u57fa\u51c6\u5df
2\u63d0\u4f9b\u5728\uff1a[https://github.com/anonyanalog/AnalogCoder](https://github.com/anonyanalog/AnalogCoder)\u3002|\n", "2405.17424": "|**2024-05-27**|**LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|\u7531\u4e8e\u9700\u8981\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0cEmbodied agent \u9700\u8981\u5177\u5907\u4e30\u5bcc\u7684\u5148\u9a8c\u77e5\u8bc6\u3001\u957f\u8fdc\u89c4\u5212\u80fd\u529b\u4ee5\u53ca\u5feb\u901f\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u5c3d\u7ba1\u6700\u8fd1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u5c40\u9650\u6027\uff0c\u4f8b\u5982\uff0cLLM\u7684\u8f93\u51fa\u901a\u5e38\u662f\u63cf\u8ff0\u6027\u7684\u53e5\u5b50\uff0c\u5728\u51b3\u5b9a\u5177\u4f53\u884c\u52a8\u65f6\u53ef\u80fd\u4ea7\u751f\u6b67\u4e49\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\uff08LARM\uff09\u3002LARM\u5229\u7528\u6587\u672c\u548c\u591a\u89c6\u89d2\u56fe\u50cf\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4ee5\u81ea\u56de\u5f52\u7684\u65b9\u5f0f\u9884\u6d4b\u540e\u7eed\u52a8\u4f5c\u3002\u4e3a\u4e86\u8bad\u7ec3 
LARM\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u683c\u5f0f\u2014\u2014\u81ea\u56de\u5f52\u8282\u70b9\u4f20\u8f93\u7ed3\u6784\uff0c\u5e76\u6784\u5efa\u4e86\u76f8\u5e94\u7684\u6570\u636e\u96c6\u3002\u901a\u8fc7\u4e24\u9636\u6bb5\u7684\u8bad\u7ec3\u7b56\u7565\uff0cLARM\u6210\u529f\u5728\u300a\u6211\u7684\u4e16\u754c\u300b\uff08Minecraft\uff09\u4e2d\u6536\u96c6\u9b54\u6cd5\u88c5\u5907\uff0c\u8fd9\u6bd4\u5148\u524d\u6700\u4f73\u65b9\u6cd5\u7684\u6700\u9ad8\u6210\u5c31\u9700\u8981\u66f4\u4e3a\u590d\u6742\u7684\u51b3\u7b56\u94fe\u3002\u6b64\u5916\uff0cLARM\u7684\u901f\u5ea6\u6bd4\u73b0\u6709\u6700\u5feb\u65b9\u6cd5\u5feb\u51fa\u4e866.8\u500d\u3002|\n", "2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510](http://arxiv.org/abs/2405.16510)|null|\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\u63a8\u52a8\u4e86\u667a\u80fd\u4ee3\u7406\u7814\u7a76\u7684\u65b0\u70ed\u6f6e\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u5b9e\u73b0\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6709\u524d\u666f\u65b9\u6cd5\uff0c\u56e0\u5176\u51fa\u8272\u7684\u63a8\u7406\u548c\u6cdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u5728\u5b9e\u9645\u4efb\u52a1\u4e2d\uff0c\u6709\u6548\u7684\u89c4\u5212\u5bf9LLM\u4ee3\u7406\u7684\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u590d\u6742\u4efb\u52a1\u8bbe\u8ba1\u51fa\u53ef\u884c\u6216\u6700\u4f18\u7684\u7cbe\u7ec6\u7c92\u5ea6\u64cd\u4f5c\u5e8f\u5217\uff0c\u7279\u522b\u662f\u9700\u8981\u7ec4\u5408\u5927\u91cf\u5f02\u8d28\u884c\u52a8\u7684\u5e8f\u5217\uff0c\u4ecd\u662f\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMeta-Task 
Planning\uff08MTP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u96f6\u6837\u672c\u7684\u534f\u4f5c\u5f0fLLM\u591a\u4ee3\u7406\u7cfb\u7edf\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06\u590d\u6742\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u5373\u5143\u4efb\u52a1\uff0c\u7b80\u5316\u4e86\u4efb\u52a1\u89c4\u5212\u3002\u6bcf\u4e2a\u5143\u4efb\u52a1\u968f\u540e\u6620\u5c04\u4e3a\u53ef\u6267\u884c\u52a8\u4f5c\u3002\u5728TravelPlanner\u548cAPI-Bank\u4e24\u4e2a\u4e25\u683c\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86MTP\u3002\u7ed3\u679c\u8868\u660e\uff0cMTP\u5728TravelPlanner\u4e0a\u7684\u5e73\u5747\u6210\u529f\u7387\u7ea6\u4e3a40%\uff0c\u8fdc\u8d85\u5f53\u524d\u6700\u4f73\u57fa\u7ebf\uff082.92%\uff09\uff0c\u5e76\u4e14\u5728API-Bank\u4e0a\u7684\u6027\u80fd\u6bd4\u4f7f\u7528ReAct\u7684LLM_{api}-4\u9ad8\u51fa\u7ea614%\uff0c\u8fd9\u663e\u793a\u51fa\u5c06LLM\u4e0e\u591a\u4ee3\u7406\u7cfb\u7edf\u76f8\u7ed3\u5408\u7684\u5de8\u5927\u6f5c\u529b\u3002|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li 
et.al.|[2405.16376](http://arxiv.org/abs/2405.16376)|**[link](https://github.com/cyrilli/stride)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u8bed\u8a00\u80fd\u529b\u548c\u63a8\u7406\u6280\u5de7\u3002\u7136\u800c\uff0c\u5728\u6218\u7565\u6027\u7684\u591a\u4ee3\u7406\u51b3\u7b56\u73af\u5883\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u5c40\u9650\uff0c\u5982\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5dee\u3001\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\u548c\u751f\u6210\u9519\u8bef\u4fe1\u606f\u3002\u8fd9\u4e9b\u7f3a\u70b9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9075\u5b88\u590d\u6742\u6e38\u620f\u89c4\u5219\u3001\u957f\u671f\u89c4\u5212\u3001\u63a2\u7d22\u672a\u77e5\u73af\u5883\u4ee5\u53ca\u9884\u6d4b\u5bf9\u624b\u884c\u52a8\u7684\u4e92\u52a8\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u7ed3\u5408\u4e86\u8bb0\u5fc6\u548c\u4e13\u4e1a\u5de5\u5177\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u5176\u5728\u6218\u7565\u51b3\u7b56\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7279\u522b\u5728\u53cc\u8fb9\u8c08\u5224\u3001\u591a\u4ee3\u7406\u52a8\u6001\u673a\u5236\u8bbe\u8ba1\u7b49\u7ecf\u6d4e\u91cd\u8981\u573a\u666f\u4e2d\u5e94\u7528\u8fd9\u4e9b\u5de5\u5177\uff0c\u5e76\u901a\u8fc7\u5b9a\u91cf\u6307\u6807\u8bc4\u4f30\u5728\u5404\u79cd\u6218\u7565\u51b3\u7b56\u95ee\u9898\u4e0a\u7684\u6548\u679c\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u589e\u5f3a\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u80fd\u529b\u3002\u5c3d\u7ba1\u5f53\u524d\u6a21\u578b\u5b58\u5728\u56fa\u6709\u5c40\u9650\uff0c\u4f46\u6211\u4eec\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u589e\u5f3a\u5c55\u793a\u4e86\u6539\u8fdb\u7684\u53ef\u80fd\u6027\uff
0c\u8fd9\u4e3a\u672a\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u5411\u3002**|\n", "2405.16334": "|**2024-05-29**|**Devil's Advocate: Anticipatory Reflection for LLM Agents**|Haoyu Wang et.al.|[2405.16334](http://arxiv.org/abs/2405.16334)|null|\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u8d4b\u4e88\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u6211\u53cd\u601d\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u5176\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65f6\u7684\u4e00\u81f4\u6027\u548c\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fc3\u4f7fLLM\u4ee3\u7406\u5c06\u7ed9\u5b9a\u7684\u4efb\u52a1\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u5b50\u4efb\u52a1\uff08\u5373\u5236\u5b9a\u8ba1\u5212\uff09\uff0c\u5e76\u5728\u6267\u884c\u884c\u52a8\u4e4b\u524d\u6301\u7eed\u53cd\u601d\u53ef\u80fd\u7684\u5931\u8d25\u53ca\u5176\u8865\u6551\u63aa\u65bd\u3001\u6267\u884c\u540e\u4e0e\u5b50\u4efb\u52a1\u76ee\u6807\u5bf9\u9f50\u5e76\u8fdb\u884c\u5fc5\u8981\u7684\u56de\u6eaf\u4ee5\u786e\u4fdd\u5168\u529b\u4ee5\u8d74\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u53ca\u5728\u5b8c\u6210\u8ba1\u5212\u540e\u8fdb\u884c\u5168\u9762\u5ba1\u67e5\uff0c\u4ee5\u4fbf\u4e8e\u672a\u6765\u7b56\u7565\u7684\u4f18\u5316\u3002\u901a\u8fc7\u5728WebArena\u4e2d\u96f6\u6837\u672c\u5e94\u7528\u8fd9\u4e00\u65b9\u6cd5\u5904\u7406\u5b9e\u9645\u7684\u7f51\u7edc\u73af\u5883\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u8868\u73b0\u51fa\u4f18\u4e8e\u73b0\u6709\u96f6\u6837\u672c\u65b9\u6cd5\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u57fa\u4e8e\u53cd\u601d\u7684\u7b56\u7565\u4e0d\u4ec5\u63d0\u5347\u4e86\u4ee3\u7406\u5e94\u5bf9\u672a\u9884\u89c1\u6311\u6218\u7684\u5bfc\u822a\u80fd\u529b\uff0c\u901a\u8fc7\u5f3a\u5927\u7684\u8ba1\u5212\u6267\u884c\u673a\u5236\uff0c\u8fd8\u63d0\u9ad8\u4e86\u6548\u7387\uff0c\u5
1cf\u5c11\u4e86\u5b9e\u73b0\u4efb\u52a1\u6240\u9700\u7684\u5c1d\u8bd5\u6b21\u6570\u548c\u8ba1\u5212\u4fee\u8ba2\u6b21\u6570\u3002|\n", "2405.16247": "|**2024-05-25**|**AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning**|Minghao Chen et.al.|[2405.16247](http://arxiv.org/abs/2405.16247)|**[link](https://github.com/minghchen/automanual)**|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6267\u884c\u5404\u79cd\u9886\u57df\u4efb\u52a1\uff0c\u5982\u673a\u5668\u4eba\u3001\u6e38\u620f\u548c\u7f51\u7edc\u5bfc\u822a\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u548c\u4e13\u5bb6\u7ea7\u63d0\u793a\u624d\u80fd\u9002\u5e94\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u9002\u5e94\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AutoManual\u6846\u67b6\uff0c\u8ba9LLMs\u80fd\u591f\u901a\u8fc7\u4e92\u52a8\u81ea\u4e3b\u6784\u5efa\u7406\u89e3\uff0c\u5e76\u9002\u5e94\u65b0\u73af\u5883\u3002AutoManual\u5c06\u73af\u5883\u77e5\u8bc6\u5206\u4e3a\u591a\u6837\u7684\u89c4\u5219\uff0c\u5e76\u901a\u8fc7\u4e24\u4e2a\u4ee3\u7406\u8fdb\u884c\u5728\u7ebf\u4f18\u5316\uff1a1\uff09\u89c4\u5212\u5668\u6839\u636e\u5f53\u524d\u89c4\u5219\u5236\u5b9a\u53ef\u64cd\u4f5c\u7684\u884c\u52a8\u8ba1\u5212\uff1b2\uff09\u6784\u5efa\u8005\u901a\u8fc7\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u89c4\u5219\u7cfb\u7edf\u66f4\u65b0\u89c4\u5219\uff0c\u4fc3\u8fdb\u5728\u7ebf\u89c4\u5219\u7ba1\u7406\u5e76\u4fdd\u6301\u5173\u952e\u7ec6\u8282\u3002\u4e3a\u4e86\u51cf\u5c11\u5728\u7ba1\u7406\u89c4\u5219\u65f6\u7684\u5e7b\u89c9\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u6848\u4f8b\u6761\u4ef6\u63d0\u793a\u201d\u7b56\u7565\u7528\u4e8e\u6784\u5efa\u8005\u3002\u6700\u7ec8\uff0c\u7f16\u8bd1\u5668\u4ee3\u7406\u5c06\u8fd9\u4e9b\u89c4\u5219\u6574\u5408\u6210\u4e00\u4efd\u5168\u9762\u7684\u624b\u518c\u3002\u8fd9\u4efd\u81ea\u6211\u751f\u6210\u7684\u
624b\u518c\u4e0d\u4ec5\u80fd\u63d0\u9ad8\u9002\u5e94\u6027\uff0c\u8fd8\u80fd\u6307\u5bfc\u5c0f\u578bLLMs\u7684\u89c4\u5212\uff0c\u540c\u65f6\u4fdd\u6301\u4eba\u7c7b\u53ef\u8bfb\u3002\u4ec5\u51ed\u4e00\u6b21\u7b80\u5355\u6f14\u793a\uff0cAutoManual\u663e\u8457\u63d0\u9ad8\u4e86\u4efb\u52a1\u6210\u529f\u7387\uff0cGPT-4-turbo\u4e0b\u8fbe\u523097.4%\uff0cGPT-3.5-turbo\u4e0b\u4e3a86.2%\u3002\u6e90\u4ee3\u7801\u5373\u5c06\u53d1\u5e03\u3002|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208](http://arxiv.org/abs/2405.18208)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u8868\u660e\uff0c\u8fd9\u4e9b\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u4e9b\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\uff0c\u5982\u5199\u4f5c\u548c\u7f16\u7801\uff0c\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u9700\u8981\u7efc\u5408\u89c4\u5212\u7684\u4efb\u52a1\u4e0a\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u4ecd\u662f\u5f53\u524d\u6a21\u578b\u7684\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u65c5\u884c\u89c4\u5212\uff0c\u8fd9\u662f\u4e00\u4e2a\u6d89\u53ca\u591a\u4e2a\u9636\u6bb5\u7684\u590d\u6742\u95ee\u9898\uff0c\u5305\u62ec\u63d0\u7eb2\u3001\u4fe1\u606f\u6536\u96c6\u548c\u89c4\u5212\uff0c\u901a\u5e38\u4f34\u968f\u7740\u5404\u79cd\u7ea6\u675f\u548c\u4e0d\u786e\u5b9a\u6027\u3002\u73b0\u6709\u7684\u63a8\u7406\u65b9\u6cd5\u5728\u5904\u7406\u8fd9\u7c7b\u95ee\u9898\u65f6\u6548\u679c\u4e0d\u4f73\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u901a\u8fc7\u5f00\u53d1\u4e00\u79cd\u7c7b\u4f3c\u4eba\u7c7b\u7684\u89c4\u5212\u6846\u67b6\uff0c\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6a21\u4eff\u4eba\u7c7b\u89e3\u51b3\u591a\u9636\u6bb5\u95ee\u9898\u7684\u6b65\u9aa4\uff0c\u4ee5\u63d0\u5347\u5176\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5b9e\u65bd\u7b56\u7565\uff0c\u8ba9\u6a21\u578b\u80
fd\u4e3a\u6bcf\u4e2a\u65c5\u884c\u67e5\u8be2\u751f\u6210\u8fde\u8d2f\u7684\u63d0\u7eb2\uff0c\u6a21\u62df\u4eba\u7c7b\u7684\u89c4\u5212\u6a21\u5f0f\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u7b56\u7565\u5757\u548c\u77e5\u8bc6\u5757\u5230\u6846\u67b6\u4e2d\uff1a\u7b56\u7565\u5757\u5e2e\u52a9\u4fe1\u606f\u641c\u96c6\uff0c\u800c\u77e5\u8bc6\u5757\u63d0\u4f9b\u8be6\u7ec6\u89c4\u5212\u6240\u9700\u7684\u5fc5\u8981\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u5168\u9762\u5c55\u793a\u4e86\u6211\u4eec\u6846\u67b6\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u89c4\u5212\u80fd\u529b\u7684\u663e\u8457\u63d0\u5347\uff0c\u4f7f\u5176\u5728\u5904\u7406\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u65f6\u6548\u7387\u548c\u6548\u679c\u90fd\u6709\u6240\u63d0\u9ad8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u4e0eGPT-4-Turbo\u7ed3\u5408\u65f6\uff0c\u6211\u4eec\u7684\u6846\u67b6\u76f8\u8f83\u4e8e\u57fa\u7840\u6846\u67b6\u5728GPT-4-Turbo\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u4e8610\u500d\u3002|\n", "2405.18113": "|**2024-05-28**|**Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting**|Hongda Sun 
et.al.|[2405.18113](http://arxiv.org/abs/2405.18113)|null|\u968f\u7740\u5728\u7ebf\u62db\u8058\u670d\u52a1\u7684\u5174\u8d77\uff0c\u4f20\u7edf\u7684\u6c42\u804c\u548c\u62db\u8058\u65b9\u5f0f\u53d1\u751f\u4e86\u53d8\u9769\uff0c\u8feb\u5207\u9700\u8981\u5f00\u53d1\u9ad8\u8d28\u91cf\u7684\u5de5\u4e1a\u5e94\u7528\u6765\u63d0\u5347\u6c42\u804c\u8005\u4e0e\u804c\u4f4d\u7684\u5339\u914d\u5ea6\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u6f5c\u5728\u8bed\u4e49\u5efa\u6a21\uff0c\u5b66\u4e60\u4e24\u8005\u4e4b\u95f4\u7684\u5339\u914d\u51fd\u6570\u3002\u53d7\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89d2\u8272\u626e\u6f14\u65b9\u9762\u5f3a\u5927\u80fd\u529b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u5f15\u5165LLMs\u6a21\u62df\u9762\u8bd5\u73af\u8282\uff0c\u8ba9\u5176\u4e0e\u6c42\u804c\u8005\u8fdb\u884c\u5bf9\u8bdd\uff0c\u8fd9\u53ef\u4ee5\u4e3a\u5019\u9009\u4eba\u8bc4\u4f30\u63d0\u4f9b\u989d\u5916\u8bc1\u636e\uff0c\u4ece\u800c\u589e\u5f3a\u4ec5\u57fa\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u4e2a\u6027\u5316\u5339\u914d\u3002\u7136\u800c\uff0c\u5728\u7f51\u7edc\u62db\u8058\u4e2d\u7684\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u89d2\u8272\u5851\u9020\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5982\u63d0\u95ee\u6280\u5de7\u3001\u56de\u7b54\u6784\u5efa\u4ee5\u53ca\u53cc\u5411\u5339\u914d\u5ea6\u8bc4\u4f30\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMockLLM\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u6846\u67b6\uff0c\u5c06\u4eba\u804c\u5339\u914d\u8fc7\u7a0b\u5212\u5206\u4e3a\u4e24\u4e2a\u6a21\u5757\uff1a\u6a21\u62df\u9762\u8bd5\u751f\u6210\u548c\u63e1\u624b\u534f\u8bae\u4e2d\u7684\u53cc\u5411\u8bc4\u4f30\uff0c\u901a\u8fc7\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u4e4b\u95f4\u7684\u534f\u4f5c\u884c\u4e3a\u5171\u540c\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u89d2\u8272\u3001\u591a\u884c\u4e3a\u7684\u6846\u67b6\uff0c\u4f7f\u5355\u4e00\u7684LLM\u4ee3\u7406\u80fd\u6709\u6548\u5730\u626e\u6f14\u53cc\u65b9\u7684\u4e0d\u540c\u804c\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cd\u601d\u8bb0\u5fc6\u751f\u6210\u548c\u52a8\u6001\u63d0\u793a\u4fee\u6539\u6280\u672f\uff0c\u4ee5\u4f18\u5316\u53cc\u65b9\u7684\u884c\u4e3a\uff0c\u6301\u7eed\u4f18\u5316\u9644\u52a0\u7684\u8bc4\u4f30\u8bc1\u636e\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMockLLM\u5728\u4eba\u804c\u5339\u914d\u4e0a\u7684\u8868\u73b0\u6700\u4f18\uff0c\u4e14\u6a21\u62df\u9762\u8bd5\u8d28\u91cf\u9ad8\uff0c\u9884\u793a\u7740\u5b83\u5728\u672a\u6765\u5728\u7ebf\u62db\u8058\u4e2d\u7684\u5b9e\u9645\u5e94\u7528\u524d\u666f\u5e7f\u9614\u3002|\n", "2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia 
et.al.|[2405.18092](http://arxiv.org/abs/2405.18092)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|**\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e94\u7528\u4e8e\u6570\u5b57\u5b6a\u751f\u8fc7\u7a0b\u6a21\u62df\u7684\u53c2\u6570\u81ea\u52a8\u5316\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5305\u542b\u89c2\u5bdf\u3001\u63a8\u7406\u3001\u51b3\u7b56\u548c\u603b\u7ed3\u56db\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\u3002\u901a\u8fc7\u5b9e\u73b0LLM\u4ee3\u7406\u4e0e\u6a21\u62df\u6a21\u578b\u7684\u52a8\u6001\u4ea4\u4e92\uff0c\u8be5\u7cfb\u7edf\u53ef\u4ee5\u81ea\u52a8\u63a2\u7d22\u53c2\u6570\u8bbe\u7f6e\uff0c\u5229\u7528\u542f\u53d1\u5f0f\u63a8\u7406\u786e\u5b9a\u4e00\u7ec4\u63a7\u5236\u6a21\u62df\u4ee5\u8fbe\u6210\u76ee\u6807\u7684\u53c2\u6570\u3002\u8fd9\u79cd\u65b9\u6cd5\u901a\u8fc7\u6ce8\u5165LLM\u7684\u542f\u53d1\u5f0f\uff0c\u589e\u5f3a\u6a21\u62df\u6a21\u578b\uff0c\u5e76\u652f\u6301\u81ea\u4e3b\u641c\u7d22\u4ee5\u89e3\u51b3\u7528\u6237\u4efb\u52a1\uff0c\u6709\u671b\u63d0\u9ad8\u7528\u6237\u4f53\u9a8c\u5e76\u51cf\u8f7b\u4eba\u7c7b\u7528\u6237\u5728\u590d\u6742\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u8ba4\u77e5\u8d1f\u62c5\u3002\u7814\u7a76\u901a\u8fc7\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u7cfb\u7edf\u7684\u6709\u6548\u6027\u4e0e\u529f\u80fd\uff0c\u5e76\u5728GitHub\u4ed3\u5e93\u63d0\u4f9b\u4e86\u53ef\u89c6\u5316\u7684\u6f14\u793a\u3002**|\n", "2405.17837": "|**2024-05-28**|**Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces**|Qiuyu Lu 
et.al.|[2405.17837](http://arxiv.org/abs/2405.17837)|null|\u5728\u4eba\u673a\u4ea4\u4e92\uff08HCI\uff09\u9886\u57df\uff0c\u4ea4\u4e92\u8bbe\u5907\u7684\u8bbe\u8ba1\u5f00\u53d1\u662f\u5173\u952e\u5173\u6ce8\u70b9\u3002\u968f\u7740\u65b0\u578b\u786c\u4ef6\u548c\u5148\u8fdb\u5236\u9020\u6280\u672f\u7684\u5174\u8d77\uff0c\u5bf9\u80fd\u591f\u7b80\u5316\u539f\u578b\u5236\u4f5c\u8fc7\u7a0b\u7684\u4e13\u95e8\u8bbe\u8ba1\u5de5\u5177\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5de5\u5177\u867d\u7136\u901a\u8fc7\u53c2\u6570\u5316\u8bbe\u8ba1\u548c\u6a21\u62df\u7b80\u5316\u6d41\u7a0b\uff0c\u4f46\u5b66\u4e60\u66f2\u7ebf\u8f83\u9661\uff0c\u4e14\u5728\u6fc0\u53d1\u521b\u65b0\u601d\u7ef4\u65b9\u9762\u6709\u6240\u6b20\u7f3a\u3002\u672c\u7814\u7a76\u4ee5\u6d41\u4f53\u8ba1\u7b97\u754c\u9762\u4e3a\u4f8b\uff0c\u63a2\u8ba8\u5982\u4f55\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u589e\u5f3a\u7269\u7406\u8bbe\u5907\u8bbe\u8ba1\u5de5\u5177\uff0c\u521b\u5efa\u4e00\u4e2a\u751f\u6210\u8bbe\u8ba1\u5de5\u5177\uff08GDT\uff09\u3002\u501f\u52a9LLM\uff0cGDT\u80fd\u591f\u7406\u89e3\u65b0\u8bbe\u5907\u7684\u7279\u6027\u548c\u5c40\u9650\uff0c\u63d0\u51fa\u591a\u6837\u3001\u5bcc\u6709\u6d1e\u5bdf\u529b\u4e14\u5b9e\u7528\u7684\u5e94\u7528\u573a\u666f\uff0c\u63a8\u8350\u6280\u672f\u548c\u60c5\u5883\u9002\u5b9c\u7684\u8bbe\u5907\u8bbe\u8ba1\uff0c\u5e76\u81ea\u52a8\u751f\u6210\u8bbe\u8ba1\u53c2\u6570\uff0c\u4ee5\u4fbf\u4f20\u7edf\u8bbe\u8ba1\u5de5\u5177\u5c55\u793a\u7ed3\u679c\u5e76\u751f\u6210\u52a0\u5de5\u6240\u9700\u7684\u6587\u4ef6\u3002\u672c\u6587\u9610\u8ff0\u4e86GDT\u7684\u6846\u67b6\u3001\u5b9e\u73b0\u548c\u6027\u80fd\uff0c\u5e76\u53cd\u601d\u5176\u524d\u666f\u53ca\u9047\u5230\u7684\u6311\u6218\u3002|\n", "2405.20267": "|**2024-05-30**|**Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions**|Ruochen Zhao 
et.al.|[2405.20267](http://arxiv.org/abs/2405.20267)|**[link](https://github.com/Auto-Arena/Auto-Arena-LLMs)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65e5\u65b0\u6708\u5f02\uff0c\u8feb\u5207\u9700\u8981\u4e00\u79cd\u53ef\u9760\u4e14\u53ca\u65f6\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u9274\u4e8e\u9759\u6001\u57fa\u51c6\u6613\u53d7\u6c61\u67d3\uff0c\u7528\u6237\u5f80\u5f80\u4f9d\u8d56\u4e8e\u50cfChatbot Arena\u8fd9\u6837\u7684\u4eba\u7c7b\u6295\u7968\u5e73\u53f0\u3002\u7136\u800c\uff0c\u4eba\u5de5\u6807\u6ce8\u9700\u8981\u5927\u91cf\u4eba\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u65b0\u6027\u5730\u63d0\u51faAuto-Arena\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u52a8\u5316\u5168\u6d41\u7a0b\u7684LLM\u8bc4\u4f30\u6846\u67b6\u3002\u9996\u5148\uff0c\u7531\u8003\u5b98LLM\u8bbe\u8ba1\u95ee\u9898\uff1b\u63a5\u7740\uff0c\u5019\u9009LLMs\u56f4\u7ed5\u95ee\u9898\u8fdb\u884c\u591a\u8f6e\u76f8\u4e92\u5bf9\u51b3\uff0c\u66b4\u9732\u51fa\u5b83\u4eec\u7684\u771f\u5b9e\u6027\u80fd\u5dee\u8ddd\uff1b\u6700\u540e\uff0c\u7531LLM\u88c1\u5224\u96c6\u4f53\u8ba8\u8bba\u5e76\u51b3\u5b9a\u80dc\u8005\uff0c\u4ece\u800c\u51cf\u5c11\u504f\u89c1\uff0c\u63d0\u5347\u516c\u5e73\u6027\u3002\u6211\u4eec\u5728\u6700\u65b017\u6b3eLLMs\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0cAuto-Arena\u4e0e\u4eba\u7c7b\u504f\u597d\u5177\u6709\u6700\u9ad8\u7684\u76f8\u5173\u6027\uff0c\u4e3a\u66ff\u4ee3\u4eba\u7c7b\u8bc4\u4ef7\u5e73\u53f0\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang 
et.al.|[2405.20189](http://arxiv.org/abs/2405.20189)|null|\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u9610\u8ff0\u4e86\u4e3aNadine\u793e\u4ea4\u673a\u5668\u4eba\u5e73\u53f0\u5f00\u53d1\u667a\u80fd\u548c\u5065\u58ee\u7684\u793e\u4ea4\u673a\u5668\u4eba\u7cfb\u7edf\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u901a\u8fc7\u96c6\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5de7\u5999\u5730\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u7684\u5f3a\u5927\u63a8\u7406\u548c\u6307\u4ee4\u6267\u884c\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u63a5\u8fd1\u4eba\u7c7b\u7684\u611f\u6027\u4e0e\u8ba4\u77e5\u80fd\u529b\u3002\u8fd9\u4e0e\u5f53\u524d\u57fa\u4e8eLLM\u7684\u667a\u80fd\u4f53\u76f8\u6bd4\u662f\u521b\u65b0\u7684\uff0c\u56e0\u4e3a\u5b83\u4eec\u901a\u5e38\u4e0d\u5177\u5907\u4eba\u7c7b\u5f0f\u7684\u957f\u671f\u8bb0\u5fc6\u6216\u590d\u6742\u7684\u60c5\u611f\u8bc4\u4f30\u529f\u80fd\u3002\u793e\u4ea4\u673a\u5668\u4eba\u7684\u81ea\u7136\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u7cfb\u7edf\u5404\u7ec4\u4ef6\u7684\u6027\u80fd\u548c\u534f\u540c\u5de5\u4f5c\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7cfb\u7edf\uff0c\u80fd\u591f\u901a\u8fc7\u591a\u6a21\u6001\u8f93\u5165\u5904\u7406\u751f\u6210\u6070\u5f53\u7684\u884c\u4e3a\uff0c\u6839\u636e\u8bc6\u522b\u5230\u7684\u7528\u6237\u5f15\u5165\u76f8\u5173\u7684\u60c5\u666f\u8bb0\u5fc6\uff0c\u5e76\u6a21\u62df\u673a\u5668\u4eba\u5728\u4e0e\u4eba\u7c7b\u4f19\u4f34\u4e92\u52a8\u8fc7\u7a0b\u4e2d\u4ea7\u751f\u7684\u60c5\u7eea\u72b6\u6001\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u9488\u5bf9\u793e\u4ea4\u673a\u5668\u4eba\u7684LLM-agent\u6846\u67b6\uff0cSoR-ReAct\uff0c\u4f5c\u4e3a\u6211\u4eec\u7cfb\u7edf\u4e2d\u4ea4\u4e92\u6a21\u5757\u7684\u6838\u5fc3\u7ec4\u4ef6\u3002\u8fd9\u4e00\u8bbe\u8ba1\u63a8\u52a8\u4e86\u793e\u4ea4\u673a\u5668\u4eba\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u63d0\u5347\u4eba\u673a\u4ea4\u4e92\u7684\u8d28\u91cf\u3002|\n", "2405.19425": "|**2024-05-29**|**Adaptive 
In-conversation Team Building for Language Model Agents**|Linxin Song et.al.|[2405.19425](http://arxiv.org/abs/2405.19425)|null|\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\uff0c\u5229\u7528\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u524d\u666f\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u7279\u5b9a\u5e94\u7528\u8bbe\u8ba1\u6709\u6548\u7684\u591a\u4ee3\u7406\u56e2\u961f\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u52a8\u6001\u56e2\u961f\u6784\u5efa\u8303\u5f0f\uff0c\u540d\u4e3a\u201cCaptain Agent\u201d\u3002\u5b83\u901a\u8fc7\u521b\u65b0\u7684Agent\u8bbe\u8ba1\uff0c\u80fd\u591f\u81ea\u9002\u5e94\u5730\u4e3a\u6bcf\u4e2a\u95ee\u9898\u89e3\u51b3\u6b65\u9aa4\u7ec4\u5efa\u548c\u7ba1\u7406\u56e2\u961f\uff0c\u5229\u7528\u5d4c\u5957\u7fa4\u804a\u548c\u53cd\u601d\u673a\u5236\u786e\u4fdd\u591a\u5143\u5316\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u9632\u6b62\u523b\u677f\u8f93\u51fa\u3002\u8fd9\u79cd\u65b9\u6cd5\u63d0\u4f9b\u4e86\u7075\u6d3b\u4f46\u7ed3\u6784\u5316\u7684\u89e3\u51b3\u95ee\u9898\u65b9\u5f0f\uff0c\u6709\u52a9\u4e8e\u51cf\u5c11\u5197\u4f59\uff0c\u589e\u5f3a\u8f93\u51fa\u591a\u6837\u6027\u3002\u5728\u516d\u4e2a\u5b9e\u9645\u573a\u666f\u4e2d\u7684\u5168\u9762\u8bc4\u4f30\u663e\u793a\uff0cCaptain Agent\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u591a\u4ee3\u7406\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e8621.94%\uff0c\u5e76\u4e14\u65e0\u9700\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u7e41\u7410\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.01422": "|**2024-06-03**|**How to Understand Whole Software Repository?**|Yingwei Ma et.al.|[2406.01422](http://arxiv.org/abs/2406.01422)|null|
\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u5728\u81ea\u52a8\u8f6f\u4ef6\u5de5\u7a0b\uff08ASE\uff09\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u5c3d\u7ba1\u73b0\u6709\u65b9\u6cd5\u5df2\u8bc1\u5b9e\u6709\u6548\uff0c\u4f46\u5b83\u4eec\u7684\u8bbe\u8ba1\u4e3b\u8981\u4fa7\u91cd\u4e8e\u4ee3\u7801\u7684\u5c40\u90e8\u4fe1\u606f\uff0c\u5982\u95ee\u9898\u3001\u7c7b\u548c\u51fd\u6570\uff0c\u8fd9\u9650\u5236\u4e86\u5bf9\u8f6f\u4ef6\u7cfb\u7edf\u5168\u5c40\u4e0a\u4e0b\u6587\u548c\u4f9d\u8d56\u5173\u7cfb\u7684\u7406\u89e3\u3002\u6839\u636e\u8f6f\u4ef6\u5f00\u53d1\u4eba\u5458\u7684\u5b9e\u9645\u7ecf\u9a8c\uff0c\u6211\u4eec\u8ba4\u4e3a\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u662f\u8fc8\u5411ASE\u7684\u5173\u952e\u3002\u7136\u800c\uff0c\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u5e26\u6765\u4e86\u8bf8\u591a\u6311\u6218\uff0c\u4f8b\u5982\uff1a\u957f\u4ee3\u7801\u8f93\u5165\u3001\u566a\u58f0\u4ee3\u7801\u4fe1\u606f\u3001\u590d\u6742\u4f9d\u8d56\u5173\u7cfb\u7b49\u3002 \u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u7814\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aRepoUnderstander\u7684\u65b0ASE\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5bfc\u4ee3\u7406\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u3002\u9996\u5148\uff0c\u6211\u4eec\u91c7\u7528\u81ea\u4e0a\u800c\u4e0b\u7684\u65b9\u5f0f\u5c06\u6574\u4e2a\u4ed3\u5e93\u7684\u5173\u952e\u4fe1\u606f\u538b\u7f29\u5230\u77e5\u8bc6\u56fe\u8c31\u4e2d\uff0c\u4ee5\u964d\u4f4e\u590d\u6742\u6027\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u79cd\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08Monte Carlo Tree Search, 
MCTS\uff09\u4e3a\u57fa\u7840\u7684\u4ed3\u5e93\u63a2\u7d22\u7b56\u7565\uff0c\u8d4b\u4e88\u4ee3\u7406\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u66f4\u597d\u5730\u5229\u7528\u4ed3\u5e93\u7ea7\u522b\u7684\u77e5\u8bc6\uff0c\u6211\u4eec\u6307\u5bfc\u4ee3\u7406\u8fdb\u884c\u603b\u7ed3\u3001\u5206\u6790\u548c\u89c4\u5212\uff0c\u7136\u540e\u4ed6\u4eec\u53ef\u4ee5\u5229\u7528\u5de5\u5177\u52a8\u6001\u83b7\u53d6\u4fe1\u606f\u5e76\u751f\u6210\u4fee\u590d\u5b9e\u9645GitHub\u95ee\u9898\u7684\u8865\u4e01\u3002 \u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cRepoUnderstander\u5177\u6709\u4f18\u8d8a\u6027\u548c\u6709\u6548\u6027\u3002\u5728SWE-bench Lite\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4e0eSWE-agent\u76f8\u6bd4\uff0c\u5b83\u5b9e\u73b0\u4e8618.5%\u7684\u76f8\u5bf9\u63d0\u5347\u3002|\n", "2406.01364": "|**2024-06-03**|**BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards**|Diego Dorn et.al.|[2406.01364](http://arxiv.org/abs/2406.01364)|null|\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u673a\u5236\u88ab\u7528\u4e8e\u68c0\u6d4b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7cfb\u7edf\u7684\u5f02\u5e38\u8f93\u51fa\u3002\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u5728\u5b9e\u65f6\u76d1\u63a7\u3001\u79bb\u7ebf\u8bc4\u4f30\u548c\u5185\u5bb9\u5ba1\u6838\u7b49\u5173\u952e\u5e94\u7528\u4e2d\u53d1\u6325\u6838\u5fc3\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u7edf\u4e00\u7684\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u5b83\u4eec\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5b89\u5168\u9632\u62a4\u57fa\u51c6\u201d\uff08Benchmarks for the Evaluation of LLM Safeguards\uff0c\u7b80\u79f0BELLS\uff09\uff0c\u5b83\u662f\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u6d4b\u8bd5\u96c6\u5408\uff0c\u5206\u4e3a\u4e09\u4e2a\u7c7b\u522b\uff1a(1) 
\u5efa\u7acb\u6027\u6545\u969c\u6d4b\u8bd5\uff0c\u57fa\u4e8e\u5df2\u5b58\u5728\u7684\u9488\u5bf9\u660e\u786e\u6545\u969c\u6a21\u5f0f\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u6bd4\u8f83\u5f53\u524d\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u7684\u6548\u80fd\uff1b(2) \u65b0\u5174\u6545\u969c\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8861\u91cf\u5bf9\u672a\u89c1\u8fc7\u7684\u6545\u969c\u6a21\u5f0f\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ee5\u4fc3\u8fdb\u66f4\u901a\u7528\u9632\u62a4\u673a\u5236\u7684\u53d1\u5c55\uff1b(3) \u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u9488\u5bf9\u66f4\u590d\u6742\u7684\u67b6\u6784\uff08\u5982LLM\u4ee3\u7406\u548c\u591a\u4ee3\u7406\u7cfb\u7edf\uff09\uff0c\u76ee\u6807\u662f\u63a8\u52a8\u9002\u7528\u4e8e\u672a\u6765\u5c1a\u672a\u5b58\u5728\u4e13\u95e8\u9632\u62a4\u7684\u5e94\u7528\u7684\u5b89\u5168\u9632\u62a4\u6280\u672f\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u5e76\u5206\u4eab\u4e86\u7b2c\u4e00\u4e2a\u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u4f7f\u7528MACHIAVELLI\u73af\u5883\uff0c\u5e76\u63d0\u4f9b\u4e86\u6570\u636e\u96c6\u7684\u4ea4\u4e92\u5f0f\u53ef\u89c6\u5316\u3002|\n", "2406.00936": "|**2024-06-03**|**A Survey of Useful LLM Evaluation**|Ji-Lun Peng 
et.al.|[2406.00936](http://arxiv.org/abs/2406.00936)|null|\u7531\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u7814\u7a76\u9886\u57df\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u5bf9\u5b83\u4eec\u7684\u80fd\u529b\u8bc4\u4f30\u65b9\u6cd5\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4ee5\u786e\u5b9a\u5176\u5408\u9002\u7684\u4efb\u52a1\u548c\u8d23\u4efb\u3002\u672c\u6587\u4e3b\u8981\u63a2\u8ba8\u5982\u4f55\u6709\u6548\u5730\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u5de5\u5177\uff0c\u5e76\u63d0\u51fa\u4e00\u4e2a\u4e24\u9636\u6bb5\u6846\u67b6\uff1a\u4ece\u201c\u6838\u5fc3\u80fd\u529b\u201d\u5230\u201c\u4ee3\u7406\u201d\u3002\u9996\u5148\uff0c\u6838\u5fc3\u80fd\u529b\u6307\u7684\u662f\u5927\u8bed\u8a00\u6a21\u578b\u751f\u6210\u9ad8\u8d28\u91cf\u6587\u672c\u6240\u5fc5\u9700\u7684\u7279\u6027\uff0c\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u80fd\u529b\u540e\uff0c\u5b83\u4eec\u80fd\u591f\u5904\u7406\u73b0\u5b9e\u4e16\u754c\u7684\u590d\u6742\u4efb\u52a1\uff0c\u626e\u6f14\u4ee3\u7406\u89d2\u8272\u3002\u5728\u201c\u6838\u5fc3\u80fd\u529b\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\u3001\u793e\u4f1a\u5f71\u54cd\u4ee5\u53ca\u9886\u57df\u77e5\u8bc6\u3002\u800c\u5728\u201c\u4ee3\u7406\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5177\u8eab\u884c\u52a8\u3001\u89c4\u5212\u548c\u5de5\u5177\u5b66\u4e60\u65b9\u9762\u7684\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f53\u524d\u5927\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u65b9\u6cd5\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2406.01637": "|**2024-06-02**|**Teams of LLM Agents can Exploit Zero-Day Vulnerabilities**|Richard Fang 
et.al.|[2406.01637](http://arxiv.org/abs/2406.01637)|null|\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\u7684\u590d\u6742\u6027\u4e0d\u65ad\u63d0\u9ad8\uff0c\u7814\u7a76\u8005\u53d1\u73b0\uff0c\u5f53\u63d0\u4f9b\u6f0f\u6d1e\u63cf\u8ff0\u548c\u7b80\u5355\u7684\u593a\u65d7\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5229\u7528\u5b9e\u9645\u5b58\u5728\u7684\u6f0f\u6d1e\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4e8b\u5148\u672a\u77e5\u7684\u96f6\u65e5\u6f0f\u6d1e\uff08\u5373\u653b\u51fb\u8005\u638c\u63e1\u800c\u5b89\u5168\u8f6f\u4ef6\u4f9b\u5e94\u5546\u8fd8\u672a\u4fee\u8865\u7684\u6f0f\u6d1e\uff09\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4ecd\u7136\u4e0d\u4f73\u3002\u672c\u6587\u5c55\u793a\u4e86\uff0c\u901a\u8fc7\u56e2\u961f\u5408\u4f5c\uff0c\u591a\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u653b\u51fb\u73b0\u5b9e\u4e16\u754c\u7684\u96f6\u65e5\u6f0f\u6d1e\u3002\u5355\u72ec\u7684\u4ee3\u7406\u5728\u63a2\u7d22\u4f17\u591a\u6f0f\u6d1e\u548c\u8fdb\u884c\u957f\u671f\u89c4\u5212\u65f6\u9762\u4e34\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HPTSA\u7cfb\u7edf\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u80fd\u8c03\u5ea6\u5b50\u4ee3\u7406\u7684\u8ba1\u5212\u4ee3\u7406\u3002\u8ba1\u5212\u4ee3\u7406\u8d1f\u8d23\u63a2\u7d22\u7cfb\u7edf\u5e76\u51b3\u5b9a\u4f7f\u7528\u54ea\u4e2a\u5b50\u4ee3\u7406\u6765\u5c1d\u8bd5\u4e0d\u540c\u7684\u6f0f\u6d1e\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u957f\u671f\u89c4\u5212\u7684\u95ee\u9898\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u5305\u542b15\u4e2a\u771f\u5b9e\u4e16\u754c\u6f0f\u6d1e\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u56e2\u961f\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u63d0\u9ad8\u4e864.5\u500d\u3002|\n", "2406.00583": "|**2024-06-02**|**CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems**|Yanlin Feng 
et.al.|[2406.00583](http://arxiv.org/abs/2406.00583)|**[link](https://github.com/megagonlabs/CMDBench)**|Compound AI systems (CASs), which use LLMs as agents that interact with tools and data retrievers to carry out knowledge-intensive tasks, have attracted broad attention in the database and AI communities. Although these systems could augment the typical analysis workflows of data analysts on enterprise data platforms, CASs face the same data discovery challenges as analysts: multimodal data sources created by different teams and departments within an organization are siloed, which makes it hard to find the right sources for the task at hand. Existing data discovery benchmarks do not adequately model this multimodality and diversity of data sources. Moreover, existing CAS benchmarks focus on end-to-end task performance and neglect data discovery performance. To advance research on the data discovery performance of multimodal data retrievers in CASs under realistic conditions, we propose CMDBench, a benchmark that models the complexity of enterprise data platforms. We adapt existing open-domain datasets and benchmarks for question answering, complex reasoning, and natural-language querying over structured data to evaluate coarse-grained and fine-grained data discovery as well as task execution performance. Our experiments reveal the effect of data retriever design on downstream task performance: task accuracy drops by 46% on average. The results indicate a need for optimization strategies that identify suitable LLM agents and retrievers so that CASs can run efficiently on enterprise data. In short, CMDBench jointly evaluates data discovery and task execution, offering a framework for improving multimodal data retrievers in compound AI systems.|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn 
et.al.|[2406.00244](http://arxiv.org/abs/2406.00244)|null|As pretrained LLMs become more generally capable, interest is growing in using them as in-context learning agents. In these settings, a model must form beliefs about how to achieve its goal from limited interaction with the environment and handle uncertainty at every decision step. This paper studies how LLMs form and act on such beliefs through experiments on controlled sequential decision-making tasks. We first find that LLM agents are overconfident: they draw strong conclusions about which actions to take without sufficient evidence, which leads to too little exploration. Further analysis shows that this behavior stems from a collapse in the entropy of the action distribution obtained by sampling from the LLM. We then show that existing token-level sampling techniques are by themselves not enough to make the agent explore more. We therefore propose Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. EAST computes an entropy-weighted combination of representations and intervenes on the model's activations during the forward pass to adjust the model's uncertainty over actions, which encourages exploratory behavior. Finally, EAST changes the subjective uncertainty the LLM expresses while making decisions, offering a way to understand and control how the model represents uncertainty about its decisions.|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222](http://arxiv.org/abs/2406.00222)|null|LLMs aligned through reinforcement learning from human feedback (RLHF) have quickly become the dominant approach for building intelligent conversational assistants. Yet despite strong results on many benchmarks, LLM-based agents still lack conversational skills such as disambiguation: when faced with ambiguity, general-purpose assistants tend to overhedge or guess the user's true intent rather than ask a clarifying question, and in task-specific settings high-quality conversation samples are often scarce, which limits a model's ability to learn an optimal dialogue action policy. We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO) that enables sample-efficient dialogue policy learning in multi-turn conversation. We demonstrate the effectiveness of ACT on three challenging conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a new task for disambiguating information-seeking requests for text-to-SQL generation. We also propose evaluating LLMs as conversational agents by testing whether they can recognize and reason about ambiguity in conversation. ACT shows substantial conversation modeling improvements over standard supervised fine-tuning and DPO.|\n", "2406.00215": "|**2024-05-31**|**Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent**|Jie JW Wu 
et.al.|[2406.00215](http://arxiv.org/abs/2406.00215)|**[link](https://github.com/jie-jw-wu/human-eval-comm)**|LLMs have significantly improved at code generation, but a gap remains between them and top-tier software engineers. Because top-tier software engineers often ask clarifying questions to reduce ambiguity in requirements and coding solutions, we argue that LLMs should have the same communication skill for code generation tasks. We therefore conduct an empirical study of the communication skills of LLMs, defined as \u201cbeing able to ask clarifying questions when the description of a code generation problem has issues\u201d. We create a new benchmark, HumanEvalComm, by modifying problem descriptions along three dimensions: inconsistency, ambiguity, and incompleteness. We define new evaluation metrics such as Communication Rate and Good Question Rate, and run experiments on HumanEvalComm with different Code LLMs and with Okanagan, a new LLM agent approach that raises questions based on the code and the description in order to further refine the generated code. Finally, we discuss the results by comparing Code LLMs with Okanagan.|\n", "2406.03299": "|**2024-06-05**|**The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games**|Mikhail Mozikov et.al.|[2406.03299](http://arxiv.org/abs/2406.03299)|null|Behavioral experiments play an important role in modeling society and understanding human interaction. In practice, however, such experiments often face challenges of internal validity, external validity, reproducibility, and social bias because human social interaction and cooperation are complex. Recent advances in LLMs give researchers a new tool for simulating human behavior. Existing LLM-based simulations assume that models behave like humans but overlook a key factor in human decision-making: emotion. This paper proposes a novel methodology and framework for studying the decision-making of LLMs and how well their behavior under emotional states aligns with human behavior. Using GPT-3.5 and GPT-4 in two types of behavioral economics games (game-theoretic experiments), we find that emotions significantly affect LLM performance and lead the models to develop more optimal strategies. While GPT-3.5 corresponds strongly with the behavior of human participants, especially in bargaining games, GPT-4 behaves consistently and appears to make rational decisions unaffected by induced emotions. Surprisingly, emotional prompting, especially with anger, can break GPT-4's \u201csuperhuman\u201d consistency and make its responses resemble human emotional responses.|\n", "2406.03007": "|**2024-06-05**|**BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents**|Yifei Wang 
et.al.|[2406.03007](http://arxiv.org/abs/2406.03007)|**[link](https://github.com/dpamk/badagent)**|**With the rise of LLMs, powerful LLM-based intelligent agents have been developed that offer customized services, built on trained LLMs and fine-tuned with task-specific data. State-of-the-art methods for building LLM agents adopt pretrained LLMs and further fine-tune them on agent tasks. However, we show that these methods are vulnerable to a new backdoor attack, BadAgent, which embeds a backdoor across various agent tasks by fine-tuning on backdoor data. At test time, the attacker can manipulate the deployed LLM agent into performing harmful operations by presenting the trigger in the agent's input or environment. Surprisingly, our attack remains extremely robust even after fine-tuning on trustworthy data. Although backdoor attacks have been studied extensively in natural language processing, to the best of our knowledge we may be the first to study them on LLM agents, which are more dangerous because they are permitted to use external tools. Our work makes clear the risk of building LLM agents on untrusted LLMs or data. Our code is available at [https://github.com/DPamK/BadAgent](https://github.com/DPamK/BadAgent).**|\n", "2406.04151": "|**2024-06-06**|**AgentGym: Evolving Large Language Model-based Agents across Diverse Environments**|Zhiheng Xi et.al.|[2406.04151](http://arxiv.org/abs/2406.04151)|**[link](https://github.com/woooodyy/agentgym)**|**Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal of the AI community. LLMs are considered a promising foundation for such agents because of their general capabilities. Current approaches either rely on human supervision and have LLM-based agents imitate expert-provided trajectories step by step, which is hard to scale and limits environmental exploration, or let agents explore and learn in isolated environments, which yields specialists with limited generalization. This paper takes a first step toward building generally capable LLM-based agents with self-evolution ability. We identify three key ingredients: 1) diverse environments for agent exploration and learning; 2) a trajectory set that equips agents with basic capabilities and prior knowledge; and 3) an effective and scalable evolution method. We propose AgentGym, a new framework with a rich set of environments and tasks that supports broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database of expanded instructions, a benchmark suite, and high-quality trajectories across environments. We then develop AgentEvol, a novel method for investigating the potential of agent self-evolution beyond previously seen data, across tasks and environments. Experimental results show that the evolved agents can achieve performance comparable to state-of-the-art models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations, at https://github.com/WooooDyy/AgentGym.**|\n", "2406.04692": "|**2024-06-07**|**Mixture-of-Agents Enhances Large Language Model Capabilities**|Junlin Wang 
et.al.|[2406.04692](http://arxiv.org/abs/2406.04692)|null|Recent LLMs have advanced markedly and show strong capabilities in natural language understanding and generation. As the number of LLMs grows, how to harness the collective expertise of multiple models is an exciting research direction. To this end, we propose Mixture-of-Agents (MoA), a new approach. Our architecture is layered, with each layer comprising multiple LLM agents. Each agent takes the outputs of all agents in the previous layer as auxiliary information when generating its response. With this strategy, MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench, and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs leads AlpacaEval 2.0 with a score of 65.1%, compared with 57.5% for GPT-4 Omni.|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|Despite the spread of wearable health trackers and the well-known importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because it requires open-ended analysis of large amounts of data. The rise of LLMs that can use tools to reason about and interact with the world offers a promising way to enable such personalized analysis at scale. Yet the application of LLMs to personal health remains largely unexplored. This paper introduces the Personal Health Insights Agent (PHIA), a system that uses state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data. We curate two benchmark question-answering datasets of over 4,000 health insight questions. Based on 650 hours of human and expert evaluation, PHIA accurately answers over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work matters for advancing behavioral health across the population: it could enable individuals to interpret their own wearable data and open a new era of accessible, personalized wellness regimens guided by data-driven insights.|\n", "2406.05925": "|**2024-06-09**|**Hello Again! LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925](http://arxiv.org/abs/2406.05925)|**[link](https://github.com/leolee99/ld-agent)**|**Open-domain dialogue systems have advanced markedly with the development of LLMs. Most existing systems, however, focus on brief single-session interactions and neglect the real-world demand for long-term companionship and personalized chatbots. Meeting this demand requires event summarization and persona management, which enable reasoning for appropriate long-term dialogue responses. Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could substantially enhance automated perception, decision-making, and problem-solving. We therefore propose a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which comprises three independently tunable modules for event perception, persona extraction, and response generation. The event memory module uses long-term and short-term memory banks to focus on historical and ongoing sessions respectively, and introduces a topic-based retrieval mechanism to improve the accuracy of memory retrieval. The persona module performs dynamic persona modeling for both users and agents. Finally, the generator integrates the retrieved memories and extracted personas to produce appropriate responses. We empirically demonstrate the effectiveness, generality, and cross-domain capability of LD-Agent across various illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.**|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804](http://arxiv.org/abs/2406.05804)|**[link](https://github.com/xinzhel/llm-agent-survey)**|Recent advances in LLMs have driven the development of sophisticated agentic workflows that improve on traditional single-path Chain-of-Thought (CoT) prompting. This survey summarizes common workflows, focusing on LLM-Profiled Components (LMPCs) and setting aside non-LLM components. The aim is to give a clearer understanding of the roles LLMs play and to explore how reusable LMPCs are.|\n", "2406.07275": "|**2024-06-11**|**DCA-Bench: A Benchmark for Dataset Curation Agents**|Benhao Huang 
et.al.|[2406.07275](http://arxiv.org/abs/2406.07275)|**[link](https://github.com/TRAIS-Lab/dca-bench)**|Dataset quality is increasingly critical as AI research and development advances. Despite the abundance of open dataset platforms, data quality issues such as insufficient documentation, inaccurate annotations, and ethical concerns remain common. These issues are often hard to detect with rule-based scripts and demand substantial manual effort from users or maintainers to identify and verify. LLMs hold promise for streamlining dataset curation. We therefore propose DCA-Bench, a dataset curation agent benchmark that evaluates how well LLMs detect hidden data quality issues. We collect diverse real-world issues from eight open dataset platforms as a testbed. To build a pipeline that automatically evaluates whether an LLM succeeds, we design a dedicated LLM-based evaluator. Experiments show that the LLM-based evaluator agrees closely with human evaluation, which enables reliable automatic evaluation. We also run experiments on several baseline LLMs; the results show the complexity of the task and indicate that applying LLMs to real-world dataset curation still needs further exploration and innovation. The benchmark can also serve as a testbed for measuring the ability of LLMs to discover problems rather than only solve them. The benchmark suite is available at https://github.com/TRAIS-Lab/dca-bench.|\n", "2406.07217": "|**2024-06-11**|**A Synthetic Dataset for Personal Attribute Inference**|Hanna Yukhymenko et.al.|[2406.07217](http://arxiv.org/abs/2406.07217)|**[link](https://github.com/eth-sri/synthpai)**|**Powerful LLMs have recently become accessible to hundreds of millions of users worldwide, but their strong capabilities and broad world knowledge also bring privacy risks. This work focuses on an emerging privacy threat posed by LLMs: accurately inferring personal information from online text. LLM-based author profiling research lacks suitable public datasets, largely because of the ethical and privacy concerns around real personal data, so we take two steps: (i) we build a simulation framework for the popular social media platform Reddit, populated with synthetic personal profiles; (ii) using this framework, we generate SynthPAI, a diverse synthetic dataset of over 7,800 comments manually labeled for personal attributes. We validate the dataset with a human study, which shows that humans barely outperform random guessing at distinguishing real comments from synthetic ones. We further show that the dataset supports meaningful personal attribute inference research: across 18 state-of-the-art LLMs, our synthetic comments lead to the same conclusions as real-world data. Together, our dataset and pipeline provide a strong, privacy-preserving basis for future research on understanding and mitigating the inference-based privacy threats that LLMs pose.**|\n", "2406.07021": "|**2024-06-11**|**A Tool for Test Case Scenarios Generation Using Large Language Models**|Abdul Malik Sami 
et.al.|[2406.07021](http://arxiv.org/abs/2406.07021)|null|LLMs are widely used in software engineering (SE) for tasks such as generating code, designing and documenting software, adding code comments, reviewing code, and writing test scripts. However, creating test scripts or automating test cases requires test suite documentation that comprehensively covers the functional requirements. Such documentation must enable thorough testing within limited time and scope, particularly as requirements and user demands change. This article focuses on generating epics and high-level user stories from user requirements and then designing test case scenarios based on them. It introduces a web-based software tool that uses an LLM-based agent and prompt engineering to automate the generation of test case scenarios from user requirements.|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho 
et.al.|[2406.06947](http://arxiv.org/abs/2406.06947)|**[link](https://github.com/caap-agent/caap-agent)**|**\u957f\u671f\u4ee5\u6765\uff0c\u8f6f\u4ef6\u673a\u5668\u4eba\u5df2\u7ecf\u5728\u673a\u5668\u4eba\u6d41\u7a0b\u81ea\u52a8\u5316\uff08RPA\uff09\u4e2d\u7528\u4e8e\u6267\u884c\u67af\u71e5\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5148\u8fdb\u63a8\u7406\u80fd\u529b\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u4ee3\u7406\u73b0\u5728\u80fd\u591f\u5904\u7406\u66f4\u590d\u6742\u751a\u81f3\u524d\u6240\u672a\u89c1\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5f53\u524d\u6587\u732e\u4e2d\u7684\u57fa\u4e8eLLM\u7684\u81ea\u52a8\u5316\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8eHTML\u6e90\u4ee3\u7801\u4f5c\u4e3a\u8f93\u5165\uff0c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u975e\u7f51\u7edc\u73af\u5883\u7684\u5e94\u7528\u3002HTML\u4ee3\u7801\u4e2d\u7684\u4fe1\u606f\u5e38\u5e38\u4e0d\u51c6\u786e\u6216\u4e0d\u5b8c\u6574\uff0c\u8fd9\u964d\u4f4e\u4e86\u4ee3\u7406\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4ec5\u57fa\u4e8e\u5c4f\u5e55\u622a\u56fe\u7684LLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u5b83\u4e13\u6ce8\u4e8e\u8bc6\u522b\u73af\u5883\uff0c\u5e76\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u6765\u6d88\u9664\u5bf9\u5927\u91cf\u4eba\u7c7b\u6f14\u793a\u6570\u636e\u7684\u9700\u6c42\u3002\u6211\u4eec\u7684\u7b56\u7565\u540d\u4e3a\u201c\u4e0a\u4e0b\u6587\u611f\u77e5\u884c\u52a8\u89c4\u5212\u201d\uff08Context-Aware Action 
Planning\uff0cCAAP\uff09\u63d0\u793a\uff0c\u9f13\u52b1\u4ee3\u7406\u4ece\u591a\u4e2a\u89d2\u5ea6\u4ed4\u7ec6\u5ba1\u67e5\u4e0a\u4e0b\u6587\u3002\u901a\u8fc7\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u572867\u79cdMiniWoB++\u95ee\u9898\u4e0a\u5b9e\u73b0\u4e8694.4%\u7684\u6210\u529f\u7387\uff0c\u6bcf\u4e2a\u95ee\u9898\u7c7b\u578b\u53ea\u97001.48\u6b21\u6f14\u793a\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u7279\u522b\u662f\u5728\u9700\u8981\u5728\u8ba1\u7b97\u673a\u6216\u667a\u80fd\u624b\u673a\u4e4b\u95f4\u8fdb\u884c\u8de8\u5e94\u7528\u534f\u8c03\u7684\u4efb\u52a1\u4e0a\uff0c\u6807\u5fd7\u7740\u81ea\u52a8\u5316\u4ee3\u7406\u9886\u57df\u7684\u91cd\u5927\u8fdb\u6b65\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728https://github.com/caap-agent/caap-agent\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli et.al.|[2406.06613](http://arxiv.org/abs/2406.06613)|**[link](https://github.com/Joshuaclymer/GameBench)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5728\u8bb8\u591a\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u5c11\u91cf\u6837\u672c\u6027\u80fd\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5c55\u793a\u8fc7\u5728\u590d\u6742\u7b56\u7565\u573a\u666f\u4e2d\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f46\u7f3a\u4e4f\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5728\u6e38\u620f\u4e2d\u7684\u5404\u79cd\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86GameBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u8de8\u9886\u57df\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6218\u7565\u601d\u7ef4\u80fd\u529b\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e9\u4e2a\u4e0d\u540c\u7684\u6e38\u620f\u73af\u5883\uff0c\u6bcf\u4e2a\u6e38\u620f\u8
1f3\u5c11\u6db5\u76d6\u4e00\u79cd\u5728\u7b56\u7565\u6e38\u620f\u4e2d\u8bc6\u522b\u51fa\u7684\u5173\u952e\u63a8\u7406\u6280\u80fd\uff0c\u5e76\u9009\u62e9\u90a3\u4e9b\u6218\u7565\u89e3\u91ca\u4e0d\u592a\u53ef\u80fd\u6784\u6210\u6a21\u578b\u9884\u8bad\u7ec3\u6570\u636e\u4e3b\u8981\u90e8\u5206\u7684\u6e38\u620f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4f7f\u7528\u4e86\u57fa\u7840\u5f62\u5f0f\u7684GPT-3\u548cGPT-4\uff0c\u4ee5\u53ca\u4e24\u4e2a\u65e8\u5728\u589e\u5f3a\u6218\u7565\u63a8\u7406\u80fd\u529b\u7684\u5f15\u5bfc\u6846\u67b6\uff1aChain-of-Thought\uff08CoT\uff09\u63d0\u793a\u548cReasoning Via Planning\uff08RAP\uff09\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6240\u6709\u6d4b\u8bd5\u6a21\u578b\u7684\u8868\u73b0\u90fd\u6ca1\u6709\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\uff0c\u6700\u5dee\u7684\u662fGPT-4\u7684\u8868\u73b0\u751a\u81f3\u4f4e\u4e8e\u968f\u673a\u884c\u52a8\u3002CoT\u548cRAP\u90fd\u63d0\u9ad8\u4e86\u5206\u6570\uff0c\u4f46\u4ecd\u8fdc\u672a\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\u3002**|\n", "2406.08184": "|**2024-06-12**|**MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents**|Luyuan Wang 
et.al.|[2406.08184](http://arxiv.org/abs/2406.08184)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u624b\u673a\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4e0a\u7684\u76f4\u63a5\u4ea4\u4e92\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u4ee5\u53ca\u5b83\u4eec\u5728\u81ea\u4e3b\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u57fa\u4e8eLLMs\u7684\u79fb\u52a8\u4ee3\u7406\u6b63\u9010\u6e10\u53d7\u5230\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5e94\u7528\u7a0b\u5e8f\u7684\u65e0\u9650\u72b6\u6001\u548c\u53ef\u884c\u52a8\u4f5c\u5e8f\u5217\u7684\u6a21\u7cca\u5b9a\u4e49\uff0c\u5bf9\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u6027\u80fd\u7684\u57fa\u51c6\u7814\u7a76\u76f8\u5bf9\u532e\u4e4f\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u7528\u6237\u53cb\u597d\u7684\u57fa\u51c6\u5de5\u5177\u2014\u2014MobileAgentBench\uff0c\u65e8\u5728\u51cf\u8f7b\u7e41\u7410\u7684\u624b\u52a8\u6d4b\u8bd5\u8d1f\u62c5\u3002\u6211\u4eec\u9996\u5148\u5b9a\u4e49\u4e86\u6db5\u76d610\u4e2a\u5f00\u6e90\u5e94\u7528\u7684100\u9879\u4efb\u52a1\uff0c\u6309\u96be\u5ea6\u5206\u4e3a\u591a\u4e2a\u7ea7\u522b\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5305\u62ecAppAgent\u548cMobileAgent\u5728\u5185\u7684\u591a\u4e2a\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u5168\u9762\u7cfb\u7edf\u5730\u6bd4\u8f83\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6240\u6709\u76f8\u5173\u6750\u6599\u5747\u53ef\u5728\u6211\u4eec\u7684\u9879\u76ee\u7f51\u7ad9https://MobileAgentBench.github.io\u4e0a\u83b7\u53d6\uff0c\u8fd9\u5c06\u63a8\u52a8\u5b66\u672f\u548c\u5de5\u4e1a\u9886\u57df\u7684\u8fdb\u6b65\u3002|\n", "2406.07973": "|**2024-06-12**|**Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey**|Shang Wang 
et.al.|[2406.07973](http://arxiv.org/abs/2406.07973)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u8fd9\u4e9b\u6a21\u578b\u901a\u8fc7\u5927\u91cf\u6570\u636e\u8bad\u7ec3\uff0c\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u673a\u5668\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u7b49\u5404\u79cd\u5e94\u7528\u3002\u7136\u800c\uff0cLLMs\u5728\u5176\u751f\u547d\u5468\u671f\u4e2d\u66b4\u9732\u51fa\u4e00\u7cfb\u5217\u9690\u79c1\u548c\u5b89\u5168\u95ee\u9898\uff0c\u8fd9\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u8fd9\u4e9b\u95ee\u9898\u4e0e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\u5177\u6709\u72ec\u7279\u6027\uff0c\u9274\u4e8e\u5f53\u524d\u7684\u7efc\u8ff0\u7f3a\u4e4f\u9488\u5bf9\u4e0d\u540c\u573a\u666f\u7684\u6e05\u6670\u5a01\u80c1\u5206\u7c7b\uff0c\u6211\u4eec\u6839\u636e\u4e94\u4e2a\u573a\u666f\uff1a\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001RAG\u7cfb\u7edf\u3001\u90e8\u7f72\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5f3a\u8c03\u4e86\u72ec\u7279\u7684\u98ce\u9669\u3002\u8003\u8651\u5230\u6bcf\u79cd\u5a01\u80c1\u7684\u7279\u6027\uff0c\u672c\u8c03\u67e5\u63d0\u4f9b\u4e86\u6f5c\u5728\u5a01\u80c1\u548c\u5e94\u5bf9\u7b56\u7565\u3002\u7814\u7a76LLMs\u6240\u9762\u4e34\u7684\u653b\u51fb\u548c\u9632\u5fa1\u60c5\u51b5\uff0c\u53ef\u4ee5\u4e3a\u66f4\u591a\u9886\u57df\u63d0\u4f9b\u53ef\u884c\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4f7f\u66f4\u591a\u4eba\u80fd\u591f\u53d7\u76ca\u4e8eLLMs\u3002|\n", "2406.07914": "|**2024-06-14**|**Can Large Language Models Understand Spatial Audio?**|Changli Tang 
et.al.|[2406.07914](http://arxiv.org/abs/2406.07914)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u638c\u63e1\u591a\u901a\u9053\u97f3\u9891\u4e2d\u7684\u7a7a\u95f4\u4fe1\u606f\uff0c\u8fd9\u662f\u5f53\u524d\u542c\u89c9LLMs\u6240\u7f3a\u4e4f\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5229\u7528LLMs\u7684\u9ad8\u7ea7\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u97f3\u9891\u63d0\u5347\u6a21\u578b\u5bf9\u4e09\u7ef4\u73af\u5883\u7684\u7406\u89e3\u3002\u7814\u7a76\u6d89\u53ca\u4e09\u9879\u7a7a\u95f4\u97f3\u9891\u4efb\u52a1\uff1a\u58f0\u6e90\u5b9a\u4f4d\uff08SSL\uff09\u3001\u8fdc\u573a\u8bed\u97f3\u8bc6\u522b\uff08FSR\uff09\u548c\u57fa\u4e8e\u4f4d\u7f6e\u7684\u8bed\u97f3\u63d0\u53d6\uff08LSE\uff09\uff0c\u5728\u6bcf\u4e2a\u4efb\u52a1\u4e0a\u90fd\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u5728SSL\u65b9\u9762\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spatial LibriSpeech\u6570\u636e\u96c6\u4e0a\u7684MAE\u8fbe\u52302.70\u00b0\uff0c\u660e\u663e\u4f18\u4e8e\u5148\u524d\u7684\u57fa\u51c6\u7ea66.60\u00b0\u3002\u6b64\u5916\uff0c\u6a21\u578b\u80fd\u591f\u5229\u7528\u7a7a\u95f4\u7ebf\u7d22\u63d0\u9ad8FSR\u7684\u51c6\u786e\u6027\uff0c\u5e76\u901a\u8fc7\u6587\u672c\u63d0\u793a\uff0c\u6839\u636e\u6307\u5b9a\u65b9\u5411\u805a\u7126\u4e8e\u58f0\u97f3\uff0c\u5373\u4f7f\u5728\u91cd\u53e0\u8bed\u97f3\u73af\u5883\u4e2d\u4e5f\u80fd\u6267\u884cLSE\u3002\u8fd9\u4e9b\u6210\u679c\u63ed\u793a\u4e86LLMs\u9002\u5e94\u7269\u7406\u97f3\u9891\u6982\u5ff5\u7684\u6f5c\u529b\uff0c\u4e3a\u6784\u5efa\u57fa\u4e8eLLM\u7684\u4e09\u7ef4\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.09187": "|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang 
et.al.|[2406.09187](http://arxiv.org/abs/2406.09187)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0cLLM\u9a71\u52a8\u7684\u4ee3\u7406\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u5e94\u7528\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u5176\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u7684\u65b0\u62c5\u5fe7\u3002\u73b0\u6709\u7684\u63d0\u5347LLM\u5b89\u5168\u6027\u7684\u65b9\u6cd5\u5e76\u4e0d\u76f4\u63a5\u9002\u7528\u4e8eLLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u56e0\u4e3a\u5b83\u4eec\u5177\u6709\u4e0d\u540c\u7684\u76ee\u6807\u548c\u8f93\u51fa\u6a21\u5f0f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\u2014\u2014GuardAgent\uff0c\u5b83\u4f5c\u4e3a\u5176\u4ed6LLM\u4ee3\u7406\u7684\u201c\u9632\u62a4\u680f\u201d\u3002GuardAgent\u901a\u8fc7\u68c0\u67e5\u5176\u8f93\u5165/\u8f93\u51fa\u662f\u5426\u6ee1\u8db3\u7528\u6237\u5b9a\u4e49\u7684\u4e00\u7cfb\u5217\u5b88\u62a4\u8bf7\u6c42\u6765\u76d1\u7763\u76ee\u6807LLM\u3002GuardAgent\u5206\u4e3a\u4e24\u6b65\uff1a1\uff09\u5206\u6790\u63d0\u4f9b\u7684\u5b88\u62a4\u8bf7\u6c42\u521b\u5efa\u4efb\u52a1\u8ba1\u5212\uff1b2\uff09\u6839\u636e\u4efb\u52a1\u8ba1\u5212\u751f\u6210\u5b88\u62a4\u4ee3\u7801\uff0c\u5e76\u901a\u8fc7API\u8c03\u7528\u6216\u5916\u90e8\u5f15\u64ce\u6267\u884c\u3002\u6574\u4e2a\u8fc7\u7a0b\u5229\u7528LLM\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u7ec4\u4ef6\uff0c\u7ed3\u5408\u8bb0\u5fc6\u6a21\u5757\u4e2d\u7684\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u589e\u5f3a\u4e86\u77e5\u8bc6\u9a71\u52a8\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u5404\u79cd\u6587\u672c\u5b88\u62a4\u8bf7\u6c42\u5e76\u51c6\u786e\u5730\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u6267\u884c\u4ee3\u7801\uff0c\u63d0\u4f9b\u53ef\u9760\u7684\u5b89\u5168\u4fdd\u969c\u3002 
GuardAgent\u8fd8\u914d\u5907\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u5de5\u5177\u7bb1\uff0c\u5305\u542b\u51fd\u6570\u548cAPI\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3LLM\uff0c\u5f3a\u8c03\u4e86\u5176\u901a\u7528\u6027\u53ca\u4f4e\u8fd0\u8425\u6210\u672c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff1aEICU-AC\u7528\u4e8e\u8bc4\u4f30\u533b\u7597\u5065\u5eb7\u4ee3\u7406\u7684\u9690\u79c1\u76f8\u5173\u8bbf\u95ee\u63a7\u5236\uff0cMind2Web-SC\u7528\u4e8e\u8bc4\u4f30\u7f51\u7edc\u4ee3\u7406\u7684\u5b89\u5168\u6027\u3002\u5728\u8fd9\u4e9b\u57fa\u51c6\u4e0a\uff0cGuardAgent\u5206\u522b\u572898.7%\u548c90.0%\u7684\u7cbe\u5ea6\u4e0b\u6709\u6548\u7ba1\u7406\u4e86\u4e24\u79cd\u7c7b\u578b\u4ee3\u7406\u7684\u65e0\u6548\u8f93\u5165\u548c\u8f93\u51fa\u3002\u5b9e\u9a8c\u8fd8\u8868\u660e\uff0cGuardAgent\u80fd\u591f\u9002\u5e94\u65b0\u5174\u7684LLM\u4ee3\u7406\u548c\u5b88\u62a4\u8bf7\u6c42\uff0c\u5b9a\u4e49\u65b0\u7684\u529f\u80fd\uff0c\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2406.08979": "|**2024-06-13**|**Multi-Agent Software Development through Cross-Team Collaboration**|Zhuoyun Du et.al.|[2406.08979](http://arxiv.org/abs/2406.08979)|**[link](https://github.com/openbmb/chatdev)**|**
\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\uff0c\u5982ChatDev\uff0c\u63a8\u52a8\u4e86\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u6df1\u523b\u53d8\u9769\uff0c\u7279\u522b\u4f53\u73b0\u5728\u591a\u4ee3\u7406\u534f\u4f5c\u4e0a\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u50cf\u4eba\u7c7b\u56e2\u961f\u4e00\u6837\u5408\u4f5c\uff0c\u9075\u5faa\u7011\u5e03\u6a21\u578b\u8fdb\u884c\u9700\u6c42\u5206\u6790\u3001\u5f00\u53d1\u3001\u5ba1\u67e5\u3001\u6d4b\u8bd5\u7b49\u9636\u6bb5\uff0c\u5b9e\u73b0\u81ea\u4e3b\u8f6f\u4ef6\u751f\u6210\u3002\u7136\u800c\uff0c\u5355\u4e2a\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u6bcf\u4e2a\u9636\u6bb5\u53ea\u4f1a\u4ea7\u751f\u4e00\u79cd\u53ef\u80fd\u7ed3\u679c\uff0c\u5bfc\u81f4\u53ea\u5b8c\u6210\u4e00\u6761\u5f00\u53d1\u94fe\uff0c\u4ece\u800c\u4e27\u5931\u5728\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\u4e2d\u63a2\u7d22\u591a\u79cd\u51b3\u7b56\u8def\u5f84\u7684\u673a\u4f1a\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u8de8\u56e2\u961f\u534f\u4f5c\uff08Cross-Team Collaboration\uff0cCTC\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u591a\u56e2\u961f\u7ed3\u6784\uff0c\u5b83\u5141\u8bb8\u534f\u540c\u5de5\u4f5c\u7684\u56e2\u961f\u5728\u8de8\u56e2\u961f\u534f\u4f5c\u73af\u5883\u4e2d\u5171\u540c\u63d0\u51fa\u51b3\u7b56\uff0c\u5e76\u4ea4\u6d41\u5404\u81ea\u89c1\u89e3\uff0c\u4ee5\u4f18\u5316\u5185\u5bb9\u751f\u6210\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u57fa\u51c6\uff0c\u8bc1\u5b9e\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u5728\u6545\u4e8b\u751f\u6210\u65b9\u9762\u7684\u663e\u8457\u6539\u8fdb\u8868\u660e\uff0c\u8be5\u6846\u67b6\u5177\u6709\u5e7f\u6cdb\u7684\u8de8\u9886\u57df\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u671f\u5f85\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u5f15\u5bfcLLMs\u5411\u8de8\u56e2\u961f\u6a21\u5f0f\u53d1\u5c55\uff0c\u5e76\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u9886\u57df\u5e26\u6765\u91cd\u5927\u8fdb\u6b65\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.08747": "|**2024-06-13**|**StreamBench: Towards Benchmarking Continuous Improvement of Language Agents**|Cheng-Kuang Wu et.al.|[2406.08747](http://arxiv.org/abs/2406.08747)|**[link](https://github.com/stream-bench/stream-bench)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u4ece\u7ecf\u9a8c\u4e2d\u81ea\u6211\u63d0\u5347\uff0c\u8fd9\u662f\u90e8\u7f72\u540e\u6301\u7eed\u6539\u8fdb\u7684\u91cd\u8981\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u4e3b\u8981\u8bc4\u4f30\u5b83\u4eec\u7684\u56fa\u6709\u80fd\u529b\uff0c\u800c\u4e0d\u8003\u5bdf\u5b83\u4eec\u968f\u65f6\u95f4\u6539\u8fdb\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86StreamBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u521b\u6027\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u8f93\u5165-\u53cd\u9988\u5e8f\u5217\u4e0a\u7684\u8fde\u7eed\u6539\u8fdb\u6027\u80fd\u3002StreamBench\u6a21\u62df\u4e86\u4e00\u4e2a\u5728\u7ebf\u5b66\u4e60\u73af\u5883\uff0c\u5176\u4e2dLLMs\u63a5\u6536\u5230\u8fde\u7eed\u7684\u53cd\u9988\u6d41\uff0c\u5e76\u8fed\u4ee3\u5730\u63d0\u5347\u5176\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u7b80\u5355\u4f46\u6709\u6548\u7684LLM\u57fa\u7ebf\uff0c\u5e76\u5bf9\u5f71\u54cd\u6210\u529f\u6d41\u5f0f\u7b56\u7565\u7684\u5173\u952e\u7ec4\u4ef6\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u5f00\u53d1LLMs\u7684\u6709\u6548\u5728\u7ebf\u5b66\u4e60\u7b56\u7565\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u4e3a\u6d41\u5f0f\u573a\u666f\u4e2d\u7684\u66f4\u9002\u5e94\u6027AI\u7cfb\u7edf\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.11277": "|**2024-06-17**|**Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector**|Xiaoxue Cheng et.al.|[2406.11277](http://arxiv.org/abs/2406.11277)|**[link](https://github.com/rucaibox/haluagent)**|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7b\u89c9\u68c0\u6d4b\u65b9\u9762\u7684\u6311\u6218\uff0c\u7279\u522b\u6307\u51fa\u4ee5\u5f80\u7814\u7a76\u4e3b\u8981\u4f9d\u8d56\u4e8e\u5f3a\u5927\u7684\u95ed\u6e90\u6a21\u578b\u5982GPT-4\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u4e3b\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u79f0\u4e3aHaluAgent\uff0c\u5b83\u5141\u8bb8\u8f83\u5c0f\u7684\u6a21\u578b\uff08\u5982Baichuan2-Chat 
7B\uff09\u4e3b\u52a8\u9009\u62e9\u9002\u5408\u68c0\u6d4b\u6587\u672c\u3001\u4ee3\u7801\u548c\u6570\u5b66\u8868\u8fbe\u5f0f\u7b49\u591a\u79cd\u5e7b\u89c9\u7c7b\u578b\u7684\u5de5\u5177\u3002HaluAgent\u6574\u5408\u4e86LLM\u3001\u591a\u529f\u80fd\u5de5\u5177\u7bb1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7ec6\u7c92\u5ea6\u7684\u4e09\u9636\u6bb5\u68c0\u6d4b\u6846\u67b6\uff0c\u540c\u65f6\u914d\u5907\u4e86\u8bb0\u5fc6\u673a\u5236\u3002\u4e3a\u4e86\u63d0\u9ad8HaluAgent\u7684\u6548\u80fd\uff0c\u8bba\u6587\u5229\u7528\u73b0\u6709\u7684\u4e2d\u6587\u548c\u82f1\u6587\u6570\u636e\u96c6\u5408\u6210\u68c0\u6d4b\u8f68\u8ff9\u8fdb\u884c\u5fae\u8c03\uff0c\u4f7f\u5176\u5177\u5907\u53cc\u8bed\u5e7b\u89c9\u68c0\u6d4b\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4ec5\u4f7f\u75282000\u4e2a\u6837\u672c\u5bf9LLM\u8fdb\u884c\u8c03\u4f18\u540e\uff0cHaluAgent\u5728\u5404\u79cd\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5176\u6027\u80fd\u53ef\u4e0eGPT-4\u5ab2\u7f8e\uff0c\u751a\u81f3\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8d85\u8d8a\uff0c\u4e14\u65e0\u9700\u989d\u5916\u5de5\u5177\u589e\u5f3a\uff0c\u65e0\u8bba\u5728\u9886\u57df\u5185\u8fd8\u662f\u9886\u57df\u5916\u7684\u6570\u636e\u96c6\u4e0a\u90fd\u5c55\u73b0\u51fa\u826f\u597d\u6027\u80fd\u3002\u8bba\u6587\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53d1\u5e03\u5728https://github.com/RUCAIBox/HaluAgent\u3002|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu 
et.al.|[2406.11200](http://arxiv.org/abs/2406.11200)|**[link](https://github.com/zou-group/avatar)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5229\u7528\u5916\u90e8\u5de5\u5177\u548c\u77e5\u8bc6\u63d0\u5347\u51c6\u786e\u6027\u548c\u51cf\u5c11\u9519\u8bef\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bbe\u8ba1\u80fd\u8ba9LLMs\u6709\u6548\u8fd0\u7528\u8fd9\u4e9b\u5de5\u5177\u7684\u63d0\u793a\u6280\u5de7\u662f\u4e00\u9879\u8017\u65f6\u4e14\u4f9d\u8d56\u76f4\u89c9\u7684\u4efb\u52a1\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faAvaTaR\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u5b83\u80fd\u4f18\u5316LLMs\uff0c\u4f7f\u5176\u66f4\u6709\u6548\u5730\u5229\u7528\u63d0\u4f9b\u7684\u5de5\u5177\uff0c\u5e76\u5728\u7279\u5b9a\u4efb\u52a1\u6216\u9886\u57df\u4e2d\u63d0\u5347\u6027\u80fd\u3002AvaTaR\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u6bd4\u8f83\u5668\u6a21\u5757\uff0c\u4ee5\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u6b63\u8d1f\u6837\u672c\u8fdb\u884c\u63a8\u7406\uff0c\u8fed\u4ee3\u5730\u4e3aLLM\u63d0\u4f9b\u5bcc\u6709\u6d1e\u5bdf\u529b\u548c\u5168\u9762\u7684\u63d0\u793a\u3002\u6211\u4eec\u5728\u56db\u4e2a\u5305\u542b\u6587\u672c\u3001\u89c6\u89c9\u548c\u5173\u7cfb\u4fe1\u606f\u7684\u590d\u6742\u591a\u6a21\u6001\u68c0\u7d22\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86AvaTaR\u7684\u6548\u679c\u3002\u5b9e\u9a8c\u8868\u660e\uff0cAvaTaR\u5728\u6240\u6709\u56db\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u4e2d\u5747\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5f53\u5e94\u7528\u4e8e\u65b0\u6848\u4f8b\u65f6\uff0c\u5e73\u5747\u5728Hit@1\u6307\u6807\u4e0a\u5b9e\u73b0\u4e8614%\u7684\u76f8\u5bf9\u6539\u8fdb\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u4e0a\u516c\u5f00\u3002**|\n", "2406.11176": "|**2024-06-17**|**Watch Every Step! 
LLM Agent Learning via Iterative Step-Level Process Refinement**|Weimin Xiong et.al.|[2406.11176](http://arxiv.org/abs/2406.11176)|**[link](https://github.com/weiminxiong/ipr)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7cfb\u5217\u590d\u6742\u7684\u4ea4\u4e92\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u503e\u5411\u4e8e\u901a\u8fc7\u4e13\u5bb6\u8f68\u8ff9\u8c03\u4f18\u6765\u63d0\u5347\u6a21\u578b\u6548\u679c\uff0c\u4f46\u4e3b\u8981\u5173\u6ce8\u6700\u7ec8\u7ed3\u679c\u5956\u52b1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u6216\u975e\u6700\u4f18\u884c\u4e3a\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u8fc7\u7a0b\u76d1\u7763\u4fe1\u53f7\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u8fed\u4ee3\u6b65\u7ea7\u8fc7\u7a0b\u6539\u8fdb\uff08Iterative Step-level Process Refinement\uff0cIPR\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u63d0\u4f9b\u4e86\u7ec6\u81f4\u7684\u9010\u6b65\u9aa4\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u91c7\u7528\u8499\u7279\u5361\u6d1b\u65b9\u6cd5\u4f30\u7b97\u6bcf\u4e00\u6b65\u7684\u5956\u52b1\u3002\u5728\u6bcf\u4e2a\u8fed\u4ee3\u4e2d\uff0c\u6a21\u578b\u6cbf\u7740\u4e13\u5bb6\u8f68\u8ff9\u63a2\u7d22\u5e76\u751f\u6210\u65b0\u52a8\u4f5c\uff0c\u7136\u540e\u4e0e\u4e13\u5bb6\u8f68\u8ff9\u7684\u76f8\u5e94\u6b65\u9aa4\u8fdb\u884c\u6bd4\u8f83\uff0c\u4f7f\u7528\u6b65\u7ea7\u5956\u52b1\u8bc4\u4f30\u3002\u8fd9\u79cd\u6bd4\u8f83\u6709\u52a9\u4e8e\u8bc6\u522b\u5dee\u5f02\uff0c\u5f62\u6210\u7528\u4e8e\u8bad\u7ec3\u7684\u5bf9\u6bd4\u52a8\u4f5c\u5bf9\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u590d\u6742\u4ee3\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4f18\u4e8e\u591a\u79cd\u5f3a\u5927\u7684\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u7ed3\u679c\u63ed\u793a\u4e86IPR\u5728\u63d0\u5347\u52a8\u4f5c\u6548\u7387\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc1\u660e\u5176\u9002\u7528\u4e8
e\u5404\u79cd\u6a21\u578b\u3002**|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132](http://arxiv.org/abs/2406.11132)|null|\u5728\u8fc7\u53bb\u7684\u4e00\u5e74\u91cc\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f20\u7edf\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u4e4b\u5916\u5c55\u73b0\u51fa\u60ca\u4eba\u6210\u5c31\uff0c\u4eba\u4eec\u5f00\u59cb\u63a2\u7d22\u5728\u4ee3\u7801\u751f\u6210\u3001\u65c5\u884c\u89c4\u5212\u548c\u673a\u5668\u4eba\u63a7\u5236\u7b49\u66f4\u5177\u4f53\u7684\u5e94\u7528\u9886\u57df\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u3002\u901a\u8fc7\u4e0eLLM\u6784\u5efa\u6240\u8c13\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\u534f\u52a9\u4eba\u4eec\u5b8c\u6210\u65e5\u5e38\u751f\u6d3b\u4e2d\u7684\u5404\u79cd\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5bf9LLMs\u7684\u63d0\u793a\u8bed\u53e5\u5bf9\u751f\u6210\u5185\u5bb9\u53ca\u5176\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u56e0\u6b64\uff0c\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u6210\u4e3a\u8bb8\u591a\u7814\u7a76\u4eba\u5458\u548cLLM\u7528\u6237\u5173\u6ce8\u7684\u7126\u70b9\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u540d\u4e3a\\textsc{RePrompt}\uff0c\u5b83\u5229\u7528\u4e0eLLM\u4ee3\u7406\u4ea4\u4e92\u83b7\u53d6\u7684\u5bf9\u8bdd\u5386\u53f2\uff0c\u901a\u8fc7\u201c\u68af\u5ea6\u4e0b\u964d\u201d\u4f18\u5316LLM\u7684\u9010\u6b65\u6307\u4ee4\u3002\u901a\u8fc7\u4f18\u5316\u63d0\u793a\uff0cLLM\u80fd\u591f\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u89c4\u5212\u7b56\u7565\u3002\u6211\u4eec\u5728PDDL\u751f\u6210\u548c\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u4e2d\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528\u66f4\u65b0\u540e\u7684\u63d0\u793a\u4f5c\u4e3a\u521d\u59cb\u63d0\u793a\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u5e38\u53ef\u4ee5\u63d0\u9ad8\u4e0d\u540c\u63a8\u7406\u4efb\u52a1\u7684\u6027\u80fd\u3002|\n", "2406.10918": 
"|**2024-06-18**|**Embodied Question Answering via Multi-LLM Systems**|Bhrij Patel et.al.|[2406.10918](http://arxiv.org/abs/2406.10918)|null|Embodied Question Answering\uff08EQA\uff09\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5b83\u6d89\u53ca\u4e00\u4e2a\u4ee3\u7406\u5728\u73af\u5883\u4e2d\u63a2\u7d22\u4ee5\u56de\u7b54\u7528\u6237\u67e5\u8be2\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5355\u4ee3\u7406\u573a\u666f\u4e2d\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u63a2\u7d22\u65f6\u95f4\u5197\u957f\u4e14\u6210\u672c\u9ad8\u6602\u3002\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u591a\u4ee3\u7406\u6846\u67b6\u4e0b\u7684EQA\uff0c\u5176\u4e2d\u6d89\u53ca\u591a\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u72ec\u7acb\u4ee3\u7406\uff0c\u5b83\u4eec\u5404\u81ea\u89e3\u7b54\u5173\u4e8e\u5bb6\u5ead\u73af\u5883\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u4e3a\u6bcf\u4e2a\u67e5\u8be2\u751f\u6210\u4e00\u4e2a\u7b54\u6848\uff0c\u6211\u4eec\u5229\u7528\u5404\u4e2a\u72ec\u7acb\u54cd\u5e94\u6765\u8bad\u7ec3\u4e00\u4e2a\u4e2d\u592e\u7b54\u6848\u6a21\u578b\uff08CAM\uff09\uff0c\u8be5\u6a21\u578b\u6574\u5408\u7b54\u6848\u4ee5\u5b9e\u73b0\u66f4\u7a33\u5065\u7684\u56de\u7b54\u3002\u901a\u8fc7\u4f7f\u7528CAM\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5176\u5728EQA\u51c6\u786e\u7387\u4e0a\u6bd4\u8bf8\u5982\u6295\u7968\u673a\u5236\u548c\u8fa9\u8bba\u7b49ensemble 
LLM\u805a\u5408\u65b9\u6cd5\u9ad8\u51fa50%\u3002CAM\u65e0\u9700\u4efb\u4f55\u5f62\u5f0f\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\uff0c\u4ece\u800c\u907f\u514d\u4e86\u76f8\u5173\u5f00\u9500\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u4e0d\u540c\u7684\u975e\u7ebf\u6027\uff08\u5982\u795e\u7ecf\u7f51\u7edc\u3001\u968f\u673a\u68ee\u6797\u3001\u51b3\u7b56\u6811\u3001XGBoost\uff09\u548c\u7ebf\u6027\u7b97\u6cd5\uff08\u5982\u903b\u8f91\u56de\u5f52\u5206\u7c7b\u5668\u3001\u652f\u6301\u5411\u91cf\u673a\uff09\u5bf9CAM\u8fdb\u884c\u4e86\u6d88\u878d\u7814\u7a76\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7Permutation Feature Importance\uff08PFI\uff09\u5206\u6790\u4e86CAM\u5bf9\u6bcf\u4e2a\u72ec\u7acb\u4ee3\u7406\u548c\u67e5\u8be2\u4e0a\u4e0b\u6587\u7684\u4f9d\u8d56\u7a0b\u5ea6\uff0c\u91cf\u5316\u4e86CAM\u7684\u4f9d\u8d56\u7279\u6027\u3002|\n", "2406.10819": "|**2024-06-16**|**GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents**|Dongping Chen et.al.|[2406.10819](http://arxiv.org/abs/2406.10819)|**[link](https://github.com/keplerlab/katna)**|**\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5df2\u88ab\u7528\u4e8e\u63a7\u5236\u952e\u76d8\u548c\u9f20\u6807\u8f93\u5165\uff0c\u76f4\u63a5\u611f\u77e5\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\uff0c\u5e76\u751f\u6210\u76f8\u5e94\u7684\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6a21\u578b\u4e3b\u8981\u5728\u9759\u6001\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4e3b\u8981\u5e94\u7528\u4e8e\u76f8\u5bf9\u7b80\u5355\u7684\u9886\u57df\uff0c\u5982\u7f51\u9875\u6216\u79fb\u52a8\u754c\u9762\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4e00\u4e2a\u7a33\u5065\u7684GUI\u4ee3\u7406\u5e94\u5177\u5907\u7406\u89e3GUI\u7684\u65f6\u7a7a\u4fe1\u606f\u80fd\u529b\uff0c\u5305\u62ec\u52a8\u6001\u7f51\u9875\u5185\u5bb9\u548c\u591a\u6b65\u9aa4\u4efb\u52a1\uff0c\u8fd8\u8981\u5168\u9762\u7406\u89e3\u5404\u79cdGUI\u573a\u666f\uff0c\u5305\u62ec\u684c\u9762\u8f6f\u4ef6\u548c\u591a\u7a97\u53e3\u4ea
4\u4e92\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u6570\u636e\u96c6\u2014\u2014GUI-World\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u7cbe\u5fc3\u5236\u4f5c\u7684\u4eba\u673a\u6807\u6ce8\uff0c\u5e7f\u6cdb\u6db5\u76d6\u516d\u79cdGUI\u573a\u666f\u548c\u516b\u7c7bGUI\u76f8\u5173\u95ee\u9898\uff0c\u4ee5\u4e09\u79cd\u683c\u5f0f\u5448\u73b0\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684MLLM\uff0c\u5982\u56fe\u50cfLLMs\u548c\u89c6\u9891LLMs\uff0c\u5728\u7406\u89e3\u548c\u5904\u7406\u4e0d\u540c\u7c7b\u578bGUI\u5185\u5bb9\uff0c\u7279\u522b\u662f\u52a8\u6001\u548c\u5e8f\u5217\u5185\u5bb9\u65b9\u9762\u7684\u80fd\u529b\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u56fe\u50cfLLMs\u5728\u6ca1\u6709\u624b\u52a8\u6807\u6ce8\u5173\u952e\u5e27\u6216\u64cd\u4f5c\u5386\u53f2\u7684\u60c5\u51b5\u4e0b\uff0c\u96be\u4ee5\u5e94\u5bf9\u52a8\u6001GUI\u5185\u5bb9\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u7531\u4e8eGUI\u89c6\u9891\u6570\u636e\u96c6\u7684\u7a00\u758f\u6027\uff0c\u89c6\u9891LLMs\u5728\u6240\u6709GUI\u76f8\u5173\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u57fa\u4e8eGUI-World\uff0c\u6211\u4eec\u9996\u6b21\u5c1d\u8bd5\u4f7f\u7528\u5fae\u8c03\u540e\u7684\u89c6\u9891LLM\u4f5c\u4e3aGUI\u4ee3\u7406\uff0c\u663e\u793a\u4e86\u5bf9\u5404\u79cdGUI\u4efb\u52a1\u7406\u89e3\u7684\u63d0\u5347\u3002\u7136\u800c\uff0c\u7531\u4e8e\u57fa\u7840LLM\u6027\u80fd\u7684\u9650\u5236\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c06\u89c6\u9891LLMs\u7528\u4f5cGUI\u4ee3\u7406\u4ecd\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u672a\u6765\u5728\u52a8\u6001GUI\u5185\u5bb9\u7406\u89e3\u65b9\u9762\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u6d1e\u89c1\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u6211\u4eec\u7684\u9879\u76ee\u4e3b\u9875https://gui-world.github.io/\u4e0a\u516c\u5f00\u3002**|\n", "2406.10803": "|**2024-06-16**|**HiddenTables & PyQTax: A Cooperative Game and 
Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies**|William Watson et.al.|[2406.10803](http://arxiv.org/abs/2406.10803)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u65f6\u9762\u4e34\u8bf8\u591a\u6311\u6218\uff0c\u4e3b\u8981\u5305\u62ec\uff1a\uff081\uff09\u5bf9\u4e8e\u5927\u8868\u683c\u6709\u9650\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\uff1b\uff082\uff09\u4e0d\u540ctoken\u5316\u6a21\u5f0f\u4e0e\u5355\u5143\u683c\u8fb9\u754c\u7684\u590d\u6742\u5dee\u5f02\uff1b\uff083\uff09\u4ee5\u53ca\u4f7f\u7528\u5916\u90e8\u6a21\u578b\u5982gpt-3.5-turbo\u65f6\u7684\u6570\u636e\u4fdd\u5bc6\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cHiddenTables\u201d\u7684\u5408\u4f5c\u6e38\u620f\u3002\u8fd9\u4e2a\u6e38\u620f\u6d89\u53ca\u4ee3\u7801\u751f\u6210LLM\u201cSolver\u201d\u548c\u8bc4\u4f30\u5176\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u80fd\u529b\u7684\u201cOracle\u201d\uff0c\u4ee5\u81ea\u7136\u8bed\u8a00\u89c4\u8303\u4e3a\u57fa\u7840\uff0c\u540c\u65f6\u4fdd\u8bc1\u6570\u636e\u5b89\u5168\u3002 \u6211\u4eec\u901a\u8fc7\u5b9e\u8bc1\u5b9e\u9a8c\u5728\u591a\u6837\u5316\u7684\u8868\u683c\u4e0a\u5c55\u793a\u4e86LLMs\u5728\u5904\u7406\u590d\u6742\u67e5\u8be2\u3001\u5904\u7406\u7ec4\u5408\u4f9d\u8d56\u4ee5\u53ca\u5c06\u81ea\u7136\u8bed\u8a00\u8f6c\u5316\u4e3a\u7a0b\u5e8f\u6307\u4ee4\u65b9\u9762\u7684\u5c40\u9650\u6027\uff0c\u7279\u522b\u662f\u5728\u63d0\u4f9b\u5177\u4f53\u8868\u683c\u7ed3\u6784\u7684\u60c5\u51b5\u4e0b\u3002\u4e0e\u57fa\u4e8e\u7f16\u7801\u5668\u7684\u6a21\u578b\u4e0d\u540c\uff0c\u201cHiddenTables\u201d\u4e0d\u53d7\u884c\u6570\u9650\u5236\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u5b8c\u6210 token 
\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u201cPyQTax\u201d\uff0c\u5305\u542b116,671\u4e2a\u95ee\u9898-\u8868\u683c-\u7b54\u6848\u4e09\u5143\u7ec4\uff0c\u5e76\u63d0\u4f9b\u4e86\u66f4\u7ec6\u81f4\u7684\u95ee\u9898\u5206\u7c7b\u548c\u6807\u7b7e\uff0c\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u6211\u4eec\u7684\u7814\u7a76\u3002 \u56e0\u6b64\uff0c\u9664\u4e86\u5b66\u672f\u8d21\u732e\uff0c\u63ed\u793a\u4e86LLMs\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684\u4e0d\u8db3\uff0c\u201cHiddenTables\u201d\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u5728\u4fdd\u969c\u6570\u636e\u5b89\u5168\u7684\u540c\u65f6\uff0c\u8ba9LLMs\u4e0e\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e92\u52a8\uff0c\u4ee5\u53ca\u964d\u4f4e\u751f\u6210\u6210\u672c\u7684\u5b9e\u8df5\u65b9\u6cd5\u3002|\n", "2406.10478": "|**2024-06-15**|**From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent**|Samuel S. 
Sohn et.al.|[2406.10478](http://arxiv.org/abs/2406.10478)|null|Digital storytelling, essential in entertainment, education, and marketing, faces challenges in scaling production and improving flexibility. The StoryAgent framework introduced in this paper uses large language models and generative tools to automate and refine digital story creation. It combines top-down story drafting with bottom-up asset generation to address key issues such as manual intervention, interactive scene orchestration, and narrative consistency. The framework enables efficient production of interactive, consistent narratives across multiple media, helps democratize content creation, and increases user engagement. Our experimental results show that the framework can generate coherent digital stories without reference videos, marking a significant advance in automated digital storytelling.|\n", "2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang
et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**Background**: Configuration settings are essential for tailoring software behavior to specific performance requirements, but misconfigurations are widespread. Because configuration options are numerous and complex, identifying those that affect system performance is challenging. This study presents PerfSense, a lightweight framework that uses large language models (LLMs) to identify performance-sensitive configurations efficiently and with low overhead. PerfSense uses LLM agents to simulate the interaction between developers and performance engineers, applying prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). **Method and results**: Our evaluation on seven open-source Java systems shows that PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both the LLM baseline (50.36%) and the previous state-of-the-art method (61.75%). In particular, our prompt chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassifications finds that common issues include LLMs' misunderstanding of requirements (26.8%). **Conclusion**: PerfSense significantly reduces the manual effort of classifying performance-sensitive configurations and offers valuable insights for future LLM-based code analysis research.|\n",
"2406.12708": "|**2024-06-18**|**AgentReview: Exploring Peer Review Dynamics with LLM Agents**|Yiqiao Jin et.al.|[2406.12708](http://arxiv.org/abs/2406.12708)|**[link](https://github.com/ahren09/agentreview)**|Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of analyzing peer review data often focus on exploring and summarizing existing data; they do not adequately account for the multivariate nature of the process or for latent variables, and they are further constrained by privacy concerns because the data is sensitive. We introduce AgentReview, a large language model (LLM) based peer review simulation framework that effectively disentangles the effects of multiple latent factors and addresses the privacy issue. The study finds a notable 37.1% variation in paper decisions, a result supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study can offer valuable insights for improving the design of peer review mechanisms.|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control
Design in Power Electronics**|Chenggang Cui et.al.|[2406.12628](http://arxiv.org/abs/2406.12628)|null|This paper addresses challenges in control design for power electronics systems, in particular model uncertainty and long, costly design cycles. It proposes a multi-agent framework based on large language models (LLMs) for objective-oriented controller design in power electronics. The framework combines the reasoning capabilities of LLMs with a multi-agent workflow to produce an efficient, automated controller design process. The LLM agents can understand and respond to high-level natural-language instructions, adapting their behavior to the specific requirements of the task and the constraints of the real-world application. This approach promises to substantially improve the flexibility and adaptability of power electronics controller design and to ease the work of practitioners.|\n", "2406.12276": "|**2024-06-18**|**CodeNav: Beyond tool-use to using real-world codebases with LLM agents**|Tanmay Gupta
et.al.|[2406.12276](http://arxiv.org/abs/2406.12276)|null|We present CodeNav, a system that uses a large language model (LLM) to navigate and make use of previously unseen code repositories in order to solve user queries. Unlike tool-use LLM agents, which require all relevant tools to be 'registered' in the LLM context through manual descriptions, CodeNav automatically indexes and searches code blocks in the target codebase, finds relevant code snippets, imports them, and iteratively generates a solution using execution feedback. First, we present three case studies in which CodeNav uses three different codebases to solve complex user queries. Next, on three benchmarks, we quantitatively compare code use, which has access only to the target codebase, against tool use, which has privileged access to all tool names and descriptions. In addition, we study the effect of different kinds of tool and library descriptions on code-use performance, as well as the advantage of giving the agent source code rather than natural-language descriptions of code. All code will be open-sourced under a permissive license.|\n", "2406.12125": "|**2024-06-17**|**Efficient Sequential Decision Making with Large Language
Models**|Dingyang Chen et.al.|[2406.12125](http://arxiv.org/abs/2406.12125)|null|This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either retrain or fine-tune LLMs for decision making, or design prompts for pretrained LLMs. The former suffers from the computational burden of gradient updates, while the latter has not shown promising results. We propose a new approach that uses online model selection algorithms to incorporate LLMs into sequential decision making efficiently. Statistically, our approach significantly outperforms both traditional decision-making algorithms and vanilla LLM agents. Computationally, it avoids expensive gradient updates of LLMs and needs only a small number of LLM calls throughout the decision-making process. We conduct extensive experiments to verify the effectiveness of our approach. For example, on a large-scale Amazon dataset, our approach achieves more than a 6x performance gain over baselines while calling LLMs in only 1.5% of the time steps.|\n", "2406.14373": "|**2024-07-01**|**Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory**|Gordon Dai
et.al.|[2406.14373](http://arxiv.org/abs/2406.14373)|null|Advances in large language models (LLMs) and artificial intelligence create opportunities for computational social science research at scale. Building on prior work on LLM agent design, we construct a simulated agent society in which complex social relationships dynamically form and evolve over time. The agents are given psychological drives and placed in a sandbox survival environment. We evaluate this agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). The experiments show that the agents initially engage in unrestrained conflict, mirroring Hobbes's description of the state of nature. As the simulation progresses, however, social contracts emerge, an absolute sovereign is authorized, and a peaceful commonwealth founded on mutual cooperation is established. Our findings agree with Hobbes's theory: LLM-driven multi-agent simulations exhibit complex social dynamics and may replicate the forces that shape human societies. Although such simulations cannot capture every nuance of human behavior, they are potentially valuable for understanding social structures, group dynamics, and complex human systems.|\n",
"2406.14228": "|**2024-06-20**|**EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms**|Siyu Yuan et.al.|[2406.14228](http://arxiv.org/abs/2406.14228)|**[link](https://github.com/siyuyuan/evoagent)**|**With the rise of powerful large language models (LLMs), a new trend is to build LLM-based autonomous agents, especially multi-agent systems, for solving complex tasks. However, existing work relies heavily on human-designed frameworks, which limits the functional scope and scalability of agent systems. How to automatically extend a specialized agent into a multi-agent system to improve task-solving capability remains a significant challenge. This paper introduces EvoAgent, a method that uses evolutionary algorithms to automatically extend expert agents into multi-agent systems, with the aim of improving the effectiveness of LLM-based agents on tasks. Specifically, we treat an existing agent framework as the initial individual and apply a series of evolutionary operators (such as mutation, crossover, and selection) to generate agents with diverse settings. EvoAgent can be applied to any LLM-based agent framework and automatically extends it into a multi-agent system without extra human design. Experimental results show that EvoAgent can automatically generate multiple expert agents and significantly enhances the task-solving capability of LLM-based agents.**|\n",
"2406.13352": "|**2024-06-19**|**AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents**|Edoardo Debenedetti et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|**[link](https://github.com/ethz-spylab/agentdojo)**|**This paper introduces AgentDojo, a framework for evaluating the adversarial robustness of AI agents that rely on external tools and process untrusted data. To keep pace with evolving attacks and defenses, AgentDojo is not a static test suite but an extensible environment for designing and evaluating new tasks, defenses, and adaptive attacks. It includes 97 realistic tasks (such as managing an email client, navigating an e-banking website, or making travel bookings), 629 security test cases, and a range of attack and defense methods from the literature. We find that current state-of-the-art language models perform unsatisfactorily on AgentDojo even in the absence of attacks, and that existing prompt injection attacks break some security properties but do not succeed in all cases. We hope AgentDojo will foster research on new design principles for AI agents that solve common tasks in a reliable and robust manner. The code is available at https://github.com/ethz-spylab/agentdojo.**|\n",
"2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia et.al.|[2406.13163](http://arxiv.org/abs/2406.13163)|null|Discovering new materials has significant scientific and technological implications but remains a hard problem because of the vastness of chemical space. Recent advances in machine learning have enabled data-driven methods that quickly screen or generate promising materials, but these methods still depend on large amounts of training data and often lack the flexibility and chemical intuition desired in materials design. We introduce LLMatDesign, a new framework for interpretable materials design powered by large language models. LLMatDesign uses LLM agents to interpret human instructions, apply modifications to materials, and evaluate the outcomes with provided tools. By reflecting on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. A systematic evaluation on several materials design tasks in offline computational experiments confirms its effectiveness at developing new materials with user-defined target properties in the small-data regime. The framework demonstrates the considerable potential of autonomous, LLM-guided materials discovery in a computational setting and points toward future self-driving laboratories.|\n",
"2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**Recent advances in machine learning have significantly improved the identification of disease-associated genes from gene expression data. However, these processes often require deep expertise and substantial manual effort, which limits their scalability. Agents driven by large language models (LLMs) show promise for automating such tasks thanks to their growing problem-solving abilities. To support the evaluation and development of these methods, we introduce GenoTEX, a benchmark for the automated exploration of gene expression data, covering dataset selection, preprocessing, and statistical analysis. GenoTEX provides complete analysis pipelines annotated by human bioinformaticians, who analyzed the datasets carefully to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgents, a team of LLM-based agents equipped with context-aware planning, iterative correction, and domain expert consultation, which collaboratively explore gene datasets. Our experiments demonstrate the potential of LLM-based approaches to genomic data analysis, while error analysis highlights the challenges and directions for future improvement. We propose GenoTEX as a promising resource for benchmarking and improving AI-driven methods for genomic data analysis. The benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.**|\n", "2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu
et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**Large language model multi-agent systems (LLM-MAS) have made great progress in solving complex tasks. Agents within such a system communicate and cooperate to complete tasks, on the premise that information is shared. However, when agent communication is used to enhance human cooperation, information asymmetry poses a new challenge, because each agent can access only the information of its own human user. Conventional MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm called iAgents, short for Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, and agents proactively exchange the human information needed to solve a task, thereby overcoming information asymmetry. iAgents uses a novel agent reasoning mechanism, InfoNav, to steer agent communication toward effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory so that agents have accurate and comprehensive information to exchange. We also introduce InformativeBench, the first benchmark designed to evaluate the task-solving ability of LLM agents under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over more than 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.**|\n",
"2406.14884": "|**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|LLM-based agents have emerged as promising tools designed to carry out complex tasks through iterative planning and action. However, these agents are prone to undesired planning hallucinations on tasks that require specialized knowledge. To address this, preliminary attempts have been made to improve planning reliability by incorporating external workflow-related knowledge. Despite their promise, the injected knowledge is mostly disorganized and diverse in format, lacking rigorous formalization and comprehensive comparison. Motivated by this, we formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 scenarios from 6 domains, with knowledge presented in diverse formats. To assess different LLMs on FlowBench, we design a multi-tiered evaluation framework. We evaluate the efficacy of workflow knowledge across multiple formats, and the results indicate that current LLM agents need considerable improvement to plan satisfactorily. We hope this challenging benchmark will pave the way for future agent planning research.|\n",
"2406.17232": "|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|Building human-like large language model (LLM) agents is essential for faithful social simulation. Role-play based on demographic information sometimes improves human likeness, but often it does not. This study examines whether the alignment of LLMs with human behavior can be improved by integrating information from empirically derived human belief networks. Using data from a human survey, we estimate a belief network of 18 topics that load on two non-overlapping latent factors. We then seed an LLM agent with an opinion on one topic and measure how well its expressed opinions on the remaining test topics align with the corresponding human data. Role-play based on demographics alone does not align LLM and human opinions, but seeding a single belief greatly improves alignment on topics related within the belief network, with no clear effect on topics outside the network. These results suggest a novel path to human-LLM belief alignment in work that seeks to understand and simulate patterns of belief distribution in society.|\n", "2406.18702": "|**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R.
Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|This study presents a novel approach to simulating legislative processes with LLM-driven virtual agents, focusing on the U.S. Senate Intelligence Committee. We build agents that represent individual senators and have them interact in simulated committee discussions. The agents are able to engage in realistic debate, offer thoughtful reflections, and reach bipartisan solutions under certain conditions. Notably, the simulation shows promise for modeling shifts in bipartisanship in response to external perturbations. The results suggest that this LLM-based approach could become a valuable tool for understanding and improving legislative processes, in line with a broader body of findings that LLM-based agents can usefully model real-world phenomena. Future work will focus on increasing agent complexity, expanding the scope of the simulation, and exploring applications in policy testing and negotiation.|\n", "2406.19966": "|**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao
et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|Most economic theories assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not fully rational and is hard to predict accurately with mathematical models. This paper proposes the Agent-based Simulated Financial Market (ASFM), which first constructs a simulated stock market with a real order matching system. We then design an LLM-based stock trading agent comprising profile, observation, and tool-learning-based action modules. The trading agent can comprehensively understand current market dynamics and financial policy information and make decisions according to its trading strategy. Experiments show that ASFM's reactions in controllable scenarios are consistent with the real stock market. We also run experiments in two popular areas of economics research and find that the conclusions drawn by ASFM agree with preliminary findings in economics research. We therefore believe ASFM offers a new paradigm for economic research.|\n", "2407.02483": 
"|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|Although multi-modal large language models (MLLMs) have been successful, their generalization remains limited and they sometimes underperform specialized models. To address this, recent work has developed LLM-based agents that select an appropriate specialized model based on user input. Such advances, however, have not been fully explored in the medical domain. To bridge this gap, this paper introduces the first agent designed specifically for the medical field, named Multi-modal Medical Agent (MMedAgent). We build an instruction-tuning dataset comprising six medical tools that solve seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms state-of-the-art open-source methods across a variety of medical tasks and performs well even compared with the closed-source model GPT-4o. MMedAgent is also efficient at updating and integrating new medical tools.|\n", "2407.01887": "|**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM 
Agents**|Fanzeng Xia et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|This paper studies the performance of large language models as decision-makers in the Dueling Bandits (DB) setting. We compare GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo against existing DB algorithms. The results show that the models, GPT-4 Turbo in particular, quickly identify the clearly superior option and thus outperform current state-of-the-art algorithms in terms of weak regret. However, the models struggle to converge, are highly sensitive to the prompt, and are brittle under prompt variations. To address this, we propose IF-Enhanced LLM, an augmented algorithm that combines the decision-making capabilities of LLMs with the theoretical guarantees of classic DB algorithms. This design shows how to make LLMs more trustworthy in decision tasks where stable performance matters. IF-Enhanced LLM has theoretical guarantees on both weak and strong regret. Experiments confirm that it remains robust even under noisy and adversarial prompts.|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia 
et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**Recent advances in large language models (LLMs) have significantly advanced the automation of software development tasks such as code synthesis, program repair, and test generation. Researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks; these agents can use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limitations of current LLMs, raises a question: do we really need complex autonomous software agents? To explore this question, we build Agentless, an agentless approach to automatically solving software development problems. Compared with elaborate agent-based setups, Agentless uses a simple two-phase process of localization followed by repair, without letting the LLM decide future actions or operate complex tools. On the popular SWE-bench Lite benchmark, our results show, surprisingly, that this simple approach achieves both the highest performance (27.33%) and the lowest cost ($0.34) among all open-source software agents. We also manually classified the problems in SWE-bench Lite and found problems that contain the exact ground-truth patch or have insufficient or misleading descriptions. We therefore construct SWE-bench Lite-S by excluding such problems, enabling more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of simple, interpretable techniques in autonomous software development. We hope Agentless will set the baseline, starting point, and expectations for autonomous software agents and inspire future work in this crucial direction.**|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye 
et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|Recent advances in large language models (LLMs) allow these models to autonomously gather information about the world and reason over it to solve complex problems, which has sparked interest in using LLMs to forecast international events. However, there is no rigorous benchmark of LLM forecasting capability and reliability. To fill this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLMs as temporal forecasters of international events. MIRAI provides an agentic environment with tools for accessing an extensive database of historical structured events and textual news articles. We carefully clean and parse the GDELT event database and design a series of relational prediction tasks with forecasting horizons ranging from short-term to long-term. These tasks test the ability to integrate critical information from global data, write code using domain-specific APIs and libraries, and jointly reason over historical knowledge from diverse formats and times to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing LLM performance in international event forecasting, and thereby contribute to more accurate and trustworthy models for international relations analysis.|\n", 
"2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|With the remarkable progress of large language models (LLMs), LLM-based mobile agents have become a research hotspot in human-computer interaction. However, benchmarks for such agents remain scarce. Evaluating them typically faces three challenges: (1) the inefficiency of relying on user interface (UI) operations alone limits task evaluation; (2) specific instructions within a single application are insufficient to fully assess the multi-dimensional reasoning and decision-making abilities of LLM mobile agents; (3) current evaluation metrics cannot accurately measure sequential action processes. To this end, we propose Mobile-Bench, a new benchmark for evaluating the capabilities of LLM-based mobile agents. First, we extend conventional UI operations with 103 collected APIs to make task completion more efficient. Next, we collect evaluation data by combining real user queries with LLM-based augmentation. To better evaluate different levels of planning ability, our data is divided into three categories, SAST (simple tasks), SAMT (moderately complex tasks), and MAMT (multi-task), reflecting differences in task complexity. Mobile-Bench contains 832 data entries, including more than 200 tasks specifically designed to test cross-application collaboration scenarios. In addition, we introduce a more precise evaluation metric, CheckPoint, which checks whether an LLM mobile agent reaches key points during its planning and reasoning steps.|\n", 
"2407.00476": "|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**As traditional optimization and scheduling schemes shift toward user-driven and personalized services that improve quality of experience (QoE) and flexibility, future systems, especially wireless and digitalized energy networks, face the challenge of better understanding and responding to user requirements. Conventional systems often overlook users' individual needs because communication between users and machines is limited. The emergence of large language models (LLMs) is a breakthrough here, providing a natural communication interface between users and devices. This paper proposes, for the first time, a novel architecture that builds three LLM agents to convert a user's voice request (VRQ) into a resource allocation vector: an LLM intent recognition agent that translates the request into an optimization problem (OP), an LLM OP parameter identification agent, and an LLM OP solving agent. We create a database of typical VRQs in the context of electric vehicle (EV) charging as the basis for performance evaluation. As a proof of concept, we primarily use the Llama 3 8B model. Tests across different prompt engineering scenarios demonstrate the effectiveness of the proposed architecture. The study also yields key insights; for example, a larger set of candidate OPs for modeling the real-world problem may degrade final performance because of higher recognition/OP classification noise. All results and code are open source for further research and use by the academic community.**|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang 
et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|Artificial intelligence is making significant strides in finance, reshaping how data is processed and interpreted. Among AI techniques, large language models (LLMs) show great potential to automate complex tasks, improve customer service, and provide detailed financial analysis. First, we introduce IDEA-FinBench, an evaluation benchmark designed to assess the financial knowledge of LLMs. It draws on questions from two globally respected and authoritative financial professional exams and aims to comprehensively test the ability of LLMs to answer finance exam questions. Second, we propose IDEA-FinKER, a financial knowledge enhancement framework for rapidly adapting general LLMs to the financial domain. It uses a retrieval-based few-shot learning method for real-time context-level knowledge injection and provides a set of high-quality financial knowledge instructions for fine-tuning any general model. Finally, we present IDEA-FinQA, an LLM-powered financial question-answering system. The system is built around a scheme of real-time knowledge injection and factual enhancement using external knowledge. IDEA-FinQA consists mainly of a data collector, a data querying module, and LLM-based agents that perform specific functions.|\n", 
"2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|With the rapid development of large language models (LLMs), vector retrieval algorithms are crucial for semantic queries that must satisfy both similarity and diversity requirements. Although Maximal Marginal Relevance (MMR) is widely used in retrieval scenarios with both requirements, varying its parameter \u03bb makes the results fluctuate and obscures the optimization trajectory in vector space. Moreover, there is no solid theoretical analysis of the similarity and diversity constraints in retrieval. This paper introduces a new approach that characterizes both constraints through the relationship between the query vector and the sum vector. This relationship ensures similarity while requiring the individual vectors in the sum to align with the query vector in a dispersed manner, which satisfies the diversity requirement. We also formulate a new combinatorial optimization problem: select $k$ vectors from a candidate set such that their sum vector aligns maximally with the query vector. We prove that this problem is NP-complete, which reveals the profound difficulty of pursuing similarity and diversity simultaneously in vector retrieval and lays a theoretical foundation for future research. In addition, we design a heuristic algorithm, Vectors Retrieval with Similarity and Diversity (VRSD), which has a clear optimization objective, needs no preset parameters, and has lower time complexity than MMR. Empirical validation shows that VRSD significantly outperforms MMR on various datasets.|\n", "2407.04503": "|**2024-07-05**|**When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions**|J\u00e9r\u00e9my Perez 
et.al.|[2407.04503](http://arxiv.org/abs/2407.04503)|**[link](https://github.com/jeremyperez2/telephonegamellm)**|**As large language models (LLMs) increasingly interact with one another and generate a growing share of online text, it becomes crucial to study how information changes as it passes from one LLM to the next. Although the behavior of individual LLMs has been studied extensively, collective behavior and information distortion in iterated interactions remain comparatively underexplored. Small biases that are negligible in a single output may be amplified over repeated interactions and can drive content toward attractor states. Borrowing a method from the study of human cultural evolution, the telephone game experiment, we design a transmission chain in which LLM agents receive, produce, and pass on text from the previous agent in the chain to the next. We track the evolution of text toxicity, positivity, difficulty, and length along the transmission chains, revealing biases and attractors, and study how they depend on the initial text, the instructions, the language model, and the model size. For instance, we find that open-ended instructions produce stronger attraction effects than more constrained tasks. Different text properties are also differently sensitive to attraction effects, with toxicity generally showing a stronger effect than length. These findings underline the importance of accounting for multi-step transmission dynamics and lay the groundwork for a fuller understanding of LLM cultural dynamics.**|\n", 
"2407.04363": "|**2024-07-05**|**AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents**|Petr Anokhin et.al.|[2407.04363](http://arxiv.org/abs/2407.04363)|**[link](https://github.com/airi-institute/arigraph)**|**Advances in generative AI have opened broad prospects for large language models (LLMs) in the development of autonomous agents. True autonomy requires accumulating and updating knowledge from interactions with the environment and using it effectively. Current LLM-based approaches rely on the full history of observations, summarization, or retrieval augmentation, but these unstructured memory representations do not support the reasoning and planning needed for complex decision-making. We introduce AriGraph, a novel method in which the agent builds a memory graph that integrates semantic and episodic memories while exploring the environment. The graph structure enables efficient associative retrieval of interconnected concepts relevant to the agent's current state and goals, so it serves as an effective environment model that improves exploration and planning. Our Ariadne LLM agent, equipped with the proposed memory architecture together with planning and decision-making, handles complex tasks zero-shot in the TextWorld environment, such as the cooking challenge from the First TextWorld Problems competition, as well as novel tasks such as house cleaning and treasure-hunt puzzles. Compared with established approaches such as full history, summarization, and retrieval-augmented generation, our method shows clear advantages across a variety of tasks.**|\n", "2407.06112": "|**2024-07-08**|**Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning**|Yadong Zhang 
et.al.|[2407.06112](http://arxiv.org/abs/2407.06112)|null|\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u63a8\u7406\u65b9\u6cd5\u2014\u2014\u53cc\u5411\u51b3\u7b56\u89e3\u653e\u63a8\u7406\uff08BIDDER\uff09\uff0c\u65e8\u5728\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u7684\u51b3\u7b56\u5408\u7406\u6027\u3002\u4f20\u7edf\u63a8\u7406\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u5386\u53f2\u4fe1\u606f\uff0c\u91c7\u7528\u5355\u5411\uff08\u4ece\u5de6\u5230\u53f3\uff09\u7684\u63a8\u7406\u7b56\u7565\uff0c\u8fd9\u5bfc\u81f4\u5bf9\u6f5c\u5728\u672a\u6765\u7ed3\u679c\u7684\u8ba4\u8bc6\u4e0d\u8db3\uff0c\u4ee5\u53ca\u5386\u53f2\u80cc\u666f\u7684\u6574\u5408\u4e0d\u591f\u5145\u5206\uff0c\u4ece\u800c\u4ea7\u751f\u6b21\u4f18\u51b3\u7b56\u3002BIDDER\u901a\u8fc7\u878d\u5408\u7406\u6027\u51b3\u7b56\u7684\u539f\u5219\uff0c\u7279\u522b\u662f\u5904\u7406\u4e0d\u786e\u5b9a\u6027\u5e76\u9884\u6d4b\u671f\u671b\u6548\u7528\uff0c\u5f25\u8865\u4e86\u8fd9\u4e00\u77ed\u677f\u3002\u5176\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6b65\u9aa4\uff1a\u4ece\u5386\u53f2\u6570\u636e\u4e2d\u63a8\u65ad\u9690\u85cf\u72b6\u6001\uff0c\u4ee5\u8868\u793a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u4e0d\u786e\u5b9a\u4fe1\u606f\uff1b\u5229\u7528\u8fd9\u4e9b\u9690\u85cf\u72b6\u6001\u9884\u6d4b\u672a\u6765\u7684\u6f5c\u5728\u72b6\u6001\u548c\u53ef\u80fd\u7ed3\u679c\uff1b\u7ed3\u5408\u5386\u53f2\u4fe1\u606f\uff08\u8fc7\u53bb\u60c5\u5883\uff09\u548c\u957f\u671f\u7ed3\u679c\uff08\u672a\u6765\u60c5\u5883\uff09\uff0c\u4ee5\u6307\u5bfc\u63a8\u7406\u3002\u901a\u8fc7\u53cc\u5411\u63a8\u7406\uff0cBIDDER\u80fd\u591f\u5168\u9762\u8003\u8651\u8fc7\u53bb\u548c\u672a\u6765\u7684\u60c5\u5883\uff0c\u4ece\u800c\u505a\u51fa\u66f4\u660e\u667a\u3001\u66f4\u7406\u6027\u7684\u51b3\u7b56\u3002\u6211\u4eec\u5728\u6251\u514b\uff08\u9650\u6ce8\u5fb7\u5dde\u6251\u514b\uff09\u548c\u8c08\u5224\u4e24\u4e2a\u660e\u786e\u573a\u666f\u4e2d\u6d4b\u8bd5\u4e86BIDDER\u7684\u6548\u679c\uff0c\u5b9e\u9a8c\u663e\u793a\u5b83\u663e\u8457\u63d0\u9ad8\u4e
86\u8bed\u8a00\u6a21\u578b\u548c\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u7684\u51b3\u7b56\u80fd\u529b\u3002|\n", "2407.05890": "|**2024-07-08**|**Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation**|Jiaqi Chen et.al.|[2407.05890](http://arxiv.org/abs/2407.05890)|null|\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u5728\u89c6\u89c9\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u96f6\u6837\u672c\u7684\u5f3a\u5927\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u5173\u6ce8\u89e3\u51b3\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\uff0c\u901a\u8fc7\u9009\u62e9\u9884\u5b9a\u4e49\u5bfc\u822a\u56fe\u4e2d\u7684\u8282\u70b9\u8fdb\u884c\u79fb\u52a8\uff0c\u5ffd\u89c6\u4e86\u73b0\u5b9e\u573a\u666f\u4e2d\u4f4e\u5c42\u6b21\u7684\u63a7\u5236\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AO-Planner\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u9762\u5411\u53ef\u53ca\u6027\u89c4\u5212\u7684\u8fde\u7eed\u89c6\u89c9\u5bfc\u822a\u6846\u67b6\u3002AO-Planner\u6574\u5408\u591a\u79cd\u57fa\u7840\u6a21\u578b\uff0c\u5b9e\u73b0\u9762\u5411\u53ef\u53ca\u6027\u7684\u8fd0\u52a8\u89c4\u5212\u548c\u52a8\u4f5c\u51b3\u7b56\uff0c\u5747\u4ee5\u96f6\u6837\u672c\u7684\u65b9\u5f0f\u6267\u884c\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u89c6\u89c9\u53ef\u53ca\u6027\u63d0\u793a\uff08VAP\uff09\u65b9\u6cd5\uff0c\u5229\u7528SAM\u5206\u5272\u53ef\u89c1\u5730\u9762\uff0c\u63d0\u4f9b\u5bfc\u822a\u53ef\u53ca\u6027\u4fe1\u606f\uff0c\u4ece\u800c\u8ba9\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u6f5c\u5728\u7684\u4e0b\u4e00\u4e2a\u8def\u6807\uff0c\u5e76\u751f\u6210\u5411\u9009\u5b9a\u8def\u6807\u7684\u4f4e\u5c42\u6b21\u8def\u5f84\u89c4\u5212\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u9ad8\u7ea7\u4ee3\u7406PathAgent\uff0c\u8bc6\u522b\u51fa\u6700\u53ef\u80fd\u7684\u50cf\u7d20\u7ea7\u8def\u5f84\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u4e09\u7ef4\u5750\u6807\uff0c\u
4ee5\u5b8c\u6210\u4f4e\u5c42\u6b21\u7684\u79fb\u52a8\u3002 \u5728\u5177\u6709\u6311\u6218\u6027\u7684R2R-CE\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0cAO-Planner\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u96f6\u6837\u672c\u6027\u80fd\u63d0\u5347\uff08SPL\u6307\u6807\u63d0\u9ad85.5%\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u8fde\u63a5\u4e86\u8bed\u8a00\u6a21\u578b\u4e0e\u4e09\u7ef4\u4e16\u754c\uff0c\u907f\u514d\u4e86\u76f4\u63a5\u9884\u6d4b\u4e16\u754c\u5750\u6807\u70b9\u7684\u56f0\u96be\uff0c\u4e3a\u5229\u7528\u57fa\u7840\u6a21\u578b\u8fdb\u884c\u4f4e\u5c42\u6b21\u8fd0\u52a8\u63a7\u5236\u63d0\u4f9b\u4e86\u65b0\u7684\u524d\u666f\u3002|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**\u5728\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\u65b9\u6cd5\u4e2d\uff0c\u5904\u7406\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u975estationarity\u5e76\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u81ea\u4e3b\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6211\u4eec\u7684\u65b0\u578b\u667a\u80fd\u4f53\u201c\u5047\u8bbe\u5fc3\u667a\u201d\uff08Hypothetical 
Minds\uff09\u91c7\u7528\u8ba4\u77e5\u542f\u53d1\u5f0f\u67b6\u6784\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u4e24\u4e2a\u62bd\u8c61\u5c42\u6b21\u4e0a\u7684\u5206\u5c42\u89c4\u5212\u6a21\u5757\u3002\u5176\u4e2d\u7684\u5173\u952e\u90e8\u5206\u662f\u201c\u5fc3\u7406\u7406\u8bba\u201d\u6a21\u5757\uff0c\u5b83\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u751f\u6210\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u7b56\u7565\u7684\u5047\u8bbe\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u5047\u8bbe\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u884c\u4e3a\u7684\u9884\u6d4b\u8fdb\u884c\u8bc4\u4f30\u548c\u8fed\u4ee3\u4f18\u5316\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u5047\u8bbe\u5fc3\u667a\u5728Melting Pot\u57fa\u51c6\u4e2d\u7684\u591a\u79cd\u7ade\u4e89\u3001\u6df7\u5408\u52a8\u673a\u548c\u534f\u4f5c\u73af\u5883\u4e2d\uff0c\u65e0\u8bba\u662f\u4e8c\u5143\u8fd8\u662f\u7fa4\u4f53\u73af\u5883\uff0c\u90fd\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u8bed\u8a00\u6a21\u578b\u667a\u80fd\u4f53\uff08LLM-agent\uff09\u548c\u5f3a\u5316\u5b66\u4e60\u57fa\u7840\u7ebf\u3002\u5bf9\u6bd4\u5b9e\u9a8c\u8fd8\u663e\u793a\uff0c\u5047\u8bbe\u7684\u8bc4\u4f30\u548c\u7cbe\u70bc\u5bf9\u4e8e\u5728\u590d\u6742\u573a\u666f\u4e2d\u53d6\u5f97\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002**|\n", "2407.06813": "|**2024-07-09**|**Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy**|Zhenyu Guan et.al.|[2407.06813](http://arxiv.org/abs/2407.06813)|**[link](https://github.com/todexter3/richelieu)**|## \u80cc\u666f 
Diplomacy is one of the most sophisticated activities in human society, involving interactions among many parties or actors and demanding skills such as social reasoning, negotiation, and long-term strategic planning. Previous AI agents have proven able to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggeringly large decision space, especially in the phases that require negotiation. Large language models (LLMs) have recently shown capabilities beyond their predecessors in several applications, but they still fall short on long-horizon planning in complex multi-agent settings. Using cutting-edge LLM technology, we make a first attempt to explore the upper bound of AI on such a comprehensive multi-agent mission by integrating three core capabilities into a stronger LLM-based social agent: 1) a strategic planner with memory and reflection; 2) a goal-oriented negotiator with social reasoning; and 3) memory augmented through self-play games, enabling self-evolution without human intervention.|\n", "2407.06567": "|**2024-07-10**|**FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making**|Yangyang Yu 
et.al.|[2407.06567](http://arxiv.org/abs/2407.06567)|null|Large language models (LLMs) have shown notable potential for carrying out complex tasks and are increasingly used in finance. However, high-quality sequential investment decision making remains challenging: it requires repeated interaction with a changing environment to maximize returns while managing risk. Although LLM-based agent systems have been developed that outperform human teams and deliver investment returns, how to optimize the synthesis of multi-source information and improve decision outcomes through timely experience remains unexplored. We therefore propose FinCon, an LLM-based multi-agent framework designed for diverse financial tasks and built on conceptual verbal reinforcement and a financial organizational structure. Inspired by the organizational structure of real-world investment firms, FinCon uses a manager-analyst communication hierarchy that lets cross-functional agents collaborate toward a unified goal through natural-language interaction. Each agent has a larger memory capacity than a human, which supports more efficient information processing. FinCon also adds a risk-control component that periodically triggers a self-critiquing mechanism to update the system's investment beliefs. These conceptualized beliefs act as verbal reinforcement that guides future behavior and can be propagated selectively to the nodes that need knowledge updates, which reduces unnecessary communication cost and improves performance. FinCon generalizes well across financial tasks such as single-stock trading and asset management, demonstrating its potential in practical financial settings.|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**As large language models (LLMs) are rapidly adopted in multi-agent systems, their strong performance in areas such as collaborative problem solving and autonomous negotiation has drawn attention. However, the security of these LLM-based multi-agent systems has not been studied thoroughly, particularly with respect to the spread of manipulated knowledge. This paper investigates the issue by building a detailed threat model and a simulation environment that mirrors real-world multi-agent deployments on a trusted platform. We propose a novel two-stage attack, consisting of persuasiveness injection and manipulated knowledge injection, to systematically study how manipulated knowledge (such as fabricated and harmful knowledge) can spread without explicit prompt manipulation. Our method exploits inherent weaknesses in how LLMs handle world knowledge, which an attacker can use to make agents spread fabricated information unwittingly. Experiments show that the attack successfully induces LLM-based agents to spread both kinds of manipulated knowledge during communication without significantly degrading their basic capabilities. We further find that the manipulation persists in popular retrieval-augmented generation frameworks: even after the interaction ends, several benign agents may continue to be influenced by the manipulated chat history. Our findings reveal serious security risks in LLM-based multi-agent systems and underline the urgent need for robust defenses against the spread of manipulated knowledge, such as guardian agents and advanced fact-checking tools.**|\n", 
"2407.08550": "|**2024-07-11**|**Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility**|Yuchen Xia et.al.|[2407.08550](http://arxiv.org/abs/2407.08550)|**[link](https://github.com/yuchenxia/gpt4industrialautomation)**|This paper presents a novel approach to integrating large language models (LLMs) into automated production systems to improve task automation and flexibility. We organize production operations hierarchically following the automation pyramid. Atomic operation functionalities are abstracted as microservices, which are invoked and executed through a dedicated digital twin system. This provides a scalable and flexible foundation for orchestrating production processes. In the digital twin system, low-level, hardware-specific data is semantically enriched so that LLMs can interpret it and handle production planning and control tasks. When an LLM receives a user request or identifies a triggering event, it generates a process plan, which is then decomposed into a series of microservices and executed in the real-world automation system. We implement the overall approach on a modular automation facility in our laboratory and use a concrete case study to show how LLMs handle production planning and control tasks, resulting in an intuitive production environment with a higher degree of automation and flexibility. Finally, we identify the limitations that stand in the way of realizing the full potential of LLMs in autonomous systems and point out their promising benefits. Demos of this research series are available at https://github.com/YuchenXia/GPT4IndustrialAutomation.|\n", 
"2407.08213": "|**2024-07-11**|**PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models**|Ruiqi Wang et.al.|[2407.08213](http://arxiv.org/abs/2407.08213)|null|
Preference-based reinforcement learning (PbRL) is an emerging approach that teaches robots through comparative human feedback, removing the need for complex reward engineering. However, existing PbRL methods need a large amount of feedback, which often leads to reliance on synthetic feedback generated by scripted teachers. This brings back intricate reward design and adapts poorly to human-robot interaction (HRI) scenarios, in which users may have their own expectations for the same task. To address these problems, we propose PrefCLM, a novel framework that uses large language models (LLMs) as simulated teachers in PbRL. We apply Dempster-Shafer theory to fuse the individual preferences of multiple LLM agents at the score level, making effective use of their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline that supports collective refinement based on user interaction. Experiments across a range of general RL tasks show that PrefCLM performs on par with traditional scripted teachers and excels at producing more natural and efficient robot behavior. A real-world user study (N=10) further demonstrates its ability to tailor robot behavior to individual user preferences, significantly improving user satisfaction in HRI scenarios.|\n", 
"2407.10718": "|**2024-07-16**|**Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning**|Yulong Wang et.al.|[2407.10718](http://arxiv.org/abs/2407.10718)|**[link](https://github.com/ag2s1/sibyl-system)**|**Existing agents based on large language models (LLMs) show strong problem-solving abilities by combining the LLM's inherent knowledge, strong in-context learning and zero-shot capabilities with tools and intricately designed, human-crafted LLM invocation workflows. However, these agents remain limited in long-term reasoning and under-use the potential of existing tools, which leads to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to solve complex reasoning tasks by making efficient use of a minimal set of tools. Inspired by Global Workspace Theory, Sibyl incorporates a global workspace to improve the management and sharing of knowledge and conversation history throughout the system. Guided by Society of Mind Theory, Sibyl also implements a multi-agent, debate-based jury that self-refines the final answer to keep it comprehensive and balanced. The approach aims to reduce system complexity while widening the range of solvable problems, from questions a human can answer in minutes to those that take hours or even days, thereby moving from System-1 to System-2 thinking. Sibyl is designed for scalability and ease of debugging: it builds in the concept of reentrancy from functional programming from the outset, with the aim of seamless, low-effort integration into other LLM applications to improve their capabilities. Our experiments show that a Sibyl agent instantiated with GPT-4 achieves the best performance on the GAIA benchmark, with an average score of 34.55%, ahead of other GPT-4-based agents. We hope Sibyl will inspire more reliable and reusable LLM-based agent solutions for complex real-world reasoning tasks.**|\n", 
"2407.10580": "|**2024-07-15**|**Leveraging Hybrid Intelligence Towards Sustainable and Energy-Efficient Machine Learning**|Daniel Geissler 
et.al.|[2407.10580](http://arxiv.org/abs/2407.10580)|null|This paper presents an approach that uses hybrid intelligence to achieve sustainable, energy-aware machine learning. During model development, attention usually goes to optimizing the final model's performance, while the efficiency of the process itself is neglected. In recent years, energy efficiency has become equally important because of the significant environmental impact of complex, large-scale computation. The contribution of this work is to highlight and further address inefficiencies in the machine learning development process by integrating human-in-the-loop (HITL) methods and large language model (LLM) agents. In short, the paper aims to make the machine learning workflow more efficient and more environmentally friendly by combining human intuition and experience with the efficient computation of AI. By introducing HITL and LLMs as supporting tools, we aim to identify and optimize bottlenecks in the development process, reducing resource consumption and promoting more sustainable AI practice. The approach not only helps improve training speed and efficiency but also lowers energy consumption, with a positive effect on the environment.|\n", 
"2407.10499": "|**2024-07-15**|**CIBench: Evaluating Your LLMs with a Code Interpreter Plugin**|Songyang Zhang et.al.|[2407.10499](http://arxiv.org/abs/2407.10499)|**[link](https://github.com/open-compass/CIBench)**|**While agents based on large language models (LLMs) have made significant progress, benchmarking their abilities has become challenging, which hinders a clear understanding of their limitations. This paper proposes CIBench, an interactive evaluation framework for comprehensively assessing how well LLMs use code interpreters on data science tasks. The framework comprises an evaluation dataset and two evaluation modes. The dataset is built through LLM-human cooperation and simulates a realistic workflow with consecutive, interactive IPython sessions, enabling a thorough assessment of LLM ability. The two evaluation modes assess LLM ability with and without human assistance. We run extensive experiments analyzing 24 LLMs on CIBench and offer useful insights for how future LLMs can make use of code interpreters.**|\n", 
"2407.10081": "|**2024-07-14**|**All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era**|Bo Chen et.al.|[2407.10081](http://arxiv.org/abs/2407.10081)|null|Recommender systems (RS) are vital for managing information overload and delivering personalized content that meets users' diverse information needs. The rise of large language models (LLMs), with their extensive general knowledge and reasoning ability, offers a new opportunity to redefine recommender systems. Standing in the LLM era, we aim to place recommender systems within a broader picture and to open the way to more comprehensive solutions for future research. We therefore first give a comprehensive overview of technical progress, with particular attention to language foundation models and their applications in recommendation. We identify two evolutionary paths for modern recommender systems: list-wise recommendation and conversational recommendation. The two paths ultimately converge on LLM agents, which offer long-term memory, reflection, and tool intelligence. Along both paths, we observe that the effectiveness of recommended information increases while the cost for users to acquire it decreases. We examine the technical features, research methodologies, and inherent challenges of each milestone, from traditional list-wise recommendation to LLM-enhanced recommendation to recommendation with LLM agents. Finally, we highlight several unresolved challenges that are crucial for the future development of personalization technologies and interfaces, and discuss future prospects.|\n", 
"2407.10064": "|**2024-07-14**|**Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights**|Xinyu-Chen 
et.al.|[2407.10064](http://arxiv.org/abs/2407.10064)|null|Across the industrial fields of human society, people have continually looked for ways to free up human labor. Building agents based on large language models (LLMs) is regarded as an effective tool for this purpose. As human-like intelligent entities that can perceive, plan, decide, and act, agents have already created considerable productive value in many fields. The bridge operation and maintenance (O&M) field, however, has a relatively low level of intelligent automation compared with other industries. Even so, it has developed numerous intelligent inspection devices, machine learning algorithms, and autonomous evaluation and decision-making methods, which lay a foundation for breakthroughs in artificial intelligence. This study explores how LLM-based AI agents may affect the bridge O&M field and analyzes the challenges and opportunities they bring to its core tasks. Through in-depth research and analysis, we hope to offer a more comprehensive perspective on intelligent applications in this field.|\n", "2407.11843": "|**2024-07-16**|**InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback**|Haishuo Fang 
et.al.|[2407.11843](http://arxiv.org/abs/2407.11843)|null|A key requirement for deploying agents based on large language models (LLMs) in real applications is robustness against risky or irreversible mistakes. However, existing research lacks preemptive evaluation of the reasoning trajectories that LLM agents carry out, which leaves a gap in ensuring safe and reliable operation. To explore better solutions, this paper introduces InferAct, a novel approach that uses the Theory-of-Mind capability of LLMs to proactively detect potential errors before critical actions are executed (for example, 'buy now' in automated online trading or web shopping). InferAct can also incorporate human feedback to prevent irreversible risks and improve the actor agent's decision-making. Experiments on three widely used tasks demonstrate the effectiveness of InferAct. The proposed solution offers a new approach and concrete contributions toward LLM agents that can be deployed safely in environments involving critical decisions.|\n", "2407.11549": "|**2024-07-16**|**How Personality Traits Influence Negotiation Outcomes? 
A Simulation based on Large Language Models**|Yin Jou Huang et.al.|[2407.11549](http://arxiv.org/abs/2407.11549)|null|Psychological evidence shows that personality traits influence decision-making. For example, agreeableness is generally associated with positive negotiation outcomes, whereas neuroticism is often linked to less favorable ones. This paper introduces a simulation framework based on large language models (LLMs) in which agents are given synthesized personality traits. The agents negotiate in a bargaining domain and have customizable personalities and objectives. Experiments show that the behavioral tendencies in the LLM-based simulations reproduce patterns observed in human negotiations. The contribution is twofold. First, we propose a simulation methodology for investigating how well the linguistic and economic capabilities of LLM agents align. Second, we provide empirical insights into the strategic impact of the Big Five personality traits on the outcomes of bilateral negotiations. We also present a case study based on synthesized bargaining dialogues that reveals intriguing behaviors, including deceitful and compromising behavior.|\n", "2407.12784": "|**2024-07-17**|**AgentPoison: Red-teaming LLM 
Agents via Poisoning Memory or Knowledge Bases**|Zhaorun Chen et.al.|[2407.12784](http://arxiv.org/abs/2407.12784)|**[link](https://github.com/BillChan226/AgentPoison)**|**LLM agents have shown remarkable performance across many applications, thanks mainly to their advanced abilities in reasoning, using external knowledge and tools, calling APIs, and taking actions to interact with environments. Current agents typically use a memory module or a retrieval-augmented generation (RAG) mechanism that retrieves past knowledge and instances with similar embeddings from a knowledge base to guide task planning and execution. However, relying on unverified knowledge bases raises serious concerns about safety and trustworthiness. To expose these vulnerabilities, we propose AgentPoison, a novel red-teaming approach and the first backdoor attack targeting generic and RAG-based LLM agents, carried out by poisoning their long-term memory or knowledge base. Specifically, we formulate trigger generation as a constrained optimization problem that optimizes the backdoor trigger to map triggered instances to a unique embedding space, so that whenever a user instruction contains the optimized trigger, malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. Benign instructions without the trigger still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized trigger shows superior transferability, in-context coherence, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness against three real-world LLM agents: a RAG-based autonomous driving agent, a knowledge-intensive QA agent, and the healthcare EHRAgent. On each agent, AgentPoison achieves an average attack success rate above 80% with minimal impact on benign performance (below 1%) at a poison rate below 0.1%.**|\n", 
"2407.12979": "|**2024-07-17**|**Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models**|Sadegh Mahdavi 
et.al.|[2407.12979](http://arxiv.org/abs/2407.12979)|null|Large language models (LLMs) perform remarkably well on a wide range of natural language tasks, but they often struggle with planning problems that require structured reasoning. To overcome this limitation, translating planning problems into the Planning Domain Definition Language (PDDL) has been proposed as a potential solution, since it allows automated planners to be applied. However, generating accurate PDDL files usually requires human input or correction, which is both time-consuming and costly. This paper proposes a novel approach that uses LLMs and environment feedback to generate PDDL domain and problem description files automatically, without human intervention. Our method introduces an iterative refinement process that generates multiple problem PDDL candidates and progressively refines the domain PDDL based on feedback obtained from interacting with the environment. To guide the refinement process, we develop the Exploration Walk (EW) metric, which gives the LLM a rich feedback signal for updating the PDDL files. 
We evaluate our method on PDDL environments and achieve a task solve rate of 66%, compared with only 29% for intrinsic planning with GPT-4 and chain-of-thought prompting. Our work makes it possible to model planning environments automatically using LLMs and environment feedback, removes the need for human intervention in PDDL generation, and paves the way for more reliable application of LLM agents to challenging problems.|\n", "2407.12877": "|**2024-07-16**|**Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning**|Yaswanth Narsupalli et.al.|[2407.12877](http://arxiv.org/abs/2407.12877)|null|Evaluating the quality of natural language generation (NLG) outputs, especially those produced by large language models (LLMs), poses significant challenges. Traditional approaches rely either on resource-intensive human evaluation or on automatic metrics, which often correlate poorly with human judgment. This study proposes Review-Feedback-Reason (ReFeR), a novel evaluation framework that uses LLM agents for NLG evaluation. We rigorously test ReFeR on two existing benchmark datasets covering a variety of NLG tasks. 
ReFeR not only improves the accuracy of NLG evaluation, by about 20% over previous benchmarks, but also generates constructive feedback and significantly strengthens collective reasoning. This feedback is used to create instruction-tuning datasets which, when used to fine-tune smaller models such as Mistral-7B, turn them into very strong evaluators that correlate better with human evaluation and perform almost on par with GPT-3. The effectiveness of our method is highlighted by its application to three reasoning benchmarks, where ReFeR outperforms most state-of-the-art methods and exceeds the reasoning ability of GPT-3.5 Turbo and GPT-4 by about 11.67% and 1% on average, respectively.|\n", "2407.14239": "|**2024-07-19**|**KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models**|Kemou Jiang 
et.al.|[2407.14239](http://arxiv.org/abs/2407.14239)|null|Large language models (LLMs) acting as autonomous agents offer a new, knowledge-driven way to tackle real-world challenges. These LLM-based methods excel at generalization and interpretability. However, the complexity of driving tasks often requires cooperation among multiple heterogeneous agents, which underscores the need for LLM-driven agents to share knowledge cooperatively and achieve cognitive synergy. Despite the promise of LLMs, current applications focus mainly on single-agent scenarios. To broaden the scope of knowledge-driven strategies and strengthen the generalization ability of autonomous agents, we propose the KoMA framework, which comprises multi-agent interaction, multi-step planning, shared memory, and ranking-based reflection modules, and aims to improve multi-agent decision-making in complex driving scenarios. Based on the textual descriptions of driving scenarios that the framework generates, the multi-agent interaction module lets LLM agents analyze and infer the intentions of surrounding vehicles, much as humans do. The multi-step planning module lets LLM agents analyze the situation layer by layer to reach a final action decision, keeping short-term action decisions consistent with the overall goal. 
The shared-memory module accumulates collective experience to support better decisions, while the ranking-based reflection module evaluates and improves agent behavior to raise driving safety and efficiency. The KoMA framework not only improves the robustness and adaptability of autonomous driving agents but also significantly improves their generalization across scenarios. Experimental results show that our approach outperforms traditional methods in complex, unpredictable driving environments, particularly without extensive retraining.|\n", "2407.15073": "|**2024-07-21**|**Multi-Agent Causal Discovery Using Large Language Models**|Hao Duong Le 
et.al.|[2407.15073](http://arxiv.org/abs/2407.15073)|null|Large language models (LLMs) have shown great potential in causal discovery tasks by drawing on the broad expert knowledge they acquire from large text corpora. However, the multi-agent capabilities of LLMs in causal discovery remain underexplored. This paper proposes a general framework to investigate this potential. The first component is the Meta Agents Model, which relies entirely on reasoning and discussion among LLM agents to perform causal discovery. The second is the Coding Agents Model, which uses the agents' ability to plan, write, and execute code, together with advanced statistical libraries, for causal discovery. The third is the Hybrid Model, which combines the approaches of the Meta Agents Model and the Coding Agents Model, fusing the statistical analysis and reasoning skills of multiple agents. By making effective use of LLM expert knowledge, reasoning ability, multi-agent cooperation, and statistical causal methods, the proposed framework shows promising results. By exploring the multi-agent potential of LLMs, we aim to lay the groundwork for using multiple LLM agents to solve causality-related problems.|\n", "2407.16252": "|**2024-07-23**|**LawLuo: A Chinese Law Firm Co-run by LLM Agents**|Jingyun 
Sun et.al.|[2407.16252](http://arxiv.org/abs/2407.16252)|**[link](https://github.com/nefujing/lawluo)**|**Large language models (LLMs) show great potential for providing legal consultation to users without a legal background, thanks to their strong text understanding and generation abilities. However, existing Chinese legal LLMs are limited to dialogue between a single model and the user, unlike consultation at a law firm, where multiple staff members take part. This limitation makes the consultation experience less realistic. In addition, existing Chinese legal LLMs have critical problems: (1) insufficient quality control of instruction fine-tuning data; (2) hallucinations caused by ambiguous user queries; and (3) declining instruction-following ability over multi-turn dialogue. To address these challenges, we propose LawLuo, a novel legal dialogue framework that leverages the collaboration of multiple LLM agents, each responsible for a different function, to jointly provide users with comprehensive legal consultation. We also construct two high-quality legal dialogue datasets, KINLED and MURLED, and fine-tune ChatGLM-3-6b on them. 
We further propose ToLC, a legal query clarification algorithm. Experimental results show that, compared with baseline LLMs such as GPT-4, LawLuo performs better on three dimensions: lawyer-like language style, usefulness of legal advice, and accuracy of legal knowledge. Our code and datasets are available at https://github.com/NEFUJing/LawLuo.**|\n", "2407.16732": "|**2024-08-03**|**PyBench: Evaluating LLM Agent on various real-world coding tasks**|Yaolun Zhang et.al.|[2407.16732](http://arxiv.org/abs/2407.16732)|**[link](https://github.com/mercury7353/pybench)**|**To address the limitations of existing benchmarks, which cover either simplified tasks or complex, specialized ones, we introduce PyBench, a benchmark covering five main categories of real-world tasks. These tasks involve more than 10 file types and are intended to cover everyday coding needs comprehensively. Given a high-level user query and related files, the LLM agent must reason over multiple turns by executing Python code through a code interpreter and finally produce a response that meets the user's needs. Solving PyBench tasks successfully requires a broad understanding of Python packages, strong reasoning ability, and the ability to incorporate feedback from executed code. 
Our evaluation shows that current open-source LLMs struggle with these tasks. We therefore analyze and experiment with four kinds of datasets and show that solving PyBench requires comprehensive abilities. Our fine-tuned 8B model, PyLlama3, achieves exciting performance on PyBench, surpassing many larger models (33B and 70B). Our benchmark, training dataset, and model are available on GitHub: [https://github.com/Mercury7353/PyBench](https://github.com/Mercury7353/PyBench)**|\n", "2407.18416": "|**2024-07-29**|**PersonaGym: Evaluating Persona Agents and LLMs**|Vinay Samuel et.al.|[2407.18416](http://arxiv.org/abs/2407.18416)|null|Persona agents, LLM agents that act according to an assigned persona, show impressive contextual response capabilities across application domains. They bring significant enhancements to sectors such as education, healthcare, and entertainment, because model developers can align agent responses with different user needs, broadening the range of agent applications. However, evaluating persona agent performance is extremely difficult, mainly because of the complexity of assessing persona adherence in free-form interactions across the relevant environments. 
We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory for comprehensive large-scale evaluation of persona agents. Evaluating 6 open- and closed-source LLMs on a benchmark of 200 personas and 10,000 questions, we find substantial room for improvement in persona agent capabilities across state-of-the-art models. For example, the PersonaScore of Claude 3.5 Sonnet is only 2.97% higher than that of GPT 3.5, even though Claude 3.5 Sonnet is a more advanced model. Importantly, we find that greater model size and complexity do not necessarily translate into better persona agent capabilities, which highlights the pressing need for algorithmic and architectural innovation toward faithful and efficient persona agents.|\n", "2407.19354": "|**2024-07-28**|**The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies**|Feng He 
et.al.|[2407.19354](http://arxiv.org/abs/2407.19354)|null|Inspired by the rapid development of large language models (LLMs), LLM agents have evolved to the point where they can perform complex tasks. These agents are widely used across domains to process large volumes of data, interact with humans, and carry out tasks, which underscores their commercial value. However, this also exposes security and privacy vulnerabilities. At the current stage, comprehensive research on the security and privacy of LLM agents is essential. This survey aims to give a comprehensive overview of the newly emerged privacy and security issues that LLM agents face. We first introduce the fundamentals of LLM agents, then categorize and analyze the threats. We next discuss the impact of these threats on humans, the environment, and other agents. We then review existing defense strategies and finally explore future trends. In addition, the survey uses a variety of case studies to make the explanations easier to follow. By highlighting these critical security and privacy issues, the survey aims to stimulate future research that improves the security and privacy of LLM agents and thereby increases their reliability and trustworthiness in future applications.|\n", "2407.19056": "|**2024-07-26**|**OfficeBench: Benchmarking Language 
Agents across Multiple Applications for Office Automation**|Zilong Wang et.al.|[2407.19056](http://arxiv.org/abs/2407.19056)|**[link](https://github.com/zlwang-cs/OfficeBench)**|Office automation significantly improves human productivity by automatically completing routine tasks in a workflow. Existing AI literature focuses mainly on basic information extraction, whereas office automation research should extend to more realistic office tasks that require integrating the various information sources in an office system and producing outputs through a series of decision-making steps. We introduce OfficeBench, the first office automation benchmark for evaluating how well current large language model (LLM) agents handle office tasks in realistic office workflows. OfficeBench requires LLM agents to plan feasibly over long horizons, switch between applications efficiently, and ground their actions accurately within a large combined action space, based on the contextual demands of the workflow. 
Applying our customized evaluation methods to each task, we find that GPT-4 Omni achieves a pass rate of 47.00%, which shows decent performance on office tasks. However, this is still far below the human performance and accuracy standards that real office workflows require. Further analysis shows that most problems relate to redundant operations, hallucinations, and limitations in switching between multiple applications, which may offer valuable insights for developing effective agent frameworks for office automation.|\n", "2407.18961": "|**2024-07-30**|**MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains**|Guoli Yin et.al.|[2407.18961](http://arxiv.org/abs/2407.18961)|**[link](https://github.com/apple/axlearn)**|**Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks that evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios and emphasize task completion rather than dissecting the underlying skills that drive these outcomes. This lack of granularity makes it hard to identify precisely where failures come from. In addition, setting up these environments takes considerable effort, and interactive tasks sometimes suffer from inconsistency and reproducibility problems. 
To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, which consists of comprehensive offline tasks that require no complex environment setup. MMAU covers five domains: tool use, directed acyclic graph (DAG) question answering, data science and machine learning coding, contest-level programming, and mathematics. It also covers five essential capabilities: understanding, reasoning, planning, problem-solving, and self-correction. With 20 carefully designed tasks and more than 3,000 distinct prompts in total, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only reveals the capabilities and limitations of LLM agents but also makes their performance more interpretable. The MMAU datasets and evaluation scripts are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.**|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang 
et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|Recently, autonomous agents built on large language models (LLMs) have made significant progress in both theoretical research and practical applications. These agents can extend the capabilities of the base LLM through external components and improve performance in several ways. For example, an agent built on a GPT-3.5-Turbo core may outperform the more advanced GPT-4 model on some tasks. The key point is that its integrated tools let it execute actions in the real world, moving from merely generating text to interacting with its environment. Given how widely agents are deployed in practice and their ability to affect their environment directly, assessing potential vulnerabilities becomes essential. If maliciously exploited, these autonomous systems could cause far more damage than a standalone language model. 
Existing studies have examined the harmful behaviors that LLM agents may cause, but our work takes a new perspective and focuses on attacks that cause system malfunctions, that is, attacks that mislead an agent into performing repetitive or irrelevant actions and thereby disrupt its function. We conduct a comprehensive evaluation using diverse attack methods, scenarios, and properties to pinpoint where agents are vulnerable to these attacks. Experimental results show that in many settings these attacks can induce failure rates above 80%. We further implement and deploy agents in multi-agent systems to highlight the real-world risks that these vulnerabilities create. To counter these attacks, we propose self-examination detection methods. However, our findings show that effective detection is difficult with LLMs alone, which underscores the substantial risk posed by this class of vulnerability.|\n", "2407.21778": "|**2024-07-31**|**Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries**|Felix Ocker 
et.al.|[2407.21778](http://arxiv.org/abs/2407.21778)|null|We propose an architecture called the tulip agent for autonomous agents based on large language models that can create, read, update, and delete the tools in a library containing a large number of tools. Unlike current state-of-the-art implementations, the tulip agent does not encode the descriptions of all available tools in the system prompt, which would consume the model's context window, nor does it embed the entire prompt when retrieving suitable tools. Instead, the tulip agent can search recursively for suitable tools in its extensible tool library, which is implemented as a vector store. This architecture significantly reduces inference costs, allows large tool libraries to be used, and lets the agent adapt and extend its tool set. We evaluate the architecture with several ablation studies in a mathematics context and demonstrate its generality with an application in robotics. A reference implementation and the benchmark are available at github.com/HRI-EU/tulip_agent.|\n", "2407.21646": "|**2024-07-31**|**Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent**|Shanbo Cheng 
et.al.|[2407.21646](http://arxiv.org/abs/2407.21646)|**[link](https://github.com/byteresearchcla/realsi)**|In this paper, we present Cross Language Agent -- Simultaneous Interpretation (CLASI), a high-quality simultaneous speech translation system that approaches human level. Inspired by professional interpreters, we adopt a novel data-driven read-write strategy to balance translation quality and latency. To address the challenge of translating domain-specific terminology, CLASI uses a multi-modal retrieval module to obtain relevant information that augments the translation. With the support of large language models, our approach takes the input audio, the historical context, and the retrieved information into account to produce error-tolerant translations. Experimental results show that our system significantly outperforms other systems on all metrics. 
Following professional human interpreters, we evaluate with a better metric, valid information proportion (VIP), which measures how much information is successfully conveyed to the listeners. In real-world scenarios, where speech is often disfluent, informal, and unclear, CLASI achieves a VIP of 81.3% and 78.0% in the Chinese-to-English and English-to-Chinese directions, respectively, whereas state-of-the-art commercial or open-source systems reach only 35.4% and 41.6%. On an extremely hard dataset, where other systems achieve a VIP below 13%, CLASI still achieves a VIP of 70%.|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu 
et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|Agents based on large language models (LLMs) have attracted wide attention and are becoming increasingly popular. Planning ability is a key component of an LLM-based agent: it involves interacting with the environment and executing actions to complete a planning task, which usually means reaching a desired goal from an initial state. This paper studies how to improve the planning abilities of LLMs through instruction tuning, referred to as agent training. Recent studies show that instruction-tuning LLMs on expert-level trajectories effectively improves their planning abilities. However, existing work focuses mainly on synthesizing trajectories from manually designed tasks and environments. Because creating these environments and tasks is labor-intensive, this limits the generation of sufficiently varied and extensive trajectories. To address this limitation, this paper explores the automated synthesis of diverse environments and of planning tasks that range gradually in difficulty from easy to hard. We introduce a framework named AgentGen, which uses an LLM first to generate environments and then to generate planning tasks conditioned on those environments. 
Specifically, to improve environment diversity, we propose using an inspiration corpus composed of text segments from different domains as the context for synthesizing environments. In addition, to increase the difficulty diversity of the generated planning tasks, we propose Bi-Evol, a bidirectional evolution method that evolves planning tasks in both easier and harder directions to synthesize a task set with a smooth difficulty curve. Evaluation results from AgentBoard show that AgentGen greatly improves the planning ability of LLMs. For example, Llama-3 8B instruction-tuned with AgentGen surpasses GPT-3.5 in overall performance, and on certain tasks it even outperforms GPT-4.|\n", "2408.00523": "|**2024-08-01**|**Jailbreaking Text-to-Image Models with LLM-Based Agents**|Yingkai Dong 
et.al.|[2408.00523](http://arxiv.org/abs/2408.00523)|null|\u8fd1\u671f\u7684\u8fdb\u5c55\u663e\u8457\u63d0\u5347\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u81ea\u52a8\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\u65b9\u9762\u7684\u8868\u73b0\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4e3b\u8981\u96c6\u4e2d\u5728\u5bf9\u8bdd\u3001\u7f16\u7a0b\u6216\u7279\u5b9a\u9886\u57df\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5728\u5904\u7406\u751f\u6210\u5f0fAI\u5b89\u5168\u4efb\u52a1\u65f6\u5b58\u5728\u7f3a\u53e3\u3002\u8fd9\u4e9b\u7f3a\u53e3\u4e3b\u8981\u662f\u7531LLM\u7684\u5e7b\u89c9\u95ee\u9898\u4ee5\u53ca\u7f3a\u4e4f\u660e\u786e\u6307\u5bfc\u539f\u5219\u6240\u5f15\u53d1\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAtlas\u7684\u9ad8\u7ea7LLM\u57fa\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u96c6\u6210\u4e86\u9ad8\u6548\u6a21\u7cca\u5316\u5de5\u4f5c\u6d41\u7a0b\uff0c\u4e13\u95e8\u9488\u5bf9\u9488\u5bf9\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u6a21\u578b\u7684\u653b\u51fb\u884c\u4e3a\uff0c\u7279\u522b\u662f\u9488\u5bf9\u5177\u6709\u5b89\u5168\u6027\u8fc7\u6ee4\u5668\u7684T2I\u6a21\u578b\u7684\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u3002 Atlas\u5229\u7528\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u6765\u8bc4\u4f30\u63d0\u793a\u662f\u5426\u89e6\u53d1\u4e86T2I\u6a21\u578b\u7684\u5b89\u5168\u6027\u8fc7\u6ee4\u5668\u3002\u7136\u540e\uff0c\u5b83\u901a\u8fc7\u8fed\u4ee3\u65b9\u5f0f\u4e0eLLM\u548cVLM\u534f\u4f5c\uff0c\u751f\u6210\u4e00\u4e2a\u7ed5\u8fc7\u8fc7\u6ee4\u5668\u7684\u66ff\u4ee3\u63d0\u793a\u3002\u6b64\u5916\uff0cAtlas\u901a\u8fc7\u5229\u7528\u591a\u4ee3\u7406\u901a\u4fe1\u3001\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u8bb0\u5fc6\u673a\u5236\u548c\u601d\u7ef4\u94fe\uff08COT\uff09\u65b9\u6cd5\uff0c\u589e\u5f3a\u4e86LLM\u5728\u653b\u51fb\u573a\u666f\u4e2d\u7684\u63a8\u7406\u80fd\u529b\u3002 
\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0cAtlas\u6210\u529f\u5730\u5728\u65e0\u6a21\u578b\u8bbe\u7f6e\u4e0b\u5bf9\u591a\u4e2a\u6700\u5148\u8fdb\u7684T2I\u6a21\u578b\u8fdb\u884c\u4e86\u201c\u8d8a\u72f1\u201d\uff0c\u8fd9\u4e9b\u6a21\u578b\u90fd\u914d\u5907\u4e86\u591a\u6a21\u6001\u5b89\u5168\u6027\u8fc7\u6ee4\u5668\u3002\u540c\u65f6\uff0cAtlas\u5728\u67e5\u8be2\u6548\u7387\u548c\u751f\u6210\u56fe\u50cf\u8d28\u91cf\u65b9\u9762\u5747\u8d85\u8d8a\u4e86\u73b0\u6709\u65b9\u6cd5\u3002|\n", "2408.00352": "|**2024-08-01**|**Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion**|Honglei Miao et.al.|[2408.00352](http://arxiv.org/abs/2408.00352)|null|\u6587\u672c\u5230\u52a8\u4f5c\uff08Text-to-Motion\uff0cT2M\uff09\u6a21\u578b\u901a\u8fc7\u6df1\u5ea6\u751f\u6210\u6a21\u578b\u9a71\u52a8\u7684\u4eba\u7c7b\u8fd0\u52a8\u751f\u6210\uff0c\u5728\u5e94\u7528\u4e2d\u5c55\u73b0\u51fa\u4ee4\u4eba\u4fe1\u670d\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4ece\u6587\u672c\u63d0\u793a\u751f\u6210\u771f\u5b9e\u52a8\u4f5c\u7684\u80fd\u529b\u5f15\u53d1\u4e86\u5b89\u5168\u95ee\u9898\uff0c\u5c24\u5176\u662f\u5f53\u5b83\u4eec\u53ef\u80fd\u88ab\u6076\u610f\u5229\u7528\u65f6\u3002\u5c3d\u7ba1\u5bf9T2M\u7684\u5174\u8da3\u65e5\u76ca\u589e\u957f\uff0c\u4f46\u5f88\u5c11\u6709\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u4fdd\u62a4\u8fd9\u4e9b\u6a21\u578b\u514d\u53d7\u5bf9\u6297\u6027\u653b\u51fb\u7684\u5f71\u54cd\u3002\u73b0\u6709\u9488\u5bf9\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u7684\u5de5\u4f5c\u5bf9\u4e8e\u72ec\u7279\u7684\u52a8\u4f5c\u9886\u57df\u6765\u8bf4\u5e76\u4e0d\u5145\u5206\u3002 
\u5728\u672c\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aALERT-Motion\u7684\u81ea\u4e3b\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u6784\u5efa\u9488\u5bf9\u9ed1\u76d2T2M\u6a21\u578b\u7684\u6709\u9488\u5bf9\u6027\u7684\u5bf9\u6297\u6027\u653b\u51fb\u3002\u4e0e\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u9884\u5b9a\u4e49\u89c4\u5219\u4fee\u6539\u63d0\u793a\u4e0d\u540c\uff0cALERT-Motion\u5229\u7528LLMs\u5bf9\u4eba\u7c7b\u52a8\u4f5c\u7684\u77e5\u8bc6\uff0c\u81ea\u4e3b\u751f\u6210\u5fae\u5999\u800c\u5f3a\u5927\u7684\u5bf9\u6297\u6027\u6587\u672c\u63cf\u8ff0\u3002\u8be5\u6846\u67b6\u5305\u542b\u4e24\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u4e00\u4e2a\u9002\u5e94\u6027\u8c03\u5ea6\u6a21\u5757\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u4ee5\u8fed\u4ee3\u5730\u7ec6\u5316\u548c\u641c\u7d22\u5bf9\u6297\u6027\u63d0\u793a\uff1b\u4ee5\u53ca\u4e00\u4e2a\u591a\u6a21\u6001\u4fe1\u606f\u5bf9\u6bd4\u6a21\u5757\uff0c\u63d0\u53d6\u4e0e\u52a8\u4f5c\u76f8\u5173\u7684\u5173\u952e\u8bed\u4e49\u4fe1\u606f\uff0c\u6307\u5bfc\u4ee3\u7406\u7684\u641c\u7d22\u3002 
\u901a\u8fc7\u8fd9\u4e00\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0cALERT-Motion\u80fd\u591f\u6784\u9020\u67e5\u8be2\u53d7\u5bb3\u6a21\u578b\u4ee5\u4ea7\u751f\u4e0e\u76ee\u6807\u52a8\u4f5c\u9ad8\u5ea6\u5339\u914d\u7684\u8f93\u51fa\u7684\u5bf9\u6297\u6027\u63d0\u793a\uff0c\u540c\u65f6\u907f\u514d\u660e\u663e\u7684\u6270\u52a8\u3002\u5728\u6d41\u884c\u7684T2M\u6a21\u578b\u4e0a\u8fdb\u884c\u7684\u8bc4\u4f30\u663e\u793a\u4e86ALERT-Motion\u76f8\u5bf9\u4e8e\u5148\u524d\u65b9\u6cd5\u7684\u4f18\u8d8a\u6027\uff0c\u5176\u5bf9\u6297\u6210\u529f\u7387\u66f4\u9ad8\uff0c\u5e76\u4e14\u5bf9\u6297\u6027\u63d0\u793a\u66f4\u52a0\u9690\u853d\u3002\u8fd9\u9879\u5173\u4e8eT2M\u5bf9\u6297\u6027\u653b\u51fb\u7684\u5f00\u521b\u6027\u5de5\u4f5c\u5f3a\u8c03\u4e86\u968f\u7740\u8fd0\u52a8\u751f\u6210\u6280\u672f\u7684\u53d1\u5c55\uff0c\u5f00\u53d1\u9632\u5fa1\u63aa\u65bd\u7684\u7d27\u8feb\u6027\uff0c\u8fd9\u4fc3\u4f7f\u6211\u4eec\u8fdb\u4e00\u6b65\u7814\u7a76\u5b89\u5168\u548c\u8d1f\u8d23\u4efb\u7684\u90e8\u7f72\u3002|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. 
We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. It consistently improves their performance against opposing agents, suggesting their ability to understand the actions of allies and adversaries and establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.|\n", "2408.02479": "|**2024-08-05**|**From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future**|Haolin Jin et.al.|[2408.02479](http://arxiv.org/abs/2408.02479)|null|With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artificial General Intelligence (AGI), combine LLMs as the core for decision-making and action-taking, addressing some of the inherent limitations of LLMs such as lack of autonomy and self-improvement. Despite numerous studies and surveys exploring the possibility of using LLMs in software engineering, it lacks a clear distinction between LLMs and LLM-based agents. It is still in its early stage for a unified standard and benchmarking to qualify an LLM solution as an LLM-based agent in its domain. 
In this survey, we broadly investigate the current practice and solutions for LLMs and LLM-based agents for software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We review and differentiate the work of LLMs and LLM-based agents from these six topics, examining their differences and similarities in tasks, benchmarks, and evaluation metrics. Finally, we discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering. We anticipate this work will shed some light on pushing the boundaries of LLM-based agents in software engineering for future research.|\n", "2408.02232": "|**2024-08-07**|**SpecRover: Code Intent Extraction via LLMs**|Haifeng Ruan et.al.|[2408.02232](http://arxiv.org/abs/2408.02232)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u7a0b\u5e8f\u5206\u6790\u80fd\u529b\u7ed3\u5408\u7684\u5f62\u5f0f\u4e0b\uff0c\u901a\u8fc7LLM\u4ee3\u7406\u81ea\u52a8\u6267\u884c\u7a0b\u5e8f\u6539\u8fdb\u548c\u9519\u8bef\u4fee\u590d\u7684\u9ad8\u6548\u4f4e\u8017\u5de5\u4f5c\u6d41\u7a0b\u3002\u7531\u4e8e\u7a0b\u5e8f\u6539\u8fdb\u6216\u4fee\u590d\u901a\u5e38\u9700\u8981\u660e\u786e\u671f\u671b\u7684\u884c\u4e3a\u89c4\u8303\uff0c\u56e0\u6b64\u89c4\u8303\u63a8\u65ad\u5bf9\u4e8e\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u4ee3\u7801\u8865\u4e01\u81f3\u5173\u91cd\u8981\u3002\u672c\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u5728\u8f6f\u4ef6\u9879\u76ee\u4e2d\u8fdb\u884c\u8fed\u4ee3\u4ee3\u7801\u641c\u7d22\u5e76\u914d\u5408\u89c4\u8303\u63a8\u65ad\u6765\u63a2\u7d22\u8fd9\u4e00\u9886\u57df\uff0c\u4ece\u800c\u4ece\u9879\u76ee\u7684\u7ed3\u6784\u548c\u884c\u4e3a\u4e2d\u63a8\u65ad\u51fa\u610f\u56fe\u3002\u6355\u83b7\u7684\u610f\u56fe\u5c06\u7531\u5ba1\u67e5\u8005\u4ee3\u7406\u8fdb\u884c\u5ba1\u67e5\uff0c\u4ee5\u9a8c\u8bc1\u8865\u4e01\u7684
\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u5bf9\u9a8c\u8bc1\u540e\u8865\u4e01\u4fe1\u5fc3\u5ea6\u91cf\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u201cSpecRover\u201d\uff08AutoCodeRover-v2\uff09\u5efa\u7acb\u5728\u5f00\u6e90\u7684LLM\u4ee3\u7406AutoCodeRover\u4e4b\u4e0a\u3002\u5728\u4f7f\u7528SWE-Bench\u5b8c\u6574\u96c6\u8bc4\u4f30\u65f6\uff0c\u5373\u9488\u5bf92294\u4e2aGitHub\u95ee\u9898\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u4e86\u76f8\u5bf9\u4e8eAutoCodeRover\u8d85\u8fc750%\u7684\u6548\u7387\u63d0\u5347\u3002\u4e0e\u73b0\u6709\u7684\u5f00\u6e90\u4ee3\u7406\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5728\u89e3\u51b3SWE-Bench lite\u4e2d\u7684\u5e73\u5747GitHub\u95ee\u9898\u65f6\uff0c\u6210\u672c\u4ec5\u4e3a0.65\u7f8e\u5143\u3002SpecRover\u751f\u6210\u7684\u89e3\u91ca\u80fd\u591f\u4e3a\u5f00\u53d1\u8005\u63d0\u4f9b\u66f4\u660e\u786e\u7684\u4fe1\u53f7\uff0c\u8868\u660e\u5efa\u8bae\u7684\u8865\u4e01\u53ef\u4ee5\u88ab\u6709\u4fe1\u5fc3\u5730\u63a5\u53d7\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8fd8\u5f3a\u8c03\u4e86\u5373\u4f7f\u5728LLM\u65f6\u4ee3\uff0c\u81ea\u52a8\u5316\u7a0b\u5e8f\u4fee\u590d\u6280\u672f\u4e2d\u89c4\u8303\u63a8\u65ad\u7684\u91cd\u8981\u6027\u3002|\n", "2408.01725": "|**2024-08-03**|**The Drama Machine: Simulating Character Development with LLM Agents**|Liam Magee 
et.al.|[2408.01725](http://arxiv.org/abs/2408.01725)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u4f7f\u7528\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u6765\u6a21\u62df\u590d\u6742\u52a8\u6001\u89d2\u8272\u5728\u620f\u5267\u6027\u573a\u666f\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u201c\u620f\u5267\u673a\u5668\u201d\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u534f\u8c03\u4e86\u626e\u6f14\u4e0d\u540c\u201c\u81ea\u6211\u201d\u548c\u201c\u8d85\u6211\u201d\u5fc3\u7406\u89d2\u8272\u7684LLM\u4ee3\u7406\u4e4b\u95f4\u7684\u4e92\u52a8\u3002\u5728\u89d2\u8272\u626e\u6f14\u6a21\u62df\u4e2d\uff0c\u8fd9\u79cd\u8bbe\u8ba1\u5141\u8bb8\u5728\u76f8\u4e92\u4f5c\u7528\u7684\u5bf9\u8bdd\u548c\u4e2a\u4f53\u5185\u90e8\u72ec\u767d\u4e4b\u95f4\u53d1\u5c55\u5e73\u884c\u7684\u4ea4\u4e92\u3002 \u6211\u4eec\u5c06\u6b64\u6846\u67b6\u5e94\u7528\u4e8e\u4e24\u4e2a\u620f\u5267\u573a\u666f\u2014\u2014\u9762\u8bd5\u548c\u4fa6\u63a2\u6545\u4e8b\uff0c\u5e76\u6bd4\u8f83\u4e86\u5728\u6709\u65e0\u201c\u8d85\u6211\u201d\u5f71\u54cd\u4e0b\u89d2\u8272\u53d1\u5c55\u7684\u5dee\u5f02\u3002\u5c3d\u7ba1\u662f\u521d\u6b65\u7814\u7a76\uff0c\u4f46\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u66f4\u52a0\u7ec6\u817b\u3001\u9002\u5e94\u6027\u5f3a\u7684\u6545\u4e8b\uff0c\u8fd9\u4e9b\u6545\u4e8b\u968f\u7740\u4e00\u7cfb\u5217\u5bf9\u8bdd\u56de\u5408\u7684\u53d1\u5c55\u800c\u6f14\u53d8\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u57fa\u4e8eLLM\u7684\u89d2\u8272\u626e\u6f14\u7684\u4e0d\u540c\u65b9\u5f0f\u4ee5\u53ca\u8fd9\u53ef\u80fd\u5bf9AI\u4e3b\u4f53\u6027\u7684\u6982\u5ff5\u5316\u610f\u5473\u7740\u4ec0\u4e48\u3002\u8bba\u6587\u6700\u540e\u8003\u8651\u4e86\u8fd9\u4e00\u65b9\u6cd5\u5982\u4f55\u4e3a\u601d\u8003AI\u6a21\u62df\u4e2d\u5185\u5728\u51b2\u7a81\u548c\u793e\u4f1a\u8868\u6f14\u6027\u7684\u4f5c\u7528\u63d0\u4f9b\u4e86\u53ef\u80fd\u6027\u3002|\n", "2408.01703": "|**2024-08-03**|**WaitGPT: Monitoring and Steering Conversational LLM 
Agent in Data Analysis with On-the-Fly Code Visualization**|Liwenhan Xie et.al.|[2408.01703](http://arxiv.org/abs/2408.01703)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u8fc7\u5bf9\u8bdd\u5f0f\u7528\u6237\u754c\u9762\u652f\u6301\u6570\u636e\u5206\u6790\uff0c\u4ee5OpenAI\u7684ChatGPT\uff08\u539f\u540dAdvanced Data Analysis\u6216Code Interpreter\uff09\u4e3a\u4ee3\u8868\u3002\u672c\u8d28\u4e0a\uff0cLLM\u751f\u6210\u4ee3\u7801\u4ee5\u5b8c\u6210\u5404\u79cd\u5206\u6790\u4efb\u52a1\u3002\u7136\u800c\uff0c\u76f4\u63a5\u5448\u73b0\u539f\u59cb\u4ee3\u7801\u53ef\u80fd\u4f1a\u4f7f\u903b\u8f91\u53d8\u5f97\u6a21\u7cca\uff0c\u5e76\u59a8\u788d\u7528\u6237\u9a8c\u8bc1\u3002\u4e3a\u4e86\u8d4b\u4e88\u7528\u6237\u5bf9\u7531LLM\u6267\u884c\u7684\u6570\u636e\u5206\u6790\u8fdb\u884c\u589e\u5f3a\u7406\u89e3\u4e0e\u63a7\u5236\u7684\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5c06LLM\u751f\u6210\u7684\u4ee3\u7801\u8f6c\u6362\u4e3a\u5b9e\u65f6\u4ea4\u4e92\u5f0f\u7684\u53ef\u89c6\u5316\u8868\u793a\u3002\u5728\u8be5\u65b9\u6cd5\u4e2d\uff0c\u7528\u6237\u53ef\u4ee5\u5b9e\u65f6\u83b7\u5f97\u6e05\u6670\u3001\u5206\u6b65\u7684LLM\u4ee3\u7801\u53ef\u89c6\u5316\uff0c\u5141\u8bb8\u4ed6\u4eec\u7406\u89e3\u3001\u9a8c\u8bc1\u5e76\u4fee\u6539\u5206\u6790\u4e2d\u7684\u6bcf\u4e2a\u6570\u636e\u64cd\u4f5c\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u51b3\u7b56\u57fa\u4e8e\u4e00\u9879\u63a2\u7d22\u7528\u6237\u5b9e\u8df5\u4e0e\u6311\u6218\u7684\u5f62\u6210\u6027\u7814\u7a76\uff08N=8\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u540d\u4e3aWaitGPT\u7684\u539f\u578b\uff0c\u5e76\u8fdb\u884c\u4e86\u4e00\u9879\u7528\u6237\u7814\u7a76\uff08N=12\uff09\uff0c\u4ee5\u8bc4\u4f30\u5176\u53ef\u7528\u6027\u548c\u6709\u6548\u6027\u3002\u7528\u6237\u7814\u7a76\u7684\u7ed3\u679c\u8868\u660e\uff0cWaitGPT\u6709\u52a9\u4e8e\u76d1\u63a7\u548c\u5f15\u5bfc\u7531LLM\u6267\u884c\u7684\u6570\u636e\u5206\u6790\uff0c\u4f7f\u53c2\u4e0e\u8005\u80fd\u591f\u63d0\u9
ad8\u9519\u8bef\u68c0\u6d4b\u80fd\u529b\u5e76\u589e\u52a0\u5bf9\u7ed3\u679c\u7684\u6574\u4f53\u4fe1\u5fc3\u3002|\n", "2408.01667": "|**2024-08-03**|**Automated Phishing Detection Using URLs and Webpages**|Huilin Wang et.al.|[2408.01667](http://arxiv.org/abs/2408.01667)|null|### \u6458\u8981 \u672c\u6587\u9879\u76ee\u805a\u7126\u4e8e\u901a\u8fc7\u6784\u5efa\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u4ee5\u89e3\u51b3\u4f20\u7edf\u57fa\u4e8e\u53c2\u8003\u7684\u9493\u9c7c\u68c0\u6d4b\u65b9\u6cd5\u6240\u9762\u4e34\u7684\u5c40\u9650\u6027\u3002\u8be5\u6846\u67b6\u901a\u8fc7\u4e3b\u52a8\u83b7\u53d6\u548c\u5229\u7528\u5728\u7ebf\u4fe1\u606f\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u52a8\u6001\u7684\u53c2\u8003\u7cfb\u7edf\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u7cbe\u786e\u7684\u9493\u9c7c\u68c0\u6d4b\u3002\u8fd9\u4e00\u521b\u65b0\u907f\u514d\u4e86\u4f9d\u8d56\u9759\u6001\u77e5\u8bc6\u5e93\u7684\u9700\u6c42\uff0c\u663e\u8457\u63d0\u5347\u4e86\u81ea\u52a8\u5316\u5b89\u5168\u63aa\u65bd\u7684\u9002\u5e94\u6027\u548c\u6548\u7387\u3002 ### \u9879\u76ee\u6982\u8ff0 \u9879\u76ee\u62a5\u544a\u9996\u5148\u5bf9\u73b0\u6709\u89e3\u51b3\u65b9\u6848\u8fdb\u884c\u4e86\u521d\u6b65\u7814\u7a76\u548c\u95ee\u9898\u5206\u6790\uff0c\u4fc3\u4f7f\u6211\u4eec\u5f00\u53d1\u51fa\u65b0\u7684\u6846\u67b6\u3002\u6211\u4eec\u4ee5\u6a21\u62df\u7684LLM\u4ee3\u7406\u6765\u5c55\u793a\u6846\u67b6\uff0c\u5e76\u8be6\u7ec6\u9610\u8ff0\u4e86\u6784\u5efa\u6240\u9700\u7684\u6280\u672f\uff0c\u968f\u540e\u63d0\u4f9b\u4e86\u5b8c\u6574\u5b9e\u65bd\u7684\u5b9e\u4f8b\u53ca\u5b9e\u9a8c\uff0c\u7528\u4e8e\u8bc4\u4f30\u65b0\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u540c\u7c7b\u89e3\u51b3\u65b9\u6848\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u51c6\u786e\u5ea6\u4e0a\u8fbe\u5230\u4e860.945\uff0c\u76f8\u6bd4\u73b0\u6709\u89e3\u51b3\u65b9\u6848DynaPhish\u9ad8\u51fa0.445\u4e2a\u767e\u5206\u70b9\u3002 ### \u6027\u80fd\u4e0e\u5c40\u9650 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u672c\u6846\u67b6\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u5f53\u524d\u57fa\u4e8e\u53c2\u8003\u7684\u9493\u9c7c\u68c0\u6d4b\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u5177\u6709\u9002\u5e94\u5b9e\u9645\u5e94\u7528\u7684\u6f5c\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4e5f\u8ba8\u8bba\u4e86\u8be5\u65b9\u6cd5\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u6539\u8fdb\u7b56\u7565\uff0c\u65e8\u5728\u8fdb\u4e00\u6b65\u63d0\u5347\u5176\u6548\u80fd\u3002 ### \u7ed3\u8bba \u63d0\u51fa\u7684\u6846\u67b6\u4e3a\u589e\u5f3a\u73b0\u6709\u7684\u57fa\u4e8e\u53c2\u8003\u7684\u9493\u9c7c\u68c0\u6d4b\u624b\u6bb5\u63d0\u4f9b\u4e86\u6709\u6548\u9014\u5f84\uff0c\u5e76\u4e14\u5177\u5907\u88ab\u5e94\u7528\u4e8e\u5b9e\u9645\u573a\u666f\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.03910": "|**2024-08-11**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982HumanEval\u548cMBPP\u7684\u72ec\u7acb\u4ee3\u7801\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u65f6\u5b58\u5728\u6311\u6218\u3002\u8fd9\u4fc3\u4f7f\u7814\u7a76\u754c\u63a2\u7d22\u5982\u4f55\u5728\u4ed3\u5e93\u7ea7\u522b\u4e0a\u589e\u5f3aLLM\u4e0e\u4ee3\u7801\u5e93\u7684\u4ea4\u4e92\u3002\u76ee\u524d\u7684\u89e3\u51b3\u65b9\u6848\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u6216\u624b\u52a8\u5de5\u5177\u548cAPI\uff0c\u6bcf\u79cd\u65b9\u6cd5\u90fd\u6709\u5176\u663e\u8457\u7684\u7f3a\u70b9\u3002\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u53ec\u56de\u7387\u5f80\u5f80\u8f83\u4f4e\uff0c\u800c\u624b\u52a8\u5de5\u5177\u548cAPI\u901a\u5e38\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\uff0c\u9700\u8981\u4e13\u5bb6\u77e5\u8bc6\uff0c\u964d\
u4f4e\u4e86\u5b83\u4eec\u5728\u4e0d\u540c\u4ee3\u7801\u4efb\u52a1\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6027\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86CodexGraph\u7cfb\u7edf\uff0c\u5b83\u7ed3\u5408\u4e86\u4ece\u4ee3\u7801\u4ed3\u5e93\u4e2d\u63d0\u53d6\u7684\u56fe\u6570\u636e\u5e93\u63a5\u53e3\u4e0eLLM\u4ee3\u7406\u3002\u901a\u8fc7\u5229\u7528\u56fe\u6570\u636e\u5e93\u7684\u7ed3\u6784\u7279\u6027\u548c\u56fe\u67e5\u8be2\u8bed\u8a00\u7684\u7075\u6d3b\u6027\uff0cCodexGraph\u4f7fLLM\u4ee3\u7406\u80fd\u591f\u6784\u5efa\u5e76\u6267\u884c\u67e5\u8be2\uff0c\u4ece\u800c\u5b9e\u73b0\u7cbe\u786e\u7684\u3001\u4ee3\u7801\u7ed3\u6784\u610f\u8bc6\u7684\u4e0a\u4e0b\u6587\u68c0\u7d22\u548c\u4ee3\u7801\u5bfc\u822a\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5CodexGraph\uff1aCrossCodeEval\u3001SWE-bench\u548cEvoCodeBench\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e94\u4e2a\u771f\u5b9e\u4e16\u754c\u7684\u7f16\u7801\u5e94\u7528\u3002\u901a\u8fc7\u4f7f\u7528\u7edf\u4e00\u7684\u56fe\u6570\u636e\u5e93\u6a21\u5f0f\uff0cCodexGraph\u5728\u5b66\u672f\u548c\u5b9e\u9645\u73af\u5883\u4e2d\u90fd\u5c55\u793a\u4e86\u7ade\u4e89\u529b\u548c\u6f5c\u529b\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u8f6f\u4ef6\u5de5\u7a0b\u9886\u57df\u7684\u591a\u7528\u9014\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u5e94\u7528\u6f14\u793a\uff1ahttps://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent\u3002**|\n", "2408.03631": "|**2024-08-07**|**Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent**|Yanhu Wang 
et.al.|[2408.03631](http://arxiv.org/abs/2408.03631)|null|\u4f20\u7edf\u7684\u57fa\u7ad9\u9009\u5740\uff08BSS\uff09\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u9a7e\u9a76\u6d4b\u8bd5\u548c\u7528\u6237\u53cd\u9988\uff0c\u8fd9\u65e2\u8d39\u65f6\u53c8\u9700\u8981\u5728\u901a\u4fe1\u3001\u7f51\u7edc\u548c\u4f18\u5316\u65b9\u9762\u5177\u5907\u4e13\u4e1a\u77e5\u8bc6\u7684\u4e13\u5bb6\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ca\u5176\u76f8\u5173\u6280\u672f\u7684\u53d1\u5c55\uff0c\u7279\u522b\u662f\u5728\u63d0\u793a\u5de5\u7a0b\u548c\u4ee3\u7406\u5de5\u7a0b\u9886\u57df\uff0c\u7f51\u7edc\u4f18\u5316\u5c06\u89c1\u8bc1\u4e00\u573a\u9769\u547d\u6027\u7684\u8f6c\u53d8\u3002\u8fd9\u79cd\u8f6c\u53d8\u6d89\u53ca\u5de7\u5999\u5730\u4f7f\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u6765\u5411\u8fd9\u4e9b\u590d\u6742\u800c\u5148\u8fdb\u7684LLMs\u6ce8\u5165\u4eba\u7c7b\u7ecf\u9a8c\u548c\u77e5\u8bc6\uff0c\u5e76\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u8fde\u63a5\u5230\u4eba\u7c7b\u7528\u6237\uff0c\u90e8\u7f72\u81ea\u4e3b\u4ee3\u7406\u4f5c\u4e3a\u901a\u4fe1\u6865\u6881\u3002\u8fd9\u79cd\u96c6\u6210\u4ee3\u8868\u4e86\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4f5c\u4e3a\u4e00\u79cd\u670d\u52a1\u548cAI\u4f7f\u751f\u6d3b\u66f4\u4fbf\u6377\u7684\u672a\u6765\u8303\u5f0f\u3002 
\u4f5c\u4e3a\u521d\u6b65\u63a2\u7d22\uff0c\u672c\u7814\u7a76\u9996\u5148\u5f00\u53d1\u4e86\u4e00\u4e2a\u7531LLM\u9a71\u52a8\u7684BSS\u4f18\u5316\u6846\u67b6\uff0c\u5e76\u63d0\u51fa\u4e86\u56db\u79cd\u6f5c\u5728\u7684\u5b9e\u73b0\u7b56\u7565\uff1a\u57fa\u4e8e\u4f18\u5316\u63d0\u793a\u7684LLM\uff08PoL\uff09\u3001\u4eba\u673a\u4ea4\u4e92\u7684LLM\uff08HiLL\uff09\u3001LLM\u9a71\u52a8\u7684\u81ea\u4e3bBSS\u4ee3\u7406\uff08LaBa\uff09\u4ee5\u53ca\u534f\u540c\u591a\u4e2aLLM\u9a71\u52a8\u7684\u81ea\u4e3bBSS\u4ee3\u7406\uff08CLaBa\uff09\u3002\u901a\u8fc7\u5728\u771f\u5b9e\u6570\u636e\u4e0a\u7684\u8bc4\u4f30\uff0c\u5b9e\u9a8c\u8868\u660e\uff0c\u501f\u52a9\u63d0\u793a\u7684LLM\u548c\u57fa\u4e8e\u4ee3\u7406\u7684LLM\u80fd\u591f\u751f\u6210\u66f4\u4e3a\u9ad8\u6548\u3001\u6210\u672c\u6548\u76ca\u9ad8\u4e14\u53ef\u9760\u7684\u7f51\u7edc\u90e8\u7f72\uff0c\u663e\u8457\u63d0\u9ad8\u4e86BSS\u4f18\u5316\u7684\u6548\u7387\u5e76\u51cf\u5c11\u4e86\u4e0d\u5fc5\u8981\u7684\u624b\u52a8\u53c2\u4e0e\u3002|\n", "2408.04168": "|**2024-08-08**|**Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions**|Qingbin Zeng 
et.al.|[2408.04168](http://arxiv.org/abs/2408.04168)|**[link](https://github.com/hiyouga/llama-factory)**|\u672c\u6587\u63a2\u8ba8\u4e86\u57ce\u5e02\u5bfc\u822a\u573a\u666f\u4e0b\u7684AI\u4ee3\u7406\u95ee\u9898\uff1a\u63d0\u4f9b\u76ee\u6807\u4f4d\u7f6e\u4e0e\u77e5\u540d\u5730\u6807\u4e4b\u95f4\u7684\u8bed\u8a00\u63cf\u8ff0\uff1b\u4ec5\u901a\u8fc7\u89c2\u5bdf\u5468\u56f4\u73af\u5883\uff0c\u5305\u62ec\u8bc6\u522b\u5730\u6807\u548c\u9053\u8def\u7f51\u7edc\u8fde\u63a5\uff0c\u4ee3\u7406\u9700\u8981\u4f5c\u51fa\u51b3\u7b56\u4ee5\u65e0\u6307\u793a\u5730\u5bfc\u822a\u81f3\u76ee\u6807\u4f4d\u7f6e\u3002\u8fd9\u4e00\u6311\u6218\u6027\u5728\u4e8e\uff0c\u5b83\u8981\u6c42\u4ee3\u7406\u5efa\u7acb\u81ea\u8eab\u5b9a\u4f4d\u5e76\u83b7\u53d6\u590d\u6742\u57ce\u5e02\u73af\u5883\u7684\u7a7a\u95f4\u8868\u793a\uff0c\u800c\u5730\u6807\u5f80\u5f80\u4e0d\u53ef\u89c1\u3002\u5728\u7f3a\u4e4f\u5bfc\u822a\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0c\u8fd9\u79cd\u80fd\u529b\u5bf9\u4e8e\u4ee3\u7406\u5728\u957f\u8ddd\u79bb\u57ce\u5e02\u5bfc\u822a\u4e2d\u505a\u51fa\u9ad8\u8d28\u91cf\u51b3\u7b56\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u63a8\u7406\u80fd\u529b\u7684\u6d8c\u73b0\uff0c\u4e00\u4e2a\u5438\u5f15\u4eba\u7684\u57fa\u7840\u65b9\u6cd5\u662f\u63d0\u793aLLMs\u5bf9\u6bcf\u6b21\u89c2\u5bdf\u505a\u51fa\u201c\u53cd\u5e94\u201d\u5e76\u636e\u6b64\u4f5c\u51fa\u51b3\u7b56\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7684\u6027\u80fd\u975e\u5e38\u5dee\uff0c\u4ee3\u7406\u7ecf\u5e38\u53cd\u590d\u8bbf\u95ee\u76f8\u540c\u4f4d\u7f6e\uff0c\u5e76\u4f5c\u51fa\u77ed\u89c6\u3001\u4e0d\u4e00\u81f4\u7684\u51b3\u7b56\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5176\u7279\u5f81\u5728\u4e8e\u611f\u77e5\u3001\u53cd\u601d\u548c\u89c4\u5212\u7684\u80fd\u529b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u53d1\u73b0\u7ecf\u8fc7\u5fae\u8c03\u7684LLaVA-7B\u80
fd\u591f\u51c6\u786e\u611f\u77e5\u5730\u6807\u7684\u65b9\u5411\u548c\u8ddd\u79bb\uff0c\u9002\u7528\u4e8e\u57ce\u5e02\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u8bb0\u5fc6\u673a\u5236\u5b9e\u73b0\u53cd\u601d\uff0c\u5373\u5b58\u50a8\u8fc7\u5f80\u7ecf\u9a8c\u5e76\u5728\u5f53\u524d\u611f\u77e5\u4e0b\u68c0\u7d22\uff0c\u4ee5\u8fdb\u884c\u6709\u6548\u7684\u51b3\u7b56\u8bba\u8bc1\u3002\u89c4\u5212\u5219\u5229\u7528\u53cd\u601d\u7ed3\u679c\u751f\u6210\u957f\u671f\u8ba1\u5212\uff0c\u4ece\u800c\u907f\u514d\u957f\u8ddd\u79bb\u5bfc\u822a\u4e2d\u7684\u77ed\u89c6\u51b3\u7b56\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bbe\u8ba1\u7684\u5de5\u4f5c\u6d41\u7a0b\u663e\u8457\u63d0\u9ad8\u4e86LLM\u4ee3\u7406\u7684\u5bfc\u822a\u80fd\u529b\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\u3002|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|\u672c\u6587\u65e8\u5728\u586b\u8865\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u4e0e\u4eba\u5de5\u901a\u7528\u667a\u80fd\uff08AGI\uff09\u63a5\u8fd1\u8fc7\u7a0b\u4e2d\u7814\u7a76\u7684\u7a7a\u767d\u3002\u5c3d\u7ba1LLM\u5c55\u73b0\u51fa\u51fa\u8272\u7684\u6cdb\u5316\u80fd\u529b\u548c\u6d8c\u73b0\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u7f3a\u4e4f\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u884c\u4e3a\u3001\u6f5c\u5728\u5931\u8d25\u539f\u56e0\u4ee5\u53ca\u5982\u4f55\u63d0\u5347\u5176\u6027\u80fd\u7684\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u5177\u6709\u6311\u6218\u6027\u7684\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5229\u7528\u4e86\u4e00\u4e2a\u540d\u4e3aTravelPlanner\u7684\u771f\u5b9e\u57fa\u51c6\uff0c\u5176\u4e2d\u7684\u4ee3\u7406\u5fc5\u987b\u6ee1\u8db3\u591a\u4e2a\u7ea6\u675f\u4ee5\u751f\u6210\u51c6\u786e\u7684\u8ba1\u5212\u3002\u
901a\u8fc7TravelPlanner\u57fa\u51c6\uff0c\u6211\u4eec\u9488\u5bf9\u56db\u4e2a\u5173\u952e\u7814\u7a76\u95ee\u9898\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\uff1a\uff081\uff09LLM\u4ee3\u7406\u5728\u5904\u7406\u957f\u7bc7\u548c\u5608\u6742\u4e0a\u4e0b\u6587\u65f6\uff0c\u5bf9\u4e8e\u63a8\u7406\u548c\u89c4\u5212\u7684\u9c81\u68d2\u6027\u662f\u5426\u8db3\u591f\uff1f\uff082\uff09\u5c11\u91cf\u63d0\u793a\u662f\u5426\u4f1a\u635f\u5bb3LLM\u4ee3\u7406\u5728\u957f\u4e0a\u4e0b\u6587\u573a\u666f\u4e0b\u7684\u6027\u80fd\uff1f\uff083\uff09\u6211\u4eec\u80fd\u5426\u4f9d\u8d56\u7ec6\u5316\u6765\u6539\u8fdb\u8ba1\u5212\uff1f\uff084\uff09\u5bf9LLM\u8fdb\u884c\u6b63\u8d1f\u53cd\u9988\u7ed3\u5408\u7684\u5fae\u8c03\u662f\u5426\u80fd\u5e26\u6765\u8fdb\u4e00\u6b65\u7684\u63d0\u5347\uff1f \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff1a\u9996\u5148\uff0c\u5c3d\u7ba1LLM\u80fd\u591f\u5904\u7406\u5927\u91cf\u7684\u53c2\u8003\u4fe1\u606f\u548c\u5c11\u91cf\u793a\u4f8b\uff0c\u5b83\u4eec\u5728\u5173\u6ce8\u957f\u4e0a\u4e0b\u6587\u4e2d\u5173\u952e\u90e8\u5206\u7684\u80fd\u529b\u4e0a\u4ecd\u7136\u5b58\u5728\u4e0d\u8db3\uff1b\u5176\u6b21\uff0c\u5b83\u4eec\u5728\u5206\u6790\u957f\u8ba1\u5212\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5e76\u4e14\u65e0\u6cd5\u63d0\u4f9b\u51c6\u786e\u7684\u53cd\u9988\u7528\u4e8e\u7ec6\u5316\uff1b\u7b2c\u4e09\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Feedback-Aware Fine-Tuning\uff08FAFT\uff09\uff0c\u4e00\u79cd\u5229\u7528\u6b63\u8d1f\u53cd\u9988\u76f8\u7ed3\u5408\u7684\u65b9\u6cd5\uff0c\u76f8\u8f83\u4e8e\u7eaf\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\uff0cFAFT\u5728\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u5347\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u793e\u533a\u63d0\u4f9b\u4e86\u5173\u4e8e\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u5e94\u7528\u65b9\u9762\u7684\u6df1\u5165\u89c1\u89e3\u3002|\n", "2408.05346": "|**2024-08-13**|**DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts**|Mohammed Saidul Islam 
et.al.|[2408.05346](http://arxiv.org/abs/2408.05346)|**[link](https://github.com/saidul-islam98/DataNarrative)**|Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, with textual annotations that explain the insights. However, creating such stories requires a deep understanding of the data and careful narrative planning, and it often needs human intervention, which is time-consuming and mentally taxing. While large language models (LLMs) excel at various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. We therefore introduce a new task, data story generation, together with a benchmark of 1,449 stories drawn from diverse sources. To address the challenge of crafting coherent data stories, we propose a multi-agent framework that uses two LLM agents to replicate the human storytelling process: one understands and describes the data and generates the outline and narration, and the other verifies each intermediate step. Although our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.|\n", 
"2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|Large language model (LLM) agents have shown great potential for solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of the real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks vary in their strengths: they excel at certain tasks and underperform on others. To fully harness this diversity, we propose DEI (Diversity Empowered Intelligence), a framework that leverages the agents' unique expertise. DEI acts as a meta-module on top of existing SWE agent frameworks and manages collectives of agents for enhanced problem solving. Experimental results show that a DEI-guided committee of agents surpasses the best individual agent by a large margin. For example, a group of open-source SWE agents whose best individual resolve rate on SWE-Bench Lite is 27.3% reaches a 34.3% resolve rate with DEI, a 25% relative improvement that beats many closed-source solutions. Our best-performing group achieves a 55% resolve rate and takes the highest ranking on SWE-Bench Lite. Our findings contribute to research on collaborative AI systems and show their potential for solving complex software engineering challenges.|\n", 
"2408.06520": "|**2024-08-12**|**Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning**|Chuanneng Sun et.al.|[2408.06520](http://arxiv.org/abs/2408.06520)|null|Large language models (LLMs) have demonstrated remarkable abilities on a variety of language tasks, which makes them promising candidates for decision-making in robotics. Inspired by hierarchical reinforcement learning (HRL), we propose Hierarchical in-Context Reinforcement Learning (HCRL), a novel framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly. The sub-tasks, each defined by a goal, are assigned to a low-level policy to complete. Once the LLM agent determines that a goal has been achieved, it proposes a new goal. To improve the agent's performance across multi-episode execution, we propose Hindsight Modular Reflection (HMR): instead of reflecting on the full trajectory, we replace the task objective with intermediate goals and let the agent reflect on shorter trajectories, which improves reflection efficiency. We evaluate the decision-making ability of the proposed HCRL in three benchmark environments: ALFWorld, Webshop, and HotpotQA. The results show that HCRL achieves performance improvements of 9%, 42%, and 10%, respectively, within five episodes of execution over strong in-context learning baselines.|\n", 
"2408.07199": "|**2024-08-13**|**Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents**|Pranav Putta 
et.al.|[2408.07199](http://arxiv.org/abs/2408.07199)|null|Large language models (LLMs) have shown remarkable capabilities on natural language tasks that require complex reasoning, yet applying them to agentic, multi-step reasoning in interactive environments remains a challenge. Traditional supervised pre-training on static datasets falls short of giving autonomous agents the capabilities they need for complex decision-making in dynamic settings such as web navigation. Previous attempts to bridge this gap through supervised fine-tuning often suffer from compounding errors and limited exploration data, which leads to sub-optimal policy outcomes. To overcome these challenges, we propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning on agent interactions using an off-policy variant of the Direct Preference Optimization (DPO) algorithm. This approach lets LLM agents learn effectively from both successful and unsuccessful trajectories, which improves their generalization on complex, multi-step reasoning tasks. We validate our approach in the WebShop environment, a simulated e-commerce platform, where it consistently outperforms behavior cloning and reinforced fine-tuning baselines and beats average human performance when equipped with online search. In real-world booking scenarios, our method raises the zero-shot success rate of the Llama-3 70B model from 18.6% to 81.7% (a 340% relative increase) after a single day of data collection, and further to 95.4% with online search. We believe this represents a substantial step forward in the capabilities of autonomous agents and paves the way for more sophisticated and reliable decision-making in real-world settings.|\n", 
"2408.08158": "|**2024-08-15**|**EmBARDiment: an Embodied AI Agent for Productivity in XR**|Riccardo Bovo et.al.|[2408.08158](http://arxiv.org/abs/2408.08158)|null|XR devices running chat-bots powered by large language models (LLMs) have tremendous potential as always-on agents that enable more productive workflows. However, screen-based chat-bots do not take advantage of the full suite of natural inputs available in XR, including inward-facing sensor data. Instead they over-rely on explicit voice or text prompts, sometimes paired with multimodal data dropped in as part of the query. We propose a solution that uses an attention framework to derive context implicitly from user actions, eye gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts and fosters grounded, intuitive interactions that glean user insights for the chat-bot. Our user studies demonstrate the feasibility and transformative potential of our approach for interacting with chat-bots in XR, and they offer insights for the design of future XR-embodied LLM agents.|\n", 
"2408.08054": "|**2024-08-15**|**Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework**|Changyu Du et.al.|[2408.08054](http://arxiv.org/abs/2408.08054)|null|The conventional building information modeling (BIM) authoring process typically requires designers to master complex and tedious modeling commands in order to realize their design intent within BIM authoring tools. This additional cognitive burden complicates the design process and hinders the adoption of BIM and model-based design in the architecture, engineering, and construction (AEC) industry. To let designers express their intent more intuitively, we propose Text2BIM, a multi-agent framework based on large language models (LLMs) that generates 3D building models from natural language instructions. The framework orchestrates multiple LLM agents that collaborate and reason to transform textual user input into imperative code that calls the APIs of the BIM authoring tool, thereby generating editable BIM models with internal layouts, external envelopes, and semantic information directly in the software. In addition, a rule-based model checker is introduced that uses predefined domain knowledge to guide the LLM agents in resolving issues in the generated models and iteratively improving model quality. Extensive experiments compare and analyze the performance of three different LLMs under the proposed framework. The evaluation results show that our approach can effectively generate high-quality, structurally sound building models that are consistent with the abstract concepts specified in the user input. Finally, an interactive software prototype was developed that integrates the framework into the BIM authoring software Vectorworks, demonstrating the potential of modeling by chatting.|\n", 
"2408.09955": "|**2024-08-20**|**MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems**|Qian Wang 
et.al.|[2408.09955](http://arxiv.org/abs/2408.09955)|null|With the rise of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, the agents in these systems mostly follow predefined standard operating procedures (SOPs) that remain unchanged throughout the interaction, so they lack autonomy and scalability. In addition, current solutions often overlook the need for effective agent cooperation. To overcome these limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM agent systems. MegaAgent leverages the autonomy of agents to generate new agents dynamically according to task requirements, and it integrates automatic task division, system-level planning and monitoring of agent activities, and management of concurrent operations. MegaAgent also adopts a hierarchical structure and uses system-level parallelism to improve performance and communication efficiency. We demonstrate the effectiveness of MegaAgent through Gobang game development, where it outperforms popular LLM-MA systems, and through a national policy simulation, which shows its high autonomy and its ability to scale rapidly to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first autonomous large-scale LLM-MA system with no predefined SOPs, high effectiveness, and high scalability, paving the way for further research in this field. Our code is available at.|\n", 
"2408.09785": "|**2024-08-19**|**GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making**|Arsham Gholamzadeh Khoee 
et.al.|[2408.09785](http://arxiv.org/abs/2408.09785)|null|In the automotive industry, traditional methods for making software deployment decisions typically rely on manual analysis of tabular test data. Because they are labor-intensive, these methods often lead to higher costs and delays in the software release cycle. Large language models (LLMs) offer a promising solution to these problems. However, applying them usually requires multiple rounds of human-driven prompt engineering, which limits their practical deployment for industrial end users, particularly those who need reliable and efficient results. This paper proposes GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and industrial constraints. Unlike previous systems, GoNoGo is specifically tailored to domain-specific and risk-sensitive systems. We evaluate GoNoGo across different levels of task difficulty using zero-shot and few-shot examples taken from industrial practice. The results show that GoNoGo achieves a 100% success rate on tasks up to Level 2 difficulty with 3-shot examples and maintains high performance even on more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks and significantly reduces the need for manual intervention. In summary, GoNoGo is an efficient and user-friendly LLM-based solution that is currently used at our industrial partner's company to assist with software release decisions, supporting better-informed and more timely decisions in the release process for risk-sensitive vehicle systems.|\n", 
"2408.09559": "|**2024-08-18**|**HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model**|Mengkang Hu et.al.|[2408.09559](http://arxiv.org/abs/2408.09559)|**[link](https://github.com/hiagent2024/hiagent)**|**Agents based on large language models (LLMs) show great potential across many domains, acting as interactive systems that process environmental observations and generate executable actions to complete target tasks. Their effectiveness is strongly influenced by their memory mechanism, which records historical experience as a sequence of action-observation pairs. We divide memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. Although research on optimizing cross-trial memory has made considerable progress, improving agent performance through more efficient use of working memory remains underexplored. Existing approaches often feed the entire history of action-observation pairs directly into the LLM, which leads to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper proposes HiAgent, a framework that uses subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts the LLM to formulate a subgoal before generating executable actions, and it lets the LLM proactively decide to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal. Experimental results on five long-horizon tasks show that HiAgent achieves a twofold increase in success rate and reduces the average number of steps by 3.8. Our analysis further shows that HiAgent improves performance consistently across steps, which highlights its robustness and generalizability. Project page: https://github.com/HiAgent2024/HiAgent**|\n", 
"2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu 
et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**Large language models (LLMs) have shown potential in Vision-and-Language Navigation (VLN) tasks, yet current applications still face challenges. While LLMs excel in general conversation scenarios, they struggle with specialized navigation tasks and perform worse than models designed specifically for VLN. We therefore introduce FLAME (FLAMingo-Architected Embodied Agent), a novel multimodal LLM-based agent and architecture designed for urban VLN tasks that handles multiple observations efficiently. Our approach uses a three-phase tuning technique to adapt to navigation tasks: single-perception tuning for street view description, multiple-perception tuning for trajectory summarization, and end-to-end training on VLN datasets. The augmented datasets are synthesized automatically. Experimental results show that FLAME outperforms existing methods, with a 7.3% increase in task completion rate on the Touchdown dataset. This work demonstrates the potential of multimodal LLMs in complex navigation tasks and represents progress toward practical applications of multimodal LLMs in embodied AI. Project page: https://flame-sjtu.github.io**|\n", 
"2408.11021": "|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|Owing to their emergent capabilities, large language models (LLMs) are being used as language-based agents that perform a variety of tasks and make decisions with increasing autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and carry out complex tasks using the tools available to them. As agent capabilities expand, ensuring their safety and trustworthiness becomes ever more important. This study introduces the Athena framework, which builds on the concept of verbal contrastive learning: past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent toward safety while it completes a given task. The framework also incorporates a critiquing mechanism that guides the agent away from risky actions at every step. Furthermore, because no existing benchmark evaluates the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation shows that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.|\n", 
"2408.10455": "|**2024-08-24**|**IDEA:Enhancing the Rule Learning Ability of Language Agents through Induction, Deduction, and Abduction**|Kaiyu He et.al.|[2408.10455](http://arxiv.org/abs/2408.10455)|null|This paper introduces RULEARN, a new benchmark designed to assess the inductive reasoning abilities of large language models (LLMs) in interactive environments. In RULEARN, agents interact with the environment to gather observations, infer patterns from them, and use those patterns to solve problems. To strengthen the inductive reasoning of LLM agents on this benchmark, we propose the IDEA agent, which integrates three reasoning processes: induction, deduction, and abduction. The IDEA agent follows a structured reasoning sequence: it first generates hypotheses through abduction, then tests them through deduction, and finally refines them adaptively based on feedback. This sequence lets agents establish and apply rules dynamically, mimicking human reasoning. Our evaluation of five representative LLMs shows that, although these models can generate plausible initial hypotheses, they struggle with strategic interaction within the environment, effective incorporation of feedback, and adaptive refinement of their hypotheses. The IDEA agent, by contrast, shows significantly improved performance on the RULEARN benchmark and offers valuable insights for developing agents capable of human-like rule learning in real-world scenarios. We will release our code and data.|\n", 
"2408.12142": "|**2024-08-22**|**MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents**|Congchi Yin et.al.|[2408.12142](http://arxiv.org/abs/2408.12142)|**[link](https://github.com/lemonsis/mdd-5k)**|**The clinical diagnosis of most mental disorders relies primarily on conversations between clinician and patient. Building datasets of such diagnostic conversations promises to advance AI mental healthcare. However, collecting conversations directly in real diagnostic settings is extremely difficult because of strict privacy and ethical constraints. To address this problem, we synthesize diagnostic conversations from anonymized patient cases, which are easier to obtain. Specifically, we design a neuro-symbolic multi-agent framework that uses large language models to synthesize diagnostic conversations for mental disorders. The framework takes a patient case as input and can generate multiple diverse conversations from a single case. At its core, a doctor agent interacts with a patient agent, and a tool agent supplies a dynamic diagnosis tree that places text generation under symbolic control. By applying the proposed method, we develop MDD-5k, the largest Chinese mental disorder diagnosis dataset. It is built on 1000 cleaned real patient cases in cooperation with a leading psychiatric hospital and contains 5000 high-quality long conversations labeled with their diagnosis results. To the best of our knowledge, it is the first labeled Chinese dataset for mental disorder diagnosis. Human evaluation shows that the proposed MDD-5k dataset successfully simulates the diagnostic process for mental disorders. The dataset and code will be made publicly available at https://github.com/lemonsis/MDD-5k.**|\n", 
"2408.12680": "|**2024-09-01**|**Can LLMs Understand Social Norms in Autonomous Driving Games?**|Boxuan Wang 
et.al.|[2408.12680](http://arxiv.org/abs/2408.12680)|null|This paper explores the application of large language models (LLMs) to understanding and modeling social norms in autonomous driving games. We introduce LLMs into autonomous driving games as intelligent agents that make decisions from text prompts describing the environment setup and their observations. Our framework has LLM-based agents play Markov games in a multi-agent system (MAS), which lets us study how social norms emerge among individual agents. We design experiments that use the OpenAI Chat API, powered by GPT-4.0, to simulate interactions and evaluate the performance of LLM-based agents in two scenarios: an unsignalized intersection game and a highway platoon game. The results show that LLM-based agents can handle dynamically changing environments in Markov games and that social norms form among the agents in both scenarios. In the intersection game, LLM-based agents tend to adopt a conservative driving policy when facing a potential crash. The advantage of LLM-based agents in games lies in their operational flexibility and analyzability, which facilitate experimental design.|\n", "2408.14307": "|**2024-08-26**|**LLM-3D 
Print: Large Language Models To Monitor and Control 3D Printing**|Yayati Jadhav et.al.|[2408.14307](http://arxiv.org/abs/2408.14307)|null|Industry 4.0 has transformed manufacturing by driving digitalization and shifting the paradigm toward additive manufacturing (AM). Fused Deposition Modeling (FDM), a key AM technology, creates highly customized, cost-effective products with minimal material waste through layer-by-layer extrusion, and it poses a significant challenge to traditional subtractive methods. However, material extrusion techniques are prone to errors and often require expert intervention to detect and mitigate defects that can severely compromise product quality. Automated error detection and machine learning models exist, but they generalize poorly across different 3D printer setups, firmware, and sensors, and deep learning methods require large labeled datasets, which limits their scalability and adaptability. To address these challenges, we present a process monitoring and control framework that uses large language models (LLMs) alongside 3D printers to detect and resolve printing defects. The LLM evaluates print quality by analyzing images captured after each layer or print segment, identifies failure modes, and queries the printer for relevant parameters. It then generates and executes a corrective action plan. We validated the framework's ability to identify defects by comparing it against a group of engineers with varying levels of AM expertise. Our evaluation shows that LLM-based agents not only accurately identify common 3D printing errors, such as inconsistent extrusion, stringing, warping, and layer adhesion problems, but also effectively determine the parameters that cause these failures and correct them autonomously, without any human intervention.|\n", 
"2408.14033": "|**2024-09-02**|**MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents**|Ruochen Li 
et.al.|[2408.14033](http://arxiv.org/abs/2408.14033)|**[link](https://github.com/du-nlp-lab/mlr-copilot)**|**Machine learning research is crucial to technological progress and innovation, but it often faces challenges such as high complexity, long experimental cycles, and the need for specialized expertise. To address these challenges, we propose a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot), designed to raise the productivity of machine learning research by using large language model (LLM) agents to automatically generate and implement research ideas. The framework consists of three phases: research idea generation, experiment implementation, and execution. First, an LLM-powered IdeaAgent uses existing research papers to generate hypotheses and experimental plans. Next, in the implementation generation phase, an ExperimentAgent translates these plans into executable code. This phase leverages retrieved prototype code and, where needed, retrieves candidate models and data. Finally, the execution phase, also managed by the ExperimentAgent, runs the experiments with mechanisms for human feedback and iterative debugging to increase the likelihood of achieving executable research outcomes. 
We evaluate the framework on five machine learning research tasks, and the experimental results show its potential to facilitate research progress and innovation.**|\n", "2408.13986": "|**2024-08-26**|**AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework**|Jie Feng et.al.|[2408.13986](http://arxiv.org/abs/2408.13986)|**[link](https://github.com/tsinghua-fib-lab/agentmove)**|**Human mobility prediction plays a crucial role in various real-world applications. Although deep learning models have shown promising results over the past decade, their reliance on large amounts of private mobility data for training and their inability to perform zero-shot prediction have hindered further progress. Recently, attempts have been made to apply large language models (LLMs) to the mobility prediction task. However, their performance is limited by the lack of a systematically designed workflow. They use LLMs directly to generate the final output, which limits the LLMs' potential to uncover complex mobility patterns and underestimates their extensive reserve of global geospatial knowledge. This paper proposes AgentMove, a systematic agentic prediction framework for generalized mobility prediction in any city worldwide. 
In AgentMove, we first decompose the mobility prediction task into three sub-tasks and design a module for each: a spatial-temporal memory for mining individual mobility patterns, a world knowledge generator for modeling the effects of urban structure, and a collective knowledge extractor for capturing patterns shared across the population. Finally, we combine the results of the three modules and perform a reasoning step to generate the final prediction. Extensive experiments on mobility data from 12 cities across two sources show that AgentMove outperforms the best baseline by more than 8% on various metrics, delivers robust predictions across cities, performs well with different base LLMs, and exhibits lower geographical bias. The code and data can be found at https://github.com/tsinghua-fib-lab/AgentMove.**|\n", "2408.13406": "|**2024-08-23**|**Optimizing Collaboration of LLM based Agents for Finite Element Analysis**|Chuan Tian 
et.al.|[2408.13406](http://arxiv.org/abs/2408.13406)|null|This paper investigates interactions among multiple large language model (LLM) agents in programming and coding tasks. We use the AutoGen framework to facilitate communication among agents and evaluate different configurations by their success rates over 40 random runs per setup. The study focuses on developing a flexible automation framework for applying the finite element method to solve linear elastic problems. Our findings emphasize the importance of optimizing agent roles and clearly defining their responsibilities, rather than merely increasing the number of agents. Effective collaboration among agents proves crucial for addressing the general challenges of the finite element method. This research demonstrates the potential of LLM multi-agent systems to enhance computational automation in simulation methodologies, paving the way for future advances in engineering and artificial intelligence.|\n", "2408.14972": "|**2024-08-27**|**AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems**|Chi-Min Chan 
et.al.|[2408.14972](http://arxiv.org/abs/2408.14972)|**[link](https://github.com/chanchimin/agentmonitor)**|**The rapid advancement of large language models (LLMs) has led to the rise of LLM-based agents. Recent research shows that multi-agent systems (MAS), in which each agent plays a specific role, typically outperform a single LLM. However, configuring an MAS for a task remains challenging, because task performance can only be observed after execution. Inspired by scaling laws in LLM development, we investigate whether the performance of an MAS can be predicted before the task is executed. To this end, we introduce AgentMonitor, a framework that integrates at the agent level to capture inputs and outputs and transform them into statistics used to train a regression model that predicts task performance. In addition, AgentMonitor can apply real-time corrections to security risks posed by malicious agents, mitigating negative impacts and enhancing MAS security. 
Experiments show that an XGBoost model achieves a Spearman correlation of 0.89 in in-domain scenarios and 0.58 in more challenging scenarios. Applying AgentMonitor reduces harmful content by 6.2% and increases helpful content by 1.8% on average, improving safety and reliability. The related code has been open-sourced.**|\n", "2408.15778": "|**2024-09-05**|**LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models**|Jiayi Gui et.al.|[2408.15778](http://arxiv.org/abs/2408.15778)|**[link](https://github.com/hypatiaalegra/logicgame-data)**|This paper introduces LogicGame, a new benchmark designed to evaluate the comprehensive rule understanding, execution, and multi-step planning abilities of large language models (LLMs). Unlike traditional benchmarks, LogicGame provides a variety of games, each with a set of rules and an initial state, requiring models to understand and apply predefined rules to solve problems. We create simulated scenarios in which models execute or plan operations to reach specific goals. These game scenarios are specifically designed to distinguish logical reasoning from mere knowledge by relying exclusively on predefined rules. This separation allows a pure assessment of rule-based reasoning capabilities. 
The evaluation considers not only final outcomes but also intermediate steps, providing a comprehensive assessment of model performance. Moreover, these intermediate steps are deterministic and can be verified automatically. LogicGame defines game scenarios at varying difficulty levels, from simple rule applications to complex reasoning chains, in order to precisely evaluate model performance on rule understanding and multi-step execution. Using LogicGame, we test various LLMs and identify notable shortcomings in their rule-based logical reasoning abilities.|\n", "2408.16090": "|**2024-08-28**|**EPO: Hierarchical LLM Agents with Environment Preference Optimization**|Qi Zhao et.al.|[2408.16090](http://arxiv.org/abs/2408.16090)|**[link](https://github.com/kevinz8866/epo)**|This paper proposes a hierarchical framework that decomposes complex tasks into manageable subgoals. The framework uses separate language models for subgoal prediction and low-level action generation. To address the challenge of creating training signals for unannotated datasets, we develop a reward model that uses multimodal environment feedback to automatically generate reward signals. 
We introduce Environment Preference Optimization (EPO), a method that generates preference signals from environment feedback and uses them to train LLM-based agents. Experiments on ALFRED show that our framework achieves state-of-the-art performance, taking first place on the ALFRED public leaderboard and demonstrating its potential to improve long-horizon decision-making in diverse environments.|\n", "2408.16991": "|**2024-08-30**|**Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios**|Zhongyuan Wang et.al.|[2408.16991](http://arxiv.org/abs/2408.16991)|null|This paper proposes a tool-assisted agent framework for SQL inspection and refinement, aimed at improving the ability of large language models (LLMs) to handle real-world queries. The framework equips an LLM-based agent with two specialized tools, a retriever and a detector, to diagnose and correct database mismatches in SQL queries. These tools strengthen the LLM's ability to handle database mismatches that arise in real-world scenarios, such as condition mismatches and strict constraint mismatches. 
We also introduce Spider-Mismatch, a new dataset built specifically to reflect the condition mismatch problems encountered in real-world scenarios. Experiments show that, in few-shot settings, our method achieves the best average performance on the Spider and Spider-Realistic datasets, significantly outperforms baseline methods, and also performs better on the more realistic Spider-Mismatch dataset.|\n", "2409.00993": "|**2024-09-02**|**Evolution of Social Norms in LLM Agents using Natural Language**|Ilya Horiguchi et.al.|[2409.00993](http://arxiv.org/abs/2409.00993)|null|Recent advances in large language models (LLMs) have spurred interest in using them for game-theoretic simulations, in which LLMs act as individual agents engaging in social interactions. This study explores whether LLM agents can spontaneously generate and adhere to normative strategies through natural language discourse, building on and extending Axelrod's work on metanorm games. Our experiments show that, through dialogue, LLM agents can form complex social norms purely via natural language interaction, such as metanorms: norms that enforce the punishment of those who fail to punish cheating. 
The results confirm the effectiveness of using LLM agents to simulate social interactions and to understand how complex strategies and norms emerge and evolve through natural language. Future work may extend these findings to a wider range of scenarios and agent characteristics to reveal more of the subtle mechanisms behind social norm formation.|\n", "2409.00985": "|**2024-09-02**|**Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces**|Jiapeng Yu et.al.|[2409.00985](http://arxiv.org/abs/2409.00985)|**[link](https://github.com/yuqian2003/co_learning)**|**Online question-and-answer systems based on large language models (LLMs) have gradually shifted from recreational use to professional applications. This paper proposes a multi-agent framework called the Code Learning (Co-Learning) community, combined with environmental reinforcement learning (E-RL), to help beginners correct code errors independently. It evaluates the performance of multiple LLMs on an original dataset of 702 erroneous code samples and uses the results as the reward or punishment criterion for E-RL. By analyzing the erroneous code passed to the current agent, it selects the appropriate LLM-based agent to maximize error-correction accuracy and reduce correction time. 
Experiments show that, compared with the method without E-RL, this approach improves the precision score by 3% and reduces time cost by 15%. Our source code is available at https://github.com/yuqian2003/Co_Learning**|\n", "2409.00135": "|**2024-08-29**|**HoneyComb: A Flexible LLM-Based Agent System for Materials Science**|Huan Zhang et.al.|[2409.00135](http://arxiv.org/abs/2409.00135)|null|To handle the complexity of materials science tasks and address problems that large language models (LLMs) face in this field, such as inaccuracy and hallucination caused by reliance on outdated implicit knowledge, we propose HoneyComb, the first LLM-based agent system designed specifically for materials science. HoneyComb uses a high-quality materials science knowledge base built from reliable literature (MatSciKB) and an innovative tool hub (ToolHub) to strengthen its reasoning and computational capabilities specific to materials science. 
MatSciKB is a carefully curated, structured knowledge collection intended to cover key information in materials science. ToolHub uses an inductive tool construction method to generate, decompose, and refine API tools for materials science, which greatly improves the system's practicality. In addition, HoneyComb includes a retriever module that selects the most suitable knowledge source or tool for a given task, ensuring accurate and relevant answers. Experiments show that HoneyComb significantly outperforms baseline models across a variety of materials science tasks, bridging the gap between current LLM capabilities and the specialized needs of materials science. More importantly, our adaptable framework can be readily extended to other scientific domains, showing broad potential for advancing scientific research and applications.|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang 
et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**Despite substantial progress of large language models (LLMs) in automatic poetry generation, the generated poetry lacks diversity, and the training process differs greatly from human learning. Under the rationale that the learning process of a poetry generation system should be more human-like and its output more diverse and novel, we introduce a framework based on social learning that emphasizes non-cooperative interactions alongside cooperative ones to encourage diversity. Our experiments are the first attempt at LLM-based multi-agent systems in non-cooperative environments for poetry generation, employing both training-based agents (GPT-2) and prompting-based agents (GPT-3 and GPT-4). 
Our evaluation of 96,000 generated poems shows that the framework benefits the poetry generation process for training-based agents, increasing n-gram diversity by 3.0-3.7 percentage points (pp) and novelty by 5.6-11.3 pp. The poetry generated by training-based agents also exhibits group divergence in lexicon, style, and semantics. Prompting-based agents in our framework also benefit from non-cooperative environments, and a more diverse ensemble of models with non-homogeneous agents has the potential to further enhance diversity, with an increase of 7.0-17.5 pp in our experiments. However, prompting-based agents show a decline in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. Our paper argues that creative tasks such as automatic poetry generation should incorporate social learning processes (modeled with LLM-based agents) that mimic human interaction.**|\n", "2409.03440": "|**2024-09-05**|**Rx Strategist: Prescription Verification using LLM Agents System**|Phuc Phan Van 
et.al.|[2409.03440](http://arxiv.org/abs/2409.03440)|null|To protect patient safety, the complexity of modern pharmaceuticals demands strict prescription verification. We propose a new approach, Rx Strategist, that uses knowledge graphs and different search strategies to enhance the power of large language models (LLMs) within an agentic framework. This multifaceted technique enables a multi-stage LLM pipeline and reliable information retrieval from a custom-built active ingredient database. The stages of the pipeline cover different facets of prescription verification, such as indication, dosage, and possible drug interactions. By spreading reasoning across these stages, we mitigate the drawbacks of monolithic LLM techniques, improving correctness and reliability while reducing memory demands. Our findings show that Rx Strategist surpasses many current LLMs and achieves performance comparable to that of an experienced clinical pharmacist. In the complex world of modern medications, combining LLMs with structured knowledge and sophisticated search methods offers a viable path to reducing prescription errors and improving patient outcomes.|\n", "2409.03258": "|**2024-09-05**|**GraphInsight: 
Unlocking Insights in Large Language Models for Graph Structure Understanding**|Yukun Cao et.al.|[2409.03258](http://arxiv.org/abs/2409.03258)|null|Although large language models (LLMs) have shown potential in graph processing, they struggle to comprehend graph structure information from prompts made of graph description sequences, especially as graph size increases. We attribute this to the uneven memory performance of LLMs across different positions in the graph description sequence, known as positional bias. To address this, we propose GraphInsight, a new framework aimed at improving LLMs' comprehension of both macro- and micro-level graph information. GraphInsight rests on two key strategies: 1) placing critical graph information in positions where LLMs exhibit stronger memory performance, and 2) exploring a lightweight external knowledge base for regions with weaker memory performance, inspired by retrieval-augmented generation (RAG). Moreover, GraphInsight explores integrating these two strategies into LLM agent processes for composite graph tasks that require multi-step reasoning. 
Extensive benchmark experiments show that GraphInsight significantly outperforms all other graph description methods (e.g., prompting techniques and reordering strategies) on graph structure understanding tasks across graphs of varying sizes.|\n", "2409.02977": "|**2024-09-04**|**Large Language Model-Based Agents for Software Engineering: A Survey**|Junwei Liu et.al.|[2409.02977](http://arxiv.org/abs/2409.02977)|**[link](https://github.com/fudanselab/agent4se-paper-list)**|**This paper presents a comprehensive and systematic survey of large language model (LLM)-based agents for software engineering (SE). We collect 106 papers and categorize them from two perspectives: the SE perspective and the agent perspective. We also discuss the key open challenges and future directions in this field. The repository for this survey is at https://github.com/FudanSELab/Agent4SE-Paper-List.**|\n", "2409.05001": "|**2024-09-08**|**A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement**|Huan Zhang 
et.al.|[2409.05001](http://arxiv.org/abs/2409.05001)|**[link](https://github.com/nju-websoft/paircoder)**|**\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4e86\u4ee4\u4eba\u77a9\u76ee\u7684\u6027\u80fd\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u901a\u8fc7\u63d0\u793a\u6280\u672f\u53ca\u4ee3\u7801\u7cbe\u70bc\u5bf9LLM\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u590d\u6742\u7f16\u7a0b\u95ee\u9898\u65f6\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u95ee\u9898\u5f80\u5f80\u5177\u6709\u50f5\u5316\u7684\u89e3\u51b3\u65b9\u6848\u8ba1\u5212\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPairCoder\u7684\u65b0\u578bLLM\u57fa\u6846\u67b6\uff0c\u65e8\u5728\u6a21\u4eff\u53cc\u4eba\u534f\u4f5c\u7f16\u7a0b\u5b9e\u8df5\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002 PairCoder\u7531\u4e24\u4e2a\u534f\u4f5c\u7684LLM\u4ee3\u7406\u7ec4\u6210\uff1a\u5bfc\u822a\u5458\uff08Navigator\uff09\u548c\u9a7e\u9a76\u5458\uff08Driver\uff09\u3002\u5bfc\u822a\u5458\u8d1f\u8d23\u63d0\u51fa\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u8ba1\u5212\u3001\u9009\u62e9\u5f53\u524d\u6700\u4f73\u8ba1\u5212\uff0c\u5e76\u6839\u636e\u6267\u884c\u53cd\u9988\u6307\u5bfc\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u9a7e\u9a76\u5458\u5219\u9075\u5faa\u5bfc\u822a\u5458\u7684\u6307\u5f15\uff0c\u8fdb\u884c\u521d\u59cb\u4ee3\u7801\u751f\u6210\u3001\u4ee3\u7801\u6d4b\u8bd5\u548c\u4f18\u5316\u3002 
\u8fd9\u79cd\u4ea4\u66ff\u548c\u8fed\u4ee3\u7684\u5de5\u4f5c\u6d41\u7a0b\u5305\u62ec\u591a\u8ba1\u5212\u63a2\u7d22\u548c\u57fa\u4e8e\u53cd\u9988\u7684\u7ec6\u5316\uff0c\u6a21\u62df\u4e86\u53cc\u4eba\u7a0b\u5e8f\u5458\u7684\u5408\u4f5c\u65b9\u5f0f\u3002\u6211\u4eec\u4f7f\u7528\u5f00\u6e90\u548c\u95ed\u6e90\u7684LLM\uff0c\u5728\u591a\u79cd\u4ee3\u7801\u751f\u6210\u57fa\u51c6\u4e0a\u5bf9PairCoder\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPairCoder\u5728\u51c6\u786e\u6027\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u76f4\u63a5\u4f7f\u7528\u63d0\u793a\u7684LLM\uff0c\u76f8\u5bf9pass@1\u63d0\u9ad8\u4e8612.00%-162.43%\u3002**|\n", "2409.04617": "|**2024-09-06**|**Sparse Rewards Can Self-Train Dialogue Agents**|Barrett Martin Lattimer et.al.|[2409.04617](http://arxiv.org/abs/2409.04617)|**[link](https://github.com/asappresearch/josh-llm-simulation-training)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u591a\u8f6e\u5bf9\u8bdd\u4efb\u52a1\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\u4e3b\u8981\u7531\u76d1\u7763\u5fae\u8c03\u548c\u9ad8\u8d28\u91cf\u7684\u4eba\u7c7b\u53cd\u9988\u9a71\u52a8\u3002\u7136\u800c\uff0c\u968f\u7740\u57fa\u7840LLM\u6a21\u578b\u6027\u80fd\u7684\u6301\u7eed\u63d0\u5347\uff0c\u83b7\u53d6\u6709\u610f\u4e49\u7684\u4eba\u7c7b\u53cd\u9988\u53d8\u5f97\u8d8a\u6765\u8d8a\u56f0\u96be\u4e14\u6210\u672c\u9ad8\u6602\u3002\u5728\u67d0\u4e9b\u9886\u57df\u4e2d\uff0c\u57fa\u7840LLM\u53ef\u80fd\u6700\u7ec8\u8d85\u8d8a\u4eba\u7c7b\u80fd\u529b\uff0c\u4f7f\u5f97\u4f20\u7edf\u7684\u57fa\u4e8e\u53cd\u9988\u7684\u65b9\u6cd5\u53d8\u5f97\u4e0d\u5207\u5b9e\u9645\u3002\u56e0\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u81ea\u6211\u6539\u8fdb\u8303\u5f0f\uff0c\u5141\u8bb8LLM\u4ee3\u7406\u5728\u6ca1\u6709\u5916\u90e8\u4eba\u7c7b\u53cd\u9988\u7684\u60c5\u51b5\u4e0b\u81ea\u4e3b\u63d0\u9ad8\u5176\u6027\u80fd\u3002 
We introduce Juxtaposed Outcomes for Simulation Harvesting (JOSH), a self-alignment algorithm that uses a sparse-reward simulation environment to extract ideal behaviors and then further trains the LLM on its own outputs. We build ToolWOZ, a sparse-reward simulation environment for tool calling derived from MultiWOZ. Experiments show that models trained with JOSH, whether small or frontier, improve significantly on tool-based interactions while retaining general model capabilities across a variety of benchmarks. Our code and data are publicly available on GitHub.**|\n", "2409.06351": "|**2024-09-10**|**MAGDA: Multi-agent guideline-driven diagnostic assistance**|David Bani-Harouni
et.al.|[2409.06351](http://arxiv.org/abs/2409.06351)|null|In emergency departments, remote hospitals, or clinics in developing countries, clinicians often lack fast image analysis by trained radiologists, which can harm patient care. Large language models (LLMs) could relieve some of the pressure on these clinicians by providing insights that support their decision-making. Although these LLMs score highly on medical exams that showcase their theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new zero-shot, guideline-driven decision support approach. We build a system of multiple LLM agents, augmented with a contrastive vision-language model, that collaborate to reach a patient diagnosis. After being given simple diagnostic guidelines, the agents synthesize prompts and screen the image for findings according to those guidelines. Finally, they provide an understandable chain of reasoning for their diagnosis, which is then self-refined to account for interdependencies between diseases. Because our method is zero-shot, it suits rare-disease settings where training data is limited but expert-crafted disease descriptions are available. We evaluate the method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showing improved performance over existing zero-shot methods and generalizability to rare diseases.|\n",
"2409.09030": "|**2024-09-23**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanlin Wang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**In recent years, large language models (LLMs) have achieved remarkable success across downstream tasks and have been widely applied in software engineering (SE). We find that many studies combining LLMs with SE explicitly or implicitly adopt the concept of agents. However, there is no in-depth survey that traces the development of existing work, analyzes how these works combine LLM-based agent technologies to optimize various tasks, and clarifies the framework of LLM-based agents in SE. This paper conducts the first survey of studies combining LLM-based agents with SE and presents a framework of LLM-based agents in SE comprising three key modules: perception, memory, and action. We also summarize the problems that arise when combining the two fields and propose future opportunities in response to these challenges. The related papers are collected in a GitHub repository (https://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE).**|\n",
"2409.09013": "|**2024-09-13**|**AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents**|Zhe Su et.al.|[2409.09013](http://arxiv.org/abs/2409.09013)|null|To be deployed safely and successfully, large language models (LLMs) must satisfy both truthfulness and utility goals. However, the two goals are often in conflict, for example when an AI assistant helps a used-car salesperson sell a flawed car. The conflict is partly due to ambiguous or misleading user instructions. We propose AI-LieDar, a framework for studying how LLM-based agents handle conflicts between utility and truthfulness in a multi-turn interactive setting.
We design a set of realistic scenarios in which language agents are instructed to pursue goals that conflict with being truthful during a multi-turn conversation. To evaluate truthfulness at scale, we develop a truthfulness detector, grounded in the psychology literature, to assess the agents' responses. Our experiments show that all models are truthful less than 50% of the time, although the rates of goal achievement (utility) and truthfulness vary across models. We further test the steerability of LLMs and find that models follow malicious instructions to deceive, and that even models steered toward truthfulness can still lie. These findings reveal the complexity of truthfulness in LLMs and underscore the importance of further research to ensure the safe and reliable deployment of LLMs and AI agents.|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava
et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|Ensuring that content complies with community guidelines is crucial for maintaining healthy online social environments. However, traditional human-based compliance checking struggles to scale with the growing volume of user-generated content and the limited number of moderators. Recent advances in natural language understanding by large language models open new opportunities for automated content compliance verification. This work evaluates six AI agents built on Open-LLMs for automated rule compliance checking in decentralized social networks, a challenging setting because community scopes and rules are heterogeneous. By analyzing more than 50,000 posts from hundreds of Mastodon servers, we find that the AI agents effectively detect non-compliant content, grasp linguistic subtleties, and adapt to diverse community contexts. Most agents also show high agreement with one another and consistency in their score justifications and compliance suggestions. Human evaluation by domain experts confirms the agents' reliability and usefulness, indicating that they are promising tools for semi-automated or human-in-the-loop content moderation systems.|\n",
"2409.08717": "|**2024-09-13**|**Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents**|Junchi Yao et.al.|[2409.08717](http://arxiv.org/abs/2409.08717)|null|As social media increasingly becomes a key platform where social movements shape public opinion, accurately simulating and predicting user opinion dynamics is essential for understanding social phenomena, informing policy, and guiding public opinion. However, existing simulation methods struggle to capture the complexity and dynamics of user behavior. To address this, the paper proposes FDE-LLM, an algorithm for simulating the opinion dynamics of social media users that fuses opinion dynamics with epidemic models, constraining the actions and opinion evolution of large language models (LLMs) so that they better match real online behavior. Specifically, FDE-LLM divides users into opinion leaders and followers. Opinion leaders are driven by LLM role-playing and constrained by a cellular automaton (CA) model, while opinion followers are integrated into a dynamical system that combines the CA model with the SIR model. This design significantly improves simulation accuracy and efficiency. Experiments on four real Weibo datasets, validated with the open-source model ChatGLM, show that FDE-LLM outperforms traditional agent-based modeling (ABM) opinion dynamics algorithms and LLM-based opinion diffusion algorithms in both accuracy and interpretability.|\n",
"2409.10372": "|**2024-09-19**|**Instigating Cooperation among LLM Agents Using Adaptive Information Modulation**|Qiliang Chen et.al.|[2409.10372](http://arxiv.org/abs/2409.10372)|null|This paper presents a novel framework that uses large language model (LLM) agents as proxies for human strategic behavior and combines them with reinforcement learning (RL) to engage these agents in evolving strategic interactions within team environments. Our approach extends traditional agent-based simulation by using strategic LLM agents (SLA) and by introducing dynamic, adaptive governance through a pro-social promoting RL agent (PPA), which modulates information access between agents in a network to optimize social welfare and promote pro-social behavior. Through validation in iterated games, including the prisoner's dilemma, we show that the SLA agents exhibit nuanced strategic adaptations. The PPA agent effectively learns to adjust information transparency, which leads to markedly higher cooperation rates. The framework offers significant insights into AI-mediated social dynamics and contributes to the deployment of AI in real-world team settings.|\n",
"2409.09785": "|**2024-09-17**|**Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition**|Chao-Han Huck Yang et.al.|[2409.09785](http://arxiv.org/abs/2409.09785)|null|Driven by recent advances in generative AI, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using the text decoding results of automatic speech recognition (ASR) models. To explore new capabilities of language modeling for speech processing, the paper introduces the Generative Speech Transcription Error Correction (GenSEC) challenge. The challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate how future LLM-based agents will handle voice-based interfaces, while remaining accessible to a broad audience through open pretrained language models or agent-based APIs. The paper also discusses the results of baseline evaluations and lessons learned for designing future evaluations.|\n",
"2409.09584": "|**2024-09-15**|**RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation**|Qingyao Li et.al.|[2409.09584](http://arxiv.org/abs/2409.09584)|null|This paper studies LLM agents combined with tree search algorithms for code generation. Current search algorithms in this area suffer from low search quality for three reasons: 1) a search space poorly designed for the high reasoning demands of code generation; 2) inadequate integration of code feedback into the search process; 3) inefficient handling of negative feedback, which lowers both search quality and efficiency.
To address these issues, we propose RethinkMCTS (Rethink Monte Carlo Tree Search). The method searches over thoughts at multiple levels before generating code, which lets it explore a wider range of strategies. More importantly, RethinkMCTS builds verbal feedback from fine-grained code execution feedback to correct erroneous thoughts during the search. This keeps the search on correct reasoning paths and improves the overall quality of the search tree. Experiments show that RethinkMCTS clearly outperforms previous search-based and feedback-based code generation baselines. On HumanEval, RethinkMCTS raises the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 and that of GPT-4o-mini from 87.20 to 94.51. By exploring more thoroughly and improving the quality of the entire search tree, RethinkMCTS makes the search process both more comprehensive and deeper.|\n", "2409.09345": "|**2024-09-14**|**Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models**|Yuanzhao Zhai
et.al.|[2409.09345](http://arxiv.org/abs/2409.09345)|null|This paper proposes using task-relevant Q-value models to guide action selection and so improve large language model (LLM) agents on multi-step decision-making tasks. We first collect decision trajectories annotated with step-level Q values via Monte Carlo Tree Search (MCTS) and build a preference dataset. We then fit these preferences with another LLM through step-level Direct Policy Optimization (DPO), and the resulting model serves as the Q-value model. At inference time, at each decision step, the LLM agent selects the action with the highest Q value before interacting with the environment. Applied to a range of open-source and API-based LLM agents, Q-value models significantly improve performance. Notably, an agent built on Phi-3-mini-4k-instruct improves by 103% on WebShop and by 75% on HotPotQA, even surpassing GPT-4o-mini. Q-value models also offer several advantages, such as generalization to different LLM agents and seamless integration with existing prompting strategies.|\n", "2409.09271": "|**2024-09-14**|**Python Symbolic
Execution with LLM-powered Code Generation**|Wenhan Wang et.al.|[2409.09271](http://arxiv.org/abs/2409.09271)|null|This paper presents LLM-Sym, an LLM-empowered agent that addresses the main challenges of applying symbolic execution to dynamically typed languages such as Python. By automatically calling the SMT solver Z3 to solve execution path constraints, LLM-Sym extends a basic symbolic execution engine to support programs that use the complex data type `list`. Its core contribution is translating complex Python path constraints into Z3 code. To make the path-to-Z3 translation accurate, we design a multi-step code generation pipeline that includes type inference, retrieval, and self-refinement. Experiments show that LLM-Sym can solve path constraints for LeetCode problems with complex control flow and list data structures, which the basic symbolic execution engine cannot. The approach paves the way for combining LLMs with the reasoning ability of symbolic solvers and opens new opportunities for LLM-assisted test case generation.|\n", "2409.11393": "|**2024-09-17**|**LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive
Core-Agents**|Amine B. Hassouna et.al.|[2409.11393](http://arxiv.org/abs/2409.11393)|null|This paper proposes LLM-Agent-UMF, a unified modeling framework for LLM-based agents, to address the non-uniform software architectures that have resulted from integrating tools into LLM-powered agents and from the enhancements proposed in many state-of-the-art works. These integrations and follow-up works have traditionally focused on functionality rather than on defining component boundaries, which has led to terminological and architectural confusion among researchers. The framework clearly distinguishes the components of an agent: the LLM, the tools, and a newly introduced core-agent, which acts as the agent's central coordinator and comprises five modules: planning, memory, profile, action, and security. Differences in the internal structure of core-agents lead us to classify them as passive or active. Based on this classification, we propose several multi-core agent architectures that combine the distinct characteristics of individual agents.
To validate the framework, we apply it to a selection of state-of-the-art agents, showing that it aligns with their functionality and clarifying architectural aspects that were previously overlooked. We also thoroughly assess four of the proposed architectures by integrating agents with different characteristics into hybrid active/passive core-agent systems, which gives clear insight into the potential improvements and the challenges of combining specific agents.|\n", "2409.11276": "|**2024-09-17**|**Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments**|Maria Rigaki et.al.|[2409.11276](http://arxiv.org/abs/2409.11276)|null|This paper explores using a locally fine-tuned large language model (LLM) as a red-team agent in network security environments. Given the privacy concerns, costs, and network connectivity constraints of commercial cloud-based LLMs, we present Hackphyr, a locally fine-tuned 7-billion-parameter model for red-team tasks in network security environments. The model runs on a single GPU card and achieves performance comparable to much larger and more powerful commercial models such as GPT-4.
Hackphyr clearly outperforms other models, including GPT-3.5-turbo and baselines such as Q-learning agents, in complex, previously unseen scenarios. To achieve this performance, we built a new dataset specific to network security tasks to enhance the base model's capabilities. Finally, we conduct a comprehensive analysis of the agents' behavior that provides insights into the planning abilities and potential limitations of such LLM-based agents, contributing to a broader understanding of how these agents can be applied in network security.|\n", "2409.10568": "|**2024-09-14**|**On the limits of agency in agent-based models**|Ayush Chopra
et.al.|[2409.10568](http://arxiv.org/abs/2409.10568)|**[link](https://github.com/agenttorch/agenttorch)**|**This paper introduces AgentTorch, a framework that scales agent-based models (ABMs) to millions of agents while using large language models (LLMs) to give agents adaptive behavior. The framework aims to capture realistic environment dynamics and adaptive agent behavior when simulating complex systems, while still simulating very large populations efficiently. Recent advances in LLMs offer an opportunity to enhance ABMs, but the computational cost of using LLMs for large numbers of agents has limited their adoption. We experimentally evaluate the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and the behavioral detail of individual agents. Using the COVID-19 pandemic as a case study, we show how AgentTorch can simulate 8.4 million agents representing New York City, capturing the impact of isolation and employment behavior on health and economic outcomes. We compare different agent architectures, based on heuristics and on LLMs, in predicting disease waves and unemployment rates.
We also demonstrate AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project that is currently used around the world for policy-making and scientific discovery. The framework is available at github.com/AgentTorch/AgentTorch.**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|Multimodal large language models (MLLMs) have enabled LLM-based agents to interact directly with application user interfaces (UIs), improving agent performance on complex tasks. However, these agents often suffer from high latency and low reliability because of the many sequential UI interactions involved. To address this, we propose AXIS, a novel LLM-based agent framework that prioritizes actions through application programming interfaces (APIs) over UI actions. The framework also facilitates the creation and expansion of APIs through automated exploration of applications.
Our experiments on Office Word show that, compared with humans, AXIS cuts task completion time by 65%-70% and cognitive workload by 38%-53% while maintaining an accuracy of 97%-98%. Our work contributes a human-agent-computer interaction (HACI) framework and new UI design principles for application providers in the era of LLMs, and it explores the possibility of turning every application into an agent, paving the way toward an agent-centric operating system (Agent OS).|\n", "2409.16455": "|**2024-09-24**|**MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment**|Venkata Naren Devarakonda et.al.|[2409.16455](http://arxiv.org/abs/2409.16455)|null|This paper presents MultiTalk, an LLM-based task planning method. LLMs used for task planning can produce flawed or incomplete plans because of hallucinations, ambiguities in user instructions, environmental constraints, and limits on the executing agent's capabilities. MultiTalk addresses these issues through a framework of introspective and extrospective dialogue loops.
MultiTalk\u65b9\u6cd5\u901a\u8fc7\u7279\u5b9a\u7cfb\u7edf\u6765\u63d0\u53d6\u548c\u9884\u6d4b\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u72b6\u6001\uff0c\u5e76\u6807\u8bb0\u51fa\u4eba\u3001LLM\u4ee3\u7406\u548c\u73af\u5883\u4e4b\u95f4\u7684\u4e0d\u5339\u914d\u6216\u504f\u5dee\u3002\u6709\u6548\u7684\u53cd\u9988\u8def\u5f84\u4fc3\u8fdb\u4eba\u4e0eLLM\u4e4b\u95f4\u7684\u6709\u610f\u4e49\u5bf9\u8bdd\u3002\u8fd9\u79cd\u65b9\u6cd5\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u4efb\u52a1\u7684\u5e94\u7528\u4e2d\u5f97\u5230\u4e86\u9a8c\u8bc1\u3002\u5b9e\u9a8c\u548c\u6d88\u878d\u5206\u6790\u5c55\u793a\u4e86MultiTalk\u65b9\u6cd5\u7684\u7a33\u5065\u6027\u548c\u53ef\u9760\u6027\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u7684\u6bd4\u8f83\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u4f53\u4ee3\u7406\u4efb\u52a1\u89c4\u5212\u65b9\u9762\u7684\u4f18\u52bf\u3002 \u603b\u4e4b\uff0cMultiTalk\u63d0\u4f9b\u4e86\u4e00\u79cd\u901a\u8fc7\u589e\u5f3aLLM\u4e0e\u73af\u5883\u3001\u6267\u884c\u8005\u548c\u7528\u6237\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u548c\u6c9f\u901a\u6765\u6539\u8fdb\u4efb\u52a1\u89c4\u5212\u8fc7\u7a0b\u7684\u65b9\u6cd5\uff0c\u4ece\u800c\u63d0\u9ad8\u89c4\u5212\u7684\u6709\u6548\u6027\u548c\u6548\u7387\u3002|\n", "2409.15623": "|**2024-09-23**|**Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality**|Yiwen Xu et.al.|[2409.15623](http://arxiv.org/abs/2409.15623)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSafe Guard\u7684LLM\u4ee3\u7406\uff0c\u7528\u4e8e\u68c0\u6d4b\u793e\u4ea4VR\uff08VRChat\uff09\u4e2d\u7684\u8bed\u97f3\u4ea4\u4e92\u4e2d\u7684\u4ec7\u6068\u8a00\u8bba\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u5229\u7528\u4e86Open AI 
GPT\u548c\u97f3\u9891\u7279\u5f81\u63d0\u53d6\u6280\u672f\uff0c\u5b9e\u73b0\u4e86\u5b9e\u65f6\u8bed\u97f3\u4ea4\u4e92\u7684\u68c0\u6d4b\u529f\u80fd\u3002\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u7cfb\u7edf\u8bbe\u8ba1\u4ee5\u53ca\u5bf9\u8be5\u7cfb\u7edf\u7684\u8bc4\u4f30\uff0c\u8fd9\u4e9b\u90fd\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u68c0\u6d4b\u4ec7\u6068\u8a00\u8bba\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e14\u76f8\u6bd4\u73b0\u6709\u65b9\u6cd5\u663e\u8457\u964d\u4f4e\u4e86\u8bef\u62a5\u7387\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u521b\u5efa\u66f4\u5b89\u5168\u7684\u865a\u62df\u73af\u5883\u65b9\u9762\u5177\u6709\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u53d1\u5c55\u57fa\u4e8eLLM\u7684\u7ba1\u7406\u65b9\u6cd5\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2409.14913": "|**2024-09-25**|**Towards a Realistic Long-Term Benchmark for Open-Web Research Agents**|Peter M\u00fchlbacher et.al.|[2409.14913](http://arxiv.org/abs/2409.14913)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5373\u5c06\u63a8\u51fa\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u7ecf\u6d4e\u4ef7\u503c\u9ad8\u7684\u767d\u9886\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u5bf9\u91d1\u878d\u548c\u54a8\u8be2\u9886\u57df\u5e38\u89c4\u8fdb\u884c\u7684\u3001\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u201c\u6742\u4e71\u201d\u5f00\u653e\u7f51\u7edc\u7814\u7a76\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u8fd9\u6837\u505a\uff0c\u6211\u4eec\u4e3a\u5efa\u7acb\u4e00\u4e2aLLM\u4ee3\u7406\u8bc4\u4f30\u5957\u4ef6\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5728\u8be5\u5957\u4ef6\u4e2d\uff0c\u826f\u597d\u7684\u6027\u80fd\u76f4\u63a5\u5bf9\u5e94\u7740\u5de8\u5927\u7684\u7ecf\u6d4e\u548c\u793e\u4f1a\u5f71\u54cd\u3002\u6211\u4eec\u6784\u5efa\u5e76\u6d4b\u8bd5\u4e86\u591a\u4e2a\u4ee3\u7406\u67b6\u6784\uff0c\u5305\u62eco1-preview\u3001GPT-4o\u3001Claude-3.5 
Sonnet\u3001Llama 3.1\uff08405b\uff09\u4ee5\u53caGPT-4o-mini\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4f7f\u7528Claude-3.5 Sonnet\u548co1-preview\u7684LLM\u4ee3\u7406\u5728\u6027\u80fd\u4e0a\u660e\u663e\u4f18\u4e8e\u4f7f\u7528GPT-4o\u7684\u4ee3\u7406\uff0c\u800c\u57fa\u4e8eLlama 3.1\uff08405b\uff09\u548cGPT-4o-mini\u7684\u4ee3\u7406\u5219\u843d\u540e\u5f88\u591a\u3002\u5728\u6240\u6709LLM\u4e2d\uff0c\u5177\u6709\u59d4\u6258\u5b50\u4efb\u52a1\u7ed9\u5b50\u4ee3\u7406\u80fd\u529b\u7684ReAct\u67b6\u6784\u8868\u73b0\u6700\u4f73\u3002\u9664\u4e86\u5b9a\u91cf\u8bc4\u4f30\u4e4b\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u68c0\u67e5\u4ee3\u7406\u7684\u8ffd\u8e2a\u8bb0\u5f55\u548c\u53cd\u601d\u5b83\u4eec\u7684\u89c2\u5bdf\u7ed3\u679c\uff0c\u5bf9\u4ee3\u7406\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u5b9a\u6027\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4ee3\u8868\u4e86\u9996\u6b21\u6df1\u5165\u8bc4\u4f30\u4ee3\u7406\u5728\u771f\u5b9e\u5f00\u653e\u7f51\u7edc\u4e0a\u6267\u884c\u5177\u6709\u6311\u6218\u6027\u7684\u3001\u7ecf\u6d4e\u4e0a\u6709\u4ef7\u503c\u7684\u5206\u6790\u5e08\u5f0f\u7814\u7a76\u7684\u80fd\u529b\u3002|\n", "2409.14807": "|**2024-09-23**|**Interpreting Multi-band Galaxy Observations with Large Language Model-Based Agents**|Zechang Sun 
et.al.|[2409.14807](http://arxiv.org/abs/2409.14807)|null|\u672c\u6587\u5c55\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e3a\u57fa\u7840\u7684\u667a\u80fd\u4f53\u5982\u4f55\u52a0\u901f\u5929\u6587\u5b66\u7814\u7a76\u6d41\u7a0b\uff0c\u901a\u8fc7\u6a21\u4eff\u4eba\u7c7b\u63a8\u7406\u6765\u89e3\u91ca\u591a\u6ce2\u6bb5\u661f\u7cfb\u89c2\u6d4b\u6570\u636e\u3002\u6211\u4eec\u63d0\u51fa\u4e86mephisto\u6846\u67b6\uff0c\u5b83\u80fd\u591f\u4e0eCIGALE\u4ee3\u7801\u5e93\u534f\u4f5c\uff0c\u540e\u8005\u5305\u542b\u4e86\u7528\u4e8e\u89e3\u91ca\u89c2\u6d4b\u6570\u636e\u7684\u5149\u8c31\u80fd\u91cf\u5206\u5e03\uff08SED\uff09\u6a21\u578b\u3002\u5728\u5f00\u653e\u4e16\u754c\u73af\u5883\u4e2d\uff0cmephisto\u901a\u8fc7\u81ea\u6211\u6e38\u620f\u7ecf\u9a8c\u5b66\u4e60\u3001\u6267\u884c\u6811\u641c\u7d22\u5e76\u79ef\u7d2f\u52a8\u6001\u66f4\u65b0\u7684\u77e5\u8bc6\u57fa\u7840\u3002\u4f5c\u4e3a\u6982\u5ff5\u9a8c\u8bc1\uff0c\u6211\u4eec\u5c06mephisto\u5e94\u7528\u4e8e\u8a79\u59c6\u65af\u97e6\u4f2f\u592a\u7a7a\u671b\u8fdc\u955c\u7684\u6700\u65b0\u6570\u636e\u96c6\u3002\u7ed3\u679c\u8868\u660e\uff0cmephisto\u5728\u63a8\u7406\u661f\u7cfb\u7269\u7406\u573a\u666f\u65b9\u9762\u8fbe\u5230\u4e86\u63a5\u8fd1\u4eba\u7c7b\u7684\u4e13\u4e1a\u6c34\u5e73\uff0c\u751a\u81f3\u5728\u5904\u7406\u65b0\u53d1\u73b0\u7684\u201c\u5c0f\u7ea2\u70b9\u201d\u661f\u7cfb\u65f6\u4e5f\u662f\u5982\u6b64\u3002\u8fd9\u662f\u667a\u80fd\u4f53\u8fdb\u884c\u5929\u6587\u5b66\u7814\u7a76\u7684\u9996\u6b21\u5c55\u793a\uff0c\u671d\u7740\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u5b9e\u73b0\u7aef\u5230\u7aef\u7814\u7a76\u7684\u65b9\u5411\u8fc8\u8fdb\uff0c\u53ef\u80fd\u6709\u52a9\u4e8e\u52a0\u5feb\u5929\u6587\u53d1\u73b0\u7684\u901f\u5ea6\u3002|\n", "2409.14488": "|**2024-09-22**|**Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks**|Ruoyu Song 
et.al.|[2409.14488](http://arxiv.org/abs/2409.14488)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u81ea\u52a8\u9a7e\u9a76\uff08AD\uff09\u7cfb\u7edf\u96c6\u6210\u7684\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0cAD\u7cfb\u7edf\u9762\u4e34\u7740\u653b\u51fb\u5176\u5bf9\u8c61\u68c0\u6d4b\u4e0e\u8ffd\u8e2a\uff08ODT\uff09\u529f\u80fd\u7684\u98ce\u9669\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u9488\u5bf9\u56db\u4e2a\u8fd1\u671f\u63d0\u51fa\u7684LLM\u4ee3\u7406\u7684ODT\u653b\u51fb\u6210\u529f\u7387\u8fbe\u523063.26%\uff0c\u5bfc\u81f4\u5b83\u4eec\u5d29\u6e83\u6216\u8fdd\u53cd\u4ea4\u901a\u89c4\u5219\uff0c\u539f\u56e0\u5728\u4e8e\u8bef\u5bfc\u6027\u8bb0\u5fc6\u6a21\u5757\u63d0\u4f9b\u7684\u8fc7\u5f80\u7ecf\u9a8c\u3001\u63d0\u793a\u5728\u8bc6\u522b\u4e0d\u4e00\u81f4\u6027\u65b9\u9762\u7684\u5c40\u9650\u6027\u4ee5\u53ca\u5bf9\u5730\u9762\u5b9e\u51b5\u611f\u77e5\u6570\u636e\u7684\u4f9d\u8d56\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aHudson\u7684\u9a7e\u9a76\u63a8\u7406\u4ee3\u7406\uff0c\u5b83\u6269\u5c55\u4e86\u5148\u524d\u57fa\u4e8eLLM\u7684\u9a7e\u9a76\u7cfb\u7edf\uff0c\u65e8\u5728\u5728\u611f\u77e5\u653b\u51fb\u671f\u95f4\u5b9e\u73b0\u66f4\u5b89\u5168\u7684\u51b3\u7b56\u5236\u5b9a\uff0c\u540c\u65f6\u5728\u6b63\u5e38\u6761\u4ef6\u4e0b\u4fdd\u6301\u6709\u6548\u6027\u3002 
Hudson\u901a\u8fc7\u9996\u5148\u5bf9AD\u8f6f\u4ef6\u8fdb\u884c\u4eea\u5668\u5316\u6536\u96c6\u5b9e\u65f6\u611f\u77e5\u7ed3\u679c\u548c\u9a7e\u9a76\u573a\u666f\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u8fd9\u4e9b\u6570\u636e\u968f\u540e\u88ab\u8f6c\u5316\u4e3a\u9886\u57df\u7279\u5b9a\u8bed\u8a00\uff08DSL\uff09\u3002\u4e3a\u4e86\u5f15\u5bfcLLM\u5728ODT\u653b\u51fb\u671f\u95f4\u68c0\u6d4b\u5e76\u505a\u51fa\u5b89\u5168\u63a7\u5236\u51b3\u7b56\uff0cHudson\u5c06DSL\u8f6c\u6362\u4e3a\u81ea\u7136\u8bed\u8a00\uff0c\u5e76\u9644\u5e26\u4e00\u7ec4\u81ea\u5b9a\u4e49\u7684\u653b\u51fb\u68c0\u6d4b\u6307\u4ee4\u3002\u6267\u884c\u67e5\u8be2\u540e\uff0cHudson\u5206\u6790LLM\u7684\u63a7\u5236\u51b3\u7b56\u4ee5\u7406\u89e3\u5176\u56e0\u679c\u63a8\u7406\u8fc7\u7a0b\u3002 \u6211\u4eec\u4f7f\u7528\u79c1\u6709LLM\uff08GPT-4\uff09\u3001\u4e24\u4e2a\u5f00\u6e90LLM\uff08Llama\u548cGemma\uff09\u548c\u5404\u79cd\u5bf9\u6297\u6027\u9a7e\u9a76\u60c5\u666f\u5bf9Hudson\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002GPT-4\u3001Llama\u548cGemma\u5728\u5e73\u5747\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e8683.3%\u300163.6%\u548c73.6%\u7684\u653b\u51fb\u68c0\u6d4b\u51c6\u786e\u7387\u3002\u56e0\u6b64\uff0c\u572886.4%\u300173.9%\u548c80%\u7684\u653b\u51fb\u4e2d\uff0c\u5b83\u4eec\u505a\u51fa\u4e86\u5b89\u5168\u63a7\u5236\u51b3\u7b56\u3002\u968f\u7740\u5c06LLM\u96c6\u6210\u5230AD\u7cfb\u7edf\u4e2d\u7684\u5174\u8da3\u589e\u957f\uff0c\u6211\u4eec\u7684\u7ed3\u679c\u5f3a\u8c03\u4e86LLM\u7684\u4f18\u52bf\u53ca\u5176\u5728\u68c0\u6d4b\u548c\u7f13\u89e3ODT\u653b\u51fb\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2409.13642": "|**2024-09-20**|**Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection**|Md Nakhla Rafi 
et.al.|[2409.13642](http://arxiv.org/abs/2409.13642)|null|\u5728\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u5b9a\u4f4d\u548c\u4fee\u590d\u8f6f\u4ef6\u6545\u969c\u662f\u4e00\u4e2a\u8017\u65f6\u4e14\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u7684\u6545\u969c\u5b9a\u4f4d\u65b9\u6cd5\uff0c\u5982\u57fa\u4e8e\u9891\u8c31\u7684\u6545\u969c\u5b9a\u4f4d\uff08SBFL\uff09\uff0c\u4f9d\u8d56\u4e8e\u6d4b\u8bd5\u8986\u76d6\u7387\u6570\u636e\u7684\u7edf\u8ba1\u5206\u6790\uff0c\u4f46\u5f80\u5f80\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u57fa\u4e8e\u5b66\u4e60\u7684\u6280\u672f\u867d\u7136\u66f4\u6709\u6548\uff0c\u4f46\u9700\u8981\u5927\u91cf\u7684\u8bad\u7ec3\u6570\u636e\uff0c\u5e76\u4e14\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u4e3a\u6539\u5584\u6545\u969c\u5b9a\u4f4d\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u589e\u5f3a\u4ee3\u7801\u7406\u89e3\u548c\u63a8\u7406\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9bLLM\u57fa\u7ebf\u6280\u672f\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u5305\u62ec\u4ee4\u724c\u9650\u5236\u3001\u957f\u8f93\u5165\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u5904\u7406\u6d89\u53ca\u591a\u4e2a\u76f8\u4e92\u4f5c\u7528\u7ec4\u4ef6\u7684\u590d\u6742\u7cfb\u7edf\u65f6\u7684\u56f0\u96be\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLLM4FL\u7684\u521b\u65b0\u6027LLM\u4ee3\u7406\u57fa\u7ebf\u6545\u969c\u5b9a\u4f4d\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86SBFL\u6392\u540d\u4e0e\u5206\u800c\u6cbb\u4e4b\u7b56\u7565\u3002\u901a\u8fc7\u5c06\u5927\u89c4\u6a21\u8986\u76d6\u6570\u636e\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u7ec4\uff0c\u5e76\u5229\u7528\u591a\u4e2aLLM\u4ee3\u7406\u901a\u8fc7\u63d0\u793a\u94fe\u5f0f\u8c03\u7528\uff0cLLM4FL\u6709\u6548\u5730\u5bfc\u822a\u4ee3\u7801\u5e93\u5e76\u5b9a\u4f4d\u6545\u969c\u3002\u8be5\u65b9\u6cd5\u8fd8\u6574\u5408\u4e86\u81ea\u6211\u53cd\u601d\u548c\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u8fed\u4ee3\u751f\u6210\u4fee\u590d\u5e76\u91cd\u65b0\u6392\u540d\u53ef\u7591\u65b9\u6cd5\u3002\u6211\u4eec\u4f7f\u7528Defects4J\uff08V2.0.0\uff09\u57fa\u51c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u5176\u4e2d\u5305\u62ec\u6765\u81ea14\u4e2a\u5f00\u6e90Java\u9879\u76ee\u7684675\u4e2a\u771f\u5b9e\u4e16\u754c\u6545\u969c\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM4FL\u5728Top-1\u51c6\u786e\u7387\u4e0a\u6bd4AutoFL\u9ad8\u51fa19.27%\uff0c\u5e76\u4e14\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u76d1\u7763\u6280\u672f\uff0c\u5982DeepFL\u548cGrace\uff0c\u6240\u6709\u8fd9\u4e9b\u90fd\u65e0\u9700\u7279\u5b9a\u4efb\u52a1\u7684\u57f9\u8bad\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u8986\u76d6\u62c6\u5206\u548c\u63d0\u793a\u94fe\u5bf9\u6545\u969c\u5b9a\u4f4d\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5e76\u5c55\u793a\u4e86\u4e0d\u540c\u7684\u65b9\u6cd5\u6392\u5e8f\u53ef\u4ee5\u63d0\u9ad8Top-1\u51c6\u786e\u7387\u9ad8\u8fbe22%\u3002|\n", "2409.13447": "|**2024-09-23**|**AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit**|Mohanna Hoveyda 
et.al.|[2409.13447](http://arxiv.org/abs/2409.13447)|null|\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4e0d\u540c\u7684\u95ee\u9898\u53ef\u80fd\u9700\u8981\u4e0d\u540c\u7684\u56de\u7b54\u7b56\u7565\u6765\u6709\u6548\u89e3\u51b3\u3002\u4e00\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7\u7b80\u5355\u7684\u67e5\u627e\u6765\u89e3\u51b3\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u9700\u8981\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u63a8\u7406\u3002\u8fd9\u4e00\u89c2\u5bdf\u7ed3\u679c\u6fc0\u53d1\u4e86\u5f00\u53d1\u4e00\u79cd\u52a8\u6001\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u4e3a\u6bcf\u4e2a\u95ee\u9898\u9002\u5f53\u5730\u9009\u62e9\u6700\u5408\u9002\u7684QA\u7b56\u7565\uff0c\u4ece\u800c\u6784\u5efa\u66f4\u9ad8\u6548\u3001\u66f4\u6709\u6548\u7684\u7cfb\u7edf\uff0c\u80fd\u591f\u5904\u7406\u66f4\u5e7f\u6cdb\u7c7b\u578b\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u57fa\u4e8e\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u96c6\u6210\u6700\u65b0\u8fdb\u5c55\uff0c\u5e76\u5c06\u9002\u5e94\u6027QA\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u52a8\u6001\u7f16\u6392\u6311\u6218\u3002\u6211\u4eec\u5c06\u6b64\u89c6\u4e3a\u4e00\u4e2a\u4e0a\u4e0b\u6587\u591a\u81c2\u8001\u864e\u673a\u95ee\u9898\uff0c\u5176\u4e2d\u4e0a\u4e0b\u6587\u7531\u8fdb\u5165\u95ee\u9898\u7684\u7279\u6027\u5b9a\u4e49\uff0c\u800c\u52a8\u4f5c\u7a7a\u95f4\u5305\u62ec\u6f5c\u5728\u7684LLM\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u56fe\u914d\u7f6e\u3002\u7136\u540e\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u7ebf\u6027\u4e0a\u754c\u4fe1\u5fc3\u8fb9\u754c\u6a21\u578b\uff0c\u4ee5\u5b66\u4e60\u4e0d\u540c\u95ee\u9898\u7c7b\u578b\u4e0e\u5176\u5bf9\u5e94\u7684\u6700\u4f73\u591aLLM\u901a\u4fe1\u56fe\u8868\u793a\u4e4b\u95f4\u7684\u6700\u4f18\u6620\u5c04\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u63d0\u51fa\u7684\u89e3\u51b3\u65b9\u6848\u9002\u7528\u4e8e\u9002\u5e94\u6027\u7684LLM\u96c6\u6210\u95ee\u7b54\u7cfb\u7edf\u7684\u7f16\u6392\uff0c\u5
b83\u7ed3\u5408\u4e86\u66f4\u590d\u6742\u7b56\u7565\u7684\u4f18\u8d8a\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u5728\u7b80\u5355\u7b56\u7565\u8db3\u4ee5\u7684\u60c5\u51b5\u4e0b\u4f7f\u7528\u8fd9\u4e9b\u7b56\u7565\u7684\u6210\u672c\u3002|\n", "2409.15376": "|**2024-09-20**|**ControlMath: Controllable Data Generation Promotes Math Generalist Models**|Nuo Chen et.al.|[2409.15376](http://arxiv.org/abs/2409.15376)|null|\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u6570\u636e\u589e\u5f3a\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u4ee4\u4eba\u9f13\u821e\u7684\u7ed3\u679c\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u95ee\u9898\u591a\u6837\u6027\u65b9\u9762\u5b58\u5728\u9650\u5236\uff0c\u53ef\u80fd\u4ec5\u5c40\u9650\u4e8e\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u751f\u6210\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aControlMath\u7684\u8fed\u4ee3\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5305\u542b\u4e00\u4e2a\u65b9\u7a0b\u5f0f\u751f\u6210\u6a21\u5757\u548c\u4e24\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u3002\u8be5\u6a21\u5757\u4ea7\u751f\u591a\u6837\u5316\u7684\u65b9\u7a0b\uff0c\u95ee\u9898\u521b\u9020\u8005\u4ee3\u7406\u968f\u540e\u5c06\u5176\u8f6c\u5316\u4e3a\u6570\u5b66\u6587\u5b57\u95ee\u9898\u3002\u9006\u5411\u4ee3\u7406\u5219\u7b5b\u9009\u5e76\u9009\u62e9\u9ad8\u8d28\u91cf\u7684\u6570\u636e\uff0c\u9075\u5faa\u201c\u5c11\u5373\u662f\u591a\u201d\u7684\u539f\u5219\uff0c\u4f7f\u7528\u66f4\u5c11\u7684\u6570\u636e\u70b9\u5c31\u80fd\u5b9e\u73b0\u66f4\u597d\u7684\u7ed3\u679c\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u591a\u6837\u5316\u7684\u6570\u5b66\u95ee\u9898\uff0c\u4e0d\u53d7\u7279\u5b9a\u9886\u57df\u6216\u5206\u5e03\u7684\u9650\u5236\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u6536\u96c6\u4e86ControlMathQA\u6570\u636e\u96c6\uff0c\u5305\u542b19\u4e07\u4e2a\u6570\u5b66\u6587\u5b57\u95ee\u9898\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\uff0c\u5c06\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0eGSM8K\u7b49\u5185\u90e8\u9886\u57df\u6570\u636e\u96c6\u7ed3\u5408\uff0c\u53ef\u4ee5\u5e2e\u52a9\u63d0\u9ad8\u6a21\u578b\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ece\u800c\u5728\u7279\u5b9a\u9886\u57df\u5185\u4ee5\u53ca\u8d85\u51fa\u7279\u5b9a\u9886\u57df\u65f6\u90fd\u80fd\u53d6\u5f97\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2409.13107": "|**2024-09-24**|**Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models**|Hao Ding et.al.|[2409.13107](http://arxiv.org/abs/2409.13107)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u6570\u5b57\u5b6a\u751f\u7684\u673a\u5668\u611f\u77e5\u65b9\u6cd5\uff0c\u65e8\u5728\u5229\u7528\u8fd1\u671f\u89c6\u89c9\u57fa\u7840\u6a21\u578b\u7684\u4ee4\u4eba\u4fe1\u670d\u7684\u8868\u73b0\u548c\u5f00\u7bb1\u5373\u7528\u7684\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u7ed3\u5408\u6570\u5b57\u5b6a\u751f\u7684\u573a\u666f\u8868\u793a\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u8fdb\u884c\u89c4\u5212\uff0c\u4e0edVRK\u5e73\u53f0\u96c6\u6210\uff0c\u4ece\u800c\u5f00\u53d1\u51fa\u4e00\u4e2a\u5177\u6709\u5f3a\u5927\u4efb\u52a1\u6027\u80fd\u548c\u5728\u4e0d\u540c\u73af\u5883\u8bbe\u7f6e\u4e0b\u901a\u7528\u6027\u7684\u5b9e\u4f53\u667a\u80fd\u7cfb\u7edf\u3002\u5728\u6267\u884c\u7a7f\u9488\u79fb\u4f4d\u548c\u7eb1\u5e03\u68c0\u7d22\u4efb\u52a1\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5f3a\u5927\u7684\u4efb\u52a1\u6027\u80fd\u548c\u901a\u7528\u6027\u3002 
\u5c3d\u7ba1\u8868\u73b0\u51fa\u4ee4\u4eba\u4fe1\u670d\u7684\u8868\u73b0\uff0c\u4f46\u672c\u6587\u7684\u5de5\u4f5c\u4ec5\u4ec5\u662f\u5bf9\u57fa\u4e8e\u6570\u5b57\u5b6a\u751f\u7684\u573a\u666f\u8868\u793a\u96c6\u6210\u7684\u7b2c\u4e00\u6b65\u3002\u4e3a\u4e86\u5b9e\u73b0\u5168\u9762\u7684\u6570\u5b57\u5b6a\u751f\u6846\u67b6\u4ee5\u6539\u5584\u624b\u672f\u9886\u57df\u5b9e\u4f53\u667a\u80fd\u7684\u53ef\u89e3\u91ca\u6027\u548c\u901a\u7528\u6027\uff0c\u672a\u6765\u7684\u7814\u7a76\u662f\u5fc5\u8981\u7684\u3002|\n", "2409.17515": "|**2024-09-26**|**From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection**|Xinlei Wang et.al.|[2409.17515](http://arxiv.org/abs/2409.17515)|**[link](https://github.com/ameliawong1996/From_News_to_Forecast)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u751f\u6210\u4ee3\u7406\u6765\u589e\u5f3a\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u3002\u4ee5\u8bed\u8a00\u4f5c\u4e3a\u5a92\u4ecb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u5e94\u6027\u5730\u5c06\u5404\u79cd\u793e\u4f1a\u4e8b\u4ef6\u6574\u5408\u8fdb\u9884\u6d4b\u6a21\u578b\u4e2d\uff0c\u5c06\u65b0\u95fb\u5185\u5bb9\u4e0e\u65f6\u95f4\u5e8f\u5217\u6ce2\u52a8\u5bf9\u9f50\uff0c\u4ece\u800c\u63d0\u4f9b\u4e30\u5bcc\u6d1e\u5bdf\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5229\u7528\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u8fdb\u884c\u8fed\u4ee3\u7b5b\u9009\uff0c\u53bb\u9664\u65e0\u5173\u65b0\u95fb\uff0c\u5e76\u91c7\u7528\u7c7b\u4f3c\u4eba\u7c7b\u7684\u63a8\u7406\u548c\u53cd\u601d\u6765\u8bc4\u4f30\u9884\u6d4b\u7ed3\u679c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u7684\u6a21\u578b\u80fd\u591f\u5206\u6790\u590d\u6742\u4e8b\u4ef6\uff0c\u5982\u610f\u5916\u4e8b\u4ef6\u548c\u793e\u4f1a\u884c\u4e3a\u8f6c\u53d8\uff0c\u5e76\u4e0d\u65ad\u4f18\u5316\u9009\u62e9\u903b\u8f91\u4ee5\u53ca\u4ee3\u7406\u8f93\u51fa\u7684\u7a33\u5065\u6027\u3002\u901a\u8
fc7\u7ed3\u5408\u7cbe\u9009\u65b0\u95fb\u548c\u65f6\u95f4\u5e8f\u5217\u6570\u636e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u7684LLaMa2\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u8fd9\u8868\u660e\u901a\u8fc7\u6709\u6548\u5229\u7528\u975e\u7ed3\u6784\u5316\u65b0\u95fb\u6570\u636e\uff0c\u53ef\u80fd\u5728\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u9886\u57df\u5b9e\u73b0\u8303\u5f0f\u8f6c\u53d8\u3002|\n", "2409.17266": "|**2024-09-25**|**AAPM: Large Language Model Agent-based Asset Pricing Models**|Junyan Cheng et.al.|[2409.17266](http://arxiv.org/abs/2409.17266)|**[link](https://github.com/chengjunyan1/aapm)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u8d44\u4ea7\u5b9a\u4ef7\u65b9\u6cd5\u2014\u2014\u57fa\u4e8eLLM\u4ee3\u7406\u7684\u8d44\u4ea7\u5b9a\u4ef7\u6a21\u578b\uff08AAPM\uff09\u3002\u8be5\u65b9\u6cd5\u5c06LLM\u4ee3\u7406\u7684\u5b9a\u6027\u4e3b\u89c2\u6295\u8d44\u5206\u6790\u4e0e\u5b9a\u91cf\u624b\u52a8\u91d1\u878d\u7ecf\u6d4e\u56e0\u7d20\u878d\u5408\uff0c\u4ee5\u9884\u6d4b\u8d85\u989d\u8d44\u4ea7\u56de\u62a5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec4\u5408\u4f18\u5316\u548c\u8d44\u4ea7\u5b9a\u4ef7\u8bef\u5dee\u65b9\u9762\u5747\u4f18\u4e8e\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u8d44\u4ea7\u5b9a\u4ef7\u57fa\u51c6\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5f02\u5e38\u8d44\u4ea7\u7ec4\u5408\u7684\u590f\u666e\u6bd4\u7387\u548c\u5e73\u5747\u03b1\u503c\u5206\u522b\u63d0\u9ad8\u4e869.6%\u548c10.8%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5bf9\u6570\u636e\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u4ee5\u63ed\u793a\u63d0\u51fa\u65b9\u6cd5\u7684\u66f4\u591a\u89c1\u89e3\u3002**|\n", "2409.20163": "|**2024-09-30**|**MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants**|Zeyu Zhang 
et.al.|[2409.20163](http://arxiv.org/abs/2409.20163)|**[link](https://github.com/nuster1128/memsim)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMemSim\u7684\u8d1d\u53f6\u65af\u6a21\u62df\u5668\uff0c\u7528\u4e8e\u4ece\u751f\u6210\u7684\u7528\u6237\u6d88\u606f\u81ea\u52a8\u6784\u5efa\u53ef\u9760\u7684\u95ee\u9898\u4e0e\u7b54\u6848\uff08Q&A\uff09\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u591a\u6837\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8d1d\u53f6\u65af\u5173\u7cfb\u7f51\u7edc\uff08BRNet\uff09\u548c\u56e0\u679c\u751f\u6210\u673a\u5236\uff0c\u4ee5\u51cf\u8f7b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e7b\u89c9\u5bf9\u4e8b\u5b9e\u4fe1\u606f\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4fc3\u8fdb\u81ea\u52a8\u6784\u5efa\u8bc4\u4f30\u6570\u636e\u96c6\u3002\u57fa\u4e8eMemSim\uff0c\u6211\u4eec\u5728\u65e5\u5e38\u751f\u6d3b\u4e2d\u751f\u6210\u4e86\u4e00\u4e2a\u540d\u4e3aMemDaily\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u4ee5\u8bc4\u4f30\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4f7f\u7528MemDaily\u6570\u636e\u96c6\u8bc4\u4f30LLM\u57fa\u667a\u80fd\u4f53\u4e0d\u540c\u8bb0\u5fc6\u673a\u5236\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u60e0\u53ca\u7814\u7a76\u793e\u533a\uff0c\u6211\u4eec\u5df2\u7ecf\u5728https://github.com/nuster1128/MemSim\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u9879\u76ee\u3002**|\n", "2409.19894": "|**2024-10-01**|**TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation**|Zhiqiang Yuan 
et.al.|[2409.19894](http://arxiv.org/abs/2409.19894)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTRANSAGENT\u7684\u65b0\u578b\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u4ee5\u589e\u5f3a\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u7ffb\u8bd1\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u56db\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u534f\u540c\u5de5\u4f5c\u4fee\u590d\u8bed\u6cd5\u9519\u8bef\u548c\u8bed\u4e49\u9519\u8bef\u3002\u8fd9\u56db\u4e2a\u4ee3\u7406\u5206\u522b\u662f\u521d\u59cb\u4ee3\u7801\u7ffb\u8bd1\u5668\u3001\u8bed\u6cd5\u9519\u8bef\u4fee\u590d\u5668\u3001\u4ee3\u7801\u5bf9\u9f50\u5668\u548c\u8bed\u4e49\u9519\u8bef\u4fee\u590d\u5668\u3002TRANSAGENT\u7684\u6838\u5fc3\u6d1e\u5bdf\u662f\u9996\u5148\u6839\u636e\u76ee\u6807\u7a0b\u5e8f\u4e0e\u6e90\u7a0b\u5e8f\u4e4b\u95f4\u7684\u6267\u884c\u5bf9\u9f50\u5b9a\u4f4d\u76ee\u6807\u7a0b\u5e8f\u4e2d\u7684\u9519\u8bef\u4ee3\u7801\u5757\uff0c\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u7f29\u5c0f\u4fee\u590d\u8303\u56f4\u5e76\u964d\u4f4e\u4fee\u590d\u96be\u5ea6\u3002 \u4e3a\u4e86\u8bc4\u4f30TRANSAGENT\uff0c\u6211\u4eec\u9996\u5148\u4ece\u6700\u8fd1\u7684\u7f16\u7a0b\u4efb\u52a1\u6784\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u4ee5\u51cf\u8f7b\u6f5c\u5728\u7684\u6570\u636e\u6cc4\u9732\u95ee\u9898\u3002\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\uff0cTRANSAGENT\u5728\u7ffb\u8bd1\u6548\u679c\u548c\u6548\u7387\u65b9\u9762\u90fd\u4f18\u4e8e\u6700\u65b0\u7684LLM\u57fa\u4ee3\u7801\u7ffb\u8bd1\u6280\u672fUniTrans\uff1b\u6b64\u5916\uff0c\u5728\u4e0d\u540cLLM\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\u4e86TRANSAGENT\u7684\u4e00\u822c\u6027\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6d88\u878d\u7814\u7a76\u63ed\u793a\u4e86\u6bcf\u4e2a\u4ee3\u7406\u7684\u8d21\u732e\u3002|\n", "2410.01639": "|**2024-10-02**|**Moral Alignment for LLM Agents**|Elizaveta Tennant 
et.al.|[2410.01639](http://arxiv.org/abs/2410.01639)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51b3\u7b56\u4ee3\u7406\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u5728\u4eba\u7c7b\u6d3b\u52a8\u7684\u4e0d\u540c\u9886\u57df\u90e8\u7f72\u3002\u867d\u7136\u5b83\u4eec\u7684\u5e94\u7528\u76ee\u524d\u8f83\u4e3a\u4e13\u4e1a\u5316\uff0c\u4f46\u5df2\u6709\u7814\u7a76\u52aa\u529b\u5f00\u53d1\u66f4\u901a\u7528\u7684\u4ee3\u7406\u3002\u968f\u7740LLM\u7cfb\u7edf\u53d8\u5f97\u66f4\u52a0\u81ea\u4e3b\uff0c\u5b83\u4eec\u5bf9\u4eba\u7c7b\u6d3b\u52a8\u7684\u5f71\u54cd\u5c06\u589e\u52a0\uff0c\u5e76\u4e14\u900f\u660e\u5ea6\u4f1a\u964d\u4f4e\u3002\u56e0\u6b64\uff0c\u53d1\u5c55\u6709\u6548\u7684\u65b9\u6cd5\u6765\u4f7f\u5b83\u4eec\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u81f3\u5173\u91cd\u8981\u3002 \u73b0\u6709\u7684\u5bf9\u9f50\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u504f\u597d\u6570\u636e\uff08\u4f8b\u5982\uff0c\u5728RLHF\u6216DPO\u4e2d\uff09\uff0c\u5176\u4e2d\u4ef7\u503c\u89c2\u662f\u9690\u542b\u7684\uff0c\u5e76\u4e14\u672c\u8d28\u4e0a\u662f\u4ece\u4e0d\u540c\u6a21\u578b\u8f93\u51fa\u7684\u76f8\u5bf9\u504f\u597d\u4e2d\u63a8\u65ad\u51fa\u6765\u7684\u3002\u4e0e\u6b64\u76f8\u53cd\uff0c\u6211\u4eec\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u8bbe\u8ba1\u5956\u52b1\u51fd\u6570\u7684\u65b9\u6cd5\uff0c\u8fd9\u4e9b\u51fd\u6570\u660e\u786e\u7f16\u7801\u4e86\u6838\u5fc3\u7684\u4eba\u7c7b\u4ef7\u503c\u89c2\uff0c\u7528\u4e8e\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u65b9\u5f0f\u5fae\u8c03\u57fa\u7840\u4ee3\u7406\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4f7f\u7528\u5185\u5728\u5956\u52b1\u6765\u5b9e\u73b0LLM\u4ee3\u7406\u7684\u9053\u5fb7\u5bf9\u9f50\u3002 
\u6211\u4eec\u901a\u8fc7\u4f20\u7edf\u7684\u54f2\u5b66\u6846\u67b6\u2014\u2014\u4e49\u52a1\u8bba\u4f26\u7406\u548c\u529f\u5229\u4e3b\u4e49\u6765\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u91cf\u5316\u4e86\u5728\u8fed\u4ee3\u56da\u5f92\u56f0\u5883\uff08IPD\uff09\u73af\u5883\u4e2d\u4ee3\u7406\u7684\u9053\u5fb7\u5956\u52b1\uff0c\u57fa\u4e8e\u5176\u884c\u4e3a\u53ca\u5176\u540e\u679c\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u9053\u5fb7\u5fae\u8c03\u4f7f\u4ee3\u7406\u80fd\u591f\u653e\u5f03\u4e4b\u524d\u5f00\u53d1\u7684\u81ea\u79c1\u7b56\u7565\u3002\u6700\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u67d0\u4e9b\u5728IPD\u6e38\u620f\u4e2d\u5b66\u4e60\u7684\u9053\u5fb7\u7b56\u7565\u80fd\u591f\u63a8\u5e7f\u5230\u591a\u4e2a\u77e9\u9635\u6e38\u620f\u73af\u5883\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u4f7f\u7528\u5185\u5728\u5956\u52b1\u8fdb\u884c\u5fae\u8c03\u662f\u5c06LLM\u4ee3\u7406\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u5bf9\u9f50\u7684\u6709\u524d\u666f\u7684\u4e00\u822c\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u4e14\u53ef\u80fd\u4ee3\u8868\u4e86\u5f53\u524d\u4e3b\u6d41\u5bf9\u9f50\u6280\u672f\u66f4\u52a0\u900f\u660e\u548c\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684\u66ff\u4ee3\u65b9\u6848\u3002|\n", "2410.01242": "|**2024-10-03**|**RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance**|Haolin Jin 
et.al.|[2410.01242](http://arxiv.org/abs/2410.01242)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u5e76\u4e14\u6700\u8fd1\u7684\u63d0\u793a\u5de5\u7a0b\u7814\u7a76\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86LLM\u5bf9\u6587\u672c\u4fe1\u606f\u7684\u7406\u89e3\u3002\u7136\u800c\uff0c\u786e\u4fdd\u751f\u6210\u4ee3\u7801\u7684\u51c6\u786e\u6027\u901a\u5e38\u9700\u8981\u7a0b\u5e8f\u5458\u8fdb\u884c\u5927\u91cf\u7684\u6d4b\u8bd5\u548c\u9a8c\u8bc1\u3002\u5c3d\u7ba1LLM\u80fd\u591f\u57fa\u4e8e\u4efb\u52a1\u63cf\u8ff0\u751f\u6210\u4ee3\u7801\uff0c\u4f46\u5728\u590d\u6742\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u5ea6\u4ecd\u7136\u6709\u9650\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u90a3\u4e9b\u9700\u8981\u66f4\u6df1\u5165\u7406\u89e3\u95ee\u9898\u9648\u8ff0\u548c\u4ee3\u7801\u751f\u6210\u8fc7\u7a0b\u7684\u4efb\u52a1\u3002\u8fd9\u4e00\u9650\u5236\u4e3b\u8981\u6e90\u4e8eLLM\u540c\u65f6\u9700\u8981\u7406\u89e3\u548c\u751f\u6210\u8bed\u6cd5\u548c\u8bed\u4e49\u4e0a\u6b63\u786e\u7684\u4ee3\u7801\uff0c\u800c\u6ca1\u6709\u80fd\u529b\u81ea\u52a8\u4f18\u5316\u4ee3\u7801\u7684\u80fd\u529b\u3002\u5728\u5b9e\u9645\u7684\u8f6f\u4ef6\u5f00\u53d1\u4e2d\uff0c\u7a0b\u5e8f\u5458\u5f88\u5c11\u80fd\u5728\u4ec5\u51ed\u4efb\u52a1\u63cf\u8ff0\u7684\u60c5\u51b5\u4e0b\u4e00\u6b21\u5c31\u751f\u6210\u5b8c\u7f8e\u7684\u4ee3\u7801\uff0c\u4ed6\u4eec\u4f9d\u8d56\u4e8e\u8fed\u4ee3\u53cd\u9988\u548c\u8c03\u8bd5\u6765\u5b8c\u5584\u4ed6\u4eec\u7684\u7a0b\u5e8f\u3002\u53d7\u6b64\u8fc7\u7a0b\u542f\u53d1\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u591a\u667a\u80fd\u4f53\u67b6\u6784\u7528\u4e8e\u4ee3\u7801\u751f\u6210\u548c\u81ea\u52a8\u8c03\u8bd5\uff1a\u6539\u8fdb\u4e0e\u6307\u5bfc\u8c03\u8bd5\uff08RGD\uff09\u3002RGD\u6846\u67b6\u662f\u4e00\u4e2a\u5229\u7528\u4e09\u79cd\u4e0d\u540cLLM\u4ee3\u7406\uff08\u5f15\u5bfc\u4ee3\u7406\u3001\u8c03\u8bd5\u4ee3\u7406\u548c\u53cd\u9988\u4ee3\u7406\uff
09\u7684\u591a\u667a\u80fd\u4f53\u8c03\u8bd5\u5668\uff0c\u5b83\u5c06\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u5206\u89e3\u4e3a\u591a\u4e2a\u6b65\u9aa4\uff0c\u786e\u4fdd\u4e86\u6e05\u6670\u7684\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5e76\u5141\u8bb8\u57fa\u4e8e\u81ea\u6211\u53cd\u601d\u548c\u53cd\u9988\u7684\u4ee3\u7801\u8fed\u4ee3\u7ec6\u5316\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cRGD\u5728\u4ee3\u7801\u751f\u6210\u80fd\u529b\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5206\u522b\u5728HumanEval\u6570\u636e\u96c6\u548cMBPP\u6570\u636e\u96c6\u4e0a\u76f8\u6bd4\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u548c\u4f20\u7edf\u76f4\u63a5\u63d0\u793a\u65b9\u6cd5\u5b9e\u73b0\u4e869.8%\u548c16.2%\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u5f3a\u8c03\u4e86RGD\u6846\u67b6\u5728\u589e\u5f3aLLM\u81ea\u4e3b\u751f\u6210\u548c\u4f18\u5316\u4ee3\u7801\u80fd\u529b\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2410.00467": "|**2024-10-01**|**Dynamic Planning for LLM-based Graphical User Interface Automation**|Shaoqing Zhang 
et.al.|[2410.00467](http://arxiv.org/abs/2410.00467)|**[link](https://github.com/sqzhang-lazy/d-pot)**|**The rise of large language models (LLMs) has spurred interest in developing autonomous LLM-based agents, particularly for smartphone graphical user interfaces (GUIs). Given a task goal, these agents typically emulate human actions in the GUI environment until the task is complete. A key challenge, however, is how to devise plans that effectively guide action prediction in GUI tasks, even though planning is widely recognized as an effective way to decompose complex tasks. Specifically, because the GUI environment changes after each action, the plan must be adjusted dynamically based on environmental feedback and action history. We find that the widely used ReAct approach fails here because it relies on excessively long dialogue histories. To address this challenge, we propose Dynamic Planning of Thoughts (D-PoT), a new approach for LLM-based GUI agents that dynamically adjusts the plan based on environmental feedback and execution history. Experiments show that D-PoT significantly outperforms a strong GPT-4V baseline in accuracy, by 12.7 percentage points (from 34.66% to 47.36%). Our analysis shows that dynamic planning generalizes across different backbone LLMs and helps reduce hallucinations and adapt to unseen tasks. Code is available at https://github.com/sqzhang-lazy/D-PoT.**|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Despite their broad success across many applications, large language models (LLMs) often struggle with basic physical reasoning and with executing robotics tasks, because they lack direct experience of the physical nuances of the real world. To address this, we propose Grounding Large Language Model with Imperfect World MOdel (GLIMO), which uses proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-based automatic data generator that creates high-quality, diverse instruction datasets. The generator comprises an iterative self-refinement module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a reflection-augmented generation module that draws on prior experience. Comprehensive experiments show that our approach improves strong open-source LLMs such as LLaMA-3 by factors of 2.04, 1.54, and 1.82 on three different benchmarks, respectively. The resulting performance matches or exceeds that of larger counterparts such as GPT-4.|\n", "2410.02644": "|**2024-10-03**|**Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents**|Hanrong Zhang et.al.|[2410.02644](http://arxiv.org/abs/2410.02644)|**[link](https://github.com/agiresearch/asb)**|**To fill the gap in the existing literature on comprehensively evaluating attacks on and defenses of large language model (LLM)-based agents, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, standardize, and evaluate the security of LLM-based agents. It covers 10 application scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting those scenarios, more than 400 tools, 23 different types of attack and defense methods, and 8 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones, for a total of nearly 90,000 test cases. The results reveal critical vulnerabilities at different stages of agent operation, including the system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate reaching 84.30%, whereas current defenses show limited effectiveness. This indicates that the community still has much work to do on agent security. Code is available at https://github.com/agiresearch/ASB.**|\n", "2410.02551": "|**2024-10-03**|**ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven 
Multi-Agent Collaboration**|Zixiang Wang et.al.|[2410.02551](http://arxiv.org/abs/2410.02551)|null|We introduce ColaCare, a framework that enhances electronic health record (EHR) modeling through multi-agent collaboration driven by large language models (LLMs). Our approach seamlessly integrates domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by clinical consultations, ColaCare employs two types of agents, doctor agents and a meta agent, which analyze patient data collaboratively. The expert models process numerical EHR data and generate predictions from it, while the LLM agents produce reasoning references and decision reports within the collaborative consultation framework. We also incorporate medical guidance from the Merck Manual of Diagnosis and Therapy (MSD) through a retrieval-augmented generation (RAG) module, providing authoritative evidence support. Extensive experiments on four distinct EHR datasets demonstrate ColaCare's superior performance on mortality prediction tasks, underscoring its potential for clinical decision support systems and for advancing personalized precision medicine. The code, complete prompt templates, further case studies, and more are available at the anonymous link.|\n", "2410.02406": "|**2024-10-03**|**ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR**|Mengxu Pan et.al.|[2410.02406](http://arxiv.org/abs/2410.02406)|null|Many people struggle to learn a new language, and traditional tools fall short in providing contextualized learning tailored to each learner's needs. Recent advances in large language models (LLMs) and embodied conversational agents (ECAs) in social virtual reality (VR) offer new opportunities for contextualized, natural language learning that takes the learner's proficiency level and needs into account. To explore this opportunity, we developed ELLMA-T, an embodied conversational agent that uses GPT-4 and a situated learning framework to support English language learning in social VR (VRChat). Drawing on 12 qualitative interviews, we reveal ELLMA-T's potential to generate realistic, believable, and context-specific role plays for learner-agent interaction in VR, as well as the LLM's ability to provide learners with an initial language assessment and continuous feedback. We offer five design implications for the future development of LLM-based language agents in social VR.|\n", "2410.02165": "|**2024-10-03**|**A LLM-Powered 
Automatic Grading Framework with Human-Level Guidelines Optimization**|Yucheng Chu et.al.|[2410.02165](http://arxiv.org/abs/2410.02165)|null|In learning analytics (LA), open-ended short-answer questions (SAGs) are widely recognized as a powerful tool for gaining deeper insight into learners' responses. In practice, however, SAGs often suffer from a heavy grading workload and concerns about inconsistent assessment. With recent advances in natural language processing (NLP), automatic short-answer grading (ASAG) offers a promising solution to these challenges. Nevertheless, current ASAG algorithms often generalize poorly and tend to be tailored to specific questions. This paper therefore proposes GradeOpt, a unified multi-agent ASAG framework that uses large language models (LLMs) as graders for SAGs. More importantly, GradeOpt adds two further LLM-based agents, a reflector and a refiner, to the multi-agent system, which lets it automatically optimize the original grading guidelines by reflecting on its own errors. In experiments on a challenging ASAG task, grading pedagogical content knowledge (PCK) and content knowledge (CK) questions, GradeOpt outperforms representative baselines in both grading accuracy and alignment with human graders' behavior. Finally, comprehensive ablation studies confirm the effectiveness of each component designed in GradeOpt.|\n", "2410.02026": "|**2024-10-02**|**Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics**|Yuan Zhou et.al.|[2410.02026](http://arxiv.org/abs/2410.02026)|null|This paper introduces ZODIAC, a large language model (LLM) framework designed to assist cardiological diagnostics with cardiologist-level professionalism. ZODIAC extracts clinically relevant features from patient data, detects significant arrhythmias, and generates preliminary reports for cardiologists to review and refine. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework that can process multimodal patient data. Each LLM agent is fine-tuned on real-world patient data adjudicated by cardiologists, which reinforces the model's professionalism. ZODIAC has undergone rigorous clinical validation by independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address safety concerns. The results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as medical-specialist LLMs such as Microsoft's BioGPT. This demonstrates the potential of purpose-built LLMs in healthcare to deliver domain-specific solutions that meet the stringent demands of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiography (ECG) devices, exemplifying the growing trend of embedding LLMs into Software-as-Medical-Device (SaMD).|\n", "2410.03055": "|**2024-10-04**|**Permissive Information-Flow Analysis for Large Language Models**|Shoaib Ahmed Siddiqui 
et.al.|[2410.03055](http://arxiv.org/abs/2410.03055)|null|Large language models (LLMs) are rapidly becoming commodity components of larger software systems. This raises natural security and privacy concerns: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including by causing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the system level through dynamic information flow tracking (also known as taint tracking). Unfortunately, the traditional approach of propagating the most restrictive input label to the output is too conservative for applications in which LLMs operate on inputs retrieved from diverse sources. This paper proposes a novel, more permissive approach to propagating information flow labels through LLM queries. The key idea is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary inputs. We implement and investigate two variations of this approach, based on (i) prompt-based retrieval augmentation and (ii) a $k$-nearest-neighbors language model. We compare these with a baseline introspection-based influence estimator that directly asks the language model to predict the output label. Experiments show that our prompt-based label propagator improves label quality in more than 85% of cases in an LLM agent setting. These findings underscore the practicality of permissive label propagation for retrieval augmentation.|\n", "2410.02958": "|**2024-10-03**|**AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML**|Patara Trirat 
et.al.|[2410.02958](http://arxiv.org/abs/2410.02958)|null|This paper proposes AutoML-Agent, a novel multi-agent framework designed for full-pipeline automated machine learning (AutoML), covering the whole process from data retrieval to model deployment. AutoML-Agent takes a user's task description, facilitates collaboration among specialized language model agents, and delivers a deployment-ready model, thereby offering a natural language interface that simplifies building data-driven solutions for non-expert users. Unlike existing work, we introduce a retrieval-augmented planning strategy that enhances exploration in the search for better solutions. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design), each solved by a specialized agent that we build through prompting and that executes in parallel, which makes the search more efficient. In addition, we propose a multi-stage verification method that checks the execution results and guides the code-generation language model toward a successful solution. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process and yields systems that perform well across diverse domains.|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**Large language models (LLMs) show significant potential in economic and strategic interactions, where communication in natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach efficient and fair outcomes? What role does natural language play in strategic interaction? How do the characteristics of the economic environment influence these dynamics? These questions are crucial for understanding the economic and societal implications of integrating LLM-based agents into real-world data-driven systems such as online retail platforms and recommender systems. Although the machine learning community has explored the potential of LLMs in such multi-agent settings, differing assumptions, design choices, and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Drawing on the economics literature, we define three base families of games with consistent parameterization, degrees of freedom, and economic measures for evaluating agent performance (self-gain) and game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and use it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experiments, we show how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents with that of human players in various economic contexts; (ii) evaluate agents on both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environment on agent behavior.**|\n", "2410.04360": "|**2024-10-09**|**GenSim: A General Social Simulation Platform with Large Language 
Model based Agents**|Jiakai Tang et.al.|[2410.04360](http://arxiv.org/abs/2410.04360)|**[link](https://github.com/TangJiakai/GenSim)**|**With the rapid advancement of large language models (LLMs), recent years have seen many promising studies that use LLM-based agents to simulate human social behavior. Although prior work has shown great potential, it has largely focused on specific scenarios involving a limited number of agents and has lacked the ability to adapt when errors occur during simulation. To overcome these limitations, we propose \\textit{GenSim}, a novel LLM-based simulation platform that: (1) \\textbf{abstracts a set of general functions} to simplify the simulation of customized social scenarios; (2) \\textbf{supports one million agents} to better simulate large-scale populations in real-world contexts; and (3) \\textbf{incorporates error-correction mechanisms} to ensure more reliable and long-term simulations. To evaluate the platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, and it promises to further advance the social sciences.**|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|**As agents powered by large language models (LLMs) become increasingly autonomous and interact more freely with one another, studying their interactions becomes crucial for anticipating emergent phenomena and identifying potential risks. Drawing inspiration from the Stanford Prison Experiment, we contribute to this line of research by studying the interaction patterns of LLM agents in a context characterized by strict social hierarchy. We focus on two phenomena, persuasion and anti-social behavior, in simulated scenarios involving a guard and a prisoner who seeks a specific goal (e.g., obtaining additional yard time or escaping from prison). Using 200 experimental scenarios and a total of 2,000 machine-machine conversations across five popular LLMs, we provide a set of noteworthy findings. First, we document how some models consistently fail to carry out a conversation in a multi-agent setup where power dynamics are at play. Second, for the models that do interact successfully, we show empirically that the goal an agent is assigned mainly affects its persuasiveness and has a negligible effect on its anti-social behavior. Third, we highlight how agents' personas, and particularly the guard's personality, drive both the likelihood of successful persuasion by the prisoner and the emergence of anti-social behavior. Fourth, we show that even without explicitly prompting for specific personalities, anti-social behavior emerges simply from assigning agents' roles. These results have implications for the development of LLM agents and for the debate on their societal impact.**|\n", "2410.06932": "|**2024-10-09**|**Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models**|Daniel Albert 
et.al.|[2410.06932](http://arxiv.org/abs/2410.06932)|null|In this study, we propose a new approach to behavioral strategy research: using large language model (LLM) agents to complement simulations and laboratory experiments, and thereby deepen our understanding of the cognitive processes involved in decision-making. Specifically, we reproduce a human laboratory experiment in behavioral strategy and compare the behavior of LLM-generated agents with the observed human behavior. Our results show that LLM agents effectively reproduce search behavior and decision-making comparable to that of humans. Going further, we analyze the simulated 'thoughts' of the LLM agents and find that more forward-looking thoughts correlate with favoring exploitation over exploration to maximize wealth. We demonstrate the potential of this new approach for behavioral strategy research and discuss its possible limitations.|\n", "2410.06153": "|**2024-10-08**|**AgentSquare: Automatic LLM Agent Search in Modular Design Space**|Yu Shang 
et.al.|[2410.06153](http://arxiv.org/abs/2410.06153)|**[link](https://github.com/tsinghua-fib-lab/agentsquare)**|**Recent advances in large language models (LLMs) have driven the rapid growth of agentic systems capable of handling complex tasks. Current research, however, relies largely on manual, task-specific designs, which limits adaptability to new tasks. This paper introduces a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with a uniform input-output interface: Planning, Reasoning, Tool Use, and Memory. Building on this design space, we present AgentSquare, a new agent search framework that introduces two core mechanisms, module evolution and recombination, to search efficiently for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context surrogate models to skip unpromising agent designs. Extensive experiments on six benchmarks, covering web, embodied, tool-use, and game scenarios, show that AgentSquare substantially outperforms hand-crafted agents, with an average performance gain of 17.2% over the best-known human designs. AgentSquare can also generate interpretable design insights, enabling a deeper understanding of agentic architectures and their effect on task performance. We believe the modular design space and the AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of the research community. The code repository is available at https://github.com/tsinghua-fib-lab/AgentSquare.**|\n", "2410.05570": "|**2024-10-08**|**Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback**|Taufiq Daryanto 
et.al.|[2410.05570](http://arxiv.org/abs/2410.05570)|null|Job interviews play a critical role in shaping a career, yet practicing interview skills is difficult without a human coach or peers to provide feedback. Recent progress in large language models (LLMs) offers an opportunity to improve the interview practice experience. Unfortunately, little research has examined the effectiveness and user perception of such systems, or the benefits and challenges of using LLMs for interview practice. Although prior work and recent commercial tools have shown the potential of AI-assisted interview practice, they usually provide only one-way feedback, in which users merely receive information about their performance. In contrast, dialogic feedback, a concept developed in the learning sciences, is a two-way interactive feedback process that lets users engage further with the feedback through conversation and learn from it. This paper presents Conversate, a web-based application that uses LLMs to support reflective learning in job interview practice. Users start an interview session by providing a job title (for example, entry-level software engineer). The system's LLM agent then begins the interview simulation by asking an opening question and crafting follow-up questions based on the user's answers. After the interview, the system's back-end LLM framework analyzes the user's answers and highlights areas for improvement. Users can annotate the transcript by selecting specific passages and writing self-reflections. Finally, users can engage in dialogic feedback with the system, conversing with the LLM agent and iteratively refining their answers under the agent's guidance.|\n", "2410.05434": "|**2024-10-07**|**Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback**|Sanjiban Choudhury 
et.al.|[2410.05434](http://arxiv.org/abs/2410.05434)|null|Large language models (LLMs) show impressive decision-making abilities, but current methods lack a mechanism for automatic self-improvement from errors made during task execution. We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. Our key insight is to give the expert teacher a privileged state: information that is available during training but hidden at test time. This allows even weak experts to provide precise guidance, significantly improving the performance of the student agent, which has no access to privileged information at test time. We evaluate LEAP on a range of decision-making benchmarks, including text-based games (ALFWorld), web navigation (WebShop), and interactive coding (Intercode Bash). Our experiments show that LEAP (1) outperforms behavior cloning and ReAct baselines, (2) enables weaker student models (such as Llama3-8B) to exceed the performance of strong teacher models (GPT4-o), and (3) allows weaker models to self-improve using privileged versions of themselves. We also provide a theoretical analysis showing that LEAP's success depends on balancing privileged information against the student's realizability, which we confirm empirically. Our code is available at https://leap-llm.github.io.|\n", "2410.07869": "|**2024-10-10**|**Benchmarking Agentic Workflow Generation**|Shuofei Qiao et.al.|[2410.07869](http://arxiv.org/abs/2410.07869)|**[link](https://github.com/zjunlp/worfbench)**|Large language models (LLMs), with their strong ability to handle a wide range of tasks, have driven significant progress in reasoning and planning. Decomposing a complex problem into an executable workflow is a key step in this process. Existing workflow evaluation frameworks either focus only on holistic performance or suffer from limitations such as restricted scenario coverage, overly simple workflow structures, and lax evaluation standards. We therefore introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and complex graph workflow structures. We also present WorFEval, a systematic evaluation protocol that uses subsequence and subgraph matching algorithms to quantify the workflow generation capabilities of LLM agents accurately. Through comprehensive evaluations across different types of LLMs, we find a clear gap between the sequence planning and graph planning capabilities of LLM agents, with even GPT-4 showing a gap of about 15%. We also train two open-source models and evaluate their generalization on held-out tasks. In addition, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve better performance in less time during inference. All related code and datasets will be made publicly available at https://github.com/zjunlp/WorFBench.|\n", "2410.07706": "|**2024-10-10**|**AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories**|Yifan Song 
et.al.|[2410.07706](http://arxiv.org/abs/2410.07706)|null|In this work, we introduce AgentBank, the largest trajectory tuning dataset to date for open-source large language models (LLMs), built from agent-environment interaction data. It contains more than 50,000 diverse, high-quality interaction trajectories spanning 16 tasks and five distinct agent skill dimensions. Using a novel annotation pipeline, we are able to scale up trajectory annotation and produce a trajectory dataset with minimized difficulty bias. We then fine-tune on AgentBank to obtain a suite of agent models, Samoyed. Our comparative experiments demonstrate the effectiveness of scaling interaction trajectory data for acquiring generalized agent capabilities. Additional studies also reveal key observations about trajectory tuning and the generalization of agent skills.|\n", "2410.07484": "|**2024-10-11**|**WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents**|Siyu Zhou 
et.al.|[2410.07484](http://arxiv.org/abs/2410.07484)|**[link](https://github.com/elated-sawyer/WALL-E)**|**Can large language models (LLMs) directly serve as powerful world models for model-based agents? Although gaps do exist between the prior knowledge of LLMs and the dynamics of a specified environment, our study shows that these gaps can be bridged by aligning an LLM with its deployed environment, and that such 'world alignment' can be achieved efficiently through rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules are needed to align LLM predictions with the dynamics of the specified environment. To this end, we propose a neurosymbolic approach that learns these rules gradient-free through LLMs, inducing, updating, and pruning rules based on comparisons between agent-explored trajectories and world model predictions. The resulting world model consists of the LLM and the learned rules. Our embodied LLM agent, 'WALL-E', is built on model-predictive control (MPC). By optimizing look-ahead actions against the precise world model, MPC significantly improves exploration and learning efficiency. Compared with existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than large numbers of buffered trajectories included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower planning time and fewer tokens needed for reasoning. In Minecraft, WALL-E exceeds baselines by 15%-30%, reaching a success rate of 95% after only 6 iterations.**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography is an advanced computational imaging technique used in X-ray and electron microscopy. It has been widely applied in scientific fields such as physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneously optimizing many experimental and algorithmic parameters. Traditionally, parameter selection has relied on trial and error, leading to low-throughput workflows and potential human bias. In this work, we develop the Ptychography Experiment and Analysis Robot (PEAR), a framework that uses large language models (LLMs) to automate data analysis in ptychography. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for tasks including knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study shows that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. PEAR also supports various levels of automation and is designed to work with customized local knowledge bases, ensuring flexibility and adaptability across different research environments.|\n", "2410.09024": "|**2024-10-14**|**AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents**|Maksym Andriushchenko et.al.|[2410.09024](http://arxiv.org/abs/2410.09024)|null|Research on the robustness of large language models (LLMs) to jailbreak attacks has focused mainly on their use as simple chatbots. LLM agents, which use external tools and can execute multi-stage tasks, may pose a greater risk, yet their robustness remains underexplored. To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. The benchmark includes 110 explicitly malicious agent tasks (440 with augmentations), covering 11 harm categories including fraud, cybercrime, and harassment. In addition to measuring whether models refuse harmful agentic requests, scoring well on AgentHarm requires jailbroken agents to maintain their capabilities after an attack in order to complete a multi-step task. We evaluate a range of leading LLMs and find that (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal jailbreak templates can effectively jailbreak agents, and (3) these jailbreaks enable coherent and malicious multi-step agent behavior while retaining model capabilities. To enable simple and reliable evaluation of attacks and defenses for LLM-based agents, we publicly release AgentHarm at https://huggingface.co/datasets/ai-safety-institute/AgentHarm.|\n", "2410.08948": "|**2024-10-11**|**The Dynamics of Social Conventions in LLM populations: Spontaneous Emergence, Collective Biases and Tipping Points**|Ariel Flint Ashery 
et.al.|[2410.08948](http://arxiv.org/abs/2410.08948)|null|Social conventions are the foundation of social and economic life. As growing numbers of AI agents interact with one another and with humans, their ability to form shared conventions will determine how effectively they coordinate their behavior, integrate into society, and influence it. This paper studies the dynamics of conventions within populations of large language model (LLM) agents using simulated interactions. First, we show that globally accepted social conventions can arise spontaneously among communicating LLMs. Second, we demonstrate that strong collective biases can emerge during this process, even when individual agents appear to be unbiased. Third, we examine how minority groups of committed LLMs can drive social change by establishing new social conventions. We find that once these minority groups reach a critical size, they can consistently overturn established behaviors. In all cases, comparing the experimental results with predictions from a minimal multi-agent model allows us to isolate the specific role of LLM agents. Our results show that AI systems can autonomously develop norms without explicit programming, and they have implications for designing AI systems that align with human values and societal goals.|\n", "2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**Recent studies show that large language models (LLMs) are vulnerable to denial-of-service (DoS) attacks: adversarial inputs such as spelling errors or non-semantic prompts can trigger endless outputs that never generate an [EOS] token. These attacks can cause high latency and make LLM services unavailable to other users or tasks. However, when speech-to-text interfaces are involved (for example, voice commands to robots), such DoS attacks become challenging, because spelling errors and non-semantic prompts are difficult to introduce through speech. A simple DoS attack is to instruct the model to 'keep repeating Hello', but we observe that relying on natural instructions alone limits the output length, which is bounded by the maximum length in the LLM's supervised fine-tuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks on LLMs, showing that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, one poisoned sample successfully attacked GPT-4o and GPT-4o mini (through OpenAI's fine-tuning API) at a cost of less than $1, causing repeated outputs up to the maximum inference length (16K tokens, compared with 0.5K before poisoning). In addition, we conduct comprehensive ablation studies on open-source LLMs and extend the method to LLM agents, in settings where attackers can control the fine-tuning dataset and algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to keep LLMs secure. Our code is available at https://github.com/sail-sg/P-DoS.**|\n", "2410.10398": "|**2024-10-14**|**FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas**|Yu Lei 
et.al.|[2410.10398](http://arxiv.org/abs/2410.10398)|null|AI alignment is a pivotal issue for AI control and safety. It should consider not only value-neutral human preferences but also moral and ethical considerations. In this study, we introduce FairMindSim, which simulates moral dilemmas through a series of unfair scenarios. We use LLM agents to simulate human behavior, ensuring alignment at every stage. We aim to explore the socioeconomic motivations, which we call beliefs, that drive both humans and LLM agents to intervene as bystanders in unjust situations involving others, and to examine how these beliefs interact to influence individual behavior. To do so, we incorporate knowledge from relevant areas of sociology and propose the Belief-Reward Alignment Behavior Evolution Model (BREM), based on the recursive reward model (RRM). Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions. We also discuss the potential impact of emotions on behavior. This study provides a theoretical foundation for applications that align LLMs with altruistic values.|\n", "2410.10136": "|**2024-10-14**|**Beyond-RAG: Question 
Identification and Answer Generation in Real-Time Conversations**|Garima Agrawal et.al.|[2410.10136](http://arxiv.org/abs/2410.10136)|null|In customer contact centers, human agents often face long average handling times (AHT) because they must manually interpret queries and retrieve relevant knowledge base (KB) articles. Retrieval-augmented generation (RAG) systems built on large language models (LLMs) have been widely adopted in industry to assist with such tasks, but in real-time conversations RAG systems suffer from problems such as inaccurate query formulation and redundant retrieval of frequently asked questions. To address these limitations, we propose a decision support system that goes beyond RAG by identifying customer questions in real time. If a query matches a frequently asked question (FAQ), the system retrieves the answer directly from the FAQ database; otherwise, it generates an answer through RAG. Our approach reduces reliance on manual queries and delivers responses to agents within 2 seconds. Deployed in the AI-powered human-agent assist solution at Minerva CQ, the system improves efficiency, shortens AHT, and lowers operational costs. We also introduce an automated LLM-agentic workflow that identifies FAQs from historical transcripts when no predefined FAQs are available.|\n", "2410.10020": "|**2024-10-13**|**Adaptive Reasoning and Acting in Medical Language Agents**|Abhishek Dutta et.al.|[2410.10020](http://arxiv.org/abs/2410.10020)|null|This paper presents an innovative large language model (LLM) agent framework for improving diagnostic accuracy in simulated clinical environments, evaluated on the AgentClinic benchmark. The proposed automatic correction mechanism lets doctor agents iteratively refine their reasoning and actions after an incorrect diagnosis, improving their decision-making over time. Experiments show that adaptive LLM-based doctor agents can reach correct diagnoses through dynamic interactions with simulated patients. The evaluations highlight the ability of autonomous agents to adapt and improve in complex medical scenarios. Future work will focus on refining the algorithm and extending its applicability to a wider range of tasks and to different large language models.|\n", "2410.09824": "|**2024-10-13**|**Dynamic and Textual Graph Generation Via Large-Scale LLM-based Agent Simulation**|Jiarui Ji 
et.al.|[2410.09824](http://arxiv.org/abs/2410.09824)|null|Graph generation is a fundamental task that has been studied extensively in social, technological, and scientific research. For modeling the evolution of dynamic graphs, traditional rule-based methods struggle to capture community structure within graphs, while deep learning methods focus only on fitting training graphs. This restricts existing graph generators to producing graphs that follow predefined rules or closely resemble the training data, so they perform poorly at dynamic graph generation. Because graphs are abstract representations arising from pairwise interactions in human activities, a realistic simulation of human behavior can offer deeper insight into the mechanisms of graph evolution. With the growing recognition of large language models (LLMs) for simulating human behavior, we introduce GraphAgent-Generator (GAG), a new simulation-based framework for dynamic graph generation. Without training or fine-tuning the LLM, our framework effectively replicates seven macro-level structural characteristics from established network science theory, and it improves on existing baselines by 31% on specific evaluation metrics in graph expansion tasks. Through node classification tasks, we verify that GAG effectively preserves the node-level textual characteristics of real-world networks in the text-rich graphs it generates. In addition, with parallel acceleration, GAG supports generating graphs of up to nearly 100,000 nodes or 10 million edges through large-scale LLM-based agent simulation, with a minimum speed-up of 90.4%. The source code is available.|\n", "2410.09713": "|**2024-10-13**|**Agentic Information Retrieval**|Weinan Zhang et.al.|[2410.09713](http://arxiv.org/abs/2410.09713)|null|Since the 1970s, users seeking relevant information have relied on domain-specific information retrieval (IR) architectures. Over the past two decades, the advent of modern IR systems, including web search engines and personalized recommender systems, has greatly improved the efficiency of retrieving relevant information from vast data corpora. However, the core paradigm of these IR systems remains largely unchanged: it relies on filtering a predefined set of candidate items. Since 2022, breakthroughs in large language models (LLMs) have begun to transform how information is accessed, establishing a new technical paradigm. In this review, we introduce agentic information retrieval (Agentic IR), a new IR paradigm shaped by the capabilities of LLM agents. Agentic IR expands the scope of accessible tasks and uses a suite of new techniques to redefine information retrieval. We discuss three cutting-edge applications and the challenges they face. We believe agentic information retrieval is likely to produce innovative applications and may become a central information entry point in future digital ecosystems.|\n", "2410.09381": "|**2024-10-12**|**LLM-SmartAudit: Advanced Smart Contract Vulnerability Detection**|Zhiyuan Wei et.al.|[2410.09381](http://arxiv.org/abs/2410.09381)|**[link](https://github.com/LLMAudit/LLMSmartAuditTool)**|The immutable nature of blockchain technology, while revolutionary, introduces significant security challenges, particularly in smart contracts. These security issues can lead to substantial financial losses. Current tools and approaches often focus on specific types of vulnerabilities, and a comprehensive tool that can detect a wide range of vulnerabilities with high accuracy is still lacking. This paper introduces LLM-SmartAudit, a new framework that uses the advanced capabilities of large language models (LLMs) to detect and analyze vulnerabilities in smart contracts. Using a multi-agent conversational approach, LLM-SmartAudit employs a collaborative system with specialized agents to enhance the audit process. To evaluate the effectiveness of LLM-SmartAudit, \u
6211\u4eec\u7f16\u5236\u4e86\u4e24\u4e2a\u4e0d\u540c\u7684\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u7528\u4e8e\u4e0e\u4f20\u7edf\u5de5\u5177\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\u7684\u6807\u8bb0\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30\u5b9e\u9645\u5e94\u7528\u7684\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u89e3\u51b3\u65b9\u6848\u5728\u6240\u6709\u4f20\u7edf\u667a\u80fd\u5408\u7ea6\u5ba1\u8ba1\u5de5\u5177\u4e4b\u4e0a\uff0c\u63d0\u4f9b\u4e86\u66f4\u9ad8\u7684\u51c6\u786e\u6027\u548c\u66f4\u5927\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6846\u67b6\u53ef\u4ee5\u68c0\u6d4b\u590d\u6742\u903b\u8f91\u6f0f\u6d1e\uff0c\u800c\u4f20\u7edf\u5de5\u5177\u4e4b\u524d\u672a\u66fe\u53d1\u73b0\u8fd9\u4e9b\u6f0f\u6d1e\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5229\u7528LLM\u4ee3\u7406\u63d0\u4f9b\u4e86\u4e00\u79cd\u975e\u5e38\u6709\u6548\u7684\u81ea\u52a8\u5316\u667a\u80fd\u5408\u7ea6\u5ba1\u8ba1\u65b9\u6cd5\u3002|\n", "2410.11239": "|**2024-10-15**|**HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications**|Weijie Xu 
et.al.|[2410.11239](http://arxiv.org/abs/2410.11239)|null|Recent advances in large language models (LLMs) have benefited many fields such as education and finance, but in human resources (HR) many repetitive processes remain unaddressed, such as access requests, medical reimbursement, and leave requests. We aim to connect these tasks with LLM agents, which have already proven effective in areas such as writing assistance and customer service. We present HR-Agent, an efficient, confidential, HR-specific LLM-based task-oriented dialogue system designed to automate repetitive HR processes such as medical reimbursement and access requests. Because conversation data is not sent to the LLM during inference, it preserves the confidentiality required for HR-related tasks.|\n", "2410.12568": "|**2024-10-16**|**Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving**|Sihao Wu 
et.al.|[2410.12568](http://arxiv.org/abs/2410.12568)|null|Integrating large language models (LLMs) into autonomous driving systems has demonstrated strong common-sense and reasoning abilities, effectively addressing the shortcomings of purely data-driven methods. Current LLM-based agents require long inference times and face challenges when interacting with real-time autonomous driving environments. A key open question is whether we can effectively leverage LLM knowledge to train an efficient and robust reinforcement learning (RL) agent. This paper introduces RAPID, a novel Robust Adaptive Policy Infusion and Distillation framework, which trains specialized mix-of-policy RL agents on data generated by an LLM-based driving agent and then adapts them online. RAPID has three key designs: 1) using offline data collected from the LLM agent to distill expert knowledge into RL policies for faster real-time inference; 2) introducing robust distillation into RL to inherit both the performance and the robustness of the LLM-based teacher; and 3) employing a mix-of-policy approach with a policy adapter for joint decision decoding. Through fine-tuning via online environment interaction, RAPID reduces forgetting of LLM knowledge while maintaining adaptability to different tasks. Extensive experiments show that RAPID can effectively integrate LLM knowledge into scaled-down RL policies in an efficient, adaptable, and robust way. Code and checkpoints will be made publicly available upon acceptance.|\n", "2410.12481": "|**2024-10-16**|**SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling**|Loris Gaven et.al.|[2410.12481](http://arxiv.org/abs/2410.12481)|null|In recent years, large language models (LLMs) have made significant progress not only as generative models but also as agents solving textual sequential decision-making tasks. When facing complex environments where their zero-shot abilities fall short, recent work has shown that online reinforcement learning (RL) can let these LLM agents discover and learn efficient strategies interactively. However, most prior work sticks to on-policy algorithms, which greatly limits the range of exploration and exploitation methods such agents can use, such as experience replay and hindsight relabeling. Yet these methods may be key for LLM learning agents, especially when designing autonomous, intrinsically motivated agents that sample and pursue their own goals (i.e., autotelic agents). This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents. Our method not only paves the way toward autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.|\n", "2410.12361": "|**2024-10-16**|**Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance**|Yaxi Lu et.al.|[2410.12361](http://arxiv.org/abs/2410.12361)|null|Agents powered by large language models have demonstrated remarkable abilities in solving complex tasks. However, most agent systems remain reactive, which limits their effectiveness in scenarios that require foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents that can anticipate and initiate tasks without explicit human instructions. We propose a novel data-driven approach to this problem. First, we collect real-world human activities to generate proactive task predictions. Human annotators label these predictions as accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline and create ProactiveBench, a diverse dataset containing 6,790 events. Finally, we show that fine-tuning models on the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-score of 66.47% in proactively offering assistance, outperforming all open-source and closed-source models. These results highlight the potential of our method for creating more proactive and effective agent systems, paving the way for future advances in human-agent collaboration.|\n", "2410.12236": "|**2024-10-16**|**Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay**|Yuyang Chen 
et.al.|[2410.12236](http://arxiv.org/abs/2410.12236)|null|Nowadays, Transformer-based large language models (LLMs) for code generation tasks usually apply sampling-and-filtering pipelines. Because of the sparse-reward problem in code generation, where a single incorrect token invalidates a program, Transformer models sample redundant programs until they find a correct one, which leads to low efficiency. To address this challenge, we incorporate experience replay (ER) in the fine-tuning phase, where the codes and programs produced are stored and replayed so that the LLM agent has a chance to learn from past experiences. In the spirit of ER, we introduce a novel approach called the BTP pipeline, which consists of three phases: beam search sampling, a testing phase, and a prioritized experience replay phase. The approach makes use of failed programs collected by code models and replays programs with a high Possibility and Pass-rate Prioritized value (P2Value) from the replay buffer to improve efficiency. P2Value jointly considers the possibility of the Transformer's output and the pass rate, and can exploit the otherwise wasted resources that arise because most programs collected by LLMs fail to pass any tests. We empirically apply our approach to several LLMs, showing that it improves their performance on code generation tasks and surpasses existing baselines.|\n", "2410.11906": "|**2024-10-15**|**Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents**|Bolun Sun et.al.|[2410.11906](http://arxiv.org/abs/2410.11906)|null|This paper presents a novel application of large language models (LLMs) to enhance user comprehension of privacy policies through an interactive dialogue agent. We show that LLMs significantly outperform traditional models on tasks such as data practice identification, choice identification, policy summarization, and privacy question answering, setting new benchmarks in privacy policy analysis. Building on these findings, we introduce an innovative LLM-based agent that acts as an expert system for processing website privacy policies, guiding users through complex legal language without requiring them to pose specific questions. A user study with 100 participants showed that users assisted by the agent had higher comprehension (mean score 2.6 out of 3 versus 1.8 in the control group), lower cognitive load (task difficulty rating of 3.2 out of 10 versus 7.8), greater confidence in managing privacy, and shorter task completion times (5.5 minutes versus 15.8 minutes). This work highlights the potential of LLM-based agents to transform how users interact with privacy policies, leading to more informed consent and empowering users in the digital services landscape.|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|Autonomy via agents that use large language models (LLMs) for personalized, standardized tasks boosts human efficiency. Demand for automating web tasks (such as booking a hotel within a budget) is growing. Beyond fulfilling practical needs, web agents also serve as an important proof-of-concept example for various agent grounding scenarios, and their success promises advances in many future applications. Prior work often handcrafts web agent strategies (e.g., prompting templates, multi-agent systems, search methods), which may not generalize well across all real-world scenarios. On the other hand, there has been limited study of the misalignment between a web agent's observation/action representation and the pre-training data of the LLM it is based on. This discrepancy is especially notable because LLMs are primarily trained for language completion rather than for tasks involving embodied navigation actions and symbolic web elements. Our study enhances an LLM-based web agent by simply refining its observation and action space to better align with the LLM's capabilities. This approach enables our base agent to significantly outperform previous methods on a wide variety of web tasks. Specifically, on WebArena, a benchmark featuring general-purpose web interaction tasks, our agent AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively, and raises the success rate by 26.6 points (+161%) over comparable plain web agents through its observation and action space alignment. We achieve this without using in-context examples, new agent roles, online feedback, or search strategies. AgentOccam's simple design highlights LLMs' zero-shot performance on web tasks and underlines the critical role of carefully tuning observation and action spaces for LLM-based agents.|\n", "2410.13768": "|**2024-10-17**|**Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent 
Systems**|Alireza Ghafarollahi et.al.|[2410.13768](http://arxiv.org/abs/2410.13768)|null|A multi-agent AI model is used to automate the discovery of new metallic alloys, integrating multimodal data and external knowledge, including physical insights obtained through atomistic simulations. Our multi-agent system has three key components: (a) a suite of large language models (LLMs) responsible for tasks such as reasoning and planning, (b) a group of AI agents with distinct roles and expertise that collaborate dynamically, and (c) a newly developed graph neural network (GNN) model for rapid retrieval of key physical properties. A set of LLM-driven AI agents collaborate to automate the exploration of the vast design space of MPEAs (multi-principal element alloys), guided by predictions from the GNN. We focus on the NbMoTa family of body-centered cubic (bcc) alloys, modeled using a machine-learning-based interatomic potential, and target two key properties: the Peierls barrier and the solute/screw dislocation interaction energy. Our GNN model accurately predicts these atomic-scale properties, providing a faster alternative to costly brute-force calculations and reducing the computational burden on the multi-agent system for physics retrieval. This AI system transforms materials discovery by reducing reliance on human expertise and overcoming the limitations of direct all-atom simulations. By combining the predictive power of the GNN with the dynamic collaboration of LLM-based agents, the system autonomously navigates vast alloy design spaces, identifies trends in atomic-scale material properties, and predicts macro-scale mechanical strength, as demonstrated by several computational experiments. This approach accelerates the discovery of advanced alloys and holds promise for broader application to other complex systems, marking a significant step forward in automated materials design.|\n", "2410.13610": "|**2024-10-17**|**MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling**|Yakun Zhu 
et.al.|[2410.13610](http://arxiv.org/abs/2410.13610)|null|Integrating tools into large language models (LLMs) has facilitated their widespread application. However, in specialized downstream task scenarios, relying on tools alone is insufficient to fully address real-world complexity, which particularly limits the effective deployment of LLMs in fields such as medicine. This paper focuses on the downstream tasks of medical calculators, which use standardized tests to assess an individual's health status. We introduce MeNTi, a universal agent architecture for LLMs. MeNTi integrates a specialized medical toolkit and employs meta-tool and nested calling mechanisms to enhance LLM tool utilization. Specifically, it achieves flexible tool selection and nested tool calling to address practical issues in complex medical scenarios, including calculator selection, slot filling, and unit conversion. To assess the capability of LLMs for quantitative assessment throughout the clinical process of calculator scenarios, we introduce the CalcQA benchmark, which requires LLMs to use medical calculators to perform calculations and assess patient health status. CalcQA is constructed by professional physicians and contains 100 case-calculator pairs, complemented by a toolkit of 281 medical tools. Experimental results show that our framework yields significant performance improvements. This research opens new directions for applying LLMs in demanding medical scenarios.|\n", "2410.13185": "|**2024-10-17**|**Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents**|Long Li et.al.|[2410.13185](http://arxiv.org/abs/2410.13185)|**[link](https://github.com/damo-nlp-sg/coi-agent)**|Effective research ideation is a critical step in scientific research. However, the exponential growth of scientific literature makes it hard for researchers to keep up with recent advances and identify meaningful research directions. Recent developments in large language models (LLMs) suggest a promising avenue for automating the generation of novel research ideas. However, existing idea generation methods either trivially prompt LLMs or directly expose LLMs to extensive literature without indicating which information is useful. Inspired by the research process of human researchers, we propose the Chain-of-Ideas (CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure to effectively mirror the progressive development of a research domain. This organization helps LLMs capture current research advances, thereby enhancing their ideation capabilities. We also propose Idea Arena, an evaluation protocol that comprehensively evaluates idea generation methods from different perspectives and aligns closely with the preferences of human researchers. Experimental results show that the CoI agent consistently outperforms other methods and produces ideas of quality comparable to those of humans. Moreover, our CoI agent is budget-friendly, with a minimum cost of 0.50 USD to generate a candidate idea and its corresponding experimental design.|\n", "2410.14569": "|**2024-10-18**|**When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs**|Hanna Kim et.al.|[2410.14569](http://arxiv.org/abs/2410.14569)|null|Recent advances in large language models (LLMs) have established them as autonomous systems capable of planning and interacting with various tools. These LLM agents are often paired with web-based tools, giving them access to diverse information sources and real-time data. Although these advances bring significant benefits across applications, they also increase the risk of malicious use, particularly in cyberattacks involving personal information. In this work, we investigate the risks of LLM agents being misused in cyberattacks involving personal data. Specifically, we aim to understand: 1) how capable LLM agents can be when directed to conduct cyberattacks, 2) how web-based tools enhance cyberattacks, and 3) how affordable and easy it becomes to launch cyberattacks using LLM agents. We examine three attack scenarios: collecting personally identifiable information (PII), generating impersonation posts, and creating spear-phishing emails. Our experiments reveal the effectiveness of LLM agents in these attacks: LLM agents achieved a precision of up to 95.9% in collecting PII, up to 93.9% of impersonation posts created by LLM agents were judged authentic, and the click rate for links in spear-phishing emails created by LLM agents reached 46.67%. Our findings also underscore the limitations of existing safeguards in commercial LLMs, emphasizing the urgent need for more robust security measures to prevent the misuse of LLM agents.|\n", "2410.14516": "|**2024-10-18**|**Do LLMs \"know\" internally when they follow instructions?**|Juyeon Heo 
et.al.|[2410.14516](http://arxiv.org/abs/2410.14516)|null|Instruction following is crucial for building AI agents with large language models (LLMs), since these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve the success rate of instruction following and prevent undesirable outputs, a deeper understanding is needed of how the internal states of LLMs relate to these outcomes. Our analysis of LLM internal states reveals a dimension in the input embedding space that is linked to successful instruction following. We show that modifying representations along this dimension improves instruction-following success rates without compromising response quality. Further investigation shows that this dimension is more closely related to the phrasing of prompts than to the inherent difficulty of the task or instructions. This finding also suggests why LLMs sometimes fail to follow clear instructions and why prompt engineering is often effective even when the content remains largely unchanged. This work provides insight into the internal workings of instruction following in LLMs, paving the way for reliable LLM agents.|\n", "2410.14368": 
"|**2024-10-18**|**CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic**|Huaiyuan Yao et.al.|[2410.14368](http://arxiv.org/abs/2410.14368)|**[link](https://github.com/hyan-yao/comal)**|**\u5728\u57ce\u5e02\u4ea4\u901a\u4e2d\u5f15\u5165\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u53ef\u4ee5\u901a\u8fc7\u51cf\u5c11\u62e5\u5835\u548c\u7cfb\u7edf\u5730\u4f18\u5316\u4ea4\u901a\u6d41\u91cf\u6765\u63d0\u9ad8\u6548\u7387\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCoMAL\uff08\u534f\u4f5c\u591a\u667a\u80fd\u4f53\u5927\u8bed\u8a00\u6a21\u578b\uff09\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u4e4b\u95f4\u7684\u534f\u4f5c\u89e3\u51b3\u6df7\u5408\u81ea\u4e3b\u4ea4\u901a\u95ee\u9898\uff0c\u4ece\u800c\u4f18\u5316\u4ea4\u901a\u6d41\u91cf\u3002CoMAL\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5728\u4ea4\u4e92\u5f0f\u4ea4\u901a\u4eff\u771f\u73af\u5883\u4e2d\u8fd0\u884c\u3002\u5b83\u5229\u7528\u611f\u77e5\u6a21\u5757\u89c2\u5bdf\u5468\u56f4\u4ee3\u7406\uff0c\u5e76\u4f7f\u7528\u8bb0\u5fc6\u6a21\u5757\u5b58\u50a8\u6bcf\u4e2a\u4ee3\u7406\u7684\u7b56\u7565\u3002\u6574\u4f53\u5de5\u4f5c\u6d41\u7a0b\u5305\u62ec\u4e00\u4e2a\u534f\u4f5c\u6a21\u5757\uff0c\u9f13\u52b1\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u8ba8\u8bba\u6709\u6548\u7684\u7b56\u7565\u5e76\u5206\u914d\u89d2\u8272\uff0c\u4e00\u4e2a\u63a8\u7406\u5f15\u64ce\u6839\u636e\u5206\u914d\u7684\u89d2\u8272\u786e\u5b9a\u6700\u4f18\u884c\u4e3a\uff0c\u4ee5\u53ca\u4e00\u4e2a\u6267\u884c\u6a21\u5757\u4f7f\u7528\u7ed3\u5408\u4e86\u57fa\u4e8e\u89c4\u5219\u6a21\u578b\u7684\u6df7\u5408\u65b9\u6cd5\u63a7\u5236\u8f66\u8f86\u52a8\u4f5c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCoMAL\u5728Flow\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u7684\u5f71\u54cd\uff0c\u5e76\u5c06\u5176\u6846\u67b6\u4e0e\u5f3a\u5316\u5b66\u4e60
\u65b9\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u8fd9\u7a81\u663e\u4e86LLM\u4ee3\u7406\u7684\u5f3a\u5927\u5408\u4f5c\u80fd\u529b\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u6765\u5e94\u5bf9\u6df7\u5408\u81ea\u4e3b\u4ea4\u901a\u6311\u6218\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Hyan-Yao/CoMAL\u83b7\u53d6\u3002**|\n", "2410.14262": "|**2024-10-18**|**Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation**|Edward et.al.|[2410.14262](http://arxiv.org/abs/2410.14262)|null|\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u68c0\u6d4b\u548c\u7ea0\u6b63AI\u751f\u6210\u5185\u5bb9\u4e2d\u5e7b\u89c9\u73b0\u8c61\u7684\u80fd\u529b\u3002\u4e00\u4e2a\u4e3b\u8981\u4ee3\u7406\u88ab\u4efb\u52a1\u521b\u5efa\u4e00\u7bc7\u5173\u4e8e\u4e00\u4f4d\u865a\u6784\u7684\u4e39\u9ea6\u827a\u672f\u5bb6Flipfloppidy\u7684\u535a\u5ba2\uff0c\u7136\u540e\u7531\u53e6\u4e00\u4e2a\u4ee3\u7406\u8fdb\u884c\u5ba1\u67e5\u4ee5\u8bc6\u522b\u4e8b\u5b9e\u6027\u9519\u8bef\u3002\u5927\u591a\u6570LLM\u6a21\u578b\u5e7b\u5316\u51fa\u4e86\u8fd9\u4f4d\u827a\u672f\u5bb6\u7684\u5b58\u5728\u3002\u5728\u6d89\u53ca\u5404\u79cd\u4e3b\u4ee3\u7406\u548c\u5ba1\u67e5\u4ee3\u7406\u7ec4\u5408\u76844900\u6b21\u6d4b\u8bd5\u8fd0\u884c\u4e2d\uff0c\u5148\u8fdb\u7684AI\u6a21\u578b\u5982Llama3-70b\u548cGPT-4\u53d8\u4f53\u5728\u8bc6\u522b\u5e7b\u89c9\u65b9\u9762\u51e0\u4e4e\u8fbe\u5230\u4e86\u5b8c\u7f8e\u7684\u51c6\u786e\u7387\uff0c\u5e76\u4e14\u5728\u6536\u5230\u53cd\u9988\u540e\u6210\u529f\u4fee\u6b63\u4e86\u8f93\u51fa\u5185\u5bb9\u768485%\u5230100%\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5148\u8fdbAI\u6a21\u578b\u5728\u663e\u8457\u63d0\u9ad8\u751f\u6210\u5185\u5bb9\u7684\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u6539\u8fdbAI\u5de5\u4f5c\u6d41\u7f16\u6392\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u3002|\n", "2410.14209": 
"|**2024-10-18**|**Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents**|Zihan Liu et.al.|[2410.14209](http://arxiv.org/abs/2410.14209)|null|\u5728\u5de5\u4e1a\u63a7\u5236\u7cfb\u7edf\u4e2d\uff0c\u53ef\u7f16\u7a0b\u903b\u8f91\u63a7\u5236\u5668\uff08PLC\uff09\u4ee3\u7801\u7684\u751f\u6210\u548c\u9a8c\u8bc1\u5bf9\u4e8e\u786e\u4fdd\u8fd0\u884c\u6548\u7387\u548c\u5b89\u5168\u6027\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u52a8\u5316\u4ee3\u7801\u751f\u6210\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u65e0\u6cd5\u63d0\u4f9b\u6b63\u786e\u6027\u4fdd\u8bc1\uff0c\u5e76\u4e14\u7f3a\u4e4f\u5bf9PLC\u7f16\u7a0b\u7684\u4e13\u4e1a\u652f\u6301\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aAgents4PLC\u7684\u65b0\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4e0d\u4ec5\u5b9e\u73b0\u4e86PLC\u4ee3\u7801\u7684\u81ea\u52a8\u5316\u751f\u6210\uff0c\u8fd8\u901a\u8fc7\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u8fdb\u884c\u4e86\u4ee3\u7801\u7ea7\u522b\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u9996\u5148\u5efa\u7acb\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u53ef\u9a8c\u8bc1\u7684PLC\u4ee3\u7801\u751f\u6210\u9886\u57df\uff0c\u4ece\u81ea\u7136\u8bed\u8a00\u9700\u6c42\u8fc7\u6e21\u5230\u4eba\u5de5\u7f16\u5199\u548c\u9a8c\u8bc1\u7684\u5f62\u5f0f\u5316\u89c4\u8303\u548c\u53c2\u8003PLC\u4ee3\u7801\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u7ed3\u5408\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u3001\u5148\u8fdb\u7684\u63d0\u793a\u5de5\u7a0b\u6280\u672f\u548c\u94fe\u5f0f\u601d\u7ef4\u7b56\u7565\uff0c\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u9488\u5bf9\u5de5\u4e1a\u63a7\u5236\u7cfb\u7edf\u7684\u201c\u4ee3\u7406\u201d\u3002\u8bc4\u4f30\u8868\u660e\uff0cAgents4PLC\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5728\u4e00\u7cfb\u
5217\u65e5\u76ca\u4e25\u683c\u7684\u6307\u6807\u4e0a\u5747\u53d6\u5f97\u4e86\u4f18\u5f02\u7684\u7ed3\u679c\u3002\u8fd9\u9879\u7814\u7a76\u4e0d\u4ec5\u89e3\u51b3\u4e86PLC\u7f16\u7a0b\u4e2d\u7684\u5173\u952e\u6311\u6218\uff0c\u8fd8\u5c55\u793a\u4e86\u6211\u4eec\u7684\u6846\u67b6\u751f\u6210\u9002\u7528\u4e8e\u5b9e\u9645\u5de5\u4e1a\u5e94\u7528\u7684\u53ef\u9a8c\u8bc1\u4ee3\u7801\u7684\u6f5c\u529b\u3002|\n", "2410.14202": "|**2024-10-18**|**Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs**|SeongYeub Chu et.al.|[2410.14202](http://arxiv.org/abs/2410.14202)|null|\u73b0\u6709\u7684\u81ea\u52a8\u4f5c\u6587\u8bc4\u5206\uff08AES\uff09\u4ec5\u4f9d\u8d56\u4e8e\u4f5c\u6587\u6587\u672c\uff0c\u800c\u672a\u4f7f\u7528\u89e3\u91ca\u6027\u7406\u7531\u5206\u6570\uff0c\u56e0\u6b64\u9519\u5931\u4e86\u4ee5\u7ec6\u7c92\u5ea6\u65b9\u5f0f\u6355\u6349\u8bc4\u5206\u6807\u51c6\u4e2d\u7279\u5b9a\u8bc4\u4f30\u65b9\u9762\u7684\u673a\u4f1a\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u57fa\u4e8e\u8bba\u636e\u7684\u591a\u7279\u5f81\u8bc4\u5206\uff08RMTS\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u57fa\u4e8e\u63d0\u793a\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u4f7f\u7528\u8f83\u5c0f\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08S-LLM\uff09\u7684\u5fae\u8c03\u5f0f\u4f5c\u6587\u8bc4\u5206\u6a21\u578b\u3002RMTS \u4f7f\u7528\u57fa\u4e8eLLM\u7684\u7279\u5f81\u8bba\u636e\u751f\u6210\u7cfb\u7edf\uff0c\u5176\u4e2d\u5355\u72ec\u7684LLM\u4ee3\u7406\u6839\u636e\u8bc4\u5206\u6807\u51c6\u6307\u5357\u751f\u6210\u7279\u5f81\u7279\u5b9a\u7684\u7406\u7531\uff0c\u8bc4\u5206\u6a21\u578b\u5229\u7528\u8fd9\u4e9b\u7406\u7531\u51c6\u786e\u9884\u6d4b\u591a\u7279\u5f81\u5206\u6570\u3002\u5728\u57fa\u51c6\u6570\u636e\u96c6\uff08\u5305\u62ecASAP\u3001ASAP++\u548cFeedback Prize\uff09\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cRMTS 
\u5728\u7279\u5f81\u7279\u5b9a\u8bc4\u5206\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u548c\u666e\u901a\u7684S-LLM\u3002\u901a\u8fc7\u8f85\u52a9\u5b9a\u91cf\u8bc4\u4f30\u4ee5\u63d0\u4f9b\u7ec6\u7c92\u5ea6\u7684\u5b9a\u6027\u7406\u7531\uff0cRMTS \u63d0\u9ad8\u4e86\u7279\u5f81\u8bc4\u5206\u7684\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u5173\u4e8e\u4f5c\u6587\u7684\u90e8\u5206\u89e3\u91ca\u3002|\n", "2410.14152": "|**2024-10-18**|**SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent**|Jiarui Ji et.al.|[2410.14152](http://arxiv.org/abs/2410.14152)|**[link](https://github.com/jijiarui-cather/srapagent_framework)**|\u516c\u5171\u7a00\u7f3a\u8d44\u6e90\u914d\u7f6e\u5728\u7ecf\u6d4e\u5b66\u4e2d\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\uff0c\u56e0\u4e3a\u5b83\u76f4\u63a5\u5f71\u54cd\u5230\u793e\u4f1a\u7684\u6548\u7387\u548c\u516c\u5e73\u6027\u3002\u4f20\u7edf\u7814\u7a76\u65b9\u6cd5\uff0c\u5305\u62ec\u57fa\u4e8e\u7406\u8bba\u6a21\u578b\u3001\u57fa\u4e8e\u5b9e\u8bc1\u7814\u7a76\u548c\u57fa\u4e8e\u4eff\u771f\u7684\u65b9\u6cd5\uff0c\u7531\u4e8e\u5b58\u5728\u7406\u60f3\u5316\u7684\u5b8c\u5168\u4fe1\u606f\u548c\u4e2a\u4f53\u7406\u6027\u7684\u5047\u8bbe\u4ee5\u53ca\u6709\u9650\u53ef\u7528\u6570\u636e\u7684\u9650\u5236\uff0c\u9762\u4e34\u7740\u5c40\u9650\u6027\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6SRAP-Agent\uff08\u4f7f\u7528\u57fa\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u7684\u667a\u80fd\u4f53\u6a21\u62df\u548c\u4f18\u5316\u7a00\u7f3a\u8d44\u6e90\u914d\u7f6e\u653f\u7b56\uff09\uff0c\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u96c6\u6210\u5230\u7ecf\u6d4e\u4eff\u771f\u4e2d\uff0c\u65e8\u5728\u5f25\u5408\u7406\u8bba\u6a21\u578b\u4e0e\u73b0\u5b9e\u52a8\u6001\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u4ee5\u516c\u5171\u4f4f\u623f\u5206\u914d\u573a\u666f\u4f5c\u4e3a\u6848\u4f8b\u7814\u7a76\uff0c\u
6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u653f\u7b56\u4eff\u771f\u5b9e\u9a8c\u6765\u9a8c\u8bc1SRAP-Agent\u7684\u53ef\u884c\u6027\u548c\u6709\u6548\u6027\uff0c\u5e76\u91c7\u7528\u5177\u6709\u7279\u5b9a\u4f18\u5316\u76ee\u6807\u7684\u653f\u7b56\u4f18\u5316\u7b97\u6cd5\u3002\u6e90\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/jijiarui-cather/SRAPAgent_Framework\u627e\u5230\u3002|\n", "2410.14041": "|**2024-10-17**|**From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching**|Eric Yang et.al.|[2410.14041](http://arxiv.org/abs/2410.14041)|null|\u6709\u6548\u7684\u7ba1\u7406\u5fc3\u810f\u4ee3\u8c22\u72b6\u51b5\u9700\u8981\u6301\u7eed\u7684\u79ef\u6781\u8425\u517b\u4e60\u60ef\uff0c\u4f46\u8fd9\u4e9b\u4e60\u60ef\u5f80\u5f80\u53d7\u5230\u590d\u6742\u4e14\u4e2a\u4f53\u5316\u7684\u969c\u788d\u5f71\u54cd\u3002\u76f4\u63a5\u7684\u4eba\u7c7b\u7ba1\u7406\u96be\u4ee5\u6269\u5c55\uff0c\u800c\u4e4b\u524d\u7684\u5c1d\u8bd5\u65e8\u5728\u81ea\u52a8\u5316\u8425\u517b\u8f85\u5bfc\uff0c\u4f46\u7f3a\u4e4f\u89e3\u51b3\u8fd9\u4e9b\u591a\u6837\u5316\u6311\u6218\u6240\u9700\u7684\u4e2a\u6027\u5316\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e3b\u52a8\u5de5\u4f5c\u6d41\u7a0b\uff0c\u65e8\u5728\u901a\u8fc7\u76f4\u63a5\u9488\u5bf9\u5e76\u7f13\u89e3\u60a3\u8005\u7279\u5b9a\u7684\u969c\u788d\u6765\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u8425\u517b\u8f85\u5bfc\u3002\u8be5\u5de5\u4f5c\u6d41\u7a0b\u57fa\u4e8e\u884c\u4e3a\u79d1\u5b66\u539f\u5219\uff0c\u5229\u7528\u4e86\u4e0e\u76f8\u5e94\u5faa\u8bc1\u7b56\u7565\u76f8\u5173\u7684\u5168\u9762\u8425\u517b\u76f8\u5173\u969c\u788d\u6620\u5c04\u3002\u4e00\u4e2a\u4e13\u95e8\u7684LLM\u4ee3\u7406\u6709\u610f\u63a2\u67e5\u5e76\u8bc6\u522b\u60a3\u8005\u5728\u996e\u98df\u65b9\u9762\u7684\u6839\u672c\u95ee\u9898\u3002\u968f\u540e\uff0c\u53e6\u4e00\u4e2aLLM\u4ee3\u7406\u63d0\u4f9b\u91cf\u8eab\u5b9a\u5236\u7684\u7b56\u7565\
uff0c\u4ee5\u514b\u670d\u8fd9\u4e9b\u7279\u5b9a\u969c\u788d\uff0c\u5e76\u7ed3\u5408\u60a3\u8005\u7684\u5177\u4f53\u60c5\u51b5\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u6d89\u53ca\u5fc3\u810f\u4ee3\u8c22\u75be\u75c5\u60a3\u8005\u7684\u7528\u6237\u7814\u7a76\u6765\u8bbe\u8ba1\u548c\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u8bc1\u660e\u4e86\u8be5\u7cfb\u7edf\u80fd\u591f\u51c6\u786e\u8bc6\u522b\u969c\u788d\u5e76\u63d0\u4f9b\u4e2a\u6027\u5316\u6307\u5bfc\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u5927\u89c4\u6a21\u6a21\u62df\u7814\u7a76\u6765\u8bc4\u4f30\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u8be5\u7814\u7a76\u57fa\u4e8e\u771f\u5b9e\u7684\u60a3\u8005\u6848\u4f8b\u548c\u4e13\u5bb6\u9a8c\u8bc1\u7684\u6307\u6807\uff0c\u5728\u5e7f\u6cdb\u7684\u60c5\u666f\u4e2d\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u57fa\u4e8eLLM\u7684\u4e3b\u52a8\u5de5\u4f5c\u6d41\u7a0b\u6709\u53ef\u80fd\u901a\u8fc7\u63d0\u4f9b\u4e2a\u6027\u5316\u3001\u53ef\u6269\u5c55\u4e14\u57fa\u4e8e\u884c\u4e3a\u7684\u5e72\u9884\u63aa\u65bd\u6765\u6539\u5584\u8425\u517b\u8f85\u5bfc\u3002|\n", "2410.16237": "|**2024-10-23**|**IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems**|Yihuan Mao 
et.al.|[2410.16237](http://arxiv.org/abs/2410.16237)|null|As large language model (LLM) agents are increasingly integrated into our infrastructure, their robust coordination and message synchronization become vital. The Byzantine Generals Problem (BGP) is a critical model for building multi-agent systems (MAS) that are resilient under adversarial attacks. It describes a scenario in which malicious agents with unknown identities exist in the system; in our context, such situations could result from LLM agents' hallucinations or external attacks. In BGP, the objective of the entire system is to reach a consensus on the action to be taken. Traditional BGP requires global consensus among all agents; in practical scenarios, however, global consensus is not always necessary and can even be inefficient. There is therefore a pressing need to explore a refined version of BGP that aligns with the local coordination patterns observed in MAS. We refer to this refined version as Imperfect BGP (IBGP) in our research, aiming to address this discrepancy. To tackle this issue, we propose a framework that leverages consensus protocols within general MAS settings, providing provable resilience against communication attacks and adaptability to changing environments, as validated by empirical results. Additionally, we present a case study in a sensor network environment to illustrate the practical application of our protocol.|\n", 
"2410.15686": "|**2024-10-21**|**NetSafe: Exploring the Topological Safety of Multi-agent Networks**|Miao Yu et.al.|[2410.15686](http://arxiv.org/abs/2410.15686)|null|Large language models (LLMs) have endowed the nodes of multi-agent networks with intelligence, and these models are increasingly applied in academia and industry. However, how to prevent such networks from generating malicious information remains underexplored, and previous research on the safety of a single LLM is difficult to transfer directly. This paper focuses on the safety of multi-agent networks from a topological perspective, investigating which topological properties contribute to safer networks. To this end, we propose a general framework, NetSafe, along with an iterative RelCom interaction to unify existing diverse LLM-based agent frameworks, laying the foundation for generalized topological safety research. We identify several critical phenomena, termed Agent Hallucination and Aggregation Safety, that arise when multi-agent networks are exposed to attacks involving misinformation, bias, and harmful information. Furthermore, we find that highly connected networks are more susceptible to adversarial attacks, with task performance in a star graph topology decreasing by 29.7%. In addition, our proposed static metrics align more closely with real-world dynamic evaluations than traditional graph-theoretic metrics, indicating that networks with greater average distances from attackers exhibit enhanced safety. In conclusion, our work introduces a new perspective on the safety of LLM-based multi-agent networks and uncovers several previously unreported phenomena, paving the way for future research on the safety of such networks.|\n", 
"2410.15311": "|**2024-10-20**|**Who is Undercover? 
Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game**|Ruiqi Dong et.al.|[2410.15311](http://arxiv.org/abs/2410.15311)|null|Large language models (LLMs) play a key AI role in complex tasks but still face challenges with open-ended decision-making problems in complex scenarios. To address this, we use the language logic game \"Who is Undercover?\" (WIU) as an experimental platform and propose the Multi-Perspective Team Tactic (MPTT) framework. MPTT aims to cultivate LLMs' human-like language expression logic, multi-dimensional thinking, and self-perception in complex scenarios. By alternating speaking and voting sessions and integrating techniques such as self-perspective, identity determination, self-reflection, self-summary, and multi-round teammate finding, LLM agents make rational decisions through strategic concealment and communication, fostering the formation of human trust. Preliminary results show that MPTT, combined with WIU, leverages LLMs' cognitive capabilities to create a decision-making framework that can simulate real society. This framework aids minority groups in communication and expression, promoting fairness and diversity in decision-making. Additionally, our human-in-the-loop experiments demonstrate that LLMs can learn and adapt to human behavior through interaction, indicating their potential to participate actively in societal decision-making.|\n", 
"2410.15267": "|**2024-10-20**|**When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?**|Shang Wang et.al.|[2410.15267](http://arxiv.org/abs/2410.15267)|null|The deployment of large language models (LLMs) such as ChatGPT and Gemini has demonstrated their powerful natural language generation capabilities. However, these models can inadvertently learn and retain sensitive information and harmful content during training, raising significant ethical and legal concerns. To address these concerns, machine unlearning has been proposed as a potential solution. Although existing unlearning methods take the specific characteristics of LLMs into account, they often suffer from high computational demands, limited applicability, or the risk of catastrophic forgetting. To address these limitations, we propose a lightweight unlearning framework based on Retrieval-Augmented Generation (RAG). By modifying the external knowledge base of RAG, we simulate the effects of forgetting without directly interacting with the LLM being unlearned. We frame the construction of unlearned knowledge as a constrained optimization problem and derive two key components that underpin the effectiveness of RAG-based unlearning. This RAG-based approach is particularly effective for closed-source LLMs, where existing unlearning methods often fail. We evaluate our framework through extensive experiments on both open-source and closed-source models, including ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2. The results show that our approach meets five key unlearning criteria: effectiveness, universality, harmlessness, simplicity, and robustness. Moreover, the approach can be extended to multimodal large language models and LLM-based agents.|\n", 
"2410.15164": "|**2024-10-19**|**SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation**|Jingxuan Chen et.al.|[2410.15164](http://arxiv.org/abs/2410.15164)|null|Smartphone agents are increasingly important for helping users control devices efficiently, with multimodal large language model (MLLM) approaches emerging as key contenders. Fairly comparing these agents is essential but challenging, requiring a varied task scope, the integration of agents with different implementations, and a generalizable evaluation pipeline to assess their strengths and weaknesses. This paper presents SPA-Bench, a comprehensive smartphone agent benchmark designed to evaluate (M)LLM-based agents in an interactive environment that simulates real-world conditions. SPA-Bench offers three key contributions: (1) a set of tasks covering system and third-party apps in both English and Chinese, focusing on features commonly used in daily routines; (2) a plug-and-play framework that enables real-time interaction with Android devices, integrates more than ten agents, and offers the flexibility to add more; and (3) a novel evaluation pipeline that automatically assesses agent performance across multiple dimensions, encompassing seven metrics related to task completion and resource consumption. Our extensive experiments reveal the challenges these agents face in interpreting mobile user interfaces, action grounding, memory retention, and execution costs. We propose future research directions to ease these difficulties, moving closer to real-world smartphone agent applications.|\n", 
"2410.14923": "|**2024-10-22**|**Imprompter: Tricking LLM Agents into Improper Tool Use**|Xiaohan Fu 
et.al.|[2410.14923](http://arxiv.org/abs/2410.14923)|**[link](https://github.com/Reapor-Yurnero/imprompter)**|**Large language model (LLM) agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, and email, as well as broader external resources. These agent-based systems represent an emerging shift in personal computing. We contribute to the security foundations of agent-based systems and present a new class of automatically computed adversarial prompt attacks that violate the confidentiality and integrity of user resources. We show how prompt optimization techniques can generate such prompts automatically given the model weights. We demonstrate that such attacks transfer to production-level agents. For example, we show an information exfiltration attack on Mistral's LeChat agent that analyzes a user's conversation, picks out personally identifiable information, and formats it into a valid markdown command that leaks the data to the attacker's server. This attack achieves a success rate of nearly 80% in end-to-end evaluations. We conduct a range of experiments to characterize the efficacy of these attacks and find that they work reliably on emerging agent-based systems such as Mistral's LeChat, ChatGLM, and Meta's Llama. These attacks are multimodal, and we show variants in both the text and image domains.**|\n", 
"2410.17238": "|**2024-10-22**|**SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning**|Yizhou Chi et.al.|[2410.17238](http://arxiv.org/abs/2410.17238)|**[link](https://github.com/geekan/metagpt)**|**Automated Machine Learning (AutoML) approaches include traditional methods that optimize fixed pipelines for model selection and ensembling, as well as newer LLM-based frameworks that construct pipelines autonomously. Although LLM-based agents have shown promise in automating machine learning tasks, they often generate low-diversity and suboptimal code, even after multiple iterations. To overcome these limitations, we introduce Tree-Search Enhanced LLM Agents (SELA), an innovative agent-based system that uses Monte Carlo Tree Search (MCTS) to optimize the AutoML process. By representing pipeline configurations as trees, our framework enables agents to conduct experiments intelligently and refine their strategies iteratively, facilitating a more effective exploration of the machine learning solution space. This novel approach allows SELA to discover optimal pathways based on experimental feedback, improving the overall quality of the solutions. In an extensive evaluation across 20 machine learning datasets, we compare the performance of traditional and agent-based AutoML methods and show that SELA achieves a win rate of 65% to 80% against each baseline across all datasets. These results underscore the significant potential of agent-based strategies in AutoML, offering a fresh perspective on tackling complex machine learning challenges.**|\n", 
"2410.16919": "|**2024-10-22**|**EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI**|Tomoyuki Kagaya 
et.al.|[2410.16919](http://arxiv.org/abs/2410.16919)|null|In recent years, large language models (LLMs) have shown strong reasoning capabilities and attracted wide attention, particularly for their application to various decision-making processes. One especially promising application of LLM agents is robotic manipulation. Recent research has shown that LLMs can generate text plans or control code for robots, providing substantial flexibility and interaction capabilities. However, these methods still face challenges in flexibility and in applicability across different environments, which limits their ability to adapt autonomously. Current approaches typically fall into two categories: those that rely on environment-specific policy training, which restricts their transferability, and those that generate code actions from fixed prompts, which leads to diminished performance in new environments. These limitations significantly constrain the generalizability of agents in robotic manipulation. To address them, we propose a novel method called EnvBridge. This approach retains successful robot control code from source environments and transfers it to target environments. By drawing on insights from multiple environments, EnvBridge enhances the adaptability and performance of agents across diverse settings. Notably, our approach alleviates environmental constraints, offering a more flexible and generalizable solution for robotic manipulation tasks. We validate the effectiveness of our method using the robotic manipulation benchmarks RLBench, MetaWorld, and CALVIN. Our experiments show that LLM agents can successfully leverage diverse knowledge sources to solve complex tasks. Consequently, our approach significantly improves the adaptability and robustness of planning by robotic manipulation agents across diverse environments.|\n", 
"2410.16670": "|**2024-10-22**|**CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing**|Chen Yang 
et.al.|[2410.16670](http://arxiv.org/abs/2410.16670)|**[link](https://github.com/uclaml/cops)**|**\u5728\u4ee3\u7406\u7cfb\u7edf\u4e2d\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u987a\u5e8f\u63a8\u7406\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u73b0\u6709\u65b9\u6cd5\u4ecd\u9762\u4e34\u4e00\u4e9b\u9650\u5236\u3002\u53cd\u601d\u9a71\u52a8\u7684\u63a8\u7406\u5b8c\u5168\u4f9d\u8d56\u4e8e\u9884\u8bad\u7ec3\u6a21\u578b\u4e2d\u7684\u77e5\u8bc6\uff0c\u8fd9\u5728\u65b0\u9896\u573a\u666f\u4e2d\u7684\u8868\u73b0\u5f80\u5f80\u53d7\u9650\uff1b\u800c\u7ecf\u9a8c\u8f85\u52a9\u7684\u63a8\u7406\u5219\u5e38\u5e38\u4f9d\u8d56\u5916\u90e8\u7ecf\u9a8c\uff0c\u5e76\u4e14\u7f3a\u4e4f\u9009\u62e9\u4ee3\u8868\u6027\u7ecf\u9a8c\u7684\u660e\u786e\u539f\u5219\u3002\u6211\u4eec\u901a\u8fc7\u63d0\u51faCoPS\uff08\u8de8\u4efb\u52a1\u7ecf\u9a8c\u5171\u4eab\uff09\u7b97\u6cd5\u6765\u89e3\u51b3\u8fd9\u4e9b\u9650\u5236\uff0c\u8fd9\u662f\u4e00\u79cd\u80fd\u591f\u901a\u8fc7\u8de8\u4efb\u52a1\u7ecf\u9a8c\u5171\u4eab\u548c\u9009\u62e9\u6765\u589e\u5f3a\u987a\u5e8f\u63a8\u7406\u7684\u901a\u7528\u7b97\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0cCoPS\u5229\u7528\u4ee3\u7406\u5728\u5148\u524d\u4efb\u52a1\u4e2d\u7684\u7ecf\u9a8c\uff0c\u901a\u8fc7\u4e00\u79cd\u57fa\u4e8e\u60b2\u89c2\u7b56\u7565\u7684\u65b9\u6cd5\u9009\u62e9\u5206\u5e03\u5339\u914d\u7684\u7ecf\u9a8c\uff0c\u4ee5\u6700\u5927\u5316\u6548\u7528\u5e76\u6700\u5c0f\u5316\u56e0\u5206\u5e03\u53d8\u5316\u5e26\u6765\u7684\u98ce\u9669\u3002\u5728Alfworld\u3001Webshop\u548cHotPotQA\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCoPS\u59cb\u7ec8\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5e76\u5177\u6709\u9002\u7528\u4e8e\u8d44\u6e90\u53d7\u9650\u573a\u666f\u7684\u4f18\u8d8a\u6837\u672c\u6548\u7387\u3002\u4ece\u7406\u8bba\u4e0a\u8bb2\uff0c\u6211\u4eec\u7684\u7b97\u6cd5\u6027\u80fd\u53d6\u51b3\u4e8e\u9884\u8bad\u7ec3LLM\u7684
\u8d28\u91cf\u4ee5\u53ca\u4ee3\u7406\u7684\u4efb\u52a1\u76f8\u5173\u8bd5\u9a8c\u5206\u5e03\u4e0eLLM\u751f\u6210\u5206\u5e03\u4e4b\u95f4\u7684\u5339\u914d\u5ea6\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u586b\u8865\u4e86\u73b0\u6709\u987a\u5e8f\u63a8\u7406\u8303\u5f0f\u4e4b\u95f4\u7684\u7a7a\u767d\uff0c\u5e76\u9a8c\u8bc1\u4e86\u5229\u7528\u8de8\u4efb\u52a1\u7ecf\u9a8c\u7684\u6709\u6548\u6027\uff0c\u8fd9\u4e3a\u63d0\u9ad8\u4ee3\u7406\u5728\u591a\u6837\u5316\u4efb\u52a1\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u548c\u9002\u5e94\u6027\u63d0\u4f9b\u4e86\u6f5c\u5728\u9014\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2410.16658": "|**2024-10-22**|**Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent**|Janghoon Ock et.al.|[2410.16658](http://arxiv.org/abs/2410.16658)|null|\u5438\u9644\u80fd\u662f\u50ac\u5316\u4e2d\u7684\u4e00\u4e2a\u91cd\u8981\u53cd\u5e94\u63cf\u8ff0\u7b26\uff0c\u80fd\u591f\u5b9e\u73b0\u6f5c\u5728\u50ac\u5316\u5242\u7684\u9ad8\u6548\u7b5b\u9009\u3002\u7136\u800c\uff0c\u786e\u5b9a\u5438\u9644\u80fd\u9700\u8981\u6bd4\u8f83\u591a\u79cd\u5438\u9644\u7269-\u50ac\u5316\u5242\u6784\u578b\u7684\u80fd\u91cf\uff0c\u7531\u4e8e\u53ef\u80fd\u7684\u6784\u578b\u6570\u91cf\u5e9e\u5927\uff0c\u8fd9\u5728\u8ba1\u7b97\u4e0a\u975e\u5e38\u8017\u65f6\u3002\u5f53\u524d\u7684\u7b97\u6cd5\u65b9\u6cd5\u901a\u5e38\u4f1a\u679a\u4e3e\u5438\u9644\u4f4d\u70b9\u548c\u6784\u578b\uff0c\u800c\u4e0d\u4f1a\u5229\u7528\u7406\u8bba\u89c1\u89e3\u6765\u6307\u5bfc\u521d\u59cb\u8bbe\u7f6e\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aAdsorb-Agent\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\uff0c\u65e8\u5728\u4ee5\u6700\u5c0f\u7684\u4eba\u5de5\u5e72\u9884\u9ad8\u6548\u5730\u63a8\u5bfc\u51fa\u7cfb\u7edf\u7279\u5b9a\u7684\u7a33\u5b9a\u5438\u9644\u6784\u578b\u3002Adsorb-Agent\u5229\u7528\u5185\u7f6e\u77e5\u8bc6\u548c\u65b0\u5174\u63a8\u7406\u80fd\u529b\u
ff0c\u663e\u8457\u51cf\u5c11\u4e86\u6240\u9700\u7684\u521d\u59cb\u6784\u578b\u6570\u91cf\uff0c\u540c\u65f6\u63d0\u9ad8\u4e86\u9884\u6d4b\u6700\u4f4e\u5438\u9644\u80fd\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u901a\u8fc7\u4e24\u4e2a\u5b9e\u4f8b\u7cfb\u7edfNNH-CuPd3(111)\u548cNNH-Mo3Pd(111)\uff0c\u7528\u4e8e\u6c2e\u8fd8\u539f\u53cd\u5e94\uff08NRR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u53ef\u6301\u7eed\u66ff\u4ee3\u54c8\u4f2f-\u535a\u65bd\u5de5\u827a\u7684\u65b9\u6cd5\uff0c\u5c55\u793a\u4e86\u5176\u6027\u80fd\u3002Adsorb-Agent\u901a\u8fc7\u8bc6\u522b\u80fd\u91cf\u66f4\u4f4e\u4e14\u521d\u59cb\u8bbe\u7f6e\u66f4\u5c11\u7684\u6784\u578b\uff0c\u4f18\u4e8e\u4f20\u7edf\u7684\u201c\u542f\u53d1\u5f0f\u201d\u548c\u201c\u968f\u673a\u201d\u7b97\u6cd5\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u8ba1\u7b97\u6210\u672c\u5e76\u63d0\u9ad8\u4e86\u51c6\u786e\u6027\u3002\u8fd9\u51f8\u663e\u4e86\u5b83\u52a0\u901f\u50ac\u5316\u5242\u53d1\u73b0\u7684\u6f5c\u529b\u3002|\n", "2410.18032": "|**2024-10-23**|**GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration**|Xin Li 
et.al.|[2410.18032](http://arxiv.org/abs/2410.18032)|**[link](https://github.com/bupt-gamma/graphteam)**|**\u56fe\u5728\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\uff0c\u5982\u793e\u4ea4\u7f51\u7edc\u548c\u57ce\u5e02\u8ba1\u7b97\u4e2d\u88ab\u5e7f\u6cdb\u7528\u4e8e\u5efa\u6a21\u5173\u7cfb\u6570\u636e\u3002\u73b0\u6709\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u56fe\u5206\u6790\u65b9\u6cd5\u8981\u4e48\u96c6\u6210\u4e86\u7279\u5b9a\u673a\u5668\u5b66\u4e60\u4efb\u52a1\u7684\u56fe\u795e\u7ecf\u7f51\u7edc\uff08GNN\uff09\uff0c\u9650\u5236\u4e86\u5176\u53ef\u8fc1\u79fb\u6027\uff0c\u8981\u4e48\u5b8c\u5168\u4f9d\u8d56\u4e8eLLM\u81ea\u8eab\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u5229\u7528\u4e86LLM\u57fa\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8fd9\u4e9b\u4ee3\u7406\u5c55\u793a\u4e86\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u6216\u5de5\u5177\u89e3\u51b3\u95ee\u9898\u7684\u80fd\u529b\u3002\u901a\u8fc7\u6a21\u62df\u4eba\u7c7b\u7684\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff0c\u5982\u7c7b\u6bd4\u548c\u534f\u4f5c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u79f0\u4e3aGraphTeam\uff0c\u7528\u4e8e\u56fe\u5206\u6790\u3002GraphTeam\u7531\u4e09\u4e2a\u6a21\u5757\u4e2d\u7684\u4e94\u4e2aLLM\u57fa\u4ee3\u7406\u7ec4\u6210\uff0c\u5177\u6709\u4e0d\u540c\u4e13\u957f\u7684\u4ee3\u7406\u53ef\u4ee5\u76f8\u4e92\u534f\u4f5c\u4ee5\u89e3\u51b3\u590d\u6742\u95ee\u9898\u3002\u5177\u4f53\u6765\u8bf4\uff0c\uff081\uff09\u8f93\u5165-\u8f93\u51fa\u89c4\u8303\u5316\u6a21\u5757\uff1a\u95ee\u9898\u4ee3\u7406\u4ece\u539f\u59cb\u95ee\u9898\u4e2d\u63d0\u53d6\u5e76\u63d0\u70bc\u51fa\u56db\u4e2a\u5173\u952e\u53c2\u6570\uff0c\u4fbf\u4e8e\u7406\u89e3\u95ee\u9898\uff0c\u7b54\u6848\u4ee3\u7406\u5219\u5c06\u7ed3\u679c\u7ec4\u7ec7\u6210\u7b26\u5408\u8f93\u51fa\u8981\u6c42\u7684\u5f62\u5f0f\uff1b\uff082\uff09\u5916\u90e8
\u77e5\u8bc6\u68c0\u7d22\u6a21\u5757\uff1a\u6211\u4eec\u9996\u5148\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u542b\u76f8\u5173\u6587\u6863\u548c\u7ecf\u9a8c\u4fe1\u606f\u7684\u77e5\u8bc6\u5e93\uff0c\u7136\u540e\u641c\u7d22\u4ee3\u7406\u4e3a\u6bcf\u4e2a\u95ee\u9898\u68c0\u7d22\u6700\u76f8\u5173\u7684\u6761\u76ee\u3002\uff083\uff09\u95ee\u9898\u89e3\u51b3\u6a21\u5757\uff1a\u7ed9\u5b9a\u641c\u7d22\u4ee3\u7406\u68c0\u7d22\u5230\u7684\u4fe1\u606f\uff0c\u7f16\u7801\u4ee3\u7406\u4f7f\u7528\u7f16\u7a0b\u65b9\u6cd5\u751f\u6210\u89e3\u51b3\u65b9\u6848\uff1b\u5982\u679c\u7f16\u7801\u4ee3\u7406\u4e0d\u8d77\u4f5c\u7528\uff0c\u63a8\u7406\u4ee3\u7406\u5c06\u76f4\u63a5\u8fdb\u884c\u8ba1\u7b97\u800c\u65e0\u9700\u7f16\u7a0b\u3002\u5728\u516d\u4e2a\u56fe\u5206\u6790\u57fa\u51c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cGraphTeam\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5728\u51c6\u786e\u7387\u65b9\u9762\u6bd4\u6700\u597d\u7684\u57fa\u7ebf\u5e73\u5747\u63d0\u9ad8\u4e8625.85%\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728https://github.com/BUPT-GAMMA/GraphTeam \u83b7\u53d6\u3002**|\n", "2410.18012": "|**2024-10-25**|**MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting**|Sungil Seok 
et.al.|[2410.18012](http://arxiv.org/abs/2410.18012)|null|\u7f8e\u56fd\u8054\u90a6\u57fa\u91d1\u5229\u7387\u5728\u56fd\u5185\u5916\u91d1\u878d\u5e02\u573a\u4e2d\u626e\u6f14\u7740\u91cd\u8981\u89d2\u8272\u3002\u7136\u800c\uff0c\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8be5\u5229\u7387\u8c03\u6574\u7684\u5f71\u54cd\u4e0a\uff0c\u800c\u975e\u51b3\u7b56\u8fc7\u7a0b\u672c\u8eab\u3002\u6700\u8fd1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u4e3a\u91cd\u5efa\u539f\u59cb\u7684\u8054\u90a6\u516c\u5f00\u5e02\u573a\u59d4\u5458\u4f1a\uff08FOMC\uff09\u4f1a\u8bae\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u8fd9\u4e9b\u4f1a\u8bae\u8d1f\u8d23\u8bbe\u5b9a\u8054\u90a6\u57fa\u91d1\u5229\u7387\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4e94\u9636\u6bb5\u7684FOMC\u4f1a\u8bae\u6a21\u62df\u6846\u67b6MiniFed\uff0c\u8be5\u6846\u67b6\u4f7f\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684FOMC\u4f1a\u8bae\u6210\u5458\uff0c\u5e76\u4f18\u5316FOMC\u7ed3\u6784\u3002\u8fd9\u4e00\u6846\u67b6\u6709\u6548\u5730\u91cd\u65b0\u6fc0\u6d3b\u4e86FOMC\u4f1a\u8bae\u6d41\u7a0b\uff0c\u5e76\u4fc3\u8fdb\u4e86\u5bf9\u8054\u90a6\u57fa\u91d1\u5229\u7387\u7684\u9884\u6d4b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u63d0\u51fa\u7684MiniFed\u6846\u67b6\u5728\u8054\u90a6\u57fa\u91d1\u5229\u7387\u9884\u6d4b\u65b9\u9762\u8fbe\u5230\u4e86\u9ad8\u51c6\u786e\u5ea6\uff0c\u5e76\u4e14\u4ee3\u7406\u7684\u884c\u4e3a\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u5bf9\u5e94\u8005\u4fdd\u6301\u4e00\u81f4\u3002\u9274\u4e8e\u76ee\u524d\u5f88\u5c11\u6709\u7814\u7a76\u5229\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u5927\u89c4\u6a21\u7684\u73b0\u5b9e\u4e16\u754c\u4f1a\u8bae\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u53ef\u4ee5\u4f5c\u4e3a\u672a\u6765\u53d1\u5c55\u7684\u57fa\u51c6\u3002|\n", "2410.18792": "|**2024-10-25**|**An LLM Agent for Automatic Geospatial Data Analysis**|Yuxing Chen 
et.al.|[2410.18792](http://arxiv.org/abs/2410.18792)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u636e\u79d1\u5b66\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u590d\u6742\u987a\u5e8f\u4efb\u52a1\u65f6\u5e38\u5e38\u9047\u5230\u903b\u8f91\u9519\u8bef\u7684\u95ee\u9898\u3002\u7279\u522b\u662f\u5728\u5904\u7406\u5730\u7406\u7a7a\u95f4\u6570\u636e\u65f6\uff0c\u8fd9\u4e9b\u6a21\u578b\u9762\u4e34\u7740\u6574\u5408\u590d\u6742\u6570\u636e\u7ed3\u6784\u548c\u7a7a\u95f4\u7ea6\u675f\u3001\u6709\u6548\u5229\u7528\u5404\u79cd\u51fd\u6570\u8c03\u7528\u4ee5\u53ca\u8f83\u5c11\u4f7f\u7528\u7684\u5730\u7406\u7a7a\u95f4\u5e93\u65b9\u9762\u5bb9\u6613\u4ea7\u751f\u5e7b\u89c9\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86GeoAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u4ea4\u4e92\u6846\u67b6\uff0c\u65e8\u5728\u5e2e\u52a9LLMs\u66f4\u6709\u6548\u5730\u5904\u7406\u5730\u7406\u7a7a\u95f4\u6570\u636e\u5904\u7406\u4efb\u52a1\u3002GeoAgent\u9996\u521b\u6027\u5730\u5c06\u4ee3\u7801\u89e3\u91ca\u5668\u3001\u9759\u6001\u5206\u6790\u548c\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u6280\u672f\u4e0e\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08MCTS\uff09\u7b97\u6cd5\u76f8\u7ed3\u5408\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5730\u7406\u7a7a\u95f4\u6570\u636e\u5904\u7406\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8d21\u732e\u4e86\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684\u65b0\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u57fa\u4e8eLLMs\u7684\u65b9\u6cd5\u5728\u5730\u7406\u7a7a\u95f4\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u8be5\u57fa\u51c6\u5229\u7528\u4e86\u591a\u79cdPython\u5e93\uff0c\u5e76\u5305\u62ec\u4ece\u6570\u636e\u83b7\u53d6\u3001\u6570\u636e\u5206\u6790\u5230\u53ef\u89c6\u5316\u7684\u5355\u8f6e\u548c\u591a\u8f6e\u4efb\u52a1\u3002\u901a\u8fc7\u5728\u5404\u79cd\u5730\u7406\u7a7a\u95f4\u73af\u588
3\u4e2d\u63d0\u4f9b\u5168\u9762\u7684\u8bc4\u4f30\uff0c\u8fd9\u4e2a\u57fa\u51c6\u4e3a\u5f00\u53d1LLMs\u5728\u5730\u7406\u7a7a\u95f4\u6570\u636e\u5206\u6790\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u8bbe\u5b9a\u4e86\u65b0\u6807\u51c6\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4ec5\u4f9d\u9760LLMs\u7684\u77e5\u8bc6\u5bf9\u4e8e\u51c6\u786e\u7f16\u7a0b\u5730\u7406\u7a7a\u95f4\u4efb\u52a1\u662f\u4e0d\u591f\u7684\uff0c\u8fd9\u9700\u8981\u8fde\u8d2f\u7684\u591a\u6b65\u9aa4\u8fc7\u7a0b\u548c\u591a\u6b21\u51fd\u6570\u8c03\u7528\u3002\u4e0e\u57fa\u7ebfLLMs\u76f8\u6bd4\uff0c\u63d0\u51fa\u7684GeoAgent\u5c55\u793a\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u5728\u51fd\u6570\u8c03\u7528\u548c\u4efb\u52a1\u5b8c\u6210\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6539\u8fdb\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u7ed3\u679c\u4e3a\u672a\u6765LLMs\u4ee3\u7406\u5728\u81ea\u52a8\u5730\u7406\u7a7a\u95f4\u6570\u636e\u5206\u6790\u4efb\u52a1\u7f16\u7a0b\u7684\u53d1\u5c55\u63d0\u4f9b\u4e86\u5b9d\u8d35\u7684\u89c1\u89e3\u3002|\n", "2410.18528": "|**2024-10-24**|**PRACT: Optimizing Principled Reasoning and Acting of LLM Agent**|Zhiwei Liu et.al.|[2410.18528](http://arxiv.org/abs/2410.18528)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86Principled Reasoning and Acting (PRAct)\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u4ece\u8f68\u8ff9\u6570\u636e\u4e2d\u5b66\u4e60\u548c\u6267\u884c\u884c\u52a8\u539f\u5219\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6838\u5fc3\u662f\u4f7f\u7528\u6765\u81ea\u53cd\u601d\u548c\u4f18\u5316\u5f15\u64ce\u7684\u6587\u672c\u68af\u5ea6\u6765\u63a8\u5bfc\u8fd9\u4e9b\u884c\u52a8\u539f\u5219\u3002\u4e3a\u4e86\u4f7f\u884c\u52a8\u539f\u5219\u9002\u5e94\u7279\u5b9a\u4efb\u52a1\u8981\u6c42\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u4f18\u5316\u6846\u67b6\uff0c\u79f0\u4e3aReflective Principle Optimization 
(RPO)\u3002\u5728\u6267\u884c\u540e\uff0cRPO\u4f7f\u7528\u53cd\u601d\u5668\u6765\u6279\u8bc4\u5f53\u524d\u7684\u884c\u52a8\u539f\u5219\uff0c\u5e76\u4f7f\u7528\u4f18\u5316\u5668\u76f8\u5e94\u5730\u66f4\u65b0\u5b83\u4eec\u3002\u6211\u4eec\u5728\u4e24\u79cd\u573a\u666f\u4e0b\u5f00\u53d1\u4e86RPO\u6846\u67b6\uff1aReward-RPO\uff0c\u5b83\u4f7f\u7528\u73af\u5883\u5956\u52b1\u8fdb\u884c\u53cd\u601d\uff1b\u4ee5\u53caSelf-RPO\uff0c\u5b83\u5728\u6ca1\u6709\u5916\u90e8\u5956\u52b1\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u81ea\u6211\u53cd\u601d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4ecb\u7ecd\u4e86\u4e24\u79cdRPO\u65b9\u6cd5\uff0cRPO-Traj\u548cRPO-Batch\uff0c\u4ee5\u9002\u5e94\u4e0d\u540c\u7684\u8bbe\u7f6e\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u56db\u4e2a\u73af\u5883\u4e2d\uff0c\u5229\u7528RPO\u6846\u67b6\u7684PRAct\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5b66\u4e60\u5e76\u5e94\u7528\u884c\u52a8\u539f\u5219\u4ee5\u63d0\u9ad8\u6027\u80fd\u3002|\n", "2410.19385": "|**2024-10-25**|**Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models**|Liam Barkley 
et.al.|[2410.19385](http://arxiv.org/abs/2410.19385)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u901a\u8fc7\u5927\u91cf\u4eba\u7c7b\u53ef\u8bfb\u7684\u6587\u672c\u8bad\u7ec3\u800c\u6210\u7684\u5f3a\u5927\u8ba1\u7b97\u6a21\u578b\uff0c\u4f7f\u5b83\u4eec\u80fd\u591f\u6267\u884c\u901a\u7528\u7684\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u4efb\u52a1\u3002\u8fd9\u4e9b\u6a21\u578b\u56e0\u5176\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u800c\u5728\u884c\u4e1a\u548c\u5b66\u672f\u754c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u53d6\u5f97\u4e86\u8fd9\u4e9b\u6210\u529f\uff0cLLMs\u7ecf\u5e38\u4f1a\u4ea7\u751f\u4e0d\u51c6\u786e\u7684\u60c5\u51b5\uff0c\u901a\u5e38\u79f0\u4e3a\u5e7b\u89c9\u3002\u63d0\u793a\u5de5\u7a0b\uff0c\u5373\u8bbe\u8ba1\u548c\u5236\u5b9a\u6307\u4ee4\u4ee5\u4f7fLLMs\u6267\u884c\u7279\u5b9a\u4efb\u52a1\u7684\u8fc7\u7a0b\uff0c\u5df2\u6210\u4e3a\u51cf\u8f7b\u5e7b\u89c9\u7684\u5173\u952e\u65b9\u6cd5\u3002\u672c\u6587\u5bf9\u4e0d\u540c\u7684\u63d0\u793a\u7b56\u7565\u548c\u6846\u67b6\u8fdb\u884c\u4e86\u5168\u9762\u7684\u7ecf\u9a8c\u8bc4\u4f30\uff0c\u65e8\u5728\u51cf\u5c11LLMs\u4e2d\u7684\u5e7b\u89c9\u3002\u5404\u79cd\u63d0\u793a\u6280\u672f\u88ab\u5e94\u7528\u4e8e\u5e7f\u6cdb\u7684\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u4ee5\u8bc4\u4f30\u6bcf\u79cd\u65b9\u6cd5\u7684\u51c6\u786e\u6027\u548c\u5e7b\u89c9\u7387\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u7814\u7a76\u4e86\u5de5\u5177\u8c03\u7528\u4ee3\u7406\uff08\u5177\u6709\u5916\u90e8\u5de5\u5177\u589e\u5f3a\u5176\u80fd\u529b\u4ee5\u8d85\u8d8a\u8bed\u8a00\u751f\u6210\u7684LLMs\uff09\u5bf9\u540c\u4e00\u57fa\u51c6\u6570\u636e\u96c6\u4e2d\u5e7b\u89c9\u7387\u7684\u5f71\u54cd\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6700\u4f73\u63d0\u793a\u6280\u672f\u53d6\u51b3\u4e8e\u95ee\u9898\u7c7b\u578b\uff0c\u5e76\u4e14\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\uff0c\u7b80\u5355\u7684\u6280\u672f\u5f80\u5f80\u6bd4\u590
d\u6742\u7684\u65b9\u6cd5\u66f4\u6709\u6548\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8868\u660e\uff0c\u7531\u4e8e\u5916\u90e8\u5de5\u5177\u4f7f\u7528\u7684\u590d\u6742\u6027\u589e\u52a0\uff0cLLM\u4ee3\u7406\u53ef\u80fd\u4f1a\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u5e7b\u89c9\u7387\u3002|\n", "2410.19238": "|**2024-10-25**|**Designing LLM-Agents with Personalities: A Psychometric Approach**|Muhua Huang et.al.|[2410.19238](http://arxiv.org/abs/2410.19238)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u4f7f\u7528\u4e94\u5927\u4eba\u683c\u6846\u67b6\u4e3a\u57fa\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\uff08Agent\uff09\u5206\u914d\u53ef\u91cf\u5316\u3001\u53ef\u63a7\u4e14\u7ecf\u8fc7\u5fc3\u7406\u6d4b\u91cf\u9a8c\u8bc1\u7684\u4eba\u683c\u7279\u8d28\u3002\u7814\u7a76\u65e8\u5728\u514b\u670d\u4eba\u7c7b\u4e3b\u4f53\u7814\u7a76\u7684\u9650\u5236\uff0c\u63d0\u51fa\u4ee3\u7406\u4f5c\u4e3a\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u7684\u4e00\u79cd\u53ef\u8bbf\u95ee\u5de5\u5177\u3002\u901a\u8fc7\u56db\u9879\u7814\u7a76\uff0c\u672c\u7814\u7a76\u5c55\u793a\u4e86\u4e3a\u4ee3\u7406\u5206\u914d\u5fc3\u7406\u6d4b\u91cf\u6709\u6548\u4eba\u683c\u7279\u8d28\u7684\u53ef\u884c\u6027\uff0c\u5e76\u4f7f\u5176\u80fd\u591f\u590d\u5236\u590d\u6742\u7684\u4eba\u7c7b\u884c\u4e3a\u3002\u7b2c\u4e00\u9879\u7814\u7a76\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8bed\u4e49\u7a7a\u95f4\u4e2d\u5efa\u7acb\u4e86\u5bf9\u4eba\u683c\u7ed3\u6784\u548c\u4eba\u683c\u6d4b\u8bd5\u7684\u7406\u89e3\u3002\u968f\u540e\u7684\u4e24\u9879\u7814\u7a76\u5229\u7528\u5b9e\u8bc1\u6570\u636e\u548c\u6a21\u62df\u6570\u636e\u5c55\u793a\u4e86\u521b\u5efa\u4ee3\u7406\u7684\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u663e\u793a\u4eba\u7c7b\u548c\u4ee3\u7406\u5728\u4eba\u683c\u6d4b\u8bd5\u4e2d\u7684\u7b54\u6848\u9ad8\u5ea6\u5bf9\u5e94\u6765\u9a8c\u8bc1\u7ed3\u679c\u3002\u6700\u540e\u4e00\u9879\u7814\u7a76\u8fdb\u4e00\u6b65\u901a\u8fc7\u4ee3\u7406\u5728\u6d89\u53ca\u98ce\u9669\u627f\u6
2c5\u548c\u9053\u5fb7\u56f0\u5883\u7684\u60c5\u5883\u4e0b\u590d\u5236\u5df2\u77e5\u7684\u4eba\u7c7b\u4eba\u683c\u7279\u8d28\u4e0e\u51b3\u7b56\u884c\u4e3a\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u4eba\u683c\u5fc3\u7406\u6d4b\u91cf\u65b9\u6cd5\u8bbe\u8ba1\u4ee3\u7406\u7684\u6709\u6548\u6027\u53ca\u5176\u5728\u793e\u4f1a\u548c\u884c\u4e3a\u7814\u7a76\u4e2d\u7684\u9002\u7528\u6027\u3002|\n", "2410.21071": "|**2024-10-28**|**Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks**|Eitan Farchi et.al.|[2410.21071](http://arxiv.org/abs/2410.21071)|null|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ef\u4ee5\u7528\u4e8e\u591a\u79cd\u4e0e\u4ee3\u7801\u76f8\u5173\u7684\u4efb\u52a1\uff0c\u5982\u4ece\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u7ffb\u8bd1\u5230\u53e6\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u3001\u5b9e\u73b0\u81ea\u7136\u8bed\u8a00\u9700\u6c42\u548c\u4ee3\u7801\u603b\u7ed3\u3002\u6700\u5148\u8fdb\u7684\u5927\u8bed\u8a00\u6a21\u578b\u6280\u672f\u751f\u6210\u7684\u5de5\u4ef6\u9884\u8ba1\u5728\u7528\u6237\u53ea\u9700\u8fdb\u884c\u5c11\u91cf\u7b80\u5355\u4fee\u6539\u540e\u5c31\u53ef\u4ee5\u4f7f\u7528\u3002\u7136\u800c\uff0c\u91cf\u5316\u8fd9\u4e00\u6a21\u7cca\u6982\u5ff5\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u6b64\u5f88\u96be\u786e\u5b9a\u4e0e\u4ee3\u7801\u76f8\u5173\u7684LLM\u89e3\u51b3\u65b9\u6848\u7684\u8d28\u91cf\u3002\u6211\u4eec\u79f0\u5229\u7528LLM\u8fdb\u884c\u5224\u65ad\u7684\u8bc4\u4f30\u65b9\u6cd5\u4e3a\u201cLLM\u4f5c\u4e3a\u88c1\u5224\u201d\uff0c\u7b80\u79f0LaaJ\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u751f\u6210\u548c\u8bc4\u4f30LaaJ\u5b9e\u73b0\u7684\u65b9\u6cd5\u8bba\uff0c\u5e76\u5229\u7528\u4e00\u4e2a\u81ea\u52a8\u751f\u6210\u7684\u57fa\u51c6\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u8be5\u57fa\u51c6\u6709\u4e24\u4e2a\u76ee\u7684\uff0c\u5373\u7528\u4e8e\u5f00\u53d1\u548c\u9a8c\u8bc1LaaJs\uff0c\u4ee5\u53ca\u901a\u8fc7LaaJs\u6765\u9a8c\u8bc1\u548c\u
6d4b\u8bd5\u4e0e\u4ee3\u7801\u76f8\u5173\u7684LLM\u89e3\u51b3\u65b9\u6848\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u57fa\u51c6\u751f\u6210\u5f15\u64ce\uff0c\u5b83\u4e3a\u591a\u4e2a\u4ee3\u7801\u76f8\u5173\u4efb\u52a1\u751f\u6210\u591a\u79cd\u7f16\u7a0b\u8bed\u8a00\u7684\u4ee3\u7801\uff0c\u5e76\u5c06\u5176\u4f5c\u4e3aLaaJ\u8bc4\u4f30\u7684\u8f93\u5165\u3002\u6211\u4eec\u5229\u7528\u4e00\u4e2a\u56fe\u8868\u793aG\u6765\u8868\u793a\u6f5c\u5728\u7684\u4ee3\u7801\u76f8\u5173\u751f\u6210\u3002\u56fe\u7684\u9876\u70b9\u662f\u751f\u6210\u7684\u5de5\u4ef6\uff0c\u8fb9\u4ee3\u8868\u53ef\u80fd\u7684\u751f\u6210\uff0c\u4f8b\u5982\uff0c\u4ece\u81ea\u7136\u8bed\u8a00\u9700\u6c42\u751f\u6210Java\u7a0b\u5e8f\u3002\u5229\u7528\u4e00\u7cfb\u5217LLM\u4ee3\u7406\u548cG\uff0c\u6211\u4eec\u751f\u6210\u4e86\u4e0e\u4ee3\u7801\u76f8\u5173\u7684\u5de5\u4ef6\u3002\u901a\u8fc7\u5229\u7528\u56feG\u4e2d\u7684\u5faa\u73af\uff0c\u6211\u4eec\u5236\u5b9a\u4e86\u5bf9\u751f\u6210\u5de5\u4ef6\u7684\u671f\u671b\u3002\u5229\u7528\u8fd9\u4e9b\u5236\u5b9a\u7684\u671f\u671b\uff0c\u6211\u4eec\u53ef\u4ee5\u5f00\u53d1\u548c\u6d4b\u8bd5\u53ef\u9760\u7684LLM\u5224\u65ad\uff0c\u4ee5\u8bc4\u4f30\u751f\u6210\u5de5\u4ef6\u7684\u6709\u7528\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u521b\u5efa\u9ad8\u8d28\u91cf\u7684\u4ee3\u7801\u4efb\u52a1\u89e3\u51b3\u65b9\u6848\u3002|\n", "2410.20666": "|**2024-10-28**|**Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments**|Sangmim Song 
et.al.|[2410.20666](http://arxiv.org/abs/2410.20666)|null|\u5bfc\u822a\u5bf9\u4e8e\u89c6\u529b\u969c\u788d\u4eba\u58eb\uff08PVI\uff09\u6765\u8bf4\u4e00\u76f4\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u867d\u7136\u4f20\u7edf\u7684\u8f85\u52a9\u5de5\u5177\u5982\u767d\u8272\u624b\u6756\u548c\u5bfc\u76f2\u72ac\u975e\u5e38\u5b9d\u8d35\uff0c\u4f46\u5b83\u4eec\u5728\u63d0\u4f9b\u8be6\u7ec6\u7684\u73af\u5883\u4fe1\u606f\u548c\u7cbe\u786e\u5f15\u5bfc\u81f3\u76ee\u7684\u5730\u65b9\u9762\u4ecd\u663e\u4e0d\u8db3\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u7684\u53d1\u5c55\u4e3a\u589e\u5f3a\u8f85\u52a9\u5bfc\u822a\u63d0\u4f9b\u4e86\u65b0\u7684\u9014\u5f84\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aGuide-LLM\u7684\u5177\u8eab\u5316LLM\u57fa\u4ee3\u7406\uff0c\u65e8\u5728\u5e2e\u52a9\u89c6\u529b\u969c\u788d\u4eba\u58eb\u5728\u5927\u578b\u5ba4\u5185\u73af\u5883\u4e2d\u5bfc\u822a\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u6587\u672c\u7684\u62d3\u6251\u56fe\uff0c\u4f7fLLM\u80fd\u591f\u4f7f\u7528\u7b80\u5316\u7684\u73af\u5883\u8868\u793a\u6765\u89c4\u5212\u5168\u5c40\u8def\u5f84\uff0c\u91cd\u70b9\u5173\u6ce8\u76f4\u7ebf\u8def\u5f84\u548c\u76f4\u89d2\u8f6c\u5f2f\uff0c\u4ee5\u4fc3\u8fdb\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4e86LLM\u7684\u5e38\u8bc6\u63a8\u7406\u6765\u8fdb\u884c\u5371\u9669\u68c0\u6d4b\uff0c\u5e76\u6839\u636e\u7528\u6237\u504f\u597d\u8fdb\u884c\u4e2a\u6027\u5316\u8def\u5f84\u89c4\u5212\u3002\u6a21\u62df\u5b9e\u9a8c\u8868\u660e\u8be5\u7cfb\u7edf\u5728\u5f15\u5bfc\u89c6\u529b\u969c\u788d\u4eba\u58eb\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u7a81\u663e\u4e86\u5176\u4f5c\u4e3a\u8f85\u52a9\u6280\u672f\u663e\u8457\u8fdb\u6b65\u7684\u6f5c\u529b\u3002\u7ed3\u679c\u8868\u660e\uff0cGuide-LLM\u80fd\u591f\u63d0\u4f9b\u9ad8\u6548\u3001\u9002\u5e94\u6027\u5f3a\u4
e14\u4e2a\u6027\u5316\u7684\u5bfc\u822a\u8f85\u52a9\uff0c\u9884\u793a\u7740\u8be5\u9886\u57df\u7684\u6709\u524d\u666f\u7684\u53d1\u5c55\u3002|\n", "2410.20445": "|**2024-10-27**|**TrajAgent: An Agent Framework for Unified Trajectory Modelling**|Yuwei Du et.al.|[2410.20445](http://arxiv.org/abs/2410.20445)|**[link](https://github.com/tsinghua-fib-lab/trajagent)**|**\u8f68\u8ff9\u5efa\u6a21\uff0c\u5305\u62ec\u8f68\u8ff9\u6570\u636e\u6a21\u5f0f\u6316\u6398\u548c\u672a\u6765\u9884\u6d4b\u7684\u7814\u7a76\uff0c\u5728\u751f\u6d3b\u670d\u52a1\u3001\u57ce\u5e02\u4ea4\u901a\u548c\u516c\u5171\u7ba1\u7406\u7b49\u9886\u57df\u6709\u7740\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u9488\u5bf9\u7279\u5b9a\u95ee\u9898\uff0c\u5df2\u7ecf\u63d0\u51fa\u4e86\u8bb8\u591a\u65b9\u6cd5\u6765\u89e3\u51b3\u8f68\u8ff9\u5efa\u6a21\u4e2d\u7684\u5404\u79cd\u95ee\u9898\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6570\u636e\u7684\u5f02\u8d28\u6027\u548c\u4efb\u52a1\u7684\u591a\u6837\u6027\uff0c\u5b9e\u73b0\u7edf\u4e00\u7684\u8f68\u8ff9\u5efa\u6a21\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u8981\u7684\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u6846\u67b6TrajAgent\uff0c\u4ee5\u7edf\u4e00\u5404\u79cd\u8f68\u8ff9\u5efa\u6a21\u4efb\u52a1\u3002\u5728TrajAgent\u4e2d\uff0c\u6211\u4eec\u9996\u5148\u5f00\u53d1\u4e86UniEnv\uff0c\u8fd9\u662f\u4e00\u4e2a\u5177\u6709\u7edf\u4e00\u6570\u636e\u548c\u6a21\u578b\u63a5\u53e3\u7684\u6267\u884c\u73af\u5883\uff0c\u652f\u6301\u5404\u79cd\u6a21\u578b\u7684\u6267\u884c\u548c\u8bad\u7ec3\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86TAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9\u5404\u79cd\u8f68\u8ff9\u4efb\u52a1\u81ea\u52a8\u8fdb\u884c\u8f68\u8ff9\u5efa\u6a21\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5728TAgent\u4e2d\u8bbe\u8ba1\u4e86AutOpt\uff0c\u4e00\u4e2a\u7cfb\u7edf\u6027\u7684\u4f18\u5316\u6a21\u5757\
uff0c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u4e86\u96c6\u6210\u6a21\u578b\u7684\u6027\u80fd\u3002\u901a\u8fc7\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u7684\u4e0d\u540c\u8f68\u8ff9\u4efb\u52a1\uff0cTrajAgent\u80fd\u591f\u901a\u8fc7\u8bad\u7ec3\u548c\u6267\u884c\u9002\u5f53\u7684\u6a21\u578b\u81ea\u52a8\u751f\u6210\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\u3002\u5728\u56db\u4e2a\u771f\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u7684\u56db\u4e2a\u4efb\u52a1\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cTrajAgent\u5728\u7edf\u4e00\u8f68\u8ff9\u5efa\u6a21\u65b9\u9762\u662f\u6709\u6548\u7684\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u5e73\u5747\u6027\u80fd\u63d0\u9ad8\u4e8615.43%\u3002**|\n", "2410.20007": "|**2024-10-25**|**Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models**|Danqing Wang et.al.|[2410.20007](http://arxiv.org/abs/2410.20007)|null|\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\u5bf9\u4e8e\u4f7f\u5176\u80fd\u591f\u89e3\u51b3\u590d\u6742\u7684\u591a\u6b65\u95ee\u9898\u81f3\u5173\u91cd\u8981\u3002\u591a\u667a\u80fd\u4f53\u6846\u67b6\u5728\u589e\u5f3aLLMs\u7684\u63a8\u7406\u80fd\u529b\u65b9\u9762\u663e\u793a\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0cLLM\u667a\u80fd\u4f53\u4e4b\u95f4\u7f3a\u4e4f\u6709\u6548\u7684\u5408\u4f5c\u9650\u5236\u4e86\u5b83\u4eec\u7684\u8868\u73b0\uff0c\u7279\u522b\u662f\u5728\u591a\u6b65\u63a8\u7406\u4efb\u52a1\u4e2d\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5408\u4f5c\u591a\u667a\u80fd\u4f53\u63a8\u7406\u6846\u67b6\uff08CoPlanner\uff09\uff0c\u901a\u8fc7\u5206\u79bb\u63a8\u7406\u6b65\u9aa4\u5e76\u5c06\u4e0d\u540c\u7684\u4efb\u52a1\u5206\u914d\u7ed9\u4e0d\u540c\u7684\u667a\u80fd\u4f53\u6765\u5b9e\u73b0\u3002CoPlanner\u7531\u4e24\u4e2aLLM\u667a\u80fd\u4f53\u7ec4\u6210\uff1a\u89c4\u5212\u667a\u80fd\u4f53\u548c\u63a8\u7406\u667a\u80fd\u4f53\u3002\u89c4\u5212\u667a\u80fd\u4f53\u63d0\u4f9b\u9ad8\u5c42\u6b21\u7
684\u6218\u7565\u63d0\u793a\uff0c\u800c\u63a8\u7406\u667a\u80fd\u4f53\u5219\u9075\u5faa\u8fd9\u4e9b\u63d0\u793a\u5e76\u63a8\u5bfc\u51fa\u7b54\u6848\u3002\u901a\u8fc7\u901a\u8fc7\u8fd1\u7aef\u7b56\u7565\u4f18\u5316\uff08PPO\uff09\u8bad\u7ec3\u89c4\u5212\u667a\u80fd\u4f53\u7684\u7b56\u7565\uff0c\u57fa\u4e8eLLaMA-3-8B\u7684CoPlanner\u5728LogiQA\u4e0a\u6bd4\u4e4b\u524d\u6700\u597d\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e869.94%\uff0c\u5728BBH\u4e0a\u63d0\u9ad8\u4e863.09%\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u89c4\u5212\u667a\u80fd\u4f53\u7684\u6307\u5bfc\u4ee5\u53ca\u667a\u80fd\u4f53\u4e4b\u95f4\u7684\u6709\u6548\u5408\u4f5c\u5bf9CoPlanner\u5728\u89e3\u51b3\u591a\u6b65\u63a8\u7406\u95ee\u9898\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u80fd\u8d77\u5230\u4e86\u91cd\u8981\u4f5c\u7528\u3002|\n", "2410.19920": "|**2024-10-29**|**Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting**|Mohamed Salim Aissi et.al.|[2410.19920](http://arxiv.org/abs/2410.19920)|null|\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u77e5\u8bc6\u5e94\u7528\u4e8e\u987a\u5e8f\u51b3\u7b56\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5f88\u5c11\u6709\u7814\u7a76\u6df1\u5165\u63a2\u8ba8\u5728\u7279\u5b9a\u73af\u5883\u4e2d\u4f7f\u7528RL\u5fae\u8c03\u8fd9\u4e9b\u6a21\u578b\u5bf9\u5176\u80fd\u529b\u7684\u5f71\u54cd\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u5206\u6790\u5728\u6587\u672c\u73af\u5883\u4e2d\u8fdb\u884cRL\u8bad\u7ec3\u540e\uff0cLLM\u4ee3\u7406\u5bf9\u63d0\u793a\u683c\u5f0f\u7684\u654f\u611f\u6027\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5f53\u9762\u5bf9\u4e0eRL\u8bad\u7ec3\u9636\u6bb5\u6240\u4f7f\u7528\u7684\u4e0d\u540c\u7684\u63d0\u793a\u683c\u5f0f\u65f6\uff0cLLM\u7684\u6027\u80fd\u4f1a\u4e0b\u964d\u3002\u6b64
\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u68c0\u67e5\u6a21\u578b\u7684\u5185\u90e8\u8868\u793a\u548c\u663e\u8457\u6807\u8bb0\u6765\u5206\u6790\u8fd9\u79cd\u654f\u611f\u6027\u7684\u6765\u6e90\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4f7f\u7528\u5bf9\u6bd4\u635f\u5931\u6765\u51cf\u8f7b\u8fd9\u79cd\u654f\u611f\u6027\uff0c\u5e76\u63d0\u9ad8LLM\u7684\u9c81\u68d2\u6027\u548c\u6cdb\u5316\u80fd\u529b\u3002|\n", "2410.21909": "|**2024-10-29**|**SceneGenAgent: Precise Industrial Scene Generation with Coding Agent**|Xiao Xia et.al.|[2410.21909](http://arxiv.org/abs/2410.21909)|**[link](https://github.com/thudm/scenegenagent)**|**\u5de5\u4e1a\u573a\u666f\u7684\u5efa\u6a21\u5bf9\u4e8e\u5de5\u4e1a\u5236\u9020\u4e2d\u7684\u6a21\u62df\u81f3\u5173\u91cd\u8981\u3002\u867d\u7136\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ece\u6587\u672c\u63cf\u8ff0\u751f\u6210\u901a\u75283D\u573a\u666f\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u4f7f\u7528LLMs\u751f\u6210\u5de5\u4e1a\u573a\u666f\u9762\u4e34\u7740\u72ec\u7279\u7684\u6311\u6218\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u573a\u666f\u9700\u8981\u7cbe\u786e\u7684\u5c3a\u5bf8\u548c\u5b9a\u4f4d\uff0c\u8981\u6c42\u5728\u7a7a\u95f4\u5e03\u5c40\u4e0a\u8fdb\u884c\u590d\u6742\u7684\u89c4\u5212\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86SceneGenAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u901a\u8fc7C#\u4ee3\u7801\u751f\u6210\u5de5\u4e1a\u573a\u666f\u3002SceneGenAgent\u901a\u8fc7\u7ed3\u6784\u5316\u548c\u53ef\u8ba1\u7b97\u7684\u683c\u5f0f\u3001\u5e03\u5c40\u9a8c\u8bc1\u548c\u8fed\u4ee3\u4f18\u5316\u6765\u786e\u4fdd\u7cbe\u786e\u7684\u5e03\u5c40\u89c4\u5212\uff0c\u4ee5\u6ee1\u8db3\u5de5\u4e1a\u573a\u666f\u7684\u5b9a\u91cf\u9700\u6c42\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u7531SceneGenAgent\u9a71\u52a8\u7684LLMs\u8d85\u8fc7\u4e86\u5b83\u4eec\u539f\u6709\u7684\u6027\u80fd\uff0c\u5728\u73b0\u5b9e\u4e16\u754c\u7684\u5de5\u4e
1a\u573a\u666f\u751f\u6210\u4efb\u52a1\u4e2d\u8fbe\u5230\u4e86\u9ad8\u8fbe81.0%\u7684\u6210\u529f\u7387\uff0c\u5e76\u6709\u6548\u6ee1\u8db3\u4e86\u5927\u591a\u6570\u573a\u666f\u751f\u6210\u9700\u6c42\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u589e\u5f3a\u5176\u6613\u7528\u6027\uff0c\u6211\u4eec\u6784\u5efa\u4e86SceneInstruct\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5fae\u8c03\u5f00\u6e90LLMs\u4ee5\u96c6\u6210\u5230SceneGenAgent\u4e2d\u3002\u5b9e\u9a8c\u663e\u793a\uff0c\u5bf9\u5f00\u6e90LLMs\u5728SceneInstruct\u4e0a\u7684\u5fae\u8c03\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u6027\u80fd\uff0cLlama3.1-70B\u63a5\u8fd1GPT-4o\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728https://github.com/THUDM/SceneGenAgent \u83b7\u53d6\u3002**|\n", "2410.21359": "|**2024-10-28**|**Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games**|Ji Ma et.al.|[2410.21359](http://arxiv.org/abs/2410.21359)|null|\u968f\u7740\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u8d8a\u6765\u8d8a\u591a\u5730\u627f\u62c5\u73b0\u5b9e\u4e16\u754c\u4efb\u52a1\u5e76\u4e0e\u4eba\u7c7b\u793e\u4f1a\u4e92\u52a8\uff0c\u6211\u4eec\u5bf9\u5b83\u4eec\u7684\u884c\u4e3a\u4e86\u89e3\u591a\u5c11\uff1f\u672c\u7814\u7a76\uff081\uff09\u8c03\u67e5\u4e86\u4e0d\u540c\u4eba\u683c\u8bbe\u5b9a\u5982\u4f55\u8bf1\u5bfcLLM\u4ee3\u7406\u8868\u73b0\u51fa\u4eb2\u793e\u4f1a\u884c\u4e3a\u2014\u2014\u8fd9\u4e00\u57fa\u672c\u7684\u793e\u4f1a\u89c4\u8303\uff0c\u5e76\u5c06\u5176\u4e0e\u4eba\u7c7b\u884c\u4e3a\u8fdb\u884c\u57fa\u51c6\u6bd4\u8f83\uff1b\uff082\uff09\u5f15\u5165\u4e86\u4e00\u79cd\u884c\u4e3a\u65b9\u6cd5\u6765\u8bc4\u4f30LLM\u4ee3\u7406\u5728\u590d\u6742\u51b3\u7b56\u573a\u666f\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u63a2\u7d22\u4e86\u4e0d\u540c\u4eba\u683c\u8bbe\u5b9a\u548c\u5b9e\u9a8c\u6846\u67b6\u5982\u4f55\u5f71\u54cd\u8fd9\u4e9bAI\u4ee3\u7406\u5728\u72ec\u88c1\u8005\u535a\u5f08\u4e2d\u7684\u5229\u4ed6\u884c\u4e3a\uff0c\u5e76\u6bd4\u8f83\
u4e86\u540c\u4e00LLM\u5bb6\u65cf\u5185\u3001\u4e0d\u540cLLM\u5bb6\u65cf\u4e4b\u95f4\u4ee5\u53ca\u4e0e\u4eba\u7c7b\u884c\u4e3a\u4e4b\u95f4\u7684\u5dee\u5f02\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u7684\u5dee\u5f02\u548c\u4e0d\u4e00\u81f4\uff0c\u5e76\u4e14\u4e0e\u4eba\u7c7b\u884c\u4e3a\u76f8\u6bd4\u4e5f\u6709\u660e\u663e\u7684\u533a\u522b\u3002\u4ec5\u4ec5\u8d4b\u4e88LLM\u4e00\u4e2a\u4eba\u7c7b\u822c\u7684\u8eab\u4efd\u5e76\u4e0d\u80fd\u4ea7\u751f\u7c7b\u4f3c\u4eba\u7c7b\u7684\u884c\u4e3a\u3002\u5c3d\u7ba1\u8fd9\u4e9bAI\u4ee3\u7406\u662f\u5728\u5927\u91cf\u7531\u4eba\u7c7b\u751f\u6210\u7684\u6570\u636e\u4e0a\u8bad\u7ec3\u7684\uff0c\u4f46\u5b83\u4eec\u65e0\u6cd5\u51c6\u786e\u9884\u6d4b\u4eba\u7c7b\u7684\u51b3\u7b56\u3002LLM\u4ee3\u7406\u65e0\u6cd5\u6355\u6349\u4eba\u7c7b\u51b3\u7b56\u7684\u5185\u90e8\u8fc7\u7a0b\uff0c\u5176\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u4e00\u81f4\u6027\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u7279\u5b9a\u7684\u6a21\u578b\u67b6\u6784\u548c\u63d0\u793a\u683c\u5f0f\uff1b\u66f4\u7cdf\u7cd5\u7684\u662f\uff0c\u8fd9\u79cd\u4f9d\u8d56\u5e76\u6ca1\u6709\u9075\u5faa\u6e05\u6670\u7684\u6a21\u5f0f\u3002|\n", "2410.23252": "|**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu 
et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6269\u5c55\u5230\u6267\u884c\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\u4e2d\u7684\u4ee3\u7406\u4efb\u52a1\uff0c\u8d85\u8d8a\u4f20\u7edfNLP\u4efb\u52a1\uff0c\u8bc4\u4f30\u5176\u9c81\u68d2\u6027\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u5ffd\u7565\u4e86\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7b49\u5173\u952e\u7ef4\u5ea6\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86CASA\uff0c\u8fd9\u662f\u4e00\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u4ee3\u7406\u5728\u4e24\u4e2a\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\uff08\u5728\u7ebf\u8d2d\u7269\u548c\u793e\u4ea4\u8ba8\u8bba\u8bba\u575b\uff09\u4e2d\u5bf9\u6587\u5316\u548c\u793e\u4f1a\u89c4\u8303\u7684\u654f\u611f\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bc4\u4f30\u4e86LLM\u4ee3\u7406\u68c0\u6d4b\u5e76\u9002\u5f53\u56de\u5e94\u8fdd\u53cd\u89c4\u8303\u7684\u7528\u6237\u67e5\u8be2\u548c\u89c2\u5bdf\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u6d4b\u91cf\u610f\u8bc6\u8986\u76d6\u7387\u3001\u7ba1\u7406\u7528\u6237\u67e5\u8be2\u7684\u5b9e\u7528\u6027\u4ee5\u53ca\u9762\u5bf9\u8bef\u5bfc\u6027\u7f51\u7edc\u5185\u5bb9\u65f6\u7684\u8fdd\u89c4\u7387\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u524d\u7684LLMs\u5728\u975e\u4ee3\u7406\u73af\u5883\u4e2d\u7684\u8868\u73b0\u660e\u663e\u4f18\u4e8e\u57fa\u4e8e\u7f51\u7edc\u7684\u4ee3\u7406\u73af\u5883\uff0c\u4ee3\u7406\u7684\u610f\u8bc6\u8986\u76d6\u7387\u4f4e\u4e8e10%\uff0c\u8fdd\u89c4\u7387\u8d85\u8fc740%\u3002\u4e3a\u4e86\u63d0\u9ad8\u6027\u80fd\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u63d0\u793a\u548c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u4e92\u8865\u2014\u2014\u9488\u5bf9\
u7279\u5b9a\u6587\u5316\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u663e\u8457\u63d0\u9ad8\u4e86\u4ee3\u7406\u5728\u4e0d\u540c\u5730\u533a\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u800c\u63d0\u793a\u5219\u63d0\u5347\u4e86\u4ee3\u7406\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u80fd\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5728\u5f00\u53d1\u5468\u671f\u4e2d\u4e0d\u65ad\u57fa\u51c6\u6d4b\u8bd5LLM\u4ee3\u7406\u7684\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7684\u91cd\u8981\u6027\u3002|\n", "2410.22916": "|**2024-10-30**|**Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration**|Yanchu Guan et.al.|[2410.22916](http://arxiv.org/abs/2410.22916)|null|\u81ea\u4e3b\u79fb\u52a8\u5e94\u7528\u4ea4\u4e92\u5728\u79fb\u52a8\u5e94\u7528\u590d\u6742\u6027\u4e0d\u65ad\u589e\u52a0\u7684\u80cc\u666f\u4e0b\u53d8\u5f97\u65e5\u76ca\u91cd\u8981\u3002\u5f00\u53d1\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u4e0e\u79fb\u52a8\u5e94\u7528\u4ea4\u4e92\u7684\u667a\u80fd\u4ee3\u7406\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u89e3\u91ca\u7684\u884c\u4e3a\u514b\u9686\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\uff08EBC-LLMAgent\uff09\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u884c\u4e3a\u514b\u9686\u901a\u8fc7\u5b66\u4e60\u6f14\u793a\u6765\u521b\u5efa\u7528\u4e8e\u81ea\u4e3b\u79fb\u52a8\u5e94\u7528\u4ea4\u4e92\u7684\u667a\u80fd\u4e14\u53ef\u89e3\u91ca\u7684\u4ee3\u7406\u3002EBC-LLMAgent 
\u5305\u542b\u4e09\u4e2a\u6838\u5fc3\u6a21\u5757\uff1a\u6f14\u793a\u7f16\u7801\u3001\u4ee3\u7801\u751f\u6210\u548cUI\u6620\u5c04\uff0c\u8fd9\u4e9b\u6a21\u5757\u534f\u540c\u5de5\u4f5c\u4ee5\u6355\u6349\u7528\u6237\u6f14\u793a\u3001\u751f\u6210\u53ef\u6267\u884c\u4ee3\u7801\uff0c\u5e76\u5efa\u7acb\u4ee3\u7801\u4e0eUI\u5143\u7d20\u4e4b\u95f4\u7684\u51c6\u786e\u5bf9\u5e94\u5173\u7cfb\u3002\u6211\u4eec\u5f15\u5165\u4e86\u884c\u4e3a\u514b\u9686\u94fe\u878d\u5408\u6280\u672f\u6765\u589e\u5f3a\u4ee3\u7406\u7684\u6cdb\u5316\u80fd\u529b\u3002\u5bf9\u4e94\u4e2a\u6765\u81ea\u4e0d\u540c\u9886\u57df\u7684\u6d41\u884c\u79fb\u52a8\u5e94\u7528\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cEBC-LLMAgent \u7684\u8868\u73b0\u4f18\u8d8a\uff0c\u5728\u4efb\u52a1\u5b8c\u6210\u65b9\u9762\u5177\u6709\u9ad8\u6210\u529f\u7387\uff0c\u5728\u672a\u89c1\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u9ad8\u6548\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5e76\u80fd\u751f\u6210\u6709\u610f\u4e49\u7684\u89e3\u91ca\u3002|\n", "2410.22662": "|**2024-10-30**|**$\\textbf{EMOS}$: $\\textbf{E}$mbodiment-aware Heterogeneous $\\textbf{M}$ulti-robot $\\textbf{O}$perating $\\textbf{S}$ystem with LLM Agents**|Junting Chen et.al.|[2410.22662](http://arxiv.org/abs/2410.22662)|null|\u5f02\u6784\u591a\u673a\u5668\u4eba\u7cfb\u7edf\uff08HMRS\uff09\u5df2\u6210\u4e3a\u89e3\u51b3\u5355\u4e2a\u673a\u5668\u4eba\u65e0\u6cd5\u5355\u72ec\u5904\u7406\u7684\u590d\u6742\u4efb\u52a1\u7684\u5f3a\u5927\u65b9\u6cd5\u3002\u5f53\u524d\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\uff08LLM-based 
MAS\uff09\u5728\u8f6f\u4ef6\u5f00\u53d1\u548c\u64cd\u4f5c\u7cfb\u7edf\u7b49\u9886\u57df\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5c06\u5176\u5e94\u7528\u4e8e\u673a\u5668\u4eba\u63a7\u5236\u5219\u5448\u73b0\u51fa\u72ec\u7279\u7684\u6311\u6218\u3002\u7279\u522b\u662f\uff0c\u591a\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u6bcf\u4e2a\u4ee3\u7406\u7684\u80fd\u529b\u672c\u8d28\u4e0a\u4e0e\u673a\u5668\u4eba\u7684\u7269\u7406\u7ec4\u6210\u76f8\u5173\uff0c\u800c\u975e\u9884\u5b9a\u4e49\u7684\u89d2\u8272\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u667a\u80fd\u4f53\u6846\u67b6\uff0c\u65e8\u5728\u4fc3\u8fdb\u5177\u6709\u4e0d\u540c\u5f62\u6001\u548c\u80fd\u529b\u7684\u5f02\u6784\u673a\u5668\u4eba\u4e4b\u95f4\u7684\u6709\u6548\u534f\u4f5c\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5Habitat-MAS\u3002\u6211\u4eec\u8bbe\u8ba1\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\u4e4b\u4e00\u662f\u201c\u673a\u5668\u4eba\u7b80\u5386\u201d\uff1a\u4e0d\u540c\u4e8e\u91c7\u7528\u4eba\u4e3a\u8bbe\u8ba1\u7684\u89d2\u8272\u626e\u6f14\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u63d0\u793a\u7684\u65b9\u6cd5\uff0c\u5176\u4e2d\u4ee3\u7406\u901a\u8fc7\u7406\u89e3\u673a\u5668\u4eba\u7684URDF\u6587\u4ef6\u5e76\u8c03\u7528\u673a\u5668\u4eba\u8fd0\u52a8\u5b66\u5de5\u5177\u6765\u751f\u6210\u63cf\u8ff0\u5176\u7269\u7406\u80fd\u529b\u7684\u4fe1\u606f\uff0c\u4ee5\u6307\u5bfc\u4efb\u52a1\u89c4\u5212\u548c\u884c\u52a8\u6267\u884c\u3002Habitat-MAS\u57fa\u51c6\u6d4b\u8bd5\u65e8\u5728\u8bc4\u4f30\u591a\u667a\u80fd\u4f53\u6846\u67b6\u5728\u9700\u8981\u5f62\u6001\u611f\u77e5\u63a8\u7406\u7684\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5305\u62ec1\uff09\u64cd\u4f5c\uff0c2\uff09\u611f\u77e5\uff0c3\uff09\u5bfc\u822a\u4ee5\u53ca4\uff09\u5168\u9762\u7684\u591a\u697c\u5c42\u7269\u4f53\u91cd\u6392\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u673a\u5668\u4eba\u7684\u7b80\u538
6\u548c\u6211\u4eec\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u5206\u5c42\u8bbe\u8ba1\u5bf9\u4e8e\u5728\u8fd9\u79cd\u590d\u6742\u7684\u4efb\u52a1\u73af\u5883\u4e2d\u6709\u6548\u8fd0\u884c\u5f02\u6784\u591a\u673a\u5668\u4eba\u7cfb\u7edf\u81f3\u5173\u91cd\u8981\u3002|\n", "2410.22584": "|**2024-10-29**|**BENCHAGENTS: Automated Benchmark Creation with Agent Interaction**|Natasha Butt et.al.|[2410.22584](http://arxiv.org/abs/2410.22584)|null|\u8bc4\u4f30\u53d7\u5230\u57fa\u51c6\u53ef\u7528\u6027\u7684\u9650\u5236\u3002\u968f\u7740\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u9700\u8981\u521b\u5efa\u80fd\u591f\u8861\u91cf\u65b0\u751f\u6210\u80fd\u529b\u8fdb\u5c55\u7684\u57fa\u51c6\u3002\u7136\u800c\uff0c\u901a\u8fc7\u4eba\u5de5\u6ce8\u91ca\u521b\u5efa\u65b0\u7684\u57fa\u51c6\u65e2\u7f13\u6162\u53c8\u6602\u8d35\uff0c\u9650\u5236\u4e86\u5bf9\u4efb\u4f55\u80fd\u529b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86BENCHAGENTS\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7cfb\u7edf\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u5316\u590d\u6742\u80fd\u529b\u7684\u57fa\u51c6\u521b\u5efa\u8fc7\u7a0b\uff0c\u540c\u65f6\u5185\u5728\u786e\u4fdd\u6570\u636e\u548c\u5ea6\u91cf\u7684\u8d28\u91cf\u3002BENCHAGENTS\u5c06\u57fa\u51c6\u521b\u5efa\u8fc7\u7a0b\u5206\u89e3\u4e3a\u89c4\u5212\u3001\u751f\u6210\u3001\u6570\u636e\u9a8c\u8bc1\u548c\u8bc4\u4f30\u56db\u4e2a\u9636\u6bb5\uff0c\u6bcf\u4e2a\u9636\u6bb5\u90fd\u7531\u4e00\u4e2aLLM\u4ee3\u7406\u6267\u884c\u3002\u8fd9\u4e9b\u4ee3\u7406\u76f8\u4e92\u4ea4\u4e92\uff0c\u5e76\u5229\u7528\u6765\u81ea\u57fa\u51c6\u5f00\u53d1\u8005\u7684\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u6765\u663e\u5f0f\u6539\u8fdb\u548c\u7075\u6d3b\u63a7\u5236\u6570\u636e\u591a\u6837\u6027\u548c\u8d28\u91cf\u3002\u6211\u4eec\u4f7f\u7528BENCHAGENTS\u521b\u5efa\u7528\u4e8e\u8bc4\u4f30\u6587\u672c\u751f\u6210\u8fc7\u7a0b\u4e2d\u4e0e\u89c4\u5212\u548c\u7ea6\u675f\u6ee1\u8db3\u76f8\u5173\u80fd\u529b\u7684\u57fa\u51c6\u3002\u7136\u540e\uff0c\u6211
\u4eec\u4f7f\u7528\u8fd9\u4e9b\u57fa\u51c6\u7814\u7a76\u4e86\u4e03\u4e2a\u6700\u5148\u8fdb\u7684\u6a21\u578b\uff0c\u5e76\u63d0\u53d6\u4e86\u5173\u4e8e\u5e38\u89c1\u5931\u8d25\u6a21\u5f0f\u548c\u6a21\u578b\u5dee\u5f02\u7684\u65b0\u89c1\u89e3\u3002|\n", "2410.22552": "|**2024-10-29**|**Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents**|Jaekyeom Kim et.al.|[2410.22552](http://arxiv.org/abs/2410.22552)|null|\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Auto-Intent\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u65e0\u9700\u76f4\u63a5\u5fae\u8c03\u5373\u53ef\u5c06\u9884\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u76ee\u6807\u9886\u57df\u4ee3\u7406\u7684\u65b9\u6cd5\uff0c\u7279\u522b\u5173\u6ce8\u7f51\u9875\u5bfc\u822a\u4efb\u52a1\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u4ece\u76ee\u6807\u9886\u57df\u7684\u6f14\u793a\u4e2d\u65e0\u76d1\u7763\u5730\u53d1\u73b0\u6f5c\u5728\u7684\u610f\u56fe\uff0c\u4ee5\u9ad8\u5ea6\u7d27\u51d1\u7684\u5f62\u5f0f\uff08\u6700\u591a\u4e09\u4e2a\u8bcd\uff09\u63d0\u53d6\u8fd9\u4e9b\u610f\u56fe\u3002\u901a\u8fc7\u63d0\u53d6\u7684\u610f\u56fe\uff0c\u6211\u4eec\u8bad\u7ec3\u610f\u56fe\u9884\u6d4b\u5668\u6765\u9884\u6d4b\u7ed9\u5b9a\u4ee3\u7406\u8fc7\u53bb\u89c2\u5bdf\u548c\u52a8\u4f5c\u7684\u4e0b\u4e00\u4e2a\u610f\u56fe\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u63a2\u7d22\u65b9\u6cd5\uff0c\u5176\u4e2d\u6700\u6709\u53ef\u80fd\u7684\u524dk\u4e2a\u610f\u56fe\u9884\u6d4b\u88ab\u7528\u4f5c\u63d0\u793a\u63d0\u4f9b\u7ed9\u9884\u8bad\u7ec3\u7684LLM\u4ee3\u7406\uff0c\u4ece\u800c\u589e\u5f3a\u5176\u51b3\u7b56\u80fd\u529b\u3002Auto-Intent\u663e\u8457\u63d0\u9ad8\u4e86GPT-3.5\u3001GPT-4\u548cLlama-3.1-70B\u3001Llama-3.1-405B\u4ee3\u7406\u5728\u6765\u81eaMind2Web\u7684\u5927\u89c4\u6a21\u771f\u5b9e\u7f51\u7ad9\u5bfc\u822a\u57fa\u51c6\u6d4b\u8bd5\u4ee5\u53ca\u6765\u81eaWebArena\u7684\u5728\u7ebf\u5bfc\u822a\u4efb\u5
2a1\u4e0a\u7684\u6027\u80fd\uff0c\u5e76\u4e14\u5b83\u5728Mind2Web\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8de8\u57fa\u51c6\u6cdb\u5316\u80fd\u529b\u4e5f\u5f97\u5230\u4e86\u63d0\u5347\u3002|\n", "2410.23555": "|**2024-10-31**|**From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents**|Nalin Tiwary et.al.|[2410.23555](http://arxiv.org/abs/2410.23555)|null|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6846\u67b6\u5df2\u7ecf\u6269\u5c55\u5230\u590d\u6742\u7684\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\uff0c\u5982\u4ea4\u4e92\u5f0f\u7f51\u9875\u5bfc\u822a\u3002\u8fd9\u4e9b\u7cfb\u7edf\u901a\u8fc7\u7528\u6237\u547d\u4ee4\u9a71\u52a8\uff0c\u4f7f\u7528\u591a\u8f6e\u5bf9\u8bdd\u5728\u7f51\u9875\u6d4f\u89c8\u5668\u4e2d\u5b8c\u6210\u4efb\u52a1\uff0c\u65e2\u63d0\u4f9b\u4e86\u521b\u65b0\u673a\u4f1a\u4e5f\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\u3002\u5c3d\u7ba1\u5f15\u5165\u4e86\u7528\u4e8e\u4f1a\u8bdd\u7f51\u9875\u5bfc\u822a\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4f46\u5f71\u54cd\u8fd9\u4e9b\u4ee3\u7406\u6027\u80fd\u7684\u5173\u952e\u4e0a\u4e0b\u6587\u7ec4\u4ef6\u7684\u8be6\u7ec6\u7406\u89e3\u4ecd\u7136\u4e0d\u8db3\u3002\u672c\u7814\u7a76\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u901a\u8fc7\u5206\u6790\u7f51\u9875\u5bfc\u822a\u4ee3\u7406\u529f\u80fd\u7684\u5173\u952e\u4e0a\u4e0b\u6587\u5143\u7d20\u3002\u6211\u4eec\u7814\u7a76\u4e86\u4e0a\u4e0b\u6587\u7ba1\u7406\u7684\u4f18\u5316\uff0c\u91cd\u70b9\u5173\u6ce8\u4ea4\u4e92\u5386\u53f2\u548c\u7f51\u9875\u8868\u793a\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u7a81\u51fa\u4e86\u901a\u8fc7\u6709\u6548\u7684\u4e0a\u4e0b\u6587\u7ba1\u7406\uff0c\u5728\u5206\u5e03\u5916\u573a\u666f\u4e0b\u7684\u6539\u8fdb\u4ee3\u7406\u6027\u80fd\uff0c\u5305\u62ec\u672a\u89c1\u8fc7\u7684\u7f51\u7ad9\u3001\u7c7b\u522b\u548c\u5730\u533a\u3002\u8fd9\u4e9b\u53d1\u73b0\u4e3aLLM\u57fa\u4ee3\u7406\u7684\u8bbe\u8ba1\u
548c\u4f18\u5316\u63d0\u4f9b\u4e86\u89c1\u89e3\uff0c\u4f7f\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u7f51\u9875\u5bfc\u822a\u66f4\u52a0\u51c6\u786e\u548c\u6709\u6548\u3002|\n"}, "llm": {"2405.10311": "|**2024-05-16**|**UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models**|Sahel Sharifymoghaddam et.al.|[2405.10311](http://arxiv.org/abs/2405.10311)|null|\u8fd1\u671f\uff0c\u591a\u6a21\u6001\uff08MM\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u89e3\u9501\u4e86\u8bb8\u591a\u9700\u8981\u591a\u6a21\u6001\u7406\u89e3\uff08\u5982\u56fe\u50cf\u63cf\u8ff0\u6216\u89c6\u89c9\u95ee\u7b54\uff09\u548c\u751f\u6210\uff08\u5982\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u751f\u6210\u6216\u7f16\u8f91\uff09\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347MM-LLMs\u7684\u8f93\u51fa\u8d28\u91cf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684UniRAG\u6280\u672f\uff0c\u5b83\u5728\u63a8\u7406\u9636\u6bb5\u5c06\u76f8\u5173\u68c0\u7d22\u4fe1\u606f\u6dfb\u52a0\u5230\u63d0\u793a\u4e2d\uff0c\u4f5c\u4e3a\u5c11\u91cf\u6837\u4f8b\u3002\u4e0e\u666e\u904d\u8ba4\u4e3a\u68c0\u7d22\u589e\u5f3a\uff08RA\uff09\u4e3b\u8981\u6539\u8fdb\u7f55\u89c1\u5b9e\u4f53\u7684\u751f\u6210\u6216\u7406\u89e3\u4e0d\u540c\uff0c\u6211\u4eec\u5728MSCOCO\u6570\u636e\u96c6\u4e0a\u5bf9\u5305\u62ecGPT4\u3001Gemini-Pro\u5728\u5185\u7684\u4e13\u6709\u6a21\u578b\u4ee5\u53caLlava\u3001LaVIT\u548cEmu2\u7b49\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8f93\u5165\u63d0\u793a\u901a\u8fc7MM\u68c0\u7d22\u5668\uff08\u5982UniIR\u6a21\u578b\uff09\u589e\u5f3a\u540e\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u8d28\u91cf\u3002|\n", "2405.10305": "|**2024-05-16**|**4D Panoptic Scene Graph Generation**|Jingkang Yang 
et.al.|[2405.10305](http://arxiv.org/abs/2405.10305)|**[link](https://github.com/jingkang50/psg4d)**|**\u6211\u4eec\u751f\u6d3b\u5728\u4e00\u4e2a\u4e09\u7ef4\u7a7a\u95f4\u4e2d\uff0c\u540c\u65f6\u901a\u8fc7\u7b2c\u56db\u7ef4\u65f6\u95f4\u5411\u524d\u63a8\u8fdb\u3002\u4e3a\u4e86\u4f7f\u4eba\u5de5\u667a\u80fd\u80fd\u591f\u5168\u9762\u7406\u89e3\u8fd9\u79cd4D\u73af\u5883\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8868\u793a\u5f62\u5f0f\u2014\u20144D\u5168\u666f\u573a\u666f\u56fe\uff08PSG-4D\uff09\uff0c\u5b83\u5c06\u52a8\u60014D\u4e16\u754c\u4e2d\u7684\u539f\u59cb\u89c6\u89c9\u6570\u636e\u62bd\u8c61\u4e3a\u8282\u70b9\u548c\u8fb9\uff0c\u8282\u70b9\u4ee3\u8868\u5177\u6709\u7cbe\u786e\u4f4d\u7f6e\u548c\u72b6\u6001\u4fe1\u606f\u7684\u5b9e\u4f53\uff0c\u8fb9\u6355\u6349\u65f6\u95f4\u5173\u7cfb\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5728\u8fd9\u4e00\u65b0\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u6ce8\u91caPSG-4D\u6570\u636e\u96c6\uff0c\u5305\u542b3000\u4e2aRGB-D\u89c6\u9891\uff0c\u603b\u8ba1100\u4e07\u5e27\uff0c\u6bcf\u5e27\u90fd\u5e26\u67094D\u5168\u666f\u5206\u5272\u63a9\u7801\u4ee5\u53ca\u8be6\u7ec6\u7684\u52a8\u6001\u573a\u666f\u56fe\u6807\u7b7e\u3002\u6211\u4eec\u4e3a\u6b64\u4efb\u52a1\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSG4DFormer\u7684Transformer\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u5168\u666f\u5206\u5272\u63a9\u7801\uff0c\u6cbf\u65f6\u95f4\u8f74\u8ddf\u8e2a\u63a9\u7801\uff0c\u5e76\u901a\u8fc7\u5173\u7cfb\u7ec4\u4ef6\u751f\u6210\u76f8\u5e94\u7684\u573a\u666f\u56fe\u3002\u5728\u65b0\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u672a\u6765\u7684PSG-4D\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u878d\u5165\u6211\u4eec\u7684PSG-4D\u7cfb\u7edf\u6765\u5b9e\u73b0\u52a8\u6001\
u573a\u666f\u7406\u89e3\u7684\u4e00\u4e2a\u5b9e\u9645\u5e94\u7528\u793a\u4f8b\u3002**|\n", "2405.10299": "|**2024-05-16**|**HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models**|Rhea Sanjay Sukthanker et.al.|[2405.10299](http://arxiv.org/abs/2405.10299)|**[link](https://github.com/automl/hw-aware-llm-bench)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u786c\u4ef6\u6307\u6807\uff08\u5982\u5ef6\u8fdf\u3001\u80fd\u8017\u3001GPU\u5185\u5b58\u4f7f\u7528\u548c\u6027\u80fd\uff09\u4e4b\u95f4\u7684\u6743\u8861\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u4eba\u4eec\u6b63\u5728\u5bfb\u6c42\u4e3a\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u914d\u7f6e\u5efa\u7acb\u5e15\u7d2f\u6258\u524d\u6cbf\uff0c\u4ee5\u5728\u6307\u5b9a\u786c\u4ef6\u9650\u5236\u4e0b\u627e\u5230\u6700\u4f18\u6a21\u578b\u3002\u7136\u800c\uff0c\u5bf9\u591a\u79cd\u67b6\u6784\u5728\u591a\u53f0\u8bbe\u5907\u4e0a\u7684\u5168\u9762\u8bad\u7ec3\u548c\u8bc4\u4f30\u5728\u8ba1\u7b97\u4e0a\u662f\u4e0d\u53ef\u884c\u7684\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HW-GPT-Bench\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u786c\u4ef6\u611f\u77e5\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u57fa\u51c6\uff0c\u5229\u7528\u795e\u7ecf\u67b6\u6784\u641c\u7d22\uff08NAS\uff09\u4e2d\u7684\u6743\u91cd\u5171\u4eab\u6280\u672f\uff0c\u5728\u4e00\u4e2a\u6a21\u578b\u4e2d\u9ad8\u6548\u5730\u8bad\u7ec3\u5305\u542b\u4e0d\u540c\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u8d85\u7f51\u7edc\u3002\u6211\u4eec\u572813\u79cd\u8bbe\u5907\u4e0a\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u6027\u80fd\u5256\u6790\uff0c\u8003\u8651\u4e865\u79cd\u786c\u4ef6\u6307\u6807\u548c3\u79cd\u4e0d\u540c\u7684\u6a21\u578b\u89c4\u6a21\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc78\u79cd\u4e0d\u540c\u7684\u591a\u76ee\u6807NAS\u7b97\u6cd5\u5c55\u793a\u4e86HW-GPT-Bench\u7684\u53ef\u7528\u6027\uff0c\u5e76\u8bc4\u4f30\u4e86\u7531\u6b64\u4ea7\u751f\u7684\u5e15\u7d2f\u6258\u524d\u6cbf\u7684\u8d28\u91cf\u3
002\u6211\u4eec\u7684\u76ee\u6807\u662f\u63a8\u52a8\u548c\u52a0\u901f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u76ee\u6807\u65b9\u6cd5\uff0c\u5982NAS\u548c\u7ed3\u6784\u5316\u526a\u679d\u7684\u7814\u7a76\u3002**|\n", "2405.10288": "|**2024-05-16**|**Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction**|Jianhao Chen et.al.|[2405.10288](http://arxiv.org/abs/2405.10288)|**[link](https://github.com/jianhaochen-nju/tsdre)**|\u4e8b\u5b9e\u62bd\u53d6\u5bf9\u4e8e\u6784\u5efa\u77e5\u8bc6\u56fe\u8c31\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5bf9\u65f6\u95f4\u76f8\u5173\u4e8b\u5b9e\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u9700\u6c42\u589e\u957f\uff0c\u51fa\u73b0\u4e86\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u4efb\u52a1\u3002\u672c\u6587\u7279\u522b\u5173\u6ce8\u4ece\u81ea\u7136\u8bed\u8a00\u6587\u672c\u4e2d\u63d0\u53d6\u65f6\u95f4\u6027\u4e8b\u5b9e\u3002\u5148\u524d\u7684\u7814\u7a76\u672a\u80fd\u59a5\u5584\u5904\u7406\u590d\u6742\u53e5\u5b50\u4e2d\u65f6\u95f4\u4e0e\u4e8b\u5b9e\u5bf9\u5e94\u5173\u7cfb\u7684\u5efa\u7acb\u96be\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u65f6\u95f4\u7ebf\u7684\u53e5\u5b50\u5206\u89e3\u7b56\u7565\uff0c\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u4e8b\u5b9e\u76f8\u5173\u65f6\u95f4\u7ebf\u7684\u7cbe\u7ec6\u7406\u89e3\u3002\u7136\u800c\uff0c\u76f4\u63a5\u4f7f\u7528LLMs\u8fdb\u884c\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u6027\u80fd\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86TSDRE\u65b9\u6cd5\uff0c\u5c06LLMs\u7684\u5206\u89e3\u80fd\u529b\u878d\u5165\u5230\u5c0f\u578b\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u4f20\u7edf\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002 
\u4e3a\u4e86\u652f\u6301\u8bc4\u4f30\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u590d\u6742\u7684\u65f6\u5e8f\u4e8b\u5b9e\u62bd\u53d6\u6570\u636e\u96c6ComplexTRED\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTSDRE\u5728HyperRED-Temporal\u548cComplexTRED\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002|\n", "2405.10276": "|**2024-05-16**|**Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers**|Tuo Zhang et.al.|[2405.10276](http://arxiv.org/abs/2405.10276)|null|\u8fd1\u5e74\u6765\uff0c\u8bb8\u591a\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u7b56\u7565\u6027\u63d0\u793a\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6548\u80fd\u3002\u7279\u522b\u662f\u4f18\u5316\u901a\u8fc7prompting\uff08OPRO\uff09\u65b9\u6cd5\u8868\u73b0\u51fa\u9876\u5c16\u6027\u80fd\uff0c\u5b83\u5229\u7528LLMs\u4f5c\u4e3a\u4f18\u5316\u5668\uff0c\u76ee\u6807\u662f\u5bfb\u627e\u80fd\u6700\u5927\u5316\u4efb\u52a1\u51c6\u786e\u6027\u7684\u6307\u4ee4\u3002\u672c\u8bba\u6587\u91cd\u65b0\u5ba1\u89c6\u4e86OPRO\u5728\u5c0f\u578bLLMs\uff08\u5982LLaMa-2\u7cfb\u5217\u548cMistral 7B\uff09\u4e0a\u7684\u81ea\u52a8\u5316\u63d0\u793a\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u4e8e\u5c0f\u578bLLMs\uff0cOPRO\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u5176\u6709\u9650\u7684\u63a8\u7406\u80fd\u529b\u9650\u5236\u4e86\u4f18\u5316\u6f5c\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u672a\u6765\u7684\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u5e94\u540c\u65f6\u8003\u8651\u6a21\u578b\u80fd\u529b\u548c\u8ba1\u7b97\u6210\u672c\u3002\u9488\u5bf9\u5c0f\u578bLLMs\uff0c\u6211\u4eec\u63a8\u8350\u76f4\u63a5\u63d0\u4f9b\u660e\u786e\u9610\u8ff0\u76ee\u6807\u548c\u65b9\u6cd5\u7684\u6307\u4ee4\uff0c\u4f5c\u4e3a\u7a33\u5065\u7684\u63d0\u793a\u57fa\u7ebf\uff0c\u4ee5\u786e\u4fdd\u5728\u5f53\u524d\u7814\u7a76\u4e2d\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u63d0\u793a\u8bbe\u8ba1\u3002|\n", "2405.10260": 
"|**2024-05-16**|**Keep It Private: Unsupervised Privatization of Online Text**|Calvin Bao et.al.|[2405.10260](http://arxiv.org/abs/2405.10260)|**[link](https://github.com/csbao/kip-privatization)**|**\u4f5c\u8005\u8eab\u4efd\u6df7\u6dc6\u6280\u672f\u6709\u671b\u901a\u8fc7\u81ea\u52a8\u91cd\u5199\u6587\u672c\u6765\u4fdd\u62a4\u7f51\u7edc\u901a\u4fe1\u4e2d\u7684\u4e2a\u4eba\u9690\u79c1\u3002\u7136\u800c\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6587\u732e\u4e2d\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u8bc4\u4f30\u5927\u591a\u5c40\u9650\u5728\u72ed\u5c0f\u573a\u666f\u4e0b\uff0c\u4e3b\u8981\u4f9d\u8d56\u4e8e\u8868\u9762\u7684\u7f16\u8f91\u64cd\u4f5c\uff0c\u53ef\u80fd\u5bfc\u81f4\u8f93\u51fa\u4e0d\u81ea\u7136\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u6587\u672c\u79c1\u5bc6\u5316\u6846\u67b6\uff0c\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u751f\u6210\u517c\u987e\u51c6\u786e\u3001\u8fde\u8d2f\u548c\u9690\u79c1\u7684\u91cd\u5199\u3002\u6211\u4eec\u5728\u5927\u89c4\u6a21\u7684\u82f1\u8bedReddit\u5e16\u5b50\u6d4b\u8bd5\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u8be5\u6570\u636e\u96c6\u753168,000\u540d\u4f5c\u8005\u64b0\u5199\uff0c\u5305\u542b\u77ed\u5230\u4e2d\u7b49\u957f\u5ea6\u7684\u6587\u672c\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5728\u4e0d\u540c\u8bc4\u4f30\u6761\u4ef6\u4e0b\uff0c\u5982\u4f5c\u8005\u7b80\u4ecb\u957f\u5ea6\u548c\u4f5c\u8005\u8bc6\u522b\u7b56\u7565\uff0c\u6027\u80fd\u7684\u53d8\u5316\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u81ea\u52a8\u5316\u6307\u6807\u548c\u4eba\u5de5\u8bc4\u4f30\u4e2d\u4fdd\u6301\u9ad8\u6587\u672c\u8d28\u91cf\uff0c\u5e76\u6210\u529f\u5730\u89c4\u907f\u4e86\u51e0\u79cd\u81ea\u52a8\u4f5c\u8005\u8bc6\u522b\u653b\u51fb\u3002**|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng 
Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u4f53\u5728\u7a7a\u95f4\u7406\u89e3\u4e0e\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u8986\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u7ed3\u5408\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u663e\u793a\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u4e5f\u6307\u51fa\u4e86\u6316\u63983D-LLMs\u5168\u90e8\u6f5c\u529b\u6240\u9700\u7684\u521b\u65b0\u65b9\u6cd5\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63d0\u4f9b\u6
307\u5bfc\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u8c03\u67e5\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251](http://arxiv.org/abs/2405.10251)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u8bc4\u4f30\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u7b49\u65b9\u9762\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u636e\u6211\u4eec\u6240\u77e5\uff0c\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u6df1\u5165\u7814\u7a76\uff0c\u8fd9\u662f\u8861\u91cf\u6a21\u578b\u4f18\u79c0\u7a0b\u5ea6\u7684\u5173\u952e\u6807\u51c6\u3002\u56e0\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u5168\u9762\u8bc4\u4f30\u77e5\u540d\u4e14\u6027\u80fd\u51fa\u8272\u7684LLMs\uff0c\u5305\u62ecChatGPT\u3001ChatGLM\u3001\u57fa\u4e8eT5\u7684\u6a21\u578b\u3001\u57fa\u4e8eLLaMA\u7684\u6a21\u578b\u548cPythia\u6a21\u578b\uff0c\u5728\u5bf9\u8bdd\u751f\u6210\u548c\u6587\u672c\u603b\u7ed3\u7b49NLG\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u6db5\u76d6\u82f1\u8bed\u548c\u4e2d\u6587\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5171\u540c\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5305\u62ec\u8f93\u5165\u6a21\u677f\u548c\u540e\u5904\u7406\u7b56\u7565\u3002\u7814\u7a76\u7ed3\u679c\u62a5\u544a\u4e86\u81ea\u52a8\u8bc4\u5206\uff0c\u540c\u65f6\u8fdb\u884c\u4e86\u8be6\u7ec6\u5206\u6790\u3002|\n", "2405.10250": "|**2024-05-16**|**IntelliExplain: Enhancing Interactive Code 
Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|Large language models (LLMs) show great potential for generating executable code from natural language descriptions, particularly in interactive settings where users guide the model through iterative feedback. However, current interaction paradigms often assume that users have the expertise to debug source code, which makes them unfriendly to non-professional programmers and makes it hard to offer interactive code generation to people with varying levels of programming experience. To address this, we propose IntelliExplain, a novel human-machine interaction paradigm that improves the experience of non-professional programmers by letting them interact with source code through natural language explanations. Users guide the system to revise the code by giving corrective feedback in natural language on the errors they find, until they are satisfied with the system's explanation of the code. Our user study shows that users of IntelliExplain achieve success rates 11.6% and 25.3% higher than with vanilla GPT-3.5 on Text-to-SQL and Python code generation, respectively, while spending 39.0% and 15.6% less time.|\n",
"2405.10212": "|**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|**[link](https://github.com/CAS-SIAT-XinHai/CPsyExam)**|This paper introduces CPsyExam, a novel psychology benchmark built from questions in Chinese-language examinations. CPsyExam is designed to assess psychological knowledge and case analysis separately, recognizing the value of applying psychological knowledge to real-world scenarios. From a pool of 22,000 questions, we select 4,000 to build the benchmark, ensuring balanced coverage of subjects and a variety of case analysis techniques. We also evaluate a range of existing large language models (LLMs), including open-source and API-based models. Our experiments and analysis show that CPsyExam is an effective benchmark for measuring how well language models understand psychology and that it supports comparing these models at different levels of granularity.|\n",
"2405.10936": "|**2024-05-17**|**A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers**|Kaiyu Huang 
et.al.|[2405.10936](http://arxiv.org/abs/2405.10936)|**[link](https://github.com/kaiyuhwang/mllm-survey)**|The rapid development of large language models (LLMs) has demonstrated remarkable multilingual capabilities in natural language processing, attracting wide attention from both academia and industry. Developing multilingual technology is essential for reducing potential discrimination and improving the general usability and accessibility of these systems. Despite the breakthroughs achieved by LLMs, in-depth study of multilingual scenarios remains insufficient, so a comprehensive survey is urgently needed to summarize recent approaches, developments, limitations, and possible solutions. This survey examines the use of LLMs in multilingual settings from several perspectives. We first review the historical evolution of research on pre-trained language models. We then discuss the multilingual aspects of LLMs, including training and inference methods, model security, cross-domain and cultural adaptation, and dataset usage. We also analyze the challenges that arise in these areas and propose possible solutions. In addition, we highlight future research directions for further improving the multilingual performance of LLMs. The survey aims to help the research community address multilingual problems and to provide a comprehensive understanding of the core concepts, key techniques, and latest developments in LLM-based multilingual natural language processing.|\n",
"2405.10928": "|**2024-05-17**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928](http://arxiv.org/abs/2405.10928)|**[link](https://github.com/apolloresearch/rib)**|Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering them. However, existing methods struggle to interpret neural network activations because they lack a decomposition of the activations, so individual neurons or model components do not map cleanly onto distinct features or functions. We therefore propose a novel interpretability method, the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. The method discards meaningless activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features according to their importance for downstream computation, producing a graph that shows all computationally relevant features and interactions in the model. 
We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models. The results show that, compared with principal component analysis, LIB identifies more computationally relevant features and yields sparser interactions. When applied to language models, however, LIB does not substantially improve interpretability or interaction sparsity. We conclude that LIB is a promising theory-driven approach, but that in its current form it is not suitable for large language models.|\n",
"2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. Panagoulias et.al.|[2405.10893](http://arxiv.org/abs/2405.10893)|null|This technical paper presents COGNET-MD, a new benchmark for evaluating large language models in the medical domain. We propose a scoring framework that assesses the ability of language models to interpret medical text, together with a database of multiple-choice questions (MCQs) of graded difficulty. The database was created in collaboration with experts from several medical fields to reflect current medical trends and to ensure safety, usefulness, and applicability. The initial version contains questions from psychiatry, dentistry, pulmonology, dermatology, and endocrinology, and it will be continuously expanded to include more medical disciplines.|\n",
"2405.10883": "|**2024-05-17**|**Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review**|Hongyi Yang et.al.|[2405.10883](http://arxiv.org/abs/2405.10883)|null|This review systematically assesses the current status and prospects of artificial intelligence (AI) in the rehabilitation management of patients with schizophrenia, and its impact on the rehabilitation process. We selected 70 studies published from 2012 to the present, focusing on the application of machine learning, deep learning, reinforcement learning, and related techniques in mental health interventions and management, along with technology categories, products, and data types such as ecological momentary assessment and the analysis of behavioral and speech data. The results show that AI has broad application potential in symptom monitoring, relapse risk prediction, and rehabilitation treatment. The review also explores emerging AI-based products, technologies, and analytical methods, such as social media analysis, serious games, and large language models, along with their potential challenges and future directions in rehabilitation. Overall, the paper systematically reviews the application of AI in schizophrenia rehabilitation management and offers valuable insights and recommendations for future research.|\n",
"2405.10853": "|**2024-05-17**|**The Future of Large Language Model Pre-training is Federated**|Lorenzo Sani et.al.|[2405.10853](http://arxiv.org/abs/2405.10853)|null|Generative pre-trained large language models (LLMs) have attracted great attention for their strong performance across a wide range of tasks, made possible by the massive amounts of data they are trained on. According to established scaling laws, future gains in LLM performance depend largely on the amount of compute and data we can harness. Federated learning (FL) has the potential to unlock most of the world's underused data and compute, which current data-center-centric LLM training methods overlook. This paper presents a robust, flexible, and reproducible FL approach that enables large-scale collaboration across institutions to train LLMs jointly, mobilizing more compute and data resources and potentially matching or exceeding centralized performance. 
Our work demonstrates an FL training approach that scales to a billion-scale federated LLM with limited resources. This allows data-rich actors to take a leading role in LLM pre-training rather than leaving the stage to compute-rich institutions alone. The approach highlights how the benefits of federated training grow with scale and offers a practical path toward that goal.|\n",
"2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825](http://arxiv.org/abs/2405.10825)|null|Large language models (LLMs) have attracted great attention for their outstanding comprehension and reasoning capabilities and have driven significant progress in many fields. They show potential for artificial general intelligence (AGI), particularly in connection with sixth-generation (6G) communication technology. This survey provides a comprehensive overview of LLM-enabled telecom networks. We first outline the fundamentals of LLMs, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and deployment in telecom. We then discuss key LLM-enabled techniques and telecom applications across generation, classification, optimization, and prediction problems. Generation applications include telecom domain knowledge, code, and network configuration generation. LLM-based classification covers network security, text, image, and traffic classification. We also introduce automated optimization techniques that use LLMs, such as reward function design for reinforcement learning and verbal reinforcement learning. For prediction problems, LLMs can be used for time-series prediction and multi-modal telecom prediction. Finally, we identify the challenges facing LLM-enabled telecom networks and outline future research directions.|\n",
"2405.10808": "|**2024-05-17**|**ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios**|Markus Bayer et.al.|[2405.10808](http://arxiv.org/abs/2405.10808)|null|Active learning aims to reduce annotation effort by prioritizing the instances that most improve learning. However, many active learning strategies suffer from a 'cold start' problem: they need substantial initial data to be effective. This limits their usefulness for pre-trained models such as BERT, which already perform well in few-shot settings. To address this, we introduce ActiveLLM, a novel active learning approach that uses large language models such as GPT-4, Llama 3, and Mistral Large to select instances. Experiments show that ActiveLLM significantly improves the performance of BERT classifiers in few-shot scenarios, outperforming both traditional active learning methods and few-shot learning methods such as SetFit. ActiveLLM can also be extended to non-few-shot scenarios and supports iterative selection, which helps other active learning strategies overcome the cold start problem. The results suggest that ActiveLLM offers a promising way to improve model performance across different learning setups.|\n",
"2405.10745": "|**2024-05-17**|**Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings**|Albert Sawczyn et.al.|[2405.10745](http://arxiv.org/abs/2405.10745)|null|
Knowledge-intensive tasks pose a serious challenge for machine learning (ML) techniques. Commonly adopted methods such as large language models (LLMs) often show limitations on these tasks. Efforts have been made to mitigate these shortcomings with knowledge graphs (KGs), in particular by combining small-scale domain-specific KGs with general-purpose KGs. Although KGs have advantages for knowledge representation, the cost of building them can hinder wider research and application. We therefore propose a framework that improves the embeddings learned for small domain-specific KGs by linking them to large general-purpose KGs. Experiments show substantial gains, with improvements of up to 44% in the Hits@10 metric. This relatively unexplored research direction could promote more frequent use of KGs in knowledge-intensive tasks, leading to more robust and reliable ML solutions than the popular but error-prone LLM-based approaches. Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning|\n",
"2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language 
Models: A Survey**|Yizhang Jin et.al.|[2405.10739](http://arxiv.org/abs/2405.10739)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|Over the past year, multimodal large language models (MLLMs) have shown remarkable performance on tasks such as visual question answering, visual understanding, and reasoning. However, the large size of these models and their high training and inference costs have limited their adoption in academia and industry. Research on efficient and lightweight MLLMs therefore has great potential, especially for edge computing scenarios. This survey provides a comprehensive and systematic review of the current state of efficient MLLMs. We outline the development timeline of representative efficient models, summarize the state of research on efficient structures and strategies, and describe their applications. Finally, we discuss the limitations of current efficient MLLM research and point to promising future directions. For more details, see our GitHub repository: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.|\n",
"2405.10725": "|**2024-05-17**|**INDUS: Effective and Efficient Language Models for Scientific Applications**|Bishwaranjan Bhattacharjee 
et.al.|[2405.10725](http://arxiv.org/abs/2405.10725)|null|Large general-purpose language models perform well on natural language processing tasks. However, prior work has shown that models trained on domain-focused data perform better on specialized tasks. Building on this, we developed INDUS, a suite of language models tailored to Earth science, biology, physics, heliophysics, planetary science, and astronomy. The models are trained on curated scientific corpora and comprise: (1) an encoder trained with a domain-specific vocabulary and corpora to improve natural language understanding tasks; (2) a contrastive-learning-based general text embedding model trained on datasets from multiple sources to improve information retrieval tasks; and (3) smaller versions of these models created with knowledge distillation for applications with latency or resource constraints. We also created three new scientific benchmark datasets, CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive question answering), and NASA-IR (information retrieval), to advance research in these multidisciplinary fields. Finally, experiments show that our models outperform both general-purpose encoders (such as RoBERTa) and existing domain-specific encoders (such as SciBERT) on the new tasks and on existing benchmark tasks in the relevant domains.|\n",
"2405.12217": "|**2024-05-20**|**Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning**|Guanglin Zhou et.al.|[2405.12217](http://arxiv.org/abs/2405.12217)|**[link](https://github.com/jameszhou-gl/icl-distribution-shift)**|Recent studies indicate that large multimodal models (LMMs) are highly robust to natural distribution shifts, often surpassing previous baselines. Domain-specific adaptation is still necessary, however, particularly in specialized areas such as healthcare. Because fine-tuning LMMs is impractical given their vast parameter space, this study investigates in-context learning (ICL) as an effective way to improve the adaptability of LMMs. We find that the success of ICL depends heavily on the choice of demonstrations, as it does for large language models, but that this poses unique challenges for LMMs under distribution shift. We evaluate an unsupervised ICL method, TopKNearestPR, which selects in-context examples through a nearest-example search based on feature similarity, and show that its effectiveness is limited by the deficiencies of pre-trained vision encoders under distribution shift. To address these challenges, we propose InvariantSelectPR, a novel method that uses Class-conditioned Contrastive Invariance (CCI) to make pre-trained vision encoders more robust. CCI improves the encoder's ability to identify and retrieve the most informative examples by increasing discriminability across classes and ensuring invariance to domain-specific variations. These examples then guide the LMM in adapting to new query samples, even from different distributions. Experiments show that InvariantSelectPR substantially improves the adaptability of LMMs, achieving accuracy gains of 34.2% on Camelyon17 and 16.9% on HAM10000 in the 7-shot setting compared with zero-shot performance.|\n",
"2405.12209": "|**2024-05-20**|**MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark**|Hongwei Liu 
et.al.|[2405.12209](http://arxiv.org/abs/2405.12209)|**[link](https://github.com/open-compass/mathbench)**|Recent advances in large language models (LLMs) have brought significant progress in mathematics, but traditional math benchmarks such as GSM8k fall short of comprehensively evaluating the mathematical capabilities of these models. To fill this gap, we propose MathBench, a new benchmark designed to rigorously assess the mathematical capabilities of LLMs. MathBench covers a wide range of mathematical disciplines and provides a detailed evaluation of both theoretical understanding and practical problem solving. It is organized into five stages, from basic arithmetic to college mathematics, structured to probe models at different depths of knowledge. Each stage includes theoretical questions and application problems, so that it measures both a model's mathematical proficiency and its ability to apply concepts in practical scenarios. MathBench aims to improve the evaluation of the mathematical abilities of LLMs by providing a nuanced view of their knowledge understanding and problem-solving skills, and it supports a bilingual context. The project is released at https://github.com/open-compass/MathBench.|\n",
"2405.12195": "|**2024-05-20**|**Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey**|Thiago S. 
Vaillant et.al.|[2405.12195](http://arxiv.org/abs/2405.12195)|**[link](https://github.com/gpt-impact/Paper-content)**|As large language models such as ChatGPT continue to develop, their strong natural language processing capabilities and wide range of applications have attracted broad attention. Although the convergence of artificial intelligence (AI) and software engineering (SE) is increasingly evident, there is still little research on how this integration affects software development practices and perceptions. To understand the impact and challenges of bringing AI-driven tools such as ChatGPT into the software development process, we surveyed 207 software developers. The survey covers the effects of ChatGPT on software quality, productivity, and developer job satisfaction, as well as developers' expectations for future uses of ChatGPT, their concerns about possible job displacement, and their views on regulatory measures.|\n",
"2405.12174": "|**2024-05-20**|**CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models**|Haoxiang Shi 
et.al.|[2405.12174](http://arxiv.org/abs/2405.12174)|null|This paper introduces CT-Eval, a Chinese text-to-table dataset for benchmarking large language models (LLMs) on the text-to-table task in a non-English language. Because existing text-to-table datasets are mostly English, CT-Eval fills this gap. It draws on a popular multidisciplinary Chinese online encyclopedia as its source and covers 28 domains to ensure data diversity. To reduce data hallucination, the authors first train a language model to identify and filter out samples that contain hallucinations, and then manually annotate the errors in the validation and test sets. The final CT-Eval contains about 88,600 task samples. Using CT-Eval, the authors evaluate open-source and closed-source LLMs (such as GPT-4). The results show that in the zero-shot setting these models still lag significantly behind human judgment. After fine-tuning, open-source models improve substantially at text-to-table generation and outperform GPT-4 by a large margin. In short, CT-Eval is a valuable tool for evaluating and understanding the Chinese text-to-table abilities of existing LLMs and a useful resource for improving them on this task.|\n",
"2405.12163": "|**2024-05-20**|**Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging**|Xiaobo Liang et.al.|[2405.12163](http://arxiv.org/abs/2405.12163)|**[link](https://github.com/dropreg/fennec)**|With the rapid development of large language models, they are increasingly applied to a wide variety of real-world tasks, with the primary goal of aligning with human intent. However, the complexity of understanding human intent makes it necessary to rely on time-consuming human evaluation. To alleviate this problem, we explore the trend of using open-source large language models as evaluators, particularly given the popularity of GPT-4. We propose Fennec, a framework for fine-grained evaluation and correction that is extended through branching and bridging. The branching operation breaks the evaluation task down into different dimensions and granularities, which eases the difficulty of evaluation. The bridging operation merges diverse training datasets, increasing the variety of evaluation tasks. Experiments show that our 7B model outperforms larger open-source evaluation models in both agreement and consistency across a range of widely used benchmarks, approaching the performance of GPT-4. We also use the model's fine-grained correction capability to refine multiple model responses, which improves response quality and raises scores on MT-Bench by 1-2 points. Our code is available on GitHub at https://github.com/dropreg/Fennec.|\n",
"2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|This paper explores how large language models (LLMs) can be used to translate problem definitions for cognitive systems. Ordinarily, a human must translate a problem description into a form the cognitive system can understand. The authors show that LLMs can take a problem class defined in natural language and map it to a semi-formal specification that existing reasoning and learning systems can then use to solve specific instances of that class. They design an LLM-driven cognitive task analyst agent that generates a problem-space definition from a task described in natural language. The LLM prompts are derived from the notion of problem spaces in the AI literature and from general problem-solving strategies (such as Polya's How to Solve It). The cognitive system then uses these problem-space specifications, together with domain-general problem-solving strategies such as search, to solve different instances of the problem class. This preliminary result suggests that, by removing the intermediary step in problem formulation, LLMs could accelerate cognitive systems research while preserving core capabilities such as robust reasoning and online learning.|\n", 
"2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130](http://arxiv.org/abs/2405.12130)|**[link](https://github.com/kongds/mora)**|**Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the effect of low-rank updating as implemented in LoRA. Our findings suggest that this mechanism may limit the ability of large language models to learn and memorize new knowledge. Motivated by this, we propose a new method, MoRA, which uses a square matrix to achieve high-rank updating while keeping the same number of trainable parameters as LoRA. To do so, we introduce corresponding non-parametric operators that reduce the input dimension and increase the output dimension for the square matrix. These operators ensure that the weights can be merged seamlessly back into the large language model, so our method can be deployed just like LoRA. We evaluate the method comprehensively on five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and performs comparably on the others.**|\n", 
"2405.12119": "|**2024-05-20**|**Reindex-Then-Adapt: 
Improving Large Language Models for Conversational Recommendation**|Zhankui He et.al.|[2405.12119](http://arxiv.org/abs/2405.12119)|null|Large language models (LLMs) are revolutionizing conversational recommender systems through their ability to index item content, understand complex conversational contexts, and generate relevant item titles. However, controlling the distribution of recommended items remains a challenge, which leads to suboptimal performance when targeting the rapidly changing data distributions of conversational recommendation platforms, such as item popularity. In conversational recommendation, LLMs generate item titles autoregressively as multiple tokens, which makes it difficult to obtain and control the recommendations over all items. We therefore propose the Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within LLMs and then adjusts the probability distributions over these single-token item titles. The RTA framework combines the strength of LLMs in understanding complex queries with the ability of traditional recommender systems (RecSys) to control the distribution of recommended items effectively in conversational recommendation. Experiments show that our framework improves accuracy metrics across three conversational recommendation datasets and two adaptation settings.|\n", 
"2405.12107": "|**2024-05-20**|**Imp: Highly Capable Large Multimodal Models for Mobile Devices**|Zhenwei Shao et.al.|[2405.12107](http://arxiv.org/abs/2405.12107)|**[link](https://github.com/milvlg/imp)**|**Although large language models (LLMs) and large multimodal models (LMMs) show remarkable capabilities in open-world multimodal understanding, they are usually parameter-heavy and computation-intensive, which limits their use in resource-constrained settings. To address this, researchers have proposed a series of lightweight LMMs that aim to maximize capability at a constrained scale (for example, 3B parameters). However, most of these methods focus on only one or two aspects of the design space, and the key design choices that influence model capability have not yet been thoroughly investigated. This paper systematically studies the design of lightweight LMMs, covering model architecture, training strategy, and training data. Based on our findings, we build Imp, a family of highly capable LMMs at the 2B to 4B parameter scale. Notably, our Imp-3B model consistently outperforms existing lightweight LMMs of similar size and even surpasses state-of-the-art LMMs at the 13B scale. With low-bit quantization and resolution reduction techniques, the Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip at a high inference speed of about 13 tokens per second.**|\n", 
"2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100](http://arxiv.org/abs/2405.12100)|null|Math world problems correction (MWPC) is a task dedicated to correcting reasoning errors made in the process of solving math problems. Building on advances in large language models (LLMs), this paper focuses on two points: (1) distinguishing mathematical reasoning from error correction; and (2) exploring strategies to improve the error-correction ability of LLMs in mathematics in order to address the MWPC task. We observe that in real-time education, helping students recognize their mistakes matters more than simply providing the correct answer. Yet current research tends to focus on obtaining accurate solutions rather than correcting possible mistakes. We therefore shift the research paradigm and show that improving mathematical reasoning ability is not equivalent to mastering error correction. We also propose a new method, diagnostic-oriented prompting (DOP), designed to help LLMs excel at error correction. Experiments show that DOP performs strongly, underscoring its importance. We argue that in mathematics education the need for excellent correctors exceeds the need for proficient reasoners. Code and data are available.|\n", 
"2405.12981": "|**2024-05-21**|**Reducing Transformer Key-Value Cache Size with Cross-Layer Attention**|William Brandon 
et.al.|[2405.12981](http://arxiv.org/abs/2405.12981)|null|Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the memory required to store the KV cache can become prohibitive at long sequence lengths and large batch sizes. Since the invention of the transformer, two of the most effective techniques for reducing KV cache size have been multi-query attention (MQA) and its generalization, grouped-query attention (GQA). MQA and GQA let multiple query heads share a single key/value head, greatly reducing the number of distinct key/value heads with only a small impact on accuracy. This paper shows how to take MQA one step further by also sharing key and value heads between adjacent layers, a design we call cross-layer attention (CLA). Experiments show that CLA can reduce the KV cache size by a further 2x while maintaining nearly the same accuracy as unmodified MQA. In experiments training 1B- and 3B-parameter models from scratch, we show that CLA provides a Pareto improvement over traditional MQA on the memory/accuracy tradeoff, enabling inference with longer sequence lengths and larger batch sizes.|\n", 
"2405.12961": "|**2024-05-21**|**Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale**|Shriram Chennakesavalu et.al.|[2405.12961](http://arxiv.org/abs/2405.12961)|**[link](https://github.com/rotskoff-group/llm-era)**|Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack effective strategies for generating molecules with desired properties. This problem resembles the alignment problem for large language models, although in many chemical tasks we have a specific and easily evaluable reward function. This paper introduces energy rank alignment (ERA), an algorithm that uses an explicit reward function to construct a gradient-based objective for optimizing autoregressive policies. Theoretically, we find that the algorithm is closely related to Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), but its minimizer converges to an ideal Gibbs-Boltzmann distribution in which the reward plays the role of an energy function. In addition, the algorithm is highly scalable, does not require reinforcement learning, and performs well relative to DPO when the number of preference observations per pairing is small. We apply this approach to align molecular transformers to generate molecules with externally specified properties and find that it does so robustly, searching through diverse parts of chemical space. Although our focus is chemical search, we also obtain excellent results on an AI-supervised task, suggesting that the method is scalable and general.|\n", 
"2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939](http://arxiv.org/abs/2405.12939)|**[link](https://github.com/yinzhangyue/AoR)**|Recent advances in Chain-of-Thought prompting have driven breakthroughs for large language models (LLMs) on complex reasoning tasks. Current research improves LLM reasoning performance by sampling multiple reasoning chains and ensembling them according to answer frequency. However, this approach fails when the correct answer is in the minority. We identify this as a key factor limiting the reasoning ability of LLMs, one that cannot be resolved based on the predicted answers alone. To address it, we propose AoR (Aggregation of Reasoning), a hierarchical reasoning aggregation framework that selects answers based on an evaluation of the reasoning chains. AoR also introduces dynamic sampling, which adjusts the number of reasoning chains according to task complexity. Experiments on a series of complex reasoning tasks show that AoR outperforms prominent ensemble methods. Further analysis shows that AoR not only adapts to various LLMs but also achieves a higher performance ceiling than current methods.|\n", 
"2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel 
et.al.|[2405.12933](http://arxiv.org/abs/2405.12933)|null|Large language models perform well on tasks such as summarization, arithmetic reasoning, and question answering. However, they face significant challenges in moral reasoning and ethical decision-making, especially in complex scenarios involving multiple stakeholders. This paper proposes the Skin-in-the-Game (SKIG) framework, which aims to improve moral reasoning in language models by examining the consequences of decisions from the perspectives of different stakeholders. Central to SKIG is simulating accountability for actions, which, together with empathy exercises and risk assessment, is pivotal to its effectiveness. We validate SKIG on various moral reasoning benchmarks using proprietary and open-source language models and investigate its key components through extensive ablation analyses.|\n", 
"2405.12929": "|**2024-05-21**|**Code-mixed Sentiment and Hate-speech Prediction**|Anjali Yadav et.al.|[2405.12929](http://arxiv.org/abs/2405.12929)|**[link](https://github.com/matejklemen/sentiment-hate-speech-with-code-mixed-models)**|Code-mixed discourse combines multiple languages in a single text and is common in informal communication in countries with several official languages. As large language models have come to dominate natural language processing tasks, we investigate them in code-mixed settings. First, we create four new bilingual pretrained masked language models for English-Hindi and English-Slovene, specifically aimed at informal language. We then evaluate monolingual, bilingual, few-lingual, and massively multilingual models on tasks including sentiment analysis and offensive language detection in social media text. The results show that the most effective classifiers are bilingual and multilingual models specialized for social media text, followed by non-specialized massively multilingual and monolingual models, while large generative models are not competitive. For affective problems, the models perform slightly better on code-mixed data than on non-code-mixed data overall.|\n", 
"2405.12920": "|**2024-05-21**|**Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples**|Tim 
Menzies et.al.|[2405.12920](http://arxiv.org/abs/2405.12920)|**[link](https://github.com/timm/ez)**|This paper proposes a new challenge problem for software analytics. In the process it calls software review, a panel of SMEs (subject matter experts) reviews examples of software behavior in order to recommend how to improve the software's operation. Because SME time is usually extremely limited, the panel should ideally be able to complete this optimization task after looking at only a small number of highly informative examples. To support this review process, the work explores methods for training a predictive model that guesses whether an expert will like or dislike the next example. Such a model can work with the SMEs to guide them through all the examples and, after the panelists leave, can serve as a proxy that handles new cases while the experts are busy elsewhere. 
In 31 case studies (ranging from high-level decisions about software processes to low-level decisions about how to configure video encoding software), we show that such predictive models can be built using as few as 12 to 30 labels. To the best of our knowledge, achieving this with so few examples (and without large language models) is currently rare. In keeping with the principles of open science, we make all our code and data available so that others can reproduce, validate, or improve on these results.|\n", "2405.12915": "|**2024-05-21**|**G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation**|Xingyuan Pan 
et.al.|[2405.12915](http://arxiv.org/abs/2405.12915)|**[link](https://github.com/xypan0/G-DIG)**|Large language models (LLMs) have shown remarkable abilities in general scenarios, and instruction finetuning enables them to align with humans on a variety of tasks. However, the diversity and quality of instruction data remain two main challenges for instruction finetuning. To this end, this paper proposes a novel gradient-based method for automatically selecting high-quality and diverse instruction finetuning data for machine translation. Our key innovation lies in analyzing how individual training examples influence the model during training. Using influence functions together with a small, high-quality seed dataset, we select training examples that have a beneficial influence on the model as high-quality data. To increase data diversity, we cluster the examples by their gradients and resample, maximizing the variety of their influences on the model. Extensive experiments on the WMT22 and FLORES translation tasks demonstrate the superiority of our method, and in-depth analysis further confirms its effectiveness and generalization.|\n", "2405.12914": "|**2024-05-21**|**An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation**|Zhiyu Tan 
et.al.|[2405.12914](http://arxiv.org/abs/2405.12914)|**[link](https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion.github.io)**|A key prerequisite for faithful text-to-image generation is an accurate understanding of the text input. Existing methods use the text encoder of the CLIP model to represent prompts. However, the pretrained CLIP model can only handle English, and the model capacity of its text encoder is relatively limited. In contrast, large language models (LLMs) support multilingual input, accommodate longer context, and provide superior text representations. This paper investigates using LLMs as the text encoder to improve language understanding in text-to-image generation. However, training a text-to-image generation model with LLMs from scratch requires substantial computational resources and data. 
To this end, we propose a three-stage training pipeline that integrates existing text-to-image models with LLMs effectively while keeping training efficient. Specifically, we design a lightweight adapter that enables fast training of the text-to-image model using the text representations produced by LLMs. Extensive experiments show that our model supports multilingual input and longer context while also delivering strong image generation quality.|\n", "2405.12910": "|**2024-05-21**|**Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment**|Holli Sargeant et.al.|[2405.12910](http://arxiv.org/abs/2405.12910)|**[link](https://github.com/AhmedIzzidien/TopicLLM)**|**This paper addresses a critical gap in legal analytics by developing and applying a novel taxonomy for topic modelling of case law, focusing on summary judgment cases in the United Kingdom. Using a curated dataset of summary judgment cases, we use the large language model Claude 3 Opus to explore functional topics and trends. The results show that Claude 3 Opus classifies topics with an accuracy of 87.10%, revealing distinct patterns in the application of summary judgment across different legal domains. Because case law in the United Kingdom is not originally labelled with keywords or a topic filtering option, this study not only deepens our understanding of the thematic underpinnings of summary judgments but also illustrates the potential of combining traditional and AI-driven classification approaches. The paper therefore provides a new general taxonomy for UK law. The implications of this work lay a foundation for further research and for discussion of computational legal research methodologies in the field of judicial administration.**|\n", 
"2405.12900": "|**2024-05-21**|**Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents**|San Kim 
et.al.|[2405.12900](http://arxiv.org/abs/2405.12900)|null|The recent emergence of large language models (LLMs) and a variety of effective training methods has driven progress in open-domain dialogue systems. However, toxicity in these models poses a significant challenge to user experience. This paper proposes a new training algorithm, adversarial direct preference optimization (ADPO), which improves on direct preference optimization (DPO). ADPO trains the model to assign higher probability to preferred responses and lower probability to unsafe responses generated using a toxic control token. We show that ADPO strengthens the model's resilience against harmful conversations while minimizing performance degradation. We further show that ADPO offers a more stable training procedure than traditional DPO. To the best of our knowledge, this is the first DPO variant that incorporates harmful data directly into the generative model, reducing the need to artificially create safe dialogue data.|\n",
"2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai 
et.al.|[2405.14863](http://arxiv.org/abs/2405.14863)|null|Cross-domain alignment is the task of mapping a concept from one domain to another. For example, 'If a *doctor* were a *color*, what color would it be?' This seemingly peculiar task is designed to investigate how people represent concrete and abstract concepts through mappings between categories and their reasoning about those mappings. In this paper, we adapt this task from cognitive science to evaluate the conceptualization and reasoning abilities of large language models (LLMs) through a behavioral study. We prompt LLMs to perform the cross-domain mapping task and analyze their responses at both the population and individual levels. We also assess the models' ability to reason about their predictions by analyzing and categorizing their explanations for these mappings. The results reveal substantial similarities between human and model mappings and explanations, suggesting that models represent concepts in a way similar to humans. This similarity is evident not only in the models' representations but also in their behavior. Moreover, the models mostly provide valid explanations and follow reasoning paths similar to those of humans.|\n",
"2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862](http://arxiv.org/abs/2405.14862)|null|We introduce Bitune, a method that improves instruction tuning of pretrained decoder-only large language models, yielding significant gains across multiple downstream tasks. Bitune applies both autoregressive (causal) and bidirectional attention to the prompt to obtain a more precise representation of the query or instruction. To do this, we introduce two sets of parameters and apply parameter-efficient fine-tuning (PEFT) techniques to them. The two sets of features are then combined into a weighted average with trainable coefficients, which is used to generate new tokens. Experiments show that Bitune performs well in zero-shot settings on commonsense reasoning, arithmetic, and language understanding tasks. Extensive ablation studies validate the role of each component and show that the method is robust to different PEFT techniques.|\n",
"2405.14852": "|**2024-05-23**|**PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression**|Vladimir Malinovskii et.al.|[2405.14852](http://arxiv.org/abs/2405.14852)|**[link](https://github.com/vahe1994/aqlm)**|There has been significant interest in 'extreme' compression of large language models (LLMs), that is, compression to 1-2 bits per parameter, which allows such models to run efficiently on resource-constrained devices. Existing work has focused on improved one-shot quantization techniques and weight representations; yet purely post-training approaches are reaching diminishing returns in the trade-off between accuracy and bit-width. State-of-the-art quantization methods such as QuIP# and AQLM include fine-tuning of part of the compressed parameters over a small amount of calibration data; however, such fine-tuning over compressed weights typically relies exclusively on straight-through estimation (STE), whose performance in this setting is not well understood. This work questions the use of STE for extreme LLM compression and systematically studies quantization-aware fine-tuning strategies. We propose PV-Tuning, a representation-agnostic framework that generalizes and improves upon existing fine-tuning strategies and provides convergence guarantees in restricted cases. In practice, when used for 1-2 bit vector quantization, PV-Tuning outperforms prior techniques on highly performant models such as Llama and Mistral. Using PV-Tuning, we achieve the first Pareto-optimal quantization for Llama 2 family models at 2 bits per parameter.|\n",
"2405.14831": "|**2024-05-23**|**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models**|Bernal Jim\u00e9nez Guti\u00e9rrez et.al.|[2405.14831](http://arxiv.org/abs/2405.14831)|**[link](https://github.com/osu-nlp-group/hipporag)**|To thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and to continually integrate new information while avoiding catastrophic forgetting. Despite impressive achievements, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to integrate large amounts of new experience. In this work we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory, designed to enable deeper and more efficient integration of new experiences. HippoRAG orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of the neocortex and the hippocampus in human memory. We compare HippoRAG with existing RAG methods on multi-hop question answering and show that it significantly outperforms state-of-the-art methods, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval methods such as IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that HippoRAG can handle new types of scenarios that are out of reach of existing methods. The code and data have been open-sourced.|\n",
"2405.14804": "|**2024-05-23**|**Can LLMs Solve longer Math Word Problems Better?**|Xin Xu et.al.|[2405.14804](http://arxiv.org/abs/2405.14804)|null|Math word problems (MWPs) are crucial for evaluating the capabilities of large language models (LLMs), yet existing research focuses mainly on problems with concise contexts. Real-world math problems often involve complex circumstances, so the ability of LLMs to solve long MWPs is vital for their practical application but remains under-explored. This study is the first to examine Context Length Generalizability (CoLeG), the ability of LLMs to solve MWPs with long contexts. We introduce Extended Grade-School Math (E-GSM), a collection of problems with lengthy narratives, and propose two new metrics to assess the efficacy and robustness of LLMs on such problems. Examining existing zero-shot prompting techniques together with proprietary and open-source LLMs, we find that they are generally deficient in CoLeG. We propose tailored remedies for each category of LLM: for proprietary models, a new instructional prompt designed to mitigate the influence of long context; for open-source models, a new data augmentation task to improve the models' adaptability. Our comprehensive results show that these approaches not only perform well on E-GSM but also generalize to several other MWP benchmarks. The findings point the way for future research on using LLMs for complex real-world problems, offer practical solutions to current limitations, and open avenues for further exploration of model generalizability and training strategies.|\n",
"2405.14782": "|**2024-05-23**|**Lessons from the Trenches on Reproducible Evaluation of Language Models**|Stella Biderman 
et.al.|[2405.14782](http://arxiv.org/abs/2405.14782)|null|Effective evaluation of language models remains an open challenge in natural language processing (NLP). Researchers and engineers face methodological issues such as the sensitivity of models to the evaluation setup, the difficulty of making proper comparisons across methods, and a lack of reproducibility and transparency. Drawing on three years of experience evaluating large language models, this paper offers guidance and lessons for researchers. First, we give an overview of common challenges in language model evaluation. Second, we describe best practices for addressing or lessening the impact of these challenges. Third, we present the Language Model Evaluation Harness (lm-eval), an open-source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the library's features and present case studies in which it has been used to alleviate these methodological concerns.|\n",
"2405.14768": "|**2024-05-23**|**WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models**|Peng Wang 
et.al.|[2405.14768](http://arxiv.org/abs/2405.14768)|**[link](https://github.com/zjunlp/easyedit)**|Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct erroneous responses, which motivates methods for lifelong model editing. The central question of the paper is where edited knowledge should reside in the model's memories. The authors find that editing either long-term memory (directly modifying model parameters) or working memory (neural network activations obtained through retrieval) leads to an impossible triangle: reliability, generalization, and locality cannot all be achieved at once in the lifelong editing setting. Direct parameter edits conflict with unrelated pretrained knowledge or with earlier edits (poor reliability and locality), while retrieval-based working memory makes it hard for the model to understand and generalize the edits (poor generalization). The authors therefore propose WISE to bridge the gap between the two kinds of memory. WISE uses a dual parametric memory scheme, with a main memory for pretrained knowledge and a side memory for edited knowledge. Only the side memory is edited, and a router is trained to decide which memory to consult for a given query. For continual editing, a knowledge-sharding mechanism places different sets of edits in distinct subspaces of the parameters and then merges them into a shared memory to avoid conflicts. Experiments show that WISE outperforms previous model editing methods and overcomes the impossible triangle in lifelong model editing on question answering and hallucination settings across trending LLM architectures such as GPT, LLaMA, and Mistral. Code will be released at https://github.com/zjunlp/EasyEdit.|\n",
"2405.14767": "|**2024-05-23**|**FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models**|Hongyang Yang 
et.al.|[2405.14767](http://arxiv.org/abs/2405.14767)|**[link](https://github.com/ai4finance-foundation/finrobot)**|As financial institutions and professionals increasingly incorporate large language models (LLMs) into their workflows, substantial barriers remain between the finance sector and the AI community, including proprietary data and specialized knowledge. These challenges limit the potential of AI to make financial tasks more efficient. Given the importance of financial analysis, we aim to develop finance-specific LLM-based toolchains and to broaden access to them through open-source initiatives, promoting wider adoption of AI in financial decision-making. This paper introduces FinRobot, an open-source AI agent platform that supports multiple financially specialized AI agents, each powered by an LLM. The platform consists of four major layers: 1) the Financial AI Agents layer, which formulates Financial Chain-of-Thought (CoT) by breaking complex financial problems down into logical sequences; 2) the Financial LLM Algorithms layer, which dynamically configures appropriate model application strategies for specific tasks; 3) the LLMOps and DataOps layer, which produces accurate models by applying training and fine-tuning techniques and using task-relevant data; and 4) the Multi-source LLM Foundation Models layer, which integrates various LLMs and makes them directly accessible to the layers above. FinRobot gives both professional analysts and laypersons hands-on access to powerful AI techniques for advanced financial analysis. The FinRobot source code is available at https://github.com/AI4Finance-Foundation/FinRobot.|\n",
"2405.14766": "|**2024-05-23**|**Evaluating Large Language Models for Public Health Classification and Extraction Tasks**|Joshua Harris et.al.|[2405.14766](http://arxiv.org/abs/2405.14766)|null|Rapid advances in large language models (LLMs) have generated strong interest in their potential to support expert work in public health. This study combines six externally annotated and seven internally annotated datasets to evaluate LLMs on text classification and extraction tasks related to health burden, epidemiological risk factors, and public health interventions. We first evaluate five open-source LLMs (7 to 70 billion parameters) using zero-shot in-context learning. Llama-3-70B-Instruct performs best, achieving the highest micro-F1 score on 15 of 17 tasks. Performance varies considerably across tasks: micro-F1 scores fall below 60% on some tasks, such as Contact Classification, while all models exceed 80% micro-F1 on others, such as GI Illness Classification. For a subset of 12 tasks we also evaluate GPT-4 and find results comparable to Llama-3-70B-Instruct, which scores equal to or higher than GPT-4 on 6 of them. Overall, these initial results suggest that LLMs may be useful tools for public health experts to extract information from a wide variety of free-text sources, supporting public health surveillance, research, and interventions.|\n",
"2405.14755": "|**2024-05-23**|**Large language models can be zero-shot anomaly detectors for time series?**|Sarah Alnegheimish 
et.al.|[2405.14755](http://arxiv.org/abs/2405.14755)|**[link](https://github.com/sintel-dev/sigllm)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u6267\u884c\u591a\u79cd\u4efb\u52a1\uff0c\u5305\u62ec\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u3002\u8fd9\u4e9b\u6a21\u578b\u7684\u7075\u6d3b\u6027\u4f7f\u5176\u9002\u7528\u4e8e\u4f17\u591a\u5e94\u7528\u3002\u672c\u6587\u63d0\u51fa\u4e00\u9879\u65b0\u9896\u7684\u7814\u7a76\uff0c\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u7684\u65f6\u95f4\u5e8f\u5217\u5f02\u5e38\u68c0\u6d4b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5bf9\u4e8e\u8bed\u8a00\u6a21\u578b\u800c\u8a00\uff0c\u8fd9\u6d89\u53ca\u8bc6\u522b\u8f93\u5165\u5e8f\u5217\uff08\u6216\u591a\u4e2a\u90e8\u5206\uff09\u4e2d\u7684\u5f02\u5e38\u70b9\uff0c\u4ee5\u53ca\u5904\u7406\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u800c\u975e\u4f20\u7edf\u7684\u6587\u672c\u8f93\u5165\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86sigllm\uff0c\u4e00\u4e2a\u4e13\u4e3a\u65f6\u95f4\u5e8f\u5217\u5f02\u5e38\u68c0\u6d4b\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5305\u542b\u5c06\u65f6\u95f4\u5e8f\u5217\u8f6c\u6362\u4e3a\u6587\u672c\u7684\u6a21\u5757\uff0c\u4ee5\u53ca\u7aef\u5230\u7aef\u7684\u6d41\u7a0b\uff0c\u7528\u4e8e\u5f15\u5bfc\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5f02\u5e38\u68c0\u6d4b\u3002\u6211\u4eec\u8bd5\u9a8c\u4e86\u4e24\u79cd\u6d4b\u8bd5\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u529b\u7684\u65b9\u6cd5\uff1a\u4e00\u662f\u76f4\u63a5\u63d0\u793a\u6a21\u578b\u6307\u51fa\u8f93\u5165\u4e2d\u7684\u5f02\u5e38\u5143\u7d20\uff1b\u4e8c\u662f\u5229\u7528\u8bed\u8a00\u6a21\u578b\u7684\u9884\u6d4b\u80fd\u529b\u6765\u8f85\u52a9\u68c0\u6d4b\u8fc7\u7a0b\u3002 
\u6211\u4eec\u572811\u4e2a\u6765\u81ea\u4e0d\u540c\u6765\u6e90\u7684\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6846\u67b6\uff0c\u4f7f\u7528\u4e8610\u79cd\u4e0d\u540c\u7684\u7ba1\u9053\u3002\u7ed3\u679c\u663e\u793a\uff0c\u9884\u6d4b\u65b9\u6cd5\u5728\u6240\u670911\u4e2a\u6570\u636e\u96c6\u4e2d\u90fd\u663e\u8457\u4f18\u4e8e\u63d0\u793a\u65b9\u6cd5\uff0c\u5c24\u5176\u662f\u5728F1\u5206\u6570\u4e0a\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u53d1\u73b0\u5f02\u5e38\uff0c\u4f46\u76ee\u524d\u7684\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4ecd\u5360\u4f18\uff0c\u5176\u8868\u73b0\u6bd4\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9ad8\u51fa30%\u3002|\n", "2405.15765": "|**2024-05-24**|**Scaling Laws for Discriminative Classification in Large Language Models**|Dean Wyatte et.al.|[2405.15765](http://arxiv.org/abs/2405.15765)|null|## \u80cc\u666f \u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6807\u5fd7\u7740\u673a\u5668\u5b66\u4e60\u6a21\u578b\u80fd\u529b\u7684\u4e00\u4e2a\u91cd\u5927\u98de\u8dc3\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5bf9\u5404\u79cd\u67e5\u8be2\u751f\u6210\u5408\u7406\u7684\u56de\u7b54\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u5728\u5ba2\u6237\u670d\u52a1\u5e94\u7528\u4e2d\u5177\u6709\u6f5c\u529b\u3002\u7136\u800c\uff0cLLMs\u5df2\u88ab\u89c2\u5bdf\u5230\u5b58\u5728\u80e1\u8a00\u4e71\u8bed\u7684\u95ee\u9898\uff0c\u8fd9\u5728\u77ed\u671f\u5185\u9650\u5236\u4e86\u5b83\u4eec\u5728\u5ba2\u6237\u670d\u52a1\u4e2d\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7cfb\u7edf\uff0c\u5c06\u8bed\u8a00\u5efa\u6a21\u4efb\u52a1\u91cd\u65b0\u6784\u60f3\u4e3a\u5206\u7c7b\u4efb\u52a1\uff0c\u4ee5\u5e2e\u52a9\u5ba2\u6237\u670d\u52a1\u4ee3\u8868\u9009\u62e9\u6700\u4f73\u7684\u6a21\u677f\u56de\u590d\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u4e3a\u5ba2\u670d\u4ee3\u8868\u63d0\u4f9b\u6700\u5408\u9002\u7684\u524dK\u4e2a\u5019\u9009\u56de\u590
d\u3002 ## \u4efb\u52a1\u63cf\u8ff0 \u6211\u4eec\u5c55\u793a\u4e86\u79bb\u7ebf\u548c\u5728\u7ebf\u5b9e\u9a8c\u7684\u7ed3\u679c\uff0c\u8bc1\u660e\u4e86\u5b9e\u9a8c\u7cfb\u7edf\u7684\u6709\u6548\u6027\uff0c\u79bb\u7ebf\u5b9e\u9a8c\u663e\u793a\u51fa\u6539\u8fdb\uff0c\u800c\u5728\u7ebf\u5b9e\u9a8c\u5219\u5e26\u6765\u4e86\u7edf\u8ba1\u663e\u8457\u7684\u6548\u679c\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5206\u4eab\u4e86\u901a\u8fc7\u6a21\u578b\u53c2\u6570\u8c03\u6574\u8fdb\u884c\u7684\u9a8c\u8bc1\u635f\u5931\u548c\u524dK\u7cbe\u5ea6\u7684\u5ea6\u91cf\u66f2\u7ebf\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u6a21\u578b\u5927\u5c0f\u3001\u5ef6\u8fdf\u548c\u51c6\u786e\u6027\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u53ef\u80fd\u7684\u5e94\u7528\u9886\u57df\u3002|\n", "2405.15739": "|**2024-05-24**|**Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias**|Andres Algaba et.al.|[2405.15739](http://arxiv.org/abs/2405.15739)|**[link](https://github.com/andresalgaba/llm_citation_patterns)**|\u8bba\u6587\u6458\u8981\uff1a 
\u5f15\u7528\u5b9e\u8df5\u5bf9\u4e8e\u6784\u5efa\u79d1\u5b66\u77e5\u8bc6\u7ed3\u6784\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5f80\u5f80\u53d7\u5230\u5f53\u4ee3\u89c4\u8303\u548c\u504f\u89c1\u7684\u5f71\u54cd\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u51fa\u73b0\uff0c\u8fd9\u4e00\u9886\u57df\u51fa\u73b0\u4e86\u65b0\u7684\u52a8\u6001\u3002\u7814\u7a76\u8005\u9996\u6b21\u63a2\u7d22\u4e86\u5b8c\u5168\u4f9d\u8d56\u53c2\u6570\u77e5\u8bc6\u800c\u975e\u57fa\u4e8e\u641c\u7d22\u6216\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7684\u63a8\u8350\u5f15\u7528\u7684\u7279\u6027\u53ca\u5176\u6f5c\u5728\u504f\u89c1\u3002\u5b9e\u9a8c\u4f7f\u7528\u4e86\u4e00\u7ec4\u5305\u542b166\u7bc7\u6765\u81eaAAAI\u3001NeurIPS\u3001ICML\u548cICLR\u7684\u8bba\u6587\uff0c\u8fd9\u4e9b\u8bba\u6587\u5728GPT-4\u7684\u77e5\u8bc6\u622a\u6b62\u65e5\u671f\u540e\u53d1\u8868\uff0c\u6d89\u53ca3,066\u4e2a\u5f15\u7528\u3002\u5b9e\u9a8c\u8ba9GPT-4\u4e3a\u533f\u540d\u6587\u672c\u4e2d\u7684\u5f15\u7528\u63d0\u4f9b\u5b66\u672f\u53c2\u8003\u3002\u7ed3\u679c\u63ed\u793a\u4e86\u4eba\u7c7b\u548c\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u5f15\u7528\u6a21\u5f0f\u60ca\u4eba\u76f8\u4f3c\uff0c\u4f46GPT-4\u663e\u793a\u51fa\u66f4\u5f3a\u7684\u9ad8\u5f15\u7528\u504f\u89c1\uff0c\u5373\u4f7f\u5728\u63a7\u5236\u4e86\u51fa\u7248\u5e74\u4efd\u3001\u6807\u9898\u957f\u5ea6\u3001\u4f5c\u8005\u6570\u91cf\u548c\u4f1a\u8bae\u7b49\u56e0\u7d20\u540e\u4f9d\u7136\u5b58\u5728\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0GPT-4\u751f\u6210\u7684\u65e2\u6709\u548c\u4e0d\u5b58\u5728\u5f15\u7528\u7684\u7279\u6027\u9ad8\u5ea6\u4e00\u81f4\uff0c\u8868\u660e\u6a21\u578b\u5185\u5316\u4e86\u5f15\u7528\u6a21\u5f0f\u3002\u901a\u8fc7\u5206\u6790\u5f15\u7528\u56fe\u8c31\uff0c\u663e\u793aGPT-4\u63a8\u8350\u7684\u5f15\u7528\u5d4c\u5165\u5728\u76f8\u5173\u5f15\u7528\u7f51\u7edc\u4e2d\uff0c\u6697\u793a\u5176\u5bf9\u6982\u5ff5\u7684\u6df1\u5165\u7406\u89e3\u3002\u5c3d\u7ba1\u8bed\u8a00\u6a21\u578b\u53ef\u4ee5\u8f85\u52a9
\u5f15\u7528\u751f\u6210\uff0c\u4f46\u5b83\u4eec\u4e5f\u53ef\u80fd\u653e\u5927\u73b0\u6709\u504f\u89c1\u5e76\u5f15\u5165\u65b0\u504f\u89c1\uff0c\u53ef\u80fd\u5f71\u54cd\u79d1\u5b66\u77e5\u8bc6\u7684\u4f20\u64ad\u3002\u6211\u4eec\u7684\u7ed3\u679c\u5f3a\u8c03\u4e86\u8bc6\u522b\u6a21\u578b\u504f\u89c1\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u5f00\u53d1\u5e73\u8861\u7684\u65b9\u6cd5\u4e0e\u8bed\u8a00\u6a21\u578b\u4e92\u52a8\u7684\u91cd\u8981\u6027\u3002|\n", "2405.15734": "|**2024-05-24**|**LM4LV: A Frozen Large Language Model for Low-level Vision Tasks**|Boyang Zheng et.al.|[2405.15734](http://arxiv.org/abs/2405.15734)|**[link](https://github.com/bytetriper/lm4lv)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6210\u529f\u50ac\u751f\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u7814\u7a76\u70ed\u6f6e\uff0c\u5b83\u4eec\u6b63\u5728\u6539\u53d8\u8ba1\u7b97\u673a\u89c6\u89c9\u9886\u57df\u7684\u591a\u4e2a\u7814\u7a76\u8303\u5f0f\u3002\u5c3d\u7ba1MLLMs\u5728\u8bf8\u5982\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u548c\u6587\u672c\u5230\u56fe\u50cf\u7b49\u9ad8\u7ea7\u89c6\u89c9\u548c Vision-and-Language 
\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5c1a\u65e0\u7814\u7a76\u63a2\u8ba8\u8fc7\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u5982\u4f55\u4ece\u8fd9\u4e9b\u6a21\u578b\u4e2d\u53d7\u76ca\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5f53\u524d\u5927\u591a\u6570MLLM\u7684\u8bbe\u8ba1\u4f7f\u5176\u5bf9\u4f4e\u7ea7\u7279\u5f81\u89c6\u800c\u4e0d\u89c1\uff0c\u56e0\u6b64\u5728\u89e3\u51b3\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u65b9\u9762\u5b58\u5728\u56fa\u6709\u9650\u5236\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa$\\textbf{LM4LV}$\uff0c\u8fd9\u662f\u4e00\u4e2a\u6846\u67b6\uff0c\u5b83\u5141\u8bb8\u4e00\u4e2a\u51bb\u7ed3\u7684LLM\u65e0\u9700\u4efb\u4f55\u591a\u6a21\u6001\u6570\u636e\u6216\u5148\u9a8c\u77e5\u8bc6\u5c31\u80fd\u89e3\u51b3\u4e00\u7cfb\u5217\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u3002\u8fd9\u7a81\u663e\u4e86LLMs\u5728\u4f4e\u7ea7\u89c6\u89c9\u9886\u57df\u7684\u5f3a\u5927\u6f5c\u529b\uff0c\u5e76\u5f25\u5408\u4e86MLLMs\u4e0e\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u6fc0\u53d1\u5bf9LLMs\u7684\u65b0\u89c6\u89d2\uff0c\u52a0\u6df1\u5bf9\u5176\u5de5\u4f5c\u673a\u5236\u7684\u7406\u89e3\u3002|\n", "2405.15729": "|**2024-05-24**|**Optimizing Large Language Models for OpenAPI Code Completion**|Bohdan Petryshyn et.al.|[2405.15729](http://arxiv.org/abs/2405.15729)|**[link](https://github.com/BohdanPetryshyn/openapi-completion-benchmark)**|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8fdb\u6b65\u6781\u5927\u5730\u6539\u53d8\u4e86\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u3002\u5c3d\u7ba1\u4e3b\u6d41\u7f16\u7a0b\u8bed\u8a00\u7684\u4ee3\u7801\u8865\u5168\u89e3\u51b3\u65b9\u6848\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u8f83\u5c11\u89c1\u7684\u683c\u5f0f\uff0c\u5982OpenAPI\u5b9a\u4e49\u65f6\u6027\u80fd\u6b20\u4f73\u3002\u672c\u7814\u7a76\u8bc4\u4f30\u4e86GitHub 
Copilot, a popular commercial code completion tool, on OpenAPI completion and proposes a set of task-specific optimizations for Meta's open-source Code Llama model. The study designs a semantics-aware OpenAPI completion benchmark and uses experiments to analyze how different prompt-engineering and fine-tuning techniques affect Code Llama's performance. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot while using only 1/25 as many parameters as the commercial solution, which is based on the Codex model. The study also refines a widely used code-infilling training technique, addressing the model's underperformance when it is prompted with a context shorter than the one used during training.|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684](http://arxiv.org/abs/2405.15684)|null|To bridge the gap between the visual and language modalities, multimodal large language models (Multimodal Large Language 
Models, MLLMs) typically learn an adapter that converts visual inputs into tokens that large language models (LLMs) can understand. However, most adapters generate relatively fixed visual tokens that do not take into account the specific objects mentioned in the prompt. Because these adapters pay equal attention to every detail in the image and tend to process the whole scene, they may increase the cognitive load on the LLM when it handles complex scenes. To address this, we propose the prompt-aware adapter, which is designed to embed visual inputs dynamically according to the specific focus of the prompt. Specifically, the prompt-aware adapter uses both global and local textual features to capture the visual cues most relevant to the prompt at coarse and fine granularity. This approach significantly improves the LLM's ability to understand and interpret visual content. Experiments on various visual question answering tasks, such as counting and position reasoning, validate the effectiveness of the prompt-aware adapter.|\n", "2405.15668": "|**2024-05-24**|**What Do You See? 
Enhancing Zero-Shot Image Classification with Multimodal Large Language Models**|Abdelrahman Abdelhamed et.al.|[2405.15668](http://arxiv.org/abs/2405.15668)|null|This paper explores how to use large language models (LLMs) for zero-shot image classification. The authors propose a simple yet effective method that applies multimodal LLMs to the input image to generate comprehensive textual representations. These textual representations are converted into fixed-dimensional features in a cross-modal embedding space and used together for zero-shot classification, without designing complex prompts for each dataset. The authors use a single universal prompting strategy rather than tuning prompts per dataset. Experiments show that the method performs well across multiple datasets and improves accuracy over prior methods. Averaged over ten benchmarks, it improves accuracy by 4.1 percentage points over conventional methods, with a gain of 6.8 percentage points on ImageNet. This suggests that multimodal LLMs have the potential to substantially enhance computer vision tasks such as zero-shot image classification, offering a significant improvement over existing techniques.|\n", "2405.15662": "|**2024-05-24**|**Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning**|Wenhan 
Chang et.al.|[2405.15662](http://arxiv.org/abs/2405.15662)|null|In the era of AI, users may ask AI companies to remove their information from training datasets because of privacy concerns. For a model owner, retraining the model consumes substantial computational resources, so machine unlearning techniques have emerged to allow the requested training data or classes to be removed with minimal impact on model performance. However, for large-scale complex data such as images or text, unlearning a class from a model can degrade performance, because the link between the class and the model is hard to identify. To address this, we propose using concepts, rather than image features or text tokens, to represent the semantic information of the class to be unlearned, which helps cut the link between the model and the class and removes its influence completely. 
To analyze the impact of concepts in complex data, we adopt the Post-hoc Concept Bottleneck Model and Integrated Gradients to precisely identify concepts across different classes. We then propose unlearning methods based on data poisoning with random and targeted labels. We test our methods on both image classification models and large language models (LLMs), and the results consistently show that the proposed methods accurately erase the targeted information from models while largely preserving model performance.|\n", "2405.15652": "|**2024-05-24**|**$$\\mathbf{L^2\\cdot M = C^2}$$ Large Language Models as Covert Channels... 
a Systematic Analysis**|Simen Gaure et.al.|[2405.15652](http://arxiv.org/abs/2405.15652)|null|In recent years, large language models (LLMs) have attracted wide attention for their strong performance on tasks such as translation, prediction, and content generation. At the same time, the research community has shown that LLMs are vulnerable to attacks but can also improve the security of systems. But how well do open-source LLMs serve as a covert communication medium, for example to support censorship-resistant communication? This paper takes an experimental approach, empirically measuring the security and capacity of an open-source LLM (Llama-7B) to assess how effective it is as a covert channel. Although the results indicate that channels built on such models are unlikely to achieve high practical bitrates, which depend on message length and model entropy, we find that the probability of an adversary detecting the covert communication is low. To keep the results broadly applicable, we use a simple and intuitive scheme and assume that the model is publicly available.|\n", "2405.15646": "|**2024-05-24**|**LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots**|Ruoyu Wang 
et.al.|[2405.15646](http://arxiv.org/abs/2405.15646)|null|The demand for general purpose service robots in daily life requires robots to execute a variety of basic behaviors appropriately. Recent advances in training large language models (LLMs) make it possible to generate action sequences directly from natural language instructions without additional domain knowledge. However, although LLM outputs are semantically correct, the generated task plans may not map precisely onto admissible actions and may contain various linguistic ambiguities. LLM hallucination poses a challenge for robot task planning, since it can produce content that is inconsistent with real-world facts or user input. To address this, we propose a task planning method based on constrained LLM prompting that generates an executable action sequence from a command. We further design an exception handling module to deal with LLM hallucination and ensure that the generated results are admissible in the current environment. We evaluate our method on commands generated by the RoboCup@Home Command Generator, and the results show that the robot performs well in both understanding and executing tasks.|\n", 
"2405.15640": "|**2024-05-24**|**GECKO: Generative Language Model for English, Code and Korean**|Sungwoo Oh et.al.|[2405.15640](http://arxiv.org/abs/2405.15640)|null|We introduce GECKO, a bilingual large language model (LLM) designed for Korean and English, along with programming languages. It adopts the LLaMA architecture and is pretrained on a balanced, high-quality corpus of Korean and English. This report describes several of our efforts to build the data pipeline and train the model. Despite its small vocabulary, GECKO is highly efficient at generating tokens in both Korean and English. We evaluate it on representative benchmarks: it performs strongly on KMMLU (Korean MMLU) and modestly in English and code, even though it was trained on fewer tokens than English-focused LLMs. GECKO is available to the open-source community under a permissive license, and we hope it provides a research baseline and practical insights for Korean LLM research. The model is available at: https://huggingface.co/kifai/GECKO-7B.|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430](http://arxiv.org/abs/2405.17430)|null|## Background 
Large multimodal models (LMMs) such as LLaVA perform well at visual-linguistic reasoning. These models first embed images into a fixed, large number of visual tokens and then feed them into a large language model (LLM). However, this design produces an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, which makes it inefficient. Token pruning and merging methods exist, but they produce a single-length output for each image and cannot flexibly trade off information density against efficiency. Inspired by the concept of Matryoshka dolls, we propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information at multiple coarse-to-fine granularities. ## Task Our approach offers several unique benefits for LMMs: (1) users can explicitly control the visual granularity for each test instance, for example adjusting the number of tokens used to represent an image according to the complexity or simplicity of the content; (2) 
M3 provides a framework for analyzing the granularity that existing datasets require, and we find that benchmarks such as COCO need only about 9 visual tokens to reach accuracy comparable to that of using all 576 tokens; (3) our approach provides a foundation for exploring the best trade-off between performance and visual token length, and our study shows that a large gap exists between current fixed-scale representations and the ideal upper bound.|\n", 
"2405.17428": "|**2024-05-27**|**NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models**|Chankyu Lee et.al.|[2405.17428](http://arxiv.org/abs/2405.17428)|null|This paper introduces NV-Embed, a new large language model designed to improve the performance of decoder-only LLMs on text embedding tasks, including dense vector retrieval. NV-Embed combines a variety of architectural designs and training procedures to significantly enhance the model's versatility and performance while preserving its simplicity and reproducibility. On the architecture side, we introduce a latent attention layer to obtain pooled embeddings, which improves both retrieval and downstream task accuracy compared with mean pooling or using the last token embedding from the LLM. To improve representation learning, we remove the causal attention mask of the LLM during contrastive training, allowing fuller information exchange. For training, we use a two-stage contrastive instruction-tuning method. The first stage applies instruction-based training on retrieval datasets, using in-batch negatives and curated hard negative examples. The second stage blends data from various non-retrieval tasks into instruction tuning, which improves non-retrieval task accuracy and also retrieval performance. With these techniques, NV-Embed, using only publicly available data, achieves a record-high score of 69.32 and ranks No. 1 on the Massive Text Embedding Benchmark (MTEB) (as of May 24, 2024), which covers 56 tasks spanning retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, our model also attains the highest score of 59.36 on the 15 BEIR retrieval tasks. The NV-Embed model will be open-sourced at: https://huggingface.co/nvidia/NV-Embed-v1.|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang 
et.al.|[2405.17427](http://arxiv.org/abs/2405.17427)|**[link](https://github.com/kuanchihhuang/reason3d)**|Recent advances in multimodal large language models (LLMs) have shown great potential in areas such as conceptual reasoning, yet their application to understanding 3D environments remains limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, supporting advanced tasks such as 3D reasoning segmentation, hierarchical searching, referring expressions, and question answering with detailed mask outputs. In particular, we design a hierarchical mask decoder to precisely locate small objects within expansive scenes. The decoder first generates a coarse location estimate covering the object's general area, then applies a coarse-to-fine refinement strategy that significantly improves the precision of object identification and segmentation. Experiments show that Reason3D achieves strong results on large-scale datasets such as ScanNet and Matterport3D for 3D referring expressions, 3D question answering, and 3D reasoning segmentation. Code and models are available at: https://github.com/KuanchihHuang/Reason3D.|\n", "2405.17424": "|**2024-05-27**|**LARM: Large 
Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because embodied agents must interact with the real world, they need comprehensive prior knowledge, long-horizon planning capability, and fast response speed. Although recent agents based on large language models (LLMs) perform well, they still have limitations. For example, the output of an LLM is usually a descriptive sentence, which can be ambiguous when determining a specific action. To address these limitations, we introduce the large auto-regressive model (LARM). LARM takes both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format named auto-regressive node transmission structure and build a corresponding dataset. With two-phase training, LARM successfully harvests enchanted equipment in Minecraft, which requires more complex decision-making chains than the highest achievements of prior best methods. LARM is also the fastest, running 6.8x faster than previous methods.|\n", "2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu 
et.al.|[2405.17418](http://arxiv.org/abs/2405.17418)|null|Robot manipulation policies often perform poorly when confronted with novel tasks or object instances. The ability to automatically detect and self-correct failed actions is therefore essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities across a variety of tasks. To use a general MLLM as an end-to-end robotic agent, we introduce a Self-Corrected (SC)-MLLM, which enables the model not only to predict end-effector poses but also to autonomously recognize and correct failed actions. First, we apply parameter-efficient fine-tuning to give the MLLM pose prediction ability, reframing pose prediction as a language modeling problem. When an execution fails, the model identifies the cause of the low-level action error (such as position and rotation errors) and actively seeks prompts from experts. Based on the feedback, SC-MLLM rethinks the current failure scene and generates corrected actions. In addition, we design a continuous policy learning method on successfully corrected samples, which improves the model's adaptability to the current scene configuration and reduces the frequency of expert intervention. To evaluate SC-MLLM, we run extensive experiments in both simulation and real-world settings. Compared with the previous state-of-the-art robotic MLLM (ManipLLM), SC-MLLM significantly improves manipulation accuracy: from 57% to 79% on seen object categories and from 47% to 69% on unseen novel categories.|\n", 
"2405.17402": "|**2024-05-27**|**THREAD: Thinking Deeper with Recursive Spawning**|Philip Schroeder et.al.|[2405.17402](http://arxiv.org/abs/2405.17402)|**[link](https://github.com/philipmit/thread)**|Large language models (LLMs) show impressive capabilities across many settings but still struggle as the length and complexity of the context increase. To address this, we propose Thinking Recursively and 
Dynamically (ThReaD). ThReaD frames model generation as a thread of execution that, depending on the context, can run to completion or dynamically spawn new threads. By spawning, threads can offload work (such as thinking or retrieving information) to child threads, which return only the tokens the parent thread needs, letting the model adapt the amount of intermediate work used to produce tokens. We apply ThReaD to task solving and question answering, where it lets the model recursively decompose a given task or question into progressively simpler sub-problems that are solved by separate child threads. We implement ThReaD with few-shot learning and evaluate it with GPT-4 and GPT-3.5 on several benchmarks, including ALFWorld, TextCraft, and WebShop, along with two new benchmarks, DataCommons QA and MIMIC-III ICU QA. ThReaD achieves state-of-the-art performance on these benchmarks, and even with smaller models such as Llama-3-8b and CodeLlama-7b it outperforms existing frameworks by 10% to 50% absolute points.|\n", "2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386](http://arxiv.org/abs/2405.17386)|**[link](https://github.com/cone-mt/mindmerger)**|## Task 
Reasoning is a crucial capability of large language models (LLMs), but there is a notable gap between English and non-English languages. Some work fine-tunes LLMs to relearn reasoning in non-English languages, while other work replaces non-English inputs with the output of an external model, such as English translations, to get around the difficulty LLMs have in understanding non-English text. However, these methods often underuse the reasoning and language understanding capabilities already built into LLMs. To make better use of them, we propose a new method, MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to improve multilingual reasoning performance. We also introduce a two-step training scheme that first embeds the external capabilities into the LLM and then trains the external and built-in capabilities to be used collaboratively. Experiments on three multilingual reasoning datasets and one language understanding dataset show that MindMerger consistently outperforms all baselines, especially on low-resource languages. Without updating the LLM's parameters, average accuracy on the MGSM dataset improves by 6.7% across all languages and by 8.0% on low-resource languages.|\n", 
"2405.17382": "|**2024-05-27**|**ReMoDetect: Reward Models Recognize Aligned LLM's Generations**|Hyunseok Lee et.al.|[2405.17382](http://arxiv.org/abs/2405.17382)|**[link](https://github.com/hyunseoklee-ai/reward_llm_detect)**|The remarkable capabilities and easy accessibility of large language models (LLMs) have increased societal risks such as fake news generation, motivating methods that detect LLM-generated text (LGT) for safe use. However, because there are so many LLMs, identifying the characteristics of each one individually is impractical. This work therefore focuses on a property these powerful models share, namely alignment training, that is, training LLMs to generate text that humans prefer. Our key finding is that, because aligned LLMs are trained to maximize human preference, they generate text with even higher estimated preference than human-written text, so such text is easily detected using a reward model (an LLM trained to model the human preference distribution). 
\u57fa\u4e8e\u8fd9\u4e00\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e24\u79cd\u8fdb\u4e00\u6b65\u589e\u5f3a\u504f\u597d\u6a21\u578b\u68c0\u6d4b\u80fd\u529b\u7684\u8bad\u7ec3\u7b56\u7565\uff1a\uff081\uff09\u6301\u7eed\u504f\u597d\u5fae\u8c03\uff0c\u4f7f\u6a21\u578b\u66f4\u504f\u5411\u4e8e\u8bc6\u522b\u5bf9\u9f50\u7684LLG\uff1b\uff082\uff09\u5956\u52b1\u6a21\u578b\u5bf9\u4eba/LLM\u6df7\u5408\u6587\u672c\u7684\u5b66\u4e60\uff0c\u5373\u4f7f\u7528\u5bf9\u9f50LLM\u91cd\u8ff0\u7684\u4eba\u7c7b\u539f\u521b\u6587\u672c\uff0c\u8fd9\u662f\u4e00\u79cd\u4ecb\u4e8eLGT\u548c\u4eba\u7c7b\u6587\u672c\u4e4b\u95f4\u7684\u504f\u597d\u57fa\u51c6\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u5b66\u4e60\u51b3\u7b56\u8fb9\u754c\u3002\u6211\u4eec\u5728\u516d\u4e2a\u6587\u672c\u9886\u57df\u548c\u5341\u4e8c\u79cd\u5bf9\u9f50LLM\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5728https://github.com/hyunseoklee-ai/reward_llm_detect\u4e0a\u63d0\u4f9b\u3002|\n", "2405.17378": "|**2024-05-27**|**RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects**|Ahmed Allam et.al.|[2405.17378](http://arxiv.org/abs/2405.17378)|**[link](https://github.com/AUCOHL/RTL-Repo)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8f85\u52a9\u8fdb\u884c\u5bc4\u5b58\u5668\u4f20\u8f93\u7ea7\uff08Register Transfer Level, 
RTL\uff09\u8bbe\u8ba1\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5728\u53cd\u6620\u771f\u5b9e\u4e16\u754cRTL\u9879\u76ee\u590d\u6742\u6027\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u4e3a\u6b64\uff0c\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u57fa\u51c6\u2014\u2014RTL-Repo\uff0c\u4e13\u4e3a\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5927\u89c4\u6a21RTL\u8bbe\u8ba1\u9879\u76ee\u4e2d\u7684\u6027\u80fd\u800c\u8bbe\u8ba1\u3002RTL-Repo\u5305\u542b\u4e86\u4eceGitHub\u516c\u5171\u4ed3\u5e93\u63d0\u53d6\u7684\u8d85\u8fc74000\u4e2aVerilog\u4ee3\u7801\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u63d0\u4f9b\u4e86\u5bf9\u5e94\u4ed3\u5e93\u7684\u5b8c\u6574\u4e0a\u4e0b\u6587\u3002\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u3001GPT-3.5\u3001Starcoder2\u4ee5\u53ca\u50cfVeriGen\u548cRTLCoder\u8fd9\u6837\u7684Verilog\u4e13\u7528\u6a21\u578b\u5728\u5185\u7684\u591a\u6b3e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728RTL-Repo\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6bd4\u8f83\u5b83\u4eec\u5728\u751f\u6210\u590d\u6742\u9879\u76ee\u7684Verilog\u4ee3\u7801\u65b9\u9762\u7684\u8868\u73b0\u3002RTL-Repo\u4e3a\u786c\u4ef6\u8bbe\u8ba1\u793e\u533a\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5b9d\u8d35\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8bc4\u4f30\u548c\u6bd4\u8f83\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645RTL\u8bbe\u8ba1\u573a\u666f\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u9488\u5bf9\u590d\u6742\u7684\u591a\u6587\u4ef6RTL\u9879\u76ee\u4e13\u95e8\u8bad\u7ec3Verilog\u4ee3\u7801\u751f\u6210\u3002RTL-Repo\u662f\u5f00\u6e90\u7684\uff0c\u5df2\u5728GitHub\u4e0a\u516c\u5f00\u53ef\u7528\u3002|\n", "2405.17374": "|**2024-05-28**|**Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models**|ShengYun Peng et.al.|[2405.17374](http://arxiv.org/abs/2405.17374)|**[link](https://github.com/shengyun-peng/llm-landscape)**|
\u5b89\u5168\u6821\u51c6\u662f\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u884c\u4e3a\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u5e76\u907f\u514d\u6709\u5bb3\u884c\u4e3a\u7684\u5173\u952e\uff0c\u4f46\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u4ec5\u4f7f\u7528\u5c11\u91cf\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u8bad\u7ec3\u6837\u672c\u6765\u5fae\u8c03\u6a21\u578b\u53ef\u80fd\u5bfc\u81f4\u5b89\u5168\u6027\u88ab\u8f7b\u6613\u7834\u574f\u3002\u6211\u4eec\u81f4\u529b\u4e8e\u901a\u8fc7\u63a2\u7d22LLM\u7684\u5b89\u5168\u666f\u89c2\u6765\u8bc4\u4f30\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u98ce\u9669\u3002\u6211\u4eec\u53d1\u73b0\u4e86\u4e00\u4e2a\u666e\u904d\u5b58\u5728\u4e8e\u6d41\u884c\u5f00\u6e90LLM\u6a21\u578b\u53c2\u6570\u7a7a\u95f4\u4e2d\u7684\u65b0\u73b0\u8c61\uff0c\u79f0\u4e3a\u201c\u5b89\u5168\u76c6\u5730\u201d\uff1a\u968f\u673a\u6270\u52a8\u6a21\u578b\u6743\u91cd\u80fd\u4f7f\u6a21\u578b\u5728\u5c40\u90e8\u533a\u57df\u4fdd\u6301\u539f\u59cb\u6821\u51c6\u6a21\u578b\u7684\u5b89\u5168\u6027\u3002
\u6211\u4eec\u7684\u53d1\u73b0\u542f\u53d1\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5b89\u5168\u5ea6\u91cf\u65b9\u6cd5\u2014\u2014VISAGE\uff0c\u5b83\u901a\u8fc7\u63a2\u6d4b\u6a21\u578b\u7684\u5b89\u5168\u666f\u89c2\u6765\u8bc4\u4f30LLM\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u5b89\u5168\u6027\u3002\u53ef\u89c6\u5316\u6821\u51c6\u6a21\u578b\u7684\u5b89\u5168\u666f\u89c2\u6709\u52a9\u4e8e\u7406\u89e3\u5fae\u8c03\u5982\u4f55\u4f7f\u6a21\u578b\u504f\u79bb\u5b89\u5168\u76c6\u5730\uff0c\u4ece\u800c\u635f\u5bb3\u5b89\u5168\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u7cfb\u7edf\u63d0\u793a\u5728\u4fdd\u62a4\u6a21\u578b\u65b9\u9762\u7684\u91cd\u8981\u6027\uff0c\u8fd9\u79cd\u4fdd\u62a4\u751a\u81f3\u4f1a\u4f20\u9012\u7ed9\u5904\u4e8e\u5b89\u5168\u76c6\u5730\u5185\u7684\u6270\u52a8\u7248\u672c\u3002\u8fd9\u4e9b\u4ece\u5b89\u5168\u666f\u89c2\u7814\u7a76\u4e2d\u5f97\u51fa\u7684\u89c1\u89e3\u4e3a\u672a\u6765LLM\u5b89\u5168\u9886\u57df\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b0\u7684\u6d1e\u89c1\u3002|\n", "2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414](http://arxiv.org/abs/2405.18414)|null|\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval Augmented Generation\uff0cRAG\uff09\u901a\u8fc7\u7ed3\u5408\u73b0\u6709\u6587\u6863\u7684\u4e0a\u4e0b\u6587\u663e\u8457\u63d0\u5347\u4e86\u5927\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u7684\u54cd\u5e94\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u6587\u6863\u4e0e\u95ee\u9898\u4e0a\u4e0b\u6587\u7684\u76f8\u5173\u6027\u4e0d\u660e\u663e\u6216\u5b58\u5728\u90e8\u5206\u4fe1\u606f\u65f6\uff0cRAG\u7684\u6548\u679c\u5982\u4f55\uff1f\u53c8\u8be5\u5982\u4f55\u5904\u7406\u6587\u6863\u4e4b\u95f4\u7684\u5173\u8054\u6027\u5462\uff1f\u672c\u7814\u7a76\u65e8\u5728\u89e3\u7b54RAG\u751f\u6210\u4e2d\u7684\u8fd9\u4e24\u4e2a\u6838\u5fc3\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aG-RAG\u7684\u65b9\u6cd5\uff0c\u5b83\u662f\u4e00\u4e2a\u57fa\u4e8e\u56fe\u795e\u7ecf\u7f51\u7edc\uff08Graph Neural Networks\uff0cGNNs\uff09\u7684\u91cd\u6392\u5668\uff0c\u4ecb\u4e8eRAG\u7684\u68c0\u7d22\u5668\u548c\u9605\u8bfb\u5668\u4e4b\u95f4\u3002G-RAG\u7ed3\u5408\u4e86\u6587\u6863\u4e4b\u95f4\u7684\u8fde\u63a5\u6027\u548c\u8bed\u4e49\u4fe1\u606f\uff08\u901a\u8fc7\u62bd\u8c61\u610f\u4e49\u8868\u793a\u56fe\uff09\uff0c\u4e3aRAG\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5177\u6709\u4e0a\u4e0b\u6587\u611f\u77e5\u7684\u6392\u540d\u5668\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cG-RAG\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u9886\u5148\u65b9\u6cd5\uff0c\u540c\u65f6\u8ba1\u7b97\u5f00\u9500\u66f4\u5c0f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86PaLM 2\u4f5c\u4e3a\u91cd\u6392\u5668\u7684\u8868\u73b0\uff0c\u53d1\u73b0\u5176\u660e\u663e\u900a\u8272\u4e8eG-RAG\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5373\u4f7f\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u91cd\u6392\u5728RAG\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2405.18386": "|**2024-05-28**|**Instruct-MusicGen: Unlocking Text-to-Music 
Editing for Music Language Models via Instruction Tuning**|Yixiao Zhang et.al.|[2405.18386](http://arxiv.org/abs/2405.18386)|**[link](https://github.com/ldzhangyx/instruct-MusicGen)**|**\u5728\u6587\u672c\u5230\u97f3\u4e50\u7f16\u8f91\u9886\u57df\uff0c\u8fd1\u671f\u7684\u8fdb\u6b65\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u6765\u6539\u53d8\u97f3\u4e50\u98ce\u683c\u6216\u8c03\u6574\u4e50\u5668\u5143\u7d20\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u9700\u8981\u4ece\u5934\u8bad\u7ec3\u7279\u5b9a\u7684\u7f16\u8f91\u6a21\u578b\uff0c\u8017\u65f6\u4e14\u8d44\u6e90\u5bc6\u96c6\uff0c\u8981\u4e48\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9884\u6d4b\u7f16\u8f91\u540e\u7684\u97f3\u4e50\uff0c\u5bfc\u81f4\u97f3\u9891\u91cd\u5efa\u4e0d\u591f\u7cbe\u786e\u3002\u4e3a\u4e86\u7ed3\u5408\u4f18\u70b9\u5e76\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Instruct-MusicGen\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u9488\u5bf9\u9884\u8bad\u7ec3\u7684MusicGen\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u9ad8\u6548\u5730\u6267\u884c\u7f16\u8f91\u6307\u4ee4\uff0c\u5982\u6dfb\u52a0\u3001\u5220\u9664\u6216\u5206\u79bb\u97f3\u8f68\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fee\u6539\u4e86\u539f\u59cbMusicGen\u67b6\u6784\uff0c\u5f15\u5165\u4e86\u6587\u672c\u878d\u5408\u6a21\u5757\u548c\u97f3\u9891\u878d\u5408\u6a21\u5757\uff0c\u4f7f\u6a21\u578b\u80fd\u591f\u540c\u65f6\u5904\u7406\u6307\u4ee4\u6587\u672c\u548c\u97f3\u9891\u8f93\u5165\uff0c\u751f\u6210\u6240\u9700\u7684\u7f16\u8f91\u97f3\u4e50\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0cInstruct-MusicGen\u4ec5\u5411\u539f\u59cb\u6a21\u578b\u589e\u52a0\u4e868%\u7684\u65b0\u53c2\u6570\uff0c\u5e76\u57285000\u6b65\u7684\u8bad\u7ec3\u540e\uff0c\u5176\u6027\u80fd\u8d85\u8d8a\u73b0\u6709\u57fa\u51c6\uff0c\u4e14\u8868\u73b0\u51fa\u4e0e\u4e13\u95e8\u9488\u5bf9\u4efb\u52a1\u8bad\u7ec3\u7684\u6a21\u578b\u76f8\u5f53\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u8fdb\u5c55\u4e0d\u4ec5\u63d0
\u9ad8\u4e86\u6587\u672c\u5230\u97f3\u4e50\u7f16\u8f91\u7684\u6548\u7387\uff0c\u8fd8\u62d3\u5bbd\u4e86\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u5728\u52a8\u6001\u97f3\u4e50\u5236\u4f5c\u73af\u5883\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002**|\n", "2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380](http://arxiv.org/abs/2405.18380)|**[link](https://github.com/pixeli99/owlore)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\u3002\u7136\u800c\uff0c\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u6216\u5fae\u8c03\u5e26\u6765\u4e86\u5de8\u5927\u6311\u6218\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u7b49\u53c2\u6570\u9ad8\u6548\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u4f46\u5f80\u5f80\u727a\u7272\u6027\u80fd\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5185\u5b58\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\u2014\u2014Outlier-weighed Layerwise Sampled Low-Rank Projection\uff08OwLore\uff09\uff0c\u5b83\u53d7\u5230LLMs\u5c42\u95f4\u5f02\u5e38\u5206\u5e03\u7684\u542f\u53d1\uff0c\u901a\u8fc7\u52a8\u6001\u91c7\u6837\u9884\u8bad\u7ec3\u5c42\u800c\u975e\u6dfb\u52a0\u989d\u5916\u9002\u914d\u5668\u6765\u8fdb\u884c\u5fae\u8c03\u3002\u6211\u4eec\u9996\u5148\u901a\u8fc7Heavy-Tailed 
Self-Regularization\u7406\u8bba\uff08HT-SR\uff09\u89e3\u8bfb\u5f02\u5e38\u73b0\u8c61\uff0c\u53d1\u73b0\u5177\u6709\u66f4\u591a\u5f02\u5e38\u503c\u7684\u5c42\u66f4\u503e\u5411\u4e8e\u5448\u73b0\u957f\u5c3e\u5206\u5e03\uff0c\u8bad\u7ec3\u6548\u679c\u66f4\u597d\u3002\u56e0\u6b64\uff0cOwLore\u7b56\u7565\u6027\u5730\u4e3a\u5f02\u5e38\u503c\u8f83\u591a\u7684\u5c42\u5206\u914d\u66f4\u9ad8\u7684\u91c7\u6837\u6982\u7387\uff0c\u4ee5\u66f4\u597d\u5730\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u77e5\u8bc6\u3002 \u4e3a\u4e86\u8fdb\u4e00\u6b65\u51cf\u5c11\u5fae\u8c03\u65f6\u7684\u5185\u5b58\u9700\u6c42\uff0c\u6211\u4eec\u7ed3\u5408\u68af\u5ea6\u4f4e\u79e9\u6295\u5f71\uff0c\u4f7f\u5f97\u6bcf\u4e00\u5c42\u80fd\u4ee5\u4f4e\u79e9\u65b9\u5f0f\u9ad8\u6548\u8bad\u7ec3\u3002\u901a\u8fc7\u878d\u5408\u4f4e\u79e9\u4f18\u52bf\u548c\u6700\u4f18\u5c42\u522b\u91c7\u6837\u7b56\u7565\uff0cOwLore\u663e\u8457\u4f18\u5316\u4e86LLM\u526a\u679d\u4e2d\u7684\u5185\u5b58-\u6027\u80fd\u6743\u8861\u3002\u6211\u4eec\u5728\u591a\u4e2a\u67b6\u6784\uff0c\u5982LLaMa2\u3001LLaMa3\u548cMistral\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cOwLore\u6301\u7eed\u4f18\u4e8e\u57fa\u7840\u65b9\u6cd5\uff0c\u5305\u62ec\u5168\u91cf\u5fae\u8c03\u3002\u4f8b\u5982\uff0c\u5728\u5e38\u8bc6\u63a8\u7406\u57fa\u51c6\u4e0a\uff0cOwLore\u53ef\u5b9e\u73b0\u5e73\u57471.1%\u7684\u7cbe\u5ea6\u63d0\u5347\uff0cMMLU\u4e0a\u63d0\u9ad83.0%\uff0c\u800c\u5728MT-Bench\u4e0a\u66f4\u662f\u6709\u663e\u8457\u768410%\u63d0\u5347\uff0c\u540c\u65f6\u5185\u5b58\u6548\u7387\u66f4\u9ad8\u3002\u7279\u522b\u5730\uff0cOwLore\u4ec5\u970021GB\u5185\u5b58\u5373\u53ef\u5bf9LLaMa2-7B\u8fdb\u884c\u5fae\u8c03\u3002**|\n", "2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah 
et.al.|[2405.18377](http://arxiv.org/abs/2405.18377)|null|\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u3001\u590d\u6742\u63a8\u7406\u3001\u60c5\u611f\u5206\u6790\u7b49\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u63a8\u52a8\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5f3a\u5927\u7684\u529f\u80fd\u4f34\u968f\u7740\u5de8\u5927\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u6210\u672c\uff0c\u9650\u5236\u4e86\u5728\u5927\u591a\u6570\u786c\u4ef6\u5e73\u53f0\u4e0a\u7684\u4f7f\u7528\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u57fa\u4e8eLLaMA2-7B\u8fdb\u884c\u5355\u6b21\u5fae\u8c03\u540e\uff0c\u901a\u8fc7\u9057\u4f20\u7b97\u6cd5\u641c\u7d22\u627e\u5230\u66f4\u5c0f\u3001\u8ba1\u7b97\u590d\u6742\u5ea6\u66f4\u4f4e\u7684\u7f51\u7edc\u67b6\u6784\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5bf9\u4e8e\u67d0\u4e9b\u6807\u51c6\u57fa\u51c6\u4efb\u52a1\uff0c\u9884\u8bad\u7ec3\u7684LLaMA2-7B\u6a21\u578b\u5b9e\u9645\u4e0a\u8fc7\u4e8e\u5e9e\u5927\u4e14\u590d\u6742\u3002\u6211\u4eec\u5b9e\u73b0\u4e861.5\u500d\u7684\u6a21\u578b\u5927\u5c0f\u7f29\u51cf\u548c1.3\u500d\u7684\u541e\u5410\u91cf\u63d0\u5347\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51e0\u4e4e\u65e0\u635f\u7684\u51c6\u786e\u6027\u3002\u76f8\u8f83\u4e8e\u67d0\u4e9b\u526a\u679d\u6216\u7a00\u758f\u5316\u6280\u672f\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6548\u7387\u548c\u6548\u679c\u4e0a\u66f4\u4e3a\u4f18\u8d8a\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u91cf\u5316\u4e0e\u6211\u4eec\u7684\u65b9\u6cd5\u76f8\u7ed3\u5408\u7684\u6548\u679c\uff0c\u8fdb\u4e00\u6b65\u901a\u8fc7\u91cf\u5316\u51cf\u5c11\u4e86\u627e\u5230\u7684\u7f51\u7edc\u7684\u5927\u5c0f\u548c\u590d\u6742\u6027\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u672c\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u79cd\u81ea\u52a8\u521b\u5efa\u53ef\u5728\u66f4\u5ec9\u4ef7\u548c\u5e7f\u6cdb\u53ef\u7528\u786c\u4ef6
\u5e73\u53f0\u4e0a\u4f7f\u7528\u7684LLMs\u7684\u65b9\u6cd5\u3002|\n", "2405.18376": "|**2024-05-28**|**Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning**|Dongjie Chen et.al.|[2405.18376](http://arxiv.org/abs/2405.18376)|**[link](https://github.com/Dong-Jie-Chen/RCL)**|**\u65e0\u6e90\u9886\u57df\u9002\u5e94\uff08SFDA\uff09\u7684\u76ee\u6807\u662f\u4ec5\u4f7f\u7528\u672a\u6807\u8bb0\u7684\u9776\u57df\u6570\u636e\u6765\u8c03\u6574\u9884\u8bad\u7ec3\u7684\u6e90\u6a21\u578b\u3002\u5f53\u524d\u7684SFDA\u65b9\u6cd5\u5728\u6709\u6548\u5229\u7528\u9884\u8bad\u7ec3\u77e5\u8bc6\u548c\u6316\u6398\u9776\u57df\u6570\u636e\u6f5c\u529b\u65b9\u9762\u9762\u4e34\u6311\u6218\u3002\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7406\u89e3\u89c6\u89c9\u548c\u6587\u672c\u4fe1\u606f\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5e94\u7528\u4e8eSFDA\u65f6\u5b58\u5728\u95ee\u9898\uff0c\u5982\u6307\u4ee4\u6267\u884c\u5931\u8d25\u3001\u8ba1\u7b97\u9700\u6c42\u9ad8\u4ee5\u53ca\u5728\u9002\u5e94\u524d\u6027\u80fd\u8bc4\u4f30\u56f0\u96be\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014\u57fa\u4e8e\u53ef\u9760\u6027\u7684\u8bfe\u7a0b\u5b66\u4e60\uff08RCL\uff09\uff0c\u5b83\u901a\u8fc7\u4f2a\u6807\u7b7e\u5316\u6574\u5408\u591a\u4e2aMLLM\u4ee5\u4fc3\u8fdb\u77e5\u8bc6\u5229\u7528\uff0c\u5e94\u7528\u4e8eSFDA\u3002\u6211\u4eec\u7684\u6846\u67b6\u5305\u62ec\uff1a1) \u53ef\u9760\u77e5\u8bc6\u8f6c\u79fb\uff0c2) \u81ea\u6211\u7ea0\u6b63\uff0c3) MLLM\u5f15\u5bfc\u7684\u77e5\u8bc6\u6269\u5c55\uff0c\u4ee5\u53ca4) 
\u591a\u70ed\u63a9\u7801\u7cbe\u70bc\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u534f\u540c\u4f5c\u7528\uff0c\u9010\u6b65\u53d1\u6398\u9776\u57df\u672a\u6807\u8bb0\u6570\u636e\u7684\u4ef7\u503c\u3002RCL\u5728\u591a\u4e2aSFDA\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\uff08SOTA\uff09\u6027\u80fd\uff0c\u4f8b\u5982\u5728DomainNet\u4e0a\u63d0\u5347\u663e\u8457\uff0c\u8fbe\u5230$\\textbf{+9.4\\%}$\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u589e\u5f3a\u9002\u5e94\u6027\u548c\u9c81\u68d2\u6027\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u540c\u65f6\u65e0\u9700\u8bbf\u95ee\u6e90\u6570\u636e\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Dong-Jie-Chen/RCL\u83b7\u53d6\u3002**|\n", "2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375](http://arxiv.org/abs/2405.18375)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|\u5e38\u8bc6\u63a8\u7406\u662f\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u7684\u91cd\u8981\u7ec4\u6210\u90e8\u5206\uff0c\u4e3a\u6b64\u5df2\u5f00\u53d1\u51fa\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5927\u591a\u4ec5\u9650\u4e8e\u82f1\u8bed\u3002\u521b\u5efa\u5e73\u884c\u57fa\u51c6\u6709\u52a9\u4e8e\u8de8\u8bed\u8a00\u8bc4\u4f30\uff0c\u4ece\u800c\u66f4\u597d\u5730\u7406\u89e3\u4e0d\u540c\u8bed\u8a00\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u6cf0\u8bed\u7248\u7684Winograd 
Schema\u96c6\u5408\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u6d4b\u8bd5\u6cf0\u8bed\u4e2d\u7684\u5e38\u8bc6\u63a8\u7406\u80fd\u529b\u800c\u8bbe\u8ba1\u7684\u65b0\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u9080\u8bf7\u6bcd\u8bed\u8005\u3001\u4e13\u4e1a\u7ffb\u8bd1\u548c\u4e25\u683c\u9a8c\u8bc1\u7684\u65b9\u6cd5\uff0c\u786e\u4fdd\u8be5\u7cfb\u5217\u9898\u5e93\u80fd\u51c6\u786e\u53cd\u6620\u6cf0\u56fd\u8bed\u8a00\u7684\u72ec\u7279\u6027\u3001\u4e60\u8bed\u548c\u6587\u5316\u5f15\u7528\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u7cca\u6027\u548c\u5e38\u8bc6\u6311\u6218\u3002\u6211\u4eec\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u548cClaude-3-Opus\uff09\u5728\u8fd9\u9879\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\u5c3d\u7ba1\u5728\u82f1\u8bed\u4e0a\u8868\u73b0\u4f18\u5f02\uff0c\u4f46\u5b83\u4eec\u5728\u6cf0\u8bed\u4e2d\u7684\u6027\u80fd\u660e\u663e\u4e0b\u964d\uff0c\u8fd9\u8868\u660e\u5728\u591a\u8bed\u8a00\u5e38\u8bc6\u63a8\u7406\u65b9\u9762\u4ecd\u6709\u5f85\u8fdb\u6b65\u3002|\n", "2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal 
et.al.|[2405.18369](http://arxiv.org/abs/2405.18369)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u7684\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u5b83\u4eec\u6210\u529f\u7684\u5173\u952e\u5728\u4e8e\u63d0\u793a\u7684\u6982\u5ff5\uff0c\u5373\u6307\u5bfc\u6a21\u578b\u751f\u6210\u8f93\u51fa\u3002\u7136\u800c\uff0c\u624b\u52a8\u521b\u5efa\u63d0\u793a\u65e2\u8017\u65f6\u53c8\u5c40\u9650\u4e8e\u7279\u5b9a\u9886\u57df\uff0c\u56e0\u6b64\u9700\u8981\u81ea\u52a8\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u4ecb\u7ecdPromptWizard\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u8fed\u4ee3\u5730\u5408\u6210\u548c\u4f18\u5316\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u63d0\u793a\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0cPromptWizard\u540c\u65f6\u4f18\u5316\u63d0\u793a\u6307\u4ee4\u548c\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u4ee5\u6700\u5927\u5316\u6a21\u578b\u6027\u80fd\u3002\u8be5\u6846\u67b6\u901a\u8fc7\u53d8\u5f02\u6307\u4ee4\u5e76\u5f15\u5165\u8d1f\u4f8b\uff0c\u9010\u6b65\u6df1\u5316\u7406\u89e3\u5e76\u4fdd\u8bc1\u591a\u6837\u6027\u3002\u501f\u52a9\u4e00\u4e2a\u8bc4\u5224\u8005\uff0cPromptWizard\u8fdb\u4e00\u6b65\u6539\u8fdb\u6307\u4ee4\u548c\u793a\u4f8b\uff0c\u878d\u5165\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\uff0c\u4ee5\u5b9e\u73b0\u6700\u4f73\u8868\u73b0\u3002PromptWizard\u5177\u6709\u8ba1\u7b97\u6548\u7387\u9ad8\u3001\u9002\u5e94\u4e0d\u540c\u8bad\u7ec3\u6570\u636e\u91cf\u573a\u666f\u4ee5\u53ca\u5728\u5c0f\u578bLLM\u4e0a\u540c\u6837\u6709\u6548\u7684\u7279\u70b9\u3002\u901a\u8fc7\u5bf98\u4e2a\u6570\u636e\u96c6\u768435\u4e2a\u4efb\u52a1\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793aPromptWizard\u660e\u663e\u4f18\u4e8e\u73b0\u6709\u7684\u63d0\u793a\u7b56\u7565\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u63d0\u793a\u4f18\u5316\u65b9\u9762\u7684\u9ad8\u6548\u6027\u548c\u53ef\u6269\u5c55\u6027\u
3002|\n", "2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361](http://arxiv.org/abs/2405.18361)|null|\u968f\u7740\u81ea\u52a8\u9a7e\u9a76\uff08AD\uff09\u4efb\u52a1\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u57fa\u4e8e\u7aef\u5230\u7aef\u7684\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u7684\u5e94\u7528\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u8fd9\u4e9b\u6a21\u578b\u8bd5\u56fe\u878d\u5408\u5f3a\u5927\u7684\u903b\u8f91\u63a8\u7406\u548c\u8ba4\u77e5\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u5168\u9762\u7684\u7aef\u5230\u7aef\u89c4\u5212\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684VLM\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e2D\u89c6\u89c9\u5206\u8bcd\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5728\u5904\u7406\u4e09\u7ef4\u51e0\u4f55\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u9760\u7684\u89c4\u5212\u81f3\u5173\u91cd\u8981\u3002\u7814\u7a76\u8868\u660e\uff0c2D\u5206\u8bcd\u7684LLM\u5e76\u4e0d\u80fd\u51c6\u786e\u611f\u77e5\u4e09\u7ef4\u73af\u5883\uff0c\u8fd9\u5f15\u53d1\u4e86\u5173\u4e8eVLM\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u53ef\u9760\u6027\u7684\u8d28\u7591\u3002 
\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAtlas\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86DETR\u98ce\u683c\u76843D\u611f\u77e5\u5668\u4f5c\u4e3a3D\u5206\u8bcd\u5668\uff0c\u4e0e\u5355\u5c42\u7ebf\u6027\u6295\u5f71\u5668\u76f8\u8fde\uff0c\u5de7\u5999\u5730\u5229\u7528\u4e86\u4e09\u7ef4\u7269\u7406\u4e16\u754c\u7684\u56fa\u6709\u7279\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u9ad8\u5206\u8fa8\u7387\u591a\u89c6\u89d2\u56fe\u50cf\u7684\u540c\u65f6\u5904\u7406\u548c\u65f6\u7a7a\u5efa\u6a21\u3002\u5c3d\u7ba1\u7b80\u5355\uff0c\u4f46Atlas\u5728NuScenes\u6570\u636e\u96c6\u4e0a\u76843D\u68c0\u6d4b\u548c\u81ea\u4e3b\u9a7e\u9a76\u89c4\u5212\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e863D\u5206\u8bcd\u7684LLM\u5bf9\u4e8e\u5b9e\u73b0\u53ef\u9760\u81ea\u52a8\u9a7e\u9a76\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u5c06\u5f00\u6e90\u4ee3\u7801\u548c\u6570\u636e\u96c6\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002|\n", "2405.18359": "|**2024-05-28**|**Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs**|Somnath Kumar 
et.al.|[2405.18359](http://arxiv.org/abs/2405.18359)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5168\u7403\u8303\u56f4\u5185\u91cd\u5851\u4f17\u591a\u9886\u57df\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u975e\u62c9\u4e01\u5b57\u6bcd\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u65f6\u7684\u5305\u5bb9\u6027\u548c\u6548\u679c\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u672c\u6587\u9488\u5bf9\u8fd9\u4e00\u5173\u952e\u6311\u6218\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u5927\u91cf\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u65b9\u6cd5\u6765\u589e\u5f3a\u591a\u8bed\u8a00LLMs\u7684\u8868\u73b0\u3002\u901a\u8fc7\u7cfb\u7edf\u5730\u7814\u7a76\u548c\u8bc4\u4f30\u5404\u79cd\u8bed\u8a00\u5728\u6d41\u884c\u7684\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u6570\u636e\u96c6\u4e0a\u7684\u6027\u80fd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b0\u9896\u6280\u672f\uff0c\u4ee5\u91ca\u653eLLMs\u5728\u591a\u5143\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u771f\u6b63\u6f5c\u529b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u6838\u5fc3\u7b56\u7565\uff0c\u6781\u5927\u5730\u63d0\u9ad8\u4e86\u591a\u8bed\u8a00\u80fd\u529b\uff1a\u9996\u5148\uff0c\u7cbe\u5fc3\u4f18\u5316\u9002\u7528\u4e8e\u591a\u8bed\u8a00LLM\u7684\u63d0\u793a\uff0c\u6316\u6398\u5176\u6f5c\u5728\u80fd\u529b\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5404\u8bed\u8a00\u7684\u8868\u73b0\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u6df7\u5408\u65b9\u6cd5\uff0c\u7ed3\u5408\u4e86\u591a\u8bed\u8a00\u5d4c\u5165\u7684LLM\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\uff0c\u5b9e\u73b0\u4e86\u66f4\u597d\u7684\u591a\u4efb\u52a1\u6027\u80fd\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u52a8\u6001\u5b66\u4e60\u7b56\u7565\uff0c\u5b9e\u73b0\u5b9e\u65f6\u6839\u636e\u67e5\u8be2\u52a8\u6001\u9009\u62e9\u6700\u5408\u9002\u7684\u63d0\u793a\u7b56\u7565\u3001LLM\u6a21\u578b\u548c\u5d4c\u5165\u6a21\u578b\uff0c\u4ece\u800c\u6700\u5927\u5316LLM\u5728\u4e0d\u540c\u8bed\u8a0
0\u4e0a\u7684\u6548\u7387\uff0c\u8d85\u8d8a\u4e86\u6700\u4f73\u9759\u6001\u548c\u968f\u673a\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u65e2\u9002\u7528\u4e8e\u79bb\u7ebf\u914d\u7f6e\u8c03\u6574\uff0c\u4e5f\u652f\u6301\u5728\u7ebf\u9002\u5e94\uff0c\u80fd\u591f\u65e0\u7f1d\u9002\u5e94\u65b0\u8bed\u8a00\u548c\u6570\u636e\u96c6\uff0c\u663e\u8457\u63a8\u52a8\u4e86\u591a\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u5728\u5404\u79cd\u8bed\u8a00\u4e2d\u7684\u8fdb\u6b65\u3002|\n", "2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358](http://arxiv.org/abs/2405.18358)|null|\u8fd1\u671f\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u878d\u5408\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7ec6\u81f4\u7684\u591a\u6a21\u6001\u7406\u89e3\u3001\u590d\u6742\u4efb\u52a1\u89e3\u6790\u4ee5\u53ca\u591a\u6a21\u6001\u4fe1\u606f\u63a8\u7406\u65b9\u9762\u4ecd\u5b58\u5728\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMMCTAgent\uff0c\u4e00\u4e2a\u65e8\u5728\u89e3\u51b3\u5f53\u524dMLLM\u5728\u590d\u6742\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u4e2d\u56fa\u6709\u5c40\u9650\u6027\u7684\u65b0\u578b\u591a\u6a21\u6001\u6279\u5224\u6027\u601d\u7ef4\u4ee3\u7406\u6846\u67b6\u3002MMCTAgent\u501f\u9274\u4e86\u4eba\u7c7b\u8ba4\u77e5\u8fc7\u7a0b\u548c\u6279\u5224\u6027\u601d\u8003\u7684\u7279\u70b9\uff0c\u901a\u8fc7\u8fed\u4ee3\u5206\u6790\u591a\u6a21\u6001\u4fe1\u606f\u3001\u62c6\u89e3\u95ee\u9898\u3001\u89c4\u5212\u7b56\u7565\uff0c\u5e76\u5b9e\u73b0\u52a8\u6001\u63a8\u7406\u3002 
\u6b64\u5916\uff0cMMCTAgent\u8fd8\u878d\u5165\u4e86\u6279\u5224\u6027\u601d\u8003\u5143\u7d20\uff0c\u5982\u5bf9\u6700\u7ec8\u7b54\u6848\u7684\u9a8c\u8bc1\u548c\u81ea\u6211\u53cd\u601d\u3002\u5b83\u901a\u8fc7\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u5b9a\u4e49\u57fa\u4e8e\u89c6\u89c9\u7684\u8bc4\u5224\u8005\uff0c\u5e76\u786e\u5b9a\u7279\u5b9a\u4efb\u52a1\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ece\u800c\u63d0\u5347\u51b3\u7b56\u80fd\u529b\u3002\u5728\u591a\u4e2a\u56fe\u50cf\u7406\u89e3\u548c\u89c6\u9891\u7406\u89e3\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u4e25\u8c28\u5730\u8bc4\u4f30\u4e86MMCTAgent\uff08\u5305\u62ec\u5e26\u8bc4\u5224\u8005\u7684\u7248\u672c\uff09\u7684\u8868\u73b0\uff0c\u7ed3\u679c\u8868\u660e\u5b83\u5728\u8d85\u8d8a\u57fa\u7840MLLM\u548c\u5176\u4ed6\u5de5\u5177\u589e\u5f3a\u7684\u7ba1\u9053\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335](http://arxiv.org/abs/2405.19335)|null|\u6211\u4eec\u63d0\u51faX-VILA\uff0c\u4e00\u79cd\u65e8\u5728\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u529f\u80fd\u7684\u591a\u6a21\u6001\u6a21\u578b\uff0c\u5b83\u878d\u5408\u4e86\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u6a21\u6001\u3002\u901a\u8fc7\u5c06\u5404\u6a21\u6001\u7279\u5b9a\u7684\u7f16\u7801\u5668\u4e0eLLM\u8f93\u5165\u5bf9\u9f50\uff0c\u5e76\u5c06\u6269\u6563\u89e3\u7801\u5668\u4e0eLLM\u8f93\u51fa\u5bf9\u9f50\uff0cX-VILA\u5b9e\u73b0\u4e86\u8de8\u6a21\u6001\u7406\u89e3\u3001\u63a8\u7406\u548c\u751f\u6210\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8de8\u6a21\u6001\u5bf9\u9f50\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6709\u6548\u7684\u4efb\u610f\u6a21\u6001\u6307\u4ee4\u8ddf\u968f\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u7684\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u6cd5\u5b58\u5728\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5bfc\u81f4\u89c6\u89c9\u4fe1\u606f\u4e22\u5931
\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u89c6\u89c9\u5bf9\u9f50\u673a\u5236\uff0c\u5305\u62ec\u4e00\u4e2a\u89c6\u89c9\u5d4c\u5165\u9ad8\u901f\u516c\u8def\u6a21\u5757\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u8d44\u6e90\u9ad8\u6548\u7684\u8bad\u7ec3\u7b56\u7565\uff0c\u4f7f\u5f97X-VILA\u5728\u4efb\u610f\u6a21\u6001\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5927\u5e45\u8d85\u8d8a\u5148\u524d\u7684\u65b9\u6cd5\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u5728\u7f3a\u4e4f\u7c7b\u4f3c\u8bad\u7ec3\u6570\u636e\u7684\u60c5\u51b5\u4e0b\uff0cX-VILA\u5728\u4e0d\u540c\u6a21\u6001\u95f4\u4e5f\u5c55\u73b0\u51fa\u6d8c\u73b0\u7279\u6027\u3002\u8be5\u9879\u76ee\u5c06\u5f00\u6e90\u3002|\n", "2405.19334": "|**2024-05-29**|**LLMs Meet Multimodal Generation and Editing: A Survey**|Yingqing He et.al.|[2405.19334](http://arxiv.org/abs/2405.19334)|**[link](https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u5c06\u5b83\u4eec\u4e0e\u591a\u6a21\u6001\u5b66\u4e60\u76f8\u7ed3\u5408\u3002\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8c03\u67e5\u4e3b\u8981\u96c6\u4e2d\u5728\u7406\u89e3\u4e0a\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u8de8\u56fe\u50cf\u3001\u89c6\u9891\u30013D\u548c\u97f3\u9891\u7b49\u9886\u57df\u7684\u591a\u6a21\u6001\u751f\u6210\uff0c\u7279\u522b\u5f3a\u8c03\u4e86\u8fd9\u4e9b\u9886\u57df\u4e2d\u7684\u91cc\u7a0b\u7891\u5f0f\u5de5\u4f5c\u53ca\u5176\u6280\u672f\u8fdb\u6b65\u3002\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u5173\u952e\u6280\u672f\u7ec4\u4ef6\uff0c\u4ee5\u53ca\u5728\u76f8\u5173\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u525
6\u6790\u4e86\u501f\u52a9\u73b0\u6709\u751f\u6210\u6a21\u578b\u8fdb\u884c\u4eba\u7c7b-\u8ba1\u7b97\u673a\u4ea4\u4e92\u7684\u5de5\u5177\u589e\u5f3a\u578b\u591a\u6a21\u6001\u4ee3\u7406\u3002\u6700\u540e\uff0c\u6211\u4eec\u5168\u9762\u8ba8\u8bba\u4e86\u4eba\u5de5\u667a\u80fd\u5b89\u5168\u7684\u8fdb\u6b65\uff0c\u5e76\u63a2\u7d22\u4e86\u65b0\u5174\u5e94\u7528\u548c\u672a\u6765\u524d\u666f\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7cfb\u7edf\u800c\u6df1\u5165\u7684\u591a\u6a21\u6001\u751f\u6210\u6982\u8ff0\uff0c\u6709\u671b\u63a8\u52a8\u751f\u6210\u5185\u5bb9\u7684\u4eba\u5de5\u667a\u80fd\uff08AIGC\uff09\u548c\u4e16\u754c\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6240\u6709\u76f8\u5173\u7684\u8bba\u6587\u5217\u8868\u53ef\u5728\u627e\u5230\u3002**|\n", "2405.19333": "|**2024-05-29**|**Multi-Modal Generative Embedding Model**|Feipeng Ma et.al.|[2405.19333](http://arxiv.org/abs/2405.19333)|null|\u5728\u5927\u591a\u6570\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\uff0c\u95ee\u9898\u53ef\u4ee5\u5f52\u7ed3\u4e3a\u751f\u6210\u6216\u5d4c\u5165\u3002\u73b0\u6709\u7684\u6a21\u578b\u901a\u5e38\u901a\u8fc7\u5c06\u8bed\u8a00\u6a21\u5757\u5206\u89e3\u4e3a\u4e00\u4e2a\u7528\u4e8e\u751f\u6210\u7684\u6587\u672c\u89e3\u7801\u5668\u548c\u4e00\u4e2a\u7528\u4e8e\u5d4c\u5165\u7684\u6587\u672c\u7f16\u7801\u5668\u6765\u5904\u7406\u8fd9\u4e24\u79cd\u95ee\u9898\u3002\u4e3a\u4e86\u63a2\u7d22\u591a\u6a21\u6001\u65b9\u6cd5\u7684\u7b80\u7ea6\u6027\uff0c\u672c\u5de5\u4f5c\u8bd5\u56fe\u4ec5\u4f7f\u7528\u4e00\u4e2a\u6a21\u578b\u6765\u5904\u7406\u6bcf\u79cd\u6a21\u6001\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u6a21\u6001\u751f\u6210\u5d4c\u5165\u6a21\u578b\uff08MM-GEM\uff09\uff0c\u5b83\u5c06\u751f\u6210\u548c\u5d4c\u5165\u76ee\u6807\u6574\u5408\u5230\u4e00\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86PoolAggregator\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u5e76\u5b9e\u73b0\u7ec6\u7c92\u5ea6\u7684\u5d4c\u5165\u548
c\u751f\u6210\u80fd\u529b\u3002 \u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u4e24\u4e2a\u76ee\u6807\u4e4b\u95f4\u5e76\u6ca1\u6709\u663e\u8457\u51b2\u7a81\u3002\u4f8b\u5982\uff0c\u57fa\u4e8eViT-Large\u548cTinyLlama\u7684MM-GEM\u5728\u8bf8\u5982\u8de8\u6a21\u6001\u68c0\u7d22\u548c\u96f6\u6837\u672c\u5206\u7c7b\u7b49\u591a\u6a21\u6001\u5d4c\u5165\u6a21\u578b\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u5177\u5907\u826f\u597d\u7684\u56fe\u50cf\u63cf\u8ff0\u80fd\u529b\u3002\u6b64\u5916\uff0cMM-GEM\u80fd\u591f\u65e0\u7f1d\u6267\u884c\u533a\u57df\u7ea7\u522b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u548c\u68c0\u7d22\u4efb\u52a1\u3002\u53e6\u5916\uff0cMM-GEM\u4e2d\u7684\u5148\u8fdb\u6587\u672c\u6a21\u578b\u5bf9\u4e8e\u957f\u6587\u672c\u548c\u56fe\u50cf\u68c0\u7d22\u7684Recall@1\u6307\u6807\u5e26\u6765\u4e86\u8d85\u8fc75%\u7684\u63d0\u5347\u3002|\n", "2405.19332": "|**2024-05-29**|**Self-Exploring Language Models: Active Preference Elicitation for Online Alignment**|Shenao Zhang et.al.|[2405.19332](http://arxiv.org/abs/2405.19332)|**[link](https://github.com/shenao-zhang/selm)**|**
\u504f\u597d\u4f18\u5316\uff0c\u7279\u522b\u662f\u5728\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u9a71\u52a8\u4e0b\uff0c\u5df2\u7ecf\u5728\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u610f\u613f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u6210\u5c31\u3002\u76f8\u8f83\u4e8e\u4f7f\u7528\u56fa\u5b9a\u6570\u636e\u96c6\u7684\u79bb\u7ebf\u5bf9\u9f50\uff0c\u901a\u8fc7\u4eba\u6216\u4eba\u5de5\u667a\u80fd\u5bf9\u6a21\u578b\u751f\u6210\u7684\u53cd\u9988\u901a\u5e38\u80fd\u591f\u901a\u8fc7\u8fed\u4ee3\u8fc7\u7a0b\u63d0\u5347\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\u548cLLMs\u7684\u4e00\u81f4\u6027\u3002\u7136\u800c\uff0c\u8981\u5b9e\u73b0\u5168\u5c40\u51c6\u786e\u7684\u5956\u52b1\u6a21\u578b\uff0c\u9700\u8981\u7cfb\u7edf\u5730\u63a2\u7d22\u751f\u6210\u5404\u79cd\u5404\u6837\u7684\u54cd\u5e94\uff0c\u4ee5\u6db5\u76d6\u81ea\u7136\u8bed\u8a00\u7684\u5e7f\u9614\u7a7a\u95f4\u3002\u4ec5\u4f9d\u8d56\u6807\u51c6\u5956\u52b1\u6700\u5927\u5316LLMs\u7684\u968f\u673a\u91c7\u6837\u662f\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e00\u9700\u6c42\u7684\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5c42\u76ee\u6807\uff0c\u4e50\u89c2\u5730\u503e\u5411\u4e8e\u53ef\u80fd\u5177\u6709\u9ad8\u5956\u52b1\u7684\u54cd\u5e94\uff0c\u4ee5\u6b64\u6765\u4e3b\u52a8\u63a2\u7d22\u5206\u5e03\u5916\u533a\u57df\u3002\u901a\u8fc7\u89e3\u51b3\u5185\u5c42\u95ee\u9898\uff0c\u5229\u7528\u91cd\u65b0\u53c2\u6570\u5316\u7684\u5956\u52b1\u51fd\u6570\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aSelf-Exploring Language 
Models\uff08SELM\uff09\u7684\u7b97\u6cd5\u3002\u5b83\u6d88\u9664\u4e86\u5bf9\u5355\u72ec\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u7684\u9700\u6c42\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u76f4\u89c2\u7684\u76ee\u6807\u5bf9LLMs\u8fdb\u884c\u8fed\u4ee3\u66f4\u65b0\u3002\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u76f8\u6bd4\uff0cSELM\u7684\u76ee\u6807\u964d\u4f4e\u4e86\u5bf9\u672a\u89c1\u8fc7\u7684\u8fc7\u5ea6\u5ef6\u4f38\u7684\u65e0\u5dee\u522b\u504f\u597d\uff0c\u63d0\u9ad8\u4e86\u63a2\u7d22\u6548\u7387\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728Zephyr-7B-SFT\u548cLlama-3-8B-Instruct\u6a21\u578b\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0cSELM\u5728MT-Bench\u548cAlpacaEval 2.0\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4ee5\u53ca\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u5404\u79cd\u6807\u51c6\u5b66\u672f\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar 
et.al.|[2405.19328](http://arxiv.org/abs/2405.19328)|null|This paper proposes an architecture called the 'Normative Module', aimed at the cooperation challenges generative agents face in social structures that contain existing norms. These agents perceive and evaluate their environment through large language models, but in complex social tasks the key question is how they can recognize and adapt to normative infrastructure. The core of the Normative Module is to facilitate equilibrium selection: drawing on the idea that classification institutions implement correlated equilibria, it lets agents learn through peer interaction which of several candidate institutions in the environment is authoritative. With this improved normative competence, agents can coordinate their sanctioning behavior, which in turn shapes primary behavior in the social environment and raises overall welfare. We design a new environment that supports institutions and evaluate the framework on two main criteria: whether agents can disregard non-authoritative institutions, and whether agents can identify the authoritative institution among multiple options. Experiments show that agents equipped with the Normative Module achieve more stable cooperation than baseline agents, opening a new avenue for designing environments and agents that account for normative infrastructure.|\n",
"2405.19327": "|**2024-05-29**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327](http://arxiv.org/abs/2405.19327)|**[link](https://github.com/multimodal-art-projection/map-neo)**|In recent years, large language models (LLMs) have made remarkable progress across a wide range of tasks. However, for commercial reasons, the most capable models such as GPT, Gemini, and Claude are gated behind proprietary interfaces, and their training details are not disclosed. Recently, several institutions have open-sourced LLMs of comparable strength, such as LLaMA-3, but most details (intermediate checkpoints, pre-training corpus, training code, and so on) remain undisclosed. To improve the transparency of LLMs, the research community has pushed for truly open models such as Pythia, Amber, and OLMo, which release more information and support scientific study of large models' strengths, limitations, biases, and risks. Yet existing open models still trail closed-source models of similar size on reasoning, knowledge, and coding tasks. We therefore open-source MAP-Neo, a bilingual language model with 7 billion parameters trained from scratch on 4.5 trillion high-quality tokens. MAP-Neo is the first fully open-source bilingual model whose performance is comparable to existing top-tier LLMs. We also release all the details needed for reproduction, including the cleaned pre-training corpus, the data cleaning pipeline, checkpoints, and the optimized training and evaluation framework. We hope MAP-Neo will strengthen the open research community, inspire further innovation, and drive further improvement of LLMs.|\n",
"2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen 
et.al.|[2405.19326](http://arxiv.org/abs/2405.19326)|null|This paper introduces a new task, zero-shot 3D reasoning segmentation, which targets searching for and localizing parts of objects. It is a new paradigm that goes beyond the limits of earlier category-specific 3D semantic segmentation, 3D instance segmentation, and open-vocabulary 3D segmentation. We design a simple baseline method, Reasoning3D, that can understand and execute complex commands to perform (fine-grained) part segmentation of 3D meshes, with context awareness and interactive segmentation backed by reasoned answers. Specifically, Reasoning3D uses a pre-trained 2D segmentation network, powered by large language models (LLMs), to interpret user input queries in a zero-shot manner. Prior work has shown that large-scale pre-training gives foundation models prior world knowledge, enabling them to understand complex instructions, which lets us 'segment anything' in 3D even when relying on limited 3D datasets (resource-efficient). Experiments show that our approach generalizes and can effectively localize and highlight parts of 3D objects (3D meshes) from implicit textual queries, including articulated 3D objects and real-world scanned data. In addition, our unsupervised method allows rapid deployment and serves as a viable general baseline for future research on 3D (semantic) object understanding in areas such as robotics, object manipulation, part assembly, autonomous driving, augmented and virtual reality (AR/VR), and medical applications. Code, model weights, deployment guide, and evaluation protocol are available at: http://tianrun-chen.github.io/Reason3D/|\n",
"2405.19325": "|**2024-05-29**|**Nearest Neighbor Speculative Decoding for LLM Generation and Attribution**|Minghan Li et.al.|[2405.19325](http://arxiv.org/abs/2405.19325)|null|Large language models (LLMs) often hallucinate and lack the ability to attribute the text they generate to sources. To address these problems, semi-parametric language models such as kNN-LM refine the LM output by finding the nearest neighbors of a given prompt in a non-parametric datastore. However, such models tend to have slow inference and produce less fluent text. This paper proposes a new semi-parametric language modeling approach, Nearest Neighbor Speculative Decoding (NEST), which can incorporate real-world text spans of arbitrary length into the generation and provide attribution to their sources. At each inference step, NEST performs token-level retrieval, computes a semi-parametric mixture distribution, and identifies promising span continuations in the corpus. It then uses an approximate speculative decoding procedure that either accepts a prefix of the retrieved span or generates a new token. NEST significantly improves the generation quality and attribution rate of the base LM on a variety of knowledge-intensive tasks, surpassing the conventional kNN-LM method and performing competitively with in-context retrieval augmentation. In addition, NEST substantially speeds up generation, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.|\n",
"2405.19323": "|**2024-05-29**|**Are Large Language Models Chameleons?**|Mingmeng Geng 
et.al.|[2405.19323](http://arxiv.org/abs/2405.19323)|null|Do large language models (LLMs) have their own worldviews and personality tendencies? The researchers ran more than one million experiments in which LLMs were asked to answer subjective questions. Comparing the models' responses with real data from the European Social Survey (ESS) shows that prompts have a significant effect on bias and variability and reveals major cultural, age, and gender biases. The paper discusses methods for measuring the difference between LLMs and survey data, such as computing weighted means and a newly proposed measure based on Jaccard similarity. The authors stress that it is essential to analyze the robustness and variability of prompts before using LLMs to model individual decisions or collective behavior, since their imitation abilities are approximate at best.|\n",
"2405.19320": "|**2024-05-29**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320](http://arxiv.org/abs/2405.19320)|null|
Reinforcement learning from human feedback (RLHF) has shown great promise in aligning large language models (LLMs) with human preferences. Both online and offline RLHF are under active investigation, and one key challenge is how to account for uncertainty in the reward function learned from preference data. Although the principles of optimism or pessimism are well known in standard reinforcement learning (RL), a form that is both practical and theoretically grounded for LLMs is not yet available, because standard techniques for constructing confidence intervals become intractable under arbitrary policy parameterizations. This paper proposes a unified approach to online and offline RLHF, value-incentivized preference optimization (VPO). VPO regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, in a way that indicates whether optimism or pessimism is chosen. In addition, VPO optimizes the policy directly with implicit reward modeling, so its RLHF pipeline is simpler, similar to direct preference optimization. For both the online and offline settings, VPO comes with theoretical guarantees whose rates match those of standard RL. Experiments on text summarization and dialogue tasks confirm the practicality and effectiveness of VPO.|\n",
"2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen 
et.al.|[2405.20340](http://arxiv.org/abs/2405.20340)|**[link](https://github.com/IDEA-Research/MotionLLM)**|This study addresses human behavior understanding across multiple modalities (video and motion) by leveraging the capabilities of large language models (LLMs). Unlike recent LLMs designed for a single modality (video only or motion only), we argue that understanding human behavior requires joint modeling of videos and motion sequences (e.g., SMPL sequences) to capture fine-grained body-part dynamics and semantics effectively. To this end, we present MotionLLM, a simple yet effective framework for human motion understanding, captioning, and reasoning. MotionLLM adopts a unified video-motion training strategy that combines the strengths of existing coarse-grained video-text data and fine-grained motion-text data to obtain rich spatial-temporal insights. We also build a large-scale dataset, MoVid, containing diverse videos, motions, captions, and instructions, and propose MoVid-Bench, with careful manual annotations, to better evaluate human behavior understanding on video and motion. Experiments demonstrate MotionLLM's superiority in captioning, spatial-temporal comprehension, and reasoning.|\n",
"2405.20339": "|**2024-05-30**|**Visual Perception by Large Language Model's Weights**|Feipeng Ma et.al.|[2405.20339](http://arxiv.org/abs/2405.20339)|**[link](https://github.com/FeipengMa6/VLoRA)**|Existing multimodal large language models (MLLMs) align visual information with the input space of the language model and then concatenate visual tokens with text tokens into a single sequence that is fed to the language model. This approach is computationally expensive because the visual tokens lengthen the input sequence. The paper proposes a new parameter-space alignment paradigm that represents visual information as model weights. For each input image, a vision encoder first extracts features, which are then converted into perceptual weights and merged with the weights of the language model. The language model's input therefore needs no visual tokens, which shortens the input sequence and greatly improves efficiency. Building on this idea, the paper presents VLoRA, which includes a perceptual weights generator designed to convert visual features into perceptual weights with a low-rank property, similar in form to LoRA (low-rank adaptation). Experiments show that VLoRA achieves performance comparable to existing MLLMs on a range of multimodal benchmarks while significantly reducing computational cost in both training and inference. The authors state that the code and models will be released as open source.|\n",
"2405.20335": "|**2024-05-30**|**Xwin-LM: Strong and Scalable Alignment Practice for LLMs**|Bolin Ni et.al.|[2405.20335](http://arxiv.org/abs/2405.20335)|**[link](https://github.com/xwin-lm/xwin-lm)**|**This paper presents Xwin-LM, a comprehensive suite of alignment methods for large language models (LLMs). It covers several key techniques, including supervised fine-tuning (SFT), reward modeling (RM), rejection sampling fine-tuning (RS), and direct preference optimization (DPO). The main components are: (1) Xwin-LM-SFT, models initially fine-tuned with high-quality instruction data; (2) Xwin-Pair, a large-scale, multi-turn preference dataset carefully annotated with GPT-4; (3) Xwin-RM, reward models trained at the 7B, 13B, and 70B parameter scales; (4) Xwin-Set, a multiwise preference dataset in which each prompt is paired with 64 unique responses generated by Xwin-LM-SFT and scored by Xwin-RM; (5) Xwin-LM-RS, models fine-tuned on the highest-scoring responses from Xwin-Set; and (6) Xwin-LM-DPO, models further optimized on Xwin-Set with the DPO algorithm. Our evaluations on AlpacaEval and MT-bench show consistent and significant improvements across the pipeline, demonstrating the strength and scalability of Xwin-LM. The repository at https://github.com/Xwin-LM/Xwin-LM will be updated continuously to support community research.**|\n",
"2405.20319": "|**2024-05-31**|**ParSEL: Parameterized Shape Editing with Language**|Aditya Ganeshan et.al.|[2405.20319](http://arxiv.org/abs/2405.20319)|null|This paper presents ParSEL, a system for controllable editing of high-quality 3D assets through natural language. Because natural language alone offers limited precision for manipulation, ParSEL takes a segmented 3D mesh and an editing request and produces a parameterized editing program. Users can adjust the program parameters to explore shape variations with fine control over the magnitude of the edits. The system uses large language models (LLMs) to interpret the initial edit instruction, but finds that they often fall short at inferring complete editing programs and can produce results that violate shape logic. To address this, we design Analytical Edit Propagation (AEP), an algorithm that starts from the seed edit and uses geometric analysis with a computer algebra system to search for analytical editing operations compatible with the range of possible user edits, yielding a complete editing program. Experiments show that, compared with alternative systems, ParSEL enables effective controllable editing of 3D objects through natural language requests.|\n",
"2405.20318": "|**2024-05-30**|**CausalQuest: Collecting Natural Causal Questions for AI Agents**|Roberto Ceraolo et.al.|[2405.20318](http://arxiv.org/abs/2405.20318)|**[link](https://github.com/roberto-ceraolo/causal-quest)**|**Humans have an innate drive to seek out causality, whether out of curiosity or toward specific goals. To build AI agents that can handle this natural human quest, we urgently need a comprehensive dataset of natural causal questions. Existing datasets, however, either contain only artificially crafted questions that do not reflect real AI usage scenarios or have limited coverage of questions from specific sources. To fill this gap, we present CausalQuest, a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants. We formalize the definition of causal questions and establish a finer-grained taxonomy. Through the combined effort of human annotators and large language models, we carefully label the dataset. We find that 42% of the questions humans ask are in fact causal, with most seeking to understand the causes behind given effects. Using this dataset, we train efficient binary classifiers (up to 2.85B parameters) for identifying causal questions, achieving high performance with F1 scores of up to 0.877. We conclude with a rich set of future research directions that can build on our data and models.**|\n",
"2405.20315": "|**2024-05-30**|**ANAH: Analytical Annotation of Hallucinations in Large Language Models**|Ziwei Ji et.al.|[2405.20315](http://arxiv.org/abs/2405.20315)|**[link](https://github.com/open-compass/anah)**|**Addressing the hallucination problem of large language models (LLMs) is crucial for their wide application, yet fine-grained measurement of the problem has not been well explored by the community. We therefore present $\\textbf{ANAH}$, a bilingual dataset for analyzing LLM hallucinations in generative question answering. Each answer sentence in ANAH is rigorously annotated, covering reference fragment retrieval, judgment of the hallucination type, and correction of the hallucinated content. The dataset contains about 12,000 sentence-level annotations for roughly 4,300 LLM responses on more than 700 topics, built through a human-in-the-loop pipeline. Thanks to the fine granularity of the annotations, we can quantitatively confirm that LLM hallucinations accumulate progressively as the answer grows longer, and we use ANAH to train and evaluate hallucination annotators. Our experiments study both generative and discriminative annotators in depth and show that, although open-source LLMs struggle with fine-grained hallucination annotation, a generative annotator trained on ANAH surpasses all open-source models, performs close to GPT-3.5, and generalizes well to unseen questions.**|\n",
"2405.20313": "|**2024-05-30**|**Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation**|Guillaume Huguet 
et.al.|[2405.20313](http://arxiv.org/abs/2405.20313)|null|Proteins play a key role in almost all biological processes, and their diverse functions arise from complex 3D structures that are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a new sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. Compared with earlier models in the FoldFlow family, FoldFlow-2 adds new architectural features: a protein large language model to encode the sequence, a new multimodal fusion trunk that combines structure and sequence representations, and a geometric transformer-based decoder. To increase the diversity and novelty of generated samples, which is crucial for de novo drug design, we train FoldFlow-2 at scale on a new dataset an order of magnitude larger than the PDB datasets used in prior work, containing both known PDB proteins and high-quality synthetic structures obtained through filtering. We further show how FoldFlow-2 can be aligned to arbitrary rewards, such as increased secondary-structure diversity, by introducing a Reinforced Finetuning (ReFT) objective. Experiments show that FoldFlow-2 outperforms the previous state of the art among protein structure-based generative models, improving over RFDiffusion in unconditional generation on designability, diversity, and novelty across protein lengths, and generalizing to the task of equilibrium conformation sampling. Finally, we show that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for VHH nanobodies.|\n",
"2405.20309": "|**2024-05-30**|**Large Language Models Can Self-Improve At Web Agent Tasks**|Ajay Patel 
et.al.|[2405.20309](http://arxiv.org/abs/2405.20309)|**[link](https://github.com/AjayP13/webdreamer)**|\u5728\u590d\u6742\u7684\u73af\u5883\u4e2d\uff0c\u5982\u7f51\u7edc\u6d4f\u89c8\u5668\uff0c\u8bad\u7ec3\u6a21\u578b\u4f5c\u4e3a\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u6267\u884c\u52a8\u4f5c\u7684\u4ee3\u7406\u901a\u5e38\u5177\u6709\u6311\u6218\u6027\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u7f3a\u4e4f\u8bad\u7ec3\u6570\u636e\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u4ee5\u96f6\u6837\u672c\u6216\u5c11\u91cf\u6837\u672c\u6765\u5728\u65b0\u73af\u5883\u4e2d\u5bfc\u822a\u7684\u80fd\u529b\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0cLLMs\u53ef\u4ee5\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\uff08\u5373\u5728\u5176\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u4e0a\u5fae\u8c03\uff09\u6765\u8d85\u8d8a\u57fa\u7840\u6027\u80fd\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76LLMs\u5728\u957f\u65f6\u5e8f\u4efb\u52a1\u7684\u590d\u6742\u73af\u5883\u2014\u2014WebArena\u57fa\u51c6\u4e2d\uff0c\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u80fd\u5426\u63d0\u5347\u5176\u8868\u73b0\u3002WebArena\u8981\u6c42\u4ee3\u7406\u81ea\u4e3b\u6d4f\u89c8\u7f51\u9875\u5e76\u6267\u884c\u64cd\u4f5c\u4ee5\u8fbe\u6210\u7279\u5b9a\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u5408\u6210\u8bad\u7ec3\u6570\u636e\u6df7\u5408\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u7ecf\u8fc7\u81ea\u6211\u6539\u8fdb\u540e\uff0c\u6a21\u578b\u5728WebArena\u57fa\u51c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u63d0\u9ad8\u4e8631%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u7528\u4e8e\u66f4\u5168\u9762\u5730\u8bc4\u4f30\u6211\u4eec\u7684\u5fae\u8c03\u4ee3\u7406\u6a21\u578b\u7684\u884c\u4e3a\u6027\u80fd\u3001\u9c81\u68d2\u6027\u3001\u80fd\u529b\u4ee5\u53ca\u8f68\u8ff9\u8d28\u91cf\uff0c\u8fd9\u4e9b\u6307\u6807\u8d85\u8d8a\u4e86\u5f53\u524d\u4ec5\u4f9d\u8d
56\u4e8e\u6574\u4f53\u57fa\u51c6\u5206\u6570\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002|\n", "2405.20304": "|**2024-05-30**|**Group Robust Preference Optimization in Reward-free RLHF**|Shyam Sundhar Ramesh et.al.|[2405.20304](http://arxiv.org/abs/2405.20304)|**[link](https://github.com/rsshyam/Group-robust-preference-optimization)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u9002\u5e94\u65f6\uff0c\u901a\u5e38\u9700\u8981\u901a\u8fc7\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u548c\u591a\u5143\u6807\u7b7e\u8005\u7fa4\u4f53\uff08\u5982\u4e0d\u540c\u6027\u522b\u3001\u79cd\u65cf\u3001\u516c\u53f8\u56e2\u961f\u7b49\uff09\u7684\u504f\u597d\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u4f20\u7edf\u65b9\u6cd5\u503e\u5411\u4e8e\u91c7\u7528\u201c\u4e00\u5200\u5207\u201d\u7684\u7b56\u7565\uff0c\u5373\u5047\u8bbe\u5e76\u4f18\u5316\u5355\u4e00\u7684\u504f\u597d\u6a21\u578b\uff0c\u5bf9\u5404\u7fa4\u4f53\u7684\u72ec\u7279\u7279\u6027\u548c\u9700\u6c42\u4e0d\u591f\u654f\u611f\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7fa4\u4f53\u9c81\u68d2\u504f\u597d\u4f18\u5316\uff08GRPO\uff09\u65b9\u6cd5\uff0c\u65e8\u5728\u7a33\u5065\u5730\u4f7fLLMs\u9002\u5e94\u5404\u4e2a\u7fa4\u4f53\u7684\u504f\u597d\u3002GRPO\u65b9\u6cd5\u57fa\u4e8e\u65e0\u5956\u52b1\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff0c\u4f46\u533a\u522b\u4e8e\u4ee5\u5f80\uff0c\u5b83\u76ee\u6807\u662f\u5bfb\u627e\u4e00\u4e2a\u80fd\u6700\u5927\u5316\u6700\u5dee\u7fa4\u4f53\u6027\u80fd\u7684\u9c81\u68d2\u7b56\u7565\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0cGRPO\u4f1a\u52a8\u6001\u4e14\u9010\u6b21\u8c03\u6574\u4e0d\u540c\u7fa4\u4f53\u7684\u6743\u91cd\uff0c\u4f18\u5148\u5173\u6ce8\u7d2f\u79ef\u635f\u5931\u8f83\u9ad8\u7684\u7fa4\u4f53\u3002\u6211\u4eec\u5728\u7406\u8bba\u4e0a\u63a2\u8ba8\u4e86GRPO\u7684\u53ef\u884c\u6027\uff0c\u5e76\u5206\u6790\u4e86\u5176
\u5728\u5bf9\u6570\u7ebf\u6027\u7b56\u7565\u7c7b\u522b\u4e0b\u7684\u6536\u655b\u6027\u3002\u901a\u8fc7\u4f7f\u7528\u6765\u81ea\u4e0d\u540c\u7fa4\u4f53\u7684\u5168\u5c40\u610f\u89c1\u6570\u636e\u5bf9LLMs\u8fdb\u884cGRPO\u5fae\u8c03\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u6700\u5dee\u7fa4\u4f53\u7684\u8868\u73b0\uff0c\u51cf\u5c11\u4e86\u7fa4\u4f53\u95f4\u635f\u5931\u7684\u4e0d\u5e73\u8861\uff0c\u540c\u65f6\u63d0\u9ad8\u4e86\u6982\u7387\u51c6\u786e\u6027\uff0c\u76f8\u8f83\u4e8e\u975e\u9c81\u68d2\u57fa\u7ebf\uff0c\u8fd9\u4e9b\u6539\u8fdb\u6548\u679c\u663e\u8457\u3002**|\n", "2405.20285": "|**2024-05-30**|**Who Writes the Review, Human or AI?**|Panagiotis C. Theocharopoulos et.al.|[2405.20285](http://arxiv.org/abs/2405.20285)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u4eba\u4eec\u5173\u6ce8\u5982\u4f55\u8bc6\u522b\u4e0d\u540c\u9886\u57df\u7684AI\u751f\u6210\u6587\u672c\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u901a\u8fc7\u63d0\u51fa\u4e00\u79cd\u65b9\u6cd5\u6765\u51c6\u786e\u533a\u5206\u4eba\u5de5\u667a\u80fd\u751f\u6210\u7684\u548c\u4eba\u7c7b\u64b0\u5199\u7684\u4e66\u8bc4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u8fc1\u79fb\u5b66\u4e60\uff0c\u8ba9\u6a21\u578b\u80fd\u591f\u5728\u4e0d\u540c\u4e3b\u9898\u95f4\u8bc6\u522b\u751f\u6210\u6587\u672c\uff0c\u540c\u65f6\u63d0\u9ad8\u5176\u8bc6\u522b\u5199\u4f5c\u98ce\u683c\u548c\u8bcd\u6c47\u53d8\u5316\u7684\u80fd\u529b\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u96c6\uff0c\u5305\u542b\u771f\u5b9e\u7684\u4e66\u8bc4\u548c\u4f7f\u7528Vicuna\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u6a21\u62df\u8bc4\u8bba\uff0c\u4ee5\u8bc4\u4f30\u6240\u63d0\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bc6\u522b\u6587\u672c\u539f\u521b\u6765\u6e90\u662f\u53ef\u884c\u7684\uff0c\u51c6\u786e\u7387\u8fbe\u523096.86%\u3002\u6211\u4eec\u7684\u5de5\u4f5
c\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u672c\u8bc6\u522b\u65b9\u9762\u7684\u6027\u80fd\u4e0e\u5c40\u9650\u6027\u7814\u7a76\uff0c\u8fd9\u5bf9\u4e8e\u672a\u6765\u6709\u6548\u7ba1\u7406\u6b64\u7c7b\u6a21\u578b\u4ee5\u53ca\u786e\u4fdd\u4eba\u7c7b\u521b\u4f5c\u5185\u5bb9\u7684\u5b8c\u6574\u6027\u548c\u771f\u5b9e\u6027\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2405.21075": "|**2024-05-31**|**Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis**|Chaoyou Fu et.al.|[2405.21075](http://arxiv.org/abs/2405.21075)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u8ffd\u6c42\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u6210\u4e3a\u8fd1\u671f\u8fdb\u6b65\u7684\u6838\u5fc3\u3002\u7136\u800c\uff0c\u5bf9\u5b83\u4eec\u5904\u7406\u5e8f\u5217\u89c6\u89c9\u6570\u636e\u7684\u80fd\u529b\u7684\u5173\u6ce8\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51faVideo-MME\uff0c\u8fd9\u662f\u9996\u4e2a\u5168\u9762\u8bc4\u4f30MLLMs\u5728\u89c6\u9891\u5206\u6790\u6027\u80fd\u7684\u591a\u6a21\u6001\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u6709\u56db\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u89c6\u9891\u7c7b\u578b\u591a\u6837\uff0c\u6db5\u76d66\u4e2a\u4e3b\u8981\u89c6\u89c9\u9886\u57df\u548c30\u4e2a\u5b50\u9886\u57df\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u5e94\u7528\u573a\u666f\u6cdb\u5316\u80fd\u529b\uff1b2\uff09\u65f6\u95f4\u7ef4\u5ea6\u7684\u8de8\u5ea6\uff0c\u5305\u62ec\u77ed\u3001\u4e2d\u3001\u957f\u671f\u89c6\u9891\uff0c\u4ece11\u79d2\u52301\u5c0f\u65f6\uff0c\u4ee5\u68c0\u9a8c\u6a21\u578b\u5bf9\u590d\u6742\u60c5\u5883\u52a8\u6001\u7684\u9002\u5e94\u6027\uff1b3\uff09\u6570\u636e\u6a21\u6001\u7684\u5e7f\u5ea6\uff0c\u7ed3\u5408\u89c6\u9891\u5e27\u4ee5\u5916\u7684\u591a\u79cd\u8f93\u5165\uff0c\u5982\u5b57\u5e55\u548c\u97f3\u9891\uff0c\u63ed\u793aMLLMs\u7684\u5168\u65b9\u4f4d\u80fd\u529b\uff1b4\uff09\u9ad8\u8d28\u91cf\u76
84\u6807\u6ce8\uff0c\u7531\u4e13\u5bb6\u4e25\u683c\u624b\u52a8\u6807\u8bb0\uff0c\u4ee5\u4fdd\u8bc1\u7cbe\u786e\u4e14\u53ef\u9760\u7684\u6a21\u578b\u8bc4\u4f30\u3002\u6211\u4eec\u7cbe\u5fc3\u6311\u9009\u5e76\u624b\u52a8\u6ce8\u89e3\u4e86900\u6bb5\u89c6\u9891\uff0c\u603b\u65f6\u957f\u8fbe\u5230256\u5c0f\u65f6\uff0c\u751f\u6210\u4e862,700\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u901a\u8fc7Video-MME\uff0c\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u7cfb\u5217\u3001Gemini 1.5 Pro\u5728\u5185\u7684\u591a\u4e2a\u6700\u5148\u8fdb\u7684MLLM\uff0c\u4ee5\u53ca\u5f00\u6e90\u56fe\u50cf\u6a21\u578bInternVL-Chat-V1.5\u548c\u89c6\u9891\u6a21\u578bLLaVA-NeXT-Video\u8fdb\u884c\u4e86\u6df1\u5165\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cGemini 1.5 Pro\u662f\u8868\u73b0\u6700\u4f73\u7684\u5546\u4e1a\u6a21\u578b\uff0c\u660e\u663e\u4f18\u4e8e\u5f00\u6e90\u6a21\u578b\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u53d1\u73b0\u5f3a\u8c03\u4e86\u6539\u8fdb\u5904\u7406\u66f4\u957f\u5e8f\u5217\u548c\u591a\u6a21\u6001\u6570\u636e\u7684\u5fc5\u8981\u6027\u3002\u9879\u76ee\u7f51\u9875\u94fe\u63a5\uff1ahttps://video-mme.github.io|\n", "2405.21047": "|**2024-05-31**|**Grammar-Aligned Decoding**|Kanghee Park 
et.al.|[2405.21047](http://arxiv.org/abs/2405.21047)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8f93\u51fa\u65f6\u9762\u4e34\u6311\u6218\uff0c\u5982\u7a0b\u5e8f\u4ee3\u7801\u3001\u6570\u5b66\u516c\u5f0f\u6216\u89c4\u8303\u7684\u6807\u8bb0\u3002\u7ea6\u675f\u89e3\u7801\u65b9\u6cd5\u901a\u8fc7\u9650\u5236\u6bcf\u6b21\u8f93\u51fa\u53ef\u80fd\u7684\u4ee4\u724c\uff0c\u786e\u4fdd\u8f93\u51fa\u7b26\u5408\u7279\u5b9a\u89c4\u5219\u6765\u7f13\u89e3\u8fd9\u4e2a\u95ee\u9898\uff0c\u4f8b\u5982\u5728\u8bed\u6cd5\u7ea6\u675f\u89e3\u7801\uff08GCD\uff09\u4e2d\uff0cLLM\u7684\u8f93\u51fa\u5fc5\u987b\u9075\u5faa\u7ed9\u5b9a\u7684\u8bed\u6cd5\u89c4\u5219\u3002\u7136\u800c\uff0c\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u79cd\u7ea6\u675f\u89e3\u7801\u53ef\u80fd\u4f1a\u626d\u66f2\u6a21\u578b\u7684\u5206\u5e03\uff0c\u5bfc\u81f4\u751f\u6210\u7684\u8f93\u51fa\u867d\u7136\u8bed\u6cd5\u6b63\u786e\uff0c\u4f46\u5176\u6982\u7387\u5e76\u4e0d\u76f4\u63a5\u53cd\u6620LLM\u672c\u8eab\u7684\u6982\u7387\u5206\u914d\uff0c\u4ece\u800c\u8d28\u91cf\u4e0d\u9ad8\u3002\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u4e0e\u8bed\u6cd5\u7ea6\u675f\u5bf9\u9f50\u7684\u89e3\u7801\u201d\uff08Grammar-Aligned Decoding\uff0cGAD\uff09\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u81ea\u9002\u5e94\u91c7\u6837\u4e0e\u8fd1\u4f3c\u671f\u671b\u672a\u6765\u201d\uff08Adaptive Sampling with Approximate Expected Futures\uff0cASAp\uff09\u7684\u89e3\u7801\u7b97\u6cd5\u3002 
ASAp\u7b97\u6cd5\u65e8\u5728\u4fdd\u8bc1\u8f93\u51fa\u7684\u8bed\u6cd5\u6027\uff0c\u5e76\u7406\u8bba\u4e0a\u4ea7\u751f\u4e0eLLM\u5728\u7ed9\u5b9a\u8bed\u6cd5\u7ea6\u675f\u6761\u4ef6\u4e0b\u7684\u6761\u4ef6\u6982\u7387\u76f8\u7b26\u7684\u7ed3\u679c\u3002\u8be5\u7b97\u6cd5\u5229\u7528\u5148\u524d\u7684\u6837\u672c\u8f93\u51fa\u6765\u7a33\u5065\u5730\u4f30\u7b97\u4e0d\u540c\u8f93\u51fa\u524d\u7f00\u7684\u672a\u6765\u8bed\u6cd5\u53ef\u80fd\u6027\u3002\u6211\u4eec\u5728\u4ee3\u7801\u751f\u6210\u548c\u7ed3\u6784\u5316\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cASAp\u7ecf\u5e38\u80fd\u591f\u751f\u6210\u6bd4\u73b0\u6709GCD\u6280\u672f\u66f4\u7b26\u5408LLM\u5206\u5e03\u4e14\u4ecd\u9075\u5b88\u6240\u9700\u8bed\u6cd5\u9650\u5236\u7684\u8f93\u51fa\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u6574\u4f53\u8d28\u91cf\u3002|\n", "2405.21040": "|**2024-05-31**|**Direct Alignment of Language Models via Quality-Aware Self-Refinement**|Runsheng Yu et.al.|[2405.21040](http://arxiv.org/abs/2405.21040)|null|\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u884c\u4e3a\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u7684\u5e38\u7528\u65b9\u6cd5\u3002\u6700\u8fd1\uff0c\u76f4\u63a5\u7b56\u7565\u4f18\u5316\uff08DPO\uff09\u4f5c\u4e3a\u4e00\u79cd\u66ff\u4ee3\u65b9\u6848\u5174\u8d77\uff0c\u5b83\u4e0d\u518d\u4f9d\u8d56LLM\u5956\u52b1\u6a21\u578b\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u989d\u5916\u7684\u5185\u5b58\u548c\u8bad\u7ec3\u65f6\u95f4\u3002\u7136\u800c\uff0cDPO\u5ffd\u89c6\u4e86\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u76f8\u5bf9\u8d28\u91cf\uff0c\u53ef\u80fd\u5bfc\u81f4\u8bad\u7ec3\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u5229\u7528LLM\u5185\u90e8\u77e5\u8bc6\u5728\u5373\u65f6\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5e76\u4f18\u5316\u635f\u5931\u51fd\u6570
\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7ec6\u5316\u51fd\u6570\uff0c\u5229\u7528LLM\u7684\u77e5\u8bc6\u6765\u4f30\u8ba1\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u54c1\u8d28\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u8f7b\u5ea6\u5047\u8bbe\u4e0b\uff0c\u6784\u5efa\u7684\u7ec6\u5316\u51fd\u6570\u80fd\u591f\u5e2e\u52a9\u81ea\u6211\u8c03\u6574\u635f\u5931\u51fd\u6570\u3002\u6211\u4eec\u5c06\u8fd9\u4e2a\u7ec6\u5316\u529f\u80fd\u6574\u5408\u5230DPO\u53ca\u5176\u53d8\u4f53\u8eab\u4efd\u7b56\u7565\u4f18\u5316\uff08IPO\uff09\u4e2d\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0c\u8fd9\u4e9b\u6539\u8fdb\u540e\u7684\u6a21\u578b\u5728\u5404\u79cd\u8bc4\u4f30\u8005\u4e0a\u8868\u73b0\u51fa\u4f18\u4e8eDPO\u548cIPO\u7684\u6027\u80fd\u3002|\n", "2405.21030": "|**2024-05-31**|**Standards for Belief Representations in LLMs**|Daniel A. Herrmann et.al.|[2405.21030](http://arxiv.org/abs/2405.21030)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u975e\u51e1\u80fd\u529b\uff0c\u8ba1\u7b97\u673a\u79d1\u5b66\u5bb6\u4eec\u6b63\u5728\u5bfb\u6c42\u7406\u89e3\u5b83\u4eec\u7684\u8ba4\u77e5\u8fc7\u7a0b\uff0c\u7279\u522b\u662f\u5173\u4e8eLLMs\u5982\u4f55\uff08\u5982\u679c\u6709\u7684\u8bdd\uff09\u5185\u90e8\u6784\u5efa\u5bf9\u4e16\u754c\u7684\u4fe1\u5ff5\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u7f3a\u4e4f\u4e00\u4e2a\u7edf\u4e00\u7684\u7406\u8bba\u6846\u67b6\u6765\u652f\u6491\u5bf9LLM\u4e2d\u4fe1\u5ff5\u7684\u7814\u7a76\u3002\u672c\u6587\u8bd5\u56fe\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86\u4e00\u5957\u6761\u4ef6\uff0c\u4f7fLLM\u4e2d\u7684\u8868\u793a\u80fd\u591f\u88ab\u89c6\u4e3a\u4fe1\u5ff5\u4f3c\u7684\u3002\u6211\u4eec\u6307\u51fa\uff0c\u5c3d\u7ba1\u5728LLMs\u4e2d\u6d4b\u91cf\u4fe1\u5ff5\u7684\u9879\u76ee\u4e0e\u51b3\u7b56\u7406\u8bba\u548c\u5f62\u5f0f\u8ba4\u8bc6\u8bba\u4e2d\u7684\u4fe1\u5ff5\u6d4b\u91cf\u5728\u8bb8\u591a\u65b9\u9762\u6709\u76f8\u4f3c\u4e4b\u5904\uff0c\u4f46\u4e5f\u5b58\u5728\u5dee\u5f
02\uff0c\u8fd9\u4e9b\u5dee\u5f02\u5e94\u5f71\u54cd\u6211\u4eec\u7684\u6d4b\u91cf\u65b9\u6cd5\u3002\u56e0\u6b64\uff0c\u501f\u9274\u54f2\u5b66\u6d1e\u5bdf\u548c\u673a\u5668\u5b66\u4e60\u7684\u5f53\u4ee3\u5b9e\u8df5\uff0c\u6211\u4eec\u786e\u7acb\u4e86\u56db\u4e2a\u6807\u51c6\uff1a\u51c6\u786e\u6027\u3001\u4e00\u81f4\u6027\u3001\u7edf\u4e00\u6027\u548c\u5b9e\u7528\u6027\u3002\u8fd9\u56db\u4e2a\u6807\u51c6\u7ed3\u5408\u4e86\u7406\u8bba\u8003\u91cf\u4e0e\u5b9e\u9645\u9650\u5236\uff0c\u4e3a\u5168\u9762\u7406\u89e3LLM\u4e2d\u7684\u4fe1\u5ff5\u8868\u793a\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u6211\u4eec\u5f15\u7528\u5b9e\u8bc1\u5de5\u4f5c\u7684\u6210\u679c\uff0c\u63ed\u793a\u4e86\u5355\u72ec\u4f7f\u7528\u67d0\u4e9b\u6807\u51c6\u65f6\u8bc6\u522b\u4fe1\u5ff5\u8868\u793a\u7684\u5c40\u9650\u6027\u3002|\n", "2405.21028": "|**2024-05-31**|**LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models**|Elias Stengel-Eskin et.al.|[2405.21028](http://arxiv.org/abs/2405.21028)|**[link](https://github.com/esteng/pragmatic_calibration)**|**\u5f53\u56de\u7b54\u95ee\u9898\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u4e0d\u4ec5\u80fd\u63d0\u4f9b\u7b54\u6848\uff0c\u8fd8\u80fd\u4f20\u8fbe\u5bf9\u7b54\u6848\u6b63\u786e\u6027\u7684\u4fe1\u5fc3\u7a0b\u5ea6\u3002\u8fd9\u5305\u62ec\u660e\u786e\u7684\u5206\u6570\u6807\u8bb0\uff0c\u5982\u7ed9\u51fa\u6570\u5b57\uff0c\u4ee5\u53ca\u9690\u542b\u7684\u4fe1\u5fc3\u6807\u5fd7\uff0c\u5982\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u989d\u5916\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u5f53\u524d\u5927\u591a\u6570\u6a21\u578b\u5f80\u5f80\u8fc7\u4e8e\u81ea\u4fe1\u3002\u4e3a\u4e86\u6821\u51c6\u8fd9\u4e9b\u4fe1\u5fc3\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9e\u7528\u7684\u3001\u8003\u8651\u542c\u4f17\u7684\u5fae\u8c03\u65b9\u6cd5\uff08LACIE\uff09\uff0c\u5b83\u4e0d\u4ec5\u5173\u6ce8\u7b54\u6848\u662f\u5426\u6b63\u786e\uff0c\u8fd8\u5173\u6ce8\u7b54\u6848\u662f\u5426\u4f1a\u88ab\u542c\u4f17\u63a5\u53d7\u3002\u6211\u4eec\u5c06\u6821\u51c6\
u89c6\u4e3a\u504f\u597d\u4f18\u5316\uff0c\u901a\u8fc7\u53cc\u4ee3\u7406\u6e38\u620f\u521b\u5efa\u6570\u636e\uff0c\u8ba9\u4e00\u4e2a\u6f14\u8bb2\u8005\u6a21\u578b\u7684\u8f93\u51fa\u63a5\u53d7\u6a21\u62df\u542c\u8005\u7684\u8bc4\u5224\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528LACIE\u5bf9\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff08Mistral-7B\u3001Llama3-8B\u548cLlama3-70B\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u663e\u793a\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6a21\u62df\u542c\u8005\u9762\u524d\u6709\u66f4\u597d\u7684\u6821\u51c6\u3002\u91cd\u8981\u7684\u662f\uff0c\u8fd9\u4e9b\u8d8b\u52bf\u4e5f\u9002\u7528\u4e8e\u4eba\u7c7b\u542c\u4f17\uff0c\u5e2e\u52a9\u4ed6\u4eec\u66f4\u51c6\u786e\u5730\u9884\u6d4b\u6a21\u578b\u7684\u6b63\u786e\u6027\uff1a\u6211\u4eec\u5728\u4eba\u673a\u8bc4\u4f30\u4e2d\u53d1\u73b0\uff0c\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u63a5\u53d7\u7684\u9519\u8bef\u7b54\u6848\u51cf\u5c11\u4e8647%\uff0c\u800c\u6b63\u786e\u7b54\u6848\u7684\u63a5\u53d7\u7387\u4fdd\u6301\u4e0d\u53d8\u3002\u6b64\u5916\uff0cLACIE\u6cdb\u5316\u5230\u53e6\u4e00\u4e2a\u6570\u636e\u96c6\u4e0a\uff0c\u5728\u4f7f\u7528TriviaQA\u8bad\u7ec3\u540e\uff0cTruthfulQA\u4e0a\u7684\u771f\u5b9e\u6027\u5927\u5e45\u63d0\u9ad8\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cLACIE\u5bfc\u81f4\u4e86\u6b63\u786e\u548c\u9519\u8bef\u793a\u4f8b\u4e4b\u95f4\u7684\u4fe1\u5fc3\u5ea6\u66f4\u597d\u5730\u5206\u79bb\u3002\u5b9a\u6027\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u4f1a\u66f4\u52a0\u8c28\u614e\uff0c\u5e76\u5728\u56de\u7b54\u6b63\u786e\u65f6\u901a\u8fc7\u4f7f\u7528\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u7ec6\u8282\u6765\u9690\u6027\u5730\u8868\u793a\u786e\u5b9a\u6027\u3002\u6700\u540e\uff0cLACIE\u5fae\u8c03\u5bfc\u81f4\u6a21\u578b\u5bf9\u4e8e\u53ef\u80fd\u9519\u8bef\u7684\u7b54\u6848\u66f4\u503e\u5411\u4e8e\u653e\u5f03\uff08\u4f8b\u5982\u8bf4\u201c\u6211\u4e0d\u77e5\u9053\u201d\uff09\u3002**|\n", "2405.21018": 
"|**2024-05-31**|**Improved Techniques for Optimization-Based Jailbreaking on Large Language Models**|Xiaojun Jia et.al.|[2405.21018](http://arxiv.org/abs/2405.21018)|**[link](https://github.com/jiaxiaojunqaq/i-gcg)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5176\u5b89\u5168\u6821\u51c6\u6210\u4e3a\u5e7f\u6cdb\u5e94\u7528\u7684\u5173\u952e\u3002\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u7834\u89e3\uff08\u5373\u201cjailbreaking\u201d\uff09\u6d3b\u52a8\u65e5\u76ca\u589e\u591a\uff0c\u5176\u4e2d\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u653b\u51fb\u56e0\u5176\u6210\u6548\u663e\u8457\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0cGCG\u7684\u653b\u51fb\u6548\u7387\u4ecd\u6709\u63d0\u5347\u7a7a\u95f4\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6539\u8fdb\u7684\u4f18\u5316\u57fa\u7ebf\u7834\u89e3\u6280\u672f\uff0c\u4ee5\u63d0\u5347GCG\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u5355\u4e2a\u76ee\u6807\u6a21\u677f\u201cSure\u201d\u6781\u5927\u5730\u9650\u5236\u4e86GCG\u7684\u653b\u51fb\u6548\u679c\uff0c\u56e0\u6b64\u6211\u4eec\u5efa\u8bae\u91c7\u7528\u5305\u542b\u6709\u5bb3\u81ea\u6211\u6697\u793a\u548c/\u6216\u6307\u5bfc\u7684\u591a\u6837\u5316\u76ee\u6807\u6a21\u677f\uff0c\u4ee5\u8bef\u5bfc\u6a21\u578b\u3002\u5728\u4f18\u5316\u7b56\u7565\u4e0a\uff0c\u6211\u4eec\u5efa\u8bae\u5728GCG\u4e2d\u5b9e\u65bd\u81ea\u52a8\u591a\u5750\u6807\u66f4\u65b0\uff0c\u4ee5\u52a0\u901f\u6536\u655b\uff0c\u5e76\u5f15\u5165\u4ece\u7b80\u5355\u5230\u590d\u6742\uff08easy-to-hard\uff09\u7684\u521d\u59cb\u5316\u6280\u5de7\u3002\u5c06\u8fd9\u4e9b\u6539\u8fdb\u6574\u5408\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u9ad8\u6548\u7684\u65b9\u6cd5\u2014\u2014$\\mathcal{I}$-GCG\u3002\u5b9e\u9a8c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5982NeurIPS 2023 
\u7ea2\u961f\u6311\u6218\u4e2d\u8fdb\u884c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6539\u8fdb\u6280\u672f\u80fd\u591f\u5e2e\u52a9GCG\u8d85\u8d8a\u73b0\u6709\u7834\u89e3\u653b\u51fb\uff0c\u5b9e\u73b0\u63a5\u8fd1100%\u7684\u653b\u51fb\u6210\u529f\u7387\u3002\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/jiaxiaojunQAQ/I-GCG\u3002**|\n", "2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985](http://arxiv.org/abs/2405.20985)|**[link](https://github.com/yaolinli/deco)**|\u8be5\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u6295\u5f71\u5668\u6a21\u5757\uff0c\u56e0\u4e3a\u5b83\u4eec\u5728\u8fde\u63a5\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u6001\u3001\u4fc3\u8fdb\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u9762\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u5bf9\u4e8e\u6295\u5f71\u5668\u5728\u89c6\u89c9-\u8bed\u8a00\u5bf9\u9f50\u65b9\u9762\u7684\u6548\u679c\u8bc4\u4f30\u4ecd\u663e\u4e0d\u8db3\uff0c\u901a\u5e38\u53ea\u80fd\u901a\u8fc7\u4e0b\u6e38\u4efb\u52a1\u7684\u6027\u80fd\u95f4\u63a5\u63a8\u65ad\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u901a\u8fc7\u5206\u6790MLLM\u4e2d\u7684\u89c6\u89c9-\u8bed\u8a00\u8bed\u4e49\u6d41\uff0c\u6765\u89e3\u8bfb\u6295\u5f71\u5668\u7684\u5de5\u4f5c\u673a\u5236\u3002 
\u5177\u4f53\u6765\u8bf4\uff0c\u7814\u7a76\u8005\u8ffd\u8e2a\u4ece\u751f\u6210\u7684\u8bed\u8a00\u6807\u8bb0\u5230\u539f\u59cb\u89c6\u89c9\u7f16\u7801\u5757\u4ee5\u53ca\u6295\u5f71\u5668\u4ea7\u751f\u7684\u4e2d\u95f4\u8f93\u51fa\u4e4b\u95f4\u7684\u8bed\u4e49\u76f8\u5173\u6027\u6d41\u3002\u53d1\u73b0\u538b\u7f29\u578b\u6295\u5f71\u5668\uff08\u5982QFormer\uff09\u503e\u5411\u4e8e\u5c06\u89c6\u89c9\u5757\u62bd\u8c61\u6210\u6709\u9650\u7684\u51e0\u4e2a\u6982\u5ff5\uff0c\u5982\u7269\u4f53\u6216\u5c5e\u6027\uff0c\u5bfc\u81f4\u201c\u53cc\u91cd\u62bd\u8c61\u201d\u73b0\u8c61\uff1a\u9996\u5148\uff0c\u6295\u5f71\u5668\u53c2\u7167\u9884\u5b9a\u4e49\u67e5\u8be2\u4ee4\u724c\u8fdb\u884c\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\uff0c\u7136\u540e\uff0c\u57fa\u4e8e\u6587\u672c\u6307\u4ee4\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u4e00\u6b65\u63d0\u53d6\u3002\u8fd9\u79cd\u53cc\u91cd\u62bd\u8c61\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6548\u7387\u4e0d\u9ad8\uff0c\u5e76\u53ef\u80fd\u5bfc\u81f4\u89c6\u89c9\u8bed\u4e49\u4fe1\u606f\u7684\u7d2f\u79ef\u7f3a\u5931\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u7814\u7a76\u63d0\u51fa\u201c\u89e3\u8026\u538b\u7f29\u4e0e\u62bd\u8c61\uff08DeCo\uff09\u201d\u7684\u5173\u952e\u6d1e\u5bdf\uff0c\u5373\u5728\u6295\u5f71\u5c42\u9762\u4e0a\u5c06\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u538b\u7f29\uff0c\u800c\u8ba9\u5927\u8bed\u8a00\u6a21\u578b\u5b8c\u5168\u8d1f\u8d23\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u4eba\u5458\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u538b\u7f29\u5668\u2014\u2014\u4e8c\u7ef4\u81ea\u9002\u5e94\u6c60\u5316\uff0c\u4ee5\u65e0\u53c2\u6570\u7684\u65b9\u5f0f\u964d\u4f4e\u89c6\u89c9\u5757\u7684\u5c3a\u5bf8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDeCo\u5728\u6027\u80fd\u548c\u6548\u7387\u4e0a\u90fd\u4f18\u4e8e\u4f20\u7edf\u7684\u538b\u7f29\u6295\u5f71\u5668\u3002\u5b83\u5728MLLM\u57fa\u51c6\u3001\u89c6\u89c9\u5b9a\u4f4d\u548c\u5f00\u653e\u6027\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e2d\u5206\u522b\u53d6\u5f97\u4e860.9%\u30017.1%\u548c2.9%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u62e5\u6709\u66f4\u5c11\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u66f4\u5feb\u7684\u6536\u655b\u901f\u5ea6\u3002|\n", "2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang 
et.al.|[2405.20978](http://arxiv.org/abs/2405.20978)|**[link](https://github.com/calubkk/raat)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5f3a\u5927\u529f\u80fd\uff0c\u4f46\u9762\u4e34\u6311\u6218\uff0c\u5982\u865a\u6784\u3001\u8fc7\u65f6\u77e5\u8bc6\u548c\u96be\u4ee5\u8ffd\u6eaf\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u5b83\u7ed3\u5408\u5916\u90e8\u6570\u636e\u5e93\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u4e0d\u9002\u5f53\u7684\u68c0\u7d22\u6bb5\u843d\u53ef\u80fd\u59a8\u788dLLMs\u751f\u6210\u5168\u9762\u4e14\u9ad8\u8d28\u91cf\u7684\u56de\u7b54\u3002\u5148\u524d\u5173\u4e8eRAG\u4e2d\u68c0\u7d22\u566a\u58f0\u7a33\u5065\u6027\u7684\u7814\u7a76\u5f80\u5f80\u5c40\u9650\u4e8e\u6709\u9650\u7684\u566a\u58f0\u7c7b\u578b\uff0c\u8fd9\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u68c0\u7d22\u73af\u5883\u4e0d\u7b26\uff0c\u9650\u5236\u4e86\u5b9e\u9645\u5e94\u7528\u3002\u672c\u7814\u7a76\u9996\u5148\u63a2\u8ba8\u4e86\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u5c06\u5176\u5206\u4e3a\u4e09\u79cd\u4e0d\u540c\u7684\u7c7b\u522b\uff0c\u53cd\u6620\u771f\u5b9e\u73af\u5883\u3002\u6211\u4eec\u5206\u6790\u4e86\u8fd9\u4e9b\u4e0d\u540c\u7c7b\u578b\u7684\u68c0\u7d22\u566a\u58f0\u5bf9LLMs\u7a33\u5065\u6027\u7684\u5f71\u54cd\u3002 
\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684RAG\u65b9\u6cd5\uff0c\u79f0\u4e3a\u68c0\u7d22\u589e\u5f3a\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\uff08RAAT\uff09\u3002RAAT\u5229\u7528\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\u6765\u52a8\u6001\u8c03\u6574\u6a21\u578b\u7684\u8bad\u7ec3\u6d41\u7a0b\u4ee5\u5e94\u5bf9\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u91c7\u7528\u591a\u4efb\u52a1\u5b66\u4e60\u786e\u4fdd\u6a21\u578b\u80fd\u591f\u8bc6\u522b\u5608\u6742\u7684\u4e0a\u4e0b\u6587\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u5404\u79cd\u566a\u58f0\u6761\u4ef6\u4e0b\uff0c\u4f7f\u7528RAAT\u8bad\u7ec3\u7684LLaMA-2 7B\u6a21\u578b\u5728F1\u548cEM\u5206\u6570\u4e0a\u663e\u793a\u51fa\u663e\u8457\u63d0\u5347\u3002\u4e3a\u4e86\u4fbf\u4e8e\u590d\u73b0\uff0c\u6211\u4eec\u5df2\u5728https://github.com/calubkk/RAAT\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u3002|\n", "2405.20974": "|**2024-05-31**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu 
et.al.|[2405.20974](http://arxiv.org/abs/2405.20974)|**[link](https://github.com/xu1868/sayself)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u51c6\u786e\u6216\u865a\u5047\u7684\u4fe1\u606f\uff0c\u5e76\u4e14\u901a\u5e38\u65e0\u6cd5\u8868\u660e\u5176\u4fe1\u5fc3\u6c34\u5e73\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u76f4\u63a5\u63d0\u793a\u6216\u81ea\u6211\u4e00\u81f4\u6027\u63d0\u793a\u6765\u63d0\u53d6LLMs\u7684\u4fe1\u5fc3\uff0c\u6216\u8005\u6784\u5efa\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\u3002\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6027\u80fd\u8f83\u5dee\uff0c\u800c\u57fa\u4e8e\u8bad\u7ec3\u7684\u65b9\u6cd5\u53c8\u5c40\u9650\u4e8e\u4e8c\u5143\u6216\u4e0d\u7cbe\u786e\u7684\u6574\u4f53\u4fe1\u5fc3\u4f30\u8ba1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5148\u8fdb\u7684\u65b9\u6cd5\u2014\u2014SaySelf\uff0c\u8fd9\u662f\u4e00\u4e2a\u8bad\u7ec3\u6846\u67b6\uff0c\u65e8\u5728\u6559\u5bfcLLMs\u63d0\u4f9b\u66f4\u7cbe\u786e\u7684\u7ec6\u7c92\u5ea6\u4fe1\u5fc3\u4f30\u8ba1\u3002 
\u6b64\u5916\uff0cSaySelf\u8fd8\u63a8\u52a8LLMs\u751f\u6210\u81ea\u6211\u53cd\u601d\u7684\u89e3\u91ca\uff0c\u660e\u786e\u6307\u51fa\u5b83\u4eec\u5728\u53c2\u6570\u77e5\u8bc6\u4e0a\u7684\u7a7a\u767d\u5e76\u89e3\u91ca\u4e0d\u786e\u5b9a\u6027\u3002\u8fd9\u662f\u901a\u8fc7\u8ba9LLM\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u81ea\u52a8\u603b\u7ed3\u7279\u5b9a\u77e5\u8bc6\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u6765\u5b9e\u73b0\u7684\u3002\u8fd9\u79cd\u603b\u7ed3\u662f\u57fa\u4e8e\u5bf9\u591a\u4e2a\u91c7\u6837\u63a8\u7406\u94fe\u7684\u4e0d\u4e00\u81f4\u6027\u5206\u6790\uff0c\u751f\u6210\u7684\u6570\u636e\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u6821\u51c6\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5f3a\u5316\u5b66\u4e60\uff0c\u5956\u52b1\u51c6\u786e\u3001\u9ad8\u7f6e\u4fe1\u5ea6\u7684\u9884\u6d4b\uff0c\u540c\u65f6\u60e9\u7f5a\u9519\u8bef\u8f93\u51fa\u4e2d\u7684\u8fc7\u5ea6\u81ea\u4fe1\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u5728\u5206\u5e03\u5185\u8fd8\u662f\u5206\u5e03\u5916\u7684\u6570\u636e\u96c6\u4e0a\uff0cSaySelf\u90fd\u80fd\u6709\u6548\u51cf\u5c11\u4fe1\u5fc3\u6821\u51c6\u8bef\u5dee\uff0c\u540c\u65f6\u4fdd\u6301\u4efb\u52a1\u6027\u80fd\u3002\u751f\u6210\u7684\u81ea\u6211\u53cd\u601d\u7406\u7531\u4e5f\u88ab\u8bc1\u660e\u662f\u5408\u7406\u7684\uff0c\u80fd\u8fdb\u4e00\u6b65\u4fc3\u8fdb\u6821\u51c6\u3002\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/xu1868/SaySelf\u3002**|\n", "2405.20973": "|**2024-05-31**|**LCQ: Low-Rank Codebook based Quantization for Large Language Models**|Wen-Pu Cai et.al.|[2405.20973](http://arxiv.org/abs/2405.20973)|null|
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4f18\u5f02\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u7684\u5b58\u50a8\u548c\u8ba1\u7b97\u6210\u672c\u9ad8\u6210\u4e3a\u90e8\u7f72\u7684\u4e00\u5927\u6311\u6218\u3002\u4e3a\u4e86\u538b\u7f29\u6a21\u578b\u5e76\u964d\u4f4e\u6210\u672c\uff0c\u6743\u91cd\u91cf\u5316\u6280\u672f\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u76ee\u524d\uff0c\u5927\u591a\u6570\u9488\u5bf9LLMs\u7684\u91cf\u5316\u65b9\u6cd5\u4f7f\u7528\u79e9\u4e00\u7801\u672c\uff0c\u7136\u800c\u5728\u9ad8\u538b\u7f29\u6bd4\u4e0b\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u7cbe\u5ea6\u635f\u5931\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6743\u91cd\u91cf\u5316\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4f4e\u79e9\u7801\u672c\u91cf\u5316\uff08LCQ\uff09\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002 LCQ\u91c7\u7528\u4f4e\u79e9\u7801\u672c\u8fdb\u884c\u91cf\u5316\uff0c\u5176\u79e9\u53ef\u4ee5\u5927\u4e8e\u4e00\u3002\u8fd9\u79cd\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u5229\u7528\u66f4\u9ad8\u7684\u79e9\u6765\u4fdd\u6301\u6216\u63d0\u5347\u6a21\u578b\u7684\u7cbe\u5ea6\uff0c\u540c\u65f6\u63a7\u5236\u989d\u5916\u7684\u5b58\u50a8\u5f00\u9500\u51e0\u4e4e\u4e3a\u96f6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u6bd4\uff0cLCQ\u5728\u4fdd\u6301\u826f\u597d\u51c6\u786e\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u80fd\u591f\u5b9e\u73b0\u66f4\u4f18\u7684\u538b\u7f29\u6548\u679c\u3002
\u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4f4e\u79e9\u7801\u672c\u91cf\u5316\u65b9\u6cd5\uff0c\u5b83\u6709\u671b\u5728\u4e0d\u663e\u8457\u589e\u52a0\u5b58\u50a8\u6210\u672c\u7684\u60c5\u51b5\u4e0b\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u6027\u80fd\u548c\u6548\u7387\uff0c\u4e3a\u9ad8\u6548\u90e8\u7f72\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u65b0\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550](http://arxiv.org/abs/2406.02550)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|**\u8fd9\u7bc7\u5de5\u4f5c\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7ec4\u6a21\u7b97\u672f\u4efb\u52a1\u4e2d\u51fa\u73b0\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u548c\u6280\u80fd\u7ec4\u5408\u73b0\u8c61\u3002\u6211\u4eec\u5173\u6ce8\u7684\u662f\u6709\u9650\u6570\u91cf\u7684\u7ebf\u6027\u6a21\u8fd0\u7b97\u51fd\u6570 $z = a \\times x + b \\times y \\;(\\text{mod}\\; p)$\uff0c\u8fd9\u4e9b\u51fd\u6570\u7531\u5411\u91cf $(a, b) \\in \\mathbb{Z}_p^2$ 
\u6807\u8bb0\u3002\u90e8\u5206\u4efb\u52a1\u88ab\u7528\u4f5c\u9884\u8bad\u7ec3\uff0c\u5176\u4f59\u7528\u4e8e\u5206\u5e03\u5916\u6d4b\u8bd5\u3002\u5b9e\u9a8c\u8868\u660e\uff0cGPT\u98ce\u683c\u7684Transformer\u968f\u7740\u9884\u8bad\u7ec3\u4efb\u52a1\u6570\u91cf\u589e\u52a0\uff0c\u5176\u5728\u5206\u5e03\u5185\u548c\u5206\u5e03\u5916\u7684\u6cdb\u5316\u80fd\u529b\u4f1a\u7ecf\u5386\u8f6c\u53d8\u3002\u6700\u5c0f\u578b\u80fd\u5b9e\u73b0\u5206\u5e03\u5916\u6cdb\u5316\u7684\u6a21\u578b\u9700\u8981\u4e24\u4e2aTransformer\u5757\uff1b\u800c\u5bf9\u4e8e\u66f4\u6df1\u7684\u6a21\u578b\uff0c\u5206\u5e03\u5916\u6cdb\u5316\u9636\u6bb5\u662f\u201c\u77ac\u6001\u201d\u7684\uff0c\u9700\u8981\u65e9\u671f\u505c\u6b62\u3002\u6700\u540e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\u8fdb\u884c\u4e86\u53ef\u89e3\u91ca\u6027\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e24\u79cd\u9636\u6bb5\u4e2d\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8868\u793a\uff0c\u5e76\u8ba8\u8bba\u4e86\u5b66\u4e60\u5230\u7684\u7b97\u6cd5\u3002**|\n", "2406.02547": "|**2024-06-04**|**Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning**|Alex Jinpeng Wang et.al.|[2406.02547](http://arxiv.org/abs/2406.02547)|**[link](https://github.com/showlab/VisInContext)**|**\u8fd9\u6bb5\u7814\u7a76\u5e76\u672a\u4ecb\u7ecd\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u800c\u662f\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u63d0\u5347\u957f\u5e8f\u5217\u5728\u591a\u6a21\u6001\u6a21\u578b\u4e2d\u7684\u5904\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201cVisualized In-Context Text 
Processing\u201d\uff08VisInContext\uff09\u6280\u672f\uff0c\u901a\u8fc7\u89c6\u89c9\u4ee4\u724c\u6765\u5904\u7406\u957f\u6587\u672c\uff0c\u4ece\u800c\u663e\u8457\u964d\u4f4eGPU\u5185\u5b58\u4f7f\u7528\u548c\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u9700\u6c42\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8e\u4e00\u4e2a560\u4ebf\u53c2\u6570\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5c06\u9884\u8bad\u7ec3\u4e2d\u7684\u4e0a\u4e0b\u6587\u6587\u672c\u957f\u5ea6\u6269\u5c55\u5230\u4e862048\u4e2atokens\uff0c\u800c\u8ba1\u7b97\u91cf\u51e0\u4e4e\u4fdd\u6301\u4e0d\u53d8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528VisInContext\u8bad\u7ec3\u7684\u6a21\u578b\u5728\u5e38\u89c1\u7684\u57fa\u4e8e\u5b9e\u4f8b\u7684\u5c11\u91cf\u6570\u636e\u8bc4\u4f30\u4e0b\u6e38\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0cVisInContext\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u7ed3\u5408\uff0c\u80fd\u589e\u5f3a\u5bf9\u6587\u6863\u7684\u7406\u89e3\u80fd\u529b\uff0c\u7279\u522b\u9002\u7528\u4e8e\u6587\u6863\u95ee\u7b54\u548c\u8fde\u7eed\u6587\u6863\u68c0\u7d22\uff0c\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002**|\n", "2406.02543": "|**2024-06-04**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori 
et.al.|[2406.02543](http://arxiv.org/abs/2406.02543)|null|\u6211\u4eec\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u91cf\u5316\uff0c\u76ee\u6807\u662f\u8bc6\u522b\u5bf9\u7ed9\u5b9a\u67e5\u8be2\u7684\u54cd\u5e94\u65f6\u7684\u4e0d\u786e\u5b9a\u6027\u7a0b\u5ea6\u3002\u6211\u4eec\u540c\u65f6\u8003\u8651\u4e86\u4e24\u79cd\u7c7b\u578b\u7684\u4e0d\u786e\u5b9a\u6027\uff1a\u4e00\u79cd\u662f\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\uff08\u4f8b\u5982\u5bf9\u4e8b\u5b9e\u6216\u8bed\u8a00\u771f\u7406\u7684\u672a\u77e5\uff09\uff0c\u53e6\u4e00\u79cd\u662f\u4e0d\u53ef\u6d88\u9664\u7684\u968f\u673a\u6027\uff08\u5982\u53ef\u80fd\u7684\u7b54\u6848\u591a\u6837\u6027\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4fe1\u606f\u8bba\u6307\u6807\uff0c\u80fd\u591f\u53ef\u9760\u5730\u533a\u5206\u51fa\u53ea\u6709\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\u8f83\u5927\u7684\u60c5\u51b5\uff0c\u8fd9\u65f6\u6a21\u578b\u7684\u8f93\u51fa\u662f\u4e0d\u53ef\u9760\u7684\u3002\u8fd9\u4e2a\u6761\u4ef6\u4ec5\u4f9d\u8d56\u4e8e\u901a\u8fc7\u7279\u6b8a\u8fed\u4ee3\u63d0\u793a\u57fa\u4e8e\u5148\u524d\u54cd\u5e94\u5f97\u5230\u7684\u6a21\u578b\u8f93\u51fa\u6765\u8ba1\u7b97\u3002\u8fd9\u79cd\u91cf\u5316\u65b9\u6cd5\u53ef\u4ee5\u68c0\u6d4b\u5355\u7b54\u548c\u591a\u7b54\u60c5\u51b5\u4e0b\u662f\u5426\u5b58\u5728\u865a\u6784\uff08\u5373\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\u9ad8\uff09\u7684\u60c5\u51b5\uff0c\u8fd9\u4e0e\u8bb8\u591a\u6807\u51c6\u7684\u4e0d\u786e\u5b9a\u6027\u91cf\u5316\u7b56\u7565\uff08\u5982\u4ee5\u54cd\u5e94\u7684\u5bf9\u6570\u4f3c\u7136\u6027\u4f5c\u4e3a\u9608\u503c\uff09\u4e0d\u540c\uff0c\u540e\u8005\u65e0\u6cd5\u8bc6\u522b\u591a\u7b54\u60c5\u51b5\u4e0b\u7684\u865a\u6784\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u7cfb\u5217\u5b9e\u9a8c\uff0c\u5c55\u793a\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86LLM\u5982\u4f55\u901a\u8fc7\u8fed\u4ee3\u63d0\u793a\u653e\u5927\u5bf9\u7ed9\u5b9a\u8f93\u51fa\u7684\u6982\u7387\u5206\u914d\uff0c\u8fd9\u53ef\u80fd\u5177\u6709\u72ec\u7acb\u7684\u5174\u8da3\u4ef7\u503c\u3002|\n", "2406.02542": "|**2024-06-04**|**Loki: Low-Rank Keys for Efficient Sparse Attention**|Prajwal Singhania et.al.|[2406.02542](http://arxiv.org/abs/2406.02542)|null|\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\uff0c\u7279\u522b\u662f\u5f53\u4f7f\u7528\u957f\u5e8f\u5217\u65f6\uff0c\u81ea\u6ce8\u610f\u529b\u673a\u5236\u662f\u4e3b\u8981\u5f00\u9500\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u4e9b\u7a00\u758f\u6ce8\u610f\u529b\u8fd1\u4f3c\u65b9\u6cd5\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5206\u6790\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u5757\u4e2d\u7684\u952e\u5411\u91cf\u5b9e\u9645\u4e0a\u5904\u4e8e\u4e00\u4e2a\u8fdc\u4f4e\u4e8e\u539f\u59cb\u7ef4\u5ea6\u7684\u7a7a\u95f4\u3002\u8fd9\u4e00\u89c2\u5bdf\u4fc3\u4f7f\u6211\u4eec\u63d0\u51faLoki\uff0c\u4e00\u79cd\u65b0\u7684\u7a00\u758f\u6ce8\u610f\u529b\u65b9\u6cd5\u3002Loki\u6839\u636e\u5728\u4f4e\u7ef4\u7a7a\u95f4\u8ba1\u7b97\u7684\u6ce8\u610f\u529b\u5f97\u5206\uff0c\u5bf9KV\u7f13\u5b58\u4e2d\u7684\u4ee4\u724c\u8fdb\u884c\u6392\u5e8f\u548c\u9009\u62e9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLoki\u80fd\u591f\u6bd4\u5176\u4ed6\u6d41\u884c\u8fd1\u4f3c\u65b9\u6cd5\u66f4\u597d\u5730\u4fdd\u6301\u6a21\u578b\u7684\u6548\u80fd\uff0c\u540c\u65f6\u7531\u4e8e\u51cf\u5c11\u4e86\u6570\u636e\u79fb\u52a8\uff08\u52a0\u8f7d/\u5b58\u50a8\uff09\u548c\u8ba1\u7b97\u6210\u672c\uff0c\u52a0\u901f\u4e86\u6ce8\u610f\u529b\u8ba1\u7b97\u3002|\n", "2406.02539": "|**2024-06-04**|**Parrot: 
Multilingual Visual Instruction Tuning**|Hai-Long Sun et.al.|[2406.02539](http://arxiv.org/abs/2406.02539)|**[link](https://github.com/aidc-ai/parrot)**|\u968f\u7740GPT-4V\u7b49\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u4eba\u5de5\u667a\u80fd\u671d\u7740\u901a\u7528\u4eba\u5de5\u667a\u80fd\u8fc8\u51fa\u4e86\u91cd\u8981\u4e00\u6b65\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6765\u540c\u6b65\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u800c\u8d4b\u4e88\u5b83\u4eec\u591a\u6a21\u6001\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u505a\u6cd5\u53ef\u80fd\u5bfc\u81f4\u968f\u7740\u8bad\u7ec3\u7684\u8fdb\u884c\uff0c\u8bed\u8a00\u6a21\u578b\u5904\u7406\u591a\u79cd\u8bed\u8a00\u7684\u80fd\u529b\u9010\u6e10\u51cf\u5f31\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684\u4e0d\u5e73\u8861SFT\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u975e\u82f1\u8bed\u8bed\u8a00\u6027\u80fd\u663e\u8457\u4e0b\u964d\uff0c\u539f\u56e0\u5728\u4e8eSFT\u8fc7\u7a0b\u4e2d\u672a\u80fd\u6709\u6548\u8fde\u63a5\u89c6\u89c9\u7f16\u7801\u5668\u548c\u591a\u8bed\u8a00\u4ee4\u724c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faParrot\uff0c\u4e00\u79cd\u5229\u7528\u6587\u672c\u5f15\u5bfc\u5728\u8bed\u8a00\u5c42\u9762\u9a71\u52a8\u89c6\u89c9\u4ee4\u724c\u5bf9\u9f50\u7684\u65b0\u65b9\u6cd5\u3002Parrot\u901a\u8fc7\u8ba9\u89c6\u89c9\u4ee4\u724c\u6839\u636e\u4e0d\u540c\u7684\u8bed\u8a00\u8f93\u5165\u8fdb\u884c\u6761\u4ef6\u5316\uff0c\u5e76\u501f\u52a9\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u4fc3\u8fdb\u591a\u8bed\u8a00\u4ee4\u724c\u7684\u5bf9\u9f50\u3002\u7279\u522b\u662f\uff0c\u4e3a\u4e86\u589e\u5f3a\u975e\u82f1\u8bed\u89c6\u89c9\u4ee4\u724c\u7684\u5bf9\u9f50\uff0c\u6211\u4eec\u8ba1\u7b97\u521d\u59cb\u89c6\u89c9\u7279\u5f81\u4e0e\u6587\u672c\u5d4c\u5165\u4e4b\u95f4\u7684\u8de8\u6ce8\u610f\u529b\uff0c\u7136\u540e\u5c06\u5176\u8f93\u5165\u5230MoE\u8def\u7531\u
5668\uff0c\u9009\u62e9\u6700\u76f8\u5173\u7684\u4e13\u5bb6\u3002\u9009\u5b9a\u7684\u4e13\u5bb6\u4f1a\u5c06\u521d\u59cb\u89c6\u89c9\u4ee4\u724c\u8f6c\u5316\u4e3a\u7279\u5b9a\u8bed\u8a00\u7684\u89c6\u89c9\u4ee4\u724c\u3002\u9274\u4e8e\u76ee\u524d\u7f3a\u4e4f\u8bc4\u4f30\u591a\u8bed\u8a00\u80fd\u529b\u7684\u6807\u51c6\u57fa\u51c6\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u5e76\u516c\u5f00\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u591a\u6a21\u6001\u57fa\u51c6\uff08MMMB\uff09\uff0c\u5305\u62ec6\u79cd\u8bed\u8a00\u300115\u4e2a\u7c7b\u522b\u548c12,000\u4e2a\u95ee\u9898\u3002Parrot\u4e0d\u4ec5\u5728\u591a\u8bed\u8a00MMBench\u548cMMMB\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8fd8\u5728\u5e7f\u6cdb\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u5c06\u63d0\u4f9bParrot\u7684\u6e90\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\u96c6\u4f9b\u516c\u4f17\u4f7f\u7528\u3002|\n", "2406.02536": "|**2024-06-04**|**Mitigate Position Bias in Large Language Models via Scaling a Single Dimension**|Yijiong Yu 
et.al.|[2406.02536](http://arxiv.org/abs/2406.02536)|**[link](https://github.com/PositionalHidden/PositionalHidden)**|\u8fd9\u7bc7\u8bba\u6587\u4e3b\u8981\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u4e00\u4e2a\u73b0\u8c61\u2014\u2014\u4f4d\u7f6e\u504f\u89c1\uff0c\u4e5f\u79f0\u4e3a\"\u8ff7\u5931\u5728\u4e2d\u95f4\"\u3002\u8fd9\u79cd\u504f\u89c1\u5728\u957f\u6587\u672c\u60c5\u5883\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u5373\u5173\u952e\u4fe1\u606f\u5728\u63d0\u793a\u4e2d\u7684\u4e0d\u540c\u4f4d\u7f6e\u4f1a\u663e\u8457\u5f71\u54cd\u6a21\u578b\u7684\u51c6\u786e\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u6743\u91cd\u662f\u4f4d\u7f6e\u504f\u89c1\u7684\u5fae\u89c2\u8868\u73b0\u3002\u6b64\u5916\uff0c\u8bba\u6587\u6307\u51fa\uff0c\u56e0\u679c\u6ce8\u610f\u529b\u63a9\u7801\u901a\u8fc7\u521b\u5efa\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\uff0c\u4e5f\u5bf9\u4f4d\u7f6e\u504f\u89c1\u6709\u6240\u8d21\u732e\u3002 
\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u51cf\u8f7b\u4f4d\u7f6e\u504f\u89c1\uff0c\u5373\u8c03\u6574\u8fd9\u4e9b\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\u3002\u5b9e\u9a8c\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\uff0c\u5305\u62ec\u81ea\u7136\u95ee\u9898\u591a\u6587\u6863\u95ee\u7b54\u3001\u952e\u503c\u68c0\u7d22\u3001LongBench\u548c\u65f6\u95f4\u7ebf\u91cd\u6392\uff0c\u6d89\u53caRoPE\u6a21\u578b\u3001\u6269\u5c55\u4e0a\u4e0b\u6587\u7a97\u53e3\u6a21\u578b\u548cAlibi\u6a21\u578b\u7b49\u591a\u79cd\u67b6\u6784\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4ec5\u4fee\u6539\u9690\u85cf\u72b6\u6001\u7684\u4e00\u4e2a\u7ef4\u5ea6\uff0c\u5c31\u80fd\u5b9e\u73b0\u6027\u80fd\u63d0\u5347\uff0c\u6700\u9ad8\u53ef\u8fbe15.2%\u3002\u7814\u7a76\u8005\u8fd8\u63d0\u4f9b\u4e86\u4ee3\u7801\u4f9b\u8fdb\u4e00\u6b65\u4f7f\u7528\uff0c\u4ee3\u7801\u5730\u5740\u4e3a\uff1ahttps://aka.ms/PositionalHidden\u3002|\n", "2406.02532": "|**2024-06-04**|**SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices**|Ruslan Svirschevski 
et.al.|[2406.02532](http://arxiv.org/abs/2406.02532)|**[link](https://github.com/yandex-research/specexec)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u9ad8\u6548\u8fd0\u884c\u5b83\u4eec\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u63a8\u6d4b\u6027\u89e3\u7801\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u901f\u5ea6\u63d0\u5347\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u662f\u9488\u5bf9\u6570\u636e\u4e2d\u5fc3\u786c\u4ef6\u8fdb\u884c\u8bbe\u8ba1\u3002\u672c\u7814\u7a76\u53cd\u95ee\uff1a\u6211\u4eec\u80fd\u5728\u6d88\u8d39\u7ea7\u8bbe\u5907\u4e0a\u591a\u5feb\u5730\u8fd0\u884cLLMs\uff1f\u6d88\u8d39\u8005\u7ea7GPU\u5df2\u65e0\u6cd5\u5bb9\u7eb3\u6700\u5927\u7684\u6a21\u578b\uff08500\u4ebf\u53c2\u6570\u4ee5\u4e0a\uff09\uff0c\u56e0\u6b64\u9700\u8981\u5c06\u53c2\u6570\u5378\u8f7d\u5230RAM\u6216SSD\u3002\u5f53\u4f7f\u7528\u5378\u8f7d\u53c2\u6570\u7684\u65b9\u5f0f\u8fd0\u884c\u65f6\uff0c\u63a8\u7406\u5f15\u64ce\u53ef\u4ee5\u540c\u65f6\u5904\u7406\u6570\u767e\u4e43\u81f3\u6570\u5343\u4e2a\u4ee4\u724c\u7684\u6279\u6b21\uff0c\u4f7f\u5176\u975e\u5e38\u9002\u5408\u63a8\u6d4b\u6027\u89e3\u7801\u3002\u6211\u4eec\u63d0\u51faSpecExec\uff08\u63a8\u6d4b\u6027\u6267\u884c\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u7684\u5e76\u884c\u89e3\u7801\u65b9\u6cd5\uff0c\u9002\u7528\u4e8e\u4e3b\u6d41LLM\u5bb6\u65cf\uff0c\u80fd\u751f\u6210\u6bcf\u8f6e\u76ee\u6807\u6a21\u578b\u8fed\u4ee3\u9ad8\u8fbe20\u4e2a\u4ee4\u724c\u7684\u9884\u6d4b\u3002\u5b83\u5229\u7528\u73b0\u4ee3LLMs\u4e2d\u6982\u7387\u5206\u5e03\u7684\u9ad8\u6ce2\u52a8\u6027\u548c\u6a21\u578b\u8f93\u51fa\u6982\u7387\u4e4b\u95f4\u7684\u9ad8\u5ea6\u4e00\u81f4\u6027\u3002SpecExec\u901a\u8fc7\u4ece\u8349\u7a3f\u6a21\u578b\u83b7\u53d6\u6700\u53ef\u80fd\u7684\u4ee4\u724c\u5ef6\u7eed\uff0c\u6784\u5efa\u4e00\u4e2a\u76ee\u6807\u6a21\u578b\u7684\u201c\u7f13\u5b58\u201d\u6811\uff0c\u7136\u540e\u5728\u4e00\u4e2a\u5355\u6b21\u904d\u5386\u4e2d\u9a8c\u8bc1\u
3002 \u4f7f\u7528SpecExec\uff0c\u6211\u4eec\u5728\u6d88\u8d39\u7ea7GPU\u4e0a\u5b9e\u73b0\u4e86500\u4ebf\u53c2\u6570LLM\u7684\u63a8\u7406\uff0c\u914d\u5408RAM\u5378\u8f7d\uff0c4\u4f4d\u91cf\u5316\u4e0b\u7684\u901f\u5ea6\u8fbe\u52304-6\u4e2a\u4ee4\u724c/\u79d2\uff0c\u800c16\u4f4d\u6743\u91cd\u4e0b\u7684\u901f\u5ea6\u4e3a2-3\u4e2a\u4ee4\u724c/\u79d2\u3002|\n", "2406.02528": "|**2024-06-04**|**Scalable MatMul-free Language Modeling**|Rui-Jie Zhu et.al.|[2406.02528](http://arxiv.org/abs/2406.02528)|**[link](https://github.com/ridgerchu/matmulfreellm)**|**\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\uff0c\u77e9\u9635\u4e58\u6cd5\uff08MatMul\uff09\u901a\u5e38\u5360\u636e\u4e3b\u8981\u8ba1\u7b97\u5f00\u9500\u3002\u968f\u7740LLMs\u7684\u89c4\u6a21\u6269\u5927\uff0c\u5176\u5d4c\u5165\u7ef4\u5ea6\u548c\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e5f\u968f\u4e4b\u589e\u52a0\uff0c\u8fd9\u4e00\u95ee\u9898\u66f4\u4e3a\u663e\u8457\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u80fd\u591f\u5728\u4fdd\u6301\u5f3a\u5927\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5b8c\u5168\u79fb\u9664LLMs\u4e2d\u7684MatMul\u64cd\u4f5c\uff0c\u5373\u4f7f\u662f\u572827\u4ebf\u53c2\u6570\u91cf\u7ea7\u7684\u6a21\u578b\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65e0MatMul\u6a21\u578b\u5728\u4e0e\u5185\u5b58\u6d88\u8017\u663e\u8457\u66f4\u591a\u7684\u6700\u5148\u8fdb\u7684Transformer\u76f8\u5f53\u7684\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u7814\u7a76\u4e86\u6a21\u578b\u7684\u6269\u5c55\u6027\u89c4\u5f8b\uff0c\u5e76\u53d1\u73b0\u65e0MatMul\u6a21\u578b\u4e0e\u5168\u7cbe\u5ea6Transformer\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u968f\u7740\u6a21\u578b\u5c3a\u5bf8\u589e\u5927\u800c\u51cf\u5c0f\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684GPU\u5b9e\u73b0\uff0c\u76f8\u8f83\u4e8e\u672a\u4f18\u5316\u7684\u57fa\u7ebf\uff0c\u8bad\u7ec3\u65f6\u80fd\u51cf\u5c11\u9ad8\u8fbe61%\u7684\u5185\u5b58\u4f7f\u7528\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u901a\u8fc7\u4f18\u5316\u7684\u5185\u6838\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5185\u5b58\u6d88\u8017\u53ef\u964d\u4f4e\u8d85\u8fc710\u500d\u3002\u4e3a\u4e86\u51c6\u786e\u8bc4\u4f30\u67b6\u6784\u6548\u7387\uff0c\u6211\u4eec\u5728FPGA\u4e0a\u6784\u5efa\u4e86\u5b9a\u5236\u786c\u4ef6\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528GPU\u65e0\u6cd5\u5904\u7406\u7684\u8f7b\u91cf\u7ea7\u8fd0\u7b97\uff0c\u5b9e\u73b0\u4e86\u5bf9\u5341\u4ebf\u53c2\u6570\u89c4\u6a21\u6a21\u578b\u7684\u9ad8\u901f\u5904\u7406\uff0c\u4f7f\u5176\u63a5\u8fd1\u4eba\u8111\u7ea7\u522b\u7684\u6548\u7387\u3002 \u8fd9\u9879\u5de5\u4f5c\u4e0d\u4ec5\u5c55\u793a\u4e86LLMs\u5728\u51cf\u5c0f\u590d\u6742\u6027\u540e\u4ecd\u80fd\u4fdd\u6301\u9ad8\u6548\uff0c\u8fd8\u6307\u51fa\u4e86\u672a\u6765\u52a0\u901f\u5668\u5e94\u4f18\u5316\u7684\u8fd0\u7b97\u7c7b\u578b\uff0c\u4ee5\u9002\u5e94\u4e0b\u4e00\u4ee3\u8f7b\u91cf\u7ea7LLMs\u7684\u9700\u6c42\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5b9e\u73b0\u5df2\u5f00\u6e90\u81f3\uff1ahttps://github.com/ridgerchu/matmulfreellm\u3002**|\n", "2406.02524": "|**2024-06-04**|**CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks**|Maciej Besta 
et.al.|[2406.02524](http://arxiv.org/abs/2406.02524)|**[link](https://github.com/spcl/checkembed)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u53d8\u9769\uff0c\u4f46\u9a8c\u8bc1\u5176\u7b54\u6848\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u5f00\u653e\u6027\u7684\u4efb\u52a1\uff0c\u5982\u77e5\u8bc6\u6574\u5408\u3001\u6458\u8981\u548c\u63d0\u53d6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCheckEmbed\u7684\u7cbe\u786e\u3001\u53ef\u6269\u5c55\u4e14\u7b80\u4fbf\u7684LLM\u9a8c\u8bc1\u65b9\u6cd5\u3002CheckEmbed\u7684\u6838\u5fc3\u7406\u5ff5\u662f\uff1a\u901a\u8fc7\u5229\u7528\u5982GPT\u6587\u672c\u5d4c\u5165\u5927\u6a21\u578b\u83b7\u53d6\u7684\u7b54\u6848\u7ea7\u5d4c\u5165\u6765\u6bd4\u8f83LLM\u7684\u56de\u7b54\u3002\u8fd9\u5c06\u590d\u6742\u7684\u6587\u672c\u7b54\u6848\u8f6c\u5316\u4e3a\u5355\u4e00\u7684\u5d4c\u5165\uff0c\u7b80\u5316\u4e86\u5bf9\u6bd4\u8fc7\u7a0b\uff0c\u5b9e\u73b0\u5feb\u901f\u800c\u6709\u610f\u4e49\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u9a8c\u8bc1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u5b9e\u73b0\u4e86CheckEmbed\u7684\u7406\u5ff5\uff0c\u5e76\u63d0\u4f9b\u4e86\u8bc4\u4f30LLM\u7b54\u6848\u771f\u5b9e\u6027\u7684\u5ea6\u91cf\uff0c\u5982\u5d4c\u5165\u70ed\u529b\u56fe\u53ca\u5176\u603b\u7ed3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6307\u6807\u8bbe\u8ba1\u5b9e\u9645\u7684\u5f15\u64ce\uff0c\u4ee5\u51b3\u5b9aLLM\u7b54\u6848\u662f\u5426\u4ee4\u4eba\u6ee1\u610f\u3002\u5728\u5b9e\u9645\u6587\u6863\u5206\u6790\u4efb\u52a1\u4e2d\uff0c\u5982\u672f\u8bed\u63d0\u53d6\u548c\u6587\u6863\u6458\u8981\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u663e\u8457\u7684\u51c6\u786e\u6027\u63d0\u5347\u3001\u6210\u672c\u6548\u76ca\u548c\u8fd0\u884c\u65f6\u95f4\u6027\u80fd\uff0c\u76f8\u8f83\u4e8eBERTScore\u6216SelfCheckGPT\u7b49\u57fa\u4e8etoken\u3001\u53e5\
u5b50\u548c\u4e8b\u5b9e\u7ea7\u522b\u7684\u65b9\u6848\u3002|\n", "2406.02523": "|**2024-06-04**|**RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots**|Soroush Nasiriany et.al.|[2406.02523](http://arxiv.org/abs/2406.02523)|null|\u4eba\u5de5\u667a\u80fd\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u89c4\u6a21\u7684\u6269\u5927\u3002\u7136\u800c\uff0c\u5728\u673a\u5668\u4eba\u9886\u57df\uff0c\u5927\u89c4\u6a21\u673a\u5668\u4eba\u6570\u636e\u96c6\u7684\u83b7\u53d6\u662f\u4e00\u4e2a\u74f6\u9888\u3002\u6211\u4eec\u4e3b\u5f20\u5229\u7528\u903c\u771f\u7684\u7269\u7406\u6a21\u62df\u6765\u63d0\u5347\u73af\u5883\u3001\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u89c4\u6a21\uff0c\u4ee5\u652f\u6301\u673a\u5668\u4eba\u5b66\u4e60\u65b9\u6cd5\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecdRoboCasa\uff0c\u8fd9\u662f\u4e00\u4e2a\u5927\u578b\u7684\u4eff\u771f\u6846\u67b6\uff0c\u65e8\u5728\u8bad\u7ec3\u80fd\u591f\u5728\u65e5\u5e38\u73af\u5883\u4e2d\u901a\u7528\u7684\u673a\u5668\u4eba\u3002RoboCasa\u7684\u7279\u70b9\u662f\u62e5\u6709\u4e30\u5bcc\u4e14\u591a\u6837\u5316\u7684\u53a8\u623f\u573a\u666f\uff0c\u5305\u62ec\u8d85\u8fc7150\u4e2a\u7c7b\u522b\u7684\u4e00\u5343\u591a\u4ef63D\u6a21\u578b\u8d44\u4ea7\u548c\u6570\u5341\u79cd\u53ef\u4ea4\u4e92\u7684\u5bb6\u5177\u548c\u7535\u5668\u3002 
\u6211\u4eec\u901a\u8fc7\u751f\u6210\u5f0fAI\u5de5\u5177\u8fdb\u4e00\u6b65\u589e\u5f3a\u6a21\u62df\u7684\u771f\u5b9e\u6027\u548c\u591a\u6837\u6027\uff0c\u5982\u4f7f\u7528\u6587\u672c\u52303D\u6a21\u578b\u7684\u6280\u672f\u751f\u6210\u5bf9\u8c61\u8d44\u4ea7\uff0c\u4ee5\u53ca\u901a\u8fc7\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u73af\u5883\u7eb9\u7406\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86100\u9879\u4efb\u52a1\uff0c\u5305\u62ec\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6307\u5bfc\u7684\u590d\u5408\u4efb\u52a1\uff0c\u7528\u4e8e\u7cfb\u7edf\u6027\u8bc4\u4f30\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5b66\u4e60\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u9ad8\u8d28\u91cf\u7684\u4eba\u7c7b\u6f14\u793a\uff0c\u5e76\u7ed3\u5408\u81ea\u52a8\u8f68\u8ff9\u751f\u6210\u65b9\u6cd5\uff0c\u4ee5\u6700\u5c0f\u7684\u4eba\u529b\u6210\u672c\u5927\u5e45\u6269\u5145\u6570\u636e\u96c6\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4f7f\u7528\u5408\u6210\u751f\u6210\u7684\u673a\u5668\u4eba\u6570\u636e\u8fdb\u884c\u5927\u89c4\u6a21\u6a21\u4eff\u5b66\u4e60\u65f6\uff0c\u5b58\u5728\u660e\u663e\u7684\u89c4\u6a21\u6548\u5e94\uff0c\u5e76\u663e\u793a\u51fa\u5229\u7528\u6a21\u62df\u6570\u636e\u5728\u73b0\u5b9e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u5de8\u5927\u6f5c\u529b\u3002\u76f8\u5173\u89c6\u9891\u548c\u5f00\u6e90\u4ee3\u7801\u5df2\u5728https://robocasa.ai/\u7f51\u7ad9\u4e0a\u63d0\u4f9b\u3002|\n", "2406.03496": "|**2024-06-05**|**Wings: Learning Multimodal LLMs without Text-only Forgetting**|Yi-Kai Zhang et.al.|[2406.03496](http://arxiv.org/abs/2406.03496)|null|
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8d77\u6e90\u4e8e\u9884\u8bad\u7ec3\u7684\u901a\u7528\u8bed\u8a00\u6a21\u578b\uff0c\u9996\u5148\u5c06\u56fe\u50cf\u4e0e\u6587\u672c\u5bf9\u9f50\uff0c\u7136\u540e\u5728\u6df7\u5408\u6a21\u6001\u8f93\u5165\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0cMLLM\u5728\u5904\u7406\u4ec5\u5305\u542b\u6587\u672c\u7684\u6307\u4ee4\u65f6\u4f1a\u51fa\u73b0\u707e\u96be\u6027\u7684\u9057\u5fd8\uff0c\u8fd9\u4e9b\u6587\u672c\u6307\u4ee4\u5e76\u672a\u5305\u542b\u56fe\u50cf\uff0c\u8fd9\u4e9b\u95ee\u9898\u5728\u521d\u59cb\u7684\u8bed\u8a00\u6a21\u578b\u9636\u6bb5\u5c31\u5df2\u7ecf\u5b58\u5728\u3002\u672c\u6587\u63d0\u51faWings\uff0c\u4e00\u4e2a\u65b0\u578b\u7684MLLM\uff0c\u5b83\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u901a\u8fc7\u5206\u6790MLLM\u5728\u591a\u6a21\u6001\u6307\u4ee4\u4e2d\u7684\u6ce8\u610f\u529b\uff0c\u6211\u4eec\u53d1\u73b0\u6587\u672c\u9057\u5fd8\u4e0e\u4ece\u56fe\u50cf\u524d\u5411\u56fe\u50cf\u540e\u7684\u6ce8\u610f\u529b\u8f6c\u79fb\u6709\u5173\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u989d\u5916\u6a21\u5757\u4f5c\u4e3a\u589e\u5f3a\u5b66\u4e60\u5668\uff0c\u4ee5\u8865\u507f\u8fd9\u79cd\u6ce8\u610f\u529b\u8f6c\u79fb\u3002\u89c6\u89c9\u548c\u6587\u672c\u5b66\u4e60\u5668\u4f5c\u4e3a\u201c\u7fc5\u8180\u201d\u5f0f\u7684\u8865\u5145\uff0c\u5e73\u884c\u8fde\u63a5\u5728\u6bcf\u4e2a\u6ce8\u610f\u529b\u5757\u5185\uff0c\u8d77\u521d\u56fe\u50cf\u548c\u6587\u672c\u8f93\u5165\u7531\u89c6\u89c9\u5b66\u4e60\u5668\u4e0e\u4e3b\u6ce8\u610f\u529b\u534f\u540c\u5de5\u4f5c\uff0c\u5e73\u8861\u5bf9\u89c6\u89c9\u5143\u7d20\u7684\u5173\u6ce8\u3002\u968f\u540e\uff0c\u6587\u672c\u5b66\u4e60\u5668\u901a\u8fc7\u6ce8\u610f\u529b\u8def\u7531\u7684\u65b9\u5f0f\u4e0e\u89c6\u89c9\u5b66\u4e60\u5668\u7684\u8f93\u51fa\u534f\u4f5c\u6574\u5408\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4f4e\u79e9\u6b8b\u5dee\u6ce8\u610f\u529b\uff08LoRRA\uff09\u673a\u5236\u4ee
5\u4fdd\u8bc1\u5b66\u4e60\u5668\u7684\u9ad8\u6548\u8fd0\u884c\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cWings\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u540c\u7b49\u89c4\u6a21\u7684MLLM\u3002\u5728\u6211\u4eec\u65b0\u6784\u5efa\u7684\u4ea4\u9519\u56fe\u50cf-\u6587\u672c\uff08IIT\uff09\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cWings\u5728\u4ece\u6587\u672c\u4e3a\u4e3b\u5230\u591a\u6a21\u6001\u4e3a\u4e3b\u7684\u95ee\u7b54\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002|\n", "2406.03488": "|**2024-06-06**|**Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training**|Ao Sun et.al.|[2406.03488](http://arxiv.org/abs/2406.03488)|**[link](https://github.com/maydomine/seq1f1b)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5176\u4e2d\u7ba1\u9053\u5e76\u884c\u6027\u8d77\u7740\u5173\u952e\u4f5c\u7528\u3002\u968f\u7740LLMs\u7684\u8bad\u7ec3\u5e8f\u5217\u957f\u5ea6\u6269\u5c55\u523032k\u751a\u81f3128k\uff0c\u5f53\u524d\u7684\u7ba1\u9053\u5e76\u884c\u65b9\u6cd5\u9762\u4e34\u4e25\u91cd\u74f6\u9888\uff0c\u5982\u9ad8\u5185\u5b58\u5360\u7528\u548c\u663e\u8457\u7684\u7ba1\u9053\u5ef6\u8fdf\uff0c\u8fd9\u6781\u5927\u5730\u9650\u5236\u4e86\u6a21\u578b\u7684\u53ef\u6269\u5c55\u6027\u548c\u8bad\u7ec3\u541e\u5410\u91cf\u3002\u4e3a\u4e86\u63d0\u9ad8\u5185\u5b58\u6548\u7387\u548c\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u957f\u5e8f\u5217\u8bad\u7ec3LLMs\u7684\u9ad8\u6548\u5e8f\u5217\u7ea7\u4e00\u6b21\u524d\u5411\u4e00\u6b21\u540e\u5411\uff081F1B\uff09\u7ba1\u9053\u8c03\u5ea6\u65b9\u6cd5\uff0c\u79f0\u4e3aSeq1F1B\u3002Seq1F1B\u5c06\u6279\u7ea7\u522b\u53ef\u8c03\u5ea6\u5355\u5143\u5206\u89e3\u4e3a\u66f4\u7ec6\u7684\u5e8f\u5217\u7ea7\u5355\u5143\uff0c\u4ece\u800c\u51cf\u5c0f\u5ef6\u8fdf\u5e76\u964d\u4f4e\u5185\u5b58\u9700\u6c42\u3002
 \u8003\u8651\u5230\u5982\u679c\u5747\u5300\u5206\u5272\u5e8f\u5217\uff0cSeq1F1B\u53ef\u80fd\u4f1a\u4ea7\u751f\u8f7b\u5fae\u7684\u989d\u5916\u5ef6\u8fdf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8e\u8ba1\u7b97\u7684\u7b56\u7565\u6765\u5212\u5206\u8f93\u5165\u5e8f\u5217\uff0c\u4ee5\u7f13\u89e3\u8fd9\u4e2a\u526f\u4f5c\u7528\u3002\u4e0e\u7ade\u4e89\u6027\u7684\u7ba1\u9053\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5982Megatron\u76841F1B\u7ba1\u9053\u5e76\u884c\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4fdd\u6301\u66f4\u9ad8\u8bad\u7ec3\u541e\u5410\u91cf\u7684\u540c\u65f6\uff0c\u5185\u5b58\u5360\u7528\u66f4\u4f4e\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cSeq1F1B\u80fd\u591f\u5728\u4e0d\u4f7f\u7528\u91cd\u65b0\u8ba1\u7b97\u7b56\u7565\u7684\u60c5\u51b5\u4e0b\uff0c\u6709\u6548\u5730\u572864\u4e2aNVIDIA A100 GPU\u4e0a\u8bad\u7ec3\u4e00\u4e2a\u5177\u6709300\u4ebf\u53c2\u6570\u7684LLM\uff0c\u5904\u7406\u957f\u8fbe64k\u7684\u5e8f\u5217\uff0c\u8fd9\u662f\u73b0\u6709\u65b9\u6cd5\u65e0\u6cd5\u5b9e\u73b0\u7684\u3002\u6211\u4eec\u7684\u4ee3\u7801\u57fa\u4e8eMegatron-LM\uff0c\u5e76\u5df2\u5f00\u6e90\uff1ahttps://github.com/MayDomine/Seq1F1B.git\u3002|\n", "2406.03487": "|**2024-06-05**|**Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends**|Sanjana Ramprasad et.al.|[2406.03487](http://arxiv.org/abs/2406.03487)|null|
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u6458\u8981\u751f\u6210\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5728\u771f\u5b9e\u6027\u65b9\u9762\u7684\u95ee\u9898\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u5c3d\u7ba1\u4e4b\u524d\u7684\u7814\u7a76\u5e7f\u6cdb\u8bc4\u4f30\u4e86\u65b0\u95fb\u9886\u57df\u7684LLMs\uff0c\u5bf9\u8bdd\u6458\u8981\u7684\u8bc4\u4ef7\u4e3b\u8981\u96c6\u4e2d\u5728\u57fa\u4e8eBART\u7684\u6a21\u578b\u4e0a\uff0c\u8fd9\u5728\u6211\u4eec\u7406\u89e3\u5b83\u4eec\u7684\u53ef\u4fe1\u5ea6\u65b9\u9762\u7559\u4e0b\u4e86\u7a7a\u767d\u3002\u672c\u7814\u7a76\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u5bf9\u8bdd\u6458\u8981\u4e2d\u7684\u771f\u5b9e\u6027\uff0c\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\uff0c\u5e76\u7740\u91cd\u4e8e\u8bc6\u522b\u548c\u5206\u7c7b\u53e5\u7ea7\u4e0d\u4e00\u81f4\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8GPT-4\u548cAlpaca-13B\u8fd9\u4e24\u6b3e\u4e3b\u6d41\u6a21\u578b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u63ed\u793a\u4e86\u9519\u8bef\u5b9a\u4e49\u7684\u5fae\u5999\u4e4b\u5904\uff1aLLMs\u5e38\u5e38\u751f\u6210\u770b\u4f3c\u5408\u7406\u7684\u63a8\u65ad\uff0c\u8fd9\u4e9b\u63a8\u65ad\u4f9d\u8d56\u4e8e\u5bf9\u8bdd\u4e2d\u7684\u95f4\u63a5\u8bc1\u636e\uff0c\u800c\u7f3a\u4e4f\u76f4\u63a5\u8bc1\u636e\uff0c\u8fd9\u5728\u65e7\u6a21\u578b\u4e2d\u8f83\u5c11\u89c1\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6539\u8fdb\u7684\u9519\u8bef\u5206\u7c7b\u4f53\u7cfb\uff0c\u5f15\u5165\u4e86\u201c\u60c5\u5883\u63a8\u7406\u201d\u7c7b\u522b\u6765\u5f52\u7c7b\u8fd9\u4e9bLLM\u884c\u4e3a\uff0c\u5e76\u516c\u5f00\u4e86\u76f8\u5173\u6570\u636e\u96c6\u3002\u5229\u7528\u6211\u4eec\u7684\u5206\u7c7b\u4f53\u7cfb\uff0c\u6211\u4eec\u6bd4\u8f83\u4e86LLMs\u4e0e\u8001\u5f0f\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u884c\u4e3a\u5dee\u5f02\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86\u81ea\u52a8\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\u5728LLM\u6458\u8981\u4e0a\u7684\u6548\u679c\uff0
c\u53d1\u73b0\u5b83\u4eec\u5728\u8bc6\u522b\u8fd9\u7c7b\u7ec6\u5fae\u9519\u8bef\u65f6\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u7cbe\u7ec6\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\uff0c\u8fd9\u4e24\u79cd\u65b9\u6cd5\u4f18\u4e8e\u73b0\u6709\u6307\u6807\uff0c\u7279\u522b\u662f\u5728\u8bc6\u522b\u201c\u60c5\u5883\u63a8\u7406\u201d\u9519\u8bef\u65f6\u3002|\n", "2406.03486": "|**2024-06-05**|**BIPED: Pedagogically Informed Tutoring System for ESL Education**|Soonwoo Kwon et.al.|[2406.03486](http://arxiv.org/abs/2406.03486)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u4f5c\u4e3a\u7ecf\u6d4e\u4e14\u6613\u4e8e\u83b7\u53d6\u7684\u82f1\u8bed\u7b2c\u4e8c\u8bed\u8a00\uff08L2\uff09\u5b66\u4e60\u8005\u5bf9\u8bdd\u5f0f\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\uff08CITS\uff09\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684CITS\u5f80\u5f80\u53ea\u80fd\u6559\u6388\u7b80\u5355\u6982\u5ff5\uff0c\u6216\u8005\u5728\u6559\u5b66\u6df1\u5ea6\u4e0a\u65e0\u6cd5\u6ee1\u8db3\u4e0d\u540c\u5b66\u4e60\u7b56\u7565\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u5f00\u53d1\u4e00\u4e2a\u66f4\u5177\u6559\u80b2\u5b66\u5bfc\u5411\u3001\u80fd\u6559\u6388\u590d\u6742\u6982\u5ff5\u7684CITS\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u53cc\u8bed\u6559\u80b2\u6307\u5bfc\u5bf9\u8bdd\u6570\u636e\u96c6\uff08BIPED\uff09\uff0c\u5305\u542b\u4e00\u5bf9\u4e00\u7684\u4eba\u7c7b\u82f1\u8bed\u8f85\u5bfc\u4e92\u52a8\u3002\u901a\u8fc7\u5bf9\u8f85\u5bfc\u5bf9\u8bdd\u7684\u540e\u5904\u7406\u5206\u6790\uff0c\u6211\u4eec\u63d0\u70bc\u51fa\u4e00\u5957\u5305\u542b34\u79cd\u6559\u5e08\u884c\u4e3a\u548c9\u79cd\u5b66\u751f\u884c\u4e3a\u7684\u5bf9\u8bdd\u52a8\u4f5c\u8bcd\u5178\uff0c\u5e76\u5c06\u5176\u7528\u4e8e\u8fdb\u4e00\u6b65\u6807\u6ce8\u6536\u96c6\u7684\u6570\u636e\u3002\u6839\u636e\u5148\u9884\u6d4b\u5408\u9002\u7684\u6559\u5e08\u884c\u4e3a\u518d\u751f\u6210\u76f8\u5e94\u56de\u590d\u7684\u4
e24\u6b65\u6846\u67b6\uff0c\u6211\u4eec\u5229\u7528GPT-4\u548cSOLAR-KO\u5206\u522b\u5b9e\u73b0\u4e86\u4e24\u4e2aCITS\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u5b9e\u65bd\u7684\u6a21\u578b\u4e0d\u4ec5\u6a21\u4eff\u4e86\u4eba\u7c7b\u6559\u5e08\u7684\u98ce\u683c\uff0c\u8fd8\u8fd0\u7528\u4e86\u4e30\u5bcc\u4e14\u4e0e\u4e0a\u4e0b\u6587\u76f8\u9002\u5e94\u7684\u6559\u5b66\u7b56\u7565\u3002|\n", "2406.03476": "|**2024-06-05**|**Does your data spark joy? Performance gains from domain upsampling at the end of training**|Cody Blakeney et.al.|[2406.03476](http://arxiv.org/abs/2406.03476)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u89c4\u6a21\u589e\u957f\u5230\u4e07\u4ebf\u7ea7\u522b\u7684tokens\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u7531\u5927\u89c4\u6a21\u7684CommonCrawl\u7f51\u7edc\u722c\u866b\u5185\u5bb9\u4ee5\u53ca\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\u7ec4\u6210\u3002\u7531\u4e8e\u5728\u5927\u8ba1\u7b97\u91cf\uff08FLOPs\uff09\u4e0b\u8bad\u7ec3\u4ee5\u63ed\u793a\u6a21\u578b\u5728\u56f0\u96be\u548c\u65b0\u5174\u57fa\u51c6\u4e0a\u7684\u663e\u8457\u53d8\u5316\u6210\u672c\u9ad8\u6602\uff0c\u5982\u4f55\u5728\u901a\u7528\u7f51\u7edc\u6293\u53d6\u7684\u591a\u6837\u6027\u548c\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u5bc6\u5ea6\u4e4b\u95f4\u627e\u5230\u6700\u4f18\u5e73\u8861\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\uff0c\u5728\u8bad\u7ec3\u540e\u671f\u5bf9\u5176\u8fdb\u884c\u4e0a\u91c7\u6837\uff0c\u4ece\u800c\u5728\u8bf8\u5982MMLU\u3001GSM8K\u548cHumanEval\u7b49\u57fa\u51c6\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u5bf9\u4e8e\u4e00\u4e2a\u8bad\u7ec3\u4e861\u4e07\u4ebf\uff08T\uff09\u4ee4\u724c\u768470\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u8fd9\u79cd\u7b80\u5355\u65b9\u6cd5\u53ef\u4f7f\u5176\u6027\u80fd\u63d0\u9ad86.90\u5206\u30018.26\u5206\u548c6.17\
u5206\uff0c\u4e0e\u8bad\u7ec3\u65f6\u95f4\u4e24\u500d\u7684Llama-2\uff087B\uff09\u6a21\u578b\u76f8\u5f53\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5728\u8bad\u7ec3\u540e\u671f\u9886\u57df\u4e0a\u91c7\u6837\u7684\u6301\u7eed\u65f6\u95f4\uff0c\u4ece5%\u523030%\uff0c\u53d1\u73b010%\u523020%\u7684\u6bd4\u4f8b\u6700\u4e3a\u5408\u9002\uff0c\u4ee5\u5e73\u8861\u4e00\u822c\u8bed\u8a00\u5efa\u6a21\u80fd\u529b\u4e0e\u7279\u5b9a\u4efb\u52a1\u7684\u4f18\u5316\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5229\u7528\u9886\u57df\u4e0a\u91c7\u6837\u6765\u5927\u89c4\u6a21\u5206\u6790\u5355\u4e2a\u6570\u636e\u96c6\u5bf9\u4e0d\u540c\u57fa\u51c6\u7684\u589e\u76ca\uff0c\u901a\u8fc7\u5728\u8fd9\u4e00\u9636\u6bb5\u79fb\u9664\u5b83\u4eec\u8fdb\u884c\u5b9e\u9a8c\u3002\u8fd9\u79cd\u65b9\u6cd5\u6781\u5927\u5730\u964d\u4f4e\u4e86\u5b9e\u9a8c\u6210\u672c\uff0c\u4f7f\u5f97\u80fd\u591f\u4ee5\u9884\u8bad\u7ec3\u8fd0\u884c\u7684\u5341\u5206\u4e4b\u4e00\u5de6\u53f3\u7684\u6210\u672c\u63a2\u7d22\u4e0d\u540c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang 
et.al.|[2406.03474](http://arxiv.org/abs/2406.03474)|null|\u9274\u4e8e\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u805a\u7126\u4e8e\u4f7f\u7528MLLM\u9a71\u52a8\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u5728\u5927\u89c4\u6a21\u52a8\u6001\u73af\u5883\u4e2d\u3002\u7136\u800c\uff0c\u5e38\u89c1\u7684\u65b9\u6cd5\u76f4\u63a5\u5c06\u9ad8\u7ea7\u6307\u4ee4\u8f6c\u5316\u4e3a\u4f4e\u7ea7\u8f66\u8f86\u63a7\u5236\u4fe1\u53f7\uff0c\u8fd9\u8fdd\u80cc\u4e86MLLM\u7684\u672c\u8d28\u751f\u6210\u6a21\u5f0f\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u5176\u6f5c\u5728\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u7684\u4e00\u822c\u5316\u80fd\u529b\u53d7\u5230\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u6781\u5927\u9650\u5236\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u4e2d\u5c42\u8bed\u8a00\u9a71\u52a8\u547d\u4ee4\u6765\u8fde\u63a5\u9ad8\u7ea7\u6307\u4ee4\u548c\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\uff0c\u5b83\u4eec\u6bd4\u9ad8\u7ea7\u6307\u4ee4\u66f4\u7ec6\u81f4\uff0c\u4f46\u6bd4\u63a7\u5236\u4fe1\u53f7\u66f4\u901a\u7528\u4e14\u53ef\u89e3\u91ca\uff0c\u4ece\u800c\u6709\u6548\u5f25\u5408\u4e24\u8005\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3aAD-H\u7684\u5206\u5c42\u591a\u4ee3\u7406\u9a7e\u9a76\u7cfb\u7edf\u5b9e\u73b0\u8fd9\u4e00\u7406\u5ff5\uff0c\u5305\u62ec\u4e00\u4e2a\u7528\u4e8e\u9ad8\u5c42\u63a8\u7406\u7684MLLM\u89c4\u5212\u5668\u548c\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u63a7\u5236\u5668\u8fdb\u884c\u4f4e\u5c42\u6267\u884c\u3002\u8fd9\u79cd\u5206\u5c42\u8bbe\u8ba1\u4f7fMLLM\u6446\u8131\u4e86\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\u89e3\u7801\uff0c\u5145\u5206\u91ca\u653e\u4e86\u5176\u5728\u9ad8\u5c42\u611f\u77e5\u3001\u63a8\u7406\u548c\u89c4\u5212\u65b9\u9762\u7684\u6d8c\u73b0\u80fd\u529b\u3002 
\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5e26\u6709\u52a8\u4f5c\u5c42\u6b21\u6ce8\u91ca\u7684\u65b0\u6570\u636e\u96c6\u3002\u5168\u9762\u7684\u95ed\u73af\u8bc4\u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684AD-H\u7cfb\u7edf\u5177\u6709\u591a\u9879\u5173\u952e\u4f18\u52bf\u3002\u9996\u5148\uff0cAD-H\u5728\u9a7e\u9a76\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u751a\u81f3\u5c55\u73b0\u51fa\u5728\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u81ea\u6211\u7ea0\u6b63\u7684\u80fd\u529b\uff0c\u8fd9\u662f\u8bad\u7ec3\u6570\u636e\u672a\u6db5\u76d6\u7684\u573a\u666f\u3002\u5176\u6b21\uff0cAD-H\u5728\u957f\u7a0b\u6307\u4ee4\u548c\u65b0\u73af\u5883\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\uff0c\u660e\u663e\u8d85\u8d8a\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\uff0c\u53ef\u901a\u8fc7\u83b7\u53d6\u3002|\n", "2406.03450": "|**2024-06-05**|**What is the Best Way for ChatGPT to Translate Poetry?**|Shanshan Wang 
et.al.|[2406.03450](http://arxiv.org/abs/2406.03450)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u5728\u82f1\u8bed-\u4e2d\u6587\u8bd7\u6b4c\u7ffb\u8bd1\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u901a\u8fc7\u5b9a\u5411\u63d0\u793a\u548c\u5c0f\u6837\u672c\u573a\u666f\u5206\u6790\u4ee5\u4f18\u5316\u5176\u8868\u73b0\u3002\u5c3d\u7ba1\u521d\u671f\u7ed3\u679c\u4ee4\u4eba\u9f13\u821e\uff0c\u4f46\u7814\u7a76\u53d1\u73b0ChatGPT\u7684\u7ffb\u8bd1\u5b58\u5728\u6301\u7eed\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u89e3\u91ca\u8f85\u52a9\u8bd7\u6b4c\u673a\u5668\u7ffb\u8bd1\u201d\uff08EAPMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bd7\u6b4c\u7684\u5355\u8bed\u89e3\u91ca\u4f5c\u4e3a\u7ffb\u8bd1\u8fc7\u7a0b\u7684\u6307\u5bfc\u3002\u540c\u65f6\uff0c\u6211\u4eec\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ee5\u66f4\u597d\u5730\u9002\u5e94\u73b0\u4ee3\u8bd7\u6b4c\u7ffb\u8bd1\u7684\u5fae\u5999\u4e4b\u5904\u3002\u6211\u4eec\u9080\u8bf7\u4e13\u4e1a\u8bd7\u4eba\u8fdb\u884c\u8bc4\u4f30\uff0c\u5e76\u7ed3\u5408GPT-4\u7684\u8bc4\u4ef7\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684EAPMT\u65b9\u6cd5\u5728\u4e0e\u4f20\u7edfChatGPT\u7ffb\u8bd1\u65b9\u6cd5\u4ee5\u53ca\u73b0\u6709\u5728\u7ebf\u7cfb\u7edf\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u8bba\u6587\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e3a\u6587\u5b66\u7ffb\u8bd1\u7684\u673a\u5668\u8f85\u52a9\u63d0\u4f9b\u4e86\u65b0\u9896\u89c6\u89d2\u3002|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445](http://arxiv.org/abs/2406.03445)|null|## \u7ffb\u8bd1 
\u9884\u8bad\u7ec3\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u6267\u884c\u57fa\u672c\u7684\u7b97\u672f\u8fd0\u7b97\uff0c\u5982\u52a0\u6cd5\uff0c\u4ecd\u4e0d\u6e05\u695a\u3002\u672c\u6587\u63ed\u793a\u4e86\u9884\u8bad\u7ec3\u7684LLMs\u901a\u8fc7\u5085\u91cc\u53f6\u7279\u5f81\u8fdb\u884c\u52a0\u6cd5\u2014\u2014\u8fd9\u4e9b\u662f\u9690\u85cf\u72b6\u6001\u4e2d\u7684\u7ef4\u5ea6\uff0c\u901a\u8fc7\u4e00\u7ec4\u5728\u9891\u57df\u4e2d\u7a00\u758f\u5206\u5e03\u7684\u7279\u5f81\u6765\u8868\u793a\u6570\u5b57\u3002\u5728\u6a21\u578b\u4e2d\uff0c\u591a\u5c42\u611f\u77e5\u5668\uff08MLP\uff09\u5c42\u548c\u6ce8\u610f\u529b\u5c42\u4ee5\u4e92\u8865\u7684\u65b9\u5f0f\u4f7f\u7528\u5085\u91cc\u53f6\u7279\u5f81\uff1aMLP\u5c42\u4e3b\u8981\u4f7f\u7528\u4f4e\u9891\u7279\u5f81\u8fd1\u4f3c\u7b54\u6848\u7684\u5927\u5c0f\uff0c\u800c\u6ce8\u610f\u529b\u5c42\u4e3b\u8981\u901a\u8fc7\u9ad8\u9891\u7279\u5f81\u6267\u884c\u6a21\u8fd0\u7b97\uff08\u4f8b\u5982\u5224\u65ad\u7b54\u6848\u662f\u5426\u4e3a\u5076\u6570\uff09\u3002\u9884\u8bad\u7ec3\u5bf9\u4e8e\u8fd9\u79cd\u673a\u5236\u81f3\u5173\u91cd\u8981\uff1a\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u7684\u6a21\u578b\u4ec5\u5229\u7528\u4f4e\u9891\u7279\u5f81\uff0c\u5bfc\u81f4\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u5c06\u9884\u8bad\u7ec3\u7684\u8bcd\u5d4c\u5165\u5f15\u5165\u5230\u968f\u673a\u521d\u59cb\u5316\u7684\u6a21\u578b\u4e2d\u53ef\u4ee5\u6062\u590d\u5176\u6027\u80fd\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u9002\u5f53\u7684\u9884\u8bad\u7ec3\u8868\u793a\uff08\u5982\u5085\u91cc\u53f6\u7279\u5f81\uff09\u80fd\u591f\u89e3\u9501Transformer\u5b66\u4e60\u7b97\u6cd5\u4efb\u52a1\u7cbe\u786e\u673a\u5236\u7684\u80fd\u529b\u3002|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker 
et.al.|[2406.03441](http://arxiv.org/abs/2406.03441)|null|\u5728\u8bb8\u591a\u9ad8\u98ce\u9669\u7684\u673a\u5668\u5b66\u4e60\u5e94\u7528\u4e2d\uff0c\u6a21\u578b\u9700\u8981\u80fd\u591f\u8868\u660e\u5176\u5bf9\u9884\u6d4b\u7684\u4e0d\u786e\u5b9a\u6027\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u57fa\u51c6\u4e0a\u7684\u51c6\u786e\u5ea6\u53ef\u8fbe\u5230\u751a\u81f3\u8d85\u8fc7\u4eba\u7c7b\u6c34\u5e73\uff0c\u4f46\u5b83\u4eec\u5bf9\u9519\u8bef\u54cd\u5e94\u7684\u8fc7\u5ea6\u81ea\u4fe1\u4ecd\u662f\u5df2\u77e5\u7684\u95ee\u9898\u3002\u4f20\u7edf\u7684\u65b9\u6cd5\u5728\u76f4\u63a5\u5e94\u7528\u4e8eLLMs\u65f6\u53ef\u80fd\u9762\u4e34\u8ba1\u7b97\u6210\u672c\u548c\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u6311\u6218\u3002\u8fd1\u671f\u63d0\u51fa\u4e86\u4e00\u4e9b\u9ed1\u76d2\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bf8\u5982\u81ea\u6211\u8868\u8ff0\u7684\u4fe1\u5fc3\u7b49\u542f\u53d1\u5f0f\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u901a\u8fc7\u5206\u6790\u6a21\u578b\u751f\u6210\u7b54\u6848\u7684\u89e3\u91ca\u5206\u5e03\u6765\u8861\u91cfLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u3002\u5c3d\u7ba1\u5229\u7528\u89e3\u91ca\u672c\u8eab\u5e76\u975e\u65b0\u9896\uff0c\u4f46\u6211\u4eec\u5c06\u5176\u89c6\u4e3a\u6d4b\u8bd5\u65f6\u95f4\u5206\u7c7b\u5668\uff0c\u901a\u8fc7\u8ba1\u7b97\u6700\u53ef\u80fd\u7684\u5206\u7c7b\u5668\u540e\u9a8c\u7b54\u6848\u5206\u5e03\uff0c\u4ee5\u6b64\u8fdb\u884c\u4e0d\u786e\u5b9a\u6027\u8bc4\u4f30\u3002 
\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u89e3\u91ca\u8574\u542b\u4f5c\u4e3a\u5206\u7c7b\u5668\u4f3c\u7136\u6027\u7684\u4e00\u79cd\u7279\u5b9a\u6846\u67b6\u5b9e\u4f8b\uff0c\u5982\u4f55\u5728\u4e94\u4e2a\u4e0d\u540c\u7684\u6570\u636e\u96c6\u4e0a\u6539\u8fdb\u4e86\u4fe1\u5fc3\u5206\u6570\u6307\u6807\uff08\u7279\u522b\u662fAUROC\u548cAURC\uff09\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6846\u67b6\u65e2\u5177\u6709\u7406\u8bba\u4f9d\u636e\uff0c\u53c8\u662f\u6709\u6548\u91cf\u5316LLMs\u4e0d\u786e\u5b9a\u6027\u7684\u65b9\u5f0f\u3002|\n", "2406.03411": "|**2024-06-05**|**Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach**|Saehyung Lee et.al.|[2406.03411](http://arxiv.org/abs/2406.03411)|**[link](https://github.com/saehyung-lee/plugir)**|**\u8be5\u8bba\u6587\u4e3b\u8981\u5173\u6ce8\u7684\u662f\u4ea4\u4e92\u5f0f\u6587\u672c\u5230\u56fe\u50cf\u68c0\u7d22\u4efb\u52a1\u4e2d\u7684\u5bf9\u8bdd\u5f62\u5f0f\u4e0a\u4e0b\u6587\u67e5\u8be2\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bba\uff0c\u540d\u4e3aPlugIR\uff0c\u901a\u8fc7\u4e24\u79cd\u65b9\u5f0f\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u822c\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u9996\u5148\uff0c\u901a\u8fc7\u91cd\u8ff0\u5bf9\u8bdd\u5f62\u5f0f\u7684\u4e0a\u4e0b\u6587\uff0c\u6211\u4eec\u6d88\u9664\u4e86\u5728\u73b0\u6709\u89c6\u89c9\u5bf9\u8bdd\u6570\u636e\u4e0a\u5fae\u8c03\u68c0\u7d22\u6a21\u578b\u7684\u9700\u6c42\uff0c\u4ece\u800c\u80fd\u591f\u4f7f\u7528\u4efb\u610f\u9ed1\u76d2\u6a21\u578b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2aLLM\u63d0\u95ee\u8005\uff0c\u6839\u636e\u5f53\u524d\u4e0a\u4e0b\u6587\u4e2d\u5019\u9009\u56fe\u50cf\u7684\u4fe1\u606f\uff0c\u751f\u6210\u5173\u4e8e\u76ee\u6807\u56fe\u50cf\u5c5e\u6027\u7684\u975e\u5197\u4f59\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u51cf\u5c11\u4e86\u751f\u6210\u95ee\u9898\u7684\u566a\u58f0\u548c\u5197\u4f59\u3002\u9664\u4e86\u6211\u4eec\u7
684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3a\u6700\u4f73\u5bf9\u6570\u6392\u540d\u79ef\u5206\uff08BRI\uff09\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u4ea4\u4e92\u5f0f\u68c0\u7d22\u7cfb\u7edf\u3002PlugIR\u5728\u591a\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u4f18\u4e8e\u96f6\u6b21\u8bbe\u7f6e\u548c Fine-tuned \u57fa\u51c6\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c PlugIR \u7684\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\u53ef\u4ee5\u6839\u636e\u4e0d\u540c\u60c5\u51b5\u7075\u6d3b\u5355\u72ec\u6216\u7ed3\u5408\u5e94\u7528\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\uff1ahttps://github.com/Saehyung-Lee/PlugIR\u3002**|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. Xiao et.al.|[2406.04344](http://arxiv.org/abs/2406.04344)|null|\u53d7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d6\u5f97\u7684\u5de8\u5927\u8fdb\u5c55\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53e3\u5934\u5316\u673a\u5668\u5b66\u4e60\uff08VML\uff09\u6846\u67b6\u3002\u4e0e\u4f20\u7edf\u7684\u673a\u5668\u5b66\u4e60\u6a21\u578b\uff0c\u901a\u5e38\u5728\u8fde\u7eed\u53c2\u6570\u7a7a\u95f4\u4e2d\u4f18\u5316\u4e0d\u540c\uff0cVML\u5c06\u53c2\u6570\u7a7a\u95f4\u9650\u5236\u4e3a\u4eba\u53ef\u7406\u89e3\u7684\u81ea\u7136\u8bed\u8a00\u3002\u8fd9\u79cd\u7ea6\u675f\u4fc3\u4f7f\u6211\u4eec\u4ece\u65b0\u89d2\u5ea6\u770b\u5f85\u51fd\u6570\u903c\u8fd1\u95ee\u9898\uff0c\u5373\u5c06\u5e26\u6709\u6587\u672c\u63d0\u793a\u7684LLM\u89c6\u4e3a\u7531\u6587\u672c\u63d0\u793a\u53c2\u6570\u5316\u7684\u51fd\u6570\u3002\u6211\u4eec\u501f\u6b64\u89c6\u89d2\u91cd\u65b0\u5ba1\u89c6\u4e86\u7ecf\u5178\u673a\u5668\u5b66\u4e60\u4efb\u52a1\uff0c\u5982\u56de\u5f52\u548c\u5206\u7c7b\uff0c\u53d1\u73b0\u8fd9\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\u548c\u4f18\u5316\u5668\u6765\u89e3\u51b3\u3002VML\u7684\u4e3b\u8981\u4f
18\u52bf\u5305\u62ec\uff1a\uff081\uff09\u6613\u4e8e\u7f16\u7801\u5148\u9a8c\u77e5\u8bc6\uff1a\u5173\u4e8e\u95ee\u9898\u548c\u5047\u8bbe\u7c7b\u7684\u5148\u9a8c\u77e5\u8bc6\u53ef\u4ee5\u4ee5\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7f16\u7801\u5e76\u8f93\u5165\u7ed9LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\uff1b\uff082\uff09\u81ea\u52a8\u6a21\u578b\u9009\u62e9\uff1a\u4f18\u5316\u5668\u53ef\u4ee5\u6839\u636e\u6570\u636e\u548c\u53e3\u5934\u5316\u5148\u9a8c\u77e5\u8bc6\u81ea\u52a8\u9009\u62e9\u5177\u4f53\u7684\u6a21\u578b\u7c7b\u522b\uff0c\u5e76\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u66f4\u65b0\u6a21\u578b\u7c7b\u522b\uff1b\uff083\uff09\u53ef\u89e3\u91ca\u7684\u5b66\u4e60\u8005\u66f4\u65b0\uff1aLLM\u53c2\u6570\u5316\u7684\u4f18\u5316\u5668\u53ef\u4ee5\u89e3\u91ca\u6bcf\u6b21\u5b66\u4e60\u8005\u66f4\u65b0\u7684\u539f\u56e0\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u591a\u9879\u5b9e\u9a8c\u8bc4\u4f30VML\u7684\u6709\u6548\u6027\uff0c\u5e0c\u671b\u5b83\u80fd\u6210\u4e3a\u589e\u5f3a\u673a\u5668\u5b66\u4e60\u53ef\u89e3\u91ca\u6027\u548c\u4fe1\u4efb\u5ea6\u7684\u6865\u6881\u3002|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu 
et.al.|[2406.04339](http://arxiv.org/abs/2406.04339)|null|\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u7684\u6838\u5fc3\u76ee\u6807\u4e2d\uff0c\u8ba9\u6a21\u578b\u7406\u89e3\u89c6\u89c9\u573a\u666f\u5e76\u6267\u884c\u52a8\u4f5c\u662f\u4e00\u4e2a\u57fa\u672c\u4efb\u52a1\u3002\u5c3d\u7ba1\u73b0\u6709\u7684\u673a\u5668\u4eba\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u80fd\u591f\u5904\u7406\u4e00\u4e9b\u57fa\u7840\u4efb\u52a1\uff0c\u4f46\u5b83\u4eec\u5728\u4e24\u4e2a\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff1a1\uff09\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u63a8\u7406\u80fd\u529b\u4e0d\u8db3\uff1b2\uff09\u5bf9\u4e8eMLLM\u7684\u5fae\u8c03\u548c\u63a8\u7406\u5b58\u5728\u9ad8\u8ba1\u7b97\u6210\u672c\u3002\u8fd1\u671f\u63d0\u51fa\u7684\u57fa\u4e8e\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSM\uff09\u7684Mamba\u5c55\u793a\u4e86\u5728\u975e\u5e73\u51e1\u5e8f\u5217\u5efa\u6a21\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5177\u6709\u7ebf\u6027\u63a8\u7406\u590d\u6742\u5ea6\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RoboMamba\uff0c\u4e00\u4e2a\u7aef\u5230\u7aef\u7684\u673a\u5668\u4ebaMLLM\uff0c\u5b83\u5229\u7528Mamba\u6a21\u578b\u7ed3\u5408\u673a\u5668\u4eba\u63a8\u7406\u548c\u52a8\u4f5c\u80fd\u529b\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6548\u7684\u5fae\u8c03\u548c\u63a8\u7406\u6548\u7387\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u4e0eMamba\u96c6\u6210\uff0c\u901a\u8fc7\u8054\u5408\u8bad\u7ec3\u4f7f\u89c6\u89c9\u6570\u636e\u4e0e\u8bed\u8a00\u5d4c\u5165\u5bf9\u9f50\uff0c\u8d4b\u4e88\u6a21\u578b\u89c6\u89c9\u5e38\u8bc6\u548c\u4e0e\u673a\u5668\u4eba\u76f8\u5173\u7684\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347RoboMamba\u7684\u52a8\u4f5c\u59ff\u6001\u9884\u6d4b\u80fd\u529b\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e00\u79cd\u9ad8\u6548\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u7b80\u5355\u7684\u7b56\u7565\u5934\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e00\u65e6RoboMamba\u5177\u5907\u8db3\u591f\u7684\u63a8\u7406\u80fd\u529b\uff0c\u53ea\u9700\u6781\u5c11\u7684\u5fae\u8c03\u53c2\u6570\uff08\u6a21\u578b\u76840.1%\uff09\u548c\u65f6\u95f4\uff0820\u5206\u949f\uff09\uff0c\u5c31\u80fd\u4e60\u5f97\u64cd\u7eb5\u6280\u80fd\u3002\u5728\u5b9e\u9a8c\u4e2d\uff0cRoboMamba\u5728\u901a\u7528\u548c\u673a\u5668\u4eba\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u63a8\u7406\u80fd\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u4e2d\u5b9e\u73b0\u4e86\u59ff\u6001\u9884\u6d4b\u7684\u51fa\u8272\u8868\u73b0\uff0c\u5176\u63a8\u7406\u901f\u5ea6\u6bd4\u73b0\u6709\u673a\u5668\u4ebaMLLM\u5feb7\u500d\u3002\u9879\u76ee\u7684\u7f51\u9875\u94fe\u63a5\u4e3a\uff1a\u3002|\n", "2406.04337": "|**2024-06-06**|**Coherent Zero-Shot Visual Instruction Generation**|Quynh Phung 
et.al.|[2406.04337](http://arxiv.org/abs/2406.04337)|null|\u5c3d\u7ba1\u6587\u672c\u5230\u56fe\u50cf\u5408\u6210\u6280\u672f\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5728\u6269\u6563\u6a21\u578b\u65b9\u9762\uff0c\u4f46\u751f\u6210\u9700\u8981\u7269\u4f53\u5728\u8fde\u7eed\u6b65\u9aa4\u4e2d\u4fdd\u6301\u4e00\u81f4\u8868\u793a\u548c\u5e73\u6ed1\u72b6\u6001\u8f6c\u6362\u7684\u89c6\u89c9\u6307\u4ee4\u4ecd\u7136\u662f\u4e00\u9879\u8270\u5de8\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u6846\u67b6\uff0c\u5de7\u5999\u5730\u7ed3\u5408\u4e86\u6587\u672c\u7406\u89e3\u4e0e\u56fe\u50cf\u751f\u6210\uff0c\u4ee5\u786e\u4fdd\u89c6\u89c9\u6307\u4ee4\u65e2\u7f8e\u89c2\u53c8\u5177\u6709\u8fde\u8d2f\u6027\u548c\u51c6\u786e\u6027\u3002\u901a\u8fc7\u6d4b\u8bd5\u591a\u6b65\u9aa4\u6307\u4ee4\uff0c\u5e76\u4e0e\u591a\u4e2a\u57fa\u7ebf\u8fdb\u884c\u6bd4\u8f83\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u8fde\u8d2f\u4e14\u89c6\u89c9\u4e0a\u5438\u5f15\u4eba\u7684\u6307\u4ee4\u3002|\n", "2406.04334": "|**2024-06-06**|**DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs**|Lingchen Meng 
et.al.|[2406.04334](http://arxiv.org/abs/2406.04334)|null|\u5927\u591a\u6570\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u901a\u8fc7\u5c06\u89c6\u89c9\u4ee4\u724c\u4f5c\u4e3a\u5e8f\u5217\u8f93\u5165\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7b2c\u4e00\u5c42\u6765\u5b9e\u73b0\u3002\u8fd9\u79cd\u65b9\u6cd5\u867d\u7136\u76f4\u89c2\uff0c\u4f46\u4f1a\u663e\u8457\u589e\u52a0\u8ba1\u7b97\u548c\u5185\u5b58\u5f00\u9500\uff0c\u56e0\u4e3a\u6a21\u578b\u9700\u8981\u5904\u7406\u66f4\u591a\u7684\u8f93\u5165\u5c42\u4ee4\u724c\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u67b6\u6784DeepStack\uff0c\u7528\u4e8eLMMs\u3002\u5728LMM\u7684\u89c6\u89c9\u548c\u8bed\u8a00Transformer\u7684N\u5c42\u4e2d\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u4ee4\u724c\u5206\u4e3aN\u7ec4\uff0c\u5e76\u4ece\u5e95\u5c42\u9010\u5c42\u5411\u4e0a\u9988\u9001\u5230\u5bf9\u5e94\u7684Transformer\u5c42\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u79cd\u7b80\u5355\u7684\u65b9\u6cd5\u6781\u5927\u5730\u589e\u5f3a\u4e86LMM\u5728\u8de8\u5c42\u89c6\u89c9\u4ee4\u724c\u4ea4\u4e92\u65b9\u9762\u7684\u5efa\u6a21\u80fd\u529b\uff0c\u540c\u65f6\u6210\u672c\u51e0\u4e4e\u4e0d\u53d8\u3002\u6211\u4eec\u5206\u522b\u5c06DeepStack\u5e94\u7528\u4e8eLMM\u7684\u8bed\u8a00\u548c\u89c6\u89c9Transformer\uff0c\u5e76\u901a\u8fc7\u5e7f\u6cdb\u5b9e\u8bc1\u7ed3\u679c\u9a8c\u8bc1\u4e86DeepStack LMM\u7684\u6709\u6548\u6027\u3002 \u4f7f\u7528\u76f8\u540c\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0c\u6211\u4eec\u7684DeepStack 
7B\u548c13B\u53c2\u6570\u6a21\u578b\u57289\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u5e73\u5747\u8d85\u8d8a\u540c\u7c7b\u6a21\u578b2.7\u5206\u548c2.9\u5206\u3002\u4ec5\u4f7f\u7528\u4e94\u5206\u4e4b\u4e00\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0cDeepStack\u7684\u8868\u73b0\u63a5\u8fd1\u4e8e\u4f7f\u7528\u5b8c\u6574\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u6a21\u578b\u3002\u8fd9\u4e9b\u63d0\u5347\u5728\u9ad8\u5206\u8fa8\u7387\u4efb\u52a1\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u4f8b\u5982\uff0c\u4e0eLLaVA-1.5-7B\u76f8\u6bd4\uff0cTextVQA\u3001DocVQA\u548cInfoVQA\u4e0a\u7684\u6027\u80fd\u5206\u522b\u63d0\u9ad8\u4e864.2\u5206\u300111.0\u5206\u548c4.0\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06DeepStack\u5e94\u7528\u5230\u89c6\u89c9Transformer\u5c42\uff0c\u8fd9\u5e26\u6765\u4e86\u4e0eLLaVA-1.5-7B\u76f8\u5f53\u7684\u5e73\u5747\u6539\u8fdb\uff0c\u4e3a3.8\u5206\u3002|\n", "2406.04331": "|**2024-06-06**|**PaCE: Parsimonious Concept Engineering for Large Language Models**|Jinqi Luo et.al.|[2406.04331](http://arxiv.org/abs/2406.04331)|**[link](https://github.com/peterljq/parsimonious-concept-engineering)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u4efb\u52a1\uff0c\u5c3d\u7ba1\u5b83\u4eec\u80fd\u591f\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u56de\u590d\uff0c\u4f46\u4e5f\u4f1a\u4ea7\u751f\u4e0d\u826f\u8f93\u51fa\uff0c\u5982\u6f5c\u5728\u6709\u5bb3\u4fe1\u606f\u3001\u79cd\u65cf\u6216\u6027\u522b\u6b67\u89c6\u6027\u8a00\u8bba\u4ee5\u53ca\u9519\u8bef\u7684\u4fe1\u606f\u3002\u4e3a\u4e86\u51cf\u5c11\u8fd9\u4e9b\u95ee\u9898\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u5bf9\u9f50\u65b9\u6cd5\uff0c\u5982\u5fae\u8c03\u3001\u63d0\u793a\u5de5\u7a0b\u548c\u8868\u793a\u5de5\u7a0b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u9762\u4e34\u6311\u6218\uff1a\u4e00\u4e9b\u9700\u8981\u9488\u5bf9\u6bcf\u4e2a\u5bf9\u9f50\u4efb\u52a1\u8fdb\u884c\u6602\u8d35\u7684\u5fae\u8c03\uff1b\u4e00\u4e9b\u672a\u80fd\u5145\u5206\u6d88\u9664\u4e0d\u82
6f\u6982\u5ff5\uff0c\u5bf9\u9f50\u6548\u679c\u4e0d\u4f73\uff1b\u4e00\u4e9b\u5219\u5220\u9664\u4e86\u826f\u6027\u7684\u6982\u5ff5\uff0c\u964d\u4f4e\u4e86LLMs\u7684\u8bed\u8a00\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aParsimonious Concept Engineering\uff08PaCE\uff09\u7684\u65b0\u578b\u6fc0\u6d3b\u5de5\u7a0b\u6846\u67b6\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002 \u9996\u5148\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u7684\u6982\u5ff5\u5b57\u5178\uff0c\u5b83\u5728\u6fc0\u6d3b\u7a7a\u95f4\u4e2d\u8868\u793a\u6bcf\u4e2a\u539f\u5b50\u5bf9\u5e94\u4e00\u4e2a\u8bed\u4e49\u6982\u5ff5\u3002\u63a5\u7740\uff0c\u5bf9\u4e8e\u7ed9\u5b9a\u7684\u4efb\u4f55\u5bf9\u9f50\u4efb\u52a1\uff0c\u6211\u4eec\u4f1a\u4f7f\u7528\u4e00\u4e2a\u6982\u5ff5\u5206\u533a\u5668\u9ad8\u6548\u5730\u6807\u8bb0\u8fd9\u4e9b\u6982\u5ff5\u4e3a\u826f\u6027\u6216\u4e0d\u826f\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u6211\u4eec\u5229\u7528\u7a00\u758f\u7f16\u7801\u65b9\u6cd5\uff0c\u6839\u636e\u6982\u5ff5\u5b57\u5178\u5206\u89e3LLM\u7684\u6fc0\u6d3b\uff0c\u5c06\u5176\u51c6\u786e\u8868\u793a\u4e3a\u826f\u6027\u6210\u5206\u548c\u4e0d\u826f\u6210\u5206\u7684\u7ebf\u6027\u7ec4\u5408\u3002\u901a\u8fc7\u79fb\u9664\u4e0d\u826f\u6210\u5206\uff0c\u6211\u4eec\u80fd\u591f\u8c03\u6574LLMs\u7684\u884c\u4e3a\u4ee5\u7b26\u5408\u5bf9\u9f50\u76ee\u6807\u3002 \u6211\u4eec\u5728\u56de\u5e94\u51c0\u5316\u3001\u771f\u5b9e\u6027\u589e\u5f3a\u548c\u60c5\u611f\u4fee\u8ba2\u7b49\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u5e76\u53d1\u73b0PaCE\u5728\u5b9e\u73b0\u5bf9\u9f50\u6027\u80fd\u7684\u540c\u65f6\uff0c\u4fdd\u6301\u4e86\u826f\u597d\u7684\u8bed\u8a00\u80fd\u529b\uff0c\u8fbe\u5230\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6c34\u5e73\u3002**|\n", "2406.04314": "|**2024-06-06**|**Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step**|Zhanhao Liang et.al.|[2406.04314](http://arxiv.org/abs/2406.04314)|null|## 
\u80cc\u666f \u8fd1\u671f\uff0cDirect Preference Optimization (DPO) \u5df2\u6210\u529f\u6269\u5c55\u5230\u8c03\u6574\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\uff0c\u4f7f\u5176\u4e0e\u4eba\u7c7b\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u3002\u4e0d\u540c\u4e8e\u5927\u591a\u6570\u73b0\u6709 DPO \u65b9\u6cd5\u5047\u8bbe\u6240\u6709\u6269\u6563\u6b65\u9aa4\u90fd\u4e0e\u6700\u7ec8\u751f\u6210\u56fe\u50cf\u4fdd\u6301\u4e00\u81f4\u7684\u504f\u597d\u987a\u5e8f\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u79cd\u5047\u8bbe\u5ffd\u7565\u4e86\u6bcf\u4e2a\u6b65\u9aa4\u7279\u6709\u7684\u53bb\u566a\u6027\u80fd\uff0c\u56e0\u6b64\u5e94\u8be5\u4e3a\u6bcf\u4e00\u6b65\u5b9a\u5236\u504f\u597d\u6807\u7b7e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u540e\u8bad\u7ec3\u65b9\u6cd5\u2014\u2014Step-aware Preference Optimization (SPO)\uff0c\u5b83\u72ec\u7acb\u8bc4\u4f30\u5e76\u8c03\u6574\u6bcf\u4e2a\u6b65\u9aa4\u7684\u53bb\u566a\u6027\u80fd\uff0c\u5229\u7528\u6b65\u7ea7\u611f\u77e5\u504f\u597d\u6a21\u578b\u548c\u6b65\u7ea7\u91cd\u91c7\u6837\u5668\u6765\u786e\u4fdd\u51c6\u786e\u7684\u6b65\u7ea7\u76d1\u7763\u3002 
\u5728SPO\u4e2d\uff0c\u6211\u4eec\u5728\u6bcf\u4e2a\u53bb\u566a\u6b65\u9aa4\u4e2d\u4f1a\u521b\u5efa\u4e00\u4e2a\u56fe\u50cf\u6c60\uff0c\u5bfb\u627e\u5408\u9002\u7684\u80dc\u8005-\u8d25\u8005\u5bf9\uff0c\u5e76\u4e14\u5173\u952e\u5728\u4e8e\uff0c\u6211\u4eec\u4f1a\u4ece\u6c60\u4e2d\u968f\u673a\u9009\u62e9\u4e00\u4e2a\u56fe\u50cf\u4f5c\u4e3a\u4e0b\u4e00\u6b21\u53bb\u566a\u6b65\u9aa4\u7684\u8d77\u70b9\u3002\u8fd9\u4e2a\u6b65\u7ea7\u91cd\u91c7\u6837\u8fc7\u7a0b\u4fdd\u8bc1\u4e86\u6bcf\u6b21\u80dc\u8005-\u8d25\u8005\u5bf9\u90fd\u6765\u81ea\u540c\u4e00\u539f\u59cb\u56fe\u50cf\uff0c\u4f7f\u5f97\u6bd4\u8f83\u72ec\u7acb\u4e8e\u524d\u4e00\u6b65\u3002\u4e3a\u4e86\u8bc4\u4f30\u6bcf\u4e2a\u6b65\u9aa4\u7684\u504f\u597d\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u6b65\u7ea7\u611f\u77e5\u504f\u597d\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u6a21\u7cca\u548c\u6e05\u6670\u7684\u56fe\u50cf\u3002\u5728Stable Diffusion v1.5\u548cSDXL\u7b49\u5b9e\u9a8c\u4e2d\uff0cSPO \u663e\u8457\u4f18\u4e8e\u6700\u65b0\u7684Diffusion-DPO\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u8be6\u7ec6\u7684\u63d0\u793a\u65f6\uff0c\u80fd\u66f4\u597d\u5730\u751f\u6210\u56fe\u50cf\u5e76\u63d0\u5347\u7f8e\u5b66\u6548\u679c\uff0c\u540c\u65f6\u5728\u8bad\u7ec3\u6548\u7387\u4e0a\u8d85\u8fc720\u500d\u3002\u4ee3\u7801\u548c\u6a21\u578b\u53ef\u5728\u6b64\u94fe\u63a5\u83b7\u53d6\uff1a[https://rockeycoss.github.io/spo.github.io/](https://rockeycoss.github.io/spo.github.io/)\u3002|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger 
et.al.|[2406.04306](http://arxiv.org/abs/2406.04306)|**[link](https://github.com/ml-jku/SDLG)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u65f6\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\uff0c\u8fd9\u963b\u788d\u4e86\u793e\u4f1a\u548c\u5de5\u4e1a\u4e2d\u7684\u5404\u79cd\u5e94\u7528\uff0c\u56e0\u4e3a\u5b83\u4eec\u4f1a\u964d\u4f4eLLMs\u7684\u53ef\u4fe1\u5ea6\u3002\u5f53\u524d\u7684LLMs\u91c7\u7528\u81ea\u56de\u5f52\u65b9\u5f0f\u751f\u6210\u6587\u672c\uff0c\u5373\u9884\u6d4b\u5e76\u6dfb\u52a0\u6587\u672c\u6807\u8bb0\u3002\u5f53LLMs\u5bf9\u751f\u6210\u7684\u4e0b\u4e00\u4e2a\u6807\u8bb0\u7684\u8bed\u4e49\u542b\u4e49\u4e0d\u786e\u5b9a\u65f6\uff0c\u5f88\u53ef\u80fd\u4f1a\u4ea7\u751f\u5e7b\u89c9\u3002\u56e0\u6b64\uff0c\u4eba\u4eec\u8ba4\u4e3a\u5e7b\u89c9\u6e90\u4e8e\u9884\u6d4b\u4e0d\u786e\u5b9a\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u8bed\u4e49\u591a\u6837\u6027\u8bed\u8a00\u751f\u6210\u201d\uff08Semantically Diverse Language Generation\uff0cSDLG\uff09\uff0c\u7528\u4e8e\u91cf\u5316LLMs\u7684\u9884\u6d4b\u4e0d\u786e\u5b9a\u6027\u3002SDLG\u5f15\u5bfcLLM\u751f\u6210\u8bed\u4e49\u591a\u6837\u4f46\u53c8\u5408\u7406\u7684\u521d\u59cb\u6587\u672c\u66ff\u4ee3\u65b9\u6848\uff0c\u4ece\u800c\u63d0\u4f9b\u4e86\u7cbe\u786e\u7684aleatoric\u8bed\u4e49\u4e0d\u786e\u5b9a\u6027\u6d4b\u91cf\uff0c\u80fd\u591f\u68c0\u6d4b\u521d\u59cb\u6587\u672c\u662f\u5426\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\u3002 \u5b9e\u9a8c\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u8868\u660e\uff0cSDLG\u59cb\u7ec8\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u6700\u4e3a\u9ad8\u6548\uff0c\u4e3aLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u8bbe\u5b9a\u4e86\u65b0\u7684\u6807\u51c6\u3002**|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen 
et.al.|[2406.04300](http://arxiv.org/abs/2406.04300)|null|When training and evaluating safety-critical systems such as autonomous vehicles, generating varied scenarios through simulation is crucial. However, modeling the trajectories of other vehicles to simulate complex and meaningful close-range interactions remains prohibitively costly. Using language descriptions to generate driving behaviors is a promising approach that offers a scalable and intuitive way for human operators to simulate a wide range of driving interactions, but the scarcity of large-scale annotated language-trajectory data makes it challenging. To address this, we propose Text-to-Drive (T2D), which synthesizes diverse driving behaviors with large language models (LLMs). Our method uses a knowledge-driven, two-stage strategy: first, it uses the LLM's built-in knowledge to generate rich and diverse language descriptions of driving behaviors; then, it uses the LLM's reasoning ability to realize these behaviors in the simulator. At the core of T2D, an LLM constructs a state chart that maps low-level states to high-level abstractions, which simplifies downstream tasks such as summarizing low-level observations, assessing whether a policy is consistent with the behavior description, and designing auxiliary rewards, all without human supervision. With this knowledge-driven approach, we show that T2D generates more diverse trajectories than other baselines and offers a natural language interface that lets users interactively incorporate human preferences. For more examples, please visit our website:|\n", 
"2406.04289": "|**2024-06-07**|**What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages**|Nadav Borenstein et.al.|[2406.04289](http://arxiv.org/abs/2406.04289)|null|What can large language models learn? By definition, a language model (LM) is a distribution over strings, so the question can be recast as evaluating the learnability of classes of distributions over strings. While prior work has focused mostly on theoretical limits, we focus on empirical learnability. Unlike earlier empirical work, we evaluate neural LMs on their \u201chome turf\u201d, learning probabilistic languages, rather than as classifiers of formal languages. Specifically, we study how well regular language models (RLMs) can be learned by recurrent neural network (RNN) and Transformer LMs. We empirically test RLM learnability as a function of several complexity parameters of the RLM and of the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers. Several other predictors also reach significance, but with differing patterns between RNNs and Transformers.|\n", 
"2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278](http://arxiv.org/abs/2406.04278)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|**
Conversational tones are central to human communication. As large language models (LLMs) become increasingly widespread, it is important to characterize how their conversational tones differ from those of humans. However, existing studies of conversational modalities often rely on pre-existing taxonomies or text corpora, which may carry experimenter bias and may not reflect the real-world distributions of the domains under study. Inspired by methods from cognitive science, we propose an iterative method that elicits conversational tones and sentences simultaneously by alternating between two tasks: (1) one participant identifies the tone of a given sentence, and (2) a different participant generates a sentence based on that tone. We ran 100 iterations of this process with human participants and GPT-4, obtaining a dataset of sentences and frequent conversational tones. In an additional experiment, humans and GPT-4 annotated all sentences with all tones. Using data from 1,339 human participants, 33,370 human judgments, and 29,900 GPT-4 queries, we show how this approach can be used to build an interpretable geometric representation of the relations between conversational tones in humans and GPT-4. This work demonstrates how ideas from machine learning and cognitive science can be combined to address challenges in human-computer interaction.**|\n", 
"2406.05132": "|**2024-06-07**|**3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs**|Jianing Yang et.al.|[2406.05132](http://arxiv.org/abs/2406.05132)|**[link](https://github.com/sled-group/3D-GRAND)**|The integration of language and 3D perception is crucial for building embodied agents and robots that understand and interact with the physical world. Although large language models (LLMs) excel at language understanding and generation, their adaptation to 3D environments (3D-LLMs) is still at an early stage; a primary challenge is the lack of large-scale datasets that densely ground language in 3D scenes. We therefore introduce 3D-GRAND, a pioneering large-scale dataset of 40,087 household scenes paired with 6.2 million densely grounded scene-language instructions. Our results show that instruction tuning with 3D-GRAND significantly improves grounding in 3D-LLMs and reduces hallucination. We also propose the 3D-POPE benchmark for systematically evaluating hallucination in 3D-LLMs, enabling fair comparisons among future models. Our experiments reveal a correlation between dataset size and 3D-LLM performance, underscoring the key role of large-scale 3D-text datasets in advancing embodied AI research. Notably, early signals suggest that models trained on large synthetic data can perform well on real-world 3D scans, which points to the potential of sim-to-real transfer. Through 3D-GRAND and 3D-POPE, we aim to provide the embodied AI community with essential resources and insights and to support the development of more reliable and better-grounded 3D-LLMs. Project website: https://3d-grand.github.io|\n", 
"2406.05130": "|**2024-06-07**|**An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models**|Xiongtao Zhou 
et.al.|[2406.05130](http://arxiv.org/abs/2406.05130)|null|This paper studies parameter-efficient fine-tuning (PEFT) for multimodal large language models (MLLMs). Because these models typically have billions of parameters, fine-tuning all of them is difficult. The study aims to identify effective methods for improving MLLM performance when only a limited number of parameters are trained. It conducts empirical studies that fine-tune the LLM component of open-source MLLMs with four popular PEFT methods, and analyzes in detail the impact of the PEFT methods on different models, the location of the PEFT module, the size of the fine-tuning data, model stability, generalization, and hallucination. The methods are evaluated on seven datasets from two categories: unseen and seen. The results show that the adapter is the best-performing PEFT method, and that fine-tuning the connector layers improves performance in most cases. The code and data are available.|\n", 
"2406.05127": "|**2024-06-07**|**Towards Semantic Equivalence of Tokenization in Multimodal LLM**|Shengqiong Wu et.al.|[2406.05127](http://arxiv.org/abs/2406.05127)|null|Multimodal large language models (MLLMs) have shown exceptional performance on vision-language tasks. A core element of an MLLM is vision tokenization, that is, how to efficiently transform input visual signals into feature representations that are most useful to the language model. However, existing vision tokenizers struggle to keep vision and language semantically aligned: they fragment the visual input too aggressively and thereby disrupt the semantic integrity of the visual content. To address this, the paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units with a dynamic clustering algorithm and flexibly determines the number of tokens according to image complexity. The resulting vision tokens preserve semantic integrity while capturing both low-frequency and high-frequency visual features. The proposed MLLM, Setokim, is equipped with SeTok and shows significantly superior performance across a variety of tasks in the experiments. See the project page at https://chocowu.github.io/SeTok-web/ for more details.|\n", 
"2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107](http://arxiv.org/abs/2406.05107)|null|
Data exploration is a challenging process in which users examine a dataset by iteratively running a sequence of queries. Users sometimes explore a new dataset simply to become familiar with it, but more often the exploration is driven by a specific analysis goal or question. To help users explore effectively, Automated Data Exploration (ADE) systems have been proposed that automatically generate full exploration sessions highlighting interesting aspects of the data. However, existing ADE systems are often constrained by a predefined optimization function and therefore always produce the same exploration sequence for a given dataset, which falls short when the exploration has a specific goal. This paper presents LINX, a generative system with a natural language interface for goal-oriented data exploration. Given an input dataset and an analysis goal described in natural language, LINX generates a personalized exploration session relevant to the user's needs. The system uses a large language model to interpret the analysis goal and derive a set of specifications for the desired output exploration session. These specifications are then passed to a novel, modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which adapts its output to the specified instructions. To validate LINX, we build a new benchmark dataset for goal-oriented exploration and conduct a thorough user study. The results show that exploration notebooks generated by LINX are significantly more relevant and useful than those of existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.|\n", 
"2406.05085": "|**2024-06-07**|**Multi-Head RAG: Solving Multi-Aspect Problems with LLMs**|Maciej Besta et.al.|[2406.05085](http://arxiv.org/abs/2406.05085)|**[link](https://github.com/spcl/mrag)**|**Retrieval Augmented Generation (RAG) improves the accuracy and relevance of large language model (LLM) responses by placing retrieved documents in the LLM context. However, existing RAG solutions do not adequately handle queries that require retrieving multiple documents with substantially different content. Such queries are common in practice but challenging, because the embeddings of these documents may lie far apart in the embedding space, which makes it hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a scheme that addresses the problem with a simple yet powerful idea: it uses the activations of the Transformer's multi-head attention layer, rather than the decoder layer, as keys for retrieval. The motivation is that different attention heads can learn to capture different aspects of the data. Using these activations yields embeddings that represent multiple facets of data items and queries, which improves retrieval accuracy for complex queries. We provide an evaluation methodology, metrics, synthetic datasets, and real-world use cases to demonstrate the effectiveness of MRAG, which improves relevance by up to 20% over standard RAG baselines. MRAG integrates seamlessly with existing RAG frameworks and benchmarking tools such as RAGAS, as well as with different classes of data stores. In summary, the paper aims to improve existing RAG models so that they better handle complex queries that require multi-aspect retrieval.**|\n", 
"2406.05063": "|**2024-06-07**|**Are Large Language Models More Empathetic than Humans?**|Anuradha Welivita 
et.al.|[2406.05063](http://arxiv.org/abs/2406.05063)|null|With the emergence of large language models (LLMs), whether they can surpass humans in emotion recognition and empathetic responding has become a focal point of research. This paper presents a comprehensive study comparing the empathetic responding capabilities of four state-of-the-art LLMs, GPT-4, LLaMA-2-70B-Chat, Gemini-1.0-Pro, and Mixtral-8x7B-Instruct, with a human baseline. In a between-subjects user study with 1,000 participants, we evaluated responses to 2,000 emotional dialogue prompts carefully selected to cover a broad spectrum of 32 distinct positive and negative emotions. The findings show that the empathetic responding capability of LLMs is statistically significantly superior to that of humans. GPT-4 emerged as the most empathetic, with about a 31% increase in responses rated \u201cGood\u201d compared to the human baseline, followed by LLaMA-2 (about 24%), Mixtral-8x7B (about 21%), and Gemini-Pro (about 10%). A finer-grained analysis of the response ratings shows that some LLMs are significantly better than others at responding to specific emotions. The proposed evaluation framework offers a scalable and adaptable approach for assessing the empathy of new LLMs, without the need to replicate this study in future research.|\n", 
"2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055](http://arxiv.org/abs/2406.05055)|null|Large language models (LLMs) perform well on reasoning tasks, and few-shot prompting can improve their performance further. However, current evaluations focus mainly on carefully constructed benchmarks and neglect real-world reasoning problems with missing or contradictory conditions, known as ill-defined problems. We observe that existing few-shot prompting techniques are ineffective in such scenarios and often give overconfident answers or hallucinate. To study this problem, we build a benchmark called Problems with Missing and Contradictory conditions (PMC) and introduce two new metrics to evaluate how few-shot prompting methods perform on such problems. Our analysis with the PMC benchmark reveals a trade-off between mathematical reasoning performance on well-defined problems and the ability to recognize ill-defined ones. To address the challenges posed by PMC, we propose a new few-shot prompting method called SMT-LIB Prompting (SLP), which uses the SMT-LIB language to model the problem instead of solving it directly. A double-check solving strategy then verifies the satisfiability and uniqueness of the solution and provides the final feedback. Extensive experiments show that SLP significantly outperforms existing approaches on problems with missing and contradictory conditions. We will open-source our benchmark and code to facilitate future research.|\n", 
"2406.05053": "|**2024-06-07**|**Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation**|Nachiket Kotalwar et.al.|[2406.05053](http://arxiv.org/abs/2406.05053)|null|Generative AI and large language models hold great promise for programming education, since they can give learners personalized feedback and hints. Recent work has focused mainly on improving the quality of generated feedback so that it matches human tutors. In real-world educational deployments, however, cost, time, and data privacy are also key criteria besides quality. This paper benchmarks language models for programming feedback generation across several performance criteria, including quality, cost, time, and data privacy. In particular, we focus on recent in-browser inference techniques, which directly reduce cost and protect data privacy. To improve the feedback quality of small models suited to in-browser inference, we develop a fine-tuning pipeline based on GPT-4-generated synthetic data. We show the effectiveness of fine-tuned, 4-bit quantized Llama3-8B and Phi3-3.8B models running with WebLLM's in-browser inference engine on three different Python programming datasets. We will release the full implementation, a web app, and the datasets to facilitate further research on in-browser language models.|\n", 
"2406.05039": "|**2024-06-07**|**Bootstrapping Referring Multi-Object Tracking**|Yani Zhang et.al.|[2406.05039](http://arxiv.org/abs/2406.05039)|**[link](https://github.com/zyn213/temprmot)**|
Current referring multi-object tracking (RMOT) work typically relies on manually annotated datasets and static rules, which limits diversity and the scope of application. To address this, our work bootstraps the RMOT task by introducing more discriminative language words. We first extend Refer-KITTI into a larger version called Refer-KITTI-V2. It starts from 2,719 manual annotations, resolves the class imbalance problem, and adds more keywords, bringing it closer to real-world scenarios than Refer-KITTI. We then use large language models to expand these annotations to a total of 9,758, covering 617 distinct words and surpassing previous RMOT benchmarks. In addition, we improve the end-to-end RMOT framework with a simple yet elegant temporal advancement strategy that outperforms previous methods. The source code and dataset are available.|\n", "2406.05035": "|**2024-06-07**|**Scenarios and Approaches for Situated Natural Language Explanations**|Pengshuo Qiu 
et.al.|[2406.05035](http://arxiv.org/abs/2406.05035)|null|Large language models (LLMs) can generate natural language explanations (NLE) that are adapted to different users' situations. However, the extent of this adaptation has yet to be evaluated quantitatively. To fill this gap, we build a benchmark, the Situation-Based Explanation (SBE) dataset, which contains 100 explananda. Each explanandum is paired with explanations targeted at distinct audiences, such as educators, students, and professionals, so that we can assess how well model explanations meet the information needs and contexts of these diverse groups (for example, students, teachers, and parents). Each \u201cexplanandum-audience\u201d pair comes with a human-written reference explanation, which is used to compute scores that quantify how a model adapts its explanation to the situation. We test three categories of prompting methods on pretrained language models of various sizes: rule-based prompting, meta-prompting, and in-context learning prompting. We find that (1) language models can generate prompts that lead to explanations more precisely aligned with the target situation; (2) explicitly prompting \u201cYou are a helpful assistant\u201d is not a necessary technique for situated NLE tasks; and (3) in-context learning prompts only help LLMs learn the demonstration template and do not improve their inference performance. The SBE dataset and our analysis lay a foundation for future research on generating situated natural language explanations.|\n", 
"2406.06525": "|**2024-06-10**|**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**|Peize Sun et.al.|[2406.06525](http://arxiv.org/abs/2406.06525)|**[link](https://github.com/foundationvision/llamagen)**|**We introduce LlamaGen, a new family of image generation models that applies the original \u201cnext-token prediction\u201d paradigm of large language models to visual generation. It shows that vanilla autoregressive models such as Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance when scaled properly. We re-examine the design space of image tokenizers, the scalability of image generation models, and the quality of their training data. The outcomes are: (1) an image tokenizer with a downsample ratio of 16, a reconstruction quality of 0.94 rFID, and a codebook usage of 97% on the ImageNet benchmark; (2) a series of class-conditional image generation models ranging from 111M to 3.1B parameters that achieve an FID of 2.18 on the ImageNet 256x256 benchmark, outperforming popular diffusion models such as LDM and DiT; (3) a text-conditional image generation model with 775M parameters, trained in two stages on LAION-COCO and on images of high aesthetic quality, that shows competitive visual quality and text alignment; and (4) a verification that LLM serving frameworks are effective at optimizing the inference speed of image generation models, yielding a 326% to 414% speedup. We release all models and code to support the open-source community working on visual generation and multimodal foundation models.**|\n", 
"2406.06519": "|**2024-06-10**|**UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor**|Shivani Upadhyay et.al.|[2406.06519](http://arxiv.org/abs/2406.06519)|**[link](https://github.com/castorini/umbrela)**|**
Copious relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, which makes the process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can perform relevance assessment accurately and provide human-quality judgments. Unfortunately, their study did not yield any reusable software artifacts. Our work presents UMBRELA (a recursive acronym that stands for \u201cUMbrela is the Bing RELevance Assessor\u201d), an open-source toolkit that reproduces the results of Thomas et al. using OpenAI's GPT-4 model and adds more nuance to the original paper. Across the Deep Learning Tracks from TREC 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with rankings generated by effective multi-stage retrieval systems. The toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines, offering researchers a valuable resource for studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid relevance assessment, and we envision the toolkit becoming a foundation for further innovation in the field. UMBRELA is available at https://github.com/castorini/umbrela**|\n", 
"2406.06499": "|**2024-06-10**|**NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative**|Asmar Nadeem et.al.|[2406.06499](http://arxiv.org/abs/2406.06499)|null|\u5f53\u524d\u7684\u89c6\u9891\u5b57\u5e55\u57fa\u51c6\u548c\u6a21\u578b\u5728\u8868\u5f81\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u79cd\u53d9\u4e8b\u662f\u901a\u8fc7\u56e0\u679c\u5173\u7cfb\u8fde\u63a5\u7684\u4e00\u7cfb\u5217\u4e8b\u4ef6\uff0c\u968f\u65f6\u95f4\u53d1\u5c55\uff0c\u7531\u4eba\u7269\u6216\u4e3b\u4f53\u9a71\u52a8\u3002\u8fd9\u79cd\u7f3a\u4e4f\u53d9\u4e8b\u6027\u9650\u5236\u4e86\u6a21\u578b\u751f\u6210\u6355\u6349\u89c6\u9891\u5185\u5bb9\u5185\u5728\u56e0\u679c\u548c\u65f6\u95f4\u52a8\u6001\u7684\u6587\u672c\u63cf\u8ff0\u7684\u80fd\u529b\u3002\u4e3a\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51faNarrativeBridge\uff0c\u5b83\u5305\u62ec\u4ee5\u4e0b\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\uff1a\uff081\uff09\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5c11\u91cf\u63d0\u793a\u751f\u6210\u7684\u65b0\u578b\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\uff08CTN\uff09\u5b57\u5e55\u57fa\u51c6\uff0c\u8be5\u57fa\u51c6\u660e\u786e\u5730\u5728\u89c6\u9891\u63cf\u8ff0\u4e2d\u7f16\u7801\u56e0\u679c\u5173\u7cfb\uff0c\u901a\u8fc7\u81ea\u52a8\u8bc4\u4f30\u786e\u4fdd\u8d28\u91cf\u548c\u76f8\u5173\u6027\uff1b\uff082\uff09\u4e00\u4e2a\u4e13\u95e8\u7684\u56e0\u679c\u7f51\u7edc\uff08CEN\uff09\u67b6\u6784\uff0c\u5177\u6709\u72ec\u7acb\u7684\u7f16\u7801\u5668\u4ee5\u5206\u522b\u6355\u83b7\u56e0\u679c\u52a8\u6001\uff0c\u4ece\u800c\u5b9e\u73b0\u6709\u6548\u7684\u5b66\u4e60\u548c\u751f\u6210\u5177\u6709\u56e0\u679c\u65f6\u95
f4\u53d9\u4e8b\u7684\u5b57\u5e55\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCEN\u5728\u8868\u8fbe\u89c6\u9891\u5185\u5bb9\u7684\u56e0\u679c\u548c\u65f6\u95f4\u65b9\u9762\u6bd4\u7b2c\u4e8c\u597d\u7684\u6a21\u578b\uff08GIT\uff09\u66f4\u51c6\u786e\uff1a\u5728MSVD\u548cMSR-VTT\u6570\u636e\u96c6\u4e0a\u7684CIDEr\u5206\u6570\u5206\u522b\u4e3a17.88\u548c17.44\u3002\u63d0\u51fa\u7684\u6846\u67b6\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u5177\u6709\u590d\u6742\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\u7ed3\u6784\u7684\u7ec6\u5fae\u6587\u672c\u63cf\u8ff0\uff0c\u8fd9\u662f\u89c6\u9891\u5b57\u5e55\u751f\u6210\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u6027\u3002\u6709\u5173\u9879\u76ee\u8be6\u60c5\uff0c\u8bf7\u8bbf\u95ee\u3002|\n", "2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474](http://arxiv.org/abs/2406.06474)|null|\u5728\u5065\u5eb7\u9886\u57df\uff0c\u5927\u90e8\u5206\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7814\u7a76\u96c6\u4e2d\u5728\u4e34\u5e8a\u4efb\u52a1\u4e0a\u3002\u7136\u800c\uff0c\u79fb\u52a8\u548c\u53ef\u7a7f\u6234\u8bbe\u5907\u63d0\u4f9b\u7684\u4e30\u5bcc\u3001\u957f\u671f\u7684\u4e2a\u4eba\u5065\u5eb7\u76d1\u6d4b\u6570\u636e\u5f80\u5f80\u88ab\u5ffd\u89c6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Large Language 
Model\uff08PH-LLM\uff09\u7684\u65b0\u6a21\u578b\uff0c\u5b83\u662fGemini\u7684\u5b9a\u5236\u7248\uff0c\u4e13\u4e3a\u7406\u89e3\u548c\u5904\u7406\u6570\u503c\u65f6\u95f4\u5e8f\u5217\u7684\u4e2a\u4eba\u5065\u5eb7\u6570\u636e\u800c\u8bbe\u8ba1\u3002\u6211\u4eec\u521b\u5efa\u5e76\u6574\u7406\u4e86\u4e09\u4e2a\u6d4b\u8bd5\u96c6\uff0c\u8003\u5bdf\u4e86PH-LLM\u5728\u4ee5\u4e0b\u65b9\u9762\u7684\u6027\u80fd\uff1a1\uff09\u4ece\u7761\u7720\u6a21\u5f0f\u3001\u8eab\u4f53\u6d3b\u52a8\u548c\u751f\u7406\u53cd\u5e94\u4e2d\u751f\u6210\u4e2a\u6027\u5316\u89c1\u89e3\u548c\u5efa\u8bae\uff1b2\uff09\u4e13\u4e1a\u77e5\u8bc6\u9886\u57df\u7684\u4e13\u5bb6\u6c34\u5e73\uff1b3\uff09\u9884\u6d4b\u81ea\u6211\u62a5\u544a\u7684\u7761\u7720\u7ed3\u679c\u3002\u6211\u4eec\u4e0e\u9886\u57df\u4e13\u5bb6\u5408\u4f5c\u6784\u5efa\u4e86857\u4e2a\u6848\u4f8b\u7814\u7a76\uff0c\u4ee5\u8bc4\u4f30\u5b9e\u9645\u7684\u7761\u7720\u548c\u5065\u8eab\u573a\u666f\u3002\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684\u8bc4\u5206\u6807\u51c6\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0Gemini Ultra 
1.0\u548cPH-LLM\u5728\u5065\u8eab\u65b9\u9762\u4e0e\u4e13\u5bb6\u8868\u73b0\u65e0\u7edf\u8ba1\u5dee\u5f02\uff0c\u5c3d\u7ba1\u5728\u7761\u7720\u65b9\u9762\u4e13\u5bb6\u4ecd\u5360\u4f18\u52bf\uff0c\u4f46Fine-tune\u540e\u7684PH-LLM\u5728\u5229\u7528\u76f8\u5173\u9886\u57df\u77e5\u8bc6\u548c\u4e2a\u4eba\u5316\u7761\u7720\u4fe1\u606f\u65b9\u9762\u8868\u73b0\u51fa\u663e\u8457\u63d0\u5347\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u591a\u9879\u9009\u62e9\u7684\u7761\u7720\u533b\u5b66\u548c\u5065\u8eab\u8003\u8bd5\u8bc4\u4f30\u4e86PH-LLM\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5176\u5f97\u5206\u5206\u522b\u4e3a79%\u548c88%\uff0c\u8d85\u8fc7\u4e86\u4eba\u7c7b\u4e13\u5bb6\u6837\u672c\u7684\u5e73\u5747\u5206\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bad\u7ec3PH-LLM\u9884\u6d4b\u6765\u81ea\u53ef\u7a7f\u6234\u8bbe\u5907\u6587\u672c\u548c\u591a\u6a21\u6001\u7f16\u7801\u6570\u636e\u7684\u81ea\u6211\u62a5\u544a\u7761\u7720\u8d28\u91cf\u7ed3\u679c\uff0c\u5e76\u8bc1\u660e\u4e86\u591a\u6a21\u6001\u7f16\u7801\u5bf9\u4e8e\u8fbe\u5230\u4e13\u95e8\u533a\u5206\u6a21\u578b\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5728\u4e2a\u4eba\u5065\u5eb7\u8fd9\u4e2a\u5173\u952e\u5b89\u5168\u9886\u57df\u8fd8\u9700\u8981\u8fdb\u4e00\u6b65\u53d1\u5c55\u548c\u8bc4\u4f30\uff0c\u4f46\u8fd9\u4e9b\u7ed3\u679c\u5c55\u793a\u4e86Gemini\u6a21\u578b\u7684\u5e7f\u6cdb\u77e5\u8bc6\u548c\u80fd\u529b\uff0c\u4ee5\u53ca\u5c06\u751f\u7406\u6570\u636e\u5e94\u7528\u4e8e\u4e2a\u4eba\u5065\u5eb7\u5e94\u7528\uff0c\u5982PH-LLM\u4e2d\u7684\u505a\u6cd5\u3002|\n", "2406.06465": "|**2024-06-10**|**AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction**|Zhen Xing 
et.al.|[2406.06465](http://arxiv.org/abs/2406.06465)|null|\u6587\u672c\u5f15\u5bfc\u7684\u89c6\u9891\u9884\u6d4b\uff08TVP\uff09\u4efb\u52a1\u65e8\u5728\u6839\u636e\u521d\u59cb\u5e27\u548c\u6307\u4ee4\u9884\u6d4b\u540e\u7eed\u5e27\u7684\u8fd0\u52a8\uff0c\u8fd9\u5bf9\u4e8e\u865a\u62df\u73b0\u5b9e\u3001\u673a\u5668\u4eba\u6280\u672f\u548c\u5185\u5bb9\u521b\u4f5c\u7b49\u9886\u57df\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6539\u7f16Stable Diffusion\u5728\u8be5\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5728\u5e27\u4e00\u81f4\u6027\u4e0e\u65f6\u95f4\u7a33\u5b9a\u6027\u65b9\u9762\u4ecd\u5b58\u5728\u95ee\u9898\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u89c6\u9891\u6570\u636e\u96c6\u7684\u89c4\u6a21\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u9884\u8bad\u7ec3\u7684Image2Video\u6269\u6563\u6a21\u578b\u5bf9\u89c6\u9891\u52a8\u6001\u6709\u826f\u597d\u7684\u5148\u9a8c\u77e5\u8bc6\uff0c\u4f46\u7f3a\u4e4f\u6587\u672c\u63a7\u5236\u3002\u56e0\u6b64\uff0c\u5c06Image2Video\u6a21\u578b\u8f6c\u79fb\uff0c\u540c\u65f6\u6ce8\u5165\u6307\u4ee4\u63a7\u5236\u4ee5\u751f\u6210\u53ef\u63a7\u5236\u7684\u89c6\u9891\uff0c\u65e2\u5177\u6709\u610f\u4e49\u53c8\u9887\u5177\u6311\u6218\u3002 
\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u7528\u4e8e\u6839\u636e\u521d\u59cb\u5e27\u548c\u6587\u672c\u6307\u4ee4\u9884\u6d4b\u672a\u6765\u7684\u89c6\u9891\u72b6\u6001\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u53cc\u67e5\u8be2Transformer\uff08DQFormer\uff09\u67b6\u6784\uff0c\u5b83\u5c06\u6307\u4ee4\u548c\u5e27\u4fe1\u606f\u6574\u5408\u5230\u6761\u4ef6\u5d4c\u5165\u4e2d\uff0c\u7528\u4e8e\u672a\u6765\u5e27\u7684\u9884\u6d4b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u957f\u77ed\u671f\u65f6\u5e8f\u9002\u914d\u5668\u548c\u7a7a\u95f4\u9002\u914d\u5668\uff0c\u80fd\u591f\u5728\u5c11\u91cf\u8bad\u7ec3\u6210\u672c\u4e0b\u5feb\u901f\u5c06\u901a\u7528\u89c6\u9891\u6269\u6563\u6a21\u578b\u9002\u5e94\u7279\u5b9a\u573a\u666f\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Something Something V2\u3001Epic Kitchen-100\u3001Bridge Data\u548cUCF-101\u56db\u4e2a\u6570\u636e\u96c6\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u6280\u672f\u3002\u7279\u522b\u662f\u5728Bridge\u6570\u636e\u96c6\u548cSSv2\u4e0a\uff0cAID\u5206\u522b\u5b9e\u73b0\u4e8691.2%\u548c55.5%\u7684FVD\u6539\u8fdb\uff0c\u8fd9\u8bc1\u660e\u4e86\u5176\u5728\u4e0d\u540c\u9886\u57df\u7684\u6709\u6548\u6027\u3002\u66f4\u591a\u793a\u4f8b\u53ef\u5728\u6211\u4eec\u7684\u7f51\u7ad9\u627e\u5230\u3002|\n", "2406.06464": "|**2024-06-10**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|\u5c3d\u7ba1\u53ef\u7a7f\u6234\u5065\u5eb7\u8ffd\u8e2a\u5668\u65e5\u76ca\u666e\u53ca\uff0c\u7761\u7720\u548c\u8fd0\u52a8\u5bf9\u5065\u5eb7\u7684\u91cd\u8981\u6027\u4e0d\u8a00\u800c\u55bb\uff0c\u4f46\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5177\u6709\u884c\u52a8\u4ef7\u503c\u7684\u4e2a\u6027\u5316\u89c1\u89e3\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u8fd9\u9700\u8981\u5bf9\u5927\u91cf\u6570\u636e\u8fdb\u884c\u975e\u7ed3\u6784\u5316\u5206\u6790\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u7406\u89e3\u548c\u4e0e\u4e16\u754c\u4e92\u52a8\uff0c\u4e3a\u5927\u89c4\u6a21\u4e2a\u6027\u5316\u5206\u6790\u5e26\u6765\u4e86\u5e0c\u671b\u3002\u7136\u800c\uff0c\u5728\u4e2a\u4eba\u5065\u5eb7\u9886\u57df\u7684LLM\u5e94\u7528\u5c1a\u5f85\u5f00\u53d1\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Insights Agent\uff08PHIA\uff09\u7684\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u6700\u65b0\u7684\u4ee3\u7801\u751f\u6210\u548c\u4fe1\u606f\u68c0\u7d22\u5de5\u5177\u6765\u5206\u6790\u548c\u89e3\u91ca\u884c\u4e3a\u5065\u5eb7\u6570\u636e\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e24\u4e2a\u8d85\u8fc74000\u4e2a\u5065\u5eb7\u6d1e\u5bdf\u95ee\u9898\u7684\u57fa\u51c6\u95ee\u7b54\u6570\u636e\u96c6\u3002\u6839\u636e650\u5c0f\u65f6\u7684\u4eba\u7c7b\u548c\u4e13\u5bb6\u8bc4\u4f30\uff0cPHIA\u80fd\u51c6\u786e\u56de\u7b5484%\u4ee5\u4e0a\u7684\u4e8b\u5b9e\u6027\u6570\u503c\u95ee\u9898\uff0c\u4ee5\u53ca\u8d85\u8fc783%\u7684\u4f17\u5305\u5f00\u653e\u6027\u95ee\u9898\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u4e8e\u63a8\u52a8\u5927\u4f17\u884c\u4e3a\u5065\u5eb7\u8fdb\u6b65\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u53ef\u80fd\u4f7f\u4e2a\u4eba\u80fd\u591f\u89e3\u8bfb\u81ea\u5df1\u7684\u53ef\u7a7f\u6234\u6570\u636e\uff0c\u5f00\u8f9f\u4e86\u4e00\u4e2a\u4ee5\u6570\u636e\u9a71\u52a8\u6d1e\u5bdf\u4e3a\u6307\u5bfc\u7684\u4e2a\u6027\u531
6\u5065\u5eb7\u65b9\u6848\u7684\u65b0\u65f6\u4ee3\uff0c\u4f7f\u5f97\u5065\u5eb7\u4fdd\u5065\u66f4\u52a0\u4fbf\u6377\u4e14\u4e2a\u6027\u5316\u3002|\n", "2406.06461": "|**2024-06-11**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461](http://arxiv.org/abs/2406.06461)|null|\u8fd9\u7bc7\u8bba\u6587\u6307\u51fa\uff0c\u5c3d\u7ba1\u5df2\u7ecf\u63d0\u51fa\u4e86\u591a\u79cd\u63a8\u7406\u7b56\u7565\u6765\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u80fd\u529b\uff0c\u4f46\u4f20\u7edf\u7684\u8bc4\u4ef7\u65b9\u6cd5\u4ec5\u5173\u6ce8\u6027\u80fd\u6307\u6807\uff0c\u5ffd\u89c6\u4e86\u4e00\u4e2a\u5173\u952e\u56e0\u7d20\uff1a\u989d\u5916\u8ba1\u7b97\u8d44\u6e90\u5e26\u6765\u7684\u589e\u6548\u3002\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5bf9\u7b56\u7565\u6548\u7387\u7684\u7247\u9762\u7406\u89e3\u3002\u4e3a\u6b64\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u5c06\u8ba1\u7b97\u9884\u7b97\u7eb3\u5165\u8bc4\u4f30\uff0c\u4ee5\u63d0\u4f9b\u4e00\u4e2a\u65e2\u8003\u8651\u6027\u80fd\u6307\u6807\u53c8\u8003\u8651\u8ba1\u7b97\u6210\u672c\u7684\u66f4\u5168\u9762\u6bd4\u8f83\u3002\u901a\u8fc7\u8fd9\u79cd\u9884\u7b97\u610f\u8bc6\u7684\u89c6\u89d2\uff0c\u7814\u7a76\u53d1\u73b0\u590d\u6742\u7684\u63a8\u7406\u7b56\u7565\u5728\u6ca1\u6709\u663e\u8457\u7b97\u6cd5\u521b\u65b0\u7684\u60c5\u51b5\u4e0b\uff0c\u5f80\u5f80\u7531\u4e8e\u5206\u914d\u4e86\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90\u800c\u8d85\u8d8a\u4e86\u7b80\u5355\u7684\u57fa\u7ebf\u3002\u4f8b\u5982\uff0c\u5f53\u7ed9\u4e88\u94fe\u5f0f\u601d\u8003\u81ea\u6d3d\u6027\uff08chain-of-thought 
self-consistency\uff09\u7c7b\u4f3c\u7ea7\u522b\u7684\u8ba1\u7b97\u8d44\u6e90\uff0c\u5b83\u5e38\u5e38\u80fd\u4f18\u4e8e\u6587\u732e\u4e2d\u63d0\u51fa\u7684\u63a8\u7406\u7b56\u7565\u3002\u7136\u800c\uff0c\u5728\u8fd9\u79cd\u89c4\u6a21\u654f\u611f\u7684\u89c6\u89d2\u4e0b\uff0c\u67d0\u4e9b\u7b56\u7565\u5982\u591a\u4ee3\u7406\u8fa9\u8bba\u6216\u591a\u53cd\u601d\u5728\u589e\u52a0\u8ba1\u7b97\u9884\u7b97\u65f6\u53ef\u80fd\u4f1a\u8868\u73b0\u5f97\u66f4\u5dee\u3002|\n", "2406.06458": "|**2024-06-10**|**Evaluating the Retrieval Component in LLM-Based Question Answering Systems**|Ashkan Alinejad et.al.|[2406.06458](http://arxiv.org/abs/2406.06458)|null|\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u95ee\u7b54\u7cfb\u7edf\u5728\u4f9d\u8d56\u68c0\u7d22\u7ec4\u4ef6\u65f6\uff0c\u80fd\u591f\u83b7\u53d6\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u5e76\u964d\u4f4e\u4ea7\u751f\u4e0d\u51c6\u786e\u56de\u590d\u6216\u9519\u8bef\u4fe1\u606f\u7684\u98ce\u9669\u3002\u5c3d\u7ba1\u4fe1\u606f\u68c0\u7d22\u9886\u57df\u7684\u8bc4\u4f30\u65b9\u6cd5\u65e9\u5df2\u5b58\u5728\uff0c\u4f46\u5982\u4f55\u8bc4\u4f30LLMs\u9a71\u52a8\u7684\u804a\u5929\u673a\u5668\u4eba\u4e2d\u7684\u68c0\u7d22\u5668\u6027\u80fd\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u57fa\u51c6\u65b9\u6cd5\uff0c\u7528\u4e8e\u8bc4\u4ef7\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval-Augmented Generation\uff0cRAG\uff09\u7684\u804a\u5929\u673a\u5668\u4eba\u4e2d\u7684\u68c0\u7d22\u5668\u3002
\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u66f4\u5168\u9762\u5730\u53cd\u6620\u68c0\u7d22\u5668\u7684\u6027\u80fd\uff0c\u5e76\u4e0e\u6574\u4e2a\u95ee\u7b54\u7cfb\u7edf\u7684\u6574\u4f53\u8868\u73b0\u66f4\u4e3a\u4e00\u81f4\u3002\u5c3d\u7ba1\u4f20\u7edf\u7684\u7cbe\u786e\u5ea6\uff08precision\uff09\u3001\u53ec\u56de\u7387\uff08recall\uff09\u548cF1\u5206\u6570\u7b49\u6307\u6807\u53ef\u80fd\u65e0\u6cd5\u5b8c\u5168\u63ed\u793aLLMs\u7684\u80fd\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u53ef\u80fd\u4f1a\u5728\u68c0\u7d22\u5668\u4e0d\u5b8c\u7f8e\u65f6\u4ecd\u63d0\u4f9b\u51c6\u786e\u7b54\u6848\uff0c\u4f46\u6211\u4eec\u7684\u8bc4\u4f30\u65b9\u6cd5\u8003\u8651\u5230\u4e86LLMs\u7684\u4f18\u52bf\uff0c\u5373\u5b83\u4eec\u80fd\u591f\u5ffd\u7565\u65e0\u5173\u4e0a\u4e0b\u6587\uff0c\u540c\u65f6\u4e5f\u80fd\u5904\u7406\u53ef\u80fd\u5b58\u5728\u7684\u9519\u8bef\u548c\u865a\u6784\u5185\u5bb9\u3002|\n", "2406.06455": "|**2024-06-10**|**A Large Language Model Pipeline for Breast Cancer Oncology**|Tristen Pool 
et.al.|[2406.06455](http://arxiv.org/abs/2406.06455)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4f17\u591a\u9886\u57df\u5c55\u73b0\u51fa\u521b\u65b0\u6f5c\u529b\uff0c\u4f46\u5728\u764c\u75c7\u6cbb\u7597\u65b9\u9762\u7684\u5e94\u7528\u4ecd\u9700\u8fdb\u4e00\u6b65\u5f00\u53d1\u3002\u7814\u7a76\u8005\u4f7f\u7528\u4e00\u79cd\u65b0\u9896\u7684Langchain\u63d0\u793a\u5de5\u7a0b\u7ba1\u9053\uff0c\u5bf9\u6700\u5148\u8fdb\u7684OpenAI\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u6570\u636e\u96c6\u5305\u62ec\u4e34\u5e8a\u6570\u636e\u548c\u4e34\u5e8a\u6307\u5357\u6587\u672c\uff0c\u4e13\u6ce8\u4e8e\u4e73\u817a\u764c\u60a3\u8005\u8f85\u52a9\u653e\u7597\u548c\u5316\u7597\u4e24\u4e2a\u5173\u952e\u6cbb\u7597\u56e0\u7d20\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6a21\u578b\u5728\u5206\u7c7b\u8fd9\u4e24\u4e2a\u6cbb\u7597\u624b\u6bb5\u65f6\u8fbe\u5230\u4e86\u9ad8\u7cbe\u5ea6\uff080.85+\uff09\u3002\u901a\u8fc7\u89c2\u5bdf\u4eba\u7c7b\u80bf\u7624\u5b66\u5bb6\u7684\u6cbb\u7597\u8d28\u91cf\u6570\u636e\uff0c\u5efa\u7acb\u4e86\u4e00\u4e2a\u7f6e\u4fe1\u533a\u95f4\uff0c\u4f30\u8ba1\u6a21\u578b\u5728\u9884\u6d4b\u6cbb\u7597\u65b9\u6848\u65f6\u5fc5\u987b\u6bd4\u539f\u59cb\u80bf\u7624\u5b66\u5bb6\u8868\u73b0\u5f97\u66f4\u597d\uff0c\u624d\u80fd\u5728\u603b\u4f53\u4e0a\u6210\u4e3a\u66f4\u597d\u7684\u89e3\u51b3\u65b9\u6848\u7684\u6bd4\u4f8b\u4e3a8.2%\u81f313.3%\u3002\u7531\u4e8e\u764c\u75c7\u6cbb\u7597\u51b3\u7b56\u7ed3\u679c\u7684\u4e0d\u786e\u5b9a\u6027\uff0c\u672a\u6765\u53ef\u80fd\u9700\u8981\u8fdb\u884c\u4e34\u5e8a\u8bd5\u9a8c\u6765\u9a8c\u8bc1\u8fd9\u4e00\u9608\u503c\u3002\u8003\u8651\u5230\u7f8e\u56fd85%\u7684\u764c\u75c7\u60a3\u8005\u5728\u5730\u65b9\u793e\u533a\u8bbe\u65bd\u63a5\u53d7\u6cbb\u7597\uff0c\u8fd9\u7c7b\u6a21\u578b\u6709\u53ef\u80fd\u663e\u8457\u6269\u5927\u4f18\u8d28\u62a4\u7406\u7684\u53ef\u53ca\u6027\uff0c\u5176\u6548\u679c\u81f3\u5c11\u63a5\u8fd1\u4eba\u7c7b\u80bf\u7624\u5b66\u5bb6\u3002|\n", "2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The 
Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451](http://arxiv.org/abs/2406.06451)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u3001\u8c03\u8bd5\u548c\u89e3\u91ca\u65b9\u9762\u7684\u6027\u80fd\u5f15\u53d1\u4e86\u8bb8\u591a\u7814\u7a76\u8005\u548c\u6559\u80b2\u5de5\u4f5c\u8005\u5bf9\u672c\u79d1\u7f16\u7a0b\u6559\u80b2\u7684\u5173\u6ce8\uff0c\u4ed6\u4eec\u671f\u5f85\u8fd9\u4e9b\u6a21\u578b\u80fd\u9769\u65b0\u7f16\u7a0b\u6559\u5b66\u3002\u7136\u800c\uff0c\u5173\u4e8e\u5982\u4f55\u4ee5\u53ca\u4e3a\u4f55\u5728\u7f16\u7a0b\u6559\u80b2\u4e2d\u4f7f\u7528LLMs\u7684\u51b3\u7b56\u53ef\u80fd\u4e0d\u4ec5\u4ec5\u57fa\u4e8e\u6280\u672f\u8bc4\u4f30\u3002\u672c\u7814\u7a76\u4ee5\u793e\u4f1a\u5851\u9020\u6280\u672f\u7406\u8bba\u4e3a\u6307\u5bfc\u6846\u67b6\uff0c\u63a2\u8ba8\u4e86\u5b66\u751f\u5bf9LLMs\u7684\u793e\u4f1a\u611f\u77e5\u5982\u4f55\u5f71\u54cd\u4ed6\u4eec\u7684\u4f7f\u7528\u884c\u4e3a\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u4e00\u4efd\u533f\u540d\u7684\u8bfe\u7a0b\u7ed3\u675f\u65f6\u7684\u8c03\u67e5\u95ee\u5377\uff08n=158\uff09\u3001\u4e2d\u671f\u81ea\u6211\u6548\u80fd\u95ee\u5377\uff08n=158\uff09\u300110\u4f4d\u5b66\u751f\u7684\u6df1\u5ea6\u8bbf\u8c08\u3001\u81ea\u6211\u62a5\u544a\u7684LLM\u5728\u4f5c\u4e1a\u4e2d\u7684\u4f7f\u7528\u60c5\u51b5\uff0c\u4ee5\u53ca\u671f\u4e2d\u8003\u8bd5\u6210\u7ee9\uff0c\u53d1\u73b0\u5b66\u751f\u7684LLM\u4f7f\u7528\u4e0e\u5176\u5bf9\u672a\u6765\u804c\u4e1a\u7684\u671f\u671b\u548c\u5bf9\u540c\u4f34\u4f7f\u7528\u7684\u611f\u77e5\u6709\u5173\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u65e9\u671f\u81ea\u6211\u62a5\u544a\u7684LLM\u4f7f\u7528\u4e0e\u8f83\u4f4e\u7684\u81ea\u6211\u6548\u80fd\u548c\u4e2d\u671f\u8003\u8bd5\u6210\u7ee9\u76f8\u5173\uff0c\u800c\u5b66\u751f\u5bf9\u8fc7\u5ea6\u4f9d\u8d56LLM\u7684\u611f\u77e5\uff0c\u800c\u975e\u5b9e\u9645\u4f7f\u7528\uff0c\u4e0e\u8bfe\u7a0b\u540e\u671f\u7684\u81ea\u6211\u6548\u80fd\u4e0b\u9
64d\u6709\u5173\u3002|\n", "2406.07545": "|**2024-06-11**|**Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena**|Aidar Myrzakhan et.al.|[2406.07545](http://arxiv.org/abs/2406.07545)|**[link](https://github.com/vila-lab/open-llm-leaderboard)**|**\u591a\u9879\u9009\u62e9\u9898\uff08MCQ\uff09\u5e38\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u901a\u5e38\uff0cLLM\u4f1a\u6839\u636e\u8c03\u6574\u540e\u7684\u6982\u7387\uff0c\u5982\u957f\u5ea6\u56e0\u7d20\uff0c\u9009\u62e9\u6700\u53ef\u80fd\u7684\u7b54\u6848\u3002\u7136\u800c\uff0cLLMs\u53ef\u80fd\u5b58\u5728\u56fa\u6709\u7684\u504f\u89c1\uff0c\u4f8b\u5982\u5bf9A\u3001B\u3001C\u3001D\u7b49\u9009\u9879ID\u7684\u504f\u597d\uff0c\u8fd9\u53ef\u80fd\u5f71\u54cd\u7b54\u6848\u9884\u6d4b\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u8fc7\u5728\u5c11\u6570\u6d4b\u8bd5\u6837\u672c\u4e0a\u968f\u673a\u6253\u4e71\u9009\u9879\uff0c\u5e76\u5c06\u5176\u5e94\u7528\u5230\u65b0\u6837\u672c\u4e0a\uff0c\u8bd5\u56fe\u51cf\u5c11\u8fd9\u79cd\u201c\u9009\u62e9\u504f\u5dee\u201d\u3002\u6b64\u5916\uff0cMCQ\u7684\u53e6\u4e00\u4e2a\u95ee\u9898\u662f\u201c\u5f69\u7968\u5f0f\u731c\u6d4b\u201d\uff0c\u5373LLM\u5e76\u672a\u771f\u6b63\u5b66\u4e60\u77e5\u8bc6\uff0c\u800c\u662f\u51ed\u8fd0\u6c14\u731c\u5bf9\u7b54\u6848\uff0c\u8fd9\u5bf9\u5c0f\u578bLLMs\u5c24\u4e3a\u4e25\u91cd\u3002
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4e00\u4e2a\u66f4\u5168\u9762\u7684\u65b9\u6cd5\u662f\u8f6c\u5411\u5f00\u653e\u5f0f\u95ee\u9898\uff0c\u8fd9\u80fd\u4ece\u6839\u672c\u4e0a\u6d88\u9664\u9009\u62e9\u504f\u5dee\u548c\u968f\u673a\u731c\u6d4b\u3002\u4f46\u8f6c\u5411\u5f00\u653e\u5f0f\u95ee\u9898\u4e5f\u5e26\u6765\u4e86\u6311\u6218\uff1a\u4e00\u662f\u5982\u4f55\u8bc6\u522b\u5408\u9002\u7684\u5f00\u653e\u6027\u95ee\u9898\uff0c\u4e8c\u662f\u5982\u4f55\u9a8c\u8bc1LLM\u5bf9\u5f00\u653e\u5f0f\u95ee\u9898\u7684\u56de\u7b54\u4e0e\u4eba\u7c7b\u6807\u6ce8\u7684\u771f\u5b9e\u7b54\u6848\u4e4b\u95f4\u7684\u51c6\u786e\u6027\u3002\u672c\u7814\u7a76\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u96be\u9898\uff0c\u5e76\u5efa\u7acb\u4e00\u4e2a\u65b0\u7684LLM\u8bc4\u4f30\u57fa\u51c6\uff0c\u901a\u8fc7\u5b8c\u5168\u7684\u5f00\u653e\u5f0f\u95ee\u9898\u6765\u8861\u91cf\u6a21\u578b\u6027\u80fd\uff0c\u4f8b\u5982GPT-4o/4/3.5\u3001Claude 3\u3001Gemini\u7b49\u3002 \u6211\u4eec\u521b\u5efa\u4e86Open-LLM-Leaderboard\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u7684\u8bc4\u4ef7\u5e73\u53f0\uff0c\u65e8\u5728\u8ddf\u8e2a\u5404\u79cdLLM\u7684\u8868\u73b0\uff0c\u63ed\u793a\u5b83\u4eec\u7684\u771f\u5b9e\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5f00\u6e90\uff0c\u53ef\u5728\u6b64\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/VILA-Lab/Open-LLM-Leaderboard\u3002**|\n", "2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li 
et.al.|[2406.07528](http://arxiv.org/abs/2406.07528)|**[link](https://github.com/dvlab-research/q-llm)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u957f\u5e8f\u5217\u65b9\u9762\u7684\u80fd\u529b\u5bf9\u4e8e\u5404\u9886\u57df\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6355\u6349\u5e8f\u5217\u4e2d\u7684\u957f\u671f\u4f9d\u8d56\u5173\u7cfb\u4ee5\u6df1\u5165\u7406\u89e3\u8bed\u4e49\u65b9\u9762\u4ecd\u7136\u5b58\u5728\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Query-aware Inference for LLMs\uff08Q-LLM\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u6a21\u4eff\u4eba\u7c7b\u8ba4\u77e5\u5904\u7406\u5927\u89c4\u6a21\u5e8f\u5217\u7684\u7cfb\u7edf\u3002\u901a\u8fc7\u805a\u7126\u4e8e\u4e0e\u7ed9\u5b9a\u67e5\u8be2\u76f8\u5173\u7684\u5185\u5b58\u6570\u636e\uff0cQ-LLM\u80fd\u591f\u5728\u56fa\u5b9a\u7a97\u53e3\u5927\u5c0f\u5185\u51c6\u786e\u6355\u6349\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u4e3a\u67e5\u8be2\u63d0\u4f9b\u7cbe\u786e\u7684\u7b54\u6848\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3\uff0c\u53ef\u65e0\u7f1d\u96c6\u6210\u5230\u4efb\u4f55LLMs\u4e2d\u3002\u4f7f\u7528LLaMA3\uff08QuickLLaMA\uff09\u7684Q-LLM\u80fd\u572830\u79d2\u5185\u9605\u8bfb\u300a\u54c8\u5229\u00b7\u6ce2\u7279\u300b\uff0c\u5e76\u80fd\u51c6\u786e\u56de\u7b54\u95ee\u9898\u3002\u76f8\u8f83\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684LLaMA3\uff0cQ-LLM\u7684\u6027\u80fd\u63d0\u5347\u4e867.17%\uff0c\u800c\u5728Mistral\u4e0a\uff0c\u5b83\u5728$\\infty$-bench\u4e0a\u7684\u8868\u73b0\u63d0\u5347\u4e863.26%\u3002\u5728\u201c\u9488\u950b\u76f8\u5bf9\u201d\u4efb\u52a1\u4e2d\uff0cQ-LLM\u5728\u5e7f\u6cdb\u8ba4\u53ef\u7684\u57fa\u51c6\u4e0a\uff0c\u76f8\u5bf9\u4e8e\u5f53\u524d\u6700\u4f73\u6210\u7ee9\uff0cMistral\u4e0a\u7684\u63d0\u5347\u8fbe\u5230\u4e867.0%\uff0c\u5728LLaMA3\u4e0a\u5b9e\u73b0\u4e86100%\u7684\u51c6\u786e\u7387\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728https://github.com/dvlab-research/Q-LLM\u4e0a\u5f
00\u6e90\u3002**|\n", "2406.07515": "|**2024-06-11**|**Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement**|Yunzhen Feng et.al.|[2406.07515](http://arxiv.org/abs/2406.07515)|null|\u968f\u7740\u751f\u6210\u6a21\u578b\u5408\u6210\u6570\u636e\u7684\u5174\u8d77\uff0c\u8d8a\u6765\u8d8a\u591a\u5730\u88ab\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5fae\u8c03\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u6a21\u578b\u5d29\u6e83\uff08\u5373\u5fae\u8c03\u6027\u80fd\u4e0b\u964d\uff09\u7684\u5173\u6ce8\u3002\u7531\u4e8e\u4eba\u7c7b\u548c\u673a\u5668\u90fd\u8f83\u5bb9\u6613\u5206\u8fa8\u597d\u6837\u672c\u548c\u574f\u6837\u672c\uff0c\u800c\u975e\u751f\u6210\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u53cd\u9988\u6765\u9632\u6b62\u6a21\u578b\u5728\u5408\u6210\u6570\u636e\u4e0a\u51fa\u73b0\u5d29\u6e83\u3002\u6211\u4eec\u7406\u8bba\u5206\u6790\u4e86\u4e00\u4e2a\u9ad8\u65af\u6df7\u5408\u5206\u7c7b\u6a21\u578b\u5728\u57fa\u4e8e\u53cd\u9988\u589e\u5f3a\u7684\u5408\u6210\u6570\u636e\u8bad\u7ec3\u4e0b\u7684\u6700\u4f18\u6027\u80fd\uff0c\u5e76\u63d0\u4f9b\u4e86\u6709\u9650\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u5b9e\u9a8c\u8bc1\u636e\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5b9e\u9645\u95ee\u9898\u4e0a\u5c55\u793a\u4e86\u8fd9\u4e9b\u7406\u8bba\u9884\u6d4b\uff1a\u4f7f\u7528\u53d8\u538b\u5668\u8ba1\u7b97\u77e9\u9635\u7279\u5f81\u503c\u548c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u65b0\u95fb\u6458\u8981\uff0c\u8fd9\u4e24\u79cd\u60c5\u51b5\u4e0b\u6a21\u578b\u5728\u751f\u6210\u6570\u636e\u4e0a\u90fd\u4f1a\u7ecf\u5386\u5d29\u6e83\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u901a\u8fc7\u4ece\u53cd\u9988\u589e\u5f3a\u7684\u5408\u6210\u6570\u636e\u4e2d\u8bad\u7ec3\uff0c\u65e0\u8bba\u662f\u4fee\u526a\u9519\u8bef\u9884\u6d4b\u8fd8\u662f\u9009\u62e9\u6700\u4f73\u731c\u6d4b\uff0c\u90fd\u80fd\u9632\u6b62\u6a21\u578b\u5d29\u6e83\uff0c\u8bc1\u5b9e\u4e86\u50cfRLHF\uff08Reinforcement Learning with Human 
Feedback\uff09\u8fd9\u6837\u7684\u6d41\u884c\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2406.07505": "|**2024-06-11**|**THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report**|KBTG Labs et.al.|[2406.07505](http://arxiv.org/abs/2406.07505)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u5728\u79d1\u6280\u9886\u57df\u5c55\u73b0\u4e86\u65b0\u529f\u80fd\u548c\u673a\u9047\u3002\u7136\u800c\uff0c\u975e\u5e38\u5927\u7684LLMs\u7684\u5b9e\u9645\u5e94\u7528\u53d7\u5230\u5176\u9ad8\u8ba1\u7b97\u6210\u672c\u7684\u5236\u7ea6\uff0c\u8fd9\u4e0e\u5176\u76f8\u5bf9\u6709\u9650\u7684\u4eba\u7c7b\u80fd\u529b\u76f8\u6bd4\uff0c\u6536\u76ca\u5e76\u4e0d\u660e\u663e\u3002\u5c3d\u7ba1\u5c0f\u578b\u3001\u66f4\u5b9e\u7528\u7684LLMs\u5728\u91d1\u878d\u5206\u6790\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5b83\u4eec\u5c1a\u672a\u5b8c\u5168\u638c\u63e1\uff0c\u5982\u5b83\u4eec\u5728\u6a21\u62df\u7279\u8bb8\u91d1\u878d\u5206\u6790\u5e08\uff08CFA\uff09\u8003\u8bd5\u4e2d\u7684\u63a5\u8fd1\u901a\u8fc7\u8868\u73b0\u6240\u793a\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86Financial Analyst Extension\uff08FAE\uff09\u5bf9\u6211\u4eec\u7684Text Hyperlocally Augmented Large Language Extension\uff08THaLLE\uff09\u7cfb\u5217\u7684\u6269\u5c55\uff0c\u8fd9\u4e00\u7cfb\u521780\u4ebf\u53c2\u6570\u7684LLMs\u5728\u6a21\u62dfCFA\u8003\u8bd5\u4e2d\u59cb\u7ec8\u8868\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u4e0e\u540c\u7c7b\u89c4\u6a21\u7684\u6a21\u578b\u76f8\u6bd4\u3002\u6211\u4eec\u8be6\u7ec6\u8bb0\u5f55\u4e86\u7528\u4e8e\u4f18\u5316\u7684\u5fae\u8c03\u6280\u672f\uff0c\u4ee5\u4f9b\u540e\u7eed\u7814\u7a76\u53c2\u8003\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165Flare CFA\uff0c\u8fd9\u662f\u4e00\u4e2a\u516c\u5f00\u53ef\u7528\u7684\u91d1\u878d\u987e\u95ee\u8bc4\u4f30\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u68c0\u9a8cLLMs\u5728\u8d22\u52a1\u987e\u95ee\u89d2\u8272\u4e2d\u7684\u80fd\u529b\u3002|\n", 
"2406.07502": "|**2024-06-11**|**Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions**|Renjie Pi et.al.|[2406.07502](http://arxiv.org/abs/2406.07502)|**[link](https://github.com/sterzhang/image-textualization)**|**\u56fe\u50cf\u63cf\u8ff0\u6570\u636e\u96c6\u5bf9\u4e8e\u63a8\u52a8\u56fe\u50cf\u7406\u89e3\u3001\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u548c\u6587\u672c\u56fe\u50cf\u68c0\u7d22\u7b49\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u5f53\u524d\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u6765\u81ea\u4e24\u4e2a\u9014\u5f84\uff1a\u4e00\u662f\u4ece\u7f51\u7edc\u4e0a\u6293\u53d6\u56fe\u50cf\u4e0e\u6587\u5b57\u5bf9\uff0c\u4f46\u8fd9\u7c7b\u63cf\u8ff0\u5f80\u5f80\u8d28\u91cf\u8f83\u4f4e\u4e14\u5b58\u5728\u566a\u58f0\uff1b\u4e8c\u662f\u4eba\u5de5\u6807\u6ce8\uff0c\u5982COCO\u7b49\uff0c\u901a\u5e38\u63cf\u8ff0\u7b80\u6d01\uff0c\u7f3a\u4e4f\u8be6\u7ec6\u4fe1\u606f\u3002\u5c3d\u7ba1\u8be6\u7ec6\u7684\u56fe\u50cf\u63cf\u8ff0\u53ef\u4ee5\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\u83b7\u5f97\uff0c\u4f46\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\u9650\u5236\u4e86\u5176\u53ef\u884c\u6027\u3002\u8fd9\u4e9b\u5c40\u9650\u6027\u4fc3\u4f7f\u6211\u4eec\u5bfb\u6c42\u66f4\u6709\u6548\u548c\u53ef\u6269\u5c55\u7684\u65b9\u6cd5\u6765\u751f\u6210\u51c6\u786e\u800c\u8be6\u5c3d\u7684\u56fe\u50cf\u63cf\u8ff0\u3002 \u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\uff0c\u79f0\u4e3a\u201c\u56fe\u50cf\u6587\u672c\u5316\u201d\uff08Image Textualization\uff0c\u7b80\u79f0IT\uff09\uff0c\u5b83\u901a\u8fc7\u534f\u540c\u5229\u7528\u73b0\u6709\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language 
Models\uff0cMLLMs\uff09\u548c\u89c6\u89c9\u4e13\u5bb6\u6a21\u578b\uff0c\u6709\u6548\u5730\u5c06\u89c6\u89c9\u4fe1\u606f\u8f6c\u5316\u4e3a\u6587\u672c\uff0c\u4ece\u800c\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf\u63cf\u8ff0\u3002\u9488\u5bf9\u5f53\u524d\u7f3a\u4e4f\u8be6\u5c3d\u63cf\u8ff0\u7684\u57fa\u51c6\u95ee\u9898\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u591a\u4e2a\u8bc4\u4ef7\u57fa\u51c6\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u6211\u4eec\u7684\u6846\u67b6\u751f\u6210\u7684\u56fe\u50cf\u63cf\u8ff0\u8d28\u91cf\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5728IT\u7cbe\u5fc3\u7f16\u7e82\u7684\u63cf\u8ff0\u8bad\u7ec3\u4e0b\uff0cLLaVA-7B\u6a21\u578b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u80fd\u529b\u5f97\u5230\u4e86\u63d0\u5347\uff0c\u80fd\u591f\u751f\u6210\u66f4\u4e30\u5bcc\u7684\u63cf\u8ff0\uff0c\u8f93\u51fa\u957f\u5ea6\u548c\u7ec6\u8282\u663e\u8457\u589e\u52a0\uff0c\u540c\u65f6\u51cf\u5c11\u4e86\u5e7b\u89c9\u73b0\u8c61\u3002**|\n", "2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul 
et.al.|[2406.07496](http://arxiv.org/abs/2406.07496)|**[link](https://github.com/zou-group/textgrad)**|**AI is undergoing a paradigm shift, with breakthroughs achieved by systems that orchestrate large language models (LLMs) and other complex components. Designing principled, automated optimization methods for compound AI systems has therefore become a key new challenge. Neural networks faced a similar problem in their early days, and backpropagation and automatic differentiation transformed the field. Inspired by this, we introduce TextGrad, a powerful framework that performs automatic \"differentiation\" via text, backpropagating the rich, general natural-language suggestions provided by LLMs to the individual components of a compound AI system. TextGrad follows PyTorch's syntax and abstractions and is flexible and easy to use: users only provide the objective function, without tuning framework components or prompts. TextGrad applies to a variety of tasks, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, it improves the zero-shot accuracy of GPT-4o on Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing solutions to hard LeetCode problems, improves reasoning prompts, designs new drug candidate molecules with desirable binding affinity, and designs radiotherapy treatment plans with high specificity. TextGrad lays a foundation for accelerating the development of the next generation of AI systems.**|\n", 
"2406.07494": "|**2024-06-12**|**CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization**|Frederic Kirstein 
et.al.|[2406.07494](http://arxiv.org/abs/2406.07494)|null|This article reviews 1262 unique research papers published between 2019 and 2024, focusing on Transformer-based abstractive summarization of English dialogues. It examines the main challenges in dialogue summarization (language, structure, comprehension, speaker, salience, and factuality) and links them to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies. Although some challenges, such as language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and offer ample room for research. The article also analyzes how these approaches are evaluated, covering commonly used datasets for dialogue subdomains (e.g., meetings, medical), established automatic metrics (e.g., ROUGE), and common human evaluation practices. It finds that cross-domain datasets are relatively scarce and that reported human evaluations often lack sufficient detail on inter-annotator agreement and annotation guidelines. Finally, it discusses the possible implications of recent explorations of large language models, concluding that although they may change the relevance and difficulty of individual challenges, the described challenge taxonomy remains valuable.|\n", 
"2406.07485": "|**2024-06-11**|**PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction**|Adnan Abbas 
et.al.|[2406.07485](http://arxiv.org/abs/2406.07485)|null|Effective planning is crucial for productivity and mental well-being, yet people often struggle to make realistic plans and to reflect on their own productivity. Building on advances in AI, conversational agents have emerged as a promising tool that externalizes plans through conversation, reinforces commitment, and fosters focused action, thereby positively affecting productivity and mental well-being. Our research aims to design a conversational agent that uses the social interactivity of natural conversation to offer insightful questions and reflective prompts that increase plan adherence. Although prior work has shown the benefits of such agents, many interventions remain static, which can cause user engagement to decline over time. To address this limitation, we propose a novel rotation and context-aware prompting strategy that gives users varied interventions every day. Our system, PITCH, uses large language models (LLMs) to facilitate the externalization of, and reflection on, daily plans. This study investigates how externalizing tasks with a conversational agent affects productivity and mental well-being, and how effective the rotation strategy is at maintaining user engagement.|\n", 
"2406.07483": "|**2024-06-11**|**Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing**|Mao Li et.al.|[2406.07483](http://arxiv.org/abs/2406.07483)|null|In the rapidly evolving field of natural language processing, there is strong interest in using large language models (LLMs) for automated text annotation of social media posts. This paper studies the performance of eight open-source and proprietary LLMs on a stance annotation task, benchmarking them against human judgments obtained through crowdsourcing. We also investigate the conditions under which LLMs are likely to disagree with human judgment. We find that how explicitly a text expresses its stance is critical to how closely LLM judgments agree with human ones. LLMs perform well where human annotators do, and LLM failures tend to correspond to cases in which human annotators struggle to reach agreement. We therefore recommend a comprehensive approach that combines the precision of human expertise with the scale of LLM predictions. The study highlights the need to improve the accuracy and comprehensiveness of automated stance detection, with the aim of advancing these technologies for more efficient and unbiased social media analysis.|\n", 
"2406.07476": "|**2024-06-11**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476](http://arxiv.org/abs/2406.07476)|**[link](https://github.com/damo-nlp-sg/videollama2)**|**This paper presents VideoLLaMA 2, a set of video large language models (Video-LLMs) designed to improve spatial-temporal modeling and audio understanding in video- and audio-oriented tasks. Building on its predecessor, it adds a tailor-made Spatial-Temporal Convolution (STC) connector that effectively captures the intricate spatial and temporal dynamics of video data. In addition, we integrate an audio branch through joint training, strengthening the model's multimodal understanding by seamlessly incorporating audio cues. On evaluations including multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC), VideoLLaMA 2 achieves competitive results among open-source models and approaches some proprietary models on several benchmarks. On audio-only question answering (AQA) and audio-video question answering (OE-AVQA), it also shows reasonable improvements over existing models. These advances underline VideoLLaMA 2's strong multimodal comprehension and set a new standard for intelligent video analysis systems. All models are publicly released to facilitate further research.**|\n", 
"2406.08477": "|**2024-06-12**|**Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens**|Ting-Ji Huang et.al.|[2406.08477](http://arxiv.org/abs/2406.08477)|null|In recommender systems, representing users and items as vectors is essential for many tasks. Recent work applies large language models (LLMs) to recommendation in a question-answering format, using in-vocabulary tokens (e.g., 'item', '20', '24') to represent actual users and items. However, because LLMs are typically pretrained on natural-language tasks, these in-vocabulary tokens have limited capacity to express distinct users and items, which weakens recommendation performance even after fine-tuning on recommendation tasks. This paper explores how to tokenize users and items effectively in LLM-based recommender systems. We emphasize the role of out-of-vocabulary (OOV) tokens, used in addition to in-vocabulary ones, in capturing both the correlations among users and items and their diversity. Based on representations learned from historical user-item interactions, we make user/item combinations with similar properties share the same OOV tokens. Integrating these OOV tokens into the LLM's vocabulary also helps distinguish users from items and better capture user-item relationships during fine-tuning on downstream tasks. The proposed framework outperforms existing state-of-the-art methods across various downstream recommendation tasks.|\n", 
"2406.08474": "|**2024-06-12**|**Real2Code: Reconstruct Articulated Objects via Code Generation**|Zhao Mandi 
et.al.|[2406.08474](http://arxiv.org/abs/2406.08474)|null|We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts as oriented bounding boxes, which are fed to a fine-tuned large language model (LLM) that predicts joint articulation as code. By leveraging pretrained vision and language models, our approach scales gracefully to objects with more articulated parts and generalizes from synthetic training data to real-world objects in unstructured environments. Experiments show that Real2Code significantly outperforms the previous state of the art in reconstruction accuracy and is the first approach to extrapolate beyond the structural complexity of objects in the training set, reconstructing objects with up to 10 articulated parts. When combined with a stereo reconstruction model, Real2Code also generalizes to real-world objects from a handful of multi-view RGB images, without requiring depth or camera information.|\n", 
"2406.08464": "|**2024-06-12**|**Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing**|Zhangchen Xu et.al.|[2406.08464](http://arxiv.org/abs/2406.08464)|**[link](https://github.com/magpie-align/magpie)**|High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remains private, which hinders the democratization of AI. Existing open-source data creation methods are limited by high human labor costs and a narrow scope of prompts, so they are hard to scale, which may limit the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method named Magpie. Our key observation is that aligned LLMs such as Llama-3-Instruct, being auto-regressive, can generate a user query when given only the left-side template up to the position reserved for user messages. We use this method to prompt Llama-3-Instruct and generate 4 million instructions together with their responses. We comprehensively analyze the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base on each dataset and evaluate the resulting models. On some tasks, models fine-tuned only on Magpie perform comparably to the official Llama-3-8B-Instruct, which was enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie for SFT alone can surpass public datasets previously used for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. The advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.|\n", 
"2406.08434": "|**2024-06-12**|**TasTe: Teaching Large Language Models to Translate through Self-Reflection**|Yutong Wang et.al.|[2406.08434](http://arxiv.org/abs/2406.08434)|**[link](https://github.com/yutongwang1216/reflectionllmmt)**|**Large language models (LLMs) have shown remarkable performance on natural language processing tasks, and instruction tuning in particular has improved their results on downstream tasks such as machine translation (MT). However, these methods fail to reach the translation quality of supervised neural machine translation (NMT) systems. One likely reason is that the simple prompts currently in use cannot fully exploit the models' instruction-following ability. We therefore propose the TasTe framework, which stands for translating through self-reflection. The framework comprises two inference stages: in the first, the model is guided to produce a preliminary translation while simultaneously assessing it; in the second, the model refines the preliminary translation according to that assessment. Results on four language directions of the WMT22 benchmark demonstrate the effectiveness of our approach compared with existing methods. This work presents a promising way to unlock the potential of LLMs and improve their machine translation performance. The code and data are open-sourced at https://github.com/YutongWang1216/ReflectionLLMMT.**|\n", 
"2406.08426": "|**2024-06-12**|**Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL**|Zijin Hong 
et.al.|[2406.08426](http://arxiv.org/abs/2406.08426)|null|Generating accurate SQL queries from natural-language questions (text-to-SQL) is a long-standing challenge, involving several complex steps: understanding the user question, understanding the database schema, and generating the SQL. Traditional text-to-SQL systems rely on human engineering and deep neural networks. Pretrained language models (PLMs) were later developed and applied to the task, yielding significant performance gains. However, as databases grow more complex and user questions more difficult, the limited comprehension of PLMs can lead to incorrect SQL, which calls for more sophisticated and tailored optimization methods and in turn restricts the wide adoption of PLM-based systems. Recently, large language models (LLMs) have attracted attention for their strong natural language understanding, and LLM-based implementations bring unique opportunities, challenges, and solutions to text-to-SQL research. This survey gives a comprehensive overview of LLM-based text-to-SQL. We first outline the current challenges and the evolution of text-to-SQL, then describe in detail the datasets and metrics used to evaluate text-to-SQL systems, and then systematically analyze recent advances in LLM-based text-to-SQL. Finally, we discuss the remaining challenges in the field and set out expectations for future research directions.|\n", 
"2406.08418": "|**2024-06-12**|**OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text**|Qingyun Li et.al.|[2406.08418](http://arxiv.org/abs/2406.08418)|**[link](https://github.com/opengvlab/omnicorpus)**|**This paper introduces OmniCorpus, a large image-text interleaved dataset at the 10-billion scale. Using an efficient data engine, the authors filter and extract large-scale, high-quality documents containing 8.6 billion images and 1,696 billion text tokens. Compared with counterparts such as MMC4 and OBELICS, OmniCorpus has three advantages: 1) it is 15 times larger while maintaining good data quality; 2) it draws on more diverse sources, including English and non-English websites as well as video-centric websites; and 3) it is more flexible, since the image-text interleaved format is easily converted into a pure text corpus or into image-text pairs. Through comprehensive analysis and experiments, the paper validates the quality, usability, and effectiveness of the dataset, which is intended to provide a solid data foundation for future multimodal model research. Code and data are released at https://github.com/OpenGVLab/OmniCorpus.**|\n", 
"2406.08414": "|**2024-06-12**|**Discovering Preference Optimization Algorithms with and for Large Language Models**|Chris Lu et.al.|[2406.08414](http://arxiv.org/abs/2406.08414)|**[link](https://github.com/luchris429/DiscoPOP)**|**Offline preference optimization is a key method for improving and controlling the quality of large language model (LLM) outputs. Preference optimization is typically treated as an offline supervised learning task using manually crafted convex loss functions. These methods are constrained by human creativity, however, so the large search space of possible loss functions remains under-explored. We address this with LLM-driven objective discovery, which automatically discovers new state-of-the-art preference optimization algorithms without (expert) human intervention. Specifically, we iteratively prompt an LLM to propose and implement new preference optimization loss functions based on previously evaluated performance metrics. This process leads to the discovery of previously unknown and performant preference optimization algorithms. The best of them, which we call Discovered Preference Optimization (DiscoPOP), is a novel algorithm that blends logistic and exponential losses. Experiments show that DiscoPOP achieves state-of-the-art performance and transfers successfully to held-out tasks.**|\n", 
"2406.08413": "|**2024-06-12**|**Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference**|Christopher Wolters et.al.|[2406.08413](http://arxiv.org/abs/2406.08413)|null|Large language models (LLMs) have recently transformed natural language processing, enabling machines to generate human-like text and hold meaningful conversations. Demand for speed, efficiency, and accessibility has grown as compute and memory requirements rise sharply, especially once LLMs exceed the capacity of a single GPU. Meanwhile, advances in compute performance and memory capacity have not kept pace, a problem made worse by the slowdown of Moore's law. Because memory accesses cost far more than computation, scaling runs into what is known as the memory wall. Compute-in-memory (CIM) technologies offer a promising way to accelerate AI inference by performing analog computation directly in memory, potentially reducing latency and power consumption. By tightly integrating memory and compute elements, CIM eliminates the von Neumann bottleneck, reduces data movement, and improves energy efficiency. This survey gives an overview of transformer-based models, reviews various CIM architectures, and examines how they can address the pressing challenges of modern AI computing systems. We discuss transformer-related operators and their hardware acceleration schemes in detail, and highlight challenges, trends, and insights in the corresponding CIM designs.|\n", 
"2406.08402": "|**2024-06-12**|**Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models**|Chun-Yi Kuan et.al.|[2406.08402](http://arxiv.org/abs/2406.08402)|**[link](https://github.com/kuan2jiu99/audio-hallucination)**|**Large audio-language models (LALMs) extend traditional large language models with audio perception, enabling them to handle audio-related tasks. Previous research has focused mainly on evaluating LALM performance across tasks and has paid little attention to their reliability, in particular to issues such as object hallucination. We introduce methods to assess the extent of object hallucination in publicly available LALMs. Our results show that LALMs are comparable to specialized audio captioning models at understanding audio content but struggle with discriminative questions, especially those that require identifying whether a particular object sound is present in an audio clip. This exposes a key weakness of current LALMs: an inadequate understanding of discriminative queries. We also explore how prompt engineering can improve LALM performance on discriminative questions.**|\n", 
"2406.08398": "|**2024-06-12**|**cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers**|Anirudh Sundar et.al.|[2406.08398](http://arxiv.org/abs/2406.08398)|null|Interaction with scientific papers is an important direction within the emerging research area of situated and multimodal interactive conversations (SIMMC). Because scientific papers consist mainly of text, equations, figures, and tables, SIMMC methods must be designed specifically for each of these components to support the depth of inquiry and interaction that researchers need. This work introduces Conversational Papers (cPAPERS), a dataset of question-answer pairs drawn from reviews of academic papers, grounded in these paper components and their associated references, for scientific documents available on arXiv. We describe a data collection strategy that gathers these question-answer pairs from OpenReview and associates them with contextual information from the LaTeX source files. We also present a series of baseline approaches that use large language models (LLMs), in both zero-shot and fine-tuned configurations, to address the cPAPERS dataset.|\n", 
"2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz 
et.al.|[2406.09418](http://arxiv.org/abs/2406.09418)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|**Building on advances in language models, large multimodal models (LMMs) have made significant progress in video understanding. However, current video LMMs rely on either an image encoder or a video encoder to process visual input, and each has its limitations. Image encoders excel at capturing rich spatial detail from frame sequences but lack explicit temporal context, while video encoders provide temporal context but are often constrained by compute, so they process only sparse frames at low resolution, which reduces spatial and contextual understanding. We therefore introduce VideoGPT+, which combines the complementary strengths of an image encoder (for detailed spatial understanding) and a video encoder (for global temporal context modeling). The model splits a video into small segments and applies an adaptive pooling strategy to the features extracted by both encoders. Our architecture performs strongly across multiple video benchmarks, including VCGBench, MVBench, and zero-shot question answering. We also develop a 112K video-instruction set using a novel semi-automatic annotation pipeline, which further improves model performance. To evaluate video LMMs comprehensively, we additionally present VCGBench-Diverse, which covers 18 broad video categories such as lifestyle, sports, science, gaming, and surveillance videos, with 4,354 question-answer pairs. The benchmark assesses how well existing LMMs generalize on dense video captioning, spatial and temporal understanding, and complex reasoning, ensuring comprehensive assessment across diverse video types and dynamics. Code is available at https://github.com/mbzuai-oryx/VideoGPT-plus.**|\n", 
"2406.09412": "|**2024-06-13**|**Explore the Limits of Omni-modal Pretraining at Scale**|Yiyuan Zhang 
et.al.|[2406.09412](http://arxiv.org/abs/2406.09412)|**[link](https://github.com/invictus717/MiCo)**|**We propose to build omni-modal intelligence capable of understanding any modality and learning universal representations. To this end, we propose a scalable pretraining paradigm called Multimodal Context (MiCo), which can scale up the number of modalities, the amount of data, and the number of model parameters together during pretraining. With MiCo, the pretrained models show significant multimodal learning ability on: (i) single-modality perception benchmarks covering 10 different modalities, (ii) 25 cross-modal understanding tasks including retrieval, question answering, and captioning, and (iii) 18 multimodal large language model benchmarks. Our models set 37 new state-of-the-art records. We hope this research contributes to the development of omni-modal intelligence. The code and models have been open-sourced.**|\n", 
"2406.09397": "|**2024-06-13**|**Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang 
et.al.|[2406.09397](http://arxiv.org/abs/2406.09397)|null|Modern vision models are trained on very large, noisy datasets. Although they show strong capabilities, they may fall short of user intent in areas such as visual aesthetics, preferred style, and responsible output. This paper focuses on visual aesthetics and aims to align vision models with human aesthetic standards in retrieval systems. Advanced retrieval systems usually add an aesthetic model based on low-level features (such as saturation) as a re-ranker or filter, but these perform poorly when stylistic, cultural, or knowledge context is involved. We find that the reasoning ability of large language models (LLMs) can compensate for this by rephrasing the search query and expanding the aesthetic expectations. We therefore propose a preference-based reinforcement learning method that fine-tunes the vision model to distill knowledge from both LLM reasoning and aesthetic models, bringing the vision model closer to human aesthetics. Because benchmarks designed for evaluating retrieval systems are lacking, we use large multimodal models (LMMs) to assess aesthetic performance. Since aesthetic assessment is subjective, we also propose a new dataset named HPIR to measure alignment with human aesthetics. Experiments show that our method significantly improves the aesthetic behavior of vision models across several metrics. We believe the proposed algorithm can serve as a general practice for aligning vision models with human values.|\n", 
"2406.09396": "|**2024-06-13**|**Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA**|Jongwoo Park 
et.al.|[2406.09396](http://arxiv.org/abs/2406.09396)|**[link](https://github.com/jongwoopark7978/LVNet)**|Long-form videos span long time intervals, are highly redundant, and contain multiple loosely related events or entities. As a result, in long-form video question answering (LVQA), all the information needed to produce a correct answer can often be found in a small subset of frames. Recent work uses large language models (LLMs) to achieve strong performance on LVQA benchmarks, but relies on vision language models (VLMs) to convert all visual content in the video into natural language. The usual practice is to sample a large number of frames uniformly and caption each one independently, which is both inefficient and redundant. To address this, we explore keyframe selection and sequence-aware captioning to substantially reduce this redundancy. We propose two novel approaches: a hierarchical keyframe selector and a sequential vision language model. Our resulting framework, LVNet, achieves state-of-the-art performance on three LVQA benchmark datasets. We will release our code publicly.|\n", 
"2406.09367": "|**2024-06-13**|**Needle In A Video Haystack: A Scalable Synthetic Framework for 
Benchmarking Video MLLMs**|Zijia Zhao et.al.|[2406.09367](http://arxiv.org/abs/2406.09367)|**[link](https://github.com/joez17/videoniah)**|**Video understanding is a crucial next step for multimodal large language models (MLLMs). To probe specific aspects of video understanding, existing video benchmarks typically require carefully selecting videos that match the target capability and laboriously annotating query-response pairs to fit the video content. This process is both challenging and resource-intensive. This paper proposes VideoNIAH (Video Needle In A Haystack), a benchmark construction framework based on synthetic video generation. VideoNIAH decouples test video content from its query-responses by inserting unrelated image or text 'needles' into original videos. It generates annotations solely from these needles, ensuring diverse video sources and varied query-responses. In addition, by inserting multiple needles, VideoNIAH rigorously evaluates a model's temporal understanding. We use VideoNIAH to build a video benchmark, VNBench, with tasks such as retrieval, ordering, and counting. VNBench can efficiently evaluate a video model's fine-grained understanding and spatio-temporal modeling ability, and it also supports evaluating long-range dependencies. We also evaluate recent video-centric multimodal large language models, both open-source and proprietary, and provide a comprehensive analysis. Although proprietary models have a significant advantage over open-source models, all existing video models still perform poorly on long-range dependency tasks. VideoNIAH is a simple and highly scalable benchmark construction framework, and we believe it will inspire future video benchmark work. Code and data are available at https://github.com/joez17/VideoNIAH.**|\n", 
"2406.09363": "|**2024-06-13**|**ElicitationGPT: Text Elicitation Mechanisms via Language Models**|Yifan Wu et.al.|[2406.09363](http://arxiv.org/abs/2406.09363)|null|This paper studies how to score elicited text predictions against the actual state using domain-knowledge-free queries to a large language model (such as ChatGPT). Such scoring is a key building block for incentivized information elicitation and for training machine learning models. The paper runs experiments on a peer-grading dataset, comparing the automated model scores with manual instructor scores, to empirically evaluate how well these mechanisms align with human preferences.|\n", 
"2406.09345": "|**2024-06-13**|**DiscreteSLU: A Large Language Model with 
Self-Supervised Discrete Speech Units for Spoken Language Understanding**|Suwon Shon et.al.|[2406.09345](http://arxiv.org/abs/2406.09345)|null|Integrating pretrained text-based large language models (LLMs) with speech input has enabled these models to perform diverse speech tasks, including instruction following. This integration requires a speech encoder, a speech adapter, and an LLM, each trained on different tasks. We propose using discrete speech units (DSU) rather than continuous-valued speech encoder outputs, and converting them into the LLM embedding space with the speech adapter. We generate DSU with a self-supervised speech encoder followed by k-means clustering. The proposed model shows robust performance on speech inputs from seen and unseen domains and on instruction following in spoken question answering. We also explore DSU from different layers of the self-supervised speech encoder, as well as Mel-frequency cepstral coefficients (MFCC). Our findings suggest that the ASR task and datasets may matter less for instruction tuning on spoken question answering.|\n", 
"2406.09325": "|**2024-06-13**|**REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space**|Tomer Ashuach 
et.al.|[2406.09325](http://arxiv.org/abs/2406.09325)|null|Large language models (LLMs) may inadvertently memorize and leak sensitive or personally identifiable information (PII) from their training data, raising privacy concerns. Current approaches involve costly data scrubbing, or model filtering through unlearning and model editing, which can be bypassed by extraction attacks. We propose REVS, a novel model editing method for removing sensitive information from LLMs. REVS identifies and modifies a small subset of neurons relevant to each piece of sensitive information. By projecting these neurons to the vocabulary space (unembedding), we pinpoint the components driving its generation. We then compute a model edit based on the pseudo-inverse of the unembedding matrix and apply it to lower the generation probability of the targeted sensitive data. To adequately evaluate our method on truly sensitive information, we curate two datasets: an email dataset inherently memorized by GPT-J, and a synthetic social security number dataset that we tune the model to memorize. Compared with state-of-the-art model editing methods, REVS performs better both at eliminating sensitive information and at resisting extraction attacks, while preserving the integrity of the model. The code and a demo notebook are available.|\n", 
"2406.09324": "|**2024-06-13**|**Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs**|Zhao Xu et.al.|[2406.09324](http://arxiv.org/abs/2406.09324)|**[link](https://github.com/usail-hkust/bag_of_tricks_for_llm_jailbreaking)**|**Although large language models (LLMs) show significant capability in zero-shot task execution, they are susceptible to jailbreak attacks and can be manipulated into producing harmful outputs. Recent work has begun to categorize jailbreak attacks into token-level and prompt-level attacks. However, previous work largely overlooks the diverse key factors of jailbreak attacks: most studies concentrate on LLM vulnerabilities, and defense-enhanced LLMs remain underexplored. To address this, we evaluate the impact of various attack settings on LLM performance and propose a benchmark framework to encourage standardized evaluation. We examine eight key factors in implementing jailbreak attacks on LLMs from both the target-level and attack-level perspectives. We further run seven representative jailbreak attacks against six defense methods on two widely used datasets, about 320 experiments in total, taking roughly 50,000 hours on A800-80G GPUs. The results highlight the need for standardized evaluation of defense-enhanced LLMs. Our code is open source: https://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking.**|\n", 
"2406.09321": "|**2024-06-13**|**JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models**|Delong Ran et.al.|[2406.09321](http://arxiv.org/abs/2406.09321)|**[link](https://github.com/thuccslab/jailbreakeval)**|**This paper examines the evaluation problem in research on jailbreak attacks against large language models (LLMs). There is currently no consensus on how to judge whether a jailbreak attempt succeeded; the various evaluation methods, such as human annotation or prompting GPT-4 in specific ways, each have strengths and weaknesses that affect both their alignment with human values and the research cost. We analyze nearly ninety jailbreak studies released between May 2023 and April 2024, propose a detailed taxonomy of evaluation methods, and discuss the strengths and weaknesses of the different evaluators and their adoption status. To support subsequent research, we develop and release JailbreakEval, a user-friendly toolkit that integrates various well-known evaluators so that users can obtain results with a single command. JailbreakEval also lets users customize their own evaluation workflow within a unified framework, simplifying development and comparison. In summary, we hope JailbreakEval will promote standardized evaluation of jailbreak attacks and act as a catalyst for jailbreak evaluation research within the community.**|\n", 
"2406.10229": "|**2024-06-14**|**Quantifying Variance in Evaluation Benchmarks**|Lovish Madaan et.al.|[2406.10229](http://arxiv.org/abs/2406.10229)|null|Evaluation benchmarks are central to measuring the capabilities of large language models (LLMs) and to driving progress in those capabilities. Originally designed to assess the capabilities (or lack thereof) of pretrained models, they are now also widely used to decide between different training choices. Despite this widespread use, we rarely quantify the variance in evaluation benchmarks, which determines whether differences in performance are meaningful. This paper defines and measures a range of metrics aimed at quantifying benchmark variance, including seed variance across initializations and monotonicity during training. By studying a large number of models, both openly available and trained from scratch, we provide empirical estimates for a variety of variance metrics, along with considerations and recommendations for practitioners. We also evaluate the utility and trade-offs of continuous versus discrete performance measures and explore ways to better understand and reduce variance. We find that for smaller-scale (about 7B-parameter) models, framing choice tasks such as MMLU as completion tasks can often reduce variance, while more involved methods inspired by the human testing literature (such as item analysis and item response theory) do little to meaningfully reduce variance. Overall, our work sheds light on the variance characteristics of evaluation benchmarks, suggests LLM-specific techniques to reduce variance, and more generally encourages practitioners to account for variance carefully when comparing models.|\n", 
"2406.10218": "|**2024-06-14**|**Semantic Membership Inference Attack against Large Language Models**|Hamid Mozaffari et.al.|[2406.10218](http://arxiv.org/abs/2406.10218)|null|Membership inference attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. This paper introduces the Semantic Membership Inference Attack (SMIA), a novel approach that improves MIA performance by exploiting the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, capturing the differences in output probability distributions between members and non-members. We evaluate it comprehensively on the Pythia and GPT-Neo model families using the Wikipedia dataset. The results show that SMIA clearly outperforms existing attacks; for example, it reaches an AUC-ROC of 67.39% on Pythia-12B, compared with 58.90% for the second-best attack.|\n", 
"2406.10216": "|**2024-06-14**|**Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs**|Rui Yang 
et.al.|[2406.10216](http://arxiv.org/abs/2406.10216)|**[link](https://github.com/yangrui2015/generalizable-reward-model)**|In the reinforcement learning from human feedback (RLHF) framework, reward models trained on human preference data have proven effective at aligning large language models (LLMs) with human intent. However, current reward models generalize poorly to unseen prompts and responses, which can lead to reward over-optimization, where excessive optimization of the reward degrades actual performance. While previous work has favored constraining policy optimization, we propose a new approach that improves the reward model's generalization under distribution shift by regularizing the hidden states. Specifically, we retain the base model's language model head and add a suite of text-generation losses to preserve the hidden states' text generation capability, while learning a reward head on top of the same hidden states. Experiments show that the introduced regularization significantly improves reward model accuracy across a variety of generalization tasks and effectively alleviates over-optimization in RLHF, offering a more reliable and robust preference learning paradigm.|\n", 
"2406.10209": "|**2024-06-14**|**Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs**|Abhimanyu Hans et.al.|[2406.10209](http://arxiv.org/abs/2406.10209)|**[link](https://github.com/ahans30/goldfish-loss)**|**Large language models can memorize and repeat their training data, which raises privacy and copyright concerns. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens is excluded from the loss computation. The model does not memorize these dropped tokens, which prevents verbatim reproduction of complete training sequences. We run extensive experiments on billion-scale Llama-2 models, both pretrained and trained from scratch, and show that our method significantly reduces extractable memorization with little to no impact on downstream benchmarks.**|\n", 
"2406.10196": "|**2024-06-14**|**TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners**|Tomas de la Rosa et.al.|[2406.10196](http://arxiv.org/abs/2406.10196)|null|Travel planning is a complex task that involves generating a sequence of actions for visiting places, subject to constraints, while maximizing the user's satisfaction. Traditional approaches formulate the problem in a formal language, extract relevant travel information from web sources, and use a suitable solver to generate a valid solution. In contrast, recent approaches based on large language models (LLMs) output plans directly from user requests, using their extensive travel domain knowledge to provide high-level information such as points of interest and possible routes. However, current state-of-the-art models often produce plans that are incoherent or fail to fully satisfy the constraints, and they cannot guarantee high-quality solutions. We propose TRIP-PAL, a hybrid method that combines LLMs and automated planners: (1) the LLM retrieves travel information and user requirements and translates them into data structures that can be fed to a planner; (2) the automated planner generates travel plans that satisfy the constraints and optimize the user's utility. Experiments across various travel scenarios show that TRIP-PAL outperforms LLM-only approaches at generating travel plans.|\n", 
"2406.10185": "|**2024-06-14**|**Detecting and Evaluating Medical Hallucinations in Large Vision Language Models**|Jiawei Chen et.al.|[2406.10185](http://arxiv.org/abs/2406.10185)|null|Large vision language models (LVLMs) are increasingly used in healthcare applications such as medical visual question answering and report generation. They inherit strong capabilities from their underlying large language models (LLMs), but also a worrying susceptibility to hallucination, which matters all the more in medical settings where the tolerance for error is very low. Yet there are currently no hallucination detection and evaluation methods or benchmarks dedicated to the medical domain. To fill this gap, we introduce Med-HallMark, the first benchmark designed specifically for hallucination detection and evaluation in the medical multimodal domain. Med-HallMark supports multi-task hallucination detection, provides multifaceted hallucination data, and uses a hierarchical hallucination categorization. In addition, we propose the MediHall Score, a new medical evaluation metric that assesses LVLM hallucinations through a hierarchical scoring system that accounts for their severity and type, enabling a fine-grained assessment of potential clinical impact. We also present MediHallDetector, a medical LVLM designed for precise hallucination detection and trained with a multi-task approach. Through extensive experiments, we establish baselines for popular LVLMs on our benchmark. The results show that the MediHall Score gives a more nuanced understanding of hallucination impact than traditional metrics and demonstrate the improved performance of MediHallDetector. We hope this work will significantly improve the reliability of LVLMs in medical applications. All related resources will be released soon.|\n", 
"2406.10181": "|**2024-06-14**|**Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors**|Siyuan Chen 
et.al.|[2406.10181](http://arxiv.org/abs/2406.10181)|null|\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u7531\u4e8e\u5185\u5b58\u9700\u6c42\u901a\u5e38\u8d85\u8fc7\u5355\u4e2aGPU\u7684\u5bb9\u91cf\uff0c\u89e3\u51b3\u8fd9\u4e00\u5185\u5b58\u6311\u6218\u7684\u4e00\u4e2a\u5e38\u89c1\u65b9\u6cd5\u662f\u5c06\u8ba1\u7b97\u548c\u6570\u636e\u4eceGPU\u8fc1\u79fb\u5230CPU\u3002\u7136\u800c\uff0c\u8fd9\u53d7\u5230\u666e\u901a\u786c\u4ef6\u5e26\u5bbd\u9650\u5236\u7684\u5236\u7ea6\uff0c\u5f71\u54cd\u4e86CPU\u4e0eGPU\u4e4b\u95f4\u7684\u901a\u4fe1\u6548\u7387\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLSP_Offload\u7684\u6846\u67b6\uff0c\u901a\u8fc7\u5b66\u4e60\u5f0f\u7684\u5b50\u7a7a\u95f4\u6295\u5f71\u5668\uff0c\u5b9e\u73b0\u5728 commodity \u786c\u4ef6\u4e0a\u63a5\u8fd1\u539f\u751f\u901f\u5ea6\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u3002\u6211\u4eec\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6d89\u53ca\u5b66\u4e60\u4e00\u4e2a\u9ad8\u6548\u7684\u7a00\u758f\u538b\u7f29\u5668\uff0c\u4ee5\u6700\u5c0f\u5316\u901a\u4fe1\u5e76\u4fdd\u6301\u6700\u5c0f\u7cbe\u5ea6\u635f\u5931\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5c42\u7ea7\u901a\u4fe1\u8c03\u5ea6\u7b56\u7565\uff0c\u4ee5\u6700\u5927\u5316\u901a\u4fe1\u4e0e\u8ba1\u7b97\u4e4b\u95f4\u7684\u5e76\u884c\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u57284GB\u7b14\u8bb0\u672cGPU\u4e0a\u5fae\u8c0313\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u5728\u914d\u590724GB\u5185\u5b58\u7684NVIDIA RTX 4090 
GPU\u4e0a\u5fae\u8c0370\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u4ec5\u6bd4\u65e0\u5185\u5b58\u9650\u5236\u7684\u5fae\u8c03\u616231%\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u79bb\u7ebf\u6846\u67b6\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e86\u5fae\u8c03\u541e\u5410\u91cf\uff0c\u6700\u9ad8\u53ef\u8fbe3.33\u500d\uff0c\u5f53\u8fbe\u5230\u76f8\u540c\u51c6\u786e\u5ea6\u65f6\uff0c\u51cf\u5c11\u4e86\u7aef\u5230\u7aef\u5fae\u8c03\u65f6\u95f4\u768433.1%\u81f362.5%\u3002|\n", "2406.10172": "|**2024-06-14**|**Datasets for Multilingual Answer Sentence Selection**|Matteo Gabburo et.al.|[2406.10172](http://arxiv.org/abs/2406.10172)|null|**\u6458\u8981\uff1a** \u5728\u8bbe\u8ba1\u9ad8\u6548\u7684\u68c0\u7d22\u5f0f\u95ee\u7b54\uff08Question Answering\uff0cQA\uff09\u7cfb\u7edf\u4e2d\uff0c\u7b54\u6848\u53e5\u5b50\u9009\u62e9\uff08Answer Sentence Selection\uff0cAS2\uff09\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u6807\u6ce8\u6570\u636e\uff0c\u5927\u591a\u6570AS2\u9886\u57df\u7684\u8fdb\u5c55\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\u3002\u8fd9\u5bfc\u81f4\u4e86\u975e\u82f1\u8bed\u73af\u5883\u4e0bQA\u7cfb\u7edf\u7684\u6027\u80fd\u4e0e\u82f1\u8bed\u7cfb\u7edf\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u672c\u8bba\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u65b0\u7684\u9ad8\u8d28\u91cf\u591a\u8bed\u8a00\uff08\u6cd5\u8bed\u3001\u5fb7\u8bed\u3001\u610f\u5927\u5229\u8bed\u3001\u8461\u8404\u7259\u8bed\u548c\u897f\u73ed\u7259\u8bed\uff09AS2\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u5bf9\u73b0\u6709\u7684\u82f1\u6587AS2\u6570\u636e\u96c6\uff08\u5982ASNQ\u3001WikiQA\u548cTREC-QA\uff09\u8fdb\u884c\u76d1\u7763\u81ea\u52a8\u673a\u5668\u7ffb\u8bd1\uff08Automatic Machine 
Translation\uff0cAMT\uff09\u3002\u6211\u4eec\u901a\u8fc7\u591a\u79cd\u5b9e\u9a8c\u548c\u4e0d\u540cTransformer\u67b6\u6784\u7684\u8bc4\u4f30\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u53ca\u7ffb\u8bd1\u6570\u636e\u96c6\u7684\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u5bf9\u4e8e\u6784\u5efa\u5065\u58ee\u7684\u591a\u8bed\u8a00AS2\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u663e\u8457\u7f29\u5c0f\u4e86\u975e\u82f1\u8bed\u4e0e\u82f1\u8bed\u73af\u5883\u4e0b\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2406.10162": "|**2024-06-14**|**Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models**|Carson Denison et.al.|[2406.10162](http://arxiv.org/abs/2406.10162)|**[link](https://github.com/anthropics/sycophancy-to-subterfuge-paper)**|**\u5728\u5f3a\u5316\u5b66\u4e60\u4e2d\uff0c\u5f53\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5b66\u4f1a\u56e0\u8bad\u7ec3\u76ee\u6807\u4e0d\u660e\u786e\u800c\u83b7\u5f97\u4e0d\u671f\u671b\u7684\u884c\u4e3a\u65f6\uff0c\u5c31\u4f1a\u51fa\u73b0\u89c4\u683c\u6e38\u620f\u73b0\u8c61\u3002\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u4ece\u7b80\u5355\u7684\u5949\u627f\u884c\u4e3a\u53d1\u5c55\u5230\u66f4\u590d\u6742\u4e14\u5371\u9669\u7684\u5956\u52b1\u7be1\u6539\uff0c\u5373\u6a21\u578b\u76f4\u63a5\u4fee\u6539\u5176\u81ea\u8eab\u7684\u5956\u52b1\u673a\u5236\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8fd9\u4e9b\u590d\u6742\u884c\u4e3a\u53ef\u80fd\u8d85\u51fa\u63a2\u7d22\u7684\u8303\u7574\u3002\u672c\u8bba\u6587\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u4f1a\u5728\u5b66\u4e60\u5e38\u89c1\u89c4\u683c\u6e38\u620f\u7b56\u7565\u540e\uff0c\u6cdb\u5316\u5230\u6267\u884c\u66f4\u4e3a\u7f55\u89c1\u548c\u660e\u663e\u7684\u884c\u4e3a\uff0c\u5305\u62ec\u5956\u52b1\u7be1\u6539\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9010\u6b65\u5347\u7ea7\u7684\u53ef\u6e38\u620f\u73af\u5883\u7cfb\u5217\uff0c\u5e76\u53d1\u73b0\u9488\u5bf9\u65e9\u671f\u9636\u6bb5\u73af\u5883\u7684\u8b
ad\u7ec3\u4f1a\u5bfc\u81f4\u5728\u540e\u7eed\u73af\u5883\u4e2d\u51fa\u73b0\u66f4\u591a\u7684\u89c4\u683c\u6e38\u620f\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u4e00\u5c0f\u90e8\u5206\u4f46\u975e\u96f6\u7684LLMs\uff0c\u5728\u7ecf\u5386\u4e86\u5b8c\u6574\u8bad\u7ec3\u8bfe\u7a0b\u540e\uff0c\u80fd\u591f\u96f6\u6837\u672c\u5730\u76f4\u63a5\u4fee\u6539\u5176\u5956\u52b1\u51fd\u6570\u3002\u91cd\u65b0\u8bad\u7ec3LLMs\u4ee5\u907f\u514d\u65e9\u671f\u9636\u6bb5\u7684\u6e38\u620f\u884c\u4e3a\u53ef\u4ee5\u51cf\u8f7b\u4f46\u4e0d\u80fd\u5b8c\u5168\u6d88\u9664\u540e\u671f\u73af\u5883\u4e2d\u7684\u5956\u52b1\u7be1\u6539\u3002\u6b64\u5916\uff0c\u5bf9\u53ef\u6e38\u620f\u73af\u5883\u8fdb\u884c\u65e0\u5bb3\u6027\u8bad\u7ec3\u5e76\u4e0d\u80fd\u963b\u6b62\u5956\u52b1\u7be1\u6539\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u4ece\u5e38\u89c1\u7684\u89c4\u683c\u6e38\u620f\u7b56\u7565\u4e2d\u6cdb\u5316\u5230\u66f4\u6076\u52a3\u7684\u5956\u52b1\u7be1\u6539\u884c\u4e3a\uff0c\u5e76\u4e14\u8981\u6d88\u9664\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u5e76\u975e\u6613\u4e8b\u3002**|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov 
et.al.|[2406.10149](http://arxiv.org/abs/2406.10149)|**[link](https://github.com/booydar/babilong)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8f93\u5165\u4e0a\u4e0b\u6587\u957f\u5ea6\u663e\u8457\u589e\u52a0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u672a\u80fd\u5145\u5206\u8861\u91cf\u6a21\u578b\u5904\u7406\u957f\u7bc7\u6587\u672c\u4e2d\u7684\u4e8b\u5b9e\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86BABILong\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u6d4b\u8bd5\u6a21\u578b\u5728\u5206\u5e03\u5f0f\u957f\u6587\u6863\u4e2d\u8de8\u4e8b\u5b9e\u63a8\u7406\u7684\u80fd\u529b\u3002BABILong\u5305\u62ec20\u4e2a\u591a\u6837\u5316\u7684\u63a8\u7406\u4efb\u52a1\uff0c\u5982\u4e8b\u5b9e\u94fe\u3001\u7b80\u5355\u5f52\u7eb3\u3001\u6f14\u7ece\u3001\u8ba1\u6570\u4ee5\u53ca\u5904\u7406\u5217\u8868/\u96c6\u5408\u7b49\u3002\u8fd9\u4e9b\u4efb\u52a1\u672c\u8eab\u5c31\u5177\u6709\u6311\u6218\u6027\uff0c\u800c\u5f53\u6240\u9700\u4e8b\u5b9e\u5206\u6563\u5728\u957f\u7bc7\u81ea\u7136\u6587\u672c\u4e2d\u65f6\uff0c\u96be\u5ea6\u8fdb\u4e00\u6b65\u63d0\u5347\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u6d41\u884c\u7684LLMs\u5b9e\u9645\u4e0a\u53ea\u5229\u7528\u4e8610%-20%\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4e14\u968f\u7740\u63a8\u7406\u590d\u6742\u6027\u7684\u63d0\u9ad8\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u5bf9\u4e8e\u66ff\u4ee3\u7684\u4e0a\u4e0b\u6587\u63a8\u7406\u65b9\u6cd5\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\u5728\u5355\u4e8b\u5b9e\u95ee\u9898\u56de\u7b54\u4e0a\u7684\u51c6\u786e\u7387\u4ec5\u4e3a60%\uff0c\u4e0e\u4e0a\u4e0b\u6587\u957f\u5ea6\u65e0\u5173\u3002\u5728\u4e0a\u4e0b\u6587\u6269\u5c55\u65b9\u6cd5\u4e2d\uff0c\u5faa\u73af\u8bb0\u5fc6Transformer\u5c55\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u53ef\u5904\u7406\u957f\u8fbe1100\u4e07\u4e2a\u4ee4\u724c\u7684\u957f\u5ea6\u3002BABILong\u57fa\u51c6\u6d4b\u8bd5\u53ef\u4ee5\u6269\u5c55\u5230\u4efb\u610f\u9
57f\u5ea6\uff0c\u4ee5\u652f\u6301\u8bc4\u4f30\u5177\u6709\u66f4\u5f3a\u80fd\u529b\u7684\u65b0\u6a21\u578b\uff0c\u5e76\u63d0\u4f9b\u4e86\u957f\u8fbe100\u4e07\u4ee4\u724c\u7684\u5206\u9694\u3002|\n", "2406.11840": "|**2024-06-17**|**LLaNA: Large Language and NeRF Assistant**|Andrea Amaduzzi et.al.|[2406.11840](http://arxiv.org/abs/2406.11840)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u56fe\u50cf\u548c3D\u6570\u636e\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5168\u9762\u6355\u6349\u7269\u4f53\u7684\u5916\u89c2\u548c\u51e0\u4f55\u7279\u6027\u4e0a\u5b58\u5728\u5c40\u9650\u3002\u8fd1\u671f\uff0c\u795e\u7ecf\u8f90\u5c04\u573a\uff08Neural Radiance Fields\uff0c\u7b80\u79f0NeRF\uff09\u4f5c\u4e3a\u4e00\u79cd\u65b0\u5174\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u901a\u8fc7\u4e00\u4e2a\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u5668\uff08Multi-Layer Perceptron\uff0cMLP\uff09\u7684\u6743\u91cd\u7f16\u7801\u4e86\u7269\u4f53\u7684\u51e0\u4f55\u7ed3\u6784\u548c\u9ad8\u5ea6\u903c\u771f\u7684\u5916\u89c2\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5c06NeRF\u6574\u5408\u5230MLLM\u4e2d\u7684\u53ef\u884c\u6027\u548c\u6548\u679c\u3002\u6211\u4eec\u5f00\u53d1\u4e86LLaNA\uff0c\u8fd9\u662f\u9996\u4e2a\u901a\u7528\u7684NeRF-\u8bed\u8a00\u52a9\u624b\uff0c\u80fd\u591f\u6267\u884c\u65b0\u4efb\u52a1\uff0c\u5982NeRF\u63cf\u8ff0\u548c\u95ee\u7b54\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u76f4\u63a5\u5904\u7406NeRF 
MLP\u7684\u6743\u91cd\uff0c\u65e0\u9700\u6e32\u67d3\u56fe\u50cf\u6216\u6784\u5efa3D\u6570\u636e\u7ed3\u6784\uff0c\u5c31\u80fd\u63d0\u53d6\u6709\u5173\u4ee3\u8868\u5bf9\u8c61\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65e0\u987b\u4eba\u5de5\u5e72\u9884\u7684NeRF\u6587\u672c\u6807\u6ce8\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5404\u79cdNeRF-\u8bed\u8a00\u4efb\u52a1\uff0c\u5e76\u636e\u6b64\u5efa\u7acb\u4e86\u4e00\u4e2a\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u6211\u4eec\u7684\u6a21\u578b\u5bf9NeRF\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5904\u7406NeRF\u6743\u91cd\u7684\u65b9\u6cd5\u5728\u4e0e\u4eceNeRF\u4e2d\u63d0\u53d62D\u62163D\u8868\u793a\u8fdb\u884c\u6bd4\u8f83\u65f6\u8868\u73b0\u66f4\u4f18\u3002|\n", "2406.11839": "|**2024-06-17**|**mDPO: Conditional Preference Optimization for Multimodal Large Language Models**|Fei Wang et.al.|[2406.11839](http://arxiv.org/abs/2406.11839)|null|### \u80cc\u666f \u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5df2\u88ab\u8bc1\u660e\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6821\u51c6\u7684\u6709\u6548\u624b\u6bb5\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06DPO\u5e94\u7528\u4e8e\u591a\u6a21\u6001\u573a\u666f\uff0c\u4f46\u53d1\u73b0\u5b9e\u73b0\u6301\u7eed\u6539\u8fdb\u9887\u5177\u6311\u6218\u3002\u901a\u8fc7\u5bf9\u6bd4\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5ffd\u89c6\u4e86\u56fe\u50cf\u6761\u4ef6\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86mDPO\uff0c\u4e00\u4e2a\u65e8\u5728\u9632\u6b62\u8bed\u8a00\u504f\u597d\u8fc7\u5ea6\u4f18\u5148\u7684\u591a\u6a21\u6001DPO\u76ee\u6807\uff0c\u540c\u65f6\u4f18\u5316\u56fe\u50cf\u504f\u597d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5956\u52b1\u951a\u70b9\uff0c\u786e\u4fdd\u9009\u62e9\u7684\u54cd\u5e94\u5956\u52b1\u4f
dd\u6301\u6b63\u5411\uff0c\u4ece\u800c\u907f\u514d\u76f8\u5bf9\u504f\u597d\u4f18\u5316\u56fa\u6709\u7684\u53ef\u80fd\u6027\u964d\u4f4e\u95ee\u9898\u3002 ### \u4efb\u52a1 \u6211\u4eec\u5728\u4e24\u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u591a\u6a21\u6001LLM\u4ee5\u53ca\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0cmDPO\u6709\u6548\u89e3\u51b3\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5e76\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002|\n", "2406.11832": "|**2024-06-17**|**Unveiling Encoder-Free Vision-Language Models**|Haiwen Diao et.al.|[2406.11832](http://arxiv.org/abs/2406.11832)|**[link](https://github.com/baaivision/eve)**|**\u5f53\u524d\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u4e3b\u8981\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7f16\u7801\u5668\u6765\u63d0\u53d6\u89c6\u89c9\u7279\u5f81\uff0c\u7136\u540e\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u5728\u62bd\u8c61\u89c6\u89c9\u8868\u793a\u65b9\u9762\u8bbe\u5b9a\u4e86\u5f3a\u70c8\u7684\u5148\u9a8c\uff0c\u5982\u5206\u8fa8\u7387\u3001\u6bd4\u4f8b\u548c\u8bed\u4e49\u503e\u5411\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86VLM\u7684\u7075\u6d3b\u6027\u548c\u6548\u7387\u3002\u76f4\u63a5\u8bad\u7ec3\u65e0\u7f16\u7801\u5668\u7684\u7eafVLM\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u4e14\u9c9c\u6709\u63a2\u7d22\u3002\u5b9e\u8bc1\u7814\u7a76\u663e\u793a\uff0c\u8fd9\u79cd\u76f4\u63a5\u8bad\u7ec3\u65b9\u6cd5\u4f1a\u5bfc\u81f4\u6536\u655b\u7f13\u6162\u548c\u6027\u80fd\u5dee\u8ddd\u8f83\u5927\u3002\u672c\u6587\u65e8\u5728\u5f25\u5408\u7f16\u7801\u5668\u4f9d\u8d56\u578b\u548c\u65e0\u7f16\u7801\u5668\u6a21\u578b\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u
6548\u7684\u7eafVLM\u8bad\u7ec3\u7b56\u7565\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u6df1\u5165\u5b9e\u9a8c\u63ed\u793a\u4e86\u9ad8\u6548\u8bad\u7ec3\u65e0\u7f16\u7801\u5668VLM\u7684\u5173\u952e\u8981\u7d20\uff1a\uff081\uff09\u5728\u7edf\u4e00\u7684\u89e3\u7801\u5668\u5185\u878d\u5408\u89c6\u89c9\u4e0e\u8bed\u8a00\u8868\u793a\uff1b\uff082\uff09\u901a\u8fc7\u989d\u5916\u76d1\u7763\u63d0\u5347\u89c6\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e9b\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86EVE\uff0c\u4e00\u4e2a\u65e0\u7f16\u7801\u5668\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u65e2\u80fd\u9ad8\u6548\u8bad\u7ec3\u4e5f\u80fd\u5feb\u901f\u63a8\u7406\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4ec5\u4f7f\u75283500\u4e07\u516c\u5f00\u53ef\u7528\u7684\u6570\u636e\uff0cEVE\u5c31\u80fd\u5728\u591a\u4e2a\u89c6\u89c9\u8bed\u8a00\u57fa\u51c6\u4e0a\u4e0e\u7c7b\u4f3c\u5bb9\u91cf\u7684\u7f16\u7801\u5668\u4f9d\u8d56\u578bVLM\u5339\u654c\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u795e\u79d8\u3001\u6570\u636e\u672a\u516c\u5f00\u7684Fuyu-8B\u6a21\u578b\u3002\u6211\u4eec\u76f8\u4fe1\uff0cEVE\u4e3a\u8de8\u6a21\u6001\u5f00\u53d1\u7eaf\u7cb9\u7684\u89e3\u7801\u5668\u67b6\u6784\u63d0\u4f9b\u4e86\u4e00\u4e2a\u900f\u660e\u4e14\u9ad8\u6548\u7684\u8def\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/baaivision/EVE\u3002**|\n", "2406.11831": "|**2024-06-17**|**Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models**|Bingqi Ma 
et.al.|[2406.11831](http://arxiv.org/abs/2406.11831)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u57fa\u4e8e\u89e3\u7801\u5668-only\u53d8\u538b\u5668\u5728\u6587\u672c\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5982\u4f55\u5c06\u8fd9\u4e9b\u5148\u8fdb\u7684LLMs\u5e94\u7528\u4e8e\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\u4ecd\u662f\u4e00\u4e2a\u5f85\u63a2\u7d22\u7684\u95ee\u9898\u3002\u6211\u4eec\u53d1\u73b0\u76f4\u63a5\u4f7f\u7528LLM\u4f5c\u4e3a\u63d0\u793a\u7f16\u7801\u5668\u4f1a\u663e\u8457\u964d\u4f4e\u751f\u6210\u56fe\u50cf\u65f6\u7684\u63d0\u793a\u8ddf\u968f\u80fd\u529b\u3002\u4e3b\u8981\u5b58\u5728\u4e24\u4e2a\u95ee\u9898\uff1a\u4e00\u662fLLM\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u8bad\u7ec3\u4e0e\u6269\u6563\u6a21\u578b\u5bf9\u533a\u5206\u6027\u63d0\u793a\u7279\u5f81\u7684\u9700\u6c42\u4e0d\u5339\u914d\uff1b\u4e8c\u662f\u89e3\u7801\u5668\u67b6\u6784\u56fa\u6709\u7684\u4f4d\u7f6e\u504f\u89c1\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u6846\u67b6\uff0c\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4f7f\u7528\u6307\u5357\uff0c\u589e\u5f3aLLM\u7684\u6587\u672c\u8868\u793a\u80fd\u529b\uff0c\u6d88\u9664\u5176\u5185\u5728\u7684\u5b9a\u4f4d\u504f\u89c1\uff0c\u4ece\u800c\u7075\u6d3b\u5730\u5c06\u6700\u5148\u8fdb\u7684LLMs\u878d\u5165\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u878d\u5408\u591a\u4e2aLLMs\u7684\u65b9\u6cd5\u3002\u9274\u4e8eTransformer\u67b6\u6784\u7684\u5353\u8d8a\u6027\u80fd\u548c\u6269\u5c55\u80fd\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8bbe\u8ba1\u4e86\u57fa\u4e8e\u8be5\u6846\u67b6\u7684LLM-Infused Diffusion 
Transformer\uff08LI-DiT\uff09\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86LI-DiT\u5728\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u6570\u636e\u91cf\u4e0b\u7684\u6027\u80fd\u3002\u5f97\u76ca\u4e8eLLMs\u7684\u5185\u5728\u80fd\u529b\u53ca\u6211\u4eec\u7684\u521b\u65b0\u8bbe\u8ba1\uff0cLI-DiT\u7684\u63d0\u793a\u7406\u89e3\u6027\u80fd\u8f7b\u677e\u8d85\u8d8a\u5f00\u6e90\u7684\u6700\u65b0\u6a21\u578b\uff0c\u4ee5\u53ca\u5305\u62ecStable Diffusion 3\u3001DALL-E 3\u548cMidjourney V6\u5728\u5185\u7684\u4e3b\u6d41\u95ed\u6e90\u5546\u4e1a\u6a21\u578b\u3002\u5f3a\u5927\u7684LI-DiT-10B\u5c06\u5728\u8fdb\u4e00\u6b65\u4f18\u5316\u548c\u5b89\u5168\u68c0\u67e5\u540e\u63d0\u4f9b\u3002|\n", "2406.11827": "|**2024-06-17**|**WPO: Enhancing RLHF with Weighted Preference Optimization**|Wenxuan Zhou et.al.|[2406.11827](http://arxiv.org/abs/2406.11827)|**[link](https://github.com/wzhouad/wpo)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u66f4\u597d\u5730\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6709\u524d\u666f\u65b9\u6cd5\u3002\u7531\u4e8e\u6210\u672c\u6548\u76ca\u548c\u53ef\u6269\u5c55\u6027\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u2014\u2014\u901a\u8fc7\u5176\u4ed6\u6a21\u578b\u83b7\u53d6\u504f\u597d\u6570\u636e\u2014\u2014\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u7136\u800c\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u5e38\u53d7\u91c7\u6837\u7b56\u7565\u4e0e\u76ee\u6807\u7b56\u7565\u4e4b\u95f4\u5206\u5e03\u5dee\u5f02\u7684\u5f71\u54cd\uff0c\u5bfc\u81f4\u4f18\u5316\u6548\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7b56\u7565\u2014\u2014\u52a0\u6743\u504f\u597d\u4f18\u5316\uff08WPO\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u8c03\u6574\u504f\u597d\u8bc4\u5206\u5bf9\uff0c\u4f7f\u79bb\u7ebf\u6570\u636e\u66f4\u63a5\u8fd1\u4e8e\u5f53\u524d\u7b56\u7565\uff0c\u4ece\u800c\u7f13\u89e3\u8fd9\u4e0
0\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5206\u5e03\u5dee\u8ddd\u96be\u9898\uff0c\u8fd8\u63d0\u5347\u4e86\u4f18\u5316\u8fc7\u7a0b\uff0c\u65e0\u9700\u989d\u5916\u6210\u672c\u3002 \u6211\u4eec\u5728Alpaca Eval 2\u548cMT-bench\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u3002WPO\u5728Alpaca Eval 2\u4e0a\u7684\u6027\u80fd\u6bd4\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u63d0\u9ad8\u4e865.6%\u3002\u57fa\u4e8eLlama-3-8B-Instruct\uff0cWPO\u751a\u81f3\u5efa\u7acb\u4e86\u663e\u8457\u7684\u957f\u5ea6\u63a7\u5236\u80dc\u7387\uff0c\u8fbe\u523048.6%\uff0c\u572880\u4ebf\u53c2\u6570\u6a21\u578b\u6392\u884c\u699c\u4e0a\u6210\u4e3a\u6700\u5f3a\u52b2\u7684\u6a21\u578b\u3002\u6211\u4eec\u5c06\u5728\u4e0a\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002**|\n", "2406.11818": "|**2024-06-17**|**Embodied Instruction Following in Unknown Environments**|Zhenyu Wu et.al.|[2406.11818](http://arxiv.org/abs/2406.11818)|null|\u5728\u81ea\u4e3b\u5bb6\u5ead\u670d\u52a1\u7cfb\u7edf\u4e2d\uff0c\u4f7f\u5b9e\u4f53\u4ee3\u7406\u80fd\u6839\u636e\u81ea\u7136\u8bed\u8a00\u5b8c\u6210\u590d\u6742\u7684\u4eba\u7c7b\u6307\u4ee4\u81f3\u5173\u91cd\u8981\u3002\u4f20\u7edf\u65b9\u6cd5\u4ec5\u80fd\u5728\u6240\u6709\u4e92\u52a8\u5bf9\u8c61\u90fd\u63d0\u4f9b\u7ed9\u4ee3\u7406\u7684\u5df2\u77e5\u73af\u5883\u4e2d\u6267\u884c\u6307\u4ee4\uff0c\u76f4\u63a5\u5c06\u73b0\u6709\u65b9\u6cd5\u5e94\u7528\u4e8e\u672a\u77e5\u73af\u5883\u901a\u5e38\u4f1a\u4ea7\u751f\u64cd\u4f5c\u4e0d\u5b58\u5728\u7269\u4f53\u7684\u4e0d\u53ef\u884c\u8ba1\u5212\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u672a\u77e5\u73af\u5883\u7684\u590d\u6742\u4efb\u52a1\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\uff08Embodied Instruction 
Following\uff0cEIF\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u4f7f\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u63a2\u7d22\u73af\u5883\uff0c\u5229\u7528\u73b0\u6709\u7269\u4f53\u751f\u6210\u53ef\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u8fbe\u6210\u62bd\u8c61\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u62ec\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\u5668\u548c\u4f4e\u5c42\u63a2\u7d22\u63a7\u5236\u5668\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5c42\u6b21\u5316\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u52a8\u6001\u533a\u57df\u6ce8\u610f\u529b\u6784\u5efa\u573a\u666f\u7684\u8bed\u4e49\u8868\u793a\u5730\u56fe\uff0c\u4ee5\u5c55\u793a\u5df2\u77e5\u7684\u89c6\u89c9\u7ebf\u7d22\uff0c\u4f7f\u4efb\u52a1\u89c4\u5212\u548c\u573a\u666f\u63a2\u7d22\u4e0e\u4eba\u7c7b\u6307\u4ee4\u76ee\u6807\u4fdd\u6301\u4e00\u81f4\u3002\u5bf9\u4e8e\u4efb\u52a1\u89c4\u5212\u5668\uff0c\u6839\u636e\u4efb\u52a1\u5b8c\u6210\u8fc7\u7a0b\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\uff0c\u6211\u4eec\u751f\u6210\u6b65\u9aa4\u5f0f\u7684\u53ef\u884c\u8ba1\u5212\u3002\u5bf9\u4e8e\u63a2\u7d22\u63a7\u5236\u5668\uff0c\u6839\u636e\u751f\u6210\u7684\u6b65\u9aa4\u8ba1\u5212\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\u9884\u6d4b\u6700\u4f18\u7684\u5bfc\u822a\u6216\u7269\u4f53\u4ea4\u4e92\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5927\u578b\u623f\u5c4b\u7ea7\u573a\u666f\u4e2d\u7684204\u4e2a\u590d\u6742\u4eba\u7c7b\u6307\u4ee4\uff08\u5982\u505a\u65e9\u9910\u548c\u6574\u7406\u623f\u95f4\uff09\u4e0a\u5b9e\u73b0\u4e8645.09%\u7684\u6210\u529f\u7387\u3002|\n", "2406.11816": "|**2024-06-17**|**VideoLLM-online: Online Video Large Language Model for Streaming Video**|Joya Chen et.al.|[2406.11816](http://arxiv.org/abs/2406.11816)|null|## \u7ffb\u8bd1 
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u589e\u5f3a\u4e86\u89c6\u89c9\u529f\u80fd\uff0c\u80fd\u591f\u7406\u89e3\u56fe\u50cf\u3001\u89c6\u9891\u548c\u878d\u5408\u4e86\u89c6\u89c9\u4e0e\u8bed\u8a00\u7684\u5185\u5bb9\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u65b9\u6cd5\u901a\u5e38\u5c06\u89c6\u9891\u89c6\u4e3a\u9884\u5148\u526a\u8f91\u597d\u7684\u7247\u6bb5\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5728\u5904\u7406\u8fde\u7eed\u89c6\u9891\u6d41\u65f6\u6548\u679c\u4e0d\u4f73\u4e14\u6548\u7387\u4f4e\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cLearning-In-Video-Stream\u201d\uff08LIVE\uff09\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b9e\u65f6\u3001\u957f\u5e8f\u5217\u3001\u4e0e\u89c6\u9891\u6d41\u540c\u6b65\u7684\u5bf9\u8bdd\uff0c\u9002\u7528\u4e8e\u8fde\u7eed\u89c6\u9891\u8f93\u5165\u3002LIVE\u6846\u67b6\u5305\u62ec\u4ee5\u4e0b\u4e09\u4e2a\u65b9\u9762\uff1a\uff081\uff09\u4e00\u4e2a\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u8fde\u7eed\u6d41\u5f0f\u8f93\u5165\u7684\u8bed\u8a00\u5efa\u6a21\u76ee\u6807\uff1b\uff082\uff09\u4e00\u79cd\u6570\u636e\u751f\u6210\u7b56\u7565\uff0c\u5c06\u79bb\u7ebf\u65f6\u95f4\u6807\u6ce8\u8f6c\u6362\u4e3a\u9002\u5408\u6d41\u5f0f\u5bf9\u8bdd\u7684\u683c\u5f0f\uff1b\uff083\uff09\u4e00\u4e2a\u4f18\u5316\u7684\u63a8\u7406\u7ba1\u9053\uff0c\u4ee5\u63d0\u9ad8\u5728\u5b9e\u9645\u89c6\u9891\u6d41\u4e2d\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u57fa\u4e8eLlama-2/Llama-3\uff0c\u6211\u4eec\u6784\u5efa\u4e86VideoLLM-online\u6a21\u578b\uff0c\u5e76\u901a\u8fc7\u5b83\u5c55\u793a\u4e86\u5728\u5904\u7406\u89c6\u9891\u6d41\u5bf9\u8bdd\u65b9\u9762\u7684\u663e\u8457\u4f18\u52bf\uff0c\u4f8b\u5982\uff0c\u5728A100 
GPU\u4e0a\uff0c\u8be5\u6a21\u578b\u80fd\u57285\u5206\u949f\u89c6\u9891\u7247\u6bb5\u4e2d\u5b9e\u73b0\u8d85\u8fc710\u5e27\u6bcf\u79d2\u7684\u6d41\u5f0f\u5bf9\u8bdd\u3002\u6b64\u5916\uff0cVideoLLM-online\u8fd8\u5728\u516c\u5f00\u7684\u79bb\u7ebf\u89c6\u9891\u57fa\u51c6\u6d4b\u8bd5\uff08\u5982\u8bc6\u522b\u3001captioning\u548c\u9884\u6d4b\uff09\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u3001\u6a21\u578b\u3001\u6570\u636e\u548c\u6f14\u793a\u53d1\u5e03\u5728https://showlab.github.io/videollm-online\u4f9b\u4eba\u4f7f\u7528\u3002|\n", "2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813](http://arxiv.org/abs/2406.11813)|**[link](https://github.com/kaistai/factual-knowledge-acquisition)**|\u5c3d\u7ba1\u8fd1\u671f\u7814\u7a76\u8868\u660e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u5b58\u50a8\u5927\u91cf\u4e8b\u5b9e\u77e5\u8bc6\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u8fd9\u4e9b\u77e5\u8bc6\u7684\u673a\u5236\u5c1a\u4e0d\u660e\u786e\u3002\u672c\u7814\u7a76\u9488\u5bf9\u8fd9\u4e00\u7f3a\u53e3\uff0c\u63a2\u8ba8\u4e86LLMs\u5728\u9884\u8bad\u7ec3\u671f\u95f4\u5982\u4f55\u83b7\u53d6\u548c\u4fdd\u6301\u4e8b\u5b9e\u77e5\u8bc6\u3002\u7814\u7a76\u53d1\u73b0\u4e86\u4e00\u4e9b\u5173\u952e\u6d1e\u89c1\uff1a\u9996\u5148\uff0c\u51fa\u4e4e\u610f\u6599\u7684\u662f\uff0c\u66f4\u591a\u7684\u8bad\u7ec3\u6570\u636e\u5bf9\u6a21\u578b\u83b7\u53d6\u548c\u4fdd\u6301\u4e8b\u5b9e\u77e5\u8bc6\u7684\u80fd\u529b\u5e76\u65e0\u663e\u8457\u63d0\u5347\u3002\u5176\u6b21\uff0c\u8bad\u7ec3\u6b65\u6570\u4e0e\u8bb0\u5fc6\u9057\u5fd8\u548c\u4e8b\u5b9e\u77e5\u8bc6\u6cdb\u5316\u4e4b\u95f4\u5b58\u5728\u5e42\u5f8b\u5173\u7cfb\uff0c\u4f7f\u7528\u91cd\u590d\u8bad\u7ec3\u6570\u636e\u7684\u6a21\u578b\u9057\u5fd8\u901f\u5ea6\u66f4\u5feb\u3002\u7b2c\u4e09\uff0c\u589e\u5927\u6279\u91cf\u5927\u5c0f\u53ef\u4ee5\
u63d0\u9ad8\u6a21\u578b\u62b5\u6297\u9057\u5fd8\u7684\u80fd\u529b\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u89c2\u5bdf\u8868\u660e\uff0cLLMs\u5728\u9884\u8bad\u7ec3\u4e2d\u7684\u4e8b\u5b9e\u77e5\u8bc6\u83b7\u53d6\u662f\u901a\u8fc7\u9010\u6b65\u589e\u52a0\u6bcf\u4e00\u6b65\u4e2d\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\u4e8b\u5b9e\u77e5\u8bc6\u51fa\u73b0\u7684\u6982\u7387\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u589e\u52a0\u968f\u540e\u4f1a\u56e0\u9057\u5fd8\u800c\u7a00\u91ca\u3002\u57fa\u4e8e\u8fd9\u79cd\u7406\u89e3\uff0c\u6211\u4eec\u80fd\u591f\u89e3\u91ca\u4e00\u4e9b\u6700\u8fd1\u89c2\u5bdf\u5230\u7684LLM\u884c\u4e3a\uff0c\u5982\u957f\u5c3e\u77e5\u8bc6\u4e0a\u7684\u6027\u80fd\u4e0d\u4f73\uff0c\u4ee5\u53ca\u53bb\u91cd\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u597d\u5904\u3002|\n", "2406.11811": "|**2024-06-17**|**RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content**|Joao Monteiro et.al.|[2406.11811](http://arxiv.org/abs/2406.11811)|null|## \u80cc\u666f 
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5927\u91cf\u4f9d\u8d56\u81ea\u52a8\u4ece\u4e92\u8054\u7f51\u6293\u53d6\u7684\u6570\u636e\uff0c\u5176\u4e2d\u5305\u62ec\u5305\u542b\u5927\u91cf\u901a\u7528\u77e5\u8bc6\u7684\u767e\u79d1\u5168\u4e66\uff08\u5982\u7ef4\u57fa\u767e\u79d1\uff09\uff0c\u4e5f\u53ef\u80fd\u4e0e\u7528\u4e8e\u8bc4\u4f30LLMs\u7684\u57fa\u51c6\u6570\u636e\u96c6\u91cd\u53e0\u3002\u56e0\u6b64\uff0c\u5982\u679c\u6d4b\u8bd5\u96c6\u53ef\u80fd\u5df2\u6cc4\u9732\u5230\u8bad\u7ec3\u96c6\u4e2d\uff0c\u5bf9\u6a21\u578b\u7684\u8bc4\u4f30\u53ef\u80fd\u4f1a\u4ea7\u751f\u8bef\u5bfc\u6027\u7684\u7ed3\u8bba\u3002\u4e3a\u4e86\u63a8\u52a8\u8bed\u8a00\u6a21\u578b\u7684\u516c\u6b63\u8bc4\u4f30\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6d4b\u8bd5\u6570\u636e\u96c6\u2014\u2014RepLiQA\uff0c\u9002\u7528\u4e8e\u95ee\u7b54\u548c\u4e3b\u9898\u68c0\u7d22\u4efb\u52a1\u3002RepLiQA\u662f\u4e00\u4e2a\u5305\u542b\u4e94\u4e2a\u5206\u7247\u7684\u6d4b\u8bd5\u96c6\uff0c\u5176\u4e2d\u56db\u4e2a\u5728\u672c\u8bba\u6587\u53d1\u5e03\u524d\u672a\u516c\u5f00\u6216\u901a\u8fc7LLM API\u63d0\u4f9b\u3002RepLiQA\u7684\u6bcf\u4e2a\u6837\u672c\u7531\u4ee5\u4e0b\u56db\u90e8\u5206\u7ec4\u6210\uff1a\uff081\uff09\u7531\u4eba\u7c7b\u6807\u6ce8\u5458\u521b\u4f5c\u7684\u865a\u6784\u573a\u666f\u63cf\u8ff0\u6587\u6863\uff08\u4f8b\u5982\u65b0\u95fb\u6587\u7ae0\uff09\uff0c\u8fd9\u4e9b\u5185\u5bb9\u4e0d\u4f1a\u51fa\u73b0\u5728\u4e92\u8054\u7f51\u4e0a\uff1b\uff082\uff09\u5173\u4e8e\u6587\u6863\u4e3b\u9898\u7684\u95ee\u9898\uff1b\uff083\uff09\u76f4\u63a5\u6e90\u81ea\u6587\u6863\u4fe1\u606f\u7684\u6b63\u786e\u7b54\u6848\uff1b\uff084\uff09\u5305\u542b\u7b54\u6848\u7684\u6587\u6863\u6bb5\u843d\u3002\u8fd9\u610f\u5473\u7740\u53ea\u6709\u5f53\u6a21\u578b\u80fd\u5728\u63d0\u4f9b\u7684\u6587\u6863\u4e2d\u627e\u5230\u76f8\u5173\u5185\u5bb9\u65f6\uff0c\u624d\u80fd\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5927\u89c4\u6a21\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5305\u62ec\u591a\u4e2a\u6700\u5148\u8fdb\u7684LLM\uff0c\u4ee5\u63ed\u793a\u4e0d\u540c\u7c7b\u578b\u7684\u548c\u89c4\u6a21\u7684\u6a21\u578b\u5728\u6761\u4ef6\u8bed\u8a00\u5efa\u6a21\u8bbe\u7f6e\u4e0b\u7684\u6027\u80fd\u5dee\u5f02\u3002RepLiQA\u7684\u5df2\u53d1\u5e03\u5206\u7247\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u627e\u5230\uff1ahttps://huggingface.co/datasets/ServiceNow/repliqa\u3002|\n", "2406.11801": "|**2024-06-17**|**Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations**|Rima Hazra et.al.|[2406.11801](http://arxiv.org/abs/2406.11801)|**[link](https://github.com/declare-lab/safety-arithmetic)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ffb\u8bd1\u548c\u95ee\u7b54\u7b49\u5e94\u7528\u4e2d\u7684\u65e5\u76ca\u91cd\u8981\uff0c\u786e\u4fdd\u5b83\u4eec\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6b63\u786e\u5bfc\u5411\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u5bf9\u9f50\u65b9\u6cd5\u5728\u5904\u7406\u52a8\u6001\u7528\u6237\u610f\u56fe\u548c\u590d\u6742\u76ee\u6807\u65f6\u5b58\u5728\u56f0\u96be\uff0c\u4f7f\u5f97\u6a21\u578b\u5bb9\u6613\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u6846\u67b6\u2014\u2014\u5b89\u5168\u7b97\u672f\uff08Safety Arithmetic\uff09\uff0c\u65e8\u5728\u63d0\u5347LLMs\u5728\u4e0d\u540c\u573a\u666f\u4e0b\u7684\u5b89\u5168\u6027\uff0c\u5305\u62ec\u57fa\u7840\u6a21\u578b\u3001\u76d1\u7763\u5fae\u8c03\u6a21\u578b\uff08SFT\uff09\u548c\u7f16\u8f91\u540e\u7684\u6a21\u578b\u3002\u5b89\u5168\u7b97\u672f\u5305\u542b\u4e24\u90e8\u5206\uff1a\u6709\u5bb3\u5185\u5bb9\u6d88\u9664\uff08Harm Direction Removal\uff09\u4ee5\u907f\u514d\u4e0d\u826f\u8f93\u51fa\uff0c\u4ee5\u53ca\u5b89\u5168\u5bf9\u9f50\uff08Safety 
Alignment\uff09\u4ee5\u4fc3\u8fdb\u5b89\u5168\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86NoIntentEdit\u6570\u636e\u96c6\uff0c\u5b83\u63ed\u793a\u4e86\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u5b89\u5168\u98ce\u9669\u7684\u7f16\u8f91\u5b9e\u4f8b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5b89\u5168\u7b97\u672f\u663e\u8457\u589e\u5f3a\u4e86\u5b89\u5168\u63aa\u65bd\uff0c\u51cf\u5c11\u4e86\u8fc7\u5ea6\u5b89\u5168\u7684\u95ee\u9898\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6a21\u578b\u7684\u5b9e\u7528\u6027\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u65b9\u6cd5\u5728\u4fdd\u969c\u5185\u5bb9\u751f\u6210\u7684\u5b89\u5168\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002**|\n", "2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval Based Long Video Understanding**|Ziyu Ma et.al.|[2406.12846](http://arxiv.org/abs/2406.12846)|null|\u5f53\u524d\u7684\u957f\u89c6\u9891\u7406\u89e3\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u65f6\u957f\u4ec5\u5341\u51e0\u79d2\u7684\u89c6\u9891\uff0c\u5bf9\u5904\u7406\u66f4\u957f\u89c6\u9891\u7684\u6280\u672f\u63a2\u7d22\u6709\u9650\u3002\u957f\u89c6\u9891\u4e2d\u7684\u5927\u91cf\u5e27\u6570\u5e26\u6765\u4e86\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a\u96be\u4ee5\u5b9a\u4f4d\u5173\u952e\u4fe1\u606f\u548c\u8fdb\u884c\u957f\u671f\u63a8\u7406\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faDrVideo\uff0c\u4e00\u4e2a\u57fa\u4e8e\u6587\u6863\u68c0\u7d22\u7684\u7cfb\u7edf\uff0c\u4e13\u4e3a\u957f\u89c6\u9891\u7406\u89e3\u8bbe\u8ba1\u3002\u6211\u4eec\u7684\u6838\u5fc3\u601d\u60f3\u662f\u5c06\u957f\u89c6\u9891\u7406\u89e3\u95ee\u9898\u8f6c\u5316\u4e3a\u957f\u6587\u6863\u7406\u89e3\u4efb\u52a1\uff0c\u4ee5\u5145\u5206\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5f3a\u5927\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0cDrVideo\u5c06\u957f\u89c6\u9891\u8f6c\u6362\u4e3a\u6587\u672c\u5f62\u5f0f\u7684\u957f\u6587\u6863\uff0c\u9996\u5148\u68c0\u7d22\u5173\u952e\u5e27\u5e76\u589e\u5f3a\u8fd9\u4e9b\u5e27\u7684\u4fe1\u606f\uff0c\u4f5c\u4e3a\u7
cfb\u7edf\u7684\u8d77\u70b9\u3002\u7136\u540e\uff0c\u5b83\u91c7\u7528\u57fa\u4e8e\u4ee3\u7406\u7684\u8fed\u4ee3\u5faa\u73af\uff0c\u6301\u7eed\u641c\u7d22\u7f3a\u5931\u4fe1\u606f\u3001\u8865\u5145\u76f8\u5173\u6570\u636e\uff0c\u5e76\u5728\u6536\u96c6\u5230\u8db3\u591f\u7684\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u4fe1\u606f\u540e\uff0c\u4ee5\u94fe\u5f0f\u601d\u8003\u7684\u65b9\u5f0f\u7ed9\u51fa\u6700\u7ec8\u9884\u6d4b\u3002\u5728\u591a\u4e2a\u957f\u89c6\u9891\u57fa\u51c6\u4e0a\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002DrVideo\u5728EgoSchema\uff083\u5206\u949f\uff09\u6d4b\u8bd5\u4e2d\u6bd4\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u9ad8\u51fa3.8\u4e2a\u767e\u5206\u70b9\uff0c\u5728MovieChat-1K\uff0810\u5206\u949f\uff09\u7684break\u6a21\u5f0f\u548cglobal\u6a21\u5f0f\u4e2d\u5206\u522b\u63d0\u9ad817.9\u548c38.0\u5206\uff0c\u4ee5\u53ca\u5728LLama-Vid QA\uff08\u8d85\u8fc760\u5206\u949f\uff09\u6570\u636e\u96c6\u4e0a\u63d0\u534730.2\u5206\u3002|\n", "2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang 
et.al.|[2406.12845](http://arxiv.org/abs/2406.12845)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u5df2\u7ecf\u6210\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u4eba\u7c7b\u504f\u597d\u5bf9\u9f50\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u4f20\u7edf\u4e0a\uff0c\u901a\u8fc7\u4f7f\u7528\u4eba\u7c7b\u504f\u597d\u6570\u636e\u8bad\u7ec3\u5956\u52b1\u6a21\u578b\uff08RM\uff09\uff0c\u8fc7\u7a0b\u901a\u5e38\u4ece\u6bd4\u8f83\u540c\u4e00\u7528\u6237\u8bf7\u6c42\u7684\u54cd\u5e94\u5f00\u59cb\uff0c\u76f8\u5bf9\u8bc4\u5206\u6307\u793a\u4eba\u7c7b\u66f4\u559c\u6b22\u54ea\u4e2a\u54cd\u5e94\u3002\u7136\u800c\uff0c\u7531\u4e8eRM\u7684\u9ed1\u76d2\u7279\u6027\uff0c\u5176\u8f93\u51fa\u7f3a\u4e4f\u53ef\u89e3\u91ca\u6027\uff0c\u4eba\u4eec\u96be\u4ee5\u7406\u89e3\u4e3a\u4ec0\u4e48RM\u8ba4\u4e3a\u67d0\u4e2a\u56de\u590d\u662f\u597d\u7684\u3002\u9274\u4e8eRM\u4f5c\u4e3a\u4eba\u7c7b\u504f\u597d\u7684\u4ee3\u7406\uff0c\u6211\u4eec\u63d0\u8bae\u91c7\u7528\u4e24\u9636\u6bb5\u65b9\u6cd5\u6765\u521b\u5efa\u53ef\u89e3\u91ca\u7684RM\uff1a\u9996\u5148\uff0c\u4f7f\u7528\u591a\u7ef4\u7edd\u5bf9\u8bc4\u5206\u6570\u636e\u8bad\u7ec3\u7edd\u5bf9\u8bc4\u7ea7\u591a\u76ee\u6807\u5956\u52b1\u6a21\u578b\uff08ArmoRM\uff09\uff0c\u6bcf\u4e2a\u7ef4\u5ea6\u5bf9\u5e94\u4e8e\u4eba\u7c7b\u53ef\u7406\u89e3\u7684\u76ee\u6807\uff08\u5982\u8bda\u5b9e\u3001\u8be6\u5c3d\u3001\u5b89\u5168\uff09\uff1b\u5176\u6b21\uff0c\u5229\u7528\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u7b56\u7565\uff0c\u7ed3\u5408\u4e00\u4e2a\u95e8\u63a7\u7f51\u7edc\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u81ea\u52a8\u9009\u62e9\u6700\u5408\u9002\u7684\u5956\u52b1\u76ee\u6807\u3002\u6211\u4eec\u6210\u529f\u5730\u4f7f\u7528Llama-3 
8B\u8bad\u7ec3\u4e86ArmoRM\uff0c\u5e76\u5728\u9876\u90e8\u6dfb\u52a0\u4e86\u4e00\u4e2a\u6d45\u5c42MLP\u4f5c\u4e3a\u95e8\u63a7\u7f51\u7edc\uff0c\u5f62\u6210\u4e86ArmoRM-Llama3-8B\u3002\u6211\u4eec\u7684\u6a21\u578b\u5728\u8bc4\u4f30RM\u7684\u8bed\u8a00\u5efa\u6a21\u6027\u80fd\u7684RewardBench\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6210\u7ee9\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u8d85\u8fc7\u4e86\u4f7f\u7528GPT-4\u6cd5\u5b98\u7684LLM\u4f5c\u4e3a\u8bc4\u5224\u8005\u7684\u65b9\u6cd5\uff0c\u5e76\u63a5\u8fd1\u4e8e\u89c4\u6a21\u66f4\u5927\u7684Nemotron-4 340B\u5956\u52b1\u6a21\u578b\u7684\u6c34\u5e73\u3002**|\n", "2406.12844": "|**2024-06-18**|**Synergizing Foundation Models and Federated Learning: A Survey**|Shenghui Li et.al.|[2406.12844](http://arxiv.org/abs/2406.12844)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3001\u89c6\u89c9Transformer\u548c\u591a\u6a21\u6001\u6a21\u578b\u7b49\u57fa\u7840\u6a21\u578b\uff08FMs\uff09\u7684\u53d1\u5c55\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u4ea7\u751f\u4e86\u663e\u8457\u5f71\u54cd\u3002\u4e0e\u5c0f\u578b\u6a21\u578b\u76f8\u6bd4\uff0cFMs\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u5bf9\u5927\u91cf\u6570\u636e\u7684\u9700\u6c42\u66f4\u5927\u3002\u5c3d\u7ba1\u901a\u7528FMs\u53ef\u4ee5\u4f7f\u7528\u4e92\u8054\u7f51\u4e0a\u7684\u516c\u5f00\u6570\u636e\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u4f46\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684FMs\u9700\u8981\u4e13\u6709\u6570\u636e\uff0c\u8fd9\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u56e0\u9690\u79c1\u95ee\u9898\u800c\u9762\u4e34\u6570\u636e\u53ef\u7528\u6027\u6311\u6218\u3002\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u4f5c\u4e3a\u4e00\u79cd\u534f\u4f5c\u5b66\u4e60\u8303\u5f0f\uff0c\u6253\u7834\u4e86\u6570\u636e\u5171\u4eab\u7684\u969c\u788d\uff0c\u4e3a\u5229\u7528\u5206\u5e03\u5f0f\u6570\u636e\u5b9a\u5236\u548c\u9002\u5e94\u5404\u79cd\u9886\u57df\u7279\u5b9a\u4efb\u52a1\u7684FMs\u63d0\u4f9b\u4e86\u524
d\u666f\uff0c\u540c\u65f6\u4fdd\u62a4\u4e86\u6570\u636e\u9690\u79c1\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8bba\u6587\u63a2\u8ba8\u4e86FL\u4e0eFMs\u878d\u5408\u7684\u6f5c\u529b\u4e0e\u6311\u6218\uff0c\u603b\u7ed3\u4e86\u6838\u5fc3\u6280\u672f\u3001\u672a\u6765\u53d1\u5c55\u65b9\u5411\u4ee5\u53ca\u5e94\u7528\u573a\u666f\u3002\u5173\u4e8eFM-FL\u7684\u5b9a\u671f\u66f4\u65b0\u8bba\u6587\u96c6\u5408\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi et.al.|[2406.12832](http://arxiv.org/abs/2406.12832)|**[link](https://github.com/arminazizi98/lamda)**|**\u5728\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u9886\u57df\uff0c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u5df2\u7ecf\u6210\u4e3a\u6807\u51c6\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5b83\u663e\u8457\u51cf\u5c11\u4e86\u53ef\u8bad\u7ec3\u53c2\u6570\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u578b\u5d4c\u5165\u7ef4\u5ea6\u7684\u589e\u52a0\uff0cLoRA\u6240\u9700\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u91cf\u4e5f\u968f\u4e4b\u4e0a\u5347\uff0c\u5bfc\u81f4\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u6b64\u5916\uff0c\u5176\u540e\u5411\u66f4\u65b0\u9700\u8981\u5b58\u50a8\u9ad8\u7ef4\u4e2d\u95f4\u6fc0\u6d3b\u548c\u4f18\u5316\u5668\u72b6\u6001\uff0c\u5bf9GPU\u5185\u5b58\u9700\u6c42\u8f83\u5927\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u65b9\u6cd5\u2014\u2014\u57fa\u4e8e\u8c31\u5206\u89e3\u7684\u4f4e\u7ef4\u9002\u5e94\uff08LaMDA\uff09\u3002LaMDA\u901a\u8fc7\u51bb\u7ed3\u7b2c\u4e00\u6295\u5f71\u77e9\u9635\uff08PMA\uff09\uff0c\u540c\u65f6\u5f15\u5165\u4e00\u4e2a\u4f4e\u7ef4\u53ef\u8bad\u7ec3\u7684\u5e73\u65b9\u77e9\u9635\uff0c\u5b9e\u73b0\u4e86\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u7684\u5927\u5e45\u51cf\u5c11\u3002\u5728\u65e9\u671f\u7684\u5fae\u8c03\u9636\u6bb5\uff0cLaMDA\u9010\u6b65\u51bb\u7ed3\u7b2c\u4e8c\u6295\u5f71\u77e9
\u9635\uff08PMB\uff09\uff0c\u8fdb\u4e00\u6b65\u964d\u4f4e\u6743\u91cd\u66f4\u65b0\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u63d0\u9ad8\u53c2\u6570\u6548\u7387\u3002 \u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u589e\u5f3a\u7248LaMDA++\uff0c\u5b83\u901a\u8fc7\u89c4\u8303\u5316\u9884\u8bad\u7ec3\u6a21\u578b\u6743\u91cd\u7684\u8c31\u5206\u6790\uff0c\u5b9e\u73b0\u8f7b\u91cf\u7ea7\u7684LoRA\u8def\u5f84\u81ea\u9002\u5e94\u79e9\u5206\u914d\u3002\u6211\u4eec\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ecGLUE\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u57fa\u51c6\u3001\u6587\u672c\u6458\u8981\u3001\u81ea\u7136\u8bed\u8a00\u751f\u6210\u4ee5\u53ca\u590d\u6742\u63a8\u7406\uff0c\u5e94\u7528\u4e8e\u4e0d\u540c\u7c7b\u578b\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cLaMDA\u5728\u6027\u80fd\u4e0a\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u5f53\u6216\u8d85\u8d8a\uff0c\u4e14\u5728\u5fae\u8c03\u671f\u95f4\u53ef\u51cf\u5c11\u9ad8\u8fbe17.7\u500d\u7684\u53c2\u6570\u66f4\u65b0\u6b21\u6570\uff0c\u4ee5\u53ca1.32\u500d\u7684\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u3002\u6211\u4eec\u5c06\u516c\u5f00\u4ee3\u7801\u3002**|\n", "2406.12822": "|**2024-06-18**|**Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?**|Pinzhen Chen et.al.|[2406.12822](http://arxiv.org/abs/2406.12822)|null|
\u5927\u578b\u591a\u8bed\u8a00\u6a21\u578b\u65e8\u5728\u670d\u52a1\u4e0d\u540c\u8bed\u79cd\u7684\u6bcd\u8bed\u4f7f\u7528\u8005\u3002\u6211\u4eec\u63a8\u6d4b\uff0c\u5f53\u524d\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u5fae\u8c03\u548c\u8bc4\u4f30\u65b9\u6cd5\u53ef\u80fd\u4e0e\u5176\u521d\u8877\u4e0d\u7b26\uff0c\u539f\u56e0\u5728\u4e8e\u8fc7\u5ea6\u4f9d\u8d56\u7ffb\u8bd1\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ffb\u8bd1\u4e2d\u7684\u7455\u75b5\u3002\u5c1a\u4e0d\u6e05\u695a\u6307\u4ee4\u6570\u636e\u7684\u6027\u8d28\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u8f93\u51fa\uff0c\u540c\u65f6\uff0c\u7528\u7ffb\u8bd1\u6d4b\u8bd5\u96c6\u6765\u6355\u6349\u8fd9\u4e9b\u7ec6\u5fae\u5dee\u522b\u662f\u5426\u6709\u6548\u3002\u7531\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\u9636\u6bb5\u5e38\u5e38\u7ed3\u5408\u4f7f\u7528\u7ffb\u8bd1\u6570\u636e\uff0c\u8fd9\u4e9b\u6f5c\u5728\u95ee\u9898\u53ef\u80fd\u88ab\u5ffd\u89c6\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5728\u6307\u4ee4\u8c03\u4f18\u548c\u8bc4\u4f30\u9636\u6bb5\u4f7f\u7528\u63a7\u5236\u6027\u7684\u6bcd\u8bed\u6216\u7ffb\u8bd1\u6570\u636e\uff0c\u6765\u63a2\u7a76\u8fd9\u4e9b\u95ee\u9898\uff0c\u5e76\u89c2\u5bdf\u6a21\u578b\u8868\u73b0\u3002\u6211\u4eec\u5728\u516b\u79cd\u57fa\u7840\u6a21\u578b\u548c\u516b\u4e2a\u4e0d\u540c\u57fa\u51c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u6bcd\u8bed\u6216\u751f\u6210\u6027\u57fa\u51c6\uff0c\u4f7f\u7528\u6bcd\u8bed\u6216\u7ffb\u8bd1\u6307\u4ee4\u6570\u636e\u65f6\uff0c\u6a21\u578b\u6027\u80fd\u9ad8\u65f6\uff0c\u4e24\u8005\u4e4b\u95f4\u7684\u5dee\u5f02\u5c24\u4e3a\u660e\u663e\uff0c\u800c\u5728\u5176\u4ed6\u7c7b\u578b\u7684\u6d4b\u8bd5\u96c6\u4e0a\u5219\u4e0d\u7136\u3002\u6700\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u6b63\u5219\u5316\u5bf9\u4e8e\u7ed3\u6784\u5316\u4efb\u52a1\u6709\u76ca\uff0c\u4f46\u5bf9\u4e8e\u751f\u6210\u6027\u4efb\u52a1\u5219\u4e0d\u7136\u3002|\n", "2406.12809": "|**2024-06-18**|**Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?**|Zhe Yang 
et.al.|[2406.12809](http://arxiv.org/abs/2406.12809)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u4e0d\u4e00\u81f4\u7684\u95ee\u9898\uff0c\u4f8b\u5982\u5bf9\u91cd\u8ff0\u6216\u5fae\u5c0f\u987a\u5e8f\u53d8\u5316\u7684\u53cd\u5e94\u4e0d\u4e00\u81f4\u3002\u9664\u4e86\u8fd9\u4e9b\u4e0d\u7a33\u5b9a\u6027\uff0c\u6211\u4eec\u8fd8\u89c2\u5bdf\u5230\u5c3d\u7ba1LLMs\u80fd\u591f\u89e3\u51b3\u96be\u9898\uff0c\u4f46\u5728\u76f8\u5bf9\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\u5374\u53ef\u80fd\u5931\u8d25\u3002\u4e3a\u4e86\u8bc4\u4f30\u8fd9\u79cd\u4ece\u96be\u5230\u6613\u7684\u4e0d\u4e00\u81f4\u6027\uff0c\u6211\u4eec\u521b\u5efa\u4e86ConsisEval\u57fa\u51c6\uff0c\u5176\u4e2d\u6bcf\u4e2a\u6761\u76ee\u5305\u542b\u4e24\u4e2a\u96be\u5ea6\u6709\u5e8f\u7684\u95ee\u9898\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u81f4\u6027\u5206\u6570\u7684\u6982\u5ff5\uff0c\u4ee5\u91cf\u5316\u8fd9\u79cd\u4e0d\u4e00\u81f4\u6027\uff0c\u5e76\u901a\u8fc7\u76f8\u5bf9\u4e00\u81f4\u6027\u5206\u6570\u5206\u6790\u4e00\u81f4\u6027\u7684\u6539\u8fdb\u6f5c\u529b\u3002\u901a\u8fc7\u5bf9\u73b0\u6709\u6a21\u578b\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u5f97\u51fa\u4ee5\u4e0b\u53d1\u73b0\uff1a(1) GPT-4\u83b7\u5f9792.2%\u7684\u6700\u9ad8\u4e00\u81f4\u6027\u5206\u6570\uff0c\u4f46\u4ecd\u56e0\u5197\u4f59\u4fe1\u606f\u7684\u5e72\u6270\u3001\u95ee\u9898\u8bef\u89e3\u7b49\u95ee\u9898\u5bf9\u7279\u5b9a\u95ee\u9898\u4e0d\u4e00\u81f4\uff1b(2) \u80fd\u529b\u66f4\u5f3a\u7684\u6a21\u578b\u901a\u5e38\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u4e00\u81f4\u6027\uff0c\u4f46\u4e5f\u5b58\u5728\u4f8b\u5916\u60c5\u51b5\uff1b(3) \u5bf9\u4e8e\u5fae\u8c03\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\u800c\u8a00\uff0c\u66f4\u96be\u7684\u6570\u636e\u53ef\u4ee5\u63d0\u9ad8\u4e00\u81f4\u6027\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u5728GitHub\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2406.12806": 
"|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**\u80cc\u666f**\uff1a\u914d\u7f6e\u8bbe\u7f6e\u5bf9\u4e8e\u8c03\u6574\u8f6f\u4ef6\u884c\u4e3a\u4ee5\u6ee1\u8db3\u7279\u5b9a\u6027\u80fd\u9700\u6c42\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9519\u8bef\u914d\u7f6e\u666e\u904d\u5b58\u5728\u3002\u7531\u4e8e\u914d\u7f6e\u9879\u4f17\u591a\u4e14\u590d\u6742\uff0c\u8bc6\u522b\u5f71\u54cd\u7cfb\u7edf\u6027\u80fd\u7684\u914d\u7f6e\u662f\u4e00\u9879\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51faPerfSense\uff0c\u8fd9\u662f\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9ad8\u6548\u5730\u8bc6\u522b\u6027\u80fd\u5173\u952e\u914d\u7f6e\uff0c\u540c\u65f6\u4fdd\u6301\u4f4e\u5f00\u9500\u3002PerfSense\u5229\u7528LLM\u4ee3\u7406\u6a21\u62df\u5f00\u53d1\u8005\u548c\u6027\u80fd\u5de5\u7a0b\u5e08\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u91c7\u7528\u5148\u8fdb\u7684\u63d0\u793a\u94fe\u6280\u672f\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b49\u6280\u672f\u3002 **\u65b9\u6cd5\u4e0e\u6210\u679c**\uff1a\u6211\u4eec\u5728\u4e03\u4e2a\u5f00\u6e90Java\u7cfb\u7edf\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\uff0cPerfSense\u5728\u5206\u7c7b\u6027\u80fd\u654f\u611f\u914d\u7f6e\u65b9\u9762\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a64.77%\uff0c\u4f18\u4e8e\u57fa\u4e8eLLM\u7684\u57fa\u7ebf\uff0850.36%\uff09\u548c\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\uff0861.75%\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u7684\u63d0\u793a\u94fe\u6280\u672f\u63d0\u9ad8\u4e86\u53ec\u56de\u738710%\u81f330%\uff0c\u800c\u4fdd\u6301\u4e86\u76f8\u4f3c\u7684\u7cbe\u786e\u5ea6\u3002\u8fdb\u4e00\u6b65\u7684\u624b\u52a8\u5206\u6790362\u4e2a\u8bef\u5206\u7c7b\u6848\u4f8b\uff0c\u53d1\u73b0\u5e38\u89c1\u95ee\u9898\u5305\u62ecLLMs\u5bf9\u9700\u6c42\u7684\u7406\u89e3\u504f\u5dee\uff08\u536026.8%\uff09\u3002 
**\u7ed3\u8bba**\uff1aPerfSense\u663e\u8457\u51cf\u5c11\u4e86\u624b\u52a8\u5206\u7c7b\u6027\u80fd\u5173\u952e\u914d\u7f6e\u7684\u5de5\u4f5c\u91cf\uff0c\u5e76\u4e3a\u672a\u6765\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u5206\u6790\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c2\u70b9\u3002|\n", "2406.12800": "|**2024-06-18**|**Supporting Human Raters with the Detection of Harmful Content using Large Language Models**|Kurt Thomas et.al.|[2406.12800](http://arxiv.org/abs/2406.12800)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u6216\u8f85\u52a9\u4eba\u7c7b\u5ba1\u9605\u8005\u68c0\u6d4b\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\uff0c\u5982\u4ec7\u6068\u8a00\u8bba\u3001\u9a9a\u6270\u3001\u6781\u7aef\u4e3b\u4e49\u548c\u9009\u4e3e\u8bef\u5bfc\u3002\u901a\u8fc750,000\u6761\u8bc4\u8bba\u7684\u6570\u636e\u96c6\uff0c\u6211\u4eec\u53d1\u73b0LLMs\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u76f8\u6bd4\u65f6\u80fd\u8fbe\u523090%\u7684\u51c6\u786e\u7387\u3002\u6211\u4eec\u63d0\u51fa\u4e94\u79cd\u8bbe\u8ba1\u6a21\u5f0f\uff0c\u4ee5\u6574\u5408LLMs\u4e0e\u4eba\u5de5\u8bc4\u7ea7\uff0c\u4f8b\u5982\u9884\u7b5b\u9009\u975e\u8fdd\u89c4\u5185\u5bb9\u3001\u68c0\u6d4b\u4eba\u7c7b\u8bc4\u7ea7\u53ef\u80fd\u7684\u9519\u8bef\uff0c\u6216\u8005\u63d0\u4f9b\u5173\u952e\u4e0a\u4e0b\u6587\u4ee5\u652f\u6301\u4eba\u5de5\u8bc4\u7ea7\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u4e00\u4e2a\u4f18\u5316\u7684\u63d0\u793a\u6765\u652f\u6301\u8fd9\u4e9b\u8bbe\u8ba1\u6a21\u5f0f\u3002\u5728\u5b9e\u9645\u5e94\u7528\u7684\u8bd5\u70b9\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4f18\u5316\u4eba\u529b\u8d44\u6e90\u6548\u7387\u65b9\u9762\u5b9e\u73b0\u4e8641.5%\u7684\u63d0\u5347\uff0c\u540c\u65f6\u5728\u68c0\u6d4b\u8fdd\u89c4\u5185\u5bb9\u7684\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u4e0a\u5206\u522b\u63d0\u9ad8\u4e869%\u81f311%\u3002|\n", "2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B 
to GLM-4 All Tools**|Team GLM et.al.|[2406.12793](http://arxiv.org/abs/2406.12793)|**[link](https://github.com/thudm/chatglm-6b)**|\u6211\u4eec\u4ecb\u7ecdChatGLM\uff0c\u8fd9\u662f\u4e00\u4e2a\u968f\u65f6\u95f4\u4e0d\u65ad\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7cfb\u5217\u3002\u672c\u62a5\u544a\u4e3b\u8981\u5173\u6ce8GLM-4\u8bed\u8a00\u7cfb\u5217\uff0c\u5305\u62ecGLM-4\u3001GLM-4-Air\u548cGLM-4-9B\uff0c\u5b83\u4eec\u4ee3\u8868\u4e86\u6211\u4eec\u5f53\u524d\u6700\u5f3a\u5927\u7684\u6a21\u578b\uff0c\u96c6\u6210\u4e86\u524d\u4e09\u4ee3ChatGLM\u7684\u6240\u6709\u7ecf\u9a8c\u548c\u6559\u8bad\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u5341\u4e07\u4ebf\u4e2atoken\u4e0a\u8fdb\u884c\u4e86\u9884\u8bad\u7ec3\uff0c\u4e3b\u8981\u6db5\u76d6\u4e2d\u6587\u548c\u82f1\u8bed\uff0c\u4ee5\u53ca\u5c11\u91cf\u6765\u81ea24\u79cd\u8bed\u8a00\u7684\u8bed\u6599\u5e93\uff0c\u4fa7\u91cd\u4e8e\u4e2d\u82f1\u6587\u7684\u5bf9\u9f50\u3002\u9ad8\u8d28\u91cf\u7684\u5bf9\u9f50\u662f\u901a\u8fc7\u591a\u9636\u6bb5\u7684\u540e\u8bad\u7ec3\u8fc7\u7a0b\u5b9e\u73b0\u7684\uff0c\u5305\u62ec\u76d1\u7763\u5fae\u8c03\u548c\u4ece\u4eba\u7c7b\u53cd\u9988\u4e2d\u5b66\u4e60\u3002\u8bc4\u4f30\u663e\u793a\uff0cGLM-4\u5728\u901a\u7528\u6307\u6807\u5982MMLU\u3001GSM8K\u3001MATH\u3001BBH\u3001GPQA\u548cHumanEval\u4e0a\u63a5\u8fd1\u6216\u4f18\u4e8eGPT-4\uff1b\u5728IFEval\u6307\u4ee4\u8ddf\u968f\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u63a5\u8fd1GPT-4 Turbo\uff1b\u5728\u957f\u6587\u672c\u4efb\u52a1\u4e0a\u4e0eGPT-4 Turbo\uff08128K\uff09\u548cClaude 3\u76f8\u5f53\uff1b\u5728\u4e2d\u6587\u5bf9\u9f50\u65b9\u9762\uff0cGLM-4\u4f18\u4e8eGPT-4\uff0c\u6839\u636eAlignBench\u8861\u91cf\u3002GLM-4 All 
Tools\u6a21\u578b\u8fdb\u4e00\u6b65\u8fdb\u884c\u4e86\u5bf9\u9f50\uff0c\u4ee5\u7406\u89e3\u7528\u6237\u610f\u56fe\u5e76\u80fd\u81ea\u4e3b\u51b3\u5b9a\u4f55\u65f6\u4f7f\u7528\u54ea\u79cd\u5de5\u5177\uff0c\u5982Web\u6d4f\u89c8\u5668\u3001Python\u89e3\u91ca\u5668\u3001\u6587\u672c\u8f6c\u56fe\u50cf\u6a21\u578b\u548c\u81ea\u5b9a\u4e49\u51fd\u6570\uff0c\u4ee5\u6709\u6548\u5730\u5b8c\u6210\u590d\u6742\u4efb\u52a1\u3002\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u5b83\u5728\u8bf8\u5982\u901a\u8fc7\u7f51\u7edc\u6d4f\u89c8\u83b7\u53d6\u4fe1\u606f\u548c\u4f7f\u7528Python\u89e3\u91ca\u5668\u89e3\u9898\u7b49\u4efb\u52a1\u4e0a\u4e0eGPT-4 All Tools\u76f8\u5339\u914d\u751a\u81f3\u8d85\u8d8a\u3002\u5230\u76ee\u524d\u4e3a\u6b62\uff0c\u6211\u4eec\u5df2\u7ecf\u5f00\u6e90\u4e86\u4e00\u7cfb\u5217\u6a21\u578b\uff0c\u5305\u62ecChatGLM-6B\uff08\u4e09\u4ee3\uff09\u3001GLM-4-9B\uff08128K\u30011M\uff09\u3001GLM-4V-9B\u3001WebGLM\u548cCodeGeeX\uff0c\u57282023\u5e74\u4ec5Hugging Face\u4e0a\u5c31\u6709\u8d85\u8fc71000\u4e07\u6b21\u4e0b\u8f7d\u3002\u8fd9\u4e9b\u5f00\u6e90\u6a21\u578b\u53ef\u901a\u8fc7\u548c\u8bbf\u95ee\u3002|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang 
et.al.|[2406.12784](http://arxiv.org/abs/2406.12784)|**[link](https://github.com/Cyno2232/UBENCH)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u7684\u6548\u679c\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4f4e\u53ef\u89e3\u91ca\u6027\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u672a\u9884\u89c1\u60c5\u51b5\u4e0b\u5e38\u4f1a\u51fa\u73b0\u9519\u8bef\uff0c\u9650\u5236\u4e86\u5176\u4ef7\u503c\u3002\u5c3d\u7ba1\u5df2\u6709\u8bb8\u591a\u7814\u7a76\u81f4\u529b\u4e8e\u6784\u5efa\u5168\u9762\u7684\u8bc4\u4f30\u4f53\u7cfb\uff0c\u4f46\u5148\u524d\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e3b\u8981\u5173\u6ce8\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u5bf9\u54cd\u5e94\u7684\u4e0d\u786e\u5b9a\u6027\u8bc4\u4f30\u4e0d\u8db3\uff0c\u53ef\u80fd\u5bfc\u81f4\u4e0d\u7a33\u5b9a\u6027\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u5728\u8861\u91cfLLM\u53ef\u9760\u6027\u65f6\u8d44\u6e90\u6d88\u8017\u5927\uff0c\u4e14\u96be\u4ee5\u6d4b\u8bd5\u9ed1\u76d2\u6a21\u578b\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86UBENCH\uff0c\u4e00\u4e2a\u5168\u9762\u7684LLM\u53ef\u9760\u6027\u8bc4\u4f30\u57fa\u51c6\u3002\u5b83\u5305\u542b3,978\u4e2a\u6db5\u76d6\u77e5\u8bc6\u3001\u8bed\u8a00\u7406\u89e3\u3001\u63a8\u7406\u80fd\u529b\u7684\u591a\u9009\u9898\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cUBENCH\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5e76\u4e14\u5176\u5355\u6b21\u91c7\u6837\u65b9\u6cd5\u663e\u8457\u8282\u7701\u4e86\u8ba1\u7b97\u8d44\u6e90\uff0c\u76f8\u8f83\u4e8e\u9700\u8981\u591a\u6b21\u91c7\u6837\u7684\u57fa\u7ebf\u65b9\u6cd5\u66f4\u4e3a\u9ad8\u6548\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528UBENCH\u8bc4\u4f30\u4e8615\u79cd\u6d41\u884cLLM\u7684\u53ef\u9760\u6027\uff0c\u53d1\u73b0GLM4\u8868\u73b0\u51fa\u8272\uff0c\u7d27\u968f\u5176\u540e\u7684\u662fGPT-4\u3002\u6211\u4eec\u8fd8\u63a2\u7a76\u4e86Chain-of-Thought\u63d0\u793a\u3001\u89d2\u8272\u626e\u6f14\u63d0\u793a\u3001\u9009\u9879\u987a\u5e8f\u548c\u6e29\u5ea6\u5bf9LLM\u53ef\u9760\u6027\u7684\u5f71\u54cd\uff0c\u5206\u6790\u4e86\u5b83\u4eec\u5bf9\u4e0d\u540c\u6a21\u578b\u7684\u4e0d\u540c\u4f5c\u7528\u3002|\n", "2406.14563": "|**2024-06-20**|**Model Merging and Safety Alignment: One Bad Model Spoils the Bunch**|Hasan Abed Al Kader Hammoud et.al.|[2406.14563](http://arxiv.org/abs/2406.14563)|null|
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5408\u5e76\u662f\u4e00\u79cd\u7ecf\u6d4e\u9ad8\u6548\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u5c06\u591a\u4e2a\u4e13\u5bb6\u7ea7LLMs\u6574\u5408\u6210\u4e00\u4e2a\u5168\u80fd\u6a21\u578b\uff0c\u4fdd\u7559\u539f\u59cb\u6a21\u578b\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u65b9\u6cd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u5408\u5e76\u8fc7\u7a0b\u4e2d\u5b89\u5168\u5bf9\u9f50\u7684\u91cd\u8981\u6027\uff0c\u5bfc\u81f4\u751f\u6210\u7684\u6a21\u578b\u9ad8\u5ea6\u4e0d\u4e00\u81f4\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u6a21\u578b\u5408\u5e76\u5bf9\u5bf9\u9f50\u6027\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u51e0\u79cd\u6d41\u884c\u7684\u6a21\u578b\u5408\u5e76\u6280\u672f\uff0c\u53d1\u73b0\u73b0\u6709\u65b9\u6cd5\u4e0d\u4ec5\u4f20\u9012\u4e86\u9886\u57df\u4e13\u4e1a\u77e5\u8bc6\uff0c\u8fd8\u4f20\u64ad\u4e86\u4e0d\u4e00\u81f4\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u6b65\u6cd5\u89e3\u51b3\u65b9\u6848\uff1a(1) \u751f\u6210\u5408\u6210\u7684\u5b89\u5168\u6027\u548c\u9886\u57df\u7279\u5b9a\u6570\u636e\uff0c(2) \u5c06\u8fd9\u4e9b\u751f\u6210\u7684\u6570\u636e\u878d\u5165\u73b0\u6709\u7684\u6570\u636e\u9a71\u52a8\u7684\u6a21\u578b\u5408\u5e76\u4f18\u5316\u8fc7\u7a0b\u4e2d\u3002\u8fd9\u6837\uff0c\u6211\u4eec\u80fd\u591f\u5c06\u5bf9\u9f50\u6027\u89c6\u4e3a\u53ef\u4ee5\u6700\u5927\u5316\u4e8e\u5408\u5e76\u540eLLM\u4e2d\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u5408\u5e76\u8fc7\u7a0b\u4e2d\u6574\u5408\u5bf9\u9f50\u76f8\u5173\u6570\u636e\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u662f\u65e2\u80fd\u4fdd\u6301\u9886\u57df\u4e13\u957f\u53c8\u80fd\u5b9e\u73b0\u826f\u597d\u5bf9\u9f50\u7684\u6a21\u578b\u3002|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon 
et.al.|[2406.14562](http://arxiv.org/abs/2406.14562)|null|\u5f53\u9762\u4e34\u6d89\u53ca\u89c6\u89c9\u601d\u7ef4\u7684\u95ee\u9898\u65f6\uff0c\u4eba\u7c7b\u4f1a\u81ea\u7136\u5730\u5207\u6362\u5230\u63a8\u7406\u6a21\u5f0f\uff0c\u5e38\u5e38\u5f62\u6210\u5fc3\u7406\u56fe\u50cf\u6216\u7ed8\u5236\u89c6\u89c9\u8f85\u52a9\u5de5\u5177\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6570\u5b66\u548c\u7b26\u53f7\u63a8\u7406\u65b9\u9762\u5c55\u73b0\u51fa\u826f\u597d\u8868\u73b0\uff0c\u901a\u8fc7\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u4e2d\u95f4\u63a8\u7406\u6b65\u9aa4\u7684\u94fe\u6761\u601d\u8003\uff0c\u4f46\u5728\u5904\u7406\u53ef\u4ee5\u901a\u8fc7\u89c6\u89c9\u63a8\u7406\u8f7b\u677e\u89e3\u7b54\u7684\u6587\u672c\u67e5\u8be2\u65f6\u4ecd\u5b58\u5728\u95ee\u9898\uff0c\u5373\u4f7f\u7ecf\u8fc7\u5927\u91cf\u7684\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u4e5f\u662f\u5982\u6b64\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u65b9\u6cd5\uff0c\u5373\u201c\u767d\u677f\u601d\u7ef4\u63d0\u793a\u201d\uff0c\u6765\u89e3\u9501\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8de8\u6a21\u6001\u4e2d\u7684\u89c6\u89c9\u63a8\u7406\u80fd\u529b\u3002\u767d\u677f\u601d\u7ef4\u63d0\u793a\u4e3a\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6bd4\u55bb\u6027\u7684\u201c\u767d\u677f\u201d\uff0c\u8ba9\u5176\u4ee5\u56fe\u50cf\u5f62\u5f0f\u5c55\u73b0\u63a8\u7406\u6b65\u9aa4\uff0c\u7136\u540e\u5c06\u8fd9\u4e9b\u56fe\u50cf\u8fd4\u56de\u6a21\u578b\u8fdb\u884c\u8fdb\u4e00\u6b65\u5904\u7406\u3002\u6211\u4eec\u53d1\u73b0\u8fd9\u79cd\u65b9\u6cd5\u65e0\u9700\u793a\u8303\u6216\u4e13\u7528\u6a21\u5757\uff0c\u800c\u662f\u5229\u7528\u6a21\u578b\u73b0\u6709\u7684\u4f7f\u7528Matplotlib\u548cTurtle\u7b49\u5e93\u7f16\u5199\u4ee3\u7801\u7684\u80fd\u529b\u3002\u8fd9\u4e2a\u7b80\u5355\u7b56\u7565\u5728\u56db\u4e2a\u6d89\u53ca\u89c6\u89c9\u548c\u7a7a\u95f4\u63a8\u7406\u7684\u56f0\u96be\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u7ed3\u679c\u3002\u6211\u4eec\u53d1\u73b0
\uff0c\u4e0e\u94fe\u5f0f\u601d\u8003\u76f8\u6bd4\uff0cGPT-4o\u5728\u67d0\u4e9b\u573a\u666f\u4e0b\u5927\u5e45\u5931\u8d25\uff0c\u5305\u62ec\u4e00\u4e9b\u51c6\u786e\u7387\u4e3a0%\u7684\u60c5\u51b5\u4e0b\uff0c\u800c\u767d\u677f\u601d\u7ef4\u63d0\u793a\u80fd\u63d0\u5347\u81f3\u9ad8\u8fbe92%\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u8be6\u7ec6\u63a2\u8ba8\u4e86\u8be5\u6280\u672f\u7684\u6210\u529f\u4e4b\u5904\u53ca\u5176\u9519\u8bef\u6765\u6e90\u3002|\n", "2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556](http://arxiv.org/abs/2406.14556)|**[link](https://github.com/memberre/asyncdriver)**|\u5c3d\u7ba1\u5b9e\u65f6\u89c4\u5212\u5668\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u4e3a\u63d0\u9ad8\u8fd0\u52a8\u89c4\u5212\u7684\u53ef\u89e3\u91ca\u6027\u548c\u53ef\u63a7\u6027\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002\u7136\u800c\uff0cLLM\u9a71\u52a8\u7684\u89c4\u5212\u5668\u4ecd\u9762\u4e34\u8d44\u6e90\u6d88\u8017\u5927\u548c\u63a8\u7406\u65f6\u95f4\u957f\u7684\u95ee\u9898\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5b9e\u7528\u90e8\u7f72\u3002\u9274\u4e8e\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AsyncDriver\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u5f02\u6b65LLM\u589e\u5f3a\u7684\u95ed\u73af\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u7684\u4e0e\u573a\u666f\u76f8\u5173\u7684\u6307\u4ee4\u7279\u5f81\uff0c\u6307\u5bfc\u5b9e\u65f6\u89c4\u5212\u5668\u8fdb\u884c\u7cbe\u786e\u548c\u53ef\u63a7\u7684\u8f68\u8ff9\u9884\u6d4b\u3002AsyncDriver\u5c55\u793a\u4e86LLMs\u5728\u7406\u89e3\u548c\u5904\u7406\u5411\u91cf\u5316\u573a\u666f\u6570\u636e\u53ca\u4e00\u7cfb\u5217\u8def\u7ebf\u6307\u793a\u65b9\u9762\u7684\u5f3a\u5927\u80fd\u529b\uff0c\u540c\u65f6\u901a\u8fc7\u5f02\u6b65\u8bbe\u8ba1\uff0c\u6709\u6548\u964d\u4f4e\u4e86LLM\u5e26\u6765\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u4fdd\u6301\
u4e86\u4e0e\u4e4b\u76f8\u8fd1\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728nuPlan\u7684\u590d\u6742\u573a\u666f\u4e2d\u5b9e\u73b0\u4e86\u66f4\u4f18\u7684\u95ed\u73af\u8bc4\u4f30\u6027\u80fd\u3002|\n", "2406.14550": "|**2024-06-20**|**GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models**|Shilong Li et.al.|[2406.14550](http://arxiv.org/abs/2406.14550)|null|\u957f\u6587\u672c\u5904\u7406\u80fd\u529b\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u5bf9\u590d\u6742\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5df2\u6709\u591a\u65b9\u52aa\u529b\u4f18\u5316LLMs\u5904\u7406\u957f\u8f93\u5165\uff0c\u4f46\u4f9d\u7136\u9762\u4e34\u6311\u6218\u3002\u672c\u6587\u63d0\u51faGraphReader\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u56fe\u7684\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u6784\u5efa\u6587\u672c\u56fe\u5e76\u8ba9\u4ee3\u7406\u81ea\u4e3b\u63a2\u7d22\u6765\u5904\u7406\u957f\u6587\u672c\u3002\u5f53\u63a5\u6536\u5230\u95ee\u9898\u65f6\uff0c\u4ee3\u7406\u4f1a\u9010\u6b65\u5206\u6790\u5e76\u5236\u5b9a\u5408\u7406\u8ba1\u5212\uff0c\u7136\u540e\u8c03\u7528\u9884\u5b9a\u4e49\u51fd\u6570\u8bfb\u53d6\u8282\u70b9\u5185\u5bb9\u548c\u90bb\u5c45\u4fe1\u606f\uff0c\u5b9e\u73b0\u4ece\u7c97\u5230\u7ec6\u7684\u56fe\u63a2\u7d22\u3002\u5728\u63a2\u7d22\u8fc7\u7a0b\u4e2d\uff0c\u4ee3\u7406\u4e0d\u65ad\u8bb0\u5f55\u65b0\u53d1\u73b0\u5e76\u53cd\u601d\u5f53\u524d\u60c5\u51b5\uff0c\u4ee5\u4f18\u5316\u83b7\u53d6\u4fe1\u606f\u7684\u8fc7\u7a0b\uff0c\u76f4\u5230\u6536\u96c6\u8db3\u591f\u4fe1\u606f\u751f\u6210\u7b54\u6848\u3002\u5728LV-Eval\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u75284k\u4e0a\u4e0b\u6587\u7a97\u53e3\u7684GraphReader\u572816k\u5230256k\u7684\u957f\u6587\u672c\u957f\u5ea6\u4e0a\uff0c\u76f8\u5bf9\u4e8eGPT-4-128k\u6709\u663e\u8457\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u56db\u4e2a\u5355\u8df3\u548c\u
591a\u8df3\u7684\u6311\u6218\u6027\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.14549": "|**2024-06-20**|**Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models**|Sunny Duan et.al.|[2406.14549](http://arxiv.org/abs/2406.14549)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5174\u8d77\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u53d1\u751f\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u4f46\u8fd9\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u9690\u79c1\u548c\u5b89\u5168\u7684\u91cd\u5927\u5fe7\u8651\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u5305\u542b\u6f5c\u5728\u654f\u611f\u6216\u4e13\u6709\u4fe1\u606f\u7684\u5927\u91cf\u8bed\u6599\u5e93\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u6570\u636e\u6cc4\u9732\u7684\u98ce\u9669\u2014\u2014\u5373\u6a21\u578b\u54cd\u5e94\u63ed\u793a\u90e8\u5206\u4fe1\u606f\u2014\u2014\u5c1a\u4e0d\u4e3a\u4eba\u5145\u5206\u7406\u89e3\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u673a\u5668\u5b66\u4e60\u6a21\u578b\u4e2d\u7684\u8bb0\u5fc6\u73b0\u8c61\uff0c\u7279\u522b\u662f\u5173\u6ce8\u5176\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u6f14\u53d8\u3002\u6211\u4eec\u8c03\u67e5\u4e86\u8bad\u7ec3\u6570\u636e\u7684\u7edf\u8ba1\u7279\u6027\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5185\u7f16\u7801\u7684\u8bb0\u5fc6\uff0c\u901a\u8fc7\u8bc4\u4f30\u91cd\u590d\u5bf9\u8bb0\u5fc6\u7684\u5f71\u54cd\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6a21\u578b\u8bb0\u4f4f\u4e00\u4e2a\u5e8f\u5217\u7684\u6982\u7387\u4e0e\u5b83\u5728\u6570\u636e\u4e2d\u51fa\u73b0\u7684\u6b21\u6570\u5448\u5bf9\u6570\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u5373\u4f7f\u6ca1\u6709\u540e\u7eed\u7684\u63a5\u89e6\uff0c\u67d0\u4e9b\u770b\u4f3c\u672a\u88ab\u8bb0\u4f4f\u7684\u5e8f\u5217\u4e5f\u53ef\u80fd\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u9010\u6e10\u663e\u73b0\u3002\u8fd9\u79cd\u9690\u85cf\u7684\u5df2\u8bb0\u4f4f\u5e8f\u5217\u5bf9\u6570\u636e\u9690\u79c1\u6784\u6210\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u
53ef\u80fd\u9690\u85cf\u5728\u6a21\u578b\u7684\u6700\u7ec8\u68c0\u67e5\u70b9\u4e2d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8bca\u65ad\u6d4b\u8bd5\uff0c\u901a\u8fc7\u8003\u8651\u5b83\u4eec\u7684\u4ea4\u53c9\u71b5\u635f\u5931\u6765\u63ed\u793a\u8fd9\u4e9b\u6f5c\u5728\u7684\u8bb0\u5fc6\u5e8f\u5217\u3002|\n", "2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein et.al.|[2406.14546](http://arxiv.org/abs/2406.14546)|**[link](https://github.com/choidami/inductive-oocr)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b89\u5168\u98ce\u9669\uff0c\u4e00\u4e2a\u7b56\u7565\u662f\u4ece\u5176\u8bad\u7ec3\u6570\u636e\u4e2d\u5220\u9664\u5371\u9669\u77e5\u8bc6\u3002\u5c3d\u7ba1\u8fd9\u6d88\u9664\u4e86\u663e\u6027\u4fe1\u606f\uff0c\u4f46\u9690\u6027\u4fe1\u606f\u53ef\u80fd\u4ecd\u6563\u843d\u5728\u591a\u4e2a\u8bad\u7ec3\u6587\u6863\u4e2d\u3002\u6211\u4eec\u7814\u7a76\u7684\u95ee\u9898\u662f\uff1aLLMs\u80fd\u5426\u901a\u8fc7\u62fc\u51d1\u8fd9\u4e9b\u9690\u542b\u7ebf\u7d22\uff0c\u63a8\u65ad\u51fa\u88ab\u5c4f\u853d\u7684\u77e5\u8bc6\uff1f\u4e3a\u6b64\uff0c\u6211\u4eec\u4e13\u6ce8\u4e8e\u65e0\u4e0a\u4e0b\u6587\u5f52\u7eb3\u63a8\u7406\uff08Inductive Out-of-Context 
Reasoning\uff0cOOCR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u6cdb\u5316\u80fd\u529b\uff0c\u8981\u6c42LLMs\u6839\u636e\u5206\u5e03\u5728\u8bad\u7ec3\u6587\u6863\u4e2d\u7684\u8bc1\u636e\u63a8\u65ad\u6f5c\u5728\u4fe1\u606f\uff0c\u5e76\u5728\u65e0\u9700\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u60c5\u51b5\u4e0b\u5e94\u7528\u4e8e\u4e0b\u6e38\u4efb\u52a1\u3002\u901a\u8fc7\u4e94\u4e2a\u4efb\u52a1\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u524d\u6cbfLLMs\u786e\u5b9e\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u5728\u4e00\u9879\u5b9e\u9a8c\u4e2d\uff0c\u4ec5\u5bf9\u4e00\u4e2a\u672a\u77e5\u57ce\u5e02\u4e0e\u5176\u4e0e\u5176\u4ed6\u5df2\u77e5\u57ce\u5e02\u4e4b\u95f4\u7684\u8ddd\u79bb\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u6ca1\u6709\u793a\u4f8b\u6216\u94fe\u5f0f\u601d\u8003\uff0c\u8be5LLM\u4e5f\u80fd\u8868\u8ff0\u51fa\u672a\u77e5\u57ce\u5e02\u662f\u5df4\u9ece\uff0c\u5e76\u636e\u6b64\u89e3\u7b54\u540e\u7eed\u95ee\u9898\u3002\u8fdb\u4e00\u6b65\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u63a5\u53d7\u5355\u4e2a\u786c\u5e01\u629b\u63b7\u7ed3\u679c\u8bad\u7ec3\u7684LLMs\u80fd\u5224\u65ad\u786c\u5e01\u662f\u5426\u504f\u659c\uff0c\u800c\u53ea\u63a5\u89e6$(x, f(x))$\u5bf9\u7684\u6a21\u578b\u80fd\u9610\u8ff0$f$\u7684\u5b9a\u4e49\u5e76\u8ba1\u7b97\u9006\u8fd0\u7b97\u3002\u867d\u7136OOCR\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8868\u73b0\u826f\u597d\uff0c\u4f46\u6211\u4eec\u4e5f\u53d1\u73b0\u5b83\u5e76\u4e0d\u603b\u662f\u53ef\u9760\u7684\uff0c\u7279\u522b\u662f\u5728\u5c0f\u578bLLMs\u5b66\u4e60\u590d\u6742\u7ed3\u6784\u65f6\u3002\u603b\u7684\u6765\u8bf4\uff0cLLMs\u65e0\u9700\u660e\u786e\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u5c31\u80fd\u201c\u4e32\u8054\u8d77\u201d\u4fe1\u606f\uff0c\u8fd9\u7ed9\u76d1\u63a7\u548c\u63a7\u5236\u5b83\u4eec\u83b7\u53d6\u7684\u77e5\u8bc6\u5e26\u6765\u4e86\u6f5c\u5728\u6311\u6218\u3002**|\n", "2406.14545": "|**2024-06-20**|**Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in 
Text-to-SQL Systems**|\u0110or\u0111e Klisura et.al.|[2406.14545](http://arxiv.org/abs/2406.14545)|null|\u5173\u7cfb\u6570\u636e\u5e93\u5728\u73b0\u4ee3\u4fe1\u606f\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u662f\u5b58\u50a8\u3001\u67e5\u8be2\u548c\u7ba1\u7406\u6570\u636e\u7684\u6838\u5fc3\u3002\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u6587\u672c\u5230SQL\u6280\u672f\u5d2d\u9732\u5934\u89d2\uff0c\u6781\u5927\u5730\u63d0\u5347\u4e86\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u9690\u79c1\u548c\u5b89\u5168\u7684\u62c5\u5fe7\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e13\u6ce8\u4e8e\u63d0\u53d6\u6587\u672c\u5230SQL\u6a21\u578b\u6240\u4f9d\u8d56\u7684\u6570\u636e\u5e93\u6a21\u5f0f\u5143\u7d20\u3002\u4e86\u89e3\u6a21\u5f0f\u53ef\u80fd\u4f7fSQL\u6ce8\u5165\u653b\u51fb\u66f4\u4e3a\u5bb9\u6613\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u96f6\u77e5\u8bc6\u6846\u67b6\uff0c\u901a\u8fc7\u63d0\u51fa\u7cbe\u5fc3\u6784\u9020\u7684\u95ee\u9898\uff0c\u65e0\u9700\u76f4\u63a5\u4e86\u89e3\u6570\u636e\u5e93\uff0c\u8be5\u6846\u67b6\u80fd\u4fc3\u4f7f\u8fd9\u4e9b\u6a21\u578b\u5904\u7406\u8fd9\u4e9b\u95ee\u9898\u5e76\u751f\u6210\u8f93\u51fa\uff0c\u4ece\u800c\u63ed\u793a\u6570\u636e\u5e93\u6a21\u5f0f\u7ed3\u6784\u3002\u6211\u4eec\u5c06\u6b64\u65b9\u6cd5\u5e94\u7528\u4e8e\u9488\u5bf9\u6587\u672c-SQL\u5bf9\u8fdb\u884c\u8fc7\u5fae\u8c03\u7684\u4e13\u7528\u6587\u672c\u5230SQL\u6a21\u578b\u4ee5\u53ca\u7528\u4e8eSQL\u751f\u6210\u7684\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u80fd\u591f\u4ee5\u63a5\u8fd10.75\u7684F1\u5206\u6570\u91cd\u6784\u8868\u540d\uff0c\u800c\u5bf9\u4e8e\u751f\u6210\u5f0f\u6a21\u578b\uff0c\u8fd9\u4e00\u5206\u6570\u66f4\u662f\u9ad8\u8fbe0.96\u3002|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of 
VLMs**|Yuxuan Qiao et.al.|[2406.14544](http://arxiv.org/abs/2406.14544)|**[link](https://github.com/sparksjoe/prism)**|**\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5904\u7406\u5404\u79cd\u89c6\u89c9\u95ee\u9898\u65f6\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\uff0c\u8fd9\u8981\u6c42\u6a21\u578b\u5177\u5907\u5f3a\u5927\u7684\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u611f\u77e5\u548c\u63a8\u7406\u5728\u73b0\u6709VLM\u4e2d\u7684\u4ea4\u7ec7\u6027\uff0c\u72ec\u7acb\u8bc4\u4f30\u8fd9\u4e24\u65b9\u9762\u7684\u80fd\u529b\u9887\u5177\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\u2014\u2014Prism\uff0c\u65e8\u5728\u5206\u79bb\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u5728\u89c6\u89c9\u95ee\u7b54\u4e2d\u7684\u4f5c\u7528\u3002Prism\u5206\u4e3a\u4e24\u4e2a\u9636\u6bb5\uff1a\u611f\u77e5\u9636\u6bb5\u5229\u7528VLM\u63d0\u53d6\u5e76\u4ee5\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u89c6\u89c9\u4fe1\u606f\uff1b\u63a8\u7406\u9636\u6bb5\u5219\u6839\u636e\u63d0\u53d6\u7684\u89c6\u89c9\u4fe1\u606f\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u54cd\u5e94\u3002\u8fd9\u79cd\u6a21\u5757\u5316\u8bbe\u8ba1\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u7cfb\u7edf\u5730\u6bd4\u8f83\u548c\u8bc4\u4f30\u4e0d\u540cVLM\u7684\u611f\u77e5\u548c\u63a8\u7406\u6027\u80fd\u3002 
\u6211\u4eec\u7684\u5206\u6790\u6846\u67b6\u63d0\u4f9b\u4e86\u8bf8\u591a\u6d1e\u89c1\uff0c\u8bc1\u660e\u4e86Prism\u4f5c\u4e3a\u6210\u672c\u6548\u76ca\u9ad8\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u89e3\u51b3\u65b9\u6848\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5c06\u4e13\u6ce8\u4e8e\u611f\u77e5\u7684\u7b80\u5316VLM\u4e0e\u4e13\u4e3a\u63a8\u7406\u8bbe\u8ba1\u7684\u5f3a\u5927LLM\u76f8\u7ed3\u5408\uff0cPrism\u5728\u901a\u7528\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4f18\u5f02\u6210\u7ee9\uff0c\u540c\u65f6\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u548c\u8fd0\u8425\u6210\u672c\u3002\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\uff0c\u5f53Prism\u914d\u5907\u57fa\u7840\u76842B LLaVA VLM\u548c\u53ef\u514d\u8d39\u4f7f\u7528\u7684GPT-3.5\u65f6\uff0c\u5176\u5728\u4e25\u8c28\u7684\u591a\u6a21\u6001\u57fa\u51c6MMStar\u4e0a\u7684\u8868\u73b0\u53ef\u4e0e\u5927\u5341\u500d\u7684VLM\u76f8\u5f53\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728\uff1ahttps://github.com/SparksJoe/Prism\u3002**|\n", "2406.14541": "|**2024-06-21**|**Are LLMs Naturally Good at Synthetic Tabular Data Generation?**|Shengzhe Xu 
et.al.|[2406.14541](http://arxiv.org/abs/2406.14541)|**[link](https://github.com/anonymou9167/anonymouscode)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u548c\u56fe\u50cf\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u5728\u751f\u6210\u6700\u5e38\u89c1\u7684\u6570\u636e\u7c7b\u578b\u2014\u2014\u8868\u683c\u6570\u636e\u65b9\u9762\u7684\u6f5c\u529b\u5374\u9c9c\u6709\u7814\u7a76\u3002\u8fd9\u7bc7\u8bba\u6587\u6307\u51fa\uff0c\u76f4\u63a5\u4f7f\u7528\u6216\u7ecf\u8fc7\u4f20\u7edf\u5fae\u8c03\u7684LLMs\u5728\u4f5c\u4e3a\u5408\u6210\u8868\u683c\u751f\u6210\u5668\u65f6\u8868\u73b0\u6781\u5dee\u3002\u7531\u4e8eLLMs\u7684\u81ea\u56de\u5f52\u7279\u6027\uff0c\u968f\u673a\u987a\u5e8f\u6392\u5217\u7684\u5fae\u8c03\u4e0e\u6355\u6349\u529f\u80fd\u6027\u4f9d\u8d56\u7684\u91cd\u8981\u6027\u76f8\u6096\uff0c\u5bfc\u81f4\u5b83\u4eec\u65e0\u6cd5\u5904\u7406\u6761\u4ef6\u6df7\u5408\u5206\u5e03\uff08\u8fd9\u662f\u53cd\u6620\u73b0\u5b9e\u4e16\u754c\u7ea6\u675f\u7684\u5173\u952e\uff09\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u4f7fLLMs\u53d8\u5f97\u611f\u77e5\u6392\u5217\u987a\u5e8f\u6765\u6539\u5584\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u4ece\u800c\u63d0\u5347\u5176\u6027\u80fd\u3002**|\n", "2406.14517": "|**2024-06-20**|**PostMark: A Robust Blackbox Watermark for Large Language Models**|Yapei Chang 
et.al.|[2406.14517](http://arxiv.org/abs/2406.14517)|**[link](https://github.com/lilakk/postmark)**|**\u6700\u6709\u6548\u7684\u68c0\u6d4b\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6587\u672c\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u5728\u89e3\u7801\u8fc7\u7a0b\u4e2d\u63d2\u5165\u53ef\u8bc6\u522b\u7684\u6807\u8bb0\uff0c\u5373\u6c34\u5370\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u83b7\u53d6\u5230LLM\u7684\u539f\u59cb\u6982\u7387\uff08logits\uff09\uff0c\u8fd9\u4f7f\u5f97LLM\u670d\u52a1\u63d0\u4f9b\u5546\u4e0d\u613f\u5206\u4eab\uff0c\u56e0\u4e3a\u62c5\u5fc3\u6a21\u578b\u6cc4\u9732\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u6c34\u5370\u9700\u8981\u6bcf\u4e2a\u63d0\u4f9b\u8005\u72ec\u7acb\u5f00\u53d1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u540e\u5904\u7406\u6c34\u5370\u65b9\u6848\uff0c\u540d\u4e3aPostMark\u3002\u5b83\u662f\u4e00\u79cd\u6a21\u5757\u5316\u7684\u3001\u751f\u6210\u540e\u63d2\u5165\u7684\u6c34\u5370\u7b56\u7565\uff0c\u65e0\u9700\u89e6\u53calogits\uff0c\u9002\u5408\u7b2c\u4e09\u65b9\u5b9e\u65bd\u3002PostMark\u8868\u73b0\u51fa\u66f4\u5f3a\u7684\u5bf9\u6297\u540c\u4e49\u53e5\u653b\u51fb\u80fd\u529b\uff1a\u6211\u4eec\u5728\u5b9e\u9a8c\u4e2d\u6db5\u76d6\u4e86\u516b\u4e2a\u57fa\u7840\u7b97\u6cd5\u3001\u4e94\u4e2a\u57fa\u7ebfLLM\u548c\u4e09\u4e2a\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86PostMark\u5bf9\u6587\u672c\u8d28\u91cf\u7684\u5f71\u54cd\uff0c\u5305\u62ec\u81ea\u52a8\u5316\u548c\u4eba\u5de5\u8bc4\u4f30\uff0c\u63a2\u8ba8\u4e86\u8d28\u91cf\u548c\u6297\u6539\u5199\u653b\u51fb\u4e4b\u95f4\u7684\u6743\u8861\u3002\u7814\u7a76\u4ee3\u7801\u3001\u8f93\u51fa\u548c\u6ce8\u91ca\u5df2\u516c\u5f00\u5728https://github.com/lilakk/PostMark\u3002**|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu 
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**\u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u4ece\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u4e2d\u8bc6\u522b\u75be\u75c5\u76f8\u5173\u57fa\u56e0\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u5f80\u5f80\u9700\u8981\u6df1\u539a\u7684\u4e13\u957f\u548c\u5927\u91cf\u7684\u4eba\u5de5\u52aa\u529b\uff0c\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u663e\u793a\u51fa\u5728\u81ea\u52a8\u5316\u6b64\u7c7b\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u7c7b\u65b9\u6cd5\u7684\u8bc4\u4f30\u548c\u53d1\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86GenoTEX\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u5206\u6790\u81ea\u52a8\u63a2\u7d22\u7684\u57fa\u51c6\uff0c\u5305\u62ec\u6570\u636e\u96c6\u9009\u62e9\u3001\u9884\u5904\u7406\u548c\u7edf\u8ba1\u5206\u6790\u4efb\u52a1\u3002GenoTEX\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u7ba1\u9053\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u4eba\u7c7b\u751f\u7269\u4fe1\u606f\u5b66\u5bb6\u7cbe\u5fc3\u7f16\u5199\u7684\u6ce8\u91ca\uff0c\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u6df1\u5165\u5206\u6790\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u3002 
\u4e3a\u4e86\u63d0\u4f9b\u8fd9\u4e9b\u4efb\u52a1\u7684\u57fa\u7ebf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86GenoAgents\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u56e2\u961f\uff0c\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u89c4\u5212\u3001\u8fed\u4ee3\u6821\u6b63\u4ee5\u53ca\u4e0e\u9886\u57df\u4e13\u5bb6\u54a8\u8be2\u7684\u80fd\u529b\uff0c\u5b83\u4eec\u534f\u4f5c\u63a2\u7d22\u57fa\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u663e\u793a\u4e86LLM\u9a71\u52a8\u65b9\u6cd5\u5728\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u4e2d\u7684\u6f5c\u529b\uff0c\u800c\u9519\u8bef\u5206\u6790\u6307\u51fa\u4e86\u6311\u6218\u548c\u672a\u6765\u7684\u6539\u8fdb\u65b9\u5411\u3002\u6211\u4eec\u63d0\u8baeGenoTEX\u4f5c\u4e3a\u4e00\u4e2a\u6709\u524d\u666f\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8861\u91cf\u548c\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1ahttps://github.com/Liu-Hy/GenoTex\u3002**|\n", "2406.15330": "|**2024-06-21**|**Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance**|Haoling Li 
et.al.|[2406.15330](http://arxiv.org/abs/2406.15330)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u4f17\u591a\u7814\u7a76\u9886\u57df\u5e26\u6765\u4e86\u9769\u65b0\u3002\u5c3d\u7ba1\u4eba\u4eec\u666e\u904d\u77e5\u9053\u5fae\u8c03\u5bf9\u4e8e\u589e\u5f3aLLMs\u7684\u529f\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u73b0\u6709\u7814\u7a76\u8868\u660e\uff0c\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u5b58\u5728\u53c2\u6570\u5197\u4f59\u3002\u56e0\u6b64\uff0c\u6709\u7814\u7a76\u5efa\u8bae\u53ea\u66f4\u65b0\u90e8\u5206\u53c2\u6570\uff0c\u4f46\u8fd9\u672a\u80fd\u6709\u6548\u5229\u7528\u4efb\u52a1\u7279\u5b9a\u4fe1\u606f\u6765\u8bc6\u522b\u8bad\u7ec3\u4e2d\u7684\u91cd\u8981\u53c2\u6570\u3002\u8003\u8651\u5230\u68af\u5ea6\u672c\u8d28\u4e0a\u8574\u542b\u7740\u4efb\u52a1\u76f8\u5173\u6570\u636e\u7684\u4fe1\u606f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68af\u5ea6\u63a9\u7801\u8c03\u4f18\uff08Gradient-Mask Tuning\uff0cGMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6839\u636e\u53c2\u6570\u7684\u68af\u5ea6\u4fe1\u606f\u9009\u62e9\u6027\u5730\u8fdb\u884c\u8bad\u7ec3\u66f4\u65b0\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u8ba1\u7b97\u68af\u5ea6\u7684\u7edd\u5bf9\u503c\uff0c\u5e76\u5bf9\u8f83\u5c0f\u5e45\u5ea6\u7684\u68af\u5ea6\u5e94\u7528\u63a9\u7801\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGMT\u4e0d\u4ec5\u4f18\u4e8e\u4f20\u7edf\u7684\u5fae\u8c03\u65b9\u6cd5\uff0c\u8fd8\u63d0\u5347\u4e86LLM\u6027\u80fd\u7684\u4e0a\u9650\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u663e\u793a\uff0cGMT\u5bf9\u63a9\u7801\u6bd4\u4f8b\u5177\u6709\u4e00\u5b9a\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u4e0e\u57fa\u672c\u7684\u5fae\u8c03\uff08Supervised Fine-Tuning\uff0cSFT\uff09\u76f8\u5f53\u3002|\n", "2406.15325": "|**2024-06-21**|**Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks**|Hokyung Lee 
et.al.|[2406.15325](http://arxiv.org/abs/2406.15325)|**[link](https://github.com/hamminghq/bug-in-the-code-stack)**|\u8fd1\u5e74\u6765\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6d77\u91cf\u6587\u672c\u6587\u6863\u4e2d\u68c0\u7d22\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684Needle-in-a-Haystack\uff08NIAH\uff09\u57fa\u51c6\u7814\u7a76\u6709\u6240\u8fdb\u5c55\u3002\u968f\u7740LLMs\u5728\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u65e5\u76ca\u878d\u5408\uff0c\u8bc4\u4f30\u5b83\u4eec\u5728\u4ee3\u7801\u73af\u5883\u4e2d\u7684\u8868\u73b0\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740LLMs\u671d\u7740\u7a0b\u5e8f\u5408\u6210\u65b9\u5411\u53d1\u5c55\uff0c\u5fc5\u987b\u786e\u4fdd\u5b83\u4eec\u80fd\u7406\u89e3\u8bed\u6cd5\u5e76\u7f16\u5199\u51fa\u7b26\u5408\u8bed\u6cd5\u89c4\u5219\u7684\u4ee3\u7801\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86Bug In The Code Stack\uff08BICS\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u68c0\u9a8cLLMs\u8bc6\u522b\u7b80\u5355\u8bed\u6cd5\u9519\u8bef\u7684\u80fd\u529b\u4e8e\u5927\u578b\u6e90\u4ee3\u7801\u4e2d\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\u4e09\u4e2a\u5173\u952e\u70b9\uff1a\uff081\uff09\u4e0e\u6587\u672c\u73af\u5883\u76f8\u6bd4\uff0c\u57fa\u4e8e\u4ee3\u7801\u7684\u73af\u5883\u5bf9\u68c0\u7d22\u4efb\u52a1\u6784\u6210\u4e86\u66f4\u5927\u7684\u6311\u6218\uff1b\uff082\uff09\u4e0d\u540c\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff1b\uff083\uff09\u5c3d\u7ba1\u5982\u6b64\uff0c\u8f83\u957f\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e0e\u6027\u80fd\u4e0b\u964d\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u8fd9\u79cd\u4e0b\u964d\u7a0b\u5ea6\u5728\u4e0d\u540c\u7684\u6a21\u578b\u95f4\u6709\u6240\u4e0d\u540c\u3002|\n", "2406.15264": "|**2024-06-21**|**Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics**|Weijia Zhang 
et.al.|[2406.15264](http://arxiv.org/abs/2406.15264)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u53ef\u9760\u6216\u96be\u4ee5\u9a8c\u8bc1\u7684\u4fe1\u606f\uff0c\u5373\u201c\u5e7b\u89c9\u201d\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u7684LLMs\u5f15\u5165\u4e86\u5f15\u7528\uff0c\u4f7f\u5185\u5bb9\u57fa\u4e8e\u53ef\u6838\u67e5\u7684\u6765\u6e90\u3002\u7136\u800c\uff0c\u624b\u52a8\u8bc4\u4f30\u5f15\u7528\u662f\u5426\u5145\u5206\u652f\u6301\u76f8\u5173\u9648\u8ff0\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u5fe0\u5b9e\u5ea6\u6307\u6807\u81ea\u52a8\u4f30\u8ba1\u5f15\u7528\u7684\u652f\u6301\u7a0b\u5ea6\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u9650\u4e8e\u4e8c\u5206\u7c7b\uff0c\u5ffd\u89c6\u4e86\u5b9e\u9645\u573a\u666f\u4e2d\u5bf9\u7cbe\u7ec6\u7ea7\u522b\u5f15\u7528\u652f\u6301\u7684\u8003\u91cf\u3002\u4e3a\u4e86\u63a2\u7a76\u5fe0\u5b9e\u5ea6\u6307\u6807\u5728\u7cbe\u7ec6\u7ea7\u522b\u8bc4\u4f30\u4e2d\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6bd4\u8f83\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8e\u68c0\u9a8c\u8fd9\u4e9b\u6307\u6807\u5728\u533a\u5206\u4e09\u79cd\u652f\u6301\u7b49\u7ea7\uff08\u5168\u9762\u652f\u6301\u3001\u90e8\u5206\u652f\u6301\u548c\u4e0d\u652f\u6301\uff09\u65b9\u9762\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u6846\u67b6\u91c7\u7528\u76f8\u5173\u6027\u5206\u6790\u3001\u5206\u7c7b\u8bc4\u4f30\u548c\u68c0\u7d22\u8bc4\u4f30\uff0c\u5168\u65b9\u4f4d\u8861\u91cf\u6307\u6807\u5206\u6570\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u4e00\u81f4\u6027\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u6ca1\u6709\u5355\u4e00\u6307\u6807\u5728\u6240\u6709\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u63ed\u793a\u4e86\u7cbe\u7ec6\u7ea7\u522b\u652f\u6301\u8bc4\u4f30\u7684\u590d\u6742\u6027\u3002\u6839\u636e\u53d1\u73b0\u7684\u7ed3\u679c\u
ff0c\u6211\u4eec\u4e3a\u5f00\u53d1\u66f4\u6709\u6548\u7684\u6307\u6807\u63d0\u4f9b\u4e86\u5b9e\u7528\u5efa\u8bae\u3002|\n", "2406.15231": "|**2024-06-21**|**Detecting Synthetic Lyrics with Few-Shot Inference**|Yanis Labrak et.al.|[2406.15231](http://arxiv.org/abs/2406.15231)|null|\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u7684\u97f3\u4e50\u5185\u5bb9\u9010\u6e10\u53d7\u5230\u5173\u6ce8\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u88ab\u6709\u6548\u5e94\u7528\u4e8e\u521b\u4f5c\u5404\u79cd\u98ce\u683c\u3001\u4e3b\u9898\u548c\u8bed\u8a00\u7ed3\u6784\u7684\u6b4c\u8bcd\uff0c\u8fd9\u63a8\u52a8\u4e86\u827a\u672f\u5bb6\u4eec\u7684\u521b\u4f5c\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u7248\u6743\u4fb5\u72af\u3001\u6d88\u8d39\u8005\u6ee1\u610f\u5ea6\u548c\u5185\u5bb9\u6ee5\u53d1\u7b49\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u68c0\u6d4b\u751f\u6210\u6b4c\u8bcd\u7684\u65b9\u6cd5\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u5e76\u672a\u4e13\u6ce8\u4e8e\u8fd9\u4e00\u7279\u5b9a\u9886\u57df\u6216\u521b\u610f\u6587\u672c\u7684\u673a\u5668\u751f\u6210\u5185\u5bb9\u68c0\u6d4b\u3002\u9488\u5bf9\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u7cbe\u5fc3\u6784\u5efa\u4e86\u9996\u4e2a\u9ad8\u8d28\u91cf\u5408\u6210\u6b4c\u8bcd\u6570\u636e\u96c6\uff0c\u5e76\u5bf9\u591a\u79cd\u57fa\u4e8e\u5c11\u91cf\u6837\u672c\u7684\u68c0\u6d4b\u65b9\u6cd5\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u5b9a\u91cf\u8bc4\u4f30\uff0c\u6d4b\u8bd5\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5e76\u8f85\u4ee5\u4eba\u7c7b\u8bc4\u4ef7\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6700\u4f73\u5c11\u6570\u6837\u672c\u68c0\u6d4b\u5668\u2014\u2014\u57fa\u4e8eLLM2Vec\u7684\u65b9\u6cd5\u8d85\u8d8a\u4e86\u5728\u5176\u4ed6\u9886\u57df\u8868\u73b0\u5f3a\u52b2\u7684\u98ce\u683c\u548c\u7edf\u8ba1\u65b9\u6cd5\uff0c\u6210\u529f\u9274\u522b\u51fa\u4eba\u7c7b\u521b\u4f5c\u4e0e\u673a\u5668\u751f\u6210\u7684\u6b4c\u8bcd\uff0c\u4e14\u5c55\u73b0\u51fa\u826f\u597d\u7684\u8de8\u827a\u672f\u5bb6\u548c\u6a21\u
578b\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u80fd\u6709\u6548\u8bc6\u522b\u751f\u6210\u540e\u7684\u4eba\u5de5\u6da6\u8272\u3002\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u5728\u521b\u610f\u5185\u5bb9\u68c0\u6d4b\u9886\u57df\uff0c\u7279\u522b\u662f\u6cdb\u5316\u80fd\u529b\u548c\u5bf9\u66f4\u5927\u6b4c\u66f2\u5e93\u7684\u9002\u5e94\u6027\u65b9\u9762\uff0c\u9700\u8981\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u6240\u6709\u6570\u636e\u96c6\u3001\u9884\u5904\u7406\u811a\u672c\u548c\u4ee3\u7801\u5df2\u516c\u5f00\u5728GitHub\u548cHugging Face\u4e0a\uff0c\u9075\u5faaApache 2.0\u8bb8\u53ef\u534f\u8bae\u3002|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga et.al.|[2406.15227](http://arxiv.org/abs/2406.15227)|**[link](https://github.com/hitz-zentroa/cn-eval)**|\u968f\u7740\u7f51\u7edc\u4e0a\u9519\u8bef\u4fe1\u606f\u548c\u6709\u5bb3\u8a00\u8bba\u7684\u589e\u591a\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u53cd\u53d9\u4e8b\uff08Counter Narrative\uff0cCN\uff09\u751f\u6210\u6280\u672f\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u5f80\u5f80\u7f3a\u4e4f\u53ef\u89e3\u91ca\u6027\uff0c\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u751f\u6210\u7684CN\u4e0e\u4eba\u7c7b\u611f\u77e5\u4e4b\u95f4\u7684\u590d\u6742\u5173\u7cfb\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u7684CN\uff0c\u5373\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language 
Model\uff0cLLM\uff09\u4f5c\u4e3a\u8bc4\u4f30\u5668\u3002\u901a\u8fc7\u4ee5\u9526\u6807\u8d5b\u5f62\u5f0f\u5bf9\u751f\u6210\u7684CN\u8fdb\u884c\u5bf9\u6218\u6bd4\u8f83\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u6a21\u578b\u6392\u540d\u6d41\u7a0b\uff0c\u5176\u4e0e\u4eba\u7c7b\u504f\u597d\u95f4\u7684\u76f8\u5173\u7cfb\u6570\u8fbe\u52300.88\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528LLM\u8fdb\u884c\u96f6\u6837\u672c\uff08Zero-Shot\uff0cZS\uff09CN\u751f\u6210\u7684\u80fd\u529b\uff0c\u5bf9\u6bd4\u5206\u6790\u4e86\u804a\u5929\u3001\u6307\u4ee4\u548c\u57fa\u7840\u6a21\u578b\u7684\u6027\u80fd\u548c\u5c40\u9650\u6027\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u8bc4\u4f30\uff0c\u5305\u62ec\u5fae\u8c03\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u9886\u57df\u6570\u636e\u4e0b\u7684\u54cd\u5e94\u5dee\u5f02\u3002\u7ed3\u8bba\u662f\uff0c\u5bf9\u4e8e\u6267\u884c\u8fd9\u9879\u4efb\u52a1\uff0c\u5982\u679c\u80fd\u907f\u514d\u56e0\u5b89\u5168\u987e\u8651\u800c\u62d2\u7edd\u751f\u6210\uff0c\u804a\u5929\u5bfc\u5411\u7684ZS\u6a21\u578b\u53ef\u80fd\u662f\u6700\u4f73\u9009\u62e9\u3002|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214](http://arxiv.org/abs/2406.15214)|null|
\u5bf9\u8bdd\u7b56\u7565\u5728\u6784\u5efa\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5176\u5f00\u53d1\u548c\u7ef4\u62a4\u5f80\u5f80\u9700\u8981\u5bf9\u8bdd\u5efa\u6a21\u4e13\u5bb6\u7684\u5927\u91cf\u6295\u5165\u3002\u5c3d\u7ba1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u624b\u5934\u6709\u5927\u91cf\u7684\u5bf9\u8bdd\u6570\u636e\uff0c\u4f46\u4eba\u4eec\u7f3a\u4e4f\u6709\u6548\u7684\u65b9\u6cd5\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5bf9\u8bdd\u7b56\u7565\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u901a\u8fc7\u5c55\u793a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u5728\u5bf9\u8bdd\u6570\u636e\u8f6c\u5316\u4e3a\u7edf\u4e00\u7684\u4e2d\u95f4\u8868\u793a\u2014\u2014\u89c4\u8303\u5f62\u5f0f\u7684\u8fc7\u7a0b\u4e2d\u53d1\u6325\u4f5c\u7528\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5229\u7528\u53ef\u63a7\u4e14\u53ef\u89e3\u91ca\u7684\u56fe\u57fa\u65b9\u6cd5\u751f\u6210\u5bf9\u8bdd\u7b56\u7565\u7684\u6280\u672f\u3002\u901a\u8fc7\u5c06\u5bf9\u8bdd\u4e2d\u7684\u89c4\u8303\u5f62\u5f0f\u6574\u5408\u6210\u6d41\u7a0b\u7f51\u7edc\uff0c\u6211\u4eec\u53d1\u73b0\u8fd0\u884c\u56fe\u904d\u5386\u7b97\u6cd5\u6709\u52a9\u4e8e\u63d0\u53d6\u5bf9\u8bdd\u6d41\u7a0b\u3002\u76f8\u6bd4\u4ec5\u4f9d\u8d56LLM\u63d0\u53d6\u7684\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6d41\u7a0b\u66f4\u597d\u5730\u53cd\u6620\u4e86\u5e95\u5c42\u4ea4\u4e92\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e8\u5728\u8d4b\u4e88\u5bf9\u8bdd\u8bbe\u8ba1\u8005\u66f4\u5927\u7684\u63a7\u5236\u529b\uff0c\u63d0\u4f9b\u4e00\u4e2a\u63d0\u5347\u5bf9\u8bdd\u7b56\u7565\u5f00\u53d1\u6548\u7387\u7684\u5de5\u5177\u3002|\n", "2406.15209": "|**2024-06-21**|**Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding**|Mohan Li et.al.|[2406.15209](http://arxiv.org/abs/2406.15209)|null|
\u96f6\u6837\u672c\u8bed\u97f3\u8bed\u8a00\u7406\u89e3\uff08SLU\uff09\u4f7f\u7cfb\u7edf\u80fd\u591f\u5728\u65e0\u9700\u5148\u524d\u8bad\u7ec3\u6570\u636e\u7684\u65b0\u9886\u57df\u7406\u89e3\u7528\u6237\u8bdd\u8bed\u3002\u5f53\u524d\u7684\u7814\u7a76\u5f80\u5f80\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5bfc\u81f4\u5e9e\u5927\u7684\u5b58\u50a8\u9700\u6c42\u548c\u590d\u6742\u6027\u3002\u672c\u6587\u63d0\u51fa\u4f7f\u7528 Whisper\uff0c\u4e00\u4e2a\u72ec\u7acb\u7684\u8bed\u97f3\u5904\u7406\u6a21\u578b\uff0c\u6765\u8fdb\u884c\u96f6\u6837\u672c\u7aef\u5230\u7aef\uff08E2E\uff09SLU\u3002\u4e3a\u5904\u7406\u672a\u89c1\u8fc7\u7684\u8bed\u4e49\u6807\u7b7e\uff0c\u6211\u4eec\u5c06SLU\u4efb\u52a1\u878d\u5165\u95ee\u7b54\uff08QA\uff09\u6846\u67b6\u4e2d\uff0c\u901a\u8fc7\u63d0\u793aWhisper\u89e3\u7801\u5668\u8fdb\u884c\u8bed\u4e49\u63a8\u65ad\u3002\u6211\u4eec\u91c7\u7528\u524d\u7f00\u8c03\u4f18\u65b9\u6cd5\u9ad8\u6548\u5730\u8bad\u7ec3\u8be5\u7cfb\u7edf\uff0c\u53ea\u4f18\u5316\u5c11\u91cf\u53c2\u6570\uff0c\u800c\u4e0d\u662f\u6574\u4e2aWhisper\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u63d0\u8bae\u7cfb\u7edf\u5728SLURP\u4e0a\u7684\u69fd\u4f4d\u586b\u5145\uff08SLU-F1\uff09\u5f97\u5206\u6bd4\u6700\u8fd1\u5f15\u5165\u7684\u96f6\u6837\u672c\u57fa\u51c6\u63d0\u9ad8\u4e8640.7%\u3002\u6b64\u5916\uff0c\u5728\u65e2\u5b9a\u548c\u8de8\u9886\u57df\u8bc4\u4f30\u73af\u5883\u4e0b\uff0c\u5b83\u4e0e\u57fa\u4e8eWhisper-GPT-2\u7684\u6a21\u5757\u5316\u7cfb\u7edf\u8868\u73b0\u76f8\u5f53\uff0c\u4f46\u6a21\u578b\u53c2\u6570\u51cf\u5c11\u4e8634.8%\u3002|\n", "2406.15198": "|**2024-06-21**|**Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms**|Santiago Berrezueta-Guzman 
et.al.|[2406.15198](http://arxiv.org/abs/2406.15198)|null|\u6ce8\u610f\u529b\u7f3a\u9677\u591a\u52a8\u969c\u788d\uff08ADHD\uff09\u662f\u4e00\u79cd\u795e\u7ecf\u53d1\u80b2\u969c\u788d\uff0c\u5176\u7279\u5f81\u4e3a\u6ce8\u610f\u529b\u4e0d\u96c6\u4e2d\u3001\u8fc7\u5ea6\u6d3b\u8dc3\u548c\u51b2\u52a8\uff0c\u4e25\u91cd\u5f71\u54cd\u4e2a\u4f53\u7684\u65e5\u5e38\u751f\u6d3b\u548c\u751f\u6d3b\u8d28\u91cf\u3002\u804c\u4e1a\u7597\u6cd5\u5728ADHD\u7ba1\u7406\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u901a\u8fc7\u57f9\u517b\u65e5\u5e38\u751f\u6d3b\u6240\u9700\u7684\u6280\u80fd\uff0c\u63d0\u5347\u4e2a\u4f53\u5728\u5b66\u6821\u3001\u5bb6\u5ead\u548c\u793e\u4f1a\u73af\u5883\u4e2d\u5168\u9762\u53c2\u4e0e\u7684\u80fd\u529b\u3002\u8fd1\u671f\u7814\u7a76\u5f3a\u8c03\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\u548c\u793e\u4ea4\u8f85\u52a9\u673a\u5668\u4eba\uff09\u5728\u5fc3\u7406\u6cbb\u7597\u4e2d\u7684\u6f5c\u5728\u4ef7\u503c\uff0c\u4ee5\u5f25\u8865\u73b0\u6709\u7597\u6cd5\u7684\u5c40\u9650\uff0c\u63d0\u4f9b\u5b9a\u5236\u5316\u7684\u652f\u6301\u5e76\u9002\u5e94\u4e2a\u4f53\u7684\u72ec\u7279\u9700\u6c42\u3002\u7136\u800c\uff0c\u5173\u4e8e\u8fd9\u4e9b\u5148\u8fdb\u6280\u672f\u5728ADHD\u7597\u6cd5\u4e2d\u7684\u8054\u5408\u5e94\u7528\u7814\u7a76\u5c1a\u5b58\u5728\u8f83\u5927\u7a7a\u767d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6574\u5408\u4e86ChatGPT-4 Turbo\u548cClaude-3 Opus\u4e24\u4e2a\u5148\u8fdb\u8bed\u8a00\u6a21\u578b\u5230\u4e00\u4e2a\u673a\u5668\u4eba\u52a9\u7406\u4e2d\uff0c\u4ee5\u8003\u5bdf\u5b83\u4eec\u5728\u673a\u5668\u4eba\u8f85\u52a9\u4e92\u52a8\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u5728\u4e00\u4e2a\u6a21\u62df\u6cbb\u7597\u573a\u666f\u4e2d\u6bd4\u8f83\u5b83\u4eec\u4e0e\u4e34\u5e8a\u9a8c\u8bc1\u7684\u5b9a\u5236\u6a21\u578b\u7684\u6548\u679c\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cChatGPT-4 
Turbo\u5728\u6027\u80fd\u548c\u54cd\u5e94\u901f\u5ea6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u9002\u5408\u4e8e\u65f6\u95f4\u654f\u611f\u7684\u5e94\u7528\u3002\u800cClaude-3 Opus\u5728\u7406\u89e3\u3001\u8fde\u8d2f\u6027\u548c\u4f26\u7406\u8003\u91cf\u65b9\u9762\u8868\u73b0\u51fa\u4f18\u52bf\uff0c\u5f3a\u8c03\u5b89\u5168\u548c\u5438\u5f15\u4eba\u7684\u4e92\u52a8\u3002\u4e24\u8005\u90fd\u5c55\u73b0\u51fa\u521b\u65b0\u548c\u9002\u5e94\u6027\uff0c\u4f46ChatGPT-4 Turbo\u5728\u96c6\u6210\u7b80\u6613\u5ea6\u548c\u8bed\u8a00\u652f\u6301\u65b9\u9762\u66f4\u5177\u4f18\u52bf\u3002\u9009\u62e9\u54ea\u4e2a\u6a21\u578b\u53d6\u51b3\u4e8eADHD\u7597\u6cd5\u7684\u5177\u4f53\u9700\u6c42\u3002|\n", "2406.15187": "|**2024-06-21**|**UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis**|Yulong Hui et.al.|[2406.15187](http://arxiv.org/abs/2406.15187)|**[link](https://github.com/qinchuanhui/uda-benchmark)**|**\u5c3d\u7ba1\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval-Augmented Generation, RAG\uff09\u6280\u672f\u63d0\u5347\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLMs\uff09\u4e0e\u5916\u90e8\u6570\u636e\u7684\u534f\u4f5c\u80fd\u529b\uff0c\u4f46\u5728\u73b0\u5b9e\u573a\u666f\u4e2d\u4ecd\u9762\u4e34\u8bf8\u591a\u6311\u6218\u3002\u7279\u522b\u662f\u5728\u5b66\u672f\u6587\u732e\u548c\u91d1\u878d\u95ee\u7b54\u7b49\u9886\u57df\uff0c\u6570\u636e\u5e38\u5e38\u4ee5HTML\u6216PDF\u683c\u5f0f\u7684\u5197\u957f\u3001\u7ed3\u6784\u590d\u6742\u7684\u6587\u672c\u548c\u8868\u683c\u5f62\u5f0f\u5b58\u5728\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u540d\u4e3a\u201cUnstructured Document 
Analysis\u201d\uff08UDA\uff09\u7684\u65b0\u57fa\u51c6\uff0c\u5b83\u5305\u542b2,965\u4efd\u771f\u5b9e\u4e16\u754c\u7684\u6587\u6863\u548c29,590\u4e2a\u4e13\u5bb6\u6807\u6ce8\u7684\u95ee\u7b54\u5bf9\u3002\u6211\u4eec\u91cd\u65b0\u5ba1\u89c6\u4e86\u57fa\u4e8eLLM\u548cRAG\u7684\u65b9\u6cd5\u5728\u5904\u7406\u6587\u6863\u5206\u6790\u4efb\u52a1\u4e2d\u7684\u8bbe\u8ba1\u51b3\u7b56\uff0c\u5e76\u5728\u591a\u4e2a\u6587\u6863\u9886\u57df\u548c\u591a\u6837\u5316\u7684\u67e5\u8be2\u7c7b\u578b\u4e0a\u8bc4\u4f30\u7b54\u6848\u8d28\u91cf\u548c\u7b56\u7565\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u63ed\u793a\u4e86\u6709\u8da3\u7684\u7ed3\u679c\uff0c\u5f3a\u8c03\u4e86\u6570\u636e\u89e3\u6790\u548c\u68c0\u7d22\u7684\u91cd\u8981\u6027\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u4e2a\u57fa\u51c6\u80fd\u591f\u4e3a\u73b0\u5b9e\u4e16\u754c\u7684\u6587\u6863\u5206\u6790\u5e94\u7528\u63d0\u4f9b\u542f\u793a\uff0c\u5e76\u4e3a\u5176\u53d1\u5c55\u670d\u52a1\u3002\u57fa\u51c6\u5957\u4ef6\u548c\u4ee3\u7801\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2406.16858": "|**2024-06-24**|**EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees**|Yuhui Li 
et.al.|[2406.16858](http://arxiv.org/abs/2406.16858)|**[link](https://github.com/safeailab/eagle)**|\u5728\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u6210\u672c\u9ad8\u4e14\u8017\u65f6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u63a8\u6d4b\u5f0f\u62bd\u6837\u65b9\u6cd5\u5982EAGLE\u5df2\u8bc1\u5b9e\u6709\u6548\u3002\u4f20\u7edf\u65b9\u6cd5\u5047\u8bbe\u8349\u7a3f\u6811\u7684\u63a5\u53d7\u7387\u4ec5\u4f9d\u8d56\u4e8e\u4ee4\u724c\u7684\u4f4d\u7f6e\uff0c\u7136\u800c\u6211\u4eec\u53d1\u73b0\u8fd9\u5176\u5b9e\u8fd8\u53d6\u51b3\u4e8e\u4e0a\u4e0b\u6587\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728EAGLE\u7684\u57fa\u7840\u4e0a\u63d0\u51fa\u4e86EAGLE-2\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u4e0a\u4e0b\u6587\u611f\u77e5\u52a8\u6001\u8349\u7a3f\u6811\u6280\u672f\u5230\u8d77\u8349\u5efa\u6a21\u4e2d\u3002\u8fd9\u4e00\u6539\u8fdb\u5229\u7528\u4e86EAGLE\u7684\u8349\u7a3f\u6a21\u578b\u6821\u51c6\u826f\u597d\u7684\u7279\u6027\uff1a\u8349\u7a3f\u6a21\u578b\u7684\u4fe1\u5fc3\u5206\u6570\u80fd\u8fd1\u4f3c\u8868\u793a\u63a5\u53d7\u7387\uff0c\u8bef\u5dee\u8f83\u5c0f\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u7cfb\u5217\u7684LLMs\u548c\u516d\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793aEAGLE-2\u7684\u901f\u5ea6\u63d0\u5347\u6bd4\u7387\u4e3a3.05\u500d\u52304.26\u500d\uff0c\u6bd4EAGLE-1\u5feb20%\u523040%\u3002\u6b64\u5916\uff0cEAGLE-2\u8fd8\u80fd\u4fdd\u6301\u751f\u6210\u6587\u672c\u5206\u5e03\u4e0d\u53d8\uff0c\u56e0\u6b64\u662f\u4e00\u4e2a\u65e0\u635f\u52a0\u901f\u7b97\u6cd5\u3002|\n", "2406.16838": "|**2024-06-24**|**From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models**|Sean Welleck 
et.al.|[2406.16838](http://arxiv.org/abs/2406.16838)|null|\u73b0\u4ee3\u7814\u7a76\u4e2d\u6700\u5f15\u4eba\u6ce8\u76ee\u7684\u53d1\u73b0\u4e4b\u4e00\u662f\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u589e\u52a0\u8ba1\u7b97\u8d44\u6e90\u4f1a\u5e26\u6765\u66f4\u597d\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u63a8\u65ad\u65f6\u7684\u4f18\u5316\u65b9\u6cd5\u7684\u5173\u6ce8\u76f8\u5bf9\u8f83\u5c11\u3002\u8fd9\u7bc7\u7efc\u8ff0\u4e13\u95e8\u63a2\u8ba8\u4e86\u8fd9\u4e9b\u63a8\u65ad\u65f6\u95f4\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u4ece\u7edf\u4e00\u7684\u6570\u5b66\u6846\u67b6\u51fa\u53d1\uff0c\u8003\u5bdf\u4e86\u4e09\u4e2a\u9886\u57df\uff1a\u9010\u8bcd\u751f\u6210\u7b97\u6cd5\u3001\u5143\u751f\u6210\u7b97\u6cd5\u548c\u9ad8\u6548\u751f\u6210\u3002\u9010\u8bcd\u751f\u6210\u7b97\u6cd5\uff0c\u901a\u5e38\u79f0\u4e3a\u89e3\u7801\u7b97\u6cd5\uff0c\u901a\u8fc7\u4e00\u6b21\u62bd\u6837\u4e00\u4e2atoken\u6216\u6784\u5efa\u8bcd\u7ea7\u641c\u7d22\u7a7a\u95f4\uff0c\u7136\u540e\u9009\u62e9\u8f93\u51fa\u3002\u8fd9\u4e9b\u65b9\u6cd5\u901a\u5e38\u5047\u8bbe\u80fd\u591f\u8bbf\u95ee\u8bed\u8a00\u6a21\u578b\u7684logits\u3001\u4e0b\u4e00\u4e2atoken\u5206\u5e03\u6216\u6982\u7387\u5206\u6570\u3002\u5143\u751f\u6210\u7b97\u6cd5\u5904\u7406\u90e8\u5206\u6216\u5b8c\u6574\u5e8f\u5217\uff0c\u878d\u5165\u9886\u57df\u77e5\u8bc6\uff0c\u652f\u6301\u56de\u6eaf\uff0c\u5e76\u6574\u5408\u5916\u90e8\u4fe1\u606f\u3002\u9ad8\u6548\u751f\u6210\u65b9\u6cd5\u65e8\u5728\u51cf\u5c11token\u6210\u672c\uff0c\u63d0\u9ad8\u751f\u6210\u901f\u5ea6\u3002\u6211\u4eec\u7684\u7efc\u8ff0\u878d\u5408\u4e86\u6765\u81ea\u4f20\u7edf\u81ea\u7136\u8bed\u8a00\u5904\u7406\u3001\u73b0\u4ee3LLMs\u548c\u673a\u5668\u5b66\u4e60\u7cfb\u7edf\u4e09\u4e2a\u7814\u7a76\u793e\u533a\u7684\u89c2\u70b9\u3002|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika 
Marreddy et.al.|[2406.16833](http://arxiv.org/abs/2406.16833)|null|\u5728\u5f53\u524d\u7684\u80cc\u666f\u4e0b\uff0c\u8bc6\u522b\u7528\u6237\u5728\u5404\u79cd\u8bdd\u9898\u7684\u957f\u7bc7\u8ba8\u8bba\u4e2d\u7684\u89c2\u70b9\u548c\u7acb\u573a\u5bf9\u4e8e\u4e2a\u6027\u5316\u3001\u5e02\u573a\u7814\u7a76\u3001\u653f\u6cbb\u7ade\u9009\u3001\u5ba2\u6237\u670d\u52a1\u3001\u51b2\u7a81\u89e3\u51b3\u3001\u5b9a\u5411\u5e7f\u544a\u548c\u5185\u5bb9\u7ba1\u7406\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u624b\u52a8\u6807\u6ce8\u6570\u636e\u4ee5\u8bad\u7ec3\u6b64\u7c7b\u6a21\u578b\u9762\u4e34\u8bf8\u591a\u6311\u6218\uff0c\u5982\u8017\u65f6\u6602\u8d35\u3001\u957f\u5bf9\u8bdd\u53ef\u80fd\u5f15\u5165\u566a\u58f0\uff0c\u4ee5\u53ca\u7528\u6237\u89c2\u70b9\u8f6c\u53d8\u7684\u5fae\u5999\u4e4b\u5904\u53ef\u80fd\u5bfc\u81f4\u89e3\u8bfb\u56f0\u96be\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u590d\u6742\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u51fa\u8272\u8868\u73b0\uff0c\u672c\u6587\u5c1d\u8bd5\u5229\u7528Mistral 
Large\u548cGPT-4\u81ea\u52a8\u5316\u4e24\u4e2a\u5173\u952e\u4efb\u52a1\u7684\u6807\u6ce8\u8fc7\u7a0b\uff0c\u5e76\u63d0\u4f9b\u63a8\u7406\uff1a\u4e00\u662f\u7528\u6237\u7acb\u573a\u5206\u7c7b\uff0c\u5373\u5728\u5bf9\u8bdd\u4e2d\u5bf9\u7528\u6237\u5e16\u5b50\u7684\u89c2\u70b9\u8fdb\u884c\u4e94\u7ea7\u6807\u6ce8\uff1b\u4e8c\u662f\u7528\u6237\u56fa\u6267\u7a0b\u5ea6\u5206\u7c7b\uff0c\u5173\u6ce8\u7528\u6237\u5728\u6574\u4e2a\u5bf9\u8bdd\u4e2d\u7684\u603b\u4f53\u610f\u89c1\uff0c\u91c7\u7528\u56db\u7ea7\u6807\u6ce8\u3002\u901a\u8fc7\u5728764\u4e2a\u591a\u7528\u6237Reddit\u5bf9\u8bdd\u4e0a\u5e94\u7528\u96f6\u6837\u672c\u3001\u4e00\u793a\u4f8b\u548c\u5c11\u91cf\u6837\u4f8b\u6807\u6ce8\u7684\u591a\u6570\u6295\u7968\uff0c\u6211\u4eec\u521b\u5efa\u4e86USDC\u6570\u636e\u96c6\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\u5bf9\u591a\u4e2a\u5c0f\u578b\u90e8\u7f72\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u548c\u6307\u4ee4\u8c03\u6574\uff0c\u7528\u4e8e\u6267\u884c\u4e94\u7c7b\u7acb\u573a\u548c\u56db\u7c7b\u56fa\u6267\u7a0b\u5ea6\u7684\u5206\u7c7b\u4efb\u52a1\u3002\u6211\u4eec\u516c\u5f00\u4e86\u4ee3\u7801\u548c\u6570\u636e\u96c6\uff1a[https://anonymous.4open.science/r/USDC-0F7F]\u3002|\n", "2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828](http://arxiv.org/abs/2406.16828)|**[link](https://github.com/castorini/ragnarok)**|\u60a8\u53ef\u80fd\u4f53\u9a8c\u8fc7\u65b0\u7684Bing\u641c\u7d22\u6216Google 
AI\u6982\u8ff0\uff1f\u8fd9\u4e9b\u90fd\u53cd\u6620\u51fa\u5f53\u524d\u641c\u7d22\u5f15\u64ce\u6b63\u9010\u6b65\u53d1\u5c55\u5230\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u7cfb\u7edf\u3002\u8fd9\u7c7b\u7cfb\u7edf\u80fd\u6574\u5408\u5b9e\u65f6\u6570\u636e\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u63d0\u4f9b\u4fe1\u606f\u4e30\u5bcc\u3001\u6709\u6765\u6e90\u4e14\u7b80\u6d01\u7684\u6458\u8981\uff0c\u4e0e\u4f20\u7edf\u7684\u6587\u6863\u6392\u540d\u5c55\u793a\u65b9\u5f0f\u5f62\u6210\u5bf9\u6bd4\u3002\u56e0\u6b64\uff0c\u4e3a\u4e86\u63a8\u52a8RAG\u7cfb\u7edf\u8bc4\u4f30\u7684\u521b\u65b0\uff0c\u6211\u4eec\u63d0\u8bae\u5728TREC 2024\u5e74\u589e\u8bbeRAG\u7ade\u8d5b\u3002\u672c\u6587\u8be6\u8ff0\u4e86\u6211\u4eec\u5982\u4f55\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff1a\u63cf\u8ff0\u4e86\u53ef\u590d\u7528\u6846\u67b6Ragnar\u00f6k\u7684\u8bbe\u8ba1\uff0c\u89e3\u91ca\u4e86MS MARCO V2.1\u8bed\u6599\u5e93\u7684\u9009\u62e9\uff0c\u53d1\u5e03\u4e86\u7ade\u8d5b\u5f00\u53d1\u8bdd\u9898\uff0c\u5e76\u6807\u51c6\u5316\u4e86\u7528\u6237\u63a5\u53e3\u5b9a\u4e49\uff0c\u4ee5\u4fbf\u5229\u7528\u6237\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u5229\u7528Ragnar\u00f6k\u5c55\u793a\u5173\u952e\u7684\u5de5\u4e1a\u57fa\u51c6\uff0c\u5982OpenAI\u7684GPT-4o\u548cCohere\u7684Command R+\u3002\u6211\u4eec\u8fd8\u63a8\u51fa\u4e86\u4e00\u4e2a\u7f51\u9875\u754c\u9762\uff0c\u7528\u4e8e\u4e92\u52a8\u5f0f\u5730\u6bd4\u8f83\u4e0d\u540cRAG\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u5e76\u901a\u8fc7\u4f17\u5305\u65b9\u5f0f\u8fdb\u884c\u8bc4\u4f30\u3002\u6211\u4eec\u5f00\u6e90Ragnar\u00f6k\u6846\u67b6\u548c\u57fa\u51c6\uff0c\u65e8\u5728\u4e3a\u672a\u6765\u7684RAG\u7cfb\u7edf\u5efa\u7acb\u7edf\u4e00\u7684\u6807\u51c6\u3002|\n", "2406.16801": "|**2024-06-24**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801](http://arxiv.org/abs/2406.16801)|**[link](https://github.com/qurrent-ai/res-q)**|**## 
\u7ffb\u8bd1 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u4fc3\u4f7f\u4e86\u4e00\u7c7b\u80fd\u591f\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u7cfb\u7edf\u53d1\u5c55\uff0c\u5982\u5bf9\u5927\u578b\u4ee3\u7801\u4ed3\u5e93\u8fdb\u884c\u7f16\u8f91\u3002\u9274\u4e8eLLMs\u5bf9\u63d0\u793a\u5fae\u8c03\u7684\u9ad8\u654f\u611f\u6027\u548c\u4e0d\u53ef\u9884\u6d4b\u6027\uff0c\u8feb\u5207\u9700\u8981\u7a33\u5065\u7684\u8bc4\u4f30\u5de5\u5177\u6765\u63a8\u52a8\u8fd9\u4e9b\u7cfb\u7edf\u7684\u672a\u6765\u53d1\u5c55\u3002\u6211\u4eec\u63d0\u51faRES-Q\uff0c\u4e00\u4e2a\u9488\u5bf9$\\textbf{R}$epository $\\textbf{E}$diting $\\textbf{S}$ystems\u7684\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u57fa\u51c6\uff0c\u5b83\u57fa\u4e8e100\u4e2a\u771f\u5b9e\u7684GitHub\u63d0\u4ea4\u6784\u5efa\u4e86100\u4e2a\u4ed3\u5e93\u7f16\u8f91\u4efb\u52a1\u3002\u7ed9\u5b9a\u7f16\u8f91\u6307\u4ee4\u548c\u4ee3\u7801\u4ed3\u5e93\uff0cRES-Q\u8bc4\u4f30LLM\u7cfb\u7edf\u83b7\u53d6\u4fe1\u606f\u5e76\u6784\u9020\u6ee1\u8db3\u6307\u4ee4\u8981\u6c42\u7684\u7f16\u8f91\u7684\u80fd\u529b\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u8fd9\u79cd\u8bc4\u4f30\u65b9\u5f0f\u4f18\u4e8e\u4f20\u7edf\u65b9\u6cd5\uff0c\u80fd\u5168\u9762\u8bc4\u4f30\u6a21\u578b\u7684\u6027\u80fd\u3002 \u6211\u4eec\u4f7f\u7528Qurrent OS\u5f00\u53d1\u7684\u8bed\u8a00\u4ee3\u7406\u8f6f\u4ef6\u6784\u5efa\u4e86\u4e00\u4e2a\u4ed3\u5e93\u7f16\u8f91\u7cfb\u7edf\uff0c\u5bf9\u8be5\u7cfb\u7edf\u4e2d\u7684\u5404\u79cd\u6700\u5148\u8fdb\u7684LLMs\uff0c\u5982Claude Sonnet 3.5\u548cGPT-4o\uff0c\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u5c3d\u7ba1\u5728HumanEval\u4e0a\u76841%\u7cbe\u786e\u5ea6@1\u5f97\u5206\u6709\u6240\u5dee\u5f02\uff0c\u4f46\u5728RES-Q\u4e0a\uff0cClaude Sonnet 
3.5\u76841%\u7cbe\u786e\u5ea6@1\u5f97\u5206\u6bd4GPT-4o\u9ad8\u51fa12%\uff0c\u8fd9\u8868\u660eRES-Q\u5177\u6709\u533a\u5206\u6a21\u578b\u80fd\u529b\u7684\u6f5c\u529b\uff0c\u968f\u7740\u4f20\u7edf\u57fa\u51c6\u63a5\u8fd1\u9971\u548c\uff0c\u5b83\u80fd\u63d0\u4f9b\u66f4\u6df1\u5165\u7684\u6d1e\u5bdf\u3002 \u6211\u4eec\u8fd8\u7814\u7a76\u4e86token\u6548\u7387\u3001\u4e0e\u73b0\u6709\u57fa\u51c6\u7684\u6027\u80fd\u5173\u8054\uff0c\u4ee5\u53ca\u5c01\u95ed\u6e90\u548c\u5f00\u6e90LLM\u4e4b\u95f4\u7684\u6709\u8da3\u5dee\u5f02\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u5728https://github.com/Qurrent-AI/RES-Q\u83b7\u53d6\u3002**|\n", "2406.16797": "|**2024-06-24**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797](http://arxiv.org/abs/2406.16797)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|**## \u80cc\u666f \u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9002\u5e94\u65b0\u4efb\u52a1\u7684\u65b9\u6cd5\u5e76\u4e0d\u9002\u7528\u4e8e\u591a\u4efb\u52a1\u9002\u5e94\uff0c\u56e0\u4e3a\u5b83\u4eec\u4f1a\u4fee\u6539\u6240\u6709\u6a21\u578b\u6743\u91cd\uff0c\u5bfc\u81f4\u4e0d\u540c\u4efb\u52a1\u4e4b\u95f4\u4ea7\u751f\u7834\u574f\u6027\u7684\u5e72\u6270\u3002\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5bf9\u5148\u524d\u4efb\u52a1\u7684\u9057\u5fd8\uff0c\u4f7f\u5f97\u540c\u65f6\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u83b7\u5f97\u826f\u597d\u6027\u80fd\u53d8\u5f97\u56f0\u96be\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Lottery Ticket Adaptation\uff08LoTA\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u7a00\u758f\u9002\u5e94\u65b9\u6cd5\uff0c\u5b83\u8bc6\u522b\u5e76\u4f18\u5316\u6a21\u578b\u4e2d\u7684\u4e00\u4e2a\u7a00\u758f\u5b50\u7f51\u7edc\u3002\u6211\u4eec\u5728\u8bf8\u5982\u6307\u4ee4\u8ddf\u968f\u3001\u63a8\u7406\u3001\u6570\u5b66\u548c\u6458\u8981\u7b49\u590d\u6742\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86LoTA\u3002 ## \u65b9\u6cd5 
LoTA\u901a\u8fc7\u53d1\u73b0\u548c\u4f18\u5316\u201c\u5f69\u7968\u5238\u201d\uff08\u6216\u7a00\u758f\u4efb\u52a1\u5411\u91cf\uff09\u6765\u5b9e\u73b0\uff0c\u8fd9\u79cd\u65b9\u6cd5\u4f18\u4e8e\u5168\u91cf\u5fae\u8c03\u548c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u3002LoTA\u4e0d\u4ec5\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8fd8\u80fd\u5728\u8bad\u7ec3\u5176\u4ed6\u4efb\u52a1\u540e\u4fdd\u6301\u826f\u597d\u7684\u8868\u73b0\uff0c\u4ece\u800c\u907f\u514d\u4e86\u707e\u96be\u6027\u9057\u5fd8\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u63d0\u53d6\u548c\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\uff0cLoTA\u8fd8\u652f\u6301\u5728\u9ad8\u5ea6\u4e0d\u540c\u7684\u4efb\u52a1\u95f4\u8fdb\u884c\u6a21\u578b\u878d\u5408\u3002 ## \u7ed3\u8bba \u603b\u7684\u6765\u8bf4\uff0cLoTA\u4f5c\u4e3a\u4e00\u79cd\u6709\u6548\u7684\u7a00\u758f\u9002\u5e94\u7b56\u7565\uff0c\u4e3a\u591a\u4efb\u52a1\u5927\u8bed\u8a00\u6a21\u578b\u7684\u9002\u5e94\u63d0\u4f9b\u4e86\u65b0\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u5728\u5904\u7406\u591a\u4e2a\u4efb\u52a1\u65f6\u4fdd\u6301\u7a33\u5b9a\u4e14\u9ad8\u6548\u7684\u8868\u73b0\u3002**|\n", "2406.16783": "|**2024-06-24**|**M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models**|Rishabh Maheshwary et.al.|[2406.16783](http://arxiv.org/abs/2406.16783)|null|## \u80cc\u666f \u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u6307\u4ee4\u7684\u6821\u51c6\u8fc7\u7a0b\u4e2d\uff0c\u5fae\u8c03\uff08finetuning, 
IFT\uff09\u81f3\u5173\u91cd\u8981\u3002\u8fd1\u671f\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u4e9b\u6709\u6548\u7684IFT\u6570\u636e\u96c6\uff0c\u4f46\u5927\u591a\u96c6\u4e2d\u5728\u9ad8\u8d44\u6e90\u8bed\u8a00\u5982\u82f1\u8bed\u4e0a\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u521b\u65b0\u6027\u5730\u63d0\u51fa\u4e00\u4e2a\u5168\u5408\u6210\u7684\u3001\u57fa\u4e8eEvol\u5206\u7c7b\u6cd5\u5f15\u5bfc\u7684\u591a\u8bed\u8a00\u3001\u591a\u8f6e\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6\u2014\u2014M2Lingual\uff0c\u76ee\u6807\u662f\u63d0\u5347LLMs\u5728\u591a\u6837\u8bed\u8a00\u548c\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002M2Lingual\u5171\u5305\u542b182,000\u4e2aIFT\u5bf9\uff0c\u6e90\u81ea\u4e0d\u540c\u79cd\u5b50\uff0c\u6db5\u76d670\u79cd\u8bed\u8a00\u300117\u4e2aNLP\u4efb\u52a1\u4ee5\u53ca\u901a\u7528\u7684\u6307\u4ee4-\u54cd\u5e94\u5bf9\u3002 ## \u76ee\u7684\u4e0e\u8d21\u732e \u4f7f\u7528M2Lingual\u8fdb\u884c\u8bad\u7ec3\u7684LLMs\u6027\u80fd\u663e\u8457\u4f18\u4e8e\u5927\u591a\u6570\u73b0\u6709\u7684\u591a\u8bed\u8a00IFT\u6570\u636e\u96c6\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u7ecfM2Lingual\u5fae\u8c03\u7684\u6a21\u578b\u5728\u5404\u79cd\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u7a33\u5065\u7684\u8de8\u8bed\u8a00\u80fd\u529b\uff0c\u65e0\u8bba\u662f\u5728\u6211\u4eec\u7684\u591a\u8bed\u8a00\u3001\u591a\u8f6e\u7ffb\u8bd1\u8bc4\u4ef7\u57fa\u51c6\u4e0a\uff0c\u8fd8\u662f\u5728\u591a\u79cd\u591a\u6837\u7684\u591a\u8bed\u8a00\u4efb\u52a1\u4e2d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8d21\u732e\u4e86Evol\u5206\u7c7b\u6cd5\u7684\u4e24\u6b65\u65b9\u6cd5\uff0c\u5e76\u516c\u5f00\u4e86M2Lingual\u7684\u6570\u636e\u96c6\uff1ahttps://huggingface.co/datasets/ServiceNow-AI/M2Lingual\u3002|\n", "2406.16779": "|**2024-06-24**|**It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension**|Sagi Shaier 
et.al.|[2406.16779](http://arxiv.org/abs/2406.16779)|null|\u8fc7\u53bb\u5341\u5e74\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u4e00\u4e9b\u5b9e\u8df5\u672a\u7ecf\u5145\u5206\u8bc4\u4f30\u5c31\u5df2\u786e\u7acb\u3002\u9488\u5bf9\u9605\u8bfb\u7406\u89e3\u8fd9\u4e00\u60c5\u51b5\uff0c\u6211\u4eec\u9996\u5148\u63d0\u51fa\u95ee\u9898\uff1a1\uff09\u8f93\u5165\u987a\u5e8f\uff08\u5373\u95ee\u9898\u548c\u4e0a\u4e0b\u6587\uff09\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u6027\u80fd\uff1f\u9274\u4e8e\u8fd1\u671f\u5728\u8f93\u5165\u4fa7\u91cd\u9886\u57df\u7684\u8fdb\u5c55\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63a2\u7a76\uff1a2\uff09\u5f3a\u8c03\u95ee\u9898\u3001\u4e0a\u4e0b\u6587\u6216\u4e24\u8005\u662f\u5426\u80fd\u63d0\u5347\u8868\u73b0\uff1f\u6211\u4eec\u57283\u4e2a\u6570\u636e\u96c6\u4e0a\u6d4b\u8bd5\u4e869\u79cd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u53d1\u73b0\u5148\u5448\u73b0\u4e0a\u4e0b\u6587\u518d\u7ed9\u51fa\u95ee\u9898\u53ef\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\uff0c\u6700\u9ad8\u53ef\u8fbe31%\u7684\u51c6\u786e\u7387\u63d0\u5347\u3002\u6b64\u5916\uff0c\u5f3a\u8c03\u4e0a\u4e0b\u6587\u7684\u6548\u679c\u4f18\u4e8e\u7a81\u51fa\u663e\u793a\u95ee\u9898\uff0c\u800c\u4e14\u5bf9\u6a21\u578b\u7f3a\u4e4f\u53c2\u6570\u77e5\u8bc6\u6765\u56de\u7b54\u7684\u95ee\u9898\uff0c\u9488\u5bf9\u6027\u5730\u5f3a\u8c03\u8f93\u5165\u90e8\u5206\u5c24\u5176\u6709\u6548\u3002\u901a\u8fc7\u5c1d\u8bd5\u57fa\u4e8e\u63d0\u793a\u548c\u6ce8\u610f\u529b\u7684\u5f3a\u8c03\u65b9\u6cd5\uff0c\u6211\u4eec\u53d1\u73b0\u6700\u6709\u6548\u7684\u7b56\u7565\u51fa\u4eba\u610f\u6599\u5730\u7b80\u5355\uff1a\u53ea\u9700\u5728\u8f93\u5165\u4e2d\u9644\u52a0\u51e0\u4e2a\u6807\u8bb0\uff0c\u5c31\u80fd\u5b9e\u73b0\u9ad8\u8fbe36%\u7684\u51c6\u786e\u6027\u63d0\u5347\uff0c\u4f7f\u5f97\u5c0f\u578b\u6a21\u578b\u80fd\u591f\u8d85\u8d8a\u5176\u5927\u5f97\u591a\u7684\u540c\u7c7b\u6a21\u578b\u3002|\n", "2406.16777": "|**2024-06-24**|**Blending LLMs into 
Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024**|Sai Koneru et.al.|[2406.16777](http://arxiv.org/abs/2406.16777)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u88ab\u5e7f\u6cdb\u7814\u7a76\uff0c\u4ee5\u5e94\u7528\u4e8e\u8bf8\u5982\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u3001\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u751a\u81f3\u7aef\u5230\u7aef\u8bed\u97f3\u7ffb\u8bd1\uff08ST\uff09\u7b49\u4efb\u52a1\u3002\u672c\u6587\u4ecb\u7ecdKIT\u56e2\u961f\u5728\u53d7\u9650+LLM\u8d5b\u9053\u4e0b\u7684\u79bb\u7ebf\u63d0\u4ea4\uff0c\u6211\u4eec\u901a\u8fc7\u6574\u5408\u6700\u65b0\u6280\u672f\u6539\u8fdb\u4e86\u7ea7\u8054\u8bed\u97f3\u7ffb\u8bd1\u7cfb\u7edf\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u5c06Mistral-7B\u6a21\u578b\uff08mistralai/Mistral-7B-Instruct-v0.1\uff09\u878d\u5165\u5176\u4e2d\uff0c\u4ece\u4e24\u4e2a\u65b9\u9762\u589e\u5f3a\u7cfb\u7edf\uff1a\u4e00\u662f\u5229\u7528\u6211\u4eec\u7684\u7cfb\u7edf\u751f\u6210\u7684N-best\u5217\u8868\u7cbe\u70bcASR\u8f93\u51fa\uff0c\u901a\u8fc7\u5fae\u8c03LLM\u63d0\u9ad8\u8f6c\u5f55\u51c6\u786e\u6027\uff1b\u4e8c\u662f\u5bf9MT\u8f93\u51fa\u8fdb\u884c\u6587\u6863\u7ea7\u522b\u7684\u7cbe\u70bc\uff0c\u5229\u7528ASR\u548cMT\u9884\u6d4b\u6765\u63d0\u5347\u7ffb\u8bd1\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM\u7684\u96c6\u6210\u4f7f\u5f97ASR\u7684Word Error Rate\u4e0b\u964d\u4e86\u7edd\u5bf90.3%\uff0cMT\u7684COMET\u8bc4\u5206\u63d0\u9ad8\u4e860.65%\u3002\u7136\u800c\uff0c\u5728\u5305\u542b\u91cd\u53e0\u8bf4\u8bdd\u8005\u548c\u80cc\u666f\u566a\u97f3\u7684\u6311\u6218\u6027\u6d4b\u8bd5\u96c6\u4e2d\uff0c\u7531\u4e8eASR\u6027\u80fd\u4e0d\u4f73\uff0cLLM\u96c6\u6210\u7684\u6548\u679c\u4e0d\u660e\u663e\u3002\u4e3a\u4e86\u6539\u5584\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\u53ef\u80fd\u7f3a\u5931\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u5206\u5757\u957f\u5f62\u5f0f\u89e3\u7801\u7684ASR\u65b9\u6cd5\u3002|\n", "2406.16768": 
"|**2024-06-24**|**WARP: On the Benefits of Weight Averaged Rewarded Policies**|Alexandre Ram\u00e9 et.al.|[2406.16768](http://arxiv.org/abs/2406.16768)|null|\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u901a\u8fc7\u8bad\u7ec3\u5956\u52b1\u6a21\u578b\u6765\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u4f7f\u5176\u751f\u6210\u7684\u5185\u5bb9\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u3002\u4e3a\u4e86\u4fdd\u6301\u9884\u8bad\u7ec3\u77e5\u8bc6\uff0cRLHF\u901a\u5e38\u91c7\u7528KL\u6563\u5ea6\u6b63\u5219\u5316\uff0c\u4f46\u8fd9\u4f1a\u9650\u5236\u5956\u52b1\u4f18\u5316\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5bf9\u9f50\u7b56\u7565\uff0c\u79f0\u4e3a\u6743\u91cd\u5e73\u5747\u5956\u52b1\u7b56\u7565\uff08WARP\uff09\u3002WARP\u5728\u4e09\u4e2a\u9636\u6bb5\u5728\u6743\u91cd\u7a7a\u95f4\u4e2d\u878d\u5408\u7b56\u7565\uff1a\u9996\u5148\uff0c\u5b83\u4f7f\u7528\u6307\u6570\u79fb\u52a8\u5e73\u5747\u7b56\u7565\u4f5c\u4e3aKL\u6b63\u5219\u5316\u7684\u52a8\u6001\u57fa\u51c6\u3002\u5176\u6b21\uff0c\u5e94\u7528\u7403\u9762\u63d2\u503c\u5c06\u72ec\u7acb\u5fae\u8c03\u7684\u7b56\u7565\u5408\u5e76\u6210\u4e00\u4e2a\u589e\u5f3a\u6a21\u578b\u3002\u6700\u540e\uff0c\u7ebf\u6027\u63d2\u503c\u5728\u5408\u5e76\u6a21\u578b\u548c\u521d\u59cb\u6a21\u578b\u4e4b\u95f4\u8fdb\u884c\uff0c\u4ee5\u6062\u590d\u9884\u8bad\u7ec3\u7279\u5f81\u3002\u8be5\u8fc7\u7a0b\u8fed\u4ee3\u8fdb\u884c\uff0c\u6bcf\u6b21\u8fed\u4ee3\u7684\u6700\u7ec8\u6a21\u578b\u7528\u4f5c\u4e0b\u4e00\u8f6e\u7684\u9ad8\u7ea7\u521d\u59cb\u5316\uff0c\u9010\u6b65\u4f18\u5316KL\u4e0e\u5956\u52b1\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u5b9e\u73b0\u56fa\u5b9aKL\u4e0b\u7684\u66f4\u9ad8\u5956\u52b1\u3002GEMMA\u7b56\u7565\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u4e86WARP\u7684\u4f18\u70b9\uff0c\u5176\u8d28\u91cf\u548c\u5bf9\u9f50\u6027\u80fd\u4f18\u4e8e\u5f00\u6e90\u7684LLMs\u3002|\n", "2406.17770": "|**2024-06-25**|**MG-LLaVA: Towards Multi-Granularity 
Visual Instruction Tuning**|Xiangyu Zhao et.al.|[2406.17770](http://arxiv.org/abs/2406.17770)|**[link](https://github.com/phoenixz810/mg-llava)**|**## \u80cc\u666f \u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u89c6\u89c9\u7406\u89e3\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u6a21\u578b\u5c40\u9650\u4e8e\u5904\u7406\u4f4e\u5206\u8fa8\u7387\u56fe\u50cf\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9700\u8981\u8be6\u7ec6\u89c6\u89c9\u4fe1\u606f\u7684\u611f\u77e5\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684MLLM\u2014\u2014MG-LLaVA\uff0c\u901a\u8fc7\u5f15\u5165\u591a\u5c3a\u5ea6\u89c6\u89c9\u6d41\uff0c\u5305\u62ec\u4f4e\u5206\u8fa8\u7387\u3001\u9ad8\u5206\u8fa8\u7387\u548c\u5bf9\u8c61\u7ea7\u7279\u5f81\uff0c\u6765\u589e\u5f3a\u6a21\u578b\u7684\u89c6\u89c9\u5904\u7406\u80fd\u529b\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u989d\u5916\u7684\u9ad8\u5206\u8fa8\u7387\u89c6\u89c9\u7f16\u7801\u5668\uff0c\u4ee5\u6355\u6349\u7cbe\u7ec6\u7ec6\u8282\uff0c\u5e76\u901a\u8fc7\u5377\u79ef\u95e8\u878d\u5408\u7f51\u7edc\u4e0e\u57fa\u7840\u89c6\u89c9\u7279\u5f81\u878d\u5408\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u7684\u5bf9\u8c61\u8bc6\u522b\u80fd\u529b\uff0c\u6211\u4eec\u7ed3\u5408\u4e86\u6765\u81ea\u79bb\u7ebf\u68c0\u6d4b\u5668\u786e\u5b9a\u7684\u8fb9\u754c\u6846\u7684\u7269\u4f53\u7ea7\u522b\u7279\u5f81\u3002MG-LLaVA\u4ec5\u4f7f\u7528\u516c\u5f00\u53ef\u7528\u7684\u591a\u6a21\u6001\u6570\u636e\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u611f\u77e5\u80fd\u529b\u3002\u6211\u4eec\u7528\u4e0d\u540c\u89c4\u6a21\u7684\u8bed\u8a00\u7f16\u7801\u5668\uff08\u4ece38\u4ebf\u5230340\u4ebf\u53c2\u6570\uff09\u5b9e\u4f8b\u5316MG-LLaVA\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u5176\u6027\u80fd\u3002\u591a\u9879\u57fa\u51c6\u6d4b\u8bd5\u7684\u7ed3\u67
9c\u8868\u660e\uff0cMG-LLaVA\u5728\u540c\u7c7b\u53c2\u6570\u91cf\u7684\u73b0\u6709MLLM\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e86\u5176\u51fa\u8272\u7684\u6548\u7387\u3002\u4ee3\u7801\u5c06\u5728https://github.com/PhoenixZ810/MG-LLaVA\u4e0a\u5f00\u6e90\u3002**|\n", "2406.17764": "|**2024-06-25**|**BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning**|Ercong Nie et.al.|[2406.17764](http://arxiv.org/abs/2406.17764)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u79ef\u7d2f\u4e86\u4e30\u5bcc\u7684\u53c2\u6570\u77e5\u8bc6\uff0c\u4f46\u7531\u4e8e\u91cd\u65b0\u8bad\u7ec3\u6210\u672c\u9ad8\u6602\u4e14\u5bf9\u95ed\u6e90\u6a21\u578b\u4e0d\u53ef\u884c\uff0c\u66f4\u65b0\u8fd9\u4e9b\u77e5\u8bc6\u53d8\u5f97\u56f0\u96be\u3002\u77e5\u8bc6\u7f16\u8f91\uff08KE\uff09\u4f5c\u4e3a\u4e00\u79cd\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5141\u8bb8\u5728\u4e0d\u635f\u5bb3\u6574\u4f53\u6027\u80fd\u7684\u524d\u63d0\u4e0b\u66f4\u65b0LLMs\u7684\u77e5\u8bc6\u3002\u57fa\u4e8e\u201c\u4e0a\u4e0b\u6587\u5b66\u4e60\u201d\uff08ICL\uff09\u7684\u5373\u5e2dKE\u65b9\u6cd5\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f7f\u5f97LLMs\u80fd\u591f\u4f5c\u4e3a\u9ed1\u76d2\u5904\u7406\u3002\u8fc7\u53bb\uff0cKE\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u73af\u5883\uff0c\u800c\u5f53\u524d\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684LLMs\u5728\u8de8\u8bed\u8a00KE\u65b9\u9762\u7684\u6f5c\u529b\u5c1a\u672a\u5145\u5206\u6316\u6398\u3002\u4e3a\u4e86\u63a8\u52a8\u8fd9\u65b9\u9762\u7684\u66f4\u591a\u7814\u7a76\uff0c\u6211\u4eec\u63a8\u51fa\u4e86BMIKE-53\u57fa\u51c6\uff0c\u8be5\u57fa\u51c6\u9488\u5bf953\u79cd\u4e0d\u540c\u8bed\u8a00\u7684\u4e09\u79cdKE\u4efb\u52a1\u7c7b\u578b\u8fdb\u884c\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u68af\u5ea6\u7684KE\u65b9\u6cd5\u2014\u2014\u591a\u8bed\u8a00\u4e0a\u4e0b\u6587\u77e5\u8bc6\u7f16\u8f91\uff08MIKE\uff09\uff0c\u5e76\u5728BMIKE-53\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u621
1\u4eec\u7684\u8bc4\u4f30\u5173\u6ce8\u8de8\u8bed\u8a00\u77e5\u8bc6\u8f6c\u79fb\u7684\u53ef\u9760\u6027\u3001\u6cdb\u5316\u6027\u3001\u5c40\u90e8\u6027\u548c\u53ef\u79fb\u690d\u6027\uff0c\u4e3a\u672a\u6765\u8de8\u8bed\u8a00KE\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c2\u70b9\u548c\u6846\u67b6\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u5df2\u901a\u8fc7\u533f\u540d\u4ed3\u5e93https://anonymous.4open.science/r/MIKE\u516c\u5f00\u83b7\u53d6\u3002|\n", "2406.17761": "|**2024-06-25**|**CaLMQA: Exploring culturally specific long-form question answering across 23 languages**|Shane Arora et.al.|[2406.17761](http://arxiv.org/abs/2406.17761)|**[link](https://github.com/2015aroras/calmqa)**|**## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u957f\u7bc7\u95ee\u7b54\u4efb\u52a1\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u5b83\u4eec\u9700\u751f\u6210\u6bb5\u843d\u7ea7\u522b\u7684\u7b54\u6848\u6765\u56de\u5e94\u590d\u6742\u95ee\u9898\u3002\u5c3d\u7ba1\u82f1\u8bed\u7684\u957f\u7bc7\u95ee\u7b54\u7814\u7a76\u5df2\u76f8\u5f53\u6df1\u5165\uff0c\u6d89\u53ca\u591a\u79cd\u6570\u636e\u96c6\u548c\u8bc4\u4f30\u6307\u6807\uff0c\u4f46\u5176\u4ed6\u8bed\u8a00\u7684\u7814\u7a76\u5374\u76f8\u5bf9\u532e\u4e4f\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u63a8\u51fa\u4e86CaLMQA\uff0c\u4e00\u4e2a\u5305\u542b2,600\u4e2a\u8de823\u79cd\u8bed\u8a00\u7684\u590d\u6742\u95ee\u9898\u96c6\u5408\uff0c\u5176\u4e2d\u5305\u62ec\u8d44\u6e90\u6709\u9650\u3001\u9c9c\u5c11\u7814\u7a76\u7684\u8bed\u8a00\uff0c\u5982\u6590\u6d4e\u8bed\u548c\u57fa\u6797\u8fea\u8bed\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u65e2\u5305\u62ec\u793e\u533a\u7f51\u7edc\u8bba\u575b\u4e0a\u6536\u96c6\u7684\u81ea\u7136\u51fa\u73b0\u7684\u95ee\u9898\uff0c\u4e5f\u5305\u542b\u4e86\u7531\u6bcd\u8bed\u4f7f\u7528\u8005\u64b0\u5199\u7684\u9898\u76ee\uff0c\u6211\u4eec\u4e3a\u6b64\u4e13\u95e8\u8058\u8bf7\u4e86\u4ed6\u4eec\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u4ea7\u751f\u4e86\u591a\u
6837\u4e14\u590d\u6742\u7684\u9898\u76ee\uff0c\u53cd\u6620\u4e86\u6587\u5316\u4e3b\u9898\uff08\u5982\u4f20\u7edf\u3001\u6cd5\u5f8b\u3001\u65b0\u95fb\uff09\uff0c\u4ee5\u53ca\u6bcd\u8bed\u4f7f\u7528\u8005\u7684\u8bed\u8a00\u4e60\u60ef\u3002 \u6211\u4eec\u5bf9\u4e00\u7cfb\u5217\u5f00\u6e90\u548c\u95ed\u6e90\u6a21\u578b\u8fdb\u884c\u4e86\u81ea\u52a8\u8bc4\u4f30\uff0c\u4f7f\u7528\u4e86\u6211\u4eec\u65b0\u63d0\u51fa\u7684CaLMScore\u6307\u6807\uff0c\u8be5\u6307\u6807\u80fd\u68c0\u6d4b\u7b54\u6848\u4e2d\u7684\u8bed\u8a00\u9519\u8bef\u548c\u91cd\u590d\u8bcd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u67d0\u4e9b\u4f4e\u8d44\u6e90\u8bed\u8a00\uff0cLLM\u751f\u6210\u7684\u7b54\u6848\u8d28\u91cf\u660e\u663e\u4e0b\u964d\u3002\u6211\u4eec\u5728\u90e8\u5206\u6a21\u578b\u7684\u4eba\u5de5\u8bc4\u4f30\u4e2d\u53d1\u73b0\uff0c\u5bf9\u4e8e\u5177\u6709\u6587\u5316\u7279\u6027\u7684\u95ee\u9898\uff0c\u6a21\u578b\u8868\u73b0\u663e\u8457\u4f4e\u4e8e\u6587\u5316\u4e2d\u7acb\u7684\u95ee\u9898\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5bf9LLM\u591a\u8bed\u8a00\u80fd\u529b\u53ca\u975e\u82f1\u8bed\u957f\u7bc7\u95ee\u7b54\u8bc4\u4ef7\u9886\u57df\u66f4\u6df1\u5165\u7814\u7a76\u7684\u5fc5\u8981\u6027\u3002**|\n", "2406.17755": "|**2024-06-25**|**Accelerating Clinical Evidence Synthesis with Large Language Models**|Zifeng Wang 
et.al.|[2406.17755](http://arxiv.org/abs/2406.17755)|null|\u4eba\u5de5\u667a\u80fd\u81ea\u52a8\u533b\u5b66\u53d1\u73b0\u662f\u8bb8\u591a\u4eba\u7684\u68a6\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTrialMind\u7684\u751f\u6210\u5f0fAI\u7ba1\u9053\uff0c\u65e8\u5728\u8fdb\u884c\u533b\u5b66\u7cfb\u7edf\u6027\u56de\u987e\uff0c\u6db5\u76d6\u7814\u7a76\u641c\u7d22\u3001\u7b5b\u9009\u548c\u6570\u636e\u63d0\u53d6\u9636\u6bb5\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u6bcf\u4e2a\u73af\u8282\uff0c\u5e76\u5f15\u5165\u4e13\u5bb6\u76d1\u7763\u4ee5\u51cf\u5c11\u9519\u8bef\u3002\u4e3a\u4e86\u8bc4\u4f30\u6027\u80fd\uff0c\u6211\u4eec\u521b\u5efa\u4e86TrialReviewBench\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5b83\u662f\u4e00\u4e2a\u5b9a\u5236\u7684\u5305\u542b870\u4efd\u6765\u81ea25\u7bc7\u5143\u5206\u6790\u8bba\u6587\u7684\u4e34\u5e8a\u7814\u7a76\u6807\u6ce8\u6570\u636e\uff0c\u6db5\u76d6\u4e0d\u540c\u533b\u7597\u6cbb\u7597\u9886\u57df\u3002\u7ed3\u679c\u663e\u793a\uff0cTrialMind\u663e\u8457\u63d0\u5347\u4e86\u6587\u732e\u5ba1\u67e5\u6548\u7387\uff0c\u5728\u4ece\u8d85\u8fc72000\u4e07\u7bc7PubMed\u7814\u7a76\u4e2d\u68c0\u7d22\u76f8\u5173\u7814\u7a76\u65f6\uff0c\u53ec\u56de\u7387\u9ad8\u8fbe0.897\u81f31.000\u3002\u5728\u7b5b\u9009\u9636\u6bb5\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f18\u4e8e\u57fa\u4e8e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7684\u65b9\u6cd5\uff08\u53ec\u56de\u7387\u5206\u522b\u4e3a0.227-0.246 vs. 
0.000-0.102\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ed3\u679c\u63d0\u53d6\u65b9\u9762\u8d85\u8d8a\u4e86\u76f4\u63a5\u4f7f\u7528GPT-4\u7684\u8868\u73b0\uff0c\u51c6\u786e\u7387\u8303\u56f4\u4e3a0.65\u52300.84\u3002\u6211\u4eec\u8fd8\u652f\u6301\u68ee\u6797\u56fe\u4e2d\u7684\u4e34\u5e8a\u8bc1\u636e\u7efc\u5408\uff0c\u7ecf\u516b\u540d\u4eba\u7c7b\u6807\u6ce8\u5458\u9a8c\u8bc1\uff0c\u4ed6\u4eec\u666e\u904d\u66f4\u504f\u597dTrialMind\uff0c\u5176\u5728\u6d89\u53ca\u7684\u5ba1\u67e5\u4e2d\u80dc\u51fa\u7387\u4e3a62.5%\u81f3100%\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4e34\u5e8a\u8bc1\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u5982TrialMind\uff0c\u80fd\u591f\u4fc3\u8fdb\u53ef\u9760\u4e14\u9ad8\u8d28\u91cf\u7684\u4e34\u5e8a\u8bc1\u636e\u5408\u6210\uff0c\u4ece\u800c\u63d0\u5347\u4e34\u5e8a\u7814\u7a76\u7684\u6548\u7387\u3002|\n", "2406.17753": "|**2024-06-25**|**Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language**|Amalie Brogaard Pauli 
et.al.|[2406.17753](http://arxiv.org/abs/2406.17753)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u9762\u5bf9\u5927\u91cf\u8bd5\u56fe\u5f71\u54cd\u6211\u4eec\u7684\u4fe1\u606f\uff0c\u5982\u9884\u544a\u6d88\u606f\u3001\u8fa9\u8bba\u3001\u5e26\u6709\u653f\u6cbb\u8272\u5f69\u7684\u65b0\u95fb\u548c\u5ba3\u4f20\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u5177\u6709\u8bf4\u670d\u529b\u6587\u672c\u7684\u80fd\u529b\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80\u4e13\u6ce8\u4e8e\u7279\u5b9a\u9886\u57df\u6216\u7c7b\u578b\u529d\u8bf4\u7684\u7814\u7a76\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u5206\u6790\uff0c\u65e8\u5728\u6d4b\u91cf\u548c\u57fa\u51c6LLMs\u5728\u88ab\u660e\u786e\u8981\u6c42\u589e\u5f3a\u6216\u51cf\u5c11\u8bf4\u670d\u529b\u65f6\uff0c\u4ee5\u53ca\u4ec5\u8981\u6c42\u8fdb\u884c\u91ca\u4e49\u65f6\u4ea7\u751f\u8bf4\u670d\u6027\u6587\u672c\u7684\u7a0b\u5ea6\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u2014\u2014\u201cPersuasive-Pairs\u201d\uff0c\u5305\u542b\u4e00\u7ec4\u7531\u7b80\u77ed\u6587\u672c\u548cLLM\u91cd\u5199\u4ee5\u653e\u5927\u6216\u524a\u5f31\u8bf4\u670d\u529b\u7684\u6587\u672c\u5bf9\u3002\u6211\u4eec\u5bf9\u8fd9\u4e9b\u914d\u5bf9\u8fdb\u884c\u4e86\u591a\u6807\u6ce8\uff0c\u6309\u76f8\u5bf9\u5c3a\u5ea6\u8bc4\u4f30\u5176\u8bf4\u670d\u529b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u4e0d\u4ec5\u672c\u8eab\u5177\u6709\u4ef7\u503c\uff0c\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u5b83\u8bad\u7ec3\u4e00\u4e2a\u56de\u5f52\u6a21\u578b\uff0c\u9884\u6d4b\u6587\u672c\u5bf9\u4e4b\u95f4\u8bf4\u670d\u529b\u7684\u5f97\u5206\uff0c\u4ece\u800c\u80fd\u591f\u5bf9\u4e0d\u540c\u9886\u57df\u7684LLMs\u8fdb\u884c\u8bc4\u5206\u548c\u6bd4\u8f83\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u4e0d\u540c\u7cfb\u7edf\u63d0\u793a\u5bf9LLaMA3\u4ea7\u751f\u7684\u5f71\u54cd\uff0c\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5373\u4f7f\u5728\u4ec5\u8981\u6c42\u91ca\u4e49\u7684\u60c5\u51b5\u4
e0b\uff0c\u4e0d\u540c\u7684\u201c\u89d2\u8272\u201d\u63d0\u793a\u4e5f\u4f1a\u663e\u8457\u6539\u53d8\u6587\u672c\u4e2d\u7684\u8bf4\u670d\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u7814\u7a76LLM\u751f\u6210\u6587\u672c\u4e2d\u7684\u8bf4\u670d\u8bed\u8a00\u7684\u91cd\u8981\u6027\u3002|\n", "2406.17737": "|**2024-06-25**|**LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users**|Elinor Poole-Dayan et.al.|[2406.17737](http://arxiv.org/abs/2406.17737)|null|\u5728\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5173\u4e8e\u5b83\u4eec\u7684\u4e0d\u53ef\u9760\u884c\u4e3a\uff0c\u5982\u865a\u6784\u548c\u504f\u89c1\u7684\u7814\u7a76\u5c42\u51fa\u4e0d\u7a77\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86LLMs\u7684\u56de\u7b54\u8d28\u91cf\u5728\u4fe1\u606f\u51c6\u786e\u6027\u3001\u771f\u5b9e\u6027\u4ee5\u53ca\u62d2\u7edd\u56de\u7b54\u65b9\u9762\uff0c\u5982\u4f55\u968f\u7740\u4e09\u79cd\u7528\u6237\u7279\u5f81\u7684\u53d8\u5316\u800c\u53d8\u5316\uff1a\u82f1\u8bed\u6c34\u5e73\u3001\u6559\u80b2\u7a0b\u5ea6\u548c\u56fd\u7c4d\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u6700\u5148\u8fdb\u7684LLMs\u548c\u4e24\u4e2a\u4e8b\u5b9e\u6838\u67e5\u76f8\u5173\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u91cd\u70b9\u5173\u6ce8\u5176\u771f\u5b9e\u6027\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684LLMs\u5bf9\u82f1\u8bed\u80fd\u529b\u8f83\u4f4e\u3001\u6559\u80b2\u6c34\u5e73\u8f83\u4f4e\u4ee5\u53ca\u975e\u7f8e\u56fd\u7c4d\u7528\u6237\u7684\u56de\u7b54\u8d28\u91cf\u5b58\u5728\u66f4\u660e\u663e\u7684\u8d1f\u9762\u503e\u5411\uff0c\u8fd9\u4f7f\u5f97\u8fd9\u4e9b\u6a21\u578b\u5bf9\u4e8e\u5176\u6700\u5f31\u52bf\u7528\u6237\u6765\u8bf4\uff0c\u5e76\u975e\u53ef\u9760\u7684\u4fe1\u606f\u6765\u6e90\u3002|\n", "2406.17706": "|**2024-06-25**|**FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model**|Feijie Wu 
et.al.|[2406.17706](http://arxiv.org/abs/2406.17706)|**[link](https://github.com/HarliWu/FedBiOT)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ecf\u8fc7\u9002\u5f53\u9886\u57df\u7279\u5b9a\u6570\u636e\u7684\u5fae\u8c03\u540e\uff0c\u5728\u8bb8\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u51fa\u8272\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u4e13\u7528\u6570\u636e\u901a\u5e38\u5206\u5e03\u5728\u591a\u4e2a\u6240\u6709\u8005\u4e4b\u95f4\uff0c\u8fd9\u5c31\u63d0\u51fa\u4e86\u5982\u4f55\u5728\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u4e2d\u8fdb\u884cLLM\u5fae\u8c03\u7684\u95ee\u9898\u3002\u9762\u5bf9\u6709\u9650\u7684\u8ba1\u7b97\u548c\u901a\u4fe1\u80fd\u529b\uff0cFL\u5ba2\u6237\u7aef\u5728\u6709\u6548\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u9762\u4e34\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86FedBiOT\uff0c\u4e00\u79cd\u65e8\u5728\u63d0\u9ad8\u8d44\u6e90\u6548\u7387\u7684LLM\u5fae\u8c03FL\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u670d\u52a1\u5668\u751f\u6210\u4e00\u4e2a\u538b\u7f29\u7684LLM\uff0c\u5e76\u786e\u4fdd\u5176\u6027\u80fd\u4e0e\u5b8c\u6574\u6a21\u578b\u76f8\u5f53\u3002\u7136\u540e\uff0c\u5ba2\u6237\u7aef\u9488\u5bf9\u8fd9\u4e2a\u538b\u7f29\u6a21\u578b\u7684\u4e00\u4e2a\u8f7b\u91cf\u4f46\u91cd\u8981\u7684\u90e8\u5206\u2014\u2014\u9002\u914d\u5668\u8fdb\u884c\u5fae\u8c03\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u7531\u4e8e\u670d\u52a1\u5668\u65e0\u6cd5\u8bbf\u95ee\u5ba2\u6237\u7aef\u62e5\u6709\u7684\u79c1\u4eba\u6570\u636e\uff0c\u670d\u52a1\u5668\u7528\u4e8e\u6821\u51c6\u7684\u6570\u636e\u5206\u5e03\u4e0e\u5ba2\u6237\u7aef\u7528\u4e8e\u5fae\u8c03\u7684\u6570\u636e\u4e0d\u540c\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5efa\u6a21\u4e3a\u4e00\u4e2a\u5e26\u6709\u6570\u636e\u4e0d\u4e00\u81f4\u6027\u5f71\u54cd\u7684 bilevel 
\u4f18\u5316\u95ee\u9898\uff0c\u5e76\u5bfc\u51fa\u4e86\u670d\u52a1\u5668\u548c\u5ba2\u6237\u7aef\u7684\u66f4\u65b0\u89c4\u5219\u3002\u6211\u4eec\u5728 LLaMA-2 \u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u9002\u914d\u5668\u5728\u91cd\u65b0\u6574\u5408\u5230\u5168\u5c40\u8bed\u8a00\u6a21\u578b\u65f6\u8868\u73b0\u51fa\u8272\u3002\u5b9e\u9a8c\u7ed3\u679c\u8fd8\u8868\u660e\uff0cFedBiOT \u76f8\u6bd4\u73b0\u6709\u57fa\u51c6\u663e\u8457\u51cf\u5c11\u4e86\u8d44\u6e90\u6d88\u8017\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u76f8\u8fd1\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692](http://arxiv.org/abs/2406.17692)|**[link](https://github.com/thomlake/investigating-alignment)**|**\u8be5\u7814\u7a76\u5206\u6790\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ecf\u8fc7\u6821\u51c6\u540e\u8f93\u51fa\u5206\u5e03\u7684\u53d8\u5316\u7279\u6027\u3002\u9996\u5148\uff0c\u91cd\u65b0\u8bc4\u4f30\u4e86\u4e4b\u524d\u5173\u4e8e\u6821\u51c6\u540e\u54cd\u5e94\u591a\u6837\u6027\u964d\u4f4e\u7684\u62a5\u544a\uff0c\u53d1\u73b0\u8fd9\u79cd\u4e0b\u964d\u4e3b\u8981\u5f52\u56e0\u4e8e\u8d28\u91cf\u63a7\u5236\u548c\u4fe1\u606f\u6574\u5408\u3002\u6821\u51c6\u80fd\u591f\u6291\u5236\u4e0d\u76f8\u5173\u548c\u65e0\u5e2e\u52a9\u7684\u5185\u5bb9\uff0c\u540c\u65f6\u4f7f\u8f93\u51fa\u5206\u5e03\u503e\u5411\u4e8e\u66f4\u957f\u7684\u3001\u6db5\u76d6\u591a\u4e2a\u57fa\u7840LLM\u54cd\u5e94\u4fe1\u606f\u7684\u7b54\u6848\uff0c\u5b9e\u8d28\u4e0a\u662f\u5c06\u591a\u6837\u5316\u4fe1\u606f\u6c47\u603b\u5728\u5355\u4e2a\u54cd\u5e94\u4e2d\u3002\u7814\u7a76\u5e76\u672a\u53d1\u73b0\u6821\u51c6\u663e\u8457\u51cf\u5c11\u6709\u7528\u4fe1\u606f\uff0c\u8fdb\u800c\u5f15\u51fa\u95ee\u9898\uff1a\u6821\u51c6\u6a21\u578b\u662f\u5426\u4f1a\u4ea7\u751f\u57fa\u7840\u6a21\u578b\u65e0\u6cd5\u518d\u73b0\u7684\u4fe1\u606f\uff1f\u7b2c\u4e8c\u90e8\u5206\u7
684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u60c5\u51b5\u5e76\u975e\u5982\u6b64\uff0c\u6821\u51c6\u6a21\u578b\u7684\u884c\u4e3a\u53ef\u4ee5\u901a\u8fc7\u57fa\u7840\u6a21\u578b\u5728\u65e0\u9700\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u590d\u73b0\u3002\u901a\u8fc7\u4e0a\u4e0b\u6587\u793a\u4f8b\u548c\u8f83\u4f4e\u5206\u8fa8\u7387\u7684\u8bed\u4e49\u63d0\u793a\uff0c\u53ef\u4ee5\u4ece\u57fa\u7840LLMs\u5f15\u5bfc\u51fa\u4e0e\u6821\u51c6\u540e\u7684\u76f8\u4f3c\u54cd\u5e94\uff0c\u751a\u81f3\u4e0e\u6821\u51c6\u540e\u7684\u54cd\u5e94\u4e4b\u95f4\u7684\u76f8\u4f3c\u5ea6\u63a5\u8fd1\u3002\u8fd9\u4e9b\u53d1\u73b0\u652f\u6301\u201c\u8868\u9762\u6821\u51c6\u5047\u8bbe\u201d\uff0c\u5373\u5f53\u524d\u7684\u6821\u51c6\u6280\u672f\u4ec5\u6355\u6349\u4e86\u52a9\u624b\u578b\u57fa\u7840LLM\u884c\u4e3a\u4e2d\u6709\u7528\u7684\u90e8\u5206\uff0c\u5e76\u672a\u6269\u5c55\u5176\u80fd\u529b\u3002\u6b64\u5916\uff0c\u5b83\u4eec\u8fd8\u663e\u793a\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u6821\u51c6\u4f5c\u4e3a\u4e00\u79cd\u6a21\u4eff\u6821\u51c6LLMs\u7684\u7b56\u7565\uff0c\u6548\u679c\u51fa\u4eba\u610f\u6599\u5730\u597d\uff0c\u4e14\u65e0\u9700\u5fae\u8c03\u3002\u7814\u7a76\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2406.17681": "|**2024-06-25**|**VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation**|Kun Qian 
et.al.|[2406.17681](http://arxiv.org/abs/2406.17681)|**[link](https://github.com/qbetterk/VarBench)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4f20\u7edf\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8868\u73b0\u65e5\u76ca\u51fa\u8272\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u4eba\u5458\u5f00\u59cb\u5173\u6ce8\u9884\u8bad\u7ec3\u671f\u95f4\u7684\u57fa\u51c6\u6570\u636e\u6cc4\u9732\u95ee\u9898\uff0c\u901a\u5e38\u79f0\u4e3a\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002\u4e3a\u4e86\u786e\u4fdd\u516c\u6b63\u7684\u8bc4\u4f30\uff0c\u6700\u8fd1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u516c\u5f00\u8bad\u7ec3\u548c\u9a8c\u8bc1\u96c6\uff0c\u5bf9\u6d4b\u8bd5\u96c6\u6807\u7b7e\u4fdd\u5bc6\u3002\u4ed6\u4eec\u8981\u6c42\u4efb\u4f55\u5e0c\u671b\u8bc4\u4f30\u81ea\u5df1\u8bed\u8a00\u6a21\u578b\u7684\u4eba\u90fd\u9700\u8981\u63d0\u4ea4\u6a21\u578b\u7684\u9884\u6d4b\u7ed3\u679c\uff0c\u8fdb\u884c\u96c6\u4e2d\u5904\u7406\uff0c\u7136\u540e\u5728\u6392\u884c\u699c\u4e0a\u516c\u5e03\u6a21\u578b\u7684\u5f97\u5206\u3002\u7136\u800c\uff0c\u8fd9\u4e2a\u63d0\u4ea4\u8fc7\u7a0b\u65e2\u4f4e\u6548\u53c8\u59a8\u788d\u4e86\u6709\u6548\u7684\u9519\u8bef\u5206\u6790\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u52a8\u6001\u5316\u57fa\u51c6\u6d4b\u8bd5\u5e76\u5b9e\u65f6\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u6bcf\u4e2a\u6d4b\u8bd5\u6848\u4f8b\u4e2d\u63d0\u53d6\u53d8\u91cf\uff0c\u5e76\u4e3a\u6bcf\u4e2a\u53d8\u91cf\u5b9a\u4e49\u4e00\u4e2a\u503c\u8303\u56f4\u3002\u6bcf\u6b21\u8bc4\u4f30\u65f6\uff0c\u6211\u4eec\u4f1a\u4ece\u8fd9\u4e9b\u503c\u57df\u4e2d\u62bd\u53d6\u65b0\u7684\u503c\u6765\u521b\u5efa\u72ec\u7279\u7684\u6d4b\u8bd5\u6848\u4f8b\uff0c\u4ece\u800c\u4fdd\u8bc1\u6bcf\u6b21\u90fd\u662f\u5168\u65b0\u7684\u8bc4\u4f30\u3002 
\u6211\u4eec\u9488\u5bf9\u6570\u5b66\u751f\u6210\u4efb\u52a1\u7684GSM8K\u3001\u591a\u9879\u9009\u62e9\u4efb\u52a1\u7684ARC\u3001commonsense\u95ee\u7b54\u7684CommonsenseQA\u4ee5\u53caTruthfulQA\u7684\u771f\u5b9e\u6027\u95ee\u7b54\u4efb\u52a1\uff0c\u5e94\u7528\u4e86\u8fd9\u79cd\u53d8\u91cf\u6270\u52a8\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u66f4\u51c6\u786e\u5730\u8861\u91cf\u8bed\u8a00\u6a21\u578b\u7684\u771f\u5b9e\u80fd\u529b\uff0c\u6709\u6548\u7f13\u89e3\u4e86\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002|\n", "2406.17675": "|**2024-06-25**|**Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models**|Yuan Li et.al.|[2406.17675](http://arxiv.org/abs/2406.17675)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\uff0c\u65e5\u76ca\u626e\u6f14\u7c7b\u4f3c\u4eba\u7c7b\u52a9\u624b\u7684\u89d2\u8272\u3002\u793e\u4f1a\u5bf9\u5c06LLMs\u66f4\u5e7f\u6cdb\u5730\u878d\u5165\u5176\u4e2d\u4ea7\u751f\u4e86\u5174\u8da3\uff0c\u63a2\u8ba8\u5b83\u4eec\u662f\u5426\u5177\u5907\u5fc3\u7406\u7279\u8d28\uff0c\u4ee5\u53ca\u8fd9\u4e9b\u7279\u8d28\u662f\u5426\u7a33\u5b9a\u4e14\u6709\u52a9\u4e8e\u7406\u89e3\u5176\u884c\u4e3a\u3002\u672c\u6587\u501f\u9274\u5fc3\u7406\u5b66\u6d4b\u91cf\u5b66\u7684\u65b9\u6cd5\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u7528\u4e8e\u7814\u7a76LLMs\u4e2d\u7684\u5fc3\u7406\u5b66\uff0c\u5305\u62ec\u5fc3\u7406\u7ef4\u5ea6\u8bc6\u522b\u3001\u8bc4\u4f30\u6570\u636e\u96c6\u521b\u5efa\u548c\u7ed3\u679c\u9a8c\u8bc1\u3002\u5728\u6b64\u6846\u67b6\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684LLM\u5fc3\u7406\u6d4b\u91cf\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u516d\u79cd\u5fc3\u7406\u7ef4\u5ea6\uff1a\u4e2a\u6027\u3001\u4ef7\u503c\u89c2\u3001\u60c5\u7eea\u3001\u5fc3\u667a\u7406\u8bba\u3001\u52a8\u673a\u548c\u667a\u529b\u3002\u8fd9\u4e2a\u57fa\u51c6\u5305\u542b\u4e86\u5341\u4e09\u4e2a\u5305\u542b\u591
a\u6837\u573a\u666f\u548c\u9898\u578b\u7684\u6570\u636e\u96c6\u3002\u7814\u7a76\u53d1\u73b0\uff0cLLMs\u5c55\u73b0\u51fa\u5e7f\u6cdb\u7684\u5fc3\u7406\u7279\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u89c2\u5bdf\u5230LLMs\u5728\u81ea\u6211\u62a5\u544a\u7684\u7279\u8d28\u4e0e\u5176\u5b9e\u9645\u884c\u4e3a\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u3002\u8be5\u8bba\u6587\u8be6\u7ec6\u5c55\u793a\u4e86LLMs\u7684\u5fc3\u7406\u6d4b\u91cf\u8bc4\u4f30\uff0c\u4e3aAI\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u53ef\u9760\u8bc4\u4f30\u63d0\u4f9b\u4e86\u6d1e\u89c1\uff0c\u4ee5\u53ca\u53ef\u80fd\u7684\u5e94\u7528\u65b9\u5411\u3002|\n", "2406.18532": "|**2024-06-26**|**Symbolic Learning Enables Self-Evolving Agents**|Wangchunshu Zhou et.al.|[2406.18532](http://arxiv.org/abs/2406.18532)|**[link](https://github.com/aiwaves-cn/agents)**|**\u4eba\u5de5\u667a\u80fd\u754c\u901a\u8fc7\u6784\u5efa\"\u8bed\u8a00\u4ee3\u7406\"\uff08\u5373\u590d\u6742\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7ba1\u9053\uff09\u6765\u63a2\u5bfb\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u9053\u8def\uff0c\u8fd9\u4e9b\u6a21\u578b\u7ed3\u5408\u4e86\u63d0\u793a\u6280\u672f\u548c\u5de5\u5177\u4f7f\u7528\u65b9\u6cd5\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u4f17\u591a\u5b9e\u9645\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f53\u524d\u8bed\u8a00\u4ee3\u7406\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u662f\u5176\u6a21\u578b\u4e2d\u5fc3\u6216\u5de5\u7a0b\u5bfc\u5411\uff1a\u63d0\u793a\u3001\u5de5\u5177\u548c\u7ba1\u9053\u7684\u6539\u8fdb\u4f9d\u8d56\u4e8e\u5927\u91cf\u7684\u4eba\u5de5\u4e13\u5bb6\u8bbe\u8ba1\uff0c\u800c\u975e\u81ea\u52a8\u4ece\u6570\u636e\u5b66\u4e60\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4ece\u6a21\u578b\u4e2d\u5fc3\u5411\u6570\u636e\u4e2d\u5fc3\u8f6c\u53d8\u2014\u2014\u8ba9\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u81ea\u4e3b\u5b66\u4e60\u548c\u9002\u5e94\u73af\u5883\uff0c\u662f\u5b83\u4eec\u8fc8\u5411AGI\u7684\u5173\u952e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\"
\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\"\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\u6027\u7684\u65b9\u6cd5\uff0c\u5b83\u4f7f\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u5728\u6570\u636e\u9a71\u52a8\u7684\u65b9\u5f0f\u4e0b\u81ea\u6211\u4f18\u5316\uff0c\u5229\u7528\u7b26\u53f7\u4f18\u5316\u5668\u3002\u6211\u4eec\u5c06\u4ee3\u7406\u89c6\u4e3a\u5177\u6709\u53ef\u5b66\u4e60\u6743\u91cd\u7684\u7b26\u53f7\u7f51\u7edc\uff0c\u8fd9\u4e9b\u6743\u91cd\u7531\u63d0\u793a\u3001\u5de5\u5177\u53ca\u5176\u7ec4\u5408\u65b9\u5f0f\u5b9a\u4e49\u3002\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u65e8\u5728\u6a21\u4eff\u8fde\u63a5\u4e3b\u4e49\u5b66\u4e60\u4e2d\u7684\u4e24\u4e2a\u57fa\u672c\u7b97\u6cd5\uff1a\u53cd\u5411\u4f20\u64ad\u548c\u68af\u5ea6\u4e0b\u964d\uff0c\u4f46\u5b83\u5904\u7406\u7684\u662f\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7684\u6743\u91cd\u3001\u635f\u5931\u548c\u68af\u5ea6\u3002\u6211\u4eec\u5728\u6807\u51c6\u57fa\u51c6\u548c\u590d\u6742\u73b0\u5b9e\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u6982\u5ff5\u9a8c\u8bc1\u5b9e\u9a8c\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u4f7f\u5f97\u8bed\u8a00\u4ee3\u7406\u5728\u521b\u5efa\u548c\u90e8\u7f72\u540e\u80fd\u591f\u81ea\u6211\u66f4\u65b0\uff0c\u5b9e\u73b0\u4e86\"\u81ea\u6211\u8fdb\u5316\u7684\u4ee3\u7406\"\u3002**|\n", "2406.18528": "|**2024-06-26**|**PrExMe! 
Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**|Christoph Leiter et.al.|[2406.18528](http://arxiv.org/abs/2406.18528)|**[link](https://github.com/gringham/prexme)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5b83\u4eec\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\u4f7f\u5176\u6210\u4e3a\u81ea\u7136\u8bed\u8a00\u751f\u6210\u8bc4\u4ef7\u7684\u6709\u529b\u5de5\u5177\uff0c\u7279\u522b\u9002\u7528\u4e8e\u8d44\u6e90\u532e\u4e4f\u548c\u65f6\u95f4\u9650\u5236\u7684\u573a\u666f\u3002\u672c\u6587\u63d0\u51faPrExMe\uff0c\u4e00\u9879\u5927\u89c4\u6a21\u7684\u63d0\u793a\u63a2\u7d22\u5ea6\u91cf\u6cd5\uff0c\u6211\u4eec\u5728\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u548c\u6458\u8981\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86\u8d85\u8fc7720\u79cd\u5f00\u6e90LLM\u4f5c\u4e3a\u5ea6\u91cf\u6807\u51c6\u7684\u6a21\u677f\uff0c\u603b\u8ba1\u7ea6660\u4e07\u6b21\u8bc4\u4f30\u3002\u8fd9\u9879\u8be6\u5c3d\u7684\u6bd4\u8f83\uff081\uff09\u4e3a\u8fd1\u671f\u5f00\u6e90LLMs\u4f5c\u4e3a\u8bc4\u4ef7\u6307\u6807\u7684\u8868\u73b0\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff1b\uff082\uff09\u63a2\u8ba8\u4e86\u4e0d\u540c\u63d0\u793a\u7b56\u7565\u7684\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4e00\u65b9\u9762\uff0c\u5b58\u5728\u4e00\u4e9b\u60c5\u51b5\u4e0b\u63d0\u793a\u8868\u73b0\u7a33\u5b9a\uff1a\u6709\u4e9bLLMs\u8868\u73b0\u51fa\u7279\u6709\u7684\u504f\u597d\uff0c\u503e\u5411\u4e8e\u4f7f\u7528\u6587\u672c\u6807\u7b7e\u6765\u8bc4\u5206\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u503e\u5411\u4e8e\u8fd4\u56de\u6570\u503c\u5206\u6570\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u63d0\u793a\u7684\u7a33\u5b9a\u6027\u548c\u6a21\u578b\u6392\u540d\u53ef\u80fd\u53d7\u5230\u770b\u4f3c\u5fae\u4e0d\u8db3\u9053\u7684\u66f4\u6539\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u5c06\u8f93\u51fa\u683c\u5f0f\u4ece\u201c0\u5230100\u20
1d\u6539\u4e3a\u201c-1\u5230+1\u201d\u53ef\u80fd\u4f1a\u663e\u8457\u6539\u53d8\u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u6709\u52a9\u4e8e\u7406\u89e3\u4e0d\u540c\u63d0\u793a\u65b9\u6cd5\u5bf9MT\u548c\u6458\u8981\u8bc4\u4ef7\u4e2dLLM-based\u5ea6\u91cf\u7684\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u6700\u7a33\u5b9a\u7684\u63d0\u793a\u6a21\u5f0f\uff0c\u5e76\u6307\u51fa\u4e86\u6f5c\u5728\u5c40\u9650\u6027\u3002|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521](http://arxiv.org/abs/2406.18521)|**[link](https://github.com/princeton-nlp/CharXiv)**|\u5728\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5904\u7406\u79d1\u5b66\u8bba\u6587\u6216\u8d22\u52a1\u62a5\u544a\u7b49\u4efb\u52a1\u65f6\uff0c\u56fe\u8868\u7406\u89e3\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u5f80\u5f80\u96c6\u4e2d\u5728\u7b80\u5316\u548c\u540c\u8d28\u5316\u7684\u56fe\u8868\u4e0a\uff0c\u4ee5\u53ca\u57fa\u4e8e\u6a21\u677f\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u8bc4\u4f30\u8fc7\u4e8e\u4e50\u89c2\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5c3d\u7ba1\u5f00\u6e90\u6a21\u578b\u5728\u73b0\u6709\u57fa\u51c6\u4e0a\u53ef\u80fd\u8868\u73b0\u4f18\u4e8e\u5f3a\u5927\u7684\u4e13\u6709\u6a21\u578b\uff0c\u4f46\u901a\u8fc7\u7b80\u5355\u7684\u538b\u529b\u6d4b\u8bd5\uff0c\u5982\u6539\u53d8\u56fe\u8868\u6216\u95ee\u9898\uff0c\u6027\u80fd\u4f1a\u4e0b\u964d\u9ad8\u8fbe34.5%\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faCharXiv\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,323\u4e2a\u6765\u81eaarXiv\u8bba\u6587\u7684\u81ea\u7136\u3001\u590d\u6742\u4e14\u591a\u6837\u5316\u7684\u56fe\u8868\u7684\u5168\u9762\u8bc4\u4f30\u5957\u4ef6\u3002CharXiv\u5305\u62ec\u4e24\u7c7b\u95ee\u9898\uff1a1\uff09\u63cf\u8ff0\u6027\u95ee\u9898\uff0c\u7528\u4e8e\u68c0\u67e5\u57fa\u6
72c\u56fe\u8868\u5143\u7d20\uff1b2\uff09\u63a8\u7406\u95ee\u9898\uff0c\u9700\u8981\u7efc\u5408\u5206\u6790\u56fe\u8868\u4e2d\u7684\u590d\u6742\u89c6\u89c9\u5143\u7d20\u3002\u6240\u6709\u56fe\u8868\u548c\u95ee\u9898\u90fd\u7531\u4e13\u5bb6\u7cbe\u5fc3\u6311\u9009\u3001\u6574\u7406\u548c\u9a8c\u8bc1\u4ee5\u4fdd\u8bc1\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5f3a\u4e13\u6709\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\uff0c\u51c6\u786e\u7387\u4e3a47.1%\uff09\u4e0e\u6700\u5f3a\u5f00\u6e90\u6a21\u578b\uff08\u5982InternVL Chat V1.5\uff0c\u51c6\u786e\u7387\u4e3a29.2%\uff09\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u8ddd\uff0c\u800c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u5747\u8fdc\u4f4e\u4e8e\u4eba\u7c7b\u768480.5%\u6c34\u5e73\uff0c\u8fd9\u63ed\u793a\u4e86\u73b0\u6709MLLM\u5728\u56fe\u8868\u7406\u89e3\u80fd\u529b\u4e0a\u7684\u4e0d\u8db3\u3002\u6211\u4eec\u5e0c\u671bCharXiv\u80fd\u63a8\u52a8\u672a\u6765\u7684\u7814\u7a76\uff0c\u901a\u8fc7\u63d0\u4f9b\u66f4\u771f\u5b9e\u3001\u66f4\u5177\u4ee3\u8868\u6027\u7684\u8fdb\u6b65\u8861\u91cf\u6807\u51c6\uff0c\u4fc3\u8fdb\u56fe\u8868\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\u3002\u9879\u76ee\u9875\u9762\u548c\u6392\u884c\u699c\u53ef\u8bbf\u95ee\uff1ahttps://charxiv.github.io/\u3002|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512](http://arxiv.org/abs/2406.18512)|null|
\u89e3\u91ca\u662f\u77e5\u8bc6\u5171\u4eab\u7684\u6838\u5fc3\uff0c\u5b83\u5efa\u7acb\u5728\u6c9f\u901a\u539f\u7406\u3001\u793e\u4f1a\u52a8\u6001\u548c\u5b66\u4e60\u7406\u8bba\u4e4b\u4e0a\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e\u5bf9\u8bdd\u5f0f\u7684\u89e3\u91ca\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5176\u73af\u5883\u9ad8\u5ea6\u9002\u5e94\u6027\u548c\u4ea4\u4e92\u6027\u3002\u6211\u4eec\u7684\u7814\u7a76\u5229\u7528\u4e86\u89e3\u91ca\u884c\u4e3a\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7406\u89e3\u89e3\u91ca\u8005\u548c\u88ab\u89e3\u91ca\u8005\u5728\u5bf9\u8bdd\u4e2d\u5982\u4f55\u8fd0\u7528\u7b56\u7565\u8fdb\u884c\u89e3\u91ca\u3001\u7406\u89e3\u548c\u4e92\u52a8\u7684\u5de5\u5177\u3002\u6211\u4eec\u5229\u7528Wachsmuth\u7b49\u4eba\u6784\u5efa\u7684WIRED YouTube\u7cfb\u5217\u6570\u636e\u96c6\uff0c\u5e76\u7531Booshehri\u7b49\u4eba\u8fdb\u884c\u4e86\u5e26\u6709\u89e3\u91ca\u884c\u4e3a\u7684\u6807\u6ce8\uff0c\u8fd9\u4e9b\u6ce8\u91ca\u4e3a\u6211\u4eec\u7406\u89e3\u5bf9\u8bdd\u4e2d\u89e3\u91ca\u8005\u5982\u4f55\u6784\u5efa\u56de\u5e94\u63d0\u4f9b\u4e86\u4f9d\u636e\u3002 
\u968f\u7740\u53bb\u5e74\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u53d1\u5c55\uff0c\u6211\u4eec\u671f\u671b\u66f4\u597d\u5730\u7406\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u80fd\u529b\uff0c\u4ee5\u53ca\u5b83\u4eec\u5982\u4f55\u589e\u5f3a\u4e13\u5bb6\u89e3\u91ca\u8005\u7684\u5bf9\u8bdd\u4ea4\u6d41\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4f7f\u7528\u4e86Booshehri\u7b49\u4eba2023\u5e74\u6807\u6ce8\u76845-Levels\u6570\u636e\u96c6\u6765\u8bc4\u4f30LLMs\u5728\u89e3\u91ca\u6027\u5bf9\u8bdd\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u8bc4\u4ef7LLMs\u751f\u6210\u89e3\u91ca\u8005\u56de\u5e94\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u79cd\u7b56\u7565\uff1a\u4eba\u7c7b\u89e3\u91ca\u8005\u7684\u539f\u59cb\u56de\u5e94\u3001GPT4\u7684\u6807\u51c6\u56de\u5e94\u4ee5\u53ca\u52a0\u5165\u4e86\u89e3\u91ca\u6b65\u9aa4\u7684GPT4\u56de\u5e94\u3002\u6211\u4eec\u9080\u8bf7\u4eba\u7c7b\u6807\u6ce8\u8005\u5bf9\u8fd9\u4e09\u79cd\u7b56\u7565\u8fdb\u884c\u8bc4\u4f30\u3002|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505](http://arxiv.org/abs/2406.18505)|null|
\u5c3d\u7ba1\u73b0\u4ee3\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u63a8\u7406\u80fd\u529b\uff0c\u7406\u8bba\u4e0a\u80fd\u591f\u8868\u8fbe\u4efb\u610f\u53ef\u80fd\u7684\u4ee4\u724c\u5206\u5e03\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5229\u7528\u9884\u8bad\u7ec3\u65f6\u79ef\u7d2f\u7684\u4e16\u754c\u77e5\u8bc6\u6765\u7406\u89e3\u7269\u7406\u4e16\u754c\u4e2d\u7684\u4ee3\u7406\u884c\u4e3a\uff0c\u8fd9\u4e00\u65b9\u9762\u4ecd\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u7814\u7a76\u9996\u6b21\u5b9e\u8bc1\u8003\u5bdf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u901a\u8fc7\u63a8\u7406\u5206\u6790\u4ee3\u7406\u7684\u884c\u4e3a\u53ca\u5176\u5bf9\u72b6\u6001\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u6784\u5efa\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\uff08agent mental modeling\uff09\u7684\u80fd\u529b\u3002\u8fd9\u53ef\u80fd\u63ed\u793a\u51fa\u5229\u7528LLMs\u89e3\u6790\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4ee3\u7406\u884c\u4e3a\u7684\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u89e3\u91ca\u5f3a\u5316\u5b66\u4e60\uff08XRL\uff09\u7684\u5173\u952e\u6311\u6218\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u7279\u5b9a\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5728\u4e0d\u540c\u590d\u6742\u5ea6\u7684RL\u4efb\u52a1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6d4b\u8bd5\uff0c\u62a5\u544a\u5173\u4e8e\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\u5efa\u7acb\u7684\u7814\u7a76\u7ed3\u679c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u7684LLMs\u8fd8\u65e0\u6cd5\u4ec5\u901a\u8fc7\u63a8\u7406\u5b8c\u5168\u5b9e\u73b0\u4ee3\u7406\u7684\u5fc3\u7406\u5efa\u6a21\uff0c\u8fd9\u9700\u8981\u8fdb\u4e00\u6b65\u521b\u65b0\u3002\u56e0\u6b64\uff0c\u8fd9\u9879\u5de5\u4f5c\u63d0\u4f9b\u4e86\u5bf9\u73b0\u4ee3LLMs\u80fd\u529b\u548c\u5c40\u9650\u6027\u7684\u65b0\u89c1\u89e3\u3002|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501](http://arxiv.org/abs/2406.18501)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5185\u63d2\u5b66\u4e60\uff08in-context learning\uff0cICL\uff09\u80fd\u529b\uff0c\u5e76\u5c06\u5176\u4e0e\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u8fdb\u884c\u529f\u80fd\u7b49\u6548\u6027\u8bca\u65ad\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u9006\u9891\u7387\u6548\u5e94\uff08inverse frequency effect\uff0cIFE\uff09\u6765\u5206\u6790\u3002IFE\u73b0\u8c61\u6307\u7684\u662f\u5728\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0c\u6a21\u578b\u5e94\u5bf9\u7f55\u89c1\u6837\u4f8b\u4ea7\u751f\u7684\u66f4\u65b0\u5e45\u5ea6\u5927\u4e8e\u5e38\u89c1\u6837\u4f8b\u3002\u5728\u5fc3\u7406\u5b66\u4e2d\uff0c\u4eba\u7c7b\u5728\u7ed3\u6784\u5316\u63d0\u793a\uff08\u5982\u503e\u5411\u4e8e\u91cd\u590d\u6700\u8fd1\u63a5\u89e6\u7684\u53e5\u5b50\u7ed3\u6784\uff09\u60c5\u5883\u4e2d\u8868\u73b0\u51faIFE\uff0c\u8fd9\u8868\u660e\u5176\u53ef\u80fd\u6d89\u53ca\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u673a\u5236\u3002\u5b9e\u9a8c\u901a\u8fc7\u6a21\u62df\u7ed3\u6784\u5316\u63d0\u793a\u5728ICL\u4e2d\u7684\u5f71\u54cd\u53d1\u73b0\uff0cLLMs\u540c\u6837\u663e\u793a\u51faIFE\uff0c\u4e14\u8fd9\u4e00\u6548\u5e94\u5728\u66f4\u5927\u7684\u6a21\u578b\u4e2d\u66f4\u4e3a\u660e\u663e\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u7ed3\u679c\u652f\u6301\u4e86ICL\u672c\u8d28\u4e0a\u662f\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u7684\u5047\u8bbe\uff0c\u5373\u5728ICL\u7684\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u9690\u542b\u5730\u8ba1\u7b97\u4e86\u68af\u5ea6\u3002\u8bba\u6587\u7ed3\u8bba\u6307\u51fa\uff0c\u4eba\u7c7b\u548cLLMs\u90fd\u4f7f\u7528\u4e86\u57fa\u4e8e\u68af\u5ea6\u7684\u3001\u9519\u8bef\u9a71\u52a8\u7684\u5904\u7406\u673a\u5236\u3002|\n", "2406.18460": "|**2024-06-26**|**Role-Play Zero-Shot Prompting with 
Large Language Models for Open-Domain Human-Machine Conversation**|Ahmed Njifenjou et.al.|[2406.18460](http://arxiv.org/abs/2406.18460)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b9\u6cd5\u6765\u521b\u5efa\u80fd\u591f\u8fdb\u884c\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u56de\u7b54\u7528\u6237\u95ee\u9898\uff0c\u4f46\u5c40\u9650\u4e8e\u5355\u5411\u95ee\u7b54\u5f62\u5f0f\uff0c\u800c\u975e\u771f\u6b63\u7684\u5bf9\u8bdd\u3002\u901a\u5e38\uff0c\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u6765\u8c03\u6574\u5b83\u4eec\u7684\u4ea4\u6d41\u98ce\u683c\uff0c\u4f46\u8fd9\u65e2\u6602\u8d35\u53c8\u9650\u4e8e\u5c11\u6570\u8bed\u8a00\u3002\u672c\u7814\u7a76\u63a2\u7d22\u4e86\u89d2\u8272\u626e\u6f14\u7684\u96f6\u6837\u672c\u63d0\u793a\u4f5c\u4e3a\u63d0\u9ad8\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u6548\u7387\u548c\u6210\u672c\u6548\u76ca\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u591a\u8bed\u8a00\u80fd\u529b\u5f3a\u7684\u8bad\u7ec3\u6709\u7d20\u6a21\u578b\uff08Beeching\u7b49\u4eba\uff0c2023\u5e74\uff09\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u9075\u5faa\u6307\u4ee4\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\u7cfb\u7edf\uff0c\u5f53\u4e0e\u9075\u5faa\u6307\u4ee4\u7684\u6a21\u578b\u2014\u2014\u8fd9\u91cc\u4f7f\u7528Vicuna\uff08Chiang\u7b49\u4eba\uff0c2023\u5e74\uff09\u7ed3\u5408\u65f6\uff0c\u80fd\u591f\u751f\u6210\u5728\u6cd5\u8bed\u4e2d\u7684\u5bf9\u8bdd\u4ee3\u7406\uff0c\u5728\u4e24\u9879\u4efb\u52a1\u4e2d\u751a\u81f3\u8d85\u8d8a\u4e86\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u5e76\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.18449": "|**2024-06-26**|**Cascading Large Language Models for Salient Event Graph Generation**|Xingwei Tan 
et.al.|[2406.18449](http://arxiv.org/abs/2406.18449)|**[link](https://github.com/xingwei-warwick/callmsae)**|\u7531\u4e8e\u957f\u6587\u6863\u4e2d\u4e8b\u4ef6\u68c0\u6d4b\u3001\u5173\u7cfb\u8bc6\u522b\u4ee5\u53ca\u975e\u7ed3\u6784\u5316\u8f93\u5165\u4e0e\u7ed3\u6784\u5316\u56fe\u8c31\u7684\u6574\u5408\u7b49\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u4ece\u6587\u672c\u751f\u6210\u4e8b\u4ef6\u56fe\u8c31\u662f\u4e00\u9879\u6311\u6218\u3002\u5f53\u524d\u7684\u7814\u7a76\u5f80\u5f80\u540c\u7b49\u91cd\u89c6\u6240\u6709\u4e8b\u4ef6\uff0c\u672a\u80fd\u533a\u5206\u5bf9\u7406\u89e3\u53d9\u4e8b\u81f3\u5173\u91cd\u8981\u7684\u5173\u952e\u4e8b\u4ef6\u3002\u672c\u6587\u63d0\u51faCALLMSAE\uff0c\u4e00\u4e2a\u57fa\u4e8eCAscading\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684SAlient Event\u56fe\u8c31\u751f\u6210\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u7684\u80fd\u529b\uff0c\u5e76\u907f\u514d\u4e86\u6602\u8d35\u7684\u4eba\u5de5\u6807\u6ce8\u9700\u6c42\u3002\u9996\u5148\uff0c\u901a\u8fc7\u63d0\u793aLLMs\u751f\u6210\u6458\u8981\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u91cd\u8981\u4e8b\u4ef6\u3002\u7136\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8fed\u4ee3\u7684\u4ee3\u7801\u7cbe\u70bc\u63d0\u793a\u7b56\u7565\uff0c\u7528\u4e8e\u751f\u6210\u4e8b\u4ef6\u5173\u7cfb\u56fe\uff0c\u6d88\u9664\u9519\u8bef\u7684\u5173\u7cfb\u5e76\u6062\u590d\u7f3a\u5931\u7684\u8fb9\u3002\u5bf9\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u56fe\u8c31\u751f\u6210\u6a21\u578b\u8fdb\u884c fine-tuning\uff0c\u5728\u4f7f\u7528 LLM \u751f\u6210\u7684\u56fe\u8c31\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f18\u4e8e\u4f7f\u7528 CAEVO \u751f\u6210\u6570\u636e\u8bad\u7ec3\u7684\u6a21\u578b\u3002\u5728\u4eba\u7c7b\u6807\u6ce8\u7684\u6d4b\u8bd5\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u751f\u6210\u66f4\u7a81\u51fa\u4e14\u51c6\u786e\u7684\u56fe\u8c31\uff0c\u8d85\u8d8a\u4e86\u7ade\u4e89\u6027\u7684\u57fa\u7ebf\u3002|\n", "2406.18440": "|**2024-06-26**|**New intelligent 
empowerment for digital transformation**|Peng Yifeng et.al.|[2406.18440](http://arxiv.org/abs/2406.18440)|null|\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u521b\u65b0\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u8861\u91cf\u4f01\u4e1a\u7684\u6570\u5b57\u5316\u8f6c\u578b\uff08DT\uff09\u8fc7\u7a0b\u3002\u901a\u8fc7\u5bf92005\u5e74\u81f32022\u5e74\u95f4\u5728\u7ebd\u7ea6\u8bc1\u5238\u4ea4\u6613\u6240\u548c\u7eb3\u65af\u8fbe\u514b\u4e0a\u5e02\u76844407\u5bb6\u516c\u53f8\u7684\u5e74\u5ea6\u62a5\u544a\u8fdb\u884c\u5206\u6790\uff0c\u6784\u5efa\u4e86\u4e00\u5957\u5168\u9762\u7684DT\u6307\u6807\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cDT\u663e\u8457\u63d0\u9ad8\u4e86\u4f01\u4e1a\u7684\u8d22\u52a1\u8868\u73b0\u3002\u7136\u800c\uff0c\u4e0d\u540c\u7684\u6570\u5b57\u6280\u672f\u5bf9\u8d22\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u5404\u4e0d\u76f8\u540c\uff0c\u533a\u5757\u94fe\u6280\u672f\u7684\u79ef\u6781\u5f71\u54cd\u76f8\u5bf9\u8f83\u5c0f\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0DT\u901a\u8fc7\u63d0\u5347\u8fd0\u8425\u6548\u7387\u548c\u964d\u4f4e\u6210\u672c\u4fc3\u8fdb\u8d22\u52a1\u7ee9\u6548\u589e\u957f\u3002\u672c\u7814\u7a76\u4e3a\u5b66\u672f\u754c\u63d0\u4f9b\u4e86\u65b0\u7684DT\u8bc4\u4f30\u5de5\u5177\uff0c\u540c\u65f6\u62d3\u5bbd\u4e86\u751f\u6210\u4eba\u5de5\u667a\u80fd\u6280\u672f\u5728\u7ecf\u6d4e\u7814\u7a76\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002|\n", "2406.18406": "|**2024-06-26**|**IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons**|Dan Shi 
et.al.|[2406.18406](http://arxiv.org/abs/2406.18406)|null|\u4eba\u4eec\u666e\u904d\u8ba4\u4e3a\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5927\u89c4\u6a21\u6570\u636e\u8bad\u7ec3\u540e\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7814\u7a76\u63ed\u793a\u4e86LLMs\u751f\u6210\u6587\u672c\u65f6\u7684\u77e5\u8bc6\u51b2\u7a81\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5185\u7f16\u7801\u7684\u53c2\u6570\u77e5\u8bc6\uff08\u5373\u77e5\u8bc6\u5e93\uff09\u4e0e\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u65b0\u77e5\u8bc6\u5b58\u5728\u77db\u76fe\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u6846\u67b6\u2014\u2014IRCAN\uff08\u8bc6\u522b\u548c\u91cd\u6743\u4e0a\u4e0b\u6587\u611f\u77e5\u795e\u7ecf\u5143\uff09\u3002IRCAN\u9996\u5148\u5229\u7528\u6574\u5408\u68af\u5ea6\u8ba1\u7b97\u5f97\u5230\u7684\u4e0a\u4e0b\u6587\u611f\u77e5\u5f52\u56e0\u5206\u6570\uff0c\u6765\u8bc6\u522b\u90a3\u4e9b\u5bf9\u5904\u7406\u8bed\u5883\u81f3\u5173\u91cd\u8981\u7684\u795e\u7ecf\u5143\u3002\u63a5\u7740\uff0c\u901a\u8fc7\u91cd\u65b0\u8d4b\u6743\uff0c\u6211\u4eec\u5f3a\u5316\u8fd9\u4e9b\u8bc6\u522b\u51fa\u7684\u4e0a\u4e0b\u6587\u76f8\u5173\u795e\u7ecf\u5143\uff0c\u4ece\u800c\u5f15\u5bfcLLMs\u751f\u6210\u66f4\u7b26\u5408\u4e0a\u4e0b\u6587\u65b0\u77e5\u8bc6\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5728\u591a\u79cd\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cIRCAN\u4e0d\u4ec5\u663e\u8457\u63d0\u5347\u4e86\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u7684\u80fd\u529b\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u3001\u5373\u63d2\u5373\u7528\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u65e0\u7f1d\u878d\u5165\u73b0\u6709\u6a21\u578b\u4e2d\u3002|\n", "2406.19392": "|**2024-06-27**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen 
et.al.|[2406.19392](http://arxiv.org/abs/2406.19392)|**[link](https://github.com/rextime/rextime)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aReXTime\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e13\u95e8\u9488\u5bf9\u4eba\u5de5\u667a\u80fd\u6a21\u578b\u5728\u89c6\u9891\u4e8b\u4ef6\u4e2d\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\u3002ReXTime\u5173\u6ce8\u7684\u662f\u8de8\u65f6\u95f4\u63a8\u7406\uff0c\u5373\u7406\u89e3\u5f53\u95ee\u9898\u53ca\u5176\u76f8\u5e94\u7684\u7b54\u6848\u51fa\u73b0\u5728\u4e0d\u540c\u7684\u89c6\u9891\u7247\u6bb5\u65f6\u7684\u4eba\u7c7b\u5f0f\u7406\u89e3\u3002\u8fd9\u79cd\u9700\u8981\u6df1\u5165\u7406\u89e3\u89c6\u9891\u7247\u6bb5\u4e4b\u95f4\u56e0\u679c\u5173\u7cfb\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u5bf9\u524d\u6cbf\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u6210\u4e86\u91cd\u5927\u6311\u6218\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8bc4\u4ef7\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u7ba1\u9053\uff0c\u7528\u4e8e\u751f\u6210\u65f6\u95f4\u63a8\u7406\u7684\u95ee\u7b54\u5bf9\uff0c\u5927\u5927\u51cf\u5c11\u4e86\u7e41\u7410\u7684\u624b\u52a8\u6807\u6ce8\u9700\u6c42\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5305\u62ec921\u4e2a\u7cbe\u5fc3\u7b5b\u9009\u7684\u9a8c\u8bc1\u6837\u672c\u548c2,143\u4e2a\u6d4b\u8bd5\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u7ecf\u8fc7\u4eba\u5de5\u7cbe\u5fc3\u6311\u9009\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u3002\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b66\u672f\u6a21\u578b\u4e0a\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7684\u8868\u73b0\u4ecd\u5b58\u5728\u663e\u8457\u768414.3%\u7684\u7cbe\u5ea6\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7ba1\u9053\u65e0\u9700\u4eba\u5de5\u521b\u5efa\u4e86\u4e00\u4e2a\u5305\u542b9,695\u4e2a\u673a\u5668\u751f\u6210\u6837\u672c\u7684\u8bad\u7ec3\u6570\u636e\u96c6
\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6765\u63d0\u5347\u8de8\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u3002**|\n", "2406.19384": "|**2024-06-27**|**The Remarkable Robustness of LLMs: Stages of Inference?**|Vedang Lad et.al.|[2406.19384](http://arxiv.org/abs/2406.19384)|**[link](https://github.com/vdlad/remarkable-robustness-of-llms)**|**\u6211\u4eec\u901a\u8fc7\u5220\u9664\u548c\u4ea4\u6362\u76f8\u90bb\u5c42\u6765\u5c55\u793a\u5e76\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u60ca\u4eba\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u8fdb\u884c\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u8fd9\u4e9b\u5e72\u9884\u63aa\u65bd\u4ecd\u80fd\u4fdd\u7559\u539f\u59cb\u6a21\u578b72%\u81f395%\u7684\u9884\u6d4b\u7cbe\u5ea6\uff0c\u800c\u4e14\u6a21\u578b\u5c42\u6570\u8d8a\u591a\uff0c\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u6839\u636e\u9010\u5c42\u5e72\u9884\u5b9e\u9a8c\u548c\u5176\u4ed6\u5b9e\u9a8c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5047\u8bbe\uff1a\u5b58\u5728\u56db\u79cd\u901a\u7528\u7684\u63a8\u7406\u9636\u6bb5\uff0c\u8de8\u8d8a\u516b\u79cd\u4e0d\u540c\u7684\u6a21\u578b\uff1a\u89e3\u7801\u5668\u9636\u6bb5\uff0c\u5c06\u539f\u59cb\u4ee4\u724c\u8868\u793a\u63d0\u5347\u4e3a\u66f4\u9ad8\u7ea7\u7684\u4e0a\u4e0b\u6587\u8868\u793a\uff1b\u7279\u5f81\u5de5\u7a0b\u9636\u6bb5\uff0c\u8fed\u4ee3\u4f18\u5316\u4efb\u52a1\u548c\u5b9e\u4f53\u7279\u5b9a\u7279\u5f81\uff1b\u7136\u540e\u662f\u6a21\u578b\u7684\u534a\u90e8\u5206\uff0c\u968f\u7740\u4e13\u95e8\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u9690\u85cf\u8868\u793a\u4e0e\u8bcd\u6c47\u7a7a\u95f4\u7684\u5bf9\u9f50\u8fdb\u5165\u4e00\u4e2a\u76f8\u53d8\u9636\u6bb5\uff1b\u6700\u540e\uff0c\u6700\u540e\u4e00\u5c42\u901a\u8fc7\u6d88\u9664\u5bf9\u9884\u6d4b\u9020\u6210\u5e72\u6270\u7684\u8fc7\u65f6\u7279\u5f81\uff0c\u7cbe\u7ec6\u5316\u540e\u7eed\u7684\u4ee4\u724c\u5206\u5e03\u3002**|\n", "2406.19358": "|**2024-06-27**|**The Model Arena 
for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models**|Xiliang Zhu et.al.|[2406.19358](http://arxiv.org/abs/2406.19358)|null|\u60c5\u611f\u5206\u6790\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u626e\u6f14\u7740\u6838\u5fc3\u89d2\u8272\u3002XLM-R\u548cmT5\u7b49\u591a\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u7684\u5173\u6ce8\u5ea6\u63d0\u5347\u3002\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51fa\u73b0\u6781\u5927\u5730\u63a8\u52a8\u4e86\u901a\u7528NLP\u4efb\u52a1\u7684\u53d1\u5c55\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u5728\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u65b9\u9762\u7684\u6027\u80fd\u5c1a\u672a\u5145\u5206\u63a2\u8ba8\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5b9e\u8bc1\u5206\u6790\uff0c\u6bd4\u8f83\u4e86\u516c\u5171\u5c0f\u578b\u591a\u8bed\u8a00\u6a21\u578b\uff08SMLM\uff09\u5982XLM-R\u4e0e\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684LLM\uff08\u5982Llama-3\uff09\u5728\u82f1\u8bed\u3001\u897f\u73ed\u7259\u8bed\u3001\u6cd5\u8bed\u548c\u4e2d\u6587\u7684\u60c5\u611f\u5206\u6790\u4e2d\u7684\u96f6\u6837\u672c\u548c\u5c11\u91cf\u6837\u672c\u8fc1\u79fb\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c31\u516c\u5f00\u6a21\u578b\u800c\u8a00\uff0cSMLM\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u8bbe\u7f6e\u4e2d\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\uff0c\u516c\u5f00LLM\u663e\u793a\u51fa\u66f4\u5f3a\u7684\u9002\u5e94\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u4e13\u6709\u7684GPT-3.5\u548cGPT-4\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u80fd\u529b\u4e0a\u9886\u5148\uff0c\u4f46\u5728\u5c11\u91cf\u6837\u672c\u573a\u666f\u4e0b\uff0c\u5b83\u4eec\u88ab\u516c\u5f00\u6a21\u578b\u8d85\u8d8a\u3002|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for 
Math Multiple-choice Questions**|Nigel Fernandez et.al.|[2406.19356](http://arxiv.org/abs/2406.19356)|**[link](https://github.com/umass-ml4ed/divert)**|\u9ad8\u8d28\u91cf\u7684\u5e72\u6270\u9879\u5bf9\u4e8e\u9009\u62e9\u9898\uff08\u5c24\u5176\u662f\u6570\u5b66\u9009\u62e9\u9898\uff09\u7684\u8bc4\u4f30\u548c\u6559\u5b66\u4ef7\u503c\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u624b\u5de5\u8bbe\u8ba1\u80fd\u591f\u53cd\u6620\u5b66\u751f\u5b9e\u9645\u77e5\u8bc6\u7f3a\u9677\u6216\u8bef\u89e3\u7684\u5e72\u6270\u9879\u662f\u4e00\u9879\u8270\u5de8\u7684\u4efb\u52a1\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982GPT-4\u5728\u751f\u6210\u5e72\u6270\u9879\u65b9\u9762\u6709\u6240\u52a9\u76ca\uff0c\u4f46\u6570\u5b66\u8fd9\u7c7b\u5b66\u79d1\u7684\u5904\u7406\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7406\u89e3\u548c\u751f\u6210\u89e3\u91ca\u6027\u7684\u9519\u8bef\u8868\u793a\uff0c\u4ee5\u751f\u6210\u6570\u5b66\u9009\u62e9\u9898\u7684\u5e72\u6270\u9879\u3002\u672c\u6587\u4ecb\u7ecdDiVERT\uff08\u57fa\u4e8e\u6587\u672c\u7684\u53d8\u5f02\u8bef\u5dee\u751f\u6210\u5668\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u752870\u4ebf\u53c2\u6570\u5f00\u6e90LLM\u7684\u53d8\u5206\u65b9\u6cd5\uff0c\u5b83\u5728\u771f\u5b9e\u4e16\u754c\u6570\u5b66\u9009\u62e9\u9898\u6570\u636e\u96c6\uff08\u5305\u542b1,434\u4e2a\u95ee\u9898\uff0c\u88ab\u6570\u5341\u4e07\u5b66\u751f\u4f7f\u7528\uff09\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684GPT-4\u65b9\u6cd5\uff0cDiVERT\u5728\u5e72\u6270\u9879\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u4e0e\u6570\u5b66\u6559\u80b2\u8005\u7684\u540c\u884c\u8bc4\u5ba1\uff0c\u7ed3\u679c\u8868\u660eDiVERT\u751f\u6210\u7684\u9519\u8bef\u6807\u7b7e\u8d28\u91cf\u63a5\u8fd1\u4eba\u7c7b\u7f16\u5199\u7684\u3002|\n", "2406.19349": "|**2024-06-27**|**IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language**|Lucky Susanto et.al.|[2406.19349](http://arxiv.org/abs/2406.19349)|null|\u9488\u5bf9\u7f51\u7edc\u4ec7\u6068\u8a00\u8bba\u5bf9\u793e\u4f1a\u548c\u8c10\u7684\u4e25\u5cfb\u5a01\u80c1\uff0c\u7279\u522b\u662f\u5728\u5370\u5c3c\u8fd9\u7c7b\u56fd\u5bb6\uff0c\u8fd1\u5e74\u6765\u4ec7\u6068\u8a00\u8bba\u5728\u7ebf\u6bd4\u7387\u589e\u957f\u4e86\u5341\u500d\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u68c0\u6d4b\u673a\u5236\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5145\u8db3\u7684\u6807\u8bb0\u6570\u636e\uff0c\u5c24\u5176\u662f\u9488\u5bf9\u5370\u5c3c\u6587\u672c\u7684\uff0c\u8fd9\u4e00\u8fdb\u5c55\u53d7\u5230\u4e86\u963b\u788d\u3002\u8fb9\u7f18\u5316\u7fa4\u4f53\uff0c\u5982\u4ec0\u53f6\u6d3e\u3001LGBTQ\u7b49\u5c11\u6570\u7fa4\u4f53\uff0c\u9762\u4e34\u7684\u6311\u6218\u66f4\u5927\uff0c\u56e0\u4e3a\u4ec7\u6068\u8a00\u8bba\u62a5\u544a\u4e0d\u8db3\uff0c\u73b0\u6709\u7684\u68c0\u6d4b\u5de5\u5177\u5bf9\u5176\u7406\u89e3\u6709\u9650\u3002\u6b64\u5916\uff0c\u5f53\u524d\u6570\u636e\u96c6\u5bf9\u4e3b\u89c2\u6027\u7684\u5904\u7406\u4e0d\u8db3\uff0c\u52a0\u5267\u4e86\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faIndoToxic2024\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u5370\u5c3c\u4ec7\u6068\u8a00\u8bba\u548c\u6bd2\u6027\u5206\u7c7b\u6570\u636e\u96c6\uff0c\u5305\u542b43,692\u6761\u8bb0\u5f55\uff0c\u753119\u540d\u591a\u5143\u5316\u7684\u4e2a\u4f53\u8fdb\u884c\u6807\u6ce8\uff0c\u7279\u522b\u5173\u6ce8\u9009\u4e3e\u671f\u95f4\u9488\u5bf9\u56fd\u5185\u5f31\u52bf\u7fa4\u4f53\uff08\u5982\u603b\u7e
df\u9009\u4e3e\u4e2d\u7684\u7279\u5b9a\u7fa4\u4f53\uff09\u7684\u6587\u672c\u3002\u6211\u4eec\u4f7f\u7528BERT\u6a21\u578b\uff08IndoBERTweet\uff09\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4e3a\u4e03\u79cd\u4e8c\u5143\u5206\u7c7b\u4efb\u52a1\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff0c\u53d6\u5f97\u4e860.78\u7684\u5b8fF1\u5206\u6570\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5c06\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u878d\u5165\u5176\u4e2d\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578bgpt-3.5-turbo\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u8b66\u544a\uff0c\u8fc7\u5ea6\u4f9d\u8d56\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u53ef\u80fd\u5bfc\u81f4\u7ec6\u5316\u6a21\u578b\u6027\u80fd\u4e0b\u964d\uff0c\u56e0\u4e3a\u8fd9\u4f1a\u5bfc\u81f4\u6570\u636e\u788e\u7247\u5316\u3002|\n", "2406.19317": "|**2024-06-27**|**Jump Starting Bandits with LLM-Generated Prior Knowledge**|Parand A. Alamdari et.al.|[2406.19317](http://arxiv.org/abs/2406.19317)|**[link](https://github.com/BorealisAI/jump-starting-bandits)**|\u6211\u4eec\u63d0\u4f9b\u4e86\u6709\u529b\u7684\u8bc1\u636e\uff0c\u5c55\u793a\u4e86\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u4e0a\u4e0b\u6587\u5316\u591a\u81c2\u8001\u864e\u673a\u6846\u67b6\u76f8\u7ed3\u5408\u7684\u4f18\u52bf\u3002\u4e0a\u4e0b\u6587\u5316\u8001\u864e\u673a\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u7528\u4e8e\u6839\u636e\u7528\u6237\u7279\u5b9a\u7684\u4e0a\u4e0b\u6587\u751f\u6210\u4e2a\u6027\u5316\u5efa\u8bae\u3002\u6211\u4eec\u8868\u660e\uff0c\u7ecf\u8fc7\u5927\u89c4\u6a21\u8bed\u6599\u5e93\u8bad\u7ec3\uff0c\u5bcc\u542b\u4eba\u7c7b\u77e5\u8bc6\u548c\u504f\u597d\u7684LLMs\u80fd\u591f\u5f88\u597d\u5730\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\uff0c\u4ece\u800c\u901a\u8fc7\u542f\u52a8\u4e0a\u4e0b\u6587\u5316\u591a\u81c2\u8001\u864e\u673a\u6765\u51cf\u5c11\u5728\u7ebf\u5b66\u4e60\u7684\u9057\u61be\uff08regret\uff09\u3002\u6211\u4eec\u63d0\u51fa\u4e8
6\u4e00\u79cd\u521d\u59cb\u5316\u7b97\u6cd5\uff0c\u901a\u8fc7\u63d0\u793aLLMs\u751f\u6210\u63a5\u8fd1\u4eba\u7c7b\u504f\u597d\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u4f9b\u8001\u864e\u673a\u5b66\u4e60\u4f7f\u7528\u3002\u8fd9\u663e\u8457\u964d\u4f4e\u4e86\u5728\u7ebf\u5b66\u4e60\u7684\u9057\u61be\u548c\u6570\u636e\u6536\u96c6\u6210\u672c\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4e24\u7ec4\u5b9e\u9a8c\u9a8c\u8bc1\uff0c\u5305\u62ec\u4f7f\u7528LLMs\u4f5c\u4e3a\u5360\u535c\u8005\uff08oracle\uff09\u7684\u5b9e\u9a8c\u548c\u57fa\u4e8e\u8054\u5408\u8c03\u67e5\u5b9e\u9a8c\u6570\u636e\u7684\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u3002|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292](http://arxiv.org/abs/2406.19292)|**[link](https://github.com/edixiong/artificial-needles)**|\u8fd1\u671f\u7684\u7814\u7a76\u6307\u51fa\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u957f\u6587\u672c\u8f93\u5165\u65f6\u5728\u4fe1\u606f\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5408\u6210\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u6570\u503c\u578b\u952e\u503c\u5bf9\u68c0\u7d22\u4efb\u52a1\u3002\u6211\u4eec\u5728GPT-3.5 Turbo\u548cMistral 
7B\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u8fd9\u79cd\u6570\u636e\u96c6\u7684\u5fae\u8c03\u663e\u8457\u63d0\u9ad8\u4e86\u5b83\u4eec\u5728\u957f\u6587\u672c\u73af\u5883\u4e2d\u7684\u4fe1\u606f\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u5206\u6790\u4e86\u5fae\u8c03\u540e\u7684\u6a21\u578b\uff0c\u53d1\u73b0\u5b83\u4eec\u5728\u4ece\u5408\u6210\u4efb\u52a1\u8fc1\u79fb\u5230\u5b9e\u9645\u8bc4\u4f30\uff08\u5982\u572820\u6587\u6863MDQA\u4e2d\u7684\u4f4d\u7f6e10\u5904\u63d0\u534710.5%\uff09\u65b9\u9762\u7684\u8868\u73b0\u6709\u6240\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u7ecf\u8fc7\u6211\u4eec\u5408\u6210\u6570\u636e\u96c6\u5fae\u8c03\u7684LLMs\u5728\u901a\u7528\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u4fdd\u6301\u7a33\u5b9a\uff0c\u800c\u4f7f\u7528\u5176\u4ed6\u57fa\u4e8e\u957f\u6587\u672c\u589e\u5f3a\u6570\u636e\u96c6\u5fae\u8c03\u7684LLMs\u53ef\u80fd\u4f1a\u5bfc\u81f4\u9519\u8bef\u589e\u52a0\uff08\u4f8b\u5982\uff0c\u5728TriviaQA\u4e0a\uff0cMistral 7B\u5728\u6211\u4eec\u7684\u5408\u6210\u6570\u636e\u4e0a\u5fae\u8c03\u65e0\u660e\u663e\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u5176\u4ed6\u57fa\u7ebf\u6570\u636e\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\uff0c\u8303\u56f4\u57282.33%\u52306.19%\u4e4b\u95f4\uff09\u3002\u672c\u7814\u7a76\u7a81\u663e\u4e86\u901a\u8fc7\u5408\u6210\u6570\u636e\u5fae\u8c03\u6765\u63d0\u5347LLMs\u5728\u957f\u6587\u672c\u4efb\u52a1\u6027\u80fd\u7684\u6f5c\u529b\u3002|\n", "2406.19283": "|**2024-06-27**|**PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models**|Cathy Mengying Fang 
et.al.|[2406.19283](http://arxiv.org/abs/2406.19283)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPhysioLLM\u7684\u4e92\u52a8\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u53ef\u7a7f\u6234\u8bbe\u5907\u7684\u751f\u7406\u6570\u636e\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u5065\u5eb7\u7406\u89e3\u548c\u63a2\u7d22\u3002\u4e0e\u5546\u4e1a\u5065\u5eb7\u5e94\u7528\u4e0d\u540c\uff0cPhysioLLM\u5177\u5907\u5168\u9762\u7684\u7edf\u8ba1\u5206\u6790\u529f\u80fd\uff0c\u80fd\u53d1\u73b0\u7528\u6237\u6570\u636e\u4e2d\u7684\u5173\u8054\u548c\u8d8b\u52bf\u3002\u7528\u6237\u53ef\u4ee5\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u95ee\uff0c\u83b7\u53d6\u751f\u6210\u7684\u4e2a\u6027\u5316\u6d1e\u5bdf\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u4fe1\u606f\u5236\u5b9a\u884c\u52a8\u76ee\u6807\u3002\u4ee5\u6539\u5584\u7761\u7720\u8d28\u91cf\u4e3a\u4f8b\uff0c\u56e0\u4e3a\u5176\u53ef\u901a\u8fc7\u751f\u7406\u6570\u636e\u91cf\u5316\u4e14\u5bf9\u6574\u4f53\u5065\u5eb7\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u4e00\u9879\u6d89\u53ca24\u540dFitbit\u667a\u80fd\u624b\u8868\u7528\u6237\u7684\u7528\u6237\u7814\u7a76\uff0c\u6211\u4eec\u8bc1\u660e\u4e86PhysioLLM\u5728\u4fc3\u8fdb\u5bf9\u5065\u5eb7\u6570\u636e\u7684\u6df1\u5165\u4e2a\u6027\u5316\u7406\u89e3\uff0c\u4ee5\u53ca\u652f\u6301\u5b9e\u73b0\u4e2a\u4eba\u5065\u5eb7\u76ee\u6807\u65b9\u9762\uff0c\u4f18\u4e8eFitbit\u5e94\u7528\u548c\u901a\u7528LLM\u804a\u5929\u673a\u5668\u4eba\u3002|\n", "2406.19280": "|**2024-06-27**|**HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale**|Junying Chen 
et.al.|[2406.19280](http://arxiv.org/abs/2406.19280)|**[link](https://github.com/freedomintelligence/huatuogpt-vision)**|**\u968f\u7740\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u533b\u5b66\u591a\u6a21\u6001\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u7531\u4e8e\u533b\u5b66\u5f71\u50cf-\u6587\u672c\u6570\u636e\u7684\u6570\u91cf\u548c\u8d28\u91cf\u53d7\u9650\u4e8e\u6570\u636e\u9690\u79c1\u95ee\u9898\u548c\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4ecd\u9762\u4e34\u6311\u6218\u3002\u65e9\u671f\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528PubMed\u7684\u5927\u578b\u53bb\u6807\u8bc6\u5316\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\u6765\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u4ecd\u53d7\u5230\u6570\u636e\u566a\u97f3\u7684\u5f71\u54cd\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u4f18\u5316\u4e86PubMed\u4e2d\u7684\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\uff0c\u5e76\u5229\u7528GPT-4V\u5728\u201c\u975e\u76f2\u201d\u6a21\u5f0f\u4e0b\u8fdb\u884c\u6570\u636e\u6e05\u6d17\u548c\u683c\u5f0f\u8f6c\u6362\uff0c\u521b\u5efa\u4e86PubMedVision\u6570\u636e\u96c6\uff0c\u5305\u542b130\u4e07\u4efd\u533b\u5b66\u89c6\u89c9\u95ee\u7b54\u6837\u672c\u3002\u6211\u4eec\u7684\u9a8c\u8bc1\u8868\u660e\uff1a\uff081\uff09PubMedVision\u663e\u8457\u63d0\u5347\u4e86\u5f53\u524d\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u5728\u533b\u5b66\u9886\u57df\u7684\u6027\u80fd\uff0c\u5728\u8bf8\u5982MMMU Health & Medicine 
track\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u663e\u8457\u6539\u5584\uff1b\uff082\uff09\u533b\u5b66\u4e13\u5bb6\u7684\u624b\u52a8\u68c0\u67e5\u548c\u5b9e\u8bc1\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6211\u4eec\u7684\u6570\u636e\u96c6\u5728\u6570\u636e\u8d28\u91cf\u4e0a\u4f18\u4e8e\u5176\u4ed6\u6784\u5efa\u65b9\u6cd5\u3002\u5229\u7528PubMedVision\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u540d\u4e3aHuatuoGPT-Vision\u7684340\u4ebf\u53c2\u6570\u7684\u533b\u5b66\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u5728\u516c\u5f00\u6e90\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u533b\u5b66\u591a\u6a21\u6001\u573a\u666f\u4e2d\u663e\u793a\u51fa\u4f18\u8d8a\u6027\u80fd\u3002**|\n", "2406.19271": "|**2024-06-27**|**AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning**|Praneeth Vadlapati et.al.|[2406.19271](http://arxiv.org/abs/2406.19271)|**[link](https://github.com/Pro-GenAI/AutoPureData)**|**\u4eba\u4eec\u5bf9\u6700\u65b0\u7684\u548c\u53ef\u9760\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9700\u6c42\u6301\u7eed\u589e\u957f\u3002\u901a\u5e38\uff0cLLMs\u662f\u57fa\u4e8e\u56fa\u5b9a\u7684\u6570\u636e\u96c6\u8bad\u7ec3\u7136\u540e\u90e8\u7f72\u7684\u3002\u7136\u800c\uff0c\u8bad\u7ec3\u6570\u636e\u4f1a\u968f\u7740\u65f6\u95f4\u9010\u6e10\u8fc7\u65f6\u3002\u7814\u7a76\u5173\u6ce8\u5982\u4f55\u5229\u7528\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u66f4\u65b0AI\u6a21\u578b\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u6d89\u53ca\u6570\u636e\u8d28\u91cf\u4e0e\u5b89\u5168\u7684\u987e\u8651\uff0c\u5982\u504f\u89c1\u3001\u5783\u573e\u4fe1\u606f\u7b49\u3002\u786e\u4fdd\u6570\u636e\u7eaf\u51c0\u5bf9\u4e8e\u751f\u6210\u53ef\u9760\u7684\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5728\u4e0d\u7eaf\u6570\u636e\u4e0a\u8bad\u7ec3\u53ef\u80fd\u5bfc\u81f4\u4e0d\u826f\u7ed3\u679c\u3002\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u7cfb\u7edf\uff0c\u5b83\u6536\u96c6\u7f51\u7edc\u6570\u636e\uff0c\u5e76\u501f\u52a9\u73b0
\u6709\u53ef\u4fe1\u7684AI\u6a21\u578b\u81ea\u52a8\u7b5b\u9009\u51fa\u4e0d\u9700\u8981\u7684\u5185\u5bb9\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5904\u7406\u4e86\u4e00\u5c0f\u90e8\u5206\u7f51\u7edc\u6570\u636e\uff0c\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\u7684\u6570\u636e\u51c0\u5316\u6548\u679c\u3002**|\n", "2406.20098": "|**2024-06-28**|**Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs**|Sukmin Yun et.al.|[2406.20098](http://arxiv.org/abs/2406.20098)|**[link](https://github.com/mbzuai-llm/web2code)**|**\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u7b49\u591a\u79cd\u6a21\u6001\u7684\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7406\u89e3\u548c\u751f\u6210\u7f51\u9875\u622a\u56fe\u4ee5\u53ca\u76f8\u5e94\u7684HTML\u4ee3\u7801\u65b9\u9762\u7684\u80fd\u529b\u76f8\u5bf9\u8f83\u5f31\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faWeb2Code\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u62ec\u5927\u89c4\u6a21\u7f51\u9875\u5230\u4ee3\u7801\u7684\u65b0\u57fa\u51c6\uff0c\u7528\u4e8e\u6307\u4ee4\u8c03\u4f18\uff0c\u5e76\u8bc4\u4f30MLLM\u5728\u7f51\u9875\u7406\u89e3\u53caHTML\u4ee3\u7801\u8f6c\u6362\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u6784\u5efa\u6570\u636e\u96c6\u65f6\uff0c\u5229\u7528\u9884\u8bad\u7ec3\u7684LLMs\u589e\u5f3a\u73b0\u6709\u7684\u7f51\u9875\u5230\u4ee3\u7801\u6570\u636e\u96c6\uff0c\u5e76\u751f\u6210\u591a\u6837\u5316\u7684\u7f51\u9875\u56fe\u7247\uff0c\u4ee5\u4f9b\u6e32\u67d3\u3002\u8f93\u5165\u662f\u7f51\u9875\u56fe\u7247\u548c\u8bf4\u660e\uff0c\u8f93\u51fa\u662f\u7f51\u9875\u7684HTML\u4ee3\u7801\uff0c\u540c\u65f6\u52a0\u5165\u5173\u4e8e\u7f51\u9875\u5185\u5bb9\u7684\u4e30\u5bcc\u81ea\u7136\u8bed\u8a00\u95ee\u7b54\u5bf9\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u7f51\u9875\u5185\u5bb9\u7684\u5168\u9762\u7406\u89e3\u3002\u4e3a\u4e86\u8bc4\u4f30\u
6a21\u578b\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6d4b\u8bd5\u6846\u67b6\uff0c\u7528\u4e8e\u6d4b\u8bd5MLLM\u5728\u7f51\u9875\u7406\u89e3\u4e0e\u7f51\u9875\u5230\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6280\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0d\u4ec5\u6709\u76ca\u4e8e\u6211\u4eec\u63d0\u51fa\u7684\u4efb\u52a1\uff0c\u8fd8\u5728\u89c6\u89c9\u9886\u57df\u7684\u4e00\u822c\u6027\u80fd\u4e0a\u6709\u6240\u63d0\u5347\uff0c\u800c\u5148\u524d\u7684\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u63a8\u52a8\u901a\u7528MLLM\u7684\u53d1\u5c55\uff0c\u4f7f\u5176\u9002\u7528\u4e8e\u7f51\u7edc\u5185\u5bb9\u751f\u6210\u548c\u81ea\u52a8\u5316\u4efb\u52a1\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u516c\u5f00\u3002**|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li 
et.al.|[2406.20095](http://arxiv.org/abs/2406.20095)|**[link](https://github.com/lostxine/llara)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aLLaRA\uff08\u5927\u578b\u8bed\u8a00\u548c\u673a\u5668\u4eba\u52a9\u624b\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u8f6c\u5316\u4e3a\u5bf9\u8bdd\u5f62\u5f0f\uff0c\u901a\u8fc7\u7ed3\u5408\u989d\u5916\u7684\u6570\u636e\u8f85\u52a9\u5b66\u4e60\uff0c\u63d0\u5347\u54cd\u5e94\u8d28\u91cf\u3002\u5229\u7528\u5177\u5907\u89c6\u89c9\u8f93\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\uff0c\u5373\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5904\u7406\u72b6\u6001\u4fe1\u606f\uff0c\u4f5c\u4e3a\u89c6\u89c9-\u6587\u672c\u63d0\u793a\uff0c\u5e76\u751f\u6210\u6700\u4f18\u7684\u673a\u5668\u4eba\u51b3\u7b56\u7b56\u7565\u3002\u9996\u5148\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u5316\u65b9\u6cd5\uff0c\u4ece\u73b0\u6709\u7684\u884c\u4e3a\u514b\u9686\u6570\u636e\u4e2d\u751f\u6210\u591a\u6837\u4e14\u9ad8\u8d28\u91cf\u7684\u673a\u5668\u4eba\u6307\u4ee4\u6570\u636e\u96c6\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u5b9a\u5236\u7684\u5bf9\u8bdd\u5f0f\u683c\u5f0f\u5bf9VLM\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f7f\u5176\u80fd\u591f\u751f\u6210\u6709\u610f\u4e49\u7684\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLaRA\u6846\u67b6\u5728\u591a\u4e2a\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u9884\u8bad\u7ec3\u6a21\u578b\u5df2\u63d0\u4f9b\u3002**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094](http://arxiv.org/abs/2406.20094)|**[link](https://github.com/tencent-ailab/persona-hub)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5185\u7684\u591a\u79cd\u89c6\u89d2\u6765\u751f\u6210\u591a\u6837\u5316\u7684\u4eba\u5de5\u5408\u6210\u6570\u636e\u3002\u4e3a\u4e86\u5728\u5927\u89c4\u6a21\u4e0a\u5145\u5206\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u5f15\u5165\u4e86Persona Hub\uff0c\u8fd9\u662f\u4e00\u4e2a\u4ece\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u6574\u7406\u51fa\u7684\u5341\u4ebf\u4e2a\u591a\u5143\u5316\u4eba\u683c\u7684\u96c6\u5408\uff0c\u76f8\u5f53\u4e8e\u5168\u7403\u4eba\u53e3\u7684\u7ea613%\u3002\u8fd9\u4e9b\u4eba\u683c\u4f5c\u4e3a\u5206\u5e03\u5f0f\u4e16\u754c\u77e5\u8bc6\u8f7d\u4f53\uff0c\u51e0\u4e4e\u53ef\u4ee5\u8c03\u7528LLM\u5185\u5305\u542b\u7684\u5404\u7c7b\u89c2\u70b9\uff0c\u4ece\u800c\u63a8\u52a8\u5927\u89c4\u6a21\u3001\u591a\u6837\u5316\u7684\u5408\u6210\u6570\u636e\u521b\u5efa\uff0c\u9002\u7528\u4e8e\u5404\u79cd\u573a\u666f\u3002\u901a\u8fc7\u5c55\u793aPersona Hub\u5982\u4f55\u5728\u5927\u89c4\u6a21\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u548c\u903b\u8f91\u63a8\u7406\u95ee\u9898\u3001\u6307\u4ee4\uff08\u7528\u6237\u63d0\u793a\uff09\u3001\u5bcc\u542b\u77e5\u8bc6\u7684\u6587\u672c\u3001\u6e38\u620fNPC\u548c\u5de5\u5177\uff08\u51fd\u6570\uff09\u7b49\u65b9\u9762\u7684\u5e94\u7528\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u5177\u6709\u591a\u6837\u6027\u3001\u53ef\u6269\u5c55\u6027\u3001\u7075\u6d3b\u6027\u548c\u6613\u7528\u6027\uff0c\u53ef\u80fd\u5f15\u9886\u5408\u6210\u6570\u636e\u521b\u9020\u548c\u5b9e\u9645\u5e94\u7528\u7684\u65b0\u8303\u5f0f\uff0c\u5bf9LLM\u7684\u7814\u7a76\u548c\u53d1\u5c55\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002|\n", "2406.20092": "|**2024-06-28**|**LLaVolta: Efficient 
Multi-modal Models via Stage-wise Visual Context Compression**|Jieneng Chen et.al.|[2406.20092](http://arxiv.org/abs/2406.20092)|**[link](https://github.com/beckschen/llavolta)**|**\u5c3d\u7ba1\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6587\u672c\u5d4c\u5165\u538b\u7f29\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f46\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u4e2d\u7684\u89c6\u89c9\u4ee4\u724c\u538b\u7f29\u4ecd\u7136\u88ab\u5ffd\u89c6\u3002\u672c\u6587\u7814\u7a76\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u5197\u4f59\u6027\u4ee5\u53ca\u5728\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u6709\u6548\u8bad\u7ec3\u3002\u521d\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u6d4b\u8bd5\u9636\u6bb5\u901a\u8fc7\u7b80\u5355\u5e73\u5747\u6c60\u5316\u6d88\u9664\u9ad8\u8fbe70%\u7684\u89c6\u89c9\u4ee4\u724c\uff0cGQA\u57fa\u51c6\u7684\u89c6\u89c9\u95ee\u7b54\u51c6\u786e\u7387\u4ec5\u4e0b\u964d3%\uff0c\u8fd9\u663e\u793a\u51fa\u89c6\u89c9\u4e0a\u4e0b\u6587\u4e2d\u5b58\u5728\u5927\u91cf\u5197\u4f59\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Visual Context 
Compressor\uff0c\u5b83\u5728\u8bad\u7ec3\u9636\u6bb5\u51cf\u5c11\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u800c\u4e0d\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u4e3a\u4e86\u5728\u538b\u7f29\u89c6\u89c9\u4ee4\u724c\u65f6\u5c3d\u91cf\u51cf\u5c11\u4fe1\u606f\u635f\u5931\u5e76\u4fdd\u6301\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u8f7b\u91cf\u7ea7\u8bad\u7ec3\u65b9\u6848LLaVolta\u3002LLaVolta\u91c7\u7528\u5206\u9636\u6bb5\u7684\u89c6\u89c9\u4e0a\u4e0b\u6587\u538b\u7f29\u7b56\u7565\uff0c\u4ece\u91cd\u5ea6\u5230\u8f7b\u5ea6\u9010\u6e10\u538b\u7f29\uff0c\u6700\u7ec8\u5728\u8bad\u7ec3\u7ed3\u675f\u65f6\u5b8c\u5168\u4e0d\u8fdb\u884c\u538b\u7f29\uff0c\u4ece\u800c\u5728\u6d4b\u8bd5\u65f6\u4e0d\u4f1a\u4e22\u5931\u4efb\u4f55\u4fe1\u606f\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u5347\u4e86\u591a\u6a21\u6001\u6a21\u578b\u5728\u56fe\u50cf-\u8bed\u8a00\u548c\u89c6\u9891-\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5e76\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u6210\u672c\u3002\u4ee3\u7801\u5df2\u5728https://github.com/Beckschen/LLaVolta\u4e0a\u5f00\u6e90\u3002**|\n", "2406.20087": "|**2024-06-28**|**ProgressGym: Alignment with a Millennium of Moral Progress**|Tianyi Qiu 
et.al.|[2406.20087](http://arxiv.org/abs/2406.20087)|**[link](https://github.com/pku-alignment/progressgym)**|\u968f\u7740\u524d\u6cbf\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u77e5\u8bc6\u8bba\u4e2d\u7684\u5f71\u54cd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u5b83\u4eec\u53ef\u80fd\u5f3a\u5316\u793e\u4f1a\u666e\u904d\u7684\u4ef7\u503c\u89c2\uff0c\u8fdb\u800c\u52a0\u5267\u9519\u8bef\u9053\u5fb7\u89c2\u5ff5\u7684\u56fa\u5316\uff0c\u5bfc\u81f4\u5e7f\u6cdb\u7684\u793e\u4f1a\u95ee\u9898\u6301\u7eed\u5b58\u5728\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6f5c\u5728\u98ce\u9669\uff0c\u6211\u4eec\u63d0\u51fa\u8fdb\u6b65\u5bf9\u9f50\u4f5c\u4e3a\u4e00\u79cd\u6280\u672f\u89e3\u51b3\u65b9\u6848\u3002\u8fdb\u6b65\u5bf9\u9f50\u7b97\u6cd5\u65e8\u5728\u5b66\u4e60\u4eba\u7c7b\u9053\u5fb7\u8fdb\u6b65\u7684\u673a\u5236\uff0c\u4ece\u800c\u5f25\u8865\u73b0\u6709\u5bf9\u9f50\u65b9\u6cd5\u5bf9\u5f53\u4ee3\u9053\u5fb7\u76f2\u70b9\u7684\u654f\u611f\u6027\u3002\u4e3a\u4e86\u63a8\u52a8\u8fdb\u6b65\u5bf9\u9f50\u7684\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u4e86ProgressGym\uff0c\u4e00\u4e2a\u5b9e\u9a8c\u6027\u6846\u67b6\uff0c\u5b83\u4ece\u5386\u53f2\u4e2d\u5b66\u4e60\u9053\u5fb7\u8fdb\u6b65\u7684\u89c4\u5f8b\uff0c\u4ee5\u4fc3\u8fdb\u73b0\u5b9e\u4e16\u754c\u9053\u5fb7\u51b3\u7b56\u7684\u672a\u6765\u53d1\u5c55\u3002\u501f\u52a99\u4e2a\u4e16\u7eaa\u7684\u5386\u53f2\u6587\u672c\u548c18\u4e2a\u5386\u53f2LLMs\uff0cProgressGym\u5c06\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u8fdb\u6b65\u5bf9\u9f50\u6311\u6218\u8f6c\u5316\u4e3a\u5177\u4f53\u7684\u57fa\u51c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e09\u4e2a\u6838\u5fc3\u6311\u6218\uff1a\u8ffd\u8e2a\u6f14\u53d8\u7684\u4ef7\u503c\uff08PG-Follow\uff09\u3001\u9884\u6d4b\u9053\u5fb7\u8fdb\u6b65\uff08PG-Predict\uff09\u4ee5\u53ca\u8c03\u8282\u4eba\u4e0eAI\u4ef7\u503c\u53d8\u8fc1\u4e4b\u95f4\u7684\u53cd\u9988\u5faa\u73af\uff08PG-Coevolve\uff09\u3002\u8fd9\u4e9b\u4efb\u52a1\u9700\u8981\u65f6\u95f4\u7ef4\
u5ea6\u7684\u65b9\u6cd5\uff0c\u800c\u4f20\u7edf\u7684\u5bf9\u9f50\u7b56\u7565\u65e0\u6cd5\u80dc\u4efb\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u7ec8\u8eab\u5b66\u4e60\u548c\u5916\u63a8\u7b97\u6cd5\u4f5c\u4e3a\u8fdb\u6b65\u5bf9\u9f50\u7684\u57fa\u672c\u65b9\u6cd5\uff0c\u5e76\u5efa\u7acb\u4e86\u4e00\u4e2a\u5f00\u653e\u7684\u6392\u884c\u699c\uff0c\u9080\u8bf7\u521b\u65b0\u7b97\u6cd5\u548c\u65b0\u6311\u6218\u3002\u8be5\u6846\u67b6\u548c\u6392\u884c\u699c\u5206\u522b\u53ef\u5728https://github.com/PKU-Alignment/ProgressGym \u548c https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard \u83b7\u53d6\u3002|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen et.al.|[2406.20085](http://arxiv.org/abs/2406.20085)|null|\u57fa\u4e8e\u6269\u6563\u6a21\u578b\u7684\u751f\u6210\u65b9\u6cd5\u5df2\u7ecf\u5728\u751f\u6210\u5404\u79cd\u5e03\u5c40\u7684\u9ad8\u8d28\u91cf\u56fe\u50cf\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u4e0b\u6e38\u611f\u77e5\u4efb\u52a1\u5177\u6709\u663e\u8457\u76ca\u5904\u3002\u7136\u800c\uff0c\u4ec5\u4f9d\u8d56\u8bed\u8a00\u63cf\u8ff0\u548c\u4e00\u4e2a\u5408\u9002\u7684\u591a\u5b9e\u4f8b\u8bc4\u4f30\u6307\u6807\u6765\u5b9e\u73b0\u5168\u81ea\u52a8\u5e03\u5c40\u751f\u6210\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Auto 
Cherry-Picker\uff08ACP\uff09\uff0c\u65e8\u5728\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u591a\u6a21\u6001\u8bad\u7ec3\u6837\u672c\uff0c\u4ee5\u589e\u5f3a\u611f\u77e5\u548c\u591a\u6a21\u6001\u8bad\u7ec3\u6548\u679c\u3002\u901a\u8fc7\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u6982\u5ff5\u5217\u8868\uff0c\u6211\u4eec\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u8be6\u7ec6\u7684\u63cf\u8ff0\u5e76\u8bbe\u8ba1\u5408\u7406\u7684\u5e03\u5c40\u3002\u7136\u540e\uff0c\u4f7f\u7528\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u591a\u4e2a\u56fe\u7247\u3002\u63a5\u7740\uff0c\u6211\u4eec\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u8bc4\u4f30\u6307\u6807\u5bf9\u751f\u6210\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u70bc\uff0c\u786e\u4fdd\u8d28\u91cf\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u590d\u5408\u5e03\u5c40\u4e0e\u56fe\u50cf\u8bc4\u5206\uff08Composite Layout and Image Score\uff0cCLIS\uff09\u8fd9\u4e00\u65b0\u6307\u6807\uff0c\u7528\u4e8e\u516c\u6b63\u5730\u8bc4\u4f30\u751f\u6210\u7684\u56fe\u50cf\u3002\u6211\u4eec\u7684\u5408\u6210\u9ad8\u8d28\u793a\u4f8b\u5728\u5b9a\u5236\u521d\u59cb\u6982\u5ff5\u5217\u8868\u65f6\uff0c\u80fd\u591f\u6709\u6548\u63d0\u5347\u5404\u79cd\u573a\u666f\u4e0b\u7684\u6027\u80fd\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u957f\u5c3e\u5206\u5e03\u548c\u4e0d\u5e73\u8861\u6570\u636e\u96c6\u7684\u95ee\u9898\u4e0a\u3002\u4e0b\u6e38\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cACP\u663e\u8457\u63d0\u9ad8\u4e86\u73b0\u6709\u6a21\u578b\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86CLIS\u4e0e\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u7684\u5173\u8054\uff0c\u53d1\u73b0CLIS\u5206\u6570\u8d8a\u9ad8\uff0c\u6027\u80fd\u8d8a\u597d\u3002\u8fd9\u8868\u660e\u8bc4\u4f30\u6307\u6807\u5728\u89c6\u89c9\u611f\u77e5\u548c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4efb\u52a1\u4e2d\u53ef\u80fd\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u6211\u4eec\
u5c06\u63d0\u4f9b\u4ee3\u7801\u3002|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal et.al.|[2406.20079](http://arxiv.org/abs/2406.20079)|**[link](https://github.com/anisha2102/molecular_facts)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u5185\u5bb9\u7684\u81ea\u52a8\u4e8b\u5b9e\u6838\u67e5\u53d8\u5f97\u8d8a\u6765\u8d8a\u666e\u904d\uff0c\u4ee5\u5e94\u5bf9\u9519\u8bef\u53d9\u8ff0\u7684\u95ee\u9898\uff0c\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u7126\u70b9\u5728\u4e8e\u6838\u67e5\u7684\u7c92\u5ea6\uff1a\u8f83\u5927\u7684\u6587\u672c\u6bb5\u843d\u96be\u4ee5\u6838\u67e5\uff0c\u800c\u66f4\u539f\u5b50\u5316\u7684\u4e8b\u5b9e\uff08\u5982\u547d\u9898\uff09\u53ef\u80fd\u7f3a\u4e4f\u6b63\u786e\u7684\u4e0a\u4e0b\u6587\u89e3\u8bfb\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u8fd9\u4e9b\u539f\u5b50\u4e8b\u5b9e\u4e2d\u4e0a\u4e0b\u6587\u7684\u4f5c\u7528\u3002\u6211\u4eec\u8ba4\u4e3a\u5b8c\u5168\u539f\u5b50\u7684\u4e8b\u5b9e\u5e76\u975e\u6700\u4f73\u8868\u793a\u5f62\u5f0f\uff0c\u4e3a\u6b64\u6211\u4eec\u63d0\u51fa\u4e86\u5206\u5b50\u4e8b\u5b9e\u7684\u4e24\u4e2a\u6807\u51c6\uff1a\u53bb\u60c5\u5883\u5316\uff08decontextuality\uff09\uff0c\u5373\u5b83\u4eec\u80fd\u5426\u72ec\u7acb\u5b58\u5728\uff0c\u4ee5\u53ca\u6700\u5c0f\u5316\uff08minimality\uff09\uff0c\u5373\u6dfb\u52a0\u591a\u5c11\u989d\u5916\u4fe1\u606f\u624d\u80fd\u5b9e\u73b0\u53bb\u60c5\u5883\u5316\u3002\u6211\u4eec\u91cf\u5316\u4e86\u53bb\u60c5\u5883\u5316\u5bf9\u6700\u5c0f\u5316\u7684\u5f71\u54cd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u7840\u65b9\u6cd5\u6765\u81ea\u52a8\u751f\u6210\u5206\u5b50\u4e8b\u5b9e\uff0c\u76ee\u6807\u662f\u5728\u4fdd\u6301\u51c6\u786e\u6027\u7684\u540c\u65f6\u63d0\u4f9b\u9002\u91cf\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u65b9\u6cd5\u4e0e\u4e0d\u540c\u7684\u53bb\u60c5\u5883\u5316\u7b56\u7565\u8fdb\u884c\u4e86\u6bd4\u8f83\uff0c\u53d1\u73b0\u5206\u5b50\u4e8b\u5b9e\u80fd
\u591f\u5728\u6a21\u7cca\u573a\u666f\u4e2d\u5e73\u8861\u6700\u5c0f\u5316\u548c\u4e8b\u5b9e\u6838\u67e5\u7684\u51c6\u786e\u6027\u3002**|\n", "2406.20041": "|**2024-07-01**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford et.al.|[2406.20041](http://arxiv.org/abs/2406.20041)|null|\u81ea\u4e3b\u4ee3\u7406\u9a71\u52a8\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u793a\u4e86\u5de8\u5927\u7684\u81ea\u52a8\u5316\u6f5c\u529b\u3002\u65e9\u671f\u7684\u5c55\u793a\u8868\u660e\uff0c\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u89e3\u51b3\u590d\u6742\u4efb\u52a1\uff0c\u4e0e\u5916\u90e8\u7cfb\u7edf\u4ea4\u4e92\u4ee5\u589e\u5f3a\u77e5\u8bc6\uff0c\u5e76\u89e6\u53d1\u884c\u52a8\u3002\u7279\u522b\u662f\uff0c\u591a\u4e2a\u4ee3\u7406\u534f\u4f5c\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u7684\u5de5\u4f5c\u6d41\u8bc1\u660e\u4e86\u5b83\u4eec\u5728\u4e0d\u90a3\u4e48\u4e25\u683c\u548c\u5b9a\u4e49\u4e0d\u660e\u786e\u7684\u73af\u5883\u4e2d\u64cd\u4f5c\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u591a\u4ee3\u7406\u65b9\u6cd5\u6709\u5de8\u5927\u7684\u6f5c\u529b\u6210\u4e3a\u4f17\u591a\u5de5\u4e1a\u5e94\u7528\u7684\u6838\u5fc3\uff0c\u4ece\u590d\u6742\u7684\u77e5\u8bc6\u68c0\u7d22\u7cfb\u7edf\u5230\u4e0b\u4e00\u4ee3\u673a\u5668\u4eba\u8fc7\u7a0b\u81ea\u52a8\u5316\u3002\u9274\u4e8e\u5f53\u524dLLMs\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5904\u7406\u590d\u6742\u6d41\u7a0b\u9700\u8981\u5206\u6b65\u9aa4\u7684\u65b9\u6cd5\uff0c\u5305\u62ec\u8bbe\u8ba1\u660e\u786e\u4e14\u6a21\u5757\u5316\u7684\u4efb\u52a1\u8ba1\u5212\u3002\u6839\u636e\u590d\u6742\u7a0b\u5ea6\uff0c\u8fd9\u4e9b\u4efb\u52a1\u53ef\u4ee5\u7531\u5355\u4e2a\u4ee3\u7406\u6216\u4e00\u7ec4\u4ee3\u7406\u6267\u884c\u3002\u672c\u7814\u7a76\u4e13\u6ce8\u4e8e\u6784\u5efa\u4e00\u4e2a\u7075\u6d3b\u7684\u4ee3\u7406\u5de5\u7a0b\u6846\u67b6\uff0c\u91cd\u70b9\u5173\u6ce8\u89c4\u5212\u548c\u6267\u884c\uff0c\u65e8\u5728\u5e94\u5bf9\u4e0d\u540c\u9886\u57df\u7684\u590d\u6742\u5e94\u7528\u573a\u666f\u3002\u8
be5\u6846\u67b6\u4e3a\u5de5\u4e1a\u5e94\u7528\u63d0\u4f9b\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u51fa\u786e\u4fdd\u53ef\u6269\u5c55\u3001\u7075\u6d3b\u4e14\u534f\u4f5c\u7684\u5de5\u4f5c\u6d41\u7a0b\u6280\u672f\uff0c\u8ba9\u591a\u4e2a\u81ea\u4e3b\u4ee3\u7406\u534f\u540c\u89e3\u51b3\u95ee\u9898\u3002|\n", "2406.20030": "|**2024-06-28**|**LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models**|Renzhi Wang et.al.|[2406.20030](http://arxiv.org/abs/2406.20030)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e3a\u4e86\u8ddf\u4e0a\u4e0d\u65ad\u53d8\u5316\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u9700\u8981\u6301\u7eed\u8fdb\u884c\u6a21\u578b\u66f4\u65b0\uff0c\u8fd9\u50ac\u751f\u4e86\u7ec8\u751f\u6a21\u578b\u7f16\u8f91\u4efb\u52a1\u3002\u8fd1\u5e74\u6765\uff0c\u5c3d\u7ba1\u5df2\u7ecf\u5f00\u53d1\u51fa\u591a\u79cd\u5355\u6b21\u548c\u6279\u91cf\u7f16\u8f91\u7684\u6280\u672f\uff0c\u4f46\u5b83\u4eec\u5728\u9762\u5bf9\u7ec8\u751f\u7f16\u8f91\u65f6\u8981\u4e48\u65e0\u6cd5\u5e94\u7528\uff0c\u8981\u4e48\u6548\u679c\u4e0d\u4f73\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51faLEMoE\uff0c\u4e00\u4e2a\u4e13\u4e3a\u7ec8\u751f\u6a21\u578b\u7f16\u8f91\u8bbe\u8ba1\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u9002\u914d\u5668\u3002\u9996\u5148\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f71\u54cd\u4f20\u7edfMoE\u9002\u914d\u5668\u5728\u7ec8\u751f\u7f16\u8f91\u4e2d\u6709\u6548\u6027\u7684\u56e0\u7d20\uff0c\u5305\u62ec\u707e\u96be\u6027\u9057\u5fd8\u3001\u8def\u7531\u4e0d\u4e00\u81f4\u6027\u548c\u987a\u5e8f\u654f\u611f\u6027\u3002\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9a\u5236\u7684\u6a21\u5757\u63d2\u5165\u65b9\u6cd5\uff0c\u5f15\u5165\u4e86\u65b0\u9896\u7684\u952e\u503c\u5bf9\u951a\u5b9a\u8def\u7531\u4ee5\u589e\u5f3a\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u8def\u7531\u4e00\u81f4\u6027\uff0c\u540c\u65f6\u91c7\u7528\u4e86\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u805a\u
7c7b\u57fa\u7f16\u8f91\u987a\u5e8f\u89c4\u5212\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec8\u751f\u7f16\u8f91\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u5148\u524d\u7684\u6a21\u578b\u7f16\u8f91\u6280\u672f\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6279\u91cf\u7f16\u8f91\u4efb\u52a1\u4e2d\u7684\u4f18\u79c0\u6027\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u5f00\u6e90\u3002|\n", "2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang et.al.|[2406.20015](http://arxiv.org/abs/2406.20015)|**[link](https://github.com/toolbehonest/toolbehonest)**|**\u968f\u7740\u5de5\u5177\u589e\u5f3a\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fc5\u901f\u878d\u5165\u5b9e\u9645\u5e94\u7528\uff0c\u793e\u533a\u4e9f\u9700\u5168\u9762\u4e86\u89e3\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5168\u9762\u7684\u8bca\u65ad\u57fa\u51c6\u2014\u2014ToolBH\u3002\u6211\u4eec\u4ece\u6df1\u5ea6\u548c\u5e7f\u5ea6\u4e24\u4e2a\u7ef4\u5ea6\u8fdb\u884c\u8bc4\u4f30\uff1a\u5728\u6df1\u5ea6\u4e0a\uff0c\u8bbe\u8ba1\u4e86\u591a\u7ea7\u8bca\u65ad\u6d41\u7a0b\uff0c\u5305\u62ec\uff081\uff09\u53ef\u89e3\u6027\u68c0\u6d4b\u3001\uff082\uff09\u89e3\u51b3\u65b9\u6848\u89c4\u5212\u548c\uff083\uff09\u7f3a\u5931\u5de5\u5177\u5206\u6790\uff1b\u5728\u5e7f\u5ea6\u4e0a\uff0c\u8003\u8651\u4e86\u5de5\u5177\u96c6\u7279\u5f81\u4e0b\u7684\u4e09\u79cd\u573a\u666f\uff1a\u7f3a\u5c11\u5fc5\u8981\u5de5\u5177\u3001\u6f5c\u5728\u5de5\u5177\u548c\u529f\u80fd\u6709\u9650\u7684\u5de5\u5177\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e03\u4e2a\u4efb\u52a1\uff0c\u5e76\u901a\u8fc7\u591a\u6b21\u4eba\u5de5\u6807\u6ce8\u6536\u96c6\u4e86700\u4efd\u8bc4\u4f30\u6837\u672c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u5148\u8fdb\u7684\u6a21\u578bGemini-1.5-Pro\u548cGPT-4o\u5728\u8fd9\u9879\u57fa\u5
1c6\u4e0a\u7684\u603b\u5f97\u5206\u4e3a45.3\u548c37.0\uff0c\u6ee1\u5206100\u5206\u3002\u5728\u5de5\u5177\u589e\u5f3a\u7684LLM\u573a\u666f\u4e2d\uff0c\u66f4\u5927\u7684\u6a21\u578b\u53c2\u6570\u5e76\u4e0d\u4e00\u5b9a\u610f\u5473\u7740\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8bad\u7ec3\u6570\u636e\u548c\u56de\u590d\u7b56\u7565\u540c\u6837\u5173\u952e\u3002\u6211\u4eec\u7684\u8bca\u65ad\u5206\u6790\u6307\u51fa\uff0c\u6a21\u578b\u9519\u8bef\u7684\u4e3b\u8981\u539f\u56e0\u5728\u4e8e\u4efb\u52a1\u53ef\u89e3\u6027\u7684\u5224\u65ad\u3002\u5f00\u653e\u6e90\u7801\u6a21\u578b\u5728\u5197\u957f\u56de\u590d\u65f6\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u4e13\u6709\u6a21\u578b\u5728\u957f\u94fe\u63a8\u7406\u65b9\u9762\u8868\u73b0\u66f4\u4f18\u3002**|\n", "2407.02490": "|**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8ba1\u7b97\u6311\u6218\uff0c\u5c24\u5176\u662f\u968f\u7740\u63d0\u793a\u957f\u5ea6\u7684\u589e\u957f\uff0c\u5176\u5e7f\u6cdb\u5e94\u7528\u9762\u4e34\u969c\u788d\u3002\u7531\u4e8e\u6ce8\u610f\u529b\u8ba1\u7b97\u7684\u4e8c\u6b21\u590d\u6742\u6027\uff0c80\u4ebf\u53c2\u6570\u7684LLM\u5728\u5355\u4e2aA100 
GPU\u4e0a\u5904\u7406100\u4e07\u4e2a\u4ee4\u724c\uff08\u5373\u9884\u586b\u5145\u9636\u6bb5\uff09\u9700\u898130\u5206\u949f\u3002\u73b0\u6709\u7684\u52a0\u901f\u9884\u586b\u5145\u65b9\u6cd5\u5f80\u5f80\u5728\u9762\u5bf9\u957f\u5e8f\u5217LLMs\u65f6\u96be\u4ee5\u4fdd\u6301\u65e2\u9ad8\u6548\u53c8\u51c6\u786e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MInference\uff08\u767e\u4e07\u4ee4\u724c\u63a8\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u63d0\u5347\u957f\u5e8f\u5217\u5904\u7406\u9884\u586b\u5145\u9636\u6bb5\u901f\u5ea6\u7684\u7a00\u758f\u8ba1\u7b97\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\u4e86\u6ce8\u610f\u529b\u77e9\u9635\u4e2d\u7684\u4e09\u79cd\u72ec\u7279\u6a21\u5f0f\uff1aA\u5f62\u3001\u5782\u76f4\u659c\u7ebf\u548c\u5757\u7a00\u758f\uff0c\u8fd9\u4e9b\u6a21\u5f0f\u53ef\u5229\u7528GPU\u8fdb\u884c\u9ad8\u6548\u7684\u7a00\u758f\u8ba1\u7b97\u3002\u6211\u4eec\u5728\u79bb\u7ebf\u9636\u6bb5\u786e\u5b9a\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u7684\u6700\u4f73\u6a21\u5f0f\uff0c\u5e76\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u52a8\u6001\u6784\u5efa\u7a00\u758f\u7d22\u5f15\u3002\u901a\u8fc7\u4f18\u5316\u7684GPU\u5185\u6838\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u57fa\u4e8e\u6307\u5b9a\u6a21\u5f0f\u7684\u7a00\u758f\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u957f\u5e8f\u5217LLMs\u9884\u586b\u5145\u9636\u6bb5\u7684\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e0\u9700\u4fee\u6539\u9884\u8bad\u7ec3\u8bbe\u7f6e\u6216\u989d\u5916\u5fae\u8c03\u5373\u53ef\u76f4\u63a5\u5e94\u7528\u4e8e\u73b0\u6709LLMs\u3002\u6211\u4eec\u5728\u5305\u62ecInfiniteBench\u3001RULER\u3001PG-19\u548cNeedle In A 
Haystack\u5728\u5185\u7684\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4ee5\u53caLLaMA-3-1M\u3001GLM4-1M\u3001Yi-200K\u3001Phi-3-128K\u548cQwen2-128K\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cMInference\u5728A100\u4e0a\u6709\u6548\u964d\u4f4e\u4e86\u9884\u586b\u5145\u7684\u63a8\u7406\u5ef6\u8fdf\u9ad8\u8fbe10\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u5730\u5740\u4e3a\uff1ahttps://aka.ms/MInference\u3002**|\n", "2407.02486": "|**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aNeurocache\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6709\u6548\u4e0a\u4e0b\u6587\u8303\u56f4\uff0c\u901a\u8fc7\u5916\u90e8\u5411\u91cf\u7f13\u5b58\u5b58\u50a8\u5176\u8fc7\u53bb\u7684\u6a21\u578b\u72b6\u6001\u3002\u4e0e\u8fd1\u671f\u7684\u5411\u91cf\u68c0\u7d22\u65b9\u6cd5\u7c7b\u4f3c\uff0cNeurocache\u5229\u7528\u9ad8\u6548\u7684k\u8fd1\u90bb(kNN)\u7b97\u6cd5\u68c0\u7d22\u76f8\u5173\u7684\u5386\u53f2\u72b6\u6001\uff0c\u5e76\u5c06\u5176\u878d\u5165\u6ce8\u610f\u529b\u8fc7\u7a0b\u3002Neurocache\u5728\u6539\u8fdb\u73b0\u6709\u65b9\u6cd5\u65b9\u9762\u6709\u4ee5\u4e0b\u51e0\u70b9\uff1a(1) \u5b58\u50a8\u538b\u7f29\u7684\u72b6\u6001\uff0c\u51cf\u5c0f\u4e86\u7f13\u5b58\u5927\u5c0f\uff1b(2) \u6bcf\u4e2a\u4ee4\u724c\u6267\u884c\u4e00\u6b21\u68c0\u7d22\u64cd\u4f5c\uff0c\u63d0\u9ad8\u4e86\u63a8\u7406\u901f\u5ea6\uff1b(3) \u5c06\u68c0\u7d22\u7a97\u53e3\u6269\u5c55\u5230\u90bb\u8fd1\u72b6\u6001\uff0c\u63d0\u5347\u4e86\u8bed\u8a00\u5efa\u6a21\u548c\u4e0b\u6e38\u4efb\u52a1\u7684\u51c6\u786e\u6027\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u8fd8\u662f\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982Llama2-7B\u548cMistral-7B\uff09\u8fdb\u884c\u589e\u5f3a\uff0cNeurocache\u90fd\u80fd\u6709\u6548\u3002\u6211\u4eec\u8fd8\u5bf9\u6bd4\u4e86Neurocache\u4e0e\u5176\u4ed6\u6587\u672c\u68c0\u7d22\u65b9\u6cd5\uff0c\u5728\u5355\u6587\u6863\u95ee\u7b54\u548c\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u5176\u4f18\u52bf\u3002\u6e90\u4ee3\u7801\u5df2\u5728\u4ee5\u4e0b\u94fe\u63a5\u516c\u5f00\uff1ahttps://github.com/alisafaya/neurocache\u3002**|\n", "2407.02485": "|**2024-07-02**|**RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs**|Yue Yu et.al.|[2407.02485](http://arxiv.org/abs/2407.02485)|null|\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6307\u4ee4\u8c03\u4f18\u6846\u67b6RankRAG\uff0c\u65e8\u5728\u9488\u5bf9\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4e2d\u7684\u4e0a\u4e0b\u6587\u6392\u540d\u548c\u7b54\u6848\u751f\u6210\u53cc\u91cd\u4efb\u52a1\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u8c03\u4f18\u3002\u901a\u8fc7\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u52a0\u5165\u5c11\u91cf\u6392\u540d\u6570\u636e\uff0c\u6307\u4ee4\u8c03\u4f18\u7684\u5355\u4e2a\u8bed\u8a00\u6a21\u578b\u8868\u73b0\u51fa\u4ee4\u4eba\u60ca\u8bb6\u7684\u6548\u679c\uff0c\u8d85\u8d8a\u4e86\u4e13\u95e8\u4f7f\u7528\u5927\u91cf\u6392\u540d\u6570\u636e\u8fdb\u884c\u5355\u72ec\u8c03\u4f18\u7684\u73b0\u6709\u4e13\u5bb6\u6392\u540d\u6a21\u578b\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4e0e\u5305\u62ecGPT-4-0613\u3001GPT-4-turbo-2024-0409\u548c\u5f00\u653e\u6e90\u4ee3\u7801\u7684\u6700\u5148\u8fdb\u7684RAG\u6027\u80fd\u6a21\u578bChatQA-1.5\u5728\u5185\u7684\u591a\u4e2a\u5f3abaseline\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684Llama3-RankRAG\u5728\u4e5d\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u57fa\u51c6\u4e0a\u663e\u8457\u4f18\u4e8eLlama3-Cha
tQA-1.5\u548cGPT-4\u7cfb\u5217\u6a21\u578b\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u5728\u65e0\u9700\u9488\u5bf9\u751f\u7269\u533b\u5b66\u9886\u57df\u6570\u636e\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u4e94\u4e2a\u751f\u7269\u533b\u5b66\u9886\u57df\u7684RAG\u57fa\u51c6\u4e0a\u4e0eGPT-4\u6a21\u578b\u8868\u73b0\u76f8\u5f53\uff0c\u8fd9\u663e\u793a\u4e86\u5176\u5728\u65b0\u9886\u57df\u4e2d\u7684\u51fa\u8272\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u4e0d\u5982\u4e13\u4e1a\u6a21\u578b\u3002\u8fd1\u671f\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\uff0c\u901a\u8fc7\u7528\u6237\u8f93\u5165\u9009\u62e9\u5408\u9002\u7684\u4e13\u7528\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u533b\u7597\u9886\u57df\uff0c\u8fd9\u7c7b\u8fdb\u5c55\u7684\u5e94\u7528\u8fd8\u4e0d\u5e7f\u6cdb\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4e00\u79cd\u4e13\u4e3a\u533b\u7597\u8bbe\u8ba1\u7684\u4ee3\u7406\uff0c\u540d\u4e3aMulti-modal Medical
Agent\uff08MMedAgent\uff09\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u516d\u4e2a\u533b\u7597\u5de5\u5177\uff0c\u7528\u4e8e\u89e3\u51b3\u4e03\u9879\u4efb\u52a1\uff0c\u4f7f\u4ee3\u7406\u80fd\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u9002\u5b9c\u7684\u5de5\u5177\u3002\u5b9e\u9a8c\u5168\u9762\u5c55\u793a\u4e86MMedAgent\u5728\u5404\u79cd\u533b\u7597\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u5f00\u6e90\u65b9\u6cd5\uff0c\u751a\u81f3\u5305\u62ec\u5c01\u95ed\u6e90\u6a21\u578bGPT-4o\uff0c\u4e14\u5728\u5f15\u5165\u548c\u6574\u5408\u65b0\u533b\u7597\u5de5\u5177\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6548\u6027\u3002|\n", "2407.02477": "|**2024-07-02**|**Understanding Alignment in Multimodal LLMs: A Comprehensive Study**|Elmira Amirloo et.al.|[2407.02477](http://arxiv.org/abs/2407.02477)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6027\u80fd\u7684\u63d0\u5347\uff0c\u504f\u597d\u4e00\u81f4\u6027\u5df2\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u56e0\u7d20\uff0c\u4f46\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u5e94\u7528\u76f8\u5bf9\u8f83\u5c11\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u56fe\u50cf\u7406\u89e3\u4efb\u52a1\u4e2d\u4e5f\u4f1a\u9047\u5230\u8bf8\u5982\u9519\u8bef\u9648\u8ff0\u548c\u5185\u5bb9\u4e0d\u4e00\u81f4\uff08\u5373\u5e7b\u89c9\uff09\u7684\u95ee\u9898\u3002MLLMs\u7684\u504f\u597d\u5bf9\u9f50\u76ee\u6807\u662f\u4f7f\u6a21\u578b\u7684\u56de\u7b54\u66f4\u8d34\u8fd1\u56fe\u50cf\u4fe1\u606f\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u5f15\u5165\u4e86\u9488\u5bf9MLLM\u7684\u504f\u597d\u6570\u636e\u96c6\uff0c\u5e76\u5c1d\u8bd5\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u548cproximal policy 
optimization\uff08PPO\uff09\u7b49\u4e0d\u540c\u7684\u5bf9\u9f50\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6570\u636e\u96c6\u3001\u57fa\u7840\u6a21\u578b\u7c7b\u578b\u548c\u5bf9\u9f50\u7b56\u7565\u7684\u5dee\u5f02\uff0c\u54ea\u79cd\u65b9\u6cd5\u5bf9\u6027\u80fd\u63d0\u5347\u7684\u8d21\u732e\u6700\u5927\u5c1a\u4e0d\u6e05\u695a\u3002 \u672c\u6587\u72ec\u7acb\u5206\u6790\u4e86MLLM\u504f\u597d\u5bf9\u9f50\u7684\u5404\u4e2a\u65b9\u9762\u3002\u6211\u4eec\u5c06\u5bf9\u9f50\u7b97\u6cd5\u5206\u4e3a\u79bb\u7ebf\uff08\u5982DPO\uff09\u548c\u5728\u7ebf\uff08\u5982\u5728\u7ebf-DPO\uff09\u4e24\u7c7b\uff0c\u5e76\u8868\u660e\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u7ed3\u5408\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u56de\u987e\u4e86\u5404\u79cd\u5df2\u53d1\u8868\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u96c6\uff0c\u63a2\u8ba8\u4e86\u5b83\u4eec\u6784\u5efa\u7ec6\u8282\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u751f\u6210\u65b9\u6cd5\u2014\u2014\u504f\u89c1\u9a71\u52a8\u7684\u5e7b\u89c9\u91c7\u6837\uff08Bias-Driven Hallucination Sampling\uff0cBDHS\uff09\uff0c\u8fd9\u79cd\u65b9\u6cd5\u65e0\u9700\u989d\u5916\u6807\u6ce8\u6216\u5916\u90e8\u6a21\u578b\uff0c\u4e14\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u4e0e\u4e4b\u524d\u53d1\u8868\u7684\u5bf9\u9f50\u5de5\u4f5c\u76f8\u5f53\u7684\u7ade\u4e89\u6027\u80fd\u3002|\n", "2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo 
et.al.|[2407.02473](http://arxiv.org/abs/2407.02473)|null|\u5982\u4f55\u6784\u5efa\u80fd\u591f\u5728\u5f00\u653e\u4e16\u754c\u4e2d\u6267\u884c\u8bed\u4e49\u5bfc\u822a\u4efb\u52a1\u7684\u673a\u5668\u4eba\uff0c\u6bd4\u5982\u5728\u65b0\u573a\u666f\u4e2d\u5bfb\u627e\u76ee\u6807\u7269\u4f53\uff1f\u5c3d\u7ba1\u57fa\u7840\u6a21\u578b\u5177\u5907\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u6240\u9700\u7684\u4e30\u5bcc\u77e5\u8bc6\u548c\u6cdb\u5316\u80fd\u529b\uff0c\u4f46\u9700\u8981\u4e00\u79cd\u5408\u9002\u7684\u573a\u666f\u8868\u793a\u6765\u5c06\u5b83\u4eec\u6574\u5408\u5230\u5b8c\u6574\u7684\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u5f00\u653e\u573a\u666f\u56fe\uff08Open Scene Graphs\uff0cOSG\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u62d3\u6251\u8bed\u4e49\u8868\u793a\uff0c\u7528\u4e8e\u4fdd\u7559\u548c\u7ec4\u7ec7\u5f00\u653e\u96c6\u4e2d\u573a\u666f\u4fe1\u606f\uff0c\u4e14\u7ed3\u6784\u53ef\u9002\u5e94\u4e0d\u540c\u73af\u5883\u7c7b\u578b\u3002\u6211\u4eec\u5c06\u57fa\u7840\u6a21\u578b\u548cOSG\u6574\u5408\u5230OpenSearch\u7cfb\u7edf\u4e2d\uff0c\u8be5\u7cfb\u7edf\u4e13\u4e3a\u5f00\u653e\u4e16\u754c\u7684\u5bf9\u8c61\u76ee\u6807\u5bfc\u822a\u8bbe\u8ba1\uff0c\u80fd\u591f\u7406\u89e3\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u5e76\u5728\u591a\u53d8\u73af\u5883\u4e2d\u96f6\u6837\u672c\u6cdb\u5316\uff0c\u5bfb\u627e\u672a\u89c1\u8fc7\u7684\u7269\u4f53\u3002\u6211\u4eec\u7684OSG\u589e\u5f3a\u4e86\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5f97OpenSearch\u5728\u7269\u4f53\u76ee\u6807\u5bfc\u822a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u73b0\u6709\u7684LLM\u65b9\u6cd5\u3002\u901a\u8fc7\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u6d4b\u8bd5\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86OpenSearch\u5728\u5404\u79cd\u73af\u5883\u3001\u673a\u5668\u4eba\u548c\u65b0\u9896\u6307\u4ee4\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02464": "|**2024-07-02**|**Reliable 
Confidence Intervals for Information Retrieval Evaluation Using Generative A.I**|Harrie Oosterhuis et.al.|[2407.02464](http://arxiv.org/abs/2407.02464)|null|\u4f20\u7edf\u7684\u4fe1\u606f\u68c0\u7d22\uff08IR\uff09\u7cfb\u7edf\u8bc4\u4f30\u901a\u5e38\u6210\u672c\u9ad8\u6602\uff0c\u56e0\u4e3a\u9700\u8981\u4eba\u5de5\u4e13\u5bb6\u8fdb\u884c\u76f8\u5173\u6027\u6807\u6ce8\u3002\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u80fd\u591f\u4ee5\u76f8\u5bf9\u8f83\u4f4e\u7684\u8ba1\u7b97\u6210\u672c\u5927\u89c4\u6a21\u751f\u6210\u76f8\u5173\u6027\u6ce8\u91ca\uff0c\u53ef\u80fd\u51cf\u8f7bIR\u8bc4\u4f30\u7684\u4f20\u7edf\u6210\u672c\uff0c\u5e76\u4f7f\u5176\u9002\u7528\u4e8e\u4f17\u591a\u8d44\u6e90\u532e\u4e4f\u7684\u5e94\u7528\u573a\u666f\u3002\u7136\u800c\uff0c\u751f\u6210\u7684\u6ce8\u91ca\u5e76\u975e\u65e0\u8bef\uff0c\u76f4\u63a5\u7528\u4e8e\u8bc4\u4f30\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u53ef\u9760\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u63d0\u51fa\u4e24\u79cd\u65b9\u6cd5\uff0c\u5206\u522b\u662f\u57fa\u4e8e\u9884\u6d4b\u9a71\u52a8\u7684\u63a8\u65ad\u548c\u89c4\u8303\u98ce\u9669\u63a7\u5236\uff0c\u5229\u7528\u8ba1\u7b97\u673a\u751f\u6210\u7684\u76f8\u5173\u6027\u6ce8\u91ca\u4e3aIR\u8bc4\u4f30\u6307\u6807\u63d0\u4f9b\u53ef\u9760\u7684\u7f6e\u4fe1\u533a\u95f4\uff08CIs\uff09\u3002 
\u6211\u4eec\u7684\u65b9\u6cd5\u9700\u8981\u5c11\u91cf\u53ef\u9760\u7684\u6ce8\u91ca\uff0c\u901a\u8fc7\u7edf\u8ba1\u5206\u6790\u751f\u6210\u6ce8\u91ca\u4e2d\u7684\u9519\u8bef\uff0c\u4ece\u800c\u4e3a\u8bc4\u4f30\u6307\u6807\u8bbe\u7f6eCIs\uff0c\u5177\u6709\u575a\u5b9e\u7684\u7406\u8bba\u57fa\u7840\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u7684\u89c4\u8303\u98ce\u9669\u63a7\u5236\u65b9\u6cd5\u9002\u7528\u4e8e\u6392\u540d\u8bc4\u4f30\uff0c\u5e76\u4e14\u53ef\u4ee5\u6839\u636e\u67e5\u8be2\u548c\u6587\u6863\u81ea\u9002\u5e94\u8c03\u6574CIs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u7f6e\u4fe1\u533a\u95f4\u51c6\u786e\u6355\u6349\u4e86\u57fa\u4e8eLLM\u6ce8\u91ca\u7684\u8bc4\u4f30\u4e2d\u7684\u53d8\u5f02\u6027\u548c\u504f\u5dee\uff0c\u4f18\u4e8e\u4f20\u7edf\u7684Bootstrap\u4f30\u8ba1\u3002\u6211\u4eec\u671f\u671b\u8fd9\u4e9b\u8d21\u732e\u80fd\u4e3a\u90a3\u4e9b\u4f20\u7edf\u4e0a\u96be\u4ee5\u5b9e\u73b0\u53ef\u9760\u8bc4\u4f30\u7684\u4f17\u591aIR\u5e94\u7528\u5e26\u6765\u9769\u65b0\u3002|\n", "2407.02411": "|**2024-07-03**|**Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs**|Jinmin Li et.al.|[2407.02411](http://arxiv.org/abs/2407.02411)|null|\u968f\u7740\u89c6\u9891\u9a71\u52a8\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u89c6\u9891\u7406\u89e3\u80fd\u529b\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u4fdd\u62a4\u65b9\u9762\u7684\u62c5\u5fe7\uff0c\u56e0\u4e3a\u89c6\u9891\u66f4\u5bb9\u6613\u88ab\u65e0\u6388\u6743\u5730\u6807\u6ce8\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cVideo 
Watermarking\u201d\u7684\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u4fdd\u62a4\u89c6\u9891\u514d\u53d7\u672a\u7ecf\u6388\u6743\u7684\u89c6\u9891LLMs\uff0c\u7279\u522b\u662f\u9488\u5bf9\u5185\u5bb9\u548c\u63cf\u8ff0\u7684\u5904\u7406\u3002\u901a\u8fc7\u5728\u5173\u952e\u5e27\u4e2d\u5d4c\u5165\u96be\u4ee5\u5bdf\u89c9\u7684\u6c34\u5370\uff0c\u6211\u4eec\u5229\u7528\u591a\u6a21\u6001\u6d41\u635f\u5931\u4fdd\u6301\u89c2\u770b\u4f53\u9a8c\u7684\u540c\u65f6\uff0c\u9632\u6b62\u89c6\u9891\u88ab\u6ee5\u7528\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0cVideo Watermarking\u663e\u8457\u964d\u4f4e\u4e86\u89c6\u9891\u5728\u5404\u79cd\u89c6\u9891LLMs\u4e2d\u7684\u53ef\u7406\u89e3\u6027\uff0c\u8bc1\u660e\u4e86\u5176\u9690\u79d8\u6027\u548c\u9c81\u68d2\u6027\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u786e\u4fdd\u89c6\u9891\u5185\u5bb9\u7684\u5b89\u5168\u3001\u5b8c\u6574\u6027\u548c\u4fdd\u5bc6\u6027\u63d0\u4f9b\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5e94\u5bf9\u4e0d\u65ad\u53d1\u5c55\u7684\u89c6\u9891LLMs\u6280\u672f\u3002|\n", "2407.02408": "|**2024-07-02**|**CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models**|Song Wang 
et.al.|[2407.02408](http://arxiv.org/abs/2407.02408)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u5e94\u7528\u4e8e\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u5bf9\u5176\u751f\u6210\u5185\u5bb9\u53ef\u80fd\u4ea7\u751f\u7684\u8d1f\u9762\u793e\u4f1a\u5f71\u54cd\u7684\u62c5\u5fe7\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u4e3a\u4e86\u8bc4\u4f30LLMs\u7684\u504f\u89c1\uff0c\u7814\u7a76\u4eba\u5458\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u504f\u89c1\u8bc4\u4f30\u5de5\u4f5c\u5f80\u5f80\u53ea\u5173\u6ce8\u67d0\u79cd\u7c7b\u578b\u7684\u504f\u89c1\uff0c\u5e76\u4f7f\u7528\u4e0d\u4e00\u81f4\u7684\u8bc4\u4ef7\u6307\u6807\uff0c\u8fd9\u5bfc\u81f4\u4e0d\u540c\u6570\u636e\u96c6\u548cLLM\u4e4b\u95f4\u7684\u6bd4\u8f83\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u591a\u79cd\u7528\u4e8e\u8bc4\u4f30LLM\u504f\u89c1\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86CEB\uff08Compositional Evaluation Benchmark\uff09\uff0c\u5b83\u6db5\u76d6\u4e86\u4e0d\u540c\u793e\u4f1a\u7fa4\u4f53\u548c\u793e\u4f1a\u4efb\u52a1\u4e2d\u7684\u5404\u79cd\u7c7b\u578b\u504f\u89c1\u3002CEB\u7684\u6784\u5efa\u57fa\u4e8e\u6211\u4eec\u65b0\u63d0\u51fa\u7684\u6784\u6210\u6027\u5206\u7c7b\u4f53\u7cfb\uff0c\u4ece\u4e09\u4e2a\u7ef4\u5ea6\u5bf9\u6bcf\u4e2a\u6570\u636e\u96c6\u8fdb\u884c\u523b\u753b\uff1a\u504f\u89c1\u7c7b\u578b\u3001\u793e\u4f1a\u7fa4\u4f53\u548c\u4efb\u52a1\u3002\u901a\u8fc7\u7ed3\u5408\u8fd9\u4e09\u4e2a\u7ef4\u5ea6\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u5168\u9762\u7684LLM\u504f\u89c1\u8bc4\u4f30\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u504f\u89c1\u5728\u5404\u7ef4\u5ea6\u4e0a\u7684\u7a0b\u5ea6\u6709\u6240\u4e0d\u540c\uff0c\u4ece\u800c\u4e3a\u9488\u5bf9\u7279\u5b9a\u504f\u89c1\u7684\u7f13\u89e3\u65b9\u6cd5\u7684\u53d1\u5c55\u63d0\u4f9b\u4e86\u6307\u5bfc\u3002|\n", 
"2407.02402": "|**2024-07-02**|**Assessing the Code Clone Detection Capability of Large Language Models**|Zixian Zhang et.al.|[2407.02402](http://arxiv.org/abs/2407.02402)|null|\u8be5\u7814\u7a76\u65e8\u5728\u8bc4\u4f30\u4e24\u79cd\u5148\u8fdb\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cGPT-3.5\u548cGPT-4\uff0c\u5728\u4ee3\u7801\u514b\u9686\u68c0\u6d4b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u901a\u8fc7\u5728\u4e24\u4e2a\u6570\u636e\u96c6\u4e0a\u6d4b\u8bd5\u6a21\u578b\uff1aBigCloneBench\uff08\u4eba\u7c7b\u521b\u5efa\uff09\u548cGPTCloneBench\uff08LLM\u751f\u6210\uff09\u3002\u7814\u7a76\u53d1\u73b0\uff0cGPT-4\u5728\u6240\u6709\u7c7b\u578b\u7684\u4ee3\u7801\u514b\u9686\u68c0\u6d4b\u4e2d\u90fd\u660e\u663e\u4f18\u4e8eGPT-3.5\u3002\u7ed3\u679c\u663e\u793a\uff0cGPT\u6a21\u578b\u7684\u51c6\u786e\u5ea6\u4e0e\u5176\u8bc6\u522b\u4ee3\u7801\u514b\u9686\u7684\u80fd\u529b\u4e0e\u4ee3\u7801\u76f8\u4f3c\u5ea6\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u5b83\u4eec\u5728\u8bc6\u522b\u6700\u590d\u6742\u7684Type-4\u4ee3\u7801\u514b\u9686\u65f6\u6548\u679c\u8f83\u4f4e\u3002\u6b64\u5916\uff0cGPT\u6a21\u578b\u5728\u68c0\u6d4bLLM\u751f\u6210\u7684\u4ee3\u7801\u4e2d\u7684\u4ee3\u7801\u514b\u9686\u8868\u73b0\u4f18\u4e8e\u4eba\u7c7b\u751f\u6210\u7684\u4ee3\u7801\uff0c\u4f46\u6574\u4f53\u51c6\u786e\u6027\u4ecd\u4e0d\u663e\u8457\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347LLM\u5728\u4ee3\u7801\u514b\u9686\u8bc6\u522b\u80fd\u529b\u7684\u5fc5\u8981\u6027\uff0c\u7279\u522b\u662f\u9488\u5bf9\u81ea\u6211\u751f\u6210\u4ee3\u7801\u514b\u9686\u7684\u95ee\u9898\uff0c\u968f\u7740\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u8d8a\u6765\u8d8a\u591a\u5730\u4f7f\u7528\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u751f\u6210\u548c\u91cd\u6784\u5de5\u5177\uff0c\u8fd9\u53ef\u80fd\u4f1a\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002|\n", "2407.03310": "|**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou 
et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|\u957f\u5ea6\u6cdb\u5316\u6307\u7684\u662f\u4ece\u7b80\u77ed\u7684\u8bad\u7ec3\u5e8f\u5217\u63a8\u65ad\u51fa\u957f\u6d4b\u8bd5\u5e8f\u5217\u7684\u80fd\u529b\uff0c\u8fd9\u5bf9\u4e8e\u5f53\u524d\u7684\u5927\u8bed\u8a00\u6a21\u578b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u4e9b\u67b6\u6784\u6216\u6570\u636e\u683c\u5f0f\u53d8\u5316\u6765\u5b9e\u73b0\u957f\u5ea6\u6cdb\u5316\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u901a\u5e38\u5c40\u9650\u4e8e\u7279\u5b9a\u4efb\u52a1\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u7ed3\u5408\u4e86\u64e6\u9664\u677f\u548c\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u6280\u672f\uff0c\u63d0\u51fa\u4e86Turing\u7a0b\u5e8f\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684CoT\u7b56\u7565\uff0c\u5b83\u5c06\u7b97\u6cd5\u6027\u4efb\u52a1\u5206\u89e3\u6210\u7c7b\u4f3c\u56fe\u7075\u673a\u8ba1\u7b97\u7684\u6b65\u9aa4\u3002\u8fd9\u4e2a\u6846\u67b6\u65e2\u901a\u7528\u53c8\u7b80\u5355\uff0c\u53ea\u9700\u8981\u5728\u4e0a\u4e0b\u6587\u4e2d\u7a0d\u4f5c\u4fee\u6539\u5730\u590d\u5236\u6587\u672c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528Turing\u7a0b\u5e8f\uff0c\u6211\u4eec\u5728\u52a0\u6cd5\u3001\u4e58\u6cd5\u4ee5\u53ca\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684SGD\u7b49\u7b97\u6cd5\u6027\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u7a33\u5065\u7684\u957f\u5ea6\u6cdb\u5316\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793aTransformer\u5728\u968f\u673aTuring\u7a0b\u5e8f\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u957f\u5ea6\u6cdb\u5316\uff0c\u8fd9\u8868\u660e\u5bf9\u4e8e\u4efb\u4f55\u7b97\u6cd5\u6027\u4efb\u52a1\uff0c\u957f\u5ea6\u6cdb\u5316\u90fd\u662f\u53ef\u80fd\u7684\u3002\u6700\u540e\uff0c\u6211\u4eec\u7406\u8bba\u8bc1\u660eTransformer\u80fd\u591f\u5b9e\u73b0Turing\u7a0b\u5e8f\uff0c\u6784\u9020\u4e86\u4e00\u4e2a\u7b80\u5355\u7684RASP\uff08Weiss\u7b49\u4eba\uff09\u7a0b\u5e8f\uff0c\u5b83\u6a21\u62df\u4efb\u610f\u56fe\u7075\u673a\
u3002|\n", "2407.03286": "|**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|## \u80cc\u666f \u534a\u7ed3\u6784\u5316\u6570\u636e\u683c\u5f0f\u5982JSON\u56e0\u5176\u5728\u5b58\u50a8\u6570\u636e\u65f6\u7684\u7075\u6d3b\u6027\u800c\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0cJSON\u6570\u636e\u901a\u5e38\u7f3a\u4e4f\u4e0e\u5173\u7cfb\u6570\u636e\u5e93\u4e2d\u7684\u8868\u5355\u7ed3\u6784\u76f8\u5bf9\u5e94\u7684\u89c4\u8303\uff08schema\uff09\u3002\u56e0\u6b64\uff0c\u51fa\u73b0\u4e86\u8bb8\u591a\u4ece\u6570\u636e\u96c6\u4e2d\u53d1\u73b0\u89c4\u8303\u7684\u5de5\u5177\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5de5\u5177\u5f88\u6709\u7528\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u6587\u6863\u7684\u8bed\u6cd5\uff0c\u800c\u5ffd\u89c6\u4e86\u8bed\u4e49\u4fe1\u606f\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u5982\u4f55\u81ea\u52a8\u4e3a\u53d1\u73b0\u7684\u89c4\u8303\u6dfb\u52a0\u6709\u610f\u4e49\u7684\u8bed\u4e49\u4fe1\u606f\uff0c\u4f7f\u5176\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u4f5c\u8005\u7f16\u5199\u7684\u89c4\u8303\u4e2d\u6240\u5305\u542b\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u4eba\u5de5\u7f16\u5199\u7684JSON Schema\u6587\u6863\u5e93\uff0c\u751f\u6210\u5143\u7d20\u7684\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u3001\u53ef\u91cd\u7528\u5b9a\u4e49\u7684\u6709\u610f\u4e49\u540d\u79f0\uff0c\u5e76\u8bc6\u522b\u51fa\u54ea\u4e9b\u53d1\u73b0\u7684\u5c5e\u6027\u6700\u6709\u7528\uff0c\u54ea\u4e9b\u53ef\u4ee5\u89c6\u4e3a\u201c\u566a\u58f0\u201d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5148\u524d\u5df2\u8bc1\u660e\u4e0e\u4eba\u7c7b\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u7684\u6587\u672c\u751f\u6210\u6307\u6807\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji 
et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|**[link](https://github.com/ziweiji/Internal_States_Reveal_Hallucination)**|The hallucination problem of large language models (LLMs) severely limits their reliability and trustworthiness. Humans have a self-awareness process that lets them recognize what they do not know when faced with a query. Motivated by this, our paper investigates whether LLMs can estimate their own hallucination risk before generating a response. We analyze the internal mechanisms of LLMs broadly, both in terms of training data sources and across 15 diverse natural language generation (NLG) tasks spanning over 700 datasets. The empirical analysis reveals two key findings: (1) LLM internal states indicate whether the model has seen the query in its training data; and (2) LLM internal states indicate whether the model is likely to hallucinate on the query. Our study examines the specific neurons, activation layers, and tokens that play a crucial role in the LLM's perception of uncertainty and hallucination risk. Using a probing estimator, we leverage the LLM's self-assessment ability and achieve an average hallucination estimation accuracy of 84.32% at run time.|\n", "2407.03227": "|**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with 
AST-based Ranking and Schema Pruning**|Zhili Shen et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|We study text-to-SQL semantic parsing from the perspective of large language models. Motivated by the challenges posed by the size of commercial database schemas and the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. We also investigate how a parallelized semantic parser can be used to generate approximate versions of the SQL queries to support our retrieval. We take this approach to the extreme, adapting a model with fewer than 500M parameters to act as an efficient approximator and giving it the ability to process schemas in parallel. We apply our approach to monolingual and cross-lingual semantic parsing benchmarks and show improvements over state-of-the-art baselines. Comprehensive experiments highlight the contribution of each module in this retrieval-augmented generation setting and point to interesting directions for future work.|\n",
"2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|Quantization techniques are widely used to improve the inference speed and deployment efficiency of large language models (LLMs). While a large body of work examines the effect of quantization on English tasks, none has examined multilingual settings. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at different scales. Using automatic benchmarks, LLM-as-a-Judge methods, and human evaluation, we find that (1) quantization is harmful according to human evaluation, and automatic metrics severely underestimate the damage: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators; (2) languages are affected unevenly by quantization, with non-Latin-script languages hit hardest; and (3) challenging tasks such as mathematical reasoning degrade the most. Because the ability to serve low-compute models is critical for wide global adoption of NLP technologies, our results argue for treating multilingual performance as a key evaluation criterion for efficient models.|\n", "2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 
Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|Using large language models (LLMs) to turn natural language (NL) proofs into proofs in computer-verifiable formal languages such as Lean has a significant impact on verifying mathematical proofs. However, because aligned NL and formal language (FL) proof data is scarce, modern LLMs perform poorly at generating complete proofs. To address this, we propose TheoremLlama, an end-to-end framework for training a general-purpose LLM to become a Lean4 expert. The framework comprises methods for generating NL-FL aligned datasets, training strategies for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. The key innovation is an NL-FL bootstrapping method that integrates NL proofs into Lean4 code, leveraging the LLM's natural language reasoning ability for formal reasoning. Using this dataset generation method, we provide Open Bootstrapped Theorems (OBT), an NL-FL aligned and bootstrapped dataset. TheoremLlama achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and Test datasets respectively, surpassing the GPT-4 baseline of 22.95% and 25.41%. We have released the model checkpoints and the generated dataset, and will soon open-source all of the code.|\n",
"2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|This work presents Divergent CoT (DCoT), a novel method that further improves performance by requiring the model to compare multiple reasoning chains within a single inference step. It finds that instruction tuning on DCoT datasets improves the performance of even smaller, more accessible large language models. Through extensive experiments spanning different types of reasoning tasks, it shows that fine-tuning on DCoT consistently outperforms the CoT baseline across model scales (1.3B to 70B parameters). Experiments and manual evaluation indicate that these gains stem from the model generating multiple divergent reasoning chains in a single inference step, which suggests that language models are capable of self-correction. The code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.|\n",
"2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|Large language models (LLMs), known for their strong reasoning capabilities, generalizability, and fluency across diverse domains, show great promise for improving speech-related tasks. This paper focuses on integrating decoder-only LLMs into speech-to-text translation (S2TT). We propose an architecture that lets the LLM directly consume the encoded speech representation and generate the text translation. We also investigate the effects of different parameter-efficient fine-tuning techniques and task formulations. Without using proprietary data, our model achieves state-of-the-art performance on the CoVoST 2 and FLEURS benchmarks. We further conduct in-depth analyses that validate our design choices and offer insights into integrating LLMs with S2TT.|\n", "2407.03160": "|**2024-07-03**|**SOS! 
Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are increasingly popular with both the general public and industry because they can be customized, fine-tuned, and used freely. However, some open-source LLMs require approval before use, which has led third parties to publish their own easily accessible versions, sometimes fine-tuned or quantized to reduce compute requirements. These versions are attractive to users but increase the risk of training-time attacks that compromise the integrity and security of LLMs. We present SOS, a new training-time attack designed to have low computational demand; it requires neither clean data nor modification of the model weights, so the model's utility is preserved. SOS addresses security issues in several scenarios, including backdoor, jailbreak, and prompt stealing attacks. Experimental results show that the attack is effective across all evaluated targets. We also present the other side of the SOS technique, the copyright token: a novel technique that lets users mark their copyrighted content and prevent models from using it.|\n",
"2407.03157": "|**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|In this work, we investigate a typical scenario in code generation: a developer edits existing code in real time and asks a large language model (LLM) to re-predict the next token or next line on the fly. The naive approach is to have the LLM re-encode the entire KV cache to give an accurate prediction, but this is computationally expensive, especially when the sequence is long. Encoding only the edited subsequence and integrating it into the original KV cache runs into the temporal confusion problem and degrades performance significantly. To address this, we propose \\textbf{Positional Integrity Encoding} (PIE). Building on rotary positional encoding, PIE first removes the rotary matrices that introduce temporal confusion and then reapplies the correct rotary matrices. This ensures that the positional relationships between tokens are correct and requires only a single round of matrix multiplication. We run extensive experiments on the RepoBench-C-8k dataset with DeepSeek-Coder models of 1.3B, 6.7B, and 33B parameters, covering three real-world coding tasks: code insertion, code deletion, and multi-place code editing. Compared with the standard full recomputation approach, PIE reduces computational overhead by over 85% across all model sizes and tasks while closely approximating model performance.|\n",
"2407.04694": "|**2024-07-05**|**Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs**|Rudolf Laine et.al.|[2407.04694](http://arxiv.org/abs/2407.04694)|**[link](https://github.com/lrudl/sad)**|AI assistants such as ChatGPT are trained to respond to users by saying \u201cI am a large language model\u201d. This raises questions: do such models actually know that they are LLMs, and can they reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We call this the model's \u201csituational awareness\u201d. To quantify situational awareness in large language models (LLMs), we introduce a set of behavioral tests based on question answering and instruction following: the **Situational Awareness Dataset (SAD)**. The benchmark comprises 7 task categories and over 13,000 questions, testing abilities such as recognizing their own generated text, predicting their own behavior, determining whether a prompt comes from internal evaluation or real-world deployment, and following instructions that depend on self-knowledge. We evaluate 16 LLMs on SAD, including both base (pretrained) and chat models. While all models perform better than chance, even the highest-scoring model (Claude 3 Opus) is far from the human baseline on certain tasks. We also find that performance on SAD is only partially predicted by general-knowledge metrics such as MMLU. Chat models, which are fine-tuned to serve as AI assistants, outperform base models on SAD but not on general-knowledge tasks. The purpose of SAD is to facilitate scientific understanding of situational awareness in LLMs by breaking it down into quantifiable abilities. Situational awareness is important because it enhances a model's capacity for autonomous planning and action; this benefits automation but also introduces novel risks related to AI safety and control. Code and the latest results are available online.|\n",
"2407.04693": "|**2024-07-05**|**ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models**|Yuzhe Gu et.al.|[2407.04693](http://arxiv.org/abs/2407.04693)|**[link](https://github.com/open-compass/anah)**|
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and a wide range of applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size, and they are hard to scale because of prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework. Based on the Expectation Maximization (EM) algorithm, in each iteration the framework first applies a hallucination annotation pipeline to annotate a scaled-up dataset and then trains a more accurate hallucination annotator on that dataset. The new annotator is then used to update the annotation pipeline for the next iteration. Extensive experimental results show that the final hallucination annotator, with only 7B parameters, surpasses the performance of GPT-4 and achieves new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference. The annotator can not only evaluate the hallucination levels of various LLMs on large-scale datasets but also help mitigate hallucination in generated text, with the Natural Language Inference (NLI) metric increasing from 25% to 37%.|\n",
"2407.04681": "|**2024-07-05**|**Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge**|Yuanze Lin et.al.|[2407.04681](http://arxiv.org/abs/2407.04681)|null|In recent years, multimodal large language models (MLLMs) have made significant strides in understanding images holistically after training on large, high-quality image-text datasets. However, the inherent difficulty of expressing fine-grained or spatially dense information, such as masks, in text limits their ability to answer questions that require understanding of detailed visual elements. Drawing inspiration from retrieval-augmented generation (RAG), this paper proposes a new visual prompting approach that integrates fine-grained external knowledge from specialized vision models, such as instance segmentation and OCR models, into MLLMs. This is a promising yet underexplored direction for improving MLLM performance. Our approach differs from concurrent works, which convert external knowledge into additional text prompts and thereby force the model to learn the correspondence between visual content and text coordinates indirectly. Instead, we propose embedding the fine-grained knowledge directly into a spatial embedding map that serves as a visual prompt. This design can be integrated easily into various MLLMs, such as LLaVA and Mipha, and considerably improves their visual understanding performance. Through rigorous experiments on nine benchmarks, we show that our method improves the overall performance of MLLMs and strengthens their fine-grained, context-aware capabilities.|\n",
"2407.04675": "|**2024-07-05**|**Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition**|Ye Bai et.al.|[2407.04675](http://arxiv.org/abs/2407.04675)|null|Modern automatic speech recognition (ASR) models must accurately transcribe diverse speech signals from different domains, languages, and accents while taking specific contextual information into account, in order to meet the needs of various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matched scenarios, and they are gradually approaching a bottleneck. This paper introduces Seed-ASR, a speech recognition model based on a large language model (LLM). It is built on the audio-conditioned LLM (AcLLM) framework and exploits the capabilities of LLMs by feeding continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in the LLM, Seed-ASR significantly outperforms end-to-end models on comprehensive evaluation sets covering multiple domains, dialects, and languages. Seed-ASR can also be deployed to support specific needs in various scenarios without requiring extra language models. Compared with recently released large ASR models, Seed-ASR reduces word error rate (character error rate for Chinese) by 10%-40% on Chinese and English public test sets, further demonstrating its strong performance.|\n",
"2407.04656": "|**2024-07-05**|**Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement**|Yongji Wu et.al.|[2407.04656](http://arxiv.org/abs/2407.04656)|null|As large language models (LLMs) continue to scale up, sparsely activated mixture-of-experts (MoE) architectures are increasingly adopted because their computational cost scales sublinearly. However, frequent training failures remain a significant challenge: a single failure can leave all GPUs idle until the issue is resolved and can lose considerable training progress, since training must restart from a checkpoint. Existing solutions for efficient fault-tolerant training either lack elasticity or rely on building resiliency into pipeline parallelism, which cannot be applied to MoE models because of the expert parallelism strategy the MoE architecture adopts. We present Lazarus, a system for resilient and elastic training of MoE models. Lazarus adaptively allocates expert replicas to address the inherent imbalance in expert workload and speed up training, and it uses a provably optimal expert placement algorithm to maximize the probability of recovery after failures. Through adaptive expert placement and a flexible token dispatcher, Lazarus can fully utilize all available nodes after failures, leaving no GPU idle. Our evaluation shows that Lazarus outperforms existing MoE training systems by up to 5.7x under frequent node failures and by 3.4x on a real spot instance trace.|\n",
"2407.04629": "|**2024-07-05**|**Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework**|Reza Averly et.al.|[2407.04629](http://arxiv.org/abs/2407.04629)|null|This paper addresses clinical named entity recognition (clinical NER), the task of extracting important entities from clinical narratives. Large language models (LLMs) have recently shown strong performance on this task. While most prior work has focused on proprietary LLMs, this paper investigates how open LLMs trained specifically for named entity recognition perform on clinical NER. The authors propose a novel framework, Entity Decomposition with Filtering (EDF), which decomposes the entity recognition task into several retrievals of sub-entity types and introduces a filtering mechanism to remove incorrect entities. Experimental results demonstrate the efficacy of the framework across all metrics, models, datasets, and entity types. Analysis shows that entity decomposition substantially improves the recognition of previously missed entities. The paper also provides a comprehensive evaluation of the framework and an in-depth error analysis to guide future work.|\n", "2407.04622": "|**2024-07-05**|**On scalable oversight with weak LLMs judging strong LLMs**|Zachary Kenton 
et.al.|[2407.04622](http://arxiv.org/abs/2407.04622)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u53ef\u6269\u5c55\u7684\u76d1\u7763\u534f\u8bae\uff0c\u76ee\u6807\u662f\u8ba9\u4eba\u7c7b\u80fd\u591f\u6709\u6548\u76d1\u7763\u8d85\u8d8a\u4eba\u7c7b\u7ea7\u522b\u7684AI\u3002\u7814\u7a76\u4e3b\u8981\u805a\u7126\u5728\u8fa9\u8bba\u3001\u54a8\u8be2\u548c\u76f4\u63a5\u95ee\u7b54\u4e09\u79cd\u5f62\u5f0f\u4e0a\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3aAI\u4ee3\u7406\u548c\u6cd5\u5b98\u89d2\u8272\uff0c\u5047\u8bbe\u6cd5\u5b98\u6a21\u578b\u8f83\u5f31\u3002\u5b9e\u9a8c\u6db5\u76d6\u4e86\u5e7f\u6cdb\u7684\u4efb\u52a1\u5f02\u8d28\u6027\uff0c\u6269\u5c55\u4e86\u5148\u524d\u4ec5\u5173\u6ce8\u4fe1\u606f\u4e0d\u5bf9\u79f0\u7684\u5355\u4e00\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\uff0c\u589e\u52a0\u4e86\u6570\u5b66\u3001\u7f16\u7a0b\u3001\u903b\u8f91\u548c\u591a\u6a21\u6001\u63a8\u7406\u7b49\u9886\u57df\u7684\u6311\u6218\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u6240\u6709\u4efb\u52a1\u4e2d\uff0c\u5f53\u54a8\u8be2\u5e08\u968f\u673a\u88ab\u5206\u914d\u6b63\u786e\u6216\u9519\u8bef\u7b54\u6848\u65f6\uff0c\u8fa9\u8bba\u4f18\u4e8e\u54a8\u8be2\u3002\u5728\u5b58\u5728\u4fe1\u606f\u4e0d\u5bf9\u79f0\u7684\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\u4e2d\uff0c\u8fa9\u8bba\u4f18\u4e8e\u76f4\u63a5\u95ee\u7b54\uff0c\u4f46\u5728\u5176\u4ed6\u6ca1\u6709\u4fe1\u606f\u4e0d\u5bf9\u79f0\u7684\u4efb\u52a1\u4e2d\uff0c\u7ed3\u679c\u5219\u4e0d\u4e00\u3002\u5f53AI\u88ab\u5141\u8bb8\u9009\u62e9\u8981\u8bba\u8bc1\u7684\u7b54\u6848\u800c\u975e\u9884\u5148\u6307\u5b9a\u65f6\uff0c\u53d1\u73b0\u6cd5\u5b98\u88ab\u9519\u8bef\u7b54\u6848\u8bf4\u670d\u7684\u60c5\u51b5\u5728\u8fa9\u8bba\u4e2d\u51cf\u5c11\u3002\u6b64\u5916\uff0c\u66f4\u5f3a\u7684\u8fa9\u8bba\u8005\u6a21\u578b\u80fd\u63d0\u9ad8\u6cd5\u5b98\u7684\u51c6\u786e\u6027\uff0c\u5c3d\u7ba1\u63d0\u5347\u7a0b\u5ea6\u7565\u4f4e\u4e8e\u4e4b\u524d\u7684\u7814\u7a76\u3002|\n", "2407.04581": "|**2024-07-05**|**Leveraging Large 
Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions**|Shumaila Javaid et.al.|[2407.04581](http://arxiv.org/abs/2407.04581)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u878d\u5165\u96c6\u6210\u536b\u661f\u3001\u822a\u7a7a\u548c\u5730\u9762\u7f51\u7edc\uff08ISATN\uff09\u7684\u53d8\u9769\u6f5c\u529b\uff0c\u5229\u7528\u5148\u8fdb\u7684\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u548c\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6280\u672f\u4f18\u5316\u8fd9\u4e9b\u7f51\u7edc\u7684\u8fde\u901a\u6027\u3002\u9996\u5148\u6982\u8ff0\u4e86ISATN\u7684\u5f53\u524d\u67b6\u6784\uff0c\u5f3a\u8c03\u4e86LLMs\u5728\u63d0\u5347\u6570\u636e\u6d41\u3001\u4fe1\u53f7\u5904\u7406\u548c\u7f51\u7edc\u7ba1\u7406\u65b9\u9762\u7684\u4f5c\u7528\uff0c\u4ee5\u63a8\u52a85G/6G\u901a\u4fe1\u6280\u672f\u7684\u53d1\u5c55\uff0c\u901a\u8fc7\u9ad8\u7ea7\u9884\u6d4b\u7b97\u6cd5\u548c\u5b9e\u65f6\u51b3\u7b56\u6765\u589e\u5f3a\u6027\u80fd\u3002\u63a5\u7740\uff0c\u6df1\u5165\u5206\u6790\u4e86ISATN\u7ec4\u4ef6\uff0c\u63a2\u8ba8\u4e86\u5982\u4f55\u6709\u6548\u5730\u5229\u7528LLMs\u89e3\u51b3\u4f20\u7edf\u6570\u636e\u4f20\u8f93\u548c\u5904\u7406\u4e2d\u7684\u74f6\u9888\u95ee\u9898\u3002 
\u6587\u7ae0\u7740\u91cd\u4e8eISATN\u7684\u7f51\u7edc\u7ba1\u7406\u6311\u6218\uff0c\u5305\u62ec\u8d44\u6e90\u5206\u914d\u7b56\u7565\u3001\u6d41\u91cf\u8def\u7531\u4ee5\u53ca\u5728\u4e0d\u65ad\u53d8\u5316\u6761\u4ef6\u4e0b\u786e\u4fdd\u65e0\u7f1d\u8fde\u63a5\u548c\u6700\u4f18\u6027\u80fd\u7684\u7f51\u7edc\u5b89\u5168\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5c06LLMs\u6574\u5408\u5230ISATN\u4e2d\u6240\u9762\u4e34\u7684\u6280\u672f\u6311\u6218\uff0c\u5982\u6570\u636e\u96c6\u6210\u3001\u6269\u5c55\u6027\u95ee\u9898\u3001\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u5ef6\u8fdf\uff0c\u4ee5\u53ca\u6784\u5efa\u5065\u58ee\u4e14\u5bb9\u9519\u7684\u7cfb\u7edf\u8bbe\u8ba1\u3002\u6700\u540e\uff0c\u7814\u7a76\u6307\u51fa\u4e86\u672a\u6765\u7814\u7a76\u7684\u5173\u952e\u65b9\u5411\uff0c\u5373\u5982\u4f55\u5145\u5206\u5229\u7528LLM\u7684\u4f18\u52bf\uff0c\u4ee5\u63d0\u5347\u7f51\u7edc\u53ef\u9760\u6027\u3001\u4f18\u5316\u6027\u80fd\uff0c\u5b9e\u73b0\u4e00\u4e2a\u771f\u6b63\u5168\u7403\u4e92\u8054\u4e14\u667a\u80fd\u7684\u7f51\u7edc\u4f53\u7cfb\u3002|\n", "2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5feb\u901f\u53d1\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5411\u91cf\u68c0\u7d22\u7b97\u6cd5\u5bf9\u4e8e\u6ee1\u8db3\u76f8\u4f3c\u5ea6\u548c\u591a\u6837\u6027\u8981\u6c42\u7684\u8bed\u4e49\u67e5\u8be2\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1Maximal Marginal 
Relevance\uff08MMR\uff09\u5728\u6d89\u53ca\u8fd9\u4e24\u4e2a\u9700\u6c42\u7684\u68c0\u7d22\u573a\u666f\u4e2d\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u4f46\u5176\u53c2\u6570\u03bb\u7684\u53d8\u5316\u4f1a\u5bfc\u81f4\u7ed3\u679c\u6ce2\u52a8\uff0c\u4f7f\u5f97\u5411\u91cf\u7a7a\u95f4\u4e2d\u7684\u4f18\u5316\u8def\u5f84\u53d8\u5f97\u6a21\u7cca\u3002\u6b64\u5916\uff0c\u5f53\u524d\u7f3a\u4e4f\u5bf9\u76f8\u4f3c\u6027\u548c\u591a\u6837\u6027\u5728\u68c0\u7d22\u8fc7\u7a0b\u4e2d\u7ea6\u675f\u7684\u575a\u5b9e\u7406\u8bba\u5206\u6790\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u67e5\u8be2\u5411\u91cf\u4e0e\u6c42\u548c\u5411\u91cf\u4e4b\u95f4\u7684\u5173\u7cfb\u6765\u523b\u753b\u8fd9\u4e24\u79cd\u7ea6\u675f\u3002\u8fd9\u79cd\u5173\u7cfb\u786e\u4fdd\u4e86\u76f8\u4f3c\u6027\uff0c\u540c\u65f6\u8981\u6c42\u6c42\u548c\u5411\u91cf\u4e2d\u7684\u5404\u4e2a\u5411\u91cf\u4ee5\u5206\u6563\u7684\u65b9\u5f0f\u4e0e\u67e5\u8be2\u5411\u91cf\u5bf9\u9f50\uff0c\u4ee5\u6ee1\u8db3\u591a\u6837\u6027\u9700\u6c42\u3002 \u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u7ec4\u5408\u4f18\u5316\u95ee\u9898\uff1a\u4ece\u4e00\u7ec4\u5019\u9009\u5411\u91cf\u4e2d\u9009\u62e9$k$\u4e2a\uff0c\u4f7f\u5f97\u5b83\u4eec\u7684\u6c42\u548c\u5411\u91cf\u6700\u5927\u7a0b\u5ea6\u5730\u4e0e\u67e5\u8be2\u5411\u91cf\u5339\u914d\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u4e2a\u95ee\u9898\u662fNP\u5b8c\u5168\u7684\uff0c\u63ed\u793a\u4e86\u5728\u5411\u91cf\u68c0\u7d22\u4e2d\u540c\u65f6\u8ffd\u6c42\u76f8\u4f3c\u6027\u548c\u591a\u6837\u6027\u7684\u6df1\u523b\u56f0\u96be\uff0c\u5e76\u4e3a\u540e\u7eed\u7814\u7a76\u5960\u5b9a\u4e86\u7406\u8bba\u57fa\u7840\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aVectors Retrieval with Similarity and 
Diversity\uff08VRSD\uff09\u7684\u542f\u53d1\u5f0f\u7b97\u6cd5\uff0c\u5b83\u4e0d\u4ec5\u5177\u6709\u660e\u786e\u7684\u4f18\u5316\u76ee\u6807\uff0c\u65e0\u9700\u9884\u8bbe\u53c2\u6570\uff0c\u800c\u4e14\u5728\u65f6\u95f4\u590d\u6742\u6027\u4e0a\u76f8\u5bf9\u4e8eMMR\u6709\u6240\u964d\u4f4e\u3002\u5b9e\u8bc1\u9a8c\u8bc1\u8868\u660e\uff0cVRSD\u5728\u5404\u79cd\u6570\u636e\u96c6\u4e0a\u663e\u8457\u4f18\u4e8eMMR\u3002|\n", "2407.04541": "|**2024-07-05**|**PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts**|Ana-Cristina Rogoz et.al.|[2407.04541](http://arxiv.org/abs/2407.04541)|**[link](https://github.com/ana-rogoz/poprero)**|**\u6211\u4eec\u63a8\u51fa\u4e86PoPreRo\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u7f57\u9a6c\u5c3c\u4e9aReddit\u5e16\u5b50\u7684\u6d41\u884c\u5ea6\u9884\u6d4b\u6536\u96c6\u7684dataset\u3002PoPreRo\u6c47\u96c6\u4e86\u4e94\u4e2a\u4e0d\u540c\u7f57\u9a6c\u5c3c\u4e9a\u5b50\u8bba\u575b\u7684\u591a\u6837\u5316\u5e16\u5b50\u6837\u672c\uff0c\u603b\u8ba1\u5305\u542b28,107\u6761\u6570\u636e\u3002\u968f\u6570\u636e\u96c6\u4e00\u540c\u53d1\u5e03\u7684\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u7cfb\u5217\u7ade\u4e89\u6027\u6a21\u578b\u4f5c\u4e3a\u672a\u6765\u7814\u7a76\u7684\u57fa\u7840\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6d4b\u8bd5\u96c6\u4e0a\u5f97\u5206\u6700\u9ad8\u7684\u6a21\u578b\u8fbe\u5230\u4e8661.35%\u7684\u51c6\u786e\u7387\u548c60.60%\u7684\u5b8fF1\u5206\u6570\uff0c\u8fd9\u8868\u660e\u5728PoPreRo\u4e0a\u7684\u6d41\u884c\u5ea6\u9884\u6d4b\u4efb\u52a1\u6781\u5177\u6311\u6218\u6027\u3002\u901a\u8fc7\u5c11\u91cf\u63d0\u793a\u5bf9Falcon-7B\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u4e00\u6b65\u63a2\u7a76\u4e5f\u6307\u5411\u4e86\u540c\u6837\u7684\u7ed3\u8bba\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u76f8\u4fe1PoPreRo\u662f\u4e00\u4e2a\u6709\u4ef7\u503c\u7684\u8d44\u6e90\uff0c\u53ef\u4ee5\u7528\u6765\u8bc4\u4f30\u7f57\u9a6c\u5c3c\u4e9a\u793e\u4ea4\u5a92\u4f53\u5e16\u5b50\u7684\u6d41\u884c\u5ea6\u9884\u6d4b\u6a21\u578b\
u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728https://github.com/ana-rogoz/PoPreRo\u3002**|\n", "2407.06189": "|**2024-07-08**|**Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision**|Orr Zohar et.al.|[2407.06189](http://arxiv.org/abs/2407.06189)|**[link](https://github.com/orrzohar/Video-STaR)**|**\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u7684\u6027\u80fd\u4e0e\u5176\u8bad\u7ec3\u6570\u636e\u7684\u89c4\u6a21\u548c\u8d28\u91cf\u5bc6\u5207\u76f8\u5173\u3002\u5f53\u524d\u7684\u89c6\u9891\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u7f3a\u4e4f\u591a\u6837\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u4e3b\u8981\u7531\u63d0\u793a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u89c6\u9891\u5b57\u5e55\u4ee5\u5f62\u6210\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u5185\u5bb9\u591a\u4e3a\u63cf\u8ff0\u6027\u3002\u7136\u800c\uff0c\u8bb8\u591a\u5e26\u6709\u4e30\u5bcc\u6807\u7b7e\u548c\u76d1\u7763\u7684\u89c6\u9891\u6570\u636e\u96c6\u5df2\u7ecf\u5b58\u5728\uff0c\u4f46\u5982\u4f55\u5c06\u5b83\u4eec\u878d\u5165LVLM\u5e76\u975e\u6613\u4e8b\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u89c6\u9891\u81ea\u6211\u8bad\u7ec3\u4e0e\u589e\u5f3a\u63a8\u7406\uff08Video Self-Training with augmented Reasoning\uff0c\u7b80\u79f0Video-STaR\uff09\uff0c\u8fd9\u662f\u9996\u4e2a\u89c6\u9891\u81ea\u6211\u8bad\u7ec3\u65b9\u6cd5\u3002Video-STaR\u4f7f\u5f97\u4efb\u4f55\u6807\u6ce8\u7684\u89c6\u9891\u6570\u636e\u96c6\u90fd\u80fd\u7528\u4e8e\u89c6\u9891\u6307\u4ee4\u8c03\u4f18\u3002\u5728\u8fd9\u4e2a\u8fc7\u7a0b\u4e2d\uff0cLVLM\u5728\u751f\u6210\u6307\u4ee4\u548c\u5fae\u8c03\u4e4b\u95f4\u5faa\u73af\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u8fd9\u4e0d\u4ec5\u80fd\u63d0\u5347\u89c6\u9891\u6574\u4f53\u7406\u89e3\u80fd\u529b\uff08I\uff09\uff0c\u8fd8\u80fd\u8ba9LVLM\u9002\u5e94\u65b0\u7684\u4e0b\u6e38\u4efb\u52a1\uff0c\u5229\u7528\u73b0\u6709\u76d1\u7763\u8fdb\u884c\u5b66\u4e60\u3002 
\u5177\u4f53\u6765\u8bf4\uff0cLVLM\u88ab\u63d0\u793a\u63d0\u51fa\u4e00\u4e2a\u7b54\u6848\uff0c\u7136\u540e\u4ec5\u4fdd\u7559\u90a3\u4e9b\u5305\u542b\u539f\u59cb\u89c6\u9891\u6807\u7b7e\u7684\u7b54\u6848\u3002LVLM\u968f\u540e\u5728\u751f\u6210\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u518d\u8bad\u7ec3\u3002\u901a\u8fc7\u53ea\u5728\u5305\u542b\u6b63\u786e\u89c6\u9891\u6807\u7b7e\u7684\u751f\u6210\u7b54\u6848\u4e0a\u8bad\u7ec3\uff0cVideo-STaR\u5229\u7528\u73b0\u6709\u7684\u89c6\u9891\u6807\u7b7e\u4f5c\u4e3a\u5f31\u76d1\u7763\u6765\u6307\u5bfc\u89c6\u9891\u6307\u4ee4\u8c03\u4f18\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u7ecf\u8fc7Video-STaR\u589e\u5f3a\u7684LVLM\u5728\uff08I\uff09\u4e00\u822c\u89c6\u9891\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u63d0\u5347\u4e8610%\uff0c\u5728\uff08II\uff09\u4e0b\u6e38\u4efb\u52a1\u4e2d\uff0cVideo-STaR\u63d0\u9ad8\u4e86Kinetics700-QA\u7684\u51c6\u786e\u602720%\uff0c\u4ee5\u53caFineDiving\u52a8\u4f5c\u8d28\u91cf\u8bc4\u4f30\u7684\u6027\u80fd15%\u3002\u603b\u7684\u6765\u8bf4\uff0cVideo-STaR\u4e3aLVLM\u7684\u6027\u80fd\u63d0\u5347\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u6548\u4e14\u5b9e\u7528\u7684\u65b9\u6cd5\u3002**|\n", "2407.06188": "|**2024-07-08**|**CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation**|Xinying Guo 
et.al.|[2407.06188](http://arxiv.org/abs/2407.06188)|null|\u5728\u5a31\u4e50\u884c\u4e1a\uff08\u5982\u52a8\u753b\u548c\u6e38\u620f\uff09\u4ee5\u53ca\u6218\u7565\u9886\u57df\uff08\u5982\u57ce\u5e02\u6a21\u62df\u548c\u89c4\u5212\uff09\u4e2d\uff0c\u4eba\u7fa4\u8fd0\u52a8\u751f\u6210\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u8fd9\u4e00\u4efb\u52a1\u9700\u8981\u7cbe\u7ec6\u5730\u878d\u5408\u63a7\u5236\u4e0e\u751f\u6210\uff0c\u4ee5\u5728\u7279\u5b9a\u7684\u7a7a\u95f4\u548c\u8bed\u4e49\u7ea6\u675f\u4e0b\u5b9e\u73b0\u903c\u771f\u7684\u7fa4\u4f53\u52a8\u6001\u5408\u6210\uff0c\u5176\u6311\u6218\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u5f53\u524d\u7684\u4eba\u4f53\u52a8\u4f5c\u751f\u6210\u6a21\u578b\u5f80\u5f80\u5173\u6ce8\u4e2a\u4f53\u884c\u4e3a\uff0c\u5ffd\u89c6\u4e86\u96c6\u4f53\u884c\u4e3a\u7684\u590d\u6742\u6027\uff1b\u800c\u591a\u4e2a\u4eba\u4f53\u52a8\u4f5c\u751f\u6210\u7684\u6700\u65b0\u65b9\u6cd5\u4e25\u91cd\u4f9d\u8d56\u9884\u8bbe\u573a\u666f\uff0c\u4e14\u9650\u4e8e\u56fa\u5b9a\u3001\u5c11\u91cf\u7684\u4eba\u9645\u4e92\u52a8\uff0c\u9650\u5236\u4e86\u5176\u5b9e\u7528\u6027\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCrowdMoGen\uff0c\u4e00\u4e2a\u96f6\u6837\u672c\u6587\u672c\u9a71\u52a8\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u529b\u91cf\uff0c\u5c06\u96c6\u4f53\u667a\u6167\u878d\u5165\u8fd0\u52a8\u751f\u6210\u6846\u67b6\uff0c\u4ece\u800c\u80fd\u591f\u5728\u6ca1\u6709\u914d\u5bf9\u8bad\u7ec3\u6570\u636e\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u901a\u7528\u7684\u89c4\u5212\u548c\u7fa4\u4f53\u8fd0\u52a8\u751f\u6210\u3002\u6211\u4eec\u7684\u6846\u67b6\u4e3b\u8981\u7531\u4e24\u4e2a\u5173\u952e\u7ec4\u4ef6\u6784\u6210\uff1a1\uff09\u4eba\u7fa4\u573a\u666f\u89c4\u5212\u5668\uff0c\u5b66\u4e60\u6839\u636e\u7279\u5b9a\u573a\u666f\u4e0a\u4e0b\u6587\u6216\u5f15\u5165\u7684\u6270\u52a8\u534f\u8c03\u8fd0\u52a8\u548c\u52a8\u6001\uff1b2\uff09\u96c6\u4f53\u8fd0\u52a8\u751f\u6210\u5668\uff0c\u6839\u636e\u6574\u4f53\u8ba1\u5212\u9ad8\u6548\u5408\u6210\u6240\u9700\u7684\u96c6\u4f53\u8fd0\u52a8\u3002\u5927\u91cf\u7684\u5b9a\u91cf\u548c\u5b9a\u6027\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u5b83\u4e0d\u4ec5\u586b\u8865\u4e86\u5927\u89c4\u6a21\u548c\u901a\u7528\u4eba\u7fa4\u8fd0\u52a8\u751f\u6210\u4efb\u52a1\u7684\u91cd\u8981\u7a7a\u767d\uff0c\u800c\u4e14\u5728\u771f\u5b9e\u611f\u548c\u7075\u6d3b\u6027\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6c34\u51c6\u3002|\n", "2407.06172": "|**2024-07-08**|**On Speeding Up Language Model Evaluation**|Jin Peng Zhou 
et.al.|[2407.06172](http://arxiv.org/abs/2407.06172)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\uff0c\u5b83\u4eec\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u80fd\u529b\u3002\u4ece\u8bad\u7ec3\u5230\u63a8\u7406\uff0c\u6784\u5efa\u8fd9\u6837\u7684\u6a21\u578b\u6d89\u53ca\u4f17\u591a\u51b3\u7b56\uff0c\u5f62\u6210\u4e00\u4e2a\u590d\u6742\u7684\u641c\u7d22\u95ee\u9898\u3002\u4f8b\u5982\uff0c\u4e3a\u4e86\u4e3a\u7279\u5b9a\u4efb\u52a1\u627e\u5230\u6700\u4f73\u7684\u9884\u8bad\u7ec3LLM\u3001\u63d0\u793a\u6216\u8d85\u53c2\u6570\uff0c\u901a\u5e38\u9700\u8981\u5bf9\u6574\u4e2a\u6d4b\u8bd5\u96c6\u4e2d\u7684\u591a\u4e2a\u5019\u9009\u65b9\u6848\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\u3002\u8fd9\u79cd\u8be6\u5c3d\u7684\u8bc4\u4f30\u8017\u65f6\u4e14\u6602\u8d35\uff0c\u56e0\u4e3aLLMs\u7684\u63a8\u7406\u548c\u5ea6\u91cf\u8ba1\u7b97\u9700\u6c42\u9ad8\u3002 \u672c\u6587\u9488\u5bf9\u5728\u6709\u9650\u9884\u7b97\u5185\u6709\u6548\u8bc4\u4f30\u65b9\u6cd5\u5728\u6d4b\u8bd5\u6837\u672c\u4e0a\u7684\u6027\u80fd\u8fd9\u4e00\u6311\u6218\u3002\u6211\u4eec\u5229\u7528\u4e86\u5e7f\u6cdb\u7814\u7a76\u7684\u591a\u81c2\u8001\u864e\u673a\u6846\u67b6\u6765\u987a\u5e8f\u9009\u62e9\u4e0b\u4e00\u4e2a\u8981\u8bc4\u4f30\u7684\u65b9\u6cd5-\u793a\u4f8b\u5bf9\uff1b\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u591a\u81c2\u8001\u864e\u673a\u7b97\u6cd5\u4e0e\u4f4e\u79e9\u5206\u89e3\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u6240\u9700\u7684\u8d44\u6e90\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u7b97\u6cd5\u4ec5\u4f7f\u7528\u901a\u5e38\u9700\u6c42\u76845%-15%\u8d44\u6e90\uff0c\u5c31\u80fd\u8bc6\u522b\u51fa\u8868\u73b0\u6700\u597d\u7684\u65b9\u6cd5\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u9ad8\u8fbe85%-95%\u7684\u6210\u672c\u8282\u7701\u3002|\n", "2407.06153": "|**2024-07-08**|**What's Wrong with Your Code 
Generated by Large Language Models? An Extensive Study**|Shihan Dou et.al.|[2407.06153](http://arxiv.org/abs/2407.06153)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u7814\u7a76\u4eba\u5458\u5bf9\u6b64\u7684\u5173\u6ce8\u5ea6\u65e5\u76ca\u63d0\u9ad8\u3002\u76ee\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6784\u5efa\u9ad8\u8d28\u91cf\u6570\u636e\u96c6\u548c\u91c7\u7528\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6280\u672f\u6765\u63d0\u5347LLM\u7684\u4ee3\u7801\u751f\u6210\u80fd\u529b\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u8fd9\u4e9b\u73b0\u6709\u65b9\u6cd5\u7684\u5c40\u9650\u6027\u548c\u8fb9\u754c\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7814\u7a76\u63a2\u8ba8\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u8be6\u5c3d\u7684\u5b9e\u8bc1\u7814\u7a76\uff0c\u8bc4\u4f30\u4e86\u4e09\u4e2a\u9886\u5148\u95ed\u6e90LLM\u548c\u56db\u4e2a\u5f00\u6e90LLM\u5728\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u7814\u7a76\u8003\u5bdf\u4e86\u751f\u6210\u4ee3\u7801\u7684\u957f\u5ea6\u3001\u5faa\u73af\u590d\u6742\u5ea6\u548cAPI\u6570\u91cf\uff0c\u7ed3\u679c\u663e\u793a\u8fd9\u4e9b\u6a21\u578b\u5728\u5904\u7406\u66f4\u590d\u6742\u7684\u7f16\u7a0b\u95ee\u9898\u65f6\u9762\u4e34\u6311\u6218\uff0c\u751f\u6210\u7684\u4ee3\u7801\u5f80\u5f80\u8f83\u77ed\u4f46\u7ed3\u6784\u66f4\u590d\u6742\uff0c\u4e0e\u6807\u51c6\u89e3\u51b3\u65b9\u6848\u76f8\u6bd4\u3002 
\u6211\u4eec\u8fd8\u521b\u5efa\u4e86\u4e00\u4e2a\u9519\u8bef\u4ee3\u7801\u7684\u5206\u7c7b\u4f53\u7cfb\uff0c\u5206\u4e3a\u4e09\u4e2a\u7c7b\u522b\u548c12\u4e2a\u5b50\u7c7b\u522b\uff0c\u5206\u6790\u5e38\u89c1\u9519\u8bef\u7c7b\u578b\u7684\u6839\u6e90\u3002\u4e3a\u4e86\u68c0\u9a8cLLMs\u5728\u5b9e\u9645\u9879\u76ee\u4e2d\u7684\u8868\u73b0\uff0c\u6211\u4eec\u4eb2\u624b\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u542b140\u4e2a\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u73b0\u5b9e\u4e16\u754c\u57fa\u51c6\u3002\u5bf9\u6bd4\u5206\u6790\u663e\u793a\uff0c\u5b9e\u9645\u573a\u666f\u4e2d\u7684bug\u5206\u5e03\u4e0e\u73b0\u6709\u57fa\u51c6\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u989d\u5916\u8bad\u7ec3\u7684\u8fed\u4ee3\u65b9\u6cd5\uff0c\u5f15\u5165\u81ea\u6211\u6279\u5224\u673a\u5236\uff0c\u4f7fLLMs\u80fd\u591f\u6839\u636ebug\u7c7b\u578b\u548c\u7f16\u8bd1\u5668\u53cd\u9988\u4fee\u6b63\u5176\u751f\u6210\u7684\u4ee3\u7801\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u7ecf\u8fc7\u4e24\u6b21\u8fed\u4ee3\u540e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u663e\u8457\u51cf\u5c11\u9519\u8bef\uff0c\u4f7f\u901a\u8fc7\u7387\u63d0\u9ad829.2%\uff0c\u8fd9\u8868\u660eLLMs\u5728\u5904\u7406\u590d\u6742\u95ee\u9898\u65b9\u9762\u5177\u6709\u5de8\u5927\u6f5c\u529b\u3002|\n", "2407.06146": "|**2024-07-09**|**Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks**|Lukas Netz 
et.al.|[2407.06146](http://arxiv.org/abs/2407.06146)|null|\u6211\u4eec\u4ecb\u7ecd\u5e76\u8bc4\u4f30\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8bed\u6cd5\u906e\u76d6\u201d\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7528\u4e8e\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ed9\u5b9a\u4e0a\u4e0b\u6587\u65e0\u5173\u6587\u6cd5\u7684\u7ea6\u675f\u4e0b\u751f\u6210\u8bed\u6cd5\u6b63\u786e\u7684\u6a21\u578b\u3002\u5c3d\u7ba1\u5c11\u91cf\u793a\u4f8b\u5b66\u4e60\u6216\u63d0\u793a\u5f15\u5bfc\u7b49prompt\u5de5\u7a0b\u65b9\u6cd5\u53ef\u4ee5\u63d0\u9ad8LLMs\u751f\u6210\u6b63\u786e\u8bed\u6cd5\u7684\u6982\u7387\uff0c\u4f46\u5904\u7406\u590d\u6742\u6587\u6cd5\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u8017\u65f6\u4e14\u6548\u679c\u4e0d\u7406\u60f3\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u6216prompt\u5de5\u7a0b\u4e0a\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u7ea6\u675f\u89e3\u7801\u9650\u5236\u8f93\u51fa\uff0c\u786e\u4fdd\u751f\u6210\u7684\u5185\u5bb9\u7b26\u5408\u6709\u6548\u8bed\u6cd5\u3002\u6211\u4eec\u5229\u7528MontiCore\u6784\u5efa\u7684\u591a\u79cd\u9886\u57df\u7279\u5b9a\u8bed\u8a00\uff08DSL\uff09\u548c\u591a\u6b3eLLMs\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6bd4\u8f83\u4e86\u4f7f\u7528\u548c\u672a\u4f7f\u7528\u7ea6\u675f\u89e3\u7801\u7684\u6548\u679c\u3002\u540c\u65f6\uff0c\u6211\u4eec\u91c7\u7528\u76f8\u5e94\u7684\u89e3\u6790\u5668\u9a8c\u8bc1\u6bcf\u79cd\u6a21\u578b\u7684\u53e5\u6cd5\u51c6\u786e\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bed\u6cd5\u906e\u76d6\u663e\u8457\u63d0\u5347\u4e86\u591a\u4e2aLLMs\u7684\u5efa\u6a21\u80fd\u529b\uff0c\u51cf\u5c11\u4e86\u5bf9\u7cbe\u5fc3\u8bbe\u8ba1\u63d0\u793a\u7684\u9700\u6c42\uff0c\u63d0\u9ad8\u4e86\u751f\u6210\u6b63\u786e\u6a21\u578b\u7684\u53ef\u80fd\u6027\u3002|\n", "2407.06135": "|**2024-07-08**|**ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text 
Generation**|Ethan Chern et.al.|[2407.06135](http://arxiv.org/abs/2407.06135)|**[link](https://github.com/gair-nlp/anole)**|**\u5148\u524d\u7684\u5f00\u6e90\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5b58\u5728\u4e00\u4e9b\u5c40\u9650\u6027\uff1a\uff081\uff09\u5b83\u4eec\u5f80\u5f80\u7f3a\u4e4f\u539f\u751f\u96c6\u6210\uff0c\u9700\u8981\u9002\u914d\u5668\u6765\u8854\u63a5\u89c6\u89c9\u8868\u793a\u4e0e\u9884\u8bad\u7ec3\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff1b\uff082\uff09\u8bb8\u591a\u6a21\u578b\u4ec5\u9650\u4e8e\u5355\u6a21\u6001\u751f\u6210\uff1b\uff083\uff09\u5c3d\u7ba1\u6709\u4e9b\u652f\u6301\u591a\u6a21\u6001\u751f\u6210\uff0c\u4f46\u5b83\u4eec\u4f9d\u8d56\u4e8e\u5355\u72ec\u7684\u6269\u6563\u6a21\u578b\u5904\u7406\u89c6\u89c9\u90e8\u5206\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Anole\uff0c\u4e00\u4e2a\u5f00\u6e90\u7684\u3001\u81ea\u56de\u5f52\u7684\u3001\u539f\u751f\u7684\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff0c\u4e13\u4e3a\u4ea4\u9519\u7684\u56fe\u50cf-\u6587\u672c\u751f\u6210\u8bbe\u8ba1\u3002\u6211\u4eec\u57fa\u4e8eMeta AI\u7684Chameleon\u6784\u5efaAnole\uff0c\u91c7\u7528\u4e86\u4e00\u79cd\u65e2\u6570\u636e\u9ad8\u6548\u53c8\u53c2\u6570\u9ad8\u6548\u7684\u521b\u65b0\u5fae\u8c03\u7b56\u7565\u3002Anole\u5c55\u793a\u4e86\u9ad8\u8d28\u91cf\u3001\u8fde\u8d2f\u7684\u591a\u6a21\u6001\u751f\u6210\u80fd\u529b\u3002\u6211\u4eec\u5df2\u7ecf\u516c\u5f00\u4e86\u6211\u4eec\u7684\u6a21\u578b\u3001\u8bad\u7ec3\u6846\u67b6\u4ee5\u53ca\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u3002**|\n", "2407.06129": "|**2024-07-08**|**Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization**|Hannah K. 
Bako et.al.|[2407.06129](http://arxiv.org/abs/2407.06129)|**[link](https://github.com/hdi-umd/semantic_profiling_llm_evaluation)**|**\u81ea\u52a8\u6839\u636e\u4eba\u7c7b\u5bf9\u6570\u636e\u96c6\u7684\u53e3\u5934\u63cf\u8ff0\u751f\u6210\u6570\u636e\u53ef\u89c6\u5316\u56fe\u8868\uff0c\u9700\u8981\u6df1\u5ea6\u7406\u89e3\u8bed\u8a00\u4e2d\u7684\u8bed\u4e49\u4fe1\u606f\uff0c\u5305\u62ec\u5bf9\u6570\u636e\u5c5e\u6027\u3001\u53ef\u89c6\u5316\u4efb\u52a1\u4ee5\u53ca\u6570\u636e\u9884\u5904\u7406\u6b65\u9aa4\u7684\u9690\u542b\u548c\u660e\u786e\u63d0\u53ca\u3002\u81ea\u7136\u8bed\u8a00\u754c\u9762\uff08NLIs\uff09\u5728\u6570\u636e\u53ef\u89c6\u5316\u65b9\u9762\u5df2\u7ecf\u63a2\u8ba8\u4e86\u5982\u4f55\u6355\u6349\u8fd9\u4e9b\u4fe1\u606f\uff0c\u4f46\u4eba\u7c7b\u8a00\u8bed\u7684\u4e0d\u786e\u5b9a\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u4f46\u5b83\u4eec\u63d0\u53d6\u76f8\u5173\u8bed\u4e49\u4fe1\u606f\u7684\u80fd\u529b\u5c1a\u5f85\u63a2\u7d22\u3002\u672c\u7814\u7a76\u8bc4\u4f30\u4e86\u56db\u6b3e\u516c\u5f00\u53ef\u7528\u7684LLMs\uff08GPT-4\u3001Gemini-Pro\u3001Llama3\u548cMixtral\uff09\uff0c\u5206\u6790\u5b83\u4eec\u5728\u9762\u5bf9\u4e0d\u786e\u5b9a\u6027\u65f6\u7406\u89e3\u53e3\u5934\u6307\u4ee4\u7684\u80fd\u529b\uff0c\u5e76\u8bc6\u522b\u6570\u636e\u4e0a\u4e0b\u6587\u548c\u89c6\u89c9\u4efb\u52a1\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cLLMs\u5bf9\u53e3\u8bed\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u5f88\u654f\u611f\uff0c\u80fd\u591f\u63d0\u53d6\u5173\u952e\u7684\u6570\u636e\u80cc\u666f\u4fe1\u606f\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u63a8\u65ad\u53ef\u89c6\u5316\u4efb\u52a1\u65b9\u9762\u8868\u73b0\u6b20\u4f73\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u672a\u6765\u5229\u7528LLMs\u8fdb\u884c\u53ef\u89c6\u5316\u751f\u6210\u7684\u7814\u7a76\u65b9\u5411\u3002**|\n", 
"2407.06125": "|**2024-07-08**|**Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities**|Avinash Anand et.al.|[2407.06125](http://arxiv.org/abs/2407.06125)|null|\u6291\u90c1\u75c7\u88ab\u5e7f\u6cdb\u8ba4\u4e3a\u662f\u91cd\u5927\u7684\u516c\u5171\u536b\u751f\u95ee\u9898\uff0c\u4e25\u91cd\u5f71\u54cd\u4e2a\u4eba\u7684\u5fc3\u7406\u5065\u5eb7\u3002\u672a\u7ecf\u8bca\u65ad\u7684\u6291\u90c1\u75c7\u53ef\u80fd\u5bfc\u81f4\u4e25\u91cd\u7684\u5065\u5eb7\u95ee\u9898\uff0c\u5305\u62ec\u751f\u7406\u75c7\u72b6\u751a\u81f3\u81ea\u6740\u3002\u901a\u5e38\uff0c\u6291\u90c1\u75c7\u7684\u8bca\u65ad\u4f9d\u8d56\u4e8e\u4e34\u5e8a\u533b\u751f\u548c\u5fc3\u7406\u5065\u5eb7\u4e13\u4e1a\u4eba\u5458\u8fdb\u884c\u7684\u7ed3\u6784\u5316\u8bbf\u8c08\u548c\u5982Patient Health Questionnaire\uff08PHQ\uff09\u7b49\u95ee\u5377\u8c03\u67e5\u3002\u7136\u800c\uff0c\u8fd9\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u533b\u751f\u7684\u7ecf\u9a8c\u548c\u5224\u65ad\uff0c\u53ef\u80fd\u53d7\u5230\u4e2a\u4eba\u504f\u89c1\u7684\u5f71\u54cd\u3002\u7531\u4e8e\u6291\u90c1\u75c7\u7684\u6210\u56e0\u4ecd\u5728\u7814\u7a76\u4e2d\uff0c\u533b\u751f\u5728\u8bc6\u522b\u548c\u6cbb\u7597\u521d\u671f\u9636\u6bb5\u7684\u6291\u90c1\u75c7\u65f6\u9762\u4e34\u6311\u6218\u3002 \u8fd1\u671f\uff0c\u4eba\u5de5\u667a\u80fd\u795e\u7ecf\u8ba1\u7b97\u5728\u6587\u672c\u3001\u56fe\u50cf\u548c\u8bed\u97f3\u5904\u7406\u7b49\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u6211\u4eec\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528\u8fd9\u4e9b\u6700\u5148\u8fdb\u7684\u6a21\u578b\uff0c\u5728E-DAIC\uff08Extended Distress Analysis Interview Corpus Wizard of Oz\uff09\u6570\u636e\u96c6\u548c2019\u5e74Audio/Visual Emotion 
Challenge\uff08AVEC\uff09\u4e2d\u8fdb\u884c\u5b9e\u9a8c\uff0c\u4ee5\u671f\u4f18\u5316\u591a\u6a21\u6001\u7ed3\u679c\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u63d0\u51fa\u7684\u89e3\u51b3\u65b9\u6848\u5229\u7528\u4e13\u6709\u548c\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5728\u6587\u672c\u6a21\u6001\u4e0a\u7684Root Mean Square Error\uff08RMSE\uff09\u5f97\u5206\u8fbe\u52303.98\uff0c\u4f18\u4e8eAVEC 2019\u6311\u6218\u7684\u57fa\u7ebf\u548c\u5f53\u524d\u6700\u4f73\u7684\u56de\u5f52\u5206\u6790\u67b6\u6784\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u51c6\u786e\u6027\u8fbe\u5230\u4e8671.43%\u3002\u8bba\u6587\u8fd8\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u97f3\u9891-\u89c6\u89c9\u591a\u6a21\u6001\u7f51\u7edc\uff0c\u5176\u9884\u6d4bPHQ-8\u8bc4\u5206\u7684RMSE\u4e3a6.51\u3002|\n", "2407.06093": "|**2024-07-08**|**Artificial Intuition: Efficient Classification of Scientific Abstracts**|Harsh Sakhrani et.al.|[2407.06093](http://arxiv.org/abs/2407.06093)|null|
\u4e3a\u4e86\u83b7\u53d6\u6218\u7565\u6d1e\u89c1\u6216\u8fdb\u884c\u79d1\u7814\u9879\u76ee\u7ba1\u7406\uff0c\u5bf9\u7b80\u77ed\u7684\u79d1\u5b66\u6587\u672c\uff08\u5982\u7814\u7a76\u57fa\u91d1\u7533\u8bf7\u4e66\u6216\u51fa\u7248\u7269\u6458\u8981\uff09\u8fdb\u884c\u7c97\u7c92\u5ea6\u5206\u7c7b\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u6587\u672c\u5411\u5177\u5907\u6df1\u539a\u4e13\u4e1a\u77e5\u8bc6\u7684\u4e13\u5bb6\u4f20\u8fbe\u5bc6\u96c6\u4fe1\u606f\uff0c\u4f46\u81ea\u52a8\u5316\u7684\u4efb\u52a1\u6781\u5176\u8270\u5de8\uff0c\u56e0\u4e3a\u7bc7\u5e45\u6709\u9650\u4e14\u7f3a\u4e4f\u4e0a\u4e0b\u6587\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\u6765\u751f\u6210\u5e76\u51c6\u786e\u5206\u914d\u7279\u5b9a\u9886\u57df\u7684\u7c97\u6807\u7b7e\u3002\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u591f\u63d0\u4f9b\u4efb\u52a1\u6240\u9700\u7684\u5143\u6570\u636e\uff0c\u7c7b\u4f3c\u4e8e\u589e\u5f3a\u4eba\u7c7b\u76f4\u89c9\u7684\u8865\u5145\u77e5\u8bc6\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u5de5\u4f5c\u6d41\u7a0b\u3002\u4f5c\u4e3a\u521d\u6b65\u5b9e\u9a8c\uff0c\u6211\u4eec\u4f7f\u7528\u4e86\u7f8e\u56fd\u56fd\u5bb6\u822a\u7a7a\u822a\u5929\u5c40\uff08NASA\uff09\u7684\u5956\u9879\u6458\u8981\u6570\u636e\u5e93\u3002\u6211\u4eec\u7ed3\u5408\u73b0\u6709\u6027\u80fd\u6307\u6807\uff0c\u5f00\u53d1\u4e86\u65b0\u7684\u8bc4\u4f30\u5de5\u5177\u3002|\n", "2407.06089": "|**2024-07-08**|**Merge, Ensemble, and Cooperate! 
A Survey on Collaborative Strategies in the Era of Large Language Models**|Jinliang Lu et.al.|[2407.06089](http://arxiv.org/abs/2407.06089)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u663e\u8457\u6210\u529f\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u7814\u7a76\u8fdb\u5165\u4e86\u65b0\u65f6\u4ee3\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6a21\u578b\u5404\u6709\u6240\u957f\uff0c\u4f46\u8bad\u7ec3\u5728\u4e0d\u540c\u8bed\u6599\u5e93\u4e0a\u7684LLMs\u8868\u73b0\u51fa\u4e0d\u540c\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u8fd9\u7ed9\u63d0\u9ad8\u6574\u4f53\u6548\u7387\u548c\u7075\u6d3b\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u63a2\u7d22\u4e86LLMs\u7684\u534f\u4f5c\u7b56\u7565\u3002\u672c\u6587\u5168\u9762\u6982\u8ff0\u4e86\u8fd9\u4e00\u65b0\u5174\u7814\u7a76\u9886\u57df\uff0c\u5f3a\u8c03\u4e86\u5408\u4f5c\u80cc\u540e\u7684\u52a8\u529b\u3002\u6211\u4eec\u5c06\u534f\u4f5c\u7b56\u7565\u4e3b\u8981\u5206\u4e3a\u4e09\u79cd\u65b9\u6cd5\uff1a\u5408\u5e76\u3001\u96c6\u6210\u548c\u534f\u4f5c\u3002\u5408\u5e76\u662f\u5c06\u591a\u4e2aLLMs\u7684\u53c2\u6570\u7a7a\u95f4\u6574\u5408\u3002\u96c6\u6210\u5219\u662f\u7ed3\u5408\u591a\u4e2a\u6a21\u578b\u7684\u8f93\u51fa\u3002\u534f\u4f5c\u5229\u7528\u4e0d\u540cLLMs\u7684\u4f18\u52bf\uff0c\u4f7f\u5176\u5728\u7279\u5b9a\u4efb\u52a1\u4e2d\u53d1\u6325\u5404\u81ea\u4e13\u957f\u3002\u6211\u4eec\u5c06\u4ece\u4e0d\u540c\u89d2\u5ea6\u8be6\u7ec6\u4ecb\u7ecd\u8fd9\u4e9b\u65b9\u6cd5\uff0c\u5e76\u8ba8\u8bba\u5176\u6f5c\u5728\u5e94\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u52fe\u52d2\u51fa\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\uff0c\u671f\u671b\u672c\u5de5\u4f5c\u80fd\u6fc0\u53d1\u66f4\u591a\u5173\u4e8eLLMs\u534f\u4f5c\u7684\u7814\u7a76\uff0c\u63a8\u52a8\u9ad8\u7ea7NLP\u5e94\u7528\u7684\u53d1\u5c55\u3002|\n", "2407.07094": "|**2024-07-09**|**AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning**|Jiaxi 
Cui et.al.|[2407.07094](http://arxiv.org/abs/2407.07094)|**[link](https://github.com/pandavt/datatager)**|**\u5728\u5404\u884c\u5404\u4e1a\u5e7f\u6cdb\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fc7\u7a0b\u4e2d\uff0c\u5f80\u5f80\u5ffd\u89c6\u4e86\u4e2a\u4f53\u548c\u5c0f\u578b\u7ec4\u7ec7\u5bf9\u9488\u5bf9\u5176\u7279\u5b9a\u4e1a\u52a1\u573a\u666f\u5b9a\u5236\u5316\u6a21\u578b\u7684\u9700\u6c42\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5fae\u8c03\u65b9\u6cd5\u2014\u2014AnyTaskTune\uff0c\u5373\u4efb\u52a1\u5fae\u8c03\uff08Task-Fine-Tune\uff09\uff0c\u65e8\u5728\u63d0\u5347\u6a21\u578b\u5728\u591a\u6837\u5316\u7684\u9886\u57df\u7279\u5b9a\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u65b9\u6cd5\u5305\u62ec\u7ec6\u81f4\u5730\u8bc6\u522b\u548c\u5b9a\u4e49\u9886\u57df\u5185\u7684\u5b50\u4efb\u52a1\uff0c\u968f\u540e\u521b\u5efa\u4e13\u95e8\u7684\u589e\u5f3a\u6570\u636e\u96c6\u8fdb\u884c\u7cbe\u7ec6\u8c03\u6574\uff0c\u4ece\u800c\u4f18\u5316\u4efb\u52a1\u7279\u5b9a\u7684\u6a21\u578b\u8868\u73b0\u3002\u6211\u4eec\u4e0d\u4ec5\u5728\u6cd5\u5f8b\u9886\u57df\uff08\u5982\u5173\u952e\u8bcd\u63d0\u53d6\u548c\u53e5\u5b50\u9884\u6d4b\uff09\uff0c\u8fd8\u5728\u91d1\u878d\u3001\u533b\u7597\u3001\u6cd5\u5f8b\u3001\u5fc3\u7406\u5b66\u3001\u5ba2\u6237\u670d\u52a1\u548c\u4eba\u529b\u8d44\u6e90\u7b49\u9886\u57df\u7684\u4e8c\u5341\u591a\u4e2a\u5b50\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5fae\u8c03\u5b9e\u9a8c\u3002\u4e3a\u4e86\u652f\u6301\u793e\u533a\u53c2\u4e0e\u5e76\u5206\u4eab\u8d44\u6e90\uff0c\u6211\u4eec\u5c06\u5f00\u6e90\u8fd9\u4e9b\u53cc\u8bed\u4efb\u52a1\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528Task-Fine-Tune\u65b9\u6cd5\u5fae\u8c03\u7684\u6a21\u578b\u4e0d\u4ec5\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u4e14\u5728\u5404\u81ea\u9886\u57df\u5185\u660e\u663e\u4f18\u4e8e\u901a\u7528\u80fd\u529b\u66f4\u5f3a\u7684\u6a21\u578b\u300
2\u6211\u4eec\u7684\u5de5\u4f5c\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1a\\url{https://github.com/PandaVT/DataTager}\u3002**|\n", "2407.07093": "|**2024-07-09**|**FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation**|Liqun Ma et.al.|[2407.07093](http://arxiv.org/abs/2407.07093)|**[link](https://github.com/liqunma/fbi-llm)**|**\u8be5\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5168\u65b0\u7684\u5168\u4e8c\u8fdb\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08FBI-LLM\uff09\uff0c\u8fd9\u662f\u9996\u6b21\u5c55\u793a\u5982\u4f55\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5927\u89c4\u6a21\u7684\u5168\u4e8c\u8fdb\u5236\u8bed\u8a00\u6a21\u578b\uff08\u4e0d\u540c\u4e8e\u90e8\u5206\u4e8c\u8fdb\u5236\u6216\u4e09\u8fdb\u5236\u7684LSTM\uff0c\u5982BitNet b1.58\uff09\uff0c\u5176\u6027\u80fd\u80fd\u591f\u4e0e\u6d6e\u70b916\u4f4d\uff08FP16\uff09\u6216\u6df7\u5408\u7cbe\u5ea616\u4f4d\uff08BF16\uff09\u7684\u5e38\u89c4\u5927\u8bed\u8a00\u6a21\u578b\u76f8\u5f53\u3002\u901a\u8fc7\u4f7f\u7528\u81ea\u56de\u5f52\u84b8\u998f\uff08AD\uff09\u635f\u5931\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u578b\u5c3a\u5bf8\uff08130M\u300113B\u30017B\uff09\u548c\u9884\u8bad\u7ec3\u6570\u636e\u91cf\u4e0e\u5e38\u89c4LLM\u76f8\u5f53\uff0cFBI-LLM\u5728\u56f0\u60d1\u5ea6\u548c\u4efb\u52a1\u7279\u5b9a\u6548\u679c\u65b9\u9762\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u4ece\u96f6\u5f00\u59cb\u8bad\u7ec3\u5168\u4e8c\u8fdb\u5236\u8bed\u8a00\u6a21\u578b\u5e76\u4e0d\u9700\u8981\u9884\u8bad\u7ec3\u6743\u91cd\u3002\u8fd9\u9879\u5de5\u4f5c\u50ac\u751f\u4e86\u4e00\u4e2a\u65b0\u7684\u8ba1\u7b97\u6846\u67b6\uff0c\u5e76\u53ef\u80fd\u63a8\u52a8\u9488\u5bf9\u5b8c\u51681\u6bd4\u7279LLMs\u7684\u4e13\u4e1a\u786c\u4ef6\u8bbe\u8ba1\u3002\u6211\u4eec\u516c\u5f00\u6240\u6709\u6a21\u578b\u3001\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\uff0c\u4ee5\u652f\u6301\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff08\u4ee3\u7801\uff1ahttps://github.com/LiqunMa/FBI-LLM\uff0c\u6a21\
u578b\uff1ahttps://huggingface.co/LiqunMa/\uff09\u3002**|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**\u5728\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\u65b9\u6cd5\u4e2d\uff0c\u5904\u7406\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u975estationarity\u5e76\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u81ea\u4e3b\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6211\u4eec\u7684\u65b0\u578b\u667a\u80fd\u4f53\u201c\u5047\u8bbe\u5fc3\u667a\u201d\uff08Hypothetical Minds\uff09\u91c7\u7528\u8ba4\u77e5\u542f\u53d1\u5f0f\u67b6\u6784\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u4e24\u4e2a\u62bd\u8c61\u5c42\u6b21\u4e0a\u7684\u5206\u5c42\u89c4\u5212\u6a21\u5757\u3002\u5173\u952e\u65b0\u589e\u7684\u662f\u201c\u5fc3\u7406\u7406\u8bba\u201d\u6a21\u5757\uff0c\u5b83\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u751f\u6210\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u7b56\u7565\u7684\u5047\u8bbe\uff0c\u5e76\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u5047\u8bbe\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u884c\u4e3a\u7684\u9884\u6d4b\u51c6\u786e\u6027\u6765\u9010\u6b65\u4f18\u5316\u3002\u5728Melting 
Pot\u57fa\u51c6\u7684\u591a\u79cd\u7ade\u4e89\u3001\u6df7\u5408\u52a8\u673a\u548c\u534f\u4f5c\u73af\u5883\u4e2d\uff0c\u5047\u8bbe\u5fc3\u667a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u8bed\u8a00\u6a21\u578b\u667a\u80fd\u4f53\u548c\u5f3a\u5316\u5b66\u4e60\u57fa\u7ebf\uff0c\u65e0\u8bba\u662f\u5728\u4e8c\u5143\u73af\u5883\u8fd8\u662f\u7fa4\u4f53\u73af\u5883\u4e2d\u3002\u5bf9\u6bd4\u5206\u6790\u663e\u793a\uff0c\u5047\u8bbe\u7684\u8bc4\u4f30\u548c\u8fed\u4ee3\u7cbe\u70bc\u5bf9\u4e8e\u5e94\u5bf9\u590d\u6742\u573a\u666f\u81f3\u5173\u91cd\u8981\u3002**|\n", "2407.07080": "|**2024-07-09**|**Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities**|Shaltiel Shmidman et.al.|[2407.07080](http://arxiv.org/abs/2407.07080)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5728\u5e0c\u4f2f\u6765\u7b49\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6311\u6218\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86DictaLM2.0\u548cDictaLM2.0-Instruct\uff0c\u8fd9\u4e24\u4e2a\u6a21\u578b\u57fa\u4e8eMistral\u6a21\u578b\uff0c\u4f7f\u7528\u5927\u7ea62000\u4ebf\u4e2a\u5e0c\u4f2f\u6765\u8bed\u548c\u82f1\u8bed\u8bcd\u6c47\u8fdb\u884c\u8bad\u7ec3\u3002\u9002\u5e94\u9884\u8bad\u7ec3\u6a21\u578b\u5230\u65b0\u8bed\u8a00\u9700\u8981\u4e13\u95e8\u7684\u6280\u672f\uff0c\u8fd9\u4e0e\u4ece\u5934\u8bad\u7ec3\u6216\u5728\u8d44\u6e90\u4e30\u5bcc\u7684\u8bed\u8a00\uff08\u5982\u82f1\u8bed\uff09\u4e0a\u8fdb\u4e00\u6b65\u8bad\u7ec3\u73b0\u6709\u6a21\u578b\u6709\u663e\u8457\u5dee\u5f02\u3002\u8bba\u6587\u8be6\u7ec6\u9610\u8ff0\u4e86\u8fd9\u4e9b\u521b\u65b0\u7684\u8bad\u7ec3\u65b9\u6cd5\uff0c\u4ee5\u4fc3\u8fdb\u5e0c\u4f2f\u6765\u8bed\u7684\u9ad8\u6548\u5b66\u4e60\u548c\u9002\u5e94\u5176\u8bed\u8a00\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9DictaLM2.0-Instruct\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6307\u4ee4\u5fae\u8c03\uff0c\u4ee5\u63d0\u5347\u5176\u5728\u4efb\u52a1\u5bfc\u5411\u6307\u4ee4\u4e0a\u7684\u6027\u80fd\u30
02\u4e3a\u4e86\u4e25\u683c\u8bc4\u4f30\u6211\u4eec\u7684\u6a21\u578b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u65b0\u7684\u5e0c\u4f2f\u6765LLM\u8bc4\u4f30\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u95ee\u7b54\u3001\u60c5\u611f\u5206\u6790\u3001Winograd Schema Challenge\u3001\u7ffb\u8bd1\u548c\u6458\u8981\u7b49\u591a\u4e2a\u4efb\u52a1\u3002\u672c\u6587\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5728\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u8bad\u7ec3LLMs\u7684\u590d\u6742\u6027\uff0c\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u7528\u4e8e\u5176\u4ed6LLM\u8de8\u975e\u82f1\u8bed\u8bed\u8a00\u9002\u5e94\u7684\u6846\u67b6\uff0c\u4ece\u800c\u5bf9\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2407.07071": "|**2024-07-09**|**Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps**|Yung-Sung Chuang et.al.|[2407.07071](http://arxiv.org/abs/2407.07071)|**[link](https://github.com/voidism/lookback-lens)**|**\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u603b\u7ed3\u6587\u7ae0\u6216\u6839\u636e\u7ed9\u5b9a\u6bb5\u843d\u56de\u7b54\u95ee\u9898\u65f6\u53ef\u80fd\u51fa\u73b0\u7684\u8bed\u5883\u6027\u865a\u6784\u95ee\u9898\u3002LLMs\u53ef\u80fd\u4f1a\u675c\u64b0\u7ec6\u8282\uff0c\u63d0\u4f9b\u4e0e\u8f93\u5165\u4e0a\u4e0b\u6587\u4e0d\u7b26\u7684\u4e0d\u51c6\u786e\u7b54\u6848\u3002\u7814\u7a76\u8005\u63d0\u51fa\uff0c\u8fd9\u79cd\u865a\u6784\u4e0e\u6a21\u578b\u503e\u5411\u4e8e\u5173\u6ce8\u4e0a\u4e0b\u6587\u4fe1\u606f\u8fd8\u662f\u81ea\u52a8\u751f\u6210\u5185\u5bb9\u7684\u7a0b\u5ea6\u6709\u5173\u3002\u4e3a\u6b64\uff0c\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7b80\u5355\u7684\u68c0\u6d4b\u6a21\u578b\u2014\u2014\u201cLookback 
Lens\u201d\uff0c\u5176\u8f93\u5165\u7279\u5f81\u662f\u57fa\u4e8e\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u4e0a\u4e0b\u6587\u6ce8\u610f\u529b\u6743\u91cd\u4e0e\u65b0\u751f\u6210\u8bcd\u7684\u6bd4\u4f8b\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u4f7f\u7528\u8fd9\u4e9b\u56de\u987e\u6bd4\u7387\u7279\u5f81\u7684\u7ebf\u6027\u5206\u7c7b\u5668\u4e0e\u5229\u7528LLM\u6574\u4e2a\u9690\u85cf\u72b6\u6001\u6216\u6587\u672c\u8574\u542b\u6a21\u578b\u7684\u66f4\u590d\u6742\u68c0\u6d4b\u5668\u540c\u6837\u6709\u6548\u3002Lookback Lens\u4e0d\u4ec5\u9002\u7528\u4e8e\u4e0d\u540c\u4efb\u52a1\uff0c\u8fd8\u80fd\u8de8\u6a21\u578b\u8fc1\u79fb\uff0c\u4e00\u4e2a\u572870\u4ebf\u53c2\u6570\u6a21\u578b\u4e0a\u8bad\u7ec3\u7684\u68c0\u6d4b\u5668\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u5373\u53ef\u5e94\u7528\u4e8e\u66f4\u5927\u7684130\u4ebf\u53c2\u6570\u6a21\u578b\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0\uff0c\u901a\u8fc7\u7b80\u5355\u7684\u5206\u7c7b\u5668\u6307\u5bfc\u89e3\u7801\u65b9\u6cd5\uff0c\u80fd\u591f\u51cf\u5c11\u8bf8\u5982XSum\u6458\u8981\u4efb\u52a1\u4e2d\u7684\u865a\u6784\u7a0b\u5ea6\uff0c\u4f8b\u5982\u964d\u4f4e9.6%\u7684\u865a\u6784\u53d1\u751f\u7387\u3002**|\n", "2407.07064": "|**2024-07-09**|**Prompting Techniques for Secure Code Generation: A Systematic Investigation**|Catherine Tony et.al.|[2407.07064](http://arxiv.org/abs/2407.07064)|null|## \u6982\u8981 
\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5f00\u53d1\u4e2d\u7684\u5174\u8d77\uff0c\u901a\u8fc7\u63d0\u793a\u9a71\u52a8\u7f16\u7a0b\uff0c\u5f00\u53d1\u8005\u80fd\u591f\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\uff08NL\uff09\u6307\u4ee4\u751f\u6210\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5173\u4e8e\u5b83\u4eec\u80fd\u5426\u4ea7\u751f\u5b89\u5168\u4ee3\u7801\u7684\u7814\u7a76\u5f15\u53d1\u4e86\u8d28\u7591\uff0c\u8fd9\u5173\u7cfb\u5230\u63d0\u793a\u751f\u6210\u8f6f\u4ef6\u7684\u8d28\u91cf\u3002\u5c3d\u7ba1\u5df2\u7ecf\u51fa\u73b0\u4e86\u591a\u79cd\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u7b56\u7565\u4ee5\u4f18\u5316LLM\u7684\u54cd\u5e94\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4e0e\u5b89\u5168\u4ee3\u7801\u751f\u6210\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\u4ecd\u9700\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u76ee\u6807\uff1a\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u4e0d\u540c\u63d0\u793a\u6280\u672f\u5bf9LLMs\u6839\u636eNL\u6307\u4ee4\u751f\u6210\u4ee3\u7801\u7684\u5b89\u5168\u6027\u5f71\u54cd\u3002\u65b9\u6cd5\uff1a\u9996\u5148\uff0c\u6211\u4eec\u8fdb\u884c\u7cfb\u7edf\u6587\u732e\u56de\u987e\uff0c\u4ee5\u8bc6\u522b\u9002\u7528\u4e8e\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u73b0\u6709\u63d0\u793a\u6280\u672f\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728GPT-3\u3001GPT-3.5\u548cGPT-4\u6a21\u578b\u4e0a\u8bc4\u4f30\u8fd9\u4e9b\u6280\u672f\u4e2d\u7684\u90e8\u5206\uff0c\u4f7f\u7528\u4e00\u4e2a\u5305\u542b150\u4e2a\u4e0e\u5b89\u5168\u76f8\u5173\u7684\u4ee3\u7801\u751f\u6210NL\u63d0\u793a\u7684\u6570\u636e\u96c6\u3002\u7ed3\u679c\uff1a\u6211\u4eec\u7684\u5de5\u4f5c\uff081\uff09\u5bf9\u4ee3\u7801\u751f\u6210\u7684\u6f5c\u5728\u63d0\u793a\u6280\u672f\u8fdb\u884c\u4e86\u5206\u7c7b\uff0c\uff082\uff09\u9002\u5e94\u5e76\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6280\u672f\u5728\u5b89\u5168\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\uff0c\uff083\uff09\u89c2\u5bdf\u5230\u5728\u6d4b\u8bd5\u7684LLMs\u4e2d\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u4e86
\u540d\u4e3a\u201c\u9012\u5f52\u6279\u8bc4\u4e0e\u6539\u8fdb\u201d\uff08RCI\uff09\u7684\u73b0\u6709\u6280\u672f\u540e\uff0c\u5b89\u5168\u6f0f\u6d1e\u6709\u6240\u51cf\u5c11\uff0c\u4e3aLLM\u751f\u6210\u4ee3\u7801\u5b89\u5168\u6027\u7684\u8ba8\u8bba\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2407.07061": "|**2024-07-09**|**Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence**|Weize Chen et.al.|[2407.07061](http://arxiv.org/abs/2407.07061)|**[link](https://github.com/openbmb/ioa)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u51fa\u73b0\u4e86\u80fd\u6548\u5353\u8d8a\u7684\u81ea\u4e3b\u4ee3\u7406\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u591a\u4ee3\u7406\u6846\u67b6\u5728\u6574\u5408\u6765\u81ea\u4e0d\u540c\u751f\u6001\u7cfb\u7edf\u7684\u9ad8\u80fd\u529b\u7b2c\u4e09\u65b9\u4ee3\u7406\u65f6\u9762\u4e34\u6311\u6218\uff0c\u901a\u5e38\u5c40\u9650\u4e8e\u81ea\u8eab\u5c01\u95ed\u73af\u5883\u3002\u5b83\u4eec\u5728\u6a21\u62df\u5206\u5e03\u5f0f\u73af\u5883\u65f6\u4e5f\u53d7\u9650\u4e8e\u5355\u8bbe\u5907\u8bbe\u7f6e\uff0c\u5e76\u4e14\u5f80\u5f80\u4f9d\u8d56\u786c\u7f16\u7801\u7684\u901a\u4fe1\u7ba1\u9053\uff0c\u96be\u4ee5\u9002\u5e94\u4efb\u52a1\u9700\u6c42\u7684\u53d8\u5316\u3002\u53d7\u4e92\u8054\u7f51\u7406\u5ff5\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u4ee3\u7406\u4e92\u8054\u7f51\u201d\uff08Internet of 
Agents\uff0cIoA\uff09\u7684\u65b0\u6846\u67b6\u3002IoA\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u7075\u6d3b\u4e14\u53ef\u6269\u5c55\u7684\u5e73\u53f0\uff0c\u4fc3\u8fdb\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u591a\u4ee3\u7406\u534f\u4f5c\u3002\u5b83\u5f15\u5165\u4e86\u4ee3\u7406\u96c6\u6210\u534f\u8bae\u3001\u5373\u65f6\u6d88\u606f\u67b6\u6784\u4ee5\u53ca\u52a8\u6001\u7684\u56e2\u961f\u534f\u4f5c\u548c\u5bf9\u8bdd\u6d41\u7a0b\u63a7\u5236\u673a\u5236\u3002\u901a\u8fc7\u5728\u901a\u7528\u52a9\u624b\u4efb\u52a1\u3001\u4f53\u611fAI\u4efb\u52a1\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u57fa\u51c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660eIoA\u5728\u6027\u80fd\u4e0a\u6301\u7eed\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\uff0c\u5c55\u793a\u4e86\u5176\u5728\u5f02\u6784\u4ee3\u7406\u4e4b\u95f4\u6709\u6548\u5408\u4f5c\u7684\u80fd\u529b\u3002IoA\u4ee3\u8868\u4e86\u671d\u7740\u5c06\u591a\u6837\u5316\u7684\u4ee3\u7406\u94fe\u63a5\u5728\u4e00\u4e2a\u7c7b\u4f3c\u4e92\u8054\u7f51\u7684\u73af\u5883\u4e2d\u8fc8\u8fdb\uff0c\u8ba9\u5b83\u4eec\u80fd\u591f\u65e0\u7f1d\u534f\u4f5c\u4ee5\u63d0\u5347\u6574\u4f53\u667a\u80fd\u548c\u529f\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5e93\u5df2\u53d1\u5e03\u5728\uff1a\\url{https://github.com/OpenBMB/IoA}\u3002**|\n", "2407.07053": "|**2024-07-09**|**Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model**|Wenqi Zhang 
et.al.|[2407.07053](http://arxiv.org/abs/2407.07053)|**[link](https://github.com/zwq2018/multi-modal-self-instruct)**|**Although current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images (such as charts, maps, or layouts) and their visual reasoning capabilities remain quite rudimentary. They often struggle with everyday tasks such as reading the time from a clock, understanding a flowchart, or planning a route using a road map. In light of this, we design a multimodal self-instruct pipeline that uses large language models and their coding capabilities to synthesize large numbers of abstract images and visual reasoning instructions across daily scenarios. With this approach we readily build a multimodal benchmark of 11,193 instructions covering eight visual scenarios: charts, tables, simulated maps, dashboards, flowcharts, relation graphs, floor plans, and visual puzzles. This benchmark, constructed from simple lines and geometric elements, exposes the limitations of the most advanced LMMs, such as Claude-3.5-Sonnet and GPT-4o, in abstract image understanding, spatial relation reasoning, and visual element recognition. In addition, to verify the quality of the synthetic data, we fine-tune an LMM on 62,476 synthetic chart, table, and road map instructions. The results show improved chart understanding and map navigation performance and suggest potential benefits for other visual reasoning tasks. Our code is available at \\url{https://github.com/zwq2018/Multi-modal-Self-instruct}.**|\n", 
"2407.07019": "|**2024-07-09**|**Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies**|Inwon Kang 
et.al.|[2407.07019](http://arxiv.org/abs/2407.07019)|null|We explore using large language models (LLMs) to generate application code that automates health insurance processes from text-based policies, targeting blockchain-based smart contracts. Smart contracts are chosen because they offer immutability, verifiability, scalability, and a trustless setting. Our methodology generates outputs at increasing levels of technical detail: (1) textual summaries, (2) declarative decision logic, and (3) smart contract code with unit tests. We ascertain that LLMs are good at task (1), and that the structured output is useful for validating tasks (2) and (3). Declarative languages are often used to formalize healthcare policies, but executing them on a blockchain is non-trivial, so task (3) attempts to automate the process directly through smart contracts. We propose completeness, soundness, clarity, syntax, and functioning code as evaluation metrics. Our evaluation uses three health insurance policy scenarios of differing difficulty from Medicare's official booklet, with GPT-3.5 Turbo, GPT-3.5 Turbo 16K, GPT-4, GPT-4 Turbo, and CodeLLaMA. The results show that LLMs perform well at generating textual summaries. Although outputs from tasks (2) and (3) are useful starting points, they still require human oversight: in some cases even \u201crunnable\u201d code produces incorrect results; the popularity of the target language affects output quality; and more complex scenarios remain a major challenge. Nevertheless, our experiments demonstrate the promise of LLMs for translating textual process descriptions into smart contracts.|\n", 
"2407.07018": "|**2024-07-09**|**End-To-End Causal Effect Estimation from Unstructured Natural Language Data**|Nikita Dhawan 
et.al.|[2407.07018](http://arxiv.org/abs/2407.07018)|null|Knowing the effect of an intervention is critical for human decision-making, but current approaches to causal effect estimation rely on manual data collection and structuring, which increases both the cost and the time-to-completion of studies. We show how large, diverse observational text data can be mined with large language models (LLMs) to produce inexpensive causal effect estimates under appropriate causal assumptions. We introduce NATURAL, a novel family of causal effect estimators built with LLMs that operate on unstructured text datasets. Our estimators use LLM conditional distributions (over variables of interest, given the text data) to assist in computing classical estimators of causal effect. We overcome a number of technical challenges, such as automating data curation and using LLMs to impute missing information. We prepare six observational datasets (two synthetic and four real), paired with ground truth in the form of randomized trials, and use them to systematically evaluate each step of our pipeline. NATURAL estimators perform remarkably well, yielding causal effect estimates within 3 percentage points of their ground-truth counterparts, including on real-world Phase 3 and Phase 4 clinical trials. These results suggest that unstructured text data is a rich source of causal effect information, and that NATURAL is a first step toward an automated pipeline for tapping this resource.|\n", 
"2407.07890": "|**2024-07-10**|**Training on the Test Task Confounds Evaluation and Emergence**|Ricardo Dominguez-Olmedo 
et.al.|[2407.07890](http://arxiv.org/abs/2407.07890)|**[link](https://github.com/socialfoundations/training-on-the-test-task)**|**We study a fundamental problem in the evaluation of large language models that we call training on the test task. Unlike wrongful practices such as data leakage or contamination, training on the test task is not malpractice; the term describes a growing set of techniques for including task-relevant data in the pretraining stage. We find that training on the test task confounds both relative model evaluations and claims about emergent capabilities. We argue that the seeming superiority of one model family over another may be explained by a different degree of training on the test task. To this end, we propose an effective method to adjust for it: fine-tuning each model under comparison on the same task-relevant data before evaluation. The results show that instances of emergent behavior largely vanish once we adjust for training on the test task. This also applies to reported instances of emergent behavior that cannot be explained by the choice of evaluation metric. Our work promotes a new perspective on the evaluation of large language models, with broad implications for benchmarking and the study of emergent capabilities.**|\n", 
"2407.07880": "|**2024-07-10**|**Towards Robust Alignment of Language Models: 
Distributionally Robustifying Direct Preference Optimization**|Junkang Wu et.al.|[2407.07880](http://arxiv.org/abs/2407.07880)|**[link](https://github.com/junkangwu/dr_dpo)**|**This study addresses the challenge that noise in training datasets poses for Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human preferences. We categorize noise into pointwise noise, which involves low-quality data points, and pairwise noise, which corrupts preference rankings. Using Distributionally Robust Optimization (DRO), we enhance DPO's resilience to both types of noise. Our theoretical analysis reveals that DPO inherently embeds DRO principles, giving it robustness to pointwise noise, with the regularization coefficient $\\beta$ playing a critical role in its noise resistance. Building on this, we introduce Distributionally Robustifying DPO (Dr. DPO), which adds pairwise robustness by optimizing against worst-case pairwise scenarios. The new hyperparameter $\\beta'$ in Dr. DPO allows fine-grained control over the reliability of data pairs, balancing exploration and exploitation in noisy training environments. Empirical evaluations show that Dr. DPO substantially improves the quality of generated text and response accuracy in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.**|\n", 
"2407.07858": "|**2024-07-10**|**FACTS About Building Retrieval Augmented Generation-based Chatbots**|Rama Akkiraju et.al.|[2407.07858](http://arxiv.org/abs/2407.07858)|null|Enterprise chatbots powered by generative AI are emerging as key tools for improving employee productivity. Retrieval-augmented generation (RAG), large language models (LLMs), and orchestration frameworks such as Langchain and Llamaindex play an important role in building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. Drawing on our experience with three NVIDIA chatbots (for IT/HR benefits, financial earnings, and general content), we present a framework for building RAG-based chatbots: FACTS (Freshness, Architectures, Cost, Testing, Security). Our contributions are three-fold: we introduce the FACTS framework, list fifteen RAG pipeline control points, and provide empirical results on the accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper to offer a holistic view of the methods and solutions involved in building secure enterprise-grade chatbots.|\n", 
"2407.07852": "|**2024-07-10**|**OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training**|Sami Jaghouar et.al.|[2407.07852](http://arxiv.org/abs/2407.07852)|**[link](https://github.com/PrimeIntellect-ai/OpenDiLoCo)**|**OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments within a scalable, decentralized training framework built on the Hivemind library. We train a model across two continents and three countries while maintaining 90-95% compute utilization. We also study the algorithm's compute efficiency and its scalability in the number of workers, and show that its gradients can be all-reduced in FP16 without any performance degradation. Finally, we scale OpenDiLoCo to three times the size of the original work, demonstrating its effectiveness for billion-parameter models.**|\n", 
"2407.07845": "|**2024-07-10**|**Natural Language Mechanisms via Self-Resolution with Foundation Models**|Nicolas Della Penna et.al.|[2407.07845](http://arxiv.org/abs/2407.07845)|null|In practice, mechanisms often restrict agent reports to constrained formats such as trades or orderings, which may limit the information agents can express. We propose a novel class of mechanisms that elicit agent reports in natural language and use the capabilities of large language models (LLMs) to select outcomes and assign payoffs. We identify conditions under which these mechanisms are incentive-compatible and efficient: the LLM must be a good enough world model, and a strong inter-agent information over-determination condition must hold. Experiments show that these LLM-based mechanisms can successfully aggregate information on signal structures where traditional prediction markets fail.|\n", 
"2407.07810": "|**2024-07-10**|**Transformer Alignment in Large Language Models**|Murdock Aubry 
et.al.|[2407.07810](http://arxiv.org/abs/2407.07810)|null|Large language models (LLMs) have made significant strides in natural language processing, and a precise understanding of their internal mechanisms is essential. We view LLMs as discrete, coupled, nonlinear dynamical systems in high dimensions. We trace the trajectories of individual tokens as they pass through Transformer blocks and linearize the system along these trajectories using Jacobian matrices. In our analysis of 38 openly available LLMs, we observe alignment between the top left and right singular vectors of the residual Jacobians, as well as the emergence of linearity and layer-wise exponential growth. Notably, we find that increased alignment correlates positively with model performance. Metrics evaluated after training show significant improvement over measurements made with randomly initialized weights, highlighting the substantial effect of training on Transformer architectures. These findings reveal a previously underappreciated regularity, reinforce the dynamical interpretation, and pave the way for deeper understanding and optimization of LLM architectures.|\n", 
"2407.07799": "|**2024-07-10**|**Attribute or Abstain: Large Language Models as Long Document Assistants**|Jan Buchmann 
et.al.|[2407.07799](http://arxiv.org/abs/2407.07799)|**[link](https://github.com/ukplab/arxiv2024-attribute-or-abstain)**|**## \u80cc\u666f \u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u8f85\u52a9\u5904\u7406\u957f\u7bc7\u6587\u6863\uff0c\u4f46\u5b83\u4eec\u4e5f\u5b58\u5728\u80e1\u8a00\u4e71\u8bed\u7684\u95ee\u9898\u3002\u589e\u52a0\u53ef\u4fe1\u5ea6\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u63d0\u4f9b\u8bc1\u636e\u652f\u6301\u54cd\u5e94\uff0c\u63d0\u9ad8\u53ef\u9a8c\u8bc1\u6027\u3002\u5f53\u524d\u7684\u5f52\u56e0\u65b9\u6cd5\u4ec5\u5728\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u73af\u5883\u4e2d\u8bc4\u4f30\u8fc7\uff0c\u8fd9\u4e0e\u65e0\u9700\u68c0\u7d22\u7684\u957f\u6587\u6863\u573a\u666f\u4e0d\u540c\uff0c\u53ef\u80fd\u4ecd\u6709\u5e94\u7528\u4ef7\u503c\u3002\u56e0\u6b64\uff0c\u7f3a\u4e4f\u9488\u5bf9\u957f\u6587\u6863\u7684\u5f52\u56e0\u4e13\u95e8\u8bc4\u4f30\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faLAB\uff0c\u4e00\u4e2a\u5305\u542b6\u4e2a\u591a\u6837\u5316\u7684\u957f\u6587\u6863\u4efb\u52a1\u7684\u57fa\u51c6\uff0c\u5e76\u5728\u56db\u79cd\u4e0d\u540c\u5927\u5c0f\u7684LLM\uff08\u5373\u63d0\u793a\u548c\u5fae\u8c03\uff09\u4e0a\u8bd5\u9a8c\u4e86\u4e0d\u540c\u7684\u5f52\u56e0\u65b9\u6cd5\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u4e00\u6b65\u751f\u6210\u5f15\u7528\uff08citation\uff0c\u5373\u540c\u65f6\u8fdb\u884c\u54cd\u5e94\u751f\u6210\u548c\u8bc1\u636e\u63d0\u53d6\uff09\u7684\u8868\u73b0\u6700\u4f73\u3002\u6211\u4eec\u8fd8\u63a2\u7a76\u4e86\u201c\u8ff7\u5931\u5728\u4e2d\u95f4\u201d\u73b0\u8c61\u662f\u5426\u9002\u7528\u4e8e\u5f52\u56e0\uff0c\u4f46\u672a\u53d1\u73b0\u8fd9\u79cd\u60c5\u51b5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u8bc1\u636e\u8d28\u91cf\u5728\u7b80\u5355\u54cd\u5e94\u7684\u573a\u666f\u4e0b\u53ef\u4ee5\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\uff0c\u4f46\u5bf9\u4e8e\u590d\u6742\u54cd\u5e94\u5219\u4e0d\u7136\uff0c\u56e0\u4e3a\u6a21\u578b\u5728\u4e3a\u590d\u6742\u4e3b\u5f20\u63d0\u4f9b\u8bc1\u636e\u65f6\u9762\u4e34\u
6311\u6218\u3002\u6211\u4eec\u516c\u5f00\u4e86\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002**|\n", "2407.07796": "|**2024-07-11**|**Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard**|Oguzhan Topsakal et.al.|[2407.07796](http://arxiv.org/abs/2407.07796)|**[link](https://github.com/research-outcome/llm-game-benchmark)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u4e14\u53ef\u6269\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u7f51\u683c\u578b\u6e38\u620f\u5982\u4e95\u5b57\u68cb\u3001\u8fde\u63a5\u56db\u548c\u56f4\u68cb\u8fdb\u884c\u3002\u5f00\u6e90\u7684\u6e38\u620f\u6a21\u62df\u4ee3\u7801\u5728GitHub\u4e0a\u63d0\u4f9b\uff0c\u5141\u8bb8LLMs\u7ade\u6280\uff0c\u5e76\u751f\u6210JSON\u3001CSV\u3001TXT\u548cPNG\u683c\u5f0f\u7684\u8be6\u7ec6\u6570\u636e\u6587\u4ef6\uff0c\u7528\u4e8e\u6392\u884c\u699c\u6392\u540d\u548c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5305\u62ecAnthropic\u7684Claude 3.5 Sonnet\u548cClaude 3 Sonnet\uff0cGoogle\u7684Gemini 1.5 Pro\u548cGemini 1.5 Flash\uff0cOpenAI\u7684GPT-4 
Turbo\u548cGPT-4o\uff0c\u4ee5\u53caMeta\u7684Llama3-70B\u5728\u5185\u7684\u9886\u5148LLM\u4e4b\u95f4\u7684\u6bd4\u8d5b\u7ed3\u679c\u3002\u6211\u4eec\u9f13\u52b1\u5176\u4ed6LLM\u63d0\u4ea4\u7ed3\u679c\u3002\u603b\u5171\u8fdb\u884c\u4e862,310\u573a\u6a21\u62df\u6bd4\u8d5b\uff08\u6bcf\u5bf9\u6a21\u578b\u8fdb\u884c5\u8f6e\uff0c\u51717\u4e2a\u6a21\u578b\u95f4\u7684\u5bf9\u5c40\uff0c\u4ee5\u53ca\u4e0e\u968f\u673a\u73a9\u5bb6\u7684\u6bd4\u8d5b\uff09\uff0c\u6db5\u76d6\u4e09\u79cd\u7c7b\u578b\u7684\u6e38\u620f\uff0c\u4f7f\u7528\u4e86\u5217\u8868\u3001\u63d2\u56fe\u548c\u56fe\u50cf\u4e09\u79cd\u63d0\u793a\u65b9\u5f0f\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM\u5728\u4e0d\u540c\u6e38\u620f\u548c\u63d0\u793a\u7c7b\u578b\u4e0b\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff0c\u5206\u6790\u5185\u5bb9\u5305\u62ec\u80dc\u7387\u3001\u9519\u5931\u673a\u4f1a\u548c\u65e0\u6548\u52a8\u4f5c\u3002\u6392\u884c\u699c\u548c\u7ed3\u679c\u77e9\u9635\u7684\u8be6\u7ec6\u6570\u636e\u4f5c\u4e3a\u5f00\u653e\u8bbf\u95ee\u6570\u636e\u5728GitHub\u4e0a\u63d0\u4f9b\u3002\u8fd9\u9879\u7814\u7a76\u52a0\u6df1\u4e86\u6211\u4eec\u5bf9LLM\u5728\u672a\u4e13\u95e8\u8bad\u7ec3\u7684\u6e38\u620f\u4e2d\u7684\u80fd\u529b\u7684\u7406\u89e3\uff0c\u6709\u52a9\u4e8e\u8bc4\u4f30\u5b83\u4eec\u7684\u89c4\u5219\u7406\u89e3\u80fd\u529b\u548c\u6218\u7565\u601d\u7ef4\u3002\u5728\u901a\u5411\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\u7684\u9053\u8def\u4e0a\uff0c\u8fd9\u9879\u7814\u7a76\u4e3a\u672a\u6765\u63a2\u7d22\u5b83\u4eec\u5728\u590d\u6742\u51b3\u7b56\u573a\u666f\u4e2d\u7684\u5b9e\u7528\u6027\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u7684\u6218\u7565\u601d\u8003\u80fd\u529b\uff0c\u5e76\u4e3a\u6df1\u5165\u63a2\u7a76LLM\u5728\u57fa\u4e8e\u6e38\u620f\u6846\u67b6\u5185\u7684\u5c40\u9650\u6027\u63d0\u4f9b\u4e86\u65b9\u5411\u3002**|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u8fc5\u901f\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u534f\u4f5c\u95ee\u9898\u89e3\u51b3\u548c\u81ea\u4e3b\u8c08\u5224\u7b49\u9886\u57df\u7684\u51fa\u8272\u6027\u80fd\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u5b89\u5168\u95ee\u9898\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u77e5\u8bc6\u64cd\u7eb5\u4f20\u64ad\u65b9\u9762\u3002\u672c\u6587\u901a\u8fc7\u6784\u5efa\u8be6\u7ec6\u7684\u5a01\u80c1\u6a21\u578b\u548c\u6a21\u62df\u73af\u5883\uff0c\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u5728\u53ef\u4fe1\u5e73\u53f0\u4e0a\uff0c\u63a2\u8ba8\u8fd9\u4e00\u5173\u952e\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u653b\u51fb\u65b9\u6cd5\uff0c\u5305\u62ec\u8bf4\u670d\u6027\u6ce8\u5165\u548c\u64cd\u7eb5\u77e5\u8bc6\u6ce8\u5165\uff0c\u6765\u7cfb\u7edf\u5730\u63a2\u7a76\u5728\u65e0\u660e\u786e\u63d0\u793a\u64cd\u7eb5\u7684\u60c5\u51b5\u4e0b\uff0c\u5982\u4f55\u6f5c\u5728\u5730\u4f20\u64ad\u64cd\u7eb5\u77e5\u8bc6\uff08\u5982\u865a\u6784\u548c\u6709\u5bb3\u77e5\u8bc6\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u4e86LLMs\u5904\u7406\u4e16\u754c\u77e5\u8bc6\u56fa\u6709\u7684\u6f0f\u6d1e\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u501f\u6b64\u65e0\u610f\u8bc6\u5730\u4f20\u64ad\u7f16\u9020\u7684\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u8bf1\u5bfc\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u4ea4\u6d41\u4e2d\u4f20\u64ad\u8fd9\u4e24\u79cd\u64cd\u7eb5\u7684\u77e5\u8bc6\uff0c\u540c\u65f6\u4e0d\u4f1a\u663e\u8457\u964d\u4f4e\u5b83\u4eec\u7684\u57fa\u7840\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u53d1\u73b0\u8fd9\u4e9b\u64cd\u7eb5\u4f1a\u6301\u7eed\u5b58\u5728\u4e8e\u6d41\u884c\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u6846\u67b6\u4e2d\uff0c\u5373\u4f7f\u4ea4\u4e92\u7ed3\u675f\uff0c\u82e5\u5e72\u826f\u6027\u4ee3\u7406\u4e5f\u53ef\u80fd\u7ee7\u7eed\u53d7\u5230\u64cd\u7eb5\u804a\u5929\u8bb0\u5f55\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u5b89\u5168\u98ce\u9669\uff0c\u5f3a\u8c03\u4e86\u5bf9\u64cd\u7eb5\u77e5\u8bc6\u4f20\u64ad\u8fdb\u884c\u5f3a\u5927\u9632\u5fa1\u7684\u8feb\u5207\u9700\u6c42\uff0c\u6bd4\u5982\u5f15\u5165\u201c\u5b88\u62a4\u201d\u4ee3\u7406\u548c\u5148\u8fdb\u7684\u4e8b\u5b9e\u6838\u67e5\u5de5\u5177\u3002**|\n", "2407.07778": "|**2024-07-10**|**WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment**|Jiefu Ou et.al.|[2407.07778](http://arxiv.org/abs/2407.07778)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u7269\u7406\u73af\u5883\u4e2d\u90e8\u7f72\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4ee3\u7406\u65f6\u6240\u9700\u7684\u57fa\u672c\u64cd\u4f5c\uff08API\uff09\u6570\u91cf\u548c\u8bbe\u8ba1\u95ee\u9898\u3002\u7814\u7a76\u8005\u8bbe\u60f3\uff0c\u5982\u679cwikiHow\u6559\u7a0b\u6db5\u76d6\u4e86\u5e7f\u6cdb\u7684\u7528\u6237\u81ea\u7f16\u4efb\u52a1\uff0c\u90a3\u4e48\u8fd9\u4e9b\u4efb\u52a1\u6240\u9700\u7684API\u8303\u56f4\u662f\u4ec0\u4e48\u3002\u4ed6\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06wikiHow\u6307\u4ee4\u4e0e\u7f6e\u8eab\u4e8e\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u7b56\u7565\u5173\u8054\uff0c\u8fed\u4ee3\u5730\u751f\u6210\u65b0\u7684API\u3002\u501f\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f53\u611f\u89c4\u5212\u65b9\u9762\u7684\u6700\u65b0\u6210\u5c31\uff0c\u7814\u7a76\u8005\u63d0\u8bae\u4f7f\u7528\u5c11\u91cf\u6837\u4f8b\u63d0\u793aGPT-4\u751f\u6210Python\u4ee3\u7801\u4f5c\u4e3a\u4ee3\u7406\u7b56\u7565\uff0c\u5e76\u901a\u8fc7\u4ee5\u4e0b\u6b65\u9aa4\u6269\u5c55API\u5e93\uff1a1\uff09\u91cd\u
7528\u521d\u59cbAPI\u96c6\uff1b2\uff09\u5728\u5fc5\u8981\u65f6\u521b\u5efa\u65b0\u7684API\u8c03\u7528\u3002\u5b9e\u9a8c\u5173\u6ce8\u7684\u662f\u5b9a\u4e49API\uff0c\u800c\u975e\u5176\u5b9e\u73b0\u6027\u3002\u5728\u4e00\u5c0f\u90e8\u5206wikiHow\u6559\u7a0b\u4e0a\u5e94\u7528\u8be5\u65b9\u6cd5\u540e\uff0c\u53d1\u73b0\u9700\u8981300\u591a\u4e2aAPI\u6765\u6355\u6349\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u6837\u4efb\u52a1\u3002\u81ea\u52a8\u548c\u4eba\u5de5\u5206\u6790\u663e\u793a\uff0c\u63d0\u51fa\u7684\u7ba1\u9053\u80fd\u6709\u6548\u590d\u7528\u548c\u521b\u9020API\u3002\u8fdb\u4e00\u6b65\u7684\u4eba\u5de5\u5ba1\u67e5\u53d1\u73b0\uff0c\u73b0\u6709\u7684\u6a21\u62df\u5668\u4ec5\u652f\u6301\u8bf1\u5bfc\u51fa\u7684API\u7684\u4e00\u5c0f\u90e8\u5206\uff08\u524d50\u4e2a\u5e38\u7528API\u4e2d\u76849\u4e2a\uff09\uff0c\u8fd9\u4fc3\u4f7f\u5f00\u53d1\u66f4\u4e30\u5bcc\u7684\u4f53\u611f\u73af\u5883\u3002|\n", "2407.08739": "|**2024-07-11**|**MAVIS: Mathematical Visual Instruction Tuning**|Renrui Zhang et.al.|[2407.08739](http://arxiv.org/abs/2407.08739)|**[link](https://github.com/zrrskywalker/mavis)**|**### \u80cc\u666f 
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8fd1\u5e74\u6765\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u591a\u6a21\u6001\u573a\u666f\u4e2d\u7684\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5bf9\u6570\u5b66\u56fe\u89e3\u7684\u6570\u5b66\u95ee\u9898\u6c42\u89e3\u80fd\u529b\u7814\u7a76\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6307\u51fa\u4e86MLLM\u5728\u6570\u5b66\u89c6\u89c9\u9886\u57df\u7684\u4e09\u4e2a\u5173\u952e\u6539\u8fdb\u9886\u57df\uff1a\u6570\u5b66\u56fe\u89e3\u7684\u89c6\u89c9\u7f16\u7801\u3001\u56fe\u89e3\u4e0e\u8bed\u8a00\u7684\u5bf9\u9f50\u4ee5\u53ca\u6570\u5b66\u63a8\u7406\u6280\u80fd\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u9700\u8981\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u7684\u89c6\u89c9\u6570\u5b66\u6570\u636e\u548c\u8bad\u7ec3\u6d41\u7a0b\u3002\u672c\u6587\u63d0\u51faMAVIS\uff08Mathematical VISual instruction tuning for MLLMs\uff09\uff0c\u4e00\u4e2a\u9488\u5bf9MLLM\u7684\u6570\u5b66\u89c6\u89c9\u6307\u5bfc\u8c03\u53c2\u8303\u5f0f\uff0c\u5305\u62ec\u4e00\u7cfb\u5217\u6570\u5b66\u89c6\u89c9\u6570\u636e\u96c6\u548c\u4e13\u95e8\u7684MLLM\u3002 ### \u65b9\u6cd5 
MAVIS\u5206\u4e3a\u4e09\u4e2a\u9636\u6bb5\u8fdb\u884c\u4ece\u5934\u5f00\u59cb\u7684\u8bad\u7ec3\u3002\u9996\u5148\uff0c\u6211\u4eec\u521b\u5efa\u4e86MAVIS-Caption\uff0c\u5305\u542b558,000\u4e2a\u56fe\u89e3-\u63cf\u8ff0\u5bf9\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u5fae\u8c03\u4e13\u4e3a\u6570\u5b66\u8bbe\u8ba1\u7684\u89c6\u89c9\u7f16\u7801\u5668\uff08CLIP-Math\uff09\uff0c\u4ee5\u63d0\u5347\u56fe\u89e3\u7684\u89c6\u89c9\u7406\u89e3\u80fd\u529b\u3002\u5176\u6b21\uff0c\u5229\u7528MAVIS-Caption\uff0c\u6211\u4eec\u901a\u8fc7\u6295\u5f71\u5c42\u5c06CLIP-Math\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5173\u8054\uff0c\u589e\u5f3a\u6570\u5b66\u9886\u57df\u7684\u89c6\u89c9\u8bed\u8a00\u5bf9\u9f50\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f15\u5165MAVIS-Instruct\uff0c\u5305\u542b900,000\u4e2a\u7cbe\u5fc3\u6536\u96c6\u548c\u6807\u6ce8\u7684\u89c6\u89c9\u6570\u5b66\u95ee\u9898\uff0c\u7528\u4e8e\u6700\u7ec8\u6307\u5bfc\u8c03\u53c2\uff0c\u4ee5\u589e\u5f3aMLLM\u7684\u7a33\u5065\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u5728MAVIS-Instruct\u4e2d\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u6bcf\u4e2a\u95ee\u9898\u7684\u5b8c\u6574\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u7406\u7531\uff0c\u5e76\u51cf\u5c11\u6587\u672c\u5197\u4f59\uff0c\u4f7f\u6a21\u578b\u66f4\u4e13\u6ce8\u4e8e\u89c6\u89c9\u5143\u7d20\u3002 ### \u7ed3\u679c \u6570\u636e\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/ZrrSkywalker/MAVIS\u3002\u901a\u8fc7MAVIS\uff0c\u6211\u4eec\u65e8\u5728\u586b\u8865\u6570\u5b66\u89c6\u89c9\u7406\u89e3\u7684\u7a7a\u767d\uff0c\u63d0\u5347MLLM\u5728\u89e3\u51b3\u5b9e\u9645\u6570\u5b66\u95ee\u9898\u65f6\u7684\u8868\u73b0\u3002**|\n", "2407.08735": "|**2024-07-11**|**Real-Time Anomaly Detection and Reactive Planning with Large Language Models**|Rohan Sinha 
et.al.|[2407.08735](http://arxiv.org/abs/2407.08735)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u5728\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u68c0\u6d4b\u548c\u5e94\u5bf9\u5f02\u5e38\u60c5\u51b5\uff0c\u4ee5\u63d0\u9ad8\u5176\u9c81\u68d2\u6027\u548c\u5b89\u5168\u6027\u3002\u4e3b\u8981\u6311\u6218\u5305\u62ec\u51cf\u5c11\u6a21\u578b\u7684\u8ba1\u7b97\u5f00\u9500\u4ee5\u4fbf\u5b9e\u73b0\u5b9e\u65f6\u5e94\u7528\uff0c\u4ee5\u53ca\u5c06\u6a21\u578b\u7684\u5224\u65ad\u878d\u5165\u5230\u5b89\u5168\u63a7\u5236\u6846\u67b6\u4e2d\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u63a8\u7406\u6846\u67b6\uff1a\u9996\u5148\u662f\u4e00\u4e2a\u5feb\u901f\u7684\u4e8c\u5143\u5f02\u5e38\u5206\u7c7b\u5668\uff0c\u5b83\u5728\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7a7a\u95f4\u4e2d\u5206\u6790\u89c2\u6d4b\u6570\u636e\uff0c\u5982\u679c\u53d1\u73b0\u5f02\u5e38\uff0c\u4f1a\u89e6\u53d1\u540e\u7eed\u7684\u6162\u901f\u63a8\u7406\u9636\u6bb5\uff0c\u5229\u7528\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6df1\u5165\u7684\u903b\u8f91\u63a8\u7406\u3002\u8fd9\u79cd\u8bbe\u8ba1\u7c7b\u4f3c\u4e8e\u6a21\u578b\u9884\u6d4b\u63a7\u5236\u4e2d\u7684\u51b3\u7b56\u5206\u652f\uff0c\u8003\u8651\u5230\u6162\u901f\u63a8\u7406\u5668\u7684\u5ef6\u8fdf\uff0c\u53ef\u4ee5\u7acb\u5373\u91c7\u53d6\u5907\u4efd\u8ba1\u5212\uff0c\u786e\u4fdd\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002 
\u901a\u8fc7\u4e0e\u6700\u5148\u8fdb\u7684GPT\u6a21\u578b\u7684\u81ea\u56de\u5f52\u63a8\u7406\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u7814\u7a76\u53d1\u73b0\uff0c\u5373\u4f7f\u4f7f\u7528\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ed6\u4eec\u7684\u5feb\u901f\u5f02\u5e38\u5206\u7c7b\u5668\u4e5f\u8868\u73b0\u51fa\u8272\u3002\u8fd9\u4f7f\u5f97\u4ed6\u4eec\u5f00\u53d1\u7684\u8fd0\u884c\u65f6\u76d1\u63a7\u5668\u80fd\u591f\u5728\u8d44\u6e90\u548c\u65f6\u95f4\u9650\u5236\u4e0b\uff0c\u63d0\u5347\u52a8\u6001\u673a\u5668\u4eba\u7cfb\u7edf\uff0c\u5982\u56db\u65cb\u7ffc\u65e0\u4eba\u673a\u6216\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u4fe1\u4efb\u5ea6\u3002\u8bba\u6587\u7684\u89c6\u9891\u793a\u4f8b\u53ef\u4ee5\u5728\u9879\u76ee\u9875\u9762\u4e0a\u67e5\u770b\uff1ahttps://sites.google.com/view/aesop-llm\u3002|\n", "2407.08733": "|**2024-07-11**|**Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist**|Zihao Zhou et.al.|[2407.08733](http://arxiv.org/abs/2407.08733)|null|
\u5f3a\u5927\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5353\u8d8a\u6027\u80fd\u7684\u5173\u952e\u4f53\u73b0\u3002\u5982\u4f55\u5b9a\u4e49\u548c\u5168\u9762\u8bc4\u4f30LLMs\u7684\u6570\u5b66\u80fd\u529b\uff0c\u4ee5\u53ca\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u53cd\u6620\u7528\u6237\u4f53\u9a8c\uff0c\u5df2\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u76ee\u524d\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e3b\u8981\u4fa7\u91cd\u4e8e\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u8fc7\u62df\u5408\uff0c\u5e76\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u771f\u6b63\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u5982\u679c\u6a21\u578b\u771f\u6b63\u7406\u89e3\u4e86\u95ee\u9898\uff0c\u5b83\u5e94\u8be5\u80fd\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u7a33\u5065\u4e14\u7075\u6d3b\u5730\u5e94\u7528\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u63d0\u51faMATHCHECK\uff0c\u4e00\u4e2a\u65e8\u5728\u6d4b\u8bd5\u4efb\u52a1\u6cdb\u5316\u548c\u63a8\u7406\u9c81\u68d2\u6027\u7684\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6e05\u5355\uff0c\u4ee5\u53ca\u4e00\u4e2a\u81ea\u52a8\u751f\u6210\u6e05\u5355\u7684\u5de5\u5177\u3002MATHCHECK\u5305\u542b\u591a\u4e2a\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u548c\u6d4b\u8bd5\u7c7b\u578b\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u6570\u5b66\u63a8\u7406\u80fd\u529b\u548c\u884c\u4e3a\u6d4b\u8bd5\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6211\u4eec\u5229\u7528MATHCHECK\u521b\u5efa\u4e86MATHCHECK-GSM\u548cMATHCHECK-GEO\uff0c\u5206\u522b\u9488\u5bf9\u6570\u5b66\u6587\u672c\u63a8\u7406\u548c\u591a\u6a21\u6001\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u8bc4\u4f30\uff0c\u5b83\u4eec\u662fGSM8k\u3001GeoQA\u3001UniGeo\u548cGeometry3K\u7b49\u57fa\u51c6\u7684\u5347\u7ea7\u7248\u3002\u6211\u4eec\u4f7f\u7528MATHCHECK-GSM\u548cMATHCHECK-GEO\u5bf9\u8d85\u8fc720\u79cdLLM\u548c11\u79cd\u591a\u6a21\u6001LLMs\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u68c0\u9a8c\u5b83\u4eec\u7684\u7efc\u5408\u6570\u5b66\u63
a8\u7406\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u6a21\u578b\u5982GPT-4\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u4ed6\u6a21\u578b\u5bb6\u65cf\u5728\u6e05\u5355\u4e0a\u7684\u8868\u73b0\u663e\u8457\u4e0b\u964d\u3002\u8fdb\u4e00\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u4f20\u7edf\u6570\u5b66\u57fa\u51c6\u76f8\u6bd4\uff0cMATHCHECK\u66f4\u597d\u5730\u53cd\u6620\u4e86\u771f\u6b63\u7684\u6570\u5b66\u80fd\u529b\uff0c\u7ebf\u6027\u5ea6\u66f4\u9ad8\uff0c\u4ece\u800c\u652f\u6301\u6211\u4eec\u7684\u8bbe\u8ba1\u3002\u901a\u8fc7MATHCHECK\uff0c\u6211\u4eec\u53ef\u4ee5\u8f7b\u677e\u8fdb\u884c\u8be6\u7ec6\u7684\u884c\u4e3a\u5206\u6790\uff0c\u6df1\u5165\u63a2\u7a76\u6a21\u578b\u3002|\n", "2407.08716": "|**2024-07-11**|**A Taxonomy for Data Contamination in Large Language Models**|Medha Palavalli et.al.|[2407.08716](http://arxiv.org/abs/2407.08716)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u57fa\u4e8e\u5e7f\u6cdb\u7f51\u7edc\u8bed\u6599\u5e93\u7684\u9884\u8bad\u7ec3\u540e\uff0c\u5728\u4f17\u591a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u6570\u636e\u6c61\u67d3\u95ee\u9898\u65e5\u76ca\u5f15\u8d77\u5173\u6ce8\uff0c\u5373\u8bc4\u4f30\u6570\u636e\u53ef\u80fd\u5b58\u5728\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\uff0c\u5bfc\u81f4\u6a21\u578b\u8868\u73b0\u865a\u9ad8\u3002\u53bb\u6c61\u67d3\uff08decontamination\uff09\u4f5c\u4e3a\u4e00\u79cd\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u8bd5\u56fe\u68c0\u6d4b\u5e76\u79fb\u9664\u8fd9\u4e9b\u6c61\u67d3\u6570\u636e\u3002\u7136\u800c\uff0c\u6c61\u67d3\u6570\u636e\u53ef\u80fd\u6e90\u4e8e\u6d4b\u8bd5\u96c6\u7684\u4fee\u6539\u7248\u672c\uff0c\u8fd9\u4f7f\u5f97\u68c0\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u76ee\u524d\u5c1a\u4e0d\u6e05\u695a\u4e0d\u540c\u7c7b\u578b\u7684\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u8bed\u8a00\u6a21\u578b\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5206\u7c7b\u4f53\u7cfb\uff
0c\u5bf9\u8bed\u8a00\u6a21\u578b\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u9047\u5230\u7684\u5404\u79cd\u6c61\u67d3\u7c7b\u578b\u8fdb\u884c\u5212\u5206\uff0c\u5e76\u786e\u5b9a\u4e86\u54ea\u4e9b\u7c7b\u578b\u7684\u98ce\u9669\u6700\u9ad8\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u603b\u7ed3\u548c\u95ee\u7b54\u4e24\u4e2a\u5173\u952e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u7c7b\u578b\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5728\u5b9e\u9645\u8bc4\u4f30\u4e2d\u7684\u8868\u73b0\u3002|\n", "2407.08713": "|**2024-07-11**|**GTA: A Benchmark for General Tool Agents**|Jize Wang et.al.|[2407.08713](http://arxiv.org/abs/2407.08713)|**[link](https://github.com/open-compass/GTA)**|**\u4eba\u4eec\u666e\u904d\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5404\u79cd\u5de5\u5177\u7684\u6574\u5408\uff0c\u4ee5\u5f00\u53d1\u901a\u7528\u4ee3\u7406\uff0c\u4f46\u8fd9\u5bf9LLMs\u7684\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u63d0\u51fa\u4e86\u6311\u6218\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u6cd5\u5b58\u5728\u660e\u663e\u7f3a\u9677\uff0c\u5982\u4f7f\u7528AI\u751f\u6210\u7684\u67e5\u8be2\u3001\u5355\u6b65\u9aa4\u4efb\u52a1\u3001\u6a21\u62df\u5de5\u5177\u4ee5\u53ca\u4ec5\u9650\u6587\u672c\u7684\u4ea4\u4e92\uff0c\u672a\u80fd\u5145\u5206\u5c55\u793a\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u4e2d\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faGTA\uff08\u901a\u7528\u5de5\u5177\u4ee3\u7406\u57fa\u51c6\uff09\uff0c\u5b83\u5305\u542b\u4e09\u4e2a\u5173\u952e\u7279\u6027\uff1a\uff081\uff09\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff1a\u7531\u4eba\u7c7b\u7f16\u5199\uff0c\u5177\u6709\u7b80\u5355\u7684\u73b0\u5b9e\u4e16\u754c\u76ee\u6807\uff0c\u4f46\u9690\u542b\u4e86\u5de5\u5177\u4f7f\u7528\u9700\u6c42\uff0c\u8981\u6c42LLMs\u80fd\u63a8\u7406\u51fa\u5408\u9002\u7684\u5de5\u5177\u5e76\u89c4\u5212\u89e3\u51b3\u65b9\u6848\u6b65\u9aa4\u3002\uff082\uff09\u771f\u5b9e\u90e8\u7f72\u7684\u5de5\u5177\u
ff1a\u4e00\u4e2a\u914d\u5907\u6709\u611f\u77e5\u3001\u64cd\u4f5c\u3001\u903b\u8f91\u548c\u521b\u65b0\u7c7b\u5de5\u5177\u7684\u8bc4\u4f30\u5e73\u53f0\uff0c\u7528\u4e8e\u8bc4\u4f30\u6a21\u578b\u7684\u5b9e\u9645\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002\uff083\uff09\u771f\u5b9e\u7684\u591a\u6a21\u6001\u8f93\u5165\uff1a\u5305\u62ec\u7a7a\u95f4\u573a\u666f\u56fe\u7247\u3001\u7f51\u9875\u622a\u56fe\u3001\u8868\u683c\u3001\u4ee3\u7801\u7247\u6bb5\u548c\u6253\u5370/\u624b\u5199\u6750\u6599\u7b49\uff0c\u4ee5\u8d34\u8fd1\u771f\u5b9e\u4e16\u754c\u7684\u573a\u666f\u3002 \u6211\u4eec\u8bbe\u8ba1\u4e86229\u4e2a\u73b0\u5b9e\u751f\u6d3b\u4efb\u52a1\u548c\u53ef\u6267\u884c\u7684\u5de5\u5177\u94fe\uff0c\u6765\u8bc4\u4f30\u4e3b\u6d41LLMs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff0c\u73b0\u6709\u7684LLMs\u9762\u4e34\u4e25\u5cfb\u6311\u6218\uff0cGPT-4\u5b8c\u6210\u7684\u4efb\u52a1\u4e0d\u8db3\u4e00\u534a\uff0c\u5927\u591a\u6570\u6a21\u578b\u7684\u6210\u7ee9\u4f4e\u4e8e25%\u3002\u8fd9\u4e2a\u8bc4\u4f30\u63ed\u793a\u4e86\u5f53\u524dLLMs\u5728\u5b9e\u9645\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u4e0a\u7684\u74f6\u9888\uff0c\u4e3a\u63d0\u5347\u901a\u7528\u5de5\u5177\u4ee3\u7406\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b9\u5411\u3002GTA\u7684\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2407.08701": "|**2024-07-11**|**Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models**|Zhening Xing 
et.al.|[2407.08701](http://arxiv.org/abs/2407.08701)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56e0\u5176\u5355\u5411\u65f6\u95f4\u6ce8\u610f\u529b\u673a\u5236\uff0c\u5728\u6587\u672c\u548c\u97f3\u9891\u6d41\u6570\u636e\u751f\u6210\u65b9\u9762\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u6548\u679c\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u5bf9\u5b9e\u65f6\u89c6\u9891\u5904\u7406\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4f46\u89c6\u9891\u6d41\u5904\u7406\u7684\u7814\u7a76\u5374\u76f8\u5bf9\u8f83\u5c11\u3002\u73b0\u6709\u7684\u89c6\u9891\u6269\u6563\u6a21\u578b\u4f9d\u8d56\u53cc\u5411\u65f6\u95f4\u6ce8\u610f\u529b\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5904\u7406\u76f4\u64ad\u89c6\u9891\u7684\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faLive2Diff\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u5b9e\u65f6\u89c6\u9891\u7ffb\u8bd1\u8bbe\u8ba1\u7684\u5177\u6709\u5355\u5411\u65f6\u95f4\u6ce8\u610f\u529b\u7684\u89c6\u9891\u6269\u6563\u6a21\u578b\u3002\u4e0e\u5148\u524d\u5de5\u4f5c\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4e0e\u524d\u4e00\u5e27\u53ca\u5176\u5c11\u6570\u9884\u70ed\u5e27\u76f8\u5173\u8054\uff0c\u4fdd\u6301\u4e86\u65f6\u95f4\u4e00\u81f4\u6027\u548c\u5e73\u6ed1\u6027\uff0c\u65e0\u9700\u8003\u8651\u672a\u6765\u5e27\u3002\u540c\u65f6\uff0c\u6211\u4eec\u91c7\u7528\u9ad8\u6548\u7684\u964d\u566a\u65b9\u6848\uff0c\u5305\u62ecKV\u7f13\u5b58\u673a\u5236\u548c\u6d41\u6c34\u7ebf\u5904\u7406\uff0c\u4ee5\u652f\u6301\u4e92\u52a8\u5e27\u7387\u4e0b\u7684\u89c6\u9891\u6d41\u7ffb\u8bd1\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6ce8\u610f\u529b\u673a\u5236\u548c\u6d41\u6c34\u7ebf\u8bbe\u8ba1\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5728\u4fdd\u6301\u65f6\u95f4\u5e73\u6ed1\u6027\u548c/\u6216\u6548\u7387\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.08699": "|**2024-07-11**|**Mitigating Catastrophic Forgetting in Language Transfer via Model Merging**|Anton Alexandrov 
et.al.|[2407.08699](http://arxiv.org/abs/2407.08699)|null|\u968f\u7740\u5f00\u653e\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u82f1\u8bed\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u4e0d\u65ad\u63d0\u5347\uff0c\u7814\u7a76\u4eba\u5458\u6b63\u81f4\u529b\u4e8e\u5c06\u5176\u6269\u5c55\u5230\u5176\u4ed6\u8bed\u8a00\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u8bed\u8a00\u9002\u5e94\u5f80\u5f80\u4f1a\u5bfc\u81f4\u57fa\u7840\u6a21\u578b\u80fd\u529b\u7684\u707e\u96be\u6027\u9057\u5fd8\uff0c\u9650\u5236\u4e86\u6539\u7f16\u540e\u6a21\u578b\u7684\u5b9e\u7528\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u9002\u5e94\u65b9\u6cd5\u2014\u2014Branch-and-Merge\uff08BaM\uff09\uff0c\u5b83\u57fa\u4e8e\u8fed\u4ee3\u5730\u5408\u5e76\u591a\u4e2a\u9488\u5bf9\u90e8\u5206\u8bad\u7ec3\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u3002BaM\u7684\u6838\u5fc3\u7406\u5ff5\u5728\u4e8e\uff0c\u8fd9\u79cd\u65b9\u6cd5\u4ea7\u751f\u7684\u662f\u5e45\u5ea6\u8f83\u5c0f\u4f46\u8d28\u91cf\u66f4\u9ad8\u7684\u6743\u91cd\u8c03\u6574\uff0c\u4ece\u800c\u51cf\u5c11\u5bf9\u6e90\u9886\u57df\u7684\u9057\u5fd8\uff0c\u540c\u65f6\u4fdd\u6301\u5bf9\u76ee\u6807\u9886\u57df\u7684\u5b66\u4e60\u3002 \u6211\u4eec\u5728\u4fdd\u52a0\u5229\u4e9a\u8bed\u548c\u5fb7\u8bed\u7684\u5e7f\u6cdb\u5b9e\u8bc1\u7814\u7a76\u4e2d\u5c55\u793a\u4e86BaM\u7684\u4f18\u52bf\uff1a\u5b83\u80fd\u663e\u8457\u964d\u4f4e\u9057\u5fd8\uff0c\u540c\u65f6\u5728\u4e0d\u540c\u6a21\u578b\u67b6\u6784\u4e0a\u4e0e\u6807\u51c6\u6301\u7eed\u9884\u8bad\u7ec3\u548c\u6307\u4ee4\u5fae\u8c03\u76f8\u6bd4\uff0c\u80fd\u591f\u5339\u914d\u751a\u81f3\u63d0\u5347\u76ee\u6807\u9886\u57df\u7684\u6027\u80fd\u3002|\n", "2407.08694": "|**2024-07-11**|**Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight**|Zhiqiang Xie 
et.al.|[2407.08694](http://arxiv.org/abs/2407.08694)|null|\u5728\u73b0\u4ee3\u4e91\u7cfb\u7edf\u4e2d\uff0c\u8fd0\u884c\u65f6\u6545\u969c\u548c\u6027\u80fd\u4e0b\u964d\u662f\u5e38\u6001\u3002\u5bf9\u4e8e\u4e91\u670d\u52a1\u63d0\u4f9b\u5546\u800c\u8a00\uff0c\u81ea\u52a8\u786e\u5b9a\u95ee\u9898\u7684\u6839\u672c\u539f\u56e0\u662f\u4fdd\u8bc1\u9ad8\u53ef\u9760\u6027\u548c\u53ef\u7528\u6027\u7684\u5173\u952e\uff0c\u56e0\u4e3a\u5feb\u901f\u7684\u6545\u969c\u5b9a\u4f4d\u6709\u52a9\u4e8e\u52a0\u5feb\u8bca\u65ad\u548c\u4f18\u5148\u7ea7\u6392\u5e8f\uff0c\u4ee5\u5b9e\u73b0\u53ca\u65f6\u89e3\u51b3\u3002\u8fd1\u671f\u7684\u7814\u7a76\u4e2d\uff0c\u56e0\u679c\u63a8\u7406\u5229\u7528\u56e0\u679c\u56fe\u6765\u6355\u6349\u4e0d\u540c\u4e91\u7cfb\u7edf\u6027\u80fd\u6307\u6807\u4e4b\u95f4\u7684\u5173\u7cfb\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7136\u800c\uff0c\u7cfb\u7edf\u5f00\u53d1\u8005\u9700\u8981\u7cbe\u786e\u5b9a\u4e49\u7cfb\u7edf\u7684\u56e0\u679c\u56fe\uff0c\u8fd9\u662f\u4e00\u9879\u8017\u65f6\u3001\u8106\u5f31\u4e14\u6311\u6218\u6027\u7684\u5de5\u4f5c\uff0c\u5c24\u5176\u5bf9\u4e8e\u5e9e\u5927\u548c\u52a8\u6001\u7684\u7cfb\u7edf\uff0c\u4e14\u9700\u8981\u6df1\u539a\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\u5728\u4e91\u7cfb\u7edf\u4e2d\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u6545\u969c\u4e8b\u4ef6\u7684\u53d1\u751f\u9891\u7387\u76f8\u5bf9\u8f83\u4f4e\u3002 
\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u89e3\u51b3\u65b9\u6848\u2014\u2014Atlas\uff0c\u5b83\u80fd\u591f\u81ea\u52a8\u5408\u6210\u4e91\u7cfb\u7edf\u7684\u56e0\u679c\u56fe\u3002Atlas\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u7cfb\u7edf\u6587\u6863\u3001\u65e5\u5fd7\u548c\u90e8\u7f72\u53cd\u9988\u751f\u6210\u56e0\u679c\u56fe\u3002Atlas\u4e0e\u6570\u636e\u9a71\u52a8\u7684\u56e0\u679c\u53d1\u73b0\u6280\u672f\u76f8\u8f85\u76f8\u6210\uff0c\u5e76\u901a\u8fc7\u6570\u636e\u9a71\u52a8\u7684\u9a8c\u8bc1\u6b65\u9aa4\u8fdb\u884c\u589e\u5f3a\u3002\u6211\u4eec\u5728\u4e00\u7cfb\u5217\u6545\u969c\u5b9a\u4f4d\u573a\u666f\u4e2d\u8bc4\u4f30\u4e86Atlas\uff0c\u7ed3\u679c\u8868\u660e\uff0cAtlas\u80fd\u591f\u5728\u53ef\u6269\u5c55\u548c\u666e\u9002\u7684\u65b9\u5f0f\u4e0b\u751f\u6210\u56e0\u679c\u56fe\uff0c\u5176\u6027\u80fd\u8fdc\u8d85\u6570\u636e\u9a71\u52a8\u7b97\u6cd5\uff0c\u5e76\u4e0e\u57fa\u51c6\u7ebf\u76f8\u5f53\u3002|\n", "2407.08683": "|**2024-07-11**|**SEED-Story: Multimodal Long Story Generation with Large Language Model**|Shuai Yang 
et.al.|[2407.08683](http://arxiv.org/abs/2407.08683)|**[link](https://github.com/tencentarc/seed-story)**|**\u968f\u7740\u56fe\u50cf\u751f\u6210\u548c\u5f00\u653e\u5f62\u5f0f\u6587\u672c\u751f\u6210\u7684\u663e\u8457\u8fdb\u6b65\uff0c\u4ea4\u9519\u7684\u56fe\u50cf-\u6587\u672c\u5185\u5bb9\u521b\u4f5c\u9886\u57df\u53d8\u5f97\u8d8a\u6765\u8d8a\u6709\u5438\u5f15\u529b\u3002\u591a\u6a21\u6001\u6545\u4e8b\u751f\u6210\uff0c\u5373\u751f\u6210\u53d9\u4e8b\u6587\u672c\u4e0e\u751f\u52a8\u56fe\u50cf\u7684\u4ea4\u9519\u5e8f\u5217\uff0c\u4f5c\u4e3a\u4e00\u79cd\u6709\u4ef7\u503c\u7684\u5b9e\u7528\u4efb\u52a1\uff0c\u56e0\u5176\u5e7f\u6cdb\u7684\u5e94\u7528\u524d\u666f\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e00\u4efb\u52a1\u9762\u4e34\u7740\u7406\u89e3\u6587\u672c\u548c\u56fe\u50cf\u590d\u6742\u4ea4\u4e92\u3001\u751f\u6210\u8fde\u8d2f\u4e14\u76f8\u5173\u6587\u672c\u548c\u89c6\u89c9\u5185\u5bb9\u7684\u6311\u6218\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51faSEED-Story\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u5229\u7528\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u6765\u751f\u6210\u6269\u5c55\u7684\u591a\u6a21\u6001\u6545\u4e8b\u3002\u6211\u4eec\u7684\u6a21\u578b\u57fa\u4e8eMLLM\u7684\u5f3a\u5927\u7406\u89e3\u80fd\u529b\uff0c\u65e2\u80fd\u9884\u6d4b\u6587\u672c\u4ee4\u724c\uff0c\u4e5f\u80fd\u9884\u6d4b\u89c6\u89c9\u4ee4\u724c\uff0c\u7136\u540e\u901a\u8fc7\u9002\u5e94\u7684\u89c6\u89c9\u89e3\u4ee4\u724c\u5316\u5668\u5904\u7406\uff0c\u751f\u6210\u5177\u6709\u4e00\u81f4\u89d2\u8272\u548c\u98ce\u683c\u7684\u56fe\u50cf\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u591a\u6a21\u6001\u6ce8\u610f\u529b\u6c89\u964d\u673a\u5236\uff0c\u4f7f\u5f97\u5728\u9ad8\u5ea6\u81ea\u52a8\u9012\u5f52\u7684\u65b9\u5f0f\u4e0b\uff0c\u80fd\u591f\u751f\u6210\u957f\u8fbe25\u4e2a\u5e8f\u5217\uff08\u4ec5\u752810\u4e2a\u8fdb\u884c\u8bad\u7ec3\uff09\u7684\u6545\u4e8b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6
3d0\u4f9b\u4e86\u5927\u89c4\u6a21\u9ad8\u5206\u8fa8\u7387\u7684StoryStream\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bad\u7ec3\u6211\u4eec\u7684\u6a21\u578b\uff0c\u5e76\u91cf\u5316\u8bc4\u4f30\u591a\u6a21\u6001\u6545\u4e8b\u751f\u6210\u4efb\u52a1\u5728\u591a\u4e2a\u65b9\u9762\u7684\u6027\u80fd\u3002**|\n", "2407.08662": "|**2024-07-11**|**Uncertainty Estimation of Large Language Models in Medical Question Answering**|Jiaxin Wu et.al.|[2407.08662](http://arxiv.org/abs/2407.08662)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u533b\u7597\u9886\u57df\u7684\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5b58\u5728\u4ea7\u751f\u9519\u8bef\u4e8b\u5b9e\u7684\u98ce\u9669\u3002\u4e3a\u4e86\u5728\u533b\u7597\u95ee\u9898\u89e3\u7b54\u4e2d\u90e8\u7f72\u8fd9\u4e9b\u6a21\u578b\uff0c\u9700\u8981\u53ef\u9760\u7684\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\uff08UE\uff09\u65b9\u6cd5\u6765\u8bc6\u522b\u5e7b\u89c9\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5728\u533b\u5b66\u95ee\u7b54\u6570\u636e\u96c6\u4e0a\u5bf9\u6d41\u884cUE\u65b9\u6cd5\u53ca\u5176\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u65b9\u6cd5\u5728\u8be5\u9886\u57df\u901a\u5e38\u8868\u73b0\u4e0d\u4f73\uff0c\u51f8\u663e\u4e86\u533b\u7597\u5e94\u7528\u4e2d\u7684UE\u6311\u6218\u3002\u6211\u4eec\u8fd8\u89c2\u5bdf\u5230\uff0c\u66f4\u5927\u7684\u6a21\u578b\u5f80\u5f80\u80fd\u83b7\u5f97\u66f4\u597d\u7684\u7ed3\u679c\uff0c\u8fd9\u8868\u660e\u6a21\u578b\u89c4\u6a21\u4e0eUE\u53ef\u9760\u6027\u53ef\u80fd\u5b58\u5728\u5173\u8054\u3002 
\u4e3a\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u4e24\u9636\u6bb5\u9a8c\u8bc1\u201d\u7684\u6982\u7387\u81ea\u7531\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u65b9\u6cd5\u3002\u9996\u5148\uff0cLLM\u751f\u6210\u9010\u6b65\u89e3\u91ca\u548c\u521d\u59cb\u7b54\u6848\uff0c\u63a5\u7740\u5236\u5b9a\u6838\u67e5\u95ee\u9898\u4ee5\u68c0\u67e5\u89e3\u91ca\u4e2d\u7684\u4e8b\u5b9e\u9648\u8ff0\u3002\u6a21\u578b\u4f1a\u4e24\u6b21\u56de\u7b54\u8fd9\u4e9b\u95ee\u9898\uff1a\u4e00\u6b21\u72ec\u7acb\uff0c\u4e00\u6b21\u53c2\u8003\u89e3\u91ca\u3002\u4e24\u79cd\u7b54\u6848\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u5ea6\u8861\u91cf\u539f\u59cb\u54cd\u5e94\u7684\u4e0d\u786e\u5b9a\u6027\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u751f\u7269\u533b\u5b66\u95ee\u7b54\u6570\u636e\u96c6\u4e0a\u4f7f\u7528Llama 2 Chat\u6a21\u578b\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u5e76\u5c06\u5176\u4e0e\u57fa\u51c6\u57fa\u7ebf\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u4e24\u9636\u6bb5\u9a8c\u8bc1\u65b9\u6cd5\u5728\u5404\u4e2a\u6570\u636e\u96c6\u548c\u6a21\u578b\u89c4\u6a21\u4e0a\u5b9e\u73b0\u4e86\u6700\u4f73\u7684\u6574\u4f53\u51c6\u786e\u6027\u548c\u7a33\u5b9a\u6027\uff0c\u5e76\u4e14\u5176\u6027\u80fd\u968f\u6a21\u578b\u5927\u5c0f\u7684\u589e\u52a0\u800c\u63d0\u5347\u3002|\n", "2407.09467": "|**2024-07-12**|**FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3**|Georgios Makridis 
et.al.|[2407.09467](http://arxiv.org/abs/2407.09467)|null|In a world rich with AI-driven storytelling, there is a unique opportunity to engage young audiences through customized, personalized narratives. This paper introduces FairyLandAI, an innovative large language model (LLM) developed for children and built on OpenAI's API. What sets FairyLandAI apart is that it not only generates engaging, age-appropriate stories that reflect a variety of traditions, but also automatically produces creative prompts suited to advanced image generation tools such as GenAI and Dalle-3, enriching the storytelling experience. FairyLandAI is carefully tailored to children's imaginative worlds, offering stories that are both educational and entertaining and that align with the values appropriate to different age groups. Its distinguishing feature is that it customizes stories to an individual child's preferences and cultural background, marking a new era of personalized storytelling. Its integration with image generation technology further provides a comprehensive narrative experience that stimulates both verbal and visual creativity. Empirical evaluation shows that FairyLandAI is effective at creating stories that captivate children, stories that not only entertain but also convey the moral teachings of diverse traditions. 
The model is a valuable tool for parents and educators, helping them impart meaningful life lessons through engaging stories. FairyLandAI represents a pioneering step in using LLMs, particularly through the OpenAI API, for educational and cultural enrichment, making complex yet instructive moral stories accessible and enjoyable for young, imaginative minds.|\n", "2407.09450": "|**2024-07-12**|**Human-like Episodic Memory for Infinite Context LLMs**|Zafeirios Fountas et.al.|[2407.09450](http://arxiv.org/abs/2407.09450)|**[link](https://github.com/em-llm/EM-LLM-model)**|Large language models (LLMs) have shown remarkable capabilities, but they still struggle to maintain coherence and accuracy when processing long sequences. The human brain, by contrast, excels at organizing and retrieving personal experiences over long time scales, spanning a lifetime of memories. This paper proposes EM-LLM, a novel approach that integrates key aspects of human episodic 
memory and event cognition into LLMs, enabling them to handle practically infinite context lengths while remaining computationally efficient. EM-LLM organizes token sequences into coherent events in an online fashion by combining Bayesian surprise with graph-theoretic boundary refinement. When needed, these events are retrieved through a two-stage memory process that combines similarity-based and temporally contiguous retrieval, giving efficient, human-like access to information. Experiments on the LongBench dataset show that EM-LLM outperforms the state-of-the-art InfLLM model, with an overall relative improvement of 4.3% across tasks, including a 33% improvement on the PassageRetrieval task. Our analysis also reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only advances the ability of LLMs to process long sequences but also provides a computational framework for exploring human memory mechanisms, opening new avenues for interdisciplinary research in AI and cognitive science.|\n", "2407.09447": "|**2024-07-12**|**ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts**|Amelia F. 
Hardy et.al.|[2407.09447](http://arxiv.org/abs/2407.09447)|**[link](https://github.com/sisl/astprompter)**|Typical automated red-teaming strategies for large language models (LLMs) focus on finding prompts that trigger a frozen language model (the defender) to generate toxic text. This often leads the red-teaming model (the adversary) to produce unintelligible, unnatural outputs. Here, we propose a reinforcement learning formulation of the LLM red-teaming task that aims to find prompts that both (1) trigger the defender to generate toxic text and (2) have low perplexity as scored by the defender. We argue that these cases are the most relevant in a red-teaming setting because they are likely to arise during normal use of the defender model. We solve this formulation with a novel online, weakly supervised variant of Identity Preference Optimization (IPO), applied to GPT-2 and GPT-2 
XL as defenders. Experiments show that our policy can generate prompts that are both likely and toxicity-triggering. Finally, we analyze the learned strategies and the trade-off between likelihood and toxicity, and discuss the implications. The source code for this project is available at https://github.com/sisl/ASTPrompter/.|\n", "2407.09435": "|**2024-07-12**|**MUSCLE: A Model Update Strategy for Compatible LLM Evolution**|Jessica Echterhoff et.al.|[2407.09435](http://arxiv.org/abs/2407.09435)|null|Large language models (LLMs) are frequently updated, through changes to data or architecture, to improve their performance. When updating models, developers typically focus on increasing overall performance metrics and pay less attention to compatibility with previous versions. However, users often build a mental model of the functionality and capabilities of the machine learning model they interact with, and they must adapt that mental model with every update. Frequent model changes can therefore reduce user satisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When these base models are updated, the user-facing downstream task models can experience instance regression, or negative flips, in which previously correct instances are now predicted incorrectly. 
This happens even when the downstream training procedure stays the same. Our work aims to give users seamless model updates in two ways. First, we propose evaluation metrics for compatibility with prior model versions, designed for generative tasks but also applicable to classification tasks; with them, we observe regression and inconsistencies between model versions and updates across a diverse set of tasks. Second, we propose a training strategy that trains a compatibility model to reduce inconsistencies across model updates, lowering the negative flip rate by up to 40% for updates such as Llama 1 to Llama 2, so that users can adapt to new versions without repeatedly adjusting their expectations and usage.|\n", "2407.09429": "|**2024-07-12**|**Open (Clinical) LLMs are Sensitive to Instruction Phrasings**|Alberto Mario Ceballos Arroyo 
et.al.|[2407.09429](http://arxiv.org/abs/2407.09429)|**[link](https://github.com/alceballosa/clin-robust)**|Instruction-tuned large language models (LLMs) can perform a wide range of tasks given natural language instructions, but they are sensitive to how those instructions are phrased. This issue is especially critical in healthcare, because clinicians are unlikely to be experts in prompt engineering and the potential consequences of inaccurate outputs are more severe. This raises a practical question: how robust are instruction-tuned LLMs to natural (non-adversarial) variations in the instructions given for clinical NLP tasks? We collect prompts from medical doctors across a range of tasks and quantify the sensitivity of seven LLMs, some general and some specialized, to subtle differences in instruction phrasing. We find that performance varies substantially across all models and that, perhaps surprisingly, models trained specifically on clinical data are less robust than their general-domain counterparts. Furthermore, arbitrary phrasing differences can affect fairness: for example, valid but different instructions for mortality prediction produce variation not only in overall performance but also in the differences between demographic groups.|\n", 
"2407.09424": "|**2024-07-12**|**TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models**|Hang Zou et.al.|[2407.09424](http://arxiv.org/abs/2407.09424)|null|This paper is the first to propose an approach for adapting general-purpose large language models (LLMs) into telecom-specific models. To this end, we collect and build telecom-specific pre-training, instruction, and preference datasets, which are used for continual pre-training, instruction tuning, and alignment tuning, respectively. Because the telecom domain lacks widely accepted evaluation benchmarks, we extend existing benchmarks and propose three new ones: Telecom Math Modeling, Telecom Open QnA, and Telecom Code Tasks. These new benchmarks provide a holistic evaluation of LLM capabilities in the telecom domain, including math modeling, open-ended question answering, and code generation, infilling, summarization, and analysis. Our fine-tuned model, TelecomGPT, significantly outperforms state-of-the-art models such as GPT-4, Llama-3, and Mistral on the Telecom Math Modeling benchmark and achieves comparable performance on TeleQnA, 3GPP technical document classification, telecom code summarization and generation, and infilling.|\n", "2407.09417": "|**2024-07-12**|**Mitigating Entity-Level Hallucination in Large Language 
Models**|Weihang Su et.al.|[2407.09417](http://arxiv.org/abs/2407.09417)|**[link](https://github.com/oneal2000/entityhallucination)**|**The rise of large language models (LLMs) has changed how users access information, shifting from traditional search engines to direct question-answering interactions with LLMs. However, the widespread adoption of LLMs has exposed a significant challenge known as hallucination, in which models generate coherent but factually inaccurate responses, leading users to distrust LLM-based information retrieval systems. To address this problem, this paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD), a novel method that improves on traditional retrieval augmentation by dynamically adapting the retrieval process according to real-time hallucination detection. DRAD has two main components: Real-time Hallucination Detection (RHD), which identifies potential hallucinations without relying on external models, and Self-correction based on External Knowledge (SEK), which corrects these errors using external knowledge. Experimental results show that DRAD performs well at both detecting and mitigating hallucinations in LLMs. All of our code and data are open-sourced for the research community at https://github.com/oneal2000/EntityHallucination.**|\n", 
"2407.09413": "|**2024-07-12**|**SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers**|Shraman Pramanick et.al.|[2407.09413](http://arxiv.org/abs/2407.09413)|**[link](https://github.com/google/spiqa)**|**Quickly locating information is key when reading scientific papers in depth. However, existing question-answering (QA) datasets based on scientific papers are limited in scale and content and focus mainly on the text. To address this gap, we introduce SPIQA (Scientific Paper Image Question Answering), a large-scale QA dataset designed specifically for interpreting complex figures, tables, and result visualizations in scientific articles across computer science. We build the dataset through automatic and manual curation, drawing on the strong comprehension abilities of multimodal large language models (MLLMs). SPIQA contains 270K questions divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 foundation models, we evaluate how well current multimodal systems understand the finer details of research articles. We also propose a Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that allows fine-grained, step-by-step assessment and improves model performance. 
Finally, we explore the upper bound on the performance gains from additional textual information, which highlights its potential for future research and the dataset's potential to change how we interact with scientific literature.**|\n", "2407.09394": "|**2024-07-12**|**PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents**|Saber Zerhoudi et.al.|[2407.09394](http://arxiv.org/abs/2407.09394)|**[link](https://github.com/padas-lab-de/PersonaRAG)**|Large language models (LLMs) struggle to generate reliable outputs because of outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) models address this by augmenting LLMs with external knowledge, but they often fail to personalize the retrieval process. This paper introduces PersonaRAG, a novel framework that incorporates user-centric agents to adapt retrieval and generation based on real-time user data and interactions. Evaluations on several question-answering datasets show that PersonaRAG clearly outperforms baseline models and better meets users' individual needs. The results suggest that user-adapted information retrieval systems are a promising direction.|\n", "2407.09388": "|**2024-07-12**|**GAVEL: Generating Games Via Evolution and Language Models**|Graham Todd 
et.al.|[2407.09388](http://arxiv.org/abs/2407.09388)|null|Automatically generating novel and interesting games is a complex task. It requires representing game rules in a computationally workable form, searching the vast space of potential games, and accurately evaluating the originality and quality of previously unseen games. Prior work has largely relied on limited rule representations and domain-specific heuristics. In this work, we focus on generating novel games in the Ludii game description language, which encodes the rules of over 1000 board games in a variety of styles and modes of play. Drawing on recent advances in large language models and evolutionary computation, we train a model that intelligently mutates and recombines game mechanics expressed as code. Quantitative and qualitative analyses show that our approach can generate new and interesting games, including in regions of the space of possible games not covered by the existing Ludii dataset. A sample of the generated games can be played online through the Ludii portal.|\n", "2407.10972": "|**2024-07-15**|**VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation**|Bocheng Zou 
et.al.|[2407.10972](http://arxiv.org/abs/2407.10972)|**[link](https://github.com/vgbench/VGBench)**|**In the field of vision models, the dominant representation uses pixels to depict the visual world. This is not always the best or the only way to represent visual content, however, especially for designers and artists, who often build graphics from geometric shapes such as polygons. Vector graphics (VG) offer a textual representation of visual content that can be more concise and powerful for content such as cartoons or sketches. Recent studies have shown encouraging results from capable large language models (LLMs) on vector graphics, but these works focus mainly on qualitative results, on understanding, or on a specific type of vector graphics. We propose VGBench, a comprehensive benchmark for evaluating LLMs on vector graphics that covers (a) both visual understanding and generation, (b) several vector graphics formats, (c) diverse question types, (d) a wide range of prompting techniques, and (e) multiple LLMs. 
Evaluating on our collected 4279 understanding samples and 5845 generation samples, we find that LLMs show strong capability on both aspects but perform somewhat worse on low-level formats such as SVG. Our data and evaluation pipeline will be open-sourced.**|\n", "2407.10969": "|**2024-07-15**|**Q-Sparse: All Large Language Models can be Fully Sparsely-Activated**|Hongyu Wang et.al.|[2407.10969](http://arxiv.org/abs/2407.10969)|null|We propose Q-Sparse, a simple yet effective training approach designed for large language models (LLMs). Q-Sparse makes the activations of LLMs fully sparse, which brings significant efficiency gains at inference time. It does so by applying top-K sparsification to the activations and using the straight-through estimator during training. The main results are: (1) Q-Sparse achieves results comparable to those of baseline LLMs while being more efficient at inference time; (2) we present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in a range of settings, including training from scratch, continued training of pretrained models, and fine-tuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs such as BitNet 
b1.58. In particular, the combination of BitNet b1.58 and Q-Sparse (which can be equipped with MoE) provides a cornerstone and a clear path toward improving the efficiency, including cost and energy consumption, of future LLMs.|\n", "2407.10960": "|**2024-07-15**|**Fast Matrix Multiplications for Lookup Table-Quantized LLMs**|Han Guo et.al.|[2407.10960](http://arxiv.org/abs/2407.10960)|**[link](https://github.com/hanguo97/flute)**|The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. Weight-only quantization reduces this memory movement and can therefore speed up inference. However, developing high-performance kernels for weight-quantized LLMs is a substantial challenge, especially when the weights are compressed to bit widths that do not divide evenly (such as 3 bits) and quantized with non-uniform lookup tables (LUTs). This paper describes FLUTE, a flexible lookup table engine that restructures the quantized weight matrix offline to minimize the bit manipulations associated with unpacking, and that vectorizes and duplicates the lookup table to mitigate shared-memory bandwidth constraints. At batch sizes below 32 and a quantization group size of 128 (typical in LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. 
As an application of FLUTE, we explore a simple extension to lookup table-based NormalFloat quantization and apply it to quantize LLaMA3, obtaining quantization performance competitive with strong baselines along with a 1.5x to 2x increase in end-to-end throughput.|\n", "2407.10953": "|**2024-07-15**|**MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models**|Chengguang Gan et.al.|[2407.10953](http://arxiv.org/abs/2407.10953)|null|The Mutual Reinforcement Effect (MRE) shows great promise for information extraction and multitasking research. However, the only available MRE mix datasets are in Japanese, which limits broader exploration by the global research community. To overcome this limitation, we build a Multilingual MRE Mix dataset (MMM) comprising 21 sub-datasets in English, Japanese, and Chinese. We also propose an LLM-assisted dataset translation method that uses large language models (LLMs) to translate the original Japanese text, greatly reducing the manual annotation time needed to construct the dataset. 
We further extend the dataset with open-domain named entity recognition (NER) and sentence classification tasks. Using this expanded dataset, we develop a unified input-output framework and train an Open-domain Information Extraction Large Language Model (OIELLM). Experiments show that OIELLM handles the new MMM dataset effectively and delivers significant performance improvements. Overall, this work aims to advance research on applying the Mutual Reinforcement Effect to multilingual information extraction by providing multilingual resources and an efficient translation strategy.|\n", "2407.10947": "|**2024-07-15**|**Can Textual Semantics Mitigate Sounding Object Segmentation Preference?**|Yaoting Wang et.al.|[2407.10947](http://arxiv.org/abs/2407.10947)|**[link](https://github.com/gewu-lab/sounding-object-segmentation-preference)**|**The Audio-Visual 
Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, studies indicate that existing AVS methods rely too heavily on a learned segmentation preference for audible objects rather than on precise audio guidance. The issue is that, compared with the visual modality, audio carries weaker semantics in multi-source sound scenes, so it provides only limited guidance over the visual space. Since the text modality is well explored and carries rich abstract semantics, we propose using text cues from the visual scene to sharpen the audio guidance. Our method first obtains scene descriptions from an off-the-shelf image captioner and then prompts a pretrained large language model to infer potential sounding objects as text cues. Next, we design a novel semantics-driven audio modeling module with dynamic masks that fuses audio features with the text cues to produce representative sounding-object features. These features carry not only audio information but also vivid semantics, and so give clearer guidance over the visual space. \u6211\u4eec\u5728AVS\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u88
68\u660e\uff0c\u501f\u52a9\u6587\u672c\u63d0\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5bf9\u97f3\u9891\u7684\u654f\u611f\u5ea6\u5f97\u5230\u63d0\u5347\uff0c\u5728\u6240\u6709\u4e09\u4e2a\u5b50\u96c6\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u7ade\u4e89\u529b\u3002\u9879\u76ee\u9875\u9762\uff1a[https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference](https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference)\u3002**|\n", "2407.10943": "|**2024-07-15**|**GRUtopia: Dream General Robots in a City at Scale**|Hanqing Wang et.al.|[2407.10943](http://arxiv.org/abs/2407.10943)|**[link](https://github.com/openrobotlab/grutopia)**|**\u8fd1\u671f\u7684\u7814\u7a76\u6b63\u5728\u63a2\u7d22Embodied AI\u9886\u57df\u7684\u89c4\u6a21\u6cd5\u5219\u3002\u9274\u4e8e\u6536\u96c6\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u7684\u9ad8\u6602\u6210\u672c\uff0c\u6211\u4eec\u8ba4\u4e3a\u6a21\u62df\u5230\u73b0\u5b9e\uff08Sim2Real\uff09\u65b9\u6cd5\u5bf9\u4e8e\u6269\u5c55embodied\u6a21\u578b\u7684\u5b66\u4e60\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u4ecb\u7ecd\u9879\u76eeGRUtopia\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u5404\u79cd\u673a\u5668\u4eba\u8bbe\u8ba1\u7684\u9996\u4e2a\u4e92\u52a8\u4e09\u7ef4\u793e\u4f1a\u3002\u5b83\u5177\u6709\u591a\u9879\u521b\u65b0\uff1a(a) \u573a\u666f\u6570\u636e\u96c6GRScenes\u5305\u542b\u4e8610\u4e07\u5f20\u4ea4\u4e92\u5f0f\u3001\u7cbe\u7ec6\u6ce8\u91ca\u7684\u573a\u666f\uff0c\u8fd9\u4e9b\u573a\u666f\u53ef\u4ee5\u81ea\u7531\u7ec4\u5408\u6210\u57ce\u5e02\u89c4\u6a21\u7684\u73af\u5883\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u5173\u6ce8\u5bb6\u5ead\u73af\u5883\u7684\u4f5c\u54c1\u4e0d\u540c\uff0cGRScenes\u6db5\u76d6\u4e8689\u4e2a\u591a\u6837\u5316\u7684\u573a\u666f\u7c7b\u522b\uff0c\u5f25\u5408\u4e86\u670d\u52a1\u5bfc\u5411\u73af\u5883\u4e2d\u673a\u5668\u4eba\u521d\u59cb\u90e8\u7f72\u7684\u5dee\u8ddd\u3002(b) 
GRResidents\u662f\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u975e\u73a9\u5bb6\u89d2\u8272\uff08NPC\uff09\u7cfb\u7edf\uff0c\u8d1f\u8d23\u793e\u4ea4\u4e92\u52a8\u3001\u4efb\u52a1\u751f\u6210\u548c\u4efb\u52a1\u5206\u914d\uff0c\u4ece\u800c\u6a21\u62dfembodied AI\u5e94\u7528\u4e2d\u7684\u793e\u4f1a\u573a\u666f\u3002(c) \u6807\u51c6\u5316\u57fa\u51c6GRBench\u652f\u6301\u5404\u79cd\u673a\u5668\u4eba\uff0c\u4f46\u4ee5\u817f\u8db3\u673a\u5668\u4eba\u4e3a\u4e3b\uff0c\u63d0\u4f9b\u6d89\u53ca\u7269\u4f53\u5bfc\u822a\u3001\u793e\u4ea4\u5bfc\u822a\u548c\u79fb\u52a8\u64cd\u4f5c\u7684\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5177\u6709\u9002\u5ea6\u7684\u6311\u6218\u6027\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u591f\u7f13\u89e3\u8be5\u9886\u57df\u9ad8\u8d28\u91cf\u6570\u636e\u7684\u532e\u4e4f\uff0c\u5e76\u4e3aEmbodied AI\u7814\u7a76\u63d0\u4f9b\u66f4\u5168\u9762\u7684\u8bc4\u4f30\u3002\u9879\u76ee\u4ee3\u7801\u53ef\u4ecehttps://github.com/OpenRobotLab/GRUtopia\u83b7\u53d6\u3002**|\n", "2407.10909": "|**2024-07-15**|**FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets**|Xiaohui Victor Li 
et.al.|[2407.10909](http://arxiv.org/abs/2407.10909)|**[link](https://github.com/xiaohui-victor-li/FinDKG)**|\u52a8\u6001\u77e5\u8bc6\u56fe\u8c31\uff08DKGs\uff09\u662f\u4e00\u79cd\u6d41\u884c\u7684\u6570\u636e\u7ed3\u6784\uff0c\u7528\u4e8e\u8868\u793a\u968f\u65f6\u95f4\u53d8\u5316\u7684\u5bf9\u8c61\u4e4b\u95f4\u7684\u5404\u79cd\u8fde\u63a5\u3002\u5b83\u4eec\u5728\u5904\u7406\u590d\u6742\u65e0\u7ed3\u6784\u6570\u636e\u6e90\uff08\u5982\u6587\u672c\u548c\u56fe\u50cf\uff09\u63d0\u53d6\u7684\u4fe1\u606f\u65f6\u5c55\u73b0\u51fa\u9ad8\u6548\u6027\u3002\u5728\u91d1\u878d\u5e94\u7528\u4e2d\uff0cDKGs\u53ef\u7528\u4e8e\u57fa\u4e8e\u8d22\u7ecf\u65b0\u95fb\u6587\u7ae0\u63a2\u6d4b\u6295\u8d44\u7b56\u7565\u7684\u8d8b\u52bf\u3002\u672c\u7814\u7a76\u63a2\u7d22\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u52a8\u6001\u77e5\u8bc6\u56fe\u8c31\u751f\u6210\u5668\u7684\u7279\u6027\uff0c\u4e3a\u6b64\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5f00\u6e90\u7684Fine-tuned LLM\uff0c\u79f0\u4e3a\u96c6\u6210\u4e0a\u4e0b\u6587\u77e5\u8bc6\u56fe\u8c31\u751f\u6210\u5668\uff08ICKG\uff09\u3002\u5229\u7528ICKG\uff0c\u6211\u4eec\u4ece\u8d22\u7ecf\u65b0\u95fb\u6587\u7ae0\u4e2d\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u5f00\u6e90\u52a8\u6001\u77e5\u8bc6\u56fe\u8c31\uff0c\u79f0\u4e3aFinDKG\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u6ce8\u610f\u529b\u673a\u5236\u7684\u56fe\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\uff08KGTransformer\uff09\uff0c\u7528\u4e8e\u5206\u6790\u8fd9\u4e2a\u56fe\u8c31\u3002\u6211\u4eec\u5728\u57fa\u51c6\u6570\u636e\u96c6\u548cFinDKG\u4e0a\u6d4b\u8bd5\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7ed3\u679c\u663e\u793a\u5728\u94fe\u63a5\u9884\u6d4b\u4efb\u52a1\u4e2d\uff0cKGTransformer\u8868\u73b0\u4f18\u5f02\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86KGTransformer\u5728FinDKG\u4e0a\u7684\u4e3b\u9898\u6295\u8d44\u6027\u80fd\uff0c\u8bc1\u660e\u5b83\u80fd\u8d85\u8d8a\u73b0\u6709\u7684\u4e3b\u9898\u4ea4\u6613\u6240\u4ea4\u6613\u57fa\u91d1\uff08ETF\uff09\u3002|\n
", "2407.10887": "|**2024-07-15**|**Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique**|Mark Russinovich et.al.|[2407.10887](http://arxiv.org/abs/2407.10887)|null|\u968f\u7740\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u76d7\u548c\u8bef\u7528\u7684\u62c5\u5fe7\u52a0\u5267\uff0c\u6a21\u578b\u6307\u7eb9\u5316\u7684\u5fc5\u8981\u6027\u63d0\u5347\u3002\u5728\u8fd9\u79cd\u80cc\u666f\u4e0b\uff0c\u6210\u529f\u7684\u6307\u7eb9\u5e94\u5177\u5907\u4e94\u4e2a\u7279\u6027\uff1a\u900f\u660e\u6027\u3001\u6548\u7387\u3001\u6301\u4e45\u6027\u3001\u9c81\u68d2\u6027\u548c\u4e0d\u53ef\u4f2a\u9020\u6027\u3002\u672c\u6587\u9996\u5148\u5b9a\u4e49\u4e86\u8fd9\u4e9b\u8981\u6c42\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u7b80\u5355\u6307\u7eb9\u65b9\u6cd5\u2014\u2014Chain & Hash\uff0c\u5b83\u878d\u5408\u4e86\u52a0\u5bc6\u7406\u5ff5\uff0c\u5b9e\u73b0\u4e86\u6240\u6709\u8fd9\u4e9b\u7279\u6027\u3002Chain & Hash\u6d89\u53ca\u751f\u6210\u4e00\u7ec4\u95ee\u9898\uff08\u6307\u7eb9\uff09\u53ca\u5176\u53ef\u80fd\u7684\u7b54\u6848\uff0c\u7136\u540e\u4f7f\u7528\u5b89\u5168\u54c8\u5e0c\u6280\u672f\u5c06\u5b83\u4eec\u5408\u5e76\uff0c\u4ee5\u786e\u5b9a\u6bcf\u4e2a\u95ee\u9898\u7684\u503c\uff0c\u4ece\u800c\u4fdd\u8bc1\u4e0d\u53ef\u4f2a\u9020\u6027\uff0c\u9632\u6b62\u5bf9\u624b\u58f0\u79f0\u865a\u5047\u6240\u6709\u6743\u3002\u6211\u4eec\u5728\u591a\u4e2a\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86Chain & Hash\u6280\u672f\uff0c\u5e76\u5c55\u793a\u4e86\u5b83\u5bf9\u826f\u6027\u64cd\u4f5c\uff08\u5982\u5728\u4e0d\u540c\u6570\u636e\u96c6\u4e0a\u5fae\u8c03\uff09\u548c\u654c\u610f\u5220\u9664\u6307\u7eb9\u7684\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5e26\u6307\u7eb9\u7684\u6a21\u578b\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6027\u80fd\u51e0\u4e4e\u4e0e\u975e\u6307\u7eb9\u5316\u6a21\u578b\u76f8\u5f53\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u9ad8\u6548\u6027\u53ca\u5176\u5b9e\u7528\u4ef7\u503c\u3002|\n", 
"2407.10886": "|**2024-07-15**|**SLIP: Securing LLMs IP Using Weights Decomposition**|Yehonathan Refael et.al.|[2407.10886](http://arxiv.org/abs/2407.10886)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u4ef7\u503c\u4f5c\u4e3a\u77e5\u8bc6\u4ea7\u6743\uff08IP\uff09\u65e5\u76ca\u51f8\u663e\uff0c\u53cd\u6620\u51fa\u5176\u80cc\u540e\u5de8\u5927\u7684\u6295\u8d44\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4e91\u90e8\u7f72\u6210\u672c\u9ad8\uff0c\u8fb9\u7f18\u8bbe\u5907\u90e8\u7f72\u7684\u9700\u6c42\u589e\u52a0\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u53c2\u6570\u88ab\u76d7\u7528\u548c\u672a\u7ecf\u6388\u6743\u4f7f\u7528\u3002\u5f53\u524d\u7684\u4fdd\u62a4\u65b9\u6cd5\u5728\u5b9e\u7528\u6027\u3001\u51c6\u786e\u6027\u635f\u5931\u6216\u9002\u5e94\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6df7\u5408\u63a8\u7406\u7b97\u6cd5\uff0c\u79f0\u4e3aSLIP\uff08Secure Lightweight Inference Protocol\uff09\uff0c\u65e8\u5728\u4fdd\u62a4\u90e8\u7f72\u5728\u8fb9\u7f18\u7684\u6a21\u578b\u514d\u53d7\u76d7\u7a83\u3002SLIP\u662f\u9996\u4e2a\u517c\u987e\u5b9e\u9645\u5e94\u7528\u7684\u5b9e\u7528\u6027\u548c\u4e25\u683c\u5b89\u5168\u6027\u7684\u6df7\u5408\u534f\u8bae\uff0c\u540c\u65f6\u4fdd\u6301\u96f6\u7cbe\u5ea6\u4e0b\u964d\u548c\u4f4e\u5ef6\u8fdf\u5f71\u54cd\u3002 
SLIP\u901a\u8fc7\u77e9\u9635\u5206\u89e3\u5b9e\u73b0\u4e86\u6a21\u578b\u5728\u4e24\u4e2a\u8ba1\u7b97\u8d44\u6e90\u4e4b\u95f4\u7684\u5212\u5206\uff1a\u4e00\u4e2a\u5b89\u5168\u4f46\u6602\u8d35\uff0c\u53e6\u4e00\u4e2a\u6210\u672c\u6548\u76ca\u9ad8\u4f46\u6613\u53d7\u653b\u51fb\u3002\u5173\u952e\u5728\u4e8e\uff0c\u5b89\u5168\u8d44\u6e90\u4fdd\u7559\u4e86\u6a21\u578bIP\u4e2d\u6700\u654f\u611f\u7684\u90e8\u5206\uff0c\u540c\u65f6\u6267\u884c\u6700\u5c11\u7684\u8ba1\u7b97\uff0c\u800c\u8106\u5f31\u8d44\u6e90\u5219\u76f8\u53cd\u3002\u6b64\u5916\uff0c\u8be5\u534f\u8bae\u63d0\u4f9b\u4e86\u9632\u6b62\u653b\u51fb\u8005\u5229\u7528\u5206\u5272\u83b7\u53d6\u4fdd\u5bc6\u4fe1\u606f\u7684\u5b89\u5168\u4fdd\u969c\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5b9e\u9a8c\u7ed3\u679c\uff0c\u8bc1\u660e\u4e86SLIP\u7684\u7a33\u5065\u6027\u548c\u6709\u6548\u6027\uff0c\u4f7f\u5176\u6210\u4e3a\u4fdd\u62a4LLMs\u7684\u7406\u60f3\u89e3\u51b3\u65b9\u6848\u3002|\n", "2407.10873": "|**2024-07-15**|**Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models**|Rui Zhang 
et.al.|[2407.10873](http://arxiv.org/abs/2407.10873)|null|\u81ea\u52a8\u5316\u542f\u53d1\u5f0f\u8bbe\u8ba1\uff08AHD\uff09\u56e0\u5176\u5728\u81ea\u52a8\u5f00\u53d1\u9ad8\u6548\u542f\u53d1\u5f0f\u65b9\u6cd5\u65b9\u9762\u7684\u6f5c\u529b\u800c\u53d7\u5230\u5e7f\u6cdb\u5173\u6ce8\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u4eba\u4eec\u5f00\u59cb\u63a2\u7d22\u5c06AHD\u89c6\u4e3a\u8fdb\u5316\u7a0b\u5e8f\u641c\u7d22\uff08EPS\uff09\u95ee\u9898\u7684\u65b0\u9014\u5f84\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u57fa\u51c6\u8bbe\u7f6e\u4e0d\u4e00\u81f4\uff0c\u57fa\u7840\u6bd4\u8f83\u4e0d\u8db3\uff0c\u4e14\u7f3a\u4e4f\u5bf9LLM\u4e0e\u641c\u7d22\u7b56\u7565\u7ed3\u5408\u5fc5\u8981\u6027\u7684\u6df1\u5165\u5206\u6790\uff0c\u8fd9\u4f7f\u5f97\u73b0\u6709\u57fa\u4e8eLLM\u7684EPS\u65b9\u6cd5\u7684\u5b9e\u9645\u8fdb\u5c55\u96be\u4ee5\u5f97\u5230\u5145\u5206\u8bc1\u660e\u3002\u672c\u7814\u7a76\u901a\u8fc7\u4e00\u9879\u5927\u89c4\u6a21\u57fa\u51c6\u6d4b\u8bd5\uff0c\u6db5\u76d6\u4e86\u56db\u9879\u57fa\u4e8eLLM\u7684EPS\u65b9\u6cd5\u548c\u56db\u9879AHD\u95ee\u9898\uff0c\u8de8\u8d8a\u4e5d\u79cdLLM\uff0c\u5e76\u8fdb\u884c\u4e86\u4e94\u6b21\u72ec\u7acb\u8fd0\u884c\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\uff0c\u5b9e\u8bc1\u4e86\u5728LLM\u9a71\u52a8\u7684AHD\u65b9\u6cd5\u4e2d\u7684\u8fdb\u5316\u641c\u7d22\u7684\u91cd\u8981\u6027\uff0c\u540c\u65f6\u4e5f\u63a8\u52a8\u4e86\u672a\u6765EPS\u7b97\u6cd5\u5f00\u53d1\u7684\u8fdb\u6b65\u3002\u4e3a\u4e86\u4fc3\u8fdb\u53ef\u8bbf\u95ee\u6027\u548c\u53ef\u91cd\u590d\u6027\uff0c\u6211\u4eec\u5df2\u7ecf\u5168\u9762\u5f00\u6e90\u4e86\u6211\u4eec\u7684\u57fa\u51c6\u548c\u76f8\u5173\u7ed3\u679c\u3002|\n", "2407.11965": "|**2024-07-16**|**UrbanWorld: An Urban World Model for 3D City Generation**|Yu Shang 
et.al.|[2407.11965](http://arxiv.org/abs/2407.11965)|**[link](https://github.com/urban-world/urbanworld)**|\u57ce\u5e02\u4f5c\u4e3a\u4eba\u7c7b\u751f\u6d3b\u7684\u57fa\u672c\u73af\u5883\uff0c\u5305\u542b\u4e86\u5efa\u7b51\u3001\u9053\u8def\u548c\u690d\u88ab\u7b49\u591a\u5143\u7269\u7406\u5143\u7d20\uff0c\u8fd9\u4e9b\u5143\u7d20\u4e4b\u95f4\u5b58\u5728\u7740\u590d\u6742\u7684\u76f8\u4e92\u5173\u8054\u3002\u6784\u5efa\u903c\u771f\u4e14\u4e92\u52a8\u76843D\u57ce\u5e02\u73af\u5883\u5bf9\u4e8e\u7814\u53d1\u80fd\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u611f\u77e5\u3001\u51b3\u7b56\u548c\u884c\u52a8\u7684AI\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u7684\u624b\u5de5\u5236\u4f5c\u8fc7\u7a0b\u8017\u65f6\u4e14\u7cbe\u7ec6\uff0c\u9700\u8981\u8bbe\u8ba1\u5e08\u6295\u5165\u5927\u91cf\u7cbe\u529b\u6765\u7cbe\u786e\u5448\u73b0\u590d\u6742\u7684\u57ce\u5e02\u7279\u5f81\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faUrbanWorld\uff0c\u8fd9\u662f\u4e00\u4e2a\u9996\u4e2a\u81ea\u52a8\u751f\u6210\u5b9a\u5236\u5316\u3001\u771f\u5b9e\u4e14\u4e92\u52a8\u76843D\u57ce\u5e02\u4e16\u754c\u7684\u6a21\u578b\uff0c\u652f\u6301\u7075\u6d3b\u7684\u63a7\u5236\u6761\u4ef6\u3002UrbanWorld\u7684\u751f\u6210\u6d41\u7a0b\u5305\u62ec\u56db\u4e2a\u5173\u952e\u6b65\u9aa4\uff1a\u5229\u7528\u516c\u5f00\u7684OSM\u6570\u636e\u8fdb\u884c3D\u5e03\u5c40\u751f\u6210\u3001\u501f\u52a9\u5f3a\u5927\u7684\u57ce\u5e02\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08Urban 
MLLM\uff09\u8fdb\u884c\u57ce\u5e02\u573a\u666f\u89c4\u5212\u4e0e\u8bbe\u8ba1\u3001\u901a\u8fc7\u5148\u8fdb\u76843D\u6269\u6563\u6280\u672f\u5b9e\u73b0\u53ef\u63a7\u8d44\u4ea7\u6e32\u67d3\uff0c\u4ee5\u53caMLLM\u8f85\u52a9\u7684\u573a\u666f\u7ec6\u5316\u3002UrbanWorld\u751f\u6210\u7684\u9ad8\u4fdd\u771f3D\u57ce\u5e02\u73af\u5883\u4e3a\u901a\u7528AI\u548c\u673a\u5668\u611f\u77e5\u7cfb\u7edf\u5728\u6a21\u62df\u4e2d\u7684\u771f\u5b9e\u53cd\u9988\u548c\u4ea4\u4e92\u63d0\u4f9b\u4e86\u53ef\u80fd\u3002\u6211\u4eec\u81f4\u529b\u4e8e\u5c06UrbanWorld\u4f5c\u4e3a\u5f00\u6e90\u4e14\u591a\u529f\u80fd\u7684\u5e73\u53f0\uff0c\u7528\u4e8e\u8bc4\u4f30\u548c\u63d0\u5347AI\u5728\u771f\u5b9e\u57ce\u5e02\u73af\u5883\u4e2d\u7684\u611f\u77e5\u3001\u51b3\u7b56\u548c\u4e92\u52a8\u80fd\u529b\u3002|\n", "2407.11963": "|**2024-07-16**|**NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?**|Mo Li et.al.|[2407.11963](http://arxiv.org/abs/2407.11963)|**[link](https://github.com/open-compass/opencompass)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aNeedleBench\u7684\u6846\u67b6\uff0c\u5b83\u662f\u4e00\u7cfb\u5217\u8bc4\u4f30\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u957f\u6587\u672c\u7406\u89e3\u80fd\u529b\u7684\u9010\u6b65\u5347\u7ea7\u4efb\u52a1\u3002\u8be5\u6846\u67b6\u6d89\u53ca\u4e0d\u540c\u957f\u5ea6\u533a\u95f4\uff084k\u30018k\u300132k\u3001128k\u3001200k\u30011M\u4e43\u81f3\u66f4\u957f\uff09\u548c\u6df1\u5ea6\u8303\u56f4\uff0c\u901a\u8fc7\u5728\u4e0d\u540c\u6587\u672c\u6df1\u5ea6\u533a\u57df\u63d2\u5165\u5173\u952e\u6570\u636e\u70b9\uff0c\u7cfb\u7edf\u6027\u5730\u6d4b\u8bd5\u6a21\u578b\u5728\u5404\u79cd\u60c5\u5883\u4e0b\u7684\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u3002\u9488\u5bf9\u4e8e\u53cc\u8bed\u957f\u6587\u672c\uff0c\u6211\u4eec\u5229\u7528\u8fd9\u4e2a\u6846\u67b6\u6765\u8003\u5bdf\u4e3b\u6d41\u5f00\u6e90\u6a21\u578b\u8bc6\u522b\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5173\u952e\u4fe1\u606f\uff0c\u5e76\u8fd0\u7528\u8fd9\u4e9b\u4fe1\u606f\u8f
db\u884c\u63a8\u7406\u7684\u80fd\u529b\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u7956\u5148\u8ffd\u8e2a\u6311\u6218\uff08Ancestral Trace Challenge\uff0cATC\uff09\uff0c\u65e8\u5728\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u957f\u6587\u672c\u903b\u8f91\u63a8\u7406\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u63d0\u4f9b\u4e00\u4e2a\u7b80\u5355\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30LLMs\u5904\u7406\u590d\u6742\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u7684LLMs\u5728\u5b9e\u9645\u7684\u957f\u6587\u672c\u5e94\u7528\u4e2d\u4ecd\u6709\u5f88\u5927\u7684\u63d0\u5347\u7a7a\u95f4\uff0c\u56e0\u4e3a\u5b83\u4eec\u5728\u5904\u7406\u903b\u8f91\u63a8\u7406\u96be\u9898\u65f6\u9762\u4e34\u6311\u6218\u3002\u6240\u6709\u4ee3\u7801\u548c\u8d44\u6e90\u53ef\u5728OpenCompass\u9879\u76ee\uff08https://github.com/open-compass/opencompass\uff09\u83b7\u53d6\u3002**|\n", "2407.11934": "|**2024-07-16**|**Code Documentation and Analysis to Secure Software Development**|Paul Attie et.al.|[2407.11934](http://arxiv.org/abs/2407.11934)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCode Documentation and Analysis 
Tool\uff08CoDAT\uff09\u7684\u5de5\u5177\u3002CoDAT\u65e8\u5728\u4fdd\u6301\u4ee3\u7801\u6587\u6863\u4e4b\u95f4\u7684\u8fde\u8d2f\u6027\uff0c\u4f8b\u5982\uff0c\u5982\u679c\u4ee3\u7801\u7247\u6bb5\u4e2d\u7684\u67d0\u884c\u88ab\u4fee\u6539\uff0c\u76f8\u5e94\u7684\u6ce8\u91ca\u4e5f\u4f1a\u81ea\u52a8\u66f4\u65b0\uff0c\u786e\u4fdd\u5185\u90e8\u4e00\u81f4\u6027\u4ee5\u53ca\u4e0e\u4ee3\u7801\u7684\u4e00\u81f4\u6027\u3002\u901a\u8fc7\u6807\u8bb0\u8fc7\u65f6\u7684\u6ce8\u91ca\uff0cCoDAT\u63d0\u9192\u5f00\u53d1\u8005\u7ef4\u62a4\u6700\u65b0\u7684\u6587\u6863\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u68c0\u67e5\u4ee3\u7801\u7247\u6bb5\u4e0e\u5176\u63cf\u8ff0\u7684\u8bed\u4e49\u4e00\u81f4\u6027\uff0c\u4ece\u800c\u4e5f\u80fd\u8bc6\u522b\u51fa\u8bed\u4e49\u4e0d\u4e00\u81f4\u548c\u8fc7\u65f6\u7684\u6ce8\u91ca\u3002\u8fd9\u6709\u52a9\u4e8e\u7a0b\u5e8f\u5458\u7f16\u5199\u6b63\u786e\u5b9e\u73b0\u4ee3\u7801\u8349\u56fe\u7684\u4ee3\u7801\uff0c\u652f\u6301\u9010\u6b65\u7ec6\u5316\u65b9\u6cd5\uff0c\u4ece\u4ee3\u7801\u8349\u56fe\u9010\u6b65\u6f14\u53d8\u4e3a\u7ecf\u8fc7\u4e00\u4e24\u6b21\u6216\u66f4\u591a\u6b21\u7ec6\u5316\u8fed\u4ee3\u7684\u4ee3\u7801\u3002 CoDAT\u5728IntelliJ IDEA IDE\u4e2d\u5b9e\u73b0\uff0c\u5229\u7528Code Insight\u5b88\u62a4\u7a0b\u5e8f\u5305\u7ed3\u5408\u81ea\u5b9a\u4e49\u6b63\u5219\u8868\u8fbe\u5f0f\u7b97\u6cd5\uff0c\u6807\u8bb0\u5bf9\u5e94\u4ee3\u7801\u5757\u5df2\u66f4\u6539\u7684\u6807\u8bb0\u6ce8\u91ca\u3002CoDAT\u7684\u540e\u7aef\u7ed3\u6784\u4e0a\u662f\u53bb\u4e2d\u5fc3\u5316\u7684\uff0c\u652f\u6301\u5206\u5e03\u5f0f\u8d26\u672c\u6846\u67b6\uff0c\u4ee5\u5b9e\u73b0\u4ee3\u7801\u4e00\u81f4\u6027\u8ddf\u8e2a\u548c\u67b6\u6784\u7f16\u8bd1\u7ba1\u7406\u3002|\n", "2407.11919": "|**2024-07-16**|**What's Wrong? 
Refining Meeting Summaries with LLM Feedback**|Frederic Kirstein et.al.|[2407.11919](http://arxiv.org/abs/2407.11919)|null|\u968f\u7740\u6570\u5b57\u4f1a\u8bae\u7684\u666e\u53ca\uff0c\u4f1a\u8bae\u6458\u8981\u63d0\u70bc\u6210\u4e3a\u5173\u952e\u4efb\u52a1\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8fd9\u4e00\u9886\u57df\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u5b83\u4eec\u5728\u8fde\u8d2f\u6027\u548c\u7406\u89e3\u4e0a\u4e0b\u6587\u4e2d\u8d85\u8d8a\u4e86\u4f20\u7edf\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5b83\u4eec\u4ecd\u9700\u6539\u8fdb\u4ee5\u4fdd\u6301\u76f8\u5173\u6027\u5e76\u907f\u514d\u9519\u8bef\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u591aLLM\u7684\u4f1a\u8bae\u6458\u8981\u4fee\u6b63\u65b9\u6cd5\uff0c\u901a\u8fc7\u4e24\u9636\u6bb5\u8fc7\u7a0b\u6a21\u62df\u4eba\u7c7b\u5ba1\u67e5\uff1a\u9519\u8bef\u8bc6\u522b\u548c\u6458\u8981\u7cbe\u70bc\u3002\u6211\u4eec\u53d1\u5e03\u4e86QMSum Mistake\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b200\u4efd\u7531\u4eba\u5de5\u6807\u6ce8\u7684\u81ea\u52a8\u751f\u6210\u4f1a\u8bae\u6458\u8981\u6570\u636e\u96c6\uff0c\u9488\u5bf9\u7ed3\u6784\u3001\u9057\u6f0f\u548c\u4e0d\u76f8\u5173\u7b49\u4e5d\u79cd\u9519\u8bef\u7c7b\u578b\u8fdb\u884c\u4e86\u6807\u8bb0\u3002\u5b9e\u9a8c\u8868\u660e\uff0cLLMs\u80fd\u591f\u51c6\u786e\u8bc6\u522b\u8fd9\u4e9b\u9519\u8bef\u3002\u6211\u4eec\u5c06\u8bc6\u522b\u51fa\u7684\u95ee\u9898\u8f6c\u5316\u4e3a\u53ef\u64cd\u4f5c\u7684\u53cd\u9988\uff0c\u4ee5\u6b64\u63d0\u5347\u6458\u8981\u7684\u8d28\u91cf\uff0c\u5982\u76f8\u5173\u6027\u3001\u4fe1\u606f\u91cf\u3001\u7b80\u6d01\u6027\u548c\u8fde\u8d2f\u6027\u3002\u8fd9\u79cd\u4e8b\u540e\u4f18\u5316\u7b56\u7565\u901a\u8fc7\u5229\u7528\u591a\u4e2aLLMs\u6765\u9a8c\u8bc1\u8f93\u51fa\u8d28\u91cf\uff0c\u6709\u6548\u63d0\u9ad8\u4e86\u6458\u8981\u8d28\u91cf\u3002\u6211\u4eec\u7684\u591aLLM\u4f1a\u8bae\u6458\u8981\u65b9\u6cd5\u5bf9\u4e8e\u9700\u8981\u7a33\u5065\u6027\u3001\u884c\u52a8\u8ba1\u5212\u548c\u76ee\u6807\u5bfc\u5411\u7684\
u590d\u6742\u6587\u672c\u751f\u6210\u4efb\u52a1\u5177\u6709\u6f5c\u5728\u5e94\u7528\u4ef7\u503c\u3002|\n", "2407.11888": "|**2024-07-16**|**Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads**|Aritra Dhar et.al.|[2407.11888](http://arxiv.org/abs/2407.11888)|null|\u5728\u4e91\u5de5\u4f5c\u8d1f\u8f7d\u4e2d\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u751f\u6210AI\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u4e13\u7528\u786c\u4ef6\u52a0\u901f\u5668\uff0c\u5982GPU\u3001NPUs\u548cTPUs\uff0c\u56e0\u5176\u5728AI\u5e94\u7528\u4e2d\u7684\u5353\u8d8a\u6027\u80fd\u8d85\u8d8a\u4e86\u901a\u7528CPU\u3002AI\u6a21\u578b\u548c\u6570\u636e\u901a\u5e38\u5177\u6709\u9ad8\u5ea6\u654f\u611f\u6027\uff0c\u5e76\u6765\u81ea\u76f8\u4e92\u4e0d\u4fe1\u4efb\u7684\u5404\u65b9\u3002\u73b0\u6709\u7684\u57fa\u4e8eCPU\u7684\u53ef\u4fe1\u6267\u884c\u73af\u5883\uff08TEE\uff09\uff0c\u5982\u82f1\u7279\u5c14SGX\u6216AMD SEV\uff0c\u63d0\u4f9b\u7684\u4fdd\u62a4\u4e0d\u591f\u5145\u5206\u3002\u50cfNvidia-CC\u8fd9\u6837\u7684\u8bbe\u5907\u4e2d\u5fc3TEE\u4ec5\u9488\u5bf9\u7d27\u5bc6\u8026\u5408\u7684CPU-GPU\u7cfb\u7edf\uff0c\u4e14\u91c7\u7528\u4e13\u6709\u65b9\u6848\uff0c\u9700\u8981\u5728\u4e3b\u673aCPU\u4e0a\u90e8\u7f72TEE\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u73b0\u6709\u7684\u5b66\u672f\u63d0\u6848\u5927\u591a\u9488\u5bf9\u7279\u5b9a\u7684CPU-TEE\u5e73\u53f0\u3002 
\u4e3a\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Ascend-CC\uff0c\u4e00\u79cd\u57fa\u4e8e\u79bb\u6563NPUs\u7684\u673a\u5bc6\u8ba1\u7b97\u67b6\u6784\uff0c\u65e0\u9700\u5bf9\u4e3b\u673a\u7cfb\u7edf\u4fe1\u4efb\u3002Ascend-CC\u901a\u8fc7\u786e\u4fdd\u6570\u636e\u548c\u6a21\u578b\u52a0\u5bc6\uff0c\u4fdd\u62a4\u6570\u636e\u3001\u6a21\u578b\u53c2\u6570\u548c\u8fd0\u7b97\u7b26\u4e8c\u8fdb\u5236\uff0c\u63d0\u4f9b\u5f3a\u5927\u7684\u5b89\u5168\u6027\u3002\u5b83\u5229\u7528\u59d4\u6258\u5f0f\u5185\u5b58\u8bed\u4e49\u786e\u4fdd\u4e0e\u4e3b\u673a\u8f6f\u4ef6\u6808\u7684\u9694\u79bb\uff0c\u5e76\u901a\u8fc7\u4efb\u52a1\u9274\u6743\u63d0\u4f9b\u6a21\u578b\u5b8c\u6574\u6027\u7684\u5f3a\u6709\u529b\u4fdd\u8bc1\u3002\u6211\u4eec\u7684Ascend-CC\u5b9e\u73b0\u548c\u4e0e\u6700\u65b0LLMs\uff08\u5982Llama2\u548cLlama3\uff09\u7684\u8bc4\u4f30\u8868\u660e\uff0cAscend-CC\u5f15\u5165\u7684\u5f00\u9500\u6781\u5c0f\uff0c\u65e0\u9700\u4fee\u6539AI\u8f6f\u4ef6\u6808\u3002|\n", "2407.11852": "|**2024-07-16**|**Schema Matching with Large Language Models: an Experimental Study**|Marcel Parciak 
et.al.|[2407.11852](http://arxiv.org/abs/2407.11852)|**[link](https://github.com/uhasselt-dsi-data-systems-lab/code-schema-matching-llms-artefacs)**|**\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5173\u7cfb\u6570\u636e\u5e93\u67b6\u6784\uff08schema\uff09\u5339\u914d\u4e2d\u7684\u5e94\u7528\u3002\u76ee\u6807\u662f\u4ec5\u901a\u8fc7\u5143\u7d20\u540d\u79f0\u548c\u63cf\u8ff0\u627e\u51fa\u4e24\u4e2a\u5173\u7cfb\u6a21\u5f0f\u4e4b\u95f4\u7684\u8bed\u4e49\u5bf9\u5e94\u3002\u7814\u7a76\u8005\u6784\u5efa\u4e86\u4e00\u4e2a\u6765\u81ea\u5065\u5eb7\u9886\u57df\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5e76\u63d0\u51fa\u4e86\u4e0d\u540c\u7684\u4efb\u52a1\u8303\u56f4\uff0c\u5373\u4f7f\u7528\u4e0d\u540c\u6570\u91cf\u4e0a\u4e0b\u6587\u4fe1\u606f\u63d0\u793a\u6a21\u578b\u8fdb\u884cschema\u5339\u914d\u3002\u4ed6\u4eec\u5bf9\u6bd4\u4e86\u57fa\u4e8eLLM\u7684\u5339\u914d\u65b9\u6cd5\u4e0e\u57fa\u4e8e\u5b57\u7b26\u4e32\u76f8\u4f3c\u5ea6\u7684\u57fa\u7ebf\uff0c\u8003\u5bdf\u4e86\u5339\u914d\u8d28\u91cf\u3001\u9a8c\u8bc1\u5de5\u4f5c\u91cf\u3001\u51b3\u7b56\u786e\u5b9a\u6027\u548c\u4e92\u8865\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u7f3a\u4e4f\u4e0a\u4e0b\u6587\u4fe1\u606f\u4f1a\u964d\u4f4e\u5339\u914d\u8d28\u91cf\uff0c\u8fc7\u591a\u7684\u4fe1\u606f\u4e5f\u4f1a\u6709\u8d1f\u9762\u5f71\u54cd\u3002\u65b0\u7248\u672c\u7684LLMs\u901a\u5e38\u80fd\u63d0\u9ad8\u51b3\u7b56\u786e\u5b9a\u6027\u3002\u6709\u4e9b\u4efb\u52a1\u8303\u56f4\u4e0b\u7684\u9a8c\u8bc1\u5de5\u4f5c\u76f8\u5bf9\u9002\u5ea6\uff0c\u4e14\u80fd\u6210\u529f\u8bc6\u522b\u5927\u91cf\u771f\u6b63\u610f\u4e49\u4e0a\u7684\u8bed\u4e49\u5339\u914d\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0cLLMs\u6709\u6f5c\u529b\u4f5c\u4e3aschema\u5339\u914d\u7684\u521d\u59cb\u5de5\u5177\uff0c\u6570\u636e\u5de5\u7a0b\u5e08\u53ef\u4ee5\u5229\u7528\u5b83\u4eec\u7684\u540d\u79f0\u548c\u63cf\u8ff0\u4fe1\u606f\u5feb\u901f\u8fdb\u884c\u5339\u914d\uff0c\u65e0\u9700\u4f9d\u8d56\u5b9e\u9645\u6570\u636e\u5b9e\u4f8b\u30
02**|\n", "2407.11833": "|**2024-07-16**|**LoFTI: Localization and Factuality Transfer to Indian Locales**|Sona Elza Simon et.al.|[2407.11833](http://arxiv.org/abs/2407.11833)|**[link](https://github.com/csalt-research/lofti)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u8bad\u7ec3\u5728\u4e92\u8054\u7f51\u4e0a\u722c\u53d6\u7684\u5927\u578b\u7f51\u9875\u6570\u636e\u96c6\uff0c\u79ef\u7d2f\u4e86\u5927\u91cf\u7684\u4e16\u754c\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u901a\u5e38\u503e\u5411\u4e8e\u82f1\u8bed\u548c\u897f\u6b27\u56fd\u5bb6\uff0c\u5bfc\u81f4LLMs\u5bf9\u6765\u81ea\u5176\u4ed6\u5730\u533a\uff0c\u7279\u522b\u662f\u5370\u5ea6\u7684\u672c\u5730\u5316\u67e5\u8be2\u4ea7\u751f\u504f\u89c1\u6216\u865a\u6784\u7684\u56de\u7b54\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6LoFTI\uff08\u5370\u5ea6\u672c\u5730\u5316\u4e0e\u4e8b\u5b9e\u8f6c\u79fb\uff09\uff0c\u7528\u4e8e\u8bc4\u4f30LLMs\u7684\u672c\u5730\u5316\u548c\u4e8b\u5b9e\u6587\u672c\u8f6c\u6362\u80fd\u529b\u3002LoFTI\u5305\u542b\u5173\u4e8e\u5168\u7403\u6e90\u5730\u70b9\u548c\u5370\u5ea6\u76ee\u6807\u5730\u70b9\uff08\u5305\u62ec\u56fd\u5bb6\u3001\u5dde\u548c\u57ce\u5e02\u7684\u4e0d\u540c\u5c42\u7ea7\uff09\u5b9e\u4f53\u7684\u4e8b\u5b9e\u9648\u8ff0\uff0c\u6d89\u53ca\u5404\u7c7b\u5e7f\u6cdb\u7684\u4e3b\u9898\u3002\u6211\u4eec\u4f7f\u7528LoFTI\u6765\u8bc4\u4f30Mixtral\u3001GPT-4\u4ee5\u53ca\u4e24\u79cd\u9002\u7528\u4e8e\u672c\u5730\u5316\u4e8b\u5b9e\u8f6c\u79fb\u4efb\u52a1\u7684Mixtral\u884d\u751f\u65b9\u6cd5\u3002\u5b9e\u9a8c\u8868\u660e\uff0cLoFTI\u662f\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u5305\u62ecGPT-4\u5728\u5185\u7684\u6240\u6709\u6a21\u578b\u5728\u4e0d\u540c\u5c42\u7ea7\u7684\u672c\u5730\u5316\u4e0a\u90fd\u8868\u73b0\u51fa\u504f\u5dee\u3002**|\n", "2407.11827": "|**2024-07-16**|**GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News 
Text**|Kyle Hamilton et.al.|[2407.11827](http://arxiv.org/abs/2407.11827)|null|\u5c3d\u7ba1\u673a\u5668\u5b66\u4e60\u5728\u68c0\u6d4b\u6587\u672c\u4e2d\u7684\u5ba3\u4f20\u624b\u6bb5\u65b9\u9762\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u4f46\u5927\u591a\u6570\u65b9\u6cd5\u4fa7\u91cd\u4e8e\u201c\u9ed1\u76d2\u201d\u89e3\u51b3\u65b9\u6848\uff0c\u5176\u5185\u90e8\u5de5\u4f5c\u539f\u7406\u4e0d\u900f\u660e\u3002\u53ef\u89e3\u91ca\u7684\u65b9\u6cd5\u63d0\u4f9b\u4e86\u89e3\u51b3\u65b9\u6848\uff0c\u4f46\u5b83\u4eec\u4f9d\u8d56\u4e8e\u7cbe\u5fc3\u7684\u7279\u5f81\u5de5\u7a0b\u548c\u6602\u8d35\u7684\u4e13\u5bb6\u6807\u6ce8\u6570\u636e\u3002\u6b64\u5916\uff0c\u5173\u4e8e\u8bf4\u670d\u6027\u6587\u672c\u7684\u8bed\u8a00\u7279\u6027\u901a\u5e38\u7531\u4fee\u8f9e\u5b66\u5bb6\u6216\u8bed\u8a00\u5b66\u5bb6\u5173\u6ce8\uff0c\u4f46\u6ca1\u6709\u9002\u5408\u673a\u5668\u5b66\u4e60\u7684\u6807\u8bb0\u6709\u6b64\u7c7b\u7279\u6027\u7684\u6570\u636e\u96c6\u3002\u672c\u7814\u7a76\u65e8\u5728\u7f16\u7e82\u6587\u732e\u4e2d\u8bc6\u522b\u51fa\u768422\u4e2a\u4fee\u8f9e\u548c\u8bed\u8a00\u7279\u5f81\uff0c\u76ee\u7684\u662f\u5bf9\u4e00\u4e2a\u5df2\u6807\u6ce8\u6709\u5ba3\u4f20\u624b\u6bb5\u7684\u73b0\u6709\u6570\u636e\u96c6\u8fdb\u884c\u6ce8\u91ca\u3002\u4e3a\u4e86\u5e2e\u52a9\u4eba\u7c7b\u4e13\u5bb6\u5728\u81ea\u7136\u8bed\u8a00\u53e5\u5b50\u4e0a\u6807\u6ce8\u8fd9\u4e9b\u7279\u5f81\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u4e86\u540d\u4e3aRhetAnn\u7684\u7f51\u7edc\u5e94\u7528\uff0c\u4ee5\u51cf\u5c11\u539f\u672c\u8f83\u5927\u7684\u8ba4\u77e5\u8d1f\u62c5\u3002\u63a5\u7740\uff0c\u4f7f\u7528\u4e00\u5c0f\u90e8\u5206\u6807\u6ce8\u6570\u636e\uff0c\u6211\u4eec\u5229\u7528GPT-3.5\uff0c\u4e00\u79cd\u751f\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5bf9\u5269\u4f59\u6570\u636e\u8fdb\u884c\u5fae\u8c03\uff0c\u540c\u65f6\u517c\u987e\u6210\u672c\u6548\u76ca\u548c\u5206\u7c7b\u7cbe\u5ea6\u3002\u8fd9\u9879\u7814\u7a76\u8868\u660e\uff0c\u7ed3\u5408\u5c11\u91cf\u4eba\u5de5\u6807\u6ce8\u793a
\u4f8b\u4e0eGPT\uff0c\u53ef\u4ee5\u6709\u6548\u5730\u4ee5\u4f20\u7edf\u4ec5\u4f9d\u8d56\u4eba\u7c7b\u4e13\u5bb6\u7684\u6807\u6ce8\u6210\u672c\u7684\u5341\u5206\u4e4b\u4e00\u5de6\u53f3\u5b9e\u73b0\u5927\u89c4\u6a21\u6807\u6ce8\u8fc7\u7a0b\u7684\u6269\u5c55\u3002\u7ed3\u679c\u4e0e\u64b0\u5199\u65f6\u8868\u73b0\u6700\u597d\u7684\u6a21\u578b\uff08GPT-4\uff09\u76f8\u5f53\uff0c\u4e14\u6210\u672c\u964d\u4f4e10\u500d\u3002\u6211\u4eec\u7684\u8d21\u732e\u5305\u62ec\u8fd9\u4e9b\u7279\u5f81\u3001\u5b83\u4eec\u7684\u5c5e\u6027\u3001\u5b9a\u4e49\u4ee5\u53ca\u793a\u4f8b\u7684\u673a\u5668\u53ef\u8bfb\u683c\u5f0f\uff0c\u4ee5\u53caRhetAnn\u7684\u4ee3\u7801\u3001GPT\u63d0\u793a\u548c\u5fae\u8c03\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u90fd\u63a8\u52a8\u4e86\u53ef\u89e3\u91ca\u7684\u5ba3\u4f20\u624b\u6bb5\u68c0\u6d4b\u9886\u57df\u7684\u6700\u65b0\u8fdb\u5c55\u3002|\n", "2407.11798": "|**2024-07-16**|**PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation**|Branden Butler et.al.|[2407.11798](http://arxiv.org/abs/2407.11798)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5206\u5e03\u5f0f\u8ba1\u7b97\u673a\u96c6\u7fa4\u4e0a\u7684\u63a8\u7406\u5df2\u6210\u4e3a\u7814\u7a76\u70ed\u70b9\uff0c\u8bb8\u591a\u52a0\u901f\u6280\u672f\u501f\u9274\u4e86CPU\u7684\u63a8\u6d4b\u6267\u884c\u7b56\u7565\u3002\u8fd9\u4e9b\u6280\u672f\u65e8\u5728\u7f13\u89e3\u5185\u5b58\u5e26\u5bbd\u74f6\u9888\uff0c\u4f46\u4f1a\u589e\u52a0\u6bcf\u6b21\u63a8\u7406\u8fd0\u884c\u7684\u7aef\u5230\u7aef\u5ef6\u8fdf\uff0c\u9700\u8981\u9ad8\u63a8\u6d4b\u63a5\u53d7\u7387\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4efb\u52a1\u95f4\u63a5\u53d7\u7387\u7684\u53d8\u5f02\u6027\uff0c\u63a8\u6d4b\u6027\u63a8\u7406\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u7ba1\u9053\u5e76\u884c\u8bbe\u8ba1\u9700\u8981\u5927\u91cf\u7528\u6237\u8bf7\u6c42\u4ee5\u4fdd\u6301\u9ad8\u5229\u7528\u7387\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u62
11\u4eec\u63d0\u51fa\u4e86PipeInfer\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u51cf\u5c11\u8de8\u4ee4\u724c\u5ef6\u8fdf\u3001\u63d0\u9ad8\u5355\u8bf7\u6c42\u573a\u666f\u4e0b\u7cfb\u7edf\u5229\u7528\u7387\u7684\u7ba1\u9053\u5316\u63a8\u6d4b\u52a0\u901f\u6280\u672f\uff0c\u540c\u65f6\u589e\u5f3a\u4e86\u5bf9\u4f4e\u63a8\u6d4b\u63a5\u53d7\u7387\u548c\u4f4e\u5e26\u5bbd\u4e92\u8054\u7684\u5bb9\u5fcd\u5ea6\u3002 PipeInfer\u901a\u8fc7\u8fde\u7eed\u5f02\u6b65\u63a8\u6d4b\u548c\u65e9\u671f\u63a8\u7406\u53d6\u6d88\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u6539\u8fdb\u3002\u8fde\u7eed\u5f02\u6b65\u63a8\u6d4b\u5141\u8bb8\u540c\u65f6\u8fdb\u884c\u5355\u4ee4\u724c\u63a8\u7406\u4e0e\u591a\u4e2a\u63a8\u6d4b\u8fd0\u884c\uff0c\u4ece\u800c\u964d\u4f4e\u5ef6\u8fdf\u5e76\u63d0\u9ad8\u751f\u6210\u901f\u5ea6\u3002\u800c\u65e9\u671f\u63a8\u7406\u53d6\u6d88\u5219\u80fd\u591f\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u8df3\u8fc7\u65e0\u6548\u8fd0\u884c\u7684\u8ba1\u7b97\uff0c\u8fdb\u4e00\u6b65\u6539\u5584\u901f\u5ea6\u548c\u5ef6\u8fdf\u3002PipeInfer\u5728\u751f\u6210\u901f\u5ea6\u4e0a\u6bd4\u6807\u51c6\u63a8\u6d4b\u6027\u63a8\u7406\u6700\u9ad8\u53ef\u63d0\u53472.15\u500d\u3002|\n", "2407.11789": "|**2024-07-16**|**Large Language Models as Misleading Assistants in Conversation**|Betty Li Hou 
et.al.|[2407.11789](http://arxiv.org/abs/2407.11789)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4fe1\u606f\u67e5\u8be2\u4efb\u52a1\u4e0a\u80fd\u591f\u63d0\u4f9b\u5e2e\u52a9\u3002\u7136\u800c\uff0c\u6a21\u578b\u8f93\u51fa\u53ef\u80fd\u4f1a\u8bef\u5bfc\u7528\u6237\uff0c\u65e0\u8bba\u662f\u65e0\u610f\u7684\u8fd8\u662f\u6545\u610f\u7684\u3002\u6211\u4eec\u9488\u5bf9\u9605\u8bfb\u7406\u89e3\u4efb\u52a1\u63a2\u8ba8\u4e86LLMs\u5728\u6b3a\u9a97\u6027\u8f85\u52a9\u65b9\u9762\u7684\u80fd\u529b\uff0c\u5c06\u5176\u4f5c\u4e3a\u4eba\u7c7b\u7528\u6237\u7684\u4ee3\u7406\u3002\u5b9e\u9a8c\u5bf9\u6bd4\u4e86\u4e09\u79cd\u60c5\u51b5\uff1a\uff081\uff09\u6a21\u578b\u88ab\u63d0\u793a\u63d0\u4f9b\u771f\u5b9e\u4fe1\u606f\uff0c\uff082\uff09\u6a21\u578b\u88ab\u63d0\u793a\u8fdb\u884c\u5fae\u5999\u8bef\u5bfc\uff0c\u4ee5\u53ca\uff083\uff09\u6a21\u578b\u88ab\u63d0\u793a\u652f\u6301\u9519\u8bef\u7b54\u6848\u3002\u7ed3\u679c\u663e\u793a\uff0cGPT-4\u80fd\u591f\u6709\u6548\u8bef\u5bfcGPT-3.5-Turbo\u548cGPT-4\u81ea\u8eab\uff0c\u6b3a\u9a97\u6027\u52a9\u624b\u5bfc\u81f4\u4efb\u52a1\u51c6\u786e\u7387\u4e0b\u964d\u9ad8\u8fbe23%\uff0c\u76f8\u6bd4\u4e8e\u4f7f\u7528\u771f\u5b9e\u52a9\u624b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u5411\u7528\u6237\u6a21\u578b\u63d0\u4f9b\u66f4\u591a\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u53ef\u4ee5\u90e8\u5206\u62b5\u6d88\u6b3a\u9a97\u6a21\u578b\u7684\u5f71\u54cd\u3002\u8fd9\u9879\u7814\u7a76\u63ed\u793a\u4e86LLMs\u751f\u6210\u8bef\u5bfc\u6027\u4fe1\u606f\u7684\u80fd\u529b\u53ca\u5176\u5728\u73b0\u5b9e\u573a\u666f\u4e2d\u7684\u6f5c\u5728\u5f71\u54cd\u3002|\n", "2407.12735": "|**2024-07-17**|**EchoSight: Advancing Visual-Language Models with Wiki Knowledge**|Yibin Yan et.al.|[2407.12735](http://arxiv.org/abs/2407.12735)|null|**\u6458\u8981\uff1a** 
\u77e5\u8bc6\u9a71\u52a8\u7684\u89c6\u89c9\u95ee\u7b54\uff08KVQA\uff09\u4efb\u52a1\u8981\u6c42\u5229\u7528\u4e30\u5bcc\u80cc\u666f\u77e5\u8bc6\u89e3\u7b54\u56fe\u50cf\u76f8\u5173\u95ee\u9898\uff0c\u4f46\u751f\u6210\u6a21\u578b\u5728\u8fd9\u65b9\u9762\u5e38\u9762\u4e34\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faEchoSight\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u591a\u6a21\u6001\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval-Augmented Generation\uff0cRAG\uff09\u6846\u67b6\uff0c\u65e8\u5728\u5e2e\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u9700\u8981\u8be6\u5c3d\u767e\u79d1\u77e5\u8bc6\u7684\u89c6\u89c9\u95ee\u7b54\u3002EchoSight\u9996\u5148\u4ec5\u4f7f\u7528\u56fe\u50cf\u4fe1\u606f\u5728\u7ef4\u57fa\u767e\u79d1\u4e2d\u641c\u7d22\u6587\u7ae0\uff0c\u7136\u540e\u5bf9\u5019\u9009\u6587\u7ae0\u6839\u636e\u5b83\u4eec\u4e0e\u6587\u672c-\u56fe\u50cf\u67e5\u8be2\u7684\u76f8\u5173\u6027\u8fdb\u884c\u4e8c\u6b21\u6392\u5e8f\uff0c\u4ece\u800c\u663e\u8457\u63d0\u5347\u591a\u6a21\u6001\u77e5\u8bc6\u7684\u6574\u5408\uff0c\u8fdb\u800c\u63d0\u9ad8\u68c0\u7d22\u6548\u679c\u548c\u7b54\u6848\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u5728Encyclopedic VQA\u548cInfoSeek\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEchoSight\u5728\u77e5\u8bc6\u578b\u89c6\u89c9\u95ee\u7b54\u4e2d\u5b9e\u73b0\u4e86\u65b0\u7684state-of-the-art\u6210\u7ee9\uff0cEncyclopedic VQA\u4efb\u52a1\u4e0a\u8fbe\u523041.8%\u7684\u51c6\u786e\u7387\uff0cInfoSeek\u4efb\u52a1\u4e0a\u8fbe\u523031.3%\u3002|\n", "2407.12727": "|**2024-07-17**|**NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model**|Zhongqun Zhang et.al.|[2407.12727](http://arxiv.org/abs/2407.12727)|null|### \u80cc\u666f 
\u5728\u4e09\u7ef4\u624b\u90e8-\u7269\u4f53\u91cd\u5efa\u4e2d\uff0c\u7cbe\u786e\u7684\u624b\u90e8\u4e0e\u7269\u4f53\u4e4b\u95f4\u7684\u7269\u7406\u63a5\u89e6\u662f\u63d0\u5347\u624b\u90e8\u59ff\u6001\u4f30\u8ba1\u51c6\u786e\u6027\u548c\u751f\u6210\u65b0\u7684\u4eba\u7c7b\u6293\u63e1\u52a8\u4f5c\u7684\u6807\u51c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u96be\u4ee5\u5b9a\u4e49\u6216\u63a7\u5236\u7684\u51e0\u4f55\u7ea6\u675f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u4efb\u52a1\uff1a\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u8fdb\u884c\u53ef\u63a7\u7684\u4e09\u7ef4\u624b\u90e8-\u7269\u4f53\u63a5\u89e6\u5efa\u6a21\u3002\u9762\u4e34\u7684\u6311\u6218\u5305\u62ec\uff1a\u4e00\u3001\u4ece\u8bed\u8a00\u5230\u63a5\u89e6\u7684\u590d\u6742\u8de8\u6a21\u6001\u5efa\u6a21\uff1b\u4e8c\u3001\u7f3a\u4e4f\u63cf\u8ff0\u63a5\u89e6\u6a21\u5f0f\u7684\u6587\u672c\u6570\u636e\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86NL2Contact\u6a21\u578b\uff0c\u5b83\u5229\u7528\u5206\u6bb5\u6269\u6563\u6a21\u578b\u751f\u6210\u53ef\u63a7\u5236\u7684\u63a5\u89e6\u3002\u7ed9\u5b9a\u5bf9\u624b\u548c\u63a5\u89e6\u7684\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\uff0cNL2Contact\u80fd\u591f\u751f\u6210\u903c\u771f\u4e14\u5fe0\u5b9e\u7684\u4e09\u7ef4\u624b\u90e8-\u7269\u4f53\u63a5\u89e6\u3002 ### \u4efb\u52a1 
\u6211\u4eec\u5f00\u53d1\u4e86NL2Contact\u6a21\u578b\uff0c\u65e8\u5728\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u751f\u6210\u5177\u6709\u63a7\u5236\u6027\u7684\u4e09\u7ef4\u624b\u90e8-\u7269\u4f53\u63a5\u89e6\u3002\u4e3a\u8bad\u7ec3\u8fd9\u4e2a\u6a21\u578b\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u9996\u4e2a\u540d\u4e3a\\textit{ContactDescribe}\u7684\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u57fa\u4e8e\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\uff08\u5982\u6293\u53d6\u52a8\u4f5c\u3001\u6293\u53d6\u7c7b\u578b\u3001\u63a5\u89e6\u4f4d\u7f6e\u548c\u81ea\u7531\u624b\u6307\u72b6\u6001\uff09\u751f\u6210\u7684\u4e30\u5bcc\u591a\u6837\u7684\u624b\u90e8\u4e2d\u5fc3\u63a5\u89e6\u63cf\u8ff0\u3002\u6211\u4eec\u7684\u6a21\u578b\u5728\u4f18\u5316\u6293\u63e1\u59ff\u52bf\u548c\u57fa\u4e8e\u6587\u672c\u63cf\u8ff0\u751f\u6210\u65b0\u7684\u4eba\u7c7b\u6293\u63e1\u52a8\u4f5c\u65b9\u9762\u5c55\u793a\u4e86\u5e94\u7528\u6f5c\u529b\u3002|\n", "2407.12725": "|**2024-07-17**|**Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?**|Ben Yao 
et.al.|[2407.12725](http://arxiv.org/abs/2407.12725)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u89e3\u51b3\u590d\u6742\u95ee\u9898\u7684\u80fd\u529b\u65b9\u9762\uff0c\u901a\u8fc7\u9010\u6b65\u63a8\u7406\u6b65\u9aa4\u7684\u6269\u5c55\u663e\u8457\u63d0\u5347\u5176\u6027\u80fd\uff0c\u56e0\u4e3a\u8fd9\u4fc3\u4f7f\u6a21\u578b\u8fdb\u884c\u5e8f\u5217\u601d\u8003\u3002\u7136\u800c\uff0c\u4eba\u7c7b\u5bf9\u8bbd\u523a\u7684\u7406\u89e3\u901a\u5e38\u88ab\u89c6\u4e3a\u4e00\u79cd\u76f4\u89c9\u4e14\u6574\u4f53\u7684\u8ba4\u77e5\u8fc7\u7a0b\uff0c\u5b83\u6574\u5408\u4e86\u8bed\u8a00\u3001\u4e0a\u4e0b\u6587\u548c\u60c5\u611f\u7ebf\u7d22\uff0c\u5f62\u6210\u5bf9\u8bf4\u8bdd\u8005\u771f\u5b9e\u610f\u56fe\u7684\u5168\u9762\u7406\u89e3\uff0c\u8fd9\u79cd\u7406\u89e3\u88ab\u8ba4\u4e3a\u4e0d\u5c40\u9650\u4e8e\u4e00\u6b65\u6b65\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u4e3a\u4e86\u9a8c\u8bc1\u8fd9\u4e00\u89c2\u70b9\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u63d0\u793a\u6846\u67b6\uff0c\u79f0\u4e3aSarcasmCue\uff0c\u5b83\u5305\u542b\u4e86\u56db\u79cd\u63d0\u793a\u7b56\u7565\uff1a\u8fde\u9501\u77db\u76fe\uff08CoC\uff09\u3001\u7ebf\u7d22\u56fe\uff08GoC\uff09\u3001\u7ebf\u7d22\u96c6\u5408\uff08BoC\uff09\u548c\u7ebf\u7d22\u5f20\u91cf\uff08ToC\uff09\u3002\u8fd9\u4e9b\u65b9\u6cd5\u65e8\u5728\u5f15\u5bfcLLMs\u901a\u8fc7\u8003\u8651\u987a\u5e8f\u548c\u975e\u987a\u5e8f\u63d0\u793a\u6765\u8bc6\u522b\u4eba\u7c7b\u7684\u8bbd\u523a\u3002\u6211\u4eec\u5728\u56db\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5168\u9762\u5b9e\u8bc1\u6bd4\u8f83\u8868\u660e\uff0c\u6211\u4eec\u7684\u56db\u79cd\u63d0\u793a\u65b9\u6cd5\u660e\u663e\u4f18\u4e8e\u6807\u51c6\u7684\u8f93\u5165-\u8f93\u51fa\u63d0\u793a\u3001CoT\u548cToT\uff0c\u800c\u4e14\u975e\u987a\u5e8f\u63d0\u793a\u901a\u5e38\u4f18\u4e8e\u987a\u5e8f\u63d0\u793a\u3002|\n", "2407.12723": "|**2024-07-17**|**The Future of Learning: Large Language Models through the Lens of Students**|He Zhang 
et.al.|[2407.12723](http://arxiv.org/abs/2407.12723)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u6027\u80fd\u4e0a\u7684\u63d0\u5347\u548c\u529f\u80fd\u6269\u5c55\u5bf9\u6559\u80b2\u9886\u57df\u4ea7\u751f\u4e86\u663e\u8457\u5f71\u54cd\u3002\u672c\u7814\u7a76\u901a\u8fc7\u8bbf\u8c0814\u540d\u5b66\u751f\uff0c\u63a2\u8ba8\u4ed6\u4eec\u65e5\u5e38\u4e0eChatGPT\u7684\u4e92\u52a8\u3002\u521d\u6b65\u7ed3\u679c\u663e\u793a\uff0c\u5b66\u751f\u4eec\u5728\u4eab\u53d7ChatGPT\u63d0\u9ad8\u5b66\u4e60\u6548\u7387\u548c\u4fe1\u606f\u83b7\u53d6\u4fbf\u5229\u7684\u540c\u65f6\uff0c\u4e5f\u9762\u4e34\u7740\u4fe1\u4efb\u5371\u673a\u548c\u4f26\u7406\u987e\u8651\u3002\u4ed6\u4eec\u8ba4\u4e3aChatGPT\u76f8\u8f83\u4e8e\u4f20\u7edfAI\u66f4\u663e\u201c\u4eba\u6027\u5316\u201d\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u77db\u76fe\u60c5\u7eea\u3001\u884c\u4e3a\u4e0d\u4e00\u81f4\u4ee5\u53ca\u5bf9\u5b66\u751f\u6574\u4f53\u4e0a\u79ef\u6781\u7684\u6001\u5ea6\uff0c\u51f8\u663e\u4e86ChatGPT\u5728\u6559\u80b2\u9886\u57df\u7684\u6f5c\u5728\u4ef7\u503c\u3002\u4f46\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5c3d\u7ba1\u5176\u667a\u80fd\u7a0b\u5ea6\u9ad8\uff0c\u53ef\u80fd\u5e26\u6765\u8d1f\u9762\u6548\u5e94\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f3a\u8c03\u5728\u5e94\u7528\u65f6\u9700\u8c28\u614e\uff0c\u5e76\u81f4\u529b\u4e8e\u5728\u672a\u6765\u7684\u5f00\u53d1\u4e2d\u51cf\u5c11\u6f5c\u5728\u7684\u5371\u5bb3\u3002|\n", "2407.12709": "|**2024-07-17**|**MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models**|Leyang Shen 
et.al.|[2407.12709](http://arxiv.org/abs/2407.12709)|**[link](https://github.com/jiutian-vl/mome)**|**\u5728\u591a\u9879\u89c6\u89c9-\u8bed\u8a00\u4efb\u52a1\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u901a\u7528\u7684MLLM\u5728\u5927\u591a\u6570VL\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u4e0d\u5982\u4e13\u95e8\u5316\u7684MLLM\uff0c\u8fd9\u662f\u56e0\u4e3a\u5b58\u5728\u4efb\u52a1\u5e72\u6270\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u6df7\u5408\u591a\u6a21\u6001\u4e13\u5bb6\uff08MoME\uff09\u67b6\u6784\uff0c\u65e8\u5728\u51cf\u8f7b\u4efb\u52a1\u5e72\u6270\uff0c\u4ece\u800c\u83b7\u5f97\u4e00\u4e2a\u5168\u80fd\u7684MLLM\u3002MoME\u4e3b\u8981\u7531\u4e24\u4e2a\u5173\u952e\u7ec4\u4ef6\u6784\u6210\uff1a\u89c6\u89c9\u4e13\u5bb6\u6df7\u5408\u4f53\uff08MoVE\uff09\u548c\u8bed\u8a00\u4e13\u5bb6\u6df7\u5408\u4f53\uff08MoLE\uff09\u3002MoVE\u80fd\u591f\u81ea\u9002\u5e94\u5730\u8c03\u6574\u6765\u81ea\u4e0d\u540c\u89c6\u89c9\u7f16\u7801\u5668\u7684\u7279\u5f81\uff0c\u5e76\u5728\u8f6c\u6362\u67b6\u6784\u4e0a\u5177\u6709\u826f\u597d\u7684\u517c\u5bb9\u6027\u3002MoLE\u901a\u8fc7\u7a00\u758f\u95e8\u63a7\u4e13\u5bb6\u878d\u5165\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\uff0c\u5b9e\u73b0\u4e86\u51e0\u4e4e\u65e0\u989d\u5916\u6210\u672c\u7684\u6027\u80fd\u63d0\u5347\u3002\u4e3a\u4e86\u5e94\u5bf9\u4efb\u52a1\u5e72\u6270\uff0cMoME\u4e13\u6ce8\u4e8e\u89c6\u89c9\u548c\u8bed\u8a00\u4e24\u79cd\u6a21\u6001\uff0c\u4ee5\u9002\u5e94\u4efb\u52a1\u95f4\u7684\u5dee\u5f02\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMoME\u663e\u8457\u63d0\u9ad8\u4e86\u901a\u7528MLLM\u5728\u5404\u79cdVL\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6e90\u4ee3\u7801\u5df2\u5728https://github.com/JiuTian-VL/MoME\u4e0a\u53d1\u5e03\u3002**|\n", "2407.12665": "|**2024-07-17**|**Patch-Level Training for Large 
Language Models**|Chenze Shao et.al.|[2407.12665](http://arxiv.org/abs/2407.12665)|**[link](https://github.com/shaochenze/patchtrain)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u65b9\u9762\u53d6\u5f97\u663e\u8457\u8fdb\u6b65\uff0c\u5176\u8bad\u7ec3\u6548\u7387\u6210\u4e3a\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u3002\u4f20\u7edf\u4e0a\uff0cLLMs\u901a\u8fc7\u9884\u6d4b\u5e8f\u5217\u4e2d\u7684\u4e0b\u4e00\u4e2a\u4ee4\u724c\u8fdb\u884c\u8bad\u7ec3\u3002\u5c3d\u7ba1\u57fa\u4e8e\u4ee4\u724c\u7684\u8bad\u7ec3\u65b9\u6cd5\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5176\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\uff0c\u56e0\u4e3a\u9700\u8981\u5904\u7406\u5927\u91cf\u4ee4\u724c\u3002\u4e3a\u6b64\uff0c\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cpatch-level training\u201d\u7684\u65b9\u6cd5\uff0c\u5b83\u901a\u8fc7\u5c06\u591a\u4e2a\u4ee4\u724c\u538b\u7f29\u6210\u5355\u4e2apatch\u6765\u7f29\u77ed\u5e8f\u5217\u957f\u5ea6\u3002\u5728patch-level\u8bad\u7ec3\u4e2d\uff0c\u6211\u4eec\u8f93\u5165\u66f4\u77ed\u7684patch\u5e8f\u5217\uff0c\u8ba9\u6a21\u578b\u5b66\u4e60\u9884\u6d4b\u4e0b\u4e00\u4e2apatch\uff0c\u4ece\u800c\u5927\u5e45\u5ea6\u51cf\u5c11\u4e86\u5927\u90e8\u5206\u8bad\u7ec3\u6570\u636e\u7684\u5904\u7406\u6210\u672c\u3002\u63a5\u7740\uff0c\u6a21\u578b\u4f1a\u8fdb\u884c\u5269\u4f59\u8bad\u7ec3\u6570\u636e\u7684\u4ee4\u724c\u7ea7\u8bad\u7ec3\uff0c\u4ee5\u9002\u5e94\u63a8\u7406\u6a21\u5f0f\u3002\u5b9e\u9a8c\u5728\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff08370M-2.7B\u53c2\u6570\uff09\u4e0a\u8fdb\u884c\uff0c\u7ed3\u679c\u8868\u660epatch-level\u8bad\u7ec3\u53ef\u4ee5\u5c06\u603b\u4f53\u8ba1\u7b97\u6210\u672c\u964d\u4f4e\u81f30.5\u500d\uff0c\u540c\u65f6\u4e0d\u4f1a\u5f71\u54cd\u6a21\u578b\u6027\u80fd\u3002\u6e90\u4ee3\u7801\u53ef\u5728\u6b64\u83b7\u53d6\uff1a\\url{https://github.com/shaochenze/PatchTrain}\u3002**|\n", "2407.12642": "|**2024-07-17**|**Zero-shot Text-guided Infinite 
Image Synthesis with LLM guidance**|Soyeong Kwon et.al.|[2407.12642](http://arxiv.org/abs/2407.12642)|null|**\u80cc\u666f\uff1a** \u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u7f16\u8f91\u548c\u751f\u6210\u65b9\u6cd5\u5728\u73b0\u5b9e\u4e16\u754c\u4e2d\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u7136\u800c\uff0c\u6587\u672c\u5f15\u5bfc\u7684\u65e0\u9650\u56fe\u50cf\u5408\u6210\u9762\u4e34\u7740\u4e00\u4e9b\u6311\u6218\u3002\u9996\u5148\uff0c\u7f3a\u4e4f\u9ad8\u5206\u8fa8\u7387\u4e14\u5177\u6709\u4e30\u5bcc\u60c5\u5883\u591a\u6837\u6027\u7684\u6587\u672c-\u56fe\u50cf\u914d\u5bf9\u6570\u636e\u96c6\u3002\u5176\u6b21\uff0c\u6839\u636e\u6587\u672c\u6269\u5c55\u56fe\u50cf\u9700\u8981\u5168\u5c40\u8fde\u8d2f\u6027\u548c\u4e30\u5bcc\u7684\u5c40\u90e8\u4e0a\u4e0b\u6587\u7406\u89e3\u80fd\u529b\u3002\u4ee5\u5f80\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6709\u9650\u7c7b\u522b\uff0c\u5982\u81ea\u7136\u98ce\u666f\uff0c\u4e14\u9700\u8981\u5728\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u53ca\u5176\u914d\u6587\u4e0a\u8fdb\u884c\u8bad\u7ec3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u540c\u65f6\u5904\u7406\u5168\u5c40\u8fde\u8d2f\u6027\u548c\u5c40\u90e8\u4e0a\u4e0b\u6587\u7406\u89e3\uff0c\u65e0\u9700\u4efb\u4f55\u9ad8\u5206\u8fa8\u7387\u7684\u6587\u672c-\u56fe\u50cf\u914d\u5bf9\u8bad\u7ec3\u6570\u636e\u3002 **\u65b9\u6cd5\uff1a** 
\u6211\u4eec\u5728\u8bad\u7ec3\u6269\u6563\u6a21\u578b\u65f6\uff0c\u8ba9\u5b83\u6839\u636eLLM\u751f\u6210\u7684\u5168\u5c40\u548c\u5c40\u90e8\u63cf\u8ff0\u4ee5\u53ca\u89c6\u89c9\u7279\u5f81\u6765\u6269\u5c55\u56fe\u50cf\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u7ed9\u5b9a\u4e00\u5f20\u56fe\u7247\u548c\u4e00\u4e2a\u5168\u5c40\u63cf\u8ff0\uff0c\u6211\u4eec\u4f7f\u7528LLM\u751f\u6210\u4e0b\u4e00\u4e2a\u5c40\u90e8\u63cf\u8ff0\u6765\u6269\u5c55\u8f93\u5165\u56fe\u50cf\u3002\u7136\u540e\uff0c\u6211\u4eec\u7ed3\u5408\u5168\u5c40\u63cf\u8ff0\u3001\u751f\u6210\u7684\u5c40\u90e8\u63cf\u8ff0\u548c\u89c6\u89c9\u7279\u5f81\u6765\u6269\u5c55\u56fe\u50cf\uff0c\u4ee5\u786e\u4fdd\u5168\u5c40\u4e00\u81f4\u6027\u5e76\u8003\u8651\u7a7a\u95f4\u5c40\u90e8\u4e0a\u4e0b\u6587\u3002 **\u5b9e\u9a8c\u7ed3\u679c\uff1a** \u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u5b9a\u91cf\u548c\u5b9a\u6027\u4e0a\u90fd\u4f18\u4e8e\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5c55\u793a\u4e86\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\uff0c\u501f\u52a9LLM\u5f15\u5bfc\u8fdb\u884c\u6587\u672c\u5f15\u5bfc\u4efb\u610f\u5927\u5c0f\u56fe\u50cf\u751f\u6210\u7684\u80fd\u529b\u3002 \u603b\u7ed3\uff1a \u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u6269\u5c55\u65b9\u6cd5\uff0c\u65e0\u9700\u4f9d\u8d56\u9ad8\u5206\u8fa8\u7387\u7684\u914d\u5bf9\u6570\u636e\uff0c\u80fd\u591f\u5b9e\u73b0\u5168\u5c40\u8fde\u8d2f\u6027\u548c\u5c40\u90e8\u4e0a\u4e0b\u6587\u7406\u89e3\uff0c\u5e76\u5728\u5b9e\u9a8c\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u652f\u6301\u96f6\u6837\u672c\u4efb\u610f\u5927\u5c0f\u56fe\u50cf\u751f\u6210\u3002|\n", "2407.12620": "|**2024-07-17**|**Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences**|Claudio Pinhanez 
et.al.|[2407.12620](http://arxiv.org/abs/2407.12620)|null|\u81ea2022\u5e74\u4ee5\u6765\uff0c\u6211\u4eec\u4e00\u76f4\u5728\u63a2\u7d22\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u548c\u73b0\u4ee3\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e94\u7528\u9886\u57df\uff0c\u4ee5\u652f\u6301\u548c\u4fc3\u8fdb\u6fd2\u4e34\u6d88\u5931\u7684\u571f\u8457\u8bed\u8a00\u7684\u4f7f\u7528\u4e0e\u6587\u6863\u5316\u3002\u9996\u5148\uff0c\u6211\u4eec\u5173\u6ce8\u4e16\u754c\u8bed\u8a00\u591a\u6837\u6027\u7684\u51cf\u5c11\uff0c\u5e76\u8ba8\u8bba\u4e0e\u5904\u7406\u571f\u8457\u8bed\u8a00\u76f8\u5173\u7684\u72ec\u7279\u4f26\u7406\u6311\u6218\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u793e\u533a\u53c2\u4e0e\u548c\u4f7f\u7528\u7684AI\u5f00\u53d1\u65b0\u5faa\u73af\u3002\u63a5\u7740\uff0c\u6211\u4eec\u62a5\u544a\u4e86\u4f7f\u7528\u5c11\u91cf\u6570\u636e\u5fae\u8c03\u6700\u5148\u8fdb\u7684\u7ffb\u8bd1\u5668\uff0c\u6210\u529f\u5f00\u53d1\u51fa\u9ad8\u8d28\u91cf\u7684\u571f\u8457\u8bed\u8a00\u673a\u5668\u7ffb\u8bd1\u7684\u9f13\u821e\u4eba\u5fc3\u7684\u6210\u679c\uff0c\u5e76\u8ba8\u8bba\u4e86\u907f\u514d\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u7684\u4e00\u4e9b\u5e38\u89c1\u9677\u9631\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e862023\u5e74\u548c2024\u5e74\u5728\u5df4\u897f\u4e0e\u571f\u8457\u793e\u533a\u5408\u4f5c\u9879\u76ee\u4e2d\u7684\u539f\u578b\uff0c\u76ee\u6807\u662f\u7b80\u5316\u5199\u4f5c\uff0c\u4ee5\u53ca\u53d1\u5c55\u571f\u8457\u8bed\u8a00\u6a21\u578b\uff08ILMs\uff09\u4f5c\u4e3a\u521b\u5efa\u62fc\u5199\u68c0\u67e5\u5668\u3001\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u5668\u7b49\u5de5\u5177\u7684\u53ef\u590d\u5236\u548c\u53ef\u6269\u5c55\u65b9\u6cd5\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u671b\u4e00\u4e2a\u672a\u6765\uff0c\u6fd2\u5371\u7684\u8bed\u8a00\u5c06\u901a\u8fc7\u4e92\u52a8\u7684\u8bed\u8a00\u6a21\u578b\u5f97\u4ee5\u4fdd\u5b58\u3002|\n", 
"2407.12613": "|**2024-07-17**|**AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism**|William Brannon et.al.|[2407.12613](http://arxiv.org/abs/2407.12613)|**[link](https://github.com/mit-ccc/AudienceView-demo)**|**\u8bb0\u8005\u7406\u89e3\u548c\u5229\u7528\u53d7\u4f17\u53cd\u9988\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5982\u4eca\u4ed6\u4eec\u5728\u7ebf\u9762\u4e34\u5927\u91cf\u89c2\u4f17\u8bc4\u8bba\uff0c\u8fd9\u662f\u4e00\u9879\u8270\u5de8\u7684\u4efb\u52a1\u3002\u6211\u4eec\u63a8\u51fa\u4e86AudienceView\uff0c\u4e00\u4e2a\u5728\u7ebf\u5de5\u5177\uff0c\u65e8\u5728\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e2e\u52a9\u8bb0\u8005\u5bf9\u8fd9\u4e9b\u53cd\u9988\u8fdb\u884c\u5206\u7c7b\u548c\u89e3\u8bfb\u3002AudienceView\u8bc6\u522b\u4e3b\u9898\u548c\u8bdd\u9898\uff0c\u5c06\u5b83\u4eec\u4e0e\u7279\u5b9a\u8bc4\u8bba\u5173\u8054\uff0c\u5c55\u793a\u8bc4\u8bba\u7684\u60c5\u611f\u503e\u5411\u548c\u5206\u5e03\uff0c\u5e76\u534f\u52a9\u7528\u6237\u6784\u601d\u540e\u7eed\u62a5\u9053\u9879\u76ee\u3002\u6211\u4eec\u5c06\u63a2\u8ba8\u8fd9\u7c7b\u5de5\u5177\u5982\u4f55\u878d\u5165\u8bb0\u8005\u7684\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5e76\u5f3a\u8c03\u60c5\u5883\u7406\u89e3\u53ca\u4eba\u7c7b\u5224\u65ad\u7684\u91cd\u8981\u6027\u3002**|\n", "2407.12580": "|**2024-07-17**|**E5-V: Universal Embeddings with Multimodal Large Language Models**|Ting Jiang et.al.|[2407.12580](http://arxiv.org/abs/2407.12580)|**[link](https://github.com/kongds/e5-v)**|**### \u80cc\u666f 
\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u901a\u7528\u89c6\u89c9\u548c\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5982\u4f55\u5229\u7528MLLMs\u5904\u7406\u591a\u6a21\u6001\u4fe1\u606f\u7684\u8868\u793a\u65b9\u5f0f\u5c1a\u672a\u5145\u5206\u7814\u7a76\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6846\u67b6E5-V\uff0c\u65e8\u5728\u4f7fMLLMs\u9002\u5e94\u5b9e\u73b0\u901a\u7528\u591a\u6a21\u6001\u5d4c\u5165\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u65b9\u6cd5\u76f8\u6bd4\uff0cMLLMs\u5728\u5904\u7406\u591a\u6a21\u6001\u8f93\u5165\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u901a\u8fc7\u7ed3\u5408\u63d0\u793a\uff0cE5-V\u6709\u6548\u5730\u5f25\u5408\u4e86\u4e0d\u540c\u7c7b\u578b\u8f93\u5165\u4e4b\u95f4\u7684\u6a21\u6001\u9e3f\u6c9f\uff0c\u5373\u4f7f\u5728\u65e0\u9700\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u4e5f\u80fd\u8868\u73b0\u51fa\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5d4c\u5165\u80fd\u529b\u3002 ### \u65b9\u6cd5 E5-V\u91c7\u7528\u5355\u4e00\u6a21\u6001\u8bad\u7ec3\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u6587\u672c\u5bf9\u8fdb\u884c\u8bad\u7ec3\uff0c\u8fd9\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u57fa\u4e8e\u56fe\u50cf-\u6587\u672c\u5bf9\u7684\u591a\u6a21\u6001\u8bad\u7ec3\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6027\u80fd\uff0c\u540c\u65f6\u964d\u4f4e\u4e86\u5927\u7ea695%\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u907f\u514d\u4e86\u6536\u96c6\u6602\u8d35\u7684\u591a\u6a21\u6001\u8bad\u7ec3\u6570\u636e\u7684\u9700\u6c42\u3002\u5b9e\u9a8c\u5728\u56db\u79cd\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u9a8c\u8bc1\uff0c\u4ee5\u5c55\u793aE5-V\u7684\u6709\u6548\u6027\u3002 ### \u7ed3\u679c 
\u4f5c\u4e3a\u4e00\u6b3e\u901a\u7528\u591a\u6a21\u6001\u6a21\u578b\uff0cE5-V\u4e0d\u4ec5\u5728\u5404\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e86\u9876\u5c16\u6027\u80fd\uff0c\u751a\u81f3\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8d85\u8d8a\u4e86\u73b0\u6709\u6280\u672f\u6c34\u5e73\uff0c\u6240\u6709\u8fd9\u4e9b\u90fd\u662f\u57fa\u4e8e\u5355\u6a21\u6001\u8bad\u7ec3\u5b8c\u6210\u7684\u3002**|\n", "2407.13761": "|**2024-07-18**|**SegPoint: Segment Any Point Cloud via Large Language Model**|Shuting He et.al.|[2407.13761](http://arxiv.org/abs/2407.13761)|null|\u5c3d\u7ba1\u4e09\u7ef4\u70b9\u4e91\u5206\u5272\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\uff0c\u4f9d\u8d56\u4e8e\u660e\u786e\u7684\u6307\u4ee4\u6765\u8bc6\u522b\u76ee\u6807\uff0c\u7f3a\u4e4f\u5728\u7edf\u4e00\u6846\u67b6\u4e2d\u7406\u89e3\u548c\u63a8\u65ad\u7528\u6237\u9690\u542b\u610f\u56fe\u7684\u80fd\u529b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSegPoint\u7684\u6a21\u578b\uff0c\u5b83\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5728\u591a\u79cd\u4efb\u52a1\u4e0a\u8fdb\u884c\u70b9\u7ea7\u5206\u5272\uff1a1\uff09\u4e09\u7ef4\u6307\u4ee4\u5206\u5272\uff0c2\uff09\u4e09\u7ef4\u6307\u79f0\u5206\u5272\uff0c3\uff09\u4e09\u7ef4\u8bed\u4e49\u5206\u5272\uff0c\u4ee5\u53ca4\uff09\u4e09\u7ef4\u5f00\u653e\u8bcd\u6c47\u8bed\u4e49\u5206\u5272\u3002\u4e3a\u4e86\u63a8\u52a8\u4e09\u7ef4\u6307\u4ee4\u7814\u7a76\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6Instruct3D\uff0c\u7528\u4e8e\u8bc4\u4f30\u4ece\u590d\u6742\u548c\u9690\u542b\u6307\u4ee4\u6587\u672c\u8fdb\u884c\u5206\u5272\u6027\u80fd\uff0c\u5305\u542b2,565\u4e2a\u70b9\u4e91-\u6307\u4ee4\u5bf9\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSegPoint\u5728ScanRefer\u6307\u79f0\u5206\u5272\u548cScanNet\u8bed\u4e49\u5206\u5272\u7b49\u65e2\u6709\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa
\u7ade\u4e89\u529b\uff0c\u540c\u65f6\u5728Instruct3D\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u4f18\u5f02\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cSegPoint\u662f\u9996\u4e2a\u5728\u4e00\u4e2a\u6846\u67b6\u5185\u5904\u7406\u8fd9\u4e9b\u591a\u6837\u5316\u7684\u5206\u5272\u4efb\u52a1\u5e76\u8fbe\u5230\u6ee1\u610f\u6027\u80fd\u7684\u6a21\u578b\u3002|\n", "2407.13757": "|**2024-07-18**|**Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models**|Zhuo Chen et.al.|[2407.13757](http://arxiv.org/abs/2407.13757)|null|## \u4efb\u52a1 \u672c\u7814\u7a76\u5173\u6ce8\u4e8eRetrieval-Augmented Generation\uff08RAG\uff09\u6a21\u578b\u5728\u9762\u5bf9\u9ed1\u76d2\u653b\u51fb\u65f6\u7684\u8106\u5f31\u6027\uff0c\u5c24\u5176\u662f\u5728\u610f\u89c1\u64cd\u7eb5\u65b9\u9762\u7684\u5e94\u7528\u3002RAG\u65e8\u5728\u89e3\u51b3\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5e7b\u89c9\u95ee\u9898\u548c\u5b9e\u65f6\u7ea6\u675f\uff0c\u4f46\u540c\u65f6\u4e5f\u66b4\u9732\u51fa\u5bf9\u6297\u68c0\u7d22\u7be1\u6539\u653b\u51fb\u7684\u5f31\u70b9\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u767d\u76d2\u548c\u5c01\u95ed\u9886\u57df\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684RAG\u4e0d\u7a33\u5b9a\u6027\u3002\u672c\u6587\u7684\u76ee\u6807\u662f\u63ed\u793a\u5f53RAG\u6a21\u578b\u906d\u9047\u9ed1\u76d2\u653b\u51fb\u65f6\uff0c\u5bf9\u7528\u6237\u8ba4\u77e5\u548c\u51b3\u7b56\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4e3a\u63d0\u9ad8\u6a21\u578b\u7684\u53ef\u9760\u6027\u548c\u5b89\u5168\u6027\u63d0\u4f9b\u65b0\u89c1\u89e3\u3002 
\u6211\u4eec\u901a\u8fc7\u64cd\u63a7RAG\u4e2d\u68c0\u7d22\u6a21\u578b\u7684\u6392\u540d\u7ed3\u679c\uff0c\u5229\u7528\u8fd9\u4e9b\u64cd\u7eb5\u540e\u7684\u6570\u636e\u8bad\u7ec3\u4e00\u4e2a\u4ee3\u7406\u6a21\u578b\u3002\u63a5\u7740\uff0c\u91c7\u7528\u5bf9\u6297\u6027\u68c0\u7d22\u653b\u51fb\u65b9\u6cd5\u9488\u5bf9\u4ee3\u7406\u6a21\u578b\u5b9e\u65bd\u9ed1\u76d2\u8fc1\u79fb\u653b\u51fb\uff0c\u8fdb\u4e00\u6b65\u5f71\u54cdRAG\u7684\u751f\u6210\u8fc7\u7a0b\u3002\u5728\u6d89\u53ca\u591a\u4e2a\u4e3b\u9898\u7684\u610f\u89c1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u63d0\u51fa\u7684\u653b\u51fb\u7b56\u7565\u80fd\u663e\u8457\u6539\u53d8RAG\u751f\u6210\u5185\u5bb9\u7684\u89c2\u70b9\u6781\u6027\uff0c\u8fd9\u63ed\u793a\u4e86\u6a21\u578b\u7684\u6613\u53d7\u653b\u51fb\u6027\uff0c\u5e76\u4e14\u6f5c\u5728\u5730\u6307\u51fa\u5bf9\u7528\u6237\u8ba4\u77e5\u548c\u51b3\u7b56\u7684\u8d1f\u9762\u5f71\u54cd\uff0c\u4f7f\u5f97\u8bef\u5bfc\u7528\u6237\u63a5\u53d7\u9519\u8bef\u6216\u6709\u504f\u89c1\u7684\u4fe1\u606f\u53d8\u5f97\u66f4\u52a0\u5bb9\u6613\u3002|\n", "2407.13742": "|**2024-07-18**|**CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications**|Mirza Masfiqur Rahman et.al.|[2407.13742](http://arxiv.org/abs/2407.13742)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u8702\u7a9d\u7f51\u7edc\u7684\u5b89\u5168\u6027\uff0c\u5e38\u5e38\u5c06\u5b89\u5168\u6f0f\u6d1e\u5f52\u548e\u4e8e\u5e95\u5c42\u534f\u8bae\u8bbe\u8ba1\u63cf\u8ff0\u7684\u95ee\u9898\u3002\u8fd9\u4e9b\u901a\u5e38\u957f\u8fbe\u6570\u5343\u9875\u7684\u8be6\u7ec6\u89c4\u683c\u6587\u6863\u53ef\u80fd\u5305\u542b\u9519\u8bef\u3001\u4e0d\u5b8c\u6574\u63cf\u8ff0\u3001\u9690\u542b\u5047\u8bbe\u548c\u5185\u90e8\u77db\u76fe\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51faCellularLint\u2014\u2014\u4e00\u4e2a\u9488\u5bf94G\u548c5G\u975e\u63a5\u5165\u5c42\uff08Non-Access 
Stratum\uff0cNAS\uff09\u548c\u5b89\u5168\u89c4\u8303\u7684\u534a\u81ea\u52a8\u6846\u67b6\uff0c\u5229\u7528\u4e00\u5957\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u9886\u57df\u9002\u5e94\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6539\u826f\u7684\u5c11\u91cf\u6837\u4f8b\u5b66\u4e60\u3002\u8be5\u6a21\u578b\u9884\u8bad\u7ec3\u5728\u5927\u91cf\u7684\u8702\u7a9d\u7f51\u7edc\u534f\u8bae\u6570\u636e\u4e0a\uff0c\u80fd\u591f\u540c\u65f6\u68c0\u6d4b\u4e0d\u540c\u8bed\u4e49\u5c42\u6b21\u548c\u5b9e\u9645\u4f7f\u7528\u6848\u4f8b\u4e2d\u7684\u4e0d\u4e00\u81f4\u6027\uff0c\u4ee5\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u65b9\u5f0f\u63d0\u5347\u534f\u8bae\u89c4\u683c\u7684\u81ea\u52a8\u5316\u5206\u6790\u3002\u901a\u8fc7\u7814\u7a76\uff0c\u6211\u4eec\u57284G\u548c5G\u7f51\u7edc\u4e2d\u53d1\u73b0\u4e86157\u4e2a\u4e0d\u4e00\u81f4\u70b9\uff0c\u51c6\u786e\u7387\u4e3a82.67%\u3002\u7ecf\u8fc7\u5bf9\u5f00\u6e90\u5b9e\u73b0\u548c17\u6b3e\u5546\u7528\u8bbe\u5907\u7684\u9a8c\u8bc1\uff0c\u6211\u4eec\u786e\u8ba4\u8fd9\u4e9b\u4e0d\u4e00\u81f4\u786e\u5b9e\u5bf9\u8bbe\u8ba1\u51b3\u7b56\u6709\u91cd\u5927\u5f71\u54cd\uff0c\u53ef\u80fd\u5bfc\u81f4\u9690\u79c1\u3001\u5b8c\u6574\u6027\u3001\u53ef\u7528\u6027\u548c\u4e92\u64cd\u4f5c\u6027\u65b9\u9762\u7684\u62c5\u5fe7\u3002|\n", "2407.13729": "|**2024-07-18**|**Baba Is AI: Break the Rules to Beat the Benchmark**|Nathan Cloos et.al.|[2407.13729](http://arxiv.org/abs/2407.13729)|null|\u4eba\u7c7b\u89e3\u51b3\u95ee\u9898\u65e2\u4f9d\u8d56\u4e8e\u9075\u5faa\u73b0\u6709\u89c4\u5219\u548c\u7a0b\u5e8f\uff0c\u4e5f\u4f9d\u8d56\u4e8e\u521b\u65b0\u601d\u7ef4\u6765\u91cd\u65b0\u5b9a\u4e49\u89c4\u5219\u548c\u76ee\u6807\u3002\u4e3a\u4e86\u68c0\u9a8c\u8fd9\u4e9b\u80fd\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u5b83\u57fa\u4e8e\u6e38\u620f\u300aBaba Is 
You\u300b. In this game, agents must manipulate both objects in the environment and movable text rule tiles to reach specific goals and win. We tested three state-of-the-art multimodal large language models (OpenAI GPT-4, Google Gemini-1.5-Pro, and Gemini-1.5-Flash) and found that their performance drops sharply when the game rules themselves must be manipulated and combined.|\n", "2407.13717": "|**2024-07-18**|**CoDefeater: Using LLMs To Find Defeaters in Assurance Cases**|Usman Gohar et.al.|[2407.13717](http://arxiv.org/abs/2407.13717)|**[link](https://gitlab.com/anonymousdot/codefeater)**|Constructing assurance cases is a common, and sometimes required, way to demonstrate that a safety-critical system will operate safely in its planned environment. To reduce the risk of errors and missed edge cases, the concept of \u201cdefeaters\u201d was introduced: arguments that challenge the claims or evidence in an assurance case. Defeaters help expose weaknesses in an argument early, prompting further investigation and timely mitigation. However, capturing defeaters depends on expert judgment, experience, and creativity, and must be done iteratively as requirements and regulations change. This paper proposes CoDefeater, an automated process that uses large language models (LLMs) to find defeaters. Initial results show that LLMs can effectively find both known and unknown plausible defeaters, helping safety analysts strengthen the completeness of, and confidence in, assurance cases.|\n", "2407.13709": "|**2024-07-18**|**Understanding Reference Policies in Direct Preference Optimization**|Yixin Liu et.al.|[2407.13709](http://arxiv.org/abs/2407.13709)|**[link](https://github.com/yale-nlp/refdpo)**|Direct Preference Optimization (DPO) has become a widely used training method for instruction fine-tuning of large language models (LLMs). This work studies an under-explored aspect of DPO: its dependence on the reference model or policy. Reference policies are typically instantiated as the model to be further fine-tuned, and they are critical to DPO's effectiveness. We therefore investigate three related questions. 1. First, we study the optimal strength of the KL-divergence constraint in DPO, which penalizes deviation from the reference policy, and find that DPO is sensitive to this strength. 2. Second, we compare DPO with other learning objectives, both theoretically and empirically, to examine whether reference policies are necessary for instruction fine-tuning, and show that DPO has an advantage. 3.
Finally, we examine whether a stronger reference policy benefits DPO; the results suggest that a stronger reference policy can improve performance when it is similar to the model being fine-tuned. Our findings highlight the confounding role of reference policies in DPO, offer insights into best practices, and raise open questions for future research.|\n", "2407.13699": "|**2024-07-18**|**A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice**|Shaina Raza et.al.|[2407.13699](http://arxiv.org/abs/2407.13699)|null|Recommender systems (RS) are essential to improving user experience by providing personalized item suggestions. This survey reviews progress in RS from 2017 to 2024, tying theoretical advances closely to practical applications. We cover the development from traditional methods, such as content-based and collaborative filtering, to advanced techniques including deep learning, graph-based models, reinforcement learning, and large language models. We also discuss specialized systems such as context-aware, review-based, and fairness-aware RS. The survey aims to bridge theory and practice, addressing challenges in sectors such as e-commerce, healthcare, and finance, and emphasizing the need for scalable, real-time, and trustworthy solutions. Through this survey, we encourage closer collaboration between academic research and industry practice. The insights offered are intended to help industry professionals optimize RS deployment and to inspire new directions for future research, especially in responding to emerging technologies and societal trends.|\n", "2407.13692": "|**2024-07-18**|**Prover-Verifier Games improve legibility of LLM outputs**|Jan Hendrik Kirchner et.al.|[2407.13692](http://arxiv.org/abs/2407.13692)|null|One way to increase confidence in the outputs of large language models (LLMs) is to support them with reasoning that is clear and easy to check, a property we call legibility. We study legibility in the context of solving grade-school math problems and find that optimizing chain-of-thought solutions only for answer correctness can make them less legible. To mitigate this loss, we propose a training algorithm inspired by the Prover-Verifier Game of Anil et al. (2021). The algorithm iteratively trains small verifiers to predict solution correctness, \"helpful\" provers to produce correct solutions that the verifier accepts, and \"sneaky\" provers to produce incorrect solutions that fool the verifier. Experiments show that the helpful prover's accuracy and the verifier's robustness to adversarial attacks both increase over the course of training. We further find that legibility training transfers to time-constrained humans who verify solution correctness: over the course of training, human accuracy increases when checking the helpful prover's solutions and decreases when checking the sneaky prover's solutions. Legibility training against small verifiers is therefore a practical way to make the outputs of large LLMs more legible to humans, which could help with the alignment of superhuman models.|\n", "2407.13648": "|**2024-07-18**|**COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization**|Skyler Grandel
et.al.|[2407.13648](http://arxiv.org/abs/2407.13648)|null|This paper examines the importance of code comprehension in software maintenance and how automatically generated comments can support it. The authors propose COMCAT, an approach that combines large language models (LLMs) with domain-expert guidance to annotate source code with comments that aid comprehension. The COMCAT pipeline automatically identifies suitable locations for comments in the code, predicts the most helpful type of comment for each location, and generates a comment based on the selected location and type. In a human-subject study, COMCAT-generated comments significantly improved developers' code comprehension on three representative software engineering tasks, by up to 12% for 87% of participants. The study also shows that COMCAT-generated comments are at least as accurate and readable as human-written comments, and that developers preferred them to standard ChatGPT-generated comments for 92% of code snippets. The paper further describes the development and public release of a dataset containing source code snippets, human-written comments, and human-annotated comment categories. Overall, COMCAT uses LLMs to significantly improve code comprehension across a variety of software engineering tasks.|\n", "2407.13647": "|**2024-07-18**|**Weak-to-Strong Reasoning**|Yuqing Yang et.al.|[2407.13647](http://arxiv.org/abs/2407.13647)|**[link](https://github.com/gair-nlp/weak-to-strong-reasoning)**|When large language models (LLMs) exceed human-level capabilities, providing them with full-scale and accurate supervision becomes difficult. In this setting, weak-to-strong learning, which uses a less capable model to unlock the latent abilities of a stronger model, proves valuable. However, the efficacy of this strategy on complex reasoning tasks has not been adequately tested, and effective methods are currently lacking to keep the strong model from blindly imitating the weak supervisor, including its errors. This paper introduces a progressive learning framework that lets the strong model autonomously refine its training data without relying on a more advanced model or human-annotated data. The framework first performs supervised fine-tuning on a small, selected set of high-quality data, then applies preference optimization on contrastive samples identified by the strong model itself. Extensive experiments on the GSM8K and MATH datasets show that the method significantly enhances the reasoning capabilities of Llama2-70b, as verified with three different weak models. The method is further validated in a forward-looking experimental setup in which Llama3-8b-instruct successfully supervises Llama3-70b on the highly challenging OlympicArena dataset. This work offers a more scalable and sophisticated strategy for improving AI reasoning. All related code and resources are publicly available.|\n", "2407.14507": "|**2024-07-19**|**Internal Consistency and Self-Feedback in Large Language Models: A Survey**|Xun Liang et.al.|[2407.14507](http://arxiv.org/abs/2407.14507)|**[link](https://github.com/iaar-shanghai/icsfsurvey)**|**This paper summarizes a theoretical framework, termed Internal Consistency, which offers a unified explanation for problems such as deficient reasoning and hallucinated content in large language models (LLMs). Internal Consistency assesses the coherence among an LLM's latent layer, decoding layer, and response layer, based on sampling methodologies. Building on this, we introduce the Self-Feedback framework, a streamlined yet effective theoretical framework for mining Internal Consistency. It consists of two modules: Self-Evaluation and Self-Update.
We systematically classify these studies by task and line of work; summarize the relevant evaluation methods and benchmarks; and examine the question \u201cDoes Self-Feedback really work?\u201d We put forward several key viewpoints, including the \u201cHourglass Evolution of Internal Consistency\u201d, the \u201cConsistency Is (Almost) Correctness\u201d hypothesis, and \u201cThe Paradox of Latent and Explicit Reasoning\u201d. We also outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data at [IAAR-Shanghai/ICSFSurvey](https://github.com/IAAR-Shanghai/ICSFSurvey).**|\n", "2407.14506": "|**2024-07-19**|**On Pre-training of Multimodal Language Models Customized for Chart Understanding**|Wan-Cyuan Fan
et.al.|[2407.14506](http://arxiv.org/abs/2407.14506)|null|Recent studies on customizing multimodal large language models (MLLMs) for domain-specific tasks have produced promising results, especially in scientific chart comprehension. These studies typically improve question-answering (QA) accuracy through visual instruction tuning on specialized datasets. However, they often neglect the fundamental discrepancy between natural image-caption pre-training data and digital chart image-QA data, particularly in a model's ability to extract the underlying numeric values from charts. This paper addresses that oversight by exploring the training processes needed to improve MLLMs' chart comprehension. We report three key findings: (1) incorporating raw data values during alignment pre-training markedly improves comprehension of chart data; (2) randomly replacing images with their textual representations during end-to-end fine-tuning transfers language reasoning capability to chart interpretation skills; (3) requiring the model to first extract the underlying chart data and then answer the question during fine-tuning further improves accuracy.
Accordingly, we introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension. CHOPINLLM effectively interprets various types of charts, including unannotated ones, while maintaining robust reasoning abilities. We also establish a new benchmark to evaluate MLLMs' understanding across different chart types and comprehension levels. Experimental results show that CHOPINLLM performs strongly on a wide range of chart types, both annotated and unannotated.|\n", "2407.14487": "|**2024-07-19**|**Evaluating the Reliability of Self-Explanations in Large Language Models**|Korbinian Randl et.al.|[2407.14487](http://arxiv.org/abs/2407.14487)|**[link](https://github.com/k-randl/self-explaining_llms)**|**This paper investigates the reliability of the explanations that large language models (LLMs) generate when prompted to explain their previous output. We evaluate two kinds of self-explanations, extractive and counterfactual, using three state-of-the-art LLMs (2B to 8B parameters) on two different classification tasks (objective and subjective). Our findings show that, although these self-explanations can correlate with human judgment, they do not fully and accurately follow the model's decision process, indicating a gap between perceived and actual model reasoning. We show that prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results. These counterfactuals offer a promising alternative to traditional explainability methods (e.g., SHAP, LIME), provided that prompts are tailored to the specific task and checked for validity.**|\n", "2407.14474": "|**2024-07-19**|**Contrastive Learning with Counterfactual Explanations for Radiology Report Generation**|Mingjie Li et.al.|[2407.14474](http://arxiv.org/abs/2407.14474)|null|Because anatomical content is largely shared, the corresponding radiology images are highly similar. This inherent data bias can lead automatic report generation models to learn entangled, spuriously correlated representations, resulting in misdiagnostic reports. To address these issues, we propose a novel \u201cCo\u201dunter\u201cF\u201dactual
\u201cE\u201dxplanations (CoFE) framework for radiology report generation. Counterfactual explanations are a powerful tool for understanding how an algorithm's decision can be changed by posing \u201cwhat if\u201d scenarios. Building on this concept, CoFE learns visual representations free of spurious correlations by contrasting the representations of positive and negative samples. Specifically, we derive counterfactual images by swapping patches between a positive and a negative sample until the predicted diagnosis changes. Here, the positive and negative samples are the most semantically similar pair that carry different diagnosis labels. In addition, CoFE uses learnable prompts to efficiently fine-tune a pre-trained large language model; the prompts encapsulate both factual and counterfactual content and provide a more generalizable prompt representation. Extensive experiments on two benchmarks show that counterfactual explanations enable CoFE to generate semantically coherent and factually complete reports, and to perform strongly on language generation and clinical efficacy metrics.|\n", "2407.14467": "|**2024-07-19**|**Check-Eval: A Checklist-based Approach for Evaluating Text Quality**|Jayr Pereira
et.al.|[2407.14467](http://arxiv.org/abs/2407.14467)|null|Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align with human judgment, particularly in tasks that require creativity and nuance. This paper proposes Check-Eval, a new evaluation framework that uses LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be used as either a reference-free or a reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results show that Check-Eval achieves higher correlation with human judgments than existing metrics such as G-Eval and GPTScore, underscoring its potential as a more reliable and effective evaluation framework for natural language generation tasks. The code for our experiments is available at https://anonymous.4open.science/r/check-eval-0DB4.|\n", "2407.14452": "|**2024-07-19**|**Undermining Mental Proof: How AI Can Make Cooperation Harder by Making
Thinking Easier**|Zachary Wojtowicz et.al.|[2407.14452](http://arxiv.org/abs/2407.14452)|null|Large language models and other highly capable AI systems ease the burden of deciding what to say or do, but this very ease can undermine the effectiveness of our actions in social contexts. We explain this apparent tension by introducing the integrative theoretical concept of \u201cmental proof\u201d, which occurs when observable actions are used to certify unobservable mental facts. From hiring to dating, mental proofs enable people to communicate values, intentions, states of knowledge, and other mental characteristics to one another in low-trust environments where honesty cannot easily be enforced. Drawing on results from economics, theoretical biology, and computer science, we describe the core theoretical mechanisms that enable people to effect mental proofs. An analysis of these mechanisms shows how artificial intelligence can make low-trust cooperation harder even as it makes thinking easier.
By understanding how mental proofs work and how they apply in different contexts, we can design AI systems that both promote efficient communication and preserve social cooperation. In hiring, for example, AI could indirectly assess a candidate's skills, teamwork, and fit with company culture by analyzing behavioral patterns and historical data, helping employers make more reliable selection decisions. In dating, AI could use social media activity, interests, and similar information to build a psychological profile of a user and help them find a partner whose values and lifestyle match their own. In short, by designing and applying AI technology sensibly, we can strengthen human communication and cooperation in low-trust environments and promote fairer, more transparent, and more efficient decision-making.|\n", "2407.14439": "|**2024-07-19**|**Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding**|Renshan Zhang et.al.|[2407.14439](http://arxiv.org/abs/2407.14439)|**[link](https://github.com/JiuTian-VL/TokenCorrCompressor)**|**Current mainstream Multimodal Large Language Models (MLLMs) for document understanding typically crop high-resolution document images into multiple sub-images. Most existing document understanding methods keep all tokens within the sub-images and treat them equally, which ignores their differing informativeness and leads to a large, unnecessary increase in the number of image tokens. To achieve more adaptive and efficient document understanding, we propose Token-level Correlation-guided Compression, a parameter-free and plug-and-play method for optimizing token processing. First, we introduce an innovative way to assess pattern repetitiveness based on the correlation between patch tokens; it identifies redundant tokens and thereby determines the information density of each sub-image. Second, we present a token-level sampling method that efficiently captures the most informative tokens by analyzing the correlation between the [CLS] token and the patch tokens. Combining these two strategies yields an adaptive compressor module that integrates seamlessly into MLLMs that use cropping techniques. The module substantially increases processing speed during both training and inference while maintaining comparable performance. We run experiments with the state-of-the-art document understanding model mPLUG-DocOwl1.5 and verify the method's effectiveness through extensive comparisons with other compression methods.**|\n", "2407.14402": "|**2024-07-19**|**The Vision of Autonomic Computing: Can LLMs Make It a Reality?**|Zhiyang Zhang et.al.|[2407.14402](http://arxiv.org/abs/2407.14402)|null|This paper revisits the Vision of Autonomic Computing (ACV), proposed more than two decades ago, which aims to build computing systems that manage themselves and adapt to changing environments, a goal that remains challenging today. Recent advances in large language models (LLMs) offer a possible way to address these challenges by drawing on their broad knowledge, language understanding, and task automation capabilities.
The paper explores the feasibility of achieving autonomy in microservice management through an LLM-based multi-agent framework, and proposes a five-level taxonomy describing the different levels of autonomous service maintenance. It also presents an online evaluation benchmark built on the \u201cSock Shop\u201d microservice demo project to assess the framework's performance. The results show that LLMs can substantially improve the detection and resolution of issues in microservice architectures and mark significant progress toward Level 3 autonomy. By integrating LLMs into microservice management frameworks, the work paves the way for more adaptive, self-managing computing systems. To foster research and development in this area, the code will be made publicly available.|\n", "2407.14371": "|**2024-07-19**|**Open Artificial Knowledge**|Vadim Borisov et.al.|[2407.14371](http://arxiv.org/abs/2407.14371)|null|
\u5f53\u524d\uff0c\u57fa\u4e8e\u5bf9\u8bdd\u7684AI\u7cfb\u7edf\u5982ChatGPT\u3001Claude\u548cGemini\u7684\u6210\u529f\uff0c\u4e3b\u8981\u5f97\u76ca\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bf9\u6d77\u91cf\u6570\u636e\u96c6\u7684\u8bad\u7ec3\u3002\u7136\u800c\uff0c\u83b7\u53d6\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u548c\u4f26\u7406\u6765\u6e90\u7684\u6570\u636e\u4ecd\u7136\u9762\u4e34\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5f00\u653e\u4eba\u5de5\u77e5\u8bc6\u201d\uff08OAK\uff09\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b\u8d85\u8fc75\u4ebf\u4e2a\u4ee4\u724c\uff08\u64b0\u5199\u65f6\uff09\u7684\u5927\u578b\u8d44\u6e90\u5e93\u3002OAK\u901a\u8fc7\u96c6\u5408\u5305\u62ecGPT4o\u3001LLaMa3-70B\u3001LLaMa3-8B\u3001Mixtral-8x7B\u3001Gemma-7B\u548cGemma-2-9B\u5728\u5185\u7684\u6700\u5148\u8fdb\u7684LLMs\uff0c\u5229\u7528\u7ef4\u57fa\u767e\u79d1\u7684\u4e3b\u8981\u7c7b\u522b\u6765\u5f15\u5bfc\u6587\u672c\u751f\u6210\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u9886\u57df\u8986\u76d6\uff0c\u540c\u65f6\u4fdd\u6301\u8fde\u8d2f\u6027\u548c\u4e8b\u5b9e\u51c6\u786e\u6027\u3002OAK\u6570\u636e\u96c6\u65e8\u5728\u4fc3\u8fdb\u66f4\u5f3a\u5927\u3001\u66f4\u5bf9\u9f50\u7684\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u5e76\u89e3\u51b3\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u95ee\u9898\uff0c\u5982\u6570\u636e\u7a00\u7f3a\u6027\u548c\u9690\u79c1\u95ee\u9898\u3002\u76ee\u524d\uff0c\u8be5\u6570\u636e\u96c6\u662f\u514d\u8d39\u63d0\u4f9b\u5728www.oakdataset.org\u3002|\n", "2407.14355": "|**2024-07-19**|**Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models**|Xuenan Xu 
et.al.|[2407.14355](http://arxiv.org/abs/2407.14355)|**[link](https://github.com/wsntxxn/attrenhzsac)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8fdb\u884c\u96f6\u6837\u672c\u97f3\u9891\u5206\u7c7b\uff0c\u5373\u8bc6\u522b\u548c\u5206\u7c7b\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ece\u672a\u89c1\u8fc7\u7684\u97f3\u9891\u7c7b\u522b\u3002\u6211\u4eec\u63d0\u8bae\u5217\u51fa\u4e00\u7cfb\u5217\u97f3\u9891\u5c5e\u6027\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u9886\u57df\u77e5\u8bc6\u4e3a\u6bcf\u4e2a\u7c7b\u522b\u751f\u6210\u8be6\u7ec6\u7684\u5c5e\u6027\u63cf\u8ff0\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u4f9d\u8d56\u7c7b\u522b\u6807\u7b7e\u6216\u7b80\u5355\u63cf\u8ff0\u7684\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u591a\u7ef4\u5ea6\u7684\u5185\u5728\u542c\u89c9\u5c5e\u6027\uff0c\u6355\u6349\u97f3\u9891\u7c7b\u522b\u7684\u4e0d\u540c\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\u6765\u589e\u5f3a\u57fa\u4e8e\u6587\u672c\u6807\u7b7e\u7684\u96f6\u6837\u672c\u5b66\u4e60\u3002\u6211\u4eec\u5728VGGSound\u548cAudioSet\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff08\u4ee3\u7801\u53ef\u8bbf\u95ee\uff1ahttps://www.github.com/wsntxxn/AttrEnhZsAc\uff09\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u96f6\u6837\u672c\u5206\u7c7b\u51c6\u786e\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u9ad8\u3002\u6d88\u878d\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u65e0\u8bba\u6a21\u578b\u67b6\u6784\u5982\u4f55\uff0c\u6027\u80fd\u589e\u5f3a\u90fd\u975e\u5e38\u7a33\u5065\u3002|\n", "2407.15850": "|**2024-07-22**|**AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description**|Junyu Xie 
et.al.|[2407.15850](http://arxiv.org/abs/2407.15850)|**[link](https://github.com/Jyxarthur/AutoAD-Zero)**|**\u6211\u4eec\u7684\u76ee\u6807\u662f\u65e0\u9700\u8bad\u7ec3\u5730\u751f\u6210\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u97f3\u9891\u63cf\u8ff0\uff08AD\uff09\u3002\u6211\u4eec\u5229\u7528\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5e76\u5f00\u53d1\u4e86\u89c6\u89c9\u548c\u6587\u672c\u63d0\u793a\u7b56\u7565\u6765\u5b8c\u6210\u8fd9\u9879\u4efb\u52a1\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u70b9\uff1a(i) \u6211\u4eec\u8bc1\u660e\uff0c\u5982\u679c\u901a\u8fc7\u89c6\u89c9\u6307\u793a\u76f4\u63a5\u63d0\u793aVLM\u63d0\u4f9b\u89d2\u8272\u4fe1\u606f\uff0cVLM\u53ef\u4ee5\u6210\u529f\u547d\u540d\u548c\u5f15\u7528\u89d2\u8272\uff0c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\uff1b(ii) \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8fc7\u7a0b\u6765\u751f\u6210AD\uff0c\u7b2c\u4e00\u9636\u6bb5\u8ba9VLM\u5168\u9762\u63cf\u8ff0\u89c6\u9891\uff0c\u7b2c\u4e8c\u9636\u6bb5\u4f7f\u7528LLM\u5c06\u5bc6\u96c6\u7684\u6587\u672c\u4fe1\u606f\u603b\u7ed3\u4e3a\u4e00\u4e2a\u7b80\u6d01\u7684AD\u53e5\u5b50\uff1b(iii) \u6211\u4eec\u5236\u5b9a\u4e86\u4e00\u4e2a\u65b0\u7684\u7535\u89c6\u97f3\u9891\u63cf\u8ff0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5AutoAD-Zero\u5728AD\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff08\u751a\u81f3\u4e0e\u4e00\u4e9b\u5728\u771f\u5b9eAD\u4e0a\u5fae\u8c03\u7684\u6a21\u578b\u76f8\u5339\u654c\uff09\uff0c\u5b9e\u73b0\u4e86\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u6700\u9ad8CRITIC\u8bc4\u5206\u3002**|\n", "2407.15847": "|**2024-07-22**|**LLMmap: Fingerprinting For Large Language Models**|Dario Pasquini 
et.al.|[2407.15847](http://arxiv.org/abs/2407.15847)|**[link](https://github.com/pasquini-dario/LLMmap)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u96c6\u6210\u5e94\u7528\u7684\u9996\u4ee3\u6307\u7eb9\u8bc6\u522b\u653b\u51fb\u5de5\u5177\u2014\u2014LLMmap\u3002\u8be5\u5de5\u5177\u91c7\u7528\u79ef\u6781\u7684\u6307\u7eb9\u8bc6\u522b\u7b56\u7565\uff0c\u901a\u8fc7\u5411\u5e94\u7528\u53d1\u9001\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u67e5\u8be2\u5e76\u5206\u6790\u54cd\u5e94\u4fe1\u606f\uff0c\u4ee5\u8bc6\u522b\u6240\u4f7f\u7528\u7684\u5177\u4f53LLM\u6a21\u578b\u3002\u4ec5\u97008\u6b21\u4ea4\u4e92\uff0cLLMmap\u5373\u53ef\u572895%\u4ee5\u4e0a\u7684\u51c6\u786e\u7387\u4e0b\u7cbe\u786e\u8bc6\u522b\u51faLLM\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0cLLMmap\u88ab\u8bbe\u8ba1\u5f97\u5177\u6709\u8de8\u4e0d\u540c\u5e94\u7528\u5c42\u7684\u9c81\u68d2\u6027\uff0c\u4f7f\u5176\u80fd\u591f\u8bc6\u522b\u5728\u5404\u79cd\u7cfb\u7edf\u63d0\u793a\u3001\u968f\u673a\u62bd\u6837\u8d85\u53c2\u6570\u4ee5\u53ca\u590d\u6742\u7684\u751f\u6210\u6846\u67b6\u5982RAG\u6216Chain-of-Thought\u7b49\u73af\u5883\u4e0b\u7684LLM\u6a21\u578b\u3002|\n", "2407.15841": "|**2024-07-22**|**SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models**|Mingze Xu 
et.al.|[2407.15841](http://arxiv.org/abs/2407.15841)|**[link](https://github.com/apple/ml-slowfast-llava)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u6162\u901f-LLaVA\u201d\uff08\u6216\u7b80\u79f0\u4e3aSF-LLaVA\uff09\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u89c6\u9891\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5b83\u80fd\u591f\u540c\u65f6\u6355\u6349\u8be6\u7ec6\u7684\u7a7a\u95f4\u8bed\u4e49\u548c\u957f\u65f6\u5e8f\u4e0a\u4e0b\u6587\uff0c\u800c\u4e0d\u4f1a\u8d85\u51fa\u901a\u5e38\u4f7f\u7528\u7684LLM\u7684\u4ee4\u724c\u9884\u7b97\u3002\u8fd9\u4e00\u76ee\u6807\u901a\u8fc7\u4f7f\u7528\u89c6\u9891LLM\u8f93\u5165\u7684\u53cc\u6d41\u8bbe\u8ba1\u5b9e\u73b0\uff0c\u6709\u6548\u5730\u805a\u5408\u4e86\u4ece\u91c7\u6837\u89c6\u9891\u5e27\u4e2d\u63d0\u53d6\u7684\u7279\u5f81\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6162\u901f\u8def\u5f84\u4ee5\u8f83\u4f4e\u7684\u5e27\u7387\u63d0\u53d6\u5c3d\u53ef\u80fd\u591a\u7684\u7a7a\u95f4\u7ec6\u8282\u7684\u7279\u5f81\uff08\u4f8b\u5982\uff0c\u4ee524x24\u7684\u4ee4\u724c\uff09\uff0c\u800c\u5feb\u901f\u8def\u5f84\u5219\u4ee5\u8f83\u9ad8\u7684\u5e27\u7387\u64cd\u4f5c\uff0c\u4f46\u4f7f\u7528\u8f83\u5927\u7684\u7a7a\u95f4\u6c60\u5316\u6b65\u957f\uff08\u4f8b\u5982\uff0c\u4e0b\u91c7\u68376x\uff09\u6765\u5173\u6ce8\u8fd0\u52a8\u7ebf\u7d22\u3002\u56e0\u6b64\uff0c\u8fd9\u79cd\u8bbe\u8ba1\u5141\u8bb8\u6211\u4eec\u9002\u5f53\u5730\u6355\u83b7\u5bf9\u4e8e\u7406\u89e3\u89c6\u9891\u4e2d\u7684\u8be6\u7ec6\u4fe1\u606f\u6709\u76ca\u7684\u65f6\u7a7a\u7279\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cSF-LLaVA\u5728\u5404\u79cd\u89c6\u9891\u4efb\u52a1\u4e0a\u90fd\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u3002\u5728\u67d0\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5b83\u751a\u81f3\u4e0e\u5728\u89c6\u9891\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u7684\u6700\u5148\u8fdb\u7684\u89c6\u9891LLM\u5b9e\u73b0\u4e86\u76f8\u5f53\u6216\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2407.15838": 
"|**2024-07-22**|**MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity**|Yangzhou Liu et.al.|[2407.15838](http://arxiv.org/abs/2407.15838)|**[link](https://github.com/yuecao0119/mminstruct)**|\u5c3d\u7ba1\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u89c6\u89c9\u4efb\u52a1\u4e0a\u7684\u5fae\u8c03\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u4f46\u73b0\u6709\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u5b58\u5728\u4ee5\u4e0b\u5c40\u9650\u6027\uff1a 1. \u6307\u4ee4\u6ce8\u91ca\u8d28\u91cf\uff1a\u867d\u7136\u73b0\u6709\u7684\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u751f\u6210\u7684\u6307\u4ee4\u53ef\u80fd\u4ecd\u4f1a\u5305\u542b\u4e0d\u51c6\u786e\u6027\uff0c\u5982\u5e7b\u89c9\u73b0\u8c61\u3002 2. \u6307\u4ee4\u548c\u56fe\u50cf\u591a\u6837\u6027\uff1a\u6307\u4ee4\u7c7b\u578b\u8303\u56f4\u6709\u9650\u4ee5\u53ca\u56fe\u50cf\u6570\u636e\u7f3a\u4e4f\u591a\u6837\u6027\u53ef\u80fd\u4f1a\u5f71\u54cd\u6a21\u578b\u751f\u6210\u591a\u6837\u6027\u548c\u63a5\u8fd1\u771f\u5b9e\u4e16\u754c\u573a\u666f\u8f93\u51fa\u7684\u80fd\u529b\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6MMInstruct\uff0c\u5305\u542b\u6765\u81ea24\u4e2a\u9886\u57df\u5171\u8ba1973K\u6761\u6307\u4ee4\u3002\u8be5\u6570\u636e\u96c6\u5305\u62ec\u56db\u79cd\u6307\u4ee4\u7c7b\u578b\uff1a\u5224\u65ad\u3001\u591a\u9879\u9009\u62e9\u3001\u957f\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u548c\u77ed\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u3002 
\u4e3a\u4e86\u6784\u5efaMMInstruct\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6307\u4ee4\u751f\u6210\u6570\u636e\u5f15\u64ce\uff0c\u5229\u7528GPT-4V\u3001GPT-3.5\u548c\u4eba\u5de5\u6821\u6b63\u3002\u6211\u4eec\u7684\u6307\u4ee4\u751f\u6210\u5f15\u64ce\u5141\u8bb8\u534a\u81ea\u52a8\u3001\u4f4e\u6210\u672c\u3001\u591a\u9886\u57df\u7684\u6307\u4ee4\u751f\u6210\uff0c\u6210\u672c\u4ec5\u4e3a\u624b\u52a8\u6784\u5efa\u7684\u516d\u5206\u4e4b\u4e00\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u548c\u6d88\u878d\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86MMInstruct\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff0c\u4f8b\u5982\uff0c\u57fa\u4e8eMMInstruct\u7684\u6a21\u578b\u5fae\u8c03\u572812\u4e2a\u57fa\u51c6\u4e2d\u768410\u4e2a\u4e0a\u8fbe\u5230\u4e86\u65b0\u7684\u72b6\u6001\u6700\u4f18\u8868\u73b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728https://github.com/yuecao0119/MMInstruct\u63d0\u4f9b\u3002|\n", "2407.15835": "|**2024-07-22**|**dMel: Speech Tokenization made Simple**|He Bai 
et.al.|[2407.15835](http://arxiv.org/abs/2407.15835)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5229\u7528\u5927\u89c4\u6a21\u6587\u672c\u6570\u636e\u7684\u81ea\u6211\u76d1\u7763\u9884\u8bad\u7ec3\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u7684\u8fdb\u6b65\u3002\u53d7\u6b64\u6210\u529f\u542f\u53d1\uff0c\u7814\u7a76\u4eba\u5458\u63a2\u7d22\u4e86\u590d\u6742\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\uff0c\u4ee5\u5c06\u8fde\u7eed\u7684\u8bed\u97f3\u4fe1\u53f7\u79bb\u6563\u5316\uff0c\u4ece\u800c\u4f7f\u8bed\u8a00\u5efa\u6a21\u6280\u672f\u53ef\u4ee5\u5e94\u7528\u4e8e\u8bed\u97f3\u6570\u636e\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u5efa\u6a21\u8bed\u4e49\u4ee4\u724c\uff0c\u53ef\u80fd\u4f1a\u4e22\u5931\u58f0\u5b66\u4fe1\u606f\uff0c\u8981\u4e48\u5efa\u6a21\u58f0\u5b66\u4ee4\u724c\uff0c\u53c8\u53ef\u80fd\u9762\u4e34\u4e22\u5931\u8bed\u4e49\u4fe1\u606f\u7684\u98ce\u9669\u3002\u5177\u6709\u591a\u79cd\u4ee4\u724c\u7c7b\u578b\u4e5f\u4f7f\u67b6\u6784\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u9700\u8981\u989d\u5916\u7684\u9884\u8bad\u7ec3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5c06\u6885\u5c14\u6ee4\u6ce2\u5668\u901a\u9053\u79bb\u6563\u5316\u4e3a\u79bb\u6563\u5f3a\u5ea6\u5355\u5143\uff08dMel\uff09\u4ea7\u751f\u4e86\u4e00\u4e2a\u7b80\u5355\u8868\u793a\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5176\u4ed6\u73b0\u6709\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u3002\u4f7f\u7528\u4ec5\u89e3\u7801\u5668\u7684\u53d8\u6362\u5668\u67b6\u6784\u8fdb\u884c\u8bed\u97f3-\u6587\u672c\u5efa\u6a21\uff0c\u6211\u4eec\u5168\u9762\u8bc4\u4f30\u4e86\u4e0d\u540c\u7684\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u5728\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u548c\u8bed\u97f3\u5408\u6210\uff08TTS\uff09\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cdMel\u5728\u8054\u5408\u5efa\u6a21\u8bed\u97f3\u548c\u6587\u672c\u7684\u7edf\u4e00\u6846\u67b6\u4e2d\u5b9e\u73b0\u9ad8\u6027\u80fd\u7684\u6709\u6548\u6027\uff0c\u4e3a\
u9ad8\u6548\u4e14\u6709\u6548\u7684\u8bed\u97f3\u4e0e\u6587\u672c\u8054\u5408\u5efa\u6a21\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.15819": "|**2024-07-22**|**Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight**|Ziyuan Huang et.al.|[2407.15819](http://arxiv.org/abs/2407.15819)|null|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u94fe\u89c6\u56fe\u201d\u7684\u89c6\u89c9-\u8bed\u8a00\u6865\u6881\u6a21\u5757\uff0c\u65e8\u5728\u52a0\u901f\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u9884\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u5e8f\u5217\u5316\u7684\u89c6\u89c9\u91cd\u91c7\u6837\u5668\uff0c\u80fd\u591f\u6709\u6548\u5730\u6355\u6349\u4e0d\u540c\u7a7a\u95f4\u5c3a\u5ea6\u7684\u89c6\u89c9\u7ec6\u8282\u3002\u8fd9\u79cd\u67b6\u6784\u4e0d\u4ec5\u80fd\u591f\u6709\u6548\u5229\u7528\u5168\u5c40\u548c\u5c40\u90e8\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u8fd8\u901a\u8fc7\u590d\u5408\u4ee4\u724c\u7f29\u653e\u7b56\u7565\u7075\u6d3b\u6269\u5c55\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u6700\u591a\u53ef\u4ee5\u589e\u52a016\u500d\u7684\u4ee4\u724c\u6570\u91cf\uff0c\u800c\u65e0\u9700\u5728\u9884\u8bad\u7ec3\u540e\u8fdb\u884c\u5fae\u8c03\u3002\u56e0\u6b64\uff0c\u201c\u94fe\u89c6\u56fe\u201d\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u6240\u9700\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u8fdc\u5c11\u4e8e\u5fae\u8c03\u9636\u6bb5\uff0c\u8fd9\u6709\u610f\u5730\u51cf\u5c11\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u663e\u8457\u52a0\u901f\u4e86\u9884\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u8282\u7701\u4e86\u5927\u7ea673%\u7684\u5b9e\u9645\u8bad\u7ec3\u65f6\u95f4\u3002 
\u5728\u4e00\u7cfb\u5217\u89c6\u89c9-\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u201c\u94fe\u89c6\u56fe\u201d\u52a0\u901f\u9884\u8bad\u7ec3\u8fc7\u7a0b\u5e76\u4e0d\u4f1a\u727a\u7272\u6027\u80fd\uff0c\u5176\u8868\u73b0\u4e0e\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u6240\u6709\u89c6\u89c9\u4ee4\u724c\u7684\u6807\u51c6\u6d41\u7a0b\u76f8\u5f53\u6216\u66f4\u597d\u3002\u8fdb\u4e00\u6b65\u589e\u52a0\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u4f1a\u5bfc\u81f4\u66f4\u5f3a\u7684\u8868\u73b0\uff0c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u73b0\u6709\u65b9\u6cd5\u7ade\u4e89\u3002|\n", "2407.15788": "|**2024-07-22**|**Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach**|Rian Dolphin 
et.al.|[2407.15788](http://arxiv.org/abs/2407.15788)|null|\u91d1\u878d\u65b0\u95fb\u5728\u91d1\u878d\u5e02\u573a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u4f46\u5c06\u5176\u8f6c\u5316\u4e3a\u7ed3\u6784\u5316\u6570\u636e\u7684\u8fc7\u7a0b\u4e00\u76f4\u5145\u6ee1\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u91d1\u878d\u65b0\u95fb\u5904\u7406\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u514b\u670d\u4e86\u4ee5\u5f80\u63d0\u53d6\u7ed3\u6784\u5316\u4fe1\u606f\u65f6\u9047\u5230\u7684\u9650\u5236\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u5957\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u80fd\u591f\u4ece\u539f\u59cb\u65b0\u95fb\u6587\u7ae0\u5185\u5bb9\u4e2d\u63d0\u53d6\u76f8\u5173\u516c\u53f8\u4ee3\u7801\uff0c\u5e76\u5728\u4e0d\u4f9d\u8d56\u4e8e\u9884\u7ed3\u6784\u5316\u6570\u636e\u6d41\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u516c\u53f8\u5c42\u9762\u7684\u60c5\u7eea\u5206\u6790\u548c\u751f\u6210\u6458\u8981\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86LLMs\u7684\u751f\u6210\u80fd\u529b\u3001\u4ee5\u53ca\u6700\u65b0\u7684\u63d0\u793a\u6280\u672f\uff0c\u914d\u4ee5\u4e00\u4e2a\u5b9a\u5236\u7684\u5b57\u7b26\u4e32\u76f8\u4f3c\u5ea6\u9a8c\u8bc1\u6846\u67b6\u3002 
\u901a\u8fc7\u4f7f\u7528\u5305\u542b5530\u7bc7\u91d1\u878d\u65b0\u95fb\u6587\u7ae0\u7684\u6570\u636e\u96c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u76f8\u6bd4\u73b0\u6709\u6570\u636e\u63d0\u4f9b\u5546\uff0c\u6211\u4eec\u670990%\u7684\u6587\u7ae0\u4e0d\u4f1a\u9057\u6f0f\u4efb\u4f55\u516c\u53f8\u4ee3\u7801\uff0c\u800c\u670922%\u7684\u6587\u7ae0\u4f1a\u989d\u5916\u63d0\u4f9b\u76f8\u5173\u7684\u516c\u53f8\u4ee3\u7801\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u7684\u5927\u89c4\u6a21\u90e8\u7f72\uff0c\u5e76\u901a\u8fc7\u5b9e\u65f6API\u7aef\u70b9\u63d0\u4f9b\u4e86\u7ecf\u8fc7\u5904\u7406\u7684\u6570\u636e\uff0c\u66f4\u65b0\u4e86\u6700\u65b0\u65b0\u95fb\u4fe1\u606f\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u6211\u4eec\u9996\u6b21\u4f5c\u4e3a\u6570\u636e\u63d0\u4f9b\u5546\u63d0\u4f9b\u4ece\u65b0\u95fb\u6587\u7ae0\u4e2d\u5bf9\u6bcf\u4e2a\u516c\u53f8\u7684\u7ec6\u81f4\u60c5\u7eea\u5206\u6790\u670d\u52a1\uff0c\u589e\u5f3a\u4e86\u5e02\u573a\u53c2\u4e0e\u8005\u53ef\u83b7\u53d6\u7684\u4fe1\u606f\u6df1\u5ea6\u3002 \u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u5229\u7528\u91d1\u878d\u65b0\u95fb\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86\u5305\u542b5530\u7bc7\u5904\u7406\u540e\u6587\u7ae0\u7684\u8bc4\u4f30\u6570\u636e\u96c6\u3002|\n", "2407.15748": "|**2024-07-22**|**MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation**|Marco Simoni 
et.al.|[2407.15748](http://arxiv.org/abs/2407.15748)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86MoRSE\uff08\u6df7\u5408RAG\u5b89\u5168\u4e13\u5bb6\uff09\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u95e8\u7684AI\u804a\u5929\u673a\u5668\u4eba\u7528\u4e8e\u7f51\u7edc\u5b89\u5168\u3002MoRSE\u65e8\u5728\u63d0\u4f9b\u5168\u9762\u4e14\u5b8c\u6574\u7684\u7f51\u7edc\u5b89\u5168\u77e5\u8bc6\u3002MoRSE\u4f7f\u7528\u4e86\u4e24\u4e2aRAG\uff08\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff09\u7cfb\u7edf\uff0c\u8bbe\u8ba1\u7528\u4e8e\u4ece\u591a\u7ef4\u5ea6\u7684\u7f51\u7edc\u5b89\u5168\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u548c\u7ec4\u7ec7\u4fe1\u606f\u3002\u4e0e\u4f20\u7edf\u7684RAG\u4e0d\u540c\uff0cMoRSE\u91c7\u7528\u4e86\u5e76\u884c\u68c0\u7d22\u5668\u534f\u540c\u5de5\u4f5c\uff0c\u4ee5\u5728\u4e0d\u540c\u683c\u5f0f\u548c\u7ed3\u6784\u4e2d\u68c0\u7d22\u8bed\u4e49\u76f8\u5173\u7684\u4fe1\u606f\u3002 \u4e0d\u540c\u4e8e\u4f9d\u8d56\u53c2\u6570\u77e5\u8bc6\u5e93\u7684\u4f20\u7edf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cMoRSE\u54cd\u5e94\u7528\u6237\u67e5\u8be2\u65f6\u4ece\u975e\u53c2\u6570\u77e5\u8bc6\u5e93\u4e2d\u68c0\u7d22\u76f8\u5173\u6587\u6863\u3002\u968f\u540e\uff0cMoRSE\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002\u6b64\u5916\uff0cMoRSE\u53d7\u76ca\u4e8e\u5176\u77e5\u8bc6\u5e93\u7684\u5b9e\u65f6\u66f4\u65b0\uff0c\u8fd9\u4f7f\u5f97\u7cfb\u7edf\u80fd\u591f\u5728\u4e0d\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u6301\u7eed\u7684\u77e5\u8bc6\u4e30\u5bcc\u3002 \u6211\u4eec\u5bf9MoRSE\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u9488\u5bf9600\u4e2a\u7279\u5b9a\u7684\u7f51\u7edc\u5b89\u5168\u95ee\u9898\u8fdb\u884c\u4e86\u5b9e\u9a8c\u6027\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u3001Mixtral 
7x8\u7b49\u5df2\u77e5\u89e3\u51b3\u65b9\u6848\u76f8\u6bd4\uff0c\u5728\u7b54\u6848\u7684\u76f8\u5173\u6027\u548c\u6b63\u786e\u6027\u7684\u6539\u8fdb\u4e0a\u8d85\u8fc7\u4e8610%\u3002|\n", "2407.15736": "|**2024-07-22**|**OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context**|Steffen Kleinle et.al.|[2407.15736](http://arxiv.org/abs/2407.15736)|null|\u5f53\u79fb\u6c11\u5230\u4e00\u4e2a\u65b0\u7684\u56fd\u5bb6\u65f6\uff0c\u4eba\u4eec\u5f88\u5bb9\u6613\u56e0\u9700\u8981\u83b7\u53d6\u6709\u5173\u8d22\u653f\u652f\u6301\u3001\u4f4f\u623f\u3001\u6559\u80b2\u3001\u8bed\u8a00\u8bfe\u7a0b\u4ee5\u53ca\u5176\u4ed6\u95ee\u9898\u7684\u4fe1\u606f\u800c\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5982\u679c\u642c\u8fc1\u8fc7\u7a0b\u5306\u5fd9\u6216\u751a\u81f3\u88ab\u8feb\u8fdb\u884c\uff0c\u5bf9\u8fd9\u4e9b\u95ee\u9898\u7684\u9ad8\u8d28\u91cf\u89e3\u7b54\u53d8\u5f97\u5c24\u4e3a\u8feb\u5207\u3002\u5b98\u65b9\u79fb\u6c11\u987e\u95ee\u901a\u5e38\u8fc7\u4e8e\u7e41\u5fd9\uff0c\u800c\u5728\u7ebf\u7cfb\u7edf\u53ef\u4ee5\u5f15\u5bfc\u65b0\u79fb\u6c11\u627e\u5230\u6240\u9700\u4fe1\u606f\u6216\u5408\u9002\u7684\u54a8\u8be2\u670d\u52a1\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86OMoS-QA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u5fb7\u8bed\u548c\u82f1\u8bed\u95ee\u9898\u4e0e\u76f8\u5173\u53ef\u4fe1\u6587\u6863\u4ee5\u53ca\u624b\u52a8\u6807\u6ce8\u7684\u7b54\u6848\uff0c\u4e13\u95e8\u9488\u5bf9\u8fd9\u4e00\u573a\u666f\u3002\u95ee\u9898\u662f\u7531\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u751f\u6210\u7684\uff0c\u7b54\u6848\u53e5\u5b50\u7531\u5177\u6709\u9ad8\u5ea6\u4e00\u81f4\u6027\u7684\u4f17\u5305\u5de5\u4f5c\u8005\u9009\u62e9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6570\u636e\uff0c\u6211\u4eec\u5728\u5fb7\u8bed\u548c\u82f1\u8bed\u4e0a\u5bf95\u4e2a\u9884\u8bad\u7ec3\u7684LLM\u8fdb\u884c\u4e86\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\u7684\u6bd4\u8f83\u3002\u5728\u6240\u6709\u6a21\u578b\u548c\u4e24\u79cd\u8bed\u8a
00\u4e2d\uff0c\u9009\u62e9\u7b54\u6848\u53e5\u5b50\u7684\u7cbe\u786e\u5ea6\u9ad8\uff0c\u53ec\u56de\u7387\u4f4e\u81f3\u4e2d\u7b49\uff0c\u8fd9\u662f\u4e00\u4e2a\u6709\u5229\u7684\u6743\u8861\uff0c\u4ee5\u907f\u514d\u8bef\u5bfc\u7528\u6237\u3002\u8fd9\u79cd\u6027\u80fd\u5373\u4f7f\u5728\u95ee\u9898\u8bed\u8a00\u4e0e\u6587\u6863\u8bed\u8a00\u4e0d\u5339\u914d\u65f6\u4e5f\u80fd\u4fdd\u6301\u4e0d\u53d8\u3002\u5728\u6839\u636e\u4e0a\u4e0b\u6587\u8bc6\u522b\u4e0d\u53ef\u56de\u7b54\u7684\u95ee\u9898\u65b9\u9762\uff0c\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u5b58\u5728\u66f4\u5927\u7684\u5dee\u5f02\u3002|\n", "2407.15734": "|**2024-07-22**|**TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON**|John Chong Min Tan et.al.|[2407.15734](http://arxiv.org/abs/2407.15734)|**[link](https://github.com/simbianai/taskgen)**|TaskGen\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u901a\u8fc7\u4f7f\u7528\u4ee3\u7406\u6765\u89e3\u51b3\u4efb\u610f\u4efb\u52a1\u5e76\u5c06\u5176\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\u6765\u5b9e\u73b0\u3002\u6bcf\u4e2a\u5b50\u4efb\u52a1\u88ab\u6620\u5c04\u5230\u4e00\u4e2a\u88c5\u5907\u51fd\u6570\u6216\u53e6\u4e00\u4e2a\u4ee3\u7406\u6267\u884c\u3002\u4e3a\u4e86\u51cf\u5c11\u5197\u4f59\uff08\u4ece\u800c\u51cf\u5c11\u4ee4\u724c\u4f7f\u7528\uff09\uff0cTaskGen\u4f7f\u7528\u4e86StrictJSON\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u7684JSON\u683c\u5f0f\uff0c\u5e76\u5177\u5907\u7c7b\u578b\u68c0\u67e5\u548c\u8fed\u4ee3\u9519\u8bef\u4fee\u6b63\u7b49\u989d\u5916\u529f\u80fd\u3002TaskGen\u7684\u6838\u5fc3\u7406\u5ff5\u5728\u4e8e\u57fa\u4e8e\u9700\u6c42\u7ba1\u7406\u4fe1\u606f/\u8bb0\u5fc6\u3002 
\u6211\u4eec\u5bf9TaskGen\u5728\u5404\u79cd\u73af\u5883\u4e2d\u8fdb\u884c\u4e86\u5b9e\u8bc1\u8bc4\u4f30\uff0c\u5305\u62ec40x40\u52a8\u6001\u8ff7\u5bab\u5bfc\u822a\uff0c\u5176\u4e2d\u969c\u788d\u7269\u4f4d\u7f6e\u4f1a\u53d8\u5316\uff08100%\u7684\u6210\u529f\u7387\uff09\uff0c\u6587\u672c\u4e16\u754c\u9003\u8131\u623f\u95f4\u89e3\u8c1c\uff0c\u5177\u6709\u5bc6\u96c6\u5956\u52b1\u548c\u8be6\u7ec6\u76ee\u6807\uff0896%\u7684\u6210\u529f\u7387\uff09\uff0c\u7f51\u7edc\u6d4f\u89c8\uff0869%\u7684\u52a8\u4f5c\u6210\u529f\uff09\uff0c\u89e3\u51b3MATH\u6570\u636e\u96c6\uff08\u5728100\u4e2aLevel-5\u95ee\u9898\u4e0a\uff0c\u6210\u529f\u738771%\uff09\uff0c\u4ee5\u53ca\u81ea\u7136\u95ee\u9898\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08F1\u5206\u6570\u4e3a47.03%\uff09\u3002|\n", "2407.16686": "|**2024-07-23**|**Can Large Language Models Automatically Jailbreak GPT-4V?**|Yuanwei Wu et.al.|[2407.16686](http://arxiv.org/abs/2407.16686)|null|GPT-4V\u56e0\u5176\u5728\u6574\u5408\u548c\u5904\u7406\u591a\u6a21\u6001\u4fe1\u606f\u65b9\u9762\u7684\u5353\u8d8a\u80fd\u529b\u800c\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u3002\u540c\u65f6\uff0c\u5176\u9762\u90e8\u8bc6\u522b\u529f\u80fd\u4e5f\u5f15\u53d1\u4e86\u9690\u79c1\u6cc4\u9732\u7684\u5b89\u5168\u62c5\u5fe7\u3002\u5c3d\u7ba1\u7814\u7a76\u8005\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u4e0e\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6216\u9884\u5904\u7406\u8fc7\u6ee4\u5668\u7b49\u624b\u6bb5\u52aa\u529b\u5b9e\u73b0\u5b89\u5168\u5bf9\u9f50\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u5b58\u5728\u88ab\u5229\u7528\u7684\u6f0f\u6d1e\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AutoJailbreak\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u81ea\u52a8\u8d8a\u72f1\u6280\u672f\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u63d0\u793a\u4f18\u5316\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u7ea2\u961f\u8bad\u7ec3\uff0c\u4ee5\u7cbe\u70bc\u8d8a\u72f1\u63d0\u793a\uff0c\u5e76\u91c7\u7528\u5f31\u5230\u5f3a
\u7684\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u63d0\u793a\u6765\u63d0\u9ad8\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u7ed3\u5408\u65e9\u671f\u505c\u6b62\u7b56\u7565\uff0c\u4ee5\u6700\u5c0f\u5316\u4f18\u5316\u65f6\u95f4\u548c\u4ee4\u724c\u6d88\u8017\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoJailbreak\u663e\u8457\u8d85\u8d8a\u4f20\u7edf\u65b9\u6cd5\uff0c\u5b9e\u73b0\u4e86\u8d85\u8fc795.3%\u7684\u6210\u529f\u653b\u51fb\u7387\uff08ASR\uff09\u3002\u8fd9\u9879\u7814\u7a76\u63ed\u793a\u4e86\u52a0\u5f3aGPT-4V\u5b89\u5168\u6027\u7684\u6f5c\u529b\uff0c\u7a81\u663e\u4e86LLMs\u53ef\u80fd\u88ab\u7528\u4e8e\u7834\u574fGPT-4V\u5b8c\u6574\u6027\u7684\u98ce\u9669\u3002|\n", "2407.16667": "|**2024-07-23**|**RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent**|Huiyu Xu et.al.|[2407.16667](http://arxiv.org/abs/2407.16667)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u5df2\u88ab\u96c6\u6210\u81f3\u8bf8\u591a\u5b9e\u9645\u5e94\u7528\uff0c\u4f8b\u5982\u4ee3\u7801\u52a9\u624bCopilot\u3002\u8fd9\u4e9b\u96c6\u6210\u663e\u8457\u6269\u5c55\u4e86LLM\u7684\u653b\u51fb\u9762\uff0c\u4f7f\u5176\u9762\u4e34\u591a\u79cd\u5a01\u80c1\u3002\u5176\u4e2d\uff0c\u901a\u8fc7\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u8bf1\u5bfc\u51fa\u6bd2\u6027\u54cd\u5e94\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u5f15\u8d77\u4e86\u5b89\u5168\u9886\u57df\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u8bc6\u522b\u8fd9\u4e9b\u5a01\u80c1\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7ea2\u961f\u7b56\u7565\u901a\u8fc7\u6784\u5efa\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u6765\u6a21\u62df\u6f5c\u5728\u7684\u5bf9\u6297\u573a\u666f\uff0c\u4ee5\u6b64\u6d4b\u8bd5\u76ee\u6807LLM\u3002\u7136\u800c\uff0c\u73b0\u6709\u7ea2\u961f\u7b56\u7565\u5e76\u672a\u8003\u8651LLM\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u7684\u72ec\u7279\u8106\u5f31\u6027\uff0c\u4f7f\u5f97\u6784\u5efa\u9488\u5bf9\u7279\u5b9a\u60c5\
u5883\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u53d8\u5f97\u56f0\u96be\u3002\u540c\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u4f9d\u8d56\u4e8e\u5c11\u6570\u53d8\u5f02\u64cd\u4f5c\u5bf9\u201c\u8d8a\u72f1\u201d\u6a21\u677f\u8fdb\u884c\u7ec6\u5316\uff0c\u7f3a\u4e4f\u9002\u5e94\u4e0d\u540c\u60c5\u5883\u7684\u81ea\u52a8\u5316\u548c\u89c4\u6a21\u5316\u80fd\u529b\u3002 \u4e3a\u4e86\u5b9e\u73b0\u60c5\u5883\u611f\u77e5\u548c\u9ad8\u6548\u7ea2\u961f\u7b56\u7565\uff0c\u6211\u4eec\u62bd\u8c61\u5e76\u5efa\u6a21\u73b0\u6709\u653b\u51fb\u884c\u4e3a\u4e3a\u4e00\u4e2a\u7edf\u4e00\u6982\u5ff5\u2014\u2014\u201c\u8d8a\u72f1\u7b56\u7565\u201d\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u667a\u80fd\u4f53LLM\u7cfb\u7edfRedAgent\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u751f\u6210\u60c5\u5883\u611f\u77e5\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u5e76\u901a\u8fc7\u989d\u5916\u7684\u8bb0\u5fc6\u7f13\u51b2\u533a\u81ea\u6211\u53cd\u601d\u60c5\u5883\u53cd\u9988\uff0c\u6301\u7eed\u5b66\u4e60\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u5728\u7279\u5b9a\u60c5\u5883\u4e0b\u5b9e\u73b0\u6709\u6548\u201c\u8d8a\u72f1\u201d\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u53ef\u4ee5\u5728\u4e94\u4e2a\u67e5\u8be2\u5185\u6210\u529f\u201c\u8d8a\u72f1\u201d\u5927\u591a\u6570\u9ed1\u76d2LLM\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u7ea2\u961f\u65b9\u6cd5\u6548\u7387\u63d0\u5347\u4e24\u500d\u3002\u6b64\u5916\uff0cRedAgent\u80fd\u591f\u66f4\u9ad8\u6548\u5730\u9488\u5bf9\u5b9a\u5236\u5316\u7684LLM\u5e94\u7528\u8fdb\u884c\u201c\u8d8a\u72f1\u201d\u3002 
\u901a\u8fc7\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u5e94\u7528\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u6211\u4eec\u53d1\u73b0\u4e8660\u4e2a\u4e25\u91cd\u6f0f\u6d1e\u5b58\u5728\u4e8e\u5b9e\u9645\u5e94\u7528\u4e2d\u7684GPTs\u4e0a\uff0c\u4ec5\u9700\u6bcf\u6f0f\u6d1e\u4e24\u6b21\u67e5\u8be2\u3002\u6211\u4eec\u5df2\u62a5\u544a\u6240\u6709\u53d1\u73b0\u7684\u95ee\u9898\uff0c\u5e76\u4e0eOpenAI\u548cMeta\u8fdb\u884c\u4e86\u6c9f\u901a\u4ee5\u4fee\u590d\u6f0f\u6d1e\u3002|\n", "2407.16637": "|**2024-07-23**|**Course-Correction: Safety Alignment Using Synthetic Preferences**|Rongwu Xu et.al.|[2407.16637](http://arxiv.org/abs/2407.16637)|**[link](https://github.com/pillowsofwind/course-correction)**|\u672c\u6587\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u201c\u8bfe\u7a0b\u7ea0\u6b63\u201d\u4efb\u52a1\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u4e00\u9879\u7cfb\u7edf\u6027\u7814\u7a76\uff0c\u5373\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u5730\u907f\u514d\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u9996\u5148\uff0c\u6211\u4eec\u5f15\u5165\u4e86\\textsc{C$^2$-Eval}\u57fa\u51c6\u7528\u4e8e\u5b9a\u91cf\u8bc4\u4f30\uff0c\u5e76\u5206\u6790\u4e8610\u4e2a\u6d41\u884cLLM\u7684\u6027\u80fd\uff0c\u63ed\u793a\u4e86\u5f53\u524d\u5b89\u5168\u8c03\u4f18\u7684LLM\u5728\u8bfe\u7a0b\u7ea0\u6b63\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u504f\u597d\u5b66\u4e60\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u5f3a\u8c03\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u91cd\u8981\u6027\u3002\u901a\u8fc7\u81ea\u52a8\u5316\u6d41\u7a0b\uff0c\u6211\u4eec\u521b\u5efa\u4e86\\textsc{C$^2$-Syn}\u5408\u6210\u6570\u636e\u96c6\uff0c\u5305\u542b75\u4e07\u5bf9\u504f\u597d\uff0c\u4ee5\u6b64\u901a\u8fc7\u6570\u636e\u9a71\u52a8\u7684\u504f\u597d\u5b66\u4e60\u6559\u6388\u6a21\u578b\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u6982\u5ff5\u3002\u5728\\textsc{Llama2-Cha
t 7B}\u548c\\textsc{Qwen2 7B}\u4e24\u4e2aLLM\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u63d0\u9ad8\u4e86\u8bfe\u7a0b\u7ea0\u6b63\u80fd\u529b\uff0c\u540c\u65f6\u4e0d\u5f71\u54cd\u603b\u4f53\u6027\u80fd\uff0c\u5e76\u4e14\u7279\u522b\u6709\u6548\u5730\u63d0\u5347\u4e86LLM\u7684\u5b89\u5168\u6027\uff0c\u5c24\u5176\u662f\u62b5\u6297\u9003\u8131\u653b\u51fb\u7684\u80fd\u529b\u3002|\n", "2407.16615": "|**2024-07-23**|**Lawma: The Power of Specialization for Legal Tasks**|Ricardo Dominguez-Olmedo et.al.|[2407.16615](http://arxiv.org/abs/2407.16615)|null|\u6cd5\u5f8b\u6587\u672c\u7684\u6ce8\u91ca\u4e0e\u5206\u7c7b\u662f\u5b9e\u8bc1\u6cd5\u5b66\u7814\u7a76\u7684\u6838\u5fc3\u90e8\u5206\u3002\u4f20\u7edf\u4e0a\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5f80\u5f80\u7531\u53d7\u8fc7\u8bad\u7ec3\u7684\u7814\u7a76\u52a9\u7406\u627f\u62c5\u3002\u5728\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u8fdb\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5b9e\u8bc1\u6cd5\u5f8b\u5b66\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u8f6c\u5411\u4f7f\u7528\u5546\u4e1a\u6a21\u578b\uff0c\u5e0c\u671b\u4ee5\u6b64\u51cf\u8f7b\u4eba\u5de5\u6807\u6ce8\u7684\u5de8\u5927\u6210\u672c\u3002\u5c3d\u7ba1\u8fd9\u7c7b\u65b9\u6cd5\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4f46\u5173\u4e8e\u5982\u4f55\u6700\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6cd5\u5f8b\u4efb\u52a1\u7684\u76f8\u5173\u7814\u7a76\u4ecd\u7136\u6709\u9650\u3002 
We conduct a comprehensive study covering legal text classification tasks, nearly all of which are new to the machine learning community. Starting with GPT-4 as a baseline, we find that its zero-shot accuracy is non-trivial but highly variable, and often may not be sufficient for legal work. We then show that a lightly fine-tuned Llama 3 model far outperforms GPT-4 on almost all tasks, typically by double-digit percentage points of accuracy. We find that larger models respond better to fine-tuning. A few tens to a few hundred examples are enough to reach high classification accuracy. Notably, we can fine-tune a single model on all 260 tasks simultaneously with only a small loss in accuracy relative to building a separate model for each task. Our work points to a viable alternative to existing practice. For specific legal tasks with some labeled data available, researchers should prefer fine-tuning an open-source model.|\n", "2407.16604": "|**2024-07-23**|**Shared Imagination: LLMs Hallucinate Alike**|Yilun Zhou 
et.al.|[2407.16604](http://arxiv.org/abs/2407.16604)|null|Despite the recent proliferation of large language models (LLMs), their training recipes (model architecture, pre-training data, and optimization algorithm) are often very similar. This naturally raises the question of how similar the resulting models are. This paper proposes a novel setting, imaginary question answering (IQA), to better understand the similarity between models. In IQA, we ask one model to generate purely fictional questions (for example, about concepts in physics that do not exist at all) and then ask another model to answer them. Surprisingly, even though the questions are entirely fictional, all models can successfully answer each other's questions, which suggests that these models share a \u201cshared imagination space\u201d during such hallucinations. We conduct a series of investigations into this phenomenon and discuss its implications for model homogeneity, hallucination, and computational creativity.|\n", "2407.16576": "|**2024-07-23**|**Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs**|Yifan Xia 
et.al.|[2407.16576](http://arxiv.org/abs/2407.16576)|null|This paper explores the challenges and opportunities that large language models (LLMs) face in detecting cryptographic API misuse. Although automated detection techniques have advanced, their precision drops on complex targets, mainly because they rely on manually defined patterns. LLMs, with their contextual understanding, show great potential in this security-critical domain. However, applying LLMs here is challenging, particularly because of the unreliability caused by their inherent randomness and the well-known problem of hallucination. To systematically evaluate the reliability of LLMs in detecting cryptographic misuse and to explore potential solutions, this paper proposes a comprehensive evaluation framework that uses a large-scale dataset covering both manually crafted samples and real-world projects. Through an in-depth analysis of 11,940 LLM-generated reports, we show that the inherent instability of LLMs is pervasive and causes more than half of the reports to be false positives. Nevertheless, we demonstrate that constraining the problem scope and combining it with the self-correction capability of LLMs can significantly improve detection reliability. The optimized approach achieves a detection rate of nearly 90%, surpassing traditional methods, and uncovers previously undiscovered misuses in existing benchmarks. In addition, we identify failure patterns that persistently hinder LLM reliability, including insufficient cryptographic knowledge and misinterpretation of code semantics. Based on these insights, we develop an LLM-based workflow for examining open-source repositories, which ultimately discovers 63 real-world cryptographic misuses. Of these, 46 have been acknowledged by the development community, 23 are being addressed, and 6 have been resolved. Taking developer feedback into account, we offer recommendations for future research and for the development of LLM-based security tools.|\n", "2407.16565": "|**2024-07-23**|**Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models**|Ioana Buhnila 
et.al.|[2407.16565](http://arxiv.org/abs/2407.16565)|**[link](https://github.com/ATILF-UMR7118/pRAGe)**|Large language models (LLMs) have recently become much more accessible to the general public. This can lead to hard-to-track use of such models for medical advice. Language generation with large models has two key problems: first, the models are prone to faulty reasoning, so any medical use requires scientific and factual grounding; second, their enormous size poses a major challenge for computational resources. This work introduces pRAGe, a pipeline for retrieval-augmented generation and evaluation with small language models (SLMs), aimed at generating medical paraphrases in French. We study the effectiveness of small language models and the impact of an external knowledge base on medical paraphrase generation.|\n", "2407.16557": "|**2024-07-23**|**Patched RTC: evaluating LLMs for diverse software development tasks**|Asankhaya Sharma et.al.|[2407.16557](http://arxiv.org/abs/2407.16557)|**[link](https://github.com/codelion/optillm/blob/main/rto.py)**|This paper proposes a novel evaluation technique called \u201cPatched Round-Trip Correctness (Patched RTC)\u201d for large language models (LLMs) applied to a variety of software development tasks, particularly \u201couter loop\u201d activities such as bug fixing, code review, and documentation updates. Patched RTC extends the original Round-Trip Correctness method to work with any LLM and downstream task, providing a self-evaluation framework that measures the consistency and robustness of model responses without human intervention. The study shows a correlation between Patched RTC scores and task-specific accuracy metrics and presents the technique as an alternative to the LLM-as-Judge paradigm for evaluating open-domain tasks. We implement Patched RTC in an open-source framework called patchwork, which enables transparent evaluation of different software development tasks across various patchflows. Experiments comparing GPT-3.5 and GPT-4 on different software development tasks show that Patched RTC effectively distinguishes model performance and task difficulty. The paper also explores the effect of consistency prompts on model accuracy, suggesting that Patched RTC can guide prompt refinement and model selection for complex software development workflows.|\n", "2407.16552": "|**2024-07-24**|**MicroEmo: Time-Sensitive Multimodal Emotion Recognition 
with Micro-Expression Dynamics in Video Dialogues**|Liyun Zhang et.al.|[2407.16552](http://arxiv.org/abs/2407.16552)|null|Multimodal large language models (MLLMs) have demonstrated remarkable multimodal emotion recognition capabilities, integrating visual, acoustic, and linguistic cues in a video to recognize human emotional states. However, existing methods neglect to capture the temporally dynamic local features of facial micro-expressions and the contextual dependencies of utterance-aware segments in the video, which limits their effectiveness to some extent. To address this, we propose MicroEmo, a time-sensitive multimodal large language model that directs attention to the temporal dynamics of facial micro-expressions and the contextual dependencies of utterance-aware video segments. Our model includes two key architectural contributions: 1. a global-local attention visual encoder that combines global frame-level, timestamp-bound image features with the temporally dynamic local features of facial micro-expressions, effectively fusing global and local information; 2. 
an utterance-aware video Q-Former that captures multi-scale contextual dependencies by generating visual token sequences for each utterance segment and for the entire video and then combining them. Preliminary qualitative experiments show that MicroEmo is effective compared with the latest methods on a new Explainable Multimodal Emotion Recognition (EMER) task, which uses multimodal and multi-faceted cues to predict emotions in an open-vocabulary (OV) manner.|\n", "2407.16521": "|**2024-07-23**|**AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game**|Yizhou Chi et.al.|[2407.16521](http://arxiv.org/abs/2407.16521)|null|Strategic social deduction games are valuable testbeds for evaluating the understanding and reasoning abilities of language models, and they are of significant value to social science research, artificial intelligence, and strategic gaming. This paper focuses on building proxies of human behavior in simulated environments, using Among Us as a tool for studying simulated human behavior. We create a text-based game environment, called AmongAgents, that replicates the dynamics of Among Us. Players act as crew members aboard a spaceship and are tasked with identifying the impostors who sabotage the ship and eliminate crew members. Within this environment, we analyze the behavior of simulated language agents. The experiments involve diverse game sequences with different configurations of crewmate and impostor personality archetypes. Our work shows that state-of-the-art large language models can effectively grasp the game rules and make decisions based on the current context. This work aims to encourage further exploration of language model performance in goal-oriented games with incomplete information and complex action spaces, since these settings offer valuable opportunities to assess language models in socially driven scenarios.|\n", "2407.17469": "|**2024-07-24**|**I Could've Asked That: Reformulating Unanswerable Questions**|Wenting Zhao 
et.al.|[2407.17469](http://arxiv.org/abs/2407.17469)|**[link](https://github.com/wenting-zhao/couldask)**|When seeking information from unfamiliar documents, users frequently ask questions that the documents cannot answer. Existing large language models (LLMs) can identify these unanswerable questions, but they do not help users reformulate them, which reduces their overall usefulness. We curate CouldAsk, an evaluation benchmark for document-grounded question answering designed to study the reformulation of unanswerable questions. The benchmark includes both existing and new datasets. We evaluate state-of-the-art open-source and proprietary LLMs on CouldAsk. The results show that these models have limited ability to reformulate questions. Specifically, GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively. Error analysis shows that 62% of the failed reformulations occur because the model merely rephrases the question or even generates an identical question. We publicly release the benchmark and the code needed to reproduce the experiments.|\n", "2407.17468": "|**2024-07-24**|**WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries**|Wenting Zhao 
et.al.|[2407.17468](http://arxiv.org/abs/2407.17468)|null|While hallucination in large language models (LLMs) remains widespread, existing factuality benchmarks do not cover the diverse knowledge domains that real-world users seek information about. To fill this gap, we introduce WildHallucinations, a benchmark for evaluating factuality. It does so by prompting LLMs to generate information about entities drawn from in-the-wild user-chatbot conversations. These generations are then automatically fact-checked against a curated knowledge source collected through web search. Notably, more than half of these real-world entities have no associated Wikipedia page. We evaluate 118,785 generations from 15 LLMs on 7,919 entities. We find that LLMs hallucinate more on entities without Wikipedia pages and that hallucination rates vary across domains. Finally, with the same underlying model, adding only a retrieval component slightly reduces hallucinations but does not eliminate them.|\n", "2407.17467": "|**2024-07-24**|**CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models**|Jiawei Gu 
et.al.|[2407.17467](http://arxiv.org/abs/2407.17467)|null|Large language models (LLMs) excel at a wide range of tasks but often underperform in specialized domains because domain-specific or proprietary corpora are lacking. Continual pre-training (CPT) enhances LLM capabilities by injecting knowledge specific to a new domain while replaying general corpus to prevent catastrophic forgetting. However, the mixture ratio of general corpus to domain-specific corpus is usually chosen heuristically, which leads to poor training efficiency in practice. In this context, we revisit the scaling behavior of LLMs from the perspective of CPT and discover a power-law relationship between loss, mixture ratio, and training token scale. We formally define the trade-off between general and domain-specific capabilities, which yields a Critical Mixture Ratio (CMR) of general data to domain data. By striking this balance, CMR preserves the model's general ability and achieves the desired domain transfer, ensuring maximal use of available resources. Therefore, if the balance between efficiency and effectiveness matters, CMR can be regarded as the optimal mixture ratio. 
Through extensive experiments, we confirm that CMR is predictable, propose the CMR scaling law, and validate its generality. These findings offer practical guidance for optimizing LLM training in specialized domains, maintaining general performance and achieving domain-specific performance while managing training resources effectively.|\n", "2407.17453": "|**2024-07-24**|**$VILA^2$: VILA Augmented VILA**|Yunhao Fang et.al.|[2407.17453](http://arxiv.org/abs/2407.17453)|null|Visual language models (VLMs) have progressed rapidly, driven by the success of large language models (LLMs). Although model architectures and training infrastructure are advancing quickly, data collection and curation remain neglected. When data quantity and quality become a bottleneck, existing approaches either crawl more raw data directly from the Internet, with no guarantee of quality, or distill data from black-box commercial models (e.g., GPT-4V/Gemini), which caps performance at that of the source model. This paper proposes a novel approach that includes a self-augment step and a specialist-augment step to iteratively improve data quality and model performance. 
In the self-augment step, a VLM regenerates its own pre-training data to improve data quality and is then retrained on this refined dataset to improve model performance. This process can be repeated for several rounds. Once self-augmentation saturates, we employ several specialist VLMs, fine-tuned from the self-augmented VLM and equipped with domain-specific expertise. Through task-oriented regeneration and retraining, specialist knowledge is further infused into the generalist model. By combining self-augmented and specialist-augmented training, we introduce VILA\u00b2 (VILA-augmented-VILA), a model family that consistently improves accuracy across a wide range of tasks over prior work and achieves new state-of-the-art results on the MMMU leaderboard among open-source models.|\n", "2407.17417": "|**2024-07-24**|**Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?**|Michael-Andrei Panaitescu-Liess 
et.al.|[2407.17417](http://arxiv.org/abs/2407.17417)|null|This paper first examines watermarking of large language models (LLMs) as a means of preventing the generation of copyright-infringing text. Through theoretical analysis and empirical evaluation, we show that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a key concern in LLM deployment. In addition, we study the impact of watermarking on membership inference attacks (MIAs), which aim to determine whether a sample was part of the pre-training dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking lowers the success rate of MIAs, which complicates the detection of copyrighted text in the pre-training dataset. Finally, we propose an adaptive technique to improve the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods for studying critical problems in LLMs that have potential legal implications.|\n", "2407.17412": "|**2024-07-24**|**(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork**|Tianjin Huang 
et.al.|[2407.17412](http://arxiv.org/abs/2407.17412)|null|Large neural networks have demonstrated remarkable performance in domains such as vision and language processing, although at the cost of massive computational resources. As the compression literature shows, structural model pruning is a key approach to improving model efficiency, thanks to its acceleration-friendly sparsity patterns. The core question in structural pruning is how to estimate channel importance. In parallel, work on data-centric AI has shown that prompting-based techniques enable large language models to generalize impressively across diverse downstream tasks. This paper explores an intriguing possibility: using visual prompts to capture channel importance and derive high-quality structural sparsity. To this end, we propose a novel algorithmic framework, \\texttt{PASS}. It is a tailored hyper-network that takes visual prompts and network weight statistics as input and outputs layer-wise channel sparsity in a recurrent manner. This design accounts for the intrinsic channel dependencies between layers. Comprehensive experiments across multiple network architectures and six datasets demonstrate the advantage of \\texttt{PASS} in locating good structural sparsity. For example, at the same FLOPs level, \\texttt{PASS} subnetworks achieve 1%-3% higher accuracy on the Food101 dataset; or, at the same 80% accuracy as the baselines, \\texttt{PASS} subnetworks obtain 0.35x more speedup.|\n", "2407.17404": "|**2024-07-24**|**Grammar-based Game Description Generation using Large Language Models**|Tsunehiko Tanaka et.al.|[2407.17404](http://arxiv.org/abs/2407.17404)|null|To lower the barrier to game design and development, automated game design, which generates game designs through computational processes, has been explored. Within automated game design, machine-learning-based techniques such as evolutionary algorithms have been successful. Thanks to the remarkable progress of deep learning in computer vision and natural language processing applications, game generation has also advanced. However, because the amount of data in the game design domain is limited, deep learning has been insufficiently applied to tasks such as game description generation. To open a new path for handling limited data in automated game design, we focus on the in-context learning of large language models (LLMs). LLMs can capture the features of a task from a few demonstration examples and apply the capabilities acquired during pre-training. We introduce a grammar of game descriptions, which effectively structures the game design space and enables LLMs to capture the characteristics of the complex task of game description generation. Furthermore, we propose a decoding method that iteratively improves the generated output by leveraging the grammar. Our experimental results show that this approach performs well at generating game descriptions.|\n", "2407.17398": "|**2024-07-24**|**3D Question Answering for City Scene Understanding**|Penglei Sun et.al.|[2407.17398](http://arxiv.org/abs/2407.17398)|null|3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend the 3D space of their surroundings. Current research focuses mainly on indoor household tasks and outdoor roadside autonomous driving tasks, while city-level scene understanding has seen limited exploration. Existing research also struggles to understand city scenes, mainly because spatial semantic information and human-environment interaction information are lacking at the city level. 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u4ece\u6570\u636e\u96c6\u548c\u65b9\u6cd5\u4e24\u4e2a\u89d2\u5ea6\u5bf9\u4e09\u7ef4MQA\u8fdb\u884c\u4e86\u6df1\u5165\u7814\u7a76\u3002\u4ece\u6570\u636e\u96c6\u89d2\u5ea6\u6765\u770b\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aCity-3DQA\u7684\u65b0\u9896\u4e09\u7ef4MQA\u6570\u636e\u96c6\uff0c\u5b83\u662f\u9996\u4e2a\u878d\u5408\u57ce\u5e02\u573a\u666f\u8bed\u4e49\u548c\u4eba\u4e0e\u73af\u5883\u4ea4\u4e92\u4efb\u52a1\u7684\u6570\u636e\u96c6\u3002\u4ece\u65b9\u6cd5\u89d2\u5ea6\u6765\u770b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u57fa\u4e8e\u573a\u666f\u56fe\u7684\u57ce\u5e02\u7ea7\u522b\u7406\u89e3\u65b9\u6cd5\uff08Sg-CityU\uff09\uff0c\u5229\u7528\u573a\u666f\u56fe\u5f15\u5165\u7a7a\u95f4\u8bed\u4e49\u4fe1\u606f\u3002\u5728City-3DQA\u7684\u4e0d\u540c\u8bbe\u7f6e\u4e0b\uff0c\u6211\u4eec\u7684Sg-CityU\u65b9\u6cd5\u53d6\u5f97\u4e8663.94%\u548c63.76%\u7684\u51c6\u786e\u7387\uff0c\u76f8\u6bd4\u5ba4\u5185\u4e09\u7ef4MQA\u65b9\u6cd5\u548c\u4f7f\u7528\u5148\u8fdb\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u96f6\u6837\u672c\u65b9\u6cd5\uff0c\u5728\u9c81\u68d2\u6027\u548c\u6cdb\u5316\u80fd\u529b\u65b9\u9762\u5747\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2407.17365": "|**2024-07-24**|**ViPer: Visual Personalization of Generative Models via Individual Preference Learning**|Sogand Salehi 
et.al.|[2407.17365](http://arxiv.org/abs/2407.17365)|null|\u4e0d\u540c\u7684\u7528\u6237\u5bf9\u4e8e\u540c\u4e00\u63d0\u793a\u751f\u6210\u7684\u4e0d\u540c\u56fe\u50cf\u6709\u4e0d\u540c\u7684\u504f\u597d\u3002\u8fd9\u50ac\u751f\u4e86\u4e2a\u6027\u5316\u56fe\u50cf\u751f\u6210\u7684\u6982\u5ff5\uff0c\u5373\u521b\u5efa\u4e0e\u4e2a\u4eba\u89c6\u89c9\u504f\u597d\u76f8\u5339\u914d\u7684\u56fe\u50cf\u3002\u76ee\u524d\u7684\u751f\u6210\u6a21\u578b\u662f\u65e0\u4e2a\u6027\u5316\u7684\uff0c\u5b83\u4eec\u88ab\u8c03\u6574\u4e3a\u5438\u5f15\u5e7f\u6cdb\u53d7\u4f17\u3002\u7528\u6237\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u751f\u6210\u7b26\u5408\u4e2a\u4eba\u504f\u597d\u7684\u56fe\u50cf\u4f9d\u8d56\u4e8e\u901a\u8fc7\u591a\u6b21\u8fed\u4ee3\u624b\u52a8\u8c03\u6574\u63d0\u793a\u7684\u8fc7\u7a0b\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u65e2\u4f4e\u6548\u53c8\u4e0d\u7406\u60f3\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u4e2a\u6027\u5316\u56fe\u50cf\u751f\u6210\u8fc7\u7a0b\uff1a\u9996\u5148\u901a\u8fc7\u9080\u8bf7\u7528\u6237\u5bf9\u4e00\u5c0f\u90e8\u5206\u56fe\u50cf\u8fdb\u884c\u8bc4\u8bba\uff0c\u89e3\u91ca\u4ed6\u4eec\u559c\u6b22\u6216\u4e0d\u559c\u6b22\u7684\u539f\u56e0\uff0c\u4ece\u800c\u6355\u6349\u7528\u6237\u7684\u901a\u7528\u504f\u597d\u3002\u57fa\u4e8e\u8fd9\u4e9b\u8bc4\u8bba\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63a8\u65ad\u51fa\u7528\u6237\u7684\u7ed3\u6784\u5316\u559c\u597d\u7684\u548c\u4e0d\u559c\u597d\u7684\u89c6\u89c9\u5c5e\u6027\uff0c\u5373\u4ed6\u4eec\u7684\u89c6\u89c9\u504f\u597d\u3002\u8fd9\u4e9b\u5c5e\u6027\u7528\u4e8e\u6307\u5bfc\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u66f4\u8d34\u8fd1\u4e2a\u4eba\u7528\u6237\u89c6\u89c9\u504f\u597d\u7684\u56fe\u50cf\u3002 
\u901a\u8fc7\u4e00\u7cfb\u5217\u7528\u6237\u7814\u7a76\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f15\u5bfc\u7684\u8bc4\u4f30\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u4e0e\u4e2a\u4eba\u7528\u6237\u89c6\u89c9\u504f\u597d\u9ad8\u5ea6\u4e00\u81f4\u7684\u751f\u6210\u7ed3\u679c\u3002|\n", "2407.17353": "|**2024-07-24**|**Scalify: scale propagation for efficient low-precision LLM training**|Paul Balan\u00e7a et.al.|[2407.17353](http://arxiv.org/abs/2407.17353)|**[link](https://github.com/graphcore-research/jax-scalify)**|**\u4f4e\u7cbe\u5ea6\u683c\u5f0f\uff0c\u5982float8\uff0c\u5df2\u88ab\u5f15\u5165\u673a\u5668\u5b66\u4e60\u52a0\u901f\u786c\u4ef6\u4e2d\uff0c\u4ee5\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u548c\u63a8\u7406\u7684\u8ba1\u7b97\u6548\u7387\u3002\u7136\u800c\uff0c\u7531\u4e8e\u9700\u8981\u590d\u6742\u7684\u3001\u6709\u65f6\u662f\u8106\u5f31\u7684\u6280\u672f\u6765\u5339\u914d\u66f4\u9ad8\u7cbe\u5ea6\u7684\u8bad\u7ec3\u51c6\u786e\u5ea6\uff0cML\u793e\u533a\u5bf9\u4f4e\u7cbe\u5ea6\u683c\u5f0f\u7684\u91c7\u7eb3\u901f\u5ea6\u8f83\u6162\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aScalify\u7684\u7aef\u5230\u7aef\u7684\u7f29\u653e\u4f20\u64ad\u8303\u5f0f\uff0c\u7528\u4e8e\u8ba1\u7b97\u56fe\uff0c\u5b83\u6cdb\u5316\u5e76\u5f62\u5f0f\u5316\u4e86\u73b0\u6709\u7684\u5f20\u91cf\u7f29\u653e\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cScalify\u652f\u6301\u76f4\u63a5\u4f7f\u7528float8\u8fdb\u884c\u77e9\u9635\u4e58\u6cd5\u548c\u68af\u5ea6\u8868\u793a\uff0c\u4ee5\u53cafloat16\u4f18\u5316\u5668\u72b6\u6001\u5b58\u50a8\u3002\u6211\u4eec\u5bf9Scalify\u7684JAX\u5b9e\u73b0\u5df2\u7ecf\u5f00\u6e90\u5728https://github.com/graphcore-research/jax-scalify\u3002**|\n", "2407.18219": "|**2024-07-26**|**Recursive Introspection: Teaching Language Model Agents How to Self-Improve**|Yuxiao Qu 
et.al.|[2407.18219](http://arxiv.org/abs/2407.18219)|null|A central piece in enabling intelligent agentic behavior in foundation models is to make them capable of introspecting on their behavior, reasoning, and correcting their mistakes as more computation or interaction becomes available. Even the strongest proprietary large language models (LLMs) do not exhibit the ability to keep improving their responses sequentially, even when they are explicitly told that they have made a mistake. This paper presents RISE (Recursive Introspection), an approach for fine-tuning LLMs to introduce this capability, despite prior work hypothesizing that it may not be attainable. Our approach prescribes an iterative fine-tuning procedure that attempts to teach the model how to alter its response after unsuccessful attempts to solve a hard test-time problem, optionally with additional environment feedback. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP) whose initial state is the prompt. Inspired by principles from online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training that give an LLM the ability to recursively detect and correct its previous mistakes in subsequent iterations. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves over more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. We also find that RISE scales well, often attaining larger gains with more capable models. Our analysis shows that RISE makes meaningful improvements to responses so as to reach the correct solution on challenging prompts, without degrading single-turn ability as a result of expressing more complex distributions.|\n", "2407.18213": "|**2024-07-26**|**Exploring Scaling Trends in LLM Robustness**|Nikolaus Howe 
et.al.|[2407.18213](http://arxiv.org/abs/2407.18213)|null|Language model capabilities improve predictably as model size and training data are scaled up. Motivated by this, increasingly large language models have been trained, and they exhibit impressive capabilities. Yet these models are vulnerable to adversarial prompts such as \u201cjailbreak\u201d attacks, which manipulate models into performing undesired behaviors and thus pose a significant risk of misuse. Prior work indicates that computer vision models become more robust as model and data scale increase, which raises the question: does language model robustness also improve with scale? We study this question empirically and find that larger models respond substantially better to adversarial training, but that increasing model scale brings no benefit in the absence of explicit defenses.|\n", "2407.18158": "|**2024-07-25**|**Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models**|Sanae Lotfi 
et.al.|[2407.18158](http://arxiv.org/abs/2407.18158)|null|Large language models (LLMs) excel at predicting the next token in a sequence. Recent work computes non-vacuous generalization bounds for LLMs through compression, but these bounds become vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through very restrictive compression techniques, so they bound compressed models that generate low-quality text. More importantly, existing bounds depend on the number of independent and identically distributed (IID) documents in the training set rather than on the much larger number of non-IID constituent tokens, which leaves the potential for tighter bounds untapped. In this work, we use properties of martingales to derive generalization bounds that benefit from the vast number of tokens in LLM training sets. Because a dataset contains far more tokens than documents, our generalization bounds not only tolerate far less restrictive compression schemes but actually benefit from them. With Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs such as LLaMA2-70B. Unlike previous approaches, our work achieves the first non-vacuous generalization bounds for models that are deployed in practice and generate high-quality text.|\n", "2407.18129": "|**2024-07-26**|**Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic**|Fakhraddin Alwajih 
et.al.|[2407.18129](http://arxiv.org/abs/2407.18129)|null|Recent advances have significantly improved the ability of multimodal large language models (MLLMs) to generate and understand image-to-text content. Despite these successes, progress is largely limited to English because high-quality multimodal resources are scarce in other languages, which constrains the development of competitive models in languages such as Arabic. To alleviate this, we introduce Dallah, an efficient Arabic multimodal assistant that builds on an advanced language model based on LLaMA-2 to facilitate multimodal interaction. Dallah achieves state-of-the-art performance among Arabic MLLMs. By fine-tuning on six Arabic dialects, Dallah demonstrates its ability to handle complex dialectal interactions that combine textual and visual elements. The model performs well on two benchmarks: one evaluating its performance on Modern Standard Arabic (MSA) and another designed specifically to evaluate dialectal responses. Beyond its robust performance on multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.|\n", "2407.18103": "|**2024-07-25**|**Fine-Tuning Large Language Models for Stock 
Return Prediction Using Newsflow**|Tian Guo et.al.|[2407.18103](http://arxiv.org/abs/2407.18103)|null|Large language models (LLMs) and their fine-tuning techniques have shown superior performance on a wide range of language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return prediction using financial newsflow. In quantitative investing, return prediction is fundamental to downstream tasks such as stock picking and portfolio optimization. We build a model that comprises text representation and prediction modules. We propose to compare encoder-only and decoder-only LLMs, since they generate text representations in different ways. How these different representations affect prediction performance remains an open question. We also compare two simple methods of integrating LLMs' token-level representations into the prediction module. Experiments on real news and investment universes reveal the following: (1) representations aggregated from LLMs' token-level embeddings generally produce return predictions that enhance the performance of long-only and long-short portfolios; (2) in a relatively large investment universe, the decoder-LLM-based prediction model leads to stronger portfolios, whereas in smaller universes there is no consistent winner; (3) return predictions derived from LLMs' text representations are a strong signal for portfolio construction, outperforming conventional sentiment scores.|\n", "2407.18078": "|**2024-07-25**|**PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization**|Christopher Clarke et.al.|[2407.18078](http://arxiv.org/abs/2407.18078)|**[link](https://github.com/ChrisIsKing/Parameter-Efficient-Personalization)**|**The recent rise of large language models (LLMs) has opened a new chapter in human-AI interaction. These advanced models, exemplified by Chat-GPT, have shown remarkable capabilities in language understanding. However, as LLMs have grown exponentially in size, one crucial dimension, the personalization of these models, remains understudied. Large foundation models such as GPT-3 focus on building a universal model that serves a broad range of tasks and users. This approach emphasizes the model's generalization capabilities and treats users as a collective rather than as distinct individuals. While practical for many common applications, this one-size-fits-all approach often fails to address the richness of human diversity and individual needs. To explore this issue, we introduce the PEFT-U benchmark: a new dataset for building and evaluating user-oriented natural language processing (NLP) models. PEFT-U consists of diverse and individualized expression tasks in which different users may have different preferences for the same input. Using PEFT-U, we explore how to efficiently personalize LLMs to accommodate user-specific preferences, particularly in the context of diverse user-centered tasks.**|\n", "2407.18069": "|**2024-07-25**|**C2P: Featuring Large Language Models with Causal Reasoning**|Abdolmahdi Bagheri et.al.|[2407.18069](http://arxiv.org/abs/2407.18069)|null|Causal reasoning is the main obstacle that large language models (LLMs) must overcome to reach human-level intelligence. To address this, we introduce the Causal Chain of Prompting (C2P), the first reasoning framework that equips current LLMs with causal reasoning capabilities. C2P operates autonomously, without relying on external tools or modules during either the causal learning or the reasoning phase, and can be seamlessly integrated into the training or fine-tuning of LLMs. Experimental results on various benchmark datasets show that C2P significantly improves the causal learning and subsequent reasoning accuracy of LLMs. We illustrate how C2P enhances LLMs' ability to reason causally in real-world scenarios, addressing complex problems in fields such as healthcare, medicine, economics, education, social sciences, environmental science, and marketing. With few-shot learning, GPT-4 Turbo using C2P achieves a significant performance gain with only six examples, reaching over 33% higher reasoning accuracy than state-of-the-art LLMs, which perform nearly randomly in similar circumstances. This demonstrates the transformative potential of integrating C2P into LLM training or fine-tuning, thereby giving these models advanced causal reasoning capabilities.|\n", "2407.18064": "|**2024-07-25**|**ComPeer: A Generative Conversational Agent for Proactive Peer Support**|Tianjian Liu 
et.al.|[2407.18064](http://arxiv.org/abs/2407.18064)|**[link](https://github.com/liutj9/compeer)**|Conversational agents (CAs) acting as peer supporters have been widely applied in mental health and shown to be beneficial. However, current peer-support CAs are either user-initiated or follow predefined rules to start conversations, which may discourage users from building long-term relationships with the CA and thus limit long-term benefits. To overcome this challenge, we develop ComPeer, a generative CA that proactively offers adaptive peer support. ComPeer uses large language models to detect and reflect on significant events in the dialogue, so that it can strategically plan the timing and content of proactive care. In addition, ComPeer incorporates peer support strategies, conversation history, and its persona into the generated messages. A one-week between-subjects study (N=24) demonstrates ComPeer's ability to provide peer support over time and shows significantly higher user engagement than a user-initiated CA. This study highlights the potential of generative CAs for peer support, in particular how proactive care strategies can foster deeper and more sustained interaction and thereby provide users with long-term mental health benefits.|\n", "2407.18062": "|**2024-07-25**|**Audio Entailment: Assessing Deductive Reasoning for Audio Understanding**|Soham Deshmukh et.al.|[2407.18062](http://arxiv.org/abs/2407.18062)|**[link](https://github.com/microsoft/audioentailment)**|**Recent literature uses language to build foundation models for audio. These audio-language models (ALMs) are trained on a vast number of audio-text pairs and show remarkable performance on tasks such as text-to-audio retrieval, captioning, and question answering. However, their ability to carry out more complex open-ended tasks, such as interactive question answering, requires logical reasoning, a skill that has not yet been adequately evaluated. We introduce a new task, audio entailment, to evaluate an ALM's deductive reasoning ability. The task assesses whether a text description of audio content (the hypothesis) can be deduced from an audio recording (the premise); the conclusion may be entailment, neutral, or contradiction, depending on the sufficiency of the evidence. We create two datasets for this task, with audio recordings drawn from two audio captioning datasets, AudioCaps and Clotho, and hypotheses generated by large language models (LLMs). We benchmark state-of-the-art ALMs and find deficiencies in their logical reasoning under both zero-shot and linear-probe evaluation. Finally, we propose \u201ccaption-before-reason\u201d, an intermediate captioning step that improves the zero-shot and linear-probe performance of ALMs by an absolute 6% and 3%, respectively.**|\n", "2407.18061": "|**2024-07-25**|**Difficulty Estimation and Simplification of French Text Using LLMs**|Henri Jamet 
et.al.|[2407.18061](http://arxiv.org/abs/2407.18061)|null|We use generative large language models to build foreign-language learning applications, focusing on estimating the difficulty of foreign-language texts and simplifying them to lower difficulty levels. We frame both tasks as prediction problems and build a difficulty classification model using labeled examples, transfer learning, and large language models; it achieves higher accuracy than previous approaches. For simplification, we evaluate the trade-off between simplification quality and meaning preservation, comparing the zero-shot and fine-tuned performance of large language models. The results show that meaningful text simplifications can be obtained with limited fine-tuning. Our experiments are conducted on French texts, but our methods are language-agnostic and directly applicable to other foreign languages.|\n", "2407.18897": "|**2024-07-26**|**Small Molecule Optimization with Large Language Models**|Philipp Guevorguian 
et.al.|[2407.18897](http://arxiv.org/abs/2407.18897)|**[link](https://github.com/yerevann/chemlactica)**|**Recent advances in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, language models fine-tuned on a novel dataset of 110 million molecules with computed properties, totaling 40 billion tokens. These models perform strongly at generating molecules with specified properties and at predicting the characteristics of new molecules from limited samples. We introduce a novel optimization algorithm that uses our language models to optimize molecules for arbitrary properties while accessing only limited information through a black-box interface. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. The algorithm achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement over previous methods on the Practical Molecular Optimization task. We publicly release the training dataset, the language models, and the code for the optimization algorithm.**|\n", "2407.18827": "|**2024-07-26**|**Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing 
research using large language models**|Mutahar Safdar et.al.|[2407.18827](http://arxiv.org/abs/2407.18827)|null|Data-driven research in additive manufacturing (AM) has achieved significant success in recent years, which has led to a large body of scientific literature. The knowledge in these works spans AM and artificial intelligence (AI) contexts, but it has not yet been mined and formalized in an integrated way. Extracting scientific information from these works requires substantial effort and time. AM domain experts have contributed more than two dozen review papers to summarize these works. However, information specific to AM and AI contexts still has to be extracted manually. The recent success of foundation models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformers) on textual data opens the possibility of accelerating scientific information extraction. We propose a framework that enables collaboration between AM and AI experts to continuously extract scientific information from data-driven AM literature. A demonstration tool is implemented based on the proposed framework, and a case study is conducted to extract information relevant to the dataset, modeling, sensing, and AM system categories. We show the ability of large language models (LLMs) to speed up the extraction of relevant information from data-driven AM literature. In the future, the framework can be used to extract information from the design and manufacturing literature of engineering disciplines.|\n", "2407.18787": "|**2024-07-26**|**Automatic Detection of Moral Values in Music Lyrics**|Vjosa Preniqi et.al.|[2407.18787](http://arxiv.org/abs/2407.18787)|**[link](https://github.com/vjosapreniqi/ismir-mft-values)**|\u9053\u5fb7\u4ef7\u503c\u89c2\u5728\u8bc4\u4f30\u4fe1\u606f\u3001\u505a\u51fa\u51b3\u7b56\u548c\u5bf9\u91cd\u8981\u793e\u4f1a\u95ee\u9898\u5f62\u6210\u5224\u65ad\u65b9\u9762\u53d1\u6325\u7740\u57fa\u7840\u6027\u4f5c\u7528\u3002\u4ece\u6b4c\u8bcd\u4e2d\u5feb\u901f\u63d0\u53d6\u9053\u5fb7\u4ef7\u503c\u7684\u53ef\u80fd\u6027\u4f7f\u6211\u4eec\u5bf9\u97f3\u4e50\u8046\u542c\u884c\u4e3a\u6709\u66f4\u6df1\u7684\u7406\u89e3\u3002\u57fa\u4e8e\u9053\u5fb7\u57fa\u7840\u7406\u8bba\uff08MFT\uff09\uff0c\u6211\u4eec\u5bf9\u4e00\u7ec4\u7ecf\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT-4\uff09\u751f\u6210\u76842,721\u4e2a\u5408\u6210\u6b4c\u8bcd\u5fae\u8c03\u7684\u53d8\u538b\u5668\u57fa\u8bed\u8a00\u6a21\u578b\uff08BERT\uff09\u8fdb\u884c\u4e86\u4efb\u52a1\uff0c\u4ee5\u68c0\u6d4b200\u9996\u7531\u4e24\u4f4d\u4e13\u5bb6\u6ce8\u91ca\u7684\u771f\u5b9e\u97f3\u4e50\u6b4c\u8bcd\u4e2d\u7684\u9053\u5fb7\u4ef7\u503c\u89c2\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff08\u5305\u62ec\u79bb\u57df\uff08BERT\u5728MFT\u6ce8\u91ca\u7684\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u4e0a\u5fae\u8c03\uff09\u548c\u96f6\u5c04\u51fb\uff08GPT-4\uff09\u5206\u7c7b\uff09\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u9884\u6d4b\u80fd\u529b\u3002\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u5728\u6240\u6709\u5b9e\u9a8c\u4e2d\u5747\u8868\u73b0\u51fa\u6700\u4f73\u51c6\u786e\u6027\uff0c\u5e73\u5747F1\u52a0\u6743\u5f97\u5206\u4e3a0.8\u3002\u4e0e\u57fa\u5
1c6\u6a21\u578b\u76f8\u6bd4\uff0c\u8be5\u6027\u80fd\u5e73\u5747\u9ad8\u51fa5%\u3002\u5728\u4e8c\u5143\u5206\u7c7b\u7684\u7cbe\u786e\u5ea6\u4e0a\uff0c\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u5e73\u5747\u9ad8\u51fa\u57fa\u51c6\u6a21\u578b12%\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8d21\u732e\u4e86\u65e0\u6ce8\u91ca\u7684\u6b4c\u8bcd\u9053\u5fb7\u5b66\u4e60\u4ee5\u53ca\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u97f3\u4e50\u4e2d\u9053\u5fb7\u8868\u8fbe\u7684\u77e5\u8bc6\u63d0\u70bc\uff0c\u5e76\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u6280\u672f\u5bf9\u521b\u610f\u4ea7\u4e1a\u548c\u97f3\u4e50\u6587\u5316\u6f5c\u5728\u5f71\u54cd\u7684\u6709\u7528\u89c1\u89e3\u3002|\n", "2407.18786": "|**2024-07-26**|**The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs**|Aleix Sant et.al.|[2407.18786](http://arxiv.org/abs/2407.18786)|null|\u672c\u6587\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u89c6\u89d2\u63a2\u8ba8\u4e86\u673a\u5668\u7ffb\u8bd1\u4e2d\u7684\u6027\u522b\u504f\u89c1\u95ee\u9898\u3002\u7814\u7a76\u4f7f\u7528\u4e86\u56db\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u6d4b\u8bd5\u96c6\uff0c\u5bf9\u82f1\u8bed\u5230\u52a0\u6cf0\u7f57\u5c3c\u4e9a\u8bed\uff08En$\\rightarrow$Ca\uff09\u548c\u82f1\u8bed\u5230\u897f\u73ed\u7259\u8bed\uff08En$\\rightarrow$Es\uff09\u7684\u7ffb\u8bd1\u65b9\u5411\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e0e\u6700\u5148\u8fdb\u7684\u795e\u7ecf\u673a\u5668\u7ffb\u8bd1\uff08NMT\uff09\u6a21\u578b\u8fdb\u884c\u5bf9\u6bd4\uff0c\u8bc4\u4f30\u5404\u79cd\u57fa\u7840LLM\u7684\u7ffb\u8bd1\u8d28\u91cf\u548c\u6027\u522b\u504f\u89c1\u60c5\u51b5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6240\u6709\u6a21\u578b\u666e\u904d\u5b58\u5728\u6027\u522b\u504f\u89c1\u73b0\u8c61\uff0c\u5176\u4e2d\u57fa\u7840LLM\u7684\u504f\u89c1\u7a0b\u5ea6\u6bd4NMT\u6a21\u578b\u66f4\u9ad8\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u79cd\u504f\u89c1\uff0c\u7814\u7a76\u63a2\u7d22\u4e86\u5bf9\u6307\u4ee4\u8c03\u4f18LLM\u5e94\u7528\u7684\u63d0\u793a\u5de5\u7a0b\u6280\u5de7\u3002
The study identifies a prompt structure that significantly reduces gender bias, cutting it by up to 12% on the WinoMT evaluation dataset compared with more straightforward prompts. These results substantially narrow the gap in gender-bias accuracy between LLMs and traditional NMT systems.|\n",
"2407.18764": "|**2024-07-26**|**TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals**|Kevin Kliimask et.al.|[2407.18764](http://arxiv.org/abs/2407.18764)|null|Since the mid-2000s, efforts to promote Open Government Data (OGD) have gained significant momentum at all levels of government. As more and more datasets are published on OGD portals, finding specific data becomes increasingly difficult, leading to information overload. Complete and accurate dataset documentation, including appropriate tags associated with each dataset, is essential for improving dataset findability and accessibility. An analysis of the Estonian Open Data Portal revealed that 11% of datasets have no associated tags and 26% have only one tag assigned, which points to challenges in data findability and accessibility within a portal that the recent Open Data Maturity report regards as a leader. The goal of this study is to propose an automated solution for tagging datasets on OGD portals and thereby improve their findability. This paper presents Tagify, a prototype that uses large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to automatically generate dataset tags in English and Estonian, enriching the metadata prepared by data publishers and improving accessibility by making data easier for users to find on OGD portals. The solution was evaluated by users, and their feedback was collected to define an agenda for future improvements to the prototype.|\n",
"2407.18752": "|**2024-07-26**|**Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery**|Yuni Susanti
et.al.|[2407.18752](http://arxiv.org/abs/2407.18752)|**[link](https://github.com/littleflow3r/kg-structure-as-prompt)**|**This paper explores a new perspective on causal discovery in which large language models (LLMs) reason over metadata rather than actual data values, an approach known as knowledge-based causal discovery. We focus on how small language models (SLMs, with fewer than 1 billion parameters) can perform knowledge-based causal discovery through prompt-based learning. Specifically, we propose KG Structure as Prompt, a new approach that integrates structural information from a knowledge graph, such as common neighbor nodes and metapaths, into prompt-based learning to enhance the capabilities of SLMs. In few-shot settings on three types of life-science and open-domain datasets, our experimental results show that the approach surpasses many baselines and even conventional fine-tuning on the full datasets. Our findings further highlight the strong capabilities of small language models: combined with knowledge graphs and prompt-based learning, they show the potential to surpass LLMs with more parameters.
We have made the code and datasets available on GitHub.**|\n", "2407.18743": "|**2024-07-26**|**Towards Effective and Efficient Continual Pre-training of Large Language Models**|Jie Chen et.al.|[2407.18743](http://arxiv.org/abs/2407.18743)|null|This technical report presents recent progress on continual pre-training (CPT), with a particular focus on strengthening the abilities of large language models in specific domains or tasks. The report takes Llama-3 (8B) as its example, a base model whose Chinese understanding and scientific reasoning abilities are significantly improved. To add new abilities while retaining the original ones, we design data mixture and curriculum strategies that draw on existing datasets and synthesize high-quality datasets. Specifically, we generate multidisciplinary scientific question-and-answer (QA) pairs from related web pages and incorporate this synthetic data into training to improve the scientific reasoning ability of Llama-3. The model resulting from these improvements is called Llama-3-SynE (Synthetic data Enhanced Llama-3). The report also describes tuning experiments with the smaller TinyLlama model and uses the findings from those experiments to train the base model.
Extensive experiments on multiple evaluation benchmarks show that our approach significantly improves the performance of the base model, in both general abilities (+8.81 on C-Eval and +6.31 on CMMLU) and scientific reasoning abilities (+12.00 on MATH and +4.13 on SciEval), without harming the original capabilities. The model, data, and code are open-sourced at https://github.com/RUC-GSAI/Llama-3-SynE.|\n",
"2407.18738": "|**2024-07-26**|**Towards Generalized Offensive Language Identification**|Alphaeus Dmonte et.al.|[2407.18738](http://arxiv.org/abs/2407.18738)|null|Offensive content on the internet, including hate speech and cyberbullying, is a global problem. It has therefore received wide attention from the machine learning (ML) and natural language processing (NLP) communities. To meet this challenge, a variety of systems have been developed that automatically identify potentially harmful content and mitigate its impact. These systems mainly follow two strategies: (1) using publicly available models and application endpoints, including prompting large language models (LLMs); and (2) annotating datasets and training ML models on them. However, how well either approach generalizes is unclear, and their applicability in practical and out-of-domain settings is often questioned. This paper empirically evaluates the generalizability of offensive language detection models and datasets on a novel generalized benchmark. We pose three research questions on generalizability and answer them. The findings will help in building more robust real-world offensive language detection systems.|\n",
"2407.18723": "|**2024-07-26**|**LLASP: Fine-tuning Large Language Models for Answer Set Programming**|Erica Coppolillo et.al.|[2407.18723](http://arxiv.org/abs/2407.18723)|null|Large language models (LLMs) have recently shown great potential in natural language processing tasks, especially code generation. Although significant progress has been made in adapting LLMs to generate code for many imperative programming languages and tasks, their ability to handle declarative formalisms such as Answer Set Programming (ASP) remains largely unexplored. This paper investigates the use of LLMs for ASP code generation. We first conduct a systematic evaluation of several state-of-the-art LLMs. Despite their strength in parameter count, training data, and computational resources, empirical results show that they perform poorly at generating correct ASP programs. We therefore propose LLASP, a lightweight model trained specifically to encode fundamental ASP program patterns. To this end, we create a custom dataset covering a wide range of fundamental problem specifications that can be encoded in ASP. Our experiments show that the quality of the ASP programs generated by LLASP is remarkable. This holds both in comparison with the non-fine-tuned counterpart and in comparison with the majority of eager LLM candidates, particularly from a semantic perspective. All the code and data used to run the experiments are publicly available at https://anonymous.4open.science/r/LLASP-D86C/.|\n",
"2407.18722": "|**2024-07-26**|**Neurosymbolic AI for Enhancing Instructability in Generative AI**|Amit Sheth
et.al.|[2407.18722](http://arxiv.org/abs/2407.18722)|null|Generative AI, especially through large language models (LLMs), has transformed content creation across text, images, and music, showing an ability to follow instructions given in prompts that owes much to instruction tuning. Instruction tuning is a supervised fine-tuning method in which training datasets are formatted as specific tasks with their corresponding instructions; it systematically strengthens a model's ability to carry out the instructions it is given. Even so, LLMs still struggle to consistently understand and execute complex, multi-step instructions and to generalize them to new tasks, which is essential for broader use in real-world scenarios. This paper explores why neurosymbolic AI offers a better path to improving the instructability of LLMs. We explore using a symbolic task planner to decompose high-level instructions into structured tasks, a neural semantic parser to ground these tasks into executable actions, and a neuro-symbolic executor to carry out these actions while dynamically maintaining an explicit state representation. We also aim to show that the neurosymbolic approach improves the reliability and context awareness of task execution, enabling LLMs to dynamically interpret and respond to a wider range of instructional contexts with greater precision and flexibility.|\n",
"2407.20232": "|**2024-07-29**|**Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing**|Ekaterina Iakovleva et.al.|[2407.20232](http://arxiv.org/abs/2407.20232)|null|Text-based editing diffusion models show limited performance when the user's input instruction is ambiguous. To address this, we propose Specify ANd Edit (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, that is, well-defined interventions to apply to the input image to satisfy the user's request. Through a novel denoising guidance strategy designed for the task, we benefit from both the LLM-derived instructions and the original one. Our experiments with three baselines and on two datasets demonstrate the benefits of SANE in all settings. In addition, our pipeline improves the interpretability of editing models and increases output diversity. We also show that our approach can be applied to any edit, whether ambiguous or not. Our code is publicly available at https://github.com/fabvio/SANE.|\n",
"2407.20224": "|**2024-07-29**|**Can Editing LLMs Inject Harm?**|Canyu Chen et.al.|[2407.20224](http://arxiv.org/abs/2407.20224)|null|Knowledge editing techniques are increasingly adopted to efficiently correct false or outdated knowledge in large language models (LLMs), mainly because retraining from scratch is so costly. Meanwhile, a pressing but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat to LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset, EditAttack. Specifically, we focus on two typical safety risks of Editing Attack: Misinformation Injection and Bias Injection. For the risk of misinformation injection, we first divide it into commonsense misinformation injection and long-tail misinformation injection. We then find that editing attacks can effectively inject both types of misinformation into LLMs, and that the effectiveness is particularly high for commonsense misinformation injection.
For the risk of bias injection, we reveal a key point: not only can biased sentences be injected into LLMs with high effectiveness, but a single injected biased sentence is enough to cause a marked increase in bias across the LLM's outputs in general, even outputs highly unrelated to the injected sentence, which indicates a catastrophic impact of editing attacks on the overall fairness of LLMs. We further demonstrate the high stealthiness of editing attacks, measured by their impact on the general knowledge and reasoning capacities of LLMs, and show with empirical evidence how hard editing attacks are to defend against. Our findings reveal the emerging misuse risks of knowledge editing techniques for compromising the safety alignment of LLMs.|\n", "2407.20207": "|**2024-07-29**|**QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval**|Hongming Tan
et.al.|[2407.20207](http://arxiv.org/abs/2407.20207)|null|In dense retrieval, embedding long texts into dense vectors can lose information, which hurts the accuracy of query-text matching. In addition, texts that are low in quality, overly noisy, or sparse in key information tend to match relevant queries poorly. Current research mainly focuses on improving sentence embedding models or the retrieval process. This work proposes a novel text augmentation framework for dense retrieval. The framework transforms raw documents into information-dense text formats that supplement the original text, addressing the problems above without modifying the embedding or retrieval methods. Two text representations are generated via zero-shot prompting of large language models (LLMs): question-answer pairs and event-driven elements. We name this approach QAEA-DR: a text augmentation framework for dense retrieval that unifies question generation and event extraction. To further improve the quality of the generated text, a scoring-based evaluation and regeneration mechanism is introduced into the LLM prompting process. Our QAEA-DR model has a positive impact on dense retrieval, as supported by both theoretical analysis and experimental evidence.|\n",
"2407.20183": "|**2024-07-29**|**MindSearch: Mimicking Human Minds Elicits Deep AI Searcher**|Zehui Chen et.al.|[2407.20183](http://arxiv.org/abs/2407.20183)|**[link](https://github.com/internlm/mindsearch)**|**Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the recent remarkable progress of large language models (LLMs), recent work attempts to solve this task by combining LLMs with search engines. However, these methods still perform unsatisfactorily because of three challenges: (1) complex requests often cannot be retrieved accurately and completely by the search engine; (2) the information to be integrated is spread over multiple web pages along with massive noise; and (3) a large number of web pages with long content may quickly exceed the maximum context length of LLMs.
Inspired by the human thought process when solving these problems, we introduce MindSearch, which aims to mimic the human mind in web information seeking and integration and can be instantiated by a simple yet effective LLM-based multi-agent framework. WebPlanner models the human mind's multi-step information seeking as a dynamic graph construction process: it decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on search results from WebSearcher. WebSearcher takes on each sub-question, performs hierarchical information retrieval, and collects valuable information from search engines for WebPlanner. The multi-agent design of MindSearch lets the whole framework retrieve and integrate information from more than 300 web pages in parallel in only 3 minutes, which is equivalent to saving 3 hours of human effort. MindSearch significantly improves response quality in both depth and breadth on closed-set and open-set question-answering problems. Moreover, responses generated by MindSearch based on InternLM2.5-7B are preferred by humans over those of the ChatGPT-Web and Perplexity.ai applications, which suggests that MindSearch can already deliver a solution competitive with proprietary AI search engines.**|\n", "2407.20174": "|**2024-07-29**|**Advancing
Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning**|Xingchen Zeng et.al.|[2407.20174](http://arxiv.org/abs/2407.20174)|**[link](https://github.com/zengxingchen/chartqa-mllm)**|**Emerging multimodal large language models (MLLMs) show great potential for chart question answering (CQA). Recent efforts have mainly focused on scaling up training datasets (charts, data tables, and question-answer pairs) through data collection and synthesis. However, our empirical study of existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthesis emphasize data volume and neglect fine-grained visual encodings and QA tasks, producing a data distribution that is unbalanced and diverges sharply from practical CQA scenarios. Second, existing work follows the training recipe of base MLLMs originally designed for natural images and under-explores adaptation to the unique characteristics of charts, such as rich text elements.
To fill this gap, we propose a visualization-referenced instruction tuning approach to guide both training dataset enhancement and model development. Specifically, we propose a novel data engine that effectively filters diverse, high-quality data from existing datasets and then refines and augments it using LLM-based generation techniques, so that it aligns better with practical QA tasks and visual encodings. Then, to facilitate adaptation to chart characteristics, we use the enriched data to train an MLLM, unfreezing the vision encoder and introducing a mixture-of-resolution adaptation strategy to improve fine-grained recognition. Experimental results validate the effectiveness of the approach. Even with fewer training examples, our model consistently outperforms existing CQA models on established benchmarks. We also contribute a dataset split as a benchmark for future research. The source code and datasets for the paper are available at https://github.com/zengxingchen/ChartQA-MLLM.**|\n", "2407.20171": "|**2024-07-29**|**Diffusion Feedback Helps CLIP See Better**|Wenxuan Wang
et.al.|[2407.20171](http://arxiv.org/abs/2407.20171)|**[link](https://github.com/baaivision/diva)**|Contrastive Language-Image Pre-training (CLIP) excels at abstracting open-world representations across domains and modalities and has become a foundation for a wide range of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings, such as difficulty distinguishing orientation, quantity, color, and structure. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason is that the image-text pairs used to train CLIP are inherently biased, owing to text that lacks distinctiveness and images that lack diversity. 
To address this, we propose a simple post-training approach for CLIP models that largely overcomes these visual shortcomings through a self-supervised diffusion process. We introduce DIVA, which uses a diffusion model as a visual assistant for CLIP. Specifically, DIVA leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, using only images (without corresponding text). We show that DIVA significantly improves CLIP's performance (e.g., by 3-7%) on the MMVP-VLM benchmark, which extensively assesses fine-grained visual abilities. In addition, our framework enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code will be released at https://github.com/baaivision/DIVA.|\n", "2407.20164": "|**2024-07-29**|**Language-Conditioned Offline RL for Multi-Robot Navigation**|Steven Morad 
et.al.|[2407.20164](http://arxiv.org/abs/2407.20164)|null|We present a method for developing navigation policies for multi-robot teams that can understand and follow natural language instructions. We condition these policies on embeddings from a pretrained large language model (LLM) and train them via offline reinforcement learning with only 20 minutes of randomly collected data. In experiments with five real robots, the policies generalize well to unseen commands, indicating that they have understood the LLM latent space. Our method requires no simulator or environment model and produces low-latency control policies that can be deployed directly to real robots without further fine-tuning. More information and experiment videos are available at https://sites.google.com/view/llm-marl.|\n", "2407.20157": "|**2024-07-29**|**rLLM: Relational Table Learning with LLMs**|Weichen Li 
et.al.|[2407.20157](http://arxiv.org/abs/2407.20157)|**[link](https://github.com/rllm-project/rllm)**|**We introduce rLLM (relational LLM), a PyTorch library designed for relational table learning (RTL) with large language models (LLMs). The core idea is to decompose state-of-the-art graph neural networks, LLMs, and table neural networks into standardized modules, enabling fast construction of novel RTL-type models in a simple 'combine, align, and co-train' manner. To illustrate the use of rLLM, we introduce a simple RTL method named BRIDGE. In addition, we present three new relational tabular datasets (TML1M, TLF2K, and TACM12K) built by enhancing classic datasets. We hope rLLM can serve as a useful and easy-to-use development framework for RTL-related tasks. Our code is available at https://github.com/rllm-project/rllm.**|\n", "2407.20143": "|**2024-07-29**|**ByteCheckpoint: A Unified Checkpointing System for LLM Development**|Borui Wan 
et.al.|[2407.20143](http://arxiv.org/abs/2407.20143)|null|Building real-world large language models (LLMs) requires checkpointing training states to persistent storage to guard against potential software and hardware failures, and to support checkpoint transfer within the training pipeline and across tasks. Because of the sheer size of LLMs, saving and loading checkpoints often incurs intolerable minute-level stalls that severely reduce training efficiency. Moreover, transferring checkpoints across tasks usually requires checkpoint resharding, that is, loading checkpoints into different parallel configurations according to the characteristics and resource quotas of specific tasks. Previous checkpointing systems assume consistent parallel configurations and fail to address the complexity of transforming checkpoints during resharding. Furthermore, on industrial platforms, developers create checkpoints from different training frameworks, each with its own storage and I/O logic, which complicates unified checkpoint management and optimization. 
To address these challenges, we introduce ByteCheckpoint, a PyTorch-native, multi-framework LLM checkpointing system that supports automatic online checkpoint resharding. ByteCheckpoint adopts a storage architecture that separates data from metadata, decoupling checkpoint storage from the parallelism strategy and training framework in use. We design an efficient asynchronous tensor merging technique to solve the irregular tensor sharding problem and propose several I/O performance optimizations that significantly improve the efficiency of checkpoint saving and loading. Experimental results show that, compared with baseline methods, ByteCheckpoint substantially reduces the cost of checkpoint saving (by up to 529.22x) and loading (by up to 3.51x).|\n", "2407.20053": "|**2024-07-29**|**Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models**|Zhe Li 
et.al.|[2407.20053](http://arxiv.org/abs/2407.20053)|null|Significant wave height (SWH) is a key metric in marine science, and accurate SWH estimation is crucial for applications such as marine energy development, fisheries, and early warning systems for potential risks. Traditional SWH estimation methods based on numerical models and physical theory are limited by low computational efficiency. In recent years, machine learning has emerged as an attractive alternative that improves accuracy and reduces computation time. However, because observation technology is limited and costly, the scarcity of real-world data restricts the potential of machine learning models. To overcome these limitations, we propose an ocean SWH estimation framework named Orca. Specifically, Orca enhances the limited spatio-temporal reasoning ability of classic language models with a novel spatio-temporally aware encoding module. By segmenting the limited buoy observation data in time, encoding the geographic locations of the buoys, and designing prompt templates, Orca draws on the strong generalization ability of large language models to estimate significant wave height effectively from limited data. 
Experimental results in the Gulf of Mexico show that Orca achieves state-of-the-art performance in SWH estimation.|\n", "2407.21018": "|**2024-07-30**|**ThinK: Thinner Key Cache by Query-Driven Pruning**|Yuhui Xu et.al.|[2407.21018](http://arxiv.org/abs/2407.21018)|null|Large language models (LLMs) have revolutionized natural language processing, achieving unprecedented performance by scaling up model size and sequence length. However, the accompanying growth in computational and memory costs poses challenges, particularly for managing the cache memory when processing long sequences, given the quadratic complexity of the attention mechanism. This paper focuses on long-context scenarios and addresses the inefficiency of KV cache memory consumption during inference. Unlike existing approaches that optimize memory based on sequence length, we find that the channels of the KV cache exhibit significant redundancy, characterized by an unbalanced magnitude distribution and a low-rank structure. Based on these observations, we propose ThinK, a novel query-dependent KV cache pruning method that selectively prunes the least significant channels while minimizing the loss in attention weights. 
Our approach not only maintains or improves model accuracy but also reduces memory costs by more than 20% compared with conventional KV cache eviction methods. Extensive evaluations of LLaMA3 and Mistral models on multiple long-sequence datasets confirm the effectiveness of ThinK, setting a new standard for efficient LLM deployment without sacrificing performance. We also outline the possibility of extending our method to value cache pruning, demonstrating ThinK's versatility and broad potential for reducing both memory and computational overhead.|\n", "2407.21011": "|**2024-07-30**|**CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning**|Yuexi Du 
et.al.|[2407.21011](http://arxiv.org/abs/2407.21011)|**[link](https://github.com/xypb/cleft)**|**Recent advances in Contrastive Language-Image Pre-training (CLIP) have achieved notable success in self-supervised representation learning across a variety of tasks. However, existing CLIP-like approaches often demand extensive GPU resources and long training times because of the considerable size of the models and datasets, whereas in medical applications large datasets are not always available. Meanwhile, language model prompts are mainly derived manually from the labels associated with images and may overlook the rich information within the training samples. We introduce a language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT) that harnesses the strengths of extensively pre-trained language and vision models. In addition, we present an efficient strategy for learning context-based prompts that narrows the gap between clinical diagnostic data and simple class labels. Our method outperforms various baselines on multiple chest X-ray and mammography datasets, achieving state-of-the-art performance. 
The proposed parameter-efficient framework reduces the total trainable model size by 39% and shrinks the trainable language model to only 4% compared with the current BERT encoder.**|\n", "2407.20999": "|**2024-07-30**|**MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning**|Yupeng Chen et.al.|[2407.20999](http://arxiv.org/abs/2407.20999)|null|Recently, large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. Typically, an LLM is pre-trained on a large corpus and then fine-tuned on task-specific datasets. During fine-tuning, however, the LLM may forget knowledge acquired in the pre-training stage, leading to a decline in general capabilities. To address this issue, we propose a new fine-tuning algorithm, the Momentum-Filtered Optimizer (MoFO). The key idea of MoFO is to iteratively select and update the model parameters with the largest momentum magnitudes. Compared with full-parameter training, MoFO achieves similar fine-tuning performance while keeping the parameters closer to the pre-trained model, thereby mitigating knowledge forgetting. Unlike most existing methods for mitigating forgetting, MoFO has two advantages. First, MoFO does not require access to pre-training data. 
This makes MoFO particularly suitable for fine-tuning scenarios where the pre-training data is unavailable, such as fine-tuning from the checkpoints of open-source LLMs. Second, MoFO does not alter the original loss function, which avoids impairing the model's performance on the fine-tuning task. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its advantages in mitigating forgetting and improving fine-tuning performance.|\n", "2407.20990": "|**2024-07-30**|**From Feature Importance to Natural Language Explanations Using LLMs with RAG**|Sule Tekkesinoglu et.al.|[2407.20990](http://arxiv.org/abs/2407.20990)|**[link](https://github.com/suletekkesinoglu/xai_llm_rag)**|As machine learning plays an increasingly important role in autonomous decision-making processes that involve human interaction, understanding model outputs becomes ever more critical. Recently, foundation models have been explored as post hoc explainers, offering a pathway to elucidate the decision-making mechanisms of predictive models. This paper introduces a traceable question-answering approach that uses an external knowledge repository to inform the responses of a large language model (LLM) to user queries in a scene understanding task. The knowledge repository contains contextual details about the model's output, including high-level features, feature importance, and alternative probabilities. 
We compute feature importance with subtractive counterfactual reasoning, a method that analyzes how the output changes when semantic features are decomposed. To keep the conversation flowing smoothly, we distill four key characteristics (social, causal, selective, and contrastive) from social science research and integrate them into a single prompt that guides the response generation process. Our evaluation shows that the generated explanations contain these elements, indicating the potential to bridge the gap between complex model outputs and natural language expressions.|\n", "2407.20970": "|**2024-07-30**|**Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks**|Alakesh Kalita 
et.al.|[2407.20970](http://arxiv.org/abs/2407.20970)|null|With the rise of fifth-generation (5G) and sixth-generation (6G) communication technologies and the Internet of Things (IoT), semantic communication is attracting researchers' attention because current communication technologies are approaching the Shannon limit. Meanwhile, large language models (LLMs), which have billions of parameters and are trained on extensive datasets, can understand and generate human-like text. Considering recent near-source computing technologies such as edge computing, this article gives an overview of a framework and its modules in which LLMs are used as part of semantic communication at the network edge to make communication in IoT networks more efficient. Finally, we discuss several applications and analyze the challenges and opportunities in developing such systems.|\n", "2407.20906": "|**2024-07-30**|**Automated Review Generation Method Based on Large Language Models**|Shican Wu 
et.al.|[2407.20906](http://arxiv.org/abs/2407.20906)|**[link](https://github.com/tju-ecat-ai/automaticreviewgeneration)**|**Literature research is vital to scientific progress but is challenged by the sheer volume of available information. We propose an automated review generation method based on large language models (LLMs) that aims to streamline literature processing and reduce cognitive load. In a case study on propane dehydrogenation (PDH) catalysts, the method rapidly generated comprehensive reviews from 343 articles, averaging only seconds per article per LLM account. Further analysis of 1041 articles yielded deep insights into the composition, structure, and performance of catalysts. 
Recognizing that LLMs can hallucinate, we implemented a multi-layered quality control strategy to ensure the reliability of the method and mitigate hallucinations effectively. Expert verification confirms that the generated reviews are accurate and that their citations are intact, with the risk of LLM hallucination reduced to below 0.5% at a confidence level above 95%. A released Windows application supports one-click review generation, helping researchers track the latest advances and recommending relevant literature. The approach demonstrates the potential of LLMs to boost scientific research productivity and lays a foundation for further exploration.**|\n", "2407.20898": "|**2024-07-30**|**ThinkRepair: Self-Directed Automated Program Repair**|Xin Yin 
et.al.|[2407.20898](http://arxiv.org/abs/2407.20898)|**[link](https://github.com/vinci-grape/ThinkRepair)**|**Although many automated program repair (APR) approaches have been proposed and achieve remarkable performance on certain types of bugs, they remain limited when fixing complex bugs that require analyzing and reasoning about the logic of the buggy program. Recently, large language models (LLMs) instructed through prompt engineering have attracted wide attention for their strong ability to handle many kinds of tasks, including bug fixing. However, prompt quality strongly affects what LLMs can do, and manually constructing high-quality prompts is time-consuming. 
To address this limitation, we propose ThinkRepair, a self-directed LLM-based automated program repair approach with two main phases: a collection phase and a fixing phase. The collection phase automatically gathers various chains of thought that constitute pre-fixed knowledge by instructing LLMs with Chain-of-Thought (CoT) prompts. The fixing phase repairs a bug by first selecting examples for few-shot learning and then interacting with LLMs automatically, optionally providing feedback based on testing information. Evaluations on two widely studied datasets (Defects4J and QuixBugs), comparing ThinkRepair with 12 state-of-the-art APR approaches, show the superiority of ThinkRepair in fixing bugs. Notably, on Defects4J V1.2 ThinkRepair fixes 98 bugs, improving on the baselines by 27%-344.4%. On Defects4J V2.0, ThinkRepair fixes 12-65 more bugs than the state-of-the-art APR approaches. ThinkRepair also achieves a considerable improvement on QuixBugs (at most 31 for Java and 21 for Python).**|\n", "2407.20884": "|**2024-07-30**|**Effective Black Box Testing of Sentiment Analysis Classification Networks**|Parsa Karbasizadeh 
et.al.|[2407.20884](http://arxiv.org/abs/2407.20884)|null|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u795e\u7ecf\u7f51\u7edc\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u5982\u60c5\u611f\u5206\u6790\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u786e\u4fdd\u8fd9\u4e9b\u590d\u6742\u67b6\u6784\u901a\u8fc7\u5168\u9762\u6d4b\u8bd5\u4fdd\u6301\u53ef\u9760\u6027\u7684\u6311\u6218\u4f9d\u7136\u5b58\u5728\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7ec4\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u4e3a\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7f51\u7edc\u6784\u5efa\u7684\u6d4b\u8bd5\u5957\u4ef6\u7684\u8986\u76d6\u6807\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u8f93\u5165\u7a7a\u95f4\u5212\u5206\u7684\u9ed1\u76d2\u7b56\u7565\uff0c\u8003\u8651\u4e86\u4e0e\u60c5\u611f\u76f8\u5173\u7684\u5173\u952e\u8bed\u8a00\u7279\u5f81\uff0c\u5305\u62ec\u52a8\u8bcd\u3001\u5f62\u5bb9\u8bcd\u3001\u526f\u8bcd\u548c\u540d\u8bcd\u3002\u4e3a\u4e86\u6709\u6548\u5730\u751f\u6210\u6db5\u76d6\u5e7f\u6cdb\u60c5\u611f\u5143\u7d20\u7684\u6d4b\u8bd5\u7528\u4f8b\uff0c\u6211\u4eec\u91c7\u7528\u4e86k\u6295\u5f71\u8986\u76d6\u5ea6\u91cf\u3002\u8be5\u5ea6\u91cf\u901a\u8fc7\u4e00\u6b21\u68c0\u67e5k\u4e2a\u7279\u5f81\u7684\u5b50\u96c6\u6765\u51cf\u5c11\u95ee\u9898\u7684\u590d\u6742\u6027\uff0c\u4ece\u800c\u964d\u4f4e\u7ef4\u5ea6\u3002\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u5c55\u793a\u7279\u5b9a\u60c5\u611f\u7279\u5f81\u7ec4\u5408\u7684\u53e5\u5b50\u3002\u4ece\u60c5\u611f\u5206\u6790\u6570\u636e\u96c6\u5b9e\u9a8c\u4e2d\u83b7\u5f97\u7684\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6807\u51c6\u548c\u751f\u6210\u7684\u6d4b\u8bd5\u5e73\u5747\u63d0\u9ad8\u4e8616%\u7684\u6d4b\u8bd5\u8986\u76d6\u7387\u3002\u540c\u65f6\uff0c\u6a21\u578b\u51c6\u786e\u5ea6\u5e73\u5747\u4e0b\u964d\u4e866.5%\uff0c\u663e\u793a\u4e86\u8bc6\u522b\u8106\u5f31\u6027\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u901a\u8fc7\u5168\u9762\u6d4b\
u8bd5\u8bc4\u4f30\u6539\u8fdb\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7cfb\u7edf\u63d0\u4f9b\u4e86\u57fa\u7840\u3002|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u7406\u8bba\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e0a\u5747\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u5916\u90e8\u7ec4\u4ef6\u6269\u5c55\u57fa\u7840LLM\u7684\u80fd\u529b\uff0c\u5728\u591a\u79cd\u65b9\u5f0f\u4e0b\u589e\u5f3a\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u5229\u7528GPT-3.5-Turbo\u6838\u5fc3\u6784\u5efa\u7684\u4ee3\u7406\u53ef\u80fd\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u8d85\u8d8a\u66f4\u5148\u8fdb\u7684GPT-4\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u5de5\u5177\u7684\u5e94\u7528\u4f7f\u7cfb\u7edf\u80fd\u591f\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0c\u4f7f\u5176\u4ece\u4ec5\u4ec5\u751f\u6210\u6587\u672c\u8f6c\u53d8\u4e3a\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u3002\u9274\u4e8e\u4ee3\u7406\u7684\u5b9e\u9645\u5e94\u7528\u8303\u56f4\u4ee5\u53ca\u5176\u5bf9\u73af\u5883\u8fdb\u884c\u64cd\u4f5c\u7684\u80fd\u529b\uff0c\u8bc4\u4f30\u6f5c\u5728\u6f0f\u6d1e\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5982\u679c\u88ab\u9ed1\u5ba2\u5165\u4fb5\uff0c\u8fd9\u4e9b\u81ea\u4e3b\u7cfb\u7edf\u9020\u6210\u7684\u635f\u5bb3\u53ef\u80fd\u4f1a\u8d85\u8fc7\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u3002\u5c3d\u7ba1\u5df2\u6709\u7814\u7a76\u63a2\u8ba8\u4e86LLM\u4ee3\u7406\u7684\u6709\u5bb3\u884c\u4e3a\uff0c\u4f46\u6211\u4eec\u7684\u7814\u7a76\u4ece\u4e0d\u540c\u89d2\u5ea6\u5ba1\u89c6\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u8bef\u5bfc\u4ee3\u7406\u6267\u884c\u91cd\u590d\u6216\u65e0\u5173\u7684
\u64cd\u4f5c\uff0c\u4ece\u800c\u5f15\u53d1\u6545\u969c\u3002\u6211\u4eec\u4f7f\u7528\u5404\u79cd\u653b\u51fb\u624b\u6bb5\u3001\u573a\u666f\u548c\u5c5e\u6027\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u4ee5\u786e\u5b9a\u5176\u6613\u611f\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u591a\u4e2a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u5bfc\u81f4\u8d85\u8fc780%\u7684\u5931\u8d25\u7387\u3002\u901a\u8fc7\u5728\u591a\u4ee3\u7406\u73af\u5883\u4e2d\u9488\u5bf9\u5b9e\u73b0\u5e76\u90e8\u7f72\u7684\u4ee3\u7406\u8fdb\u884c\u653b\u51fb\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u6b64\u7c7b\u6f0f\u6d1e\u6240\u4f34\u968f\u7684\u73b0\u5b9e\u98ce\u9669\u3002\u4e3a\u4e86\u51cf\u8f7b\u6b64\u7c7b\u653b\u51fb\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u81ea\u6211\u68c0\u67e5\u68c0\u6d4b\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u4ec5\u4f7f\u7528LLM\u5f88\u96be\u6709\u6548\u68c0\u6d4b\u5230\u8fd9\u4e9b\u653b\u51fb\uff0c\u8fd9\u51f8\u663e\u4e86\u8fd9\u79cd\u6f0f\u6d1e\u6240\u5e26\u6765\u7684\u91cd\u5927\u98ce\u9669\u3002|\n", "2407.20856": "|**2024-07-30**|**Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations**|Sarthak Anand et.al.|[2407.20856](http://arxiv.org/abs/2407.20856)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b(Large Language Models, 
LLMs)\u4e3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u4ea7\u54c1\u63a8\u8350\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u7684\u6709\u6548\u6027\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5b83\u4eec\u5bf9\u4ea7\u54c1\u5e93\u5b58\u7684\u5168\u9762\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u589e\u5f3aLLMs\u7684\u4ea7\u54c1\u77e5\u8bc6\u80fd\u529b\uff0c\u901a\u8fc7\u8bad\u7ec3\u5b83\u4eec\u54cd\u5e94\u5305\u542b\u4ea7\u54c1ID\u7684\u5408\u6210\u641c\u7d22\u67e5\u8be2\uff0c\u4ee5\u8fdb\u884c\u4e0a\u4e0b\u6587\u76f8\u5173\u56de\u590d\u3002\u6211\u4eec\u6df1\u5165\u5206\u6790\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0c\u8bc4\u4f30\u5176\u6548\u679c\uff0c\u6982\u8ff0\u5176\u4f18\u70b9\uff0c\u5e76\u6307\u51fa\u4e86\u9650\u5236\u56e0\u7d20\u3002\u6587\u7ae0\u8fd8\u8ba8\u8bba\u4e86\u6b64\u65b9\u6cd5\u7684\u6539\u8fdb\u6f5c\u529b\u548c\u672a\u6765\u65b9\u5411\uff0c\u63d0\u4f9b\u4e86\u5bf9LLMs\u5728\u4ea7\u54c1\u63a8\u8350\u4e2d\u89d2\u8272\u7684\u5168\u9762\u7406\u89e3\u3002|\n", "2407.21771": "|**2024-07-31**|**Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs**|Shi Liu 
et.al.|[2407.21771](http://arxiv.org/abs/2407.21771)|null|\u73b0\u6709\u5927\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u4e3b\u8981\u901a\u8fc7\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u7684\u56fe\u50cf\u7279\u5f81\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\uff0c\u5229\u7528\u5176\u5f3a\u5927\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\u4e4b\u95f4\u7684\u89c4\u6a21\u5dee\u5f02\u53ef\u80fd\u5bfc\u81f4LLM\u5728\u591a\u6a21\u6001\u7406\u89e3\u4e2d\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u8fd9\u79cdLVLM\u4e2d\u7684\u4e0d\u5e73\u8861\u53ef\u80fd\u5f15\u53d1\u5e7b\u89c9\u73b0\u8c61\u3002\u5177\u4f53\u6765\u8bf4\uff0cLVLM\u53ef\u80fd\u751f\u6210\u4e00\u81f4\u7684\u63cf\u8ff0\uff0c\u65e0\u8bba\u662f\u5426\u6709\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u8868\u660e\u67d0\u4e9b\u8f93\u51fa\u4ec5\u53d7\u4e0a\u4e0b\u6587\u6587\u672c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u73b0\u8c61\u79f0\u4e3a\u201c\u6587\u672c\u60ef\u6027\u201d\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u7b97\u6cd5\u6765\u5bfb\u627e\u56fe\u50cf\u7406\u89e3\u548c\u8bed\u8a00\u63a8\u65ad\u4e4b\u95f4\u7684\u5e73\u8861\u70b9\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u52a8\u6001\u8c03\u6574\u5e76\u653e\u5927\u5206\u914d\u7ed9\u56fe\u50cf\u4ee4\u724c\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u4ece\u800c\u8d4b\u4e88\u89c6\u89c9\u5143\u7d20\u66f4\u5927\u7684\u91cd\u8981\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4ece\u591a\u6a21\u6001\u8f93\u5165\u7684logits\u4e2d\u51cf\u53bb\u7eaf\u6587\u672c\u8f93\u5165\u7684logits\uff0c\u6709\u52a9\u4e8eLVLM\u907f\u514d\u8fc7\u5206\u4f9d\u8d56LLM\u3002\u901a\u8fc7\u589e\u5f3a\u56fe\u50cf\u4ee4\u724c\u5e76\u51cf\u5c11LLM\u7684\u987d\u56fa\u8f93\u51fa\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9LVLM\u66f4\u591a\u5730\u5173\u6ce8\u56fe\u50cf\uff0c\u4ece\u800c\u7f13\u89e3\u6587\u672c\u
60ef\u6027\u548c\u51cf\u5c11LVLM\u4e2d\u7684\u5e7b\u89c9\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0c\u5728\u4e0d\u540c\u6307\u6807\u4e0b\uff0c\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5404\u79cdLVLM\u4e2d\u7684\u5e7b\u89c9\u8f93\u51fa\u9891\u7387\u3002\u9879\u76ee\u9875\u9762\u53ef\u8bbf\u95ee\uff1ahttps://lalbj.github.io/projects/PAI/\u3002|\n", "2407.21762": "|**2024-07-31**|**ReplanVLM: Replanning Robotic Tasks with Visual Language Models**|Aoran Mei et.al.|[2407.21762](http://arxiv.org/abs/2407.21762)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u9886\u57df\u83b7\u5f97\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u5b83\u4eec\u5728\u6587\u672c\u5206\u6790\u4e0e\u751f\u6210\u3001\u4ee5\u53ca\u5bf9\u4e16\u754c\u5e7f\u6cdb\u77e5\u8bc6\u65b9\u9762\u7684\u51fa\u8272\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u89e3\u6790\u89c6\u89c9\u7ebf\u7d22\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\uff0c\u65e0\u6cd5\u76f4\u63a5\u611f\u77e5\u4e16\u754c\u72b6\u6001\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5728\u63cf\u8ff0\u5f53\u524d\u4e16\u754c\u72b6\u6001\u4e0a\u7684\u4e0d\u8db3\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u901a\u8fc7\u96c6\u6210\u89c6\u89c9\u611f\u77e5\u6a21\u5757\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u589e\u5f3a\u4e86\u673a\u5668\u4eba\u7684\u81ea\u4e3b\u6027\u3002\u5c3d\u7ba1\u5982\u6b64\uff0cVLM\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u4f8b\u5982\uff0c\u5728\u63d0\u4f9b\u51c6\u786e\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u6267\u884c\u9519\u8bef\u7684\u98ce\u9669\u4f9d\u7136\u5b58\u5728\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u7684ReplanVLM\u6846\u67b6\u3002\u8be5\u7814\u7a76\u91cd\u70b9\u5728\u4e8e\u9519\u8bef\u4fee\u6b63\u5e72\u9884\u63aa\u65bd\u3002\u63d0\u51fa\u4e86\u5185\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\u548c\u5916\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\uff0c\u5728\u76f8\u5e94\u7684\u9636\u6bb5\u8fdb\u884c\u9519\u8bef\u7ea0\u6b63\u3002\u53d1\u5c55\u4e86\u4e00\u79cd\u91cd\u89c4\u5212\u7b56\u7565\uff0c\u5f53\u4efb\u52a1\u6267\u884c\u5931\u8d25\u65f6\uff0c\u7528\u4e8e\u91cd\u65b0\u89c4\u5212\u4efb\u52a1\u6216\u4fee\u6b63\u9519\u8bef\u4ee3\u7801\u3002\u5728\u771f\u5b9e\u673a\u5668\u4eba\u548c\u4eff\u771f\u73af\u5883\u4e2d\u8fdb\u884c\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u6846\u67b6\u5177\u6709\u66f4\u9ad8\u7684\u6210\u529f\u7387\u548c\u66f4\u5f3a\u7684\u5f00\u653e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\u3002\u6709\u5173\u5b9e\u9a8c\u7684\u89c6\u9891\u53ef\u4ee5\u5728https://youtu.be/NPk2pWKazJc\u627e\u5230\u3002|\n", "2407.21712": "|**2024-07-31**|**Adaptive Retrieval-Augmented Generation for Conversational Systems**|Xi Wang 
et.al.|[2407.21712](http://arxiv.org/abs/2407.21712)|null|\u5c3d\u7ba1\u5728\u5bf9\u8bdd\u7cfb\u7edf\u5f00\u53d1\u4e2d\u878d\u5165\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u8bb8\u591a\u7814\u7a76\u663e\u793a\u4e86\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u5bf9\u4e8e\u63d0\u4f9b\u4fe1\u606f\u6027\u54cd\u5e94\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u73b0\u6709\u7814\u7a76\u901a\u5e38\u5047\u8bbe\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u7684\u6bcf\u6b21\u56de\u590d\u90fd\u9700\u8981\u68c0\u7d22\u589e\u5f3a\uff0c\u800c\u65e0\u9700\u660e\u786e\u63a7\u5236\u3002\u8fd9\u5f15\u53d1\u4e86\u4e00\u4e2a\u5173\u4e8e\u8fd9\u79cd\u5fc5\u8981\u6027\u7684\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u7cfb\u7edf\u56de\u5e94\u662f\u5426\u9700\u8981\u4f7f\u7528\u5916\u90e8\u77e5\u8bc6\u8fdb\u884c\u589e\u5f3a\u7684\u5fc5\u8981\u6027\u3002\u901a\u8fc7\u5229\u7528\u4eba\u7c7b\u5bf9\u662f\u5426\u9700\u8981\u9002\u5e94\u6027\u589e\u5f3a\u7684\u4e8c\u5143\u9009\u62e9\u8fdb\u884c\u5224\u65ad\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RAGate\u2014\u2014\u4e00\u4e2a\u95f8\u95e8\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u901a\u8fc7\u5206\u6790\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u548c\u76f8\u5173\u8f93\u5165\u6765\u9884\u6d4b\u5bf9\u8bdd\u7cfb\u7edf\u662f\u5426\u9700\u8981RAG\u4ee5\u83b7\u5f97\u6539\u8fdb\u7684\u56de\u590d\u3002\u6211\u4eec\u5728\u6784\u5efa\u548c\u5e94\u7528RAGate\u5230\u5bf9\u8bdd\u6a21\u578b\u4ee5\u53ca\u5bf9\u4e0d\u540c\u5bf9\u8bdd\u573a\u666f\u8fdb\u884c\u8be6\u5c3d\u5206\u6790\u65b9\u9762\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u548c\u5206\u6790\u8868\u660e\uff0cRAGate\u5728\u8bc6\u522b\u9700\u8981RAG\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u56de\u590d\u5e76\u5177\u6709\u9ad8\u751f\u6210\u7f6e\u4fe1\u5ea6\u7684\u7cfb\u7edf\u54cd\u5e94\u65b9\u9762\u6709\u6709\u6548\u5e94\u7528\u3002\u8fd9\u9879\u7814\u7a76\u8fd8\u53d1\u73b0\u4e86\u751f\u6210\u7f6e\u4fe1\u5ea6\u6c34\u5e73\
u4e0e\u589e\u5f3a\u77e5\u8bc6\u7684\u76f8\u5173\u6027\u3002|\n", "2407.21708": "|**2024-07-31**|**CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature**|Stefan Langer et.al.|[2407.21708](http://arxiv.org/abs/2407.21708)|null|\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u5df2\u6807\u6ce8\u6587\u672c\u8bed\u6599\u5e93\u548c\u4eceChebi\u83b7\u53d6\u7684\u77e5\u8bc6\uff0c\u589e\u5f3a\u73b0\u6709\u77e5\u8bc6\uff0c\u5e76\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u8bc6\u522b\u79d1\u5b66\u6587\u732e\u4e2d\u7684\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u901a\u8fc7\u7ed3\u5408\u672c\u4f53\u8bba\u77e5\u8bc6\u4e0eLLM\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u79d1\u5b66\u6587\u732e\u4e2d\u8bc6\u522b\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u7684\u9ad8\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u4ece8000\u7bc7ChemRxiv\u6587\u7ae0\u4e2d\u63d0\u53d6\u8fd9\u4e9b\u5b9e\u4f53\u548c\u89d2\u8272\uff0c\u7136\u540e\u4f7f\u7528\u7b2c\u4e8c\u4e2aLLM\u6784\u5efa\u4e86\u4e00\u4e2a\u5316\u5b66\u5b9e\u4f53\u548c\u89d2\u8272\u7684\u77e5\u8bc6\u56fe\u8c31\uff08CEAR\uff09\uff0c\u8be5\u56fe\u8c31\u4e0d\u4ec5\u4e3aChEBI\u63d0\u4f9b\u4e86\u8865\u5145\u4fe1\u606f\uff0c\u8fd8\u80fd\u5e2e\u52a9\u6269\u5c55\u5176\u5185\u5bb9\u3002|\n", "2407.21693": "|**2024-07-31**|**TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities**|Ming Zhang 
et.al.|[2407.21693](http://arxiv.org/abs/2407.21693)|**[link](https://github.com/konglonggefdu/transfertod)**|\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\uff08TOD\uff09\u7cfb\u7edf\u65e8\u5728\u6709\u6548\u5904\u7406\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\uff0c\u5305\u62ec\u4fe1\u606f\u6536\u96c6\u3002\u5982\u4f55\u51c6\u786e\u3001\u9ad8\u6548\u4e14\u6709\u6548\u5730\u5229\u7528TOD\u8fdb\u884c\u4fe1\u606f\u6536\u96c6\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u4e00\u4e2a\u5173\u952e\u4e14\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u3001\u6307\u4ee4\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u5e76\u80fd\u591f\u901a\u8fc7\u5fae\u8c03\u663e\u8457\u63d0\u9ad8TOD\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6570\u636e\u96c6\u4e3b\u8981\u9488\u5bf9\u7528\u6237\u9a71\u52a8\u7684\u7cfb\u7edf\uff0c\u5e76\u5c40\u9650\u4e8e\u9884\u5b9a\u4e49\u7684\u7279\u5b9a\u573a\u666f\u548c\u69fd\u4f4d\uff0c\u56e0\u6b64\u9700\u8981\u5728TOD\u7684\u4e3b\u52a8\u6027\u3001\u591a\u6837\u6027\u548c\u80fd\u529b\u65b9\u9762\u8fdb\u884c\u6539\u8fdb\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u591a\u9886\u57df\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\u6570\u636e\u6784\u5efa\u8fc7\u7a0b\u4ee5\u53ca\u57fa\u4e8e\u6b64\u8fc7\u7a0b\u751f\u6210\u7684\u4e2d\u6587\u5bf9\u8bdd\u6570\u636e\u96c6\u2014\u2014TransferTOD\uff0c\u8be5\u6570\u636e\u96c6\u771f\u5b9e\u6a21\u62df\u4e86\u572830\u4e2a\u6d41\u884c\u751f\u6d3b\u670d\u52a1\u573a\u666f\u4e2d\u7684\u4eba\u673a\u5bf9\u8bdd\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4f7f\u7528\u5168\u53c2\u6570\u5fae\u8c03\u7684TransferTOD-7B\u6a21\u578b\uff0c\u5c55\u793a\u4e86\u5728\u5404\u79cd\u4e0b\u6e38\u573a\u666f\u4e2d\u7684\u663e\u8457\u7684\u586b\u69fd\u80fd\u529b\u548c\u63d0\u95ee\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u8bc1\u6
60e\u4e86\u5176\u5728\u4e0d\u540c\u6570\u636e\u5e94\u7528\u573a\u666f\u4e0b\u7684\u5f3a\u5927\u6cdb\u5316\u80fd\u529b\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6570\u636e\u4f7f\u7528\u6548\u7387\u548c\u7cfb\u7edf\u6027\u80fd\u3002\u6570\u636e\u5df2\u53d1\u5e03\u4e8ehttps://github.com/KongLongGeFDU/TransferTOD\u3002|\n", "2407.21669": "|**2024-07-31**|**Synth-Empathy: Towards High-Quality Synthetic Empathy Data**|Hao Liang et.al.|[2407.21669](http://arxiv.org/abs/2407.21669)|**[link](https://github.com/aurora-slz/synth-empathy)**|\u8fd1\u5e74\u6765\uff0c\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b9e\u73b0\u51fa\u8272\u540c\u7406\u5fc3\u54cd\u5e94\u80fd\u529b\u5df2\u6210\u4e3a\u4e00\u4e2a\u81f3\u5173\u91cd\u8981\u7684\u524d\u63d0\u3002\u56e0\u6b64\uff0c\u7ba1\u7406\u548c\u7406\u89e3\u540c\u7406\u5fc3\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u65e5\u76ca\u51f8\u663e\u3002\u7136\u800c\uff0c\u540c\u7406\u5fc3\u6570\u636e\u901a\u5e38\u7531\u4eba\u7c7b\u6807\u6ce8\uff0c\u5bfc\u81f4\u6570\u636e\u91cf\u4e0d\u8db3\u548c\u5927\u91cf\u7684\u4eba\u529b\u6d6a\u8d39\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSynth-Empathy\u7684LLM\u57fa\u4e8e\u7684\u6570\u636e\u751f\u6210\u4e0e\u8d28\u91cf\u3001\u591a\u6837\u6027\u9009\u62e9\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u80fd\u591f\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u540c\u7406\u5fc3\u6570\u636e\u5e76\u7b5b\u9009\u6389\u4f4e\u8d28\u91cf\u6570\u636e\u3002\u901a\u8fc7\u5229\u7528\u4f4e\u540c\u7406\u5fc3\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u9ad8\u4e86\u540c\u7406\u5fc3\u54cd\u5e94\u6027\u80fd\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\uff08SoTA\uff09\u7ed3\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u5404\u79cd\u4eba\u7c7b\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5747\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u9645\
u5e94\u7528\u4e2d\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6570\u636e\u91cf\u4e0e\u8d28\u91cf\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u63d0\u4f9b\u4e86\u540c\u7406\u5fc3\u6570\u636e\u751f\u6210\u4e0e\u9009\u62e9\u65b9\u9762\u7684\u89c1\u89e3\u3002|\n", "2407.21593": "|**2024-07-31**|**LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows**|Lukas Teufelberger et.al.|[2407.21593](http://arxiv.org/abs/2407.21593)|null|\u4e3a\u4e86\u63d0\u9ad8\u751f\u4ea7\u529b\u5e76\u4f18\u5316\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u529f\u80fd\u5d4c\u5165\u5e94\u7528\u7a0b\u5e8f\u7684\u8d8b\u52bf\u6b63\u5728\u589e\u957f\uff0c\u4ece\u57fa\u4e8e\u6d4f\u89c8\u5668\u7684\u7f51\u7edc\u5e94\u7528\u5230\u5728\u4e2a\u4eba\u8ba1\u7b97\u673a\u4e0a\u8fd0\u884c\u7684\u539f\u751f\u5e94\u7528\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7cfb\u7edf\u7ea7\u5feb\u6377\u65b9\u5f0f\u5c42\u2014\u2014LLM-for-X\uff0c\u5b83\u901a\u8fc7\u8f7b\u91cf\u7ea7\u5f39\u51fa\u5f0f\u5bf9\u8bdd\u6846\u65e0\u7f1d\u5730\u5411\u4efb\u4f55\u5e94\u7528\u7a0b\u5e8f\u6dfb\u52a0LLM\u670d\u52a1\u3002\u6211\u4eec\u7684\u539f\u751f\u5c42\u901a\u8fc7\u7edf\u4e00\u7684\u804a\u5929\u524d\u7aef\u4f5c\u4e3a\u7f16\u7a0b\u63a5\u53e3\u6216\u81ea\u5b9a\u4e49API\u8c03\u7528\uff0c\u5c06\u524d\u7aef\u5e94\u7528\u7a0b\u5e8f\u4e0e\u6d41\u884c\u7684LLM\u540e\u7aef\uff08\u5982ChatGPT\u548cGemini\uff09\u65e0\u7f1d\u8fde\u63a5\u3002\u6211\u4eec\u5c55\u793a\u4e86LLM-for-X\u5728Microsoft Office\u3001VSCode\u3001Adobe 
Acrobat\u4ee5\u53caOverleaf\u7b49\u6d41\u884c\u7f51\u7edc\u5e94\u7528\u4e2d\u7684\u4f18\u52bf\u3002\u5728\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c06LLM-for-X\u4e0eChatGPT\u7684\u7f51\u9875\u754c\u9762\u8fdb\u884c\u4e86\u4efb\u52a1\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u63d0\u4f9b\u5feb\u901f\u3001\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u7684LLM\u8f85\u52a9\uff0c\u65e0\u9700\u5207\u6362\u4e0a\u4e0b\u6587\u652f\u6301\u5199\u4f5c\u548c\u9605\u8bfb\u4efb\u52a1\uff0c\u540c\u65f6\u5bf9\u7279\u5b9a\u5e94\u7528\u65e0\u7279\u5b9a\u4f9d\u8d56\u3002|\n", "2407.21579": "|**2024-07-31**|**A Performance Study of LLM-Generated Code on Leetcode**|Tristan Coignion et.al.|[2407.21579](http://arxiv.org/abs/2407.21579)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6548\u7387\uff0c\u5e76\u4f7f\u7528\u6765\u81eaLeetCode\u7684\u6570\u636e\u96c6\u8bc4\u4f30\u4e86\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7f16\u5199\u7684\u89e3\u51b3\u65b9\u6848\u7684\u6027\u80fd\u3002\u6211\u4eec\u5bf9\u6bd4\u4e8618\u4e2aLLM\uff0c\u8003\u8651\u4e86\u6a21\u578b\u6e29\u5ea6\u548c\u6210\u529f\u7387\u7b49\u56e0\u7d20\u5bf9\u4ee3\u7801\u6027\u80fd\u7684\u5f71\u54cd\u3002\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5ea6\u91cf\u548c\u6bd4\u8f83LLM\u751f\u6210\u4ee3\u7801\u7684\u901f\u5ea6\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u91c7\u7528\u4e0d\u540cLLM\u65f6\uff0c\u751f\u6210\u7684\u4ee3\u7801\u6027\u80fd\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0cLLM\u751f\u6210\u7684\u4ee3\u7801\u5e73\u5747\u800c\u8a00\u6bd4\u4eba\u7c7b\u7f16\u5199\u7684\u4ee3\u7801\u66f4\u9ad8\u6548\u3002\u8bba\u6587\u8fdb\u4e00\u6b65\u8ba8\u8bba\u4e86\u4f7f\u7528LeetCode\u4f5c\u4e3a\u57fa\u51c6\u6570\u636e\u96c6\u3001\u6f5c\u5728\u6570\u636e\u6c61\u67d3\u5e26\u6765\u7684\u9650\u5236\u4ee5\u53ca\u5e73\u53f0\u6d4b\u91cf\u53ef\u9760\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u8ba4\u4e3
a\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3LLM\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u80fd\u529b\uff0c\u5e76\u4e3a\u8be5\u9886\u57df\u672a\u6765\u7684\u4f18\u5316\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2407.21571": "|**2024-07-31**|**PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning**|Min Jae Jung et.al.|[2407.21571](http://arxiv.org/abs/2407.21571)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u9047\u5230\u91cd\u5927\u6311\u6218\uff0c\u4e3b\u8981\u5728\u4e8e\u707e\u96be\u6027\u9057\u5fd8\u73b0\u8c61\uff0c\u5373\u65b0\u4fe1\u606f\u4f1a\u8986\u76d6\u4e4b\u524d\u83b7\u5f97\u7684\u77e5\u8bc6\u3002\u8fd9\u4e00\u5c40\u9650\u6027\u5bfc\u81f4\u4e86\u5927\u91cf\u73af\u5883\u548c\u7ecf\u6d4e\u8d44\u6e90\u7684\u6d6a\u8d39\u3002\u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aPMoE\uff08Progressive Mixture of Experts with Asymmetric Transformer\uff09\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u65e8\u5728\u901a\u8fc7\u91c7\u7528\u5177\u6709\u6d45\u5c42\u7528\u4e8e\u4e00\u822c\u77e5\u8bc6\u548c\u6df1\u5c42\u7528\u4e8e\u65b0\u77e5\u8bc6\u7684\u4e0d\u5bf9\u79f0\u8bbe\u8ba1\u6765\u6700\u5c0f\u5316\u9057\u5fd8\u3002PMoE\u5728\u6df1\u5c42\u5f15\u5165\u4e86\u9010\u6b65\u589e\u52a0\u7684\u4e13\u5bb6\uff0c\u5e76\u914d\u5907\u4e86\u4e00\u4e2a\u8def\u7531\u5668\uff0c\u8be5\u8def\u7531\u5668\u80fd\u591f\u9ad8\u6548\u5730\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u5408\u9002\u7684\u4e13\u5bb6\u3002 
\u8def\u7531\u5668\u4f4d\u4e8e\u6df1\u5c42\u9644\u8fd1\uff0c\u5229\u7528\u6df1\u5ea6\u7279\u5f81\u805a\u5408\u5df2\u6574\u5408\u7684\u4fe1\u606f\u3002\u8fd9\u4f7f\u5f97\u8def\u7531\u5668\u80fd\u591f\u6709\u6548\u5730\u6267\u884c\u4efb\u52a1\uff0c\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u9010\u6b65\u589e\u52a0\u7684\u6df1\u5c42\u4e13\u5bb6\u3002\u901a\u8fc7\u5728TRACE\u6570\u636e\u96c6\u548c\u901a\u7528\u8bed\u8a00\u7406\u89e3\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7684PMoE\u65b9\u6cd5\u4f18\u4e8e\u5148\u524d\u7684\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2407.21553": "|**2024-07-31**|**CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment**|Akira Kasuga et.al.|[2407.21553](http://arxiv.org/abs/2407.21553)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5ba2\u6237\u4f53\u9a8c\uff08CX\uff09\u6a21\u62df\u5668\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u7528\u6237\u884c\u4e3a\u6a21\u62df\u6765\u8bc4\u4f30\u672a\u6d4b\u8bd5\u7684\u7f51\u7edc\u8425\u9500\u6d3b\u52a8\u7684\u5f71\u54cd\u3002\u8be5\u63d0\u51fa\u7684\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c06\u7528\u6237\u884c\u4e3a\u5386\u53f2\u4e2d\u7684\u5404\u79cd\u4e8b\u4ef6\uff0c\u5982\u67e5\u770b\u5546\u54c1\u3001\u4f7f\u7528\u4f18\u60e0\u5238\u6216\u8d2d\u4e70\u5546\u54c1\u7b49\uff0c\u8868\u793a\u4e3a\u8bed\u4e49\u5d4c\u5165\u5411\u91cf\u3002\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u6a21\u578b\uff0c\u7528\u4e8e\u4ece\u5176LLM\u5d4c\u5165\u4e2d\u9884\u6d4b\u4e8b\u4ef6\u4e4b\u95f4\u7684\u8fc7\u6e21\uff0c\u751a\u81f3\u53ef\u4ee5\u4ece\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u4e2d\u5b66\u4e60\uff0c\u4ece\u800c\u5bf9\u672a\u77e5\u4e8b\u4ef6\u8fdb\u884c\u6cdb\u5316\u3002\u5728web\u8425\u9500\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u8fd9\u4e2a\u8fc7\u6e21\u9884\u6d4b\u6a21\u578b\u6765\u6a21\u62df\u5f53\u65b0\u7684\u8425\u9500\u6d3
b\u52a8\u6216\u4ea7\u54c1\u5c55\u793a\u7ed9\u7528\u6237\u65f6\uff0c\u7528\u6237\u53ef\u80fd\u5982\u4f55\u53cd\u5e94\u4e0d\u540c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u6d88\u9664\u5728\u7ebf\u6d4b\u8bd5\u7684\u9ad8\u6602\u6210\u672c\uff0c\u5e76\u589e\u5f3a\u8425\u9500\u4eba\u5458\u63ed\u793a\u6d1e\u5bdf\u529b\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u6570\u503c\u8bc4\u4f30\u548c\u4f7f\u7528Google\u5546\u54c1\u5546\u5e97\u7684\u5927\u89c4\u6a21\u516c\u5171\u6570\u636e\u96c6\u8fdb\u884c\u7684\u7528\u6237\u7814\u7a76\u8bc1\u660e\u4e86\u6211\u4eec\u6846\u67b6\u7684\u6709\u6548\u6027\u3002|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u4e8e\u7684\u4ee3\u7406\u5df2\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u5e76\u53d8\u5f97\u8d8a\u6765\u8d8a\u6d41\u884c\u3002\u6b64\u5916\uff0c\u89c4\u5212\u80fd\u529b\u662fLLM\u57fa\u4e8e\u4ee3\u7406\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff0c\u6d89\u53ca\u4e0e\u73af\u5883\u7684\u4ea4\u4e92\u548c\u6267\u884c\u52a8\u4f5c\u4ee5\u5b8c\u6210\u89c4\u5212\u4efb\u52a1\uff0c\u901a\u5e38\u5305\u62ec\u4ece\u521d\u59cb\u72b6\u6001\u8fbe\u5230\u9884\u671f\u76ee\u6807\u7684\u8fc7\u7a0b\u3002\u672c\u6587\u7814\u7a76\u4e86\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u589e\u5f3aLLM\u89c4\u5212\u80fd\u529b\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4ee3\u7406\u8bad\u7ec3\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5229\u7528\u4e13\u5bb6\u7ea7\u8f68\u8ff9\u5bf9\u6307\u4ee4\u8c03\u6574LLM\u80fd\u6709\u6548\u63d0\u5347\u5176\u89c4\u5212\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u4e3b\u8981\u96c6\u4e2d\u5728\u4ece\u624b\u52a8\u8bbe\u8ba1\u7684\u4efb\u52a1\u548c\u73af\u5883\u4e2d\u5408\u6210\u8f68\u8ff9\u3002\u521b\u5efa\u8fd9\u4e9b\u73af\u5883\u548c\u4efb\u52a1\u7684\u52b3\u52a8\u5bc6\u96c6\u578b\u8fc7\u7a0b
\u9650\u5236\u4e86\u751f\u6210\u8db3\u591f\u591a\u6837\u6027\u548c\u5e7f\u6cdb\u6027\u7684\u8f68\u8ff9\u7684\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5c40\u9650\u6027\uff0c\u672c\u6587\u63a2\u7d22\u4e86\u81ea\u52a8\u5408\u6210\u591a\u6837\u5316\u73af\u5883\u4ee5\u53ca\u89c4\u5212\u4efb\u52a1\u7684\u6e10\u8fdb\u8303\u56f4\uff0c\u4ece\u7b80\u5355\u5230\u590d\u6742\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u6846\u67b6AgentGen\uff0c\u5229\u7528LLM\u9996\u5148\u751f\u6210\u73af\u5883\uff0c\u968f\u540e\u6839\u636e\u8fd9\u4e9b\u73af\u5883\u751f\u6210\u89c4\u5212\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u4e3a\u4e86\u63d0\u9ad8\u73af\u5883\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u5305\u542b\u5404\u79cd\u9886\u57df\u7279\u5b9a\u6587\u672c\u6bb5\u843d\u7684\u7075\u611f\u8bed\u6599\u5e93\u4f5c\u4e3a\u5408\u6210\u73af\u5883\u7684\u4e0a\u4e0b\u6587\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u89c4\u5212\u4efb\u52a1\u96be\u5ea6\u591a\u6837\u6027\u7684\u7a0b\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5411\u6f14\u5316\u65b9\u6cd5Bi-Evol\uff0c\u8be5\u65b9\u6cd5\u4ece\u5bb9\u6613\u548c\u56f0\u96be\u7684\u4e24\u4e2a\u65b9\u5411\u8fdb\u5316\u89c4\u5212\u4efb\u52a1\uff0c\u4ee5\u5408\u6210\u5177\u6709\u5e73\u6ed1\u96be\u5ea6\u66f2\u7ebf\u7684\u4efb\u52a1\u96c6\u3002\u6765\u81eaAgentBoard\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0cAgentGen\u663e\u8457\u63d0\u9ad8\u4e86LLM\u7684\u89c4\u5212\u80fd\u529b\uff0c\u4f8b\u5982\uff0c\u4f7f\u7528AgentGen\u6307\u4ee4\u8c03\u6574\u7684Llama-3 8B\u5728\u603b\u4f53\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86GPT-3.5\u3002\u6b64\u5916\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u5b83\u751a\u81f3\u8d85\u8d8a\u4e86GPT-4\u3002|\n", "2408.00761": "|**2024-08-01**|**Tamper-Resistant Safeguards for Open-Weight LLMs**|Rishub Tamirisa 
et.al.|[2408.00761](http://arxiv.org/abs/2408.00761)|**[link](https://github.com/rishub-tamirisa/tamper-resistance)**|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5f15\u53d1\u4e86\u5bf9\u6f5c\u5728\u6076\u610f\u7528\u9014\u7684\u5e7f\u6cdb\u62c5\u5fe7\u3002\u9488\u5bf9\u5f00\u653e\u6743\u91cd\u7684LLM\uff0c\u73b0\u6709\u4fdd\u62a4\u63aa\u65bd\u5728\u62b5\u6297\u7be1\u6539\u653b\u51fb\u65b9\u9762\u7f3a\u4e4f\u8db3\u591f\u7684\u7a33\u5b9a\u6027\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6b65\u9aa4\u8f7b\u6613\u5730\u79fb\u9664\u62d2\u7edd\u548c\u9057\u5fd8\u4fdd\u62a4\u63aa\u65bd\u3002\u8fd9\u7c7b\u6f0f\u6d1e\u8981\u6c42\u91c7\u53d6\u65b0\u7684\u65b9\u6cd5\u6765\u786e\u4fdd\u5b89\u5168\u91ca\u653e\u5f00\u653e\u6743\u91cd\u7684LLM\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTAR\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u4e0d\u53ef\u7be1\u6539\u7684\u5b89\u5168\u9632\u62a4\u878d\u5165\u5230\u5f00\u653e\u6743\u91cd\u7684LLM\u4e2d\uff0c\u4f7f\u5f97\u5373\u4f7f\u7ecf\u8fc7\u6570\u5343\u6b65\u7684\u5fae\u8c03\uff0c\u653b\u51fb\u8005\u4e5f\u65e0\u6cd5\u79fb\u9664\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u3002\u5728\u5168\u9762\u7684\u8bc4\u4f30\u548c\u7ea2\u961f\u6d4b\u8bd5\u5206\u6790\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u9ad8\u4e86\u9632\u62a4\u7684\u4e0d\u53ef\u7be1\u6539\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u6027\u529f\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u4e0d\u53ef\u7be1\u6539\u6027\u662f\u4e00\u4e2a\u53ef\u884c\u7684\u95ee\u9898\uff0c\u4e3a\u6539\u8fdb\u5f00\u653e\u6743\u91cdLLM\u7684\u5b89\u5168\u6027\u5f00\u8f9f\u4e86\u6709\u524d\u666f\u7684\u65b0\u9014\u5f84\u3002|\n", "2408.00741": "|**2024-08-01**|**DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency**|Jovan Stojkovic 
et.al.|[2408.00741](http://arxiv.org/abs/2408.00741)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u751f\u6210\u80fd\u529b\u4f7f\u5176\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u6210\u4e3a\u5173\u952e\u7684\u5de5\u4f5c\u8d1f\u8f7d\u3002\u5982\u4eca\uff0cLLM\u63a8\u7406\u96c6\u7fa4\u5904\u7406\u5927\u91cf\u67e5\u8be2\uff0c\u5e76\u5bf9\u670d\u52a1\u8d28\u91cf\u6307\u6807\uff08SLOs\uff09\u6709\u4e25\u683c\u8981\u6c42\u3002\u4e3a\u4e86\u8fbe\u5230\u9884\u671f\u6027\u80fd\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u80fd\u8017\u9ad8\u7684GPU\u4e0a\u6267\u884c\uff0c\u5bfc\u81f4\u63a8\u7406\u96c6\u7fa4\u6d88\u8017\u5927\u91cf\u80fd\u6e90\uff0c\u5e76\u4ea7\u751f\u8fc7\u91cf\u7684\u78b3\u6392\u653e\u3002\u5e78\u8fd0\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u53ef\u4ee5\u901a\u8fc7\u5229\u7528\u63a8\u7406\u8ba1\u7b97\u7279\u6027\u7684\u5f02\u8d28\u6027\u4ee5\u53ca\u5de5\u4f5c\u8d1f\u8f7d\u7684\u6ce2\u52a8\uff0c\u663e\u8457\u63d0\u9ad8\u80fd\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u591a\u6837\u6027\u548c\u52a8\u6001\u73af\u5883\u521b\u9020\u4e86\u4e00\u4e2a\u5de8\u5927\u7684\u641c\u7d22\u7a7a\u95f4\uff0c\u4e0d\u540c\u7684\u7cfb\u7edf\u914d\u7f6e\uff08\u5982\u5b9e\u4f8b\u6570\u91cf\u3001\u6a21\u578b\u5e76\u884c\u6027\u548cGPU\u9891\u7387\uff09\u5bfc\u81f4\u4e0d\u540c\u7684\u80fd\u6e90\u548c\u6027\u80fd\u6298\u8877\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86DynamoLLM\uff0c\u8fd9\u662f\u9996\u4e2a\u9488\u5bf9LLM\u63a8\u7406\u73af\u5883\u7684\u80fd\u6548\u7ba1\u7406\u6846\u67b6\u3002DynamoLLM\u81ea\u52a8\u4e14\u52a8\u6001\u5730\u91cd\u65b0\u914d\u7f6e\u63a8\u7406\u96c6\u7fa4\uff0c\u4ee5\u4f18\u5316\u80fd\u6e90\u548c\u6210\u672c\uff0c\u540c\u65f6\u6ee1\u8db3\u670d\u52a1\u7684\u6027\u80fdSLOs\u3002\u7814\u7a76\u8868\u660e\uff0c\u5728\u670d\u52a1\u5c42\u9762\uff0cDynamoLLM\u80fd\u591f\u8282\u770153%\u7684\u80fd\u6e90\u548c38%\u7684\u64cd\u4f5c\u78b3\u6392\u653e\uff0c\u5e76\u4e3a\u5ba2\u6237\u51cf\u5c1161%\u7684\u6210\u672c\uff0c\u540c\u65f6\u4ecd\u80fd\u6ee1\u8db3\u5ef6\u8fdfSLOs\u3002|\n", "2408.00727": "|**2024-08-01**|**Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions**|Guangzhi Xiong et.al.|[2408.00727](http://arxiv.org/abs/2408.00727)|**[link](https://github.com/teddy-xionggz/medrag)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4e86\u89e3\u51b3\u533b\u7597\u95ee\u9898\u7684\u5de8\u5927\u6f5c\u529b\uff0c\u5b83\u4eec\u80fd\u591f\u638c\u63e1\u5927\u91cf\u533b\u5b66\u77e5\u8bc6\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u4e14\u5728\u77e5\u8bc6\u66f4\u65b0\u65b9\u9762\u5177\u6709\u5c40\u9650\u6027\u3002\u4e3a\u4e86\u589e\u5f3aLLM\u5728\u533b\u5b66\u95ee\u7b54\u65b9\u9762\u7684\u80fd\u529b\uff0c\u63d0\u51fa\u4e86\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\uff0c\u901a\u8fc7\u5916\u90e8\u77e5\u8bc6\u5e93\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u9700\u8981\u591a\u6b21\u4fe1\u606f\u67e5\u8be2\u7684\u590d\u6742\u60c5\u51b5\u4e0b\uff0cRAG\u53ef\u80fd\u4ecd\u7136\u4f1a\u5931\u8d25\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8fed\u4ee3RAG\u65b9\u6cd5\uff08i-MedRAG\uff09\uff0c\u5141\u8bb8LLM\u5728\u6bcf\u6b21\u5c1d\u8bd5\u540e\u8fed\u4ee3\u5730\u63d0\u51fa\u54
0e\u7eed\u95ee\u9898\u3002\u5728\u6bcf\u6b21i-MedRAG\u8fed\u4ee3\u4e2d\uff0c\u540e\u7eed\u95ee\u9898\u7531\u57fa\u672c\u7684RAG\u7cfb\u7edf\u56de\u7b54\uff0c\u5e76\u7528\u4e8e\u6307\u5bfc\u4e0b\u4e00\u4e2a\u8fed\u4ee3\u4e2d\u7684\u67e5\u8be2\u751f\u6210\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ec5\u4f7f\u7528RAG\u7684\u4f20\u7edf\u65b9\u6cd5\u76f8\u6bd4\uff0ci-MedRAG\u663e\u8457\u63d0\u9ad8\u4e86\u5404\u79cdLLM\u5728\u590d\u6742\u95ee\u9898\u4e0a\u7684\u6027\u80fd\uff0c\u8fd9\u4e9b\u95ee\u9898\u662f\u7f8e\u56fd\u533b\u5b66\u751f\u6267\u7167\u8003\u8bd5\uff08USMLE\uff09\u4e34\u5e8a\u6848\u4f8b\u548c\u5927\u89c4\u6a21\u591a\u4efb\u52a1\u8bed\u8a00\u7406\u89e3\uff08MMLU\uff09\u6570\u636e\u96c6\u4e2d\u7684\u77e5\u8bc6\u6d4b\u8bd5\u6240\u6db5\u76d6\u7684\u3002\u7279\u522b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u96f6\u6837\u672ci-MedRAG\u5728GPT-3.5\u4e0a\u53d6\u5f97\u4e8669.68%\u7684\u51c6\u786e\u6027\uff0c\u8d85\u8d8a\u4e86\u6240\u6709\u73b0\u6709\u7684\u63d0\u793a\u5de5\u7a0b\u548c\u5fae\u8c03\u65b9\u6cd5\u5728MedQA\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86i-MedRAG\u5728\u4e0d\u540c\u8fed\u4ee3\u6b21\u6570\u548c\u6bcf\u8fed\u4ee3\u67e5\u8be2\u6570\u91cf\u4e0b\u7684\u6269\u5c55\u7279\u6027\u3002 \u6211\u4eec\u7684\u6848\u4f8b\u7814\u7a76\u663e\u793a\uff0ci-MedRAG\u80fd\u591f\u7075\u6d3b\u5730\u63d0\u51fa\u540e\u7eed\u95ee\u9898\u5f62\u6210\u63a8\u7406\u94fe\uff0c\u6df1\u5165\u5206\u6790\u533b\u7597\u95ee\u9898\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u540e\u7eed\u95ee\u9898\u878d\u5165\u533b\u5b66RAG\u7684\u7814\u7a76\u3002|\n", "2408.00724": "|**2024-08-01**|**An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models**|Yangzhen Wu 
et.al.|[2408.00724](http://arxiv.org/abs/2408.00724)|null|The optimal training configuration of large language models (LLMs), in terms of model size and compute budget, has been studied extensively. How to optimally configure LLMs at inference time, balancing additional inference compute against performance gains, has received far less attention. This paper studies compute-optimal inference: designing models and inference strategies that optimize performance by adjusting the amount of inference-time compute. As a first step toward understanding and designing compute-optimal inference methods, we evaluate multiple inference strategies, including greedy search, majority voting, best-of-N, weighted voting, and their variants, on two different tree search algorithms, across different model sizes and compute budgets. We find that a smaller language model paired with a more advanced decoding algorithm typically achieves a Pareto-optimal trade-off between additional compute cost and performance gain. These results suggest that deploying smaller models, for example on end devices, can be markedly advantageous in budget-constrained settings for improving problem-solving accuracy. For example, we show that Llemma-7B matches the accuracy of Llemma-34B on MATH500 while using about half as many FLOPs. Our findings may apply to any generation task with a well-defined measure of success.|\n", "2408.00722": "|**2024-08-01**|**Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities**|Sunder Ali Khowaja et.al.|[2408.00722](http://arxiv.org/abs/2408.00722)|null|Large language models (LLMs) have recently attracted considerable attention for their adaptability and extensibility in emerging applications, including communication networks. 6G mobile edge computing networks are expected to support LLMs as a service, since they offer ultra-reliable low-latency communication and closed-loop massive connectivity. However, LLMs are vulnerable to data and model privacy issues, which undermines trust in deploying them for user-facing services. This paper examines the security vulnerabilities that arise when fine-tuning LLMs in 6G networks, in particular the membership inference attack. We characterize an attack network that can carry out a membership inference attack when the attacker has access to the model fine-tuned for a downstream task. We show that membership inference attacks are effective for any downstream task, which can lead to personal data breaches when LLMs are offered as a service. Experiments show an attack success rate of up to 92% on a named entity recognition task. Based on this analysis, we discuss possible defense mechanisms and outline research directions for making LLMs more trustworthy in the context of 6G networks.|\n", "2408.00690": "|**2024-08-02**|**Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning**|Trapoom Ukarapol et.al.|[2408.00690](http://arxiv.org/abs/2408.00690)|**[link](https://github.com/trapoom555/language-model-sts-cft)**|Large language models deliver strong performance on natural language understanding tasks, but their resource-intensive nature makes them less accessible. Smaller language models such as MiniCPM scale more sustainably, yet they often underperform without specialized optimization. This paper aims to improve smaller language models by improving the quality of their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, and apply contrastive fine-tuning on the NLI dataset. The results show that this method significantly improves text embedding quality for all three models across various benchmarks, with MiniCPM showing the largest gain, an average performance improvement of 56.33%. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.|\n", "2408.00686": "|**2024-08-01**|**Can Developers Prompt? A Controlled Experiment for Code Documentation Generation**|Hans-Alexander Kruse et.al.|[2408.00686](http://arxiv.org/abs/2408.00686)|null|We conducted a controlled experiment with 20 professionals and 30 computer science students, asking them to generate code documentation for two Python functions using a ChatGPT-style extension for Visual Studio Code. The experimental group freely entered ad-hoc prompts, while the control group executed a predefined few-shot prompt. Our results show that both professionals and students were unaware of, or unable to apply, prompt engineering techniques. Students in particular rated the documentation generated from ad-hoc prompts as significantly less readable, less concise, and less helpful than documentation from the prepared prompt. Some professionals produced higher-quality documentation simply by including the keyword \u201cDocstring\u201d in their ad-hoc prompts. Students wanted more guidance in formulating prompts, whereas professionals appreciated the flexibility of ad-hoc prompting. Participants generally did not consider the output perfect, but rather a tool for iteratively refining documentation. Further research is needed to understand the prompting skills and preferences developers have and the support they need for specific tasks.|\n", "2408.00665": "|**2024-08-01**|**AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models**|Daqin Luo et.al.|[2408.00665](http://arxiv.org/abs/2408.00665)|**[link](https://github.com/tim120526/AutoM3L)**|This paper introduces AutoM3L, an automated multimodal machine learning framework that uses large language models (LLMs) as controllers to automatically construct multimodal training pipelines. AutoM3L understands data modalities and selects suitable models based on user requirements, providing both automation and interactivity. By removing the need for manual feature engineering and hyperparameter optimization, the framework simplifies user involvement and offers customization through instructions, addressing the limitations of earlier rule-based AutoML approaches. We evaluate AutoM3L on six diverse multimodal datasets spanning classification, regression, and retrieval tasks, as well as a broad set of unimodal datasets. The results show that AutoM3L is competitive with or better than traditional rule-based AutoML methods. A user study further confirms its advantages in user-friendliness and usability over rule-based AutoML methods.|\n", "2408.00657": "|**2024-08-01**|**Disentangling Dense Embeddings with Sparse Autoencoders**|Charles O'Neill et.al.|[2408.00657](http://arxiv.org/abs/2408.00657)|null|We present the first application of sparse autoencoders (SAEs) to dense text embeddings from large language models, showing their potential for disentangling semantic concepts. By training SAEs on embeddings of more than 420,000 scientific paper abstracts from computer science and astronomy, we show that the resulting sparse representations preserve semantic fidelity while offering interpretability. We analyze the learned features, explore their behavior across different model capacities, and introduce a new method for identifying \u201cfeature families\u201d, groups of features that represent related concepts at different levels of abstraction. To demonstrate practical utility, we show how these interpretable features can be used to steer semantic search precisely, enabling fine-grained control over query semantics. This work bridges the gap between the semantic richness of dense embeddings and the interpretability of sparse representations. We open-source the embeddings, the trained sparse autoencoders, and the interpreted features, along with a web app for exploring them.|\n", "2408.01423": "|**2024-08-02**|**Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting**|Xiangyu Zhao 
et.al.|[2408.01423](http://arxiv.org/abs/2408.01423)|null|Large language models (LLMs) show remarkable proficiency across a wide range of natural language processing (NLP) tasks, but their performance depends on the prompt design strategy used. There are two main approaches to prompt design. The first is to manually craft prompts for specific datasets, known as Expert-Designed Prompts (EDPs). Once created, these prompts cannot be changed, and their effectiveness is bounded by the expertise of their human designers. Applied to LLMs, this static approach uses the same strategy for simple and complex problems alike, which wastes tokens on simple ones. The second approach has the LLM generate prompts itself, known as LLM-Derived Prompts (LDPs). LDPs provide tailored solutions to specific problems, mitigating the limitations of EDPs, but their performance may degrade on complex problems because errors can accumulate during solution planning. To address these challenges, we propose Prompt Recursive Search (PRS), a novel framework that uses the LLM to generate problem-specific solutions while conserving tokens. The framework assesses problem complexity and includes an adjustable structure to reduce the likelihood of errors. We validate PRS through extensive experiments with LLMs of different parameter counts on a range of datasets across multiple domains. Compared with the Chain of Thought (CoT) method, PRS with the Llama3-7B model raises accuracy on the BBH dataset by 8%, achieving a 22% improvement.|\n", "2408.01420": "|**2024-08-02**|**Mission Impossible: A Statistical Perspective on Jailbreaking LLMs**|Jingtong Su 
et.al.|[2408.01420](http://arxiv.org/abs/2408.01420)|null|Large language models (LLMs) are trained on vast amounts of text with limited quality control. As a result, they can exhibit unintended or even harmful behaviors, such as leaking information, producing fake news, or generating hate speech. Countermeasures, commonly called preference alignment, include fine-tuning a pretrained LLM on carefully crafted text examples of the desired behavior. Even so, empirical evidence shows that preference-aligned LLMs can still be enticed into harmful behavior. This so-called jailbreaking of LLMs is typically achieved by modifying the input prompt to mislead the model. This paper offers theoretical insight into preference alignment and jailbreaking from a statistical perspective. Under our framework, we first show that a pretrained LLM will mimic harmful behavior if it is present in the training corpus. Under the same framework, we introduce a statistical notion of alignment and lower-bound the jailbreaking probability, showing that jailbreaking is unpreventable under reasonable assumptions. Building on these insights, we propose a modification to the widely used alignment strategy, reinforcement learning from human feedback (RLHF). Specifically, we introduce E-RLHF, a simple modification of the RLHF objective that aims to increase the likelihood of safe responses. E-RLHF adds no training cost and is compatible with other methods. Empirically, E-RLHF outperforms RLHF on all alignment problems posed by the AdvBench and HarmBench projects without sacrificing model performance as measured by MT-Bench.|\n", "2408.01419": "|**2024-08-02**|**DebateQA: Evaluating Question Answering on Debatable Knowledge**|Rongwu Xu 
et.al.|[2408.01419](http://arxiv.org/abs/2408.01419)|**[link](https://github.com/pillowsofwind/debateqa)**|The rise of large language models (LLMs) has made it possible to seek answers to inherently debatable questions from LLM chatbots, which calls for a reliable way to evaluate their ability. Traditional QA benchmarks, which assume fixed answers, are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial answers that capture a variety of perspectives. We develop two metrics: Perspective Diversity, which evaluates how comprehensively perspectives are covered, and Dispute Awareness, which assesses whether the LLM acknowledges that the question is debatable. Experiments show that both metrics align with human preferences and are stable across different underlying models. Using DebateQA with these two metrics, we evaluate 12 popular LLMs and retrieval-augmented generation methods. Our findings reveal that while LLMs are generally good at recognizing debatable issues, their ability to provide comprehensive answers covering diverse perspectives varies considerably.|\n", "2408.01417": "|**2024-08-02**|**Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs**|Yilun Hua et.al.|[2408.01417](http://arxiv.org/abs/2408.01417)|null|Humans spontaneously use increasingly efficient language as an interaction progresses, by adapting and forming ad-hoc conventions. This phenomenon has been studied extensively using reference games, and it shows properties of human language that go beyond conveying intent. It remains unexplored whether multimodal large language models (MLLMs) similarly become more efficient communicators during an interaction, and what mechanisms they might use to do so. We introduce ICCA, an automated framework for evaluating such conversational adaptation as an in-context behavior in MLLMs. We evaluate several state-of-the-art MLLMs and observe that, while they may understand the increasingly efficient language of their interlocutor, they do not spontaneously make their own language more efficient over time. This ability can be elicited only in some models (e.g., GPT-4) and only with heavy-handed prompting. This suggests that current training regimes do not give rise to this property of linguistic interaction, even though it is a common hallmark of human language. ICCA is open-sourced at https://github.com/lil-lab/ICCA.|\n", "2408.01380": "|**2024-08-02**|**Coalitions of Large Language Models Increase the Robustness of AI Agents**|Prattyush Mangal 
et.al.|[2408.01380](http://arxiv.org/abs/2408.01380)|null|The emergence of large language models (LLMs) has fundamentally changed how we interact with digital systems and has driven research into LLM-powered AI agents that assist with everyday workflows. LLMs are powerful and can exhibit some emergent properties, but they are not logical reasoners, and they often struggle to perform well on every sub-task an AI agent carries out when executing a workflow. Existing work addresses this shortfall either with large-scale generalized pretraining or with specialized fine-tuning for tool use. We instead assess whether a coalition of pretrained models, each specialized in particular sub-tasks, can match the performance of single-model agents. The coalition approach shows potential for building robustness and reducing the operating costs of these AI agents by exploiting the traits exhibited by specific models. Our findings suggest that considering a collection of pretrained models can reduce the need for fine-tuning, and we believe the approach can be applied to other non-agentic systems that use LLMs.|\n", "2408.01363": "|**2024-08-02**|**Toward Automatic Relevance Judgment using 
Vision--Language Models for Image--Text Retrieval Evaluation**|Jheng-Hong Yang et.al.|[2408.01363](http://arxiv.org/abs/2408.01363)|null|This paper explores the potential of vision-language models (VLMs) for relevance judgment. It evaluates VLMs including CLIP, LLaVA, and GPT-4V in a zero-shot fashion on a large-scale retrieval task designed for multimedia content creation. Preliminary experiments show the following. (1) Performance: LLaVA and GPT-4V, covering open-source and proprietary visual-instruction-tuned large language models (LLMs), achieve a notable Kendall\u2019s \u03c4 \u2248 0.4 against human relevance judgments, surpassing the CLIPScore metric. (2) Preference and bias: although CLIPScore is strongly preferred, the LLMs are less biased toward CLIP-based retrieval systems. (3) Agreement: GPT-4V\u2019s score distribution aligns more closely with human judgments, with a Cohen\u2019s \u03ba of about 0.08, well above the roughly -0.096 obtained by CLIPScore. These findings suggest that LLM-based VLMs have potential for improving relevance judgments. Overall, the study shows the value of vision-language models for relevance assessment, particularly in zero-shot retrieval tasks. By comparing the performance of different models, it highlights the potential advantages of LLMs in multimedia content creation and their promise for improving relevance judgments.|\n", "2408.01355": "|**2024-08-02**|**Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs**|Peng Ding 
et.al.|[2408.01355](http://arxiv.org/abs/2408.01355)|**[link](https://github.com/njunlp/hallu-pi)**|Multi-modal large language models (MLLMs) perform remarkably well on a range of visual-language understanding and generation tasks. However, they occasionally generate content that is inconsistent with the given image, a problem known as \u201challucination\u201d. Prior work has focused mainly on evaluating hallucination with standard, unperturbed benchmarks, overlooking the perturbed inputs that are common in real-world scenarios, such as image cropping or blurring, and that are critical for a comprehensive assessment of MLLM hallucination. To bridge this gap, we propose Hallu-PI, the first benchmark designed to evaluate hallucination in MLLMs under perturbed inputs. Hallu-PI consists of seven perturbed scenarios with 1,260 perturbed images from 11 object types. Each image comes with detailed annotations covering fine-grained hallucination types such as existence, attribute, and relation. These annotations are paired with a rich set of questions, making Hallu-PI suitable for both discriminative and generative tasks. In extensive experiments on mainstream MLLMs such as GPT-4V and Gemini-Pro Vision, the models hallucinate significantly on Hallu-PI, which is not observed in unperturbed scenarios. Our study also reveals a severe bias in how MLLMs handle different types of hallucination. We therefore design two baselines specifically for perturbed scenarios, Perturbed-Reminder and Perturbed-ICL. We hope this study draws attention to the limitations of MLLMs on perturbed inputs and spurs further work on the problem. Our code and datasets are publicly available at https://github.com/NJUNLP/Hallu-PI.|\n", "2408.01354": "|**2024-08-02**|**MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code**|Kaiwen Ning 
et.al.|[2408.01354](http://arxiv.org/abs/2408.01354)|**[link](https://github.com/KevinHeiwa/MCGTM)**|With the rise of large language models (LLMs), many software service providers (SSPs) are developing LLMs customized for code generation, such as CodeLlama and Copilot. Attackers can exploit these LLMs to generate malware, for example by automating the creation of advanced phishing malware, which threatens the software ecosystem. To address this, we first conduct an empirical study and design MCGTest, a prompt dataset of 406 malicious code generation tasks that took about 400 hours of work to build. Using this dataset, we propose MCGMark, the first robust, code-structure-aware, and encodable watermarking approach for tracing LLM-generated malicious code. We embed encodable information by controlling token selection and preserve output quality using probabilistic outliers. We further strengthen the watermark's robustness by taking the structural features of malicious code into account, so that the watermark is not embedded in easily modified positions such as comments. We validate the effectiveness and robustness of MCGMark on DeepSeek-Coder, where it achieves an embedding success rate of 88.9% within a maximum output limit of 400 tokens. It also shows strong robustness and has minimal impact on the quality of the output code. Our approach helps SSPs trace LLM-generated malicious code and hold the responsible parties accountable.|\n", "2408.01346": "|**2024-08-02**|**Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks**|Anders Giovanni M\u00f8ller et.al.|[2408.01346](http://arxiv.org/abs/2408.01346)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u662f\u4fc3\u8fdb\u793e\u4f1a\u8ba1\u7b97\u9886\u57df\u590d\u6742\u6587\u672c\u7406\u89e3\u4efb\u52a1\u7684\u6709\u529b\u5de5\u5177\u3002\u5b83\u4eec\u7684\u591a\u529f\u80fd\u6027\u867d\u7136\u6709\u76ca\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u5728\u8be5\u9886\u57df\u5efa\u7acb\u6807\u51c6\u5316\u6700\u4f73\u5b9e\u8df5\u7684\u969c\u788d\u3002\u4e3a\u4e86\u63d0\u4f9b\u4e0d\u540c\u7b56\u7565\u4ef7\u503c\u7684\u6e05\u6670\u5ea6\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u73b0\u4ee3\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u65b9\u6cd5\u572823\u4e2a\u793e\u4f1a\u77e5\u8bc6\u4efb\u52a1\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u6307\u51fa\u4e86\u4e09\u4e2a\u6700\u4f73\u5b9e\u8df5\uff1a\u9009\u62e9\u5177\u6709\u66f4\u5927\u8bcd\u6c47\u91cf\u548c\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u6a21\u578b\uff1b\u907f\u514d\u7b80\u5355\u7684\u96f6\u6b21\u5c1d\u8bd5\uff0c\u800c\u503e\u5411\u4e8e\u589e\u5f3a\u63d0\u793a\u7684\u4eba\u5de5\u667a\u80fd\u65b9\u6cd5\uff1b\u5728\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8003\u8651\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u4f7f\u7528\u66f4\u590d\u6742\u7684\u6307\u4ee4\u8c03\u6574\uff0c\u4ec5\u5f53\u8bad\u7ec3\u6570\u636e\u66f4\u4e3a\u4e30\u5
bcc\u65f6\u624d\u8fd9\u6837\u505a\u3002 \u8bf7\u6ce8\u610f\uff0c\u8fd9\u6bb5\u7ffb\u8bd1\u6587\u672c\u4e2d\u5e76\u672a\u5305\u542b\u4efb\u4f55\", \"\u5b57\u7b26\u3002|\n", "2408.01334": "|**2024-08-02**|**A Backbone for Long-Horizon Robot Task Understanding**|Xiaoshuai Chen et.al.|[2408.01334](http://arxiv.org/abs/2408.01334)|null|\u4e3a\u4e86\u5e94\u5bf9\u957f\u65f6\u7a0b\u4efb\u52a1\u4e2d\u7aef\u5230\u7aef\u673a\u5668\u4eba\u5b66\u4e60\u7684\u4e0d\u53ef\u9884\u6d4b\u6027\u4e0e\u6cdb\u5316\u80fd\u529b\u5dee\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eTherblig\u7684\u9aa8\u67b6\u6846\u67b6\uff08TBBF\uff09\uff0c\u65e8\u5728\u589e\u5f3a\u673a\u5668\u4eba\u4efb\u52a1\u7406\u89e3\u4e0e\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u6846\u67b6\u5229\u7528Therblig\uff08\u57fa\u672c\u52a8\u4f5c\u5143\u7d20\uff09\u4f5c\u4e3a\u9aa8\u67b6\uff0c\u5c06\u9ad8\u7ea7\u673a\u5668\u4eba\u4efb\u52a1\u5206\u89e3\u4e3a\u57fa\u672c\u673a\u5668\u4eba\u914d\u7f6e\uff0c\u7136\u540e\u7ed3\u5408\u5f53\u524d\u7684\u57fa\u7840\u6a21\u578b\u6765\u63d0\u5347\u4efb\u52a1\u7406\u89e3\u3002 \u8be5\u65b9\u6cd5\u5305\u542b\u4e24\u4e2a\u9636\u6bb5\uff1a\u79bb\u7ebf\u8bad\u7ec3\u4e0e\u5728\u7ebf\u6d4b\u8bd5\u3002\u5728\u79bb\u7ebf\u8bad\u7ec3\u9636\u6bb5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Meta-RGate SynerFusion\uff08MGSF\uff09\u7f51\u7edc\uff0c\u7528\u4e8e\u8de8\u4efb\u52a1\u7cbe\u786e\u7684Therblig\u5206\u5272\u3002\u5728\u7ebf\u6d4b\u8bd5\u9636\u6bb5\uff0c\u901a\u8fc7\u6536\u96c6\u65b0\u4efb\u52a1\u7684\u4e00\u6b21\u6f14\u793a\uff0cMGSF\u7f51\u7edc\u63d0\u53d6\u9ad8\u9636\u77e5\u8bc6\uff0c\u5e76\u901a\u8fc7Action Registration\uff08ActionREG\uff09\u7f16\u7801\u5165\u56fe\u50cf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u91c7\u7528Large Language Model\uff08LLM\uff09-Alignment Policy for Visual 
Correction\uff08LAP-VC\uff09\u6765\u786e\u4fdd\u7cbe\u786e\u7684\u52a8\u4f5c\u6267\u884c\uff0c\u4ece\u800c\u5728\u65b0\u578b\u673a\u5668\u4eba\u573a\u666f\u4e2d\u5b9e\u73b0\u8f68\u8ff9\u8f6c\u79fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0cTherblig\u5206\u5272\u8fbe\u5230\u4e8694.37%\u7684\u53ec\u56de\u7387\uff0c\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u7684\u5728\u7ebf\u673a\u5668\u4eba\u6d4b\u8bd5\u4e2d\uff0c\u5bf9\u4e8e\u7b80\u5355\u548c\u590d\u6742\u573a\u666f\u7684\u6210\u529f\u7387\u5206\u522b\u8fbe\u5230\u4e8694.4%\u548c80%\u3002\u8865\u5145\u6750\u6599\u53ef\u5728\u4ee5\u4e0b\u7f51\u7ad9\u83b7\u53d6\uff1ahttps://sites.google.com/view/therbligsbasedbackbone/home|\n", "2408.02651": "|**2024-08-05**|**Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?**|Mohammad Bahrami Karkevandi et.al.|[2408.02651](http://arxiv.org/abs/2408.02651)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u9053\u5fb7\u6027\u4ecd\u7136\u5b58\u5728\u4e89\u8bae\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u8bad\u7ec3\u57fa\u4e8e\u4e92\u8054\u7f51\u6587\u672c\u8bed\u6599\u5e93\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u62c5\u5fe7\uff0c\u5df2\u7ecf\u5f00\u53d1\u4e86\u5bf9\u9f50\u6280\u672f\u6765\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u5171\u53ef\u7528\u6027\u548c\u5b89\u5168\u6027\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fd9\u4e9b\u6a21\u578b\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\u4f3c\u4e4e\u4ecd\u7136\u5b58\u5728\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u201c\u53cd\u5411\u5bf9\u9f50\u201dLLM\u7684\u6982\u5ff5\u2014\u2014\u5229\u7528\u5bf9\u6297\u89e6\u53d1\u5668\u9006\u8f6c\u5176\u5bf9\u9f50\u8fc7\u7a0b\u3002\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5982\u8f6f\u5d4c\u5165\u63d0\u793a\u3001\u624b\u52
a8\u6784\u5efa\u7684\u63d0\u793a\u548c\u57fa\u4e8e\u68af\u5ea6\u7684\u81ea\u52a8\u63d0\u793a\uff0c\u5728\u9ed1\u76d2\u6a21\u578b\u4e0a\u7531\u4e8e\u9700\u8981\u8bbf\u95ee\u6a21\u578b\u548c\u4ea7\u751f\u6709\u9650\u7684\u624b\u52a8\u6784\u5efa\u63d0\u793a\u7684\u9700\u6c42\u800c\u53d6\u5f97\u4e86\u6709\u9650\u7684\u6210\u529f\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5bb9\u6613\u88ab\u963b\u65ad\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u4f18\u5316\u5bf9\u6297\u89e6\u53d1\u5668\uff0c\u4ec5\u9700\u5bf9\u76ee\u6807\u6a21\u578b\u8fdb\u884c\u63a8\u7406API\u8bbf\u95ee\u4ee5\u53ca\u4e00\u4e2a\u5c0f\u578b\u4ee3\u7406\u6a21\u578b\u5373\u53ef\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528BERTScore\u4e3a\u57fa\u7840\u7684\u5956\u52b1\u51fd\u6570\uff0c\u589e\u5f3a\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u5728\u65b0\u9ed1\u76d2\u6a21\u578b\u4e0a\u7684\u53ef\u79fb\u690d\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5982\u4f55\u5728\u672a\u6d4b\u8bd5\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\u63d0\u9ad8\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u7684\u8868\u73b0\u3002|\n", "2408.02632": "|**2024-08-05**|**SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models**|Muxi Diao 
et.al.|[2408.02632](http://arxiv.org/abs/2408.02632)|null|As large language models (LLMs) continue to grow in capability and influence, ensuring their safety and preventing harmful outputs has become crucial. One promising approach to these concerns is to train models to automatically generate adversarial prompts for red teaming. However, the continual evolution of vulnerabilities in LLMs leaves current adversarial methods struggling to specifically target and explore the weaknesses of these models. To tackle these challenges, we introduce the Self-Evolving Adversarial Safety (SEAS) optimization framework, which improves safety by leveraging data generated by the model itself. SEAS operates in three iterative stages: initialization, attack, and adversarial optimization, aiming to improve the robustness and safety of both the red-team model and the target model. The framework reduces reliance on manual testing and significantly strengthens the safety of LLMs. Our contributions include a novel adversarial framework and a comprehensive safety dataset; after three iterations, the target model reaches a safety level comparable to GPT-4, while the red-team model shows a marked increase in attack success rate (ASR) against advanced models.|\n",
"2408.02599": "|**2024-08-05**|**Progressively Selective Label Enhancement for Language Model Alignment**|Biao Liu et.al.|[2408.02599](http://arxiv.org/abs/2408.02599)|null|Large language models have demonstrated impressive capabilities across a variety of language tasks, but they may generate content that does not match human expectations, raising ethical and legal concerns. It is therefore important to explore the limitations of these models and impose restrictions that ensure safety and compliance, with reinforcement learning from human feedback (RLHF) being the primary method. However, because of stability and scalability challenges in the RLHF stage, researchers are exploring alternative methods that achieve comparable effects. These methods often rely on large, high-quality datasets and use generated data inefficiently. To address this problem, we propose PSLE (Progressively Selective Label Enhancement for Language Model Alignment), a framework that makes full use of all generated data by guiding the model with principles so that its outputs align with human expectations. Using a dynamically updated threshold, our approach ensures efficient data utilization by incorporating all generated responses and weighting them by their corresponding reward scores. Experimental results on multiple datasets demonstrate the effectiveness of PSLE compared with existing language model alignment methods.|\n",
"2408.02584": "|**2024-08-05**|**Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization**|Ankan Mullick
et.al.|[2408.02584](http://arxiv.org/abs/2408.02584)|null|As the volume of digital information keeps growing, users need effective ways to extract key insights from lengthy documents. Aspect-based summarization offers a targeted approach, generating summaries focused on specific aspects within a document. Despite advances in aspect-based summarization research, there is a continuing need to improve model performance. Given the potential of large language models (LLMs) in natural language processing tasks, particularly summarization, this paper explores fine-tuning LLMs for aspect-based summarization. We evaluate the impact of fine-tuning open-source foundation LLMs, including Llama2, Mistral, Gemma, and Aya, on a publicly available domain-specific aspect-based summarization dataset. We hypothesize that this approach enables these models to identify and extract aspect-related information effectively, yielding higher-quality aspect-based summaries than state-of-the-art methods. We establish a comprehensive evaluation framework that compares the fine-tuned LLMs against competing aspect-based summarization methods and against the vanilla versions of the same LLMs before fine-tuning. Our work contributes to the field of aspect-based summarization by demonstrating that fine-tuning LLMs can produce high-quality aspect-based summaries. It also opens the door to further exploration of LLMs for targeted information extraction tasks across different NLP domains.|\n",
"2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|This paper examines the ability of open-source and API-based large language models (LLMs) to collaborate in text-based games in complex, imperfect-information environments, with particular attention to their potential in non-English settings. The study compares these models with other types of agents and uses a Theory of Mind (ToM) planning technique to evaluate their capabilities in imperfect-information games that require multi-agent collaboration. An external tool is introduced to handle the dynamic and very large action space of this card game. Our results reveal a performance gap between current LLMs and reinforcement learning models on such high-level tasks. Despite this gap, LLMs demonstrate ToM capabilities in the game setting: they can understand the behavior of allies and opponents and establish collaboration with allies, thereby consistently improving their performance. To encourage further research and understanding, we have made our codebase publicly available.|\n",
"2408.02549": "|**2024-08-05**|**Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning**|Hao Zhou
et.al.|[2408.02549](http://arxiv.org/abs/2408.02549)|null|This paper investigates a novel edge-cloud architecture for deploying foundation models in 6G networks. The specific goal is to minimize the service delay of foundation models through radio resource allocation and task offloading. The work has three parts: first, it introduces the communication system model, that is, allocating radio resources and calculating the link capacity that supports the transmission of generated content; second, it presents the foundation model inference model, which is used to calculate the delay of content generation; finally, it proposes a novel in-context learning method to optimize task offloading decisions. The method exploits the inference capabilities of foundation models and avoids the dedicated model training or fine-tuning that conventional machine learning algorithms require. Simulation results show that the proposed edge-cloud deployment and in-context-learning task offloading method achieve satisfactory generation service quality without dedicated model training or fine-tuning.|\n",
"2408.02545": "|**2024-08-05**|**RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation**|Daniel Fleischer
et.al.|[2408.02545](http://arxiv.org/abs/2408.02545)|**[link](https://github.com/intellabs/ragfoundry)**|Implementing retrieval-augmented generation (RAG) systems is inherently complex, requiring a deep understanding of the data, the use cases, and intricate design decisions. Evaluating these systems is also a significant challenge, requiring a multi-faceted assessment of both retrieval accuracy and generation quality. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference, and evaluation into a single workflow, which makes it easier to create data-augmented datasets for training and evaluating LLMs in RAG settings. This integration enables rapid prototyping and experimentation with RAG techniques, letting users easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showing consistent improvements across three knowledge-intensive datasets. The code is released as open source at https://github.com/IntelLabs/RAGFoundry.|\n",
"2408.02544": "|**2024-08-05**|**Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions**|Xinbei
Ma et.al.|[2408.02544](http://arxiv.org/abs/2408.02544)|**[link](https://github.com/xbmxb/EnvDistraction)**|This paper investigates the faithfulness of multimodal large language model (MLLM) agents in graphical user interface (GUI) environments, addressing the research question of whether multimodal GUI agents can be distracted by environmental context. We propose a general setting in which both the user and the agent are benign, and the environment, while not malicious, contains content unrelated to the task. Using our simulated dataset, we evaluate a wide range of MLLMs as GUI agents under three working patterns with different levels of perception. Experimental results show that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions. While recent studies focus mainly on the action accuracy (that is, helpfulness) of multimodal agents, our findings reveal that these agents may behave unfaithfully when facing environmental distractions. Furthermore, we take an adversarial perspective and implement environment injection, demonstrating that such unfaithfulness can be exploited and can lead to unexpected risks.|\n",
"2408.02535": "|**2024-08-05**|**Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph**|Zhao Kaichen et.al.|[2408.02535](http://arxiv.org/abs/2408.02535)|null|Visual language navigation (VLN) is an important research topic in the field of intelligent agents; it aims to enable an agent to understand its surroundings and complete navigation tasks. Instructions in VLN tasks can be categorized as coarse-grained or fine-grained. Fine-grained instructions describe the whole task step by step, whereas coarse-grained instructions give an abstract task description that is closer to how people naturally express themselves. Most existing work focuses on fine-grained instructions and overlooks the abstract instructions found in everyday life. To overcome this challenge, we consider coarse-grained instructions in VLN through event knowledge enhancement. Specifically, we first propose a prompt-based method that integrates multiple mainstream benchmark datasets into a comprehensive event knowledge graph, named VLN-EventKG. Through collaboration between small and large language models, we implement EventNav, an event-based navigation method that handles coarse-grained instruction input for navigation planning in VLN tasks. In addition, we design a novel dynamic history backtracking module that corrects potentially erroneous action plans in real time. Experimental results on various public benchmarks show that our knowledge-enhanced method based on VLN-EventKG achieves a success-rate advantage of more than 5% on VLN tasks with coarse-grained instructions. Our project is available at .|\n",
"2408.02509": "|**2024-08-05**|**Practical Attacks against Black-box Code Completion Engines**|Slobodan Jenko et.al.|[2408.02509](http://arxiv.org/abs/2408.02509)|null|This paper presents INSEC, a novel attack that steers LLM-based code completion engines toward generating code with security vulnerabilities. In line with how most commercial completion engines, such as GitHub Copilot, are accessed, the attack requires only black-box query access to the target engine and no knowledge of its internals. The attack works by inserting a malicious attack string as a short comment in the completion input. To design effective attack strings, we construct a set of specialized initialization schemes and refine them further through an optimization procedure. We validate INSEC on open-source models, on black-box commercial services such as the OpenAI API and GitHub Copilot, and on 16 critical error categories across five programming languages. Experimental results show that, compared with existing techniques, INSEC raises the likelihood that the considered completion engines generate insecure code by more than 50%, while they retain the ability to produce functionally correct code. The attack also has low resource requirements: it can be developed for under ten US dollars and runs on commodity hardware.|\n",
"2408.03302": "|**2024-08-06**|**TextIM: Part-aware Interactive Motion Synthesis from Text**|Siyuan Fan et.al.|[2408.03302](http://arxiv.org/abs/2408.03302)|null|This paper presents TextIM, a novel framework for synthesizing text-driven human interactive motions, with a focus on precise alignment of part-level semantics. Existing methods often overlook the critical role of the interacting body parts and fail to adequately capture and align part-level semantics, resulting in inaccurate or even erroneous motions. To address these issues, TextIM uses a decoupled conditional diffusion framework to strengthen the detailed alignment between interactive motions and the semantic intent in the corresponding text descriptions. Our approach uses a large language model, acting in the role of a human brain, to identify the interacting body parts and understand the interaction semantics, so that complex and subtle interactive motions can be generated. Guided by these refined part motions, TextIM then extends them into coherent whole-body motion. We design a spatial coherence module that uses a part graph convolutional network to complement the whole-body motion while maintaining consistency and harmony across body parts. For training and evaluation, we carefully selected and re-labeled interactive motions from HUMANML3D to build a specialized dataset. Experimental results show that TextIM produces semantically accurate human interactive motions and significantly improves the realism and applicability of synthesized interactive motions in diverse scenarios, including interactions with deformable and dynamically changing objects.|\n",
"2408.03297": "|**2024-08-06**|**KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models**|Ruizhe Zhang
et.al.|[2408.03297](http://arxiv.org/abs/2408.03297)|null|By integrating external knowledge, retrieval-augmented generation (RAG) has become an effective strategy for mitigating the hallucinations that large language models exhibit on knowledge-intensive tasks. However, when non-parametric external supporting evidence is combined with internal parametric knowledge, knowledge conflicts inevitably arise and can cause confusion in the model's responses. To improve the knowledge selection of language models across different contexts, some research has focused on refining their behavior patterns through instruction tuning. Yet, lacking explicit negative signals and comparative objectives, models fine-tuned in this way may still behave undesirably in complex, realistic retrieval scenarios. To address this challenge, we propose Knowledge-aware Preference Optimization (KaPO), which aims at controllable knowledge selection in real retrieval scenarios. Concretely, we explore and simulate error types across different context combinations and use preference optimization to learn how to avoid these negative signals. At the same time, by balancing response length against the proportion of preference data representing different behavior patterns, we improve the adaptability and noise robustness of the language model. Experimental results show that KaPO outperforms previous methods for handling knowledge conflicts by more than 37% and generalizes robustly across various out-of-distribution datasets.|\n",
"2408.03281": "|**2024-08-07**|**StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation**|Boxi Cao
et.al.|[2408.03281](http://arxiv.org/abs/2408.03281)|**[link](https://github.com/c-box/structeval)**|\u8bc4\u4ef7\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f00\u53d1\u7684\u5173\u952e\u5de5\u5177\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u5f0f\u901a\u5e38\u91c7\u7528\u5355\u4e00\u6307\u6807\u8bc4\u4f30\u6a21\u5f0f\uff0c\u5bf9\u6bcf\u4e2a\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u5728\u533a\u5206\u6a21\u578b\u662f\u5426\u771f\u6b63\u5177\u5907\u6240\u9700\u80fd\u529b\u8fd8\u662f\u4ec5\u4ec5\u8bb0\u5fc6\u6216\u731c\u6d4b\u7279\u5b9a\u95ee\u9898\u7684\u7b54\u6848\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aStructEval\u7684\u65b0\u8bc4\u4f30\u6846\u67b6\u3002\u4ece\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u51fa\u53d1\uff0cStructEval\u901a\u8fc7\u5728\u591a\u4e2a\u8ba4\u77e5\u5c42\u6b21\u548c\u5173\u952e\u6982\u5ff5\u4e0a\u8fdb\u884c\u7ed3\u6784\u5316\u7684\u8bc4\u4f30\u6765\u6df1\u5316\u548c\u62d3\u5bbd\u8bc4\u4f30\u8303\u56f4\uff0c\u4ece\u800c\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u4f9b\u5168\u9762\u3001\u7a33\u5065\u4e14\u4e00\u81f4\u7684\u8bc4\u4f30\u3002\u5728\u4e09\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0cStructEval\u662f\u4e00\u4e2a\u53ef\u9760\u7684\u5de5\u5177\uff0c\u80fd\u591f\u62b5\u6297\u6570\u636e\u6c61\u67d3\u7684\u98ce\u9669\u5e76\u51cf\u5c11\u6f5c\u5728\u504f\u89c1\u7684\u5e72\u6270\uff0c\u4ece\u800c\u63d0\u4f9b\u5173\u4e8e\u6a21\u578b\u80fd\u529b\u66f4\u53ef\u9760\u548c\u4e00\u81f4\u7684\u7ed3\u8bba\u3002\u6211\u4eec\u7684\u6846\u67b6\u8fd8\u4e3a\u672a\u6765\u539f\u7406\u6027\u548c\u53ef\u4fe1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u534f\u8bae\u7684\u8bbe\u8ba1\u63d0\u4f9b\u4e86\u542f\u793a\u3002|\n", "2408.03256": "|**2024-08-06**|**Synthesizing Text-to-SQL Data from Weak and Strong LLMs**|Jiaxi Yang 
et.al.|[2408.03256](http://arxiv.org/abs/2408.03256)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0e\u5c01\u95ed\u5f0f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6587\u672c\u5230SQL\u4efb\u52a1\u4e2d\u7684\u80fd\u529b\u5dee\u8ddd\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5408\u6210\u6570\u636e\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u66f4\u5927\u3001\u66f4\u5f3a\u5927\u7684\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff08\u5f3a\u6a21\u578b\uff09\u4e0e\u8f83\u5c0f\u3001\u4e0d\u5b8c\u5168\u5bf9\u9f50\u6a21\u578b\u751f\u6210\u7684\u9519\u8bef\u4fe1\u606f\u6570\u636e\uff08\u5f31\u6a21\u578b\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u6587\u672c\u5230SQL\u6a21\u578b\u7684\u9886\u57df\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u63a2\u7d22\u4e86\u9519\u8bef\u6570\u636e\u76d1\u7763\u901a\u8fc7\u504f\u597d\u5b66\u4e60\u7684\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u5408\u6210\u6570\u636e\u65b9\u6cd5\u5bf9\u5f00\u6e90LLM\u8fdb\u884c\u6307\u4ee4\u8c03\u6574\uff0c\u7531\u6b64\u4ea7\u751f\u4e86\u4e13\u95e8\u9488\u5bf9\u6587\u672c\u5230SQL\u4efb\u52a1\u7684\u6a21\u578bSENSE\u3002\u901a\u8fc7\u5728SPIDER\u548cBIRD\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\uff0c\u8bc1\u660e\u4e86SENSE\u7684\u6709\u6548\u6027\uff0c\u6210\u529f\u7f29\u5c0f\u4e86\u5f00\u6e90\u6a21\u578b\u4e0e\u57fa\u4e8e\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u65b9\u6cd5\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2408.03247": "|**2024-08-06**|**Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons**|Yifei Wang 
et.al.|[2408.03247](http://arxiv.org/abs/2408.03247)|**[link](https://github.com/wangyifei0047/tfrkn)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u9762\u5bf9\u63a8\u7406\u4efb\u52a1\u65f6\u662f\u5426\u79ef\u6781\u5730\u56de\u5fc6\u6216\u68c0\u7d22\u5176\u5185\u90e8\u4e8b\u5b9e\u77e5\u8bc6\u5e93\u3002\u901a\u8fc7\u5206\u6790LLM\u5728\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u5185\u90e8\u4e8b\u5b9e\u53ec\u56de\u60c5\u51b5\uff0c\u5373\u6240\u8c13\u7684\u77e5\u8bc6\u795e\u7ecf\u5143\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0cLLM\u672a\u80fd\u6709\u6548\u5229\u7528\u5173\u952e\u7684\u4e8b\u5b9e\u5173\u8054\u3002\u76f8\u53cd\uff0c\u5b83\u4eec\u503e\u5411\u4e8e\u91c7\u53d6\u66ff\u4ee3\u7684\u3001\u5feb\u6377\u7684\u8def\u5f84\u6765\u56de\u7b54\u63a8\u7406\u95ee\u9898\u3002\u901a\u8fc7\u624b\u52a8\u8c03\u6574LLM\u4e2d\u53c2\u6570\u77e5\u8bc6\u7684\u53ec\u56de\u8fc7\u7a0b\uff0c\u6211\u4eec\u8bc1\u660e\u76f4\u63a5\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u63a8\u7406\u6027\u80fd\uff0c\u800c\u6291\u5236\u5b83\u5219\u4f1a\u5bfc\u81f4\u660e\u663e\u7684\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u7684\u5f71\u54cd\uff0c\u8fd9\u662f\u4e00\u79cd\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5f3a\u5927\u6280\u672f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cCoT\u53ef\u4ee5\u901a\u8fc7\u9f13\u52b1LLM\u8fdb\u884c\u6709\u6761\u7406\u548c\u53ef\u9760\u7684\u63a8\u7406\u6765\u589e\u5f3a\u5bf9\u4e8b\u5b9e\u77e5\u8bc6\u7684\u56de\u5fc6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0a\u4e0b\u6587\u51b2\u7a81\u5982\u4f55\u5f71\u54cd\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4e8b\u5b9e\u7684\u68c0\u7d22\uff0c\u4ee5\u83b7\u5f97\u5bf9LLM\u4e8b\u5b9e\u56de\u5fc6\u884c\u4e3a\u7684\u5168\u9762\u7406\u89e3\u3002\u76f8
\u5173\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0d\u4e45\u540e\u63d0\u4f9b\u3002|\n", "2408.03172": "|**2024-08-06**|**Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi**|Pranita Deshmukh et.al.|[2408.03172](http://arxiv.org/abs/2408.03172)|null|\u968f\u7740\u4f4e\u8d44\u6e90\u8bed\u8a00\u6570\u5b57\u5185\u5bb9\u7684\u6fc0\u589e\uff0c\u9488\u5bf9\u8fd9\u4e9b\u8bed\u8a00\u7684\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u9700\u6c42\u6b63\u5728\u589e\u52a0\u3002BERT\uff08\u53cc\u5411\u7f16\u7801\u8868\u793a\u7684Transformer\uff09\u4f5c\u4e3a\u4f17\u591aNLP\u67b6\u6784\u548c\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u6846\u67b6\uff0c\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u7528\u4e8e\u5f00\u53d1\u4f4e\u8d44\u6e90NLP\u6a21\u578b\u3002\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u51cf\u5c11\u8bad\u7ec3\u53c2\u6570\uff0c\u4ee5\u964d\u4f4e\u8bad\u7ec3\u6a21\u578b\u6240\u9700\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u5e76\u8fbe\u5230\u4e0e\u5b8c\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53\u7684\u7ed3\u679c\u3002\u672c\u7814\u7a76\u65e8\u5728\u5206\u6790PEFT\u65b9\u6cd5\u5728\u9a6c\u62c9\u5730\u8bed\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u5bf9\u5404\u79cd\u5355\u8bed\u548c\u591a\u8bed\u79cd\u9a6c\u62c9\u5730\u8bedBERT\u6a21\u578b\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728MahaSent\u3001MahaHate\u548cMahaNews\u7b49\u91cd\u8981\u6587\u672c\u5206\u7c7b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002PEFT\u6280\u672f\u7684\u5f15\u5165\u663e\u8457\u52a0\u5feb\u4e86\u6a21\u578b\u7684\u8bad\u7ec3\u901f\u5ea6\uff0c\u89e3\u51b3\u4e86\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u7684\u5173\u952e\u65b9\u9762\u3002\u5728\u672c\u7814\u7a76\u4e2d\u
ff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u548c\u9002\u914d\u5668\u65b9\u6cd5\u5728\u4f4e\u8d44\u6e90\u6587\u672c\u5206\u7c7b\u4e2d\u7684\u5e94\u7528\u3002\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u51c6\u786e\u7387\u4e0a\u4e0e\u5168\u91cf\u5fae\u8c03\u76f8\u5f53\uff0c\u4e14\u65e0\u9700\u635f\u5931\uff0c\u53ef\u7528\u4e8e\u9a6c\u62c9\u5730\u8bed\u548c\u5176\u4ed6\u5370\u5ea6\u8bed\u65cf\u8bed\u8a00\u7684NLP\u80fd\u529b\u6301\u7eed\u53d1\u5c55\u3002|\n", "2408.03150": "|**2024-08-06**|**Conditioning LLMs with Emotion in Neural Machine Translation**|Charles Brazier et.al.|[2408.03150](http://arxiv.org/abs/2408.03150)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u673a\u5668\u7ffb\u8bd1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u901a\u8fc7\u5c06\u60c5\u611f\u4fe1\u606f\u6574\u5408\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\u6765\u589e\u5f3a\u7ffb\u8bd1\u8d28\u91cf\uff0c\u8fd9\u4e9b\u60c5\u611f\u4fe1\u606f\u662f\u4ece\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\uff08SER\uff09\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9\u4e94\u4e2a\u73b0\u6709\u7684LLM\u8fdb\u884cLibri-trans\u6570\u636e\u96c6\u7684\u5fae\u8c03\uff0c\u5e76\u9009\u62e9\u8868\u73b0\u6700\u4f73\u7684\u6a21\u578b\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ee5\u4e0d\u540c\u7ef4\u5ea6\u7684\u60c5\u611f\u589e\u5f3aLLM\u63d0\u793a\uff0c\u5e76\u5728\u8fd9\u4e9b\u4e0d\u540c\u7684\u914d\u7f6e\u4e0b\u8bad\u7ec3\u9009\u5b9a\u7684LLM\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c06\u60c5\u611f\u4fe1\u606f\uff0c\u5c24\u5176\u662f\u5524\u9192\u5ea6\uff0c\u6574\u5408\u5230LLM\u63d0\u793a\u4e2d\uff0c\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u7ffb\u8bd1\u8d28\u
91cf\u3002|\n", "2408.03130": "|**2024-08-06**|**Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations**|Leo Donisch et.al.|[2408.03130](http://arxiv.org/abs/2408.03130)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u65e0\u5904\u4e0d\u5728\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u89c4\u6a21\u548c\u590d\u6742\u6027\u5e26\u6765\u4e86\u72ec\u7279\u7684\u6311\u6218\u4e0e\u673a\u9047\uff0c\u4fc3\u4f7f\u7814\u7a76\u8005\u4e0e\u5b9e\u8df5\u8005\u63a2\u7d22\u65b0\u578b\u7684\u6a21\u578b\u8bad\u7ec3\u3001\u4f18\u5316\u548c\u90e8\u7f72\u65b9\u6cd5\u3002\u672c\u6587\u7efc\u8ff0\u7684\u91cd\u70b9\u5728\u4e8e\u5404\u79cd\u964d\u4f4e\u8d44\u6e90\u9700\u6c42\u548c\u538b\u7f29\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6280\u672f\uff0c\u5305\u62ec\u91cf\u5316\u3001\u526a\u679d\u3001\u77e5\u8bc6\u84b8\u998f\u4ee5\u53ca\u67b6\u6784\u4f18\u5316\u3002\u4e3b\u8981\u76ee\u6807\u662f\u6df1\u5165\u63a2\u8ba8\u6bcf\u79cd\u65b9\u6cd5\uff0c\u5e76\u7a81\u51fa\u5176\u72ec\u7279\u6311\u6218\u53ca\u5176\u5b9e\u9645\u5e94\u7528\u3002\u8ba8\u8bba\u7684\u65b9\u6cd5\u6309\u7167\u5206\u7c7b\u5b66\u8fdb\u884c\u7ec4\u7ec7\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4f18\u5316\u666f\u89c2\u7684\u6982\u89c8\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7814\u7a76\u8f68\u8ff9\u3002|\n", "2408.03127": "|**2024-08-06**|**Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation**|Artur Guimar\u00e3es 
et.al.|[2408.03127](http://arxiv.org/abs/2408.03127)|**[link](https://github.com/araag2/SemEval2024-Task2)**|\u8fd9\u7bc7\u8bba\u6587\u9610\u8ff0\u4e86\u6211\u4eec\u5bf9SemEval-2024\u5b89\u5168\u751f\u7269\u533b\u5b66\u81ea\u7136\u8bed\u8a00\u63a8\u65ad\u5728\u4e34\u5e8a\u8bd5\u9a8c\uff08NLI4CT\uff09\u4efb\u52a1\u7684\u5904\u7406\u7b56\u7565\u3002\u8be5\u4efb\u52a1\u6d89\u53ca\u5bf9\u4e34\u5e8a\u8bd5\u9a8c\u62a5\u544a\uff08CTRs\uff09\u4e2d\u7684\u9648\u8ff0\u8fdb\u884c\u5206\u7c7b\u3002\u6211\u4eec\u63a2\u7d22\u4e86Mistral-7B\u8fd9\u79cd\u901a\u7528\u7684\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u80fd\u529b\u3002\u6211\u4eec\u4e3aNLI4CT\u4efb\u52a1\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u589e\u5f3a\u540e\u7684\u8bad\u7ec3\u6570\u636e\u96c6\u5bf9\u91cf\u5316\u7248\u672c\u7684\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u5b8fF1\u5206\u6570\u65b9\u9762\u53ef\u4ee5\u4ea7\u751f\u663e\u8457\u7684\u7ed3\u679c\uff0c\u4f46\u5728\u5fe0\u5b9e\u6027\u548c\u4e00\u81f4\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\u3002\u6240\u6709\u5f00\u53d1\u7684\u4ee3\u7801\u90fd\u5728GitHub\u4ed3\u5e93\u4e2d\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.03119": "|**2024-08-06**|**Evaluating the Translation Performance of Large Language Models Based on Euas-20**|Yan Huang 
et.al.|[2408.03119](http://arxiv.org/abs/2408.03119)|null|\u8fd1\u5e74\u6765\uff0c\u5728\u6df1\u5ea6\u5b66\u4e60\u6280\u672f\u7684\u5feb\u901f\u53d1\u5c55\u7684\u63a8\u52a8\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982BERT\u548cGPT\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u7a81\u7834\u6027\u6210\u679c\u3002\u673a\u5668\u7ffb\u8bd1\u4f5c\u4e3a\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7684\u6838\u5fc3\u4efb\u52a1\u4e4b\u4e00\uff0c\u4e5f\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u4e2d\u53d7\u76ca\u532a\u6d45\uff0c\u5b9e\u73b0\u4e86\u8d28\u7684\u98de\u8dc3\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u673a\u5668\u7ffb\u8bd1\u4ecd\u9762\u4e34\u8bf8\u591a\u6311\u6218\u3002\u56e0\u6b64\uff0c\u672c\u6587\u6784\u5efa\u4e86Euas-20\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3001\u4e0d\u540c\u8bed\u8a00\u7684\u7ffb\u8bd1\u80fd\u529b\u4ee5\u53ca\u9884\u8bad\u7ec3\u6570\u636e\u5bf9LLMs\u7ffb\u8bd1\u80fd\u529b\u7684\u5f71\u54cd\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u53c2\u8003\u3002|\n", "2408.03940": "|**2024-08-07**|**How Well Can Vision Language Models See Image Details?**|Chenhui Gou 
et.al.|[2408.03940](http://arxiv.org/abs/2408.03940)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LLM-\u9a71\u52a8\u7684VLM\uff09\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9bVLM\u662f\u5426\u80fd\u8d85\u8d8a\u8bed\u4e49\u5c42\u9762\uff0c\u6df1\u5165\u89c2\u5bdf\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u4e0d\u660e\u6717\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u50cf\u7d20\u503c\u9884\u6d4b\u4efb\u52a1\uff08PVP\uff09\uff0c\u4ee5\u63a2\u7d22\u201c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u770b\u5230\u591a\u7ec6\u7684\u56fe\u50cf\u7ec6\u8282\uff1f\u201d\u5e76\u534f\u52a9VLM\u63d0\u5347\u5bf9\u7ec6\u8282\u7684\u611f\u77e5\u80fd\u529b\u3002\u901a\u5e38\uff0c\u8fd9\u4e9b\u6a21\u578b\u7531\u51bb\u7ed3\u7684CLIP\u89c6\u89c9\u7f16\u7801\u5668\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u8fde\u63a5\u6a21\u5757\u7ec4\u6210\u3002\u5728\u5bf9PVP\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u6211\u4eec\u53d1\u73b0\uff1a1\uff09\u73b0\u6709\u7684VLM\u4ec5\u901a\u8fc7\u5fae\u8c03\u8fde\u63a5\u6a21\u5757\u548cLLM\uff0c\u5728\u9884\u6d4b\u7cbe\u786e\u50cf\u7d20\u503c\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff1b2\uff09\u5f53\u89c6\u89c9\u7f16\u7801\u5668\u4e5f\u5f97\u5230\u9002\u5e94\u65f6\uff0c\u9884\u6d4b\u7cbe\u5ea6\u663e\u8457\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63ed\u793a\uff0c\u5c06\u50cf\u7d20\u503c\u9884\u6d4b\u4f5c\u4e3aVLM\u9884\u8bad\u7ec3\u4efb\u52a1\u4e4b\u4e00\uff0c\u5e76\u5bf9\u89c6\u89c9\u7f16\u7801\u5668\u8fdb\u884c\u9002\u5e94\uff0c\u663e\u8457\u63d0\u5347\u4e86VLM\u5728\u9700\u8981\u8be6\u7ec6\u56fe\u50cf\u611f\u77e5\u7684\u4e0b\u6e38\u56fe\u50cf\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5982\u5f15\u7528\u56fe\u50cf\u5206\u5272\uff08\u5e73\u5747cIoU\u6539\u8fdb+10.19\u767e\u5206\u70b9\uff09\u548c\u89c6\u9891\u6e38\u620f\u51b3\u7b56\uff08\u572
8\u4e24\u4e2a\u6e38\u620f\u4e2d\u5206\u522b\u5e73\u5747\u5f97\u5206\u6539\u5584\u4e86+80.34\u548c+70.54\uff09\u3002|\n", "2408.03936": "|**2024-08-07**|**SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature**|Vin\u00edcius Di Oliveira et.al.|[2408.03936](http://arxiv.org/abs/2408.03936)|null|\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u82f1\u8bed\u4e4b\u5916\u7684\u8bed\u8a00\uff0c\u5c24\u5176\u662f\u5728\u7279\u5b9a\u9886\u57df\u5982Mercosur\u901a\u7528\u5546\u54c1\u540d\u79f0\uff08NCM\uff09\uff0c\u5df4\u897f\u534f\u8c03\u7cfb\u7edf\uff08HS\uff09\u7684\u5e94\u7528\u65b9\u9762\uff0c\u4ecd\u6709\u5f88\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u7f3a\u53e3\uff0c\u672c\u7814\u7a76\u5229\u7528TeenyTinyLLaMA\uff0c\u4e00\u79cd\u57fa\u7840\u8461\u8404\u7259\u8bedLLM\uff0c\u4f5c\u4e3aLLM\u6e90\uff0c\u5b9e\u65bdNCM\u5e94\u7528\u5904\u7406\u3002\u6b64\u5916\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u4efb\u52a1\u7279\u5b9a\u5fae\u8c03\u7684\u7b80\u5316\u68c0\u7d22\u589e\u5f3a\u5fae\u8c03\uff08SLIM-RAFT\uff09\u6280\u672f\u3002\u8be5\u65b9\u6cd5\u91c7\u7528\u7b80\u5316\u7684\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u7b56\u7565\u8fdb\u884c\u63d0\u793a\u5f00\u53d1\uff0c\u4f7f\u7528\u7b80\u77ed\u800c\u96c6\u4e2d\u7684\u6587\u6863\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u66f4\u7d27\u51d1\u548c\u9ad8\u6548\u7684\u65b9\u5f0f\u8fdb\u884c\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u76f8\u540c\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8eTeenyTinyLLaMA\u548cChatGPT-4\uff0c\u5c55\u793a\u4e86\u8f83\u5c0fLLM\u5fae\u8c03\u7684\u9ad8\u6548\u548c\u6210\u672c\u6548\u76ca\u66ff\u4ee3\u65b9\u6848\u3002\u5c3d\u7ba1\u7814\u7a76\u91cd\u70b9\u662fNCM\u5e94\u7528\uff0c\u4f46\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u53ef\u4ee5\u8f7b\u677e\u5730\u9002\u5e94\u5
168\u7403\u8303\u56f4\u5185\u7684HS\u5e94\u7528\u3002|\n", "2408.03910": "|**2024-08-07**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982HumanEval\u548cMBPP\u7684\u72ec\u7acb\u4ee3\u7801\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u65f6\u5b58\u5728\u6311\u6218\u3002\u8fd9\u4fc3\u4f7f\u7814\u7a76\u754c\u63a2\u7d22\u5982\u4f55\u5728\u4ed3\u5e93\u7ea7\u522b\u4e0a\u589e\u5f3aLLM\u4e0e\u4ee3\u7801\u5e93\u7684\u4ea4\u4e92\u3002\u76ee\u524d\u7684\u89e3\u51b3\u65b9\u6848\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u6216\u624b\u52a8\u5de5\u5177\u548cAPI\uff0c\u6bcf\u79cd\u65b9\u6cd5\u90fd\u6709\u5176\u663e\u8457\u7684\u7f3a\u70b9\u3002\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u53ec\u56de\u7387\u8f83\u4f4e\uff0c\u800c\u624b\u52a8\u5de5\u5177\u548cAPI\u901a\u5e38\u5177\u6709\u7279\u5b9a\u7684\u4efb\u52a1\u6027\uff0c\u5e76\u4e14\u9700\u8981\u4e13\u5bb6\u77e5\u8bc6\uff0c\u8fd9\u964d\u4f4e\u4e86\u5b83\u4eec\u5728\u4e0d\u540c\u4ee3\u7801\u4efb\u52a1\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6027\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86CodexGraph\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\uff0c\u5b83\u5c06LLM\u4ee3\u7406\u4e0e\u4ece\u4ee3\u7801\u4ed3\u5e93\u63d0\u53d6\u7684\u56fe\u6570\u636e\u5e93\u63a5\u53e3\u96c6\u6210\u5728\u4e00\u8d77\u3002\u901a\u8fc7\u5229\u7528\u56fe\u6570\u636e\u5e93\u7684\u7ed3\u6784\u7279\u6027\u4ee5\u53ca\u56fe\u67e5\u8be2\u8bed\u8a00\u7684\u7075\u6d3b\u6027\uff0cCodexGraph\u4f7fLLM\u4ee3\u7406\u80fd\u591f\u6784\u5efa\u5e76\u6267\u884c\u67e5\u8be2\uff0c\u4ece\u800c\u5b9e\u73b0\u7cbe\u786e\u3001\u4ee3\u7801\u7ed3\u67
84\u610f\u8bc6\u7684\u4e0a\u4e0b\u6587\u68c0\u7d22\u548c\u4ee3\u7801\u5bfc\u822a\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30CodexGraph\uff1aCrossCodeEval\u3001SWE-bench\u548cEvoCodeBench\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e94\u4e2a\u771f\u5b9e\u4e16\u754c\u7684\u7f16\u7801\u5e94\u7528\u3002\u51ed\u501f\u7edf\u4e00\u7684\u56fe\u6570\u636e\u5e93\u6a21\u5f0f\uff0cCodexGraph\u5728\u5b66\u672f\u548c\u5b9e\u9645\u73af\u5883\u4e2d\u90fd\u5c55\u793a\u4e86\u7ade\u4e89\u529b\u548c\u6f5c\u529b\uff0c\u4f53\u73b0\u4e86\u5176\u5728\u8f6f\u4ef6\u5de5\u7a0b\u9886\u57df\u7684\u591a\u529f\u80fd\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u5e94\u7528\u6f14\u793a\uff1ahttps://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent\u3002|\n", "2408.03907": "|**2024-08-07**|**Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models**|Shachi H Kumar et.al.|[2408.03907](http://arxiv.org/abs/2408.03907)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7406\u89e3\u8bed\u8a00\u548c\u751f\u6210\u4e0e\u4eba\u7c7b\u6c34\u5e73\u76f8\u5f53\u7684\u6587\u672c\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5373\u4f7f\u7ecf\u8fc7\u76d1\u7763\u8bad\u7ec3\u548c\u4eba\u7c7b\u5bf9\u9f50\uff0c\u8fd9\u4e9bLLM\u4ecd\u5bb9\u6613\u53d7\u5230\u6076\u610f\u7528\u6237\u7684\u653b\u51fb\uff0c\u540e\u8005\u53ef\u4ee5\u901a\u8fc7\u63d0\u793a\u6a21\u578b\u751f\u6210\u4e0d\u5e0c\u671b\u770b\u5230\u7684\u6587\u672c\u3002\u6b64\u5916\uff0cLLM\u5185\u5d4c\u6709\u6f5c\u5728\u504f\u89c1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u4e92\u52a8\u4e2d\u7684\u5404\u79cd\u6709\u5bb3\u5f71\u54cd\u3002\u5f53\u524d\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\u7f3a\u4e4f\u6807\u51c6\u548c\u5171\u8bc6\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u4eba\u5de5\u751f\u6210\u7684\u6a21\u677f\u548c\u6ce8\u91ca\uff0c\u8fd9\u65e2\u6602\u8d35\u53c8\u8d39\u65f6\u3002 
\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u521b\u5efa\u5bf9\u6297\u6027\u63d0\u793a\u6765\u6fc0\u53d1\u76ee\u6807LLM\u751f\u6210\u5e26\u6709\u504f\u89c1\u7684\u54cd\u5e94\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5206\u6790\u4e86\u591a\u79cd\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u548c\u6307\u6807\u3002\u6211\u4eec\u6df1\u5165\u63a2\u8ba8\u4e86\u6a21\u578b\u54cd\u5e94\u7684\u5404\u79cd\u7ec6\u5fae\u5dee\u522b\uff0c\u8bc6\u522b\u4e86\u4e0d\u540c\u6a21\u578b\u5bb6\u65cf\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u5e76\u8bc4\u4f30\u4e86\u8bc4\u4f30\u65b9\u6cd5\u7684\u4e0d\u8db3\u4e4b\u5904\u3002\u6211\u4eec\u5c06\u8fd9\u4e9b\u6307\u6807\u4e0e\u4eba\u5de5\u8bc4\u4f30\u8fdb\u884c\u6bd4\u8f83\uff0c\u5e76\u9a8c\u8bc1\u4e86\u201cLLM\u4f5c\u4e3a\u6cd5\u5b98\u201d\u7684\u6307\u6807\u4e0e\u751f\u6210\u504f\u89c1\u5224\u65ad\u7684\u4eba\u7c7b\u8bc4\u4ef7\u4e00\u81f4\u3002|\n", "2408.03876": "|**2024-08-07**|**From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems**|Leixian Shen 
et.al.|[2408.03876](http://arxiv.org/abs/2408.03876)|null|\u521b\u5efa\u4ece\u539f\u59cb\u6570\u636e\u751f\u6210\u6570\u636e\u6545\u4e8b\u7684\u8fc7\u7a0b\u6781\u5177\u6311\u6218\u6027\uff0c\u8fd9\u4e3b\u8981\u6e90\u4e8e\u4eba\u7c7b\u6709\u9650\u7684\u6ce8\u610f\u529b\u548c\u5bf9\u7279\u5b9a\u6280\u80fd\u7684\u9700\u6c42\u3002\u8fd1\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u4e3a\u6784\u5efa\u5229\u7528\u72ec\u7acb\u4ee3\u7406\u5b9e\u73b0\u5de5\u4f5c\u6d41\u7a0b\u81ea\u52a8\u5316\u4ee5\u7b80\u5316\u6570\u636e\u6545\u4e8b\u521b\u4f5c\u6d41\u7a0b\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5de8\u5927\u673a\u9047\u3002\u5c3d\u7ba1\u591a\u4ee3\u7406\u7cfb\u7edf\u80fd\u591f\u5145\u5206\u6316\u6398LLM\u6f5c\u529b\u5e76\u5206\u89e3\u4efb\u52a1\u4f9b\u4e2a\u4f53\u4ee3\u7406\u6267\u884c\u5177\u6709\u8bf8\u591a\u4f18\u52bf\uff0c\u4f46\u5728\u8bbe\u8ba1\u8fd9\u4e9b\u7cfb\u7edf\u65f6\uff0c\u4e5f\u9762\u4e34\u7740\u4efb\u52a1\u5206\u89e3\u3001\u5b50\u4efb\u52a1\u6027\u80fd\u4f18\u5316\u4ee5\u53ca\u5de5\u4f5c\u6d41\u7a0b\u8bbe\u8ba1\u7b49\u65b9\u9762\u7684\u6311\u6218\u3002\u4e3a\u4e86\u66f4\u6df1\u5165\u5730\u7406\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Data Director\u2014\u2014\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u81ea\u52a8\u5316\u751f\u6210\u52a8\u753b\u6570\u636e\u89c6\u9891\uff0c\u8fd9\u4e00\u7c7b\u6570\u636e\u6545\u4e8b\u7684\u5178\u578b\u5f62\u5f0f\u3002Data Director\u901a\u8fc7\u89e3\u6790\u539f\u59cb\u6570\u636e\u3001\u62c6\u5206\u4efb\u52a1\u3001\u8bbe\u8ba1\u4ee3\u7406\u89d2\u8272\u4ee5\u8fdb\u884c\u81ea\u52a8\u51b3\u7b56\uff0c\u5e76\u65e0\u7f1d\u6574\u5408\u6570\u636e\u89c6\u9891\u4e2d\u7684\u5404\u79cd\u7ec4\u4ef6\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86Data 
Director\u5728\u751f\u6210\u6570\u636e\u89c6\u9891\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u5728\u6574\u4e2a\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ece\u89e3\u51b3\u9762\u4e34\u7684\u6311\u6218\u4e2d\u63d0\u70bc\u51fa\u4e86\u7ecf\u9a8c\u6559\u8bad\uff0c\u8fd9\u4e9b\u7ecf\u9a8c\u5bf9\u4e8e\u6307\u5bfc\u672a\u6765\u5728\u6570\u636e\u6545\u4e8b\u53d9\u8ff0\u9886\u57df\u81ea\u4e3b\u4ee3\u7406\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4e5f\u63ed\u793a\u4e86\u5168\u7403\u4f18\u5316\u3001\u4eba\u673a\u4ea4\u4e92\u8bbe\u8ba1\u4ee5\u53ca\u9ad8\u7ea7\u591a\u6a21\u6001LLM\u5e94\u7528\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.03865": "|**2024-08-07**|**PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training**|Haoran Xu et.al.|[2408.03865](http://arxiv.org/abs/2408.03865)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u4f20\u7edf\u7684Transformer\u6a21\u578b\u5728\u5904\u7406\u957f\u5e8f\u5217\u65f6\u53d8\u5f97\u8ba1\u7b97\u5bc6\u96c6\u578b\uff0c\u56e0\u4e3a\u5176\u8ba1\u7b97\u91cf\u968f\u5e8f\u5217\u957f\u5ea6\u7684\u5e73\u65b9\u589e\u957f\u3002Mamba\u4f5c\u4e3a\u751f\u6210AI\u9886\u57df\u7684\u4e00\u9879\u7a81\u7834\u6027\u67b6\u6784\uff0c\u5c55\u73b0\u51fa\u5728\u51cf\u5c11\u8ba1\u7b97\u548c\u5185\u5b58\u590d\u6742\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u9ad8\u6548\u5904\u7406\u957f\u5e8f\u5217\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684Mamba\u8bad\u7ec3\u6846\u67b6\u5728\u5904\u7406\u53d8\u957f\u5e8f\u5217\u8f93\u5165\u65f6\u5b58\u5728\u6548\u7387\u95ee\u9898\u3002\u5355\u5e8f\u5217\u8bad\u7ec3\u4f1a\u5bfc\u81f4GPU\u5229\u7528\u7387\u4f4e\u4e0b\uff0c\u800c\u5bf9\u53d8\u957f\u5e8f\u5217\u8fdb\u884c\u6279\u91cf\u5904\u7406\u5230\u6700\u5927\u957f\u5ea6\u5219\u4f1a\u5e26\u6765\u663e\u8457\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5206\u6790\u4e86Mamba\u67b6\u6784\u4e2d\u74f6\u9888\u64cd\u4f5c\u5668\u5728\u4e0d\u540c\u5f20\u91cf\u5f62\u72b6\u4e0b\u7684\u6027\u80fd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPackMamba\u7684\u9ad8\u541e\u5410\u91cfMamba\uff0c\u5b83\u80fd\u591f\u6709\u6548\u5730\u5904\u7406\u53d8\u957f\u5e8f\u5217\u3002\u6df1\u5165\u7814\u7a76\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSMs\uff09\uff0c\u6211\u4eec\u4fee\u6539\u4e86\u5e76\u884c\u64cd\u4f5c\u5668\uff0c\u4ee5\u907f\u514d\u5728\u5404\u4e2a\u5e8f\u5217\u4e4b\u95f4\u4f20\u9012\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6027\u80fd\u3002\u5728NVIDIA A100 GPU\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPackMamba\u5728\u5904\u74061.4B\u6a21\u578b\u65f6\u6bd4\u57fa\u7ebf\u5355\u5e8f\u5217\u5904\u7406\u65b9\u6848\u63d0\u9ad8\u4e863.06\u500d\u7684\u901f\u5ea6\uff0c\u5728\u5904\u74062.8B\u6a21\u578b\u65f6\u63d0\u9ad8\u4e862.62\u500d\u7684\u901f\u5ea6\u3002|\n", "2408.03847": "|**2024-08-07**|**GAIA -- A Large Language Model for Advanced Power Dispatch**|Yuheng Cheng 
et.al.|[2408.03847](http://arxiv.org/abs/2408.03847)|null|\u7535\u529b\u8c03\u5ea6\u5bf9\u4e8e\u63d0\u4f9b\u7a33\u5b9a\u3001\u7ecf\u6d4e\u4e14\u73af\u4fdd\u7684\u7535\u529b\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u7535\u529b\u7cfb\u7edf\u89c4\u6a21\u548c\u590d\u6742\u6027\u7684\u589e\u957f\uff0c\u4f20\u7edf\u7684\u8c03\u5ea6\u65b9\u6cd5\u5728\u591a\u4efb\u52a1\u5904\u7406\u3001\u5feb\u901f\u95ee\u9898\u89e3\u51b3\u4ee5\u53ca\u4eba\u673a\u534f\u4f5c\u65b9\u9762\u9047\u5230\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4e13\u4e3a\u7535\u529b\u8c03\u5ea6\u4efb\u52a1\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u2014\u2014GAIA\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u96c6\u6784\u5efa\u6280\u672f\uff0c\u5229\u7528\u591a\u79cd\u6570\u636e\u6e90\u5bf9GAIA\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u5176\u5728\u8be5\u9886\u57df\u7684\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u7b80\u5316\u4e86LLM\u7684\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u4f7f\u5f97\u5728\u7535\u529b\u7cfb\u7edf\u7ba1\u7406\u4e2d\u80fd\u591f\u65e0\u7f1d\u6574\u5408\u591a\u7ef4\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bbe\u8ba1\u4e86\u4e13\u95e8\u7684\u63d0\u793a\u7b56\u7565\u6765\u63d0\u9ad8GAIA\u5728\u8c03\u5ea6\u573a\u666f\u4e0b\u7684\u8f93\u5165\u8f93\u51fa\u6548\u7387\u3002\u5728ElecBench\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cGAIA\u5728\u591a\u4e2a\u6307\u6807\u4e0a\u8d85\u8d8a\u4e86\u57fa\u7840\u6a21\u578bLLaMA2\u3002\u5b9e\u9645\u5e94\u7528\u8868\u660e\uff0cGAIA\u80fd\u591f\u589e\u5f3a\u51b3\u7b56\u8fc7\u7a0b\u3001\u63d0\u9ad8\u8fd0\u8425\u6548\u7387\uff0c\u5e76\u4fc3\u8fdb\u7535\u529b\u8c03\u5ea6\u64cd\u4f5c\u4e2d\u7684\u4eba\u673a\u4ea4\u4e92\u3002\u672c\u6587\u6269\u5c55\u4e86LLM\u5728\u7535\u529b\u8c03\u5ea6\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u9a8c\u8bc1\u4e86\u5176\u5b9e\u7528\u6027\uff0c\u4e3a\u8fd9\u4e00\u9886\u57df\u672a\u6765\u7684\u521b\u65b0\u5f00\u8f9f\u4e86\u9053\u8def\u3002|\
n", "2408.03841": "|**2024-08-07**|**MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models**|Yuchen Dong et.al.|[2408.03841](http://arxiv.org/abs/2408.03841)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u52a8\u5316\u8f6f\u4ef6\u64cd\u4f5c\u548c\u5de5\u5177\u751f\u6210\uff08SOTG\uff09\u9886\u57df\u7684\u5e94\u7528\uff0c\u4ee5\u6b64\u6765\u63d0\u5347\u8f6f\u4ef6\u751f\u4ea7\u529b\u3002\u8fd9\u4e00\u8fc7\u7a0b\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u6587\u660e\u65e9\u671f\u901a\u8fc7\u521b\u9020\u5e76\u4f7f\u7528\u5de5\u5177\u52a0\u901f\u53d1\u5c55\u7684\u9636\u6bb5\u3002\u8fd9\u4e9b\u590d\u6742\u4efb\u52a1\u8981\u6c42AI\u80fd\u591f\u6301\u7eed\u603b\u7ed3\u5e76\u6539\u8fdb\u3002\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u5ffd\u89c6\u4e86\u5c06\u5b9e\u65f6\u4efb\u52a1\u7ecf\u9a8c\u8f6c\u5316\u4e3a\u7cfb\u7edf\u8bb0\u5fc6\u4ee5\u53ca\u533a\u5206\u73b0\u6709\u77e5\u8bc6\u672a\u6765\u4ef7\u503c\u7684\u91cd\u8981\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u201cMemory-Loop\u7f51\u7edc\u201d\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4ee5\u5b9e\u73b0\u53ca\u65f6\u7684\u8bb0\u5fc6\u5b58\u50a8\u4e0e\u7ecf\u9a8c\u5f15\u7528\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u57fa\u4e8e\u77e5\u8bc6\u7cbe\u786e\u5206\u6bb5\u7684RAG\u673a\u5236\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4ee5\u4fbf\u6839\u636e\u4ef7\u503c\u5dee\u5f02\u5229\u7528\u8bb0\u5fc6\u3002\u9488\u5bf9SOTG\u8bbe\u8ba1\u4e86MaxMind\u6a21\u578b\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86MaxMind4Sheet\uff0c\u4e00\u4e2a\u9075\u5faaMaxMind\u7406\u5ff5\u7684\u7535\u5b50\u8868\u683c\u5904\u7406\u7cfb\u7edf\u3002\u4e0eSheetCopilot\u7684\u6bd4\u8f83\u5b9e\u9a8c\u663e\u793a\uff0c\u4efb\u52a1\u8bb0\u5fc6\u7684\u79ef\u7d2f\u548c\u5faa\u73af\u80fd\u591f\u7a33\u6b65\u63d0\u9ad8\u4efb\u52a1\u6210\u529f\u7387\uff0c\u5728\u6b64\u793a\u4f8b\u5b9e\u65bd\u4e2d\uff0c\u6bcf\u8f6e\u7684\u6210\u529f\u7387\u63d0\u5347\u7ea6\u4e3a3%-6%\u3002\u968f\u7740\u8bb0\u5fc6\u7684\u6301\u7eed\u589e\u957f\uff0c\u8fd9\u79cd\u7d2f\u79ef\u6539\u8fdb\u53ef\u80fd\u4f1a\u975e\u5e38\u663e\u8457\u3002 \u5f15\u5165\u8bb0\u5fc6\u5faa\u73af\u8fd8\u53ef\u4ee5\u901a\u8fc7\u9ad8\u8fbe25%\u7684\u6548\u7387\u63d0\u5347\u589e\u52a0\u7cfb\u7edf\u7684\u4efb\u52a1\u6267\u884c\u6548\u7387\uff0c\u5e76\u901a\u8fc7\u8bb0\u5fc6\u8f6c\u79fb\u89e3\u51b3LLM\u5728\u5904\u7406\u4e13\u4e1a\u4efb\u52a1\u65f6\u9762\u4e34\u7684\u518d\u8bad\u7ec3\u95ee\u9898\u3002\u8fd9\u8868\u660eMaxMind\u6709\u6f5c\u529b\u663e\u8457\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728SOTG\u9886\u57df\u7684\u529f\u80fd\u548c\u751f\u4ea7\u529b\u3002|\n", "2408.03837": "|**2024-08-07**|**WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models**|Prannaya Gupta 
et.al.|[2408.03837](http://arxiv.org/abs/2408.03837)|**[link](https://github.com/walledai/walledeval)**|WalledEval is a comprehensive AI safety testing toolkit designed to evaluate large language models (LLMs). It accommodates a wide range of models, both open-source and API-based, and includes more than 35 safety benchmarks covering areas such as multilingual safety, exaggerated safety, and prompt injection. The framework supports benchmarking of both LLMs and judges, and incorporates custom mutators for testing safety against text-style mutations such as future tense and paraphrasing. In addition, WalledEval introduces WalledGuard, a new, small, and efficient content moderation tool, and SGXSTest, a benchmark for assessing exaggerated safety in cultural contexts. WalledEval is publicly available at https://github.com/walledai/walledeval.|\n", "2408.03834": "|**2024-08-07**|**Target Prompting for Information Extraction with Vision Language Model**|Dipankar Medhi 
et.al.|[2408.03834](http://arxiv.org/abs/2408.03834)|null|Recent developments in large vision and language models (VLMs) have changed how information extraction systems are built. These models reach state-of-the-art performance in understanding documents and in building question-answering systems across industries, and they are markedly better at generating text from document images and providing accurate answers. However, challenges remain in using these models to build precise conversational systems. General prompting techniques used with large language models are often unsuitable for these specially designed vision language models. The output produced by such generic input prompts is usually ordinary and may leave information gaps when compared with the actual content of the document. To obtain more accurate and specific answers, a vision language model needs to be prompted about specific parts of the document image and to generate relevant answers from those regions only. This paper discusses a technique called Target prompting, which focuses on explicitly targeting parts of document images and generating related answers from those specific regions only. The paper also evaluates the responses of each prompting technique using different user queries and input prompts.|\n", "2408.04614": "|**2024-08-08**|**Better Alignment with Instruction Back-and-Forth Translation**|Thao Nguyen et.al.|[2408.04614](http://arxiv.org/abs/2408.04614)|null|We propose a new method, instruction back-and-forth translation, for constructing high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to further improve their quality based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4, and Self-instruct. We also show that rewriting the responses with an LLM outperforms direct distillation, and that the two generated text distributions differ significantly. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall, we find that instruction back-and-forth translation combines the diversity and quantity of information found on the web with the response quality that effective alignment requires.|\n", "2408.04594": "|**2024-08-09**|**Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models**|Qirui Jiao 
et.al.|[2408.04594](http://arxiv.org/abs/2408.04594)|**[link](https://github.com/modelscope/data-juicer)**|**This paper presents Img-Diff, a new dataset designed to improve the performance of large language models on fine-grained image recognition by drawing on contrastive learning and image difference captioning. The approach analyzes object differences between similar images and requires models to identify both what matches and what differs. Pairs of similar images that highlight object replacements are generated with the Stable-Diffusion-XL model and advanced image editing techniques. The data generation pipeline includes a Difference Area Generator that identifies object differences, followed by a Difference Captions Generator that produces detailed descriptions of them. The result is a small but high-quality set of object-replacement samples. Fine-tuning state-of-the-art multimodal large language models such as MGM-7B on this dataset markedly improves their scores on image difference and visual question answering tasks, and the fine-tuned models surpass state-of-the-art models trained on large-scale datasets, such as GPT-4V and Gemini, on the MMVP benchmark. The paper also explores an alternative way of generating image difference data through object removal, and carries out a thorough evaluation to confirm the diversity and quality of the dataset, offering insights into the synthesis of such contrastive datasets. To encourage further research on multimodal data synthesis and on strengthening the fundamental capabilities of large language models, the code and dataset are released at https://github.com/modelscope/data-juicer/tree/ImgDiff for public use.**|\n", "2408.04585": "|**2024-08-08**|**Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness**|Xiaojing Fan et.al.|[2408.04585](http://arxiv.org/abs/2408.04585)|null|As demand for practical applications of large language models (LLMs) grows, many efficiency-focused models have been developed to balance performance against computational cost. The adversarial robustness of these models, however, remains under-explored. This study designs a framework that examines the trade-off among efficiency, performance, and adversarial robustness by comparing three prominent models of differing complexity and efficiency: Transformer++, the Gated Linear Attention (GLA) Transformer, and MatMul-Free LM. The comparison uses the GLUE and AdvGLUE datasets; AdvGLUE extends GLUE with adversarial samples designed to challenge model robustness. Our results show that, while the GLA Transformer and MatMul-Free LM are slightly less accurate on GLUE tasks, they are more efficient on AdvGLUE tasks and are either more robust than or comparable to Transformer++ across different attack levels. These findings highlight the potential of simplified architectures to strike a good balance among efficiency, performance, and adversarial robustness, and offer useful insights for resource-constrained settings and for applications that require strong resilience to adversarial attacks.|\n", "2408.04575": "|**2024-08-08**|**SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals**|Haoran Zheng 
et.al.|[2408.04575](http://arxiv.org/abs/2408.04575)|null|Explainable Artificial Intelligence (XAI) is essential for making AI models more transparent and accountable, especially in natural language processing (NLP) tasks. This paper introduces SCENE (Soft Counterfactual Evaluation for Natural language Explainability), a new method that uses large language models (LLMs) to generate soft counterfactual explanations in a zero-shot manner. By focusing on token-based substitutions, SCENE creates contextually appropriate and semantically meaningful soft counterfactuals without extensive fine-tuning. SCENE adopts the Validity-soft and C-soft metrics to evaluate the effectiveness of model-agnostic XAI methods on text classification tasks. Applied to CNN, RNN, and BERT architectures, SCENE offers valuable insight into the strengths and limitations of various XAI techniques.|\n", "2408.04568": "|**2024-08-08**|**Learning Fine-Grained Grounded Citations for Attributed Large Language Models**|Lei Huang 
et.al.|[2408.04568](http://arxiv.org/abs/2408.04568)|**[link](https://github.com/luckyyysta/fine-grained-attribution)**|**Although large language models (LLMs) perform impressively on information-seeking tasks, they still struggle with hallucination. Attributed LLMs, which add in-line citations to the generated text, show potential for reducing hallucination and improving verifiability. However, current approaches produce citations of limited quality, mainly because they rely on in-context learning. In addition, citing only coarse-grained document identifiers makes fine-grained verification difficult for users. We therefore propose FRONT, a framework that teaches LLMs to generate fine-grained grounded citations. By tying generated responses to fine-grained supporting quotes, FRONT uses these quotes to guide generation, which both improves citation quality and makes fine-grained verification easier. Experiments on the ALCE benchmark show that FRONT is effective at generating strong grounded responses and highly supportive citations. With LLaMA-2-7B, the framework significantly outperforms all baselines, improving citation quality by 14.21% on average and even surpassing ChatGPT.**|\n", "2408.04556": "|**2024-08-08**|**Bias-Aware Low-Rank Adaptation: Mitigating 
Catastrophic Inheritance of Large Language Models**|Yupeng Chang et.al.|[2408.04556](http://arxiv.org/abs/2408.04556)|**[link](https://github.com/cyp-jlu-ai/ba-lora)**|**Large language models (LLMs) have shown remarkable ability across a wide range of natural language processing (NLP) tasks. Adapting them to downstream applications, however, typically requires fine-tuning that is both compute-intensive and memory-hungry. Parameter-efficient fine-tuning (PEFT) has emerged as a promising way to customize LLMs at minimal computational cost. Although PEFT methods offer substantial advantages, they do not fully address the problem of bias inherited from pre-training data. We therefore propose Bias-Aware Low-Rank Adaptation (BA-LoRA), a new PEFT method designed to counteract bias inheritance. BA-LoRA incorporates three distinct regularization terms: a consistency regularizer, a diversity regularizer, and a singular value decomposition regularizer. Together, these regularizers aim to improve the consistency, diversity, and generalization of generative models during fine-tuning. Through extensive experiments on a variety of natural language understanding (NLU) and natural language generation (NLG) tasks with prominent LLMs such as LLaMA, Mistral, and Gemma, we show that BA-LoRA outperforms LoRA and its state-of-the-art variants. Our method also effectively mitigates the harmful effects of pre-training bias, leading to more reliable and robust model outputs. The code is available at https://github.com/cyp-jlu-ai/BA-LoRA.**|\n", "2408.04522": "|**2024-08-08**|**Compromesso! 
Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models**|Fabio Pernisi et.al.|[2408.04522](http://arxiv.org/abs/2408.04522)|null|As diverse linguistic communities and users adopt large language models (LLMs), assessing the safety of these models across languages becomes critical. Despite ongoing efforts to make LLMs safe, they can still be made to behave unsafely through jailbreaking, a technique that prompts models to act outside their operational guidelines. Research on LLM safety and jailbreaking has so far focused mainly on English, which limits our understanding of LLM safety in other languages. To help fill this gap, we study the effectiveness of many-shot jailbreaking in Italian, in which models are prompted with unsafe demonstrations to induce unsafe behavior. To support our analysis, we create a new dataset of unsafe Italian question-answer pairs. Using this dataset, we identify clear safety vulnerabilities in four families of open-weight LLMs. We find that the models behave unsafely even when given only a few unsafe demonstrations and, more worryingly, that this tendency escalates rapidly as more demonstrations are added.|\n", "2408.04477": "|**2024-08-08**|**What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant**|Jonan Richards et.al.|[2408.04477](http://arxiv.org/abs/2408.04477)|null|A growing number of tools use large language models (LLMs) to help developers understand code. Developers nevertheless face several barriers when using such tools, including the challenge of describing their intent in natural language, difficulty interpreting the tool's output, and the effort of refining a prompt until it yields useful information. To address this, we designed an LLM-based conversational assistant that provides a personalized interaction based on the inferred mental state of the user, such as background knowledge and experience. In a within-subjects study with fourteen novices, we captured their perceptions and preferences. The results offer insights for researchers and tool builders who want to create or improve LLM-based conversational assistants that support novices in code understanding.|\n", "2408.04472": "|**2024-08-08**|**Can LLMs Beat Humans in Debating? 
A Dynamic Multi-agent Framework for Competitive Debate**|Yiqun Zhang et.al.|[2408.04472](http://arxiv.org/abs/2408.04472)|**[link](https://github.com/zhangyiqun018/agent-for-debate)**|**Competitive debate is a comprehensive and complex computational argumentation task in which large language models (LLMs) suffer from hallucination and lack competitiveness. To address these challenges, we propose Agent4Debate, a dynamic multi-agent framework built on LLMs and designed to strengthen their ability in competitive debate. Inspired by how humans prepare for and carry out debates, the framework uses a collaborative architecture in which four specialized agents (Searcher, Analyzer, Writer, and Reviewer) interact and cooperate dynamically. Together the agents cover every stage of a debate, from initial research and argument formulation to rebuttal and summary. To evaluate the framework comprehensively, we construct the Chinese Debate Arena, a collection of 66 carefully selected Chinese debate motions. We recruit ten experienced professional debaters and collect records of 200 debates involving Agent4Debate, baseline models, and humans. The evaluation uses the Debatrix automatic scoring system together with professional reviewers, with rankings based on Debatrix-Elo and Human-Elo. Experimental results show that the state-of-the-art Agent4Debate performs on a par with humans. Ablation studies further demonstrate the effectiveness of each component of the agent structure.**|\n", "2408.04449": "|**2024-08-08**|**RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents**|Zihao Zhu 
et.al.|[2408.04449](http://arxiv.org/abs/2408.04449)|null|This paper proposes RiskAwareBench, an automated framework for assessing the physical risk awareness of embodied agents based on large language models (LLMs). The framework consists of four modules: safety tips generation, risky scene generation, plan generation, and evaluation. It enables comprehensive risk assessment with minimal manual intervention. Using this framework, the authors build the PhysicalRisk dataset, which covers a variety of scenarios with associated safety tips, observations, and instructions. Experiments show that most LLMs have insufficient physical risk awareness and that basic risk mitigation strategies yield only limited improvement. This underscores the urgency and importance of improving the physical risk awareness of LLM-based embodied agents in the future.|\n", "2408.05212": "|**2024-08-10**|**Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions**|Michele Miranda 
et.al.|[2408.05212](http://arxiv.org/abs/2408.05212)|**[link](https://github.com/michele17284/awesome-privacy-preserving-llms)**|Large language models (LLMs) represent a major advance in artificial intelligence and have found applications across many domains. However, their reliance on massive internet-sourced training datasets raises significant privacy concerns, which are heightened in critical domains such as healthcare. In addition, some application scenarios may require fine-tuning these models on private data. This paper critically examines the privacy threats associated with LLMs, emphasizing the risk that these models memorize and inadvertently reveal sensitive information. We explore current threats by reviewing privacy attacks on LLMs and propose comprehensive solutions for integrating privacy mechanisms throughout the entire learning pipeline. These solutions range from anonymizing training data, to applying differential privacy during training or inference, to machine unlearning after training. Our literature review covers ongoing challenges, available tools, and future directions for preserving privacy in LLMs. By providing a thorough understanding of privacy-preserving methods and their effectiveness in mitigating risk, this work aims to guide the development of more secure and trustworthy AI systems.|\n", "2408.05211": "|**2024-08-09**|**VITA: Towards Open-Source Interactive Omni Multimodal LLM**|Chaoyou Fu et.al.|[2408.05211](http://arxiv.org/abs/2408.05211)|**[link](https://github.com/VITA-MLLM/VITA)**|In this paper, we introduce VITA, the first open-source multimodal large language model that can process and analyze video, image, text, and audio modalities simultaneously while offering an advanced multimodal interactive experience. Starting from Mixtral 8x7B as the language foundation, we expand its Chinese vocabulary and apply bilingual instruction tuning to further strengthen the model. We also give the language model visual and audio capabilities through two-stage multi-task learning. VITA shows strong foundational capabilities in multilingual, vision, and audio understanding, and performs well on a range of unimodal and multimodal benchmarks. Beyond these foundational capabilities, we have made considerable progress in improving the natural multimodal human-computer interaction experience. To the best of our knowledge, this is the first use of non-awakening interaction and audio interrupt in a multimodal large language model. VITA is a first step for the open-source community toward the seamless integration of multimodal understanding and interaction. Although VITA still lags well behind proprietary models, we believe its pioneering role can make it a cornerstone for subsequent research. Project page: https://vita-home.github.io|\n", "2408.05204": "|**2024-08-09**|**Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners**|Michael Vaccaro Jr 
et.al.|[2408.05204](http://arxiv.org/abs/2408.05204)|null|Large language models (LLMs), especially OpenAI's GPT series, have recently made significant advances across many domains. Known for their expertise across subject areas and their quick adaptability to user prompts, these models show unique potential as personalized learning (PL) tools. Their application in K-12 education, however, is still at an exploratory stage. This paper presents a first randomized controlled trial (n = 23) evaluating how effectively GPT-4 personalizes science texts for middle school students. In the study, GPT-4 was used to analyze and predict the learning preferences of students from the choices they made during a training session. For students in the experimental group, GPT-4 rewrote science texts to match their predicted preferences; for students in the control group, texts were rewritten to contradict their learning preferences. A Mann-Whitney U test showed that students clearly preferred the texts when they matched their preferences (significant at the 0.10 level, p = 0.059). These findings suggest that GPT-4 can effectively interpret and tailor educational content to the preferences of diverse learners, marking a notable advance in personalized learning technology. The paper also discusses the limitations of the study and the ethical considerations of using artificial intelligence in education.|\n", "2408.05200": "|**2024-08-09**|**TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning**|Yujie Feng et.al.|[2408.05200](http://arxiv.org/abs/2408.05200)|**[link](https://github.com/WoodScene/TaSL)**|Language model continual learning (CL) has recently attracted wide interest because of its potential to adapt large language models (LLMs) to dynamic real-world environments without retraining. A key challenge is catastrophic forgetting, in which a model loses previously acquired knowledge when learning new tasks. Existing methods commonly use multiple parameter-efficient fine-tuning (PEFT) blocks to acquire task-specific knowledge for each task, but these approaches are inefficient and overlook the potential for knowledge transfer through task interaction. In this paper, we present Task Skill Localization and Consolidation (TaSL), a new CL framework that enhances knowledge transfer without relying on memory replay. TaSL first divides the model into skill units based on parameter dependencies, which allows finer-grained control. It then applies a novel group-wise skill localization technique to identify the importance distribution of skill units for a new task. By comparing this importance distribution with those from previous tasks, we implement a fine-grained skill consolidation strategy that retains task-specific knowledge, thereby preventing forgetting, and updates task-shared knowledge, which facilitates bi-directional knowledge transfer. As a result, TaSL achieves a superior balance between retaining previous knowledge and performing well on new tasks. 
TaSL\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u901a\u7528\u6a21\u578b\uff0c\u5e76\u53ef\u4ee5\u6839\u636eLoRA\u7b49PEFT\u65b9\u6cd5\u8fdb\u884c\u5b9a\u5236\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u8868\u73b0\u51fa\u663e\u8457\u7684\u6269\u5c55\u6027\uff0c\u5141\u8bb8\u4e0e\u8bb0\u5fc6\u91cd\u64ad\u96c6\u6210\u4ee5\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\u3002\u5728\u4e24\u4e2aCL\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4f7f\u7528\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff08\u4ece2.2\u4ebf\u523070\u4ebf\u53c2\u6570\uff09\uff0c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8bc1\u660e\u4e86TaSL\u53ca\u5176\u53d8\u4f53\u5728\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u6709\u6548\u6027\u3002|\n", "2408.05149": "|**2024-08-09**|**AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset**|Pritam Deka et.al.|[2408.05149](http://arxiv.org/abs/2408.05149)|null|\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u653b\u51fb\u5f52\u56e0\u662f\u81f3\u5173\u91cd\u8981\u7684\u8fc7\u7a0b\uff0c\u5b83\u5141\u8bb8\u4e13\u5bb6\u5236\u5b9a\u9488\u5bf9\u653b\u51fb\u8005\u7684\u9632\u5fa1\u63aa\u65bd\u548c\u6cd5\u5f8b\u884c\u52a8\u3002\u76ee\u524d\uff0c\u5206\u6790\u4eba\u5458\u4e3b\u8981\u901a\u8fc7\u624b\u52a8\u64cd\u4f5c\u6765\u8fdb\u884c\u5f52\u56e0\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u4efb\u52a1\u7684\u590d\u6742\u6027\u3002\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u53ef\u4ee5\u88ab\u7528\u6765\u8f85\u52a9\u7f51\u7edc\u5b89\u5168\u5206\u6790\u5e08\u5728\u5f52\u56e0\u8fc7\u7a0b\u4e2d\u8fdb\u884c\u5de5\u4f5c\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6280\u672f\u975e\u5e38\u5f3a\u5927\uff0c\u4f46\u5728\u7f3a\u4e4f\u653b\u51fb\u5f52\u56e0\u9886\u57df\u7684\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u4eec\u9700\u8981\u5e94\u5bf9\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c06\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u5e76\u63d0\u4f9b\u5230
\u76ee\u524d\u4e3a\u6b62\u6211\u4eec\u6240\u77e5\u7684\u7b2c\u4e00\u4e2a\u653b\u51fb\u5f52\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u8bbe\u8ba1\u7684\u4e3b\u8981\u76ee\u6807\u662f\u4ece\u7f51\u7edc\u5b89\u5168\u6587\u672c\u4e2d\u63d0\u53d6\u653b\u51fb\u5f52\u56e0\u4fe1\u606f\uff0c\u5229\u7528NLP\u9886\u57df\u7684\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08NER\uff09\u65b9\u6cd5\u3002\u4e0e\u5176\u5b83\u7f51\u7edc\u5b89\u5168NER\u6570\u636e\u96c6\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u63d0\u4f9b\u4e86\u4e30\u5bcc\u4e14\u5305\u542b\u4e0a\u4e0b\u6587\u7ec6\u8282\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u4e00\u4e9b\u8de8\u77ed\u8bed\u548c\u53e5\u5b50\u7684\u6ce8\u91ca\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5e76\u5e94\u7528\u4e86NLP\u6280\u672f\u6765\u5c55\u793a\u6570\u636e\u96c6\u5728\u653b\u51fb\u5f52\u56e0\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u8fd9\u4e9b\u5b9e\u9a8c\u7a81\u663e\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5728\u6539\u8fdb\u7f51\u7edc\u5b89\u5168\u6570\u636e\u96c6\u4e2d\u7684NER\u4efb\u52a1\u4ee5\u63d0\u5347\u653b\u51fb\u5f52\u56e0\u80fd\u529b\u7684\u6f5c\u529b\u3002|\n", "2408.05141": "|**2024-08-09**|**A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning**|Ye Yuan 
et.al.|[2408.05141](http://arxiv.org/abs/2408.05141)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7efc\u5408\u4f18\u5316\u7684\u589e\u5f3a\u68c0\u7d22\u8f85\u52a9\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u5916\u90e8\u77e5\u8bc6\u5e93\u663e\u8457\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51c6\u786e\u6027\u548c\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u8fdb\u884c\u4e86\u591a\u9879\u6539\u8fdb\uff0c\u5305\u62ec\u5bf9\u7f51\u9875\u4e2d\u7684\u6587\u672c\u6bb5\u843d\u548c\u8868\u683c\u8fdb\u884c\u7ec6\u5316\u5904\u7406\u3001\u5f15\u5165\u5c5e\u6027\u9884\u6d4b\u5668\u4ee5\u51cf\u5c11\u5e7b\u89c9\u3001\u6784\u5efaLLM\u77e5\u8bc6\u62bd\u53d6\u5668\u548c\u77e5\u8bc6\u56fe\u8c31\u62bd\u53d6\u5668\uff0c\u5e76\u6700\u7ec8\u5efa\u7acb\u4e86\u4e00\u4e2a\u6574\u5408\u6240\u6709\u53c2\u8003\u4fe1\u606f\u7684\u63a8\u7406\u7b56\u7565\u3002\u6211\u4eec\u901a\u8fc7Meta CRAG KDD\u676f2024\u7ade\u8d5b\u4e2d\u7684CRAG\u6570\u636e\u96c6\u5bf9\u7cfb\u7edf\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u672c\u5730\u4e0e\u5728\u7ebf\u8bc4\u4f30\u5747\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u590d\u6742\u63a8\u7406\u80fd\u529b\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u63d0\u5347\u3002\u5728\u672c\u5730\u8bc4\u4f30\u4e2d\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u6a21\u578b\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u9519\u8bef\u7387\u4e5f\u6709\u6240\u4e0b\u964d\uff0c\u53d6\u5f97\u4e86\u8f83\u9ad8\u7684\u5206\u6570\u3002\u540c\u65f6\uff0c\u5728\u7ebf\u8bc4\u4f30\u7ed3\u679c\u540c\u6837\u8868\u73b0\u4f18\u5f02\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7cfb\u7edf\u7684\u6027\u80fd\u548c\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u7cfb\u7edf\u7684\u6e90\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8ehttps://gitlab.aicrowd.com/shizueyy/crag-new\u3002|\n", "2408.05128": "|**2024-08-09**|**Is ChatGPT a Good 
Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations**|Jasmine Latendresse et.al.|[2408.05128](http://arxiv.org/abs/2408.05128)|null|\u5728\u8f6f\u4ef6\u7cfb\u7edf\u529f\u80fd\u3001\u6548\u7387\u4e0e\u53ef\u7ef4\u62a4\u6027\u65b9\u9762\uff0c\u8f6f\u4ef6\u5e93\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\u3002\u968f\u7740\u5f00\u53d1\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b80\u5316\u7f16\u7801\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6a21\u578b\u63a8\u8350\u5408\u9002\u5e93\u7684\u6709\u6548\u6027\u4ecd\u5904\u4e8e\u63a2\u7d22\u9636\u6bb5\u3002\u672c\u6587\u8bc4\u4f30\u4e86ChatGPT\u4f5c\u4e3a\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc6\u522b\u4e86\u6539\u8fdb\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-3.5 Turbo\u751f\u6210\u9488\u5bf910000\u4e2aStack Overflow\u95ee\u9898\u7684Python\u4ee3\u7801\uff0c\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cChatGPT\u6bd4\u4eba\u7c7b\u5f00\u53d1\u8005\u66f4\u9891\u7e41\u5730\u4f7f\u7528\u7b2c\u4e09\u65b9\u5e93\uff0c\u503e\u5411\u4e8e\u5e7f\u6cdb\u91c7\u7528\u4e14\u5386\u53f2\u60a0\u4e45\u7684\u9009\u62e9\u3002\u7136\u800c\uff0c14.2%\u63a8\u8350\u7684\u5e93\u5177\u6709\u9650\u5236\u6027\u7684Copyleft\u8bb8\u53ef\uff0c\u8fd9\u5e76\u672a\u7531ChatGPT\u660e\u786e\u4f20\u8fbe\u3002\u6b64\u5916\uff0c\u67096.5%\u7684\u5e93\u65e0\u6cd5\u76f4\u63a5\u4f7f\u7528\uff0c\u53ef\u80fd\u5bfc\u81f4\u5f00\u53d1\u8005\u56f0\u60d1\u548c\u6d6a\u8d39\u65f6\u95f4\u3002\u5c3d\u7ba1ChatGPT\u53ef\u4ee5\u4f5c\u4e3a\u6709\u6548\u7684\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\uff0c\u4f46\u5e94\u63d0\u4f9b\u5173\u4e8e\u7ef4\u62a4\u6307\u6807\u548c\u8bb8\u53ef\u7684\u66f4\u591a\u660e\u786e\u4fe1\u606f\u3002\u6211\u4eec\u5efa\u8bae\u5f00\u53d1\u8005\u5b9e\u65bd\u4e25\u683c\u7684\u4f9d\u8d56\u7ba1\u7406\u5b9e\u8df5\uff0c\u5e76\u5728\u5c06LLM\u
751f\u6210\u7684\u4ee3\u7801\u96c6\u6210\u5230\u9879\u76ee\u4e2d\u4e4b\u524d\uff0c\u4ed4\u7ec6\u68c0\u67e5\u5e93\u7684\u8bb8\u53ef\u8bc1\u3002|\n", "2408.05126": "|**2024-08-09**|**Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media**|Petre Breazu et.al.|[2408.05126](http://arxiv.org/abs/2408.05126)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u6f14\u8fdb\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6587\u672c\u5206\u6790\u4e2d\u7684\u53d1\u5c55\u4e0e\u5e94\u7528\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5404\u79cdLLMs\u5728\u8fdb\u884c\u5b9a\u6027\u5206\u6790\u65f6\u5c55\u73b0\u51fa\u7684\u6f5c\u529b\u88ab\u5bc4\u4e88\u539a\u671b\uff0c\u4f46\u5b83\u4eec\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u4e2d\u7684\u5e94\u7528\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u8ba8\u3002\u672c\u6587\u901a\u8fc7\u4e00\u9879\u4ee5GPT-4\u4e3a\u6838\u5fc3\u7684\u7814\u7a76\u5b9e\u9a8c\uff0c\u4e3aLLMs\u5728\u5b9a\u6027\u5206\u6790\u9886\u57df\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\u3002\u7814\u7a76\u57fa\u4e8e\u4e00\u4e2a\u6765\u81ea\u6b27\u76df\u8d44\u52a9\u9879\u76ee\u7684YouTube\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u805a\u7126\u4e8e2016\u5e74\u745e\u5178\u7f57\u9a6c\u5c3c\u4e9a\u79fb\u6c11\u7fa4\u4f53\u7684\u4ee3\u8868\u5f62\u8c61\uff0c\u8fd9\u4e00\u65f6\u671f\u6b63\u503c2015\u5e74\u96be\u6c11\u5371\u673a\u4e4b\u540e\uff0c\u7d27\u90bb2017\u5e74\u7684\u745e\u5178\u5168\u56fd\u9009\u4e3e\u3002\u6211\u4eec\u7684\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u5c06\u4eba\u7c7b\u667a\u6167\u4e0eAI\u7684\u89c4\u6a21\u548c\u6548\u7387\u76f8\u7ed3\u5408\u7684\u53ef\u80fd\u6027\uff0c\u901a\u8fc7\u5206\u6790LLMs\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u5e94\u7528\u4f18\u52a3\uff0c\u5e76\u8ba8\u8bba\u672a\u6765\u53ef\u80fd\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.05123": 
"|**2024-08-09**|**Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video**|Chunggi Lee et.al.|[2408.05123](http://arxiv.org/abs/2408.05123)|null|\u968f\u7740\u7bee\u7403\u8fd0\u52a8\u7684\u666e\u53ca\uff0c\u7c89\u4e1d\u4eec\u5e38\u5e38\u56e0\u6bd4\u8d5b\u8282\u594f\u5feb\u548c\u590d\u6742\u5ea6\u9ad8\u800c\u611f\u5230\u56f0\u60d1\u3002\u7bee\u7403\u6218\u672f\u6d89\u53ca\u4e00\u7cfb\u5217\u590d\u6742\u7684\u52a8\u4f5c\uff0c\u9700\u8981\u5927\u91cf\u7684\u77e5\u8bc6\u624d\u80fd\u5b8c\u5168\u7406\u89e3\u3002\u8fd9\u79cd\u590d\u6742\u6027\u5bfc\u81f4\u4e86\u5bf9\u989d\u5916\u4fe1\u606f\u548c\u89e3\u91ca\u7684\u9700\u6c42\uff0c\u8fd9\u53ef\u80fd\u4f1a\u5206\u6563\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u5173\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSportify\u7684\u89c6\u89c9\u95ee\u7b54\u7cfb\u7edf\uff0c\u5b83\u878d\u5408\u4e86\u53d9\u4e8b\u548c\u5d4c\u5165\u5f0f\u53ef\u89c6\u5316\uff0c\u65e8\u5728\u4e3a\u7403\u8ff7\u63d0\u4f9b\u7bee\u7403\u6218\u672f\u7591\u95ee\u7684\u6e05\u6670\u89e3\u7b54\uff0c\u5e2e\u52a9\u4ed6\u4eec\u7406\u89e3\u6bd4\u8d5b\u7684\u5404\u79cd\u65b9\u9762\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u79cd\u65b0\u578b\u7684\u52a8\u4f5c\u53ef\u89c6\u5316\uff08\u4f20\u7403\u3001\u5207\u5165\u548c\u63a9\u62a4\uff09\uff0c\u4ee5\u5c55\u793a\u5173\u952e\u52a8\u4f5c\u5e8f\u5217\u3002\u4e3a\u4e86\u89e3\u91ca\u7403\u5458\u884c\u52a8\u80cc\u540e\u7684\u539f\u56e0\u548c\u903b\u8f91\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u53d9\u4e8b\u6587\u672c\u3002\u6211\u4eec\u91c7\u7528\u6545\u4e8b\u8bb2\u8ff0\u7684\u65b9\u6cd5\u6765\u63cf\u8ff0\u590d\u6742\u573a\u666f\uff0c\u4ece\u7b2c\u4e00\u4eba\u79f0\u548c\u7b2c\u4e09\u4eba\u79f0\u7684\u89d2\u5ea6\u8fdb\u884c\u53d9\u8ff0\uff0c\u5e76\u878d\u5165\u52a8\u4f5c\u53ef\u89c6\u5316\u3002\u6211\u4eec\u901a\u8fc7\u4e0e\u7bee\u7403\u7c89\u4e1d\u7684\u8bc4\u4f30\u
ff0c\u63a2\u8ba8\u4e86Sportify\u5728\u6df1\u5316\u6218\u672f\u6d1e\u5bdf\u529b\u548c\u589e\u5f3a\u89c2\u8d5b\u4f53\u9a8c\u65b9\u9762\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u7b2c\u4e09\u4eba\u79f0\u53d9\u8ff0\u6709\u52a9\u4e8e\u4eba\u4eec\u83b7\u5f97\u6df1\u5165\u7684\u6bd4\u8d5b\u89e3\u91ca\uff0c\u800c\u7b2c\u4e00\u4eba\u79f0\u53d9\u8ff0\u5219\u589e\u5f3a\u4e86\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u53c2\u4e0e\u611f\u3002|\n", "2408.05109": "|**2024-08-09**|**A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?**|Xinyu Liu et.al.|[2408.05109](http://arxiv.org/abs/2408.05109)|**[link](https://github.com/hkustdial/nl2sql_handbook)**|\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u5230SQL\u67e5\u8be2\uff08\u5373NL2SQL\uff09\u7684\u7ffb\u8bd1\u53ef\u4ee5\u663e\u8457\u964d\u4f4e\u8bbf\u95ee\u5173\u7cfb\u6570\u636e\u5e93\u7684\u969c\u788d\uff0c\u5e76\u652f\u6301\u5404\u79cd\u5546\u4e1a\u5e94\u7528\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\uff0cNL2SQL\u7684\u6027\u80fd\u5f97\u5230\u4e86\u5927\u5e45\u63d0\u5347\u3002\u672c\u6587\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5168\u9762\u7684NL2SQL\u6280\u672f\u7efc\u8ff0\uff0c\u57fa\u4e8eLLMs\u9a71\u52a8\uff0c\u8986\u76d6\u4e86\u4ece\u56db\u4e2a\u65b9\u9762\u5bf9\u6574\u4e2a\u751f\u547d\u5468\u671f\u7684\u5168\u9762\u5ba1\u67e5\uff1a\uff081\uff09\u6a21\u578b\uff1a\u5904\u7406\u81ea\u7136\u8bed\u8a00\u7684\u6a21\u7cca\u6027\u548c\u4e0d\u5145\u5206\u6027\uff0c\u5e76\u6b63\u786e\u6620\u5c04\u81ea\u7136\u8bed\u8a00\u4e0e\u6570\u636e\u5e93\u6a21\u5f0f\u548c\u5b9e\u4f8b\uff1b\uff082\uff09\u6570\u636e\uff1a\u4ece\u6536\u96c6\u8bad\u7ec3\u6570\u636e\u3001\u5e94\u5bf9\u8bad\u7ec3\u6570\u636e\u7a00\u7f3a\u7684\u6570\u636e\u5408\u6210\uff0c\u5230NL2SQL\u57fa\u51c6\uff1b\uff083\uff09\u8bc4\u4f30\uff1a\u4ece\u591a\u4e2a\u89d2\u5ea6\u4f7f\u7528\u4e0d\u540c\u6307\u6807\u5bf9NL2SQL\u65b9\u6cd5\u8fdb\u884c\u8bc4\u4f30\uff1b\uff084\uff09\u9519\u8bef\u52
06\u6790\uff1a\u5206\u6790NL2SQL\u9519\u8bef\u4ee5\u627e\u5230\u6839\u672c\u539f\u56e0\uff0c\u5e76\u6307\u5bfcNL2SQL\u6a21\u578b\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5f00\u53d1NL2SQL\u89e3\u51b3\u65b9\u6848\u7684\u4e00\u6761\u7ecf\u9a8c\u6cd5\u5219\u3002\u6700\u540e\uff0c\u8ba8\u8bba\u4e86\u5728LLMs\u65f6\u4ee3NL2SQL\u7684\u7814\u7a76\u6311\u6218\u548c\u5f00\u653e\u95ee\u9898\u3002|\n", "2408.06332": "|**2024-08-12**|**Animate, or Inanimate, That is the Question for Large Language Models**|Leonardo Ranaldi et.al.|[2408.06332](http://arxiv.org/abs/2408.06332)|null|\u4eba\u7c7b\u7684\u8ba4\u77e5\u6838\u5fc3\u4e0e\u201c\u6709\u751f\u547d\u6027\u201d\u8fd9\u4e00\u6982\u5ff5\u7d27\u5bc6\u76f8\u8fde\uff0c\u5b83\u5728\u5851\u9020\u8bb0\u5fc6\u3001\u89c6\u89c9\u4ee5\u53ca\u591a\u5c42\u6b21\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u867d\u7136\u201c\u6709\u751f\u547d\u6027\u201d\u5728\u8bed\u8a00\u4e2d\u901a\u8fc7\u52a8\u8bcd\u548c\u5f62\u5bb9\u8bcd\u7684\u7ec6\u5fae\u7ea6\u675f\u4f53\u73b0\u51fa\u6765\uff0c\u4f46\u5176\u5b66\u4e60\u548c\u7cbe\u70bc\u8fc7\u7a0b\u4e5f\u4f9d\u8d56\u4e8e\u975e\u8bed\u8a00\u4fe1\u606f\u3002\u540c\u6837\u5730\uff0c\u6211\u4eec\u5047\u8bbe\u5927\u6a21\u578b\u5728\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\u65f6\u80fd\u529b\u6709\u9650\u7684\u539f\u56e0\u662f\u5b83\u4eec\u4ec5\u4ee5\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\u3002\u56e0\u6b64\uff0c\u8fd9\u7bc7\u8bba\u6587\u65e8\u5728\u63a2\u8ba8\u7684\u95ee\u9898\u662f\uff1a\u5927\u6a21\u578b\u662f\u5426\u80fd\u591f\u4ee5\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u7684\u65b9\u5f0f\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\uff1f\u6211\u4eec\u901a\u8fc7\u63d0\u793a\u65b9\u6cd5\u8fdb\u884c\u4e86\u7cfb\u7edf\u5206\u6790\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u
63d0\u793a\u5927\u6a21\u578b\u5728\u4e0d\u540c\u7684\u6709\u751f\u547d\u3001\u65e0\u751f\u547d\u3001\u5e38\u89c1\u548c\u5f02\u5e38\u60c5\u5883\u4e0b\u8fdb\u884c\u64cd\u4f5c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u5927\u6a21\u578b\u4e3b\u8981\u57fa\u4e8e\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f46\u5728\u9762\u5bf9\u5178\u578b\u7684\u6709\u751f\u547d\u4f53\u548c\u65e0\u751f\u547d\u4f53\u65f6\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u4e0e\u5148\u524d\u7814\u7a76\u4e00\u81f4\u7684\u4eba\u7c7b\u884c\u4e3a\u6a21\u5f0f\u3002\u56e0\u6b64\uff0c\u5927\u6a21\u578b\u80fd\u591f\u9002\u5e94\u7406\u89e3\u975e\u5178\u578b\u60c5\u51b5\uff0c\u901a\u8fc7\u8bc6\u522b\u5f02\u5e38\u60c5\u51b5\u4e3a\u6709\u751f\u547d\u4f53\uff0c\u800c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u4f9d\u8d56\u7684\u672a\u8a00\u660e\u7684\u8ba4\u77e5\u89e6\u53d1\u673a\u5236\u6765\u5206\u89e3\u52a8\u753b\u3002|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen 
et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|\u672c\u6587\u65e8\u5728\u586b\u8865\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u4e0e\u4eba\u5de5\u901a\u7528\u667a\u80fd\uff08AGI\uff09\u63a5\u8fd1\u8fc7\u7a0b\u4e2d\u7814\u7a76\u7684\u7a7a\u767d\u3002\u5c3d\u7ba1LLM\u5c55\u73b0\u51fa\u51fa\u8272\u7684\u6cdb\u5316\u80fd\u529b\u548c\u6d8c\u73b0\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u7f3a\u4e4f\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u884c\u4e3a\u3001\u6f5c\u5728\u5931\u8d25\u539f\u56e0\u4ee5\u53ca\u5982\u4f55\u63d0\u5347\u5176\u6027\u80fd\u7684\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u5177\u6709\u6311\u6218\u6027\u7684\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5229\u7528\u4e86\u4e00\u4e2a\u540d\u4e3aTravelPlanner\u7684\u771f\u5b9e\u57fa\u51c6\uff0c\u5176\u4e2d\u7684\u4ee3\u7406\u5fc5\u987b\u6ee1\u8db3\u591a\u4e2a\u7ea6\u675f\u4ee5\u751f\u6210\u51c6\u786e\u7684\u8ba1\u5212\u3002\u901a\u8fc7TravelPlanner\u57fa\u51c6\uff0c\u6211\u4eec\u9488\u5bf9\u56db\u4e2a\u5173\u952e\u7814\u7a76\u95ee\u9898\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\uff1a\uff081\uff09LLM\u4ee3\u7406\u5728\u5904\u7406\u957f\u7bc7\u548c\u5608\u6742\u4e0a\u4e0b\u6587\u65f6\uff0c\u5bf9\u4e8e\u63a8\u7406\u548c\u89c4\u5212\u7684\u9c81\u68d2\u6027\u662f\u5426\u8db3\u591f\uff1f\uff082\uff09\u5c11\u91cf\u63d0\u793a\u80fd\u5426\u5bf9\u5177\u6709\u957f\u4e0a\u4e0b\u6587\u7684\u573a\u666f\u4ea7\u751f\u8d1f\u9762\u5f71\u54cd\uff1f\uff083\uff09\u6211\u4eec\u80fd\u5426\u4f9d\u8d56\u7ec6\u5316\u6765\u6539\u5584\u8ba1\u5212\uff1f\uff084\uff09\u662f\u5426\u53ef\u4ee5\u4f7f\u7528\u6b63\u8d1f\u53cd\u9988\u76f8\u7ed3\u5408\u7684\u65b9\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\uff1f 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff1a\u9996\u5148\uff0c\u5c3d\u7ba1LLM\u80fd\u591f\u5904\u7406\u5927\u91cf\u7684\u53c2\u8003\u4fe1\u606f\u548c\u5c11\u91cf\u793a\u4f8b\uff0c\u4f46\u5728\u5904\u7406\u957f\u7bc7\u4e0a\u4e0b\u6587\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u65e0\u6cd5\u5173\u6ce8\u5173\u952e\u90e8\u5206\uff1b\u5176\u6b21\uff0c\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u5206\u6790\u957f\u671f\u89c4\u5212\uff0c\u5e76\u4e0d\u80fd\u63d0\u4f9b\u51c6\u786e\u7684\u53cd\u9988\u4f9b\u7ec6\u5316\u4f7f\u7528\uff1b\u7b2c\u4e09\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u79f0\u4e3a\u53cd\u9988\u611f\u77e5\u5fae\u8c03\uff08FAFT\uff09\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u4e86\u6b63\u8d1f\u53cd\u9988\uff0c\u76f8\u8f83\u4e8e\u76d1\u7763\u5f0f\u5fae\u8c03\uff08SFT\uff09\uff0c\u5b83\u80fd\u5e26\u6765\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u793e\u533a\u63d0\u4f9b\u4e86\u6709\u5173\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u5e94\u7528\u65b9\u9762\u7684\u6df1\u5165\u89c1\u89e3\u3002|\n", "2408.06292": "|**2024-08-12**|**The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery**|Chris Lu 
et.al.|[2408.06292](http://arxiv.org/abs/2408.06292)|**[link](https://github.com/sakanaai/ai-scientist)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u7684\u79d1\u5b66\u53d1\u73b0\uff0c\u4f7f\u524d\u6cbf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u72ec\u7acb\u8fdb\u884c\u7814\u7a76\uff0c\u5e76\u4f20\u8fbe\u5176\u7814\u7a76\u6210\u679c\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201cAI\u79d1\u5b66\u5bb6\u201d\u8fd9\u4e00\u6982\u5ff5\uff0c\u5b83\u80fd\u751f\u6210\u65b0\u9896\u7684\u7814\u7a76\u601d\u8def\uff0c\u7f16\u5199\u4ee3\u7801\uff0c\u6267\u884c\u5b9e\u9a8c\uff0c\u53ef\u89c6\u5316\u7ed3\u679c\uff0c\u64b0\u5199\u5b8c\u6574\u7684\u79d1\u5b66\u8bba\u6587\uff0c\u5e76\u8fdb\u884c\u6a21\u62df\u7684\u540c\u884c\u8bc4\u5ba1\u8fc7\u7a0b\u4ee5\u8fdb\u884c\u8bc4\u4f30\u3002\u7406\u8bba\u4e0a\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u8fed\u4ee3\u8fdb\u884c\uff0c\u4ee5\u5f00\u653e\u6027\u65b9\u5f0f\u53d1\u5c55\u60f3\u6cd5\uff0c\u5c31\u50cf\u4eba\u7c7b\u7684\u79d1\u5b66\u793e\u533a\u4e00\u6837\u3002 
\u901a\u8fc7\u5c06\u5176\u5e94\u7528\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u4e09\u4e2a\u4e0d\u540c\u5b50\u9886\u57df\uff1a\u6269\u6563\u5efa\u6a21\u3001\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u8bed\u8a00\u5efa\u6a21\u548c\u5b66\u4e60\u52a8\u6001\uff0c\u5c55\u793a\u4e86\u5176\u7075\u6d3b\u6027\u3002\u6bcf\u4e00\u7bc7\u8bba\u6587\u7684\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e15\u7f8e\u5143\u3002\u4e3a\u4e86\u8bc4\u4f30\u751f\u6210\u7684\u8bba\u6587\uff0c\u6211\u4eec\u8bbe\u8ba1\u5e76\u9a8c\u8bc1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5ba1\u7a3f\u4eba\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u5728\u8bc4\u4ef7\u8bba\u6587\u5206\u6570\u65b9\u9762\u63a5\u8fd1\u4eba\u7c7b\u6c34\u5e73\u8868\u73b0\u3002AI\u79d1\u5b66\u5bb6\u80fd\u591f\u4ea7\u751f\u8d85\u8fc7\u9876\u7ea7\u673a\u5668\u5b66\u4e60\u4f1a\u8bae\u63a5\u53d7\u9608\u503c\u7684\u8bba\u6587\uff0c\u8fd9\u662f\u7531\u6211\u4eec\u7684\u81ea\u52a8\u5ba1\u7a3f\u4eba\u5224\u65ad\u7684\u3002\u8fd9\u4e00\u65b9\u6cd5\u6807\u5fd7\u7740\u673a\u5668\u5b66\u4e60\u9886\u57df\u79d1\u5b66\u7814\u7a76\u65b0\u7eaa\u5143\u7684\u5f00\u59cb\uff1a\u5c06AI\u4ee3\u7406\u7684\u53d8\u9769\u6027\u4f18\u52bf\u5e26\u5165\u6574\u4e2a\u7814\u7a76\u8fc7\u7a0b\uff0c\u4f7f\u6211\u4eec\u66f4\u63a5\u8fd1\u4e00\u4e2a\u80fd\u591f\u91ca\u653e\u89e3\u51b3\u4e16\u754c\u6700\u8270\u5de8\u95ee\u9898\u7684\u65e0\u9650\u53ef\u8d1f\u62c5\u521b\u65b0\u4e0e\u521b\u9020\u529b\u7684\u4e16\u754c\u3002\u6240\u6709\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/SakanaAI/AI-Scientist\u3002|\n", "2408.06281": "|**2024-08-12**|**MovieSum: An Abstractive Summarization Dataset for Movie Screenplays**|Rohit Saxena 
et.al.|[2408.06281](http://arxiv.org/abs/2408.06281)|**[link](https://github.com/saxenarohit/moviesum)**|\u7535\u5f71\u5267\u672c\u7684\u6982\u8ff0\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u8981\u6c42\u7406\u89e3\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u548c\u7535\u5f71\u7279\u6709\u7684\u5404\u79cd\u5143\u7d20\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u6863\u6982\u8ff0\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u5728\u5904\u7406\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u65f6\u9047\u5230\u56f0\u96be\u3002\u6b64\u5916\uff0c\u867d\u7136\u6700\u8fd1\u7684\u7814\u7a76\u5173\u6ce8\u7535\u89c6\u811a\u672c\uff0c\u4f46\u7535\u5f71\u5267\u672c\u6982\u8ff0\u4ecd\u7136\u7f3a\u4e4f\u63a2\u7d22\u3002\u4e3a\u4e86\u6fc0\u53d1\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u540d\u4e3aMovieSum\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u7535\u5f71\u5267\u672c\u7684\u62bd\u8c61\u6982\u8ff0\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u5305\u542b\u4e862200\u4e2a\u7535\u5f71\u5267\u672c\u53ca\u5176\u5bf9\u5e94\u7684\u7ef4\u57fa\u767e\u79d1\u5267\u60c5\u6982\u8ff0\u3002\u6211\u4eec\u4eba\u5de5\u683c\u5f0f\u5316\u4e86\u7535\u5f71\u5267\u672c\u4ee5\u8868\u793a\u5176\u7ed3\u6784\u5143\u7d20\u3002\u4e0e\u73b0\u6709\u7684\u6570\u636e\u96c6\u76f8\u6bd4\uff0cMovieSum\u5177\u6709\u51e0\u4e2a\u72ec\u7279\u7279\u70b9\uff1a\uff081\uff09\u5b83\u5305\u62ec\u7535\u5f71\u5267\u672c\uff0c\u8fd9\u4e9b\u5267\u672c\u6bd4\u7535\u89c6\u5267\u811a\u672c\u66f4\u957f\u3002\uff082\uff09\u5b83\u7684\u89c4\u6a21\u662f\u4e4b\u524d\u7535\u5f71\u5267\u672c\u6570\u636e\u96c6\u7684\u4e24\u500d\u3002\uff083\uff09\u5b83\u63d0\u4f9b\u4e86IMDb 
ID\u7b49\u5143\u6570\u636e\uff0c\u65b9\u4fbf\u83b7\u53d6\u989d\u5916\u7684\u5916\u90e8\u77e5\u8bc6\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u6700\u8fd1\u53d1\u5e03\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6982\u8ff0\u7684\u7ed3\u679c\uff0c\u4ee5\u63d0\u4f9b\u8be6\u7ec6\u7684\u57fa\u51c6\u3002|\n", "2408.06276": "|**2024-08-13**|**Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation**|Jieyong Kim et.al.|[2408.06276](http://arxiv.org/abs/2408.06276)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u7c7b\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u4e86\u5b83\u4eec\u5728\u63a8\u8350\u7cfb\u7edf\u9886\u57df\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528LLM\u7684\u6f5c\u529b\uff0c\u5f80\u5f80\u53d7\u9650\u4e8e\u8f93\u5165\u4fe1\u606f\u7684\u6709\u9650\u6027\uff0c\u672a\u80fd\u5168\u9762\u53d1\u6325\u5176\u9ad8\u7ea7\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aEXP3RT\u7684\u65b0\u9896LLM\u63a8\u8350\u7cfb\u7edf\uff0c\u65e8\u5728\u5229\u7528\u7528\u6237\u548c\u7269\u54c1\u8bc4\u8bba\u4e2d\u8574\u542b\u7684\u4e30\u5bcc\u504f\u597d\u4fe1\u606f\u3002 
EXP3RT\u901a\u8fc7\u4ece\u6559\u5e08LLM\u4e2d\u8fdb\u884c\u77e5\u8bc6\u84b8\u998f\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u5173\u952e\u7684\u4e09\u9879\u4efb\u52a1\uff1a\u9996\u5148\uff0c\u5b83\u4ece\u539f\u59cb\u8bc4\u8bba\u4e2d\u63d0\u53d6\u5e76\u5c01\u88c5\u6838\u5fc3\u7684\u4e3b\u89c2\u504f\u597d\uff1b\u5176\u6b21\uff0c\u6839\u636e\u7279\u5b9a\u6807\u51c6\u805a\u5408\u548c\u603b\u7ed3\u8fd9\u4e9b\u504f\u597d\uff0c\u5f62\u6210\u7528\u6237\u548c\u7269\u54c1\u7684\u6863\u6848\uff1b\u6700\u540e\uff0c\u8003\u8651\u7528\u6237/\u7269\u54c1\u6863\u6848\u4ee5\u53ca\u7269\u54c1\u63cf\u8ff0\u4e2d\u7684\u4e3b\u5ba2\u89c2\u4fe1\u606f\uff0c\u751f\u6210\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\u548c\u9884\u6d4b\u8bc4\u7ea7\uff0c\u5373\u57fa\u4e8e\u63a8\u7406\u7684\u8bc4\u7ea7\u9884\u6d4b\u3002\u8fd9\u79cd\u7531EXP3RT\u63d0\u4f9b\u7684\u4e2a\u6027\u5316\u504f\u597d\u63a8\u7406\u80fd\u591f\u63d0\u9ad8\u8bc4\u7ea7\u9884\u6d4b\u7684\u51c6\u786e\u6027\uff0c\u5e76\u4e3a\u63a8\u8350\u7cfb\u7edf\u63d0\u4f9b\u5fe0\u5b9e\u4e14\u5408\u7406\u7684\u89e3\u91ca\u3002 \u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cEXP3RT\u5728\u8bc4\u7ea7\u9884\u6d4b\u548c\u5019\u9009\u9879\u76ee\u91cd\u6392\u5e8f\uff08\u7528\u4e8etop-k\u63a8\u8350\uff09\u65b9\u9762\u5747\u8d85\u8d8a\u4e86\u73b0\u6709\u65b9\u6cd5\uff0c\u540c\u65f6\u663e\u8457\u63d0\u5347\u4e86\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u89e3\u91ca\u6027\u3002|\n", "2408.06273": "|**2024-08-12**|**FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data**|Haoran Sun 
et.al.|[2408.06273](http://arxiv.org/abs/2408.06273)|**[link](https://github.com/tjunlp-lab/fuxitranyu)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bb8\u591aLLM\u5728\u9ad8\u8d44\u6e90\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5f00\u6e90\u591a\u8bed\u8a00LLM\u2014\u2014FuxiTranyu\uff0c\u65e8\u5728\u6ee1\u8db3\u7814\u7a76\u793e\u533a\u5bf9\u5e73\u8861\u4e14\u9ad8\u6027\u80fd\u591a\u8bed\u8a00\u80fd\u529b\u7684\u9700\u6c42\u3002FuxiTranyu-8B\uff0c\u5177\u670980\u4ebf\u53c2\u6570\u7684\u57fa\u6a21\uff0c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5728\u4e00\u4e2a\u7cbe\u5fc3\u5e73\u8861\u7684\u591a\u8bed\u8a00\u6570\u636e\u4ed3\u5e93\u4e0a\uff0c\u8be5\u4ed3\u5e93\u5305\u542b\u8986\u76d643\u79cd\u81ea\u7136\u8bed\u8a00\u548c16\u79cd\u7f16\u7a0b\u8bed\u8a00\u76846000\u4ebf\u4e2a\u4ee4\u724c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e24\u4e2a\u6307\u4ee4\u8c03\u4f18\u6a21\u578b\uff1aFuxiTranyu-8B-SFT\uff0c\u5b83\u57fa\u4e8e\u591a\u5143\u6307\u4ee4\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\uff1b\u4ee5\u53caFuxiTranyu-8B-DPO\uff0c\u5728\u504f\u597d\u6570\u636e\u96c6\u4e0a\u8fdb\u4e00\u6b65\u7cbe\u70bc\u4ee5\u589e\u5f3a\u5bf9\u9f50\u80fd\u529b\u7684DPO\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u5728\u591a\u79cd\u591a\u8bed\u8a00\u57fa\u51c6\u4e0a\u7684\u7ed3\u679c\u663e\u793a\uff0cFuxiTranyu\u5728\u4e0e\u73b0\u6709\u591a\u8bed\u8a00LLM\uff08\u5982BLOOM-7B\u3001PolyLM-13B\u3001Llama-2-Chat-7B\u548cMistral-7B-Instruct\uff09\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u6027\u80fd\u3002\u795e\u7ecf\u5143\u7ea7\u548c\u8868\u793a\u7ea7\u53ef\u89e3\u91ca\u6027\u5206\u6790\u8868\u660e\uff0cFuxiTranyu\u80fd\u591f\u5728\u4e0d\u540c\u8bed\u8a00\u4e4b\u95f4\u5b66\u4e60\u4e00\u81f4\u7684\u591a
\u8bed\u8a00\u8868\u793a\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9\u591a\u8bed\u8a00LLM\u53ca\u5176\u5de5\u4f5c\u673a\u5236\u7684\u7814\u7a76\uff0c\u6211\u4eec\u53d1\u5e03\u4e86\u57fa\u6a21\u548c\u6307\u4ee4\u8c03\u4f18\u7684FuxiTranyu\u6a21\u578b\uff0c\u4ee5\u53ca58\u4e2a\u9884\u8bad\u7ec3\u68c0\u67e5\u70b9\uff0c\u901a\u8fc7HuggingFace\u548cGithub\u516c\u5f00\u5206\u4eab\u3002|\n", "2408.06272": "|**2024-08-12**|**A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution**|Sampath Rajapaksha et.al.|[2408.06272](http://arxiv.org/abs/2408.06272)|null|\u5728\u4e0d\u65ad\u6f14\u8fdb\u7684\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u5206\u6790\u5e08\u9700\u8981\u5bc6\u5207\u5173\u6ce8\u6700\u65b0\u7684\u653b\u51fb\u8d8b\u52bf\u548c\u76f8\u5173\u4fe1\u606f\uff0c\u4ee5\u534f\u52a9\u8c03\u67e5\u4e0e\u5f52\u56e0\u7f51\u7edc\u653b\u51fb\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u6280\u672f\u7684\u95ee\u7b54\u6a21\u578b\u53ca\u5176\u5e94\u7528\uff0c\u65e8\u5728\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u5bb6\u63d0\u4f9b\u6709\u5173\u7f51\u7edc\u653b\u51fb\u8c03\u67e5\u4e0e\u5f52\u56e0\u7684\u4fe1\u606f\u3002\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u77e5\u8bc6\u5e93\uff08KB\uff09\uff0c\u80fd\u591f\u6839\u636e\u77e5\u8bc6\u5e93\u6216\u7528\u6237\u63d0\u4f9b\u7684\u5916\u90e8\u8d44\u6e90\u56de\u7b54\u7528\u6237\u7684\u67e5\u8be2\u3002 
We tested and evaluated our QA model with various types of questions, including questions based on the KB, on metadata, on specific documents in the KB, and on external sources. We compared its answers to KB-based questions with those of OpenAI's GPT-3.5 and the latest GPT-4 LLMs. The results show that our QA model cites its sources alongside its answers and overcomes the hallucination problems that GPT models can exhibit, which is critical for cyber-attack investigation and attribution. In addition, our analysis shows that when the RAG QA model is given a few examples in addition to the query, its answers are generally better than when it receives the query alone.|\n", "2408.06266": "|**2024-08-12**|**Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment**|Karel D'Oosterlinck 
et.al.|[2408.06266](http://arxiv.org/abs/2408.06266)|**[link](https://github.com/contextualai/clair_and_apo)**|Large language models (LLMs) are often aligned using contrastive alignment objectives and preference-pair datasets. The interaction between model, paired data, and objective makes alignment complicated and sometimes produces subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they give more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method that yields more contrastive preference pairs, and Anchored Preference Optimization (APO), a more controllable and more stable alignment objective. We align Llama-3-8B-Instruct using a variety of comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. The CLAIR preferences lead to the best performance across all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code is available at https://github.com/ContextualAI/CLAIR_and_APO.|\n", "2408.06223": "|**2024-08-12**|**On Effects of Steering Latent Representation for Large Language Model Unlearning**|Dang Huu-Tien et.al.|[2408.06223](http://arxiv.org/abs/2408.06223)|null|In this paper, we first show theoretically that steering forget representations in an intermediate layer toward a random direction reduces token confidence, causing large language models (LLMs) to generate wrong or nonsensical responses. Second, we investigate how the coefficient affects the alignment of forget-sample representations with the random direction and hint at the optimal coefficient values for effective unlearning across different network layers. Third, we show that models unlearned with Representation Misdirection for Unlearning (RMU) are robust against adversarial jailbreak attacks. 
Finally, our empirical analysis shows that RMU is less effective when applied to the middle and later layers of LLMs. To address this, we propose Adaptive RMU, a simple yet effective method that makes unlearning effective for most layers without incurring additional computational cost. Extensive experiments show that Adaptive RMU significantly improves unlearning performance compared with prior work.|\n", "2408.06186": "|**2024-08-12**|**Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting**|Halley Young et.al.|[2408.06186](http://arxiv.org/abs/2408.06186)|null|Generating diverse text is a key challenge facing large language models (LLMs). So far, diversity has mainly been studied through metrics such as $n$-gram diversity or the diversity of BERT embeddings, but these give the user little control over the dimensions along which diversity is considered. For example, in the poetry domain a user might want diversity in rhyme and meter, whereas in the code domain a user might care more about diversity in the kinds of expressions used to solve a problem. 
To this end, we propose a new metric called structural diversity, in which the user provides a mapping from generated text to features that capture the kinds of diversity they care about. This lets users control more precisely which dimensions of diversity to explore, such as rhyme and meter in poetry or particular kinds of expressions in code. In addition, we propose a novel strategy called chain-of-specification (CoS) prompting, which improves diversity by first having the LLM generate a specification describing one instance of the structural features and then prompting the LLM to generate text that satisfies those features; notably, our strategy works with blackbox LLMs. In our experiments, we show that for structural diversity in the poetry and code domains, CoS significantly improves diversity compared with several baselines.|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang 
et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks vary in their strengths, excelling at some tasks while underperforming at others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module on top of existing SWE agent frameworks, managing agent collectives for enhanced problem solving. Experimental results show that a DEI-guided committee of agents surpasses the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents with a maximum individual resolve rate of 27.3% on SWE-Bench Lite achieves a 34.3% resolve rate with DEI, a 25% improvement that beats many closed-source solutions. Our best-performing group reaches a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to research on collaborative AI systems and demonstrate their potential to solve complex software engineering challenges.|\n", "2408.07055": "|**2024-08-13**|**LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs**|Yushi Bai et.al.|[2408.07055](http://arxiv.org/abs/2408.07055)|**[link](https://github.com/thudm/longwriter)**|**Current large language models (LLMs) can process inputs of up to 100,000 tokens, yet struggle to generate outputs longer than 2,000 words. Through controlled experiments, we find that a model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other words, the output limitation stems from the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling existing LLMs to generate coherent outputs exceeding 20,000 words. 
Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT examples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B-parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even larger proprietary models. Overall, our work shows that existing long-context LLMs already have the potential for a larger output window: all that is needed to unlock this capability is data with extended outputs during model alignment. Our code and models are available at https://github.com/THUDM/LongWriter.**|\n", "2408.07004": "|**2024-08-13**|**Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models**|Chun Jie Chong 
et.al.|[2408.07004](http://arxiv.org/abs/2408.07004)|null|Web-based large language model (LLM) services have been widely adopted and have become an integral part of our Internet experience. Third-party plugins extend the functionality of LLMs by providing access to real-world data and services. However, the privacy consequences of these services and their third-party plugins are not well understood. Sensitive prompt data are stored, processed, and shared by cloud-based LLM providers and third-party plugins. In this paper, we propose Casper, a prompt sanitization technique that protects user privacy by detecting and removing sensitive information from user inputs before they are sent to LLM services. Casper runs entirely on the user's device as a browser extension and requires no changes to the online LLM services. At the core of Casper is a three-layered sanitization mechanism consisting of a rule-based filter, a machine learning (ML) named entity recognizer, and a browser-based local LLM topic identifier. We evaluate Casper on a dataset of 4000 synthesized prompts and show that it effectively filters out personally identifiable information (PII) and privacy-sensitive topics with high accuracy, at 98.5% and 89.9%, respectively.|\n", "2408.06993": "|**2024-08-13**|**LLMs can Schedule**|Henrik Abgaryan et.al.|[2408.06993](http://arxiv.org/abs/2408.06993)|**[link](https://github.com/starjob42/datasetjsp)**|**The job shop scheduling problem (JSSP) remains a significant challenge in optimizing production processes. It involves efficiently allocating jobs to a limited number of machines while minimizing factors such as total processing time or job delays. While recent advances in artificial intelligence have produced promising solutions, such as reinforcement learning and graph neural networks, this paper explores the potential of large language models (LLMs) for JSSP. We introduce the first 120k dataset designed specifically for training LLMs on JSSP. Surprisingly, our findings show that LLM-based scheduling can achieve performance comparable to other neural approaches. Furthermore, we propose a sampling method that improves the effectiveness of LLMs in solving JSSP.**|\n", "2408.06941": "|**2024-08-13**|**OpenResearcher: Unleashing AI for Accelerated Scientific Research**|Yuxiang Zheng 
et.al.|[2408.06941](http://arxiv.org/abs/2408.06941)|**[link](https://github.com/gair-nlp/openresearcher)**|**The rapid growth of scientific literature poses significant challenges for researchers trying to stay up to date with the latest advances in their fields and to explore new areas. We introduce OpenResearcher, an innovative platform that uses artificial intelligence techniques to accelerate the research process by answering diverse questions from researchers. OpenResearcher is built on Retrieval-Augmented Generation (RAG) and integrates large language models (LLMs) with up-to-date, domain-specific knowledge. In addition, we develop various tools that allow OpenResearcher to understand researchers' queries, search the scientific literature, filter the retrieved information, provide accurate and comprehensive answers, and self-refine those answers. OpenResearcher uses these tools flexibly to balance efficiency and effectiveness. As a result, OpenResearcher helps researchers save time and increases their potential to discover new insights and drive scientific breakthroughs. Demo, video, and code are available at: https://github.com/GAIR-NLP/OpenResearcher.**|\n", "2408.06929": "|**2024-08-13**|**Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas**|Louis Kwok 
et.al.|[2408.06929](http://arxiv.org/abs/2408.06929)|**[link](https://github.com/louiskwoklf/llms-cultural-adaptability)**|The success of large language models (LLMs) in multicultural environments depends on their ability to understand users' diverse cultural backgrounds. We measure this ability by having an LLM simulate human personas representing various nationalities in a questionnaire-style psychological experiment. Specifically, we use GPT-3.5 to simulate the reactions of 7,286 participants from 15 countries to persuasive news articles, and compare the results with a dataset of real participants who share the same demographic traits. Our analysis shows that explicitly specifying a person's country of residence improves GPT-3.5's alignment with their responses. In contrast, prompting in the person's native language introduces shifts that significantly reduce overall alignment, with some languages impairing performance in particular. These findings suggest that while directly providing nationality information enhances the model's cultural adaptability, native-language prompting does not reliably improve simulation fidelity and can reduce the model's effectiveness.|\n", "2408.06904": "|**2024-08-13**|**Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives**|Zhihu Wang 
et.al.|[2408.06904](http://arxiv.org/abs/2408.06904)|null|As large language models (LLMs) continue to scale, their improved performance is often still insufficient for solving domain-specific tasks. Systematically analyzing these failures and effectively improving performance remains a significant challenge. This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from the perspectives of capability, skill, and knowledge, guided by the principles of Bloom's Taxonomy and Knowledge Space Theory. The Re-TASK framework provides a systematic methodology for deepening our understanding, evaluation, and enhancement of LLMs on domain-specific tasks. It explores the interplay among an LLM's capabilities, the knowledge it processes, and the skills it applies, clarifying how these elements are interconnected and how they affect task performance. 
Applying the Re-TASK framework, we find that many failures on domain-specific tasks can be attributed mainly to insufficient knowledge or inadequate skill adaptation. Based on this insight, we propose structured strategies for enhancing LLMs through targeted knowledge injection and skill adaptation. Specifically, we identify the key capability items associated with a task and use a carefully designed prompting strategy to improve task performance, thereby reducing the need for extensive fine-tuning. Alternatively, we fine-tune the LLM with capability-specific instructions, which further validates the framework. Experimental results confirm the framework's effectiveness, showing substantial improvements in both the performance and the applicability of LLMs.|\n", "2408.06874": "|**2024-08-13**|**Leveraging Language Models for Emotion and Behavior Analysis in Education**|Kaito Tanaka 
et.al.|[2408.06874](http://arxiv.org/abs/2408.06874)|null|Analyzing students' emotions and behaviors is crucial for improving learning outcomes and personalizing educational experiences. Traditional methods often rely on intrusive collection of visual and physiological data, which raises privacy concerns and limits scalability. This paper proposes a novel method that uses large language models (LLMs) and prompt engineering to analyze students' textual data. Our approach uses tailored prompts to guide LLMs in detecting emotional and engagement states, providing a non-intrusive and scalable solution. We conducted experiments with Qwen, ChatGPT, Claude2, and GPT-4, comparing our method against baseline models and chain-of-thought (CoT) prompting. The results show that our method significantly outperforms the baselines in both accuracy and contextual understanding. This study highlights the potential of LLMs combined with prompt engineering to provide practical and effective tools for emotion and behavior analysis in education.|\n", "2408.06854": "|**2024-08-13**|**LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models**|Jia-Chen Zhang 
et.al.|[2408.06854](http://arxiv.org/abs/2408.06854)|null|Fine-tuning large language models (LLMs) with high parameter efficiency for downstream tasks has become a new research direction. Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters during fine-tuning. Although it performs well, updating parameters at a single scale may not be the optimal choice for complex downstream tasks. In this paper, we extend LoRA to multiple scales, an approach we call LoRA$^2$. First, drawing on orthogonal projection theory, we train a set of LoRAs in two mutually orthogonal planes. Then, we improve the importance score algorithm, which reduces parameter sensitivity score calculations by approximately 98.5%. Pruning the singular values with lower importance scores improves adaptability to various downstream tasks. 
We conduct extensive experiments on two widely used pre-trained models to validate the effectiveness of LoRA$^2$. The results show that it reduces the number of trainable parameters to just 0.72% of that of full fine-tuning while still delivering impressive performance. Even when the parameters are further reduced to 0.17M, its results remain comparable to a baseline with 8 times more parameters. Our code is available here:|\n", "2408.06849": "|**2024-08-13**|**Causal Agent based on Large Language Model**|Kairong Han et.al.|[2408.06849](http://arxiv.org/abs/2408.06849)|**[link](https://github.com/kairong-han/causal_agent)**|**Large language models (LLMs) have achieved significant success across various domains. However, the inherent complexity of causal problems and causal theory makes them difficult to describe accurately in natural language, which hinders LLMs from understanding and using them effectively. Causal methods are not easily conveyed through natural language, which limits how accurately LLMs can apply them. In addition, causal datasets are typically tabular, whereas LLMs excel at handling natural language data, and this structural mismatch impedes effective reasoning over tabular data. This lack of causal reasoning capability limits the development of LLMs. To address these challenges, we equip the LLM with causal tools and place it within an agent framework, named the Causal Agent. The agent comprises tools, memory, and reasoning modules. In the tools module, the causal agent applies causal methods by aligning tabular data with natural language. In the reasoning module, the causal agent employs the ReAct framework to reason over multiple iterations with these tools. In the memory module, the causal agent maintains a dictionary instance whose keys are unique names and whose values are causal graphs. To verify the causal ability of the causal agent, we establish a benchmark consisting of four levels of causal problems: variable level, edge level, causal graph level, and causal effect level. We use ChatGPT-3.5 to generate a test dataset of 1,300 questions covering these four levels and test the causal agent on it. Our method proves highly effective on the four levels of causal problems, with accuracy above 80% on each. For further insights and implementation details, our code is available in the GitHub repository https://github.com/Kairong-Han/Causal_Agent.**|\n", "2408.07702": "|**2024-08-14**|**The 
Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models**|Karime Maamari et.al.|[2408.07702](http://arxiv.org/abs/2408.07702)|null|Schema linking is a crucial step in Text-to-SQL pipelines, which translate natural language queries into SQL. The goal of schema linking is to retrieve relevant tables and columns (signal) while disregarding irrelevant ones (noise). However, imperfect schema linking can often exclude essential columns needed for accurate query generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at identifying relevant schema elements during generation, without the need for explicit schema linking. This allows Text-to-SQL pipelines to bypass schema linking entirely and instead pass the full database schema to the LLM, eliminating the risk of excluding necessary information. Furthermore, as alternatives to schema linking, we propose techniques that improve Text-to-SQL accuracy without compromising on essential schema information. Our approach achieves 71.83% execution accuracy on the BIRD benchmark, ranking first at the time of submission.|\n", "2408.07666": "|**2024-08-15**|**Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities**|Enneng Yang et.al.|[2408.07666](http://arxiv.org/abs/2408.07666)|**[link](https://github.com/ennengyang/awesome-model-merging-methods-theories-applications)**|**Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. 
This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at \\url{https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications}.**|\n", "2408.07665": "|**2024-08-14**|**Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models**|Yi-Cheng Lin et.al.|[2408.07665](http://arxiv.org/abs/2408.07665)|**[link](https://github.com/dlion168/spoken_stereoset)**|Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address these biases. This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in SLLMs. By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. Our experiments reveal significant insights into their performance and bias levels. 
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.|\n", "2408.07663": "|**2024-08-14**|**Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions**|Quan Liu et.al.|[2408.07663](http://arxiv.org/abs/2408.07663)|**[link](https://github.com/gigabaozi/aed)**|**Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root causes of jailbreak issues. We first define the Competitive Index to quantify alignment failures and utilize feedback from self-evaluation to compute post-alignment logits. Then, AED adaptively combines AED and post-alignment logits with the original logits to obtain harmless and helpful distributions. Consequently, our method enhances safety alignment while maintaining helpfulness. We conduct experiments across five models and four common jailbreaks, with the results validating the effectiveness of our approach. Code is available at https://github.com/GIGABaozi/AED.git.**|\n", "2408.07611": "|**2024-08-14**|**WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs**|Weijian Xie et.al.|[2408.07611](http://arxiv.org/abs/2408.07611)|null|Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). 
However, LLMs are prone to produce factually incorrect information and often produce \"phantom\" content that undermines their reliability, which poses a serious challenge for their deployment in real-world scenarios. Enhancing LLMs by combining external databases and information retrieval mechanisms is an effective path. To address the above challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a \"Retrieval-Augmented Generation (RAG)\" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to satisfy a variety of queries and domains, thereby improving performance on factual information and complex reasoning tasks by employing multi-stage web page retrieval techniques using both sparse and dense retrieval methods. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process. Finally, we also integrate a self-assessment mechanism for the LLM to evaluate the trustworthiness of the answers it generates. Our approach proves its outstanding effectiveness in a wide range of offline experiments and online submissions.|\n", "2408.07583": "|**2024-08-14**|**Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey**|Hamza Kheddar et.al.|[2408.07583](http://arxiv.org/abs/2408.07583)|null|With significant advancements in Transformers LLMs, NLP has extended its reach into many research fields due to its enhanced capabilities in text generation and user interaction. One field benefiting greatly from these advancements is cybersecurity. 
In cybersecurity, many parameters that need to be protected and exchanged between senders and receivers are in the form of text and tabular data, making NLP a valuable tool in enhancing the security measures of communication protocols. This survey paper provides a comprehensive analysis of the utilization of Transformers and LLMs in cyber-threat detection systems. The methodology of paper selection and bibliometric analysis is outlined to establish a rigorous framework for evaluating existing research. The fundamentals of Transformers are discussed, including background information on various cyber-attacks and datasets commonly used in this field. The survey explores the application of Transformers in IDSs, focusing on different architectures such as Attention-based models, LLMs like BERT and GPT, CNN/LSTM-Transformer hybrids, emerging approaches like ViTs, among others. Furthermore, it explores the diverse environments and applications where Transformers and LLMs-based IDS have been implemented, including computer networks, IoT devices, critical infrastructure protection, cloud computing, SDN, as well as in autonomous vehicles. The paper also addresses research challenges and future directions in this area, identifying key issues such as interpretability, scalability, and adaptability to evolving threats, and more. 
Finally, the conclusion summarizes the findings and highlights the significance of Transformers and LLMs in enhancing cyber-threat detection capabilities, while also outlining potential avenues for further research and development.|\n", "2408.07543": "|**2024-08-15**|**MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark**|Minxuan Zhou et.al.|[2408.07543](http://arxiv.org/abs/2408.07543)|**[link](https://github.com/PKU-Baichuan-MLSystemLab/MathScape)**|With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchmarks have not sufficiently integrated visual and textual information. To address this gap, we proposed MathScape, a new benchmark that emphasizes the understanding and application of combined visual and textual information. MathScape is designed to evaluate photo-based math problem scenarios, assessing the theoretical understanding and application ability of MLLMs through a categorical hierarchical approach. We conduct a multi-dimensional evaluation on 11 advanced MLLMs, revealing that our benchmark is challenging even for the most sophisticated models. By analyzing the evaluation results, we identify the limitations of MLLMs, offering valuable insights for enhancing model performance.|\n", "2408.07537": "|**2024-08-15**|**Usefulness of data flow diagrams and large language models for security threat validation: a registered report**|Winnie Bahati Mbaka et.al.|[2408.07537](http://arxiv.org/abs/2408.07537)|null|The arrival of recent cybersecurity standards has raised the bar for security assessments in organizations, but existing techniques don't always scale well. 
Threat analysis and risk assessment are used to identify security threats for new or refactored systems. Still, there is a lack of definition-of-done, so identified threats have to be validated which slows down the analysis. Existing literature has focused on the overall performance of threat analysis, but no previous work has investigated how deep must the analysts dig into the material before they can effectively validate the identified security threats. We propose a controlled experiment with practitioners to investigate whether some analysis material (like LLM-generated advice) is better than none and whether more material (the system's data flow diagram and LLM-generated advice) is better than some material. In addition, we present key findings from running a pilot with 41 MSc students, which are used to improve the study design. Finally, we also provide an initial replication package, including experimental material and data analysis scripts and a plan to extend it to include new materials based on the final data collection campaign with practitioners (e.g., pre-screening questions).|\n", "2408.07531": "|**2024-08-14**|**Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments**|Seungjun Han et.al.|[2408.07531](http://arxiv.org/abs/2408.07531)|null|Emergency department (ED) overcrowding and the complexity of rapid decision-making in critical care settings pose significant challenges to healthcare systems worldwide. While clinical decision support systems (CDSS) have shown promise, the integration of large language models (LLMs) offers new possibilities for enhancing triage accuracy and clinical decision-making. This study presents an LLM-driven CDSS designed to assist ED physicians and nurses in patient triage, treatment planning, and overall emergency care management. 
We developed a multi-agent CDSS utilizing Llama-3-70b as the base LLM, orchestrated by CrewAI and Langchain. The system comprises four AI agents emulating key ED roles: Triage Nurse, Emergency Physician, Pharmacist, and ED Coordinator. It incorporates the Korean Triage and Acuity Scale (KTAS) for triage assessment and integrates with the RxNorm API for medication management. The model was evaluated using the Asclepius dataset, with performance assessed by a clinical emergency medicine specialist. The CDSS demonstrated high accuracy in triage decision-making compared to the baseline of a single-agent system. Furthermore, the system exhibited strong performance in critical areas, including primary diagnosis, critical findings identification, disposition decision-making, treatment planning, and resource allocation. Our multi-agent CDSS demonstrates significant potential for supporting comprehensive emergency care management. By leveraging state-of-the-art AI technologies, this system offers a scalable and adaptable tool that could enhance emergency medical care delivery, potentially alleviating ED overcrowding and improving patient outcomes. This work contributes to the growing field of AI applications in emergency medicine and offers a promising direction for future research and clinical implementation.|\n", "2408.07505": "|**2024-08-14**|**Large Language Models Know What Makes Exemplary Contexts**|Quanyu Long et.al.|[2408.07505](http://arxiv.org/abs/2408.07505)|null|In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs). By instructing LLMs using few-shot demonstrative examples, ICL enables them to perform a wide range of tasks without needing to update millions of parameters. 
This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts; self-rank candidates with different demonstration compositions; self-optimize the demonstration selection and ordering through reinforcement learning. Specifically, our method designs a parameter-efficient retrieval head that generates the optimized demonstration after training with rewards from LLM's own preference. Experimental results validate the proposed method's effectiveness in enhancing ICL performance. Additionally, our approach effectively identifies and selects the most representative examples for the current task, and includes more diversity in retrieval.|\n", "2408.08313": "|**2024-08-15**|**Can Large Language Models Understand Symbolic Graphics Programs?**|Zeju Qiu et.al.|[2408.08313](http://arxiv.org/abs/2408.08313)|null|Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown exciting promise towards program synthesis, but do they understand symbolic graphics programs? Unlike conventional programs, symbolic graphics programs can be translated to graphics content. Here, we characterize an LLM's understanding of symbolic programs in terms of their ability to answer questions related to the graphics content. This task is challenging as the questions are difficult to answer from the symbolic programs alone -- yet, they would be easy to answer from the corresponding graphics content as we verify through a human experiment. 
To understand symbolic programs, LLMs may need to possess the ability to imagine how the corresponding graphics content would look without directly accessing the rendered visual content. We use this task to evaluate LLMs by creating a large benchmark for the semantic understanding of symbolic graphics programs. This benchmark is built via program-graphics correspondence, hence requiring minimal human effort. We evaluate current LLMs on our benchmark to elucidate a preliminary assessment of their ability to reason about visual scenes from programs. We find that this task distinguishes existing LLMs and models considered good at reasoning perform better. Lastly, we introduce Symbolic Instruction Tuning (SIT) to improve this ability. Specifically, we query GPT4-o with questions and images generated by symbolic programs. Such data are then used to finetune an LLM. We also find that SIT data can improve the general instruction following ability of LLMs.|\n", "2408.08310": "|**2024-08-15**|**ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws**|Ruihang Li et.al.|[2408.08310](http://arxiv.org/abs/2408.08310)|null|High-quality data is crucial for the pre-training performance of large language models. Unfortunately, existing quality filtering methods rely on a known high-quality dataset as reference, which can introduce potential bias and compromise diversity. In this paper, we propose ScalingFilter, a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data, thereby eliminating the influence of the reference dataset in the filtering process. A theoretical analysis shows that ScalingFilter is equivalent to an inverse utilization of scaling laws. Through training models with 1.3B parameters on the same data source processed by various quality filters, we find ScalingFilter can improve zero-shot performance of pre-trained models in downstream tasks. 
To assess the bias introduced by quality filtering, we introduce semantic diversity, a metric of utilizing text embedding models for semantic representations. Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.|\n", "2408.08302": "|**2024-08-15**|**Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors**|Usman Syed et.al.|[2408.08302](http://arxiv.org/abs/2408.08302)|null|In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems. This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced LLMs, especially their accuracy, consistency, and reasoning behaviors, in solving transportation engineering problems. Our comprehensive analysis uncovers the unique strengths and limitations of each LLM, e.g. our analysis shows the impressive accuracy and some unexpected inconsistent behaviors of Claude 3.5 Sonnet in solving TransportBench problems. Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.|\n", "2408.08300": "|**2024-08-15**|**HELP: Hierarchical Embeddings-based Log Parsing**|Andy Xu et.al.|[2408.08300](http://arxiv.org/abs/2408.08300)|null|Logs are a first-hand source of information for software maintenance and failure diagnosis. 
Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heuristics-based parsers require handcrafted features and domain knowledge, which are difficult to generalize at scale. Second, existing large language model-based parsers rely on periodic offline processing, limiting their effectiveness in real-time use cases. Third, existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies. To address these challenges, we propose HELP, a Hierarchical Embeddings-based Log Parser. HELP is the first online semantic-based parser to leverage LLMs for performant and cost-effective log parsing. We achieve this through a novel hierarchical embeddings module, which fine-tunes a text embedding model to cluster logs before parsing, reducing querying costs by multiple orders of magnitude. To combat log drift, we also develop an iterative rebalancing module, which periodically updates existing log groupings. We evaluate HELP extensively on 14 public large-scale datasets, showing that HELP achieves significantly higher F1-weighted grouping and parsing accuracy than current state-of-the-art online log parsers. We also implement HELP into Iudex's production observability platform, confirming HELP's practicality in a production environment. 
Our results show that HELP is effective and efficient for high-throughput real-world log parsing.|\n", "2408.08291": "|**2024-08-15**|**The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community**|Shachar Don-Yehiya et.al.|[2408.08291](http://arxiv.org/abs/2408.08291)|null|Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and thus are a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models, using it internally to improve their own models, the open source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Where few platforms share their chats, the ShareLM plugin adds this functionality, thus allowing users to share conversations from most platforms. The plugin allows the user to rate their conversations, both at the conversation and the response levels, and delete conversations they prefer to keep private before they ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.|\n", "2408.08282": "|**2024-08-15**|**Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model**|Jin Wang et.al.|[2408.08282](http://arxiv.org/abs/2408.08282)|null|Enabling humanoid robots to autonomously perform loco-manipulation in unstructured environments is crucial and highly challenging for achieving embodied intelligence. This involves robots being able to plan their actions and behaviors in long-horizon tasks while using multi-modality to perceive deviations between task execution and high-level planning. 
Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehension and processing of semantic information through robot control tasks, as well as the usability of analytical judgment and decision-making for multi-modal inputs. To leverage the power of LLMs towards humanoid loco-manipulation, we propose a novel language-model based framework that enables robots to autonomously plan behaviors and low-level execution under given textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework in grounding LLMs, we created the robot 'action' and 'sensing' behavior library for task planning, and conducted mobile manipulation tasks and experiments in both simulated and real environments using the CENTAURO robot, and verified the effectiveness and application of this approach in robotic tasks with autonomous behavioral planning.|\n", "2408.08274": "|**2024-08-15**|**BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts**|Qizhen Zhang et.al.|[2408.08274](http://arxiv.org/abs/2408.08274)|null|The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed-forward network (FFN) to initialize the MoE's experts while merging other parameters. However, this method limits the reuse of dense model parameters to only the FFN layers, thereby constraining the advantages when \"upcycling\" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. 
BAM makes full use of specialized dense models by not only using their FFN to initialize the MoE layers but also leveraging experts' attention parameters fully by initializing them into a soft-variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from dense models including all attention parameters for the best model performance; and 2) sharing key and value parameters across all experts to facilitate better inference efficiency. To further improve efficiency, we adopt a parallel attention transformer architecture to MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.|\n", "2408.08231": "|**2024-08-15**|**DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System**|Xihong Yang et.al.|[2408.08231](http://arxiv.org/abs/2408.08231)|null|Benefiting from strong reasoning capabilities, large language models (LLMs) have demonstrated remarkable performance in recommender systems. Various efforts have been made to distill knowledge from LLMs to enhance collaborative models, employing techniques like contrastive learning for representation alignment. In this work, we prove that directly aligning the representations of LLMs and collaborative models is sub-optimal for enhancing downstream recommendation task performance, based on the information theorem. Consequently, the challenge of effectively aligning semantic representations between collaborative models and LLMs remains unresolved. Inspired by this viewpoint, we propose a novel plug-and-play alignment framework for LLMs and collaborative models. 
Specifically, we first disentangle the latent representations of both LLMs and collaborative models into specific and shared components via projection layers and representation regularization. Subsequently, we perform both global and local structure alignment on the shared representations to facilitate knowledge transfer. Additionally, we theoretically prove that the specific and shared representations contain more pertinent and less irrelevant information, which can enhance the effectiveness of downstream recommendation tasks. Extensive experimental results on benchmark datasets demonstrate that our method is superior to existing state-of-the-art algorithms.|\n", "2408.08217": "|**2024-08-15**|**RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science**|David Farr et.al.|[2408.08217](http://arxiv.org/abs/2408.08217)|null|Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.|\n", "2408.08210": "|**2024-08-15**|**Does Reasoning Emerge? 
Examining the Probabilities of Causation in Large Language Models**|Javier Gonz\u00e1lez et.al.|[2408.08210](http://arxiv.org/abs/2408.08210)|null|Recent advances in AI have been significantly driven by the capabilities of large language models (LLMs) to solve complex problems in ways that resemble human thinking. However, there is an ongoing debate about the extent to which LLMs are capable of actual reasoning. Central to this debate are two key probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms using these probabilistic measures. By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which it is possible to compute suitable approximations of PN and PS. 
Our research marks an important step towards gaining a deeper understanding of when LLMs are capable of reasoning, as illustrated by a series of math examples.|\n", "2408.08869": "|**2024-08-16**|**PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars**|Sumanth Prabhu et.al.|[2408.08869](http://arxiv.org/abs/2408.08869)|null|Self-ensembling techniques such as self-consistency, which rely on an accurate answer-extraction process, have delivered significant accuracy gains for large language models (LLMs). However, these techniques incur a higher inference cost when aggregating multiple outputs, since they generate considerably more output tokens than greedy decoding. Research has shown that the free-form text outputs produced by self-consistency can be reliably aggregated by an LLM to produce the final output. In addition, recent advances in LLM inference show that using diverse exemplars in prompts can induce diversity in LLM outputs. These proven techniques can easily be extended to self-ensembling approaches to improve overall text-generation performance. 
This paper introduces PEDAL (exemplar-diversity-based LLM aggregation), a hybrid self-ensembling approach that combines the strengths of diverse-exemplar prompting and LLM-based aggregation to improve performance. Experiments on the publicly available SVAMP and ARC datasets show that PEDAL achieves better accuracy than greedy-decoding-based strategies while incurring a lower inference cost than self-consistency-based approaches.|\n", "2408.08862": "|**2024-08-16**|**Visual Agents as Fast and Slow Thinkers**|Guangyan Sun et.al.|[2408.08862](http://arxiv.org/abs/2408.08862)|**[link](https://github.com/guangyans/sys2-llava)**|Achieving human-level intelligence requires refining cognitive System 1 and System 2 thinking. Current AI, especially AI built on large language models, exhibits human-like traits but falls short of genuine cognition. When moving from structured benchmarks to real-world scenarios, visual agents struggle and often produce answers that are both inaccurate and overconfident. To address this, we introduce FaST (Fast and Slow Thinking), which incorporates a fast and slow thinking mechanism into visual agents. 
FaST uses a switch adapter to dynamically select between System 1 and System 2 modes, tailoring the problem-solving approach to task complexity. When facing uncertain or unseen objects, it responds flexibly by adjusting the model's confidence and integrating new contextual data. We advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which enable FaST to emulate human cognitive processes in visual intelligence. Experiments show that FaST reaches 80.8% accuracy on visual question answering (VQA^{v2}) and a GIoU score of 48.7% on reasoning segmentation (ReasonSeg), demonstrating its superior performance. Extensive testing validates the effectiveness and robustness of FaST's core components, showing its potential to advance cognitive visual agents in AI systems.|\n", "2408.08849": "|**2024-08-16**|**ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis**|Yubao Zhao 
et.al.|[2408.08849](http://arxiv.org/abs/2408.08849)|null|\u5728\u533b\u7597\u8f85\u52a9\u9886\u57df\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u6210\u529f\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f7f\u5f97\u60a3\u8005\u80fd\u591f\u5229\u7528\u751f\u7406\u4fe1\u53f7\u6570\u636e\u8fdb\u884c\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u901a\u7528\u7684MLLMs\u5728\u5fc3\u810f\u75c5\u8bca\u65ad\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u5728ECG\u6570\u636e\u89e3\u6790\u4e0e\u957f\u6587\u672c\u533b\u5b66\u62a5\u544a\u751f\u6210\u7684\u6574\u5408\u4e0a\uff0c\u4e3b\u8981\u539f\u56e0\u662fECG\u6570\u636e\u89e3\u6790\u7684\u590d\u6742\u6027\u4ee5\u53ca\u6587\u672c\u4e0eECG\u4fe1\u53f7\u6a21\u6001\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6a21\u578b\u5728\u957f\u6587\u672c\u751f\u6210\u65f6\u5f80\u5f80\u5b58\u5728\u4e25\u91cd\u7684\u7a33\u5b9a\u6027\u95ee\u9898\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u7f3a\u4e4f\u4e0e\u7528\u6237\u67e5\u8be2\u7d27\u5bc6\u76f8\u5173\u7684\u7cbe\u786e\u77e5\u8bc6\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aECG-Chat\u7684\u591a\u4efb\u52a1MLLM\uff0c\u4e13\u6ce8\u4e8eECG\u533b\u5b66\u62a5\u544a\u751f\u6210\uff0c\u5e76\u63d0\u4f9b\u57fa\u4e8e\u5fc3\u810f\u75c5\u5b66\u77e5\u8bc6\u7684\u8de8\u6a21\u6001\u5bf9\u8bdd\u80fd\u529b\u3002\u6211\u4eec\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\uff0c\u5c06ECG\u6ce2\u5f62\u6570\u636e\u4e0e\u6587\u672c\u62a5\u544a\u7ed3\u5408\uff0c\u4ee5\u7cbe\u7ec6\u7684\u65b9\u5f0f\u5bf9\u9f50ECG\u7279\u5f81\u4e0e\u62a5\u544a\u5185\u5bb9\u3002\u8fd9\u79cd\u65b9\u6cd5\u8fd8\u4ea7\u751f\u4e86\u4e00\u4e2a\u5728\u96f6\u6837\u672c\u62a5\u544a\u68c0\u7d22\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u7684ECG\u7f16\u7801\u5668\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u6269\u5c55\u73b0\u6709\u6570\u636e\u96c6\uff0c\u6784\u5efa\u4e86\u5305\u542b19K\u4e2aECG\u8bca\u65ad\u6570\u636e\u96c6\u548c25K\u4e2a\u591a\u8f6e\u5bf9\u8bdd\u6570\u636e\u96c6\u7528\u4e8e\u8bad\u7ec3\u548c\u5fae\u8c03ECG-Chat\uff0c\u4ece\u800c\u63d0\u4f9b\u4e13\u4e1a\u7684\u8bca\u65ad\u548c\u5bf9\u8bdd\u80fd\u529b\u3002\u6b64\u5916\uff0cECG-Chat\u53ef\u4ee5\u901a\u8fc7\u81ea\u52a8\u5316LaTeX\u751f\u6210\u7ba1\u9053\u6765\u751f\u6210\u5168\u9762\u7684ECG\u5206\u6790\u62a5\u544a\u3002\u6211\u4eec\u4e3aECG\u62a5\u544a\u751f\u6210\u4efb\u52a1\u5efa\u7acb\u4e86\u57fa\u51c6\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u7ebf\u4e0a\u6d4b\u8bd5\u4e86\u6211\u4eec\u7684\u6a21\u578b\u3002ECG-Chat\u5728\u5206\u7c7b\u3001\u68c0\u7d22\u3001\u591a\u6a21\u6001\u5bf9\u8bdd\u548c\u533b\u5b66\u62a5\u544a\u751f\u6210\u4efb\u52a1\u4e2d\u5747\u53d6\u5f97\u4e86\u6700\u4f73\u6027\u80fd\u3002\u6211\u4eec\u7684\u62a5\u544a\u6a21\u677f\u8bbe\u8ba1\u4e5f\u5f97\u5230\u4e86\u533b\u7597\u4e13\u4e1a\u4eba\u5458\u7684\u4e00\u81f4\u8ba4\u53ef\u3002|\n", "2408.08848": "|**2024-08-16**|**PsychoLex: Unveiling the Psychological Mind of Large Language Models**|Mohammad Amin Abbasi 
et.al.|[2408.08848](http://arxiv.org/abs/2408.08848)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5fc3\u7406\u5b66\u4e0e\u4eba\u5de5\u667a\u80fd\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5f00\u53d1\u548c\u8bc4\u4f30\u4e13\u7528\u4e8e\u5fc3\u7406\u4efb\u52a1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u6211\u4eec\u5f15\u5165\u4e86PsychoLex\u5957\u4ef6\uff0c\u65e8\u5728\u589e\u5f3aLLMs\u5728\u6ce2\u65af\u8bed\u548c\u82f1\u8bed\u4e2d\u7684\u5fc3\u7406\u4efb\u52a1\u5904\u7406\u80fd\u529b\u3002\u4e3b\u8981\u8d21\u732e\u5305\u62ecPsychoLexQA\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u6559\u5b66\u5185\u5bb9\u7684\u521b\u5efa\uff0c\u4ee5\u53caPsychoLexEval\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5bf9LLMs\u5728\u590d\u6742\u5fc3\u7406\u60c5\u666f\u4e0b\u7684\u4e25\u683c\u8bc4\u4f30\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4ecb\u7ecd\u4e86PsychoLexLLaMA\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u7279\u522b\u4f18\u5316\u4ee5\u9002\u7528\u4e8e\u5fc3\u7406\u5e94\u7528\uff0c\u5176\u6027\u80fd\u660e\u663e\u4f18\u4e8e\u901a\u7528\u6a21\u578b\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5b9a\u5236LLMs\u5728\u63a8\u8fdb\u5fc3\u7406\u7814\u7a76\u548c\u5e94\u7528\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u540c\u65f6\u4e5f\u6307\u51fa\u4e86\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u9886\u57df\u3002\u8fd9\u9879\u7814\u7a76\u4e3a\u5c06LLMs\u878d\u5165\u7279\u5b9a\u7684\u5fc3\u7406\u5b66\u9886\u57df\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5bf9\u672a\u6765AI\u9a71\u52a8\u7684\u5fc3\u7406\u5b9e\u8df5\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2408.08841": "|**2024-08-16**|**FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats**|Xuanliang Zhang et.al.|[2408.08841](http://arxiv.org/abs/2408.08841)|**[link](https://github.com/zhxlia/FLEXTAF)**|**
\u8868\u683c\u63a8\u7406\u4efb\u52a1\u65e8\u5728\u6839\u636e\u7ed9\u5b9a\u7684\u8868\u683c\u56de\u7b54\u95ee\u9898\u3002\u76ee\u524d\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u8868\u683c\u63a8\u7406\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u73b0\u6709\u7684\u5927\u591a\u6570\u65b9\u6cd5\u90fd\u91c7\u7528\u56fa\u5b9a\u7684\u8868\u683c\u683c\u5f0f\u6765\u8868\u793a\u8868\u683c\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u6027\u80fd\u3002\u9274\u4e8e\u6bcf\u4e2a\u5b9e\u4f8b\u9700\u8981\u4e0d\u540c\u7684\u80fd\u529b\uff0c\u800c\u6a21\u578b\u5177\u6709\u4e0d\u540c\u7684\u80fd\u529b\uff0c\u6211\u4eec\u65ad\u8a00\u4e0d\u540c\u5b9e\u4f8b\u548c\u6a21\u578b\u9002\u7528\u4e8e\u4e0d\u540c\u7684\u8868\u683c\u683c\u5f0f\u3002\u901a\u8fc7\u5b9e\u9a8c\u7ed3\u679c\u7684\u5b9a\u91cf\u5206\u6790\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u4e00\u70b9\uff1a\u4f7f\u7528\u4e0d\u540c\u7684\u8868\u683c\u683c\u5f0f\uff0c\u4e0d\u540c\u5b9e\u4f8b\u548c\u6a21\u578b\u53ef\u4ee5\u83b7\u5f97\u4e0d\u540c\u7684\u6027\u80fd\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u589e\u5f3a\u8868\u683c\u63a8\u7406\u6027\u80fd\u7684\u65b9\u6cd5FLEXTAF-Single\u548cFLEXTAF-Vote\uff0c\u901a\u8fc7\u4f7f\u7528\u7075\u6d3b\u7684\u8868\u683c\u683c\u5f0f\u3002\u5177\u4f53\u6765\u8bf4\uff0c(i) FLEXTAF-Single\u8bad\u7ec3\u4e00\u4e2a\u5206\u7c7b\u5668\uff0c\u57fa\u4e8e\u5b9e\u4f8b\u548cLLM\u9884\u6d4b\u6700\u9002\u5408\u7684\u8868\u683c\u683c\u5f0f\u3002(ii) 
FLEXTAF-Vote\u5728\u4e0d\u540c\u683c\u5f0f\u4e4b\u95f4\u96c6\u6210\u7ed3\u679c\u3002\u6211\u4eec\u5728WikiTableQuestions\u548cTabFact\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u4e86\u663e\u8457\u7684\u6539\u8fdb\uff0c\u4e0e\u4f7f\u7528\u56fa\u5b9a\u8868\u683c\u683c\u5f0f\u5e76\u7ed3\u5408\u8d2a\u5a6a\u89e3\u7801\u548c\u81ea\u6211\u4e00\u81f4\u6027\u89e3\u7801\u8fbe\u5230\u7684\u6700\u4f73\u6027\u80fd\u76f8\u6bd4\uff0c\u5e73\u5747\u63d0\u9ad8\u4e862.3%\u548c4.8%\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2408.08811": "|**2024-08-16**|**Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors**|Felipe A. Csaszar et.al.|[2408.08811](http://arxiv.org/abs/2408.08811)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5982\u4f55\u5f71\u54cd\u4f01\u4e1a\u6218\u7565\u51b3\u7b56\u8fc7\u7a0b\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u4f8b\u5c55\u793a\u4e86AI\u5982\u4f55\u589e\u5f3a\u73b0\u6709\u6218\u7565\u51b3\u7b56\u5de5\u5177\uff0c\u5e76\u63d0\u4f9b\u4e86\u6765\u81ea\u9886\u5148\u52a0\u901f\u5668\u8ba1\u5212\u548c\u521b\u4e1a\u7ade\u8d5b\u7684\u5b9e\u8bc1\u8bc1\u636e\uff0c\u8bc1\u660e\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u548c\u8bc4\u4f30\u7b56\u7565\u65b9\u9762\u7684\u80fd\u529b\u4e0e\u4f01\u4e1a\u5bb6\u548c\u6295\u8d44\u8005\u76f8\u5f53\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5206\u6790\u4e86\u6218\u7565\u51b3\u7b56\u80cc\u540e\u7684\u5173\u952e\u8ba4\u77e5\u8fc7\u7a0b\u2014\u2014\u641c\u7d22\u3001\u8868\u793a\u548c\u805a\u5408\uff0c\u5e76\u63d0\u51faAI\u6709\u53ef\u80fd\u63d0\u5347\u6218\u7565\u5206\u6790\u7684\u901f\u5ea6\u3001\u8d28\u91cf\u548c\u89c4\u6a21\uff0c\u540c\u65f6\u8fd8\u80fd\u542f\u7528\u5982\u865a\u62df\u6218\u7565\u6a21\u62df\u7b49\u65b0\u65b9\u6cd5\u3002\u7136\u800c\uff0cAI\u5bf9\u4f01\u4e1a\u53d1\u5c55\u7684\u5f71\u54cd\u6700\u7ec8\u53d6\u51b3\u4e8e\u7ade\u4e89\u52a8\u6001\u4ee5\u53c
aAI\u80fd\u529b\u7684\u53d1\u5c55\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5c06AI\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u5e94\u7528\u4e0e\u4f01\u4e1a\u7ed3\u679c\u8054\u7cfb\u8d77\u6765\uff0c\u5e76\u8ba8\u8bba\u4e86AI\u5982\u4f55\u91cd\u5851\u7ade\u4e89\u4f18\u52bf\u7684\u6765\u6e90\u3002\u6700\u540e\uff0c\u6211\u4eec\u8003\u8651\u4e86AI\u5982\u4f55\u65e2\u652f\u6301\u53c8\u6311\u6218\u57fa\u4e8e\u7406\u8bba\u7684\u6218\u7565\u89c2\u7684\u6838\u5fc3\u539f\u5219\u3002\u6574\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63cf\u7ed8\u4e86\u4e00\u4e2aAI\u4e0e\u6218\u7565\u9886\u57df\u6b63\u5728\u5f62\u6210\u7684\u7814\u7a76\u524d\u6cbf\u3002|\n", "2408.08808": "|**2024-08-16**|**Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge**|Ravi Raju et.al.|[2408.08808](http://arxiv.org/abs/2408.08808)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u5b66\u4e60\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u7136\u800c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u591a\u6837\u884c\u4e3a\u3002\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u7684\u4ef7\u503c\u5728\u4e8e\u5b83\u80fd\u5426\u6e05\u6670\u533a\u5206\u4e0d\u540c\u80fd\u529b\u7ea7\u522b\u7684\u6a21\u578b\uff08\u53ef\u5206\u6027\uff09\u4ee5\u53ca\u4e0e\u4eba\u7c7b\u504f\u597d\u7684\u7d27\u5bc6\u5339\u914d\u5ea6\u3002\u5f53\u524d\u7684\u6846\u67b6\u5982Alpaca-Eval 2.0 LC \\cite{dubois2024lengthcontrolledalpacaevalsimpleway} \u548cArena-Hard v0.1 
\\cite{li2024crowdsourced}\u4e3b\u8981\u5173\u6ce8\u901a\u7528\u67e5\u8be2\uff0c\u5e76\u4e14\u7f3a\u4e4f\u8de8\u6cd5\u5f8b\u3001\u533b\u5b66\u7b49\u9886\u57df\u7684\u591a\u6837\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u7ba1\u9053\uff0c\u6765\u5b9a\u5236\u4e00\u7cfb\u5217\u591a\u5143\u5316\u7684\u3001\u9488\u5bf9LLM-as-a-Judge\u6846\u67b6\u7684\u9886\u57df\u7279\u5b9a\u8bc4\u4f30\u96c6\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86\u4eba\u5de5\u7b5b\u9009\u3001\u534a\u76d1\u7763\u5b66\u4e60\u751f\u6210\u805a\u7c7b\u4ee5\u53ca\u5206\u5c42\u62bd\u6837\uff0c\u786e\u4fdd\u5728\u5e7f\u6cdb\u9886\u57df\u548c\u8bed\u8a00\u4e2d\u90fd\u6709\u5747\u8861\u7684\u4ee3\u8868\u6027\u3002\u4ea7\u751f\u7684\u8bc4\u4f30\u96c6\u5305\u62ec1573\u4e2a\u6837\u672c\uff0c\u5206\u5e03\u572814\u4e2a\u7c7b\u522b\u4e2d\uff0c\u663e\u793a\u51fa\u9ad8\u53ef\u5206\u6027\uff0884%\uff09\u548c\u5bf9\u524d\u5341\u5927\u6a21\u578b\u7684\u6027\u80fd\u5dee\u5f02\uff0c\u540c\u65f6\u4e0eChatbot Arena\u7684\u5171\u8bc6\u5ea6\uff0884%\uff09\u548cSpearman\u76f8\u5173\u7cfb\u6570\uff080.915\uff09\u4e5f\u8868\u73b0\u51fa\u826f\u597d\u7684\u4e00\u81f4\u6027\u3002\u4e0eAlpacaEval 2.0 LC\u7684\u5171\u8bc6\u5ea6\u76f8\u6bd4\uff0c\u8fd9\u4e00\u503c\u9ad8\u51fa9%\uff0c\u4e0eArena 
Hard\u76f8\u6bd4\u5219\u9ad8\u51fa20%\uff0c\u800c\u4e0eSpearman\u7cfb\u6570\u76f8\u6bd4\u5219\u662f\u4e0b\u4e00\u4e2a\u6700\u4f73\u57fa\u51c6\u76840.7\u500d\uff0c\u8fd9\u8868\u660e\u6211\u4eec\u5728\u57fa\u51c6\u6d4b\u8bd5\u7684\u6709\u6548\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f00\u6e90\u7684\u8bc4\u4f30\u5de5\u5177\uff0c\u5141\u8bb8\u7528\u6237\u81ea\u5b9a\u4e49\u7c7b\u522b\u8fdb\u884c\u7cbe\u7ec6\u5206\u6790\uff0c\u4ece\u800c\u4e3a\u5b9e\u8df5\u8005\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u6d1e\u5bdf\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u589e\u5f3aLLM\u8bc4\u4f30\u65b9\u6cd5\u7684\u900f\u660e\u5ea6\u3001\u591a\u6837\u6027\u548c\u6709\u6548\u6027\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2408.08782": "|**2024-08-16**|**EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics**|Chenwei Wan et.al.|[2408.08782](http://arxiv.org/abs/2408.08782)|**[link](https://github.com/cw-wan/EmoDynamiX-v2)**|**\u8bbe\u8ba1\u80fd\u591f\u63d0\u4f9b\u6170\u85c9\u548c\u5efa\u8bae\u7684\u5177\u6709\u60c5\u611f\u667a\u80fd\u7684\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u4ee5\u5e2e\u52a9\u90a3\u4e9b\u7ecf\u5386\u538b\u529b\u7684\u4eba\u4eec\uff0c\u662f\u4e00\u4e2a\u6781\u5177\u5438\u5f15\u529b\u7684\u7814\u7a76\u9886\u57df\u3002\u8fc7\u53bb\u7684\u7814\u7a76\u5de5\u4f5c\u7740\u91cd\u4e8e\u6784\u5efa\u6a21\u5757\u5316\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u5e76\u5c06\u5176\u793e\u4f1a\u60c5\u611f\u7b56\u7565\u9884\u6d4b\u89c6\u4e3a\u8f85\u52a9\u4efb\u52a1\uff0c\u901a\u8fc7\u5b9a\u5236\u89e3\u7801\u5668\u751f\u6210\u6761\u4ef6\u5316\u7684\u54cd\u5e94\u3002\u6700\u8fd1\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u65b9\u9762\u7684\u53d1\u5c55\u4f7f\u5f97\u65e0\u9700\u660e\u786e\u7684\u793e\u4f1a\u60c5\u611f\u7b56\u7565\u9884\u6d4b\u6b65\u9aa4\u7684\u7aef\u5230\u7aef\u5bf9\u8bdd\u4ee3\u7406\u53d8\u5f97\u6d41\u884c\u8d77\u6765\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u8bed\u
8a00\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u56fa\u6709\u7684\u504f\u597d\u504f\u89c1\uff0c\u503e\u5411\u4e8e\u67d0\u4e9b\u793e\u4f1a\u60c5\u611f\u7b56\u7565\uff0c\u963b\u788d\u4e86\u63d0\u4f9b\u9ad8\u8d28\u91cf\u60c5\u611f\u652f\u6301\u7684\u80fd\u529b\u3002 \u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff1a\u5c06\u7b56\u7565\u9884\u6d4b\u4e0e\u8bed\u8a00\u751f\u6210\u5206\u79bb\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aEmoDynamiX\u7684\u65b0\u578b\u5bf9\u8bdd\u7b56\u7565\u9884\u6d4b\u5668\u3002\u8be5\u9884\u6d4b\u5668\u5229\u7528\u5f02\u6784\u56fe\u6765\u5efa\u6a21\u7528\u6237\u60c5\u7eea\u4e0e\u7cfb\u7edf\u7b56\u7565\u4e4b\u95f4\u7684\u5bf9\u8bdd\u52a8\u6001\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4e86\u5bf9\u8bdd\u4e2d\u60c5\u611f\u8bc6\u522b\uff08ERC\uff09\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7075\u6d3b\u7684\u6df7\u5408\u60c5\u7eea\u6a21\u5757\uff0c\u4ee5\u6355\u6349\u7528\u6237\u7684\u7ec6\u5fae\u60c5\u611f\u72b6\u6001\u3002\u5728\u4e24\u4e2aESC\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEmoDynamiX\u663e\u8457\u8d85\u8d8a\u4e86\u5148\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002**|\n", "2408.08780": "|**2024-08-16**|**Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions**|Chenming Tang 
et.al.|[2408.08780](http://arxiv.org/abs/2408.08780)|null|\u901a\u8fc7\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728ICL\u8fc7\u7a0b\u4e2d\u63cf\u8ff0\u6027\u6307\u4ee4\u7684\u4f5c\u7528\u4ecd\u7136\u6709\u5f85\u63a2\u7d22\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u96c6\u6210\u63d0\u793a\u6846\u67b6\uff0c\u7528\u4e8e\u63cf\u8ff0\u591a\u4e2a\u4e0a\u4e0b\u6587\u793a\u4f8b\u7684\u9009\u62e9\u6807\u51c6\uff0c\u5e76\u5728\u516d\u4e2a\u7ffb\u8bd1\u65b9\u5411\u7684\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u4efb\u52a1\u4e0a\u7684\u521d\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u8fd9\u79cd\u6846\u67b6\u80fd\u591f\u63d0\u5347ICL\u6027\u80fd\u3002\u51fa\u4e4e\u610f\u6599\u7684\u662f\uff0cLLM\u53ef\u80fd\u5e76\u4e0d\u5173\u5fc3\u63cf\u8ff0\u7684\u5177\u4f53\u5185\u5bb9\uff0c\u6027\u80fd\u63d0\u5347\u4e3b\u8981\u6e90\u4e8e\u96c6\u6210\u683c\u5f0f\uff0c\u5373\u4f7f\u4f7f\u7528\u968f\u673a\u63cf\u8ff0\u540d\u8bcd\uff0c\u8be5\u6846\u67b6\u4e5f\u80fd\u5e26\u6765\u6539\u8fdb\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u5728\u5e38\u8bc6\u3001\u6570\u5b66\u3001\u903b\u8f91\u63a8\u7406\u548c\u5e7b\u89c9\u4efb\u52a1\u4e0a\u5e94\u7528\u4e86\u8fd9\u79cd\u65b0\u7684\u96c6\u6210\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u4e09\u79cdLLM\u53d6\u5f97\u4e86\u6709\u5e0c\u671b\u7684\u7ed3\u679c\uff0c\u8fd9\u518d\u6b21\u8868\u660e\u8bbe\u8ba1\u9002\u5f53\u7684\u63d0\u793a\u683c\u5f0f\u6bd4\u4e13\u6ce8\u4e8e\u7279\u5b9a\u63cf\u8ff0\u66f4\u4e3a\u6709\u6548\u548c\u9ad8\u6548\u3002\u5728\u8bba\u6587\u53d1\u8868\u540e\uff0c\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.08779": "|**2024-08-16**|**DAC: Decomposed Automation Correction for Text-to-SQL**|Dingzirui Wang 
et.al.|[2408.08779](http://arxiv.org/abs/2408.08779)|**[link](https://github.com/zirui-HIT/DAC)**|**\u6587\u672c\u5230SQL\u662f\u4e00\u4e2a\u91cd\u8981\u7684\u4efb\u52a1\uff0c\u5b83\u901a\u8fc7\u81ea\u52a8\u751f\u6210SQL\u67e5\u8be2\u5e2e\u52a9\u4eba\u4eec\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u3002\u8003\u8651\u5230\u51fa\u8272\u7684\u6027\u80fd\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65b9\u6cd5\u6210\u4e3a\u4e86\u6587\u672c\u5230SQL\u7684\u4e3b\u6d41\u65b9\u5f0f\u3002\u5728\u8fd9\u7c7b\u65b9\u6cd5\u4e2d\uff0c\u81ea\u52a8\u4fee\u6b63\u6210\u4e3a\u4e00\u79cd\u6709\u6548\u624b\u6bb5\uff0c\u80fd\u591f\u901a\u8fc7\u7ea0\u6b63\u751f\u6210\u7ed3\u679c\u4e2d\u7684\u9519\u8bef\u6765\u8fdb\u4e00\u6b65\u63d0\u5347\u6027\u80fd\u3002\u73b0\u6709\u4fee\u6b63\u65b9\u6cd5\u8981\u6c42LLM\u76f4\u63a5\u5bf9\u751f\u6210\u7684SQL\u8fdb\u884c\u4fee\u6b63\uff0c\u800c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u5e76\u4e0d\u77e5\u9053\u5982\u4f55\u68c0\u6d4b\u9519\u8bef\uff0c\u5bfc\u81f4\u4e86\u8f83\u5dee\u7684\u6027\u80fd\u3002\u56e0\u6b64\uff0c\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u91c7\u7528\u5206\u89e3\u5f0f\u4fee\u6b63\u6765\u589e\u5f3a\u6587\u672c\u5230SQL\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5206\u89e3\u5f0f\u4fee\u6b63\u4f18\u4e8e\u76f4\u63a5\u4fee\u6b63\uff0c\u56e0\u4e3a\u4e0eSQL\u76f8\u6bd4\uff0c\u901a\u8fc7\u7ed3\u679c\u5206\u89e3\u5b50\u4efb\u52a1\u6765\u68c0\u6d4b\u548c\u4fee\u590d\u9519\u8bef\u66f4\u4e3a\u5bb9\u6613\u3002\u57fa\u4e8e\u8fd9\u4e00\u5206\u6790\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5206\u89e3\u81ea\u52a8\u5316\u4fee\u6b63\uff08DAC\uff09\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u5c06\u6587\u672c\u5230SQL\u5206\u89e3\u4e3a\u5b9e\u4f53\u94fe\u63a5\u548c\u9aa8\u67b6\u89e3\u6790\u4e24\u4e2a\u5b50\u4efb\u52a1\u6765\u4fee\u6b63SQL\u3002DAC\u9996\u5148\u751f\u6210\u4e0e\u95ee\u9898\u5bf9\u5e94\u7684\u5b9e\u4f53\u548c\u9aa8\u67b6\uff0c\u7136\u540e\u6bd4\
u8f83\u521d\u59cbSQL\u4e0e\u751f\u6210\u7684\u5b9e\u4f53\u548c\u9aa8\u67b6\u4e4b\u95f4\u7684\u5dee\u5f02\u4f5c\u4e3a\u4fee\u6b63\u53cd\u9988\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spider\u3001Bird\u548cKaggleDBQA\u4e0a\u7684\u5e73\u5747\u6027\u80fd\u63d0\u9ad8\u4e863.7%\uff0c\u8bc1\u660e\u4e86DAC\u7684\u6709\u6548\u6027\u3002**|\n", "2408.10197": "|**2024-08-19**|**Demystifying the Communication Characteristics for Distributed Transformer Models**|Quentin Anthony et.al.|[2408.10197](http://arxiv.org/abs/2408.10197)|null|\u6df1\u5ea6\u5b66\u4e60\uff08DL\uff09\u6a21\u578b\u57fa\u4e8e\u53d8\u6362\u5668\u67b6\u6784\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3001\u89c6\u89c9\u53d8\u6362\u5668\u3001\u97f3\u9891\u751f\u6210\u548c\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u7b49\u4f17\u591aDL\u5e94\u7528\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u8fdb\u5c55\u3002\u8fd9\u4e00\u7cfb\u5217\u8fdb\u6b65\u5f88\u5927\u7a0b\u5ea6\u4e0a\u5f97\u76ca\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\uff0c\u7136\u800c\u5206\u5e03\u5f0f\u901a\u4fe1\u4ecd\u7136\u662f\u5f71\u54cd\u8bad\u7ec3\u8fdb\u5ea6\u7684\u4e00\u4e2a\u91cd\u5927\u74f6\u9888\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u53d8\u6362\u5668\u6a21\u578b\u7684\u901a\u4fe1\u884c\u4e3a\uff0c\u5373\u5728\u4f7f\u7528\u591a\u8282\u70b9/\u591aGPU 
DL\u8bad\u7ec3\u65f6\uff0c\u4e0d\u540c\u5e76\u884c\u65b9\u6848\u5982\u4f55\u5728\u53d8\u6362\u5668\u80cc\u666f\u4e0b\u8fdb\u884c\u6570\u636e\u901a\u4fe1\u3002\u6211\u4eec\u4ee5GPT\u4e3a\u57fa\u7840\u7684\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u53d8\u6362\u5668\u67b6\u6784\u6848\u4f8b\u7814\u7a76\u7684\u4e3b\u8981\u5bf9\u8c61\uff0c\u7531\u4e8e\u5176\u5e7f\u6cdb\u7684\u5e94\u7528\u800c\u88ab\u9009\u4e2d\u3002\u901a\u8fc7\u6211\u4eec\u7684\u901a\u4fe1\u65e5\u5fd7\u9a8c\u8bc1\u4e86\u6240\u83b7\u5f97\u7684\u5b9e\u9a8c\u7ed3\u679c\uff0c\u5e76\u4f7f\u7528\u5206\u6790\u6a21\u578b\u5bf9\u8fd9\u4e9b\u7ed3\u679c\u8fdb\u884c\u4e86\u786e\u8ba4\u3002 \u603b\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u5206\u6790\u63ed\u793a\u4e86\u8fdb\u4e00\u6b65\u4f18\u5316\u5c0f\u6d88\u606f\u70b9\u5230\u70b9\u901a\u4fe1\u7684\u5fc5\u8981\u6027\u3001\u5e8f\u5217\u957f\u5ea6\u3001\u6bcfGPU\u541e\u5410\u91cf\u3001\u6a21\u578b\u5927\u5c0f\u4ee5\u53ca\u6240\u7528\u4f18\u5316\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u4ee5\u53ca\u5728\u6846\u67b6\u548c\u9ad8\u6027\u80fd\u8ba1\u7b97\u4e2d\u95f4\u4ef6\u8bbe\u8ba1\u4e0e\u4f18\u5316\u65b9\u9762\u53ef\u80fd\u9700\u8981\u5f15\u5bfc\u7684\u8fdb\u4e00\u6b65\u4f18\u5316\u65b9\u5411\u3002|\n", "2408.10174": "|**2024-08-19**|**SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models**|Anke Tang 
et.al.|[2408.10174](http://arxiv.org/abs/2408.10174)|**[link](https://github.com/tanganke/fusion_bench)**|**\u6df1\u5ea6\u6a21\u578b\u5728\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e0a\u7684\u8bad\u7ec3\u65e5\u76ca\u53d8\u5f97\u6210\u672c\u9ad8\u6602\uff0c\u8fd9\u4fc3\u4f7f\u4eba\u4eec\u5e7f\u6cdb\u91c7\u7528\u6df1\u5ea6\u6a21\u578b\u878d\u5408\u6280\u672f\uff0c\u4ee5\u5229\u7528\u73b0\u6709\u6a21\u578b\u7684\u77e5\u8bc6\u3002\u4ece\u7b80\u5355\u7684\u6743\u91cd\u5e73\u5747\u5230\u66f4\u590d\u6742\u7684AdaMerging\u7b49\u65b9\u6cd5\uff0c\u6a21\u578b\u878d\u5408\u80fd\u591f\u6709\u6548\u63d0\u5347\u6a21\u578b\u6027\u80fd\uff0c\u5e76\u52a0\u901f\u65b0\u6a21\u578b\u7684\u5f00\u53d1\u3002\u7136\u800c\uff0c\u4e2a\u4f53\u6a21\u578b\u53c2\u6570\u95f4\u7684\u76f8\u4e92\u5e72\u6270\u4ee5\u53ca\u878d\u5408\u8fc7\u7a0b\u7684\u53ef\u89e3\u91ca\u6027\u4e0d\u8db3\u4ecd\u7136\u662f\u6311\u6218\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u8bd5\u56fe\u901a\u8fc7\u8bc4\u4f30\u53c2\u6570\u5c5e\u6027\uff08\u5982\u5927\u5c0f\u6216\u7b26\u53f7\uff09\u6216\u8fdb\u884c\u53c2\u6570\u4fee\u526a\u6765\u89e3\u51b3\u53c2\u6570\u5e72\u6270\u95ee\u9898\u3002\u672c\u7814\u7a76\u9996\u5148\u4ece\u7ebf\u6027\u5c42\u5fae\u8c03\u7684\u89d2\u5ea6\u51fa\u53d1\uff0c\u901a\u8fc7\u5b50\u7a7a\u95f4\u5206\u6790\u660e\u786e\u5730\u5b9a\u4e49\u4e86\u53c2\u6570\u5e72\u6270\u4f5c\u4e3a\u4f18\u5316\u95ee\u9898\uff0c\u4ee5\u63ed\u793a\u8fd9\u4e00\u4e3b\u9898\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u96f6\u6837\u672c\u7a00\u758f\u6df7\u5408\u4f4e\u79e9\u4e13\u5bb6\uff08SMILE\uff09\u6784\u9020\u7684\u521b\u65b0\u6a21\u578b\u878d\u5408\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5141\u8bb8\u5728\u65e0\u9700\u989d\u5916\u6570\u636e\u6216\u8fdb\u4e00\u6b65\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u5c06\u6e90\u6a21\u578b\u5347\u7ea7\u4e3a\u6df7\u5408\u4e13\u5bb6\u6a21\u578b\uff08MoE\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u4ee5\u4e0b\u89c2\u5bdf\uff1a\u5fae\u8c03\u4e3b\u8981\u4fdd\u
7559\u4e86\u9884\u8bad\u7ec3\u7684\u91cd\u8981\u90e8\u5206\uff0c\u4f46\u4f7f\u7528\u8f83\u5c11\u91cd\u8981\u6216\u672a\u4f7f\u7528\u7684\u533a\u57df\u6765\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u5728\u539f\u59cb\u53c2\u6570\u7a7a\u95f4\u4e2d\u56fa\u6709\u7684\u53c2\u6570\u5e72\u6270\u95ee\u9898\uff0c\u53ef\u4ee5\u901a\u8fc7\u6269\u5c55\u7ef4\u5ea6\u6765\u7ba1\u7406\u3002\u6211\u4eec\u5728\u591a\u79cd\u573a\u666f\u4e0b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5305\u62ec\u56fe\u50cf\u5206\u7c7b\u548c\u6587\u672c\u6cdb\u5316\u4efb\u52a1\uff0c\u4f7f\u7528\u5168\u91cf\u5fae\u8c03\u548cLoRA\u5fae\u8c03\uff0c\u5e76\u5c06\u6211\u4eec\u7684\u65b9\u6cd5\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08CLIP\u6a21\u578b\u3001Flan-T5\u6a21\u578b\u548cMistral-7B\u6a21\u578b\uff09\uff0c\u7a81\u51fa\u4e86SMILE\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8ehttps://github.com/tanganke/fusion_bench**|\n", "2408.10159": "|**2024-08-19**|**Customizing Language Models with Instance-wise LoRA for Sequential Recommendation**|Xiaoyu Kong 
et.al.|[2408.10159](http://arxiv.org/abs/2408.10159)|**[link](https://github.com/akalikong/ilora)**|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u77e5\u8bc6\u7406\u89e3\u548c\u63a8\u7406\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u8bed\u8a00\u751f\u6210\u8303\u5f0f\u5c06LLM\u5e94\u7528\u4e8e\u5e8f\u5217\u63a8\u8350\u7cfb\u7edf\u4e2d\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5c06\u7528\u6237\u884c\u4e3a\u5e8f\u5217\u8f6c\u6362\u4e3aLLM\u5fae\u8c03\u7684\u63d0\u793a\uff0c\u5229\u7528LoRA\u6a21\u5757\u6765\u7ec6\u5316\u63a8\u8350\u3002\u7136\u800c\uff0c\u5728\u4e0d\u540c\u7528\u6237\u884c\u4e3a\u4e4b\u95f4\u8fdb\u884c\u7edf\u4e00\u5e94\u7528\u65f6\uff0cLoRA\u6709\u65f6\u65e0\u6cd5\u6355\u6349\u5230\u4e2a\u4f53\u5dee\u5f02\u6027\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u4ee5\u53ca\u5728\u4e0d\u540c\u884c\u4e3a\u5e8f\u5217\u95f4\u7684\u8d1f\u8fc1\u79fb\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684LoRA\uff08iLoRA\uff09\uff0c\u5b83\u7ed3\u5408\u4e86LoRA\u4e0e\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6846\u67b6\u3002iLoRA\u521b\u5efa\u4e86\u4e00\u4e2a\u591a\u6837\u5316\u7684\u4e13\u5bb6\u96c6\u5408\uff0c\u6bcf\u4e2a\u4e13\u5bb6\u90fd\u80fd\u591f\u6355\u83b7\u7279\u5b9a\u7684\u7528\u6237\u504f\u597d\u65b9\u9762\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u7531\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u5f15\u5bfc\u7684\u95e8\u63a7\u51fd\u6570\u3002\u8be5\u95e8\u63a7\u51fd\u6570\u5904\u7406\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u4ee5\u751f\u6210\u589e\u5f3a\u8868\u793a\uff0c\u4ece\u800c\u6307\u5bfc\u95e8\u63a7\u7f51\u7edc\u8f93\u51fa\u5b9a\u5236\u7684\u4e13\u5bb6\u53c2\u4e0e\u6743\u91cd\u3002\u8fd9\u79cd\u5b9a\u5236\u5316\u7684\u65b9\u6cd5\u53ef\u4ee5\u51cf\u5c11\u8d1f\u8fc1\u79fb\u5e76\u52a8\u6001\u9002\u5e94\u591a\u6837\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u5728\u4e09\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\u
4e86iLoRA\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u6355\u6349\u7528\u6237\u7279\u5b9a\u504f\u597d\u548c\u63d0\u9ad8\u63a8\u8350\u51c6\u786e\u5ea6\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2408.10151": "|**2024-08-19**|**Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models**|Amey Hengle et.al.|[2408.10151](http://arxiv.org/abs/2408.10151)|**[link](https://github.com/AmeyHengle/multilingual-needle-in-a-haystack)**|\u5728\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u793a\u4e86\u5728\u591a\u79cd\u8bed\u8a00\u4e2d\u54cd\u5e94\u67e5\u8be2\u7684\u80fd\u529b\u4e4b\u540e\uff0c\u5b83\u4eec\u5904\u7406\u957f\u591a\u8bed\u8a00\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002\u56e0\u6b64\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u8bc4\u4f30LLM\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u81f3\u5173\u91cd\u8981\uff0c\u7279\u522b\u662f\u5728\u4fe1\u606f\u68c0\u7d22\u7684\u80cc\u666f\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u591a\u8bed\u8a00\u9488\u5728\u8349\u5806\u4e2d\u7684\u6d4b\u8bd5\uff08MultiLingual 
Needle-in-a-Haystack\uff0c\u7b80\u79f0MLNeedle\uff09\uff0c\u65e8\u5728\u8bc4\u4f30\u6a21\u578b\u4ece\u591a\u8bed\u8a00\u5e72\u6270\u6587\u672c\u96c6\u5408\uff08\u8349\u5806\uff09\u4e2d\u68c0\u7d22\u76f8\u5173\u4fe1\u606f\uff08\u9488\uff09\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u6d4b\u8bd5\u6269\u5c55\u4e86\u591a\u8bed\u8a00\u95ee\u7b54\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u5355\u8bed\u8a00\u548c\u8de8\u8bed\u8a00\u68c0\u7d22\u3002\u6211\u4eec\u5bf9\u5f53\u524d\u7684\u56db\u5927\u5148\u8fdbLLM\u8fdb\u884c\u4e86MLNeedle\u6d4b\u8bd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u6a21\u578b\u6027\u80fd\u5728\u4e0d\u540c\u8bed\u8a00\u548c\u9488\u7684\u4f4d\u7f6e\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5f53\u9488\u4f4d\u4e8e\u82f1\u8bed\u8bed\u7cfb\u4e4b\u5916\u7684\u8bed\u8a00\u4e2d\u4ee5\u53ca\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u4e2d\u95f4\u4f4d\u7f6e\u65f6\uff0c\u6a21\u578b\u7684\u6027\u80fd\u6700\u4f4e\u3002\u6b64\u5916\uff0c\u5c3d\u7ba1\u67d0\u4e9b\u6a21\u578b\u58f0\u79f0\u5177\u6709\u9ad8\u8fbe8k\u4e2a\u4ee4\u724c\u7684\u4e0a\u4e0b\u6587\u5927\u5c0f\uff0c\u4f46\u5728\u4e0a\u4e0b\u6587\u957f\u5ea6\u589e\u52a0\u65f6\uff0c\u5b83\u4eec\u90fd\u6ca1\u6709\u8868\u73b0\u51fa\u6ee1\u610f\u7684\u8de8\u8bed\u8a00\u68c0\u7d22\u6027\u80fd\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f9b\u4e86\u5173\u4e8eLLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u7684\u5173\u952e\u89c1\u89e3\uff0c\u4ee5\u6307\u5bfc\u672a\u6765\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u7814\u7a76LLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7684\u957f\u4e0a\u4e0b\u6587\u884c\u4e3a\u3002|\n", "2408.10147": "|**2024-08-19**|**In-Context Learning with Representations: Contextual Generalization of Trained Transformers**|Tong Yang 
et.al.|[2408.10147](http://arxiv.org/abs/2408.10147)|null|\u672c\u6587\u901a\u8fc7\u975e\u7ebf\u6027\u56de\u5f52\u4efb\u52a1\u7684\u89c6\u89d2\u6765\u63a2\u8ba8Transformer\u5728\u68af\u5ea6\u4e0b\u964d\u8fc7\u7a0b\u4e2d\u7684\u8bad\u7ec3\u52a8\u6001\u3002\u5728\u6b64\u7c7b\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u5b66\u4e60\u6bcf\u4e2a\u4efb\u52a1\u7684\u6a21\u677f\u51fd\u6570\u5b9e\u73b0\u4e0a\u4e0b\u6587\u6cdb\u5316\uff0c\u6240\u6709\u6a21\u677f\u51fd\u6570\u90fd\u4f4d\u4e8e\u5305\u542b$m$\u4e2a\u57fa\u51fd\u6570\u7684\u7ebf\u6027\u7a7a\u95f4\u5185\u3002\u6211\u4eec\u5bf9\u5355\u5c42\u591a\u5934Transformer\u8fdb\u884c\u4e86\u5206\u6790\uff0c\u4ee5\u5728\u90e8\u5206\u6807\u8bb0\u63d0\u793a\u4e0b\u9884\u6d4b\u672a\u6807\u8bb0\u8f93\u5165\u7684\u4e0a\u4e0b\u6587\u5185\u9884\u6d4b\u80fd\u529b\uff0c\u5176\u4e2d\u6807\u7b7e\u5305\u542b\u9ad8\u65af\u566a\u58f0\uff0c\u6bcf\u4e2a\u63d0\u793a\u4e2d\u7684\u793a\u4f8b\u6570\u91cf\u4e0d\u8db3\u4ee5\u786e\u5b9a\u6a21\u677f\u3002 \u5728\u6e29\u548c\u5047\u8bbe\u4e0b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5355\u5c42\u591a\u5934Transformer\u7684\u8bad\u7ec3\u635f\u5931\u4f1a\u7ebf\u6027\u6536\u655b\u81f3\u5168\u5c40\u6700\u5c0f\u503c\u3002\u6b64\u5916\uff0cTransformer\u6709\u6548\u5730\u5b66\u4e60\u4e86\u5728\u57fa\u51fd\u6570\u4e0a\u8fdb\u884c\u5cad\u56de\u5f52\u7684\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u901a\u8fc7\u7406\u8bba\u8bc1\u660e\u5c55\u793a\u4e86\u5f53\u63d0\u793a\u4ec5\u5305\u542b\u5c11\u91cf\u67e5\u8be2-\u7b54\u6848\u5bf9\u65f6\uff0cTransformer\u80fd\u591f\u5b66\u4e60\u4e0a\u4e0b\u6587\u4fe1\u606f\uff08\u5373\u6a21\u677f\uff09\u4ee5\u5bf9\u672a\u89c1\u8fc7\u7684\u793a\u4f8b\u548c\u4efb\u52a1\u8fdb\u884c\u6cdb\u5316\u3002|\n", "2408.10141": "|**2024-08-19**|**Instruction Finetuning for Leaderboard Generation from Empirical AI Research**|Salomon Kabongo 
et.al.|[2408.10141](http://arxiv.org/abs/2408.10141)|null|\u672c\u6587\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6307\u4ee4\u5fae\u8c03\u5728\u81ea\u52a8\u5316\u751f\u6210AI\u7814\u7a76\u6392\u884c\u699c\u4e2d\u7684\u5e94\u7528\uff0c\u4ece\u6587\u7ae0\u4e2d\u63d0\u53d6\uff08\u4efb\u52a1\uff0c\u6570\u636e\u96c6\uff0c\u6307\u6807\uff0c\u5206\u6570\uff09\u56db\u5143\u7ec4\u3002\u8be5\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u4ece\u4f20\u7edf\u7684\u3001\u57fa\u4e8e\u793e\u533a\u7684\u624b\u52a8\u6574\u7406\u8f6c\u53d8\u4e3a\u5229\u7528\u81ea\u52a8\u5316\u3001\u751f\u6210\u5f0fLLM\u65b9\u6cd5\u6765\u7b80\u5316AI\u7814\u7a76\u8fdb\u5c55\u7684\u4f20\u64ad\uff0c\u4ece\u800c\u8d85\u8d8a\u4f9d\u8d56\u4e8e\u7279\u5b9a\u5206\u7c7b\u7684\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u6a21\u578b\u7684\u4f20\u7edf\u65b9\u5f0f\u3002\u901a\u8fc7\u5229\u7528FLAN-T5\u6a21\u578b\uff0c\u672c\u7814\u7a76\u589e\u5f3a\u4e86LLMs\u5728\u4fe1\u606f\u62bd\u53d6\u65b9\u9762\u7684\u9002\u5e94\u6027\u548c\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u6784\u5efa\u7ed3\u6784\u5316\u77e5\u8bc6\u8868\u793a\u3002|\n", "2408.10124": "|**2024-08-19**|**Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models**|Tianyu Zhang 
et.al.|[2408.10124](http://arxiv.org/abs/2408.10124)|**[link](https://github.com/zhangtia16/molgraph-lardo)**|**Molecular property prediction is a foundation of drug discovery. In recent years, pre-trained deep learning models have been widely applied to this task and have achieved notable results. Some approaches that incorporate prior biochemical domain knowledge into the pre-training framework have shown impressive performance. However, these approaches rely heavily on biochemical experts, and retrieving and summarizing large volumes of domain-knowledge literature is both time-consuming and expensive. Large Language Models (LLMs) have shown remarkable ability in understanding and efficiently providing general knowledge. However, they occasionally hallucinate and lack precision when generating domain-specific knowledge. In contrast, Domain-specific Small Models (DSMs) possess rich domain knowledge and can accurately compute metrics related to the molecular domain. However, because of their limited model size and single-purpose design, they lack the breadth of knowledge needed for comprehensive representation learning. To take full advantage of both approaches in molecular property prediction, we propose MolGraph-LarDo, a novel molecular graph representation learning framework that integrates Large Language Models and Domain-specific Small Models. Technically, we design a two-stage prompt strategy in which DSMs are introduced to calibrate the knowledge provided by LLMs, improving the accuracy of domain-specific information and enabling LLMs to generate more precise textual descriptions of molecular samples. We then employ a multi-modal alignment method to coordinate the modalities, including molecular graphs and their corresponding descriptive texts, to guide the pre-training of molecular representations. Extensive experiments demonstrate the effectiveness of the proposed method.**|\n", 
"2408.10111": "|**2024-08-20**|**PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities**|Yuanjian Xu et.al.|[2408.10111](http://arxiv.org/abs/2408.10111)|null|Financial time series modeling is essential for understanding and predicting market behavior, but it faces challenges such as non-linearity, non-stationarity, and high noise levels. Traditional models struggle to capture complex patterns because of these factors, and are further constrained by computational resources and model capacity. Inspired by the success of large language models in natural language processing, we propose $\\textbf{PLUTUS}$, a $\\textbf{P}$re-trained $\\textbf{L}$arge $\\textbf{U}$nified $\\textbf{T}$ransformer-based model that reveals regularities in financial time series. $\\textbf{PLUTUS}$ combines an invertible embedding module with contrastive learning and autoencoder techniques to create an approximate one-to-one mapping between raw data and patch embeddings. TimeFormer, an attention-based architecture, forms the core of $\\textbf{PLUTUS}$ and handles high-noise time series data effectively. We introduce a novel attention mechanism that captures features across both the variable and temporal dimensions. $\\textbf{PLUTUS}$ is pre-trained on an unprecedented dataset of 100 billion observations and is designed to cope with noisy financial market environments. To the best of our knowledge, $\\textbf{PLUTUS}$ is the first open-source, large-scale pre-trained financial time series model with over one billion parameters. It achieves state-of-the-art performance on a variety of tasks, demonstrates strong transferability, and establishes a robust foundation model for finance. Our study provides technical guidance for pre-training on financial time series data and sets a new standard for the field.|\n", 
"2408.10086": "|**2024-08-19**|**ARMADA: Attribute-Based Multimodal Data Augmentation**|Xiaomeng Jin 
et.al.|[2408.10086](http://arxiv.org/abs/2408.10086)|null|In Multimodal Language Models (MLMs), manually annotating high-quality image-text pair data for fine-tuning and alignment is very costly. Although existing multimodal data augmentation frameworks offer ways to augment image-text pairs, they either suffer from semantic inconsistency between text and images or generate unrealistic images, leaving a knowledge gap with real-world examples. To address these issues, we propose Attribute-based Multimodal Data Augmentation (ARMADA), a novel multimodal data augmentation method that augments data through knowledge-guided manipulation of the visual attributes of mentioned entities. Specifically, we extract entities and their visual attributes from the original text data, then search for alternative values of those visual attributes under the guidance of knowledge bases (KBs) and large language models (LLMs). We then use an image-editing model to edit the images according to the extracted attributes. ARMADA is a novel multimodal data generation framework that: (i) extracts knowledge-grounded attributes from symbolic KBs to generate semantically consistent yet distinctive image-text pairs; (ii) uses same-category entities in the KB hierarchy to generate images that are visually similar but belong to different categories; and (iii) uses the commonsense knowledge of LLMs to modulate auxiliary visual attributes, such as backgrounds, for a more comprehensive representation of the original entities. Our experiments on four downstream tasks show that our framework produces high-quality data and improves model performance. This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.|\n", 
"2408.10072": "|**2024-08-19**|**FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant**|Zhengchao Huang 
et.al.|[2408.10072](http://arxiv.org/abs/2408.10072)|null|The rapid development of deepfake technologies has drawn widespread public concern, particularly face forgery, which poses a serious threat to public information security. However, unknown and diverse forgery techniques, varied facial features, and complex environmental factors pose significant challenges for face forgery analysis. Existing datasets describe these aspects inadequately, making it difficult to distinguish real from forged faces using visual information alone amid the various confounding factors. In addition, existing methods do not yield user-friendly, explainable results, which hinders understanding of the model's decision-making process. To address these challenges, we introduce a novel Open-World Face Forgery Analysis VQA (OW-FFA-VQA) task and a corresponding benchmark. To tackle this task, we first build a dataset containing a diverse collection of real and forged face images, accompanied by essential descriptions and reliable forgery reasoning. Building on this dataset, we introduce the Face Forgery Analysis Assistant (FFAA), which consists of a fine-tuned Multimodal Large Language Model (MLLM) and a Multi-answer Intelligent Decision System (MIDS). Combining hypothetical prompts with MIDS effectively removes the impact of fuzzy classification boundaries and improves the model's robustness. Extensive experiments show that our method not only provides user-friendly, explainable results but also significantly outperforms previous methods in accuracy and robustness.|\n", 
"2408.11053": "|**2024-08-20**|**Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks**|Nathaniel Pinckney 
et.al.|[2408.11053](http://arxiv.org/abs/2408.11053)|**[link](https://github.com/nvlabs/verilog-eval)**|The application of large language models (LLMs) to digital hardware code generation is an emerging field. Most LLMs are trained primarily on natural language and software code. Hardware code such as Verilog makes up only a small fraction of the training data, and few hardware benchmarks exist. To address this gap, the open-source VerilogEval benchmark was released in 2023, providing a consistent evaluation framework for LLMs on code completion tasks. It was tested on the leading models of the time, including GPT-4. However, VerilogEval and other Verilog generation benchmarks lack failure analysis and, in their current form, are not conducive to exploring prompting techniques. Moreover, since VerilogEval's release, both commercial and open-source models have continued to develop. In this work, we evaluate newly released commercial and open-source models of varying sizes against an improved VerilogEval benchmark suite. We enhance the VerilogEval infrastructure and dataset by automatically classifying failures, introduce new prompts that support in-context learning (ICL) examples, and extend the supported tasks to specification-to-RTL translation. We find a measurable improvement in the latest commercial models, with GPT-4 Turbo achieving a 59% success rate on the specification-to-RTL task. We also study the performance of newly emerging open-source and domain-specific models, and show that models can benefit substantially from ICL. We find that the recently released Llama 3.1 405B achieves a success rate of 58%, on par with GPT-4 Turbo, and that the smaller domain-specific RTL-Coder 6.7B model achieves an impressive 37% success rate. However, prompt engineering is key to achieving good success rates, and it varies with the model and the task. A benchmark infrastructure that allows for prompt engineering and failure analysis is key to continued model development and deployment.|\n", 
"2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**Large Language Models (LLMs) have shown potential in Vision-and-Language Navigation (VLN) tasks, but current applications still face challenges. While LLMs excel in general conversational scenarios, they struggle with specialized navigation tasks and often underperform models designed specifically for VLN. We introduce FLAME (FLAMingo-Architected Embodied Agent), a novel multimodal LLM-based agent and architecture designed for urban VLN tasks that handles multiple observations efficiently. Our approach uses a three-phase tuning technique to adapt effectively to navigation tasks: single perception tuning for street view description, multiple perception tuning for trajectory summarization, and end-to-end training on VLN datasets. The generated datasets are synthesized through an automated process. Experimental results show that FLAME improves the task completion rate on the Touchdown dataset by 7.3% over existing methods, surpassing the state of the art. This work demonstrates the potential of multimodal LLMs in complex navigation tasks and represents an important step toward practical applications of multimodal LLMs in embodied AI. Project page: https://flame-sjtu.github.io**|\n", 
"2408.11049": "|**2024-08-20**|**MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding**|Jian Chen 
et.al.|[2408.11049](http://arxiv.org/abs/2408.11049)|**[link](https://github.com/infini-ai-lab/magicdec)**|Large Language Models (LLMs) are increasingly common in long-context applications such as interactive chatbots, document analysis, and agent workflows, but serving long-context requests with both low latency and high throughput is challenging. Speculative decoding (SD) is a widely used technique for reducing latency, and conventional wisdom holds that its effectiveness is limited to small batch sizes. In MagicDec, however, we show that, surprisingly, SD can achieve speedup even in high-throughput inference regimes for moderate to long sequences. More interestingly, our rigorous analysis shows that an intelligent drafting strategy can achieve better speedup as batch size increases. MagicDec first identifies how the bottleneck shifts as batch size and sequence length increase, and uses these insights to deploy speculative decoding more effectively for high-throughput inference. It then uses draft models with a sparse KV cache to address the KV bottleneck, which scales with both sequence length and batch size.|\n", "2408.11043": 
"|**2024-08-20**|**Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research**|Sreyoshi Bhaduri et.al.|[2408.11043](http://arxiv.org/abs/2408.11043)|null|This paper proposes a novel approach that uses Retrieval-Augmented Generation (RAG) based Large Language Models (LLMs) to analyze interview transcripts, addressing the extensive time and effort that manual analysis of qualitative data requires. The study frames the research inquiry as one assisted by an LLM acting as a novice research assistant. It explores the mental model of LLMs as novice qualitative research assistants for researchers in the talent management space. By extending a RAG-based LLM approach, the paper demonstrates the flexibility of these models for topic modeling of semi-structured interview data, beyond their traditional use in information retrieval and search. The findings indicate that the LLM-based RAG approach can successfully extract topics of interest, with significant coverage relative to topics generated manually from the same dataset. This establishes the viability of using LLMs as novice qualitative research assistants. The study further recommends that researchers using such models adhere strictly to the quality criteria used in traditional qualitative research to ensure the rigor and trustworthiness of their approach. Finally, the paper offers key recommendations for industry practitioners who wish to reconcile LLMs with established qualitative research paradigms, providing a path for effectively integrating these powerful, albeit novice, AI tools into the analysis of qualitative data, particularly in the talent domain.|\n", 
"2408.11029": "|**2024-08-20**|**Scaling Law with Learning Rate Annealing**|Howe Tissue et.al.|[2408.11029](http://arxiv.org/abs/2408.11029)|null|We find that the cross-entropy loss curves of neural language models during training follow a scaling law with learning rate (LR) annealing: $L(s) = L_0 + A\\cdot S_1^{-\\alpha} - C\\cdot S_2$, where $S_1$ is the forward area and $S_2$ is the LR annealing area. This formulation accounts for two factors: (1) the forward scaling defined by conventional scaling laws, and (2) the additional loss drop brought by LR annealing. The formulation can therefore describe the full loss curve at every step, rather than a single loss point at the end of training. By applying the scaling law with LR annealing and fitting only one or two training curves, we can accurately predict the loss of language model training at any given step and under any learning rate scheduler (LRS). Furthermore, this equation accurately describes the dynamics of the training process and provides theoretical verification and explanation for experimental findings on LR schedules and LR annealing from previous studies. The resulting insights also guide researchers in selecting critical LR schedule strategies in advance when developing large language models. Most importantly, since every point on a full training curve follows the equation, we can achieve accurate loss prediction at any given step and under any learning rate schedule while spending less than 1% of the computational cost required to fit language model loss with the Chinchilla scaling law. This approach greatly democratizes scaling-law fitting and prediction in the development of large language models.|\n", 
"2408.11021": "|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu 
et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|Owing to their emergent capabilities, large language models (LLMs) are being used as language-based agents that perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using the tools available to them. As the capabilities of these agents expand, ensuring their safety and trustworthiness becomes increasingly important. In this study, we introduce the Athena framework, which uses verbal contrastive learning: past safe and unsafe trajectories serve as in-context (contrastive) examples that guide the agent toward safety while it completes a given task. The framework also incorporates a critiquing mechanism that steers the agent away from risky actions at every step. Furthermore, because no existing benchmark covers the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation shows that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.|\n", 
"2408.11006": "|**2024-08-20**|**While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?**|Wen Cheng et.al.|[2408.11006](http://arxiv.org/abs/2408.11006)|**[link](https://github.com/sensente/security-attacks-on-lccts)**|**The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools have distinctive workflows that integrate multiple information sources as input and prioritize code suggestions over natural language interaction, which introduces specific security challenges. In addition, LCCTs often rely on proprietary code datasets for training, raising concerns about the leakage of sensitive data. This paper exploits these distinctive characteristics of LCCTs to develop targeted attack methods for two critical security risks: jailbreaking attacks and training data extraction attacks. Experimental results reveal significant vulnerabilities in LCCTs, including a 99.4% success rate for jailbreaking attacks on GitHub Copilot and a 46.3% success rate on Amazon Q. We also successfully extracted sensitive user data from GitHub Copilot, including 54 real email addresses and 314 physical addresses associated with GitHub usernames. The study further shows that these code-based attack methods are also effective against general-purpose LLMs such as the GPT series, highlighting broader security issues in how modern LLMs handle code. These findings underscore the critical security challenges facing LCCTs and point to important directions for strengthening their security frameworks. To support verification of our findings, example code and attack samples are available at https://github.com/Sensente/Security-Attacks-on-LCCTs.**|\n", 
"2408.10995": "|**2024-08-20**|**CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models**|Michael Reinisch 
et.al.|[2408.10995](http://arxiv.org/abs/2408.10995)|null|\u65b0\u533b\u7597\u6cbb\u7597\u65b9\u6cd5\u7684\u5f00\u53d1\u9700\u8981\u591a\u4e2a\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u3002\u5c3d\u7ba1\u5c06\u836f\u7269\u63a8\u5411\u5e02\u573a\u7684\u6210\u672c\u9ad8\u6602\u4e14\u5177\u6709\u6311\u6218\u6027\uff0c\u4f46\u53ea\u6709\u4e0d\u523020%\u7684\u836f\u7269\u80fd\u4ece\u7b2c\u4e00\u9636\u6bb5\u8fc7\u6e21\u5230\u6700\u540e\u7684\u6279\u51c6\u3002\u8fd1\u671f\u7684\u7814\u7a76\u6587\u732e\u8868\u660e\uff0c\u8bd5\u9a8c\u65b9\u6848\u7684\u8bbe\u8ba1\u5bf9\u8bd5\u9a8c\u8868\u73b0\u6709\u7740\u663e\u8457\u5f71\u54cd\u3002\u6211\u4eec\u7814\u7a76\u4e86\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u9884\u6d4b\uff08CTOP\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u8bd5\u9a8c\u8bbe\u8ba1\u6587\u4ef6\u81ea\u52a8\u9884\u6d4b\u4e0d\u540c\u9636\u6bb5\u7684\u8f6c\u6362\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684CTOP\u6a21\u578b\u2014\u2014CTP-LLM\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aPhaseTransition\uff08PT\uff09\u7684\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u6839\u636e\u8bd5\u9a8c\u5728\u76d1\u7ba1\u8fc7\u7a0b\u4e2d\u7684\u8fdb\u5c55\u8fdb\u884c\u6807\u8bb0\uff0c\u5e76\u4f5c\u4e3aCTOP\u8bc4\u4f30\u7684\u6807\u51c6\u57fa\u51c6\u3002 
\u6211\u4eec\u7684\u7cbe\u7ec6\u8c03\u53c2GPT-3.5\u4e3a\u57fa\u7840\u7684\u6a21\u578b\uff08CTP-LLM\uff09\u80fd\u591f\u901a\u8fc7\u5206\u6790\u539f\u59cb\u534f\u8bae\u6587\u672c\u6765\u9884\u6d4b\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u7684\u8f6c\u6362\uff0c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u9009\u62e9\u7684\u7279\u5f81\u3002CTP-LLM\u5728\u6240\u6709\u9636\u6bb5\u7684\u9884\u6d4b\u4e2d\u8fbe\u5230\u4e8667%\u7684\u51c6\u786e\u7387\uff0c\u5728\u9884\u6d4b\u4ece\u7b2c\u4e09\u9636\u6bb5\u5230\u6700\u7ec8\u6279\u51c6\u7684\u8f6c\u6362\u65f6\uff0c\u51c6\u786e\u7387\u66f4\u8fbe\u5230\u4e8675%\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u5f3a\u8c03\u4e86LLM\u9a71\u52a8\u5e94\u7528\u5728\u9884\u6d4b\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u548c\u8bc4\u4f30\u8bd5\u9a8c\u8bbe\u8ba1\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.10947": "|**2024-08-20**|**Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models**|Yuyan Chen et.al.|[2408.10947](http://arxiv.org/abs/2408.10947)|null|\u6559\u5e08\u5728\u4f20\u6388\u77e5\u8bc6\u548c\u5f15\u5bfc\u5b66\u4e60\u8005\u65b9\u9762\u53d1\u6325\u7740\u91cd\u8981\u4f5c\u7528\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6f5c\u5728\u6559\u80b2\u8005\u7684\u89d2\u8272\u6b63\u5728\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u9886\u57df\u3002\u8ba4\u8bc6\u5230LLMs\u751f\u6210\u6559\u80b2\u5185\u5bb9\u7684\u80fd\u529b\u53ef\u4ee5\u63a8\u52a8\u81ea\u52a8\u5316\u548c\u4e2a\u6027\u5316\u5b66\u4e60\u7684\u8fdb\u5c55\u3002\u867d\u7136LLMs\u5728\u7406\u89e3\u529b\u548c\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u65b9\u9762\u7684\u6d4b\u8bd5\u5df2\u7ecf\u8fdb\u884c\uff0c\u4f46\u5b83\u4eec\u5728\u6559\u5b66\u65b9\u9762\u7684\u6f5c\u529b\u4ecd\u9c9c\u4e3a\u4eba\u77e5\u3002\u5728\u6559\u5b66\u4e2d\uff0c\u63d0\u95ee\u662f\u4e00\u9879\u5173\u952e\u6280\u80fd\uff0c\u80fd\u591f\u6307\u5bfc\u5b66\u751f\u5206\u6790\u3001\u8bc4\u4f30\u5e76\u7efc\u5408\u6838\u5fc3\u6982\u5ff5\u548c\u539f\u7406\u3002\
u56e0\u6b64\uff0c\u6211\u4eec\u7684\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u4e2a\u57fa\u51c6\u6765\u8bc4\u4f30\u6559\u80b2\u4e2dLLMs\u7684\u63d0\u95ee\u80fd\u529b\uff0c\u901a\u8fc7\u8bc4\u4f30\u5b83\u4eec\u751f\u6210\u7684\u6559\u80b2\u95ee\u9898\uff0c\u5229\u7528\u5b89\u5fb7\u68ee\u548c\u514b\u62c9\u592b\u970d\u592b\u7684\u5206\u7c7b\u6cd5\u8986\u76d6\u4e00\u822c\u3001\u5355\u5b66\u79d1\u548c\u8de8\u5b66\u79d1\u9886\u57df\u3002\u6211\u4eec\u4ece\u5c06LLMs\u89c6\u4e3a\u5b66\u4e60\u8005\u8f6c\u5411\u5c06\u5176\u89c6\u4e3a\u6559\u80b2\u8005\uff0c\u901a\u8fc7\u8bc4\u4f30\u5b83\u4eec\u751f\u6210\u95ee\u9898\u7684\u80fd\u529b\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6559\u5b66\u80fd\u529b\u3002\u6211\u4eec\u5e94\u7528\u4e86\u56db\u4e2a\u6307\u6807\uff0c\u5305\u62ec\u76f8\u5173\u6027\u3001\u8986\u76d6\u7387\u3001\u4ee3\u8868\u6027\u4ee5\u53ca\u4e00\u81f4\u6027\uff0c\u6765\u8bc4\u4f30LLMs\u8f93\u51fa\u7684\u6559\u80b2\u8d28\u91cf\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cGPT-4\u5728\u6559\u6388\u4e00\u822c\u3001\u4eba\u6587\u5b66\u79d1\u548c\u79d1\u5b66\u8bfe\u7a0b\u65b9\u9762\u663e\u793a\u51fa\u663e\u8457\u6f5c\u529b\uff1bClaude2\u4f3c\u4e4e\u66f4\u9002\u5408\u62c5\u4efb\u8de8\u5b66\u79d1\u6559\u5e08\u3002\u6b64\u5916\uff0c\u81ea\u52a8\u8bc4\u5206\u4e0e\u4eba\u7c7b\u89c2\u70b9\u4e00\u81f4\u3002|\n", "2408.10946": "|**2024-08-20**|**Large Language Model Driven Recommendation**|Anton Korikov et.al.|[2408.10946](http://arxiv.org/abs/2408.10946)|null|
\u672c\u6587\u63a2\u8ba8\u4e86\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6784\u5efa\u4e2a\u6027\u5316\u63a8\u8350\u7cfb\u7edf\u65b9\u9762\u7684\u65b0\u578b\u5e94\u7528\u3002\u6b64\u524d\u7ae0\u8282\u4e3b\u8981\u5173\u6ce8\u57fa\u4e8e\u6807\u51c6\u3001\u975e\u8a00\u8bed\u7528\u6237\u53cd\u9988\u7684\u63a8\u8350\u7cfb\u7edf\uff0c\u4f8b\u5982\u8d2d\u4e70\u3001\u6d4f\u89c8\u548c\u70b9\u51fb\u7b49\u884c\u4e3a\u3002\u7136\u800c\uff0c\u968f\u7740LLM\u80fd\u529b\u7684\u63d0\u5347\uff0c\u5b83\u4eec\u5177\u5907\u4e86\u901a\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u7684\u80fd\u529b\uff0c\u4ece\u800c\u6253\u5f00\u4e86\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\u6784\u5efa\u9ad8\u5ea6\u5b9a\u5236\u5316\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u80fd\u6027\u3002 \u672c\u7ae0\u9996\u5148\u901a\u8fc7\u5206\u7c7b\u65b9\u5f0f\u6982\u8ff0\u4e86\u5173\u952e\u6570\u636e\u6e90\uff0c\u5305\u62ec\u5546\u54c1\u63cf\u8ff0\u3001\u7528\u6237\u4e0e\u7cfb\u7edf\u4ea4\u4e92\u4ee5\u53ca\u7528\u6237\u6863\u6848\u3002\u968f\u540e\uff0c\u6df1\u5165\u63a2\u8ba8\u4e86\u57fa\u4e8eLLM\u7684\u63a8\u8350\u6280\u672f\uff0c\u6db5\u76d6\u4e86\u7f16\u7801\u5668\u4ec5\u4f7f\u7528\u548c\u81ea\u56de\u5f52\u63a8\u8350\u65b9\u6cd5\uff0c\u65e0\u8bba\u662f\u5728\u8c03\u4f18\u8fd8\u662f\u672a\u8c03\u4f18\u72b6\u6001\u4e0b\u3002\u63a5\u7740\uff0c\u8ba8\u8bba\u4e86\u591a\u6a21\u5757\u63a8\u8350\u67b6\u6784\uff0c\u5176\u4e2dLLM\u4e0e\u5176\u4ed6\u7ec4\u4ef6\u5982\u68c0\u7d22\u5668\u548c\u63a8\u8350\u7cfb\u7edf\u5728\u591a\u9636\u6bb5\u6d41\u7a0b\u4e2d\u534f\u540c\u5de5\u4f5c\u3002\u6700\u540e\uff0c\u4ecb\u7ecd\u4e86\u5bf9\u8bdd\u5f0f\u63a8\u8350\u7cfb\u7edf\uff08CRS\uff09\uff0c\u5728\u8fd9\u4e9b\u7cfb\u7edf\u4e2d\uff0cLLM\u652f\u6301\u591a\u8f6e\u5bf9\u8bdd\uff0c\u6bcf\u4e00\u8f6e\u4e0d\u4ec5\u7528\u4e8e\u751f\u6210\u63a8\u8350\uff0c\u8fd8\u80fd\u4e0e\u7528\u6237\u8fdb\u884c\u4e92\u52a8\uff0c\u8fdb\u884c\u504f\u597d\u6536\u96c6\u3001\u8bc4\u4ef7\u548c\u95ee\u7b54\u3002|\n", "2408.11813": "|**2024-08-21**|**SEA: 
Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs**|Yuanyang Yin et.al.|[2408.11813](http://arxiv.org/abs/2408.11813)|null|\u8fd1\u671f\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u8868\u73b0\uff0c\u5b83\u4eec\u901a\u5e38\u7531\u89c6\u89c9\u7f16\u7801\u5668\u3001\u9002\u914d\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7ec4\u6210\u3002\u9002\u914d\u5668\u4f5c\u4e3a\u89c6\u89c9\u4e0e\u8bed\u8a00\u7ec4\u4ef6\u4e4b\u95f4\u7684\u5173\u952e\u6865\u6881\u3002\u7136\u800c\uff0c\u901a\u8fc7\u56fe\u50cf\u7ea7\u76d1\u7763\u8bad\u7ec3\u9002\u914d\u5668\u5f80\u5f80\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u5bf9\u9f50\u504f\u5dee\uff0c\u8fd9\u4f1a\u524a\u5f31LLM\u7684\u80fd\u529b\u5e76\u9650\u5236\u591a\u6a21\u6001LLM\u7684\u6f5c\u529b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u76d1\u7763\u5d4c\u5165\u5bf9\u9f50\uff08SEA\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982CLIP\uff09\u7684\u5206\u8bcd\u7ea7\u5bf9\u9f50\u65b9\u6cd5\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u8c03\u6574\u89c6\u89c9\u5206\u8bcd\u4e0eLLM\u5d4c\u5165\u7a7a\u95f4\u7684\u4e00\u81f4\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u786e\u4fdd\u4e86\u89c6\u89c9\u548c\u8bed\u8a00\u8868\u793a\u4e4b\u95f4\u66f4\u534f\u8c03\u7684\u6574\u5408\uff0c\u4ece\u800c\u589e\u5f3a\u591a\u6a21\u6001LLM\u7684\u6027\u80fd\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u540c\u65f6\u4fdd\u7559\u5176\u56fa\u6709\u7279\u6027\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cSEA\u6709\u6548\u5730\u63d0\u9ad8\u4e86MLLMs\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u8f83\u5c0f\u7684\u6a21\u578b\uff0c\u65e0\u9700\u989d\u5916\u7684\u6570\u636e\u6216\u63a8\u7406\u8ba1\u7b97\u3002\u6b64\u5916\uff0cSEA\u4e5f\u4e3a\u5f00\u53d1\u66f4\u901a\u7528\u548c\u9002\u5e94\u6027\u5f3a\u7684
\u89e3\u51b3\u65b9\u6848\u4ee5\u589e\u5f3a\u591a\u6a21\u6001\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.11801": "|**2024-08-21**|**Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models**|Yuzhou Huang et.al.|[2408.11801](http://arxiv.org/abs/2408.11801)|null|\u4f20\u7edf\u89c6\u89c9\u53d9\u4e8b\u590d\u6742\uff0c\u9700\u8981\u4e13\u4e1a\u77e5\u8bc6\u548c\u5927\u91cf\u8d44\u6e90\uff0c\u4f46\u5f80\u5f80\u53d7\u9650\u4e8e\u4eba\u7c7b\u7684\u521b\u9020\u529b\u4e0e\u521b\u4f5c\u7cbe\u5ea6\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u589e\u5f3a\u4e86\u89c6\u89c9\u53d9\u4e8b\u80fd\u529b\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u4e8c\u7ef4\u89c6\u89c9\u6548\u679c\u6216\u901a\u8fc7\u52a8\u4f5c\u5408\u6210\u548c\u884c\u4e3a\u6a21\u62df\u7b80\u5316\u6545\u4e8b\uff0c\u672a\u80fd\u751f\u6210\u5168\u9762\u3001\u591a\u7ef4\u7684\u53d9\u4e8b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faStory3D-Agent\uff0c\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLM\u7684\u80fd\u529b\u5c06\u63d0\u4f9b\u7684\u53d9\u4e8b\u8f6c\u5316\u4e3a\u4e09\u7ef4\u6e32\u67d3\u53ef\u89c6\u5316\u3002\u901a\u8fc7\u96c6\u6210\u7a0b\u5e8f\u5efa\u6a21\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u7cbe\u786e\u63a7\u5236\u591a\u89d2\u8272\u7684\u52a8\u4f5c\u548c\u52a8\u6001\uff0c\u4ee5\u53ca\u5404\u79cd\u88c5\u9970\u5143\u7d20\uff0c\u786e\u4fdd\u957f\u671f\u548c\u52a8\u6001\u7684\u4e09\u7ef4\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u652f\u6301\u901a\u8fc7\u903b\u8f91\u63a8\u7406\u8fdb\u884c\u53d9\u4e8b\u6269\u5c55\uff0c\u786e\u4fdd\u751f\u6210\u7684\u5185\u5bb9\u4e0e\u73b0\u6709\u6761\u4ef6\u4fdd\u6301\u4e00\u81f4\u3002\u6211\u4eec\u5bf9Story3D-Agent\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u4ee5\u9a8c\u8bc1\u5176\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u672c\u6846\u67b6\u6765\u63a8\u52a8\u4e09\u7ef4\u6545\u4e8b\u8868\u793a\u7684\u53d1\u5c55\u3002|\n", 
"2408.11800": "|**2024-08-21**|**PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain**|Rounak Meyur et.al.|[2408.11800](http://arxiv.org/abs/2408.11800)|null|\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u548c\u6587\u672c\u751f\u6210\u9886\u57df\u5feb\u901f\u53d1\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u5174\u8d77\u4e3a\u901a\u8fc7\u5229\u7528\u7528\u6237\u6307\u5b9a\u6570\u636e\u5e93\u4e2d\u7684\u4fe1\u606f\u6765\u63d0\u9ad8\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u548c\u53ef\u9760\u6027\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u9014\u5f84\u3002\u57fa\u51c6\u6d4b\u8bd5\u5bf9\u4e8e\u8bc4\u4f30\u548c\u6bd4\u8f83\u4e0d\u540cRAG\u914d\u7f6e\u5728\u68c0\u7d22\u5668\u548c\u751f\u6210\u5668\u65b9\u9762\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u914d\u7f6e\u7684\u6709\u6548\u6027\u3001\u53ef\u6269\u5c55\u6027\u548c\u7279\u5b9a\u9886\u57df\u548c\u5e94\u7528\u7684\u9002\u7528\u6027\u7684\u6d1e\u5bdf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u7528\u4e8e\u751f\u6210\u4e0e\u7279\u5b9a\u9886\u57df\u76f8\u5173\u7684RAG\u57fa\u51c6\u3002\u8be5\u6846\u67b6\u57fa\u4e8e\u81ea\u52a8\u95ee\u9898\u7b54\u6848\u751f\u6210\u4e0e\u4eba\u7c7b\uff08\u9886\u57df\u4e13\u5bb6\uff09-\u4eba\u5de5\u667a\u80fd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u534f\u4f5c\u7684\u81ea\u52a8\u5316\u8fc7\u7a0b\u3002\u4ee5\u6848\u4f8b\u7814\u7a76\u7684\u5f62\u5f0f\uff0c\u6211\u4eec\u901a\u8fc7\u5f15\u5165PermitQA\u4f5c\u4e3a\u98ce\u573a\u9009\u5740\u548c\u8bb8\u53ef\u9886\u57df\u7684\u9996\u4e2a\u57fa\u51c6\u8fdb\u884c\u4e86\u6846\u67b6\u5c55\u793a\uff0c\u8be5\u57fa\u51c6\u5305\u542b\u4e86\u4e0e\u98ce\u80fd\u9879\u76ee\u73af\u5883\u5f71\u54cd\u76f8\u5173\u7684\u591a\u7bc7\u79d1\u5b66\u6587\u6863/\u62a5\u544a\u3002 
\u6211\u4eec\u7684\u6846\u67b6\u7cfb\u7edf\u5730\u4f7f\u7528\u591a\u79cd\u6307\u6807\u548c\u4e0d\u540c\u590d\u6742\u5ea6\u7ea7\u522b\u7684\u95ee\u9898\u7c7b\u578b\u6765\u8bc4\u4f30RAG\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u4e0d\u540c\u6a21\u578b\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\u3002|\n", "2408.11795": "|**2024-08-21**|**EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model**|Feipeng Ma et.al.|[2408.11795](http://arxiv.org/abs/2408.11795)|null|\u5728\u591a\u6a21\u6001\u7814\u7a76\u9886\u57df\uff0c\u4f17\u591a\u7814\u7a76\u5229\u7528\u5927\u91cf\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u8fdb\u884c\u6a21\u6001\u5bf9\u9f50\u5b66\u4e60\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLMs\uff09\u8f6c\u5316\u4e3a\u591a\u6a21\u6001LLMs\uff0c\u5e76\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u76ee\u524d\u4e3b\u8981\u7684\u5b9e\u73b0\u65b9\u6cd5\u5206\u4e3a\u4e24\u7c7b\uff1a\u81ea\u6ce8\u610f\u529b\u57fa\u548c\u4ea4\u53c9\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u3002\u81ea\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u56e0\u5176\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u673a\uff08MLP\uff09\u67b6\u6784\u800c\u5177\u6709\u8f83\u9ad8\u7684\u6570\u636e\u6548\u7387\uff0c\u4f46\u5728\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u5374\u76f8\u5bf9\u8f83\u4f4e\uff0c\u539f\u56e0\u5728\u4e8e\u5176\u9700\u8981\u5c06\u89c6\u89c9\u548c\u6587\u672c\u4ee4\u724c\u4f5c\u4e3a\u8f93\u5165\u8fdb\u884c\u8fde\u63a5\u3002\u800c\u4ea4\u53c9\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u867d\u7136\u5728\u989d\u5916\u7684\u5b66\u4e60\u53c2\u6570\u65b9\u9762\u4e0d\u5982\u81ea\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u9ad8\u6548\uff0c\u4f46\u7531\u4e8e\u907f\u514d\u4e86\u4e3aLLM\u63d0\u4f9b\u8fc7\u957f\u5e8f\u5217\u8f93\u5165\uff0c\u56e0\u6b64\u5728\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u8868\u73b0\u66f4\u9ad8\u3002\u4e3a\u4e86\u5e73\u8861\u8fd9\u4e9b\u6743\u8861\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u657
0\u636e\u9ad8\u6548\u4e14\u8ba1\u7b97\u9ad8\u6548\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08EE-MLLM\uff09\u3002EE-MLLM\u5728\u4e0d\u5f15\u5165\u989d\u5916\u6a21\u5757\u6216\u53ef\u5b66\u4e60\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u4e86\u6570\u636e\u548c\u8ba1\u7b97\u6548\u7387\u7684\u63d0\u5347\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5bf9\u591a\u6a21\u6001LLM\u4e2d\u7684\u539f\u59cb\u81ea\u6ce8\u610f\u529b\u673a\u5236\u8fdb\u884c\u4e86\u6539\u8fdb\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u590d\u5408\u6ce8\u610f\u529b\u673a\u5236\u3002\u8be5\u673a\u5236\u6709\u4e24\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u6d88\u9664\u89c6\u89c9\u4ee4\u724c\u5185\u90e8\u7684\u81ea\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u4ee5\u5b9e\u73b0\u8ba1\u7b97\u6548\u7387\uff1b2\uff09\u91cd\u7528LLM\u6bcf\u4e00\u5c42\u7684\u6743\u91cd\uff0c\u4ee5\u4fc3\u8fdb\u89c6\u89c9\u4e0e\u8bed\u8a00\u4e4b\u95f4\u7684\u6709\u6548\u6a21\u6001\u5bf9\u9f50\uff0c\u4ece\u800c\u5b9e\u73b0\u6570\u636e\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEE-MLLM\u5728\u5305\u62ecMMBench\u3001SeedBench\u7b49\u901a\u7528\u6027\u6570\u636e\u96c6\u4ee5\u53caTextVQA\u3001DocVQA\u7b49\u7cbe\u7ec6\u7c92\u5ea6\u4efb\u52a1\u5728\u5185\u7684\u591a\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u90fd\u5c55\u73b0\u51fa\u663e\u8457\u7684\u6709\u6548\u6027\u3002|\n", "2408.11793": "|**2024-08-21**|**Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design**|Nathaniel H. 
Park et.al.|[2408.11793](http://arxiv.org/abs/2408.11793)|null|\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u548c\u901a\u8fc7\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u8fdb\u884c\u751f\u6210\u8bbe\u8ba1\u662f\u7814\u7a76\u7684\u70ed\u70b9\u9886\u57df\uff0c\u8fd9\u4e3b\u8981\u5f52\u56e0\u4e8e\u5b83\u5728\u52a0\u901f\u65b0\u6750\u6599\u5f00\u53d1\u65b9\u9762\u7684\u6f5c\u529b\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7cfb\u7edf\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\u5f97\u5230\u4e86\u663e\u8457\u589e\u5f3a\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u66f4\u590d\u6742\u7684\u7814\u7a76\u4efb\u52a1\u80cc\u666f\u4e0b\u8fdb\u884c\u9884\u6d4b\u3002\u5c3d\u7ba1\u6709\u6548\uff0c\u4f46\u5728\u6750\u6599\u8bbe\u8ba1\u4efb\u52a1\u4e2d\u7684\u4fe1\u606f\u68c0\u7d22\u65b9\u9762\uff0c\u4ee3\u7406\u7cfb\u7edf\u4ecd\u6709\u6539\u8fdb\u7a7a\u95f4\u3002\u6b64\u5916\uff0c\u5bf9\u9884\u6d4b\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u7684\u66ff\u4ee3\u5e94\u7528\uff0c\u5982\u5229\u7528\u5b83\u4eec\u7684\u6f5c\u5728\u8868\u793a\u6765\u4fc3\u8fdb\u8de8\u6a21\u6001\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u5728\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u4efb\u52a1\u7279\u5b9a\u7684\u6750\u6599\u8bbe\u8ba1\uff0c\u8fd9\u4e00\u9886\u57df\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002 
\u5728\u6b64\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5927\u89c4\u6a21\u3001\u9884\u8bad\u7ec3\u7684\u5316\u5b66\u57fa\u7840\u6a21\u578b\u53ef\u4ee5\u4f5c\u4e3a\u4f7f\u5316\u5b66\u4fe1\u606f\u68c0\u7d22\u8bed\u4e49\u5316\u7684\u57fa\u7840\uff0c\u9002\u7528\u4e8e\u5c0f\u5206\u5b50\u3001\u590d\u6742\u805a\u5408\u7269\u6750\u6599\u548c\u53cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5316\u5b66\u57fa\u7840\u6a21\u578b\u4e0e\u56fe\u50cf\u6a21\u578b\uff08\u5982OpenCLIP\uff09\u76f8\u7ed3\u5408\uff0c\u80fd\u591f\u5b9e\u73b0\u8de8\u591a\u4e2a\u8868\u5f81\u6570\u636e\u57df\u7684\u524d\u6240\u672a\u6709\u7684\u67e5\u8be2\u548c\u4fe1\u606f\u68c0\u7d22\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u4e9b\u7cfb\u7edf\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\uff0c\u4ee5\u652f\u6301\u7ed3\u6784\u548c\u62d3\u6251\u4e3a\u57fa\u7840\u7684\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u548c\u4fe1\u606f\u68c0\u7d22\uff0c\u4ece\u800c\u4fc3\u8fdb\u590d\u6742\u7814\u7a76\u4efb\u52a1\u7684\u6267\u884c\u3002|\n", "2408.11791": "|**2024-08-21**|**Critique-out-Loud Reward Models**|Zachary Ankner 
et.al.|[2408.11791](http://arxiv.org/abs/2408.11791)|**[link](https://github.com/zankner/cloud)**|**\u4f20\u7edf\u7684\u5956\u52b1\u6a21\u578b\u5728\u4ece\u4eba\u7c7b\u53cd\u9988\u8fdb\u884c\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u65f6\uff0c\u4ec5\u7528\u4e8e\u76f4\u63a5\u9884\u6d4b\u504f\u597d\u5206\u6570\uff0c\u800c\u4e0d\u5229\u7528\u5e95\u5c42\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u751f\u6210\u80fd\u529b\u3002\u8fd9\u9650\u5236\u4e86\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u5fc5\u987b\u901a\u8fc7\u5355\u4e00\u524d\u5411\u4f20\u9012\u6765\u9690\u5f0f\u5730\u63a8\u7406\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5373\uff0c\u5fc5\u987b\u5728\u504f\u597d\u5efa\u6a21\u8fc7\u7a0b\u4e2d\u5b8c\u6210\u63a8\u7406\u3002\u4e3a\u4e86\u4f7f\u5956\u52b1\u6a21\u578b\u80fd\u591f\u663e\u5f0f\u5730\u63a8\u7406\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u53e3\u5934\u6279\u8bc4\u201d\uff08CLoud\uff09\u5956\u52b1\u6a21\u578b\u3002CLoud\u5956\u52b1\u6a21\u578b\u9996\u5148\u751f\u6210\u5bf9\u52a9\u624b\u54cd\u5e94\u7684\u81ea\u7136\u8bed\u8a00\u6279\u8bc4\uff0c\u7136\u540e\u4f7f\u7528\u8fd9\u4e9b\u6279\u8bc4\u6765\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\u7684\u6807\u91cf\u5956\u52b1\u3002 
\u6211\u4eec\u8bc1\u660e\u4e86\u5bf9\u4e8eLlama-3-8B\u548c70B\u57fa\u7840\u6a21\u578b\uff0cCLoud\u5956\u52b1\u6a21\u578b\u7684\u6210\u529f\uff1a\u4e0e\u7ecf\u5178\u5956\u52b1\u6a21\u578b\u76f8\u6bd4\uff0cCLoud\u5956\u52b1\u6a21\u578b\u5206\u522b\u5728RewardBench\u4e0a\u63d0\u9ad8\u4e868B\u548c70B\u57fa\u7840\u6a21\u578b\u7684\u4e8c\u5143\u504f\u597d\u5206\u7c7b\u51c6\u786e\u73874.65\u548c5.84\u4e2a\u767e\u5206\u70b9\u3002\u6b64\u5916\uff0c\u5f53\u4f5c\u4e3aBest-of-N\u8bc4\u5206\u6a21\u578b\u4f7f\u7528\u65f6\uff0cCLoud\u5956\u52b1\u6a21\u578b\u5728ArenaHard\u4e0a\u7684\u80dc\u7387\u4e5f\u5b9e\u73b0\u4e86\u5e15\u7d2f\u6258\u6539\u8fdb\u3002\u6700\u540e\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5982\u4f55\u5229\u7528CLoud\u5956\u52b1\u6a21\u578b\u7684\u52a8\u6001\u63a8\u7406\u8ba1\u7b97\u80fd\u529b\uff0c\u901a\u8fc7\u81ea\u6211\u4e00\u81f4\u6027\u89e3\u7801\u6765\u8fdb\u884c\u5956\u52b1\u9884\u6d4b\u3002**|\n", "2408.11788": "|**2024-08-21**|**DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework**|Zhifei Xie et.al.|[2408.11788](http://arxiv.org/abs/2408.11788)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cDreamFactory\u201d\u7684LLM\u57fa\u6846\u67b6\uff0c\u5b83\u80fd\u89e3\u51b3\u5f53\u524d\u89c6\u9891\u751f\u6210\u6a21\u578b\u5728\u521b\u5efa\u957f\u89c6\u9891\u65f6\u9047\u5230\u7684\u6311\u6218\u3002DreamFactory\u901a\u8fc7\u591a\u667a\u80fd\u4f53\u534f\u4f5c\u539f\u5219\u548c\u5173\u952e\u5e27\u8fed\u4ee3\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u786e\u4fdd\u4e86\u957f\u89c6\u9891\u7684\u4e00\u81f4\u6027\u548c\u98ce\u683c\u7edf\u4e00\u3002\u5b83\u5229\u7528\u94fe\u5f0f\u601d\u7ef4\uff08Chain of
Thought\uff0cCOT\uff09\u6765\u5904\u7406\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56fa\u6709\u7684\u4e0d\u786e\u5b9a\u6027\u3002DreamFactory\u80fd\u591f\u751f\u6210\u957f\u3001\u98ce\u683c\u4e00\u81f4\u4e14\u590d\u6742\u7684\u89c6\u9891\u3002 \u5bf9\u4e8e\u8fd9\u4e9b\u957f\u5f62\u5f0f\u89c6\u9891\u7684\u8bc4\u4f30\u63d0\u51fa\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5982\u8de8\u573a\u666f\u9762\u90e8\u8ddd\u79bb\u5206\u6570\u548c\u8de8\u573a\u666f\u98ce\u683c\u4e00\u81f4\u6027\u5206\u6570\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fd9\u4e00\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\uff0c\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u5305\u542b\u8d85\u8fc7150\u4e2a\u7531\u4eba\u7c7b\u8bc4\u5206\u7684\u591a\u573a\u666f\u89c6\u9891\u7684\u591a\u573a\u666f\u89c6\u9891\u6570\u636e\u96c6\u3002|\n", "2408.11779": "|**2024-08-21**|**Personality Alignment of Large Language Models**|Minjun Zhu et.al.|[2408.11779](http://arxiv.org/abs/2408.11779)|**[link](https://github.com/zhu-minjun/palign)**|**\u4e3a\u4e86\u5f25\u8865\u73b0\u6709\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\u65b9\u6cd5\u5728\u53cd\u6620\u4eba\u7c7b\u666e\u904d\u4ef7\u503c\u89c2\u548c\u884c\u4e3a\u65f6\u7684\u4e0d\u8db3\uff0c\u5ffd\u89c6\u4e86\u4e2a\u4f53\u7528\u6237\u72ec\u7279\u7279\u5f81\u548c\u504f\u597d\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e2a\u6027\u5bf9\u9f50\u7684\u6982\u5ff5\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u6839\u636e\u4e2a\u4f53\u7528\u6237\u6216\u7d27\u5bc6\u5173\u8054\u7fa4\u4f53\u7684\u5177\u4f53\u504f\u597d\u8c03\u6574LLM\u7684\u54cd\u5e94\u4e0e\u51b3\u7b56\u3002\u53d7\u5fc3\u7406\u6d4b\u91cf\u5b66\u7684\u542f\u53d1\uff0c\u6211\u4eec\u6784\u5efa\u4e86Personality Alignment with Personality Inventories (PAPI) 
\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e8630\u4e07\u771f\u5b9e\u4e3b\u4f53\u7684\u6570\u636e\uff0c\u6bcf\u4e2a\u4e3b\u4f53\u57fa\u4e8e\u4e94\u5927\u4eba\u683c\u56e0\u7d20\u63d0\u4f9b\u884c\u4e3a\u504f\u597d\u4fe1\u606f\u3002\u8fd9\u4e00\u6570\u636e\u96c6\u4f7f\u6211\u4eec\u80fd\u591f\u5b9a\u91cf\u8bc4\u4f30LLM\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u80fd\u591f\u4e0e\u6bcf\u4e2a\u4e3b\u4f53\u7684\u884c\u4e3a\u6a21\u5f0f\u76f8\u5339\u914d\u3002\u9274\u4e8e\u4e2a\u6027\u5bf9\u9f50\u9762\u4e34\u7684\u6311\u6218\uff1a\u5982\u4e2a\u4eba\u6570\u636e\u6709\u9650\u3001\u504f\u597d\u591a\u6837\u4ee5\u53ca\u53ef\u6269\u5c55\u6027\u9700\u6c42\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u6fc0\u6d3b\u5e72\u9884\u4f18\u5316\u65b9\u6cd5\u3002\u8fd9\u79cd\u65b9\u6cd5\u5229\u7528\u6700\u5c11\u7684\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u63d0\u9ad8\u4e86LLM\u9ad8\u6548\u5bf9\u9f50\u4e2a\u4f53\u884c\u4e3a\u504f\u597d\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u65b9\u6cd5PAS\u4e0d\u4ec5\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86DPO\uff0c\u800c\u4e14\u4f18\u5316\u65f6\u95f4\u4ec5\u4e3a\u540e\u8005\u7684\u4e94\u5206\u4e4b\u4e00\uff0c\u5177\u6709\u5b9e\u9645\u4ef7\u503c\uff0c\u63a8\u52a8\u4e86\u4e2a\u6027\u5316\u7684AI\u7cfb\u7edf\u51b3\u7b56\u4e0e\u63a8\u7406\u7684\u53d1\u5c55\uff0c\u589e\u5f3a\u4e86\u4e0e\u6bcf\u4f4d\u7528\u6237\u7684\u4ea4\u4e92\u76f8\u5173\u6027\u548c\u610f\u4e49\uff0c\u4fc3\u8fdb\u4e86\u4ee5\u4eba\u4e3a\u672c\u7684\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53d1\u5e03\u5728\u3002**|\n", "2408.11775": "|**2024-08-21**|**Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards**|Omar Erak 
et.al.|[2408.11775](http://arxiv.org/abs/2408.11775)|**[link](https://github.com/Nouf-Alabbasi/oKUmura_AI_Telecom_challenge)**|**\u8fd1\u671f\u7684\u7814\u7a76\u63ed\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7535\u4fe1\u6807\u51c6\u65b9\u9762\u7684\u6280\u672f\u89c4\u8303\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8ePhi-2\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLM\uff09\u7684\u5fae\u8c03\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u4f5c\u4e3a\u901a\u4fe1\u7f51\u7edc\u7684\u6743\u5a01\u7b54\u6848\u6765\u6e90\u3002\u6211\u4eec\u5f00\u53d1\u7684\u7cfb\u7edf\u5229\u7528\u524d\u77bb\u6027\u7684\u8bed\u4e49\u5206\u5757\u6765\u52a8\u6001\u786e\u5b9a\u89e3\u6790\u65ad\u70b9\uff0c\u4f9d\u636e\u5d4c\u5165\u76f8\u4f3c\u5ea6\u8fdb\u884c\u8c03\u6574\uff0c\u4ece\u800c\u6709\u6548\u5904\u7406\u591a\u79cd\u6587\u6863\u683c\u5f0f\u3002\u9488\u5bf9\u6280\u672f\u6807\u51c6\u4e2d\u53ef\u80fd\u51fa\u73b0\u7684\u591a\u4e2a\u76f8\u4f3c\u4e0a\u4e0b\u6587\u95ee\u9898\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u91cd\u65b0\u6392\u540d\u7b97\u6cd5\u4ee5\u4f18\u5148\u8003\u8651\u6700\u76f8\u5173\u7684\u63d0\u53d6\u7247\u6bb5\u3002\u8003\u8651\u5230Phi-2\u7684\u5c0f\u8bed\u5883\u7a97\u53e3\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aSelfExtend\u7684\u6700\u65b0\u6280\u672f\uff0c\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u6269\u5c55\u8bed\u5883\u7a97\u53e3\uff0c\u4e0d\u4ec5\u63d0\u5347\u4e86\u6027\u80fd\uff0c\u8fd8\u80fd\u9002\u5e94\u5ba2\u6237\u5230\u4e13\u4e1a\u6280\u672f\u4eba\u5458\u7684\u5404\u79cd\u67e5\u8be2\u548c\u8bbe\u8ba1\u9700\u6c42\u3002\u4e3a\u4e86\u5fae\u8c03\uff0c\u6211\u4eec\u4f7f\u7528\u4e86\u4f4e\u79e9\u9002\u914d\uff08LoRA\uff09\u6280\u672f\uff0c\u5728\u8bad\u7ec3\u65f6\u63d0\u9ad8\u8ba1\u7b97\u6548\u7387\uff0c\u5e76\u5728\u5c0f\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u6709\u6548\u7684\u5fae\u8c03\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u7535\
u4fe1\u9886\u57df\u5bf9\u73b0\u6709\u95ee\u7b54\u65b9\u6cd5\u7684\u663e\u8457\u6539\u8fdb\uff0c\u6027\u80fd\u8d85\u8fc7GPT-4\uff08\u5927\u7ea6\u662f\u5176\u89c4\u6a21\u7684880\u500d\uff09\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u5229\u7528SLM\u5728\u901a\u4fe1\u7f51\u7edc\u4e2d\u7684\u65b0\u65b9\u6cd5\uff0c\u63d0\u4f9b\u4e86\u9ad8\u6548\u6027\u548c\u6027\u80fd\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u53ef\u4f5c\u4e3a\u6784\u5efa\u667a\u80fd\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u3002**|\n", "2408.11749": "|**2024-08-21**|**Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks**|Yiyi Chen et.al.|[2408.11749](http://arxiv.org/abs/2408.11749)|**[link](https://github.com/siebeniris/vec2text_exp)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7740\u6765\u81ea\u7f51\u7edc\u653b\u51fb\u8005\u7684\u6076\u610f\u5f71\u54cd\uff0c\u5982\u5bf9\u6297\u6027\u3001\u540e\u95e8\u548c\u5d4c\u5165\u53cd\u8f6c\u653b\u51fb\u3002\u5bf9\u6b64\uff0c\u65b0\u5174\u7684LLM\u5b89\u5168\u9886\u57df\u81f4\u529b\u4e8e\u7814\u7a76\u5e76\u9632\u5fa1\u6b64\u7c7b\u5a01\u80c1\u3002\u8fc4\u4eca\u4e3a\u6b62\uff0c\u8be5\u9886\u57df\u7684\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u96c6\u4e2d\u5728\u82f1\u8bed\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u4e0a\uff0c\u7136\u800c\uff0c\u6700\u65b0\u7814\u7a76\u8868\u660e\uff0c\u591a\u8bed\u8a00LLM\u53ef\u80fd\u6bd4\u5176\u5355\u4e00\u8bed\u8a00\u540c\u50da\u66f4\u6613\u53d7\u5230\u5404\u79cd\u653b\u51fb\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u8ba8\u4e86\u5728\u90e8\u5206\u6b27\u6d32\u8bed\u8a00\u4e0a\u7684\u5d4c\u5165\u53cd\u8f6c\uff0c\u4f46\u8981\u5c06\u8fd9\u4e9b\u53d1\u73b0\u63a8\u53ca\u5230\u4e0d\u540c\u8bed\u7cfb\u548c\u4e0d\u540c\u4e66\u5199\u7cfb\u7edf\u7684\u8bed\u8a00\uff0c\u5374\u6781\u5177\u6311\u6218\u6027\u3002\u56e0\u6b64\uff0c\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u591a\u8bed\u8a00LLM\u5728\u5d4c\u5165\u53cd\u8f6c\u653b\u51fb\u4e0b\u7684\u5b8
9\u5168\u6027\uff0c\u5e76\u572820\u79cd\u8bed\u8a00\u4e2d\u8fdb\u884c\u8de8\u8bed\u8a00\u548c\u8de8\u4e66\u5199\u7684\u53cd\u8f6c\u6d4b\u8bd5\uff0c\u8986\u76d68\u4e2a\u8bed\u7cfb\u548c12\u79cd\u4e66\u5199\u7cfb\u7edf\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u963f\u62c9\u4f2f\u5b57\u6bcd\u548c\u897f\u91cc\u5c14\u5b57\u6bcd\u4e66\u5199\u7684\u8bed\u8a00\u4ee5\u53ca\u5370\u5ea6-\u96c5\u5229\u5b89\u8bed\u7cfb\u7684\u8bed\u8a00\u7279\u522b\u5bb9\u6613\u53d7\u5230\u5d4c\u5165\u53cd\u8f6c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u89c2\u5bdf\u5230\u53cd\u8f6c\u6a21\u578b\u503e\u5411\u4e8e\u51fa\u73b0\u8bed\u8a00\u6df7\u6dc6\uff0c\u6709\u65f6\u5927\u5e45\u5ea6\u964d\u4f4e\u4e86\u653b\u51fb\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u63a2\u7d22\u4e86\u8fd9\u4e00\u74f6\u9888\uff0c\u63ed\u793a\u4e86\u4e00\u4e9b\u53ef\u9884\u6d4b\u6a21\u5f0f\uff0c\u8fd9\u53ef\u80fd\u88ab\u653b\u51fb\u8005\u5229\u7528\u3002\u6700\u7ec8\uff0c\u672c\u7814\u7a76\u65e8\u5728\u6df1\u5316\u5bf9\u591a\u8bed\u8a00LLM\u9762\u4e34\u7684\u4e3b\u8981\u5b89\u5168\u6f0f\u6d1e\u7684\u7406\u89e3\uff0c\u5e76\u63d0\u9ad8\u5bf9\u6700\u6613\u53d7\u8fd9\u4e9b\u653b\u51fb\u5f71\u54cd\u7684\u8bed\u8a00\u7684\u610f\u8bc6\u3002|\n", "2408.12599": "|**2024-08-22**|**Controllable Text Generation for Large Language Models: A Survey**|Xun Liang 
et.al.|[2408.12599](http://arxiv.org/abs/2408.12599)|**[link](https://github.com/iaar-shanghai/ctgsurvey)**|**\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6587\u672c\u751f\u6210\u8d28\u91cf\u3002\u7136\u800c\uff0c\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0cLLMs\u9700\u8981\u6ee1\u8db3\u65e5\u76ca\u590d\u6742\u7684\u9700\u6c42\u3002\u9664\u4e86\u907f\u514d\u8bef\u5bfc\u6027\u6216\u4e0d\u9002\u5f53\u7684\u5185\u5bb9\uff0cLLMs\u8fd8\u88ab\u671f\u671b\u6839\u636e\u7279\u5b9a\u7528\u6237\u9700\u6c42\u8fdb\u884c\u8c03\u6574\uff0c\u5982\u6a21\u4eff\u7279\u5b9a\u7684\u5199\u4f5c\u98ce\u683c\u6216\u751f\u6210\u5bcc\u6709\u8bd7\u610f\u7684\u6587\u672c\u3002\u8fd9\u4e9b\u591a\u6837\u7684\u9700\u6c42\u63a8\u52a8\u4e86\u53ef\u63a7\u6587\u672c\u751f\u6210\uff08CTG\uff09\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u786e\u4fdd\u8f93\u51fa\u5185\u5bb9\u7b26\u5408\u9884\u8bbe\u7684\u63a7\u5236\u6761\u4ef6\uff0c\u5982\u5b89\u5168\u6027\u3001\u60c5\u611f\u503e\u5411\u3001\u4e3b\u9898\u4e00\u81f4\u6027\u4ee5\u53ca\u8bed\u8a00\u98ce\u683c\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u8d28\u91cf\u7684\u6709\u7528\u6027\u3001\u6d41\u7545\u6027\u548c\u591a\u6837\u6027\u3002 
\u672c\u6587\u7cfb\u7edf\u5730\u56de\u987e\u4e86CTG\u5728LLMs\u9886\u57df\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8be6\u7ec6\u5b9a\u4e49\u4e86\u5176\u6838\u5fc3\u6982\u5ff5\uff0c\u5e76\u660e\u786e\u4e86\u63a7\u5236\u6761\u4ef6\u548c\u6587\u672c\u8d28\u91cf\u7684\u8981\u6c42\u3002\u6211\u4eec\u5c06CTG\u4efb\u52a1\u5206\u4e3a\u4e24\u5927\u7c7b\uff1a\u5185\u5bb9\u63a7\u5236\u548c\u5c5e\u6027\u63a7\u5236\uff0c\u5e76\u5bf9\u6bcf\u79cd\u7c7b\u578b\u7684\u65b9\u6cd5\u8fdb\u884c\u4e86\u8ba8\u8bba\uff0c\u5305\u62ec\u6a21\u578b\u91cd\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u5f3a\u5316\u5b66\u4e60\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6f5c\u5728\u7a7a\u95f4\u64cd\u7eb5\u548c\u89e3\u7801\u65f6\u5e72\u9884\u3002\u6211\u4eec\u5206\u6790\u4e86\u6bcf\u79cd\u65b9\u6cd5\u7684\u7279\u70b9\u3001\u4f18\u52bf\u548c\u5c40\u9650\u6027\uff0c\u63d0\u4f9b\u4e86\u5b9e\u73b0\u751f\u6210\u63a7\u5236\u7684\u6df1\u5165\u89c1\u89e3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u56de\u987e\u4e86CTG\u8bc4\u4f30\u65b9\u6cd5\u3001\u603b\u7ed3\u4e86\u5176\u8de8\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u6307\u51fa\u4e86\u5f53\u524d\u7814\u7a76\u7684\u5173\u952e\u6311\u6218\uff0c\u5982\u6d41\u7545\u5ea6\u548c\u5b9e\u7528\u6027\u7684\u964d\u4f4e\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u82e5\u5e72\u547c\u5401\uff0c\u5f3a\u8c03\u672a\u6765\u7814\u7a76\u5e94\u66f4\u6ce8\u91cd\u5b9e\u9645\u5e94\u7528\u3002\u672c\u6587\u65e8\u5728\u4e3a\u8be5\u9886\u57df\u7684\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u6307\u5bfc\u3002\u6211\u4eec\u7684\u53c2\u8003\u6587\u732e\u5217\u8868\u548c\u4e2d\u6587\u7248\u672c\u5df2\u5f00\u6e90\u5728https://github.com/IAAR-Shanghai/CTGSurvey\u3002**|\n", "2408.12579": "|**2024-08-22**|**RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment**|Xiaohan Wang 
et.al.|[2408.12579](http://arxiv.org/abs/2408.12579)|null|Large language models (LLMs) such as GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitive with medical experts on a range of medical evaluation metrics. However, they still face challenges in making professional diagnoses comparable to those of physicians, particularly in efficiently gathering patient information and reasoning toward the final diagnosis. To this end, we propose a framework named RuleAlign, designed to align LLMs with specific diagnostic rules. We construct a dataset of rule-based doctor-patient dialogues and design an alignment learning approach based on preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope our work can inspire exploration of the potential of LLMs as AI physicians.|\n", "2408.12570": "|**2024-08-22**|**Jamba-1.5: Hybrid Transformer-Mamba Models at Scale**|Jamba Team
et.al.|[2408.12570](http://arxiv.org/abs/2408.12570)|null|We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture-of-experts architecture that provides high throughput and low memory usage across context lengths while retaining the same or better quality than Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilities and have an effective context length of 256K tokens, the largest among open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows Jamba-1.5-Large to fit on a machine with 8 80GB GPUs when processing 256K-token contexts, without loss of quality.
When evaluated on a range of academic and chatbot benchmarks, the Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The weights for both model sizes are publicly available under the Jamba Open Model License, and we release ExpertsInt8 as open source.|\n", "2408.12561": "|**2024-08-22**|**ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation**|Lujia Zhong et.al.|[2408.12561](http://arxiv.org/abs/2408.12561)|**[link](https://github.com/lujiazho/ssprop)**|**Deep learning has recently made remarkable progress, especially in generative modeling, such as large language models and probabilistic diffusion models. However, training these models often requires substantial computational resources, on the order of billions of petaFLOPs, which leads to large energy consumption and carbon footprints and raises serious environmental concerns. During the training of deep learning models, back-propagation (BP) is the main source of computational cost.
To improve energy efficiency and enable sparse learning on any machine and device, we propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture. Specifically, we introduce channel-wise sparsity and, based on the assumption that BP is often dense and inefficient, which can lead to overfitting and high computational cost, we propose an additional gradient selection scheduler that performs selection during the backward pass. Experimental results show that our approach can reduce computation by 40% while potentially improving model performance, as validated on image classification and generation tasks. This reduction can yield significant energy savings and a lower carbon footprint, especially during the research and development phases of large AI systems. In addition, our method mitigates overfitting in a way that differs from Dropout, so it can be combined with Dropout to further improve model performance and reduce computational resource usage. Extensive experiments show that our method applies to a variety of datasets and tasks and is compatible with a wide range of deep learning architectures and modules. The code is publicly available
at https://github.com/lujiazho/ssProp.**|\n", "2408.12547": "|**2024-08-22**|**Towards Evaluating and Building Versatile Large Language Models for Medicine**|Chaoyi Wu et.al.|[2408.12547](http://arxiv.org/abs/2408.12547)|**[link](https://github.com/magic-ai4med/meds-ins)**|**In this study, we present MedS-Bench, a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in clinical settings. Unlike existing benchmarks that focus on multiple-choice question answering, MedS-Bench covers 11 high-level clinical tasks, including clinical report summarization, treatment recommendation, diagnosis, named entity recognition, and medical concept explanation. Using few-shot prompting, we evaluated six leading LLMs, namely MEDITRON, Mistral, InternLM 2, Llama 3, GPT-4, and Claude-3.5, and found that even the most advanced models struggle with these complex tasks.
To address these limitations, we developed MedS-Ins, a large-scale instruction-tuning dataset for medicine. MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks. To demonstrate the dataset's utility, we ran an instruction-tuning experiment on a lightweight, open-source medical language model, producing a new model named MMedIns-Llama 3, which outperforms existing models on nearly all clinical tasks. To promote further progress in applying LLMs to clinical challenges, we have made the MedS-Ins dataset fully public and invite the research community to contribute to its expansion. We have also launched a dynamic leaderboard and plan to update the test set regularly to track progress and improve the adaptation of general LLMs to the medical domain. Leaderboard: https://henrychur.github.io/MedS-Bench/. Github: https://github.com/MAGIC-AI4Med/MedS-Ins.**|\n", "2408.12496": "|**2024-08-22**|**MEDCO: Medical Education Copilots Based on A Multi-Agent Framework**|Hao Wei
et.al.|[2408.12496](http://arxiv.org/abs/2408.12496)|null|Large language models (LLMs) have had a significant impact on many research fields, including medicine and healthcare, yet their potential as assistants in medical education remains underexplored. Current AI-assisted educational tools are limited by their reliance on a single learning approach and their inability to simulate the multidisciplinary and interactive nature of real medical training. To overcome these limitations, we propose MEDCO (Medical EDucation COpilots), a novel multi-agent copilot system designed specifically to emulate real-world medical training environments. MEDCO incorporates three core agents: an agentic patient, an expert doctor, and a radiologist, creating a multimodal and interactive learning environment. Our framework emphasizes teaching effective questioning skills, multidisciplinary collaboration, and peer discussion among students.
Experiments show that virtual students trained with MEDCO not only achieve substantial performance gains comparable to those of advanced models, but also exhibit human-like learning behaviors and improvement as the number of learning samples increases. This work contributes to medical education by introducing an interactive and collaborative learning approach. It also offers valuable insight into the effectiveness of AI-integrated training paradigms.|\n", "2408.12494": "|**2024-08-22**|**GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models**|Kunsheng Tang et.al.|[2408.12494](http://arxiv.org/abs/2408.12494)|**[link](https://github.com/kstanghere/gendercare-ccs24)**|**Large language models (LLMs) have shown remarkable capabilities in natural language generation, but they have also been observed to amplify societal biases, particularly those related to gender. In response, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative criteria, bias assessment, reduction techniques, and evaluation metrics for quantifying and mitigating gender bias in LLMs.
First, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques, including counterfactual data augmentation and specialized fine-tuning strategies, to reduce gender bias without compromising the overall performance of LLMs.
Extensive experiments on 17 different LLMs demonstrate a significant reduction across various gender bias benchmarks, peaking at over 90% and averaging above 35%. Importantly, these reductions come with variation on mainstream language tasks of under 2%. By offering a realistic assessment and tailored reduction of gender bias, we hope GenderCARE can represent a significant step toward fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.**|\n", "2408.12480": "|**2024-08-23**|**Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese**|Khang T. Doan et.al.|[2408.12480](http://arxiv.org/abs/2408.12480)|null|In this report, we introduce Vintern-1B, a reliable 1-billion-parameter multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for applications such as optical character recognition (OCR), document extraction, and general question answering in Vietnamese contexts. The model is fine-tuned on a dataset of more than 3 million image-question-answer pairs, achieving robust performance and reliable results on multiple Vietnamese benchmarks such as OpenViVQA and ViTextVQA.
Vintern-1B is small enough to be integrated easily into a variety of offline applications. In addition, we have open-sourced several Vietnamese visual question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: https://huggingface.co/5CD-AI/Vintern-1B-v2.|\n", "2408.12475": "|**2024-08-22**|**Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition**|Bozheng Li et.al.|[2408.12475](http://arxiv.org/abs/2408.12475)|null|In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework to integrate both spatial information and sequential temporal dynamics into the feature embeddings. Unlike existing fine-tuning approaches that capture temporal information by exploring the relationships among all frames, our perceiver-based adapter recurrently captures sequential dynamics along the timeline and can perceive changes in order. To obtain discriminative representations for each class, we extend a textual corpus for each class derived from large language models (LLMs) and enrich the visual prototypes by integrating contextual semantic information.
Furthermore, we introduce an unbalanced optimal transport strategy for feature matching that mitigates the impact of class-unrelated features, thereby facilitating more effective decision-making. Experimental results on five FSAR datasets show that our method sets a new benchmark, beating the second-best competitors by a large margin.|\n", "2408.12470": "|**2024-08-22**|**DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems**|Jiaju Chen et.al.|[2408.12470](http://arxiv.org/abs/2408.12470)|null|Integrating large language models (LLMs) into recommender systems has brought significant performance gains, but often at the cost of reduced recommendation diversity, which can harm user satisfaction. To address this issue, controllable recommendation has emerged, allowing users to specify their preferences and receive recommendations that meet their diverse needs. Despite its potential, existing controllable recommender systems typically rely on simple mechanisms, such as a single prompt, to regulate diversity, an approach that fails to capture the full complexity of user preferences. In response to these limitations, we propose DLCRec, a novel framework designed to enable fine-grained control over diversity in LLM-based recommendations.
Unlike traditional methods, DLCRec adopts a fine-grained task decomposition strategy that breaks the recommendation process into three sequential sub-tasks: genre prediction, genre filling, and item prediction. These sub-tasks are trained independently and inferred sequentially according to user-defined control numbers, ensuring more precise control over diversity. Furthermore, the scarcity and uneven distribution of diversity-related user behavior data pose significant challenges for fine-tuning. To overcome these obstacles, we introduce two data augmentation techniques that improve the model's robustness to noisy and out-of-distribution data. These techniques expose the model to a broader range of patterns, improving its adaptability in generating recommendations with varying levels of diversity. Our extensive experiments show that DLCRec not only provides precise control over diversity but also outperforms state-of-the-art baselines across multiple recommendation scenarios.|\n", "2408.13257": "|**2024-08-23**|**MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?**|Yi-Fan Zhang
et.al.|[2408.13257](http://arxiv.org/abs/2408.13257)|null|Comprehensive evaluation of multimodal large language models (MLLMs) has recently attracted wide attention in the research community. However, we observe that existing benchmarks share several common barriers that make it difficult to measure the real-world challenges models face, including: 1) small data scale, which leads to large performance variance; 2) reliance on model-generated annotations, which limits data quality; and 3) insufficient task difficulty, especially due to limited image resolution. To address these issues, we introduce MME-RealWorld. Specifically, we collected more than 300K images from public datasets and the Internet and filtered 13,366 high-quality images for annotation. This effort involved 25 professional annotators and 7 experts in MLLMs, who contributed 29,429 question-answer pairs covering 43 subtasks across 5 real-world scenarios, which are extremely challenging even for humans. To our knowledge, MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications.
We further conduct a thorough evaluation of 28 leading MLLMs, such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Our results show that even the most advanced models struggle with our benchmark, with none of them reaching 60% accuracy. Perceiving high-resolution images and understanding complex real-world scenarios remain urgent open problems. The data and evaluation code are released at https://mme-realworld.github.io/ .|\n", "2408.13253": "|**2024-08-23**|**Domain-specific long text classification from sparse relevant information**|C\u00e9lia D'Cruz et.al.|[2408.13253](http://arxiv.org/abs/2408.13253)|null|Large language models have undoubtedly revolutionized natural language processing, and the current trend is to promote one-model-for-all tasks (sentiment analysis, translation, etc.). However, when dealing with sparse information or weak signals, the statistical mechanisms of these models struggle to exploit the key information effectively. For example, in the classification of long domain-specific documents, relevance often depends on one or a few key terms. In the medical domain, it is essential to determine whether a given report contains critical information about a patient's condition. This critical information is often based on one or two specific, isolated terms.
In this paper, we propose a hierarchical model that uses a list of potential target terms to retrieve candidate sentences and represents each sentence as the contextualized embedding of the target term(s) it contains. Pooling the embeddings of the target term(s) yields the document representation used for classification. We evaluate our model on a public benchmark of medical documents in English and on a private medical dataset in French. The results show that our narrower hierarchical model outperforms large language models at retrieving relevant long documents in a domain-specific context.|\n", "2408.13233": "|**2024-08-23**|**Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time**|Yingyu Liang
et.al.|[2408.13233](http://arxiv.org/abs/2408.13233)|null|This paper presents a novel fast method for computing gradients in multi-layer transformer models. The method computes the gradients of the entire multi-layer transformer model in almost linear time, $n^{1+o(1)}$, where $n$ is the input sequence length. This result greatly reduces the computational bottleneck associated with the traditional quadratic time complexity. Our theory holds for any loss function and maintains a bounded approximation error across the entire model. Furthermore, our analysis covers multi-layer transformer models that contain many practical sub-modules, such as residual connections, causal masking, and multi-head attention. By improving the efficiency of gradient computation in large language models, we hope our theoretical results will make the training and deployment of long-context language models more effective.|\n", "2408.13214": "|**2024-08-23**|**EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods**|Hongcheng Ding
et.al.|[2408.13214](http://arxiv.org/abs/2408.13214)|null|Accurate forecasting of the EUR/USD exchange rate is crucial for investors, businesses, and policymakers. This paper proposes a novel framework, IUS, that combines unstructured textual data from news and analysis with structured data on exchange rates and financial indicators to improve exchange rate prediction. The IUS framework uses large language models for sentiment polarity scoring of texts and classification of exchange rate movements. These textual features are combined with quantitative features and fed into a Causality-Driven Feature Generator. An Optuna-optimized Bi-LSTM model is then used to forecast the EUR/USD exchange rate. Experiments show that the proposed model outperforms the benchmark models, reducing mean absolute error (MAE) by 10.69% and root mean squared error (RMSE) by 9.56%. The results also show that fusing unstructured and structured data yields higher accuracy than using structured data alone. Furthermore, feature selection that combines the top 12 most important quantitative features with the textual features proves most effective. The proposed IUS framework and Optuna-Bi-LSTM model provide a powerful new approach to exchange rate forecasting through multi-source data integration.|\n", "2408.13204": "|**2024-08-23**|**DOMAINEVAL: An Auto-Constructed
Benchmark for Multi-Domain Code Generation**|Qiming Zhu et.al.|[2408.13204](http://arxiv.org/abs/2408.13204)|null|Code benchmarks such as HumanEval are widely used to evaluate the capabilities of large language models (LLMs), providing insight into their strengths and weaknesses. However, current benchmarks focus primarily on general coding tasks (e.g., bubble sort, greatest common divisor), leaving domain-specific coding tasks (e.g., computation, system, cryptography) less explored. To fill this gap, we propose DOMAINEVAL, a multi-domain code benchmark designed to evaluate the coding capabilities of LLMs comprehensively. Our pipeline works in a fully automated manner, enabling push-button construction of formatted subjects under study from code repositories. Evaluating 12 representative LLMs on DOMAINEVAL yields several interesting findings.
\u6211\u4eec\u6ce8\u610f\u5230\uff0cLLMs\u5728\u8ba1\u7b97\u4efb\u52a1\u4e0a\u8868\u73b0\u826f\u597d\uff0c\u4f46\u5728\u52a0\u5bc6\u548c\u7cfb\u7edf\u7f16\u7801\u4efb\u52a1\u4e0a\u5374\u6709\u6240\u6b20\u7f3a\u3002\u67d0\u4e9bLLM\u5728\u8fd9\u4e9b\u9886\u57df\u7684\u6027\u80fd\u5dee\u8ddd\u53ef\u80fd\u9ad8\u8fbe68.94%\uff0880.94%-12.0%\uff09\u3002\u6211\u4eec\u4e5f\u53d1\u73b0\u751f\u6210\u66f4\u591a\u6837\u672c\u53ef\u4ee5\u63d0\u9ad8LLMs\u7684\u6574\u4f53\u6027\u80fd\uff0c\u4f46\u9886\u57df\u504f\u89c1\u751a\u81f3\u53ef\u80fd\u589e\u52a0\u3002\u672c\u7814\u7a76\u7684\u8d21\u732e\u5305\u62ec\u4e00\u4e2a\u4ee3\u7801\u751f\u6210\u57fa\u51c6\u6570\u636e\u96c6DOMAINEVAL\uff0c\u6db5\u76d6\u516d\u4e2a\u6d41\u884c\u9886\u57df\uff0c\u4ee5\u53ca\u4e00\u4e2a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7ba1\u9053\u7528\u4e8e\u6784\u5efa\u4ee3\u7801\u57fa\u51c6\uff0c\u5e76\u57fa\u4e8e\u5728DOMAINEVAL\u4e0a\u7684\u6027\u80fd\u8bc6\u522b\u4e86LLMs\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u7684\u5c40\u9650\u6027\uff0c\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u6539\u8fdb\u7684\u65b9\u5411\u3002\u9886\u5bfc\u8005\u677f\u53ef\u5728https://domaineval.github.io/\u67e5\u770b\u3002|\n", "2408.13184": "|**2024-08-23**|**Can LLM be a Good Path Planner based on Prompt Engineering? 
Mitigating the Hallucination for Path Planning**|Hourui Deng et.al.|[2408.13184](http://arxiv.org/abs/2408.13184)|null|In the field of large language models (LLMs), spatial reasoning is the foundation of perceptual intelligence. However, even in simple maze environments, LLMs still struggle with long-horizon path planning, mainly because of spatial hallucination and the context-inconsistency hallucination caused by long-term reasoning. To address this challenge, this study proposes a novel model, Spatial-to-Relational Transformation and Curriculum Q-Learning (S2RCQL). To tackle the spatial hallucination of LLMs, we propose the Spatial-to-Relational approach, which transforms spatial prompts into entity relations and into paths that represent chains of entity relations, fully exploiting the potential of LLMs for sequential thinking. On this basis, we design a Q-learning-based path planning algorithm that mitigates context-inconsistency hallucination and strengthens the reasoning ability of LLMs. By using the Q-values of state-action pairs as auxiliary information in the prompt, we correct the hallucinations of the LLM and guide it to learn the optimal path. Finally, we propose an LLM-based reverse curriculum learning technique that further mitigates context-inconsistency hallucination.
By reducing task difficulty, this technique helps the LLM quickly accumulate successful experience and use it to tackle more complex tasks. We conducted comprehensive experiments on ERNIE-Bot 4.0, an LLM developed in-house by Baidu. The results show that S2RCQL improves success rate and optimality rate by 23% to 40%, a significant gain over advanced prompt engineering methods.|\n", "2408.13073": "|**2024-08-23**|**IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models**|Zhihao Yu et.al.|[2408.13073](http://arxiv.org/abs/2408.13073)|**[link](https://github.com/yzhHoward/IntelliCare)**|Although deep learning methods for electronic health record (EHR) data have made great progress, they often fail to capture the full semantics of diverse medical codes when data are limited. Integrating knowledge from large language models (LLMs) offers a promising way to improve healthcare predictions. However, LLM analyses can fluctuate considerably because of ambiguity and inconsistency, which hinders their effective use. To address these challenges, we propose IntelliCare, a novel framework that improves healthcare predictions by using LLMs to provide high-quality patient-level external knowledge and to enhance existing EHR models. 
Specifically, IntelliCare identifies patient cohorts and uses task-relevant statistical information to strengthen the understanding and generation abilities of the LLM, effectively resolving the ambiguity problem. In addition, it refines the knowledge obtained from the LLM through a hybrid approach that generates multiple analyses and calibrates them using both the EHR model and perplexity measures. Experimental evaluation on three clinical prediction tasks across two large-scale EHR datasets shows that IntelliCare significantly improves the performance of existing methods, highlighting its potential for advancing personalized healthcare prediction and decision support systems.|\n", "2408.13071": "|**2024-08-23**|**Guiding IoT-Based Healthcare Alert Systems with Large Language Models**|Yulan Gao 
et.al.|[2408.13071](http://arxiv.org/abs/2408.13071)|null|\u5728\u533b\u7597\u5065\u5eb7\u8b66\u62a5\u7cfb\u7edf\uff08HAS\uff09\u9886\u57df\uff0c\u968f\u7740\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u3001\u7269\u8054\u7f51\uff08IoT\uff09\u6280\u672f\u7684\u5feb\u901f\u53d1\u5c55\u4ee5\u53ca\u516c\u4f17\u5065\u5eb7\u610f\u8bc6\u7684\u63d0\u9ad8\uff0cHAS\u6b63\u7ecf\u5386\u7740\u5feb\u901f\u7684\u53d8\u9769\u3002\u5c3d\u7ba1\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u5b58\u5728\u4e00\u4e2a\u6838\u5fc3\u6311\u6218\uff1a\u5982\u4f55\u5728\u8d44\u6e90\u6709\u9650\u7684\u73af\u5883\u4e2d\uff0c\u5728\u4e2a\u6027\u5316\u5065\u5eb7\u8b66\u62a5\u7684\u51c6\u786e\u6027\u4e0e\u4e25\u683c\u9690\u79c1\u4fdd\u62a4\u4e4b\u95f4\u627e\u5230\u5e73\u8861\u70b9\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\u2014\u2014LLM-HAS\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\u533b\u7597\u5065\u5eb7\u8b66\u62a5\u7cfb\u7edf\uff09\u3002\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u878d\u5165\u5230HAS\u4e2d\uff0c\u4ee5\u663e\u8457\u63d0\u5347\u8b66\u62a5\u7684\u51c6\u786e\u6027\u3001\u786e\u4fdd\u7528\u6237\u9690\u79c1\uff0c\u5e76\u589e\u5f3a\u4e2a\u6027\u5316\u533b\u7597\u670d\u52a1\uff0c\u540c\u65f6\u6539\u5584\u7528\u6237\u4f53\u9a8c\u7684\u8d28\u91cf\uff08QoE\uff09\u3002\u6211\u4eec\u7684\u521b\u65b0\u6846\u67b6\u91c7\u7528\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u65b9\u6cd5\uff0c\u7ed3\u5408LLM\uff0c\u901a\u8fc7\u5206\u6790\u7528\u6237\u7684\u4e2a\u6027\u5316\u504f\u597d\u548c\u6f5c\u5728\u5065\u5eb7\u98ce\u9669\u6765\u5904\u7406\u989d\u5916\u7684\u6587\u672c\u5de5\u4f5c\u63cf\u8ff0\u3002\u8fd9\u79cd\u5206\u6790\u6307\u5bfc\u4e86\u4e13\u95e8\u7684\u6df1\u5ea6\u5f3a\u5316\u5b66\u4e60\uff08DDPG\uff09\u4e13\u5bb6\u7684\u9009\u62e9\uff0c\u4ed6\u4eec\u8d1f\u8d23\u63d0\u4f9b\u7cbe\u786e\u7684\u5065\u5eb7\u8b66\u62a5\u3002\u6b64\u5916\uff0cLLM-HAS\u80fd\u591f\u5904\u7406\u5bf9\
u8bdd\u5f0f\u7528\u6237\u53cd\u9988\uff0c\u4e0d\u4ec5\u5141\u8bb8\u5bf9DDPG\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd8\u80fd\u52a0\u6df1\u7528\u6237\u53c2\u4e0e\u5ea6\uff0c\u4ece\u800c\u63d0\u9ad8\u5065\u5eb7\u7ba1\u7406\u7b56\u7565\u7684\u51c6\u786e\u6027\u548c\u4e2a\u6027\u5316\u7a0b\u5ea6\u3002 \u6a21\u62df\u7ed3\u679c\u9a8c\u8bc1\u4e86LLM-HAS\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u8868\u660e\u5176\u4f5c\u4e3a\u5229\u7528\u751f\u6210\u578b\u4eba\u5de5\u667a\u80fd\uff08GAI\uff09\u63d0\u4f9b\u9ad8\u5ea6\u51c6\u786e\u53ef\u9760\u8b66\u62a5\u7684\u7a81\u7834\u6027\u65b9\u6cd5\u7684\u6f5c\u529b\u3002|\n", "2408.13031": "|**2024-08-23**|**VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models**|Wentao Wu et.al.|[2408.13031](http://arxiv.org/abs/2408.13031)|**[link](https://github.com/event-ahu/vfm-det)**|**\u73b0\u6709\u8f66\u8f86\u68c0\u6d4b\u5668\u901a\u5e38\u901a\u8fc7\u5728\u57fa\u4e8e\u9884\u8bad\u7ec3\u4e3b\u5e72\uff08\u5982ResNet\u3001ViT\uff09\u7684\u9884\u8bad\u7ec3\u5178\u578b\u68c0\u6d4b\u5668\uff08\u4f8b\u5982YOLO\u3001RCNN\u3001DETR\u7cfb\u5217\uff09\u4e0a\u8fdb\u884c\u8f66\u8f86\u56fe\u50cf\u8bad\u7ec3\u83b7\u5f97\u3002\u4e00\u4e9b\u7814\u7a76\u8005\u8fd8\u5229\u7528\u5e76\u589e\u5f3a\u5927\u578b\u57fa\u7840\u6a21\u578b\u6765\u63d0\u5347\u68c0\u6d4b\u6027\u80fd\u3002\u7136\u800c\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u4e9b\u68c0\u6d4b\u5668\u53ef\u80fd\u4ec5\u83b7\u5f97\u6b21\u4f18\u7ed3\u679c\uff0c\u56e0\u4e3a\u5b83\u4eec\u4f7f\u7528\u7684\u5927\u578b\u6a21\u578b\u5e76\u975e\u4e13\u95e8\u4e3a\u8f66\u8f86\u8bbe\u8ba1\u3002\u6b64\u5916\uff0c\u4ed6\u4eec\u7684\u7ed3\u679c\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7279\u5f81\uff0c\u5e76\u4e14\u5f88\u5c11\u8003\u8651\u8f66\u8f86\u8bed\u4e49\u4fe1\u606f\u4e0e\u89c6\u89c9\u8868\u793a\u4e4b\u95f4\u7684\u5bf9\u9f50\u3002 
\u5728\u6b64\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u9884\u8bad\u7ec3\u7684\u8f66\u8f86\u6a21\u578b\uff08VehicleMAE\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08T5\uff09\u7684\u65b0\u8f66\u8f86\u68c0\u6d4b\u8303\u5f0f\uff0c\u79f0\u4e3aVFM-Det\u3002\u5b83\u9075\u5faa\u533a\u57df\u5efa\u8bae\u6846\u68c0\u6d4b\u6846\u67b6\uff0c\u6bcf\u4e2a\u63d0\u8bae\u7684\u7279\u5f81\u53ef\u4ee5\u901a\u8fc7VehicleMAE\u589e\u5f3a\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684VAtt2Vec\u6a21\u5757\uff0c\u7528\u4e8e\u9884\u6d4b\u8fd9\u4e9b\u63d0\u8bae\u7684\u8f66\u8f86\u8bed\u4e49\u5c5e\u6027\u5e76\u5c06\u5b83\u4eec\u8f6c\u6362\u4e3a\u7279\u5f81\u5411\u91cf\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u589e\u5f3a\u89c6\u89c9\u7279\u5f81\u3002\u5bf9\u4e09\u4e2a\u8f66\u8f86\u68c0\u6d4b\u57fa\u51c6\u6570\u636e\u96c6\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u5145\u5206\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u8f66\u8f86\u68c0\u6d4b\u5668\u7684\u6709\u6548\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5206\u522b\u5728Cityscapes\u6570\u636e\u96c6\u4e0a\u7684$AP_{0.5}$\u3001$AP_{0.75}$\u6307\u6807\u4e0a\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u65b9\u6cd5\u63d0\u9ad8\u4e86$+5.1\\%$\u3001$+6.2\\%$\u3002\u6b64\u5de5\u4f5c\u7684\u6e90\u4ee3\u7801\u5c06\u5728https://github.com/Event-AHU/VFM-Det\u53d1\u5e03\u3002**|\n", "2408.13028": "|**2024-08-23**|**In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting**|Haowei Du et.al.|[2408.13028](http://arxiv.org/abs/2408.13028)|null|\u5728\u5f53\u524d\u7684\u5b66\u672f\u754c\uff0c\u5bf9\u57fa\u4e8e\u6307\u4ee4\u589e\u5f3a\u7684\u5c11\u91cf\u5b9e\u4f8b\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLM\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08In-context Learning, 
ICL\uff09\u5f15\u8d77\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\u3002\u73b0\u6709\u7684\u7528\u4e8eICL\u7684\u793a\u4f8b\u9009\u62e9\u65b9\u6cd5\u5229\u7528\u7a00\u758f\u6216\u5bc6\u96c6\u68c0\u7d22\u5668\uff0c\u5e76\u4e14\u80fd\u591f\u4ea7\u751f\u6709\u6548\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528LLM\u5bf9\u53cd\u9988\u4fe1\u606f\u7684\u5229\u7528\u6765\u8bad\u7ec3\u68c0\u7d22\u5668\uff0c\u6240\u9009\u7684\u793a\u4f8b\u53ef\u80fd\u65e0\u6cd5\u663e\u8457\u63d0\u5347LLM\u7684\u7c7b\u6bd4\u80fd\u529b\u3002 \u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u57fa\u4e8e\u5f3a\u5316\u5b66\u4e60\u7684\u7b56\u7565\u6846\u67b6\uff08Policy-based Reinforcement Learning Framework, RLS\uff09\u7528\u4e8e\u793a\u4f8b\u9009\u62e9\u3002\u8be5\u6846\u67b6\u7531\u8bed\u8a00\u6a21\u578b\uff08Language Model, LM\uff09\u9009\u62e9\u5668\u548cLLM\u751f\u6210\u5668\u7ec4\u6210\u3002\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u5668\u5c06\u5019\u9009\u793a\u4f8b\u7f16\u7801\u4e3a\u5bc6\u96c6\u8868\u793a\uff0c\u5e76\u4ece\u4e2d\u9009\u62e9top-k\u4e2a\u793a\u4f8b\u4f5c\u4e3aLLM\u7684\u793a\u8303\u3002\u901a\u8fc7\u91c7\u7528LLM\u7684\u8f93\u51fa\u6765\u8ba1\u7b97\u5956\u52b1\u548c\u7b56\u7565\u68af\u5ea6\uff0c\u4f18\u5316\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u5668\u3002 \u6211\u4eec\u5728\u4e0d\u540c\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u793a\u4f8b\u9009\u62e9\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5c11\u91cf\u6837\u672c\u8bbe\u7f6e\u4e0b\u76f8\u8f83\u4e8e\u76d1\u7763\u5fae\u8c03\uff08Supervised Fine-tuning, 
SFT\uff09\u6a21\u578b\u663e\u793a\u51fa\u4f18\u52bf\u3002\u8fdb\u4e00\u6b65\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u793a\u4f8b\u7684\u6570\u91cf\u4e30\u5bcc\u6027\u548c\u4e0e\u6d4b\u8bd5\u6848\u4f8b\u7684\u76f8\u4f3c\u6027\u5bf9\u4e8eICL\u4e2d\u7684LLM\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002|\n", "2408.14470": "|**2024-08-27**|**Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models**|Aradhye Agarwal et.al.|[2408.14470](http://arxiv.org/abs/2408.14470)|**[link](https://github.com/Aradhye2002/selective-peft-toolkit)**|**\u7ec6\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e0b\u6e38\u4efb\u52a1\u4e0a\u9700\u8981\u5927\u91cf\u8ba1\u7b97\u8d44\u6e90\u3002\u53c2\u6570\u9ad8\u6548\u7ec6\u8c03\uff08PEFT\uff09\u7c7b\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u4ec5\u5fae\u8c03\u6a21\u578b\u53c2\u6570\u7684\u5c0f\u90e8\u5206\u6765\u7f13\u89e3\u8fd9\u4e9b\u8ba1\u7b97\u6311\u6218\u3002\u867d\u7136\u4ece\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u8003\u8651\uff0c\u8fd9\u4e9b\u6280\u672f\u901a\u5e38\u65e0\u6cd5\u4e0e\u5b8c\u5168\u5fae\u8c03\u7684\u6a21\u578b\u6027\u80fd\u76f8\u5339\u654c\uff0c\u4e3b\u8981\u539f\u56e0\u662f\u53c2\u6570\u9009\u62e9\u8fc7\u7a0b\u4e2d\u56fa\u6709\u7684\u504f\u89c1\u3002\u4f20\u7edf\u7684\u9009\u62e9\u6027PEFT\u6280\u672f\u57fa\u4e8e\u9884\u5148\u5b9a\u4e49\u7684\u9884\u7b97\uff08\u4e5f\u79f0\u4e3a\u53bb\u906e\u7f69\uff09\u4f7f\u7528\u56fa\u5b9a\u53c2\u6570\u96c6\uff0c\u672a\u80fd\u52a8\u6001\u6355\u6349\u53c2\u6570\u7684\u91cd\u8981\u6027\uff0c\u5e76\u7ecf\u5e38\u8d85\u51fa\u9884\u7b97\u3002\u6211\u4eec\u5f15\u5165\u4e86$\\text{ID}^3$\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u9009\u62e9\u6027PEFT\u65b9\u6cd5\uff0c\u5b83\u8fde\u7eed\u8ba1\u7b97\u53c2\u6570\u7684\u91cd\u8981\u6027\uff0c\u5e76\u901a\u8fc7\u5e73\u8861\u53c2\u6570\u9009\u62e9\u8fc7\u7a0b\u4e2d\u7684\u63a2\u7d22\u4e0e\u5229\u7528\u6765\u52a8\u6001\u5730\u53bb\u906e\u7f69\u53c2\u6570\u3002\u6211\u4eec\u572815\u4e2a\u4efb\u52a1\u4e0a\u8fd
b\u884c\u7684\u5b9e\u9a8c\u8986\u76d6\u4e86\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4e0e\u751f\u6210\u4efb\u52a1\uff0c\u663e\u793a\u4e86\u4e0e\u57fa\u4e8e\u56fa\u5b9a\u53bb\u906e\u7f69\u7684PEFT\u6280\u672f\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u901a\u8fc7\u7406\u8bba\u5206\u6790\u8bc1\u660e\uff0c$\\text{ID}^3$\u5c06\u68af\u5ea6\u66f4\u65b0\u7684\u6570\u91cf\u51cf\u5c11\u4e86\u4e00\u500d\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u8ba1\u7b97\u6548\u7387\u3002$\\text{ID}^3$\u5bf9\u795e\u7ecf\u5143\u7684\u968f\u673a\u521d\u59cb\u5316\u5177\u6709\u9c81\u68d2\u6027\uff0c\u56e0\u6b64\u53ef\u4ee5\u65e0\u7f1d\u96c6\u6210\u5230\u73b0\u6709\u6dfb\u52a0\u5f0f\u548c\u91cd\u65b0\u53c2\u6570\u5316\u57faPEFT\u6a21\u5757\uff0c\u5982\u9002\u914d\u5668\u548cLoRA\u4e2d\uff0c\u7528\u4e8e\u52a8\u6001\u7a00\u758f\u5316\u3002**|\n", "2408.14469": "|**2024-08-26**|**Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos**|Qirui Chen et.al.|[2408.14469](http://arxiv.org/abs/2408.14469)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u957f\u5f62\u5f0f\u7b2c\u4e00\u4eba\u79f0\u89c6\u89d2\u89c6\u9891\u4e2d\u7684\u591a\u8df3\u89c6\u9891\u95ee\u7b54\uff08Multi-Hop Video Question 
Answering\uff0cMH-VidQA\uff09\u95ee\u9898\u3002\u8fd9\u9879\u4efb\u52a1\u4e0d\u4ec5\u9700\u8981\u56de\u7b54\u89c6\u89c9\u95ee\u9898\uff0c\u8fd8\u9700\u8981\u5728\u89c6\u9891\u4e2d\u5b9a\u4f4d\u591a\u4e2a\u76f8\u5173\u7684\u65f6\u95f4\u6bb5\u4f5c\u4e3a\u89c6\u89c9\u8bc1\u636e\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u6d41\u7a0b\u6765\u521b\u5efa\u5e26\u6709\u5173\u8054\u65f6\u95f4\u8bc1\u636e\u7684\u591a\u8df3\u95ee\u9898\u89e3\u7b54\u914d\u5bf9\uff0c\u4ece\u800c\u6784\u5efa\u4e86\u4e00\u4e2a\u7528\u4e8e\u6307\u4ee4\u8c03\u6574\u7684\u5927\u89c4\u6a21\u6570\u636e\u96c6\u3002\u4e3a\u4e86\u76d1\u6d4b\u8fd9\u4e00\u65b0\u4efb\u52a1\u7684\u8fdb\u5c55\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u6574\u7406\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u57fa\u51c6\u2014\u2014MultiHop-EgoQA\uff0c\u901a\u8fc7\u4ed4\u7ec6\u7684\u624b\u52a8\u9a8c\u8bc1\u548c\u7ec6\u5316\u8fdb\u884c\u6784\u5efa\u3002 \u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86\u73b0\u6709\u8de8\u6a21\u6001\u7cfb\u7edf\u5728\u591a\u8df3\u5b9a\u4f4d\u548c\u63a8\u7406\u80fd\u529b\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cGrounding Scattered Evidence with Large Language Model\u201d\uff08GeLM\uff09\u7684\u65b0\u67b6\u6784\uff0c\u8be5\u67b6\u6784\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u5730\u7406\u89e3\u7801\u6a21\u5757\u589e\u5f3a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language 
Models\uff0cLLMs\uff09\uff0c\u8be5\u6a21\u5757\u4f7f\u7528\u7075\u6d3b\u7684\u5730\u7406\u89e3\u7801\u4ee4\u724c\u4ece\u89c6\u9891\u4e2d\u68c0\u7d22\u65f6\u95f4\u8bc1\u636e\u3002\u5728\u6211\u4eec\u7684\u89c6\u89c9\u6307\u4ee4\u6570\u636e\u4e0a\u8fdb\u884c\u8bad\u7ec3\u540e\uff0cGeLM\u5c55\u793a\u4e86\u589e\u5f3a\u7684\u591a\u8df3\u5b9a\u4f4d\u548c\u63a8\u7406\u80fd\u529b\uff0c\u4e3a\u8fd9\u4e00\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u8bbe\u5b9a\u4e86\u65b0\u7684\u57fa\u51c6\u3002\u6b64\u5916\uff0c\u5f53\u5728\u7b2c\u4e09\u4eba\u79f0\u89c6\u89d2\u89c6\u9891\u4e0a\u8fdb\u884c\u8bad\u7ec3\u65f6\uff0c\u76f8\u540c\u7684\u67b6\u6784\u5728\u5355\u8df3\u89c6\u9891\u95ee\u7b54\u57fa\u51c6\uff08ActivityNet-RTL\uff09\u4e0a\u4e5f\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8bc1\u660e\u4e86\u5176\u6709\u6548\u6027\u3002|\n", "2408.14467": "|**2024-08-26**|**Explicit Inductive Inference using Large Language Models**|Tianyang Liu et.al.|[2408.14467](http://arxiv.org/abs/2408.14467)|null|\u5728\u672c\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ba1\u9053\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fd9\u4e00\u504f\u5dee\u8fdb\u884c\u660e\u786e\u7684\u5f52\u7eb3\u63a8\u7406\u3002\u8be5\u7ba1\u9053\u4f7f\u7528LLM\u5c06\u524d\u63d0\u8f6c\u6362\u4e3a\u4e00\u7ec4\u5df2\u9a8c\u8bc1\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u5e76\u901a\u8fc7\u805a\u5408\u884d\u751f\u7684\u65b0\u8574\u542b\u8be2\u95ee\u7684\u7b54\u6848\u6765\u652f\u6301\u539f\u59cb\u63a8\u7406\u9884\u6d4b\u3002\u5728\u65b9\u5411\u6027\u8c13\u8bcd\u8574\u542b\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u901a\u8fc7\u5e94\u7528\u6b64\u7b80\u5355\u7ba1\u9053\uff0c\u53ef\u4ee5\u63d0\u9ad8LLM\u5728\u63a8\u7406\u4e0a\u7684\u6574\u4f53\u6027\u80fd\uff0c\u5e76\u663e\u8457\u51cf\u8f7b\u5b83\u4eec\u7684\u8bc1\u5b9e\u504f\u5dee\u5f71\u54cd\u3002|\n", "2408.14438": "|**2024-08-26**|**Evaluating Large Language Models on Spatial 
Tasks: A Multi-Task Benchmarking Study**|Liuchang Xu Shuo Zhao et.al.|[2408.14438](http://arxiv.org/abs/2408.14438)|null|With the advent of large language models such as ChatGPT and Gemini, evaluating their capabilities in areas such as natural language understanding and code generation has become increasingly important. However, their performance on spatial tasks has not been comprehensively assessed. This study fills that gap by introducing a novel multi-task spatial evaluation dataset to systematically explore and compare the performance of several advanced models on spatial tasks. The dataset covers twelve distinct task types, including spatial understanding and path planning, and each task comes with verified, accurate answers. 
We evaluated multiple models, including OpenAI's gpt-3.5-turbo and gpt-4o and ZhipuAI's glm-4, using a two-phase testing approach. We first ran zero-shot tests, then categorized the dataset by difficulty and ran prompt-tuning tests. The results show that gpt-4o achieved the highest overall accuracy in the first phase, averaging 71.3%. Although moonshot-v1-8k slightly underperformed overall, it surpassed gpt-4o on place name recognition tasks. The study also reveals how prompt strategies affect model performance on specific tasks. For example, the chain-of-thought (COT) strategy raised the accuracy of gpt-4o on path planning from 12.4% to 87.5%, while a one-shot strategy raised the accuracy of moonshot-v1-8k on mapping tasks from 10.1% to 76.3%.|\n", "2408.14419": "|**2024-08-26**|**CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models**|Shubham Bharti 
et.al.|[2408.14419](http://arxiv.org/abs/2408.14419)|null|We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualization charts. Given a chart, a language model must not only comprehend the chart correctly (the FACT question) but also judge whether the chart would mislead a human reader (the MIND question). Both questions have significant societal value. We detail the construction of the CHARTOM benchmark, including its calibration against human performance.|\n", "2408.14418": "|**2024-08-26**|**MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues**|Kuluhan Binici 
et.al.|[2408.14418](http://arxiv.org/abs/2408.14418)|null|Automatic speech recognition (ASR) systems are essential for converting speech to text, yet the errors they introduce can severely degrade downstream tasks such as summarization. The problem is especially pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, so ASR models must be used as black-box solutions. Conventional data augmentation methods are also unsuitable for making summarization models more robust to noise, because sufficient medical dialogue audio recordings and their corresponding ASR transcripts are not available. To address this challenge, we propose MEDSAGE, an approach that uses large language models (LLMs) to generate synthetic samples for data augmentation. Specifically, we leverage the in-context learning capabilities of LLMs and instruct them to generate ASR-like errors based on a small number of available medical dialogue examples with audio recordings. Experimental results show that LLMs can model ASR noise effectively, and that incorporating this noisy data into training significantly improves the robustness and accuracy of medical dialogue summarization systems. 
The approach addresses noisy ASR output in critical applications and offers a robust solution for improving the reliability of clinical dialogue summarization.|\n", "2408.14398": "|**2024-08-26**|**Language-specific Calibration for Pruning Multilingual Language Models**|Simon Kurz et.al.|[2408.14398](http://arxiv.org/abs/2408.14398)|null|Recent advances in pruning large language models (LLMs) have achieved remarkable compression without retraining while maintaining high predictive performance. However, this line of research has focused mainly on calibrating pruning with English text, overlooking the multilingual nature of modern LLMs and their widespread use in non-English languages. This paper explores effective strategies for pruning multilingual models. 
We present the first comprehensive empirical study comparing how different calibration languages affect pruning across multilingual tasks, models, and state-of-the-art pruning techniques. Our results yield practical guidance: for example, calibrating on the target language effectively lowers perplexity but does not necessarily improve downstream task performance. Further analysis shows that calibration on the target language mainly helps preserve language-specific features related to fluency and coherence, but may fail to capture language-agnostic features such as language understanding and reasoning. Finally, we offer practical recommendations for future practitioners.|\n", "2408.14387": "|**2024-08-26**|**Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning**|Sakhinana Sagar Srinivas 
et.al.|[2408.14387](http://arxiv.org/abs/2408.14387)|null|\u7a7a\u95f4\u65f6\u95f4\u9884\u6d4b\u5728\u4ea4\u901a\u7cfb\u7edf\u3001\u7269\u6d41\u548c\u4f9b\u5e94\u94fe\u7ba1\u7406\u7b49\u591a\u4e2a\u9886\u57df\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u53d7\u9650\u4e8e\u5904\u7406\u5927\u89c4\u6a21\u590d\u6742\u6570\u636e\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u5f00\u6e90\u5927\u578b\u548c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs \u548c LMs\uff09\u4e0e\u4f20\u7edf\u9884\u6d4b\u65b9\u6cd5\u7684\u6df7\u5408\u7b56\u7565\u3002\u901a\u8fc7\u5f15\u5165\u52a8\u6001\u63d0\u793a\u548c\u5206\u7ec4\u67e5\u8be2\u3001\u591a\u5934\u6ce8\u610f\u529b\u673a\u5236\uff0c\u8be5\u7b56\u7565\u80fd\u591f\u66f4\u6709\u6548\u5730\u6355\u6349\u6f14\u53d8\u975e\u7ebf\u6027\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u4e2d\u7684\u5185\u90e8\u7cfb\u5217\u548c\u8de8\u7cfb\u5217\u4f9d\u8d56\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4f4e\u79e9\u9002\u914d\u4e0e\u6fc0\u6d3b\u8bb0\u5fc6\u51cf\u5c11\u6280\u672f\uff08LoRA-AMR\uff09\uff0c\u5728\u6d88\u8d39\u7ea7\u786c\u4ef6\u4e0a\u5bf9\u5f00\u6e90\u5c0f\u578b LM 
\u8fdb\u884c\u5b9a\u5236\u5316\u5fae\u8c03\uff0c\u4ee5\u5206\u6790\u65f6\u95f4\u5e8f\u5217\u8d8b\u52bf\uff0c\u540c\u65f6\u4fdd\u7559\u63a8\u7406\u5ef6\u8fdf\u5e76\u964d\u4f4e\u8ba1\u7b97\u5f00\u9500\u548c\u6fc0\u6d3b\u5b58\u50a8\u5185\u5b58\u9700\u6c42\u3002\u6211\u4eec\u5c06\u8bed\u8a00\u6a21\u578b\u5904\u7406\u4e0e\u4f20\u7edf\u65f6\u95f4\u5e8f\u5217\u8868\u793a\u5b66\u4e60\u65b9\u6cd5\u76f8\u7ed3\u5408\uff0c\u5b9e\u73b0\u8de8\u6a21\u6001\u96c6\u6210\uff0c\u4ece\u800c\u83b7\u5f97\u7a33\u5065\u4e14\u51c6\u786e\u7684\u9884\u6d4b\u7ed3\u679c\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u5b9e\u9645\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8be5\u6846\u67b6\u7684\u6548\u80fd\u5f97\u5230\u4e86\u5145\u5206\u9a8c\u8bc1\uff0c\u5176\u9884\u6d4b\u51c6\u786e\u6027\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\u3002|\n", "2408.14380": "|**2024-08-26**|**Probing Causality Manipulation of Large Language Models**|Chenyang Zhang et.al.|[2408.14380](http://arxiv.org/abs/2408.14380)|**[link](https://github.com/tongjinlp/llm-causality-probing)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u5c55\u73b0\u4e86\u591a\u79cd\u80fd\u529b\uff0c\u5305\u62ec\u56e0\u679c\u5173\u7cfb\u95ee\u9898\u3002\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u901a\u5e38\u57fa\u4e8e\u7edf\u8ba1\u5173\u8054\u5de5\u4f5c\uff0c\u800c\u975e\u4e13\u6ce8\u4e8e\u53e5\u5b50\u4e2d\u7684\u56e0\u679c\u4e0e\u5f71\u54cd\u3002\u56e0\u6b64\uff0c\u63a2\u7d22LLM\u5185\u90e8\u5bf9\u56e0\u679c\u6027\u7684\u64cd\u7eb5\u662f\u5fc5\u8981\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u63d0\u4f9b\u4e0d\u540c\u7684\u6377\u5f84\u5e76\u89c2\u5bdf\u6a21\u578b\u884c\u4e3a\u6765\u63a2\u67e5\u56e0\u679c\u6027\u64cd\u7eb5\u7684\u5c42\u7ea7\u3002\u6211\u4eec\u5229\u7528\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u6280\u672f\uff0c\u9488\u5bf9\u8bbe
\u8ba1\u7684\u56e0\u679c\u5206\u7c7b\u4efb\u52a1\uff0c\u5bf9\u4e3b\u6d41LLM\u8fdb\u884c\u5b9e\u9a8c\uff0c\u5305\u62ecGPT-4\u4ee5\u53ca\u4e00\u4e9b\u8f83\u5c0f\u7684\u548c\u7279\u5b9a\u9886\u57df\u7684\u6a21\u578b\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLM\u80fd\u591f\u8bc6\u522b\u4e0e\u56e0\u679c\u6027\u76f8\u5173\u7684\u5b9e\u4f53\uff0c\u5e76\u8ba4\u8bc6\u5230\u76f4\u63a5\u7684\u56e0\u679c\u5173\u7cfb\u3002\u7136\u800c\uff0cLLM\u7f3a\u4e4f\u4e13\u95e8\u7684\u56e0\u679c\u8ba4\u77e5\u80fd\u529b\uff0c\u53ea\u662f\u5c06\u56e0\u679c\u6027\u89c6\u4e3a\u53e5\u5b50\u6574\u4f53\u8bed\u4e49\u7684\u4e00\u90e8\u5206\u3002**|\n", "2408.14354": "|**2024-08-26**|**SWE-bench-java: A GitHub Issue Resolving Benchmark for Java**|Daoguang Zan et.al.|[2408.14354](http://arxiv.org/abs/2408.14354)|**[link](https://github.com/multi-swe-bench/multi-swe-bench-env)**|**GitHub\u95ee\u9898\u89e3\u51b3\u662f\u8f6f\u4ef6\u5de5\u7a0b\u4e2d\u7684\u5173\u952e\u4efb\u52a1\uff0c\u8fd1\u671f\u5728\u884c\u4e1a\u548c\u5b66\u672f\u754c\u90fd\u53d7\u5230\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5728\u8fd9\u4e2a\u9886\u57df\u5185\uff0cSWE-bench\u5df2\u7ecf\u53d1\u5e03\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u4ec5\u5173\u6ce8Python\u7248\u672c\u3002\u7136\u800c\uff0c\u652f\u6301\u66f4\u591a\u7f16\u7a0b\u8bed\u8a00\u540c\u6837\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5de5\u4e1a\u754c\u5bf9\u6b64\u6709\u5f3a\u70c8\u9700\u6c42\u3002\u4f5c\u4e3a\u8fc8\u5411\u591a\u8bed\u8a00\u652f\u6301\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Java\u7248\u7684SWE-bench\uff0c\u79f0\u4e3aSWE-bench-java\u3002\u6211\u4eec\u5df2\u516c\u5f00\u53d1\u5e03\u4e86\u6570\u636e\u96c6\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u4e8eDocker\u7684\u8bc4\u4f30\u73af\u5883\u548c\u6392\u884c\u699c\uff0c\u8fd9\u4e9b\u90fd\u5c06\u6301\u7eed\u7ef4\u62a4\u548c\u66f4\u65b0\u3002\u4e3a\u4e86\u9a8c\u8bc1SWE-bench
-java\u7684\u53ef\u9760\u6027\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u7ecf\u5178\u65b9\u6cd5SWE-agent\uff0c\u5e76\u5728\u5176\u4e2d\u6d4b\u8bd5\u4e86\u51e0\u79cd\u5f3a\u5927\u7684LLMs\u3002\u4f17\u6240\u5468\u77e5\uff0c\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u8a00\u57fa\u51c6\u65e2\u8017\u65f6\u53c8\u8d39\u529b\uff0c\u56e0\u6b64\u6211\u4eec\u6b22\u8fce\u901a\u8fc7\u62c9\u53d6\u8bf7\u6c42\u6216\u5408\u4f5c\u6765\u52a0\u901f\u5176\u8fed\u4ee3\u548c\u6539\u8fdb\uff0c\u4e3a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7f16\u7a0b\u94fa\u5e73\u9053\u8def\u3002**|\n", "2408.15240": "|**2024-08-27**|**Generative Verifiers: Reward Modeling as Next-Token Prediction**|Lunjun Zhang et.al.|[2408.15240](http://arxiv.org/abs/2408.15240)|null|\u9a8c\u8bc1\u5668\u6216\u5956\u52b1\u6a21\u578b\u5e38\u7528\u4e8e\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u6027\u80fd\u3002\u4e00\u79cd\u5e38\u89c1\u7684\u65b9\u6cd5\u662fBest-of-N\u7b56\u7565\uff0c\u5176\u4e2d\u4eceLLM\u751f\u6210\u7684N\u4e2a\u5019\u9009\u89e3\u51b3\u65b9\u6848\u4e2d\u7531\u9a8c\u8bc1\u5668\u8fdb\u884c\u6392\u540d\uff0c\u9009\u62e9\u6700\u4f73\u4e00\u4e2a\u3002\u4f20\u7edf\u4e0a\uff0c\u9a8c\u8bc1\u5668\u662f\u4f5c\u4e3a\u5224\u522b\u5206\u7c7b\u5668\u8fdb\u884c\u8bad\u7ec3\u4ee5\u5bf9\u89e3\u51b3\u65b9\u6848\u6253\u5206\u7684\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3LLM\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u5728\u9a8c\u8bc1\u548c\u89e3\u51b3\u65b9\u6848\u751f\u6210\u4e0a\u4f7f\u7528\u901a\u7528\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u76ee\u6807\u8054\u5408\u8bad\u7ec3\u9a8c\u8bc1\u5668\u3002\u4e0e\u6807\u51c6\u9a8c\u8bc1\u5668\u76f8\u6bd4\uff0c\u8fd9\u6837\u7684\u751f\u6210\u578b\u9a8c\u8bc1\u5668\uff08GenRM\uff09\u53ef\u4ee5\u4eceLLM\u7684\u51e0\u4e2a\u4f18\u52bf\u4e2d\u83b7\u76ca\uff1a\u5b83\u4eec\u53ef\u4ee5\u65e0\u7f1d\u5730\u4e0e\u6307\u4ee4\u
8c03\u8c10\u76f8\u7ed3\u5408\uff0c\u652f\u6301\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u5e76\u4e14\u53ef\u4ee5\u901a\u8fc7\u589e\u52a0\u63a8\u7406\u65f6\u7684\u8ba1\u7b97\u91cf\u6765\u5229\u7528\u591a\u6570\u6295\u7968\uff0c\u4ece\u800c\u8fdb\u884c\u66f4\u597d\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u5c55\u793a\u4e86\uff0c\u5728\u7b97\u6cd5\u95ee\u9898\u548c\u5c0f\u5b66\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u4f7f\u7528Gemma\u4e3a\u57fa\u7840\u7684\u9a8c\u8bc1\u5668\u65f6\uff0cGenRM\u4f18\u4e8e\u5224\u522b\u578b\u9a8c\u8bc1\u5668\u548cLLM\u4f5c\u4e3a\u88c1\u5224\uff0c\u8868\u73b0\u51fa16%-64%\u7684\u95ee\u9898\u89e3\u51b3\u7387\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86GenRM\u5728\u6570\u636e\u96c6\u89c4\u6a21\u3001\u6a21\u578b\u5bb9\u91cf\u548c\u63a8\u7406\u65f6\u8ba1\u7b97\u91cf\u589e\u52a0\u65b9\u9762\u5177\u6709\u826f\u597d\u7684\u53ef\u6269\u5c55\u6027\u3002|\n", "2408.15221": "|**2024-08-27**|**LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet**|Nathaniel Li et.al.|[2408.15221](http://arxiv.org/abs/2408.15221)|null|\u8fd1\u671f\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9632\u5fa1\u63aa\u65bd\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5bf9\u6709\u5bb3\u67e5\u8be2\u7684\u62d2\u7edd\u80fd\u529b\uff0c\u5373\u4f7f\u5728\u906d\u53d7\u6709\u7ec4\u7ec7\u653b\u51fb\u7684\u60c5\u51b5\u4e0b\u4e5f\u4e0d\u4f8b\u5916\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u9632\u5fa1\u63aa\u65bd\u4e3b\u8981\u662f\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u9488\u5bf9\u81ea\u52a8\u5316\u653b\u51fb\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u79cd\u5a01\u80c1\u6a21\u578b\u4e0d\u8db3\u4ee5\u53cd\u6620\u771f\u5b9e\u4e16\u754c\u4e2d\u6076\u610f\u884c\u4e3a\u7684\u590d\u6742\u6027\u3002 
\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u5c55\u793a\u4e86\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u201c\u8d8a\u72f1\u201d\uff08\u5373\u653b\u51fb\u8005\u5229\u7528\u6a21\u578b\u7684\u6f0f\u6d1e\u6765\u7ed5\u8fc7\u9632\u5fa1\u673a\u5236\uff09\u80fd\u591f\u63ed\u9732\u9632\u5fa1\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u6f0f\u6d1e\u3002\u5728\u4f7f\u7528HarmBench\u8fd9\u4e00\u8bc4\u4f30\u5e73\u53f0\uff0c\u5bf9\u6297\u90a3\u4e9b\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u4ec5\u62a5\u544a\u4f4e\u767e\u5206\u6bd4\u653b\u51fb\u6210\u529f\u7387\uff08ASR\uff09\u7684\u9632\u5fa1\u7cfb\u7edf\u65f6\uff0c\u6211\u4eec\u53d1\u73b0\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u201c\u8d8a\u72f1\u201d\u7684\u6210\u529f\u7387\u8d85\u8fc7\u4e8670%\u3002\u8fd9\u8868\u660e\u5f53\u524d\u7684\u9632\u5fa1\u673a\u5236\u5728\u9762\u5bf9\u66f4\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u653b\u51fb\u7b56\u7565\u65f6\u5b58\u5728\u4e0d\u8db3\u3002 \u6b64\u5916\uff0c\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u201c\u8d8a\u72f1\u201d\u8fd8\u63ed\u793a\u4e86\u673a\u5668\u9057\u5fd8\u9632\u5fa1\u7cfb\u7edf\u7684\u6f0f\u6d1e\u3002\u653b\u51fb\u8005\u6210\u529f\u5730\u4ece\u7ecf\u8fc7\u9057\u5fd8\u5904\u7406\u7684\u6a21\u578b\u4e2d\u6062\u590d\u4e86\u53ef\u7528\u4e8e\u751f\u7269\u5b89\u5168\u53cc\u91cd\u7528\u9014\u7684\u77e5\u8bc6\uff0c\u8fd9\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u73b0\u6709\u9632\u5fa1\u63aa\u65bd\u5728\u4fdd\u62a4\u654f\u611f\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u7684\u5f31\u70b9\u3002 \u4e3a\u4e86\u603b\u7ed3\u548c\u5171\u4eab\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u591a\u8f6e\u5bf9\u8bdd\u4eba\u5de5\u8d8a\u72f1\u201d\uff08Multi-Turn Human 
Jailbreaks\uff0c\u7b80\u79f0MHJ\uff09\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u6765\u81ea537\u4e2a\u591a\u8f6e\u5bf9\u8bdd\u201c\u8d8a\u72f1\u201d\u6848\u4f8b\u76842,912\u4e2a\u89e6\u53d1\u6307\u4ee4\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u53d1\u5e03\u4e86\u8fd9\u4e2a\u6570\u636e\u96c6\u4ee5\u53ca\u5728\u591a\u79cd\u5546\u4e1a\u7ea2\u961f\u6d4b\u8bd5\u4e2d\u53d1\u5c55\u51fa\u7684\u4e00\u7cfb\u5217\u201c\u8d8a\u72f1\u201d\u7b56\u7565\u7684\u7efc\u8ff0\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u66f4\u5f3a\u5927\u7684LLM\u9632\u5fa1\u7cfb\u7edf\u63d0\u4f9b\u8d44\u6e90\u548c\u652f\u6301\u3002|\n", "2408.15207": "|**2024-08-27**|**Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks**|Shide Zhou et.al.|[2408.15207](http://arxiv.org/abs/2408.15207)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\u6781\u5927\u5730\u6539\u53d8\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u683c\u5c40\uff0c\u7136\u800c\u5728\u654f\u611f\u9886\u57df\u90e8\u7f72\u65f6\uff0c\u5b83\u4eec\u7684\u8106\u5f31\u6027\u5f15\u53d1\u4e86\u4e00\u7cfb\u5217\u4e25\u91cd\u5173\u5207\uff0c\u5c24\u5176\u662f\u5bf9\u4e8e\u6076\u610f\u5229\u7528\u7684\u98ce\u9669\u3002\u8fd9\u79cd\u60c5\u51b5\u51f8\u663e\u4e86\u9884\u90e8\u7f72\u6d4b\u8bd5\u4e0d\u8db3\u7684\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u66f4\u52a0\u4e25\u683c\u548c\u5168\u9762\u8bc4\u4f30\u65b9\u6cd5\u7684\u7d27\u8feb\u6027\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5168\u9762\u7684\u5b9e\u8bc1\u5206\u6790\uff0c\u8bc4\u4f30\u4e86\u4f20\u7edf\u8986\u76d6\u6807\u51c6\u5728\u8bc6\u522b\u8fd9\u4e9b\u6f0f\u6d1e\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u5173\u952e\u95ee\u9898\u2014\u2014\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u3002\u7814\u7a76\u9996\u5148\u5bf9LLM\u4e2d\u7684\u9690\u85cf\u72b6\u6
001\u8fdb\u884c\u4e86\u805a\u7c7b\u5206\u6790\uff0c\u7ed3\u679c\u663e\u793a\u8fd9\u4e9b\u72b6\u6001\u7684\u5185\u5728\u7279\u6027\u80fd\u591f\u660e\u663e\u533a\u5206\u4e0d\u540c\u7c7b\u578b\u7684\u67e5\u8be2\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ece\u4e09\u4e2a\u5173\u952e\u7ef4\u5ea6\u2014\u2014\u6807\u51c6\u7ea7\u522b\u3001\u5c42\u7ea7\u522b\u548c\u8bcd\u7ea7\u522b\u2014\u2014\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6807\u51c6\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u6b63\u5e38\u67e5\u8be2\u4e0e\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u5728\u795e\u7ecf\u5143\u6fc0\u6d3b\u6a21\u5f0f\u4e0a\u7684\u663e\u8457\u5dee\u5f02\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u805a\u7c7b\u7ed3\u679c\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5b9e\u65f6\u68c0\u6d4b\u201c\u8d8a\u72f1\u201d\u653b\u51fb\uff0c\u5229\u7528\u795e\u7ecf\u6fc0\u6d3b\u7279\u5f81\u3002\u6211\u4eec\u7684\u5206\u7c7b\u5668\u8868\u73b0\u51fa\u4e86\u6781\u9ad8\u7684\u51c6\u786e\u7387\uff0c\u5e73\u5747\u8fbe\u523096.33%\uff0c\u6210\u529f\u8bc6\u522b\u51fa\u5305\u62ec\u53ef\u80fd\u5bfc\u81f4\u5bf9\u6297\u6027\u653b\u51fb\u7684\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u3002\u8fd9\u9879\u7814\u7a76\u7684\u91cd\u8981\u6027\u5728\u4e8e\u5176\u5bf9LLM\u5b89\u5168\u6027\u6d4b\u8bd5\u590d\u6742\u6311\u6218\u7684\u5168\u9762\u5e94\u5bf9\u3002\u901a\u8fc7\u4f7f\u7cfb\u7edf\u80fd\u591f\u5728\u751f\u6210\u7b2c\u4e00\u4e2a\u8bcd\u65f6\u7acb\u5373\u68c0\u6d4b\u5230\u653b\u51fb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u96c6\u6210LLM\u7684\u672a\u6765\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5f3a\u5927\u7684\u5b9e\u65f6\u68c0\u6d4b\u80fd\u529b\u3002\u8fd9\u4e00\u7814\u7a76\u6df1\u5316\u4e86\u6211\u4eec\u5bf9LLM\u5b89\u5168\u6027\u7684\u7406\u89e3\uff0c\u5e76\u4e3a\u5f00\u53d1\u66f4\u7a33\u5065\u7684\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.15205": "|**2024-08-27**|**Leveraging Hallucinations to 
Reduce Manual Prompt Dependency in Promptable Segmentation**|Jian Hu et.al.|[2408.15205](http://arxiv.org/abs/2408.15205)|**[link](https://github.com/lwpyh/ProMaC_code)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4efb\u52a1\u901a\u7528\u7684\u63d0\u793a\u53ef\u5206\u5272\u65b9\u6cd5\uff0c\u65e8\u5728\u51cf\u5c11\u5bf9\u6bcf\u79cd\u6240\u9700\u5bf9\u8c61\u7684\u5b9e\u4f8b\u7279\u5b9a\u624b\u52a8\u63d0\u793a\u7684\u9700\u6c42\u3002\u901a\u8fc7\u4f7f\u7528\u5355\u4e2a\u4efb\u52a1\u901a\u7528\u63d0\u793a\u6765\u6307\u5bfc\u540c\u4e00\u4efb\u52a1\u4e0b\u4e0d\u540c\u5bf9\u8c61\u7684\u4e0d\u540c\u56fe\u50cf\u7684\u5206\u5272\uff0c\u5f15\u5165\u4e86\u4efb\u52a1\u901a\u7528\u63d0\u793a\u5206\u5272\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4ece\u901a\u7528\u63d0\u793a\u63a8\u7406\u51fa\u8be6\u7ec6\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ee5\u63d0\u9ad8\u5206\u5272\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002\u7136\u800c\uff0cMLLMs\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u7ecf\u5e38\u51fa\u73b0\u5e7b\u89c9\uff0c\u5bfc\u81f4\u63d0\u793a\u4e0d\u51c6\u786e\u3002\u73b0\u6709\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u6d88\u9664\u5e7b\u89c9\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\uff0c\u672c\u6587\u8ba4\u4e3aMLLM\u5e7b\u89c9\u5728\u6b63\u786e\u5229\u7528\u65f6\u53ef\u4ee5\u63ed\u793a\u6709\u4ef7\u503c\u7684\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u56e0\u4e3a\u5b83\u4eec\u4ee3\u8868\u4e86\u8d85\u8d8a\u5355\u5f20\u56fe\u50cf\u7684\u9884\u8bad\u7ec3\u5927\u89c4\u6a21\u77e5\u8bc6\u3002\u56e0\u6b64\uff0c\u672c\u6587\u5229\u7528\u5e7b\u89c9\u4ece\u56fe\u50cf\u4e2d\u6316\u6398\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u9a8c\u8bc1\u5176\u51c6\u786e\u6027\u4ee5\u589e\u5f3a\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002 
\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u8fed\u4ee3\u7684\u63d0\u793a-\u63a9\u7801\u5faa\u73af\u751f\u6210\u6846\u67b6\uff08ProMaC\uff09\uff0c\u8be5\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u63d0\u793a\u751f\u6210\u5668\u548c\u4e00\u4e2a\u63a9\u7801\u751f\u6210\u5668\u3002\u63d0\u793a\u751f\u6210\u5668\u4f7f\u7528\u591a\u5c3a\u5ea6\u94fe\u5f0f\u601d\u8003\u63d0\u793a\uff0c\u6700\u521d\u63a2\u7d22\u5e7b\u89c9\u4ee5\u63d0\u53d6\u6d4b\u8bd5\u56fe\u50cf\u4e0a\u7684\u6269\u5c55\u4e0a\u4e0b\u6587\u77e5\u8bc6\u3002\u7136\u540e\uff0c\u5c06\u8fd9\u4e9b\u5e7b\u89c9\u964d\u4f4e\u5230\u5f62\u6210\u7cbe\u786e\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ece\u800c\u5f15\u5bfc\u63a9\u7801\u751f\u6210\u5668\u901a\u8fc7\u63a9\u7801\u8bed\u4e49\u5bf9\u9f50\u4ea7\u751f\u4e0e\u4efb\u52a1\u8bed\u4e49\u4e00\u81f4\u7684\u63a9\u7801\u3002\u751f\u6210\u7684\u63a9\u7801\u901a\u8fc7\u8fed\u4ee3\u5f15\u5bfc\u63d0\u793a\u751f\u6210\u5668\u66f4\u5173\u6ce8\u4efb\u52a1\u76f8\u5173\u7684\u56fe\u50cf\u533a\u57df\u5e76\u51cf\u5c11\u65e0\u5173\u7684\u5e7b\u89c9\uff0c\u6700\u7ec8\u5171\u540c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u63a9\u7801\u7684\u8d28\u91cf\u3002 \u5b9e\u9a8c\u7ed3\u679c\u57285\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8bc1\u660e\u4e86ProMaC\u7684\u6709\u6548\u6027\u3002\u8be6\u7ec6\u4ee3\u7801\u89c1https://lwpyh.github.io/ProMaC/\u3002|\n", "2408.15204": "|**2024-08-27**|**Can Unconfident LLM Annotations Be Used for Confident Conclusions?**|Kristina Gligori\u0107 
et.al.|[2408.15204](http://arxiv.org/abs/2408.15204)|**[link](https://github.com/kristinagligoric/confidence-driven-inference)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u8005\u9ad8\u5ea6\u4e00\u81f4\uff0c\u663e\u793a\u51fa\u51cf\u8f7b\u4eba\u7c7b\u6570\u636e\u6536\u96c6\u6311\u6218\u7684\u6f5c\u529b\u3002\u5728\u8ba1\u7b97\u793e\u4f1a\u79d1\u5b66\uff08CSS\uff09\u9886\u57df\uff0c\u7814\u7a76\u4eba\u5458\u8d8a\u6765\u8d8a\u591a\u5730\u5229\u7528LLM\u6ce8\u91ca\u6765\u8865\u5145\u7f13\u6162\u4e14\u6602\u8d35\u7684\u4eba\u7c7b\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u5982\u4f55\u6536\u96c6\u548c\u4f7f\u7528LLM\u6ce8\u91ca\u800c\u4e0d\u635f\u5bb3\u4e0b\u6e38\u7ed3\u8bba\u7684\u6709\u6548\u6027\uff0c\u4ecd\u7f3a\u4e4f\u660e\u786e\u7684\u6307\u5357\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201c\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u201d\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86LLM\u6ce8\u91ca\u548cLLM\u7f6e\u4fe1\u5ea6\u6307\u793a\u5668\uff0c\u4ee5\u6218\u7565\u65b9\u5f0f\u9009\u62e9\u5e94\u6536\u96c6\u54ea\u4e9b\u4eba\u7c7b\u6ce8\u91ca\uff0c\u65e8\u5728\u751f\u4ea7\u51c6\u786e\u7684\u7edf\u8ba1\u4f30\u8ba1\u548c\u53ef\u9a8c\u8bc1\u7684\u7f6e\u4fe1\u533a\u95f4\uff0c\u540c\u65f6\u51cf\u5c11\u6240\u9700\u7684\u4eba\u7c7b\u6ce8\u91ca\u6570\u91cf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u9632\u6b62LLM\u6ce8\u91ca\u8d28\u91cf\u5dee\u7684\u4fdd\u969c\u63aa\u65bd\uff0c\u786e\u4fdd\u5f97\u51fa\u7684\u7ed3\u8bba\u65e2\u6709\u6548\u53c8\u4e0d\u6bd4\u4ec5\u4f9d\u8d56\u4eba\u7c7b\u6ce8\u91ca\u66f4\u4e0d\u51c6\u786e\u3002\u6211\u4eec\u5728\u4e09\u4e2aCSS\u573a\u666f\u2014\u2014\u793c\u8c8c\u6587\u672c\u3001\u7acb\u573a\u548c\u504f\u89c1\u2014\u2014\u4e2d\u7684\u7edf\u8ba1\u4f30\u8ba1\u4efb\u52a1\u4e2d\uff0c\u901a\u8fc7\u4e0e\u57fa\u7ebf\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u7684\u6709\u6548\u6027\uff0c\u6bcf\u79cd\u573a\u666f\u4e0b\u6240\u9700\u7684\u4eb
a\u7c7b\u6ce8\u91ca\u6570\u91cf\u51cf\u5c11\u4e86\u8d85\u8fc725%\u3002\u5c3d\u7ba1\u6211\u4eec\u4f7f\u7528CSS\u573a\u666f\u8fdb\u884c\u6f14\u793a\uff0c\u4f46\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u53ef\u4ee5\u7528\u4e8e\u5e7f\u6cdbNLP\u95ee\u9898\u4e2d\u7684\u5927\u591a\u6570\u6807\u51c6\u91cf\u4f30\u8ba1\u3002|\n", "2408.15176": "|**2024-08-27**|**Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement**|Longshen Ou et.al.|[2408.15176](http://arxiv.org/abs/2408.15176)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\uff0c\u5305\u62ec\u7b26\u53f7\u97f3\u4e50\u751f\u6210\u3002\u7136\u800c\uff0c\u5229\u7528\u8fd9\u4e9b\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u8fdb\u884c\u53ef\u63a7\u97f3\u4e50\u7f16\u6392\u4efb\u52a1\u7684\u6311\u6218\u4ecd\u7136\u65b0\u9896\uff0c\u6bcf\u4e2a\u4efb\u52a1\u90fd\u9700\u8981\u4e0d\u540c\u7684\u97f3\u4e50\u4fe1\u606f\u4f5c\u4e3a\u63a7\u5236\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5e8f\u5217\u5230\u5e8f\u5217\u6846\u67b6\uff0c\u5b83\u5141\u8bb8\u5bf9\u7b26\u53f7\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u56db\u4e2a\u4e0d\u540c\u7684\u591a\u8f68\u7f16\u6392\u4efb\u52a1\uff1a\u4e50\u961f\u7f16\u6392\u3001\u94a2\u7434\u7f29\u51cf\u3001\u9f13\u7f16\u6392\u548c\u58f0\u97f3\u5206\u79bb\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u7b56\u7565\u5728\u6240\u6709\u56db\u4e2a\u4efb\u52a1\u4e0a\u5747\u5b9e\u73b0\u4e86\u66f4\u9ad8\u97f3\u4e50\u8d28\u91cf\u7684\u7ed3\u679c\uff0c\u4e0e\u4e13\u95e8\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u57fa\u7ebf\u76f8\u6bd4\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u989d\u5916\u7684\u63a2\u67e5\u5206\u6790\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u9636\u6bb5\u8d4b\u4e88\u6a21\u578b\u7406\u89e3\u97f3\u4e50\u6761\u4ef6\u7684\u57fa\u672c\u77e5\u8bc6\uff0c\u8fd9\u5728\u4ec5\u901a\u
8fc7\u7279\u5b9a\u4efb\u52a1\u7684\u5fae\u8c03\u96be\u4ee5\u83b7\u5f97\u7684\u60c5\u51b5\u4e0b\u5c24\u4e3a\u91cd\u8981\u3002|\n", "2408.15172": "|**2024-08-27**|**X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation**|Hanjia Lyu et.al.|[2408.15172](http://arxiv.org/abs/2408.15172)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5df2\u88ab\u8bc1\u660e\u80fd\u663e\u8457\u63d0\u5347\u4e30\u5bcc\u9879\u76ee\u63cf\u8ff0\u7684\u6548\u679c\uff0c\u8fdb\u800c\u589e\u5f3a\u63a8\u8350\u7cfb\u7edf\u7684\u51c6\u786e\u6027\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4ec5\u4f9d\u8d56\u4e8e\u7eaf\u6587\u672c\u63d0\u793a\uff0c\u6216\u8005\u91c7\u7528\u57fa\u672c\u7684\u591a\u6a21\u6001\u7b56\u7565\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u6587\u672c\u4e0e\u89c6\u89c9\u6a21\u6001\u4e4b\u95f4\u4e92\u8865\u7684\u4fe1\u606f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCross-Reflection Prompting\uff08X-Reflect\uff09\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u5f15\u5bfcLMM\u660e\u786e\u8bc6\u522b\u5e76\u8c03\u548c\u6587\u672c\u4e0e\u56fe\u50cf\u4e4b\u95f4\u7684\u652f\u6301\u6027\u4e0e\u51b2\u7a81\u4fe1\u606f\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u901a\u8fc7\u6355\u6349\u4e24\u79cd\u6a21\u6001\u7684\u7ec6\u5fae\u6d1e\u5bdf\uff0c\u6b64\u65b9\u6cd5\u751f\u6210\u4e86\u66f4\u4e3a\u5168\u9762\u4e14\u8bed\u5883\u4e30\u5bcc\u7684\u9879\u76ee\u8868\u793a\u3002\u5728\u4e24\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4e0b\u6e38\u63a8\u8350\u51c6\u786e\u5ea6\u4e0a\u4f18\u4e8e\u73b0\u6709\u7684\u63d0\u793a\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u6846\u67b6\u5728\u4e0d\u540cLMM\u67b6\u6784\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u4ee5\u53ca\u63d0\u793a\u7b56\u7565\u7684\u9c81\u68d2\u6027\uff0c\u63d0\u4f9b\u4e86\u4f18\u5316\u7684\u89c1\
u89e3\u3002\u8fd9\u9879\u5de5\u4f5c\u5f3a\u8c03\u4e86\u6574\u5408\u591a\u6a21\u6001\u4fe1\u606f\u7684\u91cd\u8981\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u6539\u5584\u591a\u6a21\u6001\u63a8\u8350\u7cfb\u7edf\u4e2d\u9879\u76ee\u7406\u89e3\u7684\u65b0\u578b\u89e3\u51b3\u65b9\u6848\u3002|\n", "2408.15171": "|**2024-08-27**|**Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation**|N. E. Kriman et.al.|[2408.15171](http://arxiv.org/abs/2408.15171)|null|\u81ea2022\u5e74ChatGPT\u7684\u53d1\u5e03\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e94\u7528\u8303\u56f4\u663e\u8457\u6269\u5927\uff0c\u663e\u793a\u51fa\u5176\u5728\u5404\u79cd\u573a\u666f\u4e2d\u7684\u4ef7\u503c\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4f01\u4e1a\u7ea7\u548c\u5546\u4e1a\u5e94\u7528\u800c\u8a00\uff0cLLMs\u751f\u6210\u4e0d\u51c6\u786e\u4fe1\u606f\u7684\u8d8b\u52bf\uff0c\u5373\u6240\u8c13\u7684\u201c\u5e7b\u89c9\u201d\u73b0\u8c61\uff0c\u6210\u4e3a\u4e86\u4e00\u4e2a\u4e3b\u8981\u6311\u6218\u3002\u672c\u9879\u76ee\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5728\u4e0e\u539f\u59cb\u6587\u672c\u8fdb\u884c\u6bd4\u8f83\u65f6\u8bc4\u4f30LLM\u751f\u6210\u6982\u8981\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u6734\u7d20\u8d1d\u53f6\u65af\u5206\u7c7b\u6765\u5224\u65ad\u751f\u6210\u5185\u5bb9\u7684\u771f\u5b9e\u6027\u3002 
\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u53ef\u4ee5\u4f30\u8ba1\u751f\u6210\u6587\u672c\u4e0e\u5b9e\u9645\u4fe1\u606f\u4e4b\u95f4\u7684\u5339\u914d\u5ea6\uff0c\u4ece\u800c\u63d0\u9ad8LLM\u5e94\u7528\u7684\u8d28\u91cf\u548c\u53ef\u9760\u6027\u3002\u8fd9\u4e0d\u4ec5\u6709\u52a9\u4e8e\u8bc6\u522b\u53ef\u80fd\u5b58\u5728\u7684\u9519\u8bef\u6216\u4e0d\u51c6\u786e\u4e4b\u5904\uff0c\u8fd8\u80fd\u589e\u5f3a\u7528\u6237\u5bf9LLM\u751f\u6210\u5185\u5bb9\u7684\u4fe1\u4efb\uff0c\u4fc3\u8fdb\u5176\u5728\u66f4\u5e7f\u6cdb\u9886\u57df\u7684\u6709\u6548\u4f7f\u7528\u3002\u6b64\u5916\uff0c\u8be5\u65b9\u6cd5\u8fd8\u80fd\u4e3aLLM\u7684\u6301\u7eed\u6539\u8fdb\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u53cd\u9988\uff0c\u63a8\u52a8\u6280\u672f\u8fdb\u6b65\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u9ad8\u8d28\u91cf\u3001\u66f4\u53ef\u9760\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u52a9\u5185\u5bb9\u751f\u6210\u3002|\n", "2408.15079": "|**2024-08-27**|**BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline**|Guosheng Dong 
et.al.|[2408.15079](http://arxiv.org/abs/2408.15079)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6838\u5fc3\u80fd\u529b\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5e7f\u6cdb\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u7ec4\u6210\u548c\u9009\u62e9\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u88ab\u591a\u4e2a\u673a\u6784\u89c6\u4e3a\u5546\u4e1a\u79d8\u5bc6\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u6e90\u4e86\u4e00\u4e2a\u901a\u7528\u9002\u7528\u7684\u6570\u636e\u5904\u7406\u7ba1\u9053\uff0c\u5e76\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u7ade\u4e89\u6027\u7684LLM\u57fa\u7ebf\u6765\u9a8c\u8bc1\u5176\u6709\u6548\u6027\u548c\u6f5c\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6570\u636e\u5904\u7406\u7ba1\u9053\u5305\u62ec\u5e7f\u57df\u6536\u96c6\u4ee5\u6269\u5927\u89c4\u6a21\u548c\u91cd\u65b0\u52a0\u6743\u4ee5\u63d0\u9ad8\u8d28\u91cf\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u6211\u4eec\u7684\u7ba1\u9053\u5bf93\u4e07\u4ebf\u4e2a\u4ee4\u724c\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u800c\u65e0\u9700\u4efb\u4f55\u660e\u786e\u7684\u4e0b\u6e38\u4efb\u52a1\u4f18\u5316\uff0c\u63a5\u7740\u8fdb\u884c\u4e00\u4e2a\u7b80\u5355\u4f46\u6709\u6548\u7684\u76d1\u7763\u5fae\u8c03\u9636\u6bb5\u3002BaichuanSEED\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u8868\u73b0\u51fa\u4e00\u81f4\u6027\u4e0e\u9884\u6d4b\u6027\uff0c\u5e76\u5728\u7efc\u5408\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u51e0\u4e2a\u5148\u8fdb\u7684\u5546\u4e1a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5982Qwen1.5\u548cLlama3\uff0c\u5b9e\u73b0\u4e86\u53ef\u6bd4\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u51e0\u4e2a\u542f\u53d1\u5f0f\u5b9e\u9a8c\uff0c\u8ba8\u8bba\u4e86\u5728\u6570\u5b66\u548c\u7f16\u7a0b\u7b49\u4e0b\u6e38\u4efb\u52a1\u8fdb\u4e00\u6b65\u4f18\u5316\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.15066": "|**2024-08-27**|**Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models**|Ned Cooper 
et.al.|[2408.15066](http://arxiv.org/abs/2408.15066)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u4ea4\u4e92\u53cd\u9988\u529f\u80fd\u5728ChatGPT\u754c\u9762\u4e2d\u7684\u53ef\u7528\u6027\uff0c\u5206\u6790\u4e86\u8fd9\u4e9b\u529f\u80fd\u5982\u4f55\u5851\u9020\u7528\u6237\u8f93\u5165\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fed\u4ee3\u8fc7\u7a0b\u4e2d\u7684\u53c2\u4e0e\u5ea6\u3002\u901a\u8fc7\u8c03\u7814ChatGPT\u7528\u6237\u5e76\u5e94\u7528\u4e86\u53ef\u64cd\u4f5c\u6027\u6846\u67b6\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u7c7b\u529f\u80fd\u9f13\u52b1\u7b80\u5355\u3001\u9891\u7e41\u4e14\u4fa7\u91cd\u4e8e\u6027\u80fd\u7684\u53cd\u9988\uff0c\u540c\u65f6\u9650\u5236\u4e86\u96c6\u4f53\u8f93\u5165\u548c\u7528\u6237\u95f4\u7684\u8ba8\u8bba\u3002\u6211\u4eec\u4e3b\u5f20\uff0c\u8fd9\u79cd\u53cd\u9988\u683c\u5f0f\u6781\u5927\u5730\u9650\u5236\u4e86\u7528\u6237\u7684\u53c2\u4e0e\uff0c\u5f3a\u5316\u4e86\u7528\u6237\u3001\u516c\u4f17\u4e0e\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u53f8\u4e4b\u95f4\u7684\u6743\u529b\u4e0d\u5e73\u7b49\u3002\u6211\u4eec\u7684\u5206\u6790\u4e3a\u73b0\u6709\u53c2\u4e0e\u5f0f\u4eba\u5de5\u667a\u80fd\u6587\u732e\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\uff0c\u7740\u91cd\u4e8e\u73b0\u6709\u53cd\u9988\u6d41\u7a0b\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u91cd\u65b0\u8bbe\u8ba1\u7684\u65b9\u5411\u3002 
\u4e3a\u4e86\u4f7f\u516c\u4f17\u5728\u4eba\u5de5\u667a\u80fd\u53d1\u5c55\u4e2d\u80fd\u591f\u66f4\u5177\u6709\u610f\u4e49\u5730\u53c2\u4e0e\uff0c\u6211\u4eec\u63d0\u5021\u6446\u8131\u5173\u6ce8\u6a21\u578b\u8f93\u51fa\u4e0e\u7279\u5b9a\u7528\u6237\u504f\u597d\u7684\u4e00\u81f4\u6027\u7684\u8fc7\u7a0b\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u5f3a\u8c03\u9700\u8981\u4fc3\u8fdb\u516c\u53f8\u4e0e\u4e0d\u540c\u201c\u516c\u4f17\u201d\u4e4b\u95f4\u5173\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u76ee\u7684\u548c\u5e94\u7528\u8fdb\u884c\u5bf9\u8bdd\u7684\u8fc7\u7a0b\u3002\u8fd9\u4e00\u65b9\u6cd5\u8981\u6c42\u5bf9\u6301\u7eed\u7684\u793e\u4f1a\u57fa\u7840\u8bbe\u65bd\u5efa\u8bbe\u7684\u5173\u6ce8\uff0c\u5373\u521b\u5efa\u548c\u7ef4\u6301\u89e3\u51b3AI\u5f00\u53d1\u548c\u90e8\u7f72\u5f71\u54cd\u7fa4\u4f53\u5173\u5207\u6240\u9700\u7684\u793e\u4f1a\u3001\u6280\u672f\u548c\u673a\u6784\u7ed3\u6784\u3002|\n", "2408.15998": "|**2024-08-28**|**Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders**|Min Shi 
et.al.|[2408.15998](http://arxiv.org/abs/2408.15998)|**[link](https://github.com/nvlabs/eagle)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u51c6\u786e\u89e3\u6790\u590d\u6742\u89c6\u89c9\u4fe1\u606f\u5bf9\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u91cd\u8981\u6027\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u589e\u5f3a\u7684\u89c6\u89c9\u611f\u77e5\u80fd\u663e\u8457\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\uff0c\u5e76\u5728\u5149\u5b66\u5b57\u7b26\u8bc6\u522b\u3001\u6587\u6863\u5206\u6790\u7b49\u5206\u8fa8\u7387\u654f\u611f\u4efb\u52a1\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u8bb8\u591a\u5148\u8fdbMLLMs\u901a\u8fc7\u96c6\u6210\u591a\u79cd\u89c6\u89c9\u7f16\u7801\u5668\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u7136\u800c\uff0c\u5f53\u524d\u7f3a\u4e4f\u5bf9\u5173\u952e\u65b9\u9762\u7cfb\u7edf\u7684\u6bd4\u8f83\u548c\u8be6\u7ec6\u7684\u62c6\u89e3\u7814\u7a76\uff0c\u6bd4\u5982\u4e13\u5bb6\u9009\u62e9\u548c\u591a\u89c6\u89c9\u4e13\u5bb6\u878d\u5408\u7b56\u7565\u3002\u672c\u6587\u5bf9\u4f7f\u7528\u6df7\u5408\u89c6\u89c9\u7f16\u7801\u5668\u7684MLLM\u8bbe\u8ba1\u7a7a\u95f4\u8fdb\u884c\u4e86\u5e7f\u6cdb\u63a2\u7d22\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u591a\u4e2a\u4e92\u8865\u89c6\u89c9\u7f16\u7801\u5668\u7684\u89c6\u89c9\u4ee4\u724c\u7b80\u5355\u62fc\u63a5\u5373\u53ef\u8fbe\u5230\u4e0e\u66f4\u590d\u6742\u7684\u6df7\u5408\u67b6\u6784\u6216\u7b56\u7565\u76f8\u5f53\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u5f15\u5165\u9884\u5bf9\u9f50\uff08Pre-Alignment\uff09\u673a\u5236\uff0c\u4ee5\u5f25\u5408\u4e13\u6ce8\u4e8e\u89c6\u89c9\u7684\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u4ee4\u724c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u4ece\u800c\u63d0\u5347\u6a21\u578b\u4e00\u81f4\u6027\u3002\u7531\u6b64\u4ea7\u751f\u7684MLLM\u5bb6\u65cf\u20
14\u2014Eagle\uff0c\u5728\u4e3b\u8981\u7684MLLM\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8d85\u8d8a\u4e86\u5176\u4ed6\u9886\u5148\u5f00\u6e90\u6a21\u578b\u3002\u76f8\u5173\u4ee3\u7801\u53ca\u6a21\u578b\u5df2\u5f00\u6e90\u53d1\u5e03\uff1ahttps://github.com/NVlabs/Eagle**|\n", "2408.15971": "|**2024-08-28**|**BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems**|Wei Wang et.al.|[2408.15971](http://arxiv.org/abs/2408.15971)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6b63\u5728\u53d8\u5f97\u8d8a\u6765\u8d8a\u5f3a\u5927\uff0c\u80fd\u591f\u5904\u7406\u590d\u6742\u4efb\u52a1\uff0c\u4f8b\u5982\u6784\u5efa\u5355\u4e00\u4ee3\u7406\u548c\u591a\u4ee3\u7406\u7cfb\u7edf\u3002\u76f8\u8f83\u4e8e\u5355\u4e00\u4ee3\u7406\uff0c\u591a\u4ee3\u7406\u7cfb\u7edf\u5bf9\u8bed\u8a00\u6a21\u578b\u7684\u534f\u4f5c\u80fd\u529b\u63d0\u51fa\u4e86\u66f4\u9ad8\u7684\u8981\u6c42\u3002\u5df2\u6709\u7684\u8bc4\u4f30\u57fa\u51c6\u4e3b\u8981\u5173\u6ce8\u4e8e\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u534f\u4f5c\u80fd\u529b\uff0c\u4f46\u5728\u7ec6\u7c92\u5ea6\u8bc4\u4f30\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u5e76\u4e14\u5ffd\u7565\u4e86\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u534f\u4f5c\u4e0e\u7ade\u4e89\u573a\u666f\u3002 
\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014BattleAgentBench\u3002\u8be5\u57fa\u51c6\u5b9a\u4e49\u4e86\u4e09\u4e2a\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u7684\u4e03\u4e2a\u5b50\u9636\u6bb5\uff0c\u65e8\u5728\u4ece\u5355\u4e00\u4ee3\u7406\u573a\u666f\u5bfc\u822a\u80fd\u529b\u3001\u914d\u5bf9\u4ee3\u7406\u4efb\u52a1\u6267\u884c\u80fd\u529b\u4ee5\u53ca\u591a\u4ee3\u7406\u5408\u4f5c\u4e0e\u7ade\u4e89\u80fd\u529b\u7b49\u591a\u4e2a\u7ef4\u5ea6\uff0c\u5bf9\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u7ec6\u81f4\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u5bf9\u56db\u5927\u95ed\u6e90\u6a21\u578b\u548c\u4e03\u5927\u5f00\u6e90\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u8bc4\u4f30\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u57fa\u4e8eAPI\u7684\u6a21\u578b\u5728\u7b80\u5355\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u5728\u7b80\u5355\u4efb\u52a1\u4e0a\u5219\u9762\u4e34\u6311\u6218\u3002\u5bf9\u4e8e\u9700\u8981\u5408\u4f5c\u4e0e\u7ade\u4e89\u80fd\u529b\u7684\u56f0\u96be\u4efb\u52a1\uff0c\u5c3d\u7ba1\u57fa\u4e8eAPI\u7684\u6a21\u578b\u5c55\u793a\u4e86\u4e00\u5b9a\u7684\u534f\u4f5c\u80fd\u529b\uff0c\u4f46\u4ecd\u6709\u5de8\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002|\n", "2408.15966": "|**2024-08-28**|**More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding**|Yuan Tang et.al.|[2408.15966](http://arxiv.org/abs/2408.15966)|**[link](https://github.com/tangyuan96/greenplm)**|\u5728\u672c\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u91cd\u65b0\u5ba1\u89c6\u4e86\u8ba9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7406\u89e3\u4e09\u7ef4\u7269\u7406\u4e16\u754c\u8fd9\u4e00\u6311\u6218\u3002\u7531\u4e8e\u7f3a\u4e4f\u5927\u89c4\u6a21\u7684\u4e09\u7ef4\u70b9\u4e91\u4e0e\u6587\u672c\u914d\u5bf9\u6570\u636e\u96c6\uff0cLLM 
success has not yet been replicated in 3D understanding. We therefore propose a new task: 3D data-efficient point-language understanding. The goal is to enable LLMs to achieve robust 3D object understanding with as few 3D point cloud and text data pairs as possible. To address this task, we introduce GreenPLM, which uses more text data to compensate for the lack of 3D data. First, inspired by how CLIP aligns images and text, we use a pre-trained point cloud-text encoder to map the 3D point cloud space to the text space. This mapping lets us connect the text space to the LLM seamlessly. Once the point-text-LLM connection is established, we further strengthen text-LLM alignment by expanding the intermediate text space, thereby reducing the reliance on 3D point cloud data.
Specifically, we generate 6 million free-text descriptions of 3D objects and design a three-stage training strategy that helps LLMs better explore the intrinsic connections between modalities. For efficient modality alignment, we design a zero-parameter cross-attention module for token pooling. Extensive experiments show that GreenPLM needs only 12% of the 3D training data used by existing state-of-the-art models to achieve better 3D understanding. Remarkably, GreenPLM also achieves competitive performance using text data alone. The code and weights are available at: https://github.com/TangYuan96/GreenPLM|\n", "2408.15950": "|**2024-08-28**|**Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games**|Nicholas R.
Waytowich et.al.|[2408.15950](http://arxiv.org/abs/2408.15950)|null|Recent advances in large language models (LLMs) have extended their capabilities beyond traditional text tasks into multimodal domains that integrate visual, auditory, and textual data. Although multimodal LLMs have been studied extensively for high-level planning in areas such as robotics and games, their potential for low-level control tasks remains largely unexplored. This paper examines multimodal LLMs in the domain of Atari video games and introduces Atari game performance as a new benchmark for evaluating their ability to perform low-level control. Unlike traditional reinforcement learning (RL) and imitation learning (IL) methods, these LLMs do not require extensive computational resources or reward function definitions; instead, they draw on their existing multimodal knowledge to interact with the game environment directly.
Our study evaluates several multimodal LLMs against traditional RL agents, human players, and random agents, focusing on their ability to understand complex visual scenes and formulate strategic responses. We also study the effect of in-context learning (ICL) by incorporating human-demonstrated gameplay trajectories to improve the models' contextual understanding. Through this study, we aim to determine whether multimodal LLMs can use their broad training to act effectively as low-level controllers, thereby redefining potential applications in dynamic and visually complex environments. Additional results and videos are available on the project webpage: https://sites.google.com/view/atari-gpt/|\n", "2408.15915": "|**2024-08-28**|**Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models**|Yuncheng Yang
et.al.|[2408.15915](http://arxiv.org/abs/2408.15915)|**[link](https://github.com/yaphabates/rocket)**|Cultivating the expertise that large language models (LLMs) need to solve tasks in a specific domain often requires special-purpose tuning toward stable, expected outputs. To avoid the high cost of manually preparing instruction datasets and training resources, a reasonable starting point is open knowledge, including low-rank adaptation (LoRA) models and instruction datasets. However, existing methods for model and data selection focus on general-purpose performance and neglect the knowledge gap exposed in domain-specific deployment. This study proposes to bridge that gap by introducing a small number of human-annotated samples (i.e., K-shot) to advance the task expertise of LLMs with open knowledge. Specifically, we develop an efficient and scalable pipeline that produces task experts cost-effectively, in which K-shot data guide the selection of the most promising expert candidates and the task-relevant instructions. A mixture-of-experts (MoE) system is built to make full use of the distinct yet complementary knowledge of multiple experts. We identify the key factors behind the success of the MoE system: 1.
Adherence to K-shot: ensuring that the selected models truly have the ability to solve the K-shot problems, rather than being blind guessers. 2. Emphasis on diversity: both the constituent experts and the fine-tuning instructions are kept diverse throughout the model and data selection process. Extensive experimental results confirm the superiority of our approach in exploiting open knowledge across a variety of tasks. The code and models will be released later.|\n", "2408.15907": "|**2024-08-28**|**Decentralized LLM Inference over Edge Networks with Energy Harvesting**|Aria Khoshsirat et.al.|[2408.15907](http://arxiv.org/abs/2408.15907)|null|The outstanding performance of large language models on natural language tasks has transformed many fields, but deploying them in resource-constrained environments such as edge networks remains challenging. Distributed inference techniques improve flexibility and cost-effectiveness by spreading model blocks across multiple devices, but energy limitations remain a problem, especially for battery-powered edge devices. We propose a sustainable model for collaborative inference on interconnected, battery-powered edge devices with energy harvesting. A semi-Markov model describes the device states, taking into account processing parameters and average green energy arrivals. This model guides the design of scheduling algorithms that aim to reduce device downtime and maximize network throughput. Empirical evaluations and simulation runs validate the effectiveness of our approach, paving the way for energy-efficient distributed inference over edge networks.|\n", "2408.15903": "|**2024-08-28**|**LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments**|Ruirui Chen et.al.|[2408.15903](http://arxiv.org/abs/2408.15903)|null|Rapidly outdated information makes it challenging for large language models (LLMs) to incorporate new knowledge. Existing methods still struggle with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly when faced with numerous fact updates. To address these problems, this paper proposes Graph Memory-based Editing for Large Language
Models (GMeLLo), a simple yet effective method that combines the explicit knowledge representation of knowledge graphs (KGs) with the linguistic flexibility of LLMs. Beyond using LLMs for question answering, GMeLLo also uses them to convert natural language into structured queries and fact triples, enabling seamless interaction with KGs for rapid updates and precise multi-hop reasoning. Experimental results show that GMeLLo significantly outperforms current state-of-the-art knowledge editing methods on the multi-hop question answering benchmark MQuAKE, especially in scenarios involving extensive knowledge updates.|\n", "2408.15901": "|**2024-08-28**|**Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts**|Nikolas Gritsch et.al.|[2408.15901](http://arxiv.org/abs/2408.15901)|null|Efficiency, specialization, and adaptability to new data distributions are qualities that current large language models find hard to combine. The Mixture of Experts (MoE) architecture has become a major research focus because its inherent conditional computation can help deliver these qualities. This work focuses on 'upcycling' dense expert models into an MoE, aiming to improve specialization while also making it easier to adapt to new tasks.
We introduce Nexus, an enhanced MoE architecture with adaptive routing, in which the model learns to project expert embeddings from domain representations. This approach lets Nexus flexibly add new experts from separately trained dense models, without requiring large-scale MoE training on unseen data domains. Experiments show that, compared with the baseline, Nexus achieves a relative gain of up to 2.1% in the initial upcycling stage and a relative gain of 18.8% when extending the MoE with limited fine-tuning data. The flexibility of Nexus is crucial for enabling an open-source ecosystem in which every user continuously assembles their own MoE mix according to their needs.|\n", "2408.15895": "|**2024-08-28**|**Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models**|Sebastian Vallejo Vera
et.al.|[2408.15895](http://arxiv.org/abs/2408.15895)|null|Human coders are biased. By replicating the experiment of Ennser-Jedenastik and Meyer (2018), we find that large language models (LLMs) use political information, and party cues in particular, when judging political statements. LLMs not only use the party cue to contextualize whether a statement is positive, negative, or neutral; they also reflect the biases of the human-generated data on which they were trained. We also find that, unlike humans, who are biased only when faced with statements from extreme parties, LLMs show significant bias even when prompted with statements from center-left and center-right parties. The implications of these findings are discussed in the final section.|\n", "2408.15879": "|**2024-08-28**|**Persuasion Games using Large Language Models**|Ganesh Prasath Ramani
et.al.|[2408.15879](http://arxiv.org/abs/2408.15879)|null|Large language models (LLMs) have become powerful tools capable of understanding and generating human-like text. This paper studies the potential of LLMs to shape human perspectives and, in turn, influence their decisions on particular tasks. This capability has applications in domains such as investment, credit cards, and insurance, where it helps users choose suitable insurance policies, investment plans, credit cards, and retail products, and also in Behavior Change Support Systems (BCSS). We present a sophisticated multi-agent framework in which a group of agents operates collaboratively. The primary agent engages users directly in persuasive dialogue, while auxiliary agents handle tasks such as information retrieval, response analysis, persuasion strategy development, and fact validation. Our experimental evidence shows that this collaborative approach significantly improves the persuasive efficacy of the LLM. We continuously analyze the user's resistance and counter it by combining rule-based and LLM-based resistance-persuasion mapping techniques.
We use simulated personas and generate conversations in the insurance, banking, and retail domains to evaluate how well large language models (LLMs) recognize, adapt to, and influence different personality types. We also examine the resistance mechanisms employed by the LLM-simulated personas. Persuasion is quantified through measurable surveys before and after the interaction, LLM-generated scores on the conversations, and the user's decision (purchase or non-purchase).|\n", "2408.16756": "|**2024-08-29**|**How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models**|Jiyue Jiang et.al.|[2408.16756](http://arxiv.org/abs/2408.16756)|**[link](https://github.com/jiangjyjy/yue-benchmark)**|The rapid development of large language models (LLMs) has transformed the competitive landscape of natural language processing (NLP), particularly for English and other data-rich languages. However, a significant development gap remains for underrepresented languages such as Cantonese. This is especially concerning given the economic importance of the Guangdong-Hong Kong-Macau Greater Bay Area and the large Cantonese-speaking populations in Singapore and North America. Although Cantonese is widely used, it is scarcely represented in NLP research, especially compared with languages from other similarly developed regions. To fill these gaps, we outline current Cantonese NLP methods and introduce new benchmarks designed to evaluate LLM performance in factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, with the aim of advancing open-source Cantonese LLM technology. We also propose future research directions and recommended models to enhance Cantonese LLM development.|\n", "2408.16753": "|**2024-08-29**|**Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models**|Alec Solway et.al.|[2408.16753](http://arxiv.org/abs/2408.16753)|null|Reinforcement learning is used to align language models with human preference signals after the model has first been pre-trained, by likelihood maximization, to predict the next token of text in a large corpus. Before deployment in a specific domain, models are often further fine-tuned on task-specific data. Because human preference signals are often unavailable at this last stage, fine-tuning is usually performed with likelihood maximization, the default method. However, reinforcement learning has advantages beyond facilitating alignment to a human-derived reward function. Whereas likelihood maximization is a form of imitation learning in which the model is trained on what to do under ideal conditions, reinforcement learning is not limited to demonstrating actions for optimally reached states; it trains the model on what to do across a range of scenarios as it explores the policy space. It also trains the model on what not to do, suppressing competitive but poor actions. This work develops a framework for last-mile fine-tuning using reinforcement learning and tests whether it yields performance gains. The experiments center on abstractive summarization, but the framework is general and broadly applicable. The procedure produced significantly better results than likelihood maximization alone. For the specific dataset tested, the gap could be closed by post-processing the maximum likelihood outputs. Nonetheless, the framework offers a new avenue for model optimization in situations where post-processing may be less straightforward or effective, and it can be extended to cover more classes of undesirable outputs to penalize and train against, such as hallucinations.|\n", "2408.16749": "|**2024-08-29**|**Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge**|Beidi Dong
et.al.|[2408.16749](http://arxiv.org/abs/2408.16749)|null|This paper addresses the importance of automated tools for detecting and limiting the spread of extremist ideology online. The study compares the ability of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) models to detect and classify social media posts containing 'right-wing' and 'left-wing' ideological keywords. We collected posts containing these keywords and manually labeled them as extremist or non-extremist. Extremist posts were further classified into one of five constituent elements based on a working definitional framework. The BERT model's performance was evaluated with respect to training data size and knowledge transfer between categories. We also compared the performance of GPT 3.5 and GPT 4 models under different prompts: naive, general-definition, role-playing, and professional-definition. The results show that the best-performing GPT models outperform the best-performing BERT models, and that more detailed prompts generally yield better results. However, overly complex prompts may impair performance. Different GPT versions differ in their sensitivity to what they consider extremist. GPT
3.5 performed better at identifying left-wing extremist posts, while GPT 4 performed better at identifying right-wing extremist posts. Large language models (here, GPT models) show significant potential for online extremism classification tasks, surpassing traditional BERT models in a zero-shot setting. Future research should explore the role of human-computer interaction in optimizing GPT models for extremism detection and classification, in order to develop more efficient (e.g., quicker, less effort) and more effective methods for identifying extremist content.|\n", "2408.16740": "|**2024-08-29**|**Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models**|Ji\u0159\u00ed Mili\u010dka
et.al.|[2408.16740](http://arxiv.org/abs/2408.16740)|null|This paper addresses, from the perspective of quantitative linguistics, the conceptual, methodological, and technical challenges of studying large language models (LLMs) and the texts they produce. It builds on a theoretical framework that distinguishes between the LLM as a substrate and the entities it simulates. The paper advocates a strictly non-anthropomorphic approach to the models while cautiously applying methods used to study human linguistic behavior to the simulated entities. While natural language processing researchers focus on the models themselves, their architecture, their evaluation, and methods for improving performance, our aim as quantitative linguists is to build a theory of the characteristics of LLM-generated texts, how they differ from human-produced texts, and the properties of the simulated entities. We should also explore the potential of LLMs as an instrument for studying human culture, of which language is an integral part.|\n", "2408.16700": "|**2024-08-29**|**GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models**|Moreno D'Inc\u00e0
et.al.|[2408.16700](http://arxiv.org/abs/2408.16700)|**[link](https://github.com/moreno98/gradbias)**|**Recent progress in text-to-image (T2I) generative models has made high-quality image generation possible. As their performance and accessibility improve, these models are attracting growing attention and popularity, and ensuring their fairness and safety is key to preventing biases from spreading and being perpetuated. Existing work focuses mainly on detecting bias within closed sets of predefined biases (e.g., gender, ethnicity). Detecting and quantifying bias in an open-set setting, i.e., without a predefined set, remains a challenge. This paper proposes a general framework to identify, quantify, and explain biases in an open-set setting. The pipeline uses a large language model (LLM) to propose biases starting from a set of captions. The target generative model then produces a set of images. Finally, Visual Question Answering (VQA) is used for bias evaluation. We present two methods based on this framework: OpenBias and GradBias. OpenBias
detects and quantifies both known and novel biases related to people, objects, and animals, and aligns closely with existing closed-set bias detection methods and with human judgment. GradBias shows that neutral words can significantly influence biases, and it outperforms several baselines, including state-of-the-art foundation models. Code is available here: https://github.com/Moreno98/GradBias**|\n", "2408.16673": "|**2024-08-29**|**Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity**|Ziniu Li et.al.|[2408.16673](http://arxiv.org/abs/2408.16673)|null|This paper addresses the overfitting and limited output diversity that arise when large language models undergo Supervised Fine-Tuning (SFT) on downstream tasks. The Cross Entropy (CE) loss is the conventional choice for SFT, but it can update the model toward the data distribution too aggressively, which causes overfitting and reduces output diversity.
To address these problems, this paper introduces the maximum entropy principle, which favors models with flatter probability distributions that still capture the data effectively. Specifically, we propose a new method called GEM, which matches the target distribution by solving a reverse Kullback-Leibler divergence minimization problem with an entropy regularizer. For SFT of the Llama-3-8B model, GEM outperforms CE in several respects. First, when trained on the UltraFeedback dataset to improve instruction following, GEM shows less overfitting, as evidenced by lower perplexity and better performance on the IFEval benchmark. GEM also increases output diversity: even without domain-specific data, it improves performance on math reasoning and code generation tasks by up to 7 points with best-of-n sampling. Furthermore, when fine-tuning on domain-specific datasets for math reasoning and code generation, GEM again shows less overfitting and gains of up to 10 points over CE.|\n", "2408.16601": "|**2024-08-29**|**Examination of Code generated by Large Language Models**|Robin Beer
et.al.|[2408.16601](http://arxiv.org/abs/2408.16601)|**[link](https://github.com/t-muras/ai-code-analysis)**|**Large language models (LLMs) such as ChatGPT and Copilot are transforming software development by automating code generation, which to some extent enables rapid prototyping, supports education, and boosts productivity. The correctness and quality of LLM-generated code should therefore be comparable to that of manually written code. To assess how well current LLMs generate simple algorithms in Java and Python together with the corresponding unit tests, we ran controlled experiments that had LLMs generate code and then evaluated its correctness and quality (coverage). We observed significant differences between LLMs, between programming languages, between algorithm and test code, and over time. This paper reports these results along with the experimental method, so that the evaluation can be repeated and compared across more algorithms, languages, and LLMs over time.**|\n", "2408.16586": "|**2024-08-29**|**Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies**|Zhiyang Qi 
et.al.|[2408.16586](http://arxiv.org/abs/2408.16586)|null|Recent advances in natural language processing, in particular large language models (LLMs) such as GPT-4, have markedly improved dialogue systems, allowing them to produce more natural and fluent conversations. These systems still struggle, however, with sustained dialogue management, memory retention, and hallucination. The AIWolfDial2024 project addresses these challenges by using the Werewolf game, an incomplete-information game, to test LLM capabilities in complex interactive settings. This work introduces an LLM-based Werewolf game AI in which every role is supported by situation analysis when generating responses. For the werewolf role, it applies several persuasion strategies, including logical appeal, credibility appeal, and emotional appeal, to steer the other players toward its own actions.|\n", "2408.16518": "|**2024-08-29**|**CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues**|Rena Gao 
et.al.|[2408.16518](http://arxiv.org/abs/2408.16518)|**[link](https://github.com/renagao/csl2024)**|We develop CNIMA, a dataset for measuring and automating the assessment of non-native interactivity in Chinese as a second language, containing 10,000 dialogues. We annotate CNIMA with an evaluation framework originally introduced for English-as-a-second-language dialogues, which assesses micro-level features (such as backchannels) and macro-level interactivity labels (such as topic management). We test the framework's transferability from English to Chinese and find that it is robust across languages, revealing both universal and language-specific relationships between micro-level and macro-level features. We then propose an automated evaluation approach that performs strongly, yielding a new tool for automated second-language assessment. Because it relies on large language models, the system needs no large-scale annotated training data and adapts easily to other languages.|\n", "2408.16502": "|**2024-08-29**|**LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?**|Jan Cegin 
et.al.|[2408.16502](http://arxiv.org/abs/2408.16502)|null|Generative large language models (LLMs) are increasingly used for data augmentation, where text samples are paraphrased by an LLM and then used to fine-tune a classifier. However, there is little research evidence on whether LLM-based augmentation has a clear advantage over established methods. To investigate when LLM-based augmentation is worthwhile, we ran comparative experiments on 6 datasets, 3 classifiers, and 2 fine-tuning methods. We also varied the number of seeds and the number of collected samples to explore the downstream accuracy space more thoroughly. Finally, a cost-benefit analysis shows that LLM-based augmentation is worth deploying when very few seeds are used; in many cases, established methods reach similar or better model accuracy.|\n", "2408.17437": "|**2024-08-30**|**SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists**|Raoyuan Zhao 
et.al.|[2408.17437](http://arxiv.org/abs/2408.17437)|**[link](https://github.com/loreley99/syntheval_checklist)**|**Traditional benchmarking in NLP typically relies on static held-out test sets. This approach, however, tends to overestimate performance and cannot provide a comprehensive, interpretable, and dynamic assessment of NLP models. Recent work such as DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) addresses these limitations through behavioral testing of NLP models, with test types generated by a multistep human-annotated pipeline. Unfortunately, creating a variety of test types by hand takes a great deal of human labor and is costly. This work proposes SYNTHEVAL, a hybrid behavioral testing framework that uses large language models (LLMs) to generate a wide range of test types for comprehensive evaluation of NLP models. SYNTHEVAL first generates sentences with LLMs via controlled generation, then identifies challenging examples by comparing the predictions of the LLMs with those of task-specific NLP models. In the final stage, human experts examine the challenging examples, manually design templates, and identify the types of failures the task-specific models consistently exhibit. We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that the framework is effective at finding weaknesses of strong models on these tasks. Our code is available at https://github.com/Loreley99/SynthEval_CheckList.**|\n", 
"2408.17431": "|**2024-08-30**|**Advancing Multi-talker ASR Performance with Large Language Models**|Mohan Shi et.al.|[2408.17431](http://arxiv.org/abs/2408.17431)|null|Recognizing overlapping speech in conversational scenarios is a highly challenging problem in automatic speech recognition (ASR). The conventional approach handles multi-talker ASR with serialized output training (SOT), which concatenates the transcriptions of the speakers in the order in which they start speaking. Such concatenated transcriptions, built from related utterances in a conversation, depend on the ability to model long contexts. Methods based on large language models (LLMs) may therefore be better suited to these complex and challenging scenarios, because they exploit the capabilities of a pretrained decoder. This paper proposes an LLM-based SOT approach to multi-talker ASR that uses a pretrained speech encoder and an LLM, fine-tuned on multi-talker datasets with appropriate strategies. Experiments show that the approach outperforms traditional methods on the simulated LibriMix dataset and achieves state-of-the-art results on the evaluation set of the real-world AMI dataset, significantly outperforming an earlier AED model trained with 1000 times more supervised data.|\n", 
"2408.17404": "|**2024-08-30**|**Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach**|Jialiang Wei et.al.|[2408.17404](http://arxiv.org/abs/2408.17404)|**[link](https://github.com/jl-wei/feature-inspiration)**|Over the past decade, app store (AppStore)-inspired requirements elicitation has proven highly beneficial. Developers often study competitors' apps to gather inspiration for new features. With the advance of generative AI, recent studies have shown the potential of requirements elicitation inspired by large language models (LLMs), which can suggest ideas for new features. Although both approaches are gaining popularity in practice, the differences between them are not well understood. We report a comparative study of the app store-inspired and LLM-inspired approaches for refining features into sub-features. By manually analyzing 1,200 sub-features recommended by the two approaches, we identify their benefits, challenges, and key differences. Both approaches recommend highly relevant sub-features with clear descriptions, but LLMs appear stronger at novelty, particularly for app scopes not seen before. Moreover, some recommended features are imaginary, with unclear feasibility, which underlines the importance of a human analyst in the elicitation process.|\n", 
"2408.17377": "|**2024-08-30**|**NDP: Next Distribution Prediction as a More Broad Target**|Junhao Ruan et.al.|[2408.17377](http://arxiv.org/abs/2408.17377)|null|Large language models (LLMs) trained with the next-token prediction (NTP) paradigm have shown strong capabilities. The existing NTP paradigm nevertheless has several limitations, particularly concerning the complexity of planning tasks and error propagation during inference. Our work extends the critique of NTP, arguing that its limitations also stem from a narrow training objective: predicting a suboptimal one-hot distribution. To support this critique, we ran a preliminary experiment that treats the output distribution of a strong LLM as an efficient compression of world data. By evaluating the similarity between n-gram distributions and the LLM output distribution, we find that n-gram distributions align more closely with the LLM output distribution than one-hot targets do. 
Building on this insight, we introduce Next Distribution Prediction (NDP), which replaces the one-hot targets with n-gram distributions, improving learning without extra online training time. We ran experiments in four areas: translation, general tasks, language transfer, and medical domain adaptation. Compared with NTP, NDP achieves up to +2.97 COMET on translation, an average improvement of +0.61 on general tasks, and an average improvement of +10.75 in the medical domain. This demonstrates the concrete benefit of addressing the narrow-objective problem and points to a new direction for improving NTP.|\n", "2408.17362": "|**2024-08-30**|**Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain**|Francesca Grasso 
et.al.|[2408.17362](http://arxiv.org/abs/2408.17362)|**[link](https://github.com/stefanolocci/LLMClassification)**|**This paper examines the performance of two large language models (LLMs), GPT3.5 and Llama2, and one small language model (SLM), Gemma, on three different classification tasks in the climate change (CC) and environmental domain. We compare these Transformer-based models against BERT-based models used as a baseline. We also assess the models' self-evaluation capabilities by analyzing the calibration of verbalized confidence scores in these text classification tasks. Our findings show that, although BERT-based models generally perform best overall, the performance of the large generative models is still noteworthy. Our calibration analysis further shows that Gemma is well calibrated on the initial tasks but produces inconsistent results thereafter, Llama is reasonably calibrated, and GPT consistently shows strong calibration. With this study we aim to contribute to the discussion on the utility and effectiveness of large generative LMs for addressing the planet's most pressing problems, highlighting their strengths and limitations in the context of ecology and CC.**
|\n", "2408.17354": "|**2024-08-30**|**Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage**|Md Rafi Ur Rashid et.al.|[2408.17354](http://arxiv.org/abs/2408.17354)|null|Fine-tuning large language models on private data for downstream applications carries significant privacy risks, since sensitive information may be exposed. Community platforms now make it easy to distribute large pretrained models, and anyone can publish one without rigorous verification. This considerably raises the privacy threat, because a pretrained model can be deliberately tampered with so that it leaks private data during fine-tuning. This study introduces a novel poisoning technique that uses model unlearning as an attack tool. The approach manipulates a pretrained language model to increase the leakage of private data during fine-tuning. It strengthens both membership inference and data extraction attacks while preserving model utility. Experiments across different models, datasets, and fine-tuning setups show that the attack significantly outperforms baseline performance. The work serves as a warning to users who download pretrained models from sources that have not been rigorously verified and highlights the potential risks.|\n", 
"2408.17316": "|**2024-08-30**|**Bridging Domain Knowledge and Process Discovery Using Large Language Models**|Ali Norouzifar et.al.|[2408.17316](http://arxiv.org/abs/2408.17316)|**[link](https://github.com/alinorouzifar/imr-llm)**|**Discovering good process models is essential for process analysis tasks such as conformance checking and process improvement. Automated process discovery methods often overlook valuable domain knowledge. This knowledge, including insights from domain experts and detailed process documentation, typically goes underused during process discovery. This paper addresses the gap by using large language models (LLMs) to integrate such knowledge directly into process discovery. Rules derived from LLMs guide model construction, keeping the model aligned with both domain knowledge and actual process executions. Integrating LLMs thus bridges process knowledge expressed in natural language and the discovery of robust process models, significantly advancing process discovery methodology. To demonstrate the framework's usability, we conducted a case study with the employee insurance company UWV, which showed its practical benefits and effectiveness.**|\n", "2408.17280": 
"|**2024-08-30**|**Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts**|Rhui Dih Lee et.al.|[2408.17280](http://arxiv.org/abs/2408.17280)|null|We present a toolkit for creating a low-cost Mixture-of-Domain-Experts (MOE) from trained models. The toolkit can create a mixture from either models or adapters. We ran extensive tests and offer guidance on defining the architecture of the resulting MOE with the toolkit. A public repository is available.|\n", "2408.17258": "|**2024-08-30**|**Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach**|Tong Nie et.al.|[2408.17258](http://arxiv.org/abs/2408.17258)|null|The rapid growth of e-commerce and urbanization has greatly intensified delivery operations in urban areas, increasing both the volume and the complexity of delivery demand. To meet these challenges, data-driven predictive methods, especially those based on machine learning, have begun to play a key role in urban delivery demand management. One problem that remains understudied is the joint estimation and prediction of city-wide delivery demand. We formulate it as a graph-based spatiotemporal learning task. 
First, we define a message-passing neural network that captures the interactions between the demand patterns of related regions. Second, drawing on recent advances in large language models, we extract general geospatial knowledge encodings from unstructured location data and integrate them into the demand predictor. Finally, to make the model transferable across cities, we design an end-to-end inductive training scheme. Extensive experiments on two real-world delivery datasets, including eight cities in China and cities in the US, show that our model significantly outperforms existing baselines on these challenging tasks.|\n", "2408.17253": "|**2024-08-30**|**VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters**|Mouxiang Chen 
et.al.|[2408.17253](http://arxiv.org/abs/2408.17253)|**[link](https://github.com/keytoyze/visionts)**|**This paper explores a new path to building time series forecasting (TSF) foundation models from rich, high-quality natural images. Existing approaches develop TSF foundation models either by fine-tuning large language models (LLMs) or by building large-scale time series datasets, and they face severe challenges from the cross-domain gap or from in-domain heterogeneity. Drawing on the intrinsic similarity between images and time series, we reformulate the TSF task as an image reconstruction task that can be handled by a visual masked autoencoder (MAE) self-supervised pretrained on the ImageNet dataset. 
Surprisingly, without any further adaptation to the time series domain, the proposed VisionTS achieves better zero-shot forecasting performance than existing TSF foundation models. With minimal fine-tuning, VisionTS improves further and reaches state-of-the-art performance in most cases. These findings suggest that visual models could be a free lunch for TSF and highlight the potential of future cross-domain research between computer vision and TSF. Our code is publicly available at https://github.com/Keytoyze/VisionTS.**|\n", "2409.02920": "|**2024-09-04**|**RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)**|Yao Mu et.al.|[2409.02920](http://arxiv.org/abs/2409.02920)|null|This paper introduces RoboTwin, a new benchmark dataset that combines real-world teleoperated data with synthetic data generated from digital twins. RoboTwin is designed for dual-arm robot scenarios, with a particular focus on tool use and human-robot interaction. Using the COBOT Magic platform, we collected diverse data covering tool manipulation and human-robot interaction. 
The paper presents a novel approach to creating digital twins that uses AI-generated content to turn 2D images into detailed 3D models. We also use large language models to generate expert-level training data and task-specific, functionality-oriented pose sequences. Our main contributions are: 1. the RoboTwin benchmark dataset, 2. an efficient real-to-simulation pipeline, and 3. the use of language models for automatic expert-level data generation. These advances aim to address the shortage of robot training data and may accelerate the development of more capable and versatile robotic systems for a wide range of real-world applications. The project page is available at https://robotwin-benchmark.github.io/early-version/|\n", "2409.02897": "|**2024-09-05**|**LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA**|Jiajie Zhang 
et.al.|[2409.02897](http://arxiv.org/abs/2409.02897)|**[link](https://github.com/THUDM/LongCite)**|Although current long-context large language models (LLMs) answer user questions over extensive text impressively well, their lack of citations makes the answers hard to verify, which raises concerns about trustworthiness given that they may produce incorrect information. This work aims to enable long-context LLMs to generate responses with fine-grained sentence-level citations, improving their faithfulness and verifiability. We first introduce LongBench-Cite, an automated benchmark for assessing how current LLMs perform on long-context question answering with citations, which reveals considerable room for improvement. We then propose CoF (Coarse to Fine), a novel pipeline that uses off-the-shelf LLMs to automatically generate long-context QA instances with precise sentence-level citations, and use it to build LongCite-45k, a large-scale training dataset for this task. 
Finally, we train LongCite-8B and LongCite-9B on LongCite-45k, enabling them to generate accurate responses and fine-grained sentence-level citations in a single output. Evaluation on LongBench-Cite shows that the trained models achieve state-of-the-art citation quality, surpassing advanced proprietary models including GPT-4.|\n", "2409.02889": "|**2024-09-04**|**LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture**|Xidong Wang et.al.|[2409.02889](http://arxiv.org/abs/2409.02889)|**[link](https://github.com/freedomintelligence/longllava)**|**\u6269\u5c55\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u5bf9\u4e8e\u89c6\u9891\u7406\u89e3\u3001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u7406\u89e3\u548c\u591a\u6a21\u6001\u4ee3\u7406\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u6d89\u53ca\u5230\u4e00\u7cfb\u5217\u7cfb\u7edf\u4f18\u5316\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u6570\u636e\u6784\u9020\u548c\u8bad\u7ec3\u7b56\u7565\uff0c\u5c24\u5176\u662f\u89e3\u51b3\u968f\u7740\u66f4\u591a\u56fe\u50cf\u5f15\u5165\u800c\u51fa\u73b0\u7684\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u9ad8\u6602\u8ba1\u7b97\u6210\u672c\u7b49\u95ee\u9898\u3002\u672c\u6587\u901a\u8fc7\u5c06\u6a21\u578b\u67b6\u6784\u8c03\u6574\u4e3aMamba\u548cTransformer\u5757\u7684\u6df7\u5408\u4f53\u3001\u91c7\u7528\u65e2\u80fd\u8003\u8651\u591a\u4e2a\u56fe\u50cf\u95f4\u65f6\u95f4\u4f9d\u8d56\u6027\u53c8\u80fd\u8003\u8651\u7a7a\u95f4\u4f9d\u8d56\u6027\u7684\u6570\u636e\u6784\u9020\u65b9\u6cd5\uff0c\u5e76\u5b9e\u65bd\u6e10\u8fdb\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5bf9\u8fd9\u4e9b\u6311\u6218\u8fdb\u884c\u4e86\u5e94\u5bf9\u3002\u53d1\u5e03\u7684\u6a21\u578b\u201cLongLLaVA\u201d\uff08\u957f\u671f\u8bed\u8a00\u4e0e\u89c6\u89c9\u52a9\u624b\uff09\u662f\u9996\u4e2a\u6d
f7\u5408\u578bMLLM\uff0c\u5b9e\u73b0\u4e86\u6548\u7387\u4e0e\u6548\u679c\u4e4b\u95f4\u7684\u826f\u597d\u5e73\u8861\u3002LongLLaVA\u4e0d\u4ec5\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u800c\u4e14\u4fdd\u6301\u4e86\u9ad8\u541e\u5410\u91cf\u548c\u4f4e\u5185\u5b58\u6d88\u8017\u7684\u7279\u70b9\u3002\u7279\u522b\u5730\uff0c\u5b83\u80fd\u591f\u5728\u5355\u4e2aA100 80GB GPU\u4e0a\u5904\u7406\u8fd1\u4e00\u5343\u5f20\u56fe\u7247\uff0c\u5c55\u793a\u4e86\u5e7f\u6cdb\u4efb\u52a1\u5e94\u7528\u524d\u666f\u7684\u6f5c\u529b\u3002**|\n", "2409.02841": "|**2024-09-04**|**Historical German Text Normalization Using Type- and Token-Based Language Modeling**|Anton Ehrmanntraut et.al.|[2409.02841](http://arxiv.org/abs/2409.02841)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf91700\u5e74\u81f31900\u5e74\u5fb7\u56fd\u6587\u5b66\u6587\u672c\u7684\u6b63\u8bcd\u6cd5\u89c4\u8303\u5316\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u5e73\u884c\u8bed\u6599\u5e93\u8bad\u7ec3\u3002\u6240\u63d0\u51fa\u7684\u7cfb\u7edf\u5229\u7528\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u548cTransformer\u8bed\u8a00\u6a21\u578b\uff0c\u7ed3\u5408\u7f16\u7801\u5668-\u89e3\u7801\u5668\u6a21\u578b\u5bf9\u5355\u4e2a\u8bcd\u6c47\u7c7b\u578b\u8fdb\u884c\u89c4\u8303\u5316\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u56e0\u679c\u8bed\u8a00\u6a21\u578b\u5728\u4e0a\u4e0b\u6587\u4e2d\u8c03\u6574\u8fd9\u4e9b\u89c4\u8303\u5316\u7ed3\u679c\u3002\u5e7f\u6cdb\u8bc4\u4f30\u8868\u660e\uff0c\u8be5\u63d0\u51fa\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u6700\u5148\u8fdb\u7684\u51c6\u786e\u6027\uff0c\u4e0e\u5b8c\u5168\u7aef\u5230\u7aef\u7684\u53e5\u5b50\u7ea7\u89c4\u8303\u5316\u7cfb\u7edf\u76f8\u5f53\uff0c\u8be5\u7cfb\u7edf\u662f\u901a\u8fc7\u5bf9\u9884\u8bad\u7ec3\u7684Transformer\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u800c\u5b9e\u73b0\u7684\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6a21\u578b\u96be\u4ee5\u6cdb\u5316\u4ee5\u53ca\u7f3a\u4e4f\u5927\u91
cf\u9ad8\u8d28\u91cf\u5e73\u884c\u6570\u636e\uff0c\u5386\u53f2\u6587\u672c\u7684\u89c4\u8303\u5316\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002|\n", "2409.02836": "|**2024-09-04**|**Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models**|Moein Shahiki Tash et.al.|[2409.02836](http://arxiv.org/abs/2409.02836)|null|\u672c\u6587\u901a\u8fc7\u8fd0\u7528\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\uff0c\u5bf9\u52a0\u5bc6\u8d27\u5e01\u76f8\u5173\u8ba8\u8bba\u4e2d\u7684\u9884\u6d4b\u9648\u8ff0\u3001\u5e0c\u671b\u6f14\u8bb2\u53ca\u6094\u6068\u68c0\u6d4b\u884c\u4e3a\u8fdb\u884c\u5206\u6790\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5206\u7c7b\u65b9\u6cd5\u2014\u2014\u201c\u9884\u6d4b\u9648\u8ff0\u201d\uff0c\u5c06\u5176\u7ec6\u5206\u4e3a\u9884\u6d4b\u589e\u52a0\u3001\u9884\u6d4b\u51cf\u5c11\u3001\u9884\u6d4b\u4e2d\u7acb\u6216\u975e\u9884\u6d4b\u7c7b\u522b\u3002\u5229\u7528GPT-4o\u8fd9\u4e00\u524d\u6cbf\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u5728\u4e94\u5927\u4e3b\u6d41\u52a0\u5bc6\u8d27\u5e01\uff08Cardano\u3001Binance\u3001Matic\u3001Fantom\u3001Ripple\uff09\u7684\u8ba8\u8bba\u4e2d\u63a2\u7d22\u4e86\u60c5\u7eea\u52a8\u6001\u3002\u7814\u7a76\u53d1\u73b0\uff0cMatic\u5728\u4e50\u89c2\u9884\u6d4b\u65b9\u9762\u663e\u793a\u51fa\u7279\u522b\u9ad8\u7684\u503e\u5411\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5e0c\u671b\u4e0e\u6094\u6068\u60c5\u7eea\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\uff0c\u63ed\u793a\u4e86\u8fd9\u4e9b\u60c5\u611f\u4e0e\u9884\u6d4b\u884c\u4e3a\u4e4b\u95f4\u590d\u6742\u7684\u4e92\u52a8\u6a21\u5f0f\u3002\u5c3d\u7ba1\u9762\u4e34\u6570\u636e\u91cf\u548c\u8d44\u6e90\u53ef\u7528\u6027\u65b9\u9762\u7684\u9650\u5236\uff0c\u6211\u4eec\u7684\u7814\u7a76\u4ecd\u63ed\u793a\u4e86\u52a0\u5bc6\u8d27\u5e01\u5e02\u573a\u6295\u8d44\u8005\u884c\u4e3a\u548c\u60c5\u7eea\u8d8b\u52bf\u7684\u91cd\u8981\u53d1\u73b0\uff0c\u4e3a\u6218\u
7565\u51b3\u7b56\u548c\u672a\u6765\u7814\u7a76\u63d0\u4f9b\u4e86\u4fe1\u606f\u3002|\n", "2409.02834": "|**2024-09-04**|**CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models**|Wentao Liu et.al.|[2409.02834](http://arxiv.org/abs/2409.02834)|**[link](https://github.com/ecnu-icalk/educhat-math)**|\u672c\u6587\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aCMM-Math\u7684\u4e2d\u6587\u591a\u6a21\u6001\u6570\u5b66\u6570\u636e\u96c6\uff0c\u5305\u542b\u57fa\u51c6\u548c\u8bad\u7ec3\u90e8\u5206\uff0c\u65e8\u5728\u8bc4\u4f30\u548c\u589e\u5f3a\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\u3002CMM-Math\u5305\u542b\u4e86\u8d85\u8fc728,000\u4e2a\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u4ece\u5c0f\u5b66\u5230\u9ad8\u4e2d\u7684\u4e2d\u56fd12\u4e2a\u5e74\u7ea7\u7684\u591a\u79cd\u95ee\u9898\u7c7b\u578b\uff08\u4f8b\u5982\u9009\u62e9\u9898\u3001\u586b\u7a7a\u9898\u7b49\uff09\uff0c\u5e76\u63d0\u4f9b\u4e86\u8be6\u7ec6\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7279\u522b\u5730\uff0c\u95ee\u9898\u6216\u89c2\u70b9\u4e2d\u53ef\u80fd\u5305\u542b\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u4f7f\u5f97\u8fd9\u4e2a\u6570\u636e\u96c6\u66f4\u5177\u6311\u6218\u6027\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u6700\u5148\u8fdb\u7684LMM\u5728CMM-Math\u6570\u636e\u96c6\u4e0a\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728LMM\u5f00\u53d1\u65b9\u9762\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u5fc5\u8981\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMultimodal Mathematical 
LMM (Math-LMM) to handle problems with mixed inputs of multiple images and text segments. We train the model in three stages: foundational pre-training, foundational fine-tuning, and mathematical fine-tuning. Extensive experiments show that, compared with SOTA LMMs on three multimodal mathematical datasets, our model effectively improves mathematical reasoning performance.|\n", "2409.02828": "|**2024-09-04**|**ExpLLM: Towards Chain of Thought for Facial Expression Recognition**|Xing Lan et.al.|[2409.02828](http://arxiv.org/abs/2409.02828)|null|Facial expression recognition (FER) is a critical task in multimedia with significant implications across various applications. However, understanding the causes of facial expressions is essential for recognizing them accurately. Current approaches, such as those based on facial action units (AUs), typically provide AU names and intensities but offer little insight into the interactions among AUs or their relationship to the overall expression. This paper proposes ExpLLM, a novel method that uses large language models to generate an accurate chain of thought (CoT) for facial expression recognition. We design the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion. The key observations describe each AU's name, intensity, and
associated emotions. The overall emotional interpretation analyzes multiple AUs and their interactions to identify the dominant emotions and their relationships. Finally, the conclusion derives the final expression label from the preceding analysis. We also introduce the Exp-CoT Engine, which constructs this expression CoT and generates instruction-description data for training ExpLLM. Extensive experiments on the RAF-DB and AffectNet datasets show that ExpLLM outperforms current state-of-the-art FER methods. ExpLLM also surpasses the latest GPT-4o in recognizing micro-expressions, especially in cases where GPT-4o frequently fails.|\n", "2409.02823": "|**2024-09-04**|**Design Contradictions: Help or Hindrance?**|Aron E.
Owen et.al.|[2409.02823](http://arxiv.org/abs/2409.02823)|null|In data visualization, the pressing need for innovative thinking drives us to explore new creative approaches. Combining two or more creative words with opposing meanings can spark novel ideas and designs and has a positive effect on the creative process. As AI-driven design develops, a key question emerges: can these design contradictions work together with AI tools? At present, the answer is no. AI systems, and large language models (LLMs) in particular, rely on algorithms that produce similarity, whereas creativity often requires divergence and novelty. This poster opens a conversation about how to steer AI systems toward being more creative and generating new ideas. The research invites us to reconsider traditional design methods and to explore new approaches in an AI-driven world. Can we apply traditional design methods such as the double diamond model, or do we need new design engineering methods? How can generative AI be used to design visualizations quickly and to ideate? This paper aims to start this important conversation and to offer practical insights into the potential of AI to advance
creativity in data visualization.|\n", "2409.02822": "|**2024-09-04**|**Language Understanding as a Constraint on Consensus Size in LLM Societies**|Giordano De Marzo et.al.|[2409.02822](http://arxiv.org/abs/2409.02822)|null|As applications of large language models (LLMs) move toward collaborative tasks, multiple agents interact with one another much like an LLM society. In this setting, large groups of LLMs could reach consensus, in a self-organized way, on arbitrary norms for which no information supports one option over another. To understand whether LLMs, like human societies, can reach consensus without institutions, we apply methods from complexity science and principles from behavioral science, pioneering a new approach of AI anthropology. We find that LLMs can reach consensus in groups, and that their opinion dynamics can be understood with a function parameterized by a majority force coefficient, which determines whether consensus is possible. This majority force is stronger for models with higher language understanding capability and weaker for larger groups, so there is a critical group size beyond which consensus becomes
impossible for a given LLM. This critical group size grows exponentially with the model's language understanding capability, and for the most advanced models it can reach orders of magnitude far beyond the typical size of informal human groups.|\n", "2409.02795": "|**2024-09-04**|**Towards a Unified View of Preference Learning for Large Language Models: A Survey**|Bofei Gao et.al.|[2409.02795](http://arxiv.org/abs/2409.02795)|**[link](https://github.com/kbsdjames/awesome-llm-preference-learning)**|Large language models (LLMs) exhibit remarkable capabilities. One key factor behind this success is aligning LLM outputs with human preferences. This alignment process often needs only a small amount of data to improve LLM performance efficiently. Although effective, research in this area spans multiple domains, and the methods involved are relatively complex and hard to understand. The relationships between different methods remain under-explored, which limits the development of preference alignment strategies. In light of this, we break down existing popular alignment strategies into their components and provide a unified framework for studying current alignment strategies, thereby establishing connections among them. In this survey, we decompose all preference learning strategies into
four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and opens up the possibility of combining the strengths of different strategies. We also present detailed working examples of prevalent existing algorithms to give readers a comprehensive understanding. Finally, based on our unified perspective, we discuss the challenges and future research directions for aligning large language models with human preferences.|\n", "2409.03752": "|**2024-09-05**|**Attention Heads of Large Language Models: A Survey**|Zifan Zheng et.al.|[2409.03752](http://arxiv.org/abs/2409.03752)|**[link](https://github.com/iaar-shanghai/awesome-attention-heads)**|**Since the advent of ChatGPT, large language models have excelled at a wide range of tasks, yet they remain black-box systems. As a result, their development relies mainly on data-driven approaches, which limits the potential to improve performance by changing internal architecture and reasoning pathways. Many researchers have begun to explore the internal mechanisms of large language models, aiming to identify the nature of their reasoning bottlenecks, with most studies focusing on attention heads. Our survey aims to shed light on the internal reasoning processes of large language models by focusing on their interpretability and on the underlying
mechanisms of attention heads. We first distill the human thought process into a four-stage framework: knowledge recalling, in-context identification, latent reasoning, and expression preparation. Using this framework, we systematically review existing research to identify and categorize the functions of specific attention heads. We also summarize the experimental methodologies used to discover these special heads, dividing them into two categories: modeling-free methods and modeling-required methods. We further outline relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions. Our reference list is open-sourced.**|\n", "2409.03735": "|**2024-09-05**|**LLM-CI: Assessing Contextual Integrity Norms in Language Models**|Yan Shvartzshnaider
et.al.|[2409.03735](http://arxiv.org/abs/2409.03735)|null|Large language models (LLMs), while memorizing parts of their training data collected from the Internet, may also inadvertently encode societal preferences and norms. As these models are integrated into sociotechnical systems, it is crucial that the norms they encode align with societal expectations. These norms can vary across models, hyperparameters, optimization techniques, and datasets. Because of prompt sensitivity, where small variations in a prompt yield different responses, existing assessment methods become unreliable. A comprehensive framework is needed that covers various models, optimizations, and datasets and provides a reliable way to assess encoded norms.
We present LLM-CI, the first open-source framework for assessing the privacy norms encoded in LLMs. LLM-CI uses a vignette methodology based on Contextual Integrity factors to assess the encoded norms across different contexts and different LLMs. We propose a multi-prompt assessment methodology that addresses prompt sensitivity by assessing norms only from prompts that yield consistent responses across multiple variants. Using LLM-CI and this methodology, we comprehensively evaluate LLMs on the IoT and COPPA vignette datasets from prior work, examining the impact of model properties (e.g., hyperparameters, capacity) and optimization strategies (e.g., alignment, quantization).|\n", "2409.03734": "|**2024-09-05**|**Safety vs.
Performance: How Multi-Objective Learning Reduces Barriers to Market Entry**|Meena Jagadeesan et.al.|[2409.03734](http://arxiv.org/abs/2409.03734)|null|This paper studies, from both an economic and an algorithmic perspective, the concentration of markets for large-scale machine learning (ML) models such as large language models, and whether there are insurmountable barriers to entering such markets. We examine how entry barriers can be lowered by formalizing a multi-objective high-dimensional regression framework that captures reputational damage, and we analyze the number of samples a new company needs to enter the market. Our results show that multi-objective considerations can fundamentally reduce barriers to entry: the required number of samples can be far smaller than the incumbent company's dataset size. In proving these results, we also develop scaling laws for high-dimensional linear regression in multi-objective environments, showing that the scaling rate becomes slower when the dataset is large, a finding that may be of independent interest.|\n", "2409.03733": "|**2024-09-05**|**Planning In Natural Language Improves LLM Search For Code Generation**|Evan Wang
et.al.|[2409.03733](http://arxiv.org/abs/2409.03733)|**[link](https://github.com/scaleapi/plansearch)**|While training compute has been scaled up massively, scaling inference compute has not yet brought comparable gains. We hypothesize that a key missing ingredient is the lack of diverse model outputs, which makes search inefficient because models repeatedly produce highly similar yet incorrect generations. We show empirically that increasing output diversity effectively mitigates this problem. Building on this finding, we propose PLANSEARCH, a novel search algorithm that shows strong results on HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). The algorithm generates a diverse set of observations about a problem and uses them to construct solution strategies, exploring a much broader space of potential solutions than traditional methods. Using PLANSEARCH on top of Claude 3.5
Sonnet achieves a pass@200 of 77.0% on LiveCodeBench, outperforming both the result obtained without search (pass@1 = 41.4%) and methods that rely only on repeated sampling (pass@200 = 60.6%). In addition, we show that the performance gains from search can be accurately predicted, with the key factor being the diversity of the generated ideas.|\n", "2409.03708": "|**2024-09-06**|**RAG based Question-Answering for Contextual Response Prediction System**|Sriram Veturi et.al.|[2409.03708](http://arxiv.org/abs/2409.03708)|null|This paper introduces an end-to-end framework that uses the retrieval-augmented generation (RAG) capabilities of large language models (LLMs) for question answering in a real-world industrial setting. Given a customer query, the system retrieves relevant knowledge documents and combines them with the previous chat history to generate response suggestions for customer service agents in a retail company's contact center. Comprehensive automated and human evaluations show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent aid to human customer service representatives by lightening their workload.
|\n", "2409.03671": "|**2024-09-05**|**TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems**|Stylianos Loukas Vasileiou et.al.|[2409.03671](http://arxiv.org/abs/2409.03671)|null|We present TRACE-cs, a novel hybrid system that combines symbolic reasoning with large language models (LLMs) to address contrastive queries in scheduling problems. TRACE-cs uses SAT solving techniques to encode scheduling constraints and generate explanations for user queries, while an LLM translates the user's queries into logical clauses and refines the explanations produced by the symbolic solver into natural language sentences. By integrating these components, our approach demonstrates the potential of combining symbolic methods with LLMs to create explainable AI agents with correctness guarantees.|\n", "2409.03668": "|**2024-09-05**|**A Fused Large Language Model for Predicting Startup Success**|Abdurahman Maarouf
et.al.|[2409.03668](http://arxiv.org/abs/2409.03668)|null|To help investors make effective decisions and continually find profitable venture investment opportunities, the success of startups needs to be predicted. Today, investors can draw not only on various fundamental information about a startup (such as its age, the number of founders, and its business sector) but also on textual descriptions of its innovation and business model, which are available through online venture capital (VC) platforms such as Crunchbase. To support investors' decisions, we develop a machine learning approach that aims to locate successful startups on VC platforms. Specifically, we develop, train, and evaluate a tailored, fused large language model for predicting startup success. Our work assesses to what extent the self-descriptions of companies on VC platforms are predictive of their success. Using 20,172 online profiles from Crunchbase, we find that our fused large language model can predict startup success, with the textual self-descriptions contributing a significant share of the predictive power. Our work provides
a decision support tool that helps investors find profitable investment opportunities.|\n", "2409.03662": "|**2024-09-05**|**The representation landscape of few-shot learning and fine-tuning in large language models**|Diego Doimo et.al.|[2409.03662](http://arxiv.org/abs/2409.03662)|**[link](https://github.com/diegodoimo/geometry_icl_finetuning)**|**This paper examines two common strategies for improving the performance of modern large language models (LLMs) on specific tasks: in-context learning (ICL) and supervised fine-tuning (SFT). Although the two methods differ in nature, they often yield similar performance gains. However, little is known about whether they induce similar representations inside LLMs. We address this question by analyzing the probability landscape of the hidden representations in the two cases. Specifically, we compare how LLMs solve the same question-answering task and find that ICL and SFT create very different internal structures, both of which undergo a sharp transition in the middle of the network. In the first half of the model, ICL shapes interpretable, hierarchically organized representations that are ordered by their semantic content. By contrast, the probability landscape obtained with SFT is fuzzier and semantically mixed. In the second half of the network,
the fine-tuned representations develop probability modes that better encode the identity of the answers, while the probability peaks of the ICL representations are less well defined. Our approach reveals the diverse computational strategies LLMs adopt to solve the same task under different conditions, which is a step toward designing optimal methods for extracting information from language models.**|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**Although large language models (LLMs) have made significant progress in automatic poetry generation, the generated poetry lacks diversity, and the training process differs greatly from how humans learn. Motivated by this, we propose a framework based on social learning in which we emphasize non-cooperative interactions, in addition to cooperative ones, to encourage diversity. Our experiments are the first attempt at poetry generation in non-cooperative environments using both training-based multi-agent systems (GPT-2) and prompting-based systems (GPT-3 and GPT-4).
Based on an evaluation of 96,000 generated poems, our framework benefits the poetry generation process of training-based agents: diversity increases by 3.0-3.7 percentage points (pp) and novelty by 5.6-11.3 pp, as measured by distinct and novel n-grams. The generated poetry also exhibits group divergence in lexicon, style, and semantics. Prompting-based agents in our framework also benefit from non-cooperative environments, and a more diverse ensemble of models with non-homogeneous agents has the potential to further enhance diversity, with experiments showing an increase of 7.0-17.5 pp. However, prompting-based agents show a decline in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. This paper argues for a paradigm shift in creative tasks such as automatic poetry generation toward social learning processes that resemble human interaction (modeled with LLM-based agents), in order to promote more diverse and innovative generation.**|\n", "2409.03512": "|**2024-09-05**|**From MOOC to MAIC: Reshaping Online Teaching and Learning through
LLM-driven Agents**|Jifan Yu et.al.|[2409.03512](http://arxiv.org/abs/2409.03512)|null|Since the earliest instances of online education, in which courses were uploaded to accessible and shared online platforms, this way of extending the reach of knowledge to a wider audience has prompted extensive discussion and seen widespread adoption. Recognizing that personalized learning still leaves room for improvement, AI technologies have been continually integrated into this learning format, giving rise to a variety of educational AI applications such as educational recommendation and intelligent tutoring. The emergence of intelligence in large language models (LLMs) allows these educational enhancements to be built on a unified foundation model, enabling deeper integration. Against this background, we propose MAIC (Massive AI-empowered Course), a new form of online education that uses LLM-driven multi-agent systems to build AI-augmented classrooms, balancing scalability with adaptivity. Beyond exploring the conceptual framework and technical innovations, we conducted preliminary experiments at Tsinghua University, one of China's leading universities. Drawing on more than 100,000 learning records from over 500 students, we obtained valuable observations and initial analyses. The project will continue to evolve, with the ultimate goal of establishing a comprehensive open platform that supports and unifies research, technology, and applications in exploring the possibilities of online education in the era of large-model AI. We envision this platform as a collaborative hub that brings together educators, researchers, and innovators to jointly explore the future of AI-driven online education.|\n", "2409.04421": "|**2024-09-06**|**RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs**|Jiaxing Wu et.al.|[2409.04421](http://arxiv.org/abs/2409.04421)|null|This paper introduces Reinforcement Learning from Prediction Feedback (RLPF), a method that addresses a problem large language models (LLMs) face in personalization systems. Specifically, when LLMs predict behavior from a user's past activities, their effectiveness often depends on how well they can exploit extensive, long user history data, which is typically noisy and overly long. Summaries generated by existing pretrained LLMs may be concise but lack the context that downstream tasks need, which limits their usefulness in personalization systems. To overcome this challenge, RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills the key information in extensive user history data while preserving the information that matters for downstream tasks. Experimental results show that, compared with baseline methods, RLPF improves downstream task performance by 22%, reaches a win rate of 84.59% on factuality, abstractiveness, and readability, reduces context length by 74%, and improves performance on 16 unseen tasks and/or datasets, indicating good generalization. In summary, RLPF offers a promising way to strengthen LLM personalization by transforming long, noisy user histories into informative, easy-to-understand representations.|\n", "2409.04388": "|**2024-09-06**|**Question-Answering Dense Video Events**|Hangyu Qin 
et.al.|[2409.04388](http://arxiv.org/abs/2409.04388)|null|In this paper, we propose a new task: question answering and grounding for dense events in long videos, which requires a model to accurately understand and reason about multiple events that unfold over extended periods of time. To support this research, we construct DeVE-QA, a dataset containing 78,000 questions about 26,000 events in 10,600 long videos. Existing multimodal large language models (MLLMs) that excel at single-event question answering struggle on DeVE-QA, which indicates limitations in understanding and reasoning about multiple events occurring over long time spans. We therefore propose DeVi, a new training-free approach for improving MLLM performance. DeVi augments existing MLLMs with three key modules: a hierarchical captioning module, a temporal event memory module, and a self-consistency checking module. These modules are used, respectively, to detect, contextualize, and memorize dense events in long videos, and to ground the relevant video segments for question answering. Experimental results show that DeVi outperforms existing MLLMs at answering dense-event questions and grounding the relevant video segments. Specifically, DeVi improves G(round)QA accuracy by 4.1% on DeVE-QA and by 3.7% on NExT-GQA.|\n", "2409.04318": "|**2024-09-06**|**Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs**|Aliakbar Nafar et.al.|[2409.04318](http://arxiv.org/abs/2409.04318)|**[link](https://github.com/HLR/LvsR-LLM)**|This paper proposes a framework for evaluating the in-context learning mechanisms of generative large language models (LLMs). Focusing on regression tasks, we claim that these mechanisms are a combination of retrieving internal knowledge and learning from in-context examples. First, we show that LLMs can perform regression on real-world datasets, and we design experiments to measure the extent to which a model performs in-context learning by retrieving its internal knowledge rather than by learning from the in-context examples. We argue that this process lies on a continuum between these two extremes. We provide an in-depth analysis of the degree to which these mechanisms are triggered depending on various factors, such as prior knowledge about the task and the type and richness of the information provided by the in-context examples. We use three LLMs and multiple datasets to verify the robustness of our findings. Our results shed light on how to leverage meta-learning from in-context examples and foster knowledge retrieval depending on the problem being addressed.|\n", "2409.04312": "|**2024-09-06**|**An optically accelerated extreme learning machine using hot atomic vapors**|Pierre Azam et.al.|[2409.04312](http://arxiv.org/abs/2409.04312)|null|Machine learning is becoming a widely used technique, and its impressive growth stems from the diversity of practical solutions it offers to problems of societal concern. However, as applications and the resources they require increase, current hardware technologies are starting to reach their limits. For new machine learning domains such as large language models or high-resolution image recognition in particular, computing time and energy cost have become critical issues. In this context, optical platforms have been designed over the years with the goal of developing more efficient machine learning hardware. Among them, free-space propagation platforms offer several advantages: parallelism, low energy consumption, and computational speed. This paper presents a new design that combines the strong and tunable nonlinear properties of a light beam propagating through a hot atomic vapor with an extreme learning machine model. Through numerical simulation and experimental validation, we demonstrate the effect of using such free-space nonlinear propagation to enhance training on the MNIST image classification task. In addition, we identify several experimental hyperparameters that could be further optimized to improve the accuracy of the platform.|\n", "2409.04286": "|**2024-09-06**|**Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets**|Desiree Heim 
et.al.|[2409.04286](http://arxiv.org/abs/2409.04286)|null|Publicly available knowledge work datasets currently lack diversity, extensive annotations, and contextual information about users and their documents, which hinders objective and comparable data-driven evaluation and optimization of knowledge work assistance systems. Because collecting such data in real-life settings requires immense resources and the data must be censored, building such a dataset is nearly impossible. We therefore propose a configurable multi-agent knowledge work dataset generator. The system simulates collaborative knowledge work among agents that produce LLM-generated documents, and it records the accompanying data traces. In addition, the generator stores all background information, whether given in its configuration or created during the simulation, in a knowledge graph. Finally, the resulting dataset can be used and shared without privacy or confidentiality concerns. This paper presents the design and vision of our approach and focuses on generating authentic knowledge work documents with large language models. In our study, human raters judged 53% of the generated documents and 74% of the real documents to be realistic, which demonstrates the potential of our approach. We also analyze the authenticity criteria mentioned in participants' comments and elaborate on potential improvements for the common issues identified.|\n", "2409.04270": "|**2024-09-06**|**Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models**|Yuxiao Huang et.al.|[2409.04270](http://arxiv.org/abs/2409.04270)|null|This paper introduces an optimization paradigm based on large language models (LLMs) that establishes an autonomous model factory for generating knowledge transfer models tailored to different optimization tasks. By automating the design process, the approach aims to achieve knowledge transfer that is both efficient and effective. To evaluate the proposed method, we conducted comprehensive empirical studies comparing the generated knowledge transfer models with existing state-of-the-art knowledge transfer methods. The results show that the generated models achieve performance superior or comparable to hand-crafted knowledge transfer models in terms of both efficiency and effectiveness.|\n", "2409.04183": "|**2024-09-06**|**GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding**|Ziyin Zhang et.al.|[2409.04183](http://arxiv.org/abs/2409.04183)|null|In this work, we propose GALLa: Graph Aligned Large Language Models. GALLa uses graph neural networks and cross-modal alignment techniques to inject the structural information of code into LLMs as an auxiliary task during fine-tuning. The framework is both model-agnostic and task-agnostic: it can be applied to any code LLM for any downstream code task, it requires structural graph data only at training time from a corpus unrelated to the fine-tuning data, and it incurs no additional cost at inference time. Experiments on five code tasks with four different baseline LLMs, ranging in size from 350M to 8B parameters, validate the effectiveness of GALLa and show consistent improvements even for strong models such as LLaMA3.|\n", "2409.04181": "|**2024-09-06**|**Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering**|Larissa Pusch 
et.al.|[2409.04181](http://arxiv.org/abs/2409.04181)|null|Advances in natural language processing have greatly changed how we interact with information systems such as databases, making them more accessible. However, challenges remain in accuracy-critical domains such as biomedicine. A key issue is hallucination, where a model generates information that is not supported by the underlying data, which can lead to dangerous misinformation. This paper presents a novel approach that combines large language models (LLMs) with knowledge graphs (KGs) to improve the accuracy and reliability of question-answering systems, using a biomedical KG as an example. Built on the LangChain framework, the method introduces a query checker that ensures the syntactic and semantic validity of LLM-generated queries; the validated queries are then used to extract information from the knowledge graph, which substantially reduces errors such as hallucinations. We evaluated overall performance on a new benchmark dataset of 50 biomedical questions, testing several LLMs including GPT-4 Turbo and llama3:70b. The results show that while GPT-4 Turbo excels at generating accurate queries, open-source models such as llama3:70b also show promise with appropriate prompt engineering. To make the approach accessible, we developed a user-friendly web interface that lets users enter natural language queries, view the generated and corrected Cypher queries, and verify the accuracy of the resulting paths. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for improving question-answering systems. The source code for generating the results in this paper and for the user interface is available in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui|\n", "2409.04168": "|**2024-09-06**|**From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks**|Andreas Stephan 
et.al.|[2409.04168](http://arxiv.org/abs/2409.04168)|null|To reduce the need for human annotation, large language models (LLMs) have been proposed as judges of the quality of other candidate models. These LLM judges are typically evaluated by measuring how well they correlate with human judgments on generative tasks such as summarization or machine translation. In contrast, we study LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning, and the correctness of their solutions is verifiable, which enables a more objective evaluation. We perform a detailed performance analysis and find that the judges we use are mostly unable to improve task performance but are able to pick the better model. Our analysis reveals a strong correlation between judgment performance and the task performance of the candidate models. We observe that judges tend to choose the higher-quality model even when its answer is incorrect. Furthermore, we show that judgment performance can be predicted from statistics such as the task performance of the candidate models. In an ablation, we swap or mask the candidate answers and observe that judges often keep their original judgment, which is evidence that judges incorporate writing style into their judgments. In summary, we find that regularities in the judgments can be quantified with statistical measures, and we provide various perspectives on exploiting them.|\n", "2409.04164": "|**2024-09-06**|**Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation**|Luis Mayer et.al.|[2409.04164](http://arxiv.org/abs/2409.04164)|null|In recent years, large language models (LLMs) have emerged as powerful tools with potential applications in many fields, including software engineering. In this study, we evaluate five state-of-the-art LLMs (Bard, BingChat, ChatGPT, Llama2, and Code Llama) on their capabilities for text-to-code generation. We prompt the models with textual descriptions of coding problems from the programming website LeetCode and ask them to write solutions in Python. We then assess the quality of the generated outputs using LeetCode's testing functionality. The results show large differences in performance between the models. ChatGPT handles these programming challenges most effectively, surpassing even code-specialized models such as Code Llama. To gain further insight, we measure the runtime and memory usage of the generated code and compare them with other code submissions on LeetCode. A detailed error analysis, which includes a comparison of correct indentation and form in the generated code and the assignment of unsolved tasks to specific error categories, gives us a more nuanced picture of the results and the potential for improvement. The results also show that the generated code becomes increasingly inaccurate when the models face a large amount of context in the form of longer prompts.|\n", "2409.05840": "|**2024-09-09**|**MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct**|Run Luo 
et.al.|[2409.05840](http://arxiv.org/abs/2409.05840)|null|The development of multimodal large language models (MLLMs) has seen significant progress. However, the quantity and quality of multimodal instruction data remain critical bottlenecks. Manually creating multimodal instruction data is both time-consuming and inefficient, especially for instructions of high complexity. Moreover, distilling instruction data from black-box commercial models (e.g., GPT-4o, GPT-4V) often yields overly simple instruction data, which caps model performance at the level of those models. Curating diverse and complex instruction data remains a substantial challenge. To address this, we propose MMEvol, a novel multimodal instruction data evolution framework that combines fine-grained perception evolution, cognitive reasoning evolution, and interaction evolution. This iterative approach breaks through the data quality bottleneck and produces a complex and diverse image-text instruction dataset, thereby strengthening the capabilities of MLLMs. Starting from the initial instruction set SEED-163K, we use MMEvol to systematically broaden the diversity of instruction types, incorporate reasoning steps that enhance cognitive capabilities, and extract detailed information from images to improve visual understanding and robustness. To comprehensively evaluate the effectiveness of our data, we train LLaVA-NeXT on the evolved data and run experiments on 13 vision-language tasks. Compared with a baseline trained on the original data, our approach improves accuracy by 3.1 points on average and reaches state-of-the-art performance on 9 of the tasks.|\n", "2409.05824": "|**2024-09-09**|**Are Large Language Models a Threat to Programming Platforms? 
An Exploratory Study**|Md Mustakim Billah et.al.|[2409.05824](http://arxiv.org/abs/2409.05824)|null|This paper studies the problem-solving ability of large language models (LLMs) such as ChatGPT, Gemini, and Meta AI on competitive programming platforms including LeetCode, Codeforces, and HackerRank. Recruiters often use these platforms to screen programming skills. As LLM capabilities improve, it becomes important to evaluate their performance on programming challenges across difficulty levels and categories. The study uses 98 problems from LeetCode and 126 from Codeforces, covering 15 categories. Real-time performance is assessed through nine online Codeforces and LeetCode contests and two HackerRank certification tests. Prompts and feedback mechanisms are used to guide the LLMs, and correlations across the different scenarios are explored. 
The results show that LLMs such as ChatGPT perform well on LeetCode and on the HackerRank certification tests (71.43% success rate), but fall short in virtual contests, especially the harder Codeforces contests. On archived LeetCode problems the LLMs outperform some users and stand out in time and memory efficiency, yet they are at a disadvantage in the more difficult Codeforces contests. Although the current situation does not pose an immediate threat, LLM performance on these platforms is a cause for concern, and future improvements in their performance will need attention.|\n", "2409.05806": "|**2024-09-09**|**Benchmarking Chinese Knowledge Rectification in Large Language Models**|Tianhe Lu 
et.al.|[2409.05806](http://arxiv.org/abs/2409.05806)|**[link](https://github.com/zjunlp/easyedit)**|**Large language models (LLMs) show remarkable generative capabilities, but they are not without flaws, particularly hallucination. The problem is even more pronounced when LLMs are applied to specific languages and domains. For example, when handling ancient Chinese poetry, proverbs, or idioms, LLMs may produce nonsensical output because they lack the specific knowledge. To this end, this paper introduces a benchmark for rectifying Chinese knowledge in LLMs through knowledge editing. Specifically, we build a new Chinese dataset, CKnowEdit, by collecting seven types of knowledge from various sources, including classical texts, idioms, and content from Baidu Tieba Ruozhiba, to account for the polyphony, irony, and logical structures particular to the Chinese language. Through analysis of this dataset, we reveal the challenges current LLMs face in mastering Chinese. We also evaluate existing knowledge editing techniques on the dataset and find substantial room for improvement in rectifying Chinese knowledge. Code and datasets are available at https://github.com/zjunlp/EasyEdit.**|\n", "2409.05771": "|**2024-09-09**|**Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models**|Emily Cheng et.al.|[2409.05771](http://arxiv.org/abs/2409.05771)|null|Research has repeatedly shown that intermediate hidden states extracted from large language models can predict measured brain responses to natural language stimuli. Yet very little is known about the representational properties that enable this high predictive performance. Why is it the intermediate layers, and not the output layers, that are most effective for this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within large language models. Using manifold learning methods, we show that this abstraction process arises naturally over the course of training a language model, and that the first 'composition' phase of the process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from large language models. We give initial evidence that this correspondence derives primarily from the inherent compositionality of large language models rather than from their next-word prediction properties.|\n", "2409.05768": "|**2024-09-09**|**Model Input Verification of Large Scale Simulations**|Rumyana Neykova et.al.|[2409.05768](http://arxiv.org/abs/2409.05768)|null|This paper proposes a methodology for verifying the validity of simulation input data, which we call Model Input Verification (MIV). We implement it in a toolkit named FabGuard by designing data schemas and validation tools tailored to the needs of simulation modeling. The paper introduces a formal classification of MIV patterns and provides a streamlined verification pipeline that integrates into existing simulation workflows. FabGuard's applicability is demonstrated in three domains: conflict-driven migration, disaster evacuation, and disease spread models. We also explore the use of large language models (LLMs) for automating constraint generation and inference. In a case study on a migration simulation, the LLMs not only correctly inferred 22 of 23 developer-defined constraints but also identified errors in existing constraints and proposed new, valid ones. Our evaluation shows that MIV is feasible on large datasets: FabGuard efficiently processes 12,000 input files in 140 seconds, with consistent performance across varying file sizes.|\n", "2409.05747": "|**2024-09-09**|**A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System**|B. Sankar et.al.|[2409.05747](http://arxiv.org/abs/2409.05747)|null|This paper presents a novel conversational AI-enabled active ideation interface as a creative idea-generation tool, intended to help novice designers mitigate the initial latency and ideation bottlenecks that commonly occur. It is a dynamic, interactive, and contextually responsive approach that actively involves a large language model (LLM), from the natural language processing (NLP) branch of artificial intelligence, to generate multiple statements of potential ideas for different design problems. Integrating such AI models with the ideation process creates what we call an 'Active Ideation' scenario, which fosters continuous dialogue-based interaction, context-sensitive conversation, and prolific idea generation. 
To validate the tool, we ran a pilot study with 30 novice designers, who generated ideas for given problems using both traditional methods and the new CAI-based interface. An expert panel compared the outcomes qualitatively, using fluency, novelty, and variety as the key parameters. The study found that the proposed tool is effective at generating prolific, diverse, and novel ideas. To improve usability, we introduced a structured dialogue style with prompt-engineered structures for each ideation stage, making the interface more uniform and more convenient for designers. With this structured CAI interface, the responses were more succinct and more closely aligned with the subsequent design stage, namely conceptualization. In summary, this paper demonstrates the potential of generative AI (Gen-AI) in the early, ill-structured phase of the creative product design process.|\n", "2409.05746": "|**2024-09-09**|**LLMs Will Always Hallucinate, and We Need to Live With This**|Sourav Banerjee 
et.al.|[2409.05746](http://arxiv.org/abs/2409.05746)|null|As large language models become widely used across domains, it is critical to examine their inherent limitations. This paper argues that hallucinations in language models are not occasional errors but an inherent feature of these systems. Drawing on computational theory and Godel's First Incompleteness Theorem, which concern the undecidability of problems such as the Halting, Emptiness, and Acceptance Problems, we show that hallucinations stem from the fundamental mathematical and logical structure of LLMs. It is therefore impossible to eliminate them through architectural improvements, dataset enhancements, or fact-checking mechanisms. Our analysis shows that every stage of the LLM process, from training data compilation to fact retrieval, intent classification, and text generation, has a non-zero probability of producing hallucinations. On this basis we introduce the concept of structural hallucination as an intrinsic property of these systems. By establishing the mathematical certainty of hallucinations, this paper challenges the prevailing view that they can be completely avoided.|\n", "2409.05735": "|**2024-09-09**|**A System and Benchmark for LLM-based Q\\&A on Heterogeneous Data**|Achille Fokoue 
et.al.|[2409.05735](http://arxiv.org/abs/2409.05735)|null|In many industrial settings, users want to ask questions in natural language and get answers from structured data sources such as spreadsheets, databases, APIs, or combinations of these. Users often do not know how to identify or access the right data source. The problem becomes harder when multiple (and possibly siloed) data sources must be assembled to derive an answer. Recently, several text-to-SQL applications built on large language models (LLMs) have addressed some of these problems by letting users ask questions in natural language. In realistic industrial scenarios, however, these applications remain impractical because they cannot cope with the heterogeneity of data sources found in typical environments. This paper addresses heterogeneity by introducing the siwarex platform, which enables seamless natural language access to both databases and APIs. 
To demonstrate the effectiveness of siwarex, we extend the popular Spider dataset and benchmark by replacing some of its tables with data retrieval APIs. We find that siwarex copes well with data source heterogeneity. Our modified Spider benchmark will soon be made available to the research community.|\n", "2409.05732": "|**2024-09-09**|**Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach**|Meng Zhou et.al.|[2409.05732](http://arxiv.org/abs/2409.05732)|null|Open-source, multilingual medical large language models (LLMs) have the potential to serve linguistically diverse populations across different regions. Adapting general-purpose LLMs to the medical domain usually requires continued pretraining, which is computationally expensive and sometimes infeasible. Instruction fine-tuning on specific tasks alone may not guarantee optimal performance, because without broad domain knowledge the model struggles to understand and reason across diverse scenarios. To address these challenges, we introduce two multilingual instruction fine-tuning datasets, MMed-IFT and MMed-IFT-MC, which contain over 200k high-quality medical samples in six languages. We propose a two-stage training paradigm: 
the first stage injects general medical knowledge using MMed-IFT, and the second stage fine-tunes on task-specific multiple-choice questions with MMed-IFT-MC. Our method achieves competitive results on both English and multilingual benchmarks, striking a balance between computational efficiency and performance. We plan to make our dataset and model weights publicly available at https://github.com/SpassMed/Med-Llama3 in the future.|\n", "2409.05703": "|**2024-09-09**|**The Influence of Task and Group Disparities over Users' Attitudes Toward Using Large Language Models for Psychotherapy**|Qihang He 
et.al.|[2409.05703](http://arxiv.org/abs/2409.05703)|null|In recent years the number of people with mental health disorders has continued to grow, and progress in large language models (LLMs) across domains has drawn increasing attention to LLM-based psychotherapy. However, the factors that shape users' attitudes toward LLM-based psychotherapy tools have rarely been explored. As a first attempt, this paper investigates how task disparities and group disparities influence users' attitudes toward such tools. Using the Technology Acceptance Model (TAM) and the Automation Acceptance Model (AAM) together with an online survey, we collected and analyzed responses from 222 users of LLM-based psychotherapy tools in mainland China. The results show that group disparities (i.e., mental health conditions) can influence users' attitudes toward LLM tools. Furthermore, privacy concern, one of the typical task disparities, was not found to have a significant effect on trust or usage intention. These findings can guide the future design of LLM-based psychotherapy services.|\n", "2409.06679": "|**2024-09-10**|**E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning**|Zihan Liao 
et.al.|[2409.06679](http://arxiv.org/abs/2409.06679)|null|In the field of large language models (LLMs), the ability to handle long contexts is increasingly important for tasks such as multi-turn dialogue, code generation, and document summarization. This paper addresses the challenge of simultaneously improving long-context performance, reducing computational complexity, and making full use of pretrained models, known as the 'impossible triangle'. We propose E2LLM (Encoder Elongated Large Language Models), a novel approach designed to resolve this paradox effectively. The core idea is to split a long context into chunks and compress each chunk into an embedding vector with a pretrained text encoder. An adapter then aligns these representations with a decoder-only LLM so that the LLM can interpret them as soft prompts. We propose two training objectives to help the LLM understand the soft prompts: reconstructing the encoder output, and fine-tuning on long-context instructions. 
Experimental results show that E2LLM achieves significant performance gains in long-context scenarios while maintaining efficiency, performance, and compatibility with pretrained models. Our framework therefore represents a significant advance in the field and contributes to effective long-text modeling.|\n", "2409.06666": "|**2024-09-10**|**LLaMA-Omni: Seamless Speech Interaction with Large Language Models**|Qingkai Fang et.al.|[2409.06666](http://arxiv.org/abs/2409.06666)|**[link](https://github.com/ictnlp/llama-omni)**|**Models such as GPT-4 enable real-time interaction with large language models (LLMs) through speech, significantly enhancing the user experience compared with traditional text-based interaction. However, building speech interaction models on top of open-source LLMs remains underexplored. To fill this gap, we propose LLaMA-Omni, a novel model architecture designed for low-latency, high-quality speech interaction with LLMs. It integrates a pretrained speech encoder, a speech adaptor, an LLM, and a streaming speech decoder. It needs no speech transcription and generates text and speech responses directly from speech instructions with extremely low latency. 
We build our model on the latest Llama-3.1-8B-Instruct model and, to suit speech interaction scenarios, construct a dataset named InstructS2S-200K, which contains 200K speech instructions and their corresponding speech responses. Experimental results show that, compared with previous speech-language models, LLaMA-Omni provides better responses in both content and style, with a response latency as low as 226 ms. In addition, training LLaMA-Omni takes less than 3 days on just 4 GPUs, paving the way for efficient development of speech-language models in the future.**|\n", "2409.06653": "|**2024-09-10**|**Human Perception of LLM-generated Text Content in Social Media Environments**|Kristina Radivojevic 
et.al.|[2409.06653](http://arxiv.org/abs/2409.06653)|null|Emerging technologies, particularly artificial intelligence (AI) and large language models (LLMs), give malicious actors powerful tools for manipulating digital discourse. Because bots can generate large volumes of credible text, LLMs could affect traditional forms of democratic engagement such as voter choice, government surveys, and online communication with regulators. To study human perception of LLM-generated content, we recruited over 1,000 participants and asked them to distinguish bot posts from human posts in social media discussion threads. We find that humans perform poorly at identifying the true nature of user posts on social media. We also find patterns in how humans identify LLM-generated text in social media discourse. Finally, we observe the Uncanny Valley effect in text dialogue, in both perception and identification. This suggests that although people identify such content poorly, they can still feel discomfort when reading LLM-generated content.|\n", "2409.06646": "|**2024-09-10**|**Optimal Workload Placement on Multi-Instance GPUs**|Bekir Turkkan 
et.al.|[2409.06646](http://arxiv.org/abs/2409.06646)|null|This paper addresses how to optimize the placement of AI inference workloads based on large language models (LLMs) on GPUs. We first identify and describe several situations encountered in practice that require workloads to be placed efficiently, or migrated to other GPUs, to make room for incoming workloads. The goal is to use as few GPUs as possible and to further reduce memory and compute wastage on the GPUs that are in use. To this end we propose two approaches: an optimization method and a heuristic method. We benchmark them with two workload scheduling heuristics across multiple use cases. Compared with baseline heuristics, our results show up to a 2.85x reduction in the number of GPUs used and up to 70% less GPU wastage. We plan to enable the SRE (site reliability engineering) community to use our proposed method in production environments.|\n", "2409.06635": "|**2024-09-10**|**MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders**|Wenyu Zhang 
et.al.|[2409.06635](http://arxiv.org/abs/2409.06635)|null|Rapid progress in large language models (LLMs) has significantly improved natural language processing capabilities and has driven the development of AudioLLMs, which can understand speech and audio inputs. Existing AudioLLMs typically combine a pretrained audio encoder with a text-pretrained LLM and are then fine-tuned on specific audio tasks. However, the pretrained audio encoder has limited capacity and cannot capture the features of new tasks and datasets. To address this, we propose incorporating a mixture of 'weak' encoders (MoWE) into the AudioLLM framework. MoWE supplements a base encoder with a pool of relatively lightweight encoders that are activated dynamically based on the audio input, enhancing feature extraction without significantly increasing model size. Our experimental results show that MoWE effectively improves multi-task performance, allowing AudioLLMs to be applied to a more diverse range of audio tasks.|\n", "2409.06624": "|**2024-09-10**|**A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio**|Ningyuan Xi 
et.al.|[2409.06624](http://arxiv.org/abs/2409.06624)|null|This paper studies how, during continual pretraining (CPT) of a large language model (LLM), the optimal correlation between the additional language mixture ratio (ALMR) and the learning rate (LR) can be used to improve performance in Chinese and in other specific domains. We study the 8B-size Llama-3 model in depth to determine the key hyperparameters of the experimental setup. With careful tuning, the model's capability improves significantly on Chinese-related benchmarks and in specific domains including math, coding, and emotional intelligence. Finally, we deploy the 70B-size LLM in a real-life chat system, where it achieves satisfying performance.|\n", "2409.06601": "|**2024-09-10**|**Alleviating Hallucinations in Large Language Models with Scepticism Modeling**|Yetao Wu 
et.al.|[2409.06601](http://arxiv.org/abs/2409.06601)|null|Hallucination is a major challenge for large language models (LLMs) and hinders their adoption in many fields. Uncertainty estimation can be used to mitigate the damage caused by hallucinations. Human scepticism is thought to strengthen the capacity for self-assessment. Building on this observation, we propose a new approach called Scepticism Modeling (SM). The approach is formalized by combining token and logit information for self-estimation. We construct doubt-aware datasets, perform continual pretraining, and then fine-tune the LLMs to improve their ability to assess themselves. Experimental results show that the approach effectively improves the model's ability to estimate its uncertainty, and out-of-domain experiments validate its generalization to other tasks.|\n", "2409.06595": "|**2024-09-10**|**GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering**|Sacha Muller 
et.al.|[2409.06595](http://arxiv.org/abs/2409.06595)|**[link](https://github.com/illuin-tech/grouse)**|This paper examines the challenges of the retrieval-augmented generation (RAG) paradigm, in which large language models (LLMs) are used alongside private, up-to-date knowledge bases. We focus on the problems an LLM-as-a-judge faces when evaluating grounded answers generated by RAG systems. To assess the calibration and discrimination capabilities of judge models, we identify 7 generator failure modes and introduce GroUSE, a meta-evaluation benchmark for grounded question answering comprising 144 unit tests. The benchmark reveals that existing automated RAG evaluation frameworks often overlook important failure modes, even when GPT-4 is used as the judge. To improve the design of current automated RAG evaluation frameworks, we propose a new pipeline and find that closed models perform well on GroUSE, whereas state-of-the-art open-source judges do not generalize well to our proposed criteria despite correlating strongly with GPT-4's judgments. Our findings suggest that correlation with GPT-4 is an incomplete proxy for the practical performance of a judge model and should be complemented by evaluation on precise failure-mode detection in reference situations. Further study shows that fine-tuning Llama-3 on GPT-4's reasoning traces significantly boosts its evaluation capabilities, improving both its correlation with GPT-4's evaluations and its calibration on reference situations.|\n", "2409.06558": "|**2024-09-10**|**MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science**|Mahdieh Aliazam 
et.al.|[2409.06558](http://arxiv.org/abs/2409.06558)|null|As autonomous vehicles become more widespread, demand is growing for highly accurate and efficient systems that improve safety, operational efficiency, and energy consumption. Predicting the conditions a vehicle will encounter during operation is especially important for managing the trade-off between energy and reliability. Recent improvements in large language models (LLMs) and the emergence of well-known models such as ChatGPT offer a unique opportunity for prediction in autonomous driving. This paper proposes MAPS, a method that uses LLMs as map-reading co-drivers to predict the key parameters to set during autonomous vehicle operation in order to balance the energy-reliability trade-off. MAPS improves navigation accuracy by 20% over the best baseline method. It also saves 11% of the energy in the computational unit and up to 54% across the mechanical and computational units.|\n", "2409.06518": "|**2024-09-10**|**Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games**|Juhwan Choi 
et.al.|[2409.06518](http://arxiv.org/abs/2409.06518)|**[link](https://github.com/c-juhwan/olympics_analysis)**|Large language models (LLMs) have become the dominant approach in natural language processing, yet their internal knowledge structures remain underexplored. This paper investigates the internal knowledge structure of LLMs by analyzing historical medal tallies from the Olympic Games. We ask models to provide the medal counts of each team and to identify which teams achieved specific rankings. Our results show that although state-of-the-art LLMs perform very well at reporting the medal counts of individual teams, they struggle significantly with questions about specific rankings. This suggests that the internal knowledge structure of LLMs differs fundamentally from that of humans, who can easily infer rankings from known medal counts. To support further research, we publicly release our code, dataset, and model outputs.|\n", "2409.07453": "|**2024-09-11**|**\"My Grade is Wrong!\": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays**|Shengxin Hong 
et.al.|[2409.07453](http://arxiv.org/abs/2409.07453)|null|Interactive feedback, in which feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming to be widely applied in educational practice. Although large language models (LLMs) have the potential to automate feedback, they struggle with reasoning and interaction in interactive settings. This paper proposes CAELF, a Contestable AI Empowered LLM Framework that automates interactive feedback by integrating a multi-agent system with computational argumentation. A student's essay is first assessed by multiple teaching-assistant agents (TA agents); a teacher agent then aggregates these assessments through formal reasoning to generate feedback and a grade. Students can further engage with the feedback to deepen their understanding. A case study on 500 critical thinking essays, together with a user study, shows that CAELF significantly improves the quality of interactive feedback and enhances the reasoning and interaction capabilities of LLMs. The approach offers a promising solution to the time and resource barriers that limit the wide adoption of interactive feedback in education.|\n", 
"2409.07440": "|**2024-09-11**|**SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories**|Ben Bogin et.al.|[2409.07440](http://arxiv.org/abs/2409.07440)|**[link](https://github.com/allenai/super-benchmark)**|**Given the major progress large language models (LLMs) have made in writing code, can they now autonomously reproduce results from research repositories? Such a capability would greatly benefit the research community by helping researchers validate, understand, and extend prior work. As a step toward this goal, we introduce SUPER, the first benchmark designed to evaluate the ability of LLMs to set up and execute tasks from research repositories. SUPER aims to capture the realistic challenges researchers face when working with machine learning (ML) and natural language processing (NLP) research repositories. Our benchmark consists of three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems focused on specific challenges (e.g., configuring a trainer), and 602 automatically generated problems for larger-scale development. We introduce several evaluation measures to assess both task success and progress, using gold solutions when available and approximations otherwise. We show that state-of-the-art approaches struggle with these problems: the best model (GPT-4o) solves only 16.3% of the end-to-end set and 46.1% of the scenarios. This illustrates how challenging the task is and suggests that SUPER can serve as a valuable resource for the community to measure and drive progress.**|\n", 
"2409.07407": "|**2024-09-11**|**CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification**|Zeqing Qin et.al.|[2409.07407](http://arxiv.org/abs/2409.07407)|null|Large language models (LLMs) have shown great promise in vulnerability identification. Because C/C++ accounts for half of the open-source software (OSS) vulnerabilities of the past decade and updates to OSS are made mainly through commits, improving the ability of LLMs to identify C/C++ vulnerability-contributing commits (VCCs) is essential. However, current research focuses mainly on further pre-training LLMs on large code corpora, which is resource-intensive and poses efficiency challenges. This paper proposes a lightweight way to improve the ability of BERT-based LLMs to identify C/C++ VCCs. We present CodeLinguaNexus (CLNX) as a bridge between C/C++ programs and LLMs. CLNX efficiently converts source code into a more natural representation that is better suited to LLM processing while preserving key details. Specifically, CLNX first applies structure-level naturalization to decompose complex programs and then applies symbol-level naturalization to interpret complex symbols. We evaluate CLNX on a public dataset of 25,872 C/C++ functions and their commits. The results show that CLNX significantly improves the ability of LLMs to identify C/C++ VCCs. Moreover, CodeBERT equipped with CLNX achieves new state-of-the-art performance and identifies 38 OSS vulnerabilities in the real world.|\n", "2409.07394": "|**2024-09-11**|**AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge**|Han Wang 
et.al.|[2409.07394](http://arxiv.org/abs/2409.07394)|**[link](https://github.com/hannight/adacad)**|**Knowledge conflict arises between the context given to a large language model (LLM) and the knowledge stored in its parameters, and it degrades performance under standard decoding techniques, which tend to ignore the context. Existing test-time contrastive methods try to address this by comparing the LLM's output distributions with and without the context and adjusting the model according to the contrast between them. However, we find that these methods frequently misjudge the degree of conflict and struggle with instances that vary in how much conflict they contain; static methods over-adjust when no conflict is present. We therefore propose AdaCAD, a fine-grained, instance-level approach that dynamically infers the adjustment weight from the degree of conflict, measured by the Jensen-Shannon divergence between the distributions representing contextual and parametric knowledge. Experiments with four models on six diverse question-answering (QA) datasets and three summarization tasks show that our training-free adaptive method consistently outperforms other decoding methods on QA, with an average accuracy gain of 14.21% (absolute), and improves the factuality of summaries, raising AlignScore by 5.59 points. Our analysis further shows that decoding with contrastive baselines hurts performance when no conflict is present, whereas AdaCAD mitigates these losses, making it better suited to real-world datasets in which some examples contain conflict and others do not.**|\n", 
"2409.07368": "|**2024-09-11**|**Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code**|Khiem Ton et.al.|[2409.07368](http://arxiv.org/abs/2409.07368)|null|This paper introduces SGCode, a flexible prompt-optimizing system for generating secure code with large language models (LLMs). SGCode integrates recent prompt-optimization approaches with LLMs in a unified system served through front-end and back-end APIs, enabling users to: 1) generate secure, vulnerability-free code; 2) review and share security analysis; and 3) easily switch between prompt-optimization approaches, while gaining insight into model and system performance. We populate SGCode on an AWS server with PromSec, an approach that optimizes prompts by combining an LLM and security tools with a lightweight generative adversarial graph neural network to detect and fix security vulnerabilities in the generated code. Extensive experiments show that SGCode, as a public tool, can reveal the trade-offs between model utility, secure code generation, and system cost, at relatively low cost. SGCode is available at: .|\n", 
"2409.07355": "|**2024-09-11**|**Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation**|SeongYeub Chu et.al.|[2409.07355](http://arxiv.org/abs/2409.07355)|**[link](https://github.com/BBeeChu/InteractEval)**|**This paper introduces InteractEval, a framework that uses the Think-Aloud method to combine ideas from large language models (LLMs) and human experts in order to generate attributes for checklist-based text evaluation. By combining the flexibility and reasoning of humans with the consistency of LLMs, InteractEval outperforms traditional non-LLM baselines and LLM baselines on four dimensions: consistency, fluency, relevance, and coherence. The experiments also examine the effectiveness of the Think-Aloud method, showing that it promotes divergent thinking in both humans and LLMs, which yields a wider range of relevant attributes and improves text evaluation performance. Comparative analysis shows that humans excel at identifying attributes related to internal quality (such as coherence and fluency), whereas LLMs perform better on attributes related to external alignment (such as consistency and relevance). Combining the outcomes of humans and LLMs therefore gives the best evaluation results. In other words, this paper highlights the need to integrate humans and LLMs effectively in an automated checklist-based text evaluation framework. The code is available at https://github.com/BBeeChu/InteractEval.git.**|\n", 
"2409.07331": "|**2024-09-11**|**Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering**|Weixi Weng et.al.|[2409.07331](http://arxiv.org/abs/2409.07331)|null|Multimodal large language models (MLLMs) have shown strong zero-shot performance on visual question answering (VQA) tasks. In knowledge-based VQA (KB-VQA), however, MLLMs may lack the commonsense or domain-specific knowledge needed to answer such questions and must obtain it from external knowledge sources. Prior work such as Retrieval-Augmented VQA-v2 (RAVQA-v2) focuses on making full use of the input information, such as image-based textual descriptions and retrieved knowledge, to improve performance, but it overlooks a problem: inference efficiency drops significantly as the number of input tokens grows, which conflicts with the demands of practical applications. To address this, we propose RACC, a retrieval-augmented MLLM. RACC learns to compress and aggregate the retrieved contexts and generates a compact modulation in the form of a key-value (KV) cache. This modulation is then used to adapt a frozen downstream MLLM, enabling effective and efficient inference. RACC achieves a state-of-the-art 62.9% on OK-VQA. It also reduces inference latency by 22.0%-59.7% compared with RAVQA-v2. Extensive experiments demonstrate the broad applicability of RACC: it is compatible with a variety of off-the-shelf MLLMs and can handle different knowledge sources, including textual and multimodal documents.|\n", "2409.07314": "|**2024-09-11**|**MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications**|Praveen K Kanithi 
et.al.|[2409.07314](http://arxiv.org/abs/2409.07314)|null|The rapid development of large language models (LLMs) for healthcare has created a need for comprehensive evaluation that goes beyond commonly used benchmarks such as the USMLE, so as to better reflect real-world performance. Real-world assessments are valuable indicators of utility, but they often lag behind the pace of LLM evolution, so findings may already be obsolete by the time of deployment. This temporal disconnect calls for a comprehensive upfront evaluation that can guide model selection for specific clinical applications. We introduce MEDIC, a framework that assesses LLMs across five critical dimensions of clinical competence: medical reasoning, ethics and bias, data and language understanding, in-context learning, and clinical safety. MEDIC features a novel cross-examination framework that quantifies LLM performance in areas such as coverage and hallucination detection without requiring reference outputs. We apply MEDIC to evaluate LLMs on medical question answering, safety, summarization, note generation, and other tasks. Our results show performance disparities across model sizes and between baseline and medically fine-tuned models, with implications for model selection in applications that require specific model strengths, such as low hallucination or lower inference cost. MEDIC's multifaceted evaluation reveals the performance trade-offs between theoretical capabilities and practical implementation, and helps close the gap in identifying and adapting the most promising models for healthcare settings, so that models suited to diverse healthcare applications are identified and adapted.|\n", 
"2409.07276": "|**2024-09-11**|**STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM**|Qijiong Liu et.al.|[2409.07276](http://arxiv.org/abs/2409.07276)|null|Traditional recommendation models typically rely on unique item identifiers (IDs) to distinguish items, which can limit their ability to exploit item content information and to generalize to long-tail or cold-start items. Semantic tokenization has recently been proposed as a promising solution: it tokenizes each item's semantic representation into a sequence of discrete tokens, preserving the item's semantics within those tokens and ensuring that semantically similar items are represented by similar tokens. These semantic tokens form the basis for training generative recommendation models. However, existing generative recommendation methods usually involve multiple sub-models for embedding, quantization, and recommendation, which makes the overall system overly complex. In this paper, we propose STORE, a unified framework that uses a single large language model (LLM) for both tasks. Specifically, we formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task. All of these tasks are framed generatively and trained with a single LLM backbone. Extensive experiments validate the effectiveness of the STORE framework across a range of recommendation tasks and datasets. We will release the source code and configurations for reproducible research.|\n", "2409.07267": "|**2024-09-11**|**MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving**|Enming Zhang 
et.al.|[2409.07267](http://arxiv.org/abs/2409.07267)|**[link](https://github.com/emzucas/minidrive)**|This paper proposes MiniDrive, a novel framework that addresses the difficulties of applying vision-language models (VLMs) to autonomous driving. Existing VLM approaches typically rely on computationally expensive visual encoders and large language models (LLMs), which makes them hard to deploy in real-world and real-time applications. In addition, most existing VLMs cannot process multiple images, which makes it hard for them to meet the multi-camera perception requirements of autonomous driving. To address these issues, we introduce two key modules: the Feature Engineering Mixture of Experts (FE-MoE) and the Dynamic Instruction Adapter (DI-Adapter). FE-MoE efficiently maps 2D features to visual token embeddings, which are then passed as input to the language model. DI-Adapter lets the visual token embeddings change dynamically with the instruction text embeddings, resolving the issue in previous approaches of static visual token embeddings for the same image. Compared with previous work, MiniDrive achieves state-of-the-art performance in parameter size, floating-point operations, and response efficiency, with the smallest version containing only 83M parameters.|\n", 
"2409.08264": "|**2024-09-12**|**Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale**|Rogerio Bonatti et.al.|[2409.08264](http://arxiv.org/abs/2409.08264)|**[link](https://github.com/microsoft/windowsagentarena)**|**Large language models (LLMs) show remarkable potential to act as computer agents on multimodal tasks that require planning and reasoning, which could significantly improve human productivity and software accessibility. However, measuring agent performance in realistic environments remains challenging: (i) most benchmarks are limited to specific modalities or domains (e.g., text-only, web navigation, question answering, coding), and (ii) full benchmark evaluations are slow (typically taking days) because the tasks are multi-step and sequential. To address these challenges, we introduce Windows Agent Arena: a reproducible, general environment focused on the Windows operating system, in which agents can operate freely and use the same wide range of applications, tools, and web browsers that human users rely on when solving tasks. Building on the OSWorld framework (Xie et al., 2024), we create more than 150 diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool use. Our benchmark is scalable and can be seamlessly parallelized on Azure, so a full benchmark evaluation completes in as little as 20 minutes. To demonstrate the capabilities of Windows Agent Arena, we also introduce Navi, a new multimodal agent. Navi achieves a success rate of 19.5% in the Windows domain, compared with 74.5% for an unassisted human. Navi also performs strongly on Mind2Web, another popular web-based benchmark. We provide extensive quantitative and qualitative analysis of Navi's performance, along with insights into opportunities for future research on agent development and data generation using Windows Agent Arena. Webpage: https://microsoft.github.io/WindowsAgentArena Code: https://github.com/microsoft/WindowsAgentArena**|\n", "2409.08250": 
"|**2024-09-12**|**OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering**|Jiahao Nick Li et.al.|[2409.08250](http://arxiv.org/abs/2409.08250)|null|\u4eba\u4eec\u5e38\u901a\u8fc7\u7167\u7247\u3001\u5c4f\u5e55\u622a\u56fe\u548c\u89c6\u9891\u6765\u6355\u6349\u8bb0\u5fc6\u3002\u73b0\u6709\u7684\u57fa\u4e8eAI\u7684\u5de5\u5177\u80fd\u591f\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u68c0\u7d22\u8fd9\u4e9b\u6570\u636e\uff0c\u4f46\u4e3b\u8981\u5c40\u9650\u4e8e\u68c0\u7d22\u50cf\u7167\u7247\u4e2d\u7684\u7279\u5b9a\u7269\u4f53\u8fd9\u6837\u7684\u5355\u4e00\u4fe1\u606f\uff0c\u96be\u4ee5\u5904\u7406\u6d89\u53ca\u7406\u89e3\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\uff08\u5982\u4e8b\u4ef6\u5e8f\u5217\uff09\u7684\u66f4\u590d\u6742\u67e5\u8be2\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u4e3a\u671f\u4e00\u4e2a\u6708\u7684\u65e5\u5fd7\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u73b0\u5b9e\u7528\u6237\u67e5\u8be2\uff0c\u5e76\u751f\u6210\u4e86\u4e00\u4e2a\u96c6\u6210\u4e0e\u6355\u83b7\u8bb0\u5fc6\u76f8\u5173\u5fc5\u8981\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86OmniQuery\uff0c\u8fd9\u662f\u4e00\u79cd\u80fd\u591f\u56de\u7b54\u9700\u8981\u63d0\u53d6\u548c\u63a8\u65ad\u591a\u5c42\u4e0a\u4e0b\u6587\u4fe1\u606f\u4ee5\u6574\u5408\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\u7684\u590d\u6742\u4e2a\u4eba\u8bb0\u5fc6\u76f8\u5173\u95ee\u9898\u7684\u65b0\u578b\u7cfb\u7edf\u3002OmniQuery\u901a\u8fc7\u4ece\u591a\u4e2a\u76f8\u4e92\u5173\u8054\u7684\u8bb0\u5fc6\u4e2d\u96c6\u6210\u5206\u6563\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u589e\u5f3a\u5355\u4e2a\u6355\u83b7\u7684\u8bb0\u5fc6\uff0c\u68c0\u7d22\u76f8\u5173\u8bb0\u5fc6\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63d0\u4f9b\u5168\u9762\u7684\u7b54\u6848\u3002\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86OmniQuery\u7684\u6709\u6548\u6027\uff0c\u51c6\u786e\u7387\u8fbe\u523071.5%\uff0c\u5e76\u4e1
4\u5b83\u572874.5%\u7684\u65f6\u95f4\u91cc\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684RAG\u7cfb\u7edf\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u751a\u81f3\u53d6\u5f97\u4e86\u80dc\u5229\u6216\u5e76\u5217\u7b2c\u4e00\u7684\u6210\u7ee9\u3002|\n", "2409.08239": "|**2024-09-12**|**Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources**|Alisia Lupidi et.al.|[2409.08239](http://arxiv.org/abs/2409.08239)|null|\u5728\u9762\u5bf9\u4f9d\u8d56\u7ed3\u6784\u5316\u6570\u636e\u3001\u590d\u6742\u63a8\u7406\u6216\u5de5\u5177\u4f7f\u7528\u7684\u6311\u6218\u6027\u573a\u666f\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSource2Synth\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u65e0\u9700\u6602\u8d35\u7684\u4eba\u7c7b\u6807\u6ce8\u5373\u53ef\u7528\u4e8e\u6559\u6388LLMs\u65b0\u6280\u80fd\u3002Source2Synth\u63a5\u53d7\u81ea\u5b9a\u4e49\u6570\u636e\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u751f\u6210\u5177\u6709\u57fa\u4e8e\u73b0\u5b9e\u4e16\u754c\u6765\u6e90\u7684\u4e2d\u95f4\u63a8\u7406\u6b65\u9aa4\u7684\u5408\u6210\u6570\u636e\u70b9\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u6839\u636e\u5176\u53ef\u56de\u7b54\u6027\u4e22\u5f03\u4f4e\u8d28\u91cf\u751f\u6210\u6765\u63d0\u9ad8\u6570\u636e\u96c6\u8d28\u91cf\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u9886\u57df\u4e2d\u5e94\u7528\u6b64\u65b9\u6cd5\u6765\u5c55\u793a\u5176\u901a\u7528\u6027\uff1a\u5728\u591a\u8df3\u95ee\u9898\u56de\u7b54\uff08MHQA\uff09\u4e2d\u6d4b\u8bd5\u63a8\u7406\u80fd\u529b\uff0c\u5728\u8868\u683c\u578b\u95ee\u9898\u56de\u7b54\uff08TQA\uff09\u4e2d\u6d4b\u8bd5\u5de5\u5177\u4f7f\u7528\u3002\u4e0e\u7ecf\u8fc7\u5fae\u8c03\u7684\u57fa\u672c\u6a21\u578b\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728WikiSQL\u4e0a\u7684TQA\u4e0a\u63d0\u9ad8\u4e8625.51%\uff0c\u5728HotPotQA\u4e0a\u7684MHQA\u4e0a\u63d0\u9ad8\u4e8622.57%\u7684\u6027\u80fd\u3002|\n", "2409.08234": 
"|**2024-09-12**|**LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems**|Hakan T. Otal et.al.|[2409.08234](http://arxiv.org/abs/2409.08234)|**[link](https://github.com/ai-in-complex-systems-lab/llm-honeypot)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6784\u5efa\u771f\u5b9e\u4e14\u4e92\u52a8\u7684\u871c\u7f50\u7cfb\u7edf\u3002\u901a\u8fc7\u5728\u5305\u542b\u653b\u51fb\u8005\u751f\u6210\u547d\u4ee4\u548c\u54cd\u5e94\u7684\u591a\u6837\u5316\u6570\u636e\u96c6\u4e0a\u5bf9\u5f00\u6e90\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u80fd\u591f\u4e0e\u653b\u51fb\u8005\u8fdb\u884c\u9ad8\u7ea7\u4ea4\u4e92\u7684\u871c\u7f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5173\u952e\u6b65\u9aa4\uff1a\u6570\u636e\u6536\u96c6\u4e0e\u5904\u7406\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6a21\u578b\u9009\u62e9\u4ee5\u53ca\u76d1\u7763\u5f0f\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002\u901a\u8fc7\u76f8\u4f3c\u6027\u6307\u6807\u8bc4\u4f30\u4e0e\u73b0\u573a\u90e8\u7f72\uff0c\u7ed3\u679c\u663e\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u51c6\u786e\u4e14\u4fe1\u606f\u4e30\u5bcc\u7684\u54cd\u5e94\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86LLMs\u5728\u91cd\u5851\u871c\u7f50\u6280\u672f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u4e1a\u4eba\u5458\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u5de5\u5177\u6765\u68c0\u6d4b\u548c\u5206\u6790\u6076\u610f\u6d3b\u52a8\uff0c\u4ece\u800c\u589e\u5f3a\u6574\u4f53\u5b89\u5168\u67b6\u6784\u3002**|\n", "2409.08202": "|**2024-09-12**|**What Makes a Maze Look Like a Maze?**|Joy Hsu 
et.al.|[2409.08202](http://arxiv.org/abs/2409.08202)|null|\u4eba\u7c7b\u89c6\u89c9\u7406\u89e3\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\u80fd\u591f\u7075\u6d3b\u5730\u89e3\u91ca\u62bd\u8c61\u6982\u5ff5\u7684\u80fd\u529b\uff1a\u83b7\u53d6\u63d0\u5347\u89c4\u5219\u6765\u89e3\u91ca\u5b83\u4eec\u6240\u8c61\u5f81\u7684\u542b\u4e49\uff0c\u5728\u719f\u6089\u548c\u4e0d\u719f\u6089\u7684\u4e0a\u4e0b\u6587\u4e2d\u951a\u5b9a\u5b83\u4eec\uff0c\u5e76\u5bf9\u5b83\u4eec\u8fdb\u884c\u9884\u6d4b\u6216\u63a8\u7406\u3002\u5c3d\u7ba1\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u8bc6\u522b\u56fe\u50cf\u4e2d\u7684\u5177\u4f53\u5bf9\u8c61\u7c7b\u522b\uff08\u5982\u6811\u679d\uff09\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u7406\u89e3\u8fd9\u6837\u7684\u89c6\u89c9\u62bd\u8c61\uff08\u4f8b\u5982\uff0c\u4e00\u7ec4\u6811\u679d\u5982\u4f55\u5f62\u6210\u8ff7\u5bab\u7684\u5899\u58c1\uff09\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u6df1\u5ea6\u67b6\u6784\u63a5\u5730\uff08DSG\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5229\u7528\u660e\u786e\u7684\u7ed3\u6784\u5316\u8868\u793a\u6cd5\u6765\u951a\u5b9a\u548c\u63a8\u7406\u89c6\u89c9\u62bd\u8c61\u7684\u6846\u67b6\u3002DSG\u7684\u6838\u5fc3\u662f\u67b6\u6784\u2014\u2014\u5206\u89e3\u62bd\u8c61\u6982\u5ff5\u7684\u4f9d\u8d56\u56fe\u5f62\u63cf\u8ff0\uff0c\u5c06\u5176\u5206\u89e3\u4e3a\u66f4\u57fa\u672c\u7684\u7b26\u53f7\u3002DSG\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u53d6\u67b6\u6784\uff0c\u7136\u540e\u901a\u8fc7\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5206\u5c42\u5730\u5c06\u67b6\u6784\u4e2d\u7684\u5177\u4f53\u5230\u62bd\u8c61\u7ec4\u4ef6\u951a\u5b9a\u5230\u56fe\u50cf\u4e0a\u3002\u951a\u5b9a\u540e\u7684\u67b6\u6784\u7528\u4e8e\u589e\u5f3a\u5bf9\u89c6\u89c9\u62bd\u8c61\u7684\u7406\u89e3\u3002\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86DSG\u53ca\u5176\u4e0d\u540c\u7684\u65b9\u6cd5\u5728\u6211\u4eec\u65b0\u521b\u5efa\u7684\u89c6\u89c9\u62bd\u8c
61\u6570\u636e\u96c6\u4e0a\u7684\u63a8\u7406\u6027\u80fd\uff0c\u8be5\u6570\u636e\u96c6\u7531\u4eba\u7c7b\u6807\u6ce8\u7684\u771f\u5b9e\u4e16\u754c\u56fe\u50cf\u548c\u76f8\u5e94\u7684\u95ee\u7b54\u5bf9\u7ec4\u6210\u3002\u6211\u4eec\u5c55\u793a\u4e86DSG\u663e\u8457\u63d0\u9ad8\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u62bd\u8c61\u89c6\u89c9\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\uff0c\u5e76\u671d\u7740\u4e0e\u4eba\u7c7b\u4e00\u81f4\u7684\u89c6\u89c9\u62bd\u8c61\u7406\u89e3\u8fc8\u8fdb\u4e86\u4e00\u6b65\u3002|\n", "2409.08185": "|**2024-09-12**|**Fine-tuning Large Language Models for Entity Matching**|Aaron Steiner et.al.|[2409.08185](http://arxiv.org/abs/2409.08185)|**[link](https://github.com/wbsg-uni-mannheim/tailormatch)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5b9e\u4f53\u5339\u914d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u5fae\u8c03\u3002\u5df2\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u793a\u5de5\u7a0b\u548c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4e0a\u3002\u672c\u6587\u4ece\u4e24\u4e2a\u7ef4\u5ea6\u5206\u6790\u4e86\u5fae\u8c03\u7684\u53ef\u884c\u6027\uff1a1\uff09\u8bad\u7ec3\u793a\u4f8b\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u5b9e\u9a8c\u6d89\u53ca\u5728\u8bad\u7ec3\u96c6\u4e2d\u6dfb\u52a0\u4e0d\u540c\u7c7b\u578b\u7684LLM\u751f\u6210\u89e3\u91ca\uff1b2\uff09\u4f7f\u7528LLM\u9009\u62e9\u548c\u751f\u6210\u8bad\u7ec3\u793a\u4f8b\u3002\u6211\u4eec\u4e0d\u4ec5\u5173\u6ce8\u6e90\u6570\u636e\u96c6\u4e0a\u7684\u5339\u914d\u6027\u80fd\uff0c\u8fd8\u7814\u7a76\u4e86\u5fae\u8c03\u5bf9\u6a21\u578b\u5728\u540c\u57df\u6570\u636e\u96c6\u4ee5\u53ca\u8de8\u9886\u57df\u6570\u636e\u96c6\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u7684\u5f71\u54cd\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5fae\u8c03\u663e\u8457\u63d0\u5347\u4e86\u5c0f\u578b\u6a21\u578b\u7684\u6027\u80fd\uff0c\u800c\u5927\u578b\u6a21\u578b\u7684\u8868\u73b0\u5219\u53c2\u5dee\u4e0d\u9f50\u3002\u5fae\u8c03\u5728\u63d0\u5347\u540c\u57df\u6570\u636e\u96c6\u7684\u6cdb\u5316\u80fd\u529b\u7684\u540c\u65f6\uff0c\u4e5f\u5f71\u54cd\u4e86\u8de8\u57df\u8fc1\u79fb\u7684\u80fd\u529b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5411\u8bad\u7ec3\u96c6\u6dfb\u52a0\u7ed3\u6784\u5316\u7684\u89e3\u91ca\u5bf9\u56db\u79cdLLM\u4e2d\u7684\u4e09\u79cd\u6709\u6b63\u9762\u5f71\u54cd\uff0c\u800c\u63d0\u51fa\u7684\u793a\u4f8b\u9009\u62e9\u548c\u751f\u6210\u65b9\u6cd5\u4ec5\u63d0\u5347\u4e86Llama 3.1 8B\u7684\u6027\u80fd\uff0c\u540c\u65f6\u964d\u4f4e\u4e86GPT-4o Mini\u7684\u6027\u80fd\u3002**|\n", "2409.08148": "|**2024-09-12**|**Faster Speech-LLaMA Inference with Multi-token Prediction**|Desh Raj et.al.|[2409.08148](http://arxiv.org/abs/2409.08148)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u4efb\u52a1\u4e0a\u53d8\u5f97\u6781\u4e3a\u719f\u7ec3\uff0c\u5305\u62ec\u6d89\u53ca\u591a\u6a21\u6001\u8f93\u5165\u7684\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u901a\u8fc7\u4f7f\u7528\u8bed\u97f3\u7f16\u7801\u5668\u5b9e\u4f8b\u5316LLM\uff08\u4f8b\u5982LLaMA\uff09\u5e76\u5229\u7528\u914d\u5bf9\u6570\u636e\u5bf9\u5176\u8fdb\u884c\u8bad\u7ec3\uff0c\u53ef\u4ee5\u8d4b\u4e88\u53ea\u89e3\u7801\u7684\u6a21\u578b\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u80fd\u529b\uff0c\u56e0\u6b64\u79f0\u4e4b\u4e3aSpeech-LLaMA\u3002\u7136\u800c\uff0c\u7531\u4e8e\u81ea\u56de\u5f52\u63a8\u7406\u7684\u987a\u5e8f\u6027\u8d28\u4ee5\u53ca\u76f8\u5bf9\u8f83\u5927\u7684\u89e3\u7801\u5668\uff0cSpeech-LLaMA\u6a21\u578b\u7684\u63a8\u7406\u65f6\u95f4\u76f8\u5bf9\u8f83\u9ad8\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u5728\u540c\u4e00\u89e3\u7801\u6b65\u9aa4\u4e2d\u9884\u6d4b\u591a\u4e2a\u4ee4\u724c\u6765\u52a0\u901fSpeech-LLaMA\u7684\u63a8\u7406\u3
002\u6211\u4eec\u63a2\u7d22\u4e86\u51e0\u4e2a\u80fd\u591f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6a21\u578b\u67b6\u6784\uff0c\u5e76\u901a\u8fc7\u9608\u503c\u63a8\u7406\u548c\u9a8c\u8bc1\u63a8\u7406\u7b56\u7565\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u57fa\u4e8e\u524d\u7f00\u7684\u675f\u641c\u7d22\u89e3\u7801\u65b9\u6cd5\uff0c\u5141\u8bb8\u6b64\u7c7b\u6a21\u578b\u8fdb\u884c\u9ad8\u6548\u7684\u6700\u5c0f\u8bcd\u9519\u8bef\u7387\uff08MWER\uff09\u8bad\u7ec3\u3002\u6211\u4eec\u5728\u591a\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u4eec\u5c06\u89e3\u7801\u8c03\u7528\u7684\u6570\u91cf\u51cf\u5c11\u4e86\u7ea63.2\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u6216\u63d0\u9ad8\u4e86WER\u6027\u80fd\u3002|\n", "2409.08147": "|**2024-09-12**|**LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models**|Zhengliang Liu et.al.|[2409.08147](http://arxiv.org/abs/2409.08147)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u8868\u73b0\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u89e3\u51b3\u957f\u671f\u5b58\u5728\u7684\u5ba2\u89c2\u8bc4\u4f30\u8fa9\u8bba\u7ed3\u679c\u7684\u6311\u6218\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u4ece\u201c\u653f\u7b56\u3001\u4e2a\u6027\u4e0e\u89c6\u89d2\u201d\uff083P\uff09\u548c\u201c\u5174\u8da3\u3001\u610f\u8bc6\u5f62\u6001\u4e0e\u8eab\u4efd\u8ba4\u540c\u201d\uff083I\uff09\u7684\u89d2\u5ea6\u5206\u6790\u56db\u4f4d\u5173\u952e\u53d7\u4f17\u7fa4\u4f53\uff1a\u9009\u6c11\u3001\u4f01\u4e1a\u3001\u6350\u8d60\u8005\u53ca\u653f\u5ba2\u5bf9\u5019\u9009\u4eba\u7684\u5171\u9e23\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u751f\u6210\u201cLLM-POTUS\u8bc4\u5206\u201d\uff0c\u5373\u57fa\u4e8e3P\u4e0e3I\u4e4b\u95f4\u4e00\u81f4\u6027\u5ea6\u91cf\u7684\u91cf\u5316\u6307\u680
7\uff0c\u6765\u8bc4\u4ef7\u8fa9\u8bba\u8868\u73b0\u3002\u6211\u4eec\u5e94\u7528\u6b64\u6846\u67b6\u5bf9\u8fd1\u671f\u7f8e\u56fd\u603b\u7edf\u8fa9\u8bba\u7684\u6587\u672c\u8fdb\u884c\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u8fa9\u8bba\u7b56\u7565\u7684\u6709\u6548\u6027\u53ca\u5176\u5bf9\u4e0d\u540c\u53d7\u4f17\u7fa4\u4f53\u7684\u5f71\u54cd\u3002\u7814\u7a76\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u653f\u6cbb\u5206\u6790\u5de5\u5177\uff0c\u8fd8\u63a2\u7d22\u4e86\u5728\u590d\u6742\u793e\u4f1a\u80cc\u666f\u4e0b\u4f7f\u7528LLM\u4f5c\u4e3a\u516c\u6b63\u8bc4\u5224\u8005\u7684\u6f5c\u529b\u4e0e\u5c40\u9650\u6027\u3002\u6b64\u5916\uff0c\u8be5\u6846\u67b6\u4e3a\u4e2a\u4eba\u516c\u6c11\u63d0\u4f9b\u4e86\u4e00\u4e2a\u72ec\u7acb\u7684\u5de5\u5177\uff0c\u7528\u4e8e\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u7684\u8868\u73b0\uff0c\u4ece\u800c\u589e\u5f3a\u6c11\u4e3b\u53c2\u4e0e\u5ea6\uff0c\u51cf\u5c11\u5bf9\u53ef\u80fd\u504f\u89c1\u7684\u5a92\u4f53\u89e3\u8bfb\u548c\u673a\u6784\u5f71\u54cd\u529b\u7684\u4f9d\u8d56\uff0c\u8fdb\u800c\u52a0\u5f3a\u77e5\u60c5\u516c\u6c11\u53c2\u4e0e\u7684\u57fa\u7840\u3002|\n", "2409.08098": "|**2024-09-12**|**The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal**|Huiyuan Xie 
et.al.|[2409.08098](http://arxiv.org/abs/2409.08098)|null|\u672c\u6587\u7814\u7a76\u4e86\u6280\u672f\u9769\u65b0\u4e0e\u83b7\u53d6\u516c\u6b63\u4e4b\u95f4\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5728\u82f1\u56fd\u5c31\u4e1a\u6cd5\u5ead\uff08UKET\uff09\u6784\u5efa\u9884\u6d4b\u6848\u4f8b\u7ed3\u679c\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u6311\u6218\uff0c\u8be5\u7814\u7a76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u81ea\u52a8\u6ce8\u91ca\uff0c\u4ece\u800c\u521b\u5efa\u4e86CLC-UKET\u6570\u636e\u96c6\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea619,000\u4e2aUKET\u6848\u4f8b\u53ca\u5176\u5143\u6570\u636e\u3002\u5168\u9762\u7684\u6cd5\u5f8b\u6ce8\u91ca\u6db5\u76d6\u4e86\u4e8b\u5b9e\u3001\u4e3b\u5f20\u3001\u5148\u4f8b\u5f15\u7528\u3001\u6cd5\u89c4\u5f15\u7528\u3001\u6848\u4f8b\u7ed3\u679c\u3001\u7406\u7531\u548c\u7ba1\u8f96\u6743\u4ee3\u7801\u3002\u501f\u52a9CLC-UKET\u6570\u636e\uff0c\u6211\u4eec\u5bf9UKET\u7684\u591a\u7c7b\u6848\u4f8b\u7ed3\u679c\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u6536\u96c6\u4e86\u4eba\u7c7b\u9884\u6d4b\u4ee5\u5efa\u7acb\u6a21\u578b\u6bd4\u8f83\u7684\u6027\u80fd\u53c2\u8003\u3002\u4ece\u57fa\u7840\u6a21\u578b\u7684\u5b9e\u8bc1\u7ed3\u679c\u6765\u770b\uff0c\u5fae\u8c03\u7684\u8f6c\u6362\u5668\u6a21\u578b\u5728UKET\u9884\u6d4b\u4efb\u52a1\u4e0a\u4f18\u4e8e\u96f6\u6b21\u548c\u5c11\u91cf\u6837\u672c\u7684LLM\u3002\u96f6\u6b21LLM\u7684\u6027\u80fd\u53ef\u4ee5\u901a\u8fc7\u6574\u5408\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u4fe1\u606f\u6765\u589e\u5f3a\uff0c\u878d\u5165\u5c11\u91cf\u6837\u672c\u793a\u4f8b\u4e2d\u3002\u6211\u4eec\u5e0c\u671bCLC-UKET\u6570\u636e\u96c6\u3001\u4eba\u7c7b\u6ce8\u91ca\u4ee5\u53ca\u5b9e\u8bc1\u53d1\u73b0\u80fd\u591f\u4f5c\u4e3a\u5c31\u4e1a\u76f8\u5173\u7ea0\u7eb7\u89e3\u51b3\u7684\u5b9d\u8d35\u57fa\u51c6\u3002|\n", "2409.08087": "|**2024-09-12**|**Securing Large Language Models: Addressing Bias, Misinformation, and Prompt 
Attacks**|Benji Peng et.al.|[2409.08087](http://arxiv.org/abs/2409.08087)|null|\u672c\u6587\u7efc\u8ff0\u4e86\u8fd1\u5e74\u6765\u6709\u5173\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5b89\u5168\u6027\u7684\u5173\u952e\u95ee\u9898\u7684\u7814\u7a76\u6587\u732e\uff0c\u91cd\u70b9\u662f\u51c6\u786e\u6027\u3001\u504f\u89c1\u3001\u5185\u5bb9\u68c0\u6d4b\u4ee5\u53ca\u5bf9\u6297\u653b\u51fb\u7684\u8106\u5f31\u6027\u3002\u6587\u7ae0\u8be6\u7ec6\u8ba8\u8bba\u4e86LLM\u8f93\u51fa\u53ef\u80fd\u4e0d\u51c6\u786e\u6216\u8bef\u5bfc\u6027\u7684\u95ee\u9898\uff0c\u5e76\u5f3a\u8c03\u4e86\u901a\u8fc7\u4e8b\u5b9e\u6838\u67e5\u65b9\u6cd5\u589e\u5f3a\u54cd\u5e94\u53ef\u9760\u6027\u7684\u5b9e\u65bd\u7b56\u7565\u3002\u6587\u7ae0\u6df1\u5165\u63a2\u8ba8\u4e86\u5185\u5d4c\u4e8eLLM\u4e2d\u7684\u56fa\u6709\u504f\u89c1\uff0c\u901a\u8fc7\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6280\u672f\uff0c\u5982\u63a7\u5236\u8f93\u5165\u7814\u7a76\u548c\u7ea2\u961f\u6f14\u7ec3\uff0c\u5bf9\u5176\u8fdb\u884c\u6279\u5224\u6027\u5ba1\u89c6\u3002\u63d0\u51fa\u4e86\u5168\u9762\u7684\u504f\u89c1\u7f13\u89e3\u7b56\u7565\u5206\u6790\uff0c\u5305\u62ec\u4ece\u9884\u5904\u7406\u5e72\u9884\u5230\u8bad\u7ec3\u671f\u95f4\u8c03\u6574\u548c\u540e\u5904\u7406\u6539\u8fdb\u7684\u5404\u79cd\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u63a2\u7a76\u4e86\u533a\u5206LLM\u751f\u6210\u5185\u5bb9\u4e0e\u4eba\u7c7b\u521b\u4f5c\u6587\u672c\u7684\u590d\u6742\u6027\uff0c\u5f15\u5165\u4e86\u8bf8\u5982DetectGPT\u7684\u68c0\u6d4b\u673a\u5236\u4ee5\u53ca\u6c34\u5370\u6280\u672f\uff0c\u540c\u65f6\u6307\u51fa\u5728\u590d\u6742\u60c5\u51b5\u4e0b\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u5206\u7c7b\u5668\u5b58\u5728\u5c40\u9650\u6027\u3002\u6587\u7ae0\u8fd8\u5206\u6790\u4e86LLM\u7684\u6f0f\u6d1e\uff0c\u5305\u62ec\u9003\u9038\u653b\u51fb\u548c\u63d0\u793a\u6ce8\u5165\u653b\u51fb\uff0c\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u548c\u5927\u89c4\u6a21\u7ade\u8d5bHackAPrompt\u7b49\u8fdb\u884c\u4e86\u6df1\u5165\u63a2\u8ba8\u3002\u6700\u540e\u
ff0c\u6587\u7ae0\u56de\u987e\u4e86\u4fdd\u62a4LLM\u7684\u9632\u5fa1\u63aa\u65bd\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u5bf9LLM\u5b89\u5168\u6027\u9886\u57df\u8fdb\u884c\u66f4\u6df1\u5165\u7814\u7a76\u7684\u91cd\u8981\u6027\u3002|\n", "2409.09030": "|**2024-09-13**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanxian Huang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u6210\u529f\uff0c\u5c24\u5176\u662f\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u4e2d\u7684\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u8bb8\u591a\u5c06LLMs\u4e0eSE\u7ed3\u5408\u7684\u7814\u7a76\u5de5\u4f5c\u660e\u786e\u6216\u9690\u542b\u5730\u91c7\u7528\u4e86\u4ee3\u7406\u7684\u6982\u5ff5\u3002\u7136\u800c\uff0c\u7f3a\u4e4f\u5bf9\u73b0\u6709\u5de5\u4f5c\u53d1\u5c55\u80cc\u666f\u7684\u6df1\u5165\u7efc\u8ff0\u3001\u5206\u6790\u5b83\u4eec\u5982\u4f55\u7ed3\u5408\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6280\u672f\u4f18\u5316\u5404\u79cd\u4efb\u52a1\u4ee5\u53ca\u6f84\u6e05SE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\u3002\u672c\u6587\u65e8\u5728\u8fdb\u884c\u9996\u6b21\u5173\u4e8e\u7ed3\u5408LLMs\u4e0eSE\u7684\u7814\u7a76\u7efc\u8ff0\uff0c\u5e76\u63d0\u51faSE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u884c\u52a8\u3002\u540c\u65f6\uff0c\u6211\u4eec\u603b\u7ed3\u4e86\u8fd9\u4e24\u4e2a\u9886\u57df\u7ed3\u5408\u65f6\u9762\u4e34\u7684\u5f53\u524d\u6311\u6218\uff0c\u5e76\u9488\u5bf9\u8fd9\u4e9b\u6311\u6218\u63d0\u51fa\u4e86\u672a\u6765\u7684\u673a\u9047\u3002\u6211\u4eec\u7ef4\u62a4\u4e86\u4e00\u4e2a\u76f8\u5173\u7684\u8bba\u6587GitHub\u4ed3\u5e93\uff0c\u5730\u5740\u4e3a\uff1ahttps://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE\u30
02**|\n", "2409.09010": "|**2024-09-13**|**Contri(e)ve: Context + Retrieve for Scholarly Question Answering**|Kanchan Shivashankar et.al.|[2409.09010](http://arxiv.org/abs/2409.09010)|null|\u5b66\u8005\u4ea4\u6d41\u662f\u4e00\u4e2a\u5feb\u901f\u53d1\u5c55\u7684\u9886\u57df\uff0c\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5176\u975e\u7ed3\u6784\u5316\u7684\u6587\u6863\u683c\u5f0f\uff0c\u4f20\u7edf\u7684\u6587\u6863\u68c0\u7d22\u65b9\u6cd5\u96be\u4ee5\u4ece\u4e2d\u63d0\u53d6\u6709\u7528\u4fe1\u606f\u3002\u5b66\u8005\u77e5\u8bc6\u56fe\u8c31\u901a\u8fc7\u6784\u5efa\u4e00\u4e2a\u8bed\u4e49\u7f51\u7edc\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u9690\u85cf\u7684\u6d1e\u5bdf\u3001\u6458\u8981\u548c\u6613\u4e8e\u901a\u8fc7\u67e5\u8be2\u83b7\u53d6\u7684\u8bbf\u95ee\u6027\u3002\u81ea\u7136\u5730\uff0c\u5bf9\u5b66\u8005\u56fe\u8c31\u8fdb\u884c\u95ee\u7b54\u6269\u5c55\u4e86\u66f4\u5e7f\u6cdb\u53d7\u4f17\u7684\u53ef\u8bbf\u95ee\u6027\u3002\u4f46\u5728\u8fd9\u4e00\u9886\u57df\u7684\u67d0\u4e9b\u77e5\u8bc6\u4ecd\u7136\u4ee5\u975e\u7ed3\u6784\u5316\u6587\u672c\u5f62\u5f0f\u5448\u73b0\uff0c\u56e0\u6b64\u9700\u8981\u7ed3\u5408\u89e3\u51b3\u65b9\u6848\u6765\u4e3a\u95ee\u7b54\u7cfb\u7edf\u63d0\u4f9b\u652f\u6301\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u6b65\u89e3\u51b3\u65b9\u6848\uff0c\u4f7f\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff1aLlama3.1\u5bf9\u5b66\u8005-QALD\u6570\u636e\u96c6\u8fdb\u884c\u5904\u7406\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ece\u4e0d\u540c\u7684\u7ed3\u6784\u5316\u548c\u975e\u7ed3\u6784\u5316\u6570\u636e\u6e90\u4e2d\u63d0\u53d6\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5185\u5bb9\uff1aDBLP\u3001SemOpenAlex\u77e5\u8bc6\u56fe\u8c31\u4ee5\u53ca\u7ef4\u57fa\u767e\u79d1\u6587\u672c\u3002 
\u5176\u6b21\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u63d0\u793a\u5de5\u7a0b\uff0c\u4ee5\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4fe1\u606f\u68c0\u7d22\u6027\u80fd\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u5728F1\u5206\u6570\u4e0a\u53d6\u5f97\u4e8640%\u7684\u6210\u7ee9\uff0c\u5e76\u89c2\u5bdf\u5230\u4e00\u4e9b\u6765\u81eaLLM\u7684\u5f02\u5e38\u54cd\u5e94\uff0c\u8fd9\u4e9b\u54cd\u5e94\u5728\u8bba\u6587\u7684\u6700\u540e\u90e8\u5206\u8fdb\u884c\u4e86\u8ba8\u8bba\u3002|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|\u786e\u4fdd\u5185\u5bb9\u7b26\u5408\u793e\u533a\u51c6\u5219\u5bf9\u4e8e\u7ef4\u62a4\u5065\u5eb7\u7684\u5728\u7ebf\u793e\u4ea4\u73af\u5883\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u7684\u57fa\u4e8e\u4eba\u7c7b\u7684\u5408\u89c4\u6027\u68c0\u67e5\u5728\u5904\u7406\u7528\u6237\u751f\u6210\u5185\u5bb9\u7684\u4e0d\u65ad\u589e\u957f\u91cf\u548c\u6709\u9650\u7684\u7ba1\u7406\u5458\u6570\u91cf\u65f6\u9762\u4e34\u7740\u6269\u5c55\u96be\u9898\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u7684\u65b0\u8fdb\u5c55\uff0c\u4e3a\u81ea\u52a8\u5316\u5185\u5bb9\u5408\u89c4\u6027\u9a8c\u8bc1\u5f00\u8f9f\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u672c\u6587\u8bc4\u4f30\u4e86\u516d\u4e2a\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u8fd9\u4e9b\u4ee3\u7406\u57fa\u4e8eOpen-LLMs\uff0c\u5728\u53bb\u4e2d\u5fc3\u5316\u793e\u4ea4\u7f51\u7edc\u4e2d\u5bf9\u89c4\u5219\u5408\u89c4\u6027\u8fdb\u884c\u81ea\u52a8\u9a8c\u8bc1\uff0c\u8fd9\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u73af\u5883\uff0c\u56e0\u4e3a\u793e\u533a\u7684\u8303\u56f4\u548c\u89c4\u5219\u5404\u4e0d\u76f8\u540c\u3002\u901a\u8fc7\u5bf9\u6765\u81ea\u6570\u767e\u4e2aMastodon\u670d\u52a1\u5668\u7684\u8d85\u8fc750,000\u6761\u5e16\u5b50\u7684\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u4eba\u5d
e5\u667a\u80fd\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u68c0\u6d4b\u975e\u5408\u89c4\u5185\u5bb9\u3001\u638c\u63e1\u8bed\u8a00\u4e0a\u7684\u7ec6\u5fae\u5dee\u522b\uff0c\u5e76\u9002\u5e94\u4e0d\u540c\u7684\u793e\u533a\u4e0a\u4e0b\u6587\u3002\u5927\u591a\u6570\u4ee3\u7406\u8fd8\u663e\u793a\u51fa\u9ad8\u7684\u4e00\u81f4\u6027\u548c\u4e00\u81f4\u6027\uff0c\u5728\u8bc4\u5206\u89e3\u91ca\u548c\u5408\u89c4\u5efa\u8bae\u4e0a\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u8005\u76f8\u5339\u914d\u3002\u901a\u8fc7\u9886\u57df\u4e13\u5bb6\u7684\u4eba\u5de5\u8bc4\u4f30\uff0c\u786e\u8ba4\u4e86\u4ee3\u7406\u7684\u53ef\u9760\u6027\u548c\u5b9e\u7528\u6027\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u662f\u534a\u81ea\u52a8\u5316\u6216\u4eba\u673a\u534f\u4f5c\u5185\u5bb9\u7ba1\u7406\u7cfb\u7edf\u7684\u6709\u524d\u666f\u7684\u5de5\u5177\u3002|\n", "2409.08937": "|**2024-09-13**|**Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions**|Zahra Ashktorab et.al.|[2409.08937](http://arxiv.org/abs/2409.08937)|null|\u672c\u6587\u7814\u7a76\u4e86\u5728\u4eba\u7c7b\u4e0e\u4eba\u5de5\u667a\u80fd\u5408\u4f5c\u8fdb\u884c\u6587\u672c\u751f\u6210\u4efb\u52a1\u65f6\uff0c\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u534f\u52a9\u751f\u6210\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6570\u636e\u3002\u5bf9\u4e8e\u8fd9\u4e9b\u6a21\u578b\u800c\u8a00\uff0c\u9700\u8981\u6570\u636e\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd9\u662f\u63d0\u5347\u5176\u6027\u80fd\u7684\u5173\u952e\u6b65\u9aa4\u3002\u5728\u5ba2\u6237\u670d\u52a1\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u4e2d\uff0c\u6570\u636e\u4ee5\u4eba\u4e0e\u5ba2\u670d\u4ee3\u7406\u4e4b\u95f4\u7684\u5bf9\u8bdd\u5f62\u5f0f\u5b58\u5728\uff0c\u5e76\u53ef\u501f\u52a9AI\u52a9\u624b\u751f\u6210\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u5171\u62db\u52df\u4e8611\u4f4d\u7528\u6237\uff0c\u6bcf\u4f4d\u7528\u6237\u5b8c\u621
08\u9879\u4efb\u52a1\uff0c\u603b\u5171\u5b8c\u6210\u4e8688\u9879\u4efb\u52a1\u3002\u7ed3\u679c\u53d1\u73b0\uff0c\u5e7b\u89c9\u7684\u5b58\u5728\u5bf9\u6570\u636e\u8d28\u91cf\u4ea7\u751f\u4e86\u8d1f\u9762\u5f71\u54cd\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5c3d\u7ba1\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5e76\u975e\u603b\u80fd\u62b5\u6d88\u5e7b\u89c9\u5bf9\u6570\u636e\u8d28\u91cf\u7684\u4e0d\u5229\u5f71\u54cd\uff0c\u4f46\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5171\u540c\u4f5c\u7528\u4e8e\u6570\u636e\u8d28\u91cf\uff0c\u5e76\u5f71\u54cd\u7528\u6237\u5982\u4f55\u5229\u7528\u5448\u73b0\u7ed9\u4ed6\u4eec\u7684AI\u54cd\u5e94\u3002\u901a\u8fc7\u5206\u6790\u7528\u6237\u884c\u4e3a\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5bf9AI\u751f\u6210\u54cd\u5e94\u4f9d\u8d56\u7684\u660e\u663e\u6a21\u5f0f\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728\u5bf9\u8bddAI\u60c5\u5883\u4e0b\u7ba1\u7406\u5e7b\u89c9\u5728AI\u751f\u6210\u5185\u5bb9\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2409.08936": "|**2024-09-13**|**SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records**|Paloma Rabaey 
et.al.|[2409.08936](http://arxiv.org/abs/2409.08936)|**[link](https://github.com/prabaey/synsum)**|**\u6211\u4eec\u63d0\u51fa\u4e86SynSUM\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5408\u6210\u6570\u636e\u96c6\uff0c\u5c06\u975e\u7ed3\u6784\u5316\u7684\u4e34\u5e8a\u8bb0\u5f55\u4e0e\u7ed3\u6784\u5316\u80cc\u666f\u53d8\u91cf\u8054\u7cfb\u8d77\u6765\u3002\u8be5\u6570\u636e\u96c6\u753110,000\u4e2a\u865a\u6784\u7684\u60a3\u8005\u8bb0\u5f55\u7ec4\u6210\uff0c\u5305\u542b\u8868\u683c\u53d8\u91cf\uff08\u5982\u75c7\u72b6\u3001\u8bca\u65ad\u548c\u57fa\u7840\u6761\u4ef6\uff09\u4ee5\u53ca\u4e0e\u4e4b\u76f8\u5173\u7684\u63cf\u8ff0\u865a\u6784\u60a3\u8005\u5c31\u8bca\u60c5\u51b5\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u9886\u57df\u4e3a\u547c\u5438\u75be\u75c5\u3002\u8868\u683c\u90e8\u5206\u7684\u6570\u636e\u901a\u8fc7\u8d1d\u53f6\u65af\u7f51\u7edc\u751f\u6210\uff0c\u5176\u4e2d\u56e0\u679c\u7ed3\u6784\u548c\u6761\u4ef6\u6982\u7387\u7531\u4e13\u5bb6\u57fa\u4e8e\u9886\u57df\u77e5\u8bc6\u63d0\u51fa\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT-4o\uff09\u751f\u6210\u4e0e\u60a3\u8005\u5c31\u8bca\u76f8\u5173\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u63cf\u8ff0\u60a3\u8005\u7684\u75c7\u72b6\u548c\u989d\u5916\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u3002 
SynSUM\u6570\u636e\u96c6\u4e3b\u8981\u65e8\u5728\u4fc3\u8fdb\u5728\u5b58\u5728\u8868\u683c\u80cc\u666f\u53d8\u91cf\u7684\u60c5\u51b5\u4e0b\u5bf9\u4e34\u5e8a\u4fe1\u606f\u63d0\u53d6\u7684\u7814\u7a76\uff0c\u53ef\u4ee5\u901a\u8fc7\u9886\u57df\u77e5\u8bc6\u5c06\u8fd9\u4e9b\u53d8\u91cf\u94fe\u63a5\u5230\u4ece\u6587\u672c\u4e2d\u63d0\u53d6\u7684\u6982\u5ff5\u5174\u8da3\u70b9\u2014\u2014\u5728SynSUM\u7684\u60c5\u51b5\u4e0b\u662f\u75c7\u72b6\u3002\u6b21\u8981\u7528\u9014\u5305\u62ec\u7814\u7a76\u8868\u683c\u6570\u636e\u548c\u6587\u672c\u7684\u81ea\u52a8\u5316\u4e34\u5e8a\u63a8\u7406\u3001\u5728\u5b58\u5728\u8868\u683c\u548c/\u6216\u6587\u672c\u6df7\u6742\u56e0\u7d20\u60c5\u51b5\u4e0b\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u4ee5\u53ca\u591a\u6a21\u6001\u5408\u6210\u6570\u636e\u751f\u6210\u3002 \u8be5\u6570\u636e\u96c6\u53ef\u4ee5\u4ece\u4ee5\u4e0b\u94fe\u63a5\u4e0b\u8f7d\uff1a**|\n", "2409.08931": "|**2024-09-13**|**LLM-based Weak Supervision Framework for Query Intent Classification in Video Search**|Farnoosh Javadi 
et.al.|[2409.08931](http://arxiv.org/abs/2409.08931)|null|Streaming services have fundamentally changed how we discover and engage with digital entertainment. Even so, effectively understanding the wide range of user search queries remains a significant challenge. Building an accurate query understanding system that can handle the varied entities representing different user intents is essential to delivering an enhanced user experience. This can be achieved by training a natural language understanding (NLU) model; however, obtaining high-quality labeled data in this specialized domain is a major obstacle. Manual annotation is costly and impractical for capturing the variability of users' vocabulary. To address this, we propose a novel approach that uses large language models (LLMs) through weak supervision to automatically annotate a large volume of user search queries. Using prompt engineering and a diverse set of LLM personas, we generate training data that matches the expectations of human annotators. By incorporating domain knowledge through chain-of-thought and in-context learning, our approach uses the labeled data to train low-latency models optimized for real-time inference. Extensive evaluation shows that our approach outperforms the baseline with an average improvement of 113% in recall. In addition, our novel prompt engineering framework produces high-quality LLM-generated data for weak supervision; measured by weighted F1 score, we observe a 47.60% improvement in agreement between predictions and human annotations. Our persona selection routing mechanism adds a further 3.67% increase in weighted F1 score on top of the novel prompt engineering framework.|\n", 
"2409.08904": "|**2024-09-13**|**AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models**|Yifei Yao et.al.|[2409.08904](http://arxiv.org/abs/2409.08904)|**[link](https://github.com/sjtu-mvasl-robotics/AnyBipe)**|This paper proposes an end-to-end framework for training and deploying reinforcement learning (RL) policies for robots, guided by large language models (LLMs). The framework consists of three interconnected modules: an LLM-guided reward function design module, an RL training module that builds on existing work, and a sim-to-real homomorphic evaluation module. This approach significantly reduces the need for human intervention, requiring only basic simulation and deployment platforms, and offers the option of incorporating human-engineered strategies and historical data. We describe the construction of these modules in detail, explain their advantages over traditional approaches, and present examples of the framework autonomously developing and refining gait control for bipedal robots, demonstrating its potential to operate without human intervention.|\n", 
"2409.08890": "|**2024-09-13**|**A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research**|Martin Obschonka et.al.|[2409.08890](http://arxiv.org/abs/2409.08890)|null|With the rapid growth of artificial intelligence (AI) adoption and the availability of big data, entrepreneurship scholarship may be facing the most significant transformation in its history. This article makes a pressing meta-contribution by highlighting the risk of unproductive knowledge exchange in entrepreneurship research during the AI revolution. It offers strategies to mitigate this risk and provides guidance for future AI-based studies to enhance their collective impact and relevance. Drawing on Akerlof's well-known 'market for lemons' concept, we identify significant knowledge asymmetries that may arise from the field's evolution into its current landscape, such as complexities around construct validity, theory building, and research relevance. These asymmetries are especially deeply rooted in what we call the double-black-box puzzle: the intersection of the widely recognized black-box nature of AI methods with the black-box nature of the entrepreneurship phenomenon, which is driven by inherent uncertainty. As a result, these asymmetries could lead to an increase in undetectable suboptimal research products, creating a market for lemons that undermines the field's well-being, reputation, and impact. Importantly, however, if these risks can be mitigated, the AI revolution could herald a new golden age for entrepreneurship research. We discuss the actions needed to raise the field to a higher level of AI resilience while firmly maintaining its foundational principles and core values.|\n", 
"2409.08864": "|**2024-09-13**|**Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies**|Zhiqiang Zhong 
et.al.|[2409.08864](http://arxiv.org/abs/2409.08864)|null|Large language models (LLMs) have shown remarkable capabilities in processing various data structures, including graphs. While previous research has focused on developing textual encoding methods for graph representation, the emergence of multimodal LLMs opens a new frontier for graph comprehension. These advanced models can process both text and images, and by combining visual representations with traditional textual data they may improve the understanding of graph structure. This study investigates the impact of graph visualizations on LLM performance at different levels (node, edge, and graph). Our experiments compare the effectiveness of multimodal approaches against purely textual graph representations. The results provide valuable insights into both the potential and the limitations of using the visual graph modality to enhance LLMs' comprehension of graph structure.|\n", 
"2409.08846": "|**2024-09-13**|**FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition**|Zhenhua Xu 
et.al.|[2409.08846](http://arxiv.org/abs/2409.08846)|null|Training large language models (LLMs) requires immense computational power and vast amounts of data. Protecting the intellectual property of these models through fingerprinting is therefore essential for ownership authentication. Adding fingerprints to LLMs through fine-tuning has been attempted, but it remains costly and hard to scale. To this end, we propose FP-VEC, a pilot study on using fingerprint vectors as an efficient fingerprinting method for LLMs. Our approach generates a fingerprint vector that represents a confidential signature embedded in the model, allowing the same fingerprint to be seamlessly incorporated into an unlimited number of LLMs via vector addition. Results on several LLMs show that FP-VEC is lightweight, running on CPU-only devices for fingerprinting; scalable, needing only a single training run for an unlimited number of fingerprinting operations; and able to preserve the model's normal behavior. The project page is available at https://fingerprintvector.github.io .|\n", 
"2409.10516": "|**2024-09-16**|**RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval**|Di Liu 
et.al.|[2409.10516](http://arxiv.org/abs/2409.10516)|**[link](https://github.com/jzbjyb/reatt)**|Transformer-based large language models (LLMs) are becoming increasingly important in many domains. However, the quadratic time complexity of the attention operation poses a significant challenge for scaling to longer contexts, leading to extremely high inference latency and high GPU memory consumption for caching key-value (KV) vectors. This paper proposes RetrievalAttention, a training-free approach to accelerate attention computation. Exploiting the dynamic sparsity of attention, RetrievalAttention builds approximate nearest neighbor search (ANNS) indexes in CPU memory and retrieves the most relevant entries via vector search during generation. Because of the out-of-distribution (OOD) problem between query vectors and key vectors, off-the-shelf ANNS indexes still need to scan O(N) of the data (usually 30% of all keys) for accurate retrieval, which fails to exploit the high sparsity. RetrievalAttention first identifies the OOD challenge in ANNS-based attention and addresses it with an attention-aware vector search algorithm that adapts to queries and accesses only 1-3% of the data, thereby achieving sub-linear time complexity. RetrievalAttention greatly reduces the inference cost of long-context LLMs and substantially lowers GPU memory requirements while maintaining model accuracy. Notably, RetrievalAttention needs only 16GB of GPU memory to serve an LLM with 8B parameters processing 128K tokens, and can generate one token in 0.188 seconds on a single NVIDIA RTX4090 (24GB).|\n", 
"2409.10506": "|**2024-09-16**|**Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models**|Momoko Shiraishi 
et.al.|[2409.10506](http://arxiv.org/abs/2409.10506)|null|There is strong motivation to translate C code into Rust code, given the continuing threat of memory safety vulnerabilities in existing C programs and the wide attention Rust has received as an alternative to C. Large language models (LLMs) show promise for automating this translation by generating more natural and safer code than rule-based methods. However, previous studies have shown that LLM-generated Rust code often fails to compile, even for relatively small C programs, mainly because of the significant differences between the two languages and context window limitations. We propose an LLM-based translation scheme that improves the probability of successfully translating large-scale C code into compilable Rust code. Our approach involves three key techniques: (1) pre-processing the C code so that its structure and expressions align better with Rust; (2) segmenting the code into optimally sized translation units to avoid exceeding the LLM's context window limit; and (3) iteratively compiling and repairing errors while maintaining consistency between translation units, using context-supplementing prompts. Successful compilation is the essential first step toward functional equivalence, because only compilable code can be tested further. In experiments with 20 benchmark C programs, including programs exceeding 4,000 lines of code, we successfully translated all programs into compilable Rust code without losing the corresponding parts of the original code.|\n", 
"2409.10504": "|**2024-09-16**|**DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction**|John Wu 
et.al.|[2409.10504](http://arxiv.org/abs/2409.10504)|null|High-dimensional or multi-label prediction tasks such as medical coding require both accurate predictions and interpretable explanations. Existing work often relies on local interpretability methods, which cannot provide a comprehensive mechanistic explanation behind each label prediction within the full multi-label set. We propose an interpretability module called DIctionary Label Attention (DILA), which disentangles uninterpretable dense embeddings into a sparse embedding space. In this space, each nonzero element (a dictionary feature) represents a globally learned medical concept. Through human evaluation, we find that our sparse embeddings are at least 50% more human-understandable than their dense counterparts. Our automated dictionary feature identification pipeline, which uses large language models (LLMs), uncovers thousands of learned medical concepts by examining and summarizing the highest-activating tokens for each dictionary feature. We represent the relationships between dictionary features and medical codes through a sparse interpretable matrix, which enhances the mechanistic and global understanding of the model's predictions while maintaining competitive performance and scalability without extensive human annotation.|\n", 
"2409.10502": "|**2024-09-16**|**Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles**|Kulin Shah 
et.al.|[2409.10502](http://arxiv.org/abs/2409.10502)|**[link](https://github.com/kulinshah98/llm-reasoning-logic-puzzles)**|In recent years, causal language modeling based on the Transformer architecture has driven remarkable progress in large language models (LLMs). However, whether these models have truly developed fundamental search and reasoning capabilities remains a topic of ongoing debate. This work studies whether causal language modeling can learn a complex task such as solving Sudoku puzzles. Solving a Sudoku requires the model first to search over all empty cells to decide which cell to fill, and then to apply an appropriate strategy to fill the chosen cell. Sometimes applying a strategy only narrows down the possible values of a cell rather than determining its exact value. In such cases, multiple strategies must be applied to a single cell. We find that Transformer models trained on the logical sequence of steps can indeed learn to solve Sudoku puzzles (our model correctly solves 94.21% of the puzzles). We also extend our analysis to Zebra puzzles (also known as Einstein puzzles) and show that the model correctly solves 92.04% of them. In addition, we study the internal representations of the trained Transformer and find through linear probing that information about the set of possible values of any given cell can be decoded from them, which points to a strong reasoning engine implicit in the Transformer weights.|\n", 
"2409.10490": "|**2024-09-16**|**Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models**|Shaznin Sultana et.al.|[2409.10490](http://arxiv.org/abs/2409.10490)|null|In recent years, growing dependence on open-source projects in software development has led to a marked increase in vulnerability issues, a trend that has drawn wide attention. This paper investigates the capability and effectiveness of large language models (LLMs) in identifying vulnerabilities in codebases, with a particular focus on the latest advances in emerging LLM technology. Through a comparative analysis, we evaluate the performance of recently introduced LLMs, including Llama, CodeLlama, Gemma, and CodeGemma, alongside established state-of-the-art models such as BERT, RoBERTa, and GPT-3, in detecting software security vulnerabilities. Our study aims to shed light on the capabilities of LLMs in vulnerability detection and thereby help improve security practices across diverse open-source repositories. The results show that CodeGemma achieves the highest F1 score (58%) and recall (87%) in detecting software security vulnerabilities.|\n", 
"2409.10484": "|**2024-09-16**|**XLM for Autonomous Driving Systems: A Comprehensive Review**|Sonda Fourati et.al.|[2409.10484](http://arxiv.org/abs/2409.10484)|null|Large language models (LLMs) have shown remarkable proficiency in a variety of information-processing tasks. These tasks range from data extraction and literature summarization to content generation, predictive modeling, decision-making, and system control. Moreover, vision large models (VLMs) and multimodal large language models (MLLMs), collectively XLMs, can combine multiple data modalities with the strength of language understanding, thereby advancing information-based systems such as autonomous driving systems (ADS). By combining language communication with multimodal sensory inputs, such as panoramic images and LiDAR or radar data, accurate driving actions can be taken. In this context, this paper surveys the potential of XLMs for achieving autonomous driving. Specifically, we review the relevant literature on ADS and XLMs, including their architectures, tools, and frameworks. We then detail the approaches for deploying XLMs in autonomous driving solutions. Finally, we identify the challenges of deploying XLMs in ADS and propose future research directions aimed at promoting the adoption of XLMs in future ADS frameworks.|\n", 
"2409.10482": "|**2024-09-17**|**Schrodinger's Memory: Large Language Models**|Wei Wang et.al.|[2409.10482](http://arxiv.org/abs/2409.10482)|null|Memory is the foundation of human activity; without memory, it would be nearly impossible to perform any task in daily life. As large language models (LLMs) develop, their language capabilities are becoming increasingly close to those of humans. But do LLMs have memory? Based on current performance, LLMs do show signs of having memory. What, then, is the principle behind this memory mechanism? Existing research lacks an in-depth exploration of LLMs' memory capabilities and the underlying theory. In this paper, we use the Universal Approximation Theorem (UAT) to explain the memory mechanism of LLMs. We also conduct experiments to verify the memory capabilities of various LLMs and propose a new method for assessing their abilities based on these memory capabilities. We argue that LLM memory works like Schrodinger's memory: it becomes observable only when a specific memory is queried. We can determine whether the model retains a memory only from its output in response to the query; otherwise, it remains indeterminate. Finally, we extend this concept by comparing the memory capabilities of the human brain and LLMs, highlighting the similarities and differences in their operating mechanisms.|\n", 
"2409.10444": "|**2024-09-16**|**LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning**|Jicong Ao et.al.|[2409.10444](http://arxiv.org/abs/2409.10444)|**[link](https://github.com/proneverfake/kios)**|This paper proposes 'LLM as BT-Planner', a novel framework that uses large language models (LLMs) to generate behavior trees (BTs) for robotic assembly task planning and execution. We introduce four in-context learning methods that use the natural language processing and reasoning capabilities of LLMs to produce task plans in BT format, reducing manual effort while ensuring robustness and comprehensibility. We also evaluate the performance of fine-tuned LLMs with fewer parameters on the same tasks. Experiments in simulated and real-world settings show that our framework improves LLM performance in BT generation, with in-context learning and supervised fine-tuning significantly raising the success rate.|\n", 
"2409.10411": "|**2024-09-16**|**A Large-Scale Privacy Assessment of Android Third-Party SDKs**|Mark Huasong Meng 
et.al.|[2409.10411](http://arxiv.org/abs/2409.10411)|null|This study offers a targeted analysis of third-party software development kits (SDKs) on the Android platform, focusing on user privacy protection and filling a critical gap in the Android software supply chain. It examines 158 widely used SDKs drawn from two key SDK release platforms: the official one and a large alternative one. On privacy leakage, we identified 338 instances in which these SDKs transmitted users' sensitive information without authorization. Such data could be used for illegitimate purposes such as user tracking or monetization. On privacy compliance, our study shows that more than 30% of the examined SDKs do not provide a privacy policy disclosing their data handling practices. Among the SDKs that do provide a privacy policy, 37% over-collect user data and 88% falsely claim access to sensitive data. We revisited the latest versions of the SDKs one year later, and the results show that these concerning trends have not improved. Based on our findings, we propose three actionable recommendations to reduce the risk of privacy leakage and strengthen privacy protection for Android users. This research not only serves as an urgent call for industry attention but also provides key insights for future regulatory intervention.|\n", 
"2409.10354": "|**2024-09-17**|**Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot**|Bhuvan Sachdeva et.al.|[2409.10354](http://arxiv.org/abs/2409.10354)|null|This paper examines the use of large language models (LLMs) in healthcare and the challenges they face, such as hallucinations, incomplete information, and bias, which affect their reliability. To overcome these problems, researchers released a platform called 'Build Your Own expert Bot' (BYOeB), which allows developers to create LLM-powered chatbots with integrated expert verification. CataractBot, the first implementation of the platform, focuses on providing expert-verified answers about cataract surgery. An initial evaluation showed its potential, but that study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale, 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analyzing the interaction logs, we found that medical questions far outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded through expert corrections, system performance improved by 19.02%, reducing the experts' workload. These findings guide the design of future LLM-powered chatbots.|\n", 
"2409.11404": "|**2024-09-17**|**AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs**|Basel Mousi et.al.|[2409.11404](http://arxiv.org/abs/2409.11404)|null|Arabic, with its rich diversity of dialects, remains significantly underrepresented in large language models, particularly in its dialectal variations. We address this gap with seven synthetic datasets, created using machine translation combined with human post-editing, that cover Modern Standard Arabic (MSA) as well as dialects from across the Arab regions. We present the AraDiCE benchmark for evaluating Arabic dialect and cultural understanding and generation. Our study focuses on low-resource Arabic dialects and evaluates them. In addition, we introduce the first fine-grained benchmark designed specifically to evaluate cultural awareness across the Arabian Peninsula, Egypt, and the Levant, adding a new dimension to LLM evaluation. Our findings show that although Arabic-specific models such as Jais and AceGPT outperform multilingual models on dialectal tasks, significant challenges remain in dialect identification, generation, and translation. This work contributes about 45,000 human post-edited samples and a cultural benchmark, and highlights the importance of tailored training for improving the ability of large language models to capture the nuances of different Arabic dialects and cultural contexts. We will release the dialectal translation models and benchmarks built in this study.|\n", 
"2409.11402": "|**2024-09-17**|**NVLM: Open Frontier-Class Multimodal LLMs**|Wenliang Dai et.al.|[2409.11402](http://arxiv.org/abs/2409.11402)|null|We introduce NVLM 1.0, a family of multimodal large language models that reach frontier-level results on vision-language tasks, rivaling leading proprietary models (e.g., GPT-4o) and open models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, after multimodal training, NVLM 1.0 even surpasses its underlying LLM backbone on text-only tasks. In terms of model design, we carry out a comprehensive comparison between decoder-only multimodal LLMs (e.g., LLaVA) and cross-attention-based models (e.g., Flamingo). Based on the strengths and weaknesses of both approaches, we propose a novel architecture that improves both training efficiency and multimodal reasoning capabilities. Furthermore, we introduce a 1-D tile-tagging design for dynamic high-resolution images, which significantly improves performance on multimodal reasoning and OCR-related tasks. 
\u5173\u4e8e\u8bad\u7ec3\u6570\u636e\uff0c\u6211\u4eec\u7cbe\u5fc3\u6536\u96c6\u5e76\u63d0\u4f9b\u4e86\u6240\u6709\u67b6\u6784\u7684\u9884\u8bad\u7ec3\u548c\u76d1\u7763\u5fae\u8c03\u6570\u636e\u96c6\u7684\u8be6\u7ec6\u4fe1\u606f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\uff0c\u6570\u636e\u8d28\u91cf\u548c\u4efb\u52a1\u591a\u6837\u6027\u6bd4\u89c4\u6a21\u66f4\u4e3a\u91cd\u8981\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u4e3aNVLM-1.0\u6a21\u578b\u5f00\u53d1\u4e86\u751f\u4ea7\u7ea7\u591a\u6a21\u6001\u529f\u80fd\uff0c\u4f7f\u5b83\u4eec\u5728\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e2d\u4e0d\u4ec5\u4fdd\u6301\u751a\u81f3\u8d85\u8d8a\u4e86\u57fa\u7840\u8bed\u8a00\u6a21\u578b\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u5728\u591a\u6a21\u6001\u8bad\u7ec3\u4e2d\u5de7\u5999\u5730\u6574\u5408\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u7eaf\u6587\u672c\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u5927\u91cf\u7684\u591a\u6a21\u6001\u6570\u5b66\u548c\u63a8\u7406\u6570\u636e\uff0c\u4ece\u800c\u5728\u6240\u6709\u6a21\u6001\u4e0b\u63d0\u9ad8\u4e86\u6570\u5b66\u548c\u7f16\u7801\u80fd\u529b\u3002 \u4e3a\u4e86\u63a8\u52a8\u9886\u57df\u7814\u7a76\uff0c\u6211\u4eec\u5c06\u53d1\u5e03\u6a21\u578b\u6743\u91cd\u5e76\u5f00\u6e90\u4ee3\u7801\u4f9b\u793e\u533a\u4f7f\u7528\uff1ahttps://nvlm-project.github.io/\u3002|\n", "2409.11390": "|**2024-09-17**|**Says Who? Effective Zero-Shot Annotation of Focalization**|Rebecca M. M. 
Hicke et.al.|[2409.11390](http://arxiv.org/abs/2409.11390)|null|In this paper, we experimentally test how well current large language models (LLMs) perform when annotating literary texts for focalization mode. Despite the difficulty of the task, our experiments show that LLMs perform comparably to trained human annotators. We present a case study on Stephen King's novels to demonstrate the usefulness of this approach for computational literary studies and to illustrate how focalization can be studied at scale.|\n", "2409.11378": "|**2024-09-17**|**Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement**|Simon Yu 
et.al.|[2409.11378](http://arxiv.org/abs/2409.11378)|**[link](https://github.com/for-ai/iterative-data-selection)**|Fine-tuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following ability. As instruction datasets proliferate, selecting effective data for training becomes increasingly important. This paper addresses how to determine the best data subset for effective training. Existing work often selects subsets using local criteria such as instance quality, but we argue that a global view focused on data diversity matters more. We use k-means clustering to ensure that the selected subset adequately represents the full dataset.
We propose an iterative refinement method, inspired by active learning techniques, that resamples instances from the clusters and reassesses each cluster's importance and sampling weight at every training iteration. This approach reduces the effect of outliers and automatically filters out clusters containing low-quality data. Through extensive evaluation on natural language reasoning, general world knowledge, code, and mathematical reasoning tasks, and by fine-tuning models from several families, we observe consistent improvements: 7% over random selection and 3.8% over state-of-the-art sampling methods. Our work highlights the importance of diversity-first sampling when fine-tuning LLMs to improve performance across a broad range of evaluation tasks. Our code is available at https://github.com/for-ai/iterative-data-selection.|\n", "2409.11376": "|**2024-09-17**|**Towards Time Series Reasoning with LLMs**|Winnie Chow 
et.al.|[2409.11376](http://arxiv.org/abs/2409.11376)|null|Multimodal large language models (MLLMs) have made major advances in understanding and reasoning in domains such as vision, but the time-series domain has not yet seen comparable success. Although prior work on time-series MLLMs has shown promising performance in time-series forecasting, very little work shows how an LLM can be used for time-series reasoning in natural language. We propose a novel multimodal time-series LLM approach that learns generalizable information across domains and has strong zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to extract time-series information directly. We then fine-tune the model on augmented time-series tasks to encourage it to generate reasoning paths. We show that the latent representations the model learns reflect specific time-series features (e.g. slope, frequency), and that the model outperforms GPT-4o on a set of zero-shot reasoning tasks across a variety of domains.|\n", "2409.11375": "|**2024-09-17**|**Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class 
Retinal Disease Classification**|Fatema-E- Jannat et.al.|[2409.11375](http://arxiv.org/abs/2409.11375)|null|In the medical domain, acquiring large amounts of data is difficult, mainly because of privacy concerns. Yet training deep learning models for retinal disease diagnosis requires large datasets, and generalizing effectively from smaller datasets remains a persistent challenge. Data scarcity is a practical barrier to deploying scalable medical AI solutions. To address this, we combine multiple data sources to improve performance and generalization to new data by giving the model a deeper understanding of data representations from multi-modal datasets. We develop a self-supervised framework based on large language models (LLMs) and SwinV2 to strengthen the model's understanding of multi-modal dataset representations, improving its ability to detect eye diseases from optical coherence tomography (OCT) images.
We adopt a two-phase training approach: self-supervised pre-training followed by fine-tuning of a downstream supervised classifier. An ablation study across three datasets, using different encoder architectures in settings without data fusion, with limited data, and without self-supervised pre-training, highlights the robustness of our method. Our findings show consistent performance across these varied conditions and stronger generalization than the baseline model, ResNet-50.|\n", "2409.11365": "|**2024-09-17**|**CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration**|Jiahui Gao et.al.|[2409.11365](http://arxiv.org/abs/2409.11365)|null|This paper examines the safety-awareness of multimodal large language models (MLLMs) when they face malicious visual inputs. MLLMs are typically built on a large language model, with an image encoder that maps images into the token embedding space of the LLM, which has been aligned with human values on text datasets. Integrating the visual modality, however, introduces a unique vulnerability: the MLLM becomes susceptible to malicious image inputs and prone to producing unsafe or harmful responses.
We find that adding a principle that explicitly specifies the safety requirement to the MLLM's input boosts its safety-awareness. This confirms that MLLMs do have some safety-awareness when handling image inputs, but that it is weakened by the modality gap. We therefore propose Constitutional Calibration (CoCA), a simple yet effective technique that amplifies the MLLM's safety-awareness by calibrating its output distribution. The strategy helps the model recover its original safety-awareness without sacrificing its original capabilities. We verify the effectiveness of the approach on multimodal safety and understanding benchmarks.|\n", "2409.11360": "|**2024-09-17**|**AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances**|Dhruv Agarwal 
et.al.|[2409.11360](http://arxiv.org/abs/2409.11360)|null|This paper examines what happens when Western-centric AI models provide writing suggestions to users from different cultural backgrounds. We conducted a cross-cultural controlled experiment in which 118 participants from India and the United States completed culturally grounded writing tasks with and without AI suggestions. Our analysis shows that AI provided greater efficiency gains for Americans, whereas Indian participants were pushed toward adopting Western writing styles, which changed not only what they wrote but also how they wrote it. These findings suggest that Western-centric AI models homogenize writing toward Western norms, diminishing the nuances that express cultural differences.|\n", "2409.11353": "|**2024-09-17**|**THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models**|Mengfei Liang 
et.al.|[2409.11353](http://arxiv.org/abs/2409.11353)|null|This paper introduces THaMES (a tool for hallucination mitigation and evaluation), an integrated framework and library that addresses the growing challenge of hallucination in large language models (LLMs). Existing detection and mitigation methods are often isolated, fall short of domain-specific needs, and lack a standardized pipeline. THaMES provides an end-to-end solution for evaluating and mitigating hallucinations in LLMs, including automated test-set generation, multifaceted benchmarking, and flexible mitigation strategies. It automatically creates high-quality, diverse, and cost-effective test sets using techniques such as batch processing, weighted sampling, and counterfactual validation. THaMES assesses a model's ability to detect and reduce hallucinations in text generation and binary classification tasks, and applies the best-performing mitigation strategies, such as in-context learning (ICL), retrieval-augmented generation (RAG), and parameter-efficient fine-tuning (PEFT). Evaluations of state-of-the-art LLMs on a knowledge base of academic papers, political news, and Wikipedia show that commercial models such as GPT-4o benefit more from RAG than from ICL, while open-source models such as Llama-3.1-8B-Instruct and Mistral-Nemo gain more from ICL. In addition, PEFT significantly improves the performance of Llama-3.1-8B-Instruct on the evaluation tasks.|\n", "2409.11282": "|**2024-09-17**|**Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5**|Marcel Lamott et.al.|[2409.11282](http://arxiv.org/abs/2409.11282)|null|The surge of digital documents in various formats, especially less standardized ones such as business reports and environmental assessments, makes document understanding increasingly important. Large language models (LLMs) show strong capabilities across many natural language processing tasks, but applying them directly to document understanding remains challenging. Prior research has shown the potential of LLMs in this area, yet their heavy computational demands make them hard to deploy effectively. In addition, proprietary black-box LLMs often outperform their open-source counterparts, which is a barrier to widespread accessibility.
This paper explores document understanding by distilling knowledge from the LLM ChatGPT into FLAN-T5, balancing the capabilities of large models against computational limits. We propose a novel approach that integrates labeling and curriculum-learning mechanisms to enable efficient knowledge transfer. This work advances document understanding methods by offering a scalable solution that bridges the gap between resource-intensive LLMs and practical applications. Our findings underscore the potential of distillation techniques to bring sophisticated language models into wide use in real-world scenarios, advancing natural language processing and document understanding.|\n", "2409.12194": "|**2024-09-20**|**Gender Representation and Bias in Indian Civil Service Mock Interviews**|Somonnoy Banerjee 
et.al.|[2409.12194](http://arxiv.org/abs/2409.12194)|null|This paper makes three key contributions. First, using a sample of 51,278 questions collected from 888 YouTube videos of mock interviews of Indian civil service candidates, we show significant gender bias in the broad nature of the questions asked of male and female candidates. Second, our experiments with large language models reveal strong gender bias in the explanations the models provide on a gender inference task. Finally, we present a novel dataset of 51,278 interview questions that can inform future research in the humanities and social sciences.|\n", "2409.12183": "|**2024-09-18**|**To CoT or not to CoT? 
Chain-of-thought helps mainly on math and symbolic reasoning**|Zayne Sprague et.al.|[2409.12183](http://arxiv.org/abs/2409.12183)|**[link](https://github.com/zayne-sprague/to-cot-or-not-to-cot)**|To analyze which tasks truly benefit from chain-of-thought (CoT), we conducted a quantitative meta-analysis covering more than 100 papers that use CoT and ran our own evaluations of 14 models on 20 datasets. The results show that CoT yields significant performance gains mainly on math or logic tasks, with much smaller gains on other task types. On MMLU, generating the answer directly without CoT gives almost the same accuracy as CoT unless the question or the model's response contains an equals sign, which indicates symbolic operations and reasoning.
Building on this finding, we analyze the behavior of CoT on these problems by separating planning from execution and comparing against tool-augmented LLMs. Most of CoT's gain comes from improved symbolic execution, but it underperforms relative to using a symbolic solver. Our results indicate that CoT can be applied selectively, maintaining performance while saving inference cost. They further suggest a need to move beyond prompt-based CoT toward new paradigms that better leverage intermediate computation across the whole range of LLM applications.|\n", "2409.12180": "|**2024-09-18**|**Finetuning Language Models to Emit Linguistic Expressions of Uncertainty**|Arslan Chaudhry 
et.al.|[2409.12180](http://arxiv.org/abs/2409.12180)|null|This paper studies the use of large language models (LLMs) in information-seeking and decision-making tasks. Despite their broad utility, LLMs tend to generate information that conflicts with real-world facts, and their persuasive style makes these inaccuracies appear confident and convincing. As a result, end users struggle to consistently align the confidence an LLM expresses with the accuracy of its predictions, which often leads either to blind trust in all outputs or to complete disregard for their reliability. We explore supervised fine-tuning on uncertainty-augmented predictions as a way to develop models that produce linguistic expressions of uncertainty. Specifically, we measure the calibration of pre-trained models and then fine-tune language models, based on the model's own confidence, to generate calibrated expressions of uncertainty. Through experiments on a range of question-answering datasets, we show that LLMs are well calibrated in assessing their predictions, and that supervised fine-tuning based on the model's own confidence yields well-calibrated expressions of uncertainty, particularly for single-claim answers.|\n", "2409.12150": "|**2024-09-18**|**Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference**|Najmeh Forouzandehmehr et.al.|[2409.12150](http://arxiv.org/abs/2409.12150)|null|This paper proposes a novel framework that harnesses the expressive power of large language models (LLMs) for the complex challenge of personalized outfit recommendation. Through fine-tuning and direct feedback integration, we mitigate the black-box and static nature of LLMs. We bridge the visual-textual gap between items by using a multimodal large language model (MLLM) to caption human-curated fashion images. This lets the LLM extract style and color characteristics from those images, which form the basis for personalized recommendations. We efficiently fine-tune the LLM on the open-source Polyvore dataset, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples enhances the LLM's decision-making. This creates a self-enhancing AI feedback loop that continuously refines recommendations in line with seasonal fashion trends.
We evaluate the framework on the Polyvore dataset on two key tasks: fill-in-the-blank and complementary item retrieval. The evaluations highlight the framework's ability to generate stylish, trend-aligned outfit suggestions and to keep improving through direct feedback. Our framework significantly outperforms vanilla-LLM-based outfit generation on these tasks and produces more cohesive outfits, which underscores its potential to improve the shopping experience with accurate suggestions.|\n", "2409.12147": "|**2024-09-18**|**MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning**|Justin Chih-Yao Chen 
et.al.|[2409.12147](http://arxiv.org/abs/2409.12147)|**[link](https://github.com/dinobby/magicore)**|The reasoning of large language models (LLMs) can be improved with test-time aggregation strategies, i.e. generating multiple samples and voting among them. These strategies improve performance but often reach a saturation point. Refinement offers an alternative that uses LLM-generated feedback to improve solution quality. However, refinement introduces three key challenges: (1) Excessive refinement: uniformly refining all instances can over-correct and reduce overall performance. (2) Inability to localize and address errors: LLMs have limited ability to self-correct and struggle to identify and fix their own mistakes. (3) Insufficient refinement: deciding how many refinement iterations are needed is non-trivial, and stopping too soon can leave errors unaddressed.
To tackle these issues, we propose MAgICoRe, which avoids excessive refinement by categorizing problems as easy or hard, solving easy problems with coarse-grained aggregation and hard ones with fine-grained, iterative multi-agent refinement. To improve error localization, we incorporate external step-wise reward model (RM) scores. To ensure effective refinement, we employ a multi-agent loop with three agents: a Solver, a Reviewer (which generates targeted feedback based on step-wise RM scores), and a Refiner (which incorporates the feedback). To ensure sufficient refinement, we re-evaluate updated solutions and start further rounds of refinement when needed. We evaluate MAgICoRe with Llama-3-8B and GPT-3.5 on 5 math datasets and show its effectiveness. Even a single iteration of MAgICoRe beats Self-Consistency by 3.4%, Best-of-k by 3.2%, and Self-Refine by 4.0% while using fewer than half the samples. Unlike iterative refinement with the baselines, MAgICoRe keeps improving as the number of iterations grows.
Finally, our ablations highlight the importance of MAgICoRe's RMs and multi-agent communication.|\n", "2409.12140": "|**2024-09-18**|**MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion**|Kalakonda Sai Shashank et.al.|[2409.12140](http://arxiv.org/abs/2409.12140)|null|We propose MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models with additional knowledge obtained through an improved motion retrieval process. By prompting large language models (LLMs) effectively, we address spelling errors and rephrasing issues in motion retrieval. Our approach uses a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples by spatially composing the retrieved motions. Furthermore, by using low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments show that the framework can serve as a plug-and-play module that improves the performance of motion diffusion models. Code, pretrained models, and sample videos will be made available at: https://motion-rag.github.io/|\n", "2409.12139": 
"|**2024-09-24**|**Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models**|Sijing Chen et.al.|[2409.12139](http://arxiv.org/abs/2409.12139)|null|\u968f\u7740\u5927\u6570\u636e\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u4ee3\u7684\u5230\u6765\uff0c\u96f6\u6837\u672c\u4e2a\u6027\u5316\u5feb\u901f\u5b9a\u5236\u5df2\u6210\u4e3a\u4e00\u4e2a\u663e\u8457\u8d8b\u52bf\u3002\u672c\u62a5\u544a\u4ecb\u7ecd\u4e86Takin AudioLLM\u7cfb\u5217\u6280\u672f\u4e0e\u6a21\u578b\uff0c\u4e3b\u8981\u5305\u62ecTakin TTS\u3001Takin VC\u548cTakin Morphing\uff0c\u4e13\u95e8\u7528\u4e8e\u6709\u58f0\u8bfb\u7269\u5236\u4f5c\u3002\u8fd9\u4e9b\u6a21\u578b\u5177\u5907\u96f6\u6837\u672c\u8bed\u97f3\u751f\u6210\u80fd\u529b\uff0c\u80fd\u4ea7\u751f\u51e0\u4e4e\u4e0e\u771f\u4eba\u58f0\u97f3\u96be\u4ee5\u533a\u5206\u7684\u9ad8\u8d28\u91cf\u8bed\u97f3\uff0c\u4f7f\u5f97\u4e2a\u4eba\u53ef\u4ee5\u6839\u636e\u81ea\u8eab\u9700\u6c42\u5b9a\u5236\u8bed\u97f3\u5185\u5bb9\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ecb\u7ecdTakin TTS\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u589e\u5f3a\u795e\u7ecf\u8bed\u97f3\u7f16\u89e3\u7801\u5668\u548c\u591a\u4efb\u52a1\u8bad\u7ec3\u6846\u67b6\u7684\u795e\u7ecf\u7f16\u89e3\u7801\u8bed\u8a00\u6a21\u578b\uff0c\u80fd\u591f\u4ee5\u96f6\u6837\u672c\u65b9\u5f0f\u751f\u6210\u9ad8\u4fdd\u771f\u81ea\u7136\u8bed\u97f3\u3002\u5bf9\u4e8eTakin VC\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u5185\u5bb9\u4e0e\u97f3\u8272\u8054\u5408\u5efa\u6a21\u65b9\u6cd5\u6765\u63d0\u9ad8\u8bf4\u8bdd\u4eba\u76f8\u4f3c\u5ea6\uff0c\u5e76\u5021\u5bfc\u57fa\u4e8e\u6761\u4ef6\u6d41\u5339\u914d\u7684\u89e3\u7801\u5668\u8fdb\u4e00\u6b65\u63d0\u5347\u5176\u81ea\u7136\u6027\u548c\u8868\u8fbe\u529b\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Takin 
Morphing\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u91c7\u7528\u9ad8\u5ea6\u89e3\u8026\u4e14\u5148\u8fdb\u7684\u97f3\u8272\u4e0e\u8282\u594f\u5efa\u6a21\u65b9\u6cd5\uff0c\u4f7f\u4e2a\u4f53\u80fd\u591f\u4ee5\u7cbe\u786e\u53ef\u63a7\u7684\u65b9\u5f0f\u6839\u636e\u81ea\u5df1\u7684\u504f\u597d\u5b9a\u5236\u8bed\u97f3\u751f\u4ea7\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eecTakin AudioLLM\u7cfb\u5217\u6a21\u578b\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u6709\u5173\u8be6\u7ec6\u6f14\u793a\uff0c\u8bf7\u53c2\u9605\u3002|\n", "2409.12122": "|**2024-09-18**|**Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement**|An Yang et.al.|[2409.12122](http://arxiv.org/abs/2409.12122)|null|\u5728\u672c\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u7cfb\u5217\u6570\u5b66\u4e13\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff1aQwen2.5-Math \u548c Qwen2.5-Math-Instruct-1.5B/7B/72B\u3002Qwen2.5 \u7cfb\u5217\u7684\u6838\u5fc3\u521b\u65b0\u5728\u4e8e\u5728\u6574\u4e2a\u7ba1\u9053\u4e2d\u878d\u5165\u81ea\u6211\u63d0\u5347\u7684\u54f2\u5b66\uff0c\u5305\u62ec\u9884\u8bad\u7ec3\u3001\u540e\u5904\u7406\u548c\u63a8\u7406\u9636\u6bb5\uff1a\uff081\uff09\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\uff0c\u4f7f\u7528 Qwen2-Math-Instruct \u6765\u751f\u6210\u5927\u89c4\u6a21\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u6570\u636e\u3002\uff082\uff09\u5728\u540e\u5904\u7406\u9636\u6bb5\uff0c\u6211\u4eec\u901a\u8fc7\u4ece Qwen2-Math-Instruct \u8fdb\u884c\u5927\u91cf\u91c7\u6837\u6765\u5f00\u53d1\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u3002\u7136\u540e\uff0c\u6211\u4eec\u5c06\u6b64 RM \u5e94\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u7684\u8fed\u4ee3\u8fdb\u5316\u3002\u901a\u8fc7\u589e\u5f3a\u7684 SFT \u6a21\u578b\uff0c\u6709\u53ef\u80fd\u8fdb\u884c\u8fed\u4ee3\u8bad\u7ec3\u5e76\u66f4\u65b0 RM\uff0c\u8fdb\u800c\u6307\u5bfc SFT \u6570\u636e\u7684\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u5728\u6700\u7ec8\u7684 SFT 
\u6a21\u578b\u4e0a\uff0c\u6211\u4eec\u91c7\u7528\u7ec8\u6781 RM \u8fdb\u884c\u5f3a\u5316\u5b66\u4e60\uff0c\u4ece\u800c\u4ea7\u751f Qwen2.5-Math-Instruct \u6a21\u578b\u3002\uff083\uff09\u6b64\u5916\uff0c\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u4f7f\u7528 RM \u6765\u5f15\u5bfc\u91c7\u6837\uff0c\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002 Qwen2.5-Math-Instruct \u652f\u6301\u4e2d\u6587\u548c\u82f1\u6587\uff0c\u5e76\u5177\u6709\u9ad8\u7ea7\u6570\u5b66\u63a8\u7406\u80fd\u529b\uff0c\u5305\u62ec\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u548c\u5de5\u5177\u96c6\u6210\u63a8\u7406\uff08TIR\uff09\u3002\u6211\u4eec\u5728\u82f1\u8bed\u548c\u4e2d\u6587\u7684 10 \u4e2a\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u5982 GSM8K\u3001MATH\u3001GaoKao\u3001AMC23 \u548c AIME24\uff0c\u6db5\u76d6\u4ece\u5c0f\u5b66\u6c34\u5e73\u5230\u6570\u5b66\u7ade\u8d5b\u95ee\u9898\u7684\u5e7f\u6cdb\u96be\u5ea6\u3002|\n", "2409.12117": "|**2024-09-18**|**Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference**|Edresson Casanova 
et.al.|[2409.12117](http://arxiv.org/abs/2409.12117)|null|Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling language modeling techniques to be applied to audio data. However, audio codecs often operate at high frame rates, resulting in slow training and inference, especially for autoregressive models. To address this challenge, we present the Low Frame-rate Speech Codec (LFSC): a neural audio codec that leverages finite scalar quantization and adversarial training with large speech language models to achieve high-quality audio compression at a bitrate of 1.89 kbps and 21.5 frames per second. We demonstrate that our novel codec makes inference of LLM-based text-to-speech models around three times faster while improving intelligibility and producing quality comparable to previous models.|\n", "2409.12106": "|**2024-09-18**|**Measuring Human and AI Values based on Generative Psychometrics with Large Language Models**|Haoran Ye 
et.al.|[2409.12106](http://arxiv.org/abs/2409.12106)|**[link](https://github.com/value4ai/gpv)**|**This paper introduces Generative Psychometrics for Values (GPV), a data-driven value measurement paradigm based on large language models (LLMs) and theoretically grounded in text-revealed selective perceptions. We begin by fine-tuning an LLM for accurate perception-level value measurement and verifying the capability of LLMs to parse texts into perceptions, which forms the core of the GPV pipeline. We then apply GPV to human-authored blogs and demonstrate its stability, validity, and superiority over prior psychological tools. Next, extending GPV to LLM value measurement, we advance the current art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, revealing response biases in prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impact of various values on LLM safety. Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.**|\n", "2409.17143": "|**2024-09-25**|**Attention Prompting on Image for Large Vision-Language Models**|Runpeng Yu et.al.|[2409.17143](http://arxiv.org/abs/2409.17143)|**[link](https://github.com/yu-rp/apiprompting)**|**Compared with large language models (LLMs), large vision-language models (LVLMs) can also accept images as input, and therefore show more interesting emergent capabilities and impressive performance on various vision-language tasks. Motivated by text prompting in LLMs, visual prompting has been explored to enhance LVLMs' ability to perceive visual information. However, previous visual prompting techniques process only the visual input without considering the text query, which limits the model's ability to follow text instructions to complete tasks. To fill this gap, this work proposes a new prompting technique named Attention Prompting on Image, which simply overlays a text-query-guided attention heatmap, generated by an auxiliary model such as CLIP, on the original input image and effectively enhances LVLMs on various tasks. Specifically, we generate an attention heatmap for the input image that depends on the text query, using an auxiliary model such as CLIP. The heatmap is then simply multiplied by the pixel values of the original image to obtain the actual input image for the LVLM. Extensive experiments on various vision-language benchmarks verify the effectiveness of our technique. For example, Attention Prompting on Image improves LLaVA-1.5 by 3.8% and 2.9% on the MM-Vet and LLaVA-Wild benchmarks, respectively.**|\n", "2409.17141": "|**2024-09-25**|**FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression**|Fazal Mittu et.al.|[2409.17141](http://arxiv.org/abs/2409.17141)|**[link](https://github.com/fazalmittu/finezip)**|**This paper presents an in-depth analysis of neural network and transformer-based text compression techniques and compares them with traditional text compression systems. Although systems based on large language models (LLMs) significantly outperform conventional methods in compression ratio, they are highly impractical. LLMZip, an LLM-based compression system built on Llama3-8B, takes 9.5 days to compress just 10 MB of text, albeit with improved compression. To address this problem, we present FineZip, a novel LLM-based text compression system that combines ideas of online memorization and dynamic context. Compared with LLMZip, FineZip cuts the compression time to approximately 4 hours, a 54-fold speedup, and it improves compression ratios by approximately 50% over traditional algorithmic compression methods. With this work, we take the first step towards making lossless text compression with LLMs a reality. While FineZip represents a significant step in that direction, LLMs are still not a viable solution for large-scale text compression. We hope our work paves the way for future research and innovation to solve this problem.**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu 
et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|This paper proposes AXIS, a novel language-model-based agent framework that prioritizes actions through application programming interfaces (APIs) over user interface (UI) actions, in order to address the high latency and low reliability of agents driven by large language models (LLMs) on complex tasks. The framework also facilitates the creation and expansion of APIs through automated exploration of applications. Experiments on Office Word show that, compared with humans, AXIS reduces task completion time by 65%-70% and cognitive workload by 38%-53%, while maintaining an accuracy of 97%-98%. This work contributes a new human-agent-computer interaction (HACI) framework and fresh UI design principles for application providers in the era of LLMs. It also explores the possibility of turning every application into an agent, paving the way towards an agent-centric operating system (Agent OS).|\n", "2409.17115": "|**2024-09-25**|**Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale**|Fan Zhou 
et.al.|[2409.17115](http://arxiv.org/abs/2409.17115)|**[link](https://github.com/gair-nlp/prox)**|**Large language model pre-training has long relied on human experts to craft heuristic rules for improving data quality, and numerous rules have been developed to date. However, these rules lack the flexibility to address the unique characteristics of each individual example, and applying tailored rules to every example is impractical for human experts. This paper shows that even a language model with as few as 0.3B parameters can exhibit data refining abilities comparable to those of human experts. We introduce Programming Every Example (ProX), a framework that treats data refinement as a programming task, allowing models to refine corpora at scale by generating and executing fine-grained operations, such as string normalization, for each individual example. Experimental results show that models pre-trained on ProX-curated data outperform those trained on the original data or on data filtered by other selection methods by more than 2% across various downstream benchmarks. The framework is effective across model sizes and pre-training corpora, including C4, RedPajama-V2, and FineWeb. ProX also shows significant potential in domain-specific continual pre-training: without domain-specific design, models trained on OpenWebMath refined by ProX improve accuracy by 7.6%, 14.6%, and 20.3% over Mistral-7B, Llama-2-7B, and CodeLlama-7B respectively, reaching a level comparable to a Llama-7B model pre-trained on 200B tokens while using only about 10B tokens. Further analysis shows that ProX significantly saves training FLOPs, offering a promising path for efficient LLM pre-training. We are open-sourcing ProX, including a >100B corpus, models, and all training and implementation details, to support reproducible research and future innovation. Code: https://github.com/GAIR-NLP/ProX**|\n", "2409.17092": "|**2024-09-25**|**Accumulator-Aware Post-Training Quantization**|Ian Colbert 
et.al.|[2409.17092](http://arxiv.org/abs/2409.17092)|null|Several recent studies have investigated low-precision accumulation, reporting improvements in throughput, power, and area across various platforms. However, the accompanying proposals have only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models continue to grow in size, QAT techniques become increasingly expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To the best of our knowledge, this is the first formal study of accumulator-aware quantization in the PTQ setting. To bridge this gap, we introduce AXE, a practical framework of accumulator-aware extensions designed to endow existing layer-wise PTQ algorithms with overflow avoidance guarantees. We theoretically motivate AXE and demonstrate its flexibility by implementing it on top of two state-of-the-art PTQ algorithms: GPFQ and OPTQ. We further generalize AXE to support multi-stage accumulation for the first time, opening the door to full datapath optimization and scaling to large language models (LLMs). We evaluate AXE on image classification and language generation models, and observe significant improvements in the trade-off between accumulator bit width and model accuracy over baseline methods.|\n", "2409.17066": "|**2024-09-25**|**VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models**|Yifei Liu et.al.|[2409.17066](http://arxiv.org/abs/2409.17066)|**[link](https://github.com/microsoft/vptq)**|**This paper introduces Vector Post-Training Quantization (VPTQ), a low-bit quantization method for large language models (LLMs). We use second-order optimization to formulate the LLM vector quantization problem and guide the design of the quantization algorithm by solving the optimization. We further refine the weights using channel-independent second-order optimization for fine-grained quantization. In addition, by decomposing the optimization problem, we propose a brief and effective codebook initialization algorithm. VPTQ also extends to support residual and outlier quantization, which both improves model accuracy and further compresses the model. Experimental results show that, compared with SOTA at 2-bit quantization, VPTQ reduces model quantization perplexity by 0.01-0.34 on LLaMA-2, 0.38-0.68 on Mistral-7B, and 4.41-7.34 on LLaMA-3. Average accuracy on QA tasks improves by 0.79%-1.5% on LLaMA-2, 1% on Mistral-7B, and 11%-22% on LLaMA-3. VPTQ uses only 10.4%-18.6% of the quantization algorithm execution time, resulting in a 1.6-1.8x increase in inference throughput.**|\n", "2409.17054": "|**2024-09-25**|**Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia**|Azmul Asmar Irfan et.al.|[2409.17054](http://arxiv.org/abs/2409.17054)|null|This paper proposes a solution that uses a localized large language model (LLM) to transcribe, translate, and summarize doctor-patient conversations. We use the Whisper model for transcription and GPT-3 for summarization, and format the output into ePuskesmas medical records. The system is implemented as an add-on to an existing web browser extension, allowing doctors to fill out patient forms while talking. By using real-time transcription, translation, and summarization, doctors can improve the turnaround time for patient care while enhancing the quality of records, which become more detailed and insightful for future visits. This innovation addresses overcrowded facilities and the heavy administrative burden on healthcare providers in Indonesia. We believe this solution will help doctors save time, provide better care, and produce more accurate medical records, representing a significant step toward modernizing healthcare and ensuring that patients receive timely, high-quality care even in resource-constrained settings.|\n", "2409.17044": "|**2024-09-25**|**How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not**|Francesco Verdini et.al.|[2409.17044](http://arxiv.org/abs/2409.17044)|null|The remarkable performance achieved by large language models (LLMs) has driven research efforts to leverage them for a wide range of tasks and input modalities. In speech-to-text (S2T) tasks, the emerging solution is to project the output of a speech foundation model (SFM) into the LLM embedding space through an adapter module. However, no work has yet investigated how much downstream task performance depends on each component (SFM, adapter, LLM), nor whether the best adapter design depends on the chosen SFM and LLM. To fill this gap, we evaluate the combination of 5 adapter modules, 2 LLMs (Mistral and Llama), and 2 SFMs (Whisper and SeamlessM4T) on two widely used S2T tasks, automatic speech recognition and speech translation. Our results show that the SFM plays a pivotal role in downstream performance, while the choice of adapter has a moderate impact and depends on the SFM and LLM.|\n", "2409.17027": "|**2024-09-25**|**Counterfactual Token Generation in Large Language Models**|Ivi Chatzi 
et.al.|[2409.17027](http://arxiv.org/abs/2409.17027)|**[link](https://github.com/networks-learning/counterfactual-llms)**|This work aims to enhance large language models with the ability to reason about possible alternatives to the tokens they have generated in the past. We develop a causal model of token generation based on the Gumbel-Max structural causal model to provide this capability. Our model can perform counterfactual token generation at almost no additional cost compared with vanilla token generation, is simple to implement, and requires no fine-tuning or prompt engineering. We implement the model on Llama 3 8B-instruct and conduct qualitative and quantitative analyses of the counterfactually generated text. We also explore an application of counterfactual token generation to bias detection, revealing interesting insights about the model of the world constructed by large language models.|\n", "2409.17011": "|**2024-09-25**|**LLM-CARD: Towards a Description and Landscape of Large Language Models**|Shengwei Tian 
et.al.|[2409.17011](http://arxiv.org/abs/2409.17011)|**[link](https://github.com/shengwei-tian/dependency-parser-visualization)**|With the rapid growth of the natural language processing (NLP) field, large language models (LLMs) keep emerging for various NLP tasks. As the number of published papers continues to rise, researchers and developers face the challenge of information overload. It is therefore particularly important to develop a system that can automatically extract and organize key information about LLMs from academic papers. This work pursues that goal with named entity recognition (NER) and relation extraction (RE) methods that automatically extract key information about large language models from papers, helping researchers access information about LLMs efficiently. The extracted features include the model's licence, name, and application. With these features, we can form a model card for each paper. As a data contribution, 106 academic papers were processed and three dictionaries were defined: LLM name, licence, and application. A total of 11,051 sentences were extracted through dictionary lookup, and after manual review the final selection comprised 129 sentences containing a link between name and licence and 106 sentences containing a link between model name and application.|\n", "2409.18127": "|**2024-09-26**|**EgoLM: Multi-Modal Language Model of Egocentric Motions**|Fangzhou Hong et.al.|[2409.18127](http://arxiv.org/abs/2409.18127)|null|As wearable devices become prevalent, understanding egocentric motions becomes essential for developing contextual AI. This work presents EgoLM, a versatile framework that tracks and understands egocentric motions from multi-modal inputs such as egocentric videos and motion sensors. EgoLM exploits rich contexts to resolve egomotion tracking and understanding, which are ill-posed under single-modality conditions. To enable this versatile, multi-modal framework, our key insight is to model the joint distribution of egocentric motions and natural language using large language models (LLMs). Multi-modal sensor inputs are encoded and projected into the joint latent space of the language model, and used to prompt motion generation or text generation for egomotion tracking or understanding, respectively. \u5927\u89c4\u6a21\u591a\u6a21\u6001\u4eba\u4f53\u52a8\u4f5c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86EgoLM\u4f5c\u4e3a\u901a\u7528\u6a21\u578b\u5728\u666e\u904d\u4e3b\u89c2\u5b66\u4e60\u4e2d\u7684\u6709\
u6548\u6027\u3002|\n", "2409.18119": "|**2024-09-26**|**Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography**|Yuexi Du et.al.|[2409.18119](http://arxiv.org/abs/2409.18119)|null|\u5728\u533b\u7597\u56fe\u50cf\u5206\u6790\u9886\u57df\uff0c\u5bf9\u6bd4\u8bed\u8a00-\u56fe\u50cf\u9884\u8bad\u7ec3\uff08CLIP\uff09\u663e\u793a\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f46\u5176\u9700\u8981\u5927\u91cf\u7684\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u3002\u56e0\u6b64\uff0c\u73b0\u6709\u7684CLIP\u5e94\u7528\u4e3b\u8981\u96c6\u4e2d\u5728\u5982\u80f8\u7247\u8fd9\u7c7b\u62e5\u6709\u4e30\u5bcc\u56fe\u50cf\u62a5\u544a\u6570\u636e\u7684\u6a21\u6001\u4e0a\uff0c\u800c\u5ffd\u7565\u4e86\u8bf8\u5982\u4e73\u817aX\u5149\u7b49\u8bb8\u591a\u91cd\u8981\u6a21\u6001\u7684\u7814\u7a76\u3002\u672c\u6587\u9996\u6b21\u63d0\u51fa\u5c06\u5b8c\u6574\u7684CLIP\u6a21\u578b\u5e94\u7528\u4e8e\u4e73\u817aX\u5149\u56fe\u50cf\u5206\u6790\uff0c\u8fd9\u4e00\u4efb\u52a1\u9762\u4e34\u7740\u6807\u8bb0\u6570\u636e\u7a00\u7f3a\u3001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u4e2d\u7684\u5c0f\u611f\u5174\u8da3\u533a\u57df\u4ee5\u53ca\u6570\u636e\u4e0d\u5e73\u8861\u7684\u6311\u6218\u3002 \u6211\u4eec\u9996\u5148\u5f00\u53d1\u4e86\u4e00\u79cd\u9488\u5bf9\u4e73\u817aX\u5149\u7684\u4e13\u7528\u76d1\u7763\u6846\u67b6\uff0c\u5229\u7528\u5176\u591a\u89c6\u56fe\u7279\u6027\u3002\u6b64\u5916\uff0c\u8bbe\u8ba1\u4e86\u5bf9\u9f50\u6a21\u5757\u4ee5\u66f4\u597d\u5730\u805a\u7126\u4e8e\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u4e2d\u7684\u8be6\u7ec6\u7279\u5f81\u3002\u6700\u540e\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\uff0c\u7528\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u9884\u5148\u4f7f\u7528\u533b\u5b66\u77e5\u8bc6\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u5e94\u5bf9\u6570\u636e\u9650\u5236\u95ee\u9898\u3002 
\u6211\u4eec\u7684\u591a\u89c6\u56fe\u548c\u591a\u5c3a\u5ea6\u5bf9\u9f50\uff08MaMA\uff09\u65b9\u6cd5\uff0c\u5728\u4e24\u4e2a\u5927\u578b\u771f\u5b9e\u4e16\u754c\u4e73\u817aX\u5149\u6570\u636e\u96c6EMBED\u548cRSNA-Mammo\u4e0a\uff0c\u5bf9\u4e8e\u4e09\u79cd\u4e0d\u540c\u7684\u4efb\u52a1\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\u53d6\u5f97\u4e86\u663e\u8457\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u76f8\u6bd4\u6700\u5927\u7684\u57fa\u7ebf\u6a21\u578b\uff0c\u4ec5\u4f7f\u7528\u4e8652%\u7684\u6a21\u578b\u5927\u5c0f\u3002|\n", "2409.18111": "|**2024-09-26**|**E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding**|Ye Liu et.al.|[2409.18111](http://arxiv.org/abs/2409.18111)|**[link](https://github.com/PolyU-ChenLab/ETBench)**|**\u4e3a\u4e86\u9a8c\u8bc1\u89c6\u9891\u5927\u8bed\u8a00\u6a21\u578b\uff08Video Large Language Models, Video-LLMs\uff09\u5728\u901a\u7528\u89c6\u9891\u7406\u89e3\u4e2d\u7684\u5de8\u5927\u6f5c\u529b\uff0c\u5df2\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\u6765\u8bca\u65ad\u6a21\u578b\u5728\u4e0d\u540c\u573a\u666f\u4e0b\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u901a\u8fc7\u89c6\u9891\u7ea7\u95ee\u9898\u56de\u7b54\u8fdb\u884c\u8bc4\u4f30\uff0c\u7f3a\u4e4f\u5bf9\u4e8b\u4ef6\u7ea7\u522b\u7684\u7cbe\u7ec6\u8bc4\u4f30\u548c\u4efb\u52a1\u591a\u6837\u6027\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86E.T. Bench\uff08\u4e8b\u4ef6\u7ea7\u522b\u4e0e\u65f6\u95f4\u654f\u611f\u7684\u89c6\u9891\u7406\u89e3\u57fa\u51c6\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u5f00\u653e\u5f0f\u7684\u4e8b\u4ef6\u7ea7\u522b\u89c6\u9891\u7406\u89e3\u7684\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u57fa\u51c6\u6d4b\u8bd5\u3002 E.T. 
Bench\u6309\u7167\u4e09\u5c42\u4efb\u52a1\u5206\u7c7b\u4f53\u7cfb\u8fdb\u884c\u7ec4\u7ec7\uff0c\u5305\u542b\u4e86\u6db5\u76d612\u4e2a\u4efb\u52a1\u76847300\u4e2a\u6837\u672c\uff0c\u4ee5\u53ca8\u4e2a\u9886\u57df\u76842514\u5c0f\u65f6\u603b\u65f6\u957f\u76847000\u4e2a\u89c6\u9891\uff0c\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u5e7f\u6cdb\u5730\u5bf98\u4e2a\u56fe\u50cf\u5927\u8bed\u8a00\u6a21\u578b\u548c12\u4e2a\u89c6\u9891\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5e76\u4e14\u7ed3\u679c\u663e\u793a\uff0c\u7528\u4e8e\u7c97\u7c92\u5ea6\uff08\u89c6\u9891\u7ea7\uff09\u7406\u89e3\u7684\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728\u89e3\u51b3\u6211\u4eec\u7684\u7cbe\u7ec6\u7c92\u5ea6\u4efb\u52a1\u65f6\u8868\u73b0\u4e0d\u4f73\uff0c\u4f8b\u5982\u5728\u89c6\u9891\u4e2d\u5b9a\u4f4d\u611f\u5174\u8da3\u7684\u4e8b\u4ef6\uff0c\u4e3b\u8981\u539f\u56e0\u662f\u89c6\u9891\u4e0a\u4e0b\u6587\u957f\u5ea6\u77ed\u3001\u65f6\u95f4\u8868\u793a\u4e0d\u5f53\u4ee5\u53ca\u7f3a\u4e4f\u591a\u4e8b\u4ef6\u8bad\u7ec3\u6570\u636e\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u7ebf\u6a21\u578b\u2014\u2014E.T. Chat\uff0c\u4ee5\u53ca\u4e13\u95e8\u4e3a\u7cbe\u7ec6\u7c92\u5ea6\u4e8b\u4ef6\u7406\u89e3\u8bbe\u8ba1\u7684\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6E.T. 
Instruct 164K\u3002\u6211\u4eec\u7684\u7b80\u5355\u4f46\u6709\u6548\u7684\u89e3\u51b3\u65b9\u6848\u5728\u591a\u4e2a\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u4f18\u8d8a\u7684\u6027\u80fd\u3002**|\n", "2409.18060": "|**2024-09-26**|**Infering Alt-text For UI Icons With Large Language Models During App Development**|Sabrina Haque et.al.|[2409.18060](http://arxiv.org/abs/2409.18060)|null|\u786e\u4fdd\u79fb\u52a8\u5e94\u7528\u7684\u65e0\u969c\u788d\u6027\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5bf9\u4e8e\u4f9d\u8d56\u5c4f\u5e55\u9605\u8bfb\u5668\u7684\u89c6\u969c\u7528\u6237\u3002\u754c\u9762\u56fe\u6807\u5bf9\u4e8e\u5bfc\u822a\u548c\u4e92\u52a8\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5f80\u5f80\u7f3a\u4e4f\u6709\u610f\u4e49\u7684\u66ff\u4ee3\u6587\u672c\uff0c\u4ece\u800c\u5f62\u6210\u4f7f\u7528\u969c\u788d\u3002\u4f20\u7edf\u7684\u6df1\u5ea6\u5b66\u4e60\u65b9\u6cd5\u5728\u751f\u6210\u66ff\u4ee3\u6587\u672c\u65f6\u9700\u8981\u5927\u91cf\u6570\u636e\u96c6\uff0c\u5e76\u4e14\u5728\u56fe\u6807\u7c7b\u578b\u591a\u6837\u6027\u4e0e\u4e0d\u5e73\u8861\u6027\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u66f4\u8fd1\u671f\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5219\u8981\u6c42\u5b8c\u6574\u7684UI\u5c4f\u5e55\uff0c\u8fd9\u5728\u5e94\u7528\u7a0b\u5e8f\u5f00\u53d1\u7684\u8fed\u4ee3\u9636\u6bb5\u53ef\u80fd\u4e0d\u5207\u5b9e\u9645\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u90e8\u5206UI\u6570\u636e\u81ea\u4e3b\u751f\u6210\u79fb\u52a8UI\u56fe\u6807\u7684\u63cf\u8ff0\u6027\u66ff\u4ee3\u6587\u672c\u3002\u901a\u8fc7\u6574\u5408\u5305\u62ec\u7c7b\u522b\u3001\u8d44\u6e90ID\u3001\u8fb9\u754c\u3001OCR\u68c0\u6d4b\u5230\u7684\u6587\u5b57\u4ee5\u53ca\u7236\u8282\u70b9\u548c\u540c\u7ea7\u8282\u70b9\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u5728\u5185\u7684\u56fe\u6807\u4e0a\u4e0b\u6587\uff0c\u
6211\u4eec\u5bf9\u5927\u7ea61400\u4e2a\u56fe\u6807\u7684\u5c0f\u578b\u6570\u636e\u96c6\u8fdb\u884c\u79bb\u7ebf\u5fae\u8c03\uff0c\u4ece\u800c\u751f\u6210\u4e86IconDesc\u3002\u5728\u5b9e\u8bc1\u8bc4\u4f30\u548c\u7528\u6237\u7814\u7a76\u4e2d\uff0cIconDesc\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u76f8\u5173\u66ff\u4ee3\u6587\u672c\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u80fd\u529b\u4f7f\u5f97IconDesc\u6210\u4e3a\u5f00\u53d1\u8005\u7684\u91cd\u8981\u5de5\u5177\uff0c\u5e2e\u52a9\u4ed6\u4eec\u5feb\u901f\u8fed\u4ee3\u548c\u63d0\u5347UI\u7684\u65e0\u969c\u788d\u6027\u3002|\n", "2409.18053": "|**2024-09-26**|**DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving**|Dingrui Wang et.al.|[2409.18053](http://arxiv.org/abs/2409.18053)|**[link](https://github.com/TUM-AVS/DualAD)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u81ea\u4e3b\u9a7e\u9a76\u6846\u67b6DualAD\uff0c\u65e8\u5728\u6a21\u4eff\u4eba\u7c7b\u5728\u9a7e\u9a76\u8fc7\u7a0b\u4e2d\u7684\u51b3\u7b56\u903b\u8f91\u3002DualAD\u7531\u4e24\u5c42\u6784\u6210\uff1a\u5e95\u5c42\u4e3a\u57fa\u4e8e\u89c4\u5219\u7684\u8fd0\u52a8\u89c4\u5212\u5668\uff0c\u8d1f\u8d23\u5904\u7406\u9700\u8981\u8f83\u5c11\u51b3\u7b56\u7684\u5e38\u89c4\u9a7e\u9a76\u4efb\u52a1\uff1b\u4e0a\u5c42\u5219\u914d\u5907\u4e86\u4e00\u4e2a\u57fa\u4e8e\u89c4\u5219\u7684\u6587\u5b57\u7f16\u7801\u5668\uff0c\u5c06\u7edd\u5bf9\u72b6\u6001\u4e0b\u7684\u9a7e\u9a76\u573a\u666f\u8f6c\u5316\u4e3a\u6587\u672c\u63cf\u8ff0\u3002\u6b64\u6587\u672c\u968f\u540e\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u51b3\u7b56\u3002\u5f53\u68c0\u6d4b\u5230\u6f5c\u5728\u5371\u9669\u65f6\uff0c\u4e0a\u5c42\u4f1a\u4ecb\u5165\u5e95\u5c42\u7684\u51b3\u7b56\u8fc7\u7a0b\uff0c\u4ee5\u6a21\u4eff\u4eba\u7c7b\u5728\u5173\u952e\u60c5\u51b5\u4e0b\u7684\u51b3\u7b56\u903b\u8f91\u3002\u95ed\u5408\u73af\u8def\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528\u96f6\u8bad\u7ec3\u9884\u8bad\u7ec3\u6a21\u578b\u7684DualAD\u663e\u8457\u4f18\u4e8e\u7f3a\u4e4f\u51b3\u7b56\u80fd\u529b
\u7684\u57fa\u4e8e\u89c4\u5219\u7684\u8fd0\u52a8\u89c4\u5212\u5668\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8fd8\u5f3a\u8c03\u4e86\u6587\u5b57\u7f16\u7801\u5668\u7684\u6709\u6548\u6027\uff0c\u5b83\u6781\u5927\u5730\u589e\u5f3a\u4e86\u6a21\u578b\u5bf9\u573a\u666f\u7684\u7406\u89e3\u80fd\u529b\u3002\u6b64\u5916\uff0c\u96c6\u6210\u7684DualAD\u6a21\u578b\u968f\u7740\u66f4\u5f3a\u5927\u7684LLM\u7684\u4f7f\u7528\u800c\u5f97\u5230\u6539\u5584\uff0c\u8fd9\u8868\u660e\u8be5\u6846\u67b6\u5177\u6709\u8fdb\u4e00\u6b65\u589e\u5f3a\u7684\u6f5c\u529b\u3002\u6211\u4eec\u63d0\u4f9b\u4ee3\u7801\u548c\u57fa\u51c6\u6d4b\u8bd5\u4f9b\u516c\u4f17\u8bbf\u95ee\u3002|\n", "2409.18042": "|**2024-09-26**|**EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions**|Kai Chen et.al.|[2409.18042](http://arxiv.org/abs/2409.18042)|null|\u5728\u5f00\u653e\u6e90\u4ee3\u7801\u793e\u533a\u4e2d\uff0c\u8ba9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u4ee5\u516c\u5f00\u6570\u636e\u8fdb\u884c\u7aef\u5230\u7aef\u7684\u56fe\u50cf\u3001\u6587\u672c\u548c\u8bed\u97f3\u751f\u6210\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u73b0\u6709\u7684\u89c6\u8bed\u6a21\u578b\u4f9d\u8d56\u4e8e\u5916\u90e8\u5de5\u5177\u8fdb\u884c\u8bed\u97f3\u5904\u7406\uff0c\u800c\u8bed\u97f3\u8bed\u6a21\u578b\u4ecd\u7f3a\u4e4f\u89c6\u89c9\u7406\u89e3\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86EMOVA\uff08\u60c5\u7eea\u5316\u7684\u5168\u6a21\u5f0f\u8bed\u97f3\u52a9\u624b\uff09\uff0c\u4ee5\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5177\u5907\u7aef\u5230\u7aef\u7684\u8bed\u97f3\u80fd\u529b\uff0c\u540c\u65f6\u4fdd\u6301\u9886\u5148\u7684\u89c6\u8bed\u8868\u73b0\u3002\u901a\u8fc7\u8bed\u4e49-\u58f0\u5b66\u5206\u79bb\u7684\u8bed\u97f3\u7f16\u7801\u5668\uff0c\u6211\u4eec\u610f\u5916\u5730\u53d1\u73b0\uff0c\u5168\u6a21\u6001\u5bf9\u9f50\u53ef\u4ee5\u8fdb\u4e00\u6b65\u589e\u5f3a\u89c6\u8bed\u548c\u8bed\u97f3\u80fd\u529b\uff0c\u4e0e\u76f8\u5e94\u7684\u53cc\u6a21\u6001
\u5bf9\u9f50\u6a21\u578b\u76f8\u6bd4\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u8f7b\u91cf\u7ea7\u98ce\u683c\u6a21\u5757\uff0c\u7528\u4e8e\u7075\u6d3b\u63a7\u5236\u8bed\u97f3\u98ce\u683c\uff08\u4f8b\u5982\u60c5\u611f\u548c\u97f3\u8c03\uff09\u3002\u9996\u6b21\uff0cEMOVA\u5728\u89c6\u8bed\u548c\u8bed\u97f3\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5e76\u540c\u65f6\u652f\u6301\u5e26\u6709\u751f\u52a8\u60c5\u611f\u7684\u5168\u6a21\u6001\u5bf9\u8bdd\u3002|\n", "2409.18028": "|**2024-09-26**|**Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective**|Yotam Wolf et.al.|[2409.18028](http://arxiv.org/abs/2409.18028)|null|\u5728\u8fdb\u884c\u590d\u6742\u5206\u6790\u4efb\u52a1\uff08\u5982\u4ee3\u7801\u751f\u6210\uff09\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f7f\u7528\u4e2d\uff0c\u901a\u5e38\u4f1a\u5c06\u6574\u4e2a\u4efb\u52a1\u7684\u89e3\u51b3\u65b9\u6848\u5728\u6a21\u578b\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\u5185\u8fdb\u884c\u91c7\u6837\u3002\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0c\u5728\u6a21\u578b\u7684\u4e0a\u4e0b\u6587\u4e2d\u5206\u89e3\u4efb\u52a1\uff08\u5373\u94fe\u5f0f\u601d\u7ef4\uff09\u5bf9\u4e8e\u89e3\u51b3\u8fd9\u7c7b\u4efb\u52a1\u662f\u6709\u76ca\u7684\u3002\u672c\u6587\u6307\u51fa\u4e86\u4e00\u79cd\u9650\u5236\uff0c\u5373LLM\u5728\u540c\u4e00\u4e2a\u4e0a\u4e0b\u6587\u7a97\u53e3\u5185\u6267\u884c\u591a\u4e2a\u5b50\u4efb\u52a1\u7684\u80fd\u529b\u2014\u2014\u4e00\u79cd\u201c\u590d\u5408\u96be\u5ea6\u201d\u3002\u8fd9\u8868\u660e\u5728LLM\u7ec4\u6210\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u4e2d\u5c06\u5206\u89e3\u540e\u7684\u95ee\u9898\u5206\u53d1\u5904\u7406\u5177\u6709\u4f18\u52bf\u3002\u6211\u4eec\u901a\u8fc7\u751f\u6210\u590d\u6742\u5ea6\u6307\u6807\u6765\u91cf\u5316\u8fd9\u79cd\u590d\u5408\u96be\u5ea6\uff0c\u5373\u5728\u91c7\u6837\u5230\u81f3\u5c11\u4e00\u4e2a\u6b63\u786e\u89e3\u6240\u9700\u7684LLM\u751f\u6210\u6b21\u65
70\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u76f8\u5bf9\u4e8e\u5728\u76f8\u540c\u4e0a\u4e0b\u6587\u5185\u89e3\u51b3\u7ec4\u5408\u95ee\u9898\uff0c\u5c06\u95ee\u9898\u5206\u6563\u7ed9\u591a\u4e2a\u667a\u80fd\u4f53\u7684\u751f\u6210\u590d\u6742\u5ea6\u4e4b\u95f4\u5b58\u5728\u5dee\u8ddd\uff0c\u5e76\u4e14\u968f\u7740\u89e3\u957f\u5ea6\u7684\u589e\u52a0\uff0c\u8fd9\u4e2a\u5dee\u8ddd\u5448\u6307\u6570\u589e\u957f\u3002\u6211\u4eec\u901a\u8fc7\u7406\u8bba\u8bc1\u660e\u548c\u5b9e\u9a8c\u8bc1\u660e\u4e86\u8fd9\u4e00\u7ed3\u679c\u3002|\n", "2409.18025": "|**2024-09-26**|**An Adversarial Perspective on Machine Unlearning for AI Safety**|Jakub \u0141ucki et.al.|[2409.18025](http://arxiv.org/abs/2409.18025)|**[link](https://github.com/ethz-spylab/unlearning-vs-safety)**|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u62d2\u7edd\u5371\u9669\u77e5\u8bc6\u76f8\u5173\u95ee\u9898\u65b9\u9762\u7684\u5fae\u8c03\u65b9\u5f0f\uff0c\u4f46\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u5f80\u5f80\u5bb9\u6613\u88ab\u7ed5\u8fc7\u3002\u53bb\u5b66\u4e60\u65b9\u6cd5\u65e8\u5728\u5f7b\u5e95\u6d88\u9664\u6a21\u578b\u7684\u5371\u9669\u80fd\u529b\u5e76\u4f7f\u5176\u5bf9\u653b\u51fb\u8005\u4e0d\u53ef\u8bbf\u95ee\u3002\u672c\u6587\u4ece\u5bf9\u6297\u6027\u89c6\u89d2\u6311\u6218\u4e86\u53bb\u5b66\u4e60\u4e0e\u4f20\u7edf\u5b89\u5168\u540e\u8bad\u7ec3\u4e4b\u95f4\u7684\u57fa\u672c\u5dee\u5f02\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u4e4b\u524d\u88ab\u8ba4\u4e3a\u65e0\u6548\u7684\u73b0\u6709\u9003\u8131\u65b9\u6cd5\uff0c\u5728\u7cbe\u5fc3\u5e94\u7528\u65f6\u53ef\u4ee5\u6210\u529f\u5e94\u5bf9\u53bb\u5b66\u4e60\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u7cfb\u5217\u9002\u5e94\u6027\u65b9\u6cd5\u6765\u6062\u590d\u5927\u90e8\u5206\u88ab\u8ba4\u4e3a\u662f\u65e0\u6cd5\u5b66\u4e60\u7684\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528RMU\uff08\u5f53\u524d\u6700\u5148\u8fdb\u7684\u53bb\u5b66\u4e60\u65b9\u6cd5\uff09\u7f16\u8f91\u6a21\u578b\u540e\uff0c\u901a\u8fc
7\u5728\u65e0\u5173\u793a\u4f8b\u4e0a\u8fdb\u884c\u5fae\u8c03\u6216\u5728\u6fc0\u6d3b\u7a7a\u95f4\u4e2d\u79fb\u9664\u7279\u5b9a\u65b9\u5411\uff0c\u53ef\u4ee5\u6062\u590d\u5927\u90e8\u5206\u5371\u9669\u80fd\u529b\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8d28\u7591\u4e86\u5f53\u524d\u53bb\u5b66\u4e60\u65b9\u6cd5\u7684\u7a33\u5065\u6027\uff0c\u5e76\u5bf9\u5b83\u4eec\u76f8\u5bf9\u4e8e\u5b89\u5168\u8bad\u7ec3\u7684\u4f18\u52bf\u63d0\u51fa\u4e86\u7591\u95ee\u3002|\n", "2409.18023": "|**2024-09-26**|**DARE: Diverse Visual Question Answering with Robustness Evaluation**|Hannah Sterz et.al.|[2409.18023](http://arxiv.org/abs/2409.18023)|null|\u672c\u6587\u5f15\u5165\u4e86DARE\uff08Diverse Visual Question Answering with Robustness Evaluation\uff09\uff0c\u4e00\u4e2a\u7cbe\u5fc3\u8bbe\u8ba1\u5e76\u6536\u96c6\u7684\u591a\u9009\u578b\u89c6\u89c9\u95ee\u7b54\u57fa\u51c6\u3002DARE\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u8bed\u8a00\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\uff0c\u7279\u522b\u662f\u5728\u4e94\u4e2a\u4e0d\u540c\u7c7b\u522b\u7684\u89c6\u89c9\u95ee\u9898\u4e0a\uff0c\u5e76\u5305\u62ec\u57fa\u4e8e\u63d0\u793a\u53d8\u5316\u3001\u7b54\u6848\u9009\u9879\u5b50\u96c6\u3001\u8f93\u51fa\u683c\u5f0f\u548c\u6b63\u786e\u7b54\u6848\u6570\u91cf\u7b49\u56db\u4e2a\u9c81\u68d2\u6027\u5bfc\u5411\u8bc4\u4f30\u7684\u5168\u9762\u8bc4\u4f30\u3002 
\u7814\u7a76\u53d1\u73b0\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u5927\u591a\u6570\u7c7b\u522b\u4e2d\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u4e14\u65e0\u6cd5\u5728\u6d4b\u8bd5\u7684\u6240\u6709\u9c81\u68d2\u6027\u8bc4\u4f30\u4e2d\u4fdd\u6301\u4e00\u81f4\u7684\u9ad8\u6027\u80fd\u3002\u5728\u4e0d\u540c\u7b54\u6848\u9009\u9879\u5b50\u96c6\u7684\u60c5\u51b5\u4e0b\uff0c\u6700\u5dee\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\u4e0b\u964d\u53ef\u8fbe\u6807\u51c6\u60c5\u51b5\u4e0b\u768434%\u3002\u5f00\u6e90\u6a21\u578b\u5982LLaVA 1.6\u548cIdefics\u5728\u9c81\u68d2\u6027\u65b9\u9762\u65e0\u6cd5\u4e0e\u95ed\u6e90\u6a21\u578bGPT-4\u548cGemini\u76f8\u5339\u654c\uff0c\u800c\u540e\u8005\u5728\u4e0d\u540c\u53d8\u4f53\u4e0b\u4ecd\u8868\u73b0\u51fa\u660e\u663e\u7684\u8106\u5f31\u6027\u3002 \u603b\u4e4b\uff0c\u8be5\u7814\u7a76\u63ed\u793a\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u65f6\u6240\u9762\u4e34\u7684\u5c40\u9650\u6027\uff0c\u5e76\u5f3a\u8c03\u4e86\u5728\u8bbe\u8ba1\u66f4\u9c81\u68d2\u7684\u6a21\u578b\u65f6\u9700\u8981\u8003\u8651\u7684\u95ee\u9898\u3002|\n", "2409.18014": "|**2024-09-26**|**Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles**|Lewei He 
et.al.|[2409.18014](http://arxiv.org/abs/2409.18014)|null|\u9488\u5bf9\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u5904\u7406\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ecd\u7136\u5b58\u5728\u5b9e\u73b0\u590d\u6742\u6027\u3001\u8bad\u7ec3\u6548\u7387\u548c\u6570\u636e\u7a00\u758f\u6027\u7b49\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u8303\u5f0f\u2014\u2014\u5728\u7ebf\u957f\u671f\u4e0a\u4e0b\u6587\u5904\u7406\uff08OLP\uff09\uff0c\u9002\u7528\u4e8e\u5904\u7406\u65e0\u9650\u957f\u5ea6\u7684\u6587\u6863\uff0c\u5e38\u89c1\u4e8e\u81ea\u52a8\u5316\u65b0\u95fb\u62a5\u9053\u3001\u76f4\u64ad\u7535\u5546\u548c\u75c5\u6bd2\u77ed\u89c6\u9891\u7b49\u591a\u6837\u5316\u7684\u6d41\u5a92\u4f53\u4fe1\u606f\u63a5\u6536\u4e0e\u7ec4\u7ec7\u573a\u666f\u3002\u540c\u65f6\uff0c\u5728\u9009\u62e9\u4f17\u591a\u6027\u80fd\u4f18\u5f02\u3001\u4ef7\u683c\u9002\u4e2d\u4e14\u54cd\u5e94\u5ef6\u8fdf\u77ed\u7684LLM\u65f6\uff0c\u5f80\u5f80\u9047\u5230\u96be\u4ee5\u6289\u62e9\u7684\u95ee\u9898\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u89d2\u8272\u5f3a\u5316\u5b66\u4e60\uff08Role-RL\uff09\u6846\u67b6\uff0c\u81ea\u52a8\u90e8\u7f72\u4e0d\u540c\u89d2\u8272\u7684LLM\u5728OLP\u7ba1\u9053\u4e2d\uff0c\u6839\u636e\u5176\u5b9e\u9645\u6027\u80fd\u8fdb\u884c\u5408\u7406\u5206\u914d\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u7684\u5b9e\u9a8c\uff0c\u5e76\u5728\u6211\u4eec\u7684OLP-MINI\u6570\u636e\u96c6\u4e0a\u53d1\u73b0\uff0c\u7ed3\u5408Role-RL\u6846\u67b6\u7684OLP\u7cfb\u7edf\u5e73\u5747\u53ec\u56de\u7387\u4e3a93.2%\uff0c\u5b9e\u73b0\u4e86OLP\u57fa\u51c6\uff0c\u5e76\u8282\u7701\u4e8679.4%\u7684LLM\u6210\u672c\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u516c\u5f00\u53d1\u5e03\uff1ahttps://anonymous.4open.science/r/Role-RL\u3002|\n", "2409.18957": "|**2024-09-27**|**LML: Language Model Learning a Dataset for Data-Augmented Prediction**|Praneeth Vadlapati 
et.al.|[2409.18957](http://arxiv.org/abs/2409.18957)|**[link](https://github.com/pro-genai/lml-dap)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u89e3\u51b3\u5206\u7c7b\u4efb\u52a1\u7684\u65b0\u65b9\u6cd5\uff0c\u8fd9\u901a\u5e38\u7531\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6a21\u578b\u5904\u7406\u3002\u4e0e\u4f9d\u8d56\u5927\u91cf\u6570\u636e\u6e05\u6d17\u548c\u7279\u5f81\u5de5\u7a0b\u7684ML\u6a21\u578b\u4e0d\u540c\uff0c\u6b64\u65b9\u6cd5\u901a\u8fc7\u7b80\u5316\u6d41\u7a0b\uff0c\u4f7f\u7528LLM\u6765\u4f18\u5316\u8fc7\u7a0b\u3002\u672c\u6587\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u8bed\u8a00\u6a21\u578b\u5b66\u4e60\uff08LML\uff09\u201d\u7684\u6982\u5ff5\uff0c\u501f\u52a9\u4e00\u79cd\u79f0\u4e3a\u201c\u6570\u636e\u589e\u5f3a\u9884\u6d4b\uff08DAP\uff09\u201d\u7684\u65b0\u65b9\u6cd5\u3002\u5206\u7c7b\u4efb\u52a1\u7531LLM\u6267\u884c\uff0c\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u624b\u52a8\u63a2\u7d22\u548c\u7406\u89e3\u6570\u636e\uff0c\u5e76\u5229\u7528\u6570\u636e\u4f5c\u4e3a\u53c2\u8003\u6765\u505a\u51fa\u5206\u7c7b\u51b3\u7b56\u3002 
\u8bad\u7ec3\u6570\u636e\u88ab\u603b\u7ed3\u548c\u8bc4\u4f30\uff0c\u4ee5\u786e\u5b9a\u5bfc\u81f4\u6bcf\u4e2a\u6807\u7b7e\u5206\u7c7b\u7684\u4e3b\u8981\u7279\u5f81\u3002\u5728DAP\u8fc7\u7a0b\u4e2d\uff0c\u7cfb\u7edf\u4f7f\u7528\u6570\u636e\u6982\u8981\u81ea\u52a8\u751f\u6210\u67e5\u8be2\uff0c\u7528\u4e8e\u4ece\u6570\u636e\u96c6\u4e2d\u68c0\u7d22\u76f8\u5173\u884c\u3002\u901a\u8fc7\u4f7f\u7528\u6570\u636e\u6982\u8981\u548c\u76f8\u5173\u6570\u636e\uff0cLLM\u57fa\u4e8e\u6570\u636e\u6982\u8981\u548c\u76f8\u5173\u884c\u751f\u6210\u5206\u7c7b\uff0c\u5373\u4f7f\u9762\u5bf9\u590d\u6742\u6570\u636e\u4e5f\u80fd\u786e\u4fdd\u6ee1\u610f\u7684\u51c6\u786e\u6027\u3002\u6570\u636e\u6982\u8981\u548c\u7c7b\u4f3c\u6570\u636e\u5728DAP\u4e2d\u7684\u5e94\u7528\u786e\u4fdd\u4e86\u51b3\u7b56\u7684\u4e0a\u4e0b\u6587\u610f\u8bc6\u3002\u8be5\u65b9\u6cd5\u5728\u63d0\u793a\u4e2d\u4f7f\u7528\u4e86\u201c\u4ee5\u53ef\u89e3\u91ca\u7684\u673a\u5668\u5b66\u4e60\u6a21\u578b\u8eab\u4efd\u884c\u4e8b\u201d\u7684\u8bed\u53e5\uff0c\u589e\u5f3a\u4e86\u9884\u6d4b\u7684\u53ef\u89e3\u91ca\u6027\uff0c\u5141\u8bb8\u7528\u6237\u5ba1\u67e5\u6bcf\u6761\u9884\u6d4b\u80cc\u540e\u7684\u903b\u8f91\u3002\u5728\u67d0\u4e9b\u6d4b\u8bd5\u6848\u4f8b\u4e2d\uff0c\u7cfb\u7edf\u7684\u51c6\u786e\u7387\u8d85\u8fc790%\uff0c\u8bc1\u660e\u4e86\u7cfb\u7edf\u7684\u6709\u6548\u6027\u53ca\u5176\u5728\u5404\u79cd\u573a\u666f\u4e0b\u8d85\u8d8a\u4f20\u7edfML\u6a21\u578b\u7684\u6f5c\u529b\u3002\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8ehttps://github.com/Pro-GenAI/LML-DAP\u3002**|\n", "2409.18943": "|**2024-09-27**|**Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models**|Jiaming Li 
et.al.|[2409.18943](http://arxiv.org/abs/2409.18943)|**[link](https://github.com/geaming2002/ruler)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u9075\u5faa\u6307\u4ee4\u80fd\u529b\u4f7f\u5f97\u4eba\u7c7b\u80fd\u591f\u4ee5\u81ea\u7136\u7684\u65b9\u5f0f\u4e0eAI\u4ee3\u7406\u4e92\u52a8\u3002\u7136\u800c\uff0c\u5728\u9700\u8981\u751f\u6210\u7279\u5b9a\u957f\u5ea6\u54cd\u5e94\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f80\u5f80\u96be\u4ee5\u6ee1\u8db3\u7528\u6237\u9700\u6c42\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u5b83\u4eec\u5728\u51c6\u786e\u611f\u77e5\u6570\u503c\u9650\u5236\u65b9\u9762\u5b58\u5728\u7684\u56fa\u6709\u56f0\u96be\u3002\u4e3a\u4e86\u63a2\u7d22\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u9075\u5faa\u7279\u5b9a\u957f\u5ea6\u6307\u4ee4\u65f6\u63a7\u5236\u751f\u6210\u54cd\u5e94\u957f\u5ea6\u7684\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u76ee\u6807\u957f\u5ea6\u751f\u6210\u4efb\u52a1\uff08TLG\uff09\u5e76\u8bbe\u8ba1\u4e86\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff0c\u7cbe\u786e\u5339\u914d\uff08PM\uff09\u548c\u7075\u6d3b\u5339\u914d\uff08FM\uff09\uff0c\u4ee5\u8bc4\u4f30\u6a21\u578b\u5728\u9075\u5b88\u6307\u5b9a\u54cd\u5e94\u957f\u5ea6\u65b9\u9762\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u3001\u6a21\u578b\u65e0\u5173\u7684\u65b9\u6cd5Ruler\uff0c\u901a\u8fc7\u4f7f\u7528\u5143\u957f\u5ea6\u6807\u8bb0\uff08MLTs\uff09\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u957f\u5ea6\u53d7\u9650\u6307\u4ee4\u4e0b\u7684\u6307\u4ee4\u9075\u5faa\u80fd\u529b\u3002\u5177\u4f53\u800c\u8a00\uff0cRuler\u4f7fLLMs\u80fd\u591f\u5728\u6307\u4ee4\u4e2d\u5305\u542b\u957f\u5ea6\u7ea6\u675f\u7684\u60c5\u51b5\u4e0b\u751f\u6210\u6307\u5b9a\u957f\u5ea6\u7684\u54cd\u5e94\u3002\u800c\u4e14\uff0c\u5f53\u957f\u5ea6\u7ea6\u675f\u6ca1\u6709\u660e\u786e\u63d0\u4f9b\u65f6\uff0cRuler\u8fd8\u80fd\u81ea\u52a8\u751f\u6210\u9002\u5f53\u7684MLT\uff0c\u8868\u73b0\u51fa\u51fa\u8272\u7684\u901a\u7528\u6027\u548c\u6cdb\u5316\
u80fd\u529b\u3002\u5168\u9762\u7684\u5b9e\u9a8c\u8868\u660e\uff0cRuler\u5728\u76ee\u6807\u957f\u5ea6\u751f\u6210\u4efb\u52a1\u4e0a\u5bf9\u4e0d\u540c\u7684LLMs\u90fd\u663e\u793a\u51fa\u6709\u6548\u6027\uff0c\u4f8b\u5982\u5728PM\u4e0a\u7684\u5e73\u5747\u589e\u76ca\u4e3a27.97\uff0c\u5728FM\u4e0a\u7684\u5e73\u5747\u589e\u76ca\u4e3a29.57\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6d88\u878d\u5b9e\u9a8c\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86Ruler\u7684\u6709\u6548\u6027\u53ca\u5176\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728https://github.com/Geaming2002/Ruler\u83b7\u53d6\u3002**|\n", "2409.18938": "|**2024-09-27**|**From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding**|Heqing Zou et.al.|[2409.18938](http://arxiv.org/abs/2409.18938)|null|\u672c\u6587\u7efc\u8ff0\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u89c6\u89c9\u7f16\u7801\u5668\u96c6\u6210\u5728\u89c6\u89c9\u7406\u89e3\u4efb\u52a1\u4e2d\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u5229\u7528\u5176\u56fa\u6709\u4f18\u52bf\u6765\u7406\u89e3\u548c\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u6587\u672c\u4ee5\u8fdb\u884c\u89c6\u89c9\u63a8\u7406\u3002\u7531\u4e8e\u89c6\u89c9\u6570\u636e\u7684\u591a\u6837\u6027\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MM-LLMs\uff09\u5728\u8bbe\u8ba1\u548c\u8bad\u7ec3\u4e0a\u9488\u5bf9\u7406\u89e3\u56fe\u50cf\u3001\u77ed\u89c6\u9891\u548c\u957f\u89c6\u9891\u65f6\u8868\u73b0\u51fa\u4e0d\u540c\u7684\u7279\u5f81\u548c\u6311\u6218\u3002\u6211\u4eec\u7684\u7814\u7a76\u805a\u7126\u4e8e\u957f\u89c6\u9891\u7406\u89e3\u4e0e\u9759\u6001\u56fe\u50cf\u53ca\u77ed\u89c6\u9891\u7406\u89e3\u4e4b\u95f4\u7684\u663e\u8457\u5dee\u5f02\u53ca\u5176\u72ec\u7279\u6311\u6218\u3002 
\u4e0d\u540c\u4e8e\u9759\u6001\u56fe\u50cf\uff0c\u77ed\u89c6\u9891\u5305\u542b\u4e86\u5e8f\u5217\u5e27\u7684\u65f6\u7a7a\u4fe1\u606f\u4ee5\u53ca\u4e8b\u4ef6\u5185\u90e8\u7684\u65f6\u95f4\u4fe1\u606f\uff1b\u800c\u957f\u89c6\u9891\u5219\u5305\u542b\u4e86\u591a\u4e2a\u4e8b\u4ef6\u7684\u65f6\u7a7a\u4fe1\u606f\u4ee5\u53ca\u4e8b\u4ef6\u95f4\u7684\u957f\u671f\u65f6\u95f4\u4f9d\u8d56\u6027\u3002\u672c\u6587\u65e8\u5728\u8ffd\u6eaf\u5e76\u603b\u7ed3MM-LLMs\u4ece\u56fe\u50cf\u7406\u89e3\u5230\u957f\u89c6\u9891\u7406\u89e3\u7684\u53d1\u5c55\u5386\u7a0b\uff0c\u8be6\u7ec6\u5bf9\u6bd4\u5404\u79cd\u89c6\u89c9\u7406\u89e3\u4efb\u52a1\u4e4b\u95f4\u7684\u5dee\u5f02\uff0c\u5e76\u7a81\u51fa\u957f\u89c6\u9891\u7406\u89e3\u6240\u9762\u4e34\u7684\u6311\u6218\uff0c\u5982\u66f4\u7ec6\u81f4\u7684\u65f6\u7a7a\u7ec6\u8282\u3001\u52a8\u6001\u4e8b\u4ef6\u548c\u957f\u671f\u4f9d\u8d56\u6027\u3002 \u63a5\u7740\uff0c\u672c\u6587\u5bf9MM-LLMs\u5728\u6a21\u578b\u8bbe\u8ba1\u548c\u8bad\u7ec3\u65b9\u6cd5\u4e0a\u7684\u53d1\u5c55\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u6982\u8ff0\uff0c\u7279\u522b\u5173\u6ce8\u4e8e\u5982\u4f55\u6709\u6548\u7406\u89e3\u957f\u89c6\u9891\u3002\u6700\u540e\uff0c\u901a\u8fc7\u6bd4\u8f83\u73b0\u6709MM-LLMs\u5728\u4e0d\u540c\u957f\u5ea6\u7684\u89c6\u9891\u7406\u89e3\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u8868\u73b0\uff0c\u672c\u6587\u8ba8\u8bba\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u957f\u89c6\u9891\u7406\u89e3\u9886\u57df\u53ef\u80fd\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002|\n", "2409.18924": "|**2024-09-27**|**AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow**|Huizi Yu 
et.al.|[2409.18924](http://arxiv.org/abs/2409.18924)|null|\u5728\u73b0\u4ee3\u533b\u5b66\u6559\u80b2\u4e0e\u7814\u7a76\u9886\u57df\uff0c\u6a21\u62df\u60a3\u8005\u7cfb\u7edf\u53d1\u6325\u7740\u81f3\u5173\u91cd\u8981\u7684\u4f5c\u7528\uff0c\u5b83\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5b89\u5168\u3001\u7efc\u5408\u7684\u5b66\u4e60\u73af\u5883\uff0c\u5e76\u5141\u8bb8\u8fdb\u884c\u4e34\u5e8a\u51b3\u7b56\u6a21\u62df\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6709\u671b\u901a\u8fc7\u9ad8\u4fdd\u771f\u5ea6\u548c\u4f4e\u6210\u672c\u5730\u590d\u5236\u533b\u7597\u72b6\u51b5\u548c\u533b\u60a3\u4e92\u52a8\uff0c\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u62df\u60a3\u8005\u7cfb\u7edf\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u786e\u4fdd\u8fd9\u4e9b\u7cfb\u7edf\u7684\u6709\u6548\u6027\u548c\u53ef\u4fe1\u6027\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u9700\u8981\u4e00\u4e2a\u89c4\u6a21\u5927\u3001\u591a\u6837\u4e14\u7cbe\u786e\u7684\u60a3\u8005\u77e5\u8bc6\u5e93\uff0c\u540c\u65f6\u5177\u5907\u5f3a\u5927\u7684\u7a33\u5b9a\u77e5\u8bc6\u4f20\u64ad\u80fd\u529b\u3002 \u5728\u6b64\u80cc\u666f\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AIPatient\uff0c\u8fd9\u662f\u4e00\u4e2a\u9ad8\u7ea7\u7684\u6a21\u62df\u60a3\u8005\u7cfb\u7edf\uff0c\u5b83\u4ee5AIPatient\u77e5\u8bc6\u56fe\u8c31\uff08AIPatient KG\uff09\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u91c7\u7528\u57fa\u4e8e\u63a8\u7406\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Reasoning RAG\uff09\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u4f5c\u4e3a\u751f\u6210\u57fa\u7840\u3002AIPatient KG\u4eceMedical Information Mart for Intensive 
Care\uff08MIMIC-III\uff09\u6570\u636e\u5e93\u4e2d\u7684\u7535\u5b50\u5065\u5eb7\u8bb0\u5f55\uff08EHRs\uff09\u62bd\u53d6\u6570\u636e\uff0c\u751f\u6210\u4e86\u4e00\u4e2a\u5728\u77e5\u8bc6\u5e93\u6709\u6548\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff08F1\u5f97\u5206\u4e3a0.89\uff09\u3001\u4e34\u5e8a\u591a\u6837\u6027\u548c\u76f8\u5173\u6027\u9ad8\u76841,495\u540d\u60a3\u8005\u7684\u7fa4\u4f53\u3002 Reasoning RAG\u5229\u7528\u4e86\u516d\u4e2a\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u8986\u76d6\u4e86\u5305\u62ec\u68c0\u7d22\u3001KG\u67e5\u8be2\u751f\u6210\u3001\u62bd\u8c61\u3001\u68c0\u67e5\u3001\u91cd\u5199\u548c\u603b\u7ed3\u5728\u5185\u7684\u4efb\u52a1\u3002\u8fd9\u4e2a\u4ee3\u7406\u6846\u67b6\u5728\u57fa\u4e8eEHR\u7684\u533b\u7597\u95ee\u7b54\uff08QA\uff09\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e8694.15%\u7684\u6574\u4f53\u51c6\u786e\u6027\uff0c\u663e\u8457\u4f18\u4e8e\u4ec5\u4f7f\u7528\u65e0\u4ee3\u7406\u6216\u90e8\u5206\u4ee3\u7406\u96c6\u6210\u7684\u57fa\u51c6\u3002 \u6211\u4eec\u7684\u7cfb\u7edf\u8fd8\u5c55\u793a\u4e86\u9ad8\u53ef\u8bfb\u6027\uff08\u4e2d\u4f4d\u6570Flesch\u9605\u8bfb\u8f7b\u677e\u5ea677.23\uff1b\u4e2d\u4f4d\u6570Flesch-Kincaid\u5e74\u7ea75.6\uff09\u3001\u7a33\u5065\u6027\uff08ANOVA F\u503c0.6126\uff0cp<0.1\uff09\u548c\u7a33\u5b9a\u6027\uff08ANOVA F\u503c0.782\uff0cp<0.1\uff09\u3002AIPatient\u7cfb\u7edf\u7684\u51fa\u8272\u6027\u80fd\u9884\u793a\u7740\u5176\u5728\u533b\u5b66\u6559\u80b2\u3001\u6a21\u578b\u8bc4\u4f30\u548c\u7cfb\u7edf\u96c6\u6210\u7b49\u591a\u4e2a\u5e94\u7528\u9886\u57df\u7684\u5de8\u5927\u6f5c\u529b\u3002|\n", "2409.18911": "|**2024-09-27**|**Soft Measures for Extracting Causal Collective Intelligence**|Maryam Berijanian 
et.al.|[2409.18911](http://arxiv.org/abs/2409.18911)|**[link](https://github.com/kuldeep7688/soft-measures-causal-intelligence)**|**Understanding and modeling collective intelligence is essential for addressing complex social systems. Fuzzy cognitive maps (FCMs), encoded as directed graphs, are powerful tools for representing causal mental models, but extracting high-fidelity FCMs directly from text is challenging. This study proposes an approach that uses large language models (LLMs) to automate FCM extraction. We introduce novel graph-based similarity measures and evaluate them by correlating their outputs with human judgments through the Elo rating system. The results show positive correlations between these measures and human evaluations, although even the best-performing measure is limited in capturing the nuances of FCMs. Fine-tuning LLMs improves performance, but existing measures still fall short. This study highlights the need for soft similarity measures tailored to FCM extraction, advancing the use of NLP for modeling collective intelligence.**|\n", "2409.18892": "|**2024-09-27**|**IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation**|Fan Lin 
et.al.|[2409.18892](http://arxiv.org/abs/2409.18892)|**[link](https://github.com/DUTlf/IDGen)**|As large language models (LLMs) become increasingly capable of handling complex tasks, evaluation sets must keep pace to ensure they remain sufficiently discriminative. Inspired by Item Discrimination (ID) theory, which is widely used in educational assessment, we propose an ID-induced prompt synthesis framework for evaluating LLMs, so that the evaluation set can be continually updated and refined according to model abilities. Our data synthesis framework emphasizes both breadth and precision. It can generate prompts that comprehensively evaluate the capabilities of LLMs while revealing meaningful performance differences between models, allowing their relative strengths and weaknesses to be distinguished effectively across tasks and domains. To produce high-quality data, we incorporate a self-correction mechanism into our generalization framework and develop two models that predict the discrimination power and difficulty score of a prompt, which support our data synthesis framework. These tools are also valuable for research on evaluation data synthesis. We apply the generated data to evaluate five state-of-the-art models. Our data achieves an average score of 51.92 with a variance of 10.06. By contrast, previous works (such as SELF-INSTRUCT and WizardLM) obtain an average score above 67 with a variance below 3.2. The results indicate that the data generated by our framework is more challenging and more discriminative than that of previous works. We plan to release a dataset of more than 3,000 carefully crafted prompts to support research on LLM evaluation.|\n", "2409.18858": "|**2024-09-27**|**Predicting and analyzing memorization within fine-tuned Large Language Models**|J\u00e9r\u00e9mie Dentan 
et.al.|[2409.18858](http://arxiv.org/abs/2409.18858)|null|Large language models have attracted wide attention for their ability to solve complex tasks. However, these models memorize a substantial proportion of their training data, which poses a serious threat at inference time. To mitigate this unintended memorization, it is crucial to understand which elements are memorized and why. Most existing works provide a posteriori explanations, which are of limited practical interest. To fill this gap, we propose a new approach based on sliced mutual information that detects memorized samples a priori in a classification setting. The method is efficient from the early stages of training and adapts readily to practical scenarios. It is supported by new theoretical results, which we demonstrate experimentally, and it requires a low computational budget. We obtain strong empirical results, paving the way for systematic inspection and protection of these vulnerable samples before memorization happens.|\n", "2409.18857": "|**2024-09-27**|**Mitigating Selection Bias with Node Pruning and Auxiliary Options**|Hyeong Kyu Choi 
et.al.|[2409.18857](http://arxiv.org/abs/2409.18857)|null|Large language models (LLMs) often show an unwarranted preference for certain options when answering multiple-choice questions, which raises significant reliability concerns in LLM-automated systems. Previous solutions mainly address this bias by adjusting the model's input and/or output. Our work takes a different route and investigates how the bias arises inside the model. Specifically, we propose a novel debiasing approach called Bias Node Pruning (BNP), which removes the linear-layer parameters that contribute to the bias. We also introduce Auxiliary Option Injection (AOI), a simple yet effective input modification technique for debiasing that also works with black-box models. To evaluate selection bias more systematically, we review existing metrics and propose Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of commonly used metrics to label imbalance. Experiments show that our methods are robust and adaptable when applied to three different LLMs.|\n", "2409.18812": "|**2024-09-27**|**LLMs4Synthesis: Leveraging Large 
Language Models for Scientific Synthesis**|Hamed Babaei Giglou et.al.|[2409.18812](http://arxiv.org/abs/2409.18812)|**[link](https://github.com/HamedBabaei/LLMs4Synthesis)**|In response to the growing complexity and volume of scientific literature, this paper introduces the LLMs4Synthesis framework, designed to enhance the ability of large language models (LLMs) to generate high-quality scientific syntheses. The framework addresses the need for rapid, coherent, and contextually rich integration of scientific insights, using both open-source and proprietary LLMs, and targets the shortcomings of current quantitative metrics for evaluating such syntheses. Our study contributes to this field by developing a new methodology for processing scientific papers, defining new synthesis types, and establishing nine detailed quality criteria for evaluating syntheses. We further propose combining LLMs with reinforcement learning and AI feedback to optimize synthesis quality and keep it aligned with the established criteria. The availability of the LLMs4Synthesis framework and its components is expected to improve both the generation and the evaluation of scientific research syntheses.|\n", "2409.18794": "|**2024-09-27**|**Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs**|Yanyuan Qiao 
et.al.|[2409.18794](http://arxiv.org/abs/2409.18794)|null|This paper introduces Open-Nav, a new study that explores open-source large language models (LLMs) for zero-shot vision-and-language navigation (VLN) in continuous environments. Open-Nav uses a spatial-temporal chain-of-thought (CoT) reasoning approach that breaks the task into instruction comprehension, progress estimation, and decision-making, improving the model's scene perception during navigation and its understanding of fine-grained object and spatial knowledge. Experiments in both simulated and real-world environments show that Open-Nav achieves performance competitive with approaches that use closed-source LLMs.|\n", "2409.20566": "|**2024-09-30**|**MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning**|Haotian Zhang 
et.al.|[2409.20566](http://arxiv.org/abs/2409.20566)|null|We present MM1.5, a new family of multimodal large language models designed to enhance text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building on the MM1 architecture, MM1.5 takes a data-centric approach to model training, systematically exploring the impact of different data mixtures across the entire training lifecycle. This includes high-quality OCR data and synthetic captions for continual pre-training, and an optimized visual instruction-tuning data mixture for supervised fine-tuning. Our models range from 1B to 30B parameters, include both dense and mixture-of-experts (MoE) variants, and show that careful data curation and training strategies yield strong performance even at small scales (1B and 3B parameters). We also introduce two specialized variants: MM1.5-Video for video understanding and MM1.5-UI for mobile UI understanding. Through extensive empirical studies and ablations, we provide detailed insights into the training processes and decisions, which offer valuable guidance for the future development of multimodal large language models.|\n", "2409.20557": "|**2024-09-30**|**Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos**|Md Mohaiminul Islam et.al.|[2409.20557](http://arxiv.org/abs/2409.20557)|null|This paper introduces VidAssist, an integrated framework for zero-shot and few-shot goal-oriented planning in instructional videos. VidAssist uses large language models (LLMs) as both a knowledge base and an assessment tool to generate and evaluate action plans, overcoming the challenge of acquiring procedural knowledge from small-scale, low-diversity datasets. VidAssist also employs a breadth-first search algorithm for optimal plan generation and uses value functions designed for goal-oriented planning to assess the predicted action at each step. Extensive experiments show that VidAssist provides a unified framework for different goal-oriented planning setups, such as visual planning for assistance (VPA) and procedural planning (PP), and performs strongly in zero-shot and few-shot settings. Specifically, when predicting 4 future actions, our few-shot model outperforms the previous fully supervised leading method by +7.7% on the VPA task and +4.81% on the PP task on the COIN dataset. All code and models are publicly available at https://sites.google.com/view/vidassist.|\n", "2409.20550": "|**2024-09-30**|**LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation**|Ziyao Zhang et.al.|[2409.20550](http://arxiv.org/abs/2409.20550)|null|This paper presents an empirical study of hallucinations of large language models (LLMs) in code generation. Although LLMs show encouraging performance on code generation, they often produce wrong or inaccurate results when handling the complex contextual dependencies of practical development. Previous studies mainly analyzed hallucinations in LLM-based code generation within standalone function generation, whereas this paper extends the scope to the more practical and complex repository-level generation scenario. First, by manually examining the code generation results of six mainstream LLMs, the paper establishes a hallucination taxonomy for LLM-generated code. Next, it elaborates on the hallucination phenomena and analyzes how they are distributed across models. It then explores the causes of hallucinations and identifies four potential contributing factors. Finally, it proposes a RAG-based (retrieval-augmented generation) mitigation method that is consistently effective across all studied LLMs. A replication package including code, data, and experimental results is provided for reference and verification by academia and industry. This study helps improve the reliability and accuracy of LLMs in code generation and is of significance to software engineering.|\n", "2409.20548": "|**2024-09-30**|**Robi Butler: Remote Multimodal Interactions with Household Robot Assistant**|Anxing Xiao et.al.|[2409.20548](http://arxiv.org/abs/2409.20548)|null|In this paper, we introduce Robi Butler, a novel household robotic system that enables multimodal interaction with remote users. Building on advanced communication interfaces, Robi Butler lets users monitor the robot's status, send text or voice instructions, and select target objects by gesture. At the core of our system is a high-level behavior module, powered by large language models (LLMs), that interprets multimodal instructions and generates action plans. These plans are composed of an open-vocabulary set of actions handled by vision-language models (VLMs) that support both text and click queries. Integrating these components allows Robi Butler to turn remote multimodal instructions into actions in real-world home environments in a zero-shot manner. We demonstrate the system's effectiveness and efficiency on a variety of daily household tasks that involve remote users giving multimodal instructions. We also conducted a user study analyzing how multimodal interaction affects the efficiency and user experience of remote human-robot interaction, and we discuss potential improvements.|\n", "2409.20512": "|**2024-09-30**|**Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models**|Arpan Mukherjee 
et.al.|[2409.20512](http://arxiv.org/abs/2409.20512)|null|This paper presents a novel framework that addresses the challenge of accurately predicting the toxicity of solvents used in the industrial synthesis of perovskites, a task limited by the lack of targeted and structured toxicity data. The framework combines automated data extraction with language models and an uncertainty-informed prediction model to fill data gaps and improve prediction confidence. First, we use two approaches to automatically extract relevant data from a corpus of scientific literature on solvents used in perovskite synthesis: smaller bidirectional language models such as BERT and ELMo are used for their repeatability and deterministic outputs, while autoregressive large language models (LLMs) such as GPT-3.5 are used for their larger training corpus and better response generation. Our 'prompting and verification' technique, integrated with the LLM, aims at targeted extraction and refinement, reducing LLM hallucination and improving the quality of the extracted data. Next, the extracted data is fed into a pre-trained multi-task binary classification deep learning model to predict the ED nature of the extracted solvents. We apply Shannon entropy-based uncertainty quantification to the class probabilities obtained from the classification model in order to quantify uncertainty and identify data gaps in the predictions. This approach yields a structured dataset of solvents used in perovskite synthesis together with their uncertainty-informed virtual toxicity assessment. In addition, we use chord diagrams to visualize solvent interactions and to prioritize potentially hazardous solvents, which reveals that 70% of the solvent interactions are primarily associated with two specific perovskites.|\n", "2409.20502": "|**2024-09-30**|**COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models**|Divyanshu Daiya 
et.al.|[2409.20502](http://arxiv.org/abs/2409.20502)|null|We propose COLLAGE, a novel framework for generating collaborative agent-object-agent interactions by leveraging large language models (LLMs) and hierarchical motion-specific vector-quantized variational autoencoders (VQ-VAEs). Our model addresses the scarcity of data in this domain by incorporating the knowledge and reasoning abilities of LLMs to guide a generative diffusion model. The hierarchical VQ-VAE architecture captures different motion-specific characteristics at multiple levels of abstraction, avoiding redundant concepts and enabling efficient multi-resolution representation. We introduce a diffusion model that operates in the latent space and incorporates LLM-generated motion planning cues to guide the denoising process, resulting in prompt-specific motion generation with greater control and diversity. Experiments on the CORE-4D and InterHuman datasets demonstrate the effectiveness of our approach in generating realistic and diverse collaborative human-object-human interactions, outperforming state-of-the-art methods. Our work opens up new possibilities for modeling complex interactions in domains such as robotics, graphics, and computer vision.|\n", "2409.20441": "|**2024-10-01**|**Instance-adaptive Zero-shot Chain-of-Thought Prompting**|Xiaosong Yuan et.al.|[2409.20441](http://arxiv.org/abs/2409.20441)|null|Zero-shot chain-of-thought (CoT) prompting has emerged as a simple and effective strategy for improving the performance of large language models (LLMs) on real-world reasoning tasks. However, applying a single task-level prompt uniformly across all instances is inherently limiting, since one prompt cannot be a good partner for every instance. A more appropriate approach is to consider carefully how the prompt interacts with each instance. This work introduces an instance-adaptive prompting algorithm as an alternative zero-shot CoT reasoning scheme, which improves performance by appropriately differentiating good prompts from bad ones. Concretely, we first analyze LLMs through the lens of information flow to uncover the mechanism behind zero-shot CoT reasoning, and find that the information flows from question to prompt and from question to rationale jointly have the greatest influence on the reasoning result. We observe that better zero-shot CoT reasoning requires the prompt to obtain semantic information from the question, after which the rationale aggregates sufficient information from the question, either directly or indirectly via the prompt. Conversely, lacking either of these tends to produce a poor prompt. Building on this finding, we propose an instance-adaptive prompting strategy (IAP) for zero-shot CoT reasoning. Experiments with LLaMA-2, LLaMA-3, and Qwen on math, logic, and commonsense reasoning tasks (e.g., GSM8K, MMLU, Causal Judgement) show that instance-adaptive zero-shot CoT prompting outperforms approaches that rely on certain curated prompts or sophisticated procedures, which underlines the significance of our findings on the zero-shot CoT reasoning mechanism.|\n", "2409.20385": "|**2024-09-30**|**Wait, but Tylenol is Acetaminophen... 
Investigating and Improving Language Models' Ability to Resist Requests for Misinformation**|Shan Chen et.al.|[2409.20385](http://arxiv.org/abs/2409.20385)|null|Background: Large language models (LLMs) are trained to follow instructions, but this design makes them prone to complying blindly with user requests even when doing so generates misinformation. In medicine, this could accelerate the spread of misinformation and harm human health. Objectives/Methods: We analyzed the tendency of models to generate misleading content about medications when they know the request is illogical. We investigated whether in-context prompting and parameter tuning can make LLMs prioritize logical reasoning over compliance and thereby reduce the risk of medical misinformation. Results: All frontier LLMs complied with illogical requests to generate misleading content. However, prompt-based and parameter-tuning approaches can improve the detection of logical flaws in requests and prevent the spread of medical misinformation. Conclusion: Shifting the design focus of LLMs from compliance toward logical reasoning could reduce the risk of their being exploited to spread medical misinformation.|\n", "2409.20370": "|**2024-09-30**|**The Perfect Blend: Redefining RLHF with Mixture of Judges**|Tengyu Xu 
et.al.|[2409.20370](http://arxiv.org/abs/2409.20370)|null|This paper introduces a novel post-training paradigm called Constrained Generative Policy Optimization (CGPO). The core of CGPO is a Mixture of Judges (MoJ) that performs cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. The method has theoretical guarantees, does not require extensive hyperparameter tuning, and plugs into common post-training pipelines. It can detect and mitigate reward hacking and reach a Pareto-optimal point across a very large number of objectives. Our empirical evaluation shows that CGPO significantly outperforms standard RLHF algorithms such as PPO and DPO across a range of tasks, including general chat, STEM questions, instruction following, and coding. Specifically, CGPO improves by 7.4% on AlpacaEval-2 (general chat) and by 12.5% on Arena-Hard (STEM and reasoning), with consistent gains in other domains such as math and coding. Notably, PPO, although widely used, is prone to severe reward hacking on popular coding benchmarks, which CGPO successfully addresses. This advance in RLHF not only tackles reward hacking and extreme multi-objective optimization but also moves forward the alignment of general-purpose language models for diverse applications.|\n", "2409.20365": "|**2024-09-30**|**VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs**|Ruotong Liao et.al.|[2409.20365](http://arxiv.org/abs/2409.20365)|**[link](https://github.com/mayhugotong/videoinsta)**|In the video-language domain, recent work that uses zero-shot large language model (LLM) reasoning for video understanding has become a strong challenger to conventional end-to-end models. However, long video understanding poses unique challenges, especially when reasoning over extended timespans, even for zero-shot LLM-based approaches. The information redundancy in long videos raises the question of which information is essential for LLMs and how to use it for complex spatial-temporal reasoning in long-form video analysis. To this end, we propose VideoINSTA (INformative Spatial-TemporAl Reasoning), a framework for zero-shot long-form video understanding. The main contributions of VideoINSTA are: (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach that lets LLMs reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme that balances temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state of the art on three long video question-answering benchmarks, EgoSchema, NextQA, and IntentQA, as well as on the open question-answering dataset ActivityNetQA. The code is released here: https://github.com/mayhugotong/VideoINSTA.|\n", "2410.01805": "|**2024-10-02**|**Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads**|Yuxiang Huang 
et.al.|[2410.01805](http://arxiv.org/abs/2410.01805)|**[link](https://github.com/huangyuxiang03/Locret)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u652f\u6301\u957f\u671f\u4e0a\u4e0b\u6587\u7406\u89e3\u548c\u5904\u7406\u4efb\u52a1\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5c06LLMs\u7684\u751f\u6210\u63a8\u7406\u6269\u5c55\u5230\u5982\u6b64\u957f\u7684\u4e0a\u4e0b\u6587\u4f1a\u589e\u52a0\u5927\u91cf\u7684\u8ba1\u7b97\u8d1f\u8f7d\uff0c\u5e76\u8981\u6c42\u5728\u7ef4\u6301\u57fa\u4e8eTransformer\u7684LLMs\u7684\u952e\u503c\u5bf9\uff08KV\uff09\u7f13\u5b58\u65f6\u4f7f\u7528\u5927\u91cfGPU\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u538b\u7f29\u65b9\u6cd5\uff0c\u5982\u91cf\u5316\uff0c\u968f\u7740\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u589e\u52a0\u800c\u9047\u5230\u5185\u5b58\u74f6\u9888\uff1b\u800c\u56fa\u5b9a\u5927\u5c0f\u7684\u7f13\u5b58\uff0c\u5982\u6dd8\u6c70\u7b56\u7565\uff0c\u5219\u7531\u4e8e\u4e0d\u9ad8\u6548\u7684\u7b56\u7565\u800c\u5bfc\u81f4\u6548\u7387\u4f4e\u4e0b\u3002\u8fd9\u4e9b\u5c40\u9650\u963b\u788d\u4e86\u5728\u5355\u4e2aNvidia 4090 GPU\u7b49\u6d88\u8d39\u8005\u7ea7\u8bbe\u5907\u4e0a\u7684\u90e8\u7f72\u3002 
\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Locret\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u7528\u4e8e\u957f\u4e0a\u4e0b\u6587LLM\u63a8\u7406\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5165\u4fdd\u7559\u5934\u90e8\u6765\u8bc4\u4f30KV\u7f13\u5b58\u5355\u5143\u7684\u56e0\u679c\u91cd\u8981\u6027\uff0c\u4ece\u800c\u5141\u8bb8\u5728\u56fa\u5b9a\u7f13\u5b58\u5927\u5c0f\u5185\u8fdb\u884c\u66f4\u51c6\u786e\u7684\u6dd8\u6c70\u3002Locret\u5728\u51bb\u7ed3\u7684\u4e3b\u5e72LLM\u57fa\u7840\u4e0a\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4f7f\u7528\u6807\u51c6\u957f\u65f6\u95f4\u4e0a\u4e0b\u6587SFT\u6570\u636e\u96c6\u7684\u5c11\u91cf\u6570\u636e\u3002\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ee5\u5206\u5757\u9884\u586b\u5145\u6a21\u5f0f\u6dd8\u6c70\u4f4e\u91cd\u8981\u6027\u7684\u7f13\u5b58\u5355\u5143\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u91cf\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u8bc1\u7814\u7a76\u6765\u8bc4\u4f30Locret\uff0c\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6700\u8fd1\u7684\u7ade\u4e89\u65b9\u6cd5\uff08\u5305\u62ecInfLLM\u3001\u91cf\u5316\u3001SirLLM\u548cMInference\uff09\u76f8\u6bd4\uff0cLocret\u5728\u5185\u5b58\u6548\u7387\u548c\u751f\u6210\u5185\u5bb9\u8d28\u91cf\u65b9\u9762\u5747\u8868\u73b0\u51fa\u8272\u2014\u2014Locret\u5b9e\u73b0\u4e86\u4e0ePhi-3-mini-128K\u548cLlama-3.1-8B-instruct\u5168KV\u7f13\u5b58\u76f8\u6bd4\u8d85\u8fc720\u500d\u548c8\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\u7387\u3002\u6b64\u5916\uff0cLocret\u8fd8\u53ef\u4ee5\u4e0e\u5176\u4ed6\u65b9\u6cd5\uff08\u5982\u91cf\u5316\u548c\u4ee4\u724c\u5408\u5e76\uff09\u7ed3\u5408\u4f7f\u7528\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cLocret\u662f\u7b2c\u4e00\u4e2a\u80fd\u591f\u5c06Llama-3.1-8B\u6216\u7c7b\u4f3c\u6a21\u578b\u90e8\u7f72\u5230\u5355\u4e2aNvidia 4090 
GPU\u4e0a\uff0c\u540c\u65f6\u5728\u4e0d\u727a\u7272\u751f\u6210\u8d28\u91cf\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0128K\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u7684\u6846\u67b6\uff0c\u4e14\u4ec5\u9700\u8981\u5c11\u91cf\u989d\u5916\u7684\u7cfb\u7edf\u4f18\u5316\u3002**|\n", "2410.01799": "|**2024-10-02**|**Efficient $1$-bit tensor approximations**|Alex W. Neal Riasanovsky et.al.|[2410.01799](http://arxiv.org/abs/2410.01799)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7a7a\u95f4\u6548\u7387\u9ad8\u7684\u77e9\u9635\u548c\u4efb\u610f\u9636\u5f20\u91cf\u5206\u89e3\u65b9\u6cd5\uff0c\u4f5c\u4e3a\u7ebf\u6027\u7ec4\u5408\u7684\u5f20\u91cf\u79ef\u5f62\u5f0f\uff0c\u5176\u4e2d\u5411\u91cf\u503c\u4e3a$\\{-1, 1\\}$\u3002\u5bf9\u4e8e\u4efb\u4e00\u77e9\u9635$A \\in \\mathbb{R}^{m \\times n}$\uff0c\u5176\u8868\u8fbe\u5f0f\u4e3a\uff1a$$A - R_w = S_w C_w T_w^\\top = \\sum_{j=1}^w c_j \\cdot \\mathbf{s}_j \\mathbf{t}_j^\\top$$ \u8fd9\u662f\u4e00\u4e2a\u5173\u4e8e$A$\u7684\u201c\u5bbd\u5ea6\u4e3a$w$\u7684\u7b26\u53f7\u5207\u5206\u89e3\u201d\u3002\u8fd9\u91cc$C_w = \"diag\"(\\mathbf{c}_w)$\uff0c\u4e14$S_w, T_w$\u548c\u5411\u91cf$\\mathbf{s}_j, \\mathbf{t}_j$\u5747\u4e3a$\\{-1, 1\\}$\u503c\u3002\u7528\u4e8e\u5b58\u50a8$(S_w, T_w, C_w)$\u6240\u9700\u7684\u7a7a\u95f4\u662f$w \\cdot (m + n)$\u4f4d\uff0c\u5e76\u4ec5\u9700$w$\u4e2a\u6d6e\u70b9\u6570\u3002\u5f53\u5e94\u7528\u4e8e\u5177\u6709i.i.d. 
$\\mathcal N (0, 1)$\u5206\u5e03\u5143\u7d20\u7684#f32\u77e9\u9635\u65f6\uff0c$\\lVert R_w \\rVert_F$\u5448\u73b0\u51fa\u6307\u6570\u8870\u51cf\u3002\u9009\u62e9\u5408\u9002\u7684$w$\uff0c\u4f7f$(S_w, T_w, C_w)$\u7684\u5185\u5b58\u5360\u7528\u4e0e\\textit{f16}\u6216\\textit{bf16}\u77e9\u9635\u76f8\u540c\uff0c\u76f8\u5bf9\u8bef\u5dee\u76f8\u5f53\u3002\u6211\u4eec\u7684\u7b97\u6cd5\u572820\u884c\u4f2a\u4ee3\u7801\u4e2d\u5b9e\u73b0\u4e86\u9ad8\u6548\u7684\u7b26\u53f7\u5207\u5206\u89e3\u3002\u5b83\u6e90\u81ea1999\u5e74Frieze\u548cKannan\u7684\u4e00\u7bc7\u8457\u540d\u8bba\u6587\u7684\u7b80\u5355\u4fee\u6539\u3002 \u4f5c\u4e3a\u7b2c\u4e00\u4e2a\u5e94\u7528\uff0c\u6211\u4eec\u5bf9\u5f00\u653e\u6e90\u7801\u5927\u578b\u8bed\u8a00\u6a21\u578b\\textit{Mistral-7B-v0.1}\u4e2d\u7684\u6743\u91cd\u77e9\u9635\u8fdb\u884c\u4e86$50\\%$\u7684\u7a7a\u95f4\u538b\u7f29\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6240\u6709$226$\u4e2a\u4f59\u77e9\u9635\u7684\u76f8\u5bf9\u8bef\u5dee\u5747\u5c0f\u4e8e$6\\%$\uff0c\u4e14\u6269\u5c55\u6a21\u578b\u5728huggingface\u6392\u884c\u699c\u4e0a\u4e0e\\textit{Mistral-7B-v0.1}\u6a21\u578b\u8868\u73b0\u76f8\u8fd1\u3002\u968f\u7740\u7a7a\u95f4\u538b\u7f29\u7387\u4ece$50\\%$\u964d\u4f4e\u81f3$25\\%$\uff0c\u57fa\u51c6\u6027\u80fd\u7f13\u6162\u4e0b\u964d\u3002\u6211\u4eec\u4f18\u5316\u4e86\u5f00\u6e90\u7684\\textit{rust}\u5b9e\u73b0\uff0c\u4f7f\u7528\u4e86\\textit{avx2}\u548c\\textit{avx512}\u67b6\u6784\u4e0b\u7684\\textit{simd}\u6307\u4ee4\u8fdb\u884c\u52a0\u901f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06\u8be5\u7b97\u6cd5\u6269\u5c55\u5230\u4e86\u4efb\u610f\u9636\u5f20\u91cf\uff0c\u5e76\u5229\u7528\u5b83\u538b\u7f29\u4e86\u4e00\u5f20\u4f5c\u8005\u732bAngus\u7684\u7167\u7247\u3002|\n", "2410.01795": 
"|**2024-10-02**|**Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models**|Joseph Lee et.al.|[2410.01795](http://arxiv.org/abs/2410.01795)|**[link](https://github.com/pennshenlab/freeform)**|**\u57fa\u4e8e\u590d\u6742\u9057\u4f20\u57fa\u7840\u9884\u6d4b\u8868\u578b\uff0c\u5229\u7528\u5c0f\u800c\u53ef\u89e3\u91ca\u7684\u53d8\u5f02\u7279\u5f81\u4ecd\u7136\u662f\u4e00\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u4e0a\uff0c\u4f7f\u7528\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\u8fdb\u884c\u6b64\u4efb\u52a1\uff0c\u4f46\u57fa\u56e0\u578b\u6570\u636e\u7684\u9ad8\u7ef4\u7279\u6027\u4f7f\u5f97\u5206\u6790\u548c\u9884\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u53d7\u5230\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7f16\u7801\u7684\u4e30\u5bcc\u77e5\u8bc6\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u751f\u7269\u533b\u5b66\u6982\u5ff5\u4e0a\u7684\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u65e8\u5728\u63a2\u7d22LLM\u5728\u8868\u683c\u57fa\u56e0\u578b\u6570\u636e\u7279\u5f81\u9009\u62e9\u4e0e\u5de5\u7a0b\u65b9\u9762\u7684\u80fd\u529b\uff0c\u5e76\u5f15\u5165\u4e00\u79cd\u57fa\u4e8e\u77e5\u8bc6\u7684\u6846\u67b6\u3002\u6211\u4eec\u5f00\u53d1\u4e86FREEFORM\uff0c\u4e00\u79cd\u81ea\u7531\u6d41\u52a8\u63a8\u7406\u4e0e\u96c6\u6210\u589e\u5f3a\u7279\u5f81\u8f93\u51fa\u548c\u7a33\u5065\u5efa\u6a21\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u94fe\u5f0f\u601d\u8003\u4e0e\u96c6\u6210\u539f\u5219\uff0c\u5229\u7528LLM\u7684\u5185\u5728\u77e5\u8bc6\u6765\u9009\u62e9\u548c\u5de5\u7a0b\u7279\u5f81\u3002\u5728\u4e24\u4e2a\u4e0d\u540c\u7684\u4eba\u7c7b\u57fa\u56e0\u578b-\u8868\u578b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bc4\u4f30\uff0c\u5305\u62ec\u9057\u4f20\u8840\u7edf\u548c\u9057\u4f20\u6027\u542c\u529b\u635f\u5931\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e2a\u6846\u67b6\u5728\u4f4e\u6837\u672c\u91cf\u60c5\u51b5\u4e0b\u4f18\u4e8e\u51e0\u79cd\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u3002FREEFOR
M\u4f5c\u4e3a\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u53ef\u4ee5\u5728GitHub\u4e0a\u83b7\u53d6\uff1ahttps://github.com/PennShenLab/FREEFORM\u3002**|\n", "2410.01792": "|**2024-10-02**|**When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1**|R. Thomas McCoy et.al.|[2410.01792](http://arxiv.org/abs/2410.01792)|null|\u5728\u201c\u81ea\u52a8\u56de\u5f52\u4f59\u70ec\u201d\uff08McCoy\u7b49\u4eba\uff0c2023\u5e74\uff09\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u51e0\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8d77\u6e90\u4e0a\u5b58\u5728\u4e00\u4e9b\u91cd\u8981\u9650\u5236\uff0c\u8fd9\u5f52\u56e0\u4e8e\u5b83\u4eec\u7684\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u7279\u6027\u3002\u8fd9\u91cc\u6211\u4eec\u63a2\u8ba8\u4e86OpenAI\u7684\u65b0\u7cfb\u7edfo1\u662f\u5426\u4f9d\u7136\u5b58\u5728\u8fd9\u4e9b\u95ee\u9898\uff0c\u4e0e\u4e4b\u524d\u7684LLMs\u76f8\u6bd4\uff0co1\u5728\u63a8\u7406\u4f18\u5316\u65b9\u9762\u6709\u6240\u4e0d\u540c\u3002\u7814\u7a76\u53d1\u73b0\uff0co1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\u663e\u8457\u4f18\u4e8e\u4e4b\u524d\u6a21\u578b\uff0c\u5728\u67d0\u4e9b\u5e38\u89c1\u4efb\u52a1\u7684\u7f55\u89c1\u53d8\u4f53\u4e0a\uff08\u4f8b\u5982\uff0c\u4ece\u5217\u8868\u4e2d\u7684\u6bcf\u4e2a\u8bcd\u7684\u7b2c\u4e8c\u4e2a\u5b57\u6bcd\u5f62\u6210\u7f29\u5199\uff0c\u800c\u4e0d\u662f\u7b2c\u4e00\u4e2a\u5b57\u6bcd\uff09\u8868\u73b0\u5c24\u5176\u51fa\u8272\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5b9a\u91cf\u6539\u8fdb\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\uff0c\u4f46o1\u4f9d\u7136\u663e\u793a\u51fa\u4e86\u4e0e\u4e4b\u524d\u7cfb\u7edf\u76f8\u540c\u7684\u57fa\u672c\u8d8b\u52bf\uff1a\u5bf9\u4e8e\u6982\u7387\u8f83\u9ad8\u7684\u793a\u4f8b\u548c\u4efb\u52a1\uff0co1\u7684\u8868\u73b0\u66f4\u597d\u4e14\u9700\u8981\u7684\u201c\u601d\u8003\u4ee4\u724c\u201d\u6570\u91cf\u8f83\u5c11\uff1b\u800c\u5728\u6982\u7387\u8f83\u4f4e\u7684\u60c5\u51b5\u4e0b\u5219\u8868\u73b0\u4e0d\u4f73\u3002 
\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u4f18\u5316\u8bed\u8a00\u6a21\u578b\u4ee5\u8fdb\u884c\u63a8\u7406\u53ef\u4ee5\u51cf\u8f7b\u4f46\u53ef\u80fd\u65e0\u6cd5\u5b8c\u5168\u514b\u670d\u8bed\u8a00\u6a21\u578b\u7684\u6982\u7387\u654f\u611f\u6027\u95ee\u9898\u3002|\n", "2410.01789": "|**2024-10-02**|**Investigating on RLHF methodology**|Alexey Kutalev et.al.|[2410.01789](http://arxiv.org/abs/2410.01789)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7684\u5bf9\u9f50\u95ee\u9898\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u8bad\u7ec3\u504f\u597d\u6a21\u578b\u7684\u7279\u6027\uff0c\u8be5\u6a21\u578b\u6a21\u62df\u4eba\u7c7b\u504f\u597d\uff0c\u5e76\u4ecb\u7ecd\u4e86\u5b9e\u73b0\u6700\u4f73\u7ed3\u679c\u6240\u9700\u7684\u65b9\u6cd5\u548c\u7ec6\u8282\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65b9\u6cd5\uff0c\u63cf\u8ff0\u4e86\u9047\u5230\u7684\u6311\u6218\u4ee5\u53ca\u514b\u670d\u8fd9\u4e9b\u6311\u6218\u7684\u65b9\u5f0f\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u7684\u7ecf\u9a8c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u6211\u4eec\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u5bf9\u9f50\uff0c\u800c\u65e0\u9700\u521b\u5efa\u5355\u72ec\u7684\u504f\u597d\u6a21\u578b\u3002\u4f5c\u4e3a\u6211\u4eec\u7684\u8d21\u732e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u901a\u8fc7\u56f0\u60d1\u5ea6\u7b5b\u9009\u6536\u96c6\u504f\u597d\u6570\u636e\u96c6\u7684\u65b9\u6cd5\uff0c\u8fd9\u4f7f\u5f97\u4e3a\u7279\u5b9a\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u8fd9\u6837\u7684\u6570\u636e\u96c6\u7684\u8fc7\u7a0b\u66f4\u52a0\u7b80\u4fbf\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u3002|\n", "2410.01784": "|**2024-10-02**|**OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models**|Heng Yang 
et.al.|[2410.01784](http://arxiv.org/abs/2410.01784)|**[link](https://github.com/yangheng95/OmniGenomeBench)**|**\u8fd1\u5e74\u6765\uff0c\u4eba\u5de5\u667a\u80fd\u9886\u57df\u7684\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u6fc0\u53d1\u4e86\u5bf9\u57fa\u56e0\u7ec4\u57fa\u7840\u6a21\u578b\uff08GFMs\uff09\u7a81\u7834\u6027\u8fdb\u5c55\u7684\u671f\u5f85\u3002\u81ea\u751f\u547d\u8fdb\u5316\u4e4b\u521d\u5c31\u9690\u85cf\u5728\u591a\u6837\u5316\u7684\u57fa\u56e0\u7ec4\u4e2d\u7684\u201c\u81ea\u7136\u4e4b\u7801\u201d\uff0c\u8574\u542b\u7740\u5de8\u5927\u6f5c\u529b\uff0c\u80fd\u591f\u901a\u8fc7\u57fa\u56e0\u7ec4\u5efa\u6a21\u5bf9\u4eba\u7c7b\u548c\u751f\u6001\u7cfb\u7edf\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002\u8fd1\u671fGFM\u9886\u57df\u7684\u91cd\u8981\u7a81\u7834\uff0c\u5982Evo\uff0c\u5438\u5f15\u4e86\u5927\u91cf\u6295\u8d44\u4e0e\u5173\u6ce8\uff0c\u5b83\u4eec\u89e3\u51b3\u4e86\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5e76\u5c06\u57fa\u56e0\u7ec4\u7814\u7a76\u4ece\u624b\u52a8\u3001\u4e0d\u53ef\u9760\u548c\u4f4e\u6548\u7684\u4f20\u7edf\u6a21\u5f0f\u8f6c\u53d8\u4e3a\u81ea\u52a8\u5316\u3001\u53ef\u9760\u548c\u9ad8\u6548\u7684\u65b0\u8303\u5f0f\u3002\u5728\u57fa\u56e0\u7ec4\u5b66\u8fde\u7eed\u6280\u672f\u9769\u547d\u7684\u80cc\u666f\u4e0b\uff0cGFM\u7814\u7a76\u9762\u4e34\u4e24\u5927\u6311\u6218\uff1a\u7f3a\u4e4fGFM\u57fa\u51c6\u6d4b\u8bd5\u5de5\u5177\u4ee5\u53ca\u591a\u7ef4\u57fa\u56e0\u7ec4\u5b66\u7684\u5f00\u6e90\u8f6f\u4ef6\u7f3a\u5931\u3002\u8fd9\u4e9b\u6311\u6218\u963b\u788d\u4e86GFM\u5feb\u901f\u6f14\u8fdb\u53ca\u5176\u5e7f\u6cdb\u5e94\u7528\u4e8e\u7406\u89e3\u4e0e\u5408\u6210\u57fa\u56e0\u7ec4\u7b49\u6570\u5341\u5e74\u6765\u5b58\u5728\u7684\u95ee\u9898\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86GFMBench\u6846\u67b6\uff0c\u4e00\u4e2a\u4e13\u6ce8\u4e8eGFM\u5bfc\u5411\u57fa\u51c6\u6d4b\u8bd5\u7684\u5e73\u53f0\u3002GFMBench\u6807\u51c6\u5316\u4e86\u5
7fa\u51c6\u5957\u4ef6\uff0c\u5e76\u5b9e\u73b0\u4e86\u5bf9\u5927\u91cf\u5f00\u6e90GFMs\u7684\u81ea\u52a8\u5316\u57fa\u51c6\u6d4b\u8bd5\u3002\u5b83\u96c6\u6210\u4e86\u6765\u81ea\u56db\u5927\u5927\u578b\u57fa\u51c6\u7684\u6570\u767e\u4e07\u4e2a\u57fa\u56e0\u5e8f\u5217\uff0c\u8986\u76d6\u6570\u767e\u79cd\u57fa\u56e0\u7ec4\u4efb\u52a1\uff0c\u4f7fGFMs\u6c11\u4e3b\u5316\uff0c\u9002\u7528\u4e8e\u5e7f\u6cdb\u7684\u865a\u62df\u57fa\u56e0\u7ec4\u5e94\u7528\u3002\u6b64\u5916\uff0cGFMBench\u4f5c\u4e3a\u5f00\u6e90\u8f6f\u4ef6\u53d1\u5e03\uff0c\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u754c\u9762\u548c\u591a\u6837\u5316\u6559\u7a0b\uff0c\u9002\u7528\u4e8e\u81ea\u52a8\u6d4b\u8bd5\u4ee5\u53caRNA\u8bbe\u8ba1\u548c\u7ed3\u6784\u9884\u6d4b\u7b49\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u4fc3\u8fdb\u57fa\u56e0\u7ec4\u5efa\u6a21\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\uff0c\u6211\u4eec\u542f\u52a8\u4e86\u4e00\u4e2a\u516c\u5171\u6392\u884c\u699c\uff0c\u5c55\u793a\u7531AutoBench\u751f\u6210\u7684\u57fa\u51c6\u6027\u80fd\u3002GFMBench\u4ee3\u8868\u4e86\u6807\u51c6\u5316GFM\u57fa\u51c6\u6d4b\u8bd5\u548c\u6c11\u4e3b\u5316GFM\u5e94\u7528\u7684\u4e00\u5927\u6b65\u3002**|\n", "2410.01782": "|**2024-10-02**|**Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models**|Shayekh Bin Islam 
et.al.|[2410.01782](http://arxiv.org/abs/2410.01782)|**[link](https://github.com/ShayekhBinIslam/openrag)**|\u4e3a\u4e86\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e8b\u5b9e\u51c6\u786e\u6027\u4e0a\u7684\u8868\u73b0\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\u5df2\u7ecf\u5f97\u5230\u4e86\u5e7f\u6cdb\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u5728\u5229\u7528\u68c0\u7d22\u5230\u7684\u8bc1\u636e\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u5c40\u9650\u6027\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u5f00\u6e90LLM\u65f6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Open-RAG\uff0c\u65e8\u5728\u589e\u5f3a\u5f00\u6e90LLM\u5728RAG\u4e2d\u7684\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c06\u4efb\u610f\u5bc6\u96c6\u578bLLM\u8f6c\u6362\u6210\u4e00\u4e2a\u53c2\u6570\u9ad8\u6548\u7684\u7a00\u758f\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\uff0c\u80fd\u591f\u5904\u7406\u5305\u62ec\u5355\u8df3\u548c\u591a\u8df3\u67e5\u8be2\u5728\u5185\u7684\u590d\u6742\u63a8\u7406\u4efb\u52a1\u3002 
Open-RAG\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\uff0c\u5b83\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u6765\u5e94\u5bf9\u770b\u4f3c\u76f8\u5173\u4f46\u5177\u6709\u8bef\u5bfc\u6027\u7684\u5e72\u6270\u9879\uff0c\u4ece\u800c\u6709\u6548\u5730\u5bfc\u822a\u590d\u6742\u573a\u666f\u3002\u901a\u8fc7\u5229\u7528\u6f5c\u5b66\u4e60\uff0cOpen-RAG\u52a8\u6001\u9009\u62e9\u76f8\u5173\u4e13\u5bb6\u5e76\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u4ee5\u63d0\u4f9b\u66f4\u51c6\u786e\u3001\u66f4\u5177\u4e0a\u4e0b\u6587\u7684\u76f8\u5173\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u6df7\u5408\u81ea\u9002\u5e94\u68c0\u7d22\u65b9\u6cd5\uff0c\u7528\u4e8e\u5224\u65ad\u68c0\u7d22\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u5e73\u8861\u6027\u80fd\u589e\u76ca\u4e0e\u63a8\u7406\u901f\u5ea6\u4e4b\u95f4\u7684\u6743\u8861\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u57fa\u4e8eLlama2-7B\u7684Open-RAG\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\uff0c\u76f8\u8f83\u4e8eChatGPT\u3001Self-RAG\u548cCommand R+\u7b49\u6700\u5148\u8fdb\u7684LLM\u548cRAG\u6a21\u578b\uff0c\u8868\u73b0\u51fa\u66f4\u4f18\u7684\u8868\u73b0\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6a21\u578b\u5f00\u6e90\u5728https://openragmoe.github.io/\u3002|\n", "2410.01769": "|**2024-10-02**|**Quantifying Generalization Complexity for Large Language Models**|Zhenting Qi 
et.al.|[2410.01769](http://arxiv.org/abs/2410.01769)|**[link](https://github.com/zhentingqi/scylla)**|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u7406\u89e3\u590d\u6742\u67e5\u8be2\u548c\u6267\u884c\u9ad8\u7ea7\u4efb\u52a1\u7684\u975e\u51e1\u80fd\u529b\u7684\u540c\u65f6\uff0c\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u5f80\u5f80\u4e0e\u8bb0\u5fc6\u6df1\u5ea6\u4ea4\u7ec7\u5728\u4e00\u8d77\uff0c\u8fd9\u8981\u6c42\u6211\u4eec\u8fdb\u884c\u66f4\u7cbe\u786e\u7684\u8bc4\u4f30\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86Scylla\uff0c\u8fd9\u662f\u4e00\u4e2a\u52a8\u6001\u8bc4\u4f30\u6846\u67b6\uff0c\u5b9a\u91cf\u8861\u91cfLLMs\u7684\u6cdb\u5316\u80fd\u529b\u3002Scylla\u901a\u8fc7\u5728\u5206\u5e03\u5185\uff08ID\uff09\u548c\u5206\u5e03\u5916\uff08OOD\uff09\u6570\u636e\u4e0a\u8bc4\u4f30\u6a21\u578b\u6027\u80fd\u6765\u5206\u79bb\u6cdb\u5316\u4e0e\u8bb0\u5fc6\uff0c\u6d89\u53ca20\u4e2a\u4efb\u52a1\uff0c\u8986\u76d65\u4e2a\u590d\u6742\u5ea6\u7ea7\u522b\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u4efb\u52a1\u590d\u6742\u5ea6\u4e0eID\u548cOOD\u6570\u636e\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u4e4b\u95f4\u975e\u5355\u8c03\u7684\u5173\u7cfb\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u6cdb\u5316\u5c71\u8c37\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u8fd9\u4e00\u73b0\u8c61\u63ed\u793a\u4e86\u4e00\u4e2a\u5173\u952e\u9608\u503c\u2014\u2014\u79f0\u4e3a\u5173\u952e\u590d\u6742\u6027\u2014\u2014\u5728\u8be5\u9608\u503c\u5904\uff0c\u975e\u6cdb\u5316\u884c\u4e3a\u7684\u4f9d\u8d56\u8fbe\u5230\u5cf0\u503c\uff0c\u8868\u660e\u4e86LLMs\u6cdb\u5316\u80fd\u529b\u7684\u4e0a\u9650\u3002\u968f\u7740\u6a21\u578b\u5927\u5c0f\u7684\u589e\u52a0\uff0c\u5173\u952e\u590d\u6742\u6027\u5411\u66f4\u9ad8\u5c42\u6b21\u7684\u4efb\u52a1\u590d\u6742\u5ea6\u79fb\u52a8\uff0c\u8868\u660e\u66f4\u5927\u7684\u6a21\u578b\u53ef\u4ee5\u5728\u4f9d\u8d56\u4e8e\u8bb0\u5fc6\u4e4b\u524d\u5904\u7406\u66f4\u590d\u6742\
u7684\u63a8\u7406\u4efb\u52a1\u3002\u5229\u7528Scylla\u548c\u5173\u952e\u590d\u6742\u6027\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u5bf9\u5305\u62ec\u5f00\u6e90\u6a21\u578b\u5982LLaMA\u548cQwen\u5bb6\u65cf\u3001\u4ee5\u53ca\u95ed\u6e90\u6a21\u578b\u5982Claude\u548cGPT\u5728\u5185\u768428\u4e2aLLMs\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\uff0c\u63d0\u4f9b\u4e86\u66f4\u7a33\u5065\u7684\u8bc4\u4f30\uff0c\u5e76\u5bf9LLMs\u7684\u6cdb\u5316\u80fd\u529b\u6709\u4e86\u66f4\u6e05\u6670\u7684\u7406\u89e3\u3002|\n", "2410.01744": "|**2024-10-02**|**LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks**|Mengzhao Jia et.al.|[2410.01744](http://arxiv.org/abs/2410.01744)|**[link](https://github.com/jill0001/leopard)**|\u6587\u672c\u4e30\u5bcc\u7684\u56fe\u50cf\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u666e\u904d\u5b58\u5728\uff0c\u5982\u5e7b\u706f\u7247\u6f14\u793a\u3001\u626b\u63cf\u6587\u6863\u548c\u7f51\u9875\u5feb\u7167\u7b49\uff0c\u5176\u4e2d\u6587\u672c\u4f5c\u4e3a\u6838\u5fc3\u89c6\u89c9\u5143\u7d20\u5f15\u5bfc\u6574\u4f53\u7406\u89e3\u3002\u591a\u56fe\u50cf\u6587\u672c\u4e30\u5bcc\u7684\u4efb\u52a1\u5c24\u5176\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u4e0d\u4ec5\u9700\u8981\u7406\u89e3\u5355\u4e2a\u56fe\u50cf\u7684\u5185\u5bb9\uff0c\u8fd8\u9700\u8981\u5728\u591a\u4e2a\u89c6\u89c9\u8f93\u5165\u4e4b\u95f4\u63a8\u7406\u5173\u7cfb\u548c\u903b\u8f91\u6d41\u7a0b\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u573a\u666f\u7684\u91cd\u8981\u6027\uff0c\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u5904\u7406\u6b64\u7c7b\u4efb\u52a1\u65f6\u9047\u5230\u4e24\u4e2a\u5173\u952e\u6311\u6218\uff1a\uff081\uff09\u7f3a\u4e4f\u9002\u5408\u4e8e\u591a\u56fe\u50cf\u6587\u672c\u4e30\u5bcc\u573a\u666f\u7684\u9ad8\u8d28\u91cf\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff1b\uff082\uff09\u96be\u4ee5\u5e73\u8861\u56fe\u50cf\u5206\u8fa8\u7387\u4e0e\u89c6\u89c9\u7279\u5f81\u5e8f\u5217\u957f\u5ea6\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u621
8\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LEOPARD\uff0c\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u6d89\u53ca\u591a\u6587\u672c\u4e30\u5bcc\u56fe\u50cf\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u7684MLLM\u3002\u9996\u5148\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u7ea6\u4e00\u767e\u4e07\u6761\u9488\u5bf9\u591a\u6587\u672c\u4e30\u5bcc\u3001\u591a\u56fe\u50cf\u573a\u666f\u7684\u9ad8\u8d28\u91cf\u591a\u6a21\u6001\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u9002\u5e94\u6027\u7684\u9ad8\u5206\u8fa8\u7387\u591a\u56fe\u50cf\u7f16\u7801\u6a21\u5757\uff0c\u6839\u636e\u8f93\u5165\u56fe\u50cf\u7684\u539f\u59cb\u7eb5\u6a2a\u6bd4\u548c\u5206\u8fa8\u7387\u52a8\u6001\u4f18\u5316\u89c6\u89c9\u5e8f\u5217\u957f\u5ea6\u7684\u5206\u914d\u3002\u5728\u4e00\u7cfb\u5217\u5e7f\u6cdb\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u591a\u6587\u672c\u4e30\u5bcc\u3001\u591a\u56fe\u50cf\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u4f18\u8d8a\u7684\u80fd\u529b\uff0c\u5e76\u5728\u901a\u7528\u9886\u57df\u8bc4\u4f30\u4e2d\u5c55\u73b0\u51fa\u7ade\u4e89\u529b\u3002|\n", "2410.01738": "|**2024-10-02**|**VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models**|Kailai Feng 
et.al.|[2410.01738](http://arxiv.org/abs/2410.01738)|**[link](https://github.com/carlofkl/vitaglyph)**|**\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u53cc\u5206\u652f\u3001\u65e0\u9700\u8bad\u7ec3\u7684\u65b0\u578b\u827a\u672f\u5b57\u4f53\u751f\u6210\u65b9\u6cd5\u2014\u2014VitaGlyph\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u7075\u6d3b\u5730\u8868\u8fbe\u8f93\u5165\u5b57\u7b26\u7684\u6838\u5fc3\u6982\u5ff5\u4ee5\u53ca\u4e30\u5bcc\u76f8\u5173\u7684\u80cc\u666f\u4fe1\u606f\uff0c\u5b9e\u73b0\u827a\u672f\u5b57\u4f53\u4e0e\u53ef\u63a7\u5236\u7684\u51e0\u4f55\u53d8\u5316\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u4ece\u800c\u4fdd\u6301\u5b57\u4f53\u7684\u53ef\u8bfb\u6027\u3002VitaGlyph\u7684\u6838\u5fc3\u7406\u5ff5\u662f\u5c06\u8f93\u5165\u5b57\u7b26\u89c6\u4e3a\u7531\u4e3b\u4f53\u548c\u5468\u56f4\u73af\u5883\u7ec4\u6210\u7684\u573a\u666f\uff0c\u5e76\u5728\u4e0d\u540c\u51e0\u4f55\u53d8\u6362\u7a0b\u5ea6\u4e0b\u8fdb\u884c\u6e32\u67d3\u3002 \u5177\u4f53\u6765\u8bf4\uff0cVitaGlyph\u901a\u8fc7\u4ee5\u4e0b\u4e09\u4e2a\u9636\u6bb5\u6846\u67b6\u5b9e\u73b0\u5176\u529f\u80fd\uff1a(i) \u77e5\u8bc6\u83b7\u53d6\u9636\u6bb5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bbe\u8ba1\u4e3b\u4f53\u548c\u5468\u56f4\u73af\u5883\u7684\u6587\u672c\u63cf\u8ff0\uff1b(ii) \u533a\u57df\u5206\u89e3\u9636\u6bb5\u8bc6\u522b\u6700\u5339\u914d\u4e3b\u4f53\u63cf\u8ff0\u7684\u90e8\u5206\uff0c\u5e76\u5c06\u8f93\u5165\u7684\u5b57\u7b26\u56fe\u50cf\u5206\u4e3a\u4e3b\u4f53\u548c\u5468\u56f4\u533a\u57df\uff1b(iii) \u5b57\u4f53\u98ce\u683c\u5316\u9636\u6bb5\u9996\u5148\u901a\u8fc7\u8bed\u4e49\u5b57\u4f53\u4f18\u5316\u4e3b\u4f53\u533a\u57df\u7684\u7ed3\u6784\uff0c\u7136\u540e\u5206\u522b\u4f7f\u7528\u53ef\u63a7\u7ec4\u5408\u751f\u6210\u6280\u672f\u6e32\u67d3\u4e3b\u4f53\u548c\u5468\u56f4\u533a\u57df\u7684\u7eb9\u7406\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cVitaGlyph\u4e0d\u4ec5\u5728\u827a\u672f\u6027\u548c\u53ef\u8bfb\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u80fd\u591f\u63cf\u7ed8\u591a\u79cd\u5b9a\u5236\u6982\u5ff5\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u5bcc\u6709\u521b\u610f\u548c\u6109\u60a6\u7684\u827a\u672f\u5b57\u4f53\u751f\u6210\u3002\u9879\u76ee\u4ee3\u7801\u5c06\u5728https://github.com/Carlofkl/VitaGlyph\u516c\u5f00\u63d0\u4f9b\u3002**|\n", "2410.02761": "|**2024-10-03**|**FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models**|Zhipei Xu et.al.|[2410.02761](http://arxiv.org/abs/2410.02761)|**[link](https://github.com/zhipeixu/fakeshield)**|\u751f\u6210\u5f0fAI\u7684\u5feb\u901f\u53d1\u5c55\u72b9\u5982\u4e00\u628a\u53cc\u5203\u5251\uff0c\u65e2\u4fc3\u8fdb\u4e86\u5185\u5bb9\u521b\u4f5c\uff0c\u4e5f\u4f7f\u5f97\u56fe\u50cf\u7be1\u6539\u66f4\u52a0\u5bb9\u6613\u4e14\u66f4\u96be\u68c0\u6d4b\u3002\u5f53\u524d\u7684\u56fe\u50cf\u4f2a\u9020\u68c0\u6d4b\u4e0e\u5b9a\u4f4d\uff08IFDL\uff09\u65b9\u6cd5\u867d\u7136\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u6709\u6548\uff0c\u4f46\u4ecd\u7136\u9762\u4e34\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a1\uff09\u9ed1\u76d2\u6027\u8d28\uff0c\u5373\u65e0\u6cd5\u77e5\u6653\u5176\u68c0\u6d4b\u539f\u7406\uff1b2\uff09\u5bf9\u4e0d\u540c\u4f2a\u9020\u6280\u672f\uff08\u5982Photoshop\u3001DeepFake\u3001AIGC-Editing\u7b49\uff09\u7684\u6cdb\u5316\u80fd\u529b\u6709\u9650\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53ef\u89e3\u91ca\u7684IFDL\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u5177\u6709\u591a\u6a21\u6001\u80fd\u529b\u7684\u6846\u67b6\u2014\u2014FakeShield\u3002\u8be5\u6846\u67b6\u65e8\u5728\u8bc4\u4f30\u56fe\u50cf\u7684\u771f\u5b9e\u6027\uff0c\u751f\u6210\u7be1\u6539\u533a\u57df\u7684\u63a9\u6a21\uff0c\u5e76\u57fa\u4e8e\u50cf\u7d20\u7ea7\u548c\u56fe\u50cf\u7ea7\u7684\u7be1\u6539\u7ebf\u7d22\u63d0\u4f9b\u5224\u65ad\u4f9d\u636e\u3002\u6b6
4\u5916\uff0c\u6211\u4eec\u5229\u7528GPT-4o\u589e\u5f3a\u4e86\u73b0\u6709\u7684IFDL\u6570\u636e\u96c6\uff0c\u521b\u5efa\u4e86\u591a\u6a21\u6001\u7be1\u6539\u63cf\u8ff0\u6570\u636e\u96c6\uff08MMTD-Set\uff09\uff0c\u7528\u4e8e\u8bad\u7ec3FakeShield\u7684\u7be1\u6539\u5206\u6790\u80fd\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57df\u6807\u7b7e\u5f15\u5bfc\u7684\u53ef\u89e3\u91ca\u4f2a\u9020\u68c0\u6d4b\u6a21\u5757\uff08DTE-FDM\uff09\u548c\u591a\u6a21\u6001\u4f2a\u9020\u5b9a\u4f4d\u6a21\u5757\uff08MFLM\uff09\uff0c\u4ee5\u5e94\u5bf9\u5404\u79cd\u4f2a\u9020\u68c0\u6d4b\u89e3\u91ca\u548c\u5b9e\u73b0\u7531\u8be6\u7ec6\u6587\u672c\u63cf\u8ff0\u6307\u5bfc\u7684\u4f2a\u9020\u5b9a\u4f4d\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\uff0cFakeShield\u6709\u6548\u5730\u68c0\u6d4b\u548c\u5b9a\u4f4d\u4e86\u5404\u79cd\u7be1\u6539\u6280\u672f\uff0c\u63d0\u4f9b\u4e86\u6bd4\u4ee5\u5f80IFDL\u65b9\u6cd5\u66f4\u53ef\u89e3\u91ca\u4e14\u6027\u80fd\u66f4\u4f18\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2410.02757": "|**2024-10-03**|**Loong: Generating Minute-level Long Videos with Autoregressive Language Models**|Yuqing Wang 
et.al.|[2410.02757](http://arxiv.org/abs/2410.02757)|null|Generating content-rich videos that run for several minutes is challenging but promising. Autoregressive large language models (LLMs) have achieved great success at generating coherent, long token sequences in natural language processing, yet work on autoregressive LLMs for video generation has mostly been limited to short clips of a few seconds. This paper analyzes in depth the challenges that prevent autoregressive LLM-based video generators from producing long videos. Based on these observations and analysis, we propose \u201cLoong\u201d, a new autoregressive LLM-based video generator that can generate minute-long videos. Specifically, we unify text tokens and video tokens into a single sequence that an autoregressive LLM can model, and train the model from scratch. We propose progressive short-to-long training with a loss re-weighting scheme to mitigate the loss imbalance problem in long-video training. We also study inference strategies, including video token re-encoding and sampling strategies, to reduce error accumulation during inference.
The proposed \u201cLoong\u201d can be trained on 10-second videos and extended to generate minute-level long videos from text prompts, as the results demonstrate. More samples are available at https://epiphqny.github.io/Loong-video .|\n", "2410.02755": "|**2024-10-03**|**SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost**|Jifan Zhang et.al.|[2410.02755](http://arxiv.org/abs/2410.02755)|null|This paper proposes SIEVE, a lightweight alternative that matches GPT-4o accuracy at a small fraction of the cost of GPT-4o filtering calls. At its core, SIEVE integrates GPT-4o with a lightweight T5 model and uses active learning to fine-tune T5 with a small number of GPT-4o calls. Once trained, SIEVE performs on par with GPT-4o at far lower cost (only 1% of the cost of existing techniques). We experiment on the OpenWebText dataset and validate the effectiveness and efficiency of SIEVE on five highly customized filtering tasks targeting high-quality and domain-specific content. 
Further validation shows that SIEVE and GPT-4o reach similar accuracy, and that human evaluators prefer the filtering results of SIEVE over those of GPT-4o.|\n", "2410.02749": "|**2024-10-03**|**Training Language Models on Synthetic Edit Sequences Improves Code Synthesis**|Ulyana Piterbarg et.al.|[2410.02749](http://arxiv.org/abs/2410.02749)|**[link](https://github.com/upiterbarg/lintseq)**|This paper develops LintSeq, a synthetic data generation algorithm. It refactors existing code into sequences of code edits by using a linter to procedurally sample sequences of insertions that introduce no errors. The edit sequences are output as consecutive program diffs. 
To test LintSeq, we use it to reformat a dataset of instruction + program pairs into instruction + program-diff-sequence pairs. We then instruction-finetune a series of smaller language models, ranging from 2.6B to 14B parameters, on both the original and the reformatted versions of the dataset, and compare zero-shot performance on code synthesis benchmarks. The results show that, under repeated sampling, models finetuned on code diffs produce more diverse programs than the baselines. This yields better inference-time scaling of benchmark coverage as a function of the number of attempts k, that is, pass@k, the probability that a problem is solved by at least one of k attempts. For example, on HumanEval pass@50, small models finetuned on synthetic code edit sequences are competitive with GPT-4 and outperform models finetuned on the baseline dataset by 20% (\u00b13%) in absolute score. 
Finally, we also pretrain our own small models for code understanding. The results show that finetuning small models on synthetic code edits reaches state-of-the-art code synthesis performance among on-device-class models. Our 150M-parameter edit sequence model matches or outperforms code models with twice as many parameters, with or without repeated sampling, including Codex and AlphaCode.|\n", "2410.02748": "|**2024-10-03**|**CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation**|Han He et.al.|[2410.02748](http://arxiv.org/abs/2410.02748)|null|This paper explores enhancing summarization prompts with salient information extracted from the source document. We show that adding keyphrases to the prompt improves ROUGE F1 and recall, making generated summaries more similar to the reference summaries and more complete. The number of keyphrases controls the trade-off between precision and recall. Further analysis shows that incorporating phrase-level salient information is superior to word-level or sentence-level information. However, the effect on hallucination is not positive across all large language models (LLMs). To conduct this analysis, we introduce Keyphrase Signal Extractor (CriSPO), a lightweight model that can be finetuned to extract salient keyphrases. Using CriSPO, we achieve consistent ROUGE improvements across datasets and across open-source and proprietary LLMs without any LLM customization. Our findings offer insight into leveraging salient information when building prompt-based summarization systems.|\n", "2410.02746": "|**2024-10-03**|**Contrastive Localized Language-Image Pre-Training**|Hong-You Chen 
et.al.|[2410.02746](http://arxiv.org/abs/2410.02746)|null|Contrastive Language-Image Pre-training (CLIP) has been a successful vision-language foundation model whose focus is optimizing the vision encoder by aligning web text annotations at the image level. However, this strategy can be insufficient for downstream tasks that need fine-grained visual representations, especially when multimodal large language models (MLLMs) require region-level understanding. This paper proposes Contrastive Localized Language-Image Pre-training (CLOC), which improves the localization capability of CLIP by complementing it with a region-text contrastive loss and modules. We introduce a new concept, promptable embeddings, in which the encoder produces image embeddings that are easy to transform into region representations given spatial hints. To support large-scale pre-training, we design a visually enriched and spatially localized captioning framework that effectively generates region-text pseudo-labels at scale. By scaling up to billions of annotated images, CLOC enables high-quality regional embeddings for image region recognition and retrieval tasks, and can serve as a drop-in replacement for CLIP to enhance MLLMs, especially on referring and context-understanding tasks.|\n", "2410.02744": "|**2024-10-03**|**Neutral residues: revisiting adapters for model extension**|Franck Signe Talla et.al.|[2410.02744](http://arxiv.org/abs/2410.02744)|null|We address a new problem: how to extend a pretrained large language model to a domain not seen during training, for example adding a language for which the original model saw little or no training data. Popular solutions such as fine-tuning or low-rank adaptation succeed at domain adaptation, but they do not actually add capacity and they degrade performance on the original domain. This paper analyzes the problem from three angles, data, architecture, and training procedure, which are advantageously considered jointly. In particular, we improve adapters so that they can learn an entirely new language while keeping the output of the network on the original domain almost unchanged. To this end, we modify the new residual blocks so that each one outputs near-zero values on the original domain. 
This solution, called \u201cneutral residues\u201d, borrows components from mixture-of-experts architectures and is effective: with only 20% additional learnable weights relative to an original model trained on English, it achieves a significantly better trade-off between learning a new language and not forgetting English than concurrent approaches (fine-tuning, low-rank adaptation, or vanilla adapters).|\n", "2410.02743": "|**2024-10-03**|**MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions**|Yekun Chai et.al.|[2410.02743](http://arxiv.org/abs/2410.02743)|null|Reinforcement learning from human feedback (RLHF) has proven effective at aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it hard for the model to identify which actions led to a successful outcome; this hinders learning efficiency and slows convergence. In this paper we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions (sequences of tokens or higher-level language constructs) into the learning process. By operating at a higher level of abstraction, our approach shortens the temporal distance between actions and rewards, enabling faster and more accurate credit assignment. This yields more stable policy gradient estimates and better learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate the approach with extensive experiments across model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. It achieves performance gains of up to 30% on text summarization and code generation, 18% on dialogue, and 8% on question answering. Notably, it reaches the performance of standard RLHF 1.7x to 2x faster in training time and continues to outperform it with further training. We will make our code and data publicly available at https://github.com/ernie-research/MA-RLHF .|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu 
et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Although large language models (LLMs) have succeeded across a wide range of applications, they often struggle with basic physical reasoning and with executing robotics tasks, mainly because they lack direct experience of the physical details of the real world. To address this, we propose Grounding Large language model with Imperfect world MOdel (GLIMO), which uses proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-based automatic data generator for creating high-quality, diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module that reflects on prior experience. Comprehensive experiments show that our approach significantly improves strong open-source LLMs such as LLaMA-3, with performance gains of 2.04x, 1.54x, and 1.82x on three different benchmarks. Its performance matches or exceeds that of larger counterparts such as GPT-4.|\n", "2410.02741": "|**2024-10-03**|**Salient 
Information Prompting to Steer Content in Prompt-based Abstractive Summarization**|Lei Xu et.al.|[2410.02741](http://arxiv.org/abs/2410.02741)|null|This paper explores augmenting generation prompts with salient information extracted from the source document to improve summarization by large language models (LLMs). We find that adding keyphrases to the prompt improves ROUGE F1 and recall, making generated summaries more similar to the reference summaries and more complete. Adjusting the number of keyphrases controls the trade-off between precision and recall. Further analysis shows that incorporating phrase-level salient information into the prompt outperforms word-level or sentence-level strategies. However, the benefit is not universal across LLMs, particularly with respect to reducing hallucination. To conduct this analysis, we introduce Keyphrase Signal Extractor (SigExt), a lightweight model that can be finetuned to extract keyphrases. Using SigExt, we obtain ROUGE improvements across multiple datasets and across open-weight and proprietary LLMs without any LLM customization. Our findings offer insight into leveraging salient information when building prompt-based summarization systems.|\n", "2410.03663": "|**2024-10-04**|**Enhance Reasoning by 
Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models**|Zhuochun Li et.al.|[2410.03663](http://arxiv.org/abs/2410.03663)|null|This paper introduces \u201cMistake-Aware Peer-Review Distillation\u201d (MAPD), a novel approach that aims to improve small open-source models by refining the knowledge distillation (KD) process, which typically relies on large commercial language models as teachers. Unlike prior work that trains only on gold rationales generated by a single teacher, MAPD takes a more fine-grained approach: 1. **Personalized mistake feedback**: Rather than only asking the teacher for the correct rationale, MAPD has the teacher identify the mistakes of the student and explain why they are wrong, producing customized instruction data. 2. 
**Simulated peer review**: By designing a simulated peer-review process among teachers, MAPD keeps only the generated rationales that pass an acceptance threshold. This mechanism reduces the likelihood that a teacher gives a flawed rationale by guessing, which improves the quality of the instruction data. Comprehensive experiments and analysis on mathematical, commonsense, and logical reasoning tasks validate the effectiveness of MAPD.|\n", "2410.03658": "|**2024-10-04**|**RAFT: Realistic Attacks to Fool Text Detectors**|James Wang et.al.|[2410.03658](http://arxiv.org/abs/2410.03658)|**[link](https://github.com/jameslwang/raft)**|This paper presents RAFT, a grammar-error-free black-box attack against existing large language model detectors. Unlike previous attacks on language models, RAFT exploits the transferability of LLM embeddings at the word level while preserving the quality of the original text. Using an auxiliary embedding, RAFT greedily selects the target words to perturb against a specific detector. Experiments show that RAFT compromises all studied detectors across domains by up to 99% and transfers across source models. A manual human evaluation shows that attack examples generated by RAFT are realistic and hard to distinguish from original human-written text. We also show that examples generated by RAFT can be used to train more robust detectors. Our work shows that current LLM detectors are not robust and underscores the urgent need for more resilient detection mechanisms.|\n", "2410.03642": "|**2024-10-04**|**Aligning LLMs with Individual Preferences via Interaction**|Shujin Wu et.al.|[2410.03642](http://arxiv.org/abs/2410.03642)|**[link](https://github.com/shujinwu-0814/aloe)**|As large language models (LLMs) demonstrate increasingly advanced capabilities, aligning their behavior with human values and preferences becomes crucial for wide adoption. Prior work has focused mainly on general principles such as helpfulness, harmlessness, and honesty, overlooking the need to account for individual and diverse preferences, which can undermine personalized human experiences. To fill this gap, we train LLMs that can \u201cinteract to align\u201d, that is, we cultivate a meta-skill in which the LLM implicitly infers the unspoken personalized preferences of the current user and dynamically adapts its subsequent behavior and responses to them. Our approach builds a diverse pool of 3,310 distinct user personas, created from initial seed examples and expanded through iterative self-generation and filtering. Guided by these personas, we use multi-LLM collaboration to develop a tree-structured multi-turn preference dataset containing 3K+ multi-turn conversations. Finally, we apply supervised fine-tuning and reinforcement learning with this dataset to enhance the LLMs. For evaluation, we establish the ALOE (ALign With CustOmized PrEferences) benchmark, consisting of 100 carefully selected examples and suitable metrics for measuring personalized alignment performance in conversations. Experiments show that our method is effective at enabling dynamic, personalized alignment through interaction.|\n", "2410.03613": "|**2024-10-04**|**Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation**|Jie Xiao 
et.al.|[2410.03613](http://arxiv.org/abs/2410.03613)|null|As large language models (LLMs) become increasingly integrated into many aspects of work and daily life, concern for user privacy is driving a trend toward local deployment of these models. A number of lightweight LLMs (for example, Gemini Nano and LLAMA2 7B) can run locally on smartphones, giving users greater control over their personal data. Because this is a rapidly emerging application, we focus on their performance on commercial mobile devices. To fully understand the current state of LLM deployment on mobile platforms, we conduct a comprehensive measurement study. We evaluate metrics that affect user experience, including token throughput, latency, and battery consumption, as well as factors critical to developers, such as resource utilization, dynamic voltage and frequency scaling strategies, and inference engines. We also analyze in detail how hardware capabilities and system dynamics affect on-device LLM performance, which may help developers identify and address bottlenecks in mobile LLM applications. We further provide a comprehensive comparison of mobile system-on-chips (SoCs) from major vendors, highlighting their performance differences on LLM workloads. We hope this study offers insights for both on-device LLM development and the design of future mobile system architectures.|\n", "2410.03608": "|**2024-10-04**|**TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation**|Jonathan Cook et.al.|[2410.03608](http://arxiv.org/abs/2410.03608)|null|Given the widespread adoption of large language models (LLMs), flexible and interpretable methods for evaluating their instruction-following ability are essential. Preference judgments are currently the default evaluation standard, even though they distill complex, multi-dimensional preferences into a single ranking. Moreover, because human annotation is slow and costly, LLMs are increasingly used to make these judgments, at the expense of reliability and interpretability. We therefore propose TICK (targeted, instruction-specific evaluation with checklists), a fully automated, interpretable evaluation protocol that structures evaluation around LLM-generated, instruction-specific checklists. 
First, we show that, given an instruction, LLMs can reliably produce high-quality, tailored evaluation checklists that decompose the instruction into a series of yes/no questions, each asking whether a candidate response meets a specific requirement of the instruction. We demonstrate that TICK significantly increases the frequency of exact agreement between LLM judgments and human preferences, from 46.4% to 52.2%, compared with having the LLM score outputs directly. Next, we show that STICK (Self-TICK) can improve generation quality on multiple benchmarks through self-refinement and Best-of-N selection. STICK self-refinement on LiveBench reasoning tasks yields an absolute gain of +7.8%, while Best-of-N selection with STICK attains a +6.3% absolute improvement on the real-world instruction dataset WildBench. This suggests that structured, multi-faceted self-improvement is a promising direction for further advancing LLM capabilities. Finally, by providing LLM-generated checklists to human evaluators who directly score LLM responses to WildBench instructions, we notably increase inter-annotator agreement (from 0.194 to 0.256).|\n", "2410.03600": "|**2024-10-04**|**Efficiently 
Identifying Watermarked Segments in Mixed-Source Texts**|Xuandong Zhao et.al.|[2410.03600](http://arxiv.org/abs/2410.03600)|null|Text watermarks for large language models (LLMs) are increasingly used to detect synthetic text and to mitigate misuse such as fake news and academic dishonesty. Existing watermark detection techniques focus mainly on classifying an entire document as watermarked or not, but often overlook the common scenario of identifying individual watermarked segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework that determines whether a long text contains a watermarked segment. Second, we introduce an adaptive online learning algorithm that pinpoints the locations of watermarked segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy and significantly outperforms baseline methods. Moreover, our framework can be adapted to other watermarking techniques, offering new insights for precise watermark detection.|\n", "2410.03595": 
"|**2024-10-04**|**Understanding Reasoning in Chain-of-Thought from the Hopfieldian View**|Lijie Hu et.al.|[2410.03595](http://arxiv.org/abs/2410.03595)|null|Large language models have shown remarkable abilities across a wide range of tasks, and Chain-of-Thought (CoT) prompting has emerged as a key technique for improving reasoning. However, existing work focuses mainly on improving performance and lacks a comprehensive framework that explains the underlying factors behind the success of CoT. To bridge this gap, we introduce a new perspective grounded in the Hopfieldian view of cognition from cognitive neuroscience. We establish a framework that connects CoT reasoning to key cognitive elements such as stimuli, actions, neural populations, and representation spaces. From this view, the reasoning process can be understood as movement between these representation spaces. Building on this insight, we develop a method for localizing reasoning errors in CoT responses. Furthermore, we propose the Representation-of-Thought (RoT) framework, which leverages the robustness of low-dimensional representation spaces to make CoT reasoning more robust. Experiments show that RoT improves the robustness and interpretability of CoT reasoning and offers fine-grained control over the reasoning process.|\n", "2410.03577": "|**2024-10-04**|**Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models**|Xin Zou et.al.|[2410.03577](http://arxiv.org/abs/2410.03577)|**[link](https://github.com/1zhou-Wang/MemVR)**|Despite their impressive performance, multimodal large language models (MLLMs) are prone to hallucination, especially when key details are absent from the visual input, in which case they fabricate content with exaggerated confidence. To address this challenge, we follow a common step in human cognition: when the memory of key details in a scene fades, it is intuitive to look at them a second time to seek accurate, factual information. We therefore introduce Memory-space Visual Retracing (MemVR), a new hallucination mitigation paradigm that requires no external knowledge retrieval or additional fine-tuning\u300
2\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u63d0\u793a\u4f5c\u4e3a\u8865\u5145\u8bc1\u636e\uff0c\u901a\u8fc7\u524d\u9988\u7f51\u7edc\uff08FFN\uff09\u6ce8\u5165\u5230MLLMs\u4e2d\u4f5c\u4e3a\u952e\u503c\u8bb0\u5fc6\uff0c\u5f53\u6a21\u578b\u5bf9\u95ee\u9898\u76f8\u5173\u7684\u89c6\u89c9\u8bb0\u5fc6\u4e0d\u786e\u5b9a\u751a\u81f3\u9057\u5fd8\u65f6\u3002\u5168\u9762\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0cMemVR\u5728\u5404\u79cdMLLMs\u4e0a\u663e\u8457\u7f13\u89e3\u4e86\u5e7b\u89c9\u95ee\u9898\uff0c\u5e76\u4e14\u5728\u4e0d\u589e\u52a0\u65f6\u95f4\u5f00\u9500\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u901a\u7528\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4ece\u800c\u7a81\u663e\u51fa\u5176\u5e7f\u6cdb\u9002\u7528\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.03568": "|**2024-10-04**|**Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs)**|Abrar Rahman et.al.|[2410.03568](http://arxiv.org/abs/2410.03568)|null|\u672c\u6587\u5bf9\u5f53\u524d\u9876\u7ea7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u91c7\u7528\u7684\u5206\u8bcd\u6280\u672f\u8fdb\u884c\u4e86\u5168\u9762\u7814\u7a76\uff0c\u5e76\u63a2\u8ba8\u4e86\u8fd9\u4e9b\u6280\u672f\u5728\u4e0d\u540c\u8bed\u8a00\u5c24\u5176\u662f\u8d44\u6e90\u532e\u4e4f\u8bed\u8a00\u670d\u52a1\u6210\u672c\u4e0e\u53ef\u7528\u6027\u65b9\u9762\u7684\u6f5c\u5728\u5f71\u54cd\u3002\u7814\u7a76\u8003\u8651\u4e86\u591a\u79cdLLMs\uff0c\u5305\u62ec\u4f7f\u7528cl100k_base\u5d4c\u5165\u7684GPT-4\u3001\u4f7f\u7528p50k_base\u5d4c\u5165\u7684GPT-3\u4ee5\u53ca\u4f7f\u7528r50k_base\u5d4c\u5165\u7684DaVinci\uff0c\u540c\u65f6\u5bf9\u6bd4\u4e86\u5e7f\u6cdb\u4f7f\u7528\u7684BERT\u57fa\u7840\u5206\u8bcd\u5668\u3002\u7814\u7a76\u5206\u6790\u4e86\u8fd9\u4e9b\u6a21\u578b\u4e4b\u95f4\u7684\u5206\u8bcd\u5dee\u5f02\uff0c\u5e76\u6df1\u5165\u63a2\u7a76\u4e86\u5b50\u8bcd\u5206\u8bcd\u5728\u8bed\u8a00\u8868\u793a\u4e0a\u7684\u6311\u6218\u3002 
\u7814\u7a76\u5f3a\u8c03\u4e86\u57f9\u517b\u8bed\u8a00\u610f\u8bc6\u5f00\u53d1\u5b9e\u8df5\u7684\u91cd\u8981\u6027\uff0c\u7279\u522b\u662f\u9488\u5bf9\u90a3\u4e9b\u4f20\u7edf\u4e0a\u8d44\u6e90\u4e0d\u8db3\u7684\u8bed\u8a00\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u5206\u8bcd\u9009\u62e9\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5728\u7535\u5b50\u5065\u5eb7\u8bb0\u5f55\uff08EHR\uff09\u7cfb\u7edf\u4e2d\u7684\u5e94\u7528\u3002\u7814\u7a76\u65e8\u5728\u4fc3\u8fdbAI\u670d\u52a1\u9886\u57df\uff0c\u7279\u522b\u662f\u8de8\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u901a\u7528\u5316\u56fd\u9645\u5316\uff08I18N\uff09\u5b9e\u8df5\uff0c\u7279\u522b\u5173\u6ce8\u88ab\u73b0\u6709AI\u5e94\u7528\u4e25\u91cd\u5ffd\u89c6\u7684\u8bed\u8a00\u7684\u5305\u5bb9\u6027\u53d1\u5c55\u3002|\n", "2410.03553": "|**2024-10-04**|**Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding**|Wei Wu et.al.|[2410.03553](http://arxiv.org/abs/2410.03553)|null|\u86cb\u767d\u8d28\u4f5c\u4e3a\u751f\u7269\u5206\u5b50\u7684\u6838\u5fc3\uff0c\u5728\u751f\u7269\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u5305\u62ec\u4ee3\u8c22\u53cd\u5e94\u548cDNA\u590d\u5236\u3002\u51c6\u786e\u9884\u6d4b\u5b83\u4eec\u7684\u6027\u8d28\u548c\u529f\u80fd\u5bf9\u751f\u7269\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u6700\u8fd1\u5f00\u53d1\u7684\u86cb\u767d\u8d28\u8bed\u8a00\u6a21\u578b\uff08pLMs\uff09\u901a\u8fc7\u76d1\u7763\u5fae\u8c03\u63d0\u4f9b\u4e86\u89e3\u51b3\u95ee\u9898\u7684\u6709\u5e0c\u671b\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5fae\u8c03\u7684\u6a21\u578b\u4ec5\u9488\u5bf9\u7279\u5b9a\u4e0b\u6e38\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u5b9a\u5236\uff0c\u5b9e\u73b0\u901a\u7528\u7684\u86cb\u767d\u8d28\u7406\u89e3\u4ecd\u7136\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u589e\u5f3a\u7684\u86cb\u767d\u8d28\u6307\u4ee4\u8c03\u8c10\
uff08SEPIT\uff09\u6846\u67b6\u6765\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728pLMs\u4e2d\u96c6\u6210\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u7ed3\u6784\u611f\u77e5\u6a21\u5757\uff0c\u4ee5\u63d0\u4f9b\u6709\u5173\u7ed3\u6784\u7684\u77e5\u8bc6\uff0c\u5e76\u5c06\u8fd9\u4e9b\u589e\u5f3a\u7684pLMs\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fde\u63a5\u8d77\u6765\uff0c\u4ee5\u751f\u6210\u86cb\u767d\u8d28\u7684\u7406\u89e3\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u6307\u4ee4\u8c03\u8c10\u7ba1\u9053\uff0c\u9996\u5148\u901a\u8fc7\u57fa\u4e8e\u56fe\u6807\u7684\u6307\u4ee4\u5efa\u7acb\u86cb\u767d\u8d28\u7684\u57fa\u672c\u7406\u89e3\uff0c\u7136\u540e\u4f7f\u7528\u4e13\u5bb6\u6df7\u5408\uff08MoEs\uff09\u5b66\u4e60\u66f4\u590d\u6742\u5c5e\u6027\u548c\u529f\u80fd\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u6fc0\u6d3b\u53c2\u6570\u7684\u6570\u91cf\u76f8\u540c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684\u6700\u5168\u9762\u7684\u86cb\u767d\u8d28\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u8fd9\u4f7f\u6211\u4eec\u80fd\u591f\u8bad\u7ec3\u548c\u8bc4\u4f30\u901a\u7528\u7684\u86cb\u767d\u8d28\u7406\u89e3\u6a21\u578b\u3002\u5e7f\u6cdb\u7684\u7ecf\u9a8c\u7ed3\u679c\u5728\u5f00\u653e\u5f0f\u751f\u6210\u548c\u5c01\u95ed\u96c6\u5408\u7b54\u6848\u4efb\u52a1\u4e0a\u663e\u793a\u4e86SEPIT\u76f8\u5bf9\u4e8e\u95ed\u6e90\u901a\u7528LLM\u548c\u4f7f\u7528\u86cb\u767d\u8d28\u77e5\u8bc6\u8bad\u7ec3\u7684\u5f00\u6e90LLM\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2410.05269": "|**2024-10-07**|**Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models**|Fei Wang 
et.al.|[2410.05269](http://arxiv.org/abs/2410.05269)|null|Data is a crucial element for large language models (LLMs). Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from uneven quality, with some aspects underrepresented or missing and with low-quality datapoints. To address these problems, we propose Data Advisor, an enhanced LLM-based data generation method that takes the characteristics of the target dataset into account. Starting from a set of pre-defined principles, Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and guides the next iteration of data generation accordingly. Data Advisor can be easily integrated into existing data generation methods to improve data quality and coverage. Experiments on the safety alignment of three representative LLMs (Mistral, Llama2, and Falcon) demonstrate that Data Advisor effectively improves model safety against a variety of fine-grained safety issues without sacrificing model utility.|\n", "2410.05265": "|**2024-10-07**|**PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs**|Mengzhao Chen 
et.al.|[2410.05265](http://arxiv.org/abs/2410.05265)|**[link](https://github.com/chenmnz/prefixquant)**|**Quantization is essential for deploying large language models (LLMs), as it improves memory efficiency and inference speed. Existing activation quantization methods mainly address channel-wise outliers and often neglect token-wise outliers, which leads to reliance on costly per-token dynamic quantization. This paper introduces PrefixQuant, a novel technique that identifies high-frequency outlier tokens offline, without retraining, and prefixes them in the KV cache, which prevents outlier tokens from being generated during inference and simplifies quantization. To our knowledge, PrefixQuant is the first method to enable efficient per-tensor static quantization that outperforms expensive per-token dynamic quantization. For example, on W4A4KV4 (4-bit weights, 4-bit activations, 4-bit KV cache) Llama-3-8B, PrefixQuant with per-tensor static quantization reaches a WikiText2 perplexity of 7.43 and an average accuracy of 71.08% on 5 common-sense reasoning tasks, outperforming the previous per-token dynamic quantization method QuaRot by 0.98 perplexity 
and 5.98 accuracy points. In addition, models quantized with PrefixQuant run inference 1.60x to 2.81x faster than FP16 models and 1.2x to 1.3x faster than QuaRot models. Our code is available at https://github.com/ChenMnZ/PrefixQuant.**|\n", "2410.05262": "|**2024-10-07**|**TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles**|Qingchen Yu et.al.|[2410.05262](http://arxiv.org/abs/2410.05262)|**[link](https://github.com/mazzzystar/TurtleBench)**|**As the applications of large language models (LLMs) expand, the demand for reliable evaluation grows. Existing LLM evaluation benchmarks rely mainly on static datasets, which makes it difficult to assess model performance in dynamic interactions with users. Moreover, these benchmarks often depend on specific background knowledge, which complicates the measurement of logical reasoning ability. Other dynamic evaluation methods, based on strong models or manual effort, may introduce bias and incur high cost and time demands, hindering large-scale application. To address these issues, we propose TurtleBench. TurtleBench collects real user guesses from the online Turtle Soup Puzzle platform that we developed. 
This approach yields relatively dynamic evaluation datasets, which reduces the risk of model cheating and aligns the evaluation more closely with the reasoning needs of real users, thereby improving reliability. TurtleBench contains 1,532 user guesses annotated for correctness. Using this dataset, we thoroughly evaluate nine of the most advanced LLMs available today. Notably, the OpenAI o1 series models do not achieve leading results in these evaluations. We propose several hypotheses for further research, such as \u201cthe latent reasoning of o1 uses trivial Chain-of-Thought (CoT) techniques\u201d and \u201cincreasing CoT length not only provides reasoning benefits but also incurs noise costs\u201d.**|\n", "2410.05258": "|**2024-10-07**|**Differential Transformer**|Tianzhu Ye et.al.|[2410.05258](http://arxiv.org/abs/2410.05258)|**[link](https://github.com/microsoft/unilm/blob/master/Diff-Transformer/)**|In this paper, we introduce the Differential Transformer (Diff Transformer), 
which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism computes attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise and promotes the emergence of sparse attention patterns. Experimental results on language modeling show that Diff Transformer outperforms the standard Transformer when scaling both model size and the amount of training data. More intriguingly, it offers notable advantages in practical applications such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. By paying less attention to irrelevant context, Diff Transformer mitigates hallucination in question answering and text summarization. For in-context learning, Diff Transformer not only improves accuracy but is also more robust to order permutation, which has been regarded as a chronic robustness issue. These results position Diff Transformer as a highly effective and promising 
architecture for advancing large language models.|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**Large language models (LLMs) show significant potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach efficient and fair outcomes? What is the role of natural language in strategic interaction? How do the characteristics of the economic environment influence these dynamics? These questions are crucial for the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. While the machine learning community has been exploring the potential of LLMs in multi-agent setups, differences in assumptions, design choices, and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a framework for standardizing research on two-player, sequential, language-based games. 
Inspired by the economic literature, we define three base families of games with consistent parameterization, degrees of freedom, and economic measures for evaluating agent performance (self-gain) and game outcomes (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and use it to collect a large dataset of LLM vs. LLM interactions along with an additional dataset of human vs. LLM interactions. Through extensive experimentation, we demonstrate how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents with that of human players in various economic contexts; (ii) evaluate agent performance at both the individual and collective level; and (iii) quantify the effect of the economic characteristics of the environment on agent behavior.**|\n", "2410.05252": "|**2024-10-07**|**Causal Micro-Narratives**|Mourad Heddaya 
et.al.|[2410.05252](http://arxiv.org/abs/2410.05252)|null|We present a novel approach to classifying causal micro-narratives in text. These narratives are sentence-level accounts that explain the causes and effects of a target subject. The approach requires only a subject-specific ontology of causes and effects, and we demonstrate it with an application to inflation narratives. Training on a human-annotated dataset that spans historical and contemporary US news articles, we evaluate several large language models (LLMs) on this multi-label classification task. The best-performing model, a fine-tuned Llama 3.1 8B, achieves F1 scores of 0.87 on narrative detection and 0.71 on narrative classification. Comprehensive error analysis reveals the challenges posed by semantic ambiguity and shows that model errors often mirror disagreements among human annotators. This work establishes a framework for extracting causal micro-narratives from real-world data, with wide-ranging applications in social science research.|\n", "2410.05248": "|**2024-10-07**|**SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe**|Yuxin Xiao 
et.al.|[2410.05248](http://arxiv.org/abs/2410.05248)|null|To induce desired behaviors in large language models (LLMs) for interaction-driven tasks, the instruction-tuning stage typically trains LLMs on instruction-response pairs using the next-token prediction (NTP) loss. Previous work aiming to improve instruction-tuning performance often emphasizes building higher-quality supervised fine-tuning (SFT) datasets, which typically requires expensive data filtering or labor-intensive human annotation. However, these approaches do not fully exploit the intrinsic properties of the datasets, resulting in high computational and labor costs that limit scalability and performance gains. This paper proposes SFTMix, a novel recipe that goes beyond the conventional NTP paradigm and improves instruction-tuning performance without the need for well-curated SFT datasets. 
Observing that LLMs exhibit uneven confidence across the semantic representation space, we argue that examples with different confidence levels should play distinct roles during instruction tuning. Based on this insight, SFTMix leverages training dynamics to identify examples with varying confidence levels, then applies a Mixup-based regularization to mitigate overfitting on high-confidence examples while propagating supervision signals to improve learning on relatively low-confidence ones. This approach enables SFTMix to significantly outperform NTP across a wide range of instruction-following and healthcare domain-specific SFT tasks, demonstrating its adaptability to diverse LLM families and its scalability to datasets of any size. Comprehensive ablation studies further verify the robustness of the design choices in SFTMix, underscoring its ability to consistently improve performance across different LLMs and datasets in broader natural language processing applications.|\n", "2410.05243": "|**2024-10-07**|**Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents**|Boyu Gou 
et.al.|[2410.05243](http://arxiv.org/abs/2410.05243)|**[link](https://github.com/OSU-NLP-Group/UGround)**|Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across platforms. However, the effectiveness of these agents hinges on the robustness of their grounding capability. Current GUI agents rely predominantly on text-based representations such as HTML or accessibility trees, which, despite their utility, often introduce noise, incompleteness, and increased computational overhead. We advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and perform pixel-level operations directly on the GUI. The key is visual grounding models that can accurately map diverse referring expressions of GUI elements to their coordinates on the GUI across different platforms. We show that a simple recipe, consisting of web-based synthetic data and a slight adaptation of the LLaVA architecture, is surprisingly effective for training such visual grounding models. 
We collect the largest dataset for GUI visual grounding to date, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents. Empirical results on six benchmarks spanning three categories (grounding, offline agent, and online agent) show that 1) UGround substantially outperforms existing visual grounding models for GUI agents, by up to 20% absolute, and 2) agents with UGround outperform state-of-the-art agents, even though existing agents use additional text-based input while ours relies only on visual perception. These results provide strong support for the feasibility and promise of GUI agents that navigate the digital world as humans do.|\n", "2410.05229": "|**2024-10-07**|**GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models**|Iman Mirzadeh 
et.al.|[2410.05229](http://arxiv.org/abs/2410.05229)|null|Recent advances in large language models (LLMs) have sparked interest in their mathematical reasoning capabilities, particularly on grade-school-level questions. The GSM8K benchmark is widely used to assess models in this area. Although LLM performance on GSM8K has improved significantly in recent years, it remains unclear whether their mathematical reasoning has genuinely advanced, which calls the reliability of existing metrics into question. To address these concerns, we conduct a large-scale study covering current state-of-the-art open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark that generates a diverse set of questions from symbolic templates. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. 
Our findings reveal that LLMs exhibit noticeable variance when responding to different versions of the same question. Specifically, the performance of all models declines when only the numerical values in the question are changed in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance deteriorates significantly as the number of clauses in a question increases. We hypothesize that this is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause does not contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of the capabilities and limitations of LLMs in mathematical reasoning.|\n", "2410.05224": "|**2024-10-07**|**Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates**|Avanika Narayan 
et.al.|[2410.05224](http://arxiv.org/abs/2410.05224)|null|This paper introduces Cookbook, a framework that programmatically generates training data consisting mainly of simple patterns over random tokens. The approach is scalable and cost-effective, and it avoids the legal and privacy issues associated with human- or LLM-generated data. First, Cookbook uses templates, that is, data-generating Python functions, to produce training data that encourages a model to learn explicit rules matching a specific task. Fine-tuning on Cookbook-generated data improves performance on the corresponding task by up to 52.7 accuracy points. Second, because instruction datasets improve performance on several downstream tasks at once, the Cookbook algorithm automatically learns how to mix data from different templates to optimize performance across multiple tasks. On the standard multi-task GPT4ALL evaluation suite, Mistral-7B fine-tuned on a Cookbook-generated dataset achieves the best average accuracy and the best result on three of the tasks. Finally, the paper analyzes when and why Cookbook improves performance and proposes a metric to verify that the improvement is largely explained by the model's generations adhering more closely to the template rules.|\n", "2410.07176": "|**2024-10-09**|**Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models**|Fei Wang et.al.|[2410.07176](http://arxiv.org/abs/2410.07176)|null|Through a joint analysis of how imperfect retrieval affects the behavior of retrieval-augmented generation (RAG) and of the potential conflicts between an LLM's internal knowledge and external sources, we find that imperfect retrieval augmentation may be unavoidable and can seriously harm RAG systems. Controlled analysis under realistic conditions shows that knowledge conflicts between the LLM's internal knowledge and the imperfect knowledge obtained from retrieval are a key bottleneck to overcome in the post-retrieval stage of RAG. 
To make LLMs robust to imperfect retrieval, we propose Astute RAG, a novel RAG approach that adaptively elicits essential information from the LLM's internal knowledge, consolidates internal and external knowledge with source awareness, and finalizes the answer according to information reliability. Experiments with Gemini and Claude demonstrate the effectiveness of Astute RAG, which significantly outperforms existing methods for improving RAG robustness. Notably, Astute RAG is the only approach that matches or exceeds the performance of LLMs without RAG under worst-case scenarios. Further analysis shows that Astute RAG effectively resolves knowledge conflicts, improving the reliability and trustworthiness of RAG systems.|\n", "2410.07173": "|**2024-10-09**|**Do better language models have crisper vision?**|Jona Ruthardt 
et.al.|[2410.07173](http://arxiv.org/abs/2410.07173)|null|This paper examines how well text-only large language models (LLMs) understand the visual world. As LLMs are increasingly used in computer vision, this question becomes both fundamental and pertinent. Existing studies have focused mainly on limited scenarios, such as generating visual content or clustering multimodal data. We therefore propose the Visual Text Representation Benchmark (ViTeRB) to isolate the key properties that make language models well aligned with the visual world. Based on its results, we identify large decoder-based LLMs as ideal candidates for representing text in vision-centric contexts, in contrast to the current practice of using text encoders. 
Building on these findings, we propose ShareLock, an ultra-lightweight CLIP-like model. By leveraging precomputed frozen features from strong vision and language models, ShareLock achieves 51% accuracy on ImageNet while using only 563,000 image-caption pairs. Training requires just 1 GPU hour (or 10 hours including feature precomputation), orders of magnitude less than prior methods. Code will be released.|\n", "2410.07167": "|**2024-10-09**|**Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate**|Qidong Huang et.al.|[2410.07167](http://arxiv.org/abs/2410.07167)|**[link](https://github.com/shikiw/modality-integration-rate)**|**We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric for indicating the multimodal pre-training quality of large vision-language models (LVLMs). Large-scale pre-training plays a key role in building capable LVLMs, yet how to evaluate training quality before the costly supervised fine-tuning stage remains underexplored. For large language models (LLMs), common pre-training metrics include loss, perplexity, and in-context evaluation results, but we observe that these metrics are poorly indicative when a well-trained LLM is aligned with a new modality. The lack of a suitable metric greatly hinders research on the critical pre-training stage of LVLMs, including training data selection and efficient module design. This paper proposes evaluating pre-training quality from the perspective of inter-modal distribution distance and introduces MIR, which is 1) effective at representing pre-training quality and positively correlated with benchmark performance after supervised fine-tuning; 2) robust to different training and evaluation data; and 3) generalizable across training configurations and architecture choices. We conduct a series of pre-training experiments to explore the effectiveness of MIR and observe satisfactory results: MIR is indicative of training data selection, training strategy scheduling, and model architecture design for obtaining better pre-training results. We hope MIR can serve as a useful metric for building capable LVLMs and inspire further research on modality alignment in different areas. Our code is available at https://github.com/shikiw/Modality-Integration-Rate**|\n", "2410.07166": 
"|**2024-10-09**|**Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making**|Manling Li et.al.|[2410.07166](http://arxiv.org/abs/2410.07166)|**[link](https://github.com/embodied-agent-interface/embodied-agent-interface)**|**We aim to systematically evaluate large language models (LLMs) for embodied decision making. Although a large body of work has used LLMs for decision making in embodied environments, we still lack a comprehensive understanding of their performance. Existing work typically applies LLMs in different domains, for different purposes, and with different inputs and outputs, which makes unified evaluation difficult. Existing evaluations also tend to rely solely on a final success rate, making it hard to pinpoint which abilities LLMs lack and where the problems lie, which in turn prevents embodied agents from using LLMs effectively and selectively. To address this, we propose a generalized interface (Embodied Agent Interface) that supports a unified formalization of various task types and of the input-output specifications of LLM-based modules. Specifically, it allows us to: 1. unify a broad set of embodied decision-making tasks involving both state and temporally extended goals; 2. unify four commonly used LLM-based modules for decision making: goal interpretation, subgoal decomposition, action sequencing, and transition modeling; and 3. provide a collection of fine-grained metrics that break evaluation down into error types such as hallucination errors, affordance errors, and various types of planning errors. Overall, our benchmark offers a comprehensive assessment of LLMs on different subtasks, pinpointing the strengths and weaknesses of LLM-powered embodied AI systems and providing insights into the effective and selective use of LLMs in embodied decision making.**|\n", "2410.07163": "|**2024-10-09**|**Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning**|Chongyu Fan et.al.|[2410.07163](http://arxiv.org/abs/2410.07163)|**[link](https://github.com/OPTML-Group/Unlearn-Simple)**|This work addresses the problem of large language model (LLM) unlearning: removing unwanted data influences and the associated model capabilities (for example, copyrighted data or harmful content generation) while preserving essential model utility, without retraining from scratch. Despite the growing need for LLM unlearning, a principled optimization framework is still lacking. 
To this end, we revisit the state-of-the-art approach, negative preference optimization (NPO), and identify the issue of reference model bias, which can undermine NPO's effectiveness, particularly when unlearning data of varying difficulty. We therefore propose SimNPO, a simple yet effective unlearning optimization framework, showing that reducing the reliance on a reference model (through the lens of simple preference optimization) benefits unlearning. We also provide deeper insight into SimNPO's advantages, supported by an analysis using mixtures of Markov chains. Extensive experiments on benchmarks such as TOFU and MUSE confirm SimNPO's superiority over existing unlearning baselines and demonstrate its robustness against relearning attacks. Code is available at https://github.com/OPTML-Group/Unlearn-Simple|\n", "2410.07155": "|**2024-10-09**|**Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis**|Bohan Zeng 
et.al.|[2410.07155](http://arxiv.org/abs/2410.07155)|**[link](https://github.com/yangling0818/trans4d)**|**Recent advances in diffusion models have demonstrated exceptional capabilities in image and video generation, further improving the effectiveness of 4D synthesis. Existing 4D generation methods can produce high-quality 4D objects or scenes from user-friendly conditions, benefiting the gaming and video industries. However, these methods struggle to synthesize complex 4D transitions and the significant object deformations involved in interactions within a scene. To address this, we propose Trans4D, a novel text-to-4D synthesis framework that enables realistic, complex scene-level transitions. Specifically, we first use multimodal large language models (MLLMs) to produce a physics-aware scene description for 4D scene initialization and effective transition timing planning. We then propose a geometry-aware 4D transition network that realizes complex scene-level 4D transitions based on the plan, involving expressive geometric object deformation. 
Extensive experiments show that Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate, high-quality transitions, validating its effectiveness. Code: https://github.com/YangLing0818/Trans4D**|\n", "2410.07129": "|**2024-10-09**|**Mental Disorders Detection in the Era of Large Language Models**|Gleb Kuzmin et.al.|[2410.07129](http://arxiv.org/abs/2410.07129)|null|This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five datasets are considered, each differing in format and in the method used to define the target pathology class. We test AutoML models based on linguistic features, several variants of encoder-based Transformers such as BERT, and state-of-the-art LLMs as pathology classification models. The results show that LLMs perform especially well on noisy, small datasets whose training examples vary significantly in text length and genre. However, when trained on texts from individuals with clinically confirmed depression, psycholinguistic features and encoder-based models can achieve performance comparable to language models, which highlights their potential in targeted clinical applications.|\n", "2410.07113": "|**2024-10-09**|**Personalized Visual Instruction Tuning**|Renjie Pi 
et.al.|[2410.07113](http://arxiv.org/abs/2410.07113)|**[link](https://github.com/sterzhang/pvit)**|Recent advances in multimodal large language models (MLLMs) have shown significant progress, yet these models have a notable limitation that we call 'face blindness': they can hold general conversations but cannot conduct personalized dialogues about specific individuals. This deficiency hinders the use of MLLMs in personalized settings such as tailored visual assistants on mobile devices or household robots that need to recognize family members. This paper introduces Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework that enables MLLMs to identify target individuals within an image and engage in personalized, coherent dialogues. Our approach involves a sophisticated pipeline that autonomously generates training data containing personalized conversations, drawing on the capabilities of various visual experts, image generation models, and (multimodal) large language models. To evaluate the personalization potential of MLLMs, we present a benchmark called P-Bench, which covers various question types at different levels of difficulty. Experiments show a substantial improvement in personalized performance after fine-tuning on our curated dataset.|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|As agents driven by large language models (LLMs) become increasingly autonomous and interact freely with one another, studying their interactions becomes crucial for anticipating emergent phenomena and potential risks. Inspired by the Stanford Prison Experiment, this work studies the interaction patterns of LLM agents in a multi-agent setting characterized by strict social hierarchy. We focus on two phenomena, persuasion and anti-social behavior, in simulated scenarios involving a guard and a prisoner agent who seeks to achieve a specific goal (obtaining additional yard time or escaping from prison). Using 200 experimental scenarios for a total of 2,000 machine-machine conversations across five popular LLMs, we report the following findings: 
1. Some models consistently fail in the multi-agent setup and cannot carry out a meaningful conversation. 2. For models that interact successfully, the agent's goal has a significant effect on its persuasiveness but a negligible effect on anti-social behavior. 3. Agents' personas, particularly the guard's personality, drive both the likelihood of successful persuasion by the prisoner and the emergence of anti-social behavior. 4. Even without explicitly prompting for specific personalities, anti-social behavior emerges simply from assigning the agents' roles. These results bear implications for the development of interactive LLM agents and for the debate on their societal impact.|\n", "2410.07103": "|**2024-10-09**|**Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context**|Sangwon Yu 
et.al.|[2410.07103](http://arxiv.org/abs/2410.07103)|null|In multi-hop reasoning, large language models (LLMs) must reason over multiple steps using supporting documents within a given context. LLMs often struggle to filter out irrelevant documents, and their performance is sensitive to the position of the supporting documents within the context. In this paper, we identify an additional challenge: LLM performance is also sensitive to the order in which the supporting documents are presented. We call this the misordered context problem. To address it, we propose context repetition (CoRe), a simple yet effective method that prompts the model with the context presented repeatedly so that the supporting documents appear in the optimal order. With CoRe, we improve the F1 score by up to 30% on multi-hop QA tasks and accuracy by up to 70% on a synthetic task. CoRe also helps mitigate the well-known lost-in-the-middle problem in LLMs and can be combined effectively with retrieval-based approaches that use chain-of-thought (CoT) reasoning.|\n", "2410.08202": "|**2024-10-10**|**Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous 
Visual Pre-training**|Gen Luo et.al.|[2410.08202](http://arxiv.org/abs/2410.08202)|null|The rapid progress of large language models (LLMs) has driven growing interest in extending them to multimodal tasks. Among these efforts, monolithic multimodal large language models (MLLMs), which integrate visual encoding and language decoding, have attracted wide attention. Despite their structural simplicity and ease of deployment, training a monolithic MLLM with competitive performance remains challenging. The prevailing strategy uses continued pre-training to extend a pre-trained LLM into a monolithic MLLM, which causes catastrophic forgetting and degrades performance. 
This paper aims to overcome that limitation from the perspective of delta tuning. Specifically, our core idea is to embed visual parameters into a pre-trained LLM and incrementally learn visual knowledge from massive data via delta tuning, that is, freezing the LLM while optimizing the visual parameters. Based on this principle, we present Mono-InternVL, a novel monolithic MLLM that seamlessly integrates a set of visual experts through a multimodal mixture-of-experts structure. We also propose an innovative pre-training strategy, Endogenous Visual Pre-training (EViP), to maximize the visual capability of Mono-InternVL. EViP is designed as a progressive learning process for the visual experts that aims to fully exploit visual knowledge from noisy data through to high-quality data. To validate our approach, we conduct extensive experiments on 16 benchmarks. The results confirm both the superior performance of Mono-InternVL over the current state-of-the-art monolithic MLLMs on 6 multimodal benchmarks, for example +113 points on OCRBench, and its better deployment efficiency, with first-token latency reduced by up to 67%.|\n", "2410.08197": 
"|**2024-10-10**|**From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions**|Changle Qu et.al.|[2410.08197](http://arxiv.org/abs/2410.08197)|**[link](https://github.com/quchangle1/DRAFT)**|**This paper addresses the comprehension gap that arises when large language models (LLMs) interact with external tools, a gap rooted in the inadequacies and inaccuracies of existing human-centric tool documentation. We propose DRAFT, a novel framework that dynamically refines tool documentation by analyzing the feedback and trajectories generated from LLM interactions with external tools. The method rests on an innovative trial-and-error approach with three learning phases (experience gathering, learning from experience, and documentation rewriting) that iteratively improve documentation quality. 
DRAFT further uses a diversity-promoting exploration strategy to ensure exploratory diversity and a tool-adaptive termination mechanism to prevent overfitting while improving efficiency. Experiments on multiple datasets show that iterative, feedback-based refinement with DRAFT significantly improves documentation quality, fostering a deeper understanding and more effective use of tools by LLMs. Our analysis also reveals that tool documentation refined this way shows robust cross-model generalization.**|\n", "2410.08196": "|**2024-10-10**|**MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code**|Zimu Lu 
et.al.|[2410.08196](http://arxiv.org/abs/2410.08196)|**[link](https://github.com/mathllm/mathcoder2)**|**This paper introduces a novel method for generating mathematical code accompanied by the corresponding reasoning steps, for use in continued pretraining. We first build a high-quality mathematical continued-pretraining dataset by combining math-related web data, code that uses mathematical packages, math textbooks, and synthetic data. We then construct reasoning steps by extracting LaTeX expressions, the conditions those expressions require, and their results. From the extracted information we generate code that accurately captures the mathematical reasoning process. Appending the generated code to each reasoning step yields data pairs of natural-language reasoning steps and their corresponding code. Combining this data with the original dataset produces MathCode-Pile, a high-performing mathematical pretraining corpus of 19.2B tokens. Training several popular base models on this corpus significantly improves their mathematical abilities and results in the MathCoder2 family of models. All data-processing and training code is open-sourced, making the entire data collection and training pipeline transparent and reproducible. The code is released at https://github.com/mathllm/MathCoder2.**|\n", "2410.08193": "|**2024-10-10**|**GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment**|Yuancheng Xu et.al.|[2410.08193](http://arxiv.org/abs/2410.08193)|null|Large language models (LLMs) exhibit impressive capabilities but require careful alignment with human preferences. Traditional training-time methods fine-tune LLMs on human preference datasets, but they incur significant training costs and require repeated training to handle diverse user preferences. Test-time alignment methods address this by using reward models (RMs) to guide frozen LLMs without retraining. However, existing test-time approaches rely on trajectory-level RMs, which are designed to evaluate complete responses and are therefore unsuitable for autoregressive text generation, where next-token rewards must be computed from partial responses. To address this, we introduce GenARM, a test-time alignment approach built on the Autoregressive Reward Model, a novel reward parametrization designed to predict next-token rewards for efficient and effective autoregressive generation. Theoretically, we show that this parametrization can guide frozen LLMs toward any distribution achievable by traditional RMs within the KL-regularized reinforcement learning framework. Experimental results show that GenARM significantly outperforms prior test-time alignment baselines and matches the performance of training-time methods. GenARM also enables weak-to-strong guidance, aligning larger LLMs with smaller RMs without the cost of training larger models. Furthermore, GenARM supports multi-objective alignment, allowing real-time trade-offs between preference dimensions to serve diverse user needs without retraining.|\n", "2410.08174": "|**2024-10-10**|**Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models**|Qingni Wang 
et.al.|[2410.08174](http://arxiv.org/abs/2410.08174)|null|This paper proposes TRON, a two-step framework for risk control and assessment that applies to any multimodal large language model (MLLM) supporting sampling in both open-ended and closed-ended scenarios. TRON has two main components: (1) a novel conformal score for sampling response sets of minimum size, and (2) a nonconformity score based on self-consistency theory, which controls error rates through two specified risk levels. In addition, this work is the first to investigate semantic redundancy in prediction sets in open-ended settings, which leads to a new evaluation metric for MLLMs: average set size. Through comprehensive experiments with eight MLLMs on four video question-answering (VideoQA) datasets, we show that TRON achieves the desired error rates within the bounds of the user-specified risk levels. Moreover, the deduplicated prediction sets remain adaptive while providing more efficient and stable risk assessment across different risk levels.|\n", "2410.08172": "|**2024-10-10**|**On the Evaluation of Generative Robotic Simulations**|Feng Chen 
et.al.|[2410.08172](http://arxiv.org/abs/2410.08172)|null|Because real-world data is hard to acquire, robotic simulation has become crucial for parallel training and sim-to-real transfer, which highlights the importance of scalable simulated robotic tasks. Foundation models have shown impressive capacity for autonomously generating feasible robotic tasks. However, this new paradigm also raises the challenge of adequately evaluating these autonomously generated tasks. To address this, we propose a comprehensive evaluation framework tailored to generative simulations. The framework divides evaluation into three core aspects: quality, diversity, and generalization. For single-task quality, we use large language models and vision-language models to assess the realism of the generated task and the completeness of the generated trajectories. For diversity, we measure task diversity through the text similarity of task descriptions and data diversity through the loss of a world model trained on the collected task trajectories. For task-level generalization, we assess the zero-shot generalization ability on unseen tasks of a policy trained on multiple generated tasks. Experiments on three representative task-generation pipelines show that the results from our framework are highly consistent with human evaluations, confirming the feasibility and validity of our approach. The findings show that while some methods achieve good quality and diversity metrics, no single method excels across all metrics, suggesting that more attention should go to balancing them. Our analysis further highlights a common challenge for current work: low generalization capability. Anonymous website: https://sites.google.com/view/evaltasks|\n", "2410.08164": "|**2024-10-10**|**Agent S: An Open Agentic Framework that Uses Computers Like a Human**|Saaket Agashe et.al.|[2410.08164](http://arxiv.org/abs/2410.08164)|**[link](https://github.com/simular-ai/agent-s)**|**We present Agent S, an open agentic framework that interacts autonomously with computers through a graphical user interface (GUI), aiming to transform human-computer interaction by automating complex, multi-step tasks. Agent S addresses three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces. To this end, Agent S introduces experience-augmented hierarchical planning, which combines external knowledge search with internal experience retrieval at multiple levels to enable efficient task planning and subtask execution. It also employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on multimodal large language models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S improves the success rate by 9.37% over the baseline (an 83.6% relative improvement) and achieves a new state of the art. Comprehensive analysis highlights the effectiveness of the individual components and provides insights for future improvements. Agent S also generalizes broadly to different operating systems on the newly released WindowsAgentArena benchmark. For more information about the code, see https://github.com/simular-ai/Agent-S.**|\n", "2410.08146": "|**2024-10-10**|**Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning**|Amrith Setlur 
et.al.|[2410.08146](http://arxiv.org/abs/2410.08146)|null|A promising approach for improving the reasoning of large language models is to use process reward models (PRMs). Unlike outcome reward models (ORMs), which give feedback only at the final step, PRMs give feedback at every step of a multi-step reasoning trace, which may improve credit assignment. However, collecting dense, per-step human labels does not scale, and training PRMs from automatically labeled data has so far led to limited gains. To improve a base policy by running search against a PRM or by using it as a dense reward for reinforcement learning (RL), we ask: \u201cHow should we design process rewards?\u201d Our key insight is that, to be effective, a step-level reward should measure progress: the change in the likelihood of producing a correct response before and after taking the step, which corresponds to the notion of step-level advantage in RL. Crucially, this progress should be measured under a prover policy distinct from the base policy. We theoretically characterize the set of good provers, and our results show that optimizing process rewards from such provers improves exploration during test-time search and online RL. In fact, our characterization shows that weak prover policies can substantially improve a stronger base policy, which we also observe empirically. We train process advantage verifiers (PAVs) to predict progress under such provers and show that, compared with ORMs, test-time search against PAVs is more than 8% more accurate and 1.5 to 5 times more compute-efficient. Online RL with dense rewards from PAVs is among the first to achieve a 5 to 6 times gain in sample efficiency and a gain of more than 6% in accuracy.|\n", "2410.08145": "|**2024-10-10**|**Insight Over Sight? 
Exploring the Vision-Knowledge Conflicts in Multimodal LLMs**|Xiaoyuan Liu et.al.|[2410.08145](http://arxiv.org/abs/2410.08145)|**[link](https://github.com/xyliu-cs/ConflictVIS)**|This paper examines conflicts between visual information and a model's internal commonsense knowledge in multimodal large language models (MLLMs). We find that, in certain cases, MLLMs may base their decisions on the textual query rather than the visual input, producing commonsense-level vision-knowledge conflicts. To study the problem, we design an automated evaluation pipeline, supplemented with human quality control, and build a benchmark for simulating and evaluating such conflicts. The benchmark contains 374 original images and 1,122 high-quality question-answer pairs, covering two types of conflict target and three levels of question difficulty, and serves as a tool for comprehensive model evaluation. Using this benchmark, we evaluate nine representative MLLMs and find that they rely heavily on text when visual input conflicts with commonsense knowledge. Based on this finding, we propose a new prompting strategy, Focus-on-Vision (FoV), which strengthens the model's ability to prioritize visual input when a conflict arises and thus reduces its reliance on conflicting textual information. Our analysis and the proposed strategy contribute to understanding and mitigating vision-knowledge conflicts in MLLMs. The dataset and code are also made publicly available to support further research and applications by the community.|\n", "2410.08143": "|**2024-10-10**|**DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory**|Yutong Wang et.al.|[2410.08143](http://arxiv.org/abs/2410.08143)|**[link](https://github.com/yutongwang1216/docmtagent)**|**Large language models (LLMs) have achieved considerable quality improvements in machine translation (MT). However, most current research on MT-LLMs still struggles to maintain translation consistency and accuracy when processing entire documents. This paper introduces DelTA, a document-level translation agent designed to overcome these limitations. DelTA features a multi-level memory structure that stores information across various granularities and spans, including Proper Noun Records, a Bilingual Summary, Long-Term Memory, and Short-Term Memory, which are continuously retrieved and updated by auxiliary LLM-based components. Experimental results show that DelTA significantly outperforms strong baselines in translation consistency and quality across four open- and closed-source LLMs and two representative document translation datasets, with average improvements of up to 4.58 percentage points in consistency scores and up to 3.16 points in COMET scores. DelTA uses a sentence-by-sentence translation strategy, which ensures that no sentence is omitted and offers a more memory-efficient option than mainstream methods. It also improves pronoun translation accuracy, and the agent's summary component shows promise as a tool for query-based summarization tasks. We release the code and data at https://github.com/YutongWang1216/DocMTAgent.**|\n", "2410.09040": "|**2024-10-11**|**AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation**|Zijun Wang 
et.al.|[2410.09040](http://arxiv.org/abs/2410.09040)|**[link](https://github.com/ucsc-vlaa/attngcg-attack)**|**This paper studies the vulnerability of Transformer-based large language models (LLMs) to jailbreaking attacks, focusing on the optimization-based Greedy Coordinate Gradient (GCG) strategy. We first observe a positive correlation between the effectiveness of attacks and the internal behavior of the models. For instance, attacks tend to be less effective when the model pays more attention to the system prompt, which is designed to keep the LLM safely aligned. Building on this observation, we introduce AttnGCG, an enhanced method that manipulates the model's attention scores to facilitate LLM jailbreaking. Empirically, AttnGCG delivers consistent improvements across diverse LLMs, with average gains of about 7% on the Llama-2 series and about 10% on the Gemma series. Our strategy also shows robust attack transferability to unseen harmful goals and to black-box LLMs such as GPT-3.5 and GPT-4. In addition, the attention-score visualizations are more interpretable, giving better insight into how targeted attention manipulation enables more effective jailbreaking. We release our code at https://github.com/UCSC-VLAA/AttnGCG-attack.**|\n", "2410.09039": "|**2024-10-11**|**Semi-Supervised Learning of Noisy Mixture of Experts Models**|Oh-Ran Kwon et.al.|[2410.09039](http://arxiv.org/abs/2410.09039)|null|The mixture of experts (MoE) model is a versatile framework for predictive modeling that has attracted renewed interest in the era of large language models. A collection of predictive \u201cexperts\u201d is learned jointly with a \u201cgating function\u201d that controls how much influence each expert has on a prediction. This structure allows relatively simple models to excel in complex, heterogeneous data settings. In many contemporary settings, unlabeled data are widely available while labeled data are difficult to obtain. Semi-supervised learning methods seek to exploit the unlabeled data. We propose a novel method for semi-supervised learning of MoE models. We start from a semi-supervised MoE model developed by oceanographers that makes the strong assumption that the latent clustering structure in the unlabeled data maps directly to the influence each expert should have in the supervised task. We relax this assumption, allowing a noisy connection between the two, and propose an algorithm based on least trimmed squares that succeeds even when the data are misaligned. Our theoretical analysis characterizes the conditions under which the approach yields estimators with a near-parametric rate of convergence. Simulated and real data examples demonstrate the method's efficacy.|\n", "2410.09038": "|**2024-10-11**|**SimpleStrat: Diversifying Language Model Generation with Stratification**|Justin Wong et.al.|[2410.09038](http://arxiv.org/abs/2410.09038)|null|Generating diverse responses from large language models (LLMs) is crucial for applications such as planning/search and synthetic data generation, which need distinct answers across generations. Prior approaches typically rely on increasing the temperature to increase diversity. However, contrary to popular belief, we find that this approach not only lowers the quality of individual generations as temperature rises, but also depends for its effectiveness on how closely the model's next-token probabilities match the true distribution of answers. We propose SimpleStrat, an alternative approach that uses the language model itself to partition the space into strata. At inference time, a random stratum is selected and a sample is drawn from within it. To measure diversity, we introduce CoverageQA, a dataset of underspecified questions with multiple equally plausible answers, and assess diversity by measuring the KL divergence between the output distribution and the uniform distribution over valid ground-truth answers. Because computing the probability of each response/solution is infeasible for proprietary models, we measure recall on ground-truth solutions instead. Our evaluation shows that SimpleStrat achieves 0.05 higher recall than GPT-4o and an average reduction of 0.36 in KL divergence compared with Llama 3.|\n", "2410.09037": "|**2024-10-11**|**Mentor-KD: Making Small Language Models Better Multi-step Reasoners**|Hojae Lee 
et.al.|[2410.09037](http://arxiv.org/abs/2410.09037)|**[link](https://github.com/2hojae/mentor-kd)**|**Large language models (LLMs) show remarkable performance on a variety of complex tasks by leveraging chain-of-thought (CoT) prompting. Recent studies have proposed reasoning distillation, a knowledge distillation (KD) approach that transfers the reasoning ability of LLMs to smaller models by fine-tuning language models on multi-step rationales generated by LLM teachers. However, these studies give insufficient consideration to two shortcomings of the distillation sets obtained from the LLM teacher model: low data quality and inadequate soft-label provision. This paper proposes Mentor-KD, which effectively distills the multi-step reasoning capability of LLMs into smaller language models while addressing these challenges. Specifically, we use a mentor, an intermediate-sized task-specific fine-tuned model, to augment additional CoT annotations and to provide soft labels for the student model during reasoning distillation. We conduct extensive experiments and confirm Mentor-KD's effectiveness across various models and complex reasoning tasks.**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large 
Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography is an advanced computational imaging technique widely used in X-ray and electron microscopy. It has been broadly adopted in research fields such as physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneously optimizing numerous experimental and algorithmic parameters. Traditionally, parameter selection relies on trial and error, which makes workflows inefficient and can introduce human bias. This work develops the \u201cPtychographic Experiment and Analysis Robot\u201d (PEAR), a framework that uses large language models (LLMs) to automate ptychography data analysis. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study shows that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. PEAR also supports various levels of automation and is designed to work with customizable local knowledge bases, ensuring flexibility and adaptability across different research environments.|\n", 
"2410.09013": "|**2024-10-11**|**The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals**|Xiaofeng Wu et.al.|[2410.09013](http://arxiv.org/abs/2410.09013)|null|This paper studies the potential of large language models (LLMs) and vision-language models (VLMs) to exploit the visual information in Chinese characters, in particular information about radicals, structure, strokes, and stroke counts. We build a benchmark to evaluate how well these models understand the visual elements of Chinese characters. The results show that, even when images of the characters are provided, models exhibit only a limited, partial understanding of this visual information. To elicit the models' ability to use radicals in Chinese understanding tasks, we further experiment with adding radical information to the prompts. We observe consistent improvements on part-of-speech tagging when additional information about radicals is provided, which suggests that integrating sub-character information can enhance language processing.|\n", 
"2410.09012": "|**2024-10-11**|**Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models**|Hao Li et.al.|[2410.09012](http://arxiv.org/abs/2410.09012)|**[link](https://github.com/sailresearch/fmse-blogs)**|This paper presents the first practitioner-perspective analysis of foundation models (FMs) in software engineering (SE). We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies, using an FM-based survey method to systematically label and summarize the activities and tasks discussed. We find that although code generation is the most prominent FM4SE task, FMs are also used for many other SE activities, such as code understanding, summarization, and API recommendation. Most SE4FM blog posts focus on model deployment and operation and on system architecture and orchestration. Although cloud deployment dominates, interest in compressing FMs and deploying them on edge or mobile devices is growing. We outline eight future research directions aimed at bridging the gap between theoretical findings and real-world applications. Our study not only enriches the body of knowledge on the practical application of FMs in SE, but also demonstrates the effectiveness of FMs for conducting literature surveys in technical and grey-literature domains. The dataset, results, code, and prompts are available in our online replication package: https://github.com/SAILResearch/fmse-blogs|\n", 
"2410.09008": "|**2024-10-11**|**SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights**|Ling Yang et.al.|[2410.09008](http://arxiv.org/abs/2410.09008)|**[link](https://github.com/yangling0818/supercorrect-llm)**|**Large language models (LLMs) such as GPT-4, PaLM, and LLaMA have shown significant improvements on a variety of reasoning tasks. However, smaller models such as Llama-3-8B and DeepSeekMath-Base still struggle with complex mathematical reasoning because they cannot effectively identify and correct reasoning errors. Recent reflection-based methods try to address this by enabling self-reflection and self-correction, but models still have difficulty detecting errors in their reasoning steps on their own. To overcome these limitations, we propose SuperCorrect, a novel two-stage framework that uses a large teacher model to supervise and correct both the reasoning and the reflection processes of a smaller student model. In the first stage, we extract hierarchical high-level and detailed thought templates from the teacher model to guide the student model toward more fine-grained reasoning thoughts. In the second stage, we introduce cross-model collaborative direct preference optimization (DPO) to strengthen the student model's self-correction ability by following the teacher's correction traces during training. This cross-model DPO approach teaches the student model to locate and resolve erroneous thoughts using error-driven insights from the teacher model, breaking the bottleneck of its own thoughts and acquiring new skills and knowledge for tackling challenging problems. Extensive experiments consistently demonstrate the superiority of our approach. Notably, our SuperCorrect-7B model surpasses the strong DeepSeekMath-7B by 7.8%/5.3% and Qwen2.5-Math-7B by 15.1%/6.3% on the MATH/GSM8K benchmarks, achieving new state-of-the-art performance among all 7B models. Code: https://github.com/YangLing0818/SuperCorrect-llm**|\n", "2410.09006": "|**2024-10-11**|**From Interaction 
to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts**|Zhuohao Jerry Zhang et.al.|[2410.09006](http://arxiv.org/abs/2410.09006)|null|With advances in generative AI, there has been progress toward autonomous agents that can manage everyday tasks through user interfaces (UIs). Although prior research has examined how AI agents navigate UIs and understand UI structure, the effects and consequences of agents and their autonomous actions, particularly actions that may be risky or irreversible, remain under-explored. In this work, we investigate the real-world impacts and consequences of UI actions taken by AI agents. We first develop a taxonomy of UI action impacts through a series of workshops with domain experts. We then conduct a data synthesis study to collect UI screen traces and action data that users perceive as impactful. Next, we use our impact categories to annotate both the collected data and data repurposed from existing UI navigation datasets. Our quantitative evaluation of different large language models (LLMs) and their variants shows how well these LLMs can understand and predict the impacts of UI actions an AI agent might take. Our results indicate that the taxonomy enhances the reasoning of these LLMs about the impacts of UI actions. However, we also find significant gaps in their ability to reliably classify more nuanced or complex impact categories.|\n", 
"2410.08996": "|**2024-10-11**|**Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference**|Grace Proebsting et.al.|[2410.08996](http://arxiv.org/abs/2410.08996)|null|We test whether replacing crowdsourced workers with large language models (LLMs) such as GPT-4, Llama-2, and Mistral 7b to write natural language inference (NLI) hypotheses also produces annotation biases. We recreate a portion of the Stanford NLI corpus and train hypothesis-only classifiers to determine whether the LLM-generated hypotheses contain annotation biases. On our LLM-generated NLI datasets, BERT-based hypothesis-only classifiers reach 86%-96% accuracy, indicating that these datasets contain hypothesis-only biases. We also find frequent \u201cgiveaways\u201d in LLM-generated hypotheses; for example, the phrase \u201cswimming in a pool\u201d appears in more than 10,000 contradiction hypotheses generated by GPT-4. Our analysis provides empirical evidence that well-known biases in NLI can persist in LLM-generated data.|\n", "2410.10819": "|**2024-10-14**|**DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads**|Guangxuan Xiao 
et.al.|[2410.10819](http://arxiv.org/abs/2410.10819)|**[link](https://github.com/mit-han-lab/duo-attention)**|**Deploying long-context large language models (LLMs) is essential but poses significant computational and memory challenges. Caching the Key and Value (KV) states of all attention heads consumes a large amount of memory. Existing KV cache pruning methods either damage the long-context capabilities of LLMs or offer only limited efficiency gains. In this paper, we find that only a fraction of attention heads, which we call retrieval heads, are critical for processing long contexts and require full attention over all tokens. In contrast, all other heads, which mainly attend to recent tokens and attention sinks and which we call streaming heads, do not require full attention. Based on this insight, we introduce DuoAttention, a framework that applies a full KV cache only to retrieval heads and uses a lightweight, constant-length KV cache for streaming heads, reducing both the memory and the latency of LLM decoding and pre-filling without compromising long-context abilities. DuoAttention uses an optimization-based algorithm with synthetic data to identify retrieval heads accurately. Our method reduces long-context inference memory by up to 2.55x for MHA models and 1.67x for GQA models, speeds up decoding by up to 2.18x (MHA) and 1.50x (GQA), and accelerates pre-filling by up to 1.73x (MHA) and 1.63x (GQA), with minimal accuracy loss compared to full attention. Notably, combined with quantization, DuoAttention enables Llama-3-8B to decode with a context length of 3.3 million on a single A100 GPU. Code is available at https://github.com/mit-han-lab/duo-attention**|\n", "2410.10813": "|**2024-10-14**|**LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory**|Di Wu 
et.al.|[2410.10813](http://arxiv.org/abs/2410.10813)|**[link](https://github.com/xiaowu0162/longmemeval)**|**Recent chat assistant systems driven by large language models (LLMs) have integrated memory components to track user-assistant chat histories, enabling more accurate and personalized responses. However, their long-term memory capabilities in sustained interactions remain under-explored. This paper introduces LongMemEval, a comprehensive benchmark for evaluating five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. The benchmark contains 500 carefully curated questions embedded in freely scalable user-assistant chat histories. LongMemEval poses a significant challenge to existing long-term memory systems: commercial chat assistants and long-context LLMs show a 30% drop in retaining information across sustained interactions. We then present a unified framework that breaks long-term memory design down into four design choices across the indexing, retrieval, and reading stages. Building on key experimental insights, we propose several memory designs, including session decomposition to optimize value granularity, fact-augmented key expansion to enhance the index structure, and time-aware query expansion to refine the search scope. Experiments show that these optimizations greatly improve both memory recall and downstream question answering on LongMemEval. Overall, this study provides valuable resources and guidance for advancing the long-term memory capabilities of LLM-based chat assistants, paving the way toward more personalized and reliable conversational AI.**|\n", 
"2410.10814": "|**2024-10-14**|**Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free**|Ziyue Li et.al.|[2410.10814](http://arxiv.org/abs/2410.10814)|**[link](https://github.com/tianyi-lab/moe-embedding)**|Although large language models (LLMs) excel at generation tasks, their decoder-only architecture often limits their potential as embedding models unless further representation fine-tuning is applied. Does this contradict their claim to be generalists? To answer this question, we take a closer look at Mixture-of-Experts (MoE) LLMs. Our study shows that the expert routers in MoE LLMs can serve as an off-the-shelf embedding model that performs well on a variety of embedding-focused tasks without any fine-tuning. Moreover, our extensive analysis shows that the MoE routing weights (RW) are complementary to the hidden state (HS) of LLMs, a widely used embedding. Compared with HS, we find that RW is more robust to the choice of prompts and focuses on high-level semantics. Motivated by this analysis, we propose MoEE, which combines RW and HS and outperforms either used alone. Our exploration of their combination and of prompting strategies yields several novel insights; for example, a weighted sum of RW and HS similarities outperforms the similarity computed on their concatenation. We run experiments on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results show that MoEE brings significant improvements to LLM-based embedding without further fine-tuning.|\n", "2410.10801": "|**2024-10-14**|**Mix Data or Merge Models? 
Optimizing for Diverse Multi-Task Learning**|Aakanksha et.al.|[2410.10801](http://arxiv.org/abs/2410.10801)|null|Large language models (LLMs) have been adopted worldwide for a broad range of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting that combines safety and general-purpose tasks in a multilingual context. Each language introduces unique learning challenges across tasks. We find that objective-based merging is more effective than mixing data, improving general performance and safety by up to 8% and 10%, respectively. We also find that language-based merging is highly effective: by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and a 7% reduction in harm across all languages compared with the data-mixing method using the same available data. Overall, our comprehensive study of merging approaches provides a useful framework for building strong and safe multilingual models.|\n", "2410.10798": "|**2024-10-15**|**MMAR: 
Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling**|Jian Yang et.al.|[2410.10798](http://arxiv.org/abs/2410.10798)|null|Recent advances in multi-modal large language models have driven the development of joint probabilistic models capable of both image understanding and image generation. However, we find that recent methods inevitably lose image information during understanding tasks, owing either to image discretization or to diffusion denoising steps. To address this issue, we propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework. Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss. Unlike diffusion-based approaches, we disentangle the diffusion process from the auto-regressive backbone by placing a lightweight diffusion head on top of each auto-regressed image patch embedding. As a result, when the model switches from image generation to understanding through text generation, the backbone's hidden representation of the image is not limited to the last denoising step. To train our method successfully, we also propose a theoretically proven technique that addresses the numerical stability issue, along with a training strategy that balances the generation and understanding objectives. In extensive evaluations on 18 image understanding benchmarks, MMAR outperforms other joint multi-modal models and matches the performance of methods that employ a pretrained CLIP vision encoder, while also generating high-quality images. We further show that the method scales with larger data and model sizes.|\n", 
"2410.10796": "|**2024-10-14**|**Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance**|Sachin Goyal et.al.|[2410.10796](http://arxiv.org/abs/2410.10796)|**[link](https://github.com/locuslab/context-parametric-inversion)**|**Large language models are instruction-finetuned to improve their ability to follow user instructions and process the input context. However, even state-of-the-art models often struggle to follow instructions, especially when the input context conflicts with the model's parametric knowledge. This leads to various failures, such as hallucinations in which responses are outdated, biased, or contain unverified facts. In this work, we try to understand the underlying cause of this poor context reliance, especially after instruction tuning. We observe an intriguing phenomenon: during instruction tuning, context reliance initially increases as expected but then gradually decreases as instruction finetuning progresses. We call this phenomenon context-parametric inversion and observe it across multiple general-purpose instruction tuning datasets (such as TULU, Alpaca, and Ultrachat) and model families (such as Llama, Mistral, and Pythia). In a simple theoretical setup, we isolate why context-parametric inversion occurs along the gradient descent trajectory of instruction finetuning. We tie the phenomenon to examples in the instruction finetuning data mixture where the input context provides information that is already present in the model's parametric knowledge. Our analysis suggests mitigation strategies that provide limited gains, while also validating our theoretical insights. We hope this work serves as a starting point for addressing this failure mode in a staple part of LLM training.**|\n", "2410.10779": "|**2024-10-14**|**Focused ReAct: Improving ReAct through Reiterate and Early Stop**|Shuoqiu Li 
et.al.|[2410.10779](http://arxiv.org/abs/2410.10779)|null|Large language models (LLMs) have significantly improved their reasoning and decision-making capabilities, as seen in methods such as ReAct. However, although ReAct is highly effective on complex tasks, it faces two main challenges: it tends to lose focus on the original question, and it gets stuck in action loops. To address these issues, we introduce Focused ReAct, an enhanced version of the ReAct paradigm that incorporates reiteration and early-stop mechanisms. These improvements help the model stay focused on the original question and avoid repetitive behavior. Experimental results show that, compared with the original ReAct method, Focused ReAct improves accuracy by 18% to 530% and reduces runtime by up to 34%.|\n", "2410.10762": "|**2024-10-14**|**AFlow: Automating Agentic Workflow Generation**|Jiayi Zhang 
et.al.|[2410.10762](http://arxiv.org/abs/2410.10762)|**[link](https://github.com/geekan/metagpt)**|**Large language models (LLMs) have shown remarkable potential for solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, which limits their scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, in which LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that explores this space efficiently with Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations on six benchmark datasets demonstrate AFlow's effectiveness, with an average improvement of 5.7% over state-of-the-art baselines. In addition, AFlow enables smaller models to outperform GPT-4 on specific tasks at only 4.55% of GPT-4's inference cost. The code will be available at https://github.com/geekan/MetaGPT**|\n", 
"2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**Recent studies have shown that large language models (LLMs) are vulnerable to denial-of-service (DoS) attacks, in which adversarial inputs such as spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can cause high latency and make LLM services unavailable to other users or tasks. However, when speech-to-text interfaces are involved (for example, voice commands to a robot), mounting such DoS attacks becomes challenging, because it is hard to introduce spelling errors or non-semantic prompts through speech. A simple DoS attack in this setting is to instruct the model to \u201ckeep repeating \u2018Hello\u2019\u201d, but we observe that relying on natural instructions limits the output length, which is bounded by the maximum length of the LLM's supervised fine-tuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks on LLMs, showing that injecting a single carefully designed poisoned sample can break the output length limit. For example, one poisoned sample can successfully attack GPT-4o and GPT-4o mini (through OpenAI's fine-tuning API) for less than $1, causing repeated outputs until the maximum inference length is reached (16K tokens, compared with 0.5K before poisoning). We also perform comprehensive ablation studies on open-source LLMs and extend the method to LLM agents, where attackers can control both the fine-tuning dataset and the algorithm. Our findings underscore the need for defenses against P-DoS attacks to secure LLMs. Our code is available at https://github.com/sail-sg/P-DoS**|\n", "2410.10759": "|**2024-10-14**|**SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization**|Akrit Mudvari 
et.al.|[2410.10759](http://arxiv.org/abs/2410.10759)|null|Large language models (LLMs) have become a disruptive innovation in recent years and play an important role in our daily lives because they can understand and generate human-like text. Their capabilities include natural language understanding, information retrieval and search, translation, chatbots, virtual assistants, and more. However, LLMs are well known to have very large parameter counts. In addition, the self-attention mechanism of the underlying Transformer architecture has quadratic complexity in both computation and memory with respect to the input sequence length. For these reasons, LLM inference is resource-intensive, so its throughput is limited, especially for longer sequences. In this report, we design a collaborative inference architecture between a server and its clients to alleviate the throughput limit. The design takes into account the resources available on both sides, namely computation and communication costs. We develop a dynamic-programming-based algorithm that optimally allocates computation between the server and client devices to increase server throughput without violating the service level agreement (SLA). Experiments show that we can distribute the workload efficiently, reducing the server workload by about one third while achieving a 19% improvement over a greedy method. The results show that server throughput improves in environments with different types of LLM inference requests.|\n", "2410.11841": "|**2024-10-15**|**GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation**|Fei Tang et.al.|[2410.11841](http://arxiv.org/abs/2410.11841)|null|Explainable recommendation systems based on large language models (LLM-based ER) show promise in generating human-like explanations for recommendations. However, they face challenges in modeling collaborative preferences between users and items, personalizing explanations, and handling sparse user-item interactions. To address these issues, we propose GaVaMoE, a novel Gaussian-Variational Gated Mixture of Experts framework for explainable recommendation. GaVaMoE introduces two key components: (1) a rating reconstruction module that uses a variational autoencoder (VAE) with a Gaussian mixture model (GMM) to capture complex user-item collaborative preferences and serves as a pre-trained multi-gating mechanism; and (2) a set of fine-grained expert models coupled with the multi-gating mechanism to generate highly personalized explanations. The VAE component models latent factors in user-item interactions, while the GMM clusters users with similar behavior. Each cluster corresponds to one gate in the multi-gating mechanism, which routes user-item pairs to the appropriate expert model. This architecture allows GaVaMoE to generate tailored explanations for specific types of users and preferences, and it mitigates data sparsity by exploiting similarities between users. Extensive experiments on three real-world datasets show that GaVaMoE significantly outperforms existing methods in explanation quality, personalization, and consistency. In particular, GaVaMoE performs robustly in scenarios with sparse user-item interactions and maintains high-quality explanations even for users with limited historical data.|\n", "2410.11829": "|**2024-10-15**|**MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding**|Yue Cao 
et.al.|[2410.11829](http://arxiv.org/abs/2410.11829)|**[link](https://github.com/yuecao0119/MMFuser)**|**Although multimodal large language models (MLLMs) have made significant progress in understanding complex human intentions through cross-modal interaction, capturing intricate image details remains challenging. Previous methods enhance visual detail by integrating multiple vision encoders, but this introduces redundancy and computational overhead. We observe that most MLLMs use only the last-layer feature map of the vision encoder for visual representation and ignore the rich fine-grained information in shallow feature maps. To address this, we propose MMFuser, a simple yet effective multi-layer feature fuser that efficiently integrates deep and shallow features from Vision Transformers (ViTs). Specifically, it uses the semantically aligned deep features as queries to dynamically extract missing details from the shallow features, enriching the representation with fine-grained information while preserving semantic alignment. Applied to the LLaVA-1.5 model, MMFuser yields significant improvements in visual representation and benchmark performance, offering a more flexible and lightweight solution than multi-encoder ensemble methods. The code and model have been released at https://github.com/yuecao0119/MMFuser.**|\n", "2410.11815": "|**2024-10-15**|**SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing**|Zhiyuan Zhang et.al.|[2410.11815](http://arxiv.org/abs/2410.11815)|null|A scene graph provides a structured, hierarchical representation of an image in the form of nodes and edges, which represent objects and their relationships respectively. It can serve as a natural interface for image editing and significantly improve precision and flexibility. Building on this advantage, we introduce a new framework that combines a large language model (LLM) with a text-to-image generative model for scene-graph-based image editing. This integration enables precise modifications at the object level and creative recomposition of scenes without compromising the integrity of the overall image. Our approach has two main stages: 1) Using an LLM-driven scene parser, we construct the scene graph of an image, capturing key objects and their relationships and parsing fine-grained attributes such as object masks and descriptions. These annotations facilitate concept learning, in which a fine-tuned diffusion model represents each object with an optimized token and a detailed description prompt. 2) In the image editing stage, an LLM editing controller guides the edits toward specific regions. These edits are carried out by an attention-modulated diffusion editor, which uses the fine-tuned model to perform object addition, deletion, replacement, and adjustment. Through extensive experiments, we demonstrate that our framework significantly outperforms existing image editing methods in editing precision and scene aesthetics.|\n", "2410.11805": "|**2024-10-15**|**NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models**|Han Han et.al.|[2410.11805](http://arxiv.org/abs/2410.11805)|null|Large language models (LLMs) combined with tool learning have achieved notable results in real-world applications. During tool learning, LLMs may call multiple tools in a nested order, where a later tool call may take the response of an earlier tool as an input parameter. However, research on nested tool learning abilities remains insufficient, because existing benchmarks lack relevant data instances. To address this, we introduce NesTools to fill the gap in comprehensively evaluating nested tool learning. NesTools includes a novel automatic data generation method for constructing large-scale nested tool calls with different nesting structures. After manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. NesTools can therefore serve as a new benchmark for evaluating the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analysis with NesTools; the results show that current LLMs still struggle with complex nested tool learning tasks.|\n", "2410.11802": "|**2024-10-15**|**FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting**|Zhe Li et.al.|[2410.11802](http://arxiv.org/abs/2410.11802)|null|Time series forecasting (TSF) is a key capability in many fields, including finance, weather services, and energy management. Although many TSF methods have emerged in recent years, many of them require domain-specific data collection and model training and generalize poorly to new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they show potential for inference on new or unseen data. This has prompted a surge of new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of these models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pre-trained on time series. In addition, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, enabling more comprehensive evaluation. Finally, FoundTS provides a standardized evaluation pipeline that covers dataset splitting, loading, normalization, and few-shot sampling, which makes fair evaluation possible. On this basis, we report extensive evaluations of TSF foundation models on a wide range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify the strengths, weaknesses, and inherent limitations of existing foundation models, and we identify directions for future model design. Our code and datasets are available at https://anonymous.4open.science/r/FoundTS-C2B0.|\n", "2410.11786": "|**2024-10-15**|**Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability**|Tsz Ting Chung 
et.al.|[2410.11786](http://arxiv.org/abs/2410.11786)|null|Large language models (LLMs) have demonstrated impressive performance across a wide range of natural language processing tasks, especially when using in-context learning. However, in-context learning brings additional computational and financial costs. To mitigate this, several prompt compression methods have been proposed to compress the prompts used in in-context learning. Despite their success, these methods either transfer poorly because their compression is model-specific, or rely on external training data, for example from GPT-4. In this paper, we investigate the ability of LLMs to develop a unified compression method that discretizes uninformative tokens using a self-supervised pre-training technique. By introducing a small number of parameters during continual pre-training, the proposed Selection-p produces a probability for each input token that indicates whether to keep or discard it. Experiments show that Selection-p achieves state-of-the-art performance on multiple classification tasks, reaching compression rates of up to 10x with only a marginal 0.8% drop in performance. It also transfers better across different models than prior work. Furthermore, we analyze how Selection-p helps maintain in-context learning performance in long contexts.|\n", "2410.11782": "|**2024-10-15**|**G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks**|Guibin Zhang et.al.|[2410.11782](http://arxiv.org/abs/2410.11782)|null|Recent progress in agents based on large language models (LLMs) has shown that collective intelligence can significantly surpass the capabilities of a single agent, largely thanks to carefully designed inter-agent communication topologies. Although many diverse and high-performing designs are available, practitioners are often confused when choosing the most effective pipeline for a specific task: which topology best suits my task while avoiding unnecessary communication token overhead and ensuring a high-quality solution? In response to this dilemma, we introduce G-Designer, an adaptive, efficient, and robust solution for multi-agent deployment that dynamically designs task-aware, customized communication topologies. Specifically, G-Designer models the multi-agent system as a multi-agent network, uses a variational graph auto-encoder to encode the nodes (agents) together with a task-specific virtual node, and decodes a task-adaptive, high-performing communication topology. Extensive experiments on six benchmarks show that G-Designer is: (1) high-performing, achieving 84.50% accuracy on MMLU and 89.90% pass@1 on HumanEval; (2) task-adaptive, constructing customized communication protocols according to task difficulty and reducing token consumption by up to 95.33%; and (3) adversarially robust, defending against agent adversarial attacks with only a 0.3% drop in accuracy.|\n", "2410.11781": "|**2024-10-15**|**Language Models Encode Numbers Using Digit Representations in Base 10**|Amit Arnold Levy 
et.al.|[2410.11781](http://arxiv.org/abs/2410.11781)|**[link](https://github.com/amitlevy/base10)**|Large language models (LLMs) frequently make errors even on simple numerical problems, such as comparing two small numbers. A natural hypothesis is that these errors stem from how the models represent numbers, and in particular whether they capture the actual numeric value. We observe that LLM errors on numerical tasks are often distributed across the digits of the answer rather than normally distributed around its value. Through a series of probing experiments and causal interventions, we show that LLMs internally represent numbers with a circular representation for each digit in base 10 rather than with a value representation. This digit-wise representation, as opposed to a value representation, sheds light on the error patterns of models on tasks involving numerical reasoning and can serve as a basis for future studies analyzing numerical mechanisms in LLMs.|\n", "2410.11779": "|**2024-10-15**|**MLLM can see? 
Dynamic Correction Decoding for Hallucination Mitigation**|Chenxi Wang et.al.|[2410.11779](http://arxiv.org/abs/2410.11779)|**[link](https://github.com/zjunlp/Deco)**|**Multimodal large language models (MLLMs) frequently exhibit hallucination, but the underlying causes are not yet well understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate objects in their final output, they can actually recognize the visual objects in preceding layers. We speculate that this may be because the strong knowledge priors of the language model suppress the visual information, which leads to hallucination. Motivated by this, we propose a novel dynamic correction decoding method (DeCo) that adaptively selects appropriate preceding layers and proportionally integrates their knowledge into the final layer to adjust the output logits. Notably, DeCo is model-agnostic, can be combined seamlessly with various classic decoding strategies, and can be applied to different MLLMs. We evaluate DeCo on widely used benchmarks, and the results show that it reduces hallucination rates by a large margin compared with baselines, highlighting its potential to mitigate hallucination. Code is available at https://github.com/zjunlp/DeCo.**|\n", "2410.11772": "|**2024-10-15**|**Layer-wise Importance Matters: Less Memory for 
Better Performance in Parameter-efficient Fine-tuning of Large Language Models**|Kai Yao et.al.|[2410.11772](http://arxiv.org/abs/2410.11772)|**[link](https://github.com/kaiseem/ist)**|**Parameter-efficient fine-tuning (PEFT) methods have become popular because they can significantly reduce memory and computational overhead when adapting pre-trained large language models (LLMs) to downstream tasks. However, a common limitation of most PEFT methods is that they apply a uniform architectural design across all layers, using identical trainable modules and ignoring the differing importance of each layer, which leads to sub-optimal fine-tuning results. To overcome this limitation and obtain better performance, we develop a novel approach called Importance-aware Sparse Tuning (IST), which fully exploits the inherent sparsity and selects the most important subset of full layers using effective layer-wise importance scoring. The proposed IST is a general, plug-and-play technique compatible with various layer-based PEFT methods. Using the estimated importance scores, IST dynamically updates the selected layers in the PEFT modules, which reduces memory requirements. We further provide a theoretical proof of convergence and empirical evidence of superiority over uniform update strategies to demonstrate the advantages of IST over existing methods. Extensive experiments covering a range of LLMs, PEFT methods, and downstream tasks confirm the effectiveness of the proposed method and show that IST can enhance existing layer-based PEFT methods. Our code is available at https://github.com/Kaiseem/IST.**|\n", "2410.12788": "|**2024-10-16**|**Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception**|Jihao Zhao et.al.|[2410.12788](http://arxiv.org/abs/2410.12788)|**[link](https://github.com/IAAR-Shanghai/Meta-Chunking)**|Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks a crucial aspect of its pipeline: text chunking, which affects the quality of knowledge-intensive tasks. This paper introduces the concept of Meta-Chunking, a granularity between sentences and paragraphs that consists of a set of sentences within a paragraph that share deep linguistic and logical connections. To implement Meta-Chunking, we design two LLM-based strategies: Margin Sampling Chunking and Perplexity Chunking. The former uses an LLM to make a binary decision on whether consecutive sentences should be split, based on the probability difference obtained from margin sampling. The latter precisely identifies chunk boundaries by analyzing the characteristics of the perplexity distribution. In addition, given the inherent complexity of different texts, we propose a strategy that combines Meta-Chunking with dynamic merging to strike a balance between fine-grained and coarse-grained chunking. Experiments on eleven datasets show that Meta-Chunking improves RAG-based single-hop and multi-hop question answering more effectively. For example, on the 2WikiMultihopQA dataset, it outperforms similarity chunking by 1.32 while using only 45.8% of the time. Our code is available at https://github.com/IAAR-Shanghai/Meta-Chunking.|\n", "2410.12782": "|**2024-10-16**|**In-Context Learning Enables Robot Action Prediction in LLMs**|Yida Yin 
et.al.|[2410.12782](http://arxiv.org/abs/2410.12782)|null|Recently, large language models (LLMs) have achieved remarkable success in the language domain through in-context learning (ICL). However, relatively little work has explored using the ICL capabilities of LLMs to predict robot actions directly. In this paper, we introduce RoboPrompt, a framework that enables off-the-shelf text-only LLMs to predict robot actions directly through ICL without any training. Our approach first uses heuristics to identify the keyframes of an episode, which capture its important moments. Next, we extract the end-effector actions and the estimated initial object poses from these keyframes and convert both into textual descriptions. Finally, we construct a structured template that forms ICL demonstrations from these textual descriptions and a task instruction. This allows the LLM to predict robot actions directly at test time. Through extensive experiments and analysis, RoboPrompt shows stronger performance than zero-shot and ICL baselines in both simulated and real-world environments.|\n", "2410.12774": "|**2024-10-16**|**Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information**|Yingya Li 
et.al.|[2410.12774](http://arxiv.org/abs/2410.12774)|null|The success of multi-task learning depends heavily on how tasks are grouped. Naively grouping all tasks, or a randomly selected set of tasks, can cause negative transfer, so that the multi-task model performs worse than single-task models. Although many efforts have been made to identify task groupings and to measure the relatedness of different tasks, defining a metric that picks the best grouping out of many potential task combinations remains a challenging research topic. We propose a task-relatedness metric based on task difficulty as measured by pointwise V-usable information (PVI). PVI is a recently proposed metric that estimates how much usable information a dataset contains given a model. We hypothesize that tasks whose PVI estimates are statistically indistinguishable are similar enough to benefit from joint learning. We conduct comprehensive experiments on 15 NLP datasets in the general, biomedical, and clinical domains to evaluate the feasibility of this metric for task grouping. We compare the joint learners against single-task learners, existing baseline methods, and recent large language models, including Llama 2 and GPT-4. The results show that, by grouping tasks with similar PVI estimates, the joint learners achieve competitive results with fewer total parameters and perform consistently across domains.|\n", "2410.12757": "|**2024-10-16**|**StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples**|Ajay Patel et.al.|[2410.12757](http://arxiv.org/abs/2410.12757)|null|Style representations aim to embed texts with similar writing styles close together and texts with different styles far apart, regardless of content. However, the contrastive triplets used to train these representations often vary in both style and content, which can lead to content leakage in the representations. We introduce StyleDistance, a new approach to training stronger content-independent style embeddings. We use a large language model to create a synthetic dataset of near paraphrases with controlled style variations, and generate positive and negative examples spanning 40 distinct style features for precise contrastive learning. We assess the quality of the synthetic data and the embeddings through human and automatic evaluations. StyleDistance improves the content independence of style embeddings, which generalize to real-world benchmarks and outperform leading style representations in downstream applications. Our model can be found at https://huggingface.co/StyleDistance/styledistance.|\n", "2410.12735": "|**2024-10-17**|**CREAM: Consistency Regularized Self-Rewarding Language Models**|Zhaoyang Wang et.al.|[2410.12735](http://arxiv.org/abs/2410.12735)|null|Recent self-rewarding large language models (LLMs) have successfully applied the LLM-as-a-judge approach to iteratively improve alignment performance without human-annotated preference data. These methods typically use the same LLM as both the policy model, which generates responses, and the reward model, which scores and ranks those responses. The ranked responses are then used as preference pairs to train the LLM via direct alignment techniques such as DPO. Notably, however, nothing in this process guarantees the accuracy of the rewarding and ranking, which is critical for ensuring accurate rewards and high-quality preference data. Empirical results from relatively small LLMs (e.g., 7B parameters) also indicate that, in some cases, the improvements from self-rewarding may diminish after several iterations, which we hypothesize is due to accumulated bias in the reward system. This bias can lead to unreliable preference data for training the LLM. To address this issue, we first formulate and analyze a generalized iterative preference fine-tuning framework for self-rewarding language models. We then introduce regularization into this generalized framework to mitigate overconfident preference labeling in the self-rewarding process. Based on this theoretical insight, we propose a consistency-regularized self-rewarding language model (CREAM), which uses the consistency of rewards across iterations to regularize self-rewarding training and helps the model learn from more reliable preference data. With this explicit regularization, our empirical results demonstrate the superiority of CREAM in improving both reward consistency and alignment performance. The code is publicly available at https://github.com/Raibows/CREAM.|\n", "2410.12707": "|**2024-10-16**|**FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression**|Zhenheng Tang 
et.al.|[2410.12707](http://arxiv.org/abs/2410.12707)|null|To alleviate hardware scarcity when training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed to use geo-distributed GPUs across different computing clusters or individual devices for DNN training. Decentralized training faces significant challenges in system design and efficiency, including: 1) the need for remote automatic differentiation (RAD), 2) support for flexible model definitions and heterogeneous software, 3) heterogeneous hardware leading to low resource utilization or stragglers, and 4) slow network communication. To address these challenges, in the system design we represent the model as a directed acyclic graph of operators (OP-DAG). Each node in the DAG represents an operator in the DNN, and each edge represents a data dependency between operators. Based on this design, 1) users can define any custom DNN without caring about the low-level operator implementation; 2) tasks are scheduled as finer-grained sub-tasks, which leaves more room for optimization; and 3) a DAG runtime executor can implement RAD without requiring consistent versions of the low-level machine learning framework. To improve system efficiency, we implement a workload estimator and design an OP-Fence scheduler that groups devices with similar bandwidths together and partitions the DAG to increase throughput. In addition, we propose an AdaTopK compressor that adaptively compresses intermediate activations and gradients on the slowest communication links. To evaluate the convergence and efficiency of our system and algorithms, we train ResNet-101 and GPT-2 on three real-world testbeds using 48 GPUs connected at speeds from 8 Mbps to 10 Gbps. Experimental results show that, compared with baseline methods, our system and method achieve a 1.45x to 9.39x speedup while ensuring convergence.|\n", "2410.12700": "|**2024-10-16**|**Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization**|Xingqi Wang 
et.al.|[2410.12700](http://arxiv.org/abs/2410.12700)|**[link](https://github.com/achernarwang/LiVO)**|**In recent years, diffusion models trained on large-scale data have become able to generate images that are hard to distinguish from human-level ones, yet they often produce harmful content that conflicts with human values, such as social bias and offensive content. Despite extensive research on large language models (LLMs), the alignment of text-to-image (T2I) models remains largely unexplored. To address this, we propose LiVO (Lightweight Value Optimization), a novel lightweight method for aligning T2I models with human values. LiVO optimizes only a plug-and-play value encoder that integrates a specified value principle into the input prompt, allowing control over both the semantics and the values of generated images. Specifically, we design a preference optimization loss tailored to diffusion models, which theoretically approximates the Bradley-Terry model used in LLM alignment but offers a more flexible trade-off between image quality and value conformity. To optimize the value encoder, we also develop a framework that automatically constructs a text-image preference dataset of 86k samples, each consisting of a prompt, an aligned image, a violating image, and a value principle. By leaving most model parameters untouched and selecting values adaptively from the input prompt, LiVO significantly reduces harmful outputs and converges faster, surpassing several strong baselines and taking a first step towards ethically aligned T2I models.**|\n", "2410.12686": "|**2024-10-16**|**Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2**|Mohamad Abdi et.al.|[2410.12686](http://arxiv.org/abs/2410.12686)|null|Anatomical landmarks are vital in medical imaging for navigation and anomaly detection. Modern large language models (LLMs), such as Llama-2, offer promise for mapping these landmarks from free-text radiology reports to the corresponding positions in image data. Recent studies suggest that LLMs may be able to form coherent representations of generative processes. Motivated by this, we investigate whether LLMs accurately represent the spatial positions of anatomical landmarks. Through experiments with Llama-2 models, we find that they can linearly represent anatomical landmarks in space and are considerably robust to different prompts. These results underscore the potential of LLMs to improve the efficiency and accuracy of medical imaging workflows.|\n", "2410.12656": "|**2024-10-16**|**Evaluating Morphological Compositional Generalization in Large Language Models**|Mete Ismayilzada et.al.|[2410.12656](http://arxiv.org/abs/2410.12656)|null|Large language models (LLMs) have made significant progress on a variety of natural language generation and understanding tasks. However, their linguistic generalization capabilities remain questionable, which raises doubts about whether these models learn language the way humans do. While humans show compositional ability and linguistic creativity in language use, the extent to which LLMs do the same, particularly in morphology, needs further exploration. In this work, we systematically investigate the morphological generalization abilities of LLMs through the lens of compositionality. We define morphemes as the basic units of composition and design a new suite of generative and discriminative tasks to assess morphological productivity and systematicity. Focusing on agglutinative languages such as Turkish and Finnish, we evaluate several state-of-the-art instruction-tuned multilingual models, including GPT-4 and Gemini. Our analysis shows that LLMs struggle with morphological compositional generalization, especially when it is applied to novel word roots, with performance dropping sharply as morphological complexity increases. Although the models identify individual morphological combinations better than random guessing, their performance lacks systematicity, leaving a significant accuracy gap compared with humans.|\n", "2410.12631": "|**2024-10-16**|**Explainable Moral Values: a neuro-symbolic approach to value classification**|Nicolas Lazzari et.al.|[2410.12631](http://arxiv.org/abs/2410.12631)|null|This work explores the integration of ontology-based reasoning and machine learning techniques for explainable value classification. Relying on a formalization of moral values from Moral Foundations Theory and on the DnS ontology design pattern, the sandra neuro-symbolic reasoner is used to infer the values, expressed as descriptions, that a given sentence satisfies. Sentences and their structured representations are generated automatically using an open-source large language model. The inferred descriptions are then used to automatically detect the values associated with a sentence. We show that relying on the reasoner alone yields explainable classification comparable to more complex approaches. We also show that combining the reasoner's inferences with distributional semantics methods substantially outperforms all baselines, including complex models based on neural network architectures. Finally, we build a visualization tool to explore the potential of theory-based value classification, which is publicly available at http://xmv.geomeaning.com/.|\n", "2410.13863": "|**2024-10-17**|**Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens**|Lijie Fan et.al.|[2410.13863](http://arxiv.org/abs/2410.13863)|null|In vision, scaling up autoregressive models has not proven as beneficial as it has for large language models. In this work, we investigate this scaling problem in text-to-image generation, focusing on two key factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random order or a fixed raster order using BERT-like or GPT-like transformer architectures. Our empirical results show that, while all models scale effectively in terms of validation loss, their evaluation performance, measured by FID, GenEval score, and visual quality, follows different trends. Models based on continuous tokens achieve significantly better visual quality than those using discrete tokens. Furthermore, the generation order and attention mechanism significantly affect the GenEval score: random-order models score notably better on GenEval than raster-order models. Inspired by these findings, we train Fluid, a random-order autoregressive model on continuous tokens. The Fluid 10.5B model achieves a new state-of-the-art zero-shot FID of 6.16 on MS-COCO 30K and an overall score of 0.69 on the GenEval benchmark. We hope our findings and results will encourage future efforts to further close the scaling gap between vision and language models.|\n", "2410.13861": "|**2024-10-17**|**PUMA: Empowering Unified MLLM with Multi-granular Visual Generation**|Rongyao Fang 
et.al.|[2410.13861](http://arxiv.org/abs/2410.13861)|**[link](https://github.com/rongyaofang/puma)**|**In recent years, multimodal foundation models have made significant progress in vision-language understanding. Initial attempts have also explored the potential of multimodal large language models (MLLMs) for visual content generation. However, existing work has not adequately addressed the varying granularity demands of different image generation tasks within a unified MLLM paradigm, from the diversity required in text-to-image generation to the precise controllability needed in image manipulation. In this work, we propose PUMA (Empowering Unified MLLM with Multi-granular Visual Generation). PUMA unifies multi-granular visual features as both the inputs and the outputs of MLLMs, elegantly handling image generation tasks with different granularity requirements within a single MLLM framework. After multimodal pretraining and task-specific instruction tuning, PUMA performs well across a wide range of multimodal tasks. This work marks a significant step towards a truly unified MLLM that can adapt to the granularity demands of various visual tasks. The code and model will be released at https://github.com/rongyaofang/PUMA.**|\n", "2410.13859": "|**2024-10-17**|**$\u03b3-$MoD: Exploring Mixture-of-Depth 
Adaptation for Multimodal Large Language Models**|Yaxin Luo et.al.|[2410.13859](http://arxiv.org/abs/2410.13859)|null|Despite the significant progress of multimodal large language models (MLLMs), their high computational cost remains a barrier to real-world deployment. Inspired by mixture of depths (MoD) in natural language processing, we address this limitation from the perspective of \u201cactivated tokens\u201d. Our key insight is that if most tokens are redundant for a layer's computation, they can be skipped directly via an MoD layer. However, directly converting the dense layers of MLLMs to MoD layers leads to substantial performance degradation. To address this issue, we propose an innovative MoD adaptation strategy for existing MLLMs, called $\\gamma$-MoD. In $\\gamma$-MoD, we propose a new metric to guide the deployment of MoD in an MLLM, namely the rank of attention maps (ARank). With ARank, we can effectively identify which layers are redundant and should be replaced with MoD layers. Based on ARank, we further propose two novel designs that maximize the computational sparsity of the MLLM while maintaining its performance, namely a shared vision-language router and masked routing learning. With these designs, more than 90% of the dense layers of an MLLM can be effectively converted to MoD layers. To validate our method, we apply it to three popular MLLMs and conduct extensive experiments on 9 benchmark datasets. The experimental results not only validate the significant efficiency gains of $\\gamma$-MoD for existing MLLMs but also confirm its generalization across MLLMs. For example, with only a minor performance drop of -1.5%, $\\gamma$-MoD reduces the training and inference time of LLaVA-HR by 31.0% and 53.2%, respectively.|\n", "2410.13857": "|**2024-10-17**|**How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs**|Guhao Feng 
et.al.|[2410.13857](http://arxiv.org/abs/2410.13857)|null|Despite the remarkable success of Transformer-based large language models (LLMs) across many domains, understanding and improving their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis of the mathematical abilities of LLMs, with a particular focus on their arithmetic performance. We identify numerical precision as a key factor affecting their performance on mathematical tasks. Our results show that Transformers operating with low numerical precision cannot effectively solve arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with the input length. In contrast, Transformers with standard numerical precision can handle these tasks efficiently with much smaller model sizes. We further validate these theoretical findings through experiments that explore the impact of different numerical precisions on arithmetic tasks, providing valuable insights for improving the mathematical reasoning capabilities of LLMs.|\n", "2410.13854": "|**2024-10-17**|**Can MLLMs Understand the Deep Implication Behind Chinese Images?**|Chenhao Zhang 
et.al.|[2410.13854](http://arxiv.org/abs/2410.13854)|**[link](https://github.com/MING-ZCH/CII-Bench)**|**As the capabilities of multimodal large language models (MLLMs) continue to improve, the need to evaluate their higher-order capabilities is growing. However, there is little work evaluating the higher-order perception and understanding of Chinese visual content by MLLMs. To fill this gap, we introduce the Chinese Image Implication understanding Benchmark (CII-Bench), which aims to assess the higher-order perception and understanding capabilities of MLLMs on Chinese images. Compared with existing benchmarks, CII-Bench stands out in several ways. First, to ensure the authenticity of the Chinese context, the images in CII-Bench are sourced from the Chinese Internet and manually reviewed, and the corresponding answers are also carefully crafted by hand. In addition, CII-Bench includes images that represent traditional Chinese culture, such as famous traditional Chinese paintings, which can deeply reflect a model's understanding of traditional Chinese culture. Through extensive experiments on multiple MLLMs, we make several important findings. First, there is a substantial gap between MLLMs and humans on CII-Bench: the highest MLLM accuracy is 64.4%, whereas human accuracy averages 78.2% and peaks at 81.0%. Second, MLLMs perform worse on images of traditional Chinese culture, which suggests limitations in their ability to understand high-level semantics and a lack of deep knowledge of traditional Chinese culture. Finally, most models show higher accuracy when image emotion hints are included in the prompt. We believe that CII-Bench will help MLLMs better understand Chinese semantics and Chinese-specific images, advancing progress towards expert artificial general intelligence (AGI). Our project is publicly available at https://cii-bench.github.io/.**|\n", "2410.13852": "|**2024-10-17**|**Retrospective Learning from Interactions**|Zizhao Chen 
et.al.|[2410.13852](http://arxiv.org/abs/2410.13852)|null|\u591a\u56de\u5408\u4ea4\u4e92\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u7528\u6237\u4e4b\u95f4\u7684\u4e92\u52a8\u81ea\u7136\u5305\u542b\u4e86\u9690\u542b\u7684\u53cd\u9988\u4fe1\u53f7\u3002\u5982\u679cLLMs\u4ee5\u51fa\u4e4e\u610f\u6599\u7684\u65b9\u5f0f\u56de\u5e94\u7528\u6237\u7684\u6307\u4ee4\uff0c\u7528\u6237\u5f88\u53ef\u80fd\u4f1a\u901a\u8fc7\u91cd\u65b0\u8868\u8ff0\u8bf7\u6c42\u3001\u8868\u8fbe\u632b\u8d25\u611f\u6216\u8f6c\u5411\u66ff\u4ee3\u4efb\u52a1\u6765\u4f20\u8fbe\u8fd9\u4e00\u4fe1\u53f7\u3002\u8fd9\u4e9b\u4fe1\u53f7\u4e0e\u5177\u4f53\u4efb\u52a1\u65e0\u5173\uff0c\u5e76\u4e14\u5360\u636e\u76f8\u5bf9\u53d7\u9650\u7684\u8bed\u8a00\u5b50\u7a7a\u95f4\uff0c\u5373\u4f7fLLMs\u5728\u5b9e\u9645\u4efb\u52a1\u4e0a\u5931\u8d25\u4e86\uff0c\u4e5f\u80fd\u8bc6\u522b\u8fd9\u4e9b\u4fe1\u53f7\u3002\u8fd9\u4e3aLLMs\u901a\u8fc7\u4e92\u52a8\u6301\u7eed\u5b66\u4e60\u63d0\u4f9b\u4e86\u9014\u5f84\uff0c\u800c\u65e0\u9700\u989d\u5916\u6807\u6ce8\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aReSpect\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u56de\u987e\u8fc7\u53bb\u7684\u4ea4\u4e92\u6765\u5b66\u4e60\u8fd9\u4e9b\u4fe1\u53f7\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u65b0\u7684\u591a\u6a21\u6001\u4ea4\u4e92\u573a\u666f\u4e2d\u90e8\u7f72\u4e86ReSpect\uff0c\u5728\u8be5\u573a\u666f\u4e2d\uff0c\u4eba\u7c7b\u6307\u5bfcLLMs\u89e3\u51b3\u5177\u6709\u7ec4\u5408\u89e3\u7a7a\u95f4\u7684\u62bd\u8c61\u63a8\u7406\u4efb\u52a1\u3002\u901a\u8fc7\u4e0e\u4eba\u7c7b\u8fdb\u884c\u6570\u5343\u6b21\u4ea4\u4e92\uff0c\u6211\u4eec\u5c55\u793a\u4e86ReSpect\u5982\u4f55\u9010\u6b65\u63d0\u9ad8\u4efb\u52a1\u5b8c\u6210\u7387\uff0c\u4ece31%\u63d0\u5347\u523082%\uff0c\u4e14\u65e0\u9700\u4efb\u4f55\u5916\u90e8\u6807\u6ce8\u3002|\n", "2410.13846": "|**2024-10-17**|**SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction**|Xuan Zhang 
et.al.|[2410.13846](http://arxiv.org/abs/2410.13846)|**[link](https://github.com/sail-sg/simlayerkv)**|**\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u5df2\u7ecf\u6269\u5c55\u4e86\u5b83\u4eec\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u589e\u52a0\u6a21\u578b\u5c42\u6570\u548c\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\u663e\u8457\u589e\u52a0\u4e86\u5b58\u50a8\u952e\u503c\uff08KV\uff09\u7f13\u5b58\u6240\u9700\u7684\u5185\u5b58\uff0c\u8fd9\u5bf9\u9ad8\u6548\u7684\u63a8\u7406\u6784\u6210\u4e86\u6311\u6218\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86SimLayerKV\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5728\u8bc6\u522b\u4e3a\u61d2\u5c42\u7684\u5c42\u4e2d\u9009\u62e9\u6027\u5730\u4e22\u5f03\u7f13\u5b58\u6765\u51cf\u5c11\u5c42\u95f4KV\u7f13\u5b58\u7684\u5197\u4f59\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u8fd9\u6837\u7684\u89c2\u5bdf\uff1a\u5728\u957f\u4e0a\u4e0b\u6587LLMs\u4e2d\uff0c\u67d0\u4e9b\u5c42\u8868\u73b0\u51fa\u201c\u61d2\u60f0\u201d\u884c\u4e3a\uff0c\u4e0e\u975e\u61d2\u5c42\u76f8\u6bd4\uff0c\u5bf9\u5efa\u6a21\u957f\u8ddd\u79bb\u4f9d\u8d56\u8d21\u732e\u8f83\u5c0f\u3002\u901a\u8fc7\u5206\u6790\u6ce8\u610f\u529b\u6743\u91cd\u6a21\u5f0f\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u61d2\u5c42\u5728\u7ed9\u5b9a\u8f93\u5165\u751f\u6210\u8fc7\u7a0b\u4e2d\u5bf9\u4e0d\u540ctoken\u7684\u884c\u4e3a\u662f\u4e00\u81f4\u7684\u3002\u8fd9\u4e00\u89c1\u89e3\u542f\u53d1\u4e86\u6211\u4eec\u7684SimLayerKV\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u8bc6\u522b\u61d2\u5c42\u5e76\u76f8\u5e94\u5730\u51cf\u5c11\u5176KV\u7f13\u5b58\u3002SimLayerKV\u65e0\u9700\u8bad\u7ec3\uff0c\u5177\u6709\u901a\u7528\u6027\uff0c\u5e76\u4e14\u53ef\u4ee5\u7528\u4ec5\u4e03\u884c\u4ee3\u7801\u5b9e\u73b0\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u4ee3\u8868\u6027LLMs\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u4f8b\u5982LLaMA2
-7B\u3001LLaMA3-8B\u548cMistral-7B\uff0c\u5728LongBench\u57fa\u51c6\u6d4b\u8bd5\u768416\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0cSimLayerKV\u5b9e\u73b0\u4e865\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\uff0c\u5e76\u4e14\u5728\u7ed3\u54084\u4f4d\u91cf\u5316\u65f6\u6027\u80fd\u4ec5\u4e0b\u964d1.2%\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/sail-sg/SimLayerKV\u83b7\u53d6\u3002**|\n", "2410.13835": "|**2024-10-17**|**Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs**|Tianyu Guo et.al.|[2410.13835](http://arxiv.org/abs/2410.13835)|null|\u5b9e\u8df5\u8005\u5728\u53d8\u538b\u5668\u578b\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u89c2\u5bdf\u5230\u4e86\u4e09\u4e2a\u4ee4\u4eba\u56f0\u60d1\u7684\u73b0\u8c61\uff1a\u6ce8\u610f\u529b\u6c47\u70b9\u3001\u503c\u72b6\u6001\u8017\u5c3d\u548c\u6b8b\u5dee\u72b6\u6001\u5cf0\u503c\uff0c\u8fd9\u4e9b\u73b0\u8c61\u7edf\u79f0\u4e3a\u6781\u7aef\u4ee4\u724c\u73b0\u8c61\u3002\u8fd9\u4e9b\u73b0\u8c61\u7684\u7279\u70b9\u662f\u67d0\u4e9b\u6240\u8c13\u7684\u201c\u6c47\u70b9\u4ee4\u724c\u201d\u63a5\u6536\u4e0d\u6210\u6bd4\u4f8b\u9ad8\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u8868\u73b0\u51fa\u660e\u663e\u8f83\u5c0f\u7684\u503c\u72b6\u6001\uff0c\u5e76\u4e14\u5177\u6709\u6bd4\u5176\u4ed6\u4ee4\u724c\u5927\u5f97\u591a\u7684\u6b8b\u5dee\u72b6\u6001\u8303\u6570\u3002\u8fd9\u4e9b\u6781\u7aef\u4ee4\u724c\u5728LLM\u63a8\u7406\u3001\u91cf\u5316\u548c\u53ef\u89e3\u91ca\u6027\u65b9\u9762\u5f15\u53d1\u4e86\u8bb8\u591a\u6311\u6218\u3002\u6211\u4eec\u9610\u660e\u4e86\u6781\u7aef\u4ee4\u724c\u73b0\u8c61\u80cc\u540e\u7684\u673a\u5236\u3002\u9996\u5148\uff0c\u6211\u4eec\u5728\u975e\u5e38\u7b80\u5355\u7684\u67b6\u6784\u2014\u2014\u4e00\u5230\u4e09\u5c42\u7684\u53d8\u538b\u5668\uff0c\u5728\u73a9\u5177\u6a21\u578bBigram-Backcopy\uff08BB\uff09\u4efb\u52a1\u4e0a\u8bad\u7ec3\u65f6\u5c55\u793a\u4e86\u8fd9\u4e9b\u73b0\u8c61\u3002\u5728\u8fd9\u7
9cd\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u4e00\u4e2a\u6d3b\u8dc3-\u4f11\u7720\u673a\u5236\uff0c\u5176\u4e2d\u6ce8\u610f\u529b\u5934\u5bf9\u4e8e\u7279\u5b9a\u8f93\u5165\u57df\u6210\u4e3a\u6c47\u70b9\uff0c\u800c\u5bf9\u4e8e\u5176\u4ed6\u8f93\u5165\u5219\u4e0d\u662f\u3002\u6211\u4eec\u5bf9\u8bad\u7ec3\u52a8\u6001\u7684\u7406\u8bba\u5206\u6790\u63ed\u793a\uff0c\u8fd9\u4e9b\u73b0\u8c61\u662f\u7531\u4e00\u79cd\u76f8\u4e92\u589e\u5f3a\u673a\u5236\u9a71\u52a8\u7684\u3002\u57fa\u4e8e\u8fd9\u4e9b\u89c1\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u5728\u9884\u8bad\u7ec3\u671f\u95f4\u7f13\u89e3\u6781\u7aef\u4ee4\u724c\u73b0\u8c61\u7684\u7b56\u7565\uff0c\u5305\u62ec\u7528ReLU\u66ff\u6362softmax\u4ee5\u53ca\u7528SGD\u66ff\u6362Adam\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u5206\u6790\u6269\u5c55\u5230\u9884\u8bad\u7ec3\u7684LLM\uff0c\u5305\u62ecLlama\u548cOLMo\uff0c\u663e\u793a\u8bb8\u591a\u6ce8\u610f\u529b\u5934\u8868\u73b0\u51fa\u4e0eBB\u4efb\u52a1\u4e2d\u7c7b\u4f3c\u7684\u6d3b\u8dc3-\u4f11\u7720\u673a\u5236\uff0c\u5e76\u4e14\u76f8\u4e92\u589e\u5f3a\u673a\u5236\u4e5f\u652f\u914d\u7740LLM\u9884\u8bad\u7ec3\u671f\u95f4\u6781\u7aef\u4ee4\u724c\u73b0\u8c61\u7684\u51fa\u73b0\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0c\u8bb8\u591a\u7531BB\u4efb\u52a1\u9884\u6d4b\u7684\u9759\u6001\u548c\u52a8\u6001\u6027\u8d28\u4e0e\u9884\u8bad\u7ec3LLM\u4e2d\u7684\u89c2\u5bdf\u7ed3\u679c\u4e00\u81f4\u3002|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang 
et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4ee3\u7406\u6765\u5b9e\u73b0\u81ea\u6cbb\uff0c\u53ef\u4ee5\u63d0\u5347\u4eba\u7c7b\u5728\u4e2a\u6027\u5316\u548c\u6807\u51c6\u5316\u4efb\u52a1\u4e2d\u7684\u6548\u7387\u3002\u81ea\u52a8\u5316\u7f51\u7edc\u4efb\u52a1\uff08\u5982\u5728\u9884\u7b97\u5185\u9884\u8ba2\u9152\u5e97\uff09\u7684\u9700\u6c42\u65e5\u76ca\u589e\u52a0\u3002\u6ee1\u8db3\u5b9e\u9645\u9700\u6c42\u7684\u540c\u65f6\uff0c\u7f51\u7edc\u4ee3\u7406\u4e5f\u4f5c\u4e3a\u4e00\u4e2a\u91cd\u8981\u7684\u6982\u5ff5\u9a8c\u8bc1\u793a\u4f8b\uff0c\u5c55\u793a\u4e86\u5404\u79cd\u4ee3\u7406\u63a5\u5730\u573a\u666f\u7684\u91cd\u8981\u6027\u3002\u5176\u6210\u529f\u9884\u793a\u7740\u8bb8\u591a\u672a\u6765\u5e94\u7528\u7684\u8fdb\u6b65\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u5e38\u4f1a\u624b\u5de5\u8bbe\u8ba1\u7f51\u7edc\u4ee3\u7406\u7b56\u7565\uff08\u4f8b\u5982\u63d0\u793a\u6a21\u677f\u3001\u591a\u4ee3\u7406\u7cfb\u7edf\u3001\u641c\u7d22\u65b9\u6cd5\u7b49\uff09\uff0c\u8fd9\u4e9b\u7b56\u7565\u53ef\u80fd\u65e0\u6cd5\u5728\u6240\u6709\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u5f88\u597d\u5730\u63a8\u5e7f\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u5173\u4e8e\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf/\u52a8\u4f5c\u8868\u793a\u4e0eLLM\u9884\u8bad\u7ec3\u6570\u636e\u4e4b\u95f4\u4e0d\u5339\u914d\u7684\u7814\u7a76\u975e\u5e38\u6709\u9650\u3002\u8fd9\u79cd\u5dee\u5f02\u7279\u522b\u660e\u663e\uff0c\u56e0\u4e3aLLM\u4e3b\u8981\u9488\u5bf9\u8bed\u8a00\u5b8c\u6210\u8fdb\u884c\u8bad\u7ec3\uff0c\u800c\u4e0d\u662f\u5904\u7406\u6d89\u53ca\u5177\u8eab\u5bfc\u822a\u52a8\u4f5c\u548c\u7b26\u53f7\u7f51\u7edc\u5143\u7d20\u7684\u4efb\u52a1\u3002\u6211\u4eec\u7684\u7814\u7a76\u901a\u8fc7\u7b80\u5355\u5730\u4f18\u5316LLM\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\uff0c\u4f7f\u5176\u66f4\u597d\u5730\u4e0eLLM\u7684\u80fd\u529b\u76f8\u5339\u914d\uff0c\u4ece\u800c\u63d0\u5347\u4e86\u602
7\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u6211\u4eec\u7684\u57fa\u7840\u4ee3\u7406\u5728\u5404\u79cd\u7f51\u7edc\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4ee5\u524d\u7684\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5728WebArena\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u8be5\u57fa\u51c6\u6d4b\u8bd5\u6db5\u76d6\u4e86\u901a\u7528\u7f51\u7edc\u4ea4\u4e92\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406AgentOccam\u6bd4\u4e4b\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u9ad8\u51fa9.8\u5206\uff08+29.4%\uff09\uff0c\u6bd4\u540c\u65f6\u671f\u7684\u5de5\u4f5c\u9ad8\u51fa5.9\u5206\uff08+15.8%\uff09\u3002\u76f8\u6bd4\u7c7b\u4f3c\u7684\u57fa\u672c\u7f51\u7edc\u4ee3\u7406\uff0c\u5176\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u5bf9\u9f50\u540e\u6210\u529f\u7387\u4e3a26.6\u5206\uff08+161%\uff09\u3002\u6211\u4eec\u6ca1\u6709\u4f7f\u7528\u4e0a\u4e0b\u6587\u793a\u4f8b\u3001\u65b0\u7684\u4ee3\u7406\u89d2\u8272\u3001\u5728\u7ebf\u53cd\u9988\u6216\u641c\u7d22\u7b56\u7565\u3002AgentOccam\u7684\u8bbe\u8ba1\u7b80\u5355\uff0c\u7a81\u663e\u4e86LLM\u5728\u65e0\u6837\u672c\u60c5\u51b5\u4e0b\u6267\u884c\u7f51\u7edc\u4efb\u52a1\u7684\u5f3a\u5927\u6027\u80fd\uff0c\u5e76\u5f3a\u8c03\u4e86\u7cbe\u5fc3\u8c03\u6574\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u5bf9\u4e8e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u81f3\u5173\u91cd\u8981\u3002|\n", "2410.13824": "|**2024-10-18**|**Harnessing Webpage UIs for Text-Rich Visual Understanding**|Junpeng Liu 
et.al.|[2410.13824](http://arxiv.org/abs/2410.13824)|null|\u6587\u672c\u4e30\u5bcc\u7684\u89c6\u89c9\u7406\u89e3\u2014\u2014\u5373\u5904\u7406\u5bc6\u96c6\u6587\u672c\u5185\u5bb9\u4e0e\u89c6\u89c9\u5143\u7d20\u76f8\u878d\u5408\u7684\u73af\u5883\u7684\u80fd\u529b\uff0c\u5bf9\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7ed3\u6784\u5316\u73af\u5883\u4e2d\u8fdb\u884c\u6709\u6548\u4ea4\u4e92\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u4e86\u589e\u5f3a\u8fd9\u4e00\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u4f7f\u7528\u57fa\u4e8e\u6587\u672c\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ece\u7f51\u9875\u7528\u6237\u754c\u9762\u5408\u6210\u901a\u7528\u7684\u591a\u6a21\u6001\u6307\u4ee4\u3002\u5c3d\u7ba1\u7f3a\u4e4f\u76f4\u63a5\u7684\u89c6\u89c9\u8f93\u5165\uff0c\u57fa\u4e8e\u6587\u672c\u7684LLMs\u80fd\u591f\u5904\u7406\u6765\u81ea\u7f51\u9875\u53ef\u8bbf\u95ee\u6027\u6811\u7684\u7ed3\u6784\u5316\u6587\u672c\u8868\u793a\u3002\u8fd9\u4e9b\u6307\u4ee4\u968f\u540e\u4e0eUI\u622a\u56fe\u914d\u5bf9\u4ee5\u8bad\u7ec3\u591a\u6a21\u6001\u6a21\u578b\u3002\u6211\u4eec\u5f15\u5165\u4e86MultiUI\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u6765\u81ea100\u4e07\u4e2a\u7f51\u7ad9\u7684730\u4e07\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u591a\u79cd\u591a\u6a21\u6001\u4efb\u52a1\u548cUI\u5e03\u5c40\u3002\u5728MultiUI\u4e0a\u8bad\u7ec3\u7684\u6a21\u578b\u4e0d\u4ec5\u5728\u7f51\u9875UI\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728VisualWebBench\u4e0a\u7684\u63d0\u5347\u9ad8\u8fbe48%\uff0c\u5728Mind2Web\u7684\u7f51\u9875\u4ee3\u7406\u6570\u636e\u96c6\u4e2d\u5143\u7d20\u51c6\u786e\u7387\u63d0\u9ad8\u4e8619.1%\uff0c\u800c\u4e14\u5728\u975e\u7f51\u9875UI\u4efb\u52a1\u4ee5\u53ca\u751a\u81f3\u975eUI\u9886\u57df\uff08\u5982\u6587\u6863\u7406\u89e3\u3001OCR\u548c\u56fe\u8868\u89e3\u91ca\uff09\u4e2d\u4e5f\u8868\u73b0\u51fa\u60ca\u4eba\u7684\u6cdb\u5316\u80fd\u529b\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u663e\u4e86\u7f51\u9875UI\u6570\u636e\u57
28\u63a8\u52a8\u5404\u79cd\u573a\u666f\u4e0b\u6587\u672c\u4e30\u5bcc\u89c6\u89c9\u7406\u89e3\u7684\u5e7f\u6cdb\u5e94\u7528\u6027\u3002|\n", "2410.14677": "|**2024-10-18**|**Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts**|German Gritsai et.al.|[2410.14677](http://arxiv.org/abs/2410.14677)|null|\u5feb\u901f\u53d1\u5c55\u7684\u81ea\u56de\u5f52\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u8457\u63d0\u5347\u4e86\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\uff0c\u8fd9\u4fc3\u4f7f\u4e86\u53ef\u9760\u673a\u5668\u751f\u6210\u6587\u672c\u68c0\u6d4b\u5668\u7684\u51fa\u73b0\u3002\u5927\u91cf\u68c0\u6d4b\u5668\u548c\u5305\u542b\u4eba\u5de5\u667a\u80fd\u7247\u6bb5\u7684\u6570\u636e\u96c6\u5e94\u8fd0\u800c\u751f\uff0c\u4e00\u4e9b\u68c0\u6d4b\u65b9\u6cd5\u5728\u8fd9\u4e9b\u6570\u636e\u96c6\u4e2d\u8fbe\u5230\u4e86\u9ad8\u8fbe99.9%\u7684\u76ee\u6807\u6307\u6807\u8bc6\u522b\u8d28\u91cf\u3002\u7136\u800c\uff0c\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u8fd9\u4e9b\u68c0\u6d4b\u5668\u7684\u8d28\u91cf\u5f80\u5f80\u4f1a\u5927\u5e45\u4e0b\u964d\uff0c\u8fd9\u5f15\u53d1\u4e86\u7591\u95ee\uff1a\u8fd9\u4e9b\u68c0\u6d4b\u5668\u662f\u5426\u771f\u6b63\u5177\u6709\u9ad8\u5ea6\u7684\u53ef\u9760\u6027\uff0c\u8fd8\u662f\u5176\u9ad8\u57fa\u51c6\u5206\u6570\u4ec5\u4ec5\u662f\u7531\u4e8e\u8bc4\u4f30\u6570\u636e\u96c6\u8d28\u91cf\u8f83\u5dee\u6240\u81f4\uff1f\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u9700\u8981\u5efa\u7acb\u7a33\u5065\u4e14\u9ad8\u8d28\u91cf\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u7684\u6570\u636e\uff0c\u4ee5\u9632\u6b62\u672a\u6765\u6a21\u578b\u4e2d\u7684\u504f\u5dee\u548c\u4f4e\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u5bf9\u4e13\u95e8\u7528\u4e8eAI\u751f\u6210\u5185\u5bb9\u68c0\u6d4b\u7684\u7ade\u8d5b\u4e2d\u7684\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u7cfb\u7edf\u56de\u987e\uff0c\u5e76\u63d0\u51fa\u4e86\u8bc4\u4f30\u5305\u542bAI\u751f\u6210\u7247\u6bb5\u7684\u6570\u636e\u96c6\u8d28\u91cf\u76
84\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u4f7f\u7528\u9ad8\u8d28\u91cf\u751f\u6210\u6570\u636e\u4ee5\u5b9e\u73b0\u4e24\u4e2a\u76ee\u6807\u7684\u53ef\u80fd\u6027\uff1a\u63d0\u9ad8\u68c0\u6d4b\u6a21\u578b\u7684\u8bad\u7ec3\u6548\u679c\u548c\u6539\u5584\u8bad\u7ec3\u6570\u636e\u96c6\u672c\u8eab\u3002\u6211\u4eec\u7684\u8d21\u732e\u65e8\u5728\u4fc3\u8fdb\u5bf9\u4eba\u4e0e\u673a\u5668\u6587\u672c\u4e4b\u95f4\u52a8\u6001\u5173\u7cfb\u7684\u66f4\u597d\u7406\u89e3\uff0c\u4ece\u800c\u6700\u7ec8\u652f\u6301\u5728\u4e00\u4e2a\u65e5\u76ca\u81ea\u52a8\u5316\u7684\u4e16\u754c\u4e2d\u4fe1\u606f\u7684\u5b8c\u6574\u6027\u3002|\n", "2410.14676": "|**2024-10-18**|**SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment**|Qin Liu et.al.|[2410.14676](http://arxiv.org/abs/2410.14676)|null|\u73b0\u6709\u7684\u504f\u597d\u5bf9\u9f50\u673a\u5236\u662f\u4e00\u79cd\u4e00\u5200\u5207\u7684\u5bf9\u9f50\u65b9\u5f0f\uff0c\u5176\u4e2d\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u53c2\u6570\u5316\u77e5\u8bc6\u4e2d\u7684\u975e\u504f\u597d\u7279\u5f81\u88ab\u7edf\u4e00\u5c4f\u853d\uff0c\u9002\u7528\u4e8e\u6240\u6709\u7528\u6237\u3002\u7136\u800c\uff0c\u8fd9\u90e8\u5206\u77e5\u8bc6\u5bf9\u4e8e\u90a3\u4e9b\u5177\u6709\u4e13\u4e1a\u77e5\u8bc6\u5e76\u80fd\u591f\u5904\u7406\u8fd9\u4e9b\u4fe1\u606f\u7684\u9ad8\u7ea7\u7528\u6237\u6765\u8bf4\u53ef\u80fd\u662f\u6709\u7528\u7684\u3002\u8fd9\u79cd\u4e00\u5200\u5207\u7684\u5bf9\u9f50\u673a\u5236\u524a\u5f31\u4e86\u8fd9\u4e9b\u5408\u683c\u7528\u6237\u7684LLM\u6548\u7528\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86SudoLM\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u6388\u6743\u5bf9\u9f50\u8ba9LLM\u5b66\u4e60\u9488\u5bf9\u4e0d\u540c\u7528\u6237\u51ed\u8bc1\u7684\u5177\u4f53\u53c2\u6570\u5316\u77e5\u8bc6\u7684\u8bbf\u95ee\u63a7\u5236\u3002SudoLM\u5141\u8bb8\u6388\u6743\u7528\u6237\u901a\u8fc7\u5206\u914d\u7684SUDO\u5bc6\u94a5\u89e3\u9501\u5bf9\u624
0\u6709\u53c2\u6570\u5316\u77e5\u8bc6\u7684\u8bbf\u95ee\uff0c\u800c\u975e\u6388\u6743\u7528\u6237\u5219\u65e0\u6cd5\u8bbf\u95ee\u8fd9\u4e9b\u77e5\u8bc6\u3002\u5728\u4e24\u4e2a\u5e94\u7528\u573a\u666f\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSudoLM\u80fd\u591f\u6709\u6548\u63a7\u5236\u7528\u6237\u5bf9\u53c2\u6570\u5316\u77e5\u8bc6\u7684\u8bbf\u95ee\uff0c\u5e76\u4fdd\u6301\u5176\u603b\u4f53\u6548\u7528\u3002|\n", "2410.14675": "|**2024-10-18**|**Enhancing Large Language Models' Situated Faithfulness to External Contexts**|Yukun Huang et.al.|[2410.14675](http://arxiv.org/abs/2410.14675)|**[link](https://github.com/kkkevinkkkkk/situated_faithfulness)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u5e38\u4f1a\u4f7f\u7528\u5916\u90e8\u4fe1\u606f\u4f5c\u4e3a\u4e0a\u4e0b\u6587\uff0c\u4f46\u8fd9\u4e9b\u5916\u90e8\u4fe1\u606f\u6709\u65f6\u53ef\u80fd\u662f\u4e0d\u51c6\u786e\u7684\uff0c\u751a\u81f3\u53ef\u80fd\u662f\u6545\u610f\u8bef\u5bfc\u7684\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u7a33\u5065\u7684LLMs\u5e94\u8be5\u5c55\u793a\u51fa\u60c5\u5883\u771f\u5b9e\u6027\uff0c\u6839\u636e\u5b83\u4eec\u5bf9\u5185\u90e8\u77e5\u8bc6\u548c\u5916\u90e8\u4e0a\u4e0b\u6587\u7684\u4fe1\u5fc3\u52a8\u6001\u8c03\u6574\u5bf9\u5916\u90e8\u4fe1\u606f\u7684\u4fe1\u4efb\u5ea6\u3002\u4e3a\u4e86\u8bc4\u4f30\u8fd9\u79cd\u80fd\u529b\uff0c\u6211\u4eec\u5bf9LLMs\u8fdb\u884c\u4e86\u591a\u9879QA\u6570\u636e\u96c6\u7684\u6d4b\u8bd5\uff0c\u5305\u62ec\u4e00\u4e2a\u65b0\u521b\u5efa\u7684\u6570\u636e\u96c6RedditQA\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u4e86\u6765\u81eaReddit\u5e16\u5b50\u4e2d\u7684\u5b9e\u9645\u9519\u8bef\u4e0a\u4e0b\u6587\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u63d0\u4f9b\u6b63\u786e\u548c\u4e0d\u6b63\u786e\u7684\u4e0a\u4e0b\u6587\u65f6\uff0c\u65e0\u8bba\u662f\u5f00\u6e90\u6a21\u578b\u8fd8\u662f\u4e13\u6709\u6a21\u578b\uff0c\u90fd\u503e\u5411\u4e8e\u8fc7\u5ea6\u4f9d\u8d56\u5916\u90e8\u4fe1\u606f\uff0c\u800c\u4e0d\u7ba1\u5176\u4e8b\u5b9e\u51c6\u786e\u6027\u5982\u4f55\u3002\u4e3a\u4e8
6\u589e\u5f3a\u60c5\u5883\u771f\u5b9e\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u81ea\u5f15\u5bfc\u7f6e\u4fe1\u5ea6\u63a8\u7406\uff08SCR\uff09\u548c\u57fa\u4e8e\u89c4\u5219\u7684\u7f6e\u4fe1\u5ea6\u63a8\u7406\uff08RCR\uff09\u3002SCR\u4f7f\u6a21\u578b\u80fd\u591f\u6839\u636e\u81ea\u8eab\u5185\u90e8\u77e5\u8bc6\u76f8\u5bf9\u5730\u8bc4\u4f30\u5916\u90e8\u4fe1\u606f\u7684\u7f6e\u4fe1\u5ea6\uff0c\u4ece\u800c\u751f\u6210\u6700\u51c6\u786e\u7684\u7b54\u6848\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0cRCR\u4eceLLM\u4e2d\u63d0\u53d6\u663e\u5f0f\u7684\u7f6e\u4fe1\u5ea6\u4fe1\u53f7\uff0c\u5e76\u5229\u7528\u9884\u5b9a\u4e49\u7684\u89c4\u5219\u6765\u786e\u5b9a\u6700\u7ec8\u7b54\u6848\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5bf9\u4e8e\u5177\u6709\u5f3a\u5927\u63a8\u7406\u80fd\u529b\u7684\u6a21\u578b\uff0c\u5982GPT-4o\u548cGPT-4o mini\uff0cSCR\u4f18\u4e8eRCR\uff0c\u5728\u76f4\u63a5\u8f93\u5165\u589e\u5f3a\u57fa\u7ebf\u4e0a\u7684\u63d0\u5347\u5e45\u5ea6\u6700\u9ad8\u53ef\u8fbe24.2%\u3002\u76f8\u53cd\uff0c\u5bf9\u4e8e\u8f83\u5c0f\u7684\u6a21\u578b\uff0c\u5982Llama-3-8B\uff0cRCR\u5219\u4f18\u4e8eSCR\u3002\u901a\u8fc7\u6211\u4eec\u7684\u7f6e\u4fe1\u5ea6\u63a8\u7406\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08CR-DPO\uff09\u65b9\u6cd5\u5bf9SCR\u8fdb\u884c\u5fae\u8c03\uff0c\u53ef\u4ee5\u63d0\u9ad8\u5728\u5df2\u89c1\u548c\u672a\u89c1\u8fc7\u7684\u6570\u636e\u96c6\u4e0a\u7684\u6027\u80fd\uff0c\u5e73\u5747\u63d0\u5347\u5e45\u5ea6\u4e3a8.9%\u3002\u9664\u4e86\u5b9a\u91cf\u7ed3\u679c\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u5173\u4e8eSCR\u548cRCR\u76f8\u5bf9\u4f18\u52bf\u7684\u89c1\u89e3\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u63d0\u9ad8LLMs\u60c5\u5883\u771f\u5b9e\u6027\u7684\u6709\u524d\u666f\u9014\u5f84\u3002\u76f8\u5173\u6570\u636e\u548c\u4ee3\u7801\u5df2\u7ecf\u53d1\u5e03\u3002**|\n", "2410.14668": "|**2024-10-18**|**MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning 
Steps**|Xiongtao Zhou et.al.|[2410.14668](http://arxiv.org/abs/2410.14668)|**[link](https://github.com/alenai97/miceval)**|**Multimodal Chain of Thought\uff08MCoT\uff09\u662f\u4e00\u79cd\u6d41\u884c\u7684\u63d0\u793a\u7b56\u7565\uff0c\u7528\u4e8e\u63d0\u9ad8\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u5404\u79cd\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5c3d\u7ba1\u8fd9\u79cd\u65b9\u6cd5\u5f88\u53d7\u6b22\u8fce\uff0c\u4f46\u5728\u8bc4\u4f30\u591a\u6a21\u6001\u94fe\u5f0f\u601d\u7ef4\u63a8\u7406\u6b65\u9aa4\u7684\u8d28\u91cf\u65b9\u9762\u4ecd\u7f3a\u4e4f\u81ea\u52a8\u5316\u65b9\u6cd5\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Multimodal Chain-of-Thought Evaluation\uff08MiCEval\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u8bc4\u4f30\u63cf\u8ff0\u548c\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u7684\u8d28\u91cf\u6765\u8bc4\u4f30\u63a8\u7406\u94fe\u7684\u6b63\u786e\u6027\u3002\u63cf\u8ff0\u90e8\u5206\u7684\u8bc4\u4f30\u4fa7\u91cd\u4e8e\u56fe\u50cf\u63cf\u8ff0\u7684\u51c6\u786e\u6027\uff0c\u800c\u63a8\u7406\u6b65\u9aa4\u5219\u6839\u636e\u524d\u7eed\u6b65\u9aa4\u6761\u4ef6\u751f\u6210\u65f6\u7684\u8d28\u91cf\u8fdb\u884c\u8bc4\u4f30\u3002MiCEval\u57fa\u4e8e\u4e00\u4e2a\u7ec6\u7c92\u5ea6\u7684\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u6839\u636e\u6b63\u786e\u6027\u3001\u76f8\u5173\u6027\u548c\u4fe1\u606f\u91cf\u5bf9\u6bcf\u4e2a\u6b65\u9aa4\u8fdb\u884c\u6807\u6ce8\u3002\u5bf9\u56db\u79cd\u6700\u5148\u8fdb\u7684MLLMs\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0c\u4f7f\u7528MiCEval\u8fdb\u884c\u9010\u6b65\u8bc4\u4f30\u4e0e\u4eba\u7c7b\u5224\u65ad\u66f4\u52a0\u543b\u5408\uff0c\u76f8\u6bd4\u73b0\u6709\u57fa\u4e8e\u4f59\u5f26\u76f8\u4f3c\u5ea6\u6216\u5fae\u8c03\u7684\u65b9\u6cd5\u66f4\u4e3a\u51c6\u786e\u3002MiCEval\u6570\u636e\u96c6\u548c\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/alenai97/MiCEval\u627e\u5230\u3002**|\n", 
"2410.14660": "|**2024-10-18**|**A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning**|Shengjie Sun et.al.|[2410.14660](http://arxiv.org/abs/2410.14660)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bbe\u8ba1\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4efb\u52a1\u7684\u5956\u52b1\u51fd\u6570\u65b9\u9762\u663e\u793a\u51fa\u663e\u8457\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u83b7\u53d6\u9ad8\u8d28\u91cf\u7684\u5956\u52b1\u4ee3\u7801\u901a\u5e38\u9700\u8981\u4eba\u5de5\u5e72\u9884\u3001\u5927\u91cf\u7684LLM\u67e5\u8be2\u6216\u91cd\u590d\u7684RL\u8bad\u7ec3\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CARD\uff0c\u8fd9\u662f\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u5956\u52b1\u8bbe\u8ba1\u6846\u67b6\uff0c\u5b83\u8fed\u4ee3\u5730\u751f\u6210\u548c\u6539\u8fdb\u5956\u52b1\u51fd\u6570\u4ee3\u7801\u3002\u5177\u4f53\u6765\u8bf4\uff0cCARD\u5305\u62ec\u4e00\u4e2a\u7f16\u7801\u5668\uff0c\u7528\u4e8e\u751f\u6210\u548c\u9a8c\u8bc1\u4ee3\u7801\uff0c\u540c\u65f6\u8fd8\u6709\u4e00\u4e2a\u8bc4\u4f30\u5668\u63d0\u4f9b\u52a8\u6001\u53cd\u9988\u6765\u6307\u5bfc\u7f16\u7801\u5668\u6539\u8fdb\u4ee3\u7801\uff0c\u4ece\u800c\u6d88\u9664\u4e86\u5bf9\u4eba\u5de5\u53cd\u9988\u7684\u9700\u6c42\u3002\u9664\u4e86\u8fc7\u7a0b\u53cd\u9988\u548c\u8f68\u8ff9\u53cd\u9988\u5916\uff0c\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u8f68\u8ff9\u504f\u597d\u8bc4\u4f30\uff08TPE\uff09\uff0c\u8be5\u8bc4\u4f30\u57fa\u4e8e\u8f68\u8ff9\u504f\u597d\u6765\u8bc4\u4f30\u5f53\u524d\u7684\u5956\u52b1\u51fd\u6570\u3002\u5982\u679c\u4ee3\u7801\u672a\u80fd\u901a\u8fc7TPE\uff0c\u8bc4\u4f30\u5668\u5c06\u63d0\u4f9b\u504f\u597d\u53cd\u9988\uff0c\u907f\u514d\u4e86\u5728\u6bcf\u6b21\u8fed\u4ee3\u65f6\u8fdb\u884cRL\u8bad\u7ec3\uff0c\u5e76\u4f7f\u5956\u52b1\u51fd\u6570\u66f4\u597d\u5730\u4e0e\u4efb\u52a1\u76ee\u6807\u5bf9\u9f50\u3002\u5728Meta-World\u548cManiSkill2\u4e0a\u7684\u5b9e\u8bc1\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u
7684\u65b9\u6cd5\u5728\u4efb\u52a1\u6027\u80fd\u548c\u4ee4\u724c\u6548\u7387\u4e4b\u95f4\u5b9e\u73b0\u4e86\u6709\u6548\u7684\u5e73\u8861\uff0c\u5728\u6240\u6709\u4efb\u52a1\u4e0a\u90fd\u4f18\u4e8e\u6216\u5339\u914d\u57fa\u7ebf\u3002\u572812\u4e2a\u4efb\u52a1\u4e2d\u768410\u4e2a\u4efb\u52a1\u4e0a\uff0cCARD\u7684\u8868\u73b0\u4f18\u4e8e\u6216\u53ef\u4e0e\u4f7f\u7528\u4e13\u5bb6\u8bbe\u8ba1\u5956\u52b1\u8bad\u7ec3\u7684\u7b56\u7565\u76f8\u5ab2\u7f8e\uff0c\u751a\u81f3\u57283\u4e2a\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u6700\u4f18\u89e3\u3002|\n", "2410.14649": "|**2024-10-18**|**EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search**|Oliver Sieberling et.al.|[2410.14649](http://arxiv.org/abs/2410.14649)|**[link](https://github.com/ist-daslab/evopress)**|\u9ad8\u8ba1\u7b97\u6210\u672c\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9762\u4e34\u7684\u4e00\u4e2a\u4e3b\u8981\u95ee\u9898\uff0c\u56e0\u6b64\u5bf9\u6a21\u578b\u538b\u7f29\u7684\u7814\u7a76\u5c42\u51fa\u4e0d\u7a77\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5305\u62ec\u91cf\u5316\u3001\u7a00\u758f\u5316\u6216\u7ed3\u6784\u5316\u526a\u679d\u7b49\u3002\u4e00\u4e2a\u65b0\u7684\u7814\u7a76\u524d\u6cbf\u662f\u7531\u6240\u8c13\u7684\u201c\u52a8\u6001\u3001\u975e\u5747\u5300\u201d\u538b\u7f29\u65b9\u6cd5\u6784\u6210\u7684\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u901a\u8fc7\u8c03\u6574\u6bcf\u5757\u6216\u751a\u81f3\u6bcf\u5c42\u7684\u538b\u7f29\u7ea7\u522b\uff08\u4f8b\u5982\u7a00\u758f\u6027\uff09\uff0c\u4ee5\u6700\u5c0f\u5316\u7cbe\u5ea6\u635f\u5931\uff0c\u540c\u65f6\u786e\u4fdd\u5168\u5c40\u538b\u7f29\u9608\u503c\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u542f\u53d1\u5f0f\u65b9\u6cd5\u6765\u8bc6\u522b\u7ed9\u5b9a\u5c42\u5bf9\u8bef\u5dee\u7684\u91cd\u8981\u6027\uff0c\u8fd9\u57fa\u4e8e\u8bf8\u5982\u201c\u8bef\u5dee\u5355\u8c03\u6027\u201d\u7684\u5047\u8bbe\uff0c\u5373\u7aef\u5230\u7aef\u6a21\u578b\u538b\u7f29\u8bef\u5dee\u4e0e\u5404\u5c42\u8bef\u5dee\u4e4b\u548c\u6210\u6bd4\u4f8b\u30
02\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u91cd\u65b0\u5ba1\u89c6\u4e86\u8fd9\u4e00\u9886\u57df\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u901a\u7528\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5728\u7ed9\u5b9a\u8f93\u5165\u8303\u56f4\u5185\u88ab\u8bc1\u660e\u662f\u6700\u4f73\u7684\u3002\u6211\u4eec\u7684\u52a8\u673a\u89c2\u5bdf\u5230\uff0c\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u201c\u8bef\u5dee\u5355\u8c03\u6027\u201d\u5bf9\u4e8eLLMs\u5e76\u4e0d\u6210\u7acb\uff1a\u5177\u6709\u8f83\u4f4e\u5404\u5c42\u8bef\u5dee\u603b\u548c\u7684\u538b\u7f29\u6a21\u578b\u53ef\u80fd\u8868\u73b0\u5f97\u6bd4\u8bef\u5dee\u603b\u548c\u8f83\u9ad8\u7684\u6a21\u578b\u66f4\u5dee\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u901a\u7528\u8fdb\u5316\u6846\u67b6\uff0c\u79f0\u4e3aEvoPress\uff0c\u5b83\u5177\u6709\u7406\u8bba\u4e0a\u7684\u6536\u655b\u6027\u548c\u4f4e\u6837\u672c\u53ca\u8bc4\u4f30\u590d\u6742\u5ea6\u3002\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u4e9b\u7406\u8bba\u4fdd\u8bc1\u5bfc\u81f4\u4e86\u5728\u52a8\u6001\u538b\u7f29Llama\u3001Mistral\u548cPhi\u6a21\u578b\u65b9\u9762\u9ad8\u5ea6\u7ade\u4e89\u7684\u5b9e\u9645\u6027\u80fd\u3002\u901a\u8fc7EvoPress\uff0c\u6211\u4eec\u5728\u6240\u6709\u538b\u7f29\u65b9\u6cd5\u4e0a\u90fd\u8fbe\u5230\u4e86\u6700\u65b0\u7684\u6210\u679c\uff1a\u7ed3\u6784\u526a\u679d\uff08\u5757/\u5c42\u5220\u9664\uff09\u3001\u975e\u7ed3\u6784\u5316\u7a00\u758f\u6027\u4ee5\u53ca\u5177\u6709\u52a8\u6001\u4f4d\u5bbd\u7684\u91cf\u5316\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/IST-DASLab/EvoPress\u83b7\u53d6\u3002|\n", "2410.14641": "|**2024-10-18**|**Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs**|Runchu Tian 
et.al.|[2410.14641](http://arxiv.org/abs/2410.14641)|**[link](https://github.com/Rachum-thu/LongPiBench)**|**\u4f4d\u7f6e\u504f\u5dee\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u9650\u5236\u4e86\u5b83\u4eec\u5904\u7406\u957f\u8f93\u5165\u7684\u80fd\u529b\u3002\u4e00\u4e2a\u663e\u8457\u7684\u4f8b\u5b50\u662f\u201c\u8ff7\u5931\u5728\u4e2d\u95f4\u201d\u73b0\u8c61\uff0c\u5373LLMs\u96be\u4ee5\u5229\u7528\u4f4d\u4e8e\u8f93\u5165\u4e2d\u95f4\u7684\u76f8\u5173\u4fe1\u606f\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5355\u4e2a\u76f8\u5173\u4fe1\u606f\u4e0a\uff0c\u4f46\u73b0\u5b9e\u4e16\u754c\u7684\u5e94\u7528\u901a\u5e38\u6d89\u53ca\u591a\u4e2a\u76f8\u5173\u7684\u4fe1\u606f\u7247\u6bb5\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LongPiBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u65e8\u5728\u8bc4\u4f30\u6d89\u53ca\u591a\u4e2a\u76f8\u5173\u7247\u6bb5\u7684\u4f4d\u7f6e\u504f\u5dee\u7684\u57fa\u51c6\u6d4b\u8bd5\u3002\u901a\u8fc7\u4e94\u79cd\u5546\u4e1a\u6a21\u578b\u548c\u516d\u79cd\u5f00\u6e90\u6a21\u578b\u8fdb\u884c\u7684\u8be6\u7ec6\u5b9e\u9a8c\u8868\u660e\uff0c\u867d\u7136\u5927\u591a\u6570\u5f53\u524d\u6a21\u578b\u5bf9\u201c\u8ff7\u5931\u5728\u4e2d\u95f4\u201d\u7684\u95ee\u9898\u5177\u6709\u9c81\u68d2\u6027\uff0c\u4f46\u5b58\u5728\u4e0e\u76f8\u5173\u4fe1\u606f\u7247\u6bb5\u95f4\u8ddd\u663e\u8457\u76f8\u5173\u7684\u504f\u5dee\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u8bc4\u4f30\u548c\u51cf\u5c11\u4f4d\u7f6e\u504f\u5dee\u7684\u91cd\u8981\u6027\uff0c\u4ee5\u63d0\u5347LLMs\u7684\u80fd\u529b\u3002**|\n", "2410.14635": "|**2024-10-18**|**GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings**|Raghuveer Thirukovalluru 
et.al.|[2410.14635](http://arxiv.org/abs/2410.14635)|**[link](https://github.com/raghavlite/GenEOL)**|\u8bad\u7ec3-free\u5d4c\u5165\u65b9\u6cd5\u76f4\u63a5\u5229\u7528\u9884\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u5d4c\u5165\u6587\u672c\uff0c\u907f\u514d\u4e86\u5bf9\u6bd4\u5b66\u4e60\u7684\u6602\u8d35\u548c\u590d\u6742\u7684\u6d41\u7a0b\u3002\u5148\u524d\u7684\u8bad\u7ec3-free\u5d4c\u5165\u65b9\u6cd5\u4e3b\u8981\u96c6\u4e2d\u5728\u4f18\u5316\u5d4c\u5165\u63d0\u793a\u4e0a\uff0c\u800c\u5ffd\u7565\u4e86\u5229\u7528LLMs\u7684\u751f\u6210\u80fd\u529b\u7684\u597d\u5904\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5GenEOL\uff0c\u8be5\u65b9\u6cd5\u4f7f\u7528LLMs\u751f\u6210\u4fdd\u7559\u53e5\u5b50\u542b\u4e49\u7684\u4e0d\u540c\u53d8\u6362\uff0c\u5e76\u805a\u5408\u8fd9\u4e9b\u53d8\u6362\u7684\u5d4c\u5165\u7ed3\u679c\u4ee5\u589e\u5f3a\u6574\u4f53\u53e5\u5b50\u5d4c\u5165\u3002GenEOL\u5728\u591a\u4e2aLLMs\u7684\u53e5\u5b50\u8bed\u4e49\u6587\u672c\u76f8\u4f3c\u6027\uff08STS\uff09\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5e73\u5747\u6bd4\u73b0\u6709\u7684\u8bad\u7ec3-free\u5d4c\u5165\u65b9\u6cd5\u9ad8\u51fa2.85\u5206\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cGenEOL\u5728LLM\u5c42\u9762\u4e0a\u7a33\u5b9a\u4e86\u8868\u5f81\u8d28\u91cf\uff0c\u5e76\u4e14\u5bf9\u5d4c\u5165\u63d0\u793a\u7684\u6270\u52a8\u5177\u6709\u9c81\u68d2\u6027\u3002GenEOL\u8fd8\u5728MTEB\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u591a\u4e2a\u805a\u7c7b\u3001\u91cd\u6392\u5e8f\u548c\u914d\u5bf9\u5206\u7c7b\u4efb\u52a1\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u7684\u63d0\u5347\u3002|\n", "2410.14609": "|**2024-10-18**|**DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search**|Simon Lupart 
et.al.|[2410.14609](http://arxiv.org/abs/2410.14609)|null|Conversational Search (CS) is the task of retrieving relevant documents from a corpus within a conversational context, combining retrieval with conversational context modeling. With the rise of large language models (LLMs), the CS field has improved significantly, with LLMs rewriting user queries to account for the conversational context. However, using these models at inference time harms efficiency. Current methods address this by distilling embeddings from human-rewritten queries to learn the context modeling task. Yet these approaches focus mainly on context modeling and treat the contrastive component of the retrieval task only in a loss term that is independent of the distillation. To address these limitations, we propose a new distillation method, a relaxation of the previous objective, that unifies retrieval and context modeling. We relax the existing training objective by distilling similarity scores between conversations and documents rather than relying solely on representation learning. The proposed distillation objective allows more freedom in the representation space and leverages the contrastive nature of document relevance. In Learned Sparse Retrieval (LSR) experiments on 5 CS datasets, our approach shows substantial improvements in both in-domain and out-of-domain retrieval performance, outperforming the state of the art with gains of up to 6 points in recall on out-of-domain datasets. In addition, the relaxed objective lets us propose multi-teacher distillation, which uses multiple LLMs as teachers, yields additional gains, and outperforms the teachers themselves in in-domain experiments. Finally, an analysis of model sparsity shows that our distillation method allows better control over the sparsity of the trained models.|\n", 
"2410.14596": "|**2024-10-18**|**Teaching Models to Balance Resisting and Accepting Persuasion**|Elias Stengel-Eskin 
et.al.|[2410.14596](http://arxiv.org/abs/2410.14596)|**[link](https://github.com/esteng/persuasion_balanced_training)**|**Large language models (LLMs) are susceptible to persuasion, which can pose risks when models face an adversarial interlocutor. We take a first step towards defending models against persuasion, while also arguing that defending against negative persuasion is only half of the problem: models should also be able to accept beneficial persuasion to improve their answers. We find that optimizing models for only one side leads to poor performance on the other. To balance positive and negative persuasion, we introduce Persuasion-Balanced Training (PBT), which uses multi-agent recursive dialogue trees to generate data and trains models via preference optimization to accept persuasion when appropriate. PBT consistently improves resistance to misinformation and resilience to being challenged, while also giving the best overall performance on holistic data containing both positive and negative persuasion. Crucially, we find that PBT models are better teammates in multi-agent debates. Without PBT, pairs of stronger and weaker models perform unstably, with the order in which the models answer determining whether the team obtains the stronger or the weaker model's performance. PBT yields better and more stable results and reduces order dependence, with the stronger model consistently pulling the weaker one up.**|\n", 
"2410.16270": "|**2024-10-21**|**Reflection-Bench: probing AI intelligence with reflection**|Lingyu Li et.al.|[2410.16270](http://arxiv.org/abs/2410.16270)|**[link](https://github.com/yabyum/reflectionbench)**|**Reflection, the ability to adapt beliefs or behaviors in response to unexpected outcomes, is a core principle of how intelligent systems interact with the world. From a cognitive science perspective, this principle applies to both humans and AI systems. To address the debate on the intelligence of large language models (LLMs), we propose Reflection-Bench, a comprehensive benchmark of seven tasks spanning the core cognitive functions required for reflection: perception, memory, belief updating, decision-making, prediction, counterfactual thinking, and meta-reflection. We evaluate 13 prominent LLMs, including OpenAI o1, GPT-4, and Claude 3.5 Sonnet. The results indicate that current LLMs still lack satisfactory reflection ability. We discuss the underlying causes of these results and suggest potential directions for future research. In conclusion, Reflection-Bench offers both an evaluation tool and inspiration for developing AI that can interact reliably with its environment. Our data and code are available at https://github.com/YabYum/ReflectionBench.**|\n", 
"2410.16261": "|**2024-10-21**|**Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance**|Zhangwei Gao et.al.|[2410.16261](http://arxiv.org/abs/2410.16261)|**[link](https://github.com/opengvlab/internvl)**|**Multimodal large language models (MLLMs) have demonstrated impressive performance on vision-language tasks across a broad range of domains. However, their large scale and the associated high computational cost make training and deploying them on consumer-grade GPUs or edge devices very challenging, which hinders their widespread application. In this work, we introduce Mini-InternVL, a series of models with 1B to 4B parameters that achieve 90% of the performance with only 5% of the parameters. This gain in efficiency and effectiveness makes our models more accessible and applicable in a variety of real-world scenarios. To further promote their adoption, we develop a unified adaptation framework that enables Mini-InternVL models to transfer to downstream tasks and outperform specialized models there, including in autonomous driving, medical imaging, and remote sensing. We believe our study can provide valuable insights and resources for the development of efficient and effective MLLMs. Code is available at https://github.com/OpenGVLab/InternVL.**|\n", 
"2410.16257": "|**2024-10-21**|**Elucidating the design space of language models for image generation**|Xuantong Liu et.al.|[2410.16257](http://arxiv.org/abs/2410.16257)|**[link](https://github.com/Pepper-lll/LMforImageGeneration)**|The success of autoregressive (AR) language models in text generation has inspired the computer vision community to adopt large language models (LLMs) for image generation. However, given the fundamental differences between the text and image modalities, the design space of language models for image generation remains underexplored. We observe that image tokens exhibit greater randomness than text tokens, which poses challenges during training. Nevertheless, AR models show their potential by effectively learning patterns even from a seemingly suboptimal optimization problem. Our analysis also reveals that while all models successfully grasp the importance of local information in image generation, smaller models struggle to capture global context. In contrast, larger models show better capabilities in this respect, which explains the performance gains obtained when scaling up model size. We further elucidate the design space of language models for vision generation through extensive comparative experiments covering tokenizer choice, model choice, model scalability, vocabulary design, and sampling strategy. Our work is the first to analyze the optimization behavior of language models in vision generation, and we believe it can inspire more effective designs when applying LMs to other domains. Finally, our elucidated language model for image generation, termed ELM, achieves state-of-the-art performance on the ImageNet 256*256 benchmark. The code is publicly available.|\n", 
"2410.16256": "|**2024-10-21**|**CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution**|Maosong Cao 
et.al.|[2410.16256](http://arxiv.org/abs/2410.16256)|**[link](https://github.com/open-compass/compassjudger)**|**Efficient and accurate evaluation is crucial for the continuous improvement of large language models (LLMs). Among evaluation methods, subjective evaluation has attracted significant attention because it aligns closely with real-world usage scenarios and human preferences. However, human-based evaluation is costly and lacks reproducibility, which makes precise automated evaluators (judgers) vital to the process. In this report, we introduce CompassJudger-1, the first open-source all-in-one judge LLM. CompassJudger-1 is a general-purpose LLM with remarkable versatility. It can: 1. perform unitary scoring and two-model comparisons as a reward model; 2. conduct evaluations according to specified formats; 3. generate critiques; 4. execute diverse tasks like a general LLM. To assess the capabilities of different judge models under a unified setting, we have also established JudgerBench, a new benchmark that covers various subjective evaluation tasks and a wide range of topics. CompassJudger-1 offers a comprehensive solution for various evaluation tasks while remaining flexible enough to adapt to diverse requirements. Both CompassJudger and JudgerBench are released and available to the research community at https://github.com/open-compass/CompassJudger. We believe that by open-sourcing these tools, we can foster collaboration and accelerate progress in LLM evaluation methodologies.**|\n", 
"2410.16251": "|**2024-10-21**|**Can Knowledge Editing Really Correct Hallucinations?**|Baixiang Huang 
et.al.|[2410.16251](http://arxiv.org/abs/2410.16251)|**[link](https://github.com/llm-editing/HalluEditBench)**|**Large language models (LLMs) often hallucinate, i.e. include non-factual information in generated content, despite their strong performance across tasks. Meanwhile, knowledge editing has emerged as a popular new paradigm for correcting erroneous factual knowledge in LLMs, with the advantage of avoiding retraining from scratch. However, a common issue with existing evaluation datasets for knowledge editing is that they do not ensure that LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs edited with different techniques are evaluated on such datasets, it is hard to use the resulting performance directly to assess how effective different knowledge editing methods are at correcting hallucinations. Thus a fundamental question remains insufficiently validated: can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods on correcting real-world hallucinations. First, we rigorously construct a large-scale hallucination dataset with 9 domains, 26 topics, and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods holistically along five dimensions: efficacy, generalization, portability, locality, and robustness. Through HalluEditBench, we provide new insights into the potential and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing.**|\n", 
"2410.16246": "|**2024-10-21**|**Analyzing Context Contributions in LLM-based Machine Translation**|Emmanouil Zaranis et.al.|[2410.16246](http://arxiv.org/abs/2410.16246)|null|Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and have demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of the input remain largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use various context parts, such as few-shot examples and the source text, when generating translations. We highlight several key findings: (1) the source part of few-shot examples appears to contribute more than its corresponding targets, irrespective of translation direction; (2) finetuning LLMs with parallel data alters the contribution patterns of different context parts; and (3) there is a positional bias whereby earlier few-shot examples contribute more to the translated sequence. Finally, we demonstrate that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations. Our findings shed light on the internal workings of LLM-based MT that go beyond what is known for standard encoder-decoder MT models.|\n", 
"2410.16237": "|**2024-10-21**|**IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems**|Yihuan Mao 
et.al.|[2410.16237](http://arxiv.org/abs/2410.16237)|null|As large language model (LLM) agents are increasingly integrated into our infrastructure, their robust coordination and message synchronization become vital. The Byzantine Generals Problem (BGP) is a critical model for building multi-agent systems (MAS) that are resilient under adversarial attacks. It describes a scenario in which malicious agents exist within the system, a situation that could arise from LLM agents' hallucinations or from external attacks. In BGP, the objective of the entire system is to reach consensus on the action to be taken. Traditional BGP requires global consensus among all agents; in practical scenarios, however, global consensus is not always necessary and can even be inefficient. There is therefore a pressing need to explore a refined version of BGP that aligns with the local coordination patterns observed in MAS. We refer to this refined version as Imperfect BGP (IBGP) in our research, aiming to address this discrepancy. To tackle the problem, we propose a framework that leverages consensus protocols within general MAS settings, providing provable resilience against communication attacks and adaptability to changing environments, as validated by empirical results. We also present a case study in a sensor network environment to illustrate the practical application of our protocol.|\n", 
"2410.16236": "|**2024-10-21**|**LLaVA-KD: A Framework of Distilling Multimodal Large Language Models**|Yuxuan Cai et.al.|[2410.16236](http://arxiv.org/abs/2410.16236)|**[link](https://github.com/Fantasyele/LLaVA-KD)**|The success of large language models (LLMs) has led researchers to explore multimodal large language models (MLLMs) for unified visual and linguistic understanding. However, the growing model size and computational complexity of MLLMs limit their use in resource-constrained environments. Small-scale MLLMs (s-MLLMs) aim to retain the capabilities of large-scale models (l-MLLMs) while reducing computational demands, but this results in a significant drop in performance. To address these issues, we propose a novel framework, LLaVA-KD, for transferring knowledge from an l-MLLM to an s-MLLM. Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of the l-MLLM and the s-MLLM, and Relation Distillation (RDist) to transfer the l-MLLM's ability to model correlations between visual features. We additionally propose a three-stage training scheme to fully exploit the potential of the s-MLLM: 1) Distilled Pre-Training to align visual-textual representations; 2) Supervised Fine-Tuning to equip the model with multimodal understanding; and 3) Distilled Fine-Tuning to further transfer l-MLLM capabilities. Our approach significantly improves performance without altering the small model's architecture. Extensive experiments and ablation studies validate the effectiveness of each proposed component. Code will be available at https://github.com/caiyuxuan1120/LLAva-KD.|\n", 
"2410.16235": "|**2024-10-21**|**ToW: Thoughts of Words Improve Reasoning in Large Language Models**|Zhikun Xu et.al.|[2410.16235](http://arxiv.org/abs/2410.16235)|null|We introduce Thoughts of Words (ToW), a novel training-time data-augmentation method for next-word prediction. ToW views next-word prediction as a core reasoning task and injects fine-grained thoughts into pre-training texts that explain what the next word should be and how it relates to the preceding context. Our formulation addresses two fundamental drawbacks of existing next-word prediction learning schemes: they induce factual hallucination, and they make it hard for models to learn the implicit reasoning processes in raw text. While there are many ways to acquire such thoughts of words, we explore a first step: acquiring ToW annotations by distilling from larger models. After continual pre-training with only 70K ToW annotations, we improve models' reasoning performance by 7% to 9% on average and reduce model hallucination by up to 10%. At the same time, ToW is entirely agnostic to tasks and applications, introducing no additional biases on labels or semantics.|\n", 
"2410.16229": "|**2024-10-21**|**Building A Coding Assistant via the Retrieval-Augmented Language Model**|Xinze Li 
et.al.|[2410.16229](http://arxiv.org/abs/2410.16229)|**[link](https://github.com/NEUIR/CONAN)**|Pretrained language models have shown strong effectiveness on code-related tasks such as code retrieval, code generation, code summarization, and code completion. This paper proposes CONAN (a code assistant built via a retrieval-augmented language model), which aims to build a code assistant by mimicking the knowledge-seeking behavior of humans during coding. Specifically, CONAN consists of a code structure-aware retriever (CONAN-R) and a retrieval-augmented generation model based on dual-view code representation (CONAN-G). CONAN-R pretrains CodeT5 with Code-Documentation Alignment and Masked Entity Prediction tasks, making the language model code structure-aware and teaching it effective representations of code snippets and documentation. CONAN-G then designs a dual-view code representation mechanism to implement a retrieval-augmented code generation model. It treats code documentation descriptions as prompts that help the language model better understand code semantics. Our experiments show that CONAN achieves convincing performance on different code generation tasks and significantly outperforms previous retrieval-augmented code generation models. Further analysis shows that CONAN learns tailored representations for both code snippets and documentation by aligning code-documentation data pairs and by capturing structural semantics through masking and predicting entities in the code. In addition, the retrieved code snippets and documentation provide necessary information from both programming language and natural language to assist the code generation process. CONAN can also serve as an assistant for large language models (LLMs), supplying external knowledge in shorter code-documentation lengths to improve their effectiveness on various code tasks. This shows CONAN's ability to extract necessary information and to help filter out noise from retrieved code documents.|\n", 
"2410.17236": "|**2024-10-22**|**Large Language Models Empowered Personalized Web Agents**|Hongru Cai 
et.al.|[2410.17236](http://arxiv.org/abs/2410.17236)|null|Web agents have emerged as a promising direction for automating the completion of Web tasks based on user instructions, significantly enhancing the user experience. Recently, Web agents have evolved from traditional agents to agents based on large language models (LLMs). Despite their success, existing LLM-based Web agents overlook the importance of personalized data (e.g. user profiles and historical Web behaviors) in helping to understand users' personalized instructions and to execute customized actions. To overcome this limitation, we first formulate the task of LLM-empowered personalized Web agents, which combine personalized data and user instructions to achieve personalized instruction comprehension and action execution. To address the absence of a comprehensive evaluation benchmark, we construct a Personalized Web Agent Benchmark (PersonalWAB), which includes user instructions, personalized user data, and Web functions, and provides two evaluation paradigms across three personalized Web tasks. Moreover, we propose a Personalized User Memory-enhanced Alignment (PUMA) framework to adapt LLMs to the personalized Web agent task. PUMA uses a memory bank with a task-specific retrieval strategy to filter relevant historical Web behaviors. Based on these behaviors, PUMA then aligns the LLM for personalized action execution through fine-tuning and direct preference optimization. Extensive experiments validate the superiority of PUMA over existing Web agents on PersonalWAB.|\n", 
"2410.17235": "|**2024-10-22**|**Automated Spinal MRI Labelling from Reports Using a Large Language Model**|Robin Y. Park et.al.|[2410.17235](http://arxiv.org/abs/2410.17235)|**[link](https://github.com/robinyjpark/autolabelclassifier)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u7684\u7ba1\u9053\uff0c\u7528\u4e8e\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u81ea\u52a8\u5316\u63d0\u53d6\u653e\u5c04\u5b66\u62a5\u544a\u4e2d\u7684\u6807\u7b7e\uff0c\u5e76\u5728\u810a\u67f1MRI\u62a5\u544a\u4e0a\u8fdb\u884c\u4e86\u9a8c\u8bc1\u3002\u8be5\u6807\u7b7e\u63d0\u53d6\u65b9\u6cd5\u7684\u6709\u6548\u6027\u5728\u4e94\u79cd\u4e0d\u540c\u7684\u60c5\u51b5\u4e2d\u8fdb\u884c\u4e86\u8bc4\u4f30\uff1a\u810a\u67f1\u764c\u3001\u72ed\u7a84\u3001\u810a\u690e\u6ed1\u8131\u3001\u9a6c\u5c3e\u795e\u7ecf\u53d7\u538b\u548c\u759d\u6c14\u3002\u4f7f\u7528\u5f00\u6e90\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4fdd\u7559\u7684\u4e00\u7ec4\u62a5\u544a\u4e0a\u7b49\u4e8e\u6216\u8d85\u8fc7\u4e86GPT-4\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6240\u63d0\u53d6\u7684\u6807\u7b7e\u53ef\u4ee5\u7528\u6765\u8bad\u7ec3\u5f71\u50cf\u6a21\u578b\u4ee5\u8bc6\u522b\u4f34\u968f\u7684MRI\u626b\u63cf\u4e2d\u7684\u8fd9\u4e9b\u5df2\u8bc6\u522b\u7684\u72b6\u51b5\u3002\u6240\u6709\u4f7f\u7528\u81ea\u52a8\u6807\u7b7e\u8bad\u7ec3\u7684\u5206\u7c7b\u5668\u8868\u73b0\u4e0e\u4f7f\u7528\u4e34\u5e8a\u533b\u751f\u624b\u52a8\u6807\u6ce8\u7684\u626b\u63cf\u
8bad\u7ec3\u7684\u6a21\u578b\u76f8\u5f53\u3002\u4ee3\u7801\u53ef\u4ee5\u5728\u627e\u5230\u3002**|\n", "2410.17234": "|**2024-10-22**|**Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy**|Benedict Aaron Tjandra et.al.|[2410.17234](http://arxiv.org/abs/2410.17234)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u5176\u751f\u6210\u5408\u7406\u4f46\u4e0d\u51c6\u786e\u6587\u672c\u7684\u80fd\u529b\u800c\u95fb\u540d\uff0c\u8fd9\u79cd\u73b0\u8c61\u5728\u533b\u5b66\u6216\u6cd5\u5f8b\u7b49\u5173\u952e\u5e94\u7528\u4e2d\u5e26\u6765\u4e86\u663e\u8457\u7684\u98ce\u9669\uff0c\u56e0\u6b64\u9700\u8981\u91c7\u53d6\u7a33\u5065\u7684\u5e7b\u89c9\u7f13\u89e3\u7b56\u7565\u3002\u5c3d\u7ba1\u6700\u8fd1\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u901a\u8fc7\u5fae\u8c03\u6765\u6559\u5bfc\u6a21\u578b\u907f\u514d\u56de\u7b54\u8d85\u51fa\u5176\u77e5\u8bc6\u6216\u80fd\u529b\u8303\u56f4\u7684\u95ee\u9898\u7684\u65b9\u6cd5\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u5916\u90e8\u7684\u771f\u5b9e\u6807\u7b7e\uff0c\u6216\u8005\u4ec5\u9650\u4e8e\u77ed\u6587\u672c\u56de\u5e94\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u8bed\u4e49\u71b5\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u4ece\u6a21\u578b\u5185\u90e8\u8fdb\u884c\u81ea\u6211\u53cd\u601d\u5f97\u51fa\u7684\u4e0d\u786e\u5b9a\u6027\u5ea6\u91cf\uff0c\u4e0d\u9700\u8981\u5916\u90e8\u6807\u7b7e\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4f7f\u7528\u5148\u524d\u7814\u7a76\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u4e0a\u8fbe\u5230\u4e86\u540c\u7b49\u6216\u66f4\u597d\u7684\u8868\u73b0\uff0c\u5e76\u5728\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u5bf9\u77ed\u6587\u672c\u548c\u957f\u6587\u672c\u751f\u6210\u7684\u5f3a\u5927\u6027\u80fd\u3002|\n", "2410.17233": "|**2024-10-22**|**Few-shot In-Context Preference Learning Using Large Language Models**|Chao Yu 
et.al.|[2410.17233](http://arxiv.org/abs/2410.17233)|null|Designing reward functions is a core component of reinforcement learning but can be challenging for highly complex behavior. Reinforcement Learning from Human Feedback (RLHF) has been used to alleviate this challenge by replacing a hand-coded reward function with one learned from human feedback. However, learning these reward functions is typically inefficient because they are often learned from scratch. We investigate whether large language models (LLMs) can reduce this query inefficiency by converting a series of human preferences into code that represents the reward. We propose In-Context Preference Learning (ICPL), a method that uses the background knowledge of LLMs to accelerate learning reward functions from preferences. ICPL takes the environment context and task description, synthesizes a set of reward functions, and then repeatedly updates them using human rankings of videos of the resulting policies. Using synthetic preferences, we demonstrate that ICPL is orders of magnitude more efficient than RLHF and is even competitive with methods that use ground-truth reward functions. Finally, we run a series of human preference-learning trials and observe that ICPL extends beyond synthetic settings and works effectively with humans in the loop. Additional information and videos are available at https://sites.google.com/view/few-shot-icpl/home.|\n", "2410.17222": "|**2024-10-22**|**Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods**|Tsachi Blau et.al.|[2410.17222](http://arxiv.org/abs/2410.17222)|null|Fine-tuning large language models (LLMs) typically involves updating billions of parameters. A more parameter-efficient approach is Prompt Tuning (PT), which updates only a few learnable tokens. Another is In-Context Learning (ICL), which adapts the model to a new task by including examples in the input, without any training. When optimization-based methods such as fine-tuning and PT are applied to few-shot learning, the model is specifically adapted to the small set of training examples, whereas ICL leaves the model itself unchanged. This distinction makes traditional learning methods more prone to overfitting; in contrast, ICL is less sensitive to the few-shot regime. While ICL is not prone to overfitting, it does not fully extract the information present in the training examples. This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks. We build on the ICL strategy of concatenating examples with the input, but use PT-style optimization to iteratively refine the context embedding and extract deeper information from the training examples. We carefully modify specific context tokens, taking into account the unique structure of the input and output formats. Inspired by adversarial attacks, we adjust the input based on the labels present in the context, aiming to minimize rather than maximize the loss. Moreover, we apply projected gradient descent to keep the token embeddings close to their original values, under the assumption that the user-provided data is inherently valuable. Our method has been shown to achieve superior accuracy across multiple classification tasks using various LLMs.|\n", "2410.17210": "|**2024-10-22**|**Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling**|Azmine Toushik Wasi 
et.al.|[2410.17210](http://arxiv.org/abs/2410.17210)|**[link](https://github.com/ciol-researchlab/ukil)**|**Purpose: Bangladesh's legal system faces major challenges such as case backlogs, complexity, high costs, and millions of pending cases, which deter many people from seeking legal recourse because of a lack of knowledge or financial constraints. This research aims to develop a specialized large language model (LLM) to assist the Bangladeshi legal system. Methods: We created UKIL-DB-EN, an English corpus of Bangladeshi legal documents, by collecting and scraping data on various legal acts. We then fine-tuned the GPT-2 model on this dataset to develop GPT2-UKIL-EN, an LLM focused on providing legal assistance in English. Results: The model was rigorously evaluated through semantic assessments, including case studies supported by expert opinions, and the results showed its potential to assist with legal matters. Conclusion: Our work represents the first structured effort toward building an AI-based legal assistant for Bangladesh. Although the results are encouraging, further refinement is needed to improve the model's accuracy, credibility, and safety. This is an important step toward a legal AI capable of serving the needs of a population of 180 million.**|\n", "2410.17196": "|**2024-10-22**|**VoiceBench: Benchmarking LLM-Based Voice Assistants**|Yiming Chen et.al.|[2410.17196](http://arxiv.org/abs/2410.17196)|**[link](https://github.com/matthewcym/voicebench)**|**Building on the success of large language models (LLMs), recent advances such as GPT-4o have enabled real-time speech interaction through LLM-based voice assistants, greatly improving the user experience compared with traditional text-based interaction. However, the lack of benchmarks designed specifically to evaluate these speech interaction capabilities has hindered the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general-knowledge evaluation with clean speech, neglecting more complex real-world scenarios that involve diverse speaker characteristics, environments, and content factors. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench includes both real and synthetic spoken instructions that incorporate the three key real-world variations above. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.**|\n", "2410.17195": "|**2024-10-23**|**Non-myopic Generation of Language Model for Reasoning and Planning**|Chang Ma et.al.|[2410.17195](http://arxiv.org/abs/2410.17195)|null|Large language models have demonstrated remarkable abilities in reasoning and planning by breaking complex problems down into sequential steps. Despite their success in domains such as mathematical problem solving and coding, these models still struggle to ensure reliable and optimal planning because of their inherently autoregressive decoding. This paper revisits LLM reasoning from an optimal-control perspective and proposes a novel method, Predictive-Decoding, which leverages model predictive control to improve planning accuracy. By re-weighting the language model's distribution according to foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show that this method significantly improves performance across a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding is computationally efficient, outperforming search baselines while using fewer computational resources. This study offers insights into optimizing the planning capabilities of LLMs.|\n", "2410.17174": "|**2024-10-22**|**From Attention to Activation: Unravelling the Enigmas of Large Language Models**|Prannay Kaul et.al.|[2410.17174](http://arxiv.org/abs/2410.17174)|null|We study two strange phenomena in autoregressive Transformers: (1) the dominance of the first token in attention heads; (2) the occurrence of large outlier activations in the hidden states. We find that popular large language models, such as Llama, attend maximally to the first token in 98% of attention heads, a behavior we attribute to the softmax function. To mitigate this issue, we propose a reformulation of softmax, softmax-1. Furthermore, we identify adaptive optimizers (e.g. Adam) as the primary cause of the large outlier activations and introduce OrthoAdam, a novel optimizer that uses orthogonal matrices to transform gradients, to address this. Finally, our methods not only prevent these phenomena from occurring but also enable Transformers to retain their performance when quantized with basic algorithms, something standard methods cannot do. In summary, our methods reduce the attention proportion on the first token from 65% to 3.3%, the activation kurtosis in the hidden states from 1657 to 3.1, and the perplexity penalty under 4-bit weight quantization from 3565 to 0.3.|\n", "2410.17152": "|**2024-10-22**|**Improving Pinterest Search Relevance Using Large Language Models**|Han Wang et.al.|[2410.17152](http://arxiv.org/abs/2410.17152)|null|To improve relevance scoring on Pinterest Search, we integrate large language models (LLMs) into our search relevance model, leveraging carefully designed text representations to predict the relevance of Pins effectively. Our approach uses search queries alongside content representations that include captions extracted from a generative visual language model. These representations are further enriched with link-based text data, historically high-quality engaged queries, user-created boards, Pin titles, and Pin descriptions, yielding robust models for predicting search relevance. We use a semi-supervised learning approach to scale up the amount of training data efficiently, going beyond the expensive human-labeled data. By using multilingual LLMs, our system extends the training data to unseen languages and domains, even though the initial data and annotator expertise are confined to English. Furthermore, we distill the LLM-based model into real-time servable model architectures and features. We provide comprehensive offline experimental validation of the proposed techniques and present the results achieved in the system deployed at scale.|\n", "2410.18071": "|**2024-10-23**|**TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts**|Yuxuan Xie et.al.|[2410.18071](http://arxiv.org/abs/2410.18071)|null|Recently, multimodal large language models (MLLMs) have received much attention for their impressive performance. Evaluating MLLMs has become critical for analyzing their characteristics and providing valuable insights. However, current benchmarks overlook the problem of prompt sensitivity: minor prompt variations may lead to significant performance fluctuations. Thus, inappropriate prompts may obscure a model's capabilities and cause its performance to be underestimated. Moreover, different models prefer different prompts, so using the same prompt to evaluate all models introduces evaluation bias. This paper analyzes this deficiency in existing benchmarks and further introduces a new evaluation framework named TP-Eval, which uses a prompt customization method to reduce evaluation bias and tap models' potential. TP-Eval rewrites the original prompts into different customized prompts for different models. In particular, we design several modules for prompt customization tailored to the MLLM evaluation scenario. Extensive experiments demonstrate that our approach effectively uncovers models' potential, and TP-Eval should help the community develop more comprehensive and convincing MLLM evaluation benchmarks.|\n", "2410.18050": "|**2024-10-23**|**LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering**|Qingfei Zhao et.al.|[2410.18050](http://arxiv.org/abs/2410.18050)|**[link](https://github.com/qingfei1/longrag)**|**Long-Context Question Answering (LCQA) is a challenging task that aims to answer questions accurately by reasoning over long documents. Existing long-context large language models (LLMs) often struggle with the \u201clost in the middle\u201d issue in LCQA. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts global long-context information, and low-quality retrieval in long contexts hinders LLMs from identifying effective factual details because of substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm that enhances RAG's understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm that adapts easily to various domains and LLMs. Extensive experiments on three multi-hop datasets show that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system's components and fine-tuning strategies. Data and code are available at https://github.com/QingFei1/LongRAG.**|\n", "2410.18040": "|**2024-10-23**|**Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases**|Anna Glazkova 
et.al.|[2410.18040](http://arxiv.org/abs/2410.18040)|null|Keyphrase selection is a challenging task in natural language processing with a wide range of applications. Adapting existing supervised and unsupervised solutions to Russian faces several limitations because of the language's rich morphology and the limited number of training datasets. Recent studies on English texts show that large language models (LLMs) successfully address the task of generating keyphrases. These models can achieve impressive results without task-specific fine-tuning, using text prompts alone. In this work, we evaluate the performance of prompt-based methods for generating keyphrases for Russian scientific abstracts. First, we compare the performance of zero-shot and few-shot prompt-based methods, fine-tuned models, and unsupervised methods. Then we assess strategies for selecting keyphrase examples in the few-shot setting. We present the results of human evaluation of the generated keyphrases and analyze the strengths and weaknesses of the models through expert assessment. Our results show that prompt-based methods can outperform common baselines even when using simple text prompts.|\n", "2410.18035": "|**2024-10-23**|**MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning**|Jingfan Zhang et.al.|[2410.18035](http://arxiv.org/abs/2410.18035)|null|Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods. However, they introduce significant latency in multi-tenant settings because LoRA modules and MOE routers are added to multiple linear modules in the Transformer layer. To address this issue, we propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant. MiLoRA differs from previous MOE-style LoRA methods by treating each LoRA module as an expert and employing a prompt-aware routing mechanism. This mechanism computes the expert routing results once before generating the first new token and reuses them for subsequent tokens, reducing latency. Extensive experiments and analysis on commonsense reasoning tasks, math reasoning tasks, and widely used LLM evaluation benchmarks demonstrate that MiLoRA consistently outperforms strong PEFT baselines with comparable tunable parameter budgets. Additionally, MiLoRA significantly reduces latency in multi-tenant settings compared with previous LoRA-based methods.|\n", "2410.18032": "|**2024-10-23**|**GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration**|Xin Li et.al.|[2410.18032](http://arxiv.org/abs/2410.18032)|**[link](https://github.com/bupt-gamma/graphteam)**|**Graphs are widely used for modeling relational data in real-world scenarios, such as social networks and urban computing. Existing graph analysis approaches based on large language models (LLMs) either integrate graph neural networks (GNNs) for specific machine learning tasks, limiting their transferability, or rely solely on the LLM's own reasoning ability, resulting in suboptimal performance. To address these limitations, we take advantage of recent advances in LLM-based agents, which have shown the ability to use external knowledge or tools for problem solving. By simulating human problem-solving strategies such as analogy and collaboration, we propose an LLM-based multi-agent system, named GraphTeam, for graph analysis. GraphTeam consists of five LLM-based agents from three modules, and agents with different specialities can collaborate with each other to address complex problems. Specifically, (1) input-output normalization module: the question agent extracts and refines four key arguments from the original question, facilitating problem understanding, and the answer agent organizes the results to meet the output requirement; (2) external knowledge retrieval module: we first build a knowledge base consisting of relevant documentation and experience information, and then the search agent retrieves the most relevant entries for each question; (3) problem-solving module: given the information retrieved by the search agent, the coding agent uses programming to generate solutions; if the coding agent does not work, the reasoning agent computes the results directly without programming. Extensive experiments on six graph analysis benchmarks demonstrate that GraphTeam achieves state-of-the-art performance, with an average accuracy improvement of 25.85% over the best baseline. Code and data are available.**|\n", "2410.18012": "|**2024-10-23**|**MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting**|Sungil Seok 
et.al.|[2410.18012](http://arxiv.org/abs/2410.18012)|null|\u7f8e\u56fd\u8054\u90a6\u57fa\u91d1\u5229\u7387\u5728\u56fd\u5185\u5916\u91d1\u878d\u5e02\u573a\u4e2d\u626e\u6f14\u7740\u91cd\u8981\u89d2\u8272\u3002\u7136\u800c\uff0c\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8be5\u5229\u7387\u8c03\u6574\u7684\u5f71\u54cd\u4e0a\uff0c\u800c\u4e0d\u662f\u51b3\u7b56\u8fc7\u7a0b\u672c\u8eab\u3002\u6700\u8fd1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fdb\u6b65\u63d0\u4f9b\u4e86\u4e00\u79cd\u53ef\u80fd\u7684\u65b9\u6cd5\u6765\u91cd\u6784\u8d1f\u8d23\u8bbe\u5b9a\u8054\u90a6\u57fa\u91d1\u5229\u7387\u7684\u8054\u90a6\u516c\u5f00\u5e02\u573a\u59d4\u5458\u4f1a\uff08FOMC\uff09\u4f1a\u8bae\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e94\u9636\u6bb5\u7684FOMC\u4f1a\u8bae\u6a21\u62df\u6846\u67b6MiniFed\uff0c\u8be5\u6846\u67b6\u4f7f\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684FOMC\u4f1a\u8bae\u6210\u5458\uff0c\u5e76\u4f18\u5316FOMC\u7ed3\u6784\u3002\u6b64\u6846\u67b6\u6709\u6548\u5730\u91cd\u65b0\u6fc0\u6d3b\u4e86FOMC\u4f1a\u8bae\u8fc7\u7a0b\uff0c\u5e76\u6709\u52a9\u4e8e\u9884\u6d4b\u8054\u90a6\u57fa\u91d1\u5229\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u63d0\u51fa\u7684MiniFed\u6846\u67b6\u5728\u8054\u90a6\u57fa\u91d1\u5229\u7387\u9884\u6d4b\u65b9\u9762\u5177\u6709\u9ad8\u51c6\u786e\u5ea6\uff0c\u5e76\u4e14\u5728\u4ee3\u7406\u884c\u4e3a\u4e0a\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u5bf9\u5e94\u8005\u4fdd\u6301\u4e00\u81f4\u3002\u9274\u4e8e\u76ee\u524d\u5f88\u5c11\u6709\u7814\u7a76\u5229\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u5927\u89c4\u6a21\u7684\u73b0\u5b9e\u4e16\u754c\u4f1a\u8bae\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u53ef\u4ee5\u4f5c\u4e3a\u672a\u6765\u53d1\u5c55\u7684\u57fa\u51c6\u3002|\n", "2410.17954": "|**2024-10-23**|**ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference**|Xin He 
et.al.|[2410.17954](http://arxiv.org/abs/2410.17954)|null|\u7a00\u758f\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u5bc6\u96c6\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u4f46\u5728\u63a8\u7406\u90e8\u7f72\u8fc7\u7a0b\u4e2d\u9762\u4e34\u663e\u8457\u7684\u5185\u5b58\u9700\u6c42\u6311\u6218\u3002\u73b0\u6709\u7684\u5378\u8f7d\u6280\u672f\u6d89\u53ca\u5728GPU\u548cCPU\u4e4b\u95f4\u4ea4\u6362\u6fc0\u6d3b\u548c\u7a7a\u95f2\u7684\u4e13\u5bb6\uff0c\u4f46\u8fd9\u4e9b\u6280\u672f\u901a\u5e38\u53d7\u5230\u521a\u6027\u4e13\u5bb6\u7f13\u5b58\u673a\u5236\u7684\u9650\u5236\u3002\u8fd9\u4e9b\u673a\u5236\u65e0\u6cd5\u9002\u5e94\u52a8\u6001\u8def\u7531\uff0c\u5bfc\u81f4\u7f13\u5b58\u5229\u7528\u7387\u4f4e\u4e0b\uff0c\u6216\u5728\u9884\u6d4b\u8bad\u7ec3\u4e2d\u4ea7\u751f\u9ad8\u6602\u7684\u6210\u672c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u7279\u5b9a\u4e8e\u63a8\u7406\u7684\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86ExpertFlow\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u9002\u5e94\u7075\u6d3b\u8def\u7531\u5e76\u5b9e\u73b0\u4e13\u5bb6\u5728CPU\u548cGPU\u4e4b\u95f4\u7684\u9ad8\u6548\u8c03\u5ea6\u6765\u589e\u5f3a\u63a8\u7406\u6548\u7387\u3002\u8fd9\u51cf\u5c11\u4e86\u5f00\u9500\u5e76\u63d0\u5347\u4e86\u7cfb\u7edf\u6027\u80fd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6838\u5fc3\u662f\u4e00\u4e2a\u57fa\u4e8e\u9884\u6d4b\u8def\u7531\u8def\u5f84\u7684\u5378\u8f7d\u673a\u5236\uff0c\u5229\u7528\u8f7b\u91cf\u7ea7\u9884\u6d4b\u5668\u5728\u8ba1\u7b97\u5f00\u59cb\u524d\u51c6\u786e\u9884\u6d4b\u8def\u7531\u8def\u5f84\u3002\u8fd9\u79cd\u4e3b\u52a8\u7b56\u7565\u5141\u8bb8\u5b9e\u65f6\u7ea0\u6b63\u4e13\u5bb6\u7f13\u5b58\u4e2d\u7684\u9519\u8bef\uff0c\u663e\u8457\u63d0\u9ad8\u7f13\u5b58\u547d\u4e2d\u7387\u5e76\u51cf\u5c11\u4e13\u5bb6\u4f20\u8f93\u7684\u9891\u7387\uff0c\u4ece\u800c\u6700\u5c0f\u5316I/O\u5f00\u9500\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u65bd\u4e86\u4e00\u79cd\u
52a8\u6001\u4ee4\u724c\u8c03\u5ea6\u7b56\u7565\uff0c\u901a\u8fc7\u5728\u4e0d\u540c\u6279\u6b21\u95f4\u91cd\u65b0\u6392\u5217\u8f93\u5165\u4ee4\u724c\u6765\u4f18\u5316MoE\u63a8\u7406\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u51cf\u5c11\u4e86\u6bcf\u6279\u6b21\u6fc0\u6d3b\u7684\u4e13\u5bb6\u6570\u91cf\uff0c\u8fd8\u63d0\u9ad8\u4e86\u8ba1\u7b97\u6548\u7387\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cExpertFlow\u5b9e\u73b0\u4e86\u9ad8\u8fbe93.72%\u7684GPU\u5185\u5b58\u8282\u7701\uff0c\u5e76\u5c06\u63a8\u7406\u901f\u5ea6\u63d0\u5347\u81f3\u57fa\u7ebf\u65b9\u6cd5\u76842\u523010\u500d\uff0c\u7a81\u663e\u4e86\u5176\u6709\u6548\u6027\u548c\u4f5c\u4e3a\u8d44\u6e90\u53d7\u9650\u63a8\u7406\u573a\u666f\u4e0b\u7684\u7a33\u5065\u89e3\u51b3\u65b9\u6848\u7684\u91cd\u8981\u6027\u3002|\n", "2410.17952": "|**2024-10-23**|**SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains**|Ran Xu et.al.|[2410.17952](http://arxiv.org/abs/2410.17952)|null|\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u901a\u8fc7\u6574\u5408\u5916\u90e8\u77e5\u8bc6\u589e\u5f3a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u95ee\u9898\u56de\u7b54\uff08QA\uff09\u80fd\u529b\u3002\u7136\u800c\uff0c\u5c06\u901a\u7528\u7684RAG\u7cfb\u7edf\u9002\u5e94\u5230\u79d1\u5b66\u548c\u533b\u5b66\u7b49\u4e13\u4e1a\u9886\u57df\u65f6\uff0c\u7531\u4e8e\u5206\u5e03\u5dee\u5f02\u548c\u6709\u9650\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\u8bbf\u95ee\uff0c\u4f1a\u9762\u4e34\u72ec\u7279\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86SimRAG\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u8bad\u7ec3\u65b9\u6cd5\uff0c\u4f7fLLM\u5177\u5907\u95ee\u9898\u56de\u7b54\u548c\u95ee\u9898\u751f\u6210\u7684\u8054\u5408\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u9886\u57df\u9002\u5e94\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u5728\u6307\u4ee4\u8ddf\u968f\u3001\u95ee\u7b54\u548c\u641c\u7d22\u76f8\u5173\u65
70\u636e\u4e0a\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u540e\uff0c\u5b83\u63d0\u793a\u76f8\u540c\u7684LLM\u4ece\u65e0\u6807\u7b7e\u8bed\u6599\u5e93\u4e2d\u751f\u6210\u591a\u6837\u5316\u7684\u9886\u57df\u76f8\u5173\u95ee\u9898\uff0c\u5e76\u91c7\u7528\u989d\u5916\u7684\u8fc7\u6ee4\u7b56\u7565\u6765\u4fdd\u7559\u9ad8\u8d28\u91cf\u7684\u5408\u6210\u793a\u4f8b\u3002\u901a\u8fc7\u5229\u7528\u8fd9\u4e9b\u5408\u6210\u793a\u4f8b\uff0cLLM\u53ef\u4ee5\u5728\u7279\u5b9a\u9886\u57df\u7684RAG\u4efb\u52a1\u4e2d\u63d0\u5347\u6027\u80fd\u3002\u5728\u8de8\u8d8a\u4e24\u4e2a\u57fa\u7840\u6a21\u578b\u5927\u5c0f\u548c\u4e09\u4e2a\u9886\u57df\u768411\u4e2a\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSimRAG\u6bd4\u57fa\u7ebf\u65b9\u6cd5\u9ad8\u51fa1.2%\u81f38.6%\u3002|\n", "2410.17950": "|**2024-10-23**|**Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling**|Nirav Bhan et.al.|[2410.17950](http://arxiv.org/abs/2410.17950)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u975e\u51e1\u7684\u80fd\u529b\uff0c\u4f46\u7531\u4e8e\u5de5\u5177\u4f7f\u7528\u548c\u529f\u80fd\u8c03\u7528\u65b9\u9762\u7684\u6311\u6218\uff0c\u5176\u7ecf\u6d4e\u5f71\u54cd\u53d7\u5230\u4e86\u9650\u5236\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aThorV2\u7684\u65b0\u67b6\u6784\uff0c\u8be5\u67b6\u6784\u663e\u8457\u589e\u5f3a\u4e86LLMs\u7684\u529f\u80fd\u8c03\u7528\u80fd\u529b\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e13\u6ce8\u4e8eHubSpot 
CRM\u64cd\u4f5c\uff0c\u4ee5\u8bc4\u4f30ThorV2\u4e0eOpenAI\u548cAnthropic\u7684\u9886\u5148\u6a21\u578b\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0cThorV2\u5728\u5355\u4e2a\u548c\u591aAPI\u8c03\u7528\u4efb\u52a1\u7684\u51c6\u786e\u6027\u3001\u53ef\u9760\u6027\u3001\u5ef6\u8fdf\u548c\u6210\u672c\u6548\u7387\u65b9\u9762\u5747\u4f18\u4e8e\u73b0\u6709\u6a21\u578b\u3002\u6211\u4eec\u8fd8\u8868\u660e\uff0cThorV2\u5728\u591a\u6b65\u9aa4\u4efb\u52a1\u4e2d\u7684\u53ef\u9760\u6027\u66f4\u5f3a\uff0c\u5e76\u4e14\u53ef\u6269\u5c55\u6027\u66f4\u597d\uff0c\u76f8\u6bd4\u4f20\u7edf\u6a21\u578b\u5177\u6709\u660e\u663e\u4f18\u52bf\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4ee4\u4eba\u5174\u594b\u7684\u53ef\u80fd\u6027\uff0c\u5373\u4f7f\u7528\u663e\u8457\u66f4\u5c0f\u7684LLMs\u5b9e\u73b0\u6bd4\u5f53\u4eca\u6700\u4f73\u6a21\u578b\u66f4\u51c6\u786e\u7684\u529f\u80fd\u8c03\u7528\u3002\u8fd9\u4e9b\u8fdb\u5c55\u5bf9\u4e8e\u5f00\u53d1\u66f4\u5f3a\u5927\u7684AI\u52a9\u624b\u4ee5\u53caLLMs\u5728\u73b0\u5b9e\u573a\u666f\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2410.17922": "|**2024-10-23**|**Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models**|He Cao et.al.|[2410.17922](http://arxiv.org/abs/2410.17922)|**[link](https://github.com/idea-xl/g4d)**|\u968f\u7740\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u90e8\u7f72\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u6027\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u9632\u5fa1\u65b9\u6cd5\u5f80\u5f80\u5b58\u5728\u4e24\u4e2a\u5173\u952e\u95ee\u9898\uff1a(i) \u9632\u5fa1\u80fd\u529b\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u5728\u5316\u5b66\u7b49\u7279\u5b9a\u9886\u57df\u573a\u666f\u4e0b\uff0c\u7f3a\u4e4f\u4e13\u95e8\u77e5\u8bc6\u53ef\u80fd\u5bfc\u81f4\u5bf9\u6076\u610f\u67e5\u8be2\u751f\u6210\u6709\u5bb3\u54cd\u5e94\u3002(ii) 
\u8fc7\u5ea6\u9632\u5fa1\uff0c\u8fd9\u4f1a\u635f\u5bb3LLMs\u7684\u4e00\u822c\u5b9e\u7528\u6027\u548c\u54cd\u5e94\u6027\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8e\u591a\u4ee3\u7406\u7684\u9632\u5fa1\u6846\u67b6\uff0c\u79f0\u4e3aGuide for Defense (G4D)\uff0c\u8be5\u6846\u67b6\u5229\u7528\u51c6\u786e\u7684\u5916\u90e8\u4fe1\u606f\u63d0\u4f9b\u7528\u6237\u610f\u56fe\u7684\u65e0\u504f\u603b\u7ed3\u4ee5\u53ca\u5206\u6790\u6027\u5b89\u5168\u54cd\u5e94\u6307\u5bfc\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u6d41\u884c\u7684\u624b\u518c\u9003\u8131\u653b\u51fb\u548c\u826f\u6027\u6570\u636e\u96c6\u4e0a\uff0c\u6211\u4eec\u7684G4D\u53ef\u4ee5\u5728\u4e0d\u635f\u5bb3\u6a21\u578b\u4e00\u822c\u529f\u80fd\u7684\u60c5\u51b5\u4e0b\u589e\u5f3aLLM\u5728\u901a\u7528\u548c\u7279\u5b9a\u9886\u57df\u7684\u9c81\u68d2\u6027\u3002|\n", "2410.18975": "|**2024-10-24**|**Unbounded: A Generative Infinite Game of Character Life Simulation**|Jialu Li et.al.|[2410.18975](http://arxiv.org/abs/2410.18975)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u751f\u6210\u65e0\u9650\u6e38\u620f\u7684\u6982\u5ff5\uff0c\u8fd9\u662f\u4e00\u79cd\u89c6\u9891\u6e38\u620f\uff0c\u5b83\u8d85\u8d8a\u4e86\u4f20\u7edf\u56fa\u5b9a\u3001\u786c\u7f16\u7801\u7cfb\u7edf\u7684\u8fb9\u754c\uff0c\u901a\u8fc7\u4f7f\u7528\u751f\u6210\u6a21\u578b\u6765\u5b9e\u73b0\u3002\u53d7James P. 
Carse\u5173\u4e8e\u6709\u9650\u6e38\u620f\u548c\u65e0\u9650\u6e38\u620f\u533a\u522b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u5229\u7528\u6700\u8fd1\u5728\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u65b9\u9762\u7684\u8fdb\u5c55\u6765\u521b\u5efa\u300a\u65e0\u754c\u300b\u2014\u2014\u4e00\u6b3e\u5b8c\u5168\u5c01\u88c5\u5728\u751f\u6210\u6a21\u578b\u4e2d\u7684\u89d2\u8272\u751f\u6d3b\u6a21\u62df\u6e38\u620f\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u300a\u65e0\u754c\u300b\u53d7\u5230\u6c99\u76d2\u751f\u6d3b\u6a21\u62df\u6e38\u620f\u7684\u542f\u53d1\uff0c\u5141\u8bb8\u4f60\u901a\u8fc7\u5582\u517b\u3001\u73a9\u800d\u548c\u5f15\u5bfc\u7b49\u65b9\u5f0f\u4e0e\u4f60\u5728\u865a\u62df\u4e16\u754c\u4e2d\u7684\u81ea\u4e3b\u865a\u62df\u89d2\u8272\u4e92\u52a8\uff0c\u5176\u4e2d\u4e00\u4e9b\u673a\u5236\u662f\u5f00\u653e\u5f0f\u7684\uff0c\u5e76\u4e14\u53ef\u4ee5\u662f\u7a81\u53d1\u6027\u7684\u3002\u4e3a\u4e86\u5f00\u53d1\u300a\u65e0\u754c\u300b\uff0c\u6211\u4eec\u5728\u8bed\u8a00\u6a21\u578b\u548c\u89c6\u89c9\u751f\u6210\u9886\u57df\u63d0\u51fa\u4e86\u6280\u672f\u4e0a\u7684\u521b\u65b0\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\uff1a(1)\u4e00\u79cd\u4e13\u95e8\u8bbe\u8ba1\u7684\u3001\u7ecf\u8fc7\u84b8\u998f\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u5b9e\u65f6\u52a8\u6001\u751f\u6210\u6e38\u620f\u673a\u5236\u3001\u53d9\u4e8b\u548c\u89d2\u8272\u4e92\u52a8\uff0c(2)\u4e00\u79cd\u65b0\u7684\u52a8\u6001\u533a\u57df\u56fe\u50cf\u63d0\u793a\u9002\u914d\u5668\uff08IP-Adapter\uff09\uff0c\u7528\u4e8e\u89c6\u89c9\u6a21\u578b\uff0c\u786e\u4fdd\u89d2\u8272\u5728\u591a\u4e2a\u73af\u5883\u4e2d\u7684\u89c6\u89c9\u751f\u6210\u65e2\u4e00\u81f4\u53c8\u7075\u6d3b\u3002\u6211\u4eec\u901a\u8fc7\u5b9a\u6027\u548c\u5b9a\u91cf\u5206\u6790\u5bf9\u6211\u4eec\u7684\u7cfb\u7edf\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u89d2\u8272\u751f\u6d3b\u6a21\u62df\u3001\u7528\u6237\u6307\u4ee4\u9075\u5faa\u3001\u53d9\u4e8b\u
8fde\u8d2f\u6027\u548c\u89c6\u89c9\u4e00\u81f4\u6027\u65b9\u9762\uff0c\u4e0e\u4f20\u7edf\u76f8\u5173\u65b9\u6cd5\u76f8\u6bd4\u6709\u663e\u8457\u6539\u8fdb\u3002|\n", "2410.18967": "|**2024-10-24**|**Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms**|Zhangheng Li et.al.|[2410.18967](http://arxiv.org/abs/2410.18967)|null|\u6784\u5efa\u4e00\u4e2a\u901a\u7528\u7684\u7528\u6237\u754c\u9762\uff08UI\uff09\u7406\u89e3\u6a21\u578b\u9762\u4e34\u7740\u8bf8\u591a\u6311\u6218\uff0c\u5305\u62ec\u5e73\u53f0\u591a\u6837\u6027\u3001\u5206\u8fa8\u7387\u53d8\u5316\u548c\u6570\u636e\u9650\u5236\u7b49\u95ee\u9898\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aFerret-UI 2\u7684\u65b0\u6a21\u578b\uff0c\u8fd9\u662f\u4e00\u79cd\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u8de8\u591a\u79cd\u5e73\u53f0\u7684\u901a\u7528UI\u7406\u89e3\uff0c\u5305\u62eciPhone\u3001Android\u3001iPad\u3001\u7f51\u9875\u548cApple TV\u7b49\u5e73\u53f0\u3002Ferret-UI 2\u5728\u539f\u6709Ferret-UI\u7684\u57fa\u7840\u4e0a\u5f15\u5165\u4e86\u4e09\u9879\u5173\u952e\u521b\u65b0\uff1a\u652f\u6301\u591a\u79cd\u5e73\u53f0\u7c7b\u578b\u3001\u901a\u8fc7\u81ea\u9002\u5e94\u7f29\u653e\u5b9e\u73b0\u9ad8\u5206\u8fa8\u7387\u611f\u77e5\uff0c\u4ee5\u53ca\u5229\u7528GPT-4o\u7ed3\u5408\u96c6\u5408\u6807\u8bb0\u89c6\u89c9\u63d0\u793a\u751f\u6210\u9ad8\u7ea7\u4efb\u52a1\u8bad\u7ec3\u6570\u636e\u3002\u8fd9\u4e9b\u6539\u8fdb\u4f7fFerret-UI 
2\u80fd\u591f\u6267\u884c\u590d\u6742\u7684\u3001\u4ee5\u7528\u6237\u4e3a\u4e2d\u5fc3\u7684\u4ea4\u4e92\uff0c\u4f7f\u5176\u5728\u4e0d\u65ad\u6269\u5c55\u7684\u5e73\u53f0\u751f\u6001\u7cfb\u7edf\u4e2d\u5177\u6709\u9ad8\u5ea6\u7684\u901a\u7528\u6027\u548c\u9002\u5e94\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8bc1\u660e\uff0c\u5728\u6307\u5411\u3001\u5b9a\u4f4d\u3001\u4ee5\u7528\u6237\u4e3a\u4e2d\u5fc3\u7684\u9ad8\u7ea7\u4efb\u52a1\uff08\u5305\u542b9\u4e2a\u5b50\u4efb\u52a1\u00d75\u4e2a\u5e73\u53f0\uff09\u3001GUIDE\u4e0b\u4e00\u6b65\u9884\u6d4b\u6570\u636e\u96c6\u548cGUI-World\u591a\u5e73\u53f0\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cFerret-UI 2\u663e\u8457\u4f18\u4e8eFerret-UI\uff0c\u5e76\u4e14\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u8de8\u5e73\u53f0\u8fc1\u79fb\u80fd\u529b\u3002|\n", "2410.18966": "|**2024-10-24**|**Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions**|Yujuan Fu et.al.|[2410.18966](http://arxiv.org/abs/2410.18966)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u663e\u793a\u51fa\u4f5c\u4e3a\u901a\u7528\u4efb\u52a1\u89e3\u51b3\u8005\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u662f\u5728\u5927\u91cf\u6570\u636e\u4e0a\u8fdb\u884c\u8bad\u7ec3\u7684\uff0c\u56e0\u6b64\u5bf9\u5176\u8bc4\u4f30\u7684\u4e00\u4e2a\u91cd\u8981\u95ee\u9898\u662f\u6570\u636e\u6c61\u67d3\u95ee\u9898\uff0c\u5373\u8bad\u7ec3\u6570\u636e\u548c\u8bc4\u4f30\u6570\u636e\u96c6\u4e4b\u95f4\u7684\u91cd\u53e0\u4f1a\u5938\u5927\u6027\u80fd\u8bc4\u4f30\u3002\u867d\u7136\u5df2\u7ecf\u5f00\u53d1\u4e86\u591a\u79cd\u65b9\u6cd5\u6765\u8bc6\u522b\u6570\u636e\u6c61\u67d3\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u7279\u5b9a\u7684\u5047\u8bbe\uff0c\u800c\u8fd9\u4e9b\u5047\u8bbe\u53ef\u80fd\u5e76\u4e0d\u666e\u904d\u9002\u7528\u4e8e\u4e0d\u540c\u7684\u8bbe\u7f6e\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u5dee\u8dd
d\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u56de\u987e\u4e8647\u7bc7\u5173\u4e8e\u6570\u636e\u6c61\u67d3\u68c0\u6d4b\u7684\u8bba\u6587\uff0c\u5bf9\u5176\u4e2d\u7684\u57fa\u7840\u5047\u8bbe\u8fdb\u884c\u4e86\u5206\u7c7b\uff0c\u5e76\u8bc4\u4f30\u4e86\u5b83\u4eec\u662f\u5426\u7ecf\u8fc7\u4e25\u683c\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u786e\u5b9a\u5e76\u5206\u6790\u4e86\u516b\u7c7b\u5047\u8bbe\uff0c\u5e76\u4ee5\u4e09\u4e2a\u5047\u8bbe\u4f5c\u4e3a\u6848\u4f8b\u7814\u7a76\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u5728\u5bf9\u7528\u4e8e\u9884\u8bad\u7ec3LLMs\u7684\u5b9e\u4f8b\u8fdb\u884c\u5206\u7c7b\u65f6\uff0c\u57fa\u4e8e\u8fd9\u4e09\u79cd\u5047\u8bbe\u7684\u68c0\u6d4b\u65b9\u6cd5\u7684\u8868\u73b0\u63a5\u8fd1\u4e8e\u968f\u673a\u731c\u6d4b\uff0c\u8fd9\u8868\u660e\u5f53\u524d\u7684LLMs\u5b66\u4e60\u7684\u662f\u6570\u636e\u5206\u5e03\u800c\u4e0d\u662f\u8bb0\u5fc6\u4e2a\u522b\u5b9e\u4f8b\u3002\u603b\u4f53\u800c\u8a00\uff0c\u8fd9\u9879\u5de5\u4f5c\u5f3a\u8c03\u4e86\u65b9\u6cd5\u660e\u786e\u9648\u8ff0\u5176\u57fa\u7840\u5047\u8bbe\u5e76\u5728\u5404\u79cd\u573a\u666f\u4e0b\u6d4b\u8bd5\u5176\u6709\u6548\u6027\u7684\u91cd\u8981\u6027\u3002|\n", "2410.18963": "|**2024-10-24**|**OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning**|Xiaoqiang Wang 
et.al.|[2410.18963](http://arxiv.org/abs/2410.18963)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u81ea\u52a8\u5316\u590d\u6742\u4efb\u52a1\u5982\u7f51\u9875\u6d4f\u89c8\u548c\u6e38\u620f\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u8de8\u591a\u6837\u5316\u5e94\u7528\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86OSCAR\uff1a\u901a\u8fc7\u72b6\u6001\u611f\u77e5\u63a8\u7406\u548c\u91cd\u89c4\u5212\u7684\u64cd\u4f5c\u7cfb\u7edf\u63a7\u5236\u3002OSCAR\u662f\u4e00\u79cd\u901a\u7528\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u6807\u51c6\u5316\u7684\u63a7\u5236\u65b9\u5f0f\uff08\u5982\u9f20\u6807\u548c\u952e\u76d8\u8f93\u5165\uff09\u81ea\u4e3b\u5bfc\u822a\u548c\u4e0e\u5404\u79cd\u684c\u9762\u548c\u79fb\u52a8\u5e94\u7528\u7a0b\u5e8f\u8fdb\u884c\u4ea4\u4e92\uff0c\u540c\u65f6\u5904\u7406\u5c4f\u5e55\u56fe\u50cf\u4ee5\u5b8c\u6210\u7528\u6237\u547d\u4ee4\u3002OSCAR\u5c06\u4eba\u7c7b\u6307\u4ee4\u8f6c\u6362\u4e3a\u53ef\u6267\u884c\u7684Python\u4ee3\u7801\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u7684\u7cbe\u786e\u63a7\u5236\u3002\u4e3a\u4e86\u589e\u5f3a\u7a33\u5b9a\u6027\u548c\u9002\u5e94\u6027\uff0cOSCAR\u4f5c\u4e3a\u4e00\u4e2a\u72b6\u6001\u673a\u8fd0\u884c\uff0c\u5e76\u914d\u5907\u4e86\u9519\u8bef\u5904\u7406\u673a\u5236\u548c\u52a8\u6001\u4efb\u52a1\u91cd\u89c4\u5212\u529f\u80fd\uff0c\u4f7f\u5176\u80fd\u591f\u9ad8\u6548\u5730\u5b9e\u65f6\u8c03\u6574\u4ee5\u5e94\u5bf9\u53cd\u9988\u548c\u5f02\u5e38\u60c5\u51b5\u3002\u6211\u4eec\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u5728\u591a\u6837\u5316\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u5c55\u793a\u4e86OSCAR\u7684\u6709\u6548\u6027\uff0c\u5728\u8fd9\u4e9b\u6d4b\u8bd5\u4e2d\uff
0c\u5b83\u5c06\u590d\u6742\u7684\u64cd\u4f5c\u6d41\u7a0b\u7b80\u5316\u4e3a\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u547d\u4ee4\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u7528\u6237\u7684\u751f\u4ea7\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u5728\u53d1\u8868\u540e\u5f00\u6e90\u3002|\n", "2410.18957": "|**2024-10-24**|**Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code**|Jipeng Zhang et.al.|[2410.18957](http://arxiv.org/abs/2410.18957)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u9ad8\u8d44\u6e90\u7f16\u7a0b\u8bed\u8a00\uff08HRPLs\uff09\u5982Python\u7684\u4ee3\u7801\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4f4e\u8d44\u6e90\u7f16\u7a0b\u8bed\u8a00\uff08LRPLs\uff09\u5982Racket\u6216D\u4e0a\u7684\u8868\u73b0\u5219\u663e\u8457\u900a\u8272\u3002\u8fd9\u79cd\u6027\u80fd\u5dee\u8ddd\u52a0\u5267\u4e86\u6570\u5b57\u9e3f\u6c9f\uff0c\u963b\u788d\u4e86\u4f7f\u7528LRPLs\u7684\u5f00\u53d1\u8005\u4eceLLM\u7684\u8fdb\u6b65\u4e2d\u53d7\u76ca\uff0c\u5e76\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u5f3a\u5316\u4e86\u672a\u5145\u5206\u4ee3\u8868\u7684\u7f16\u7a0b\u793e\u533a\u4e4b\u95f4\u7684\u521b\u65b0\u5dee\u5f02\u3002\u867d\u7136\u4e3aLRPLs\u751f\u6210\u989d\u5916\u8bad\u7ec3\u6570\u636e\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u4f46\u5b83\u9762\u4e34\u7740\u4e24\u4e2a\u5173\u952e\u6311\u6218\uff1a\u4eba\u5de5\u6807\u6ce8\u65e2\u8d39\u65f6\u53c8\u6602\u8d35\uff0c\u800cLLM\u751f\u6210\u7684LRPL\u4ee3\u7801\u8d28\u91cf\u901a\u5e38\u8f83\u5dee\u3002\u8fd9\u4e00\u95ee\u9898\u7684\u6839\u672c\u539f\u56e0\u5728\u4e8e\u81ea\u7136\u8bed\u8a00\u5230\u7f16\u7a0b\u8bed\u8a00\u7684\u5dee\u8ddd\uff08NL-PL 
Gap\uff09\uff0c\u5728LRPLs\u4e2d\u5c24\u5176\u660e\u663e\uff0c\u56e0\u4e3a\u5bf9\u9f50\u7684\u6570\u636e\u6709\u9650\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aBridge-Coder\u7684\u65b0\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528LLMs\u7684\u5185\u5728\u80fd\u529b\u6765\u589e\u5f3a\u5176\u5728LRPLs\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e24\u4e2a\u5173\u952e\u9636\u6bb5\u3002\u9996\u5148\u662f\u6865\u63a5\u751f\u6210\uff0c\u901a\u8fc7\u5229\u7528LLMs\u5bf9\u4e00\u822c\u77e5\u8bc6\u7684\u7406\u89e3\u3001\u5bf9HRPLs\u7684\u719f\u7ec3\u7a0b\u5ea6\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\u6765\u521b\u5efa\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u96c6\u3002\u7136\u540e\u662f\u6865\u63a5\u5bf9\u9f50\uff0c\u9010\u6b65\u6539\u5584\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u4e0eLRPLs\u4e4b\u95f4\u7684\u5bf9\u9f50\u3002\u5b9e\u9a8c\u7ed3\u679c\u5728\u591a\u79cdLRPLs\u4e2d\u663e\u793a\uff0cBridge-Coder\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u548c\u6cdb\u5316\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8be6\u7ec6\u5206\u6790\u4e86\u65b9\u6cd5\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff0c\u4e3a\u672a\u6765\u89e3\u51b3\u4e0eLRPLs\u76f8\u5173\u6311\u6218\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2410.18955": "|**2024-10-24**|**BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning**|Yujuan Velvin Fu 
et.al.|[2410.18955](http://arxiv.org/abs/2410.18955)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982ChatGPT\u901a\u8fc7\u5728\u5927\u89c4\u6a21\u548c\u591a\u6837\u5316\u7684\u6307\u4ee4\u8ddf\u968f\u8bed\u6599\u5e93\u4e0a\u8fdb\u884c\u5fae\u8c03\uff0c\u80fd\u591f\u6cdb\u5316\u5230\u65b0\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7ecf\u8fc7\u6307\u4ee4\u5fae\u8c03\u7684LLMs\u5728\u9700\u8981\u9886\u57df\u77e5\u8bc6\u3001\u7ec6\u7c92\u5ea6\u6587\u672c\u7406\u89e3\u548c\u7ed3\u6784\u5316\u6570\u636e\u63d0\u53d6\u7684\u4e13\u4e1a\u533b\u5b66\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u4efb\u52a1\u4e2d\u5f80\u5f80\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\uff1a(1) \u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u63d0\u793a\u683c\u5f0f\uff0c\u9002\u7528\u4e8e7\u4e2a\u91cd\u8981\u7684NLU\u4efb\u52a1\uff0c\u901a\u8fc7\u8de8\u5ea6\u63d0\u53d6\u548c\u591a\u9009\u9898\u95ee\u7b54\uff08QA\uff09\u6765\u5b9e\u73b0\uff1b(2) \u521b\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6MNLU-Instruct\uff0c\u5229\u7528\u4e86\u591a\u79cd\u73b0\u6709\u7684\u5f00\u6e90\u533b\u5b66NLU\u8bed\u6599\u5e93\uff1b(3) 
\u901a\u8fc7\u5728MNLU-Instruct\u4e0a\u5bf9BioMistral\u8fdb\u884c\u5fae\u8c03\uff0c\u5f00\u53d1\u4e86BioMistral-NLU\uff0c\u4e00\u4e2a\u5177\u6709\u901a\u7528\u6027\u7684\u533b\u5b66NLU\u6a21\u578b\u3002\u6211\u4eec\u5728\u96f6\u6837\u672c\u8bbe\u7f6e\u4e0b\u8bc4\u4f30\u4e86BioMistral-NLU\uff0c\u5728\u4e24\u4e2a\u5e7f\u6cdb\u91c7\u7528\u7684\u533b\u5b66NLU\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5373\u751f\u7269\u533b\u5b66\u8bed\u8a00\u7406\u89e3\u8bc4\u4f30\uff08BLUE\uff09\u548c\u751f\u7269\u533b\u5b66\u8bed\u8a00\u7406\u89e3\u548c\u63a8\u7406\u57fa\u51c6\uff08BLURB\uff09\u4e2d\u76846\u4e2a\u91cd\u8981NLU\u4efb\u52a1\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684BioMistral-NLU\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u539f\u59cb\u7684BioMistral\u4ee5\u53ca\u4e13\u6709\u7684LLMs\u2014\u2014ChatGPT\u548cGPT-4\u3002\u6211\u4eec\u4e0e\u6570\u636e\u96c6\u65e0\u5173\u7684\u63d0\u793a\u7b56\u7565\u548c\u5728\u5404\u79cdNLU\u4efb\u52a1\u4e0a\u7684\u6307\u4ee4\u5fae\u8c03\u6b65\u9aa4\u589e\u5f3a\u4e86LLMs\u5728\u5404\u79cd\u533b\u5b66NLU\u4efb\u52a1\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u3002\u6d88\u878d\u5b9e\u9a8c\u663e\u793a\uff0c\u5373\u4f7f\u603b\u7684\u8bad\u7ec3\u5b9e\u4f8b\u6570\u91cf\u4fdd\u6301\u4e0d\u53d8\uff0c\u6307\u4ee4\u5fae\u8c03\u7684\u4efb\u52a1\u79cd\u7c7b\u8d8a\u5e7f\uff0c\u4e0b\u6e38\u96f6\u6837\u672c\u6cdb\u5316\u80fd\u529b\u4e5f\u8d8a\u5f3a\u3002|\n", "2410.18952": "|**2024-10-24**|**Dynamic Vocabulary Pruning in Early-Exit LLMs**|Jort Vincenti 
et.al.|[2410.18952](http://arxiv.org/abs/2410.18952)|**[link](https://github.com/matteonulli/vocabulary_pruning)**|**\u589e\u52a0\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89c4\u6a21\u5df2\u88ab\u8bc1\u660e\u53ef\u4ee5\u63d0\u9ad8\u5176\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e5f\u5e26\u6765\u4e86\u63a8\u7406\u901f\u5ea6\u53d8\u6162\u548c\u6210\u672c\u589e\u52a0\u7684\u95ee\u9898\u3002\u65e9\u671f\u9000\u51fa\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5728\u4e2d\u95f4\u5c42\u8fdb\u884c\u9884\u6d4b\u6765\u63d0\u9ad8LLM\u63a8\u7406\u7684\u6548\u7387\u3002\u7136\u800c\uff0c\u73b0\u4ee3LLMs\u4e2d\u7684\u5927\u8bcd\u6c47\u91cf\u4f7f\u5f97\u6240\u9700\u7684\u7f6e\u4fe1\u5ea6\u4f30\u8ba1\u5728\u8ba1\u7b97\u4e0a\u975e\u5e38\u6602\u8d35\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u6548\u7387\u63d0\u5347\u7684\u6548\u679c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u5728\u6d4b\u8bd5\u65f6\u52a8\u6001\u526a\u679d\u8bcd\u6c47\u8868\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u8bcd\u6c47\u8868\u5728\u6700\u521d\u7684\u67d0\u4e00\u5c42\u88ab\u526a\u679d\uff0c\u5e76\u5728\u6574\u4e2a\u524d\u5411\u4f20\u9012\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u8f83\u5c0f\u7684\u8bcd\u6c47\u8868\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u8fd9\u79cd\u540e\u5904\u7406\u52a8\u6001\u8bcd\u6c47\u8868\u526a\u679d\u65b9\u6cd5\u63d0\u9ad8\u4e86\u65e9\u671f\u9000\u51faLLM\u4e2d\u7f6e\u4fe1\u5ea6\u4f30\u8ba1\u7684\u6548\u7387\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u6027\u80fd\u3002**|\n", "2410.18927": "|**2024-10-24**|**SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models**|Zonghao Ying 
et.al.|[2410.18927](http://arxiv.org/abs/2410.18927)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7528\u6237\u751f\u6210\u6709\u5bb3\u8f93\u51fa\u65b9\u9762\u8868\u73b0\u51fa\u5f3a\u70c8\u7684\u5b89\u5168\u9690\u60a3\uff0c\u8fd9\u4fc3\u4f7f\u4e86\u5b89\u5168\u8bc4\u4f30\u57fa\u51c6\u7684\u53d1\u5c55\u3002\u7136\u800c\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u73b0\u6709\u7684MLLMs\u5b89\u5168\u57fa\u51c6\u5b58\u5728\u67e5\u8be2\u8d28\u91cf\u4f4e\u548c\u8bc4\u4f30\u53ef\u9760\u6027\u5dee\u7684\u95ee\u9898\uff0c\u8fd9\u4e9b\u95ee\u9898\u9650\u5236\u4e86\u5bf9MLLMs\u5b89\u5168\u5f71\u54cd\u7684\u68c0\u6d4b\uff0c\u56e0\u4e3a\u968f\u7740MLLMs\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5df2\u663e\u5f97\u4e0d\u8db3\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSafeBench\u7684\u7efc\u5408\u6846\u67b6\uff0c\u7528\u4e8e\u5bf9MLLMs\u8fdb\u884c\u5b89\u5168\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u5168\u9762\u7684\u6709\u5bb3\u67e5\u8be2\u6570\u636e\u96c6\u548c\u4e00\u79cd\u81ea\u52a8\u8bc4\u4f30\u534f\u8bae\uff0c\u5206\u522b\u65e8\u5728\u89e3\u51b3\u4e0a\u8ff0\u95ee\u9898\u3002\u6211\u4eec\u9996\u5148\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5b89\u5168\u6570\u636e\u96c6\u751f\u6210\u7ba1\u9053\uff0c\u5728\u8fd9\u4e2a\u7ba1\u9053\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528\u4e00\u7ec4LLM\u8bc4\u5224\u8005\u6765\u8bc6\u522b\u548c\u5206\u7c7b\u5bf9MLLMs\u6700\u5177\u5371\u5bb3\u6027\u548c\u591a\u6837\u6027\u7684\u98ce\u9669\u573a\u666f\uff1b\u57fa\u4e8e\u8fd9\u79cd\u5206\u7c7b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8981\u6c42\u8fd9\u4e9b\u8bc4\u5224\u8005\u76f8\u5e94\u5730\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u6709\u5bb3\u67e5\u8be2\uff0c\u4ece\u800c\u4ea7\u751f\u4e8623\u79cd\u98ce\u9669\u573a\u666f\u548c2300\u4e2a\u591a\u6a21\u6001\u6709\u5bb3\u67e5\u8be2\u5bf9\u3002\u5728\u5b89\u5168\u8bc4\u4f30\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u501f\u9274\u53f8\u6cd5\u7a0b\u5e8
f\u4e2d\u7684\u966a\u5ba1\u56e2\u5236\u5ea6\uff0c\u5f00\u521b\u4e86\u4e00\u79cd\u966a\u5ba1\u56e2\u5ba1\u8bae\u8bc4\u4f30\u534f\u8bae\uff0c\u8be5\u534f\u8bae\u91c7\u7528\u534f\u4f5c\u5f0fLLM\u6765\u8bc4\u4f30\u76ee\u6807\u6a21\u578b\u662f\u5426\u8868\u73b0\u51fa\u7279\u5b9a\u7684\u6709\u5bb3\u884c\u4e3a\uff0c\u4ece\u800c\u63d0\u4f9b\u53ef\u9760\u4e14\u65e0\u504f\u89c1\u7684\u5185\u5bb9\u5b89\u5168\u98ce\u9669\u8bc4\u4f30\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u57fa\u51c6\u8fd8\u53ef\u4ee5\u6269\u5c55\u5230\u97f3\u9891\u6a21\u6001\uff0c\u663e\u793a\u51fa\u9ad8\u5ea6\u7684\u53ef\u6269\u5c55\u6027\u548c\u6f5c\u529b\u3002\u57fa\u4e8e\u6211\u4eec\u7684\u6846\u67b6\uff0c\u6211\u4eec\u5bf915\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u5f00\u6e90MLLMs\u548c6\u79cd\u5546\u4e1aMLLMs\uff08\u5982GPT-4o\u3001Gemini\uff09\u8fdb\u884c\u4e86\u5927\u89c4\u6a21\u5b9e\u9a8c\uff0c\u63ed\u793a\u4e86\u73b0\u6709MLLMs\u4e2d\u5b58\u5728\u7684\u5e7f\u6cdb\u5b89\u5168\u95ee\u9898\uff0c\u5e76\u5b9e\u4f8b\u5316\u4e86\u5173\u4e8eMLLMs\u5b89\u5168\u6027\u80fd\u7684\u4e00\u4e9b\u89c1\u89e3\uff0c\u5982\u56fe\u50cf\u8d28\u91cf\u548c\u53c2\u6570\u5927\u5c0f\u3002|\n", "2410.18921": "|**2024-10-24**|**From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems**|A M Muntasir Rahman 
et.al.|[2410.18921](http://arxiv.org/abs/2410.18921)|null|Consider the math problem: \u201cLily received 3 cookies from her best friend yesterday and ate 5 for breakfast. Today, her friend gave her 3 more cookies. How many cookies does Lily have now?\u201d In prior studies, many large language models (LLMs) answer \u201c1\u201d by computing \u201c3-5+3\u201d. From a human perspective, however, the problem is inherently flawed: if Lily initially had only 3 cookies, she could not have eaten 5 for breakfast. This discrepancy raises a key question: are current LLMs merely blind solvers that apply mathematical operations mechanically without deeper reasoning, or can they act as logical thinkers that identify logical inconsistencies? To explore this question, we propose a benchmark dataset, FaultyMath, which contains faulty math problems that are diverse in: i) mathematical category, e.g., algebra, geometry, and number theory; ii) difficulty level; and iii) source of faultiness, including violations of common sense, ambiguous statements, and mathematical contradictions. We use FaultyMath to evaluate a broad range of LLMs, including open-source, closed-source, and math-specialized models, along three dimensions: (i) how accurately can the models detect faulty math problems without being explicitly prompted to do so? (ii) when given hints about the validity of a problem, whether correct or misleading, to what extent do LLMs adapt to become reliable logical thinkers? (iii) how reliable are the explanations LLMs generate when they recognize a math problem as flawed? Through extensive experiments and detailed analysis, our results show that existing LLMs largely behave as blind solvers and lack the reasoning capabilities required to act as logical thinkers.|\n", 
"2410.18908": "|**2024-10-25**|**A Survey on Speech Large Language Models**|Jing Peng 
et.al.|[2410.18908](http://arxiv.org/abs/2410.18908)|null|Large language models (LLMs) excel at contextual understanding and multitask processing. Researchers have therefore sought to integrate LLMs into the broader framework of spoken language understanding (SLU). Unlike the conventional approach of generating text with automatic speech recognition (ASR) and then processing it in sequence, newer work focuses on architectures centered on audio feature extraction, combined with multimodal information fusion and LLM inference: the so-called Speech LLMs. This approach allows richer audio feature extraction while facilitating end-to-end fusion of the audio and text modalities, enabling deeper understanding and reasoning from audio data. This paper traces the development of Speech LLMs and provides an in-depth analysis of system architectures and training strategies. Through extensive research and a series of targeted experiments, it evaluates the progress of Speech LLMs in rich audio transcription and their potential for cross-task integration within SLU. It also identifies key challenges uncovered through experimentation, such as the inertness of LLMs under certain conditions. The paper further explores training strategies for Speech LLMs, proposes potential solutions based on these findings, and offers valuable insights and references for future research in this field and for LLM applications in multimodal settings.|\n", 
"2410.19733": "|**2024-10-25**|**The Potential and Value of AI Chatbot in Personalized Cognitive Training**|Zilong Wang et.al.|[2410.19733](http://arxiv.org/abs/2410.19733)|null|In recent years, the accelerating aging of the global population has increased the incidence of cognitive disorders such as Alzheimer's disease, posing a major public health challenge. Although no effective treatment can currently reverse Alzheimer's disease, prevention and early intervention, including cognitive training, are crucial. This report explores the potential of AI chatbots to enhance personalized cognitive training. We introduce ReMe, a web-based framework for creating AI chatbots that facilitate cognitive training research, particularly episodic memory tasks derived from personal life logs. By leveraging large language models, ReMe provides a friendlier, more interactive, and more personalized training experience. Case studies show that ReMe effectively engages users through life recall and open-ended language puzzles, highlighting its potential to improve cognitive training design. Despite these encouraging results, further research is needed to validate training effectiveness through large-scale studies that include assessments of cognitive ability. Overall, ReMe offers a promising approach to personalized cognitive training, using AI to meet the growing demand for non-pharmacological interventions in cognitive health; future research aims to broaden its applications and efficacy.|\n", 
"2410.19730": "|**2024-10-25**|**Counting Ability of Large Language Models and Impact of Tokenization**|Xiang Zhang et.al.|[2410.19730](http://arxiv.org/abs/2410.19730)|**[link](https://github.com/juntaic7/impact-of-tokenization-in-the-counting-ability-of-language-models)**|Transformers, the backbone of modern large language models (LLMs), face inherent architectural limitations that constrain their reasoning capabilities. Unlike recurrent networks, Transformers lack recurrent connections, which confines them to constant-depth computation. This restriction places them in the complexity class TC$^0$, so in theory they cannot solve tasks that demand increasingly deep reasoning as input length grows. Counting, a fundamental component of many reasoning tasks, likewise requires reasoning depth to grow linearly with task complexity in order to be performed inductively. Although previous studies have established upper limits on the counting ability of Transformer-based expert models, these findings do not transfer directly to general-purpose LLMs because their reasoning mechanisms differ. Recent work has shown that chain-of-thought (CoT) reasoning can partly alleviate the architectural limitations of Transformers in counting tasks. However, the role of tokenization in these models has received little attention. Unlike expert models, which typically use character-level tokenization, LLMs usually rely on byte-level (BPE) tokenizers, which fundamentally changes how reasoning is processed. Our work investigates the impact of tokenization on the counting abilities of LLMs and uncovers substantial performance variations that depend on how inputs are tokenized. We provide both theoretical and experimental analyses, offering insights into how choosing a suitable tokenization method can enhance a model's theoretical computability, and thereby inspiring the design of new tokenization methods that improve reasoning in LLMs.|\n", 
"2410.19727": "|**2024-10-25**|**FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning**|Nicole Cho 
et.al.|[2410.19727](http://arxiv.org/abs/2410.19727)|null|Generating financial intelligence from large numbers of data sources typically relies on traditional methods such as knowledge-graph construction or database engineering. Recently, large language models (LLMs) specialized for the financial domain have emerged. Although these advances are promising, limitations remain, such as high inference costs, hallucinations, and the complexity of analyzing high-dimensional financial data concurrently. This motivated us to create FISHNET (Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning), an agentic architecture that accomplishes highly complex analytical tasks over more than 98,000 regulatory filings that vary immensely in semantics, data hierarchy, or format. FISHNET performs well at financial insight generation (61.8% success rate, 5.0% Routing, 45.6% RAG R-Precision). We conduct rigorous ablations to empirically demonstrate FISHNET's success, the importance of each agent, and the optimized performance obtained by assembling all agents. Our modular architecture can be applied to a wide range of use cases, providing scalability, flexibility, and the data integrity that is essential for financial tasks.|\n", 
"2410.19720": "|**2024-10-25**|**2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision**|Shilong Li et.al.|[2410.19720](http://arxiv.org/abs/2410.19720)|null|Direct Preference Optimization (DPO) has recently made significant progress in aligning large language models (LLMs) with human preferences, owing to its simplicity and effectiveness. However, existing methods typically optimize a scalar score or ranking reward, overlooking the multi-dimensional nature of human preferences. In this work, we propose extending the preference signal in DPO to two dimensions: segments and aspects. We first introduce a two-dimensional supervision dataset called HelpSteer-2D. For the segment dimension, we split each response into sentences and assign a score to each segment. For the aspect dimension, we carefully design several criteria covering the standards used to assess response quality. Using these two-dimensional signals as feedback, we develop a 2D-DPO framework that decomposes the overall objective into multi-segment and multi-aspect objectives. Extensive experiments on popular benchmarks show that 2D-DPO outperforms methods that optimize scalar or one-dimensional preferences.|\n", 
"2410.19702": "|**2024-10-25**|**TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning**|Xiangyu Zeng et.al.|[2410.19702](http://arxiv.org/abs/2410.19702)|null|Multimodal large language models (MLLMs) have demonstrated impressive performance on short-video understanding, but understanding long videos remains challenging. This paper proposes a set of new designs for adapting existing short-video MLLMs to long-video understanding: a simple yet efficient framework for processing long video sequences, a high-quality video dataset for grounded tuning of MLLMs, and a carefully designed instruction-tuning task that explicitly incorporates grounding supervision into the traditional question-answering format. Specifically, building on VideoChat, we propose a long-video MLLM called VideoChat-T, which implements token shuffling to compress long video tokens and introduces Temporal Adaptive Position Encoding (TAPE) to enhance the temporal awareness of the visual representation. We also introduce TimePro, a comprehensive grounding-centric instruction-tuning dataset composed of 9 tasks and 349,000 high-quality grounded annotations. Notably, we design a new instruction-tuning task type, Temporal Grounded Caption, which performs detailed video description together with prediction of the corresponding timestamps. This explicit prediction of temporal location guides the MLLM to attend to the correct visual content when generating descriptions, reducing the risk of hallucinations caused by the LLM. Experimental results show that TimeSuite successfully improves the long-video understanding capability of short-video MLLMs, with gains of 5.6% and 6.8% on the Egoschema and VideoMME benchmarks, respectively. In addition, VideoChat-T exhibits strong zero-shot temporal grounding, significantly outperforming existing state-of-the-art MLLMs. After fine-tuning, it performs on par with traditional supervised expert models.|\n", 
"2410.19697": "|**2024-10-25**|**IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation**|Kaixian Qu 
et.al.|[2410.19697](http://arxiv.org/abs/2410.19697)|null|Navigating efficiently to an object in an unexplored environment is a critical skill for general-purpose intelligent robots. Recent approaches to this object goal navigation problem adopt a modular strategy that combines a classical exploration algorithm (notably frontier exploration) with a learned semantic mapping/exploration module. This paper introduces a novel informative path planning and 3D object probability mapping approach. The mapping module computes the probability of the object of interest through semantic segmentation and a Bayes filter. It also stores probabilities for common objects, which semantically guide exploration based on common-sense priors from a large language model. The planner terminates when the current viewpoint captures enough voxels identified with high confidence as the object of interest. Although our planner follows a zero-shot approach, it achieves state-of-the-art performance on the Habitat ObjectNav Challenge 2023 as measured by Success weighted by Path Length (SPL) and Soft SPL, exceeding other works by more than 20%. We further validate its effectiveness on real robots. Project webpage: https://ippon-paper.github.io/|\n", 
"2410.19694": "|**2024-10-25**|**Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs**|Yifei Zhang et.al.|[2410.19694](http://arxiv.org/abs/2410.19694)|null|Fine-tuning large language models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. However, the enormous size of LLMs poses significant challenges in computational complexity and resource requirements. Low-Rank Adaptation (LoRA) has emerged as a promising solution, yet a gap remains between the practical performance of low-rank adaptation and its theoretical optimum. In this work, we propose Extreme Gradient Boosting LoRA (XGBLoRA), a new framework that bridges this gap by leveraging the power of ensemble learning. Inspired by gradient boosting, XGBLoRA iteratively learns and merges a sequence of LoRA adaptations to refine model predictions. It outperforms standard LoRA while retaining the computational efficiency of rank-1 adaptation. We provide theoretical analysis establishing the convergence and optimality of the approach, and conduct extensive experiments on a range of natural language processing tasks. The results show that XGBLoRA consistently outperforms standard LoRA and achieves performance comparable to full fine-tuning with significantly fewer trainable parameters. This work advances parameter-efficient fine-tuning for LLMs and offers a promising solution for adapting LLMs to downstream tasks while optimizing both performance and efficiency.|\n", 
"2410.19656": "|**2024-10-25**|**APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs**|Huaxiaoyue Wang et.al.|[2410.19656](http://arxiv.org/abs/2410.19656)|null|Home robots performing personalized tasks must adeptly balance user preferences with environmental constraints. We focus on organization tasks within constrained spaces, such as arranging items in a refrigerator, where a user's placement preferences often conflict with physical limitations. The robot must infer the user's preferences from a small set of demonstrations, which is easier for users than exhaustively specifying every requirement. Although recent work uses large language models (LLMs) to learn preferences from user demonstrations, it faces two fundamental challenges. First, interpreting user actions is inherently ambiguous, because a single observed action may correspond to multiple preferences. Second, not all user preferences are practically feasible in the environment because of geometric constraints. To address these challenges, we introduce APRICOT, a novel approach that combines LLM-based Bayesian active preference learning with constraint-aware task planning. APRICOT refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints. We evaluate APRICOT on a dataset of diverse organization tasks and demonstrate its effectiveness in real-world scenarios, showing significant improvements in both preference satisfaction and plan feasibility. The project website is at https://portal-cornell.github.io/apricot/|\n", 
"2410.19599": "|**2024-10-25**|**Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina**|Yuan Gao et.al.|[2410.19599](http://arxiv.org/abs/2410.19599)|null|Recent studies suggest that large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates for humans in social science research. However, LLMs differ fundamentally from humans: they rely on probabilistic patterns and lack the embodied experiences or survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Almost all advanced approaches fail to replicate the distribution of human behavior across many models, except when the model is fine-tuned on a substantial amount of human behavior data. The causes of failure are diverse, involving factors such as input language, roles, and safeguarding. These results caution against using LLMs to study human behavior or as human surrogates.|\n", 
"2410.19586": "|**2024-10-25**|**Diverse Sign Language Translation**|Xin Shen et.al.|[2410.19586](http://arxiv.org/abs/2410.19586)|null|As with spoken language, a single sign language expression may correspond to multiple valid textual interpretations. Learning a single mapping for sign language translation (SLT) models may therefore be insufficient, particularly when data is limited. In this work, we introduce the Diverse Sign Language Translation (DivSLT) task, which aims to generate diverse yet accurate translations for sign language videos. First, we use large language models (LLMs) to generate multiple references for the widely used CSL-Daily and PHOENIX14T SLT datasets. Native speakers are invited only to polish inaccurate references, which significantly improves annotation efficiency. Second, we provide a benchmark model to spur research on this task. Specifically, we investigate multi-reference training strategies that enable our DivSLT model to produce diverse translations. Then, to improve translation accuracy, we adopt a reinforcement learning objective that maximizes the reward of the translated result. In addition, we use multiple metrics to assess accuracy, diversity, and semantic precision on the DivSLT task. Experimental results on the enriched datasets show that our DivSLT method achieves not only better translation performance but also diverse translation results.|\n", 
"2410.21272": "|**2024-10-28**|**Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics**|Yaniv Nikankin 
et.al.|[2410.21272](http://arxiv.org/abs/2410.21272)|**[link](https://github.com/technion-cs-nlp/llm-arithmetic-heuristics)**|Do large language models (LLMs) solve reasoning tasks by learning robust, generalizable algorithms or by memorizing training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior for basic arithmetic logic, and we examine its functionality. Zooming in to the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs the corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism that produces correct arithmetic answers. To test this, we categorize each neuron into several heuristic types, such as neurons that activate when an operand falls within a certain range, and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we show that this mechanism is already an important source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a \u201cbag of heuristics\u201d.|\n", 
"2410.21264": "|**2024-10-28**|**LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior**|Hanyu Wang et.al.|[2410.21264](http://arxiv.org/abs/2410.21264)|null|We present LARP, a novel video tokenizer designed to overcome the limitations of current video tokenization methods for autoregressive (AR) generative models. Unlike traditional patchwise tokenizers, which encode local visual patches directly into discrete tokens, LARP introduces a holistic tokenization scheme that gathers information from the visual content using a set of learned holistic queries. This design allows LARP to capture more global and semantic representations rather than being limited to local patch-level information. It also offers flexibility by supporting an arbitrary number of discrete tokens, enabling adaptive and efficient tokenization based on the specific requirements of the task. To align the discrete token space with downstream AR generation tasks, LARP integrates a lightweight AR transformer as a training-time prior model that predicts the next token in its discrete latent space. By incorporating the prior model during training, LARP learns a latent space that is not only optimized for video reconstruction but also structured in a way that is more conducive to autoregressive generation. Moreover, this process defines a sequential order for the discrete tokens, progressively pushing them toward an optimal configuration during training and ensuring smoother and more accurate AR generation at inference time. Comprehensive experiments demonstrate LARP's strong performance, achieving a state-of-the-art FVD score on the UCF101 class-conditional video generation benchmark. LARP enhances the compatibility of AR models with video and opens up the potential to build unified high-fidelity multimodal large language models (MLLMs).|\n", 
"2410.21252": "|**2024-10-28**|**LongReward: Improving Long-context Large Language Models with AI Feedback**|Jiajie Zhang 
et.al.|[2410.21252](http://arxiv.org/abs/2410.21252)|**[link](https://github.com/THUDM/LongReward)**|\u5c3d\u7ba1\u5728\u5f00\u53d1\u957f\u4e0a\u4e0b\u6587\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u5408\u6210\u7684\u6570\u636e\u8d28\u91cf\u5f80\u5f80\u5f71\u54cd\u4e86\u6709\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6a21\u578b\u7684\u957f\u4e0a\u4e0b\u6587\u6027\u80fd\uff0c\u5e76\u5bfc\u81f4\u56fa\u6709\u7684\u5c40\u9650\u6027\u3002\u539f\u5219\u4e0a\uff0c\u9002\u5f53\u7684\u5956\u52b1\u4fe1\u53f7\u53ef\u4ee5\u5229\u7528\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5728\u957f\u4e0a\u4e0b\u6587\u573a\u666f\u4e2d\u5982\u4f55\u83b7\u5f97\u53ef\u9760\u7684\u5956\u52b1\u4ecd\u7136\u662f\u4e00\u4e2a\u672a\u63a2\u7d22\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LongReward\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u5229\u7528\u73b0\u6210\u7684LLM\u4ece\u56db\u4e2a\u7ef4\u5ea6\uff08\u5373\uff1a\u6709\u7528\u6027\u3001\u903b\u8f91\u6027\u3001\u51c6\u786e\u6027\u548c\u5b8c\u6574\u6027\uff09\u63d0\u4f9b\u957f\u4e0a\u4e0b\u6587\u6a21\u578b\u54cd\u5e94\u7684\u5956\u52b1\uff0c\u5e76\u4e3a\u6bcf\u4e2a\u7ef4\u5ea6\u8bbe\u8ba1\u4e86\u8be6\u7ec6\u7684\u8bc4\u4f30\u6d41\u7a0b\u3002\u901a\u8fc7\u7ed3\u5408LongReward\u548c\u79bb\u7ebfRL\u7b97\u6cd5DPO\uff0c\u6211\u4eec\u80fd\u591f\u6709\u6548\u5730\u6539\u8fdb\u957f\u4e0a\u4e0b\u6587SFT\u6a21\u578b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0cLongReward\u4e0d\u4ec5\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u7684\u957f\u4e0a\u4e0b\u6587\u6027\u80fd\uff0c\u8fd8\u589e\u5f3a\u4e86\u5b83\u4eec\u9075\u5faa\u77ed\u6307\u4ee4\u7684\u80fd\u529b\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5e26\u6709LongReward\u7684\u957f\u4e0a\u4e0b\u6587DPO\u548c\u4f20\u7edf\u7684\u77ed\u4e0a\u4e0b\u6587DPO\u53ef\u4ee5\u4e00\u8
d77\u4f7f\u7528\u800c\u4e0d\u635f\u5bb3\u4efb\u4f55\u4e00\u65b9\u7684\u6027\u80fd\u3002|\n", "2410.21242": "|**2024-10-28**|**Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback**|Nour Jedidi et.al.|[2410.21242](http://arxiv.org/abs/2410.21242)|null|\u6784\u5efa\u6709\u6548\u7684\u5bc6\u96c6\u68c0\u7d22\u7cfb\u7edf\u5728\u7f3a\u4e4f\u76f8\u5173\u6027\u76d1\u7763\u65f6\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u751f\u6210\u5047\u8bbe\u6587\u6863\uff0c\u4ece\u800c\u627e\u5230\u6700\u63a5\u8fd1\u7684\u771f\u5b9e\u6587\u6863\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5b8c\u5168\u4f9d\u8d56\u4e8eLLM\u5177\u5907\u4e0e\u67e5\u8be2\u76f8\u5173\u7684\u9886\u57df\u7279\u5b9a\u77e5\u8bc6\uff0c\u8fd9\u5728\u5b9e\u8df5\u4e2d\u53ef\u80fd\u4e0d\u53ef\u884c\u3002\u6b64\u5916\uff0c\u751f\u6210\u5047\u8bbe\u6587\u6863\u7684\u65b9\u6cd5\u6548\u7387\u4f4e\u4e0b\uff0c\u56e0\u4e3a\u5bf9\u4e8e\u6bcf\u4e2a\u67e5\u8be2\uff0cLLM\u9700\u8981\u751f\u6210\u5927\u91cf\u7684\u6807\u8bb0\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57fa\u4e8e\u76f8\u5173\u53cd\u9988\u7684\u771f\u5b9e\u6587\u6863\u5d4c\u5165\uff08ReDE-RF\uff09\u3002\u53d7\u76f8\u5173\u53cd\u9988\u7684\u542f\u53d1\uff0cReDE-RF\u63d0\u51fa\u5c06\u5047\u8bbe\u6587\u6863\u751f\u6210\u91cd\u65b0\u5b9a\u4e49\u4e3a\u76f8\u5173\u6027\u4f30\u8ba1\u4efb\u52a1\uff0c\u5229\u7528LLM\u9009\u62e9\u54ea\u4e9b\u6587\u6863\u5e94\u88ab\u7528\u4e8e\u6700\u8fd1\u90bb\u641c\u7d22\u3002\u901a\u8fc7\u8fd9\u79cd\u91cd\u65b0\u5b9a\u4e49\uff0cLLM\u4e0d\u518d\u9700\u8981\u9886\u57df\u7279\u5b9a\u7684\u77e5\u8bc6\uff0c\u800c\u53ea\u9700\u8981\u5224\u65ad\u4ec0\u4e48\u662f\u76f8\u5173\u7684\u3002\u6b64\u5916\uff0c\u76f8\u5173\u6027\u4f30\u8ba1\u4ec5\u8981\u6c42LLM\u8f93\u51fa\u4e00\u4e2a\u6807\u8bb0\uff0c\u4ece\u800c\u964d\u4f4e
\u6bcf\u6b21\u67e5\u8be2\u7684\u641c\u7d22\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0cReDE-RF\u5728\u5e7f\u6cdb\u7684\u4f4e\u8d44\u6e90\u68c0\u7d22\u6570\u636e\u96c6\u4e0a\u59cb\u7ec8\u8d85\u8d8a\u6700\u5148\u8fdb\u7684\u96f6\u6837\u672c\u5bc6\u96c6\u68c0\u7d22\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u6bcf\u6b21\u67e5\u8be2\u7684\u5ef6\u8fdf\u65b9\u9762\u4e5f\u53d6\u5f97\u4e86\u663e\u8457\u6539\u8fdb\u3002|\n", "2410.21237": "|**2024-10-28**|**Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce**|Zhantao Yang et.al.|[2410.21237](http://arxiv.org/abs/2410.21237)|null|\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u5728\u5404\u79cdAI\u7cfb\u7edf\u4e2d\u626e\u6f14\u7740\u8d8a\u6765\u8d8a\u91cd\u8981\u7684\u89d2\u8272\u3002\u5bf9\u4e8e\u7535\u5b50\u5546\u52a1\u800c\u8a00\uff0c\u6784\u5efa\u9ad8\u6548\u4e14\u4f4e\u6210\u672c\u7684\u81ea\u52a8\u5316\u77e5\u8bc6\u56fe\u8c31\u662f\u5b9e\u73b0\u4f17\u591a\u6210\u529f\u4e0b\u6e38\u5e94\u7528\u7684\u57fa\u7840\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u4ece\u539f\u59cb\u4ea7\u54c1\u56fe\u50cf\u4e2d\u6784\u5efa\u7ed3\u6784\u5316\u7684\u4ea7\u54c1\u77e5\u8bc6\u56fe\u8c31\u3002\u8be5\u65b9\u6cd5\u5145\u5206\u5229\u7528\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u5b9e\u73b0\u4e86\u6574\u4e2a\u8fc7\u7a0b\u7684\u5b8c\u5168\u81ea\u52a8\u5316\uff0c\u5e76\u5141\u8bb8\u53ca\u65f6\u66f4\u65b0\u56fe\u8c31\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7ecf\u8fc7\u4eba\u5de5\u6807\u6ce8\u7684\u7535\u5b50\u5546\u52a1\u4ea7\u54c1\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u77e5\u8bc6\u56fe\u8c31\u6784\u5efa\u4e2d\u7684\u4ea7\u54c1\u5c5e\u6027\u63d0\u53d6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6240\u6709\u6307\u6807\u548c\u8bc4\u4f30\u5c5e\u6027\u4e0a\u90fd\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5c55\u793a\u4e86\u5176\u6709\u654
8\u6027\u548c\u5e7f\u9614\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2410.21236": "|**2024-10-28**|**Flaming-hot Initiation with Regular Execution Sampling for Large Language Models**|Weizhe Chen et.al.|[2410.21236](http://arxiv.org/abs/2410.21236)|null|\u81eaChatGPT\u53d1\u5e03\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\u3002\u5728\u5f00\u53d1\u8fd9\u4e9b\u901a\u7528\u80fd\u529b\u65f6\uff0c\u4e00\u4e2a\u5173\u952e\u7684\u6311\u6218\u662f\u9ad8\u6548\u5730\u83b7\u53d6\u591a\u6837\u5316\u4e14\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u3002\u8fd9\u5728\u4e0e\u6c99\u76d2\u68c0\u67e5\u5668\u76f8\u5173\u7684\u63a8\u7406\u4efb\u52a1\u4e2d\u5c24\u4e3a\u91cd\u8981\uff0c\u4f8b\u5982\u6570\u5b66\u6216\u4ee3\u7801\u95ee\u9898\uff0c\u76ee\u6807\u662f\u63d0\u9ad8\u751f\u6210\u6b63\u786e\u89e3\u51b3\u65b9\u6848\u7684\u6982\u7387\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Flaming-hot Initiation with Regular Execution\uff08FIRE\uff09\u91c7\u6837\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u4f46\u975e\u5e38\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u9ad8\u6548\u5730\u627e\u5230\u597d\u7684\u54cd\u5e94\u3002\u6211\u4eec\u7684\u5b9e\u8bc1\u7ed3\u679c\u8868\u660e\uff0cFIRE\u91c7\u6837\u63d0\u9ad8\u4e86\u63a8\u7406\u65f6\u95f4\u751f\u6210\u7684\u8d28\u91cf\uff0c\u5e76\u4e14\u5728\u5bf9\u9f50\u9636\u6bb5\u7684\u8bad\u7ec3\u4e2d\u4e5f\u53d7\u76ca\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86FIRE\u91c7\u6837\u5982\u4f55\u901a\u8fc7\u4fc3\u8fdb\u591a\u6837\u6027\u6765\u63d0\u5347\u6027\u80fd\uff0c\u5e76\u5206\u6790\u4e86\u5728\u54cd\u5e94\u7684\u4e0d\u540c\u4f4d\u7f6e\u4f7f\u7528FIRE\u7684\u5f71\u54cd\u3002|\n", "2410.21228": "|**2024-10-28**|**LoRA vs Full Fine-tuning: An Illusion of Equivalence**|Reece Shuttleworth 
et.al.|[2410.21228](http://arxiv.org/abs/2410.21228)|null|\u5fae\u8c03\u662f\u5c06\u9884\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u9002\u5e94\u5230\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u5173\u952e\u8303\u5f0f\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0c\u4f4e\u79e9\u81ea\u9002\u5e94\uff08LoRA\uff09\u65b9\u6cd5\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u80fd\u591f\u4ee5\u6781\u5c0f\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u91cf\u8fbe\u5230\u4e0e\u5b8c\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002\u5373\u4f7f\u4e24\u79cd\u65b9\u6cd5\u5b66\u4e60\u5230\u7684\u6a21\u578b\u51c6\u786e\u6027\u76f8\u4f3c\uff0c\u5b83\u4eec\u7684\u5b66\u4e60\u89e3\u51b3\u65b9\u6848\u771f\u7684\u7b49\u4ef7\u5417\uff1f\u6211\u4eec\u901a\u8fc7\u5206\u6790\u6a21\u578b\u6743\u91cd\u77e9\u9635\u7684\u8c31\u5c5e\u6027\u6765\u7814\u7a76\u4e0d\u540c\u7684\u5fae\u8c03\u65b9\u6cd5\u5982\u4f55\u6539\u53d8\u9884\u8bad\u7ec3\u6a21\u578b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5168\u5fae\u8c03\u548cLoRA\u751f\u6210\u7684\u6743\u91cd\u77e9\u9635\u5728\u5947\u5f02\u503c\u5206\u89e3\u7ed3\u6784\u4e0a\u8868\u73b0\u51fa\u5f88\u5927\u7684\u4e0d\u540c\uff1b\u6b64\u5916\uff0c\u5f53\u5728\u8d85\u51fa\u9002\u5e94\u4efb\u52a1\u5206\u5e03\u7684\u60c5\u51b5\u4e0b\u6d4b\u8bd5\u65f6\uff0c\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u663e\u793a\u51fa\u4e0d\u540c\u7684\u6cdb\u5316\u884c\u4e3a\u3002\u66f4\u5177\u4f53\u5730\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u5c55\u793a\u4e86\u4f7f\u7528LoRA\u8bad\u7ec3\u7684\u6743\u91cd\u77e9\u9635\u5177\u6709\u65b0\u7684\u9ad8\u6392\u540d\u5947\u5f02\u5411\u91cf\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u5165\u4fb5\u7ef4\u5ea6\u201d\u3002\u8fd9\u4e9b\u5165\u4fb5\u7ef4\u5ea6\u5728\u5168\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u4e0d\u4f1a\u51fa\u73b0\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5c3d\u7ba1\u5177\u6709\u5165\u4fb5\u7ef4\u5ea6\u7684LoRA\u6a21\u578b\u5728\u76ee\u6807\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4e0e\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53
\uff0c\u4f46\u5b83\u4eec\u5bf9\u9884\u8bad\u7ec3\u5206\u5e03\u7684\u5efa\u6a21\u6548\u679c\u8f83\u5dee\uff0c\u5e76\u4e14\u5728\u987a\u5e8f\u9002\u5e94\u591a\u4e2a\u4efb\u52a1\u65f6\u7684\u9c81\u68d2\u6027\u8f83\u4f4e\u3002\u9ad8\u79e9\u3001\u79e9\u7a33\u5b9a\u7684LoRA\u6a21\u578b\u751a\u81f3\u5728\u4e0e\u4f4e\u79e9LoRA\u6a21\u578b\u6267\u884c\u76f8\u540c\u4efb\u52a1\u65f6\uff0c\u4e5f\u4e0e\u5168\u5fae\u8c03\u6a21\u578b\u975e\u5e38\u63a5\u8fd1\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4f7f\u5728\u76f8\u540c\u7684\u5fae\u8c03\u5206\u5e03\u4e0a\u8868\u73b0\u76f8\u540c\uff0cLoRA\u66f4\u65b0\u7684\u6a21\u578b\u548c\u5168\u5fae\u8c03\u6a21\u578b\u8bbf\u95ee\u4e86\u53c2\u6570\u7a7a\u95f4\u7684\u4e0d\u540c\u90e8\u5206\u3002\u6211\u4eec\u6700\u540e\u63a2\u8ba8\u4e86\u4e3a\u4ec0\u4e48\u5165\u4fb5\u7ef4\u5ea6\u4f1a\u5728LoRA\u5fae\u8c03\u6a21\u578b\u4e2d\u51fa\u73b0\uff0c\u4e3a\u4ec0\u4e48\u5b83\u4eec\u662f\u4e0d\u7406\u60f3\u7684\uff0c\u4ee5\u53ca\u5982\u4f55\u6700\u5c0f\u5316\u5176\u5f71\u54cd\u3002|\n", "2410.21218": "|**2024-10-28**|**Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations**|Kaifeng Huang 
et.al.|[2410.21218](http://arxiv.org/abs/2410.21218)|null|\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u667a\u529b\u548c\u751f\u4ea7\u529b\u65b9\u9762\u5f15\u53d1\u4e86\u663e\u8457\u7684\u5f71\u54cd\u3002\u8fd1\u5e74\u6765\uff0c\u5546\u4e1a\u548c\u5f00\u6e90LLM\u7684\u5f15\u5165\u5448\u73b0\u51fa\u5de8\u5927\u7684\u589e\u957f\u8d8b\u52bf\u3002\u8bb8\u591a\u4f01\u4e1a\u5df2\u5c06LLM\u96c6\u6210\u5230\u5176\u5e94\u7528\u4e2d\u4ee5\u89e3\u51b3\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5c06LLM\u6574\u5408\u5230\u5177\u4f53\u4e1a\u52a1\u573a\u666f\u4e2d\u4e0d\u4ec5\u4ec5\u9700\u8981\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u672c\u8eab\uff0c\u800c\u662f\u4e00\u4e2a\u7cfb\u7edf\u7684\u8fc7\u7a0b\uff0c\u6d89\u53ca\u5927\u91cf\u7684\u7ec4\u6210\u90e8\u5206\uff0c\u8fd9\u4e9b\u7ec4\u6210\u90e8\u5206\u7edf\u79f0\u4e3aLLM\u4f9b\u5e94\u94fe\u3002LLM\u4f9b\u5e94\u94fe\u5185\u5728\u5730\u627f\u8f7d\u7740\u98ce\u9669\u3002\u56e0\u6b64\uff0c\u7406\u89e3\u53ef\u80fd\u5f15\u5165\u4f9b\u5e94\u94fe\u7684\u7ec4\u4ef6\u7c7b\u578b\u4ee5\u53ca\u76f8\u5173\u7684\u98ce\u9669\u81f3\u5173\u91cd\u8981\uff0c\u8fd9\u6709\u52a9\u4e8e\u4e0d\u540c\u7684\u5229\u76ca\u76f8\u5173\u8005\u5b9e\u65bd\u6709\u6548\u7684\u7f13\u89e3\u63aa\u65bd\u3002\u867d\u7136\u4e00\u4e9b\u6587\u732e\u6d89\u53ca\u4e0eLLM\u4f9b\u5e94\u94fe\u76f8\u5173\u7684\u98ce\u9669\uff0c\u4f46\u76ee\u524d\u8fd8\u6ca1\u6709\u8bba\u6587\u660e\u786e\u754c\u5b9a\u5176\u8303\u56f4\u3001\u8bc6\u522b\u56fa\u6709\u98ce\u9669\u5e76\u63a2\u8ba8\u6f5c\u5728\u7684\u7f13\u89e3\u7b56\u7565\u3002\u9274\u4e8eLLMs\u5df2\u6210\u4e3a\u65b0\u65f6\u4ee3\u7684\u91cd\u8981\u57fa\u7840\u8bbe\u65bd\uff0c\u6211\u4eec\u8ba4\u4e3a\u5bf9LLM\u4f9b\u5e94\u94fe\u53ca\u5176\u56fa\u6709\u98ce\u9669\u548c\u7f13\u89e3\u7b56\u7565\u8fdb\u884c\u5f7b\u5e95\u5ba1\u67e5\u5bf9\u4e8e\u884c\u4e1a\u4ece\u4e1a\u8005\u907f\u514d\u6f5c\u5728\u635f\u5931\u5177\u6709\u91cd\u8981\u4ef7\u503c\uff0c\u5e76\u4e14\u5bf9\u4e8e\u5b66\u672f\u7814\u7a76\u4eba
\u5458\u91cd\u65b0\u601d\u8003\u73b0\u6709\u65b9\u6cd5\u548c\u63a2\u7d22\u65b0\u7684\u7814\u7a76\u9014\u5f84\u4e5f\u5177\u6709\u542f\u53d1\u610f\u4e49\u3002\u6211\u4eec\u7684\u8bba\u6587\u63d0\u4f9b\u4e86LLM\u4f9b\u5e94\u94fe\u7684\u5168\u9762\u6982\u8ff0\uff0c\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u5229\u76ca\u76f8\u5173\u8005\u3001\u7ec4\u6210\u5143\u7d20\u4ee5\u53ca\u4f9b\u5e94\u7c7b\u578b\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e0e\u5404\u79cd\u4f9b\u5e94\u94fe\u5229\u76ca\u76f8\u5173\u8005\u548c\u7ec4\u4ef6\u76f8\u5173\u7684\u98ce\u9669\u7c7b\u578b\u3001\u98ce\u9669\u884c\u4e3a\u548c\u7f13\u89e3\u63aa\u65bd\u7684\u5206\u7c7b\u6cd5\u3002\u603b\u800c\u8a00\u4e4b\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63a2\u8ba8\u4e86LLM\u4f9b\u5e94\u94fe\u7684\u6280\u672f\u548c\u64cd\u4f5c\u65b9\u9762\uff0c\u4e3a\u7814\u7a76\u548c\u5de5\u7a0b\u4eba\u5458\u5728\u4e0d\u65ad\u53d1\u5c55\u7684LLM\u9886\u57df\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2410.21200": "|**2024-10-28**|**BongLLaMA: LLaMA for Bangla Language**|Abdullah Khan Zehady et.al.|[2410.21200](http://arxiv.org/abs/2410.21200)|null|\u5b5f\u52a0\u62c9\u8bed\uff08\u6216\u201c 
Bengali\u201d\uff09\u662f\u4e00\u79cd\u4f7f\u7528\u7ea62.4\u4ebf\u6bcd\u8bed\u8005\u548c\u5927\u7ea63\u4ebf\u4eba\u4f7f\u7528\u7684\u8bed\u8a00\u3002\u5c3d\u7ba1\u5b83\u662f\u4e16\u754c\u4e0a\u7b2c\u4e94\u5927\u4f7f\u7528\u8bed\u8a00\uff0c\u5b5f\u52a0\u62c9\u8bed\u4ecd\u88ab\u89c6\u4e3a\u4e00\u79cd\u201c\u4f4e\u8d44\u6e90\u201d\u8bed\u8a00\uff0c\u73b0\u6709\u7684\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u5728\u5b5f\u52a0\u62c9\u8bed\u5904\u7406\uff08BLP\uff09\u4efb\u52a1\u4e0a\u5f80\u5f80\u8868\u73b0\u4e0d\u4f73\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5f15\u5165BongLLaMA\uff08\u5373\u5b5f\u52a0\u62c9\u8bed-LLaMA\uff09\uff0c\u89e3\u51b3\u4e86\u8fd9\u4e00\u95ee\u9898\uff0c\u8fd9\u662f\u4e00\u79cd\u4e13\u95e8\u9488\u5bf9\u5927\u578b\u5b5f\u52a0\u62c9\u8bed\u8bed\u6599\u5e93\u548c\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u8bba\u3001\u6570\u636e\u589e\u5f3a\u6280\u672f\u3001\u5fae\u8c03\u7ec6\u8282\u4ee5\u53ca\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\u7ed3\u679c\uff0c\u5c55\u793a\u4e86BongLLaMA\u5728\u5b5f\u52a0\u62c9\u8bed\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u6548\u7528\u3002\u6211\u4eec\u76f8\u4fe1BongLLaMA\u5c06\u6210\u4e3a\u5b5f\u52a0\u62c9\u8bed\u6a21\u578b\u7684\u65b0\u6807\u51c6\u57fa\u7ebf\uff0c\u4ece\u800c\u4fc3\u8fdb\u672a\u6765\u4e13\u6ce8\u4e8e\u8fd9\u79cd\u5e7f\u6cdb\u4f7f\u7528\u4f46\u201c\u4f4e\u8d44\u6e90\u201d\u7684\u8bed\u8a00\u7684\u57fa\u51c6\u7814\u7a76\u3002\u6240\u6709BongLLaMA\u6a21\u578b\u5747\u53ef\u4f9b\u516c\u4f17\u4f7f\u7528\uff0c\u7f51\u5740\u4e3ahttps://huggingface.co/BanglaLLM\u3002|\n", "2410.21169": "|**2024-10-29**|**Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction**|Qintong Zhang 
et.al.|[2410.21169](http://arxiv.org/abs/2410.21169)|null|\u6587\u6863\u89e3\u6790\u5bf9\u4e8e\u5c06\u975e\u7ed3\u6784\u5316\u548c\u534a\u7ed3\u6784\u5316\u6587\u6863\uff08\u5982\u5408\u540c\u3001\u5b66\u672f\u8bba\u6587\u548c\u53d1\u7968\uff09\u8f6c\u6362\u4e3a\u7ed3\u6784\u5316\u7684\u3001\u673a\u5668\u53ef\u8bfb\u7684\u6570\u636e\u81f3\u5173\u91cd\u8981\u3002\u6587\u6863\u89e3\u6790\u4ece\u975e\u7ed3\u6784\u5316\u8f93\u5165\u4e2d\u63d0\u53d6\u53ef\u9760\u4e14\u7ed3\u6784\u5316\u7684\u6570\u636e\uff0c\u4e3a\u4f17\u591a\u5e94\u7528\u63d0\u4f9b\u4e86\u6781\u5927\u7684\u4fbf\u5229\u3002\u7279\u522b\u662f\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u6587\u6863\u89e3\u6790\u5728\u77e5\u8bc6\u5e93\u6784\u5efa\u548c\u8bad\u7ec3\u6570\u636e\u751f\u6210\u65b9\u9762\u53d1\u6325\u7740\u4e0d\u53ef\u6216\u7f3a\u7684\u4f5c\u7528\u3002\u672c\u6587\u7efc\u8ff0\u4e86\u5f53\u524d\u6587\u6863\u89e3\u6790\u7684\u72b6\u6001\uff0c\u6db5\u76d6\u4e86\u5173\u952e\u7684\u65b9\u6cd5\u8bba\uff0c\u4ece\u6a21\u5757\u5316\u6d41\u6c34\u7ebf\u7cfb\u7edf\u5230\u7531\u5927\u578b\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u7aef\u5230\u7aef\u6a21\u578b\u3002\u8be6\u7ec6\u63a2\u8ba8\u4e86\u6838\u5fc3\u7ec4\u4ef6\uff0c\u5305\u62ec\u5e03\u5c40\u68c0\u6d4b\u3001\u5185\u5bb9\u63d0\u53d6\uff08\u5305\u62ec\u6587\u672c\u3001\u8868\u683c\u548c\u6570\u5b66\u8868\u8fbe\u5f0f\uff09\u4ee5\u53ca\u591a\u6a21\u6001\u6570\u636e\u96c6\u6210\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u8ba8\u8bba\u4e86\u6a21\u5757\u5316\u6587\u6863\u89e3\u6790\u7cfb\u7edf\u548c\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u590d\u6742\u5e03\u5c40\u3001\u6574\u5408\u591a\u4e2a\u6a21\u5757\u548c\u8bc6\u522b\u9ad8\u5bc6\u5ea6\u6587\u672c\u65f6\u6240\u9762\u4e34\u7684\u6311\u6218\u3002\u6587\u7ae0\u5f3a\u8c03\u4e86\u5f00\u53d1\u66f4\u5927\u548c\u66f4\u591a\u6837\u5316\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\uff0c\u5e76\u6982\u8ff0\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u
3002|\n", "2410.22323": "|**2024-10-29**|**Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models**|Seetharam Killivalavan et.al.|[2410.22323](http://arxiv.org/abs/2410.22323)|null|\u672c\u6587\u63a2\u7d22\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5229\u7528\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u6280\u672f\u6765\u63d0\u5347\u4e8c\u5143\u5206\u7c7b\u6a21\u578b\u5728\u8bc4\u4f30\u4ee3\u7801\u6ce8\u91ca\u8d28\u91cf\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u901a\u8fc7\u5c06\u6765\u81ea\u591a\u4e2aGitHub\u4ed3\u5e93\u76841,437\u4e2a\u65b0\u751f\u6210\u7684\u4ee3\u7801-\u6ce8\u91ca\u5bf9\uff08\u6807\u8bb0\u4e3a\u201c\u6709\u7528\u201d\u6216\u201c\u65e0\u7528\u201d\uff09\u6574\u5408\u5230\u4e00\u4e2a\u73b0\u6709\u7684C\u8bed\u8a00\u6570\u636e\u96c6\u4e2d\uff08\u8be5\u6570\u636e\u96c6\u5305\u542b9,048\u5bf9\uff09\uff0c\u5c55\u793a\u4e86\u6a21\u578b\u6027\u80fd\u7684\u663e\u8457\u63d0\u5347\u3002\u91c7\u7528\u5148\u8fdb\u7684\u5927\u8bed\u8a00\u6a21\u578b\u540e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f7f\u5f97\u652f\u6301\u5411\u91cf\u673a\uff08SVM\uff09\u6a21\u578b\u7684\u7cbe\u786e\u7387\u63d0\u9ad8\u4e865.78%\uff0c\u4ece0.79\u63d0\u5347\u81f30.8478\uff0c\u540c\u65f6\u4eba\u5de5\u795e\u7ecf\u7f51\u7edc\uff08ANN\uff09\u6a21\u578b\u7684\u53ec\u56de\u7387\u63d0\u9ad8\u4e862.17%\uff0c\u4ece0.731\u63d0\u5347\u81f30.7527\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u663e\u4e86\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u5728\u6539\u8fdb\u4ee3\u7801\u6ce8\u91ca\u5206\u7c7b\u6a21\u578b\u4e2d\u7684\u4ef7\u503c\uff0c\u4e3a\u8f6f\u4ef6\u5f00\u53d1\u548c\u8d28\u91cf\u63a7\u5236\u4e2d\u7684\u6a21\u578b\u51c6\u786e\u6027\u63d0\u5347\u63d0\u4f9b\u4e86\u91cd\u8981\u7684\u6f5c\u529b\u3002\u672c\u7814\u7a76\u4e3a\u5728\u5b9e\u9645\u8f6f\u4ef6\u5de5\u7a0b\u73af\u5883\u4e2d\u6574\u5408\u751f\u6210\u6280\u672f\u4ee5\u4f18\u5316\u673a\u5668\u5b66\u4e60\u6a21\u578b\u63d0\u4f9b\u4e86\u4e50\u89c2\u7684\u524d\u666f\u3002|\n", 
"2410.22318": "|**2024-10-29**|**Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting**|Can Chen et.al.|[2410.22318](http://arxiv.org/abs/2410.22318)|**[link](https://github.com/canchen-cc/online-llm-detection)**|**\u8fd1\u5e74\u6765\uff0c\u533a\u5206\u673a\u5668\u751f\u6210\u6587\u672c\u548c\u4eba\u7c7b\u64b0\u5199\u6587\u672c\u7684\u7b97\u6cd5\u7814\u7a76\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u73b0\u6709\u65b9\u6cd5\u901a\u5e38\u662f\u5728\u79bb\u7ebf\u8bbe\u7f6e\u4e0b\u8fdb\u884c\uff0c\u5373\u5728\u7ed9\u5b9a\u7684\u6570\u636e\u96c6\u4e2d\u5305\u542b\u771f\u5b9e\u6587\u672c\u548c\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u6df7\u5408\u6837\u672c\uff0c\u4efb\u52a1\u662f\u786e\u5b9a\u6570\u636e\u96c6\u4e2d\u7684\u6bcf\u4e2a\u6837\u672c\u662f\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fd8\u662f\u7531\u4eba\u7c7b\u751f\u6210\u7684\u3002\u7136\u800c\uff0c\u5728\u8bb8\u591a\u5b9e\u9645\u573a\u666f\u4e2d\uff0c\u5982\u65b0\u95fb\u7f51\u7ad9\u3001\u793e\u4ea4\u5a92\u4f53\u8d26\u6237\u6216\u5176\u4ed6\u8bba\u575b\u53d1\u5e03\u7684\u6587\u7ae0\u662f\u4ee5\u6d41\u5f0f\u65b9\u5f0f\u53d1\u5e03\u7684\u3002\u56e0\u6b64\uff0c\u5728\u8fd9\u79cd\u5728\u7ebf\u573a\u666f\u4e2d\uff0c\u5982\u4f55\u5feb\u901f\u4e14\u51c6\u786e\u5730\u786e\u5b9a\u8fd9\u4e9b\u6765\u6e90\u662f\u5426\u4e3aLLM\uff0c\u5e76\u5177\u6709\u5f3a\u5927\u7684\u7edf\u8ba1\u4fdd\u8bc1\uff0c\u5bf9\u4e8e\u8fd9\u4e9b\u5a92\u4f53\u6216\u5e73\u53f0\u6709\u6548\u5730\u8fd0\u4f5c\u5e76\u9632\u6b62\u9519\u8bef\u4fe1\u606f\u548c\u5176\u4ed6\u6f5c\u5728\u7684LLM\u8bef\u7528\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u4e86\u89e3\u51b3\u5728\u7ebf\u68c0\u6d4b\u7684\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8e\u987a\u5e8f\u5047\u8bbe\u68c0\u9a8c\u7684\u7b97\u6cd5\uff0c\u8be5\u7b97\u6cd5\u4e0d\u4ec5\u5efa\u7acb\u5e76\u8865\u5145\u4e86\u73b0\u6709\u7684\u79bb\u7ebf\u68c0\u6d4b\u6280\u672f\uff0c\u800c\u4e14\u8fd8\u5177\u5907\u7edf\u8ba1\u4fdd\u8bc1\uff0c\u5305\
u62ec\u63a7\u5236\u9519\u8bef\u53d1\u73b0\u7387\u548c\u6b63\u786e\u8bc6\u522b\u6765\u6e90\u4e3aLLM\u7684\u9884\u671f\u65f6\u95f4\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2410.22315": "|**2024-10-29**|**Natural Language Inference Improves Compositionality in Vision-Language Models**|Paola Cascante-Bonilla et.al.|[2410.22315](http://arxiv.org/abs/2410.22315)|null|\u5728\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u4e2d\uff0c\u7ec4\u5408\u63a8\u7406\u4ecd\u7136\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u96be\u4ee5\u5173\u8054\u5bf9\u8c61\u3001\u5c5e\u6027\u548c\u7a7a\u95f4\u5173\u7cfb\u3002\u6700\u8fd1\u7684\u65b9\u6cd5\u8bd5\u56fe\u901a\u8fc7\u4f9d\u8d56\u6587\u672c\u63cf\u8ff0\u7684\u8bed\u4e49\uff0c\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c06\u95ee\u9898\u548c\u7b54\u6848\u5206\u89e3\u6210\u5b50\u96c6\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4e3b\u8981\u5728\u8868\u9762\u5c42\u6b21\u4e0a\u64cd\u4f5c\uff0c\u672a\u80fd\u5f15\u5165\u66f4\u6df1\u7684\u8bcd\u6c47\u7406\u89e3\uff0c\u540c\u65f6\u8fd8\u4f1a\u5f15\u5165\u7531LLM\u751f\u6210\u7684\u9519\u8bef\u5047\u8bbe\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Caption Expansion with Contradictions and Entailments 
(CECE)\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u539f\u7406\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u4ece\u7ed9\u5b9a\u7684\u524d\u63d0\u751f\u6210\u8574\u6db5\u548c\u77db\u76fe\u3002CECE\u751f\u6210\u8bcd\u6c47\u4e0a\u591a\u6837\u7684\u53e5\u5b50\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u610f\u4e49\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86CECE\u589e\u5f3a\u4e86\u53ef\u89e3\u91ca\u6027\uff0c\u5e76\u51cf\u5c11\u4e86\u5bf9\u6709\u504f\u89c1\u6216\u8868\u9762\u7279\u5f81\u7684\u8fc7\u5ea6\u4f9d\u8d56\u3002\u901a\u8fc7\u5e73\u8861\u539f\u59cb\u524d\u63d0\u4e0eCECE\uff0c\u6211\u4eec\u5728\u65e0\u9700\u989d\u5916\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5728\u8861\u91cf\u56fe\u50cf-\u6587\u672c\u5bf9\u9f50\u7684\u4eba\u7c7b\u5224\u65ad\u5f97\u5206\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u53d6\u5f97\u4e86\u6700\u5148\u8fdb\u7684\u7ed3\u679c\uff0c\u5e76\u5728Winoground\u4e0a\u5b9e\u73b0\u4e86+19.2%\uff08\u7ec4\u5206\u6570\uff09\u548c\u5728EqBen\u4e0a\u5b9e\u73b0+12.9%\uff08\u7ec4\u5206\u6570\uff09\u7684\u6027\u80fd\u63d0\u5347\uff0c\u8d85\u8fc7\u4e86\u6700\u4f73\u73b0\u6709\u5de5\u4f5c\uff08\u4f7f\u7528\u9488\u5bf9\u6027\u6570\u636e\u5fae\u8c03\uff09\u3002|\n", "2410.22309": "|**2024-10-29**|**GPT-4o reads the mind in the eyes**|James W. A. 
Strachan et.al.|[2410.22309](http://arxiv.org/abs/2410.22309)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u4ece\u6587\u672c\u4e2d\u91cd\u73b0\u4eba\u7c7b\u7c7b\u4f3c\u63a8\u7406\u7684\u80fd\u529b\uff0c\u5305\u62ec\u5173\u4e8e\u60c5\u7eea\u548c\u5fc3\u7406\u72b6\u6001\u7684\u63a8\u7406\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u80fd\u529b\u662f\u5426\u6269\u5c55\u5230\u5176\u4ed6\u6a21\u6001\u5c1a\u4e0d\u6e05\u695a\u3002\u4eba\u7c7b\u5177\u6709\u901a\u8fc7\u4ed6\u4eba\u7684\u773c\u775b\u8bfb\u5fc3\u7684\u590d\u6742\u80fd\u529b\u3002\u5728\u6b64\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u6d4b\u8bd5\u4e86\u8fd9\u4e00\u80fd\u529b\u662f\u5426\u4e5f\u5b58\u5728\u4e8eGPT-4o\u8fd9\u4e00\u591a\u6a21\u6001LLM\u4e2d\u3002\u6211\u4eec\u4f7f\u7528\u4e86\u4e24\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u5fc3\u7406\u7406\u8bba\u6d4b\u8bd5\u7248\u672c\uff0c\u5373\u201c\u773c\u775b\u4e2d\u7684\u5fc3\u667a\u9605\u8bfb\u6d4b\u8bd5\u201d\u548c\u201c\u591a\u5143\u79cd\u65cf\u773c\u775b\u4e2d\u7684\u5fc3\u667a\u9605\u8bfb\u6d4b\u8bd5\u201d\u3002\u7ed3\u679c\u53d1\u73b0\uff0cGPT-4o\u5728\u89e3\u91ca\u6765\u81ea\u76f4\u7acb\u9762\u90e8\u7684\u5fc3\u7406\u72b6\u6001\u65b9\u9762\u4f18\u4e8e\u4eba\u7c7b\uff0c\u4f46\u5728\u9762\u90e8\u5012\u7f6e\u65f6\u8868\u73b0\u8f83\u5dee\u3002\u5c3d\u7ba1\u6211\u4eec\u6837\u672c\u4e2d\u7684\u4eba\u7c7b\u5728\u767d\u4eba\u548c\u975e\u767d\u4eba\u9762\u5b54\u4e4b\u95f4\u6ca1\u6709\u8868\u73b0\u51fa\u5dee\u5f02\uff0c\u4f46GPT-4o\u5bf9\u767d\u4eba\u9762\u5b54\u7684\u51c6\u786e\u5ea6\u9ad8\u4e8e\u975e\u767d\u4eba\u9762\u5b54\u3002GPT-4o\u7684\u9519\u8bef\u5e76\u975e\u968f\u673a\u51fa\u73b0\uff0c\u800c\u662f\u63ed\u793a\u4e86\u4e00\u79cd\u9ad8\u5ea6\u4e00\u81f4\u4f46\u9519\u8bef\u7684\u5904\u7406\u5fc3\u7406\u72b6\u6001\u4fe1\u606f\u7684\u65b9\u5f0f\uff0c\u5728\u4e0d\u540c\u8bd5\u9a8c\u4e2d\u5448\u73b0\u51fa\u65b9\u5411\u4f9d\u8d56\u7684\u9519\u8bef\u7ed3\u6784\uff0c\u8fd9\u79cd\u7ed3\u6784\u5728\u9762\u5bf9\u5012\u7f6e\u9762\u5b54\u65f6\u4e0e\u4eba\u7c7b
\u5b58\u5728\u5b9a\u6027\u5dee\u5f02\uff0c\u800c\u5728\u9762\u5bf9\u76f4\u7acb\u9762\u5b54\u65f6\u5219\u65e0\u660e\u663e\u533a\u522b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5148\u8fdb\u7684\u5fc3\u7406\u72b6\u6001\u63a8\u7406\u80fd\u529b\u548c\u4eba\u7c7b\u7c7b\u4f3c\u7684\u9762\u90e8\u5904\u7406\u7279\u5f81\uff0c\u5982\u53cd\u8f6c\u6548\u5e94\uff0c\u5728GPT-4o\u4e2d\u5171\u5b58\uff0c\u540c\u65f6\u5176\u4fe1\u606f\u5904\u7406\u65b9\u5f0f\u4e0e\u4eba\u7c7b\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002|\n", "2410.22307": "|**2024-10-29**|**SVIP: Towards Verifiable Inference of Open-source Large Language Models**|Yifan Sun et.al.|[2410.22307](http://arxiv.org/abs/2410.22307)|null|\u5f00\u6e90\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u65b9\u9762\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\uff0c\u5e76\u5728\u5404\u4e2a\u9886\u57df\u5f97\u5230\u4e86\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u578b\u89c4\u6a21\u7684\u589e\u5927\uff0c\u672c\u5730\u90e8\u7f72\u53d8\u5f97\u4e0d\u5207\u5b9e\u9645\uff0c\u8bb8\u591a\u7528\u6237\u4e0d\u5f97\u4e0d\u4f9d\u8d56\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u901a\u8fc7\u9ed1\u76d2API\u8fdb\u884c\u63a8\u7406\u3002\u8fd9\u79cd\u4f9d\u8d56\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u98ce\u9669\uff1a\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u53ef\u80fd\u5728\u672a\u7ecf\u7528\u6237\u540c\u610f\u7684\u60c5\u51b5\u4e0b\uff0c\u7528\u8f83\u5c0f\u4e14\u80fd\u529b\u8f83\u5f31\u7684\u6a21\u578b\u66ff\u4ee3\u7528\u6237\u8bf7\u6c42\u7684LLM\uff0c\u4ece\u800c\u63d0\u4f9b\u8d28\u91cf\u8f83\u5dee\u7684\u7ed3\u679c\uff0c\u540c\u65f6\u8282\u7701\u6210\u672c\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f62\u5f0f\u5316\u4e86LLM\u53ef\u9a8c\u8bc1\u63a8\u7406\u7684\u95ee\u9898\u3002\u73b0\u6709\u7684\u57fa\u4e8e\u5bc6\u7801\u5b66\u6216\u535a\u5f08\u8bba\u6280\u672f\u7684\u53ef\u9a8c\u8bc1\u8ba1\u7b97\u89e3\u51b3\u65b9\u6848\u8981\u4e48\u5728
\u8ba1\u7b97\u4e0a\u4e0d\u7ecf\u6d4e\uff0c\u8981\u4e48\u57fa\u4e8e\u8f83\u5f3a\u7684\u5047\u8bbe\u3002\u6211\u4eec\u5f15\u5165\u4e86SVIP\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u79d8\u5bc6\u7684\u53ef\u9a8c\u8bc1LLM\u63a8\u7406\u534f\u8bae\uff0c\u5b83\u5229\u7528LLM\u7684\u4e2d\u95f4\u8f93\u51fa\u4f5c\u4e3a\u552f\u4e00\u7684\u6a21\u578b\u6807\u8bc6\u7b26\u3002\u901a\u8fc7\u5728\u8fd9\u4e9b\u8f93\u51fa\u4e0a\u8bad\u7ec3\u4ee3\u7406\u4efb\u52a1\uff0c\u5e76\u8981\u6c42\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u8fd4\u56de\u751f\u6210\u7684\u6587\u672c\u548c\u5904\u7406\u8fc7\u7684\u4e2d\u95f4\u8f93\u51fa\uff0c\u7528\u6237\u53ef\u4ee5\u53ef\u9760\u5730\u9a8c\u8bc1\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u662f\u5426\u8bda\u5b9e\u884c\u4e8b\u3002\u6b64\u5916\uff0c\u7ed3\u5408\u79d8\u5bc6\u673a\u5236\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u6211\u4eec\u7684\u534f\u8bae\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u5728\u591a\u79cd\u5f3a\u9002\u5e94\u6027\u5bf9\u6297\u573a\u666f\u4e0b\u5168\u9762\u5206\u6790\u4e86\u6211\u4eec\u7684\u534f\u8bae\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSVIP\u662f\u51c6\u786e\u7684\u3001\u53ef\u6cdb\u5316\u7684\u3001\u8ba1\u7b97\u9ad8\u6548\u7684\uff0c\u5e76\u4e14\u5bf9\u5404\u79cd\u653b\u51fb\u5177\u6709\u62b5\u6297\u529b\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cSVIP\u7684\u5047\u9634\u6027\u7387\u4f4e\u4e8e5%\uff0c\u5047\u9633\u6027\u7387\u4f4e\u4e8e3%\uff0c\u5e76\u4e14\u6bcf\u6b21\u67e5\u8be2\u7684\u9a8c\u8bc1\u65f6\u95f4\u5c11\u4e8e0.01\u79d2\u3002|\n", "2410.22304": "|**2024-10-29**|**Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning**|Yihe Deng 
et.al.|[2410.22304](http://arxiv.org/abs/2410.22304)|null|\u6570\u5b66\u63a8\u7406\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5173\u952e\u80fd\u529b\uff0c\u4f46\u751f\u6210\u8be6\u7ec6\u4e14\u51c6\u786e\u7684\u63a8\u7406\u8f68\u8ff9\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5229\u7528\u5728\u7ebf\u5b66\u4e60\u6d41\u7684\u65b0\u65b9\u6cd5\uff0c\u4ee5\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u63a8\u7406\u8f68\u8ff9\u7528\u4e8eLLM\u5fae\u8c03\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u589e\u91cf\u8f93\u51fa\u751f\u4ea7\u6d41\uff0c\u5176\u4e2d\u7ec4\u4ef6LLM\u901a\u8fc7\u8fed\u4ee3\u901a\u4fe1\u534f\u4f5c\u6784\u5efa\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u4f7f\u7528\u5e26\u6709\u6eda\u52a8\u7684\u5728\u7ebf\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5b66\u4e60\u6765\u8bad\u7ec3\u8be5\u6d41\uff0c\u4e3a\u6bcf\u4e2a\u8bad\u7ec3\u6837\u672c\u751f\u6210DPO\u5bf9\uff0c\u5e76\u5b9e\u65f6\u66f4\u65b0\u6a21\u578b\u3002\u6211\u4eec\u76f4\u63a5\u6bd4\u8f83\u4e86\u901a\u8fc7\u6211\u4eec\u65b9\u6cd5\u4e0e\u76f4\u63a5\u6a21\u578b\u63a8\u7406\u751f\u6210\u7684\u63a8\u7406\u8f68\u8ff9\u7684\u8d28\u91cf\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u63d0\u9ad8LLM\u5728\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2410.22296": "|**2024-10-29**|**LLMs are Highly-Constrained Biophysical Sequence Optimizers**|Angelica Chen 
et.al.|[2410.22296](http://arxiv.org/abs/2410.22296)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u751f\u7269\u4efb\u52a1\u4e2d\uff0c\u5982\u86cb\u767d\u8d28\u5de5\u7a0b\u548c\u5206\u5b50\u8bbe\u8ba1\u65b9\u9762\uff0c\u6700\u8fd1\u5c55\u793a\u4e86\u663e\u8457\u7684\u6f5c\u529b\u3002\u8fd9\u4e9b\u4efb\u52a1\u901a\u5e38\u6d89\u53ca\u9ed1\u76d2\u79bb\u6563\u5e8f\u5217\u4f18\u5316\uff0c\u6311\u6218\u5728\u4e8e\u751f\u6210\u4e0d\u4ec5\u5728\u751f\u7269\u5b66\u4e0a\u53ef\u884c\u800c\u4e14\u4e25\u683c\u7b26\u5408\u7ec6\u7c92\u5ea6\u7ea6\u675f\u7684\u5e8f\u5217\u3002\u7136\u800c\uff0cLLMs\u5f80\u5f80\u96be\u4ee5\u5e94\u5bf9\u8fd9\u4e9b\u7ea6\u675f\uff0c\u7279\u522b\u662f\u5728\u751f\u7269\u5b66\u80cc\u666f\u4e0b\uff0c\u9a8c\u8bc1\u5019\u9009\u89e3\u51b3\u65b9\u6848\u65e2\u6602\u8d35\u53c8\u8017\u65f6\u3002\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5c06LLMs\u4f5c\u4e3a\u9ad8\u5ea6\u7ea6\u675f\u7684\u53cc\u5c42\u4f18\u5316\u5668\u7684\u53ef\u80fd\u6027\uff0c\u901a\u8fc7\u4e00\u79cd\u6211\u4eec\u79f0\u4e4b\u4e3a\u8bed\u8a00\u6a21\u578b\u4f18\u5316\u8fb9\u7f18\u671f\u671b\uff08LLOME\uff09\u7684\u65b9\u6cd5\u3002\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u79bb\u7ebf\u548c\u5728\u7ebf\u4f18\u5316\uff0c\u5229\u7528\u6709\u9650\u7684oracle\u8bc4\u4f30\u8fed\u4ee3\u5730\u589e\u5f3a\u7531LLM\u751f\u6210\u7684\u5e8f\u5217\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bad\u7ec3\u76ee\u6807\u2014\u2014\u8fb9\u7f18\u5bf9\u9f50\u671f\u671b\uff08MargE\uff09\uff0c\u8be5\u76ee\u6807\u8bad\u7ec3LLM\u5e73\u6ed1\u5730\u5728\u5956\u52b1\u5206\u5e03\u548c\u53c2\u8003\u5206\u5e03\u4e4b\u95f4\u63d2\u503c\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u5408\u6210\u6d4b\u8bd5\u5957\u4ef6\uff0c\u8be5\u5957\u4ef6\u4e0e\u5b9e\u9645\u751f\u7269\u7269\u7406\u95ee\u9898\u5177\u6709\u5f3a\u70c8\u7684\u51e0\u4f55\u76f8\u4f3c\u6027\uff0c\u5e76\u4e14\u80fd\u591f\u5728\u4e0d\u8fdb\u884c\u8017\u65f6\u7684
\u5b9e\u9a8c\u5ba4\u9a8c\u8bc1\u7684\u60c5\u51b5\u4e0b\u5feb\u901f\u8bc4\u4f30LLM\u4f18\u5316\u5668\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u4e0e\u9057\u4f20\u7b97\u6cd5\u57fa\u7ebf\u76f8\u6bd4\uff0cLLMs\u5728\u8981\u6c42\u8f83\u5c11\u6d4b\u8bd5\u51fd\u6570\u8bc4\u4f30\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e86\u663e\u8457\u66f4\u4f4e\u7684\u9057\u61be\u89e3\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u89c2\u5bdf\u5230LLMs\u8868\u73b0\u51fa\u9002\u5ea6\u7684\u6821\u51c6\u504f\u5dee\uff0c\u5bb9\u6613\u53d1\u751f\u751f\u6210\u5668\u5d29\u6e83\uff0c\u5e76\u4e14\u5728\u6ca1\u6709\u660e\u786e\u7684\u5730\u9762\u771f\u503c\u5956\u52b1\u53ef\u7528\u65f6\u96be\u4ee5\u627e\u5230\u6700\u4f18\u89e3\u3002|\n", "2410.22293": "|**2024-10-29**|**Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats**|Mohammad Setak et.al.|[2410.22293](http://arxiv.org/abs/2410.22293)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u548c\u4ee3\u7801\u5408\u6210\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4e0d\u540c\u9886\u57df\u66f4\u590d\u6742\u7684\u4efb\u52a1\u3002\u672c\u6587\u63a2\u8ba8\u4e86LLMs\u5728\u4ee3\u7801\u53d8\u5f02\u4e2d\u7684\u5e94\u7528\uff0c\u8fd9\u662f\u4e00\u4e2a\u5728\u4e0d\u6539\u53d8\u7a0b\u5e8f\u4ee3\u7801\u529f\u80fd\u7684\u524d\u63d0\u4e0b\u6539\u53d8\u5176\u7ed3\u6784\u7684\u8fc7\u7a0b\u3002\u4f20\u7edf\u4e0a\uff0c\u4ee3\u7801\u53d8\u5f02\u88ab\u7528\u4e8e\u63d0\u9ad8\u5173\u952e\u4efb\u52a1\u5e94\u7528\u7a0b\u5e8f\u7684\u8f6f\u4ef6\u5065\u58ee\u6027\u3002\u6b64\u5916\uff0c\u53d8\u5f02\u5f15\u64ce\u4e5f\u88ab\u6076\u610f\u8f6f\u4ef6\u5f00\u53d1\u8005\u7528\u6765\u9003\u907f\u57fa\u4e8e\u7279\u5f81\u7801\u7684\u68c0\u6d4b\u65b9\u6cd5\u3002\u73b0\u6709\u7684\u6076\u610f\u8f6f\u4ef6\u4f7f\u7528\u7684\u53d8\u5f02\u5f15\u64ce\u901a\u5e38\u53ea\u4ea7\u751f\u6709\u9650\u7684\u4ee3\u7801\u53d8\u5316\uff0c\u8fd9\u4e9b\u53d8\u53
16\u4ecd\u7136\u53ef\u4ee5\u901a\u8fc7\u9759\u6001\u4ee3\u7801\u5206\u6790\u88ab\u8bc6\u522b\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684LLM\u6240\u5c55\u793a\u7684\u7075\u6d3b\u6027\u53ef\u80fd\u663e\u8457\u6539\u53d8\u8fd9\u79cd\u5a01\u80c1\u6001\u52bf\uff0c\u901a\u8fc7\u5141\u8bb8\u8fdb\u884c\u66f4\u590d\u6742\u7684\u4ee3\u7801\u53d8\u5f02\uff0c\u8fd9\u4e9b\u53d8\u5f02\u4e0d\u5bb9\u6613\u901a\u8fc7\u9759\u6001\u5206\u6790\u68c0\u6d4b\u5230\u3002\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u548c\u518d\u8bad\u7ec3\u589e\u52a0\u7531\u9884\u8bad\u7ec3LLM\u751f\u6210\u7684\u4ee3\u7801\u7684\u53d8\u5316\u3002\u6211\u4eec\u79f0\u4e4b\u4e3a\u4ee3\u7801\u53d8\u5f02\u8bad\u7ec3\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u4e3a\u57fa\u4e8e\u9884\u8bad\u7ec3LLM\u7684\u4ee3\u7801\u5408\u6210\u5668\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u4ee3\u7801\u53d8\u5f02\u8bad\u7ec3\u5b9a\u4e49\uff0c\u5e76\u5728\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u7684\u9884\u8bad\u7ec3\u6a21\u578b\u4e0a\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5728\u5b50\u4f8b\u7a0b\u7ea7\u522b\u91cd\u7ec4\uff08\u5373\u53d8\u5f02\uff09\u4ee3\u7801\uff0c\u8fd9\u4f7f\u5f97\u53d8\u5f02\u66f4\u52a0\u53ef\u63a7\u540c\u65f6\u4fdd\u6301\u8bed\u4e49\u5b8c\u6574\u6027\uff0c\u5e76\u901a\u8fc7\u5355\u5143\u6d4b\u8bd5\u9a8c\u8bc1\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u5730\u63d0\u9ad8\u4e86\u57fa\u4e8eLLM\u7684\u7a0b\u5e8f\u5408\u6210\u5668\u5728\u751f\u6210\u591a\u6837\u5316\u4e14\u529f\u80fd\u6b63\u786e\u7684\u4ee3\u7801\u89e3\u51b3\u65b9\u6848\u65b9\u9762\u7684\u53d8\u5f02\u80fd\u529b\uff0c\u5c55\u793a\u4e86\u5b83\u4eec\u5728\u6539\u53d8\u4ee3\u7801\u53d8\u5f02\u683c\u5c40\u4ee5\u53ca\u4e0e\u4e4b\u76f8\u5173\u7684\u5a01\u80c1\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2410.22284": "|**2024-10-29**|**Embedding-based classifiers can detect prompt injection attacks**|Md. 
Ahsan Ayub et.al.|[2410.22284](http://arxiv.org/abs/2410.22284)|**[link](https://github.com/AhsanAyub/malicious-prompt-detection)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5353\u8d8a\u7684\u751f\u6210\u80fd\u529b\u800c\u5728\u5404\u7c7b\u7ec4\u7ec7\u4e2d\u5f97\u5230\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0cLLMs\u5bb9\u6613\u53d7\u5230\u5404\u79cd\u5bf9\u6297\u6027\u653b\u51fb\uff0c\u7279\u522b\u662f\u63d0\u793a\u6ce8\u5165\u653b\u51fb\uff0c\u8fd9\u79cd\u653b\u51fb\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6076\u610f\u63d0\u793a\u6b3a\u9a97LLMs\uff0c\u4f7f\u5176\u751f\u6210\u6709\u5bb3\u6216\u4e0d\u9002\u5f53\u7684\u5185\u5bb9\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5d4c\u5165\u5f0f\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u5206\u7c7b\u5668\u7684\u65b0\u65b9\u6cd5\uff0c\u4ee5\u4fdd\u62a4\u57fa\u4e8eLLM\u7684\u5e94\u7528\u7a0b\u5e8f\u514d\u53d7\u8fd9\u79cd\u4e25\u91cd\u5a01\u80c1\u3002\u6211\u4eec\u5229\u7528\u4e09\u79cd\u5e38\u7528\u7684\u5d4c\u5165\u6a21\u578b\u6765\u751f\u6210\u6076\u610f\u548c\u826f\u6027\u63d0\u793a\u7684\u5d4c\u5165\uff0c\u5e76\u4f7f\u7528ML\u5206\u7c7b\u5668\u9884\u6d4b\u8f93\u5165\u63d0\u793a\u662f\u5426\u4e3a\u6076\u610f\u3002\u5728\u51e0\u79cd\u4f20\u7edf\u7684ML\u65b9\u6cd5\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528\u968f\u673a\u68ee\u6797\u548cXGBoost\u6784\u5efa\u7684\u5206\u7c7b\u5668\u8868\u73b0\u6700\u4f73\u3002\u6211\u4eec\u7684\u5206\u7c7b\u5668\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u5f00\u6e90\u5b9e\u73b0\u4e2d\u7684\u6700\u5148\u8fdb\u7684\u63d0\u793a\u6ce8\u5165\u5206\u7c7b\u5668\uff0c\u540e\u8005\u4f7f\u7528\u7684\u662f\u4ec5\u7f16\u7801\u5668\u7684\u795e\u7ecf\u7f51\u7edc\u3002**|\n", "2410.22282": "|**2024-10-29**|**Whose ChatGPT? 
Unveiling Real-World Educational Inequalities Introduced by Large Language Models**|Renzhe Yu et.al.|[2410.22282](http://arxiv.org/abs/2410.22282)|null|\u81ea2022\u5e74\u5e95\u4ee5\u6765\uff0cChatGPT\u7b49\u7c7b\u4f3c\u5de5\u5177\u7684\u5e7f\u6cdb\u53ef\u7528\u6027\u5f15\u53d1\u4e86\u516c\u4f17\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u63d0\u9ad8\u5b66\u4e60\u4f53\u9a8c\u548c\u6210\u679c\u65b9\u9762\u7684\u6f5c\u529b\u7684\u5de8\u5927\u5174\u8da3\u548c\u5b9e\u9a8c\u52aa\u529b\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u6765\u81ea\u5f31\u52bf\u80cc\u666f\u7684\u5b66\u4e60\u8005\u3002\u7136\u800c\uff0c\u5f88\u5c11\u6709\u7814\u7a76\u7cfb\u7edf\u5730\u8003\u5bdf\u4e86LLMs\u7684\u5b9e\u9645\u53ef\u7528\u6027\u5bf9\u6559\u80b2\u516c\u5e73\u6027\u7684\u73b0\u5b9e\u5f71\u54cd\uff0c\u9664\u4e86\u7406\u8bba\u9884\u6d4b\u548c\u521b\u65b0LLM\u5e94\u7528\u7684\u63a7\u5236\u7814\u7a76\u4e4b\u5916\u3002\u4e3a\u4e86\u63cf\u7ed8LLMs\u4e0d\u5e73\u7b49\u8d8b\u52bf\uff0c\u6211\u4eec\u5206\u6790\u4e86\u4e00\u6240\u7f8e\u56fd\u516c\u7acb\u5c11\u6570\u65cf\u88d4\u670d\u52a1\u9662\u68212021\u5e74\u81f32024\u5e74\u95f42391\u95e8\u8bfe\u7a0b\u4e2d16791\u540d\u5927\u5b66\u751f\u63d0\u4ea4\u76841140328\u7bc7\u5b66\u672f\u5199\u4f5c\u4f5c\u4e1a\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u5728LLMs\u53ef\u7528\u4e4b\u540e\uff0c\u5b66\u751f\u7684\u6574\u4f53\u5199\u4f5c\u8d28\u91cf\u9010\u6e10\u63d0\u9ad8\uff0c\u5e76\u4e14\u8bed\u8a00\u4f18\u52bf\u548c\u52a3\u52bf\u5b66\u751f\u4e4b\u95f4\u7684\u5199\u4f5c\u8d28\u91cf\u5dee\u8ddd\u9010\u6e10\u7f29\u5c0f\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u5e73\u7b49\u5316\u6548\u5e94\u66f4\u591a\u96c6\u4e2d\u5728\u8f83\u9ad8\u793e\u4f1a\u7ecf\u6d4e\u5730\u4f4d\u7684\u5b66\u751f\u8eab\u4e0a\u3002\u8fd9\u4e9b\u53d1\u73b0\u63ed\u793a\u4e86LLMs\u65f6\u4ee3\u7684\u6570\u5b57\u9e3f\u6c9f\uff0c\u5e76\u63d0\u51fa\u4e86\u5173\u4e8eLLMs\u5728\u65e9\u671f\u9636\u6bb5\u7684\u516c\u5e73\u6548\u76ca\u7684\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u7814\u7a76\u4eba\u5458\u54
8c\u4ece\u4e1a\u8005\u9700\u8981\u5236\u5b9a\u8d1f\u8d23\u4efb\u7684\u505a\u6cd5\u4ee5\u901a\u8fc7LLMs\u6539\u5584\u6559\u80b2\u516c\u5e73\u6027\u3002|\n", "2410.23262": "|**2024-10-30**|**EMMA: End-to-End Multimodal Model for Autonomous Driving**|Jyh-Jing Hwang et.al.|[2410.23262](http://arxiv.org/abs/2410.23262)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86EMMA\uff0c\u8fd9\u662f\u4e00\u79cd\u7528\u4e8e\u81ea\u52a8\u9a7e\u9a76\u7684\u7aef\u5230\u7aef\u591a\u6a21\u6001\u6a21\u578b\u3002\u8be5\u6a21\u578b\u57fa\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u57fa\u7840\uff0c\u76f4\u63a5\u5c06\u539f\u59cb\u76f8\u673a\u4f20\u611f\u5668\u6570\u636e\u6620\u5c04\u5230\u5404\u79cd\u4e0e\u9a7e\u9a76\u76f8\u5173\u7684\u8f93\u51fa\uff0c\u5305\u62ec\u89c4\u5212\u8f68\u8ff9\u3001\u611f\u77e5\u5bf9\u8c61\u548c\u9053\u8def\u56fe\u5143\u7d20\u3002EMMA\u901a\u8fc7\u5c06\u6240\u6709\u975e\u4f20\u611f\u5668\u8f93\u5165\uff08\u4f8b\u5982\u5bfc\u822a\u6307\u4ee4\u548c\u81ea\u8f66\u72b6\u6001\uff09\u548c\u8f93\u51fa\uff08\u4f8b\u5982\u8f68\u8ff9\u548c\u4e09\u7ef4\u4f4d\u7f6e\uff09\u8868\u793a\u4e3a\u81ea\u7136\u8bed\u8a00\u6587\u672c\uff0c\u6700\u5927\u9650\u5ea6\u5730\u5229\u7528\u4e86\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u7684\u4e16\u754c\u77e5\u8bc6\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7fEMMA\u80fd\u591f\u5728\u7edf\u4e00\u7684\u8bed\u8a00\u7a7a\u95f4\u4e2d\u8054\u5408\u5904\u7406\u5404\u79cd\u9a7e\u9a76\u4efb\u52a1\uff0c\u5e76\u4f7f\u7528\u7279\u5b9a\u4efb\u52a1\u63d0\u793a\u751f\u6210\u6bcf\u4e2a\u4efb\u52a1\u7684\u8f93\u51fa\u3002\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0cEMMA\u5728nuScenes\u4e0a\u7684\u8fd0\u52a8\u89c4\u5212\u65b9\u9762\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5e76\u5728Waymo\u5f00\u653e\u8fd0\u52a8\u6570\u636e\u96c6\uff08WOMD\uff09\u4e0a\u53d6\u5f97\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\u3002\u6b64\u5916\uff0cEMMA\u5728Waymo\u5f00\u653e\u6570\u636e\u96c6\uff08WOD\uff09\u4e0a\u4f5c\u4e3a\u4e3b\u8981\u6444\u
50cf\u5934\u7684\u4e09\u7ef4\u76ee\u6807\u68c0\u6d4b\u4e5f\u53d6\u5f97\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u901a\u8fc7\u540c\u65f6\u8bad\u7ec3EMMA\u8fdb\u884c\u89c4\u5212\u8f68\u8ff9\u3001\u76ee\u6807\u68c0\u6d4b\u548c\u9053\u8def\u56fe\u4efb\u52a1\u53ef\u4ee5\u5728\u8fd9\u4e09\u4e2a\u9886\u57df\u90fd\u53d6\u5f97\u6539\u8fdb\uff0c\u7a81\u663e\u4e86EMMA\u4f5c\u4e3a\u81ea\u52a8\u9a7e\u9a76\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6a21\u578b\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0cEMMA\u4e5f\u8868\u73b0\u51fa\u4e00\u4e9b\u5c40\u9650\u6027\uff1a\u5b83\u53ea\u80fd\u5904\u7406\u5c11\u91cf\u56fe\u50cf\u5e27\uff0c\u4e0d\u5305\u542b\u51c6\u786e\u7684\u4e09\u7ef4\u4f20\u611f\u6a21\u6001\u5982\u6fc0\u5149\u96f7\u8fbe\u6216\u96f7\u8fbe\uff0c\u5e76\u4e14\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u7ed3\u679c\u80fd\u591f\u6fc0\u53d1\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u5e76\u8fdb\u4e00\u6b65\u53d1\u5c55\u81ea\u52a8\u9a7e\u9a76\u6a21\u578b\u67b6\u6784\u3002|\n", "2410.23252": "|**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu 
et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6269\u5c55\u5230\u6267\u884c\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\u4e2d\u7684\u4ee3\u7406\u4efb\u52a1\uff0c\u8d85\u8d8a\u4f20\u7edf\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u8bc4\u4f30\u5176\u9c81\u68d2\u6027\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7b49\u5173\u952e\u7ef4\u5ea6\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86CASA\uff0c\u8fd9\u662f\u4e00\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u4ee3\u7406\u5728\u4e24\u4e2a\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\uff08\u5728\u7ebf\u8d2d\u7269\u548c\u793e\u4ea4\u8ba8\u8bba\u8bba\u575b\uff09\u4e2d\u5bf9\u6587\u5316\u548c\u793e\u4f1a\u89c4\u8303\u7684\u654f\u611f\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bc4\u4f30\u4e86LLM\u4ee3\u7406\u68c0\u6d4b\u5e76\u9002\u5f53\u56de\u5e94\u8fdd\u53cd\u89c4\u8303\u7684\u7528\u6237\u67e5\u8be2\u548c\u89c2\u5bdf\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u6d4b\u91cf\u4ee3\u7406\u5bf9\u6587\u5316\u548c\u793e\u4f1a\u89c4\u8303\u7684\u610f\u8bc6\u8986\u76d6\u7387\u3001\u5728\u7ba1\u7406\u7528\u6237\u67e5\u8be2\u65f6\u7684\u5b9e\u7528\u6027\u4ee5\u53ca\u9762\u5bf9\u8bef\u5bfc\u6027\u7f51\u7edc\u5185\u5bb9\u65f6\u7684\u8fdd\u89c4\u7387\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u524d\u7684LLM\u5728\u975e\u4ee3\u7406\u73af\u5883\u4e2d\u7684\u8868\u73b0\u663e\u8457\u4f18\u4e8e\u5728\u7f51\u7edc\u4ee3\u7406\u73af\u5883\u4e2d\uff0c\u4ee3\u7406\u7684\u610f\u8bc6\u8986\u76d6\u7387\u4e0d\u523010%\uff0c\u8fdd\u89c4\u7387\u8d85\u8fc740%\u3002\u4e3a\u4e86\u63d0\u9ad8\u6027\u80fd\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u63d0\u793a\u548c\u5fa
e\u8c03\uff0c\u5e76\u53d1\u73b0\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u4e92\u8865\u2014\u2014\u9488\u5bf9\u7279\u5b9a\u6587\u5316\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u663e\u8457\u589e\u5f3a\u4ee3\u7406\u5728\u4e0d\u540c\u5730\u533a\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u800c\u63d0\u793a\u5219\u80fd\u63d0\u5347\u4ee3\u7406\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u80fd\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u7a81\u663e\u4e86\u5728\u5f00\u53d1\u5468\u671f\u4e2d\u4e0d\u65ad\u57fa\u51c6\u6d4b\u8bd5LLM\u4ee3\u7406\u7684\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7684\u91cd\u8981\u6027\u3002|\n", "2410.23243": "|**2024-10-30**|**Carrot and Stick: Eliciting Comparison Data and Beyond**|Yiling Chen et.al.|[2410.23243](http://arxiv.org/abs/2410.23243)|null|\u6bd4\u8f83\u6570\u636e\u901a\u5e38\u6765\u81ea\u4e8e\u4eba\u4eec\u7684\u4e3b\u89c2\u5224\u65ad\uff0c\u5e76\u4e14\u96be\u4ee5\u76f4\u63a5\u9a8c\u8bc1\u3002\u8fd9\u4e9b\u6570\u636e\u5bf9\u4e8e\u8bb8\u591a\u673a\u5668\u5b66\u4e60\u4efb\u52a1\u81f3\u5173\u91cd\u8981\uff0c\u5305\u62ec\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\u548c\u6392\u540d\u6a21\u578b\u4f30\u8ba1\u3002\u5982\u4f55\u8bda\u5b9e\u5730\u4ece\u7406\u6027\u4e2a\u4f53\u90a3\u91cc\u83b7\u53d6\u8fd9\u6837\u7684\u6bd4\u8f83\u6570\u636e\uff1f\u6211\u4eec\u8bbe\u8ba1\u4e86\u540c\u4f34\u9884\u6d4b\u673a\u5236\u6765\u5229\u7528\u5956\u91d1-\u60e9\u7f5a\u652f\u4ed8\u65b9\u5f0f\u6765\u83b7\u53d6\u6bd4\u8f83\u6570\u636e\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u4f9d\u8d56\u4e8e\u6bd4\u8f83\u6570\u636e\u7684\u5f3a\u968f\u673a\u4f20\u9012\u6027\uff0c\u4ece\u800c\u521b\u5efa\u5bf9\u79f0\u7684\u4e25\u683c\u771f\u5b9e\u673a\u5236\uff0c\u4f7f\u5f97\u8bf4\u5b9e\u8bdd\u4e0d\u4ec5\u5f62\u6210\u4e25\u683c\u7684\u8d1d\u53f6\u65af\u7eb3\u4ec0\u5747\u8861\uff0c\u800c\u4e14\u5728\u6240\u6709\u5bf9\u79f0\u5747\u8861\u4e2d\u83b7\u5f97\u6700\u9ad8\u62a5\u916c\u3002\u5728\u6211\u4eec\u7684\u673a\u5236\u4e0b\uff0c\u6bcf\u4e2a\u4e2a\u4f53\u53ea\u970
0\u8981\u8bc4\u4f30\u4e00\u5bf9\u9879\u76ee\u5e76\u62a5\u544a\u5979\u7684\u6bd4\u8f83\u7ed3\u679c\u3002 \u6211\u4eec\u8fdb\u4e00\u6b65\u5c06\u5956\u91d1-\u60e9\u7f5a\u652f\u4ed8\u7684\u6982\u5ff5\u6269\u5c55\u5230\u7f51\u7edc\u5316\u6570\u636e\u7684\u83b7\u53d6\u4e0a\uff0c\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5f53\u4ee3\u7406\u4eba\u7684\u79c1\u4eba\u4fe1\u53f7\u6839\u636eIsing\u6a21\u578b\u91c7\u6837\u65f6\uff0c\u5bf9\u79f0\u5730\u4e25\u683c\u771f\u5b9e\u7684\u673a\u5236\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u5956\u91d1-\u60e9\u7f5a\u652f\u4ed8\u6210\u4e3a\u4e25\u683c\u8d1d\u53f6\u65af\u7eb3\u4ec0\u5747\u8861\u7684\u5fc5\u8981\u548c\u5145\u5206\u6761\u4ef6\u3002\u5728\u4e24\u4e2a\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u8fdb\u4e00\u6b65\u652f\u6301\u4e86\u6211\u4eec\u7684\u7406\u8bba\u53d1\u73b0\u3002|\n", "2410.23242": "|**2024-10-30**|**A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment**|Matteo G. 
Mecattaf et.al.|[2410.23242](http://arxiv.org/abs/2410.23242)|null|\u4f5c\u4e3a\u901a\u7528\u5de5\u5177\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5fc5\u987b\u7ecf\u5e38\u63a8\u7406\u65e5\u5e38\u7269\u7406\u73af\u5883\u3002\u5728\u95ee\u7b54\u573a\u666f\u4e2d\uff0c\u7406\u89e3\u7269\u7406\u5bf9\u8c61\u7684\u76f8\u4e92\u4f5c\u7528\u53ef\u80fd\u662f\u7ed9\u51fa\u9002\u5f53\u56de\u7b54\u7684\u5fc5\u8981\u6761\u4ef6\u3002\u6b64\u5916\uff0cLLMs\u8d8a\u6765\u8d8a\u591a\u5730\u88ab\u7528\u4f5c\u81ea\u4e3b\u7cfb\u7edf\u4e2d\u7684\u63a8\u7406\u5f15\u64ce\uff0c\u8bbe\u8ba1\u548c\u63a7\u5236\u5b83\u4eec\u7684\u52a8\u4f5c\u5e8f\u5217\u3002\u5927\u591a\u6570\u7814\u7a76\u901a\u8fc7\u9759\u6001\u57fa\u51c6\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u8fd9\u4e9b\u57fa\u51c6\u7531\u5173\u4e8e\u7269\u7406\u4e16\u754c\u7684\u6587\u672c\u6216\u56fe\u50cf\u95ee\u9898\u7ec4\u6210\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u65e0\u6cd5\u6355\u6349\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u7269\u7406\u8fc7\u7a0b\u7684\u590d\u6742\u6027\u548c\u7ec6\u5fae\u5dee\u522b\u3002\u5728\u8fd9\u91cc\uff0c\u6211\u4eec\u63d0\u5021\u7b2c\u4e8c\u79cd\u76f8\u5bf9\u672a\u88ab\u5145\u5206\u63a2\u7d22\u7684\u65b9\u6cd5\uff1a\u901a\u8fc7\u5728\u4e00\u4e2a3D\u73af\u5883\u4e2d\u8d4b\u4e88LLMs\u5bf9\u4ee3\u7406\u7684\u63a7\u5236\u6743\u6765\u201c\u5177\u8eab\u5316\u201d\u5b83\u4eec\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u7b2c\u4e00\u4e2a\u5177\u8eab\u4e14\u8ba4\u77e5\u4e0a\u6709\u610f\u4e49\u7684LLM\u7269\u7406\u5e38\u8bc6\u63a8\u7406\u8bc4\u4f30\u6846\u67b6\u3002\u6211\u4eec\u7684\u6846\u67b6\u5141\u8bb8\u76f4\u63a5\u6bd4\u8f83LLMs\u4e0e\u5176\u4ed6\u5177\u8eab\u4ee3\u7406\uff0c\u5982\u57fa\u4e8e\u6df1\u5ea6\u5f3a\u5316\u5b66\u4e60\u7684\u4ee3\u7406\uff0c\u4ee5\u53ca\u4eba\u7c7b\u548c\u975e\u4eba\u7c7b\u52a8\u7269\u3002\u6211\u4eec\u4f7f\u7528Animal-AI\uff08AAI\uff09\u73af\u5883\uff0c\u4e00\u4e2a\u6a21\u62df\u76843D\u865a\u62df\u5b9e\u9a8c\u5ba4\uff0c\u6765\u7814\u7a76LLMs\u7684\u7269\u7406\u5e
38\u8bc6\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4f7f\u7528AAI\u6d4b\u8bd5\u5e73\u53f0\uff0c\u8be5\u5e73\u53f0\u662f\u4e00\u7cfb\u5217\u5b9e\u9a8c\uff0c\u590d\u5236\u4e86\u975e\u4eba\u7c7b\u52a8\u7269\u7684\u5b9e\u9a8c\u5ba4\u7814\u7a76\uff0c\u4ee5\u7814\u7a76\u7269\u7406\u63a8\u7406\u80fd\u529b\uff0c\u5305\u62ec\u8ddd\u79bb\u4f30\u8ba1\u3001\u8ddf\u8e2a\u770b\u4e0d\u89c1\u7684\u7269\u4f53\u548c\u5de5\u5177\u4f7f\u7528\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u6ca1\u6709\u5fae\u8c03\u7684\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u6a21\u578b\u80fd\u591f\u5b8c\u6210\u8fd9\u79cd\u4efb\u52a1\uff0c\u4f7f\u5f97\u4e0e2019\u5e74Animal-AI\u5965\u8fd0\u4f1a\u53c2\u8d5b\u8005\u548c\u4eba\u7c7b\u513f\u7ae5\u8fdb\u884c\u6709\u610f\u4e49\u7684\u6bd4\u8f83\u6210\u4e3a\u53ef\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0cLLMs\u76ee\u524d\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4e0d\u5982\u4eba\u7c7b\u513f\u7ae5\u3002\u6211\u4eec\u8ba4\u4e3a\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u4f7f\u7528\u76f4\u63a5\u4ece\u8ba4\u77e5\u79d1\u5b66\u4e2d\u63d0\u53d6\u7684\u751f\u6001\u6709\u6548\u7684\u5b9e\u9a8c\u6765\u7814\u7a76\u7269\u7406\u63a8\u7406\uff0c\u4ece\u800c\u63d0\u9ad8LLMs\u7684\u9884\u6d4b\u6027\u548c\u53ef\u9760\u6027\u3002|\n", "2410.23234": "|**2024-10-30**|**EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning**|Peide Huang 
et.al.|[2410.23234](http://arxiv.org/abs/2410.23234)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aEMOTION\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u5728\u4eba\u5f62\u673a\u5668\u4eba\u4e2d\u751f\u6210\u5bcc\u6709\u8868\u73b0\u529b\u7684\u52a8\u4f5c\u5e8f\u5217\uff0c\u4ece\u800c\u589e\u5f3a\u5176\u8fdb\u884c\u7c7b\u4eba\u975e\u8bed\u8a00\u4ea4\u6d41\u7684\u80fd\u529b\u3002\u975e\u8bed\u8a00\u7ebf\u7d22\u5982\u9762\u90e8\u8868\u60c5\u3001\u624b\u52bf\u548c\u8eab\u4f53\u52a8\u4f5c\u5728\u6709\u6548\u7684\u4eba\u9645\u4e92\u52a8\u4e2d\u8d77\u7740\u81f3\u5173\u91cd\u8981\u7684\u4f5c\u7528\u3002\u5c3d\u7ba1\u5728\u673a\u5668\u4eba\u7684\u884c\u4e3a\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u96be\u4ee5\u6a21\u4eff\u4eba\u7c7b\u975e\u8bed\u8a00\u4ea4\u6d41\u7684\u591a\u6837\u6027\u548c\u7ec6\u5fae\u5dee\u522b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\uff0c\u52a8\u6001\u751f\u6210\u9002\u5408\u793e\u4f1a\u4ea4\u5f80\u7684\u624b\u52bf\u52a8\u4f5c\u5e8f\u5217\uff0c\u4ee5\u4fc3\u8fdb\u4eba\u673a\u4ea4\u4e92\u3002\u6211\u4eec\u4f7f\u7528\u8be5\u6846\u67b6\u751f\u6210\u4e8610\u79cd\u4e0d\u540c\u7684\u8868\u60c5\u624b\u52bf\uff0c\u5e76\u8fdb\u884c\u4e86\u5728\u7ebf\u7528\u6237\u7814\u7a76\uff0c\u6bd4\u8f83\u7531EMOTION\u548c\u5176\u52a0\u5165\u4eba\u7c7b\u53cd\u9988\u7248\u672cEMOTION++\u751f\u6210\u7684\u52a8\u4f5c\u4e0e\u4eba\u7c7b\u64cd\u4f5c\u5458\u751f\u6210\u7684\u52a8\u4f5c\u4e4b\u95f4\u7684\u81ea\u7136\u5ea6\u548c\u53ef\u7406\u89e3\u6027\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u751f\u6210\u53ef\u7406\u89e3\u4e14\u81ea\u7136\u7684\u673a\u5668\u4eba\u52a8\u4f5c\u65b9\u9762\u8981\u4e48\u4e0e\u4eba\u7c7b\u8868\u73b0\u76f8\u5f53\uff0c\u8981\u4e48\u8d85\u8d8a\u4eba\u7c7b\u3002
\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u7684\u8bbe\u8ba1\u542f\u793a\uff0c\u8003\u8651\u5728\u751f\u6210\u5bcc\u6709\u8868\u73b0\u529b\u7684\u673a\u5668\u4eba\u624b\u52bf\u65f6\u9700\u8981\u8003\u8651\u7684\u4e00\u7cfb\u5217\u53d8\u91cf\u3002|\n", "2410.23214": "|**2024-10-31**|**Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval**|Sheryl Hsu et.al.|[2410.23214](http://arxiv.org/abs/2410.23214)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7b\u89c9\u73b0\u8c61\u901a\u8fc7\u5141\u8bb8\u6a21\u578b\u641c\u7d22\u4fe1\u606f\u5e76\u5c06\u7b54\u6848\u4e0e\u771f\u5b9e\u6765\u6e90\u6302\u94a9\uff0c\u5f97\u5230\u4e86\u4e00\u5b9a\u7a0b\u5ea6\u7684\u7f13\u89e3\u3002\u7136\u800c\uff0cLLMs\u5728\u5904\u7406\u590d\u6742\u6216\u95f4\u63a5\u4e3b\u9898\u65f6\uff0c\u5f80\u5f80\u96be\u4ee5\u63d0\u51fa\u6b63\u786e\u7684\u641c\u7d22\u67e5\u8be2\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u901a\u8fc7\u8ba9LLMs\u5c1d\u8bd5\u4e0d\u540c\u7684\u67e5\u8be2\u5e76\u5b66\u4e60\u5bf9\u90a3\u4e9b\u6210\u529f\u4ea7\u751f\u76f8\u5173\u7ed3\u679c\u7684\u67e5\u8be2\u8d4b\u4e88\u66f4\u9ad8\u7684\u6743\u91cd\uff0cLLMs\u53ef\u4ee5\u5b66\u4f1a\u68c0\u7d22\u76f8\u5173\u7684\u4e8b\u5b9e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86LeReT\uff08Learning to Retrieve by 
Trying\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5f3a\u5316\u5b66\u4e60\u6846\u67b6\uff0c\u901a\u8fc7\u63a2\u7d22\u641c\u7d22\u67e5\u8be2\u5e76\u4f7f\u7528\u57fa\u4e8e\u504f\u597d\u7684\u4f18\u5316\u6765\u63d0\u9ad8\u67e5\u8be2\u8d28\u91cf\u3002LeReT\u53ef\u4ee5\u5c06\u7edd\u5bf9\u68c0\u7d22\u51c6\u786e\u6027\u63d0\u9ad8\u591a\u8fbe29%\uff0c\u5e76\u5c06\u4e0b\u6e38\u751f\u6210\u5668\u8bc4\u4f30\u63d0\u9ad817%\u3002LeReT\u7684\u7b80\u5355\u6027\u548c\u7075\u6d3b\u6027\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4efb\u610f\u73b0\u6210\u7684\u68c0\u7d22\u5668\uff0c\u5e76\u6210\u4e3a\u6539\u8fdb\u901a\u7528LLM\u7ba1\u9053\u7684\u4e00\u79cd\u6709\u524d\u9014\u7684\u6280\u672f\u3002\u9879\u76ee\u7f51\u7ad9\uff1ahttp://sherylhsu.com/LeReT/\u3002|\n", "2410.23182": "|**2024-10-30**|**ProTransformer: Robustify Transformers via Plug-and-Play Paradigm**|Zhichao Hou et.al.|[2410.23182](http://arxiv.org/abs/2410.23182)|null|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8eTransformer\u7684\u67b6\u6784\u5728\u673a\u5668\u5b66\u4e60\u7684\u5404\u4e2a\u9886\u57df\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u9c81\u68d2\u6ce8\u610f\u529b\u673a\u5236\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8eTransformer\u7684\u67b6\u6784\u7684\u97e7\u6027\u3002\u8fd9\u9879\u6280\u672f\u53ef\u4ee5\u4f5c\u4e3a\u63d2\u4ef6\u5c42\u96c6\u6210\u5230\u73b0\u6709\u7684Transformer\u6a21\u578b\u4e2d\uff0c\u4ece\u800c\u63d0\u9ad8\u5176\u9c81\u68d2\u6027\uff0c\u800c\u65e0\u9700\u989d\u5916\u7684\u8bad\u7ec3\u6216\u5fae\u8c03\u3002\u901a\u8fc7\u5168\u9762\u7684\u5b9e\u9a8c\u548c\u6d88\u878d\u7814\u7a76\uff0c\u6211\u4eec\u8bc1\u660e\u4e86ProTransformer\u663e\u8457\u63d0\u5347\u4e86\u5404\u79cd\u9884\u6d4b\u4efb\u52a1\u3001\u653b\u51fb\u673a\u5236\u3001\u9aa8\u5e72\u67b6\u6784\u548c\u6570\u636e\u57df\u4e2d\u7684Transformer\u6a21\u578b\u7684\u9c81\u68d2\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728\u7ecf\u5178\u7684TextFoole
r\u653b\u51fb\u4e0b\uff0c\u65e0\u9700\u8fdb\u4e00\u6b65\u5fae\u8c03\uff0cProTransformer\u5206\u522b\u5c06BERT\u3001ALBERT\u3001DistilBERT\u548cRoBERTa\u8fd9\u56db\u79cd\u6a21\u578b\u7684\u6027\u80fd\u63d0\u9ad8\u4e8619.5%\u300128.3%\u300116.1%\u548c11.4%\u3002\u6b64\u5916\uff0cProTransformer\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9762\u5bf9\u57fa\u4e8e\u63d0\u793a\u7684\u653b\u51fb\u65f6\u8868\u73b0\u51fa\u826f\u597d\u7684\u97e7\u6027\uff0c\u5206\u522b\u5c06T5\u548cLLaMA\u7684\u6027\u80fd\u63d0\u9ad8\u4e8624.8%\u548c17.8%\uff0c\u5e76\u4e14\u5e73\u5747\u5c06Vicuna\u5728Jailbreaking\u653b\u51fb\u4e0b\u7684\u6027\u80fd\u63d0\u9ad8\u4e8610.4%\u3002\u9664\u4e86\u8bed\u8a00\u9886\u57df\u5916\uff0cProTransformer\u8fd8\u5728\u89c6\u89c9\u548c\u56fe\u9886\u57df\u5c55\u793a\u4e86\u51fa\u8272\u7684\u9c81\u68d2\u6027\u3002|\n", "2410.23180": "|**2024-10-30**|**ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning**|Millennium Bismay 
et.al.|[2410.23180](http://arxiv.org/abs/2410.23180)|**[link](https://github.com/millenniumbismay/reasoningrec)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aReasoningRec\u7684\u63a8\u7406\u63a8\u8350\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u5f25\u5408\u63a8\u8350\u4e0e\u4eba\u7c7b\u53ef\u89e3\u91ca\u6027\u89e3\u91ca\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u4e0e\u4f9d\u8d56\u4e8e\u9690\u5f0f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\u4e0d\u540c\uff0cReasoningRec\u4f7f\u7528LLMs\u6765\u5efa\u6a21\u7528\u6237\u548c\u9879\u76ee\uff0c\u91cd\u70b9\u5728\u4e8e\u7528\u6237\u7684\u504f\u597d\u3001\u538c\u6076\u548c\u89e3\u91ca\u6027\u63a8\u7406\u3002\u8be5\u6846\u67b6\u5229\u7528\u4e00\u4e2a\u8f83\u5927\u7684LLM\u751f\u6210\u7528\u6237\u504f\u597d\u7684\u5408\u6210\u89e3\u91ca\uff0c\u968f\u540e\u7528\u4e8e\u5fae\u8c03\u8f83\u5c0f\u7684LLM\u4ee5\u63d0\u9ad8\u63a8\u8350\u51c6\u786e\u6027\u53ca\u63d0\u4f9b\u4eba\u7c7b\u53ef\u7406\u89e3\u7684\u89e3\u91ca\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7814\u7a76\u8c03\u67e5\u4e86\u63a8\u7406\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\u5bf9\u4e2a\u6027\u5316\u63a8\u8350\u7684\u5f71\u54cd\uff0c\u7ed3\u679c\u663e\u793a\u4e0a\u4e0b\u6587\u548c\u4e2a\u4eba\u5316\u6570\u636e\u7684\u8d28\u91cf\u663e\u8457\u5f71\u54cdLLM\u751f\u6210\u5408\u7406\u89e3\u91ca\u7684\u80fd\u529b\u3002\u5b9e\u8bc1\u8bc4\u4f30\u8868\u660e\uff0cReasoningRec\u5728\u63a8\u8350\u9884\u6d4b\u65b9\u9762\u6bd4\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u9ad8\u51fa12.5%\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u6613\u4e8e\u7406\u89e3\u7684\u89e3\u91ca\u3002\u4ee3\u7801\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/millenniumbismay/reasoningrec\u3002**|\n", "2410.23166": "|**2024-10-30**|**SciPIP: An LLM-based Scientific Paper Idea Proposer**|Wenxiao Wang 
et.al.|[2410.23166](http://arxiv.org/abs/2410.23166)|null|\u77e5\u8bc6\u7684\u6307\u6570\u589e\u957f\u548c\u8de8\u5b66\u79d1\u7814\u7a76\u7684\u590d\u6742\u6027\u7ed9\u7814\u7a76\u4eba\u5458\u5e26\u6765\u4e86\u663e\u8457\u6311\u6218\uff0c\u5305\u62ec\u4fe1\u606f\u8fc7\u8f7d\u548c\u63a2\u7d22\u65b0\u60f3\u6cd5\u7684\u56f0\u96be\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u5728\u589e\u5f3a\u60f3\u6cd5\u63d0\u6848\u65b9\u9762\u663e\u793a\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f46\u5982\u4f55\u6709\u6548\u5229\u7528\u5927\u6a21\u578b\u8fdb\u884c\u5408\u7406\u7684\u60f3\u6cd5\u63d0\u6848\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u8ba8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u79d1\u5b66\u8bba\u6587\u60f3\u6cd5\u63d0\u6848\u5668\uff08SciPIP\uff09\u3002\u57fa\u4e8e\u7528\u6237\u63d0\u4f9b\u7684\u7814\u7a76\u80cc\u666f\uff0cSciPIP\u4ece\u6587\u732e\u6570\u636e\u5e93\u4e2d\u68c0\u7d22\u6709\u7528\u8bba\u6587\uff0c\u540c\u65f6\u5229\u7528LLMs\u7684\u80fd\u529b\u751f\u6210\u66f4\u591a\u65b0\u9896\u4e14\u53ef\u884c\u7684\u60f3\u6cd5\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6587\u732e\u68c0\u7d22\u6570\u636e\u5e93\uff0c\u63d0\u53d6\u5927\u91cf\u8bba\u6587\u7684\u591a\u7ef4\u5ea6\u4fe1\u606f\u4ee5\u4fbf\u5feb\u901f\u8bbf\u95ee\u3002\u7136\u540e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u8bed\u4e49\u3001\u5b9e\u4f53\u548c\u5f15\u7528\u5171\u73b0\u7684\u6587\u732e\u68c0\u7d22\u65b9\u6cd5\uff0c\u4ece\u591a\u4e2a\u65b9\u9762\u6839\u636e\u7528\u6237\u63d0\u4f9b\u7684\u80cc\u666f\u641c\u7d22\u76f8\u5173\u6587\u732e\u3002\u5728\u6587\u732e\u68c0\u7d22\u4e4b\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cc\u8def\u5f84\u60f3\u6cd5\u63d0\u6848\u7b56\u7565\uff0c\u5176\u4e2d\u4e00\u6761\u8def\u5f84\u4ece\u68c0\u7d22\u5230\u7684\u6587\u732e\u4e2d\u63a8\u65ad\u89e3\u51b3\u65b9\u6848\uff0c\u53e6\u4e00\u6761\u8def\u5f84\u901a\u8fc7\u6a21\u578b\u5934\u8111\u98ce\u66b4\u751f\u6210\u539f\u521b\u60f3\u6cd5\u3002\u7136\u540e\u6211\u4eec\u5
c06\u4e24\u8005\u7ed3\u5408\u8d77\u6765\u4ee5\u5b9e\u73b0\u53ef\u884c\u6027\u4e0e\u539f\u521b\u6027\u7684\u826f\u597d\u5e73\u8861\u3002\u901a\u8fc7\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660eSciPIP\u53ef\u4ee5\u68c0\u7d22\u4e0e\u73b0\u6709\u9876\u7ea7\u4f1a\u8bae\u8bba\u6587\u7c7b\u4f3c\u7684\u5f15\u6587\uff0c\u5e76\u751f\u6210\u8bb8\u591a\u4e0e\u5176\u4e00\u81f4\u7684\u60f3\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u4e86SciPIP\u751f\u6210\u7684\u5176\u4ed6\u60f3\u6cd5\u7684\u539f\u521b\u6027\uff0c\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u4ee3\u7801\u548c\u6570\u636e\u5e93\u5df2\u53d1\u5e03\u5728https://github.com/cheerss/SciPIP\u3002|\n", "2410.23136": "|**2024-10-30**|**Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning**|Keqin Bao et.al.|[2410.23136](http://arxiv.org/abs/2410.23136)|**[link](https://github.com/ym689/rec_icl)**|**\u9891\u7e41\u66f4\u65b0\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u8350\u7cfb\u7edf\u4ee5\u9002\u5e94\u65b0\u7684\u7528\u6237\u5174\u8da3\uff0c\u5c31\u50cf\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\u6240\u505a\u7684\u90a3\u6837\uff0c\u7531\u4e8e\u9ad8\u6602\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u5373\u4f7f\u6709\u52a0\u901f\u65b9\u6cd5\u4e5f\u662f\u4e0d\u5207\u5b9e\u9645\u7684\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u4e0d\u8fdb\u884c\u4efb\u4f55\u6a21\u578b\u66f4\u65b0\u7684\u60c5\u51b5\u4e0b\uff0c\u901a\u8fc7\u5229\u7528\u60c5\u5883\u5b66\u4e60\uff08ICL\uff09\u6765\u9002\u5e94\u52a8\u6001\u7528\u6237\u5174\u8da3\u7684\u65b9\u6cd5\uff0c\u8fd9\u79cd\u65b9\u6cd5\u4f7fLLM\u80fd\u591f\u4ece\u8f93\u5165\u4e2d\u7684\u5c11\u91cf\u793a\u4f8b\u4e2d\u5b66\u4e60\u65b0\u4efb\u52a1\u3002\u4f7f\u7528\u65b0\u7684\u5174\u8da3\u793a\u4f8b\u4f5c\u4e3aICL\u7684\u5c11\u91cf\u793a\u4f8b\uf
f0cLLM\u53ef\u4ee5\u5b9e\u65f6\u5b66\u4e60\u5174\u8da3\uff0c\u4ece\u800c\u907f\u514d\u4e86\u6a21\u578b\u66f4\u65b0\u7684\u9700\u6c42\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u4e8eLLM\u7684\u63a8\u8350\u5668\u5728\u63a8\u8350\u8c03\u4f18\u8fc7\u7a0b\u4e2d\u7ecf\u5e38\u5931\u53bb\u5728\u60c5\u5883\u5b66\u4e60\u7684\u80fd\u529b\uff0c\u800c\u539f\u59cbLLM\u7684\u60c5\u5883\u5b66\u4e60\u7f3a\u4e4f\u9488\u5bf9\u63a8\u8350\u4efb\u52a1\u7684\u5173\u6ce8\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86RecICL\uff0c\u5b83\u5b9a\u5236\u4e86\u9488\u5bf9\u63a8\u8350\u4efb\u52a1\u7684\u60c5\u5883\u5b66\u4e60\uff0c\u7528\u4e8e\u5b9e\u65f6\u63a8\u8350\u3002RecICL\u4ee5\u60c5\u5883\u5b66\u4e60\u683c\u5f0f\u7ec4\u7ec7\u8bad\u7ec3\u793a\u4f8b\uff0c\u786e\u4fdd\u5728\u8c03\u4f18\u8fc7\u7a0b\u4e2d\u4fdd\u7559\u60c5\u5883\u5b66\u4e60\u80fd\u529b\u5e76\u4e0e\u5176\u63a8\u8350\u4efb\u52a1\u5bf9\u9f50\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cRecICL\u5728\u65e0\u9700\u6a21\u578b\u66f4\u65b0\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e86\u5b9e\u65f6\u63a8\u8350\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/ym689/rec_icl\u83b7\u53d6\u3002**|\n", "2410.24198": "|**2024-11-01**|**SelfCodeAlign: Self-Alignment for Code Generation**|Yuxiang Wei 
et.al.|[2410.24198](http://arxiv.org/abs/2410.24198)|**[link](https://github.com/bigcode-project/selfcodealign)**|**\u6307\u4ee4\u5fae\u8c03\u662f\u4e00\u79cd\u76d1\u7763\u5fae\u8c03\u65b9\u6cd5\uff0c\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u6307\u4ee4\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86SelfCodeAlign\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u5b8c\u5168\u900f\u660e\u4e14\u65e0\u9650\u5236\u7684\u7ba1\u9053\uff0c\u7528\u4e8e\u81ea\u6211\u5bf9\u9f50\u4ee3\u7801LLMs\uff0c\u800c\u65e0\u9700\u5927\u91cf\u7684\u4eba\u5de5\u6ce8\u91ca\u6216\u84b8\u998f\u3002SelfCodeAlign\u5728\u6574\u4e2a\u6570\u636e\u751f\u6210\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u76f8\u540c\u7684\u57fa\u6a21\u578b\u8fdb\u884c\u63a8\u7406\u3002\u5b83\u9996\u5148\u4ece\u9ad8\u8d28\u91cf\u7684\u79cd\u5b50\u4ee3\u7801\u7247\u6bb5\u4e2d\u63d0\u53d6\u591a\u6837\u5316\u7684\u7f16\u7801\u6982\u5ff5\u4ee5\u751f\u6210\u65b0\u4efb\u52a1\u3002\u7136\u540e\uff0c\u5b83\u9488\u5bf9\u6bcf\u4e2a\u4efb\u52a1\u91c7\u6837\u591a\u4e2a\u54cd\u5e94\uff0c\u5e76\u4e0e\u6d4b\u8bd5\u7528\u4f8b\u914d\u5bf9\uff0c\u5728\u6c99\u76d2\u73af\u5883\u4e2d\u9a8c\u8bc1\u8fd9\u4e9b\u54cd\u5e94\u3002\u6700\u540e\uff0c\u901a\u8fc7\u9009\u62e9\u901a\u8fc7\u793a\u4f8b\u6765\u8fdb\u884c\u6307\u4ee4\u5fae\u8c03\u3002\u5728\u6211\u4eec\u7684\u4e3b\u8981\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528SelfCodeAlign\u4e0eCodeQwen1.5-7B\u751f\u6210\u4e86\u4e00\u4e2a\u5305\u542b74k\u4e2a\u6307\u4ee4-\u54cd\u5e94\u5bf9\u7684\u6570\u636e\u96c6\u3002\u5728\u8be5\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u6a21\u578b\u5728HumanEval+\u4e0a\u7684pass@1\u8fbe\u5230\u4e8667.1\uff0c\u8d85\u8fc7\u4e86CodeLlama-70B-Instruct\uff0c\u5c3d\u7ba1\u524d\u8005\u6bd4\u540e\u8005\u5c0f\u5341\u500d\u3002\u5728\u6240\u6709\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u59cb\u7ec8\u4f18\u4e8e\u4e4b\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5OctoPack\uff0
c\u8be5\u65b9\u6cd5\u7528\u4e8e\u65e0\u9700\u4eba\u5de5\u6ce8\u91ca\u6216\u84b8\u998f\u7684\u6307\u4ee4\u5fae\u8c03\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86SelfCodeAlign\u5728\u5404\u79cd\u5927\u5c0f\u7684LLMs\u4e0a\u90fd\u662f\u6709\u6548\u7684\uff0c\u4ece3B\u523033B\uff0c\u5e76\u4e14\u57fa\u6a21\u578b\u53ef\u4ee5\u4ece\u4e0e\u81ea\u8eab\u6570\u636e\u5206\u5e03\u7684\u5bf9\u9f50\u4e2d\u83b7\u76ca\u66f4\u591a\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86\u6211\u4eec\u7ba1\u9053\u4e2d\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u6709\u6548\u6027\uff0c\u8868\u660eSelfCodeAlign\u7684\u8868\u73b0\u4f18\u4e8e\u76f4\u63a5\u4eceGPT-4\u84b8\u998f\u7684\u65b9\u6cd5\u4ee5\u53ca\u9886\u5148\u7684\u57fa\u4e8eGPT-3.5\u7684\u84b8\u998f\u65b9\u6cd5\uff0c\u5982OSS-Instruct\u548cEvol-Instruct\u3002SelfCodeAlign\u8fd8\u4fc3\u6210\u4e86StarCoder2-Instruct\u7684\u521b\u5efa\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u5b8c\u5168\u900f\u660e\u3001\u8bb8\u53ef\u5bbd\u677e\u4e14\u81ea\u6211\u5bf9\u9f50\u7684\u4ee3\u7801LLM\uff0c\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u7f16\u7801\u6027\u80fd\u3002**|\n", "2410.24175": "|**2024-10-31**|**Constraint Back-translation Improves Complex Instruction Following of Large Language Models**|Yunjia Qi 
et.al.|[2410.24175](http://arxiv.org/abs/2410.24175)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u9075\u5faa\u5177\u6709\u590d\u6742\u683c\u5f0f\u3001\u957f\u5ea6\u7b49\u7ea6\u675f\u7684\u6307\u4ee4\u65f6\u5b58\u5728\u56f0\u96be\u3002\u4f20\u7edf\u7684\u65b9\u6cd5\u662f\u5728\u590d\u6742\u7684\u6307\u4ee4-\u54cd\u5e94\u5bf9\u4e0a\u8fdb\u884c\u540e\u8bad\u7ec3\uff0c\u8fd9\u4e9b\u6570\u636e\u901a\u8fc7\u5bf9\u590d\u6742\u6307\u4ee4\u8fdb\u884c\u9ad8\u7ea7LLMs\u7684\u8f93\u5165\u751f\u6210\u3002\u7136\u800c\uff0c\u5373\u4f7f\u5148\u8fdb\u7684LLMs\u4e5f\u65e0\u6cd5\u5f88\u597d\u5730\u9075\u5faa\u590d\u6742\u7684\u6307\u4ee4\uff0c\u8fd9\u9650\u5236\u4e86\u751f\u6210\u6570\u636e\u7684\u8d28\u91cf\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u53d1\u73b0\u73b0\u6709\u7684\u6570\u636e\u96c6\u5185\u5728\u5730\u5305\u542b\u4e86\u9690\u542b\u7684\u590d\u6742\u7ea6\u675f\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6570\u636e\u751f\u6210\u6280\u672f\uff0c\u79f0\u4e3a\u7ea6\u675f\u56de\u8bd1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u91c7\u7528\u73b0\u6709\u6570\u636e\u96c6\u4e2d\u9ad8\u8d28\u91cf\u7684\u6307\u4ee4-\u54cd\u5e94\u5bf9\uff0c\u5e76\u4ec5\u4f7f\u7528\u9ad8\u7ea7LLMs\u5c06\u54cd\u5e94\u5df2\u7ecf\u6ee1\u8db3\u7684\u590d\u6742\u7ea6\u675f\u6dfb\u52a0\u5230\u6307\u4ee4\u4e2d\uff0c\u8fd9\u79cd\u65b9\u6cd5\u81ea\u7136\u964d\u4f4e\u4e86\u6210\u672c\u548c\u6570\u636e\u566a\u58f0\u3002\u5728\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528Llama3-70B-Instruct\u8fdb\u884c\u56de\u8bd1\u5e76\u521b\u5efa\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u590d\u6742\u6307\u4ee4-\u54cd\u5e94\u6570\u636e\u96c6\uff0c\u547d\u540d\u4e3aCRAB\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5728CRAB\u4e0a\u8fdb\u884c\u540e\u8bad\u7ec3\u53ef\u4ee5\u63d0\u9ad8\u591a\u79cd\u57fa\u7840LLMs\u7684\u590d\u6742\u6307\u4ee4\u9075\u5faa\u80fd\u529b\uff0c\u5728\u5e7f\u6cdb\u7684\u6307\u4ee4\u9075\u5faa\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u6211\u4eec\u
8fdb\u4e00\u6b65\u53d1\u73b0\uff0c\u7ea6\u675f\u56de\u8bd1\u4e5f\u53ef\u4ee5\u4f5c\u4e3a\u540e\u8bad\u7ec3\u4e2d\u7684\u6709\u7528\u8f85\u52a9\u8bad\u7ec3\u76ee\u6807\u3002\u6211\u4eec\u7684\u4ee3\u7801\u3001\u6570\u636e\u548c\u6a21\u578b\u5c06\u88ab\u516c\u5f00\u53d1\u5e03\uff0c\u4ee5\u4fc3\u8fdb\u672a\u6765\u7684\u7814\u7a76\u3002|\n", "2410.24155": "|**2024-10-31**|**Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning**|Jinghan Zhang et.al.|[2410.24155](http://arxiv.org/abs/2410.24155)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u901a\u5e38\u901a\u8fc7\u6784\u5efa\u601d\u7ef4\u94fe\u6765\u5f15\u5bfc\u6a21\u578b\u8fdb\u884c\u591a\u6b65\u63a8\u7406\u89e3\u51b3\u95ee\u9898\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u4e4b\u524d\u63a2\u7d22\u8fc7\u7684\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\uff0c\u4ece\u800c\u5ffd\u89c6\u4e86LLMs\u8ba4\u77e5\u8303\u56f4\u5185\u7684\u5173\u952e\u76f2\u70b9\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86Thought Space Explorer 
(TSE)\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u6269\u5c55\u548c\u4f18\u5316\u601d\u7ef4\u7ed3\u6784\uff0c\u4ee5\u5f15\u5bfcLLMs\u63a2\u7d22\u5176\u601d\u7ef4\u76f2\u70b9\u3002\u901a\u8fc7\u57fa\u4e8e\u539f\u59cb\u601d\u7ef4\u7ed3\u6784\u751f\u6210\u65b0\u7684\u63a8\u7406\u6b65\u9aa4\u548c\u5206\u652f\uff0c\u5e76\u91c7\u7528\u591a\u79cd\u8bbe\u8ba1\u7b56\u7565\uff0cTSE\u62d3\u5bbd\u4e86\u601d\u7ef4\u7a7a\u95f4\uff0c\u51cf\u8f7b\u4e86\u76f2\u70b9\u5bf9LLM\u63a8\u7406\u7684\u5f71\u54cd\u3002\u5728\u591a\u4e2a\u7ea7\u522b\u7684\u63a8\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86TSE\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5206\u6790\uff0c\u4ee5\u7406\u89e3\u7ed3\u6784\u5316\u548c\u6269\u5c55\u5316\u7684\u601d\u7ef4\u5982\u4f55\u6709\u52a9\u4e8e\u91ca\u653eLLM\u63a8\u7406\u80fd\u529b\u7684\u6f5c\u529b\u3002|\n", "2410.24152": "|**2024-10-31**|**Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning**|Jiaqi Liu 
et.al.|[2410.24152](http://arxiv.org/abs/2410.24152)|null|\u5408\u4f5c\u9a7e\u9a76\u6280\u672f\u5728\u63d0\u5347\u4ea4\u901a\u7cfb\u7edf\u6548\u7387\u548c\u5b89\u5168\u6027\u65b9\u9762\u81f3\u5173\u91cd\u8981\u3002\u57fa\u4e8e\u5b66\u4e60\u7684\u65b9\u6cd5\uff0c\u5982\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\uff0c\u5728\u5408\u4f5c\u51b3\u7b56\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684MARL\u65b9\u6cd5\u4ecd\u7136\u9762\u4e34\u5b66\u4e60\u6548\u7387\u548c\u6027\u80fd\u65b9\u9762\u7684\u6311\u6218\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fc5\u901f\u53d1\u5c55\uff0c\u5728\u5404\u79cd\u5e8f\u5217\u51b3\u7b56\u4efb\u52a1\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u589e\u5f3a\u5408\u4f5c\u4ee3\u7406\u7684\u5b66\u4e60\u80fd\u529b\uff0c\u540c\u65f6\u786e\u4fdd\u51b3\u7b56\u6548\u7387\u548c\u6210\u672c\u6548\u76ca\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLDPD\u7684\u8bed\u8a00\u9a71\u52a8\u7b56\u7565\u84b8\u998f\u65b9\u6cd5\uff0c\u7528\u4e8e\u6307\u5bfcMARL\u63a2\u7d22\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u57fa\u4e8eLLM\u7684\u6559\u5e08\u4ee3\u7406\u8bad\u7ec3\u8f83\u5c0f\u7684\u5b66\u751f\u4ee3\u7406\u901a\u8fc7\u81ea\u8eab\u7684\u51b3\u7b56\u6f14\u793a\u5b9e\u73b0\u5408\u4f5c\u51b3\u7b56\u3002\u6559\u5e08\u4ee3\u7406\u589e\u5f3a\u4e86CAV\u7684\u89c2\u5bdf\u4fe1\u606f\uff0c\u5e76\u5229\u7528LLM\u8fdb\u884c\u590d\u6742\u7684\u5408\u4f5c\u51b3\u7b56\u63a8\u7406\uff0c\u540c\u65f6\u501f\u52a9\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u51b3\u7b56\u5de5\u5177\u5b9e\u73b0\u4e13\u5bb6\u7ea7\u51b3\u7b56\uff0c\u63d0\u4f9b\u9ad8\u8d28\u91cf\u7684\u6559\u5b66\u7ecf\u9a8c\u3002\u5b66\u751f\u4ee3\u7406\u968f\u540e\u901a\u8fc7\u68af\u5ea6\u7b56\u7565\u66f4\u65b0\u5c06\u6559\u5e08\u7684\u5148\u9a8c\u77e5\u8bc6\u63d0\u70bc\u5230\u81ea\u5df1\u7684\u6a21\u578b\u4e2d\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5b66\u751f
\u4ee3\u7406\u53ef\u4ee5\u5728\u5c11\u91cf\u6307\u5bfc\u7684\u60c5\u51b5\u4e0b\u5feb\u901f\u63d0\u9ad8\u5176\u80fd\u529b\uff0c\u5e76\u6700\u7ec8\u8d85\u8d8a\u6559\u5e08\u7684\u8868\u73b0\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6027\u80fd\u548c\u5b66\u4e60\u6548\u7387\u65b9\u9762\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\u3002|\n", "2410.24119": "|**2024-10-31**|**Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing**|Akash Dhruv et.al.|[2410.24119](http://arxiv.org/abs/2410.24119)|**[link](https://github.com/neucol/llm-conversion-performance)**|**\u57fa\u7840\u6a21\u578b\u548c\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff08GenAI\uff09\u7684\u51fa\u73b0\u6709\u671b\u53d8\u9769\u79d1\u5b66\u8ba1\u7b97\u4e2d\u7684\u751f\u4ea7\u529b\uff0c\u7279\u522b\u662f\u5728\u4ee3\u7801\u5f00\u53d1\u3001\u91cd\u6784\u4ee5\u53ca\u4ece\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u8f6c\u6362\u5230\u53e6\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u65b9\u9762\u3002\u7136\u800c\uff0c\u7531\u4e8eGenAI\u7684\u8f93\u51fa\u65e0\u6cd5\u4fdd\u8bc1\u6b63\u786e\u6027\uff0c\u56e0\u6b64\u4ecd\u7136\u9700\u8981\u4eba\u5de5\u5e72\u9884\u3002\u90e8\u5206\u8fd9\u79cd\u5e72\u9884\u53ef\u4ee5\u901a\u8fc7\u4efb\u52a1\u7279\u5b9a\u5de5\u5177\u5b9e\u73b0\uff0c\u5e76\u7ed3\u5408\u6b63\u786e\u7684\u9a8c\u8bc1\u65b9\u6cd5\u548c\u6709\u6548\u7684\u63d0\u793a\u5f00\u53d1\u65b9\u6cd5\u3002\u6211\u4eec\u7814\u7a76\u4e86GenAI\u5728\u8f85\u52a9\u4ee3\u7801\u8f6c\u6362\u3001\u8bed\u8a00\u4e92\u64cd\u4f5c\u6027\u548c\u4ee3\u7801\u5e93\u68c0\u67e5\u65b9\u9762\u7684\u5e94\u7528\uff0c\u8fd9\u4e9b\u5e94\u7528\u5728\u4e00\u4e2a\u7528\u4e8e\u6a21\u62df\u5927\u578b\u5f3a\u5b50\u5bf9\u649e\u673a\uff08LHC\uff09\u4e2d\u7c92\u5b50\u76f8\u4e92\u4f5c\u7528\u7684\u9057\u7559Fortran\u4ee3\u7801\u5e93\u4e2d\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u5728\u6b64\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u6b3e\u540d\u4e3aCodeScribe\u7684\u5de5\u5
177\uff0c\u8be5\u5de5\u5177\u7ed3\u5408\u4e86\u63d0\u793a\u5de5\u7a0b\u4e0e\u7528\u6237\u76d1\u7763\uff0c\u5efa\u7acb\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684\u4ee3\u7801\u8f6c\u6362\u6d41\u7a0b\u3002\u672c\u6587\u5c55\u793a\u4e86CodeScribe\u5982\u4f55\u534f\u52a9\u5c06Fortran\u4ee3\u7801\u8f6c\u6362\u4e3aC++\uff0c\u751f\u6210Fortran-C API\u4ee5\u96c6\u6210\u9057\u7559\u7cfb\u7edf\u4e0e\u73b0\u4ee3C++\u5e93\uff0c\u5e76\u63d0\u4f9b\u5f00\u53d1\u8005\u652f\u6301\u4ee5\u8fdb\u884c\u4ee3\u7801\u7ec4\u7ec7\u548c\u7b97\u6cd5\u5b9e\u73b0\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4e5f\u8ba8\u8bba\u4e86AI\u9a71\u52a8\u7684\u4ee3\u7801\u8f6c\u6362\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5f3a\u8c03\u4e86\u5b83\u5728\u63d0\u5347\u79d1\u5b66\u8ba1\u7b97\u5de5\u4f5c\u6d41\u7a0b\u751f\u4ea7\u529b\u65b9\u9762\u7684\u4f18\u52bf\u3002**|\n", "2410.24117": "|**2024-10-31**|**Repository-Level Compositional Code Translation and Validation**|Ali Reza Ibrahimzada et.al.|[2410.24117](http://arxiv.org/abs/2410.24117)|null|\u4ee3\u7801\u7ffb\u8bd1\u662f\u6307\u5c06\u7a0b\u5e8f\u4ece\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u8f6c\u6362\u6210\u53e6\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u3002\u5df2\u6709\u51e0\u79cd\u57fa\u4e8e\u89c4\u5219\u7684\u8f6c\u8bd1\u5668\u88ab\u8bbe\u8ba1\u51fa\u6765\uff0c\u4ee5\u5b9e\u73b0\u4e0d\u540c\u7f16\u7a0b\u8bed\u8a00\u4e4b\u95f4\u7684\u81ea\u52a8\u5316\u4ee3\u7801\u8f6c\u6362\u3002\u7136\u800c\uff0c\u968f\u7740\u7f16\u7a0b\u8bed\u8a00\u7684\u53d1\u5c55\uff0c\u8fd9\u4e9b\u89c4\u5219\u53ef\u80fd\u4f1a\u8fc7\u65f6\uff0c\u5e76\u4e14\u65e0\u6cd5\u63a8\u5e7f\u5230\u5176\u4ed6\u7f16\u7a0b\u8bed\u8a00\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63a2\u7d22\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u5b9e\u73b0\u4ee3\u7801\u7ffb\u8bd1\u7684\u81ea\u52a8\u5316\u3002\u4e00\u4e2a\u5173\u952e\u89c2\u5bdf\u662f\uff0c\u8fd9\u6837\u7684\u6280\u672f\u53ef\u80fd\u5728\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u826f\u597d\uff0c\u4f46\u5728
\u5b9e\u9645\u9879\u76ee\u4e2d\u7684\u89c4\u6a21\u548c\u590d\u6742\u5ea6\u4e0b\uff0c\u7279\u522b\u662f\u5728\u6d89\u53ca\u4f9d\u8d56\u5173\u7cfb\u3001\u81ea\u5b9a\u4e49\u7c7b\u578b\u3001\u7279\u5b9a\u4e8e\u7f16\u7a0b\u8bed\u8a00\u7684\u529f\u80fd\u7b49\u65b9\u9762\u65f6\uff0c\u5f80\u5f80\u65e0\u6cd5\u5f88\u597d\u5730\u6cdb\u5316\u3002 \u6211\u4eec\u63d0\u51fa\u4e86AlphaTrans\uff0c\u8fd9\u662f\u4e00\u79cd\u795e\u7ecf\u7b26\u53f7\u65b9\u6cd5\uff0c\u7528\u4e8e\u5b9e\u73b0\u5b58\u50a8\u5e93\u7ea7\u522b\u7684\u4ee3\u7801\u7ffb\u8bd1\u3002AlphaTrans\u4e0d\u4ec5\u7ffb\u8bd1\u6e90\u4ee3\u7801\uff0c\u8fd8\u7ffb\u8bd1\u6d4b\u8bd5\u4ee3\u7801\uff0c\u5e76\u91c7\u7528\u591a\u7ea7\u9a8c\u8bc1\u786e\u4fdd\u7ffb\u8bd1\u540e\u7684\u4ee3\u7801\u4fdd\u7559\u539f\u59cb\u7a0b\u5e8f\u7684\u529f\u80fd\u3002\u4e3a\u4e86\u5206\u89e3\u95ee\u9898\u4ee5\u4fbfLLMs\u5904\u7406\uff0cAlphaTrans\u5229\u7528\u7a0b\u5e8f\u5206\u6790\u5c06\u7a0b\u5e8f\u5206\u89e3\u6210\u7247\u6bb5\uff0c\u5e76\u6309\u9006\u8c03\u7528\u987a\u5e8f\u7ffb\u8bd1\u5b83\u4eec\u3002\u6211\u4eec\u4f7f\u7528AlphaTrans\u7ffb\u8bd1\u4e86\u5341\u4e2a\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u5f00\u6e90\u9879\u76ee\uff0c\u8fd9\u4e9b\u9879\u76ee\u5305\u542b\u5c11\u4e8e836\u30018575\u548c2719\u4e2a\u7c7b\u3001\u65b9\u6cd5\u548c\u6d4b\u8bd5\u3002AlphaTrans\u6210\u529f\u7ffb\u8bd1\u4e86\u8fd9\u4e9b\u9879\u76ee\u7684\u6574\u4e2a\u4ee3\u7801\u5e93\uff0c\u5176\u4e2d\u5305\u62ec6899\u4e2a\u6e90\u4ee3\u7801\u7247\u6bb5\u300299.1%\u7684\u7ffb\u8bd1\u4ee3\u7801\u7247\u6bb5\u5728\u8bed\u6cd5\u4e0a\u662f\u6b63\u786e\u7684\uff0cAlphaTrans\u9a8c\u8bc1\u4e86\u5176\u4e2d25.8%\u7684\u7ffb\u8bd1\u5728\u8fd0\u884c\u65f6\u884c\u4e3a\u548c\u529f\u80fd\u6b63\u786e\u6027\u3002\u5e73\u5747\u800c\u8a00\uff0c\u96c6\u6210\u7684\u7ffb\u8bd1\u548c\u9a8c\u8bc1\u8fc7\u7a0b\u9700\u898136\u5c0f\u65f6\u624d\u80fd\u5b8c\u6210\u4e00\u4e2a\u9879\u76ee\u7684\u7ffb\u8bd1\uff0c\u663e\u793a\u51fa\u5176\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u53ef\u6269\u5c55\u6027\u3002\u5bf9
\u4e8e\u90a3\u4e9b\u5728\u8bed\u6cd5\u6216\u8bed\u4e49\u4e0a\u4e0d\u6b63\u786e\u7684\u7ffb\u8bd1\uff0cAlphaTrans\u4f1a\u751f\u6210\u4e00\u4efd\u62a5\u544a\uff0c\u5305\u62ec\u73b0\u6709\u7ffb\u8bd1\u3001\u5806\u6808\u8ddf\u8e2a\u3001\u6d4b\u8bd5\u9519\u8bef\u6216\u65ad\u8a00\u5931\u8d25\u3002\u6211\u4eec\u5c06\u8fd9\u4e9b\u7ed3\u679c\u63d0\u4f9b\u7ed9\u4e24\u4f4d\u5f00\u53d1\u8005\uff0c\u8ba9\u4ed6\u4eec\u4fee\u590d\u56db\u4e2a\u9879\u76ee\u4e2d\u7684\u7ffb\u8bd1\u9519\u8bef\u3002\u4ed6\u4eec\u5e73\u5747\u82b1\u8d3920.1\u5c0f\u65f6\u89e3\u51b3\u4e86\u8fd9\u4e9b\u95ee\u9898\uff0c\u5e76\u5b9e\u73b0\u4e86\u6240\u6709\u6d4b\u8bd5\u901a\u8fc7\u3002|\n", "2410.24105": "|**2024-10-31**|**Matchmaker: Self-Improving Large Language Model Programs for Schema Matching**|Nabeel Seedat et.al.|[2410.24105](http://arxiv.org/abs/2410.24105)|null|schema\u5339\u914d\u2014\u2014\u5373\u5728\u5177\u6709\u4e0d\u540c\u8868\u548c\u5c42\u6b21\u7ed3\u6784\u7684\u5f02\u6784\u6570\u636e\u6e90\u4e4b\u95f4\u627e\u5230\u5c5e\u6027\u4e4b\u95f4\u7684\u5339\u914d\u5173\u7cfb\u2014\u2014\u5bf9\u4e8e\u521b\u5efa\u53ef\u7528\u4e8e\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u7684\u4e92\u64cd\u4f5c\u6027\u6570\u636e\u81f3\u5173\u91cd\u8981\u3002\u89e3\u51b3\u8fd9\u4e00\u57fa\u7840\u6027\u7684\u4ee5\u6570\u636e\u4e3a\u4e2d\u5fc3\u7684\u95ee\u9898\u5177\u6709\u5e7f\u6cdb\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5728\u533b\u7597\u3001\u91d1\u878d\u548c\u7535\u5b50\u5546\u52a1\u7b49\u9886\u57df\uff0c\u4f46\u4e5f\u6709\u53ef\u80fd\u66f4\u666e\u904d\u5730\u901a\u8fc7\u589e\u52a0\u7528\u4e8e\u8bad\u7ec3ML\u6a21\u578b\u7684\u6570\u636e\u91cf\u6765\u4f7fML\u6a21\u578b\u53d7\u76ca\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4e0d\u540c\u6a21\u5f0f\u4e4b\u95f4\u7684\u7ed3\u6784/\u5c42\u6b21\u548c\u8bed\u4e49\u5f02\u8d28\u6027\uff0cschema\u5339\u914d\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684ML\u4efb\u52a1\u3002\u5148\u524d\u7684\u81ea\u52a8\u5316schema\u5339\u914d\u7684ML\u65b9\u6cd5\u8981\u4e48\u9700\u8981\u5927\u9
1cf\u7684\u6807\u6ce8\u6570\u636e\u8fdb\u884c\u6a21\u578b\u8bad\u7ec3\uff0c\u8fd9\u901a\u5e38\u662f\u4e0d\u73b0\u5b9e\u7684\uff0c\u8981\u4e48\u96f6\u6837\u672c\u6027\u80fd\u8f83\u5dee\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Matchmaker\u2014\u2014\u4e00\u79cd\u7528\u4e8eschema\u5339\u914d\u7684\u7ec4\u5408\u5f0f\u8bed\u8a00\u6a21\u578b\u7a0b\u5e8f\uff0c\u8be5\u7a0b\u5e8f\u7531\u5019\u9009\u751f\u6210\u3001\u4f18\u5316\u548c\u7f6e\u4fe1\u5ea6\u8bc4\u5206\u7ec4\u6210\u3002Matchmaker\u8fd8\u901a\u8fc7\u4e00\u79cd\u65b0\u9896\u7684\u4f18\u5316\u65b9\u6cd5\uff0c\u5728\u65e0\u9700\u6807\u6ce8\u793a\u4f8b\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u81ea\u6211\u6539\u8fdb\uff0c\u8be5\u65b9\u6cd5\u6784\u5efa\u5408\u6210\u7684\u4e0a\u4e0b\u6587\u793a\u4f8b\u6765\u6307\u5bfc\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u5728\u771f\u5b9e\u4e16\u754c\u7684\u533b\u5b66schema\u5339\u914d\u57fa\u51c6\u4e0a\uff0cMatchmaker\u4f18\u4e8e\u4ee5\u524d\u7684\u57fa\u4e8eML\u7684\u65b9\u6cd5\uff0c\u7a81\u663e\u4e86\u5176\u52a0\u901f\u6570\u636e\u96c6\u6210\u548cML\u5c31\u7eea\u6570\u636e\u4e92\u64cd\u4f5c\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.24049": "|**2024-10-31**|**Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs**|Muhammed Saeed 
et.al.|[2410.24049](http://arxiv.org/abs/2410.24049)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u5e7f\u6cdb\u4f7f\u7528\uff0c\u4f46\u56e0\u5176\u5185\u90e8\u5d4c\u5165\u7684\u793e\u4f1a\u504f\u89c1\u800c\u5f15\u53d1\u4f26\u7406\u95ee\u9898\u3002\u672c\u7814\u7a76\u5728\u5305\u62ec\u5973\u6027\u6743\u5229\u3001\u6050\u6016\u4e3b\u4e49\u548c\u53cd\u72b9\u592a\u4e3b\u4e49\u5728\u5185\u7684\u516b\u4e2a\u9886\u57df\u8bc4\u4f30\u4e86LLM\u5bf9\u963f\u62c9\u4f2f\u4eba\u4e0e\u897f\u65b9\u4eba\u7684\u504f\u89c1\uff0c\u5e76\u8bc4\u4f30\u4e86\u6a21\u578b\u62b5\u6297\u5ef6\u7eed\u8fd9\u4e9b\u504f\u89c1\u7684\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e24\u4e2a\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30LLM\u5bf9\u963f\u62c9\u4f2f\u4eba\u4e0e\u897f\u65b9\u4eba\u7684\u504f\u89c1\uff0c\u53e6\u4e00\u4e2a\u7528\u4e8e\u6d4b\u8bd5\u6a21\u578b\u5728\u9762\u5bf9\u5938\u5927\u8d1f\u9762\u7279\u5f81\u7684\u63d0\u793a\uff08\u201c\u8d8a\u72f1\u201d\uff09\u65f6\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u516d\u79cdLLM\u6a21\u578b\u2014\u2014GPT-4\u3001GPT-4o\u3001LlaMA 3.1 (8B & 405B)\u3001Mistral 7B\u548cClaude 3.5 Sonnet\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u572879%\u7684\u60c5\u51b5\u4e0b\uff0c\u6a21\u578b\u8868\u73b0\u51fa\u5bf9\u963f\u62c9\u4f2f\u4eba\u7684\u8d1f\u9762\u504f\u89c1\uff0c\u5176\u4e2dLlaMA 3.1-405B\u8868\u73b0\u51fa\u6700\u4e25\u91cd\u7684\u504f\u89c1\u3002\u6211\u4eec\u7684\u201c\u8d8a\u72f1\u201d\u6d4b\u8bd5\u663e\u793a\uff0c\u5c3d\u7ba1GPT-4o\u662f\u7ecf\u8fc7\u4f18\u5316\u7684\u7248\u672c\uff0c\u4f46\u5b83\u6700\u5bb9\u6613\u53d7\u5230\u653b\u51fb\uff0c\u5728\u4e09\u4e2a\u7c7b\u522b\u4e2d\u7684\u653b\u51fb\u6210\u529f\u7387\u6700\u9ad8\uff0c\u5176\u6b21\u662fLlaMA 3.1-8B\u548cMistral 7B\u3002\u9664\u4e86Claude\u4e4b\u5916\uff0c\u6240\u6709LLM\u5728\u4e09\u4e2a\u7c7b\u522b\u4e2d\u7684\u653b\u51fb\u6210\u529f\u7387\u5747\u8d85\u8fc787%\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5c3d\u7ba1Claude 
3.5 Sonnet\u662f\u6700\u5b89\u5168\u7684\u6a21\u578b\uff0c\u4f46\u5b83\u4ecd\u7136\u5728\u516b\u4e2a\u7c7b\u522b\u4e2d\u7684\u4e03\u4e2a\u7c7b\u522b\u4e2d\u8868\u73b0\u51fa\u504f\u89c1\u3002\u5c3d\u7ba1\u662fGPT-4\u7684\u4f18\u5316\u7248\u672c\uff0c\u6211\u4eec\u53d1\u73b0GPT-4o\u66f4\u5bb9\u6613\u51fa\u73b0\u504f\u89c1\u548c\u201c\u8d8a\u72f1\u201d\uff0c\u8fd9\u8868\u660e\u4f18\u5316\u5b58\u5728\u7f3a\u9677\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u9700\u8981\u66f4\u5f3a\u5927\u7684\u504f\u89c1\u7f13\u89e3\u7b56\u7565\u548c\u5f3a\u5316\u7684\u5b89\u5168\u63aa\u65bd\u3002|\n", "2410.24032": "|**2024-10-31**|**Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks**|Yingzhe Peng et.al.|[2410.24032](http://arxiv.org/abs/2410.24032)|null|\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u5df2\u7ecf\u5f7b\u5e95\u6539\u53d8\u4e86\u7528\u6237\u4e0e\u77e5\u8bc6\u7cfb\u7edf\u4e4b\u95f4\u7684\u4ea4\u4e92\u65b9\u5f0f\uff0c\u4f7f\u5f97\u804a\u5929\u673a\u5668\u4eba\u80fd\u591f\u6574\u5408\u5927\u91cf\u7684\u4fe1\u606f\u5e76\u534f\u52a9\u5904\u7406\u590d\u6742\u7684\u63a2\u7d22\u6027\u4efb\u52a1\u3002\u7136\u800c\uff0c\u57fa\u4e8eLLM\u7684\u804a\u5929\u673a\u5668\u4eba\u5728\u63d0\u4f9b\u4e2a\u6027\u5316\u652f\u6301\u65b9\u9762\u5f80\u5f80\u5b58\u5728\u56f0\u96be\uff0c\u7279\u522b\u662f\u5728\u7528\u6237\u4ee5\u6a21\u7cca\u67e5\u8be2\u5f00\u59cb\u6216\u7f3a\u4e4f\u8db3\u591f\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u65f6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u534f\u4f5c\u4e2a\u6027\u5316\u63a2\u7d22\u52a9\u624b\u201d\uff08CARE\uff09\u7684\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u65e8\u5728\u901a\u8fc7\u7ed3\u5408\u591a\u4ee3\u7406LLM\u6846\u67b6\u548c\u7ed3\u6784\u5316\u7684\u7528\u6237\u754c\u9762\u6765\u589e\u5f3a\u4e2a\u6027\u5316\u63a2\u7d22\u4efb\u52a1\u3002CARE\u7684\u754c\u9762\u5305\u62ec\u804a\u5929\u9762\u677f\u3001\u89e3\u51b3\u65b9\u6848\u9762\u677f\u548c\
u9700\u6c42\u9762\u677f\uff0c\u5b9e\u73b0\u4e86\u8fed\u4ee3\u67e5\u8be2\u7ec6\u5316\u548c\u52a8\u6001\u89e3\u51b3\u65b9\u6848\u751f\u6210\u3002\u591a\u4ee3\u7406\u6846\u67b6\u534f\u540c\u5de5\u4f5c\uff0c\u8bc6\u522b\u7528\u6237\u7684\u663e\u6027\u548c\u9690\u6027\u9700\u6c42\uff0c\u4ece\u800c\u63d0\u4f9b\u91cf\u8eab\u5b9a\u5236\u7684\u3001\u53ef\u64cd\u4f5c\u7684\u89e3\u51b3\u65b9\u6848\u3002\u5728\u4e00\u9879\u6d89\u53ca22\u540d\u53c2\u4e0e\u8005\u7684\u88ab\u8bd5\u5185\u7528\u6237\u7814\u7a76\u4e2d\uff0cCARE\u88ab\u4e00\u81f4\u8ba4\u4e3a\u4f18\u4e8e\u57fa\u7ebfLLM\u804a\u5929\u673a\u5668\u4eba\uff0c\u7528\u6237\u79f0\u8d5e\u5176\u80fd\u591f\u51cf\u8f7b\u8ba4\u77e5\u8d1f\u62c5\u3001\u6fc0\u53d1\u521b\u9020\u529b\uff0c\u5e76\u63d0\u4f9b\u66f4\u52a0\u4e2a\u6027\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0cCARE\u6709\u53ef\u80fd\u5c06\u57fa\u4e8eLLM\u7684\u7cfb\u7edf\u4ece\u88ab\u52a8\u7684\u4fe1\u606f\u68c0\u7d22\u8005\u8f6c\u53d8\u4e3a\u4e2a\u6027\u5316\u95ee\u9898\u89e3\u51b3\u548c\u63a2\u7d22\u4e2d\u7684\u79ef\u6781\u5408\u4f5c\u4f19\u4f34\u3002|\n", "2410.24024": "|**2024-10-31**|**AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents**|Yifan Xu 
et.al.|[2410.24024](http://arxiv.org/abs/2410.24024)|null|\u81ea\u4e3b\u4ee3\u7406\u5728\u4e0e\u73b0\u5b9e\u4e16\u754c\u4ea4\u4e92\u65b9\u9762\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7279\u522b\u662f\uff0cAndroid\u4ee3\u7406\u4f5c\u4e3a\u4e00\u79cd\u4ea4\u4e92\u65b9\u6cd5\u88ab\u9891\u7e41\u63d0\u53ca\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u8bad\u7ec3\u548c\u8bc4\u4f30Android\u4ee3\u7406\u7684\u7814\u7a76\u7f3a\u4e4f\u5bf9\u5f00\u6e90\u548c\u95ed\u6e90\u6a21\u578b\u7684\u7cfb\u7edf\u6027\u7814\u7a76\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AndroidLab\u4f5c\u4e3a\u4e00\u5957\u7cfb\u7edf\u7684Android\u4ee3\u7406\u6846\u67b6\u3002\u5b83\u5305\u62ec\u4e00\u4e2a\u5177\u6709\u4e0d\u540c\u6a21\u6001\u3001\u52a8\u4f5c\u7a7a\u95f4\u548c\u53ef\u91cd\u590d\u57fa\u51c6\u7684\u64cd\u4f5c\u73af\u5883\u3002\u5b83\u652f\u6301\u5728\u540c\u4e00\u52a8\u4f5c\u7a7a\u95f4\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u3002AndroidLab\u57fa\u51c6\u5305\u62ec\u9884\u5b9a\u4e49\u7684Android\u865a\u62df\u8bbe\u5907\u548c\u4e5d\u4e2a\u5e94\u7528\u4e0a\u7684138\u4e2a\u4efb\u52a1\u3002\u901a\u8fc7\u4f7f\u7528AndroidLab\u73af\u5883\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2aAndroid\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u5e76\u8bad\u7ec3\u4e86\u516d\u4e2a\u5f00\u6e90LLMs\u548cLMMs\uff0c\u5c06LLMs\u7684\u5e73\u5747\u6210\u529f\u7387\u4ece4.59%\u63d0\u5347\u523021.50%\uff0cLMMs\u7684\u5e73\u5747\u6210\u529f\u7387\u4ece1.93%\u63d0\u5347\u523013.28%\u3002AndroidLab\u5df2\u5f00\u6e90\u5e76\u516c\u5f00\u63d0\u4f9b\uff0c\u7f51\u5740\u4e3a\u3002|\n"}} \ No newline at end of file +{"agent": {"2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma 
et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u5d4c\u5165\u5f0f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7cfb\u7edf\u5728\u7a7a\u95f4\u8ba4\u77e5\u548c\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u96c6\u6210\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u8bba\u6587\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u63ed\u793a\u4e86\u660e\u663e\u7684\u8fdb\u5c55\uff0c\u4f46\u4e5f\u5f3a\u8c03\u4e86\u5f00\u53d1\u65b0\u65b9\u6cd5\u4ee5\u5145\u5206\u5229\u75283D-LLMs\u6f5c\u529b\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63
07\u660e\u9053\u8def\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u7efc\u8ff0\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", 
"2405.09935": "|**2024-05-24**|**DEBATE: Devil's Advocate-Based Assessment and Text Evaluation**|Alex Kim et.al.|[2405.09935](http://arxiv.org/abs/2405.09935)|**[link](https://github.com/gunny97/DEBATE)**|As natural language generation (NLG) models have become widespread, systematically assessing the quality of machine-generated text has become increasingly important. Recent studies introduce reference-free evaluators based on large language models (LLMs) that can handle novel tasks. However, these evaluators typically take a single-agent approach, which we argue limits their performance, because the responses of LLM agents carry biases such as preferences for certain text structures or content. In this work we propose DEBATE, an NLG evaluation framework built on a multi-agent scoring system and augmented with the concept of a Devil's Advocate. Within the framework, one agent is instructed to criticize the other agents' arguments, potentially resolving the bias in the LLM agents' answers. DEBATE substantially outperforms the previous state-of-the-art methods on two NLG meta-evaluation benchmarks, SummEval and TopicalChat. We also show that the extensiveness of the debate among agents and the persona of an agent can influence evaluator performance.|\n", 
"2405.05175": "|**2024-05-08**|**Air Gap: Protecting Privacy-Conscious Conversational Agents**|Eugene Bagdasaryan et.al.|[2405.05175](http://arxiv.org/abs/2405.05175)|null|The growing use of large language models (LLMs) in conversational agents that handle sensitive user data raises serious privacy concerns. These agents are good at understanding and acting on context, but that ability can also be exploited by malicious parties. We introduce a new threat model in which third-party apps manipulate the interaction context to trick LLM agents into revealing private information that is irrelevant to their task. Grounded in the framework of contextual integrity, we develop AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data needed for a specific task. Experiments using Gemini, GPT, and Mistral models as agents show that AirGapAgent is effective against a single-query context hijacking attack. For example, for a Gemini Ultra agent the attack reduces the ability to protect user data from 94% to 45%, while AirGapAgent maintains 97% protection, rendering the same attack ineffective.|\n", 
"2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325](http://arxiv.org/abs/2405.04325)|null|Recent progress in large language models (LLMs) offers a powerful foundation for building natural language agents, but it also raises safety concerns about the models and the autonomous agents built on them. Deception is a particular concern: we mean an AI agent's act of misleading, hiding the truth, or promoting a partly untrue belief through obfuscation and equivocation. Unlike the lying, self-serving decisions, or false information studied in earlier AI safety research, we focus on a specific category of deception, akin to a magician using sleight of hand to make a rabbit appear from a hat, either through a hidden trap door or through an open display that diverts attention. Our novel testbed demonstrates the intrinsic deception capabilities of LLM agents in a goal-driven environment when they generate natural language in an adversarial dialogue system built on the legislative task of 'lobbying' for a bill. Within the goal-driven environment, we develop deception capacity through a reinforcement learning setup, drawing on theories from the philosophy of language and cognitive psychology. We find that the lobbyist agent increases its deceptive capabilities by about 40% over subsequent reinforcement trials of adversarial interaction, and that our deception detection mechanism reaches a detection rate of up to 92%. These results highlight potential issues in agent-human interaction, where agents may manipulate humans toward their preset goals.|\n", 
"2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra et.al.|[2405.04324](http://arxiv.org/abs/2405.04324)|**[link](https://github.com/ibm-granite/granite-code-models)**|**Large language models (LLMs) trained on code are revolutionizing the software development process. Code LLMs are increasingly being integrated into software development environments to improve the productivity of human programmers, and they show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including generating code, fixing bugs, explaining and documenting code, and maintaining repositories. This paper introduces the Granite series of decoder-only code models, designed for code generation tasks and trained on code written in 116 programming languages. The Granite Code model family ranges from 3 billion to 34 billion parameters and suits applications from complex application modernization to on-device, memory-constrained use cases. Evaluation on a comprehensive set of tasks shows that Granite Code models consistently reach leading performance among open-source code LLMs. The family is optimized for enterprise software development workflows and performs well across a range of coding tasks (such as code generation, fixing, and explanation), making it a versatile all-around code model. We release all Granite Code models under the Apache 2.0 license for both research and commercial use.**|\n", 
"2405.04219": "|**2024-05-07**|**Iterative Experience Refinement of Software-Developing Agents**|Chen Qian et.al.|[2405.04219](http://arxiv.org/abs/2405.04219)|null|Autonomous agents powered by large language models (LLMs) show strong potential for high autonomy in scenarios such as software development. However, the current static experience paradigm relies on a fixed collection of past experiences acquired heuristically, which limits the agents' adaptability and efficiency gains. This paper introduces the Iterative Experience Refinement framework, which lets LLM agents adjust and refine experiences dynamically during task execution. We define two fundamental patterns: the successive pattern, which refines based on the nearest experiences within a task batch, and the cumulative pattern, which accumulates experiences across all previous task batches. With an added experience elimination strategy, the method prioritizes high-quality and frequently used experiences, managing the experience space effectively and improving efficiency. Experiments show that while the successive pattern may yield better results, the cumulative pattern is more stable. Moreover, with experience elimination, using just 11.54% of a high-quality experience subset achieves better performance.|\n", 
"2405.03813": "|**2024-05-06**|**Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control**|Yaqub Chaudhary et.al.|[2405.03813](http://arxiv.org/abs/2405.03813)|null|Large language models (LLMs) can reproduce a wide variety of rhetorical styles and generate text expressing a broad spectrum of sentiments. This capacity is spreading rapidly at low cost and brings potential social harm. Rather than considering these models in isolation, this paper looks at the large-scale computational infrastructure behind them and its applications across domains. We begin by discussing how LLMs may affect society by polluting and standardizing the information environment, and note that these capabilities can be used as mechanisms of control. We then turn to several emerging research areas that strengthen LLMs as instruments of power: (1) persuasion through the real-time design of choice architectures in conversational interfaces (for example, 'AI personas'); (2) the use of LLMs as computational models of human behavior (for example, 'silicon subjects'); (3) the use of LLMs to simulate human group behavior (for example, 'silicon societies'); and (4) LLMs combined with reinforcement learning to produce controllable and steerable strategic dialogue models. Taken together, we discuss how these techniques can be used to build LLM-based systems that, through simulation and disguised 'prediction', become powerful instruments of individual, social, and political control over human behavior, intent, and action.|\n", 
"2405.06682": "|**2024-05-05**|**Self-Reflection in LLM Agents: Effects on Problem-Solving Performance**|Matthew Renze et.al.|[2405.06682](http://arxiv.org/abs/2405.06682)|**[link](https://github.com/matthewrenze/self-reflection)**|**In this study, we investigate the effects of self-reflection in large language models (LLMs) on problem-solving performance. We instructed nine popular LLMs to answer a series of multiple-choice questions to establish a performance baseline. For each incorrectly answered question, we instructed eight types of self-reflecting LLM agents to reflect on their mistakes and give themselves guidance for improving their problem solving. Using this guidance, each self-reflecting agent then re-attempted the same questions. The results indicate that LLM agents significantly improve their problem-solving performance through self-reflection ($p < 0.001$). We also compare the individual contribution of each type of self-reflection to performance. All code and data are available at https://github.com/matthewrenze/self-reflection on GitHub.**|\n", 
"2405.02858": "|**2024-05-05**|**Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation**|Jinyu Cai et.al.|[2405.02858](http://arxiv.org/abs/2405.02858)|**[link](https://github.com/BlueLinkX/GA-MAS)**|**Social media platforms such as Twitter, Reddit, and Sina Weibo play a crucial role in global communication but are often subject to strict regulation in geopolitically sensitive regions. This has led users to adapt their communication ingeniously in regulated social media environments, frequently resorting to coded language. This shift in language patterns is not only a strategy for countering regulation but also a vivid example of language evolution, showing how language changes naturally under social and technological pressure. Studying the evolution of language in regulated social media contexts matters for safeguarding freedom of speech, optimizing content moderation, and advancing linguistic research. This paper proposes a multi-agent simulation framework based on large language models (LLMs) to explore the evolution of user language under strict regulation. The framework uses LLM-driven agents: a supervisory agent that oversees the dialogue, and participant agents that develop their language strategies through interaction, simulating how communication styles evolve in an environment aimed at evading social media regulation. Evaluated on a range of scenarios from abstract to real-world, the results show that LLMs can effectively simulate nuanced language dynamics and interactions in constrained settings, with both evasion of supervision and information accuracy improving as evolution progresses. The study also finds that LLM agents adopt different strategies for different scenarios.**|\n", 
"2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang et.al.|[2405.01533](http://arxiv.org/abs/2405.01533)|**[link](https://github.com/nvlabs/omnidrive)**|**Advances in multimodal large language models (MLLMs) have led to growing interest in autonomous driving systems built on these models, in the hope of using their strong reasoning capabilities. However, applying the strengths of MLLMs to the planning part of the driving task is challenging, because planning requires a full understanding of the 3D environment rather than 2D reasoning alone. To address this, our work proposes a framework for strong alignment between the model and 3D driving tasks. We first design a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D before feeding them into a language model. This query-based representation lets us jointly encode dynamic objects and static map elements (such as roads), providing a condensed 3D world model for aligning perception and action. We also create OmniDrive-nuScenes, a new visual question-answering (VQA) dataset that tests a model's true situational awareness in complex 3D scenes through comprehensive VQA tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making, and planning. Extensive experiments demonstrate the effectiveness of the proposed architecture and the importance of VQA tasks for reasoning and planning in complex 3D environments.**|\n", 
"2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972](http://arxiv.org/abs/2405.00972)|**[link](https://github.com/pnnl/cactus)**|**This paper introduces CACTUS, an LLM-based agent that integrates cheminformatics tools to improve advanced reasoning and problem solving in chemistry and molecular discovery. The authors evaluate CACTUS with a variety of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. The results show that CACTUS significantly outperforms the baseline models, with Gemma-7b and Mistral-7b performing best regardless of the prompting strategy used. The paper also explores the effect of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and noting that deploying smaller models on consumer-grade hardware may not significantly sacrifice accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers with tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Presented as a significant advance for cheminformatics, CACTUS offers chemists and molecular discovery researchers an adaptable tool that could accelerate scientific progress and the discovery of novel, effective, and safe therapeutics, catalysts, and materials. In addition, its integration with automated experimentation platforms and its ability to make real-time, data-driven decisions open up new possibilities for autonomous discovery.**|\n", 
"2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr et.al.|[2404.18978](http://arxiv.org/abs/2404.18978)|null|With growing interest in learner models for educational environments, research attention is shifting to how combining reinforcement learning (RL) with large language models (LLMs) can improve generalization in open-ended text-based learning environments. This paper investigates three types of agents: (1) RL-based agents that use natural language for state and action representations to find the best interaction strategy; (2) LLM-based agents that draw on the model's general knowledge and reasoning through prompting; and (3) hybrid LLM-assisted RL agents that aim to improve performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a new benchmark derived from the PharmaSim virtual pharmacy environment and focused on practicing diagnostic conversations. The results show that RL-based agents excel at task completion but fall short in the quality of the questions they ask, whereas LLM-based agents are better at asking questions but weaker at completing the task. Finally, hybrid LLM-assisted RL agents show potential to overcome these limitations, confirming the promise of combining RL and LLMs to build high-performing agents for open-ended learning environments.|\n", 
"2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021](http://arxiv.org/abs/2404.18021)|null|The rise of genome engineering technology has made precise modification of genetic information possible, but building an efficient gene-editing system requires a deep understanding of CRISPR technology and its complex experimental context. Large language models (LLMs) have shown promise across many tasks but often lack the specific knowledge needed for biological design problems. This paper introduces CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and improve the design of CRISPR-based gene-editing experiments. CRISPR-GPT uses the reasoning ability of LLMs to help select CRISPR systems, design guide RNAs, recommend cellular delivery methods, draft protocols, and design validation experiments to confirm editing outcomes. We show how CRISPR-GPT can help non-expert researchers carry out gene-editing experiments from scratch, and we validate its effectiveness in a real-world use case. We also discuss the ethical and regulatory issues raised by automated gene-editing design, emphasizing the need for responsible and transparent use of such tools. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques, and to demonstrate the potential of LLM agents in facilitating complex biological discovery tasks.|\n", 
"2404.17833": "|**2024-04-27**|**Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs**|Zhenlan Ji et.al.|[2404.17833](http://arxiv.org/abs/2404.17833)|null|Agents based on large language models (LLMs) have proven useful in a wide range of commercial applications, notably mental well-being support, chemical synthesis, and software development, yet they are prone to errors when handling complex tasks and long-term planning. This paper introduces PDoctor, a novel automated approach for detecting and understanding erroneous planning in LLM agents. PDoctor first defines a domain-specific language (DSL) for user queries and uses the Z3 constraint solver to synthesize varying inputs, which are natural language paragraphs specifying the requirements for completing a series of tasks. PDoctor then derives constraints from these requirements to form a testing oracle. We evaluate PDoctor with three mainstream agent frameworks and two powerful LLMs (GPT-3.5 and GPT-4). The results show that PDoctor can effectively detect diverse errors in agent planning and provides insights and error characteristics that are valuable to both developers and users. We conclude by discussing possible alternative designs and directions for extending PDoctor.|\n", 
"2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662](http://arxiv.org/abs/2404.17662)|**[link](https://github.com/alickzhu/player)**|**Recent advances in large language models (LLMs) have improved agent communication and social interaction. Even so, building LLM-based agents that reason in dynamic environments involving competition and collaboration remains challenging, largely because of the limitations of informed graph-based search methods. We propose PLAYER*, a novel framework based on an anytime sampling-based planner that uses sensors and pruners to enable a purely question-driven search framework for complex reasoning tasks. We also introduce a quantifiable evaluation method based on multiple-choice questions and construct the WellPlay dataset, which contains 1,482 question-answer pairs. Experiments show that PLAYER* is more efficient and performs better than existing methods in complex, dynamic environments, with quantifiable results.**|\n", 
"2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525](http://arxiv.org/abs/2404.17525)|null|Conventional mechanical design relies on experts refining designs through experience-guided modification and finite element analysis (FEA) to meet specific requirements, a process that is time-consuming and heavily dependent on individual knowledge. Many machine learning models have been developed to streamline this intensive, expert-driven iterative process, but they typically demand extensive training data and computational resources. Deep learning methods are also often restricted to the domains and tasks they were trained on, which limits their applicability across tasks. This creates a trade-off between the efficiency of automation and its resource demands. This study presents a novel approach that integrates pre-trained large language models (LLMs) with an FEA module. The FEA module evaluates each design and provides essential feedback, guiding the LLMs to continuously learn, plan, generate, and optimize designs without domain-specific training. We demonstrate the framework's effectiveness on the iterative optimization of truss structures, showing that it can adjust designs according to structured feedback and criteria. The results show that LLM-based agents generate truss designs that satisfy natural language specifications with a success rate of up to 90%, depending on the applied constraints. Using prompt-based optimization techniques, we show that LLM-based agents, when given solution-score pairs, iteratively optimize designs to meet specifications by drawing on their inherent reasoning capabilities. The ability of LLM agents to produce viable designs and optimize them through their own reasoning highlights their potential to develop and implement effective design strategies autonomously.|\n", 
"2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker 
et.al.|[2404.17460](http://arxiv.org/abs/2404.17460)|null|\u672c\u6587\u8ba8\u8bba\u5e76\u8bc4\u4f30\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u5bf9\u8bdd\u5f0f\u8f85\u5bfc\u7cfb\u7edf\uff08Conversational Tutoring Systems\uff0cCTS\uff09\uff0c\u8be5\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u3002\u9996\u5148\uff0c\u7cfb\u7edf\u901a\u8fc7\u81ea\u52a8\u4ece\u8bfe\u7a0b\u6587\u672c\u4e2d\u751f\u6210\u6613\u4e8e\u7f16\u8f91\u7684\u6559\u5b66\u811a\u672c\uff0c\u5b9e\u73b0AI\u8f85\u52a9\u7684\u5185\u5bb9\u521b\u4f5c\u3002\u5176\u6b21\uff0c\u7cfb\u7edf\u901a\u8fc7\u4e24\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff08Ruffle\u548cRiley\uff09\u4ee5\u5b66\u4e60\u6559\u5b66\u6a21\u5f0f\u8fd0\u884c\uff0c\u5206\u522b\u626e\u6f14\u5b66\u751f\u548c\u6559\u6388\u89d2\u8272\uff0c\u8fdb\u884c\u81ea\u7531\u5f62\u5f0f\u7684\u5bf9\u8bdd\uff0c\u9075\u5faa\u5178\u578b\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\u7684\u5185\u73af\u548c\u5916\u73af\u7ed3\u6784\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5728\u7ebf\u7528\u6237\u7814\u7a76\uff08N=200\uff09\u4e2d\u5bf9\u6bd4\u4e86\u8be5\u7cfb\u7edf\u4e0e\u7b80\u5355\u7684\u95ee\u7b54\u804a\u5929\u673a\u5668\u4eba\u548c\u9605\u8bfb\u6d3b\u52a8\u5728\u652f\u6301\u751f\u7269\u5b66\u8bfe\u7a0b\u7684\u6548\u679c\u3002\u7814\u7a76\u5206\u6790\u4e86\u7cfb\u7edf\u4f7f\u7528\u6a21\u5f0f\u3001\u9884\u540e\u6d4b\u8bd5\u6210\u7ee9\u4ee5\u53ca\u7528\u6237\u4f53\u9a8c\u8c03\u67e5\uff0c\u7ed3\u679c\u663e\u793a\u7528\u6237\u5bf9Ruffle&Riley\u7684\u53c2\u4e0e\u5ea6\u9ad8\uff0c\u7406\u89e3\u529b\u5f3a\uff0c\u5e76\u8ba4\u4e3a\u63d0\u4f9b\u7684\u652f\u6301\u6709\u5e2e\u52a9\u3002\u5c3d\u7ba1Ruffle&Riley\u7528\u6237\u7684\u5b8c\u6210\u65f6\u95f4\u8f83\u957f\uff0c\u4f46\u5728\u77ed\u671f\u5b66\u4e60\u6210\u6548\u4e0a\u5e76\u672a\u53d1\u73b0\u663e\u8457\u5dee\u5f02\uff0c\u4f18\u4e8e\u9605\u8bfb\u6d3b\u52a8\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u67b6\u6784\u548c\u7528\u6237\u7814\u7a76\u
4e3a\u672a\u6765CTS\u8bbe\u8ba1\u8005\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u6e90\u6211\u4eec\u7684\u7cfb\u7edf\uff0c\u4ee5\u4fc3\u8fdb\u57fa\u4e8eLLM\u7684\u5b66\u4e60\u6280\u672f\u6709\u6548\u6559\u5b66\u8bbe\u8ba1\u7684\u7814\u7a76\u3002|\n", "2404.17153": "|**2024-04-26**|**A Unified Debugging Approach via LLM-Based Multi-Agent Synergy**|Cheryl Lee et.al.|[2404.17153](http://arxiv.org/abs/2404.17153)|**[link](https://github.com/acceptepapier/unidebugger)**|\u5728\u8f6f\u4ef6\u8c03\u8bd5\u8fd9\u4e2a\u8017\u65f6\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4eba\u4eec\u4e00\u76f4\u5728\u52aa\u529b\u5b9e\u73b0\u81ea\u52a8\u5316\uff0c\u5305\u62ec\u6545\u969c\u5b9a\u4f4d\u548c\u4fee\u590d\u751f\u6210\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u5316\u8c03\u8bd5\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u4f20\u7edf\u548c\u57fa\u4e8eLLM\u7684\u8c03\u8bd5\u5de5\u5177\u9762\u4e34\u4e09\u5927\u6311\u6218\uff1a1\uff09\u4e0a\u6e38\u7684\u6545\u969c\u5b9a\u4f4d\u4e0d\u51c6\u786e\u4f1a\u6ce2\u53ca\u4e0b\u6e38\u7684\u4fee\u590d\uff1b2\uff09\u5904\u7406\u590d\u6742\u903b\u8f91\u9519\u8bef\u7684\u80fd\u529b\u4e0d\u8db3\uff1b3\uff09\u5ffd\u89c6\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u81ea\u52a8\u5316\u7684\u3001\u7edf\u4e00\u7684\u8c03\u8bd5\u6846\u67b6\u2014\u2014FixAgent\uff0c\u901a\u8fc7LLM\u4ee3\u7406\u534f\u540c\u3002FixAgent\u80fd\u6267\u884c\u7aef\u5230\u7aef\u7684\u6545\u969c\u5b9a\u4f4d\u3001\u4fee\u590d\u548c\u5206\u6790\u3002 
\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\uff0cLLMs\u80fd\u591f\u4ece\u4eba\u7c7b\u5f00\u53d1\u8005\u8ba4\u53ef\u7684\u901a\u7528\u8f6f\u4ef6\u5de5\u7a0b\u539f\u5219\u4e2d\u83b7\u76ca\uff0c\u6bd4\u5982\u201c\u6a61\u76ae\u9e2d\u8c03\u8bd5\u201d\uff0c\u8fd9\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7a0b\u5e8f\u529f\u80fd\u548c\u903b\u8f91\u9519\u8bef\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u4e2a\u7075\u611f\u6765\u6e90\u4e8e\u201c\u6a61\u76ae\u9e2d\u201d\u7684\u89e3\u51b3\u65b9\u6848\uff1a\u4ee3\u7406\u4e13\u4e1a\u5316\u4e0e\u534f\u540c\u3001\u5173\u952e\u53d8\u91cf\u8ddf\u8e2a\u548c\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u7406\u89e3\uff0c\u4fc3\u4f7fLLMs\u63d0\u4f9b\u660e\u786e\u7684\u89e3\u91ca\uff0c\u5e76\u805a\u7126\u4e8e\u5173\u952e\u7684\u7a0b\u5e8f\u903b\u8f91\u4fe1\u606f\u3002\u5728\u5e7f\u6cdb\u4f7f\u7528\u7684QuixBugs\u6570\u636e\u96c6\u4e0a\uff0cFixAgent\u6210\u529f\u4fee\u590d\u4e8680\u4e2abug\u4e2d\u768479\u4e2a\uff0c\u5176\u4e2d9\u4e2a\u662f\u4e4b\u524d\u672a\u89e3\u51b3\u7684\u3002\u5b83\u8fd8\u5728CodeFlaws\u4e0a\u5408\u7406\u5730\u4fee\u590d\u4e861.9\u500d\u4e8e\u6700\u4f73\u4fee\u590d\u5de5\u5177\u7684\u7f3a\u9677\uff0c\u800c\u4e14\u65e0\u9700\u4f4d\u7f6e\u4fe1\u606f\uff0c\u91c7\u6837\u7387\u4f4e\u4e8e0.6%\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4e0e\u4f7f\u7528\u4e0d\u540cLLM\u7684\u57fa\u7ebf\u6a21\u578b\u76f8\u6bd4\uff0cFixAgent\u63d0\u9ad8\u4e86\u7ea620%\u7684\u5408\u7406\u4fee\u590d\u548c\u6b63\u786e\u4fee\u590d\u7387\uff0c\u663e\u793a\u51fa\u6211\u4eec\u8bbe\u8ba1\u7684\u6709\u6548\u6027\u3002 \u6b64\u5916\uff0cFixAgent\u7684\u6b63\u786e\u7387\u9ad8\u8fbe97.26%\uff0c\u8868\u660e\u5b83\u6709\u53ef\u80fd\u514b\u670d\u73b0\u6709\u65b9\u6cd5\u7684\u8fc7\u62df\u5408\u95ee\u9898\u3002\u603b\u7ed3\u6765\u8bf4\uff0cFixAgent\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u81ea\u52a8\u5316\u8c03\u8bd5\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u8f6f\u4ef6\u8c03\u8bd5\u7684\u6548\u7387\u548c\u51c6\u786e\u6027\u3002|\n", "2404.16698": 
"|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698](http://arxiv.org/abs/2404.16698)|**[link](https://github.com/giorgiopiatti/govsim)**|\u5728\u5feb\u901f\u53d1\u5c55\u7684\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51b3\u7b56\u5b89\u5168\u662f\u4e00\u9879\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cGovernance of the Commons Simulation\u201d\uff08GovSim\uff09\u7684\u6a21\u62df\u5e73\u53f0\uff0c\u65e8\u5728\u7814\u7a76LLMs\u4e2d\u7684\u6218\u7565\u4e92\u52a8\u548c\u5408\u4f5c\u51b3\u7b56\u3002\u901a\u8fc7\u8fd9\u4e2a\u73af\u5883\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u4e4b\u95f4\u8d44\u6e90\u5206\u4eab\u7684\u52a8\u6001\uff0c\u5f3a\u8c03\u4e86\u4f26\u7406\u8003\u91cf\u3001\u6218\u7565\u89c4\u5212\u548c\u8c08\u5224\u6280\u5de7\u7684\u91cd\u8981\u6027\u3002GovSim\u5177\u6709\u7075\u6d3b\u6027\uff0c\u652f\u6301\u6587\u672c\u578b\u4ee3\u7406\uff0c\u5305\u62ecLLMs\u3002\u5229\u7528\u751f\u6210\u5f0f\u4ee3\u7406\u6846\u67b6\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u901a\u7528\u4ee3\u7406\uff0c\u4fbf\u4e8e\u6574\u5408\u4e0d\u540c\u7684LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u5728GovSim\u4e2d\uff0c\u53ea\u670915\u4e2a\u6d4b\u8bd5\u6a21\u578b\u4e2d\u76842\u4e2a\u80fd\u591f\u5b9e\u73b0\u53ef\u6301\u7eed\u7ed3\u679c\uff0c\u8fd9\u8868\u660e\u6a21\u578b\u5728\u7ba1\u7406\u5171\u4eab\u8d44\u6e90\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u5982\u679c\u79fb\u9664\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u80fd\u529b\uff0c\u5b83\u4eec\u4f1a\u8fc7\u5ea6\u4f7f\u7528\u5171\u4eab\u8d44\u6e90\uff0c\u7a81\u51fa\u4e86\u5408\u4f5c\u4e2d\u6c9f\u901a\u7684\u5173\u952e\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u5927\u591a\u6570LLMs\u7f3a\u4e4f\u666e\u904d\u5316\u7684\u5047\u8bbe
\u80fd\u529b\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u63a8\u7406\u6280\u80fd\u7684\u4e00\u4e2a\u91cd\u8981\u5f31\u70b9\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u6240\u6709\u7814\u7a76\u7ed3\u679c\uff0c\u5305\u62ec\u6a21\u62df\u73af\u5883\u3001\u4ee3\u7406\u63d0\u793a\u4ee5\u53ca\u5168\u9762\u7684\u7f51\u7edc\u754c\u9762\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u8ba8\u8bba\u3002|\n", "2404.17605": "|**2024-04-24**|**Autonomous LLM-driven research from data to human-verifiable research papers**|Tal Ifargan et.al.|[2404.17605](http://arxiv.org/abs/2404.17605)|**[link](https://github.com/technion-kishony-lab/data-to-paper)**|**\u968f\u7740\u4eba\u5de5\u667a\u80fd\u63a8\u52a8\u79d1\u5b66\u53d1\u73b0\u7684\u6b65\u4f10\u52a0\u5feb\uff0c\u4eba\u4eec\u8fd8\u4e0d\u6e05\u695a\u5b8c\u5168\u7531AI\u9a71\u52a8\u7684\u7814\u7a76\u662f\u5426\u53ef\u884c\uff0c\u4ee5\u53ca\u5b83\u80fd\u5426\u9075\u5faa\u5173\u952e\u7684\u79d1\u5b66\u4ef7\u503c\u89c2\uff0c\u5982\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002\u4e3a\u4e86\u6a21\u62df\u4eba\u7c7b\u7684\u79d1\u5b66\u7814\u7a76\u5b9e\u8df5\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u201c\u6570\u636e\u5230\u8bba\u6587\u201d\uff08data-to-paper\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u5e73\u53f0\uff0c\u5f15\u5bfc\u76f8\u4e92\u534f\u4f5c\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\u901a\u8fc7\u5b8c\u6574\u7684\u5206\u6b65\u9aa4\u7814\u7a76\u6d41\u7a0b\uff0c\u540c\u65f6\u7a0b\u5e8f\u5316\u8ffd\u8e2a\u4fe1\u606f\u6d41\uff0c\u5e76\u5141\u8bb8\u4eba\u7c7b\u76d1\u7763\u548c\u4e92\u52a8\u3002\u5728\u81ea\u52a8\u6a21\u5f0f\u4e0b\uff0c\u4ec5\u63d0\u4f9b\u6807\u6ce8\u6570\u636e\uff0c\u8be5\u5e73\u53f0\u5c31\u80fd\u63d0\u51fa\u5047\u8bbe\uff0c\u8bbe\u8ba1\u7814\u7a76\u8ba1\u5212\uff0c\u7f16\u5199\u548c\u8c03\u8bd5\u5206\u6790\u4ee3\u7801\uff0c\u751f\u6210\u548c\u89e3\u8bfb\u7ed3\u679c\uff0c\u751a\u81f3\u521b\u5efa\u5b8c\u6574\u4e14\u4fe1\u606f\u53ef\u8ffd\u6eaf\u7684\u79d1\u7814\u8bba\u6587\u3002\u5c3d\u7ba1
\u7814\u7a76\u65b0\u9896\u6027\u6709\u9650\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u5c55\u793a\u4e86AI\u81ea\u4e3b\u4ece\u6570\u636e\u4e2d\u751f\u6210\u539f\u521b\u5b9a\u91cf\u6d1e\u5bdf\u7684\u80fd\u529b\u3002\u5bf9\u4e8e\u7b80\u5355\u7684\u7814\u7a76\u76ee\u6807\uff0c\u5168\u81ea\u52a8\u6d41\u7a0b\u80fd\u521b\u4f5c\u51fa\u5927\u7ea680-90%\u65e0\u9700\u91cd\u5927\u9519\u8bef\u7684\u7a3f\u4ef6\uff0c\u7136\u800c\u968f\u7740\u76ee\u6807\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u4eba\u7c7b\u7684\u5171\u540c\u53c2\u4e0e\u5bf9\u4e8e\u4fdd\u8bc1\u51c6\u786e\u6027\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u751f\u6210\u7684\u8bba\u6587\u672c\u8eab\u4e5f\u5177\u6709\u5185\u5728\u7684\u53ef\u9a8c\u8bc1\u6027\uff0c\u56e0\u4e3a\u4fe1\u606f\u8ffd\u8e2a\u4f7f\u5f97\u7ed3\u679c\u3001\u65b9\u6cd5\u548c\u6570\u636e\u7684\u94fe\u63a5\u53ef\u4ee5\u7a0b\u5e8f\u5316\u8fdb\u884c\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0cAI\u9a71\u52a8\u7684\u79d1\u7814\u53ef\u4ee5\u52a0\u901f\u79d1\u5b66\u53d1\u73b0\uff0c\u540c\u65f6\u589e\u5f3a\u800c\u975e\u5a01\u80c1\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002**|\n", "2404.16115": "|**2024-04-24**|**Online Personalizing White-box LLMs Generation with Neural Bandits**|Zekai Chen 
et.al.|[2404.16115](http://arxiv.org/abs/2404.16115)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u59cb\u751f\u6210\u4e2a\u6027\u5316\u7684\u6587\u672c\u5185\u5bb9\uff0c\u5982\u4f55\u5728\u4e0d\u4e3a\u6bcf\u4f4d\u7528\u6237\u521b\u5efa\u72ec\u7279\u6a21\u578b\u7684\u8d44\u6e90\u6d88\u8017\u4e0b\u5b9e\u73b0\u9ad8\u6548\u4e2a\u6027\u5316\u6210\u4e86\u65b0\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5728\u7ebf\u65b9\u6cd5\uff0c\u5229\u7528\u795e\u7ecf_bandit\u7b97\u6cd5\u52a8\u6001\u4f18\u5316\u8f6f\u6307\u4ee4\u5d4c\u5165\uff0c\u6839\u636e\u7528\u6237\u53cd\u9988\u8c03\u6574\u5185\u5bb9\uff0c\u4ece\u800c\u63d0\u5347\u767d\u76d2LLMs\u5f00\u653e\u6027\u6587\u672c\u751f\u6210\u7684\u4e2a\u6027\u5316\u6c34\u5e73\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u7684\u4e25\u8c28\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u57fa\u7840\u7b56\u7565\u6709\u663e\u8457\u6027\u80fd\u63d0\u5347\u3002\u7279\u522b\u662f\u9488\u5bf9\u4e2a\u6027\u5316\u65b0\u95fb\u6807\u9898\u751f\u6210\uff0cNeuralTS\u5e26\u6765\u4e86\u9ad8\u8fbe62.9%\u7684\u6700\u4f73ROUGE\u5206\u6570\u63d0\u5347\u4ee5\u53ca2.76%\u7684LLM\u4ee3\u7406\u8bc4\u4f30\u5206\u6570\u589e\u957f\uff0c\u8fd9\u8868\u660e\u5176\u6548\u679c\u663e\u8457\u3002|\n", "2404.15974": "|**2024-04-24**|**A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples**|Lihang Pan et.al.|[2404.15974](http://arxiv.org/abs/2404.15974)|null|
\u5355\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fde\u63a5\u591a\u4e2aLLM\u4ee3\u7406\u6784\u5efa\u7684\u7f51\u7edc\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u6574\u4f53\u6027\u80fd\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4eba\u673a\u534f\u4f5c\u5de5\u5177\u2014\u2014EasyLAN\uff0c\u65e8\u5728\u5e2e\u52a9\u5f00\u53d1\u8005\u8f7b\u677e\u6784\u5efaLLM\u4ee3\u7406\u7f51\u7edc\uff08LAN\uff09\u3002EasyLAN\u9996\u5148\u6839\u636e\u4efb\u52a1\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u4ec5\u5305\u542b\u4e00\u4e2a\u4ee3\u7406\u7684\u521d\u59cb\u7f51\u7edc\u3002\u63a5\u7740\uff0c\u5b83\u5229\u7528\u5c11\u91cf\u8bad\u7ec3\u793a\u4f8b\u6765\u8c03\u6574\u7f51\u7edc\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u793a\u4f8b\uff0cEasyLAN\u5206\u6790\u8f93\u51fa\u4e0e\u771f\u5b9e\u7ed3\u679c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u5e76\u627e\u51fa\u9519\u8bef\u7684\u539f\u56e0\u3002EasyLAN\u4f1a\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u7b56\u7565\u6765\u4fee\u6b63\u8fd9\u4e9b\u95ee\u9898\u3002\u7528\u6237\u53ef\u4ee5\u4ecb\u5165EasyLAN\u7684\u5de5\u4f5c\u6d41\u7a0b\u6216\u76f4\u63a5\u4fee\u6539LAN\u3002\u6700\u7ec8\uff0cLAN\u4ece\u5355\u4e2a\u4ee3\u7406\u53d1\u5c55\u6210\u591a\u4ee3\u7406\u7684\u7f51\u7edc\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEasyLAN\u80fd\u591f\u5e2e\u52a9\u5f00\u53d1\u8005\u5feb\u901f\u6784\u5efa\u6027\u80fd\u826f\u597d\u7684LAN\u3002|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao 
et.al.|[2404.15269](http://arxiv.org/abs/2404.15269)|**[link](https://github.com/gao-g/prelude)**|**\u6211\u4eec\u7814\u7a76\u57fa\u4e8e\u7528\u6237\u5bf9\u8bed\u8a00\u6a21\u578b\u7f16\u8f91\u7684\u4e92\u52a8\u5b66\u4e60\u8bed\u8a00\u4ee3\u7406\u3002\u5728\u8bf8\u5982\u5199\u4f5c\u52a9\u624b\u7684\u5e38\u89c1\u573a\u666f\u4e2d\uff0c\u7528\u6237\u4e0e\u8bed\u8a00\u4ee3\u7406\u4ea4\u4e92\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u751f\u6210\u54cd\u5e94\uff0c\u5e76\u53ef\u80fd\u9009\u62e9\u6027\u5730\u7f16\u8f91\u4ee3\u7406\u7684\u54cd\u5e94\u4ee5\u53cd\u6620\u4ed6\u4eec\u7684\u6f5c\u5728\u504f\u597d\uff0c\u540c\u65f6\u63d0\u9ad8\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u7f16\u8f91\u53cd\u9988\u662f\u81ea\u7136\u4ea7\u751f\u7684\uff0c\u9002\u5408\u7528\u4e8e\u63d0\u5347\u4ee3\u7406\u4e0e\u7528\u6237\u504f\u597d\u7684\u5951\u5408\u5ea6\uff0c\u964d\u4f4e\u540e\u7eed\u7528\u6237\u7684\u7f16\u8f91\u6210\u672c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPRELUDE\u6846\u67b6\uff0c\u5b83\u6839\u636e\u5386\u53f2\u7f16\u8f91\u6570\u636e\u63a8\u65ad\u7528\u6237\u7684\u6f5c\u5728\u504f\u597d\uff0c\u5e76\u636e\u6b64\u8bbe\u8ba1\u4e00\u4e2a\u63d0\u793a\u7b56\u7565\uff0c\u5f15\u5bfc\u672a\u6765\u7684\u54cd\u5e94\u751f\u6210\uff0c\u907f\u514d\u4e86\u6602\u8d35\u4e14\u96be\u4ee5\u6269\u5c55\u7684\u5fae\u8c03\u8fc7\u7a0b\uff0c\u8fd8\u80fd\u4fdd\u6301\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002 
\u6b64\u5916\uff0c\u5b66\u4e60\u63cf\u8ff0\u6027\u7684\u504f\u597d\u6709\u52a9\u4e8e\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\uff0c\u7528\u6237\u53ef\u4ee5\u67e5\u770b\u548c\u8c03\u6574\u5b66\u4e60\u5230\u7684\u504f\u597d\u3002\u7136\u800c\uff0c\u7528\u6237\u504f\u597d\u53ef\u80fd\u590d\u6742\u591a\u53d8\uff0c\u53d7\u60c5\u5883\u5f71\u54cd\uff0c\u56e0\u6b64\u5b66\u4e60\u8d77\u6765\u5177\u6709\u6311\u6218\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCIPHER\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6839\u636e\u7528\u6237\u7f16\u8f91\u63a8\u65ad\u7ed9\u5b9a\u60c5\u5883\u4e0b\u7684\u7528\u6237\u504f\u597d\u3002\u672a\u6765\uff0cCIPHER\u4f1a\u4ece\u5386\u53f2\u4e2d\u7684k\u4e2a\u6700\u63a5\u8fd1\u7684\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u63a8\u65ad\u51fa\u7684\u504f\u597d\uff0c\u7efc\u5408\u751f\u6210\u54cd\u5e94\u3002\u6211\u4eec\u5728\u603b\u7ed3\u548c\u7535\u5b50\u90ae\u4ef6\u5199\u4f5c\u4e24\u4e2a\u4e92\u52a8\u73af\u5883\u4e2d\u4f7f\u7528GPT-4\u6a21\u62df\u7528\u6237\u8fdb\u884c\u8bc4\u4f30\uff0c\u4e0e\u76f4\u63a5\u4f7f\u7528\u7528\u6237\u7f16\u8f91\u4f46\u4e0d\u5b66\u4e60\u63cf\u8ff0\u6027\u504f\u597d\u7684\u7b97\u6cd5\uff0c\u4ee5\u53ca\u5b66\u4e60\u5168\u5c40\u65e0\u4e0a\u4e0b\u6587\u504f\u597d\u7684\u7b97\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002 \u5728\u4e24\u9879\u4efb\u52a1\u4e2d\uff0cCIPHER\u90fd\u5b9e\u73b0\u4e86\u6700\u4f4e\u7684\u7f16\u8f91\u8ddd\u79bb\u6210\u672c\uff0c\u5e76\u4e14\u5b66\u4e60\u5230\u7684\u504f\u597d\u4e0e\u771f\u5b9e\u504f\u597d\u663e\u793a\u51fa\u663e\u8457\u7684\u76f8\u4f3c\u6027\u3002**|\n", "2404.14387": "|**2024-04-22**|**A Survey on Self-Evolution of Large Language Models**|Zhengwei Tao et.al.|[2404.14387](http://arxiv.org/abs/2404.14387)|**[link](https://github.com/alibabaresearch/damo-convai)**|**
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u9886\u57df\u548c\u667a\u80fd\u4ee3\u7406\u5e94\u7528\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u4f9d\u8d56\u4eba\u7c7b\u6216\u5916\u90e8\u6a21\u578b\u76d1\u7763\u7684\u73b0\u6709LLMs\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u591a\u6837\u6027\u589e\u52a0\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6210\u672c\u9ad8\u6602\u548c\u6027\u80fd\u74f6\u9888\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u5e94\u8fd0\u800c\u751f\uff0c\u8fd9\u79cd\u7b56\u7565\u5141\u8bb8LLMs\u81ea\u4e3b\u83b7\u53d6\u3001\u7cbe\u70bc\u5e76\u4ece\u81ea\u8eab\u751f\u6210\u7684\u7ecf\u9a8c\u4e2d\u5b66\u4e60\uff0c\u501f\u9274\u4eba\u7c7b\u7ecf\u9a8c\u5b66\u4e60\u8fc7\u7a0b\uff0c\u6709\u671b\u63a8\u52a8LLMs\u5411\u8d85\u7ea7\u667a\u80fd\u53d1\u5c55\u3002\u672c\u6587\u5168\u9762\u7efc\u8ff0\u4e86LLMs\u4e2d\u7684\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u3002\u9996\u5148\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u6982\u5ff5\u6846\u67b6\uff0c\u5c06\u8fdb\u5316\u8fc7\u7a0b\u5212\u5206\u4e3a\u8fed\u4ee3\u5faa\u73af\u7684\u56db\u4e2a\u9636\u6bb5\uff1a\u7ecf\u9a8c\u83b7\u53d6\u3001\u7ecf\u9a8c\u7ec6\u5316\u3001\u66f4\u65b0\u548c\u8bc4\u4f30\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5206\u7c7b\u63a2\u8ba8LLMs\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u8fdb\u5316\u76ee\u6807\uff0c\u5e76\u5bf9\u76f8\u5173\u6587\u732e\u8fdb\u884c\u603b\u7ed3\uff0c\u63d0\u4f9b\u6bcf\u4e2a\u6a21\u5757\u7684\u5206\u7c7b\u548c\u89c1\u89e3\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5f53\u524d\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u4e3a\u52a0\u901f\u81ea\u6f14\u8fdbLLMs\u7684\u53d1\u5c55\u63d0\u4f9b\u5173\u952e\u6d1e\u89c1\u3002**|\n", "2404.13501": "|**2024-04-21**|**A Survey on the Memory Mechanism of Large Language Model based Agents**|Zeyu Zhang 
et.al.|[2404.13501](http://arxiv.org/abs/2404.13501)|**[link](https://github.com/nuster1128/llm_agent_memory_survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u79d1\u7814\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u57fa\u4e8eLLMs\u7684\u667a\u80fd\u4ee3\u7406\u56e0\u5176\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5bf9\u4e8e\u89e3\u51b3\u9700\u8981\u957f\u671f\u590d\u6742\u4ea4\u4e92\u7684\u73b0\u5b9e\u95ee\u9898\u81f3\u5173\u91cd\u8981\u3002\u652f\u6301agent-environment\u4ea4\u4e92\u7684\u5173\u952e\u8981\u7d20\u662f\u4ee3\u7406\u7684\u8bb0\u5fc6\u673a\u5236\u3002\u5c3d\u7ba1\u5df2\u6709\u4f17\u591a\u6709\u524d\u666f\u7684\u8bb0\u5fc6\u8bbe\u8ba1\u88ab\u63d0\u51fa\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u5206\u6563\u5728\u591a\u7bc7\u8bba\u6587\u4e2d\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7efc\u8ff0\u6765\u7cfb\u7edf\u6027\u5730\u603b\u7ed3\u548c\u6bd4\u8f83\uff0c\u672a\u80fd\u63d0\u70bc\u51fa\u901a\u7528\u4e14\u6709\u6548\u7684\u8bbe\u8ba1\u6a21\u5f0f\u4ee5\u542f\u53d1\u540e\u7eed\u7814\u7a76\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4efd\u5173\u4e8eLLM\u57fa\u4ee3\u7406\u8bb0\u5fc6\u673a\u5236\u7684\u5168\u9762\u8c03\u67e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8\u8bb0\u5fc6\u5728LLM\u4ee3\u7406\u4e2d\u7684\u201c\u662f\u4ec0\u4e48\u201d\u4ee5\u53ca\u201c\u4e3a\u4ec0\u4e48\u9700\u8981\u201d\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u56de\u987e\u4e86\u5173\u4e8e\u8bb0\u5fc6\u6a21\u5757\u7684\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b9\u6cd5\u7684\u7814\u7a76\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4f1a\u5c55\u793a\u8bb0\u5fc6\u6a21\u5757\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u626e\u6f14\u7684\u91cd\u8981\u89d2\u8272\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f1a\u5206\u6790\u73b0\u6709\u5de5\u4f5c\u7684\u5c40\u9650\uff0c\u5e76\u6307\u51fa\u91cd\u8981\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002
\u4e3a\u4e86\u8ddf\u8e2a\u8be5\u9886\u57df\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2aGitHub\u4ed3\u5e93\uff1ahttps://github.com/nuster1128/LLM_Agent_Memory_Survey \u3002**|\n", "2404.11964": "|**2024-04-18**|**From Language Models to Practical Self-Improving Computer Agents**|Alex Sheng et.al.|[2404.11964](http://arxiv.org/abs/2404.11964)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u76f4\u63a5\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u521b\u5efa\u80fd\u591f\u6267\u884c\u5404\u79cd\u8ba1\u7b97\u673a\u4efb\u52a1\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u5e76\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u6765\u53d1\u5c55\u5de5\u5177\u548c\u589e\u5f3a\u529f\u80fd\uff0c\u4ee5\u89e3\u51b3\u65e5\u76ca\u590d\u6742\u7684\u4efb\u52a1\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u663e\u793a\u51fa\u4ece\u975e\u53c2\u6570\u589e\u5f3a\u4e2d\u83b7\u76ca\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u5927\u91cf\u96c6\u4e2d\u5728\u5f00\u53d1\u8f6f\u4ef6\uff0c\u4ee5\u8d4b\u4e88LLMs\u5404\u79cd\u80fd\u529b\u3002\u6211\u4eec\u5efa\u8bae\uff0c\u901a\u8fc7\u9002\u5f53\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u4e00\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u7cfb\u7edf\u5730\u751f\u6210\u8f6f\u4ef6\u6765\u589e\u5f3a\u81ea\u8eab\uff0c\u800c\u4e0d\u662f\u4f9d\u8d56\u4eba\u7c7b\u5de5\u7a0b\u7684\u9759\u6001\u8f6f\u4ef6\u5f00\u53d1\u3002 
\u6211\u4eec\u901a\u8fc7\u4e00\u4e9b\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u8fd9\u4e00\u70b9\uff1a\u4ec5\u901a\u8fc7\u7ec8\u7aef\u8bbf\u95ee\uff0c\u6211\u4eec\u5f15\u5bfcLLM\u4ee3\u7406\u6dfb\u52a0\u4e86\u68c0\u7d22\u3001\u4e92\u8054\u7f51\u641c\u7d22\u3001\u7f51\u9875\u5bfc\u822a\u548c\u6587\u672c\u7f16\u8f91\u529f\u80fd\u3002\u8be5\u4ee3\u7406\u6709\u6548\u5730\u5229\u7528\u8fd9\u4e9b\u5de5\u5177\u89e3\u51b3\u4e86\u95ee\u9898\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u548c\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\u3002\u8fd9\u79cd\u65b9\u6cd5\u8868\u660e\uff0c\u901a\u8fc7\u8fde\u7eed\u63d0\u95ee\u548c\u5de7\u5999\u7684\u63d0\u793a\u8bbe\u8ba1\uff0cLLM\u80fd\u591f\u81ea\u4e3b\u6269\u5c55\u5176\u529f\u80fd\uff0c\u6267\u884c\u5b9e\u9645\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002|\n", "2404.11794": "|**2024-04-25**|**Automated Social Science: Language Models as Scientist and Subjects**|Benjamin S. Manning et.al.|[2404.11794](http://arxiv.org/abs/2404.11794)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u6784\u5efa\u548c\u6d4b\u8bd5\u793e\u4f1a\u79d1\u5b66\u5047\u8bbe\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u5173\u952e\u5728\u4e8e\u4f7f\u7528\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u3002\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9648\u8ff0\u5047\u8bbe\u7684\u8bed\u8a00\u3001\u6784\u5efaLLM\u57fa\u7840\u4ee3\u7406\u7684\u84dd\u56fe\u3001\u5b9e\u9a8c\u8bbe\u8ba1\u4ee5\u53ca\u6570\u636e\u5206\u6790\u8ba1\u5212\u3002\u62df\u5408\u540e\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u53ef\u4f9b\u9884\u6d4b\u6216\u89c4\u5212\u540e\u7eed\u5b9e\u9a8c\u3002\u6211\u4eec\u901a\u8fc7\u51e0\u4e2a\u573a\u666f\u8fdb\u884c\u4e86\u6f14\u793a\uff1a\u8c08\u5224\u3001\u4fdd\u91ca\u542c\u8bc1\u4f1a\u3001\u6c42\u804c\u9762\u8bd5\u548c\u62cd\u5356\u3002\u5728\u8fd9\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7cfb\u7edf\u65e2\u63d0\u51fa\u4e86\
u56e0\u679c\u5173\u7cfb\uff0c\u4e5f\u8fdb\u884c\u4e86\u68c0\u9a8c\uff0c\u53d1\u73b0\u4e86\u4e00\u4e9b\u8bc1\u636e\uff0c\u800c\u6709\u4e9b\u5219\u6ca1\u6709\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u4ece\u8fd9\u4e9b\u793e\u4f1a\u4e92\u52a8\u6a21\u62df\u4e2d\u83b7\u53d6\u7684\u6d1e\u5bdf\u5e76\u975e\u4ec5\u901a\u8fc7\u76f4\u63a5\u8be2\u95eeLLM\u5c31\u80fd\u83b7\u5f97\u3002\u5f53\u7ed9\u5b9a\u6bcf\u4e2a\u573a\u666f\u7684\u5efa\u8bae\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u65f6\uff0cLLM\u5728\u9884\u6d4b\u4f30\u8ba1\u6548\u5e94\u7684\u7b26\u53f7\u65b9\u9762\u8868\u73b0\u826f\u597d\uff0c\u4f46\u65e0\u6cd5\u53ef\u9760\u5730\u9884\u6d4b\u6548\u5e94\u7684\u5927\u5c0f\u3002\u5728\u62cd\u5356\u5b9e\u9a8c\u4e2d\uff0c\u6a21\u62df\u7ed3\u679c\u4e0e\u62cd\u5356\u7406\u8bba\u7684\u9884\u6d4b\u7d27\u5bc6\u543b\u5408\uff0c\u4f46LLM\u76f4\u63a5\u63d0\u53d6\u7684\u6e05\u7b97\u4ef7\u683c\u9884\u6d4b\u4e0d\u51c6\u786e\u3002\u7136\u800c\uff0c\u5982\u679c\u6a21\u578b\u80fd\u57fa\u4e8e\u62df\u5408\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u8fdb\u884c\u6761\u4ef6\u5316\uff0cLLM\u7684\u9884\u6d4b\u4f1a\u5927\u5e45\u6539\u8fdb\u3002\u7b80\u800c\u8a00\u4e4b\uff0cLLM\u77e5\u9053\u7684\u6bd4\u5b83\u80fd\u7acb\u5373\u8868\u8fbe\u7684\u8981\u591a\u3002|\n", "2404.11483": "|**2024-04-17**|**AgentKit: Flow Engineering with Graphs, not Coding**|Yue Wu 
et.al.|[2404.11483](http://arxiv.org/abs/2404.11483)|**[link](https://github.com/holmeswww/agentkit)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u76f4\u89c2\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u793a\u6846\u67b6\uff08AgentKit\uff09\uff0c\u65e8\u5728\u4e3a\u591a\u529f\u80fd\u4ee3\u7406\u63d0\u4f9b\u7edf\u4e00\u7684\u65b9\u6cd5\u3002AgentKit\u901a\u8fc7\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6784\u5efa\u590d\u6742\u7684\u201c\u601d\u7ef4\u8fc7\u7a0b\u201d\u3002\u5176\u57fa\u672c\u5355\u5143\u662f\u8282\u70b9\uff0c\u5305\u542b\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u3002\u7528\u6237\u53ef\u4ee5\u50cf\u62fc\u63a5\u4e50\u9ad8\u79ef\u6728\u4e00\u6837\u8fde\u63a5\u8fd9\u4e9b\u8282\u70b9\uff0c\u4ece\u800c\u660e\u786e\u8bbe\u8ba1\u51fa\u81ea\u7136\u7ed3\u6784\u5316\u7684\u201c\u601d\u8003\u6d41\u7a0b\u201d\u3002\u4f8b\u5982\uff0c\u5728\u64b0\u5199\u8bba\u6587\u65f6\uff0c\u53ef\u80fd\u7684\u6b65\u9aa4\u5305\u62ec\uff1a1\uff09\u786e\u5b9a\u6838\u5fc3\u4fe1\u606f\uff0c2\uff09\u8bc6\u522b\u7814\u7a76\u7a7a\u767d\u7b49\u3002AgentKit\u7684\u6a21\u5757\u5316\u7279\u6027\u4f7f\u5f97\u9ad8\u7ea7\u529f\u80fd\u5982\u5373\u5174\u7684\u5c42\u6b21\u5316\u89c4\u5212\u3001\u53cd\u601d\u548c\u4ece\u4e92\u52a8\u4e2d\u5b66\u4e60\u53d8\u5f97\u53ef\u80fd\u3002\u7531\u4e8e\u5176\u76f4\u89c2\u4e14\u6a21\u62df\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u7684\u8bbe\u8ba1\uff0c\u5373\u4f7f\u6ca1\u6709\u7f16\u7a0b\u7ecf\u9a8c\u7684\u4eba\u4e5f\u80fd\u521b\u5efa\u548c\u8c03\u6574\u57fa\u7840\u4ee3\u7406\u3002\u5b9a\u91cf\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528AgentKit\u8bbe\u8ba1\u7684\u4ee3\u7406\u5728WebShop\u548cCrafter\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u8fd9\u4e9b\u6210\u679c\u8868\u660eAgentKit\u6709\u6f5c\u529b\u4f7fLLM\u4ee3\u7406\u5728\u66f4\u5e7f\u6cdb\u7684\u573a\u666f\u4e0b\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728GitHub\uff1ahttps://gith
ub.com/holmeswww/AgentKit\u3002**|\n", "2404.09982": "|**2024-04-15**|**Memory Sharing for Large Language Model based Agents**|Hang Gao et.al.|[2404.09982](http://arxiv.org/abs/2404.09982)|**[link](https://github.com/ghupppp/memorysharingllm)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6267\u884c\u4efb\u52a1\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u91cd\u5927\u7a81\u7834\uff0c\u5b83\u51cf\u5c11\u4e86\u5bf9\u56fa\u5b9a\u7b54\u6848\u4efb\u52a1\uff08\u5982\u5e38\u8bc6\u95ee\u9898\u548c\u662f\u975e\u67e5\u8be2\uff09\u7684\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03\u9700\u6c42\u3002\u7136\u800c\uff0c\u5728\u5904\u7406\u5f00\u653e\u6027\u6311\u6218\u5982\u8bd7\u6b4c\u521b\u4f5c\u65f6\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5c40\u9650\uff0c\u4e3b\u8981\u6e90\u4e8e\u63d0\u4f9b\u7684\u793a\u4f8b\u5168\u9762\u6027\u4ee5\u53ca\u6a21\u578b\u7406\u89e3\u95ee\u9898\u5185\u5bb9\u7684\u80fd\u529b\u4e0d\u8db3\uff0c\u5bfc\u81f4\u8f93\u51fa\u5f80\u5f80\u4e0e\u9884\u671f\u7ed3\u679c\u5927\u76f8\u5f84\u5ead\u3002\u9488\u5bf9\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86Memory-Sharing\uff08MS\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9LLM\u591a\u4ee3\u7406\u7684\u5b9e\u65f6\u8bb0\u5fc6\u5b58\u50a8\u548c\u68c0\u7d22\u7cfb\u7edf\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u8fc7\u7a0b\u3002\u6bcf\u4e2a\u201c\u8bb0\u5fc6\u201d\u5355\u5143\u8bb0\u5f55\u4e86\u63d0\u51fa\u7684\u67e5\u8be2\u53ca\u5176\u6765\u81eaLLM\u4ee3\u7406\u7684\u5373\u65f6\u54cd\u5e94\uff0c\u4ece\u591a\u4e2a\u7c7b\u4f3c\u4ee3\u7406\u4e2d\u805a\u5408\u8fd9\u4e9b\u8bb0\u5fc6\uff0c\u5f62\u6210\u6240\u6709\u4ee3\u7406\u5171\u4eab\u7684\u4e30\u5bcc\u8bb0\u5fc6\u6c60\u3002MS\u6846\u67b6\u4e0d\u4ec5\u5e2e\u52a9\u4ee3\u7406\u627e\u5230\u7279\u5b9a\u4efb\u52a1\u7684\u76f8\u5173\u793a\u4f8b\uff0c\u8fd8\u8
bc4\u4f30\u5176\u8bb0\u5fc6\u7684\u6f5c\u5728\u5229\u7528\u4ef7\u503c\uff0c\u4f9b\u5176\u4ed6\u4ee3\u7406\u672a\u6765\u5e94\u7528\u3002\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u7684\u5b9e\u8bc1\u9a8c\u8bc1\u663e\u793a\uff0cMS\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u4ee3\u7406\u5904\u7406\u5f00\u653e\u6027\u95ee\u9898\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u54ea\u79cd\u8bb0\u5fc6\u6c60\u548c\u68c0\u7d22\u7b56\u7565\u80fd\u66f4\u597d\u5730\u652f\u6301\u4ee3\u7406\uff0c\u4e3aMS\u7684\u672a\u6765\u53d1\u5c55\u63d0\u4f9b\u4e86\u65b9\u5411\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\uff1ahttps://github.com/GHupppp/MemorySharingLLM \u83b7\u53d6\u3002**|\n", 
"2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127](http://arxiv.org/abs/2404.09127)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|**Uncertainty estimation remains a challenge for current large language models (LLMs), which are generally poorly calibrated and overconfident, especially after reinforcement learning from human feedback (RLHF). Human decisions and confidence stem not only from intrinsic beliefs but are also adjusted through everyday observation, whereas existing LLM calibration methods focus on the confidence estimate of a single model and do not fully exploit 'collective wisdom': interaction among multiple LLMs that can jointly improve accuracy and calibration. In this work we propose Collaborative Calibration, a training-free post-hoc calibration strategy that uses the collaborative and expressive capabilities of multiple tool-augmented LLM agents in a simulated group deliberation process to improve both calibration and the rationalization of their reasoning. We demonstrate the effectiveness of Collaborative Calibration on generative question-answering tasks across several domains, showing its potential for consolidating collectively calibrated confidence assessments and improving the reliability of model predictions.**|\n", 
"2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077](http://arxiv.org/abs/2404.09077)|**[link](https://github.com/zukangy/kgp-curiousllm)**|**In question answering (QA), combining large language models (LLMs) with external databases has been notably successful. These methods, however, often fall short on complex reasoning tasks. We therefore refine a novel approach called Knowledge Graph Prompting (KGP), which combines knowledge graphs with an LLM-based agent to improve reasoning and search accuracy. The original KGP framework requires costly fine-tuning on large datasets and still suffers from erroneous LLM inferences. We propose a reasoning-infused LLM agent that mimics human curiosity, asking follow-up questions to navigate the search process more effectively. This simple modification significantly improves LLM performance on QA tasks while avoiding the high cost and latency of the initial KGP framework. Our goal is to develop this approach further, ultimately leading to more accurate, faster, and more cost-effective QA solutions.**|\n", 
"2404.09043": "|**2024-04-13**|**Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation**|Jia Gu et.al.|[2404.09043](http://arxiv.org/abs/2404.09043)|null|With the rapid development of large language models (LLMs) and their strong performance on complex language tasks, a growing number of studies use LLMs to simulate human sequential decision-making, which is usually represented as a Markov decision process (MDP). In this framework, actions follow specific probability distributions and require iterative sampling. This leads us to examine how well LLM agents understand probability distributions, so that probabilistic sampling can guide behavioral decisions and generate behavior sequences. We divide the problem into two parts: simulation when the exact probability distribution is known, and sequence generation when the distribution is ambiguous. In the known-distribution case, the agent must give the type and parameters of the distribution from the problem description and then produce a sampling sequence. Our analysis shows that LLM agents perform poorly here, although programming tools improve the sampling success rate to some extent. In real-world settings the probability distribution is often unclear, so in the second part we ask agents to adjust their activity level in an online social network and analyze the frequency of their actions. The results show that, even with programming tools, LLM agents still cannot sample probability distributions effectively. Caution is therefore needed before directly applying LLM agents as agents that simulate human behavior.|\n", 
"2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492](http://arxiv.org/abs/2404.08492)|null|As large language models (LLMs) are applied more widely, their potential for understanding game behavior within game-theoretic frameworks is becoming apparent. This study uses simulations to analyze the strategic interactions of different types of LLM-based agents in the classical Beauty Contest game. Drawing on human experiments, the strategic levels of the LLM agents are evaluated in a comparable way: they show varying depths of reasoning between level-0 and level-1, and their actions converge in repeated games. The study also explores how the composition of agent types in a group affects strategic behavior: a higher proportion of fixed-strategy opponents promotes convergence of the LLM agents, while a mixed environment in which agents of different relative strategic levels coexist accelerates convergence for all agents. More intelligent agents may obtain higher average payoffs, but at the expense of less intelligent agents. These results not only reveal simulated agents' outcomes under specific scenarios but also offer important insight into strategic interactions between algorithms.|\n", 
"2404.08144": "|**2024-04-17**|**LLM Agents can Autonomously Exploit One-day Vulnerabilities**|Richard Fang 
et.al.|[2404.08144](http://arxiv.org/abs/2404.08144)|null|As large language models (LLMs) become more powerful, they are increasingly used for both benign and malicious purposes, and researchers have begun to examine their ability to exploit cybersecurity vulnerabilities. Recent work has explored whether LLM agents can autonomously hack websites, but those studies were limited to simple vulnerabilities. This work shows that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. We collect a dataset of 15 one-day vulnerabilities, including ones categorized as critical severity in their CVE descriptions. Given the CVE description, GPT-4 successfully exploits 87% of these vulnerabilities, compared with 0% for every other model we test (GPT-3.5 and open-source LLMs) and for the open-source vulnerability scanners ZAP and Metasploit. Without the description, however, our GPT-4 agent is far less effective and exploits only 7% of the vulnerabilities. These findings raise questions about the widespread deployment of highly capable LLM agents.|\n", 
"2404.17586": "|**2024-04-11**|**The Future of Scientific Publishing: Automated Article Generation**|Jeremy R. 
Harper et.al.|[2404.17586](http://arxiv.org/abs/2404.17586)|null|This study introduces a novel software tool that uses large language model (LLM) prompts to generate academic articles automatically from Python code, an advance of significance for biomedical informatics and computer science. Python was chosen as the base example because of its widespread use and strong data-analysis capabilities. The flexibility of the method and framework makes it applicable to a variety of GitHub repositories, indicating the tool's broad potential (Harper, 2024). By streamlining the traditionally time-consuming academic writing process, particularly the synthesis of complex datasets and code outputs, the approach supports faster dissemination of research results. Development did not rely on advanced language model agents, while still ensuring that the automatically generated content is coherent and complete. The exploration validates the successful application and efficiency of the software and anticipates future integration of more advanced LLMs, which would further extend its capabilities and move toward an era in which scientific findings are published more quickly and are easier to access.|\n", 
"2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang 
et.al.|[2404.07456](http://arxiv.org/abs/2404.07456)|null|Large language models (LLMs) have recently shown strong potential as intelligent agents. Existing research, however, focuses mainly on improving a model's reasoning or decision-making through carefully designed prompt engineering or task-specific fine-tuning, and overlooks the process of exploration and exploitation. These methods are limited when handling complex tasks in open-world interactive environments. First, lacking global information about the environment, the model tends to make greedy decisions that lead to suboptimal solutions. Second, irrelevant information acquired from the environment both introduces noise and incurs additional cost. To address this, the paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), which aims to improve LLM agents on open-world interactive tasks. Specifically, WESE decouples exploration from exploitation and uses a cost-effective 'weak' agent to carry out exploration and acquire global knowledge. A knowledge-graph-based strategy then stores this knowledge and extracts the task-relevant information, improving the 'strong' agent in both success rate and efficiency. The approach applies to a variety of tasks and achieves significant gains in success rate and efficiency on four interactive benchmarks.|\n", 
"2404.06921": "|**2024-04-10**|**GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications**|Shishir G. 
Patil et.al.|[2404.06921](http://arxiv.org/abs/2404.06921)|**[link](https://github.com/ShishirPatil/gorilla)**|**As large language models (LLMs) evolve, they are no longer only information providers in dialogue systems but are beginning to interact actively with real applications and services. Today, humans must verify the correctness and appropriateness of LLM-generated outputs (such as code, functions, or actions) before they are executed in the real world. This is a challenge, because understanding code is widely recognized as very difficult. This paper studies how humans can effectively collaborate with, delegate to, and supervise LLMs in the future. We argue that in many cases 'post-facto validation' of a proposed action (confirming its correctness after seeing the output) is much easier than the 'pre-facto validation' described above. The core idea for enabling this is to integrate an intuitive undo feature and to establish damage confinement for LLM-generated actions as effective strategies for mitigating the associated risks. With these, a human can either revert the effects of an LLM-generated output or be confident that the potential risk is bounded. We believe this is critical for letting LLMs interact with applications and services under limited human supervision. We describe the design and implementation of our open-source runtime for executing LLM actions, the Gorilla Execution Engine (GoEX), and present open research questions aimed at enabling LLMs and applications to interact with minimal human intervention. The source code of GoEX (https://github.com/ShishirPatil/gorilla/) has been released.**|\n", 
"2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411](http://arxiv.org/abs/2404.06411)|**[link](https://github.com/nec-research/agentquest)**|**Advances in large language models (LLMs) have driven the pursuit of LLM agents that can solve complex, multi-step reasoning tasks. Existing benchmarks, however, are often narrow and report only overall task success. To address these issues we propose AgentQuest, a framework in which (i) benchmarks and evaluation metrics are modular and easy to extend through well-documented, easy-to-use APIs, and (ii) two new evaluation metrics reliably track an LLM agent's progress while it solves a task. We illustrate the usefulness of the metrics with two use cases in which identifying common failure points and refining the agent architecture yields a significant performance increase. We hope to extend AgentQuest together with the research community and have open-sourced it at https://github.com/nec-research/agentquest (link in the Code column).**|\n", 
"2404.05427": "|**2024-04-15**|**AutoCodeRover: Autonomous Program Improvement**|Yuntong Zhang et.al.|[2404.05427](http://arxiv.org/abs/2404.05427)|**[link](https://github.com/nus-apr/auto-code-rover)**|**Over the past few decades, researchers have made significant progress in automating the software development process, and the use of large language models (LLMs) in particular has greatly advanced automated programming assistance. Software engineering, however, is more than coding: it also includes program improvement processes such as maintenance (for example, fixing bugs) and evolution (for example, adding features). This paper proposes an automated approach to resolving GitHub issues, with the aim of autonomous program improvement. The approach, called AutoCodeRover, combines LLMs with sophisticated code search capabilities and ultimately produces a program modification or patch. Unlike recent work by AI researchers and practitioners that treats a software project only at the level of files, our work operates on a program representation (the abstract syntax tree), exploiting program structure in the form of classes and methods to improve the LLM's understanding of an issue's root cause and retrieving context through iterative search. When a test suite is available, spectrum-based fault localization further sharpens the context. On SWE-bench-lite, a dataset of 300 real GitHub issues, AutoCodeRover shows improved efficacy and resolves about 22-23% of the issues. On the full SWE-bench, with 2294 GitHub issues, AutoCodeRover resolves about 16% of the issues, which is higher than the recently reported efficacy of Devin, the AI software engineer from Cognition Labs, while taking time comparable to Devin. We believe our workflow can advance autonomous software engineering, so that in the future code generated automatically by LLMs can itself be automatically improved.**|\n", 
"2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang 
et.al.|[2404.05291](http://arxiv.org/abs/2404.05291)|null|We present a system based on large language models (LLMs) that improves the problem-solving ability of quadrupedal robots so that they can handle long-horizon tasks beyond short-term motions. Long-horizon tasks are extremely challenging for quadrupedal robots because they require both a high-level understanding of task semantics and a broad range of locomotion and manipulation skills for interacting with the environment. Our system builds a high-level reasoning layer with large language models that generates hybrid discrete-continuous plans, expressed as robot code, from the task description. It comprises multiple LLM agents: a semantic planner that sketches the plan, a parameter calculator that predicts the arguments in the plan, and a code generator that converts the plan into executable robot code. At the low level, we use reinforcement learning to train a set of motion planning and control skills that increase the quadruped's flexibility and allow rich interaction with the environment. We test the system on long-horizon tasks that cannot be completed with a single skill. Simulation and real-world experiments show that it successfully works out multi-step strategies and exhibits non-trivial behaviors, such as building tools or asking humans for help.|\n", 
"2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667](http://arxiv.org/abs/2404.04667)|null|Multimodal artificial intelligence (AI) systems have the potential to improve clinical decision-making by interpreting many types of medical data. The effectiveness of such models across medical fields remains unclear, however, and each field poses its own challenges. This paper introduces a new approach to multimodal medical AI that uses large language models (LLMs) as the central reasoning engine. The engine autonomously coordinates and deploys a set of specialized medical AI tools, including text interpretation, radiology and histopathology image analysis, genomic data processing, web search, and document retrieval from medical guidelines. We validate the system on a series of clinical oncology scenarios that closely resemble typical patient-care workflows. The system shows high capability in selecting appropriate tools (97%), drawing correct conclusions (93.6%), giving complete (94%) and helpful (89.2%) treatment recommendations, and citing relevant literature when instructed (82.5%). This indicates that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information and so act as personalized clinical assistants. The architecture also simplifies regulatory compliance, because each component tool can be validated and approved individually. We believe this work serves as a proof of concept for more advanced LLM agents in medicine.|\n", 
"2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237](http://arxiv.org/abs/2404.04237)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u8fdb\u6b65\u4f7f\u5176\u5728\u6807\u51c6\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u9891\u9891\u8d85\u8d8a\u4eba\u7c7b\u8868\u73b0\uff0c\u63a8\u52a8\u4e86\u4f17\u591a\u4e0b\u6e38\u5e94\u7528\u7684\u53d1\u5c55\uff0c\u5982\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u770b\u4f3c\u7b80\u5355\u7684\u4efb\u52a1\u4e2d\u610f\u5916\u5730\u8868\u73b0\u4e0d\u4f73\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5bf9\u66f4\u5168\u9762\u548c\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6846\u67b6\u7684\u9700\u6c42\uff0c\u4ee5\u8861\u91cf\u5b83\u4eec\u7684\u5b9e\u9645\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u7ec4\u5408\u6027\u548c\u6761\u4ef6\u63a8\u7406\u2014\u2014\u4eba\u7c7b\u8ba4\u77e5\u7684\u57fa\u77f3\uff0c\u5e76\u63d0\u51faGroundCocoa\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e0e\u822a\u73ed\u9884\u8ba2\u8fd9\u4e00\u73b0\u5b9e\u95ee\u9898\u76f8\u8fde\u63a5\u7684\u8bcd\u6c47\u4e30\u5bcc\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u4efb\u52a1\u662f\u5c06\u7528\u6237\u7684\u8be6\u7ec6\u504f\u597d\u4e0e\u4ee5\u591a\u9009\u5f62\u5f0f\u63d0\u4f9b\u7684\u53ef\u7528\u822a\u73ed\u9009\u9879\u8fdb\u884c\u5339\u914d\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5305\u62ec\u6700\u5148\u8fdb\u7684GPT-4 Turbo\u5728\u5185\u7684\u5f53\u524d\u6700\u4f73\u6a21\u578b\uff0c\u5728\u7ecf\u8fc7\u9ad8\u7ea7\u63d0\u793a\u540e\uff0c\u51c6\u786e\u7387\u4ecd\u4e0d\u8d85\u8fc767%\uff0c\u663e\u793a\u51fa\u663e\u8457\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045](http://arxiv.org/abs/2404.16045)|null|## \u7ffb\u8bd1 
\u5728\u4ea7\u54c1\u5f00\u53d1\u7684\u5173\u952e\u9636\u6bb5\u2014\u2014\u9700\u6c42\u83b7\u53d6\uff0c\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u7528\u6237\u9700\u6c42\uff0c\u5bfc\u81f4\u6700\u7ec8\u4ea7\u54c1\u53ef\u80fd\u65e0\u6cd5\u6ee1\u8db3\u671f\u671b\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5316\u548c\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u3002\u901a\u8fc7\u751f\u6210\u5927\u91cf\u6a21\u62df\u7528\u6237\uff08LLM\u4ee3\u7406\uff09\uff0c\u6211\u4eec\u53ef\u4ee5\u63a2\u7d22\u66f4\u5e7f\u6cdb\u7684\u7528\u6237\u9700\u6c42\u548c\u672a\u9884\u89c1\u7684\u4f7f\u7528\u573a\u666f\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u63cf\u8ff0\u4ed6\u4eec\u7684\u884c\u4e3a\u3001\u89c2\u5bdf\u548c\u6311\u6218\uff0c\u53c2\u4e0e\u4ea7\u54c1\u4f53\u9a8c\u60c5\u666f\u3002\u968f\u540e\u7684\u4ee3\u7406\u8bbf\u8c08\u548c\u5206\u6790\u63ed\u793a\u4e86\u5b9d\u8d35\u7684\u7528\u6237\u9700\u6c42\uff0c\u5305\u62ec\u6f5c\u5728\u9700\u6c42\u3002\u6211\u4eec\u901a\u8fc7\u4e09\u4e2a\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u6846\u67b6\uff1a\u9996\u5148\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0d\u540c\u65b9\u6cd5\u751f\u6210\u591a\u6837\u5316\u7684\u4ee3\u7406\uff0c\u5206\u6790\u5176\u4f18\u7f3a\u70b9\uff0c\u5e76\u8bc1\u660e\u4e86\u5177\u6709\u4e0a\u4e0b\u6587\u610f\u8bc6\u7684\u4ee3\u7406\u751f\u6210\u80fd\u5e26\u6765\u66f4\u5927\u7684\u9700\u6c42\u591a\u6837\u6027\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8be5\u6846\u67b6\u5982\u4f55\u6709\u6548\u5730\u6a21\u62df\u5bcc\u6709\u540c\u60c5\u5fc3\u7684\u9886\u5148\u7528\u6237\u8bbf\u8c08\uff0c\u8bc6\u522b\u51fa\u6bd4\u4f20\u7edf\u4eba\u7c7b\u8bbf\u8c08\u66f4\u591a\u7684\u6f5c\u5728\u9700\u6c42\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528LLMs\u5206\u6790\u8bbf\u8c08\uff0c\u63d0\u53d6\u9700\u6c42\u5e76\u5c06\u5176\u5206\u7c7b\u4e3a\u6f5c\u5728\u6
216\u975e\u6f5c\u5728\u3002\u6211\u4eec\u7684\u7814\u7a76\u5de5\u4f5c\u5f3a\u8c03\u4e86\u5229\u7528LLM\u4ee3\u7406\u52a0\u901f\u65e9\u671f\u4ea7\u54c1\u7814\u53d1\u3001\u964d\u4f4e\u6210\u672c\u548c\u4fc3\u8fdb\u521b\u65b0\u7684\u6f5c\u529b\u3002|\n", "2404.15317": "|**2024-04-03**|**Concept-Guided LLM Agents for Human-AI Safety Codesign**|Florian Geissler et.al.|[2404.15317](http://arxiv.org/abs/2404.15317)|null|\u968f\u7740\u751f\u6210\u4eba\u5de5\u667a\u80fd\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff0c\u7279\u522b\u662f\u5b89\u5168\u5de5\u7a0b\u4e2d\u7684\u91cd\u8981\u6027\u63d0\u5347\uff0c\u5bf9\u5b83\u7684\u8d28\u91cf\u8981\u6c42\u4e5f\u968f\u4e4b\u63d0\u9ad8\u3002\u5355\u7eaf\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e9b\u9700\u6c42\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u878d\u5408\u7684\u7b56\u7565\uff0c\u65e8\u5728\u5229\u7528LLMs\u8fdb\u884c\u5b89\u5168\u5206\u6790\u548c\u4eba\u673a\u534f\u540c\u8bbe\u8ba1\uff0c\u4ee5\u786e\u4fdd\u8f6f\u4ef6\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9a\u5236\u5316\u7684LLM\u4ee3\u7406\uff0c\u7ed3\u5408\u63d0\u793a\u5de5\u7a0b\u3001\u542f\u53d1\u5f0f\u63a8\u7406\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u4e13\u6ce8\u4e8e\u89e3\u51b3\u4e0e\u9884\u5b9a\u4e49\u5b89\u5168\u6982\u5ff5\u76f8\u5173\u7684\u4efb\u52a1\uff0c\u5e76\u4e0e\u7cfb\u7edf\u6a21\u578b\u56fe\u8fdb\u884c\u4ea4\u4e92\u3002\u51b3\u7b56\u6d41\u7a0b\u901a\u8fc7\u4e00\u7cfb\u5217\u5fae\u51b3\u7b56\u8fdb\u884c\u5f15\u5bfc\uff0c\u6709\u52a9\u4e8e\u4fdd\u6301\u7ed3\u6784\u5316\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u56fe\u7684\u53e3\u5934\u8868\u8ff0\u4f5c\u4e3a\u7cfb\u7edf\u6a21\u578b\u7684\u4e2d\u95f4\u8868\u793a\uff0c\u4ee5\u4fc3\u8fdbLLM\u4e0e\u56fe\u7684\u4ea4\u4e92\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7b80\u5316\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u7684\u793a\u4f8b\uff0c\u5
c55\u793a\u4e86\u9009\u62e9\u7684\u63d0\u793a-\u54cd\u5e94\u5bf9\uff0c\u4ee5\u8bf4\u660e\u6211\u4eec\u7684\u65b9\u6cd5\u5982\u4f55\u5e94\u7528\u4e8e\u5b89\u5168\u5206\u6790\u3002|\n", "2404.02183": "|**2024-04-02**|**Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**|Yoichi Ishibashi et.al.|[2404.02183](http://arxiv.org/abs/2404.02183)|**[link](https://github.com/tsukushiai/self-organized-agent)**|**## \u80cc\u666f \u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u7684\u672a\u6765\u6b63\u9010\u6e10\u663e\u73b0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5355\u4ee3\u7406\u65b9\u6cd5\u5728\u751f\u6210\u548c\u4f18\u5316\u5927\u89c4\u6a21\u3001\u590d\u6742\u7684\u4ee3\u7801\u5e93\u65f6\u9762\u4e34\u4e0a\u4e0b\u6587\u957f\u5ea6\u9650\u5236\u7684\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u4ee3\u7406\u6846\u67b6\u2014\u2014\u81ea\u7ec4\u7ec7\u591aAgent\u4f53\u7cfb\uff08SoA\uff09\u3002SoA\u662f\u4e00\u4e2a\u53ef\u6269\u5c55\u4e14\u9ad8\u6548\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u5b83\u5141\u8bb8\u72ec\u7acb\u5730\u751f\u6210\u548c\u4fee\u6539\u4ee3\u7801\u7ec4\u4ef6\uff0c\u5e76\u534f\u540c\u6784\u5efa\u6574\u4e2a\u4ee3\u7801\u5e93\u3002SoA\u7684\u4e00\u4e2a\u5173\u952e\u7279\u6027\u662f\u6839\u636e\u95ee\u9898\u590d\u6742\u6027\u81ea\u52a8\u589e\u52a0\u4ee3\u7406\uff0c\u5b9e\u73b0\u52a8\u6001\u53ef\u6269\u5c55\u6027\u3002\u8fd9\u6837\uff0c\u6574\u4f53\u4ee3\u7801\u91cf\u53ef\u4ee5\u6839\u636e\u4ee3\u7406\u6570\u91cf\u65e0\u9650\u589e\u957f\uff0c\u800c\u6bcf\u4e2a\u4ee3\u7406\u7ba1\u7406\u7684\u4ee3\u7801\u91cf\u4fdd\u6301\u6052\u5b9a\u3002 
We evaluate SoA on the HumanEval benchmark and find that, compared with a single-agent system, each agent in SoA handles significantly less code while the overall generated code is substantially larger. In addition, SoA improves Pass@1 accuracy by 5% over a strong single-agent baseline.**|\n",
"2404.01602": "|**2024-04-02**|**Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game**|Silin Du et.al.|[2404.01602](http://arxiv.org/abs/2404.01602)|**[link](https://github.com/doslim/evaluate-the-opinion-leadership-of-llms)**|**Large language models (LLMs) show notable strategic behavior in social deduction games, but the opinion leadership they exhibit, which matters for practical multi-agent and human-AI interaction settings, has received little attention. An opinion leader is an individual who significantly influences the beliefs and behaviors of others within a social group. This work uses the Werewolf game as a simulation platform to assess the opinion leadership of LLMs playing the Sheriff role. The Sheriff summarizes arguments and recommends decisions, and therefore serves as a credible proxy for an opinion leader. We build a framework that integrates the Sheriff role and propose two evaluation metrics based on the key characteristics of opinion leaders: the first measures the opinion leader's reliability, and the second measures its influence on other players' decisions. We run extensive experiments on LLMs of different scales and construct a Werewolf question-answering dataset (WWQA) to test and improve the models' grasp of the game rules. We also include human participants for further analysis. The results suggest that the Werewolf game is a suitable test bed for evaluating the opinion leadership of LLMs, and that only a few LLMs currently possess this capability.**|\n",
"2404.00806": "|**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|The rise of algorithmic pricing raises concerns about algorithmic collusion. We conduct experiments with algorithmic pricing agents based on large language models (LLMs), specifically GPT-4. We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) seemingly innocuous variations in LLM instructions (\u201cprompts\u201d) may increase collusion. These results extend to auction settings. Our findings underscore the need for antitrust regulation of algorithmic pricing and reveal regulatory challenges specific to LLM-based pricing agents.|\n",
"2404.01343": "|**2024-04-15**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343](http://arxiv.org/abs/2404.01343)|**[link](https://github.com/jingzheshi/chops)**|**As businesses and software platforms increasingly adopt large language models (LLMs) such as GPT-3.5, GPT-4, GLM-3, and LLaMa-2 for chat assistance or customer-service reasoning, existing LLM-based customer service models remain limited in integrating with customer profiles and in carrying out real operations. They also tend to emphasize diversity over precision and error avoidance, which is not ideal for real-world customer service scenarios. We therefore propose CHOPS (CHat with custOmer Profile Systems), an LLM agent designed to (1) efficiently use existing databases or systems to look up user information, or interact with those systems following established guidelines; (2) give accurate, reasonable responses and perform required operations in the system while avoiding harmful ones; and (3) combine small and large LLMs to reach satisfactory performance at a reasonable inference cost. We introduce a practical dataset, CPHOS-dataset, which includes a database, guiding files, and question-answer pairs from CPHOS, an online platform for high school teachers and students that supports organizing simulated Physics Olympiads. We run extensive experiments on CPHOS-dataset to validate the CHOPS architecture, aiming to show how LLMs can enhance or replace human customer service. Code for our proposed architecture and dataset is available here: .**|\n",
"2404.01342": "|**2024-03-31**|**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**|Lirui Zhao 
et.al.|[2404.01342](http://arxiv.org/abs/2404.01342)|**[link](https://github.com/opengvlab/diffagent)**|**Text-to-image (T2I) generative models have attracted wide attention and found broad use in both academic research and practical applications. For example, Civitai, a platform for T2I innovation, currently hosts 74,492 distinct models, which makes choosing the most suitable model and parameters a daunting task that typically takes many trials. Drawing on research into tool use by large language models (LLMs), we introduce DiffAgent, an LLM agent that screens for the right choice quickly via API calls. DiffAgent uses a novel two-stage training framework, SFTA, that lets it align T2I API responses with user input according to human preferences. To train and evaluate DiffAgent, we build DABench, a comprehensive dataset covering a wide range of T2I APIs from the community. Experiments show that DiffAgent excels at selecting appropriate T2I APIs and confirm the effectiveness of the SFTA training framework. Code is available at https://github.com/OpenGVLab/DiffAgent.**|\n",
"2404.00573": "|**2024-03-31**|**\"My agent understands me better\": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based 
Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|In this study, we propose a novel human-like memory architecture to enhance the cognitive abilities of dialogue agents based on large language models (LLMs). Our design lets these agents autonomously recall the memories needed to generate a response, addressing a limitation of LLMs in temporal cognition. We adopt the human cued-recall mechanism as the trigger for accurate and efficient recall. We also develop a mathematical model that dynamically quantifies memory consolidation, taking into account factors such as contextual relevance, elapsed time, and recall frequency. The agent stores memories drawn from the user's interaction history in a database, where each memory encapsulates its content and temporal context. The system can thus store memories strategically and understand their significance to the user over time, much as humans recognize and recall past experiences.|\n",
"2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|Cognitive systems generally require a human to translate a problem definition into a specification the system can use. This paper shows that large language models (LLMs) can map a problem class defined in natural language into a semi-formal specification that an existing reasoning and learning system can then use to solve instances of that class. The authors present the design of LLM-enabled cognitive task analyst agents, which produce a definition of problem spaces for tasks specified in natural language. The LLM prompts are derived from the definition of problem spaces in the AI literature and from general problem-solving strategies (Polya's How to Solve It). A cognitive system can then use the problem-space specification, applying domain-general problem-solving strategies such as search, to solve multiple instances of problems from the class. These preliminary results suggest that, by removing the intermediary step of problem formulation, LLMs could speed up cognitive systems research while retaining core capabilities such as robust inference and online learning.|\n",
"2405.11403": "|**2024-05-18**|**MapCoder: Multi-Agent Code Generation for Competitive Problem Solving**|Md. Ashraful Islam et.al.|[2405.11403](http://arxiv.org/abs/2405.11403)|**[link](https://github.com/md-ashraful-pramanik/mapcoder)**|**Code synthesis requires a deep understanding of complex natural-language problem descriptions, generating code for intricate algorithms and data structures, and running comprehensive unit tests. Although large language models (LLMs) excel at natural language processing, their performance on code generation still leaves room for improvement. We introduce MapCoder, a novel multi-agent prompting approach to code generation that replicates the full cycle of program synthesis as practiced by human developers. The framework consists of four LLM agents, each designed for one stage of that cycle: recalling relevant examples, planning, code generation, and debugging. 
Through thorough experiments on eight challenging competitive problem-solving and program synthesis benchmarks, MapCoder demonstrates strong code generation capabilities and sets new state-of-the-art results, including on HumanEval (93.9%), MBPP (83.1%), APPS (22.0%), CodeContests (28.5%), and xCodeEval (45.3%). Our method also delivers consistently superior performance across programming languages and problem difficulty levels. We open-source the framework at https://github.com/Md-Ashraful-Pramanik/MapCoder.**|\n",
"2405.14751": "|**2024-05-23**|**AGILE: A Novel Framework of LLM Agents**|Peiyuan Feng et.al.|[2405.14751](http://arxiv.org/abs/2405.14751)|**[link](https://github.com/bytarnish/agile)**|We introduce a novel framework of large language model (LLM) agents named AGILE (an agent that interacts with users and learns from its environment), designed to perform complex conversational tasks using LLMs, memory, tools, and interactions with experts. Beyond conversation, the agent can reflect, use tools, and consult experts. We formulate the construction of such an LLM agent as a reinforcement learning problem in which the LLM serves as the policy model, and we fine-tune the LLM using labeled action data and the PPO algorithm. We focus on question answering and release ProductQA, a dataset of challenging questions from online shopping. Extensive experiments on ProductQA and MedMCQA show that AGILE agents built on 13B and 7B LLMs can outperform GPT-4 agents. Our ablation study highlights the importance of memory, tools, consultation, reflection, and reinforcement learning in achieving strong performance.|\n",
"2405.14744": "|**2024-05-23**|**Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View**|Xuan Liu et.al.|[2405.14744](http://arxiv.org/abs/2405.14744)|null|Large language models (LLMs) can hallucinate because their training data reflects human biases. This raises a key question: can LLMs leverage hallucination to mirror human cognitive biases, and thereby exhibit irrational yet social behavior? This paper explores that question. Combining practical social science experiments with theoretical insights, we propose CogMir, an open-ended multi-LLM framework that uses the hallucination properties of LLMs to assess and enhance their social intelligence, particularly with respect to cognitive biases. Experiments on CogMir subsets show that LLMs and humans are highly consistent in irrational and prosocial decision-making under uncertainty, which points to the prosociality of LLMs as social entities and highlights the key role of hallucination properties. The CogMir framework also shows promise as a platform for studying the social intelligence of LLMs.|\n",
"2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547](http://arxiv.org/abs/2405.13547)|null|Autonomous driving is a complex task that requires advanced decision-making and control algorithms. Understanding the rationale behind an autonomous vehicle's decisions is crucial to its safe and effective operation on highways. This study presents HighwayLLM, a novel approach that uses the reasoning capabilities of large language models (LLMs) to predict future waypoints for the ego vehicle's navigation. The approach also employs a pre-trained reinforcement learning (RL) model as a high-level planner that decides on appropriate meta-level actions. HighwayLLM combines the RL model's output with current state information to produce safe, collision-free, and explainable predictions of the next states, from which the vehicle's trajectory is constructed. A PID-based controller then guides the vehicle along the waypoints predicted by the LLM agent. Integrating the LLM with RL and PID improves decision-making and brings interpretability to autonomous highway driving.|\n",
"2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050](http://arxiv.org/abs/2405.13050)|**[link](https://github.com/daniel-chin/flute-x-gpt)**|Large language model (LLM)-in-the-loop applications have been shown to effectively interpret user commands, make plans, and operate external tools and systems accordingly. Still, the operating scope of the LLM agent is limited to passively following the user, who must frame their needs in terms of the underlying tools or systems. We note that the potential of an LLM-Agent User Interface (LAUI) is far from realized. In the ideal LAUI, a user who knows little about the underlying tools or systems can still work with the LAUI to discover an emergent workflow. Rather than designing a fixed, explorable GUI that teaches the user a predefined set of ways to use the system, the LLM agent in a LAUI starts out proficient with the system, proactively studies the user and their needs, and proposes new interaction schemes to the user. To illustrate LAUI, we present a concrete example, Flute X GPT, which combines an LLM agent, a prompt manager, and a multimodal software-hardware system for flute teaching that supports a complex, real-time user experience, with the aim of making it easier to learn to play the flute.|\n",
"2405.13009": "|**2024-05-13**|**METAREFLECTION: Learning Instructions for Language Agents using Past Reflections**|Priyanshu Gupta et.al.|[2405.13009](http://arxiv.org/abs/2405.13009)|null|Despite the popularity of large language models (LLMs), crafting precise prompts for them to perform specific tasks remains challenging. Users often hold multiple conversational turns with an LLM-based agent to reach their goal. Recent studies show that the model's own feedback, known as self-reflection, can act as reinforcement during these conversations and lead to the desired outcome more quickly. Motivated by these findings, we introduce METAREFLECTION, a novel technique that learns general, domain-specific prompt instructions from individual self-reflections gathered during a training phase. We evaluate the technique in two domains, Infrastructure as Code (IAC) vulnerability detection and question answering (QA), using REACT and COT. METAREFLECTION significantly outperforms GPT-4, with gains of 16.82% (IAC), 31.33% (COT), and 15.42% (REACT), which suggests it is a viable way to improve the efficiency of LLMs.|\n",
"2405.15414": "|**2024-05-24**|**Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification**|Yuxuan Guo et.al.|[2405.15414](http://arxiv.org/abs/2405.15414)|null|Building open-ended agents has long been an ultimate goal of AI research, and creative agents are especially appealing. Existing large language model (LLM) agents excel at long-horizon tasks with well-defined goals, such as \u201cmine diamonds\u201d in Minecraft. They struggle, however, with creative tasks that have open goals and abstract criteria, because they cannot bridge the gap between them and therefore lack feedback for self-improvement while solving the task. This work introduces autonomous embodied verification techniques to fill that gap and lay the groundwork for creative tasks. Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two levels of autonomous embodied verification inspired by human design practice: (1) visual verification of 3D structural speculations, which come from CAD modeling programs that the agent synthesizes; and (2) pragmatic verification, in which the agent generates and verifies environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines by 33% to 100% in both visualization and pragmatism. Demonstrations on a real-world robotic arm also show Luban's creative potential in the physical world.|\n",
"2405.15145": "|**2024-05-24**|**CulturePark: Boosting Cross-cultural Understanding in Large Language Models**|Cheng Li et.al.|[2405.15145](http://arxiv.org/abs/2405.15145)|**[link](https://github.com/scarelette/culturepark)**|Cultural bias is pervasive in large language models (LLMs), largely because of a lack of data representative of different cultures. Cultural datasets and benchmarks are typically built either by extracting subsets of existing datasets or by aggregating information from sources such as Wikipedia and social media. These approaches depend heavily on real-world data and human annotation, which makes them costly and hard to scale. Inspired by cognitive theories of social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication, with LLM-based agents role-playing people from different cultures, and generates high-quality cross-cultural dialogues that capture human beliefs, norms, and customs. Using CulturePark, we generate 41,000 cultural samples and fine-tune models for eight specific cultures. We evaluate these models against GPT-4 on three downstream tasks: content moderation, cultural alignment, and cultural education. In content moderation, our GPT-3.5-based models match or outperform GPT-4. In cultural alignment, our models surpass GPT-4 on Hofstede's VSM 13 framework. In cultural education with human participants, our models also perform well in both learning efficacy and user experience. CulturePark is an important step toward reducing cultural bias and advancing the democratization of AI, and it highlights the critical role of culturally inclusive data in model training.|\n",
"2405.14918": "|**2024-05-23**|**AnalogCoder: Analog Circuit Design via Training-Free Code Generation**|Yao Lai et.al.|[2405.14918](http://arxiv.org/abs/2405.14918)|**[link](https://github.com/laiyao1/AnalogCoder)**|
Analog circuit design is a critical task in modern chip technology. It involves selecting components, connecting them, and setting parameters so that the circuit functions properly. Although large language models (LLMs) have made progress in digital circuit design, the complexity of analog circuits and the scarcity of data pose significant challenges. To address them, we introduce AnalogCoder, the first training-free LLM agent for designing analog circuits through Python code generation. First, AnalogCoder uses a feedback-enhanced flow with tailored, domain-specific prompts, which lets it design analog circuits automatically and self-correctingly with a high success rate. Second, it proposes a circuit tool library that stores successful designs as reusable modular sub-circuits, simplifying the creation of composite circuits. Experiments on a benchmark covering a wide range of analog circuit tasks show that AnalogCoder outperforms other LLM-based methods: it successfully designs 20 circuits, 5 more than standard GPT-4o. We believe AnalogCoder can significantly improve the efficiency of the chip design process and let non-experts design analog circuits efficiently. Code and the benchmark are available at [https://github.com/anonyanalog/AnalogCoder](https://github.com/anonyanalog/AnalogCoder).|\n",
"2405.17424": "|**2024-05-27**|**LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because they must interact with the real world, embodied agents need rich prior knowledge, long-horizon planning ability, and fast response times. Recent agents based on large language models (LLMs) perform well but still have limitations; for example, LLM output is usually a descriptive sentence, which can be ambiguous when deciding on a specific action. To address these limitations, we introduce the large auto-regressive model (LARM). LARM takes both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format, the auto-regressive node transmission structure, and assemble a corresponding dataset. With a two-phase training regimen, LARM harvests enchanted equipment in Minecraft, which requires a considerably more complex decision chain than the best achievement of prior methods. LARM is also 6.8x faster than the fastest existing method.|\n",
"2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510](http://arxiv.org/abs/2405.16510)|null|The rapid advancement of neural language models has sparked a new surge of research on intelligent agents. Large language models (LLMs), regarded as a promising path toward artificial general intelligence (AGI), have drawn wide attention for their strong reasoning and generalization capabilities. Effective planning is crucial to the success of LLM agents on real-world tasks. However, designing a feasible or optimal fine-grained sequence of operations for a complex task, especially one that combines many heterogeneous actions, remains challenging. \u672c\u6587\u63d0\u51faMeta-Task 
Planning\uff08MTP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u96f6\u6837\u672c\u7684\u534f\u4f5c\u5f0fLLM\u591a\u4ee3\u7406\u7cfb\u7edf\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06\u590d\u6742\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u5373\u5143\u4efb\u52a1\uff0c\u7b80\u5316\u4e86\u4efb\u52a1\u89c4\u5212\u3002\u6bcf\u4e2a\u5143\u4efb\u52a1\u968f\u540e\u6620\u5c04\u4e3a\u53ef\u6267\u884c\u52a8\u4f5c\u3002\u5728TravelPlanner\u548cAPI-Bank\u4e24\u4e2a\u4e25\u683c\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86MTP\u3002\u7ed3\u679c\u8868\u660e\uff0cMTP\u5728TravelPlanner\u4e0a\u7684\u5e73\u5747\u6210\u529f\u7387\u7ea6\u4e3a40%\uff0c\u8fdc\u8d85\u5f53\u524d\u6700\u4f73\u57fa\u7ebf\uff082.92%\uff09\uff0c\u5e76\u4e14\u5728API-Bank\u4e0a\u7684\u6027\u80fd\u6bd4\u4f7f\u7528ReAct\u7684LLM_{api}-4\u9ad8\u51fa\u7ea614%\uff0c\u8fd9\u663e\u793a\u51fa\u5c06LLM\u4e0e\u591a\u4ee3\u7406\u7cfb\u7edf\u76f8\u7ed3\u5408\u7684\u5de8\u5927\u6f5c\u529b\u3002|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li 
et.al.|[2405.16376](http://arxiv.org/abs/2405.16376)|**[link](https://github.com/cyrilli/stride)**|**Large language models such as GPT-4 have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning ability. In strategic multi-agent decision-making environments, however, they face limitations such as poor mathematical reasoning, difficulty following instructions, and a tendency to generate incorrect information. These shortcomings limit their performance in interactive tasks that require adhering to complex game rules, long-term planning, exploring unknown environments, and anticipating opponents' moves. To address this, this paper proposes a new LLM agent framework equipped with memory and specialized tools, aimed at improving strategic decision-making. We apply the tools in economically important settings, in particular bilateral bargaining and multi-agent dynamic mechanism design, and use quantitative metrics to evaluate the framework on a variety of strategic decision-making problems. The results show that the enhanced framework significantly improves the strategic decision-making ability of LLMs. Although current models have inherent limitations, we demonstrate that targeted enhancements can improve them, which points to a promising direction for future applications of LLMs in interactive environments.**|\n", 
"2405.16334": "|**2024-05-29**|**Devil's Advocate: Anticipatory Reflection for LLM Agents**|Haoyu Wang et.al.|[2405.16334](http://arxiv.org/abs/2405.16334)|null|In this work, we propose a novel approach that equips LLM agents with self-reflection, improving their consistency and adaptability when solving complex tasks. Our approach prompts an LLM agent to decompose a given task into manageable subtasks (that is, to make a plan) and to reflect continuously: before executing an action, it anticipates possible failures and their remedies; after the action, it checks alignment with the subtask objective and backtracks where necessary so that the plan is carried out with full effort; and once the plan is complete, it conducts a comprehensive review to refine future strategy. Applied zero-shot to practical web-environment tasks in WebArena, our agent outperforms existing zero-shot methods. The experiments show that this reflection-based strategy not only strengthens the agent's ability to navigate unforeseen challenges through a robust plan-execution mechanism, but also improves efficiency by reducing the number of trials and plan revisions needed to complete a task.|\n", 
"2405.16247": "|**2024-05-25**|**AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning**|Minghao Chen et.al.|[2405.16247](http://arxiv.org/abs/2405.16247)|**[link](https://github.com/minghchen/automanual)**|Large language models (LLMs) show promise for carrying out tasks in many domains, such as robotics, games, and web navigation. However, these models usually need carefully designed, expert-level prompts to handle tasks in a specific domain, which limits their adaptability. We therefore propose AutoManual, a framework that lets LLMs build their understanding autonomously through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them online with two agents: 1) the Planner writes actionable plans based on the current rules; 2) the Builder updates the rules through a well-structured rule system that supports online rule management and preserves essential details. To reduce hallucination when managing rules, we introduce a case-conditioned prompting strategy for the Builder. Finally, the Formulator agent compiles these rules into a comprehensive manual. The self-generated manual not only improves adaptability but also guides the planning of smaller LLMs while remaining human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, reaching 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo. The source code will be released soon.|\n", 
"2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208](http://arxiv.org/abs/2405.18208)|null|Recent studies have shown that large language models have some ability on simple tasks such as writing and coding. However, they still struggle with tasks that require comprehensive planning, which remains an important research problem for current models. This study focuses on travel planning, a complex multi-phase problem that involves outlining, information gathering, and planning, and that usually comes with various constraints and uncertainties. Existing reasoning methods handle this kind of problem poorly. Our goal is to improve LLM capability by developing a human-like planning framework that guides the model to imitate the steps humans take when solving multi-phase problems. Specifically, we implement strategies that let the model generate a coherent outline for each travel query, mirroring human planning patterns. We also add a Strategy Block and a Knowledge Block to the framework: the Strategy Block assists information gathering, while the Knowledge Block supplies the information needed for detailed planning. Extensive experiments show that the framework significantly improves the planning ability of LLMs, making them both more efficient and more effective on travel planning tasks. When combined with GPT-4-Turbo, our framework achieves a 10x performance improvement over the baseline framework running on GPT-4-Turbo.|\n", 
"2405.18113": "|**2024-05-28**|**Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting**|Hongda Sun 
et.al.|[2405.18113](http://arxiv.org/abs/2405.18113)|null|The rise of online recruitment services has transformed traditional job seeking and recruiting, creating an urgent need for high-quality industrial applications that improve person-job fit. Existing methods mainly rely on latent semantic modeling of resumes and job descriptions and learn a matching function between the two. Inspired by the strong role-playing ability of large language models (LLMs), we propose introducing mock interviews in which LLMs role-play the conversation between interviewer and candidate. This provides additional evidence for candidate evaluation and thereby augments personalized matching based solely on resumes and job descriptions. However, characterizing the interviewer and candidate roles in online recruitment remains challenging, for example in question formulation, answer construction, and two-sided evaluation of fit. 
To this end, we propose MockLLM, a novel framework that divides the person-job matching process into two modules, mock interview generation and two-sided evaluation in a handshake protocol, which jointly improve performance through collaborative behaviors between interviewers and candidates. We design a multi-role, multi-behavior framework so that a single LLM agent can effectively perform the different functions of both parties. In addition, we introduce reflection memory generation and dynamic prompt modification to refine the behavior of both sides and continually improve the additional evaluation evidence. Experimental results show that MockLLM achieves the best person-job matching performance together with high mock interview quality, which suggests strong prospects for practical use in future online recruitment.|\n", 
"2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia 
et.al.|[2405.18092](http://arxiv.org/abs/2405.18092)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|**This paper presents a novel multi-agent system architecture that applies large language models (LLMs) to automate the parametrization of process simulations in digital twins. We design a framework with four types of agents: observation, reasoning, decision, and summarization. By enabling dynamic interaction between the LLM agents and the simulation model, the system can automatically explore parameter settings and use heuristic reasoning to determine a set of parameters that steers the simulation toward its objective. The approach augments the simulation model with heuristics from the LLM and supports autonomous search for solutions to user tasks, which may improve user-friendliness and reduce the cognitive load on human users in complex decision-making. The effectiveness and functionality of the system are demonstrated through a case study, and a visualized demo is available in the GitHub repository.**|\n", 
"2405.17837": "|**2024-05-28**|**Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces**|Qiuyu Lu 
et.al.|[2405.17837](http://arxiv.org/abs/2405.17837)|null|In human-computer interaction (HCI), the design and development of interactive devices is a key concern. With the rise of novel hardware and advanced fabrication techniques, demand is growing for specialized design tools that simplify prototyping. Although these tools streamline the process through parametric design and simulation, they have a steep learning curve and do little to stimulate creative thinking. Using fluidic computation interfaces as a case study, this work explores how physical device design tools can be augmented with large language model (LLM) agents to create a Generative Design Tool (GDT). With LLMs, the GDT can understand the capabilities and limitations of a new device, propose diverse, insightful, and practical application scenarios, recommend device designs that are technically and contextually appropriate, and automatically generate the design parameters that conventional design tools need to visualize results and produce fabrication-ready files. This paper describes the framework, implementation, and performance of the GDT, and reflects on its prospects and the challenges encountered.|\n", 
"2405.20267": "|**2024-05-30**|**Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions**|Ruochen Zhao 
et.al.|[2405.20267](http://arxiv.org/abs/2405.20267)|**[link](https://github.com/Auto-Arena/Auto-Arena-LLMs)**|**As large language models (LLMs) evolve rapidly, a reliable and timely evaluation method is urgently needed. Because static benchmarks are prone to contamination, users often rely on human voting platforms such as Chatbot Arena. However, human annotation requires extensive manual effort. We therefore propose Auto-Arena, a framework that automates the entire LLM evaluation process. First, an examiner LLM devises the questions. Next, candidate LLMs engage in multi-round peer battles over those questions, which exposes their true performance gaps. Finally, a committee of LLM judges discusses collectively and decides the winner, which reduces bias and improves fairness. Extensive experiments on 17 of the newest LLMs show that Auto-Arena has the highest correlation with human preferences, offering a promising alternative to human evaluation platforms.**|\n", 
"2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang 
et.al.|[2405.20189](http://arxiv.org/abs/2405.20189)|null|In this work, we describe our approach to developing an intelligent and robust social robot system for the Nadine social robot platform. We integrate large language models (LLMs) and carefully leverage their strong reasoning and instruction-following capabilities to achieve human-like affective and cognitive abilities. This is novel compared with current LLM-based agents, which usually lack human-like long-term memory or sophisticated emotional appraisal. The naturalness of a social robot depends heavily on the performance of its individual components and on how well they work together. We build a system that generates appropriate behavior from multimodal input processing, retrieves episodic memories relevant to the recognized user, and simulates the emotional states the robot develops while interacting with its human partner. In particular, we propose SoR-ReAct, an LLM-agent framework for social robots, which serves as the core component of the interaction module in our system. This design advances social robot technology and aims to improve the quality of human-robot interaction.|\n", 
"2405.19425": "|**2024-05-29**|**Adaptive 
In-conversation Team Building for Language Model Agents**|Linxin Song et.al.|[2405.19425](http://arxiv.org/abs/2405.19425)|null|Using multiple large language model (LLM) agents shows promise for tackling complex tasks. However, designing an effective multi-agent team for a specific application remains a challenge. This paper proposes a new adaptive team-building paradigm called Captain Agent. Through a novel agent design, it adaptively forms and manages a team for each step of the problem-solving process, using nested group conversations and reflection to ensure diverse expertise and prevent stereotypical outputs. The approach offers a flexible yet structured way to solve problems and helps reduce redundancy and increase output diversity. A comprehensive evaluation across six real-world scenarios shows that Captain Agent significantly outperforms existing multi-agent methods, with an average accuracy improvement of 21.94%, and does so without task-specific prompt engineering.|\n", 
"2406.01422": "|**2024-06-03**|**How to Understand Whole Software Repository?**|Yingwei Ma et.al.|[2406.01422](http://arxiv.org/abs/2406.01422)|null|Recently, agents based on large language models (LLMs) have made significant progress in automatic software engineering (ASE). Although existing methods have proven effective, their designs focus mainly on local information in the code, such as issues, classes, and functions, which limits their grasp of the global context and dependencies of a software system. Drawing on the practical experience of software developers, we argue that a thorough understanding of the whole repository is key to ASE. However, understanding a whole repository raises many challenges, such as very long code input, noisy code information, and complex dependency relationships. 
To overcome these problems, we develop RepoUnderstander, a new ASE method that guides agents to understand the whole repository comprehensively. First, we condense the critical information of the whole repository into a knowledge graph in a top-down manner to reduce complexity. Next, we propose a repository exploration strategy based on Monte Carlo tree search (MCTS), which gives agents the ability to understand the whole repository. In addition, to make better use of repository-level knowledge, we guide the agents to summarize, analyze, and plan; they can then use tools to acquire information dynamically and generate patches that fix real-world GitHub issues. 
Extensive experiments demonstrate the superiority and effectiveness of RepoUnderstander. On the SWE-bench Lite benchmark, it achieves an 18.5% relative improvement over SWE-agent.|\n", 
"2406.01364": "|**2024-06-03**|**BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards**|Diego Dorn et.al.|[2406.01364](http://arxiv.org/abs/2406.01364)|null|Input-output safeguards are used to detect anomalous outputs of large language model (LLM) systems. These safeguards play a central role in key applications such as real-time monitoring, offline evaluation, and content moderation. However, there is currently no unified methodology for measuring their performance. To fill this gap, we propose Benchmarks for the Evaluation of LLM Safeguards (BELLS), a structured collection of tests organized into three categories: (1) established failure tests, based on existing benchmarks for well-defined failure modes, which aim to compare the performance of current input-output safeguards; (2) emerging failure tests, which measure generalization to never-seen-before failure modes and encourage the development of more general safeguards; (3) next-gen architecture tests, which target more complex architectures (such as LLM agents and multi-agent systems) and aim to foster safeguards for future applications that have no dedicated safeguard yet. In addition, we implement and share the first next-gen architecture test, built on the MACHIAVELLI environment, and provide an interactive visualization of the dataset.|\n", 
"2406.00936": "|**2024-06-03**|**A Survey of Useful LLM Evaluation**|Ji-Lun Peng 
et.al.|[2406.00936](http://arxiv.org/abs/2406.00936)|null|As large language models (LLMs) show remarkable performance across research fields, there is growing demand for methods that evaluate their capabilities in order to determine the tasks and responsibilities they are suited to. This paper examines how to use LLMs effectively as tools and proposes a two-stage framework: from 'core ability' to 'agent'. Core abilities are the characteristics an LLM needs in order to generate high-quality text; once these abilities are verified, the model can handle complex real-world tasks and act as an agent. In the 'core ability' stage, we discuss the reasoning ability, societal impact, and domain knowledge of LLMs. In the 'agent' stage, we present applications of LLMs in embodied action, planning, and tool learning. Finally, we analyze the challenges facing current LLM evaluation methods and outline directions for future development.|\n", 
"2406.01637": "|**2024-06-02**|**Teams of LLM Agents can Exploit Zero-Day Vulnerabilities**|Richard Fang 
et.al.|[2406.01637](http://arxiv.org/abs/2406.01637)|null|As large language model (LLM) agents have grown more sophisticated in cybersecurity, researchers have found that they can exploit real-world vulnerabilities when given a description of the vulnerability, and can solve simple capture-the-flag problems. However, they still perform poorly on zero-day vulnerabilities, that is, vulnerabilities that are not known in advance and that the vendor has not yet patched. This paper shows that teams of LLM agents can exploit real-world zero-day vulnerabilities. A single agent struggles to explore many different vulnerabilities and to plan over a long horizon. We therefore propose HPTSA, a system in which a planning agent can launch subagents. The planning agent explores the system and decides which subagents to call to try different vulnerabilities, which resolves the long-term planning problem. We run experiments on a benchmark of 15 real-world vulnerabilities, and the results show that our team of agents improves on prior work by up to 4.5 times.|\n", 
"2406.00583": "|**2024-06-02**|**CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems**|Yanlin Feng 
et.al.|[2406.00583](http://arxiv.org/abs/2406.00583)|**[link](https://github.com/megagonlabs/CMDBench)**|\u5728\u6570\u636e\u5e93\u548c\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff08Compound Artificial Intelligence Systems\uff0cCAS\uff09\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u4f5c\u4e3a\u4ee3\u7406\uff0c\u901a\u8fc7\u4e0e\u5de5\u5177\u548c\u6570\u636e\u68c0\u7d22\u5668\u4ea4\u4e92\u6765\u6267\u884c\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u7cfb\u7edf\u6709\u53ef\u80fd\u589e\u5f3a\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u4e2d\u6570\u636e\u5206\u6790\u5e08\u7684\u4e00\u822c\u5206\u6790\u6d41\u7a0b\uff0c\u4f46CAS\u9762\u4e34\u7740\u4e0e\u5206\u6790\u5e08\u76f8\u4f3c\u7684\u6570\u636e\u53d1\u73b0\u6311\u6218\uff1a\u7ec4\u7ec7\u5185\u90e8\u4e0d\u540c\u56e2\u961f\u548c\u90e8\u95e8\u521b\u5efa\u7684\u591a\u6a21\u6001\u6570\u636e\u6e90\u5b64\u7acb\uff0c\u8fd9\u4f7f\u5f97\u5bfb\u627e\u5b8c\u6210\u5f53\u524d\u4efb\u52a1\u6240\u9700\u5408\u9002\u6570\u636e\u6e90\u53d8\u5f97\u56f0\u96be\u3002\u73b0\u6709\u7684\u6570\u636e\u53d1\u73b0\u57fa\u51c6\u5e76\u672a\u5145\u5206\u6a21\u62df\u8fd9\u79cd\u591a\u6a21\u6001\u548c\u6570\u636e\u6e90\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0cCAS\u7684\u73b0\u6709\u57fa\u51c6\u4e3b\u8981\u5173\u6ce8\u7aef\u5230\u7aef\u4efb\u52a1\u6027\u80fd\u8bc4\u4f30\uff0c\u800c\u5ffd\u89c6\u4e86\u6570\u636e\u53d1\u73b0\u6027\u80fd\u3002 
\u4e3a\u4e86\u63a8\u52a8\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5bf9\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728CAS\u4e2d\u7684\u6570\u636e\u53d1\u73b0\u6027\u80fd\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CMDBench\uff0c\u4e00\u4e2a\u65e8\u5728\u6a21\u62df\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u6539\u7f16\u4e86\u5f00\u653e\u9886\u57df\u7684\u73b0\u6709\u6570\u636e\u96c6\u548c\u57fa\u51c6\uff0c\u5982\u95ee\u7b54\u3001\u590d\u6742\u63a8\u7406\u4ee5\u53ca\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u7ed3\u6784\u5316\u6570\u636e\uff0c\u6765\u8bc4\u4f30\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u7684\u6570\u636e\u53d1\u73b0\u4ee5\u53ca\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u6570\u636e\u68c0\u7d22\u5668\u8bbe\u8ba1\u5bf9\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u2014\u2014\u5e73\u5747\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u51c6\u786e\u7387\u4e0b\u964d\u4e8646%\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u9700\u8981\u5f00\u53d1\u4f18\u5316\u7b56\u7565\u6765\u786e\u5b9a\u5408\u9002\u7684LLM\u4ee3\u7406\u548c\u68c0\u7d22\u5668\uff0c\u4ee5\u63d0\u9ad8\u5728\u4f01\u4e1a\u6570\u636e\u4e0a\u9ad8\u6548\u6267\u884cCAS\u7684\u80fd\u529b\u3002 \u603b\u4e4b\uff0cCMDBench\u662f\u4e00\u4e2a\u65e8\u5728\u4fc3\u8fdb\u9488\u5bf9\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u8fdb\u884c\u7814\u7a76\u7684\u65b0\u5de5\u5177\uff0c\u5b83\u901a\u8fc7\u7efc\u5408\u8bc4\u4f30\u6570\u636e\u53d1\u73b0\u548c\u4efb\u52a1\u6267\u884c\u80fd\u529b\uff0c\u4e3a\u6539\u8fdb\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u4e2d\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6709\u4ef7\u503c\u7684\u6846\u67b6\u3002|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn 
et.al.|[2406.00244](http://arxiv.org/abs/2406.00244)|null|\u968f\u7740\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u666e\u904d\u9002\u7528\u6027\u63d0\u5347\uff0c\u4eba\u4eec\u5bf9\u5176\u7528\u4f5c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4ee3\u7406\u7684\u5174\u8da3\u65e5\u76ca\u589e\u957f\u3002\u5728\u8fd9\u4e9b\u60c5\u5883\u4e0b\uff0c\u6a21\u578b\u9700\u8981\u6839\u636e\u4e0e\u73af\u5883\u7684\u6709\u9650\u4ea4\u4e92\u5f62\u6210\u76ee\u6807\u5b9e\u73b0\u7b56\u7565\u7684\u4fe1\u5ff5\uff0c\u5e76\u5728\u6bcf\u4e00\u6b65\u51b3\u7b56\u4e2d\u5904\u7406\u4e0d\u786e\u5b9a\u6027\u3002\u672c\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\u8fdb\u884c\u7814\u7a76\uff0c\u901a\u8fc7\u63a7\u5236\u7684\u5e8f\u5217\u51b3\u7b56\u4efb\u52a1\u5b9e\u9a8c\u63a2\u8ba8LLMs\u5982\u4f55\u5f62\u6210\u548c\u8fd0\u7528\u8fd9\u4e9b\u4fe1\u5ff5\u3002 \u9996\u5148\uff0c\u6211\u4eec\u53d1\u73b0LLM\u6a21\u578b\u8fc7\u4e8e\u81ea\u4fe1\uff1a\u5b83\u4eec\u5728\u7f3a\u4e4f\u5145\u5206\u8bc1\u636e\u7684\u60c5\u51b5\u4e0b\u5c31\u5bf9\u884c\u52a8\u505a\u51fa\u5f3a\u70c8\u5224\u65ad\uff0c\u5bfc\u81f4\u63a2\u7d22\u884c\u4e3a\u4e0d\u8db3\u3002\u8fdb\u4e00\u6b65\u6df1\u5165\u5206\u6790\u63ed\u793a\uff0c\u8fd9\u79cd\u73b0\u8c61\u6e90\u4e8e\u4eceLLM\u91c7\u6837\u5f97\u5230\u7684\u52a8\u4f5c\u5206\u5e03\u71b5\u7684\u584c\u7f29\u3002\u63a5\u7740\uff0c\u6211\u4eec\u6307\u51fa\u73b0\u6709\u7684\u57fa\u4e8e\u4ee4\u724c\u7684\u91c7\u6837\u65b9\u6cd5\u672c\u8eab\u4e0d\u8db3\u4ee5\u4fc3\u4f7f\u6a21\u578b\u66f4\u5e7f\u6cdb\u63a2\u7d22\u3002 \u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u71b5\u6fc0\u6d3b\u5bfc\u5411\uff08Entropic Activation 
Steering\uff0cEAST\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9\u5728\u4e0a\u4e0b\u6587\u4e2d\u7684LLM\u4ee3\u7406\u7684\u6fc0\u6d3b\u5bfc\u5411\u65b9\u6cd5\u3002EAST\u8ba1\u7b97\u4e00\u4e2a\u4ee5\u71b5\u4e3a\u6743\u91cd\u7684\u8868\u793a\u7ec4\u5408\uff0c\u901a\u8fc7\u5728\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u5e72\u9884\u6a21\u578b\u7684\u6fc0\u6d3b\uff0c\u6765\u8c03\u6574\u6a21\u578b\u5bf9\u52a8\u4f5c\u7684\u4e0d\u786e\u5b9a\u6027\uff0c\u4ece\u800c\u4fc3\u8fdb\u63a2\u7d22\u884c\u4e3a\u7684\u51fa\u73b0\u3002\u6700\u540e\uff0cEAST\u6539\u53d8\u4e86LLM\u5728\u51b3\u7b56\u65f6\u8868\u8fbe\u7684\u4e3b\u89c2\u4e0d\u786e\u5b9a\u6027\uff0c\u4e3a\u7406\u89e3\u548c\u63a7\u5236\u6a21\u578b\u5bf9\u51b3\u7b56\u4e0d\u786e\u5b9a\u6027\u7684\u8868\u5f81\u63d0\u4f9b\u4e86\u9014\u5f84\u3002|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222](http://arxiv.org/abs/2406.00222)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u5df2\u7ecf\u8fc5\u901f\u6210\u4e3a\u6784\u5efa\u667a\u80fd\u5bf9\u8bdd\u52a9\u624b\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u8bf8\u5982\u6b67\u4e49\u5904\u7406\u7b49\u5bf9\u8bdd\u6280\u80fd\u4e0a\u4ecd\u6709\u6b20\u7f3a\uff1a\u5f53\u901a\u7528\u52a9\u624b\u9047\u5230\u6a21\u7cca\u60c5\u51b5\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u8fc7\u5ea6\u8c28\u614e\u6216\u731c\u6d4b\u7528\u6237\u7684\u771f\u6b63\u610f\u56fe\uff0c\u800c\u4e0d\u662f\u63d0\u95ee\u4ee5\u6c42\u6f84\u6e05\uff0c\u800c\u5728\u7279\u5b9a\u4efb\u52a1\u573a\u666f\u4e0b\uff0c\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6837\u672c\u5f80\u5f80\u6709\u9650\uff0c\u5f71\u54cd\u6a21\u578b\u5b66\u4e60\u6700\u4f18\u5bf9\u8bdd\u884c\u4e3a\u7b56\u7565\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d
0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAction-Based Contrastive Self-Training\uff08ACT\uff09\u7684\u8fd1\u4f3c\u5728\u7ebf\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u5b83\u57fa\u4e8eDirect Preference Optimization\uff08DPO\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u5728\u591a\u8f6e\u5bf9\u8bdd\u4e2d\u7684\u6837\u672c\u9ad8\u6548\u5bf9\u8bdd\u7b56\u7565\u5b66\u4e60\u3002 \u6211\u4eec\u5728\u4e09\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u5bf9\u8bdd\u4efb\u52a1\u4e2d\u9a8c\u8bc1\u4e86ACT\u7684\u6709\u6548\u6027\uff1a\u57fa\u4e8e\u8868\u683c\u7684\u95ee\u7b54\u3001\u673a\u5668\u9605\u8bfb\u7406\u89e3\uff0c\u4ee5\u53caAmbigSQL\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u6587\u672c\u5230SQL\u751f\u6210\u7684\u4fe1\u606f\u5bfb\u6c42\u8bf7\u6c42\u6b67\u4e49\u89e3\u51b3\u7684\u65b0\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u8bc4\u4f30LLMs\u80fd\u5426\u5728\u5bf9\u8bdd\u4e2d\u8bc6\u522b\u548c\u63a8\u7406\u6b67\u4e49\u6765\u8861\u91cf\u5176\u4f5c\u4e3a\u5bf9\u8bdd\u4ee3\u7406\u7684\u80fd\u529b\u3002ACT\u5728\u4e0e\u6807\u51c6\u76d1\u7763\u5fae\u8c03\u548cDPO\u65b9\u6cd5\u76f8\u6bd4\u65f6\uff0c\u663e\u793a\u51fa\u4e86\u663e\u8457\u7684\u5bf9\u8bdd\u5efa\u6a21\u6539\u8fdb\u3002|\n", "2406.00215": "|**2024-05-31**|**Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent**|Jie JW Wu 
et.al.|[2406.00215](http://arxiv.org/abs/2406.00215)|**[link](https://github.com/jie-jw-wu/human-eval-comm)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u663e\u8457\u63d0\u5347\uff0c\u4f46\u4ecd\u4e0e\u9876\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u7684\u6c34\u5e73\u5b58\u5728\u5dee\u8ddd\u3002\u9274\u4e8e\u9876\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u5e38\u901a\u8fc7\u63d0\u95ee\u6765\u6d88\u9664\u9700\u6c42\u548c\u7f16\u7801\u89e3\u51b3\u65b9\u6848\u4e2d\u7684\u6a21\u7cca\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u5bf9\u4e8eLLMs\u8fdb\u884c\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u65f6\u4e5f\u5e94\u5177\u5907\u7c7b\u4f3c\u7684\u6c9f\u901a\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5b9e\u8bc1\u7814\u7a76\uff0c\u5173\u6ce8LLMs\u7684\u6c9f\u901a\u6280\u80fd\uff0c\u5373\u201c\u5728\u4ee3\u7801\u751f\u6210\u95ee\u9898\u63cf\u8ff0\u5b58\u5728\u95ee\u9898\u65f6\u80fd\u63d0\u51fa\u6f84\u6e05\u95ee\u9898\u201d\u3002 \u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u540d\u4e3aHumanEvalComm\uff0c\u901a\u8fc7\u4fee\u6539\u95ee\u9898\u63cf\u8ff0\uff0c\u5f15\u5165\u4e86\u4e0d\u4e00\u81f4\u6027\u3001\u6a21\u7cca\u6027\u548c\u4e0d\u5b8c\u6574\u6027\u4e09\u4e2a\u95ee\u9898\u7ef4\u5ea6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5982\u901a\u4fe1\u7387\u548c\u826f\u597d\u95ee\u9898\u7387\uff0c\u5e76\u5728HumanEvalComm\u4e0a\u5bf9\u4e0d\u540c\u7c7b\u578b\u7684Code LLM\uff08\u4ee3\u7801\u8bed\u8a00\u6a21\u578b\uff09\u4ee5\u53ca\u4e00\u79cd\u65b0\u578bLLM\u4ee3\u7406\u65b9\u6cd5\uff08Okanagan\uff09\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u8be5\u65b9\u6cd5\u65e8\u5728\u4ece\u4ee3\u7801\u548c\u63cf\u8ff0\u4e2d\u8bc6\u522b\u5e76\u63d0\u95ee\uff0c\u4ee5\u8fdb\u4e00\u6b65\u4f18\u5316\u751f\u6210\u7684\u4ee3\u7801\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u6bd4\u8f83Code 
LLMs\u548cOkanagan\u7684\u8868\u73b0\uff0c\u8ba8\u8bba\u4e86\u5b9e\u9a8c\u7ed3\u679c\u3002|\n", "2406.03299": "|**2024-06-05**|**The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games**|Mikhail Mozikov et.al.|[2406.03299](http://arxiv.org/abs/2406.03299)|null|\u884c\u4e3a\u7814\u7a76\u5b9e\u9a8c\u5728\u793e\u4f1a\u6a21\u578b\u548c\u7406\u89e3\u4eba\u9645\u4e92\u52a8\u4e2d\u5360\u636e\u91cd\u8981\u5730\u4f4d\u3002\u7136\u800c\uff0c\u5b9e\u9645\u64cd\u4f5c\u4e2d\u8fd9\u7c7b\u5b9e\u9a8c\u5e38\u9762\u4e34\u5185\u5728\u6548\u5ea6\u3001\u5916\u5728\u6548\u5ea6\u3001\u53ef\u91cd\u590d\u6027\u548c\u793e\u4f1a\u504f\u89c1\u7b49\u6311\u6218\uff0c\u56e0\u4e3a\u4eba\u7c7b\u7684\u793e\u4f1a\u4e92\u52a8\u4e0e\u5408\u4f5c\u590d\u6742\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u4e3a\u7814\u7a76\u8005\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u7684\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u5de5\u5177\u3002\u4f46\u73b0\u6709\u57fa\u4e8eLLM\u7684\u6a21\u62df\u5047\u8bbe\u6a21\u578b\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u76f8\u4f3c\uff0c\u5374\u5ffd\u89c6\u4e86\u5f71\u54cd\u4eba\u7c7b\u51b3\u7b56\u7684\u5173\u952e\u56e0\u7d20\u2014\u2014\u60c5\u7eea\u3002\u672c\u6587\u63d0\u51fa\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u8bba\u548c\u6846\u67b6\uff0c\u65e8\u5728\u63a2\u8ba8LLMs\u7684\u51b3\u7b56\u5236\u5b9a\u53ca\u5176\u5728\u60c5\u7eea\u72b6\u6001\u4e0b\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u5951\u5408\u5ea6\u3002 
\u901a\u8fc7\u5728\u4e24\u79cd\u4e0d\u540c\u7c7b\u578b\u7684\u884c\u4e3a\u7ecf\u6d4e\u5b66\u6e38\u620f\uff08\u535a\u5f08\u8bba\u5b9e\u9a8c\uff09\u4e2d\u4f7f\u7528GPT-3.5\u548cGPT-4\uff0c\u6211\u4eec\u53d1\u73b0\u60c5\u7eea\u5bf9LLMs\u7684\u8868\u73b0\u6709\u663e\u8457\u5f71\u54cd\uff0c\u4fc3\u4f7f\u5b83\u4eec\u53d1\u5c55\u51fa\u66f4\u4f18\u5316\u7684\u7b56\u7565\u3002\u5c3d\u7ba1GPT-3.5\u4e0e\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u884c\u52a8\u6a21\u5f0f\u6709\u8f83\u5f3a\u7684\u5bf9\u5e94\uff0c\u5c24\u5176\u662f\u5728\u8ba8\u4ef7\u8fd8\u4ef7\u6e38\u620f\u4e2d\uff0c\u4f46GPT-4\u5c55\u73b0\u51fa\u4e00\u81f4\u7684\u884c\u4e3a\uff0c\u5bf9\u4e8e\u60c5\u7eea\u8bf1\u5bfc\u7684\u7406\u6027\u51b3\u7b56\u4f3c\u4e4e\u4e0d\u53d7\u5f71\u54cd\u3002\u4ee4\u4eba\u610f\u5916\u7684\u662f\uff0c\u60c5\u7eea\u63d0\u793a\uff0c\u7279\u522b\u662f\u6124\u6012\u60c5\u7eea\uff0c\u80fd\u591f\u6253\u7834GPT-4\u7684\u201c\u8d85\u4eba\u201d\u4e00\u81f4\u6027\uff0c\u4f7f\u5176\u53cd\u5e94\u66f4\u63a5\u8fd1\u4eba\u7c7b\u7684\u60c5\u7eea\u53cd\u5e94\u3002|\n", "2406.03007": "|**2024-06-05**|**BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents**|Yifei Wang 
et.al.|[2406.03007](http://arxiv.org/abs/2406.03007)|**[link](https://github.com/dpamk/badagent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7e41\u8363\uff0c\u57fa\u4e8e\u8bad\u7ec3\u597d\u7684LLMs\u5e76\u901a\u8fc7\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u5fae\u8c03\u7684\u5f3a\u5927\u667a\u80fd\u4ee3\u7406\u5df2\u5f00\u53d1\u51fa\u6765\uff0c\u63d0\u4f9b\u5b9a\u5236\u670d\u52a1\u3002\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6784\u5efaLLM\u4ee3\u7406\u7684\u65b9\u6cd5\u662f\u4f7f\u7528\u9884\u8bad\u7ec3\u6a21\u578b\uff0c\u5e76\u9488\u5bf9\u4efb\u52a1\u8fdb\u884c\u8fdb\u4e00\u6b65\u8c03\u6574\u3002\u7136\u800c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u6613\u53d7\u540d\u4e3aBadAgent\u7684\u65b0\u578b\u540e\u95e8\u653b\u51fb\uff0c\u8be5\u653b\u51fb\u901a\u8fc7\u5728\u540e\u95e8\u6570\u636e\u4e0a\u5fae\u8c03\u5728\u5404\u79cd\u4ee3\u7406\u4efb\u52a1\u4e2d\u690d\u5165\u540e\u95e8\u3002\u5728\u6d4b\u8bd5\u65f6\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u901a\u8fc7\u5728\u8f93\u5165\u6216\u73af\u5883\u4e2d\u663e\u793a\u89e6\u53d1\u5668\uff0c\u64cd\u7eb5\u90e8\u7f72\u7684LLM\u4ee3\u7406\u6267\u884c\u6709\u5bb3\u64cd\u4f5c\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u5373\u4f7f\u5728\u4fe1\u4efb\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\u4ecd\u8868\u73b0\u51fa\u6781\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u5c3d\u7ba1\u540e\u95e8\u653b\u51fb\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5df2\u5e7f\u6cdb\u7814\u7a76\uff0c\u4f46\u636e\u6211\u4eec\u6240\u77e5\uff0c\u6211\u4eec\u53ef\u80fd\u662f\u7b2c\u4e00\u4e2a\u7814\u7a76\u5728\u6743\u9650\u66f4\u5927\u7684LLM\u4ee3\u7406\u4e0a\u7684\u653b\u51fb\uff0c\u8fd9\u4e9b\u4ee3\u7406\u53ef\u4ee5\u4f7f\u7528\u5916\u90e8\u5de5\u5177\uff0c\u56e0\u6b64\u66f4\u5177\u5a01\u80c1\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u660e\u786e\u6307\u51fa\u4e86\u57fa\u4e8e\u4e0d\u4fe1\u4efb\u7684LLM\u6216\u6570\u636e\u6784\u5efaLLM\u4ee3\u7406\u7684\u98ce\u9669\u
3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1a[https://github.com/DPamK/BadAgent](https://github.com/DPamK/BadAgent)\u3002**|\n", "2406.04151": "|**2024-06-06**|**AgentGym: Evolving Large Language Model-based Agents across Diverse Environments**|Zhiheng Xi et.al.|[2406.04151](http://arxiv.org/abs/2406.04151)|**[link](https://github.com/woooodyy/agentgym)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5efa\u7acb\u80fd\u591f\u5904\u7406\u5404\u79cd\u4efb\u52a1\u5e76\u5728\u4e0d\u540c\u73af\u5883\u4e2d\u81ea\u6211\u8fdb\u5316\u7684\u6cdb\u5316\u578b\u4ee3\u7406\u662f\u4e00\u4e2a\u957f\u671f\u76ee\u6807\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u901a\u7528\u80fd\u529b\u88ab\u8ba4\u4e3a\u662f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6709\u524d\u666f\u7684\u57fa\u7840\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u8981\u4e48\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u76d1\u7763\uff0c\u8ba9LLM\u4ee3\u7406\u9010\u6b65\u6a21\u4eff\u4e13\u5bb6\u63d0\u4f9b\u7684\u8f68\u8ff9\uff0c\u96be\u4ee5\u5927\u89c4\u6a21\u6269\u5c55\u4e14\u9650\u5236\u4e86\u73af\u5883\u63a2\u7d22\uff1b\u8981\u4e48\u8ba9\u4ee3\u7406\u5728\u5b64\u7acb\u73af\u5883\u4e2d\u63a2\u7d22\u5b66\u4e60\uff0c\u5bfc\u81f4\u4e13\u957f\u6709\u9650\u3001\u7f3a\u4e4f\u6cdb\u5316\u80fd\u529b\u3002\u672c\u6587\u9996\u6b21\u5c1d\u8bd5\u6784\u5efa\u5177\u5907\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u7684\u901a\u7528LLM\u4ee3\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e09\u4e2a\u5173\u952e\u8981\u7d20\uff1a1\uff09\u591a\u6837\u7684\u73af\u5883\u4ee5\u652f\u6301\u4ee3\u7406\u63a2\u7d22\u548c\u5b66\u4e60\uff1b2\uff09\u4e00\u5957\u8f68\u8ff9\u6765\u8d4b\u4e88\u4ee3\u7406\u57fa\u672c\u80fd\u529b\u548c\u5148\u9a8c\u77e5\u8bc6\uff1b3\uff09\u6709\u6548\u4e14\u53ef\u6269\u5c55\u7684\u8fdb\u5316\u65b9\u6cd5\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86AgentGym\uff0c\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u5b83\u5305\u542b\u4e30\u5bcc\u7684\u73af\u5883\u548c\u4efb\u52a1\uff0c\u652f\u6301\u5168\u9762\u3001\u5b9e\u65f6\u3001\u7edf\u4e00\u683c\u5f0f\u548c\u5e76\u53d1\u7684\u4ee3\u7406\u63a2\u7d22\u3002AgentGym\u8fd8\u5305\u62ec\u4e00\u4e2a\u6269\u5c55\u6307\u4ee4\u7684\u6570\u636e\u5e93\u3001\u57fa\u51c6\u6d4b\u8bd5\u5957\u4ef6\u4ee5\u53ca\u8de8\u73af\u5883\u7684\u9ad8\u8d28\u91cf\u8f68\u8ff9\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AgentEvol\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7814\u7a76\u4ee3\u7406\u5728\u8d85\u8d8a\u65e2\u5b9a\u6570\u636e\uff0c\u8de8\u8d8a\u4efb\u52a1\u548c\u73af\u5883\u65f6\u7684\u81ea\u6211\u8fdb\u5316\u6f5c\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fdb\u5316\u540e\u7684\u4ee3\u7406\u53ef\u4ee5\u8fbe\u5230\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002\u6211\u4eec\u53d1\u5e03\u4e86AgentGym\u5957\u4ef6\uff0c\u5305\u62ec\u5e73\u53f0\u3001\u6570\u636e\u96c6\u3001\u57fa\u51c6\u3001\u68c0\u67e5\u70b9\u548c\u7b97\u6cd5\u5b9e\u73b0\u3002AgentGym\u5957\u4ef6\u5df2\u5728\u5176\u5b98\u65b9\u7f51\u7ad9https://github.com/WooooDyy/AgentGym\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.04692": "|**2024-06-07**|**Mixture-of-Agents Enhances Large Language Model Capabilities**|Junlin Wang 
et.al.|[2406.04692](http://arxiv.org/abs/2406.04692)|null|\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\u663e\u8457\uff0c\u5c55\u73b0\u51fa\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u5f3a\u5927\u80fd\u529b\u3002\u968f\u7740LLMs\u7684\u589e\u591a\uff0c\u5982\u4f55\u6709\u6548\u6574\u5408\u591a\u6a21\u578b\u7684\u77e5\u8bc6\u6210\u4e3a\u4e86\u4e00\u4e2a\u4ee4\u4eba\u632f\u594b\u7684\u7814\u7a76\u65b9\u5411\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u6df7\u5408\u4ee3\u7406\uff08Mixture-of-Agents\uff0cMoA\uff09\u65b9\u6cd5\u3002\u5728\u6211\u4eec\u7684\u67b6\u6784\u4e2d\uff0cMoA\u91c7\u7528\u4e86\u5206\u5c42\u8bbe\u8ba1\uff0c\u6bcf\u5c42\u5305\u542b\u591a\u4e2aLLM\u4ee3\u7406\u3002\u6bcf\u4e2a\u4ee3\u7406\u5728\u751f\u6210\u54cd\u5e94\u65f6\uff0c\u4f1a\u5229\u7528\u524d\u4e00\u5c42\u6240\u6709\u4ee3\u7406\u7684\u8f93\u51fa\u4f5c\u4e3a\u8f85\u52a9\u4fe1\u606f\u3002\u901a\u8fc7\u8fd9\u79cd\u7b56\u7565\uff0cMoA\u6a21\u578b\u5728AlpacaEval 2.0\u3001MT-Bench\u548cFLASK\u7b49\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86GPT-4\u5168\u80fd\u7248\u3002\u4f8b\u5982\uff0c\u4ec5\u4f7f\u7528\u5f00\u6e90LLMs\u7684\u6211\u4eec\u7684MoA\u6a21\u578b\u5728AlpacaEval 2.0\u4e0a\u7684\u5f97\u5206\u9886\u5148\uff0c\u8fbe\u523065.1%\uff0c\u800cGPT-4\u5168\u80fd\u7248\u7684\u6210\u7ee9\u4e3a57.5%\u3002|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|\u5c3d\u7ba1\u53ef\u7a7f\u6234\u5065\u5eb7\u8ffd\u8e2a\u5668\u65e5\u76ca\u666e\u53ca\uff0c\u7761\u7720\u548c\u8fd0\u52a8\u5bf9\u5065\u5eb7\u7684\u91cd\u8981\u6027\u4e0d\u8a00\u800c\u55bb\uff0c\u4f46\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5177\u6709\u884c\u52a8\u4ef7\u503c\u7684\u4e2a\u6027\u5316\u89c1\u89e3\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u8fd9\u9700\u8981\u5bf9\u5927\u91cf\u6570\u636e\u8fdb\u884c\u975e\u7ed3\u6784\u5316\u5206\u6790\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u7406\u89e3\u548c\u4e0e\u4e16\u754c\u4e92\u52a8\uff0c\u4e3a\u5927\u89c4\u6a21\u4e2a\u6027\u5316\u5206\u6790\u5e26\u6765\u4e86\u5e0c\u671b\u3002\u7136\u800c\uff0c\u5728\u4e2a\u4eba\u5065\u5eb7\u9886\u57df\u7684LLM\u5e94\u7528\u5c1a\u5f85\u5f00\u53d1\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Insights Agent\uff08PHIA\uff09\u7684\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u6700\u65b0\u7684\u4ee3\u7801\u751f\u6210\u548c\u4fe1\u606f\u68c0\u7d22\u5de5\u5177\u6765\u5206\u6790\u548c\u89e3\u91ca\u884c\u4e3a\u5065\u5eb7\u6570\u636e\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e24\u4e2a\u8d85\u8fc74000\u4e2a\u5065\u5eb7\u6d1e\u5bdf\u95ee\u9898\u7684\u57fa\u51c6\u95ee\u7b54\u6570\u636e\u96c6\u3002\u6839\u636e650\u5c0f\u65f6\u7684\u4eba\u7c7b\u548c\u4e13\u5bb6\u8bc4\u4f30\uff0cPHIA\u80fd\u51c6\u786e\u56de\u7b5484%\u4ee5\u4e0a\u7684\u4e8b\u5b9e\u6027\u6570\u503c\u95ee\u9898\uff0c\u4ee5\u53ca\u8d85\u8fc783%\u7684\u4f17\u5305\u5f00\u653e\u6027\u95ee\u9898\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u4e8e\u63a8\u52a8\u5927\u4f17\u884c\u4e3a\u5065\u5eb7\u8fdb\u6b65\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u53ef\u80fd\u4f7f\u4e2a\u4eba\u80fd\u591f\u89e3\u8bfb\u81ea\u5df1\u7684\u53ef\u7a7f\u6234\u6570\u636e\uff0c\u5f00\u8f9f\u4e86\u4e00\u4e2a\u4ee5\u6570\u636e\u9a71\u52a8\u6d1e\u5bdf\u4e3a\u6307\u5bfc\u7684\u4e2a\u6027\u531
6\u5065\u5eb7\u65b9\u6848\u7684\u65b0\u65f6\u4ee3\uff0c\u4f7f\u5f97\u5065\u5eb7\u4fdd\u5065\u66f4\u52a0\u4fbf\u6377\u4e14\u4e2a\u6027\u5316\u3002|\n", "2406.05925": "|**2024-06-09**|**Hello Again! LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925](http://arxiv.org/abs/2406.05925)|**[link](https://github.com/leolee99/ld-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5f00\u653e\u57df\u5bf9\u8bdd\u7cfb\u7edf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u7cfb\u7edf\u4e3b\u8981\u5173\u6ce8\u7b80\u77ed\u7684\u5355\u6b21\u4f1a\u8bdd\uff0c\u5ffd\u89c6\u4e86\u957f\u671f\u966a\u4f34\u548c\u4e2a\u6027\u5316\u804a\u5929\u673a\u5668\u4eba\u5728\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u6ee1\u8db3\u8fd9\u79cd\u5b9e\u9645\u9700\u6c42\uff0c\u4e8b\u4ef6\u603b\u7ed3\u548c\u4eba\u683c\u7ba1\u7406\u81f3\u5173\u91cd\u8981\uff0c\u5b83\u4eec\u80fd\u591f\u4fc3\u8fdb\u957f\u671f\u5bf9\u8bdd\u56de\u590d\u7684\u5408\u7406\u6027\u3002\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4eba\u7c7b\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8fdb\u5c55\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6709\u53ef\u80fd\u5927\u5e45\u589e\u5f3a\u81ea\u52a8\u5316\u611f\u77e5\u3001\u51b3\u7b56\u548c\u95ee\u9898\u89e3\u51b3\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684\u6846\u67b6\u2014\u2014\u957f\u671f\u5bf9\u8bdd\u4ee3\u7406\uff08LD-Agent\uff09\uff0c\u5b83\u5305\u62ec\u4e09\u4e2a\u53ef\u72ec\u7acb\u8c03\u6574\u7684\u6a21\u5757\uff1a\u4e8b\u4ef6\u611f\u77e5\u3001\u4eba\u683c\u63d0\u53d6\u548c\u54cd\u5e94\u751f\u6210\u3002\u4e8b\u4ef6\u8bb0\u5fc6\u6a21\u5757\u4f7f\u7528\u957f\u77ed\u671f\u8bb0\u5fc6\u5e93\u5206\u522b\u5173\u6ce8\u5386\u53f2\u548c\u6b63\u5728\u8fdb\u884c\u7684\u4f1a\u8bdd\uff0c\u5e76\u5f15\u5165\u4e86\u57fa\u4e8e\u4e3b\u9898\u7684\u68c0\u7d22\u673a\
u5236\u4ee5\u63d0\u9ad8\u8bb0\u5fc6\u68c0\u7d22\u7684\u51c6\u786e\u6027\u3002\u6b64\u5916\uff0c\u4eba\u683c\u6a21\u5757\u5b9e\u73b0\u4e86\u7528\u6237\u548c\u4ee3\u7406\u7684\u52a8\u6001\u4eba\u683c\u5efa\u6a21\u3002\u6700\u540e\uff0c\u901a\u8fc7\u6574\u5408\u68c0\u7d22\u7684\u8bb0\u5fc6\u548c\u63d0\u53d6\u7684\u4eba\u683c\uff0c\u751f\u6210\u5668\u4f1a\u4ea7\u751f\u9002\u5f53\u7684\u56de\u5e94\u3002\u6211\u4eec\u5728\u5404\u79cd\u793a\u4f8b\u57fa\u51c6\u3001\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u5b9e\u8bc1\u4e86LD-Agent\u7684\u6709\u6548\u6027\u3001\u901a\u7528\u6027\u548c\u8de8\u9886\u57df\u80fd\u529b\u3002\u4ee3\u7801\u5df2\u5728https://github.com/leolee99/LD-Agent\u4e0a\u53d1\u5e03\u3002**|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804](http://arxiv.org/abs/2406.05804)|**[link](https://github.com/xinzhel/llm-agent-survey)**|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u63a8\u52a8\u4e86\u590d\u6742\u4ee3\u7406\u5de5\u4f5c\u6d41\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u8def\u5f84\u3001\u94fe\u5f0f\u601d\u7ef4\uff08Chain-of-Thought\uff0cCoT\uff09\u63d0\u793a\u65b9\u6cd5\u6709\u6240\u6539\u8fdb\u3002\u8fd9\u7bc7\u7efc\u8ff0\u65e8\u5728\u6982\u8ff0\u5e38\u89c1\u7684\u5de5\u4f5c\u6d41\uff0c\u7279\u522b\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7279\u6027\u7684\u7ec4\u4ef6\uff08LLM-Profiled Components\uff0cLMPCs\uff09\uff0c\u5e76\u5f3a\u8c03\u5bf9\u975eLLM\u7ec4\u4ef6\u7684\u5ffd\u7565\u3002\u8fd9\u79cd\u7814\u7a76\u7684\u76ee\u7684\u662f\u4e3a\u4e86\u589e\u8fdb\u5bf9LLMs\u89d2\u8272\u7684\u7406\u89e3\uff0c\u5e76\u63a2\u7d22LMPC\u7684\u590d\u7528\u6f5c\u529b\u3002|\n", "2406.07275": "|**2024-06-11**|**DCA-Bench: A Benchmark for Dataset Curation Agents**|Benhao Huang 
et.al.|[2406.07275](http://arxiv.org/abs/2406.07275)|**[link](https://github.com/TRAIS-Lab/dca-bench)**|\u968f\u7740\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7814\u7a76\u548c\u5f00\u53d1\u7684\u63a8\u8fdb\uff0c\u6570\u636e\u96c6\u7684\u8d28\u91cf\u65e5\u76ca\u5173\u952e\u3002\u5c3d\u7ba1\u5f00\u653e\u6570\u636e\u96c6\u5e73\u53f0\u4f17\u591a\uff0c\u4f46\u6570\u636e\u8d28\u91cf\u95ee\u9898\uff0c\u5982\u7f3a\u4e4f\u6587\u6863\u3001\u6807\u6ce8\u9519\u8bef\u548c\u4f26\u7406\u8003\u91cf\uff0c\u4ecd\u666e\u904d\u5b58\u5728\u3002\u8fd9\u4e9b\u95ee\u9898\u5f80\u5f80\u96be\u4ee5\u901a\u8fc7\u89c4\u5219\u57fa\u7840\u811a\u672c\u68c0\u6d4b\uff0c\u9700\u8981\u7528\u6237\u6216\u7ef4\u62a4\u8005\u82b1\u8d39\u5927\u91cf\u4eba\u529b\u8fdb\u884c\u8bc6\u522b\u548c\u9a8c\u8bc1\u3002\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u6570\u636e\u96c6\u6574\u7406\u7684\u6f5c\u529b\u4ee4\u4eba\u671f\u5f85\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aDCA-Bench\u7684\u6570\u636e\u96c6\u7ba1\u7406\u4ee3\u7406\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLM\u5728\u68c0\u6d4b\u9690\u85cf\u6570\u636e\u8d28\u91cf\u95ee\u9898\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u4ece\u516b\u4e2a\u516c\u5f00\u6570\u636e\u96c6\u5e73\u53f0\u6536\u96c6\u4e86\u5404\u79cd\u5b9e\u9645\u95ee\u9898\u4f5c\u4e3a\u6d4b\u8bd5\u5e8a\u3002\u4e3a\u4e86\u5efa\u7acb\u4e00\u4e2a\u81ea\u52a8\u8bc4\u4f30LLM\u6210\u529f\u4e0e\u5426\u7684\u7ba1\u9053\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684LLM\u8bc4\u4f30\u5668\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u8bc4\u4f30\u5668\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u9ad8\u5ea6\u543b\u5408\uff0c\u80fd\u5b9e\u73b0\u53ef\u9760\u7684\u81ea\u52a8\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5728\u591a\u4e2a\u57fa\u7ebfLLM\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u663e\u793a\u4e86\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u610f\u5473\u7740\u5c06LLMs\u5e94\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u
96c6\u7ba1\u7406\u4ecd\u9700\u6df1\u5165\u63a2\u7d22\u548c\u521b\u65b0\u3002\u6b64\u5916\uff0c\u8be5\u57fa\u51c6\u4e5f\u53ef\u4f5c\u4e3a\u8861\u91cfLLMs\u5728\u95ee\u9898\u53d1\u73b0\u80fd\u529b\u800c\u975e\u4ec5\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u7684\u6d4b\u8bd5\u5e73\u53f0\u3002\u57fa\u51c6\u5957\u4ef6\u5df2\u5f00\u653e\u5728\uff1ahttps://github.com/TRAIS-Lab/dca-bench\u3002|\n", "2406.07217": "|**2024-06-11**|**A Synthetic Dataset for Personal Attribute Inference**|Hanna Yukhymenko et.al.|[2406.07217](http://arxiv.org/abs/2406.07217)|**[link](https://github.com/eth-sri/synthpai)**|**\u8fd1\u5e74\u6765\uff0c\u5f3a\u5927\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e3a\u5168\u7403\u6570\u4ebf\u7528\u6237\u6240\u63a5\u89e6\uff0c\u4f46\u5b83\u4eec\u7684\u5f3a\u5927\u529f\u80fd\u548c\u5e7f\u6cdb\u4e16\u754c\u77e5\u8bc6\u4e5f\u5e26\u6765\u4e86\u9690\u79c1\u98ce\u9669\u3002\u672c\u7814\u7a76\u5173\u6ce8LLMs\u65b0\u5174\u7684\u9690\u79c1\u5a01\u80c1\u2014\u2014\u4ece\u7f51\u7edc\u6587\u672c\u4e2d\u51c6\u786e\u63a8\u65ad\u4e2a\u4eba\u4fe1\u606f\u3002\u9274\u4e8e\u57fa\u4e8eLLM\u7684\u4f5c\u8005\u5206\u6790\u7814\u7a76\u7f3a\u4e4f\u5408\u9002\u7684\u516c\u5f00\u6570\u636e\u96c6\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u6d89\u53ca\u771f\u5b9e\u4e2a\u4eba\u6570\u636e\u7684\u4f26\u7406\u548c\u9690\u79c1\u987e\u8651\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5728\u4e24\u4e2a\u65b9\u9762\u8fdb\u884c\u4e86\u63a2\u7d22\uff1a\uff08i\uff09\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4f7f\u7528\u5408\u6210\u4e2a\u4eba\u8d44\u6599\u586b\u5145\u7684\u6d41\u884c\u793e\u4ea4\u5e73\u53f0Reddit\u7684\u6a21\u62df\u6846\u67b6\uff1b\uff08ii\uff09\u5229\u7528\u6b64\u6846\u67b6\uff0c\u6211\u4eec\u751f\u6210\u4e86SynthPAI\uff0c\u4e00\u4e2a\u5305\u542b\u8d85\u8fc77800\u6761\u7ecf\u8fc7\u624b\u52a8\u6807\u8bb0\u4e2a\u4eba\u5c5e\u6027\u7684\u591a\u6837\u5316\u7684\u5408\u6210\u8bc4\u8bba\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u4eba\u7c7b\u7814\u7a76\u9a8c\u8
bc1\u4e86\u6570\u636e\u96c6\uff0c\u7ed3\u679c\u663e\u793a\u4eba\u7c7b\u5728\u533a\u5206\u771f\u5b9e\u548c\u5408\u6210\u8bc4\u8bba\u7684\u4efb\u52a1\u4e0a\u51e0\u4e4e\u4e0d\u4f18\u4e8e\u968f\u673a\u731c\u6d4b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6570\u636e\u96c6\u652f\u6301\u6709\u610f\u4e49\u7684\u4e2a\u4eba\u5c5e\u6027\u63a8\u65ad\u7814\u7a76\uff0c\u901a\u8fc718\u79cd\u6700\u5148\u8fdb\u7684LLMs\uff0c\u6211\u4eec\u53d1\u73b0\u4f7f\u7528\u5408\u6210\u8bc4\u8bba\u53ef\u4ee5\u5f97\u51fa\u4e0e\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u76f8\u540c\u7684\u7ed3\u8bba\u3002\u7efc\u4e0a\u6240\u8ff0\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6d41\u7a0b\u4e3a\u672a\u6765\u7814\u7a76\u5982\u4f55\u7406\u89e3\u548c\u51cf\u8f7bLLMs\u5e26\u6765\u7684\u57fa\u4e8e\u63a8\u65ad\u7684\u9690\u79c1\u5a01\u80c1\u63d0\u4f9b\u4e86\u5f3a\u5927\u4e14\u9690\u79c1\u4fdd\u62a4\u7684\u57fa\u7840\u3002**|\n", "2406.07021": "|**2024-06-11**|**A Tool for Test Case Scenarios Generation Using Large Language Models**|Abdul Malik Sami 
et.al.|[2406.07021](http://arxiv.org/abs/2406.07021)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u6db5\u76d6\u4ee3\u7801\u751f\u6210\u3001\u8f6f\u4ef6\u8bbe\u8ba1\u548c\u6587\u6863\u7f16\u5199\u3001\u6dfb\u52a0\u4ee3\u7801\u6ce8\u91ca\u3001\u4ee3\u7801\u5ba1\u67e5\u4ee5\u53ca\u7f16\u5199\u6d4b\u8bd5\u811a\u672c\u7b49\u4efb\u52a1\u3002\u7136\u800c\uff0c\u521b\u5efa\u6d4b\u8bd5\u811a\u672c\u6216\u81ea\u52a8\u5316\u6d4b\u8bd5\u6848\u4f8b\u9700\u8981\u4e0e\u529f\u80fd\u9700\u6c42\u7d27\u5bc6\u76f8\u5173\u7684\u8be6\u5c3d\u6d4b\u8bd5\u5957\u4ef6\u6587\u6863\u3002\u8fd9\u79cd\u6587\u6863\u5e94\u80fd\u5728\u6709\u9650\u7684\u65f6\u95f4\u548c\u8303\u56f4\u5185\u5b9e\u73b0\u5168\u9762\u6d4b\u8bd5\uff0c\u5c24\u5176\u5f53\u9700\u6c42\u548c\u7528\u6237\u671f\u671b\u4e0d\u65ad\u53d8\u5316\u65f6\u3002\u672c\u6587\u4e3b\u8981\u5173\u6ce8\u6839\u636e\u7528\u6237\u9700\u6c42\u751f\u6210\u53f2\u8bd7\u7ea7\uff08epics\uff09\u548c\u9ad8\u5c42\u6b21\u7528\u6237\u6545\u4e8b\uff0c\u7136\u540e\u57fa\u4e8e\u8fd9\u4e9b\u6545\u4e8b\u8bbe\u8ba1\u6d4b\u8bd5\u573a\u666f\u3002\u6587\u7ae0\u4ecb\u7ecd\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u4ee3\u7406\u548c\u63d0\u793a\u5de5\u7a0b\u7684\u7f51\u7edc\u8f6f\u4ef6\u5de5\u5177\uff0c\u8be5\u5de5\u5177\u80fd\u591f\u81ea\u52a8\u5316\u9488\u5bf9\u7528\u6237\u9700\u6c42\u751f\u6210\u6d4b\u8bd5\u573a\u666f\u7684\u8fc7\u7a0b\u3002|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho 
et.al.|[2406.06947](http://arxiv.org/abs/2406.06947)|**[link](https://github.com/caap-agent/caap-agent)**|**\u957f\u671f\u4ee5\u6765\uff0c\u8f6f\u4ef6\u673a\u5668\u4eba\u5df2\u7ecf\u5728\u673a\u5668\u4eba\u6d41\u7a0b\u81ea\u52a8\u5316\uff08RPA\uff09\u4e2d\u7528\u4e8e\u6267\u884c\u67af\u71e5\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5148\u8fdb\u63a8\u7406\u80fd\u529b\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u4ee3\u7406\u73b0\u5728\u80fd\u591f\u5904\u7406\u66f4\u590d\u6742\u751a\u81f3\u524d\u6240\u672a\u89c1\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5f53\u524d\u6587\u732e\u4e2d\u7684\u57fa\u4e8eLLM\u7684\u81ea\u52a8\u5316\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8eHTML\u6e90\u4ee3\u7801\u4f5c\u4e3a\u8f93\u5165\uff0c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u975e\u7f51\u7edc\u73af\u5883\u7684\u5e94\u7528\u3002HTML\u4ee3\u7801\u4e2d\u7684\u4fe1\u606f\u5e38\u5e38\u4e0d\u51c6\u786e\u6216\u4e0d\u5b8c\u6574\uff0c\u8fd9\u964d\u4f4e\u4e86\u4ee3\u7406\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4ec5\u57fa\u4e8e\u5c4f\u5e55\u622a\u56fe\u7684LLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u5b83\u4e13\u6ce8\u4e8e\u8bc6\u522b\u73af\u5883\uff0c\u5e76\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u6765\u6d88\u9664\u5bf9\u5927\u91cf\u4eba\u7c7b\u6f14\u793a\u6570\u636e\u7684\u9700\u6c42\u3002\u6211\u4eec\u7684\u7b56\u7565\u540d\u4e3a\u201c\u4e0a\u4e0b\u6587\u611f\u77e5\u884c\u52a8\u89c4\u5212\u201d\uff08Context-Aware Action 
Planning\uff0cCAAP\uff09\u63d0\u793a\uff0c\u9f13\u52b1\u4ee3\u7406\u4ece\u591a\u4e2a\u89d2\u5ea6\u4ed4\u7ec6\u5ba1\u67e5\u4e0a\u4e0b\u6587\u3002\u901a\u8fc7\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u572867\u79cdMiniWoB++\u95ee\u9898\u4e0a\u5b9e\u73b0\u4e8694.4%\u7684\u6210\u529f\u7387\uff0c\u6bcf\u4e2a\u95ee\u9898\u7c7b\u578b\u53ea\u97001.48\u6b21\u6f14\u793a\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u7279\u522b\u662f\u5728\u9700\u8981\u5728\u8ba1\u7b97\u673a\u6216\u667a\u80fd\u624b\u673a\u4e4b\u95f4\u8fdb\u884c\u8de8\u5e94\u7528\u534f\u8c03\u7684\u4efb\u52a1\u4e0a\uff0c\u6807\u5fd7\u7740\u81ea\u52a8\u5316\u4ee3\u7406\u9886\u57df\u7684\u91cd\u5927\u8fdb\u6b65\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728https://github.com/caap-agent/caap-agent\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli et.al.|[2406.06613](http://arxiv.org/abs/2406.06613)|**[link](https://github.com/Joshuaclymer/GameBench)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5728\u8bb8\u591a\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u5c11\u91cf\u6837\u672c\u6027\u80fd\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5c55\u793a\u8fc7\u5728\u590d\u6742\u7b56\u7565\u573a\u666f\u4e2d\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f46\u7f3a\u4e4f\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5728\u6e38\u620f\u4e2d\u7684\u5404\u79cd\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86GameBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u8de8\u9886\u57df\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6218\u7565\u601d\u7ef4\u80fd\u529b\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e9\u4e2a\u4e0d\u540c\u7684\u6e38\u620f\u73af\u5883\uff0c\u6bcf\u4e2a\u6e38\u620f\u8
1f3\u5c11\u6db5\u76d6\u4e00\u79cd\u5728\u7b56\u7565\u6e38\u620f\u4e2d\u8bc6\u522b\u51fa\u7684\u5173\u952e\u63a8\u7406\u6280\u80fd\uff0c\u5e76\u9009\u62e9\u90a3\u4e9b\u6218\u7565\u89e3\u91ca\u4e0d\u592a\u53ef\u80fd\u6784\u6210\u6a21\u578b\u9884\u8bad\u7ec3\u6570\u636e\u4e3b\u8981\u90e8\u5206\u7684\u6e38\u620f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4f7f\u7528\u4e86\u57fa\u7840\u5f62\u5f0f\u7684GPT-3\u548cGPT-4\uff0c\u4ee5\u53ca\u4e24\u4e2a\u65e8\u5728\u589e\u5f3a\u6218\u7565\u63a8\u7406\u80fd\u529b\u7684\u5f15\u5bfc\u6846\u67b6\uff1aChain-of-Thought\uff08CoT\uff09\u63d0\u793a\u548cReasoning Via Planning\uff08RAP\uff09\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6240\u6709\u6d4b\u8bd5\u6a21\u578b\u7684\u8868\u73b0\u90fd\u6ca1\u6709\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\uff0c\u6700\u5dee\u7684\u662fGPT-4\u7684\u8868\u73b0\u751a\u81f3\u4f4e\u4e8e\u968f\u673a\u884c\u52a8\u3002CoT\u548cRAP\u90fd\u63d0\u9ad8\u4e86\u5206\u6570\uff0c\u4f46\u4ecd\u8fdc\u672a\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\u3002**|\n", "2406.08184": "|**2024-06-12**|**MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents**|Luyuan Wang 
et.al.|[2406.08184](http://arxiv.org/abs/2406.08184)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u624b\u673a\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4e0a\u7684\u76f4\u63a5\u4ea4\u4e92\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u4ee5\u53ca\u5b83\u4eec\u5728\u81ea\u4e3b\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u57fa\u4e8eLLMs\u7684\u79fb\u52a8\u4ee3\u7406\u6b63\u9010\u6e10\u53d7\u5230\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5e94\u7528\u7a0b\u5e8f\u7684\u65e0\u9650\u72b6\u6001\u548c\u53ef\u884c\u52a8\u4f5c\u5e8f\u5217\u7684\u6a21\u7cca\u5b9a\u4e49\uff0c\u5bf9\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u6027\u80fd\u7684\u57fa\u51c6\u7814\u7a76\u76f8\u5bf9\u532e\u4e4f\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u7528\u6237\u53cb\u597d\u7684\u57fa\u51c6\u5de5\u5177\u2014\u2014MobileAgentBench\uff0c\u65e8\u5728\u51cf\u8f7b\u7e41\u7410\u7684\u624b\u52a8\u6d4b\u8bd5\u8d1f\u62c5\u3002\u6211\u4eec\u9996\u5148\u5b9a\u4e49\u4e86\u6db5\u76d610\u4e2a\u5f00\u6e90\u5e94\u7528\u7684100\u9879\u4efb\u52a1\uff0c\u6309\u96be\u5ea6\u5206\u4e3a\u591a\u4e2a\u7ea7\u522b\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5305\u62ecAppAgent\u548cMobileAgent\u5728\u5185\u7684\u591a\u4e2a\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u5168\u9762\u7cfb\u7edf\u5730\u6bd4\u8f83\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6240\u6709\u76f8\u5173\u6750\u6599\u5747\u53ef\u5728\u6211\u4eec\u7684\u9879\u76ee\u7f51\u7ad9https://MobileAgentBench.github.io\u4e0a\u83b7\u53d6\uff0c\u8fd9\u5c06\u63a8\u52a8\u5b66\u672f\u548c\u5de5\u4e1a\u9886\u57df\u7684\u8fdb\u6b65\u3002|\n", "2406.07973": "|**2024-06-12**|**Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey**|Shang Wang 
et.al.|[2406.07973](http://arxiv.org/abs/2406.07973)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u8fd9\u4e9b\u6a21\u578b\u901a\u8fc7\u5927\u91cf\u6570\u636e\u8bad\u7ec3\uff0c\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u673a\u5668\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u7b49\u5404\u79cd\u5e94\u7528\u3002\u7136\u800c\uff0cLLMs\u5728\u5176\u751f\u547d\u5468\u671f\u4e2d\u66b4\u9732\u51fa\u4e00\u7cfb\u5217\u9690\u79c1\u548c\u5b89\u5168\u95ee\u9898\uff0c\u8fd9\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u8fd9\u4e9b\u95ee\u9898\u4e0e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\u5177\u6709\u72ec\u7279\u6027\uff0c\u9274\u4e8e\u5f53\u524d\u7684\u7efc\u8ff0\u7f3a\u4e4f\u9488\u5bf9\u4e0d\u540c\u573a\u666f\u7684\u6e05\u6670\u5a01\u80c1\u5206\u7c7b\uff0c\u6211\u4eec\u6839\u636e\u4e94\u4e2a\u573a\u666f\uff1a\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001RAG\u7cfb\u7edf\u3001\u90e8\u7f72\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5f3a\u8c03\u4e86\u72ec\u7279\u7684\u98ce\u9669\u3002\u8003\u8651\u5230\u6bcf\u79cd\u5a01\u80c1\u7684\u7279\u6027\uff0c\u672c\u8c03\u67e5\u63d0\u4f9b\u4e86\u6f5c\u5728\u5a01\u80c1\u548c\u5e94\u5bf9\u7b56\u7565\u3002\u7814\u7a76LLMs\u6240\u9762\u4e34\u7684\u653b\u51fb\u548c\u9632\u5fa1\u60c5\u51b5\uff0c\u53ef\u4ee5\u4e3a\u66f4\u591a\u9886\u57df\u63d0\u4f9b\u53ef\u884c\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4f7f\u66f4\u591a\u4eba\u80fd\u591f\u53d7\u76ca\u4e8eLLMs\u3002|\n", "2406.07914": "|**2024-06-14**|**Can Large Language Models Understand Spatial Audio?**|Changli Tang 
et.al.|[2406.07914](http://arxiv.org/abs/2406.07914)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u638c\u63e1\u591a\u901a\u9053\u97f3\u9891\u4e2d\u7684\u7a7a\u95f4\u4fe1\u606f\uff0c\u8fd9\u662f\u5f53\u524d\u542c\u89c9LLMs\u6240\u7f3a\u4e4f\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5229\u7528LLMs\u7684\u9ad8\u7ea7\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u97f3\u9891\u63d0\u5347\u6a21\u578b\u5bf9\u4e09\u7ef4\u73af\u5883\u7684\u7406\u89e3\u3002\u7814\u7a76\u6d89\u53ca\u4e09\u9879\u7a7a\u95f4\u97f3\u9891\u4efb\u52a1\uff1a\u58f0\u6e90\u5b9a\u4f4d\uff08SSL\uff09\u3001\u8fdc\u573a\u8bed\u97f3\u8bc6\u522b\uff08FSR\uff09\u548c\u57fa\u4e8e\u4f4d\u7f6e\u7684\u8bed\u97f3\u63d0\u53d6\uff08LSE\uff09\uff0c\u5728\u6bcf\u4e2a\u4efb\u52a1\u4e0a\u90fd\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u5728SSL\u65b9\u9762\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spatial LibriSpeech\u6570\u636e\u96c6\u4e0a\u7684MAE\u8fbe\u52302.70\u00b0\uff0c\u660e\u663e\u4f18\u4e8e\u5148\u524d\u7684\u57fa\u51c6\u7ea66.60\u00b0\u3002\u6b64\u5916\uff0c\u6a21\u578b\u80fd\u591f\u5229\u7528\u7a7a\u95f4\u7ebf\u7d22\u63d0\u9ad8FSR\u7684\u51c6\u786e\u6027\uff0c\u5e76\u901a\u8fc7\u6587\u672c\u63d0\u793a\uff0c\u6839\u636e\u6307\u5b9a\u65b9\u5411\u805a\u7126\u4e8e\u58f0\u97f3\uff0c\u5373\u4f7f\u5728\u91cd\u53e0\u8bed\u97f3\u73af\u5883\u4e2d\u4e5f\u80fd\u6267\u884cLSE\u3002\u8fd9\u4e9b\u6210\u679c\u63ed\u793a\u4e86LLMs\u9002\u5e94\u7269\u7406\u97f3\u9891\u6982\u5ff5\u7684\u6f5c\u529b\uff0c\u4e3a\u6784\u5efa\u57fa\u4e8eLLM\u7684\u4e09\u7ef4\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.09187": "|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang 
et.al.|[2406.09187](http://arxiv.org/abs/2406.09187)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0cLLM\u9a71\u52a8\u7684\u4ee3\u7406\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u5e94\u7528\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u5176\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u7684\u65b0\u62c5\u5fe7\u3002\u73b0\u6709\u7684\u63d0\u5347LLM\u5b89\u5168\u6027\u7684\u65b9\u6cd5\u5e76\u4e0d\u76f4\u63a5\u9002\u7528\u4e8eLLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u56e0\u4e3a\u5b83\u4eec\u5177\u6709\u4e0d\u540c\u7684\u76ee\u6807\u548c\u8f93\u51fa\u6a21\u5f0f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\u2014\u2014GuardAgent\uff0c\u5b83\u4f5c\u4e3a\u5176\u4ed6LLM\u4ee3\u7406\u7684\u201c\u9632\u62a4\u680f\u201d\u3002GuardAgent\u901a\u8fc7\u68c0\u67e5\u5176\u8f93\u5165/\u8f93\u51fa\u662f\u5426\u6ee1\u8db3\u7528\u6237\u5b9a\u4e49\u7684\u4e00\u7cfb\u5217\u5b88\u62a4\u8bf7\u6c42\u6765\u76d1\u7763\u76ee\u6807LLM\u3002GuardAgent\u5206\u4e3a\u4e24\u6b65\uff1a1\uff09\u5206\u6790\u63d0\u4f9b\u7684\u5b88\u62a4\u8bf7\u6c42\u521b\u5efa\u4efb\u52a1\u8ba1\u5212\uff1b2\uff09\u6839\u636e\u4efb\u52a1\u8ba1\u5212\u751f\u6210\u5b88\u62a4\u4ee3\u7801\uff0c\u5e76\u901a\u8fc7API\u8c03\u7528\u6216\u5916\u90e8\u5f15\u64ce\u6267\u884c\u3002\u6574\u4e2a\u8fc7\u7a0b\u5229\u7528LLM\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u7ec4\u4ef6\uff0c\u7ed3\u5408\u8bb0\u5fc6\u6a21\u5757\u4e2d\u7684\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u589e\u5f3a\u4e86\u77e5\u8bc6\u9a71\u52a8\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u5404\u79cd\u6587\u672c\u5b88\u62a4\u8bf7\u6c42\u5e76\u51c6\u786e\u5730\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u6267\u884c\u4ee3\u7801\uff0c\u63d0\u4f9b\u53ef\u9760\u7684\u5b89\u5168\u4fdd\u969c\u3002 
GuardAgent\u8fd8\u914d\u5907\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u5de5\u5177\u7bb1\uff0c\u5305\u542b\u51fd\u6570\u548cAPI\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3LLM\uff0c\u5f3a\u8c03\u4e86\u5176\u901a\u7528\u6027\u53ca\u4f4e\u8fd0\u8425\u6210\u672c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff1aEICU-AC\u7528\u4e8e\u8bc4\u4f30\u533b\u7597\u5065\u5eb7\u4ee3\u7406\u7684\u9690\u79c1\u76f8\u5173\u8bbf\u95ee\u63a7\u5236\uff0cMind2Web-SC\u7528\u4e8e\u8bc4\u4f30\u7f51\u7edc\u4ee3\u7406\u7684\u5b89\u5168\u6027\u3002\u5728\u8fd9\u4e9b\u57fa\u51c6\u4e0a\uff0cGuardAgent\u5206\u522b\u572898.7%\u548c90.0%\u7684\u7cbe\u5ea6\u4e0b\u6709\u6548\u7ba1\u7406\u4e86\u4e24\u79cd\u7c7b\u578b\u4ee3\u7406\u7684\u65e0\u6548\u8f93\u5165\u548c\u8f93\u51fa\u3002\u5b9e\u9a8c\u8fd8\u8868\u660e\uff0cGuardAgent\u80fd\u591f\u9002\u5e94\u65b0\u5174\u7684LLM\u4ee3\u7406\u548c\u5b88\u62a4\u8bf7\u6c42\uff0c\u5b9a\u4e49\u65b0\u7684\u529f\u80fd\uff0c\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2406.08979": "|**2024-06-13**|**Multi-Agent Software Development through Cross-Team Collaboration**|Zhuoyun Du et.al.|[2406.08979](http://arxiv.org/abs/2406.08979)|**[link](https://github.com/openbmb/chatdev)**|**
\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\uff0c\u5982ChatDev\uff0c\u63a8\u52a8\u4e86\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u6df1\u523b\u53d8\u9769\uff0c\u7279\u522b\u4f53\u73b0\u5728\u591a\u4ee3\u7406\u534f\u4f5c\u4e0a\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u50cf\u4eba\u7c7b\u56e2\u961f\u4e00\u6837\u5408\u4f5c\uff0c\u9075\u5faa\u7011\u5e03\u6a21\u578b\u8fdb\u884c\u9700\u6c42\u5206\u6790\u3001\u5f00\u53d1\u3001\u5ba1\u67e5\u3001\u6d4b\u8bd5\u7b49\u9636\u6bb5\uff0c\u5b9e\u73b0\u81ea\u4e3b\u8f6f\u4ef6\u751f\u6210\u3002\u7136\u800c\uff0c\u5355\u4e2a\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u6bcf\u4e2a\u9636\u6bb5\u53ea\u4f1a\u4ea7\u751f\u4e00\u79cd\u53ef\u80fd\u7ed3\u679c\uff0c\u5bfc\u81f4\u53ea\u5b8c\u6210\u4e00\u6761\u5f00\u53d1\u94fe\uff0c\u4ece\u800c\u4e27\u5931\u5728\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\u4e2d\u63a2\u7d22\u591a\u79cd\u51b3\u7b56\u8def\u5f84\u7684\u673a\u4f1a\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u8de8\u56e2\u961f\u534f\u4f5c\uff08Cross-Team Collaboration\uff0cCTC\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u591a\u56e2\u961f\u7ed3\u6784\uff0c\u5b83\u5141\u8bb8\u534f\u540c\u5de5\u4f5c\u7684\u56e2\u961f\u5728\u8de8\u56e2\u961f\u534f\u4f5c\u73af\u5883\u4e2d\u5171\u540c\u63d0\u51fa\u51b3\u7b56\uff0c\u5e76\u4ea4\u6d41\u5404\u81ea\u89c1\u89e3\uff0c\u4ee5\u4f18\u5316\u5185\u5bb9\u751f\u6210\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u57fa\u51c6\uff0c\u8bc1\u5b9e\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u5728\u6545\u4e8b\u751f\u6210\u65b9\u9762\u7684\u663e\u8457\u6539\u8fdb\u8868\u660e\uff0c\u8be5\u6846\u67b6\u5177\u6709\u5e7f\u6cdb\u7684\u8de8\u9886\u57df\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u671f\u5f85\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u5f15\u5bfcLLMs\u5411\u8de8\u56e2\u961f\u6a21\u5f0f\u53d1\u5c55\uff0c\u5e76\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u9886\u57df\u5e26\u6765\u91cd\u5927\u8fdb\u6b65\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.08747": "|**2024-06-13**|**StreamBench: Towards Benchmarking Continuous Improvement of Language Agents**|Cheng-Kuang Wu et.al.|[2406.08747](http://arxiv.org/abs/2406.08747)|**[link](https://github.com/stream-bench/stream-bench)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u4ece\u7ecf\u9a8c\u4e2d\u81ea\u6211\u63d0\u5347\uff0c\u8fd9\u662f\u90e8\u7f72\u540e\u6301\u7eed\u6539\u8fdb\u7684\u91cd\u8981\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u4e3b\u8981\u8bc4\u4f30\u5b83\u4eec\u7684\u56fa\u6709\u80fd\u529b\uff0c\u800c\u4e0d\u8003\u5bdf\u5b83\u4eec\u968f\u65f6\u95f4\u6539\u8fdb\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86StreamBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u521b\u6027\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u8f93\u5165-\u53cd\u9988\u5e8f\u5217\u4e0a\u7684\u8fde\u7eed\u6539\u8fdb\u6027\u80fd\u3002StreamBench\u6a21\u62df\u4e86\u4e00\u4e2a\u5728\u7ebf\u5b66\u4e60\u73af\u5883\uff0c\u5176\u4e2dLLMs\u63a5\u6536\u5230\u8fde\u7eed\u7684\u53cd\u9988\u6d41\uff0c\u5e76\u8fed\u4ee3\u5730\u63d0\u5347\u5176\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u7b80\u5355\u4f46\u6709\u6548\u7684LLM\u57fa\u7ebf\uff0c\u5e76\u5bf9\u5f71\u54cd\u6210\u529f\u6d41\u5f0f\u7b56\u7565\u7684\u5173\u952e\u7ec4\u4ef6\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u5f00\u53d1LLMs\u7684\u6709\u6548\u5728\u7ebf\u5b66\u4e60\u7b56\u7565\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u4e3a\u6d41\u5f0f\u573a\u666f\u4e2d\u7684\u66f4\u9002\u5e94\u6027AI\u7cfb\u7edf\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.11277": "|**2024-06-17**|**Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector**|Xiaoxue Cheng et.al.|[2406.11277](http://arxiv.org/abs/2406.11277)|**[link](https://github.com/rucaibox/haluagent)**|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7b\u89c9\u68c0\u6d4b\u65b9\u9762\u7684\u6311\u6218\uff0c\u7279\u522b\u6307\u51fa\u4ee5\u5f80\u7814\u7a76\u4e3b\u8981\u4f9d\u8d56\u4e8e\u5f3a\u5927\u7684\u95ed\u6e90\u6a21\u578b\u5982GPT-4\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u4e3b\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u79f0\u4e3aHaluAgent\uff0c\u5b83\u5141\u8bb8\u8f83\u5c0f\u7684\u6a21\u578b\uff08\u5982Baichuan2-Chat 
7B\uff09\u4e3b\u52a8\u9009\u62e9\u9002\u5408\u68c0\u6d4b\u6587\u672c\u3001\u4ee3\u7801\u548c\u6570\u5b66\u8868\u8fbe\u5f0f\u7b49\u591a\u79cd\u5e7b\u89c9\u7c7b\u578b\u7684\u5de5\u5177\u3002HaluAgent\u6574\u5408\u4e86LLM\u3001\u591a\u529f\u80fd\u5de5\u5177\u7bb1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7ec6\u7c92\u5ea6\u7684\u4e09\u9636\u6bb5\u68c0\u6d4b\u6846\u67b6\uff0c\u540c\u65f6\u914d\u5907\u4e86\u8bb0\u5fc6\u673a\u5236\u3002\u4e3a\u4e86\u63d0\u9ad8HaluAgent\u7684\u6548\u80fd\uff0c\u8bba\u6587\u5229\u7528\u73b0\u6709\u7684\u4e2d\u6587\u548c\u82f1\u6587\u6570\u636e\u96c6\u5408\u6210\u68c0\u6d4b\u8f68\u8ff9\u8fdb\u884c\u5fae\u8c03\uff0c\u4f7f\u5176\u5177\u5907\u53cc\u8bed\u5e7b\u89c9\u68c0\u6d4b\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4ec5\u4f7f\u75282000\u4e2a\u6837\u672c\u5bf9LLM\u8fdb\u884c\u8c03\u4f18\u540e\uff0cHaluAgent\u5728\u5404\u79cd\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5176\u6027\u80fd\u53ef\u4e0eGPT-4\u5ab2\u7f8e\uff0c\u751a\u81f3\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8d85\u8d8a\uff0c\u4e14\u65e0\u9700\u989d\u5916\u5de5\u5177\u589e\u5f3a\uff0c\u65e0\u8bba\u5728\u9886\u57df\u5185\u8fd8\u662f\u9886\u57df\u5916\u7684\u6570\u636e\u96c6\u4e0a\u90fd\u5c55\u73b0\u51fa\u826f\u597d\u6027\u80fd\u3002\u8bba\u6587\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53d1\u5e03\u5728https://github.com/RUCAIBox/HaluAgent\u3002|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu 
et.al.|[2406.11200](http://arxiv.org/abs/2406.11200)|**[link](https://github.com/zou-group/avatar)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5229\u7528\u5916\u90e8\u5de5\u5177\u548c\u77e5\u8bc6\u63d0\u5347\u51c6\u786e\u6027\u548c\u51cf\u5c11\u9519\u8bef\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bbe\u8ba1\u80fd\u8ba9LLMs\u6709\u6548\u8fd0\u7528\u8fd9\u4e9b\u5de5\u5177\u7684\u63d0\u793a\u6280\u5de7\u662f\u4e00\u9879\u8017\u65f6\u4e14\u4f9d\u8d56\u76f4\u89c9\u7684\u4efb\u52a1\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faAvaTaR\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u5b83\u80fd\u4f18\u5316LLMs\uff0c\u4f7f\u5176\u66f4\u6709\u6548\u5730\u5229\u7528\u63d0\u4f9b\u7684\u5de5\u5177\uff0c\u5e76\u5728\u7279\u5b9a\u4efb\u52a1\u6216\u9886\u57df\u4e2d\u63d0\u5347\u6027\u80fd\u3002AvaTaR\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u6bd4\u8f83\u5668\u6a21\u5757\uff0c\u4ee5\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u6b63\u8d1f\u6837\u672c\u8fdb\u884c\u63a8\u7406\uff0c\u8fed\u4ee3\u5730\u4e3aLLM\u63d0\u4f9b\u5bcc\u6709\u6d1e\u5bdf\u529b\u548c\u5168\u9762\u7684\u63d0\u793a\u3002\u6211\u4eec\u5728\u56db\u4e2a\u5305\u542b\u6587\u672c\u3001\u89c6\u89c9\u548c\u5173\u7cfb\u4fe1\u606f\u7684\u590d\u6742\u591a\u6a21\u6001\u68c0\u7d22\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86AvaTaR\u7684\u6548\u679c\u3002\u5b9e\u9a8c\u8868\u660e\uff0cAvaTaR\u5728\u6240\u6709\u56db\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u4e2d\u5747\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5f53\u5e94\u7528\u4e8e\u65b0\u6848\u4f8b\u65f6\uff0c\u5e73\u5747\u5728Hit@1\u6307\u6807\u4e0a\u5b9e\u73b0\u4e8614%\u7684\u76f8\u5bf9\u6539\u8fdb\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u4e0a\u516c\u5f00\u3002**|\n", "2406.11176": "|**2024-06-17**|**Watch Every Step! 
LLM Agent Learning via Iterative Step-Level Process Refinement**|Weimin Xiong et.al.|[2406.11176](http://arxiv.org/abs/2406.11176)|**[link](https://github.com/weiminxiong/ipr)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7cfb\u5217\u590d\u6742\u7684\u4ea4\u4e92\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u503e\u5411\u4e8e\u901a\u8fc7\u4e13\u5bb6\u8f68\u8ff9\u8c03\u4f18\u6765\u63d0\u5347\u6a21\u578b\u6548\u679c\uff0c\u4f46\u4e3b\u8981\u5173\u6ce8\u6700\u7ec8\u7ed3\u679c\u5956\u52b1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u6216\u975e\u6700\u4f18\u884c\u4e3a\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u8fc7\u7a0b\u76d1\u7763\u4fe1\u53f7\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u8fed\u4ee3\u6b65\u7ea7\u8fc7\u7a0b\u6539\u8fdb\uff08Iterative Step-level Process Refinement\uff0cIPR\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u63d0\u4f9b\u4e86\u7ec6\u81f4\u7684\u9010\u6b65\u9aa4\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u91c7\u7528\u8499\u7279\u5361\u6d1b\u65b9\u6cd5\u4f30\u7b97\u6bcf\u4e00\u6b65\u7684\u5956\u52b1\u3002\u5728\u6bcf\u4e2a\u8fed\u4ee3\u4e2d\uff0c\u6a21\u578b\u6cbf\u7740\u4e13\u5bb6\u8f68\u8ff9\u63a2\u7d22\u5e76\u751f\u6210\u65b0\u52a8\u4f5c\uff0c\u7136\u540e\u4e0e\u4e13\u5bb6\u8f68\u8ff9\u7684\u76f8\u5e94\u6b65\u9aa4\u8fdb\u884c\u6bd4\u8f83\uff0c\u4f7f\u7528\u6b65\u7ea7\u5956\u52b1\u8bc4\u4f30\u3002\u8fd9\u79cd\u6bd4\u8f83\u6709\u52a9\u4e8e\u8bc6\u522b\u5dee\u5f02\uff0c\u5f62\u6210\u7528\u4e8e\u8bad\u7ec3\u7684\u5bf9\u6bd4\u52a8\u4f5c\u5bf9\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u590d\u6742\u4ee3\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4f18\u4e8e\u591a\u79cd\u5f3a\u5927\u7684\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u7ed3\u679c\u63ed\u793a\u4e86IPR\u5728\u63d0\u5347\u52a8\u4f5c\u6548\u7387\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc1\u660e\u5176\u9002\u7528\u4e8
e\u5404\u79cd\u6a21\u578b\u3002**|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132](http://arxiv.org/abs/2406.11132)|null|\u5728\u8fc7\u53bb\u7684\u4e00\u5e74\u91cc\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f20\u7edf\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u4e4b\u5916\u5c55\u73b0\u51fa\u60ca\u4eba\u6210\u5c31\uff0c\u4eba\u4eec\u5f00\u59cb\u63a2\u7d22\u5728\u4ee3\u7801\u751f\u6210\u3001\u65c5\u884c\u89c4\u5212\u548c\u673a\u5668\u4eba\u63a7\u5236\u7b49\u66f4\u5177\u4f53\u7684\u5e94\u7528\u9886\u57df\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u3002\u901a\u8fc7\u4e0eLLM\u6784\u5efa\u6240\u8c13\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\u534f\u52a9\u4eba\u4eec\u5b8c\u6210\u65e5\u5e38\u751f\u6d3b\u4e2d\u7684\u5404\u79cd\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5bf9LLMs\u7684\u63d0\u793a\u8bed\u53e5\u5bf9\u751f\u6210\u5185\u5bb9\u53ca\u5176\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u56e0\u6b64\uff0c\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u6210\u4e3a\u8bb8\u591a\u7814\u7a76\u4eba\u5458\u548cLLM\u7528\u6237\u5173\u6ce8\u7684\u7126\u70b9\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u540d\u4e3a\\textsc{RePrompt}\uff0c\u5b83\u5229\u7528\u4e0eLLM\u4ee3\u7406\u4ea4\u4e92\u83b7\u53d6\u7684\u5bf9\u8bdd\u5386\u53f2\uff0c\u901a\u8fc7\u201c\u68af\u5ea6\u4e0b\u964d\u201d\u4f18\u5316LLM\u7684\u9010\u6b65\u6307\u4ee4\u3002\u901a\u8fc7\u4f18\u5316\u63d0\u793a\uff0cLLM\u80fd\u591f\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u89c4\u5212\u7b56\u7565\u3002\u6211\u4eec\u5728PDDL\u751f\u6210\u548c\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u4e2d\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528\u66f4\u65b0\u540e\u7684\u63d0\u793a\u4f5c\u4e3a\u521d\u59cb\u63d0\u793a\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u5e38\u53ef\u4ee5\u63d0\u9ad8\u4e0d\u540c\u63a8\u7406\u4efb\u52a1\u7684\u6027\u80fd\u3002|\n", "2406.10918": 
"|**2024-06-18**|**Embodied Question Answering via Multi-LLM Systems**|Bhrij Patel et.al.|[2406.10918](http://arxiv.org/abs/2406.10918)|null|## \u80cc\u666f Embodied Question Answering\uff08EQA\uff09\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5b83\u6d89\u53ca\u4e00\u4e2a\u4ee3\u7406\u5728\u73af\u5883\u4e2d\u63a2\u7d22\u4ee5\u56de\u7b54\u7528\u6237\u67e5\u8be2\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5355\u4ee3\u7406\u573a\u666f\u4e2d\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u63a2\u7d22\u65f6\u95f4\u5197\u957f\u4e14\u6210\u672c\u9ad8\u6602\u3002\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u591a\u4ee3\u7406\u6846\u67b6\u4e0b\u7684EQA\uff0c\u5176\u4e2d\u6d89\u53ca\u591a\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u72ec\u7acb\u4ee3\u7406\uff0c\u5b83\u4eec\u5404\u81ea\u89e3\u7b54\u5173\u4e8e\u5bb6\u5ead\u73af\u5883\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u4e3a\u6bcf\u4e2a\u67e5\u8be2\u751f\u6210\u4e00\u4e2a\u7b54\u6848\uff0c\u6211\u4eec\u5229\u7528\u5404\u4e2a\u72ec\u7acb\u54cd\u5e94\u6765\u8bad\u7ec3\u4e00\u4e2a\u4e2d\u592e\u7b54\u6848\u6a21\u578b\uff08CAM\uff09\uff0c\u8be5\u6a21\u578b\u6574\u5408\u7b54\u6848\u4ee5\u5b9e\u73b0\u66f4\u7a33\u5065\u7684\u56de\u7b54\u3002\u901a\u8fc7\u4f7f\u7528CAM\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5176\u5728EQA\u51c6\u786e\u7387\u4e0a\u6bd4\u8bf8\u5982\u6295\u7968\u673a\u5236\u548c\u8fa9\u8bba\u7b49ensemble 
LLM\u805a\u5408\u65b9\u6cd5\u9ad8\u51fa50%\u3002CAM\u65e0\u9700\u4efb\u4f55\u5f62\u5f0f\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\uff0c\u4ece\u800c\u907f\u514d\u4e86\u76f8\u5173\u5f00\u9500\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u4e0d\u540c\u7684\u975e\u7ebf\u6027\uff08\u5982\u795e\u7ecf\u7f51\u7edc\u3001\u968f\u673a\u68ee\u6797\u3001\u51b3\u7b56\u6811\u3001XGBoost\uff09\u548c\u7ebf\u6027\u7b97\u6cd5\uff08\u5982\u903b\u8f91\u56de\u5f52\u5206\u7c7b\u5668\u3001\u652f\u6301\u5411\u91cf\u673a\uff09\u5bf9CAM\u8fdb\u884c\u4e86\u6d88\u878d\u7814\u7a76\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7Permutation Feature Importance\uff08PFI\uff09\u5206\u6790\u4e86CAM\u5bf9\u6bcf\u4e2a\u72ec\u7acb\u4ee3\u7406\u548c\u67e5\u8be2\u4e0a\u4e0b\u6587\u7684\u4f9d\u8d56\u7a0b\u5ea6\uff0c\u91cf\u5316\u4e86CAM\u7684\u4f9d\u8d56\u7279\u6027\u3002|\n", "2406.10819": "|**2024-06-16**|**GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents**|Dongping Chen et.al.|[2406.10819](http://arxiv.org/abs/2406.10819)|**[link](https://github.com/keplerlab/katna)**|**\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5df2\u88ab\u7528\u4e8e\u63a7\u5236\u952e\u76d8\u548c\u9f20\u6807\u8f93\u5165\uff0c\u76f4\u63a5\u611f\u77e5\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\uff0c\u5e76\u751f\u6210\u76f8\u5e94\u7684\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6a21\u578b\u4e3b\u8981\u5728\u9759\u6001\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4e3b\u8981\u5e94\u7528\u4e8e\u76f8\u5bf9\u7b80\u5355\u7684\u9886\u57df\uff0c\u5982\u7f51\u9875\u6216\u79fb\u52a8\u754c\u9762\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4e00\u4e2a\u7a33\u5065\u7684GUI\u4ee3\u7406\u5e94\u5177\u5907\u7406\u89e3GUI\u7684\u65f6\u7a7a\u4fe1\u606f\u80fd\u529b\uff0c\u5305\u62ec\u52a8\u6001\u7f51\u9875\u5185\u5bb9\u548c\u591a\u6b65\u9aa4\u4efb\u52a1\uff0c\u8fd8\u8981\u5168\u9762\u7406\u89e3\u5404\u79cdGUI\u573a\u666f\uff0c\u5305\u62ec\u684c\u9762\u8f6f\u4ef6\u548c\u591a\u7a97\u53e3\u4ea
4\u4e92\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u6570\u636e\u96c6\u2014\u2014GUI-World\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u7cbe\u5fc3\u5236\u4f5c\u7684\u4eba\u673a\u6807\u6ce8\uff0c\u5e7f\u6cdb\u6db5\u76d6\u516d\u79cdGUI\u573a\u666f\u548c\u516b\u7c7bGUI\u76f8\u5173\u95ee\u9898\uff0c\u4ee5\u4e09\u79cd\u683c\u5f0f\u5448\u73b0\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684MLLM\uff0c\u5982\u56fe\u50cfLLMs\u548c\u89c6\u9891LLMs\uff0c\u5728\u7406\u89e3\u548c\u5904\u7406\u4e0d\u540c\u7c7b\u578bGUI\u5185\u5bb9\uff0c\u7279\u522b\u662f\u52a8\u6001\u548c\u5e8f\u5217\u5185\u5bb9\u65b9\u9762\u7684\u80fd\u529b\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u56fe\u50cfLLMs\u5728\u6ca1\u6709\u624b\u52a8\u6807\u6ce8\u5173\u952e\u5e27\u6216\u64cd\u4f5c\u5386\u53f2\u7684\u60c5\u51b5\u4e0b\uff0c\u96be\u4ee5\u5e94\u5bf9\u52a8\u6001GUI\u5185\u5bb9\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u7531\u4e8eGUI\u89c6\u9891\u6570\u636e\u96c6\u7684\u7a00\u758f\u6027\uff0c\u89c6\u9891LLMs\u5728\u6240\u6709GUI\u76f8\u5173\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u57fa\u4e8eGUI-World\uff0c\u6211\u4eec\u9996\u6b21\u5c1d\u8bd5\u4f7f\u7528\u5fae\u8c03\u540e\u7684\u89c6\u9891LLM\u4f5c\u4e3aGUI\u4ee3\u7406\uff0c\u663e\u793a\u4e86\u5bf9\u5404\u79cdGUI\u4efb\u52a1\u7406\u89e3\u7684\u63d0\u5347\u3002\u7136\u800c\uff0c\u7531\u4e8e\u57fa\u7840LLM\u6027\u80fd\u7684\u9650\u5236\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c06\u89c6\u9891LLMs\u7528\u4f5cGUI\u4ee3\u7406\u4ecd\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u672a\u6765\u5728\u52a8\u6001GUI\u5185\u5bb9\u7406\u89e3\u65b9\u9762\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u6d1e\u89c1\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u6211\u4eec\u7684\u9879\u76ee\u4e3b\u9875https://gui-world.github.io/\u4e0a\u516c\u5f00\u3002**|\n", "2406.10803": "|**2024-06-16**|**HiddenTables & PyQTax: A Cooperative Game and 
Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies**|William Watson et.al.|[2406.10803](http://arxiv.org/abs/2406.10803)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u65f6\u9762\u4e34\u8bf8\u591a\u6311\u6218\uff0c\u4e3b\u8981\u5305\u62ec\uff1a\uff081\uff09\u5bf9\u4e8e\u5927\u8868\u683c\u6709\u9650\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\uff1b\uff082\uff09\u4e0d\u540ctoken\u5316\u6a21\u5f0f\u4e0e\u5355\u5143\u683c\u8fb9\u754c\u7684\u590d\u6742\u5dee\u5f02\uff1b\uff083\uff09\u4ee5\u53ca\u4f7f\u7528\u5916\u90e8\u6a21\u578b\u5982gpt-3.5-turbo\u65f6\u7684\u6570\u636e\u4fdd\u5bc6\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cHiddenTables\u201d\u7684\u5408\u4f5c\u6e38\u620f\u3002\u8fd9\u4e2a\u6e38\u620f\u6d89\u53ca\u4ee3\u7801\u751f\u6210LLM\u201cSolver\u201d\u548c\u8bc4\u4f30\u5176\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u80fd\u529b\u7684\u201cOracle\u201d\uff0c\u4ee5\u81ea\u7136\u8bed\u8a00\u89c4\u8303\u4e3a\u57fa\u7840\uff0c\u540c\u65f6\u4fdd\u8bc1\u6570\u636e\u5b89\u5168\u3002 \u6211\u4eec\u901a\u8fc7\u5b9e\u8bc1\u5b9e\u9a8c\u5728\u591a\u6837\u5316\u7684\u8868\u683c\u4e0a\u5c55\u793a\u4e86LLMs\u5728\u5904\u7406\u590d\u6742\u67e5\u8be2\u3001\u5904\u7406\u7ec4\u5408\u4f9d\u8d56\u4ee5\u53ca\u5c06\u81ea\u7136\u8bed\u8a00\u8f6c\u5316\u4e3a\u7a0b\u5e8f\u6307\u4ee4\u65b9\u9762\u7684\u5c40\u9650\u6027\uff0c\u7279\u522b\u662f\u5728\u63d0\u4f9b\u5177\u4f53\u8868\u683c\u7ed3\u6784\u7684\u60c5\u51b5\u4e0b\u3002\u4e0e\u57fa\u4e8e\u7f16\u7801\u5668\u7684\u6a21\u578b\u4e0d\u540c\uff0c\u201cHiddenTables\u201d\u4e0d\u53d7\u884c\u6570\u9650\u5236\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u5b8c\u6210 token 
\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u201cPyQTax\u201d\uff0c\u5305\u542b116,671\u4e2a\u95ee\u9898-\u8868\u683c-\u7b54\u6848\u4e09\u5143\u7ec4\uff0c\u5e76\u63d0\u4f9b\u4e86\u66f4\u7ec6\u81f4\u7684\u95ee\u9898\u5206\u7c7b\u548c\u6807\u7b7e\uff0c\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u6211\u4eec\u7684\u7814\u7a76\u3002 \u56e0\u6b64\uff0c\u9664\u4e86\u5b66\u672f\u8d21\u732e\uff0c\u63ed\u793a\u4e86LLMs\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684\u4e0d\u8db3\uff0c\u201cHiddenTables\u201d\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u5728\u4fdd\u969c\u6570\u636e\u5b89\u5168\u7684\u540c\u65f6\uff0c\u8ba9LLMs\u4e0e\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e92\u52a8\uff0c\u4ee5\u53ca\u964d\u4f4e\u751f\u6210\u6210\u672c\u7684\u5b9e\u8df5\u65b9\u6cd5\u3002|\n", "2406.10478": "|**2024-06-15**|**From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent**|Samuel S. 
Sohn et.al.|[2406.10478](http://arxiv.org/abs/2406.10478)|null|Digital storytelling, which is essential in entertainment, education, and marketing, faces challenges in scaling production and increasing flexibility. This paper introduces the StoryAgent framework, which uses large language models and generative tools to automate and refine the digital story creation process. It combines top-down story drafting with bottom-up asset generation to address key issues such as manual intervention, interactive scene orchestration, and narrative consistency. The framework enables efficient production of interactive and consistent narratives across multiple media, helping to democratize content creation and increase user engagement. Our experimental results show that the framework can generate coherent digital stories without reference videos, a significant advance in automated digital storytelling.|\n", "2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang
et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|Configuration settings are essential for tailoring software behavior to meet specific performance requirements, but misconfigurations are widespread. Because configuration options are numerous and complex, identifying those that affect system performance is challenging. This study proposes PerfSense, a lightweight framework that uses large language models (LLMs) to efficiently identify performance-sensitive configurations with low overhead. PerfSense uses LLM agents to simulate the interaction between developers and performance engineers, employing advanced prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). Our evaluation on seven open-source Java systems shows that PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both an LLM-based baseline (50.36%) and the previous state-of-the-art method (61.75%). In particular, our prompt chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassified cases finds that common issues include LLMs misunderstanding the requirements (26.8%).
In summary, PerfSense significantly reduces the manual effort of classifying performance-sensitive configurations and offers valuable insights for future LLM-based code analysis research.|\n", "2406.12708": "|**2024-06-18**|**AgentReview: Exploring Peer Review Dynamics with LLM Agents**|Yiqiao Jin et.al.|[2406.12708](http://arxiv.org/abs/2406.12708)|**[link](https://github.com/ahren09/agentreview)**|Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of analyzing peer review data tend to focus on exploring and summarizing existing data; they do not adequately account for the multivariate nature of the process or for latent variables, and they are further constrained by privacy concerns because the data is sensitive. We propose AgentReview, a large language model (LLM) based peer review simulation framework that effectively disentangles the effects of multiple latent factors and addresses the privacy issue. The study finds a notable 37.1% variation in paper decisions, a result supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study can offer valuable insights for improving the design of peer review mechanisms.|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control
Design in Power Electronics**|Chenggang Cui et.al.|[2406.12628](http://arxiv.org/abs/2406.12628)|null|This paper addresses challenges in control design for power electronics systems, in particular model uncertainty and long, costly design cycles. It proposes a multi-agent framework based on large language models (LLMs) for objective-oriented design of power electronics controllers. The framework combines the reasoning capabilities of LLMs with a multi-agent workflow to build an efficient and automated controller design process. The LLM agents can understand and respond to high-level instructions in natural language, adapting their behavior to the specific requirements of the task and the constraints of the practical application. This novel and efficient approach is expected to significantly improve the flexibility and adaptability of power electronics controller design and greatly ease the work of practitioners.|\n", "2406.12276": "|**2024-06-18**|**CodeNav: Beyond tool-use to using real-world codebases with LLM agents**|Tanmay Gupta
et.al.|[2406.12276](http://arxiv.org/abs/2406.12276)|null|We present CodeNav, a system that uses a large language model (LLM) to navigate and leverage previously unseen code repositories to solve user queries. Unlike tool-use LLM agents, which require all relevant tools to be 'registered' in the LLM context through manual descriptions, CodeNav automatically indexes and searches code blocks in the target codebase, finds relevant code snippets, imports them, and iteratively generates a solution using execution feedback. First, we present three case studies showing how CodeNav uses three different codebases to solve complex user queries. Next, on three benchmarks, we quantitatively compare code-use, which has access only to the target codebase, with tool-use, which has privileged access to all tool names and descriptions. In addition, we study the effect of different kinds of tool and library descriptions on code-use performance, as well as the advantage of giving the agent source code as input rather than natural language descriptions of the code. All code will be open-sourced under a permissive license.|\n", "2406.12125": "|**2024-06-17**|**Efficient Sequential Decision Making with Large Language
Models**|Dingyang Chen et.al.|[2406.12125](http://arxiv.org/abs/2406.12125)|null|This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either retrain or fine-tune LLMs for decision making, or design prompts for pretrained LLMs. The former suffers from the computational burden of gradient updates, while the latter has not shown clear benefits. We therefore propose a new approach that uses online model selection algorithms to efficiently incorporate LLMs into sequential decision making. Statistically, our approach significantly outperforms both traditional decision-making algorithms and vanilla LLM agents. Computationally, it avoids expensive gradient updates of LLMs and requires only a small number of LLM calls throughout the decision-making process. We conduct extensive experiments to verify the effectiveness of the approach. For example, on a large-scale Amazon dataset, our approach achieves more than a 6x performance gain over baselines while calling LLMs in only 1.5% of the time steps.|\n", "2406.14373": "|**2024-07-01**|**Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory**|Gordon Dai
et.al.|[2406.14373](http://arxiv.org/abs/2406.14373)|null|Advances in large language models (LLMs) and artificial intelligence create opportunities for computational social science research at scale. Building on prior work on LLM agent design, we construct a simulated agent society in which complex social relationships form and evolve dynamically over time. The agents are given psychological drives and placed in a sandbox survival environment. We evaluate the agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). The experiments show that the agents initially engage in unrestrained conflict, consistent with Hobbes's description of the state of nature. As the simulation progresses, however, social contracts gradually emerge, an absolute sovereign is authorized, and a peaceful commonwealth founded on mutual cooperation is established. Our findings agree with Hobbes's theory: LLM-driven multi-agent simulation exhibits the complexity of social dynamics and may replicate the forces that shape human societies. Although such simulations cannot capture every nuance of human behavior, they are potentially valuable for understanding social structures, group dynamics, and complex human systems.|\n", "2406.14228": "|**2024-06-20**|**EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms**|Siyu Yuan
et.al.|[2406.14228](http://arxiv.org/abs/2406.14228)|**[link](https://github.com/siyuyuan/evoagent)**|**With the rise of powerful large language models (LLMs), a new trend is to use them to build autonomous agents, and especially multi-agent systems, that can solve complex tasks. However, existing work relies heavily on human-designed frameworks, which limits the functional scope and scalability of agent systems. How to automatically extend a specialized agent into a multi-agent system to improve task-solving capability remains a significant challenge. This paper proposes EvoAgent, a method that automatically extends expert agents into multi-agent systems via evolutionary algorithms, with the aim of improving the effectiveness of LLM-based agents in solving tasks. Specifically, we treat existing agent frameworks as the initial individuals and apply a series of evolutionary operators (such as mutation, crossover, and selection) to generate agents with diverse settings. EvoAgent applies to any LLM-based agent framework and can automatically extend it into a multi-agent system without additional human design. Experimental results show that EvoAgent can automatically generate multiple expert agents and significantly enhances the task-solving capabilities of LLM-based agents.**|\n", "2406.13352": "|**2024-06-19**|**AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents**|Edoardo Debenedetti
et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|**[link](https://github.com/ethz-spylab/agentdojo)**|**This paper introduces AgentDojo, a framework for evaluating the adversarial robustness of AI agents that rely on external tools to process untrusted data. Because attacks and defenses keep evolving, AgentDojo is not a static test suite but an extensible environment for designing and evaluating new tasks, defenses, and adaptive attacks. It contains 97 realistic tasks (such as managing an email client, navigating an online banking website, or booking travel), 629 security test cases, and a range of attack and defense methods from the literature. We find that current state-of-the-art language models fall short in AgentDojo (even in the absence of attacks), and that existing prompt injection attacks break some security properties but not all. We hope AgentDojo will foster research on new design principles for AI agents that solve common tasks both reliably and robustly. The code is released at https://github.com/ethz-spylab/agentdojo.**|\n", "2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia
et.al.|[2406.13163](http://arxiv.org/abs/2406.13163)|null|Discovering new materials is of great scientific and technological importance, but it remains a hard problem because the chemical space is vast. Recent advances in machine learning have enabled data-driven methods that rapidly screen or generate promising materials, yet these methods still depend on large amounts of training data and often lack the flexibility and chemical intuition that humans expect in materials design. We propose LLMatDesign, a new framework for interpretable materials design driven by large language models. LLMatDesign uses LLM agents to interpret human instructions, apply modifications to materials, and evaluate the results with provided tools. By reflecting on its previous decisions, LLMatDesign adapts quickly to new tasks and conditions in a zero-shot manner. A systematic evaluation of LLMatDesign on several materials design tasks, carried out in offline experiments, confirms its effectiveness at developing new materials with user-defined target properties in the small-data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in a computational setting and points toward future self-driving laboratories.|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**Recent advances in machine learning have significantly improved the ability to identify disease-associated genes from gene expression data. However, these processes often require deep expertise and extensive manual effort, which limits their scalability. Agents driven by large language models (LLMs) show promise for automating such tasks thanks to their growing problem-solving abilities. To support the evaluation and development of such methods, we created GenoTEX, a benchmark for the automated exploration of gene expression data that covers dataset selection, preprocessing, and statistical analysis tasks. GenoTEX provides comprehensive analysis pipelines annotated by human bioinformaticians, who analyze the datasets in depth to ensure accuracy and reliability. To provide baselines for these tasks, we designed GenoAgents, a team of LLM-based agents capable of context-aware planning, iterative correction, and consultation with domain experts, which collaboratively explore gene datasets. Our experiments show the potential of LLM-driven approaches in genomic data analysis, while error analysis highlights challenges and directions for future improvement. We propose GenoTEX as a promising resource for benchmarking and improving AI-driven methods for genomic data analysis. The benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.**|\n", "2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu
et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**Large language model multi-agent systems (LLM-MAS) have made significant progress in solving complex tasks. They complete tasks through communication and collaboration among the agents in the system, under the premise that information is shared. However, when agent communication is used to enhance human cooperation, information asymmetry introduces a new challenge, because each agent can access only the information of its own human user. Traditional MAS struggle to complete tasks under this condition. To address the problem, we propose a new multi-agent system architecture called iAgents, short for Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, and agents proactively exchange the human information needed to complete a task, thereby overcoming information asymmetry. iAgents uses a novel agent reasoning mechanism, InfoNav, to guide agents toward effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory so that agents have accurate and comprehensive information to exchange. We also introduce InformativeBench, the first benchmark designed to evaluate the task-solving ability of LLM agents under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate for more than 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.**|\n", "2406.14884": "|**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao
et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|Agents based on language models are a promising tool designed to carry out complex tasks through iterative planning and action. However, these agents are prone to undesired planning hallucinations when handling tasks that require specialized knowledge. To address this, early attempts have tried to make planning more reliable by incorporating external workflow-related knowledge. Although promising, the injected knowledge is usually disorganized and varied in format, and it lacks rigorous formalization and comprehensive comparison. We therefore formalize workflow knowledge in different formats and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To evaluate different language models on FlowBench, we design a multi-tiered evaluation framework. We study the effectiveness of workflow knowledge across multiple formats, and the results show that current language model agents still have considerable room for improvement before they plan satisfactorily. We hope this challenging benchmark can pave the way for future research on agent planning.|\n", "2406.17232": "|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang
et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|Building realistic, human-like large language models (LLMs) is crucial for credible social simulation. Role-playing based on demographic information sometimes makes the models more human-like, but not always. This study examines whether integrating information from empirically derived human belief networks can further improve the alignment between LLMs and human behavior. Using data from a human survey, we estimate a belief network of 18 topics that load on two non-overlapping latent factors. We then seed the LLM with an opinion on one topic and assess how well the opinions it expresses on the remaining test topics align with the corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding a single belief significantly improved alignment for topics related within the belief network, with no clear effect for topics outside the network. These results suggest a novel path toward human-LLM belief alignment in work that seeks to understand and simulate patterns of belief distribution in society.|\n", "2406.18702": "|**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R.
Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|This study presents a novel approach to simulating the legislative process with virtual agents driven by language models, focusing on the U.S. Senate Intelligence Committee. We build agents that represent individual senators and have them interact in simulated committee discussions. The agents are able to engage in realistic debate, offer thoughtful reflections, and find bipartisan solutions under certain conditions. Notably, the simulation also shows promise in modeling shifts toward bipartisanship in response to external perturbations. The results indicate that this language-model-based approach could become a useful tool for understanding and improving legislative processes, in line with a broader body of findings that language-model-based agents can usefully model real-world phenomena. Future work will focus on increasing the complexity of the agents, expanding the scope of the simulation, and exploring applications in policy testing and negotiation.|\n", "2406.19966": "|**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao
et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|Most economic theories assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is hard to predict accurately with mathematical models. This paper proposes the Agent-based Simulated Financial Market (ASFM). We first construct a simulated stock market with a real order matching system. We then design a large language model based stock trading agent comprising profile, observation, and tool-learning-based action modules. The trading agent can comprehensively understand current market dynamics and financial policy information, and makes decisions in line with its trading strategy. Experiments show that ASFM's reactions in controllable scenarios are consistent with the real stock market. We also run experiments in two popular areas of economics research and find that the conclusions drawn from ASFM agree with the preliminary findings of economics research. We therefore believe ASFM offers a new paradigm for economic research.|\n", 
"2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|Although multi-modal large language models (MLLMs) have been successful, their generality remains limited and they sometimes underperform specialized models. To address this, recent work has developed LLM-based agents that select appropriate specialized models based on user input. However, such advances have not been explored extensively in the medical domain. To bridge this gap, this paper introduces the first agent designed specifically for the medical field, the Multi-modal Medical Agent (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools that solve seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms state-of-the-art open-source methods across a variety of medical tasks and performs strongly even compared with the closed-source model GPT-4o. MMedAgent is also efficient at updating and integrating new medical tools.|\n", "2407.01887": "|**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM 
Agents**|Fanzeng Xia et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|This paper studies the performance of large language models as decision-makers in the Dueling Bandits (DB) setting. We compare GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo against existing DB algorithms. The results show that LLMs, GPT-4 Turbo in particular, quickly identify the clearly dominant option and thereby outperform the current best algorithms in terms of weak regret. However, these models struggle to converge, are highly sensitive to the prompt, and are brittle under prompt variations. To address this, we propose IF-Enhanced LLM, an augmented algorithm that combines the decision-making capabilities of LLMs with the theoretical guarantees of classic DB algorithms. The design shows how to make LLMs more trustworthy in decision-making tasks that require stable performance. IF-Enhanced LLM has theoretical guarantees on both weak and strong regret. Experiments confirm that IF-Enhanced LLM remains robust even under noisy and adversarial prompts.|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia 
et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**Recent advances in large language models (LLMs) have significantly advanced the automation of software development tasks such as code synthesis, program repair, and test generation. Researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks; these agents can use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limitations of current LLMs, raises a question: do we really need complex autonomous software agents? To explore this question, we build Agentless, an agentless approach for automatically solving software development problems. In contrast to elaborate agent setups, Agentless uses a simple two-phase process of localization followed by repair, without letting the LLM decide future actions or operate complex tools. On the popular SWE-bench Lite benchmark, our results surprisingly show that this simple approach achieves both the highest performance (27.33%) and the lowest cost (0.34 USD), surpassing all open-source software agents. We also manually classify the problems in SWE-bench Lite and find problems that contain the exact ground-truth patch or have insufficient or misleading descriptions. We therefore construct SWE-bench Lite-S by excluding these problems, for more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of simple, interpretable techniques in autonomous software development. We hope Agentless will serve as a baseline, starting point, and expectation for autonomous software agents, and inspire future work in this important area.**|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye 
et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|Recent advances in large language models (LLMs) enable them to autonomously gather world information and reason over it to solve complex problems, which has spurred interest in using LLMs to forecast international events. However, a rigorous benchmark for evaluating the forecasting ability and reliability of LLMs is still missing. To fill this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLMs on temporal forecasting of international events. MIRAI provides an agentic environment with tools for accessing an extensive database of historical structured events and textual news. We carefully clean and parse the GDELT event database and design a series of relational prediction tasks across forecasting horizons from short-term to long-term, testing the ability of LLMs to integrate key global information, write code using domain-specific APIs and libraries, and jointly reason over historical knowledge from diverse formats and time periods to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing LLM performance in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.|\n", 
"2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|With the remarkable progress of large language models (LLMs), LLM-based mobile agents have become a research hotspot in human-computer interaction. However, benchmarks for such agents are scarce. Evaluating these agents typically faces three challenges: (1) the inefficiency of relying on user interface (UI) operations alone limits task evaluation; (2) specific instructions within a single application are not enough to fully assess the multi-dimensional reasoning and decision-making abilities of LLM mobile agents; (3) current evaluation metrics cannot accurately measure a sequential action process. To this end, we propose Mobile-Bench, a new benchmark for evaluating the capabilities of LLM-based mobile agents. First, we extend conventional UI operations with 103 collected APIs to make task completion more efficient. We then collect evaluation data by combining real user queries with LLM-based augmentation. To better assess different levels of planning ability, our data is divided into three categories, SAST (simple tasks), SAMT (moderately complex tasks), and MAMT (multi-task), reflecting differences in task complexity. Mobile-Bench contains 832 data entries, of which more than 200 tasks are specifically designed to test cross-application collaboration scenarios. We also introduce a more precise evaluation metric, CheckPoint, which checks whether an LLM-based mobile agent reaches key points during its planning and reasoning steps.|\n", 
"2407.00476": "|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**As traditional optimization and scheduling schemes shift toward user-driven and personalized services to improve quality of experience (QoE) and flexibility, future systems, especially wireless and digitalized energy networks, face the challenge of better understanding and responding to user needs. Conventional systems often overlook users' individual requirements because of the communication gap between users and machines. The emergence of large language models (LLMs) brings a breakthrough by providing a natural communication interface between users and devices. This paper is the first to propose a novel architecture that converts a user's voice request (VRQ) into a resource allocation vector by constructing three LLM agents: an LLM intent recognition agent that translates the request into an optimization problem (OP), an LLM OP parameter identification agent, and an LLM OP solving agent. We build a database of typical VRQs in the context of electric vehicle (EV) charging as the basis for performance evaluation. As a proof of concept, we primarily use the Llama 3 8B model. Tests across different prompt engineering scenarios demonstrate the effectiveness of the proposed architecture. The study also yields key insights; for example, a larger set of candidate OPs for modeling the real-world problem may degrade final performance because of higher recognition/OP classification noise. All results and code are open source for further research and use by the community.**|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang 
et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|Artificial intelligence has made significant progress in finance and is reshaping how data is processed and interpreted. Among these technologies, large language models (LLMs) show great potential to automate complex tasks, improve customer service, and provide detailed financial analysis. First, we introduce IDEA-FinBench, an evaluation benchmark designed to assess the financial knowledge of large language models. It draws on questions from two globally recognized and authoritative financial professional exams to comprehensively test the ability of LLMs to answer finance-related exam questions. Second, we propose IDEA-FinKER, a financial knowledge enhancement framework for rapidly adapting general-purpose LLMs to the financial domain. It uses retrieval-based few-shot learning for real-time context-level knowledge injection and provides a set of high-quality financial knowledge instructions for fine-tuning any general-purpose model. Finally, we present IDEA-FinQA, an LLM-powered financial question-answering system. The system is built around an architecture of real-time knowledge injection and factual enhancement using external knowledge. IDEA-FinQA consists mainly of a data collector, a data querying module, and LLM-based agents that perform specific functions.|\n", 
"2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|With the rapid development of large language models (LLMs), vector retrieval algorithms have become essential for semantic queries that must satisfy both similarity and diversity requirements. Although Maximal Marginal Relevance (MMR) is widely used in retrieval scenarios involving both requirements, variations in its parameter \u03bb cause results to fluctuate and obscure the optimization trajectory in vector space. Moreover, a solid theoretical analysis of the similarity and diversity constraints in retrieval is currently lacking. This paper proposes a new way to characterize both constraints through the relationship between the query vector and the sum vector. This relationship ensures similarity while requiring the individual vectors within the sum vector to align with the query vector in a dispersed manner, which satisfies the diversity requirement. We also formulate a new combinatorial optimization problem: select $k$ vectors from a set of candidates such that their sum vector aligns maximally with the query vector. We prove that this problem is NP-complete, which reveals the profound difficulty of pursuing similarity and diversity simultaneously in vector retrieval and lays a theoretical foundation for future research. In addition, we design a heuristic algorithm, Vectors Retrieval with Similarity and Diversity (VRSD), which has a clear optimization objective, requires no preset parameters, and has lower time complexity than MMR. Empirical validation shows that VRSD significantly outperforms MMR across various datasets.|\n", "2407.04503": "|**2024-07-05**|**When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions**|J\u00e9r\u00e9my Perez 
et.al.|[2407.04503](http://arxiv.org/abs/2407.04503)|**[link](https://github.com/jeremyperez2/telephonegamellm)**|**As large language models (LLMs) increasingly interact with each other and generate a growing share of online text, it becomes crucial to study how information changes as it passes from one LLM to the next. While the behavior of individual LLMs has been studied extensively, collective behavior and information distortion in iterated interactions remain relatively unexplored. Small biases that are negligible in a single output may be amplified over repeated interactions and could drive content toward attractor states. Borrowing the telephone game experiment from research on human cultural evolution, we design a transmission chain setup in which LLM agents receive, produce, and pass on text from the previous agent in the chain to the next. We track how the toxicity, positivity, difficulty, and length of texts evolve along the transmission chains, revealing biases and attractors, and study how they depend on the initial text, the instructions, the language model, and the model size. For example, we find that open-ended instructions produce stronger attraction effects than more constrained tasks. Different text properties also differ in their sensitivity to attraction effects, with the effect generally stronger for toxicity than for length. These findings highlight the importance of accounting for multi-step transmission dynamics and lay the groundwork for a better understanding of LLM cultural dynamics.**|\n", 
"2407.04363": "|**2024-07-05**|**AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents**|Petr Anokhin et.al.|[2407.04363](http://arxiv.org/abs/2407.04363)|**[link](https://github.com/airi-institute/arigraph)**|**Advances in generative AI have opened broad prospects for large language models (LLMs) in the development of autonomous agents. True autonomy requires accumulating and updating knowledge from interactions with the environment and using it effectively. Current LLM-based approaches rely on the full history of observations, summarization, or retrieval augmentation, but these unstructured memory representations do not support the reasoning and planning needed for complex decision-making. We propose AriGraph, a novel method in which the agent builds a memory graph that integrates semantic and episodic memories while exploring the environment. The graph structure enables efficient associative retrieval of interconnected concepts relevant to the agent's current state and goals, so it serves as an effective environment model that improves exploration and planning. Our Ariadne LLM agent, equipped with the proposed memory architecture together with planning and decision-making, handles complex tasks in the TextWorld environment on a zero-shot basis, such as the cooking challenge from the First TextWorld Problems competition and novel tasks like house cleaning and treasure-hunt puzzles. Compared with established methods such as full history, summarization, and retrieval-augmented generation, our approach shows a clear advantage across tasks.**|\n", "2407.06112": "|**2024-07-08**|**Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning**|Yadong Zhang 
et.al.|[2407.06112](http://arxiv.org/abs/2407.06112)|null|\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u63a8\u7406\u65b9\u6cd5\u2014\u2014\u53cc\u5411\u51b3\u7b56\u89e3\u653e\u63a8\u7406\uff08BIDDER\uff09\uff0c\u65e8\u5728\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u7684\u51b3\u7b56\u5408\u7406\u6027\u3002\u4f20\u7edf\u63a8\u7406\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u5386\u53f2\u4fe1\u606f\uff0c\u91c7\u7528\u5355\u5411\uff08\u4ece\u5de6\u5230\u53f3\uff09\u7684\u63a8\u7406\u7b56\u7565\uff0c\u8fd9\u5bfc\u81f4\u5bf9\u6f5c\u5728\u672a\u6765\u7ed3\u679c\u7684\u8ba4\u8bc6\u4e0d\u8db3\uff0c\u4ee5\u53ca\u5386\u53f2\u80cc\u666f\u7684\u6574\u5408\u4e0d\u591f\u5145\u5206\uff0c\u4ece\u800c\u4ea7\u751f\u6b21\u4f18\u51b3\u7b56\u3002BIDDER\u901a\u8fc7\u878d\u5408\u7406\u6027\u51b3\u7b56\u7684\u539f\u5219\uff0c\u7279\u522b\u662f\u5904\u7406\u4e0d\u786e\u5b9a\u6027\u5e76\u9884\u6d4b\u671f\u671b\u6548\u7528\uff0c\u5f25\u8865\u4e86\u8fd9\u4e00\u77ed\u677f\u3002\u5176\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6b65\u9aa4\uff1a\u4ece\u5386\u53f2\u6570\u636e\u4e2d\u63a8\u65ad\u9690\u85cf\u72b6\u6001\uff0c\u4ee5\u8868\u793a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u4e0d\u786e\u5b9a\u4fe1\u606f\uff1b\u5229\u7528\u8fd9\u4e9b\u9690\u85cf\u72b6\u6001\u9884\u6d4b\u672a\u6765\u7684\u6f5c\u5728\u72b6\u6001\u548c\u53ef\u80fd\u7ed3\u679c\uff1b\u7ed3\u5408\u5386\u53f2\u4fe1\u606f\uff08\u8fc7\u53bb\u60c5\u5883\uff09\u548c\u957f\u671f\u7ed3\u679c\uff08\u672a\u6765\u60c5\u5883\uff09\uff0c\u4ee5\u6307\u5bfc\u63a8\u7406\u3002\u901a\u8fc7\u53cc\u5411\u63a8\u7406\uff0cBIDDER\u80fd\u591f\u5168\u9762\u8003\u8651\u8fc7\u53bb\u548c\u672a\u6765\u7684\u60c5\u5883\uff0c\u4ece\u800c\u505a\u51fa\u66f4\u660e\u667a\u3001\u66f4\u7406\u6027\u7684\u51b3\u7b56\u3002\u6211\u4eec\u5728\u6251\u514b\uff08\u9650\u6ce8\u5fb7\u5dde\u6251\u514b\uff09\u548c\u8c08\u5224\u4e24\u4e2a\u660e\u786e\u573a\u666f\u4e2d\u6d4b\u8bd5\u4e86BIDDER\u7684\u6548\u679c\uff0c\u5b9e\u9a8c\u663e\u793a\u5b83\u663e\u8457\u63d0\u9ad8\u4e
86\u8bed\u8a00\u6a21\u578b\u548c\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u7684\u51b3\u7b56\u80fd\u529b\u3002|\n", "2407.05890": "|**2024-07-08**|**Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation**|Jiaqi Chen et.al.|[2407.05890](http://arxiv.org/abs/2407.05890)|null|\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u5728\u89c6\u89c9\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u96f6\u6837\u672c\u7684\u5f3a\u5927\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u5173\u6ce8\u89e3\u51b3\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\uff0c\u901a\u8fc7\u9009\u62e9\u9884\u5b9a\u4e49\u5bfc\u822a\u56fe\u4e2d\u7684\u8282\u70b9\u8fdb\u884c\u79fb\u52a8\uff0c\u5ffd\u89c6\u4e86\u73b0\u5b9e\u573a\u666f\u4e2d\u4f4e\u5c42\u6b21\u7684\u63a7\u5236\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AO-Planner\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u9762\u5411\u53ef\u53ca\u6027\u89c4\u5212\u7684\u8fde\u7eed\u89c6\u89c9\u5bfc\u822a\u6846\u67b6\u3002AO-Planner\u6574\u5408\u591a\u79cd\u57fa\u7840\u6a21\u578b\uff0c\u5b9e\u73b0\u9762\u5411\u53ef\u53ca\u6027\u7684\u8fd0\u52a8\u89c4\u5212\u548c\u52a8\u4f5c\u51b3\u7b56\uff0c\u5747\u4ee5\u96f6\u6837\u672c\u7684\u65b9\u5f0f\u6267\u884c\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u89c6\u89c9\u53ef\u53ca\u6027\u63d0\u793a\uff08VAP\uff09\u65b9\u6cd5\uff0c\u5229\u7528SAM\u5206\u5272\u53ef\u89c1\u5730\u9762\uff0c\u63d0\u4f9b\u5bfc\u822a\u53ef\u53ca\u6027\u4fe1\u606f\uff0c\u4ece\u800c\u8ba9\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u6f5c\u5728\u7684\u4e0b\u4e00\u4e2a\u8def\u6807\uff0c\u5e76\u751f\u6210\u5411\u9009\u5b9a\u8def\u6807\u7684\u4f4e\u5c42\u6b21\u8def\u5f84\u89c4\u5212\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u9ad8\u7ea7\u4ee3\u7406PathAgent\uff0c\u8bc6\u522b\u51fa\u6700\u53ef\u80fd\u7684\u50cf\u7d20\u7ea7\u8def\u5f84\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u4e09\u7ef4\u5750\u6807\uff0c\u
4ee5\u5b8c\u6210\u4f4e\u5c42\u6b21\u7684\u79fb\u52a8\u3002 \u5728\u5177\u6709\u6311\u6218\u6027\u7684R2R-CE\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0cAO-Planner\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u96f6\u6837\u672c\u6027\u80fd\u63d0\u5347\uff08SPL\u6307\u6807\u63d0\u9ad85.5%\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u8fde\u63a5\u4e86\u8bed\u8a00\u6a21\u578b\u4e0e\u4e09\u7ef4\u4e16\u754c\uff0c\u907f\u514d\u4e86\u76f4\u63a5\u9884\u6d4b\u4e16\u754c\u5750\u6807\u70b9\u7684\u56f0\u96be\uff0c\u4e3a\u5229\u7528\u57fa\u7840\u6a21\u578b\u8fdb\u884c\u4f4e\u5c42\u6b21\u8fd0\u52a8\u63a7\u5236\u63d0\u4f9b\u4e86\u65b0\u7684\u524d\u666f\u3002|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**\u5728\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\u65b9\u6cd5\u4e2d\uff0c\u5904\u7406\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u975estationarity\u5e76\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u81ea\u4e3b\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6211\u4eec\u7684\u65b0\u578b\u667a\u80fd\u4f53\u201c\u5047\u8bbe\u5fc3\u667a\u201d\uff08Hypothetical 
Minds\uff09\u91c7\u7528\u8ba4\u77e5\u542f\u53d1\u5f0f\u67b6\u6784\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u4e24\u4e2a\u62bd\u8c61\u5c42\u6b21\u4e0a\u7684\u5206\u5c42\u89c4\u5212\u6a21\u5757\u3002\u5176\u4e2d\u7684\u5173\u952e\u90e8\u5206\u662f\u201c\u5fc3\u7406\u7406\u8bba\u201d\u6a21\u5757\uff0c\u5b83\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u751f\u6210\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u7b56\u7565\u7684\u5047\u8bbe\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u5047\u8bbe\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u884c\u4e3a\u7684\u9884\u6d4b\u8fdb\u884c\u8bc4\u4f30\u548c\u8fed\u4ee3\u4f18\u5316\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u5047\u8bbe\u5fc3\u667a\u5728Melting Pot\u57fa\u51c6\u4e2d\u7684\u591a\u79cd\u7ade\u4e89\u3001\u6df7\u5408\u52a8\u673a\u548c\u534f\u4f5c\u73af\u5883\u4e2d\uff0c\u65e0\u8bba\u662f\u4e8c\u5143\u8fd8\u662f\u7fa4\u4f53\u73af\u5883\uff0c\u90fd\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u8bed\u8a00\u6a21\u578b\u667a\u80fd\u4f53\uff08LLM-agent\uff09\u548c\u5f3a\u5316\u5b66\u4e60\u57fa\u7840\u7ebf\u3002\u5bf9\u6bd4\u5b9e\u9a8c\u8fd8\u663e\u793a\uff0c\u5047\u8bbe\u7684\u8bc4\u4f30\u548c\u7cbe\u70bc\u5bf9\u4e8e\u5728\u590d\u6742\u573a\u666f\u4e2d\u53d6\u5f97\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002**|\n", "2407.06813": "|**2024-07-09**|**Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy**|Zhenyu Guan et.al.|[2407.06813](http://arxiv.org/abs/2407.06813)|**[link](https://github.com/todexter3/richelieu)**|## \u80cc\u666f 
Diplomacy is one of the most sophisticated activities in human society, involving interactions among many parties or agents and requiring social reasoning, negotiation skills, and long-term strategic planning. Previous AI agents have shown that they can handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggeringly large decision space, especially in the phases that require negotiation. Recently, large language models (LLMs) have shown capabilities beyond earlier systems in some applications, but they are still not enough to handle long-horizon planning in complex multi-agent environments. Using cutting-edge LLM technology, we make a first attempt to explore the upper bound of AI on such a comprehensive multi-agent mission by integrating three core capabilities into a stronger LLM-based social agent: 1) a strategic planner with memory and reflection; 2) a goal-oriented negotiator with social reasoning; and 3) memory augmented through self-play games, enabling self-evolution without human intervention.|\n", 
"2407.06567": "|**2024-07-10**|**FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making**|Yangyang Yu 
et.al.|[2407.06567](http://arxiv.org/abs/2407.06567)|null|Large language models (LLMs) have shown notable potential on complex tasks and are increasingly used in finance. However, high-quality sequential investment decision-making remains challenging: it requires repeated interaction with a changing environment to maximize returns and manage risk. Although LLM-based agent systems have been developed that outperform human teams and deliver investment returns, how to optimize multi-source information synthesis and decision outcomes through real-time experience refinement remains unexplored. We therefore propose FinCon, an LLM-based multi-agent framework designed for diverse financial tasks and characterized by conceptual verbal reinforcement and a financial organizational structure. Drawing on the organizational structure of real-world investment firms, FinCon uses a manager-analyst communication hierarchy that lets cross-functional agents collaborate and align on goals through natural language interaction. Each agent has a larger memory capacity than humans, which supports more efficient information processing. FinCon also introduces a risk-control component that periodically triggers a self-critiquing mechanism to update the system's investment beliefs. These conceptualized beliefs act as verbal reinforcement that guides future behavior, and they can be selectively propagated to the nodes that need knowledge updates, which reduces unnecessary communication costs and improves performance. FinCon generalizes well across financial tasks such as single stock trading and asset management, demonstrating its potential in real financial scenarios.|\n", 
"2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**The rapid adoption of large language models (LLMs) in multi-agent systems has drawn attention to their strong performance in areas such as collaborative problem-solving and autonomous negotiation. However, the security of these LLM-based multi-agent systems has not been studied thoroughly, particularly with respect to the spread of manipulated knowledge. This paper investigates this critical issue by building a detailed threat model and a simulation environment that mirrors real-world multi-agent deployments on a trusted platform. We propose a novel two-stage attack, consisting of persuasiveness injection and manipulated knowledge injection, to systematically explore how manipulated knowledge (such as fabricated and harmful knowledge) can spread without explicit prompt manipulation. Our method exploits inherent vulnerabilities of LLMs in handling world knowledge, which attackers can use so that fabricated information is spread unwittingly. Experiments show that our attack can successfully induce LLM-based agents to spread both kinds of manipulated knowledge during communication without significantly degrading their basic capabilities. We also find that these manipulations persist in popular retrieval-augmented generation frameworks: even after an interaction ends, several benign agents may continue to be influenced by manipulated chat histories. Our findings reveal significant security risks in LLM-based multi-agent systems and underline the urgent need for robust defenses against the spread of manipulated knowledge, such as introducing guardian agents and advanced fact-checking tools.**|\n", 
"2407.08550": "|**2024-07-11**|**Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility**|Yuchen Xia et.al.|[2407.08550](http://arxiv.org/abs/2407.08550)|**[link](https://github.com/yuchenxia/gpt4industrialautomation)**|This paper proposes a novel approach to integrating large language models (LLMs) into automated production systems to improve task automation and flexibility. We organize production operations hierarchically according to the automation pyramid, abstract atomic operation functions as microservices, and execute them through calls within a dedicated digital twin system. This provides a scalable and flexible foundation for orchestrating production processes. In the digital twin system, low-level, hardware-specific data is semantically enriched so that LLMs can interpret and handle production planning and control tasks. When a user request arrives or a triggering event is detected, the LLMs generate a production process plan and decompose it into a series of microservices that are executed in the real-world automation system. We implement this overall approach on a modular automation facility in our laboratory and use a practical case to show how LLMs can handle production planning and control tasks, resulting in an intuitive, highly automated, and more flexible production environment. Finally, we point out the limitations that stand in the way of realizing the full potential of LLMs in autonomous systems and highlight their potential benefits. See https://github.com/YuchenXia/GPT4IndustrialAutomation for demos of this research series.|\n", 
"2407.08213": "|**2024-07-11**|**PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models**|Ruiqi Wang et.al.|[2407.08213](http://arxiv.org/abs/2407.08213)|null|
Preference-based reinforcement learning (PbRL) is emerging as an approach for teaching robots through comparative human feedback, avoiding the need for complex reward engineering. However, existing PbRL methods require a large amount of feedback, which often leads to reliance on synthetic feedback generated by scripted teachers. This brings back intricate reward design and makes it hard to adapt to the distinct expectations that different users have for the same task in human-robot interaction (HRI) scenarios. To address these problems, we propose PrefCLM, a novel framework that uses large language models (LLMs) as simulated teachers in PbRL. We apply Dempster-Shafer theory to fuse the individual preferences of multiple LLM agents at the score level, making effective use of their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline that supports collective refinement based on user interaction. Experiments across various general RL tasks show that PrefCLM performs on par with traditional scripted teachers and excels at producing more natural and efficient robot behaviors. A real-world user study (N=10) further demonstrates its ability to adapt to individual user preferences, significantly improving user satisfaction in HRI scenarios.|\n", 
"2407.10718": "|**2024-07-16**|**Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning**|Yulong Wang et.al.|[2407.10718](http://arxiv.org/abs/2407.10718)|**[link](https://github.com/ag2s1/sibyl-system)**|**Existing agents based on large language models (LLMs) demonstrate strong problem-solving capabilities by combining LLMs' inherent knowledge, strong in-context learning and zero-shot capabilities, and tools with intricate, human-designed LLM invocation workflows. However, these agents still show limitations in long-term reasoning and under-use the potential of existing tools, which leads to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to tackle complex reasoning tasks by making efficient use of a minimal set of tools. Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global workspace to improve the management and sharing of knowledge and conversation history throughout the system. Guided by Society of Mind Theory, Sibyl also implements a multi-agent debate-based jury that self-refines the final answer, ensuring a comprehensive and balanced approach. This design aims to reduce system complexity while broadening the range of solvable problems, from those humans can solve in minutes to those requiring hours or even days, thereby enabling a shift from System-1 to System-2 thinking. Sibyl is designed with a focus on scalability and ease of debugging, incorporating the concept of reentrancy from functional programming from the outset, with the aim of seamless, low-effort integration into other LLM applications to improve their capabilities. Our experiments show that the Sibyl agent instantiated with GPT-4 achieves the best performance on the GAIA benchmark test set, with an average score of 34.55%, surpassing other GPT-4-based agents. We hope that Sibyl will inspire more reliable and reusable LLM-based agent solutions for complex real-world reasoning tasks.**|\n", 
"2407.10580": "|**2024-07-15**|**Leveraging Hybrid Intelligence Towards Sustainable and Energy-Efficient Machine Learning**|Daniel Geissler 
et.al.|[2407.10580](http://arxiv.org/abs/2407.10580)|null|This paper presents an approach that uses hybrid intelligence to achieve sustainable and energy-aware machine learning. During machine learning model development, the focus is often only on optimizing final model performance, while the efficiency of the process itself is neglected. In addition, energy efficiency has recently become equally important because of the significant environmental impact of complex, large-scale computation. The contribution of this work is to highlight and further address inefficiencies in the machine learning development process by integrating human-in-the-loop (HITL) interaction and large language model (LLM) agents. In short, the paper aims to improve the efficiency and environmental friendliness of the machine learning workflow by combining human intuition and experience with the efficient computation of AI. By introducing HITL and LLMs as supporting tools, we aim to identify and optimize bottlenecks in machine learning development, thereby reducing resource consumption and promoting more sustainable AI practice. This approach not only helps improve training speed and efficiency, but also lowers energy consumption, with a positive effect on environmental protection.|\n", 
"2407.10499": "|**2024-07-15**|**CIBench: Evaluating Your LLMs with a Code Interpreter Plugin**|Songyang Zhang et.al.|[2407.10499](http://arxiv.org/abs/2407.10499)|**[link](https://github.com/open-compass/CIBench)**|**While LLM-based agents have made significant progress, benchmarking their capabilities has become challenging, which hinders a clear understanding of their limitations. This paper proposes CIBench, an interactive evaluation framework for comprehensively assessing LLMs' ability to use code interpreters for data science tasks. Our evaluation framework includes an evaluation dataset and two evaluation modes. The evaluation dataset is built through cooperation between LLMs and humans and simulates an authentic workflow using consecutive, interactive IPython sessions, which allows a comprehensive assessment of LLM capabilities. The two evaluation modes assess LLMs' ability with and without human assistance. We conduct extensive experiments to analyze the performance of 24 LLMs on CIBench and provide valuable insights for future LLM development in code interpreter use.**|\n", 
"2407.10081": "|**2024-07-14**|**All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era**|Bo Chen et.al.|[2407.10081](http://arxiv.org/abs/2407.10081)|null|Recommender systems (RS) are vital for managing information overload and delivering personalized content that meets users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems through their broad general knowledge and reasoning capabilities. Standing in the LLM era, we aim to integrate recommender systems into a broader picture and pave the way for more comprehensive solutions in future research. We therefore first offer a comprehensive overview of technical progress, with particular attention to language foundation models and their applications in recommendation. We identify two evolution paths of modern recommender systems: list-wise recommendation and conversational recommendation. The two paths ultimately converge at LLM agents with superior long-term memory, reflection, and tool intelligence. Along these two paths, we point out that the information effectiveness of recommendation increases while the user's acquisition cost decreases. We carefully examine the technical features, research methodologies, and inherent challenges of each milestone, from traditional list-wise recommendation to LLM-enhanced recommendation to recommendation with LLM agents. Finally, we highlight several unresolved challenges that are crucial for the development of future personalization technologies and interfaces, and we discuss future prospects.|\n", 
"2407.10064": "|**2024-07-14**|**Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights**|Xinyu-Chen 
et.al.|[2407.10064](http://arxiv.org/abs/2407.10064)|null|Across the industrial fields of human society, people have long sought ways to free up human labor. Building agents based on large language models (LLMs) is considered an effective tool for achieving this goal. As human-like intelligent entities with perception, planning, decision-making, and action capabilities, agents have already created significant production value in many fields. However, the level of intelligence in bridge operation and maintenance (O&M) is relatively low compared with other industries. Nevertheless, the field has developed numerous intelligent inspection devices, machine learning algorithms, and autonomous evaluation and decision-making methods, which lay a foundation for AI breakthroughs in this area. This study explores the impact of LLM-based AI agents on bridge O&M and analyzes the challenges and opportunities they may bring to its core tasks. Through in-depth research and analysis, we hope to provide a more comprehensive perspective for understanding intelligent applications in this field.|\n", 
"2407.11843": "|**2024-07-16**|**InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback**|Haishuo Fang 
et.al.|[2407.11843](http://arxiv.org/abs/2407.11843)|null|A crucial requirement for deploying agents based on large language models (LLMs) in real-life applications is robustness against risky or irreversible mistakes. However, existing research lacks preemptive evaluation of the reasoning trajectories executed by LLM agents, which leaves a gap in ensuring safe and reliable operation. To explore better solutions, this paper introduces InferAct, a novel approach that uses the Theory-of-Mind capability of LLMs to proactively detect potential errors before critical actions are executed (for example, 'buy-now' in automated online trading or web shopping). InferAct can also integrate human feedback to prevent irreversible risks and improve the actor agent's decision-making process. Experiments on three widely used tasks demonstrate the effectiveness of InferAct. The proposed solution offers a novel approach and concrete contributions toward developing LLM agents that can be safely deployed in different environments involving critical decision-making.|\n", 
"2407.11549": "|**2024-07-16**|**How Personality Traits Influence Negotiation Outcomes? 
A Simulation based on Large Language Models**|Yin Jou Huang et.al.|[2407.11549](http://arxiv.org/abs/2407.11549)|null|Psychological evidence reveals the influence of personality traits on decision-making. For instance, agreeableness is generally associated with positive outcomes in negotiations, whereas neuroticism is often linked to less favorable outcomes. This paper introduces a simulation framework based on large language models (LLMs) whose agents are endowed with synthesized personality traits. The agents negotiate within bargaining domains and have customizable personalities and objectives. The experimental results show that the behavioral tendencies in LLM-based simulations can reproduce behavioral patterns observed in human negotiations. The contribution is twofold. First, we propose a simulation methodology that investigates the alignment between the linguistic and economic capabilities of LLM agents. Second, we offer empirical insights into the strategic impact of Big Five personality traits on the outcomes of bilateral negotiations. We also provide a case study based on synthesized bargaining dialogues that reveals intriguing behaviors, including deceitful and compromising behaviors.|\n", 
"2407.12784": "|**2024-07-17**|**AgentPoison: Red-teaming LLM 
Agents via Poisoning Memory or Knowledge Bases**|Zhaorun Chen et.al.|[2407.12784](http://arxiv.org/abs/2407.12784)|**[link](https://github.com/BillChan226/AgentPoison)**|**LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, using external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically use a memory module or a retrieval-augmented generation (RAG) mechanism to retrieve past knowledge and instances with similar embeddings from knowledge bases to inform task planning and execution. However, the reliance on unverified knowledge bases raises significant concerns about their safety and trustworthiness. To uncover these vulnerabilities, we propose AgentPoison, a novel red-teaming approach and the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or knowledge base. Specifically, we formulate trigger generation as a constrained optimization problem that optimizes backdoor triggers by mapping triggered instances to a unique embedding space, so that whenever a user instruction contains the optimized backdoor trigger, malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. Meanwhile, benign instructions without the trigger still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized backdoor trigger shows superior transferability, in-context coherence, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness in attacking three types of real-world LLM agents: a RAG-based autonomous driving agent, a knowledge-intensive QA agent, and the healthcare agent EHRAgent. On each agent, AgentPoison achieves an average attack success rate above 80% with minimal impact on benign performance (below 1%) at a poison rate below 0.1%.**|\n", 
"2407.12979": "|**2024-07-17**|**Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models**|Sadegh Mahdavi 
et.al.|[2407.12979](http://arxiv.org/abs/2407.12979)|null|Large language models (LLMs) have shown remarkable performance on various natural language tasks, but they often struggle with planning problems that require structured reasoning. To overcome this limitation, translating planning problems into the Planning Domain Definition Language (PDDL) has been proposed as a potential solution, since it enables the use of automated planners. However, generating accurate PDDL files typically requires human input or correction, which is time-consuming and costly. This paper proposes a novel approach that uses LLMs and environment feedback to generate PDDL domain and problem description files automatically, without human intervention. Our method introduces an iterative refinement process that generates multiple problem PDDL candidates and progressively refines the domain PDDL based on feedback obtained from interacting with the environment. To guide the refinement process, we develop an Exploration Walk (EW) metric, which gives the LLM rich feedback signals for updating the PDDL file. We evaluate our method on PDDL environments and achieve a 66% task solve rate, compared with 29% for GPT-4's intrinsic planning with chain-of-thought prompting. Our work enables the automated modeling of planning environments using LLMs and environment feedback, removes the need for human intervention in PDDL generation, and paves the way for more reliable application of LLM agents to challenging problems.|\n", 
"2407.12877": "|**2024-07-16**|**Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning**|Yaswanth Narsupalli et.al.|[2407.12877](http://arxiv.org/abs/2407.12877)|null|Assessing the quality of natural language generation (NLG) outputs, especially those produced by large language models (LLMs), poses significant challenges. Traditional approaches rely either on resource-intensive human evaluation or on automatic metrics, which often correlate poorly with human judgment. This study proposes Review-Feedback-Reason (ReFeR), a novel framework for evaluating NLG using LLM agents. We rigorously test ReFeR on two existing benchmark datasets across diverse NLG tasks. ReFeR not only improves the accuracy of NLG evaluation, by about 20% over previous benchmarks, but also generates constructive feedback and significantly improves collective reasoning. This feedback is used to create instruction-tuning datasets; when these datasets are used to fine-tune smaller models such as Mistral-7B, the models become very good evaluators, with better correlation with human evaluations and performance nearly on par with GPT-3. The effectiveness of our method is highlighted by its application to three reasoning benchmarks, where ReFeR outperforms most state-of-the-art methods and, on average, exceeds GPT-3.5 Turbo and GPT-4 in reasoning by about 11.67% and 1%, respectively.|\n", 
"2407.14239": "|**2024-07-19**|**KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models**|Kemou Jiang 
et.al.|[2407.14239](http://arxiv.org/abs/2407.14239)|null|Large language models (LLMs), acting as autonomous agents, offer a new way to address real-world challenges in a knowledge-driven manner. These LLM-based methods excel at generalization and interpretability. However, the complexity of driving tasks often requires multiple heterogeneous agents to cooperate, which underscores the need for LLM-driven agents to engage in cooperative knowledge sharing and cognitive synergy. Despite the promise of LLMs, current applications focus mainly on single-agent scenarios. To broaden the scope of knowledge-driven strategies and strengthen the generalization ability of autonomous agents, we propose the KoMA framework, which comprises multi-agent interaction, multi-step planning, shared memory, and ranking-based reflection modules, and aims to improve multi-agent decision-making in complex driving scenarios. Working from the text descriptions of driving scenarios generated by the framework, the multi-agent interaction module enables LLM agents to analyze and infer the intentions of surrounding vehicles, much like human cognition. The multi-step planning module enables LLM agents to analyze the situation layer by layer and arrive at a final action decision, keeping short-term action decisions consistent with the goal. The shared memory module accumulates collective experience to support better decisions, while the ranking-based reflection module evaluates and improves agent behavior to enhance driving safety and efficiency. The KoMA framework not only improves the robustness and adaptability of autonomous driving agents but also significantly improves their generalization across different scenarios. Experimental results show that our approach outperforms traditional methods in complex, unpredictable driving environments, particularly without requiring extensive retraining.|\n", "2407.15073": "|**2024-07-21**|**Multi-Agent Causal Discovery Using Large Language Models**|Hao Duong Le 
et.al.|[2407.15073](http://arxiv.org/abs/2407.15073)|null|Large language models (LLMs) have shown great potential for causal discovery tasks by drawing on the broad expert knowledge they acquire from large text corpora. However, the multi-agent capabilities of LLMs in causal discovery remain underexplored. This paper proposes a general framework for investigating that potential. The first component is the meta-agent model, which relies entirely on reasoning and discussion among LLM agents to perform causal discovery. The second is the coding-agent model, which uses the agents' ability to plan, write, and execute code, together with advanced statistical libraries, to perform causal discovery. The third is the hybrid model, which combines the meta-agent and coding-agent approaches and merges the statistical analysis and reasoning skills of multiple agents. By making effective use of LLM expert knowledge, reasoning ability, multi-agent cooperation, and statistical causal methods, the proposed framework shows promising results. By exploring the multi-agent potential of LLMs, we aim to lay a foundation for using LLM multi-agent systems to solve causality-related problems.|\n", "2407.16252": "|**2024-07-23**|**LawLuo: A Chinese Law Firm Co-run by LLM Agents**|Jingyun 
Sun et.al.|[2407.16252](http://arxiv.org/abs/2407.16252)|**[link](https://github.com/nefujing/lawluo)**|**Large language models (LLMs) have shown great potential for providing legal consultation to users without a legal background, thanks mainly to their strong text understanding and generation abilities. However, existing Chinese legal LLMs are limited to dialogue between a single model and the user, unlike consultations at a law firm, where several staff members take part. This limitation makes the consultation experience less realistic. In addition, existing Chinese legal LLMs have critical problems: (1) insufficient quality control of instruction fine-tuning data; (2) hallucinations caused by ambiguous user queries; and (3) a decline in instruction-following ability over multi-turn dialogue. To address these challenges, we propose LawLuo, a new legal dialogue framework that draws on the collaborative abilities of multiple LLM agents, each responsible for a different function, to jointly provide users with comprehensive legal consultation. We also construct two high-quality legal dialogue datasets, KINLED and MURLED, and fine-tune ChatGLM-3-6b on them. We further propose a legal query clarification algorithm called ToLC. Experimental results show that, compared with baseline LLMs such as GPT-4, LawLuo performs better in three respects: lawyer-like language style, the usefulness of legal advice, and the accuracy of legal knowledge. Our code and datasets are available at https://github.com/NEFUJing/LawLuo.**|\n", "2407.16732": "|**2024-08-03**|**PyBench: Evaluating LLM Agent on various real-world coding tasks**|Yaolun Zhang et.al.|[2407.16732](http://arxiv.org/abs/2407.16732)|**[link](https://github.com/mercury7353/pybench)**|**To address the limitations of existing benchmarks, which cover either simplified tasks or complex, highly specific ones, we introduce PyBench, a benchmark spanning five main categories of real-world tasks. The tasks involve more than 10 types of files and are intended to cover everyday coding needs comprehensively. When a user poses a high-level query and supplies the relevant files, the LLM agent must reason over multiple turns by executing Python code through a code interpreter and finally produce an answer that meets the user's needs. Solving PyBench tasks requires a broad understanding of Python packages, strong reasoning ability, and the ability to incorporate feedback from executed code. 
Our evaluation shows that current open-source LLMs struggle with these tasks. We therefore analyze and experiment with four kinds of datasets, demonstrating that PyBench requires comprehensive abilities. Our carefully fine-tuned 8B model, PyLlama3, achieves exciting performance on PyBench, surpassing many larger (33B and 70B) models. Our benchmark, training dataset, and model are available on GitHub: [https://github.com/Mercury7353/PyBench](https://github.com/Mercury7353/PyBench)**|\n", "2407.18416": "|**2024-07-29**|**PersonaGym: Evaluating Persona Agents and LLMs**|Vinay Samuel et.al.|[2407.18416](http://arxiv.org/abs/2407.18416)|null|Persona agents, which are LLM agents that act according to an assigned persona, have shown impressive contextual response capabilities across application domains. These agents bring significant improvements to sectors such as education, healthcare, and entertainment, because model developers can align agent responses with different user needs, which broadens the range of agent applications. However, evaluating persona agent performance is extremely difficult, mainly because of the complexity of assessing persona adherence in free-form interactions across the various relevant environments. We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and propose PersonaScore, the first automated, human-aligned metric grounded in decision theory for comprehensive, large-scale evaluation of persona agents. By evaluating 6 open-source and closed-source LLMs on a benchmark of 200 personas and 10,000 questions, we show that there is substantial room for improvement in persona agent capabilities among state-of-the-art models. For example, the PersonaScore of Claude 3.5 Sonnet is only 2.97% higher than that of GPT 3.5, even though Claude 3.5 Sonnet is a more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply better persona agent capabilities, which highlights the pressing need for algorithmic and architectural innovation toward faithful and performant persona agents.|\n", "2407.19354": "|**2024-07-28**|**The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies**|Feng He 
et.al.|[2407.19354](http://arxiv.org/abs/2407.19354)|null|Inspired by the rapid development of large language models (LLMs), LLM agents have evolved to the point where they can carry out complex tasks. These agents are widely used across domains to process large amounts of data, interact with humans, and execute tasks, which underlines their commercial value. However, this also exposes security and privacy vulnerabilities. At the current stage, a comprehensive study of the security and privacy of LLM agents is essential. This survey aims to give a comprehensive overview of the newly emerging privacy and security issues that LLM agents face. We first introduce the fundamentals of LLM agents, then categorize and analyze the threats. We next discuss the impact of these threats on humans, the environment, and other agents. We then review existing defense strategies and finally explore future trends. In addition, the survey uses a variety of case studies to make the explanations easier to follow. By highlighting these key security and privacy issues, the survey aims to stimulate future research that strengthens the security and privacy of LLM agents and thereby improves their reliability and trustworthiness in future applications.|\n", "2407.19056": "|**2024-07-26**|**OfficeBench: Benchmarking Language 
Agents across Multiple Applications for Office Automation**|Zilong Wang et.al.|[2407.19056](http://arxiv.org/abs/2407.19056)|**[link](https://github.com/zlwang-cs/OfficeBench)**|Office automation significantly improves human productivity by automatically completing routine tasks in the workflow. The existing AI literature focuses mainly on basic information extraction, whereas office automation research should extend to more realistic office tasks that require integrating the various information sources in an office system and producing outputs through a series of decision-making steps. We introduce OfficeBench, the first office automation benchmark for evaluating how well current large language model (LLM) agents handle office tasks in realistic office workflows. OfficeBench requires LLM agents to carry out feasible long-horizon planning, switch between applications efficiently, and ground their actions accurately within a large combined action space according to the contextual demands of the workflow. 
Applying our customized evaluation methods to each task, we find that GPT-4 Omni achieves a pass rate of 47.00%, which indicates decent performance on office tasks. However, this is still far below the human performance and accuracy standards that real office workflows require. Further observation shows that most issues relate to redundant operations, hallucinations, and limitations in switching between multiple applications, which may offer valuable insights for developing effective automation agent frameworks.|\n", "2407.18961": "|**2024-07-30**|**MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains**|Guoli Yin et.al.|[2407.18961](http://arxiv.org/abs/2407.18961)|**[link](https://github.com/apple/axlearn)**|**Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks that evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios and emphasize task completion rather than dissecting the underlying skills that drive those outcomes. This lack of granularity makes it hard to pinpoint the causes of failure. In addition, setting up these environments takes considerable effort, and inconsistency and reproducibility problems sometimes arise in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, which consists of comprehensive offline tasks that require no complex environment setup. MMAU covers five domains (tool use, directed acyclic graph (DAG) question answering, data science and machine learning coding, contest-level programming, and mathematics) and five key capabilities: understanding, reasoning, planning, problem-solving, and self-correction. With a total of 20 carefully designed tasks and more than 3,000 distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only reveals the capabilities and limitations of LLM agents but also makes their performance more interpretable. The MMAU datasets and evaluation scripts are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.**|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang 
et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|Autonomous agents built on large language models (LLMs) have recently made significant progress in both theoretical research and practical applications. These agents can extend the capabilities of the base LLM through external components and improve performance in several ways. For example, an agent built on a GPT-3.5-Turbo core may outperform the more advanced GPT-4 model on some tasks; the key is that its integrated tools let it act in the real world, moving from merely generating text to interacting with the environment. Given how widely agents are deployed in practice and their ability to affect the environment directly, assessing potential vulnerabilities becomes critical. If exploited maliciously, these autonomous systems could cause far more damage than a standalone language model. 
Existing research has examined harmful behavior that LLM agents may cause, but our study takes a different perspective and focuses on attacks that cause malfunctions, that is, attacks that mislead the agent into performing repetitive or irrelevant actions and thereby disrupt its functioning. We carry out a comprehensive evaluation using diverse attack methods, scenarios, and properties to reveal where the vulnerabilities lie. Experimental results show that these attacks can induce failure rates above 80% in a variety of settings. We further implement and deploy agents in multi-agent systems to highlight the real-world risks posed by such vulnerabilities. To counter these attacks, we propose self-examination detection methods. However, our findings show that effective detection using LLMs alone is difficult, which underscores the significant risk these vulnerabilities carry.|\n", "2407.21778": "|**2024-07-31**|**Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries**|Felix Ocker 
et.al.|[2407.21778](http://arxiv.org/abs/2407.21778)|null|We propose an architecture called the tulip agent for autonomous agents based on large language models, with the ability to create, read, update, and delete tools in a tool library that contains a large number of tools. Unlike current state-of-the-art implementations, the tulip agent neither encodes the descriptions of all available tools in the system prompt, which takes up the model's context window, nor embeds the entire prompt when retrieving suitable tools. Instead, the tulip agent recursively searches for suitable tools in its extensible tool library, which is implemented as a vector store. This architecture significantly reduces inference costs, allows large tool libraries to be used, and lets the agent adapt and extend its set of tools. We evaluate the architecture through several ablation studies in the mathematics domain and demonstrate its generalizability with an application to robotics. A reference implementation and the benchmark are available at github.com/HRI-EU/tulip_agent.|\n", "2407.21646": "|**2024-07-31**|**Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent**|Shanbo Cheng 
et.al.|[2407.21646](http://arxiv.org/abs/2407.21646)|**[link](https://github.com/byteresearchcla/realsi)**|In this paper, we present a high-quality, near-human-level simultaneous speech translation system: Cross Language Agent - Simultaneous Interpretation, abbreviated CLASI. Inspired by professional interpreters, we adopt a novel data-driven read-write strategy to balance translation quality and latency. To handle the challenge of translating domain-specific terminology, CLASI uses a multi-modal retrieval module to obtain relevant material that augments the translation. Supported by a large language model, our approach takes the input audio, the historical context, and the retrieved information into account to produce error-tolerant translations. Experimental results show that our system significantly outperforms other systems on every metric. 
In line with professional interpreters, we use a better evaluation metric, valid information proportion (VIP), which measures the amount of information successfully conveyed to listeners. In real-world scenarios, where speech is often disfluent, informal, and unclear, CLASI reaches a VIP of 81.3% and 78.0% for the Chinese-to-English and English-to-Chinese directions, respectively, whereas state-of-the-art commercial or open-source systems reach only 35.4% and 41.6%. On an extremely hard dataset, where other systems score below 13% VIP, CLASI still achieves 70% VIP.|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu 
et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|Agents based on large language models (LLMs) have attracted wide attention and are becoming increasingly popular. Planning ability is a key component of an LLM-based agent: it involves interacting with the environment and executing actions to complete a planning task, which usually means moving from an initial state to a desired goal. This paper studies how to improve the planning ability of LLMs through instruction tuning, referred to as agent training. Recent work has shown that instruction-tuning LLMs on expert-level trajectories effectively improves their planning ability. However, existing work focuses mainly on synthesizing trajectories from manually designed tasks and environments. Creating those environments and tasks is labor-intensive, which limits the generation of sufficiently diverse and extensive trajectories. To address this limitation, this paper explores the automated synthesis of diverse environments and of planning tasks with a graduated range of difficulty, from easy to hard. We introduce a framework named AgentGen that uses an LLM first to generate environments and then to generate planning tasks conditioned on those environments. 
Specifically, to improve environment diversity, we propose using an inspiration corpus made up of text passages from different domains as the context for synthesizing environments. To increase the difficulty diversity of the generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, which evolves planning tasks in both easier and harder directions to synthesize a task set with a smooth difficulty curve. Evaluation results on AgentBoard show that AgentGen substantially improves the planning ability of LLMs; for example, Llama-3 8B instruction-tuned with AgentGen surpasses GPT-3.5 in overall performance, and on some tasks it even outperforms GPT-4.|\n", "2408.00523": "|**2024-08-01**|**Jailbreaking Text-to-Image Models with LLM-Based Agents**|Yingkai Dong 
et.al.|[2408.00523](http://arxiv.org/abs/2408.00523)|null|Recent advances have significantly improved the automatic task-solving capabilities of autonomous agents based on large language models (LLMs). However, most LLM-based agents focus on dialogue, programming, or specialized domains, which leaves a gap in handling generative AI safety tasks. These gaps stem mainly from LLM hallucinations and a lack of clear guidelines. This paper proposes Atlas, an advanced LLM-based multi-agent framework that integrates an efficient fuzzing workflow to target text-to-image (T2I) models, specifically jailbreak attacks against T2I models equipped with safety filters. Atlas uses a vision-language model (VLM) to assess whether a prompt triggers the T2I model's safety filter. It then collaborates iteratively with the LLM and the VLM to generate an alternative prompt that bypasses the filter. Atlas also enhances the reasoning ability of LLMs in attack scenarios through multi-agent communication, an in-context learning (ICL) memory mechanism, and the chain-of-thought (COT) approach. 
Our evaluation shows that Atlas successfully jailbreaks several state-of-the-art T2I models in a black-box setting, all of which are equipped with multi-modal safety filters. Atlas also outperforms existing methods in both query efficiency and the quality of the generated images.|\n", "2408.00352": "|**2024-08-01**|**Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion**|Honglei Miao et.al.|[2408.00352](http://arxiv.org/abs/2408.00352)|null|Human motion generation driven by deep generative models has enabled compelling applications of text-to-motion (T2M) models. However, the ability of these models to generate realistic motions from text prompts raises security concerns, especially if they are exploited maliciously. Despite growing interest in T2M, few methods focus on protecting these models against adversarial attacks, and existing work on text-to-image models is insufficient for the distinct motion domain. 
In this paper, we propose ALERT-Motion, an autonomous framework that uses large language models (LLMs) to craft targeted adversarial attacks against black-box T2M models. Unlike prior methods that modify prompts through predefined rules, ALERT-Motion draws on the LLM's knowledge of human motion to autonomously generate subtle yet powerful adversarial text descriptions. The framework comprises two key modules: an adaptive dispatching module that builds an LLM-based agent to iteratively refine and search for adversarial prompts, and a multimodal information contrastive module that extracts the key motion-related semantic information to guide the agent's search. 
Through this LLM-driven approach, ALERT-Motion crafts adversarial prompts that query the victim model to produce outputs closely matching the target motions while avoiding obvious perturbations. Evaluations on popular T2M models show ALERT-Motion's superiority over previous methods, with higher attack success rates and stealthier adversarial prompts. This pioneering work on T2M adversarial attacks highlights the urgency of developing defensive measures as motion generation technology advances and motivates further research into safe and responsible deployment.|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. 
We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. It consistently improves their performance against opposing agents, suggesting their ability to understand the actions of allies and adversaries and establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.|\n", "2408.02479": "|**2024-08-05**|**From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future**|Haolin Jin et.al.|[2408.02479](http://arxiv.org/abs/2408.02479)|null|With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artificial General Intelligence (AGI), combine LLMs as the core for decision-making and action-taking, addressing some of the inherent limitations of LLMs such as lack of autonomy and self-improvement. Despite numerous studies and surveys exploring the possibility of using LLMs in software engineering, it lacks a clear distinction between LLMs and LLM-based agents. It is still in its early stage for a unified standard and benchmarking to qualify an LLM solution as an LLM-based agent in its domain. 
In this survey, we broadly investigate the current practice and solutions for LLMs and LLM-based agents for software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We review and differentiate the work of LLMs and LLM-based agents from these six topics, examining their differences and similarities in tasks, benchmarks, and evaluation metrics. Finally, we discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering. We anticipate this work will shed some light on pushing the boundaries of LLM-based agents in software engineering for future research.|\n", "2408.02232": "|**2024-08-07**|**SpecRover: Code Intent Extraction via LLMs**|Haifeng Ruan et.al.|[2408.02232](http://arxiv.org/abs/2408.02232)|null|This paper examines efficient, low-cost workflows in which an LLM agent, combining large language model (LLM) and program analysis capabilities, automatically carries out program improvement and bug fixing. Because program improvement or repair usually requires a specification of the intended behavior, specification inference is essential for producing high-quality code patches. This work explores iterative code search within a software project, accompanied by specification inference, so that intent is inferred from both the structure and the behavior of the project. The captured intent is examined by a reviewer agent, which vets the patches and provides a measure of confidence in the vetted patches. 
Our approach, SpecRover (AutoCodeRover-v2), is built on the open-source LLM agent AutoCodeRover. In an evaluation on the full SWE-Bench, consisting of 2294 GitHub issues, it shows more than a 50% improvement in efficacy over AutoCodeRover. Compared with existing open-source agents, our work costs only $0.65 to resolve an average GitHub issue in SWE-Bench lite. The explanations produced by SpecRover give developers a clearer signal of when a suggested patch can be accepted with confidence. Our work also underscores the continued importance of specification inference in automated program repair, even in the LLM era.|\n", "2408.01725": "|**2024-08-03**|**The Drama Machine: Simulating Character Development with LLM Agents**|Liam Magee 
et.al.|[2408.01725](http://arxiv.org/abs/2408.01725)|null|This paper explores the use of multiple large language model (LLM) agents to simulate complex, dynamic characters in dramatic scenarios. We introduce a 'drama machine' framework that coordinates interactions between LLM agents playing different 'Ego' and 'Superego' psychological roles. In role-play simulations, this design allows interactive dialogue and a character's internal monologue to develop in parallel. We apply the framework to two dramatic scenarios, an interview and a detective story, and compare character development with and without the Superego's influence. Though exploratory, the results suggest that this approach can produce more nuanced, adaptive narratives that evolve over a sequence of dialogue turns. We discuss different modes of LLM-based role-play and what they may mean for the conceptualization of AI subjectivity. The paper concludes by considering how this approach opens up possibilities for thinking about the role of internal conflict and social performativity in AI-based simulation.|\n", "2408.01703": "|**2024-08-03**|**WaitGPT: Monitoring and Steering Conversational LLM 
Agent in Data Analysis with On-the-Fly Code Visualization**|Liwenhan Xie et.al.|[2408.01703](http://arxiv.org/abs/2408.01703)|null|Large language models (LLMs) support data analysis through conversational user interfaces, as exemplified by OpenAI's ChatGPT (formerly known as Advanced Data Analysis or Code Interpreter). In essence, LLMs generate code to accomplish various analysis tasks. However, presenting raw code directly can obscure the logic and hinder user verification. To give users better understanding of and control over LLM-conducted data analysis, we propose a novel approach that transforms LLM-generated code into an interactive visual representation on the fly. Users receive a clear, step-by-step visualization of the LLM-generated code in real time, allowing them to understand, verify, and modify each data operation in the analysis. Our design decisions are informed by a formative study (N=8) of users' practices and challenges. We further developed a prototype named WaitGPT and conducted a user study (N=12) to evaluate its usability and effectiveness. The results of the user study show that WaitGPT facilitates monitoring and steering of LLM-conducted data analysis, improving participants' error detection and increasing their overall confidence in the results.|\n", "2408.01667": "|**2024-08-03**|**Automated Phishing Detection Using URLs and Webpages**|Huilin Wang et.al.|[2408.01667](http://arxiv.org/abs/2408.01667)|null|This project addresses the limitations of conventional reference-based phishing detection by building an agent framework that leverages a large language model (LLM). By actively fetching and using online information, the framework provides a dynamic reference system that enables more accurate phishing detection. This removes the dependence on a static knowledge base and significantly improves the adaptability and efficiency of automated security measures. The project report begins with an initial study and problem analysis of existing solutions, which motivated us to develop a new framework. We demonstrate the framework with a simulated LLM agent and detail the techniques required to build it, followed by a complete example implementation and experiments that evaluate the new approach against comparable solutions. The results show that our approach achieves an accuracy of 0.945, exceeding the existing solution DynaPhish by 0.445. 
The experimental results indicate that the framework can significantly improve the effectiveness of current reference-based phishing detection methods and has the potential to be adapted for real-world applications. We also discuss the limitations of the approach and propose improvement strategies to further increase its effectiveness. The proposed framework thus offers an effective way to strengthen existing reference-based phishing detection and could be applied in real-world scenarios.|\n", "2408.03910": "|**2024-08-11**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|Large language models (LLMs) excel at stand-alone code tasks such as HumanEval and MBPP but struggle with entire code repositories. This has motivated research on enhancing LLM-codebase interaction at the repository scale. Current solutions rely on similarity-based retrieval or on manual tools and APIs, each with notable drawbacks. Similarity-based retrieval often has low recall on complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge, which reduces their generalizability across diverse code tasks and real-world applications. 
To mitigate these limitations, we introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories. By leveraging the structural properties of graph databases and the flexibility of the graph query language, CodexGraph enables the LLM agent to construct and execute queries, allowing precise, code-structure-aware context retrieval and code navigation. We evaluate CodexGraph on three benchmarks: CrossCodeEval, SWE-bench, and EvoCodeBench. We also develop five real-world coding applications. With a unified graph database schema, CodexGraph demonstrates competitive performance and potential in both academic and real-world settings, showing its versatility and efficacy in software engineering. Our application demo: https://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent.|\n", "2408.03631": "|**2024-08-07**|**Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent**|Yanhu Wang 
et.al.|[2408.03631](http://arxiv.org/abs/2408.03631)|null|Traditional base station siting (BSS) methods rely heavily on drive testing and user feedback, which are time-consuming and require experts with extensive knowledge of communication, networking, and optimization. As large language models (LLMs) and their associated technologies advance, particularly in prompt engineering and agent engineering, network optimization is set to undergo a revolutionary shift. This shift involves the strategic use of well-crafted prompts to infuse human experience and knowledge into these sophisticated LLMs, and the deployment of autonomous agents as a communication bridge that connects the LLMs with human users through natural language. This integration represents a future paradigm of artificial intelligence (AI) as a service and of AI that makes life more convenient. 
As a preliminary exploration, this research first develops an LLM-empowered BSS optimization framework and proposes four potential implementations: a prompt-optimized LLM (PoL), a human-in-the-loop LLM (HiLL), an LLM-empowered autonomous BSS agent (LaBa), and cooperative multiple LLM-based autonomous BSS agents (CLaBa). Evaluated on real-world data, the experiments show that prompt-assisted LLMs and LLM-based agents can generate more efficient, cost-effective, and reliable network deployments, noticeably improving the efficiency of BSS optimization and reducing unnecessary manual involvement.|\n", "2408.04168": "|**2024-08-08**|**Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions**|Qingbin Zeng 
et.al.|[2408.04168](http://arxiv.org/abs/2408.04168)|**[link](https://github.com/hiyouga/llama-factory)**|This paper considers an AI agent in a city navigation scenario: the agent is given a language description of the goal location relative to some well-known landmarks and, by only observing the surrounding scene, including recognizing landmarks and road network connections, must make decisions to navigate to the goal location without instructions. The problem is challenging because it requires the agent to establish its own position and acquire a spatial representation of a complex urban environment in which landmarks are often not visible. In the absence of navigation instructions, such abilities are vital for making high-quality decisions in long-range city navigation. With the emergent reasoning ability of large language models (LLMs), a tempting baseline is to prompt LLMs to 'react' to each observation and make decisions accordingly. However, this baseline performs very poorly: the agent often revisits the same locations and makes short-sighted, inconsistent decisions. To address these issues, this paper introduces a novel agentic workflow characterized by its abilities to perceive, reflect, and plan. Specifically, we find that a fine-tuned LLaVA-7B can perceive the direction and distance of landmarks accurately enough for city navigation. 
Reflection is achieved through a memory mechanism, in which past experiences are stored and retrieved with the current perception to support effective decision-making. Planning uses the reflection results to produce long-term plans, which avoids short-sighted decisions in long-range navigation. Experimental results show that the designed workflow significantly improves the navigation ability of the LLM agent compared with state-of-the-art baselines.|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|This paper aims to fill a research gap as large language models (LLMs) bring autonomous agents closer to artificial general intelligence (AGI). Although LLMs show strong generalization and emergent capabilities, there is a lack of research on how LLM-based agents behave, why they may fail, and how to improve them, particularly in demanding real-world planning tasks. To help fill this gap, we use a realistic benchmark, TravelPlanner, in which an agent must satisfy multiple constraints to generate accurate plans. 
Using this benchmark, we conduct comprehensive experiments on four key research questions: (1) are LLM agents robust enough to lengthy and noisy contexts when reasoning and planning? (2) can few-shot prompting harm the performance of LLM agents in long-context scenarios? (3) can we rely on refinement to improve plans? and (4) can fine-tuning LLMs with both positive and negative feedback lead to further improvement? The experiments indicate that, first, although LLMs can handle extensive reference information and few-shot examples, they often fail to attend to the crucial parts of a long context; second, they still struggle to analyze long plans and cannot provide accurate feedback for refinement; third, we propose Feedback-Aware Fine-Tuning (FAFT), which leverages both positive and negative feedback and yields substantial gains over supervised fine-tuning (SFT) alone. Our findings offer the community in-depth insights into real-world planning applications.|\n", "2408.05346": "|**2024-08-13**|**DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts**|Mohammed Saidul Islam 
et.al.|[2408.05346](http://arxiv.org/abs/2408.05346)|**[link](https://github.com/saidul-islam98/DataNarrative)**|Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate highlighted bars and lines in charts with textual annotations that explain the insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, and often needs human intervention, which is time-consuming and mentally taxing. While large language models (LLMs) excel at various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. We therefore introduce a new task, data story generation, and a benchmark containing 1,449 stories from various sources. To tackle the challenge of crafting coherent data stories, we propose a multi-agent framework that uses two LLM agents to replicate the human storytelling process: one understands and describes the data and generates the outline and narration, and the other performs verification at each intermediate step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. 
Experimental results show that a DEI-guided committee of agents can surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents with a maximum individual resolve rate of 27.3% on SWE-Bench Lite achieves a 34.3% resolve rate with DEI, a 25% relative improvement, and beats many closed-source solutions. Our best-performing group reaches a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to research on collaborative AI systems and reveal their potential to solve complex software engineering challenges.|\n", "2408.06520": "|**2024-08-12**|**Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning**|Chuanneng Sun et.al.|[2408.06520](http://arxiv.org/abs/2408.06520)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u8fd9\u4f7f\u5b83\u4eec\u6210\u4e3a\u673a\u5668\u4eba\u51b3\u7b56\u7684\u6709\u5e0c\u671b\u5019\u9009\u8005\u3002\u53d7\u5230\u5c42\u6b21\u5f3a\u5316\u5b66\u4e60\uff08HRL\uff09\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u6846\u67b6\u2014\u2014\u5728\u4e0a\u4e0b\u6587\u4e2d\u8fdb\u884c\u5c42\u6b21\u5316\u7684\u5f3a\u5316\u5b66\u4e60\uff08HCRL\uff09\u3002\u8be5\u6846\u67b6\u901a\u8fc7LLM\u57fa\u9ad8\u5c42\u7b56\u7565\u5206\u89e3\u590d\u6742\u4efb\u52a1\uff0c\u5373\u901a\u8fc7\u5728\u6267\u884c\u65f6\u52a8\u6001\u5206\u89e3\u590d\u6742\u4efb\u52a1\u4e3a\u5b50\u4efb\u52a1\uff0c\u4ece\u800c\u5229\u7528\u9ad8\u9636\u7b56\u7565\u6765\u5b9a\u4e49\u76ee\u6807\uff0c\u8fd9\u4e9b\u76ee\u6807\u7531\u5b50\u4efb\u52a1\u7ec4\u6210\uff0c\u5e76\u5206\u914d\u7ed9\u4f4e\u9636\u7b56\u7565\u4ee5\u5b8c\u6210\u3002\u4e00\u65e6LLM\u4ee3\u7406\u786e\u5b9a\u76ee\u6807\u5df2\u5b8c\u6210\uff0c\u5219\u4f1a\u63d0\u51fa\u65b0\u7684\u76ee\u6807\u3
002 \u4e3a\u4e86\u63d0\u9ad8\u591a\u8f6e\u6267\u884c\u4e2d\u7684\u4ee3\u7406\u6027\u80fd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e8b\u540e\u6a21\u5757\u5316\u53cd\u601d\uff08HMR\uff09\uff0c\u5176\u4e2d\uff0c\u4ee3\u7406\u4e0d\u662f\u5bf9\u5b8c\u6574\u8f68\u8ff9\u8fdb\u884c\u53cd\u601d\uff0c\u800c\u662f\u5c06\u4efb\u52a1\u76ee\u6807\u66ff\u6362\u4e3a\u4e2d\u95f4\u76ee\u6807\uff0c\u5e76\u8ba9\u4ee3\u7406\u5bf9\u8f83\u77ed\u7684\u8f68\u8ff9\u8fdb\u884c\u53cd\u601d\uff0c\u4ee5\u63d0\u9ad8\u53cd\u601d\u6548\u7387\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u57fa\u51c6\u73af\u5883\u4e2d\u8bc4\u4f30\u4e86\u6240\u63d0\u51fa\u7684HCRL\u7684\u51b3\u7b56\u80fd\u529b\u2014\u2014ALFWorld\u3001Webshop\u548cHotpotQA\u3002\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5f3a\u5927\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u57fa\u7ebf\u76f8\u6bd4\uff0c\u5728\u4e94\u8f6e\u6267\u884c\u4e2d\uff0cHCRL\u53ef\u5b9e\u73b09%\u300142%\u548c10%\u7684\u6027\u80fd\u63d0\u5347\u3002|\n", "2408.07199": "|**2024-08-13**|**Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents**|Pranav Putta 
et.al.|[2408.07199](http://arxiv.org/abs/2408.07199)|null|Large language models (LLMs) have shown remarkable capabilities on natural language tasks that require complex reasoning, yet applying them to agentic, multi-step reasoning in interactive environments remains a challenge. Traditional supervised pre-training on static datasets falls short of giving autonomous agents the capabilities they need for complex decision-making in dynamic settings such as web navigation. Previous attempts to bridge this gap through supervised fine-tuning often suffer from compounding errors and limited exploration data, resulting in sub-optimal policies. To overcome these challenges, we propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iteratively fine-tunes on agent interactions using an off-policy variant of the Direct Preference Optimization (DPO) algorithm. This approach lets LLM agents learn effectively from both successful and unsuccessful trajectories, improving their generalization on complex, multi-step reasoning tasks. We validate our approach in the WebShop environment, a simulated e-commerce platform, where it outperforms behavior cloning and reinforced fine-tuning baselines and, when equipped with online search, beats average human performance. In real-world booking scenarios, our method raises the zero-shot success rate of the Llama-3 70B model from 18.6% to 81.7% (a 340% relative increase) after a single day of data collection, and further to 95.4% with online search. We believe this represents a substantial step forward in the capabilities of autonomous agents, paving the way for more sophisticated and reliable decision-making in real-world settings.|\n", 
"2408.08158": "|**2024-08-15**|**EmBARDiment: an Embodied AI Agent for Productivity in XR**|Riccardo Bovo et.al.|[2408.08158](http://arxiv.org/abs/2408.08158)|null|XR devices running chatbots powered by large language models (LLMs) have tremendous potential as always-on agents that enable more efficient workflows. However, screen-based chatbots do not take advantage of the full suite of natural inputs available in XR, including inward-facing sensor data; instead they over-rely on explicit voice or text prompts, sometimes paired with multimodal data dropped in as part of the query. We propose a solution that uses an attention framework to derive context implicitly from user actions, eye gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts and fosters grounded, intuitive interactions that glean user insights for the chatbot. Our user studies demonstrate the feasibility and transformative potential of our approach for interacting with chatbots in XR, while offering insights for the design of future XR-embodied LLM agents.|\n", 
"2408.08054": "|**2024-08-15**|**Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework**|Changyu Du et.al.|[2408.08054](http://arxiv.org/abs/2408.08054)|null|The conventional building information modeling (BIM) authoring process typically requires designers to master complex and tedious modeling commands in order to realize their design intentions in BIM authoring tools. This additional cognitive burden complicates the design process and hinders the adoption of BIM and model-based design in the architecture, engineering, and construction (AEC) industry. To express design intentions more intuitively, we propose Text2BIM, a multi-agent framework based on large language models (LLMs) that generates 3D building models from natural language instructions. The framework orchestrates multiple LLM agents that collaborate and reason to convert textual user input into imperative code that calls the BIM authoring tool's APIs, producing editable BIM models with internal layouts, external envelopes, and semantic information directly in the software. In addition, a rule-based model checker is introduced that uses predefined domain knowledge to guide the LLM agents in resolving issues in the generated models and iteratively improving model quality. Extensive experiments compare and analyze the performance of three different LLMs under the proposed framework. The evaluation results show that our approach can effectively generate high-quality, structurally rational building models that are aligned with the abstract concepts specified by user input. Finally, an interactive software prototype was developed that integrates the framework into the BIM authoring software Vectorworks, showcasing the potential of modeling by chatting.|\n", 
"2408.09955": "|**2024-08-20**|**MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems**|Qian Wang 
et.al.|[2408.09955](http://arxiv.org/abs/2408.09955)|null|With the rise of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, the agents in these systems mostly follow predefined standard operating procedures (SOPs) that remain unchanged throughout the interaction, so they lack autonomy and scalability. In addition, current solutions often overlook the need for effective agent cooperation. To overcome these limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM agent systems. MegaAgent leverages agent autonomy to dynamically generate agents based on task requirements, and it integrates automatic task division, system-level planning and monitoring of agent activities, and management of concurrent operations. MegaAgent also adopts a hierarchical structure and uses system-level parallelism to improve performance and communication efficiency. 
We demonstrate the effectiveness of MegaAgent through Gobang game development, where it outperforms popular LLM-MA systems, and through a national policy simulation, which confirms its high autonomy and its ability to scale rapidly to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first large-scale LLM-MA system with no predefined SOPs, high effectiveness, and high scalability, paving the way for further research in this field. Our code is at.|\n", 
"2408.09785": "|**2024-08-19**|**GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making**|Arsham Gholamzadeh Khoee 
et.al.|[2408.09785](http://arxiv.org/abs/2408.09785)|null|In the automotive industry, traditional methods for deciding on software deployment typically rely on manual analysis of tabular test data. These methods often lead to higher costs and delays in the software release cycle because they are labor-intensive. Large language models (LLMs) offer a promising solution to these problems. However, applying them generally requires multiple rounds of human-driven prompt engineering, which limits their practical deployment, particularly for industrial end users who need reliable and efficient results. In this paper, we propose GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and industrial constraints. Unlike previous systems, GoNoGo is specifically tailored to domain-specific and risk-sensitive systems. We evaluate GoNoGo's performance across different task difficulties using zero-shot and few-shot examples taken from industrial practice. Our results show that GoNoGo achieves a 100% success rate on tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even on more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks, significantly reducing the need for manual intervention. In summary, GoNoGo is an efficient and user-friendly LLM-based solution that is currently used at our industrial partner's company to assist with software release decision-making, supporting more informed and timely decisions in the release process for risk-sensitive vehicle systems.|\n", 
"2408.09559": "|**2024-08-18**|**HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model**|Mengkang Hu et.al.|[2408.09559](http://arxiv.org/abs/2408.09559)|**[link](https://github.com/hiagent2024/hiagent)**|**Large language model (LLM)-based agents show great potential across diverse domains, operating as interactive systems that process environmental observations and generate executable actions for target tasks. The effectiveness of these agents depends heavily on their memory mechanism, which records historical experiences as sequences of action-observation pairs. We divide memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While considerable research has optimized performance through cross-trial memory, improving agent performance through better use of working memory remains underexplored. Existing approaches often feed the entire history of action-observation pairs directly into the LLM, which leads to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that uses subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts the LLM to formulate a subgoal before generating executable actions and lets the LLM proactively decide to replace previous subgoals, retaining only the action-observation pairs relevant to the current subgoal. Experimental results on five long-horizon tasks show that HiAgent doubles the success rate and reduces the average number of steps required by 3.8. Our analysis further shows that HiAgent consistently improves performance across steps, highlighting its robustness and generalizability. Project page: https://github.com/HiAgent2024/HiAgent**|\n", 
"2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu 
et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**Large language models (LLMs) have shown potential in vision-and-language navigation (VLN) tasks, yet current applications face challenges. While LLMs excel in general conversation scenarios, they struggle with specialized navigation tasks and perform worse than models designed specifically for VLN. We introduce FLAME (FLAMingo-Architected Embodied Agent), a novel multimodal LLM-based agent and architecture designed for urban VLN tasks that efficiently handles multiple observations. Our approach uses a three-phase tuning technique to adapt to navigation tasks: single-perception tuning for street-view description, multiple-perception tuning for trajectory summarization, and end-to-end training on VLN datasets. The synthetic datasets are generated automatically. Experimental results show that FLAME outperforms existing methods, improving the task completion rate on the Touchdown dataset by 7.3%. This work demonstrates the potential of multimodal LLMs in complex navigation tasks and represents a step toward practical applications of multimodal LLMs in embodied AI. Project page: https://flame-sjtu.github.io**|\n", "2408.11021": 
"|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|Thanks to their emergent capabilities, large language models (LLMs) are being used as language-based agents that perform a variety of tasks and make increasingly autonomous decisions. These autonomous agents can understand high-level instructions, interact with their environments, and carry out complex tasks using the tools available to them. As the capabilities of these agents expand, ensuring their safety and trustworthiness becomes ever more important. In this study, we introduce the Athena framework, which uses the concept of “verbal contrastive learning”: past safe and unsafe trajectories serve as in-context (contrastive) examples that guide the agent toward safety while it completes a given task. The framework also incorporates a critiquing mechanism that guides the agent to prevent risky actions at every step. Furthermore, because no existing benchmark evaluates the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation shows that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.|\n", 
"2408.10455": "|**2024-08-24**|**IDEA: Enhancing the Rule Learning Ability of Language Agents through Induction, Deduction, and Abduction**|Kaiyu He et.al.|[2408.10455](http://arxiv.org/abs/2408.10455)|null|This paper introduces RULEARN, a new benchmark designed to assess the inductive reasoning ability of large language models (LLMs) in interactive environments. In RULEARN, agents interact with the environment to gather observations and infer patterns from them, which they then use to solve problems. To strengthen the inductive reasoning of LLM agents on this benchmark, we introduce the IDEA agent, which combines three reasoning processes: induction, deduction, and abduction. The IDEA agent follows a structured reasoning sequence: it first generates hypotheses through abduction, then tests them through deduction, and finally refines them adaptively based on feedback. This sequence lets the agent establish and apply rules dynamically, mimicking human reasoning. An evaluation of five representative LLMs shows that, although these models can generate plausible initial hypotheses, they struggle with strategic interaction within the environment, effective incorporation of feedback, and adaptive refinement of their hypotheses. The IDEA agent shows significantly improved performance on the RULEARN benchmark, offering valuable insights for developing agents capable of human-like rule learning in real-world scenarios. We will release our code and data.|\n", 
"2408.12142": "|**2024-08-22**|**MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents**|Congchi Yin et.al.|[2408.12142](http://arxiv.org/abs/2408.12142)|**[link](https://github.com/lemonsis/mdd-5k)**|**In most mental disorder diagnoses, the conversation between clinician and patient is the primary basis for diagnosis. Building datasets of such diagnostic conversations promises to advance AI mental healthcare. However, collecting conversations directly in real diagnostic settings is extremely difficult because of strict privacy and ethical constraints. To address this problem, we synthesize diagnostic conversations from anonymized patient cases, which are easy to obtain. Specifically, we design a neuro-symbolic multi-agent framework that uses large language models to synthesize diagnostic conversations for mental disorders. The framework takes a patient case as input and can generate multiple diverse conversations for a single case. At its core, a doctor agent interacts with a patient agent, and a tool agent uses a dynamic diagnosis tree to achieve text generation under symbolic control. Applying the proposed method, we develop MDD-5k, the largest Chinese mental disorder diagnosis dataset, built on 1000 cleaned real patient cases in cooperation with a leading psychiatric hospital and containing 5000 high-quality long conversations labeled with diagnosis results. To the best of our knowledge, it is the first labeled dataset for Chinese mental disorder diagnosis. Human evaluation shows that the MDD-5k dataset successfully simulates the diagnostic process for mental disorders. The dataset and code will be made publicly available at https://github.com/lemonsis/MDD-5k**|\n", 
"2408.12680": "|**2024-09-01**|**Can LLMs Understand Social Norms in Autonomous Driving Games?**|Boxuan Wang 
et.al.|[2408.12680](http://arxiv.org/abs/2408.12680)|null|This paper explores the use of large language models (LLMs) for understanding and modeling social norms in autonomous driving games. We introduce LLMs into autonomous driving games as intelligent agents that make decisions based on text prompts describing the environment setup and their observations. Our framework has LLM-based agents play Markov games in a multi-agent system (MAS), which lets us study how social norms emerge among individual agents. We design experiments that use the OpenAI Chat API, powered by GPT-4.0, to simulate interactions and evaluate the performance of LLM-based agents in two driving scenarios: an unsignalized intersection game and a highway platoon game. The results show that LLM-based agents can handle dynamically changing environments in Markov games and that social norms emerge among the agents in both scenarios. In the intersection game, LLM-based agents tend to adopt a conservative driving policy when facing a potential car crash. The advantage of LLM-based agents in games lies in their operational flexibility and analyzability, which facilitate experimental design.|\n", "2408.14307": "|**2024-08-26**|**LLM-3D 
Print: Large Language Models To Monitor and Control 3D Printing**|Yayati Jadhav et.al.|[2408.14307](http://arxiv.org/abs/2408.14307)|null|Industry 4.0 has revolutionized manufacturing by driving digitalization and shifting toward additive manufacturing (AM). Fused Deposition Modeling (FDM), a key AM technology, creates highly customized, cost-effective products with minimal material waste through layer-by-layer extrusion, posing a significant challenge to traditional subtractive methods. However, material extrusion techniques are error-prone and often require expert intervention to detect and mitigate defects that can severely compromise product quality. Although automated error detection and machine learning models exist, they generalize poorly across different 3D printer setups, firmware, and sensors, and deep learning methods require large labeled datasets, which limits their scalability and adaptability. 
To address these challenges, we present a process monitoring and control framework that uses large language models (LLMs) alongside 3D printers to detect and resolve printing defects. The LLM evaluates print quality by analyzing images captured after each layer or print segment, identifies failure modes, and queries the printer for relevant parameters. It then generates and executes a corrective action plan. We validated the framework's ability to identify defects by comparing it against a group of engineers with diverse AM expertise. Our evaluation shows that LLM-based agents not only accurately identify common 3D printing errors, such as inconsistent extrusion, stringing, warping, and layer adhesion problems, but also effectively determine the parameters causing these failures and autonomously correct them without any human intervention.|\n", "2408.14033": "|**2024-09-02**|**MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents**|Ruochen Li 
et.al.|[2408.14033](http://arxiv.org/abs/2408.14033)|**[link](https://github.com/du-nlp-lab/mlr-copilot)**|**Machine learning research is vital to technological progress and innovation, but it often faces challenges such as high complexity, long experimental cycles, and the need for specialized expertise. To address these challenges, we propose a new systematic framework, Autonomous Machine Learning Research with Large Language Models (MLR-Copilot), which aims to raise the productivity of machine learning research by using large language model (LLM) agents to generate and implement research ideas automatically. The framework has three stages: research idea generation, experiment implementation, and execution. First, the LLM-based IdeaAgent uses existing research papers to generate hypotheses and experimental plans. Next, in the implementation generation stage, ExperimentAgent turns these plans into executable code. This stage draws on retrieved prototype code and, where needed, retrieves candidate models and data. Finally, the execution stage, also managed by ExperimentAgent, runs the experiments and applies human feedback and iterative debugging to increase the likelihood of reaching executable research outcomes. 
We evaluated the framework on five machine learning research tasks, and the experimental results show its potential to facilitate research progress and innovation.**|\n", "2408.13986": "|**2024-08-26**|**AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework**|Jie Feng et.al.|[2408.13986](http://arxiv.org/abs/2408.13986)|**[link](https://github.com/tsinghua-fib-lab/agentmove)**|**Human mobility prediction plays a key role in many real-world applications. Although deep learning models have shown promising results over the past decade, their reliance on large amounts of private mobility data for training and their inability to make zero-shot predictions have hindered further progress. Recently, attempts have been made to use large language models (LLMs) for mobility prediction. However, their performance is limited by the lack of a systematically designed workflow. They use LLMs to generate the final output directly, which limits the potential of LLMs to uncover complex mobility patterns and underestimates their extensive store of global geospatial knowledge. This paper proposes AgentMove, a systematic agentic prediction framework for general mobility prediction in any city worldwide. 
In AgentMove, we first decompose the mobility prediction task into three subtasks and design a module for each: a spatial-temporal memory for mining individual mobility patterns, a global knowledge generator for modeling the effects of urban structure, and a collective knowledge extractor for capturing patterns shared across the population. Finally, we combine the results of the three modules and run a reasoning step to produce the final prediction. Extensive experiments on data from 12 cities drawn from two sources show that AgentMove outperforms the best baseline by more than 8% on various metrics, gives robust predictions across different cities, performs well with different base LLMs, and exhibits lower geographical bias. The code and data can be found at https://github.com/tsinghua-fib-lab/AgentMove.**|\n", "2408.13406": "|**2024-08-23**|**Optimizing Collaboration of LLM based Agents for Finite Element Analysis**|Chuan Tian 
et.al.|[2408.13406](http://arxiv.org/abs/2408.13406)|null|This paper investigates multi-agent interaction among large language models (LLMs) in programming and coding tasks. We use the AutoGen framework to facilitate communication between agents and evaluate different configurations by their success rate over 40 random runs per setup. The study focuses on developing a flexible automation framework for applying the finite element method to linear elastic problems. Our findings emphasize the importance of optimizing agent roles and clearly defining their responsibilities, rather than merely increasing the number of agents. Effective collaboration among agents proves crucial for addressing general challenges of the finite element method. This research demonstrates the potential of LLM multi-agent systems to enhance computational automation in simulation methodologies, paving the way for future advances in engineering and artificial intelligence.|\n", "2408.14972": "|**2024-08-27**|**AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems**|Chi-Min Chan 
et.al.|[2408.14972](http://arxiv.org/abs/2408.14972)|**[link](https://github.com/chanchimin/agentmonitor)**|**The rapid development of large language models (LLMs) has driven the rise of LLM-based agents. Recent studies find that a multi-agent system (MAS), in which each agent plays a specific role, usually outperforms a single LLM. However, configuring a MAS for a task remains challenging, because task performance can only be observed after execution. Inspired by scaling laws in LLM development, we explore whether the performance of a MAS can be predicted before the task is executed. To this end, we introduce AgentMonitor, a framework that integrates at the agent level to capture inputs and outputs and converts them into statistics used to train a regression model that predicts task performance. In addition, AgentMonitor can correct, in real time, security risks that may be caused by malicious agents, mitigating negative effects and strengthening the security of the MAS. 
Experiments show that an XGBoost model achieves a Spearman correlation coefficient of 0.89 in in-domain scenarios and 0.58 in more challenging scenarios. Applying AgentMonitor reduces harmful content by 6.2% and increases helpful content by 1.8% on average, which significantly improves safety and reliability. The related code has been open-sourced.**|\n", "2408.15778": "|**2024-09-05**|**LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models**|Jiayi Gui et.al.|[2408.15778](http://arxiv.org/abs/2408.15778)|**[link](https://github.com/hypatiaalegra/logicgame-data)**|This paper introduces LogicGame, a new benchmark designed to evaluate the comprehensive abilities of large language models (LLMs) in rule understanding, rule execution, and multi-step planning. Unlike traditional benchmarks, LogicGame provides a variety of games, each with a set of rules and an initial state, and requires the model to understand and apply the predefined rules to solve a problem. We create simulated scenarios in which the model executes or plans operations to reach a specific goal. The game scenarios are designed to distinguish logical reasoning from mere reliance on knowledge, since they depend entirely on the preset rules. This separation allows a pure evaluation of rule-based reasoning ability. 
The evaluation considers not only the final result but also the intermediate steps, giving a comprehensive assessment of model performance. Moreover, the intermediate steps are deterministic and can be verified automatically. LogicGame defines game scenarios at different difficulty levels, from simple rule application to complex reasoning chains, in order to measure precisely how well a model understands rules and executes multiple steps. Using LogicGame, we tested a range of LLMs and found notable shortcomings in their rule-based logical reasoning.|\n", "2408.16090": "|**2024-08-28**|**EPO: Hierarchical LLM Agents with Environment Preference Optimization**|Qi Zhao et.al.|[2408.16090](http://arxiv.org/abs/2408.16090)|**[link](https://github.com/kevinz8866/epo)**|This paper proposes a hierarchical framework that decomposes complex tasks into manageable subgoals. The framework uses separate language models for subgoal prediction and low-level action generation. To address the challenge of creating training signals for unannotated datasets, we develop a reward model that uses multimodal environment feedback to generate reward signals automatically. 
We introduce Environment Preference Optimization (EPO), a method that generates preference signals from environment feedback and uses them to train language-model-based agents. Experiments on ALFRED show that our framework achieves leading performance, taking first place on the ALFRED public leaderboard, and demonstrate its potential to improve long-horizon decision-making in diverse environments.|\n", "2408.16991": "|**2024-08-30**|**Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios**|Zhongyuan Wang et.al.|[2408.16991](http://arxiv.org/abs/2408.16991)|null|This paper proposes a tool-assisted agent framework for SQL inspection and refinement, aimed at improving the ability of large language models (LLMs) to handle real-world queries. The framework equips an LLM agent with two specialized tools, a retriever and a detector, to diagnose and correct database mismatches in SQL queries. These tools strengthen the ability of the LLM to handle the database mismatch problems that arise in real scenarios, such as condition mismatches and stricter constraint mismatches. 
We also introduce Spider-Mismatch, a new dataset built specifically to reflect the condition mismatch problems encountered in the real world. Experiments show that, in the few-shot setting, our method achieves the best average performance on the Spider and Spider-Realistic datasets and significantly outperforms baseline methods, and that it also performs better on the more realistic Spider-Mismatch dataset.|\n", "2409.00993": "|**2024-09-02**|**Evolution of Social Norms in LLM Agents using Natural Language**|Ilya Horiguchi et.al.|[2409.00993](http://arxiv.org/abs/2409.00993)|null|Recent advances in large language models (LLMs) have spurred interest in using them for game-theoretic simulations, in which LLMs act as individual agents engaged in social interaction. Building on and extending Axelrod's work on metanorm games, this paper studies whether LLM agents can spontaneously generate and adhere to normative strategies through natural language dialogue. Our experiments show that, through dialogue, LLM agents can form complex social norms purely by natural language interaction, including metanorms, that is, norms that enforce the punishment of those who fail to punish cheating. 
The results confirm the effectiveness of using LLM agents to simulate social interaction and to understand how complex strategies and norms evolve through natural language. Future work may extend to a wider range of scenarios and agent characteristics to reveal more of the subtle mechanisms behind the formation of social norms.|\n", "2409.00985": "|**2024-09-02**|**Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces**|Jiapeng Yu et.al.|[2409.00985](http://arxiv.org/abs/2409.00985)|**[link](https://github.com/yuqian2003/co_learning)**|**Online question-answering systems based on large language models are gradually shifting from entertainment to professional use. This paper proposes a multi-agent framework called the Code Learning (Co-Learning) community, combined with environmental reinforcement learning (E-RL), to help beginners correct code errors independently. The system evaluates several large language models on an original dataset of 702 erroneous code samples and uses the results as the standard for reward or punishment in E-RL. By analyzing the erroneous code given to the current agent, it selects a suitable LLM-based agent to maximize error correction accuracy and reduce correction time. 
Experiments show that, compared with the method without E-RL, this approach improves the precision score by 3% and reduces time cost by 15%. Our source code is available at https://github.com/yuqian2003/Co_Learning**|\n", "2409.00135": "|**2024-08-29**|**HoneyComb: A Flexible LLM-Based Agent System for Materials Science**|Huan Zhang et.al.|[2409.00135](http://arxiv.org/abs/2409.00135)|null|To cope with the complexity of materials science tasks and address the problems large language models (LLMs) face in this field, such as reduced accuracy and hallucination caused by reliance on outdated implicit knowledge, we propose HoneyComb, the first LLM-based agent system designed specifically for materials science. HoneyComb draws on a high-quality materials science knowledge base built from reliable literature (MatSciKB) and a novel tool hub (ToolHub) to strengthen its reasoning and computational abilities specific to materials science. 
MatSciKB is a carefully curated, structured collection of knowledge intended to cover the key information in materials science. ToolHub uses an inductive tool construction method to generate, decompose, and refine API tools suited to materials science, which greatly improves the practicality of the system. In addition, HoneyComb includes a retrieval module that selects the most suitable knowledge source or tool for a given task, helping to ensure accurate and relevant answers. Experiments show that HoneyComb significantly outperforms baseline models across a variety of materials science tasks, bridging the gap between current LLM capabilities and the specific needs of materials science. More importantly, our framework extends readily to other scientific domains, showing broad potential for advancing scientific research and applications.|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang 
et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**Although large language models have made notable progress in automatic poetry generation, the generated poetry lacks diversity, and the training process differs greatly from human learning. Starting from the idea that the learning process of a poetry generation system should be more human-like and its output more diverse and novel, we introduce a framework based on social learning in which we emphasize non-cooperative interactions, in addition to cooperative ones, to encourage diversity. Our experiments are the first attempt at an LLM-based multi-agent system for poetry generation in non-cooperative environments, using both training-based agents (GPT-2) and prompting-based agents (GPT-3 and GPT-4). 
In an evaluation of 96,000 generated poems, our framework benefits the poetry generation process of training-based agents, increasing n-gram diversity by 3.0-3.7 percentage points and novelty by 5.6-11.3 percentage points. The poems generated by training-based agents also show group divergence in lexicon, style, and semantics. Prompting-based agents in our framework also benefit from non-cooperative environments, and a more diverse ensemble of models with non-homogeneous agents has the potential to raise diversity further; our experiments show an increase of 7.0-17.5 percentage points. However, prompting-based agents show a decline in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. Our paper argues that creative tasks such as automatic poetry generation should incorporate social learning processes, modeled through LLM-based agents, to mimic human interaction.**|\n", "2409.03440": "|**2024-09-05**|**Rx Strategist: Prescription Verification using LLM Agents System**|Phuc Phan Van 
et.al.|[2409.03440](http://arxiv.org/abs/2409.03440)|null|To protect patient safety, the complexity of modern pharmaceuticals demands strict prescription verification. We propose a new approach, Rx Strategist, which combines knowledge graphs and different search strategies with large language models (LLMs) inside an agentic framework to strengthen their capabilities. This multifaceted technique allows a multi-stage LLM pipeline to be built and information to be retrieved reliably from a custom database of active ingredients. Each stage of the pipeline covers a different aspect of prescription verification, such as indication, dose, and possible drug interactions. By spreading reasoning across these stages, we mitigate the drawbacks of single-LLM techniques, improving correctness and reliability while reducing memory demands. Our results show that Rx Strategist surpasses many current LLMs and performs on par with experienced clinical pharmacists. In the complex world of modern medication, combining LLMs with structured knowledge and advanced search methods offers a viable path to reducing prescription errors and improving patient outcomes.|\n", "2409.03258": "|**2024-09-05**|**GraphInsight: 
Unlocking Insights in Large Language Models for Graph Structure Understanding**|Yukun Cao et.al.|[2409.03258](http://arxiv.org/abs/2409.03258)|null|Although large language models (LLMs) show promise in processing graphs, they struggle to understand graph structure from prompts made of graph description sequences, especially as graph size grows. We attribute this to the uneven memory performance of LLMs across different positions in the graph description sequence, known as positional bias. To address this challenge, we propose GraphInsight, a new framework aimed at improving how well LLMs understand both macro-level and micro-level graph information. GraphInsight rests on two key strategies: 1) placing critical graph information in positions where LLMs show stronger memory performance; 2) for regions with weaker memory performance, exploring a lightweight external knowledge base, inspired by retrieval-augmented generation (RAG). GraphInsight also explores integrating these two strategies into an LLM agent workflow to solve composite graph tasks that require multi-step reasoning. 
Extensive benchmark experiments show that, on graph structure understanding tasks across different graph sizes, GraphInsight significantly outperforms all other graph description methods (for example, prompting techniques and reordering strategies).|\n", "2409.02977": "|**2024-09-04**|**Large Language Model-Based Agents for Software Engineering: A Survey**|Junwei Liu et.al.|[2409.02977](http://arxiv.org/abs/2409.02977)|**[link](https://github.com/fudanselab/agent4se-paper-list)**|**This paper presents a comprehensive and systematic survey of large language model (LLM)-based agents for software engineering (SE). We collected 106 papers and classified them from two perspectives: the software engineering perspective and the agent perspective. We also discuss the key challenges facing the field and directions for future work. The repository for this survey is at https://github.com/FudanSELab/Agent4SE-Paper-List.**|\n", "2409.05001": "|**2024-09-08**|**A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement**|Huan Zhang 
et.al.|[2409.05001](http://arxiv.org/abs/2409.05001)|**[link](https://github.com/nju-websoft/paircoder)**|**Large language models (LLMs) have shown impressive performance in code generation. Although earlier work has enhanced LLMs with prompting techniques and code refinement, they still struggle with complex programming problems owing to rigid solution plans. This paper proposes PairCoder, a new LLM-based framework that imitates the practice of pair programming to address this problem. PairCoder consists of two collaborating LLM agents: a Navigator and a Driver. The Navigator proposes promising solution plans, selects the current best plan, and directs the next round of iteration based on execution feedback. The Driver follows the Navigator's guidance to carry out initial code generation, code testing, and refinement. 
This interleaved, iterative workflow involves multi-plan exploration and feedback-driven refinement, mirroring how a pair of programmers collaborates. We evaluated PairCoder on a range of code generation benchmarks using both open-source and closed-source LLMs. Experiments show that PairCoder is significantly more accurate than prompting an LLM directly, with relative pass@1 gains of 12.00%-162.43%.**|\n", "2409.04617": "|**2024-09-06**|**Sparse Rewards Can Self-Train Dialogue Agents**|Barrett Martin Lattimer et.al.|[2409.04617](http://arxiv.org/abs/2409.04617)|**[link](https://github.com/asappresearch/josh-llm-simulation-training)**|**Recent progress in large language model (LLM) agents for multi-turn dialogue tasks has been driven mainly by supervised fine-tuning and high-quality human feedback. However, as the performance of base LLMs keeps improving, obtaining meaningful human feedback becomes increasingly difficult and costly. In some domains, base LLMs may eventually exceed human ability, making traditional feedback-based methods impractical. This paper therefore proposes a new self-improvement paradigm that lets LLM agents improve their own performance autonomously, without external human feedback. 
We introduce Juxtaposed Outcomes for Simulation Harvesting (JOSH), a self-alignment algorithm that uses a sparse-reward simulation environment to extract ideal behaviors and then further trains the LLM on its own outputs. We also build ToolWOZ, a sparse-reward tool-calling simulation environment derived from MultiWOZ. Experiments show that models trained with JOSH, whether small or frontier, improve significantly on tool-based interactions while retaining general model capabilities across diverse benchmarks. Our code and data are publicly available on GitHub.**|\n", "2409.06351": "|**2024-09-10**|**MAGDA: Multi-agent guideline-driven diagnostic assistance**|David Bani-Harouni 
et.al.|[2409.06351](http://arxiv.org/abs/2409.06351)|null|In emergency departments, rural hospitals, or clinics in developing countries, clinicians often lack fast image analysis by trained radiologists, which can harm patient care. Large language models (LLMs) could ease the burden on these clinicians by providing insights that support their decisions. Although LLMs score highly on medical exams that test theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new zero-shot, guideline-driven decision support approach. We build a system of multiple LLM agents, augmented with a contrastive vision-language model, that collaborate to reach a patient diagnosis. Given simple diagnostic guidelines, the agents synthesize prompts and screen the image for findings according to those guidelines. Finally, they provide an understandable chain of reasoning for the diagnosis, which is then self-refined to account for interdependencies between diseases. Because the method is zero-shot, it suits rare-disease settings, where training data is limited but expert-crafted disease descriptions are available. We evaluate the method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showing improved performance over existing zero-shot methods and generalizability to rare diseases.|\n", 
"2409.09030": "|**2024-09-23**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanlin Wang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**In recent years, large language models (LLMs) have achieved remarkable success across downstream tasks and have been widely applied in software engineering (SE). We find that many studies combining LLMs with SE adopt the concept of agents, either explicitly or implicitly. However, there is no in-depth survey that traces the development of existing work, analyzes how it combines LLM-based agent techniques to optimize various tasks, or clarifies the framework of LLM-based agents in SE. This paper presents the first survey of studies combining LLM-based agents with SE and proposes a framework for LLM-based agents in SE with three key modules: perception, memory, and action. We also summarize the challenges of combining the two fields and propose future opportunities in response to them. We maintain a GitHub repository of related papers at https://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE.**|\n", 
"2409.09013": "|**2024-09-13**|**AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents**|Zhe Su et.al.|[2409.09013](http://arxiv.org/abs/2409.09013)|null|To be deployed safely and successfully, language models (LLMs) must satisfy both truthfulness and utility goals. These two goals often conflict, for example when an AI assistant helps a used-car salesperson sell a car with flaws, partly because of ambiguous or misleading user instructions. We propose AI-LieDar, a framework for studying how LLM-based agents navigate conflicts between utility and truthfulness in multi-turn interactive settings. 
We design a set of realistic scenarios in which language agents are instructed to achieve goals that conflict with being truthful during a multi-turn conversation. To evaluate truthfulness at scale, we develop a truthfulness detector, grounded in the psychology literature, to assess the agents' responses. Our experiments show that all models are truthful less than 50% of the time, although truthfulness and goal achievement (utility) rates vary across models. We further test the steerability of LLMs and find that models follow malicious instructions to deceive, and that even models steered toward truthfulness can still lie. These findings reveal the complexity of truthfulness in LLMs and underscore the need for further research to ensure the safe and reliable deployment of LLMs and AI agents.|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava 
et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|Ensuring that content complies with community guidelines is essential to maintaining healthy online social environments. However, traditional human-based compliance checking struggles to scale with the growing volume of user-generated content and the limited number of moderators. Recent advances in natural language understanding by large language models open new opportunities for automated content compliance verification. This work evaluates six AI agents built on Open-LLMs for automated rule compliance checking in decentralized social networks, a setting made especially challenging by the heterogeneity of community scopes and rules. Analyzing more than 50,000 posts from hundreds of Mastodon servers, we find that the AI agents effectively detect non-compliant content, grasp linguistic subtleties, and adapt to diverse community contexts. Most agents also show high inter-rater agreement and consistency in their score justifications and compliance suggestions. Human evaluation by domain experts confirms the agents' reliability and usefulness, indicating that they are promising tools for semi-automated or human-in-the-loop content moderation systems.|\n", 
"2409.08717": "|**2024-09-13**|**Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents**|Junchi Yao et.al.|[2409.08717](http://arxiv.org/abs/2409.08717)|null|As social media increasingly becomes a key platform for social movements and the formation of public opinion, accurately simulating and predicting user opinion dynamics is important for understanding social phenomena, informing policy, and guiding public opinion. However, existing simulation methods struggle to capture the complexity and dynamics of user behavior. To address this, the paper proposes FDE-LLM, a social media user opinion dynamics simulation method that fuses opinion dynamics with epidemic models, constraining the actions and opinion evolution of large language models (LLMs) so that they better match real online behavior. Specifically, FDE-LLM divides users into opinion leaders and followers. Opinion leaders are driven by LLM role-playing and constrained by a cellular automaton (CA) model, while opinion followers are integrated into a dynamic system that combines the CA model with the SIR model. This design significantly improves the accuracy and efficiency of the simulation. Experiments on four real-world Weibo datasets, validated with the open-source model ChatGLM, show that FDE-LLM achieves higher accuracy and interpretability than traditional agent-based modeling (ABM) opinion dynamics algorithms and LLM-based opinion diffusion algorithms.|\n", 
"2409.10372": "|**2024-09-19**|**Instigating Cooperation among LLM Agents Using Adaptive Information Modulation**|Qiliang Chen et.al.|[2409.10372](http://arxiv.org/abs/2409.10372)|null|This paper introduces a novel framework that uses large language model (LLM) agents as proxies for human strategic behavior, combined with reinforcement learning (RL), to engage these agents in evolving strategic interactions within team environments. Our approach extends traditional agent-based simulation by using strategic LLM agents (SLA) and by introducing dynamic, adaptive governance through a pro-social promoting RL agent (PPA) that modulates information access between agents in a network, optimizing social welfare and promoting pro-social behavior. Validated in iterated games, including the prisoner's dilemma, the framework shows that SLAs exhibit nuanced strategic adaptations. The PPA learns to adjust information transparency effectively, leading to significantly higher cooperation rates. The framework offers important insights into AI-mediated social dynamics and contributes to the deployment of AI in real-world team settings.|\n", 
"2409.09785": "|**2024-09-17**|**Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition**|Chao-Han Huck Yang et.al.|[2409.09785](http://arxiv.org/abs/2409.09785)|null|Given recent advances in generative AI, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using the text decoding results of automatic speech recognition (ASR) models. To explore new capabilities of language modeling for speech processing, this paper introduces the generative speech transcription error correction (GenSEC) challenge. The challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate future LLM-based agents handling voice-based interfaces, while remaining accessible to a broad audience through open pretrained language models or agent-based APIs. The paper also discusses results from baseline evaluations and lessons learned for designing future evaluations.|\n", 
"2409.09584": "|**2024-09-15**|**RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation**|Qingyao Li et.al.|[2409.09584](http://arxiv.org/abs/2409.09584)|null|This paper studies LLM (large language model) agents combined with tree search algorithms for code generation. Current search algorithms in this area suffer from low search quality for three main reasons: 1) a search space poorly suited to the high reasoning demands of code generation; 2) insufficient use of code feedback to guide the search; and 3) inefficient handling of negative feedback, which lowers both search quality and efficiency. 
To address these issues, we propose RethinkMCTS (Rethink Monte Carlo Tree Search), which searches over thoughts at multiple levels before generating code, exploring a wider range of strategies. More importantly, RethinkMCTS builds verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search. This keeps the search on correct reasoning paths and improves the overall quality of the search tree. Experiments show that RethinkMCTS clearly outperforms previous search-based and feedback-based code generation baselines. On the HumanEval dataset, it raises the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 and that of GPT-4o-mini from 87.20 to 94.51. By exploring thoughts more thoroughly and improving the quality of the whole search tree, RethinkMCTS makes the search both broader and deeper.|\n", "2409.09345": "|**2024-09-14**|**Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models**|Yuanzhao Zhai 
et.al.|[2409.09345](http://arxiv.org/abs/2409.09345)|null|This paper proposes using task-relevant Q-value models to guide action selection, improving the performance of large language model (LLM) agents on multi-step decision-making tasks. Specifically, we first use Monte Carlo Tree Search (MCTS) to collect decision trajectories annotated with step-level Q-values and build a preference dataset from them. We then fit these preferences with another LLM through step-level Direct Preference Optimization (DPO), and that LLM serves as the Q-value model. At inference time, at each decision step the LLM agent selects the action with the highest Q-value before interacting with the environment. Applied to a range of open-source and API-based LLM agents, Q-value models significantly improve their performance. Notably, an agent built on Phi-3-mini-4k-instruct improves by 103% on WebShop and by 75% on HotPotQA, even surpassing GPT-4o-mini. Q-value models also offer further advantages, such as generalizing across different LLM agents and integrating seamlessly with existing prompting strategies.|\n", "2409.09271": "|**2024-09-14**|**Python Symbolic 
Execution with LLM-powered Code Generation**|Wenhan Wang et.al.|[2409.09271](http://arxiv.org/abs/2409.09271)|null|This paper presents LLM-Sym, an agent tool powered by large language models (LLMs). It targets a main challenge of applying symbolic execution to dynamically typed languages such as Python. By automatically calling the SMT solver Z3 to solve execution path constraints, LLM-Sym extends a basic symbolic execution engine to support programs that use the complex data type `list`. Its core contribution is translating complex Python path constraints into Z3 code. To make this path-to-Z3 translation accurate, we design a multi-step code generation pipeline that includes type inference, retrieval, and self-refinement. Experiments show that LLM-Sym can solve path constraints in LeetCode problems with complex control flow and list data structures, which the basic symbolic execution engine cannot. The approach paves the way for combining LLMs with the reasoning of symbolic solvers and opens new opportunities for LLM-assisted test case generation.|\n", "2409.11393": "|**2024-09-17**|**LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive 
Core-Agents**|Amine B. Hassouna et.al.|[2409.11393](http://arxiv.org/abs/2409.11393)|null|This paper proposes LLM-Agent-UMF, an LLM-based agent unified modeling framework, to address the lack of a unified software architecture that has resulted from integrating tools into LLM-powered agents and from the improvements proposed across many state-of-the-art works. These efforts have traditionally focused on functionality rather than on defining component boundaries, which has led to terminological and architectural ambiguity among researchers. The framework distinguishes the components of an agent: the LLM, the tools, and a newly introduced core-agent, which acts as the agent's central coordinator and comprises five modules: planning, memory, profile, action, and security. Differences in the internal structure of core-agents lead us to classify them as passive or active. Based on this taxonomy, we propose several multi-core agent architectures that combine the distinctive characteristics of different individual agents. 
To evaluate the framework, we apply it to a selection of state-of-the-art agents, showing that it aligns with their functionalities and clarifies architectural aspects that had been overlooked. We also thoroughly assess four of the proposed architectures by integrating agents with distinct characteristics into hybrid active/passive core-agent systems, which provides clear insight into the potential improvements and challenges of combining specific agents.|\n", "2409.11276": "|**2024-09-17**|**Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments**|Maria Rigaki et.al.|[2409.11276](http://arxiv.org/abs/2409.11276)|null|This paper explores the use of a locally fine-tuned large language model (LLM) as a red-team agent in network security environments. Given the privacy concerns, costs, and network connectivity constraints of commercial cloud-based LLMs, we present Hackphyr, a locally fine-tuned 7-billion-parameter model intended for red-team tasks in network security environments. The model runs on a single GPU card and achieves performance comparable to much larger and more powerful commercial models such as GPT-4. 
Hackphyr clearly outperforms other models, including GPT-3.5-turbo, and baselines such as Q-learning agents in complex, previously unseen scenarios. To achieve this performance, we built a new dataset specific to network security tasks to strengthen the base model. Finally, we conduct a comprehensive analysis of the agents' behaviors, offering insight into the planning abilities and potential shortcomings of such LLM-based agents and contributing to a broader understanding of their use in network security.|\n", "2409.10568": "|**2024-09-14**|**On the limits of agency in agent-based models**|Ayush Chopra 
et.al.|[2409.10568](http://arxiv.org/abs/2409.10568)|**[link](https://github.com/agenttorch/agenttorch)**|**This paper introduces AgentTorch, a framework that scales agent-based models (ABMs) to millions of agents while using large language models (LLMs) to give agents adaptive behavior. When simulating complex systems, the framework aims to capture realistic environment dynamics and adaptive agent behavior while still simulating very large populations efficiently. Recent advances in LLMs offer an opportunity to enhance ABMs, but the computational cost of running LLMs for large populations has limited their widespread use. We benchmark the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and individual agent detail. Using the COVID-19 pandemic as a case study, we show how AgentTorch simulates 8.4 million agents representing New York City to capture the impact of isolation and employment behavior on health and economic outcomes. We compare agent architectures based on heuristics and on LLMs in predicting disease waves and unemployment rates. 
We also demonstrate AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project currently used worldwide for policy-making and scientific discovery. The framework is available at github.com/AgentTorch/AgentTorch.**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|With the help of multimodal large language models (MLLMs), LLM-based agents can interact directly with application user interfaces (UIs), improving agent performance on complex tasks. However, these agents often suffer from high latency and low reliability because of the many sequential UI interactions involved. To address this, we propose AXIS, a novel LLM-based agent framework that prioritizes actions through application programming interfaces (APIs) over UI actions. The framework also facilitates the creation and expansion of APIs by automatically exploring applications. 
Experiments on Word show that, compared with humans, AXIS reduces task completion time by 65%-70% and cognitive workload by 38%-53% while maintaining 97%-98% accuracy. Our work contributes a human-agent-computer interaction (HACI) framework and new UI design principles for application providers in the era of LLMs. It also explores the possibility of turning every application into an agent, paving the way toward an agent-centric operating system (Agent OS).|\n", "2409.16455": "|**2024-09-24**|**MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment**|Venkata Naren Devarakonda et.al.|[2409.16455](http://arxiv.org/abs/2409.16455)|null|This paper proposes MultiTalk, an LLM-based task planning method. LLMs can produce incorrect or incomplete plans because of hallucinations, ambiguities in user instructions, environmental constraints, and limits on the executing agent's capabilities. MultiTalk addresses these issues through a framework of introspective and extrospective dialogue loops. 
MultiTalk\u65b9\u6cd5\u901a\u8fc7\u7279\u5b9a\u7cfb\u7edf\u6765\u63d0\u53d6\u548c\u9884\u6d4b\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u72b6\u6001\uff0c\u5e76\u6807\u8bb0\u51fa\u4eba\u3001LLM\u4ee3\u7406\u548c\u73af\u5883\u4e4b\u95f4\u7684\u4e0d\u5339\u914d\u6216\u504f\u5dee\u3002\u6709\u6548\u7684\u53cd\u9988\u8def\u5f84\u4fc3\u8fdb\u4eba\u4e0eLLM\u4e4b\u95f4\u7684\u6709\u610f\u4e49\u5bf9\u8bdd\u3002\u8fd9\u79cd\u65b9\u6cd5\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u4efb\u52a1\u7684\u5e94\u7528\u4e2d\u5f97\u5230\u4e86\u9a8c\u8bc1\u3002\u5b9e\u9a8c\u548c\u6d88\u878d\u5206\u6790\u5c55\u793a\u4e86MultiTalk\u65b9\u6cd5\u7684\u7a33\u5065\u6027\u548c\u53ef\u9760\u6027\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u7684\u6bd4\u8f83\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u4f53\u4ee3\u7406\u4efb\u52a1\u89c4\u5212\u65b9\u9762\u7684\u4f18\u52bf\u3002 \u603b\u4e4b\uff0cMultiTalk\u63d0\u4f9b\u4e86\u4e00\u79cd\u901a\u8fc7\u589e\u5f3aLLM\u4e0e\u73af\u5883\u3001\u6267\u884c\u8005\u548c\u7528\u6237\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u548c\u6c9f\u901a\u6765\u6539\u8fdb\u4efb\u52a1\u89c4\u5212\u8fc7\u7a0b\u7684\u65b9\u6cd5\uff0c\u4ece\u800c\u63d0\u9ad8\u89c4\u5212\u7684\u6709\u6548\u6027\u548c\u6548\u7387\u3002|\n", "2409.15623": "|**2024-09-23**|**Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality**|Yiwen Xu et.al.|[2409.15623](http://arxiv.org/abs/2409.15623)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSafe Guard\u7684LLM\u4ee3\u7406\uff0c\u7528\u4e8e\u68c0\u6d4b\u793e\u4ea4VR\uff08VRChat\uff09\u4e2d\u7684\u8bed\u97f3\u4ea4\u4e92\u4e2d\u7684\u4ec7\u6068\u8a00\u8bba\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u5229\u7528\u4e86Open AI 
GPT\u548c\u97f3\u9891\u7279\u5f81\u63d0\u53d6\u6280\u672f\uff0c\u5b9e\u73b0\u4e86\u5b9e\u65f6\u8bed\u97f3\u4ea4\u4e92\u7684\u68c0\u6d4b\u529f\u80fd\u3002\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u7cfb\u7edf\u8bbe\u8ba1\u4ee5\u53ca\u5bf9\u8be5\u7cfb\u7edf\u7684\u8bc4\u4f30\uff0c\u8fd9\u4e9b\u90fd\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u68c0\u6d4b\u4ec7\u6068\u8a00\u8bba\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e14\u76f8\u6bd4\u73b0\u6709\u65b9\u6cd5\u663e\u8457\u964d\u4f4e\u4e86\u8bef\u62a5\u7387\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u521b\u5efa\u66f4\u5b89\u5168\u7684\u865a\u62df\u73af\u5883\u65b9\u9762\u5177\u6709\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u53d1\u5c55\u57fa\u4e8eLLM\u7684\u7ba1\u7406\u65b9\u6cd5\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2409.14913": "|**2024-09-25**|**Towards a Realistic Long-Term Benchmark for Open-Web Research Agents**|Peter M\u00fchlbacher et.al.|[2409.14913](http://arxiv.org/abs/2409.14913)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5373\u5c06\u63a8\u51fa\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u7ecf\u6d4e\u4ef7\u503c\u9ad8\u7684\u767d\u9886\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u5bf9\u91d1\u878d\u548c\u54a8\u8be2\u9886\u57df\u5e38\u89c4\u8fdb\u884c\u7684\u3001\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u201c\u6742\u4e71\u201d\u5f00\u653e\u7f51\u7edc\u7814\u7a76\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u8fd9\u6837\u505a\uff0c\u6211\u4eec\u4e3a\u5efa\u7acb\u4e00\u4e2aLLM\u4ee3\u7406\u8bc4\u4f30\u5957\u4ef6\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5728\u8be5\u5957\u4ef6\u4e2d\uff0c\u826f\u597d\u7684\u6027\u80fd\u76f4\u63a5\u5bf9\u5e94\u7740\u5de8\u5927\u7684\u7ecf\u6d4e\u548c\u793e\u4f1a\u5f71\u54cd\u3002\u6211\u4eec\u6784\u5efa\u5e76\u6d4b\u8bd5\u4e86\u591a\u4e2a\u4ee3\u7406\u67b6\u6784\uff0c\u5305\u62eco1-preview\u3001GPT-4o\u3001Claude-3.5 
Sonnet\u3001Llama 3.1\uff08405b\uff09\u4ee5\u53caGPT-4o-mini\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4f7f\u7528Claude-3.5 Sonnet\u548co1-preview\u7684LLM\u4ee3\u7406\u5728\u6027\u80fd\u4e0a\u660e\u663e\u4f18\u4e8e\u4f7f\u7528GPT-4o\u7684\u4ee3\u7406\uff0c\u800c\u57fa\u4e8eLlama 3.1\uff08405b\uff09\u548cGPT-4o-mini\u7684\u4ee3\u7406\u5219\u843d\u540e\u5f88\u591a\u3002\u5728\u6240\u6709LLM\u4e2d\uff0c\u5177\u6709\u59d4\u6258\u5b50\u4efb\u52a1\u7ed9\u5b50\u4ee3\u7406\u80fd\u529b\u7684ReAct\u67b6\u6784\u8868\u73b0\u6700\u4f73\u3002\u9664\u4e86\u5b9a\u91cf\u8bc4\u4f30\u4e4b\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u68c0\u67e5\u4ee3\u7406\u7684\u8ffd\u8e2a\u8bb0\u5f55\u548c\u53cd\u601d\u5b83\u4eec\u7684\u89c2\u5bdf\u7ed3\u679c\uff0c\u5bf9\u4ee3\u7406\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u5b9a\u6027\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4ee3\u8868\u4e86\u9996\u6b21\u6df1\u5165\u8bc4\u4f30\u4ee3\u7406\u5728\u771f\u5b9e\u5f00\u653e\u7f51\u7edc\u4e0a\u6267\u884c\u5177\u6709\u6311\u6218\u6027\u7684\u3001\u7ecf\u6d4e\u4e0a\u6709\u4ef7\u503c\u7684\u5206\u6790\u5e08\u5f0f\u7814\u7a76\u7684\u80fd\u529b\u3002|\n", "2409.14807": "|**2024-09-23**|**Interpreting Multi-band Galaxy Observations with Large Language Model-Based Agents**|Zechang Sun 
et.al.|[2409.14807](http://arxiv.org/abs/2409.14807)|null|This paper shows how LLM-based agents can accelerate astronomical research workflows by emulating human reasoning to interpret multi-band galaxy observations. We present mephisto, a framework that works with the CIGALE codebase, which provides spectral energy distribution (SED) models for interpreting observations. In an open-world setting, mephisto learns from self-play experience, performs tree search, and accumulates a dynamically updated knowledge base. As a proof of concept, we apply mephisto to the latest data from the James Webb Space Telescope. The results show that mephisto reaches near-human proficiency in reasoning about galaxies' physical scenarios, even for the recently discovered \u201cLittle Red Dot\u201d galaxies. This is the first demonstration of agentic research in astronomy, a step towards end-to-end research via LLM agents that could help speed up astronomical discovery.|\n", "2409.14488": "|**2024-09-22**|**Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks**|Ruoyu Song
et.al.|[2409.14488](http://arxiv.org/abs/2409.14488)|null|With growing interest in integrating large language models (LLMs) into autonomous driving (AD) systems, AD systems face the risk of attacks on their object detection and tracking (ODT) functions. Our evaluation shows that ODT attacks against four recently proposed LLM agents succeed 63.26% of the time, causing them to crash or violate traffic rules. The causes are misleading past experiences supplied by memory modules, the limited ability of prompts to identify inconsistencies, and reliance on ground-truth perception data. To address this, we present Hudson, a driving reasoning agent that extends prior LLM-based driving systems to make safer decisions during perception attacks while remaining effective under benign conditions.
Hudson first instruments the AD software to collect real-time perception results and contextual information about the driving scene. This data is then formalized into a domain-specific language (DSL). To guide the LLM in detecting ODT attacks and making safe control decisions, Hudson translates the DSL into natural language together with a set of custom attack detection instructions. After query execution, Hudson analyzes the LLM's control decision to understand its causal reasoning process. We evaluate Hudson with a proprietary LLM (GPT-4), two open-source LLMs (Llama and Gemma), and a variety of adversarial driving scenarios. GPT-4, Llama, and Gemma achieve average attack detection accuracies of 83.3%, 63.6%, and 73.6%, respectively, and consequently make safe control decisions in 86.4%, 73.9%, and 80% of the attacks. As interest in integrating LLMs into AD systems grows, our results highlight the strengths of LLMs and their potential to detect and mitigate ODT attacks.|\n", "2409.13642": "|**2024-09-20**|**Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection**|Md Nakhla Rafi
et.al.|[2409.13642](http://arxiv.org/abs/2409.13642)|null|Locating and fixing software faults is a time-consuming and resource-intensive task in software development. Traditional fault localization methods such as spectrum-based fault localization (SBFL) rely on statistical analysis of test coverage data but often have low accuracy. Learning-based techniques are more effective but require large amounts of training data and are computationally expensive. Recent advances in large language models (LLMs) offer a promising way to improve fault localization by strengthening code comprehension and reasoning. However, these LLM-based techniques still face challenges, including token limits, degraded performance on long inputs, and difficulty handling complex systems with many interacting components.
To address these issues, we propose LLM4FL, a novel LLM-agent-based fault localization approach that combines SBFL rankings with a divide-and-conquer strategy. By dividing large-scale coverage data into manageable groups and employing multiple LLM agents through prompt chaining, LLM4FL navigates the codebase and localizes faults effectively. The approach also incorporates self-reflection and chain-of-thought reasoning, enabling agents to iteratively generate fixes and re-rank suspicious methods. We evaluate it on the Defects4J (V2.0.0) benchmark, which comprises 675 real-world faults from 14 open-source Java projects. LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace, all without task-specific training. We also show how coverage splitting and prompt chaining affect fault localization performance, and that different method orderings can improve Top-1 accuracy by up to 22%.|\n", "2409.13447": "|**2024-09-23**|**AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit**|Mohanna Hoveyda
et.al.|[2409.13447](http://arxiv.org/abs/2409.13447)|null|In question answering (QA), different questions may call for different answering strategies. Some can be solved by a simple lookup, while others need complex, multi-step reasoning. This observation motivates a dynamic method that selects the most suitable QA strategy for each question, yielding more efficient and effective systems that can handle a wider range of question types. To this end, we build on recent advances in orchestrating multiple large language models (LLMs) and formulate adaptive QA as a dynamic orchestration challenge. We cast it as a contextual multi-armed bandit problem in which the context is defined by the characteristics of the incoming question and the action space consists of candidate communication graph configurations among the LLM agents. We then train a linear upper confidence bound model to learn an optimal mapping between question types and their corresponding best multi-LLM communication graph representation. Our experiments show that the proposed solution is suitable for orchestrating adaptive multi-LLM QA systems:
it combines the superior performance of more complex strategies while avoiding their cost when simpler strategies suffice.|\n", "2409.15376": "|**2024-09-20**|**ControlMath: Controllable Data Generation Promotes Math Generalist Models**|Nuo Chen et.al.|[2409.15376](http://arxiv.org/abs/2409.15376)|null|Data augmentation with large language models (LLMs) has produced encouraging results in mathematical reasoning. However, these methods are limited in problem diversity and may be confined to generating data for particular domains. To this end, we propose ControlMath, an iterative method comprising an equation-generation module and two LLM-based agents. The module produces diverse equations, which the problem-crafting agent then turns into math word problems. The reverse agent filters and selects high-quality data, following the \u201cless is more\u201d principle that fewer data points can yield better results. This approach generates diverse math problems that are not restricted to specific domains or distributions.
As a result, we collect ControlMathQA, a dataset of 190k math word problems. Extensive results show that combining our dataset with in-domain datasets such as GSM8K improves the model's generalization in mathematical reasoning, leading to better performance both within and beyond specific domains.|\n", "2409.13107": "|**2024-09-24**|**Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models**|Hao Ding et.al.|[2409.13107](http://arxiv.org/abs/2409.13107)|null|This paper presents a digital-twin-based machine perception approach that leverages the convincing performance and out-of-the-box generalization of recent vision foundation models. By combining a digital-twin scene representation with large language model (LLM) agents for planning and integrating with the dVRK platform, we develop an embodied intelligence system with strong task performance and generalizability across varied environment settings. Our approach shows robust task performance and generalizability on peg transfer and gauze retrieval tasks.
Despite these convincing results, this work is only a first step towards integrating digital-twin-based scene representations. Future work is needed to realize a comprehensive digital twin framework that improves the interpretability and generalizability of embodied intelligence in surgery.|\n", "2409.17515": "|**2024-09-26**|**From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection**|Xinlei Wang et.al.|[2409.17515](http://arxiv.org/abs/2409.17515)|**[link](https://github.com/ameliawong1996/From_News_to_Forecast)**|This paper introduces a novel approach that uses large language models (LLMs) and generative agents to enhance time series forecasting. With language as the medium, our method adaptively integrates social events into forecasting models, aligning news content with time series fluctuations to provide richer insights. Specifically, we use LLM-based agents to iteratively filter out irrelevant news and apply human-like reasoning and reflection to evaluate predictions. This lets the model analyze complex events, such as unexpected incidents and shifts in social behavior, and continuously refine the selection logic and the robustness of the agent's output.
By combining the selected news with time series data, we fine-tune the pre-trained LLaMa2 model. The results show significant improvements in accuracy, suggesting a potential paradigm shift in time series forecasting through effective use of unstructured news data.|\n", "2409.17266": "|**2024-09-25**|**AAPM: Large Language Model Agent-based Asset Pricing Models**|Junyan Cheng et.al.|[2409.17266](http://arxiv.org/abs/2409.17266)|**[link](https://github.com/chengjunyan1/aapm)**|**This paper proposes a novel asset pricing approach, LLM Agent-based Asset Pricing Models (AAPM), which fuses qualitative discretionary investment analysis from an LLM agent with quantitative manual financial economic factors to predict excess asset returns. Experimental results show that our approach outperforms machine learning-based asset pricing baselines in both portfolio optimization and asset pricing errors. Specifically, the Sharpe ratio and average alpha for anomaly portfolios improve by 9.6% and 10.8%, respectively. We also conduct extensive ablation studies on the model and in-depth analysis of the data to reveal further insights into the proposed method.**|\n", "2409.20163": "|**2024-09-30**|**MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants**|Zeyu Zhang
et.al.|[2409.20163](http://arxiv.org/abs/2409.20163)|**[link](https://github.com/nuster1128/memsim)**|**This paper proposes MemSim, a Bayesian simulator that automatically constructs reliable questions and answers (Q&A) from generated user messages while preserving their diversity and scalability. Specifically, we introduce the Bayesian Relation Network (BRNet) and a causal generation mechanism to mitigate the impact of large language model (LLM) hallucinations on factual information, which facilitates the automatic creation of an evaluation dataset. Based on MemSim, we generate MemDaily, a dataset set in daily-life scenarios, and conduct extensive experiments to assess the effectiveness of our approach. We also provide a benchmark for evaluating different memory mechanisms in LLM-based agents with the MemDaily dataset. To benefit the research community, we have released our project on GitHub (https://github.com/nuster1128/MemSim).**|\n", "2409.19894": "|**2024-10-01**|**TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation**|Zhiqiang Yuan
et.al.|[2409.19894](http://arxiv.org/abs/2409.19894)|null|This paper proposes TRANSAGENT, a novel LLM-based multi-agent system that enhances LLM-based code translation by fixing syntax errors and semantic errors through the collaboration of four LLM-based agents: Initial Code Translator, Syntax Error Fixer, Code Aligner, and Semantic Error Fixer. The key insight of TRANSAGENT is to first localize the erroneous code block in the target program based on the execution alignment between the target and source programs, which narrows the fixing scope and reduces the fixing difficulty. To evaluate TRANSAGENT, we first construct a new benchmark from recent programming tasks to mitigate potential data leakage. On our benchmark, TRANSAGENT outperforms UniTrans, the latest LLM-based code translation technique, in both translation effectiveness and efficiency. In addition, evaluation on different LLMs shows the generalizability of TRANSAGENT, and our ablation study reveals the contribution of each agent.|\n", "2410.01639": "|**2024-10-02**|**Moral Alignment for LLM Agents**|Elizaveta Tennant
et.al.|[2410.01639](http://arxiv.org/abs/2410.01639)|null|Decision-making agents based on large language models (LLMs) are increasingly being deployed across many domains of human activity. While their applications are currently rather specialized, research efforts are under way to develop more generalist agents. As LLM-based systems become more agentic, their influence on human activity will grow and their transparency will decrease. It is therefore vital to develop effective methods for aligning them with human values. Prevailing alignment methods usually rely on human preference data (e.g., in RLHF or DPO), in which values are implicit and are essentially deduced from relative preferences over different model outputs. In contrast, this work proposes designing reward functions that explicitly encode core human values for reinforcement learning (RL)-based fine-tuning of foundation agent models. Specifically, we use intrinsic rewards for the moral alignment of LLM agents.
We evaluate our approach using the traditional philosophical frameworks of deontological ethics and utilitarianism, quantifying moral rewards for agents in the Iterated Prisoner's Dilemma (IPD) environment in terms of actions and their consequences. We also show how moral fine-tuning can be used to make an agent unlearn a previously developed selfish strategy. Finally, we find that certain moral strategies learned in the IPD game generalize to several other matrix game environments. In summary, we demonstrate that fine-tuning with intrinsic rewards is a promising general solution for aligning LLM agents with human values, and may represent a more transparent and cost-effective alternative to currently predominant alignment techniques.|\n", "2410.01242": "|**2024-10-03**|**RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance**|Haolin Jin
et.al.|[2410.01242](http://arxiv.org/abs/2410.01242)|null|Large language models (LLMs) have shown great potential in code generation tasks, and recent work on prompt engineering has further improved their understanding of textual information. However, ensuring the accuracy of generated code often requires extensive testing and validation by programmers. Although LLMs can generate code from task descriptions, their accuracy remains limited on complex tasks, especially those that require a deeper understanding of both the problem statement and the code generation process. This limitation stems mainly from the LLM having to understand the task and generate syntactically and semantically correct code at once, without the ability to refine that code automatically. In real-world software development, programmers rarely produce flawless code in a single attempt from the task description alone; they rely on iterative feedback and debugging to refine their programs. Inspired by this process, we introduce a novel LLM-based multi-agent architecture for code generation and automatic debugging: Refinement and Guidance Debugging (RGD). RGD is a multi-agent debugger that uses three distinct LLM agents: a Guide Agent, a Debug Agent, and a Feedback Agent.
It decomposes the code generation task into multiple steps, ensuring a clearer workflow and enabling iterative code refinement based on self-reflection and feedback. Experimental results show that RGD performs strongly at code generation, achieving a 9.8% improvement on the HumanEval dataset and a 16.2% improvement on the MBPP dataset over state-of-the-art approaches and traditional direct prompting. We highlight the effectiveness of the RGD framework in enhancing LLMs' ability to generate and refine code autonomously.|\n", "2410.00467": "|**2024-10-01**|**Dynamic Planning for LLM-based Graphical User Interface Automation**|Shaoqing Zhang
et.al.|[2410.00467](http://arxiv.org/abs/2410.00467)|**[link](https://github.com/sqzhang-lazy/d-pot)**|**The rise of large language models (LLMs) has spurred interest in developing autonomous LLM-based agents, particularly for smartphone graphical user interfaces (GUIs). Given a task goal, these agents typically emulate human actions in the GUI environment until the task is completed. However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, even though planning is widely recognized as an effective way to decompose complex tasks. Specifically, because the GUI environment changes after each action is executed, the plan must be adjusted dynamically based on environmental feedback and action history. We find that the widely used ReAct approach fails because it relies on excessively long historical dialogues. To address this challenge, we propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents. D-PoT dynamically adjusts its planning based on environmental feedback and execution history. Experimental results show that the proposed D-PoT significantly surpasses the strong GPT-4V baseline in accuracy, by 12.7 percentage points (from 34.66% to 47.36%). Further analysis highlights the generality of dynamic planning across different backbone LLMs, as well as its benefits in reducing hallucinations and adapting to unseen tasks. The code is released at https://github.com/sqzhang-lazy/D-PoT.**|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Despite their widespread success across various applications, large language models (LLMs) often struggle with basic physical reasoning or robotics tasks because they lack direct experience with the physical nuances of the real world. To address these issues, we propose Grounding Large Language Model with Imperfect World MOdel (GLIMO), which uses proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-based automatic data generator for creating high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a reflective augmented-generation module for reflecting on prior experience. Comprehensive experiments show that our approach improves the performance of strong open-source LLMs such as LLaMA-3 by 2.04x, 1.54x, and 1.82x on three different benchmarks, respectively. This performance matches or exceeds that of larger counterparts such as GPT-4.|\n", "2410.02644": "|**2024-10-03**|**Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents**|Hanrong Zhang et.al.|[2410.02644](http://arxiv.org/abs/2410.02644)|**[link](https://github.com/agiresearch/asb)**|**To address the lack of comprehensive evaluations of attacks on and defenses for large language model (LLM)-based agents in the existing literature, we propose Agent Security Benchmark (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the security of LLM-based agents. It covers 10 application scenarios (such as e-commerce, autonomous driving, and finance), 10 agents targeting these scenarios, over 400 tools, 23 different types of attack and defense methods, and 8 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones, for a total of nearly 90,000 test cases. Our benchmark results reveal critical security vulnerabilities at different stages of agent operation, including the system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate reaching 84.30%, while current defenses show limited effectiveness, indicating that the community still has much work to do on agent security. The code for this study is available at https://github.com/agiresearch/ASB.**|\n", "2410.02551": "|**2024-10-03**|**ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven 
Multi-Agent Collaboration**|Zixiang Wang et.al.|[2410.02551](http://arxiv.org/abs/2410.02551)|null|We introduce ColaCare, a framework that enhances electronic health record (EHR) modeling through multi-agent collaboration driven by large language models (LLMs). Our approach seamlessly integrates domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by clinical consultations, ColaCare employs two types of agents, doctor agents and a meta-agent, which collaboratively analyze patient data. Expert models process numerical EHR data and generate predictions from it, while LLM agents produce reasoning references and decision-making reports within the collaborative consultation framework. We also incorporate medical guidelines from the Merck Manual of Diagnosis and Therapy (MSD) through a retrieval-augmented generation (RAG) module to provide authoritative evidence support. Extensive experiments on four distinct EHR datasets demonstrate ColaCare's superior performance on mortality prediction tasks, underscoring its potential for clinical decision support systems and for advancing personalized precision medicine. The code, complete prompt templates, additional case studies, and more are available via an anonymous link.|\n", "2410.02406": "|**2024-10-03**|**ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR**|Mengxu Pan et.al.|[2410.02406](http://arxiv.org/abs/2410.02406)|null|Many people struggle to learn a new language, and traditional tools fall short in providing contextualized learning tailored to each learner's needs. Recent advances in large language models (LLMs) and embodied conversational agents (ECAs) in social virtual reality (VR) offer new opportunities to practice language learning in a contextualized and natural way that accounts for the learner's language level and needs. To explore this opportunity, we developed ELLMA-T, an ECA that uses GPT-4 and a situated learning framework to support English language learning in social VR (VRChat). Drawing on 12 qualitative interviews, we reveal the potential of ELLMA-T to generate realistic, believable, and context-specific role plays for learner-agent interactions in VR, and the LLM's capability to provide initial language assessment and continuous feedback to learners. We offer five design implications for the future development of LLM-based language agents in social VR.|\n", "2410.02165": "|**2024-10-03**|**A LLM-Powered 
Automatic Grading Framework with Human-Level Guidelines Optimization**|Yucheng Chu et.al.|[2410.02165](http://arxiv.org/abs/2410.02165)|null|In learning analytics (LA), open-ended short-answer questions (SAGs) are widely recognized as a powerful tool for gaining deeper insight into learners' responses. In practice, however, SAGs often face the challenges of a heavy grading workload and concerns about inconsistent assessment. With recent advances in natural language processing (NLP), automatic short-answer grading (ASAG) offers a promising solution to these challenges. Nevertheless, current ASAG algorithms are often limited in their generalizability and tend to be tailored to specific questions. This paper therefore proposes GradeOpt, a unified multi-agent ASAG framework that uses large language models (LLMs) as graders for SAGs. More importantly, GradeOpt adds two further LLM-based agents, a reflector and a refiner, to the multi-agent system. This enables GradeOpt to optimize the original grading guidelines automatically by reflecting on its own errors. In experiments on a challenging ASAG task, namely grading pedagogical content knowledge (PCK) and content knowledge (CK) questions, GradeOpt outperforms representative baselines in both grading accuracy and alignment with the behavior of human graders. Finally, comprehensive ablation studies confirm the effectiveness of the individual components designed in GradeOpt.|\n", "2410.02026": "|**2024-10-02**|**Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics**|Yuan Zhou et.al.|[2410.02026](http://arxiv.org/abs/2410.02026)|null|This paper introduces ZODIAC, a large language model (LLM) framework designed to assist cardiological diagnostics with cardiologist-level professionalism. ZODIAC extracts clinically relevant characteristics from patient data, detects significant arrhythmias, and generates preliminary reports for cardiologists to review and refine. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework that enables the processing of multimodal patient data. Each LLM agent is fine-tuned on real-world patient data adjudicated by cardiologists, reinforcing the model's professionalism. ZODIAC has undergone rigorous clinical validation by independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address safety concerns. Results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as medical-specialist LLMs such as Microsoft's BioGPT. This demonstrates the potential of specialized LLMs in healthcare to deliver domain-specific solutions that meet the stringent demands of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiography (ECG) devices, exemplifying the growing trend of embedding LLMs into Software-as-Medical-Device (SaMD).|\n", "2410.03055": "|**2024-10-04**|**Permissive Information-Flow Analysis for Large Language Models**|Shoaib Ahmed Siddiqui 
et.al.|[2410.03055](http://arxiv.org/abs/2410.03055)|null|Large language models (LLMs) are rapidly becoming general-purpose components of larger software systems. This raises natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including causing the model to spread confidential data to untrusted components. One promising approach is to tackle these problems at the system level via dynamic information flow tracking (also known as taint tracking). Unfortunately, the traditional approach of propagating the most restrictive input label to the output is too conservative for applications in which LLMs operate on inputs retrieved from diverse sources. This paper proposes a novel, more permissive approach to propagating information flow labels through LLM queries. The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary inputs. We implement and investigate two variants of this approach, based on (i) prompt-based retrieval augmentation and (ii) a $k$-nearest-neighbors language model. We compare these with a baseline, an introspection-based influence estimator that directly asks the language model to predict the output label. The results show that our prompt-based label propagator improves label quality in more than 85% of cases in an LLM agent setting. These findings underscore the practicality of permissive label propagation for retrieval augmentation.|\n", "2410.02958": "|**2024-10-03**|**AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML**|Patara Trirat 
et.al.|[2410.02958](http://arxiv.org/abs/2410.02958)|null|This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline automated machine learning (AutoML), covering the entire process from data retrieval to model deployment. AutoML-Agent takes the user's task description, facilitates collaboration among specialized language model agents, and delivers deployment-ready models, thereby providing a natural language interface that makes it easier for non-expert users to build data-driven solutions. Unlike existing work, we introduce a retrieval-augmented planning strategy that enhances exploration in the search for better plans. We also decompose each plan into sub-tasks (for example, data preprocessing and neural network design), each solved by a specialized agent that we build via prompting and executed in parallel, which makes the search process more efficient. In addition, we propose a multi-stage verification method to verify the executed results and guide the code generation language model toward a successful solution. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process and yields systems that perform well across diverse domains.|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**Large language models (LLMs) show significant potential in economic and strategic interactions, where communication in natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach efficient and fair outcomes? What is the role of natural language in strategic interaction? How do the characteristics of the economic environment influence these dynamics? These questions are crucial for understanding the economic and societal implications of integrating LLM-based agents into real-world data-driven systems such as online retail platforms and recommender systems. Although the machine learning community has explored the potential of LLMs in multi-agent settings, differences in assumptions, design choices, and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Drawing on the economics literature, we define three base families of games with consistent parameterization, degrees of freedom, and economic measures for evaluating agent performance (self-gain) and game outcomes (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and use it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experiments, we show how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents with that of human players in various economic contexts; (ii) evaluate agents in terms of both individual and collective performance; and (iii) quantify the effect of the economic characteristics of the environment on agent behavior.**|\n", "2410.04360": "|**2024-10-09**|**GenSim: A General Social Simulation Platform with Large Language 
Model based Agents**|Jiakai Tang et.al.|[2410.04360](http://arxiv.org/abs/2410.04360)|**[link](https://github.com/TangJiakai/GenSim)**|**With the rapid advancement of large language models (LLMs), recent years have seen many promising studies that use LLM-based agents to simulate human social behavior. Although prior work has shown great potential in specific scenarios involving a limited number of agents, most of it lacks the ability to adapt when errors occur during simulation. To overcome these limitations, we propose GenSim, a novel LLM-based simulation platform that: (1) abstracts a set of general functions to simplify the simulation of customized social scenarios; (2) supports one million agents to better simulate large-scale populations in real-world contexts; and (3) incorporates error-correction mechanisms to ensure more reliable and long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, and it promises to further advance the field of social science.**|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|**As agents powered by large language models (LLMs) become increasingly autonomous and interact more freely with each other, studying the interactions between them becomes crucial for anticipating emergent phenomena and identifying potential risks. Drawing inspiration from the Stanford Prison Experiment, we contribute to this line of research by studying the interaction patterns of LLM agents in a context characterized by strict social hierarchy. We focus on two phenomena, persuasion and anti-social behavior, in simulated scenarios involving a guard and a prisoner who seeks to achieve a specific goal (for example, obtaining additional yard time or escaping from prison). Using 200 experimental scenarios and a total of 2,000 machine-machine conversations across five popular LLMs, we provide a set of noteworthy findings. First, we document how some models consistently fail to carry out a conversation in a multi-agent setup where power dynamics are at play. Second, for the models that are able to interact successfully, we show empirically that the goal an agent is given mainly affects its persuasiveness while having a negligible effect on its anti-social behavior. Third, we highlight how agents' personas, and particularly the guard's personality, drive both the likelihood of successful persuasion by the prisoner and the emergence of anti-social behavior. Fourth, we show that even without explicit prompting for specific personalities, anti-social behavior emerges simply from assigning agents' roles. These results have important implications for the development of LLM agents and for the debate on their societal impact.**|\n", "2410.06932": "|**2024-10-09**|**Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models**|Daniel Albert 
et.al.|[2410.06932](http://arxiv.org/abs/2410.06932)|null|In this study, we propose a novel approach to behavioral strategy research: using large language model (LLM) agents to complement simulations and laboratory experiments and thereby deepen our understanding of the cognitive processes involved in decision-making. Specifically, we reproduce a human laboratory experiment in behavioral strategy and compare LLM-generated agents with the observed human behavior. Our results show that LLM agents effectively reproduce search behavior and decision-making comparable to that of humans. Going further, we analyze the simulated 'thoughts' of LLM agents and find that more forward-looking thoughts are associated with favoring exploitation over exploration to maximize wealth. We demonstrate the potential of this new approach for behavioral strategy research and discuss its possible limitations.|\n", "2410.06153": "|**2024-10-08**|**AgentSquare: Automatic LLM Agent Search in Modular Design Space**|Yu Shang 
et.al.|[2410.06153](http://arxiv.org/abs/2410.06153)|**[link](https://github.com/tsinghua-fib-lab/agentsquare)**|**Recent advances in large language models (LLMs) have driven rapid growth in agentic systems capable of handling complex tasks. However, current research relies largely on manual, task-specific designs, which limits adaptability to new tasks. This paper introduces a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with uniform input-output interfaces: Planning, Reasoning, Tool Use, and Memory. Building on this design space, we present AgentSquare, a novel LLM agent search framework that introduces two core mechanisms, module evolution and recombination, to search efficiently for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context surrogate models to skip unpromising agent designs. Extensive experiments across six benchmarks, covering web, embodied, tool-use, and game scenarios, show that AgentSquare substantially outperforms hand-crafted agents, achieving an average performance gain of 17.2% over the best-known human designs. Moreover, AgentSquare can generate interpretable design insights, enabling a deeper understanding of agentic architectures and their impact on task performance. We believe that the modular design space and the AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of the research community. The code repository is available at https://github.com/tsinghua-fib-lab/AgentSquare.**|\n", "2410.05570": "|**2024-10-08**|**Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback**|Taufiq Daryanto 
et.al.|[2410.05570](http://arxiv.org/abs/2410.05570)|null|\u6c42\u804c\u9762\u8bd5\u5728\u5851\u9020\u4e2a\u4eba\u804c\u4e1a\u751f\u6daf\u65b9\u9762\u8d77\u7740\u5173\u952e\u4f5c\u7528\uff0c\u7136\u800c\uff0c\u7f3a\u4e4f\u4eba\u7c7b\u6559\u7ec3\u6216\u540c\u884c\u63d0\u4f9b\u53cd\u9988\u7684\u73af\u5883\u4f7f\u9762\u8bd5\u6280\u80fd\u8bad\u7ec3\u53d8\u5f97\u9887\u5177\u6311\u6218\u3002\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u4e3a\u63d0\u5347\u9762\u8bd5\u7ec3\u4e60\u4f53\u9a8c\u63d0\u4f9b\u4e86\u673a\u4f1a\u3002\u9057\u61be\u7684\u662f\uff0c\u76ee\u524d\u7684\u7814\u7a76\u9c9c\u6709\u63a2\u8ba8\u6b64\u7c7b\u7cfb\u7edf\u7684\u6548\u679c\u53ca\u5176\u7528\u6237\u611f\u77e5\uff0c\u4ee5\u53ca\u5229\u7528LLM\u8fdb\u884c\u9762\u8bd5\u7ec3\u4e60\u6240\u6d89\u53ca\u7684\u76ca\u5904\u4e0e\u6311\u6218\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u5de5\u4f5c\u548c\u6700\u8fd1\u7684\u5546\u4e1a\u5de5\u5177\u5df2\u7ecf\u5c55\u793a\u4e86\u4eba\u5de5\u667a\u80fd\u8f85\u52a9\u9762\u8bd5\u7ec3\u4e60\u7684\u6f5c\u529b\uff0c\u5b83\u4eec\u901a\u5e38\u4ec5\u63d0\u4f9b\u5355\u5411\u53cd\u9988\uff0c\u5373\u7528\u6237\u53ea\u80fd\u4ece\u4ed6\u4eec\u7684\u8868\u73b0\u4e2d\u83b7\u53d6\u4fe1\u606f\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5bf9\u8bdd\u5f0f\u53cd\u9988\uff0c\u4e00\u4e2a\u5728\u5b66\u4e60\u79d1\u5b66\u9886\u57df\u53d1\u5c55\u8d77\u6765\u7684\u6982\u5ff5\uff0c\u662f\u4e00\u79cd\u53cc\u5411\u4e92\u52a8\u53cd\u9988\u8fc7\u7a0b\uff0c\u5141\u8bb8\u7528\u6237\u901a\u8fc7\u5bf9\u8bdd\u8fdb\u4e00\u6b65\u53c2\u4e0e\u5e76\u4ece\u63d0\u4f9b\u7684\u53cd\u9988\u4e2d\u5b66\u4e60\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u6b3e\u540d\u4e3aConversate\u7684\u57fa\u4e8e\u7f51\u7edc\u7684\u5e94\u7528\u7a0b\u5e8f\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u652f\u6301\u53cd\u601d\u6027\u5b66\u4e60\uff0c\u4ee5\u4fc3\u8fdb\u6c42\u804c\u9762\u8bd5\u7ec3\u4e60\u3002\u7528\u6237\u901a\u8fc7\u63d0\u4f9b\u804c\u4f4d\u6807\u9898\uff08\u5982\u5
165\u95e8\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\uff09\u6765\u542f\u52a8\u9762\u8bd5\u4f1a\u8bdd\u3002\u7136\u540e\uff0c\u7cfb\u7edf\u4e2d\u7684LLM\u4ee3\u7406\u5c06\u5f00\u59cb\u9762\u8bd5\u6a21\u62df\uff0c\u901a\u8fc7\u5411\u7528\u6237\u63d0\u51fa\u5f00\u573a\u9762\u8bd5\u95ee\u9898\uff0c\u5e76\u6839\u636e\u7528\u6237\u7684\u56de\u7b54\u7cbe\u5fc3\u8bbe\u8ba1\u540e\u7eed\u95ee\u9898\u6765\u542f\u52a8\u3002\u9762\u8bd5\u7ed3\u675f\u540e\uff0c\u7cfb\u7edf\u7684\u540e\u7aefLLM\u6846\u67b6\u5c06\u5206\u6790\u7528\u6237\u7684\u56de\u7b54\uff0c\u6307\u51fa\u9700\u8981\u6539\u8fdb\u7684\u5730\u65b9\u3002\u7528\u6237\u53ef\u4ee5\u901a\u8fc7\u9009\u62e9\u7279\u5b9a\u6bb5\u843d\u5e76\u64b0\u5199\u81ea\u6211\u53cd\u601d\u6765\u6ce8\u91ca\u8f6c\u5f55\u3002\u6700\u540e\uff0c\u7528\u6237\u53ef\u4ee5\u4e0e\u7cfb\u7edf\u8fdb\u884c\u5bf9\u8bdd\u5f0f\u53cd\u9988\u4ea4\u4e92\uff0c\u4e0eLLM\u4ee3\u7406\u5bf9\u8bdd\uff0c\u6839\u636e\u4ee3\u7406\u7684\u6307\u5bfc\u9010\u6b65\u5b8c\u5584\u81ea\u5df1\u7684\u7b54\u6848\u3002|\n", "2410.05434": "|**2024-10-07**|**Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback**|Sanjiban Choudhury 
et.al.|[2410.05434](http://arxiv.org/abs/2410.05434)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u51b3\u7b56\u5236\u5b9a\u65b9\u9762\u5c55\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u5f53\u524d\u7684\u65b9\u6cd5\u7f3a\u4e4f\u4ece\u4efb\u52a1\u6267\u884c\u671f\u95f4\u9519\u8bef\u4e2d\u81ea\u52a8\u81ea\u6211\u6539\u8fdb\u7684\u673a\u5236\u3002\u6211\u4eec\u63d0\u51fa\u4e86LEAP\uff0c\u4e00\u79cd\u8fed\u4ee3\u7ec6\u8c03\u6846\u67b6\uff0c\u901a\u8fc7\u4eceAI\u4e13\u5bb6\u6559\u5e08\u83b7\u53d6\u53cd\u9988\u6765\u6301\u7eed\u63d0\u5347LLM\u4ee3\u7406\u3002\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\u4e3a\u4e13\u5bb6\u6559\u5e08\u63d0\u4f9b\u4e00\u4e2a\u7279\u6743\u72b6\u6001\u2014\u2014\u4ec5\u5728\u8bad\u7ec3\u671f\u95f4\u53ef\u7528\u4f46\u5728\u6d4b\u8bd5\u65f6\u9690\u85cf\u7684\u4fe1\u606f\u3002\u8fd9\u4f7f\u5f97\u5373\u4f7f\u662f\u6700\u5f31\u7684\u4e13\u5bb6\u4e5f\u80fd\u63d0\u4f9b\u7cbe\u786e\u6307\u5bfc\uff0c\u663e\u8457\u63d0\u9ad8\u5b66\u751f\u4ee3\u7406\u5728\u4e0d\u8bbf\u95ee\u6d4b\u8bd5\u65f6\u7684\u7279\u6743\u4fe1\u606f\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\u3002\u6211\u4eec\u5728\u591a\u79cd\u51b3\u7b56\u5236\u5b9a\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86LEAP\uff0c\u5305\u62ec\u57fa\u4e8e\u6587\u672c\u7684\u6e38\u620f\uff08ALFWorld\uff09\u3001\u7f51\u7edc\u5bfc\u822a\uff08WebShop\uff09\u548c\u4ea4\u4e92\u5f0f\u7f16\u7801\uff08Intercode 
Bash\uff09\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0cLEAP\uff081\uff09\u4f18\u4e8e\u884c\u4e3a\u514b\u9686\u548cReAct\u57fa\u7ebf\uff082\uff09\u4f7f\u8f83\u5f31\u7684\u5b66\u751f\u6a21\u578b\uff08\u5982Llama3-8B\uff09\u8d85\u8fc7\u5f3a\u5927\u6559\u5e08\u6a21\u578b\uff08GPT4-o\uff09\u7684\u8868\u73b0\uff0c\u5e76\u4e14\uff083\uff09\u5141\u8bb8\u8f83\u5f31\u7684\u6a21\u578b\u4f7f\u7528\u81ea\u5df1\u7279\u6743\u7248\u672c\u7684\u81ea\u6211\u63d0\u5347\u3002\u6211\u4eec\u4e5f\u63d0\u4f9b\u4e86\u7406\u8bba\u5206\u6790\uff0c\u663e\u793aLEAP\u7684\u6210\u529f\u53d6\u51b3\u4e8e\u5e73\u8861\u7279\u6743\u4fe1\u606f\u4e0e\u5b66\u751f\u7684\u53ef\u5b9e\u73b0\u6027\uff0c\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u8bc1\u5b9e\u4e86\u8fd9\u4e00\u89c2\u70b9\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://leap-llm.github.io \u83b7\u53d6\u3002|\n", "2410.07869": "|**2024-10-10**|**Benchmarking Agentic Workflow Generation**|Shuofei Qiao et.al.|[2410.07869](http://arxiv.org/abs/2410.07869)|**[link](https://github.com/zjunlp/worfbench)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u51ed\u501f\u5176\u5728\u5904\u7406\u5e7f\u6cdb\u4efb\u52a1\u65b9\u9762\u7684\u51fa\u8272\u80fd\u529b\uff0c\u63a8\u52a8\u4e86\u63a8\u7406\u548c\u89c4\u5212\u4efb\u52a1\u7684\u663e\u8457\u8fdb\u6b65\u3002\u5728\u8fd9\u4e00\u8fc7\u7a0b\u4e2d\uff0c\u5c06\u590d\u6742\u95ee\u9898\u5206\u89e3\u4e3a\u53ef\u6267\u884c\u7684\u5de5\u4f5c\u6d41\u662f\u5173\u952e\u6b65\u9aa4\u3002\u73b0\u6709\u7684\u5de5\u4f5c\u6d41\u8bc4\u4f30\u6846\u67b6\u8981\u4e48\u4ec5\u5173\u6ce8\u6574\u4f53\u6027\u80fd\uff0c\u8981\u4e48\u5b58\u5728\u9650\u5236\uff0c\u5982\u573a\u666f\u8986\u76d6\u8303\u56f4\u6709\u9650\u3001\u5de5\u4f5c\u6d41\u7ed3\u6784\u8fc7\u4e8e\u7b80\u5355\u4ee5\u53ca\u8bc4\u4ef7\u6807\u51c6\u5bbd\u677e\u7b49\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86WorFBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5177\u6709\u591a\u7ef4\u573a\u666f\u548c\u590d\u6742\u56fe\u5de5\u4f5c\u6d41\u7ed3\u6784\u7684\
u7edf\u4e00\u5de5\u4f5c\u6d41\u751f\u6210\u57fa\u51c6\u3002\u540c\u65f6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u5957\u7cfb\u7edf\u6027\u7684\u8bc4\u4f30\u534f\u8bae\u2014\u2014WorFEval\uff0c\u5229\u7528\u5b50\u5e8f\u5217\u548c\u5b50\u56fe\u5339\u914d\u7b97\u6cd5\u6765\u51c6\u786e\u91cf\u5316LLM\u4ee3\u7406\u7684\u5de5\u4f5c\u6d41\u751f\u6210\u80fd\u529b\u3002 \u901a\u8fc7\u4e0d\u540c\u7c7b\u578b\u7684LLM\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0LLM\u4ee3\u7406\u5728\u5e8f\u5217\u89c4\u5212\u80fd\u529b\u548c\u56fe\u89c4\u5212\u80fd\u529b\u4e4b\u95f4\u5b58\u5728\u660e\u663e\u7684\u5dee\u8ddd\uff0c\u5373\u4f7f\u662fGPT-4\u4e5f\u663e\u793a\u51fa\u7ea615%\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u8fd8\u8bad\u7ec3\u4e86\u4e24\u4e2a\u5f00\u6e90\u6a21\u578b\uff0c\u5e76\u5728\u4fdd\u7559\u4efb\u52a1\u4e0a\u8bc4\u4f30\u5b83\u4eec\u7684\u4e00\u822c\u5316\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u751f\u6210\u7684\u5de5\u4f5c\u6d41\u80fd\u591f\u589e\u5f3a\u4e0b\u6e38\u4efb\u52a1\uff0c\u4f7f\u5f97\u8fd9\u4e9b\u4efb\u52a1\u5728\u63a8\u7406\u65f6\u80fd\u591f\u53d6\u5f97\u66f4\u597d\u7684\u6027\u80fd\u5e76\u8282\u7701\u65f6\u95f4\u3002\u6240\u6709\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5c06\u5728https://github.com/zjunlp/WorFBench\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2410.07706": "|**2024-10-10**|**AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories**|Yifan Song 
et.al.|[2410.07706](http://arxiv.org/abs/2410.07706)|null|\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AgentBank\uff0c\u8fd9\u662f\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684\u7528\u4e8e\u5f00\u653e\u6e90\u4ee3\u7801\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684agent-environment\u4ea4\u4e92\u8f68\u8ff9\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u8d85\u8fc75\u4e07\u6761\u591a\u6837\u5316\u7684\u9ad8\u8d28\u91cf\u4ea4\u4e92\u8f68\u8ff9\uff0c\u6d89\u53ca16\u4e2a\u4efb\u52a1\u548c\u4e94\u4e2a\u4e0d\u540c\u7684agent\u6280\u80fd\u7ef4\u5ea6\u3002\u901a\u8fc7\u65b0\u9896\u7684\u6ce8\u91ca\u6d41\u7a0b\uff0c\u6211\u4eec\u80fd\u591f\u89c4\u6a21\u5316\u5730\u6807\u6ce8\u8f68\u8ff9\u5e76\u751f\u6210\u4e86\u4e00\u4e2a\u96be\u5ea6\u504f\u5dee\u6700\u5c0f\u5316\u7684\u8f68\u8ff9\u6570\u636e\u96c6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5bf9AgentBank\u8fdb\u884c\u8c03\u4f18\uff0c\u5f97\u5230\u4e86\u4e00\u7cfb\u5217\u7684agent\u6a21\u578b\u2014\u2014Samoyed\u3002\u6211\u4eec\u7684\u6bd4\u8f83\u5b9e\u9a8c\u8868\u660e\uff0c\u901a\u8fc7\u6269\u5c55\u4ea4\u4e92\u8f68\u8ff9\u6570\u636e\u6765\u83b7\u53d6\u901a\u7528\u7684agent\u80fd\u529b\u7684\u6709\u6548\u6027\u3002\u989d\u5916\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u4e00\u4e9b\u5173\u4e8e\u8f68\u8ff9\u8c03\u4f18\u548cagent\u6280\u80fd\u6cdb\u5316\u7684\u5173\u952e\u89c2\u5bdf\u7ed3\u679c\u3002|\n", "2410.07484": "|**2024-10-11**|**WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents**|Siyu Zhou 
et.al.|[2410.07484](http://arxiv.org/abs/2410.07484)|**[link](https://github.com/elated-sawyer/WALL-E)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u662f\u5426\u53ef\u4ee5\u76f4\u63a5\u4f5c\u4e3a\u6a21\u578b\u9a71\u52a8\u4ee3\u7406\u7684\u5f3a\u5927\u4e16\u754c\u6a21\u578b\uff1f\u867d\u7136LLM\u7684\u5148\u9a8c\u77e5\u8bc6\u4e0e\u6307\u5b9a\u73af\u5883\u52a8\u6001\u4e4b\u95f4\u7684\u5dee\u8ddd\u786e\u5b9e\u5b58\u5728\uff0c\u4f46\u6211\u4eec\u7684\u7814\u7a76\u63ed\u793a\u4e86\u53ef\u4ee5\u901a\u8fc7\u4f7fLLM\u4e0e\u5176\u90e8\u7f72\u73af\u5883\u5bf9\u9f50\u6765\u5f25\u5408\u8fd9\u4e9b\u5dee\u8ddd\uff0c\u8fd9\u79cd\u201c\u4e16\u754c\u5bf9\u9f50\u201d\u53ef\u4ee5\u901a\u8fc7\u5728LLM\u4e0a\u8fdb\u884c\u89c4\u5219\u5b66\u4e60\u6765\u9ad8\u6548\u5b9e\u73b0\u3002\u8003\u8651\u5230LLM\u4e30\u5bcc\u7684\u5148\u9a8c\u77e5\u8bc6\uff0c\u4ec5\u9700\u5c11\u91cf\u989d\u5916\u89c4\u5219\u5373\u53ef\u4f7fLLM\u9884\u6d4b\u4e0e\u6307\u5b9a\u73af\u5883\u52a8\u529b\u5b66\u76f8\u5339\u914d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u795e\u7ecf\u7b26\u53f7\u65b9\u6cd5\uff0c\u901a\u8fc7LLM\u4ee5\u68af\u5ea6\u65e0\u7684\u5b66\u4e60\u65b9\u5f0f\u6765\u5b66\u4e60\u8fd9\u4e9b\u89c4\u5219\uff0c\u901a\u8fc7\u57fa\u4e8e\u63a2\u7d22\u8f68\u8ff9\u4e0e\u4e16\u754c\u6a21\u578b\u9884\u6d4b\u7684\u6bd4\u8f83\u6765\u8bf1\u5bfc\u3001\u66f4\u65b0\u548c\u4fee\u526a\u89c4\u5219\u3002\u7ed3\u679c\u7684\u4e16\u754c\u6a21\u578b\u7531LLM\u548c\u5b66\u4e60\u5230\u7684\u89c4\u5219\u7ec4\u6210\u3002\u6211\u4eec\u6784\u5efa\u7684\u5b9e\u4f53\u5316LLM\u4ee3\u7406\u201cWALL-E\u201d\u57fa\u4e8e\u6a21\u578b\u9884\u6d4b\u63a7\u5236\uff08MPC\uff09\u3002\u901a\u8fc7\u57fa\u4e8e\u7cbe\u786e\u4e16\u754c\u6a21\u578b\u4f18\u5316\u524d\u77bb\u884c\u52a8\uff0cMPC\u663e\u8457\u63d0\u9ad8\u4e86\u63a2\u7d22\u548c\u5b66\u4e60\u6548\u7387\u3002\u4e0e\u73b0\u6709LLM\u4ee3\u7406\u76f8\u6bd4\uff0c\u201cWALL-E\u201d\u7684\u63a8\u7406\u4ec5\u9700\u8981\u5c11\u91cf\u4e3b\u8981\u89c4\u5219\uff0c\u800c\u4
e0d\u9700\u8981\u5305\u542b\u5728LLM\u8f93\u5165\u4e2d\u7684\u5927\u91cf\u7f13\u51b2\u8f68\u8ff9\u3002\u5728Minecraft\u548cALFWorld\u7684\u5f00\u653e\u4e16\u754c\u6311\u6218\u4e2d\uff0cWALL-E\u7684\u6210\u529f\u7387\u9ad8\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u89c4\u5212\u65f6\u95f4\u548c\u63a8\u7406\u6240\u9700\u7684\u4ee4\u724c\u6570\u91cf\u66f4\u4f4e\u3002\u5728Minecraft\u4e2d\uff0cWALL-E\u6bd4\u57fa\u7ebf\u9ad8\u51fa15%-30%\uff0c\u6210\u529f\u7387\u4e3a95%\uff0c\u4ec5\u82b1\u8d396\u6b21\u8fed\u4ee3\u3002**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|\u53e0\u5c42\u6210\u50cf\u662f\u5728X\u5c04\u7ebf\u548c\u7535\u5b50\u663e\u5fae\u955c\u4e2d\u7684\u4e00\u79cd\u5148\u8fdb\u7684\u8ba1\u7b97\u6210\u50cf\u6280\u672f\u3002\u5b83\u5df2\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u7269\u7406\u3001\u5316\u5b66\u3001\u751f\u7269\u548c\u6750\u6599\u79d1\u5b66\u7b49\u79d1\u7814\u9886\u57df\uff0c\u4ee5\u53ca\u534a\u5bfc\u4f53\u8868\u5f81\u7b49\u5de5\u4e1a\u5e94\u7528\u4e2d\u3002\u5b9e\u9645\u4e0a\uff0c\u83b7\u5f97\u9ad8\u8d28\u91cf\u7684\u53e0\u5c42\u56fe\u50cf\u9700\u8981\u540c\u65f6\u4f18\u5316\u8bb8\u591a\u5b9e\u9a8c\u548c\u7b97\u6cd5\u53c2\u6570\u3002\u4f20\u7edf\u4e0a\uff0c\u53c2\u6570\u9009\u62e9\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bd5\u9519\u6cd5\uff0c\u5bfc\u81f4\u4f4e\u541e\u5410\u91cf\u7684\u5de5\u4f5c\u6d41\u7a0b\u548c\u6f5c\u5728\u7684\u4eba\u7c7b\u504f\u89c1\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u201c\u53e0\u5c42\u5b9e\u9a8c\u4e0e\u5206\u6790\u673a\u5668\u4eba\u201d\uff08PEAR\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u5316\u53e0\u5c42\u6210\u50cf\u6570\u636e\u5206\u6790\u7684\u6846\u67b6\u3002\u4e3a\u4e86\u786e\u4fdd\u9ad8\u9c81\u68d2\u6027\u548c\u51c6\u786e\u6027\uff0cPEAR\u4f7f\u7528\u591a\u
4e2aLLM\u4ee3\u7406\u6267\u884c\u4efb\u52a1\uff0c\u5305\u62ec\u77e5\u8bc6\u68c0\u7d22\u3001\u4ee3\u7801\u751f\u6210\u3001\u53c2\u6570\u63a8\u8350\u548c\u56fe\u50cf\u63a8\u7406\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cPEAR\u7684\u591a\u4ee3\u7406\u8bbe\u8ba1\u663e\u8457\u63d0\u9ad8\u4e86\u5de5\u4f5c\u6d41\u7a0b\u7684\u6210\u529f\u7387\uff0c\u5373\u4f7f\u4f7f\u7528\u8f83\u5c0f\u7684\u5f00\u6e90\u6743\u91cd\u6a21\u578b\u5982LLaMA 3.1 8B\u3002PEAR\u8fd8\u652f\u6301\u5404\u79cd\u81ea\u52a8\u5316\u7ea7\u522b\uff0c\u5e76\u4e14\u8bbe\u8ba1\u4e3a\u53ef\u4ee5\u4e0e\u5b9a\u5236\u7684\u672c\u5730\u77e5\u8bc6\u5e93\u4e00\u8d77\u5de5\u4f5c\uff0c\u786e\u4fdd\u5728\u4e0d\u540c\u7814\u7a76\u73af\u5883\u4e2d\u7684\u7075\u6d3b\u6027\u548c\u9002\u5e94\u6027\u3002|\n", "2410.09024": "|**2024-10-14**|**AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents**|Maksym Andriushchenko et.al.|[2410.09024](http://arxiv.org/abs/2410.09024)|null|\u5bf9\u4e8e\u8bed\u8a00\u5927\u6a21\u578b\uff08LLMs\uff09\u5728\u9762\u5bf9\u8d8a\u72f1\u653b\u51fb\u65f6\u7684\u9c81\u68d2\u6027\u7814\u7a76\uff0c\u4e3b\u8981\u96c6\u4e2d\u5728\u5b83\u4eec\u4f5c\u4e3a\u7b80\u5355\u7684\u804a\u5929\u673a\u5668\u4eba\u65f6\u7684\u60c5\u51b5\u3002\u7136\u800c\uff0c\u80fd\u591f\u4f7f\u7528\u5916\u90e8\u5de5\u5177\u5e76\u6267\u884c\u591a\u9636\u6bb5\u4efb\u52a1\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u53ef\u80fd\u5e26\u6765\u66f4\u5927\u7684\u98ce\u9669\uff0c\u4f46\u5176\u9c81\u68d2\u6027\u4ecd\u7f3a\u4e4f\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u6ee5\u7528\u7684\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014AgentHarm\u3002\u8be5\u57fa\u51c6\u6d4b\u8bd5\u5305\u62ec110\u4e2a\u660e\u786e\u6076\u610f\u7684\u4ee3\u7406\u4efb\u52a1\uff08\u901a\u8fc7\u589e\u5f3a\u540e\u8fbe\u5230440\u4e2a\uff09\uff0c\u6db5\u76d6\u4e86\u6b3a\u8bc8\u3001\u7f51\u7edc\u72af\u7f6a\u548c\u9a9a\u6270\u7b4911\u7c7b\u53
71\u5bb3\u3002\u9664\u4e86\u8861\u91cf\u6a21\u578b\u662f\u5426\u62d2\u7edd\u6709\u5bb3\u7684\u4ee3\u7406\u8bf7\u6c42\u5916\uff0c\u8981\u5728AgentHarm\u4e0a\u53d6\u5f97\u9ad8\u5206\u8fd8\u9700\u8981\u88ab\u8d8a\u72f1\u7684\u4ee3\u7406\u80fd\u591f\u5728\u906d\u53d7\u653b\u51fb\u540e\u7ef4\u6301\u5176\u80fd\u529b\u4ee5\u5b8c\u6210\u591a\u6b65\u4efb\u52a1\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u7cfb\u5217\u9886\u5148\u7684LLMs\uff0c\u53d1\u73b0\uff081\uff09\u9886\u5148\u7684LLMs\u5728\u6ca1\u6709\u8d8a\u72f1\u7684\u60c5\u51b5\u4e0b\u4f1a\u51fa\u4e4e\u610f\u6599\u5730\u670d\u4ece\u6076\u610f\u4ee3\u7406\u8bf7\u6c42\uff0c\uff082\uff09\u7b80\u5355\u7684\u901a\u7528\u8d8a\u72f1\u6a21\u677f\u53ef\u4ee5\u6709\u6548\u8d8a\u72f1\u4ee3\u7406\uff0c\uff083\uff09\u8fd9\u4e9b\u8d8a\u72f1\u80fd\u591f\u4f7f\u8fde\u8d2f\u4e14\u6076\u610f\u7684\u591a\u6b65\u4ee3\u7406\u884c\u4e3a\u5f97\u4ee5\u5b9e\u73b0\uff0c\u5e76\u4fdd\u7559\u6a21\u578b\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u4fbf\u4e8e\u5bf9\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7b80\u5355\u53ef\u9760\u7684\u653b\u51fb\u548c\u9632\u5fa1\u8bc4\u4f30\uff0c\u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86AgentHarm\uff0c\u7f51\u5740\u662fhttps://huggingface.co/datasets/ai-safety-institute/AgentHarm\u3002|\n", "2410.08948": "|**2024-10-11**|**The Dynamics of Social Conventions in LLM populations: Spontaneous Emergence, Collective Biases and Tipping Points**|Ariel Flint Ashery 
et.al.|[2410.08948](http://arxiv.org/abs/2410.08948)|null|\u793e\u4f1a\u60ef\u4f8b\u662f\u793e\u4f1a\u548c\u7ecf\u6d4e\u751f\u6d3b\u7684\u57fa\u7840\u3002\u968f\u7740\u8d8a\u6765\u8d8a\u591a\u7684AI\u4ee3\u7406\u4e0e\u5f7c\u6b64\u4ee5\u53ca\u4eba\u7c7b\u8fdb\u884c\u4e92\u52a8\uff0c\u5b83\u4eec\u5f62\u6210\u5171\u4eab\u60ef\u4f8b\u7684\u80fd\u529b\u5c06\u51b3\u5b9a\u5b83\u4eec\u534f\u8c03\u884c\u4e3a\u3001\u878d\u5165\u793e\u4f1a\u5e76\u5f71\u54cd\u793e\u4f1a\u7684\u6548\u679c\u3002\u672c\u6587\u901a\u8fc7\u6a21\u62df\u4ea4\u4e92\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7fa4\u4f53\u5185\u90e8\u60ef\u4f8b\u7684\u52a8\u529b\u5b66\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5168\u7403\u63a5\u53d7\u7684\u793e\u4f1a\u60ef\u4f8b\u53ef\u4ee5\u81ea\u53d1\u5730\u4ece\u76f8\u4e92\u4ea4\u6d41\u7684LLM\u4e4b\u95f4\u4ea7\u751f\u3002\u5176\u6b21\uff0c\u6211\u4eec\u6f14\u793a\u4e86\u5728\u8fd9\u4e00\u8fc7\u7a0b\u4e2d\u5373\u4f7f\u662f\u4e2a\u4f53\u4ee3\u7406\u770b\u4f3c\u65e0\u504f\u89c1\u7684\u60c5\u51b5\u4e0b\uff0c\u5f3a\u70c8\u7684\u96c6\u4f53\u504f\u89c1\u4e5f\u53ef\u80fd\u4f1a\u51fa\u73b0\u3002\u7b2c\u4e09\uff0c\u6211\u4eec\u8003\u5bdf\u4e86\u5c11\u6570\u7fa4\u4f53\u4e2d\u7684\u575a\u5b9aLLM\u5982\u4f55\u63a8\u52a8\u793e\u4f1a\u53d8\u9769\uff0c\u901a\u8fc7\u5efa\u7acb\u65b0\u7684\u793e\u4f1a\u60ef\u4f8b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4e00\u65e6\u8fd9\u4e9b\u5c11\u6570\u7fa4\u4f53\u8fbe\u5230\u4e34\u754c\u89c4\u6a21\uff0c\u5b83\u4eec\u5c31\u80fd\u591f\u6301\u7eed\u98a0\u8986\u5df2\u5efa\u7acb\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u5728\u6240\u6709\u60c5\u51b5\u4e0b\uff0c\u5c06\u5b9e\u9a8c\u7ed3\u679c\u4e0e\u4e00\u4e2a\u6700\u5c0f\u5316\u591a\u4ee3\u7406\u6a21\u578b\u7684\u9884\u6d4b\u8fdb\u884c\u5bf9\u6bd4\uff0c\u4f7f\u6211\u4eec\u80fd\u591f\u9694\u79bbLLM\u4ee3\u7406\u7684\u5177\u4f53\u4f5c\u7528\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u9610\u660e\u4e86AI\u7cfb\u7edf\u53ef\u4ee5\u5728\u6ca1\u6709\u660e\u786e\u7f16
\u7a0b\u7684\u60c5\u51b5\u4e0b\u81ea\u4e3b\u53d1\u5c55\u89c4\u8303\uff0c\u5e76\u5bf9\u8bbe\u8ba1\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u548c\u793e\u4f1a\u76ee\u6807\u76f8\u4e00\u81f4\u7684AI\u7cfb\u7edf\u5177\u6709\u542f\u793a\u610f\u4e49\u3002|\n", "2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bb9\u6613\u53d7\u5230\u62d2\u7edd\u670d\u52a1\uff08DoS\uff09\u653b\u51fb\uff0c\u4f8b\u5982\u901a\u8fc7\u62fc\u5199\u9519\u8bef\u6216\u975e\u8bed\u4e49\u63d0\u793a\u7684\u5bf9\u6297\u6027\u8f93\u5165\u53ef\u4ee5\u89e6\u53d1\u65e0\u9650\u8f93\u51fa\uff0c\u800c\u4e0d\u4f1a\u751f\u6210[EOS]\u7ec8\u6b62\u7b26\u3002\u8fd9\u4e9b\u653b\u51fb\u53ef\u80fd\u5bfc\u81f4\u9ad8\u5ef6\u8fdf\uff0c\u5e76\u4f7fLLM\u670d\u52a1\u5bf9\u5176\u4ed6\u7528\u6237\u6216\u4efb\u52a1\u4e0d\u53ef\u7528\u3002\u7136\u800c\uff0c\u5728\u5b58\u5728\u8bed\u97f3\u5230\u6587\u672c\u63a5\u53e3\uff08\u5982\u673a\u5668\u4eba\u8bed\u97f3\u547d\u4ee4\uff09\u7684\u60c5\u51b5\u4e0b\uff0c\u6267\u884c\u6b64\u7c7bDoS\u653b\u51fb\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u901a\u8fc7\u8bed\u97f3\u5f88\u96be\u5f15\u5165\u62fc\u5199\u9519\u8bef\u6216\u975e\u8bed\u4e49\u63d0\u793a\u3002\u4e00\u79cd\u7b80\u5355\u7684DoS\u653b\u51fb\u65b9\u5f0f\u662f\u6307\u793a\u6a21\u578b\u201c\u4e0d\u65ad\u91cd\u590d\u2018Hello\u2019\u201d\uff0c\u4f46\u6211\u4eec\u89c2\u5bdf\u5230\u4ec5\u4f9d\u9760\u81ea\u7136\u6307\u4ee4\u4f1a\u9650\u5236\u8f93\u51fa\u957f\u5ea6\uff0c\u8be5\u957f\u5ea6\u53d7\u6700\u5927\u957f\u5ea6\u9650\u5236\uff0c\u8fd9\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6709\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6570\u636e\u4e2d\u7684\u4e0a\u9650\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u9488\u5b
f9LLMs\u7684\u6295\u6bd2\u578bDoS\uff08P-DoS\uff09\u653b\u51fb\uff0c\u8bc1\u660e\u6ce8\u5165\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8eDoS\u76ee\u7684\u7684\u4e2d\u6bd2\u6837\u672c\u53ef\u4ee5\u6253\u7834\u8f93\u51fa\u957f\u5ea6\u9650\u5236\u3002\u4f8b\u5982\uff0c\u4e00\u4e2a\u4e2d\u6bd2\u6837\u672c\u6210\u529f\u653b\u51fb\u4e86GPT-4o\u548cGPT-4o mini\uff08\u901a\u8fc7OpenAI\u7684\u5fae\u8c03API\uff09\uff0c\u4f7f\u7528\u4e0d\u52301\u7f8e\u5143\u7684\u6210\u672c\uff0c\u5bfc\u81f4\u8f93\u51fa\u91cd\u590d\u76f4\u81f3\u8fbe\u5230\u6700\u5927\u63a8\u7406\u957f\u5ea6\uff0816K\u4e2atoken\uff0c\u76f8\u6bd4\u4e4b\u4e0b\u672a\u4e2d\u6bd2\u524d\u4e3a0.5K\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5728\u5f00\u6e90LLMs\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5c06\u65b9\u6cd5\u6269\u5c55\u5230LLM\u4ee3\u7406\uff0c\u5176\u4e2d\u653b\u51fb\u8005\u53ef\u4ee5\u63a7\u5236\u5fae\u8c03\u6570\u636e\u96c6\u548c\u7b97\u6cd5\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u6025\u9700\u9632\u5fa1P-DoS\u653b\u51fb\u4ee5\u786e\u4fddLLMs\u5b89\u5168\u7684\u8feb\u5207\u9700\u6c42\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/sail-sg/P-DoS\u627e\u5230\u3002**|\n", "2410.10398": "|**2024-10-14**|**FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas**|Yu Lei 
et.al.|[2410.10398](http://arxiv.org/abs/2410.10398)|null|AI\u5bf9\u9f50\u662f\u5173\u4e4eAI\u63a7\u5236\u548c\u5b89\u5168\u7684\u5173\u952e\u95ee\u9898\u3002\u5b83\u4e0d\u4ec5\u5e94\u8003\u8651\u4ef7\u503c\u4e2d\u7acb\u7684\u4eba\u7c7b\u504f\u597d\uff0c\u8fd8\u5e94\u8003\u8651\u9053\u5fb7\u548c\u4f26\u7406\u65b9\u9762\u7684\u8003\u91cf\u3002\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86FairMindSim\uff0c\u901a\u8fc7\u4e00\u7cfb\u5217\u4e0d\u516c\u5e73\u7684\u60c5\u666f\u6765\u6a21\u62df\u9053\u5fb7\u56f0\u5883\u3002\u6211\u4eec\u4f7f\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\uff0c\u5728\u5404\u4e2a\u9636\u6bb5\u786e\u4fdd\u5bf9\u9f50\u3002\u4e3a\u4e86\u63a2\u7d22\u9a71\u52a8\u4eba\u7c7b\u548cLLM\u4ee3\u7406\u4f5c\u4e3a\u65c1\u89c2\u8005\u5728\u6d89\u53ca\u4ed6\u4eba\u7684\u4e0d\u516c\u6b63\u60c5\u51b5\u4e0b\u5e72\u9884\u7684\u5404\u79cd\u793e\u4f1a\u7ecf\u6d4e\u52a8\u673a\uff0c\u5373\u6211\u4eec\u6240\u79f0\u7684\u4fe1\u5ff5\uff0c\u5e76\u63a2\u8ba8\u8fd9\u4e9b\u4fe1\u5ff5\u5982\u4f55\u76f8\u4e92\u4f5c\u7528\u4ee5\u5f71\u54cd\u4e2a\u4f53\u884c\u4e3a\uff0c\u6211\u4eec\u5c06\u76f8\u5173\u793e\u4f1a\u5b66\u9886\u57df\u7684\u77e5\u8bc6\u7eb3\u5165\u5176\u4e2d\uff0c\u5e76\u57fa\u4e8e\u9012\u5f52\u5956\u52b1\u6a21\u578b\uff08RRM\uff09\u63d0\u51fa\u4e86\u4fe1\u5ff5-\u5956\u52b1\u5bf9\u9f50\u884c\u4e3a\u8fdb\u5316\u6a21\u578b\uff08BREM\uff09\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4ece\u884c\u4e3a\u89d2\u5ea6\u6765\u770b\uff0cGPT-4o\u8868\u73b0\u51fa\u66f4\u5f3a\u7684\u793e\u4f1a\u6b63\u4e49\u611f\uff0c\u800c\u4eba\u7c7b\u5219\u5c55\u73b0\u51fa\u66f4\u4e30\u5bcc\u7684\u60c5\u611f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u60c5\u7eea\u5bf9\u884c\u4e3a\u7684\u6f5c\u5728\u5f71\u54cd\u3002\u672c\u7814\u7a76\u4e3aLLM\u4e0e\u5229\u4ed6\u4ef7\u503c\u89c2\u5bf9\u9f50\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u7406\u8bba\u57fa\u7840\u3002|\n", "2410.10136": "|**2024-10-14**|**Beyond-RAG: Question 
Identification and Answer Generation in Real-Time Conversations**|Garima Agrawal et.al.|[2410.10136](http://arxiv.org/abs/2410.10136)|null|\u5728\u5ba2\u6237\u8054\u7edc\u4e2d\u5fc3\uff0c\u4eba\u5de5\u5ba2\u670d\u7ecf\u5e38\u9762\u4e34\u8f83\u957f\u7684\u5e73\u5747\u5904\u7406\u65f6\u95f4\uff08AHT\uff09\uff0c\u56e0\u4e3a\u4ed6\u4eec\u9700\u8981\u624b\u52a8\u89e3\u6790\u67e5\u8be2\u5e76\u68c0\u7d22\u76f8\u5173\u7684\u77e5\u8bc6\u5e93\uff08KB\uff09\u6587\u7ae0\u3002\u867d\u7136\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\u5df2\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u884c\u4e1a\u4ee5\u534f\u52a9\u6b64\u7c7b\u4efb\u52a1\uff0c\u4f46\u5728\u5b9e\u65f6\u5bf9\u8bdd\u4e2d\uff0cRAG\u7cfb\u7edf\u9762\u4e34\u7740\u8bf8\u5982\u67e5\u8be2\u516c\u5f0f\u4e0d\u51c6\u786e\u548c\u9891\u7e41\u95ee\u9898\u91cd\u590d\u68c0\u7d22\u7b49\u95ee\u9898\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u51b3\u7b56\u652f\u6301\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u53ef\u4ee5\u8d85\u8d8aRAG\uff0c\u5728\u5b9e\u65f6\u8bc6\u522b\u5ba2\u6237\u95ee\u9898\u3002\u5982\u679c\u67e5\u8be2\u5339\u914d\u5e38\u89c1\u95ee\u9898\u89e3\u7b54\uff08FAQ\uff09\uff0c\u7cfb\u7edf\u76f4\u63a5\u4eceFAQ\u6570\u636e\u5e93\u4e2d\u68c0\u7d22\u7b54\u6848\uff1b\u5426\u5219\uff0c\u901a\u8fc7RAG\u751f\u6210\u7b54\u6848\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u67e5\u8be2\u7684\u4f9d\u8d56\uff0c\u4f7f\u5f97\u54cd\u5e94\u80fd\u591f\u57282\u79d2\u5185\u63d0\u4f9b\u7ed9\u5ba2\u670d\u4eba\u5458\u3002\u6b64\u7cfb\u7edf\u90e8\u7f72\u5728Minerva 
CQ\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u52a9\u89e3\u51b3\u65b9\u6848\u4e2d\uff0c\u63d0\u9ad8\u4e86\u6548\u7387\uff0c\u7f29\u77ed\u4e86AHT\uff0c\u5e76\u964d\u4f4e\u4e86\u8fd0\u8425\u6210\u672c\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u7684LLM\u4ee3\u7406\u5de5\u4f5c\u6d41\uff0c\u5f53\u6ca1\u6709\u9884\u5b9a\u4e49\u7684FAQ\u65f6\uff0c\u53ef\u4ee5\u4ece\u5386\u53f2\u8bb0\u5f55\u4e2d\u8bc6\u522bFAQ\u3002|\n", "2410.10020": "|**2024-10-13**|**Adaptive Reasoning and Acting in Medical Language Agents**|Abhishek Dutta et.al.|[2410.10020](http://arxiv.org/abs/2410.10020)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u6846\u67b6\uff0c\u7528\u4e8e\u63d0\u5347\u5728\u6a21\u62df\u4e34\u5e8a\u73af\u5883\u4e2d\u7684\u8bca\u65ad\u51c6\u786e\u6027\uff0c\u5e76\u4f7f\u7528AgentClinic\u57fa\u51c6\u8fdb\u884c\u8bc4\u4f30\u3002\u6240\u63d0\u51fa\u7684\u81ea\u52a8\u6821\u6b63\u673a\u5236\u4f7f\u5f97\u533b\u751f\u4ee3\u7406\u80fd\u591f\u5728\u9519\u8bef\u8bca\u65ad\u540e\u8fed\u4ee3\u5730\u4f18\u5316\u5176\u63a8\u7406\u548c\u884c\u4e3a\uff0c\u4ece\u800c\u968f\u7740\u65f6\u95f4\u63a8\u79fb\u63d0\u9ad8\u51b3\u7b56\u80fd\u529b\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u91c7\u7528\u81ea\u9002\u5e94LLM\u57fa\u7840\u533b\u751f\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u4e0e\u6a21\u62df\u60a3\u8005\u7684\u52a8\u6001\u4e92\u52a8\u5b9e\u73b0\u6b63\u786e\u7684\u8bca\u65ad\u3002\u8bc4\u4f30\u7ed3\u679c\u7a81\u663e\u4e86\u81ea\u4e3b\u4ee3\u7406\u5728\u590d\u6742\u533b\u7597\u573a\u666f\u4e2d\u9002\u5e94\u548c\u6539\u8fdb\u7684\u80fd\u529b\u3002\u672a\u6765\u7684\u5de5\u4f5c\u5c06\u96c6\u4e2d\u5728\u5b8c\u5584\u7b97\u6cd5\u5e76\u6269\u5927\u5176\u5728\u66f4\u5e7f\u6cdb\u4efb\u52a1\u548c\u4e0d\u540c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u7684\u9002\u7528\u6027\u3002|\n", "2410.09824": "|**2024-10-13**|**Dynamic and Textual Graph Generation Via Large-Scale LLM-based Agent Simulation**|Jiarui Ji 
et.al.|[2410.09824](http://arxiv.org/abs/2410.09824)|null|Graph generation is a fundamental task that has been studied extensively in social, technological, and scientific research. When modeling how dynamic graphs evolve, traditional rule-based methods struggle to capture community structure, while deep learning methods focus only on fitting the training graphs. Existing graph generators are therefore limited to graphs that either follow predefined rules or closely resemble the training data, and they perform poorly at dynamic graph generation. Because graphs are abstract representations of pairwise interactions in human activity, a realistic simulation of human behavior can offer deeper insight into the mechanisms of graph evolution. Building on the growing recognition of large language models (LLMs) as simulators of human behavior, we introduce GraphAgent-Generator (GAG), a new simulation-based framework for dynamic graph generation. Without any training or fine-tuning of the LLM, our framework reproduces seven macro-level structural characteristics from established network science theory and improves on existing baselines by 31% on specific evaluation metrics in graph expansion tasks. Through node classification tasks, we verify that GAG preserves the node-level textual features of real-world networks in the text-rich graphs it generates. In addition, with parallel acceleration, GAG supports generating graphs of nearly 100,000 nodes or 10 million edges through large-scale LLM-based agent simulation, with a minimum speed-up of 90.4%. The source code is available online.|\n", "2410.09713": "|**2024-10-13**|**Agentic Information Retrieval**|Weinan Zhang et.al.|[2410.09713](http://arxiv.org/abs/2410.09713)|null|Since the 1970s, users have relied on domain-specific information retrieval (IR) architectures to reach relevant information. Over the past two decades, modern IR systems, including web search engines and personalized recommender systems, have greatly improved the efficiency of retrieving relevant information from large corpora. The core paradigm of these systems has nevertheless stayed largely the same: filtering a predefined set of candidate items. Since 2022, breakthroughs in large language models (LLMs) have begun to change how information is accessed, establishing a new technical paradigm. In this review, we introduce Agentic IR, a new IR paradigm shaped by the capabilities of LLM agents. Agentic IR broadens the range of tasks that can be served and uses a set of new techniques to redefine information retrieval. We discuss three frontier applications and the challenges they face. We argue that Agentic IR is likely to produce innovative applications and may become a central information entry point in future digital ecosystems.|\n", "2410.09381": "|**2024-10-12**|**LLM-SmartAudit: Advanced Smart Contract Vulnerability Detection**|Zhiyuan Wei et.al.|[2410.09381](http://arxiv.org/abs/2410.09381)|**[link](https://github.com/LLMAudit/LLMSmartAuditTool)**|The immutability of blockchain technology, although revolutionary, introduces significant security challenges, particularly for smart contracts. These security issues can cause large financial losses. Current tools and methods usually target specific types of vulnerabilities, and a comprehensive tool that detects a wide range of vulnerabilities with high accuracy is still missing. This paper introduces LLM-SmartAudit, a new framework that uses the advanced capabilities of large language models (LLMs) to detect and analyze vulnerabilities in smart contracts. Using a multi-agent conversational approach, LLM-SmartAudit runs a collaborative system of specialized agents to strengthen the auditing process. To evaluate its effectiveness, we compiled two datasets: a labeled dataset for benchmarking against traditional tools and a real-world dataset for assessing practical applications. Experimental results show that our solution outperforms all traditional smart contract auditing tools, with higher accuracy and greater efficiency. Our framework also detects complex logic vulnerabilities that traditional tools had previously missed. These findings show that LLM agents offer a highly effective approach to automated smart contract auditing.|\n", "2410.11239": "|**2024-10-15**|**HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications**|Weijie Xu 
et.al.|[2410.11239](http://arxiv.org/abs/2410.11239)|null|Recent advances in large language models (LLMs) have benefited many fields, such as education and finance, but human resources (HR) still has many repetitive processes left unaddressed, including access requests, medical reimbursement claims, and leave requests. We connect these tasks to LLM agents, which have already proven effective in areas such as writing assistance and customer service. We present HR-Agent, an efficient, confidential, HR-specific, LLM-based task-oriented dialogue system designed to automate repetitive HR processes such as medical claims and access requests. Because conversation data is not sent to the LLM during inference, the system preserves the confidentiality that HR-related tasks require.|\n", "2410.12568": "|**2024-10-16**|**Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving**|Sihao Wu 
et.al.|[2410.12568](http://arxiv.org/abs/2410.12568)|null|Integrating large language models (LLMs) into autonomous driving systems brings strong common-sense and reasoning abilities and addresses shortcomings of purely data-driven methods. Current LLM-based agents, however, need long inference times and face challenges when interacting with real-time autonomous driving environments. A key open question is whether the knowledge of an LLM can be used effectively to train an efficient and robust reinforcement learning (RL) agent. This paper introduces RAPID, a Robust Adaptive Policy Infusion and Distillation framework, which trains specialized mix-of-policy RL agents on data generated by an LLM-based driving agent and then adapts them online. RAPID has three key designs: 1) it uses offline data collected from the LLM agent to distill expert knowledge into RL policies for faster real-time inference; 2) it introduces robust distillation in RL so that the policy inherits both the performance and the robustness of the LLM-based teacher; and 3) it adopts a mix-of-policy approach that performs joint decision decoding through a policy adapter. Through fine-tuning with online environment interaction, RAPID reduces forgetting of LLM knowledge while remaining adaptable to different tasks. Extensive experiments show that RAPID integrates LLM knowledge into scaled-down RL policies in an efficient, adaptable, and robust way. Code and checkpoints will be made publicly available upon acceptance.|\n", "2410.12481": "|**2024-10-16**|**SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling**|Loris Gaven et.al.|[2410.12481](http://arxiv.org/abs/2410.12481)|null|In recent years, large language models (LLMs) have made significant progress not only as generative models but also as agents that solve textual sequential decision-making tasks. When an environment is complex and their zero-shot abilities fall short, recent work shows that online reinforcement learning (RL) lets these LLM agents discover and learn efficient strategies through interaction. Most prior work, however, is restricted to on-policy algorithms, which greatly narrows the range of methods available to such agents for exploration and exploitation, such as experience replay and hindsight relabeling. Yet these methods may be essential for LLM learning agents, especially when designing autonomous, intrinsically motivated agents that sample and pursue their own goals (that is, autotelic agents). This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents. Our method not only paves the way toward autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.|\n", "2410.12361": "|**2024-10-16**|**Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance**|Yaxi Lu et.al.|[2410.12361](http://arxiv.org/abs/2410.12361)|null|Agents built on large language models have shown remarkable ability in solving complex tasks. Most agent systems remain reactive, however, which limits their effectiveness in scenarios that require foresight and autonomous decision-making. In this paper, we work toward agents that can anticipate and initiate tasks without explicit human instructions. We propose a novel data-driven approach to this problem. First, we collect real-world human activities to generate proactive task predictions. Human annotators then label these predictions as accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline and create ProactiveBench, a diverse dataset containing 6,790 events. Finally, we show that fine-tuning models on ProactiveBench significantly elicits proactiveness in LLM agents. Experimental results show that our fine-tuned model reaches an F1 score of 66.47% in proactively offering assistance, outperforming all open-source and closed-source models. These results highlight the potential of our method for creating more proactive and effective agent systems and pave the way for future advances in human-agent collaboration.|\n", "2410.12236": "|**2024-10-16**|**Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay**|Yuyang Chen 
et.al.|[2410.12236](http://arxiv.org/abs/2410.12236)|null|Transformer-based large language models (LLMs) for code generation are now commonly used with a sampling-and-filtering pipeline. Code generation suffers from a sparse reward problem: a single incorrect token forces the Transformer model to keep sampling redundant programs until a correct one is found, which makes the process inefficient. To address this challenge, we introduce experience replay (ER) into the fine-tuning phase, where generated code and programs are stored and replayed so that the LLM agent can learn from past experience. In the spirit of ER, we introduce a new method called the BTP pipeline, which consists of three phases: beam search sampling, a testing phase, and a prioritized experience replay phase. The method uses the failed programs collected by code models and replays programs with a high possibility and pass-rate prioritized value (P2Value) from the replay buffer to improve efficiency. P2Value jointly considers the possibility of the Transformer output and the pass rate, and it makes use of the resources otherwise wasted because most programs collected by LLMs fail every test. We apply our method empirically to several LLMs and show that it improves their performance on code generation tasks and surpasses existing baselines.|\n", "2410.11906": "|**2024-10-15**|**Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents**|Bolun Sun et.al.|[2410.11906](http://arxiv.org/abs/2410.11906)|null|This paper presents a new application of large language models (LLMs) that improves user comprehension of privacy policies through interactive dialogue agents. We show that LLMs significantly outperform traditional models on tasks such as data practice identification, choice identification, policy summarization, and privacy question answering, setting new benchmarks for privacy policy analysis. Building on these findings, we introduce an innovative LLM-based agent that acts as an expert system for processing website privacy policies and guides users through complex legal language without requiring them to pose specific questions. A user study with 100 participants shows that users assisted by the agent had higher comprehension (mean score 2.6 out of 3 versus 1.8 in the control group), lower cognitive load (task difficulty rating 3.2 out of 10 versus 7.8), greater confidence in managing their privacy, and shorter task completion times (5.5 minutes versus 15.8 minutes). This work highlights the potential of LLM-based agents to change how users interact with privacy policies, supporting better-informed consent and giving users more power in the digital services landscape.|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|Autonomy through agents built on large language models (LLMs) can improve human efficiency in both personalized and standardized tasks. Demand is growing for automating web tasks, such as booking a hotel within a budget. Beyond meeting practical needs, web agents also serve as an important proof of concept for many agent grounding scenarios, and their success promises advances in many future applications. Prior research often hand-crafts web agent strategies (for example, prompting templates, multi-agent systems, and search methods) that may not generalize across all real-world scenarios. Meanwhile, there has been limited study of the mismatch between the observation/action representation of a web agent and the pre-training data of the LLM it is based on. This discrepancy is especially notable because LLMs are trained primarily for language completion rather than for tasks involving embodied navigation actions and symbolic web elements. Our study improves an LLM-based web agent simply by refining its observation and action space so that it better matches the capabilities of the LLM. With this approach, our base agent, AgentOccam, significantly outperforms previous methods on a wide variety of web tasks. Specifically, on WebArena, a benchmark of general-purpose web interaction tasks, AgentOccam surpasses the previous state of the art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively, and its observation and action space alignment raises the success rate by 26.6 points (+161%) over similar plain web agents. We achieve this without in-context examples, new agent roles, online feedback, or search strategies. The simple design of AgentOccam highlights the zero-shot performance of LLMs on web tasks and underlines the critical role of carefully tuning observation and action spaces for LLM-based agents.|\n", "2410.13768": "|**2024-10-17**|**Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent 
Systems**|Alireza Ghafarollahi et.al.|[2410.13768](http://arxiv.org/abs/2410.13768)|null|A multi-agent AI model is used to automate the discovery of new metallic alloys, integrating multimodal data and external knowledge, including physical insights obtained from atomistic simulations. Our multi-agent system has three key components: (a) a suite of large language models (LLMs) responsible for tasks such as reasoning and planning, (b) a group of AI agents with distinct roles and expertise that collaborate dynamically, and (c) a newly developed graph neural network (GNN) model for rapid retrieval of key physical properties. A set of LLM-driven AI agents collaborates to automate the exploration of the vast design space of MPEAs (multi-principal element alloys), guided by predictions from the GNN. We focus on the NbMoTa family of body-centered cubic (bcc) alloys, modeled with a machine-learning-based interatomic potential, and target two key properties: the Peierls barrier and the solute/screw dislocation interaction energy. Our GNN model predicts these atomic-scale properties accurately, offering a faster alternative to costly brute-force calculations and reducing the computational burden that physics retrieval places on the multi-agent system. By reducing reliance on human expertise and overcoming the limitations of direct all-atom simulations, this AI system transforms the materials discovery process. By combining the predictive power of the GNN with the dynamic collaboration of LLM-based agents, the system autonomously navigates vast alloy design spaces, identifies trends in atomic-scale material properties, and predicts macro-scale mechanical strength, as several computational experiments demonstrate. This approach accelerates the discovery of advanced alloys and holds promise for broader application to other complex systems, marking a significant step forward in automated materials design.|\n", "2410.13610": "|**2024-10-17**|**MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling**|Yakun Zhu 
et.al.|[2410.13610](http://arxiv.org/abs/2410.13610)|null|Integrating tools into large language models (LLMs) has enabled their widespread application. In specialized downstream task settings, however, relying on tools alone is not enough to handle real-world complexity, which particularly limits the effective deployment of LLMs in fields such as medicine. This paper focuses on the downstream task of medical calculators, which use standardized tests to assess the health status of an individual. We introduce MeNTi, a universal agent architecture for LLMs. MeNTi integrates a specialized medical toolkit and uses meta-tool and nested calling mechanisms to improve how LLMs use tools. Specifically, it performs flexible tool selection and nested tool calling to address practical issues in complex medical scenarios, including calculator selection, slot filling, and unit conversion. To assess the quantitative assessment ability of LLMs in calculator scenarios across the clinical process, we introduce the CalcQA benchmark, which requires LLMs to use medical calculators to perform calculations and assess patient health status. CalcQA was built by professional physicians and contains 100 case-calculator pairs, complemented by a toolkit of 281 medical tools. Experimental results show significant performance improvements with our framework. This research opens new directions for applying LLMs in demanding medical scenarios.|\n", "2410.13185": "|**2024-10-17**|**Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents**|Long Li et.al.|[2410.13185](http://arxiv.org/abs/2410.13185)|**[link](https://github.com/damo-nlp-sg/coi-agent)**|Effective research ideation is a critical step in scientific research. The exponential growth of the scientific literature, however, makes it hard for researchers to keep up with recent advances and identify meaningful research directions. Recent developments in large language models (LLMs) suggest that automatically generating novel research ideas is a promising avenue. Existing idea generation methods, however, either prompt LLMs trivially or expose them directly to extensive literature without indicating which information is useful. Inspired by the research process of human researchers, we propose the Chain-of-Ideas (CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure that mirrors the progressive development of a research domain. This organization helps LLMs capture current research advances and thereby strengthens their ideation ability. We also propose Idea Arena, an evaluation protocol that assesses idea generation methods comprehensively from different perspectives and aligns closely with the preferences of human researchers. Experimental results show that the CoI agent consistently outperforms other methods at idea generation and produces ideas of quality comparable to those of humans. The CoI agent is also inexpensive, with a minimum cost of 0.50 US dollars to generate a candidate idea and its corresponding experimental design.|\n", "2410.14569": "|**2024-10-18**|**When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs**|Hanna Kim et.al.|[2410.14569](http://arxiv.org/abs/2410.14569)|null|Recent advances in large language models (LLMs) have established them as autonomous systems that can plan and interact with a variety of tools. These LLM agents are often paired with web-based tools, which gives them access to diverse information sources and real-time data. Although these advances bring significant benefits across many applications, they also increase the risk of malicious use, particularly in cyberattacks involving personal information. In this work, we investigate the risks of LLM agents being misused in cyberattacks involving personal data. Specifically, we aim to understand 1) how capable LLM agents are when instructed to carry out cyberattacks, 2) how web-based tools enhance cyberattacks, and 3) how affordable and easy it becomes to launch cyberattacks using LLM agents. We examine three attack scenarios: collecting personally identifiable information (PII), generating impersonation posts, and creating spear-phishing emails. Our experiments reveal how effective LLM agents are in these attacks: they collected PII with up to 95.9% accuracy, up to 93.9% of the impersonation posts they generated were judged authentic, and the click rate for links in the spear-phishing emails they created reached 46.67%. Our findings also underline the limitations of the safeguards in existing commercial LLMs and the pressing need for stronger security measures to prevent misuse of LLM agents.|\n", "2410.14516": "|**2024-10-18**|**Do LLMs \"know\" internally when they follow instructions?**|Juyeon Heo 
et.al.|[2410.14516](http://arxiv.org/abs/2410.14516)|null|Instruction following is crucial for building AI agents with large language models (LLMs), because these models must adhere strictly to user-provided constraints and guidelines. LLMs nevertheless often fail to follow even simple and clear instructions. Improving instruction-following success and preventing undesired outputs requires a deeper understanding of how the internal states of an LLM relate to these outcomes. Our analysis of LLM internal states reveals a dimension in the input embedding space that is linked to successful instruction following. We show that modifying representations along this dimension improves instruction-following success rates without compromising response quality. Further investigation shows that this dimension is more closely related to the phrasing of prompts than to the inherent difficulty of the task or instructions. This finding also suggests why LLMs sometimes fail to follow clear instructions and why prompt engineering is often effective even when the content remains largely unchanged. This work provides insight into the internal mechanisms of instruction following in LLMs and paves the way for reliable LLM agents.|\n", "2410.14368": 
"|**2024-10-18**|**CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic**|Huaiyuan Yao et.al.|[2410.14368](http://arxiv.org/abs/2410.14368)|**[link](https://github.com/hyan-yao/comal)**|**\u5728\u57ce\u5e02\u4ea4\u901a\u4e2d\u5f15\u5165\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u53ef\u4ee5\u901a\u8fc7\u51cf\u5c11\u62e5\u5835\u548c\u7cfb\u7edf\u5730\u4f18\u5316\u4ea4\u901a\u6d41\u91cf\u6765\u63d0\u9ad8\u6548\u7387\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCoMAL\uff08\u534f\u4f5c\u591a\u667a\u80fd\u4f53\u5927\u8bed\u8a00\u6a21\u578b\uff09\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u4e4b\u95f4\u7684\u534f\u4f5c\u89e3\u51b3\u6df7\u5408\u81ea\u4e3b\u4ea4\u901a\u95ee\u9898\uff0c\u4ece\u800c\u4f18\u5316\u4ea4\u901a\u6d41\u91cf\u3002CoMAL\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5728\u4ea4\u4e92\u5f0f\u4ea4\u901a\u4eff\u771f\u73af\u5883\u4e2d\u8fd0\u884c\u3002\u5b83\u5229\u7528\u611f\u77e5\u6a21\u5757\u89c2\u5bdf\u5468\u56f4\u4ee3\u7406\uff0c\u5e76\u4f7f\u7528\u8bb0\u5fc6\u6a21\u5757\u5b58\u50a8\u6bcf\u4e2a\u4ee3\u7406\u7684\u7b56\u7565\u3002\u6574\u4f53\u5de5\u4f5c\u6d41\u7a0b\u5305\u62ec\u4e00\u4e2a\u534f\u4f5c\u6a21\u5757\uff0c\u9f13\u52b1\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u8ba8\u8bba\u6709\u6548\u7684\u7b56\u7565\u5e76\u5206\u914d\u89d2\u8272\uff0c\u4e00\u4e2a\u63a8\u7406\u5f15\u64ce\u6839\u636e\u5206\u914d\u7684\u89d2\u8272\u786e\u5b9a\u6700\u4f18\u884c\u4e3a\uff0c\u4ee5\u53ca\u4e00\u4e2a\u6267\u884c\u6a21\u5757\u4f7f\u7528\u7ed3\u5408\u4e86\u57fa\u4e8e\u89c4\u5219\u6a21\u578b\u7684\u6df7\u5408\u65b9\u6cd5\u63a7\u5236\u8f66\u8f86\u52a8\u4f5c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCoMAL\u5728Flow\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u7684\u5f71\u54cd\uff0c\u5e76\u5c06\u5176\u6846\u67b6\u4e0e\u5f3a\u5316\u5b66\u4e60
\u65b9\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u8fd9\u7a81\u663e\u4e86LLM\u4ee3\u7406\u7684\u5f3a\u5927\u5408\u4f5c\u80fd\u529b\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u6765\u5e94\u5bf9\u6df7\u5408\u81ea\u4e3b\u4ea4\u901a\u6311\u6218\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Hyan-Yao/CoMAL\u83b7\u53d6\u3002**|\n", "2410.14262": "|**2024-10-18**|**Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation**|Edward et.al.|[2410.14262](http://arxiv.org/abs/2410.14262)|null|\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u68c0\u6d4b\u548c\u7ea0\u6b63AI\u751f\u6210\u5185\u5bb9\u4e2d\u5e7b\u89c9\u73b0\u8c61\u7684\u80fd\u529b\u3002\u4e00\u4e2a\u4e3b\u8981\u4ee3\u7406\u88ab\u4efb\u52a1\u521b\u5efa\u4e00\u7bc7\u5173\u4e8e\u4e00\u4f4d\u865a\u6784\u7684\u4e39\u9ea6\u827a\u672f\u5bb6Flipfloppidy\u7684\u535a\u5ba2\uff0c\u7136\u540e\u7531\u53e6\u4e00\u4e2a\u4ee3\u7406\u8fdb\u884c\u5ba1\u67e5\u4ee5\u8bc6\u522b\u4e8b\u5b9e\u6027\u9519\u8bef\u3002\u5927\u591a\u6570LLM\u6a21\u578b\u5e7b\u5316\u51fa\u4e86\u8fd9\u4f4d\u827a\u672f\u5bb6\u7684\u5b58\u5728\u3002\u5728\u6d89\u53ca\u5404\u79cd\u4e3b\u4ee3\u7406\u548c\u5ba1\u67e5\u4ee3\u7406\u7ec4\u5408\u76844900\u6b21\u6d4b\u8bd5\u8fd0\u884c\u4e2d\uff0c\u5148\u8fdb\u7684AI\u6a21\u578b\u5982Llama3-70b\u548cGPT-4\u53d8\u4f53\u5728\u8bc6\u522b\u5e7b\u89c9\u65b9\u9762\u51e0\u4e4e\u8fbe\u5230\u4e86\u5b8c\u7f8e\u7684\u51c6\u786e\u7387\uff0c\u5e76\u4e14\u5728\u6536\u5230\u53cd\u9988\u540e\u6210\u529f\u4fee\u6b63\u4e86\u8f93\u51fa\u5185\u5bb9\u768485%\u5230100%\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5148\u8fdbAI\u6a21\u578b\u5728\u663e\u8457\u63d0\u9ad8\u751f\u6210\u5185\u5bb9\u7684\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u6539\u8fdbAI\u5de5\u4f5c\u6d41\u7f16\u6392\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u3002|\n", "2410.14209": 
"|**2024-10-18**|**Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents**|Zihan Liu et.al.|[2410.14209](http://arxiv.org/abs/2410.14209)|null|\u5728\u5de5\u4e1a\u63a7\u5236\u7cfb\u7edf\u4e2d\uff0c\u53ef\u7f16\u7a0b\u903b\u8f91\u63a7\u5236\u5668\uff08PLC\uff09\u4ee3\u7801\u7684\u751f\u6210\u548c\u9a8c\u8bc1\u5bf9\u4e8e\u786e\u4fdd\u8fd0\u884c\u6548\u7387\u548c\u5b89\u5168\u6027\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u52a8\u5316\u4ee3\u7801\u751f\u6210\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u65e0\u6cd5\u63d0\u4f9b\u6b63\u786e\u6027\u4fdd\u8bc1\uff0c\u5e76\u4e14\u7f3a\u4e4f\u5bf9PLC\u7f16\u7a0b\u7684\u4e13\u4e1a\u652f\u6301\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aAgents4PLC\u7684\u65b0\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4e0d\u4ec5\u5b9e\u73b0\u4e86PLC\u4ee3\u7801\u7684\u81ea\u52a8\u5316\u751f\u6210\uff0c\u8fd8\u901a\u8fc7\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u8fdb\u884c\u4e86\u4ee3\u7801\u7ea7\u522b\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u9996\u5148\u5efa\u7acb\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u53ef\u9a8c\u8bc1\u7684PLC\u4ee3\u7801\u751f\u6210\u9886\u57df\uff0c\u4ece\u81ea\u7136\u8bed\u8a00\u9700\u6c42\u8fc7\u6e21\u5230\u4eba\u5de5\u7f16\u5199\u548c\u9a8c\u8bc1\u7684\u5f62\u5f0f\u5316\u89c4\u8303\u548c\u53c2\u8003PLC\u4ee3\u7801\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u7ed3\u5408\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u3001\u5148\u8fdb\u7684\u63d0\u793a\u5de5\u7a0b\u6280\u672f\u548c\u94fe\u5f0f\u601d\u7ef4\u7b56\u7565\uff0c\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u9488\u5bf9\u5de5\u4e1a\u63a7\u5236\u7cfb\u7edf\u7684\u201c\u4ee3\u7406\u201d\u3002\u8bc4\u4f30\u8868\u660e\uff0cAgents4PLC\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5728\u4e00\u7cfb\u
5217\u65e5\u76ca\u4e25\u683c\u7684\u6307\u6807\u4e0a\u5747\u53d6\u5f97\u4e86\u4f18\u5f02\u7684\u7ed3\u679c\u3002\u8fd9\u9879\u7814\u7a76\u4e0d\u4ec5\u89e3\u51b3\u4e86PLC\u7f16\u7a0b\u4e2d\u7684\u5173\u952e\u6311\u6218\uff0c\u8fd8\u5c55\u793a\u4e86\u6211\u4eec\u7684\u6846\u67b6\u751f\u6210\u9002\u7528\u4e8e\u5b9e\u9645\u5de5\u4e1a\u5e94\u7528\u7684\u53ef\u9a8c\u8bc1\u4ee3\u7801\u7684\u6f5c\u529b\u3002|\n", "2410.14202": "|**2024-10-18**|**Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs**|SeongYeub Chu et.al.|[2410.14202](http://arxiv.org/abs/2410.14202)|null|\u73b0\u6709\u7684\u81ea\u52a8\u4f5c\u6587\u8bc4\u5206\uff08AES\uff09\u4ec5\u4f9d\u8d56\u4e8e\u4f5c\u6587\u6587\u672c\uff0c\u800c\u672a\u4f7f\u7528\u89e3\u91ca\u6027\u7406\u7531\u5206\u6570\uff0c\u56e0\u6b64\u9519\u5931\u4e86\u4ee5\u7ec6\u7c92\u5ea6\u65b9\u5f0f\u6355\u6349\u8bc4\u5206\u6807\u51c6\u4e2d\u7279\u5b9a\u8bc4\u4f30\u65b9\u9762\u7684\u673a\u4f1a\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u57fa\u4e8e\u8bba\u636e\u7684\u591a\u7279\u5f81\u8bc4\u5206\uff08RMTS\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u57fa\u4e8e\u63d0\u793a\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u4f7f\u7528\u8f83\u5c0f\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08S-LLM\uff09\u7684\u5fae\u8c03\u5f0f\u4f5c\u6587\u8bc4\u5206\u6a21\u578b\u3002RMTS \u4f7f\u7528\u57fa\u4e8eLLM\u7684\u7279\u5f81\u8bba\u636e\u751f\u6210\u7cfb\u7edf\uff0c\u5176\u4e2d\u5355\u72ec\u7684LLM\u4ee3\u7406\u6839\u636e\u8bc4\u5206\u6807\u51c6\u6307\u5357\u751f\u6210\u7279\u5f81\u7279\u5b9a\u7684\u7406\u7531\uff0c\u8bc4\u5206\u6a21\u578b\u5229\u7528\u8fd9\u4e9b\u7406\u7531\u51c6\u786e\u9884\u6d4b\u591a\u7279\u5f81\u5206\u6570\u3002\u5728\u57fa\u51c6\u6570\u636e\u96c6\uff08\u5305\u62ecASAP\u3001ASAP++\u548cFeedback Prize\uff09\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cRMTS 
\u5728\u7279\u5f81\u7279\u5b9a\u8bc4\u5206\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u548c\u666e\u901a\u7684S-LLM\u3002\u901a\u8fc7\u8f85\u52a9\u5b9a\u91cf\u8bc4\u4f30\u4ee5\u63d0\u4f9b\u7ec6\u7c92\u5ea6\u7684\u5b9a\u6027\u7406\u7531\uff0cRMTS \u63d0\u9ad8\u4e86\u7279\u5f81\u8bc4\u5206\u7684\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u5173\u4e8e\u4f5c\u6587\u7684\u90e8\u5206\u89e3\u91ca\u3002|\n", "2410.14152": "|**2024-10-18**|**SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent**|Jiarui Ji et.al.|[2410.14152](http://arxiv.org/abs/2410.14152)|**[link](https://github.com/jijiarui-cather/srapagent_framework)**|\u516c\u5171\u7a00\u7f3a\u8d44\u6e90\u914d\u7f6e\u5728\u7ecf\u6d4e\u5b66\u4e2d\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\uff0c\u56e0\u4e3a\u5b83\u76f4\u63a5\u5f71\u54cd\u5230\u793e\u4f1a\u7684\u6548\u7387\u548c\u516c\u5e73\u6027\u3002\u4f20\u7edf\u7814\u7a76\u65b9\u6cd5\uff0c\u5305\u62ec\u57fa\u4e8e\u7406\u8bba\u6a21\u578b\u3001\u57fa\u4e8e\u5b9e\u8bc1\u7814\u7a76\u548c\u57fa\u4e8e\u4eff\u771f\u7684\u65b9\u6cd5\uff0c\u7531\u4e8e\u5b58\u5728\u7406\u60f3\u5316\u7684\u5b8c\u5168\u4fe1\u606f\u548c\u4e2a\u4f53\u7406\u6027\u7684\u5047\u8bbe\u4ee5\u53ca\u6709\u9650\u53ef\u7528\u6570\u636e\u7684\u9650\u5236\uff0c\u9762\u4e34\u7740\u5c40\u9650\u6027\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6SRAP-Agent\uff08\u4f7f\u7528\u57fa\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u7684\u667a\u80fd\u4f53\u6a21\u62df\u548c\u4f18\u5316\u7a00\u7f3a\u8d44\u6e90\u914d\u7f6e\u653f\u7b56\uff09\uff0c\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u96c6\u6210\u5230\u7ecf\u6d4e\u4eff\u771f\u4e2d\uff0c\u65e8\u5728\u5f25\u5408\u7406\u8bba\u6a21\u578b\u4e0e\u73b0\u5b9e\u52a8\u6001\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u4ee5\u516c\u5171\u4f4f\u623f\u5206\u914d\u573a\u666f\u4f5c\u4e3a\u6848\u4f8b\u7814\u7a76\uff0c\u
6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u653f\u7b56\u4eff\u771f\u5b9e\u9a8c\u6765\u9a8c\u8bc1SRAP-Agent\u7684\u53ef\u884c\u6027\u548c\u6709\u6548\u6027\uff0c\u5e76\u91c7\u7528\u5177\u6709\u7279\u5b9a\u4f18\u5316\u76ee\u6807\u7684\u653f\u7b56\u4f18\u5316\u7b97\u6cd5\u3002\u6e90\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/jijiarui-cather/SRAPAgent_Framework\u627e\u5230\u3002|\n", "2410.14041": "|**2024-10-17**|**From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching**|Eric Yang et.al.|[2410.14041](http://arxiv.org/abs/2410.14041)|null|\u6709\u6548\u7684\u7ba1\u7406\u5fc3\u810f\u4ee3\u8c22\u72b6\u51b5\u9700\u8981\u6301\u7eed\u7684\u79ef\u6781\u8425\u517b\u4e60\u60ef\uff0c\u4f46\u8fd9\u4e9b\u4e60\u60ef\u5f80\u5f80\u53d7\u5230\u590d\u6742\u4e14\u4e2a\u4f53\u5316\u7684\u969c\u788d\u5f71\u54cd\u3002\u76f4\u63a5\u7684\u4eba\u7c7b\u7ba1\u7406\u96be\u4ee5\u6269\u5c55\uff0c\u800c\u4e4b\u524d\u7684\u5c1d\u8bd5\u65e8\u5728\u81ea\u52a8\u5316\u8425\u517b\u8f85\u5bfc\uff0c\u4f46\u7f3a\u4e4f\u89e3\u51b3\u8fd9\u4e9b\u591a\u6837\u5316\u6311\u6218\u6240\u9700\u7684\u4e2a\u6027\u5316\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e3b\u52a8\u5de5\u4f5c\u6d41\u7a0b\uff0c\u65e8\u5728\u901a\u8fc7\u76f4\u63a5\u9488\u5bf9\u5e76\u7f13\u89e3\u60a3\u8005\u7279\u5b9a\u7684\u969c\u788d\u6765\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u8425\u517b\u8f85\u5bfc\u3002\u8be5\u5de5\u4f5c\u6d41\u7a0b\u57fa\u4e8e\u884c\u4e3a\u79d1\u5b66\u539f\u5219\uff0c\u5229\u7528\u4e86\u4e0e\u76f8\u5e94\u5faa\u8bc1\u7b56\u7565\u76f8\u5173\u7684\u5168\u9762\u8425\u517b\u76f8\u5173\u969c\u788d\u6620\u5c04\u3002\u4e00\u4e2a\u4e13\u95e8\u7684LLM\u4ee3\u7406\u6709\u610f\u63a2\u67e5\u5e76\u8bc6\u522b\u60a3\u8005\u5728\u996e\u98df\u65b9\u9762\u7684\u6839\u672c\u95ee\u9898\u3002\u968f\u540e\uff0c\u53e6\u4e00\u4e2aLLM\u4ee3\u7406\u63d0\u4f9b\u91cf\u8eab\u5b9a\u5236\u7684\u7b56\u7565\
uff0c\u4ee5\u514b\u670d\u8fd9\u4e9b\u7279\u5b9a\u969c\u788d\uff0c\u5e76\u7ed3\u5408\u60a3\u8005\u7684\u5177\u4f53\u60c5\u51b5\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u6d89\u53ca\u5fc3\u810f\u4ee3\u8c22\u75be\u75c5\u60a3\u8005\u7684\u7528\u6237\u7814\u7a76\u6765\u8bbe\u8ba1\u548c\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u8bc1\u660e\u4e86\u8be5\u7cfb\u7edf\u80fd\u591f\u51c6\u786e\u8bc6\u522b\u969c\u788d\u5e76\u63d0\u4f9b\u4e2a\u6027\u5316\u6307\u5bfc\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u5927\u89c4\u6a21\u6a21\u62df\u7814\u7a76\u6765\u8bc4\u4f30\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u8be5\u7814\u7a76\u57fa\u4e8e\u771f\u5b9e\u7684\u60a3\u8005\u6848\u4f8b\u548c\u4e13\u5bb6\u9a8c\u8bc1\u7684\u6307\u6807\uff0c\u5728\u5e7f\u6cdb\u7684\u60c5\u666f\u4e2d\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u57fa\u4e8eLLM\u7684\u4e3b\u52a8\u5de5\u4f5c\u6d41\u7a0b\u6709\u53ef\u80fd\u901a\u8fc7\u63d0\u4f9b\u4e2a\u6027\u5316\u3001\u53ef\u6269\u5c55\u4e14\u57fa\u4e8e\u884c\u4e3a\u7684\u5e72\u9884\u63aa\u65bd\u6765\u6539\u5584\u8425\u517b\u8f85\u5bfc\u3002|\n", "2410.16237": "|**2024-10-23**|**IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems**|Yihuan Mao 
et.al.|[2410.16237](http://arxiv.org/abs/2410.16237)|null|As large language model (LLM) agents are increasingly integrated into our infrastructure, their robust coordination and message synchronization become critical. The Byzantine Generals Problem (BGP) is a key model for building multi-agent systems (MAS) that are resilient under adversarial attacks. It describes a scenario in which malicious agents with unknown identities exist in the system; in our context, such situations could result from LLM agents' hallucinations or external attacks. In BGP, the objective of the entire system is to reach a consensus on the action to be taken. Traditional BGP requires global consensus among all agents; in practical scenarios, however, global consensus is not always necessary and can even be inefficient. There is therefore a pressing need to explore a refined version of BGP that aligns with the local coordination patterns observed in MAS. We refer to this refined version as Imperfect BGP (IBGP) in our research, aiming to address this discrepancy. To tackle this issue, we propose a framework that leverages consensus protocols in general MAS settings, providing provable resilience against communication attacks and adaptability to changing environments, as validated by empirical results. We also present a case study in a sensor network environment to illustrate the practical application of our protocol.|\n",
"2410.15686": "|**2024-10-21**|**NetSafe: Exploring the Topological Safety of Multi-agent Networks**|Miao Yu et.al.|[2410.15686](http://arxiv.org/abs/2410.15686)|null|Large language models (LLMs) have given intelligence to the nodes of multi-agent networks, and these models are increasingly applied in both academia and industry. However, how to prevent these networks from generating malicious information remains underexplored, and previous research on the safety of a single LLM is difficult to transfer directly. This paper focuses on the safety of multi-agent networks from a topological perspective, investigating which topological properties contribute to safer networks. To this end, we propose a general framework, NetSafe, along with an iterative RelCom interaction, to unify the various existing LLM-based agent frameworks and lay the foundation for generalized topological safety research. We identify several critical phenomena that arise when multi-agent networks are exposed to attacks involving misinformation, bias, and harmful information, termed Agent Hallucination and Aggregation Safety. Furthermore, we find that highly connected networks are more susceptible to adversarial attacks, with task performance in a star graph topology decreasing by 29.7%. In addition, our proposed static metrics align more closely with real-world dynamic evaluations than traditional graph-theoretic metrics, indicating that networks with a greater average distance from attackers exhibit higher safety. In conclusion, our work introduces a new perspective on the safety of LLM-based multi-agent networks and uncovers several previously unreported phenomena, paving the way for future research on the safety of such networks.|\n",
"2410.15311": "|**2024-10-20**|**Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game**|Ruiqi Dong et.al.|[2410.15311](http://arxiv.org/abs/2410.15311)|null|Large language models (LLMs) play a pivotal role as AI agents in complex tasks, but they still face challenges in open-ended decision-making problems within complex scenarios. To address this, we use the language logic game \"Who is Undercover?\" (WIU) as an experimental platform and propose the Multi-Perspective Team Tactic (MPTT) framework. MPTT aims to cultivate human-like language expression logic, multi-dimensional thinking, and self-perception in LLMs in complex scenarios. By alternating speaking and voting sessions and integrating techniques such as self-perspective, identity determination, self-reflection, self-summary, and multi-round teammate finding, LLM agents make rational decisions through strategic concealment and communication, fostering the formation of human-like trust. Preliminary results show that MPTT, combined with WIU, leverages the cognitive capabilities of LLMs to create a decision-making framework that can simulate real society. The framework aids minority groups in communication and expression, promoting fairness and diversity in decision-making. In addition, our \"human-in-the-loop\" experiments show that LLMs can learn and adapt to human behavior through interaction, indicating their potential for active participation in societal decision-making.|\n",
"2410.15267": "|**2024-10-20**|**When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?**|Shang Wang et.al.|[2410.15267](http://arxiv.org/abs/2410.15267)|null|The deployment of large language models (LLMs) such as ChatGPT and Gemini has demonstrated their powerful natural language generation capabilities. However, during training these models may inadvertently learn and retain sensitive information and harmful content, raising significant ethical and legal concerns. Machine unlearning has been proposed as a potential solution to these problems. Although existing unlearning methods take the specific characteristics of LLMs into account, they often suffer from high computational demands, limited applicability, or the risk of catastrophic forgetting. To address these limitations, we propose a lightweight unlearning framework based on Retrieval-Augmented Generation (RAG). By modifying the external knowledge base of RAG, we simulate the effects of forgetting without directly interacting with the unlearned LLM. We treat the construction of unlearned knowledge as a constrained optimization problem and derive two key components that underpin the effectiveness of RAG-based unlearning. This RAG-based approach is particularly effective for closed-source LLMs, where existing unlearning methods often fail. We evaluate our framework through extensive experiments on both open-source and closed-source models, including ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2. The results show that our approach meets five key unlearning criteria: effectiveness, universality, harmlessness, simplicity, and robustness. Moreover, the approach can be extended to multimodal large language models and LLM-based agents.|\n",
"2410.15164": "|**2024-10-19**|**SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation**|Jingxuan Chen et.al.|[2410.15164](http://arxiv.org/abs/2410.15164)|null|Smartphone agents are increasingly important for helping users control their devices efficiently, with approaches based on multimodal large language models (MLLMs) emerging as key contenders. However, fairly comparing these agents is both important and challenging, requiring a diverse range of tasks, the integration of agents with different implementations, and a general evaluation pipeline to assess their strengths and weaknesses. This paper introduces SPA-Bench, a comprehensive smartphone agent benchmark designed to evaluate (M)LLM-based agents in an interactive environment that simulates real-world conditions. SPA-Bench makes three main contributions: (1) a task set covering system and third-party apps in both English and Chinese, focusing on features commonly used in daily life; (2) a plug-and-play framework that supports real-time interaction with Android devices, integrates more than ten agents, and allows more agents to be added flexibly; and (3) a novel evaluation pipeline that automatically assesses agent performance across multiple dimensions, comprising seven metrics related to task completion and resource consumption. Our extensive experiments reveal the challenges these agents face in interpreting mobile user interfaces, action grounding, memory retention, and execution cost. We propose future research directions to mitigate these issues, moving closer to practical smartphone agent applications.|\n",
"2410.14923": "|**2024-10-22**|**Imprompter: Tricking LLM Agents into Improper Tool Use**|Xiaohan Fu 
et.al.|[2410.14923](http://arxiv.org/abs/2410.14923)|**[link](https://github.com/Reapor-Yurnero/imprompter)**|**Large language model (LLM) agents are an emerging computing paradigm that combines generative machine learning with tools such as code interpreters, web browsing, and email, as well as broader external resources. These agent-based systems represent an emerging shift in personal computing. We contribute to the security foundations of agent-based systems and present a new class of automatically computed adversarial prompt attacks that violate the confidentiality and integrity of user resources. We show how prompt optimization techniques can be used to generate such prompts automatically, given the model weights. We demonstrate that such attacks transfer to production-level agents. For example, we show an information exfiltration attack on Mistral's LeChat agent that analyzes a user's conversation, picks out personally identifiable information, and formats it into a valid markdown command that leaks the data to an attacker's server. This attack shows a success rate of nearly 80% in end-to-end evaluation. We conduct a range of experiments to characterize the efficacy of these attacks and find that they work reliably on emerging agent-based systems such as Mistral's LeChat, ChatGLM, and Meta's Llama. The attacks are multimodal, and we show variants in both the text and image domains.**|\n",
"2410.17238": "|**2024-10-22**|**SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning**|Yizhou Chi et.al.|[2410.17238](http://arxiv.org/abs/2410.17238)|**[link](https://github.com/geekan/metagpt)**|**Automated Machine Learning (AutoML) approaches include traditional methods that optimize fixed pipelines for model selection and ensembling, as well as newer frameworks based on large language models (LLMs) that construct pipelines autonomously. Although LLM-based agents have shown promise in automating machine learning tasks, they often generate low-diversity and suboptimal code, even after multiple iterations. To overcome these limitations, we introduce Tree-Search Enhanced LLM Agents (SELA), an innovative agent-based system that uses Monte Carlo Tree Search (MCTS) to optimize the AutoML process. By representing pipeline configurations as trees, our framework enables agents to conduct experiments intelligently and iteratively refine their strategies, exploring the machine learning solution space more effectively. This novel approach allows SELA to discover optimal pathways based on experimental feedback, improving the overall quality of the solutions. In an extensive evaluation across 20 machine learning datasets, we compare the performance of traditional and agent-based AutoML methods and show that SELA achieves a win rate of 65% to 80% against each baseline across all datasets. These results underscore the significant potential of agent-based strategies in AutoML and offer a fresh perspective on tackling complex machine learning challenges.**|\n",
"2410.16919": "|**2024-10-22**|**EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI**|Tomoyuki Kagaya 
et.al.|[2410.16919](http://arxiv.org/abs/2410.16919)|null|In recent years, large language models (LLMs) have shown strong reasoning capabilities and attracted wide attention, particularly for their application in various decision-making processes. One especially promising application of LLM agents is robotic manipulation. Recent research has shown that LLMs can generate text plans or control code for robots, providing substantial flexibility and interaction capabilities. However, these approaches still face challenges in flexibility and in applicability across different environments, which limits their ability to adapt autonomously. Current approaches typically fall into two categories: those that rely on environment-specific policy training, which restricts their transferability, and those that generate code actions from fixed prompts, which leads to degraded performance when confronted with new environments. These limitations significantly constrain the generalizability of agents in robotic manipulation. To address them, we propose a new method called EnvBridge. This approach retains successful robot control code from source environments and transfers it to target environments. By leveraging insights from multiple environments, EnvBridge enhances the adaptability and performance of agents across diverse settings. Notably, our approach alleviates environmental constraints, offering a more flexible and generalizable solution for robotic manipulation tasks. We validate the effectiveness of the method using the robotic manipulation benchmarks RLBench, MetaWorld, and CALVIN. Our experiments show that LLM agents can successfully draw on diverse knowledge sources to solve complex tasks. Consequently, our approach significantly improves the adaptability and robustness of robotic manipulation agents when planning across diverse environments.|\n",
"2410.16670": "|**2024-10-22**|**CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing**|Chen Yang 
et.al.|[2410.16670](http://arxiv.org/abs/2410.16670)|**[link](https://github.com/uclaml/cops)**|**Sequential reasoning with large language models (LLMs) in agent systems has made notable progress, but existing approaches still face limitations. Reflection-driven reasoning relies solely on the knowledge in the pretrained model, which often limits performance in novel scenarios, while experience-assisted reasoning often depends on external experiences and lacks a clear principle for selecting representative ones. We address these limitations with CoPS (Cross-Task Experience Sharing), a general algorithm that enhances sequential reasoning through cross-task experience sharing and selection. Specifically, CoPS leverages an agent's experiences from previous tasks and uses a pessimism-based strategy to select distribution-matched experiences, maximizing utility while minimizing the risk from distribution shift. Extensive experiments on benchmarks including Alfworld, Webshop, and HotPotQA show that CoPS consistently outperforms state-of-the-art baselines, with superior sample efficiency suited to resource-constrained scenarios. Theoretically, the performance of our algorithm depends on the quality of the pretrained LLM and on how well the agent's task-dependent trial distribution matches the distribution generated by the LLM. Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of cross-task experience, offering a potential path toward better generalization and adaptability of agents across diverse tasks. Our code is available.**|\n", "2410.16658": "|**2024-10-22**|**Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent**|Janghoon Ock et.al.|[2410.16658](http://arxiv.org/abs/2410.16658)|null|Adsorption energy is a key reactivity descriptor in catalysis that enables efficient screening of candidate catalysts. However, determining adsorption energy requires comparing the energies of many adsorbate-catalyst configurations, which is computationally expensive because of the large number of possible configurations. Current algorithmic approaches typically enumerate adsorption sites and configurations without using theoretical insight to guide the initial setups. In this work, we introduce Adsorb-Agent, a large language model (LLM) agent designed to efficiently derive system-specific stable adsorption configurations with minimal human intervention. Adsorb-Agent draws on its built-in knowledge and emergent reasoning capabilities to significantly reduce the number of initial configurations required while improving the accuracy of the predicted minimum adsorption energy. We demonstrate its performance on two example systems, NNH-CuPd3(111) and NNH-Mo3Pd(111), for the nitrogen reduction reaction (NRR), a sustainable alternative to the Haber-Bosch process. Adsorb-Agent outperforms conventional \u201cheuristic\u201d and \u201crandom\u201d algorithms by identifying lower-energy configurations from fewer initial setups, reducing computational cost and improving accuracy. This highlights its potential to accelerate catalyst discovery.|\n", "2410.18032": "|**2024-10-23**|**GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration**|Xin Li 
et.al.|[2410.18032](http://arxiv.org/abs/2410.18032)|**[link](https://github.com/bupt-gamma/graphteam)**|**Graphs are widely used to model relational data in real-world scenarios such as social networks and urban computing. Existing large language model (LLM)-based graph analysis approaches either integrate graph neural networks (GNNs) for specific machine learning tasks, which limits their transferability, or rely solely on the LLM's own reasoning ability, which leads to suboptimal performance. To address these limitations, we build on recent advances in LLM-based agents, which have shown the ability to use external knowledge or tools to solve problems. By simulating human problem-solving strategies such as analogy and collaboration, we propose GraphTeam, an LLM-based multi-agent system for graph analysis. GraphTeam consists of five LLM-based agents across three modules, and agents with different specialties collaborate to solve complex problems. Specifically: (1) input-output normalization module: the question agent extracts and refines four key arguments from the original question to aid problem understanding, and the answer agent organizes the results to meet the output requirements; (2) external knowledge retrieval module: we first build a knowledge base of relevant documentation and experience information, and the search agent then retrieves the most relevant entries for each question; (3) problem-solving module: given the information retrieved by the search agent, the coding agent generates solutions through programming, and if the coding agent fails, the reasoning agent computes the result directly without programming. Extensive experiments on six graph analysis benchmarks show that GraphTeam achieves state-of-the-art performance, with an average accuracy improvement of 25.85% over the best baseline. The code and data are available at https://github.com/BUPT-GAMMA/GraphTeam.**|\n", "2410.18012": "|**2024-10-25**|**MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting**|Sungil Seok 
et.al.|[2410.18012](http://arxiv.org/abs/2410.18012)|null|The US Federal Funds rate plays an important role in both domestic and international financial markets. However, research has focused mainly on the effects of adjustments to the rate rather than on the decision-making process itself. Recent advances in large language models (LLMs) make it possible to reconstruct the original Federal Open Market Committee (FOMC) meetings, which are responsible for setting the Federal Funds rate. This paper proposes MiniFed, a five-stage FOMC meeting simulation framework that uses LLM agents to simulate real-world FOMC meeting members and optimizes the FOMC structure. The framework effectively revitalizes the FOMC meeting process and facilitates projections of the Federal Funds rate. Experimental results show that the proposed MiniFed framework achieves high accuracy in Federal Funds rate projections and that the agents behave consistently with their real-world counterparts. Given that few studies have used LLM agents to simulate large-scale real-world meetings, our work can serve as a benchmark for future developments.|\n", "2410.18792": "|**2024-10-25**|**An LLM Agent for Automatic Geospatial Data Analysis**|Yuxing Chen 
et.al.|[2410.18792](http://arxiv.org/abs/2410.18792)|null|Large language models (LLMs) are widely used for data science code generation, but they often make logical errors on complex sequential tasks. When applied to geospatial data in particular, these models face the challenges of integrating complex data structures and spatial constraints, making effective use of diverse function calls, and hallucinating when working with less commonly used geospatial libraries. To tackle these problems, we introduce GeoAgent, a new interactive framework designed to help LLMs handle geospatial data processing more effectively. GeoAgent pioneers the combination of a code interpreter, static analysis, and retrieval-augmented generation (RAG) with the Monte Carlo Tree Search (MCTS) algorithm, offering a novel approach to geospatial data processing. In addition, we contribute a new benchmark specifically designed to evaluate LLM-based approaches on geospatial tasks. The benchmark draws on a variety of Python libraries and includes both single-turn and multi-turn tasks spanning data acquisition, data analysis, and visualization. By offering a comprehensive evaluation across diverse geospatial contexts, it sets a new standard for developing LLM-based approaches to geospatial data analysis. Our findings suggest that relying solely on an LLM's knowledge is insufficient for accurately programming geospatial tasks, which require coherent multi-step processes and multiple function calls. Compared with baseline LLMs, the proposed GeoAgent demonstrates superior performance, with notable improvements in function calls and task completion. These results also offer valuable insights for the future development of LLM agents for automatic geospatial data analysis programming.|\n", "2410.18528": "|**2024-10-24**|**PRACT: Optimizing Principled Reasoning and Acting of LLM Agent**|Zhiwei Liu et.al.|[2410.18528](http://arxiv.org/abs/2410.18528)|null|We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. At the core of our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO uses a reflector to critique the current action principles and an optimizer to update them accordingly. We develop the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which performs self-reflection without external rewards. We also introduce two RPO methods, RPO-Traj and RPO-Batch, to accommodate different settings. Experimental results across four environments show that a PRAct agent using the RPO framework effectively learns and applies action principles to improve performance.|\n", "2410.19385": "|**2024-10-25**|**Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models**|Liam Barkley 
et.al.|[2410.19385](http://arxiv.org/abs/2410.19385)|null|Large language models (LLMs) are powerful computational models trained on large amounts of human-readable text, which enables them to perform general-purpose language understanding and generation. They have attracted wide attention in both industry and academia for their strong performance across natural language processing (NLP) tasks. Despite these successes, LLMs often produce inaccuracies, commonly called hallucinations. Prompt engineering, the process of designing and formulating instructions for LLMs to perform specific tasks, has emerged as a key approach to mitigating hallucinations. This paper provides a comprehensive empirical evaluation of different prompting strategies and frameworks aimed at reducing hallucinations in LLMs. Various prompting techniques are applied to a broad set of benchmark datasets to assess the accuracy and hallucination rate of each method. The paper also investigates how tool-calling agents (LLMs augmented with external tools to extend their capabilities beyond language generation) affect hallucination rates on the same benchmarks. The findings show that the best prompting technique depends on the type of problem and that simpler techniques often outperform more complex ones in reducing hallucinations. Furthermore, LLM agents can exhibit higher hallucination rates because of the added complexity of external tool use.|\n", "2410.19238": "|**2024-10-25**|**Designing LLM-Agents with Personalities: A Psychometric Approach**|Muhua Huang et.al.|[2410.19238](http://arxiv.org/abs/2410.19238)|null|This paper introduces a novel method for assigning quantifiable, controllable, and psychometrically validated personalities to large language model-based agents (Agents) using the Big Five personality framework. It aims to overcome the constraints of human-subject studies and proposes Agents as an accessible tool for social science research. Through four studies, it demonstrates the feasibility of assigning psychometrically valid personality traits to Agents, enabling them to replicate complex human behaviors. The first study establishes an understanding of personality constructs and personality tests in the semantic space of an LLM. The next two studies use empirical and simulated data to illustrate the process of creating Agents and validate the results by showing a strong correspondence between human and Agent answers to personality tests. The final study further corroborates this correspondence by using Agents to replicate known correlations between human personality traits and decision-making behaviors in scenarios involving risk-taking and ethical dilemmas, thereby validating the psychometric approach to designing Agents and its applicability to social and behavioral research.|\n", "2410.21071": "|**2024-10-28**|**Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks**|Eitan Farchi et.al.|[2410.21071](http://arxiv.org/abs/2410.21071)|null|Large language models (LLMs) can be used for a variety of code-related tasks, such as translating from one programming language to another, implementing natural-language requirements, and summarizing code. Artifacts generated by state-of-the-art LLM technology are expected to be usable after the user makes a few simple modifications. However, quantifying this vague notion is challenging, so it is hard to determine the quality of code-related LLM solutions. We refer to the evaluation of LLM solutions using LLM judgment as \u201cLLM as a Judge\u201d, or LaaJ for short. In this work, we introduce a methodology for generating and evaluating LaaJ implementations, using an automatically generated benchmark. The purpose of the benchmark is twofold: developing and validating LaaJs, and validating and testing the LLM code-related solutions that use LaaJs. To that end, we develop an automated benchmark generation engine that produces code in multiple programming languages for multiple code-related tasks and serves as input for LaaJ evaluation. We use a graph representation, G, of code-related generations, where the vertices are generated artifacts and the edges represent possible generations, such as generating a Java program from natural-language requirements. Using a chain of LLM agents and G, we generate code-related artifacts. Using cycles in G, we formulate expectations on the generated artifacts. Based on these expectations, reliable LLM judgments of the usefulness of the artifacts a solution generates can be developed and tested. Our approach enables the creation of high-quality solutions for code tasks.|\n", "2410.20666": "|**2024-10-28**|**Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments**|Sangmim Song 
et.al.|[2410.20666](http://arxiv.org/abs/2410.20666)|null|Navigation is a major challenge for persons with visual impairments (PVI). Traditional aids such as white canes and guide dogs are invaluable, but they fall short of providing detailed environmental information and precise guidance to a destination. Recent advances in large language models (LLMs) and vision-language models (VLMs) offer new avenues for enhancing assistive navigation. In this paper, we introduce Guide-LLM, an embodied LLM-based agent designed to help PVI navigate large indoor environments. Our approach features a novel text-based topological map that lets the LLM plan global paths using a simplified environmental representation, focusing on straight paths and right-angle turns to facilitate navigation. We also use the LLM's commonsense reasoning for hazard detection and for personalized path planning based on user preferences. Simulated experiments demonstrate the system's effectiveness in guiding PVI and underscore its potential as a significant advance in assistive technology. The results show that Guide-LLM can offer efficient, adaptive, and personalized navigation assistance, pointing to promising developments in this field.|\n", "2410.20445": "|**2024-10-27**|**TrajAgent: An Agent Framework for Unified Trajectory Modelling**|Yuwei Du et.al.|[2410.20445](http://arxiv.org/abs/2410.20445)|**[link](https://github.com/tsinghua-fib-lab/trajagent)**|**Trajectory modeling, which covers research on trajectory data pattern mining and future prediction, has wide applications in areas such as life services, urban transportation, and public administration. Numerous methods have been proposed to address specific problems within trajectory modeling. However, because of the heterogeneity of data and the diversity of tasks, unified trajectory modeling remains an important challenge. In this paper, we propose TrajAgent, a large language model-based agent framework that unifies various trajectory modeling tasks. In TrajAgent, we first develop UniEnv, an execution environment with a unified data and model interface that supports the execution and training of various models. Building on UniEnv, we introduce TAgent, an agentic workflow for automatic trajectory modeling across various trajectory tasks. Specifically, we design AutOpt, a systematic optimization module within TAgent, to further improve the performance of the integrated model. Given diverse trajectory tasks described in natural language, TrajAgent automatically generates competitive results by training and executing appropriate models. Extensive experiments on four tasks using four real-world datasets demonstrate the effectiveness of TrajAgent for unified trajectory modeling, with an average performance improvement of 15.43% over baseline methods.**|\n", "2410.20007": "|**2024-10-25**|**Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models**|Danqing Wang et.al.|[2410.20007](http://arxiv.org/abs/2410.20007)|null|Enhancing the reasoning capabilities of large language models (LLMs) is crucial for enabling them to solve complex multi-step problems. Multi-agent frameworks have shown great potential for improving LLM reasoning. However, the lack of effective cooperation between LLM agents limits their performance, especially on multi-step reasoning tasks. This paper proposes a novel cooperative multi-agent reasoning framework (CoPlanner) that separates reasoning steps and assigns distinct duties to different agents. CoPlanner consists of two LLM agents: a planning agent and a reasoning agent. The planning agent provides high-level strategic hints, while the reasoning agent follows these hints and infers answers. By training the planning agent's policy with Proximal Policy Optimization (PPO), the LLaMA-3-8B-based CoPlanner outperforms the previous best method by 9.94% on LogiQA and by 3.09% on BBH. Our results show that guidance from the planning agent and effective cooperation between the agents contribute to CoPlanner's superior performance on multi-step reasoning problems.|\n", "2410.19920": "|**2024-10-29**|**Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting**|Mohamed Salim Aissi et.al.|[2410.19920](http://arxiv.org/abs/2410.19920)|null|Reinforcement learning (RL) is a promising approach for applying the knowledge of large language models (LLMs) to sequential decision-making tasks. However, few studies have thoroughly investigated how fine-tuning these models with RL in a specific environment affects their capabilities. This paper proposes a novel framework for analyzing the sensitivity of LLM agents to prompt formulations after RL training in a textual environment. Our findings show that LLM performance degrades when the model faces prompt formulations different from those used during RL training. We also analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose using a contrastive loss to mitigate this sensitivity and improve the robustness and generalization of LLMs.|\n", "2410.21909": "|**2024-10-29**|**SceneGenAgent: Precise Industrial Scene Generation with Coding Agent**|Xiao Xia et.al.|[2410.21909](http://arxiv.org/abs/2410.21909)|**[link](https://github.com/thudm/scenegenagent)**|**Modeling industrial scenes is essential for simulation in industrial manufacturing. Although large language models (LLMs) have made significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge: these scenes demand precise measurements and positioning, which requires complex planning over the spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent that generates industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement, meeting the quantitative requirements of industrial scenes. Experimental results show that LLMs powered by SceneGenAgent exceed their original performance, reaching a success rate of 81.0% on real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further improve accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs for integration into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available.**|\n", "2410.21359": "|**2024-10-28**|**Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games**|Ji Ma et.al.|[2410.21359](http://arxiv.org/abs/2410.21359)|null|\u968f\u7740\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u8d8a\u6765\u8d8a\u591a\u5730\u627f\u62c5\u73b0\u5b9e\u4e16\u754c\u4efb\u52a1\u5e76\u4e0e\u4eba\u7c7b\u793e\u4f1a\u4e92\u52a8\uff0c\u6211\u4eec\u5bf9\u5b83\u4eec\u7684\u884c\u4e3a\u4e86\u89e3\u591a\u5c11\uff1f\u672c\u7814\u7a76\uff081\uff09\u8c03\u67e5\u4e86\u4e0d\u540c\u4eba\u683c\u5982\u4f55\u8bf1\u5bfcLLM\u4ee3\u7406\u7684\u4eb2\u793e\u4f1a\u884c\u4e3a\u2014\u2014\u4e00\u79cd\u57fa\u672c\u7684\u793e\u4f1a\u89c4\u8303\uff0c\u5e76\u5c06\u5176\u4e0e\u4eba\u7c7b\u884c\u4e3a\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff1b\uff082\uff09\u5f15\u5165\u4e86\u4e00\u79cd\u884c\u4e3a\u65b9\u6cd5\u6765\u8bc4\u4f30LLM\u4ee3\u7406\u5728\u590d\u6742\u51b3\u7b56\u573a\u666f\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0d\u540c\u4eba\u683c\u548c\u5b9e\u9a8c\u6846\u67b6\u5982\u4f55\u5f71\u54cd\u8fd9\u4e9bAI\u4ee3\u7406\u5728\u72ec\u88c1\u8005\u535a\u5f08\u4e2d\u7684\u5229\u4ed6\u884c\u4e3a\uff0c\u5e76\u6bd4\u8f83\u4e86\u540c\u4e00LLM\u5bb6\u65cf\u5185\u3001\
u4e0d\u540cLLM\u5bb6\u65cf\u4e4b\u95f4\u4ee5\u53ca\u4e0e\u4eba\u7c7b\u884c\u4e3a\u4e4b\u95f4\u7684\u5dee\u5f02\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u7684\u5dee\u5f02\u548c\u4e0d\u4e00\u81f4\u6027\uff0c\u5e76\u4e14\u4e0e\u4eba\u7c7b\u884c\u4e3a\u76f8\u6bd4\u4e5f\u6709\u660e\u663e\u533a\u522b\u3002\u4ec5\u4ec5\u8d4b\u4e88LLM\u7c7b\u4f3c\u4eba\u7c7b\u7684\u8eab\u4efd\u5e76\u4e0d\u80fd\u4ea7\u751f\u7c7b\u4f3c\u4eba\u7c7b\u7684\u884c\u4e3a\u3002\u5c3d\u7ba1\u8fd9\u4e9bAI\u4ee3\u7406\u662f\u5728\u5927\u91cf\u7531\u4eba\u7c7b\u751f\u6210\u7684\u6570\u636e\u4e0a\u8bad\u7ec3\u7684\uff0c\u4f46\u5b83\u4eec\u65e0\u6cd5\u51c6\u786e\u9884\u6d4b\u4eba\u7c7b\u7684\u51b3\u5b9a\u3002LLM\u4ee3\u7406\u65e0\u6cd5\u6355\u6349\u5230\u4eba\u7c7b\u51b3\u7b56\u8fc7\u7a0b\u7684\u5185\u90e8\u673a\u5236\uff0c\u5176\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u4e00\u81f4\u6027\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u7279\u5b9a\u7684\u6a21\u578b\u67b6\u6784\u548c\u63d0\u793a\u5f62\u5f0f\uff1b\u66f4\u7cdf\u7cd5\u7684\u662f\uff0c\u8fd9\u79cd\u4f9d\u8d56\u5e76\u4e0d\u9075\u5faa\u660e\u786e\u7684\u6a21\u5f0f\u3002|\n", "2410.23252": "|**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu 
et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6269\u5c55\u5230\u6267\u884c\u73b0\u5b9e\u4e16\u754c\u7684\u5e94\u7528\u7a0b\u5e8f\uff0c\u8d85\u8d8a\u4f20\u7edf\u7684NLP\u4efb\u52a1\uff0c\u8bc4\u4f30\u5176\u7a33\u5065\u6027\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7b49\u5173\u952e\u7ef4\u5ea6\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86CASA\uff0c\u8fd9\u662f\u4e00\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u4ee3\u7406\u5728\u4e24\u4e2a\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\uff08\u5728\u7ebf\u8d2d\u7269\u548c\u793e\u4ea4\u8ba8\u8bba\u8bba\u575b\uff09\u4e2d\u7684\u6587\u5316\u548c\u793e\u4f1a\u89c4\u8303\u654f\u611f\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bc4\u4f30\u4e86LLM\u4ee3\u7406\u68c0\u6d4b\u5e76\u9002\u5f53\u56de\u5e94\u8fdd\u53cd\u89c4\u8303\u7684\u7528\u6237\u67e5\u8be2\u548c\u89c2\u5bdf\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u6d4b\u91cf\u610f\u8bc6\u8986\u76d6\u7387\u3001\u7ba1\u7406\u7528\u6237\u67e5\u8be2\u7684\u5e2e\u52a9\u6027\u548c\u9762\u5bf9\u8bef\u5bfc\u6027\u7f51\u7edc\u5185\u5bb9\u65f6\u7684\u8fdd\u89c4\u7387\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u524d\u7684LLM\u5728\u975e\u4ee3\u7406\u73af\u5883\u4e2d\u7684\u8868\u73b0\u663e\u8457\u4f18\u4e8e\u57fa\u4e8e\u7f51\u7edc\u7684\u4ee3\u7406\u73af\u5883\uff0c\u5176\u4e2d\u4ee3\u7406\u7684\u610f\u8bc6\u8986\u76d6\u7387\u4e0d\u523010%\uff0c\u8fdd\u89c4\u7387\u8d85\u8fc740%\u3002\u4e3a\u4e86\u63d0\u9ad8\u6027\u80fd\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u63d0\u793a\u548c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u4e92\u8865\u2014\u2014\u5728\u7279\u5b9a\u6587\u
5316\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u663e\u8457\u589e\u5f3a\u4ee3\u7406\u5728\u4e0d\u540c\u5730\u533a\u4e4b\u95f4\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u800c\u63d0\u793a\u5219\u63d0\u5347\u4e86\u4ee3\u7406\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u80fd\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5728\u5f00\u53d1\u5468\u671f\u4e2d\u4e0d\u65ad\u5bf9LLM\u4ee3\u7406\u7684\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\u7684\u91cd\u8981\u6027\u3002|\n", "2410.22916": "|**2024-10-30**|**Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration**|Yanchu Guan et.al.|[2410.22916](http://arxiv.org/abs/2410.22916)|null|\u81ea\u4e3b\u79fb\u52a8\u5e94\u7528\u4ea4\u4e92\u5728\u79fb\u52a8\u5e94\u7528\u7a0b\u5e8f\u590d\u6742\u6027\u65e5\u76ca\u589e\u52a0\u7684\u80cc\u666f\u4e0b\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u5f00\u53d1\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u4e0e\u79fb\u52a8\u5e94\u7528\u4e92\u52a8\u7684\u667a\u80fd\u4ee3\u7406\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u89e3\u91ca\u7684\u884c\u4e3a\u514b\u9686\u5927\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\uff08EBC-LLMAgent\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u7ed3\u5408\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u884c\u4e3a\u514b\u9686\u6280\u672f\u6765\u5b66\u4e60\u6f14\u793a\uff0c\u4ece\u800c\u521b\u5efa\u7528\u4e8e\u81ea\u4e3b\u79fb\u52a8\u5e94\u7528\u4ea4\u4e92\u7684\u667a\u80fd\u4e14\u53ef\u89e3\u91ca\u7684\u4ee3\u7406\u3002EBC-LLMAgent 
\u5305\u62ec\u4e09\u4e2a\u6838\u5fc3\u6a21\u5757\uff1a\u6f14\u793a\u7f16\u7801\u3001\u4ee3\u7801\u751f\u6210\u548c\u7528\u6237\u754c\u9762\u6620\u5c04\uff0c\u8fd9\u4e9b\u6a21\u5757\u534f\u540c\u5de5\u4f5c\u4ee5\u6355\u6349\u7528\u6237\u6f14\u793a\u3001\u751f\u6210\u53ef\u6267\u884c\u4ee3\u7801\uff0c\u5e76\u5efa\u7acb\u4ee3\u7801\u4e0e\u7528\u6237\u754c\u9762\u5143\u7d20\u4e4b\u95f4\u7684\u51c6\u786e\u5bf9\u5e94\u5173\u7cfb\u3002\u6211\u4eec\u5f15\u5165\u4e86\u884c\u4e3a\u514b\u9686\u94fe\u878d\u5408\u6280\u672f\u6765\u589e\u5f3a\u4ee3\u7406\u7684\u6cdb\u5316\u80fd\u529b\u3002\u901a\u8fc7\u5bf9\u6765\u81ea\u4e0d\u540c\u9886\u57df\u7684\u4e94\u6b3e\u6d41\u884c\u79fb\u52a8\u5e94\u7528\u8fdb\u884c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u8bc1\u660e\u4e86 EBC-LLMAgent \u7684\u4f18\u8d8a\u6027\u80fd\uff0c\u5728\u4efb\u52a1\u5b8c\u6210\u65b9\u9762\u5177\u6709\u9ad8\u6210\u529f\u7387\u3001\u5bf9\u672a\u89c1\u8fc7\u573a\u666f\u7684\u9ad8\u6548\u6cdb\u5316\u4ee5\u53ca\u751f\u6210\u6709\u610f\u4e49\u7684\u89e3\u91ca\u3002|\n", "2410.22662": "|**2024-10-30**|**$\\textbf{EMOS}$: $\\textbf{E}$mbodiment-aware Heterogeneous $\\textbf{M}$ulti-robot $\\textbf{O}$perating $\\textbf{S}$ystem with LLM Agents**|Junting Chen et.al.|[2410.22662](http://arxiv.org/abs/2410.22662)|null|\u5f02\u6784\u591a\u673a\u5668\u4eba\u7cfb\u7edf\uff08HMRS\uff09\u5df2\u6210\u4e3a\u89e3\u51b3\u5355\u4e2a\u673a\u5668\u4eba\u65e0\u6cd5\u72ec\u7acb\u5904\u7406\u7684\u590d\u6742\u4efb\u52a1\u7684\u5f3a\u5927\u65b9\u6cd5\u3002\u5f53\u524d\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\uff08LLM-based 
MAS\uff09\u5728\u8f6f\u4ef6\u5f00\u53d1\u548c\u64cd\u4f5c\u7cfb\u7edf\u7b49\u9886\u57df\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5c06\u5176\u5e94\u7528\u4e8e\u673a\u5668\u4eba\u63a7\u5236\u5219\u9762\u4e34\u72ec\u7279\u7684\u6311\u6218\u3002\u7279\u522b\u662f\uff0c\u591a\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u6bcf\u4e2a\u4ee3\u7406\u7684\u80fd\u529b\u672c\u8d28\u4e0a\u4e0e\u673a\u5668\u4eba\u7684\u7269\u7406\u7ed3\u6784\u76f8\u5173\uff0c\u800c\u4e0d\u662f\u9884\u5b9a\u4e49\u7684\u89d2\u8272\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u667a\u80fd\u4f53\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5177\u6709\u4e0d\u540c\u5f62\u6001\u548c\u80fd\u529b\u7684\u5f02\u6784\u673a\u5668\u4eba\u4e4b\u95f4\u7684\u6709\u6548\u534f\u4f5c\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\u540d\u4e3aHabitat-MAS\u3002\u6211\u4eec\u7684\u4e00\u4e2a\u5173\u952e\u8bbe\u8ba1\u662f\u201c\u673a\u5668\u4eba\u7b80\u5386\u201d\uff1a\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u63d0\u793a\u7684\u65b9\u6cd5\uff0c\u800c\u975e\u91c7\u7528\u4eba\u4e3a\u8bbe\u8ba1\u7684\u89d2\u8272\u626e\u6f14\uff0c\u5373\u4ee3\u7406\u901a\u8fc7\u7406\u89e3\u673a\u5668\u4eba\u7684URDF\u6587\u4ef6\u5e76\u8c03\u7528\u673a\u5668\u4eba\u8fd0\u52a8\u5b66\u5de5\u5177\u6765\u751f\u6210\u63cf\u8ff0\u5176\u7269\u7406\u80fd\u529b\u7684\u6587\u6863\uff0c\u4ee5\u6307\u5bfc\u5176\u5728\u4efb\u52a1\u89c4\u5212\u548c\u52a8\u4f5c\u6267\u884c\u4e2d\u7684\u884c\u4e3a\u3002Habitat-MAS\u57fa\u51c6\u6d4b\u8bd5\u65e8\u5728\u8bc4\u4f30\u4e00\u4e2a\u591a\u667a\u80fd\u4f53\u6846\u67b6\u5982\u4f55\u5904\u7406\u9700\u8981\u4f53\u73b0\u611f\u77e5\u63a8\u7406\u7684\u4efb\u52a1\uff0c\u5305\u62ec1) \u64cd\u7eb5\u30012) \u611f\u77e5\u30013) \u5bfc\u822a\u4ee5\u53ca4) 
\u590d\u6742\u7684\u591a\u697c\u5c42\u7269\u4f53\u91cd\u6392\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u673a\u5668\u4eba\u7684\u7b80\u5386\u548c\u6211\u4eec\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u5206\u5c42\u8bbe\u8ba1\u5bf9\u4e8e\u5728\u8fd9\u79cd\u590d\u6742\u7684\u4efb\u52a1\u73af\u5883\u4e2d\u6709\u6548\u8fd0\u884c\u5f02\u6784\u591a\u673a\u5668\u4eba\u7cfb\u7edf\u81f3\u5173\u91cd\u8981\u3002|\n", "2410.22584": "|**2024-10-29**|**BENCHAGENTS: Automated Benchmark Creation with Agent Interaction**|Natasha Butt et.al.|[2410.22584](http://arxiv.org/abs/2410.22584)|null|\u8bc4\u4f30\u53d7\u5230\u57fa\u51c6\u6d4b\u8bd5\u53ef\u7528\u6027\u7684\u9650\u5236\u3002\u968f\u7740\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u9700\u8981\u521b\u5efa\u80fd\u591f\u8861\u91cf\u65b0\u751f\u6210\u80fd\u529b\u8fdb\u5c55\u7684\u57fa\u51c6\u6d4b\u8bd5\u3002\u7136\u800c\uff0c\u901a\u8fc7\u4eba\u5de5\u6ce8\u91ca\u521b\u5efa\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\u65e2\u7f13\u6162\u53c8\u6602\u8d35\uff0c\u8fd9\u9650\u5236\u4e86\u5bf9\u4efb\u4f55\u80fd\u529b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6211\u4eec\u5f15\u5165\u4e86BENCHAGENTS\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7cfb\u7edf\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u5316\u521b\u5efa\u590d\u6742\u80fd\u529b\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u540c\u65f6\u786e\u4fdd\u6570\u636e\u548c\u5ea6\u91cf\u7684\u8d28\u91cf\u3002BENCHAGENTS\u5c06\u57fa\u51c6\u6d4b\u8bd5\u521b\u5efa\u8fc7\u7a0b\u5206\u89e3\u4e3a\u89c4\u5212\u3001\u751f\u6210\u3001\u6570\u636e\u9a8c\u8bc1\u548c\u8bc4\u4f30\u56db\u4e2a\u6b65\u9aa4\uff0c\u6bcf\u4e2a\u6b65\u9aa4\u90fd\u7531LLM\u4ee3\u7406\u6267\u884c\u3002\u8fd9\u4e9b\u4ee3\u7406\u76f8\u4e92\u4ea4\u4e92\uff0c\u5e76\u5229\u7528\u57fa\u51c6\u6d4b\u8bd5\u5f00\u53d1\u8005\u7684\u4eba\u673a\u53cd\u9988\u6765\u663e\u5f0f\u6539\u8fdb\u548c\u7075\u6d3b\u63a7\u5236\u6570\u636e\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\u3002\u6211\u4eec\u4f7f\u7528BENCHAGENTS\u521b\u5efa\u7528\u4e8e\u8bc4\
u4f30\u6587\u672c\u751f\u6210\u8fc7\u7a0b\u4e2d\u89c4\u5212\u548c\u7ea6\u675f\u6ee1\u8db3\u80fd\u529b\u7684\u57fa\u51c6\u6d4b\u8bd5\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u8fd9\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u7814\u7a76\u4e03\u79cd\u6700\u5148\u8fdb\u7684\u6a21\u578b\uff0c\u5e76\u63d0\u53d6\u5173\u4e8e\u5e38\u89c1\u5931\u8d25\u6a21\u5f0f\u548c\u6a21\u578b\u5dee\u5f02\u7684\u65b0\u89c1\u89e3\u3002|\n", "2410.22552": "|**2024-10-29**|**Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents**|Jaekyeom Kim et.al.|[2410.22552](http://arxiv.org/abs/2410.22552)|null|\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Auto-Intent\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u5728\u4e0d\u76f4\u63a5\u8fdb\u884c\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u5c06\u9884\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u76ee\u6807\u9886\u57df\u4ee3\u7406\u7684\u65b9\u6cd5\uff0c\u7279\u522b\u5173\u6ce8\u7f51\u9875\u5bfc\u822a\u4efb\u52a1\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u4ece\u76ee\u6807\u9886\u57df\u7684\u6f14\u793a\u4e2d\u65e0\u76d1\u7763\u5730\u53d1\u73b0\u6f5c\u5728\u7684\u610f\u56fe\uff0c\u4ee5\u9ad8\u5ea6\u7d27\u51d1\u7684\u5f62\u5f0f\uff08\u6700\u591a\u4e09\u4e2a\u8bcd\uff09\u3002\u901a\u8fc7\u63d0\u53d6\u7684\u610f\u56fe\uff0c\u6211\u4eec\u8bad\u7ec3\u610f\u56fe\u9884\u6d4b\u5668\u6765\u6839\u636e\u4ee3\u7406\u8fc7\u53bb\u7684\u89c2\u5bdf\u548c\u884c\u4e3a\u9884\u6d4b\u4e0b\u4e00\u4e2a\u610f\u56fe\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u63a2\u7d22\u65b9\u6cd5\uff0c\u5176\u4e2d\u6982\u7387\u6700\u9ad8\u7684\u524dk\u4e2a\u610f\u56fe\u9884\u6d4b\u88ab\u7528\u4f5c\u63d0\u793a\u63d0\u4f9b\u7ed9\u9884\u8bad\u7ec3\u7684LLM\u4ee3\u7406\uff0c\u4ece\u800c\u589e\u5f3a\u5176\u51b3\u7b56\u80fd\u529b\u3002Auto-Intent\u663e\u8457\u63d0\u9ad8\u4e86GPT-3.5\u3001GPT-4\u548cLlama-3.1-70B\u3001Llama-3.1-405B\u4ee3\u7406\u5728\u5927\u89c4\u6a21\u771f\u
5b9e\u7f51\u7ad9\u5bfc\u822a\u57fa\u51c6\uff08\u6765\u81eaMind2Web\uff09\u548c\u5728\u7ebf\u5bfc\u822a\u4efb\u52a1\uff08\u6765\u81eaWebArena\uff09\u4e0a\u7684\u6027\u80fd\uff0c\u5e76\u4e14\u5176\u8de8\u57fa\u51c6\u7684\u6cdb\u5316\u80fd\u529b\u4e5f\u5f97\u5230\u4e86\u9a8c\u8bc1\u3002|\n", "2410.23555": "|**2024-10-31**|**From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents**|Nalin Tiwary et.al.|[2410.23555](http://arxiv.org/abs/2410.23555)|null|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6846\u67b6\u5df2\u7ecf\u6269\u5c55\u5230\u590d\u6742\u7684\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\uff0c\u5982\u4ea4\u4e92\u5f0f\u7f51\u9875\u5bfc\u822a\u3002\u8fd9\u4e9b\u7cfb\u7edf\u901a\u8fc7\u7528\u6237\u547d\u4ee4\u9a71\u52a8\uff0c\u4f7f\u7528\u6d4f\u89c8\u5668\u5b8c\u6210\u4efb\u52a1\uff0c\u5e76\u901a\u8fc7\u591a\u8f6e\u5bf9\u8bdd\u63d0\u4f9b\u670d\u52a1\uff0c\u65e2\u5e26\u6765\u4e86\u521b\u65b0\u673a\u9047\u4e5f\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5f15\u5165\u4e86\u7528\u4e8e\u4f1a\u8bdd\u7f51\u9875\u5bfc\u822a\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4f46\u5bf9\u4e8e\u5f71\u54cd\u8fd9\u4e9b\u4ee3\u7406\u6027\u80fd\u7684\u5173\u952e\u4e0a\u4e0b\u6587\u7ec4\u4ef6\u7684\u8be6\u7ec6\u7406\u89e3\u4ecd\u7136\u4e0d\u8db3\u3002\u672c\u7814\u7a76\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u901a\u8fc7\u5206\u6790\u7f51\u9875\u5bfc\u822a\u4ee3\u7406\u529f\u80fd\u6240\u9700\u7684\u591a\u79cd\u5173\u952e\u4e0a\u4e0b\u6587\u8981\u7d20\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u6211\u4eec\u7814\u7a76\u4e86\u4e0a\u4e0b\u6587\u7ba1\u7406\u7684\u4f18\u5316\uff0c\u91cd\u70b9\u5173\u6ce8\u4ea4\u4e92\u5386\u53f2\u548c\u7f51\u9875\u8868\u793a\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u7a81\u663e\u4e86\u5728\u5206\u5e03\u5916\u573a\u666f\u4e0b\uff08\u5305\u62ec\u672a\u89c1\u8fc7\u7684\u7f51\u7ad9\u
3001\u7c7b\u522b\u548c\u5730\u533a\uff09\u901a\u8fc7\u6709\u6548\u7684\u4e0a\u4e0b\u6587\u7ba1\u7406\u63d0\u9ad8\u4e86\u4ee3\u7406\u7684\u6027\u80fd\u3002\u8fd9\u4e9b\u53d1\u73b0\u4e3aLLM\u4ee3\u7406\u7684\u8bbe\u8ba1\u548c\u4f18\u5316\u63d0\u4f9b\u4e86\u89c1\u89e3\uff0c\u4f7f\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u7f51\u9875\u5bfc\u822a\u66f4\u52a0\u51c6\u786e\u548c\u6709\u6548\u3002|\n"}, "llm": {"2405.10311": "|**2024-05-16**|**UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models**|Sahel Sharifymoghaddam et.al.|[2405.10311](http://arxiv.org/abs/2405.10311)|null|\u8fd1\u671f\uff0c\u591a\u6a21\u6001\uff08MM\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u89e3\u9501\u4e86\u8bb8\u591a\u9700\u8981\u591a\u6a21\u6001\u7406\u89e3\uff08\u5982\u56fe\u50cf\u63cf\u8ff0\u6216\u89c6\u89c9\u95ee\u7b54\uff09\u548c\u751f\u6210\uff08\u5982\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u751f\u6210\u6216\u7f16\u8f91\uff09\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347MM-LLMs\u7684\u8f93\u51fa\u8d28\u91cf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684UniRAG\u6280\u672f\uff0c\u5b83\u5728\u63a8\u7406\u9636\u6bb5\u5c06\u76f8\u5173\u68c0\u7d22\u4fe1\u606f\u6dfb\u52a0\u5230\u63d0\u793a\u4e2d\uff0c\u4f5c\u4e3a\u5c11\u91cf\u6837\u4f8b\u3002\u4e0e\u666e\u904d\u8ba4\u4e3a\u68c0\u7d22\u589e\u5f3a\uff08RA\uff09\u4e3b\u8981\u6539\u8fdb\u7f55\u89c1\u5b9e\u4f53\u7684\u751f\u6210\u6216\u7406\u89e3\u4e0d\u540c\uff0c\u6211\u4eec\u5728MSCOCO\u6570\u636e\u96c6\u4e0a\u5bf9\u5305\u62ecGPT4\u3001Gemini-Pro\u5728\u5185\u7684\u4e13\u6709\u6a21\u578b\u4ee5\u53caLlava\u3001LaVIT\u548cEmu2\u7b49\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8f93\u5165\u63d0\u793a\u901a\u8fc7MM\u68c0\u7d22\u5668\uff08\u5982UniIR\u6a21\u578b\uff09\u589e\u5f3a\u540e\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u8d28\u91cf\u3002|
\n", "2405.10305": "|**2024-05-16**|**4D Panoptic Scene Graph Generation**|Jingkang Yang et.al.|[2405.10305](http://arxiv.org/abs/2405.10305)|**[link](https://github.com/jingkang50/psg4d)**|**\u6211\u4eec\u751f\u6d3b\u5728\u4e00\u4e2a\u4e09\u7ef4\u7a7a\u95f4\u4e2d\uff0c\u540c\u65f6\u901a\u8fc7\u7b2c\u56db\u7ef4\u65f6\u95f4\u5411\u524d\u63a8\u8fdb\u3002\u4e3a\u4e86\u4f7f\u4eba\u5de5\u667a\u80fd\u80fd\u591f\u5168\u9762\u7406\u89e3\u8fd9\u79cd4D\u73af\u5883\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8868\u793a\u5f62\u5f0f\u2014\u20144D\u5168\u666f\u573a\u666f\u56fe\uff08PSG-4D\uff09\uff0c\u5b83\u5c06\u52a8\u60014D\u4e16\u754c\u4e2d\u7684\u539f\u59cb\u89c6\u89c9\u6570\u636e\u62bd\u8c61\u4e3a\u8282\u70b9\u548c\u8fb9\uff0c\u8282\u70b9\u4ee3\u8868\u5177\u6709\u7cbe\u786e\u4f4d\u7f6e\u548c\u72b6\u6001\u4fe1\u606f\u7684\u5b9e\u4f53\uff0c\u8fb9\u6355\u6349\u65f6\u95f4\u5173\u7cfb\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5728\u8fd9\u4e00\u65b0\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u6ce8\u91caPSG-4D\u6570\u636e\u96c6\uff0c\u5305\u542b3000\u4e2aRGB-D\u89c6\u9891\uff0c\u603b\u8ba1100\u4e07\u5e27\uff0c\u6bcf\u5e27\u90fd\u5e26\u67094D\u5168\u666f\u5206\u5272\u63a9\u7801\u4ee5\u53ca\u8be6\u7ec6\u7684\u52a8\u6001\u573a\u666f\u56fe\u6807\u7b7e\u3002\u6211\u4eec\u4e3a\u6b64\u4efb\u52a1\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSG4DFormer\u7684Transformer\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u5168\u666f\u5206\u5272\u63a9\u7801\uff0c\u6cbf\u65f6\u95f4\u8f74\u8ddf\u8e2a\u63a9\u7801\uff0c\u5e76\u901a\u8fc7\u5173\u7cfb\u7ec4\u4ef6\u751f\u6210\u76f8\u5e94\u7684\u573a\u666f\u56fe\u3002\u5728\u65b0\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u672a\u6765\u7684PSG-4D\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5c06\u5927\u578b\u8bed\u8a00\u
6a21\u578b\u878d\u5165\u6211\u4eec\u7684PSG-4D\u7cfb\u7edf\u6765\u5b9e\u73b0\u52a8\u6001\u573a\u666f\u7406\u89e3\u7684\u4e00\u4e2a\u5b9e\u9645\u5e94\u7528\u793a\u4f8b\u3002**|\n", "2405.10299": "|**2024-05-16**|**HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models**|Rhea Sanjay Sukthanker et.al.|[2405.10299](http://arxiv.org/abs/2405.10299)|**[link](https://github.com/automl/hw-aware-llm-bench)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u786c\u4ef6\u6307\u6807\uff08\u5982\u5ef6\u8fdf\u3001\u80fd\u8017\u3001GPU\u5185\u5b58\u4f7f\u7528\u548c\u6027\u80fd\uff09\u4e4b\u95f4\u7684\u6743\u8861\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u4eba\u4eec\u6b63\u5728\u5bfb\u6c42\u4e3a\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u914d\u7f6e\u5efa\u7acb\u5e15\u7d2f\u6258\u524d\u6cbf\uff0c\u4ee5\u5728\u6307\u5b9a\u786c\u4ef6\u9650\u5236\u4e0b\u627e\u5230\u6700\u4f18\u6a21\u578b\u3002\u7136\u800c\uff0c\u5bf9\u591a\u79cd\u67b6\u6784\u5728\u591a\u53f0\u8bbe\u5907\u4e0a\u7684\u5168\u9762\u8bad\u7ec3\u548c\u8bc4\u4f30\u5728\u8ba1\u7b97\u4e0a\u662f\u4e0d\u53ef\u884c\u7684\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HW-GPT-Bench\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u786c\u4ef6\u611f\u77e5\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u57fa\u51c6\uff0c\u5229\u7528\u795e\u7ecf\u67b6\u6784\u641c\u7d22\uff08NAS\uff09\u4e2d\u7684\u6743\u91cd\u5171\u4eab\u6280\u672f\uff0c\u5728\u4e00\u4e2a\u6a21\u578b\u4e2d\u9ad8\u6548\u5730\u8bad\u7ec3\u5305\u542b\u4e0d\u540c\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u8d85\u7f51\u7edc\u3002\u6211\u4eec\u572813\u79cd\u8bbe\u5907\u4e0a\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u6027\u80fd\u5256\u6790\uff0c\u8003\u8651\u4e865\u79cd\u786c\u4ef6\u6307\u6807\u548c3\u79cd\u4e0d\u540c\u7684\u6a21\u578b\u89c4\u6a21\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc78\u79cd\u4e0d\u540c\u7684\u591a\u76ee\u6807NAS\u7b97\u6cd5\u5c55\u793a\u4e86HW-GPT-Bench\u7684\u53ef\u7528\u6027\uff0c\u5e76\u8bc4\u4f
30\u4e86\u7531\u6b64\u4ea7\u751f\u7684\u5e15\u7d2f\u6258\u524d\u6cbf\u7684\u8d28\u91cf\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u63a8\u52a8\u548c\u52a0\u901f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u76ee\u6807\u65b9\u6cd5\uff0c\u5982NAS\u548c\u7ed3\u6784\u5316\u526a\u679d\u7684\u7814\u7a76\u3002**|\n", "2405.10288": "|**2024-05-16**|**Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction**|Jianhao Chen et.al.|[2405.10288](http://arxiv.org/abs/2405.10288)|**[link](https://github.com/jianhaochen-nju/tsdre)**|\u4e8b\u5b9e\u62bd\u53d6\u5bf9\u4e8e\u6784\u5efa\u77e5\u8bc6\u56fe\u8c31\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5bf9\u65f6\u95f4\u76f8\u5173\u4e8b\u5b9e\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u9700\u6c42\u589e\u957f\uff0c\u51fa\u73b0\u4e86\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u4efb\u52a1\u3002\u672c\u6587\u7279\u522b\u5173\u6ce8\u4ece\u81ea\u7136\u8bed\u8a00\u6587\u672c\u4e2d\u63d0\u53d6\u65f6\u95f4\u6027\u4e8b\u5b9e\u3002\u5148\u524d\u7684\u7814\u7a76\u672a\u80fd\u59a5\u5584\u5904\u7406\u590d\u6742\u53e5\u5b50\u4e2d\u65f6\u95f4\u4e0e\u4e8b\u5b9e\u5bf9\u5e94\u5173\u7cfb\u7684\u5efa\u7acb\u96be\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u65f6\u95f4\u7ebf\u7684\u53e5\u5b50\u5206\u89e3\u7b56\u7565\uff0c\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u4e8b\u5b9e\u76f8\u5173\u65f6\u95f4\u7ebf\u7684\u7cbe\u7ec6\u7406\u89e3\u3002\u7136\u800c\uff0c\u76f4\u63a5\u4f7f\u7528LLMs\u8fdb\u884c\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u6027\u80fd\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86TSDRE\u65b9\u6cd5\uff0c\u5c06LLMs\u7684\u5206\u89e3\u80fd\u529b\u878d\u5165\u5230\u5c0f\u578b\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u4f20\u7edf\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002 
\u4e3a\u4e86\u652f\u6301\u8bc4\u4f30\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u590d\u6742\u7684\u65f6\u5e8f\u4e8b\u5b9e\u62bd\u53d6\u6570\u636e\u96c6ComplexTRED\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTSDRE\u5728HyperRED-Temporal\u548cComplexTRED\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002|\n", "2405.10276": "|**2024-05-16**|**Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers**|Tuo Zhang et.al.|[2405.10276](http://arxiv.org/abs/2405.10276)|null|\u8fd1\u5e74\u6765\uff0c\u8bb8\u591a\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u7b56\u7565\u6027\u63d0\u793a\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6548\u80fd\u3002\u7279\u522b\u662f\u4f18\u5316\u901a\u8fc7prompting\uff08OPRO\uff09\u65b9\u6cd5\u8868\u73b0\u51fa\u9876\u5c16\u6027\u80fd\uff0c\u5b83\u5229\u7528LLMs\u4f5c\u4e3a\u4f18\u5316\u5668\uff0c\u76ee\u6807\u662f\u5bfb\u627e\u80fd\u6700\u5927\u5316\u4efb\u52a1\u51c6\u786e\u6027\u7684\u6307\u4ee4\u3002\u672c\u8bba\u6587\u91cd\u65b0\u5ba1\u89c6\u4e86OPRO\u5728\u5c0f\u578bLLMs\uff08\u5982LLaMa-2\u7cfb\u5217\u548cMistral 7B\uff09\u4e0a\u7684\u81ea\u52a8\u5316\u63d0\u793a\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u4e8e\u5c0f\u578bLLMs\uff0cOPRO\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u5176\u6709\u9650\u7684\u63a8\u7406\u80fd\u529b\u9650\u5236\u4e86\u4f18\u5316\u6f5c\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u672a\u6765\u7684\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u5e94\u540c\u65f6\u8003\u8651\u6a21\u578b\u80fd\u529b\u548c\u8ba1\u7b97\u6210\u672c\u3002\u9488\u5bf9\u5c0f\u578bLLMs\uff0c\u6211\u4eec\u63a8\u8350\u76f4\u63a5\u63d0\u4f9b\u660e\u786e\u9610\u8ff0\u76ee\u6807\u548c\u65b9\u6cd5\u7684\u6307\u4ee4\uff0c\u4f5c\u4e3a\u7a33\u5065\u7684\u63d0\u793a\u57fa\u7ebf\uff0c\u4ee5\u786e\u4fdd\u5728\u5f53\u524d\u7814\u7a76\u4e2d\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u63d0\u793a\u8bbe\u8ba1\u3002|\n", "2405.10260": 
"|**2024-05-16**|**Keep It Private: Unsupervised Privatization of Online Text**|Calvin Bao et.al.|[2405.10260](http://arxiv.org/abs/2405.10260)|**[link](https://github.com/csbao/kip-privatization)**|**\u4f5c\u8005\u8eab\u4efd\u6df7\u6dc6\u6280\u672f\u6709\u671b\u901a\u8fc7\u81ea\u52a8\u91cd\u5199\u6587\u672c\u6765\u4fdd\u62a4\u7f51\u7edc\u901a\u4fe1\u4e2d\u7684\u4e2a\u4eba\u9690\u79c1\u3002\u7136\u800c\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6587\u732e\u4e2d\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u8bc4\u4f30\u5927\u591a\u5c40\u9650\u5728\u72ed\u5c0f\u573a\u666f\u4e0b\uff0c\u4e3b\u8981\u4f9d\u8d56\u4e8e\u8868\u9762\u7684\u7f16\u8f91\u64cd\u4f5c\uff0c\u53ef\u80fd\u5bfc\u81f4\u8f93\u51fa\u4e0d\u81ea\u7136\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u6587\u672c\u79c1\u5bc6\u5316\u6846\u67b6\uff0c\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u751f\u6210\u517c\u987e\u51c6\u786e\u3001\u8fde\u8d2f\u548c\u9690\u79c1\u7684\u91cd\u5199\u3002\u6211\u4eec\u5728\u5927\u89c4\u6a21\u7684\u82f1\u8bedReddit\u5e16\u5b50\u6d4b\u8bd5\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u8be5\u6570\u636e\u96c6\u753168,000\u540d\u4f5c\u8005\u64b0\u5199\uff0c\u5305\u542b\u77ed\u5230\u4e2d\u7b49\u957f\u5ea6\u7684\u6587\u672c\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5728\u4e0d\u540c\u8bc4\u4f30\u6761\u4ef6\u4e0b\uff0c\u5982\u4f5c\u8005\u7b80\u4ecb\u957f\u5ea6\u548c\u4f5c\u8005\u8bc6\u522b\u7b56\u7565\uff0c\u6027\u80fd\u7684\u53d8\u5316\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u81ea\u52a8\u5316\u6307\u6807\u548c\u4eba\u5de5\u8bc4\u4f30\u4e2d\u4fdd\u6301\u9ad8\u6587\u672c\u8d28\u91cf\uff0c\u5e76\u6210\u529f\u5730\u89c4\u907f\u4e86\u51e0\u79cd\u81ea\u52a8\u4f5c\u8005\u8bc6\u522b\u653b\u51fb\u3002**|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng 
Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u4f53\u5728\u7a7a\u95f4\u7406\u89e3\u4e0e\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u8986\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u7ed3\u5408\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u663e\u793a\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u4e5f\u6307\u51fa\u4e86\u6316\u63983D-LLMs\u5168\u90e8\u6f5c\u529b\u6240\u9700\u7684\u521b\u65b0\u65b9\u6cd5\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63d0\u4f9b\u6
307\u5bfc\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u8c03\u67e5\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251](http://arxiv.org/abs/2405.10251)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u8bc4\u4f30\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u7b49\u65b9\u9762\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u636e\u6211\u4eec\u6240\u77e5\uff0c\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u6df1\u5165\u7814\u7a76\uff0c\u8fd9\u662f\u8861\u91cf\u6a21\u578b\u4f18\u79c0\u7a0b\u5ea6\u7684\u5173\u952e\u6807\u51c6\u3002\u56e0\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u5168\u9762\u8bc4\u4f30\u77e5\u540d\u4e14\u6027\u80fd\u51fa\u8272\u7684LLMs\uff0c\u5305\u62ecChatGPT\u3001ChatGLM\u3001\u57fa\u4e8eT5\u7684\u6a21\u578b\u3001\u57fa\u4e8eLLaMA\u7684\u6a21\u578b\u548cPythia\u6a21\u578b\uff0c\u5728\u5bf9\u8bdd\u751f\u6210\u548c\u6587\u672c\u603b\u7ed3\u7b49NLG\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u6db5\u76d6\u82f1\u8bed\u548c\u4e2d\u6587\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5171\u540c\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5305\u62ec\u8f93\u5165\u6a21\u677f\u548c\u540e\u5904\u7406\u7b56\u7565\u3002\u7814\u7a76\u7ed3\u679c\u62a5\u544a\u4e86\u81ea\u52a8\u8bc4\u5206\uff0c\u540c\u65f6\u8fdb\u884c\u4e86\u8be6\u7ec6\u5206\u6790\u3002|\n", "2405.10250": "|**2024-05-16**|**IntelliExplain: Enhancing Interactive Code 
Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|Large language models (LLMs) show great potential for automatically generating executable code from natural language descriptions, particularly through interactive features that let users guide the model with iterative feedback. However, current interaction paradigms often assume that users have the expertise to debug source code and are therefore unfriendly to non-professional programmers, which makes it challenging to make interactive code generation accessible to people with different levels of programming skill. To address this, we propose IntelliExplain, a novel human-computer interaction paradigm that improves the experience of non-professional programmers by letting them interact with source code through natural language explanations. Users guide the system to revise the code by giving corrective feedback, in natural language, on the errors they identify, until they are satisfied with the system's explanation of the code. Our user study shows that users of IntelliExplain achieve success rates 11.6% and 25.3% higher than with vanilla GPT-3.5 on Text-to-SQL and Python code generation tasks, respectively, while needing 39.0% and 15.6% less time, respectively.|\n", "2405.10212": 
"|**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|**[link](https://github.com/CAS-SIAT-XinHai/CPsyExam)**|In this paper, we introduce CPsyExam, a novel psychology benchmark built from questions drawn from Chinese-language examinations. CPsyExam is designed to emphasize psychological knowledge and case analysis separately, recognizing the value of applying psychological knowledge to real-world situations. From a pool of 22,000 questions, we select 4,000 to build the benchmark, ensuring balanced coverage of subjects and a variety of case-analysis techniques. We also evaluate a range of existing large language models (LLMs), including open-source and API-based models. Our experiments and analysis show that CPsyExam is an effective benchmark for establishing how well language models understand psychology, and that it supports comparing these models at different levels of granularity.|\n", "2405.10936": "|**2024-05-17**|**A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers**|Kaiyu Huang 
et.al.|[2405.10936](http://arxiv.org/abs/2405.10936)|**[link](https://github.com/kaiyuhwang/mllm-survey)**|**With the rapid development of large language models (LLMs), their remarkable multilingual capabilities in natural language processing have drawn wide attention from both academia and industry. Developing multilingual technology is crucial for reducing potential discrimination and improving the generality and accessibility of the technology. Despite the breakthroughs of LLMs, in-depth research on multilingual scenarios remains insufficient, so a comprehensive survey is urgently needed to summarize recent approaches, progress, limitations, and possible solutions. This paper examines the use of LLMs in multilingual settings from multiple perspectives. We first review the historical evolution of research on pre-trained language models. We then discuss the multilingual aspects of LLMs, including training and inference methods, model security, cross-domain and cultural adaptation, and dataset usage. We also analyze the challenges in these areas and propose possible solutions. In addition, we point out future research directions for further improving the multilingual performance of LLMs. This survey aims to help the research community address multilingual problems and to provide a comprehensive understanding of the core concepts, key techniques, and latest developments in LLM-based multilingual natural language processing.**|\n", "2405.10928": "|**2024-05-17**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928](http://arxiv.org/abs/2405.10928)|**[link](https://github.com/apolloresearch/rib)**|Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering them. However, existing methods struggle to interpret network activations because there is no decomposition of the activations under which individual neurons or model components correspond cleanly to distinct features or functions. To address this, we propose a novel interpretability method, the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. The method discards meaningless activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features according to their importance for downstream computation, producing a graph that shows all computationally relevant features and interactions in the model. 
We evaluate LIB on modular addition and CIFAR-10 models and find that, compared with principal component analysis, it identifies more computationally relevant features that interact more sparsely. When applied to language models, however, LIB does not substantially improve interpretability or interaction sparsity. We therefore conclude that although LIB is a promising theory-driven approach, it is not suitable for large language models in its current form.|\n", "2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. Panagoulias et.al.|[2405.10893](http://arxiv.org/abs/2405.10893)|null|This technical paper presents COGNET-MD, a new benchmark designed for evaluating large language models in the medical domain. We propose a scoring framework for assessing the ability of language models to interpret medical text, together with a database of multiple-choice questions (MCQs) of graded difficulty. The database was created in collaboration with experts from several medical fields to reflect current medical trends and to ensure safety, usefulness, and applicability. The initial version contains questions from psychiatry, dentistry, pulmonology, dermatology, and endocrinology, and will be expanded continually to include more medical disciplines.|\n", "2405.10883": "|**2024-05-17**|**Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review**|Hongyi Yang et.al.|[2405.10883](http://arxiv.org/abs/2405.10883)|null|This review systematically assesses the current status and prospects of artificial intelligence (AI) in the rehabilitation management of patients with schizophrenia, and its impact on the rehabilitation process. We selected 70 studies published from 2012 to the present, focusing on the application of techniques such as machine learning, deep learning, and reinforcement learning to mental health interventions and management, and on the technology categories, products, and data types involved, such as ecological momentary assessment and the analysis of behavioral and speech data. The results show that AI has broad application potential in symptom monitoring, relapse risk prediction, and rehabilitation treatment. The review also discusses the potential challenges and future directions of emerging AI-based products, technologies, and analytical methods in rehabilitation, such as social media analysis, serious games, and large language models. Overall, this paper systematically reviews the applications of AI in schizophrenia rehabilitation management and offers valuable insights and recommendations for future research.|\n", "2405.10853": "|**2024-05-17**|**The Future of Large Language Model Pre-training is Federated**|Lorenzo Sani et.al.|[2405.10853](http://arxiv.org/abs/2405.10853)|null|Generative pre-trained large language models (LLMs) have attracted great attention for their strong performance across many tasks, which they owe to the massive amounts of data they are trained on. According to established scaling laws, future gains in LLM performance depend largely on the amount of compute and data we can harness. Federated learning (FL) has the potential to unlock most of the world's underused data and compute, which current data-center-centric LLM training approaches overlook. This paper presents a robust, flexible, and reproducible FL approach that enables large-scale collaboration across institutions to train LLMs jointly, mobilizing more compute and data resources and potentially matching or exceeding centralized performance. 
Our work demonstrates an FL training method that scales to a billion-scale federated LLM with limited resources, so that data-rich entities can become leading players in LLM pre-training rather than leaving the stage solely to compute-rich institutions. The approach highlights the scaling benefits of federated training and offers a practical path toward this goal.|\n", "2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825](http://arxiv.org/abs/2405.10825)|null|Large language models (LLMs) have attracted great attention for their outstanding comprehension and reasoning capabilities and have driven significant progress in many fields, showing the potential of artificial general intelligence (AGI), particularly in connection with sixth-generation (6G) communication technology. This study aims to give a comprehensive overview of LLM-enabled telecom networks. We first present the fundamentals of LLMs, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and deployment in telecom. We then discuss key LLM-enabled techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Generation applications include the automatic generation of telecom domain knowledge, code, and network configurations. LLM-based classification tasks cover network security, text, image, and traffic classification. We also introduce automated optimization techniques that use LLMs, such as reward function design for reinforcement learning and verbal reinforcement learning. For prediction problems, LLMs can be used for time-series prediction and multi-modal telecom prediction. Finally, we identify the challenges facing LLM-enabled telecom networks and outline future research directions.|\n", "2405.10808": "|**2024-05-17**|**ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios**|Markus Bayer et.al.|[2405.10808](http://arxiv.org/abs/2405.10808)|null|Active learning aims to reduce annotation effort by prioritizing the instances that most improve learning. However, many active learning strategies suffer from a 'cold start' problem: they need a substantial amount of data up front before they become effective, which limits their usefulness for pre-trained models such as BERT that already perform well in few-shot settings. To address this, we propose ActiveLLM, a novel active learning approach that uses large language models such as GPT-4, Llama 3, and Mistral Large to select instances. Experiments show that ActiveLLM significantly improves the performance of BERT classifiers in few-shot scenarios, outperforming both traditional active learning methods and few-shot learning methods such as SetFit. ActiveLLM can also be extended to non-few-shot scenarios and supports iterative selection, which helps other active learning strategies overcome the cold-start problem. The results suggest that ActiveLLM is a promising solution for improving model performance across different learning setups.|\n", "2405.10745": "|**2024-05-17**|**Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings**|Albert Sawczyn et.al.|[2405.10745](http://arxiv.org/abs/2405.10745)|null|
Knowledge-intensive tasks pose a serious challenge for machine learning (ML) techniques. Commonly adopted methods, such as large language models (LLMs), often show limitations on such tasks. Efforts have been made to mitigate these shortcomings with knowledge graphs (KGs), in particular by combining small-scale domain-specific KGs with general-purpose KGs. Although KGs have advantages for knowledge representation, the cost of building them can hinder broad research and application. We therefore propose a framework that improves the learning of embeddings for small domain-specific KGs by linking them to a large general-purpose KG. Experiments show substantial gains, for example an improvement of up to 44% in the Hits@10 metric. This relatively unexplored research direction could encourage more frequent use of KGs in knowledge-intensive tasks, leading to more robust and reliable ML solutions than popular but error-prone LLM approaches. Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning|\n", "2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language 
Models: A Survey**|Yizhang Jin et.al.|[2405.10739](http://arxiv.org/abs/2405.10739)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|**Over the past year, multimodal large language models (MLLMs) have shown remarkable performance on tasks such as visual question answering, visual understanding, and reasoning. However, their large size and high training and inference costs have limited their widespread use in academia and industry. Research on efficient, lightweight MLLMs therefore has great potential, especially in edge-computing scenarios. This survey provides a comprehensive and systematic review of the current state of efficient MLLMs. We outline the timeline of representative efficient models, summarize the state of research on efficient structures and strategies, and describe their practical applications. Finally, we discuss the limitations of current research on efficient MLLMs and outline promising future directions. For more information, see our GitHub repository: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.**|\n", "2405.10725": "|**2024-05-17**|**INDUS: Effective and Efficient Language Models for Scientific Applications**|Bishwaranjan Bhattacharjee 
et.al.|[2405.10725](http://arxiv.org/abs/2405.10725)|null|Large general-purpose language models perform well on natural language processing tasks. However, prior research has shown that training on domain-specific data can make models perform better on specialized tasks. Motivated by this, we developed INDUS, a suite of language models tailored to Earth science, biology, physics, heliophysics, planetary science, and astronomy. The models are trained on carefully curated scientific corpora and include: (1) an encoder trained with a domain-specific vocabulary and corpora to improve performance on natural language understanding tasks; (2) a contrastive-learning-based general text embedding model trained on datasets from multiple sources to improve information retrieval tasks; and (3) smaller versions of these models, created with knowledge distillation, for applications with latency or resource constraints. We also created three new scientific benchmark datasets, CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive question answering), and NASA-IR (information retrieval), to advance research in these multidisciplinary fields. Finally, experiments show that our models outperform both general-purpose encoders (such as RoBERTa) and existing domain-specific encoders (such as SciBERT) on the new tasks and on existing benchmark tasks in the relevant domains.|\n", "2405.12217": "|**2024-05-20**|**Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning**|Guanglin Zhou et.al.|[2405.12217](http://arxiv.org/abs/2405.12217)|**[link](https://github.com/jameszhou-gl/icl-distribution-shift)**|**Recent studies indicate that large multimodal models (LMMs) are highly robust to natural distribution shifts, often surpassing previous baselines. Domain-specific adaptation is still necessary, however, particularly in specialized areas such as healthcare. Because the enormous parameter space of LMMs makes fine-tuning impractical, this study explores in-context learning (ICL) as an effective way to improve the adaptability of LMMs. We find that the success of ICL depends heavily on the choice of demonstrations, as it does for large language models, but that this poses unique challenges for LMMs facing distribution shifts. We therefore evaluate an unsupervised ICL method, TopKNearestPR, which selects demonstrations through a nearest-example search based on feature similarity. The study reveals that the effectiveness of this method is limited by deficiencies of the vision encoder in distribution-shift scenarios. To address these issues, we propose InvariantSelectPR, a novel method that uses Class-conditioned Contrastive Invariance (CCI) to make pre-trained vision encoders more robust. CCI improves the encoder's ability to identify and retrieve the most informative examples by strengthening its discrimination between classes and ensuring invariance to domain-specific variations. The retrieved examples then help guide the LMM in adapting to new query samples, even under different distributions. Experiments show that InvariantSelectPR substantially improves the adaptability of LMMs, achieving accuracy gains of 34.2% and 16.9% over zero-shot performance in the 7-shot setting on the Camelyon17 and HAM10000 benchmark datasets, respectively.**|\n", "2405.12209": "|**2024-05-20**|**MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark**|Hongwei Liu 
et.al.|[2405.12209](http://arxiv.org/abs/2405.12209)|**[link](https://github.com/open-compass/mathbench)**|**Recent advances in large language models (LLMs) have brought significant progress in mathematics, yet traditional math benchmarks such as GSM8k are limited in how fully they can assess the mathematical abilities of these models. To fill this gap, we introduce MathBench, a new benchmark designed to rigorously evaluate the mathematical capabilities of large language models. MathBench spans a wide range of mathematical disciplines and provides a detailed evaluation of both theoretical understanding and practical problem-solving. It is organized into five stages, from basic arithmetic to college mathematics, and is structured to examine how well models understand knowledge at different depths. Each stage includes theoretical questions and application problems, measuring a model's mathematical proficiency and its ability to apply concepts in practical scenarios. MathBench aims to improve the evaluation of LLMs' mathematical abilities by giving a fine-grained view of their level of knowledge understanding and their problem-solving skills, and it supports a bilingual setting. The project is released at https://github.com/open-compass/MathBench.**|\n", "2405.12195": "|**2024-05-20**|**Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey**|Thiago S. 
Vaillant et.al.|[2405.12195](http://arxiv.org/abs/2405.12195)|**[link](https://github.com/gpt-impact/Paper-content)**|As large language models such as ChatGPT continue to develop, their strong natural language processing capabilities and wide range of applications have attracted broad attention. Although the convergence of artificial intelligence (AI) and software engineering (SE) is increasingly evident, research on how this convergence affects software development practices and perceptions remains scarce. To understand the impact and challenges of integrating AI-driven tools such as ChatGPT into the software development process, we surveyed 207 software developers. The survey covers the effects of ChatGPT on software quality, productivity, and developers' job satisfaction, and also explores their expectations for future uses of ChatGPT, their concerns about possible job displacement, and their views on regulatory measures.|\n", "2405.12174": "|**2024-05-20**|**CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models**|Haoxiang Shi 
et.al.|[2405.12174](http://arxiv.org/abs/2405.12174)|null|This paper introduces CT-Eval, a Chinese text-to-table dataset for measuring how well large language models perform the text-to-table task in a non-English setting. Existing text-to-table datasets are mostly English, and CT-Eval fills this gap. It uses a popular multidisciplinary Chinese online encyclopedia as its source and covers 28 domains to ensure data diversity. To reduce hallucination in the data, the authors first train a language model to identify and filter out samples with hallucination problems, and then manually annotate the errors in the validation and test sets. The final CT-Eval contains about 88,600 task samples. Using CT-Eval, the authors evaluate open-source and closed-source large language models (such as GPT-4) and find that in the zero-shot setting these models still fall significantly short of human judgment. After fine-tuning, open-source models improve substantially at text-to-table generation and outperform GPT-4 by a large margin. In short, CT-Eval is both a valuable tool for evaluating and understanding the Chinese text-to-table ability of existing large language models and a useful resource for improving their performance on this task.|\n", "2405.12163": "|**2024-05-20**|**Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging**|Xiaobo Liang et.al.|[2405.12163](http://arxiv.org/abs/2405.12163)|**[link](https://github.com/dropreg/fennec)**|**With the rapid development of large language models, they are increasingly applied to a wide range of real-world tasks, with the main goal of aligning with human intent. However, the complexity of understanding human intent makes it necessary to rely on time-consuming human evaluation. To alleviate this problem, we explore the trend of using open-source large language models as evaluators, particularly in the wake of GPT-4's popularity. We propose a framework named Fennec, for fine-grained evaluation and correction extended through branching and bridging. The branching operation decomposes the evaluation task into different dimensions and granularities, which reduces the difficulty of evaluation. The bridging operation merges diverse training datasets, which increases the variety of evaluation tasks. Experiments show that our 7B model outperforms larger open-source evaluation models in both consistency and agreement on a range of widely used benchmarks, and approaches the performance of GPT-4. We use the model's fine-grained correction capability to refine the responses of several models, and the results show that this refinement improves response quality, raising scores on MT-Bench by 1-2 points. Our code is open-sourced on GitHub at https://github.com/dropreg/Fennec.**|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|This paper explores how large language models (LLMs) can be used to translate problem definitions for cognitive systems. Ordinarily, a human must translate a problem description into a form the cognitive system can understand. The researchers show that LLMs can take a problem class defined in natural language and map it into a semi-formal specification, which existing reasoning and learning systems can then use to solve specific instances of that class. They design an LLM-driven cognitive task analyst agent that generates a problem-space definition from a task described in natural language. The LLM prompts are derived from the definition of problem spaces in the AI literature and from general problem-solving strategies (such as Polya's 'How to Solve It'). The cognitive system then uses these problem-space specifications, together with domain-general problem-solving strategies (such as search), to solve different instances of the problem class. This preliminary result suggests that, by removing the intermediary step of problem formulation, LLMs could accelerate cognitive systems research while preserving core capabilities such as robust inference and online learning.|\n",
"2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130](http://arxiv.org/abs/2405.12130)|**[link](https://github.com/kongds/mora)**|**Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that this mechanism may limit the ability of LLMs to learn and memorize new knowledge. Motivated by this, we propose a new method, MoRA, which uses a square matrix to achieve high-rank updating while keeping the same number of trainable parameters as LoRA. To do so, we introduce corresponding non-parameter operators that reduce the input dimension and increase the output dimension for the square matrix. These operators ensure that the weights can be merged back into the LLM, so our method can be deployed just like LoRA. We evaluate comprehensively on five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on the others.**|\n",
"2405.12119": "|**2024-05-20**|**Reindex-Then-Adapt:
Improving Large Language Models for Conversational Recommendation**|Zhankui He et.al.|[2405.12119](http://arxiv.org/abs/2405.12119)|null|Large language models (LLMs) are revolutionizing conversational recommender systems by indexing item content well, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge, which leads to suboptimal performance on the rapidly changing data distributions of conversational recommendation platforms, such as item popularity. In conversational recommendation, LLMs generate item titles autoregressively (as multiple tokens), which makes it difficult to obtain and control the recommendations over all items. We therefore propose the Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within the LLM and then adjusts the probability distribution over these single-token item titles. RTA combines the strength of LLMs in understanding complex queries with the ability of traditional recommender systems (RecSys) to control the distribution of recommended items effectively in conversational recommendation. Experimental results show that our framework improves accuracy metrics on three different conversational recommendation datasets and under two adaptation settings.|\n",
"2405.12107": "|**2024-05-20**|**Imp: Highly Capable Large Multimodal Models for Mobile Devices**|Zhenwei Shao et.al.|[2405.12107](http://arxiv.org/abs/2405.12107)|**[link](https://github.com/milvlg/imp)**|**Although large language models (LLMs) and large multimodal models (LMMs) show remarkable capabilities in open-world multimodal understanding, they are usually parameter-heavy and computation-intensive, which limits their use in resource-constrained settings. To address this, researchers have proposed a series of lightweight LMMs that aim to maximize capability at a constrained scale (for example, 3 billion parameters). However, most of these methods focus on only one or two aspects of the design space, and the key design choices that affect model capability have not yet been explored thoroughly. This paper systematically studies the design of lightweight LMMs, covering model architecture, training strategy, and training data. Based on our findings, we build Imp, a family of highly capable LMMs spanning the 2-billion to 4-billion parameter scales. Notably, our Imp-3B model consistently outperforms existing lightweight LMMs of similar size and even surpasses state-of-the-art LMMs at the 13-billion-parameter scale. With low-bit quantization and resolution reduction, the Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip with a high inference speed of about 13 tokens per second.**|\n",
"2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100](http://arxiv.org/abs/2405.12100)|null|Math word problem correction (MWPC) is a task dedicated to correcting reasoning errors made while solving mathematical problems. Building on advances in large language models (LLMs), this paper focuses on two points: (1) distinguishing mathematical reasoning from error correction; (2) exploring strategies to improve the error-correction ability of LLMs in mathematics in order to address the MWPC task. We observe that in real-time education, helping students recognize their mistakes matters more than simply providing the correct answer. However, current research tends to emphasize obtaining accurate answers to problems rather than correcting possible errors. We therefore adjust the research paradigm and show that improving mathematical reasoning ability is not equivalent to mastering error correction. We also propose a new method, diagnostic-oriented prompting (DOP), designed to help LLMs excel at error correction. Experimental results show that DOP performs outstandingly, highlighting its importance. We argue that in mathematics education, the need for an excellent corrector exceeds the need for a proficient reasoner. The code and data are available.|\n",
"2405.12981": "|**2024-05-21**|**Reducing Transformer Key-Value Cache Size with Cross-Layer Attention**|William Brandon
et.al.|[2405.12981](http://arxiv.org/abs/2405.12981)|null|Key-value (KV) caching is essential for accelerating decoding in transformer-based autoregressive large language models (LLMs). However, as sequence lengths and batch sizes grow, the memory needed to store the KV cache can become prohibitive. Since the invention of the transformer, two of the most effective strategies for reducing this memory have been Multi-Query Attention (MQA) and its generalization, Grouped-Query Attention (GQA). MQA and GQA let multiple query heads share a single key/value head, which greatly reduces the number of distinct key/value heads with only a small impact on accuracy. This paper shows how to take MQA one step further by also sharing key and value heads between adjacent layers, a design we call Cross-Layer Attention (CLA). Experiments show that CLA can reduce the size of the KV cache by a further 2x while maintaining nearly the same accuracy as the original MQA. We validate this in experiments training 1-billion- and 3-billion-parameter models from scratch; the results show that CLA provides a Pareto improvement over traditional MQA in the memory/accuracy trade-off, enabling inference with longer sequence lengths and larger batch sizes.|\n",
"2405.12961": "|**2024-05-21**|**Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale**|Shriram Chennakesavalu et.al.|[2405.12961](http://arxiv.org/abs/2405.12961)|**[link](https://github.com/rotskoff-group/llm-era)**|Searching chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack effective strategies for generating molecules with desired properties. This problem resembles the 'alignment' problem for large language models, although for many chemical tasks we have a well-defined reward function that is easy to evaluate. This paper introduces an algorithm called energy rank alignment (ERA), which uses an explicit reward function to construct a gradient-based objective for optimizing autoregressive policies. Theoretically, we show that the algorithm is closely related to Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), but its minimizer converges to an ideal Gibbs-Boltzmann distribution in which the reward plays the role of an energy function. The algorithm is also highly scalable, requires no reinforcement learning, and performs well relative to DPO when the number of preference observations per pairing is small. We apply the approach to align molecular transformers to generate molecules with externally specified properties and find that it searches robustly, exploring diverse parts of chemical space. Although our focus is chemical search, we also obtain excellent results on an AI-supervised task, indicating that the method is scalable and general.|\n",
"2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939](http://arxiv.org/abs/2405.12939)|**[link](https://github.com/yinzhangyue/AoR)**|Recent advances in chain-of-thought prompting have driven major breakthroughs for large language models (LLMs) on complex reasoning tasks. Current research improves LLM reasoning performance by sampling multiple reasoning chains and ensembling them by answer frequency. However, this approach fails when the correct answer is in the minority. We identify this as a key factor limiting the reasoning ability of LLMs, one that cannot be resolved from the predicted answers alone. To address it, we propose AoR (Aggregation of Reasoning), a hierarchical reasoning aggregation framework that selects answers based on an evaluation of the reasoning chains. AoR also introduces dynamic sampling, adjusting the number of reasoning chains to the complexity of the task. Experimental results on a series of complex reasoning tasks show that AoR outperforms mainstream ensemble methods. Further analysis shows that AoR not only adapts to a variety of LLMs but also reaches a higher performance ceiling than existing methods.|\n",
"2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel
et.al.|[2405.12933](http://arxiv.org/abs/2405.12933)|null|Large language models perform well on tasks such as summarization, arithmetic reasoning, and question answering. However, they face serious challenges in moral reasoning and ethical decision-making, especially in complex scenarios involving multiple stakeholders. This paper proposes the Skin-in-the-Game (SKIG) framework, which aims to improve moral reasoning in language models by examining the consequences of decisions from the perspectives of different stakeholders. Central to SKIG is simulating accountability for actions, which, together with empathy exercises and risk assessment, is pivotal to its effectiveness. We validate SKIG on a variety of moral reasoning benchmarks with proprietary and open-source language models, and examine its key components through extensive ablation analyses.|\n",
"2405.12929": "|**2024-05-21**|**Code-mixed Sentiment and Hate-speech Prediction**|Anjali Yadav et.al.|[2405.12929](http://arxiv.org/abs/2405.12929)|**[link](https://github.com/matejklemen/sentiment-hate-speech-with-code-mixed-models)**|Code-mixed discourse combines multiple languages in a single text. It is common in informal communication, particularly in countries with several official languages. As large language models have come to dominate natural language processing tasks, we investigate how they perform in code-mixed settings. First, we created four new bilingual pre-trained masked language models for English-Hindi and English-Slovene, specifically aimed at informal language. We then evaluated monolingual, bilingual, few-lingual, and massively multilingual models on tasks such as sentiment analysis and offensive language detection in social media texts. The results show that the most successful classifiers are bilingual and multilingual models specialized for social media texts, followed by non-specialized massively multilingual and monolingual models, while large generative models are not competitive. For affective problems, the models performed slightly better overall on code-mixed data than on non-code-mixed data.|\n",
"2405.12920": "|**2024-05-21**|**Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples**|Tim
Menzies et.al.|[2405.12920](http://arxiv.org/abs/2405.12920)|**[link](https://github.com/timm/ez)**|This paper proposes a new challenge problem for software analytics. In the process we call 'software review', a panel of SMEs (subject matter experts) reviews examples of software behavior in order to recommend how to improve the software's operation. Because SME time is usually very limited, ideally the panel can complete this optimization task after looking at only a small number of highly informative examples. To support this review process, we explore methods that train a predictive model to guess whether an expert will like or dislike the next example. Such a model can work with the SMEs to guide them through their exploration of all the examples and, after the experts leave, can serve as a proxy that handles new cases while the experts are busy elsewhere. In 31 case studies (ranging from high-level decisions about software processes to low-level decisions about how to configure video encoding software), we show that such predictive models can be built using as few as 12 to 30 labels. To the best of our knowledge, achieving this with so few examples (and without large language models) is so far rare. In keeping with the principles of open science, we make all our code and data available so that others can repeat, verify, or improve on these results.|\n",
"2405.12915": "|**2024-05-21**|**G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation**|Xingyuan Pan
et.al.|[2405.12915](http://arxiv.org/abs/2405.12915)|**[link](https://github.com/xypan0/G-DIG)**|Large language models (LLMs) have shown remarkable abilities in general scenarios, and instruction fine-tuning enables them to align with humans across a variety of tasks. However, the diversity and quality of instruction data remain two major challenges for instruction fine-tuning. To address them, this paper proposes a novel gradient-based method for automatically selecting high-quality and diverse instruction fine-tuning data for machine translation. Our key innovation is to analyze how individual training examples influence the model during training. Using influence functions together with a small, high-quality seed dataset, we select the training examples that have a beneficial influence on the model as high-quality data. In addition, to increase data diversity, we cluster the examples by their gradients and resample, maximizing the variety of influences they have on the model. Extensive experiments on the WMT22 and FLORES translation tasks demonstrate the superiority of our method, and in-depth analysis further confirms its effectiveness and generalization.|\n",
"2405.12914": "|**2024-05-21**|**An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation**|Zhiyu Tan
et.al.|[2405.12914](http://arxiv.org/abs/2405.12914)|**[link](https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion.github.io)**|One critical prerequisite for faithful text-to-image generation is an accurate understanding of the text input. Existing methods use the text encoder of the CLIP model to represent prompts. However, the pre-trained CLIP model can handle only English, and the capacity of its text encoder is relatively limited. In contrast, large language models (LLMs) support multilingual input, accommodate longer context, and provide better text representations. This paper investigates using LLMs as the text encoder to improve language understanding in text-to-image generation. However, training a text-to-image model with LLMs from scratch requires substantial computational resources and data. We therefore propose a three-stage training pipeline that integrates existing text-to-image models with LLMs effectively while keeping training efficient. Specifically, we design a lightweight adapter that enables fast training of the text-to-image model using text representations from LLMs. Extensive experiments show that our model supports multilingual input and longer context while also achieving strong image generation quality.|\n",
"2405.12910": "|**2024-05-21**|**Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment**|Holli Sargeant et.al.|[2405.12910](http://arxiv.org/abs/2405.12910)|**[link](https://github.com/AhmedIzzidien/TopicLLM)**|**This paper addresses an important gap in legal analysis by developing and applying a novel taxonomy for topic modelling summary judgment cases in the United Kingdom. Using a curated dataset of summary judgment cases, we use the large language model Claude 3 Opus to explore functional topics and trends. The results show that Claude 3 Opus classifies topics with an accuracy of 87.10%, revealing distinct patterns in how summary judgment is applied across different legal domains. Because UK case law is not originally labelled with keywords or a topic filtering option, this study not only deepens our understanding of the thematic underpinnings of summary judgment but also shows the potential of combining traditional and AI-driven approaches to classification. The paper thus provides a new general taxonomy for UK law. The implications of this work lay a foundation for further research and for methodological discussion in judicial administration and computational legal research.**|\n",
"2405.12900": "|**2024-05-21**|**Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents**|San Kim
et.al.|[2405.12900](http://arxiv.org/abs/2405.12900)|null|\u8fd1\u671f\uff0c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5404\u79cd\u6709\u6548\u7684\u8bad\u7ec3\u65b9\u6cd5\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7cfb\u7edf\u7684\u53d1\u5c55\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u6bd2\u6027\u95ee\u9898\u5bf9\u7528\u6237\u4f53\u9a8c\u6784\u6210\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8bad\u7ec3\u7b97\u6cd5\u2014\u2014\u5bf9\u6297\u5f0f\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08ADPO\uff09\uff0c\u5b83\u662f\u5728\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7684\u57fa\u7840\u4e0a\u6539\u8fdb\u7684\u3002ADPO\u65e8\u5728\u8bad\u7ec3\u6a21\u578b\u589e\u52a0\u5bf9\u4f18\u9009\u56de\u590d\u7684\u6982\u7387\u5206\u5e03\uff0c\u540c\u65f6\u964d\u4f4e\u5bf9\u4f7f\u7528\u6709\u6bd2\u63a7\u5236\u4ee4\u724c\u751f\u6210\u7684\u4e0d\u5b89\u5168\u56de\u590d\u7684\u6982\u7387\u3002\u7814\u7a76\u663e\u793a\uff0cADPO\u80fd\u591f\u589e\u5f3a\u6a21\u578b\u62b5\u5fa1\u6709\u5bb3\u5bf9\u8bdd\u7684\u80fd\u529b\uff0c\u540c\u65f6\u5c3d\u91cf\u51cf\u5c11\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660eADPO\u63d0\u4f9b\u4e86\u6bd4\u4f20\u7edfDPO\u66f4\u4e3a\u7a33\u5b9a\u7684\u8bad\u7ec3\u6d41\u7a0b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u6709\u5bb3\u6570\u636e\u76f4\u63a5\u878d\u5165\u751f\u6210\u6a21\u578b\u7684DPO\u53d8\u4f53\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u4eba\u5de5\u521b\u5efa\u5b89\u5168\u5bf9\u8bdd\u6570\u636e\u7684\u9700\u6c42\u3002|\n", "2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai 
et.al.|[2405.14863](http://arxiv.org/abs/2405.14863)|null|\u8de8\u9886\u57df\u5bf9\u9f50\u662f\u6307\u5c06\u4e00\u4e2a\u6982\u5ff5\u4ece\u4e00\u4e2a\u9886\u57df\u6620\u5c04\u5230\u53e6\u4e00\u4e2a\u9886\u57df\u7684\u4efb\u52a1\u3002\u4f8b\u5982\uff0c\u8be2\u95ee\u201c\u5982\u679c\\textit{\u533b\u751f}\u662f\u4e00\u79cd\\textit{\u989c\u8272}\uff0c\u5b83\u4f1a\u662f\u4ec0\u4e48\u989c\u8272\uff1f\u201d\u8fd9\u4e2a\u770b\u4f3c\u5947\u7279\u7684\u8bfe\u9898\u65e8\u5728\u7814\u7a76\u4eba\u4eec\u5982\u4f55\u901a\u8fc7\u7c7b\u522b\u6620\u5c04\u548c\u5bf9\u8fd9\u4e9b\u6620\u5c04\u7684\u63a8\u7406\u6765\u8868\u5f81\u5177\u4f53\u548c\u62bd\u8c61\u7684\u6982\u5ff5\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u501f\u9274\u8ba4\u77e5\u79d1\u5b66\u4e2d\u7684\u8fd9\u4e00\u4efb\u52a1\uff0c\u901a\u8fc7\u884c\u4e3a\u7814\u7a76\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6982\u5ff5\u5316\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u901a\u8fc7\u63d0\u793aLLMs\u6267\u884c\u8de8\u57df\u6620\u5c04\u4efb\u52a1\uff0c\u5e76\u5728\u7fa4\u4f53\u548c\u4e2a\u4f53\u5c42\u9762\u5206\u6790\u5b83\u4eec\u7684\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u6a21\u578b\u5bf9\u5176\u9884\u6d4b\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\uff0c\u901a\u8fc7\u5206\u6790\u548c\u5206\u7c7b\u5b83\u4eec\u5bf9\u8fd9\u4e9b\u6620\u5c04\u7684\u89e3\u91ca\u3002\u7ed3\u679c\u663e\u793a\uff0c\u4eba\u7c7b\u548c\u6a21\u578b\u7684\u6620\u5c04\u4ee5\u53ca\u89e3\u91ca\u5b58\u5728\u663e\u8457\u76f8\u4f3c\u6027\uff0c\u8868\u660e\u6a21\u578b\u4ee5\u4e0e\u4eba\u7c7b\u7c7b\u4f3c\u7684\u65b9\u5f0f\u8868\u5f81\u6982\u5ff5\u3002\u8fd9\u79cd\u76f8\u4f3c\u6027\u4e0d\u4ec5\u4f53\u73b0\u5728\u6a21\u578b\u7684\u8868\u793a\u4e0a\uff0c\u4e5f\u4f53\u73b0\u5728\u5b83\u4eec\u7684\u884c\u4e3a\u4e2d\u3002\u800c\u4e14\uff0c\u6a21\u578b\u5927\u591a\u7ed9\u51fa\u6709\u6548\u7684\u89e3\u91ca\uff0c\u5e76\u91c7\u7528\u4e0e\u4eba\u7c7b\u7c7b\u4f3c\u7684\u63a8
\u7406\u8def\u5f84\u3002|\n", "2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862](http://arxiv.org/abs/2405.14862)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aBitune\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u63d0\u5347\u4e86\u9884\u8bad\u7ec3\u7684\u89e3\u7801\u5668\u578b\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6307\u4ee4\u8c03\u4f18\u65b9\u9762\u7684\u6027\u80fd\uff0c\u4ece\u800c\u5728\u591a\u4e2a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u63d0\u5347\u3002Bitune\u901a\u8fc7\u540c\u65f6\u5e94\u7528\u81ea\u56de\u5f52\u548c\u53cc\u5411\u6ce8\u610f\u529b\u5230\u63d0\u793a\u4e0a\uff0c\u4ee5\u83b7\u53d6\u66f4\u7cbe\u786e\u7684\u67e5\u8be2\u6216\u6307\u4ee4\u8868\u793a\u3002\u6211\u4eec\u4e3a\u6b64\u5f15\u5165\u4e86\u4e24\u7ec4\u53c2\u6570\uff0c\u5e76\u91c7\u7528\u4e86\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u6280\u672f\u6765\u5904\u7406\u3002\u8fd9\u4e24\u79cd\u7279\u5f81\u968f\u540e\u88ab\u7ec4\u5408\u6210\u4e00\u4e2a\u52a0\u6743\u5e73\u5747\uff0c\u5176\u4e2d\u6743\u91cd\u7531\u53ef\u8bad\u7ec3\u7cfb\u6570\u51b3\u5b9a\uff0c\u7528\u4e8e\u751f\u6210\u65b0\u7684\u4ee4\u724c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cBitune\u5728\u96f6\u6837\u672c\u8bbe\u7f6e\u4e0b\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u7b97\u672f\u548c\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u5927\u91cf\u7684\u6d88\u878d\u7814\u7a76\u9a8c\u8bc1\u4e86\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u5e76\u663e\u793a\u4e86\u8be5\u65b9\u6cd5\u5bf9\u4e0d\u540cPEFT\u6280\u672f\u7684\u9c81\u68d2\u6027\u3002|\n", "2405.14852": "|**2024-05-23**|**PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression**|Vladimir Malinovskii et.al.|[2405.14852](http://arxiv.org/abs/2405.14852)|**[link](https://github.com/vahe1994/aqlm)**|
\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u201c\u6781\u7aef\u201d\u538b\u7f29\uff0c\u5373\u5c06\u5176\u53c2\u6570\u538b\u7f29\u81f31-2\u4f4d\u6bcf\u53c2\u6570\uff0c\u4ee5\u9002\u5e94\u8d44\u6e90\u53d7\u9650\u8bbe\u5907\u4e0a\u7684\u9ad8\u6548\u6267\u884c\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6539\u8fdb\u4e00\u6b21\u6027\u91cf\u5316\u6280\u672f\u548c\u6743\u91cd\u8868\u793a\u4e0a\uff1b\u7136\u800c\uff0c\u7eaf\u540e\u8bad\u7ec3\u65b9\u6cd5\u5728\u7cbe\u5ea6\u4e0e\u4f4d\u5bbd\u6743\u8861\u65b9\u9762\u7684\u6536\u76ca\u6b63\u5728\u51cf\u5c11\u3002\u5f53\u524d\u6700\u5148\u8fdb\u7684\u91cf\u5316\u65b9\u6cd5\uff0c\u5982QuIP#\u548cAQLM\uff0c\u5305\u542b\u5bf9\u90e8\u5206\u538b\u7f29\u53c2\u6570\u7684\u5c0f\u89c4\u6a21\u6821\u51c6\u6570\u636e\u5fae\u8c03\uff1b\u7136\u800c\uff0c\u8fd9\u4e9b\u9488\u5bf9\u538b\u7f29\u6743\u91cd\u7684\u5fae\u8c03\u901a\u5e38\u4ec5\u4f7f\u7528\u76f4\u901a\u4f30\u8ba1\u5668\uff08STE\uff09\uff0cSTE\u5728\u8fd9\u79cd\u573a\u666f\u4e0b\u7684\u6027\u80fd\u5c1a\u4e0d\u660e\u786e\u3002 \u672c\u5de5\u4f5c\u8d28\u7591\u5728\u6781\u7aefLLM\u538b\u7f29\u4e2d\u4f7f\u7528STE\u7684\u6709\u6548\u6027\uff0c\u5e76\u7cfb\u7edf\u5730\u7814\u7a76\u4e86\u91cf\u5316\u611f\u77e5\u5fae\u8c03\u7b56\u7565\u3002\u6211\u4eec\u63d0\u51faPV-Tuning\uff0c\u4e00\u4e2a\u65e0\u7279\u5b9a\u67b6\u6784\u9650\u5236\u7684\u6846\u67b6\uff0c\u5b83\u6269\u5c55\u5e76\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u5e76\u5728\u67d0\u4e9b\u53d7\u9650\u60c5\u51b5\u4e0b\u63d0\u4f9b\u6536\u655b\u4fdd\u8bc1\u3002\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u5f53\u7528\u4e8e1-2\u4f4d\u77e2\u91cf\u91cf\u5316\u65f6\uff0cPV-Tuning\u5728\u9ad8\u6027\u80fd\u6a21\u578b\u5982Llama\u548cMistral\u4e0a\u4f18\u4e8e\u5148\u524d\u7684\u6280\u672f\u3002\u901a\u8fc7\u4f7f\u7528PV-Tuning\uff0c\u6211\u4eec\u57282\u4f4d\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\u9996\u6b21\u5b9e\u73b0\u4e86Llama 
2\u5bb6\u65cf\u6a21\u578b\u7684\u5e15\u7d2f\u6258\u6700\u4f18\u91cf\u5316\u3002|\n", "2405.14831": "|**2024-05-23**|**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models**|Bernal Jim\u00e9nez Guti\u00e9rrez et.al.|[2405.14831](http://arxiv.org/abs/2405.14831)|**[link](https://github.com/osu-nlp-group/hipporag)**|\u4e3a\u4e86\u5728\u6076\u52a3\u591a\u53d8\u7684\u81ea\u7136\u73af\u5883\u4e2d\u751f\u5b58\uff0c\u54fa\u4e73\u52a8\u7269\u7684\u5927\u8111\u53d1\u5c55\u51fa\u5b58\u50a8\u5927\u91cf\u4e16\u754c\u77e5\u8bc6\u5e76\u4e0d\u65ad\u6574\u5408\u65b0\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u540c\u65f6\u907f\u514d\u707e\u96be\u6027\u9057\u5fd8\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u5e26\u6709\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u65b9\u6cd5\u5728\u5904\u7406\u6b64\u7c7b\u4efb\u52a1\u4e0a\u5df2\u53d6\u5f97\u663e\u8457\u6210\u5c31\uff0c\u4f46\u5b83\u4eec\u5728\u5927\u89c4\u6a21\u65b0\u7ecf\u9a8c\u878d\u5408\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51faHippoRAG\uff0c\u4e00\u4e2a\u53d7\u4eba\u7c7b\u957f\u671f\u8bb0\u5fc6\u6d77\u9a6c\u56de\u7d22\u5f15\u7406\u8bba\u542f\u53d1\u7684\u65b0\u578b\u68c0\u7d22\u6846\u67b6\uff0c\u65e8\u5728\u4fc3\u8fdb\u5bf9\u65b0\u7ecf\u9a8c\u7684\u66f4\u6df1\u3001\u66f4\u6709\u6548\u96c6\u6210\u3002HippoRAG\u5de7\u5999\u5730\u534f\u540cLLMs\u3001\u77e5\u8bc6\u56fe\u8c31\u4ee5\u53ca\u4e2a\u6027\u5316PageRank\u7b97\u6cd5\uff0c\u6a21\u62df\u4eba\u8111\u76ae\u5c42\u548c\u6d77\u9a6c\u4f53\u5728\u8bb0\u5fc6\u4e2d\u7684\u4e0d\u540c\u4f5c\u7528\u3002 
\u6211\u4eec\u5c06HippoRAG\u4e0e\u73b0\u6709RAG\u65b9\u6cd5\u5728\u591a\u8df3\u95ee\u7b54\u4efb\u52a1\u4e2d\u8fdb\u884c\u6bd4\u8f83\uff0c\u7ed3\u679c\u663e\u793aHippoRAG\u663e\u8457\u4f18\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe20%\u3002\u5355\u6b65\u68c0\u7d22\u65f6\uff0cHippoRAG\u8868\u73b0\u51fa\u4e0e\u8fed\u4ee3\u68c0\u7d22\u65b9\u6cd5\u5982IRCoT\u76f8\u5f53\u6216\u66f4\u597d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u6210\u672c\u8282\u770110-30\u500d\uff0c\u901f\u5ea6\u63d0\u53476-13\u500d\u3002\u5f53\u5c06HippoRAG\u878d\u5165IRCoT\u540e\uff0c\u8fd8\u80fd\u5e26\u6765\u989d\u5916\u7684\u663e\u8457\u589e\u76ca\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793aHippoRAG\u80fd\u591f\u5e94\u5bf9\u73b0\u6709\u65b9\u6cd5\u96be\u4ee5\u89e6\u53ca\u7684\u65b0\u573a\u666f\u3002\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728\u4e0a\u5f00\u6e90\u3002|\n", "2405.14804": "|**2024-05-23**|**Can LLMs Solve longer Math Word Problems Better?**|Xin Xu et.al.|[2405.14804](http://arxiv.org/abs/2405.14804)|null|\u6570\u5b66\u5e94\u7528\u9898\uff08MWPs\uff09\u662f\u8861\u91cf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u529b\u7684\u5173\u952e\uff0c\u4f46\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u7b80\u77ed\u80cc\u666f\u7684\u9898\u76ee\u4e0a\u3002\u7136\u800c\uff0c\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u6570\u5b66\u95ee\u9898\u5f80\u5f80\u6d89\u53ca\u590d\u6742\u60c5\u5883\uff0c\u56e0\u6b64LLMs\u89e3\u51b3\u957f\u7bc7\u6570\u5b66\u5e94\u7528\u9898\u7684\u80fd\u529b\u5bf9\u4e8e\u5176\u5728\u5b9e\u9645\u573a\u666f\u7684\u5e94\u7528\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u8fd9\u4e00\u65b9\u9762\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u7814\u7a76\u9996\u6b21\u5173\u6ce8Context Length Generalizability\uff08CoLeG\uff09\uff0c\u5373LLMs\u5904\u7406\u957f\u7bc7\u6570\u5b66\u5e94\u7528\u9898\u7684\u80fd\u529b\u3002\u6211\u4eec\u521b\u5efa\u4e86Extended Grade-School 
Math\uff08E-GSM\uff09\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u5e26\u6709\u8be6\u7ec6\u53d9\u8ff0\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u65b0\u6307\u6807\u6765\u8bc4\u4f30LLMs\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e0a\u7684\u6548\u80fd\u548c\u9c81\u68d2\u6027\u3002 \u901a\u8fc7\u5bf9\u73b0\u6709\u96f6\u6837\u672c\u63d0\u793a\u65b9\u6cd5\u4ee5\u53ca\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u7684\u8003\u5bdf\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u4eec\u5728CoLeG\u65b9\u9762\u666e\u904d\u5b58\u5728\u4e0d\u8db3\u3002\u9488\u5bf9\u4e0d\u540c\u7c7b\u578b\u7684LLMs\uff0c\u6211\u4eec\u63d0\u51fa\u9488\u5bf9\u6027\u7684\u89e3\u51b3\u65b9\u6848\uff1a\u5bf9\u4e8e\u4e13\u6709\u6a21\u578b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u65b0\u7684\u6307\u5bfc\u6027\u63d0\u793a\u4ee5\u51cf\u8f7b\u957f\u6587\u672c\u7684\u5f71\u54cd\uff1b\u5bf9\u4e8e\u5f00\u6e90\u6a21\u578b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u6570\u636e\u589e\u5f3a\u4efb\u52a1\u4ee5\u63d0\u5347\u6a21\u578b\u7684\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u5728E-GSM\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u4e14\u5728\u5176\u4ed6\u591a\u4e2a\u6570\u5b66\u5e94\u7528\u9898\u57fa\u51c6\u4e0a\u4e5f\u5c55\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\u3002 \u672c\u7814\u7a76\u7684\u7ed3\u679c\u4e3a\u672a\u6765\u5229\u7528LLMs\u5904\u7406\u590d\u6742\u73b0\u5b9e\u95ee\u9898\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b9\u5411\uff0c\u4e3a\u5f53\u524d\u9650\u5236\u63d0\u51fa\u4e86\u5b9e\u7528\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u63a2\u7d22\u6a21\u578b\u6cdb\u5316\u6027\u548c\u8bad\u7ec3\u7b56\u7565\u5f00\u8f9f\u4e86\u9053\u8def\u3002|\n", "2405.14782": "|**2024-05-23**|**Lessons from the Trenches on Reproducible Evaluation of Language Models**|Stella Biderman 
et.al.|[2405.14782](http://arxiv.org/abs/2405.14782)|null|\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\uff0c\u6709\u6548\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u4ecd\u7136\u662f\u4e00\u9879\u672a\u89e3\u7684\u6311\u6218\u3002\u7814\u7a76\u4eba\u5458\u548c\u5de5\u7a0b\u5e08\u9762\u4e34\u8bf8\u591a\u65b9\u6cd5\u8bba\u96be\u9898\uff0c\u4f8b\u5982\u6a21\u578b\u5bf9\u8bc4\u4f30\u8bbe\u7f6e\u7684\u654f\u611f\u6027\u3001\u4e0d\u540c\u65b9\u6cd5\u4e4b\u95f4\u7684\u6bd4\u8f83\u56f0\u96be\uff0c\u4ee5\u53ca\u53ef\u91cd\u590d\u6027\u548c\u900f\u660e\u5ea6\u7684\u7f3a\u5931\u3002\u672c\u6587\u57fa\u4e8e\u4e09\u5e74\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u7ecf\u9a8c\uff0c\u4e3a\u7814\u7a76\u8005\u63d0\u4f9b\u6307\u5bfc\u548c\u6559\u8bad\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u4e2d\u5e38\u89c1\u7684\u95ee\u9898\u3002\u5176\u6b21\uff0c\u6211\u4eec\u9610\u8ff0\u4e86\u5e94\u5bf9\u6216\u51cf\u8f7b\u8fd9\u4e9b\u95ee\u9898\u7684\u6700\u4f73\u5b9e\u8df5\u3002\u7b2c\u4e09\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Language Model Evaluation Harness\uff08lm-eval\uff09\uff1a\u4e00\u4e2a\u5f00\u6e90\u5e93\uff0c\u65e8\u5728\u72ec\u7acb\u3001\u53ef\u91cd\u590d\u548c\u6269\u5c55\u5730\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u6211\u4eec\u5c06\u4ecb\u7ecd\u5e93\u7684\u529f\u80fd\uff0c\u5e76\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u5982\u4f55\u4f7f\u7528\u8be5\u5e93\u6765\u7f13\u89e3\u8fd9\u4e9b\u65b9\u6cd5\u8bba\u5173\u6ce8\u70b9\u3002|\n", "2405.14768": "|**2024-05-23**|**WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models**|Peng Wang 
et.al.|[2405.14768](http://arxiv.org/abs/2405.14768)|**[link](https://github.com/zjunlp/easyedit)**|**\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\uff0c\u968f\u7740\u4e16\u754c\u4e8b\u5b9e\u7684\u4e0d\u65ad\u589e\u957f\u548c\u7ea0\u6b63\u9519\u8bef\u54cd\u5e94\u7684\u9700\u6c42\uff0c\u6a21\u578b\u7f16\u8f91\u7684\u65b9\u6cd5\u9700\u8981\u4e0d\u65ad\u66f4\u65b0\u77e5\u8bc6\u3002\u8bba\u6587\u7684\u6838\u5fc3\u95ee\u9898\u662f\uff1a\u5728\u7f16\u8f91\u8fc7\u7a0b\u4e2d\uff0c\u77e5\u8bc6\u5e94\u5b58\u50a8\u5728\u6a21\u578b\u7684\u54ea\u4e2a\u8bb0\u5fc6\u5c42\u6b21\u66f4\u4e3a\u5408\u9002\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u76f4\u63a5\u4fee\u6539\u957f\u671f\u8bb0\u5fc6\uff08\u6a21\u578b\u53c2\u6570\uff09\u6216\u5229\u7528\u5de5\u4f5c\u8bb0\u5fc6\uff08\u901a\u8fc7\u68c0\u7d22\u7684\u795e\u7ecf\u7f51\u7edc\u6fc0\u6d3b\uff09\u90fd\u4f1a\u5bfc\u81f4\u4e0d\u53ef\u903e\u8d8a\u7684\u4e09\u89d2\u56f0\u5883\u2014\u2014\u53ef\u9760\u6027\u3001\u6cdb\u5316\u80fd\u529b\u548c\u5c40\u90e8\u6027\u65e0\u6cd5\u540c\u65f6\u5b9e\u73b0\u4e8e\u7ec8\u8eab\u7f16\u8f91\u573a\u666f\u4e2d\u3002\u76f4\u63a5\u4fee\u6539\u53c2\u6570\u4f1a\u4e0e\u65e0\u5173\u7684\u9884\u8bad\u7ec3\u77e5\u8bc6\u6216\u5148\u524d\u7f16\u8f91\u4ea7\u751f\u51b2\u7a81\uff08\u53ef\u9760\u6027\u5dee\u3001\u5c40\u90e8\u6027\u4e0d\u8db3\uff09\uff1b\u800c\u57fa\u4e8e\u68c0\u7d22\u7684\u5de5\u4f5c\u8bb0\u5fc6\u96be\u4ee5\u4f7f\u6a21\u578b\u7406\u89e3\u5e76\u6cdb\u5316\u7f16\u8f91\uff08\u6cdb\u5316\u80fd\u529b\u5f31\uff09\u3002\u56e0\u6b64\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aWISE\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u5f25\u5408\u8bb0\u5fc6\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002 
\u5728WISE\u4e2d\uff0c\u8bbe\u8ba1\u4e86\u4e00\u79cd\u53cc\u53c2\u6570\u5185\u5b58\u673a\u5236\uff0c\u5305\u62ec\u4e3b\u5185\u5b58\u7528\u4e8e\u5b58\u50a8\u9884\u8bad\u7ec3\u77e5\u8bc6\uff0c\u4fa7\u5185\u5b58\u7528\u4e8e\u5b58\u653e\u7f16\u8f91\u540e\u7684\u77e5\u8bc6\u3002\u4ec5\u5bf9\u4fa7\u5185\u5b58\u4e2d\u7684\u77e5\u8bc6\u8fdb\u884c\u7f16\u8f91\uff0c\u5e76\u8bad\u7ec3\u4e00\u4e2a\u8def\u7531\u5668\uff0c\u4ee5\u4fbf\u6839\u636e\u67e5\u8be2\u51b3\u5b9a\u4ece\u54ea\u4e2a\u5185\u5b58\u4e2d\u83b7\u53d6\u4fe1\u606f\u3002\u5bf9\u4e8e\u6301\u7eed\u7f16\u8f91\uff0c\u91c7\u7528\u4e86\u77e5\u8bc6\u5207\u7247\u673a\u5236\uff0c\u5c06\u4e0d\u540c\u7684\u7f16\u8f91\u5206\u5e03\u5728\u53c2\u6570\u7684\u4e0d\u540c\u5b50\u7a7a\u95f4\u4e2d\uff0c\u7136\u540e\u5408\u5e76\u5230\u5171\u4eab\u5185\u5b58\u4e2d\uff0c\u4ee5\u907f\u514d\u51b2\u7a81\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cWISE\u5728\u95ee\u7b54\u3001\u5e7b\u89c9\u751f\u6210\u548c\u8de8\u4e0d\u540c\u8d8b\u52bf\u7684LLM\u67b6\u6784\uff08\u5982GPT\u3001LLaMA\u548cMistral\uff09\u7684\u7ec8\u8eab\u6a21\u578b\u7f16\u8f91\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u5148\u524d\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\uff0c\u6210\u529f\u514b\u670d\u4e86\u4e0a\u8ff0\u56f0\u5883\u3002\u4ee3\u7801\u5c06\u5728https://github.com/zjunlp/EasyEdit\u4e0a\u53d1\u5e03\u3002**|\n", "2405.14767": "|**2024-05-23**|**FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models**|Hongyang Yang 
et.al.|[2405.14767](http://arxiv.org/abs/2405.14767)|**[link](https://github.com/ai4finance-foundation/finrobot)**|**\u968f\u7740\u91d1\u878d\u673a\u6784\u548c\u4e13\u4e1a\u4eba\u58eb\u8d8a\u6765\u8d8a\u591a\u5730\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u878d\u5165\u5de5\u4f5c\u6d41\u7a0b\uff0c\u91d1\u878d\u884c\u4e1a\u4e0eAI\u793e\u533a\u4e4b\u95f4\u4ecd\u5b58\u5728\u663e\u8457\u969c\u788d\uff0c\u5982\u4e13\u6709\u6570\u636e\u548c\u4e13\u4e1a\u77e5\u8bc6\u3002\u8fd9\u4e9b\u6311\u6218\u9650\u5236\u4e86AI\u5728\u63d0\u5347\u91d1\u878d\u4efb\u52a1\u6548\u7387\u65b9\u9762\u7684\u6f5c\u529b\u3002\u9274\u4e8e\u91d1\u878d\u5206\u6790\u7684\u91cd\u8981\u6027\uff0c\u6211\u4eec\u65e8\u5728\u5f00\u53d1\u4e13\u95e8\u9488\u5bf9\u91d1\u878d\u7684LLM\u9a71\u52a8\u5de5\u5177\u94fe\uff0c\u5e76\u901a\u8fc7\u5f00\u6e90\u9879\u76ee\u63a8\u52a8\u5176\u666e\u53ca\uff0c\u4fc3\u8fdbAI\u5728\u91d1\u878d\u51b3\u7b56\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u672c\u6587\u4ecb\u7ecdFinRobot\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u5f00\u6e90AI\u4ee3\u7406\u5e73\u53f0\uff0c\u652f\u6301\u591a\u4e2a\u91d1\u878d\u4e13\u4e1aAI\u4ee3\u7406\uff0c\u6bcf\u4e2a\u90fd\u7531LLM\u9a71\u52a8\u3002\u5e73\u53f0\u4e3b\u8981\u5206\u4e3a\u56db\u5c42\uff1a1\uff09\u91d1\u878dAI\u4ee3\u7406\u5c42\uff0c\u901a\u8fc7\u6784\u5efa\u91d1\u878dChain-of-Thought\uff08CoT\uff09\u5c06\u590d\u6742\u7684\u91d1\u878d\u95ee\u9898\u5206\u89e3\u4e3a\u903b\u8f91\u5e8f\u5217\uff1b2\uff09\u91d1\u878dLLM\u7b97\u6cd5\u5c42\uff0c\u6839\u636e\u7279\u5b9a\u4efb\u52a1\u52a8\u6001\u914d\u7f6e\u5408\u9002\u7684\u6a21\u578b\u5e94\u7528\u7b56\u7565\uff1b3\uff09LLMOps\u548cDataOps\u5c42\uff0c\u901a\u8fc7\u8bad\u7ec3/\u5fae\u8c03\u6280\u672f\u4ee5\u53ca\u4f7f\u7528\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u6570\u636e\u751f\u6210\u7cbe\u786e\u6a21\u578b\uff1b4\uff09\u591a\u6e90LLM\u57fa\u7840\u6a21\u578b\u5c42\uff0c\u6574\u5408\u5404\u79cdLLM\uff0c\u4f7f\u4e0a\u8ff0\u5404\u5c42\u53ef\u4ee5\u76f4\u63a5\u8bbf\u95ee\u3002FinRobot\u65e8\u572
8\u4e3a\u4e13\u4e1a\u5206\u6790\u5e08\u548c\u975e\u4e13\u4e1a\u4eba\u58eb\u63d0\u4f9b\u5b9e\u8df5\u64cd\u4f5c\uff0c\u8ba9\u4ed6\u4eec\u80fd\u591f\u5229\u7528\u5f3a\u5927\u7684AI\u6280\u672f\u8fdb\u884c\u9ad8\u7ea7\u91d1\u878d\u5206\u6790\u3002FinRobot\u7684\u5f00\u6e90\u4ee3\u7801\u53ef\u5728\u6b64\u83b7\u53d6\uff1ahttps://github.com/AI4Finance-Foundation/FinRobot\u3002**|\n", "2405.14766": "|**2024-05-23**|**Evaluating Large Language Models for Public Health Classification and Extraction Tasks**|Joshua Harris et.al.|[2405.14766](http://arxiv.org/abs/2405.14766)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u4eba\u4eec\u5bf9\u5176\u5728\u516c\u5171\u536b\u751f\u9886\u57df\u652f\u6301\u4e13\u5bb6\u5de5\u4f5c\u7684\u6f5c\u529b\u4ea7\u751f\u4e86\u6d53\u539a\u5174\u8da3\u3002\u672c\u7814\u7a76\u901a\u8fc7\u7ed3\u5408\u516d\u4e2a\u5916\u90e8\u6807\u6ce8\u7684\u548c\u4e03\u4e2a\u5185\u90e8\u6807\u6ce8\u7684\u6570\u636e\u96c6\uff0c\u8bc4\u4f30\u4e86LLMs\u5728\u5904\u7406\u4e0e\u5065\u5eb7\u8d1f\u62c5\u3001\u6d41\u884c\u75c5\u5b66\u98ce\u9669\u56e0\u7d20\u548c\u516c\u5171\u536b\u751f\u5e72\u9884\u76f8\u5173\u7684\u6587\u672c\u5206\u7c7b\u548c\u63d0\u53d6\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u9996\u5148\u5bf9\u4e94\u4e2a\u5f00\u6e90\u5927\u6a21\u578b\uff08\u53c2\u6570\u91cf\u4ece70\u4ebf\u5230700\u4ebf\u4e0d\u7b49\uff09\u8fdb\u884c\u4e86\u96f6\u6837\u672c\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0cLlama-3-70B-Instruct\u8868\u73b0\u51fa\u8272\uff0c\u5fae-F1\u5f97\u5206\u572817\u4e2a\u4efb\u52a1\u4e2d\u768415\u9879\u4e2d\u6700\u9ad8\u3002\u5404\u4efb\u52a1\u95f4\u7684\u6027\u80fd\u5dee\u5f02\u663e\u8457\uff0c\u4f8b\u5982\uff0c\u6709\u4e9b\u6a21\u578b\u5982Contact 
Classification\u7684\u5f97\u5206\u4f4e\u4e8e60%\uff0c\u800c\u50cfGI\u75be\u75c5\u5206\u7c7b\u8fd9\u6837\u7684\u4efb\u52a1\uff0c\u6240\u6709\u6a21\u578b\u90fd\u80fd\u8fbe\u523080%\u4ee5\u4e0a\u7684\u5fae-F1\u3002\u5bf9\u4e8e12\u4e2a\u4efb\u52a1\u7684\u5b50\u96c6\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86GPT-4\uff0c\u53d1\u73b0\u5176\u4e0eLlama-3-70B-Instruct\u7684\u7ed3\u679c\u76f8\u5f53\uff0cLlama-3-70B-Instruct\u5728\u5176\u4e2d6\u4e2a\u4efb\u52a1\u4e0a\u5f97\u5206\u66f4\u9ad8\u6216\u6301\u5e73\u3002\u603b\u4f53\u800c\u8a00\uff0c\u6839\u636e\u521d\u6b65\u7ed3\u679c\uff0c\u6211\u4eec\u53d1\u73b0LLMs\u6709\u53ef\u80fd\u6210\u4e3a\u516c\u5171\u536b\u751f\u4e13\u5bb6\u4ece\u5404\u79cd\u81ea\u7531\u6587\u672c\u6e90\u63d0\u53d6\u4fe1\u606f\u7684\u6709\u6548\u5de5\u5177\uff0c\u6709\u52a9\u4e8e\u516c\u5171\u536b\u751f\u76d1\u6d4b\u3001\u7814\u7a76\u548c\u5e72\u9884\u63aa\u65bd\u3002|\n", "2405.14755": "|**2024-05-23**|**Large language models can be zero-shot anomaly detectors for time series?**|Sarah Alnegheimish 
et.al.|[2405.14755](http://arxiv.org/abs/2405.14755)|**[link](https://github.com/sintel-dev/sigllm)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u6267\u884c\u591a\u79cd\u4efb\u52a1\uff0c\u5305\u62ec\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u3002\u8fd9\u4e9b\u6a21\u578b\u7684\u7075\u6d3b\u6027\u4f7f\u5176\u9002\u7528\u4e8e\u4f17\u591a\u5e94\u7528\u3002\u672c\u6587\u63d0\u51fa\u4e00\u9879\u65b0\u9896\u7684\u7814\u7a76\uff0c\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u7684\u65f6\u95f4\u5e8f\u5217\u5f02\u5e38\u68c0\u6d4b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5bf9\u4e8e\u8bed\u8a00\u6a21\u578b\u800c\u8a00\uff0c\u8fd9\u6d89\u53ca\u8bc6\u522b\u8f93\u5165\u5e8f\u5217\uff08\u6216\u591a\u4e2a\u90e8\u5206\uff09\u4e2d\u7684\u5f02\u5e38\u70b9\uff0c\u4ee5\u53ca\u5904\u7406\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u800c\u975e\u4f20\u7edf\u7684\u6587\u672c\u8f93\u5165\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86sigllm\uff0c\u4e00\u4e2a\u4e13\u4e3a\u65f6\u95f4\u5e8f\u5217\u5f02\u5e38\u68c0\u6d4b\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5305\u542b\u5c06\u65f6\u95f4\u5e8f\u5217\u8f6c\u6362\u4e3a\u6587\u672c\u7684\u6a21\u5757\uff0c\u4ee5\u53ca\u7aef\u5230\u7aef\u7684\u6d41\u7a0b\uff0c\u7528\u4e8e\u5f15\u5bfc\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5f02\u5e38\u68c0\u6d4b\u3002\u6211\u4eec\u8bd5\u9a8c\u4e86\u4e24\u79cd\u6d4b\u8bd5\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u529b\u7684\u65b9\u6cd5\uff1a\u4e00\u662f\u76f4\u63a5\u63d0\u793a\u6a21\u578b\u6307\u51fa\u8f93\u5165\u4e2d\u7684\u5f02\u5e38\u5143\u7d20\uff1b\u4e8c\u662f\u5229\u7528\u8bed\u8a00\u6a21\u578b\u7684\u9884\u6d4b\u80fd\u529b\u6765\u8f85\u52a9\u68c0\u6d4b\u8fc7\u7a0b\u3002 
\u6211\u4eec\u572811\u4e2a\u6765\u81ea\u4e0d\u540c\u6765\u6e90\u7684\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6846\u67b6\uff0c\u4f7f\u7528\u4e8610\u79cd\u4e0d\u540c\u7684\u7ba1\u9053\u3002\u7ed3\u679c\u663e\u793a\uff0c\u9884\u6d4b\u65b9\u6cd5\u5728\u6240\u670911\u4e2a\u6570\u636e\u96c6\u4e2d\u90fd\u663e\u8457\u4f18\u4e8e\u63d0\u793a\u65b9\u6cd5\uff0c\u5c24\u5176\u662f\u5728F1\u5206\u6570\u4e0a\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u53d1\u73b0\u5f02\u5e38\uff0c\u4f46\u76ee\u524d\u7684\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4ecd\u5360\u4f18\uff0c\u5176\u8868\u73b0\u6bd4\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9ad8\u51fa30%\u3002|\n", "2405.15765": "|**2024-05-24**|**Scaling Laws for Discriminative Classification in Large Language Models**|Dean Wyatte et.al.|[2405.15765](http://arxiv.org/abs/2405.15765)|null|\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6807\u5fd7\u7740\u673a\u5668\u5b66\u4e60\u6a21\u578b\u80fd\u529b\u7684\u4e00\u4e2a\u91cd\u5927\u98de\u8dc3\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5bf9\u5404\u79cd\u67e5\u8be2\u751f\u6210\u5408\u7406\u7684\u56de\u7b54\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u5728\u5ba2\u6237\u670d\u52a1\u5e94\u7528\u4e2d\u5177\u6709\u6f5c\u529b\u3002\u7136\u800c\uff0cLLMs\u5df2\u88ab\u89c2\u5bdf\u5230\u5b58\u5728\u80e1\u8a00\u4e71\u8bed\u7684\u95ee\u9898\uff0c\u8fd9\u5728\u77ed\u671f\u5185\u9650\u5236\u4e86\u5b83\u4eec\u5728\u5ba2\u6237\u670d\u52a1\u4e2d\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7cfb\u7edf\uff0c\u5c06\u8bed\u8a00\u5efa\u6a21\u4efb\u52a1\u91cd\u65b0\u6784\u60f3\u4e3a\u5206\u7c7b\u4efb\u52a1\uff0c\u4ee5\u5e2e\u52a9\u5ba2\u6237\u670d\u52a1\u4ee3\u8868\u9009\u62e9\u6700\u4f73\u7684\u6a21\u677f\u56de\u590d\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u4e3a\u5ba2\u670d\u4ee3\u8868\u63d0\u4f9b\u6700\u5408\u9002\u7684\u524dK\u4e2a\u5019\u9009\u56de\u590
d\u3002 \u6211\u4eec\u5c55\u793a\u4e86\u79bb\u7ebf\u548c\u5728\u7ebf\u5b9e\u9a8c\u7684\u7ed3\u679c\uff0c\u8bc1\u660e\u4e86\u5b9e\u9a8c\u7cfb\u7edf\u7684\u6709\u6548\u6027\uff0c\u79bb\u7ebf\u5b9e\u9a8c\u663e\u793a\u51fa\u6539\u8fdb\uff0c\u800c\u5728\u7ebf\u5b9e\u9a8c\u5219\u5e26\u6765\u4e86\u7edf\u8ba1\u663e\u8457\u7684\u6548\u679c\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5206\u4eab\u4e86\u901a\u8fc7\u6a21\u578b\u53c2\u6570\u8c03\u6574\u8fdb\u884c\u7684\u9a8c\u8bc1\u635f\u5931\u548c\u524dK\u7cbe\u5ea6\u7684\u5ea6\u91cf\u66f2\u7ebf\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u6a21\u578b\u5927\u5c0f\u3001\u5ef6\u8fdf\u548c\u51c6\u786e\u6027\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u53ef\u80fd\u7684\u5e94\u7528\u9886\u57df\u3002|\n", "2405.15739": "|**2024-05-24**|**Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias**|Andres Algaba et.al.|[2405.15739](http://arxiv.org/abs/2405.15739)|**[link](https://github.com/andresalgaba/llm_citation_patterns)**|
\u5f15\u7528\u5b9e\u8df5\u5bf9\u4e8e\u6784\u5efa\u79d1\u5b66\u77e5\u8bc6\u7ed3\u6784\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5f80\u5f80\u53d7\u5230\u5f53\u4ee3\u89c4\u8303\u548c\u504f\u89c1\u7684\u5f71\u54cd\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u51fa\u73b0\uff0c\u8fd9\u4e00\u9886\u57df\u51fa\u73b0\u4e86\u65b0\u7684\u52a8\u6001\u3002\u7814\u7a76\u8005\u9996\u6b21\u63a2\u7d22\u4e86\u5b8c\u5168\u4f9d\u8d56\u53c2\u6570\u77e5\u8bc6\u800c\u975e\u57fa\u4e8e\u641c\u7d22\u6216\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7684\u63a8\u8350\u5f15\u7528\u7684\u7279\u6027\u53ca\u5176\u6f5c\u5728\u504f\u89c1\u3002\u5b9e\u9a8c\u4f7f\u7528\u4e86\u4e00\u7ec4\u5305\u542b166\u7bc7\u6765\u81eaAAAI\u3001NeurIPS\u3001ICML\u548cICLR\u7684\u8bba\u6587\uff0c\u8fd9\u4e9b\u8bba\u6587\u5728GPT-4\u7684\u77e5\u8bc6\u622a\u6b62\u65e5\u671f\u540e\u53d1\u8868\uff0c\u6d89\u53ca3,066\u4e2a\u5f15\u7528\u3002\u5b9e\u9a8c\u8ba9GPT-4\u4e3a\u533f\u540d\u6587\u672c\u4e2d\u7684\u5f15\u7528\u63d0\u4f9b\u5b66\u672f\u53c2\u8003\u3002\u7ed3\u679c\u63ed\u793a\u4e86\u4eba\u7c7b\u548c\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u5f15\u7528\u6a21\u5f0f\u60ca\u4eba\u76f8\u4f3c\uff0c\u4f46GPT-4\u663e\u793a\u51fa\u66f4\u5f3a\u7684\u9ad8\u5f15\u7528\u504f\u89c1\uff0c\u5373\u4f7f\u5728\u63a7\u5236\u4e86\u51fa\u7248\u5e74\u4efd\u3001\u6807\u9898\u957f\u5ea6\u3001\u4f5c\u8005\u6570\u91cf\u548c\u4f1a\u8bae\u7b49\u56e0\u7d20\u540e\u4f9d\u7136\u5b58\u5728\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0GPT-4\u751f\u6210\u7684\u65e2\u6709\u548c\u4e0d\u5b58\u5728\u5f15\u7528\u7684\u7279\u6027\u9ad8\u5ea6\u4e00\u81f4\uff0c\u8868\u660e\u6a21\u578b\u5185\u5316\u4e86\u5f15\u7528\u6a21\u5f0f\u3002\u901a\u8fc7\u5206\u6790\u5f15\u7528\u56fe\u8c31\uff0c\u663e\u793aGPT-4\u63a8\u8350\u7684\u5f15\u7528\u5d4c\u5165\u5728\u76f8\u5173\u5f15\u7528\u7f51\u7edc\u4e2d\uff0c\u6697\u793a\u5176\u5bf9\u6982\u5ff5\u7684\u6df1\u5165\u7406\u89e3\u3002\u5c3d\u7ba1\u8bed\u8a00\u6a21\u578b\u53ef\u4ee5\u8f85\u52a9
\u5f15\u7528\u751f\u6210\uff0c\u4f46\u5b83\u4eec\u4e5f\u53ef\u80fd\u653e\u5927\u73b0\u6709\u504f\u89c1\u5e76\u5f15\u5165\u65b0\u504f\u89c1\uff0c\u53ef\u80fd\u5f71\u54cd\u79d1\u5b66\u77e5\u8bc6\u7684\u4f20\u64ad\u3002\u6211\u4eec\u7684\u7ed3\u679c\u5f3a\u8c03\u4e86\u8bc6\u522b\u6a21\u578b\u504f\u89c1\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u5f00\u53d1\u5e73\u8861\u7684\u65b9\u6cd5\u4e0e\u8bed\u8a00\u6a21\u578b\u4e92\u52a8\u7684\u91cd\u8981\u6027\u3002|\n", "2405.15734": "|**2024-05-24**|**LM4LV: A Frozen Large Language Model for Low-level Vision Tasks**|Boyang Zheng et.al.|[2405.15734](http://arxiv.org/abs/2405.15734)|**[link](https://github.com/bytetriper/lm4lv)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6210\u529f\u50ac\u751f\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u7814\u7a76\u70ed\u6f6e\uff0c\u5b83\u4eec\u6b63\u5728\u6539\u53d8\u8ba1\u7b97\u673a\u89c6\u89c9\u9886\u57df\u7684\u591a\u4e2a\u7814\u7a76\u8303\u5f0f\u3002\u5c3d\u7ba1MLLMs\u5728\u8bf8\u5982\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u548c\u6587\u672c\u5230\u56fe\u50cf\u7b49\u9ad8\u7ea7\u89c6\u89c9\u548c Vision-and-Language 
\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5c1a\u65e0\u7814\u7a76\u63a2\u8ba8\u8fc7\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u5982\u4f55\u4ece\u8fd9\u4e9b\u6a21\u578b\u4e2d\u53d7\u76ca\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5f53\u524d\u5927\u591a\u6570MLLM\u7684\u8bbe\u8ba1\u4f7f\u5176\u5bf9\u4f4e\u7ea7\u7279\u5f81\u89c6\u800c\u4e0d\u89c1\uff0c\u56e0\u6b64\u5728\u89e3\u51b3\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u65b9\u9762\u5b58\u5728\u56fa\u6709\u9650\u5236\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa$\\textbf{LM4LV}$\uff0c\u8fd9\u662f\u4e00\u4e2a\u6846\u67b6\uff0c\u5b83\u5141\u8bb8\u4e00\u4e2a\u51bb\u7ed3\u7684LLM\u65e0\u9700\u4efb\u4f55\u591a\u6a21\u6001\u6570\u636e\u6216\u5148\u9a8c\u77e5\u8bc6\u5c31\u80fd\u89e3\u51b3\u4e00\u7cfb\u5217\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u3002\u8fd9\u7a81\u663e\u4e86LLMs\u5728\u4f4e\u7ea7\u89c6\u89c9\u9886\u57df\u7684\u5f3a\u5927\u6f5c\u529b\uff0c\u5e76\u5f25\u5408\u4e86MLLMs\u4e0e\u4f4e\u7ea7\u89c6\u89c9\u4efb\u52a1\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u6fc0\u53d1\u5bf9LLMs\u7684\u65b0\u89c6\u89d2\uff0c\u52a0\u6df1\u5bf9\u5176\u5de5\u4f5c\u673a\u5236\u7684\u7406\u89e3\u3002|\n", "2405.15729": "|**2024-05-24**|**Optimizing Large Language Models for OpenAPI Code Completion**|Bohdan Petryshyn et.al.|[2405.15729](http://arxiv.org/abs/2405.15729)|**[link](https://github.com/BohdanPetryshyn/openapi-completion-benchmark)**|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8fdb\u6b65\u6781\u5927\u5730\u6539\u53d8\u4e86\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u3002\u5c3d\u7ba1\u4e3b\u6d41\u7f16\u7a0b\u8bed\u8a00\u7684\u4ee3\u7801\u8865\u5168\u89e3\u51b3\u65b9\u6848\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u8f83\u5c11\u89c1\u7684\u683c\u5f0f\uff0c\u5982OpenAPI\u5b9a\u4e49\u65f6\u6027\u80fd\u6b20\u4f73\u3002\u672c\u7814\u7a76\u8bc4\u4f30\u4e86GitHub 
Copilot, a popular commercial code completion tool, on OpenAPI completion, and proposes a set of task-specific optimizations for Meta's open-source Code Llama model. A semantics-aware OpenAPI completion benchmark designed for this study is used in a series of experiments to analyze how various prompt-engineering and fine-tuning techniques affect Code Llama's performance. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot while using only 1/25 as many parameters as the commercial solution's underlying Codex model. The study also refines a widely used code infilling training technique, addressing underperformance when the model is prompted with a context shorter than the one used during training.|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684](http://arxiv.org/abs/2405.15684)|null|To bridge the gap between the visual and language modalities, Multimodal Large Language 
Models (MLLMs) typically learn an adapter that converts visual inputs into tokens a large language model (LLM) can understand. However, most adapters produce relatively fixed visual tokens regardless of the specific objects mentioned in the prompt. Because these adapters attend equally to every detail of the image and tend to process the whole scene, they may increase the cognitive load on the LLM when it processes complex scenes. To address this, we propose prompt-aware adapters, which are designed to embed visual inputs dynamically according to the specific focus of the prompt. Specifically, prompt-aware adapters use both global and local textual features to capture the visual cues most relevant to the prompt at coarse and fine granularity. This approach significantly improves the LLM's ability to understand and interpret visual content. Experiments on a variety of visual question answering tasks, such as counting and position reasoning, confirm the effectiveness of prompt-aware adapters.|\n", "2405.15668": "|**2024-05-24**|**What Do You See? 
Enhancing Zero-Shot Image Classification with Multimodal Large Language Models**|Abdelrahman Abdelhamed et.al.|[2405.15668](http://arxiv.org/abs/2405.15668)|null|This paper explores how large language models (LLMs) can be used for zero-shot image classification. The authors propose a simple yet effective method that applies multimodal LLMs to the input image to generate comprehensive textual representations. These textual representations are converted into fixed-dimensional features in a cross-modal embedding space and used together for zero-shot classification, with no need to design complex prompts for each dataset. A single generic set of prompts is used rather than prompts tuned per dataset. Experiments show that the method performs well across multiple datasets and improves accuracy over prior methods. Averaged over ten benchmarks, it gains 4.1 percentage points over prior methods, with a gain of 6.8 percentage points on ImageNet. This suggests that multimodal LLMs have the potential to significantly enhance computer vision tasks such as zero-shot image classification, offering a marked improvement over existing techniques.|\n", "2405.15662": "|**2024-05-24**|**Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning**|Wenhan 
Chang et.al.|[2405.15662](http://arxiv.org/abs/2405.15662)|null|In the era of AI, users may ask AI companies to delete their data from training datasets because of privacy concerns. Because retraining a model consumes substantial computational resources for the model owner, machine unlearning has emerged as a way to remove requested training data or a class with little impact on model performance. For large-scale complex data such as images or text, however, unlearning a class can degrade performance because the association between the class and the model is hard to identify. To address this, we propose using concepts, rather than image features or text tokens, to represent the semantic information of the class to be unlearned, which helps sever the link between the model and the class and remove its influence thoroughly. 
To analyze the influence of concepts in complex data, we use a post-hoc concept bottleneck model and integrated gradients to precisely identify concepts across different classes. We then propose unlearning methods based on data poisoning with random and targeted labels. We test our methods on image classification models and large language models (LLMs), and the results consistently show that the proposed strategies accurately erase the targeted information from models while largely preserving model performance.|\n", "2405.15652": "|**2024-05-24**|**$$\\mathbf{L^2\\cdot M = C^2}$$ Large Language Models as Covert Channels... 
a Systematic Analysis**|Simen Gaure et.al.|[2405.15652](http://arxiv.org/abs/2405.15652)|null|In recent years, large language models (LLMs) have attracted wide attention for their strong performance on tasks such as translation, prediction, and content generation. At the same time, the research community has shown that LLMs are vulnerable to attacks but can also improve the security of systems. But how well do open-source LLMs serve as covert communication channels, for example in support of censorship-resistant communication? This paper takes an experimental approach, empirically measuring the security and capacity of an open-source LLM (Llama-7B) to assess how well it performs as a covert channel. Although the results indicate that channels based on such models are unlikely to achieve high practical bitrates, which depend on message length and model entropy, we also find that the chance of an adversary detecting the covert communication is low. To make the results broadly applicable, we use a simple and intuitive scheme and assume that the model is publicly available.|\n", "2405.15646": "|**2024-05-24**|**LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots**|Ruoyu Wang 
et.al.|[2405.15646](http://arxiv.org/abs/2405.15646)|null|Developing general-purpose service robots for daily life requires robots that can appropriately execute a wide variety of basic behaviors. Recent advances in training large language models (LLMs) make it possible to generate action sequences directly from natural-language instructions without additional domain knowledge. However, although LLM outputs are semantically correct, the generated task plans may not map precisely onto admissible actions and may contain various linguistic ambiguities. LLM hallucination also poses a challenge for robot task planning, because generated content may be inconsistent with real-world facts or user input. To address this, we propose a task planning method based on constrained LLM prompting that generates an executable action sequence from a command. We also design an exception-handling module to deal with LLM hallucination, ensuring that the generated results are admissible in the current environment. We test our method on commands produced by the RoboCup@Home Command Generator, and the results show that the robot performs well in both understanding and executing tasks.|\n", 
"2405.15640": "|**2024-05-24**|**GECKO: Generative Language Model for English, Code and Korean**|Sungwoo Oh et.al.|[2405.15640](http://arxiv.org/abs/2405.15640)|null|We introduce GECKO, a bilingual large language model (LLM) designed for Korean and English, along with programming languages. It adopts the LLaMA architecture and is pretrained on a balanced, high-quality corpus of Korean and English. This report describes several of our efforts to build the data pipeline and train the model. Despite its small vocabulary, GECKO is highly efficient at generating both Korean and English tokens. We evaluate it on representative benchmarks: it performs strongly on KMMLU (Korean MMLU) and shows modest performance in English and code, even though it was trained on fewer tokens than English-focused LLMs. GECKO is available to the open-source community under a permissive license, and we hope it provides a research baseline and practical insights for Korean LLM research. The model is available at https://huggingface.co/kifai/GECKO-7B.|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430](http://arxiv.org/abs/2405.17430)|null|
Large multimodal models (LMMs) such as LLaVA perform well on visual-linguistic reasoning. These models first embed images into a large, fixed number of visual tokens and then feed them into a large language model (LLM). However, this design produces an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, which leads to considerable inefficiency. Token pruning and merging methods exist, but they produce a single-length output for each image and cannot flexibly trade off information density against efficiency. Inspired by the concept of Matryoshka dolls, we propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information at multiple coarse-to-fine granularities. Our approach offers several unique benefits for LMMs: (1) the visual granularity can be controlled explicitly for each test instance, for example by adjusting the number of tokens used to represent an image according to the complexity or simplicity of its content; (2) 
M3 provides a framework for analyzing the granularity that existing datasets require, and we find that benchmarks such as COCO need only about 9 visual tokens to reach accuracy similar to that obtained with all 576 tokens; (3) our approach provides a foundation for exploring the best trade-off between performance and visual token length, and our investigation reveals a large gap between current fixed-scale representations and the oracle upper bound.|\n", "2405.17428": "|**2024-05-27**|**NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models**|Chankyu Lee et.al.|[2405.17428](http://arxiv.org/abs/2405.17428)|null|This paper introduces NV-Embed, a model designed to substantially improve the performance of decoder-only large language models (LLMs) on text embedding tasks, including dense vector retrieval. NV-Embed combines a variety of architectural designs and training procedures to improve versatility and performance while remaining simple and reproducible. For the architecture, we propose a latent attention layer to obtain pooled embeddings, which consistently improves retrieval and downstream task accuracy compared with mean pooling or using the LLM's last 
token embedding. To improve representation learning, we remove the causal attention mask of LLMs during contrastive training, allowing richer information exchange. For training, we introduce a two-stage contrastive instruction-tuning method. The first stage applies contrastive training with instructions on retrieval datasets, using in-batch negatives and curated hard negative examples. The second stage blends various non-retrieval datasets into instruction tuning, which improves both non-retrieval task accuracy and retrieval performance. With these techniques, NV-Embed, using only publicly available data, achieves a record-high score of 69.32 and ranks first on the Massive Text Embedding Benchmark (MTEB) as of May 24, 2024, which spans 56 tasks covering retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, the model also attains the highest score, 59.36, on the 15 BEIR retrieval tasks. The NV-Embed model will be open-sourced at https://huggingface.co/nvidia/NV-Embed-v1.|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang 
et.al.|[2405.17427](http://arxiv.org/abs/2405.17427)|**[link](https://github.com/kuanchihhuang/reason3d)**|Recent advances in multimodal large language models (LLMs) have shown great potential in areas such as conceptual reasoning. However, their application to understanding 3D environments remains limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, supporting advanced tasks such as 3D reasoning segmentation, hierarchical searching, referring expressions, and question answering with detailed mask outputs. In particular, we design a hierarchical mask decoder that can locate small objects within expansive scenes. The decoder first generates a coarse location estimate covering the general area of the object and then applies a coarse-to-fine refinement strategy that significantly improves the precision of object identification and segmentation. Experiments show that Reason3D achieves strong results on the large-scale ScanNet and Matterport3D datasets for 3D referring expression, 3D question answering, and 3D reasoning segmentation tasks. Code and models are available at https://github.com/KuanchihHuang/Reason3D.|\n", "2405.17424": "|**2024-05-27**|**LARM: Large 
Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because embodied agents must interact with the real world, they need comprehensive prior knowledge, long-horizon planning capability, and fast response speed. Recent agents based on large language models (LLMs) achieve promising performance but still have limitations. For example, the output of an LLM is typically a descriptive sentence, which is ambiguous when it comes to determining a specific action. To address these limitations, we introduce the large auto-regressive model (LARM). LARM takes both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format named auto-regressive node transmission structure and assemble a corresponding dataset. With a two-phase training regimen, LARM successfully harvests enchanted equipment in Minecraft, which demands significantly more complex decision-making chains than the highest achievements of prior best methods. In addition, LARM is 6.8x faster.|\n", "2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu 
et.al.|[2405.17418](http://arxiv.org/abs/2405.17418)|null|Robot manipulation policies often perform unsatisfactorily when faced with novel tasks or object instances. The ability to automatically detect and self-correct failed actions is therefore essential for a practical robotic system. Recently, multimodal large language models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities across a variety of tasks. To use a general MLLM as an end-to-end robotic agent, we introduce Self-Corrected (SC)-MLLM, which equips the model not only to predict end-effector poses but also to autonomously recognize and correct failed actions. We first apply parameter-efficient fine-tuning to give the MLLM pose prediction ability, which is reframed as a language modeling problem. When an execution fails, the model identifies the cause of the low-level action error (for example, position and rotation errors) and actively seeks prompt feedback from experts. Based on the feedback, SC-MLLM rethinks the current failure scene and generates a corrected action. Furthermore, we design a continuous policy learning method for successfully corrected samples, 
which improves the model's adaptability to the current scene configuration and reduces the frequency of expert intervention. To evaluate SC-MLLM, we conduct extensive experiments in both simulated and real-world settings. Compared with the previous state-of-the-art robotic MLLM (ManipLLM), SC-MLLM significantly improves manipulation accuracy, from 57% to 79% on seen object categories and from 47% to 69% on unseen novel categories.|\n", "2405.17402": "|**2024-05-27**|**THREAD: Thinking Deeper with Recursive Spawning**|Philip Schroeder et.al.|[2405.17402](http://arxiv.org/abs/2405.17402)|**[link](https://github.com/philipmit/thread)**|Large language models (LLMs) have shown impressive capabilities across diverse settings but still struggle as the length and complexity of the context increase. To address this challenge, we propose Thinking Recursively and 
Dynamically (ThReaD). ThReaD frames model generation as a thread of execution that, based on the context, can either run to completion or dynamically spawn new threads. By spawning, a thread can offload work (for example, thinking or retrieving information) to child threads, which return only the tokens the parent thread needs. In effect, this lets the model adapt, as needed, the amount of intermediate work used to produce tokens. We apply ThReaD to task solving and question answering, where dynamic threading lets the model recursively decompose a given task or question into progressively simpler sub-problems that separate child threads can solve. We implement ThReaD with a few-shot learning approach and evaluate it with GPT-4 and GPT-3.5 on several benchmarks, including ALFWorld, TextCraft, and WebShop, along with two new benchmarks, DataCommons QA and MIMIC-III ICU QA. ThReaD achieves state-of-the-art performance on these benchmarks and, with smaller models such as Llama-3-8b and CodeLlama-7b, outperforms existing frameworks by 10 to 50 absolute percentage points.|\n", "2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386](http://arxiv.org/abs/2405.17386)|**[link](https://github.com/cone-mt/mindmerger)**|
Reasoning capabilities are crucial for large language models (LLMs), yet a notable gap exists between English and non-English languages. Some work fine-tunes LLMs to relearn reasoning in non-English languages, while other approaches replace non-English inputs with the output of an external model, such as English translations, to work around the difficulty LLMs have in understanding non-English text. These methods, however, often underuse the reasoning and language understanding capabilities already built into LLMs. To make better use of them, we propose MindMerger, a new method that merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. We also introduce a two-step training scheme that first embeds the external capabilities into the LLM and then trains the collaborative use of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and one language understanding dataset show that MindMerger consistently outperforms all baselines, especially on low-resource languages. Without updating the parameters of the LLM, average accuracy on the MGSM dataset improves by 6.7% across all languages and by 8.0% on low-resource languages.|\n", 
"2405.17382": "|**2024-05-27**|**ReMoDetect: Reward Models Recognize Aligned LLM's Generations**|Hyunseok Lee et.al.|[2405.17382](http://arxiv.org/abs/2405.17382)|**[link](https://github.com/hyunseoklee-ai/reward_llm_detect)**|The remarkable capabilities and easy accessibility of large language models (LLMs) have increased societal risks such as fake news generation, motivating methods that detect LLM-generated text (LGT) for safe use. However, with so many LLMs in existence, identifying the characteristics of each one is impractical. This work therefore focuses on a trait these powerful models share, namely alignment training: training LLMs to generate text that better matches human preferences. Our key finding is that, because aligned LLMs are trained to maximize human preference, the text they generate receives even higher estimated preference than human-written text, so it is easily detected with a reward model, that is, an LLM trained to model the human preference distribution. 
Building on this finding, we propose two training schemes that further improve the detection ability of the reward model: (1) continual preference fine-tuning, which makes the reward model prefer aligned LGTs even more; and (2) reward modeling of human/LLM mixed texts, that is, human-written texts rephrased by aligned LLMs, which serve as a median-preference corpus between LGTs and human-written texts and help the model learn the decision boundary better. Extensive evaluation across six text domains and twelve aligned LLMs shows that our method achieves state-of-the-art results. Code is available at https://github.com/hyunseoklee-ai/reward_llm_detect.|\n", "2405.17378": "|**2024-05-27**|**RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects**|Ahmed Allam et.al.|[2405.17378](http://arxiv.org/abs/2405.17378)|**[link](https://github.com/AUCOHL/RTL-Repo)**|Large language models have shown potential in assisting with register transfer level (Register Transfer Level, 
RTL\uff09\u8bbe\u8ba1\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5728\u53cd\u6620\u771f\u5b9e\u4e16\u754cRTL\u9879\u76ee\u590d\u6742\u6027\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u4e3a\u6b64\uff0c\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u57fa\u51c6\u2014\u2014RTL-Repo\uff0c\u4e13\u4e3a\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5927\u89c4\u6a21RTL\u8bbe\u8ba1\u9879\u76ee\u4e2d\u7684\u6027\u80fd\u800c\u8bbe\u8ba1\u3002RTL-Repo\u5305\u542b\u4e86\u4eceGitHub\u516c\u5171\u4ed3\u5e93\u63d0\u53d6\u7684\u8d85\u8fc74000\u4e2aVerilog\u4ee3\u7801\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u63d0\u4f9b\u4e86\u5bf9\u5e94\u4ed3\u5e93\u7684\u5b8c\u6574\u4e0a\u4e0b\u6587\u3002\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u3001GPT-3.5\u3001Starcoder2\u4ee5\u53ca\u50cfVeriGen\u548cRTLCoder\u8fd9\u6837\u7684Verilog\u4e13\u7528\u6a21\u578b\u5728\u5185\u7684\u591a\u6b3e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728RTL-Repo\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6bd4\u8f83\u5b83\u4eec\u5728\u751f\u6210\u590d\u6742\u9879\u76ee\u7684Verilog\u4ee3\u7801\u65b9\u9762\u7684\u8868\u73b0\u3002RTL-Repo\u4e3a\u786c\u4ef6\u8bbe\u8ba1\u793e\u533a\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5b9d\u8d35\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8bc4\u4f30\u548c\u6bd4\u8f83\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645RTL\u8bbe\u8ba1\u573a\u666f\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u9488\u5bf9\u590d\u6742\u7684\u591a\u6587\u4ef6RTL\u9879\u76ee\u4e13\u95e8\u8bad\u7ec3Verilog\u4ee3\u7801\u751f\u6210\u3002RTL-Repo\u662f\u5f00\u6e90\u7684\uff0c\u5df2\u5728GitHub\u4e0a\u516c\u5f00\u53ef\u7528\u3002|\n", "2405.17374": "|**2024-05-28**|**Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models**|ShengYun Peng et.al.|[2405.17374](http://arxiv.org/abs/2405.17374)|**[link](https://github.com/shengyun-peng/llm-landscape)**|### \u80cc\u666f 
Safety alignment is key to making the behavior of large language models (LLMs) conform to human preferences and avoid harmful behavior, yet recent work shows that fine-tuning on only a few carefully crafted training examples can easily compromise that safety. We aim to measure the risks of fine-tuning by exploring the LLM safety landscape. We discover a new phenomenon, observed universally in the parameter space of popular open-source LLMs and termed the \u201csafety basin\u201d: randomly perturbing the model weights preserves the safety of the original aligned model within its local neighborhood. 
This finding inspires us to propose VISAGE, a new safety metric that measures the safety of LLM fine-tuning by probing the model's safety landscape. Visualizing the safety landscape of the aligned model helps explain how fine-tuning compromises safety by pulling the model away from the safety basin. We also observe how important the system prompt is in protecting a model, and find that this protection even transfers to perturbed variants inside the safety basin. These observations from our safety landscape research provide new insights for future work on LLM safety.|\n", "2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414](http://arxiv.org/abs/2405.18414)|null|Retrieval Augmented Generation (RAG) has greatly improved the quality of Large Language Model (LLM) responses by grounding generation in context from existing documents. But how well does RAG work when a document's relevance to the question context is not obvious, or when the document carries only partial information? And how should the connections between documents be handled? This work sets out to answer these two core questions about RAG generation. We propose G-RAG, a reranker based on graph neural networks (GNNs) that sits between the retriever and the reader in RAG. G-RAG combines the connections between documents with semantic information (via Abstract Meaning Representation graphs) to give RAG a context-aware ranker. Experiments show that G-RAG outperforms existing state-of-the-art approaches with a smaller computational footprint. We also evaluate PaLM 2 as a reranker and find that it clearly underperforms G-RAG, which underlines the importance of reranking in RAG even when large language models are used.|\n", "2405.18386": "|**2024-05-28**|**Instruct-MusicGen: Unlocking Text-to-Music 
Editing for Music Language Models via Instruction Tuning**|Yixiao Zhang et.al.|[2405.18386](http://arxiv.org/abs/2405.18386)|**[link](https://github.com/ldzhangyx/instruct-MusicGen)**|**Recent advances in text-to-music editing rely on text queries to change the style of a piece of music or adjust its instrumental components. However, existing methods either require training a dedicated editing model from scratch, which is slow and resource-intensive, or use large language models to predict the edited music, which leads to imprecise audio reconstruction. To combine the strengths of both and address these limitations, we introduce Instruct-MusicGen, a novel approach that fine-tunes a pretrained MusicGen model to follow editing instructions efficiently, such as adding, removing, or separating stems. Our approach modifies the original MusicGen architecture by adding a text fusion module and an audio fusion module, which let the model process instruction text and audio input at the same time and generate the desired edited music. Remarkably, Instruct-MusicGen adds only 8% new parameters to the original model and, after 5000 training steps, outperforms existing baselines and performs comparably to models trained for specific tasks. This advance not only improves the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments.**|\n",
"2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380](http://arxiv.org/abs/2405.18380)|**[link](https://github.com/pixeli99/owlore)**|**The rapid development of large language models (LLMs) has revolutionized natural language processing tasks. However, training or fine-tuning such large models poses substantial challenges. In response, parameter-efficient methods such as low-rank adaptation (LoRA) have gained prominence, but they often sacrifice performance. This paper introduces Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning method inspired by the layerwise outlier distribution of LLMs, which fine-tunes by dynamically sampling pretrained layers instead of adding extra adapters. We first interpret the outlier phenomenon through Heavy-Tailed Self-Regularization (HT-SR) theory and find that layers with more outliers tend to be more heavy-tailed and to train better. OwLore therefore assigns higher sampling probabilities to layers with more outliers in order to make better use of the knowledge in the pretrained model. To further reduce the memory demands of fine-tuning, we integrate gradient low-rank projection so that each layer can be trained efficiently in a low-rank manner. By combining the benefits of low-rank training with optimal layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM pruning. Extensive experiments across architectures including LLaMa2, LLaMa3, and Mistral show that OwLore consistently outperforms baseline approaches, including full fine-tuning. For example, OwLore achieves an average accuracy gain of 1.1% on commonsense reasoning benchmarks, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. In particular, OwLore needs only 21GB of memory to fine-tune LLaMa2-7B.**|\n",
"2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah 
et.al.|[2405.18377](http://arxiv.org/abs/2405.18377)|null|The remarkable abilities of modern large language models (LLMs) in natural language processing, complex reasoning, sentiment analysis, and other tasks have driven their widespread adoption. These abilities, however, come with very high memory and compute costs, which limit the use of LLMs on most hardware platforms. To address this, we propose an effective method that fine-tunes LLaMA2-7B only once and then uses genetic algorithm-based search to find smaller network architectures with lower computational complexity. Experiments show that, for certain standard benchmark tasks, the pretrained LLaMA2-7B model is in fact unnecessarily large and complex. We achieve a 1.5x reduction in model size and a 1.3x increase in throughput while keeping accuracy almost unchanged. Our method is also more efficient and more effective than certain pruning or sparsification techniques. Finally, we show that quantization combines well with our method and further reduces the size and complexity of the networks we find. We believe this work provides a way to automatically create LLMs that can run on cheaper and more widely available hardware platforms.|\n",
"2405.18376": "|**2024-05-28**|**Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning**|Dongjie Chen et.al.|[2405.18376](http://arxiv.org/abs/2405.18376)|**[link](https://github.com/Dong-Jie-Chen/RCL)**|**Source-Free Domain Adaptation (SFDA) aims to adapt a pretrained source model using only unlabeled target-domain data. Current SFDA methods struggle to use pretrained knowledge effectively and to exploit the potential of target-domain data. Multimodal Large Language Models (MLLMs) are very good at understanding visual and textual information, but applying them to SFDA raises problems such as instruction-following failures, high computational demands, and the difficulty of measuring performance before adaptation. To alleviate these problems, we propose Reliability-based Curriculum Learning (RCL), a novel framework that integrates multiple MLLMs through pseudo-labeling to exploit their knowledge for SFDA. Our framework comprises (1) Reliable Knowledge Transfer, (2) Self-correcting, (3) MLLM-guided Knowledge Expansion, and (4) Multi-hot Masking Refinement, which work together to progressively exploit the unlabeled data in the target domain. RCL achieves state-of-the-art (SOTA) performance on multiple SFDA benchmarks, for example a notable $\\textbf{+9.4\\%}$ on DomainNet, which demonstrates its effectiveness in improving adaptability and robustness without requiring access to source data. Code is available at https://github.com/Dong-Jie-Chen/RCL.**|\n",
"2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375](http://arxiv.org/abs/2405.18375)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|Commonsense reasoning is an important component of natural language understanding, and several benchmarks have been developed to evaluate it. However, most of these benchmarks are available only in English. Creating parallel benchmarks enables cross-lingual evaluation and thereby a better understanding of different languages. This work introduces a collection of Winograd Schemas in Thai, a new dataset designed to test commonsense reasoning in Thai. Through a methodology that involves native speakers, professional translators, and rigorous validation, we ensure that the schemas accurately reflect the nuances, idioms, and cultural references of the Thai language while preserving their ambiguity and commonsense challenge. We evaluate large language models such as GPT-4 and Claude-3-Opus on this benchmark. Although these models perform well in English, their performance drops markedly in Thai, which shows that further progress is needed in multilingual commonsense reasoning.|\n",
"2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal 
et.al.|[2405.18369](http://arxiv.org/abs/2405.18369)|null|Large language models (LLMs) have brought revolutionary change to many domains and demonstrate remarkable capabilities. Central to their success is the prompt, which guides the model's output generation. However, writing prompts by hand is both time-consuming and domain-specific, which calls for automated solutions. This paper introduces PromptWizard, a novel framework that uses LLMs to iteratively synthesize and refine prompts tailored to specific tasks. Unlike existing approaches, PromptWizard optimizes both the prompt instructions and the in-context examples to maximize model performance. The framework mutates instructions and incorporates negative examples, which progressively deepens understanding and ensures diversity. With the help of a critic, PromptWizard further improves the instructions and examples and adds detailed reasoning steps to reach the best performance. PromptWizard is computationally efficient, adapts to scenarios with different amounts of training data, and remains effective with smaller LLMs. A rigorous evaluation on 35 tasks across 8 datasets shows that PromptWizard clearly outperforms existing prompting strategies, which demonstrates its efficiency and scalability in prompt optimization.|\n",
"2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361](http://arxiv.org/abs/2405.18361)|null|Autonomous driving (AD) tasks are developing rapidly, and end-to-end approaches, particularly those that apply vision-language models (VLMs), have become especially important. These models attempt to integrate strong logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, existing VLM approaches typically rely on a 2D vision tokenizer and a large language model (LLM) and fall short in handling 3D geometric information, which is essential for reliable planning. Studies indicate that a 2D-tokenized LLM cannot accurately perceive the 3D environment, which raises doubts about the reliability of VLMs in autonomous driving. 
To address this problem, we propose Atlas, a new approach that uses a DETR-style 3D perceptron as a 3D tokenizer, connected through a single-layer linear projector, and thereby takes advantage of the inherent properties of the 3D physical world. This design allows high-resolution multi-view images to be processed and modeled spatiotemporally at the same time. Despite its simplicity, Atlas performs strongly on both 3D detection and driving planning tasks on the NuScenes dataset, which shows that a 3D-tokenized LLM is key to reliable autonomous driving. We will release the code and datasets for further research.|\n", "2405.18359": "|**2024-05-28**|**Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs**|Somnath Kumar 
et.al.|[2405.18359](http://arxiv.org/abs/2405.18359)|null|Large language models (LLMs) are reshaping numerous domains worldwide, but their inclusivity and effectiveness for non-Latin scripts and low-resource languages still need improvement. This paper tackles that challenge with an approach that improves the multilingual performance of LLMs without extensive training or fine-tuning. By systematically investigating and evaluating diverse languages on popular question-answering (QA) datasets, we present a set of novel techniques that unlock the true potential of LLMs in multilingual settings. Our approach consists of three core strategies that substantially improve multilingual ability. First, carefully optimizing prompts for multilingual LLMs unlocks their latent capabilities and significantly improves performance across languages. Second, we introduce a new hybrid approach that combines LLM Retrieval Augmented Generation (RAG) with multilingual embeddings and achieves better multi-task performance. Finally, we develop a dynamic learning strategy that selects the most suitable prompt strategy, LLM, and embedding model for each query at run time, which maximizes the efficacy of LLMs across languages and outperforms the best static and random strategies. In addition, our approach supports both offline configuration tuning and online adaptation and extends seamlessly to new languages and datasets, which substantially advances multilingual understanding and generation across diverse languages.|\n",
"2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358](http://arxiv.org/abs/2405.18358)|null|Recent multimodal large language models (MLLMs) have made significant progress on tasks that combine vision and language. However, they still struggle with detailed multimodal understanding, comprehension of complex tasks, and reasoning over multimodal information. This paper introduces MMCTAgent, a novel multimodal critical thinking agent framework designed to address the inherent limitations of current MLLMs in complex visual reasoning tasks. Drawing on human cognitive processes and critical thinking, MMCTAgent iteratively analyzes multimodal information, decomposes questions, plans strategies, and reasons dynamically. 
MMCTAgent also incorporates critical thinking elements such as verification of the final answer and self-reflection. It does so through a novel approach that defines a vision-based critic and identifies task-specific evaluation criteria, which improves its decision-making. We rigorously evaluate MMCTAgent (including the variant with the critic) on several image understanding and video understanding benchmarks, and the results show that it outperforms both foundational MLLMs and other tool-augmented pipelines.|\n",
"2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335](http://arxiv.org/abs/2405.19335)|null|We introduce X-VILA, a multimodal model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To support this cross-modality alignment, we build an effective any-to-any modality instruction-following dataset. However, we find that current cross-modality alignment methods have a critical problem that causes visual information to be lost. To address it, we design a visual alignment mechanism that includes a visual embedding highway module. We also provide a resource-efficient training strategy that makes X-VILA proficient at any-to-any modality conversation, where it surpasses previous approaches by a large margin. Surprisingly, X-VILA also shows emergent properties across modalities even without similar training data. The project will be made open source.|\n",
"2405.19334": "|**2024-05-29**|**LLMs Meet Multimodal Generation and Editing: A Survey**|Yingqing He et.al.|[2405.19334](http://arxiv.org/abs/2405.19334)|**[link](https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation)**|**With the recent advances in large language models (LLMs), there is growing interest in combining them with multimodal learning. Existing surveys of multimodal large language models (MLLMs) focus mainly on understanding. This survey examines multimodal generation in detail across domains including image, video, 3D, and audio, and highlights the milestone works in these fields and their technical advances. We investigate the key technical components behind these methods and the multimodal datasets used in the related studies. We also examine tool-augmented multimodal agents that use existing generative models for human-computer interaction. Finally, we comprehensively discuss advances in AI safety and explore emerging applications and future prospects. Our work provides a systematic and in-depth overview of multimodal generation, which we expect to advance the development of Artificial Intelligence for Generative Content (AIGC) and world models. A list of all related papers is also available.**|\n",
"2405.19333": "|**2024-05-29**|**Multi-Modal Generative Embedding Model**|Feipeng Ma et.al.|[2405.19333](http://arxiv.org/abs/2405.19333)|null|Most multimodal tasks can be formulated as either generation or embedding problems. Existing models usually handle the two by decoupling the language module into a text decoder for generation and a text encoder for embedding. To explore the minimalism of multimodal approaches, this work attempts to use only one model per modality. To that end, we propose the Multi-Modal Generative Embedding Model (MM-GEM), which unifies the generative and embedding objectives in a single large language model. We also design a PoolAggregator to improve efficiency and enable fine-grained embedding and generation. Surprisingly, the two objectives do not significantly conflict with each other. For example, MM-GEM built on ViT-Large and TinyLlama performs well on benchmarks for multimodal embedding models, such as cross-modal retrieval and zero-shot classification, while also captioning images well. MM-GEM can additionally perform region-level image caption generation and retrieval seamlessly. Moreover, the advanced text model in MM-GEM improves Recall@1 for long-text and image retrieval by more than 5%.|\n",
"2405.19332": "|**2024-05-29**|**Self-Exploring Language Models: Active Preference Elicitation for Online Alignment**|Shenao Zhang et.al.|[2405.19332](http://arxiv.org/abs/2405.19332)|**[link](https://github.com/shenao-zhang/selm)**|**
\u504f\u597d\u4f18\u5316\uff0c\u7279\u522b\u662f\u5728\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u9a71\u52a8\u4e0b\uff0c\u5df2\u7ecf\u5728\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u610f\u613f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u6210\u5c31\u3002\u76f8\u8f83\u4e8e\u4f7f\u7528\u56fa\u5b9a\u6570\u636e\u96c6\u7684\u79bb\u7ebf\u5bf9\u9f50\uff0c\u901a\u8fc7\u4eba\u6216\u4eba\u5de5\u667a\u80fd\u5bf9\u6a21\u578b\u751f\u6210\u7684\u53cd\u9988\u901a\u5e38\u80fd\u591f\u901a\u8fc7\u8fed\u4ee3\u8fc7\u7a0b\u63d0\u5347\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\u548cLLMs\u7684\u4e00\u81f4\u6027\u3002\u7136\u800c\uff0c\u8981\u5b9e\u73b0\u5168\u5c40\u51c6\u786e\u7684\u5956\u52b1\u6a21\u578b\uff0c\u9700\u8981\u7cfb\u7edf\u5730\u63a2\u7d22\u751f\u6210\u5404\u79cd\u5404\u6837\u7684\u54cd\u5e94\uff0c\u4ee5\u6db5\u76d6\u81ea\u7136\u8bed\u8a00\u7684\u5e7f\u9614\u7a7a\u95f4\u3002\u4ec5\u4f9d\u8d56\u6807\u51c6\u5956\u52b1\u6700\u5927\u5316LLMs\u7684\u968f\u673a\u91c7\u6837\u662f\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e00\u9700\u6c42\u7684\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5c42\u76ee\u6807\uff0c\u4e50\u89c2\u5730\u503e\u5411\u4e8e\u53ef\u80fd\u5177\u6709\u9ad8\u5956\u52b1\u7684\u54cd\u5e94\uff0c\u4ee5\u6b64\u6765\u4e3b\u52a8\u63a2\u7d22\u5206\u5e03\u5916\u533a\u57df\u3002\u901a\u8fc7\u89e3\u51b3\u5185\u5c42\u95ee\u9898\uff0c\u5229\u7528\u91cd\u65b0\u53c2\u6570\u5316\u7684\u5956\u52b1\u51fd\u6570\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aSelf-Exploring Language 
Models\uff08SELM\uff09\u7684\u7b97\u6cd5\u3002\u5b83\u6d88\u9664\u4e86\u5bf9\u5355\u72ec\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u7684\u9700\u6c42\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u76f4\u89c2\u7684\u76ee\u6807\u5bf9LLMs\u8fdb\u884c\u8fed\u4ee3\u66f4\u65b0\u3002\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u76f8\u6bd4\uff0cSELM\u7684\u76ee\u6807\u964d\u4f4e\u4e86\u5bf9\u672a\u89c1\u8fc7\u7684\u8fc7\u5ea6\u5ef6\u4f38\u7684\u65e0\u5dee\u522b\u504f\u597d\uff0c\u63d0\u9ad8\u4e86\u63a2\u7d22\u6548\u7387\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728Zephyr-7B-SFT\u548cLlama-3-8B-Instruct\u6a21\u578b\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0cSELM\u5728MT-Bench\u548cAlpacaEval 2.0\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4ee5\u53ca\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u5404\u79cd\u6807\u51c6\u5b66\u672f\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar 
et.al.|[2405.19328](http://arxiv.org/abs/2405.19328)|null|This paper proposes an architecture called \u201cNormative Modules\u201d that targets the cooperation challenges generative agents face in social structures with existing norms. These agents use large language models to understand and evaluate their environment, but when handling complex social tasks, recognizing and adapting to the normative infrastructure becomes a key problem. The core of Normative Modules is to facilitate equilibrium selection: drawing on the idea that classification institutions implement correlated equilibria, it lets agents learn through peer interaction which of the candidate institutions in their environment is authoritative. With improved normative competence, agents can coordinate their sanctioning behavior, which in turn shapes basic behavior in the social environment and raises overall welfare. We design a new environment that supports institutions and evaluate the framework on two main criteria: whether agents can disregard non-authoritative institutions, and whether agents can identify the authoritative institution among several options. Experiments show that agents equipped with Normative Modules achieve more stable cooperation than baseline agents, which opens a new avenue for research on designing environments and agents that account for normative infrastructure.|\n",
"2405.19327": "|**2024-05-29**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327](http://arxiv.org/abs/2405.19327)|**[link](https://github.com/multimodal-art-projection/map-neo)**|Large language models (LLMs) have made remarkable progress on a wide range of tasks in recent years. However, for commercial reasons, the most advanced models such as GPT, Gemini, and Claude are locked behind proprietary interfaces, and their training details are not disclosed. Recently, several institutions have open-sourced LLMs of comparable performance, such as LLaMA-3, but most details (for example, intermediate checkpoints, pre-training corpora, and training code) remain undisclosed. To improve the transparency of LLMs, the research community has been promoting truly open models such as Pythia, Amber, and OLMo, which provide more information and support scientific study of the capabilities, limitations, biases, and risks of large models. Even so, existing open models still lag behind closed-source models of similar size on reasoning, knowledge, and coding tasks. We therefore open-source MAP-Neo, a bilingual language model with 7 billion parameters trained from scratch on 4.5 trillion high-quality tokens. MAP-Neo is the first fully open-source bilingual model whose performance is comparable to existing top-tier LLMs. We also release all details needed for reproduction, including the cleaned pre-training corpus, the data cleaning pipeline, checkpoints, and the optimized training and evaluation framework. We hope MAP-Neo will advance the open research community, inspire further innovation, and contribute to the continued improvement of LLMs.|\n",
"2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen
et.al.|[2405.19326](http://arxiv.org/abs/2405.19326)|null|This paper introduces a new task, zero-shot 3D reasoning segmentation, which aims to search for and localize parts of objects. It is a new paradigm that goes beyond the limitations of earlier category-specific 3D semantic segmentation, 3D instance segmentation, and open-vocabulary 3D segmentation. We design a simple baseline method, Reasoning3D, that can understand and execute complex commands to perform (fine-grained) part segmentation of 3D meshes, with contextual awareness and reasoned answers for interactive segmentation. Specifically, Reasoning3D uses a pre-trained 2D segmentation network driven by large language models (LLMs) to interpret user input queries in a zero-shot manner. Prior work has shown that large-scale pre-training gives foundation models prior world knowledge that lets them understand complex instructions, which allows us to \u201csegment anything\u201d in 3D even when relying on limited 3D datasets (resource-efficient). Experiments show that our method generalizes and can effectively localize and highlight parts of 3D objects (3D meshes) from implicit textual queries, including articulated 3D objects and real-world scanned data. In addition, our unsupervised method enables rapid deployment and offers a viable universal baseline for future research on 3D (semantic) object understanding in areas such as robotics, object manipulation, part assembly, autonomous driving, augmented and virtual reality (AR/VR), and medical applications. The code, model weights, deployment guide, and evaluation protocol are available at http://tianrun-chen.github.io/Reason3D/|\n",
"2405.19325": "|**2024-05-29**|**Nearest Neighbor Speculative Decoding for LLM Generation and Attribution**|Minghan Li et.al.|[2405.19325](http://arxiv.org/abs/2405.19325)|null|Large language models (LLMs) often hallucinate and lack the ability to attribute the text they generate to sources. To address these issues, semi-parametric language models such as kNN-LM refine LM output by finding the nearest neighbors of a given prompt in a non-parametric datastore. However, such models often have slow inference and produce less fluent text. This paper proposes Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that can incorporate real-world text spans of arbitrary length into the generation and attribute them to their sources. At each inference step NEST performs token-level retrieval, computes a semi-parametric mixture distribution, and identifies promising span continuations in the corpus. It then uses an approximate speculative decoding procedure that either accepts a prefix of the retrieved span or generates a new token. NEST significantly improves the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks, surpassing the conventional kNN-LM method and performing competitively with in-context retrieval augmentation. NEST also substantially improves generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.|\n",
"2405.19323": "|**2024-05-29**|**Are Large Language Models Chameleons?**|Mingmeng Geng
et.al.|[2405.19323](http://arxiv.org/abs/2405.19323)|null|Do large language models (LLMs) have their own worldviews and personality tendencies? The authors ran more than one million experiments in which LLMs answered subjective questions. Comparing the models' responses with real data from the European Social Survey (ESS) shows that prompts have a significant effect on bias and variability and reveals major cultural, age, and gender biases. The paper discusses methods for measuring the difference between LLMs and survey data, such as computing weighted means and a newly proposed measure based on Jaccard similarity. The authors stress that it is essential to analyze the robustness and variability of prompts before using LLMs to model individual decisions or collective behavior, because their imitation abilities are approximate at best.|\n",
"2405.19320": "|**2024-05-29**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320](http://arxiv.org/abs/2405.19320)|null|Reinforcement learning from human feedback (RLHF) has shown great promise in aligning large language models (LLMs) with human preferences. Both online and offline RLHF are active areas of research, and a key challenge in each is how to handle uncertainty in the reward function learned from preference data. Although the principles of optimism and pessimism are well established in standard reinforcement learning (RL), a form that is both practical and theoretically grounded is not yet available for large language models, because standard techniques for constructing confidence intervals become intractable under arbitrary policy parameterizations. This paper proposes a unified approach to online and offline RLHF, value-incentivized preference optimization (VPO). VPO regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, with the regularization indicating whether optimism or pessimism is chosen. In addition, VPO optimizes the policy directly with implicit reward modeling, so its RLHF pipeline is simpler, much like that of direct preference optimization. Theoretical guarantees are provided for VPO in both the online and offline settings, with rates matching those of standard RL. Experiments on text summarization and dialogue tasks confirm the practicality and effectiveness of VPO.|\n",
"2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen
et.al.|[2405.20340](http://arxiv.org/abs/2405.20340)|**[link](https://github.com/IDEA-Research/MotionLLM)**|This study focuses on understanding human behavior from multiple modalities (video and motion) by drawing on the capabilities of large language models (LLMs). Unlike recent LLMs designed for a single modality (video only or motion only), we argue that understanding human behavior requires joint modeling of videos and motion sequences (e.g., SMPL sequences) to capture fine-grained body-part dynamics and semantics effectively. We therefore propose MotionLLM, a simple yet effective framework for human motion understanding, captioning, and reasoning. MotionLLM adopts a unified video-motion training strategy that exploits the complementary strengths of existing coarse-grained video-text data and fine-grained motion-text data to gain rich spatio-temporal insight. We also build a large-scale dataset, MoVid, comprising diverse videos, motions, captions, and instructions, and propose MoVid-Bench, with careful manual annotations, to better evaluate human behavior understanding on video and motion. Extensive experiments demonstrate the superiority of MotionLLM in captioning, spatio-temporal understanding, and reasoning.|\n",
"2405.20339": "|**2024-05-30**|**Visual Perception by Large Language Model's Weights**|Feipeng Ma et.al.|[2405.20339](http://arxiv.org/abs/2405.20339)|**[link](https://github.com/FeipengMa6/VLoRA)**|Existing multimodal large language models (MLLMs) align visual information with the input space of the language model and then concatenate visual tokens with text tokens into a single sequence that is fed to the language model. This approach is computationally expensive because the visual tokens lengthen the input sequence. The paper therefore proposes a novel parameter-space alignment paradigm that represents visual information as model weights. For each input image, a vision encoder first extracts features, the features are converted into perceptual weights, and these weights are merged with the weights of the language model. The language model's input then needs no visual tokens, which shortens the input sequence and significantly improves efficiency. Building on this idea, the paper proposes VLoRA, which includes a perceptual weight generator. The generator is designed to convert visual features into perceptual weights with a low-rank property, similar in form to LoRA (low-rank adaptation). Experiments show that VLoRA achieves performance comparable to existing MLLMs on a range of multimodal benchmarks while significantly reducing computational cost in both training and inference. The authors commit to releasing the code and models.|\n",
"2405.20335": "|**2024-05-30**|**Xwin-LM: Strong and Scalable Alignment Practice for LLMs**|Bolin Ni et.al.|[2405.20335](http://arxiv.org/abs/2405.20335)|**[link](https://github.com/xwin-lm/xwin-lm)**|**This paper presents Xwin-LM, a comprehensive suite of alignment methods for large language models (LLMs). It covers several key techniques, including supervised fine-tuning (SFT), reward modeling (RM), rejection sampling fine-tuning (RS), and direct preference optimization (DPO). The main components are: (1) Xwin-LM-SFT, models initially fine-tuned on high-quality instruction data; (2) Xwin-Pair, a large-scale multi-turn preference dataset carefully annotated with GPT-4; (3) Xwin-RM, reward models trained at the 7B, 13B, and 70B parameter scales; (4) Xwin-Set, a multiwise preference dataset in which each prompt is paired with 64 unique responses generated by Xwin-LM-SFT and scored by Xwin-RM; (5) Xwin-LM-RS, models fine-tuned on the highest-scoring responses from Xwin-Set; and (6) Xwin-LM-DPO, models further optimized on Xwin-Set with the DPO algorithm. Our evaluations on AlpacaEval and MT-bench show consistent and significant improvements across the pipeline, demonstrating the strength and scalability of Xwin-LM. The repository at https://github.com/Xwin-LM/Xwin-LM will be updated continually to support community research.**|\n",
"2405.20319": "|**2024-05-31**|**ParSEL: Parameterized Shape Editing with Language**|Aditya Ganeshan et.al.|[2405.20319](http://arxiv.org/abs/2405.20319)|null|This paper presents ParSEL, a system for controllable editing of high-quality 3D assets through natural language. Because natural language is limited in how precisely it can specify manipulation, ParSEL takes a segmented 3D mesh and an editing request and produces a parameterized editing program. Users can adjust the program parameters to explore shape variations with fine control over the magnitude of the edit. The system uses large language models (LLMs) to interpret the initial edit instruction, but LLMs often fall short when inferring a complete editing program and can produce results that violate shape logic. To address this, we design the Analytical Edit Propagation (AEP) algorithm, which starts from a seed edit and uses geometric analysis with a computer algebra system to find analytical edit operations compatible with the range of possible user edits, producing a complete editing program. Experiments show that, compared with alternative approaches, ParSEL effectively enables controllable editing of 3D objects through natural language requests.|\n",
"2405.20318": "|**2024-05-30**|**CausalQuest: Collecting Natural Causal Questions for AI Agents**|Roberto Ceraolo et.al.|[2405.20318](http://arxiv.org/abs/2405.20318)|**[link](https://github.com/roberto-ceraolo/causal-quest)**|**Humans have an innate drive to seek out causal relationships, whether out of curiosity or for specific goals. To develop AI agents that can handle this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. However, existing datasets either contain only artificially crafted questions that do not reflect real AI usage scenarios or have limited coverage of questions from specific sources. To fill this gap, we present CausalQuest, a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants. We define causal questions and establish a finer-grained taxonomy. The dataset is carefully labeled through a combined effort of human annotators and large language models. We find that 42% of the questions humans ask are in fact causal, with the majority seeking to understand the causes behind given effects. Using this dataset, we train efficient binary classifiers (up to 2.85 billion parameters) to identify causal questions, achieving high performance with F1 scores of up to 0.877. Finally, we outline a rich set of future research directions that can build on our data and models.**|\n",
"2405.20315": "|**2024-05-30**|**ANAH: Analytical Annotation of Hallucinations in Large Language Models**|Ziwei Ji et.al.|[2405.20315](http://arxiv.org/abs/2405.20315)|**[link](https://github.com/open-compass/anah)**|**
The \u201challucination\u201d problem of large language models (LLMs) is critical to their wide application, yet fine-grained measurement of it has not been sufficiently explored in the community. We therefore present $\\textbf{ANAH}$, a bilingual dataset for analyzing LLM hallucinations in generative question answering. Each answer sentence in ANAH is rigorously annotated, covering reference fragment retrieval, judgment of the hallucination type, and correction of the hallucinated content. ANAH contains about 12,000 sentence-level annotations for about 4,300 LLM responses spanning more than 700 topics, built through a human-in-the-loop pipeline. Thanks to the fine granularity of the annotations, we can quantitatively confirm that LLM hallucinations progressively accumulate as the answer grows longer, and we use ANAH to train and evaluate hallucination annotators. Our experiments examine both generative and discriminative annotators in depth and find that, although open-source LLMs struggle with fine-grained hallucination annotation, a generative annotator trained on ANAH surpasses all open-source models, approaches the performance of GPT-3.5, and generalizes well to unseen questions.**|\n",
"2405.20313": "|**2024-05-30**|**Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation**|Guillaume Huguet
et.al.|[2405.20313](http://arxiv.org/abs/2405.20313)|null|Proteins play a key role in almost all biological processes, and their diverse functions arise from complex 3D structures that are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. Compared with previous models in the FoldFlow family, FoldFlow-2 introduces new architectural features, including a protein large language model to encode sequences, a new multimodal fusion trunk that combines structure and sequence representations, and a geometric transformer-based decoder. To increase the diversity and novelty of generated samples, which is crucial for designing new drugs, we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than the PDB datasets used in prior work and that contains both known PDB proteins and high-quality synthetic structures obtained through filtering. We further show how FoldFlow-2 can be adapted to arbitrary rewards, such as increasing secondary-structure diversity, by introducing a Reinforced Finetuning (ReFT) objective. Experiments show that FoldFlow-2 surpasses the previous state of the art among protein structure-based generative models, improving over RFDiffusion in unconditional generation in terms of designability, diversity, and novelty across protein lengths, and generalizing well to the task of equilibrium conformation sampling. Finally, we show that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for VHH nanobodies.|\n",
"2405.20309": "|**2024-05-30**|**Large Language Models Can Self-Improve At Web Agent Tasks**|Ajay Patel
et.al.|[2405.20309](http://arxiv.org/abs/2405.20309)|**[link](https://github.com/AjayP13/webdreamer)**|\u5728\u590d\u6742\u7684\u73af\u5883\u4e2d\uff0c\u5982\u7f51\u7edc\u6d4f\u89c8\u5668\uff0c\u8bad\u7ec3\u6a21\u578b\u4f5c\u4e3a\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u6267\u884c\u52a8\u4f5c\u7684\u4ee3\u7406\u901a\u5e38\u5177\u6709\u6311\u6218\u6027\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u7f3a\u4e4f\u8bad\u7ec3\u6570\u636e\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u4ee5\u96f6\u6837\u672c\u6216\u5c11\u91cf\u6837\u672c\u6765\u5728\u65b0\u73af\u5883\u4e2d\u5bfc\u822a\u7684\u80fd\u529b\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0cLLMs\u53ef\u4ee5\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\uff08\u5373\u5728\u5176\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u4e0a\u5fae\u8c03\uff09\u6765\u8d85\u8d8a\u57fa\u7840\u6027\u80fd\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76LLMs\u5728\u957f\u65f6\u5e8f\u4efb\u52a1\u7684\u590d\u6742\u73af\u5883\u2014\u2014WebArena\u57fa\u51c6\u4e2d\uff0c\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u80fd\u5426\u63d0\u5347\u5176\u8868\u73b0\u3002WebArena\u8981\u6c42\u4ee3\u7406\u81ea\u4e3b\u6d4f\u89c8\u7f51\u9875\u5e76\u6267\u884c\u64cd\u4f5c\u4ee5\u8fbe\u6210\u7279\u5b9a\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u5408\u6210\u8bad\u7ec3\u6570\u636e\u6df7\u5408\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u7ecf\u8fc7\u81ea\u6211\u6539\u8fdb\u540e\uff0c\u6a21\u578b\u5728WebArena\u57fa\u51c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u63d0\u9ad8\u4e8631%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u7528\u4e8e\u66f4\u5168\u9762\u5730\u8bc4\u4f30\u6211\u4eec\u7684\u5fae\u8c03\u4ee3\u7406\u6a21\u578b\u7684\u884c\u4e3a\u6027\u80fd\u3001\u9c81\u68d2\u6027\u3001\u80fd\u529b\u4ee5\u53ca\u8f68\u8ff9\u8d28\u91cf\uff0c\u8fd9\u4e9b\u6307\u6807\u8d85\u8d8a\u4e86\u5f53\u524d\u4ec5\u4f9d\u8d
56\u4e8e\u6574\u4f53\u57fa\u51c6\u5206\u6570\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002|\n", "2405.20304": "|**2024-05-30**|**Group Robust Preference Optimization in Reward-free RLHF**|Shyam Sundhar Ramesh et.al.|[2405.20304](http://arxiv.org/abs/2405.20304)|**[link](https://github.com/rsshyam/Group-robust-preference-optimization)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u9002\u5e94\u65f6\uff0c\u901a\u5e38\u9700\u8981\u901a\u8fc7\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u548c\u591a\u5143\u6807\u7b7e\u8005\u7fa4\u4f53\uff08\u5982\u4e0d\u540c\u6027\u522b\u3001\u79cd\u65cf\u3001\u516c\u53f8\u56e2\u961f\u7b49\uff09\u7684\u504f\u597d\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u4f20\u7edf\u65b9\u6cd5\u503e\u5411\u4e8e\u91c7\u7528\u201c\u4e00\u5200\u5207\u201d\u7684\u7b56\u7565\uff0c\u5373\u5047\u8bbe\u5e76\u4f18\u5316\u5355\u4e00\u7684\u504f\u597d\u6a21\u578b\uff0c\u5bf9\u5404\u7fa4\u4f53\u7684\u72ec\u7279\u7279\u6027\u548c\u9700\u6c42\u4e0d\u591f\u654f\u611f\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7fa4\u4f53\u9c81\u68d2\u504f\u597d\u4f18\u5316\uff08GRPO\uff09\u65b9\u6cd5\uff0c\u65e8\u5728\u7a33\u5065\u5730\u4f7fLLMs\u9002\u5e94\u5404\u4e2a\u7fa4\u4f53\u7684\u504f\u597d\u3002GRPO\u65b9\u6cd5\u57fa\u4e8e\u65e0\u5956\u52b1\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff0c\u4f46\u533a\u522b\u4e8e\u4ee5\u5f80\uff0c\u5b83\u76ee\u6807\u662f\u5bfb\u627e\u4e00\u4e2a\u80fd\u6700\u5927\u5316\u6700\u5dee\u7fa4\u4f53\u6027\u80fd\u7684\u9c81\u68d2\u7b56\u7565\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0cGRPO\u4f1a\u52a8\u6001\u4e14\u9010\u6b21\u8c03\u6574\u4e0d\u540c\u7fa4\u4f53\u7684\u6743\u91cd\uff0c\u4f18\u5148\u5173\u6ce8\u7d2f\u79ef\u635f\u5931\u8f83\u9ad8\u7684\u7fa4\u4f53\u3002\u6211\u4eec\u5728\u7406\u8bba\u4e0a\u63a2\u8ba8\u4e86GRPO\u7684\u53ef\u884c\u6027\uff0c\u5e76\u5206\u6790\u4e86\u5176
\u5728\u5bf9\u6570\u7ebf\u6027\u7b56\u7565\u7c7b\u522b\u4e0b\u7684\u6536\u655b\u6027\u3002\u901a\u8fc7\u4f7f\u7528\u6765\u81ea\u4e0d\u540c\u7fa4\u4f53\u7684\u5168\u5c40\u610f\u89c1\u6570\u636e\u5bf9LLMs\u8fdb\u884cGRPO\u5fae\u8c03\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u6700\u5dee\u7fa4\u4f53\u7684\u8868\u73b0\uff0c\u51cf\u5c11\u4e86\u7fa4\u4f53\u95f4\u635f\u5931\u7684\u4e0d\u5e73\u8861\uff0c\u540c\u65f6\u63d0\u9ad8\u4e86\u6982\u7387\u51c6\u786e\u6027\uff0c\u76f8\u8f83\u4e8e\u975e\u9c81\u68d2\u57fa\u7ebf\uff0c\u8fd9\u4e9b\u6539\u8fdb\u6548\u679c\u663e\u8457\u3002**|\n", "2405.20285": "|**2024-05-30**|**Who Writes the Review, Human or AI?**|Panagiotis C. Theocharopoulos et.al.|[2405.20285](http://arxiv.org/abs/2405.20285)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u4eba\u4eec\u5173\u6ce8\u5982\u4f55\u8bc6\u522b\u4e0d\u540c\u9886\u57df\u7684AI\u751f\u6210\u6587\u672c\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u901a\u8fc7\u63d0\u51fa\u4e00\u79cd\u65b9\u6cd5\u6765\u51c6\u786e\u533a\u5206\u4eba\u5de5\u667a\u80fd\u751f\u6210\u7684\u548c\u4eba\u7c7b\u64b0\u5199\u7684\u4e66\u8bc4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u8fc1\u79fb\u5b66\u4e60\uff0c\u8ba9\u6a21\u578b\u80fd\u591f\u5728\u4e0d\u540c\u4e3b\u9898\u95f4\u8bc6\u522b\u751f\u6210\u6587\u672c\uff0c\u540c\u65f6\u63d0\u9ad8\u5176\u8bc6\u522b\u5199\u4f5c\u98ce\u683c\u548c\u8bcd\u6c47\u53d8\u5316\u7684\u80fd\u529b\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u96c6\uff0c\u5305\u542b\u771f\u5b9e\u7684\u4e66\u8bc4\u548c\u4f7f\u7528Vicuna\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u6a21\u62df\u8bc4\u8bba\uff0c\u4ee5\u8bc4\u4f30\u6240\u63d0\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bc6\u522b\u6587\u672c\u539f\u521b\u6765\u6e90\u662f\u53ef\u884c\u7684\uff0c\u51c6\u786e\u7387\u8fbe\u523096.86%\u3002\u6211\u4eec\u7684\u5de5\u4f5
c\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u672c\u8bc6\u522b\u65b9\u9762\u7684\u6027\u80fd\u4e0e\u5c40\u9650\u6027\u7814\u7a76\uff0c\u8fd9\u5bf9\u4e8e\u672a\u6765\u6709\u6548\u7ba1\u7406\u6b64\u7c7b\u6a21\u578b\u4ee5\u53ca\u786e\u4fdd\u4eba\u7c7b\u521b\u4f5c\u5185\u5bb9\u7684\u5b8c\u6574\u6027\u548c\u771f\u5b9e\u6027\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2405.21075": "|**2024-05-31**|**Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis**|Chaoyou Fu et.al.|[2405.21075](http://arxiv.org/abs/2405.21075)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u8ffd\u6c42\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u6210\u4e3a\u8fd1\u671f\u8fdb\u6b65\u7684\u6838\u5fc3\u3002\u7136\u800c\uff0c\u5bf9\u5b83\u4eec\u5904\u7406\u5e8f\u5217\u89c6\u89c9\u6570\u636e\u7684\u80fd\u529b\u7684\u5173\u6ce8\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51faVideo-MME\uff0c\u8fd9\u662f\u9996\u4e2a\u5168\u9762\u8bc4\u4f30MLLMs\u5728\u89c6\u9891\u5206\u6790\u6027\u80fd\u7684\u591a\u6a21\u6001\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u6709\u56db\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u89c6\u9891\u7c7b\u578b\u591a\u6837\uff0c\u6db5\u76d66\u4e2a\u4e3b\u8981\u89c6\u89c9\u9886\u57df\u548c30\u4e2a\u5b50\u9886\u57df\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u5e94\u7528\u573a\u666f\u6cdb\u5316\u80fd\u529b\uff1b2\uff09\u65f6\u95f4\u7ef4\u5ea6\u7684\u8de8\u5ea6\uff0c\u5305\u62ec\u77ed\u3001\u4e2d\u3001\u957f\u671f\u89c6\u9891\uff0c\u4ece11\u79d2\u52301\u5c0f\u65f6\uff0c\u4ee5\u68c0\u9a8c\u6a21\u578b\u5bf9\u590d\u6742\u60c5\u5883\u52a8\u6001\u7684\u9002\u5e94\u6027\uff1b3\uff09\u6570\u636e\u6a21\u6001\u7684\u5e7f\u5ea6\uff0c\u7ed3\u5408\u89c6\u9891\u5e27\u4ee5\u5916\u7684\u591a\u79cd\u8f93\u5165\uff0c\u5982\u5b57\u5e55\u548c\u97f3\u9891\uff0c\u63ed\u793aMLLMs\u7684\u5168\u65b9\u4f4d\u80fd\u529b\uff1b4\uff09\u9ad8\u8d28\u91cf\u76
84\u6807\u6ce8\uff0c\u7531\u4e13\u5bb6\u4e25\u683c\u624b\u52a8\u6807\u8bb0\uff0c\u4ee5\u4fdd\u8bc1\u7cbe\u786e\u4e14\u53ef\u9760\u7684\u6a21\u578b\u8bc4\u4f30\u3002\u6211\u4eec\u7cbe\u5fc3\u6311\u9009\u5e76\u624b\u52a8\u6ce8\u89e3\u4e86900\u6bb5\u89c6\u9891\uff0c\u603b\u65f6\u957f\u8fbe\u5230256\u5c0f\u65f6\uff0c\u751f\u6210\u4e862,700\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u901a\u8fc7Video-MME\uff0c\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u7cfb\u5217\u3001Gemini 1.5 Pro\u5728\u5185\u7684\u591a\u4e2a\u6700\u5148\u8fdb\u7684MLLM\uff0c\u4ee5\u53ca\u5f00\u6e90\u56fe\u50cf\u6a21\u578bInternVL-Chat-V1.5\u548c\u89c6\u9891\u6a21\u578bLLaVA-NeXT-Video\u8fdb\u884c\u4e86\u6df1\u5165\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cGemini 1.5 Pro\u662f\u8868\u73b0\u6700\u4f73\u7684\u5546\u4e1a\u6a21\u578b\uff0c\u660e\u663e\u4f18\u4e8e\u5f00\u6e90\u6a21\u578b\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u53d1\u73b0\u5f3a\u8c03\u4e86\u6539\u8fdb\u5904\u7406\u66f4\u957f\u5e8f\u5217\u548c\u591a\u6a21\u6001\u6570\u636e\u7684\u5fc5\u8981\u6027\u3002\u9879\u76ee\u7f51\u9875\u94fe\u63a5\uff1ahttps://video-mme.github.io|\n", "2405.21047": "|**2024-05-31**|**Grammar-Aligned Decoding**|Kanghee Park 
et.al.|[2405.21047](http://arxiv.org/abs/2405.21047)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8f93\u51fa\u65f6\u9762\u4e34\u6311\u6218\uff0c\u5982\u7a0b\u5e8f\u4ee3\u7801\u3001\u6570\u5b66\u516c\u5f0f\u6216\u89c4\u8303\u7684\u6807\u8bb0\u3002\u7ea6\u675f\u89e3\u7801\u65b9\u6cd5\u901a\u8fc7\u9650\u5236\u6bcf\u6b21\u8f93\u51fa\u53ef\u80fd\u7684\u4ee4\u724c\uff0c\u786e\u4fdd\u8f93\u51fa\u7b26\u5408\u7279\u5b9a\u89c4\u5219\u6765\u7f13\u89e3\u8fd9\u4e2a\u95ee\u9898\uff0c\u4f8b\u5982\u5728\u8bed\u6cd5\u7ea6\u675f\u89e3\u7801\uff08GCD\uff09\u4e2d\uff0cLLM\u7684\u8f93\u51fa\u5fc5\u987b\u9075\u5faa\u7ed9\u5b9a\u7684\u8bed\u6cd5\u89c4\u5219\u3002\u7136\u800c\uff0c\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u79cd\u7ea6\u675f\u89e3\u7801\u53ef\u80fd\u4f1a\u626d\u66f2\u6a21\u578b\u7684\u5206\u5e03\uff0c\u5bfc\u81f4\u751f\u6210\u7684\u8f93\u51fa\u867d\u7136\u8bed\u6cd5\u6b63\u786e\uff0c\u4f46\u5176\u6982\u7387\u5e76\u4e0d\u76f4\u63a5\u53cd\u6620LLM\u672c\u8eab\u7684\u6982\u7387\u5206\u914d\uff0c\u4ece\u800c\u8d28\u91cf\u4e0d\u9ad8\u3002\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u4e0e\u8bed\u6cd5\u7ea6\u675f\u5bf9\u9f50\u7684\u89e3\u7801\u201d\uff08Grammar-Aligned Decoding\uff0cGAD\uff09\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u81ea\u9002\u5e94\u91c7\u6837\u4e0e\u8fd1\u4f3c\u671f\u671b\u672a\u6765\u201d\uff08Adaptive Sampling with Approximate Expected Futures\uff0cASAp\uff09\u7684\u89e3\u7801\u7b97\u6cd5\u3002 
ASAp\u7b97\u6cd5\u65e8\u5728\u4fdd\u8bc1\u8f93\u51fa\u7684\u8bed\u6cd5\u6027\uff0c\u5e76\u7406\u8bba\u4e0a\u4ea7\u751f\u4e0eLLM\u5728\u7ed9\u5b9a\u8bed\u6cd5\u7ea6\u675f\u6761\u4ef6\u4e0b\u7684\u6761\u4ef6\u6982\u7387\u76f8\u7b26\u7684\u7ed3\u679c\u3002\u8be5\u7b97\u6cd5\u5229\u7528\u5148\u524d\u7684\u6837\u672c\u8f93\u51fa\u6765\u7a33\u5065\u5730\u4f30\u7b97\u4e0d\u540c\u8f93\u51fa\u524d\u7f00\u7684\u672a\u6765\u8bed\u6cd5\u53ef\u80fd\u6027\u3002\u6211\u4eec\u5728\u4ee3\u7801\u751f\u6210\u548c\u7ed3\u6784\u5316\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cASAp\u7ecf\u5e38\u80fd\u591f\u751f\u6210\u6bd4\u73b0\u6709GCD\u6280\u672f\u66f4\u7b26\u5408LLM\u5206\u5e03\u4e14\u4ecd\u9075\u5b88\u6240\u9700\u8bed\u6cd5\u9650\u5236\u7684\u8f93\u51fa\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u6574\u4f53\u8d28\u91cf\u3002|\n", "2405.21040": "|**2024-05-31**|**Direct Alignment of Language Models via Quality-Aware Self-Refinement**|Runsheng Yu et.al.|[2405.21040](http://arxiv.org/abs/2405.21040)|null|\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u884c\u4e3a\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u7684\u5e38\u7528\u65b9\u6cd5\u3002\u6700\u8fd1\uff0c\u76f4\u63a5\u7b56\u7565\u4f18\u5316\uff08DPO\uff09\u4f5c\u4e3a\u4e00\u79cd\u66ff\u4ee3\u65b9\u6848\u5174\u8d77\uff0c\u5b83\u4e0d\u518d\u4f9d\u8d56LLM\u5956\u52b1\u6a21\u578b\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u989d\u5916\u7684\u5185\u5b58\u548c\u8bad\u7ec3\u65f6\u95f4\u3002\u7136\u800c\uff0cDPO\u5ffd\u89c6\u4e86\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u76f8\u5bf9\u8d28\u91cf\uff0c\u53ef\u80fd\u5bfc\u81f4\u8bad\u7ec3\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u5229\u7528LLM\u5185\u90e8\u77e5\u8bc6\u5728\u5373\u65f6\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5e76\u4f18\u5316\u635f\u5931\u51fd\u6570
\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7ec6\u5316\u51fd\u6570\uff0c\u5229\u7528LLM\u7684\u77e5\u8bc6\u6765\u4f30\u8ba1\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u54c1\u8d28\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u8f7b\u5ea6\u5047\u8bbe\u4e0b\uff0c\u6784\u5efa\u7684\u7ec6\u5316\u51fd\u6570\u80fd\u591f\u5e2e\u52a9\u81ea\u6211\u8c03\u6574\u635f\u5931\u51fd\u6570\u3002\u6211\u4eec\u5c06\u8fd9\u4e2a\u7ec6\u5316\u529f\u80fd\u6574\u5408\u5230DPO\u53ca\u5176\u53d8\u4f53\u8eab\u4efd\u7b56\u7565\u4f18\u5316\uff08IPO\uff09\u4e2d\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0c\u8fd9\u4e9b\u6539\u8fdb\u540e\u7684\u6a21\u578b\u5728\u5404\u79cd\u8bc4\u4f30\u8005\u4e0a\u8868\u73b0\u51fa\u4f18\u4e8eDPO\u548cIPO\u7684\u6027\u80fd\u3002|\n", "2405.21030": "|**2024-05-31**|**Standards for Belief Representations in LLMs**|Daniel A. Herrmann et.al.|[2405.21030](http://arxiv.org/abs/2405.21030)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u975e\u51e1\u80fd\u529b\uff0c\u8ba1\u7b97\u673a\u79d1\u5b66\u5bb6\u4eec\u6b63\u5728\u5bfb\u6c42\u7406\u89e3\u5b83\u4eec\u7684\u8ba4\u77e5\u8fc7\u7a0b\uff0c\u7279\u522b\u662f\u5173\u4e8eLLMs\u5982\u4f55\uff08\u5982\u679c\u6709\u7684\u8bdd\uff09\u5185\u90e8\u6784\u5efa\u5bf9\u4e16\u754c\u7684\u4fe1\u5ff5\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u7f3a\u4e4f\u4e00\u4e2a\u7edf\u4e00\u7684\u7406\u8bba\u6846\u67b6\u6765\u652f\u6491\u5bf9LLM\u4e2d\u4fe1\u5ff5\u7684\u7814\u7a76\u3002\u672c\u6587\u8bd5\u56fe\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86\u4e00\u5957\u6761\u4ef6\uff0c\u4f7fLLM\u4e2d\u7684\u8868\u793a\u80fd\u591f\u88ab\u89c6\u4e3a\u4fe1\u5ff5\u4f3c\u7684\u3002\u6211\u4eec\u6307\u51fa\uff0c\u5c3d\u7ba1\u5728LLMs\u4e2d\u6d4b\u91cf\u4fe1\u5ff5\u7684\u9879\u76ee\u4e0e\u51b3\u7b56\u7406\u8bba\u548c\u5f62\u5f0f\u8ba4\u8bc6\u8bba\u4e2d\u7684\u4fe1\u5ff5\u6d4b\u91cf\u5728\u8bb8\u591a\u65b9\u9762\u6709\u76f8\u4f3c\u4e4b\u5904\uff0c\u4f46\u4e5f\u5b58\u5728\u5dee\u5f
02\uff0c\u8fd9\u4e9b\u5dee\u5f02\u5e94\u5f71\u54cd\u6211\u4eec\u7684\u6d4b\u91cf\u65b9\u6cd5\u3002\u56e0\u6b64\uff0c\u501f\u9274\u54f2\u5b66\u6d1e\u5bdf\u548c\u673a\u5668\u5b66\u4e60\u7684\u5f53\u4ee3\u5b9e\u8df5\uff0c\u6211\u4eec\u786e\u7acb\u4e86\u56db\u4e2a\u6807\u51c6\uff1a\u51c6\u786e\u6027\u3001\u4e00\u81f4\u6027\u3001\u7edf\u4e00\u6027\u548c\u5b9e\u7528\u6027\u3002\u8fd9\u56db\u4e2a\u6807\u51c6\u7ed3\u5408\u4e86\u7406\u8bba\u8003\u91cf\u4e0e\u5b9e\u9645\u9650\u5236\uff0c\u4e3a\u5168\u9762\u7406\u89e3LLM\u4e2d\u7684\u4fe1\u5ff5\u8868\u793a\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u6211\u4eec\u5f15\u7528\u5b9e\u8bc1\u5de5\u4f5c\u7684\u6210\u679c\uff0c\u63ed\u793a\u4e86\u5355\u72ec\u4f7f\u7528\u67d0\u4e9b\u6807\u51c6\u65f6\u8bc6\u522b\u4fe1\u5ff5\u8868\u793a\u7684\u5c40\u9650\u6027\u3002|\n", "2405.21028": "|**2024-05-31**|**LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models**|Elias Stengel-Eskin et.al.|[2405.21028](http://arxiv.org/abs/2405.21028)|**[link](https://github.com/esteng/pragmatic_calibration)**|**\u5f53\u56de\u7b54\u95ee\u9898\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u4e0d\u4ec5\u80fd\u63d0\u4f9b\u7b54\u6848\uff0c\u8fd8\u80fd\u4f20\u8fbe\u5bf9\u7b54\u6848\u6b63\u786e\u6027\u7684\u4fe1\u5fc3\u7a0b\u5ea6\u3002\u8fd9\u5305\u62ec\u660e\u786e\u7684\u5206\u6570\u6807\u8bb0\uff0c\u5982\u7ed9\u51fa\u6570\u5b57\uff0c\u4ee5\u53ca\u9690\u542b\u7684\u4fe1\u5fc3\u6807\u5fd7\uff0c\u5982\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u989d\u5916\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u5f53\u524d\u5927\u591a\u6570\u6a21\u578b\u5f80\u5f80\u8fc7\u4e8e\u81ea\u4fe1\u3002\u4e3a\u4e86\u6821\u51c6\u8fd9\u4e9b\u4fe1\u5fc3\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9e\u7528\u7684\u3001\u8003\u8651\u542c\u4f17\u7684\u5fae\u8c03\u65b9\u6cd5\uff08LACIE\uff09\uff0c\u5b83\u4e0d\u4ec5\u5173\u6ce8\u7b54\u6848\u662f\u5426\u6b63\u786e\uff0c\u8fd8\u5173\u6ce8\u7b54\u6848\u662f\u5426\u4f1a\u88ab\u542c\u4f17\u63a5\u53d7\u3002\u6211\u4eec\u5c06\u6821\u51c6\
u89c6\u4e3a\u504f\u597d\u4f18\u5316\uff0c\u901a\u8fc7\u53cc\u4ee3\u7406\u6e38\u620f\u521b\u5efa\u6570\u636e\uff0c\u8ba9\u4e00\u4e2a\u6f14\u8bb2\u8005\u6a21\u578b\u7684\u8f93\u51fa\u63a5\u53d7\u6a21\u62df\u542c\u8005\u7684\u8bc4\u5224\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528LACIE\u5bf9\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff08Mistral-7B\u3001Llama3-8B\u548cLlama3-70B\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u663e\u793a\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6a21\u62df\u542c\u8005\u9762\u524d\u6709\u66f4\u597d\u7684\u6821\u51c6\u3002\u91cd\u8981\u7684\u662f\uff0c\u8fd9\u4e9b\u8d8b\u52bf\u4e5f\u9002\u7528\u4e8e\u4eba\u7c7b\u542c\u4f17\uff0c\u5e2e\u52a9\u4ed6\u4eec\u66f4\u51c6\u786e\u5730\u9884\u6d4b\u6a21\u578b\u7684\u6b63\u786e\u6027\uff1a\u6211\u4eec\u5728\u4eba\u673a\u8bc4\u4f30\u4e2d\u53d1\u73b0\uff0c\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u63a5\u53d7\u7684\u9519\u8bef\u7b54\u6848\u51cf\u5c11\u4e8647%\uff0c\u800c\u6b63\u786e\u7b54\u6848\u7684\u63a5\u53d7\u7387\u4fdd\u6301\u4e0d\u53d8\u3002\u6b64\u5916\uff0cLACIE\u6cdb\u5316\u5230\u53e6\u4e00\u4e2a\u6570\u636e\u96c6\u4e0a\uff0c\u5728\u4f7f\u7528TriviaQA\u8bad\u7ec3\u540e\uff0cTruthfulQA\u4e0a\u7684\u771f\u5b9e\u6027\u5927\u5e45\u63d0\u9ad8\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cLACIE\u5bfc\u81f4\u4e86\u6b63\u786e\u548c\u9519\u8bef\u793a\u4f8b\u4e4b\u95f4\u7684\u4fe1\u5fc3\u5ea6\u66f4\u597d\u5730\u5206\u79bb\u3002\u5b9a\u6027\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u4f1a\u66f4\u52a0\u8c28\u614e\uff0c\u5e76\u5728\u56de\u7b54\u6b63\u786e\u65f6\u901a\u8fc7\u4f7f\u7528\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u7ec6\u8282\u6765\u9690\u6027\u5730\u8868\u793a\u786e\u5b9a\u6027\u3002\u6700\u540e\uff0cLACIE\u5fae\u8c03\u5bfc\u81f4\u6a21\u578b\u5bf9\u4e8e\u53ef\u80fd\u9519\u8bef\u7684\u7b54\u6848\u66f4\u503e\u5411\u4e8e\u653e\u5f03\uff08\u4f8b\u5982\u8bf4\u201c\u6211\u4e0d\u77e5\u9053\u201d\uff09\u3002**|\n", "2405.21018": 
"|**2024-05-31**|**Improved Techniques for Optimization-Based Jailbreaking on Large Language Models**|Xiaojun Jia et.al.|[2405.21018](http://arxiv.org/abs/2405.21018)|**[link](https://github.com/jiaxiaojunqaq/i-gcg)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5176\u5b89\u5168\u6821\u51c6\u6210\u4e3a\u5e7f\u6cdb\u5e94\u7528\u7684\u5173\u952e\u3002\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u7834\u89e3\uff08\u5373\u201cjailbreaking\u201d\uff09\u6d3b\u52a8\u65e5\u76ca\u589e\u591a\uff0c\u5176\u4e2d\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u653b\u51fb\u56e0\u5176\u6210\u6548\u663e\u8457\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0cGCG\u7684\u653b\u51fb\u6548\u7387\u4ecd\u6709\u63d0\u5347\u7a7a\u95f4\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6539\u8fdb\u7684\u4f18\u5316\u57fa\u7ebf\u7834\u89e3\u6280\u672f\uff0c\u4ee5\u63d0\u5347GCG\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u5355\u4e2a\u76ee\u6807\u6a21\u677f\u201cSure\u201d\u6781\u5927\u5730\u9650\u5236\u4e86GCG\u7684\u653b\u51fb\u6548\u679c\uff0c\u56e0\u6b64\u6211\u4eec\u5efa\u8bae\u91c7\u7528\u5305\u542b\u6709\u5bb3\u81ea\u6211\u6697\u793a\u548c/\u6216\u6307\u5bfc\u7684\u591a\u6837\u5316\u76ee\u6807\u6a21\u677f\uff0c\u4ee5\u8bef\u5bfc\u6a21\u578b\u3002\u5728\u4f18\u5316\u7b56\u7565\u4e0a\uff0c\u6211\u4eec\u5efa\u8bae\u5728GCG\u4e2d\u5b9e\u65bd\u81ea\u52a8\u591a\u5750\u6807\u66f4\u65b0\uff0c\u4ee5\u52a0\u901f\u6536\u655b\uff0c\u5e76\u5f15\u5165\u4ece\u7b80\u5355\u5230\u590d\u6742\uff08easy-to-hard\uff09\u7684\u521d\u59cb\u5316\u6280\u5de7\u3002\u5c06\u8fd9\u4e9b\u6539\u8fdb\u6574\u5408\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u9ad8\u6548\u7684\u65b9\u6cd5\u2014\u2014$\\mathcal{I}$-GCG\u3002\u5b9e\u9a8c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5982NeurIPS 2023 
\u7ea2\u961f\u6311\u6218\u4e2d\u8fdb\u884c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6539\u8fdb\u6280\u672f\u80fd\u591f\u5e2e\u52a9GCG\u8d85\u8d8a\u73b0\u6709\u7834\u89e3\u653b\u51fb\uff0c\u5b9e\u73b0\u63a5\u8fd1100%\u7684\u653b\u51fb\u6210\u529f\u7387\u3002\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/jiaxiaojunQAQ/I-GCG\u3002**|\n", "2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985](http://arxiv.org/abs/2405.20985)|**[link](https://github.com/yaolinli/deco)**|\u8be5\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u6295\u5f71\u5668\u6a21\u5757\uff0c\u56e0\u4e3a\u5b83\u4eec\u5728\u8fde\u63a5\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u6001\u3001\u4fc3\u8fdb\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u9762\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u5bf9\u4e8e\u6295\u5f71\u5668\u5728\u89c6\u89c9-\u8bed\u8a00\u5bf9\u9f50\u65b9\u9762\u7684\u6548\u679c\u8bc4\u4f30\u4ecd\u663e\u4e0d\u8db3\uff0c\u901a\u5e38\u53ea\u80fd\u901a\u8fc7\u4e0b\u6e38\u4efb\u52a1\u7684\u6027\u80fd\u95f4\u63a5\u63a8\u65ad\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u901a\u8fc7\u5206\u6790MLLM\u4e2d\u7684\u89c6\u89c9-\u8bed\u8a00\u8bed\u4e49\u6d41\uff0c\u6765\u89e3\u8bfb\u6295\u5f71\u5668\u7684\u5de5\u4f5c\u673a\u5236\u3002 
\u5177\u4f53\u6765\u8bf4\uff0c\u7814\u7a76\u8005\u8ffd\u8e2a\u4ece\u751f\u6210\u7684\u8bed\u8a00\u6807\u8bb0\u5230\u539f\u59cb\u89c6\u89c9\u7f16\u7801\u5757\u4ee5\u53ca\u6295\u5f71\u5668\u4ea7\u751f\u7684\u4e2d\u95f4\u8f93\u51fa\u4e4b\u95f4\u7684\u8bed\u4e49\u76f8\u5173\u6027\u6d41\u3002\u53d1\u73b0\u538b\u7f29\u578b\u6295\u5f71\u5668\uff08\u5982QFormer\uff09\u503e\u5411\u4e8e\u5c06\u89c6\u89c9\u5757\u62bd\u8c61\u6210\u6709\u9650\u7684\u51e0\u4e2a\u6982\u5ff5\uff0c\u5982\u7269\u4f53\u6216\u5c5e\u6027\uff0c\u5bfc\u81f4\u201c\u53cc\u91cd\u62bd\u8c61\u201d\u73b0\u8c61\uff1a\u9996\u5148\uff0c\u6295\u5f71\u5668\u53c2\u7167\u9884\u5b9a\u4e49\u67e5\u8be2\u4ee4\u724c\u8fdb\u884c\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\uff0c\u7136\u540e\uff0c\u57fa\u4e8e\u6587\u672c\u6307\u4ee4\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u4e00\u6b65\u63d0\u53d6\u3002\u8fd9\u79cd\u53cc\u91cd\u62bd\u8c61\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6548\u7387\u4e0d\u9ad8\uff0c\u5e76\u53ef\u80fd\u5bfc\u81f4\u89c6\u89c9\u8bed\u4e49\u4fe1\u606f\u7684\u7d2f\u79ef\u7f3a\u5931\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u7814\u7a76\u63d0\u51fa\u201c\u89e3\u8026\u538b\u7f29\u4e0e\u62bd\u8c61\uff08DeCo\uff09\u201d\u7684\u5173\u952e\u6d1e\u5bdf\uff0c\u5373\u5728\u6295\u5f71\u5c42\u9762\u4e0a\u5c06\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u538b\u7f29\uff0c\u800c\u8ba9\u5927\u8bed\u8a00\u6a21\u578b\u5b8c\u5168\u8d1f\u8d23\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u4eba\u5458\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u538b\u7f29\u5668\u2014\u2014\u4e8c\u7ef4\u81ea\u9002\u5e94\u6c60\u5316\uff0c\u4ee5\u65e0\u53c2\u6570\u7684\u65b9\u5f0f\u964d\u4f4e\u89c6\u89c9\u5757\u7684\u5c3a\u5bf8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDeCo\u5728\u6027\u80fd\u548c\u6548\u7387\u4e0a\u90fd\u4f18\u4e8e\u4f20\u7edf\u7684\u538b\u7f29\u6295\u5f71\u5668\u3002\u5b83\u5728MLLM\u57fa\u51c6\u3001\u89c6\u89c9\u5b9a\u4f4d\u548c\u5f00\u653e\u6027\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e2d\u5206\u522b\u53d6\u5f97\u4e860.9%\u30017.1%\u548c2.9%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u62e5\u6709\u66f4\u5c11\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u66f4\u5feb\u7684\u6536\u655b\u901f\u5ea6\u3002|\n", "2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang 
et.al.|[2405.20978](http://arxiv.org/abs/2405.20978)|**[link](https://github.com/calubkk/raat)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5f3a\u5927\u529f\u80fd\uff0c\u4f46\u9762\u4e34\u6311\u6218\uff0c\u5982\u865a\u6784\u3001\u8fc7\u65f6\u77e5\u8bc6\u548c\u96be\u4ee5\u8ffd\u6eaf\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u5b83\u7ed3\u5408\u5916\u90e8\u6570\u636e\u5e93\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u4e0d\u9002\u5f53\u7684\u68c0\u7d22\u6bb5\u843d\u53ef\u80fd\u59a8\u788dLLMs\u751f\u6210\u5168\u9762\u4e14\u9ad8\u8d28\u91cf\u7684\u56de\u7b54\u3002\u5148\u524d\u5173\u4e8eRAG\u4e2d\u68c0\u7d22\u566a\u58f0\u7a33\u5065\u6027\u7684\u7814\u7a76\u5f80\u5f80\u5c40\u9650\u4e8e\u6709\u9650\u7684\u566a\u58f0\u7c7b\u578b\uff0c\u8fd9\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u68c0\u7d22\u73af\u5883\u4e0d\u7b26\uff0c\u9650\u5236\u4e86\u5b9e\u9645\u5e94\u7528\u3002\u672c\u7814\u7a76\u9996\u5148\u63a2\u8ba8\u4e86\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u5c06\u5176\u5206\u4e3a\u4e09\u79cd\u4e0d\u540c\u7684\u7c7b\u522b\uff0c\u53cd\u6620\u771f\u5b9e\u73af\u5883\u3002\u6211\u4eec\u5206\u6790\u4e86\u8fd9\u4e9b\u4e0d\u540c\u7c7b\u578b\u7684\u68c0\u7d22\u566a\u58f0\u5bf9LLMs\u7a33\u5065\u6027\u7684\u5f71\u54cd\u3002 
\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684RAG\u65b9\u6cd5\uff0c\u79f0\u4e3a\u68c0\u7d22\u589e\u5f3a\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\uff08RAAT\uff09\u3002RAAT\u5229\u7528\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\u6765\u52a8\u6001\u8c03\u6574\u6a21\u578b\u7684\u8bad\u7ec3\u6d41\u7a0b\u4ee5\u5e94\u5bf9\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u91c7\u7528\u591a\u4efb\u52a1\u5b66\u4e60\u786e\u4fdd\u6a21\u578b\u80fd\u591f\u8bc6\u522b\u5608\u6742\u7684\u4e0a\u4e0b\u6587\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u5404\u79cd\u566a\u58f0\u6761\u4ef6\u4e0b\uff0c\u4f7f\u7528RAAT\u8bad\u7ec3\u7684LLaMA-2 7B\u6a21\u578b\u5728F1\u548cEM\u5206\u6570\u4e0a\u663e\u793a\u51fa\u663e\u8457\u63d0\u5347\u3002\u4e3a\u4e86\u4fbf\u4e8e\u590d\u73b0\uff0c\u6211\u4eec\u5df2\u5728https://github.com/calubkk/RAAT\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u3002|\n", "2405.20974": "|**2024-05-31**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu 
et.al.|[2405.20974](http://arxiv.org/abs/2405.20974)|**[link](https://github.com/xu1868/sayself)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u51c6\u786e\u6216\u865a\u5047\u7684\u4fe1\u606f\uff0c\u5e76\u4e14\u901a\u5e38\u65e0\u6cd5\u8868\u660e\u5176\u4fe1\u5fc3\u6c34\u5e73\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u76f4\u63a5\u63d0\u793a\u6216\u81ea\u6211\u4e00\u81f4\u6027\u63d0\u793a\u6765\u63d0\u53d6LLMs\u7684\u4fe1\u5fc3\uff0c\u6216\u8005\u6784\u5efa\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\u3002\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6027\u80fd\u8f83\u5dee\uff0c\u800c\u57fa\u4e8e\u8bad\u7ec3\u7684\u65b9\u6cd5\u53c8\u5c40\u9650\u4e8e\u4e8c\u5143\u6216\u4e0d\u7cbe\u786e\u7684\u6574\u4f53\u4fe1\u5fc3\u4f30\u8ba1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5148\u8fdb\u7684\u65b9\u6cd5\u2014\u2014SaySelf\uff0c\u8fd9\u662f\u4e00\u4e2a\u8bad\u7ec3\u6846\u67b6\uff0c\u65e8\u5728\u6559\u5bfcLLMs\u63d0\u4f9b\u66f4\u7cbe\u786e\u7684\u7ec6\u7c92\u5ea6\u4fe1\u5fc3\u4f30\u8ba1\u3002 
\u6b64\u5916\uff0cSaySelf\u8fd8\u63a8\u52a8LLMs\u751f\u6210\u81ea\u6211\u53cd\u601d\u7684\u89e3\u91ca\uff0c\u660e\u786e\u6307\u51fa\u5b83\u4eec\u5728\u53c2\u6570\u77e5\u8bc6\u4e0a\u7684\u7a7a\u767d\u5e76\u89e3\u91ca\u4e0d\u786e\u5b9a\u6027\u3002\u8fd9\u662f\u901a\u8fc7\u8ba9LLM\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u81ea\u52a8\u603b\u7ed3\u7279\u5b9a\u77e5\u8bc6\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u6765\u5b9e\u73b0\u7684\u3002\u8fd9\u79cd\u603b\u7ed3\u662f\u57fa\u4e8e\u5bf9\u591a\u4e2a\u91c7\u6837\u63a8\u7406\u94fe\u7684\u4e0d\u4e00\u81f4\u6027\u5206\u6790\uff0c\u751f\u6210\u7684\u6570\u636e\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u6821\u51c6\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5f3a\u5316\u5b66\u4e60\uff0c\u5956\u52b1\u51c6\u786e\u3001\u9ad8\u7f6e\u4fe1\u5ea6\u7684\u9884\u6d4b\uff0c\u540c\u65f6\u60e9\u7f5a\u9519\u8bef\u8f93\u51fa\u4e2d\u7684\u8fc7\u5ea6\u81ea\u4fe1\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u5728\u5206\u5e03\u5185\u8fd8\u662f\u5206\u5e03\u5916\u7684\u6570\u636e\u96c6\u4e0a\uff0cSaySelf\u90fd\u80fd\u6709\u6548\u51cf\u5c11\u4fe1\u5fc3\u6821\u51c6\u8bef\u5dee\uff0c\u540c\u65f6\u4fdd\u6301\u4efb\u52a1\u6027\u80fd\u3002\u751f\u6210\u7684\u81ea\u6211\u53cd\u601d\u7406\u7531\u4e5f\u88ab\u8bc1\u660e\u662f\u5408\u7406\u7684\uff0c\u80fd\u8fdb\u4e00\u6b65\u4fc3\u8fdb\u6821\u51c6\u3002\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/xu1868/SaySelf\u3002**|\n", "2405.20973": "|**2024-05-31**|**LCQ: Low-Rank Codebook based Quantization for Large Language Models**|Wen-Pu Cai et.al.|[2405.20973](http://arxiv.org/abs/2405.20973)|null|
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4f18\u5f02\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u7684\u5b58\u50a8\u548c\u8ba1\u7b97\u6210\u672c\u9ad8\u6210\u4e3a\u90e8\u7f72\u7684\u4e00\u5927\u6311\u6218\u3002\u4e3a\u4e86\u538b\u7f29\u6a21\u578b\u5e76\u964d\u4f4e\u6210\u672c\uff0c\u6743\u91cd\u91cf\u5316\u6280\u672f\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u76ee\u524d\uff0c\u5927\u591a\u6570\u9488\u5bf9LLMs\u7684\u91cf\u5316\u65b9\u6cd5\u4f7f\u7528\u79e9\u4e00\u7801\u672c\uff0c\u7136\u800c\u5728\u9ad8\u538b\u7f29\u6bd4\u4e0b\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u7cbe\u5ea6\u635f\u5931\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6743\u91cd\u91cf\u5316\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4f4e\u79e9\u7801\u672c\u91cf\u5316\uff08LCQ\uff09\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002LCQ\u91c7\u7528\u4f4e\u79e9\u7801\u672c\u8fdb\u884c\u91cf\u5316\uff0c\u5176\u79e9\u53ef\u4ee5\u5927\u4e8e\u4e00\u3002\u8fd9\u79cd\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u5229\u7528\u66f4\u9ad8\u7684\u79e9\u6765\u4fdd\u6301\u6216\u63d0\u5347\u6a21\u578b\u7684\u7cbe\u5ea6\uff0c\u540c\u65f6\u63a7\u5236\u989d\u5916\u7684\u5b58\u50a8\u5f00\u9500\u51e0\u4e4e\u4e3a\u96f6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u6bd4\uff0cLCQ\u5728\u4fdd\u6301\u826f\u597d\u51c6\u786e\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u80fd\u591f\u5b9e\u73b0\u66f4\u4f18\u7684\u538b\u7f29\u6548\u679c\u3002
\u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4f4e\u79e9\u7801\u672c\u91cf\u5316\u65b9\u6cd5\uff0c\u5b83\u6709\u671b\u5728\u4e0d\u663e\u8457\u589e\u52a0\u5b58\u50a8\u6210\u672c\u7684\u60c5\u51b5\u4e0b\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u6027\u80fd\u548c\u6548\u7387\uff0c\u4e3a\u9ad8\u6548\u90e8\u7f72\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u65b0\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550](http://arxiv.org/abs/2406.02550)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|**\u8fd9\u7bc7\u5de5\u4f5c\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7ec4\u6a21\u5757\u5316\u7b97\u672f\u4efb\u52a1\u4e2d\u51fa\u73b0\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u548c\u6280\u80fd\u7ec4\u5408\u73b0\u8c61\u3002\u6211\u4eec\u5173\u6ce8\u7684\u662f\u6709\u9650\u6570\u91cf\u7684\u4e00\u6b21\u6027\u6a21\u8fd0\u7b97\u51fd\u6570 $z = a \\times x + b \\times y \\;(\\text{mod}\\; p)$\uff0c\u8fd9\u4e9b\u51fd\u6570\u7531\u5411\u91cf $(a, b) \\in \\mathbb{Z}_p^2$ 
\u6807\u8bb0\u3002\u90e8\u5206\u4efb\u52a1\u88ab\u7528\u4f5c\u9884\u8bad\u7ec3\uff0c\u5176\u4f59\u7528\u4e8e\u5206\u5e03\u5916\u6d4b\u8bd5\u3002\u5b9e\u9a8c\u8868\u660e\uff0cGPT\u98ce\u683c\u7684Transformer\u968f\u7740\u9884\u8bad\u7ec3\u4efb\u52a1\u6570\u91cf\u589e\u52a0\uff0c\u5176\u5728\u5206\u5e03\u5185\u548c\u5206\u5e03\u5916\u7684\u6cdb\u5316\u80fd\u529b\u4f1a\u7ecf\u5386\u8f6c\u53d8\u3002\u6700\u5c0f\u578b\u80fd\u5b9e\u73b0\u5206\u5e03\u5916\u6cdb\u5316\u7684\u6a21\u578b\u9700\u8981\u4e24\u4e2aTransformer\u5757\uff1b\u800c\u5bf9\u4e8e\u66f4\u6df1\u7684\u6a21\u578b\uff0c\u5206\u5e03\u5916\u6cdb\u5316\u9636\u6bb5\u662f\u201c\u77ac\u6001\u201d\u7684\uff0c\u9700\u8981\u65e9\u671f\u505c\u6b62\u3002\u6700\u540e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\u8fdb\u884c\u4e86\u53ef\u89e3\u91ca\u6027\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e24\u79cd\u9636\u6bb5\u4e2d\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8868\u793a\uff0c\u5e76\u8ba8\u8bba\u4e86\u5b66\u4e60\u5230\u7684\u7b97\u6cd5\u3002**|\n", "2406.02547": "|**2024-06-04**|**Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning**|Alex Jinpeng Wang et.al.|[2406.02547](http://arxiv.org/abs/2406.02547)|**[link](https://github.com/showlab/VisInContext)**|**\u8fd9\u6bb5\u7814\u7a76\u5e76\u672a\u4ecb\u7ecd\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u800c\u662f\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u63d0\u5347\u957f\u5e8f\u5217\u5728\u591a\u6a21\u6001\u6a21\u578b\u4e2d\u7684\u5904\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201cVisualized In-Context Text 
Processing\u201d\uff08VisInContext\uff09\u6280\u672f\uff0c\u901a\u8fc7\u89c6\u89c9\u4ee4\u724c\u6765\u5904\u7406\u957f\u6587\u672c\uff0c\u4ece\u800c\u663e\u8457\u964d\u4f4eGPU\u5185\u5b58\u4f7f\u7528\u548c\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u9700\u6c42\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8e\u4e00\u4e2a560\u4ebf\u53c2\u6570\u7684\u6df7\u5408 Experts\uff08MOE\uff09\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5c06\u9884\u8bad\u7ec3\u4e2d\u7684\u4e0a\u4e0b\u6587\u6587\u672c\u957f\u5ea6\u6269\u5c55\u5230\u4e862048\u4e2atokens\uff0c\u800c\u8ba1\u7b97\u91cf\u51e0\u4e4e\u4fdd\u6301\u4e0d\u53d8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528VisInContext\u8bad\u7ec3\u7684\u6a21\u578b\u5728\u5e38\u89c1\u7684\u57fa\u4e8e\u5b9e\u4f8b\u7684\u5c11\u91cf\u6570\u636e\u8bc4\u4f30\u4e0b\u6e38\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0cVisInContext\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u7ed3\u5408\uff0c\u80fd\u589e\u5f3a\u5bf9\u6587\u6863\u7684\u7406\u89e3\u80fd\u529b\uff0c\u7279\u522b\u9002\u7528\u4e8e\u6587\u6863\u95ee\u7b54\u548c\u8fde\u7eed\u6587\u6863\u68c0\u7d22\uff0c\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002**|\n", "2406.02543": "|**2024-06-04**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori 
et.al.|[2406.02543](http://arxiv.org/abs/2406.02543)|null|We study uncertainty quantification in large language models (LLMs), with the goal of identifying when the uncertainty in responses to a given query is large. We consider both epistemic and aleatoric uncertainty: the former stems from a lack of knowledge about the ground truth (such as facts or the language), and the latter from irreducible randomness (such as multiple possible answers). In particular, we propose an information-theoretic metric that reliably detects when only epistemic uncertainty is large, in which case the model's output is unreliable. This condition can be computed solely from the model's outputs, obtained through a special iterative prompting procedure based on previous responses. This quantification makes it possible to detect hallucinations (cases of high epistemic uncertainty) in both single-answer and multi-answer responses, in contrast to many standard uncertainty quantification strategies (such as thresholding the log-likelihood of a response), which cannot detect hallucinations in the multi-answer case. 
We conduct a series of experiments that demonstrate the advantage of our approach. Our investigation also sheds light on how the probability an LLM assigns to a given output can be amplified by iterative prompting, which may be of independent interest.|\n", "2406.02542": "|**2024-06-04**|**Loki: Low-Rank Keys for Efficient Sparse Attention**|Prajwal Singhania et.al.|[2406.02542](http://arxiv.org/abs/2406.02542)|null|Inference on large language models is computationally expensive, especially with long sequences, where self-attention is the dominant cost. Several recent works propose sparse attention approximations to address this. In this work, we analyze the key vectors in attention blocks and find that they lie in a space of much lower dimension than the original. This observation leads us to propose Loki, a new sparse attention method that ranks and selects tokens in the KV cache based on attention scores computed in the low-dimensional space. Experiments show that Loki preserves model quality better than other popular approximation methods while speeding up attention computation by reducing data movement (load/store) and compute costs.|\n", "2406.02539": "|**2024-06-04**|**Parrot: 
Multilingual Visual Instruction Tuning**|Hai-Long Sun et.al.|[2406.02539](http://arxiv.org/abs/2406.02539)|**[link](https://github.com/aidc-ai/parrot)**|The rapid development of multimodal large language models such as GPT-4V marks a significant step toward artificial general intelligence. Existing methods mainly rely on supervised fine-tuning (SFT) to align vision encoders with language models and so give them multimodal abilities. However, this can cause the language model's ability to handle multiple languages to erode as training progresses. We find that imbalanced, English-centric SFT datasets significantly degrade performance in non-English languages, because the vision encoder and multilingual tokens are not effectively aligned during SFT. We therefore propose Parrot, a new method that uses textual guidance to drive visual token alignment at the language level. Parrot conditions the visual tokens on diverse language inputs and uses a mixture of experts (MoE) to promote the alignment of multilingual tokens. Specifically, to improve the alignment of non-English visual tokens, we compute cross-attention between the initial visual features and the textual embeddings and feed the result into the MoE router to select the most relevant experts. 
The selected experts then convert the initial visual tokens into language-specific visual tokens. Because there is currently no standard benchmark for evaluating multilingual capabilities, we also build and release a Massive Multilingual Multimodal Benchmark (MMMB) covering 6 languages, 15 categories, and 12,000 questions. Parrot achieves state-of-the-art performance on multilingual MMBench and MMMB and also performs well across a broad range of multimodal tasks. The source code and training dataset of Parrot will be made publicly available.|\n", "2406.02536": "|**2024-06-04**|**Mitigate Position Bias in Large Language Models via Scaling a Single Dimension**|Yijiong Yu 
et.al.|[2406.02536](http://arxiv.org/abs/2406.02536)|**[link](https://github.com/PositionalHidden/PositionalHidden)**|This paper examines position bias in large language models (LLMs), a phenomenon seen in real-world use and also known as \"lost in the middle\". The bias is especially pronounced in long-context scenarios, where the position of key information within the prompt significantly affects the model's accuracy. The study finds that attention weights are a micro-level expression of position bias, and that the causal attention mask also contributes to it by creating position-specific hidden states. 
Based on these insights, the authors propose mitigating position bias by scaling these positional hidden states. Experiments cover several tasks, including NaturalQuestions multi-document QA, KV retrieval, LongBench, and timeline reorder, and several model families, including RoPE models, context-window-extended models, and Alibi models. The results show that modifying just one dimension of the hidden states improves performance by up to 15.2%. The code is available at https://aka.ms/PositionalHidden.|\n", "2406.02532": "|**2024-06-04**|**SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices**|Ruslan Svirschevski 
et.al.|[2406.02532](http://arxiv.org/abs/2406.02532)|**[link](https://github.com/yandex-research/specexec)**|As large language models gain widespread adoption, running them efficiently becomes crucial. Recent work uses speculative decoding to achieve large speedups, but most of it is designed for data-center hardware. This work asks the opposite question: how fast can we run LLMs on consumer machines? Consumer GPUs can no longer fit the largest models (50B+ parameters), so the parameters must be offloaded to RAM or SSD. When running with offloaded parameters, the inference engine can process batches of hundreds or thousands of tokens at once, which makes it a natural fit for speculative decoding. We propose SpecExec (Speculative Execution), a simple parallel decoding method for popular LLM families that can generate up to 20 tokens per target-model iteration. It exploits the high spikiness of the token probability distribution in modern LLMs and the high degree of alignment between model output probabilities. SpecExec takes the most probable token continuations from the draft model to build a \"cache\" tree for the target model, which is then validated in a single pass. 
Using SpecExec, we demonstrate inference of 50B-parameter LLMs on consumer GPUs with RAM offloading at 4-6 tokens per second with 4-bit quantization, or 2-3 tokens per second with 16-bit weights.|\n", "2406.02528": "|**2024-06-04**|**Scalable MatMul-free Language Modeling**|Rui-Jie Zhu et.al.|[2406.02528](http://arxiv.org/abs/2406.02528)|**[link](https://github.com/ridgerchu/matmulfreellm)**|**Matrix multiplication (MatMul) typically dominates the computational cost of large language models (LLMs), and the cost grows as LLMs scale to larger embedding dimensions and context lengths. This work shows that MatMul operations can be completely eliminated from LLMs while maintaining strong performance, even at the 2.7-billion-parameter scale. Experiments show that our MatMul-free models perform on par with state-of-the-art Transformers that consume significantly more memory. We study the scaling laws and find that the performance gap between MatMul-free models and full-precision Transformers narrows as model size increases. 
We also provide an efficient GPU implementation that reduces memory usage by up to 61% during training compared with an unoptimized baseline. With an optimized kernel, memory consumption during inference can be reduced by more than 10x. To properly quantify the efficiency of the architecture, we build a custom hardware solution on an FPGA that exploits lightweight operations beyond what GPUs can handle, processing billion-parameter-scale models at high speed and bringing LLMs closer to brain-like efficiency. This work shows that LLMs can remain effective after being stripped down, and it also points to the types of operations that future accelerators should be optimized for in order to serve the next generation of lightweight LLMs. Our code is open-sourced at https://github.com/ridgerchu/matmulfreellm.**|\n", "2406.02524": "|**2024-06-04**|**CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks**|Maciej Besta 
et.al.|[2406.02524](http://arxiv.org/abs/2406.02524)|**[link](https://github.com/spcl/checkembed)**|Large language models (LLMs) are transforming many domains, but verifying their answers remains a significant challenge, especially for complex open-ended tasks such as consolidation, summarization, and extraction of knowledge. This work proposes CheckEmbed, an accurate, scalable, and simple LLM verification approach. The core idea of CheckEmbed is to compare LLM answers using answer-level embeddings obtained from a model such as GPT Text Embedding Large. This reduces a complex textual answer to a single embedding, which simplifies comparison and enables fast and meaningful verification. We build a comprehensive verification pipeline that implements the CheckEmbed methodology and provides metrics for assessing the truthfulness of LLM answers, such as embedding heatmaps and their summaries. We show how these metrics can be used to build practical engines that decide whether an LLM answer is satisfactory. On real-world document analysis tasks such as term extraction and document summarization, our approach shows significant improvements in accuracy, cost-effectiveness, and runtime performance over token-, sentence-, and fact-level schemes such as BERTScore or SelfCheckGPT.|\n", "2406.02523": "|**2024-06-04**|**RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots**|Soroush Nasiriany et.al.|[2406.02523](http://arxiv.org/abs/2406.02523)|null|Recent progress in artificial intelligence has been driven largely by scaling. In robotics, however, access to large robot datasets is a bottleneck. We advocate using realistic physical simulation to scale up environments, tasks, and datasets for robot learning methods. We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features rich and diverse kitchen scenes, with more than a thousand 3D assets across over 150 object categories and dozens of interactable pieces of furniture and appliances. 
We further enrich the realism and diversity of the simulation with generative AI tools, such as object assets from text-to-3D models and environment textures from text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated with the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and integrate automated trajectory generation methods to substantially enlarge the datasets with minimal human effort. Our experiments show a clear scaling trend when synthetically generated robot data is used for large-scale imitation learning, and they show great promise for using simulation data in real-world tasks. Videos and open-source code are available at https://robocasa.ai/.|\n", "2406.03496": "|**2024-06-05**|**Wings: Learning Multimodal LLMs without Text-only Forgetting**|Yi-Kai Zhang et.al.|[2406.03496](http://arxiv.org/abs/2406.03496)|null|
Multimodal large language models (MLLMs) start from a pretrained general-purpose language model, first aligning images with text and then fine-tuning on mixed multimodal inputs. However, an MLLM catastrophically forgets text-only instructions, which contain no images and which the initial language model could already handle. This paper presents Wings, a new MLLM that performs well in both text-only dialogue and multimodal comprehension. By analyzing MLLM attention in multimodal instructions, we find that text-only forgetting is related to an attention shift from pre-image text to post-image text. We therefore construct extra modules that act as boosted learners to compensate for the attention shift. The visual and textual learners are complementary, like \"wings\" on either side, and are connected in parallel within each attention block. Initially, image and text inputs are aligned, with the visual learners operating alongside the main attention to balance the focus on visual elements. The textual learners are later integrated collaboratively through attention-based routing, which blends in the outputs of the visual learners. We design Low-Rank Residual Attention (LoRRA) to keep the learners efficient. 
Experiments show that Wings outperforms equally scaled MLLMs on both text-only dialogue and visual question-answering tasks. On our newly constructed Interleaved Image-Text (IIT) benchmark, Wings shows superior performance across question-answering tasks ranging from text-only-rich to multimodal-rich.|\n", "2406.03488": "|**2024-06-06**|**Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training**|Ao Sun et.al.|[2406.03488](http://arxiv.org/abs/2406.03488)|**[link](https://github.com/maydomine/seq1f1b)**|The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. As LLM training sequence lengths extend to 32k or even 128k, current pipeline-parallel methods face severe bottlenecks, including high memory footprints and substantial pipeline bubbles, which greatly limit model scalability and training throughput. To improve memory and training efficiency, we propose Seq1F1B, an efficient sequence-level one-forward-one-backward (1F1B) pipeline scheduling method for training LLMs on long sequences. Seq1F1B decomposes batch-level schedulable units into finer sequence-level units, which reduces bubble size and memory footprint. 
Because splitting sequences evenly can introduce slight extra bubbles, we design a computation-wise strategy for partitioning input sequences that mitigates this side effect. Compared with competitive pipeline baselines such as Megatron's 1F1B pipeline parallelism, our method achieves higher training throughput with a lower memory footprint. Notably, Seq1F1B can efficiently train a 30B-parameter LLM on sequences up to 64k in length using 64 NVIDIA A100 GPUs without recomputation strategies, which existing methods cannot do. Our code is based on Megatron-LM and is open-sourced at https://github.com/MayDomine/Seq1F1B.git.|\n", "2406.03487": "|**2024-06-05**|**Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends**|Sanjana Ramprasad et.al.|[2406.03487](http://arxiv.org/abs/2406.03487)|null|
Recent advances in large language models (LLMs) have considerably improved summarization systems, but concerns about their faithfulness persist. Although prior work has evaluated LLMs extensively on news, most evaluation of dialogue summarization has focused on BART-based models, leaving a gap in our understanding of LLM faithfulness. This work benchmarks the faithfulness of LLMs for dialogue summarization using human annotations, with a focus on identifying and categorizing span-level inconsistencies. We focus on two prominent LLMs, GPT-4 and Alpaca-13B. Our evaluation reveals subtleties in what counts as an error: LLMs often generate plausible inferences that are supported by circumstantial evidence in the conversation but lack direct evidence, a pattern that is less common in older models. We propose a refined error taxonomy with a new category, \"Circumstantial Inference\", to capture this LLM behavior, and we release the dataset. Using the taxonomy, we compare the behavior of LLMs with that of older fine-tuned models. We also systematically assess automatic error detection methods on LLM summaries 
and find that they struggle to detect these nuanced errors. To address this, we introduce two prompt-based approaches for fine-grained error detection that outperform existing metrics, particularly at identifying \"Circumstantial Inference\" errors.|\n", "2406.03486": "|**2024-06-05**|**BIPED: Pedagogically Informed Tutoring System for ESL Education**|Soonwoo Kwon et.al.|[2406.03486](http://arxiv.org/abs/2406.03486)|null|Large language models (LLMs) have great potential to serve as readily available and cost-efficient Conversational Intelligent Tutoring Systems (CITS) for L2 learners of English. Existing CITS, however, tend to teach only simple concepts or lack the pedagogical depth needed to address diverse learning strategies. To develop a more pedagogically informed CITS capable of teaching complex concepts, we construct a bilingual pedagogically informed tutoring dataset (BIPED) of one-on-one, human-to-human English tutoring interactions. Through post-hoc analysis of the tutoring dialogues, we derive a lexicon of dialogue acts (34 tutor acts and 9 student acts), which we use to further annotate the collected data. 
Following a two-step framework that first predicts the appropriate tutor act and then generates the corresponding response, we implement two CITS models using GPT-4 and SOLAR-KO, respectively. Experiments show that the implemented models not only replicate the style of human teachers but also employ diverse and contextually appropriate pedagogical strategies.|\n", "2406.03476": "|**2024-06-05**|**Does your data spark joy? Performance gains from domain upsampling at the end of training**|Cody Blakeney et.al.|[2406.03476](http://arxiv.org/abs/2406.03476)|null|Pretraining datasets for large language models (LLMs) have grown to trillions of tokens, composed mainly of large-scale CommonCrawl web scrapes together with smaller domain-specific datasets. Because training at large compute (FLOP) scales is needed to reveal significant changes on difficult and emergent benchmarks, and such training is expensive, finding the optimal balance between the diversity of general web scrapes and the information density of domain-specific data is an open question. This paper shows how to use the smaller domain-specific datasets by upsampling them late in training to improve performance on benchmarks such as MMLU, GSM8K, and HumanEval. \u5bf9\u4e8e\u4e00\u4e2a\u8bad\u7ec3\u4e861\u4e07\u4ebf\uff08T\uff09\u4ee4\u724c\u768470\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u8fd9\u79cd\u7b80\u5355\u65b9\u6cd5\u53ef\u4f7f\u5176\u6027\u80fd\u63d0\u9ad86.90\u5206\u30018.26\u5206\u548c6.17\
u5206\uff0c\u4e0e\u8bad\u7ec3\u65f6\u95f4\u4e24\u500d\u7684Llama-2\uff087B\uff09\u6a21\u578b\u76f8\u5f53\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5728\u8bad\u7ec3\u540e\u671f\u9886\u57df\u4e0a\u91c7\u6837\u7684\u6301\u7eed\u65f6\u95f4\uff0c\u4ece5%\u523030%\uff0c\u53d1\u73b010%\u523020%\u7684\u6bd4\u4f8b\u6700\u4e3a\u5408\u9002\uff0c\u4ee5\u5e73\u8861\u4e00\u822c\u8bed\u8a00\u5efa\u6a21\u80fd\u529b\u4e0e\u7279\u5b9a\u4efb\u52a1\u7684\u4f18\u5316\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5229\u7528\u9886\u57df\u4e0a\u91c7\u6837\u6765\u5927\u89c4\u6a21\u5206\u6790\u5355\u4e2a\u6570\u636e\u96c6\u5bf9\u4e0d\u540c\u57fa\u51c6\u7684\u589e\u76ca\uff0c\u901a\u8fc7\u5728\u8fd9\u4e00\u9636\u6bb5\u79fb\u9664\u5b83\u4eec\u8fdb\u884c\u5b9e\u9a8c\u3002\u8fd9\u79cd\u65b9\u6cd5\u6781\u5927\u5730\u964d\u4f4e\u4e86\u5b9e\u9a8c\u6210\u672c\uff0c\u4f7f\u5f97\u80fd\u591f\u4ee5\u9884\u8bad\u7ec3\u8fd0\u884c\u7684\u5341\u5206\u4e4b\u4e00\u5de6\u53f3\u7684\u6210\u672c\u63a2\u7d22\u4e0d\u540c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang 
et.al.|[2406.03474](http://arxiv.org/abs/2406.03474)|null|\u9274\u4e8e\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u805a\u7126\u4e8e\u4f7f\u7528MLLM\u9a71\u52a8\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u5728\u5927\u89c4\u6a21\u52a8\u6001\u73af\u5883\u4e2d\u3002\u7136\u800c\uff0c\u5e38\u89c1\u7684\u65b9\u6cd5\u76f4\u63a5\u5c06\u9ad8\u7ea7\u6307\u4ee4\u8f6c\u5316\u4e3a\u4f4e\u7ea7\u8f66\u8f86\u63a7\u5236\u4fe1\u53f7\uff0c\u8fd9\u8fdd\u80cc\u4e86MLLM\u7684\u672c\u8d28\u751f\u6210\u6a21\u5f0f\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u5176\u6f5c\u5728\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u7684\u4e00\u822c\u5316\u80fd\u529b\u53d7\u5230\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u6781\u5927\u9650\u5236\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u4e2d\u5c42\u8bed\u8a00\u9a71\u52a8\u547d\u4ee4\u6765\u8fde\u63a5\u9ad8\u7ea7\u6307\u4ee4\u548c\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\uff0c\u5b83\u4eec\u6bd4\u9ad8\u7ea7\u6307\u4ee4\u66f4\u7ec6\u81f4\uff0c\u4f46\u6bd4\u63a7\u5236\u4fe1\u53f7\u66f4\u901a\u7528\u4e14\u53ef\u89e3\u91ca\uff0c\u4ece\u800c\u6709\u6548\u5f25\u5408\u4e24\u8005\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3aAD-H\u7684\u5206\u5c42\u591a\u4ee3\u7406\u9a7e\u9a76\u7cfb\u7edf\u5b9e\u73b0\u8fd9\u4e00\u7406\u5ff5\uff0c\u5305\u62ec\u4e00\u4e2a\u7528\u4e8e\u9ad8\u5c42\u63a8\u7406\u7684MLLM\u89c4\u5212\u5668\u548c\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u63a7\u5236\u5668\u8fdb\u884c\u4f4e\u5c42\u6267\u884c\u3002\u8fd9\u79cd\u5206\u5c42\u8bbe\u8ba1\u4f7fMLLM\u6446\u8131\u4e86\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\u89e3\u7801\uff0c\u5145\u5206\u91ca\u653e\u4e86\u5176\u5728\u9ad8\u5c42\u611f\u77e5\u3001\u63a8\u7406\u548c\u89c4\u5212\u65b9\u9762\u7684\u6d8c\u73b0\u80fd\u529b\u3002 
\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5e26\u6709\u52a8\u4f5c\u5c42\u6b21\u6ce8\u91ca\u7684\u65b0\u6570\u636e\u96c6\u3002\u5168\u9762\u7684\u95ed\u73af\u8bc4\u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684AD-H\u7cfb\u7edf\u5177\u6709\u591a\u9879\u5173\u952e\u4f18\u52bf\u3002\u9996\u5148\uff0cAD-H\u5728\u9a7e\u9a76\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u751a\u81f3\u5c55\u73b0\u51fa\u5728\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u81ea\u6211\u7ea0\u6b63\u7684\u80fd\u529b\uff0c\u8fd9\u662f\u8bad\u7ec3\u6570\u636e\u672a\u6db5\u76d6\u7684\u573a\u666f\u3002\u5176\u6b21\uff0cAD-H\u5728\u957f\u7a0b\u6307\u4ee4\u548c\u65b0\u73af\u5883\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\uff0c\u660e\u663e\u8d85\u8d8a\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\uff0c\u53ef\u901a\u8fc7\u83b7\u53d6\u3002|\n", "2406.03450": "|**2024-06-05**|**What is the Best Way for ChatGPT to Translate Poetry?**|Shanshan Wang 
et.al.|[2406.03450](http://arxiv.org/abs/2406.03450)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u5728\u82f1\u8bed-\u4e2d\u6587\u8bd7\u6b4c\u7ffb\u8bd1\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u901a\u8fc7\u5b9a\u5411\u63d0\u793a\u548c\u5c0f\u6837\u672c\u573a\u666f\u5206\u6790\u4ee5\u4f18\u5316\u5176\u8868\u73b0\u3002\u5c3d\u7ba1\u521d\u671f\u7ed3\u679c\u4ee4\u4eba\u9f13\u821e\uff0c\u4f46\u7814\u7a76\u53d1\u73b0ChatGPT\u7684\u7ffb\u8bd1\u5b58\u5728\u6301\u7eed\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u89e3\u91ca\u8f85\u52a9\u8bd7\u6b4c\u673a\u5668\u7ffb\u8bd1\u201d\uff08EAPMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bd7\u6b4c\u7684\u5355\u8bed\u89e3\u91ca\u4f5c\u4e3a\u7ffb\u8bd1\u8fc7\u7a0b\u7684\u6307\u5bfc\u3002\u540c\u65f6\uff0c\u6211\u4eec\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ee5\u66f4\u597d\u5730\u9002\u5e94\u73b0\u4ee3\u8bd7\u6b4c\u7ffb\u8bd1\u7684\u5fae\u5999\u4e4b\u5904\u3002\u6211\u4eec\u9080\u8bf7\u4e13\u4e1a\u8bd7\u4eba\u8fdb\u884c\u8bc4\u4f30\uff0c\u5e76\u7ed3\u5408GPT-4\u7684\u8bc4\u4ef7\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684EAPMT\u65b9\u6cd5\u5728\u4e0e\u4f20\u7edfChatGPT\u7ffb\u8bd1\u65b9\u6cd5\u4ee5\u53ca\u73b0\u6709\u5728\u7ebf\u7cfb\u7edf\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u8bba\u6587\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e3a\u6587\u5b66\u7ffb\u8bd1\u7684\u673a\u5668\u8f85\u52a9\u63d0\u4f9b\u4e86\u65b0\u9896\u89c6\u89d2\u3002|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445](http://arxiv.org/abs/2406.03445)|null|
\u9884\u8bad\u7ec3\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u6267\u884c\u57fa\u672c\u7684\u7b97\u672f\u8fd0\u7b97\uff0c\u5982\u52a0\u6cd5\uff0c\u4ecd\u4e0d\u6e05\u695a\u3002\u672c\u6587\u63ed\u793a\u4e86\u9884\u8bad\u7ec3\u7684LLMs\u901a\u8fc7\u5085\u91cc\u53f6\u7279\u5f81\u8fdb\u884c\u52a0\u6cd5\u2014\u2014\u8fd9\u4e9b\u662f\u9690\u85cf\u72b6\u6001\u4e2d\u7684\u7ef4\u5ea6\uff0c\u901a\u8fc7\u4e00\u7ec4\u5728\u9891\u57df\u4e2d\u7a00\u758f\u5206\u5e03\u7684\u7279\u5f81\u6765\u8868\u793a\u6570\u5b57\u3002\u5728\u6a21\u578b\u4e2d\uff0c\u591a\u5c42\u611f\u77e5\u5668\uff08MLP\uff09\u5c42\u548c\u6ce8\u610f\u529b\u5c42\u4ee5\u4e92\u8865\u7684\u65b9\u5f0f\u4f7f\u7528\u5085\u91cc\u53f6\u7279\u5f81\uff1aMLP\u5c42\u4e3b\u8981\u4f7f\u7528\u4f4e\u9891\u7279\u5f81\u8fd1\u4f3c\u7b54\u6848\u7684\u5927\u5c0f\uff0c\u800c\u6ce8\u610f\u529b\u5c42\u4e3b\u8981\u901a\u8fc7\u9ad8\u9891\u7279\u5f81\u6267\u884c\u6a21\u8fd0\u7b97\uff08\u4f8b\u5982\u5224\u65ad\u7b54\u6848\u662f\u5426\u4e3a\u5076\u6570\uff09\u3002\u9884\u8bad\u7ec3\u5bf9\u4e8e\u8fd9\u79cd\u673a\u5236\u81f3\u5173\u91cd\u8981\uff1a\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u7684\u6a21\u578b\u4ec5\u5229\u7528\u4f4e\u9891\u7279\u5f81\uff0c\u5bfc\u81f4\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u5c06\u9884\u8bad\u7ec3\u7684\u8bcd\u5d4c\u5165\u5f15\u5165\u5230\u968f\u673a\u521d\u59cb\u5316\u7684\u6a21\u578b\u4e2d\u53ef\u4ee5\u6062\u590d\u5176\u6027\u80fd\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u9002\u5f53\u7684\u9884\u8bad\u7ec3\u8868\u793a\uff08\u5982\u5085\u91cc\u53f6\u7279\u5f81\uff09\u80fd\u591f\u89e3\u9501Transformer\u5b66\u4e60\u7b97\u6cd5\u4efb\u52a1\u7cbe\u786e\u673a\u5236\u7684\u80fd\u529b\u3002|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker 
et.al.|[2406.03441](http://arxiv.org/abs/2406.03441)|null|\u5728\u8bb8\u591a\u9ad8\u98ce\u9669\u7684\u673a\u5668\u5b66\u4e60\u5e94\u7528\u4e2d\uff0c\u6a21\u578b\u9700\u8981\u80fd\u591f\u8868\u660e\u5176\u5bf9\u9884\u6d4b\u7684\u4e0d\u786e\u5b9a\u6027\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u57fa\u51c6\u4e0a\u7684\u51c6\u786e\u5ea6\u53ef\u8fbe\u5230\u751a\u81f3\u8d85\u8fc7\u4eba\u7c7b\u6c34\u5e73\uff0c\u4f46\u5b83\u4eec\u5bf9\u9519\u8bef\u54cd\u5e94\u7684\u8fc7\u5ea6\u81ea\u4fe1\u4ecd\u662f\u5df2\u77e5\u7684\u95ee\u9898\u3002\u4f20\u7edf\u7684\u65b9\u6cd5\u5728\u76f4\u63a5\u5e94\u7528\u4e8eLLMs\u65f6\u53ef\u80fd\u9762\u4e34\u8ba1\u7b97\u6210\u672c\u548c\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u6311\u6218\u3002\u8fd1\u671f\u63d0\u51fa\u4e86\u4e00\u4e9b\u9ed1\u76d2\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bf8\u5982\u81ea\u6211\u8868\u8ff0\u7684\u4fe1\u5fc3\u7b49\u542f\u53d1\u5f0f\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u901a\u8fc7\u5206\u6790\u6a21\u578b\u751f\u6210\u7b54\u6848\u7684\u89e3\u91ca\u5206\u5e03\u6765\u8861\u91cfLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u3002\u5c3d\u7ba1\u5229\u7528\u89e3\u91ca\u672c\u8eab\u5e76\u975e\u65b0\u9896\uff0c\u4f46\u6211\u4eec\u5c06\u5176\u89c6\u4e3a\u6d4b\u8bd5\u65f6\u95f4\u5206\u7c7b\u5668\uff0c\u901a\u8fc7\u8ba1\u7b97\u6700\u53ef\u80fd\u7684\u5206\u7c7b\u5668\u540e\u9a8c\u7b54\u6848\u5206\u5e03\uff0c\u4ee5\u6b64\u8fdb\u884c\u4e0d\u786e\u5b9a\u6027\u8bc4\u4f30\u3002 
\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u89e3\u91ca\u8574\u542b\u4f5c\u4e3a\u5206\u7c7b\u5668\u4f3c\u7136\u6027\u7684\u4e00\u79cd\u7279\u5b9a\u6846\u67b6\u5b9e\u4f8b\uff0c\u5982\u4f55\u5728\u4e94\u4e2a\u4e0d\u540c\u7684\u6570\u636e\u96c6\u4e0a\u6539\u8fdb\u4e86\u4fe1\u5fc3\u5206\u6570\u6307\u6807\uff08\u7279\u522b\u662fAUROC\u548cAURC\uff09\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6846\u67b6\u65e2\u5177\u6709\u7406\u8bba\u4f9d\u636e\uff0c\u53c8\u662f\u6709\u6548\u91cf\u5316LLMs\u4e0d\u786e\u5b9a\u6027\u7684\u65b9\u5f0f\u3002|\n", "2406.03411": "|**2024-06-05**|**Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach**|Saehyung Lee et.al.|[2406.03411](http://arxiv.org/abs/2406.03411)|**[link](https://github.com/saehyung-lee/plugir)**|**\u8be5\u8bba\u6587\u4e3b\u8981\u5173\u6ce8\u7684\u662f\u4ea4\u4e92\u5f0f\u6587\u672c\u5230\u56fe\u50cf\u68c0\u7d22\u4efb\u52a1\u4e2d\u7684\u5bf9\u8bdd\u5f62\u5f0f\u4e0a\u4e0b\u6587\u67e5\u8be2\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bba\uff0c\u540d\u4e3aPlugIR\uff0c\u901a\u8fc7\u4e24\u79cd\u65b9\u5f0f\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u822c\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u9996\u5148\uff0c\u901a\u8fc7\u91cd\u8ff0\u5bf9\u8bdd\u5f62\u5f0f\u7684\u4e0a\u4e0b\u6587\uff0c\u6211\u4eec\u6d88\u9664\u4e86\u5728\u73b0\u6709\u89c6\u89c9\u5bf9\u8bdd\u6570\u636e\u4e0a\u5fae\u8c03\u68c0\u7d22\u6a21\u578b\u7684\u9700\u6c42\uff0c\u4ece\u800c\u80fd\u591f\u4f7f\u7528\u4efb\u610f\u9ed1\u76d2\u6a21\u578b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2aLLM\u63d0\u95ee\u8005\uff0c\u6839\u636e\u5f53\u524d\u4e0a\u4e0b\u6587\u4e2d\u5019\u9009\u56fe\u50cf\u7684\u4fe1\u606f\uff0c\u751f\u6210\u5173\u4e8e\u76ee\u6807\u56fe\u50cf\u5c5e\u6027\u7684\u975e\u5197\u4f59\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u51cf\u5c11\u4e86\u751f\u6210\u95ee\u9898\u7684\u566a\u58f0\u548c\u5197\u4f59\u3002\u9664\u4e86\u6211\u4eec\u7
684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3a\u6700\u4f73\u5bf9\u6570\u6392\u540d\u79ef\u5206\uff08BRI\uff09\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u4ea4\u4e92\u5f0f\u68c0\u7d22\u7cfb\u7edf\u3002PlugIR\u5728\u591a\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u4f18\u4e8e\u96f6\u6b21\u8bbe\u7f6e\u548c Fine-tuned \u57fa\u51c6\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c PlugIR \u7684\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\u53ef\u4ee5\u6839\u636e\u4e0d\u540c\u60c5\u51b5\u7075\u6d3b\u5355\u72ec\u6216\u7ed3\u5408\u5e94\u7528\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\uff1ahttps://github.com/Saehyung-Lee/PlugIR\u3002**|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. Xiao et.al.|[2406.04344](http://arxiv.org/abs/2406.04344)|null|\u53d7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d6\u5f97\u7684\u5de8\u5927\u8fdb\u5c55\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53e3\u5934\u5316\u673a\u5668\u5b66\u4e60\uff08VML\uff09\u6846\u67b6\u3002\u4e0e\u4f20\u7edf\u7684\u673a\u5668\u5b66\u4e60\u6a21\u578b\uff0c\u901a\u5e38\u5728\u8fde\u7eed\u53c2\u6570\u7a7a\u95f4\u4e2d\u4f18\u5316\u4e0d\u540c\uff0cVML\u5c06\u53c2\u6570\u7a7a\u95f4\u9650\u5236\u4e3a\u4eba\u53ef\u7406\u89e3\u7684\u81ea\u7136\u8bed\u8a00\u3002\u8fd9\u79cd\u7ea6\u675f\u4fc3\u4f7f\u6211\u4eec\u4ece\u65b0\u89d2\u5ea6\u770b\u5f85\u51fd\u6570\u903c\u8fd1\u95ee\u9898\uff0c\u5373\u5c06\u5e26\u6709\u6587\u672c\u63d0\u793a\u7684LLM\u89c6\u4e3a\u7531\u6587\u672c\u63d0\u793a\u53c2\u6570\u5316\u7684\u51fd\u6570\u3002\u6211\u4eec\u501f\u6b64\u89c6\u89d2\u91cd\u65b0\u5ba1\u89c6\u4e86\u7ecf\u5178\u673a\u5668\u5b66\u4e60\u4efb\u52a1\uff0c\u5982\u56de\u5f52\u548c\u5206\u7c7b\uff0c\u53d1\u73b0\u8fd9\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\u548c\u4f18\u5316\u5668\u6765\u89e3\u51b3\u3002VML\u7684\u4e3b\u8981\u4f
18\u52bf\u5305\u62ec\uff1a\uff081\uff09\u6613\u4e8e\u7f16\u7801\u5148\u9a8c\u77e5\u8bc6\uff1a\u5173\u4e8e\u95ee\u9898\u548c\u5047\u8bbe\u7c7b\u7684\u5148\u9a8c\u77e5\u8bc6\u53ef\u4ee5\u4ee5\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7f16\u7801\u5e76\u8f93\u5165\u7ed9LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\uff1b\uff082\uff09\u81ea\u52a8\u6a21\u578b\u9009\u62e9\uff1a\u4f18\u5316\u5668\u53ef\u4ee5\u6839\u636e\u6570\u636e\u548c\u53e3\u5934\u5316\u5148\u9a8c\u77e5\u8bc6\u81ea\u52a8\u9009\u62e9\u5177\u4f53\u7684\u6a21\u578b\u7c7b\u522b\uff0c\u5e76\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u66f4\u65b0\u6a21\u578b\u7c7b\u522b\uff1b\uff083\uff09\u53ef\u89e3\u91ca\u7684\u5b66\u4e60\u8005\u66f4\u65b0\uff1aLLM\u53c2\u6570\u5316\u7684\u4f18\u5316\u5668\u53ef\u4ee5\u89e3\u91ca\u6bcf\u6b21\u5b66\u4e60\u8005\u66f4\u65b0\u7684\u539f\u56e0\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u591a\u9879\u5b9e\u9a8c\u8bc4\u4f30VML\u7684\u6709\u6548\u6027\uff0c\u5e0c\u671b\u5b83\u80fd\u6210\u4e3a\u589e\u5f3a\u673a\u5668\u5b66\u4e60\u53ef\u89e3\u91ca\u6027\u548c\u4fe1\u4efb\u5ea6\u7684\u6865\u6881\u3002|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu 
et.al.|[2406.04339](http://arxiv.org/abs/2406.04339)|null|\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u7684\u6838\u5fc3\u76ee\u6807\u4e2d\uff0c\u8ba9\u6a21\u578b\u7406\u89e3\u89c6\u89c9\u573a\u666f\u5e76\u6267\u884c\u52a8\u4f5c\u662f\u4e00\u4e2a\u57fa\u672c\u4efb\u52a1\u3002\u5c3d\u7ba1\u73b0\u6709\u7684\u673a\u5668\u4eba\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u80fd\u591f\u5904\u7406\u4e00\u4e9b\u57fa\u7840\u4efb\u52a1\uff0c\u4f46\u5b83\u4eec\u5728\u4e24\u4e2a\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff1a1\uff09\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u63a8\u7406\u80fd\u529b\u4e0d\u8db3\uff1b2\uff09\u5bf9\u4e8eMLLM\u7684\u5fae\u8c03\u548c\u63a8\u7406\u5b58\u5728\u9ad8\u8ba1\u7b97\u6210\u672c\u3002\u8fd1\u671f\u63d0\u51fa\u7684\u57fa\u4e8e\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSM\uff09\u7684Mamba\u5c55\u793a\u4e86\u5728\u975e\u5e73\u51e1\u5e8f\u5217\u5efa\u6a21\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5177\u6709\u7ebf\u6027\u63a8\u7406\u590d\u6742\u5ea6\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RoboMamba\uff0c\u4e00\u4e2a\u7aef\u5230\u7aef\u7684\u673a\u5668\u4ebaMLLM\uff0c\u5b83\u5229\u7528Mamba\u6a21\u578b\u7ed3\u5408\u673a\u5668\u4eba\u63a8\u7406\u548c\u52a8\u4f5c\u80fd\u529b\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6548\u7684\u5fae\u8c03\u548c\u63a8\u7406\u6548\u7387\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u4e0eMamba\u96c6\u6210\uff0c\u901a\u8fc7\u8054\u5408\u8bad\u7ec3\u4f7f\u89c6\u89c9\u6570\u636e\u4e0e\u8bed\u8a00\u5d4c\u5165\u5bf9\u9f50\uff0c\u8d4b\u4e88\u6a21\u578b\u89c6\u89c9\u5e38\u8bc6\u548c\u4e0e\u673a\u5668\u4eba\u76f8\u5173\u7684\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347RoboMamba\u7684\u52a8\u4f5c\u59ff\u6001\u9884\u6d4b\u80fd\u529b\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e00\u79cd\u9ad8\u6548\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u7b80\u5355\u7684\u7b56\u7565\u5934\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e00\u65e6RoboMamba\u5177\u5907\u8db3\u591f\u7684\u63a8\u7406\u80fd\u529b\uff0c\u53ea\u9700\u6781\u5c11\u7684\u5fae\u8c03\u53c2\u6570\uff08\u6a21\u578b\u76840.1%\uff09\u548c\u65f6\u95f4\uff0820\u5206\u949f\uff09\uff0c\u5c31\u80fd\u4e60\u5f97\u64cd\u7eb5\u6280\u80fd\u3002\u5728\u5b9e\u9a8c\u4e2d\uff0cRoboMamba\u5728\u901a\u7528\u548c\u673a\u5668\u4eba\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u63a8\u7406\u80fd\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u4e2d\u5b9e\u73b0\u4e86\u59ff\u6001\u9884\u6d4b\u7684\u51fa\u8272\u8868\u73b0\uff0c\u5176\u63a8\u7406\u901f\u5ea6\u6bd4\u73b0\u6709\u673a\u5668\u4ebaMLLM\u5feb7\u500d\u3002\u9879\u76ee\u7684\u7f51\u9875\u94fe\u63a5\u4e3a\uff1a\u3002|\n", "2406.04337": "|**2024-06-06**|**Coherent Zero-Shot Visual Instruction Generation**|Quynh Phung 
et.al.|[2406.04337](http://arxiv.org/abs/2406.04337)|null|\u5c3d\u7ba1\u6587\u672c\u5230\u56fe\u50cf\u5408\u6210\u6280\u672f\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5728\u6269\u6563\u6a21\u578b\u65b9\u9762\uff0c\u4f46\u751f\u6210\u9700\u8981\u7269\u4f53\u5728\u8fde\u7eed\u6b65\u9aa4\u4e2d\u4fdd\u6301\u4e00\u81f4\u8868\u793a\u548c\u5e73\u6ed1\u72b6\u6001\u8f6c\u6362\u7684\u89c6\u89c9\u6307\u4ee4\u4ecd\u7136\u662f\u4e00\u9879\u8270\u5de8\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u6846\u67b6\uff0c\u5de7\u5999\u5730\u7ed3\u5408\u4e86\u6587\u672c\u7406\u89e3\u4e0e\u56fe\u50cf\u751f\u6210\uff0c\u4ee5\u786e\u4fdd\u89c6\u89c9\u6307\u4ee4\u65e2\u7f8e\u89c2\u53c8\u5177\u6709\u8fde\u8d2f\u6027\u548c\u51c6\u786e\u6027\u3002\u901a\u8fc7\u6d4b\u8bd5\u591a\u6b65\u9aa4\u6307\u4ee4\uff0c\u5e76\u4e0e\u591a\u4e2a\u57fa\u7ebf\u8fdb\u884c\u6bd4\u8f83\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u8fde\u8d2f\u4e14\u89c6\u89c9\u4e0a\u5438\u5f15\u4eba\u7684\u6307\u4ee4\u3002|\n", "2406.04334": "|**2024-06-06**|**DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs**|Lingchen Meng 
et.al.|[2406.04334](http://arxiv.org/abs/2406.04334)|null|\u5927\u591a\u6570\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u901a\u8fc7\u5c06\u89c6\u89c9\u4ee4\u724c\u4f5c\u4e3a\u5e8f\u5217\u8f93\u5165\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7b2c\u4e00\u5c42\u6765\u5b9e\u73b0\u3002\u8fd9\u79cd\u65b9\u6cd5\u867d\u7136\u76f4\u89c2\uff0c\u4f46\u4f1a\u663e\u8457\u589e\u52a0\u8ba1\u7b97\u548c\u5185\u5b58\u5f00\u9500\uff0c\u56e0\u4e3a\u6a21\u578b\u9700\u8981\u5904\u7406\u66f4\u591a\u7684\u8f93\u5165\u5c42\u4ee4\u724c\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u67b6\u6784DeepStack\uff0c\u7528\u4e8eLMMs\u3002\u5728LMM\u7684\u89c6\u89c9\u548c\u8bed\u8a00Transformer\u7684N\u5c42\u4e2d\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u4ee4\u724c\u5206\u4e3aN\u7ec4\uff0c\u5e76\u4ece\u5e95\u5c42\u9010\u5c42\u5411\u4e0a\u9988\u9001\u5230\u5bf9\u5e94\u7684Transformer\u5c42\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u79cd\u7b80\u5355\u7684\u65b9\u6cd5\u6781\u5927\u5730\u589e\u5f3a\u4e86LMM\u5728\u8de8\u5c42\u89c6\u89c9\u4ee4\u724c\u4ea4\u4e92\u65b9\u9762\u7684\u5efa\u6a21\u80fd\u529b\uff0c\u540c\u65f6\u6210\u672c\u51e0\u4e4e\u4e0d\u53d8\u3002\u6211\u4eec\u5206\u522b\u5c06DeepStack\u5e94\u7528\u4e8eLMM\u7684\u8bed\u8a00\u548c\u89c6\u89c9Transformer\uff0c\u5e76\u901a\u8fc7\u5e7f\u6cdb\u5b9e\u8bc1\u7ed3\u679c\u9a8c\u8bc1\u4e86DeepStack LMM\u7684\u6709\u6548\u6027\u3002 \u4f7f\u7528\u76f8\u540c\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0c\u6211\u4eec\u7684DeepStack 
7B\u548c13B\u53c2\u6570\u6a21\u578b\u57289\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u5e73\u5747\u8d85\u8d8a\u540c\u7c7b\u6a21\u578b2.7\u5206\u548c2.9\u5206\u3002\u4ec5\u4f7f\u7528\u4e94\u5206\u4e4b\u4e00\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0cDeepStack\u7684\u8868\u73b0\u63a5\u8fd1\u4e8e\u4f7f\u7528\u5b8c\u6574\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u6a21\u578b\u3002\u8fd9\u4e9b\u63d0\u5347\u5728\u9ad8\u5206\u8fa8\u7387\u4efb\u52a1\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u4f8b\u5982\uff0c\u4e0eLLaVA-1.5-7B\u76f8\u6bd4\uff0cTextVQA\u3001DocVQA\u548cInfoVQA\u4e0a\u7684\u6027\u80fd\u5206\u522b\u63d0\u9ad8\u4e864.2\u5206\u300111.0\u5206\u548c4.0\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06DeepStack\u5e94\u7528\u5230\u89c6\u89c9Transformer\u5c42\uff0c\u8fd9\u5e26\u6765\u4e86\u4e0eLLaVA-1.5-7B\u76f8\u5f53\u7684\u5e73\u5747\u6539\u8fdb\uff0c\u4e3a3.8\u5206\u3002|\n", "2406.04331": "|**2024-06-06**|**PaCE: Parsimonious Concept Engineering for Large Language Models**|Jinqi Luo et.al.|[2406.04331](http://arxiv.org/abs/2406.04331)|**[link](https://github.com/peterljq/parsimonious-concept-engineering)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u4efb\u52a1\uff0c\u5c3d\u7ba1\u5b83\u4eec\u80fd\u591f\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u56de\u590d\uff0c\u4f46\u4e5f\u4f1a\u4ea7\u751f\u4e0d\u826f\u8f93\u51fa\uff0c\u5982\u6f5c\u5728\u6709\u5bb3\u4fe1\u606f\u3001\u79cd\u65cf\u6216\u6027\u522b\u6b67\u89c6\u6027\u8a00\u8bba\u4ee5\u53ca\u9519\u8bef\u7684\u4fe1\u606f\u3002\u4e3a\u4e86\u51cf\u5c11\u8fd9\u4e9b\u95ee\u9898\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u5bf9\u9f50\u65b9\u6cd5\uff0c\u5982\u5fae\u8c03\u3001\u63d0\u793a\u5de5\u7a0b\u548c\u8868\u793a\u5de5\u7a0b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u9762\u4e34\u6311\u6218\uff1a\u4e00\u4e9b\u9700\u8981\u9488\u5bf9\u6bcf\u4e2a\u5bf9\u9f50\u4efb\u52a1\u8fdb\u884c\u6602\u8d35\u7684\u5fae\u8c03\uff1b\u4e00\u4e9b\u672a\u80fd\u5145\u5206\u6d88\u9664\u4e0d\u82
6f\u6982\u5ff5\uff0c\u5bf9\u9f50\u6548\u679c\u4e0d\u4f73\uff1b\u4e00\u4e9b\u5219\u5220\u9664\u4e86\u826f\u6027\u7684\u6982\u5ff5\uff0c\u964d\u4f4e\u4e86LLMs\u7684\u8bed\u8a00\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aParsimonious Concept Engineering\uff08PaCE\uff09\u7684\u65b0\u578b\u6fc0\u6d3b\u5de5\u7a0b\u6846\u67b6\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002 \u9996\u5148\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u7684\u6982\u5ff5\u5b57\u5178\uff0c\u5b83\u5728\u6fc0\u6d3b\u7a7a\u95f4\u4e2d\u8868\u793a\u6bcf\u4e2a\u539f\u5b50\u5bf9\u5e94\u4e00\u4e2a\u8bed\u4e49\u6982\u5ff5\u3002\u63a5\u7740\uff0c\u5bf9\u4e8e\u7ed9\u5b9a\u7684\u4efb\u4f55\u5bf9\u9f50\u4efb\u52a1\uff0c\u6211\u4eec\u4f1a\u4f7f\u7528\u4e00\u4e2a\u6982\u5ff5\u5206\u533a\u5668\u9ad8\u6548\u5730\u6807\u8bb0\u8fd9\u4e9b\u6982\u5ff5\u4e3a\u826f\u6027\u6216\u4e0d\u826f\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u6211\u4eec\u5229\u7528\u7a00\u758f\u7f16\u7801\u65b9\u6cd5\uff0c\u6839\u636e\u6982\u5ff5\u5b57\u5178\u5206\u89e3LLM\u7684\u6fc0\u6d3b\uff0c\u5c06\u5176\u51c6\u786e\u8868\u793a\u4e3a\u826f\u6027\u6210\u5206\u548c\u4e0d\u826f\u6210\u5206\u7684\u7ebf\u6027\u7ec4\u5408\u3002\u901a\u8fc7\u79fb\u9664\u4e0d\u826f\u6210\u5206\uff0c\u6211\u4eec\u80fd\u591f\u8c03\u6574LLMs\u7684\u884c\u4e3a\u4ee5\u7b26\u5408\u5bf9\u9f50\u76ee\u6807\u3002 \u6211\u4eec\u5728\u56de\u5e94\u51c0\u5316\u3001\u771f\u5b9e\u6027\u589e\u5f3a\u548c\u60c5\u611f\u4fee\u8ba2\u7b49\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u5e76\u53d1\u73b0PaCE\u5728\u5b9e\u73b0\u5bf9\u9f50\u6027\u80fd\u7684\u540c\u65f6\uff0c\u4fdd\u6301\u4e86\u826f\u597d\u7684\u8bed\u8a00\u80fd\u529b\uff0c\u8fbe\u5230\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6c34\u5e73\u3002**|\n", "2406.04314": "|**2024-06-06**|**Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step**|Zhanhao Liang et.al.|[2406.04314](http://arxiv.org/abs/2406.04314)|null|
\u8fd1\u671f\uff0cDirect Preference Optimization (DPO) \u5df2\u6210\u529f\u6269\u5c55\u5230\u8c03\u6574\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\uff0c\u4f7f\u5176\u4e0e\u4eba\u7c7b\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u3002\u4e0d\u540c\u4e8e\u5927\u591a\u6570\u73b0\u6709 DPO \u65b9\u6cd5\u5047\u8bbe\u6240\u6709\u6269\u6563\u6b65\u9aa4\u90fd\u4e0e\u6700\u7ec8\u751f\u6210\u56fe\u50cf\u4fdd\u6301\u4e00\u81f4\u7684\u504f\u597d\u987a\u5e8f\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u79cd\u5047\u8bbe\u5ffd\u7565\u4e86\u6bcf\u4e2a\u6b65\u9aa4\u7279\u6709\u7684\u53bb\u566a\u6027\u80fd\uff0c\u56e0\u6b64\u5e94\u8be5\u4e3a\u6bcf\u4e00\u6b65\u5b9a\u5236\u504f\u597d\u6807\u7b7e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u540e\u8bad\u7ec3\u65b9\u6cd5\u2014\u2014Step-aware Preference Optimization (SPO)\uff0c\u5b83\u72ec\u7acb\u8bc4\u4f30\u5e76\u8c03\u6574\u6bcf\u4e2a\u6b65\u9aa4\u7684\u53bb\u566a\u6027\u80fd\uff0c\u5229\u7528\u6b65\u7ea7\u611f\u77e5\u504f\u597d\u6a21\u578b\u548c\u6b65\u7ea7\u91cd\u91c7\u6837\u5668\u6765\u786e\u4fdd\u51c6\u786e\u7684\u6b65\u7ea7\u76d1\u7763\u3002 
\u5728SPO\u4e2d\uff0c\u6211\u4eec\u5728\u6bcf\u4e2a\u53bb\u566a\u6b65\u9aa4\u4e2d\u4f1a\u521b\u5efa\u4e00\u4e2a\u56fe\u50cf\u6c60\uff0c\u5bfb\u627e\u5408\u9002\u7684\u80dc\u8005-\u8d25\u8005\u5bf9\uff0c\u5e76\u4e14\u5173\u952e\u5728\u4e8e\uff0c\u6211\u4eec\u4f1a\u4ece\u6c60\u4e2d\u968f\u673a\u9009\u62e9\u4e00\u4e2a\u56fe\u50cf\u4f5c\u4e3a\u4e0b\u4e00\u6b21\u53bb\u566a\u6b65\u9aa4\u7684\u8d77\u70b9\u3002\u8fd9\u4e2a\u6b65\u7ea7\u91cd\u91c7\u6837\u8fc7\u7a0b\u4fdd\u8bc1\u4e86\u6bcf\u6b21\u80dc\u8005-\u8d25\u8005\u5bf9\u90fd\u6765\u81ea\u540c\u4e00\u539f\u59cb\u56fe\u50cf\uff0c\u4f7f\u5f97\u6bd4\u8f83\u72ec\u7acb\u4e8e\u524d\u4e00\u6b65\u3002\u4e3a\u4e86\u8bc4\u4f30\u6bcf\u4e2a\u6b65\u9aa4\u7684\u504f\u597d\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u6b65\u7ea7\u611f\u77e5\u504f\u597d\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u6a21\u7cca\u548c\u6e05\u6670\u7684\u56fe\u50cf\u3002\u5728Stable Diffusion v1.5\u548cSDXL\u7b49\u5b9e\u9a8c\u4e2d\uff0cSPO \u663e\u8457\u4f18\u4e8e\u6700\u65b0\u7684Diffusion-DPO\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u8be6\u7ec6\u7684\u63d0\u793a\u65f6\uff0c\u80fd\u66f4\u597d\u5730\u751f\u6210\u56fe\u50cf\u5e76\u63d0\u5347\u7f8e\u5b66\u6548\u679c\uff0c\u540c\u65f6\u5728\u8bad\u7ec3\u6548\u7387\u4e0a\u8d85\u8fc720\u500d\u3002\u4ee3\u7801\u548c\u6a21\u578b\u53ef\u5728\u6b64\u94fe\u63a5\u83b7\u53d6\uff1a[https://rockeycoss.github.io/spo.github.io/](https://rockeycoss.github.io/spo.github.io/)\u3002|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger 
et.al.|[2406.04306](http://arxiv.org/abs/2406.04306)|**[link](https://github.com/ml-jku/SDLG)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u65f6\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\uff0c\u8fd9\u963b\u788d\u4e86\u793e\u4f1a\u548c\u5de5\u4e1a\u4e2d\u7684\u5404\u79cd\u5e94\u7528\uff0c\u56e0\u4e3a\u5b83\u4eec\u4f1a\u964d\u4f4eLLMs\u7684\u53ef\u4fe1\u5ea6\u3002\u5f53\u524d\u7684LLMs\u91c7\u7528\u81ea\u56de\u5f52\u65b9\u5f0f\u751f\u6210\u6587\u672c\uff0c\u5373\u9884\u6d4b\u5e76\u6dfb\u52a0\u6587\u672c\u6807\u8bb0\u3002\u5f53LLMs\u5bf9\u751f\u6210\u7684\u4e0b\u4e00\u4e2a\u6807\u8bb0\u7684\u8bed\u4e49\u542b\u4e49\u4e0d\u786e\u5b9a\u65f6\uff0c\u5f88\u53ef\u80fd\u4f1a\u4ea7\u751f\u5e7b\u89c9\u3002\u56e0\u6b64\uff0c\u4eba\u4eec\u8ba4\u4e3a\u5e7b\u89c9\u6e90\u4e8e\u9884\u6d4b\u4e0d\u786e\u5b9a\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u8bed\u4e49\u591a\u6837\u6027\u8bed\u8a00\u751f\u6210\u201d\uff08Semantically Diverse Language Generation\uff0cSDLG\uff09\uff0c\u7528\u4e8e\u91cf\u5316LLMs\u7684\u9884\u6d4b\u4e0d\u786e\u5b9a\u6027\u3002SDLG\u5f15\u5bfcLLM\u751f\u6210\u8bed\u4e49\u591a\u6837\u4f46\u53c8\u5408\u7406\u7684\u521d\u59cb\u6587\u672c\u66ff\u4ee3\u65b9\u6848\uff0c\u4ece\u800c\u63d0\u4f9b\u4e86\u7cbe\u786e\u7684aleatoric\u8bed\u4e49\u4e0d\u786e\u5b9a\u6027\u6d4b\u91cf\uff0c\u80fd\u591f\u68c0\u6d4b\u521d\u59cb\u6587\u672c\u662f\u5426\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\u3002 \u5b9e\u9a8c\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u8868\u660e\uff0cSDLG\u59cb\u7ec8\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u6700\u4e3a\u9ad8\u6548\uff0c\u4e3aLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u8bbe\u5b9a\u4e86\u65b0\u7684\u6807\u51c6\u3002**|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen 
et.al.|[2406.04300](http://arxiv.org/abs/2406.04300)|null|Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems such as autonomous vehicles. However, modeling the trajectories of other vehicles to simulate complex and meaningful close-range interactions remains costly. Generating driving behaviors from language descriptions is a promising approach: it gives humans a scalable and intuitive way to simulate a wide range of driving interactions. Its main obstacle is the scarcity of large annotated language-trajectory datasets. To address this, we propose Text-to-Drive (T2D), which uses large language models (LLMs) to synthesize diverse driving behaviors. Our knowledge-driven approach works in two stages: it first uses the embedded knowledge of LLMs to generate rich and varied language descriptions of driving behaviors, and then uses their reasoning ability to realize these behaviors in a simulator. At its core, T2D uses an LLM to construct a state chart that maps low-level states to high-level abstractions. This simplifies downstream tasks such as summarizing low-level observations, assessing how well a policy matches the behavior description, and designing auxiliary rewards, all without human supervision. With this approach, we show that T2D generates more diverse trajectories than other baselines and offers a natural language interface that lets users interactively incorporate human preferences. More examples are available on the project website.|\n",
"2406.04289": "|**2024-06-07**|**What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages**|Nadav Borenstein et.al.|[2406.04289](http://arxiv.org/abs/2406.04289)|null|What can large language models learn? By definition, a language model (LM) is a distribution over strings, so the question can be recast as asking which classes of string distributions are learnable. Whereas prior work has focused mainly on theoretical limits, we study learnability in practice. Unlike earlier empirical work, we evaluate neural LMs on their \u201chome turf\u201d, learning probabilistic languages, rather than as classifiers of formal languages. Specifically, we investigate how well recurrent neural network (RNN) and Transformer LMs learn regular LMs (RLMs). We test RLM learnability empirically as a function of several complexity parameters of the RLM and of the hidden state size of the neural LM. The results show that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNN and Transformer LMs. Several other predictors also reach significance, but with different patterns for RNNs and Transformers.|\n",
"2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278](http://arxiv.org/abs/2406.04278)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|**
Conversational tone is central to human communication. As large language models (LLMs) become more widespread, it is increasingly important to characterize how their conversational tones differ from those of humans. Existing studies of conversational modalities, however, tend to rely on pre-existing taxonomies or text corpora, which can carry experimenter bias and may not reflect the real-world distributions of the domains under study. Inspired by methods from cognitive science, we propose an iterative method that elicits tones and sentences at the same time by alternating between two tasks: (1) one participant identifies the tone of a given sentence, and (2) another participant generates a sentence in that tone. We run 100 iterations of this process with human participants and with GPT-4, obtaining a dataset of sentences and common conversational tones. In an additional experiment, humans and GPT-4 annotate every sentence with every tone. Using data from 1,339 human participants, 33,370 human judgments, and 29,900 GPT-4 queries, we show how this approach yields an interpretable geometric representation of the relations between conversational tones in humans and GPT-4. The work demonstrates how ideas from machine learning and cognitive science can be combined to address challenges in human-computer interaction.**|\n",
"2406.05132": "|**2024-06-07**|**3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs**|Jianing Yang et.al.|[2406.05132](http://arxiv.org/abs/2406.05132)|**[link](https://github.com/sled-group/3D-GRAND)**|Integrating language with 3D perception is crucial for building embodied agents and robots that understand and interact with the physical world. Although large language models (LLMs) excel at language understanding and generation, their adaptation to 3D environments (3D-LLMs) is still at an early stage; a main obstacle is the lack of large-scale datasets that densely ground language in 3D scenes. We introduce 3D-GRAND, a pioneering large-scale dataset of 40,087 household scenes paired with 6.2 million densely grounded scene-language instructions. Experiments show that instruction tuning on 3D-GRAND significantly improves the grounding ability of 3D-LLMs and reduces hallucination. We also design the 3D-POPE benchmark to evaluate hallucination in 3D-LLMs systematically and to enable fair comparison among future models. Our experiments reveal a relationship between dataset size and 3D-LLM performance, underscoring the key role of large-scale 3D-text datasets in advancing embodied AI research. Notably, early signs suggest that models trained on large synthetic data can perform well on real-world 3D scans, which points to the potential of sim-to-real transfer. With 3D-GRAND and 3D-POPE, we aim to give the embodied AI community the resources and insights needed to develop more reliable and better-grounded 3D-LLMs. Project website: https://3d-grand.github.io|\n",
"2406.05130": "|**2024-06-07**|**An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models**|Xiongtao Zhou 
et.al.|[2406.05130](http://arxiv.org/abs/2406.05130)|null|This paper studies parameter-efficient fine-tuning (PEFT) for multimodal large language models (MLLMs). Because these models typically have billions of parameters, fine-tuning all of them is difficult. The goal is to identify effective ways to improve MLLM performance when only a limited number of parameters can be trained. The authors fine-tune the LLM component of open-source MLLMs with four popular PEFT methods and provide a detailed analysis of how the methods behave across models, parameter placement, fine-tuning data size, model stability, generalization, and hallucination. The study covers seven datasets in two categories: unseen and seen. The results show that adapters are the most effective PEFT method and that fine-tuning the connector layers improves performance in most cases. The code and data are available.|\n",
"2406.05127": "|**2024-06-07**|**Towards Semantic Equivalence of Tokenization in Multimodal LLM**|Shengqiong Wu et.al.|[2406.05127](http://arxiv.org/abs/2406.05127)|null|Multimodal large language models (MLLMs) have shown outstanding performance on vision-language tasks. At the core of an MLLM is vision tokenization: how to turn input visual signals into feature representations that are most useful to the language model. Existing vision tokenizers, however, struggle to keep vision and language semantically aligned, because they fragment the visual input too aggressively and destroy the semantic integrity of the visual content. To address this, the paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units with a dynamic clustering algorithm and flexibly chooses the number of tokens according to image complexity. The resulting vision tokens preserve semantic integrity while capturing both low-frequency and high-frequency visual features. The authors also propose Setokim, a new MLLM equipped with SeTok. Experiments show that Setokim performs markedly better across a variety of tasks. More details are available on the project page: https://chocowu.github.io/SeTok-web/.|\n",
"2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107](http://arxiv.org/abs/2406.05107)|null|
Data exploration is a complex process in which users examine a dataset by issuing a sequence of queries step by step. Users sometimes explore a new dataset simply to get familiar with it, but more often the exploration is driven by a specific analytical goal or question. To help users explore effectively, Automated Data Exploration (ADE) systems have been proposed; they aim to generate complete exploration pipelines that showcase interesting properties of the data. Existing ADE systems, however, are often constrained by a predefined optimization function, so they always produce the same exploration sequence for a given dataset, which falls short when the exploration has a clear goal. This paper therefore proposes LINX, a generative system with a natural language interface for goal-oriented data exploration. LINX takes an input dataset and an analytical goal described in natural language and generates a personalized exploration session relevant to the user's needs. The system uses a large language model to interpret the input goal and derive a set of specifications for the desired output session. These specifications are then passed to a novel modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which can adapt its output to the specified instructions. To validate LINX, we created a new benchmark dataset for goal-oriented exploration and conducted an in-depth user study. The results show that exploration notebooks generated by LINX are significantly more relevant and useful than those of existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.|\n",
"2406.05085": "|**2024-06-07**|**Multi-Head RAG: Solving Multi-Aspect Problems with LLMs**|Maciej Besta et.al.|[2406.05085](http://arxiv.org/abs/2406.05085)|**[link](https://github.com/spcl/mrag)**|**Retrieval Augmented Generation (RAG) improves the accuracy and relevance of responses from Large Language Models (LLMs) by bringing document content into the model context. Existing RAG methods, however, do not adequately handle queries that may require retrieving multiple documents with substantially different content. Such queries are common in practice, but they are challenging because the embeddings of these documents can lie far apart in the vector space, which makes it hard to retrieve them all at once. This paper proposes Multi-Head RAG (MRAG), a scheme that addresses the problem in a simple yet powerful way: it uses the activations of the Transformer multi-head attention layer, rather than the decoder layer, as retrieval keys. The motivation is that different attention heads can learn to capture different aspects of the data. Using these activations yields embeddings that represent multiple facets of data items and queries, which improves retrieval accuracy for complex queries. We provide an evaluation methodology, metrics, synthetic datasets, and real-world use cases to demonstrate the effectiveness of MRAG. Compared with standard RAG baselines, MRAG improves relevance by up to 20%. MRAG can be integrated seamlessly with existing RAG frameworks and tools such as RAGAS, as well as with different classes of data stores. In short, the paper aims to improve existing RAG approaches on complex queries that require retrieving information on multiple aspects.**|\n",
"2406.05063": "|**2024-06-07**|**Are Large Language Models More Empathetic than Humans?**|Anuradha Welivita 
et.al.|[2406.05063](http://arxiv.org/abs/2406.05063)|null|With the rise of large language models (LLMs), whether they can surpass humans at recognizing emotions and responding with empathy has become a focus of research. This paper presents an in-depth study comparing the empathetic responding ability of four state-of-the-art LLMs (GPT-4, LLaMA-2-70B-Chat, Gemini-1.0-Pro, and Mixtral-8x7B-Instruct) with that of humans. In a between-subjects user study with 1,000 participants, we analyzed responses to 2,000 carefully selected emotional dialogue prompts covering a broad range of 32 distinct positive and negative emotions. The results show that LLMs respond with statistically significantly more empathy than humans. GPT-4 was the most empathetic, with about 31% more responses rated \u201cGood\u201d than the human baseline. It was followed by LLaMA-2 (about 24% more), Mixtral-8x7B (about 21% more), and Gemini-Pro (about 10% more). A finer-grained analysis of the response ratings shows that some LLMs are markedly better than others at responding to particular emotions. The proposed evaluation framework offers a scalable and adaptable way to assess the empathy of new LLMs without having to repeat this study in future research.|\n",
"2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055](http://arxiv.org/abs/2406.05055)|null|Large language models perform well on reasoning tasks, and few-shot prompting can improve their performance further. Current evaluations, however, concentrate on carefully constructed benchmarks and overlook real-world reasoning problems with missing or contradictory conditions, known as ill-defined problems. We observe that existing few-shot prompting methods work poorly in this setting and often produce overconfident answers or hallucinations. To study the issue in depth, we build a benchmark called \u201cProblems with Missing and Contradictory conditions\u201d (PMC) and introduce two new metrics for evaluating how few-shot prompting methods handle such problems. Our analysis on PMC reveals a trade-off between mathematical reasoning performance on well-defined problems and the ability to recognize ill-defined ones. To address the challenges posed by PMC, we propose a novel few-shot prompting method called SMT-LIB Prompting (SLP). Instead of solving a problem directly, SLP models it in the SMT-LIB language and then applies a double-check solving strategy that verifies the satisfiability and uniqueness of the solution before giving final feedback. Experiments show that SLP has a clear advantage over existing methods on problems with missing and contradictory conditions. We will open-source our benchmark and code to support future research.|\n",
"2406.05053": "|**2024-06-07**|**Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation**|Nachiket Kotalwar et.al.|[2406.05053](http://arxiv.org/abs/2406.05053)|null|Generative AI and large language models hold great promise for programming education, where they can give learners personalized feedback and hints. Current research focuses mainly on raising the quality of generated feedback to the level of human tutors. In real educational deployments, however, cost, time, and data privacy matter as much as quality. This paper benchmarks language models for programming feedback generation along several dimensions: quality, cost, speed, and data privacy. We pay particular attention to recent in-browser inference techniques, which directly reduce cost and protect data privacy. To improve the feedback quality of small models that can run in the browser, we develop a fine-tuning pipeline based on synthetic data generated by GPT-4. We demonstrate the effectiveness of fine-tuned 4-bit quantized Llama3-8B and Phi3-3.8B models running on WebLLM's in-browser inference engine across three different Python programming datasets. We will release the full implementation, the web app, and the datasets to support further research on in-browser language models.|\n",
"2406.05039": "|**2024-06-07**|**Bootstrapping Referring Multi-Object Tracking**|Yani Zhang et.al.|[2406.05039](http://arxiv.org/abs/2406.05039)|**[link](https://github.com/zyn213/temprmot)**|
Current referring multi-object tracking (RMOT) work typically relies on manually annotated datasets and static rules, which limits both diversity and the scope of application. To address this, our study focuses on advancing the RMOT task by introducing more discriminative language vocabulary. We first extend the Refer-KITTI dataset into Refer-KITTI-V2. Starting from 2,719 manual annotations, it resolves the class imbalance problem and adds more keywords, bringing it closer to real-world scenarios than Refer-KITTI. We then use large language models to expand these annotations to a total of 9,758, yielding 617 distinct words and surpassing previous RMOT benchmarks. In addition, we improve the end-to-end RMOT framework with a simple yet elegant temporal advancement strategy that outperforms earlier methods. The source code and dataset have been released.|\n",
"2406.05035": "|**2024-06-07**|**Scenarios and Approaches for Situated Natural Language Explanations**|Pengshuo Qiu 
et.al.|[2406.05035](http://arxiv.org/abs/2406.05035)|null|Large language models (LLMs) can generate natural language explanations (NLE) adapted to different users' situations, but this adaptability has not yet been evaluated quantitatively. To fill the gap, we build a benchmark dataset, Situation-Based Explanation (SBE), containing 100 explananda (things to be explained). Each explanandum is paired with explanations aimed at different audience groups, such as educators, students, and professionals, so that we can assess how precisely a model's explanations meet the information needs and contexts of these diverse groups, for example students, teachers, and parents. Every \u201cexplanandum-audience\u201d pair comes with a human-written reference explanation, which is used to compute scores that quantify how well a model adapts its explanation to the situation. We test three prompting approaches on pretrained language models of different sizes: rule-based prompting, meta-prompting, and in-context learning prompting. We find that (1) models can generate prompts that yield explanations more precisely aligned with the target situation; (2) the explicit prompt \u201cYou are a helpful assistant\u201d is not a necessary technique for situated NLE tasks; and (3) in-context learning prompts only help models learn the demonstration template and do not improve their inference performance. The SBE dataset and our analysis provide a foundation for future research on generating situated natural language explanations.|\n",
"2406.06525": "|**2024-06-10**|**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**|Peize Sun et.al.|[2406.06525](http://arxiv.org/abs/2406.06525)|**[link](https://github.com/foundationvision/llamagen)**|**We introduce LlamaGen, a new family of image generation models that applies the original \u201cnext-token prediction\u201d paradigm of large language models to visual generation. It shows that vanilla autoregressive models such as Llama, without inductive biases on visual signals, can reach state-of-the-art image generation performance if scaled properly. We examine the design space of image tokenizers, the scalability of image generation models, and training data quality, with the following results: (1) An image tokenizer with a downsampling ratio of 16 that achieves a reconstruction quality of 0.94 rFID and codebook usage of 97% on the ImageNet benchmark. (2) A series of class-conditional image generation models ranging from 111M to 3.1B parameters that achieve an FID of 2.18 on the ImageNet 256x256 benchmark, outperforming popular diffusion models such as LDM and DiT. (3) A text-conditional image generation model with 775M parameters, trained in two stages on LAION-COCO and on images of high aesthetic quality, that shows competitive visual quality and text alignment. (4) We verify that LLM serving frameworks are effective at optimizing the inference speed of image generation models, achieving speedups of 326% to 414%. We release all models and code to support the open-source community working on visual generation and multimodal foundation models.**|\n",
"2406.06519": "|**2024-06-10**|**UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor**|Shivani Upadhyay et.al.|[2406.06519](http://arxiv.org/abs/2406.06519)|**[link](https://github.com/castorini/umbrela)**|**
Large numbers of relevance judgments are essential for training retrieval systems effectively and evaluating them accurately. Traditionally these judgments come from human assessors, which makes the process expensive and slow. A recent study by Thomas et al. at Microsoft Bing suggests that large language models (LLMs) can assess relevance accurately and produce judgments comparable to those of humans. Unfortunately, that study did not release reusable software. We present UMBRELA, an open-source toolkit whose name is a recursive acronym for \u201cUMbrela is the Bing RELevance Assessor\u201d. It reproduces the results of Thomas et al. using OpenAI's GPT-4 model and adds detail beyond the original paper. On the Deep Learning Tracks from TREC 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with the rankings produced by effective multi-stage retrieval systems. The toolkit is designed to be easy to extend and can be integrated into existing multi-stage retrieval and evaluation pipelines, making it a valuable resource for researchers who study retrieval evaluation methods. UMBRELA will be used in the TREC 2024 RAG Track to assist with relevance assessment, and we hope it will serve as a foundation for further innovation in the field. The UMBRELA code is available at https://github.com/castorini/umbrela.**|\n",
"2406.06499": "|**2024-06-10**|**NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative**|Asmar Nadeem et.al.|[2406.06499](http://arxiv.org/abs/2406.06499)|null|\u5f53\u524d\u7684\u89c6\u9891\u5b57\u5e55\u57fa\u51c6\u548c\u6a21\u578b\u5728\u8868\u5f81\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u79cd\u53d9\u4e8b\u662f\u901a\u8fc7\u56e0\u679c\u5173\u7cfb\u8fde\u63a5\u7684\u4e00\u7cfb\u5217\u4e8b\u4ef6\uff0c\u968f\u65f6\u95f4\u53d1\u5c55\uff0c\u7531\u4eba\u7269\u6216\u4e3b\u4f53\u9a71\u52a8\u3002\u8fd9\u79cd\u7f3a\u4e4f\u53d9\u4e8b\u6027\u9650\u5236\u4e86\u6a21\u578b\u751f\u6210\u6355\u6349\u89c6\u9891\u5185\u5bb9\u5185\u5728\u56e0\u679c\u548c\u65f6\u95f4\u52a8\u6001\u7684\u6587\u672c\u63cf\u8ff0\u7684\u80fd\u529b\u3002\u4e3a\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51faNarrativeBridge\uff0c\u5b83\u5305\u62ec\u4ee5\u4e0b\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\uff1a\uff081\uff09\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5c11\u91cf\u63d0\u793a\u751f\u6210\u7684\u65b0\u578b\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\uff08CTN\uff09\u5b57\u5e55\u57fa\u51c6\uff0c\u8be5\u57fa\u51c6\u660e\u786e\u5730\u5728\u89c6\u9891\u63cf\u8ff0\u4e2d\u7f16\u7801\u56e0\u679c\u5173\u7cfb\uff0c\u901a\u8fc7\u81ea\u52a8\u8bc4\u4f30\u786e\u4fdd\u8d28\u91cf\u548c\u76f8\u5173\u6027\uff1b\uff082\uff09\u4e00\u4e2a\u4e13\u95e8\u7684\u56e0\u679c\u7f51\u7edc\uff08CEN\uff09\u67b6\u6784\uff0c\u5177\u6709\u72ec\u7acb\u7684\u7f16\u7801\u5668\u4ee5\u5206\u522b\u6355\u83b7\u56e0\u679c\u52a8\u6001\uff0c\u4ece\u800c\u5b9e\u73b0\u6709\u6548\u7684\u5b66\u4e60\u548c\u751f\u6210\u5177\u6709\u56e0\u679c\u65f6\u95
f4\u53d9\u4e8b\u7684\u5b57\u5e55\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCEN\u5728\u8868\u8fbe\u89c6\u9891\u5185\u5bb9\u7684\u56e0\u679c\u548c\u65f6\u95f4\u65b9\u9762\u6bd4\u7b2c\u4e8c\u597d\u7684\u6a21\u578b\uff08GIT\uff09\u66f4\u51c6\u786e\uff1a\u5728MSVD\u548cMSR-VTT\u6570\u636e\u96c6\u4e0a\u7684CIDEr\u5206\u6570\u5206\u522b\u4e3a17.88\u548c17.44\u3002\u63d0\u51fa\u7684\u6846\u67b6\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u5177\u6709\u590d\u6742\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\u7ed3\u6784\u7684\u7ec6\u5fae\u6587\u672c\u63cf\u8ff0\uff0c\u8fd9\u662f\u89c6\u9891\u5b57\u5e55\u751f\u6210\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u6027\u3002\u6709\u5173\u9879\u76ee\u8be6\u60c5\uff0c\u8bf7\u8bbf\u95ee\u3002|\n", "2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474](http://arxiv.org/abs/2406.06474)|null|\u5728\u5065\u5eb7\u9886\u57df\uff0c\u5927\u90e8\u5206\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7814\u7a76\u96c6\u4e2d\u5728\u4e34\u5e8a\u4efb\u52a1\u4e0a\u3002\u7136\u800c\uff0c\u79fb\u52a8\u548c\u53ef\u7a7f\u6234\u8bbe\u5907\u63d0\u4f9b\u7684\u4e30\u5bcc\u3001\u957f\u671f\u7684\u4e2a\u4eba\u5065\u5eb7\u76d1\u6d4b\u6570\u636e\u5f80\u5f80\u88ab\u5ffd\u89c6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Large Language 
Model\uff08PH-LLM\uff09\u7684\u65b0\u6a21\u578b\uff0c\u5b83\u662fGemini\u7684\u5b9a\u5236\u7248\uff0c\u4e13\u4e3a\u7406\u89e3\u548c\u5904\u7406\u6570\u503c\u65f6\u95f4\u5e8f\u5217\u7684\u4e2a\u4eba\u5065\u5eb7\u6570\u636e\u800c\u8bbe\u8ba1\u3002\u6211\u4eec\u521b\u5efa\u5e76\u6574\u7406\u4e86\u4e09\u4e2a\u6d4b\u8bd5\u96c6\uff0c\u8003\u5bdf\u4e86PH-LLM\u5728\u4ee5\u4e0b\u65b9\u9762\u7684\u6027\u80fd\uff1a1\uff09\u4ece\u7761\u7720\u6a21\u5f0f\u3001\u8eab\u4f53\u6d3b\u52a8\u548c\u751f\u7406\u53cd\u5e94\u4e2d\u751f\u6210\u4e2a\u6027\u5316\u89c1\u89e3\u548c\u5efa\u8bae\uff1b2\uff09\u4e13\u4e1a\u77e5\u8bc6\u9886\u57df\u7684\u4e13\u5bb6\u6c34\u5e73\uff1b3\uff09\u9884\u6d4b\u81ea\u6211\u62a5\u544a\u7684\u7761\u7720\u7ed3\u679c\u3002\u6211\u4eec\u4e0e\u9886\u57df\u4e13\u5bb6\u5408\u4f5c\u6784\u5efa\u4e86857\u4e2a\u6848\u4f8b\u7814\u7a76\uff0c\u4ee5\u8bc4\u4f30\u5b9e\u9645\u7684\u7761\u7720\u548c\u5065\u8eab\u573a\u666f\u3002\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684\u8bc4\u5206\u6807\u51c6\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0Gemini Ultra 
1.0\u548cPH-LLM\u5728\u5065\u8eab\u65b9\u9762\u4e0e\u4e13\u5bb6\u8868\u73b0\u65e0\u7edf\u8ba1\u5dee\u5f02\uff0c\u5c3d\u7ba1\u5728\u7761\u7720\u65b9\u9762\u4e13\u5bb6\u4ecd\u5360\u4f18\u52bf\uff0c\u4f46Fine-tune\u540e\u7684PH-LLM\u5728\u5229\u7528\u76f8\u5173\u9886\u57df\u77e5\u8bc6\u548c\u4e2a\u4eba\u5316\u7761\u7720\u4fe1\u606f\u65b9\u9762\u8868\u73b0\u51fa\u663e\u8457\u63d0\u5347\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u591a\u9879\u9009\u62e9\u7684\u7761\u7720\u533b\u5b66\u548c\u5065\u8eab\u8003\u8bd5\u8bc4\u4f30\u4e86PH-LLM\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5176\u5f97\u5206\u5206\u522b\u4e3a79%\u548c88%\uff0c\u8d85\u8fc7\u4e86\u4eba\u7c7b\u4e13\u5bb6\u6837\u672c\u7684\u5e73\u5747\u5206\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bad\u7ec3PH-LLM\u9884\u6d4b\u6765\u81ea\u53ef\u7a7f\u6234\u8bbe\u5907\u6587\u672c\u548c\u591a\u6a21\u6001\u7f16\u7801\u6570\u636e\u7684\u81ea\u6211\u62a5\u544a\u7761\u7720\u8d28\u91cf\u7ed3\u679c\uff0c\u5e76\u8bc1\u660e\u4e86\u591a\u6a21\u6001\u7f16\u7801\u5bf9\u4e8e\u8fbe\u5230\u4e13\u95e8\u533a\u5206\u6a21\u578b\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5728\u4e2a\u4eba\u5065\u5eb7\u8fd9\u4e2a\u5173\u952e\u5b89\u5168\u9886\u57df\u8fd8\u9700\u8981\u8fdb\u4e00\u6b65\u53d1\u5c55\u548c\u8bc4\u4f30\uff0c\u4f46\u8fd9\u4e9b\u7ed3\u679c\u5c55\u793a\u4e86Gemini\u6a21\u578b\u7684\u5e7f\u6cdb\u77e5\u8bc6\u548c\u80fd\u529b\uff0c\u4ee5\u53ca\u5c06\u751f\u7406\u6570\u636e\u5e94\u7528\u4e8e\u4e2a\u4eba\u5065\u5eb7\u5e94\u7528\uff0c\u5982PH-LLM\u4e2d\u7684\u505a\u6cd5\u3002|\n", "2406.06465": "|**2024-06-10**|**AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction**|Zhen Xing 
et.al.|[2406.06465](http://arxiv.org/abs/2406.06465)|null|\u6587\u672c\u5f15\u5bfc\u7684\u89c6\u9891\u9884\u6d4b\uff08TVP\uff09\u4efb\u52a1\u65e8\u5728\u6839\u636e\u521d\u59cb\u5e27\u548c\u6307\u4ee4\u9884\u6d4b\u540e\u7eed\u5e27\u7684\u8fd0\u52a8\uff0c\u8fd9\u5bf9\u4e8e\u865a\u62df\u73b0\u5b9e\u3001\u673a\u5668\u4eba\u6280\u672f\u548c\u5185\u5bb9\u521b\u4f5c\u7b49\u9886\u57df\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6539\u7f16Stable Diffusion\u5728\u8be5\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5728\u5e27\u4e00\u81f4\u6027\u4e0e\u65f6\u95f4\u7a33\u5b9a\u6027\u65b9\u9762\u4ecd\u5b58\u5728\u95ee\u9898\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u89c6\u9891\u6570\u636e\u96c6\u7684\u89c4\u6a21\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u9884\u8bad\u7ec3\u7684Image2Video\u6269\u6563\u6a21\u578b\u5bf9\u89c6\u9891\u52a8\u6001\u6709\u826f\u597d\u7684\u5148\u9a8c\u77e5\u8bc6\uff0c\u4f46\u7f3a\u4e4f\u6587\u672c\u63a7\u5236\u3002\u56e0\u6b64\uff0c\u5c06Image2Video\u6a21\u578b\u8f6c\u79fb\uff0c\u540c\u65f6\u6ce8\u5165\u6307\u4ee4\u63a7\u5236\u4ee5\u751f\u6210\u53ef\u63a7\u5236\u7684\u89c6\u9891\uff0c\u65e2\u5177\u6709\u610f\u4e49\u53c8\u9887\u5177\u6311\u6218\u3002 
To this end, we introduce a multimodal large language model (MLLM) that predicts future video states from the initial frame and the text instruction. Specifically, we design a dual-query transformer (DQFormer) architecture that integrates instruction and frame information into conditional embeddings for future-frame prediction. We also develop long-short term temporal adapters and spatial adapters that quickly adapt general video diffusion models to specific scenarios at low training cost. Experiments show that our method significantly outperforms the state of the art on four datasets: Something Something V2, Epic Kitchen-100, Bridge Data, and UCF-101. Notably, AID achieves FVD improvements of 91.2% and 55.5% on Bridge and SSv2, respectively, demonstrating its effectiveness across domains. More examples can be found on our website.|\n", "2406.06464": "|**2024-06-10**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|Despite the growing popularity of wearable health trackers and the well-established importance of sleep and exercise to health, deriving actionable, personalized insights from wearable data remains a challenge because it requires unstructured analysis of large amounts of data. The rise of large language models (LLMs), which can use tools to reason about and interact with the world, offers a promising way to enable such personalized analysis at scale. Yet LLM applications in personal health remain largely undeveloped. This paper introduces the Personal Health Insights Agent (PHIA), a system that uses state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data. We curate two benchmark question-answering datasets of over 4000 health insight questions. Based on 650 hours of human and expert evaluation, PHIA accurately answers over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data and paving the way for a new era of personalized 
wellness regimens guided by data-driven insights, making health care more accessible and personalized.|\n", "2406.06461": "|**2024-06-11**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461](http://arxiv.org/abs/2406.06461)|null|This paper points out that, although a wide range of reasoning strategies has been proposed to elicit the capabilities of large language models, traditional evaluations focus only on performance metrics and miss a key factor: the gains that come from additional compute. Overlooking this can give a one-sided view of how efficient a strategy is. The paper therefore proposes a framework that incorporates the compute budget into evaluation, providing a more comprehensive comparison that accounts for both performance and computational cost. From this budget-aware perspective, the study finds that complex reasoning strategies often outperform simple baselines not because of significant algorithmic innovation but because of the larger compute budget allocated to them. For example, when chain-of-thought 
self-consistency is given a comparable level of compute, it frequently outperforms the reasoning strategies proposed in the literature. From this scale-aware perspective, however, some strategies such as multi-agent debate or multi-round reflection can perform worse as the compute budget increases.|\n", "2406.06458": "|**2024-06-10**|**Evaluating the Retrieval Component in LLM-Based Question Answering Systems**|Ashkan Alinejad et.al.|[2406.06458](http://arxiv.org/abs/2406.06458)|null|Question answering systems driven by large language models (LLMs) rely on a retrieval component to obtain domain-specific information and to reduce the risk of inaccurate or erroneous responses. Although evaluation methods have long existed in information retrieval, assessing retriever performance in LLM-based chatbots remains a challenge. This study proposes a straightforward baseline for evaluating retrievers in chatbots based on Retrieval-Augmented Generation (RAG). 
Our findings show that this evaluation gives a more complete picture of retriever performance and aligns better with the overall performance of the QA system. Conventional metrics such as precision, recall, and F1 score may not fully capture what LLMs can do, since an LLM can still produce accurate answers when the retriever is imperfect. Our method instead takes into account the ability of LLMs to ignore irrelevant context, as well as potential errors and hallucinations.|\n", "2406.06455": "|**2024-06-10**|**A Large Language Model Pipeline for Breast Cancer Oncology**|Tristen Pool 
et.al.|[2406.06455](http://arxiv.org/abs/2406.06455)|null|Large language models have shown potential for innovation in many fields, but their application to cancer treatment still needs further development. The researchers fine-tuned state-of-the-art OpenAI models using a novel Langchain prompt engineering pipeline on a dataset of clinical data and clinical guideline text, focusing on two key treatment factors for breast cancer patients: adjuvant radiation therapy and chemotherapy. The models reached high accuracy (0.85+) in classifying these two treatments. A confidence interval built from observational data on the quality of treatment by human oncologists estimates that the proportion of cases in which the model must outperform the original oncologist in its treatment prediction, in order to be a better solution overall, is 8.2% to 13.3%. Because outcomes of cancer treatment decisions are uncertain, future clinical trials may be needed to validate this threshold. Given that 85% of U.S. cancer patients are treated at local community facilities, models of this kind could substantially expand access to quality care, with outcomes at least close to those of human oncologists.|\n", "2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The 
Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451](http://arxiv.org/abs/2406.06451)|null|The ability of large language models (LLMs) to generate, debug, and explain code has drawn the attention of many researchers and educators in undergraduate programming education, who expect these models to transform how programming is taught. However, decisions about how and why to use LLMs in programming education may rest on more than technical assessment. Using the social shaping of technology theory as a guiding framework, this study examines how students' social perceptions of LLMs influence their usage. We analyzed an anonymous end-of-course survey (n=158), a mid-course self-efficacy survey (n=158), in-depth interviews with 10 students, self-reported LLM usage on homework, and midterm exam scores. We found that students' LLM use was associated with their expectations for their future careers and their perceptions of peer usage. In addition, early self-reported LLM usage correlated with lower self-efficacy and lower midterm scores, while students' perceived over-reliance on LLMs, rather than their actual usage, was associated with a decline in self-efficacy 
later in the course.|\n", "2406.07545": "|**2024-06-11**|**Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena**|Aidar Myrzakhan et.al.|[2406.07545](http://arxiv.org/abs/2406.07545)|**[link](https://github.com/vila-lab/open-llm-leaderboard)**|**Multiple-choice questions (MCQs) are commonly used to evaluate large language models (LLMs). Typically, an LLM selects the most probable answer after the probabilities are adjusted for factors such as length. However, LLMs may have inherent biases, for example a preference for particular option IDs such as A, B, C, or D, which can affect the predicted answer. Previous work has tried to reduce this \u201cselection bias\u201d by randomly permuting the options on a few test samples and applying the result to new ones. Another problem with MCQs is \u201clottery-ticket guessing\u201d: the LLM has not actually learned the knowledge but guesses the answer correctly by chance, which is especially serious for small LLMs. 
To address these problems, a more thorough approach is to move to open-style questions, which fundamentally eliminates selection bias and random guessing. Moving to open-style questions brings its own challenges: (1) identifying suitable open-style questions, and (2) validating the correctness of LLM responses to open-style questions against human-annotated ground truth. This work aims to tackle these difficulties and establish a new LLM evaluation benchmark based entirely on open-style questions, covering models such as GPT-4o/4/3.5, Claude 3, and Gemini. We introduce the Open-LLM-Leaderboard, a new evaluation platform that tracks the performance of various LLMs and reveals their true capabilities. Our code and dataset are open source and available at https://github.com/VILA-Lab/Open-LLM-Leaderboard.**|\n", "2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li 
et.al.|[2406.07528](http://arxiv.org/abs/2406.07528)|**[link](https://github.com/dvlab-research/q-llm)**|**The ability of large language models (LLMs) to understand and reason over long sequences is crucial for progress in many fields. However, they still struggle to capture long-range dependencies within a sequence well enough to understand its semantics in depth. To address this, we propose Query-aware Inference for LLMs (Q-LLM), a system designed to process extensive sequences in a way that resembles human cognition. By focusing on the memory data relevant to a given query, Q-LLM accurately captures the pertinent information within a fixed window size and provides precise answers to queries. It requires no additional training and can be integrated seamlessly into any LLM. Q-LLM using LLaMA3 (QuickLLaMA) can read Harry Potter within 30 seconds and accurately answer questions about it. Compared with the current state of the art, Q-LLM improves by 7.17% on LLaMA3 and by 3.26% on Mistral on $\\infty$-bench. On the Needle-in-a-Haystack task, evaluated on widely recognized benchmarks, Q-LLM improves on the current best result by 7.0% on Mistral and reaches 100% accuracy on LLaMA3. Our code is open source at 
https://github.com/dvlab-research/Q-LLM.**|\n", "2406.07515": "|**2024-06-11**|**Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement**|Yunzhen Feng et.al.|[2406.07515](http://arxiv.org/abs/2406.07515)|null|Synthesized data from generative models is increasingly used to fine-tune large language models, which has raised concerns about model collapse, that is, a drop in the fine-tuned model's performance. Since both humans and machines find it easier to tell good examples from bad ones than to generate high-quality examples, we investigate how feedback can be used to prevent model collapse on synthesized data. We theoretically analyze the optimal performance of a Gaussian mixture classification model trained on feedback-augmented synthesized data and provide experimental evidence for the finite-sample regime. We illustrate these theoretical predictions on two practical problems, computing matrix eigenvalues with transformers and news summarization with large language models, both of which undergo model collapse when trained on model-generated data. We find that training on feedback-augmented synthesized data, whether by pruning incorrect predictions or by selecting the best of several guesses, prevents model collapse, which supports the effectiveness of popular approaches such as RLHF (Reinforcement Learning from Human 
Feedback).|\n", "2406.07505": "|**2024-06-11**|**THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report**|KBTG Labs et.al.|[2406.07505](http://arxiv.org/abs/2406.07505)|null|Recent advances in large language models (LLMs) have revealed new capabilities and opportunities across the technology landscape. However, the practical use of very large LLMs is constrained by their high compute cost, which is hard to justify given their limited capabilities relative to humans. Smaller, more practical LLMs have shown potential in financial analysis, but they are not yet fully proficient, as shown by their near-passing performance on mock Chartered Financial Analyst (CFA) exams. In this work, we present the Financial Analyst Extension (FAE) to our Text Hyperlocally Augmented Large Language Extension (THaLLE), a series of 8B-parameter LLMs that consistently achieve the highest performance on mock CFA exams among models of comparable size. We document the fine-tuning techniques used in detail to support future research. We also introduce Flare CFA, a publicly available dataset for evaluating LLMs as financial advisors.|\n", 
"2406.07502": "|**2024-06-11**|**Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions**|Renjie Pi et.al.|[2406.07502](http://arxiv.org/abs/2406.07502)|**[link](https://github.com/sterzhang/image-textualization)**|**Image description datasets are essential for advancing applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, these datasets come mainly from two sources. The first is scraping image-text pairs from the web, but such descriptions are often low in quality and noisy. The second is human annotation, as in COCO, where descriptions are usually short and lack detail. Detailed image descriptions can be obtained through human annotation, but the high annotation cost limits its feasibility. These limitations motivate the search for more efficient and scalable ways to generate accurate and detailed image descriptions. This paper proposes an innovative framework called Image Textualization (IT), which automatically produces high-quality image descriptions by using existing multimodal large language 
models (MLLMs) together with vision expert models to convert visual information into text. To address the current lack of benchmarks for detailed descriptions, we also propose several benchmarks for comprehensively evaluating the quality of the image descriptions our framework produces. Furthermore, we show that LLaVA-7B, when trained on IT-curated descriptions, becomes better at generating image descriptions: its outputs are richer, with substantially greater length and detail and fewer hallucinations.**|\n", "2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul 
et.al.|[2406.07496](http://arxiv.org/abs/2406.07496)|**[link](https://github.com/zou-group/textgrad)**|**AI is undergoing a paradigm shift, with breakthroughs achieved by systems that orchestrate multiple large language models (LLMs) and other complex components. As a result, developing principled, automated optimization methods for compound AI systems has become a key new challenge. Neural networks faced a similar problem in their early days, until backpropagation and automatic differentiation transformed the field. Inspired by this, we introduce TextGrad, a powerful framework that performs automatic \u201cdifferentiation\u201d via text, backpropagating the rich, general natural-language suggestions provided by LLMs to the individual components of a compound AI system. TextGrad follows PyTorch's syntax and abstractions and is flexible and easy to use: users only provide the objective function, without tuning the framework's components or prompts. 
TextGrad\u9002\u7528\u4e8e\u591a\u79cd\u4efb\u52a1\uff0c\u4ece\u95ee\u7b54\u548c\u5206\u5b50\u4f18\u5316\u5230\u653e\u5c04\u6cbb\u7597\u8ba1\u5212\u8bbe\u8ba1\u3002\u5728\u65e0\u9700\u4fee\u6539\u6846\u67b6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u663e\u8457\u63d0\u5347\u4e86GPT-4o\u5728Google\u8bc1\u660e\u6027\u95ee\u9898\u56de\u7b54\u4e2d\u7684\u96f6-shot\u51c6\u786e\u7387\uff0c\u4ece51%\u63d0\u5347\u81f355%\uff1b\u5728\u4f18\u5316LeetCode\u96be\u9898\u89e3\u6cd5\u4e0a\u5b9e\u73b0\u4e8620%\u7684\u76f8\u5bf9\u6027\u80fd\u63d0\u5347\uff1b\u6539\u8fdb\u4e86\u63a8\u7406\u63d0\u793a\uff0c\u8bbe\u8ba1\u51fa\u5177\u6709\u7406\u60f3\u4f53\u5916\u4eb2\u548c\u529b\u7684\u65b0\u836f\u5019\u9009\u5206\u5b50\uff1b\u4ee5\u53ca\u8bbe\u8ba1\u51fa\u5177\u6709\u9ad8\u7279\u5f02\u6027\u7684\u653e\u5c04\u6cbb\u7597\u65b9\u6848\u3002TextGrad\u4e3a\u4e0b\u4e00\u4ee3AI\u7cfb\u7edf\u7684\u53d1\u5c55\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63a8\u52a8\u4e86\u590d\u5408AI\u6280\u672f\u7684\u52a0\u901f\u53d1\u5c55\u3002**|\n", "2406.07494": "|**2024-06-12**|**CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization**|Frederic Kirstein 
et.al.|[2406.07494](http://arxiv.org/abs/2406.07494)|null|This article reviews 1,262 unique research papers published between 2019 and 2024, focusing on Transformer-based approaches to abstractive summarization of English dialogues. It examines the main challenges in dialogue summarization (language, structure, comprehension, speaker, salience, and factuality) and links them to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies. Although some challenges, such as language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and offer ample room for research. 
The article also analyzes how these approaches are evaluated, covering the datasets commonly used for dialogue subdomains (e.g., meetings, medical), as well as established automatic metrics (e.g., ROUGE) and common human evaluation practices. It finds that only a few datasets span multiple subdomains, and that reported human evaluations often lack sufficient detail on annotator agreement and annotation guidelines. Finally, it discusses the possible implications of recently explored large language models, concluding that although they may change the relevance and difficulty of individual challenges, the described challenge taxonomy remains valuable.|\n", "2406.07485": "|**2024-06-11**|**PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction**|Adnan Abbas 
et.al.|[2406.07485](http://arxiv.org/abs/2406.07485)|null|Effective planning is crucial for productivity and mental well-being, yet people often struggle to make realistic plans and to reflect on their productivity. Building on advances in AI, conversational agents have emerged as a promising tool for externalizing plans through conversation, reinforcing commitment, and fostering focused action, thereby positively affecting productivity and mental well-being. Our research aims to design a conversational agent that draws on the social interactivity of natural conversation to offer probing questions and reflection prompts that improve plan adherence. Although prior work has shown the benefits of such agents, many interventions remain static, which can reduce user engagement over time. To address this limitation, we propose a novel rotation and context-aware prompting strategy that gives users varied interventions every day. Our system, PITCH, uses large language models (LLMs) to facilitate the externalization of, and reflection on, daily plans. This study investigates how externalizing tasks with a conversational agent affects productivity and mental well-being, and how effective the rotation strategy is at sustaining user engagement.|\n", "2406.07483": "|**2024-06-11**|**Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing**|Mao Li et.al.|[2406.07483](http://arxiv.org/abs/2406.07483)|null|In the rapidly evolving field of natural language processing, there is strong interest in using large language models (LLMs) for automated text annotation of social media posts. This paper examines the performance of eight open-source and proprietary LLMs on stance annotation, benchmarking them against human (crowdsourced) judgments. We also explore the conditions under which LLMs are likely to disagree with human judgment. We find that how explicitly a text expresses its stance is critical to whether LLM judgments agree with human ones. LLMs perform well where human annotators do, and LLM failures often correspond to cases in which human annotators struggle to reach agreement. We therefore recommend a comprehensive approach that combines the precision of human expertise with the scalability of LLM predictions. The study highlights the need to improve the accuracy and comprehensiveness of automated stance detection, with the aim of advancing these technologies for more efficient and unbiased social media analysis.|\n", "2406.07476": "|**2024-06-11**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476](http://arxiv.org/abs/2406.07476)|**[link](https://github.com/damo-nlp-sg/videollama2)**|**This paper presents VideoLLaMA 2, a set of video large language models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks. Building on its predecessor, it adds a tailor-made Spatial-Temporal Convolution (STC) connector that effectively captures the intricate spatial and temporal dynamics of video data. In addition, we integrate an audio branch through joint training, enriching the model's multimodal understanding by seamlessly incorporating audio cues. In evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC), VideoLLaMA 2 achieves competitive results among open-source models and approaches some proprietary models on several benchmarks. On audio-only (AQA) and audio-video question answering (OE-AVQA) tasks, VideoLLaMA 2 also shows reasonable improvements over existing models. These advances underline the strong multimodal comprehension of VideoLLaMA 2 and set a new standard for intelligent video analysis systems. All models are publicly released to facilitate further research.**|\n", "2406.08477": "|**2024-06-12**|**Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens**|Ting-Ji Huang et.al.|[2406.08477](http://arxiv.org/abs/2406.08477)|null|In recommender systems, representing users and items as vectors is essential for a variety of tasks. Recent work has tried applying large language models (LLMs) to recommendation in question-answering form, using in-vocabulary tokens (e.g., \"item\", \"20\", \"24\") to represent real users and items. However, because LLMs are typically pretrained on natural-language tasks, these in-vocabulary tokens have limited capacity to express distinct users and items, which weakens recommendation performance even after fine-tuning on recommendation tasks. This paper explores how to tokenize users and items effectively in LLM-based recommender systems. 
We emphasize the role of out-of-vocabulary (OOV) tokens, which, alongside in-vocabulary ones, can capture both the correlations and the diversity among users and items. Using representations learned from historical user-item interactions, we make user/item combinations with similar properties share the same OOV tokens. Integrating these OOV tokens into the LLM's vocabulary also helps distinguish users and items more clearly and improves the capture of user-item relationships during fine-tuning on downstream tasks. Our proposed framework outperforms existing state-of-the-art methods on a variety of downstream recommendation tasks.|\n", "2406.08474": "|**2024-06-12**|**Real2Code: Reconstruct Articulated Objects via Code Generation**|Zhao Mandi 
et.al.|[2406.08474](http://arxiv.org/abs/2406.08474)|null|We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts as oriented bounding boxes and feed them to a fine-tuned large language model (LLM), which predicts the joint articulation as code. By leveraging pretrained vision and language models, our approach scales gracefully with the number of articulated parts and generalizes from synthetic training data to real-world objects in unstructured environments. Experiments show that Real2Code significantly outperforms the previous state of the art in reconstruction accuracy and is the first approach to extrapolate beyond the structural complexity of objects in the training set, reconstructing objects with up to 10 articulated parts. When combined with a stereo reconstruction model, Real2Code also generalizes to real-world objects from a handful of multi-view RGB images, without requiring depth or camera information.|\n", "2406.08464": "|**2024-06-12**|**Magpie: Alignment Data Synthesis from Scratch by 
Prompting Aligned LLMs with Nothing**|Zhangchen Xu et.al.|[2406.08464](http://arxiv.org/abs/2406.08464)|**[link](https://github.com/magpie-align/magpie)**|High-quality instruction data is critical for aligning large language models. Although some models, such as Llama-3-Instruct, have open weights, their alignment data remains private, which hinders the democratization of AI. Existing open-source data creation methods are limited by high human labor costs and a narrow scope of prompts, so they are hard to scale effectively, which may limit the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method called Magpie. Our key observation is that aligned LLMs such as Llama-3-Instruct, being auto-regressive, can generate a user query when given only the left-side template up to the position reserved for user messages. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We comprehensively analyze the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base on each dataset and evaluate the resulting models. The results show that on some tasks, models fine-tuned only on Magpie data perform comparably to the official Llama-3-8B-Instruct, which was enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass public datasets previously used for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.|\n", "2406.08434": "|**2024-06-12**|**TasTe: Teaching Large Language Models to Translate through Self-Reflection**|Yutong Wang et.al.|[2406.08434](http://arxiv.org/abs/2406.08434)|**[link](https://github.com/yutongwang1216/reflectionllmmt)**|**Large language models (LLMs) have shown remarkable performance on natural language processing tasks, and instruction tuning in particular improves their performance on downstream tasks such as machine translation (MT). However, these methods have not reached translation quality comparable to supervised neural machine translation (NMT) systems. One plausible reason is that the simple prompts currently in use do not fully exploit the models' instruction-following abilities. We therefore propose the TasTe framework, which stands for translating through self-reflection. The framework comprises two inference stages: in the first, the model is instructed to generate a preliminary translation and simultaneously assess it; in the second, the model refines the preliminary translation according to that assessment. On four language directions of the WMT22 benchmark, our approach proves effective compared with existing methods. This work presents a promising way to unleash the potential of LLMs and improve their machine translation performance. The code and data at https://github.com/YutongWang1216/ReflectionLLMMT are open source.**|\n", "2406.08426": "|**2024-06-12**|**Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL**|Zijin Hong 
et.al.|[2406.08426](http://arxiv.org/abs/2406.08426)|null|Generating accurate SQL in response to natural-language questions (text-to-SQL) is a long-standing challenge that involves several complex steps, including understanding the user question, understanding the database schema, and generating the SQL. Conventional text-to-SQL systems rely on human engineering and deep neural networks. Later, pretrained language models (PLMs) were developed and applied to the task, bringing significant performance gains. However, as databases grow more complex and user questions more difficult, the limited comprehension of PLMs can lead to incorrect SQL; this calls for more sophisticated, tailored optimization methods, which in turn restricts the broad application of PLM-based systems. Recently, large language models (LLMs) have attracted great attention for their strong natural-language understanding. Integrating LLM-based implementations therefore brings unique opportunities, challenges, and solutions to text-to-SQL research. This survey gives a comprehensive overview of LLM-based text-to-SQL. We first outline the current challenges and the evolution of text-to-SQL. We then describe in detail the datasets and metrics used to evaluate text-to-SQL systems. Next, we systematically analyze recent advances in LLM-based text-to-SQL. Finally, we discuss the challenges that remain in the field and our expectations for future research directions.|\n", "2406.08418": "|**2024-06-12**|**OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text**|Qingyun Li et.al.|[2406.08418](http://arxiv.org/abs/2406.08418)|**[link](https://github.com/opengvlab/omnicorpus)**|**This paper introduces OmniCorpus, a large-scale image-text interleaved dataset at the 10-billion level. An efficient data engine filters and extracts a large volume of high-quality documents containing 8.6 billion images and 1,696 billion text tokens. Compared with counterparts such as MMC4 and OBELICS, OmniCorpus has the following advantages: 1) it is 15 times larger in scale while maintaining good data quality; 2) its sources are more diverse, including English and non-English websites as well as video-centric websites; 3) it is more flexible, as it can easily be converted from the image-text interleaved format into a pure text corpus or into image-text pairs. Through comprehensive analysis and experiments, the paper validates the quality, usability, and effectiveness of OmniCorpus, aiming to provide a solid data foundation for future multimodal model research. The code and data at https://github.com/OpenGVLab/OmniCorpus are publicly released.**|\n", "2406.08414": "|**2024-06-12**|**Discovering Preference Optimization Algorithms with and for Large Language Models**|Chris Lu et.al.|[2406.08414](http://arxiv.org/abs/2406.08414)|**[link](https://github.com/luchris429/DiscoPOP)**|**Offline preference optimization is a key method for improving and controlling the quality of large language model (LLM) outputs. Typically, preference optimization is approached as an offline supervised learning task with manually crafted convex loss functions. These methods, however, are constrained by human creativity and leave the large search space of possible loss functions under-explored. We address this with LLM-driven objective discovery, which automatically discovers new state-of-the-art preference optimization algorithms without (expert) human intervention. Specifically, we iteratively prompt an LLM to propose and implement new preference optimization loss functions based on previously evaluated performance metrics. This process leads to the discovery of previously unknown, performant preference optimization algorithms. The best performing of these we call Discovered Preference Optimization (DiscoPOP), a novel algorithm that blends logistic and exponential losses. Experiments show that DiscoPOP achieves state-of-the-art performance and transfers successfully to held-out tasks.**|\n", "2406.08413": "|**2024-06-12**|**Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference**|Christopher Wolters et.al.|[2406.08413](http://arxiv.org/abs/2406.08413)|null|Large language models (LLMs) have recently transformed natural language processing, enabling machines to generate human-like text and hold meaningful conversations. As compute and memory demands grow sharply, especially once LLMs exceed the capacity of a single GPU, so does the need for speed, efficiency, and accessibility in LLM inference. Meanwhile, progress in compute capability and memory capacity has not kept pace, a problem made worse by the slowing of Moore's law. Memory accesses are far more expensive than computation, which poses a challenge for efficient scaling known as the memory wall. Against this backdrop, compute-in-memory (CIM) technologies offer a promising way to accelerate AI inference by performing analog computations directly in memory, potentially reducing latency and power consumption. By tightly integrating memory and compute elements, CIM eliminates the von Neumann bottleneck, reduces data movement, and improves energy efficiency. 
This survey gives an overview of transformer-based models, reviews various CIM architectures, and examines how they can address the pressing challenges of modern AI computing systems. We discuss transformer-related operators and their hardware acceleration schemes in detail, and highlight challenges, trends, and insights in the corresponding CIM designs.|\n", "2406.08402": "|**2024-06-12**|**Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models**|Chun-Yi Kuan et.al.|[2406.08402](http://arxiv.org/abs/2406.08402)|**[link](https://github.com/kuan2jiu99/audio-hallucination)**|**
Large audio-language models (LALMs) extend conventional large language models with audio perception, enabling them to handle audio-related tasks. Previous research has focused mainly on assessing the performance of LALMs across tasks and has paid little attention to their reliability, particularly with respect to issues such as object hallucination. In this study, we introduce methods to assess the extent of object hallucination in publicly available LALMs. Our results show that LALMs are comparable to specialized audio captioning models at understanding audio content, but struggle to answer discriminative questions, especially those that require identifying the sound of a specific object within an audio clip. This reveals a critical weakness of current LALMs: their inadequate understanding of discriminative queries. We also explore how prompt engineering can improve LALM performance on discriminative questions.**|\n", "2406.08398": "|**2024-06-12**|**cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers**|Anirudh Sundar et.al.|[2406.08398](http://arxiv.org/abs/2406.08398)|null|
An emerging area of research in situated and multimodal interactive conversations (SIMMC) is interaction with scientific papers. Because scientific papers are composed primarily of text, equations, figures, and tables, SIMMC methods must be designed specifically for each of these components to support the depth of inquiry and interaction that research scientists require. This work introduces Conversational Papers (cPAPERS), a dataset of question-answer pairs drawn from reviews of academic papers, grounded in these paper components and their associated references from scientific documents available on arXiv. We present a data collection strategy that gathers these question-answer pairs from OpenReview and associates them with contextual information from the LaTeX source files. We also present a series of baseline approaches that use large language models (LLMs), in both zero-shot and fine-tuned configurations, to address the cPAPERS dataset.|\n", "2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz 
et.al.|[2406.09418](http://arxiv.org/abs/2406.09418)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|**Building on advances in language models, large multimodal models (LMMs) have brought significant improvements in video understanding. However, current video LMMs rely on either image or video encoders to process visual inputs, and each has its own limitations. Image encoders excel at capturing rich spatial detail from frame sequences but lack explicit temporal context, while video encoders provide temporal context but are often limited by computational constraints to processing sparse frames at low resolution, which reduces contextual and spatial understanding. We therefore introduce VideoGPT+, which combines the complementary strengths of an image encoder (for detailed spatial understanding) and a video encoder (for global temporal context modeling). The model processes videos by dividing them into smaller segments and applies an adaptive pooling strategy to the features extracted by both encoders. Our architecture performs well on multiple video benchmarks, including VCGBench, MVBench, and zero-shot question answering. In addition, we develop a 112K video-instruction set using a novel semi-automatic annotation pipeline, which further improves model performance. 
To evaluate video LMMs comprehensively, we also present VCGBench-Diverse, which covers 18 broad video categories such as lifestyle, sports, science, gaming, and surveillance videos and contains 4,354 question-answer pairs. The benchmark assesses the generalization of existing LMMs on dense video captioning, spatial and temporal understanding, and complex reasoning, ensuring comprehensive assessment across diverse video types and dynamics. The code at https://github.com/mbzuai-oryx/VideoGPT-plus is publicly available.**|\n", "2406.09412": "|**2024-06-13**|**Explore the Limits of Omni-modal Pretraining at Scale**|Yiyuan Zhang 
et.al.|[2406.09412](http://arxiv.org/abs/2406.09412)|**[link](https://github.com/invictus717/MiCo)**|**We propose to build omni-modal intelligence that can understand any modality and learn universal representations. To this end, we propose a scalable pretraining paradigm called Multimodal Context (MiCo), which scales up the number of modalities, the amount of data, and the number of model parameters together during pretraining. With MiCo, the pretrained models show strong multimodal learning ability across three groups of tasks: (i) single-modality perception benchmarks covering 10 different modalities, (ii) 25 cross-modal understanding tasks including retrieval, question answering, and captioning, and (iii) 18 multimodal large language model benchmarks. Our models set 37 new state-of-the-art records. We hope this work will advance omni-modal intelligence. The code and models have been open-sourced.**|\n", 
"2406.09397": "|**2024-06-13**|**Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang 
et.al.|[2406.09397](http://arxiv.org/abs/2406.09397)|null|Modern vision models are trained on very large, noisy datasets. Although they show strong capabilities, they may fail to follow user intent in areas such as visual aesthetics, preferred style, and responsible output. This paper focuses on visual aesthetics and aims to align vision models with human aesthetic standards in retrieval systems. Advanced retrieval systems usually use aesthetic models built on low-level features (such as saturation) as re-rankers or filters, but these perform poorly when style, culture, or background knowledge is involved. We find that the reasoning ability of large language models (LLMs) can close this gap by rephrasing the search query and extending the aesthetic expectations. We therefore propose a preference-based reinforcement learning method that fine-tunes vision models to distill knowledge from both LLM reasoning and aesthetic models, so that the vision models align better with human aesthetics. Because no benchmark is designed specifically for evaluating retrieval systems, we use strong large multimodal models (LMMs) to evaluate aesthetic performance. Since aesthetic assessment is subjective, we also propose a new dataset named HPIR to measure alignment with human aesthetics. Experiments show that our method significantly improves the aesthetic behavior of vision models across several metrics. We believe the proposed algorithm can serve as a general practice for aligning vision models with human values.|\n", 
"2406.09396": "|**2024-06-13**|**Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA**|Jongwoo Park 
et.al.|[2406.09396](http://arxiv.org/abs/2406.09396)|**[link](https://github.com/jongwoopark7978/LVNet)**|Long-form videos span long time intervals, are highly redundant, and contain multiple loosely related events or entities. For long-form video question answering (LVQA), a small subset of frames is therefore often enough to supply all the information needed for a correct answer. Recent work uses large language models (LLMs) to achieve strong results on LVQA benchmarks, but relies on vision language models (VLMs) to convert all visual content in the video into natural language. The usual practice is to sample a large number of frames uniformly and caption each one independently, which is both inefficient and redundant. To address this, we explore keyframe selection and sequence-aware captioning to reduce this redundancy substantially. We propose two novel components: a hierarchical keyframe selector and a sequential vision language model. Our final framework, LVNet, achieves state-of-the-art performance on three benchmark LVQA datasets. We will release our code.|\n", 
"2406.09367": "|**2024-06-13**|**Needle In A Video Haystack: A Scalable Synthetic Framework for 
Benchmarking Video MLLMs**|Zijia Zhao et.al.|[2406.09367](http://arxiv.org/abs/2406.09367)|**[link](https://github.com/joez17/videoniah)**|**Video understanding is a crucial next step for multimodal large language models (MLLMs). To probe specific aspects of video understanding, existing video benchmarks typically require careful selection of videos that match the target capability, along with laborious annotation of query-response pairs to match the video content. This process is both challenging and resource-intensive. This paper proposes VideoNIAH (Video Needle In A Haystack), a benchmark construction framework based on synthetic video generation. VideoNIAH decouples test video content from the query-responses by inserting unrelated image/text \u201cneedles\u201d into original videos. Annotations are generated from the needles alone, which ensures diverse video sources and a rich variety of query-responses. In addition, by inserting multiple needles, VideoNIAH rigorously evaluates a model's temporal understanding. We use VideoNIAH to build a video benchmark, VNBench, with tasks such as retrieval, ordering, and counting. VNBench efficiently evaluates a video model's fine-grained understanding and spatio-temporal modeling ability, and also supports long-range dependency evaluation. We also evaluate recent video-centric MLLMs, both open-source and proprietary, and provide a comprehensive analysis. Although proprietary models hold a significant advantage over open-source ones, all existing video models still perform poorly on long-range dependency tasks. VideoNIAH is a simple and highly scalable benchmark construction framework, and we believe it will inspire future work on video benchmarks. Code and data are available at https://github.com/joez17/VideoNIAH.**|\n", 
"2406.09363": "|**2024-06-13**|**ElicitationGPT: Text Elicitation Mechanisms via Language Models**|Yifan Wu et.al.|[2406.09363](http://arxiv.org/abs/2406.09363)|null|This paper studies how to score elicited text predictions against the ground-truth state using domain-knowledge-free queries to a large language model (such as ChatGPT). Such scoring is a key component of incentivizing information elicitation and of training machine learning models. Through experiments on a peer-review dataset, the paper compares the automated model scores with scores given by human instructors, to empirically evaluate how well these mechanisms align with human preferences.|\n", 
"2406.09345": "|**2024-06-13**|**DiscreteSLU: A Large Language Model with 
Self-Supervised Discrete Speech Units for Spoken Language Understanding**|Suwon Shon et.al.|[2406.09345](http://arxiv.org/abs/2406.09345)|null|Integrating pretrained text-based large language models (LLMs) with speech input has enabled them to perform diverse speech tasks, including instruction following. This integration requires a speech encoder, a speech adapter, and an LLM, each trained on different tasks. We propose using discrete speech units (DSU) rather than continuous-valued speech encoder outputs; the speech adapter converts the DSU into the LLM embedding space. We generate DSU with a self-supervised speech encoder followed by k-means clustering. The proposed model performs robustly on instruction-following tasks with speech from seen and unseen domains and in spoken question answering. We also study DSU extracted from different layers of the self-supervised speech encoder, as well as from Mel-frequency cepstral coefficients (MFCC). Our results suggest that the ASR task and its datasets may matter little for instruction tuning on spoken question answering.|\n", 
"2406.09325": "|**2024-06-13**|**REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space**|Tomer Ashuach 
et.al.|[2406.09325](http://arxiv.org/abs/2406.09325)|null|Large language models (LLMs) may inadvertently memorize and leak sensitive or personally identifiable information (PII) from their training data, raising privacy concerns. Current solutions include costly data scrubbing, or model filtering through unlearning and model editing, but these can be bypassed by extraction attacks. We propose REVS, a novel model editing method for unlearning sensitive information from LLMs. REVS identifies and modifies a small subset of neurons relevant to each piece of sensitive information. By projecting these neurons to the vocabulary space (unembedding), we pinpoint the components that drive its generation. We then compute a model edit based on the pseudo-inverse of the unembedding matrix and apply it to lower the generation probability of the target sensitive data. To evaluate the method on truly sensitive information, we create two datasets: an email dataset inherently memorized by GPT-J, and a synthetic social security number dataset that we tune the model to memorize. Compared with state-of-the-art model editing methods, REVS performs well both at eliminating sensitive information and at resisting extraction attacks, while preserving the integrity of the model. Code and a demo notebook are available.|\n", 
"2406.09324": "|**2024-06-13**|**Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs**|Zhao Xu et.al.|[2406.09324](http://arxiv.org/abs/2406.09324)|**[link](https://github.com/usail-hkust/bag_of_tricks_for_llm_jailbreaking)**|**Although large language models (LLMs) show remarkable ability to carry out tasks zero-shot, they are vulnerable to jailbreak attacks and can be manipulated into producing harmful output. Recent studies have begun to classify jailbreak attacks as token-level or prompt-level. However, prior work has largely overlooked the many key factors behind jailbreak attacks; most studies focus on LLM vulnerabilities, while defense-enhanced LLMs remain underexplored. To improve on this, we evaluate how different attack settings affect LLM performance and propose a benchmark framework to encourage standardized evaluation. We examine eight key factors in mounting jailbreak attacks on LLMs, from both the target-level and the attack-level perspective. We run seven representative jailbreak attacks against six defense methods on two widely used datasets, for about 320 experiments in total and roughly 50,000 GPU hours on A800-80G GPUs. The results underline the need for standardized evaluation of defense-enhanced LLMs. Our code is open source: https://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking.**|\n", 
"2406.09321": "|**2024-06-13**|**JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models**|Delong Ran et.al.|[2406.09321](http://arxiv.org/abs/2406.09321)|**[link](https://github.com/thuccslab/jailbreakeval)**|**This paper addresses the evaluation problem in research on jailbreak attacks against large language models (LLMs). There is currently no consensus on how to judge whether a jailbreak attempt succeeded. Different evaluation methods exist, such as human annotation or prompting GPT-4 in specific ways, each with its own strengths and weaknesses that affect both alignment with human values and research cost. We analyze nearly ninety jailbreak studies released between May 2023 and April 2024, propose a detailed taxonomy of evaluation methods, and examine the strengths, weaknesses, and current adoption of each kind of evaluator. To support follow-up research, we develop and release JailbreakEval, a user-friendly toolkit that integrates a range of well-known evaluators, so users can obtain results with a single command. JailbreakEval also lets users customize their own evaluation workflow within a unified framework, simplifying development and comparison. In short, we hope JailbreakEval will help standardize the evaluation of jailbreak attacks and serve as a catalyst for jailbreak evaluation research in the community.**|\n", 
"2406.10229": "|**2024-06-14**|**Quantifying Variance in Evaluation Benchmarks**|Lovish Madaan et.al.|[2406.10229](http://arxiv.org/abs/2406.10229)|null|Evaluation benchmarks are the cornerstone for measuring the capabilities of large language models (LLMs) and for driving progress in those capabilities. Originally designed to assess the performance (or lack of it) of pretrained models, they are now also widely used to decide between training choices. Despite this wide use, we rarely quantify the variance in evaluation benchmarks, which determines whether a difference in performance is meaningful. This paper defines and measures a range of metrics for benchmark variance, including seed variance across initializations and monotonicity during training. By studying a large number of models, both publicly available and trained from scratch, we provide empirical estimates for a variety of variance metrics, together with considerations and recommendations for practitioners. We also evaluate the utility and trade-offs of continuous versus discrete performance measures and explore ways to better understand and reduce variance. We find that for smaller models (about 7B parameters), changes such as framing tasks like MMLU as completion tasks can often reduce variance, whereas more involved methods inspired by the human testing literature (such as item analysis and item response theory) do little to reduce it meaningfully. Overall, our work characterizes the variance of evaluation benchmarks, suggests LLM-specific techniques to reduce it, and more generally encourages practitioners to account for variance when comparing models.|\n", 
"2406.10218": "|**2024-06-14**|**Semantic Membership Inference Attack against Large Language Models**|Hamid Mozaffari et.al.|[2406.10218](http://arxiv.org/abs/2406.10218)|null|Membership inference attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. This paper proposes a novel approach, the Semantic Membership Inference Attack (SMIA), which improves MIA performance by exploiting the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, capturing how the output probability distributions differ between members and non-members. We run a comprehensive evaluation on the Pythia and GPT-Neo model families using the Wikipedia dataset. The results show that SMIA significantly outperforms existing attacks; for example, it reaches an AUC-ROC of 67.39% on Pythia-12B, compared with 58.90% for the second-best attack.|\n", 
"2406.10216": "|**2024-06-14**|**Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs**|Rui Yang 
et.al.|[2406.10216](http://arxiv.org/abs/2406.10216)|**[link](https://github.com/yangrui2015/generalizable-reward-model)**|In the reinforcement learning from human feedback (RLHF) framework, reward models trained on human preference data have proven effective for aligning large language models (LLMs) with human intent. However, current reward models generalize poorly to unseen prompts and responses. This can lead to reward over-optimization, where excessive optimization of the reward degrades actual performance. Whereas prior work tends to constrain policy optimization, our study proposes a new approach that improves the reward model's generalization under distribution shift by regularizing the hidden states. Specifically, we retain the base model's language model head and add a suite of text-generation losses to preserve the hidden states' text-generation ability, while learning a reward head on top of the same hidden states. Experiments show that the regularization significantly improves reward model accuracy on a variety of generalization tasks and effectively alleviates over-optimization in RLHF, offering a more reliable and robust preference-learning paradigm.|\n", 
"2406.10209": "|**2024-06-14**|**Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs**|Abhimanyu Hans et.al.|[2406.10209](http://arxiv.org/abs/2406.10209)|**[link](https://github.com/ahans30/goldfish-loss)**|**Large language models can memorize and repeat their training data, which creates privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the \u201cgoldfish loss\u201d. During training, a randomly sampled subset of tokens is excluded from the loss computation. The model does not memorize these dropped tokens, which prevents verbatim reproduction of a complete training sequence. We run extensive experiments on billion-scale Llama-2 models, both pretrained and trained from scratch, and show that our method significantly reduces extractable memorization with little to no impact on downstream benchmarks.**|\n", 
"2406.10196": "|**2024-06-14**|**TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners**|Tomas de la Rosa et.al.|[2406.10196](http://arxiv.org/abs/2406.10196)|null|Travel planning is a complex task: it involves generating a sequence of actions for visiting places subject to constraints while maximizing user satisfaction. Traditional approaches translate the problem into a formal language, extract relevant information from web sources, and use a suitable solver to produce a valid solution. Recent methods based on large language models (LLMs), by contrast, output plans directly from user requests, drawing on extensive travel domain knowledge to provide high-level information such as points of interest and possible routes. Even so, current state-of-the-art models often produce plans that are incoherent or fail to fully satisfy the constraints, and they cannot guarantee high-quality solutions. We propose TRIP-PAL, a hybrid method that combines LLMs with automated planners: (1) LLMs obtain travel information and user requirements and translate them into data structures that can be fed to a planner; (2) the automated planner generates travel plans that satisfy the constraints and optimize user utility. Our experiments across a range of travel scenarios show that TRIP-PAL outperforms LLM-only approaches at generating travel plans.|\n", 
"2406.10185": "|**2024-06-14**|**Detecting and Evaluating Medical Hallucinations in Large Vision Language Models**|Jiawei Chen et.al.|[2406.10185](http://arxiv.org/abs/2406.10185)|null|Large vision language models (LVLMs) are increasingly used in healthcare applications such as medical visual question answering and report generation. They inherit the strong capabilities of their foundation large language models (LLMs), but also their hallucination problems, which are especially serious in medicine, where tolerance for error is extremely low. Yet there is currently no hallucination detection and evaluation method or benchmark dedicated to the medical domain. To fill this gap, we introduce Med-HallMark, the first benchmark designed specifically for hallucination detection and evaluation in the medical multimodal domain. Med-HallMark supports multi-task hallucination detection, provides diverse hallucination data, and uses a hierarchical hallucination categorization. We also propose the MediHall Score, a new medical evaluation metric that assesses LVLM hallucinations through a hierarchical scoring system that accounts for their severity and type, enabling a fine-grained assessment of potential clinical impact. In addition, we present MediHallDetector, a medical LVLM built for precise hallucination detection and trained with a multi-task approach. Through extensive experiments, we establish baselines for popular LVLMs on our benchmark. The results show that the MediHall Score gives a more nuanced understanding of hallucination impact than traditional metrics and demonstrate the improved performance of MediHallDetector. We hope this work will significantly improve the reliability of LVLMs in medical applications. All related resources will be released soon.|\n", 
"2406.10181": "|**2024-06-14**|**Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors**|Siyuan Chen 
et.al.|[2406.10181](http://arxiv.org/abs/2406.10181)|null|\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u7531\u4e8e\u5185\u5b58\u9700\u6c42\u901a\u5e38\u8d85\u8fc7\u5355\u4e2aGPU\u7684\u5bb9\u91cf\uff0c\u89e3\u51b3\u8fd9\u4e00\u5185\u5b58\u6311\u6218\u7684\u4e00\u4e2a\u5e38\u89c1\u65b9\u6cd5\u662f\u5c06\u8ba1\u7b97\u548c\u6570\u636e\u4eceGPU\u8fc1\u79fb\u5230CPU\u3002\u7136\u800c\uff0c\u8fd9\u53d7\u5230\u666e\u901a\u786c\u4ef6\u5e26\u5bbd\u9650\u5236\u7684\u5236\u7ea6\uff0c\u5f71\u54cd\u4e86CPU\u4e0eGPU\u4e4b\u95f4\u7684\u901a\u4fe1\u6548\u7387\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLSP_Offload\u7684\u6846\u67b6\uff0c\u901a\u8fc7\u5b66\u4e60\u5f0f\u7684\u5b50\u7a7a\u95f4\u6295\u5f71\u5668\uff0c\u5b9e\u73b0\u5728 commodity \u786c\u4ef6\u4e0a\u63a5\u8fd1\u539f\u751f\u901f\u5ea6\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u3002\u6211\u4eec\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6d89\u53ca\u5b66\u4e60\u4e00\u4e2a\u9ad8\u6548\u7684\u7a00\u758f\u538b\u7f29\u5668\uff0c\u4ee5\u6700\u5c0f\u5316\u901a\u4fe1\u5e76\u4fdd\u6301\u6700\u5c0f\u7cbe\u5ea6\u635f\u5931\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5c42\u7ea7\u901a\u4fe1\u8c03\u5ea6\u7b56\u7565\uff0c\u4ee5\u6700\u5927\u5316\u901a\u4fe1\u4e0e\u8ba1\u7b97\u4e4b\u95f4\u7684\u5e76\u884c\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u57284GB\u7b14\u8bb0\u672cGPU\u4e0a\u5fae\u8c0313\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u5728\u914d\u590724GB\u5185\u5b58\u7684NVIDIA RTX 4090 
GPU\u4e0a\u5fae\u8c0370\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u4ec5\u6bd4\u65e0\u5185\u5b58\u9650\u5236\u7684\u5fae\u8c03\u616231%\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u5378\u8f7d\u6846\u67b6\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e86\u5fae\u8c03\u541e\u5410\u91cf\uff0c\u6700\u9ad8\u53ef\u8fbe3.33\u500d\uff0c\u5f53\u8fbe\u5230\u76f8\u540c\u51c6\u786e\u5ea6\u65f6\uff0c\u51cf\u5c11\u4e86\u7aef\u5230\u7aef\u5fae\u8c03\u65f6\u95f4\u768433.1%\u81f362.5%\u3002|\n", "2406.10172": "|**2024-06-14**|**Datasets for Multilingual Answer Sentence Selection**|Matteo Gabburo et.al.|[2406.10172](http://arxiv.org/abs/2406.10172)|null|\u5728\u8bbe\u8ba1\u9ad8\u6548\u7684\u68c0\u7d22\u5f0f\u95ee\u7b54\uff08Question Answering\uff0cQA\uff09\u7cfb\u7edf\u4e2d\uff0c\u7b54\u6848\u53e5\u5b50\u9009\u62e9\uff08Answer Sentence Selection\uff0cAS2\uff09\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u6807\u6ce8\u6570\u636e\uff0c\u5927\u591a\u6570AS2\u9886\u57df\u7684\u8fdb\u5c55\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\u3002\u8fd9\u5bfc\u81f4\u4e86\u975e\u82f1\u8bed\u73af\u5883\u4e0bQA\u7cfb\u7edf\u7684\u6027\u80fd\u4e0e\u82f1\u8bed\u7cfb\u7edf\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u672c\u8bba\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u65b0\u7684\u9ad8\u8d28\u91cf\u591a\u8bed\u8a00\uff08\u6cd5\u8bed\u3001\u5fb7\u8bed\u3001\u610f\u5927\u5229\u8bed\u3001\u8461\u8404\u7259\u8bed\u548c\u897f\u73ed\u7259\u8bed\uff09AS2\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u5bf9\u73b0\u6709\u7684\u82f1\u6587AS2\u6570\u636e\u96c6\uff08\u5982ASNQ\u3001WikiQA\u548cTREC-QA\uff09\u8fdb\u884c\u76d1\u7763\u81ea\u52a8\u673a\u5668\u7ffb\u8bd1\uff08Automatic Machine 
Translation\uff0cAMT\uff09\u3002\u6211\u4eec\u901a\u8fc7\u591a\u79cd\u5b9e\u9a8c\u548c\u4e0d\u540cTransformer\u67b6\u6784\u7684\u8bc4\u4f30\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u53ca\u7ffb\u8bd1\u6570\u636e\u96c6\u7684\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u5bf9\u4e8e\u6784\u5efa\u5065\u58ee\u7684\u591a\u8bed\u8a00AS2\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u663e\u8457\u7f29\u5c0f\u4e86\u975e\u82f1\u8bed\u4e0e\u82f1\u8bed\u73af\u5883\u4e0b\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2406.10162": "|**2024-06-14**|**Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models**|Carson Denison et.al.|[2406.10162](http://arxiv.org/abs/2406.10162)|**[link](https://github.com/anthropics/sycophancy-to-subterfuge-paper)**|**\u5728\u5f3a\u5316\u5b66\u4e60\u4e2d\uff0c\u5f53\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5b66\u4f1a\u56e0\u8bad\u7ec3\u76ee\u6807\u4e0d\u660e\u786e\u800c\u83b7\u5f97\u4e0d\u671f\u671b\u7684\u884c\u4e3a\u65f6\uff0c\u5c31\u4f1a\u51fa\u73b0\u89c4\u683c\u6e38\u620f\u73b0\u8c61\u3002\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u4ece\u7b80\u5355\u7684\u5949\u627f\u884c\u4e3a\u53d1\u5c55\u5230\u66f4\u590d\u6742\u4e14\u5371\u9669\u7684\u5956\u52b1\u7be1\u6539\uff0c\u5373\u6a21\u578b\u76f4\u63a5\u4fee\u6539\u5176\u81ea\u8eab\u7684\u5956\u52b1\u673a\u5236\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8fd9\u4e9b\u590d\u6742\u884c\u4e3a\u53ef\u80fd\u8d85\u51fa\u63a2\u7d22\u7684\u8303\u7574\u3002\u672c\u8bba\u6587\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u4f1a\u5728\u5b66\u4e60\u5e38\u89c1\u89c4\u683c\u6e38\u620f\u7b56\u7565\u540e\uff0c\u6cdb\u5316\u5230\u6267\u884c\u66f4\u4e3a\u7f55\u89c1\u548c\u660e\u663e\u7684\u884c\u4e3a\uff0c\u5305\u62ec\u5956\u52b1\u7be1\u6539\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9010\u6b65\u5347\u7ea7\u7684\u53ef\u6e38\u620f\u73af\u5883\u7cfb\u5217\uff0c\u5e76\u53d1\u73b0\u9488\u5bf9\u65e9\u671f\u9636\u6bb5\u73af\u5883\u7684\u8b
ad\u7ec3\u4f1a\u5bfc\u81f4\u5728\u540e\u7eed\u73af\u5883\u4e2d\u51fa\u73b0\u66f4\u591a\u7684\u89c4\u683c\u6e38\u620f\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u4e00\u5c0f\u90e8\u5206\u4f46\u975e\u96f6\u7684LLMs\uff0c\u5728\u7ecf\u5386\u4e86\u5b8c\u6574\u8bad\u7ec3\u8bfe\u7a0b\u540e\uff0c\u80fd\u591f\u96f6\u6837\u672c\u5730\u76f4\u63a5\u4fee\u6539\u5176\u5956\u52b1\u51fd\u6570\u3002\u91cd\u65b0\u8bad\u7ec3LLMs\u4ee5\u907f\u514d\u65e9\u671f\u9636\u6bb5\u7684\u6e38\u620f\u884c\u4e3a\u53ef\u4ee5\u51cf\u8f7b\u4f46\u4e0d\u80fd\u5b8c\u5168\u6d88\u9664\u540e\u671f\u73af\u5883\u4e2d\u7684\u5956\u52b1\u7be1\u6539\u3002\u6b64\u5916\uff0c\u5bf9\u53ef\u6e38\u620f\u73af\u5883\u8fdb\u884c\u65e0\u5bb3\u6027\u8bad\u7ec3\u5e76\u4e0d\u80fd\u963b\u6b62\u5956\u52b1\u7be1\u6539\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u4ece\u5e38\u89c1\u7684\u89c4\u683c\u6e38\u620f\u7b56\u7565\u4e2d\u6cdb\u5316\u5230\u66f4\u6076\u52a3\u7684\u5956\u52b1\u7be1\u6539\u884c\u4e3a\uff0c\u5e76\u4e14\u8981\u6d88\u9664\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u5e76\u975e\u6613\u4e8b\u3002**|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov 
et.al.|[2406.10149](http://arxiv.org/abs/2406.10149)|**[link](https://github.com/booydar/babilong)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8f93\u5165\u4e0a\u4e0b\u6587\u957f\u5ea6\u663e\u8457\u589e\u52a0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u672a\u80fd\u5145\u5206\u8861\u91cf\u6a21\u578b\u5904\u7406\u957f\u7bc7\u6587\u672c\u4e2d\u7684\u4e8b\u5b9e\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86BABILong\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u6d4b\u8bd5\u6a21\u578b\u5728\u5206\u5e03\u5f0f\u957f\u6587\u6863\u4e2d\u8de8\u4e8b\u5b9e\u63a8\u7406\u7684\u80fd\u529b\u3002BABILong\u5305\u62ec20\u4e2a\u591a\u6837\u5316\u7684\u63a8\u7406\u4efb\u52a1\uff0c\u5982\u4e8b\u5b9e\u94fe\u3001\u7b80\u5355\u5f52\u7eb3\u3001\u6f14\u7ece\u3001\u8ba1\u6570\u4ee5\u53ca\u5904\u7406\u5217\u8868/\u96c6\u5408\u7b49\u3002\u8fd9\u4e9b\u4efb\u52a1\u672c\u8eab\u5c31\u5177\u6709\u6311\u6218\u6027\uff0c\u800c\u5f53\u6240\u9700\u4e8b\u5b9e\u5206\u6563\u5728\u957f\u7bc7\u81ea\u7136\u6587\u672c\u4e2d\u65f6\uff0c\u96be\u5ea6\u8fdb\u4e00\u6b65\u63d0\u5347\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u6d41\u884c\u7684LLMs\u5b9e\u9645\u4e0a\u53ea\u5229\u7528\u4e8610%-20%\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4e14\u968f\u7740\u63a8\u7406\u590d\u6742\u6027\u7684\u63d0\u9ad8\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u5bf9\u4e8e\u66ff\u4ee3\u7684\u4e0a\u4e0b\u6587\u63a8\u7406\u65b9\u6cd5\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\u5728\u5355\u4e8b\u5b9e\u95ee\u9898\u56de\u7b54\u4e0a\u7684\u51c6\u786e\u7387\u4ec5\u4e3a60%\uff0c\u4e0e\u4e0a\u4e0b\u6587\u957f\u5ea6\u65e0\u5173\u3002\u5728\u4e0a\u4e0b\u6587\u6269\u5c55\u65b9\u6cd5\u4e2d\uff0c\u5faa\u73af\u8bb0\u5fc6Transformer\u5c55\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u53ef\u5904\u7406\u957f\u8fbe1100\u4e07\u4e2a\u4ee4\u724c\u7684\u957f\u5ea6\u3002BABILong\u57fa\u51c6\u6d4b\u8bd5\u53ef\u4ee5\u6269\u5c55\u5230\u4efb\u610f\u9
57f\u5ea6\uff0c\u4ee5\u652f\u6301\u8bc4\u4f30\u5177\u6709\u66f4\u5f3a\u80fd\u529b\u7684\u65b0\u6a21\u578b\uff0c\u5e76\u63d0\u4f9b\u4e86\u957f\u8fbe100\u4e07\u4ee4\u724c\u7684\u5206\u9694\u3002|\n", "2406.11840": "|**2024-06-17**|**LLaNA: Large Language and NeRF Assistant**|Andrea Amaduzzi et.al.|[2406.11840](http://arxiv.org/abs/2406.11840)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u56fe\u50cf\u548c3D\u6570\u636e\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5168\u9762\u6355\u6349\u7269\u4f53\u7684\u5916\u89c2\u548c\u51e0\u4f55\u7279\u6027\u4e0a\u5b58\u5728\u5c40\u9650\u3002\u8fd1\u671f\uff0c\u795e\u7ecf\u8f90\u5c04\u573a\uff08Neural Radiance Fields\uff0c\u7b80\u79f0NeRF\uff09\u4f5c\u4e3a\u4e00\u79cd\u65b0\u5174\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u901a\u8fc7\u4e00\u4e2a\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u5668\uff08Multi-Layer Perceptron\uff0cMLP\uff09\u7684\u6743\u91cd\u7f16\u7801\u4e86\u7269\u4f53\u7684\u51e0\u4f55\u7ed3\u6784\u548c\u9ad8\u5ea6\u903c\u771f\u7684\u5916\u89c2\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5c06NeRF\u6574\u5408\u5230MLLM\u4e2d\u7684\u53ef\u884c\u6027\u548c\u6548\u679c\u3002\u6211\u4eec\u5f00\u53d1\u4e86LLaNA\uff0c\u8fd9\u662f\u9996\u4e2a\u901a\u7528\u7684NeRF-\u8bed\u8a00\u52a9\u624b\uff0c\u80fd\u591f\u6267\u884c\u65b0\u4efb\u52a1\uff0c\u5982NeRF\u63cf\u8ff0\u548c\u95ee\u7b54\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u76f4\u63a5\u5904\u7406NeRF 
MLP\u7684\u6743\u91cd\uff0c\u65e0\u9700\u6e32\u67d3\u56fe\u50cf\u6216\u6784\u5efa3D\u6570\u636e\u7ed3\u6784\uff0c\u5c31\u80fd\u63d0\u53d6\u6709\u5173\u4ee3\u8868\u5bf9\u8c61\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65e0\u987b\u4eba\u5de5\u5e72\u9884\u7684NeRF\u6587\u672c\u6807\u6ce8\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5404\u79cdNeRF-\u8bed\u8a00\u4efb\u52a1\uff0c\u5e76\u636e\u6b64\u5efa\u7acb\u4e86\u4e00\u4e2a\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u6211\u4eec\u7684\u6a21\u578b\u5bf9NeRF\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5904\u7406NeRF\u6743\u91cd\u7684\u65b9\u6cd5\u5728\u4e0e\u4eceNeRF\u4e2d\u63d0\u53d62D\u62163D\u8868\u793a\u8fdb\u884c\u6bd4\u8f83\u65f6\u8868\u73b0\u66f4\u4f18\u3002|\n", "2406.11839": "|**2024-06-17**|**mDPO: Conditional Preference Optimization for Multimodal Large Language Models**|Fei Wang et.al.|[2406.11839](http://arxiv.org/abs/2406.11839)|null|\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5df2\u88ab\u8bc1\u660e\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\u7684\u6709\u6548\u624b\u6bb5\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06DPO\u5e94\u7528\u4e8e\u591a\u6a21\u6001\u573a\u666f\uff0c\u4f46\u53d1\u73b0\u5b9e\u73b0\u6301\u7eed\u6539\u8fdb\u9887\u5177\u6311\u6218\u3002\u901a\u8fc7\u5bf9\u6bd4\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5ffd\u89c6\u4e86\u56fe\u50cf\u6761\u4ef6\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86mDPO\uff0c\u4e00\u4e2a\u65e8\u5728\u9632\u6b62\u8bed\u8a00\u504f\u597d\u8fc7\u5ea6\u4f18\u5148\u7684\u591a\u6a21\u6001DPO\u76ee\u6807\uff0c\u540c\u65f6\u4f18\u5316\u56fe\u50cf\u504f\u597d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5956\u52b1\u951a\u70b9\uff0c\u786e\u4fdd\u9009\u62e9\u7684\u54cd\u5e94\u5956\u52b1\u4f
dd\u6301\u6b63\u5411\uff0c\u4ece\u800c\u907f\u514d\u76f8\u5bf9\u504f\u597d\u4f18\u5316\u56fa\u6709\u7684\u53ef\u80fd\u6027\u964d\u4f4e\u95ee\u9898\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u591a\u6a21\u6001LLM\u4ee5\u53ca\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0cmDPO\u6709\u6548\u89e3\u51b3\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5e76\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002|\n", "2406.11832": "|**2024-06-17**|**Unveiling Encoder-Free Vision-Language Models**|Haiwen Diao et.al.|[2406.11832](http://arxiv.org/abs/2406.11832)|**[link](https://github.com/baaivision/eve)**|**\u5f53\u524d\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u4e3b\u8981\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7f16\u7801\u5668\u6765\u63d0\u53d6\u89c6\u89c9\u7279\u5f81\uff0c\u7136\u540e\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u5728\u62bd\u8c61\u89c6\u89c9\u8868\u793a\u65b9\u9762\u8bbe\u5b9a\u4e86\u5f3a\u70c8\u7684\u5148\u9a8c\uff0c\u5982\u5206\u8fa8\u7387\u3001\u6bd4\u4f8b\u548c\u8bed\u4e49\u503e\u5411\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86VLM\u7684\u7075\u6d3b\u6027\u548c\u6548\u7387\u3002\u76f4\u63a5\u8bad\u7ec3\u65e0\u7f16\u7801\u5668\u7684\u7eafVLM\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u4e14\u9c9c\u6709\u63a2\u7d22\u3002\u5b9e\u8bc1\u7814\u7a76\u663e\u793a\uff0c\u8fd9\u79cd\u76f4\u63a5\u8bad\u7ec3\u65b9\u6cd5\u4f1a\u5bfc\u81f4\u6536\u655b\u7f13\u6162\u548c\u6027\u80fd\u5dee\u8ddd\u8f83\u5927\u3002\u672c\u6587\u65e8\u5728\u5f25\u5408\u7f16\u7801\u5668\u4f9d\u8d56\u578b\u548c\u65e0\u7f16\u7801\u5668\u6a21\u578b\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u
6548\u7684\u7eafVLM\u8bad\u7ec3\u7b56\u7565\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u6df1\u5165\u5b9e\u9a8c\u63ed\u793a\u4e86\u9ad8\u6548\u8bad\u7ec3\u65e0\u7f16\u7801\u5668VLM\u7684\u5173\u952e\u8981\u7d20\uff1a\uff081\uff09\u5728\u7edf\u4e00\u7684\u89e3\u7801\u5668\u5185\u878d\u5408\u89c6\u89c9\u4e0e\u8bed\u8a00\u8868\u793a\uff1b\uff082\uff09\u901a\u8fc7\u989d\u5916\u76d1\u7763\u63d0\u5347\u89c6\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e9b\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86EVE\uff0c\u4e00\u4e2a\u65e0\u7f16\u7801\u5668\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u65e2\u80fd\u9ad8\u6548\u8bad\u7ec3\u4e5f\u80fd\u5feb\u901f\u63a8\u7406\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4ec5\u4f7f\u75283500\u4e07\u516c\u5f00\u53ef\u7528\u7684\u6570\u636e\uff0cEVE\u5c31\u80fd\u5728\u591a\u4e2a\u89c6\u89c9\u8bed\u8a00\u57fa\u51c6\u4e0a\u4e0e\u7c7b\u4f3c\u5bb9\u91cf\u7684\u7f16\u7801\u5668\u4f9d\u8d56\u578bVLM\u5339\u654c\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u795e\u79d8\u3001\u6570\u636e\u672a\u516c\u5f00\u7684Fuyu-8B\u6a21\u578b\u3002\u6211\u4eec\u76f8\u4fe1\uff0cEVE\u4e3a\u8de8\u6a21\u6001\u5f00\u53d1\u7eaf\u7cb9\u7684\u89e3\u7801\u5668\u67b6\u6784\u63d0\u4f9b\u4e86\u4e00\u4e2a\u900f\u660e\u4e14\u9ad8\u6548\u7684\u8def\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/baaivision/EVE\u3002**|\n", "2406.11831": "|**2024-06-17**|**Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models**|Bingqi Ma 
et.al.|[2406.11831](http://arxiv.org/abs/2406.11831)|null|\u57fa\u4e8e\u4ec5\u89e3\u7801\u5668\uff08decoder-only\uff09Transformer\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6587\u672c\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5982\u4f55\u5c06\u8fd9\u4e9b\u5148\u8fdb\u7684LLMs\u5e94\u7528\u4e8e\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\u4ecd\u662f\u4e00\u4e2a\u5f85\u63a2\u7d22\u7684\u95ee\u9898\u3002\u6211\u4eec\u53d1\u73b0\u76f4\u63a5\u4f7f\u7528LLM\u4f5c\u4e3a\u63d0\u793a\u7f16\u7801\u5668\u4f1a\u663e\u8457\u964d\u4f4e\u751f\u6210\u56fe\u50cf\u65f6\u7684\u63d0\u793a\u8ddf\u968f\u80fd\u529b\u3002\u4e3b\u8981\u5b58\u5728\u4e24\u4e2a\u95ee\u9898\uff1a\u4e00\u662fLLM\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u8bad\u7ec3\u4e0e\u6269\u6563\u6a21\u578b\u5bf9\u533a\u5206\u6027\u63d0\u793a\u7279\u5f81\u7684\u9700\u6c42\u4e0d\u5339\u914d\uff1b\u4e8c\u662f\u89e3\u7801\u5668\u67b6\u6784\u56fa\u6709\u7684\u4f4d\u7f6e\u504f\u89c1\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u6846\u67b6\uff0c\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4f7f\u7528\u6307\u5357\uff0c\u589e\u5f3aLLM\u7684\u6587\u672c\u8868\u793a\u80fd\u529b\uff0c\u6d88\u9664\u5176\u5185\u5728\u7684\u4f4d\u7f6e\u504f\u89c1\uff0c\u4ece\u800c\u7075\u6d3b\u5730\u5c06\u6700\u5148\u8fdb\u7684LLMs\u878d\u5165\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u878d\u5408\u591a\u4e2aLLMs\u7684\u65b9\u6cd5\u3002\u9274\u4e8eTransformer\u67b6\u6784\u7684\u5353\u8d8a\u6027\u80fd\u548c\u6269\u5c55\u80fd\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8bbe\u8ba1\u4e86\u57fa\u4e8e\u8be5\u6846\u67b6\u7684LLM-Infused Diffusion 
Transformer\uff08LI-DiT\uff09\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86LI-DiT\u5728\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u6570\u636e\u91cf\u4e0b\u7684\u6027\u80fd\u3002\u5f97\u76ca\u4e8eLLMs\u7684\u5185\u5728\u80fd\u529b\u53ca\u6211\u4eec\u7684\u521b\u65b0\u8bbe\u8ba1\uff0cLI-DiT\u7684\u63d0\u793a\u7406\u89e3\u6027\u80fd\u8f7b\u677e\u8d85\u8d8a\u5f00\u6e90\u7684\u6700\u65b0\u6a21\u578b\uff0c\u4ee5\u53ca\u5305\u62ecStable Diffusion 3\u3001DALL-E 3\u548cMidjourney V6\u5728\u5185\u7684\u4e3b\u6d41\u95ed\u6e90\u5546\u4e1a\u6a21\u578b\u3002\u5f3a\u5927\u7684LI-DiT-10B\u5c06\u5728\u8fdb\u4e00\u6b65\u4f18\u5316\u548c\u5b89\u5168\u68c0\u67e5\u540e\u63d0\u4f9b\u3002|\n", "2406.11827": "|**2024-06-17**|**WPO: Enhancing RLHF with Weighted Preference Optimization**|Wenxuan Zhou et.al.|[2406.11827](http://arxiv.org/abs/2406.11827)|**[link](https://github.com/wzhouad/wpo)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u66f4\u597d\u5730\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6709\u524d\u666f\u65b9\u6cd5\u3002\u7531\u4e8e\u6210\u672c\u6548\u76ca\u548c\u53ef\u6269\u5c55\u6027\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u2014\u2014\u901a\u8fc7\u5176\u4ed6\u6a21\u578b\u83b7\u53d6\u504f\u597d\u6570\u636e\u2014\u2014\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u7136\u800c\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u5e38\u53d7\u91c7\u6837\u7b56\u7565\u4e0e\u76ee\u6807\u7b56\u7565\u4e4b\u95f4\u5206\u5e03\u5dee\u5f02\u7684\u5f71\u54cd\uff0c\u5bfc\u81f4\u4f18\u5316\u6548\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7b56\u7565\u2014\u2014\u52a0\u6743\u504f\u597d\u4f18\u5316\uff08WPO\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u8c03\u6574\u504f\u597d\u8bc4\u5206\u5bf9\uff0c\u4f7f\u79bb\u7ebf\u6570\u636e\u66f4\u63a5\u8fd1\u4e8e\u5f53\u524d\u7b56\u7565\uff0c\u4ece\u800c\u7f13\u89e3\u8fd9\u4e0
0\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5206\u5e03\u5dee\u8ddd\u96be\u9898\uff0c\u8fd8\u63d0\u5347\u4e86\u4f18\u5316\u8fc7\u7a0b\uff0c\u65e0\u9700\u989d\u5916\u6210\u672c\u3002 \u6211\u4eec\u5728Alpaca Eval 2\u548cMT-bench\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u3002WPO\u5728Alpaca Eval 2\u4e0a\u7684\u6027\u80fd\u6bd4\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u63d0\u9ad8\u4e865.6%\u3002\u57fa\u4e8eLlama-3-8B-Instruct\uff0cWPO\u751a\u81f3\u5efa\u7acb\u4e86\u663e\u8457\u7684\u957f\u5ea6\u63a7\u5236\u80dc\u7387\uff0c\u8fbe\u523048.6%\uff0c\u572880\u4ebf\u53c2\u6570\u6a21\u578b\u6392\u884c\u699c\u4e0a\u6210\u4e3a\u6700\u5f3a\u52b2\u7684\u6a21\u578b\u3002\u6211\u4eec\u5c06\u5728\u4e0a\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002**|\n", "2406.11818": "|**2024-06-17**|**Embodied Instruction Following in Unknown Environments**|Zhenyu Wu et.al.|[2406.11818](http://arxiv.org/abs/2406.11818)|null|\u5728\u81ea\u4e3b\u5bb6\u5ead\u670d\u52a1\u7cfb\u7edf\u4e2d\uff0c\u4f7f\u5b9e\u4f53\u4ee3\u7406\u80fd\u6839\u636e\u81ea\u7136\u8bed\u8a00\u5b8c\u6210\u590d\u6742\u7684\u4eba\u7c7b\u6307\u4ee4\u81f3\u5173\u91cd\u8981\u3002\u4f20\u7edf\u65b9\u6cd5\u4ec5\u80fd\u5728\u6240\u6709\u4e92\u52a8\u5bf9\u8c61\u90fd\u63d0\u4f9b\u7ed9\u4ee3\u7406\u7684\u5df2\u77e5\u73af\u5883\u4e2d\u6267\u884c\u6307\u4ee4\uff0c\u76f4\u63a5\u5c06\u73b0\u6709\u65b9\u6cd5\u5e94\u7528\u4e8e\u672a\u77e5\u73af\u5883\u901a\u5e38\u4f1a\u4ea7\u751f\u64cd\u4f5c\u4e0d\u5b58\u5728\u7269\u4f53\u7684\u4e0d\u53ef\u884c\u8ba1\u5212\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u672a\u77e5\u73af\u5883\u7684\u590d\u6742\u4efb\u52a1\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\uff08Embodied Instruction 
Following\uff0cEIF\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u4f7f\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u63a2\u7d22\u73af\u5883\uff0c\u5229\u7528\u73b0\u6709\u7269\u4f53\u751f\u6210\u53ef\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u8fbe\u6210\u62bd\u8c61\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u62ec\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\u5668\u548c\u4f4e\u5c42\u63a2\u7d22\u63a7\u5236\u5668\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5c42\u6b21\u5316\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u52a8\u6001\u533a\u57df\u6ce8\u610f\u529b\u6784\u5efa\u573a\u666f\u7684\u8bed\u4e49\u8868\u793a\u5730\u56fe\uff0c\u4ee5\u5c55\u793a\u5df2\u77e5\u7684\u89c6\u89c9\u7ebf\u7d22\uff0c\u4f7f\u4efb\u52a1\u89c4\u5212\u548c\u573a\u666f\u63a2\u7d22\u4e0e\u4eba\u7c7b\u6307\u4ee4\u76ee\u6807\u4fdd\u6301\u4e00\u81f4\u3002\u5bf9\u4e8e\u4efb\u52a1\u89c4\u5212\u5668\uff0c\u6839\u636e\u4efb\u52a1\u5b8c\u6210\u8fc7\u7a0b\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\uff0c\u6211\u4eec\u751f\u6210\u6b65\u9aa4\u5f0f\u7684\u53ef\u884c\u8ba1\u5212\u3002\u5bf9\u4e8e\u63a2\u7d22\u63a7\u5236\u5668\uff0c\u6839\u636e\u751f\u6210\u7684\u6b65\u9aa4\u8ba1\u5212\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\u9884\u6d4b\u6700\u4f18\u7684\u5bfc\u822a\u6216\u7269\u4f53\u4ea4\u4e92\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5927\u578b\u623f\u5c4b\u7ea7\u573a\u666f\u4e2d\u7684204\u4e2a\u590d\u6742\u4eba\u7c7b\u6307\u4ee4\uff08\u5982\u505a\u65e9\u9910\u548c\u6574\u7406\u623f\u95f4\uff09\u4e0a\u5b9e\u73b0\u4e8645.09%\u7684\u6210\u529f\u7387\u3002|\n", "2406.11816": "|**2024-06-17**|**VideoLLM-online: Online Video Large Language Model for Streaming Video**|Joya Chen et.al.|[2406.11816](http://arxiv.org/abs/2406.11816)|null|
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u589e\u5f3a\u4e86\u89c6\u89c9\u529f\u80fd\uff0c\u80fd\u591f\u7406\u89e3\u56fe\u50cf\u3001\u89c6\u9891\u548c\u878d\u5408\u4e86\u89c6\u89c9\u4e0e\u8bed\u8a00\u7684\u5185\u5bb9\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u65b9\u6cd5\u901a\u5e38\u5c06\u89c6\u9891\u89c6\u4e3a\u9884\u5148\u526a\u8f91\u597d\u7684\u7247\u6bb5\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5728\u5904\u7406\u8fde\u7eed\u89c6\u9891\u6d41\u65f6\u6548\u679c\u4e0d\u4f73\u4e14\u6548\u7387\u4f4e\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cLearning-In-Video-Stream\u201d\uff08LIVE\uff09\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b9e\u65f6\u3001\u957f\u5e8f\u5217\u3001\u4e0e\u89c6\u9891\u6d41\u540c\u6b65\u7684\u5bf9\u8bdd\uff0c\u9002\u7528\u4e8e\u8fde\u7eed\u89c6\u9891\u8f93\u5165\u3002LIVE\u6846\u67b6\u5305\u62ec\u4ee5\u4e0b\u4e09\u4e2a\u65b9\u9762\uff1a\uff081\uff09\u4e00\u4e2a\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u8fde\u7eed\u6d41\u5f0f\u8f93\u5165\u7684\u8bed\u8a00\u5efa\u6a21\u76ee\u6807\uff1b\uff082\uff09\u4e00\u79cd\u6570\u636e\u751f\u6210\u7b56\u7565\uff0c\u5c06\u79bb\u7ebf\u65f6\u95f4\u6807\u6ce8\u8f6c\u6362\u4e3a\u9002\u5408\u6d41\u5f0f\u5bf9\u8bdd\u7684\u683c\u5f0f\uff1b\uff083\uff09\u4e00\u4e2a\u4f18\u5316\u7684\u63a8\u7406\u7ba1\u9053\uff0c\u4ee5\u63d0\u9ad8\u5728\u5b9e\u9645\u89c6\u9891\u6d41\u4e2d\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u57fa\u4e8eLlama-2/Llama-3\uff0c\u6211\u4eec\u6784\u5efa\u4e86VideoLLM-online\u6a21\u578b\uff0c\u5e76\u901a\u8fc7\u5b83\u5c55\u793a\u4e86\u5728\u5904\u7406\u89c6\u9891\u6d41\u5bf9\u8bdd\u65b9\u9762\u7684\u663e\u8457\u4f18\u52bf\uff0c\u4f8b\u5982\uff0c\u5728A100 
GPU\u4e0a\uff0c\u8be5\u6a21\u578b\u80fd\u57285\u5206\u949f\u89c6\u9891\u7247\u6bb5\u4e2d\u5b9e\u73b0\u8d85\u8fc710\u5e27\u6bcf\u79d2\u7684\u6d41\u5f0f\u5bf9\u8bdd\u3002\u6b64\u5916\uff0cVideoLLM-online\u8fd8\u5728\u516c\u5f00\u7684\u79bb\u7ebf\u89c6\u9891\u57fa\u51c6\u6d4b\u8bd5\uff08\u5982\u8bc6\u522b\u3001captioning\u548c\u9884\u6d4b\uff09\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u3001\u6a21\u578b\u3001\u6570\u636e\u548c\u6f14\u793a\u53d1\u5e03\u5728https://showlab.github.io/videollm-online\u4f9b\u4eba\u4f7f\u7528\u3002|\n", "2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813](http://arxiv.org/abs/2406.11813)|**[link](https://github.com/kaistai/factual-knowledge-acquisition)**|\u5c3d\u7ba1\u8fd1\u671f\u7814\u7a76\u8868\u660e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u5b58\u50a8\u5927\u91cf\u4e8b\u5b9e\u77e5\u8bc6\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u8fd9\u4e9b\u77e5\u8bc6\u7684\u673a\u5236\u5c1a\u4e0d\u660e\u786e\u3002\u672c\u7814\u7a76\u9488\u5bf9\u8fd9\u4e00\u7f3a\u53e3\uff0c\u63a2\u8ba8\u4e86LLMs\u5728\u9884\u8bad\u7ec3\u671f\u95f4\u5982\u4f55\u83b7\u53d6\u548c\u4fdd\u6301\u4e8b\u5b9e\u77e5\u8bc6\u3002\u7814\u7a76\u53d1\u73b0\u4e86\u4e00\u4e9b\u5173\u952e\u6d1e\u89c1\uff1a\u9996\u5148\uff0c\u51fa\u4e4e\u610f\u6599\u7684\u662f\uff0c\u66f4\u591a\u7684\u8bad\u7ec3\u6570\u636e\u5bf9\u6a21\u578b\u83b7\u53d6\u548c\u4fdd\u6301\u4e8b\u5b9e\u77e5\u8bc6\u7684\u80fd\u529b\u5e76\u65e0\u663e\u8457\u63d0\u5347\u3002\u5176\u6b21\uff0c\u8bad\u7ec3\u6b65\u6570\u4e0e\u8bb0\u5fc6\u9057\u5fd8\u548c\u4e8b\u5b9e\u77e5\u8bc6\u6cdb\u5316\u4e4b\u95f4\u5b58\u5728\u5e42\u5f8b\u5173\u7cfb\uff0c\u4f7f\u7528\u91cd\u590d\u8bad\u7ec3\u6570\u636e\u7684\u6a21\u578b\u9057\u5fd8\u901f\u5ea6\u66f4\u5feb\u3002\u7b2c\u4e09\uff0c\u589e\u5927\u6279\u91cf\u5927\u5c0f\u53ef\u4ee5\
u63d0\u9ad8\u6a21\u578b\u62b5\u6297\u9057\u5fd8\u7684\u80fd\u529b\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u89c2\u5bdf\u8868\u660e\uff0cLLMs\u5728\u9884\u8bad\u7ec3\u4e2d\u7684\u4e8b\u5b9e\u77e5\u8bc6\u83b7\u53d6\u662f\u901a\u8fc7\u9010\u6b65\u589e\u52a0\u6bcf\u4e00\u6b65\u4e2d\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\u4e8b\u5b9e\u77e5\u8bc6\u51fa\u73b0\u7684\u6982\u7387\u5b9e\u73b0\u7684\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u589e\u52a0\u968f\u540e\u4f1a\u56e0\u9057\u5fd8\u800c\u7a00\u91ca\u3002\u57fa\u4e8e\u8fd9\u79cd\u7406\u89e3\uff0c\u6211\u4eec\u80fd\u591f\u89e3\u91ca\u4e00\u4e9b\u6700\u8fd1\u89c2\u5bdf\u5230\u7684LLM\u884c\u4e3a\uff0c\u5982\u957f\u5c3e\u77e5\u8bc6\u4e0a\u7684\u6027\u80fd\u4e0d\u4f73\uff0c\u4ee5\u53ca\u53bb\u91cd\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u597d\u5904\u3002|\n", "2406.11811": "|**2024-06-17**|**RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content**|Joao Monteiro et.al.|[2406.11811](http://arxiv.org/abs/2406.11811)|null|
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5927\u91cf\u4f9d\u8d56\u81ea\u52a8\u4ece\u4e92\u8054\u7f51\u6293\u53d6\u7684\u6570\u636e\uff0c\u5176\u4e2d\u5305\u62ec\u5305\u542b\u5927\u91cf\u901a\u7528\u77e5\u8bc6\u7684\u767e\u79d1\u5168\u4e66\uff08\u5982\u7ef4\u57fa\u767e\u79d1\uff09\uff0c\u4e5f\u53ef\u80fd\u4e0e\u7528\u4e8e\u8bc4\u4f30LLMs\u7684\u57fa\u51c6\u6570\u636e\u96c6\u91cd\u53e0\u3002\u56e0\u6b64\uff0c\u5982\u679c\u6d4b\u8bd5\u96c6\u53ef\u80fd\u5df2\u6cc4\u9732\u5230\u8bad\u7ec3\u96c6\u4e2d\uff0c\u5bf9\u6a21\u578b\u7684\u8bc4\u4f30\u53ef\u80fd\u4f1a\u4ea7\u751f\u8bef\u5bfc\u6027\u7684\u7ed3\u8bba\u3002\u4e3a\u4e86\u63a8\u52a8\u8bed\u8a00\u6a21\u578b\u7684\u516c\u6b63\u8bc4\u4f30\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6d4b\u8bd5\u6570\u636e\u96c6\u2014\u2014RepLiQA\uff0c\u9002\u7528\u4e8e\u95ee\u7b54\u548c\u4e3b\u9898\u68c0\u7d22\u4efb\u52a1\u3002RepLiQA\u662f\u4e00\u4e2a\u5305\u542b\u4e94\u4e2a\u5206\u7247\u7684\u6d4b\u8bd5\u96c6\uff0c\u5176\u4e2d\u56db\u4e2a\u5728\u672c\u8bba\u6587\u53d1\u5e03\u524d\u672a\u516c\u5f00\u6216\u901a\u8fc7LLM API\u63d0\u4f9b\u3002RepLiQA\u7684\u6bcf\u4e2a\u6837\u672c\u7531\u4ee5\u4e0b\u56db\u90e8\u5206\u7ec4\u6210\uff1a\uff081\uff09\u7531\u4eba\u7c7b\u6807\u6ce8\u5458\u521b\u4f5c\u7684\u865a\u6784\u573a\u666f\u63cf\u8ff0\u6587\u6863\uff08\u4f8b\u5982\u65b0\u95fb\u6587\u7ae0\uff09\uff0c\u8fd9\u4e9b\u5185\u5bb9\u4e0d\u4f1a\u51fa\u73b0\u5728\u4e92\u8054\u7f51\u4e0a\uff1b\uff082\uff09\u5173\u4e8e\u6587\u6863\u4e3b\u9898\u7684\u95ee\u9898\uff1b\uff083\uff09\u76f4\u63a5\u6e90\u81ea\u6587\u6863\u4fe1\u606f\u7684\u6b63\u786e\u7b54\u6848\uff1b\uff084\uff09\u5305\u542b\u7b54\u6848\u7684\u6587\u6863\u6bb5\u843d\u3002\u8fd9\u610f\u5473\u7740\u53ea\u6709\u5f53\u6a21\u578b\u80fd\u5728\u63d0\u4f9b\u7684\u6587\u6863\u4e2d\u627e\u5230\u76f8\u5173\u5185\u5bb9\u65f6\uff0c\u624d\u80fd\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5927\u89c4\u6a21\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5305\u62ec\u591a\u4e2a\u6700\u5148\u8fdb\u7684LLM\uff0c\u4ee5\u63ed\u793a\u4e0d\u540c\u7c7b\u578b\u7684\u548c\u89c4\u6a21\u7684\u6a21\u578b\u5728\u6761\u4ef6\u8bed\u8a00\u5efa\u6a21\u8bbe\u7f6e\u4e0b\u7684\u6027\u80fd\u5dee\u5f02\u3002RepLiQA\u7684\u5df2\u53d1\u5e03\u5206\u7247\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u627e\u5230\uff1ahttps://huggingface.co/datasets/ServiceNow/repliqa\u3002|\n", "2406.11801": "|**2024-06-17**|**Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations**|Rima Hazra et.al.|[2406.11801](http://arxiv.org/abs/2406.11801)|**[link](https://github.com/declare-lab/safety-arithmetic)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ffb\u8bd1\u548c\u95ee\u7b54\u7b49\u5e94\u7528\u4e2d\u7684\u65e5\u76ca\u91cd\u8981\uff0c\u786e\u4fdd\u5b83\u4eec\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6b63\u786e\u5bfc\u5411\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u5bf9\u9f50\u65b9\u6cd5\u5728\u5904\u7406\u52a8\u6001\u7528\u6237\u610f\u56fe\u548c\u590d\u6742\u76ee\u6807\u65f6\u5b58\u5728\u56f0\u96be\uff0c\u4f7f\u5f97\u6a21\u578b\u5bb9\u6613\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u6846\u67b6\u2014\u2014\u5b89\u5168\u7b97\u672f\uff08Safety Arithmetic\uff09\uff0c\u65e8\u5728\u63d0\u5347LLMs\u5728\u4e0d\u540c\u573a\u666f\u4e0b\u7684\u5b89\u5168\u6027\uff0c\u5305\u62ec\u57fa\u7840\u6a21\u578b\u3001\u76d1\u7763\u5fae\u8c03\u6a21\u578b\uff08SFT\uff09\u548c\u7f16\u8f91\u540e\u7684\u6a21\u578b\u3002\u5b89\u5168\u7b97\u672f\u5305\u542b\u4e24\u90e8\u5206\uff1a\u6709\u5bb3\u5185\u5bb9\u6d88\u9664\uff08Harm Direction Removal\uff09\u4ee5\u907f\u514d\u4e0d\u826f\u8f93\u51fa\uff0c\u4ee5\u53ca\u5b89\u5168\u5bf9\u9f50\uff08Safety 
Alignment\uff09\u4ee5\u4fc3\u8fdb\u5b89\u5168\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86NoIntentEdit\u6570\u636e\u96c6\uff0c\u5b83\u63ed\u793a\u4e86\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u5b89\u5168\u98ce\u9669\u7684\u7f16\u8f91\u5b9e\u4f8b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5b89\u5168\u7b97\u672f\u663e\u8457\u589e\u5f3a\u4e86\u5b89\u5168\u63aa\u65bd\uff0c\u51cf\u5c11\u4e86\u8fc7\u5ea6\u5b89\u5168\u7684\u95ee\u9898\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6a21\u578b\u7684\u5b9e\u7528\u6027\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u65b9\u6cd5\u5728\u4fdd\u969c\u5185\u5bb9\u751f\u6210\u7684\u5b89\u5168\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002**|\n", "2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval Based Long Video Understanding**|Ziyu Ma 
et.al.|[2406.12846](http://arxiv.org/abs/2406.12846)|null|Current long-video understanding methods focus mainly on videos only tens of seconds long, and techniques for handling longer videos remain little explored. The large number of frames in a long video poses two main challenges: locating key information is hard, and so is long-range reasoning. We therefore propose DrVideo, a document-retrieval-based system designed for long video understanding. Our core idea is to recast long video understanding as a long-document understanding task so as to make full use of the strong capabilities of large language models. Specifically, DrVideo converts a long video into a long text document, first retrieving key frames and augmenting their information as the system's starting point. It then runs an agent-based iterative loop that keeps searching for missing information and supplementing relevant data, and once enough question-related information has been gathered it gives the final prediction in a chain-of-thought manner. Experiments on several long-video benchmarks confirm the effectiveness of our method. DrVideo outperforms the existing state-of-the-art methods by 3.8 percentage points on EgoSchema (3 minutes), by 17.9 and 38.0 points in the break and global modes of MovieChat-1K (10 minutes), and by 30.2 points on the LLama-Vid QA dataset (over 60 minutes).|\n", "2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang 
et.al.|[2406.12845](http://arxiv.org/abs/2406.12845)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|**Reinforcement learning from human feedback (RLHF) has become the primary method for aligning large language models (LLMs) with human preferences. Conventionally, a reward model (RM) is trained on human preference data: the process usually starts from comparisons of responses to the same user request, with relative ratings indicating which response humans prefer. However, because the RM is a black box, its outputs lack interpretability, and it is hard to understand why the RM considers a response good. Since the RM serves as a proxy for human preferences, we propose a two-stage approach for building an interpretable RM: first, train an Absolute-Rating Multi-Objective Reward Model (ArmoRM) on multi-dimensional absolute-rating data, where each dimension corresponds to a human-interpretable objective (such as honesty, verbosity, safety); second, use a Mixture-of-Experts (MoE) strategy with a gating network that automatically selects the most suitable reward objectives based on the context. We trained ArmoRM with Llama-3 8B and added a shallow MLP on top as the gating network, yielding ArmoRM-Llama3-8B. Our model achieves state-of-the-art results on RewardBench, a benchmark that evaluates RMs for language modeling. Notably, it surpasses the LLM-as-a-judge approach with a GPT-4 judge and approaches the performance of the much larger Nemotron-4 340B reward model.**|\n", "2406.12844": "|**2024-06-18**|**Synergizing Foundation Models and Federated Learning: A Survey**|Shenghui Li 
et.al.|[2406.12844](http://arxiv.org/abs/2406.12844)|null|Recent advances in foundation models (FMs), such as large language models, vision transformers, and multimodal models, have had a significant impact on both academia and industry. Compared with small models, FMs need far more data during pre-training. General-purpose FMs can be pre-trained on public data from the Internet, but domain-specific FMs require proprietary data, whose availability in practice is constrained by privacy concerns. Federated learning (FL), a collaborative learning paradigm, breaks down the barriers to data sharing and offers a promising way to customize and adapt FMs to a wide range of domain-specific tasks using distributed data while preserving data privacy. This survey discusses the potential and challenges of combining FL and FMs, and summarizes core techniques, future directions, and applications. A periodically updated collection of papers on FM-FL is available.|\n", "2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi 
et.al.|[2406.12832](http://arxiv.org/abs/2406.12832)|**[link](https://github.com/arminazizi98/lamda)**|**Low-rank adaptation (LoRA) has become the standard approach for fine-tuning large language models because it greatly reduces the number of trainable parameters. However, as the model embedding dimension grows, so does the number of trainable parameters LoRA requires, which raises compute cost. Moreover, its backward updates require storing high-dimensional intermediate activations and optimizer states, placing heavy demands on GPU memory. This paper therefore proposes a new fine-tuning method for large language models, spectrally decomposed low-dimensional adaptation (LaMDA). LaMDA freezes the first projection matrix (PMA) and introduces a low-dimensional trainable square matrix, substantially reducing both trainable parameters and peak GPU memory usage. During the early fine-tuning stages, LaMDA gradually freezes the second projection matrix (PMB), further lowering the compute cost of weight updates and improving parameter efficiency. We also introduce an enhanced variant, LaMDA++, which uses a normalized spectrum analysis of the pre-trained model weights to assign ranks to the LoRA paths adaptively and at low cost. We evaluate on a range of tasks, including the GLUE natural language understanding benchmark, text summarization, natural language generation, and complex reasoning, across different types of large language models. The results show that LaMDA matches or exceeds existing methods while requiring up to 17.7x fewer parameter updates and up to 1.32x lower peak GPU memory usage during fine-tuning. The code will be made publicly available.**|\n", "2406.12822": "|**2024-06-18**|**Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?**|Pinzhen Chen et.al.|[2406.12822](http://arxiv.org/abs/2406.12822)|null|
Multilingual large language models are meant to serve native speakers of different languages. We hypothesize that current practices for fine-tuning and evaluating these models may not match that goal, because they rely heavily on translation, which can introduce translation defects. It remains unclear how the nature of the instruction data affects model output, and whether translated test sets can capture such nuances. Because translated data is often used in both training and evaluation, these potential problems may have gone unnoticed. This study investigates them by using controlled native or translated data during instruction tuning and evaluation and observing model performance. Experiments on eight base models and eight different benchmarks show that native or generative benchmarks reveal a marked difference between native and translated instruction data, especially when model performance is high, whereas other types of test sets do not. Finally, we find that regularization helps on structured tasks but not on generative tasks.|\n", "2406.12809": "|**2024-06-18**|**Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?**|Zhe Yang 
et.al.|[2406.12809](http://arxiv.org/abs/2406.12809)|null|Large language models (LLMs) show impressive performance but still suffer from inconsistency, for example responding differently to rephrasings or minor changes in ordering. Beyond these instabilities, we observe that LLMs can solve hard problems yet sometimes fail at easier ones. To evaluate this hard-to-easy inconsistency, we build the ConsisEval benchmark, in which each entry consists of two questions ordered by difficulty. We also introduce a consistency score to quantify the inconsistency, and use a relative consistency score to analyze the potential for improving consistency. Extensive experiments on existing models show that: (1) GPT-4 achieves the highest consistency score, 92.2%, but is still inconsistent on specific questions because of distraction by redundant information, misreading of questions, and similar issues; (2) more capable models generally show higher consistency, though exceptions exist; (3) harder data improves consistency for both fine-tuning and in-context learning. Our data and code will be made publicly available on GitHub.|\n", "2406.12806": 
"|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**Background**: Configuration settings are essential for tailoring software behavior to specific performance requirements, yet misconfigurations are widespread. Because configuration options are numerous and complex, identifying the ones that affect system performance is challenging. This study presents PerfSense, a lightweight framework that uses large language models (LLMs) to identify performance-sensitive configurations efficiently and with low overhead. PerfSense uses LLM agents to simulate the interaction between developers and performance engineers, applying advanced prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). **Method and results**: Our evaluation on seven open-source Java systems shows that PerfSense classifies performance-sensitive configurations with an average accuracy of 64.77%, outperforming both the LLM baseline (50.36%) and the previous state-of-the-art method (61.75%). In particular, our prompt chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassified cases finds that common issues include LLMs' misunderstanding of requirements (26.8%). 
**Conclusion**: PerfSense significantly reduces the manual effort of classifying performance-sensitive configurations and offers valuable insights for future LLM-based code analysis research.|\n", "2406.12800": "|**2024-06-18**|**Supporting Human Raters with the Detection of Harmful Content using Large Language Models**|Kurt Thomas et.al.|[2406.12800](http://arxiv.org/abs/2406.12800)|null|This paper explores the feasibility of using large language models (LLMs) to automate, or assist human raters with, the detection of harmful content such as hate speech, harassment, extremism, and election misinformation. On a dataset of 50,000 comments, we find that LLMs reach 90% accuracy when compared with human verdicts. We propose five design patterns for integrating LLMs with human rating, such as pre-filtering non-violative content, detecting likely errors in human ratings, or surfacing critical context to support human rating. We show how a single optimized prompt can support these design patterns. In a pilot in a real-world setting, our approach yielded a 41.5% improvement in the use of human rater capacity, along with a 9% to 11% increase in precision and recall for detecting violative content.|\n", "2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B 
to GLM-4 All Tools**|Team GLM et.al.|[2406.12793](http://arxiv.org/abs/2406.12793)|**[link](https://github.com/thudm/chatglm-6b)**|We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report focuses on the GLM-4 language series, comprising GLM-4, GLM-4-Air, and GLM-4-9B, which are our most capable models to date and incorporate the insights and lessons from the previous three generations of ChatGLM. These models are pre-trained on ten trillion tokens, mostly in Chinese and English with a small corpus from 24 languages, and are aligned primarily for Chinese and English. High-quality alignment is achieved through a multi-stage post-training process that includes supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 closely rivals or outperforms GPT-4 on general benchmarks such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval; comes close to GPT-4 Turbo in instruction following as measured by IFEval; matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks; and outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All 
Tools model is further aligned to understand user intent and decide autonomously when and which tools to use, including a web browser, a Python interpreter, a text-to-image model, and user-defined functions, to complete complex tasks effectively. In practical applications, it matches or even surpasses GPT-4 All Tools on tasks such as retrieving information via web browsing and solving problems with a Python interpreter. To date, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, which drew over 10 million downloads on Hugging Face in 2023 alone. These open models are publicly accessible.|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang 
et.al.|[2406.12784](http://arxiv.org/abs/2406.12784)|**[link](https://github.com/Cyno2232/UBENCH)**|With the rapid development of large language models (LLMs), they have proven remarkably effective in practical applications. However, because of their low interpretability, these models often make errors in unforeseen circumstances, which limits their usefulness. Many studies have worked on comprehensive evaluation systems, but earlier benchmarks focus mainly on problem-solving ability and pay too little attention to the uncertainty of responses, which may lead to unreliability. Current methods for measuring LLM reliability are resource-intensive and hard to apply to black-box models. 
To address these problems, we propose UBENCH, a comprehensive benchmark for evaluating LLM reliability. It comprises 3,978 multiple-choice questions covering knowledge, language, understanding, and reasoning abilities. Experiments show that UBENCH achieves state-of-the-art performance, and that its single-sampling method saves substantial compute compared with baseline methods that require multiple samplings. In addition, we use UBENCH to evaluate the reliability of 15 popular LLMs and find that GLM4 stands out, closely followed by GPT-4. We also examine the effects of Chain-of-Thought prompts, role-playing prompts, option order, and temperature on LLM reliability, and analyze how these effects vary across models.|\n", "2406.14563": "|**2024-06-20**|**Model Merging and Safety Alignment: One Bad Model Spoils the Bunch**|Hasan Abed Al Kader Hammoud et.al.|[2406.14563](http://arxiv.org/abs/2406.14563)|null|
Merging large language models (LLMs) is a cost-effective way to combine multiple expert LLMs into a single versatile model that retains the expertise of the originals. However, current approaches often overlook the importance of safety alignment during merging, which leads to highly misaligned models. This work investigates the effect of model merging on alignment. We evaluate several popular model merging techniques and find that existing methods transfer not only domain expertise but also misalignment. To address this, we propose a two-step approach: (1) generate synthetic safety and domain-specific data, and (2) incorporate the generated data into the optimization process of existing data-driven model merging techniques. This lets us treat alignment as a skill that can be maximized in the merged LLM. Experiments demonstrate that integrating alignment-related data during merging is effective, producing models that retain domain expertise and are well aligned.|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon 
et.al.|[2406.14562](http://arxiv.org/abs/2406.14562)|null|When faced with questions that involve visual thinking, humans naturally switch reasoning modalities, often forming mental images or drawing visual aids. Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning steps in text as a chain of thought, yet they still struggle with text queries that are easily solved by visual reasoning, even after extensive multimodal pre-training. We introduce a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities. Whiteboard-of-thought prompting gives the model a metaphorical whiteboard on which to draw out its reasoning steps as images, then feeds these images back to the model for further processing. We find that this works without demonstrations or specialized modules, relying instead on the model's existing ability to write code with libraries such as Matplotlib and Turtle. This simple approach achieves state-of-the-art results on four difficult natural language tasks that involve visual and spatial reasoning. We identify several settings in which GPT-4o with chain-of-thought fails dramatically, including some where it achieves 0% accuracy, while whiteboard-of-thought prompting reaches up to 92% accuracy. We explore in detail where the technique succeeds and where its errors come from.|\n", "2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen 
et.al.|[2406.14556](http://arxiv.org/abs/2406.14556)|**[link](https://github.com/memberre/asyncdriver)**|Although real-time planners perform well in autonomous driving, the rise of large language models (LLMs) has opened new avenues for improving the interpretability and controllability of motion planning. LLM-based planners, however, still suffer from high resource consumption and long inference times, which hinders practical deployment. In light of these challenges, we introduce AsyncDriver, a new asynchronous LLM-enhanced closed-loop framework that uses scene-associated instruction features produced by the LLM to guide a real-time planner toward precise and controllable trajectory predictions. AsyncDriver demonstrates the strong ability of LLMs to comprehend and reason over vectorized scene data and a series of routing instructions, and its asynchronous design effectively reduces the computational cost introduced by the LLM while maintaining comparable performance. Experiments show that our approach achieves superior closed-loop evaluation performance on nuPlan's challenging scenarios.|\n", "2406.14550": "|**2024-06-20**|**GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models**|Shilong Li 
et.al.|[2406.14550](http://arxiv.org/abs/2406.14550)|null|Long-context capabilities are essential for large language models (LLMs) to handle complex tasks. Despite many efforts to optimize LLMs for long inputs, challenges remain. This paper introduces GraphReader, a graph-based agent system that handles long texts by structuring them into a graph and having an agent explore the graph autonomously. Given a question, the agent first analyzes it step by step and devises a rational plan, then calls predefined functions to read node content and neighbor information, exploring the graph from coarse to fine. During exploration, the agent continually records new findings and reflects on the current state to optimize the process, until it has gathered enough information to generate an answer. Experiments on the LV-Eval dataset show that GraphReader with a 4k context window outperforms GPT-4-128k by a large margin across context lengths from 16k to 256k. Our approach also performs strongly on four challenging single-hop and multi-hop benchmarks.|\n", "2406.14549": "|**2024-06-20**|**Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models**|Sunny Duan 
et.al.|[2406.14549](http://arxiv.org/abs/2406.14549)|null|The rise of large language models has revolutionized natural language processing tasks, but it also raises serious concerns about data privacy and security. These models are trained on massive corpora that may contain sensitive or proprietary information, and the risk of data leakage, where model responses reveal pieces of such information, remains poorly understood. This study examines memorization in machine learning models, focusing on how it evolves over the course of training. We investigate how the statistical properties of the training data affect the memories encoded in the model by evaluating the effect of repetition on memorization. We find that the probability of a model memorizing a sequence scales logarithmically with the number of times the sequence appears in the data. We further find that some sequences that appear not to have been memorized can gradually surface over the course of training, even without subsequent exposure. Such latent memorized sequences pose a challenge for data privacy because they may be hidden in the model's final checkpoint. We therefore develop a diagnostic test that uncovers these latent memorized sequences by considering their cross-entropy loss.|\n", "2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein 
et.al.|[2406.14546](http://arxiv.org/abs/2406.14546)|**[link](https://github.com/choidami/inductive-oocr)**|**One strategy for addressing the safety risks of large language models (LLMs) is to remove dangerous knowledge from their training data. Although this eliminates the explicit information, implicit information may remain scattered across many training documents. The question we study is whether an LLM can infer the censored knowledge by piecing together these implicit hints. To that end, we focus on inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning. Through experiments on five tasks, we show that frontier LLMs do have this ability. For example, in one experiment an LLM is fine-tuned only on the distances between an unknown city and other known cities; remarkably, without in-context examples or chain-of-thought, the LLM can state that the unknown city is Paris and use this to answer follow-up questions. Further experiments show that LLMs trained only on individual coin-flip outcomes can tell whether the coin is biased, and that models exposed only to pairs $(x, f(x))$ can articulate a definition of $f$ and compute inverses. Although OOCR works well in some cases, we also find that it is not always reliable, particularly when smaller LLMs learn complex structures. Overall, the ability of LLMs to \u201cconnect the dots\u201d without explicit in-context learning poses a potential challenge for monitoring and controlling the knowledge they acquire.**|\n", "2406.14545": "|**2024-06-20**|**Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in 
Text-to-SQL Systems**|\u0110or\u0111e Klisura et.al.|[2406.14545](http://arxiv.org/abs/2406.14545)|null|\u5173\u7cfb\u6570\u636e\u5e93\u5728\u73b0\u4ee3\u4fe1\u606f\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u662f\u5b58\u50a8\u3001\u67e5\u8be2\u548c\u7ba1\u7406\u6570\u636e\u7684\u6838\u5fc3\u3002\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u6587\u672c\u5230SQL\u6280\u672f\u5d2d\u9732\u5934\u89d2\uff0c\u6781\u5927\u5730\u63d0\u5347\u4e86\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u9690\u79c1\u548c\u5b89\u5168\u7684\u62c5\u5fe7\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e13\u6ce8\u4e8e\u63d0\u53d6\u6587\u672c\u5230SQL\u6a21\u578b\u6240\u4f9d\u8d56\u7684\u6570\u636e\u5e93\u6a21\u5f0f\u5143\u7d20\u3002\u4e86\u89e3\u6a21\u5f0f\u53ef\u80fd\u4f7fSQL\u6ce8\u5165\u653b\u51fb\u66f4\u4e3a\u5bb9\u6613\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u96f6\u77e5\u8bc6\u6846\u67b6\uff0c\u901a\u8fc7\u63d0\u51fa\u7cbe\u5fc3\u6784\u9020\u7684\u95ee\u9898\uff0c\u65e0\u9700\u76f4\u63a5\u4e86\u89e3\u6570\u636e\u5e93\uff0c\u8be5\u6846\u67b6\u80fd\u4fc3\u4f7f\u8fd9\u4e9b\u6a21\u578b\u5904\u7406\u8fd9\u4e9b\u95ee\u9898\u5e76\u751f\u6210\u8f93\u51fa\uff0c\u4ece\u800c\u63ed\u793a\u6570\u636e\u5e93\u6a21\u5f0f\u7ed3\u6784\u3002\u6211\u4eec\u5c06\u6b64\u65b9\u6cd5\u5e94\u7528\u4e8e\u9488\u5bf9\u6587\u672c-SQL\u5bf9\u8fdb\u884c\u8fc7\u5fae\u8c03\u7684\u4e13\u7528\u6587\u672c\u5230SQL\u6a21\u578b\u4ee5\u53ca\u7528\u4e8eSQL\u751f\u6210\u7684\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u80fd\u591f\u4ee5\u63a5\u8fd10.75\u7684F1\u5206\u6570\u91cd\u6784\u8868\u540d\uff0c\u800c\u5bf9\u4e8e\u751f\u6210\u5f0f\u6a21\u578b\uff0c\u8fd9\u4e00\u5206\u6570\u66f4\u662f\u9ad8\u8fbe0.96\u3002|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of 
VLMs**|Yuxuan Qiao et.al.|[2406.14544](http://arxiv.org/abs/2406.14544)|**[link](https://github.com/sparksjoe/prism)**|**\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5904\u7406\u5404\u79cd\u89c6\u89c9\u95ee\u9898\u65f6\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\uff0c\u8fd9\u8981\u6c42\u6a21\u578b\u5177\u5907\u5f3a\u5927\u7684\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u611f\u77e5\u548c\u63a8\u7406\u5728\u73b0\u6709VLM\u4e2d\u7684\u4ea4\u7ec7\u6027\uff0c\u72ec\u7acb\u8bc4\u4f30\u8fd9\u4e24\u65b9\u9762\u7684\u80fd\u529b\u9887\u5177\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\u2014\u2014Prism\uff0c\u65e8\u5728\u5206\u79bb\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u5728\u89c6\u89c9\u95ee\u7b54\u4e2d\u7684\u4f5c\u7528\u3002Prism\u5206\u4e3a\u4e24\u4e2a\u9636\u6bb5\uff1a\u611f\u77e5\u9636\u6bb5\u5229\u7528VLM\u63d0\u53d6\u5e76\u4ee5\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u89c6\u89c9\u4fe1\u606f\uff1b\u63a8\u7406\u9636\u6bb5\u5219\u6839\u636e\u63d0\u53d6\u7684\u89c6\u89c9\u4fe1\u606f\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u54cd\u5e94\u3002\u8fd9\u79cd\u6a21\u5757\u5316\u8bbe\u8ba1\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u7cfb\u7edf\u5730\u6bd4\u8f83\u548c\u8bc4\u4f30\u4e0d\u540cVLM\u7684\u611f\u77e5\u548c\u63a8\u7406\u6027\u80fd\u3002 
\u6211\u4eec\u7684\u5206\u6790\u6846\u67b6\u63d0\u4f9b\u4e86\u8bf8\u591a\u6d1e\u89c1\uff0c\u8bc1\u660e\u4e86Prism\u4f5c\u4e3a\u6210\u672c\u6548\u76ca\u9ad8\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u89e3\u51b3\u65b9\u6848\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5c06\u4e13\u6ce8\u4e8e\u611f\u77e5\u7684\u7b80\u5316VLM\u4e0e\u4e13\u4e3a\u63a8\u7406\u8bbe\u8ba1\u7684\u5f3a\u5927LLM\u76f8\u7ed3\u5408\uff0cPrism\u5728\u901a\u7528\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4f18\u5f02\u6210\u7ee9\uff0c\u540c\u65f6\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u548c\u8fd0\u8425\u6210\u672c\u3002\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\uff0c\u5f53Prism\u914d\u5907\u57fa\u7840\u76842B LLaVA VLM\u548c\u5f00\u6e90\u7684GPT-3.5\u65f6\uff0c\u5176\u5728\u4e25\u8c28\u7684\u591a\u6a21\u6001\u57fa\u51c6MMStar\u4e0a\u7684\u8868\u73b0\u53ef\u4e0e\u5927\u5341\u500d\u7684VLM\u76f8\u5f53\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728\uff1ahttps://github.com/SparksJoe/Prism\u3002**|\n", "2406.14541": "|**2024-06-21**|**Are LLMs Naturally Good at Synthetic Tabular Data Generation?**|Shengzhe Xu 
et.al.|[2406.14541](http://arxiv.org/abs/2406.14541)|**[link](https://github.com/anonymou9167/anonymouscode)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u548c\u56fe\u50cf\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u5728\u751f\u6210\u6700\u5e38\u89c1\u7684\u6570\u636e\u7c7b\u578b\u2014\u2014\u8868\u683c\u6570\u636e\u65b9\u9762\u7684\u6f5c\u529b\u5374\u9c9c\u6709\u7814\u7a76\u3002\u8fd9\u7bc7\u8bba\u6587\u6307\u51fa\uff0c\u76f4\u63a5\u4f7f\u7528\u6216\u7ecf\u8fc7\u4f20\u7edf\u5fae\u8c03\u7684LLMs\u5728\u4f5c\u4e3a\u5408\u6210\u8868\u683c\u751f\u6210\u5668\u65f6\u8868\u73b0\u6781\u5dee\u3002\u7531\u4e8eLLMs\u7684\u81ea\u56de\u5f52\u7279\u6027\uff0c\u968f\u673a\u987a\u5e8f\u6392\u5217\u7684\u5fae\u8c03\u4e0e\u6355\u6349\u529f\u80fd\u6027\u4f9d\u8d56\u7684\u91cd\u8981\u6027\u76f8\u6096\uff0c\u5bfc\u81f4\u5b83\u4eec\u65e0\u6cd5\u5904\u7406\u6761\u4ef6\u6df7\u5408\u5206\u5e03\uff08\u8fd9\u662f\u53cd\u6620\u73b0\u5b9e\u4e16\u754c\u7ea6\u675f\u7684\u5173\u952e\uff09\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u4f7fLLMs\u53d8\u5f97\u611f\u77e5\u6392\u5217\u987a\u5e8f\u6765\u6539\u5584\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u4ece\u800c\u63d0\u5347\u5176\u6027\u80fd\u3002**|\n", "2406.14517": "|**2024-06-20**|**PostMark: A Robust Blackbox Watermark for Large Language Models**|Yapei Chang 
et.al.|[2406.14517](http://arxiv.org/abs/2406.14517)|**[link](https://github.com/lilakk/postmark)**|**\u6700\u6709\u6548\u7684\u68c0\u6d4b\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6587\u672c\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u5728\u89e3\u7801\u8fc7\u7a0b\u4e2d\u63d2\u5165\u53ef\u8bc6\u522b\u7684\u6807\u8bb0\uff0c\u5373\u6c34\u5370\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u83b7\u53d6\u5230LLM\u7684\u539f\u59cb\u6982\u7387\uff08logits\uff09\uff0c\u8fd9\u4f7f\u5f97LLM\u670d\u52a1\u63d0\u4f9b\u5546\u4e0d\u613f\u5206\u4eab\uff0c\u56e0\u4e3a\u62c5\u5fc3\u6a21\u578b\u6cc4\u9732\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u6c34\u5370\u9700\u8981\u6bcf\u4e2a\u63d0\u4f9b\u8005\u72ec\u7acb\u5f00\u53d1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u540e\u5904\u7406\u6c34\u5370\u65b9\u6848\uff0c\u540d\u4e3aPostMark\u3002\u5b83\u662f\u4e00\u79cd\u6a21\u5757\u5316\u7684\u3001\u751f\u6210\u540e\u63d2\u5165\u7684\u6c34\u5370\u7b56\u7565\uff0c\u65e0\u9700\u89e6\u53calogits\uff0c\u9002\u5408\u7b2c\u4e09\u65b9\u5b9e\u65bd\u3002PostMark\u8868\u73b0\u51fa\u66f4\u5f3a\u7684\u5bf9\u6297\u540c\u4e49\u53e5\u653b\u51fb\u80fd\u529b\uff1a\u6211\u4eec\u5728\u5b9e\u9a8c\u4e2d\u6db5\u76d6\u4e86\u516b\u4e2a\u57fa\u7840\u7b97\u6cd5\u3001\u4e94\u4e2a\u57fa\u7ebfLLM\u548c\u4e09\u4e2a\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86PostMark\u5bf9\u6587\u672c\u8d28\u91cf\u7684\u5f71\u54cd\uff0c\u5305\u62ec\u81ea\u52a8\u5316\u548c\u4eba\u5de5\u8bc4\u4f30\uff0c\u63a2\u8ba8\u4e86\u8d28\u91cf\u548c\u6297\u6539\u5199\u653b\u51fb\u4e4b\u95f4\u7684\u6743\u8861\u3002\u7814\u7a76\u4ee3\u7801\u3001\u8f93\u51fa\u548c\u6ce8\u91ca\u5df2\u516c\u5f00\u5728https://github.com/lilakk/PostMark\u3002**|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu 
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**\u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u4ece\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u4e2d\u8bc6\u522b\u75be\u75c5\u76f8\u5173\u57fa\u56e0\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u5f80\u5f80\u9700\u8981\u6df1\u539a\u7684\u4e13\u957f\u548c\u5927\u91cf\u7684\u4eba\u5de5\u52aa\u529b\uff0c\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u663e\u793a\u51fa\u5728\u81ea\u52a8\u5316\u6b64\u7c7b\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u7c7b\u65b9\u6cd5\u7684\u8bc4\u4f30\u548c\u53d1\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86GenoTEX\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u5206\u6790\u81ea\u52a8\u63a2\u7d22\u7684\u57fa\u51c6\uff0c\u5305\u62ec\u6570\u636e\u96c6\u9009\u62e9\u3001\u9884\u5904\u7406\u548c\u7edf\u8ba1\u5206\u6790\u4efb\u52a1\u3002GenoTEX\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u7ba1\u9053\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u4eba\u7c7b\u751f\u7269\u4fe1\u606f\u5b66\u5bb6\u7cbe\u5fc3\u7f16\u5199\u7684\u6ce8\u91ca\uff0c\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u6df1\u5165\u5206\u6790\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u3002 
\u4e3a\u4e86\u63d0\u4f9b\u8fd9\u4e9b\u4efb\u52a1\u7684\u57fa\u7ebf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86GenoAgents\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u56e2\u961f\uff0c\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u89c4\u5212\u3001\u8fed\u4ee3\u6821\u6b63\u4ee5\u53ca\u4e0e\u9886\u57df\u4e13\u5bb6\u54a8\u8be2\u7684\u80fd\u529b\uff0c\u5b83\u4eec\u534f\u4f5c\u63a2\u7d22\u57fa\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u663e\u793a\u4e86LLM\u9a71\u52a8\u65b9\u6cd5\u5728\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u4e2d\u7684\u6f5c\u529b\uff0c\u800c\u9519\u8bef\u5206\u6790\u6307\u51fa\u4e86\u6311\u6218\u548c\u672a\u6765\u7684\u6539\u8fdb\u65b9\u5411\u3002\u6211\u4eec\u63d0\u8baeGenoTEX\u4f5c\u4e3a\u4e00\u4e2a\u6709\u524d\u666f\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8861\u91cf\u548c\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1ahttps://github.com/Liu-Hy/GenoTex\u3002**|\n", "2406.15330": "|**2024-06-21**|**Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance**|Haoling Li 
et.al.|[2406.15330](http://arxiv.org/abs/2406.15330)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u4f17\u591a\u7814\u7a76\u9886\u57df\u5e26\u6765\u4e86\u9769\u65b0\u3002\u5c3d\u7ba1\u4eba\u4eec\u666e\u904d\u77e5\u9053\u5fae\u8c03\u5bf9\u4e8e\u589e\u5f3aLLMs\u7684\u529f\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u73b0\u6709\u7814\u7a76\u8868\u660e\uff0c\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u5b58\u5728\u53c2\u6570\u5197\u4f59\u3002\u56e0\u6b64\uff0c\u6709\u7814\u7a76\u5efa\u8bae\u53ea\u66f4\u65b0\u90e8\u5206\u53c2\u6570\uff0c\u4f46\u8fd9\u672a\u80fd\u6709\u6548\u5229\u7528\u4efb\u52a1\u7279\u5b9a\u4fe1\u606f\u6765\u8bc6\u522b\u8bad\u7ec3\u4e2d\u7684\u91cd\u8981\u53c2\u6570\u3002\u8003\u8651\u5230\u68af\u5ea6\u672c\u8d28\u4e0a\u8574\u542b\u7740\u4efb\u52a1\u76f8\u5173\u6570\u636e\u7684\u4fe1\u606f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68af\u5ea6\u63a9\u7801\u8c03\u4f18\uff08Gradient-Mask Tuning\uff0cGMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6839\u636e\u53c2\u6570\u7684\u68af\u5ea6\u4fe1\u606f\u9009\u62e9\u6027\u5730\u8fdb\u884c\u8bad\u7ec3\u66f4\u65b0\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u8ba1\u7b97\u68af\u5ea6\u7684\u7edd\u5bf9\u503c\uff0c\u5e76\u5bf9\u8f83\u5c0f\u5e45\u5ea6\u7684\u68af\u5ea6\u5e94\u7528\u63a9\u7801\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGMT\u4e0d\u4ec5\u4f18\u4e8e\u4f20\u7edf\u7684\u5fae\u8c03\u65b9\u6cd5\uff0c\u8fd8\u63d0\u5347\u4e86LLM\u6027\u80fd\u7684\u4e0a\u9650\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u663e\u793a\uff0cGMT\u5bf9\u63a9\u7801\u6bd4\u4f8b\u5177\u6709\u4e00\u5b9a\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u4e0e\u57fa\u672c\u7684\u5fae\u8c03\uff08Simple Fine-Tuning\uff0cSFT\uff09\u76f8\u5f53\u3002|\n", "2406.15325": "|**2024-06-21**|**Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks**|Hokyung Lee 
et.al.|[2406.15325](http://arxiv.org/abs/2406.15325)|**[link](https://github.com/hamminghq/bug-in-the-code-stack)**|\u8fd1\u5e74\u6765\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6d77\u91cf\u6587\u672c\u6587\u6863\u4e2d\u68c0\u7d22\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684Needle-in-a-Haystack\uff08NIAH\uff09\u57fa\u51c6\u7814\u7a76\u6709\u6240\u8fdb\u5c55\u3002\u968f\u7740LLMs\u5728\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u65e5\u76ca\u878d\u5408\uff0c\u8bc4\u4f30\u5b83\u4eec\u5728\u4ee3\u7801\u73af\u5883\u4e2d\u7684\u8868\u73b0\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740LLMs\u671d\u7740\u7a0b\u5e8f\u5408\u6210\u65b9\u5411\u53d1\u5c55\uff0c\u5fc5\u987b\u786e\u4fdd\u5b83\u4eec\u80fd\u7406\u89e3\u8bed\u6cd5\u5e76\u7f16\u5199\u51fa\u7b26\u5408\u8bed\u6cd5\u89c4\u5219\u7684\u4ee3\u7801\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86Bug In The Code Stack\uff08BICS\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u68c0\u9a8cLLMs\u5728\u5927\u578b\u6e90\u4ee3\u7801\u4e2d\u8bc6\u522b\u7b80\u5355\u8bed\u6cd5\u9519\u8bef\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\u4e09\u4e2a\u5173\u952e\u70b9\uff1a\uff081\uff09\u4e0e\u6587\u672c\u73af\u5883\u76f8\u6bd4\uff0c\u57fa\u4e8e\u4ee3\u7801\u7684\u73af\u5883\u5bf9\u68c0\u7d22\u4efb\u52a1\u6784\u6210\u4e86\u66f4\u5927\u7684\u6311\u6218\uff1b\uff082\uff09\u4e0d\u540c\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff1b\uff083\uff09\u8f83\u957f\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e0e\u6027\u80fd\u4e0b\u964d\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u8fd9\u79cd\u4e0b\u964d\u7a0b\u5ea6\u5728\u4e0d\u540c\u7684\u6a21\u578b\u95f4\u6709\u6240\u4e0d\u540c\u3002|\n", "2406.15264": "|**2024-06-21**|**Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics**|Weijia Zhang 
et.al.|[2406.15264](http://arxiv.org/abs/2406.15264)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u53ef\u9760\u6216\u96be\u4ee5\u9a8c\u8bc1\u7684\u4fe1\u606f\uff0c\u5373\u201c\u5e7b\u89c9\u201d\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u7684LLMs\u5f15\u5165\u4e86\u5f15\u7528\uff0c\u4f7f\u5185\u5bb9\u57fa\u4e8e\u53ef\u6838\u67e5\u7684\u6765\u6e90\u3002\u7136\u800c\uff0c\u624b\u52a8\u8bc4\u4f30\u5f15\u7528\u662f\u5426\u5145\u5206\u652f\u6301\u76f8\u5173\u9648\u8ff0\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u4fe1\u4ef0\u5ea6\u6307\u6807\u81ea\u52a8\u4f30\u8ba1\u5f15\u7528\u7684\u652f\u6301\u7a0b\u5ea6\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u9650\u4e8e\u4e8c\u5206\u7c7b\uff0c\u5ffd\u89c6\u4e86\u5b9e\u9645\u573a\u666f\u4e2d\u5bf9\u7cbe\u7ec6\u7ea7\u522b\u5f15\u7528\u652f\u6301\u7684\u8003\u91cf\u3002\u4e3a\u4e86\u63a2\u7a76\u4fe1\u4ef0\u5ea6\u6307\u6807\u5728\u7cbe\u7ec6\u7ea7\u522b\u8bc4\u4f30\u4e2d\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6bd4\u8f83\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8e\u68c0\u9a8c\u8fd9\u4e9b\u6307\u6807\u5728\u533a\u5206\u4e09\u79cd\u652f\u6301\u7b49\u7ea7\uff08\u5168\u9762\u3001\u90e8\u5206\u548c\u65e0\u652f\u6301\uff09\u4e4b\u95f4\u7684\u80fd\u529b\uff1a\u5168\u9762\u652f\u6301\u3001\u90e8\u5206\u652f\u6301\u548c\u4e0d\u652f\u6301\u3002\u6211\u4eec\u7684\u6846\u67b6\u91c7\u7528\u76f8\u5173\u6027\u5206\u6790\u3001\u5206\u7c7b\u8bc4\u4f30\u548c\u68c0\u7d22\u8bc4\u4f30\uff0c\u5168\u65b9\u4f4d\u8861\u91cf\u6307\u6807\u5206\u6570\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u4e00\u81f4\u6027\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u6ca1\u6709\u5355\u4e00\u6307\u6807\u5728\u6240\u6709\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u63ed\u793a\u4e86\u7cbe\u7ec6\u7ea7\u522b\u652f\u6301\u8bc4\u4f30\u7684\u590d\u6742\u6027\u3002\u6839\u636e\u53d1\u73b0\u7684\u7ed3\u679c\u
ff0c\u6211\u4eec\u4e3a\u5f00\u53d1\u66f4\u6709\u6548\u7684\u6307\u6807\u63d0\u4f9b\u4e86\u5b9e\u7528\u5efa\u8bae\u3002|\n", "2406.15231": "|**2024-06-21**|**Detecting Synthetic Lyrics with Few-Shot Inference**|Yanis Labrak et.al.|[2406.15231](http://arxiv.org/abs/2406.15231)|null|\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u7684\u97f3\u4e50\u5185\u5bb9\u9010\u6e10\u53d7\u5230\u5173\u6ce8\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u88ab\u6709\u6548\u5e94\u7528\u4e8e\u521b\u4f5c\u5404\u79cd\u98ce\u683c\u3001\u4e3b\u9898\u548c\u8bed\u8a00\u7ed3\u6784\u7684\u6b4c\u8bcd\uff0c\u8fd9\u63a8\u52a8\u4e86\u827a\u672f\u5bb6\u4eec\u7684\u521b\u4f5c\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u7248\u6743\u4fb5\u72af\u3001\u6d88\u8d39\u8005\u6ee1\u610f\u5ea6\u548c\u5185\u5bb9\u6ee5\u53d1\u7b49\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u68c0\u6d4b\u751f\u6210\u6b4c\u8bcd\u7684\u65b9\u6cd5\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u5e76\u672a\u4e13\u6ce8\u4e8e\u8fd9\u4e00\u7279\u5b9a\u9886\u57df\u6216\u521b\u610f\u6587\u672c\u7684\u673a\u5668\u751f\u6210\u5185\u5bb9\u68c0\u6d4b\u3002\u9488\u5bf9\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u7cbe\u5fc3\u6784\u5efa\u4e86\u9996\u4e2a\u9ad8\u8d28\u91cf\u5408\u6210\u6b4c\u8bcd\u6570\u636e\u96c6\uff0c\u5e76\u5bf9\u591a\u79cd\u57fa\u4e8e\u5c11\u91cf\u6837\u672c\u7684\u68c0\u6d4b\u65b9\u6cd5\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u5b9a\u91cf\u8bc4\u4f30\uff0c\u6d4b\u8bd5\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5e76\u8f85\u4ee5\u4eba\u7c7b\u8bc4\u4ef7\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6700\u4f73\u5c11\u6570\u6837\u672c\u68c0\u6d4b\u5668\u2014\u2014\u57fa\u4e8eLLM2Vec\u7684\u65b9\u6cd5\u8d85\u8d8a\u4e86\u5728\u5176\u4ed6\u9886\u57df\u8868\u73b0\u5f3a\u52b2\u7684\u98ce\u683c\u548c\u7edf\u8ba1\u65b9\u6cd5\uff0c\u6210\u529f\u9274\u522b\u51fa\u4eba\u7c7b\u521b\u4f5c\u4e0e\u673a\u5668\u751f\u6210\u7684\u6b4c\u8bcd\uff0c\u4e14\u5c55\u73b0\u51fa\u826f\u597d\u7684\u8de8\u827a\u672f\u5bb6\u548c\u6a21\u
578b\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u80fd\u6709\u6548\u8bc6\u522b\u751f\u6210\u540e\u7684\u4eba\u5de5\u6da6\u8272\u3002\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u5728\u521b\u610f\u5185\u5bb9\u68c0\u6d4b\u9886\u57df\uff0c\u7279\u522b\u662f\u6cdb\u5316\u80fd\u529b\u548c\u5bf9\u66f4\u5927\u6b4c\u66f2\u5e93\u7684\u9002\u5e94\u6027\u65b9\u9762\uff0c\u9700\u8981\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u6240\u6709\u6570\u636e\u96c6\u3001\u9884\u5904\u7406\u811a\u672c\u548c\u4ee3\u7801\u5df2\u516c\u5f00\u5728GitHub\u548cHugging Face\u4e0a\uff0c\u9075\u5faaApache 2.0\u8bb8\u53ef\u534f\u8bae\u3002|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga et.al.|[2406.15227](http://arxiv.org/abs/2406.15227)|**[link](https://github.com/hitz-zentroa/cn-eval)**|\u968f\u7740\u7f51\u7edc\u4e0a\u9519\u8bef\u4fe1\u606f\u548c\u6709\u5bb3\u8a00\u8bba\u7684\u589e\u591a\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u53cd\u53d9\u4e8b\uff08Counter Narrative\uff0cCN\uff09\u751f\u6210\u6280\u672f\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u5f80\u5f80\u7f3a\u4e4f\u53ef\u89e3\u91ca\u6027\uff0c\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u751f\u6210\u7684CN\u4e0e\u4eba\u7c7b\u611f\u77e5\u4e4b\u95f4\u7684\u590d\u6742\u5173\u7cfb\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u7684CN\uff0c\u5373\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language 
Model\uff0cLLM\uff09\u4f5c\u4e3a\u8bc4\u4f30\u5668\u3002\u901a\u8fc7\u4ee5\u9526\u6807\u8d5b\u5f62\u5f0f\u5bf9\u751f\u6210\u7684CN\u8fdb\u884c\u5bf9\u6218\u6bd4\u8f83\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u6a21\u578b\u6392\u540d\u6d41\u7a0b\uff0c\u5176\u4e0e\u4eba\u7c7b\u504f\u597d\u95f4\u7684\u76f8\u5173\u7cfb\u6570\u8fbe\u52300.88\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528LLM\u8fdb\u884c\u96f6\u6837\u672c\uff08Zero-Shot\uff0cZS\uff09CN\u751f\u6210\u7684\u80fd\u529b\uff0c\u5bf9\u6bd4\u5206\u6790\u4e86\u804a\u5929\u3001\u6307\u4ee4\u548c\u57fa\u7840\u6a21\u578b\u7684\u6027\u80fd\u548c\u5c40\u9650\u6027\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u8bc4\u4f30\uff0c\u5305\u62ec\u5fae\u8c03\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u9886\u57df\u6570\u636e\u4e0b\u7684\u54cd\u5e94\u5dee\u5f02\u3002\u7ed3\u8bba\u662f\uff0c\u5bf9\u4e8e\u6267\u884c\u8fd9\u9879\u4efb\u52a1\uff0c\u5982\u679c\u80fd\u907f\u514d\u56e0\u5b89\u5168\u987e\u8651\u800c\u62d2\u7edd\u751f\u6210\uff0c\u804a\u5929\u5bfc\u5411\u7684ZS\u6a21\u578b\u53ef\u80fd\u662f\u6700\u4f73\u9009\u62e9\u3002|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214](http://arxiv.org/abs/2406.15214)|null|
\u5bf9\u8bdd\u7b56\u7565\u5728\u6784\u5efa\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5176\u5f00\u53d1\u548c\u7ef4\u62a4\u5f80\u5f80\u9700\u8981\u5bf9\u8bdd\u5efa\u6a21\u4e13\u5bb6\u7684\u5927\u91cf\u6295\u5165\u3002\u5c3d\u7ba1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u624b\u5934\u6709\u5927\u91cf\u7684\u5bf9\u8bdd\u6570\u636e\uff0c\u4f46\u4eba\u4eec\u7f3a\u4e4f\u6709\u6548\u7684\u65b9\u6cd5\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5bf9\u8bdd\u7b56\u7565\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u901a\u8fc7\u5c55\u793a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u5728\u5bf9\u8bdd\u6570\u636e\u8f6c\u5316\u4e3a\u7edf\u4e00\u7684\u4e2d\u95f4\u8868\u793a\u2014\u2014\u89c4\u8303\u5f62\u5f0f\u7684\u8fc7\u7a0b\u4e2d\u53d1\u6325\u4f5c\u7528\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5229\u7528\u53ef\u63a7\u4e14\u53ef\u89e3\u91ca\u7684\u56fe\u57fa\u65b9\u6cd5\u751f\u6210\u5bf9\u8bdd\u7b56\u7565\u7684\u6280\u672f\u3002\u901a\u8fc7\u5c06\u5bf9\u8bdd\u4e2d\u7684\u89c4\u8303\u5f62\u5f0f\u6574\u5408\u6210\u6d41\u7a0b\u7f51\u7edc\uff0c\u6211\u4eec\u53d1\u73b0\u8fd0\u884c\u56fe\u904d\u5386\u7b97\u6cd5\u6709\u52a9\u4e8e\u63d0\u53d6\u5bf9\u8bdd\u6d41\u7a0b\u3002\u76f8\u6bd4\u4ec5\u4f9d\u8d56LLM\u63d0\u53d6\u7684\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6d41\u7a0b\u66f4\u597d\u5730\u53cd\u6620\u4e86\u5e95\u5c42\u4ea4\u4e92\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e8\u5728\u8d4b\u4e88\u5bf9\u8bdd\u8bbe\u8ba1\u8005\u66f4\u5927\u7684\u63a7\u5236\u529b\uff0c\u63d0\u4f9b\u4e00\u4e2a\u63d0\u5347\u5bf9\u8bdd\u7b56\u7565\u5f00\u53d1\u6548\u7387\u7684\u5de5\u5177\u3002|\n", "2406.15209": "|**2024-06-21**|**Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding**|Mohan Li et.al.|[2406.15209](http://arxiv.org/abs/2406.15209)|null|
\u96f6\u6837\u672c\u8bed\u97f3\u8bed\u8a00\u7406\u89e3\uff08SLU\uff09\u4f7f\u7cfb\u7edf\u80fd\u591f\u5728\u65e0\u9700\u5148\u524d\u8bad\u7ec3\u6570\u636e\u7684\u65b0\u9886\u57df\u7406\u89e3\u7528\u6237\u8bdd\u8bed\u3002\u5f53\u524d\u7684\u7814\u7a76\u5f80\u5f80\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5bfc\u81f4\u5e9e\u5927\u7684\u5b58\u50a8\u9700\u6c42\u548c\u590d\u6742\u6027\u3002\u672c\u6587\u63d0\u51fa\u4f7f\u7528 Whisper\uff0c\u4e00\u4e2a\u72ec\u7acb\u7684\u8bed\u97f3\u5904\u7406\u6a21\u578b\uff0c\u6765\u8fdb\u884c\u96f6\u6837\u672c\u7aef\u5230\u7aef\uff08E2E\uff09SLU\u3002\u4e3a\u5904\u7406\u672a\u89c1\u8fc7\u7684\u8bed\u4e49\u6807\u7b7e\uff0c\u6211\u4eec\u5c06SLU\u4efb\u52a1\u878d\u5165\u95ee\u7b54\uff08QA\uff09\u6846\u67b6\u4e2d\uff0c\u901a\u8fc7\u63d0\u793aWhisper\u89e3\u7801\u5668\u8fdb\u884c\u8bed\u4e49\u63a8\u65ad\u3002\u6211\u4eec\u91c7\u7528\u524d\u7f00\u8c03\u4f18\u65b9\u6cd5\u9ad8\u6548\u5730\u8bad\u7ec3\u8be5\u7cfb\u7edf\uff0c\u53ea\u4f18\u5316\u5c11\u91cf\u53c2\u6570\uff0c\u800c\u4e0d\u662f\u6574\u4e2aWhisper\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u63d0\u8bae\u7cfb\u7edf\u5728SLURP\u4e0a\u7684\u69fd\u4f4d\u586b\u5145\uff08SLU-F1\uff09\u5f97\u5206\u6bd4\u6700\u8fd1\u5f15\u5165\u7684\u96f6\u6837\u672c\u57fa\u51c6\u63d0\u9ad8\u4e8640.7%\u3002\u6b64\u5916\uff0c\u5728\u65e2\u5b9a\u548c\u8de8\u9886\u57df\u8bc4\u4f30\u73af\u5883\u4e0b\uff0c\u5b83\u4e0e\u57fa\u4e8eWhisper-GPT-2\u7684\u6a21\u5757\u5316\u7cfb\u7edf\u8868\u73b0\u76f8\u5f53\uff0c\u4f46\u6a21\u578b\u53c2\u6570\u51cf\u5c11\u4e8634.8%\u3002|\n", "2406.15198": "|**2024-06-21**|**Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms**|Santiago Berrezueta-Guzman 
et.al.|[2406.15198](http://arxiv.org/abs/2406.15198)|null|\u6ce8\u610f\u529b\u7f3a\u9677\u591a\u52a8\u969c\u788d\uff08ADHD\uff09\u662f\u4e00\u79cd\u795e\u7ecf\u53d1\u80b2\u969c\u788d\uff0c\u5176\u7279\u5f81\u4e3a\u6ce8\u610f\u529b\u4e0d\u96c6\u4e2d\u3001\u8fc7\u5ea6\u6d3b\u8dc3\u548c\u51b2\u52a8\uff0c\u4e25\u91cd\u5f71\u54cd\u4e2a\u4f53\u7684\u65e5\u5e38\u751f\u6d3b\u548c\u751f\u6d3b\u8d28\u91cf\u3002\u804c\u4e1a\u7597\u6cd5\u5728ADHD\u7ba1\u7406\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u901a\u8fc7\u57f9\u517b\u65e5\u5e38\u751f\u6d3b\u6240\u9700\u7684\u6280\u80fd\uff0c\u63d0\u5347\u4e2a\u4f53\u5728\u5b66\u6821\u3001\u5bb6\u5ead\u548c\u793e\u4f1a\u73af\u5883\u4e2d\u5168\u9762\u53c2\u4e0e\u7684\u80fd\u529b\u3002\u8fd1\u671f\u7814\u7a76\u5f3a\u8c03\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\u548c\u793e\u4ea4\u8f85\u52a9\u673a\u5668\u4eba\uff09\u5728\u5fc3\u7406\u6cbb\u7597\u4e2d\u7684\u6f5c\u5728\u4ef7\u503c\uff0c\u4ee5\u5f25\u8865\u73b0\u6709\u7597\u6cd5\u7684\u5c40\u9650\uff0c\u63d0\u4f9b\u5b9a\u5236\u5316\u7684\u652f\u6301\u5e76\u9002\u5e94\u4e2a\u4f53\u7684\u72ec\u7279\u9700\u6c42\u3002\u7136\u800c\uff0c\u5173\u4e8e\u8fd9\u4e9b\u5148\u8fdb\u6280\u672f\u5728ADHD\u7597\u6cd5\u4e2d\u7684\u8054\u5408\u5e94\u7528\u7814\u7a76\u5c1a\u5b58\u5728\u8f83\u5927\u7a7a\u767d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6574\u5408\u4e86ChatGPT-4 Turbo\u548cClaude-3 Opus\u4e24\u4e2a\u5148\u8fdb\u8bed\u8a00\u6a21\u578b\u5230\u4e00\u4e2a\u673a\u5668\u4eba\u52a9\u7406\u4e2d\uff0c\u4ee5\u8003\u5bdf\u5b83\u4eec\u5728\u673a\u5668\u4eba\u8f85\u52a9\u4e92\u52a8\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u5728\u4e00\u4e2a\u6a21\u62df\u6cbb\u7597\u573a\u666f\u4e2d\u6bd4\u8f83\u5b83\u4eec\u4e0e\u4e34\u5e8a\u9a8c\u8bc1\u7684\u5b9a\u5236\u6a21\u578b\u7684\u6548\u679c\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cChatGPT-4 
Turbo\u5728\u6027\u80fd\u548c\u54cd\u5e94\u901f\u5ea6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u9002\u5408\u4e8e\u65f6\u95f4\u654f\u611f\u7684\u5e94\u7528\u3002\u800cClaude-3 Opus\u5728\u7406\u89e3\u3001\u8fde\u8d2f\u6027\u548c\u4f26\u7406\u8003\u91cf\u65b9\u9762\u8868\u73b0\u51fa\u4f18\u52bf\uff0c\u5f3a\u8c03\u5b89\u5168\u548c\u5438\u5f15\u4eba\u7684\u4e92\u52a8\u3002\u4e24\u8005\u90fd\u5c55\u73b0\u51fa\u521b\u65b0\u548c\u9002\u5e94\u6027\uff0c\u4f46ChatGPT-4 Turbo\u5728\u96c6\u6210\u7b80\u6613\u5ea6\u548c\u8bed\u8a00\u652f\u6301\u65b9\u9762\u66f4\u5177\u4f18\u52bf\u3002\u9009\u62e9\u54ea\u4e2a\u6a21\u578b\u53d6\u51b3\u4e8eADHD\u7597\u6cd5\u7684\u5177\u4f53\u9700\u6c42\u3002|\n", "2406.15187": "|**2024-06-21**|**UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis**|Yulong Hui et.al.|[2406.15187](http://arxiv.org/abs/2406.15187)|**[link](https://github.com/qinchuanhui/uda-benchmark)**|**\u5c3d\u7ba1\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval-Augmented Generation, RAG\uff09\u6280\u672f\u63d0\u5347\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLMs\uff09\u4e0e\u5916\u90e8\u6570\u636e\u7684\u534f\u4f5c\u80fd\u529b\uff0c\u4f46\u5728\u73b0\u5b9e\u573a\u666f\u4e2d\u4ecd\u9762\u4e34\u8bf8\u591a\u6311\u6218\u3002\u7279\u522b\u662f\u5728\u5b66\u672f\u6587\u732e\u548c\u91d1\u878d\u95ee\u7b54\u7b49\u9886\u57df\uff0c\u6570\u636e\u5e38\u5e38\u4ee5HTML\u6216PDF\u683c\u5f0f\u7684\u5197\u957f\u3001\u7ed3\u6784\u590d\u6742\u7684\u6587\u672c\u548c\u8868\u683c\u5f62\u5f0f\u5b58\u5728\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u540d\u4e3a\u201cUnstructured Document 
Analysis\u201d\uff08UDA\uff09\u7684\u65b0\u57fa\u51c6\uff0c\u5b83\u5305\u542b2,965\u4efd\u771f\u5b9e\u4e16\u754c\u7684\u6587\u6863\u548c29,590\u4e2a\u4e13\u5bb6\u6807\u6ce8\u7684\u95ee\u7b54\u5bf9\u3002\u6211\u4eec\u91cd\u65b0\u5ba1\u89c6\u4e86\u57fa\u4e8eLLM\u548cRAG\u7684\u65b9\u6cd5\u5728\u5904\u7406\u6587\u6863\u5206\u6790\u4efb\u52a1\u4e2d\u7684\u8bbe\u8ba1\u51b3\u7b56\uff0c\u5e76\u5728\u591a\u4e2a\u6587\u6863\u9886\u57df\u548c\u591a\u6837\u5316\u7684\u67e5\u8be2\u7c7b\u578b\u4e0a\u8bc4\u4f30\u7b54\u6848\u8d28\u91cf\u548c\u7b56\u7565\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u63ed\u793a\u4e86\u6709\u8da3\u7684\u7ed3\u679c\uff0c\u5f3a\u8c03\u4e86\u6570\u636e\u89e3\u6790\u548c\u68c0\u7d22\u7684\u91cd\u8981\u6027\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u4e2a\u57fa\u51c6\u80fd\u591f\u4e3a\u73b0\u5b9e\u4e16\u754c\u7684\u6587\u6863\u5206\u6790\u5e94\u7528\u63d0\u4f9b\u542f\u793a\uff0c\u5e76\u4e3a\u5176\u53d1\u5c55\u670d\u52a1\u3002\u57fa\u51c6\u5957\u4ef6\u548c\u4ee3\u7801\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2406.16858": "|**2024-06-24**|**EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees**|Yuhui Li 
et.al.|[2406.16858](http://arxiv.org/abs/2406.16858)|**[link](https://github.com/safeailab/eagle)**|\u5728\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u6210\u672c\u9ad8\u4e14\u8017\u65f6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6295\u673a\u53d6\u5de7\u7684\u62bd\u6837\u65b9\u6cd5\u5982EAGLE\u5df2\u8bc1\u5b9e\u6709\u6548\u3002\u4f20\u7edf\u65b9\u6cd5\u5047\u8bbe\u8349\u7a3f\u6811\u7684\u63a5\u53d7\u7387\u4ec5\u4f9d\u8d56\u4e8e\u4ee4\u724c\u7684\u4f4d\u7f6e\uff0c\u7136\u800c\u6211\u4eec\u53d1\u73b0\u8fd9\u5176\u5b9e\u8fd8\u53d6\u51b3\u4e8e\u4e0a\u4e0b\u6587\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728EAGLE\u7684\u57fa\u7840\u4e0a\u63d0\u51fa\u4e86EAGLE-2\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u4e0a\u4e0b\u6587\u611f\u77e5\u52a8\u6001\u8349\u7a3f\u6811\u6280\u672f\u5230\u8d77\u8349\u5efa\u6a21\u4e2d\u3002\u8fd9\u4e00\u6539\u8fdb\u5229\u7528\u4e86EAGLE\u7684\u8349\u7a3f\u6a21\u578b\u6821\u51c6\u826f\u597d\u7684\u7279\u6027\uff1a\u8349\u7a3f\u6a21\u578b\u7684\u4fe1\u5fc3\u5206\u6570\u80fd\u8fd1\u4f3c\u8868\u793a\u63a5\u53d7\u7387\uff0c\u8bef\u5dee\u8f83\u5c0f\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u7cfb\u5217\u7684LLMs\u548c\u516d\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793aEAGLE-2\u7684\u901f\u5ea6\u63d0\u5347\u6bd4\u7387\u4e3a3.05\u500d\u52304.26\u500d\uff0c\u6bd4EAGLE-1\u5feb20%\u523040%\u3002\u6b64\u5916\uff0cEAGLE-2\u8fd8\u80fd\u4fdd\u6301\u751f\u6210\u6587\u672c\u5206\u5e03\u4e0d\u53d8\uff0c\u56e0\u6b64\u662f\u4e00\u4e2a\u65e0\u635f\u52a0\u901f\u7b97\u6cd5\u3002|\n", "2406.16838": "|**2024-06-24**|**From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models**|Sean Welleck 
et.al.|[2406.16838](http://arxiv.org/abs/2406.16838)|null|One of the most striking findings of modern research is that scaling up compute during the training of large language models (LLMs) leads to better performance. Inference-time optimization methods, however, have received comparatively little attention. This survey focuses on these inference-time approaches. Working from a unified mathematical framework, we examine three areas: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling one token at a time or by constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve generation speed. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika
Marreddy et.al.|[2406.16833](http://arxiv.org/abs/2406.16833)|null|Identifying users' opinions and stances in long conversation threads on various topics is crucial for personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. However, manually annotating data to train such models poses many challenges: it is time-consuming and costly, long conversations can introduce noise, and subtle shifts in a user's opinion can make interpretation difficult. Given the strong performance of large language models (LLMs) on complex natural language processing tasks, this paper uses Mistral Large and GPT-4 to automate the annotation process, with accompanying reasoning, for two key tasks: user stance classification, which labels the stance of each user post in a conversation on a five-point scale, and user dogmatism classification, which labels a user's overall opinion across the whole conversation on a four-point scale. We create the USDC dataset by applying majority voting over zero-shot, one-shot, and few-shot annotations on 764 multi-user Reddit conversations. We then use this dataset to fine-tune and instruction-tune multiple deployable small language models for the five-class stance and four-class dogmatism classification tasks. We make the code and dataset publicly available: [https://anonymous.4open.science/r/USDC-0F7F].|\n",
"2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828](http://arxiv.org/abs/2406.16828)|**[link](https://github.com/castorini/ragnarok)**|Have you tried the new Bing Search or Google AI Overviews? Both reflect how search engines are evolving into systems based on retrieval-augmented generation (RAG). Such systems integrate real-time data into large language models (LLMs) to provide well-informed, attributed, and concise summaries, in contrast to the traditional paradigm of presenting ranked document lists. To drive innovation in RAG system evaluation, we propose a RAG track at TREC 2024. This paper describes the steps we have taken toward that goal: we describe the design of Ragnar\u00f6k, a reusable framework; explain the choice of the MS MARCO V2.1 corpus; release development topics for the track; and standardize the interface definitions to assist end users. We then use Ragnar\u00f6k to provide key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. We also introduce a web interface for interactively comparing the performance of different RAG systems through crowdsourced evaluation. We open-source the Ragnar\u00f6k framework and baselines to establish a unified standard for future RAG systems.|\n",
"2406.16801": "|**2024-06-24**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801](http://arxiv.org/abs/2406.16801)|**[link](https://github.com/qurrent-ai/res-q)**|**The instruction-following ability of large language models (LLMs) has given rise to a class of LLM-based systems capable of complex tasks such as editing large code repositories. Because LLMs are highly sensitive and unpredictable in response to prompt changes, robust evaluation tools are needed to drive the future development of these systems. We propose RES-Q, a natural-language instruction benchmark for $\\textbf{R}$epository $\\textbf{E}$diting $\\textbf{S}$ystems, consisting of 100 repository editing tasks built from 100 real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to gather information and construct an edit that satisfies the instruction. We argue that this form of evaluation improves on traditional benchmarks and gives a more holistic assessment of a model's abilities. We evaluate various state-of-the-art LLMs, such as Claude Sonnet 3.5 and GPT-4o, inside a repository-editing system built with Qurrent OS, our language agent development software. Although their pass@1 scores on HumanEval differ by only 1%, Claude Sonnet 3.5 outperforms GPT-4o by 12% pass@1 on RES-Q, which suggests that RES-Q can differentiate model capability and offer deeper insight as traditional benchmarks approach saturation. We further investigate token efficiency, performance relationships with existing benchmarks, and interesting disparities between closed-source and open-source LLMs. Code and dataset are available at https://github.com/Qurrent-AI/RES-Q.**|\n",
"2406.16797": "|**2024-06-24**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797](http://arxiv.org/abs/2406.16797)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|**Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all of the model's weights, causing destructive interference between tasks. This can lead to forgetting of earlier tasks and makes it difficult to obtain good performance on multiple tasks at the same time. To address this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on a range of challenging tasks, including instruction following, reasoning, math, and summarization. LoTA works by finding and optimizing \u201clottery tickets\u201d (or sparse task vectors), and it outperforms full fine-tuning and low-rank adaptation (LoRA). Beyond better performance, LoTA maintains good performance even after training on other tasks, thereby avoiding catastrophic forgetting. In addition, by extracting and fine-tuning for specific tasks, LoTA supports model merging across highly dissimilar tasks. Overall, LoTA is an effective sparse adaptation strategy that offers a new solution for multi-task adaptation of LLMs, maintaining stable and efficient performance across multiple tasks.**|\n",
"2406.16783": "|**2024-06-24**|**M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models**|Rishabh Maheshwary et.al.|[2406.16783](http://arxiv.org/abs/2406.16783)|null|Instruction fine-tuning (IFT) is critical for aligning large language models (LLMs) to follow instructions. Several effective IFT datasets have been proposed recently, but most focus on high-resource languages such as English. In this work, we propose M2Lingual, a fully synthetic, multilingual, multi-turn instruction fine-tuning dataset guided by an Evol taxonomy, with the goal of improving LLM performance across diverse languages and tasks. M2Lingual contains a total of 182,000 IFT pairs built on diverse seeds, covering 70 languages, 17 NLP tasks, and general instruction-response pairs. LLMs fine-tuned with M2Lingual substantially outperform those fine-tuned with most existing multilingual IFT datasets. More importantly, models trained with M2Lingual show robust cross-lingual ability across a wide variety of evaluation benchmarks, both on our translated multilingual, multi-turn evaluation benchmark and on a range of multilingual tasks. We therefore contribute the two-step Evol taxonomy method and release the M2Lingual dataset: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual.|\n", "2406.16779": "|**2024-06-24**|**It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension**|Sagi Shaier
et.al.|[2406.16779](http://arxiv.org/abs/2406.16779)|null|Natural language processing has advanced significantly over the past decade, yet some practices have become established without proper evaluation. Considering one such case, with a focus on reading comprehension, we first ask: 1) How does the order of the inputs (i.e., question and context) affect model performance? Given recent advances in input emphasis, we further ask: 2) Does emphasizing the question, the context, or both improve performance? Testing 9 large language models on 3 datasets, we find that presenting the context before the question improves model performance, with accuracy gains of up to 31%. Furthermore, emphasizing the context yields better results than emphasizing the question, and emphasizing parts of the input is particularly effective for questions that the model lacks the parametric knowledge to answer. Experimenting with both prompt-based and attention-based emphasis methods, we find that the best approach is surprisingly simple: concatenating a few tokens to the input yields an accuracy improvement of up to 36%, allowing smaller models to outperform their significantly larger counterparts.|\n", "2406.16777": "|**2024-06-24**|**Blending LLMs into
Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024**|Sai Koneru et.al.|[2406.16777](http://arxiv.org/abs/2406.16777)|null|Large language models (LLMs) are being widely explored for tasks such as automatic speech recognition (ASR), machine translation (MT), and even end-to-end speech translation (ST). This paper presents KIT's offline submission in the constrained + LLM track, in which we improve a cascaded speech translation system by incorporating recently proposed techniques. Specifically, we integrate Mistral-7B (mistralai/Mistral-7B-Instruct-v0.1) into the system in two ways. First, we refine the ASR outputs by fine-tuning the LLM on N-best lists generated by our system, which improves transcription accuracy. Second, we refine the MT outputs at the document level, using both ASR and MT predictions to improve translation quality. Integrating the LLM yields an absolute reduction of 0.3% in word error rate for ASR and an improvement of 0.65% in COMET for MT. However, on challenging test sets with overlapping speakers and background noise, integrating the LLM brings little benefit because of poor ASR performance. To recover contextual information that may be missing in such cases, we use ASR with chunked long-form decoding.|\n", "2406.16768":
"|**2024-06-24**|**WARP: On the Benefits of Weight Averaged Rewarded Policies**|Alexandre Ram\u00e9 et.al.|[2406.16768](http://arxiv.org/abs/2406.16768)|null|Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) using a reward model, so that their generations match human preferences. To preserve pre-trained knowledge, RLHF usually relies on KL regularization, but this limits reward optimization. To address this, the paper proposes a novel alignment strategy called Weight Averaged Rewarded Policies (WARP). WARP merges policies in weight space at three distinct stages. First, it uses the exponential moving average of the policy as a dynamic anchor for the KL regularization. Second, it applies spherical interpolation to merge independently fine-tuned policies into a single enhanced model. Third, it linearly interpolates between this merged model and the initialization to recover features from pre-training. The procedure is applied iteratively, with each iteration's final model serving as an advanced initialization for the next, progressively refining the KL-reward trade-off and achieving higher reward at a fixed KL. Experiments with GEMMA policies validate the benefits of WARP: its quality and alignment improve on open-source LLMs.|\n", "2406.17770": "|**2024-06-25**|**MG-LLaVA: Towards Multi-Granularity
Visual Instruction Tuning**|Xiangyu Zhao et.al.|[2406.17770](http://arxiv.org/abs/2406.17770)|**[link](https://github.com/phoenixz810/mg-llava)**|**Multimodal large language models (MLLMs) have made significant progress on visual understanding tasks. However, most are limited to processing low-resolution images, which restricts their effectiveness on perception tasks that require detailed visual information. We present MG-LLaVA, an MLLM that enhances visual processing by incorporating a multi-granularity vision flow comprising low-resolution, high-resolution, and object-level features. We add a high-resolution visual encoder to capture fine-grained details, which are fused with the base visual features through a Conv-Gate fusion network. To further improve object recognition, we incorporate object-level features derived from bounding boxes identified by offline detectors. Instruction-tuned solely on publicly available multimodal data, MG-LLaVA demonstrates exceptional perception skills. We instantiate MG-LLaVA with language encoders ranging from 3.8B to 34B parameters to evaluate its performance comprehensively. Evaluations across multiple benchmarks show that MG-LLaVA performs strongly among existing MLLMs of comparable parameter size, demonstrating its efficacy. The code will be released at https://github.com/PhoenixZ810/MG-LLaVA.**|\n",
"2406.17764": "|**2024-06-25**|**BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning**|Ercong Nie et.al.|[2406.17764](http://arxiv.org/abs/2406.17764)|null|Large language models (LLMs) hold extensive parametric knowledge, but this knowledge is hard to update because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods based on in-context learning (ICL) have shown great promise and allow LLMs to be treated as black boxes. In the past, KE was studied mainly in English-language settings, and the potential for cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark for evaluating cross-lingual KE on 53 diverse languages across three types of KE task. We also propose a gradient-free KE method, Multilingual In-context Knowledge Editing (MIKE), and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research in cross-lingual KE. Our code and data are publicly accessible via the anonymous repository at https://anonymous.4open.science/r/MIKE.|\n",
"2406.17761": "|**2024-06-25**|**CaLMQA: Exploring culturally specific long-form question answering across 23 languages**|Shane Arora et.al.|[2406.17761](http://arxiv.org/abs/2406.17761)|**[link](https://github.com/2015aroras/calmqa)**|**Large language models (LLMs) are commonly used for long-form question answering, which requires them to generate paragraph-length answers to complex questions. While long-form QA has been well studied in English, with many datasets and evaluation metrics, research in other languages remains sparse. To bridge this gap, we introduce CaLMQA, a collection of 2,600 complex questions spanning 23 languages, including under-resourced, rarely studied languages such as Fijian and Kirundi. Our dataset includes both naturally occurring questions collected from community web forums and questions written by native speakers, whom we hired for this purpose. This process yields diverse, complex questions that reflect cultural topics (e.g., traditions, laws, news) and the language usage of native speakers. We conduct automatic evaluation across a suite of open- and closed-source models using our novel metric CaLMScore, which detects incorrect language and token repetitions in answers. We observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. Human evaluation on a subset of models shows that model performance is significantly worse on culturally specific questions than on culturally agnostic questions. These findings highlight the need for further research into LLM multilingual capabilities and non-English long-form QA evaluation.**|\n", "2406.17755": "|**2024-06-25**|**Accelerating Clinical Evidence Synthesis with Large Language Models**|Zifeng Wang
et.al.|[2406.17755](http://arxiv.org/abs/2406.17755)|null|Automatic medical discovery by AI is a dream of many. Toward that goal, we developed TrialMind, a generative AI pipeline for conducting medical systematic reviews that covers study search, screening, and data extraction. It uses large language models (LLMs) to drive each component of the pipeline while incorporating human expert oversight to minimize errors. To evaluate it, we created TrialReviewBench, a custom benchmark dataset of 870 annotated clinical studies drawn from 25 meta-analysis papers across various medical treatments. Our results show that TrialMind significantly improves the literature review process, achieving high recall (0.897 to 1.000) when retrieving relevant studies from over 20 million PubMed studies. In screening, it outperforms traditional language-model embedding-based methods (recall of 0.227 to 0.246 vs. 0.000 to 0.102). It also surpasses direct use of GPT-4 in result extraction, with accuracy ranging from 0.65 to 0.84. We also support clinical evidence synthesis in forest plots, as validated by eight human annotators, who generally preferred TrialMind, with a winning rate of 62.5% to 100% across the reviews involved. These findings suggest that LLM-based clinical evidence synthesis approaches such as TrialMind can enable reliable, high-quality clinical evidence synthesis and improve the efficiency of clinical research.|\n", "2406.17753": "|**2024-06-25**|**Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language**|Amalie Brogaard Pauli
et.al.|[2406.17753](http://arxiv.org/abs/2406.17753)|null|We are exposed to much information that tries to influence us, such as teaser messages, debates, politically framed news, and propaganda. Against this backdrop, this paper studies the ability of large language models (LLMs) to produce persuasive text. In contrast to prior work that focuses on particular domains or types of persuasion, we conduct a general study that measures and benchmarks the degree to which LLMs produce persuasive text, both when explicitly instructed to rewrite text to be more or less persuasive and when instructed only to paraphrase. To this end, we construct a new dataset, \u201cPersuasive-Pairs\u201d, of pairs each consisting of a short text and a rewrite of it by an LLM to amplify or diminish persuasive language. We multi-annotate the pairs on a relative scale for persuasive language. The dataset is a valuable resource in itself, and we also show that it can be used to train a regression model that predicts a persuasion score between text pairs, which makes it possible to score and compare LLMs across domains. Finally, we discuss the effects of different system prompts on LLaMA3. Notably, different \u201cpersona\u201d prompts significantly change the persuasive language in the text even when the model is only instructed to paraphrase. These findings underscore the importance of investigating persuasive language in LLM-generated text.|\n",
"2406.17737": "|**2024-06-25**|**LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users**|Elinor Poole-Dayan et.al.|[2406.17737](http://arxiv.org/abs/2406.17737)|null|While state-of-the-art large language models (LLMs) have shown impressive performance, there has been a surge of research on their undesirable behaviors, such as hallucination and bias. This work investigates how the quality of LLM responses, in terms of information accuracy, truthfulness, and refusals, changes depending on three user traits: English proficiency, education level, and country of origin. We present extensive experiments on three state-of-the-art LLMs and two datasets related to fact-checking, with a focus on truthfulness. Our findings suggest that undesirable behaviors in state-of-the-art LLMs occur disproportionately more for users with lower English proficiency, lower education level, and origins outside the US, rendering these models unreliable sources of information for their most vulnerable users.|\n", "2406.17706": "|**2024-06-25**|**FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model**|Feijie Wu
et.al.|[2406.17706](http://arxiv.org/abs/2406.17706)|**[link](https://github.com/HarliWu/FedBiOT)**|Large language models (LLMs) perform well on many tasks after fine-tuning on suitable domain-specific data. Such data, however, is usually distributed across multiple owners, which raises the question of how to fine-tune LLMs in federated learning (FL). With limited computation and communication capacity, FL clients struggle to fine-tune an LLM effectively. To address this, we introduce FedBiOT, a resource-efficient FL approach to LLM fine-tuning. Specifically, the server generates a compressed LLM and aligns its performance with that of the full model. The clients then fine-tune a lightweight yet important part of the compressed model, the adapter. Notably, because the server has no access to the clients' private data, the data the server uses for alignment follows a different distribution from the data the clients use for fine-tuning. We formulate the problem as a bilevel optimization problem that accounts for this data discrepancy, 
and derive update rules for the server and the clients. Extensive experiments on LLaMA-2 show that the adapter performs very well when integrated back into the global LLM. The results also show that FedBiOT significantly reduces resource consumption compared with existing baselines while maintaining comparable performance.|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692](http://arxiv.org/abs/2406.17692)|**[link](https://github.com/thomlake/investigating-alignment)**|**This study analyzes how the output distribution of large language models (LLMs) changes after alignment. First, it re-examines earlier reports of reduced response diversity after alignment and finds that the drop is largely explained by quality control and information aggregation. Alignment suppresses irrelevant and unhelpful content while shifting the output distribution toward longer answers that cover information from multiple base-LLM responses, essentially aggregating diverse information into a single response. The study finds no evidence that alignment significantly reduces useful information, which raises a question: do aligned models produce information that base models cannot reproduce? 
The second part of the study indicates that they do not: the behavior of aligned models can be reproduced with base models without fine-tuning. With in-context examples and lower-resolution semantic hints, base LLMs can be prompted to give responses similar to aligned ones, with a similarity close to that among the aligned responses themselves. These findings support the \"Superficial Alignment Hypothesis\", namely that current alignment techniques only capture the useful subset of assistant-like base-LLM behavior and do not extend its capabilities. They also show that in-context alignment works surprisingly well as a strategy for imitating aligned LLMs without fine-tuning. The study's code and data are available.**|\n", "2406.17681": "|**2024-06-25**|**VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation**|Kun Qian 
et.al.|[2406.17681](http://arxiv.org/abs/2406.17681)|**[link](https://github.com/qbetterk/VarBench)**|As large language models score ever higher on traditional benchmarks, a growing number of researchers are concerned about benchmark data leaking into pre-training, usually called the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets and keep the test set labels private. Anyone who wants to evaluate a language model must submit its predictions for centralized processing, after which the model's score is published on a leaderboard. This submission process is inefficient and prevents effective error analysis. To address this, we propose making benchmarks dynamic and evaluating language models on the fly. Specifically, we extract variables from each test case and define a value range for each variable. For each evaluation, we sample new values from these ranges to create unique test cases, so that every evaluation is fresh. 
We apply this variable perturbation method to GSM8K (math generation), ARC (multiple choice), CommonsenseQA (commonsense question answering), and TruthfulQA (truthfulness question answering). The experiments show that this approach measures the true capabilities of language models more accurately and effectively mitigates the data contamination problem.|\n", "2406.17675": "|**2024-06-25**|**Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models**|Yuan Li et.al.|[2406.17675](http://arxiv.org/abs/2406.17675)|null|Large language models (LLMs) show remarkable task-solving abilities and increasingly take on roles similar to human assistants. Their broader integration into society has raised interest in whether they have psychological attributes, and whether those attributes are stable and help explain their behavior. Drawing on psychometrics, this paper proposes a framework for studying psychology in LLMs that covers identifying psychological dimensions, building evaluation datasets, and validating results. Within this framework, we develop a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. 
The benchmark comprises thirteen datasets with diverse scenarios and item types. We find that LLMs exhibit a broad range of psychological attributes. We also observe inconsistencies between the traits LLMs self-report and their actual behavior. The paper presents a thorough psychometric assessment of LLMs and offers insights into reliable evaluation, as well as possible applications, across AI and the social sciences.|\n", "2406.18532": "|**2024-06-26**|**Symbolic Learning Enables Self-Evolving Agents**|Wangchunshu Zhou et.al.|[2406.18532](http://arxiv.org/abs/2406.18532)|**[link](https://github.com/aiwaves-cn/agents)**|**The AI community has been exploring a path toward artificial general intelligence (AGI) by building \"language agents\", complex large language model pipelines that combine prompting techniques with tool use. Although language agents perform well on many real-world tasks, a key limitation of current research on them is that it is model-centric, or engineering-centric: improvements to prompts, tools, and pipelines depend on substantial manual design by human experts rather than on automatic learning from data. We argue that the shift from model-centric to data-centric, which lets language agents learn and adapt to their environments autonomously, is key to their progress toward AGI. 
To this end, we propose the \"agent symbolic learning\" framework, a systematic approach that lets language agents optimize themselves in a data-driven way using symbolic optimizers. We treat an agent as a symbolic network whose learnable weights are defined by its prompts, its tools, and the way they are composed. Agent symbolic learning is designed to mimic two fundamental algorithms of connectionist learning, back-propagation and gradient descent, but it works with weights, losses, and gradients expressed in natural language. Proof-of-concept experiments on standard benchmarks and complex real-world tasks show that agent symbolic learning lets language agents update themselves after they are created and deployed, yielding \"self-evolving agents\".**|\n", "2406.18528": "|**2024-06-26**|**PrExMe! 
Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**|Christoph Leiter et.al.|[2406.18528](http://arxiv.org/abs/2406.18528)|**[link](https://github.com/gringham/prexme)**|Large language models (LLMs) have revolutionized natural language processing, and their in-context learning capabilities make them powerful tools for evaluating natural language generation, particularly in low-resource and time-constrained scenarios. This paper presents PrExMe, a large-scale prompt exploration for metrics, in which we evaluate more than 720 prompt templates for open-source LLM-based metrics on machine translation (MT) and summarization tasks, for a total of about 6.6 million evaluations. This extensive comparison (1) benchmarks the performance of recent open-source LLMs as evaluation metrics and (2) explores the stability and variability of different prompting strategies. On the one hand, we find scenarios in which prompts are stable: some LLMs show idiosyncratic preferences and favor grading with textual labels, while others prefer to return numeric scores. On the other hand, the stability of prompts and model rankings can be affected by seemingly innocuous changes. 
For example, changing the requested output format from \"0 to 100\" to \"-1 to +1\" can strongly affect the results of our evaluation. Our study helps explain how different prompting approaches affect LLM-based metrics for MT and summarization evaluation, reveals the most stable prompting patterns, and points out potential limitations.|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521](http://arxiv.org/abs/2406.18521)|**[link](https://github.com/princeton-nlp/CharXiv)**|Chart understanding is crucial when multimodal large language models (MLLMs) are applied to real-world tasks such as analyzing scientific papers or financial reports. Existing datasets, however, tend to focus on simplified, homogeneous charts and template-based questions, which can lead to overly optimistic performance estimates. We find that although open-source models can appear to outperform strong proprietary models on existing benchmarks, simple stress tests, such as changing the charts or the questions, reduce performance by up to 34.5%. We therefore propose CharXiv, a comprehensive evaluation suite of 2,323 natural, complex, and diverse charts from arXiv papers. 
CharXiv includes two types of questions: 1) descriptive questions that examine basic chart elements, and 2) reasoning questions that require synthesizing information across complex visual elements in the chart. All charts and questions are hand-picked, curated, and verified by experts to ensure quality. The results show a substantial gap between the strongest proprietary model (for example GPT-4o, at 47.1% accuracy) and the strongest open-source model (for example InternVL Chat V1.5, at 29.2% accuracy), and all models fall far short of the human level of 80.5%, which exposes the weaknesses of existing MLLMs in chart understanding. We hope CharXiv will advance future research on chart understanding by providing a more realistic and representative measure of progress. Project page and leaderboard: https://charxiv.github.io/|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512](http://arxiv.org/abs/2406.18512)|null|
Explanation is at the core of knowledge sharing and builds on communication principles, social dynamics, and learning theory. We focus on conversational approaches to explanation because the setting is highly adaptive and interactive. Our study draws on the explanatory acts framework, a tool for understanding the strategies that explainers and explainees use in conversation to explain, understand, and engage with each other. We use the dataset that Wachsmuth et al. built from the WIRED YouTube series and that Booshehri et al. annotated with explanatory acts; these annotations give us a basis for understanding how explainers construct their responses in conversation. 
With the rise of generative AI over the past year, we hope to better understand the capabilities of large language models (LLMs) and how they can augment the conversational abilities of expert explainers. To that end, we use the 5-Levels dataset annotated by Booshehri et al. (2023) to evaluate how LLMs perform in explanatory dialogues. To assess how effectively LLMs generate explainer responses, we design three strategies: the human explainer's original response, a standard GPT4 response, and a GPT4 response that incorporates explanation steps. We ask human annotators to evaluate the three strategies.|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505](http://arxiv.org/abs/2406.18505)|null|
Although modern language models already show some reasoning ability and can in theory express any possible distribution over tokens, how they use the world knowledge acquired during pre-training to understand an agent's behavior in the physical world remains underexplored. This study is the first to empirically examine whether large language models (LLMs) can build a mental model of an agent (agent mental modeling) by reasoning about the agent's behavior and its effect on states. This may reveal the potential of using LLMs to interpret the behavior of reinforcement learning (RL) agents, which matters for a key challenge in explainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics, test them on datasets from RL tasks of varying complexity, and report our findings on building agent mental models. The results show that current LLMs cannot yet fully achieve mental modeling of agents through reasoning alone, and that further innovation is needed. This work therefore offers new insight into the capabilities and limitations of modern LLMs.|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501](http://arxiv.org/abs/2406.18501)|null|This paper examines the in-context learning (ICL) ability of large language models (LLMs) and diagnoses whether it is functionally equivalent to gradient-based learning. The authors propose a new diagnostic based on the inverse frequency effect (IFE), the phenomenon that an error-driven learner makes larger updates in response to rare examples than to common ones. In psychology, humans show the IFE in structural priming (the tendency to reuse recently encountered sentence structures), which suggests that priming may involve an error-driven learning mechanism. By simulating structural priming within ICL, the experiments find that LLMs also show the IFE, and that the effect is stronger in larger models. The results therefore support the hypothesis that ICL is in essence a form of gradient-based learning, in which gradients are computed implicitly during the forward pass. The paper concludes that both humans and LLMs use gradient-based, error-driven processing mechanisms.|\n", "2406.18460": "|**2024-06-26**|**Role-Play Zero-Shot Prompting with 
Large Language Models for Open-Domain Human-Machine Conversation**|Ahmed Njifenjou et.al.|[2406.18460](http://arxiv.org/abs/2406.18460)|null|In recent years, a range of methods has been proposed for building large language models (LLMs) capable of open-domain conversation. These models can answer user questions, but they are confined to one-way question answering rather than genuine conversation. Their conversational style is usually adapted by fine-tuning on specific datasets, which is expensive and limited to a few languages. This study explores role-play zero-shot prompting as an efficient and cost-effective solution for open-domain conversation, using capable multilingual LLMs (Beeching et al., 2023) that are trained to follow instructions. We design a prompting system that, when combined with an instruction-following model, here Vicuna (Chiang et al., 2023), produces conversational agents in French that even surpass fine-tuned models on two tasks and perform well in human evaluation.|\n", "2406.18449": "|**2024-06-26**|**Cascading Large Language Models for Salient Event Graph Generation**|Xingwei Tan 
et.al.|[2406.18449](http://arxiv.org/abs/2406.18449)|**[link](https://github.com/xingwei-warwick/callmsae)**|Generating event graphs from text is challenging because of the complexity of the tasks involved in long documents: detecting events, identifying their relations, and reconciling unstructured input with structured graphs. Current studies tend to treat all events as equally important and fail to single out the salient events that are crucial for understanding a narrative. This paper presents CALLMSAE, a CAscading Large Language Model (LLM) framework for SAlient Event graph generation, which leverages the capabilities of LLMs and avoids the need for costly human annotation. We first identify salient events by prompting LLMs to generate summaries. We then develop an iterative code refinement prompting strategy for generating event relation graphs, which removes incorrect relations and recovers missing edges. Context-based graph generation models fine-tuned on the LLM-generated graphs outperform models trained on CAEVO-generated data. Experiments on a human-annotated test set show that our method generates more salient and accurate graphs and outperforms competitive baselines.|\n", "2406.18440": "|**2024-06-26**|**New intelligent 
empowerment for digital transformation**|Peng Yifeng et.al.|[2406.18440](http://arxiv.org/abs/2406.18440)|null|This study proposes a novel evaluation method based on large language models (LLMs) for measuring the digital transformation (DT) of enterprises. By analyzing the annual reports of 4407 companies listed on the New York Stock Exchange and Nasdaq between 2005 and 2022, it builds a comprehensive set of DT indicators. The results show that DT significantly improves a company's financial performance. Different digital technologies, however, affect financial performance to different degrees, and the positive effect of blockchain technology is relatively small. The study also finds that DT promotes growth in financial performance by improving operational efficiency and reducing costs. It gives academia a new tool for evaluating DT and broadens the use of generative AI in economic research.|\n", "2406.18406": "|**2024-06-26**|**IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons**|Dan Shi 
et.al.|[2406.18406](http://arxiv.org/abs/2406.18406)|null|It is widely held that large language models (LLMs) encode a wealth of knowledge after training on massive data. Recent studies, however, reveal knowledge conflicts in LLM generation, in which the parametric knowledge encoded in the model (that is, its internal knowledge store) contradicts new knowledge provided in the context. To address this, we propose IRCAN (Identifying and Reweighting Context-Aware Neurons), a novel framework. IRCAN first identifies the neurons that are crucial for processing the context, using a context-aware attribution score derived from integrated gradients. It then strengthens the identified context-aware neurons by reweighting them, which steers the LLM toward responses that are more consistent with the new knowledge in the context. Extensive experiments across a variety of models and tasks show that IRCAN not only substantially improves the handling of knowledge conflicts but also provides a scalable, plug-and-play solution that integrates seamlessly with existing models.|\n", "2406.19392": "|**2024-06-27**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen 
et.al.|[2406.19392](http://arxiv.org/abs/2406.19392)|**[link](https://github.com/rextime/rextime)**|**We introduce ReXTime, a benchmark designed to rigorously evaluate the temporal reasoning abilities of AI models over events in videos. ReXTime focuses on reasoning across time, that is, human-like understanding when a question and its corresponding answer occur in different video segments. This form of temporal reasoning, which requires a deep understanding of cause-and-effect relationships across video segments, poses a significant challenge for frontier multimodal large language models. To support this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, which greatly reduces the need for laborious manual annotation. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. The evaluation shows that although frontier large language models outperform academic models, they still trail human performance by a significant 14.3% accuracy gap. In addition, our pipeline creates a training dataset of 9,695 machine-generated samples without manual effort, 
and empirical studies indicate that fine-tuning on it improves across-time reasoning.**|\n", "2406.19384": "|**2024-06-27**|**The Remarkable Robustness of LLMs: Stages of Inference?**|Vedang Lad et.al.|[2406.19384](http://arxiv.org/abs/2406.19384)|**[link](https://github.com/vdlad/remarkable-robustness-of-llms)**|**We demonstrate and study the remarkable robustness of large language models by deleting and swapping adjacent layers. We find that, without fine-tuning, these interventions retain 72% to 95% of the original model's prediction accuracy, and that models with more layers are more robust. Based on the results of layer-wise interventions and further experiments, we hypothesize that there are four universal stages of inference across eight different models: a decoding stage that lifts raw token representations into higher-level contextual representations; a feature-engineering stage that iteratively refines task- and entity-specific features; then the second half of the model, which enters a phase transition in which hidden representations align with the vocabulary space through the action of specialized components; and finally the last layer, which sharpens the next-token distribution by eliminating obsolete features that add noise to the prediction.**|\n", "2406.19358": "|**2024-06-27**|**The Model Arena 
for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models**|Xiliang Zhu et.al.|[2406.19358](http://arxiv.org/abs/2406.19358)|null|Sentiment analysis plays a central role in natural language processing (NLP). The rise of multilingual pre-trained models such as XLM-R and mT5 has driven growing interest in cross-lingual sentiment analysis. The recent emergence of large language models (LLMs) has greatly advanced general NLP tasks, but the performance of these models on cross-lingual sentiment analysis has not been fully explored. This empirical study compares the zero-shot and few-shot transfer capabilities of public small multilingual language models (SMLMs) such as XLM-R with English-centric LLMs such as Llama-3 on sentiment analysis in English, Spanish, French, and Chinese. The results show that, among public models, SMLMs perform better in the zero-shot cross-lingual setting. In the few-shot setting, however, public LLMs show stronger adaptability. We also find that the proprietary GPT-3.5 and GPT-4 lead in zero-shot cross-lingual capability but are surpassed by public models in the few-shot scenario.|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for 
Math Multiple-choice Questions**|Nigel Fernandez et.al.|[2406.19356](http://arxiv.org/abs/2406.19356)|**[link](https://github.com/umass-ml4ed/divert)**|High-quality distractors are crucial to both the assessment and the pedagogical value of multiple-choice questions (MCQs), especially math MCQs. However, manually crafting distractors that reflect students' actual knowledge gaps or misconceptions is a demanding task. Although large language models (LLMs) such as GPT-4 help with distractor generation, subjects like math remain challenging. We therefore propose a new approach that aims to understand and generate interpretable representations of the errors behind distractors in math MCQs. We introduce DiVERT (Distractor Generation with Variational Errors Represented as Text), a variational approach built on an open-source LLM with 7 billion parameters. Experiments on a real-world math MCQ dataset (1,434 questions used by hundreds of thousands of students) show that DiVERT outperforms state-of-the-art GPT-4-based approaches at distractor generation. We also conduct an evaluation with math educators, which finds that the error labels generated by DiVERT are close in quality to human-written ones.|\n", "2406.19349": "|**2024-06-27**|**IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language**|Lucky Susanto et.al.|[2406.19349](http://arxiv.org/abs/2406.19349)|null|Online hate speech poses a serious threat to social harmony, particularly in countries such as Indonesia, where the rate of online hate speech has grown tenfold in recent years, so effective detection mechanisms are urgently needed. Progress is hindered, however, by the lack of sufficient labeled data, especially for Indonesian text. The challenge is greater for marginalized minorities such as Shia and LGBTQ groups, because hate speech against them is underreported and existing detection tools understand it poorly. Current datasets also handle subjectivity inadequately, which compounds the problem. To address these issues, we present IndoToxic2024, a comprehensive Indonesian hate speech and toxicity classification dataset. It contains 43,692 entries annotated by 19 diverse individuals and focuses on texts targeting vulnerable groups in the country during election periods, such as the presidential election. We establish baselines for seven binary classification tasks, achieving a macro-F1 score of 0.78 with a fine-tuned BERT model (IndoBERTweet). We also show how incorporating demographic information improves the zero-shot performance of the large language model gpt-3.5-turbo. However, we caution that over-reliance on demographic information can degrade the performance of the fine-tuned model, because it fragments the data.|\n", "2406.19317": "|**2024-06-27**|**Jump Starting Bandits with LLM-Generated Prior Knowledge**|Parand A. Alamdari et.al.|[2406.19317](http://arxiv.org/abs/2406.19317)|**[link](https://github.com/BorealisAI/jump-starting-bandits)**|We present substantial evidence of the benefits of combining large language models (LLMs) with a contextual multi-armed bandit framework. Contextual bandits are widely used in recommendation systems to generate personalized suggestions based on user-specific context. We show that LLMs, pre-trained on large corpora rich in human knowledge and preferences, can simulate human behavior well enough to jump-start contextual multi-armed bandits and reduce online learning regret. We propose an initialization algorithm that prompts LLMs to produce a pre-training dataset of approximate human preferences for the bandit to learn from. This significantly reduces online learning regret and data-gathering costs. We validate the approach with two sets of experiments: one using LLMs as an oracle and a real-world experiment based on data from a conjoint survey experiment.|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292](http://arxiv.org/abs/2406.19292)|**[link](https://github.com/edixiong/artificial-needles)**|Recent studies have shown that large language models (LLMs) struggle with information retrieval and reasoning when processing long-context inputs. To address this, we propose a fine-tuning approach that uses a carefully designed synthetic dataset of numerical key-value retrieval tasks. Our experiments on models such as GPT-3.5 Turbo and Mistral 7B show that fine-tuning on this dataset significantly improves their information retrieval and reasoning in long-context settings. Our analysis of the fine-tuned models shows that skills transfer from the synthetic tasks to real evaluations (for example, a 10.5% improvement at position 10 on 20-document MDQA). We further find that LLMs fine-tuned on our synthetic dataset maintain stable performance on general benchmarks, whereas LLMs fine-tuned on other long-context augmentation datasets may make more errors (for example, on TriviaQA, Mistral 7B fine-tuned on our synthetic data shows no notable performance drop, while other baseline data can cause drops ranging from 2.33% to 6.19%). This study highlights the potential of fine-tuning on synthetic data to improve LLM performance on long-context tasks.|\n", "2406.19283": "|**2024-06-27**|**PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models**|Cathy Mengying Fang 
et.al.|[2406.19283](http://arxiv.org/abs/2406.19283)|null|We present PhysioLLM, an interactive system that uses large language models (LLMs) to combine physiological data from wearables with contextual information and provide personalized health understanding and exploration. Unlike commercial health apps, PhysioLLM offers comprehensive statistical analysis that uncovers correlations and trends in the user's data. Users can ask questions in natural language, receive generated personalized insights, and set actionable goals based on them. We use improving sleep quality as the case study because it can be quantified through physiological data and is crucial to overall health. In a user study with 24 Fitbit smartwatch users, we show that PhysioLLM outperforms both the Fitbit app and a generic LLM chatbot in fostering a deeper, personalized understanding of health data and in supporting progress toward personal health goals.|\n", "2406.19280": "|**2024-06-27**|**HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale**|Junying Chen 
et.al.|[2406.19280](http://arxiv.org/abs/2406.19280)|**[link](https://github.com/freedomintelligence/huatuogpt-vision)**|**With the rapid development of multimodal large language models (MLLMs) such as GPT-4V, significant progress has been made in medical multimodal capabilities. However, these models still face challenges because the quantity and quality of medical image-text data are limited by data privacy concerns and high annotation costs. Earlier work tried to ease these problems by using large-scale de-identified medical image-text pairs from PubMed, but those pairs still suffer from data noise. To address this, we refine medical image-text pairs from PubMed and use GPT-4V in an 'unblinded' mode to denoise and reformat the data, creating the PubMedVision dataset of 1.3 million medical VQA samples. Our validation shows that (1) PubMedVision significantly improves the medical performance of current MLLMs, with marked gains on benchmarks such as the MMMU Health & Medicine track; and (2) manual checks by medical experts and empirical results confirm that our dataset has higher data quality than other data construction methods. Using PubMedVision, we train HuatuoGPT-Vision, a 34-billion-parameter medical MLLM, which performs strongly among open-source MLLMs and shows superior performance in medical multimodal scenarios.**|\n", "2406.19271": "|**2024-06-27**|**AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning**|Praneeth Vadlapati et.al.|[2406.19271](http://arxiv.org/abs/2406.19271)|**[link](https://github.com/Pro-GenAI/AutoPureData)**|**Demand for up-to-date and reliable large language models (LLMs) continues to grow. LLMs are typically trained on a fixed dataset and then deployed, but the training data gradually becomes outdated. Research has looked at automatically updating AI models with web data, but this raises data quality and safety concerns such as bias and spam. Data purity is essential for producing reliable models, and training on impure data can lead to undesirable results. This work proposes a system that collects web data and automatically filters out unwanted content with the help of existing trusted AI models. In the experiment, a small sample of web data was collected and processed, demonstrating the system's effectiveness at purifying data.**|\n", "2406.20098": "|**2024-06-28**|**Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs**|Sukmin Yun et.al.|[2406.20098](http://arxiv.org/abs/2406.20098)|**[link](https://github.com/mbzuai-llm/web2code)**|**Multimodal large language models (MLLMs) perform well on tasks across modalities such as images, video, and audio. However, they are comparatively weak at understanding webpage screenshots and generating the corresponding HTML code. To address this, we propose Web2Code, a benchmark that comprises a new large-scale webpage-to-code dataset for instruction tuning and an evaluation of MLLMs' webpage understanding and HTML code translation abilities. To build the dataset, we use pretrained LLMs to enhance existing webpage-to-code datasets and to generate diverse rendered webpage images. The inputs are webpage images and instructions, and the outputs are the webpages' HTML code. We also include diverse natural-language question-answer pairs about the webpage content to support a more comprehensive understanding of it. To evaluate model performance on these tasks, we develop an evaluation framework that tests MLLMs' abilities in webpage understanding and webpage-to-code generation. Experiments show that our dataset benefits not only the proposed tasks but also general visual-domain performance, whereas previous datasets lead to worse performance. We hope this work will contribute to the development of general-purpose MLLMs suited to web content generation and task automation. Our data and code will be made publicly available.**|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li 
et.al.|[2406.20095](http://arxiv.org/abs/2406.20095)|**[link](https://github.com/lostxine/llara)**|**This paper introduces LLaRA (Large Language and Robotics Assistant), a framework that formulates robot action policy as conversations and improves response quality by training with auxiliary data that complements policy learning. Large language models with visual inputs, that is, vision-language models (VLMs), can process state information as visual-textual prompts and generate optimal policy decisions for the robot. The paper first presents an automated pipeline that generates diverse, high-quality robotics instruction data from existing behavior cloning data. A VLM trained on this data in the tailored conversation-style format can then generate meaningful robot action policy decisions. Experiments across multiple simulated and real-world environments demonstrate the state-of-the-art performance of the LLaRA framework. The code, datasets, and pretrained models are publicly available.**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094](http://arxiv.org/abs/2406.20094)|**[link](https://github.com/tencent-ailab/persona-hub)**|We propose a novel persona-driven data synthesis methodology that leverages the many perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub, a collection of 1 billion diverse personas automatically curated from web data, equivalent to about 13% of the world's population. Acting as distributed carriers of world knowledge, these personas can tap into almost every perspective encapsulated within the LLM, enabling large-scale, diverse synthetic data creation for a wide range of scenarios. By showcasing Persona Hub's use in synthesizing high-quality mathematical and logical reasoning problems, instructions (user prompts), knowledge-rich texts, game NPCs, and tools (functions) at scale, we show that persona-driven data synthesis is versatile, scalable, flexible, and easy to use, and may drive a paradigm shift in synthetic data creation and its practical applications, with a profound impact on LLM research and development.|\n", "2406.20092": "|**2024-06-28**|**LLaVolta: Efficient 
Multi-modal Models via Stage-wise Visual Context Compression**|Jieneng Chen et.al.|[2406.20092](http://arxiv.org/abs/2406.20092)|**[link](https://github.com/beckschen/llavolta)**|**Despite significant progress in compressing text embeddings in large language models (LLMs), the compression of visual tokens in large multimodal models (LMMs) remains largely overlooked. This work studies the redundancy of visual tokens and efficient training in these models. Initial experiments show that eliminating up to 70% of visual tokens at test time with simple average pooling reduces visual question answering accuracy on the GQA benchmark by only 3%, which indicates substantial redundancy in the visual context. To address this, we introduce the Visual Context Compressor, which reduces the number of visual tokens during training to improve efficiency without sacrificing performance. To minimize the information loss caused by compressing visual tokens while keeping training efficient, we develop LLaVolta, a lightweight training scheme. LLaVolta uses stage-wise visual context compression that progresses from heavy to light compression and ends training with no compression at all, so no information is lost at test time. Extensive experiments show that our approach improves the performance of multimodal models on image-language and video-language understanding while significantly cutting training costs. Code is available at https://github.com/Beckschen/LLaVolta.**|\n", "2406.20087": "|**2024-06-28**|**ProgressGym: Alignment with a Millennium of Moral Progress**|Tianyi Qiu 
et.al.|[2406.20087](http://arxiv.org/abs/2406.20087)|**[link](https://github.com/pku-alignment/progressgym)**|\u968f\u7740\u524d\u6cbf\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u77e5\u8bc6\u8bba\u4e2d\u7684\u5f71\u54cd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u5b83\u4eec\u53ef\u80fd\u5f3a\u5316\u793e\u4f1a\u666e\u904d\u7684\u4ef7\u503c\u89c2\uff0c\u8fdb\u800c\u52a0\u5267\u9519\u8bef\u9053\u5fb7\u89c2\u5ff5\u7684\u56fa\u5316\uff0c\u5bfc\u81f4\u5e7f\u6cdb\u7684\u793e\u4f1a\u95ee\u9898\u6301\u7eed\u5b58\u5728\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6f5c\u5728\u98ce\u9669\uff0c\u6211\u4eec\u63d0\u51fa\u8fdb\u6b65\u5bf9\u9f50\u4f5c\u4e3a\u4e00\u79cd\u6280\u672f\u89e3\u51b3\u65b9\u6848\u3002\u8fdb\u6b65\u5bf9\u9f50\u7b97\u6cd5\u65e8\u5728\u5b66\u4e60\u4eba\u7c7b\u9053\u5fb7\u8fdb\u6b65\u7684\u673a\u5236\uff0c\u4ece\u800c\u5f25\u8865\u73b0\u6709\u5bf9\u9f50\u65b9\u6cd5\u5bf9\u5f53\u4ee3\u9053\u5fb7\u76f2\u70b9\u7684\u654f\u611f\u6027\u3002\u4e3a\u4e86\u63a8\u52a8\u8fdb\u6b65\u5bf9\u9f50\u7684\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u4e86ProgressGym\uff0c\u4e00\u4e2a\u5b9e\u9a8c\u6027\u6846\u67b6\uff0c\u5b83\u4ece\u5386\u53f2\u4e2d\u5b66\u4e60\u9053\u5fb7\u8fdb\u6b65\u7684\u89c4\u5f8b\uff0c\u4ee5\u4fc3\u8fdb\u73b0\u5b9e\u4e16\u754c\u9053\u5fb7\u51b3\u7b56\u7684\u672a\u6765\u53d1\u5c55\u3002\u501f\u52a99\u4e2a\u4e16\u7eaa\u7684\u5386\u53f2\u6587\u672c\u548c18\u4e2a\u5386\u53f2LLMs\uff0cProgressGym\u5c06\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u8fdb\u6b65\u5bf9\u9f50\u6311\u6218\u8f6c\u5316\u4e3a\u5177\u4f53\u7684\u57fa\u51c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e09\u4e2a\u6838\u5fc3\u6311\u6218\uff1a\u8ffd\u8e2a\u6f14\u53d8\u7684\u4ef7\u503c\uff08PG-Follow\uff09\u3001\u9884\u6d4b\u9053\u5fb7\u8fdb\u6b65\uff08PG-Predict\uff09\u4ee5\u53ca\u8c03\u8282\u4eba\u4e0eAI\u4ef7\u503c\u53d8\u8fc1\u4e4b\u95f4\u7684\u53cd\u9988\u5faa\u73af\uff08PG-Coevolve\uff09\u3002\u8fd9\u4e9b\u4efb\u52a1\u9700\u8981\u65f6\u95f4\u7ef4\
u5ea6\u7684\u65b9\u6cd5\uff0c\u800c\u4f20\u7edf\u7684\u5bf9\u9f50\u7b56\u7565\u65e0\u6cd5\u80dc\u4efb\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u7ec8\u8eab\u5b66\u4e60\u548c\u5916\u63a8\u7b97\u6cd5\u4f5c\u4e3a\u8fdb\u6b65\u5bf9\u9f50\u7684\u57fa\u672c\u65b9\u6cd5\uff0c\u5e76\u5efa\u7acb\u4e86\u4e00\u4e2a\u5f00\u653e\u7684\u6392\u884c\u699c\uff0c\u9080\u8bf7\u521b\u65b0\u7b97\u6cd5\u548c\u65b0\u6311\u6218\u3002\u8be5\u6846\u67b6\u548c\u6392\u884c\u699c\u5206\u522b\u53ef\u5728https://github.com/PKU-Alignment/ProgressGym \u548c https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard \u83b7\u53d6\u3002|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen et.al.|[2406.20085](http://arxiv.org/abs/2406.20085)|null|\u57fa\u4e8e\u6269\u6563\u6a21\u578b\u7684\u751f\u6210\u65b9\u6cd5\u5df2\u7ecf\u5728\u751f\u6210\u5404\u79cd\u5e03\u5c40\u7684\u9ad8\u8d28\u91cf\u56fe\u50cf\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u4e0b\u6e38\u611f\u77e5\u4efb\u52a1\u5177\u6709\u663e\u8457\u76ca\u5904\u3002\u7136\u800c\uff0c\u4ec5\u4f9d\u8d56\u8bed\u8a00\u63cf\u8ff0\u548c\u4e00\u4e2a\u5408\u9002\u7684\u591a\u5b9e\u4f8b\u8bc4\u4f30\u6307\u6807\u6765\u5b9e\u73b0\u5168\u81ea\u52a8\u5e03\u5c40\u751f\u6210\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Auto 
Cherry-Picker\uff08ACP\uff09\uff0c\u65e8\u5728\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u591a\u6a21\u6001\u8bad\u7ec3\u6837\u672c\uff0c\u4ee5\u589e\u5f3a\u611f\u77e5\u548c\u591a\u6a21\u6001\u8bad\u7ec3\u6548\u679c\u3002\u901a\u8fc7\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u6982\u5ff5\u5217\u8868\uff0c\u6211\u4eec\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u8be6\u7ec6\u7684\u63cf\u8ff0\u5e76\u8bbe\u8ba1\u5408\u7406\u7684\u5e03\u5c40\u3002\u7136\u540e\uff0c\u4f7f\u7528\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u591a\u4e2a\u56fe\u7247\u3002\u63a5\u7740\uff0c\u6211\u4eec\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u8bc4\u4f30\u6307\u6807\u5bf9\u751f\u6210\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u70bc\uff0c\u786e\u4fdd\u8d28\u91cf\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u590d\u5408\u5e03\u5c40\u4e0e\u56fe\u50cf\u8bc4\u5206\uff08Composite Layout and Image Score\uff0cCLIS\uff09\u8fd9\u4e00\u65b0\u6307\u6807\uff0c\u7528\u4e8e\u516c\u6b63\u5730\u8bc4\u4f30\u751f\u6210\u7684\u56fe\u50cf\u3002\u6211\u4eec\u7684\u5408\u6210\u9ad8\u8d28\u793a\u4f8b\u5728\u5b9a\u5236\u521d\u59cb\u6982\u5ff5\u5217\u8868\u65f6\uff0c\u80fd\u591f\u6709\u6548\u63d0\u5347\u5404\u79cd\u573a\u666f\u4e0b\u7684\u6027\u80fd\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u957f\u5c3e\u5206\u5e03\u548c\u4e0d\u5e73\u8861\u6570\u636e\u96c6\u7684\u95ee\u9898\u4e0a\u3002\u4e0b\u6e38\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cACP\u663e\u8457\u63d0\u9ad8\u4e86\u73b0\u6709\u6a21\u578b\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86CLIS\u4e0e\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u7684\u5173\u8054\uff0c\u53d1\u73b0CLIS\u5206\u6570\u8d8a\u9ad8\uff0c\u6027\u80fd\u8d8a\u597d\u3002\u8fd9\u8868\u660e\u8bc4\u4f30\u6307\u6807\u5728\u89c6\u89c9\u611f\u77e5\u548c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4efb\u52a1\u4e2d\u53ef\u80fd\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u6211\u4eec\
u5c06\u63d0\u4f9b\u4ee3\u7801\u3002|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal et.al.|[2406.20079](http://arxiv.org/abs/2406.20079)|**[link](https://github.com/anisha2102/molecular_facts)**|**As automatic fact-checking of content generated by large language models (LLMs) becomes increasingly common as a way to counter false statements, a key research question is the granularity of verification: larger passages of text are hard to check, while more atomic facts (such as propositions) may lack the context needed to interpret them correctly. This paper examines the role of context in these atomic facts. We argue that fully atomic facts are not the best representation, and we propose two criteria for molecular facts: decontextuality, meaning how well they can stand alone, and minimality, meaning how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality and present a baseline method for generating molecular facts automatically, aiming to add the right amount of information while preserving accuracy. We compare this method with several decontextualization strategies and find that molecular facts balance minimality and fact-verification accuracy in ambiguous settings.**|\n", 
"2406.20041": "|**2024-07-01**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford et.al.|[2406.20041](http://arxiv.org/abs/2406.20041)|null|Autonomous agents driven by large language models (LLMs) show enormous potential for automation. Early demonstrations indicate that these agents can solve complex tasks, interact with external systems to augment their knowledge, and trigger actions. In particular, workflows in which multiple agents collaborate on complex tasks demonstrate their ability to operate in less strict and less well-defined environments. Multi-agent approaches therefore have great potential to become the backbone of many industrial applications, from complex knowledge retrieval systems to next-generation robotic process automation. Given the reasoning abilities of current LLMs, handling complex processes requires a multi-step approach that includes designing well-defined, modular task plans. Depending on their complexity, these tasks can be executed by a single agent or by a group of agents. This work focuses on building a flexible agent engineering framework, with particular attention to planning and execution, that can handle complex use cases across domains. The framework provides reliability for industrial applications and presents techniques for scalable, flexible, and collaborative workflows in which multiple autonomous agents work together to solve tasks.|\n", 
"2406.20030": "|**2024-06-28**|**LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models**|Renzhi Wang et.al.|[2406.20030](http://arxiv.org/abs/2406.20030)|null|Large language models (LLMs) need continual updates to keep pace with changing world knowledge, which motivates the task of lifelong model editing. Although many single-edit and batch-edit techniques have been developed in recent years, they are either inapplicable or perform poorly in the lifelong setting. In this paper we propose LEMoE, a Mixture of Experts (MoE) adaptor designed for lifelong model editing. We first analyze the factors that limit conventional MoE adaptors in lifelong editing: catastrophic forgetting, inconsistent routing, and order sensitivity. Building on these insights, we propose a tailored module insertion method, introduce a novel key-value anchor routing that improves routing consistency between training and inference, and adopt a simple yet effective clustering-based edit order planning. Experiments show that our method performs strongly on lifelong editing, surpassing previous model editing techniques while retaining strong performance on batch editing. Our code will be released as open source.|\n", 
"2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang et.al.|[2406.20015](http://arxiv.org/abs/2406.20015)|**[link](https://github.com/toolbehonest/toolbehonest)**|**As tool-augmented large language models (LLMs) are rapidly integrated into real-world applications, the community urgently needs a thorough understanding of hallucination in these models. To this end we introduce ToolBH, a comprehensive diagnostic benchmark. We evaluate along two dimensions, depth and breadth. For depth, we design a multi-level diagnostic process comprising (1) solvability detection, (2) solution planning, and (3) missing-tool analysis. For breadth, we consider three scenarios defined by the characteristics of the toolset: missing necessary tools, potential tools, and tools with limited functionality. We build seven tasks and collect 700 evaluation samples through multiple rounds of manual annotation. The advanced models Gemini-1.5-Pro and GPT-4o achieve total scores of 45.3 and 37.0 out of 100 on this benchmark. In tool-augmented LLM scenarios, more model parameters do not guarantee better performance; training data and response strategy matter as well. Our diagnostic analysis shows that the main source of model errors lies in judging task solvability. Open-source models lose performance when they produce verbose replies, whereas proprietary models do better at long-chain reasoning.**|\n", 
"2407.02490": "|**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**The computational cost of large language model (LLM) inference, which grows with prompt length, remains a barrier to wide deployment. Because attention computation is quadratic, an 8B-parameter LLM takes 30 minutes to process a prompt of 1M tokens (the pre-filling stage) on a single A100 GPU. Existing methods for speeding up pre-filling often fail to stay both efficient and accurate on long-context LLMs. To address this, we introduce MInference (Million-tokens Inference), a sparse computation method designed to accelerate the pre-filling stage of long-sequence processing. We identify three distinctive patterns in attention matrices, A-shape, Vertical-Slash, and Block-Sparse, that can be exploited for efficient sparse computation on GPUs. We determine the best pattern for each attention head offline and build sparse indices dynamically during inference. With optimized GPU kernels, we perform sparse attention computation according to the assigned pattern, substantially reducing pre-filling latency for long-context LLMs. The method can be applied directly to existing LLMs without changing the pre-training setup or additional fine-tuning. Experiments on a range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and on models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, show that MInference reduces pre-filling inference latency by up to 10x on an A100 while maintaining accuracy. Our code is available at https://aka.ms/MInference.**|\n", 
"2407.02486": "|**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**This paper introduces Neurocache, a method that extends the effective context size of large language models (LLMs) by storing their past states in an external vector cache. Like recent vector retrieval approaches, Neurocache uses an efficient k-nearest-neighbor (kNN) algorithm to retrieve relevant past states and incorporate them into the attention process. Neurocache improves on previous methods in three ways: (1) it stores compressed states, which reduces cache size; (2) it performs a single retrieval operation per token, which increases inference speed; and (3) it extends the retrieval window to neighboring states, which improves accuracy on both language modeling and downstream tasks. Experiments show that Neurocache is effective both for models trained from scratch and for pre-trained models, such as Llama2-7B and Mistral-7B, that are augmented with the cache. We also compare Neurocache with text retrieval methods and show its advantages on single-document question answering and few-shot learning tasks. The source code is available at https://github.com/alisafaya/neurocache.**|\n", 
"2407.02485": "|**2024-07-02**|**RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs**|Yue Yu et.al.|[2407.02485](http://arxiv.org/abs/2407.02485)|null|This work proposes RankRAG, a novel instruction fine-tuning framework that tunes a single large language model for both context ranking and answer generation in retrieval-augmented generation (RAG). By adding a small amount of ranking data to the training process, the instruction-tuned model works surprisingly well and outperforms existing expert ranking models, including ones fine-tuned exclusively on large amounts of ranking data. In experiments we compare against several strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art RAG performance. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and the GPT-4 models on nine knowledge-intensive benchmarks. It also performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, which demonstrates its strong ability to generalize to new domains.|\n", 
"2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|Although multi-modal large language models (MLLMs) have been successful, their generality remains limited and in some cases they fall short of specialized models. Recently, LLM-based agents have been developed to address this by selecting appropriate specialized models based on user input. Such advances, however, have not yet been widely applied in the medical domain. To bridge this gap, this paper introduces the first agent designed specifically for the medical field, named \\textbf{M}ulti-modal \\textbf{Med}ical \\textbf{Agent} (MMedAgent). We build an instruction-tuning dataset comprising six medical tools that solve seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms open-source methods, and even the closed-source model GPT-4o, across a variety of medical tasks, and that it is efficient at adopting and integrating new medical tools.|\n", 
"2407.02477": "|**2024-07-02**|**Understanding Alignment in Multimodal LLMs: A Comprehensive Study**|Elmira Amirloo et.al.|[2407.02477](http://arxiv.org/abs/2407.02477)|null|Preference alignment has become an important factor in improving the performance of large language models (LLMs), yet its use in multimodal large language models (MLLMs) remains comparatively underexplored. These models also face problems in image understanding tasks, such as incorrect statements and content that is inconsistent with the image (hallucination). The goal of preference alignment in MLLMs is to make model responses adhere more closely to the image content. Recent work has introduced preference datasets for MLLMs and explored alignment methods such as Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO). However, because datasets, base model types, and alignment strategies differ, it remains unclear which of them contributes most to the performance gains. This paper analyzes each aspect of MLLM preference alignment independently. We divide alignment algorithms into offline methods (such as DPO) and online methods (such as online-DPO) and show that combining the two can improve model performance in some cases. We also review a range of published multimodal preference datasets and discuss how the details of their construction affect model performance. Based on these findings, we propose Bias-Driven Hallucination Sampling (BDHS), a new way of generating multimodal preference data that requires neither additional annotation nor external models and that achieves performance competitive with previously published alignment work across several benchmarks.|\n", 
"2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo 
et.al.|[2407.02473](http://arxiv.org/abs/2407.02473)|null|How can we build robots that carry out semantic navigation tasks in the open world, such as searching for a target object in a novel scene? Foundation models have the rich knowledge and generalization ability these tasks require, but a suitable scene representation is needed to integrate them into a complete robot system. We address this with Open Scene Graphs (OSGs), a topo-semantic representation that retains and organizes open-set scene information and whose structure can be adapted to different environment types. We integrate foundation models and OSGs into OpenSearch, a system designed for open-world object-goal navigation that understands natural language instructions and generalizes zero-shot across diverse environments to find previously unseen objects. Our OSGs enhance reasoning with large language models (LLMs), enabling OpenSearch to perform strongly on object-goal navigation and to outperform existing LLM approaches. Through simulation and real-world experiments, we validate OpenSearch's generalization across diverse environments, robots, and novel instructions.|\n", "2407.02464": "|**2024-07-02**|**Reliable 
Confidence Intervals for Information Retrieval Evaluation Using Generative A.I**|Harrie Oosterhuis et.al.|[2407.02464](http://arxiv.org/abs/2407.02464)|null|Traditional evaluation of information retrieval (IR) systems is generally costly because it requires relevance annotation from human experts. Recent advances in generative artificial intelligence, in particular large language models (LLMs), make it possible to generate relevance annotations at scale with relatively low computational cost. This could reduce the cost traditionally associated with IR evaluation and make it applicable to many low-resource applications. However, generated annotations are not free of errors, and using them directly for evaluation can produce unreliable results. In this work we propose two methods, based on prediction-powered inference and conformal risk control, that use computer-generated relevance annotations to place reliable confidence intervals (CIs) around IR evaluation metrics. Our methods require a small number of reliable annotations, from which they statistically analyze the errors in the generated annotations, so that CIs can be placed on evaluation metrics with a solid theoretical foundation. Unlike existing approaches, our conformal risk control method is designed specifically for ranking evaluation and can adapt its CIs per query and per document. Experimental results show that our CIs accurately capture both the variance and the bias in evaluation based on LLM annotations, better than traditional bootstrap estimates. We hope these contributions bring reliable evaluation to the many IR applications where it was traditionally infeasible.|\n", 
"2407.02411": "|**2024-07-03**|**Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs**|Jinmin Li et.al.|[2407.02411](http://arxiv.org/abs/2407.02411)|null|The rise of video-based large language models (LLMs) has significantly advanced video understanding, but it has also raised data protection concerns because videos can now be annotated more easily without authorization. This paper introduces Video Watermarking, a novel technique for protecting videos from unauthorized annotation by video-based LLMs, particularly with respect to video content and descriptions. By embedding imperceptible watermarks into key frames with multi-modal flow-based losses, our method preserves the viewing experience while preventing misuse of the video. Extensive experiments show that Video Watermarking significantly reduces the comprehensibility of videos across a range of video-based LLMs, demonstrating both stealth and robustness. Overall, our method offers a way to secure video content and protect its integrity and confidentiality in the face of evolving video-based LLM technologies.|\n", 
"2407.02408": "|**2024-07-02**|**CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models**|Song Wang 
et.al.|[2407.02408](http://arxiv.org/abs/2407.02408)|null|As large language models (LLMs) are increasingly deployed for a wide range of natural language processing tasks, concerns about the potential negative societal impact of the content they generate have grown as well. Researchers have proposed a variety of datasets for evaluating bias in LLMs. However, existing bias evaluation efforts often focus on a single type of bias and use inconsistent metrics, which makes comparisons across datasets and LLMs difficult. To address this, we collect a variety of datasets for LLM bias evaluation and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across social groups and tasks. CEB is built on our newly proposed compositional taxonomy, which characterizes each dataset along three dimensions: bias type, social group, and task. By combining the three dimensions, we develop a comprehensive evaluation strategy for bias in LLMs. Our experiments show that the level of bias varies across these dimensions, which provides guidance for developing mitigation methods targeted at specific biases.|\n", 
"2407.02402": "|**2024-07-02**|**Assessing the Code Clone Detection Capability of Large Language Models**|Zixian Zhang et.al.|[2407.02402](http://arxiv.org/abs/2407.02402)|null|This study assesses the performance of two advanced large language models (LLMs), GPT-3.5 and GPT-4, on the task of code clone detection. The models are tested on two datasets: BigCloneBench (human-made) and GPTCloneBench (LLM-generated). GPT-4 clearly outperforms GPT-3.5 across all clone types. The results show a correlation between the models' accuracy at identifying code clones and code similarity, with both models being least effective on the most complex Type-4 clones. In addition, the GPT models detect clones in LLM-generated code better than in human-written code, although overall accuracy is still unremarkable. These findings underline the need to further improve the clone detection capabilities of LLMs, particularly for clones in their own generated code, which may become a problem as software engineers increasingly use LLM-based code generation and refactoring tools.|\n", "2407.03310": "|**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou 
et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|Length generalization is the ability to extrapolate from short training sequences to long test sequences, and it remains a challenge for current large language models. Prior work has proposed architecture or data format changes that achieve length generalization, but these proposals are typically limited to specific tasks. Building on earlier scratchpad and Chain-of-Thought (CoT) techniques, we propose Turing Programs, a novel CoT strategy that decomposes an algorithmic task into steps resembling the computation of a Turing machine. The framework is both universal and simple, requiring only that text be copied from the context with small modifications. We show that Turing Programs give robust length generalization on algorithmic tasks such as addition, multiplication, and in-context SGD. We then show that transformers achieve length generalization on random Turing Programs, suggesting that length generalization is possible for any algorithmic task. Finally, we prove theoretically that transformers can implement Turing Programs by constructing a simple RASP (Weiss et al.) program that simulates an arbitrary Turing machine.|\n", 
"2407.03286": "|**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|Semi-structured data formats such as JSON are widely used because of their flexibility in storing data. However, JSON data often lacks a schema of the kind that relational databases provide for their tables. As a result, many tools have emerged for discovering schemas from a collection of data. These tools are useful, but existing approaches focus on the syntax of documents and ignore semantic information. In this work we explore how to automatically add meaningful semantic information to discovered schemas, similar to the information that human schema authors include. We use large language models and a corpus of manually authored JSON Schema documents to generate natural language descriptions of schema elements and meaningful names for reusable definitions, and to identify which discovered properties are most useful and which can be considered noise. Our approach performs well on text generation metrics that have previously been shown to correlate strongly with human judgment.|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji 
et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|**[link](https://github.com/ziweiji/Internal_States_Reveal_Hallucination)**|Hallucination in large language models (LLMs) severely limits their reliability and trustworthiness. Humans have a self-awareness process that lets them recognize what they do not know when faced with a query. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before generating a response. We broadly analyze the internal mechanisms of LLMs in terms of training data sources and across 15 diverse natural language generation (NLG) tasks spanning more than 700 datasets. Our empirical analysis reveals two key findings: (1) LLM internal states indicate whether the model has seen the query in its training data; (2) LLM internal states indicate whether the model is likely to hallucinate on the query. Our study examines the specific neurons, activation layers, and tokens that play a key role in the LLM's perception of uncertainty and hallucination risk. Using a probing estimator, we leverage the LLM's self-assessment ability and achieve an average hallucination estimation accuracy of 84.32% at run time.|\n",
"2407.03227": "|**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with
AST-based Ranking and Schema Pruning**|Zhili Shen et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|We study Text-to-SQL semantic parsing from the perspective of large language models. Motivated by the size of commercial database schemas and the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. We also investigate how far a semantic parser running in parallel can be used to generate approximate versions of the expected SQL queries to support our retrieval. We take this approach to the extreme by adapting a model with fewer than 500 million parameters to act as a highly efficient approximator, and we give it the ability to process schemas in parallel. We apply our approach to monolingual and cross-lingual semantic parsing benchmarks and improve over state-of-the-art baselines. Comprehensive experiments show the contribution of each module in this retrieval-augmented generation setting and point to interesting directions for future work.|\n",
"2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|
Quantization techniques are widely used to improve the inference speed and deployment efficiency of large language models (LLMs). Although many studies examine how quantization affects LLMs on English tasks, none have examined the multilingual setting. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at different scales. Using automatic benchmarks, LLM-as-a-judge methods, and human evaluation, we find that (1) quantization hurts human evaluation scores, and automatic metrics severely underestimate the damage: an average drop of 1.7% on automatic tasks corresponds to a 16.0% drop in human evaluation for Japanese; (2) languages are affected unevenly, with non-Latin-script languages hit hardest; and (3) challenging tasks such as mathematical reasoning degrade the most. Because the ability to serve low-compute models is critical for the global adoption of NLP technology, our results argue that multilingual performance should be a key criterion when evaluating efficient models.|\n",
"2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4
Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|Verifying mathematical proofs in computer-verifiable formal languages such as Lean matters for mathematical reasoning, and one approach is to have large language models (LLMs) generate formal proofs from natural language (NL) proofs. However, because aligned NL and formal language (FL) proof data is scarce, modern LLMs perform poorly at generating complete proofs. To address this, this paper proposes TheoremLlama, an end-to-end framework that trains a general-purpose LLM to become a Lean4 expert. The framework comprises a method for generating NL-FL aligned datasets, a training strategy for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. The key innovation is an NL-FL bootstrapping method that integrates NL proofs into Lean4 code, so that the LLM's natural-language reasoning ability can be used for formal reasoning. With this dataset-generation method we provide Open Bootstrapped Theorems (OBT), an NL-FL aligned and bootstrapped dataset. TheoremLlama achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and MiniF2F-Test datasets respectively, surpassing the GPT-4 baselines of 22.95% and 25.41%. We have released the model checkpoints and the generated dataset, and will open-source all code soon.|\n",
"2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|This study proposes a novel method, Divergent CoT (DCoT), which further improves performance by requiring the model to compare multiple reasoning chains within a single inference step. The authors find that instruction tuning on DCoT datasets improves even smaller, and therefore more accessible, large language models. Through extensive experiments covering tasks that require different types of reasoning, they show that fine-tuning on DCoT consistently outperforms the plain CoT baseline across model scales (1.3B to 70B parameters). Empirical and manual evaluation indicate that these gains come from the model generating several divergent reasoning chains in a single inference step, which suggests that language models are capable of self-correction. Code and data are available at https://github.com/UKPLab/arxiv2024-divergent-cot.|\n",
"2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|Large language models (LLMs), with their strong reasoning ability, generalization, and fluency across domains, show great potential for improving speech-related tasks. This paper focuses on integrating decoder-only LLMs into speech-to-text translation (S2TT). We propose an architecture in which the LLM directly consumes the encoded speech representation and generates the text translation. We also study the effect of different parameter-efficient fine-tuning techniques and task formulations. Among models trained without proprietary data, our model achieves state-of-the-art performance on the CoVoST 2 and FLEURS benchmarks. We further conduct an in-depth analysis that validates our design choices and offers insights into combining LLMs with S2TT.|\n",
"2407.03160": "|**2024-07-03**|**SOS!
Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are increasingly popular with both the general public and industry because they can be customized, fine-tuned, and used freely. However, some open-source LLMs require approval before use, which has led third parties to publish easily accessible versions, as well as fine-tuned or quantized variants that lower computational requirements. These convenient versions are attractive to users, but they also increase the risk of training-time attacks that threaten the integrity and security of LLMs. In this work we present SOS, a new training-time attack that is designed to have low computational cost and requires neither clean data nor changes to the model weights, so the model's utility is preserved. SOS covers security issues in several scenarios, including backdoor attacks, jailbreak attacks, and prompt stealing attacks. Experiments show that the attack is effective on all evaluated targets. We also present the other side of the SOS technique, the copyright token: a novel method that lets users mark their copyrighted content and prevent models from using it.|\n",
"2407.03157": "|**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|In this work, we investigate a common scenario in code generation: a developer edits existing code in real time and asks a large language model (LLM) to re-predict the next token or next line on the fly. The straightforward approach is to have the LLM re-encode the entire key-value (KV) cache to give an accurate prediction, but this is computationally expensive, especially when the sequence is long. Encoding only the edited subsequence and merging it into the original KV cache runs into a temporal confusion problem and degrades performance substantially. To address this, we propose Positional Integrity Encoding (PIE). Building on rotary positional encoding, PIE first removes the rotary matrices in the key cache that introduce temporal confusion and then reapplies the correct rotary matrices. This restores the correct positional relationships between tokens and requires only a single round of matrix multiplication. We run extensive experiments on the RepoBench-C-8k dataset with DeepSeek-Coder models of 1.3B, 6.7B, and 33B parameters, covering three real-world coding tasks: code insertion, code deletion, and multi-place code editing. Compared with the standard full-recomputation approach, PIE reduces computational overhead by more than 85% across all model sizes and tasks while closely approximating its performance.|\n",
"2407.04694": "|**2024-07-05**|**Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs**|Rudolf Laine et.al.|[2407.04694](http://arxiv.org/abs/2407.04694)|**[link](https://github.com/lrudl/sad)**|
AI assistants such as ChatGPT are trained to respond to users by saying 'I am a large language model'. This raises a question: do these models actually know that they are LLMs, and do they act reliably on that knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to this as a model's situational awareness. To quantify situational awareness in large language models (LLMs), we design a set of behavioral tests based on question answering and instruction following, which together form the Situational Awareness Dataset (SAD). The benchmark comprises 7 task categories and more than 13,000 questions, and it tests a range of abilities, such as recognizing self-generated text, predicting one's own behavior, telling whether a prompt comes from internal evaluation or real-world deployment, and following instructions that depend on self-knowledge. We evaluate 16 LLMs on SAD, including both base (pretrained) and chat models. All models perform better than chance, but even the highest-scoring model (Claude 3 Opus) remains far from human level on some tasks. We also find that performance on SAD is only partly predicted by general-knowledge metrics such as MMLU. Chat models, which are fine-tuned to act as AI assistants, outperform their base models on SAD but not on general-knowledge tasks. SAD aims to advance the scientific understanding of situational awareness in LLMs by breaking it down into quantifiable abilities. Situational awareness matters because it strengthens a model's capacity for autonomous planning and action, which benefits automation but also introduces new risks related to AI safety and control. Code and the latest results are publicly available.|\n",
"2407.04693": "|**2024-07-05**|**ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models**|Yuzhe Gu et.al.|[2407.04693](http://arxiv.org/abs/2407.04693)|**[link](https://github.com/open-compass/anah)**|
Large language models (LLMs) hallucinate in long-form question-answering tasks across many domains and applications. Current datasets for hallucination detection and mitigation are limited in domain coverage and size, and they are hard to scale because labor costs are high and existing hallucination annotators are not reliable enough. To enable scalable oversight of LLM hallucinations, this paper proposes an iterative self-training framework. Based on the Expectation Maximization (EM) algorithm, each iteration first applies a hallucination annotation pipeline to label an enlarged dataset and then trains a more accurate hallucination annotator on that dataset. The new annotator is then used in the annotation pipeline for the next iteration. Extensive experiments show that the final hallucination annotator, with only 7B parameters, surpasses GPT-4 and sets new state-of-the-art hallucination detection results on HaluEval and HalluQA with zero-shot inference. The annotator can evaluate the hallucination levels of different LLMs on large-scale datasets, and it also helps mitigate hallucination in generated text, raising the NLI metric from 25% to 37%.|\n",
"2407.04681": "|**2024-07-05**|**Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge**|Yuanze Lin et.al.|[2407.04681](http://arxiv.org/abs/2407.04681)|null|In recent years, multimodal large language models (MLLMs) trained on large, high-quality image-text datasets have made significant progress in understanding images as a whole. However, fine-grained or spatially dense information such as masks is inherently hard to convey in text form, which limits the models' ability to answer questions that depend on detailed visual elements. Inspired by retrieval-augmented generation (RAG), this paper proposes a new visual prompting approach that integrates fine-grained external knowledge from specialized vision models, such as instance segmentation and OCR models, into MLLMs. This is a promising but underexplored direction for improving MLLM performance. Our approach differs from concurrent work, which converts external knowledge into additional text prompts and forces the model to learn the correspondence between visual content and text coordinates indirectly. Instead, we propose embedding the fine-grained knowledge directly into a spatial embedding map that serves as a visual prompt. This design can be integrated easily into various MLLMs, such as LLaVA and Mipha, and significantly improves their visual understanding. Through rigorous experiments on nine benchmarks, we show that our method improves overall MLLM performance and strengthens fine-grained, context-aware capabilities.|\n",
"2407.04675": "|**2024-07-05**|**Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition**|Ye Bai et.al.|[2407.04675](http://arxiv.org/abs/2407.04675)|null|Modern automatic speech recognition (ASR) models must accurately transcribe diverse speech signals from different domains, languages, and accents, while taking specific contextual information into account to meet the needs of various application scenarios. Traditional end-to-end models augmented with an extra language model perform well, but mainly in data-matched scenarios, and they are gradually approaching a bottleneck. This paper introduces Seed-ASR, a new speech recognition model based on large language models (LLMs). It is built on the audio-conditioned LLM (AcLLM) framework and exploits the capabilities of LLMs by feeding continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and by eliciting context-aware capabilities in the LLM, Seed-ASR significantly outperforms end-to-end models on comprehensive evaluation sets covering multiple domains, dialects, and languages. In addition, Seed-ASR can be deployed to support specific needs in various scenarios without an extra language model. Compared with recently released large ASR models, Seed-ASR reduces the word error rate (or character error rate, for Chinese) by 10%-40% on Chinese and English public test sets, further demonstrating its strong performance.|\n",
"2407.04656": "|**2024-07-05**|**Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement**|Yongji Wu et.al.|[2407.04656](http://arxiv.org/abs/2407.04656)|null|As large language models (LLMs) keep growing, sparsely activated Mixture-of-Experts (MoE) architectures are increasingly adopted because their computational cost scales sub-linearly. However, frequent training failures remain a major challenge: a single failure can leave all GPUs idle until the problem is resolved, and a large amount of training progress may be lost because training has to restart from a checkpoint. Existing solutions for efficient fault-tolerant training either lack elasticity or build resiliency into pipeline parallelism, which does not apply to MoE models because they use expert parallelism. We present Lazarus, a system for resilient and elastic training of MoE models. Lazarus adaptively allocates expert replicas to handle the inherent imbalance in expert workload and thereby speed up training, and it uses a provably optimal expert placement algorithm to maximize the probability of recovery after a failure. With adaptive expert placement and a flexible token dispatcher, Lazarus can make full use of all available nodes after failures and leaves no GPU idle. Our evaluation shows that Lazarus outperforms existing MoE training systems by up to 5.7x under frequent node failures and by 3.4x on a real spot instance trace.|\n",
"2407.04629": "|**2024-07-05**|**Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework**|Reza Averly et.al.|[2407.04629](http://arxiv.org/abs/2407.04629)|null|This paper addresses clinical named entity recognition (Clinical NER), the task of extracting important entities from clinical narratives. Large language models (LLMs) have recently performed well on this task. Most prior work focuses on proprietary LLMs; this paper instead examines how open LLMs trained specifically for named entity recognition perform on clinical NER. The authors propose a novel framework, Entity Decomposition with Filtering (EDF), which decomposes the entity recognition task into the retrieval of several sub-entity types and adds a filtering mechanism to remove incorrect entities. Experiments show that the framework is effective across all metrics, models, datasets, and entity types. The analysis shows that entity decomposition substantially improves recognition of entities that were previously missed. The paper also provides a comprehensive evaluation of the framework and an in-depth error analysis to guide future work.|\n",
"2407.04622": "|**2024-07-05**|**On scalable oversight with weak LLMs judging strong LLMs**|Zachary Kenton
et.al.|[2407.04622](http://arxiv.org/abs/2407.04622)|null|This paper studies scalable oversight protocols, whose goal is to let humans effectively supervise superhuman AI. It focuses on three formats: debate, consultancy, and direct question answering, using large language models (LLMs) as both the AI agents and the judges, with the judge models assumed to be weaker. The experiments cover a diverse range of tasks, extending earlier work that studied only a single extractive QA task with information asymmetry to also include mathematics, coding, logic, and multimodal reasoning. Across all tasks, debate outperforms consultancy when the consultant is randomly assigned to argue for the correct or the incorrect answer. Debate outperforms direct question answering on extractive QA tasks with information asymmetry, but results are mixed on the other tasks, which have no information asymmetry. When the AI is allowed to choose which answer to argue for instead of having it assigned, judges are convinced by the wrong answer less often in debate. Stronger debater models also raise judge accuracy, though the gain is somewhat smaller than in earlier studies.|\n", "2407.04581": "|**2024-07-05**|**Leveraging Large 
Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions**|Shumaila Javaid et.al.|[2407.04581](http://arxiv.org/abs/2407.04581)|null|This paper explores the transformative potential of integrating large language models (LLMs) into Integrated Satellite, Aerial, and Terrestrial Networks (ISATNs), using advanced artificial intelligence (AI) and machine learning (ML) techniques to optimize connectivity across these networks. It first outlines the current ISATN architecture and highlights the role LLMs can play in improving data flow, signal processing, and network management, advancing 5G/6G communication technologies through advanced predictive algorithms and real-time decision-making. It then analyzes the ISATN components in depth and examines how LLMs can be used effectively to address bottlenecks in traditional data transmission and processing. 
The paper focuses on network management challenges in ISATNs, including resource allocation strategies, traffic routing, and network security that ensures seamless connectivity and optimal performance under changing conditions. We also discuss the technical challenges of integrating LLMs into ISATNs, such as data integration, scalability, latency in decision-making, and the design of robust, fault-tolerant systems. Finally, the study identifies key directions for future research on fully exploiting LLMs to improve network reliability, optimize performance, and achieve a truly interconnected and intelligent global network.|\n", "2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|With the rapid development of large language models (LLMs), vector retrieval algorithms have become essential for semantic queries that must satisfy both similarity and diversity requirements. Maximal Marginal 
Relevance (MMR) is widely used in retrieval scenarios with these two requirements, but variations in its parameter \u03bb cause the results to fluctuate and obscure the optimization path in the vector space. There is also no solid theoretical analysis of the similarity and diversity constraints in retrieval. This paper proposes a new approach that characterizes both constraints through the relationship between the query vector and the sum vector. This relationship ensures similarity while requiring the individual vectors that make up the sum vector to align with the query vector in a dispersed way, which satisfies the diversity requirement. We also formulate a new combinatorial optimization problem: select $k$ vectors from a candidate set so that their sum vector matches the query vector as closely as possible. We prove that this problem is NP-complete, which reveals the profound difficulty of pursuing similarity and diversity simultaneously in vector retrieval and lays a theoretical foundation for future research. In addition, we design a heuristic algorithm, Vectors Retrieval with Similarity and 
Diversity (VRSD), which has a clear optimization objective, needs no preset parameters, and has lower time complexity than MMR. Empirical validation shows that VRSD significantly outperforms MMR on a variety of datasets.|\n", "2407.04541": "|**2024-07-05**|**PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts**|Ana-Cristina Rogoz et.al.|[2407.04541](http://arxiv.org/abs/2407.04541)|**[link](https://github.com/ana-rogoz/poprero)**|**We introduce PoPreRo, the first dataset collected for popularity prediction of Romanian Reddit posts. PoPreRo contains a diverse sample of posts from five distinct Romanian subreddits, 28,107 data samples in total. Along with the dataset, we provide a set of competitive models to serve as baselines for future research. Notably, the top-scoring model reaches an accuracy of 61.35% and a macro F1 score of 60.60% on the test set, which shows that popularity prediction on PoPreRo is very challenging. Further experiments with few-shot prompting of the Falcon-7B large language model point to the same conclusion. We therefore believe PoPreRo is a valuable resource for evaluating models that predict the popularity of Romanian social media posts. 
Our dataset is publicly available at https://github.com/ana-rogoz/PoPreRo.**|\n", "2407.06189": "|**2024-07-08**|**Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision**|Orr Zohar et.al.|[2407.06189](http://arxiv.org/abs/2407.06189)|**[link](https://github.com/orrzohar/Video-STaR)**|**The performance of large vision-language models (LVLMs) depends heavily on the size and quality of their training data. Existing video instruction tuning datasets lack diversity because they are derived mainly by prompting large language models with video captions to produce question-answer pairs, so they are mostly descriptive. Meanwhile, many video datasets with rich labels and supervision already exist, but integrating them into LVLMs is non-trivial. We therefore propose Video Self-Training with augmented Reasoning (Video-STaR), the first video self-training approach. Video-STaR allows any labeled video dataset to be used for video instruction tuning. In Video-STaR, an LVLM cycles between instruction generation and fine-tuning, which we find (I) improves general video understanding and (II) adapts LVLMs to new downstream tasks using existing supervision. 
Specifically, the LVLM is prompted to propose an answer, and only the answers that contain the original video labels are kept. The LVLM is then retrained on the generated dataset. By training only on generated answers that contain the correct video labels, Video-STaR uses existing video labels as weak supervision for video instruction tuning. Experiments show that LVLMs enhanced with Video-STaR improve by 10% on (I) general video question answering, and on (II) downstream tasks Video-STaR improves Kinetics700-QA accuracy by 20% and action quality assessment on FineDiving by 15%. Overall, Video-STaR offers an effective and practical way to improve LVLM performance.**|\n", "2407.06188": "|**2024-07-08**|**CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation**|Xinying Guo 
et.al.|[2407.06188](http://arxiv.org/abs/2407.06188)|null|Crowd motion generation is essential in entertainment industries such as animation and games and in strategic fields such as urban simulation and planning. The task requires a careful integration of control and generation to synthesize realistic crowd dynamics under specific spatial and semantic constraints, and its challenges remain largely unexplored. Existing human motion generation models tend to focus on individual behavior and overlook the complexity of collective behavior, while recent multi-person motion generation methods depend heavily on predefined scenarios and are limited to a fixed, small number of inter-person interactions, which limits their practicality. 
To address these problems, we propose CrowdMoGen, a zero-shot text-driven framework that harnesses large language models (LLMs) to bring collective intelligence into the motion generation framework, enabling generalizable planning and generation of crowd motions without paired training data. The framework has two key components: 1) a Crowd Scene Planner, which learns to coordinate motions and dynamics according to a specific scene context or introduced perturbations, and 2) a Collective Motion Generator, which efficiently synthesizes the required collective motions from the holistic plan. Extensive quantitative and qualitative experiments validate the framework's effectiveness: it fills an important gap in large-scale, general crowd motion generation, and it achieves high levels of realism and flexibility.|\n", "2407.06172": "|**2024-07-08**|**On Speeding Up Language Model Evaluation**|Jin Peng Zhou 
et.al.|[2407.06172](http://arxiv.org/abs/2407.06172)|null|Large language models (LLMs) dominate natural language processing (NLP), showing state-of-the-art capabilities across a wide range of tasks. Building such models involves many decisions, from training to inference, which together form a complex search problem. For example, finding the best pre-trained LLM, prompt, or hyperparameters for a given task usually requires exhaustively evaluating many candidates on the entire test set. This exhaustive evaluation is time-consuming and expensive, because both inference and metric computation with LLMs are resource-intensive. This paper addresses the challenge of efficiently evaluating how methods perform on test examples within a limited budget. Our approach builds on the well-studied multi-armed bandit framework, which sequentially selects the next method-example pair to evaluate, and combines multi-armed bandit algorithms with low-rank factorization to significantly reduce the resources required. Experiments show that our algorithms can identify the top-performing method using only 5-15% of the resources normally needed, a cost reduction of 85-95%.|\n", "2407.06153": "|**2024-07-08**|**What's Wrong with Your Code 
Generated by Large Language Models? An Extensive Study**|Shihan Dou et.al.|[2407.06153](http://arxiv.org/abs/2407.06153)|null|The rapid development of large language models (LLMs) for code generation has drawn growing attention from researchers. Current work focuses mainly on building high-quality datasets and applying diverse training techniques to improve the code generation abilities of LLMs. However, there has been no comprehensive study of the limitations and boundaries of these existing methods. To fill this gap, we conduct an extensive empirical study that evaluates three leading closed-source LLMs and four open-source LLMs on three commonly used benchmarks. The study examines the length, cyclomatic complexity, and number of APIs of the generated code, and finds that these models struggle with more complex programming problems and tend to produce code that is shorter yet more complicated than the canonical solutions. 
We also build a taxonomy of bugs in incorrect code, with three categories and 12 sub-categories, and analyze the root causes of common bug types. To examine how LLMs perform in real projects, we hand-craft a real-world benchmark of 140 code generation tasks. A comparative analysis shows that the distribution of bugs in real-world scenarios differs significantly from that in existing benchmarks. Finally, we propose a training-free iterative method that introduces self-critique, letting LLMs repair their generated code based on bug types and compiler feedback. Experiments show that after two iterations our method substantially reduces bugs and raises the pass rate by 29.2%, which indicates that LLMs have considerable potential for handling complex problems.|\n", "2407.06146": "|**2024-07-09**|**Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks**|Lukas Netz 
et.al.|[2407.06146](http://arxiv.org/abs/2407.06146)|null|We present and evaluate a method called grammar masking, which guides large language models (LLMs) toward producing syntactically correct models for a given context-free grammar. Prompt engineering methods such as few-shot learning or priming can raise the chance that an LLM produces correct syntax, but for complex grammars these methods tend to be time-consuming and less effective. Existing work focuses mainly on language model training or prompt engineering. This paper proposes a method that restricts the output through constrained decoding, ensuring that the generated content conforms to a valid syntax. We run experiments with several domain-specific languages (DSLs) built with MontiCore and with multiple LLMs, comparing results with and without constrained decoding. A corresponding parser is used to verify the syntactic correctness of each model. The results show that grammar masking significantly improves the modeling capabilities of several LLMs, reduces the need for carefully crafted prompts, and increases the likelihood of producing correct models.|\n", "2407.06135": "|**2024-07-08**|**ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text 
Generation**|Ethan Chern et.al.|[2407.06135](http://arxiv.org/abs/2407.06135)|**[link](https://github.com/gair-nlp/anole)**|**Previous open-source large multimodal models (LMMs) have several limitations: (1) they often lack native integration and need adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) some support multimodal generation but rely on a separate diffusion model for the visual part. To overcome these problems, we present Anole, an open, autoregressive, native large multimodal model for interleaved image-text generation. We build Anole on Meta AI's Chameleon, using an innovative fine-tuning strategy that is both data-efficient and parameter-efficient. Anole demonstrates high-quality, coherent multimodal generation. We have open-sourced the model, the training framework, and the instruction tuning data.**|\n", "2407.06129": "|**2024-07-08**|**Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization**|Hannah K. 
Bako et.al.|[2407.06129](http://arxiv.org/abs/2407.06129)|**[link](https://github.com/hdi-umd/semantic_profiling_llm_evaluation)**|**Automatically generating data visualizations from a person's verbal description of a dataset requires a deep understanding of the semantics of the utterance, including implicit and explicit references to data attributes, visualization tasks, and data preparation steps. Natural language interfaces (NLIs) for data visualization have explored ways to capture this information, but the uncertainty inherent in human speech makes it challenging. Recent large language models (LLMs) offer a way to address these problems, but their ability to extract the relevant semantic information remains unexplored. This study evaluates four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), analyzing their ability to understand utterances in the presence of uncertainty and to identify the relevant data context and visual tasks. The findings show that LLMs are sensitive to uncertainty in utterances and can extract the key data context. However, they perform poorly at inferring visualization tasks. Based on these findings, we outline directions for future research on using LLMs for visualization generation.**|\n", 
"2407.06125": "|**2024-07-08**|**Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities**|Avinash Anand et.al.|[2407.06125](http://arxiv.org/abs/2407.06125)|null|Depression is widely recognized as a major public health problem that severely affects individual mental well-being. Undiagnosed depression can lead to serious health problems, including physical symptoms and even suicide. Diagnosis usually relies on structured interviews conducted by clinicians and mental health professionals, together with questionnaires such as the Patient Health Questionnaire (PHQ). This approach depends heavily on the experience and judgment of the clinician and may be affected by personal bias. Because the underlying causes of depression are still being studied, clinicians also face difficulties in identifying and treating the condition in its early stages. Neural computing in artificial intelligence has recently made significant progress in text, image, and speech processing. Our study applies these state-of-the-art models in experiments on the E-DAIC (Extended Distress Analysis Interview Corpus Wizard of Oz) dataset from the 2019 Audio/Visual Emotion 
Challenge (AVEC), aiming for the best multimodal results. The results show that the proposed solution, which uses both proprietary and open-source large language models (LLMs), achieves a Root Mean Square Error (RMSE) of 3.98 on the text modality, outperforming the AVEC 2019 challenge baseline and current state-of-the-art regression architectures. The method also reaches an accuracy of 71.43% on the classification task. The paper further introduces a novel audio-visual multimodal network that predicts PHQ-8 scores with an RMSE of 6.51.|\n", "2407.06093": "|**2024-07-08**|**Artificial Intuition: Efficient Classification of Scientific Abstracts**|Harsh Sakhrani et.al.|[2407.06093](http://arxiv.org/abs/2407.06093)|null|Coarse-grained classification of short scientific texts, such as grant proposals or publication abstracts, 
is essential for gaining strategic insight and for managing research projects. These texts convey dense information to experts with deep domain knowledge, but automating the task is extremely hard because the texts are brief and lack context. To address this, we develop a new approach for generating and accurately assigning domain-specific coarse labels. We show that a large language model (LLM) can provide the metadata the task needs, much like supplementary knowledge that augments human intuition, and we propose a workflow. As a pilot study, we use the corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.|\n", "2407.06089": "|**2024-07-08**|**Merge, Ensemble, and Cooperate! 
A Survey on Collaborative Strategies in the Era of Large Language Models**|Jinliang Lu et.al.|[2407.06089](http://arxiv.org/abs/2407.06089)|null|The remarkable success of large language models (LLMs) has brought natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora show different strengths and weaknesses, which makes it hard to maximize their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies for LLMs. This paper gives a comprehensive overview of this emerging research area and highlights the motivation behind such collaboration. We group collaborative strategies into three main approaches: merging, ensemble, and cooperation. Merging integrates multiple LLMs in the parameter space. Ensemble combines the outputs of multiple models. Cooperation leverages the strengths of different LLMs so that each contributes its own expertise to specific tasks. We describe these methods in detail from different perspectives and discuss their potential applications. We also outline future research directions, hoping this work will inspire more research on LLM collaboration and advance sophisticated NLP applications.|\n", "2407.07094": "|**2024-07-09**|**AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning**|Jiaxi 
Cui et.al.|[2407.07094](http://arxiv.org/abs/2407.07094)|**[link](https://github.com/pandavt/datatager)**|**\u5728\u5404\u884c\u5404\u4e1a\u5e7f\u6cdb\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fc7\u7a0b\u4e2d\uff0c\u5f80\u5f80\u5ffd\u89c6\u4e86\u4e2a\u4f53\u548c\u5c0f\u578b\u7ec4\u7ec7\u5bf9\u9488\u5bf9\u5176\u7279\u5b9a\u4e1a\u52a1\u573a\u666f\u5b9a\u5236\u5316\u6a21\u578b\u7684\u9700\u6c42\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5fae\u8c03\u65b9\u6cd5\u2014\u2014\\textbf{AnyTaskTune}\uff0c\u5373\u4efb\u52a1\u5fae\u8c03\uff08Task-Fine-Tune\uff09\uff0c\u65e8\u5728\u63d0\u5347\u6a21\u578b\u5728\u591a\u6837\u5316\u7684\u9886\u57df\u7279\u5b9a\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u65b9\u6cd5\u5305\u62ec\u7ec6\u81f4\u5730\u8bc6\u522b\u548c\u5b9a\u4e49\u9886\u57df\u5185\u7684\u5b50\u4efb\u52a1\uff0c\u968f\u540e\u521b\u5efa\u4e13\u95e8\u7684\u589e\u5f3a\u6570\u636e\u96c6\u8fdb\u884c\u7cbe\u7ec6\u8c03\u6574\uff0c\u4ece\u800c\u4f18\u5316\u4efb\u52a1\u7279\u5b9a\u7684\u6a21\u578b\u8868\u73b0\u3002\u6211\u4eec\u5728\u6cd5\u5f8b\uff08\u5982\u5173\u952e\u8bcd\u63d0\u53d6\u548c\u53e5\u5b50\u9884\u6d4b\uff09\u7b49\u591a\u4e2a\u9886\u57df\uff0c\u5305\u62ec\u91d1\u878d\u3001\u533b\u7597\u3001\u6cd5\u5f8b\u3001\u5fc3\u7406\u5b66\u3001\u5ba2\u6237\u670d\u52a1\u548c\u4eba\u529b\u8d44\u6e90\u7b49\u4e8c\u5341\u591a\u4e2a\u5b50\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5fae\u8c03\u5b9e\u9a8c\u3002\u4e3a\u4e86\u652f\u6301\u793e\u533a\u53c2\u4e0e\u5e76\u5206\u4eab\u8d44\u6e90\uff0c\u6211\u4eec\u5c06\u5f00\u6e90\u8fd9\u4e9b\u53cc\u8bed\u4efb\u52a1\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528\\textbf{Task-Fine-Tune}\u65b9\u6cd5\u5fae\u8c03\u7684\u6a21\u578b\u4e0d\u4ec5\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u4e14\u5728\u5404\u81ea\u9886\u57df\u5185\u660e\u663e\u4f18\u4e8e\u901a\u7528\u80fd\u529b\u66f4\u5f3a\u7684\u6a21\u578b\u300
2\u6211\u4eec\u7684\u5de5\u4f5c\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1a\\url{https://github.com/PandaVT/DataTager}\u3002**|\n", "2407.07093": "|**2024-07-09**|**FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation**|Liqun Ma et.al.|[2407.07093](http://arxiv.org/abs/2407.07093)|**[link](https://github.com/liqunma/fbi-llm)**|**\u8be5\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5168\u65b0\u7684\u5168\u4e8c\u8fdb\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08FBI-LLM\uff09\uff0c\u8fd9\u662f\u9996\u6b21\u5c55\u793a\u5982\u4f55\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5927\u89c4\u6a21\u7684\u5168\u4e8c\u8fdb\u5236\u8bed\u8a00\u6a21\u578b\uff08\u4e0d\u540c\u4e8e\u90e8\u5206\u4e8c\u8fdb\u5236\u6216\u4e09\u8fdb\u5236\u7684LLM\uff0c\u5982BitNet b1.58\uff09\uff0c\u5176\u6027\u80fd\u80fd\u591f\u4e0e\u6d6e\u70b916\u4f4d\uff08FP16\uff09\u6216bfloat16\uff08BF16\uff09\u7684\u5e38\u89c4\u5927\u8bed\u8a00\u6a21\u578b\u76f8\u5f53\u3002\u901a\u8fc7\u4f7f\u7528\u81ea\u56de\u5f52\u84b8\u998f\uff08AD\uff09\u635f\u5931\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u578b\u5c3a\u5bf8\uff08130M\u30011.3B\u30017B\uff09\u548c\u9884\u8bad\u7ec3\u6570\u636e\u91cf\u4e0e\u5e38\u89c4LLM\u76f8\u5f53\uff0cFBI-LLM\u5728\u56f0\u60d1\u5ea6\u548c\u4efb\u52a1\u7279\u5b9a\u6548\u679c\u65b9\u9762\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u4ece\u96f6\u5f00\u59cb\u8bad\u7ec3\u5168\u4e8c\u8fdb\u5236\u8bed\u8a00\u6a21\u578b\u5e76\u4e0d\u9700\u8981\u9884\u8bad\u7ec3\u6743\u91cd\u3002\u8fd9\u9879\u5de5\u4f5c\u50ac\u751f\u4e86\u4e00\u4e2a\u65b0\u7684\u8ba1\u7b97\u6846\u67b6\uff0c\u5e76\u53ef\u80fd\u63a8\u52a8\u9488\u5bf9\u5b8c\u51681\u6bd4\u7279LLMs\u7684\u4e13\u4e1a\u786c\u4ef6\u8bbe\u8ba1\u3002\u6211\u4eec\u516c\u5f00\u6240\u6709\u6a21\u578b\u3001\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\uff0c\u4ee5\u652f\u6301\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff08\u4ee3\u7801\uff1ahttps://github.com/LiqunMa/FBI-LLM\uff0c\u6a21\
u578b\uff1ahttps://huggingface.co/LiqunMa/\uff09\u3002**|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**\u5728\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\u65b9\u6cd5\u4e2d\uff0c\u5904\u7406\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u975estationarity\u5e76\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u81ea\u4e3b\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6211\u4eec\u7684\u65b0\u578b\u667a\u80fd\u4f53\u201c\u5047\u8bbe\u5fc3\u667a\u201d\uff08Hypothetical Minds\uff09\u91c7\u7528\u8ba4\u77e5\u542f\u53d1\u5f0f\u67b6\u6784\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u4e24\u4e2a\u62bd\u8c61\u5c42\u6b21\u4e0a\u7684\u5206\u5c42\u89c4\u5212\u6a21\u5757\u3002\u5173\u952e\u65b0\u589e\u7684\u662f\u201c\u5fc3\u7406\u7406\u8bba\u201d\u6a21\u5757\uff0c\u5b83\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u751f\u6210\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u7b56\u7565\u7684\u5047\u8bbe\uff0c\u5e76\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u5047\u8bbe\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u884c\u4e3a\u7684\u9884\u6d4b\u51c6\u786e\u6027\u6765\u9010\u6b65\u4f18\u5316\u3002\u5728Melting 
Pot\u57fa\u51c6\u7684\u591a\u79cd\u7ade\u4e89\u3001\u6df7\u5408\u52a8\u673a\u548c\u534f\u4f5c\u73af\u5883\u4e2d\uff0c\u5047\u8bbe\u5fc3\u667a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u8bed\u8a00\u6a21\u578b\u667a\u80fd\u4f53\u548c\u5f3a\u5316\u5b66\u4e60\u57fa\u7ebf\uff0c\u65e0\u8bba\u662f\u5728\u4e8c\u5143\u73af\u5883\u8fd8\u662f\u7fa4\u4f53\u73af\u5883\u4e2d\u3002\u5bf9\u6bd4\u5206\u6790\u663e\u793a\uff0c\u5047\u8bbe\u7684\u8bc4\u4f30\u548c\u8fed\u4ee3\u7cbe\u70bc\u5bf9\u4e8e\u5e94\u5bf9\u590d\u6742\u573a\u666f\u81f3\u5173\u91cd\u8981\u3002**|\n", "2407.07080": "|**2024-07-09**|**Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities**|Shaltiel Shmidman et.al.|[2407.07080](http://arxiv.org/abs/2407.07080)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5728\u5e0c\u4f2f\u6765\u7b49\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6311\u6218\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86DictaLM2.0\u548cDictaLM2.0-Instruct\uff0c\u8fd9\u4e24\u4e2a\u6a21\u578b\u57fa\u4e8eMistral\u6a21\u578b\uff0c\u4f7f\u7528\u5927\u7ea62000\u4ebf\u4e2a\u5e0c\u4f2f\u6765\u8bed\u548c\u82f1\u8bed\u8bcd\u6c47\u8fdb\u884c\u8bad\u7ec3\u3002\u9002\u5e94\u9884\u8bad\u7ec3\u6a21\u578b\u5230\u65b0\u8bed\u8a00\u9700\u8981\u4e13\u95e8\u7684\u6280\u672f\uff0c\u8fd9\u4e0e\u4ece\u5934\u8bad\u7ec3\u6216\u5728\u8d44\u6e90\u4e30\u5bcc\u7684\u8bed\u8a00\uff08\u5982\u82f1\u8bed\uff09\u4e0a\u8fdb\u4e00\u6b65\u8bad\u7ec3\u73b0\u6709\u6a21\u578b\u6709\u663e\u8457\u5dee\u5f02\u3002\u8bba\u6587\u8be6\u7ec6\u9610\u8ff0\u4e86\u8fd9\u4e9b\u521b\u65b0\u7684\u8bad\u7ec3\u65b9\u6cd5\uff0c\u4ee5\u4fc3\u8fdb\u5e0c\u4f2f\u6765\u8bed\u7684\u9ad8\u6548\u5b66\u4e60\u548c\u9002\u5e94\u5176\u8bed\u8a00\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9DictaLM2.0-Instruct\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6307\u4ee4\u5fae\u8c03\uff0c\u4ee5\u63d0\u5347\u5176\u5728\u4efb\u52a1\u5bfc\u5411\u6307\u4ee4\u4e0a\u7684\u6027\u80fd\u30
02\u4e3a\u4e86\u4e25\u683c\u8bc4\u4f30\u6211\u4eec\u7684\u6a21\u578b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u65b0\u7684\u5e0c\u4f2f\u6765LLM\u8bc4\u4f30\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u95ee\u7b54\u3001\u60c5\u611f\u5206\u6790\u3001Winograd Schema Challenge\u3001\u7ffb\u8bd1\u548c\u6458\u8981\u7b49\u591a\u4e2a\u4efb\u52a1\u3002\u672c\u6587\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5728\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u8bad\u7ec3LLMs\u7684\u590d\u6742\u6027\uff0c\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u7528\u4e8e\u5176\u4ed6LLM\u8de8\u975e\u82f1\u8bed\u8bed\u8a00\u9002\u5e94\u7684\u6846\u67b6\uff0c\u4ece\u800c\u5bf9\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2407.07071": "|**2024-07-09**|**Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps**|Yung-Sung Chuang et.al.|[2407.07071](http://arxiv.org/abs/2407.07071)|**[link](https://github.com/voidism/lookback-lens)**|**\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u603b\u7ed3\u6587\u7ae0\u6216\u6839\u636e\u7ed9\u5b9a\u6bb5\u843d\u56de\u7b54\u95ee\u9898\u65f6\u53ef\u80fd\u51fa\u73b0\u7684\u8bed\u5883\u6027\u865a\u6784\u95ee\u9898\u3002LLMs\u53ef\u80fd\u4f1a\u675c\u64b0\u7ec6\u8282\uff0c\u63d0\u4f9b\u4e0e\u8f93\u5165\u4e0a\u4e0b\u6587\u4e0d\u7b26\u7684\u4e0d\u51c6\u786e\u7b54\u6848\u3002\u7814\u7a76\u8005\u63d0\u51fa\uff0c\u8fd9\u79cd\u865a\u6784\u4e0e\u6a21\u578b\u503e\u5411\u4e8e\u5173\u6ce8\u4e0a\u4e0b\u6587\u4fe1\u606f\u8fd8\u662f\u81ea\u52a8\u751f\u6210\u5185\u5bb9\u7684\u7a0b\u5ea6\u6709\u5173\u3002\u4e3a\u6b64\uff0c\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7b80\u5355\u7684\u68c0\u6d4b\u6a21\u578b\u2014\u2014\u201cLookback 
Lens\u201d\uff0c\u5176\u8f93\u5165\u7279\u5f81\u662f\u57fa\u4e8e\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u4e0a\u4e0b\u6587\u6ce8\u610f\u529b\u6743\u91cd\u4e0e\u65b0\u751f\u6210\u8bcd\u7684\u6bd4\u4f8b\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u4f7f\u7528\u8fd9\u4e9b\u56de\u987e\u6bd4\u7387\u7279\u5f81\u7684\u7ebf\u6027\u5206\u7c7b\u5668\u4e0e\u5229\u7528LLM\u6574\u4e2a\u9690\u85cf\u72b6\u6001\u6216\u6587\u672c\u8574\u542b\u6a21\u578b\u7684\u66f4\u590d\u6742\u68c0\u6d4b\u5668\u540c\u6837\u6709\u6548\u3002Lookback Lens\u4e0d\u4ec5\u9002\u7528\u4e8e\u4e0d\u540c\u4efb\u52a1\uff0c\u8fd8\u80fd\u8de8\u6a21\u578b\u8fc1\u79fb\uff0c\u4e00\u4e2a\u572870\u4ebf\u53c2\u6570\u6a21\u578b\u4e0a\u8bad\u7ec3\u7684\u68c0\u6d4b\u5668\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u5373\u53ef\u5e94\u7528\u4e8e\u66f4\u5927\u7684130\u4ebf\u53c2\u6570\u6a21\u578b\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0\uff0c\u901a\u8fc7\u7b80\u5355\u7684\u5206\u7c7b\u5668\u6307\u5bfc\u89e3\u7801\u65b9\u6cd5\uff0c\u80fd\u591f\u51cf\u5c11\u8bf8\u5982XSum\u6458\u8981\u4efb\u52a1\u4e2d\u7684\u865a\u6784\u7a0b\u5ea6\uff0c\u4f8b\u5982\u964d\u4f4e9.6%\u7684\u865a\u6784\u53d1\u751f\u7387\u3002**|\n", "2407.07064": "|**2024-07-09**|**Prompting Techniques for Secure Code Generation: A Systematic Investigation**|Catherine Tony et.al.|[2407.07064](http://arxiv.org/abs/2407.07064)|null|## \u6982\u8981 
\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5f00\u53d1\u4e2d\u7684\u5174\u8d77\uff0c\u901a\u8fc7\u63d0\u793a\u9a71\u52a8\u7f16\u7a0b\uff0c\u5f00\u53d1\u8005\u80fd\u591f\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\uff08NL\uff09\u6307\u4ee4\u751f\u6210\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5173\u4e8e\u5b83\u4eec\u80fd\u5426\u4ea7\u751f\u5b89\u5168\u4ee3\u7801\u7684\u7814\u7a76\u5f15\u53d1\u4e86\u8d28\u7591\uff0c\u8fd9\u5173\u7cfb\u5230\u63d0\u793a\u751f\u6210\u8f6f\u4ef6\u7684\u8d28\u91cf\u3002\u5c3d\u7ba1\u5df2\u7ecf\u51fa\u73b0\u4e86\u591a\u79cd\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u7b56\u7565\u4ee5\u4f18\u5316LLM\u7684\u54cd\u5e94\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4e0e\u5b89\u5168\u4ee3\u7801\u751f\u6210\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\u4ecd\u9700\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u76ee\u6807\uff1a\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u4e0d\u540c\u63d0\u793a\u6280\u672f\u5bf9LLMs\u6839\u636eNL\u6307\u4ee4\u751f\u6210\u4ee3\u7801\u7684\u5b89\u5168\u6027\u5f71\u54cd\u3002\u65b9\u6cd5\uff1a\u9996\u5148\uff0c\u6211\u4eec\u8fdb\u884c\u7cfb\u7edf\u6587\u732e\u56de\u987e\uff0c\u4ee5\u8bc6\u522b\u9002\u7528\u4e8e\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u73b0\u6709\u63d0\u793a\u6280\u672f\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728GPT-3\u3001GPT-3.5\u548cGPT-4\u6a21\u578b\u4e0a\u8bc4\u4f30\u8fd9\u4e9b\u6280\u672f\u4e2d\u7684\u90e8\u5206\uff0c\u4f7f\u7528\u4e00\u4e2a\u5305\u542b150\u4e2a\u4e0e\u5b89\u5168\u76f8\u5173\u7684\u4ee3\u7801\u751f\u6210NL\u63d0\u793a\u7684\u6570\u636e\u96c6\u3002\u7ed3\u679c\uff1a\u6211\u4eec\u7684\u5de5\u4f5c\uff081\uff09\u5bf9\u4ee3\u7801\u751f\u6210\u7684\u6f5c\u5728\u63d0\u793a\u6280\u672f\u8fdb\u884c\u4e86\u5206\u7c7b\uff0c\uff082\uff09\u9002\u5e94\u5e76\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6280\u672f\u5728\u5b89\u5168\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\uff0c\uff083\uff09\u89c2\u5bdf\u5230\u5728\u6d4b\u8bd5\u7684LLMs\u4e2d\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u4e86
\u540d\u4e3a\u201c\u9012\u5f52\u6279\u8bc4\u4e0e\u6539\u8fdb\u201d\uff08RCI\uff09\u7684\u73b0\u6709\u6280\u672f\u540e\uff0c\u5b89\u5168\u6f0f\u6d1e\u6709\u6240\u51cf\u5c11\uff0c\u4e3aLLM\u751f\u6210\u4ee3\u7801\u5b89\u5168\u6027\u7684\u8ba8\u8bba\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2407.07061": "|**2024-07-09**|**Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence**|Weize Chen et.al.|[2407.07061](http://arxiv.org/abs/2407.07061)|**[link](https://github.com/openbmb/ioa)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u51fa\u73b0\u4e86\u80fd\u6548\u5353\u8d8a\u7684\u81ea\u4e3b\u4ee3\u7406\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u591a\u4ee3\u7406\u6846\u67b6\u5728\u6574\u5408\u6765\u81ea\u4e0d\u540c\u751f\u6001\u7cfb\u7edf\u7684\u9ad8\u80fd\u529b\u7b2c\u4e09\u65b9\u4ee3\u7406\u65f6\u9762\u4e34\u6311\u6218\uff0c\u901a\u5e38\u5c40\u9650\u4e8e\u81ea\u8eab\u5c01\u95ed\u73af\u5883\u3002\u5b83\u4eec\u5728\u6a21\u62df\u5206\u5e03\u5f0f\u73af\u5883\u65f6\u4e5f\u53d7\u9650\u4e8e\u5355\u8bbe\u5907\u8bbe\u7f6e\uff0c\u5e76\u4e14\u5f80\u5f80\u4f9d\u8d56\u786c\u7f16\u7801\u7684\u901a\u4fe1\u7ba1\u9053\uff0c\u96be\u4ee5\u9002\u5e94\u4efb\u52a1\u9700\u6c42\u7684\u53d8\u5316\u3002\u53d7\u4e92\u8054\u7f51\u7406\u5ff5\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u4ee3\u7406\u4e92\u8054\u7f51\u201d\uff08Internet of 
Agents\uff0cIoA\uff09\u7684\u65b0\u6846\u67b6\u3002IoA\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u7075\u6d3b\u4e14\u53ef\u6269\u5c55\u7684\u5e73\u53f0\uff0c\u4fc3\u8fdb\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u591a\u4ee3\u7406\u534f\u4f5c\u3002\u5b83\u5f15\u5165\u4e86\u4ee3\u7406\u96c6\u6210\u534f\u8bae\u3001\u5373\u65f6\u6d88\u606f\u67b6\u6784\u4ee5\u53ca\u52a8\u6001\u7684\u56e2\u961f\u534f\u4f5c\u548c\u5bf9\u8bdd\u6d41\u7a0b\u63a7\u5236\u673a\u5236\u3002\u901a\u8fc7\u5728\u901a\u7528\u52a9\u624b\u4efb\u52a1\u3001\u4f53\u611fAI\u4efb\u52a1\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u57fa\u51c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660eIoA\u5728\u6027\u80fd\u4e0a\u6301\u7eed\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\uff0c\u5c55\u793a\u4e86\u5176\u5728\u5f02\u6784\u4ee3\u7406\u4e4b\u95f4\u6709\u6548\u5408\u4f5c\u7684\u80fd\u529b\u3002IoA\u4ee3\u8868\u4e86\u671d\u7740\u5c06\u591a\u6837\u5316\u7684\u4ee3\u7406\u94fe\u63a5\u5728\u4e00\u4e2a\u7c7b\u4f3c\u4e92\u8054\u7f51\u7684\u73af\u5883\u4e2d\u8fc8\u8fdb\uff0c\u8ba9\u5b83\u4eec\u80fd\u591f\u65e0\u7f1d\u534f\u4f5c\u4ee5\u63d0\u5347\u6574\u4f53\u667a\u80fd\u548c\u529f\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5e93\u5df2\u53d1\u5e03\u5728\uff1a\\url{https://github.com/OpenBMB/IoA}\u3002**|\n", "2407.07053": "|**2024-07-09**|**Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model**|Wenqi Zhang 
et.al.|[2407.07053](http://arxiv.org/abs/2407.07053)|**[link](https://github.com/zwq2018/multi-modal-self-instruct)**|**\u5c3d\u7ba1\u5f53\u524d\u7684\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5df2\u7ecf\u80fd\u591f\u7406\u89e3\u81ea\u7136\u573a\u666f\u7684\u7167\u7247\u548c\u8096\u50cf\uff0c\u4f46\u5b83\u4eec\u5bf9\u62bd\u8c61\u56fe\u50cf\uff08\u5982\u56fe\u8868\u3001\u5730\u56fe\u6216\u5e03\u5c40\uff09\u7684\u7406\u89e3\u4ee5\u53ca\u89c6\u89c9\u63a8\u7406\u80fd\u529b\u4ecd\u7136\u76f8\u5f53\u521d\u7ea7\u3002\u5b83\u4eec\u5728\u5904\u7406\u65e5\u5e38\u4efb\u52a1\u65f6\u5e38\u5e38\u9047\u5230\u56f0\u96be\uff0c\u4f8b\u5982\u9605\u8bfb\u65f6\u949f\u65f6\u95f4\u3001\u7406\u89e3\u6d41\u7a0b\u56fe\u6216\u6839\u636e\u8def\u7ebf\u56fe\u89c4\u5212\u8def\u5f84\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u6a21\u6001\u81ea\u6211\u6307\u5bfc\u7cfb\u7edf\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53ca\u5176\u4ee3\u7801\u80fd\u529b\u6765\u751f\u6210\u5927\u91cf\u7684\u62bd\u8c61\u56fe\u50cf\u548c\u65e5\u5e38\u573a\u666f\u4e0b\u7684\u89c6\u89c9\u63a8\u7406\u6307\u4ee4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8f7b\u677e\u521b\u5efa\u4e86\u4e00\u4e2a\u591a\u6a21\u6001\u57fa\u51c6\uff0c\u5305\u542b11,193\u4e2a\u6307\u4ee4\uff0c\u6db5\u76d6\u516b\u4e2a\u89c6\u89c9\u573a\u666f\uff1a\u56fe\u8868\u3001\u8868\u683c\u3001\u6a21\u62df\u5730\u56fe\u3001\u4eea\u8868\u677f\u3001\u6d41\u7a0b\u56fe\u3001\u5173\u7cfb\u56fe\u3001\u697c\u5c42\u5e73\u9762\u56fe\u548c\u89c6\u89c9\u8c1c\u9898\u3002 
\u8fd9\u4e2a\u7531\u7b80\u5355\u7ebf\u6761\u548c\u51e0\u4f55\u5143\u7d20\u6784\u6210\u7684\u57fa\u51c6\u63ed\u793a\u4e86\u6700\u5148\u8fdb\u7684LMM\uff08\u5982Claude-3.5-Sonnet\u548cGPT-4o\uff09\u5728\u62bd\u8c61\u56fe\u50cf\u7406\u89e3\u3001\u7a7a\u95f4\u5173\u7cfb\u63a8\u7406\u548c\u89c6\u89c9\u5143\u7d20\u8bc6\u522b\u65b9\u9762\u7684\u5c40\u9650\u6027\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u9a8c\u8bc1\u5408\u6210\u6570\u636e\u7684\u8d28\u91cf\uff0c\u6211\u4eec\u4f7f\u752862,476\u6761\u5408\u6210\u7684\u56fe\u8868\u3001\u8868\u683c\u548c\u8def\u7ebf\u56fe\u6307\u4ee4\u5bf9LMM\u8fdb\u884c\u5fae\u8c03\u3002\u7ed3\u679c\u663e\u793a\uff0c\u56fe\u8868\u7406\u89e3\u548c\u5730\u56fe\u5bfc\u822a\u6027\u80fd\u5f97\u5230\u4e86\u63d0\u5347\uff0c\u540c\u65f6\u4e5f\u8868\u660e\u8fd9\u5bf9\u5176\u4ed6\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u53ef\u80fd\u5177\u6709\u6f5c\u5728\u76ca\u5904\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728\u4ee5\u4e0b\u94fe\u63a5\u63d0\u4f9b\uff1a\\url{https://github.com/zwq2018/Multi-modal-Self-instruct}\u3002**|\n", "2407.07019": "|**2024-07-09**|**Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies**|Inwon Kang 
et.al.|[2407.07019](http://arxiv.org/abs/2407.07019)|null|\u6211\u4eec\u7814\u7a76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u751f\u6210\u57fa\u4e8e\u6587\u672c\u7684\u5065\u5eb7\u4fdd\u9669\u653f\u7b56\u7684\u81ea\u52a8\u5316\u4ee3\u7801\uff0c\u76ee\u6807\u662f\u533a\u5757\u94fe\u667a\u80fd\u5408\u7ea6\u3002\u667a\u80fd\u5408\u7ea6\u56e0\u5176\u4e0d\u53ef\u53d8\u6027\u3001\u53ef\u9a8c\u8bc1\u6027\u3001\u6269\u5c55\u6027\u548c\u65e0\u9700\u9884\u8bbe\u4fe1\u4efb\u7684\u7279\u6027\u800c\u88ab\u9009\u4e2d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6309\u6280\u672f\u590d\u6742\u5ea6\u9012\u589e\u751f\u6210\u8f93\u51fa\uff1a\uff081\uff09\u6587\u672c\u6458\u8981\uff0c\uff082\uff09\u58f0\u660e\u5f0f\u51b3\u7b56\u903b\u8f91\uff0c\u4ee5\u53ca\uff083\uff09\u5e26\u6709\u5355\u5143\u6d4b\u8bd5\u7684\u667a\u80fd\u5408\u7ea6\u4ee3\u7801\u3002\u6211\u4eec\u786e\u8ba4LLMs\u5728\u4efb\u52a1\uff081\uff09\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u7ed3\u6784\u5316\u7684\u8f93\u51fa\u6709\u52a9\u4e8e\u9a8c\u8bc1\u4efb\u52a1\uff082\uff09\u548c\uff083\uff09\u3002\u58f0\u660e\u5f0f\u8bed\u8a00\u5e38\u7528\u4e8e\u89c4\u8303\u533b\u7597\u653f\u7b56\uff0c\u4f46\u5728\u533a\u5757\u94fe\u4e0a\u7684\u6267\u884c\u8f83\u4e3a\u590d\u6742\uff0c\u56e0\u6b64\u4efb\u52a1\uff083\uff09\u65e8\u5728\u76f4\u63a5\u901a\u8fc7\u667a\u80fd\u5408\u7ea6\u81ea\u52a8\u5b9e\u73b0\u8fd9\u4e00\u8fc7\u7a0b\u3002\u6211\u4eec\u63d0\u51fa\u5b8c\u6574\u6027\u3001\u6b63\u786e\u6027\u3001\u6e05\u6670\u5ea6\u3001\u8bed\u6cd5\u548c\u529f\u80fd\u6027\u4ee3\u7801\u4f5c\u4e3a\u8bc4\u4f30\u6307\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e86\u6765\u81eaMedicare\u5b98\u65b9\u624b\u518c\u7684\u4e09\u4e2a\u5177\u6709\u4e0d\u540c\u96be\u5ea6\u7684\u4fdd\u9669\u653f\u7b56\u573a\u666f\u8fdb\u884c\u8bc4\u4f30\uff0c\u6d89\u53caGPT-3.5 Turbo\u3001GPT-3.5 Turbo 16K\u3001GPT-4\u3001GPT-4 
Turbo\u548cCodeLLaMA\u7b49\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0cLLMs\u5728\u751f\u6210\u6587\u672c\u6458\u8981\u65b9\u9762\u8868\u73b0\u826f\u597d\u3002\u5c3d\u7ba1\u4efb\u52a1\uff082\uff09\u5230\uff083\uff09\u7684\u8f93\u51fa\u53ef\u4ee5\u4f5c\u4e3a\u8d77\u70b9\uff0c\u4f46\u5b83\u4eec\u4ecd\u9700\u4eba\u5de5\u5ba1\u6838\uff1a\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u5373\u4f7f\u201c\u53ef\u8fd0\u884c\u201d\u7684\u4ee3\u7801\u4e5f\u53ef\u80fd\u4ea7\u751f\u4e0d\u6b63\u786e\u7684\u7ed3\u679c\uff1b\u76ee\u6807\u8bed\u8a00\u7684\u6d41\u884c\u7a0b\u5ea6\u4f1a\u5f71\u54cd\u8f93\u51fa\u8d28\u91cf\uff1b\u66f4\u590d\u6742\u7684\u573a\u666f\u4ecd\u662f\u5f53\u524d\u7684\u4e00\u5927\u6311\u6218\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u5c55\u793a\u4e86LLMs\u5728\u5c06\u6587\u672c\u6d41\u7a0b\u63cf\u8ff0\u8f6c\u5316\u4e3a\u667a\u80fd\u5408\u7ea6\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2407.07018": "|**2024-07-09**|**End-To-End Causal Effect Estimation from Unstructured Natural Language Data**|Nikita Dhawan 
et.al.|[2407.07018](http://arxiv.org/abs/2407.07018)|null|\u4e86\u89e3\u5e72\u9884\u63aa\u65bd\u7684\u6548\u679c\u5bf9\u4eba\u7c7b\u51b3\u7b56\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u624b\u52a8\u6536\u96c6\u548c\u7ed3\u6784\u5316\u6570\u636e\uff0c\u8fd9\u5bfc\u81f4\u7814\u7a76\u6210\u672c\u589e\u52a0\u3001\u5b8c\u6210\u65f6\u95f4\u5ef6\u957f\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u91c7\u5927\u89c4\u6a21\u3001\u591a\u6837\u5316\u7684\u89c2\u5bdf\u6027\u6587\u672c\u6570\u636e\uff0c\u4ee5\u5728\u9002\u5f53\u7684\u56e0\u679c\u5047\u8bbe\u4e0b\u751f\u6210\u4f4e\u6210\u672c\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u3002\u6211\u4eec\u63d0\u51faNATURAL\uff0c\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u65b0\u578b\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u7b97\u6cd5\u5bb6\u65cf\uff0c\u9002\u7528\u4e8e\u5904\u7406\u672a\u7ed3\u6784\u5316\u7684\u6587\u672c\u6570\u636e\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528LLMs\u7684\u6761\u4ef6\u5206\u5e03\uff08\u9488\u5bf9\u611f\u5174\u8da3\u7684\u53d8\u91cf\uff0c\u6839\u636e\u6587\u672c\u6570\u636e\uff09\u8f85\u52a9\u8ba1\u7b97\u7ecf\u5178\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u3002\u6211\u4eec\u514b\u670d\u4e86\u4e00\u7cfb\u5217\u6280\u672f\u6311\u6218\uff0c\u5982\u81ea\u52a8\u5316\u6570\u636e\u6574\u7406\u548c\u4f7f\u7528LLMs\u586b\u8865\u7f3a\u5931\u4fe1\u606f\u3002 
\u6211\u4eec\u51c6\u5907\u4e86\u516d\u4e2a\uff08\u4e24\u4e2a\u5408\u6210\u7684\u548c\u56db\u4e2a\u5b9e\u9645\u7684\uff09\u89c2\u5bdf\u6027\u6570\u636e\u96c6\uff0c\u5e76\u914d\u4ee5\u968f\u673a\u5bf9\u7167\u8bd5\u9a8c\u5f62\u5f0f\u7684\u771f\u5b9e\u6807\u7b7e\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86\u6211\u4eec\u7ba1\u9053\u4e2d\u7684\u6bcf\u4e00\u6b65\u3002NATURAL\u4f30\u8ba1\u7b97\u6cd5\u8868\u73b0\u51fa\u8272\uff0c\u5176\u7ed3\u679c\u4e0e\u771f\u5b9e\u503c\u7684\u5dee\u8ddd\u4e0d\u8d85\u8fc73\u4e2a\u767e\u5206\u70b9\uff0c\u5305\u62ec\u5728\u5b9e\u9645\u7684\u4e09\u671f\u548c\u56db\u671f\u4e34\u5e8a\u8bd5\u9a8c\u4e2d\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u672a\u7ed3\u6784\u5316\u7684\u6587\u672c\u6570\u636e\u662f\u56e0\u679c\u6548\u5e94\u4fe1\u606f\u7684\u4e30\u5bcc\u6765\u6e90\uff0cNATURAL\u662f\u5229\u7528\u8fd9\u4e00\u8d44\u6e90\u7684\u81ea\u52a8\u5316\u6d41\u7a0b\u7684\u7b2c\u4e00\u6b65\u3002|\n", "2407.07890": "|**2024-07-10**|**Training on the Test Task Confounds Evaluation and Emergence**|Ricardo Dominguez-Olmedo 
et.al.|[2407.07890](http://arxiv.org/abs/2407.07890)|**[link](https://github.com/socialfoundations/training-on-the-test-task)**|**\u6211\u4eec\u7814\u7a76\u4e86\u4e00\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u4e2d\u7684\u6838\u5fc3\u95ee\u9898\uff0c\u79f0\u4e3a\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u8bad\u7ec3\u3002\u8fd9\u5e76\u975e\u5982\u6570\u636e\u6cc4\u9732\u6216\u6c61\u67d3\u7b49\u4e0d\u5f53\u505a\u6cd5\uff0c\u800c\u662f\u4e00\u79cd\u9010\u6e10\u589e\u957f\u7684\u5305\u62ec\u4efb\u52a1\u76f8\u5173\u6570\u636e\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u6280\u672f\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u8bad\u7ec3\u4f1a\u6df7\u6dc6\u6a21\u578b\u7684\u76f8\u5bf9\u8bc4\u4f30\u548c\u5173\u4e8e\u6d8c\u73b0\u80fd\u529b\u7684\u58f0\u660e\u3002\u6211\u4eec\u63d0\u51fa\uff0c\u4e0d\u540c\u6a21\u578b\u5bb6\u65cf\u4e4b\u95f4\u7684\u770b\u4f3c\u4f18\u52bf\u53ef\u80fd\u7531\u4ed6\u4eec\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u7684\u8bad\u7ec3\u7a0b\u5ea6\u5dee\u5f02\u6240\u89e3\u91ca\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u65b9\u6cd5\uff0c\u5373\u5728\u6bd4\u8f83\u524d\u5bf9\u6bcf\u4e2a\u6a21\u578b\u8fdb\u884c\u76f8\u540c\u7684\u4efb\u52a1\u76f8\u5173\u6570\u636e\u5fae\u8c03\uff0c\u4ee5\u6821\u6b63\u8fd9\u79cd\u8bad\u7ec3\u3002\u7ed3\u679c\u663e\u793a\uff0c\u4e00\u65e6\u8c03\u6574\u4e86\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u7684\u8bad\u7ec3\uff0c\u6d8c\u73b0\u884c\u4e3a\u7684\u5b9e\u4f8b\u5927\u591a\u6d88\u5931\u3002\u540c\u6837\u9002\u7528\u4e8e\u90a3\u4e9b\u65e0\u6cd5\u7528\u8bc4\u4ef7\u6307\u6807\u89e3\u91ca\u7684\u6d8c\u73b0\u884c\u4e3a\u62a5\u544a\u6848\u4f8b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63a8\u52a8\u4e86\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65b0\u8bc4\u4ef7\u89c6\u89d2\uff0c\u5bf9\u57fa\u51c6\u6d4b\u8bd5\u548c\u6d8c\u73b0\u80fd\u529b\u7814\u7a76\u5177\u6709\u5e7f\u6cdb\u5f71\u54cd\u3002**|\n", "2407.07880": "|**2024-07-10**|**Towards Robust Alignment of Language Models: 
Distributionally Robustifying Direct Preference Optimization**|Junkang Wu et.al.|[2407.07880](http://arxiv.org/abs/2407.07880)|**[link](https://github.com/junkangwu/dr_dpo)**|**\u672c\u7814\u7a76\u5173\u6ce8\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u566a\u58f0\u5bf9Direct Preference Optimization (DPO)\u65b9\u6cd5\u7684\u6311\u6218\uff0c\u8be5\u65b9\u6cd5\u7528\u4e8e\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u3002\u6211\u4eec\u533a\u5206\u4e86\u4e24\u7c7b\u566a\u58f0\uff1a\u70b9\u566a\u58f0\uff0c\u6d89\u53ca\u4f4e\u8d28\u91cf\u7684\u6570\u636e\u70b9\uff1b\u548c\u6210\u5bf9\u566a\u58f0\uff0c\u5f71\u54cd\u504f\u597d\u7684\u6b63\u786e\u6392\u5e8f\u3002\u901a\u8fc7\u5206\u5e03\u5f0f\u9c81\u68d2\u4f18\u5316\uff08DRO\uff09\uff0c\u6211\u4eec\u589e\u5f3a\u4e86DPO\u62b5\u6297\u8fd9\u4e9b\u566a\u58f0\u7684\u80fd\u529b\u3002\u7406\u8bba\u5206\u6790\u63ed\u793a\uff0cDPO\u672c\u8d28\u4e0a\u8574\u542b\u4e86DRO\u539f\u7406\uff0c\u5bf9\u70b9\u566a\u58f0\u5177\u6709\u5929\u7136\u7684\u9c81\u68d2\u6027\uff0c\u5176\u4e2d\u6b63\u5219\u5316\u7cfb\u6570$\\beta$\u5728\u6297\u566a\u58f0\u65b9\u9762\u8d77\u5173\u952e\u4f5c\u7528\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u5206\u5e03\u5f0f\u9c81\u68d2\u589e\u5f3a\u7684DPO\uff08Dr. DPO\uff09\uff0c\u5b83\u901a\u8fc7\u4f18\u5316\u6700\u574f\u60c5\u51b5\u7684\u6210\u5bf9\u573a\u666f\u6765\u96c6\u6210\u6210\u5bf9\u9c81\u68d2\u6027\u3002Dr. DPO\u4e2d\u7684\u65b0\u8d85\u53c2\u6570$\\beta'$\u5141\u8bb8\u5bf9\u6570\u636e\u5bf9\u53ef\u9760\u6027\u8fdb\u884c\u7cbe\u7ec6\u63a7\u5236\uff0c\u5e73\u8861\u4e86\u5728\u5608\u6742\u8bad\u7ec3\u73af\u5883\u4e2d\u7684\u63a2\u7d22\u4e0e\u5229\u7528\u3002\u5b9e\u8bc1\u8bc4\u4f30\u663e\u793a\uff0cDr. 
DPO\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u548c\u54cd\u5e94\u51c6\u786e\u6027\uff0c\u65e0\u8bba\u5728\u6709\u566a\u58f0\u8fd8\u662f\u65e0\u566a\u58f0\u7684\u8bbe\u7f6e\u4e0b\u90fd\u8868\u73b0\u51fa\u8272\u3002\u4ee3\u7801\u5df2\u5728https://github.com/junkangwu/Dr_DPO\u4e0a\u63d0\u4f9b\u3002**|\n", "2407.07858": "|**2024-07-10**|**FACTS About Building Retrieval Augmented Generation-based Chatbots**|Rama Akkiraju et.al.|[2407.07858](http://arxiv.org/abs/2407.07858)|null|\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u4f01\u4e1a\u804a\u5929\u673a\u5668\u4eba\u65e5\u76ca\u6210\u4e3a\u63d0\u5347\u5458\u5de5\u751f\u4ea7\u529b\u7684\u5173\u952e\u5de5\u5177\uff0c\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u53ca\u5982Langchain\u548cLlamaindex\u4e4b\u7c7b\u7684orchestration\u6846\u67b6\u5728\u6784\u5efa\u8fd9\u4e9b\u804a\u5929\u673a\u5668\u4eba\u4e2d\u626e\u6f14\u4e86\u91cd\u8981\u89d2\u8272\u3002\u7136\u800c\uff0c\u521b\u5efa\u6709\u6548\u7684\u4f01\u4e1a\u804a\u5929\u673a\u5668\u4eba\u662f\u4e00\u9879\u6311\u6218\uff0c\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u7684RAG\u7ba1\u9053\u5de5\u7a0b\u3002\u8fd9\u5305\u62ec\u5fae\u8c03\u5d4c\u5165\u548cLLMs\u3001\u4ece\u5411\u91cf\u6570\u636e\u5e93\u63d0\u53d6\u6587\u6863\u3001\u91cd\u8ff0\u67e5\u8be2\u3001\u91cd\u65b0\u6392\u540d\u7ed3\u679c\u3001\u8bbe\u8ba1\u63d0\u793a\u3001\u9075\u5b88\u6587\u6863\u8bbf\u95ee\u63a7\u5236\u3001\u63d0\u4f9b\u7b80\u6d01\u7684\u56de\u7b54\u3001\u5305\u542b\u5f15\u7528\u3001\u4fdd\u62a4\u4e2a\u4eba\u4fe1\u606f\u4ee5\u53ca\u6784\u5efaorchestration\u4ee3\u7406\u3002\u6211\u4eec\u57fa\u4e8e\u4e09\u4e2aNVIDIA\u804a\u5929\u673a\u5668\u4eba\uff08\u5206\u522b\u7528\u4e8eIT/HR\u798f\u5229\u3001\u8d22\u52a1\u6536\u76ca\u548c\u901a\u7528\u5185\u5bb9\uff09\u7684\u7ecf\u9a8c\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6784\u5efaRAG\u804a\u5929\u673a\u5668\u4eba\u7684\u6846\u6
7b6\u2014\u2014FACTS\uff08Freshness\u3001Architectures\u3001Cost\u3001Testing\u3001Security\uff09\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u65b9\u9762\uff1a\u9996\u5148\u4ecb\u7ecdFACTS\u6846\u67b6\uff0c\u5176\u6b21\u5217\u51fa\u5341\u4e94\u4e2aRAG\u7ba1\u9053\u63a7\u5236\u70b9\uff0c\u6700\u540e\u63d0\u4f9b\u4e86\u5173\u4e8e\u5927\u6a21\u578b\u548c\u5c0f\u6a21\u578b\u5728\u51c6\u786e\u6027\u548c\u5ef6\u8fdf\u4e4b\u95f4\u6743\u8861\u7684\u5b9e\u8bc1\u7ed3\u679c\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u7bc7\u5168\u9762\u63a2\u8ba8\u6784\u5efa\u5b89\u5168\u4f01\u4e1a\u7ea7\u804a\u5929\u673a\u5668\u4eba\u7684\u65b9\u6cd5\u548c\u89e3\u51b3\u65b9\u6848\u7684\u8bba\u6587\u3002|\n", "2407.07852": "|**2024-07-10**|**OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training**|Sami Jaghouar et.al.|[2407.07852](http://arxiv.org/abs/2407.07852)|**[link](https://github.com/PrimeIntellect-ai/OpenDiLoCo)**|**OpenDiLoCo\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u5206\u5e03\u5f0f\u4f4e\u901a\u4fe1\uff08DiLoCo\uff09\u8bad\u7ec3\u65b9\u6cd5\u7684\u5b9e\u73b0\u548c\u590d\u5236\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u53ef\u590d\u73b0\u7684DiLoCo\u5b9e\u9a8c\uff0c\u901a\u8fc7Hivemind\u5e93\u6784\u5efa\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u53bb\u4e2d\u5fc3\u5316\u8bad\u7ec3\u6846\u67b6\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5927\u6d32\u548c\u4e09\u4e2a\u56fd\u5bb6\u4e4b\u95f4\u8bad\u7ec3\u6a21\u578b\uff0c\u540c\u65f6\u4fdd\u630190-95%\u7684\u8ba1\u7b97\u8d44\u6e90\u5229\u7528\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5173\u4e8e\u7b97\u6cd5\u8ba1\u7b97\u6548\u7387\u3001\u5de5\u4f5c\u5668\u6570\u91cf\u53ef\u6269\u5c55\u6027\u7684\u7814\u7a76\uff0c\u5e76\u8868\u660e\u5176\u68af\u5ea6\u53ef\u4ee5\u4f7f\u7528FP16\u8fdb\u884c\u5168\u5f52\u7ea6\uff08all-reduce\uff09\u800c\u4e0d\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c06OpenDiLoCo\u6269\u5c55\u5230\u539f\u59cb\u
5de5\u4f5c\u7684\u4e09\u500d\u89c4\u6a21\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u5341\u4ebf\u53c2\u6570\u6a21\u578b\u4e0a\u7684\u6709\u6548\u6027\u3002**|\n", "2407.07845": "|**2024-07-10**|**Natural Language Mechanisms via Self-Resolution with Foundation Models**|Nicolas Della Penna et.al.|[2407.07845](http://arxiv.org/abs/2407.07845)|null|\u5728\u5b9e\u9645\u64cd\u4f5c\u4e2d\uff0c\u4ee3\u7406\u4eba\u901a\u5e38\u53d7\u9650\u4e8e\u8bf8\u5982\u4ea4\u6613\u6216\u8ba2\u5355\u4e4b\u7c7b\u7684\u6709\u9650\u62a5\u544a\u683c\u5f0f\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u4ed6\u4eec\u8868\u8fbe\u4fe1\u606f\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u673a\u5236\uff0c\u5b83\u4fc3\u4f7f\u4ee3\u7406\u4eba\u4ee5\u81ea\u7136\u8bed\u8a00\u63d0\u4ea4\u62a5\u544a\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\u6765\u9009\u62e9\u7ed3\u679c\u548c\u5206\u914d\u62a5\u916c\u3002\u6211\u4eec\u786e\u5b9a\u4e86\u8fd9\u4e9b\u673a\u5236\u5728LLM\u4f5c\u4e3a\u826f\u597d\u7684\u4e16\u754c\u6a21\u578b\u4ee5\u53ca\u5f3a\u70c8\u7684\u8de8\u4ee3\u7406\u4fe1\u606f\u8fc7\u5ea6\u786e\u5b9a\u6761\u4ef6\u4e0b\u7684\u6fc0\u52b1\u517c\u5bb9\u6027\u548c\u6548\u7387\u7684\u5fc5\u8981\u6761\u4ef6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u4f20\u7edf\u9884\u6d4b\u5e02\u573a\u5728\u4fe1\u53f7\u7ed3\u6784\u4e0a\u5b58\u5728\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u673a\u5236\u80fd\u591f\u6210\u529f\u5730\u6574\u5408\u4fe1\u606f\u3002|\n", "2407.07810": "|**2024-07-10**|**Transformer Alignment in Large Language Models**|Murdock Aubry 
et.al.|[2407.07810](http://arxiv.org/abs/2407.07810)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u6df1\u5165\u7406\u89e3\u5176\u5185\u90e8\u673a\u5236\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u89c6LLMs\u4e3a\u9ad8\u7ef4\u7a7a\u95f4\u4e2d\u7684\u79bb\u6563\u3001\u8026\u5408\u7684\u975e\u7ebf\u6027\u52a8\u529b\u7cfb\u7edf\uff0c\u901a\u8fc7\u7814\u7a76tokens\u5728Transformer\u5757\u4e2d\u7684\u8f68\u8ff9\uff0c\u5e76\u6cbf\u7740\u8fd9\u4e9b\u8f68\u8ff9\u7ebf\u6027\u5316\u7cfb\u7edf\uff0c\u5229\u7528\u96c5\u53ef\u6bd4\u77e9\u9635\u8fdb\u884c\u5206\u6790\u3002\u5728\u5bf938\u4e2a\u516c\u5f00\u53ef\u7528\u7684LLMs\u8fdb\u884c\u7814\u7a76\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u6b8b\u5dee\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u4e0a\u5de6\u548c\u53f3\u5947\u5f02\u5411\u91cf\u4e4b\u95f4\u7684\u5bf9\u9f50\uff0c\u4ee5\u53ca\u7ebf\u6027\u6027\u548c\u5c42\u5185\u6307\u6570\u589e\u957f\u7684\u51fa\u73b0\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u5bf9\u9f50\u5ea6\u7684\u63d0\u9ad8\u4e0e\u6a21\u578b\u6027\u80fd\u5448\u6b63\u76f8\u5173\u3002\u8bad\u7ec3\u540e\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u76f8\u6bd4\u4e8e\u968f\u673a\u521d\u59cb\u5316\u6743\u91cd\u65f6\u7684\u6307\u6807\uff0c\u6709\u663e\u8457\u6539\u5584\uff0c\u8fd9\u5f3a\u8c03\u4e86\u8bad\u7ec3\u5728Transformer\u67b6\u6784\u4e2d\u7684\u91cd\u8981\u5f71\u54cd\u3002\u8fd9\u4e9b\u53d1\u73b0\u63ed\u793a\u4e86\u4e00\u79cd\u4ee5\u524d\u672a\u88ab\u5145\u5206\u8ba4\u8bc6\u7684\u89c4\u5f8b\u6027\uff0c\u5f3a\u5316\u4e86\u52a8\u529b\u5b66\u89e3\u91ca\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u7406\u89e3\u548c\u4f18\u5316LLM\u67b6\u6784\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.07799": "|**2024-07-10**|**Attribute or Abstain: Large Language Models as Long Document Assistants**|Jan Buchmann 
et.al.|[2407.07799](http://arxiv.org/abs/2407.07799)|**[link](https://github.com/ukplab/arxiv2024-attribute-or-abstain)**|**## \u80cc\u666f \u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u8f85\u52a9\u5904\u7406\u957f\u7bc7\u6587\u6863\uff0c\u4f46\u5b83\u4eec\u4e5f\u5b58\u5728\u80e1\u8a00\u4e71\u8bed\u7684\u95ee\u9898\u3002\u589e\u52a0\u53ef\u4fe1\u5ea6\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u63d0\u4f9b\u8bc1\u636e\u652f\u6301\u54cd\u5e94\uff0c\u63d0\u9ad8\u53ef\u9a8c\u8bc1\u6027\u3002\u5f53\u524d\u7684\u5f52\u56e0\u65b9\u6cd5\u4ec5\u5728\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u73af\u5883\u4e2d\u8bc4\u4f30\u8fc7\uff0c\u8fd9\u4e0e\u65e0\u9700\u68c0\u7d22\u7684\u957f\u6587\u6863\u573a\u666f\u4e0d\u540c\uff0c\u53ef\u80fd\u4ecd\u6709\u5e94\u7528\u4ef7\u503c\u3002\u56e0\u6b64\uff0c\u7f3a\u4e4f\u9488\u5bf9\u957f\u6587\u6863\u7684\u5f52\u56e0\u4e13\u95e8\u8bc4\u4f30\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faLAB\uff0c\u4e00\u4e2a\u5305\u542b6\u4e2a\u591a\u6837\u5316\u7684\u957f\u6587\u6863\u4efb\u52a1\u7684\u57fa\u51c6\uff0c\u5e76\u5728\u56db\u79cd\u4e0d\u540c\u5927\u5c0f\u7684LLM\uff08\u5373\u63d0\u793a\u548c\u5fae\u8c03\uff09\u4e0a\u8bd5\u9a8c\u4e86\u4e0d\u540c\u7684\u5f52\u56e0\u65b9\u6cd5\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u4e00\u6b65\u751f\u6210\u5f15\u7528\uff08citation\uff0c\u5373\u540c\u65f6\u8fdb\u884c\u54cd\u5e94\u751f\u6210\u548c\u8bc1\u636e\u63d0\u53d6\uff09\u7684\u8868\u73b0\u6700\u4f73\u3002\u6211\u4eec\u8fd8\u63a2\u7a76\u4e86\u201c\u8ff7\u5931\u5728\u4e2d\u95f4\u201d\u73b0\u8c61\u662f\u5426\u9002\u7528\u4e8e\u5f52\u56e0\uff0c\u4f46\u672a\u53d1\u73b0\u8fd9\u79cd\u60c5\u51b5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u8bc1\u636e\u8d28\u91cf\u5728\u7b80\u5355\u54cd\u5e94\u7684\u573a\u666f\u4e0b\u53ef\u4ee5\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\uff0c\u4f46\u5bf9\u4e8e\u590d\u6742\u54cd\u5e94\u5219\u4e0d\u7136\uff0c\u56e0\u4e3a\u6a21\u578b\u5728\u4e3a\u590d\u6742\u4e3b\u5f20\u63d0\u4f9b\u8bc1\u636e\u65f6\u9762\u4e34\u
6311\u6218\u3002\u6211\u4eec\u516c\u5f00\u4e86\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002**|\n", "2407.07796": "|**2024-07-11**|**Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard**|Oguzhan Topsakal et.al.|[2407.07796](http://arxiv.org/abs/2407.07796)|**[link](https://github.com/research-outcome/llm-game-benchmark)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u4e14\u53ef\u6269\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u7f51\u683c\u578b\u6e38\u620f\u5982\u4e95\u5b57\u68cb\u3001\u8fde\u63a5\u56db\u548c\u56f4\u68cb\u8fdb\u884c\u3002\u5f00\u6e90\u7684\u6e38\u620f\u6a21\u62df\u4ee3\u7801\u5728GitHub\u4e0a\u63d0\u4f9b\uff0c\u5141\u8bb8LLMs\u7ade\u6280\uff0c\u5e76\u751f\u6210JSON\u3001CSV\u3001TXT\u548cPNG\u683c\u5f0f\u7684\u8be6\u7ec6\u6570\u636e\u6587\u4ef6\uff0c\u7528\u4e8e\u6392\u884c\u699c\u6392\u540d\u548c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5305\u62ecAnthropic\u7684Claude 3.5 Sonnet\u548cClaude 3 Sonnet\uff0cGoogle\u7684Gemini 1.5 Pro\u548cGemini 1.5 Flash\uff0cOpenAI\u7684GPT-4 
Turbo\u548cGPT-4o\uff0c\u4ee5\u53caMeta\u7684Llama3-70B\u5728\u5185\u7684\u9886\u5148LLM\u4e4b\u95f4\u7684\u6bd4\u8d5b\u7ed3\u679c\u3002\u6211\u4eec\u9f13\u52b1\u5176\u4ed6LLM\u63d0\u4ea4\u7ed3\u679c\u3002\u603b\u5171\u8fdb\u884c\u4e862,310\u573a\u6a21\u62df\u6bd4\u8d5b\uff08\u6bcf\u5bf9\u6a21\u578b\u8fdb\u884c5\u8f6e\uff0c\u51717\u4e2a\u6a21\u578b\u95f4\u7684\u5bf9\u5c40\uff0c\u4ee5\u53ca\u4e0e\u968f\u673a\u73a9\u5bb6\u7684\u6bd4\u8d5b\uff09\uff0c\u6db5\u76d6\u4e09\u79cd\u7c7b\u578b\u7684\u6e38\u620f\uff0c\u4f7f\u7528\u4e86\u5217\u8868\u3001\u63d2\u56fe\u548c\u56fe\u50cf\u4e09\u79cd\u63d0\u793a\u65b9\u5f0f\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM\u5728\u4e0d\u540c\u6e38\u620f\u548c\u63d0\u793a\u7c7b\u578b\u4e0b\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff0c\u5206\u6790\u5185\u5bb9\u5305\u62ec\u80dc\u7387\u3001\u9519\u5931\u673a\u4f1a\u548c\u65e0\u6548\u52a8\u4f5c\u3002\u6392\u884c\u699c\u548c\u7ed3\u679c\u77e9\u9635\u7684\u8be6\u7ec6\u6570\u636e\u4f5c\u4e3a\u5f00\u653e\u8bbf\u95ee\u6570\u636e\u5728GitHub\u4e0a\u63d0\u4f9b\u3002\u8fd9\u9879\u7814\u7a76\u52a0\u6df1\u4e86\u6211\u4eec\u5bf9LLM\u5728\u672a\u4e13\u95e8\u8bad\u7ec3\u7684\u6e38\u620f\u4e2d\u7684\u80fd\u529b\u7684\u7406\u89e3\uff0c\u6709\u52a9\u4e8e\u8bc4\u4f30\u5b83\u4eec\u7684\u89c4\u5219\u7406\u89e3\u80fd\u529b\u548c\u6218\u7565\u601d\u7ef4\u3002\u5728\u901a\u5411\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\u7684\u9053\u8def\u4e0a\uff0c\u8fd9\u9879\u7814\u7a76\u4e3a\u672a\u6765\u63a2\u7d22\u5b83\u4eec\u5728\u590d\u6742\u51b3\u7b56\u573a\u666f\u4e2d\u7684\u5b9e\u7528\u6027\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u7684\u6218\u7565\u601d\u8003\u80fd\u529b\uff0c\u5e76\u4e3a\u6df1\u5165\u63a2\u7a76LLM\u5728\u57fa\u4e8e\u6e38\u620f\u6846\u67b6\u5185\u7684\u5c40\u9650\u6027\u63d0\u4f9b\u4e86\u65b9\u5411\u3002**|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u8fc5\u901f\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u534f\u4f5c\u95ee\u9898\u89e3\u51b3\u548c\u81ea\u4e3b\u8c08\u5224\u7b49\u9886\u57df\u7684\u51fa\u8272\u6027\u80fd\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u5b89\u5168\u95ee\u9898\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u77e5\u8bc6\u64cd\u7eb5\u4f20\u64ad\u65b9\u9762\u3002\u672c\u6587\u901a\u8fc7\u6784\u5efa\u8be6\u7ec6\u7684\u5a01\u80c1\u6a21\u578b\u548c\u6a21\u62df\u73af\u5883\uff0c\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u5728\u53ef\u4fe1\u5e73\u53f0\u4e0a\uff0c\u63a2\u8ba8\u8fd9\u4e00\u5173\u952e\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u653b\u51fb\u65b9\u6cd5\uff0c\u5305\u62ec\u8bf4\u670d\u6027\u6ce8\u5165\u548c\u64cd\u7eb5\u77e5\u8bc6\u6ce8\u5165\uff0c\u6765\u7cfb\u7edf\u5730\u63a2\u7a76\u5728\u65e0\u660e\u786e\u63d0\u793a\u64cd\u7eb5\u7684\u60c5\u51b5\u4e0b\uff0c\u5982\u4f55\u6f5c\u5728\u5730\u4f20\u64ad\u64cd\u7eb5\u77e5\u8bc6\uff08\u5982\u865a\u6784\u548c\u6709\u5bb3\u77e5\u8bc6\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u4e86LLMs\u5904\u7406\u4e16\u754c\u77e5\u8bc6\u56fa\u6709\u7684\u6f0f\u6d1e\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u501f\u6b64\u65e0\u610f\u8bc6\u5730\u4f20\u64ad\u7f16\u9020\u7684\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u8bf1\u5bfc\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u4ea4\u6d41\u4e2d\u4f20\u64ad\u8fd9\u4e24\u79cd\u64cd\u7eb5\u7684\u77e5\u8bc6\uff0c\u540c\u65f6\u4e0d\u4f1a\u663e\u8457\u964d\u4f4e\u5b83\u4eec\u7684\u57fa\u7840\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u53d1\u73b0\u8fd9\u4e9b\u64cd\u7eb5\u4f1a\u6301\u7eed\u5b58\u5728\u4e8e\u6d41\u884c\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u6846\u67b6\u4e2d\uff0c\u5373\u4f7f\u4ea4\u4e92\u7ed3\u675f\uff0c\u82e5\u5e72\u826f\u6027\u4ee3\u7406\u4e5f\u53ef\u80fd\u7ee7\u7eed\u53d7\u5230\u64cd\u7eb5\u804a\u5929\u8bb0\u5f55\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u5b89\u5168\u98ce\u9669\uff0c\u5f3a\u8c03\u4e86\u5bf9\u64cd\u7eb5\u77e5\u8bc6\u4f20\u64ad\u8fdb\u884c\u5f3a\u5927\u9632\u5fa1\u7684\u8feb\u5207\u9700\u6c42\uff0c\u6bd4\u5982\u5f15\u5165\u201c\u5b88\u62a4\u201d\u4ee3\u7406\u548c\u5148\u8fdb\u7684\u4e8b\u5b9e\u6838\u67e5\u5de5\u5177\u3002**|\n", "2407.07778": "|**2024-07-10**|**WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment**|Jiefu Ou et.al.|[2407.07778](http://arxiv.org/abs/2407.07778)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u7269\u7406\u73af\u5883\u4e2d\u90e8\u7f72\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4ee3\u7406\u65f6\u6240\u9700\u7684\u57fa\u672c\u64cd\u4f5c\uff08API\uff09\u6570\u91cf\u548c\u8bbe\u8ba1\u95ee\u9898\u3002\u7814\u7a76\u8005\u8bbe\u60f3\uff0c\u5982\u679cwikiHow\u6559\u7a0b\u6db5\u76d6\u4e86\u5e7f\u6cdb\u7684\u7528\u6237\u81ea\u7f16\u4efb\u52a1\uff0c\u90a3\u4e48\u8fd9\u4e9b\u4efb\u52a1\u6240\u9700\u7684API\u8303\u56f4\u662f\u4ec0\u4e48\u3002\u4ed6\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06wikiHow\u6307\u4ee4\u4e0e\u7f6e\u8eab\u4e8e\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u7b56\u7565\u5173\u8054\uff0c\u8fed\u4ee3\u5730\u751f\u6210\u65b0\u7684API\u3002\u501f\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f53\u611f\u89c4\u5212\u65b9\u9762\u7684\u6700\u65b0\u6210\u5c31\uff0c\u7814\u7a76\u8005\u63d0\u8bae\u4f7f\u7528\u5c11\u91cf\u6837\u4f8b\u63d0\u793aGPT-4\u751f\u6210Python\u4ee3\u7801\u4f5c\u4e3a\u4ee3\u7406\u7b56\u7565\uff0c\u5e76\u901a\u8fc7\u4ee5\u4e0b\u6b65\u9aa4\u6269\u5c55API\u5e93\uff1a1\uff09\u91cd\u
7528\u521d\u59cbAPI\u96c6\uff1b2\uff09\u5728\u5fc5\u8981\u65f6\u521b\u5efa\u65b0\u7684API\u8c03\u7528\u3002\u5b9e\u9a8c\u5173\u6ce8\u7684\u662f\u5b9a\u4e49API\uff0c\u800c\u975e\u5176\u5b9e\u73b0\u6027\u3002\u5728\u4e00\u5c0f\u90e8\u5206wikiHow\u6559\u7a0b\u4e0a\u5e94\u7528\u8be5\u65b9\u6cd5\u540e\uff0c\u53d1\u73b0\u9700\u8981300\u591a\u4e2aAPI\u6765\u6355\u6349\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u6837\u4efb\u52a1\u3002\u81ea\u52a8\u548c\u4eba\u5de5\u5206\u6790\u663e\u793a\uff0c\u63d0\u51fa\u7684\u7ba1\u9053\u80fd\u6709\u6548\u590d\u7528\u548c\u521b\u9020API\u3002\u8fdb\u4e00\u6b65\u7684\u4eba\u5de5\u5ba1\u67e5\u53d1\u73b0\uff0c\u73b0\u6709\u7684\u6a21\u62df\u5668\u4ec5\u652f\u6301\u8bf1\u5bfc\u51fa\u7684API\u7684\u4e00\u5c0f\u90e8\u5206\uff08\u524d50\u4e2a\u5e38\u7528API\u4e2d\u76849\u4e2a\uff09\uff0c\u8fd9\u4fc3\u4f7f\u5f00\u53d1\u66f4\u4e30\u5bcc\u7684\u4f53\u611f\u73af\u5883\u3002|\n", "2407.08739": "|**2024-07-11**|**MAVIS: Mathematical Visual Instruction Tuning**|Renrui Zhang et.al.|[2407.08739](http://arxiv.org/abs/2407.08739)|**[link](https://github.com/zrrskywalker/mavis)**|**### \u80cc\u666f 
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8fd1\u5e74\u6765\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u591a\u6a21\u6001\u573a\u666f\u4e2d\u7684\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5bf9\u6570\u5b66\u56fe\u89e3\u7684\u6570\u5b66\u95ee\u9898\u6c42\u89e3\u80fd\u529b\u7814\u7a76\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6307\u51fa\u4e86MLLM\u5728\u6570\u5b66\u89c6\u89c9\u9886\u57df\u7684\u4e09\u4e2a\u5173\u952e\u6539\u8fdb\u9886\u57df\uff1a\u6570\u5b66\u56fe\u89e3\u7684\u89c6\u89c9\u7f16\u7801\u3001\u56fe\u89e3\u4e0e\u8bed\u8a00\u7684\u5bf9\u9f50\u4ee5\u53ca\u6570\u5b66\u63a8\u7406\u6280\u80fd\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u9700\u8981\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u7684\u89c6\u89c9\u6570\u5b66\u6570\u636e\u548c\u8bad\u7ec3\u6d41\u7a0b\u3002\u672c\u6587\u63d0\u51faMAVIS\uff08Mathematical VISual instruction tuning for MLLMs\uff09\uff0c\u4e00\u4e2a\u9488\u5bf9MLLM\u7684\u6570\u5b66\u89c6\u89c9\u6307\u5bfc\u8c03\u53c2\u8303\u5f0f\uff0c\u5305\u62ec\u4e00\u7cfb\u5217\u6570\u5b66\u89c6\u89c9\u6570\u636e\u96c6\u548c\u4e13\u95e8\u7684MLLM\u3002 ### \u65b9\u6cd5 
MAVIS\u5206\u4e3a\u4e09\u4e2a\u9636\u6bb5\u8fdb\u884c\u4ece\u5934\u5f00\u59cb\u7684\u8bad\u7ec3\u3002\u9996\u5148\uff0c\u6211\u4eec\u521b\u5efa\u4e86MAVIS-Caption\uff0c\u5305\u542b558,000\u4e2a\u56fe\u89e3-\u63cf\u8ff0\u5bf9\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u5fae\u8c03\u4e13\u4e3a\u6570\u5b66\u8bbe\u8ba1\u7684\u89c6\u89c9\u7f16\u7801\u5668\uff08CLIP-Math\uff09\uff0c\u4ee5\u63d0\u5347\u56fe\u89e3\u7684\u89c6\u89c9\u7406\u89e3\u80fd\u529b\u3002\u5176\u6b21\uff0c\u5229\u7528MAVIS-Caption\uff0c\u6211\u4eec\u901a\u8fc7\u6295\u5f71\u5c42\u5c06CLIP-Math\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5173\u8054\uff0c\u589e\u5f3a\u6570\u5b66\u9886\u57df\u7684\u89c6\u89c9\u8bed\u8a00\u5bf9\u9f50\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f15\u5165MAVIS-Instruct\uff0c\u5305\u542b900,000\u4e2a\u7cbe\u5fc3\u6536\u96c6\u548c\u6807\u6ce8\u7684\u89c6\u89c9\u6570\u5b66\u95ee\u9898\uff0c\u7528\u4e8e\u6700\u7ec8\u6307\u5bfc\u8c03\u53c2\uff0c\u4ee5\u589e\u5f3aMLLM\u7684\u7a33\u5065\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u5728MAVIS-Instruct\u4e2d\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u6bcf\u4e2a\u95ee\u9898\u7684\u5b8c\u6574\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u7406\u7531\uff0c\u5e76\u51cf\u5c11\u6587\u672c\u5197\u4f59\uff0c\u4f7f\u6a21\u578b\u66f4\u4e13\u6ce8\u4e8e\u89c6\u89c9\u5143\u7d20\u3002 ### \u7ed3\u679c \u6570\u636e\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/ZrrSkywalker/MAVIS\u3002\u901a\u8fc7MAVIS\uff0c\u6211\u4eec\u65e8\u5728\u586b\u8865\u6570\u5b66\u89c6\u89c9\u7406\u89e3\u7684\u7a7a\u767d\uff0c\u63d0\u5347MLLM\u5728\u89e3\u51b3\u5b9e\u9645\u6570\u5b66\u95ee\u9898\u65f6\u7684\u8868\u73b0\u3002**|\n", "2407.08735": "|**2024-07-11**|**Real-Time Anomaly Detection and Reactive Planning with Large Language Models**|Rohan Sinha 
et.al.|[2407.08735](http://arxiv.org/abs/2407.08735)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u5728\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u68c0\u6d4b\u548c\u5e94\u5bf9\u5f02\u5e38\u60c5\u51b5\uff0c\u4ee5\u63d0\u9ad8\u5176\u9c81\u68d2\u6027\u548c\u5b89\u5168\u6027\u3002\u4e3b\u8981\u6311\u6218\u5305\u62ec\u51cf\u5c11\u6a21\u578b\u7684\u8ba1\u7b97\u5f00\u9500\u4ee5\u4fbf\u5b9e\u73b0\u5b9e\u65f6\u5e94\u7528\uff0c\u4ee5\u53ca\u5c06\u6a21\u578b\u7684\u5224\u65ad\u878d\u5165\u5230\u5b89\u5168\u63a7\u5236\u6846\u67b6\u4e2d\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u63a8\u7406\u6846\u67b6\uff1a\u9996\u5148\u662f\u4e00\u4e2a\u5feb\u901f\u7684\u4e8c\u5143\u5f02\u5e38\u5206\u7c7b\u5668\uff0c\u5b83\u5728\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7a7a\u95f4\u4e2d\u5206\u6790\u89c2\u6d4b\u6570\u636e\uff0c\u5982\u679c\u53d1\u73b0\u5f02\u5e38\uff0c\u4f1a\u89e6\u53d1\u540e\u7eed\u7684\u6162\u901f\u63a8\u7406\u9636\u6bb5\uff0c\u5229\u7528\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6df1\u5165\u7684\u903b\u8f91\u63a8\u7406\u3002\u8fd9\u79cd\u8bbe\u8ba1\u7c7b\u4f3c\u4e8e\u6a21\u578b\u9884\u6d4b\u63a7\u5236\u4e2d\u7684\u51b3\u7b56\u5206\u652f\uff0c\u8003\u8651\u5230\u6162\u901f\u63a8\u7406\u5668\u7684\u5ef6\u8fdf\uff0c\u53ef\u4ee5\u7acb\u5373\u91c7\u53d6\u5907\u4efd\u8ba1\u5212\uff0c\u786e\u4fdd\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002 
\u901a\u8fc7\u4e0e\u6700\u5148\u8fdb\u7684GPT\u6a21\u578b\u7684\u81ea\u56de\u5f52\u63a8\u7406\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u7814\u7a76\u53d1\u73b0\uff0c\u5373\u4f7f\u4f7f\u7528\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ed6\u4eec\u7684\u5feb\u901f\u5f02\u5e38\u5206\u7c7b\u5668\u4e5f\u8868\u73b0\u51fa\u8272\u3002\u8fd9\u4f7f\u5f97\u4ed6\u4eec\u5f00\u53d1\u7684\u8fd0\u884c\u65f6\u76d1\u63a7\u5668\u80fd\u591f\u5728\u8d44\u6e90\u548c\u65f6\u95f4\u9650\u5236\u4e0b\uff0c\u63d0\u5347\u52a8\u6001\u673a\u5668\u4eba\u7cfb\u7edf\uff0c\u5982\u56db\u65cb\u7ffc\u65e0\u4eba\u673a\u6216\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u4fe1\u4efb\u5ea6\u3002\u8bba\u6587\u7684\u89c6\u9891\u793a\u4f8b\u53ef\u4ee5\u5728\u9879\u76ee\u9875\u9762\u4e0a\u67e5\u770b\uff1ahttps://sites.google.com/view/aesop-llm\u3002|\n", "2407.08733": "|**2024-07-11**|**Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist**|Zihao Zhou et.al.|[2407.08733](http://arxiv.org/abs/2407.08733)|null|
\u5f3a\u5927\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5353\u8d8a\u6027\u80fd\u7684\u5173\u952e\u4f53\u73b0\u3002\u5982\u4f55\u5b9a\u4e49\u548c\u5168\u9762\u8bc4\u4f30LLMs\u7684\u6570\u5b66\u80fd\u529b\uff0c\u4ee5\u53ca\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u53cd\u6620\u7528\u6237\u4f53\u9a8c\uff0c\u5df2\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u76ee\u524d\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e3b\u8981\u4fa7\u91cd\u4e8e\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u8fc7\u62df\u5408\uff0c\u5e76\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u771f\u6b63\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u5982\u679c\u6a21\u578b\u771f\u6b63\u7406\u89e3\u4e86\u95ee\u9898\uff0c\u5b83\u5e94\u8be5\u80fd\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u7a33\u5065\u4e14\u7075\u6d3b\u5730\u5e94\u7528\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u63d0\u51faMATHCHECK\uff0c\u4e00\u4e2a\u65e8\u5728\u6d4b\u8bd5\u4efb\u52a1\u6cdb\u5316\u548c\u63a8\u7406\u9c81\u68d2\u6027\u7684\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6e05\u5355\uff0c\u4ee5\u53ca\u4e00\u4e2a\u81ea\u52a8\u751f\u6210\u6e05\u5355\u7684\u5de5\u5177\u3002MATHCHECK\u5305\u542b\u591a\u4e2a\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u548c\u6d4b\u8bd5\u7c7b\u578b\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u6570\u5b66\u63a8\u7406\u80fd\u529b\u548c\u884c\u4e3a\u6d4b\u8bd5\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6211\u4eec\u5229\u7528MATHCHECK\u521b\u5efa\u4e86MATHCHECK-GSM\u548cMATHCHECK-GEO\uff0c\u5206\u522b\u9488\u5bf9\u6570\u5b66\u6587\u672c\u63a8\u7406\u548c\u591a\u6a21\u6001\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u8bc4\u4f30\uff0c\u5b83\u4eec\u662fGSM8k\u3001GeoQA\u3001UniGeo\u548cGeometry3K\u7b49\u57fa\u51c6\u7684\u5347\u7ea7\u7248\u3002\u6211\u4eec\u4f7f\u7528MATHCHECK-GSM\u548cMATHCHECK-GEO\u5bf9\u8d85\u8fc720\u79cdLLM\u548c11\u79cd\u591a\u6a21\u6001LLMs\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u68c0\u9a8c\u5b83\u4eec\u7684\u7efc\u5408\u6570\u5b66\u63
a8\u7406\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u6a21\u578b\u5982GPT-4\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u4ed6\u6a21\u578b\u5bb6\u65cf\u5728\u6e05\u5355\u4e0a\u7684\u8868\u73b0\u663e\u8457\u4e0b\u964d\u3002\u8fdb\u4e00\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u4f20\u7edf\u6570\u5b66\u57fa\u51c6\u76f8\u6bd4\uff0cMATHCHECK\u66f4\u597d\u5730\u53cd\u6620\u4e86\u771f\u6b63\u7684\u6570\u5b66\u80fd\u529b\uff0c\u7ebf\u6027\u5ea6\u66f4\u9ad8\uff0c\u4ece\u800c\u652f\u6301\u6211\u4eec\u7684\u8bbe\u8ba1\u3002\u901a\u8fc7MATHCHECK\uff0c\u6211\u4eec\u53ef\u4ee5\u8f7b\u677e\u8fdb\u884c\u8be6\u7ec6\u7684\u884c\u4e3a\u5206\u6790\uff0c\u6df1\u5165\u63a2\u7a76\u6a21\u578b\u3002|\n", "2407.08716": "|**2024-07-11**|**A Taxonomy for Data Contamination in Large Language Models**|Medha Palavalli et.al.|[2407.08716](http://arxiv.org/abs/2407.08716)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u57fa\u4e8e\u5e7f\u6cdb\u7f51\u7edc\u8bed\u6599\u5e93\u7684\u9884\u8bad\u7ec3\u540e\uff0c\u5728\u4f17\u591a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u6570\u636e\u6c61\u67d3\u95ee\u9898\u65e5\u76ca\u5f15\u8d77\u5173\u6ce8\uff0c\u5373\u8bc4\u4f30\u6570\u636e\u53ef\u80fd\u5b58\u5728\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\uff0c\u5bfc\u81f4\u6a21\u578b\u8868\u73b0\u865a\u9ad8\u3002\u53bb\u6c61\u67d3\uff08decontamination\uff09\u4f5c\u4e3a\u4e00\u79cd\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u8bd5\u56fe\u68c0\u6d4b\u5e76\u79fb\u9664\u8fd9\u4e9b\u6c61\u67d3\u6570\u636e\u3002\u7136\u800c\uff0c\u6c61\u67d3\u6570\u636e\u53ef\u80fd\u6e90\u4e8e\u6d4b\u8bd5\u96c6\u7684\u4fee\u6539\u7248\u672c\uff0c\u8fd9\u4f7f\u5f97\u68c0\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u76ee\u524d\u5c1a\u4e0d\u6e05\u695a\u4e0d\u540c\u7c7b\u578b\u7684\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u8bed\u8a00\u6a21\u578b\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5206\u7c7b\u4f53\u7cfb\uff
0c\u5bf9\u8bed\u8a00\u6a21\u578b\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u9047\u5230\u7684\u5404\u79cd\u6c61\u67d3\u7c7b\u578b\u8fdb\u884c\u5212\u5206\uff0c\u5e76\u786e\u5b9a\u4e86\u54ea\u4e9b\u7c7b\u578b\u7684\u98ce\u9669\u6700\u9ad8\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u603b\u7ed3\u548c\u95ee\u7b54\u4e24\u4e2a\u5173\u952e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u7c7b\u578b\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5728\u5b9e\u9645\u8bc4\u4f30\u4e2d\u7684\u8868\u73b0\u3002|\n", "2407.08713": "|**2024-07-11**|**GTA: A Benchmark for General Tool Agents**|Jize Wang et.al.|[2407.08713](http://arxiv.org/abs/2407.08713)|**[link](https://github.com/open-compass/GTA)**|**\u4eba\u4eec\u666e\u904d\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5404\u79cd\u5de5\u5177\u7684\u6574\u5408\uff0c\u4ee5\u5f00\u53d1\u901a\u7528\u4ee3\u7406\uff0c\u4f46\u8fd9\u5bf9LLMs\u7684\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u63d0\u51fa\u4e86\u6311\u6218\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u6cd5\u5b58\u5728\u660e\u663e\u7f3a\u9677\uff0c\u5982\u4f7f\u7528AI\u751f\u6210\u7684\u67e5\u8be2\u3001\u5355\u6b65\u9aa4\u4efb\u52a1\u3001\u6a21\u62df\u5de5\u5177\u4ee5\u53ca\u4ec5\u9650\u6587\u672c\u7684\u4ea4\u4e92\uff0c\u672a\u80fd\u5145\u5206\u5c55\u793a\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u4e2d\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faGTA\uff08\u901a\u7528\u5de5\u5177\u4ee3\u7406\u57fa\u51c6\uff09\uff0c\u5b83\u5305\u542b\u4e09\u4e2a\u5173\u952e\u7279\u6027\uff1a\uff081\uff09\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff1a\u7531\u4eba\u7c7b\u7f16\u5199\uff0c\u5177\u6709\u7b80\u5355\u7684\u73b0\u5b9e\u4e16\u754c\u76ee\u6807\uff0c\u4f46\u9690\u542b\u4e86\u5de5\u5177\u4f7f\u7528\u9700\u6c42\uff0c\u8981\u6c42LLMs\u80fd\u63a8\u7406\u51fa\u5408\u9002\u7684\u5de5\u5177\u5e76\u89c4\u5212\u89e3\u51b3\u65b9\u6848\u6b65\u9aa4\u3002\uff082\uff09\u771f\u5b9e\u90e8\u7f72\u7684\u5de5\u5177\u
ff1a\u4e00\u4e2a\u914d\u5907\u6709\u611f\u77e5\u3001\u64cd\u4f5c\u3001\u903b\u8f91\u548c\u521b\u65b0\u7c7b\u5de5\u5177\u7684\u8bc4\u4f30\u5e73\u53f0\uff0c\u7528\u4e8e\u8bc4\u4f30\u6a21\u578b\u7684\u5b9e\u9645\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002\uff083\uff09\u771f\u5b9e\u7684\u591a\u6a21\u6001\u8f93\u5165\uff1a\u5305\u62ec\u7a7a\u95f4\u573a\u666f\u56fe\u7247\u3001\u7f51\u9875\u622a\u56fe\u3001\u8868\u683c\u3001\u4ee3\u7801\u7247\u6bb5\u548c\u6253\u5370/\u624b\u5199\u6750\u6599\u7b49\uff0c\u4ee5\u8d34\u8fd1\u771f\u5b9e\u4e16\u754c\u7684\u573a\u666f\u3002 \u6211\u4eec\u8bbe\u8ba1\u4e86229\u4e2a\u73b0\u5b9e\u751f\u6d3b\u4efb\u52a1\u548c\u53ef\u6267\u884c\u7684\u5de5\u5177\u94fe\uff0c\u6765\u8bc4\u4f30\u4e3b\u6d41LLMs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff0c\u73b0\u6709\u7684LLMs\u9762\u4e34\u4e25\u5cfb\u6311\u6218\uff0cGPT-4\u5b8c\u6210\u7684\u4efb\u52a1\u4e0d\u8db3\u4e00\u534a\uff0c\u5927\u591a\u6570\u6a21\u578b\u7684\u6210\u7ee9\u4f4e\u4e8e25%\u3002\u8fd9\u4e2a\u8bc4\u4f30\u63ed\u793a\u4e86\u5f53\u524dLLMs\u5728\u5b9e\u9645\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u4e0a\u7684\u74f6\u9888\uff0c\u4e3a\u63d0\u5347\u901a\u7528\u5de5\u5177\u4ee3\u7406\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b9\u5411\u3002GTA\u7684\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2407.08701": "|**2024-07-11**|**Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models**|Zhening Xing 
et.al.|[2407.08701](http://arxiv.org/abs/2407.08701)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56e0\u5176\u5355\u5411\u65f6\u95f4\u6ce8\u610f\u529b\u673a\u5236\uff0c\u5728\u6587\u672c\u548c\u97f3\u9891\u6d41\u6570\u636e\u751f\u6210\u65b9\u9762\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u6548\u679c\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u5bf9\u5b9e\u65f6\u89c6\u9891\u5904\u7406\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4f46\u89c6\u9891\u6d41\u5904\u7406\u7684\u7814\u7a76\u5374\u76f8\u5bf9\u8f83\u5c11\u3002\u73b0\u6709\u7684\u89c6\u9891\u6269\u6563\u6a21\u578b\u4f9d\u8d56\u53cc\u5411\u65f6\u95f4\u6ce8\u610f\u529b\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5904\u7406\u76f4\u64ad\u89c6\u9891\u7684\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faLive2Diff\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u5b9e\u65f6\u89c6\u9891\u7ffb\u8bd1\u8bbe\u8ba1\u7684\u5177\u6709\u5355\u5411\u65f6\u95f4\u6ce8\u610f\u529b\u7684\u89c6\u9891\u6269\u6563\u6a21\u578b\u3002\u4e0e\u5148\u524d\u5de5\u4f5c\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4e0e\u524d\u4e00\u5e27\u53ca\u5176\u5c11\u6570\u9884\u70ed\u5e27\u76f8\u5173\u8054\uff0c\u4fdd\u6301\u4e86\u65f6\u95f4\u4e00\u81f4\u6027\u548c\u5e73\u6ed1\u6027\uff0c\u65e0\u9700\u8003\u8651\u672a\u6765\u5e27\u3002\u540c\u65f6\uff0c\u6211\u4eec\u91c7\u7528\u9ad8\u6548\u7684\u964d\u566a\u65b9\u6848\uff0c\u5305\u62ecKV\u7f13\u5b58\u673a\u5236\u548c\u6d41\u6c34\u7ebf\u5904\u7406\uff0c\u4ee5\u652f\u6301\u4e92\u52a8\u5e27\u7387\u4e0b\u7684\u89c6\u9891\u6d41\u7ffb\u8bd1\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6ce8\u610f\u529b\u673a\u5236\u548c\u6d41\u6c34\u7ebf\u8bbe\u8ba1\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5728\u4fdd\u6301\u65f6\u95f4\u5e73\u6ed1\u6027\u548c/\u6216\u6548\u7387\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.08699": "|**2024-07-11**|**Mitigating Catastrophic Forgetting in Language Transfer via Model Merging**|Anton Alexandrov 
et.al.|[2407.08699](http://arxiv.org/abs/2407.08699)|null|\u968f\u7740\u5f00\u653e\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u82f1\u8bed\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u4e0d\u65ad\u63d0\u5347\uff0c\u7814\u7a76\u4eba\u5458\u6b63\u81f4\u529b\u4e8e\u5c06\u5176\u6269\u5c55\u5230\u5176\u4ed6\u8bed\u8a00\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u8bed\u8a00\u9002\u5e94\u5f80\u5f80\u4f1a\u5bfc\u81f4\u57fa\u7840\u6a21\u578b\u80fd\u529b\u7684\u707e\u96be\u6027\u9057\u5fd8\uff0c\u9650\u5236\u4e86\u6539\u7f16\u540e\u6a21\u578b\u7684\u5b9e\u7528\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u9002\u5e94\u65b9\u6cd5\u2014\u2014Branch-and-Merge\uff08BaM\uff09\uff0c\u5b83\u57fa\u4e8e\u8fed\u4ee3\u5730\u5408\u5e76\u591a\u4e2a\u9488\u5bf9\u90e8\u5206\u8bad\u7ec3\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u3002BaM\u7684\u6838\u5fc3\u7406\u5ff5\u5728\u4e8e\uff0c\u8fd9\u79cd\u65b9\u6cd5\u4ea7\u751f\u7684\u662f\u5e45\u5ea6\u8f83\u5c0f\u4f46\u8d28\u91cf\u66f4\u9ad8\u7684\u6743\u91cd\u8c03\u6574\uff0c\u4ece\u800c\u51cf\u5c11\u5bf9\u6e90\u9886\u57df\u7684\u9057\u5fd8\uff0c\u540c\u65f6\u4fdd\u6301\u5bf9\u76ee\u6807\u9886\u57df\u7684\u5b66\u4e60\u3002 \u6211\u4eec\u5728\u4fdd\u52a0\u5229\u4e9a\u8bed\u548c\u5fb7\u8bed\u7684\u5e7f\u6cdb\u5b9e\u8bc1\u7814\u7a76\u4e2d\u5c55\u793a\u4e86BaM\u7684\u4f18\u52bf\uff1a\u5b83\u80fd\u663e\u8457\u964d\u4f4e\u9057\u5fd8\uff0c\u540c\u65f6\u5728\u4e0d\u540c\u6a21\u578b\u67b6\u6784\u4e0a\u4e0e\u6807\u51c6\u6301\u7eed\u9884\u8bad\u7ec3\u548c\u6307\u4ee4\u5fae\u8c03\u76f8\u6bd4\uff0c\u80fd\u591f\u5339\u914d\u751a\u81f3\u63d0\u5347\u76ee\u6807\u9886\u57df\u7684\u6027\u80fd\u3002|\n", "2407.08694": "|**2024-07-11**|**Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight**|Zhiqiang Xie 
et.al.|[2407.08694](http://arxiv.org/abs/2407.08694)|null|\u5728\u73b0\u4ee3\u4e91\u7cfb\u7edf\u4e2d\uff0c\u8fd0\u884c\u65f6\u6545\u969c\u548c\u6027\u80fd\u4e0b\u964d\u662f\u5e38\u6001\u3002\u5bf9\u4e8e\u4e91\u670d\u52a1\u63d0\u4f9b\u5546\u800c\u8a00\uff0c\u81ea\u52a8\u786e\u5b9a\u95ee\u9898\u7684\u6839\u672c\u539f\u56e0\u662f\u4fdd\u8bc1\u9ad8\u53ef\u9760\u6027\u548c\u53ef\u7528\u6027\u7684\u5173\u952e\uff0c\u56e0\u4e3a\u5feb\u901f\u7684\u6545\u969c\u5b9a\u4f4d\u6709\u52a9\u4e8e\u52a0\u5feb\u8bca\u65ad\u548c\u4f18\u5148\u7ea7\u6392\u5e8f\uff0c\u4ee5\u5b9e\u73b0\u53ca\u65f6\u89e3\u51b3\u3002\u8fd1\u671f\u7684\u7814\u7a76\u4e2d\uff0c\u56e0\u679c\u63a8\u7406\u5229\u7528\u56e0\u679c\u56fe\u6765\u6355\u6349\u4e0d\u540c\u4e91\u7cfb\u7edf\u6027\u80fd\u6307\u6807\u4e4b\u95f4\u7684\u5173\u7cfb\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7136\u800c\uff0c\u7cfb\u7edf\u5f00\u53d1\u8005\u9700\u8981\u7cbe\u786e\u5b9a\u4e49\u7cfb\u7edf\u7684\u56e0\u679c\u56fe\uff0c\u8fd9\u662f\u4e00\u9879\u8017\u65f6\u3001\u8106\u5f31\u4e14\u6311\u6218\u6027\u7684\u5de5\u4f5c\uff0c\u5c24\u5176\u5bf9\u4e8e\u5e9e\u5927\u548c\u52a8\u6001\u7684\u7cfb\u7edf\uff0c\u4e14\u9700\u8981\u6df1\u539a\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\u5728\u4e91\u7cfb\u7edf\u4e2d\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u6545\u969c\u4e8b\u4ef6\u7684\u53d1\u751f\u9891\u7387\u76f8\u5bf9\u8f83\u4f4e\u3002 
\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u89e3\u51b3\u65b9\u6848\u2014\u2014Atlas\uff0c\u5b83\u80fd\u591f\u81ea\u52a8\u5408\u6210\u4e91\u7cfb\u7edf\u7684\u56e0\u679c\u56fe\u3002Atlas\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u7cfb\u7edf\u6587\u6863\u3001\u65e5\u5fd7\u548c\u90e8\u7f72\u53cd\u9988\u751f\u6210\u56e0\u679c\u56fe\u3002Atlas\u4e0e\u6570\u636e\u9a71\u52a8\u7684\u56e0\u679c\u53d1\u73b0\u6280\u672f\u76f8\u8f85\u76f8\u6210\uff0c\u5e76\u901a\u8fc7\u6570\u636e\u9a71\u52a8\u7684\u9a8c\u8bc1\u6b65\u9aa4\u8fdb\u884c\u589e\u5f3a\u3002\u6211\u4eec\u5728\u4e00\u7cfb\u5217\u6545\u969c\u5b9a\u4f4d\u573a\u666f\u4e2d\u8bc4\u4f30\u4e86Atlas\uff0c\u7ed3\u679c\u8868\u660e\uff0cAtlas\u80fd\u591f\u5728\u53ef\u6269\u5c55\u548c\u666e\u9002\u7684\u65b9\u5f0f\u4e0b\u751f\u6210\u56e0\u679c\u56fe\uff0c\u5176\u6027\u80fd\u8fdc\u8d85\u6570\u636e\u9a71\u52a8\u7b97\u6cd5\uff0c\u5e76\u4e0e\u57fa\u51c6\u7ebf\u76f8\u5f53\u3002|\n", "2407.08683": "|**2024-07-11**|**SEED-Story: Multimodal Long Story Generation with Large Language Model**|Shuai Yang 
et.al.|[2407.08683](http://arxiv.org/abs/2407.08683)|**[link](https://github.com/tencentarc/seed-story)**|**\u968f\u7740\u56fe\u50cf\u751f\u6210\u548c\u5f00\u653e\u5f62\u5f0f\u6587\u672c\u751f\u6210\u7684\u663e\u8457\u8fdb\u6b65\uff0c\u4ea4\u9519\u7684\u56fe\u50cf-\u6587\u672c\u5185\u5bb9\u521b\u4f5c\u9886\u57df\u53d8\u5f97\u8d8a\u6765\u8d8a\u6709\u5438\u5f15\u529b\u3002\u591a\u6a21\u6001\u6545\u4e8b\u751f\u6210\uff0c\u5373\u751f\u6210\u53d9\u4e8b\u6587\u672c\u4e0e\u751f\u52a8\u56fe\u50cf\u7684\u4ea4\u9519\u5e8f\u5217\uff0c\u4f5c\u4e3a\u4e00\u79cd\u6709\u4ef7\u503c\u7684\u5b9e\u7528\u4efb\u52a1\uff0c\u56e0\u5176\u5e7f\u6cdb\u7684\u5e94\u7528\u524d\u666f\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e00\u4efb\u52a1\u9762\u4e34\u7740\u7406\u89e3\u6587\u672c\u548c\u56fe\u50cf\u590d\u6742\u4ea4\u4e92\u3001\u751f\u6210\u8fde\u8d2f\u4e14\u76f8\u5173\u6587\u672c\u548c\u89c6\u89c9\u5185\u5bb9\u7684\u6311\u6218\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51faSEED-Story\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u5229\u7528\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u6765\u751f\u6210\u6269\u5c55\u7684\u591a\u6a21\u6001\u6545\u4e8b\u3002\u6211\u4eec\u7684\u6a21\u578b\u57fa\u4e8eMLLM\u7684\u5f3a\u5927\u7406\u89e3\u80fd\u529b\uff0c\u65e2\u80fd\u9884\u6d4b\u6587\u672c\u4ee4\u724c\uff0c\u4e5f\u80fd\u9884\u6d4b\u89c6\u89c9\u4ee4\u724c\uff0c\u7136\u540e\u901a\u8fc7\u9002\u5e94\u7684\u89c6\u89c9\u89e3\u4ee4\u724c\u5316\u5668\u5904\u7406\uff0c\u751f\u6210\u5177\u6709\u4e00\u81f4\u89d2\u8272\u548c\u98ce\u683c\u7684\u56fe\u50cf\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u591a\u6a21\u6001\u6ce8\u610f\u529b\u6c89\u964d\u673a\u5236\uff0c\u4f7f\u5f97\u5728\u9ad8\u5ea6\u81ea\u52a8\u9012\u5f52\u7684\u65b9\u5f0f\u4e0b\uff0c\u80fd\u591f\u751f\u6210\u957f\u8fbe25\u4e2a\u5e8f\u5217\uff08\u4ec5\u752810\u4e2a\u8fdb\u884c\u8bad\u7ec3\uff09\u7684\u6545\u4e8b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6
3d0\u4f9b\u4e86\u5927\u89c4\u6a21\u9ad8\u5206\u8fa8\u7387\u7684StoryStream\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bad\u7ec3\u6211\u4eec\u7684\u6a21\u578b\uff0c\u5e76\u91cf\u5316\u8bc4\u4f30\u591a\u6a21\u6001\u6545\u4e8b\u751f\u6210\u4efb\u52a1\u5728\u591a\u4e2a\u65b9\u9762\u7684\u6027\u80fd\u3002**|\n", "2407.08662": "|**2024-07-11**|**Uncertainty Estimation of Large Language Models in Medical Question Answering**|Jiaxin Wu et.al.|[2407.08662](http://arxiv.org/abs/2407.08662)|null|## \u4efb\u52a1 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u533b\u7597\u9886\u57df\u7684\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5b58\u5728\u4ea7\u751f\u9519\u8bef\u4e8b\u5b9e\u7684\u98ce\u9669\u3002\u4e3a\u4e86\u5728\u533b\u7597\u95ee\u9898\u89e3\u7b54\u4e2d\u90e8\u7f72\u8fd9\u4e9b\u6a21\u578b\uff0c\u9700\u8981\u53ef\u9760\u7684\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\uff08UE\uff09\u65b9\u6cd5\u6765\u8bc6\u522b\u5e7b\u89c9\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5728\u533b\u5b66\u95ee\u7b54\u6570\u636e\u96c6\u4e0a\u5bf9\u6d41\u884cUE\u65b9\u6cd5\u53ca\u5176\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u65b9\u6cd5\u5728\u8be5\u9886\u57df\u901a\u5e38\u8868\u73b0\u4e0d\u4f73\uff0c\u51f8\u663e\u4e86\u533b\u7597\u5e94\u7528\u4e2d\u7684UE\u6311\u6218\u3002\u6211\u4eec\u8fd8\u89c2\u5bdf\u5230\uff0c\u66f4\u5927\u7684\u6a21\u578b\u5f80\u5f80\u80fd\u83b7\u5f97\u66f4\u597d\u7684\u7ed3\u679c\uff0c\u8fd9\u8868\u660e\u6a21\u578b\u89c4\u6a21\u4e0eUE\u53ef\u9760\u6027\u53ef\u80fd\u5b58\u5728\u5173\u8054\u3002 
To address these challenges, we propose a probability-free uncertainty estimation method called \u201cTwo-phase Verification\u201d. First, the LLM generates a step-by-step explanation together with an initial answer, and then formulates verification questions to check the factual claims in the explanation. The model answers these questions twice: once independently and once with reference to the explanation. The degree of inconsistency between the two sets of answers measures the uncertainty of the original response. We evaluate our method on three biomedical question-answering datasets using Llama 2 Chat models and compare it against baseline methods. Experimental results show that Two-phase Verification achieves the best overall accuracy and stability across datasets and model sizes, and that its performance improves as model size increases.|\n", "2407.09467": "|**2024-07-12**|**FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3**|Georgios Makridis 
et.al.|[2407.09467](http://arxiv.org/abs/2407.09467)|null|In a world of diverse AI-driven storytelling, there is a unique opportunity to engage young audiences through customized, personalized narratives. This paper introduces FairyLandAI, an innovative large language model (LLM) developed specifically for children and built on OpenAI's API. What sets FairyLandAI apart is that it not only generates engaging, age-appropriate stories that reflect a variety of traditions, but also automatically produces creative prompts suited to advanced image generation tools such as GenAI and Dalle-3, enriching the storytelling experience. FairyLandAI is carefully tuned to children's imaginative world, delivering stories that both educate and entertain and that align with the values appropriate to different ages. Its distinctive strength is tailoring stories to each child's preferences and cultural background, marking a new era of personalized storytelling. In addition, its integration with image generation technology offers a comprehensive narrative experience that stimulates both verbal and visual creativity. Empirical evaluations show that FairyLandAI is effective at creating stories that captivate children, stories that not only entertain but also embody the moral teachings of diverse traditions. The model is a valuable tool for parents and educators, helping them convey meaningful life lessons through engaging stories. FairyLandAI represents a pioneering step in using LLMs, particularly the OpenAI API, for educational and cultural enrichment, making complex moral stories with educational value accessible and enjoyable for young, imaginative minds.|\n", 
"2407.09450": "|**2024-07-12**|**Human-like Episodic Memory for Infinite Context LLMs**|Zafeirios Fountas et.al.|[2407.09450](http://arxiv.org/abs/2407.09450)|**[link](https://github.com/em-llm/EM-LLM-model)**|Large language models (LLMs) have shown remarkable capabilities, but they still struggle to maintain coherence and accuracy when processing long sequences. The human brain excels at organizing and retrieving personal experiences across long time scales, spanning a lifetime of memories. This paper proposes a novel approach, EM-LLM, that integrates key aspects of human episodic memory and event cognition into LLMs, enabling them to handle practically infinite context lengths effectively while maintaining computational efficiency. EM-LLM organizes token sequences into coherent events in an online fashion by combining Bayesian surprise with graph-theoretic boundary refinement. When needed, these events are retrieved through a two-stage memory process that combines similarity-based and temporally contiguous retrieval, giving efficient, human-like access to information. Experiments on the LongBench dataset show that EM-LLM outperforms the state-of-the-art InfLLM model, with an overall relative improvement of 4.3% across various tasks, including a 33% improvement on the PassageRetrieval task. Furthermore, our analysis reveals a strong correlation between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only improves the ability of LLMs to process long sequences but also provides a computational framework for exploring human memory mechanisms, opening new avenues for research at the intersection of AI and cognitive science.|\n", 
"2407.09447": "|**2024-07-12**|**ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts**|Amelia F. 
Hardy et.al.|[2407.09447](http://arxiv.org/abs/2407.09447)|**[link](https://github.com/sisl/astprompter)**|Typical automated red-teaming strategies for large language models (LLMs) focus on finding prompts that trigger a frozen language model (the defender) to generate toxic text. This can lead the adversarial model (the attacker) to produce outputs that are unintelligible and unnatural. Here, we propose a reinforcement learning formulation of the LLM red-teaming task whose goal is to find prompts that both (1) trigger the defender to generate toxic text and (2) have low perplexity as scored by the defender. We argue that these cases are the most relevant in a red-teaming setting because they are likely to arise during normal use of the defender model. We solve this formulation with a novel online, weakly supervised variant of Identity Preference Optimization (IPO), applied with GPT-2 and GPT-2 XL as defenders. Experiments show that our policy can generate prompts that are both likely and toxicity-inducing. Finally, we analyze the learned strategies and the trade-off between likelihood and toxicity, and discuss the implications. The project's source code is available at https://github.com/sisl/ASTPrompter/.|\n", 
"2407.09435": "|**2024-07-12**|**MUSCLE: A Model Update Strategy for Compatible LLM Evolution**|Jessica Echterhoff et.al.|[2407.09435](http://arxiv.org/abs/2407.09435)|null|Large language models (LLMs) are frequently updated, through changes to data or architecture, to improve performance. When upgrading, developers usually focus on raising overall performance metrics and pay less attention to compatibility with earlier versions. However, users often form a mental model of the functionality and capabilities of the machine learning model they use, and must adjust that mental model with every update. Frequent model changes can reduce user satisfaction. In practice, fine-tuned downstream task adapters depend on a pretrained LLM base model. When the base model is updated, these user-facing downstream task models can suffer instance regression, or negative flips: instances that were previously predicted correctly are now predicted incorrectly. This happens even when the downstream training procedure stays the same. Our work aims to give users a seamless model update experience in two ways. First, we propose a set of evaluation metrics for measuring a model's compatibility with earlier versions, designed for generative tasks but also applicable to classification tasks. We observe regression and inconsistency between different model versions and updates, especially across a diverse set of tasks. Our research aims to provide user-friendly model updates through two contributions: a compatibility evaluation criterion for detecting differences between model versions on generative and other tasks, and a training strategy that reduces inconsistency across model updates by training a compatibility model, lowering the negative flip rate for updates such as Llama 1 to Llama 2 by up to 40%. Users can then adapt to a new version more easily, without repeatedly adjusting their expectations and usage.|\n", 
"2407.09429": "|**2024-07-12**|**Open (Clinical) LLMs are Sensitive to Instruction Phrasings**|Alberto Mario Ceballos Arroyo 
et.al.|[2407.09429](http://arxiv.org/abs/2407.09429)|**[link](https://github.com/alceballosa/clin-robust)**|Instruction-following large language models (LLMs) can perform a wide range of tasks given natural language instructions, but their sensitivity to how those instructions are phrased is a problem. This is especially critical in healthcare, because clinicians may not be experts in prompt engineering and the potential consequences of erroneous outputs are more severe. This raises a practical question: how robust are instruction-tuned LLMs to natural (non-adversarial) variations in instruction phrasing on clinical NLP tasks? We collect prompts from doctors across a range of tasks and measure the sensitivity of seven LLMs, both general-purpose and domain-specific, to subtle differences in instruction phrasing. We find that performance varies substantially for all models and that, surprisingly, models trained specifically on clinical data are less stable than their general-domain counterparts. Moreover, arbitrary phrasing differences can affect fairness: for example, valid but different instructions for mortality prediction lead not only to fluctuations in overall performance but also to disparities between demographic groups.|\n", 
"2407.09424": "|**2024-07-12**|**TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models**|Hang Zou et.al.|[2407.09424](http://arxiv.org/abs/2407.09424)|null|This paper proposes, for the first time, a method for adapting general-purpose large language models (LLMs) into telecom-specific models. To this end, we collect and build telecom-specific pre-training, instruction, and preference datasets, used for continual pre-training, instruction tuning, and alignment tuning, respectively. Because the telecom domain lacks widely accepted evaluation benchmarks, we extend existing benchmarks and propose three new ones: Telecom Math Modeling, Telecom Open Question Answering (TeleQnA), and Telecom Code Tasks. These new benchmarks provide a comprehensive evaluation of LLM capabilities in the telecom domain, including math modeling, open-ended question answering, and code generation, infilling, summarization, and analysis. Our optimized model, TelecomGPT, significantly outperforms state-of-the-art models such as GPT-4, Llama-3, and Mistral on the Telecom Math Modeling benchmark, and achieves comparable performance on TeleQnA, 3GPP technical document classification, telecom code summarization and generation, and infilling tasks.|\n", 
"2407.09417": "|**2024-07-12**|**Mitigating Entity-Level Hallucination in Large Language 
Models**|Weihang Su et.al.|[2407.09417](http://arxiv.org/abs/2407.09417)|**[link](https://github.com/oneal2000/entityhallucination)**|**The rise of large language models (LLMs) has changed how users access information, shifting from traditional search engines to direct question-answering interactions with LLMs. However, the widespread use of LLMs has exposed a challenge known as \u201challucination\u201d, in which the model generates responses that seem coherent but are factually wrong, leading users to distrust LLM-based information retrieval systems. To address this problem, this paper proposes a novel method, Dynamic Retrieval Augmentation based on hallucination Detection (DRAD). DRAD improves on traditional retrieval augmentation by dynamically adjusting the retrieval process according to real-time hallucination detection. It has two core components: Real-time Hallucination Detection (RHD), which identifies potential hallucinations without requiring an external model, and Self-correction based on External Knowledge (SEK), which uses external knowledge to correct those errors. Experimental results show that DRAD performs well at both detecting and reducing hallucinations in LLMs. We have open-sourced all code and data for use by the research community: https://github.com/oneal2000/EntityHallucination.**|\n", 
"2407.09413": "|**2024-07-12**|**SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers**|Shraman Pramanick et.al.|[2407.09413](http://arxiv.org/abs/2407.09413)|**[link](https://github.com/google/spiqa)**|**Finding information quickly is essential when reading scientific papers in depth. However, existing paper-based question answering (QA) datasets are limited in scale and scope and focus mainly on the text. To fill this gap, we introduce SPIQA (Scientific Paper Image Question Answering), a large-scale QA dataset designed specifically for interpreting complex figures, tables, and result visualizations across computer science fields. Drawing on the strong comprehension ability of multimodal large language models (MLLMs), we build the dataset through a combination of automatic and manual curation. SPIQA contains 270K questions, divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 foundation models, we assess how well current multimodal systems understand the finer details of research articles. In addition, we propose a Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that enables fine-grained, step-by-step assessment and helps improve model performance. We also explore the upper bound on the performance gain from additional textual information, highlighting its potential for future research and the dataset's promise to transform how we interact with scientific literature.**|\n", 
"2407.09394": "|**2024-07-12**|**PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents**|Saber Zerhoudi et.al.|[2407.09394](http://arxiv.org/abs/2407.09394)|**[link](https://github.com/padas-lab-de/PersonaRAG)**|Large language models (LLMs) struggle to produce reliable output because of outdated knowledge and hallucinations. Retrieval-augmented generation (RAG) models address this by enriching LLMs with external knowledge, but they often fail to personalize the retrieval process. This paper proposes PersonaRAG, a novel framework that introduces user-centric agents able to adapt retrieval and generation based on real-time user data and interactions. Evaluations on several question-answering datasets show that PersonaRAG has a significant advantage over baseline models and better meets users' personalized needs. The results point to promising directions for user-adaptive information retrieval systems.|\n", 
"2407.09388": "|**2024-07-12**|**GAVEL: Generating Games Via Evolution and Language Models**|Graham Todd 
et.al.|[2407.09388](http://arxiv.org/abs/2407.09388)|null|Automatically creating novel and interesting games is a complex task. It involves representing game rules in a computationally workable form, searching a vast space of potential games, and accurately evaluating the originality and quality of games never seen before. Prior work has mostly focused on limited rule representations and relied on domain-specific heuristics. In this work, we focus on generating novel games in the Ludii game description language, which encodes the rules of over 1000 board games in a variety of styles and modes of play. Drawing on recent advances in large language models and evolutionary computation, we train a model that intelligently mutates and recombines game mechanics expressed as code. Through quantitative and qualitative analysis, we show that our approach can create new and engaging games, including ones in regions of the game space not covered by existing games in the Ludii dataset. A sample of the generated games is available to play online through the Ludii portal.|\n", 
"2407.10972": "|**2024-07-15**|**VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation**|Bocheng Zou 
et.al.|[2407.10972](http://arxiv.org/abs/2407.10972)|**[link](https://github.com/vgbench/VGBench)**|**In the field of vision models, the dominant representation uses pixels to depict the visual world. Yet this is not always the best or the only way to represent visual content, especially for designers and artists, who often build graphics from geometric shapes such as polygons. Vector graphics (VG) offer a textual representation of visual content that can be more concise and powerful for content such as cartoons or sketches. Recent studies have shown promising results from capable large language models (LLMs) on vector graphics. However, these works focus mainly on qualitative results, on understanding, or on a specific type of vector graphics. We propose VGBench, a comprehensive benchmark for evaluating LLMs on vector graphics that covers (a) both visual understanding and generation, (b) multiple vector graphics formats, (c) diverse question types, (d) a wide range of prompting techniques, and (e) multiple LLMs. Evaluating on our collected 4279 understanding samples and 5845 generation samples, we find that LLMs show strong capability in both respects but perform somewhat worse on low-level formats such as SVG. Our data and evaluation pipeline will be open-sourced.**|\n", 
"2407.10969": "|**2024-07-15**|**Q-Sparse: All Large Language Models can be Fully Sparsely-Activated**|Hongyu Wang et.al.|[2407.10969](http://arxiv.org/abs/2407.10969)|null|We propose Q-Sparse, a simple yet effective training method designed for large language models (LLMs). Q-Sparse makes the activations of LLMs fully sparse, which brings significant efficiency gains at inference time. It does so by applying top-K sparsification to the activations and using the straight-through estimator during training. The key results are: (1) Q-Sparse achieves results comparable to baseline LLMs while being more efficient at inference time; (2) we present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in a range of settings, including training from scratch, continued training of pretrained models, and fine-tuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs, such as BitNet b1.58. In particular, the combination of BitNet b1.58 and Q-Sparse (which can be equipped with MoE) provides a cornerstone and a clear path toward more efficient future LLMs, in terms of both cost and energy consumption.|\n", 
"2407.10960": "|**2024-07-15**|**Fast Matrix Multiplications for Lookup Table-Quantized LLMs**|Han Guo et.al.|[2407.10960](http://arxiv.org/abs/2407.10960)|**[link](https://github.com/hanguo97/flute)**|The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the main bottleneck is the cost of transferring model parameters from GPU global memory to registers. Weight-only quantization reduces this memory movement and thereby speeds up inference. However, designing high-performance kernels for quantized LLMs is a significant challenge, especially when weights are compressed to bit widths that do not divide evenly (e.g., 3 bits) and quantized with non-uniform lookup tables (LUT). This paper describes FLUTE, a flexible lookup table engine that restructures the quantized weight matrix offline to minimize the bit manipulation needed for unpacking, and vectorizes and duplicates the lookup table to ease shared memory bandwidth constraints. For batch sizes below 32 and a quantization group size of 128 (typical for LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. As an application of FLUTE, we explore a simple extension of lookup table-based NormalFloat quantization and apply it to quantize LLaMA3, obtaining quantization performance competitive with strong baselines along with a 1.5x to 2x increase in end-to-end throughput.|\n", 
"2407.10953": "|**2024-07-15**|**MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models**|Chengguang Gan et.al.|[2407.10953](http://arxiv.org/abs/2407.10953)|null|**Background:** The Mutual Reinforcement Effect (MRE) shows great promise for information extraction and multitask research. However, the only available MRE mix datasets are in Japanese, which limits broad exploration by the global research community. To overcome this limitation, we build a Multilingual MRE Mix dataset (MMM) comprising 21 sub-datasets in English, Japanese, and Chinese. We also propose an LLM-assisted dataset translation method that uses large language models (LLMs) to translate the original Japanese text, greatly reducing the manual annotation time needed to build the datasets. **Contributions:** 
We extend the dataset with open-domain named entity recognition (NER) and sentence classification tasks. Using this expanded dataset, we develop a unified input-output framework and train an Open-domain Information Extraction Large Language Model (OIELLM). Experiments show that OIELLM handles the new MMM dataset effectively and delivers a significant performance improvement. Overall, our work aims to advance research on applying the Mutual Reinforcement Effect to multilingual information extraction by providing multilingual resources and an efficient translation strategy.|\n", 
"2407.10947": "|**2024-07-15**|**Can Textual Semantics Mitigate Sounding Object Segmentation Preference?**|Yaoting Wang et.al.|[2407.10947](http://arxiv.org/abs/2407.10947)|**[link](https://github.com/gewu-lab/sounding-object-segmentation-preference)**|**The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, studies point out that existing AVS methods rely too heavily on a segmentation preference for audible objects rather than on precise audio guidance. The underlying problem is that, compared with the visual modality, audio carries weaker semantics in multi-source sound scenes, which limits how well it can guide the visual space. Since the text modality is well explored and rich in abstract semantics, we propose using text cues from the visual scene to make audio guidance more precise. Our method first obtains a scene description from an off-the-shelf image captioner, then uses a pretrained large language model to infer the potential sounding objects as text cues. Next, we design a novel semantics-driven audio modeling module with a dynamic mask that fuses audio features with the text cues to produce representative sounding-object features. These features carry not only audio information but also vivid semantics, giving clearer guidance to the visual space. \u6211\u4eec\u5728AVS\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u88
68\u660e\uff0c\u501f\u52a9\u6587\u672c\u63d0\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5bf9\u97f3\u9891\u7684\u654f\u611f\u5ea6\u5f97\u5230\u63d0\u5347\uff0c\u5728\u6240\u6709\u4e09\u4e2a\u5b50\u96c6\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u7ade\u4e89\u529b\u3002\u9879\u76ee\u9875\u9762\uff1a[https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference](https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference)\u3002**|\n", "2407.10943": "|**2024-07-15**|**GRUtopia: Dream General Robots in a City at Scale**|Hanqing Wang et.al.|[2407.10943](http://arxiv.org/abs/2407.10943)|**[link](https://github.com/openrobotlab/grutopia)**|**\u8fd1\u671f\u7684\u7814\u7a76\u6b63\u5728\u63a2\u7d22Embodied AI\u9886\u57df\u7684\u89c4\u6a21\u6cd5\u5219\u3002\u9274\u4e8e\u6536\u96c6\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u7684\u9ad8\u6602\u6210\u672c\uff0c\u6211\u4eec\u8ba4\u4e3a\u6a21\u62df\u5230\u73b0\u5b9e\uff08Sim2Real\uff09\u65b9\u6cd5\u5bf9\u4e8e\u6269\u5c55embodied\u6a21\u578b\u7684\u5b66\u4e60\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u4ecb\u7ecd\u9879\u76eeGRUtopia\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u5404\u79cd\u673a\u5668\u4eba\u8bbe\u8ba1\u7684\u9996\u4e2a\u4e92\u52a8\u4e09\u7ef4\u793e\u4f1a\u3002\u5b83\u5177\u6709\u591a\u9879\u521b\u65b0\uff1a(a) \u573a\u666f\u6570\u636e\u96c6GRScenes\u5305\u542b\u4e8610\u4e07\u5f20\u4ea4\u4e92\u5f0f\u3001\u7cbe\u7ec6\u6ce8\u91ca\u7684\u573a\u666f\uff0c\u8fd9\u4e9b\u573a\u666f\u53ef\u4ee5\u81ea\u7531\u7ec4\u5408\u6210\u57ce\u5e02\u89c4\u6a21\u7684\u73af\u5883\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u5173\u6ce8\u5bb6\u5ead\u73af\u5883\u7684\u4f5c\u54c1\u4e0d\u540c\uff0cGRScenes\u6db5\u76d6\u4e8689\u4e2a\u591a\u6837\u5316\u7684\u573a\u666f\u7c7b\u522b\uff0c\u5f25\u5408\u4e86\u670d\u52a1\u5bfc\u5411\u73af\u5883\u4e2d\u673a\u5668\u4eba\u521d\u59cb\u90e8\u7f72\u7684\u5dee\u8ddd\u3002(b) 
GRResidents\u662f\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u975e\u73a9\u5bb6\u89d2\u8272\uff08NPC\uff09\u7cfb\u7edf\uff0c\u8d1f\u8d23\u793e\u4ea4\u4e92\u52a8\u3001\u4efb\u52a1\u751f\u6210\u548c\u4efb\u52a1\u5206\u914d\uff0c\u4ece\u800c\u6a21\u62dfembodied AI\u5e94\u7528\u4e2d\u7684\u793e\u4f1a\u573a\u666f\u3002(c) \u6807\u51c6\u5316\u57fa\u51c6GRBench\u652f\u6301\u5404\u79cd\u673a\u5668\u4eba\uff0c\u4f46\u4ee5\u817f\u8db3\u673a\u5668\u4eba\u4e3a\u4e3b\uff0c\u63d0\u4f9b\u6d89\u53ca\u7269\u4f53\u5bfc\u822a\u3001\u793e\u4ea4\u5bfc\u822a\u548c\u79fb\u52a8\u64cd\u4f5c\u7684\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5177\u6709\u9002\u5ea6\u7684\u6311\u6218\u6027\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u591f\u7f13\u89e3\u8be5\u9886\u57df\u9ad8\u8d28\u91cf\u6570\u636e\u7684\u532e\u4e4f\uff0c\u5e76\u4e3aEmbodied AI\u7814\u7a76\u63d0\u4f9b\u66f4\u5168\u9762\u7684\u8bc4\u4f30\u3002\u9879\u76ee\u4ee3\u7801\u53ef\u4ecehttps://github.com/OpenRobotLab/GRUtopia\u83b7\u53d6\u3002**|\n", "2407.10909": "|**2024-07-15**|**FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets**|Xiaohui Victor Li 
et.al.|[2407.10909](http://arxiv.org/abs/2407.10909)|**[link](https://github.com/xiaohui-victor-li/FinDKG)**|Dynamic knowledge graphs (DKGs) are a popular data structure for representing connections between objects that change over time. They are also effective for handling information extracted from complex unstructured data sources such as text and images. In financial applications, DKGs can be used to detect trends for investment strategies based on financial news articles. This study explores the properties of large language models (LLMs) as dynamic knowledge graph generators, for which we propose an open-source fine-tuned LLM called the Integrated Contextual Knowledge Graph Generator (ICKG). Using ICKG, we build a new open-source DKG from financial news articles, called FinDKG. In addition, we design an attention-based graph neural network architecture (KGTransformer) to analyze it. We test the model on benchmark datasets and on FinDKG, where KGTransformer performs strongly on link prediction tasks. Finally, we evaluate KGTransformer on FinDKG for thematic investing, showing that it can outperform existing thematic exchange-traded funds (ETFs).|\n
", "2407.10887": "|**2024-07-15**|**Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique**|Mark Russinovich et.al.|[2407.10887](http://arxiv.org/abs/2407.10887)|null|As concerns grow over the theft and misuse of large language models (LLMs), the need for model fingerprinting increases. In this context, a successful fingerprint should have five properties: transparency, efficiency, persistence, robustness, and unforgeability. This paper first defines these requirements. We then propose Chain & Hash, a new, simple fingerprinting approach that incorporates cryptographic ideas and achieves all of these properties. Chain & Hash generates a set of questions (the fingerprints) along with a set of potential answers, then combines them with a secure hashing technique to select the value for each question, which provides unforgeability and prevents adversaries from claiming false ownership. We evaluate Chain & Hash on multiple models and demonstrate its robustness against benign transformations, such as fine-tuning on different datasets, and against adversarial attempts to erase the fingerprint. Experiments show that fingerprinted models perform almost on par with non-fingerprinted models across various benchmarks, while the approach remains efficient and practical.|\n", 
"2407.10886": "|**2024-07-15**|**SLIP: Securing LLMs IP Using Weights Decomposition**|Yehonathan Refael et.al.|[2407.10886](http://arxiv.org/abs/2407.10886)|null|As large language models (LLMs) see widespread use in academia and industry, their value as intellectual property (IP) becomes increasingly clear, reflecting the enormous investments behind them. However, the high cost of cloud deployment is driving demand for deployment on edge devices, which risks theft and unauthorized use of model parameters. Current protection methods are limited in practicality, accuracy loss, or adaptability. This paper proposes a novel hybrid inference algorithm, SLIP (Secure Lightweight Inference Protocol), designed to protect edge-deployed models from theft. SLIP is the first hybrid protocol that combines practicality for real-world applications with rigorous security, while keeping zero accuracy degradation and a low impact on latency. 
SLIP uses matrix decomposition to partition the model between two computing resources: one secure but expensive, the other cost-effective but vulnerable. The key point is that the secure resource retains the most sensitive portion of the model's IP while performing the least computation, and the vulnerable resource does the opposite. In addition, the protocol includes security guarantees that prevent attackers from exploiting the partition to obtain the secured information. Finally, we present experimental results that demonstrate the robustness and effectiveness of SLIP, making it an ideal solution for protecting LLMs.|\n", "2407.10873": "|**2024-07-15**|**Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models**|Rui Zhang 
et.al.|[2407.10873](http://arxiv.org/abs/2407.10873)|null|Automated heuristic design (AHD) has attracted considerable attention for its potential to automate the development of effective heuristics. With the rise of large language models (LLMs), a new route has opened: framing AHD as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baselines, and the lack of a detailed analysis of why LLMs need to be coupled with search strategies make it hard to substantiate the real progress of existing LLM-based EPS methods. This work presents a large-scale benchmark covering four LLM-based EPS methods and four AHD problems across nine LLMs, with five independent runs. Our extensive experiments yield useful insights, provide empirical grounding for the importance of evolutionary search in LLM-based AHD approaches, and contribute to the development of future EPS algorithms. To foster accessibility and reproducibility, we have fully open-sourced our benchmark and the corresponding results.|\n", "2407.11965": "|**2024-07-16**|**UrbanWorld: An Urban World Model for 3D City Generation**|Yu Shang 
et.al.|[2407.11965](http://arxiv.org/abs/2407.11965)|**[link](https://github.com/urban-world/urbanworld)**|Cities, as the essential environment of human life, contain diverse physical elements such as buildings, roads, and vegetation with complex interconnections. Building realistic, interactive 3D urban environments is crucial for developing AI that can perceive, make decisions, and act in real-world settings. However, the traditional manual process is time-consuming and painstaking, requiring designers to invest substantial effort to represent complex urban features accurately. To this end, we present UrbanWorld, the first model that automatically generates customized, realistic, and interactive 3D urban worlds under flexible control conditions. UrbanWorld's generation pipeline has four key stages: 3D layout generation from openly accessible OSM data, urban scene planning and design with a powerful urban multimodal large language model (Urban MLLM), controllable asset rendering with advanced 3D diffusion techniques, and MLLM-assisted scene refinement. The high-fidelity 3D urban environments that UrbanWorld produces enable realistic feedback and interaction for general AI and machine perception systems in simulation. We are working to make UrbanWorld an open-source, versatile platform for evaluating and improving AI's abilities in perception, decision-making, and interaction in realistic urban environments.|\n", 
"2407.11963": "|**2024-07-16**|**NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?**|Mo Li et.al.|[2407.11963](http://arxiv.org/abs/2407.11963)|**[link](https://github.com/open-compass/opencompass)**|**This paper introduces NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing the long-context understanding of large language models (LLMs). The framework spans multiple length intervals (4k, 8k, 32k, 128k, 200k, 1M, and beyond) and depth ranges, inserting critical data points at different text depths to systematically test models' retrieval and reasoning abilities in diverse contexts. We use the framework to assess how well leading open-source models can identify key information relevant to a question and apply it to reasoning in bilingual long texts. In addition, we propose the Ancestral Trace Challenge (ATC), which mimics the complexity of logical reasoning in real-world long-context tasks and offers a simple method for evaluating LLMs on complex long-context situations. The results suggest that current LLMs have significant room for improvement in practical long-context applications, as they struggle with difficult logical reasoning problems. All code and resources are available at OpenCompass (https://github.com/open-compass/opencompass).**|\n", 
"2407.11934": "|**2024-07-16**|**Code Documentation and Analysis to Secure Software Development**|Paul Attie et.al.|[2407.11934](http://arxiv.org/abs/2407.11934)|null|We present the Code Documentation and Analysis Tool (CoDAT). CoDAT is designed to keep the different levels of code documentation coherent: for example, if a line in a code sketch is changed, the corresponding comment is updated as well, so that the documentation stays internally consistent and consistent with the code. By flagging out-of-date comments, CoDAT alerts developers to keep documentation current. We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it, and so flag semantic inconsistencies as well as out-of-date comments. This helps programmers write code that correctly implements a code sketch and supports a step-wise refinement approach, starting from a code sketch and proceeding to code through one or more refinement iterations. CoDAT is implemented in the IntelliJ IDEA IDE, where it uses the Code Insight daemon package together with a custom regular expression algorithm to mark tagged comments whose corresponding code blocks have changed. CoDAT's backend is structurally decentralized to allow a distributed ledger framework for tracking code consistency and architectural compilation.|\n", 
"2407.11919": "|**2024-07-16**|**What's Wrong? 
Refining Meeting Summaries with LLM Feedback**|Frederic Kirstein et.al.|[2407.11919](http://arxiv.org/abs/2407.11919)|null|Meeting summarization has become a critical task as digital meetings become common practice. Large language models (LLMs) show great potential here, outperforming traditional methods in coherence and context understanding. However, they still struggle to maintain relevance and avoid errors. We introduce a multi-LLM correction approach for meeting summarization that mimics human review through a two-phase process: mistake identification and summary refinement. We release QMSum Mistake, a dataset of 200 automatically generated meeting summaries annotated by humans for nine error types, including structural, omission, and irrelevance errors. Experiments show that LLMs can identify these errors accurately. We transform the identified mistakes into actionable feedback to improve summary quality in terms of relevance, informativeness, conciseness, and coherence. This post-hoc refinement effectively improves summary quality by using multiple LLMs to validate the output. Our multi-LLM approach to meeting summarization shows potential for complex text generation tasks that require robustness, action planning, and goal orientation.|\n", 
"2407.11888": "|**2024-07-16**|**Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads**|Aritra Dhar et.al.|[2407.11888](http://arxiv.org/abs/2407.11888)|null|Cloud workloads are now dominated by generative AI based on large language models (LLMs). Specialized hardware accelerators such as GPUs, NPUs, and TPUs outperform general-purpose CPUs on AI workloads. AI models and data are often highly sensitive and come from mutually distrusting parties. Existing CPU-based trusted execution environments (TEEs), such as Intel SGX or AMD SEV, do not provide sufficient protection. Device-centric TEEs such as Nvidia-CC target only tightly coupled CPU-GPU systems, rely on a proprietary solution, and require a TEE on the host CPU. Existing academic proposals, on the other hand, are mostly tailored to specific CPU-TEE platforms. To fill this gap, we propose Ascend-CC, a confidential computing architecture based on discrete NPUs that requires no trust in the host system. Ascend-CC provides strong security by ensuring that data and models are encrypted, protecting the data, the model parameters, and the operator binaries. It uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation to provide strong guarantees of model integrity. Our implementation of Ascend-CC and its evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead and requires no changes to the AI software stack.|\n", 
"2407.11852": "|**2024-07-16**|**Schema Matching with Large Language Models: an Experimental Study**|Marcel Parciak 
et.al.|[2407.11852](http://arxiv.org/abs/2407.11852)|**[link](https://github.com/uhasselt-dsi-data-systems-lab/code-schema-matching-llms-artefacs)**|**This paper examines the use of large language models (LLMs) for schema matching in relational databases. The goal is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. The authors build a benchmark from the health domain and propose different task scopes, that is, ways of prompting the LLM for schema matching that vary in the amount of context information included in the prompt. They compare LLM-based matching against a string-similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. They find that matching quality suffers from a lack of context information, but also from too much context information. Newer LLM versions generally increase decisiveness. Some task scopes require only moderate verification effort and still identify a significant number of true semantic matches. The results suggest that LLMs have potential as a bootstrapping tool for schema matching, allowing data engineers to speed up the task using only the names and descriptions of schema elements, without relying on data instances.**|\n", 
"2407.11833": "|**2024-07-16**|**LoFTI: Localization and Factuality Transfer to Indian Locales**|Sona Elza Simon et.al.|[2407.11833](http://arxiv.org/abs/2407.11833)|**[link](https://github.com/csalt-research/lofti)**|**Large language models (LLMs) acquire vast amounts of world knowledge by training on large web datasets crawled from the internet. However, these datasets are typically skewed toward English and Western countries, so LLMs produce biased or hallucinated responses to queries that require localization to other regions, particularly India. To address this, we present a new benchmark, LoFTI (Localization and Factuality Transfer to Indian Locales), for evaluating an LLM's localization and factual text transfer capabilities. LoFTI contains factual statements about entities in source locations worldwide and target locations in India, the latter at several levels of locality (country, state, and city), and covers a wide variety of topics. We use LoFTI to evaluate Mixtral, GPT-4, and two Mixtral-based approaches suited to localized factual transfer. Experiments show that LoFTI is a high-quality evaluation benchmark and that all models, including GPT-4, produce skewed results across the levels of locality.**|\n", 
"2407.11827": "|**2024-07-16**|**GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News 
Text**|Kyle Hamilton et.al.|[2407.11827](http://arxiv.org/abs/2407.11827)|null|Although machine learning for detecting propaganda techniques in text has attracted considerable attention, most approaches focus on black-box solutions whose inner workings are opaque. Interpretable approaches offer an alternative, but they depend on careful feature engineering and costly expert-annotated data. Moreover, the language features specific to persuasive text are generally studied by rhetoricians or linguists, and no dataset labeled with such features is suitable for machine learning. This study codifies 22 rhetorical and linguistic features identified in the literature, with the aim of annotating an existing dataset already labeled with propaganda techniques. To help human experts annotate natural-language sentences with these features, we designed a web application, RhetAnn, to reduce an otherwise considerable mental effort. Then, using a small set of annotated data, we fine-tuned GPT-3.5, a generative large language model (LLM), to annotate the remaining data, balancing financial cost against classification accuracy. The study shows that combining a few human-annotated examples with GPT can be an effective strategy for scaling the annotation process at roughly one tenth of the cost of relying on human experts alone. The results are on par with the best-performing model at the time of writing, GPT-4, at 10x lower cost. Our contributions are a set of features with their properties, definitions, and examples in a machine-readable format, along with the code for RhetAnn and the GPT prompts and fine-tuning procedures, all of which advance the state of the art in interpretable propaganda technique detection.|\n", 
"2407.11798": "|**2024-07-16**|**PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation**|Branden Butler et.al.|[2407.11798](http://arxiv.org/abs/2407.11798)|null|Inference of large language models (LLMs) across computer clusters has become a focus of research in recent years, and many acceleration techniques take inspiration from CPU speculative execution. These techniques reduce memory-bandwidth bottlenecks but also increase the end-to-end latency of each inference run, so they need high speculation acceptance rates to improve performance. Because acceptance rates vary across tasks, speculative inference can end up reducing performance. In addition, pipeline-parallel designs need many user requests to maintain high utilization. To address these issues, we propose PipeInfer, a pipelined speculative acceleration technique that reduces inter-token latency and improves system utilization in single-request scenarios, while also improving tolerance of low speculation acceptance rates and low-bandwidth interconnects. PipeInfer achieves significant improvements through continuous asynchronous speculation and early inference cancellation. Continuous asynchronous speculation runs single-token inference simultaneously with several speculative runs, improving latency and generation speed. Early inference cancellation skips the computation of invalidated runs, even in the middle of inference, further improving speed and latency. PipeInfer improves generation speed by up to 2.15x over standard speculative inference.|\n", 
"2407.11789": "|**2024-07-16**|**Large Language Models as Misleading Assistants in Conversation**|Betty Li Hou 
et.al.|[2407.11789](http://arxiv.org/abs/2407.11789)|null|Large language models (LLMs) can assist with a wide range of information-seeking tasks. However, model outputs may mislead users, whether unintentionally or deliberately. We investigate the ability of LLMs to act as deceptive assistants in a reading comprehension task, using LLMs as proxies for human users. We compare three settings: (1) the model is prompted to provide truthful assistance, (2) the model is prompted to be subtly misleading, and (3) the model is prompted to argue for an incorrect answer. Results show that GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4 itself, with deceptive assistants reducing task accuracy by up to 23% compared with a truthful assistant. We also find that giving the user model more context partially offsets the influence of the deceptive model. This work highlights the ability of LLMs to produce misleading information and the effects this may have in real-world settings.|\n",
"2407.12735": "|**2024-07-17**|**EchoSight: Advancing Visual-Language Models with Wiki Knowledge**|Yibin Yan et.al.|[2407.12735](http://arxiv.org/abs/2407.12735)|null|
Knowledge-based visual question answering (KVQA) requires answering questions about images using rich background knowledge, which remains difficult for generative models. We propose EchoSight, a novel multimodal retrieval-augmented generation (RAG) framework that helps large language models (LLMs) answer visual questions requiring detailed encyclopedic knowledge. EchoSight first searches Wikipedia articles using only the image, then reranks the candidate articles by their relevance to the combined text-image query. This substantially improves the integration of multimodal knowledge, which in turn improves retrieval quality and answer accuracy. Experiments on the Encyclopedic VQA and InfoSeek datasets show that EchoSight sets a new state of the art in knowledge-based VQA, reaching 41.8% accuracy on Encyclopedic VQA and 31.3% on InfoSeek.|\n",
"2407.12727": "|**2024-07-17**|**NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model**|Zhongqun Zhang et.al.|[2407.12727](http://arxiv.org/abs/2407.12727)|null|In 3D hand-object reconstruction, modeling the physical contact between hand and object is a standard way to improve the accuracy of hand pose estimation and to generate novel human grasps. Existing methods, however, rely on geometric constraints that are hard to specify or control. This paper introduces a new task: controllable 3D hand-object contact modeling from natural language descriptions. The challenges are (1) the complexity of cross-modal modeling from language to contact and (2) the lack of text describing contact patterns. To address them, we propose NL2Contact, a model that generates controllable contacts using staged diffusion models. Given a natural language description of the hand and the contact, NL2Contact generates realistic and faithful 3D hand-object contacts. To train the model, we build ContactDescribe, the first dataset of its kind, which contains rich and diverse hand-centered contact descriptions generated from carefully designed prompts (for example grasp action, grasp type, contact location, and free-finger status). We show applications of the model to grasp pose optimization and to generating novel human grasps from text descriptions.|\n",
"2407.12725": "|**2024-07-17**|**Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?**|Ben Yao
et.al.|[2407.12725](http://arxiv.org/abs/2407.12725)|null|Elaborating a series of intermediate reasoning steps markedly improves the ability of large language models (LLMs) to solve complex problems, because it pushes the model to think sequentially. Human sarcasm understanding, however, is often regarded as an intuitive and holistic cognitive process that integrates linguistic, contextual, and emotional cues into a comprehensive understanding of the speaker's true intention, and that is argued not to be limited to step-by-step reasoning. To test this view, we introduce SarcasmCue, a new prompting framework with four prompting strategies: chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC), and tensor of cues (ToC). These strategies guide LLMs to detect human sarcasm by considering both sequential and non-sequential prompting. In a comprehensive empirical comparison on four benchmark datasets, our four prompting methods clearly outperform standard input-output prompting, CoT, and ToT, and non-sequential prompting generally outperforms sequential prompting.|\n",
"2407.12723": "|**2024-07-17**|**The Future of Learning: Large Language Models through the Lens of Students**|He Zhang
et.al.|[2407.12723](http://arxiv.org/abs/2407.12723)|null|As large language models (LLMs) continue to advance, their improved performance and expanded functionality have significantly affected education. This study interviews 14 students about their everyday interactions with ChatGPT. Preliminary findings show that students benefit from ChatGPT's gains in learning efficiency and ease of access to information, yet also face a crisis of trust and ethical concerns. They regard ChatGPT as more 'human-like' than traditional AI. These mixed feelings and inconsistent behaviors, together with the students' overall positive attitude, highlight ChatGPT's potential value in education. We note, however, that such highly capable intelligence may also have adverse effects. We therefore stress caution in its application and the need to reduce potential harms in future development.|\n",
"2407.12709": "|**2024-07-17**|**MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models**|Leyang Shen
et.al.|[2407.12709](http://arxiv.org/abs/2407.12709)|**[link](https://github.com/jiutian-vl/mome)**|**Multimodal large language models (MLLMs) have shown impressive capabilities across a range of vision-language (VL) tasks. However, a generalist MLLM typically underperforms specialist MLLMs on most VL tasks because of task interference. In this paper we propose a mixture of multimodal experts (MoME) architecture to mitigate task interference and obtain a generalist MLLM. MoME has two key components: a mixture of vision experts (MoVE) and a mixture of language experts (MoLE). MoVE adaptively modulates the features from different vision encoders and is well compatible with the transformation architecture. MoLE incorporates sparsely gated experts into the language model, achieving improvements at almost no additional cost. To counter task interference, MoME specializes in both the vision and language modalities so that it can adapt to differences between tasks. Extensive experiments show that MoME significantly improves the performance of generalist MLLMs across a variety of VL tasks. The source code is available at https://github.com/JiuTian-VL/MoME.**|\n",
"2407.12665": "|**2024-07-17**|**Patch-Level Training for Large
Language Models**|Chenze Shao et.al.|[2407.12665](http://arxiv.org/abs/2407.12665)|**[link](https://github.com/shaochenze/patchtrain)**|**As large language models (LLMs) make remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Token-level training has been successful but is computationally expensive because of the large number of tokens it must process. To mitigate this, the paper introduces patch-level training, which shortens the sequence by compressing multiple tokens into a single patch. During patch-level training, the model is fed shorter sequences of patches and learns to predict the next patch, so most of the training data is processed at a much lower cost. The model then continues with token-level training on the remaining data to align with inference mode. Experiments on models of various sizes (370M to 2.7B parameters) show that patch-level training reduces overall computational cost to 0.5x without compromising model performance. Source code: https://github.com/shaochenze/PatchTrain.**|\n",
"2407.12642": "|**2024-07-17**|**Zero-shot Text-guided Infinite
Image Synthesis with LLM guidance**|Soyeong Kwon et.al.|[2407.12642](http://arxiv.org/abs/2407.12642)|null|Text-guided image editing and generation methods have many real-world applications. Text-guided infinite image synthesis, however, faces several challenges. First, there is a lack of text-image paired datasets with high resolution and contextual diversity. Second, expanding an image from text requires both global coherence and rich understanding of local context. Previous work has focused mainly on limited categories, such as natural landscapes, and also required training on high-resolution images with paired text. To address these challenges, we propose a novel approach that uses large language models (LLMs) for both global coherence and local context understanding, without any high-resolution text-image paired training data. We train a diffusion model to expand an image conditioned on global and local captions generated by an LLM, together with visual features. At inference, given an image and a global caption, we use the LLM to generate the next local caption for expanding the input image. We then expand the image using the global caption, the generated local caption, and the visual features, which keeps the result globally consistent and spatially aware of local context. Experiments show that our model outperforms the baselines both quantitatively and qualitatively. The model also demonstrates zero-shot text-guided synthesis of arbitrary-size images with LLM guidance.|\n",
"2407.12620": "|**2024-07-17**|**Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences**|Claudio Pinhanez
et.al.|[2407.12620](http://arxiv.org/abs/2407.12620)|null|Since 2022 we have been exploring application areas in which artificial intelligence (AI) and modern natural language processing (NLP), particularly large language models (LLMs), can support and promote the use and documentation of endangered Indigenous languages. We begin with the decline of the world's linguistic diversity and discuss the unique ethical challenges of working with Indigenous languages. To address those challenges, we propose a new AI development cycle based on community engagement and usage. We then report encouraging results in developing high-quality machine translation for Indigenous languages by fine-tuning state-of-the-art translators with small amounts of data, and discuss how to avoid some common pitfalls in the process. We also present prototypes built in projects with Indigenous communities in Brazil in 2023 and 2024, aimed at making writing easier, and discuss the development of Indigenous language models (ILMs) as a replicable and scalable way to create spell checkers, next-word predictors, and similar tools. Finally, we look ahead to a future in which endangered languages are preserved through interactive language models.|\n",
"2407.12613": "|**2024-07-17**|**AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism**|William Brannon et.al.|[2407.12613](http://arxiv.org/abs/2407.12613)|**[link](https://github.com/mit-ccc/AudienceView-demo)**|**Understanding and using audience feedback is important for journalists, but the sheer volume of audience comments they now face online makes it a daunting task. We introduce AudienceView, an online tool that uses large language models (LLMs) to help journalists categorize and interpret this feedback. AudienceView identifies themes and topics, connects them back to specific comments, shows the sentiment and distribution of the comments, and helps users develop ideas for follow-up reporting projects. We consider how such tools can fit into journalists' workflows and emphasize the importance of contextual awareness and human judgment.**|\n",
"2407.12580": "|**2024-07-17**|**E5-V: Universal Embeddings with Multimodal Large Language Models**|Ting Jiang et.al.|[2407.12580](http://arxiv.org/abs/2407.12580)|**[link](https://github.com/kongds/e5-v)**|**Multimodal large language models (MLLMs) have made significant progress in general visual and language understanding. However, how to represent multimodal information with MLLMs remains largely unexplored. We introduce E5-V, a new framework that adapts MLLMs to produce universal multimodal embeddings. Our findings show that MLLMs have great potential for representing multimodal inputs compared with previous approaches. By using prompts, E5-V effectively bridges the modality gap between different input types and shows strong multimodal embedding performance even without fine-tuning. E5-V uses a single-modality training strategy in which the model is trained only on text pairs. Compared with conventional multimodal training on image-text pairs, this yields significant improvements, cuts training cost by about 95%, and removes the need for costly multimodal training data collection. Extensive experiments on four types of tasks demonstrate the effectiveness of E5-V. As a universal multimodal model, E5-V achieves top performance on each task and in some cases surpasses the existing state of the art, despite being trained on a single modality.**|\n",
"2407.13761": "|**2024-07-18**|**SegPoint: Segment Any Point Cloud via Large Language Model**|Shuting He et.al.|[2407.13761](http://arxiv.org/abs/2407.13761)|null|Despite significant progress in 3D point cloud segmentation, existing methods mainly address specific tasks and rely on explicit instructions to identify targets; they cannot infer and understand implicit user intentions within a unified framework. We propose SegPoint, a model that uses the reasoning capabilities of a multimodal large language model (LLM) to perform point-wise segmentation across a range of tasks: (1) 3D instruction segmentation, (2) 3D referring segmentation, (3) 3D semantic segmentation, and (4) 3D open-vocabulary semantic segmentation. To advance 3D instruction research, we introduce Instruct3D, a new benchmark for evaluating segmentation from complex and implicit instruction text, containing 2,565 point cloud-instruction pairs. Experiments show that SegPoint is competitive on established benchmarks such as ScanRefer for referring segmentation and ScanNet for semantic segmentation, and performs strongly on Instruct3D. To our knowledge, SegPoint is the first model to handle these varied segmentation tasks within a single framework with satisfactory performance.|\n",
"2407.13757": "|**2024-07-18**|**Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models**|Zhuo Chen et.al.|[2407.13757](http://arxiv.org/abs/2407.13757)|null|Retrieval-augmented generation (RAG) is used to address the hallucination problem and real-time constraints of large language models, but it also exposes them to retrieval corruption attacks. Existing research focuses mainly on the unreliability of RAG in white-box settings and closed-domain question answering. This paper studies the vulnerability of RAG models to black-box attacks aimed at opinion manipulation and examines how such attacks affect user cognition and decision-making, offering new insight for improving the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG and use the manipulated results as data to train a surrogate model. We then apply adversarial retrieval attack methods to the surrogate model to carry out black-box transfer attacks on RAG, which in turn affect its generation. Experiments on opinion datasets spanning multiple topics show that the proposed attack strategy can significantly shift the opinion polarity of RAG-generated content. This demonstrates the vulnerability of the model and points to a potential negative effect on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information.|\n",
"2407.13742": "|**2024-07-18**|**CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications**|Mirza Masfiqur Rahman et.al.|[2407.13742](http://arxiv.org/abs/2407.13742)|null|Cellular network security has drawn growing attention in recent years, with vulnerabilities often traced to problems in the underlying protocol design descriptions. These detailed specification documents, typically thousands of pages long, can contain errors, under-specifications, implicit assumptions, and internal inconsistencies. We therefore introduce CellularLint, a semi-automatic framework for the 4G and 5G Non-Access Stratum (NAS) and security specifications that uses a suite of natural language processing techniques. Our approach applies a revamped few-shot learning mechanism to domain-adapted large language models. Pre-trained on a large corpus of cellular network protocol data, the model can detect inconsistencies at different levels of semantics and across practical use cases, improving automated analysis of protocol specifications in a scalable way. In our study, we discovered 157 inconsistencies in 4G and 5G networks with 82.67% accuracy. After validating them on open-source implementations and 17 commercial devices, we confirm that these inconsistencies do substantially affect design decisions and may raise concerns about privacy, integrity, availability, and interoperability.|\n",
"2407.13729": "|**2024-07-18**|**Baba Is AI: Break the Rules to Beat the Benchmark**|Nathan Cloos et.al.|[2407.13729](http://arxiv.org/abs/2407.13729)|null|Humans solve problems both by following existing rules and procedures and by using creativity to redefine those rules and goals. To probe these abilities, we developed a new benchmark based on the game Baba Is You, in which an agent manipulates both objects in the environment and rules, represented by movable tiles with words written on them, to reach a specified goal and win the game. We test three state-of-the-art multimodal large language models (OpenAI GPT-4, Google Gemini-1.5-Pro, and Gemini-1.5-Flash) and find that their performance drops sharply when the rules of the game must be manipulated and combined.|\n",
"2407.13717": "|**2024-07-18**|**CoDefeater: Using LLMs To Find Defeaters in Assurance Cases**|Usman Gohar et.al.|[2407.13717](http://arxiv.org/abs/2407.13717)|**[link](https://gitlab.com/anonymousdot/codefeater)**|Constructing assurance cases is a widely used, and sometimes required, way to demonstrate that a safety-critical system will operate safely in its planned environment. To reduce the risk of errors and missed edge cases, the concept of defeaters has been introduced: arguments that challenge the claims or evidence in an assurance case. Defeaters provide timely detection of weaknesses in the arguments, prompting further investigation and timely mitigation. However, capturing defeaters relies on expert judgment, experience, and creativity, and must be done iteratively as requirements and regulations evolve. This paper proposes CoDefeater, an automated process that uses large language models (LLMs) to find defeaters. Initial results show that LLMs can efficiently find both known and unforeseen feasible defeaters, helping safety analysts increase the completeness of, and confidence in, assurance cases.|\n",
"2407.13709": "|**2024-07-18**|**Understanding Reference Policies in Direct Preference Optimization**|Yixin Liu et.al.|[2407.13709](http://arxiv.org/abs/2407.13709)|**[link](https://github.com/yale-nlp/refdpo)**|Direct Preference Optimization (DPO) has become a widely used training method for instruction fine-tuning of large language models (LLMs). This work explores an under-investigated aspect of DPO: its dependency on the reference model or policy. These reference policies, typically instantiated as the model to be further fine-tuned, are critical to DPO's effectiveness. We therefore investigate three related research questions. First, we study the optimal strength of the KL-divergence constraint in DPO, which penalizes deviation from the reference policy, and find that DPO is sensitive to this strength. Second, we compare DPO with related learning objectives, both theoretically and empirically, to examine whether reference policies are necessary for instruction fine-tuning, and show the advantage of DPO. Third, we investigate whether DPO benefits from stronger reference policies, and find that a stronger reference policy can improve performance when it is similar to the model being fine-tuned. Our findings highlight the confounding role of reference policies in DPO, offer insights into best practices, and identify open research questions for future work.|\n",
"2407.13699": "|**2024-07-18**|**A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice**|Shaina Raza et.al.|[2407.13699](http://arxiv.org/abs/2407.13699)|null|## \u80cc\u666f \u63a8\u8350\u7cfb\u7edf\uff08RS\uff09\u901a\u8fc7\u63d0\u4f9b\u4e2a\u6027\u5316\u9879\u76ee\u5efa\u8bae\uff0c\u5bf9\u63d0\u5347\u7528\u6237\u4f53\u9a8c\u81f3\u5173\u91cd\u8981\u3002\u672c\u7efc\u8ff0\u56de\u987e\u4e86\u4ece2017\u5e74\u81f32024\u5e74\u95f4RS\u9886\u57df\u7684\u8fdb\u5c55\uff0c\u5c06\u7406\u8bba\u521b\u65b0\u4e0e\u5b9e\u9645\u5e94\u7528\u7d27\u5bc6\u7ed3\u5408\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u4ece\u4f20\u7edf\u65b9\u6cd5\u5982\u57fa\u4e8e\u5185\u5bb9\u548c\u534f\u540c\u8fc7\u6ee4\u7684\u63a8\u8350\uff0c\u5230\u9ad8\u7ea7\u6280\u672f\u5982\u6df1\u5ea6\u5b66\u4e60\u3001\u56fe\u6a21\u578b\u3001\u5f3a\u5316\u5b66\u4e60\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5173\u6ce8\u4e86\u4e13\u95e8\u5316\u7684\u7cfb\u7edf\uff0c\u5982\u4e0a\u4e0b\u6587\u611f\u77e5\u3001\u8bc4\u8bba\u9a71\u52a8\u548c\u516c\u5e73\u6027\u8003\u91cf\u7684RS\u3002\u672c\u8c03\u67e5\u7684\u76ee\u6807\u662f\u8fde\u63a5\u7406\u8bba\u4e0e\u5b9e\u8df5\uff0c\u5173\u6ce8\u7535\u5b50\u5546\u52a1\u3001\u533b\u7597\u4fdd\u5065\u548c\u91d1\u878d\u7b49\u9886\u57df\u7684\u6311\u6218\uff0c\u5f3a\u8c03\u5
bf9\u53ef\u6269\u5c55\u3001\u5b9e\u65f6\u4e14\u503c\u5f97\u4fe1\u8d56\u89e3\u51b3\u65b9\u6848\u7684\u9700\u6c42\u3002\u901a\u8fc7\u6b64\u7efc\u8ff0\uff0c\u6211\u4eec\u9f13\u52b1\u5b66\u672f\u7814\u7a76\u4e0e\u884c\u4e1a\u5b9e\u8df5\u7684\u7d27\u5bc6\u5408\u4f5c\u3002\u672c\u7814\u7a76\u63d0\u4f9b\u7684\u6d1e\u89c1\u65e8\u5728\u5e2e\u52a9\u4e1a\u754c\u4e13\u4e1a\u4eba\u5458\u4f18\u5316RS\u90e8\u7f72\uff0c\u5e76\u6fc0\u53d1\u672a\u6765\u7814\u7a76\u7684\u65b0\u65b9\u5411\uff0c\u7279\u522b\u662f\u5728\u5e94\u5bf9\u65b0\u5174\u6280\u672f\u548c\u793e\u4f1a\u8d8b\u52bf\u65f6\u3002|\n", "2407.13692": "|**2024-07-18**|**Prover-Verifier Games improve legibility of LLM outputs**|Jan Hendrik Kirchner et.al.|[2407.13692](http://arxiv.org/abs/2407.13692)|null|\u4e3a\u4e86\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8f93\u51fa\u7ed3\u679c\u7684\u53ef\u4fe1\u5ea6\uff0c\u4e00\u4e2a\u65b9\u6cd5\u662f\u652f\u6301\u6e05\u6670\u6613\u9a8c\u8bc1\u7684\u63a8\u7406\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u53ef\u8bfb\u6027\u3002\u672c\u6587\u4ee5\u89e3\u51b3\u5c0f\u5b66\u6570\u5b66\u95ee\u9898\u4e3a\u80cc\u666f\uff0c\u7814\u7a76\u4e86\u53ef\u8bfb\u6027\uff0c\u5e76\u53d1\u73b0\u4ec5\u4f18\u5316\u8fde\u8d2f\u601d\u7ef4\u89e3\u9898\u7684\u51c6\u786e\u6027\u53ef\u80fd\u4f1a\u964d\u4f4e\u5176\u53ef\u8bfb\u6027\u3002\u4e3a\u7f13\u89e3\u8fd9\u4e00\u635f\u5931\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53d7Anil\u7b49\u4eba\uff082021\uff09\u7684\u8bc1\u660e\u5668-\u9a8c\u8bc1\u5668\u6e38\u620f\u542f\u53d1\u7684\u8bad\u7ec3\u7b97\u6cd5\u3002\u8be5\u7b97\u6cd5\u8fed\u4ee3\u5730\u8bad\u7ec3\u5c0f\u578b\u9a8c\u8bc1\u5668\u9884\u6d4b\u89e3\u9898\u6b63\u786e\u6027\uff0c\"\u6709\u5e2e\u52a9\"\u7684\u8bc1\u660e\u5668\u751f\u6210\u9a8c\u8bc1\u5668\u63a5\u53d7\u7684\u6b63\u786e\u89e3\u7b54\uff0c\u4ee5\u53ca\"\u72e1\u733e\"\u7684\u8bc1\u660e\u5668\u751f\u6210\u6b3a\u9a97\u9a8c\u8bc1\u5668\u7684\u9519\u8bef\u89e3\u7b54\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6709\u5e2e\u52a9\u8bc1\u660e\u56
68\u7684\u51c6\u786e\u6027\u548c\u9a8c\u8bc1\u5668\u5bf9\u6297\u653b\u51fb\u7684\u9c81\u68d2\u6027\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\uff0c\u9488\u5bf9\u5c0f\u578b\u9a8c\u8bc1\u5668\u7684\u53ef\u8bfb\u6027\u8bad\u7ec3\u80fd\u591f\u8f6c\u79fb\u7ed9\u65f6\u95f4\u6709\u9650\u7684\u4eba\u7c7b\uff0c\u4ed6\u4eec\u5728\u9a8c\u8bc1\u89e3\u51b3\u65b9\u6848\u6b63\u786e\u6027\u65f6\u7684\u51c6\u786e\u6027\u4f1a\u968f\u7740\u8bad\u7ec3\u63d0\u9ad8\uff0c\u800c\u5728\u9a8c\u8bc1\u72e1\u733e\u8bc1\u660e\u5668\u7684\u89e3\u51b3\u65b9\u6848\u65f6\u4f1a\u4e0b\u964d\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u5c0f\u578b\u9a8c\u8bc1\u5668\u8fdb\u884c\u53ef\u8bfb\u6027\u8bad\u7ec3\u53ef\u80fd\u662f\u4e00\u79cd\u5b9e\u9645\u53ef\u884c\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u63d0\u5347\u5927\u578bLLMs\u5bf9\u4eba\u7c7b\u7684\u53ef\u8bfb\u6027\uff0c\u4ece\u800c\u6709\u52a9\u4e8e\u8d85\u7ea7\u4eba\u7c7b\u6a21\u578b\u7684\u5bf9\u9f50\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5bf9\u5c0f\u578b\u9a8c\u8bc1\u5668\u7684\u53ef\u8bfb\u6027\u8bad\u7ec3\u662f\u4e00\u4e2a\u5b9e\u7528\u7684\u9014\u5f84\uff0c\u53ef\u4ee5\u589e\u5f3a\u5927\u578bLLMs\u7684\u53ef\u8bfb\u6027\uff0c\u5bf9\u4eba\u7c7b\u6765\u8bf4\u66f4\u6613\u4e8e\u7406\u89e3\u548c\u4fe1\u4efb\u3002|\n", "2407.13648": "|**2024-07-18**|**COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization**|Skyler Grandel 
et.al.|[2407.13648](http://arxiv.org/abs/2407.13648)|null|This paper addresses the importance of code comprehension in software maintenance and how automatically generated comments can support it. The authors propose COMCAT, an approach that combines large language models (LLMs) with expert guidance to annotate source code with comments that aid comprehension. The COMCAT pipeline automatically identifies suitable locations for comments, predicts the most helpful type of comment for each location, and generates a comment based on the selected location and type. In a human-subject study, COMCAT-generated comments significantly improved developers' code comprehension across three representative software engineering tasks, by up to 12% for 87% of participants. The study also shows that COMCAT-generated comments are at least as accurate and readable as human-written comments, and that developers preferred them over standard ChatGPT-generated comments for 92% of code snippets. The authors also developed and released a dataset containing source code snippets, human-written comments, and human-annotated comment categories. Overall, COMCAT uses LLMs to deliver a significant improvement in code comprehension across a variety of software engineering tasks.|\n",
"2407.13647": "|**2024-07-18**|**Weak-to-Strong Reasoning**|Yuqing Yang et.al.|[2407.13647](http://arxiv.org/abs/2407.13647)|**[link](https://github.com/gair-nlp/weak-to-strong-reasoning)**|When large language models (LLMs) exceed human-level capabilities, providing them with full-scale and accurate supervision becomes difficult. In this setting, weak-to-strong learning, which uses a less capable model to unlock the latent abilities of a stronger one, proves valuable. However, the efficacy of this strategy on complex reasoning tasks has not been adequately tested, and effective methods are still lacking to keep the strong model from blindly imitating the weak supervisor, including its errors. This paper proposes a progressive learning framework that lets the strong model autonomously refine its training data without relying on a more advanced model or human-annotated data. The framework begins with supervised fine-tuning on a small, selected, high-quality dataset, followed by preference optimization on contrastive samples identified by the strong model itself. Extensive experiments on the GSM8K and MATH datasets show that our method significantly improves the reasoning capabilities of Llama2-70b, as verified with three different weak models. We further validate the method in a forward-looking experimental setup in which Llama3-8b-instruct effectively supervises Llama3-70b on the highly challenging OlympicArena dataset. This work offers a more scalable and sophisticated strategy for improving AI reasoning. All related code and resources are available.|\n",
"2407.14507": "|**2024-07-19**|**Internal Consistency and Self-Feedback in Large Language Models: A Survey**|Xun Liang et.al.|[2407.14507](http://arxiv.org/abs/2407.14507)|**[link](https://github.com/iaar-shanghai/icsfsurvey)**|**This paper summarizes a theoretical framework called Internal Consistency, which offers a unified explanation for problems in large language models (LLMs) such as deficient reasoning and hallucinated content. Internal Consistency assesses the coherence among an LLM's latent layer, decoding layer, and response layer, based on sampling methods. Building on this, we introduce Self-Feedback, a streamlined yet effective theoretical framework for mining Internal Consistency. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. We systematically classify existing studies by task and line of work, summarize the relevant evaluation methods and benchmarks, and examine the question \u201cDoes Self-Feedback really work?\u201d We propose several critical viewpoints, including the \u201cHourglass Evolution of Internal Consistency\u201d, the \u201cConsistency Is (Almost) Correctness\u201d hypothesis, and \u201cThe Paradox of Latent and Explicit Reasoning\u201d. We also outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data at https://github.com/IAAR-Shanghai/ICSFSurvey**|\n",
"2407.14506": "|**2024-07-19**|**On Pre-training of Multimodal Language Models Customized for Chart Understanding**|Wan-Cyuan Fan 
et.al.|[2407.14506](http://arxiv.org/abs/2407.14506)|null|Recent studies on customizing multimodal large language models (MLLMs) for domain-specific tasks have produced promising results, especially in scientific chart comprehension. These studies typically improve question-answering (QA) accuracy through visual instruction tuning on specialized datasets. However, they often overlook the fundamental gap between natural image-caption pre-training data and digital chart image-QA data, particularly with respect to a model's ability to extract underlying numeric values from charts. This paper addresses that oversight by exploring the training processes needed to improve MLLMs' chart comprehension. We present three key findings: (1) incorporating raw data values into alignment pre-training markedly improves comprehension of chart data; (2) randomly replacing images with their textual representation during end-to-end fine-tuning transfers language reasoning capability to chart interpretation skills; (3) requiring the model to first extract the underlying chart data and then answer the question during fine-tuning further improves accuracy. Based on these findings, we introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension. CHOPINLLM effectively interprets many types of charts, including unannotated ones, while retaining strong reasoning abilities. We also establish a new benchmark for evaluating MLLMs' understanding across different chart types and levels of comprehension. Experimental results show that CHOPINLLM performs strongly on a wide range of both annotated and unannotated charts.|\n",
"2407.14487": "|**2024-07-19**|**Evaluating the Reliability of Self-Explanations in Large Language Models**|Korbinian Randl et.al.|[2407.14487](http://arxiv.org/abs/2407.14487)|**[link](https://github.com/k-randl/self-explaining_llms)**|**This paper investigates the reliability of the explanations that large language models generate when prompted to explain their previous output. Using three state-of-the-art LLMs (2B to 8B parameters) on two different classification tasks (objective and subjective), we evaluate two kinds of self-explanations: extractive and counterfactual. Our findings show that, although these self-explanations correlate with human judgment, they do not fully and accurately follow the model's decision process, indicating a gap between perceived and actual model reasoning. We show that prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results. These counterfactuals offer a promising alternative to traditional explainability methods (e.g. SHAP, LIME), provided that prompts are tailored to the specific task and checked for validity.**|\n",
"2407.14474": "|**2024-07-19**|**Contrastive Learning with Counterfactual Explanations for Radiology Report Generation**|Mingjie Li et.al.|[2407.14474](http://arxiv.org/abs/2407.14474)|null|Because anatomy is largely shared across patients, radiology images and their corresponding reports are highly similar. This inherent data bias can lead automatic report generation models to learn entangled, spuriously correlated representations and thus produce misdiagnostic reports. To address these problems, we propose a novel CounterFactual Explanations (CoFE) framework for radiology report generation. Counterfactual explanations are a powerful tool for understanding how an algorithm's decisions can be changed by posing \u201cwhat if\u201d scenarios. Using this concept, CoFE learns non-spurious visual representations by contrasting the representations of positive and negative examples. Specifically, we derive counterfactual images by swapping patches between positive and negative examples until the predicted diagnosis changes. Here, the positive and negative examples are the most semantically similar ones that carry different diagnosis labels. In addition, CoFE uses a learnable prompt to efficiently fine-tune a pre-trained large language model, encapsulating the content of both factual and counterfactual instances to provide a more generalizable prompt representation. Extensive experiments on two benchmarks show that counterfactual explanations enable CoFE to generate semantically coherent and factually complete reports, and to perform strongly on both language generation and clinical efficacy metrics.|\n",
"2407.14467": "|**2024-07-19**|**Check-Eval: A Checklist-based Approach for Evaluating Text Quality**|Jayr Pereira 
et.al.|[2407.14467](http://arxiv.org/abs/2407.14467)|null|Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align with human judgment, especially in tasks that require creativity and nuance. This paper proposes Check-Eval, a new evaluation framework that uses LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be used as either a reference-free or a reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results show that Check-Eval achieves higher correlation with human judgment than existing metrics such as G-Eval and GPTScore, indicating its potential as a more reliable and effective evaluation framework for natural language generation tasks. Our experimental code is available at https://anonymous.4open.science/r/check-eval-0DB4.|\n",
"2407.14452": "|**2024-07-19**|**Undermining Mental Proof: How AI Can Make Cooperation Harder by Making 
Thinking Easier**|Zachary Wojtowicz et.al.|[2407.14452](http://arxiv.org/abs/2407.14452)|null|Large language models and other highly capable AI systems make it easier to decide what to say or do, but this very ease can undermine the effectiveness of our actions in social contexts. We explain this apparent tension by introducing the integrative theoretical concept of \u201cmental proof\u201d, which occurs when observable actions are used to certify unobservable mental facts. From hiring to dating, mental proofs let people communicate values, intentions, states of knowledge, and other mental features to one another in low-trust environments where honesty is hard to enforce. Drawing on results from economics, theoretical biology, and computer science, we describe the core theoretical mechanisms that enable people to carry out mental proofs. An analysis of these mechanisms reveals how artificial intelligence can make low-trust cooperation harder even as it makes thinking easier. By understanding how mental proof works and how it applies in different contexts, we can design AI systems that both support efficient communication and preserve social cooperation. In hiring, for example, AI could indirectly assess a candidate's skills, teamwork, and fit with company culture by analyzing behavioral patterns and historical data, helping employers make more reliable hiring decisions. In dating, AI could use information such as social media activity and interests to build a psychological profile of a user, helping them find partners whose values and lifestyle match their own. In short, by designing and applying AI thoughtfully, we can not only strengthen human communication and cooperation in low-trust environments but also promote fairer, more transparent, and more efficient decision-making.|\n",
"2407.14439": "|**2024-07-19**|**Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding**|Renshan Zhang et.al.|[2407.14439](http://arxiv.org/abs/2407.14439)|**[link](https://github.com/JiuTian-VL/TokenCorrCompressor)**|**Current mainstream multimodal large language models (MLLMs) for document understanding generally crop high-resolution document images into multiple sub-images. Most existing document understanding methods keep all tokens within the sub-images and treat them equally, which ignores their differing informativeness and leads to a large, unnecessary increase in the number of image tokens. To achieve more adaptive and efficient document understanding, we propose Token-level Correlation-guided Compression, a parameter-free, plug-and-play method for optimizing token processing. First, we introduce a new way of assessing pattern repetitiveness based on the correlation between the patch tokens. This identifies redundant tokens and thereby determines the information density of each sub-image. Second, we present a token-level sampling method that efficiently captures the most informative tokens by analyzing the correlation between the [CLS] token and the patch tokens. Combining these two strategies, we build an adaptive compression module that can be integrated seamlessly into MLLMs that use cropping. The module significantly speeds up processing during both training and inference while maintaining comparable performance. We conduct experiments with the state-of-the-art document understanding model mPLUG-DocOwl1.5 and verify the method's effectiveness through extensive comparisons with other compression methods.**|\n",
"2407.14402": "|**2024-07-19**|**The Vision of Autonomic Computing: Can LLMs Make It a Reality?**|Zhiyang Zhang et.al.|[2407.14402](http://arxiv.org/abs/2407.14402)|null|This paper revisits the Vision of Autonomic Computing (ACV), proposed over two decades ago, which envisions computing systems that manage themselves and adapt to changes in their environment, a goal that remains challenging today. Recent advances in large language models (LLMs) offer a possible way to address these challenges by drawing on their broad knowledge, language understanding, and task automation capabilities. The paper explores the feasibility of achieving autonomy in microservice management through an LLM-based multi-agent framework, and proposes a five-level taxonomy describing different levels of autonomous service maintenance. It also designs an online evaluation benchmark based on the \u201cSock Shop\u201d microservice demo project to assess the framework's performance. The results show that LLMs can substantially improve the detection and resolution of issues in microservice architectures, making significant progress toward Level 3 autonomy. This marks an important step in integrating LLMs into microservice management frameworks and paves the way for more adaptive, self-managing computing systems. To foster research and development in this area, the code will be made publicly available.|\n",
"2407.14371": "|**2024-07-19**|**Open Artificial Knowledge**|Vadim Borisov et.al.|[2407.14371](http://arxiv.org/abs/2407.14371)|null|The success of chat-based AI systems such as ChatGPT, Claude, and Gemini stems largely from large language models (LLMs) trained on massive datasets. However, acquiring high-quality, diverse, and ethically sourced data remains a major challenge. We introduce the Open Artificial Knowledge (OAK) dataset, a large-scale resource of over 500 million tokens (at the time of writing). OAK draws on an ensemble of state-of-the-art LLMs, including GPT4o, LLaMa3-70B, LLaMa3-8B, Mixtral-8x7B, Gemma-7B, and Gemma-2-9B, and uses the main Wikipedia categories to guide text generation, ensuring broad domain coverage while maintaining coherence and factual accuracy. The OAK dataset aims to foster the development of more capable and better-aligned language models, and to address key problems in LLM training such as data scarcity and privacy. The dataset is currently freely available at www.oakdataset.org.|\n",
"2407.14355": "|**2024-07-19**|**Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models**|Xuenan Xu 
et.al.|[2407.14355](http://arxiv.org/abs/2407.14355)|**[link](https://github.com/wsntxxn/attrenhzsac)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8fdb\u884c\u96f6\u6837\u672c\u97f3\u9891\u5206\u7c7b\uff0c\u5373\u8bc6\u522b\u548c\u5206\u7c7b\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ece\u672a\u89c1\u8fc7\u7684\u97f3\u9891\u7c7b\u522b\u3002\u6211\u4eec\u63d0\u8bae\u5217\u51fa\u4e00\u7cfb\u5217\u97f3\u9891\u5c5e\u6027\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u9886\u57df\u77e5\u8bc6\u4e3a\u6bcf\u4e2a\u7c7b\u522b\u751f\u6210\u8be6\u7ec6\u7684\u5c5e\u6027\u63cf\u8ff0\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u4f9d\u8d56\u7c7b\u522b\u6807\u7b7e\u6216\u7b80\u5355\u63cf\u8ff0\u7684\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u591a\u7ef4\u5ea6\u7684\u5185\u5728\u542c\u89c9\u5c5e\u6027\uff0c\u6355\u6349\u97f3\u9891\u7c7b\u522b\u7684\u4e0d\u540c\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\u6765\u589e\u5f3a\u57fa\u4e8e\u6587\u672c\u6807\u7b7e\u7684\u96f6\u6837\u672c\u5b66\u4e60\u3002\u6211\u4eec\u5728VGGSound\u548cAudioSet\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff08\u4ee3\u7801\u53ef\u8bbf\u95ee\uff1ahttps://www.github.com/wsntxxn/AttrEnhZsAc\uff09\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u96f6\u6837\u672c\u5206\u7c7b\u51c6\u786e\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u9ad8\u3002\u6d88\u878d\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u65e0\u8bba\u6a21\u578b\u67b6\u6784\u5982\u4f55\uff0c\u6027\u80fd\u589e\u5f3a\u90fd\u975e\u5e38\u7a33\u5065\u3002|\n", "2407.15850": "|**2024-07-22**|**AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description**|Junyu Xie 
et.al.|[2407.15850](http://arxiv.org/abs/2407.15850)|**[link](https://github.com/Jyxarthur/AutoAD-Zero)**|**\u6211\u4eec\u7684\u76ee\u6807\u662f\u65e0\u9700\u8bad\u7ec3\u5730\u751f\u6210\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u97f3\u9891\u63cf\u8ff0\uff08AD\uff09\u3002\u6211\u4eec\u5229\u7528\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5e76\u5f00\u53d1\u4e86\u89c6\u89c9\u548c\u6587\u672c\u63d0\u793a\u7b56\u7565\u6765\u5b8c\u6210\u8fd9\u9879\u4efb\u52a1\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u70b9\uff1a(i) \u6211\u4eec\u8bc1\u660e\uff0c\u5982\u679c\u901a\u8fc7\u89c6\u89c9\u6307\u793a\u76f4\u63a5\u63d0\u793aVLM\u63d0\u4f9b\u89d2\u8272\u4fe1\u606f\uff0cVLM\u53ef\u4ee5\u6210\u529f\u547d\u540d\u548c\u5f15\u7528\u89d2\u8272\uff0c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\uff1b(ii) \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8fc7\u7a0b\u6765\u751f\u6210AD\uff0c\u7b2c\u4e00\u9636\u6bb5\u8ba9VLM\u5168\u9762\u63cf\u8ff0\u89c6\u9891\uff0c\u7b2c\u4e8c\u9636\u6bb5\u4f7f\u7528LLM\u5c06\u5bc6\u96c6\u7684\u6587\u672c\u4fe1\u606f\u603b\u7ed3\u4e3a\u4e00\u4e2a\u7b80\u6d01\u7684AD\u53e5\u5b50\uff1b(iii) \u6211\u4eec\u5236\u5b9a\u4e86\u4e00\u4e2a\u65b0\u7684\u7535\u89c6\u97f3\u9891\u63cf\u8ff0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5AutoAD-Zero\u5728AD\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff08\u751a\u81f3\u4e0e\u4e00\u4e9b\u5728\u771f\u5b9eAD\u4e0a\u5fae\u8c03\u7684\u6a21\u578b\u76f8\u5339\u654c\uff09\uff0c\u5b9e\u73b0\u4e86\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u6700\u9ad8CRITIC\u8bc4\u5206\u3002**|\n", "2407.15847": "|**2024-07-22**|**LLMmap: Fingerprinting For Large Language Models**|Dario Pasquini 
et.al.|[2407.15847](http://arxiv.org/abs/2407.15847)|**[link](https://github.com/pasquini-dario/LLMmap)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u96c6\u6210\u5e94\u7528\u7684\u9996\u4ee3\u6307\u7eb9\u8bc6\u522b\u653b\u51fb\u5de5\u5177\u2014\u2014LLMmap\u3002\u8be5\u5de5\u5177\u91c7\u7528\u79ef\u6781\u7684\u6307\u7eb9\u8bc6\u522b\u7b56\u7565\uff0c\u901a\u8fc7\u5411\u5e94\u7528\u53d1\u9001\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u67e5\u8be2\u5e76\u5206\u6790\u54cd\u5e94\u4fe1\u606f\uff0c\u4ee5\u8bc6\u522b\u6240\u4f7f\u7528\u7684\u5177\u4f53LLM\u6a21\u578b\u3002\u4ec5\u97008\u6b21\u4ea4\u4e92\uff0cLLMmap\u5373\u53ef\u572895%\u4ee5\u4e0a\u7684\u51c6\u786e\u7387\u4e0b\u7cbe\u786e\u8bc6\u522b\u51faLLM\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0cLLMmap\u88ab\u8bbe\u8ba1\u5f97\u5177\u6709\u8de8\u4e0d\u540c\u5e94\u7528\u5c42\u7684\u9c81\u68d2\u6027\uff0c\u4f7f\u5176\u80fd\u591f\u8bc6\u522b\u5728\u5404\u79cd\u7cfb\u7edf\u63d0\u793a\u3001\u968f\u673a\u62bd\u6837\u8d85\u53c2\u6570\u4ee5\u53ca\u590d\u6742\u7684\u751f\u6210\u6846\u67b6\u5982RAG\u6216Chain-of-Thought\u7b49\u73af\u5883\u4e0b\u7684LLM\u6a21\u578b\u3002|\n", "2407.15841": "|**2024-07-22**|**SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models**|Mingze Xu 
et.al.|[2407.15841](http://arxiv.org/abs/2407.15841)|**[link](https://github.com/apple/ml-slowfast-llava)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cSlowFast-LLaVA\u201d\uff08\u6216\u7b80\u79f0\u4e3aSF-LLaVA\uff09\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u89c6\u9891\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5b83\u80fd\u591f\u540c\u65f6\u6355\u6349\u8be6\u7ec6\u7684\u7a7a\u95f4\u8bed\u4e49\u548c\u957f\u65f6\u5e8f\u4e0a\u4e0b\u6587\uff0c\u800c\u4e0d\u4f1a\u8d85\u51fa\u901a\u5e38\u4f7f\u7528\u7684LLM\u7684\u4ee4\u724c\u9884\u7b97\u3002\u8fd9\u4e00\u76ee\u6807\u901a\u8fc7\u4f7f\u7528\u89c6\u9891LLM\u8f93\u5165\u7684\u53cc\u6d41\u8bbe\u8ba1\u5b9e\u73b0\uff0c\u6709\u6548\u5730\u805a\u5408\u4e86\u4ece\u91c7\u6837\u89c6\u9891\u5e27\u4e2d\u63d0\u53d6\u7684\u7279\u5f81\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6162\u901f\u8def\u5f84\u4ee5\u8f83\u4f4e\u7684\u5e27\u7387\u63d0\u53d6\u5c3d\u53ef\u80fd\u591a\u7684\u7a7a\u95f4\u7ec6\u8282\u7684\u7279\u5f81\uff08\u4f8b\u5982\uff0c\u4ee524x24\u7684\u4ee4\u724c\uff09\uff0c\u800c\u5feb\u901f\u8def\u5f84\u5219\u4ee5\u8f83\u9ad8\u7684\u5e27\u7387\u64cd\u4f5c\uff0c\u4f46\u4f7f\u7528\u8f83\u5927\u7684\u7a7a\u95f4\u6c60\u5316\u6b65\u957f\uff08\u4f8b\u5982\uff0c\u4e0b\u91c7\u68376x\uff09\u6765\u5173\u6ce8\u8fd0\u52a8\u7ebf\u7d22\u3002\u56e0\u6b64\uff0c\u8fd9\u79cd\u8bbe\u8ba1\u5141\u8bb8\u6211\u4eec\u9002\u5f53\u5730\u6355\u83b7\u5bf9\u4e8e\u7406\u89e3\u89c6\u9891\u4e2d\u7684\u8be6\u7ec6\u4fe1\u606f\u6709\u76ca\u7684\u65f6\u7a7a\u7279\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cSF-LLaVA\u5728\u5404\u79cd\u89c6\u9891\u4efb\u52a1\u4e0a\u90fd\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u3002\u5728\u67d0\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5b83\u751a\u81f3\u4e0e\u5728\u89c6\u9891\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u7684\u6700\u5148\u8fdb\u7684\u89c6\u9891LLM\u5b9e\u73b0\u4e86\u76f8\u5f53\u6216\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2407.15838": 
"|**2024-07-22**|**MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity**|Yangzhou Liu et.al.|[2407.15838](http://arxiv.org/abs/2407.15838)|**[link](https://github.com/yuecao0119/mminstruct)**|\u5c3d\u7ba1\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u89c6\u89c9\u4efb\u52a1\u4e0a\u7684\u5fae\u8c03\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u4f46\u73b0\u6709\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u5b58\u5728\u4ee5\u4e0b\u5c40\u9650\u6027\uff1a 1. \u6307\u4ee4\u6ce8\u91ca\u8d28\u91cf\uff1a\u867d\u7136\u73b0\u6709\u7684\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u751f\u6210\u7684\u6307\u4ee4\u53ef\u80fd\u4ecd\u4f1a\u5305\u542b\u4e0d\u51c6\u786e\u6027\uff0c\u5982\u5e7b\u89c9\u73b0\u8c61\u3002 2. \u6307\u4ee4\u548c\u56fe\u50cf\u591a\u6837\u6027\uff1a\u6307\u4ee4\u7c7b\u578b\u8303\u56f4\u6709\u9650\u4ee5\u53ca\u56fe\u50cf\u6570\u636e\u7f3a\u4e4f\u591a\u6837\u6027\u53ef\u80fd\u4f1a\u5f71\u54cd\u6a21\u578b\u751f\u6210\u591a\u6837\u6027\u548c\u63a5\u8fd1\u771f\u5b9e\u4e16\u754c\u573a\u666f\u8f93\u51fa\u7684\u80fd\u529b\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6MMInstruct\uff0c\u5305\u542b\u6765\u81ea24\u4e2a\u9886\u57df\u5171\u8ba1973K\u6761\u6307\u4ee4\u3002\u8be5\u6570\u636e\u96c6\u5305\u62ec\u56db\u79cd\u6307\u4ee4\u7c7b\u578b\uff1a\u5224\u65ad\u3001\u591a\u9879\u9009\u62e9\u3001\u957f\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u548c\u77ed\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u3002 
\u4e3a\u4e86\u6784\u5efaMMInstruct\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6307\u4ee4\u751f\u6210\u6570\u636e\u5f15\u64ce\uff0c\u5229\u7528GPT-4V\u3001GPT-3.5\u548c\u4eba\u5de5\u6821\u6b63\u3002\u6211\u4eec\u7684\u6307\u4ee4\u751f\u6210\u5f15\u64ce\u5141\u8bb8\u534a\u81ea\u52a8\u3001\u4f4e\u6210\u672c\u3001\u591a\u9886\u57df\u7684\u6307\u4ee4\u751f\u6210\uff0c\u6210\u672c\u4ec5\u4e3a\u624b\u52a8\u6784\u5efa\u7684\u516d\u5206\u4e4b\u4e00\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u548c\u6d88\u878d\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86MMInstruct\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff0c\u4f8b\u5982\uff0c\u57fa\u4e8eMMInstruct\u7684\u6a21\u578b\u5fae\u8c03\u572812\u4e2a\u57fa\u51c6\u4e2d\u768410\u4e2a\u4e0a\u8fbe\u5230\u4e86\u65b0\u7684\u72b6\u6001\u6700\u4f18\u8868\u73b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728https://github.com/yuecao0119/MMInstruct\u63d0\u4f9b\u3002|\n", "2407.15835": "|**2024-07-22**|**dMel: Speech Tokenization made Simple**|He Bai 
et.al.|[2407.15835](http://arxiv.org/abs/2407.15835)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5229\u7528\u5927\u89c4\u6a21\u6587\u672c\u6570\u636e\u7684\u81ea\u6211\u76d1\u7763\u9884\u8bad\u7ec3\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u7684\u8fdb\u6b65\u3002\u53d7\u6b64\u6210\u529f\u542f\u53d1\uff0c\u7814\u7a76\u4eba\u5458\u63a2\u7d22\u4e86\u590d\u6742\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\uff0c\u4ee5\u5c06\u8fde\u7eed\u7684\u8bed\u97f3\u4fe1\u53f7\u79bb\u6563\u5316\uff0c\u4ece\u800c\u4f7f\u8bed\u8a00\u5efa\u6a21\u6280\u672f\u53ef\u4ee5\u5e94\u7528\u4e8e\u8bed\u97f3\u6570\u636e\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u5efa\u6a21\u8bed\u4e49\u4ee4\u724c\uff0c\u53ef\u80fd\u4f1a\u4e22\u5931\u58f0\u5b66\u4fe1\u606f\uff0c\u8981\u4e48\u5efa\u6a21\u58f0\u5b66\u4ee4\u724c\uff0c\u53c8\u53ef\u80fd\u9762\u4e34\u4e22\u5931\u8bed\u4e49\u4fe1\u606f\u7684\u98ce\u9669\u3002\u5177\u6709\u591a\u79cd\u4ee4\u724c\u7c7b\u578b\u4e5f\u4f7f\u67b6\u6784\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u9700\u8981\u989d\u5916\u7684\u9884\u8bad\u7ec3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5c06\u6885\u5c14\u6ee4\u6ce2\u5668\u901a\u9053\u79bb\u6563\u5316\u4e3a\u79bb\u6563\u5f3a\u5ea6\u5355\u5143\uff08dMel\uff09\u4ea7\u751f\u4e86\u4e00\u4e2a\u7b80\u5355\u8868\u793a\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5176\u4ed6\u73b0\u6709\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u3002\u4f7f\u7528\u4ec5\u89e3\u7801\u5668\u7684\u53d8\u6362\u5668\u67b6\u6784\u8fdb\u884c\u8bed\u97f3-\u6587\u672c\u5efa\u6a21\uff0c\u6211\u4eec\u5168\u9762\u8bc4\u4f30\u4e86\u4e0d\u540c\u7684\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u5728\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u548c\u8bed\u97f3\u5408\u6210\uff08TTS\uff09\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cdMel\u5728\u8054\u5408\u5efa\u6a21\u8bed\u97f3\u548c\u6587\u672c\u7684\u7edf\u4e00\u6846\u67b6\u4e2d\u5b9e\u73b0\u9ad8\u6027\u80fd\u7684\u6709\u6548\u6027\uff0c\u4e3a\
u9ad8\u6548\u4e14\u6709\u6548\u7684\u8bed\u97f3\u4e0e\u6587\u672c\u8054\u5408\u5efa\u6a21\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.15819": "|**2024-07-22**|**Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight**|Ziyuan Huang et.al.|[2407.15819](http://arxiv.org/abs/2407.15819)|null|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u94fe\u89c6\u56fe\u201d\u7684\u89c6\u89c9-\u8bed\u8a00\u6865\u6881\u6a21\u5757\uff0c\u65e8\u5728\u52a0\u901f\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u9884\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u5e8f\u5217\u5316\u7684\u89c6\u89c9\u91cd\u91c7\u6837\u5668\uff0c\u80fd\u591f\u6709\u6548\u5730\u6355\u6349\u4e0d\u540c\u7a7a\u95f4\u5c3a\u5ea6\u7684\u89c6\u89c9\u7ec6\u8282\u3002\u8fd9\u79cd\u67b6\u6784\u4e0d\u4ec5\u80fd\u591f\u6709\u6548\u5229\u7528\u5168\u5c40\u548c\u5c40\u90e8\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u8fd8\u901a\u8fc7\u590d\u5408\u4ee4\u724c\u7f29\u653e\u7b56\u7565\u7075\u6d3b\u6269\u5c55\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u6700\u591a\u53ef\u4ee5\u589e\u52a016\u500d\u7684\u4ee4\u724c\u6570\u91cf\uff0c\u800c\u65e0\u9700\u5728\u9884\u8bad\u7ec3\u540e\u8fdb\u884c\u5fae\u8c03\u3002\u56e0\u6b64\uff0c\u201c\u94fe\u89c6\u56fe\u201d\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u6240\u9700\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u8fdc\u5c11\u4e8e\u5fae\u8c03\u9636\u6bb5\uff0c\u8fd9\u6709\u610f\u5730\u51cf\u5c11\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u663e\u8457\u52a0\u901f\u4e86\u9884\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u8282\u7701\u4e86\u5927\u7ea673%\u7684\u5b9e\u9645\u8bad\u7ec3\u65f6\u95f4\u3002 
\u5728\u4e00\u7cfb\u5217\u89c6\u89c9-\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u201c\u94fe\u89c6\u56fe\u201d\u52a0\u901f\u9884\u8bad\u7ec3\u8fc7\u7a0b\u5e76\u4e0d\u4f1a\u727a\u7272\u6027\u80fd\uff0c\u5176\u8868\u73b0\u4e0e\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u6240\u6709\u89c6\u89c9\u4ee4\u724c\u7684\u6807\u51c6\u6d41\u7a0b\u76f8\u5f53\u6216\u66f4\u597d\u3002\u8fdb\u4e00\u6b65\u589e\u52a0\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u4f1a\u5bfc\u81f4\u66f4\u5f3a\u7684\u8868\u73b0\uff0c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u73b0\u6709\u65b9\u6cd5\u7ade\u4e89\u3002|\n", "2407.15788": "|**2024-07-22**|**Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach**|Rian Dolphin 
et.al.|[2407.15788](http://arxiv.org/abs/2407.15788)|null|\u91d1\u878d\u65b0\u95fb\u5728\u91d1\u878d\u5e02\u573a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u4f46\u5c06\u5176\u8f6c\u5316\u4e3a\u7ed3\u6784\u5316\u6570\u636e\u7684\u8fc7\u7a0b\u4e00\u76f4\u5145\u6ee1\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u91d1\u878d\u65b0\u95fb\u5904\u7406\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u514b\u670d\u4e86\u4ee5\u5f80\u63d0\u53d6\u7ed3\u6784\u5316\u4fe1\u606f\u65f6\u9047\u5230\u7684\u9650\u5236\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u5957\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u80fd\u591f\u4ece\u539f\u59cb\u65b0\u95fb\u6587\u7ae0\u5185\u5bb9\u4e2d\u63d0\u53d6\u76f8\u5173\u516c\u53f8\u4ee3\u7801\uff0c\u5e76\u5728\u4e0d\u4f9d\u8d56\u4e8e\u9884\u7ed3\u6784\u5316\u6570\u636e\u6d41\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u516c\u53f8\u5c42\u9762\u7684\u60c5\u7eea\u5206\u6790\u548c\u751f\u6210\u6458\u8981\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86LLMs\u7684\u751f\u6210\u80fd\u529b\u3001\u4ee5\u53ca\u6700\u65b0\u7684\u63d0\u793a\u6280\u672f\uff0c\u914d\u4ee5\u4e00\u4e2a\u5b9a\u5236\u7684\u5b57\u7b26\u4e32\u76f8\u4f3c\u5ea6\u9a8c\u8bc1\u6846\u67b6\u3002 
\u901a\u8fc7\u4f7f\u7528\u5305\u542b5530\u7bc7\u91d1\u878d\u65b0\u95fb\u6587\u7ae0\u7684\u6570\u636e\u96c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u76f8\u6bd4\u73b0\u6709\u6570\u636e\u63d0\u4f9b\u5546\uff0c\u6211\u4eec\u670990%\u7684\u6587\u7ae0\u4e0d\u4f1a\u9057\u6f0f\u4efb\u4f55\u516c\u53f8\u4ee3\u7801\uff0c\u800c\u670922%\u7684\u6587\u7ae0\u4f1a\u989d\u5916\u63d0\u4f9b\u76f8\u5173\u7684\u516c\u53f8\u4ee3\u7801\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u7684\u5927\u89c4\u6a21\u90e8\u7f72\uff0c\u5e76\u901a\u8fc7\u5b9e\u65f6API\u7aef\u70b9\u63d0\u4f9b\u4e86\u7ecf\u8fc7\u5904\u7406\u7684\u6570\u636e\uff0c\u66f4\u65b0\u4e86\u6700\u65b0\u65b0\u95fb\u4fe1\u606f\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u6211\u4eec\u9996\u6b21\u4f5c\u4e3a\u6570\u636e\u63d0\u4f9b\u5546\u63d0\u4f9b\u4ece\u65b0\u95fb\u6587\u7ae0\u4e2d\u5bf9\u6bcf\u4e2a\u516c\u53f8\u7684\u7ec6\u81f4\u60c5\u7eea\u5206\u6790\u670d\u52a1\uff0c\u589e\u5f3a\u4e86\u5e02\u573a\u53c2\u4e0e\u8005\u53ef\u83b7\u53d6\u7684\u4fe1\u606f\u6df1\u5ea6\u3002 \u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u5229\u7528\u91d1\u878d\u65b0\u95fb\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86\u5305\u542b5530\u7bc7\u5904\u7406\u540e\u6587\u7ae0\u7684\u8bc4\u4f30\u6570\u636e\u96c6\u3002|\n", "2407.15748": "|**2024-07-22**|**MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation**|Marco Simoni 
et.al.|[2407.15748](http://arxiv.org/abs/2407.15748)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86MoRSE\uff08\u6df7\u5408RAG\u5b89\u5168\u4e13\u5bb6\uff09\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u95e8\u7684AI\u804a\u5929\u673a\u5668\u4eba\u7528\u4e8e\u7f51\u7edc\u5b89\u5168\u3002MoRSE\u65e8\u5728\u63d0\u4f9b\u5168\u9762\u4e14\u5b8c\u6574\u7684\u7f51\u7edc\u5b89\u5168\u77e5\u8bc6\u3002MoRSE\u4f7f\u7528\u4e86\u4e24\u4e2aRAG\uff08\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff09\u7cfb\u7edf\uff0c\u8bbe\u8ba1\u7528\u4e8e\u4ece\u591a\u7ef4\u5ea6\u7684\u7f51\u7edc\u5b89\u5168\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u548c\u7ec4\u7ec7\u4fe1\u606f\u3002\u4e0e\u4f20\u7edf\u7684RAG\u4e0d\u540c\uff0cMoRSE\u91c7\u7528\u4e86\u5e76\u884c\u68c0\u7d22\u5668\u534f\u540c\u5de5\u4f5c\uff0c\u4ee5\u5728\u4e0d\u540c\u683c\u5f0f\u548c\u7ed3\u6784\u4e2d\u68c0\u7d22\u8bed\u4e49\u76f8\u5173\u7684\u4fe1\u606f\u3002 \u4e0d\u540c\u4e8e\u4f9d\u8d56\u53c2\u6570\u77e5\u8bc6\u5e93\u7684\u4f20\u7edf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cMoRSE\u54cd\u5e94\u7528\u6237\u67e5\u8be2\u65f6\u4ece\u975e\u53c2\u6570\u77e5\u8bc6\u5e93\u4e2d\u68c0\u7d22\u76f8\u5173\u6587\u6863\u3002\u968f\u540e\uff0cMoRSE\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002\u6b64\u5916\uff0cMoRSE\u53d7\u76ca\u4e8e\u5176\u77e5\u8bc6\u5e93\u7684\u5b9e\u65f6\u66f4\u65b0\uff0c\u8fd9\u4f7f\u5f97\u7cfb\u7edf\u80fd\u591f\u5728\u4e0d\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u6301\u7eed\u7684\u77e5\u8bc6\u4e30\u5bcc\u3002 \u6211\u4eec\u5bf9MoRSE\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u9488\u5bf9600\u4e2a\u7279\u5b9a\u7684\u7f51\u7edc\u5b89\u5168\u95ee\u9898\u8fdb\u884c\u4e86\u5b9e\u9a8c\u6027\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u3001Mixtral 
7x8\u7b49\u5df2\u77e5\u89e3\u51b3\u65b9\u6848\u76f8\u6bd4\uff0c\u5728\u7b54\u6848\u7684\u76f8\u5173\u6027\u548c\u6b63\u786e\u6027\u7684\u6539\u8fdb\u4e0a\u8d85\u8fc7\u4e8610%\u3002|\n", "2407.15736": "|**2024-07-22**|**OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context**|Steffen Kleinle et.al.|[2407.15736](http://arxiv.org/abs/2407.15736)|null|\u5f53\u79fb\u6c11\u5230\u4e00\u4e2a\u65b0\u7684\u56fd\u5bb6\u65f6\uff0c\u4eba\u4eec\u5f88\u5bb9\u6613\u56e0\u9700\u8981\u83b7\u53d6\u6709\u5173\u8d22\u653f\u652f\u6301\u3001\u4f4f\u623f\u3001\u6559\u80b2\u3001\u8bed\u8a00\u8bfe\u7a0b\u4ee5\u53ca\u5176\u4ed6\u95ee\u9898\u7684\u4fe1\u606f\u800c\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5982\u679c\u642c\u8fc1\u8fc7\u7a0b\u5306\u5fd9\u6216\u751a\u81f3\u88ab\u8feb\u8fdb\u884c\uff0c\u5bf9\u8fd9\u4e9b\u95ee\u9898\u7684\u9ad8\u8d28\u91cf\u89e3\u7b54\u53d8\u5f97\u5c24\u4e3a\u8feb\u5207\u3002\u5b98\u65b9\u79fb\u6c11\u987e\u95ee\u901a\u5e38\u8fc7\u4e8e\u7e41\u5fd9\uff0c\u800c\u5728\u7ebf\u7cfb\u7edf\u53ef\u4ee5\u5f15\u5bfc\u65b0\u79fb\u6c11\u627e\u5230\u6240\u9700\u4fe1\u606f\u6216\u5408\u9002\u7684\u54a8\u8be2\u670d\u52a1\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86OMoS-QA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u5fb7\u8bed\u548c\u82f1\u8bed\u95ee\u9898\u4e0e\u76f8\u5173\u53ef\u4fe1\u6587\u6863\u4ee5\u53ca\u624b\u52a8\u6807\u6ce8\u7684\u7b54\u6848\uff0c\u4e13\u95e8\u9488\u5bf9\u8fd9\u4e00\u573a\u666f\u3002\u95ee\u9898\u662f\u7531\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u751f\u6210\u7684\uff0c\u7b54\u6848\u53e5\u5b50\u7531\u5177\u6709\u9ad8\u5ea6\u4e00\u81f4\u6027\u7684\u4f17\u5305\u5de5\u4f5c\u8005\u9009\u62e9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6570\u636e\uff0c\u6211\u4eec\u5728\u5fb7\u8bed\u548c\u82f1\u8bed\u4e0a\u5bf95\u4e2a\u9884\u8bad\u7ec3\u7684LLM\u8fdb\u884c\u4e86\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\u7684\u6bd4\u8f83\u3002\u5728\u6240\u6709\u6a21\u578b\u548c\u4e24\u79cd\u8bed\u8a
00\u4e2d\uff0c\u9009\u62e9\u7b54\u6848\u53e5\u5b50\u7684\u7cbe\u786e\u5ea6\u9ad8\uff0c\u53ec\u56de\u7387\u4f4e\u81f3\u4e2d\u7b49\uff0c\u8fd9\u662f\u4e00\u4e2a\u6709\u5229\u7684\u6743\u8861\uff0c\u4ee5\u907f\u514d\u8bef\u5bfc\u7528\u6237\u3002\u8fd9\u79cd\u6027\u80fd\u5373\u4f7f\u5728\u95ee\u9898\u8bed\u8a00\u4e0e\u6587\u6863\u8bed\u8a00\u4e0d\u5339\u914d\u65f6\u4e5f\u80fd\u4fdd\u6301\u4e0d\u53d8\u3002\u5728\u6839\u636e\u4e0a\u4e0b\u6587\u8bc6\u522b\u4e0d\u53ef\u56de\u7b54\u7684\u95ee\u9898\u65b9\u9762\uff0c\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u5b58\u5728\u66f4\u5927\u7684\u5dee\u5f02\u3002|\n", "2407.15734": "|**2024-07-22**|**TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON**|John Chong Min Tan et.al.|[2407.15734](http://arxiv.org/abs/2407.15734)|**[link](https://github.com/simbianai/taskgen)**|TaskGen\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u901a\u8fc7\u4f7f\u7528\u4ee3\u7406\u6765\u89e3\u51b3\u4efb\u610f\u4efb\u52a1\u5e76\u5c06\u5176\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\u6765\u5b9e\u73b0\u3002\u6bcf\u4e2a\u5b50\u4efb\u52a1\u88ab\u6620\u5c04\u5230\u4e00\u4e2a\u88c5\u5907\u51fd\u6570\u6216\u53e6\u4e00\u4e2a\u4ee3\u7406\u6267\u884c\u3002\u4e3a\u4e86\u51cf\u5c11\u5197\u4f59\uff08\u4ece\u800c\u51cf\u5c11\u4ee4\u724c\u4f7f\u7528\uff09\uff0cTaskGen\u4f7f\u7528\u4e86StrictJSON\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u7684JSON\u683c\u5f0f\uff0c\u5e76\u5177\u5907\u7c7b\u578b\u68c0\u67e5\u548c\u8fed\u4ee3\u9519\u8bef\u4fee\u6b63\u7b49\u989d\u5916\u529f\u80fd\u3002TaskGen\u7684\u6838\u5fc3\u7406\u5ff5\u5728\u4e8e\u57fa\u4e8e\u9700\u6c42\u7ba1\u7406\u4fe1\u606f/\u8bb0\u5fc6\u3002 
\u6211\u4eec\u5bf9TaskGen\u5728\u5404\u79cd\u73af\u5883\u4e2d\u8fdb\u884c\u4e86\u5b9e\u8bc1\u8bc4\u4f30\uff0c\u5305\u62ec40x40\u52a8\u6001\u8ff7\u5bab\u5bfc\u822a\uff0c\u5176\u4e2d\u969c\u788d\u7269\u4f4d\u7f6e\u4f1a\u53d8\u5316\uff08100%\u7684\u6210\u529f\u7387\uff09\uff0c\u6587\u672c\u4e16\u754c\u9003\u8131\u623f\u95f4\u89e3\u8c1c\uff0c\u5177\u6709\u5bc6\u96c6\u5956\u52b1\u548c\u8be6\u7ec6\u76ee\u6807\uff0896%\u7684\u6210\u529f\u7387\uff09\uff0c\u7f51\u7edc\u6d4f\u89c8\uff0869%\u7684\u52a8\u4f5c\u6210\u529f\uff09\uff0c\u89e3\u51b3MATH\u6570\u636e\u96c6\uff08\u5728100\u4e2aLevel-5\u95ee\u9898\u4e0a\uff0c\u6210\u529f\u738771%\uff09\uff0c\u4ee5\u53ca\u81ea\u7136\u95ee\u9898\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08F1\u5206\u6570\u4e3a47.03%\uff09\u3002|\n", "2407.16686": "|**2024-07-23**|**Can Large Language Models Automatically Jailbreak GPT-4V?**|Yuanwei Wu et.al.|[2407.16686](http://arxiv.org/abs/2407.16686)|null|GPT-4V\u56e0\u5176\u5728\u6574\u5408\u548c\u5904\u7406\u591a\u6a21\u6001\u4fe1\u606f\u65b9\u9762\u7684\u5353\u8d8a\u80fd\u529b\u800c\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u3002\u540c\u65f6\uff0c\u5176\u9762\u90e8\u8bc6\u522b\u529f\u80fd\u4e5f\u5f15\u53d1\u4e86\u9690\u79c1\u6cc4\u9732\u7684\u5b89\u5168\u62c5\u5fe7\u3002\u5c3d\u7ba1\u7814\u7a76\u8005\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u4e0e\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6216\u9884\u5904\u7406\u8fc7\u6ee4\u5668\u7b49\u624b\u6bb5\u52aa\u529b\u5b9e\u73b0\u5b89\u5168\u5bf9\u9f50\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u5b58\u5728\u88ab\u5229\u7528\u7684\u6f0f\u6d1e\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AutoJailbreak\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u81ea\u52a8\u8d8a\u72f1\u6280\u672f\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u63d0\u793a\u4f18\u5316\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u7ea2\u961f\u8bad\u7ec3\uff0c\u4ee5\u7cbe\u70bc\u8d8a\u72f1\u63d0\u793a\uff0c\u5e76\u91c7\u7528\u5f31\u5230\u5f3a
\u7684\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u63d0\u793a\u6765\u63d0\u9ad8\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u7ed3\u5408\u65e9\u671f\u505c\u6b62\u7b56\u7565\uff0c\u4ee5\u6700\u5c0f\u5316\u4f18\u5316\u65f6\u95f4\u548c\u4ee4\u724c\u6d88\u8017\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoJailbreak\u663e\u8457\u8d85\u8d8a\u4f20\u7edf\u65b9\u6cd5\uff0c\u5b9e\u73b0\u4e86\u8d85\u8fc795.3%\u7684\u6210\u529f\u653b\u51fb\u7387\uff08ASR\uff09\u3002\u8fd9\u9879\u7814\u7a76\u63ed\u793a\u4e86\u52a0\u5f3aGPT-4V\u5b89\u5168\u6027\u7684\u6f5c\u529b\uff0c\u7a81\u663e\u4e86LLMs\u53ef\u80fd\u88ab\u7528\u4e8e\u7834\u574fGPT-4V\u5b8c\u6574\u6027\u7684\u98ce\u9669\u3002|\n", "2407.16667": "|**2024-07-23**|**RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent**|Huiyu Xu et.al.|[2407.16667](http://arxiv.org/abs/2407.16667)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u5df2\u88ab\u96c6\u6210\u81f3\u8bf8\u591a\u5b9e\u9645\u5e94\u7528\uff0c\u4f8b\u5982\u4ee3\u7801\u52a9\u624bCopilot\u3002\u8fd9\u4e9b\u96c6\u6210\u663e\u8457\u6269\u5c55\u4e86LLM\u7684\u653b\u51fb\u9762\uff0c\u4f7f\u5176\u9762\u4e34\u591a\u79cd\u5a01\u80c1\u3002\u5176\u4e2d\uff0c\u901a\u8fc7\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u8bf1\u5bfc\u51fa\u6bd2\u6027\u54cd\u5e94\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u5f15\u8d77\u4e86\u5b89\u5168\u9886\u57df\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u8bc6\u522b\u8fd9\u4e9b\u5a01\u80c1\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7ea2\u961f\u7b56\u7565\u901a\u8fc7\u6784\u5efa\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u6765\u6a21\u62df\u6f5c\u5728\u7684\u5bf9\u6297\u573a\u666f\uff0c\u4ee5\u6b64\u6d4b\u8bd5\u76ee\u6807LLM\u3002\u7136\u800c\uff0c\u73b0\u6709\u7ea2\u961f\u7b56\u7565\u5e76\u672a\u8003\u8651LLM\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u7684\u72ec\u7279\u8106\u5f31\u6027\uff0c\u4f7f\u5f97\u6784\u5efa\u9488\u5bf9\u7279\u5b9a\u60c5\
u5883\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u53d8\u5f97\u56f0\u96be\u3002\u540c\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u4f9d\u8d56\u4e8e\u5c11\u6570\u53d8\u5f02\u64cd\u4f5c\u5bf9\u201c\u8d8a\u72f1\u201d\u6a21\u677f\u8fdb\u884c\u7ec6\u5316\uff0c\u7f3a\u4e4f\u9002\u5e94\u4e0d\u540c\u60c5\u5883\u7684\u81ea\u52a8\u5316\u548c\u89c4\u6a21\u5316\u80fd\u529b\u3002 \u4e3a\u4e86\u5b9e\u73b0\u60c5\u5883\u611f\u77e5\u548c\u9ad8\u6548\u7ea2\u961f\u7b56\u7565\uff0c\u6211\u4eec\u62bd\u8c61\u5e76\u5efa\u6a21\u73b0\u6709\u653b\u51fb\u884c\u4e3a\u4e3a\u4e00\u4e2a\u7edf\u4e00\u6982\u5ff5\u2014\u2014\u201c\u8d8a\u72f1\u7b56\u7565\u201d\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u667a\u80fd\u4f53LLM\u7cfb\u7edfRedAgent\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u751f\u6210\u60c5\u5883\u611f\u77e5\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u5e76\u901a\u8fc7\u989d\u5916\u7684\u8bb0\u5fc6\u7f13\u51b2\u533a\u81ea\u6211\u53cd\u601d\u60c5\u5883\u53cd\u9988\uff0c\u6301\u7eed\u5b66\u4e60\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u5728\u7279\u5b9a\u60c5\u5883\u4e0b\u5b9e\u73b0\u6709\u6548\u201c\u8d8a\u72f1\u201d\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u53ef\u4ee5\u5728\u4e94\u4e2a\u67e5\u8be2\u5185\u6210\u529f\u201c\u8d8a\u72f1\u201d\u5927\u591a\u6570\u9ed1\u76d2LLM\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u7ea2\u961f\u65b9\u6cd5\u6548\u7387\u63d0\u5347\u4e24\u500d\u3002\u6b64\u5916\uff0cRedAgent\u80fd\u591f\u66f4\u9ad8\u6548\u5730\u9488\u5bf9\u5b9a\u5236\u5316\u7684LLM\u5e94\u7528\u8fdb\u884c\u201c\u8d8a\u72f1\u201d\u3002 
\u901a\u8fc7\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u5e94\u7528\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u6211\u4eec\u53d1\u73b0\u4e8660\u4e2a\u4e25\u91cd\u6f0f\u6d1e\u5b58\u5728\u4e8e\u5b9e\u9645\u5e94\u7528\u4e2d\u7684GPTs\u4e0a\uff0c\u4ec5\u9700\u6bcf\u6f0f\u6d1e\u4e24\u6b21\u67e5\u8be2\u3002\u6211\u4eec\u5df2\u62a5\u544a\u6240\u6709\u53d1\u73b0\u7684\u95ee\u9898\uff0c\u5e76\u4e0eOpenAI\u548cMeta\u8fdb\u884c\u4e86\u6c9f\u901a\u4ee5\u4fee\u590d\u6f0f\u6d1e\u3002|\n", "2407.16637": "|**2024-07-23**|**Course-Correction: Safety Alignment Using Synthetic Preferences**|Rongwu Xu et.al.|[2407.16637](http://arxiv.org/abs/2407.16637)|**[link](https://github.com/pillowsofwind/course-correction)**|\u672c\u6587\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u201c\u8bfe\u7a0b\u7ea0\u6b63\u201d\u4efb\u52a1\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u4e00\u9879\u7cfb\u7edf\u6027\u7814\u7a76\uff0c\u5373\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u5730\u907f\u514d\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u9996\u5148\uff0c\u6211\u4eec\u5f15\u5165\u4e86\\textsc{C$^2$-Eval}\u57fa\u51c6\u7528\u4e8e\u5b9a\u91cf\u8bc4\u4f30\uff0c\u5e76\u5206\u6790\u4e8610\u4e2a\u6d41\u884cLLM\u7684\u6027\u80fd\uff0c\u63ed\u793a\u4e86\u5f53\u524d\u5b89\u5168\u8c03\u4f18\u7684LLM\u5728\u8bfe\u7a0b\u7ea0\u6b63\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u504f\u597d\u5b66\u4e60\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u5f3a\u8c03\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u91cd\u8981\u6027\u3002\u901a\u8fc7\u81ea\u52a8\u5316\u6d41\u7a0b\uff0c\u6211\u4eec\u521b\u5efa\u4e86\\textsc{C$^2$-Syn}\u5408\u6210\u6570\u636e\u96c6\uff0c\u5305\u542b75\u4e07\u5bf9\u504f\u597d\uff0c\u4ee5\u6b64\u901a\u8fc7\u6570\u636e\u9a71\u52a8\u7684\u504f\u597d\u5b66\u4e60\u6559\u6388\u6a21\u578b\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u6982\u5ff5\u3002\u5728\\textsc{Llama2-Cha
t 7B}\u548c\\textsc{Qwen2 7B}\u4e24\u4e2aLLM\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u63d0\u9ad8\u4e86\u8bfe\u7a0b\u7ea0\u6b63\u80fd\u529b\uff0c\u540c\u65f6\u4e0d\u5f71\u54cd\u603b\u4f53\u6027\u80fd\uff0c\u5e76\u4e14\u7279\u522b\u6709\u6548\u5730\u63d0\u5347\u4e86LLM\u7684\u5b89\u5168\u6027\uff0c\u5c24\u5176\u662f\u62b5\u6297\u9003\u8131\u653b\u51fb\u7684\u80fd\u529b\u3002|\n", "2407.16615": "|**2024-07-23**|**Lawma: The Power of Specialization for Legal Tasks**|Ricardo Dominguez-Olmedo et.al.|[2407.16615](http://arxiv.org/abs/2407.16615)|null|\u6cd5\u5f8b\u6587\u672c\u7684\u6ce8\u91ca\u4e0e\u5206\u7c7b\u662f\u5b9e\u8bc1\u6cd5\u5b66\u7814\u7a76\u7684\u6838\u5fc3\u90e8\u5206\u3002\u4f20\u7edf\u4e0a\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5f80\u5f80\u7531\u53d7\u8fc7\u8bad\u7ec3\u7684\u7814\u7a76\u52a9\u7406\u627f\u62c5\u3002\u5728\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u8fdb\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5b9e\u8bc1\u6cd5\u5f8b\u5b66\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u8f6c\u5411\u4f7f\u7528\u5546\u4e1a\u6a21\u578b\uff0c\u5e0c\u671b\u4ee5\u6b64\u51cf\u8f7b\u4eba\u5de5\u6807\u6ce8\u7684\u5de8\u5927\u6210\u672c\u3002\u5c3d\u7ba1\u8fd9\u7c7b\u65b9\u6cd5\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4f46\u5173\u4e8e\u5982\u4f55\u6700\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6cd5\u5f8b\u4efb\u52a1\u7684\u76f8\u5173\u7814\u7a76\u4ecd\u7136\u6709\u9650\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u7814\u7a76\uff0c\u6db5\u76d6\u4e86\u51e0\u4e4e\u5168\u90e8\u9488\u5bf9\u673a\u5668\u5b66\u4e60\u793e\u533a\u7684\u65b0\u6cd5\u5f8b\u6587\u672c\u5206\u7c7b\u4efb\u52a1\u3002\u4eceGPT-4\u4f5c\u4e3a\u57fa\u51c6\u5f00\u59cb\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u5728\u96f6\u6837\u672c\u51c6\u786e\u5ea6\u4e0a\u7684\u8868\u73b0\u5177\u6709\u975e\u540c\u5bfb\u5e38\u4f46\u9ad8\u5ea6\u591a\u53d8\u6027\uff0c\u7ecf\u5e38\u8868\u73b0\u51fa\u53ef\u80fd\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u6cd5\u5f8b\u5de5\u4f5c\u9700\u6c42\u7684\u6027\u80fd\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8f7b\u5ea6\u5fae\u8c03\u540e\u7684Llama 3\u6a21\u578b\u5728\u51e0\u4e4e\u6240\u6709\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u5747\u8fdc\u8d85GPT-4\uff0c\u901a\u5e38\u63d0\u9ad8\u4e86\u4e24\u4f4d\u6570\u767e\u5206\u70b9\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u66f4\u5927\u7684\u6a21\u578b\u5728\u5fae\u8c03\u65f6\u54cd\u5e94\u6548\u679c\u66f4\u597d\u3002\u51e0\u5341\u5230\u51e0\u767e\u4e2a\u793a\u4f8b\u8db3\u4ee5\u5b9e\u73b0\u9ad8\u5206\u7c7b\u51c6\u786e\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u53ef\u4ee5\u5728\u6240\u6709260\u4e2a\u4efb\u52a1\u4e0a\u540c\u65f6\u5fae\u8c03\u4e00\u4e2a\u6a21\u578b\uff0c\u76f8\u5bf9\u4e8e\u4e3a\u6bcf\u4e2a\u4efb\u52a1\u5355\u72ec\u521b\u5efa\u6a21\u578b\uff0c\u4ec5\u5728\u51c6\u786e\u6027\u65b9\u9762\u7565\u6709\u635f\u5931\u3002 \u6211\u4eec\u7684\u5de5\u4f5c\u6307\u51fa\u4e86\u66ff\u4ee3\u73b0\u6709\u505a\u6cd5\u7684\u4e00\u79cd\u53ef\u884c\u9009\u62e9\u3002\u5bf9\u4e8e\u5177\u5907\u4e00\u5b9a\u6807\u6ce8\u6570\u636e\u7684\u7279\u5b9a\u6cd5\u5f8b\u4efb\u52a1\uff0c\u7814\u7a76\u4eba\u5458\u66f4\u5e94\u8003\u8651\u4f7f\u7528\u5f00\u6e90\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u3002|\n", "2407.16604": "|**2024-07-23**|**Shared Imagination: LLMs Hallucinate Alike**|Yilun Zhou 
et.al.|[2407.16604](http://arxiv.org/abs/2407.16604)|null|Despite the recent proliferation of large language models (LLMs), their training recipes (model architecture, pre-training data, and optimization algorithm) are often very similar. This naturally raises the question of how similar the resulting models are. This paper proposes a novel setting, imaginary question answering (IQA), to better understand model similarity. In IQA, we ask one model to generate purely fictional questions (for example, about completely made-up concepts in physics) and prompt another model to answer them. Surprisingly, even though the questions are entirely fictional, all models can successfully answer each other's questions, suggesting a \u201cshared imagination space\u201d in which these models operate during such hallucinations. We conduct a series of investigations into this phenomenon and discuss its implications for model homogeneity, hallucination, and computational creativity.|\n", 
"2407.16576": "|**2024-07-23**|**Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs**|Yifan Xia 
et.al.|[2407.16576](http://arxiv.org/abs/2407.16576)|null|This paper examines the challenges and opportunities that large language models (LLMs) face in detecting cryptographic API misuse. Although automated detection has advanced, its precision drops on complex targets, mainly because it relies on manually defined patterns. LLMs, with their contextual understanding, show great potential in this critical security domain. Applying LLMs here is challenging, however, particularly because of the instability caused by their inherent randomness and well-known hallucination problem. To systematically evaluate the reliability of LLMs in detecting cryptographic misuse and to explore potential solutions, this paper proposes a comprehensive evaluation framework that uses a large-scale dataset covering both manually crafted samples and real-world projects. Through an in-depth analysis of 11,940 LLM-generated reports, we show that the inherent instability of LLMs is pervasive, leading to more than half of the reports being false positives. We nevertheless demonstrate that constraining the problem scope and combining it with the self-correction ability of LLMs significantly improves detection reliability. The optimized approach achieves a detection rate of nearly 90%, surpassing traditional methods, and uncovers previously unknown misuses in existing benchmarks. We also identify failure patterns that persistently hinder LLM reliability, including insufficient cryptographic knowledge and misinterpretation of code semantics. Based on these insights, we develop an LLM-based workflow to examine open-source repositories, ultimately finding 63 real-world cryptographic misuses. Of these, 46 have been acknowledged by the development community, 23 are being addressed, and 6 have been resolved. Taking developer feedback into account, we offer recommendations for future research and for the development of LLM-based security tools.|\n", 
"2407.16565": "|**2024-07-23**|**Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models**|Ioana Buhnila 
et.al.|[2407.16565](http://arxiv.org/abs/2407.16565)|**[link](https://github.com/ATILF-UMR7118/pRAGe)**|Large language models (LLMs) have recently become far more accessible to the general public, which can lead to untrackable use of such models for medical advice. Language generation with LLMs has two key problems: first, the models are prone to faulty reasoning, so any medical use requires scientific and factual grounding; second, their enormous size poses a major challenge for computational resources. This work introduces pRAGe, a pipeline for retrieval-augmented generation and evaluation with small language models (SLMs), aimed at generating medical paraphrases in French. We study the effectiveness of small language models and the impact of an external knowledge base on medical paraphrase generation.|\n", 
"2407.16557": "|**2024-07-23**|**Patched RTC: evaluating LLMs for diverse software development tasks**|Asankhaya Sharma et.al.|[2407.16557](http://arxiv.org/abs/2407.16557)|**[link](https://github.com/codelion/optillm/blob/main/rto.py)**|This paper introduces Patched Round-Trip Correctness (Patched RTC), a new evaluation technique for large language models (LLMs) applied to diverse software development tasks, particularly \u201couter loop\u201d activities such as bug fixing, code review, and documentation updates. Patched RTC extends the original round-trip correctness method to work with any LLM and downstream task, providing a self-evaluating framework that measures the consistency and robustness of model responses without human intervention. The study shows a correlation between Patched RTC scores and task-specific accuracy metrics, presenting it as an alternative to the LLM-as-Judge paradigm for open-domain task evaluation. We implement Patched RTC in an open-source framework called patchwork, enabling transparent evaluation of different software development tasks across various patchflows. Experiments comparing GPT-3.5 and GPT-4 models on different software development tasks show that Patched RTC effectively distinguishes model performance and task difficulty. The paper also explores the effect of consistency prompts on model accuracy, suggesting that Patched RTC can guide prompt refinement and model selection for complex software development workflows.|\n", 
"2407.16552": "|**2024-07-24**|**MicroEmo: Time-Sensitive Multimodal Emotion Recognition 
with Micro-Expression Dynamics in Video Dialogues**|Liyun Zhang et.al.|[2407.16552](http://arxiv.org/abs/2407.16552)|null|Multimodal large language models (MLLMs) have demonstrated remarkable multimodal emotion recognition capabilities, integrating visual, acoustic, and linguistic cues in video to recognize human emotional states. However, existing methods neglect the local, temporally dynamic features of facial micro-expressions and the contextual dependencies of utterance-aware segments in the video, which limits their effectiveness to some extent. We therefore propose MicroEmo, a time-sensitive MLLM that directs attention to the temporal dynamics of facial micro-expressions and the contextual dependencies of utterance-aware video segments. Our model makes two key architectural contributions: (1) a global-local attention visual encoder that combines global frame-level, timestamp-bound image features with local, temporally dynamic features of facial micro-expressions, effectively fusing global and local information; and (2) an utterance-aware video Q-Former that captures multi-scale contextual dependencies by generating visual token sequences for each utterance segment and for the entire video and then combining them. Preliminary qualitative experiments show that, on the new Explainable Multimodal Emotion Recognition (EMER) task, which uses multimodal and multi-faceted cues to predict emotions in an open-vocabulary (OV) manner, MicroEmo is effective compared with the latest methods.|\n", 
"2407.16521": "|**2024-07-23**|**AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game**|Yizhou Chi et.al.|[2407.16521](http://arxiv.org/abs/2407.16521)|null|Strategic social deduction games are a valuable testbed for evaluating the understanding and reasoning abilities of language models, with important value for social science research, artificial intelligence, and strategic gaming. This paper focuses on building proxies of human behavior in simulated environments, using Among Us as a tool for studying simulated human behavior. We create a text-based game environment, named AmongAgents, that replicates the dynamics of Among Us. Players act as crew members aboard a spaceship and are tasked with identifying the impostors who sabotage the ship and eliminate crew members. Within this environment, we analyze the behavior of simulated language agents. The experiments cover diverse game sequences with different configurations of crewmate and impostor personality archetypes. Our work shows that state-of-the-art large language models can effectively grasp the game rules and make decisions based on the current context. This work aims to encourage further exploration of language model performance in goal-oriented games with incomplete information and complex action spaces, settings that offer a valuable opportunity to assess language models in socially driven scenarios.|\n", 
"2407.17469": "|**2024-07-24**|**I Could've Asked That: Reformulating Unanswerable Questions**|Wenting Zhao 
et.al.|[2407.17469](http://arxiv.org/abs/2407.17469)|**[link](https://github.com/wenting-zhao/couldask)**|When seeking information from unfamiliar documents, users frequently pose questions that the documents cannot answer. Existing large language models (LLMs) can identify these unanswerable questions, but they do not help users reformulate them, which reduces their overall utility. We curate CouldAsk, an evaluation benchmark for document-grounded question answering designed to study the reformulation of unanswerable questions. The benchmark comprises both existing and new datasets. We evaluate state-of-the-art open-source and proprietary LLMs on CouldAsk. The results show that these models have limited ability to reformulate questions: GPT-4 and Llama2-7B successfully reformulate only 26% and 12% of questions, respectively. Error analysis shows that 62% of the unsuccessful reformulations occur because the model merely rephrases the question or even generates an identical one. We publicly release the benchmark and the code needed to reproduce the experiments.|\n", 
"2407.17468": "|**2024-07-24**|**WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries**|Wenting Zhao 
et.al.|[2407.17468](http://arxiv.org/abs/2407.17468)|null|While hallucination in large language models (LLMs) is pervasive, existing factuality benchmarks do not cover the diverse knowledge domains in which real-world users seek information. To fill this gap, we introduce WildHallucinations, a benchmark for evaluating factuality. It does so by prompting LLMs to generate information about entities drawn from in-the-wild user-chatbot conversations. The generations are then automatically fact-checked against a systematically organized knowledge source collected from web search. Notably, more than half of these real-world entities have no associated Wikipedia page. We evaluate 118,785 generations from 15 LLMs on 7,919 entities. We find that LLMs hallucinate more on entities without Wikipedia pages and that hallucination rates vary across domains. Finally, with the same underlying model, adding a retrieval component only slightly reduces hallucinations and does not eliminate them.|\n", 
"2407.17467": "|**2024-07-24**|**CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models**|Jiawei Gu 
et.al.|[2407.17467](http://arxiv.org/abs/2407.17467)|null|Large language models (LLMs) excel at a wide range of tasks but often perform poorly in specialized fields because domain-specific or proprietary corpora are lacking. Continual pre-training (CPT) enhances LLM capabilities by injecting new domain-specific knowledge while replaying general corpora to prevent catastrophic forgetting. However, the mixture ratio of general and domain-specific data is usually chosen heuristically, which leads to inefficient training in practice. Against this background, we revisit the scaling behavior of LLMs under CPT and discover a power-law relationship among loss, mixture ratio, and the scale of training tokens. We formalize the trade-off between general and domain-specific capabilities, which leads to a well-defined Critical Mixture Ratio (CMR) of general and domain data. By striking this balance, CMR preserves the model's general ability while achieving the desired domain transfer, ensuring the fullest use of available resources. CMR can therefore be regarded as the optimal mixture ratio when the balance between efficiency and effectiveness matters. Through extensive experiments, we confirm the predictability of CMR, propose the CMR scaling law, and validate its generality. These findings offer practical guidance for optimizing LLM training in specialized domains, maintaining general performance and achieving domain-specific performance while managing training resources efficiently.|\n", 
"2407.17453": "|**2024-07-24**|**$VILA^2$: VILA Augmented VILA**|Yunhao Fang et.al.|[2407.17453](http://arxiv.org/abs/2407.17453)|null|Visual language models (VLMs) have advanced rapidly, driven by the success of large language models (LLMs). While model architectures and training infrastructure progress quickly, data collection and curation remain under-explored. When data quantity and quality become the bottleneck, existing approaches either crawl more raw data directly from the Internet, with no guarantee of quality, or distill data from black-box commercial models (e.g., GPT-4V/Gemini), which caps performance at the level of that model. This paper proposes a novel approach comprising a self-augment step and a specialist-augment step to iteratively improve data quality and model performance. In the self-augment step, a VLM regenerates its own pre-training data to improve data quality and is then retrained on this refined dataset to improve model performance. This process can be repeated for several rounds. Once self-augmentation saturates, we employ several specialist VLMs, fine-tuned from the self-augmented VLM and equipped with domain-specific expertise, to further infuse specialist knowledge into the generalist model through task-oriented regeneration and retraining. With the combined self-augmented and specialist-augmented training, we introduce VILA\u00b2 (VILA-augmented-VILA), a family of models that consistently improves accuracy on a wide range of tasks over prior work and achieves new state-of-the-art results on the MMMU leaderboard among open-source models.|\n", 
"2407.17417": "|**2024-07-24**|**Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?**|Michael-Andrei Panaitescu-Liess 
et.al.|[2407.17417](http://arxiv.org/abs/2407.17417)|null|This paper first investigates the effectiveness of watermarking large language models (LLMs) as a deterrent against generating copyright-infringing text. Through theoretical analysis and empirical evaluation, we show that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in LLM deployment. We also study the impact of watermarking on Membership Inference Attacks (MIAs), which aim to determine whether a sample was part of the pre-training dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking lowers the success rate of MIAs, complicating the detection of copyrighted text in the pre-training dataset. Finally, we propose an adaptive technique to improve the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods for studying critical LLM issues with potential legal implications.|\n", 
"2407.17412": "|**2024-07-24**|**(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork**|Tianjin Huang 
et.al.|[2407.17412](http://arxiv.org/abs/2407.17412)|null|Large neural networks have demonstrated remarkable performance in domains such as vision and language processing, albeit at the cost of massive computational resources. As the compression literature shows, structural model pruning is a key approach to improving model efficiency, thanks to its acceleration-friendly sparsity patterns. The central question in structural pruning is how to estimate channel importance. In parallel, work on data-centric AI has shown that prompt-based techniques give large language models impressive generalization across diverse downstream tasks. This paper explores an intriguing possibility: using visual prompts to capture channel importance and derive high-quality structural sparsity. To this end, we propose a novel algorithmic framework, \\texttt{PASS}. It is a tailored hyper-network that takes visual prompts and network weight statistics as input and outputs layer-wise channel sparsity in a recurrent manner. This design accounts for the intrinsic channel dependencies between layers. Comprehensive experiments across multiple network architectures and six datasets demonstrate the advantage of \\texttt{PASS} in locating good structural sparsity. For example, at the same FLOPs level, \\texttt{PASS} subnetworks achieve 1%-3% higher accuracy on the Food101 dataset; alternatively, at the same 80% accuracy as the baselines, \\texttt{PASS} subnetworks obtain 0.35x more speedup.|\n", 
"2407.17404": "|**2024-07-24**|**Grammar-based Game Description Generation using Large Language Models**|Tsunehiko Tanaka et.al.|[2407.17404](http://arxiv.org/abs/2407.17404)|null|To lower the barrier to game design and development, automated game design, which generates game designs through computational processes, has been explored. In automated game design, machine-learning-based techniques such as evolutionary algorithms have achieved success. Benefiting from the remarkable advances of deep learning in computer vision and natural language processing applications, game generation has also progressed. However, because data in the game design domain is limited, deep learning has been under-applied to tasks such as game description generation. To open a new path for handling limited data in automated game design, we focus on the in-context learning of large language models (LLMs). LLMs can capture the characteristics of a task from a few demonstration examples and apply the capabilities acquired during pre-training. We introduce a grammar of game descriptions, which effectively structures the game design space, so that LLMs can capture the characteristics of the complex task of game description generation. We further propose a decoding method that iteratively improves the generated output by leveraging the grammar. Our experimental results show that this approach performs well in generating game descriptions.|\n", 
"2407.17398": "|**2024-07-24**|**3D Question Answering for City Scene Understanding**|Penglei Sun et.al.|[2407.17398](http://arxiv.org/abs/2407.17398)|null|3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend the three-dimensional space of their surroundings. Current research focuses mainly on indoor household tasks and outdoor roadside autonomous driving tasks, with limited exploration of city-level scene understanding. Existing work faces challenges in understanding city scenes, mainly because of the lack of city-level spatial semantic information and human-environment interaction information. To address these challenges, we study 3D MQA in depth from both the dataset and the method perspective. On the dataset side, we introduce City-3DQA, a novel 3D MQA dataset that is the first to incorporate city scene semantics and human-environment interactive tasks. On the method side, we propose a Scene-graph-enhanced City-level Understanding method (Sg-CityU), which uses a scene graph to introduce spatial semantic information. Under different settings of City-3DQA, Sg-CityU achieves accuracies of 63.94% and 63.76%, reaching state-of-the-art robustness and generalization compared with indoor 3D MQA methods and zero-shot methods that use advanced large language models (LLMs).|\n", 
"2407.17365": "|**2024-07-24**|**ViPer: Visual Personalization of Generative Models via Individual Preference Learning**|Sogand Salehi 
et.al.|[2407.17365](http://arxiv.org/abs/2407.17365)|null|Different users prefer different images generated for the same prompt. This gives rise to personalized image generation, which creates images that match an individual's visual preferences. Current generative models are unpersonalized: they are tuned to appeal to a broad audience. Producing images that match an individual's preferences with these models relies on manually adjusting the prompt over many iterations, which is inefficient and undesirable. We propose a way to personalize the image generation process: we first capture a user's generic preferences by inviting the user to comment on a small selection of images and explain why they like or dislike each one. Based on these comments, we use a large language model to infer the user's structured liked and disliked visual attributes, i.e., their visual preferences. These attributes guide a text-to-image model toward images that are closer to the individual user's visual preferences. 
Through a series of user studies and large language model guided evaluations, we demonstrate that the proposed method produces generations that align closely with individual users' visual preferences.|\n", "2407.17353": "|**2024-07-24**|**Scalify: scale propagation for efficient low-precision LLM training**|Paul Balan\u00e7a et.al.|[2407.17353](http://arxiv.org/abs/2407.17353)|**[link](https://github.com/graphcore-research/jax-scalify)**|**Low-precision formats such as float8 have been introduced in machine learning accelerator hardware to improve the computational efficiency of large language model training and inference. However, adoption by the ML community has been slow because of the complex, and sometimes brittle, techniques required to match higher-precision training accuracy. This work presents Scalify, an end-to-end scale propagation paradigm for computational graphs that generalizes and formalizes existing tensor scaling methods. Experimental results show that Scalify supports out-of-the-box float8 matrix multiplication and gradient representation, as well as float16 optimizer state storage. Our JAX implementation of Scalify is open-sourced at https://github.com/graphcore-research/jax-scalify.**|\n", "2407.18219": "|**2024-07-26**|**Recursive Introspection: Teaching Language Model Agents How to Self-Improve**|Yuxiao Qu 
et.al.|[2407.18219](http://arxiv.org/abs/2407.18219)|null|A key to enabling intelligent agentic behavior in foundation models is making them capable of introspecting on their behavior, reasoning about it, and correcting their mistakes as more computation or interaction becomes available. Even the strongest proprietary large language models (LLMs) do not exhibit the ability to keep improving their responses sequentially, even when they are explicitly told that they made a mistake. This paper presents RISE (Recursive Introspection), an approach for fine-tuning LLMs to introduce this capability, although prior work hypothesized that it may not be attainable. Our approach prescribes an iterative fine-tuning procedure that attempts to teach the model how to alter its response after unsuccessful attempts at a hard test-time problem, optionally with additional environment feedback. RISE treats fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP) whose initial state is the prompt. Inspired by principles of online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training that give an LLM the ability to recursively detect and correct its previous mistakes in subsequent iterations. 
Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves over more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. We also find that RISE scales well, often attaining larger benefits with more capable models. Our analysis shows that RISE makes meaningful improvements to responses to reach the correct solution for challenging prompts, without harming single-turn abilities as a result of expressing more complex distributions.|\n", "2407.18213": "|**2024-07-26**|**Exploring Scaling Trends in LLM Robustness**|Nikolaus Howe 
et.al.|[2407.18213](http://arxiv.org/abs/2407.18213)|null|Language model capabilities improve predictably as model size and training data are scaled up. Motivated by this, increasingly large language models have been trained, and they show impressive capabilities. Yet these models are highly vulnerable to adversarial prompts, such as \u201cjailbreak\u201d attacks that manipulate models into performing undesired behaviors, which poses a significant risk of misuse. Prior work indicates that computer vision models become more robust as model and data scale increase, raising the question: does language model robustness also improve with scale? We study this question empirically and find that larger models respond substantially better to adversarial training, but that increasing model scale brings no benefit in the absence of explicit defenses.|\n", "2407.18158": "|**2024-07-25**|**Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models**|Sanae Lotfi 
et.al.|[2407.18158](http://arxiv.org/abs/2407.18158)|null|Large language models (LLMs) excel at predicting the next token in a sequence. Recent work computes non-vacuous generalization bounds for LLMs using compression techniques, but these bounds become vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through very restrictive compression techniques and therefore bound compressed models that generate low-quality text. More importantly, existing bounds depend on the number of independent and identically distributed (IID) documents in the training set and ignore the far larger number of non-IID constituent tokens, leaving the potential for tighter bounds untapped. 
In this work, we use properties of martingales to derive generalization bounds that benefit from the vast number of tokens in LLM training sets. Because a dataset contains far more tokens than documents, our generalization bounds not only tolerate but actually benefit from much less restrictive compression schemes. Using Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs such as LLaMA2-70B. Unlike previous approaches, our work achieves the first non-vacuous generalization bounds for models that are deployed in practice and generate high-quality text.|\n", "2407.18129": "|**2024-07-26**|**Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic**|Fakhraddin Alwajih 
et.al.|[2407.18129](http://arxiv.org/abs/2407.18129)|null|Recent advances have significantly enhanced the capabilities of multimodal large language models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is largely limited to English because high-quality multimodal resources are scarce in other languages, which constrains the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce Dallah, an efficient Arabic multimodal assistant that builds on an advanced LLaMA-2 based language model to facilitate multimodal interactions. Dallah achieves state-of-the-art performance among Arabic MLLMs. Through fine-tuning on six Arabic dialects, Dallah shows that it can handle complex dialectal interactions that combine textual and visual elements. The model performs strongly on two benchmarks: one evaluating its performance on Modern Standard Arabic (MSA) and another specifically designed to assess dialectal responses. Beyond its robust performance on multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.|\n", "2407.18103": "|**2024-07-25**|**Fine-Tuning Large Language Models for Stock 
Return Prediction Using Newsflow**|Tian Guo et.al.|[2407.18103](http://arxiv.org/abs/2407.18103)|null|Large language models (LLMs) and their fine-tuning techniques have demonstrated superior performance on various language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return forecasting with financial newsflow. In quantitative investing, return forecasting is fundamental to subsequent tasks such as stock picking and portfolio optimization. We build a model that includes text representation and forecasting modules. We propose to compare encoder-only and decoder-only LLMs, since they generate text representations in distinct ways. The impact of these different representations on forecasting performance remains an open question. Meanwhile, we compare two simple methods of integrating LLMs' token-level representations into the forecasting module. Experiments on real news and investment universes reveal that: (1) representations aggregated from LLMs' token-level embeddings generally produce return predictions that enhance the performance of long-only and long-short portfolios; (2) in the relatively large investment universe, the decoder LLM-based prediction model leads to stronger portfolios, whereas in the smaller universes there is no consistent winner; 
(3) return predictions derived from LLMs' text representations are a strong signal for portfolio construction, outperforming conventional sentiment scores.|\n", "2407.18078": "|**2024-07-25**|**PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization**|Christopher Clarke et.al.|[2407.18078](http://arxiv.org/abs/2407.18078)|**[link](https://github.com/ChrisIsKing/Parameter-Efficient-Personalization)**|**The recent rise of large language models (LLMs) has opened a new chapter in human-AI interaction. These sophisticated models, exemplified by Chat-GPT, have shown remarkable capabilities in language understanding. However, as LLMs have grown exponentially in size, one crucial dimension, model personalization, remains comparatively understudied. Large foundation models such as GPT-3 focus on building a universal model that serves a broad range of tasks and users. This approach emphasizes the model's generalization capabilities, treating users as a collective rather than as distinct individuals. While practical for many common applications, this one-size-fits-all approach often fails to address the richness of human diversity and individual needs. To explore this issue, we introduce the PEFT-U Benchmark: a new dataset for building and evaluating user-oriented natural language processing (NLP) models. 
PEFT-U consists of diverse, individualized expression tasks in which different users may have different preferences for the same input. Using PEFT-U, we explore how to efficiently personalize LLMs to accommodate user-specific preferences, particularly in the context of diverse user-centered tasks.**|\n", "2407.18069": "|**2024-07-25**|**C2P: Featuring Large Language Models with Causal Reasoning**|Abdolmahdi Bagheri et.al.|[2407.18069](http://arxiv.org/abs/2407.18069)|null|Causal reasoning is the main obstacle that large language models (LLMs) must overcome to reach human-level intelligence. To address this, we introduce the Causal Chain of Prompting (C2P), the first reasoning framework that equips current LLMs with causal reasoning capabilities. C2P operates autonomously, without relying on external tools or modules during either the causal learning or the reasoning phase, and can be seamlessly integrated into the training or fine-tuning of LLMs. Experimental results on various benchmark datasets show that C2P significantly improves the causal learning and subsequent reasoning accuracy of LLMs. 
We show how C2P enhances LLMs' causal reasoning in real-world scenarios, addressing complex problems in fields such as healthcare, medicine, economics, education, social sciences, environmental science, and marketing. With few-shot learning, GPT-4 Turbo using C2P achieves a significant performance improvement with only six examples, with reasoning accuracy more than 33% higher than state-of-the-art LLMs, which perform nearly at random in similar circumstances. This demonstrates the transformative potential of integrating C2P into LLM training or fine-tuning, thereby giving these models advanced causal reasoning capabilities.|\n", "2407.18064": "|**2024-07-25**|**ComPeer: A Generative Conversational Agent for Proactive Peer Support**|Tianjian Liu 
et.al.|[2407.18064](http://arxiv.org/abs/2407.18064)|**[link](https://github.com/liutj9/compeer)**|Conversational agents (CAs) serving as peer supporters have been widely applied in mental health and shown to be beneficial. However, current peer-support CAs are either triggered by the user or follow predefined rules to start conversations, which may discourage users from building a long-term relationship with the CA and thereby limit long-term benefits. To overcome this challenge, we develop ComPeer, a generative CA that can proactively offer adaptive peer support. ComPeer uses large language models to detect and reflect on significant events in the dialogue, which lets it strategically plan the timing and content of proactive care. In addition, ComPeer incorporates peer support strategies, conversation history, and its persona into the generated messages. In a one-week between-subjects study (N=24), we demonstrate ComPeer's ability to provide peer support over time and to significantly increase user engagement compared with a user-initiated CA. 
This study highlights the potential of generative CAs in peer support, in particular how proactive care strategies can foster deeper and more sustained interaction and thus provide users with long-term mental health benefits.|\n", "2407.18062": "|**2024-07-25**|**Audio Entailment: Assessing Deductive Reasoning for Audio Understanding**|Soham Deshmukh et.al.|[2407.18062](http://arxiv.org/abs/2407.18062)|**[link](https://github.com/microsoft/audioentailment)**|**Recent literature uses language to build foundation models for audio. These audio-language models (ALMs) are trained on a vast number of audio-text pairs and show remarkable performance on tasks such as text-to-audio retrieval, captioning, and question answering. However, their ability to carry out more complex open-ended tasks, such as interactive question answering, requires logical reasoning skills, and this ability has not yet been adequately evaluated. 
We introduce a novel task, Audio Entailment, to evaluate an ALM's deductive reasoning ability. The task assesses whether a text description (hypothesis) of audio content can be deduced from an audio recording (premise), with possible conclusions being entailment, neutral, or contradiction, depending on the sufficiency of the evidence. We create two datasets for this task, with audio recordings sourced from two audio captioning datasets, AudioCaps and Clotho, and hypotheses generated by large language models (LLMs). We benchmark state-of-the-art ALMs and find deficiencies in their logical reasoning under both zero-shot and linear-probe evaluations. Finally, we propose \u201ccaption-before-reason\u201d, an intermediate captioning step that improves the zero-shot and linear-probe performance of ALMs by an absolute 6% and 3%, respectively.**|\n", "2407.18061": "|**2024-07-25**|**Difficulty Estimation and Simplification of French Text Using LLMs**|Henri Jamet 
et.al.|[2407.18061](http://arxiv.org/abs/2407.18061)|null|We use generative large language models to build foreign-language learning applications, focusing on estimating the difficulty of foreign-language texts and simplifying them to lower difficulty levels. We frame both tasks as prediction problems and build a difficulty classification model using labeled examples, transfer learning, and large language models; it achieves better accuracy than previous approaches. For simplification, we evaluate the trade-off between simplification quality and meaning preservation, comparing the zero-shot and fine-tuned performance of large language models. The results show that meaningful text simplifications can be obtained with limited fine-tuning. Our experiments are conducted on French texts, but our methods are language-agnostic and directly applicable to other foreign languages.|\n", "2407.18897": "|**2024-07-26**|**Small Molecule Optimization with Large Language Models**|Philipp Guevorguian 
et.al.|[2407.18897](http://arxiv.org/abs/2407.18897)|**[link](https://github.com/yerevann/chemlactica)**|**Recent advances in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, two language models fine-tuned on a novel dataset of 110 million molecules with computed properties, totaling 40 billion tokens. These models perform strongly at generating molecules with specified properties and at predicting the characteristics of new molecules from limited samples. We introduce a novel optimization algorithm that uses our language models to optimize molecules for arbitrary properties while accessing only limited information through a black-box interface. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. The algorithm achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement over previous methods on the \u201cPractical Molecular Optimization\u201d task. We publicly release the training dataset, the language models, and the code for the optimization algorithm.**|\n", "2407.18827": "|**2024-07-26**|**Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing 
research using large language models**|Mutahar Safdar et.al.|[2407.18827](http://arxiv.org/abs/2407.18827)|null|Data-driven research in additive manufacturing (AM) has achieved significant success in recent years, leading to a surge of scientific literature. The knowledge in these works spans AM and artificial intelligence (AI) contexts but has not yet been mined and formalized in an integrated way. Extracting scientific information from these works requires substantial effort and time. AM domain experts have contributed more than two dozen review papers to summarize this work. However, information specific to AM and AI contexts still has to be extracted manually. The recent success of foundation models such as BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformers) on textual data opens the possibility of accelerating scientific information extraction. We propose a framework that facilitates collaboration between AM and AI experts to continuously extract scientific information from data-driven AM literature. A demonstration tool is implemented based on the proposed framework, and a case study is conducted to extract information related to datasets, modeling, sensing, and AM system categories. 
We show the ability of large language models (LLMs) to speed up the extraction of relevant information from data-driven AM literature. In the future, the framework can be used to extract information from the design and manufacturing literature of engineering disciplines.|\n", "2407.18787": "|**2024-07-26**|**Automatic Detection of Moral Values in Music Lyrics**|Vjosa Preniqi et.al.|[2407.18787](http://arxiv.org/abs/2407.18787)|**[link](https://github.com/vjosapreniqi/ismir-mft-values)**|Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements on important social issues. The possibility of quickly extracting moral values from lyrics enables a deeper understanding of music-listening behaviour. Building on Moral Foundations Theory (MFT), we task a set of transformer-based language models (BERT), fine-tuned on 2,721 synthetic lyrics generated by a large language model (GPT-4), with detecting moral values in 200 real music lyrics annotated by two experts. We evaluate their predictive capabilities against a series of baselines, including out-of-domain (BERT fine-tuned on MFT-annotated social media texts) and zero-shot (GPT-4) classification. The proposed approach yields the best accuracy across all experiments, with an average weighted F1 score of 0.8. \u4e0e\u57fa\u5
1c6\u6a21\u578b\u76f8\u6bd4\uff0c\u8be5\u6027\u80fd\u5e73\u5747\u9ad8\u51fa5%\u3002\u5728\u4e8c\u5143\u5206\u7c7b\u7684\u7cbe\u786e\u5ea6\u4e0a\uff0c\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u5e73\u5747\u9ad8\u51fa\u57fa\u51c6\u6a21\u578b12%\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8d21\u732e\u4e86\u65e0\u6ce8\u91ca\u7684\u6b4c\u8bcd\u9053\u5fb7\u5b66\u4e60\u4ee5\u53ca\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u97f3\u4e50\u4e2d\u9053\u5fb7\u8868\u8fbe\u7684\u77e5\u8bc6\u63d0\u70bc\uff0c\u5e76\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u6280\u672f\u5bf9\u521b\u610f\u4ea7\u4e1a\u548c\u97f3\u4e50\u6587\u5316\u6f5c\u5728\u5f71\u54cd\u7684\u6709\u7528\u89c1\u89e3\u3002|\n", "2407.18786": "|**2024-07-26**|**The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs**|Aleix Sant et.al.|[2407.18786](http://arxiv.org/abs/2407.18786)|null|\u672c\u6587\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u89c6\u89d2\u63a2\u8ba8\u4e86\u673a\u5668\u7ffb\u8bd1\u4e2d\u7684\u6027\u522b\u504f\u89c1\u95ee\u9898\u3002\u7814\u7a76\u4f7f\u7528\u4e86\u56db\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u6d4b\u8bd5\u96c6\uff0c\u5bf9\u82f1\u8bed\u5230\u52a0\u6cf0\u7f57\u5c3c\u4e9a\u8bed\uff08En$\\rightarrow$Ca\uff09\u548c\u82f1\u8bed\u5230\u897f\u73ed\u7259\u8bed\uff08En$\\rightarrow$Es\uff09\u7684\u7ffb\u8bd1\u65b9\u5411\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e0e\u6700\u5148\u8fdb\u7684\u795e\u7ecf\u673a\u5668\u7ffb\u8bd1\uff08NMT\uff09\u6a21\u578b\u8fdb\u884c\u5bf9\u6bd4\uff0c\u8bc4\u4f30\u5404\u79cd\u57fa\u7840LLM\u7684\u7ffb\u8bd1\u8d28\u91cf\u548c\u6027\u522b\u504f\u89c1\u60c5\u51b5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6240\u6709\u6a21\u578b\u666e\u904d\u5b58\u5728\u6027\u522b\u504f\u89c1\u73b0\u8c61\uff0c\u5176\u4e2d\u57fa\u7840LLM\u7684\u504f\u89c1\u7a0b\u5ea6\u6bd4NMT\u6a21\u578b\u66f4\u9ad8\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u79cd\u504f\u89c1\uff0c\u7814\u7a76\u63a2\u7d22\u4e86\u5bf9\u6307\u4ee4\u8c03\u4f18LLM\u5e94\u7528\u7684\u63d0\u793a\u5de5\u7a0b\u6280\u5de7\u3002
The study identifies a prompt structure that significantly reduces gender bias, by up to 12% on the WinoMT evaluation dataset compared with more straightforward prompts. These results significantly narrow the gender bias accuracy gap between LLMs and traditional NMT systems.|\n", "2407.18764": "|**2024-07-26**|**TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals**|Kevin Kliimask et.al.|[2407.18764](http://arxiv.org/abs/2407.18764)|null|Efforts to promote Open Government Data (OGD) have gained significant traction across all levels of government since the mid-2000s. As more datasets are published on OGD portals, finding specific data becomes harder, leading to information overload. Complete and accurate dataset documentation, including appropriate tags associated with a dataset, is key to improving dataset findability and accessibility. An analysis of the Estonian Open Data Portal revealed that 11% of datasets have no associated tags, while 26% have only one tag assigned, which underscores the findability and accessibility challenges within the portal. According to the most recent Open Data Maturity Report, this portal is considered a leader. 
This study aims to propose an automated solution for tagging datasets to improve their findability on OGD portals. This paper presents Tagify, a prototype that uses large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to automatically generate tags for datasets in English and Estonian, thereby augmenting the metadata prepared by data publishers and improving data findability on OGD portals for data users. The developed solution was evaluated by users, and their feedback was collected to define an agenda for future prototype improvements.|\n", "2407.18752": "|**2024-07-26**|**Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery**|Yuni Susanti 
et.al.|[2407.18752](http://arxiv.org/abs/2407.18752)|**[link](https://github.com/littleflow3r/kg-structure-as-prompt)**|**This paper offers a new perspective on causal discovery with large language models (LLMs) based on metadata rather than actual data values, namely knowledge-based causal discovery. We focus on how small language models (SLMs, with fewer than 1 billion parameters) can perform knowledge-based causal discovery through prompt-based learning. Specifically, we propose a new method called \u201cKG Structure as Prompt\u201d that integrates structural information from a knowledge graph, such as common neighbor nodes and metapaths, into prompt-based learning to enhance the capabilities of SLMs. In few-shot settings on three types of life-science and open-domain datasets, our experiments show that the method outperforms many baselines and even surpasses conventional fine-tuning on the full dataset. Our findings further highlight the strong capabilities of small language models: combined with knowledge graphs and prompt-based learning, SLMs show the potential to surpass LLMs with far more parameters. 
The code and datasets are available on GitHub.**|\n", "2407.18743": "|**2024-07-26**|**Towards Effective and Efficient Continual Pre-training of Large Language Models**|Jie Chen et.al.|[2407.18743](http://arxiv.org/abs/2407.18743)|null|This technical report presents recent progress on continual pre-training (CPT), with a focus on enhancing large language models in specific domains or tasks. Taking Llama-3 (8B) as the backbone model, it reports significant gains in Chinese language understanding and scientific reasoning. To strengthen the new abilities while retaining the original ones, we design data mixture and curriculum strategies that draw on existing datasets and synthesize high-quality ones. Specifically, we generate multidisciplinary scientific question-and-answer (QA) pairs from related web pages and incorporate this synthetic data into training to improve Llama-3's scientific reasoning. The resulting model is called Llama-3-SynE (Synthetic data Enhanced Llama-3). The report also describes tuning experiments with the smaller TinyLlama model, whose findings are used to train the backbone model. 
Extensive experiments on multiple evaluation benchmarks show that our approach substantially improves the backbone model's performance, in both general abilities (+8.81 on C-Eval and +6.31 on CMMLU) and scientific reasoning (+12.00 on MATH and +4.13 on SciEval), without hurting its original capabilities. The model, data, and code are publicly released at https://github.com/RUC-GSAI/Llama-3-SynE.|\n", "2407.18738": "|**2024-07-26**|**Towards Generalized Offensive Language Identification**|Alphaeus Dmonte et.al.|[2407.18738](http://arxiv.org/abs/2407.18738)|null|Offensive content on the internet, including hate speech and cyberbullying, is a global problem that has drawn wide attention from the machine learning (ML) and natural language processing (NLP) communities. In response, many systems have been developed to automatically identify potentially harmful content and mitigate its impact. These systems follow two main strategies: (1) using publicly available models and application endpoints, including prompting large language models (LLMs); and (2) annotating datasets and training ML models on them. However, how well either approach generalizes remains unclear, and their applicability in out-of-domain and practical settings is often questioned. This paper empirically evaluates the generalizability of offensive language detection models and datasets on a novel generalized benchmark. We pose three research questions on generalizability and draw conclusions from them. These findings will help build more robust real-world offensive language detection systems.
|\n", "2407.18723": "|**2024-07-26**|**LLASP: Fine-tuning Large Language Models for Answer Set Programming**|Erica Coppolillo et.al.|[2407.18723](http://arxiv.org/abs/2407.18723)|null|Large language models (LLMs) have recently shown great potential in natural language processing tasks, particularly code generation. Although significant progress has been made in adapting LLMs to generate code for many imperative programming languages and tasks, their ability to handle declarative formalisms such as Answer Set Programming (ASP) remains largely unexplored. This paper investigates the use of LLMs for ASP code generation. We first conduct a systematic evaluation of several state-of-the-art LLMs. Despite their strength in parameter count, training data, and computational resources, empirical results show that they perform poorly at generating correct ASP programs. 
We therefore propose LLASP, a lightweight model designed specifically to encode fundamental ASP program patterns. To this end, we create a custom dataset covering a wide range of fundamental problem specifications that can be encoded in ASP. Our experiments show that the quality of the ASP programs generated by LLASP is remarkable: it outperforms both its non-fine-tuned counterpart and most eager LLM candidates, particularly from a semantic perspective. All code and data used in the experiments are publicly available at https://anonymous.4open.science/r/LLASP-D86C/.|\n", "2407.18722": "|**2024-07-26**|**Neurosymbolic AI for Enhancing Instructability in Generative AI**|Amit Sheth 
et.al.|[2407.18722](http://arxiv.org/abs/2407.18722)|null|Generative AI, especially via large language models (LLMs), has transformed content creation across text, images, and music, showcasing the ability to follow instructions through prompting, largely enabled by instruction tuning. Instruction tuning is a supervised fine-tuning method in which LLMs are trained on datasets formatted with specific tasks and their corresponding instructions; it systematically enhances a model's ability to carry out the instructions it is given. Even so, LLMs still struggle to consistently interpret and execute complex, multi-step instructions and to generalize them to novel tasks, both of which are essential for broader real-world use. This paper explores why neurosymbolic AI offers a better path to enhancing the instructability of LLMs. We explore using a symbolic task planner to decompose high-level instructions into structured tasks, a neural semantic parser to ground these tasks into executable actions, and a neuro-symbolic executor to carry out these actions while dynamically maintaining an explicit representation of state. We also seek to show that the neurosymbolic approach improves the reliability and context-awareness of task execution, enabling LLMs to dynamically interpret and respond to a wider range of instructional contexts with greater precision and flexibility.
|\n", "2407.20232": "|**2024-07-29**|**Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing**|Ekaterina Iakovleva et.al.|[2407.20232](http://arxiv.org/abs/2407.20232)|null|Text-based editing diffusion models show limited performance when the user's input instruction is ambiguous. To address this, we propose Specify ANd Edit (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, that is, well-defined interventions to apply to the input image to satisfy the user's request. A novel denoising guidance strategy designed for the task lets us benefit from both the LLM-derived instructions and the original one. Our experiments with three baselines on two datasets demonstrate the benefits of SANE in all setups. In addition, our pipeline improves the interpretability of editing models and increases output diversity. We also show that our approach can be applied to any edit, whether ambiguous or not. Our code is publicly available at https://github.com/fabvio/SANE.
|\n", "2407.20224": "|**2024-07-29**|**Can Editing LLMs Inject Harm?**|Canyu Chen et.al.|[2407.20224](http://arxiv.org/abs/2407.20224)|null|Knowledge editing techniques are increasingly adopted to efficiently correct false or outdated knowledge in large language models (LLMs), largely because retraining from scratch is so costly. Meanwhile, a critical but under-explored question is whether knowledge editing can be used to inject harm into LLMs. In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset, EditAttack. Specifically, we focus on two typical safety risks of Editing Attack: Misinformation Injection and Bias Injection. For misinformation injection, we first divide it into commonsense misinformation injection and long-tail misinformation injection. We then find that editing attacks can effectively inject both types of misinformation into LLMs, and that effectiveness is particularly high for commonsense misinformation injection. 
For bias injection, we find that not only can biased sentences be injected into LLMs with high effectiveness, but a single injected biased sentence is enough to significantly increase bias in the LLMs' general outputs, even outputs highly unrelated to the injected sentence, which indicates a catastrophic impact of editing attacks on the overall fairness of LLMs. We further demonstrate the high stealthiness of editing attacks, measured by their impact on the general knowledge and reasoning capacities of LLMs, and show with empirical evidence how hard they are to defend against. Our findings reveal the emerging risk of misusing knowledge editing techniques to compromise the safety alignment of LLMs.|\n", "2407.20207": "|**2024-07-29**|**QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval**|Hongming Tan 
et.al.|[2407.20207](http://arxiv.org/abs/2407.20207)|null|In dense retrieval, embedding long texts into dense vectors can lose information, reducing the accuracy of query-text matching. In addition, low-quality texts with excessive noise or sparse key information rarely match relevant queries well. Current research focuses mainly on improving sentence embedding models or the retrieval process. This work proposes a novel text augmentation framework for dense retrieval. It transforms raw documents into information-dense text formats that supplement the original texts, effectively addressing these issues without modifying the embedding or retrieval method. Two text representations are generated through zero-shot prompting of a large language model (LLM): question-answer pairs and event-driven elements. We name this method QAEA-DR, a text augmentation framework for dense retrieval that unifies question generation and event extraction. To further improve the quality of the generated text, a scoring-based evaluation and regeneration mechanism is introduced into the LLM prompting process. Our QAEA-DR model has a positive impact on dense retrieval, as supported by both theoretical analysis and empirical experiments.
|\n", "2407.20183": "|**2024-07-29**|**MindSearch: Mimicking Human Minds Elicits Deep AI Searcher**|Zehui Chen et.al.|[2407.20183](http://arxiv.org/abs/2407.20183)|**[link](https://github.com/internlm/mindsearch)**|**Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable recent progress of large language models (LLMs), recent work attempts to solve this task by combining LLMs with search engines. However, these methods still perform unsatisfactorily because of three challenges: (1) complex requests often cannot be retrieved accurately and completely by a search engine; (2) the information to be integrated is spread across multiple web pages along with massive noise; and (3) a large number of web pages with long contents may quickly exceed the maximum context length of LLMs. 
Inspired by the cognitive process humans use to solve these problems, we introduce MindSearch, which mimics the human mind in web information seeking and integration and can be instantiated by a simple yet effective LLM-based multi-agent framework. WebPlanner models the human mind in multi-step information seeking as a dynamic graph construction process: it decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on the search results from WebSearcher. Tasked with each sub-question, WebSearcher performs hierarchical information retrieval with search engines and collects valuable information for WebPlanner. The multi-agent design lets the whole framework seek and integrate information in parallel from more than 300 web pages in 3 minutes, which is worth 3 hours of human effort. MindSearch significantly improves response quality in both depth and breadth on closed-set and open-set QA problems. Moreover, responses from MindSearch based on InternLM2.5-7B are preferred by humans over those of the ChatGPT-Web and Perplexity.ai applications, which suggests that MindSearch can already deliver a solution competitive with proprietary AI search engines.**|\n", "2407.20174": "|**2024-07-29**|**Advancing 
Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning**|Xingchen Zeng et.al.|[2407.20174](http://arxiv.org/abs/2407.20174)|**[link](https://github.com/zengxingchen/chartqa-mllm)**|**Emerging multimodal large language models (MLLMs) show great potential for chart question answering (CQA). Recent efforts focus mainly on scaling up training datasets (charts, data tables, and question-answer pairs) through data collection and synthesis. However, our empirical study of existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthesis emphasize data volume and neglect fine-grained visual encodings and QA tasks, resulting in an unbalanced data distribution that diverges from practical CQA scenarios. Second, existing work follows the training recipe of base MLLMs originally designed for natural images and under-explores adaptation to the unique characteristics of charts, such as rich text elements. 
To fill this gap, we propose a visualization-referenced instruction tuning approach to guide training dataset enhancement and model development. Specifically, we propose a novel data engine that effectively filters diverse, high-quality data from existing datasets and then refines and augments the data with LLM-based generation techniques to better align with practical QA tasks and visual encodings. Then, to facilitate adaptation to chart characteristics, we use the enriched data to train an MLLM, unfreezing the vision encoder and incorporating a mixture-of-resolution adaptation strategy for enhanced fine-grained recognition. Experimental results validate the effectiveness of our approach. Even with fewer training examples, our model consistently outperforms existing CQA models on established benchmarks. We also contribute a dataset split as a benchmark for future research. The source code and datasets of this paper are available at https://github.com/zengxingchen/ChartQA-MLLM.**|\n", "2407.20171": "|**2024-07-29**|**Diffusion Feedback Helps CLIP See Better**|Wenxuan Wang 
et.al.|[2407.20171](http://arxiv.org/abs/2407.20171)|**[link](https://github.com/baaivision/diva)**|\u5bf9\u6bd4\u8bed\u8a00-\u56fe\u50cf\u9884\u8bad\u7ec3\uff08CLIP\uff09\u5728\u8de8\u9886\u57df\u548c\u6a21\u6001\u62bd\u8c61\u5f00\u653e\u4e16\u754c\u8868\u793a\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u5df2\u6210\u4e3a\u5404\u79cd\u89c6\u89c9\u548c\u591a\u6a21\u6001\u4efb\u52a1\u7684\u57fa\u7840\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u63ed\u793a\u4e86CLIP\u5728\u89c6\u89c9\u65b9\u9762\u7684\u4e25\u91cd\u5c40\u9650\u6027\uff0c\u5982\u96be\u4ee5\u533a\u5206\u65b9\u5411\u3001\u6570\u91cf\u3001\u989c\u8272\u3001\u7ed3\u6784\u7b49\u3002\u8fd9\u4e9b\u89c6\u89c9\u5c40\u9650\u6027\u4e5f\u9650\u5236\u4e86\u57fa\u4e8eCLIP\u6784\u5efa\u7684\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u611f\u77e5\u80fd\u529b\u3002\u4e3b\u8981\u539f\u56e0\u662f\u7528\u4e8e\u8bad\u7ec3CLIP\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u56fa\u6709\u504f\u89c1\uff0c\u7531\u4e8e\u6587\u672c\u7684\u4e0d\u660e\u786e\u6027\u548c\u56fe\u7247\u591a\u6837\u6027\u4e0d\u8db3\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9CLIP\u6a21\u578b\u7684\u7b80\u5355\u540e\u5904\u7406\u65b9\u6cd5\uff0c\u901a\u8fc7\u81ea\u6211\u76d1\u7763\u7684\u6269\u6563\u8fc7\u7a0b\u6781\u5927\u5730\u514b\u670d\u4e86\u5176\u89c6\u89c9\u5c40\u9650\u6027\u3002\u6211\u4eec\u5f15\u5165\u4e86DIVA\uff0c\u5373\u4f5c\u4e3aCLIP\u89c6\u89c9\u8f85\u52a9\u7684\u6269\u6563\u6a21\u578b\u3002\u5177\u4f53\u800c\u8a00\uff0cDIVA\u5229\u7528\u6587\u672c\u5230\u56fe\u50cf\u6269\u6563\u6a21\u578b\u7684\u751f\u6210\u53cd\u9988\u6765\u4f18\u5316CLIP\u8868\u793a\uff0c\u4ec5\u4f7f\u7528\u56fe\u50cf\uff08\u4e0d\u5305\u62ec\u5bf9\u5e94\u6587\u672c\uff09\u3002\u6211\u4eec\u8bc1\u660eDIVA\u5728MMVP-VLM\u57fa\u51c6\u4e0a\u663e\u8457\u63d0\u9ad8\u4e86CLIP\u7684\u6027\u80fd\uff0c\u8be5\u57fa\u51c6\u5e7f\u6cdb\u8bc4\u4f30\u4e86\u7ec6\u5fae\u7684\u89c6\u89c9\u80fd\u529b\uff08\u4f8b\u5982\uff0c3-7%\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6846\u67b6\u589e\u5f3a\u4e86MLLMs\u548c\u89c6\u89c9\u6a21\u578b\u5728\u591a\u6a21\u6001\u7406\u89e3\u548c\u5206\u5272\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u572829\u4e2a\u56fe\u50cf\u5206\u7c7b\u548c\u68c0\u7d22\u57fa\u51c6\u4e0a\u7684\u5168\u9762\u8bc4\u4f30\u8bc1\u5b9e\u4e86\u6211\u4eec\u7684\u6846\u67b6\u4fdd\u7559\u4e86CLIP\u5f3a\u5927\u7684\u96f6\u6837\u672c\u80fd\u529b\u3002\u4ee3\u7801\u5c06\u5728https://github.com/baaivision/DIVA\u516c\u5f00\u3002|\n", "2407.20164": "|**2024-07-29**|**Language-Conditioned Offline RL for Multi-Robot Navigation**|Steven Morad 
et.al.|[2407.20164](http://arxiv.org/abs/2407.20164)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u4e3a\u591a\u673a\u5668\u4eba\u56e2\u961f\u5f00\u53d1\u80fd\u591f\u7406\u89e3\u5e76\u9075\u5faa\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u7684\u5bfc\u822a\u7b56\u7565\u3002\u6211\u4eec\u5229\u7528\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5d4c\u5165\u6765\u6761\u4ef6\u5316\u8fd9\u4e9b\u7b56\u7565\uff0c\u5e76\u901a\u8fc7\u4f7f\u7528\u4ec520\u5206\u949f\u968f\u673a\u6536\u96c6\u7684\u6570\u636e\u8fdb\u884c\u79bb\u7ebf\u5f3a\u5316\u5b66\u4e60\u6765\u8bad\u7ec3\u5b83\u4eec\u3002\u5728\u4e94\u53f0\u771f\u5b9e\u673a\u5668\u4eba\u7684\u5b9e\u9a8c\u4e2d\uff0c\u8fd9\u4e9b\u7b56\u7565\u5bf9\u672a\u89c1\u8fc7\u7684\u547d\u4ee4\u5177\u6709\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u8868\u660e\u5b83\u4eec\u7406\u89e3\u4e86LLM\u7684\u6f5c\u5728\u7a7a\u95f4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u9700\u8981\u6a21\u62df\u5668\u6216\u73af\u5883\u6a21\u578b\uff0c\u5e76\u4ea7\u751f\u4f4e\u5ef6\u8fdf\u7684\u63a7\u5236\u7b56\u7565\uff0c\u53ef\u4ee5\u76f4\u63a5\u90e8\u7f72\u5230\u771f\u5b9e\u673a\u5668\u4eba\u4e0a\u800c\u65e0\u9700\u8fdb\u4e00\u6b65\u8c03\u4f18\u3002\u66f4\u591a\u4fe1\u606f\u548c\u5b9e\u9a8c\u89c6\u9891\u8bf7\u53c2\u9605https://sites.google.com/view/llm-marl\u3002|\n", "2407.20157": "|**2024-07-29**|**rLLM: Relational Table Learning with LLMs**|Weichen Li 
et.al.|[2407.20157](http://arxiv.org/abs/2407.20157)|**[link](https://github.com/rllm-project/rllm)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3arLLM\uff08\u5173\u7cfbLLM\uff09\u7684PyTorch\u5e93\uff0c\u65e8\u5728\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5b9e\u73b0\u5173\u7cfb\u8868\u5b66\u4e60\uff08RTL\uff09\u3002\u6838\u5fc3\u7406\u5ff5\u662f\u5c06\u6700\u5148\u8fdb\u7684\u56fe\u795e\u7ecf\u7f51\u7edc\u3001LLMs\u548c\u8868\u795e\u7ecf\u7f51\u7edc\u5206\u89e3\u4e3a\u6807\u51c6\u5316\u6a21\u5757\uff0c\u4ee5\u5b9e\u73b0\u5feb\u901f\u6784\u5efa\u65b0\u578bRTL\u578b\u6a21\u578b\u7684\u7b80\u5355\u201c\u7ec4\u5408\u3001\u5bf9\u9f50\u548c\u8054\u5408\u8bad\u7ec3\u201d\u65b9\u5f0f\u3002\u4e3a\u4e86\u8bf4\u660erLLM\u7684\u4f7f\u7528\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u7b80\u5355\u7684RTL\u65b9\u6cd5\u540d\u4e3aBRIDGE\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u589e\u5f3a\u7ecf\u5178\u6570\u636e\u96c6\uff0c\u63d0\u51fa\u4e86\u4e09\u4e2a\u65b0\u7684\u5173\u7cfb\u8868\u683c\u6570\u636e\u96c6\uff08TML1M\u3001TLF2K\u548cTACM12K\uff09\u3002\u6211\u4eec\u5e0c\u671brLLM\u80fd\u591f\u4f5c\u4e3a\u7528\u4e8eRTL\u76f8\u5173\u4efb\u52a1\u7684\u6709\u7528\u4e14\u6613\u4e8e\u4f7f\u7528\u7684\u5f00\u53d1\u6846\u67b6\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728\u4ee5\u4e0b\u4f4d\u7f6e\u83b7\u53d6\uff1ahttps://github.com/rllm-project/rllm\u3002**|\n", "2407.20143": "|**2024-07-29**|**ByteCheckpoint: A Unified Checkpointing System for LLM Development**|Borui Wan 
et.al.|[2407.20143](http://arxiv.org/abs/2407.20143)|null|\u5728\u6784\u5efa\u5b9e\u9645\u4e16\u754c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65f6\uff0c\u9700\u8981\u5728\u6301\u4e45\u5b58\u50a8\u4e2d\u68c0\u67e5\u8bad\u7ec3\u72b6\u6001\u4ee5\u9632\u6b62\u6f5c\u5728\u7684\u8f6f\u4ef6\u548c\u786c\u4ef6\u6545\u969c\uff0c\u5e76\u652f\u6301\u8bad\u7ec3\u7ba1\u9053\u5185\u7684\u68c0\u67e5\u70b9\u8f6c\u79fb\u4ee5\u53ca\u8de8\u4efb\u52a1\u4f7f\u7528\u3002\u7531\u4e8eLLMs\u7684\u89c4\u6a21\u5e9e\u5927\uff0c\u4fdd\u5b58\u548c\u52a0\u8f7d\u68c0\u67e5\u70b9\u5f80\u5f80\u4f1a\u5bfc\u81f4\u4ee4\u4eba\u96be\u4ee5\u63a5\u53d7\u7684\u5206\u949f\u7ea7\u5ef6\u8fdf\uff0c\u6781\u5927\u5730\u964d\u4f4e\u4e86\u8bad\u7ec3\u6548\u7387\u3002\u6b64\u5916\uff0c\u5728\u8de8\u4efb\u52a1\u8f6c\u79fb\u68c0\u67e5\u70b9\u65f6\uff0c\u901a\u5e38\u9700\u8981\u6267\u884c\u68c0\u67e5\u70b9\u91cd\u65b0\u5206\u7247\uff0c\u5373\u6839\u636e\u7279\u5b9a\u4efb\u52a1\u7684\u7279\u6027\u548c\u8d44\u6e90\u914d\u989d\u5c06\u68c0\u67e5\u70b9\u52a0\u8f7d\u5230\u4e0d\u540c\u7684\u5e76\u884c\u914d\u7f6e\u4e2d\u3002\u5148\u524d\u7684\u68c0\u67e5\u70b9\u7cfb\u7edf\u5047\u8bbe\u5e76\u884c\u914d\u7f6e\u4e00\u81f4\uff0c\u672a\u80fd\u89e3\u51b3\u5728\u91cd\u65b0\u5206\u7247\u671f\u95f4\u8f6c\u6362\u68c0\u67e5\u70b9\u7684\u590d\u6742\u6027\u3002\u800c\u4e14\uff0c\u5728\u5de5\u4e1a\u5e73\u53f0\u4e2d\uff0c\u5f00\u53d1\u8005\u4ece\u4e0d\u540c\u7684\u8bad\u7ec3\u6846\u67b6\u521b\u5efa\u68c0\u67e5\u70b9\uff0c\u6bcf\u4e2a\u6846\u67b6\u90fd\u6709\u5176\u72ec\u7279\u7684\u5b58\u50a8\u548cI/O\u903b\u8f91\uff0c\u8fd9\u589e\u52a0\u4e86\u7edf\u4e00\u7ba1\u7406\u548c\u4f18\u5316\u68c0\u67e5\u70b9\u7684\u590d\u6742\u6027\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86ByteCheckpoint\uff0c\u4e00\u4e2a\u652f\u6301\u81ea\u52a8\u5728\u7ebf\u68c0\u67e5\u70b9\u91cd\u65b0\u5206\u7247\u7684PyTorch\u539f\u751f\u591a\u6846\u67b6LLM\u68c0\u67e5\u70b9\u7cfb\u7edf\u3002ByteCheckpoint\u91c7\u7528\u6
570\u636e/\u5143\u6570\u636e\u5206\u79bb\u7684\u5b58\u50a8\u67b6\u6784\uff0c\u89e3\u8026\u4e86\u68c0\u67e5\u70b9\u5b58\u50a8\u4e0e\u6240\u91c7\u7528\u7684\u5e76\u884c\u7b56\u7565\u548c\u8bad\u7ec3\u6846\u67b6\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u9ad8\u6548\u7684\u5f02\u6b65\u5f20\u91cf\u5408\u5e76\u6280\u672f\u6765\u89e3\u51b3\u4e0d\u89c4\u5219\u5f20\u91cf\u5206\u7247\u95ee\u9898\uff0c\u5e76\u63d0\u51fa\u4e86\u591a\u9879I/O\u6027\u80fd\u4f18\u5316\u63aa\u65bd\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u68c0\u67e5\u70b9\u4fdd\u5b58\u548c\u52a0\u8f7d\u7684\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cByteCheckpoint\u5728\u51cf\u5c11\u68c0\u67e5\u70b9\u4fdd\u5b58\uff08\u6700\u9ad8\u53ef\u8fbe529.22\u500d\uff09\u548c\u52a0\u8f7d\uff08\u6700\u9ad8\u53ef\u8fbe3.51\u500d\uff09\u6210\u672c\u65b9\u9762\u5177\u6709\u660e\u663e\u4f18\u52bf\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\u3002|\n", "2407.20053": "|**2024-07-29**|**Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models**|Zhe Li 
et.al.|[2407.20053](http://arxiv.org/abs/2407.20053)|null|\u663e\u8457\u6ce2\u9ad8\uff08SWH\uff09\u5728\u6d77\u6d0b\u79d1\u5b66\u4e2d\u662f\u4e00\u4e2a\u5173\u952e\u6307\u6807\uff0c\u7cbe\u786e\u7684SWH\u4f30\u8ba1\u5bf9\u4e8e\u5404\u79cd\u5e94\u7528\u81f3\u5173\u91cd\u8981\uff0c\u4f8b\u5982\u6d77\u6d0b\u80fd\u5f00\u53d1\u3001\u6e14\u4e1a\u3001\u6f5c\u5728\u98ce\u9669\u7684\u65e9\u671f\u9884\u8b66\u7cfb\u7edf\u7b49\u3002\u57fa\u4e8e\u6570\u503c\u6a21\u578b\u548c\u7269\u7406\u7406\u8bba\u7684\u4f20\u7edfSWH\u4f30\u7b97\u65b9\u6cd5\u53d7\u5230\u8ba1\u7b97\u6548\u7387\u4f4e\u4e0b\u7684\u9650\u5236\u3002\u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u4f5c\u4e3a\u4e00\u79cd\u6709\u5438\u5f15\u529b\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u5df2\u7528\u4e8e\u63d0\u9ad8\u51c6\u786e\u5ea6\u5e76\u51cf\u5c11\u8ba1\u7b97\u65f6\u95f4\u3002\u7136\u800c\uff0c\u7531\u4e8e\u89c2\u6d4b\u6280\u672f\u6709\u9650\u548c\u6210\u672c\u9ad8\u6602\uff0c\u5b9e\u9645\u6570\u636e\u7684\u7a00\u7f3a\u6027\u9650\u5236\u4e86\u673a\u5668\u5b66\u4e60\u6a21\u578b\u7684\u6f5c\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6d77\u6d0bSWH\u4f30\u7b97\u6846\u67b6\uff0c\u540d\u4e3aOrca\u3002\u5177\u4f53\u800c\u8a00\uff0cOrca\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u65b0\u9896\u7684\u7a7a\u95f4\u65f6\u95f4\u611f\u77e5\u7f16\u7801\u6a21\u5757\uff0c\u589e\u5f3a\u4e86\u7ecf\u5178\u8bed\u8a00\u6a21\u578b\u5728\u7a7a\u95f4\u65f6\u95f4\u548c\u6570\u636e\u91cf\u6709\u9650\u60c5\u51b5\u4e0b\u7684\u63a8\u7406\u80fd\u529b\u3002\u901a\u8fc7\u5c06\u6709\u9650\u7684\u6d6e\u6807\u89c2\u6d4b\u6570\u636e\u8fdb\u884c\u65f6\u95f4\u5206\u5272\u3001\u7f16\u7801\u6d6e\u6807\u7684\u5730\u7406\u4f4d\u7f6e\u3001\u8bbe\u8ba1\u63d0\u793a\u6a21\u677f\uff0cOrca\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5f3a\u5927\u6cdb\u5316\u80fd\u529b\uff0c\u6709\u6548\u5730\u4f7f\u7528\u6709\u9650\u7684\u6570\u636e\u5bf9\u663e\u8457\u6ce2\u9ad8\u8fdb\u884c\u4f30\u7b97\u3002\u5728\u58
a8\u897f\u54e5\u6e7e\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cOrca\u5728SWH\u4f30\u7b97\u65b9\u9762\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2407.21018": "|**2024-07-30**|**ThinK: Thinner Key Cache by Query-Driven Pruning**|Yuhui Xu et.al.|[2407.21018](http://arxiv.org/abs/2407.21018)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5f15\u53d1\u4e86\u4e00\u573a\u9769\u547d\uff0c\u901a\u8fc7\u5229\u7528\u66f4\u5927\u7684\u6a21\u578b\u89c4\u6a21\u548c\u5e8f\u5217\u957f\u5ea6\uff0c\u5b9e\u73b0\u4e86\u524d\u6240\u672a\u6709\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u968f\u4e4b\u800c\u6765\u7684\u8ba1\u7b97\u548c\u5185\u5b58\u6210\u672c\u7684\u589e\u52a0\u5e26\u6765\u4e86\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u957f\u5e8f\u5217\u65f6\uff0c\u7531\u4e8e\u6ce8\u610f\u529b\u673a\u5236\u7684\u4e8c\u6b21\u590d\u6742\u6027\uff0c\u5bf9\u7f13\u5b58\u5185\u5b58\u7ba1\u7406\u63d0\u51fa\u4e86\u4e25\u5cfb\u8003\u9a8c\u3002\u672c\u6587\u4e13\u6ce8\u4e8e\u957f\u4e0a\u4e0b\u6587\u573a\u666f\uff0c\u9488\u5bf9\u63a8\u7406\u8fc7\u7a0b\u4e2dKV\u7f13\u5b58\u5185\u5b58\u6d88\u8017\u7684\u6548\u7387\u95ee\u9898\u8fdb\u884c\u6df1\u5165\u63a2\u8ba8\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4fa7\u91cd\u4e8e\u57fa\u4e8e\u5e8f\u5217\u957f\u5ea6\u4f18\u5316\u5185\u5b58\u4e0d\u540c\uff0c\u6211\u4eec\u63ed\u793a\u4e86KV\u7f13\u5b58\u901a\u9053\u5728\u6743\u91cd\u5206\u5e03\u4e0d\u5747\u548c\u4f4e\u79e9\u7ed3\u6784\u7279\u5f81\u4e0b\u5b58\u5728\u663e\u8457\u5197\u4f59\u3002\u57fa\u4e8e\u8fd9\u4e9b\u89c2\u5bdf\u7ed3\u679c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aThinK\u7684\u65b0\u578b\u67e5\u8be2\u4f9d\u8d56\u578bKV\u7f13\u5b58\u526a\u679d\u65b9\u6cd5\uff0c\u65e8\u5728\u6700\u5c0f\u5316\u6ce8\u610f\u529b\u6743\u91cd\u635f\u5931\u7684\u540c\u65f6\uff0c\u6709\u9009\u62e9\u5730\u526a\u679d\u6389\u6700\u4e0d\u91cd\u8981\u7684\u901a\u9053\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u
4e0d\u4ec5\u80fd\u591f\u4fdd\u6301\u6216\u63d0\u5347\u6a21\u578b\u51c6\u786e\u7387\uff0c\u800c\u4e14\u76f8\u6bd4\u4f20\u7edf\u7684KV\u7f13\u5b58\u6dd8\u6c70\u65b9\u6cd5\uff0c\u80fd\u5b9e\u73b0\u8d85\u8fc720%\u7684\u5185\u5b58\u6210\u672c\u51cf\u5c11\u3002\u901a\u8fc7\u5728LLaMA3\u548cMistral\u6a21\u578b\u4e0a\u5bf9\u591a\u4e2a\u957f\u5e8f\u5217\u6570\u636e\u96c6\u8fdb\u884c\u7684\u5e7f\u6cdb\u8bc4\u4f30\uff0c\u8bc1\u660e\u4e86ThinK\u7684\u6709\u6548\u6027\uff0c\u786e\u7acb\u4e86\u5728\u4e0d\u727a\u7272\u6027\u80fd\u7684\u524d\u63d0\u4e0b\u9ad8\u6548\u90e8\u7f72LLM\u7684\u65b0\u6807\u51c6\u3002\u6211\u4eec\u8fd8\u5c55\u671b\u4e86\u5c06\u6211\u4eec\u7684\u65b9\u6cd5\u6269\u5c55\u5230\u503c\u7f13\u5b58\u526a\u679d\u7684\u53ef\u80fd\u6027\uff0c\u5c55\u793a\u4e86ThinK\u5728\u964d\u4f4e\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u65b9\u9762\u7684\u5e7f\u6cdb\u9002\u7528\u6027\u548c\u6f5c\u529b\u3002|\n", "2407.21011": "|**2024-07-30**|**CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning**|Yuexi Du 
et.al.|[2407.21011](http://arxiv.org/abs/2407.21011)|**[link](https://github.com/xypb/cleft)**|**\u8fd1\u671f\uff0c\u5bf9\u6bd4\u8bed\u8a00-\u56fe\u50cf\u9884\u8bad\u7ec3\uff08CLIP\uff09\u7684\u8fdb\u5c55\u5728\u591a\u4efb\u52a1\u81ea\u76d1\u7763\u8868\u793a\u5b66\u4e60\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u6210\u679c\u3002\u7136\u800c\uff0c\u73b0\u6709CLIP\u7c7b\u65b9\u6cd5\u5f80\u5f80\u9700\u8981\u5927\u91cf\u7684GPU\u8d44\u6e90\u548c\u957f\u65f6\u95f4\u7684\u8bad\u7ec3\u5468\u671f\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u6a21\u578b\u548c\u6570\u636e\u96c6\u7684\u89c4\u6a21\u5de8\u5927\uff0c\u5bf9\u4e8e\u533b\u5b66\u5e94\u7528\u800c\u8a00\uff0c\u5927\u89c4\u6a21\u6570\u636e\u96c6\u5e76\u4e0d\u603b\u662f\u5e38\u89c1\u3002\u540c\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u63d0\u793a\u4e3b\u8981\u57fa\u4e8e\u4e0e\u56fe\u50cf\u5173\u8054\u7684\u6807\u7b7e\u8fdb\u884c\u624b\u52a8\u63d0\u53d6\uff0c\u53ef\u80fd\u5ffd\u89c6\u4e86\u8bad\u7ec3\u6837\u672c\u5185\u7684\u4e30\u5bcc\u4fe1\u606f\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u9ad8\u6548\u5927\u8bed\u8a00\u6a21\u578b\u4e0e\u63d0\u793a\u5fae\u8c03\u201d\uff08CLEFT\uff09\u7684\u8bed\u8a00-\u56fe\u50cf\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\uff0c\u5b83\u5145\u5206\u5229\u7528\u4e86\u5e7f\u6cdb\u9884\u8bad\u7ec3\u7684\u8bed\u4e49\u548c\u89c6\u89c9\u6a21\u578b\u7684\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7b56\u7565\u6765\u5b66\u4e60\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u63d0\u793a\uff0c\u4ee5\u7f29\u5c0f\u4e34\u5e8a\u8bca\u65ad\u6570\u636e\u4e0e\u7b80\u5355\u7c7b\u522b\u6807\u7b7e\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u591a\u4e2a\u80f8\u90e8X\u5149\u548c\u4e73\u817aX\u5149\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u5747\u4f18\u4e8e\u5404\u79cd\u57fa\u7ebf\uff0c\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u6c34\u5e73\u3002 
\u6240\u63d0\u51fa\u7684\u53c2\u6570\u9ad8\u6548\u7684\u6846\u67b6\u53ef\u4ee5\u5c06\u603b\u53ef\u8bad\u7ec3\u6a21\u578b\u5927\u5c0f\u51cf\u5c1139%\uff0c\u5e76\u5c06\u53ef\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u51cf\u5c11\u5230\u4ec54%\uff0c\u4e0e\u5f53\u524d\u7684BERT\u7f16\u7801\u5668\u76f8\u6bd4\u3002**|\n", "2407.20999": "|**2024-07-30**|**MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning**|Yupeng Chen et.al.|[2407.20999](http://arxiv.org/abs/2407.20999)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u5c55\u73b0\u4e86\u975e\u51e1\u7684\u80fd\u529b\u3002\u901a\u5e38\uff0cLLM\u901a\u8fc7\u5927\u91cf\u8bed\u6599\u5e93\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u5e76\u968f\u540e\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0cLLM\u53ef\u80fd\u4f1a\u5fd8\u8bb0\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u5b66\u5230\u7684\u77e5\u8bc6\uff0c\u5bfc\u81f4\u4e00\u822c\u80fd\u529b\u4e0b\u964d\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5fae\u8c03\u7b97\u6cd5\u2014\u2014\u52a8\u91cf\u8fc7\u6ee4\u4f18\u5316\u5668\uff08MoFO\uff09\u3002MoFO\u7684\u6838\u5fc3\u601d\u60f3\u662f\u8fed\u4ee3\u5730\u9009\u62e9\u5e76\u66f4\u65b0\u5177\u6709\u6700\u5927\u52a8\u91cf\u5e45\u5ea6\u7684\u6a21\u578b\u53c2\u6570\u3002\u4e0e\u5168\u53c2\u6570\u8bad\u7ec3\u76f8\u6bd4\uff0cMoFO\u5728\u4fdd\u6301\u53c2\u6570\u63a5\u8fd1\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u540c\u65f6\u5b9e\u73b0\u4e86\u76f8\u4f3c\u7684\u5fae\u8c03\u6027\u80fd\uff0c\u4ece\u800c\u51cf\u8f7b\u4e86\u77e5\u8bc6\u9057\u5fd8\u7684\u95ee\u9898\u3002\u4e0e\u73b0\u6709\u7684\u5927\u591a\u6570\u9057\u5fd8\u7f13\u89e3\u65b9\u6cd5\u4e0d\u540c\uff0cMoFO\u5177\u5907\u4ee5\u4e0b\u4e24\u4e2a\u4f18\u52bf\u3002\u9996\u5148\uff0cMoFO\u4e0d\u9700\u8981\u8bbf\u95ee\u9884\u8bad\u7ec3\u6570\u636e\u3002\u8fd
9\u4f7f\u5f97MoFO\u7279\u522b\u9002\u7528\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u4e0d\u53ef\u7528\u7684\u5fae\u8c03\u573a\u666f\uff0c\u5982\u4f7f\u7528\u5f00\u6e90LLM\u7684\u68c0\u67e5\u70b9\u8fdb\u884c\u5fae\u8c03\u3002\u5176\u6b21\uff0cMoFO\u4e0d\u4f1a\u6539\u53d8\u539f\u59cb\u635f\u5931\u51fd\u6570\u3002\u8fd9\u53ef\u4ee5\u907f\u514d\u635f\u5bb3\u6a21\u578b\u5728\u5fae\u8c03\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u901a\u8fc7\u4e25\u8c28\u7684\u6536\u655b\u6027\u5206\u6790\u548c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u4e86MoFO\u7684\u4f18\u8d8a\u6027\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u7f13\u89e3\u9057\u5fd8\u548c\u589e\u5f3a\u5fae\u8c03\u6027\u80fd\u65b9\u9762\u7684\u4f18\u52bf\u3002|\n", "2407.20990": "|**2024-07-30**|**From Feature Importance to Natural Language Explanations Using LLMs with RAG**|Sule Tekkesinoglu et.al.|[2407.20990](http://arxiv.org/abs/2407.20990)|**[link](https://github.com/suletekkesinoglu/xai_llm_rag)**|\u968f\u7740\u673a\u5668\u5b66\u4e60\u5728\u6d89\u53ca\u4eba\u7c7b\u4ea4\u4e92\u7684\u81ea\u4e3b\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u4f5c\u7528\u65e5\u76ca\u91cd\u8981\uff0c\u7406\u89e3\u6a21\u578b\u8f93\u51fa\u53d8\u5f97\u8d8a\u6765\u8d8a\u5173\u952e\u3002\u6700\u8fd1\uff0c\u57fa\u7840\u6a21\u578b\u6b63\u88ab\u63a2\u7d22\u7528\u4f5c\u4e8b\u540e\u89e3\u91ca\u5668\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u63ed\u793a\u9884\u6d4b\u6a21\u578b\u51b3\u7b56\u673a\u5236\u7684\u9014\u5f84\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u53ef\u8ffd\u8e2a\u95ee\u7b54\u65b9\u6cd5\uff0c\u901a\u8fc7\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u5e93\u6765\u6307\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u573a\u666f\u7406\u89e3\u4efb\u52a1\u4e2d\u7684\u7528\u6237\u67e5\u8be2\u8fdb\u884c\u54cd\u5e94\u3002\u8be5\u77e5\u8bc6\u5e93\u5305\u542b\u4e86\u5173\u4e8e\u6a21\u578b\u8f93\u51fa\u7684\u4e0a\u4e0b\u6587\u7ec6\u8282\uff0c\u5305\u62ec\u9ad8\u7ea7\u7279\u5f81\u3001\u7279\u5f81\u91cd\u8981\u6027\u4ee5\u53ca\u66ff\u4ee3\u6982\u7387\u30
02 \u6211\u4eec\u91c7\u7528\u51cf\u6cd5\u53cd\u4e8b\u5b9e\u63a8\u7406\u8ba1\u7b97\u7279\u5f81\u91cd\u8981\u6027\uff0c\u8fd9\u662f\u4e00\u79cd\u5206\u6790\u5728\u5206\u89e3\u8bed\u4e49\u7279\u5f81\u540e\u8f93\u51fa\u53d8\u5316\u7684\u65b9\u6cd5\u3002\u4e3a\u4e86\u4fdd\u6301\u5bf9\u8bdd\u6d41\u7545\uff0c\u6211\u4eec\u4ece\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u4e2d\u63d0\u70bc\u51fa\u56db\u4e2a\u5173\u952e\u7279\u6027\u2014\u2014\u793e\u4ea4\u6027\u3001\u56e0\u679c\u6027\u3001\u9009\u62e9\u6027\u548c\u5bf9\u6bd4\u6027\uff0c\u5e76\u5c06\u5176\u6574\u5408\u5230\u4e00\u4e2a\u5373\u65f6\u63d0\u793a\u4e2d\uff0c\u4ee5\u6b64\u6307\u5bfc\u54cd\u5e94\u751f\u6210\u8fc7\u7a0b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u751f\u6210\u7684\u89e3\u91ca\u5305\u542b\u4e86\u8fd9\u4e9b\u5143\u7d20\uff0c\u8fd9\u8868\u660e\u5b83\u6709\u53ef\u80fd\u5728\u590d\u6742\u6a21\u578b\u8f93\u51fa\u4e0e\u81ea\u7136\u8bed\u8a00\u8868\u8fbe\u4e4b\u95f4\u67b6\u8d77\u6865\u6881\u3002|\n", "2407.20970": "|**2024-07-30**|**Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks**|Alakesh Kalita 
et.al.|[2407.20970](http://arxiv.org/abs/2407.20970)|null|\u968f\u7740\u7b2c\u4e94\u4ee3\uff085G\uff09\u548c\u7b2c\u516d\u4ee3\uff086G\uff09\u901a\u4fe1\u6280\u672f\u4ee5\u53ca\u7269\u8054\u7f51\uff08IoT\uff09\u7684\u5174\u8d77\uff0c\u8bed\u4e49\u901a\u4fe1\u6b63\u53d7\u5230\u7814\u7a76\u8005\u7684\u5173\u6ce8\uff0c\u56e0\u4e3a\u5f53\u524d\u7684\u901a\u4fe1\u6280\u672f\u6b63\u63a5\u8fd1\u9999\u519c\u6781\u9650\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u7406\u89e3\u5e76\u751f\u6210\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u7684\u6587\u672c\uff0c\u57fa\u4e8e\u5bf9\u6570\u5341\u4ebf\u53c2\u6570\u7684\u5e7f\u6cdb\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\u3002\u8003\u8651\u5230\u6700\u8fd1\u7684\u5c31\u8fd1\u8ba1\u7b97\u6280\u672f\u5982\u8fb9\u7f18\u8ba1\u7b97\uff0c\u672c\u6587\u6982\u8ff0\u4e86\u4e00\u4e2a\u6846\u67b6\u53ca\u5176\u6a21\u5757\uff0c\u5176\u4e2dLLMs\u53ef\u4ee5\u5728\u7269\u8054\u7f51\u7f51\u7edc\u7684\u7f51\u7edc\u8fb9\u7f18\u4e0b\uff0c\u4f5c\u4e3a\u8bed\u4e49\u901a\u4fe1\u7684\u4e00\u90e8\u5206\uff0c\u4ee5\u63d0\u9ad8\u9ad8\u6548\u901a\u4fe1\u6548\u7387\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u4e00\u4e9b\u5e94\u7528\uff0c\u5e76\u5206\u6790\u4e86\u53d1\u5c55\u6b64\u7c7b\u7cfb\u7edf\u7684\u6311\u6218\u548c\u673a\u9047\u3002|\n", "2407.20906": "|**2024-07-30**|**Automated Review Generation Method Based on Large Language Models**|Shican Wu 
et.al.|[2407.20906](http://arxiv.org/abs/2407.20906)|**[link](https://github.com/tju-ecat-ai/automaticreviewgeneration)**|**\u6587\u732e\u7814\u7a76\u5bf9\u4e8e\u79d1\u5b66\u8fdb\u6b65\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9762\u5bf9\u6d77\u91cf\u4fe1\u606f\u7684\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u52a8\u5316\u7efc\u8ff0\u751f\u6210\u65b9\u6cd5\uff0c\u65e8\u5728\u7b80\u5316\u6587\u732e\u5904\u7406\u6d41\u7a0b\u5e76\u51cf\u8f7b\u8ba4\u77e5\u8d1f\u62c5\u3002\u4ee5\u4e19\u70f7\u8131\u6c22\uff08PDH\uff09\u50ac\u5316\u5242\u4e3a\u4f8b\uff0c\u8be5\u65b9\u6cd5\u4ece343\u7bc7\u6587\u7ae0\u4e2d\u8fc5\u901f\u751f\u6210\u4e86\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u5e73\u5747\u6bcf\u7bc7\u6587\u7ae0\u6bcfLLM\u8d26\u6237\u8017\u65f6\u4ec5\u6570\u79d2\u3002\u5bf91041\u7bc7\u6587\u7ae0\u7684\u8fdb\u4e00\u6b65\u5206\u6790\u63ed\u793a\u4e86\u50ac\u5316\u5242\u7ec4\u6210\u3001\u7ed3\u6784\u548c\u6027\u80fd\u7684\u6df1\u5165\u89c1\u89e3\u3002 
\u8ba4\u8bc6\u5230LLM\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\u7684\u95ee\u9898\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u591a\u5c42\u6b21\u7684\u8d28\u91cf\u63a7\u5236\u7b56\u7565\uff0c\u786e\u4fdd\u4e86\u65b9\u6cd5\u7684\u53ef\u9760\u6027\u548c\u6709\u6548\u7f13\u89e3\u5e7b\u89c9\u7684\u80fd\u529b\u3002\u4e13\u5bb6\u9a8c\u8bc1\u8bc1\u5b9e\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\u751f\u6210\u7684\u7efc\u8ff0\u4e0d\u4ec5\u51c6\u786e\u4e14\u5f15\u6587\u5b8c\u6574\uff0cLLM\u5e7b\u89c9\u7684\u98ce\u9669\u5df2\u964d\u81f3\u4f4e\u4e8e0.5%\uff0c\u7f6e\u4fe1\u5ea6\u8d85\u8fc795%\u3002\u53d1\u5e03\u7684Windows\u5e94\u7528\u7a0b\u5e8f\u652f\u6301\u4e00\u952e\u751f\u6210\u7efc\u8ff0\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u8ddf\u8e2a\u6700\u65b0\u8fdb\u5c55\u5e76\u63a8\u8350\u76f8\u5173\u6587\u732e\u3002\u8fd9\u4e00\u65b9\u6cd5\u5c55\u793a\u4e86LLM\u5728\u63d0\u5347\u79d1\u5b66\u7814\u7a76\u751f\u4ea7\u529b\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u63a2\u7d22\u5960\u5b9a\u4e86\u57fa\u7840\u3002**|\n", "2407.20898": "|**2024-07-30**|**ThinkRepair: Self-Directed Automated Program Repair**|Xin Yin 
et.al.|[2407.20898](http://arxiv.org/abs/2407.20898)|**[link](https://github.com/vinci-grape/ThinkRepair)**|**\u5c3d\u7ba1\u5df2\u7ecf\u63d0\u51fa\u4e86\u8bb8\u591a\u81ea\u52a8\u7a0b\u5e8f\u4fee\u590d\uff08APR\uff09\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u4fee\u590d\u4e00\u4e9b\u7279\u5b9a\u7c7b\u578b\u7684\u9519\u8bef\u65f6\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u9700\u8981\u5bf9\u9519\u8bef\u7a0b\u5e8f\u7684\u903b\u8f91\u8fdb\u884c\u5206\u6790\u548c\u63a8\u7406\u7684\u590d\u6742\u9519\u8bef\u65f6\u4ecd\u5b58\u5728\u5c40\u9650\u6027\u3002\u6700\u8fd1\uff0c\u901a\u8fc7\u63d0\u793a\u5de5\u7a0b\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u89e3\u51b3\u5305\u62ec\u9519\u8bef\u4fee\u590d\u5728\u5185\u7684\u591a\u79cd\u4efb\u52a1\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u7136\u800c\uff0c\u63d0\u793a\u7684\u8d28\u91cf\u4f1a\u6781\u5927\u5730\u5f71\u54cdLLMs\u7684\u80fd\u529b\uff0c\u800c\u624b\u52a8\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u63d0\u793a\u662f\u4e00\u4e2a\u8017\u65f6\u7684\u8fc7\u7a0b\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u5bfc\u5411\u7684LLM\u57fa\u4e8e\u81ea\u52a8\u7a0b\u5e8f\u4fee\u590d\u65b9\u6cd5ThinkRepair\uff0c\u5b83\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u9636\u6bb5\uff1a\u6536\u96c6\u9636\u6bb5\u548c\u4fee\u590d\u9636\u6bb5\u3002\u5728\u6536\u96c6\u9636\u6bb5\uff0c\u901a\u8fc7\u4f7f\u7528\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u6307\u5bfcLLMs\uff0c\u81ea\u52a8\u6536\u96c6\u6784\u6210\u9884\u4fee\u590d\u77e5\u8bc6\u7684\u5404\u79cd\u601d\u8003\u94fe\u3002\u5728\u4fee\u590d\u9636\u6bb5\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u9996\u5148\u9009\u62e9\u7528\u4e8e\u5c11\u91cf\u5b66\u4e60\u7684\u793a\u4f8b\u5e76\u5176\u6b21\u4e0eLLMs\u81ea\u52a8\u4ea4\u4e92\u6765\u4fee\u590d\u9519\u8bef\uff0c\u6839\u636e\u6d4b\u8bd5\u4fe1\u606f\u63d0\u4f9b\u53cd\u9988\uff08\u5982\u679c\u9700\u8981\u7684\u8bdd\uff09\u3002 \u5728\u5bf9\u4e24\u4e2a\u5e7f\u6cdb\u7814\u7a76\u7684\u6570\u636e\u96c6\uff08Defects4J\u548cQuixBugs\uff09\u7684\u8bc4\u4f30\u4e2d\uff0c\u4e0e12\u4e2a\u6700\u5148\u8fdb\u7684APR\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u8868\u660eThinkRepair\u5728\u4fee\u590d\u9519\u8bef\u65b9\u9762\u7684\u4f18\u5148\u7ea7\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728Defects4J V1.2\u4e0a\uff0cThinkRepair\u6210\u529f\u4fee\u590d\u4e8698\u4e2a\u9519\u8bef\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u63d0\u5347\u4e8627%-344.4%\u3002\u5728Defects4J V2.0\u4e0a\uff0cThinkRepair\u6bd4\u6700\u5148\u8fdb\u7684APR\u65b9\u6cd5\u591a\u4fee\u590d\u4e8612-65\u4e2a\u9519\u8bef\u3002\u6b64\u5916\uff0c\u5728Java\u548cPython\u4e0a\uff0cThinkRepair\u5728QuixBugs\u4e0a\u7684\u8868\u73b0\u4e5f\u6709\u4e86\u663e\u8457\u63d0\u5347\uff08\u6700\u591a\u5206\u522b\u8fbe\u523031\u548c21\uff09\u3002**|\n", "2407.20884": "|**2024-07-30**|**Effective Black Box Testing of Sentiment Analysis Classification Networks**|Parsa Karbasizadeh 
et.al.|[2407.20884](http://arxiv.org/abs/2407.20884)|null|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u795e\u7ecf\u7f51\u7edc\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u5982\u60c5\u611f\u5206\u6790\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u786e\u4fdd\u8fd9\u4e9b\u590d\u6742\u67b6\u6784\u901a\u8fc7\u5168\u9762\u6d4b\u8bd5\u4fdd\u6301\u53ef\u9760\u6027\u7684\u6311\u6218\u4f9d\u7136\u5b58\u5728\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7ec4\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u4e3a\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7f51\u7edc\u6784\u5efa\u7684\u6d4b\u8bd5\u5957\u4ef6\u7684\u8986\u76d6\u6807\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u8f93\u5165\u7a7a\u95f4\u5212\u5206\u7684\u9ed1\u76d2\u7b56\u7565\uff0c\u8003\u8651\u4e86\u4e0e\u60c5\u611f\u76f8\u5173\u7684\u5173\u952e\u8bed\u8a00\u7279\u5f81\uff0c\u5305\u62ec\u52a8\u8bcd\u3001\u5f62\u5bb9\u8bcd\u3001\u526f\u8bcd\u548c\u540d\u8bcd\u3002\u4e3a\u4e86\u6709\u6548\u5730\u751f\u6210\u6db5\u76d6\u5e7f\u6cdb\u60c5\u611f\u5143\u7d20\u7684\u6d4b\u8bd5\u7528\u4f8b\uff0c\u6211\u4eec\u91c7\u7528\u4e86k\u6295\u5f71\u8986\u76d6\u5ea6\u91cf\u3002\u8be5\u5ea6\u91cf\u901a\u8fc7\u4e00\u6b21\u68c0\u67e5k\u4e2a\u7279\u5f81\u7684\u5b50\u96c6\u6765\u51cf\u5c11\u95ee\u9898\u7684\u590d\u6742\u6027\uff0c\u4ece\u800c\u964d\u4f4e\u7ef4\u5ea6\u3002\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u5c55\u793a\u7279\u5b9a\u60c5\u611f\u7279\u5f81\u7ec4\u5408\u7684\u53e5\u5b50\u3002\u4ece\u60c5\u611f\u5206\u6790\u6570\u636e\u96c6\u5b9e\u9a8c\u4e2d\u83b7\u5f97\u7684\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6807\u51c6\u548c\u751f\u6210\u7684\u6d4b\u8bd5\u5e73\u5747\u63d0\u9ad8\u4e8616%\u7684\u6d4b\u8bd5\u8986\u76d6\u7387\u3002\u540c\u65f6\uff0c\u6a21\u578b\u51c6\u786e\u5ea6\u5e73\u5747\u4e0b\u964d\u4e866.5%\uff0c\u663e\u793a\u4e86\u8bc6\u522b\u8106\u5f31\u6027\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u901a\u8fc7\u5168\u9762\u6d4b\
u8bd5\u8bc4\u4f30\u6539\u8fdb\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7cfb\u7edf\u63d0\u4f9b\u4e86\u57fa\u7840\u3002|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u7406\u8bba\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e0a\u5747\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u5916\u90e8\u7ec4\u4ef6\u6269\u5c55\u57fa\u7840LLM\u7684\u80fd\u529b\uff0c\u5728\u591a\u79cd\u65b9\u5f0f\u4e0b\u589e\u5f3a\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u5229\u7528GPT-3.5-Turbo\u6838\u5fc3\u6784\u5efa\u7684\u4ee3\u7406\u53ef\u80fd\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u8d85\u8d8a\u66f4\u5148\u8fdb\u7684GPT-4\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u5de5\u5177\u7684\u5e94\u7528\u4f7f\u7cfb\u7edf\u80fd\u591f\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0c\u4f7f\u5176\u4ece\u4ec5\u4ec5\u751f\u6210\u6587\u672c\u8f6c\u53d8\u4e3a\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u3002\u9274\u4e8e\u4ee3\u7406\u7684\u5b9e\u9645\u5e94\u7528\u8303\u56f4\u4ee5\u53ca\u5176\u5bf9\u73af\u5883\u8fdb\u884c\u64cd\u4f5c\u7684\u80fd\u529b\uff0c\u8bc4\u4f30\u6f5c\u5728\u6f0f\u6d1e\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5982\u679c\u88ab\u9ed1\u5ba2\u5165\u4fb5\uff0c\u8fd9\u4e9b\u81ea\u4e3b\u7cfb\u7edf\u9020\u6210\u7684\u635f\u5bb3\u53ef\u80fd\u4f1a\u8d85\u8fc7\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u3002\u5c3d\u7ba1\u5df2\u6709\u7814\u7a76\u63a2\u8ba8\u4e86LLM\u4ee3\u7406\u7684\u6709\u5bb3\u884c\u4e3a\uff0c\u4f46\u6211\u4eec\u7684\u7814\u7a76\u4ece\u4e0d\u540c\u89d2\u5ea6\u5ba1\u89c6\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u8bef\u5bfc\u4ee3\u7406\u6267\u884c\u91cd\u590d\u6216\u65e0\u5173\u7684
\u64cd\u4f5c\uff0c\u4ece\u800c\u5f15\u53d1\u6545\u969c\u3002\u6211\u4eec\u4f7f\u7528\u5404\u79cd\u653b\u51fb\u624b\u6bb5\u3001\u573a\u666f\u548c\u5c5e\u6027\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u4ee5\u786e\u5b9a\u5176\u6613\u611f\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u591a\u4e2a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u5bfc\u81f4\u8d85\u8fc780%\u7684\u5931\u8d25\u7387\u3002\u901a\u8fc7\u5728\u591a\u4ee3\u7406\u73af\u5883\u4e2d\u9488\u5bf9\u5b9e\u73b0\u5e76\u90e8\u7f72\u7684\u4ee3\u7406\u8fdb\u884c\u653b\u51fb\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u6b64\u7c7b\u6f0f\u6d1e\u6240\u4f34\u968f\u7684\u73b0\u5b9e\u98ce\u9669\u3002\u4e3a\u4e86\u51cf\u8f7b\u6b64\u7c7b\u653b\u51fb\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u81ea\u6211\u68c0\u67e5\u68c0\u6d4b\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u4ec5\u4f7f\u7528LLM\u5f88\u96be\u6709\u6548\u68c0\u6d4b\u5230\u8fd9\u4e9b\u653b\u51fb\uff0c\u8fd9\u51f8\u663e\u4e86\u8fd9\u79cd\u6f0f\u6d1e\u6240\u5e26\u6765\u7684\u91cd\u5927\u98ce\u9669\u3002|\n", "2407.20856": "|**2024-07-30**|**Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations**|Sarthak Anand et.al.|[2407.20856](http://arxiv.org/abs/2407.20856)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b(Large Language Models, 
LLMs)\u4e3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u4ea7\u54c1\u63a8\u8350\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u7684\u6709\u6548\u6027\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5b83\u4eec\u5bf9\u4ea7\u54c1\u5e93\u5b58\u7684\u5168\u9762\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u589e\u5f3aLLMs\u7684\u4ea7\u54c1\u77e5\u8bc6\u80fd\u529b\uff0c\u901a\u8fc7\u8bad\u7ec3\u5b83\u4eec\u54cd\u5e94\u5305\u542b\u4ea7\u54c1ID\u7684\u5408\u6210\u641c\u7d22\u67e5\u8be2\uff0c\u4ee5\u8fdb\u884c\u4e0a\u4e0b\u6587\u76f8\u5173\u56de\u590d\u3002\u6211\u4eec\u6df1\u5165\u5206\u6790\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0c\u8bc4\u4f30\u5176\u6548\u679c\uff0c\u6982\u8ff0\u5176\u4f18\u70b9\uff0c\u5e76\u6307\u51fa\u4e86\u9650\u5236\u56e0\u7d20\u3002\u6587\u7ae0\u8fd8\u8ba8\u8bba\u4e86\u6b64\u65b9\u6cd5\u7684\u6539\u8fdb\u6f5c\u529b\u548c\u672a\u6765\u65b9\u5411\uff0c\u63d0\u4f9b\u4e86\u5bf9LLMs\u5728\u4ea7\u54c1\u63a8\u8350\u4e2d\u89d2\u8272\u7684\u5168\u9762\u7406\u89e3\u3002|\n", "2407.21771": "|**2024-07-31**|**Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs**|Shi Liu 
et.al.|[2407.21771](http://arxiv.org/abs/2407.21771)|null|\u73b0\u6709\u5927\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u4e3b\u8981\u901a\u8fc7\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u7684\u56fe\u50cf\u7279\u5f81\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\uff0c\u5229\u7528\u5176\u5f3a\u5927\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\u4e4b\u95f4\u7684\u89c4\u6a21\u5dee\u5f02\u53ef\u80fd\u5bfc\u81f4LLM\u5728\u591a\u6a21\u6001\u7406\u89e3\u4e2d\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u8fd9\u79cdLVLM\u4e2d\u7684\u4e0d\u5e73\u8861\u53ef\u80fd\u5f15\u53d1\u5e7b\u89c9\u73b0\u8c61\u3002\u5177\u4f53\u6765\u8bf4\uff0cLVLM\u53ef\u80fd\u751f\u6210\u4e00\u81f4\u7684\u63cf\u8ff0\uff0c\u65e0\u8bba\u662f\u5426\u6709\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u8868\u660e\u67d0\u4e9b\u8f93\u51fa\u4ec5\u53d7\u4e0a\u4e0b\u6587\u6587\u672c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u73b0\u8c61\u79f0\u4e3a\u201c\u6587\u672c\u60ef\u6027\u201d\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u7b97\u6cd5\u6765\u5bfb\u627e\u56fe\u50cf\u7406\u89e3\u548c\u8bed\u8a00\u63a8\u65ad\u4e4b\u95f4\u7684\u5e73\u8861\u70b9\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u52a8\u6001\u8c03\u6574\u5e76\u653e\u5927\u5206\u914d\u7ed9\u56fe\u50cf\u4ee4\u724c\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u4ece\u800c\u8d4b\u4e88\u89c6\u89c9\u5143\u7d20\u66f4\u5927\u7684\u91cd\u8981\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4ece\u591a\u6a21\u6001\u8f93\u5165\u7684logits\u4e2d\u51cf\u53bb\u7eaf\u6587\u672c\u8f93\u5165\u7684logits\uff0c\u6709\u52a9\u4e8eLVLM\u907f\u514d\u8fc7\u5206\u4f9d\u8d56LLM\u3002\u901a\u8fc7\u589e\u5f3a\u56fe\u50cf\u4ee4\u724c\u5e76\u51cf\u5c11LLM\u7684\u987d\u56fa\u8f93\u51fa\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9LVLM\u66f4\u591a\u5730\u5173\u6ce8\u56fe\u50cf\uff0c\u4ece\u800c\u7f13\u89e3\u6587\u672c\u
60ef\u6027\u548c\u51cf\u5c11LVLM\u4e2d\u7684\u5e7b\u89c9\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0c\u5728\u4e0d\u540c\u6307\u6807\u4e0b\uff0c\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5404\u79cdLVLM\u4e2d\u7684\u5e7b\u89c9\u8f93\u51fa\u9891\u7387\u3002\u9879\u76ee\u9875\u9762\u53ef\u8bbf\u95ee\uff1ahttps://lalbj.github.io/projects/PAI/\u3002|\n", "2407.21762": "|**2024-07-31**|**ReplanVLM: Replanning Robotic Tasks with Visual Language Models**|Aoran Mei et.al.|[2407.21762](http://arxiv.org/abs/2407.21762)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u9886\u57df\u83b7\u5f97\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u5b83\u4eec\u5728\u6587\u672c\u5206\u6790\u4e0e\u751f\u6210\u3001\u4ee5\u53ca\u5bf9\u4e16\u754c\u5e7f\u6cdb\u77e5\u8bc6\u65b9\u9762\u7684\u51fa\u8272\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u89e3\u6790\u89c6\u89c9\u7ebf\u7d22\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\uff0c\u65e0\u6cd5\u76f4\u63a5\u611f\u77e5\u4e16\u754c\u72b6\u6001\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5728\u63cf\u8ff0\u5f53\u524d\u4e16\u754c\u72b6\u6001\u4e0a\u7684\u4e0d\u8db3\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u901a\u8fc7\u96c6\u6210\u89c6\u89c9\u611f\u77e5\u6a21\u5757\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u589e\u5f3a\u4e86\u673a\u5668\u4eba\u7684\u81ea\u4e3b\u6027\u3002\u5c3d\u7ba1\u5982\u6b64\uff0cVLM\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u4f8b\u5982\uff0c\u5728\u63d0\u4f9b\u51c6\u786e\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u6267\u884c\u9519\u8bef\u7684\u98ce\u9669\u4f9d\u7136\u5b58\u5728\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u7684ReplanVLM\u6846\u67b6\u3002\u8be5\u7814\u7a76\u91cd\u70b9\u5728\u4e8e\u9519\u8bef\u4fee\u6b63\u5e72\u9884\u63aa\u65bd\u3002\u63d0\u51fa\u4e86\u5185\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\u548c\u5916\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\uff0c\u5728\u76f8\u5e94\u7684\u9636\u6bb5\u8fdb\u884c\u9519\u8bef\u7ea0\u6b63\u3002\u53d1\u5c55\u4e86\u4e00\u79cd\u91cd\u89c4\u5212\u7b56\u7565\uff0c\u5f53\u4efb\u52a1\u6267\u884c\u5931\u8d25\u65f6\uff0c\u7528\u4e8e\u91cd\u65b0\u89c4\u5212\u4efb\u52a1\u6216\u4fee\u6b63\u9519\u8bef\u4ee3\u7801\u3002\u5728\u771f\u5b9e\u673a\u5668\u4eba\u548c\u4eff\u771f\u73af\u5883\u4e2d\u8fdb\u884c\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u6846\u67b6\u5177\u6709\u66f4\u9ad8\u7684\u6210\u529f\u7387\u548c\u66f4\u5f3a\u7684\u5f00\u653e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\u3002\u6709\u5173\u5b9e\u9a8c\u7684\u89c6\u9891\u53ef\u4ee5\u5728https://youtu.be/NPk2pWKazJc\u627e\u5230\u3002|\n", "2407.21712": "|**2024-07-31**|**Adaptive Retrieval-Augmented Generation for Conversational Systems**|Xi Wang 
et.al.|[2407.21712](http://arxiv.org/abs/2407.21712)|null|\u5c3d\u7ba1\u5728\u5bf9\u8bdd\u7cfb\u7edf\u5f00\u53d1\u4e2d\u878d\u5165\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u8bb8\u591a\u7814\u7a76\u663e\u793a\u4e86\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u5bf9\u4e8e\u63d0\u4f9b\u4fe1\u606f\u6027\u54cd\u5e94\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u73b0\u6709\u7814\u7a76\u901a\u5e38\u5047\u8bbe\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u7684\u6bcf\u6b21\u56de\u590d\u90fd\u9700\u8981\u68c0\u7d22\u589e\u5f3a\uff0c\u800c\u65e0\u9700\u660e\u786e\u63a7\u5236\u3002\u8fd9\u5f15\u53d1\u4e86\u4e00\u4e2a\u5173\u4e8e\u8fd9\u79cd\u5fc5\u8981\u6027\u7684\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u7cfb\u7edf\u56de\u5e94\u662f\u5426\u9700\u8981\u4f7f\u7528\u5916\u90e8\u77e5\u8bc6\u8fdb\u884c\u589e\u5f3a\u7684\u5fc5\u8981\u6027\u3002\u901a\u8fc7\u5229\u7528\u4eba\u7c7b\u5bf9\u662f\u5426\u9700\u8981\u9002\u5e94\u6027\u589e\u5f3a\u7684\u4e8c\u5143\u9009\u62e9\u8fdb\u884c\u5224\u65ad\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RAGate\u2014\u2014\u4e00\u4e2a\u95f8\u95e8\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u901a\u8fc7\u5206\u6790\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u548c\u76f8\u5173\u8f93\u5165\u6765\u9884\u6d4b\u5bf9\u8bdd\u7cfb\u7edf\u662f\u5426\u9700\u8981RAG\u4ee5\u83b7\u5f97\u6539\u8fdb\u7684\u56de\u590d\u3002\u6211\u4eec\u5728\u6784\u5efa\u548c\u5e94\u7528RAGate\u5230\u5bf9\u8bdd\u6a21\u578b\u4ee5\u53ca\u5bf9\u4e0d\u540c\u5bf9\u8bdd\u573a\u666f\u8fdb\u884c\u8be6\u5c3d\u5206\u6790\u65b9\u9762\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u548c\u5206\u6790\u8868\u660e\uff0cRAGate\u5728\u8bc6\u522b\u9700\u8981RAG\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u56de\u590d\u5e76\u5177\u6709\u9ad8\u751f\u6210\u7f6e\u4fe1\u5ea6\u7684\u7cfb\u7edf\u54cd\u5e94\u65b9\u9762\u6709\u6709\u6548\u5e94\u7528\u3002\u8fd9\u9879\u7814\u7a76\u8fd8\u53d1\u73b0\u4e86\u751f\u6210\u7f6e\u4fe1\u5ea6\u6c34\u5e73\
u4e0e\u589e\u5f3a\u77e5\u8bc6\u7684\u76f8\u5173\u6027\u3002|\n", "2407.21708": "|**2024-07-31**|**CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature**|Stefan Langer et.al.|[2407.21708](http://arxiv.org/abs/2407.21708)|null|\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u5df2\u6807\u6ce8\u6587\u672c\u8bed\u6599\u5e93\u548c\u4eceChebi\u83b7\u53d6\u7684\u77e5\u8bc6\uff0c\u589e\u5f3a\u73b0\u6709\u77e5\u8bc6\uff0c\u5e76\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u8bc6\u522b\u79d1\u5b66\u6587\u732e\u4e2d\u7684\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u901a\u8fc7\u7ed3\u5408\u672c\u4f53\u8bba\u77e5\u8bc6\u4e0eLLM\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u79d1\u5b66\u6587\u732e\u4e2d\u8bc6\u522b\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u7684\u9ad8\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u4ece8000\u7bc7ChemRxiv\u6587\u7ae0\u4e2d\u63d0\u53d6\u8fd9\u4e9b\u5b9e\u4f53\u548c\u89d2\u8272\uff0c\u7136\u540e\u4f7f\u7528\u7b2c\u4e8c\u4e2aLLM\u6784\u5efa\u4e86\u4e00\u4e2a\u5316\u5b66\u5b9e\u4f53\u548c\u89d2\u8272\u7684\u77e5\u8bc6\u56fe\u8c31\uff08CEAR\uff09\uff0c\u8be5\u56fe\u8c31\u4e0d\u4ec5\u4e3aChEBI\u63d0\u4f9b\u4e86\u8865\u5145\u4fe1\u606f\uff0c\u8fd8\u80fd\u5e2e\u52a9\u6269\u5c55\u5176\u5185\u5bb9\u3002|\n", "2407.21693": "|**2024-07-31**|**TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities**|Ming Zhang 
et.al.|[2407.21693](http://arxiv.org/abs/2407.21693)|**[link](https://github.com/konglonggefdu/transfertod)**|\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\uff08TOD\uff09\u7cfb\u7edf\u65e8\u5728\u6709\u6548\u5904\u7406\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\uff0c\u5305\u62ec\u4fe1\u606f\u6536\u96c6\u3002\u5982\u4f55\u51c6\u786e\u3001\u9ad8\u6548\u4e14\u6709\u6548\u5730\u5229\u7528TOD\u8fdb\u884c\u4fe1\u606f\u6536\u96c6\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u4e00\u4e2a\u5173\u952e\u4e14\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u3001\u6307\u4ee4\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u5e76\u80fd\u591f\u901a\u8fc7\u5fae\u8c03\u663e\u8457\u63d0\u9ad8TOD\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6570\u636e\u96c6\u4e3b\u8981\u9488\u5bf9\u7528\u6237\u9a71\u52a8\u7684\u7cfb\u7edf\uff0c\u5e76\u5c40\u9650\u4e8e\u9884\u5b9a\u4e49\u7684\u7279\u5b9a\u573a\u666f\u548c\u69fd\u4f4d\uff0c\u56e0\u6b64\u9700\u8981\u5728TOD\u7684\u4e3b\u52a8\u6027\u3001\u591a\u6837\u6027\u548c\u80fd\u529b\u65b9\u9762\u8fdb\u884c\u6539\u8fdb\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u591a\u9886\u57df\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\u6570\u636e\u6784\u5efa\u8fc7\u7a0b\u4ee5\u53ca\u57fa\u4e8e\u6b64\u8fc7\u7a0b\u751f\u6210\u7684\u4e2d\u6587\u5bf9\u8bdd\u6570\u636e\u96c6\u2014\u2014\\textbf{TransferTOD}\uff0c\u8be5\u6570\u636e\u96c6\u771f\u5b9e\u6a21\u62df\u4e86\u572830\u4e2a\u6d41\u884c\u751f\u6d3b\u670d\u52a1\u573a\u666f\u4e2d\u7684\u4eba\u673a\u5bf9\u8bdd\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4f7f\u7528\u5168\u53c2\u6570\u5fae\u8c03\u7684\\textbf{TransferTOD-7B}\u6a21\u578b\uff0c\u5c55\u793a\u4e86\u5728\u5404\u79cd\u4e0b\u6e38\u573a\u666f\u4e2d\u7684\u663e\u8457\u7684\u586b\u69fd\u80fd\u529b\u548c\u63d0\u95ee\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u8bc1\u6
60e\u4e86\u5176\u5728\u4e0d\u540c\u6570\u636e\u5e94\u7528\u573a\u666f\u4e0b\u7684\u5f3a\u5927\u6cdb\u5316\u80fd\u529b\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6570\u636e\u4f7f\u7528\u6548\u7387\u548c\u7cfb\u7edf\u6027\u80fd\u3002\u6570\u636e\u5df2\u53d1\u5e03\u4e8ehttps://github.com/KongLongGeFDU/TransferTOD\u3002|\n", "2407.21669": "|**2024-07-31**|**Synth-Empathy: Towards High-Quality Synthetic Empathy Data**|Hao Liang et.al.|[2407.21669](http://arxiv.org/abs/2407.21669)|**[link](https://github.com/aurora-slz/synth-empathy)**|\u8fd1\u5e74\u6765\uff0c\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b9e\u73b0\u51fa\u8272\u540c\u7406\u5fc3\u54cd\u5e94\u80fd\u529b\u5df2\u6210\u4e3a\u4e00\u4e2a\u81f3\u5173\u91cd\u8981\u7684\u524d\u63d0\u3002\u56e0\u6b64\uff0c\u7ba1\u7406\u548c\u7406\u89e3\u540c\u7406\u5fc3\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u65e5\u76ca\u51f8\u663e\u3002\u7136\u800c\uff0c\u540c\u7406\u5fc3\u6570\u636e\u901a\u5e38\u7531\u4eba\u7c7b\u6807\u6ce8\uff0c\u5bfc\u81f4\u6570\u636e\u91cf\u4e0d\u8db3\u548c\u5927\u91cf\u7684\u4eba\u529b\u6d6a\u8d39\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSynth-Empathy\u7684LLM\u57fa\u4e8e\u7684\u6570\u636e\u751f\u6210\u4e0e\u8d28\u91cf\u3001\u591a\u6837\u6027\u9009\u62e9\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u80fd\u591f\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u540c\u7406\u5fc3\u6570\u636e\u5e76\u7b5b\u9009\u6389\u4f4e\u8d28\u91cf\u6570\u636e\u3002\u901a\u8fc7\u5229\u7528\u4f4e\u540c\u7406\u5fc3\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u9ad8\u4e86\u540c\u7406\u5fc3\u54cd\u5e94\u6027\u80fd\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\uff08SoTA\uff09\u7ed3\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u5404\u79cd\u4eba\u7c7b\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5747\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u9645\
u5e94\u7528\u4e2d\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6570\u636e\u91cf\u4e0e\u8d28\u91cf\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u63d0\u4f9b\u4e86\u540c\u7406\u5fc3\u6570\u636e\u751f\u6210\u4e0e\u9009\u62e9\u65b9\u9762\u7684\u89c1\u89e3\u3002|\n", "2407.21593": "|**2024-07-31**|**LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows**|Lukas Teufelberger et.al.|[2407.21593](http://arxiv.org/abs/2407.21593)|null|\u4e3a\u4e86\u63d0\u9ad8\u751f\u4ea7\u529b\u5e76\u4f18\u5316\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u529f\u80fd\u5d4c\u5165\u5e94\u7528\u7a0b\u5e8f\u7684\u8d8b\u52bf\u6b63\u5728\u589e\u957f\uff0c\u4ece\u57fa\u4e8e\u6d4f\u89c8\u5668\u7684\u7f51\u7edc\u5e94\u7528\u5230\u5728\u4e2a\u4eba\u8ba1\u7b97\u673a\u4e0a\u8fd0\u884c\u7684\u539f\u751f\u5e94\u7528\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7cfb\u7edf\u7ea7\u5feb\u6377\u65b9\u5f0f\u5c42\u2014\u2014LLM-for-X\uff0c\u5b83\u901a\u8fc7\u8f7b\u91cf\u7ea7\u5f39\u51fa\u5f0f\u5bf9\u8bdd\u6846\u65e0\u7f1d\u5730\u5411\u4efb\u4f55\u5e94\u7528\u7a0b\u5e8f\u6dfb\u52a0LLM\u670d\u52a1\u3002\u6211\u4eec\u7684\u539f\u751f\u5c42\u901a\u8fc7\u7edf\u4e00\u7684\u804a\u5929\u524d\u7aef\u4f5c\u4e3a\u7f16\u7a0b\u63a5\u53e3\u6216\u81ea\u5b9a\u4e49API\u8c03\u7528\uff0c\u5c06\u524d\u7aef\u5e94\u7528\u7a0b\u5e8f\u4e0e\u6d41\u884c\u7684LLM\u540e\u7aef\uff08\u5982ChatGPT\u548cGemini\uff09\u65e0\u7f1d\u8fde\u63a5\u3002\u6211\u4eec\u5c55\u793a\u4e86LLM-for-X\u5728Microsoft Office\u3001VSCode\u3001Adobe 
Acrobat\u4ee5\u53caOverleaf\u7b49\u6d41\u884c\u7f51\u7edc\u5e94\u7528\u4e2d\u7684\u4f18\u52bf\u3002\u5728\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c06LLM-for-X\u4e0eChatGPT\u7684\u7f51\u9875\u754c\u9762\u8fdb\u884c\u4e86\u4efb\u52a1\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u63d0\u4f9b\u5feb\u901f\u3001\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u7684LLM\u8f85\u52a9\uff0c\u65e0\u9700\u5207\u6362\u4e0a\u4e0b\u6587\u652f\u6301\u5199\u4f5c\u548c\u9605\u8bfb\u4efb\u52a1\uff0c\u540c\u65f6\u5bf9\u7279\u5b9a\u5e94\u7528\u65e0\u7279\u5b9a\u4f9d\u8d56\u3002|\n", "2407.21579": "|**2024-07-31**|**A Performance Study of LLM-Generated Code on Leetcode**|Tristan Coignion et.al.|[2407.21579](http://arxiv.org/abs/2407.21579)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6548\u7387\uff0c\u5e76\u4f7f\u7528\u6765\u81eaLeetCode\u7684\u6570\u636e\u96c6\u8bc4\u4f30\u4e86\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7f16\u5199\u7684\u89e3\u51b3\u65b9\u6848\u7684\u6027\u80fd\u3002\u6211\u4eec\u5bf9\u6bd4\u4e8618\u4e2aLLM\uff0c\u8003\u8651\u4e86\u6a21\u578b\u6e29\u5ea6\u548c\u6210\u529f\u7387\u7b49\u56e0\u7d20\u5bf9\u4ee3\u7801\u6027\u80fd\u7684\u5f71\u54cd\u3002\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5ea6\u91cf\u548c\u6bd4\u8f83LLM\u751f\u6210\u4ee3\u7801\u7684\u901f\u5ea6\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u91c7\u7528\u4e0d\u540cLLM\u65f6\uff0c\u751f\u6210\u7684\u4ee3\u7801\u6027\u80fd\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0cLLM\u751f\u6210\u7684\u4ee3\u7801\u5e73\u5747\u800c\u8a00\u6bd4\u4eba\u7c7b\u7f16\u5199\u7684\u4ee3\u7801\u66f4\u9ad8\u6548\u3002\u8bba\u6587\u8fdb\u4e00\u6b65\u8ba8\u8bba\u4e86\u4f7f\u7528LeetCode\u4f5c\u4e3a\u57fa\u51c6\u6570\u636e\u96c6\u3001\u6f5c\u5728\u6570\u636e\u6c61\u67d3\u5e26\u6765\u7684\u9650\u5236\u4ee5\u53ca\u5e73\u53f0\u6d4b\u91cf\u53ef\u9760\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u8ba4\u4e3
a\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3LLM\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u80fd\u529b\uff0c\u5e76\u4e3a\u8be5\u9886\u57df\u672a\u6765\u7684\u4f18\u5316\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2407.21571": "|**2024-07-31**|**PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning**|Min Jae Jung et.al.|[2407.21571](http://arxiv.org/abs/2407.21571)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u9047\u5230\u91cd\u5927\u6311\u6218\uff0c\u4e3b\u8981\u5728\u4e8e\u707e\u96be\u6027\u9057\u5fd8\u73b0\u8c61\uff0c\u5373\u65b0\u4fe1\u606f\u4f1a\u8986\u76d6\u4e4b\u524d\u83b7\u5f97\u7684\u77e5\u8bc6\u3002\u8fd9\u4e00\u5c40\u9650\u6027\u5bfc\u81f4\u4e86\u5927\u91cf\u73af\u5883\u548c\u7ecf\u6d4e\u8d44\u6e90\u7684\u6d6a\u8d39\u3002\u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aPMoE\uff08Progressive Mixture of Experts with Asymmetric Transformer\uff09\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u65e8\u5728\u901a\u8fc7\u91c7\u7528\u5177\u6709\u6d45\u5c42\u7528\u4e8e\u4e00\u822c\u77e5\u8bc6\u548c\u6df1\u5c42\u7528\u4e8e\u65b0\u77e5\u8bc6\u7684\u4e0d\u5bf9\u79f0\u8bbe\u8ba1\u6765\u6700\u5c0f\u5316\u9057\u5fd8\u3002PMoE\u5728\u6df1\u5c42\u5f15\u5165\u4e86\u9010\u6b65\u589e\u52a0\u7684\u4e13\u5bb6\uff0c\u5e76\u914d\u5907\u4e86\u4e00\u4e2a\u8def\u7531\u5668\uff0c\u8be5\u8def\u7531\u5668\u80fd\u591f\u9ad8\u6548\u5730\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u5408\u9002\u7684\u4e13\u5bb6\u3002 
\u8def\u7531\u5668\u4f4d\u4e8e\u6df1\u5c42\u9644\u8fd1\uff0c\u5229\u7528\u6df1\u5ea6\u7279\u5f81\u805a\u5408\u5df2\u6574\u5408\u7684\u4fe1\u606f\u3002\u8fd9\u4f7f\u5f97\u8def\u7531\u5668\u80fd\u591f\u6709\u6548\u5730\u6267\u884c\u4efb\u52a1\uff0c\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u9010\u6b65\u589e\u52a0\u7684\u6df1\u5c42\u4e13\u5bb6\u3002\u901a\u8fc7\u5728TRACE\u6570\u636e\u96c6\u548c\u901a\u7528\u8bed\u8a00\u7406\u89e3\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7684PMoE\u65b9\u6cd5\u4f18\u4e8e\u5148\u524d\u7684\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2407.21553": "|**2024-07-31**|**CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment**|Akira Kasuga et.al.|[2407.21553](http://arxiv.org/abs/2407.21553)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5ba2\u6237\u4f53\u9a8c\uff08CX\uff09\u6a21\u62df\u5668\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u7528\u6237\u884c\u4e3a\u6a21\u62df\u6765\u8bc4\u4f30\u672a\u6d4b\u8bd5\u7684\u7f51\u7edc\u8425\u9500\u6d3b\u52a8\u7684\u5f71\u54cd\u3002\u8be5\u63d0\u51fa\u7684\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c06\u7528\u6237\u884c\u4e3a\u5386\u53f2\u4e2d\u7684\u5404\u79cd\u4e8b\u4ef6\uff0c\u5982\u67e5\u770b\u5546\u54c1\u3001\u4f7f\u7528\u4f18\u60e0\u5238\u6216\u8d2d\u4e70\u5546\u54c1\u7b49\uff0c\u8868\u793a\u4e3a\u8bed\u4e49\u5d4c\u5165\u5411\u91cf\u3002\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u6a21\u578b\uff0c\u7528\u4e8e\u4ece\u5176LLM\u5d4c\u5165\u4e2d\u9884\u6d4b\u4e8b\u4ef6\u4e4b\u95f4\u7684\u8fc7\u6e21\uff0c\u751a\u81f3\u53ef\u4ee5\u4ece\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u4e2d\u5b66\u4e60\uff0c\u4ece\u800c\u5bf9\u672a\u77e5\u4e8b\u4ef6\u8fdb\u884c\u6cdb\u5316\u3002\u5728web\u8425\u9500\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u8fd9\u4e2a\u8fc7\u6e21\u9884\u6d4b\u6a21\u578b\u6765\u6a21\u62df\u5f53\u65b0\u7684\u8425\u9500\u6d3
b\u52a8\u6216\u4ea7\u54c1\u5c55\u793a\u7ed9\u7528\u6237\u65f6\uff0c\u7528\u6237\u53ef\u80fd\u5982\u4f55\u53cd\u5e94\u4e0d\u540c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u6d88\u9664\u5728\u7ebf\u6d4b\u8bd5\u7684\u9ad8\u6602\u6210\u672c\uff0c\u5e76\u589e\u5f3a\u8425\u9500\u4eba\u5458\u63ed\u793a\u6d1e\u5bdf\u529b\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u6570\u503c\u8bc4\u4f30\u548c\u4f7f\u7528Google\u5546\u54c1\u5546\u5e97\u7684\u5927\u89c4\u6a21\u516c\u5171\u6570\u636e\u96c6\u8fdb\u884c\u7684\u7528\u6237\u7814\u7a76\u8bc1\u660e\u4e86\u6211\u4eec\u6846\u67b6\u7684\u6709\u6548\u6027\u3002|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u4e8e\u7684\u4ee3\u7406\u5df2\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u5e76\u53d8\u5f97\u8d8a\u6765\u8d8a\u6d41\u884c\u3002\u6b64\u5916\uff0c\u89c4\u5212\u80fd\u529b\u662fLLM\u57fa\u4e8e\u4ee3\u7406\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff0c\u6d89\u53ca\u4e0e\u73af\u5883\u7684\u4ea4\u4e92\u548c\u6267\u884c\u52a8\u4f5c\u4ee5\u5b8c\u6210\u89c4\u5212\u4efb\u52a1\uff0c\u901a\u5e38\u5305\u62ec\u4ece\u521d\u59cb\u72b6\u6001\u8fbe\u5230\u9884\u671f\u76ee\u6807\u7684\u8fc7\u7a0b\u3002\u672c\u6587\u7814\u7a76\u4e86\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u589e\u5f3aLLM\u89c4\u5212\u80fd\u529b\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4ee3\u7406\u8bad\u7ec3\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5229\u7528\u4e13\u5bb6\u7ea7\u8f68\u8ff9\u5bf9\u6307\u4ee4\u8c03\u6574LLM\u80fd\u6709\u6548\u63d0\u5347\u5176\u89c4\u5212\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u4e3b\u8981\u96c6\u4e2d\u5728\u4ece\u624b\u52a8\u8bbe\u8ba1\u7684\u4efb\u52a1\u548c\u73af\u5883\u4e2d\u5408\u6210\u8f68\u8ff9\u3002\u521b\u5efa\u8fd9\u4e9b\u73af\u5883\u548c\u4efb\u52a1\u7684\u52b3\u52a8\u5bc6\u96c6\u578b\u8fc7\u7a0b
\u9650\u5236\u4e86\u751f\u6210\u8db3\u591f\u591a\u6837\u6027\u548c\u5e7f\u6cdb\u6027\u7684\u8f68\u8ff9\u7684\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5c40\u9650\u6027\uff0c\u672c\u6587\u63a2\u7d22\u4e86\u81ea\u52a8\u5408\u6210\u591a\u6837\u5316\u73af\u5883\u4ee5\u53ca\u89c4\u5212\u4efb\u52a1\u7684\u6e10\u8fdb\u8303\u56f4\uff0c\u4ece\u7b80\u5355\u5230\u590d\u6742\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u6846\u67b6AgentGen\uff0c\u5229\u7528LLM\u9996\u5148\u751f\u6210\u73af\u5883\uff0c\u968f\u540e\u6839\u636e\u8fd9\u4e9b\u73af\u5883\u751f\u6210\u89c4\u5212\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u4e3a\u4e86\u63d0\u9ad8\u73af\u5883\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u5305\u542b\u5404\u79cd\u9886\u57df\u7279\u5b9a\u6587\u672c\u6bb5\u843d\u7684\u7075\u611f\u8bed\u6599\u5e93\u4f5c\u4e3a\u5408\u6210\u73af\u5883\u7684\u4e0a\u4e0b\u6587\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u89c4\u5212\u4efb\u52a1\u96be\u5ea6\u591a\u6837\u6027\u7684\u7a0b\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5411\u6f14\u5316\u65b9\u6cd5Bi-Evol\uff0c\u8be5\u65b9\u6cd5\u4ece\u5bb9\u6613\u548c\u56f0\u96be\u7684\u4e24\u4e2a\u65b9\u5411\u8fdb\u5316\u89c4\u5212\u4efb\u52a1\uff0c\u4ee5\u5408\u6210\u5177\u6709\u5e73\u6ed1\u96be\u5ea6\u66f2\u7ebf\u7684\u4efb\u52a1\u96c6\u3002\u6765\u81eaAgentBoard\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0cAgentGen\u663e\u8457\u63d0\u9ad8\u4e86LLM\u7684\u89c4\u5212\u80fd\u529b\uff0c\u4f8b\u5982\uff0c\u4f7f\u7528AgentGen\u6307\u4ee4\u8c03\u6574\u7684Llama-3 8B\u5728\u603b\u4f53\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86GPT-3.5\u3002\u6b64\u5916\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u5b83\u751a\u81f3\u8d85\u8d8a\u4e86GPT-4\u3002|\n", "2408.00761": "|**2024-08-01**|**Tamper-Resistant Safeguards for Open-Weight LLMs**|Rishub Tamirisa 
et.al.|[2408.00761](http://arxiv.org/abs/2408.00761)|**[link](https://github.com/rishub-tamirisa/tamper-resistance)**|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5f15\u53d1\u4e86\u5bf9\u6f5c\u5728\u6076\u610f\u7528\u9014\u7684\u5e7f\u6cdb\u62c5\u5fe7\u3002\u9488\u5bf9\u5f00\u653e\u6743\u91cd\u7684LLM\uff0c\u73b0\u6709\u4fdd\u62a4\u63aa\u65bd\u5728\u62b5\u6297\u7be1\u6539\u653b\u51fb\u65b9\u9762\u7f3a\u4e4f\u8db3\u591f\u7684\u7a33\u5b9a\u6027\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6b65\u9aa4\u8f7b\u6613\u5730\u79fb\u9664\u62d2\u7edd\u548c\u9057\u5fd8\u4fdd\u62a4\u63aa\u65bd\u3002\u8fd9\u7c7b\u6f0f\u6d1e\u8981\u6c42\u91c7\u53d6\u65b0\u7684\u65b9\u6cd5\u6765\u786e\u4fdd\u5b89\u5168\u91ca\u653e\u5f00\u653e\u6743\u91cd\u7684LLM\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTAR\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u4e0d\u53ef\u7be1\u6539\u7684\u5b89\u5168\u9632\u62a4\u878d\u5165\u5230\u5f00\u653e\u6743\u91cd\u7684LLM\u4e2d\uff0c\u4f7f\u5f97\u5373\u4f7f\u7ecf\u8fc7\u6570\u5343\u6b65\u7684\u5fae\u8c03\uff0c\u653b\u51fb\u8005\u4e5f\u65e0\u6cd5\u79fb\u9664\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u3002\u5728\u5168\u9762\u7684\u8bc4\u4f30\u548c\u7ea2\u961f\u6d4b\u8bd5\u5206\u6790\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u9ad8\u4e86\u9632\u62a4\u7684\u4e0d\u53ef\u7be1\u6539\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u6027\u529f\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u4e0d\u53ef\u7be1\u6539\u6027\u662f\u4e00\u4e2a\u53ef\u884c\u7684\u95ee\u9898\uff0c\u4e3a\u6539\u8fdb\u5f00\u653e\u6743\u91cdLLM\u7684\u5b89\u5168\u6027\u548c\u5b89\u5168\u6027\u5f00\u8f9f\u4e86\u6709\u524d\u666f\u7684\u65b0\u9014\u5f84\u3002|\n", "2408.00741": "|**2024-08-01**|**DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency**|Jovan Stojkovic 
et.al.|[2408.00741](http://arxiv.org/abs/2408.00741)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u751f\u6210\u80fd\u529b\u4f7f\u5176\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u6210\u4e3a\u5173\u952e\u7684\u5de5\u4f5c\u8d1f\u8f7d\u3002\u5982\u4eca\uff0cLLM\u63a8\u7406\u96c6\u7fa4\u5904\u7406\u5927\u91cf\u67e5\u8be2\uff0c\u5e76\u5bf9\u670d\u52a1\u8d28\u91cf\u6307\u6807\uff08SLOs\uff09\u6709\u4e25\u683c\u8981\u6c42\u3002\u4e3a\u4e86\u8fbe\u5230\u9884\u671f\u6027\u80fd\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u80fd\u8017\u9ad8\u7684GPU\u4e0a\u6267\u884c\uff0c\u5bfc\u81f4\u63a8\u7406\u96c6\u7fa4\u6d88\u8017\u5927\u91cf\u80fd\u6e90\uff0c\u5e76\u4ea7\u751f\u8fc7\u91cf\u7684\u78b3\u6392\u653e\u3002\u5e78\u8fd0\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u53ef\u4ee5\u901a\u8fc7\u5229\u7528\u63a8\u7406\u8ba1\u7b97\u7279\u6027\u7684\u5f02\u8d28\u6027\u4ee5\u53ca\u5de5\u4f5c\u8d1f\u8f7d\u7684\u6ce2\u52a8\uff0c\u663e\u8457\u63d0\u9ad8\u80fd\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u591a\u6837\u6027\u548c\u52a8\u6001\u73af\u5883\u521b\u9020\u4e86\u4e00\u4e2a\u5de8\u5927\u7684\u641c\u7d22\u7a7a\u95f4\uff0c\u4e0d\u540c\u7684\u7cfb\u7edf\u914d\u7f6e\uff08\u5982\u5b9e\u4f8b\u6570\u91cf\u3001\u6a21\u578b\u5e76\u884c\u6027\u548cGPU\u9891\u7387\uff09\u5bfc\u81f4\u4e0d\u540c\u7684\u80fd\u6e90\u548c\u6027\u80fd\u6298\u8877\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86DynamoLLM\uff0c\u8fd9\u662f\u9996\u4e2a\u9488\u5bf9LLM\u63a8\u7406\u73af\u5883\u7684\u80fd\u6548\u7ba1\u7406\u6846\u67b6\u3002DynamoLLM\u81ea\u52a8\u4e14\u52a8\u6001\u5730\u91cd\u65b0\u914d\u7f6e\u63a8\u7406\u96c6\u7fa4\uff0c\u4ee5\u4f18\u5316\u80fd\u6e90\u548c\u6210\u672c\uff0c\u540c\u65f6\u6ee1\u8db3\u670d\u52a1\u7684\u6027\u80fdSLOs\u3002\u7814\u7a76\u8868\u660e\uff0c\u5728\u670d\u52a1\u5c42\u9762\uff0cDynamoLLM\u80fd\u591f\u8282\u770153%\u7684\u80fd\u6e90\u548c38%\u7684\u64cd\u4f5c\u78b3\u6392\u653e\uff0c\u5e76\u4e3a\u5ba2\u6237\u51cf\u5c1161%\u7684\u6210\u672c\uff0c\u540c\u65f6\u4ecd\u80fd\u6ee1\u8db3\u5ef6\u8fdfSLOs\u3002|\n", "2408.00727": "|**2024-08-01**|**Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions**|Guangzhi Xiong et.al.|[2408.00727](http://arxiv.org/abs/2408.00727)|**[link](https://github.com/teddy-xionggz/medrag)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4e86\u89e3\u51b3\u533b\u7597\u95ee\u9898\u7684\u5de8\u5927\u6f5c\u529b\uff0c\u5b83\u4eec\u80fd\u591f\u638c\u63e1\u5927\u91cf\u533b\u5b66\u77e5\u8bc6\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u4e14\u5728\u77e5\u8bc6\u66f4\u65b0\u65b9\u9762\u5177\u6709\u5c40\u9650\u6027\u3002\u4e3a\u4e86\u589e\u5f3aLLM\u5728\u533b\u5b66\u95ee\u7b54\u65b9\u9762\u7684\u80fd\u529b\uff0c\u63d0\u51fa\u4e86\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\uff0c\u901a\u8fc7\u5916\u90e8\u77e5\u8bc6\u5e93\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u9700\u8981\u591a\u6b21\u4fe1\u606f\u67e5\u8be2\u7684\u590d\u6742\u60c5\u51b5\u4e0b\uff0cRAG\u53ef\u80fd\u4ecd\u7136\u4f1a\u5931\u8d25\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8fed\u4ee3RAG\u65b9\u6cd5\uff08i-MedRAG\uff09\uff0c\u5141\u8bb8LLM\u5728\u6bcf\u6b21\u5c1d\u8bd5\u540e\u8fed\u4ee3\u5730\u63d0\u51fa\u54
0e\u7eed\u95ee\u9898\u3002\u5728\u6bcf\u6b21i-MedRAG\u8fed\u4ee3\u4e2d\uff0c\u540e\u7eed\u95ee\u9898\u7531\u57fa\u672c\u7684RAG\u7cfb\u7edf\u56de\u7b54\uff0c\u5e76\u7528\u4e8e\u6307\u5bfc\u4e0b\u4e00\u4e2a\u8fed\u4ee3\u4e2d\u7684\u67e5\u8be2\u751f\u6210\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ec5\u4f7f\u7528RAG\u7684\u4f20\u7edf\u65b9\u6cd5\u76f8\u6bd4\uff0ci-MedRAG\u663e\u8457\u63d0\u9ad8\u4e86\u5404\u79cdLLM\u5728\u590d\u6742\u95ee\u9898\u4e0a\u7684\u6027\u80fd\uff0c\u8fd9\u4e9b\u95ee\u9898\u662f\u7f8e\u56fd\u533b\u5b66\u751f\u6267\u7167\u8003\u8bd5\uff08USMLE\uff09\u4e34\u5e8a\u6848\u4f8b\u548c\u5927\u89c4\u6a21\u591a\u4efb\u52a1\u8bed\u8a00\u7406\u89e3\uff08MMLU\uff09\u6570\u636e\u96c6\u4e2d\u7684\u77e5\u8bc6\u6d4b\u8bd5\u6240\u6db5\u76d6\u7684\u3002\u7279\u522b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u96f6\u6837\u672ci-MedRAG\u5728GPT-3.5\u4e0a\u53d6\u5f97\u4e8669.68%\u7684\u51c6\u786e\u6027\uff0c\u8d85\u8d8a\u4e86\u6240\u6709\u73b0\u6709\u7684\u63d0\u793a\u5de5\u7a0b\u548c\u5fae\u8c03\u65b9\u6cd5\u5728MedQA\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86i-MedRAG\u5728\u4e0d\u540c\u8fed\u4ee3\u6b21\u6570\u548c\u6bcf\u8fed\u4ee3\u67e5\u8be2\u6570\u91cf\u4e0b\u7684\u6269\u5c55\u7279\u6027\u3002 \u6211\u4eec\u7684\u6848\u4f8b\u7814\u7a76\u663e\u793a\uff0ci-MedRAG\u80fd\u591f\u7075\u6d3b\u5730\u63d0\u51fa\u540e\u7eed\u95ee\u9898\u5f62\u6210\u63a8\u7406\u94fe\uff0c\u6df1\u5165\u5206\u6790\u533b\u7597\u95ee\u9898\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u540e\u7eed\u95ee\u9898\u878d\u5165\u533b\u5b66RAG\u7684\u7814\u7a76\u3002|\n", "2408.00724": "|**2024-08-01**|**An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models**|Yangzhen Wu 
et.al.|[2408.00724](http://arxiv.org/abs/2408.00724)|null|\u5728\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u4f18\u8bad\u7ec3\u914d\u7f6e\u7814\u7a76\u4e2d\uff0c\u7279\u522b\u662f\u5728\u6a21\u578b\u89c4\u6a21\u548c\u8ba1\u7b97\u9884\u7b97\u65b9\u9762\u7684\u914d\u7f6e\uff0c\u5df2\u7ecf\u8fdb\u884c\u4e86\u5927\u91cf\u7684\u63a2\u8ba8\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u63a8\u7406\u9636\u6bb5\u5982\u4f55\u6700\u4f18\u5316\u914d\u7f6eLLM\u4ee5\u5e73\u8861\u989d\u5916\u7684\u63a8\u7406\u8ba1\u7b97\u65f6\u95f4\u548c\u6027\u80fd\u63d0\u5347\u7684\u7814\u7a76\u8fd8\u4e0d\u591f\u6df1\u5165\u3002\u672c\u6587\u65e8\u5728\u63a2\u7d22\u8ba1\u7b97\u4f18\u5316\u7684\u63a8\u7406\u65b9\u6cd5\uff0c\u5373\u8bbe\u8ba1\u80fd\u591f\u901a\u8fc7\u8c03\u6574\u63a8\u7406\u65f6\u95f4\u7684\u8ba1\u7b97\u91cf\u6765\u4f18\u5316\u6027\u80fd\u7684\u6a21\u578b\u548c\u63a8\u7406\u7b56\u7565\u3002 \u4e3a\u4e86\u7406\u89e3\u5e76\u8bbe\u8ba1\u8ba1\u7b97\u4f18\u5316\u7684\u63a8\u7406\u65b9\u6cd5\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5bf9\u591a\u79cd\u63a8\u7406\u7b56\u7565\uff0c\u5982\u8d2a\u5fc3\u641c\u7d22\u3001\u591a\u6570\u6295\u7968\u3001\u6700\u4f73N\u79cd\u7ec4\u5408\u3001\u52a0\u6743\u6295\u7968\u53ca\u5176\u53d8\u4f53\uff0c\u5728\u4e24\u79cd\u4e0d\u540c\u7684\u6811\u641c\u7d22\u7b97\u6cd5\u4e2d\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6d89\u53ca\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u8ba1\u7b97\u9884\u7b97\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u914d\u5408\u66f4\u5148\u8fdb\u7684\u89e3\u7801\u7b97\u6cd5\u901a\u5e38\u80fd\u5b9e\u73b0\u5e15\u7d2f\u6258\u6700\u4f18\u7684\u6743\u8861\uff0c\u5373\u5728\u989d\u5916\u7684\u8ba1\u7b97\u6210\u672c\u4e0e\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u627e\u5230\u6700\u4f73\u5e73\u8861\u70b9\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u5728\u9884\u7b97\u6709\u9650\u7684\u573a\u666f\u4e0b\uff0c\u5982\u7ec8\u7aef\u8bbe\u5907\u4e0a\u90e8\u7f72\u5c0f\u578b\u6a21\u578b\uff0c\u
53ef\u80fd\u5177\u6709\u663e\u8457\u7684\u4f18\u52bf\uff0c\u4ee5\u63d0\u9ad8\u95ee\u9898\u89e3\u51b3\u7684\u51c6\u786e\u7387\u3002 \u4f8b\u5982\uff0c\u6211\u4eec\u5c55\u793a\u4e86Llemma-7B\u6a21\u578b\u5728\u4f7f\u7528\u7ea6\u4e3aLlemma-34B\u6a21\u578b\u4e00\u534a\u7684\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u7684\u60c5\u51b5\u4e0b\uff0c\u4ecd\u80fd\u5b9e\u73b0\u4e0e\u540e\u8005\u76f8\u5f53\u7684MATH500\u4efb\u52a1\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u53ef\u80fd\u9002\u7528\u4e8e\u4efb\u4f55\u6709\u660e\u786e\u6210\u529f\u5ea6\u91cf\u6807\u51c6\u7684\u751f\u6210\u4efb\u52a1\u3002|\n", "2408.00722": "|**2024-08-01**|**Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities**|Sunder Ali Khowaja et.al.|[2408.00722](http://arxiv.org/abs/2408.00722)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u65b0\u5174\u5e94\u7528\u4e2d\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u800c\u5907\u53d7\u5173\u6ce8\uff0c\u8fd9\u4e9b\u5e94\u7528\u5305\u62ec\u901a\u4fe1\u7f51\u7edc\u3002\u9884\u8ba16G\u79fb\u52a8\u8fb9\u7f18\u8ba1\u7b97\u7f51\u7edc\u5c06\u80fd\u591f\u4f5c\u4e3a\u670d\u52a1\u652f\u6301LLMs\uff0c\u56e0\u4e3a\u5b83\u4eec\u63d0\u4f9b\u8d85\u53ef\u9760\u7684\u4f4e\u5ef6\u8fdf\u901a\u4fe1\u548c\u95ed\u73af\u5927\u89c4\u6a21\u8fde\u63a5\u3002\u7136\u800c\uff0cLLMs\u5728\u6570\u636e\u548c\u6a21\u578b\u9690\u79c1\u65b9\u9762\u5b58\u5728\u6f0f\u6d1e\uff0c\u8fd9\u5f71\u54cd\u4e86\u5728\u7528\u6237\u670d\u52a1\u4e2d\u90e8\u7f72LLMs\u7684\u4fe1\u4efb\u5ea6\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u57286G\u7f51\u7edc\u4e2d\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u65f6\u7684\u5b89\u5168\u6f0f\u6d1e\uff0c\u7279\u522b\u662f\u6210\u5458\u5f52\u5c5e\u653b\u51fb\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u653b\u51fb\u7f51\u7edc\u7684\u7279\u5f81\uff0c\u8be5\u7f51\u7edc\u53ef\u4ee5\u5728\u8bbf\u95ee\u4e0b\u6e38\u4efb\u52a1\u7ec6\u8c03\u6a21\u578b\u65f6\u6267\u884c\u6210\u5458\u5f52\u5c5e\u653b\u51fb\uff0c\u524
d\u63d0\u662f\u653b\u51fb\u8005\u53ef\u4ee5\u8bbf\u95ee\u8be5\u6a21\u578b\u3002\u6211\u4eec\u8868\u660e\uff0c\u5bf9\u4e8e\u4efb\u4f55\u4e0b\u6e38\u4efb\u52a1\uff0c\u6210\u5458\u5f52\u5c5e\u653b\u51fb\u90fd\u662f\u6709\u6548\u7684\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5728\u4f7f\u7528LLMs\u4f5c\u4e3a\u670d\u52a1\u65f6\u53d1\u751f\u4e2a\u4eba\u6570\u636e\u6cc4\u9732\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u4efb\u52a1\u4e0a\uff0c\u653b\u51fb\u6210\u529f\u7387\u53ef\u8fbe92%\u3002\u57fa\u4e8e\u5b9e\u9a8c\u5206\u6790\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u53ef\u80fd\u7684\u9632\u5fa1\u673a\u5236\uff0c\u5e76\u63d0\u51fa\u4e86\u53ef\u80fd\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u4f7f\u57286G\u7f51\u7edc\u80cc\u666f\u4e0bLLMs\u66f4\u52a0\u53ef\u9760\u3002|\n", "2408.00690": "|**2024-08-02**|**Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning**|Trapoom Ukarapol et.al.|[2408.00690](http://arxiv.org/abs/2408.00690)|**[link](https://github.com/trapoom555/language-model-sts-cft)**|\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56e0\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u7279\u70b9\u800c\u964d\u4f4e\u4e86\u5176\u53ef\u83b7\u53d6\u6027\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u5982MiniCPM\u63d0\u4f9b\u4e86\u66f4\u53ef\u6301\u7eed\u7684\u6269\u5c55\u6027\uff0c\u4f46\u5f80\u5f80\u5728\u6ca1\u6709\u4e13\u95e8\u4f18\u5316\u7684\u60c5\u51b5\u4e0b\u8868\u73b0\u4e0d\u4f73\u3002\u672c\u6587\u65e8\u5728\u901a\u8fc7\u63d0\u5347\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6587\u672c\u5d4c\u5165\u8d28\u91cf\u6765\u589e\u5f3a\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff1aMiniCPM\u3001Phi-2\u548cGemma\uff0c\u5728NLI\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5bf9\u6bd4\u5f0f\u5fae\u8c03\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\
u8fd9\u79cd\u65b9\u6cd5\u80fd\u663e\u8457\u63d0\u5347\u6240\u6709\u4e09\u79cd\u6a21\u578b\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6587\u672c\u5d4c\u5165\u8d28\u91cf\uff0c\u5176\u4e2dMiniCPM\u8868\u73b0\u51fa\u6700\u663e\u8457\u7684\u5e73\u574756.33%\u6027\u80fd\u63d0\u5347\u3002\u5bf9\u6bd4\u5f0f\u5fae\u8c03\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728https://github.com/trapoom555/Language-Model-STS-CFT\u3002|\n", "2408.00686": "|**2024-08-01**|**Can Developers Prompt? A Controlled Experiment for Code Documentation Generation**|Hans-Alexander Kruse et.al.|[2408.00686](http://arxiv.org/abs/2408.00686)|null|\u6211\u4eec\u5bf920\u540d\u4e13\u4e1a\u4eba\u58eb\u548c30\u540d\u8ba1\u7b97\u673a\u79d1\u5b66\u5b66\u751f\u8fdb\u884c\u4e86\u4e00\u4e2a\u53d7\u63a7\u5b9e\u9a8c\uff0c\u8981\u6c42\u4ed6\u4eec\u4f7f\u7528ChatGPT\u98ce\u683c\u7684Visual Studio Code\u6269\u5c55\u6765\u4e3a\u4e24\u4e2aPython\u51fd\u6570\u7f16\u5199\u4ee3\u7801\u6587\u6863\u3002\u5b9e\u9a8c\u7ec4\u81ea\u7531\u8f93\u5165\u81ea\u5b9a\u4e49\u63d0\u793a\uff0c\u800c\u5bf9\u7167\u7ec4\u5219\u6267\u884c\u9884\u8bbe\u7684\u5c11\u91cf\u63d0\u793a\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u4e13\u4e1a\u4eba\u58eb\u8fd8\u662f\u5b66\u751f\uff0c\u90fd\u5bf9\u6216\u65e0\u6cd5\u5e94\u7528\u63d0\u793a\u5de5\u7a0b\u6280\u5de7\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5c24\u5176\u662f\u5b66\u751f\uff0c\u4ed6\u4eec\u8ba4\u4e3a\u4ece\u81ea\u5b9a\u4e49\u63d0\u793a\u751f\u6210\u7684\u6587\u6863\u6bd4\u4ece\u51c6\u5907\u597d\u7684\u63d0\u793a\u751f\u6210\u7684\u6587\u6863\u5728\u53ef\u8bfb\u6027\u3001\u7b80\u6d01\u6027\u548c\u6709\u7528\u6027\u65b9\u9762\u663e\u8457\u8f83\u5dee\u3002\u4e00\u4e9b\u4e13\u4e1a\u4eba\u58eb\u4ec5\u901a\u8fc7\u5728\u81ea\u5b9a\u4e49\u63d0\u793a\u4e2d\u52a0\u5165\u201cDocstring\u201d\u5173\u952e\u8bcd\u5c31\u80fd\u751f\u6210\u66f4\u9ad8\u8d28\u91cf\u7684\u6587\u6863\u3002\u5b66\u751f\u5e0c\u671b\u83b7\u5f97\u66f4\u591a\u7684\u6307\u5bfc\u6765\u5236\u5b9a\u63d
0\u793a\uff0c\u800c\u4e13\u4e1a\u4eba\u58eb\u5219\u66f4\u6b23\u8d4f\u81ea\u5b9a\u4e49\u63d0\u793a\u7684\u7075\u6d3b\u6027\u3002\u53c2\u4e0e\u8005\u666e\u904d\u8ba4\u4e3a\u8f93\u51fa\u5e76\u975e\u5b8c\u7f8e\uff0c\u800c\u662f\u5c06\u5176\u89c6\u4e3a\u9010\u6b65\u5b8c\u5584\u6587\u6863\u7684\u5de5\u5177\u3002\u9700\u8981\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u6765\u7406\u89e3\u5f00\u53d1\u4eba\u5458\u5177\u6709\u7684\u63d0\u793a\u6280\u5de7\u548c\u504f\u597d\uff0c\u4ee5\u53ca\u4ed6\u4eec\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u652f\u63f4\u3002|\n", "2408.00665": "|**2024-08-01**|**AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models**|Daqin Luo et.al.|[2408.00665](http://arxiv.org/abs/2408.00665)|**[link](https://github.com/tim120526/AutoM3L)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591a\u6a21\u6001\u673a\u5668\u5b66\u4e60\u81ea\u52a8\u5316\u6846\u67b6\u2014\u2014AutoM3L\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u63a7\u5236\u5668\uff0c\u81ea\u52a8\u6784\u5efa\u591a\u6a21\u6001\u8bad\u7ec3\u7ba1\u9053\u3002AutoM3L\u80fd\u591f\u7406\u89e3\u6570\u636e\u6a21\u6001\u5e76\u6839\u636e\u7528\u6237\u9700\u6c42\u9009\u62e9\u5408\u9002\u7684\u6a21\u578b\uff0c\u63d0\u4f9b\u81ea\u52a8\u5316\u548c\u4e92\u52a8\u6027\u3002\u901a\u8fc7\u6d88\u9664\u624b\u52a8\u7279\u5f81\u5de5\u7a0b\u548c\u8d85\u53c2\u6570\u4f18\u5316\u7684\u9700\u6c42\uff0c\u6211\u4eec\u7684\u6846\u67b6\u7b80\u5316\u4e86\u7528\u6237\u53c2\u4e0e\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u6307\u4ee4\u63d0\u4f9b\u4e86\u5b9a\u5236\u5316\u9009\u9879\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u4ee5\u5f80\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u7684\u5c40\u9650\u6027\u3002 
\u6211\u4eec\u5bf9AutoM3L\u5728\u516d\u4e2a\u4e0d\u540c\u7c7b\u578b\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6db5\u76d6\u4e86\u5206\u7c7b\u3001\u56de\u5f52\u548c\u68c0\u7d22\u4efb\u52a1\uff0c\u4ee5\u53ca\u4e00\u7cfb\u5217\u5e7f\u6cdb\u7684\u5355\u6a21\u6001\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoM3L\u5728\u6027\u80fd\u4e0a\u4e0e\u4f20\u7edf\u7684\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u76f8\u6bd4\u5177\u6709\u7ade\u4e89\u529b\u6216\u8d85\u8d8a\u6027\u3002\u6b64\u5916\uff0c\u7528\u6237\u7814\u7a76\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86AutoM3L\u5728\u7528\u6237\u53cb\u597d\u6027\u548c\u6613\u7528\u6027\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u76f8\u8f83\u4e8e\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u3002|\n", "2408.00657": "|**2024-08-01**|**Disentangling Dense Embeddings with Sparse Autoencoders**|Charles O'Neill et.al.|[2408.00657](http://arxiv.org/abs/2408.00657)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5e94\u7528\u7a00\u758f\u81ea\u52a8\u7f16\u7801\u5668\uff08SAEs\uff09\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u5bc6\u96c6\u6587\u672c\u5d4c\u5165\u7684\u9996\u6b21\u5c1d\u8bd5\uff0c\u5c55\u793a\u5176\u5728\u89e3\u7f20\u8bed\u4e49\u6982\u5ff5\u65b9\u9762\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5728\u8d85\u8fc742\u4e07\u7bc7\u8ba1\u7b97\u673a\u79d1\u5b66\u548c\u5929\u6587\u5b66\u9886\u57df\u79d1\u5b66\u8bba\u6587\u6458\u8981\u7684\u5d4c\u5165\u4e0a\u8bad\u7ec3SAEs\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6240\u5f97\u5230\u7684\u7a00\u758f\u8868\u793a\u4fdd\u6301\u4e86\u8bed\u4e49\u4e00\u81f4\u6027\u7684\u540c\u65f6\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002\u6211\u4eec\u5206\u6790\u8fd9\u4e9b\u5b66\u4e60\u7279\u5f81\uff0c\u63a2\u7d22\u4e0d\u540c\u6a21\u578b\u5bb9\u91cf\u4e0b\u5b83\u4eec\u7684\u884c\u4e3a\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8bc6\u522b\u201c\u7279\u5f
81\u5bb6\u65cf\u201d\uff0c\u8fd9\u4e9b\u7279\u5f81\u4ee3\u8868\u4e86\u4e0d\u540c\u62bd\u8c61\u7ea7\u522b\u7684\u76f8\u5173\u6982\u5ff5\u3002\u4e3a\u4e86\u5c55\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u5b9e\u9645\u5e94\u7528\u4ef7\u503c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u8fd9\u4e9b\u53ef\u89e3\u91ca\u7279\u5f81\u7cbe\u786e\u63a7\u5236\u8bed\u4e49\u641c\u7d22\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u67e5\u8be2\u8bed\u4e49\u7684\u7cbe\u7ec6\u63a7\u5236\u3002\u8fd9\u9879\u5de5\u4f5c\u586b\u8865\u4e86\u5bc6\u96c6\u5d4c\u5165\u7684\u8bed\u4e49\u4e30\u5bcc\u6027\u548c\u7a00\u758f\u8868\u793a\u7684\u53ef\u89e3\u91ca\u6027\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8bad\u7ec3\u540e\u7684\u5d4c\u5165\u3001\u7a00\u758f\u81ea\u52a8\u7f16\u7801\u5668\u4ee5\u53ca\u53ef\u89e3\u91ca\u7279\u5f81\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7528\u4e8e\u63a2\u7d22\u5b83\u4eec\u7684\u7f51\u9875\u5e94\u7528\u7a0b\u5e8f\u3002|\n", "2408.01423": "|**2024-08-02**|**Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting**|Xiangyu Zhao 
et.al.|[2408.01423](http://arxiv.org/abs/2408.01423)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u5728\u6267\u884c\u5404\u79cd\u4efb\u52a1\u65f6\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u6027\u80fd\u53d7\u5230\u7279\u5b9a\u63d0\u793a\u8bbe\u8ba1\u7b56\u7565\u7684\u5f71\u54cd\u3002\u4e3b\u8981\u6709\u4e24\u79cd\u63d0\u793a\u8bbe\u8ba1\u65b9\u6cd5\uff1a\u4e00\u79cd\u662f\u901a\u8fc7\u624b\u52a8\u4e3a\u7279\u5b9a\u6570\u636e\u96c6\u521b\u5efa\u4e13\u95e8\u7684\u63d0\u793a\uff0c\u88ab\u79f0\u4e3a\u4e13\u5bb6\u8bbe\u8ba1\u63d0\u793a\uff08EDP\uff09\uff0c\u4e00\u65e6\u521b\u5efa\uff0c\u5b83\u4eec\u5c31\u65e0\u6cd5\u66f4\u6539\uff0c\u5176\u6709\u6548\u6027\u53d7\u9650\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u8005\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u5f53\u5e94\u7528\u4e8eLLM\u65f6\uff0c\u8fd9\u79cd\u56fa\u5b9a\u7684\u65b9\u6cd5\u5bfc\u81f4\u5bf9\u7b80\u5355\u95ee\u9898\u548c\u590d\u6742\u95ee\u9898\u91c7\u7528\u7edf\u4e00\u7684\u89e3\u51b3\u7b56\u7565\uff0c\u5bfc\u81f4\u5bf9\u4e8e\u7b80\u5355\u95ee\u9898\u8fc7\u5ea6\u4f7f\u7528\u4ee4\u724c\u3002\u53e6\u4e00\u79cd\u65b9\u6cd5\u662f\u8ba9LLM\u81ea\u52a8\u751f\u6210\u63d0\u793a\uff0c\u79f0\u4e3aLLM\u884d\u751f\u63d0\u793a\uff08LDP\uff09\uff0c\u80fd\u591f\u9488\u5bf9\u5177\u4f53\u95ee\u9898\u63d0\u4f9b\u5b9a\u5236\u89e3\u51b3\u65b9\u6848\uff0c\u4ece\u800c\u51cf\u8f7b\u4e86EDP\u7684\u5c40\u9650\u6027\u3002\u7136\u800c\uff0cLDP\u5728\u5904\u7406\u590d\u6742\u95ee\u9898\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6027\u80fd\u4e0b\u964d\u7684\u95ee\u9898\uff0c\u8fd9\u662f\u56e0\u4e3a\u5728\u89e3\u51b3\u95ee\u9898\u89c4\u5212\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u7d2f\u79ef\u9519\u8bef\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u63d0\u793a\u9012\u5f52\u641c\u7d22\uff08PRS\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u95ee\u9898\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u540c\u65f6\u51cf\u5c11\u4ee4\u724c\u7684\u4f7f\u7528\u3002\u8fd9\u4e2a\u6846\u67b6\u5305\u542b\u4e86\u5bf9\u95ee\u9898\u590d\u6742\u6027\u7684\u8bc4\u4f30\u4ee5\u53ca\u53ef\u8c03\u6574\u7684\u7ed3\u6784\uff0c\u4ee5\u964d\u4f4e\u51fa\u9519\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u4e0d\u540c\u53c2\u6570\u6570\u91cf\u7684LLM\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5185\u7684\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86PRS\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u4e0e\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u65b9\u6cd5\u76f8\u6bd4\uff0cPRS\u65b9\u6cd5\u5728\u4f7f\u7528Llama3-7B\u6a21\u578b\u65f6\uff0cBBH\u6570\u636e\u96c6\u4e0a\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e868%\uff0c\u5b9e\u73b0\u4e8622%\u7684\u6539\u8fdb\u3002|\n", "2408.01420": "|**2024-08-02**|**Mission Impossible: A Statistical Perspective on Jailbreaking LLMs**|Jingtong Su 
et.al.|[2408.01420](http://arxiv.org/abs/2408.01420)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6709\u9650\u7684\u8d28\u91cf\u63a7\u5236\u4e0b\u8bad\u7ec3\u4e8e\u6d77\u91cf\u6587\u672c\u6570\u636e\u4e2d\u3002\u8fd9\u5bfc\u81f4LLM\u53ef\u80fd\u51fa\u73b0\u610f\u5916\u751a\u81f3\u6709\u5bb3\u7684\u884c\u4e3a\uff0c\u5982\u6cc4\u9732\u4fe1\u606f\u3001\u5047\u65b0\u95fb\u6216\u4ec7\u6068\u8a00\u8bba\u3002\u5e94\u5bf9\u7b56\u7565\uff0c\u901a\u5e38\u79f0\u4e3a\u504f\u597d\u5bf9\u9f50\uff0c\u5305\u62ec\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6587\u672c\u793a\u4f8b\u7cbe\u7ec6\u8c03\u6574\u9884\u8bad\u7ec3\u7684LLM\uff0c\u4ee5\u4f53\u73b0\u671f\u671b\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u7136\u800c\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u5373\u4f7f\u8fdb\u884c\u4e86\u504f\u597d\u5bf9\u9f50\uff0cLLM\u4e5f\u4ecd\u53ef\u80fd\u8bf1\u9a97\u81f3\u6709\u5bb3\u884c\u4e3a\u3002\u8fd9\u79cd\u88ab\u79f0\u4e3aLLM\u201c\u8d8a\u72f1\u201d\u7684\u73b0\u8c61\u901a\u5e38\u901a\u8fc7\u4fee\u6539\u8f93\u5165\u63d0\u793a\u6765\u5b9e\u73b0\uff0c\u4ee5\u8bef\u5bfcLLM\u3002\u672c\u6587\u4ece\u7edf\u8ba1\u5b66\u7684\u89d2\u5ea6\u63d0\u4f9b\u5bf9\u504f\u597d\u5bf9\u9f50\u548c\u8d8a\u72f1\u73b0\u8c61\u7684\u7406\u8bba\u6d1e\u5bdf\u3002 
\u5728\u6211\u4eec\u7684\u6846\u67b6\u4e0b\uff0c\u9996\u5148\u8bc1\u660e\u4e86\u5982\u679c\u8bad\u7ec3\u8bed\u6599\u5e93\u4e2d\u5b58\u5728\u6709\u5bb3\u884c\u4e3a\uff0c\u9884\u8bad\u7ec3\u7684LLM\u4f1a\u6a21\u4eff\u8fd9\u79cd\u884c\u4e3a\u3002\u540c\u6837\u57fa\u4e8e\u8fd9\u4e2a\u6846\u67b6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u7edf\u8ba1\u610f\u4e49\u4e0a\u7684\u5bf9\u9f50\u6982\u5ff5\uff0c\u5e76\u7ed9\u51fa\u4e86\u8d8a\u72f1\u6982\u7387\u7684\u4e0b\u754c\uff0c\u8868\u660e\u5728\u5408\u7406\u5047\u8bbe\u4e0b\uff0c\u8fd9\u79cd\u73b0\u8c61\u662f\u65e0\u6cd5\u907f\u514d\u7684\u3002\u57fa\u4e8e\u6211\u4eec\u7684\u89c1\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5bf9\u5f53\u524d\u666e\u904d\u91c7\u7528\u7684\u5bf9\u9f50\u7b56\u7565\u2014\u2014\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u6539\u8fdb\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aE-RLHF\u7684\u7b80\u5355\u4fee\u6539\u7248RLHF\u76ee\u6807\uff0c\u65e8\u5728\u63d0\u9ad8\u5b89\u5168\u54cd\u5e94\u7684\u53ef\u80fd\u6027\u3002E-RLHF\u4e0d\u4f1a\u589e\u52a0\u989d\u5916\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u4e14\u4e0e\u5176\u5b83\u65b9\u6cd5\u517c\u5bb9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u4e0d\u727a\u7272MT-Bench\u9879\u76ee\u8861\u91cf\u7684\u6a21\u578b\u6027\u80fd\u7684\u60c5\u51b5\u4e0b\uff0cE-RLHF\u5728AdvBench\u548cHarmBench\u9879\u76ee\u63d0\u51fa\u7684\u6240\u6709\u5bf9\u9f50\u95ee\u9898\u4e0a\u5747\u4f18\u4e8eRLHF\u3002|\n", "2408.01419": "|**2024-08-02**|**DebateQA: Evaluating Question Answering on Debatable Knowledge**|Rongwu Xu 
et.al.|[2408.01419](http://arxiv.org/abs/2408.01419)|**[link](https://github.com/pillowsofwind/debateqa)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u63a2\u8ba8\u5173\u4e8eLLM\u804a\u5929\u673a\u5668\u4eba\u4e0a\u56fa\u6709\u4e89\u8bae\u6027\u95ee\u9898\u7684\u7b54\u6848\uff0c\u8fd9\u9700\u8981\u4e00\u79cd\u53ef\u9760\u7684\u65b9\u5f0f\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u4f20\u7edf\u95ee\u7b54\u57fa\u51c6\u5047\u8bbe\u56fa\u5b9a\u7684\u7b54\u6848\u5bf9\u6b64\u76ee\u7684\u800c\u8a00\u662f\u4e0d\u8db3\u7684\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86DebateQA\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,941\u4e2a\u4e89\u8bae\u6027\u95ee\u9898\u7684\u6570\u636e\u96c6\uff0c\u6bcf\u4e2a\u95ee\u9898\u90fd\u9644\u5e26\u4e86\u591a\u4e2a\u7531\u4eba\u7c7b\u6ce8\u91ca\u7684\u7247\u6bb5\u7b54\u6848\uff0c\u8fd9\u4e9b\u7247\u6bb5\u7b54\u6848\u6355\u6349\u4e86\u5404\u79cd\u89c6\u89d2\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff1a\u89c2\u70b9\u591a\u6837\u6027\uff0c\u7528\u4e8e\u8bc4\u4f30\u89c6\u89d2\u7684\u5168\u9762\u6027\uff1b\u4ee5\u53ca\u4e89\u8bae\u610f\u8bc6\uff0c\u7528\u4e8e\u8bc4\u4f30LLM\u662f\u5426\u8ba4\u8bc6\u5230\u95ee\u9898\u7684\u4e89\u8bae\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\u4e0e\u4eba\u7c7b\u504f\u597d\u4e00\u81f4\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u57fa\u7840\u6a21\u578b\u4e4b\u95f4\u5177\u6709\u7a33\u5b9a\u6027\u3002\u901a\u8fc7\u4f7f\u7528DebateQA\u548c\u8fd9\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff0c\u6211\u4eec\u8bc4\u4f30\u4e8612\u79cd\u6d41\u884c\u7684LLM\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u867d\u7136LLM\u901a\u5e38\u64c5\u957f\u8bc6\u522b\u4e89\u8bae\u6027\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u63d0\u4f9b\u5168\u9762\u7b54\u6848\u3001\u6db5\u76d6\u591a\
u6837\u89c6\u89d2\u7684\u80fd\u529b\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002|\n", "2408.01417": "|**2024-08-02**|**Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs**|Yilun Hua et.al.|[2408.01417](http://arxiv.org/abs/2408.01417)|null|\u4eba\u7c7b\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u4f1a\u81ea\u53d1\u5730\u4f7f\u7528\u8d8a\u6765\u8d8a\u9ad8\u6548\u7684\u8bed\u8a00\uff0c\u901a\u8fc7\u9002\u5e94\u5e76\u5f62\u6210\u81ea\u5b9a\u4e49\u7684\u7ea6\u5b9a\u3002\u8fd9\u4e00\u73b0\u8c61\u5df2\u7ecf\u901a\u8fc7\u53c2\u8003\u6e38\u620f\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\uff0c\u5c55\u793a\u4e86\u4eba\u7c7b\u8bed\u8a00\u8d85\u8d8a\u4f20\u8fbe\u610f\u56fe\u7684\u7279\u6027\u3002\u76ee\u524d\uff0c\u6211\u4eec\u5c1a\u672a\u63a2\u7d22\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u662f\u5426\u5728\u4ea4\u4e92\u4e2d\u540c\u6837\u63d0\u9ad8\u4e86\u6c9f\u901a\u6548\u7387\uff0c\u5e76\u4e14\u5b83\u4eec\u53ef\u80fd\u91c7\u7528\u4f55\u79cd\u673a\u5236\u5b9e\u73b0\u8fd9\u4e00\u76ee\u7684\u3002 
\u6211\u4eec\u5f15\u5165\u4e86ICCA\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u7684\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u5728MLLM\u4e2d\u8bc4\u4f30\u6b64\u7c7b\u5bf9\u8bdd\u9002\u5e94\u4f5c\u4e3a\u4e0a\u4e0b\u6587\u884c\u4e3a\u7684\u80fd\u529b\u3002\u6211\u4eec\u5bf9\u51e0\u79cd\u6700\u5148\u8fdb\u7684MLLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u89c2\u5bdf\u5230\u867d\u7136\u5b83\u4eec\u53ef\u80fd\u7406\u89e3\u5176\u5bf9\u8bdd\u4f19\u4f34\u7684\u8bed\u8a00\u8d8a\u6765\u8d8a\u9ad8\u6548\uff0c\u4f46\u5b83\u4eec\u672c\u8eab\u5e76\u4e0d\u81ea\u53d1\u5730\u5728\u65f6\u95f4\u4e0a\u4f7f\u81ea\u5df1\u7684\u8bed\u8a00\u53d8\u5f97\u66f4\u9ad8\u6548\u3002\u8fd9\u79cd\u80fd\u529b\u4ec5\u5728\u67d0\u4e9b\u6a21\u578b\uff08\u5982GPT-4\uff09\u4e2d\u53ef\u4ee5\u901a\u8fc7\u5f3a\u70c8\u7684\u63d0\u793a\u6765\u6fc0\u53d1\u3002\u8fd9\u8868\u660e\uff0c\u5373\u4f7f\u8fd9\u662f\u4eba\u7c7b\u8bed\u8a00\u7684\u5e38\u89c1\u7279\u5f81\uff0c\u5f53\u524d\u7684\u8bad\u7ec3\u5236\u5ea6\u5e76\u4e0d\u80fd\u4ea7\u751f\u8fd9\u4e00\u4e92\u52a8\u5c5e\u6027\u3002 ICCA\u6846\u67b6\u5df2\u5f00\u6e90\u53d1\u5e03\u4e8ehttps://github.com/lil-lab/ICCA\u3002|\n", "2408.01380": "|**2024-08-02**|**Coalitions of Large Language Models Increase the Robustness of AI Agents**|Prattyush Mangal 
et.al.|[2408.01380](http://arxiv.org/abs/2408.01380)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4ece\u6839\u672c\u4e0a\u6539\u53d8\u4e86\u6211\u4eec\u4e0e\u6570\u5b57\u7cfb\u7edf\u4e92\u52a8\u7684\u65b9\u5f0f\uff0c\u5e76\u63a8\u52a8\u4e86\u5bf9\u501f\u52a9\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684AI\u4ee3\u7406\u4ee5\u8f85\u52a9\u65e5\u5e38\u6d41\u7a0b\u7684\u7814\u7a76\u3002\u5c3d\u7ba1LLM\u5177\u6709\u5f3a\u5927\u7684\u80fd\u529b\u5e76\u80fd\u591f\u8868\u73b0\u51fa\u4e00\u4e9b\u6d8c\u73b0\u7279\u6027\uff0c\u4f46\u5b83\u4eec\u5e76\u975e\u903b\u8f91\u63a8\u7406\u8005\uff0c\u5f80\u5f80\u5728AI\u4ee3\u7406\u6267\u884c\u5de5\u4f5c\u6d41\u7a0b\u65f6\u6240\u6d89\u53ca\u7684\u6240\u6709\u5b50\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u73b0\u6709\u7814\u7a76\u901a\u8fc7\u5927\u89c4\u6a21\u7684\u4e00\u822c\u6027\u9884\u8bad\u7ec3\u6216\u9488\u5bf9\u5de5\u5177\u4f7f\u7528\u8fdb\u884c\u4e13\u95e8\u7684\u5fae\u8c03\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u800c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u4e2a\u7531\u4e13\u6ce8\u4e8e\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u9884\u8bad\u7ec3\u6a21\u578b\u7ec4\u6210\u7684\u8054\u76df\u662f\u5426\u80fd\u4e0e\u5355\u4e00\u6a21\u578b\u4ee3\u7406\u7684\u8868\u73b0\u76f8\u5339\u654c\u3002\u8054\u76df\u6a21\u578b\u7684\u65b9\u6cd5\u5c55\u793a\u4e86\u5176\u5728\u6784\u5efa\u9c81\u68d2\u6027\u548c\u964d\u4f4e\u8fd9\u4e9bAI\u4ee3\u7406\u8fd0\u884c\u6210\u672c\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u901a\u8fc7\u5229\u7528\u7279\u5b9a\u6a21\u578b\u5c55\u73b0\u7684\u7279\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u901a\u8fc7\u8003\u8651\u4e00\u7ec4\u9884\u8bad\u7ec3\u6a21\u578b\uff0c\u53ef\u4ee5\u51cf\u8f7b\u5fae\u8c03\u7684\u9700\u6c42\uff0c\u5e76\u76f8\u4fe1\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u5e94\u7528\u4e8e\u5176\u4ed6\u5229\u7528LLM\u7684\u975e\u4ee3\u7406\u7cfb\u7edf\u3002|\n", "2408.01363": "|**2024-08-02**|**Toward Automatic Relevance Judgment using 
Vision--Language Models for Image--Text Retrieval Evaluation**|Jheng-Hong Yang et.al.|[2408.01363](http://arxiv.org/abs/2408.01363)|null|\u672c\u6587\u5bf9\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u8fdb\u884c\u76f8\u5173\u6027\u8bc4\u4f30\u65b9\u9762\u7684\u6f5c\u529b\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u9488\u5bf9\u591a\u5a92\u4f53\u5185\u5bb9\u521b\u4f5c\u7684\u5927\u578b\u96f6\u6837\u672c\u68c0\u7d22\u4efb\u52a1\uff0c\u8bc4\u4f30\u4e86CLIP\u3001LLaVA\u548cGPT-4V\u7b49VLM\u7684\u6027\u80fd\u3002\u521d\u6b65\u5b9e\u9a8c\u7ed3\u679c\u5982\u4e0b\uff1a 1. **\u6027\u80fd\u6bd4\u8f83**\uff1a\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u4e0a\uff0cLLaVA\u548cGPT-4V\uff08\u5305\u62ec\u5f00\u6e90\u548c\u4e13\u6709\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff09\u53d6\u5f97\u4e86\u663e\u8457\u7684Kendall\u2019s \u03c4\u22480.4\u7684\u6210\u7ee9\uff0c\u8d85\u8fc7\u4e86CLIPScore\u6307\u6807\u3002 2. **\u504f\u597d\u4e0e\u504f\u89c1**\uff1a\u5c3d\u7ba1CLIPScore\u8868\u73b0\u7a81\u51fa\uff0c\u4f46LLMs\u5728\u504f\u89c1\u65b9\u9762\u76f8\u5bf9\u8f83\u5c11\u503e\u5411\u4e8e\u57fa\u4e8eCLIP\u7684\u68c0\u7d22\u7cfb\u7edf\u3002 3. 
**\u4e00\u81f4\u6027\u5206\u6790**\uff1aGPT-4V\u7684\u8bc4\u5206\u5206\u5e03\u4e0e\u4eba\u7c7b\u5224\u65ad\u66f4\u4e3a\u4e00\u81f4\uff0c\u5176Cohen\u2019s \u03ba\u503c\u7ea6\u4e3a0.08\uff0c\u8fdc\u9ad8\u4e8eCLIPScore\u7684\u7ea6-0.096\u3002\u8fd9\u4e00\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684VLM\u5728\u589e\u5f3a\u76f8\u5173\u6027\u8bc4\u4f30\u65b9\u9762\u5177\u6709\u6f5c\u529b\u3002 \u672c\u7814\u7a76\u63ed\u793a\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u76f8\u5173\u6027\u8bc4\u4f30\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u4ef7\u503c\uff0c\u7279\u522b\u662f\u5f53\u5b83\u4eec\u88ab\u7528\u4e8e\u96f6\u6837\u672c\u68c0\u7d22\u4efb\u52a1\u65f6\u3002\u901a\u8fc7\u6bd4\u8f83\u4e0d\u540c\u6a21\u578b\u7684\u6027\u80fd\uff0c\u7814\u7a76\u5f3a\u8c03\u4e86LLMs\u5728\u591a\u5a92\u4f53\u5185\u5bb9\u521b\u5efa\u9886\u57df\u5185\u7684\u6f5c\u5728\u4f18\u52bf\uff0c\u5e76\u6307\u51fa\u4e86\u5b83\u4eec\u5728\u63d0\u5347\u5185\u5bb9\u76f8\u5173\u6027\u5224\u65ad\u65b9\u9762\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.01355": "|**2024-08-02**|**Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs**|Peng Ding 
et.al.|[2408.01355](http://arxiv.org/abs/2408.01355)|**[link](https://github.com/njunlp/hallu-pi)**|\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4e0e\u751f\u6210\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5076\u5c14\u4f1a\u4ea7\u751f\u4e0e\u7ed9\u5b9a\u56fe\u50cf\u4e0d\u4e00\u81f4\u7684\u5185\u5bb9\uff0c\u5373\u6240\u8c13\u7684\u201c\u5e7b\u89c9\u201d\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u4f7f\u7528\u6807\u51c6\u3001\u672a\u6270\u52a8\u57fa\u51c6\u8bc4\u4f30\u5e7b\u89c9\u4e0a\uff0c\u8fd9\u5ffd\u89c6\u4e86\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u666e\u904d\u5b58\u5728\u7684\u6270\u52a8\u8f93\u5165\uff08\u5982\u56fe\u50cf\u88c1\u526a\u6216\u6a21\u7cca\uff09\uff0c\u8fd9\u662f\u5bf9\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5e7b\u89c9\u5168\u9762\u8bc4\u4f30\u7684\u5173\u952e\u3002 \u672c\u7bc7\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86Hallu-PI\uff0c\u9996\u4e2a\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6270\u52a8\u8f93\u5165\u4e0b\u7684\u5e7b\u89c9\u7684\u57fa\u51c6\u3002Hallu-PI\u5305\u542b\u4e867\u79cd\u6270\u52a8\u60c5\u666f\uff0c\u6d89\u53ca1,260\u5f20\u6765\u81ea11\u79cd\u7269\u4f53\u7c7b\u578b\u7684\u6270\u52a8\u56fe\u50cf\u3002\u6bcf\u5f20\u56fe\u50cf\u90fd\u9644\u6709\u8be6\u7ec6\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u7cbe\u7ec6\u7c92\u5ea6\u7684\u5e7b\u89c9\u7c7b\u578b\uff0c\u5982\u5b58\u5728\u6027\u3001\u5c5e\u6027\u548c\u5173\u7cfb\u7b49\u3002\u8fd9\u4e9b\u6ce8\u91ca\u914d\u5907\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u95ee\u7b54\u96c6\uff0c\u4f7fHallu-PI\u9002\u7528\u4e8e\u8fa8\u522b\u6027\u548c\u751f\u6210\u6027\u4efb\u52a1\u3002 \u5728\u5bf9\u4e3b\u6d41\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\u548cGemini-Pro 
Vision\uff09\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u6a21\u578b\u5728Hallu-PI\u4e0a\u7684\u8868\u73b0\u663e\u793a\u51fa\u663e\u8457\u7684\u5e7b\u89c9\uff0c\u800c\u5728\u672a\u6270\u52a8\u573a\u666f\u4e2d\u672a\u89c2\u5bdf\u5230\u6b64\u7c7b\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5904\u7406\u4e0d\u540c\u7c7b\u578b\u5e7b\u89c9\u65f6\u5b58\u5728\u7684\u4e25\u91cd\u504f\u5dee\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e24\u4e2a\u4e13\u95e8\u9488\u5bf9\u6270\u52a8\u60c5\u666f\u7684\u57fa\u7ebf\uff0c\u5206\u522b\u4e3aPerturbed-Reminder\u548cPerturbed-ICL\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u5f15\u8d77\u7814\u7a76\u4eba\u5458\u5bf9\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u6270\u52a8\u8f93\u5165\u65f6\u5c40\u9650\u6027\u7684\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u8fdb\u4e00\u6b65\u7684\u8c03\u67e5\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728GitHub\uff08https://github.com/NJUNLP/Hallu-PI\uff09\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.01354": "|**2024-08-02**|**MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code**|Kaiwen Ning 
et.al.|[2408.01354](http://arxiv.org/abs/2408.01354)|**[link](https://github.com/KevinHeiwa/MCGTM)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u4f17\u591a\u8f6f\u4ef6\u670d\u52a1\u63d0\u4f9b\u5546\uff08SSP\uff09\u81f4\u529b\u4e8e\u5f00\u53d1\u9488\u5bf9\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u5b9a\u5236\u5316LLM\uff0c\u5982CodeLlama\u548cCopilot\u3002\u7136\u800c\uff0c\u8fd9\u4e9bLLM\u6709\u53ef\u80fd\u88ab\u653b\u51fb\u8005\u5229\u7528\u6765\u751f\u6210\u6076\u610f\u8f6f\u4ef6\uff0c\u5bf9\u8f6f\u4ef6\u751f\u6001\u7cfb\u7edf\u6784\u6210\u6f5c\u5728\u5a01\u80c1\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u9ad8\u7ea7\u7f51\u7edc\u9493\u9c7c\u6076\u610f\u8f6f\u4ef6\u7684\u521b\u5efa\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u9996\u5148\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u5305\u542b\u7ea6400\u5c0f\u65f6\u5de5\u4f5c\u91cf\u3001\u5171\u8ba1406\u4e2a\u6076\u610f\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u63d0\u793a\u6570\u636e\u96c6MCGTest\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MCGMark\uff0c\u8fd9\u662f\u9996\u4e2a\u80fd\u591f\u5b9e\u73b0\u7a33\u5065\u3001\u7ed3\u6784\u611f\u77e5\u4e14\u53ef\u7f16\u7801\u7684\u6c34\u5370\u65b9\u6cd5\uff0c\u7528\u4e8e\u8ffd\u8e2a\u7531LLM\u751f\u6210\u7684\u6076\u610f\u4ee3\u7801\u3002\u6211\u4eec\u901a\u8fc7\u63a7\u5236\u4ee4\u724c\u9009\u62e9\u548c\u57fa\u4e8e\u6982\u7387\u5f02\u5e38\u503c\u786e\u4fdd\u8f93\u51fa\u8d28\u91cf\u6765\u5d4c\u5165\u53ef\u7f16\u7801\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u8003\u8651\u6076\u610f\u4ee3\u7801\u7684\u7ed3\u6784\u7279\u5f81\u589e\u5f3a\u4e86\u6c34\u5370\u7684\u9c81\u68d2\u6027\uff0c\u907f\u514d\u5728\u6613\u4e8e\u4fee\u6539\u7684\u4f4d\u7f6e\uff08\u5982\u6ce8\u91ca\uff09\u5d4c\u5165\u6c34\u5370\u3002\u6211\u4eec\u4f7f\u7528DeepSeek-Coder\u9a8c\u8bc1\u4e86MCGMark\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\uf
f0c\u5176\u6700\u5927\u8f93\u51fa\u9650\u5236\u4e3a400\u4e2a\u4ee4\u724c\u65f6\uff0c\u5d4c\u5165\u6210\u529f\u7387\u8fbe\u5230\u4e8688.9%\u3002\u540c\u65f6\uff0c\u8be5\u65b9\u6cd5\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u5bf9\u8f93\u51fa\u4ee3\u7801\u7684\u8d28\u91cf\u5f71\u54cd\u6781\u5c0f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5e2e\u52a9SSP\u8ffd\u8e2a\u5e76\u8ffd\u7a76\u7531LLM\u751f\u6210\u7684\u6076\u610f\u4ee3\u7801\u7684\u6e90\u5934\u53ca\u8d23\u4efb\u3002|\n", "2408.01346": "|**2024-08-02**|**Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks**|Anders Giovanni M\u00f8ller et.al.|[2408.01346](http://arxiv.org/abs/2408.01346)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u662f\u4fc3\u8fdb\u793e\u4f1a\u8ba1\u7b97\u9886\u57df\u590d\u6742\u6587\u672c\u7406\u89e3\u4efb\u52a1\u7684\u6709\u529b\u5de5\u5177\u3002\u5b83\u4eec\u7684\u591a\u529f\u80fd\u6027\u867d\u7136\u6709\u76ca\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u5728\u8be5\u9886\u57df\u5efa\u7acb\u6807\u51c6\u5316\u6700\u4f73\u5b9e\u8df5\u7684\u969c\u788d\u3002\u4e3a\u4e86\u63d0\u4f9b\u4e0d\u540c\u7b56\u7565\u4ef7\u503c\u7684\u6e05\u6670\u5ea6\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u73b0\u4ee3\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u65b9\u6cd5\u572823\u4e2a\u793e\u4f1a\u77e5\u8bc6\u4efb\u52a1\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u6307\u51fa\u4e86\u4e09\u4e2a\u6700\u4f73\u5b9e\u8df5\uff1a\u9009\u62e9\u5177\u6709\u66f4\u5927\u8bcd\u6c47\u91cf\u548c\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u6a21\u578b\uff1b\u907f\u514d\u7b80\u5355\u7684\u96f6\u6b21\u5c1d\u8bd5\uff0c\u800c\u503e\u5411\u4e8e\u589e\u5f3a\u63d0\u793a\u7684\u4eba\u5de5\u667a\u80fd\u65b9\u6cd5\uff1b\u5728\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8003\u8651\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u4f7f\u7528\u66f4\u590d\u6742\u7684\u6307\u4ee4\u8c03\u6574\uff0c\u4ec5\u5f53\u8bad\u7ec3\u6570\u636e\u66f4\u4e3a\u4e30\u5
bcc\u65f6\u624d\u8fd9\u6837\u505a\u3002|\n", "2408.01334": "|**2024-08-02**|**A Backbone for Long-Horizon Robot Task Understanding**|Xiaoshuai Chen et.al.|[2408.01334](http://arxiv.org/abs/2408.01334)|null|\u4e3a\u4e86\u5e94\u5bf9\u957f\u65f6\u7a0b\u4efb\u52a1\u4e2d\u7aef\u5230\u7aef\u673a\u5668\u4eba\u5b66\u4e60\u7684\u4e0d\u53ef\u9884\u6d4b\u6027\u4e0e\u6cdb\u5316\u80fd\u529b\u5dee\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eTherblig\u7684\u9aa8\u67b6\u6846\u67b6\uff08TBBF\uff09\uff0c\u65e8\u5728\u589e\u5f3a\u673a\u5668\u4eba\u4efb\u52a1\u7406\u89e3\u4e0e\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u6846\u67b6\u5229\u7528Therblig\uff08\u57fa\u672c\u52a8\u4f5c\u5143\u7d20\uff09\u4f5c\u4e3a\u9aa8\u67b6\uff0c\u5c06\u9ad8\u7ea7\u673a\u5668\u4eba\u4efb\u52a1\u5206\u89e3\u4e3a\u57fa\u672c\u673a\u5668\u4eba\u914d\u7f6e\uff0c\u7136\u540e\u7ed3\u5408\u5f53\u524d\u7684\u57fa\u7840\u6a21\u578b\u6765\u63d0\u5347\u4efb\u52a1\u7406\u89e3\u3002 \u8be5\u65b9\u6cd5\u5305\u542b\u4e24\u4e2a\u9636\u6bb5\uff1a\u79bb\u7ebf\u8bad\u7ec3\u4e0e\u5728\u7ebf\u6d4b\u8bd5\u3002\u5728\u79bb\u7ebf\u8bad\u7ec3\u9636\u6bb5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Meta-RGate SynerFusion\uff08MGSF\uff09\u7f51\u7edc\uff0c\u7528\u4e8e\u8de8\u4efb\u52a1\u7cbe\u786e\u7684Therblig\u5206\u5272\u3002\u5728\u7ebf\u6d4b\u8bd5\u9636\u6bb5\uff0c\u901a\u8fc7\u6536\u96c6\u65b0\u4efb\u52a1\u7684\u4e00\u6b21\u6f14\u793a\uff0cMGSF\u7f51\u7edc\u63d0\u53d6\u9ad8\u9636\u77e5\u8bc6\uff0c\u5e76\u901a\u8fc7Action Registration\uff08ActionREG\uff09\u7f16\u7801\u5165\u56fe\u50cf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u91c7\u7528Large Language Model\uff08LLM\uff09-Alignment Policy for Visual 
Correction\uff08LAP-VC\uff09\u6765\u786e\u4fdd\u7cbe\u786e\u7684\u52a8\u4f5c\u6267\u884c\uff0c\u4ece\u800c\u5728\u65b0\u578b\u673a\u5668\u4eba\u573a\u666f\u4e2d\u5b9e\u73b0\u8f68\u8ff9\u8f6c\u79fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0cTherblig\u5206\u5272\u8fbe\u5230\u4e8694.37%\u7684\u53ec\u56de\u7387\uff0c\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u7684\u5728\u7ebf\u673a\u5668\u4eba\u6d4b\u8bd5\u4e2d\uff0c\u5bf9\u4e8e\u7b80\u5355\u548c\u590d\u6742\u573a\u666f\u7684\u6210\u529f\u7387\u5206\u522b\u8fbe\u5230\u4e8694.4%\u548c80%\u3002\u8865\u5145\u6750\u6599\u53ef\u5728\u4ee5\u4e0b\u7f51\u7ad9\u83b7\u53d6\uff1ahttps://sites.google.com/view/therbligsbasedbackbone/home|\n", "2408.02651": "|**2024-08-05**|**Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?**|Mohammad Bahrami Karkevandi et.al.|[2408.02651](http://arxiv.org/abs/2408.02651)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u9053\u5fb7\u6027\u4ecd\u7136\u5b58\u5728\u4e89\u8bae\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u8bad\u7ec3\u57fa\u4e8e\u4e92\u8054\u7f51\u6587\u672c\u8bed\u6599\u5e93\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u62c5\u5fe7\uff0c\u5df2\u7ecf\u5f00\u53d1\u4e86\u5bf9\u9f50\u6280\u672f\u6765\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u5171\u53ef\u7528\u6027\u548c\u5b89\u5168\u6027\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fd9\u4e9b\u6a21\u578b\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\u4f3c\u4e4e\u4ecd\u7136\u5b58\u5728\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u201c\u53cd\u5411\u5bf9\u9f50\u201dLLM\u7684\u6982\u5ff5\u2014\u2014\u5229\u7528\u5bf9\u6297\u89e6\u53d1\u5668\u9006\u8f6c\u5176\u5bf9\u9f50\u8fc7\u7a0b\u3002\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5982\u8f6f\u5d4c\u5165\u63d0\u793a\u3001\u624b\u52
a8\u6784\u5efa\u7684\u63d0\u793a\u548c\u57fa\u4e8e\u68af\u5ea6\u7684\u81ea\u52a8\u63d0\u793a\uff0c\u5728\u9ed1\u76d2\u6a21\u578b\u4e0a\u7531\u4e8e\u9700\u8981\u8bbf\u95ee\u6a21\u578b\u548c\u4ea7\u751f\u6709\u9650\u7684\u624b\u52a8\u6784\u5efa\u63d0\u793a\u7684\u9700\u6c42\u800c\u53d6\u5f97\u4e86\u6709\u9650\u7684\u6210\u529f\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5bb9\u6613\u88ab\u963b\u65ad\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u4f18\u5316\u5bf9\u6297\u89e6\u53d1\u5668\uff0c\u4ec5\u9700\u5bf9\u76ee\u6807\u6a21\u578b\u8fdb\u884c\u63a8\u7406API\u8bbf\u95ee\u4ee5\u53ca\u4e00\u4e2a\u5c0f\u578b\u4ee3\u7406\u6a21\u578b\u5373\u53ef\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528BERTScore\u4e3a\u57fa\u7840\u7684\u5956\u52b1\u51fd\u6570\uff0c\u589e\u5f3a\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u5728\u65b0\u9ed1\u76d2\u6a21\u578b\u4e0a\u7684\u53ef\u79fb\u690d\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5982\u4f55\u5728\u672a\u6d4b\u8bd5\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\u63d0\u9ad8\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u7684\u8868\u73b0\u3002|\n", "2408.02632": "|**2024-08-05**|**SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models**|Muxi Diao 
et.al.|[2408.02632](http://arxiv.org/abs/2408.02632)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u4e0e\u5f71\u54cd\u529b\u7684\u6301\u7eed\u589e\u5f3a\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u6027\u548c\u9884\u9632\u6709\u5bb3\u8f93\u51fa\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u5173\u5207\uff0c\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u662f\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u751f\u6210\u5bf9\u6297\u6027\u63d0\u793a\u8fdb\u884c\u7ea2\u961f\u6d4b\u8bd5\u3002\u7136\u800c\uff0cLLM\u4e2d\u6f0f\u6d1e\u7684\u4e0d\u65ad\u6f14\u53d8\u4f7f\u5f97\u5f53\u524d\u7684\u5bf9\u6297\u65b9\u6cd5\u5728\u5177\u4f53\u9488\u5bf9\u548c\u63a2\u7d22\u8fd9\u4e9b\u6a21\u578b\u5f31\u70b9\u65b9\u9762\u663e\u5f97\u529b\u4e0d\u4ece\u5fc3\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u81ea\u6211\u6f14\u5316\u5b89\u5168\u4f18\u5316\u201d\uff08SEAS\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u5229\u7528\u6a21\u578b\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u6765\u589e\u5f3a\u5b89\u5168\u6027\u3002SEAS\u8fd0\u4f5c\u4e8e\u4e09\u4e2a\u8fed\u4ee3\u9636\u6bb5\uff1a\u521d\u59cb\u5316\u3001\u653b\u51fb\u548c\u5bf9\u6297\u4f18\u5316\uff0c\u65e8\u5728\u540c\u65f6\u63d0\u5347\u7ea2\u961f\u548c\u76ee\u6807\u6a21\u578b\u7684\u7a33\u5065\u6027\u548c\u5b89\u5168\u6027\u3002 
\u8be5\u6846\u67b6\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u6d4b\u8bd5\u7684\u4f9d\u8d56\uff0c\u5e76\u663e\u8457\u589e\u5f3a\u4e86LLM\u7684\u5b89\u5168\u6027\u80fd\u529b\u3002\u6211\u4eec\u7684\u8d21\u732e\u5305\u62ec\u4e00\u4e2a\u65b0\u9896\u7684\u5bf9\u6297\u6027\u6846\u67b6\u3001\u4e00\u4e2a\u5168\u9762\u7684\u5b89\u5168\u6570\u636e\u96c6\u4ee5\u53ca\u7ecf\u8fc7\u4e09\u6b21\u8fed\u4ee3\u540e\uff0c\u76ee\u6807\u6a21\u578b\u7684\u5b89\u5168\u6c34\u5e73\u8fbe\u5230\u4e86\u4e0eGPT-4\u76f8\u5f53\u7684\u6c34\u5e73\uff0c\u800c\u7ea2\u961f\u6a21\u578b\u5728\u5bf9\u6297\u9ad8\u7ea7\u6a21\u578b\u65f6\u7684\u6210\u529f\u7387\uff08ASR\uff09\u6709\u4e86\u663e\u8457\u63d0\u9ad8\u3002|\n", "2408.02599": "|**2024-08-05**|**Progressively Selective Label Enhancement for Language Model Alignment**|Biao Liu et.al.|[2408.02599](http://arxiv.org/abs/2408.02599)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u8bed\u8a00\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u53ef\u80fd\u4f1a\u751f\u6210\u4e0e\u4eba\u7c7b\u9884\u671f\u4e0d\u7b26\u7684\u5185\u5bb9\uff0c\u4ece\u800c\u5f15\u53d1\u4f26\u7406\u548c\u6cd5\u5f8b\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u63a2\u7d22\u8fd9\u4e9b\u6a21\u578b\u7684\u5c40\u9650\u6027\u5e76\u5b9e\u65bd\u9650\u5236\u4ee5\u786e\u4fdd\u5b89\u5168\u6027\u548c\u5408\u89c4\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\uff0c\u5176\u4e2d\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u4e3b\u8981\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8eRLHF\u9636\u6bb5\u5728\u7a33\u5b9a\u6027\u548c\u53ef\u6269\u5c55\u6027\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u7814\u7a76\u4eba\u5458\u6b63\u5728\u63a2\u7d22\u5176\u4ed6\u65b9\u6cd5\u6765\u5b9e\u73b0\u4e0eRLHF\u7c7b\u4f3c\u7684\u6548\u679c\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u5927\u91cf\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u96c6\uff0c\u5e76\u4e14\u4f4e\u6548\u5730\u5229\u7528\u751f\u6210\u7684\u6570\u636e\u3002\u4e
3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSLE\uff08Progressively Selective Label Enhancement for Language Model Alignment\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5145\u5206\u5229\u7528\u6240\u6709\u751f\u6210\u6570\u636e\uff0c\u901a\u8fc7\u6307\u5bfc\u6a21\u578b\u9075\u5faa\u539f\u5219\u6765\u4f7f\u8f93\u51fa\u4e0e\u4eba\u7c7b\u671f\u671b\u4fdd\u6301\u4e00\u81f4\u3002\u901a\u8fc7\u52a8\u6001\u66f4\u65b0\u9608\u503c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u786e\u4fdd\u4e86\u9ad8\u6548\u7684\u6570\u636e\u5229\u7528\uff0c\u901a\u8fc7\u6574\u5408\u6240\u6709\u751f\u6210\u54cd\u5e94\u5e76\u6839\u636e\u5176\u76f8\u5e94\u7684\u5956\u52b1\u5206\u6570\u5bf9\u5b83\u4eec\u8fdb\u884c\u52a0\u6743\u3002\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPSLE\u5728\u73b0\u6709\u8bed\u8a00\u6a21\u578b\u5bf9\u9f50\u65b9\u6cd5\u4e2d\u8868\u73b0\u51fa\u6709\u6548\u6027\u3002|\n", "2408.02584": "|**2024-08-05**|**Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization**|Ankan Mullick 
et.al.|[2408.02584](http://arxiv.org/abs/2408.02584)|null|\u968f\u7740\u6570\u5b57\u4fe1\u606f\u91cf\u7684\u6301\u7eed\u589e\u957f\uff0c\u7528\u6237\u9700\u8981\u6709\u6548\u65b9\u6cd5\u4ece\u957f\u7bc7\u6587\u6863\u4e2d\u63d0\u53d6\u5173\u952e\u89c1\u89e3\u3002\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u9488\u5bf9\u6027\u7684\u65b9\u6cd5\uff0c\u751f\u6210\u4e13\u6ce8\u4e8e\u6587\u6863\u5185\u7279\u5b9a\u65b9\u9762\u7684\u5c0f\u7ed3\u3002\u5c3d\u7ba1\u5728\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u7814\u7a76\u9886\u57df\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u7684\u6301\u7eed\u8ffd\u6c42\u662f\u5fc5\u8981\u7684\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u603b\u7ed3\u95ee\u9898\u4e0a\uff0c\u672c\u6587\u63a2\u8ba8\u4e86\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u4ee5\u6267\u884c\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u4efb\u52a1\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f00\u6e90\u57fa\u7840LLMs\uff0c\u5305\u62ecLlama2\u3001Mistral\u3001Gemma\u548cAya\uff0c\u5bf9\u4e8e\u516c\u5f00\u53ef\u7528\u7684\u7279\u5b9a\u9886\u57df\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5047\u8bbe\u662f\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u8ba9\u8fd9\u4e9b\u6a21\u578b\u6709\u6548\u5730\u8bc6\u522b\u5e76\u63d0\u53d6\u4e0e\u65b9\u9762\u76f8\u5173\u7684\u4fe1\u606f\uff0c\u4ece\u800c\u4ea7\u751f\u4e0e\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u76f8\u6bd4\u66f4\u9ad8\u8d28\u91cf\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u3002\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5c06\u5fae\u8c03\u540e\u7684LLMs\u7684\u6027\u80fd\u4e0e\u7ade\u4e89\u6027\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u65b9\u6cd5\u4ee5\u53ca\u5fae\u8c03\u524dLLMs\u7684\u539f\u59cb\u7248\u672
c\u8fdb\u884c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u901a\u8fc7\u8bc1\u660e\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\uff0c\u4e3a\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\u3002\u6b64\u5916\uff0c\u5b83\u4e3a\u5728\u4e0d\u540cNLP\u9886\u57df\u8fdb\u4e00\u6b65\u63a2\u7d22\u4f7f\u7528LLMs\u8fdb\u884c\u76ee\u6807\u4fe1\u606f\u62bd\u53d6\u4efb\u52a1\u6253\u5f00\u4e86\u5927\u95e8\u3002|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0eAPI\u9a71\u52a8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u3001\u4e0d\u5b8c\u5168\u4fe1\u606f\u73af\u5883\u4e0b\u7684\u6587\u672c\u6e38\u620f\u534f\u4f5c\u80fd\u529b\uff0c\u7279\u522b\u662f\u5728\u975e\u82f1\u8bed\u73af\u5883\u4e2d\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7814\u7a76\u5bf9\u6bd4\u4e86\u8fd9\u4e9b\u6a21\u578b\u4e0e\u5176\u4ed6\u7c7b\u578b\u4ee3\u7406\u7684\u6027\u80fd\uff0c\u5e76\u4f7f\u7528\u7406\u8bba\u601d\u7ef4\uff08Theory of Mind, 
ToM\uff09\u89c4\u5212\u6280\u672f\u6765\u8bc4\u4f30\u5b83\u4eec\u5728\u9700\u8981\u591a\u667a\u80fd\u4f53\u534f\u4f5c\u7684\u4e0d\u5b8c\u5168\u4fe1\u606f\u6e38\u620f\u4e2d\u8868\u73b0\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5f15\u5165\u5916\u90e8\u5de5\u5177\u6765\u89e3\u51b3\u6b64\u5361\u724c\u6e38\u620f\u4e2d\u52a8\u6001\u4e14\u5e9e\u5927\u7684\u884c\u52a8\u7a7a\u95f4\u95ee\u9898\uff0c\u6211\u4eec\u7684\u7ed3\u679c\u63ed\u793a\u4e86\u5f53\u524d\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u9762\u5bf9\u9ad8\u7ea7\u522b\u4efb\u52a1\u65f6\u4e0e\u5f3a\u5316\u5b66\u4e60\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002\u5c3d\u7ba1\u5b58\u5728\u8fd9\u4e00\u5dee\u8ddd\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5c55\u73b0\u4e86\u5728\u6e38\u620f\u573a\u666f\u4e0b\u7684\u7406\u8bba\u601d\u7ef4\u80fd\u529b\uff0c\u80fd\u591f\u7406\u89e3\u76df\u53cb\u548c\u5bf9\u624b\u7684\u884c\u4e3a\uff0c\u5e76\u4e0e\u76df\u53cb\u5efa\u7acb\u534f\u4f5c\u5173\u7cfb\uff0c\u4ece\u800c\u6301\u7eed\u63d0\u5347\u5176\u6027\u80fd\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u4e0e\u7406\u89e3\uff0c\u6211\u4eec\u5df2\u516c\u5f00\u4e86\u4ee3\u7801\u5e93\u3002|\n", "2408.02549": "|**2024-08-05**|**Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning**|Hao Zhou 
et.al.|[2408.02549](http://arxiv.org/abs/2408.02549)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u57286G\u7f51\u7edc\u4e2d\u90e8\u7f72\u57fa\u7840\u6a21\u578b\u7684\u521b\u65b0\u8fb9\u7f18-\u4e91\u67b6\u6784\u3002\u5177\u4f53\u76ee\u6807\u662f\u901a\u8fc7\u65e0\u7ebf\u7535\u8d44\u6e90\u5206\u914d\u548c\u4efb\u52a1\u5378\u8f7d\u6765\u6700\u5c0f\u5316\u57fa\u7840\u6a21\u578b\u7684\u670d\u52a1\u5ef6\u8fdf\u3002\u4e3b\u8981\u5206\u4e3a\u4e09\u90e8\u5206\uff1a\u9996\u5148\uff0c\u4ecb\u7ecd\u901a\u4fe1\u7cfb\u7edf\u6a21\u578b\uff0c\u5373\u5206\u914d\u65e0\u7ebf\u7535\u8d44\u6e90\u5e76\u8ba1\u7b97\u652f\u6301\u751f\u6210\u5185\u5bb9\u4f20\u8f93\u7684\u94fe\u8def\u5bb9\u91cf\uff1b\u5176\u6b21\uff0c\u5c55\u793a\u57fa\u7840\u6a21\u578b\u63a8\u7406\u6a21\u578b\uff0c\u7528\u4e8e\u8ba1\u7b97\u5185\u5bb9\u751f\u6210\u7684\u5ef6\u8fdf\uff1b\u6700\u540e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u65b9\u6cd5\u6765\u4f18\u5316\u4efb\u52a1\u5378\u8f7d\u51b3\u7b56\u3002\u8be5\u65b9\u6cd5\u5229\u7528\u57fa\u7840\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\uff0c\u907f\u514d\u4e86\u4f20\u7edf\u673a\u5668\u5b66\u4e60\u7b97\u6cd5\u4e2d\u9700\u8981\u4e13\u95e8\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u56f0\u96be\u3002\u4eff\u771f\u7ed3\u679c\u8868\u660e\uff0c\u63d0\u51fa\u7684\u8fb9\u7f18-\u4e91\u90e8\u7f72\u4e0e\u4e0a\u4e0b\u6587\u5b66\u4e60\u4efb\u52a1\u5378\u8f7d\u65b9\u6cd5\u53ef\u4ee5\u5728\u65e0\u9700\u4e13\u95e8\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u6ee1\u610f\u7684\u751f\u6210\u670d\u52a1\u8d28\u91cf\u3002|\n", "2408.02545": "|**2024-08-05**|**RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation**|Daniel Fleischer 
et.al.|[2408.02545](http://arxiv.org/abs/2408.02545)|**[link](https://github.com/intellabs/ragfoundry)**|\u5b9e\u65bd\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\u56fa\u6709\u5730\u590d\u6742\uff0c\u9700\u8981\u6df1\u5165\u4e86\u89e3\u6570\u636e\u3001\u5e94\u7528\u573a\u666f\u4ee5\u53ca\u7ec6\u81f4\u7684\u8bbe\u8ba1\u51b3\u7b56\u3002\u6b64\u5916\uff0c\u8bc4\u4f30\u8fd9\u4e9b\u7cfb\u7edf\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\uff0c\u9700\u8981\u901a\u8fc7\u591a\u7ef4\u5ea6\u7684\u65b9\u6cd5\u8bc4\u4f30\u68c0\u7d22\u51c6\u786e\u6027\u548c\u751f\u6210\u8d28\u91cf\u3002\u6211\u4eec\u5f15\u5165\u4e86RAG Foundry\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u7528\u4e8e\u5728RAG\u573a\u666f\u4e2d\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6570\u636e\u3002RAG Foundry\u5c06\u6570\u636e\u521b\u5efa\u3001\u8bad\u7ec3\u3001\u63a8\u7406\u548c\u8bc4\u4f30\u6574\u5408\u5230\u4e00\u4e2a\u5de5\u4f5c\u6d41\u7a0b\u4e2d\uff0c\u4ece\u800c\u4e3a\u5728RAG\u8bbe\u7f6e\u4e2d\u8bad\u7ec3\u548c\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u6570\u636e\u589e\u5f3a\u96c6\u63d0\u4f9b\u4e86\u4fbf\u5229\u3002\u8fd9\u79cd\u6574\u5408\u4f7f\u5f97\u5feb\u901f\u539f\u578b\u8bbe\u8ba1\u548cRAG\u6280\u672f\u7684\u5b9e\u9a8c\u53d8\u5f97\u5bb9\u6613\uff0c\u5141\u8bb8\u7528\u6237\u8f7b\u677e\u751f\u6210\u6570\u636e\u96c6\u5e76\u4f7f\u7528\u5185\u90e8\u6216\u4e13\u95e8\u7684\u77e5\u8bc6\u6e90\u8bad\u7ec3RAG\u6a21\u578b\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u591a\u79cdRAG\u914d\u7f6e\u5bf9Llama-3\u548cPhi-3\u6a21\u578b\u8fdb\u884c\u589e\u5f3a\u548c\u5fae\u8c03\uff0c\u5728\u4e09\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86\u6301\u7eed\u6539\u8fdb\u7684\u6709\u6548\u6027\u3002\u4ee3\u7801\u4f5c\u4e3a\u5f00\u6e90\u53d1\u5e03\u5728https://github.com/IntelLabs/RAGFoundry\u3002|\n", "2408.02544": "|**2024-08-05**|**Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions**|Xinbei 
Ma et.al.|[2408.02544](http://arxiv.org/abs/2408.02544)|**[link](https://github.com/xbmxb/EnvDistraction)**|\u672c\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u4ee3\u7406\u5728\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u73af\u5883\u4e2d\u7684\u5fe0\u8bda\u5ea6\u95ee\u9898\uff0c\u65e8\u5728\u89e3\u51b3\u4ee5\u4e0b\u7814\u7a76\u95ee\u9898\uff1a\u591a\u6a21\u6001GUI\u4ee3\u7406\u662f\u5426\u53ef\u80fd\u88ab\u73af\u5883\u80cc\u666f\u5206\u6563\u6ce8\u610f\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u8bbe\u7f6e\uff0c\u5176\u4e2d\u7528\u6237\u548c\u4ee3\u7406\u5747\u4e3a\u5584\u610f\u89d2\u8272\uff0c\u800c\u73af\u5883\u867d\u975e\u6076\u610f\uff0c\u4f46\u5305\u542b\u4e0e\u4efb\u52a1\u65e0\u5173\u7684\u5185\u5bb9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6a21\u62df\u6570\u636e\u96c6\uff0c\u5bf9\u591a\u79cdMLLM\u4f5c\u4e3aGUI\u4ee3\u7406\u8fdb\u884c\u8bc4\u4f30\uff0c\u6309\u7167\u4e09\u79cd\u4e0d\u540c\u7684\u5de5\u4f5c\u6a21\u5f0f\uff0c\u5373\u5177\u6709\u4e0d\u540c\u7a0b\u5ea6\u611f\u77e5\u80fd\u529b\u7684\u6a21\u5f0f\u8fdb\u884c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4fbf\u662f\u6700\u5f3a\u5927\u7684\u6a21\u578b\uff0c\u65e0\u8bba\u662f\u901a\u7528\u578b\u4ee3\u7406\u8fd8\u662f\u4e13\u95e8\u7528\u4e8eGUI\u7684\u4ee3\u7406\uff0c\u90fd\u5bb9\u6613\u53d7\u5230\u5e72\u6270\u3002\u867d\u7136\u8fd1\u671f\u7684\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u591a\u6a21\u6001\u4ee3\u7406\u7684\u52a8\u4f5c\u51c6\u786e\u6027\uff08\u5373\u5e2e\u52a9\u6027\uff09\uff0c\u4f46\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u8fd9\u4e9b\u4ee3\u7406\u5728\u9762\u5bf9\u73af\u5883\u5e72\u6270\u65f6\u8868\u73b0\u51fa\u4e0d\u5fe0\u884c\u4e3a\u7684\u53ef\u80fd\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ece\u5bf9\u6297\u6027\u89c6\u89d2\u51fa\u53d1\uff0c\u5b9e\u65bd\u73af\u5883\u6ce8\u5165\u7b56\u7565\uff0c\u5c55\u793a\u51fa\u5229\u7528\u8fd9\u79cd\u4e0d\u5fe0\u884c\u4e3a\u53ef\u80fd\u5bfc\u81f4\u7684\u610f\u5916
\u98ce\u9669\u3002|\n", "2408.02535": "|**2024-08-05**|**Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph**|Zhao Kaichen et.al.|[2408.02535](http://arxiv.org/abs/2408.02535)|null|\u89c6\u89c9\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u662f\u667a\u80fd\u4f53\u9886\u57df\u7684\u91cd\u8981\u7814\u7a76\u4e4b\u4e00\uff0c\u65e8\u5728\u4f7f\u667a\u80fd\u4f53\u7406\u89e3\u5468\u56f4\u73af\u5883\u5e76\u5b8c\u6210\u5bfc\u822a\u4efb\u52a1\u3002\u5728VLN\u4efb\u52a1\u4e2d\uff0c\u6307\u4ee4\u53ef\u4ee5\u5206\u4e3a\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u4e24\u79cd\u7c7b\u578b\u3002\u7ec6\u7c92\u5ea6\u6307\u4ee4\u8be6\u7ec6\u63cf\u8ff0\u4e86\u6574\u4e2a\u4efb\u52a1\u7684\u6b65\u9aa4\uff0c\u800c\u7c97\u7c92\u5ea6\u6307\u4ee4\u5219\u63d0\u4f9b\u4e86\u4e00\u4e2a\u62bd\u8c61\u7684\u4efb\u52a1\u63cf\u8ff0\uff0c\u66f4\u9002\u5408\u4eba\u7c7b\u7684\u4e60\u60ef\u3002\u73b0\u6709\u7684\u5927\u90e8\u5206\u5de5\u4f5c\u90fd\u96c6\u4e2d\u5728\u5bf9\u7ec6\u7c92\u5ea6\u6307\u4ee4\u7684\u7814\u7a76\u4e0a\uff0c\u5ffd\u89c6\u4e86\u65e5\u5e38\u751f\u6d3b\u4e2d\u5b58\u5728\u7684\u62bd\u8c61\u6307\u4ee4\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5c1d\u8bd5\u901a\u8fc7\u4e8b\u4ef6\u77e5\u8bc6\u589e\u5f3a\u7684\u65b9\u5f0f\u8003\u8651VLN\u4e2d\u7684\u7c97\u7c92\u5ea6\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6765\u6574\u5408\u591a\u4e2a\u4e3b\u6d41\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5f62\u6210\u4e00\u4e2a\u5168\u9762\u7684\u4e8b\u4ef6\u77e5\u8bc6\u56fe\u8c31\uff08\u547d\u540d\u4e3aVLN-EventKG\uff09\u3002\u901a\u8fc7\u5c0f\u89c4\u6a21\u548c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u5408\u4f5c\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u80fd\u591f\u5904\u7406\u7c97\u7c92\u5ea6\u6307\u4ee4\u8f93\u5165\u7684\u4e8b\u4ef6\u5bfc\u822a\uff08EventNav\uff09\u65b9\u6cd5\uff0c\u7528\u4e8eVLN\u4efb\u52a1\u4e2d\u7684\u5bfc\u822a\u89c4\
u5212\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u52a8\u6001\u5386\u53f2\u56de\u6eaf\u6a21\u5757\uff0c\u80fd\u591f\u5728\u5b9e\u65f6\u4e2d\u7ea0\u6b63\u6f5c\u5728\u7684\u9519\u8bef\u52a8\u4f5c\u89c4\u5212\u3002\u5728\u5404\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528\u6211\u4eec\u63d0\u51fa\u7684VLN-EventKG\u7684\u77e5\u8bc6\u589e\u5f3a\u65b9\u6cd5\uff0c\u5728\u4f7f\u7528\u7c97\u7c92\u5ea6\u6307\u4ee4\u7684VLN\u4efb\u52a1\u4e2d\u5177\u6709\u8d85\u8fc75%\u7684\u6210\u529f\u7387\u4f18\u52bf\u3002\u6211\u4eec\u7684\u9879\u76ee\u53ef\u4ee5\u5728 \u4e0a\u8bbf\u95ee\u3002|\n", "2408.02509": "|**2024-08-05**|**Practical Attacks against Black-box Code Completion Engines**|Slobodan Jenko et.al.|[2408.02509](http://arxiv.org/abs/2408.02509)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aINSEC\u7684\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u5f15\u5bfc\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7801\u8865\u5168\u5f15\u64ce\u751f\u6210\u5b58\u5728\u5b89\u5168\u6f0f\u6d1e\u7684\u4ee3\u7801\u3002\u8fd9\u79cd\u653b\u51fb\u65b9\u5f0f\u4e0e\u5e02\u9762\u4e0a\u5927\u591a\u6570\u5546\u4e1a\u8865\u5168\u5f15\u64ce\uff08\u5982GitHub Copilot\uff09\u76f8\u4f3c\uff0c\u4ec5\u9700\u8981\u9ed1\u76d2\u67e5\u8be2\u8bbf\u95ee\u76ee\u6807\u5f15\u64ce\uff0c\u65e0\u9700\u4e86\u89e3\u5176\u5185\u90e8\u673a\u5236\u3002\u653b\u51fb\u7b56\u7565\u901a\u8fc7\u5728\u8865\u5168\u8f93\u5165\u4e2d\u63d2\u5165\u6076\u610f\u653b\u51fb\u5b57\u7b26\u4e32\u4f5c\u4e3a\u7b80\u77ed\u6ce8\u91ca\u6765\u5b9e\u65bd\u3002\u4e3a\u4e86\u8bbe\u8ba1\u51fa\u6709\u6548\u7684\u653b\u51fb\u5b57\u7b26\u4e32\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u7cfb\u5217\u4e13\u95e8\u7684\u521d\u59cb\u5316\u65b9\u6848\uff0c\u5e76\u901a\u8fc7\u4f18\u5316\u8fc7\u7a0b\u8fdb\u4e00\u6b65\u7cbe\u70bc\u3002\u6211\u4eec\u5728\u5f00\u6e90\u6a21\u578b\u3001\u9ed1\u76d2\u5546\u4e1a\u670d\u52a1\uff08\u5982OpenAI 
API\u548cGitHub Copilot\uff09\u4ee5\u53ca\u4e94\u79cd\u7f16\u7a0b\u8bed\u8a00\u4e0b\u768416\u4e2a\u5173\u952e\u9519\u8bef\u7c7b\u522b\u4e0a\u9a8c\u8bc1\u4e86INSEC\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u6bd4\uff0cINSEC\u663e\u8457\u63d0\u9ad8\u4e86\u8003\u8651\u4e2d\u7684\u8865\u5168\u5f15\u64ce\u751f\u6210\u4e0d\u5b89\u5168\u4ee3\u7801\u7684\u53ef\u80fd\u6027\u8d85\u8fc750%\uff0c\u540c\u65f6\u4ecd\u5177\u5907\u751f\u6210\u529f\u80fd\u6b63\u786e\u4ee3\u7801\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u8d44\u6e90\u9700\u6c42\u8f83\u4f4e\uff0c\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e\u5341\u7f8e\u5143\uff0c\u53ef\u5728\u666e\u901a\u786c\u4ef6\u4e0a\u8fd0\u884c\u3002|\n", "2408.03302": "|**2024-08-06**|**TextIM: Part-aware Interactive Motion Synthesis from Text**|Siyuan Fan et.al.|[2408.03302](http://arxiv.org/abs/2408.03302)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTextIM\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u5408\u6210\u57fa\u4e8e\u6587\u672c\u9a71\u52a8\u7684\u4eba\u7c7b\u4ea4\u4e92\u52a8\u4f5c\uff0c\u5e76\u7279\u522b\u5173\u6ce8\u4e8e\u90e8\u5206\u7ea7\u8bed\u4e49\u7684\u7cbe\u786e\u5bf9\u9f50\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u4ea4\u4e92\u8eab\u4f53\u90e8\u4f4d\u7684\u5173\u952e\u4f5c\u7528\uff0c\u5e76\u672a\u80fd\u5145\u5206\u6355\u6349\u548c\u5bf9\u9f50\u90e8\u5206\u7ea7\u8bed\u4e49\uff0c\u5bfc\u81f4\u4e86\u4e0d\u51c6\u786e\u751a\u81f3\u9519\u8bef\u7684\u52a8\u4f5c\u7ed3\u679c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0cTextIM\u91c7\u7528\u4e86\u4e00\u4e2a\u89e3\u8026\u6761\u4ef6\u6269\u6563\u6846\u67b6\uff0c\u4ee5\u589e\u5f3a\u4ea4\u4e92\u52a8\u4f5c\u4e0e\u5bf9\u5e94\u6587\u672c\u63cf\u8ff0\u4e2d\u7684\u8bed\u4e49\u610f\u56fe\u4e4b\u95f4\u8be6\u7ec6\u7684\u5bf9\u9f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f5c\u4e3a\u4eba\u7c7b\u5927\u8111
\u7684\u89d2\u8272\uff0c\u6765\u8bc6\u522b\u4ea4\u4e92\u7684\u8eab\u4f53\u90e8\u4f4d\u5e76\u7406\u89e3\u4ea4\u4e92\u8bed\u4e49\uff0c\u4ece\u800c\u751f\u6210\u590d\u6742\u7684\u5fae\u5999\u4ea4\u4e92\u52a8\u4f5c\u3002\u5728\u7cbe\u7ec6\u52a8\u4f5c\u5f15\u5bfc\u4e0b\uff0cTextIM\u8fdb\u4e00\u6b65\u5c06\u8fd9\u4e9b\u90e8\u5206\u52a8\u4f5c\u6269\u5c55\u4e3a\u6574\u4e2a\u8eab\u4f53\u7684\u8fde\u8d2f\u52a8\u4f5c\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7a7a\u95f4\u4e00\u81f4\u6027\u6a21\u5757\uff0c\u901a\u8fc7\u90e8\u5206\u56fe\u5377\u79ef\u7f51\u7edc\u5728\u6574\u4e2a\u8eab\u4f53\u52a8\u4f5c\u4e2d\u8865\u5145\u548c\u7ef4\u6301\u5404\u90e8\u5206\u4e4b\u95f4\u7684\u8fde\u8d2f\u6027\u548c\u548c\u8c10\u6027\u3002\u5bf9\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\uff0c\u6211\u4eec\u7cbe\u5fc3\u9009\u62e9\u4e86\u5e76\u91cd\u65b0\u6807\u8bb0\u4e86HUMANML3D\u4e2d\u7684\u4ea4\u4e92\u52a8\u4f5c\u6570\u636e\u96c6\uff0c\u521b\u5efa\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTextIM\u80fd\u591f\u4ea7\u751f\u8bed\u4e49\u4e0a\u51c6\u786e\u7684\u4eba\u7c7b\u4ea4\u4e92\u52a8\u4f5c\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u5728\u5404\u79cd\u573a\u666f\u4e0b\u5408\u6210\u4ea4\u4e92\u52a8\u4f5c\u7684\u771f\u5b9e\u611f\u548c\u5e94\u7528\u6027\uff0c\u5305\u62ec\u4e0e\u53ef\u53d8\u5f62\u548c\u52a8\u6001\u53d8\u5316\u7269\u4f53\u7684\u4ea4\u4e92\u3002|\n", "2408.03297": "|**2024-08-06**|**KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models**|Ruizhe Zhang 
et.al.|[2408.03297](http://arxiv.org/abs/2408.03297)|null|\u901a\u8fc7\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b56\u7565\u5df2\u6210\u4e3a\u7f13\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u65f6\u9047\u5230\u7684\u5e7b\u89c9\u95ee\u9898\u7684\u6709\u6548\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5728\u6574\u5408\u975e\u53c2\u6570\u5316\u7684\u5916\u90e8\u652f\u6301\u8bc1\u636e\u4e0e\u5185\u90e8\u53c2\u6570\u5316\u77e5\u8bc6\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4e0d\u53ef\u907f\u514d\u7684\u77e5\u8bc6\u51b2\u7a81\u53ef\u80fd\u4f1a\u4ea7\u751f\uff0c\u5bfc\u81f4\u6a21\u578b\u54cd\u5e94\u4e2d\u7684\u6df7\u6dc6\u3002\u4e3a\u4e86\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u7684\u77e5\u8bc6\u9009\u62e9\u80fd\u529b\uff0c\u4e00\u4e9b\u7814\u7a76\u5df2\u7ecf\u5173\u6ce8\u4e8e\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u6765\u7ec6\u5316\u5176\u884c\u4e3a\u6a21\u5f0f\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u660e\u786e\u7684\u8d1f\u5411\u4fe1\u53f7\u548c\u6bd4\u8f83\u76ee\u6807\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\u8fdb\u884c\u5fae\u8c03\u7684\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u7684\u3001\u73b0\u5b9e\u7684\u68c0\u7d22\u573a\u666f\u4e2d\u4ecd\u7136\u53ef\u80fd\u8868\u73b0\u51fa\u4e0d\u7406\u60f3\u7684\u7279\u6027\u3002 
\u9488\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u77e5\u8bc6\u610f\u8bc6\u504f\u597d\u4f18\u5316\uff08KaPO\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u5bf9\u771f\u5b9e\u68c0\u7d22\u573a\u666f\u4e2d\u77e5\u8bc6\u9009\u62e9\u7684\u53ef\u63a7\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u63a2\u7d22\u5e76\u6a21\u62df\u4e86\u4e0d\u540c\u4e0a\u4e0b\u6587\u7ec4\u5408\u4e0b\u7684\u9519\u8bef\u7c7b\u578b\uff0c\u5e76\u901a\u8fc7\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u5b66\u4e60\u5982\u4f55\u907f\u514d\u8fd9\u4e9b\u8d1f\u5411\u4fe1\u53f7\u3002\u540c\u65f6\uff0c\u901a\u8fc7\u8c03\u6574\u54cd\u5e94\u957f\u5ea6\u4e0e\u8868\u793a\u4e0d\u540c\u884c\u4e3a\u6a21\u5f0f\u7684\u504f\u597d\u6570\u636e\u6bd4\u4f8b\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u6211\u4eec\u589e\u5f3a\u4e86\u8bed\u8a00\u6a21\u578b\u7684\u9002\u5e94\u80fd\u529b\u548c\u566a\u58f0\u9c81\u68d2\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u7684\u65b9\u6cd5\u76f8\u6bd4\uff0cKaPO\u5728\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u65b9\u9762\u53d6\u5f97\u4e86\u8d85\u8fc737%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u5e76\u4e14\u5728\u5404\u79cd\u79bb\u7fa4\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u4e86\u7a33\u5065\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2408.03281": "|**2024-08-07**|**StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation**|Boxi Cao 
et.al.|[2408.03281](http://arxiv.org/abs/2408.03281)|**[link](https://github.com/c-box/structeval)**|\u8bc4\u4ef7\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f00\u53d1\u7684\u5173\u952e\u5de5\u5177\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u5f0f\u901a\u5e38\u91c7\u7528\u5355\u4e00\u6307\u6807\u8bc4\u4f30\u6a21\u5f0f\uff0c\u5bf9\u6bcf\u4e2a\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u5728\u533a\u5206\u6a21\u578b\u662f\u5426\u771f\u6b63\u5177\u5907\u6240\u9700\u80fd\u529b\u8fd8\u662f\u4ec5\u4ec5\u8bb0\u5fc6\u6216\u731c\u6d4b\u7279\u5b9a\u95ee\u9898\u7684\u7b54\u6848\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aStructEval\u7684\u65b0\u8bc4\u4f30\u6846\u67b6\u3002\u4ece\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u51fa\u53d1\uff0cStructEval\u901a\u8fc7\u5728\u591a\u4e2a\u8ba4\u77e5\u5c42\u6b21\u548c\u5173\u952e\u6982\u5ff5\u4e0a\u8fdb\u884c\u7ed3\u6784\u5316\u7684\u8bc4\u4f30\u6765\u6df1\u5316\u548c\u62d3\u5bbd\u8bc4\u4f30\u8303\u56f4\uff0c\u4ece\u800c\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u4f9b\u5168\u9762\u3001\u7a33\u5065\u4e14\u4e00\u81f4\u7684\u8bc4\u4f30\u3002\u5728\u4e09\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0cStructEval\u662f\u4e00\u4e2a\u53ef\u9760\u7684\u5de5\u5177\uff0c\u80fd\u591f\u62b5\u6297\u6570\u636e\u6c61\u67d3\u7684\u98ce\u9669\u5e76\u51cf\u5c11\u6f5c\u5728\u504f\u89c1\u7684\u5e72\u6270\uff0c\u4ece\u800c\u63d0\u4f9b\u5173\u4e8e\u6a21\u578b\u80fd\u529b\u66f4\u53ef\u9760\u548c\u4e00\u81f4\u7684\u7ed3\u8bba\u3002\u6211\u4eec\u7684\u6846\u67b6\u8fd8\u4e3a\u672a\u6765\u539f\u7406\u6027\u548c\u53ef\u4fe1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u534f\u8bae\u7684\u8bbe\u8ba1\u63d0\u4f9b\u4e86\u542f\u793a\u3002|\n", "2408.03256": "|**2024-08-06**|**Synthesizing Text-to-SQL Data from Weak and Strong LLMs**|Jiaxi Yang 
et.al.|[2408.03256](http://arxiv.org/abs/2408.03256)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0e\u5c01\u95ed\u5f0f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6587\u672c\u5230SQL\u4efb\u52a1\u4e2d\u7684\u80fd\u529b\u5dee\u8ddd\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5408\u6210\u6570\u636e\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u66f4\u5927\u3001\u66f4\u5f3a\u5927\u7684\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff08\u5f3a\u6a21\u578b\uff09\u4e0e\u8f83\u5c0f\u3001\u4e0d\u5b8c\u5168\u5bf9\u9f50\u6a21\u578b\u751f\u6210\u7684\u9519\u8bef\u4fe1\u606f\u6570\u636e\uff08\u5f31\u6a21\u578b\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u6587\u672c\u5230SQL\u6a21\u578b\u7684\u9886\u57df\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u63a2\u7d22\u4e86\u9519\u8bef\u6570\u636e\u76d1\u7763\u901a\u8fc7\u504f\u597d\u5b66\u4e60\u7684\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u5408\u6210\u6570\u636e\u65b9\u6cd5\u5bf9\u5f00\u6e90LLM\u8fdb\u884c\u6307\u4ee4\u8c03\u6574\uff0c\u7531\u6b64\u4ea7\u751f\u4e86\u4e13\u95e8\u9488\u5bf9\u6587\u672c\u5230SQL\u4efb\u52a1\u7684\u6a21\u578bSENSE\u3002\u901a\u8fc7\u5728SPIDER\u548cBIRD\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\uff0c\u8bc1\u660e\u4e86SENSE\u7684\u6709\u6548\u6027\uff0c\u6210\u529f\u7f29\u5c0f\u4e86\u5f00\u6e90\u6a21\u578b\u4e0e\u57fa\u4e8e\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u65b9\u6cd5\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2408.03247": "|**2024-08-06**|**Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons**|Yifei Wang 
et.al.|[2408.03247](http://arxiv.org/abs/2408.03247)|**[link](https://github.com/wangyifei0047/tfrkn)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u9762\u5bf9\u63a8\u7406\u4efb\u52a1\u65f6\u662f\u5426\u79ef\u6781\u5730\u56de\u5fc6\u6216\u68c0\u7d22\u5176\u5185\u90e8\u4e8b\u5b9e\u77e5\u8bc6\u5e93\u3002\u901a\u8fc7\u5206\u6790LLM\u5728\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u5185\u90e8\u4e8b\u5b9e\u53ec\u56de\u60c5\u51b5\uff0c\u5373\u6240\u8c13\u7684\u77e5\u8bc6\u795e\u7ecf\u5143\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0cLLM\u672a\u80fd\u6709\u6548\u5229\u7528\u5173\u952e\u7684\u4e8b\u5b9e\u5173\u8054\u3002\u76f8\u53cd\uff0c\u5b83\u4eec\u503e\u5411\u4e8e\u91c7\u53d6\u66ff\u4ee3\u7684\u3001\u5feb\u6377\u7684\u8def\u5f84\u6765\u56de\u7b54\u63a8\u7406\u95ee\u9898\u3002\u901a\u8fc7\u624b\u52a8\u8c03\u6574LLM\u4e2d\u53c2\u6570\u77e5\u8bc6\u7684\u53ec\u56de\u8fc7\u7a0b\uff0c\u6211\u4eec\u8bc1\u660e\u76f4\u63a5\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u63a8\u7406\u6027\u80fd\uff0c\u800c\u6291\u5236\u5b83\u5219\u4f1a\u5bfc\u81f4\u660e\u663e\u7684\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u7684\u5f71\u54cd\uff0c\u8fd9\u662f\u4e00\u79cd\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5f3a\u5927\u6280\u672f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cCoT\u53ef\u4ee5\u901a\u8fc7\u9f13\u52b1LLM\u8fdb\u884c\u6709\u6761\u7406\u548c\u53ef\u9760\u7684\u63a8\u7406\u6765\u589e\u5f3a\u5bf9\u4e8b\u5b9e\u77e5\u8bc6\u7684\u56de\u5fc6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0a\u4e0b\u6587\u51b2\u7a81\u5982\u4f55\u5f71\u54cd\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4e8b\u5b9e\u7684\u68c0\u7d22\uff0c\u4ee5\u83b7\u5f97\u5bf9LLM\u4e8b\u5b9e\u56de\u5fc6\u884c\u4e3a\u7684\u5168\u9762\u7406\u89e3\u3002\u76f8
\u5173\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0d\u4e45\u540e\u63d0\u4f9b\u3002|\n", "2408.03172": "|**2024-08-06**|**Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi**|Pranita Deshmukh et.al.|[2408.03172](http://arxiv.org/abs/2408.03172)|null|\u968f\u7740\u4f4e\u8d44\u6e90\u8bed\u8a00\u6570\u5b57\u5185\u5bb9\u7684\u6fc0\u589e\uff0c\u9488\u5bf9\u8fd9\u4e9b\u8bed\u8a00\u7684\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u9700\u6c42\u6b63\u5728\u589e\u52a0\u3002BERT\uff08\u53cc\u5411\u7f16\u7801\u8868\u793a\u7684Transformer\uff09\u4f5c\u4e3a\u4f17\u591aNLP\u67b6\u6784\u548c\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u6846\u67b6\uff0c\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u7528\u4e8e\u5f00\u53d1\u4f4e\u8d44\u6e90NLP\u6a21\u578b\u3002\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u51cf\u5c11\u8bad\u7ec3\u53c2\u6570\uff0c\u4ee5\u964d\u4f4e\u8bad\u7ec3\u6a21\u578b\u6240\u9700\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u5e76\u8fbe\u5230\u4e0e\u5b8c\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53\u7684\u7ed3\u679c\u3002\u672c\u7814\u7a76\u65e8\u5728\u5206\u6790PEFT\u65b9\u6cd5\u5728\u9a6c\u62c9\u5730\u8bed\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u5bf9\u5404\u79cd\u5355\u8bed\u548c\u591a\u8bed\u79cd\u9a6c\u62c9\u5730\u8bedBERT\u6a21\u578b\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728MahaSent\u3001MahaHate\u548cMahaNews\u7b49\u91cd\u8981\u6587\u672c\u5206\u7c7b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002PEFT\u6280\u672f\u7684\u5f15\u5165\u663e\u8457\u52a0\u5feb\u4e86\u6a21\u578b\u7684\u8bad\u7ec3\u901f\u5ea6\uff0c\u89e3\u51b3\u4e86\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u7684\u5173\u952e\u65b9\u9762\u3002\u5728\u672c\u7814\u7a76\u4e2d\u
ff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u548c\u9002\u914d\u5668\u65b9\u6cd5\u5728\u4f4e\u8d44\u6e90\u6587\u672c\u5206\u7c7b\u4e2d\u7684\u5e94\u7528\u3002\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u51c6\u786e\u7387\u4e0a\u4e0e\u5168\u91cf\u5fae\u8c03\u76f8\u5f53\uff0c\u4e14\u65e0\u9700\u635f\u5931\uff0c\u53ef\u7528\u4e8e\u9a6c\u62c9\u5730\u8bed\u548c\u5176\u4ed6\u5370\u5ea6\u8bed\u65cf\u8bed\u8a00\u7684NLP\u80fd\u529b\u6301\u7eed\u53d1\u5c55\u3002|\n", "2408.03150": "|**2024-08-06**|**Conditioning LLMs with Emotion in Neural Machine Translation**|Charles Brazier et.al.|[2408.03150](http://arxiv.org/abs/2408.03150)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u673a\u5668\u7ffb\u8bd1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u901a\u8fc7\u5c06\u60c5\u611f\u4fe1\u606f\u6574\u5408\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\u6765\u589e\u5f3a\u7ffb\u8bd1\u8d28\u91cf\uff0c\u8fd9\u4e9b\u60c5\u611f\u4fe1\u606f\u662f\u4ece\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\uff08SER\uff09\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9\u4e94\u4e2a\u73b0\u6709\u7684LLM\u8fdb\u884cLibri-trans\u6570\u636e\u96c6\u7684\u5fae\u8c03\uff0c\u5e76\u9009\u62e9\u8868\u73b0\u6700\u4f73\u7684\u6a21\u578b\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ee5\u4e0d\u540c\u7ef4\u5ea6\u7684\u60c5\u611f\u589e\u5f3aLLM\u63d0\u793a\uff0c\u5e76\u5728\u8fd9\u4e9b\u4e0d\u540c\u7684\u914d\u7f6e\u4e0b\u8bad\u7ec3\u9009\u5b9a\u7684LLM\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c06\u60c5\u611f\u4fe1\u606f\uff0c\u5c24\u5176\u662f\u5524\u9192\u5ea6\uff0c\u6574\u5408\u5230LLM\u63d0\u793a\u4e2d\uff0c\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u7ffb\u8bd1\u8d28\u
91cf\u3002|\n", "2408.03130": "|**2024-08-06**|**Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations**|Leo Donisch et.al.|[2408.03130](http://arxiv.org/abs/2408.03130)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u65e0\u5904\u4e0d\u5728\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u89c4\u6a21\u548c\u590d\u6742\u6027\u5e26\u6765\u4e86\u72ec\u7279\u7684\u6311\u6218\u4e0e\u673a\u9047\uff0c\u4fc3\u4f7f\u7814\u7a76\u8005\u4e0e\u5b9e\u8df5\u8005\u63a2\u7d22\u65b0\u578b\u7684\u6a21\u578b\u8bad\u7ec3\u3001\u4f18\u5316\u548c\u90e8\u7f72\u65b9\u6cd5\u3002\u672c\u6587\u7efc\u8ff0\u7684\u91cd\u70b9\u5728\u4e8e\u5404\u79cd\u964d\u4f4e\u8d44\u6e90\u9700\u6c42\u548c\u538b\u7f29\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6280\u672f\uff0c\u5305\u62ec\u91cf\u5316\u3001\u526a\u679d\u3001\u77e5\u8bc6\u84b8\u998f\u4ee5\u53ca\u67b6\u6784\u4f18\u5316\u3002\u4e3b\u8981\u76ee\u6807\u662f\u6df1\u5165\u63a2\u8ba8\u6bcf\u79cd\u65b9\u6cd5\uff0c\u5e76\u7a81\u51fa\u5176\u72ec\u7279\u6311\u6218\u53ca\u5176\u5b9e\u9645\u5e94\u7528\u3002\u8ba8\u8bba\u7684\u65b9\u6cd5\u6309\u7167\u5206\u7c7b\u5b66\u8fdb\u884c\u7ec4\u7ec7\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4f18\u5316\u666f\u89c2\u7684\u6982\u89c8\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7814\u7a76\u8f68\u8ff9\u3002|\n", "2408.03127": "|**2024-08-06**|**Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation**|Artur Guimar\u00e3es 
et.al.|[2408.03127](http://arxiv.org/abs/2408.03127)|**[link](https://github.com/araag2/SemEval2024-Task2)**|\u8fd9\u7bc7\u8bba\u6587\u9610\u8ff0\u4e86\u6211\u4eec\u5bf9SemEval-2024\u5b89\u5168\u751f\u7269\u533b\u5b66\u81ea\u7136\u8bed\u8a00\u63a8\u65ad\u5728\u4e34\u5e8a\u8bd5\u9a8c\uff08NLI4CT\uff09\u4efb\u52a1\u7684\u5904\u7406\u7b56\u7565\u3002\u8be5\u4efb\u52a1\u6d89\u53ca\u5bf9\u4e34\u5e8a\u8bd5\u9a8c\u62a5\u544a\uff08CTRs\uff09\u4e2d\u7684\u9648\u8ff0\u8fdb\u884c\u5206\u7c7b\u3002\u6211\u4eec\u63a2\u7d22\u4e86Mistral-7B\u8fd9\u79cd\u901a\u7528\u7684\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u80fd\u529b\u3002\u6211\u4eec\u4e3aNLI4CT\u4efb\u52a1\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u589e\u5f3a\u540e\u7684\u8bad\u7ec3\u6570\u636e\u96c6\u5bf9\u91cf\u5316\u7248\u672c\u7684\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u5b8fF1\u5206\u6570\u65b9\u9762\u53ef\u4ee5\u4ea7\u751f\u663e\u8457\u7684\u7ed3\u679c\uff0c\u4f46\u5728\u5fe0\u5b9e\u6027\u548c\u4e00\u81f4\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\u3002\u6240\u6709\u5f00\u53d1\u7684\u4ee3\u7801\u90fd\u5728GitHub\u4ed3\u5e93\u4e2d\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.03119": "|**2024-08-06**|**Evaluating the Translation Performance of Large Language Models Based on Euas-20**|Yan Huang 
et.al.|[2408.03119](http://arxiv.org/abs/2408.03119)|null|\u8fd1\u5e74\u6765\uff0c\u5728\u6df1\u5ea6\u5b66\u4e60\u6280\u672f\u7684\u5feb\u901f\u53d1\u5c55\u7684\u63a8\u52a8\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982BERT\u548cGPT\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u7a81\u7834\u6027\u6210\u679c\u3002\u673a\u5668\u7ffb\u8bd1\u4f5c\u4e3a\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7684\u6838\u5fc3\u4efb\u52a1\u4e4b\u4e00\uff0c\u4e5f\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u4e2d\u53d7\u76ca\u532a\u6d45\uff0c\u5b9e\u73b0\u4e86\u8d28\u7684\u98de\u8dc3\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u673a\u5668\u7ffb\u8bd1\u4ecd\u9762\u4e34\u8bf8\u591a\u6311\u6218\u3002\u56e0\u6b64\uff0c\u672c\u6587\u6784\u5efa\u4e86Euas-20\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3001\u4e0d\u540c\u8bed\u8a00\u7684\u7ffb\u8bd1\u80fd\u529b\u4ee5\u53ca\u9884\u8bad\u7ec3\u6570\u636e\u5bf9LLMs\u7ffb\u8bd1\u80fd\u529b\u7684\u5f71\u54cd\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u53c2\u8003\u3002|\n", "2408.03940": "|**2024-08-07**|**How Well Can Vision Language Models See Image Details?**|Chenhui Gou 
et.al.|[2408.03940](http://arxiv.org/abs/2408.03940)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LLM-\u9a71\u52a8\u7684VLM\uff09\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9bVLM\u662f\u5426\u80fd\u8d85\u8d8a\u8bed\u4e49\u5c42\u9762\uff0c\u6df1\u5165\u89c2\u5bdf\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u4e0d\u660e\u6717\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u50cf\u7d20\u503c\u9884\u6d4b\u4efb\u52a1\uff08PVP\uff09\uff0c\u4ee5\u63a2\u7d22\u201c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u770b\u5230\u591a\u7ec6\u7684\u56fe\u50cf\u7ec6\u8282\uff1f\u201d\u5e76\u534f\u52a9VLM\u63d0\u5347\u5bf9\u7ec6\u8282\u7684\u611f\u77e5\u80fd\u529b\u3002\u901a\u5e38\uff0c\u8fd9\u4e9b\u6a21\u578b\u7531\u51bb\u7ed3\u7684CLIP\u89c6\u89c9\u7f16\u7801\u5668\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u8fde\u63a5\u6a21\u5757\u7ec4\u6210\u3002\u5728\u5bf9PVP\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u6211\u4eec\u53d1\u73b0\uff1a1\uff09\u73b0\u6709\u7684VLM\u4ec5\u901a\u8fc7\u5fae\u8c03\u8fde\u63a5\u6a21\u5757\u548cLLM\uff0c\u5728\u9884\u6d4b\u7cbe\u786e\u50cf\u7d20\u503c\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff1b2\uff09\u5f53\u89c6\u89c9\u7f16\u7801\u5668\u4e5f\u5f97\u5230\u9002\u5e94\u65f6\uff0c\u9884\u6d4b\u7cbe\u5ea6\u663e\u8457\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63ed\u793a\uff0c\u5c06\u50cf\u7d20\u503c\u9884\u6d4b\u4f5c\u4e3aVLM\u9884\u8bad\u7ec3\u4efb\u52a1\u4e4b\u4e00\uff0c\u5e76\u5bf9\u89c6\u89c9\u7f16\u7801\u5668\u8fdb\u884c\u9002\u5e94\uff0c\u663e\u8457\u63d0\u5347\u4e86VLM\u5728\u9700\u8981\u8be6\u7ec6\u56fe\u50cf\u611f\u77e5\u7684\u4e0b\u6e38\u56fe\u50cf\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5982\u5f15\u7528\u56fe\u50cf\u5206\u5272\uff08\u5e73\u5747cIoU\u6539\u8fdb+10.19\u767e\u5206\u70b9\uff09\u548c\u89c6\u9891\u6e38\u620f\u51b3\u7b56\uff08\u572
8\u4e24\u4e2a\u6e38\u620f\u4e2d\u5206\u522b\u5e73\u5747\u5f97\u5206\u6539\u5584\u4e86+80.34\u548c+70.54\uff09\u3002|\n", "2408.03936": "|**2024-08-07**|**SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature**|Vin\u00edcius Di Oliveira et.al.|[2408.03936](http://arxiv.org/abs/2408.03936)|null|\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u82f1\u8bed\u4e4b\u5916\u7684\u8bed\u8a00\uff0c\u5c24\u5176\u662f\u5728\u7279\u5b9a\u9886\u57df\u5982Mercosur\u901a\u7528\u5546\u54c1\u540d\u79f0\uff08NCM\uff09\uff0c\u5df4\u897f\u534f\u8c03\u7cfb\u7edf\uff08HS\uff09\u7684\u5e94\u7528\u65b9\u9762\uff0c\u4ecd\u6709\u5f88\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u7f3a\u53e3\uff0c\u672c\u7814\u7a76\u5229\u7528TeenyTineLLaMA\uff0c\u4e00\u79cd\u57fa\u7840\u8461\u8404\u7259\u8bedLLM\uff0c\u4f5c\u4e3aLLM\u6e90\uff0c\u5b9e\u65bdNCM\u5e94\u7528\u5904\u7406\u3002\u6b64\u5916\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u4efb\u52a1\u7279\u5b9a\u5fae\u8c03\u7684\u7b80\u5316\u68c0\u7d22\u589e\u5f3a\u5fae\u8c03\uff08SLIM-RAFT\uff09\u6280\u672f\u3002\u8be5\u65b9\u6cd5\u91c7\u7528\u7b80\u5316\u7684\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u7b56\u7565\u8fdb\u884c\u63d0\u793a\u5f00\u53d1\uff0c\u4f7f\u7528\u7b80\u77ed\u800c\u96c6\u4e2d\u7684\u6587\u6863\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u66f4\u7d27\u51d1\u548c\u9ad8\u6548\u7684\u65b9\u5f0f\u8fdb\u884c\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u76f8\u540c\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8eTeenyTineLLaMA\u548cChatGPT-4\uff0c\u5c55\u793a\u4e86\u8f83\u5c0fLLM\u5fae\u8c03\u7684\u9ad8\u6548\u548c\u6210\u672c\u6548\u76ca\u66ff\u4ee3\u65b9\u6848\u3002\u5c3d\u7ba1\u7814\u7a76\u91cd\u70b9\u662fNCM\u5e94\u7528\uff0c\u4f46\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u53ef\u4ee5\u8f7b\u677e\u5730\u9002\u5e94\u5
168\u7403\u8303\u56f4\u5185\u7684HS\u5e94\u7528\u3002|\n", "2408.03910": "|**2024-08-07**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982HumanEval\u548cMBPP\u7684\u72ec\u7acb\u4ee3\u7801\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u65f6\u5b58\u5728\u6311\u6218\u3002\u8fd9\u4fc3\u4f7f\u7814\u7a76\u754c\u63a2\u7d22\u5982\u4f55\u5728\u4ed3\u5e93\u7ea7\u522b\u4e0a\u589e\u5f3aLLM\u4e0e\u4ee3\u7801\u5e93\u7684\u4ea4\u4e92\u3002\u76ee\u524d\u7684\u89e3\u51b3\u65b9\u6848\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u6216\u624b\u52a8\u5de5\u5177\u548cAPI\uff0c\u6bcf\u79cd\u65b9\u6cd5\u90fd\u6709\u5176\u663e\u8457\u7684\u7f3a\u70b9\u3002\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u53ec\u56de\u7387\u8f83\u4f4e\uff0c\u800c\u624b\u52a8\u5de5\u5177\u548cAPI\u901a\u5e38\u5177\u6709\u7279\u5b9a\u7684\u4efb\u52a1\u6027\uff0c\u5e76\u4e14\u9700\u8981\u4e13\u5bb6\u77e5\u8bc6\uff0c\u8fd9\u964d\u4f4e\u4e86\u5b83\u4eec\u5728\u4e0d\u540c\u4ee3\u7801\u4efb\u52a1\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6027\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86CodexGraph\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\uff0c\u5b83\u5c06LLM\u4ee3\u7406\u4e0e\u4ece\u4ee3\u7801\u4ed3\u5e93\u63d0\u53d6\u7684\u56fe\u6570\u636e\u5e93\u63a5\u53e3\u96c6\u6210\u5728\u4e00\u8d77\u3002\u901a\u8fc7\u5229\u7528\u56fe\u6570\u636e\u5e93\u7684\u7ed3\u6784\u7279\u6027\u4ee5\u53ca\u56fe\u67e5\u8be2\u8bed\u8a00\u7684\u7075\u6d3b\u6027\uff0cCodexGraph\u4f7fLLM\u4ee3\u7406\u80fd\u591f\u6784\u5efa\u5e76\u6267\u884c\u67e5\u8be2\uff0c\u4ece\u800c\u5b9e\u73b0\u7cbe\u786e\u3001\u4ee3\u7801\u7ed3\u67
84\u610f\u8bc6\u7684\u4e0a\u4e0b\u6587\u68c0\u7d22\u548c\u4ee3\u7801\u5bfc\u822a\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30CodexGraph\uff1aCrossCodeEval\u3001SWE-bench\u548cEvoCodeBench\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e94\u4e2a\u771f\u5b9e\u4e16\u754c\u7684\u7f16\u7801\u5e94\u7528\u3002\u51ed\u501f\u7edf\u4e00\u7684\u56fe\u6570\u636e\u5e93\u6a21\u5f0f\uff0cCodexGraph\u5728\u5b66\u672f\u548c\u5b9e\u9645\u73af\u5883\u4e2d\u90fd\u5c55\u793a\u4e86\u7ade\u4e89\u529b\u548c\u6f5c\u529b\uff0c\u4f53\u73b0\u4e86\u5176\u5728\u8f6f\u4ef6\u5de5\u7a0b\u9886\u57df\u7684\u591a\u529f\u80fd\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u5e94\u7528\u6f14\u793a\uff1ahttps://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent\u3002|\n", "2408.03907": "|**2024-08-07**|**Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models**|Shachi H Kumar et.al.|[2408.03907](http://arxiv.org/abs/2408.03907)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7406\u89e3\u8bed\u8a00\u548c\u751f\u6210\u4e0e\u4eba\u7c7b\u6c34\u5e73\u76f8\u5f53\u7684\u6587\u672c\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5373\u4f7f\u7ecf\u8fc7\u76d1\u7763\u8bad\u7ec3\u548c\u4eba\u7c7b\u5bf9\u9f50\uff0c\u8fd9\u4e9bLLM\u4ecd\u5bb9\u6613\u53d7\u5230\u6076\u610f\u7528\u6237\u7684\u653b\u51fb\uff0c\u540e\u8005\u53ef\u4ee5\u901a\u8fc7\u63d0\u793a\u6a21\u578b\u751f\u6210\u4e0d\u5e0c\u671b\u770b\u5230\u7684\u6587\u672c\u3002\u6b64\u5916\uff0cLLM\u5185\u5d4c\u6709\u6f5c\u5728\u504f\u89c1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u4e92\u52a8\u4e2d\u7684\u5404\u79cd\u6709\u5bb3\u5f71\u54cd\u3002\u5f53\u524d\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\u7f3a\u4e4f\u6807\u51c6\u548c\u5171\u8bc6\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u4eba\u5de5\u751f\u6210\u7684\u6a21\u677f\u548c\u6ce8\u91ca\uff0c\u8fd9\u65e2\u6602\u8d35\u53c8\u8d39\u65f6\u3002 
\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u521b\u5efa\u5bf9\u6297\u6027\u63d0\u793a\u6765\u6fc0\u53d1\u76ee\u6807LLM\u751f\u6210\u5e26\u6709\u504f\u89c1\u7684\u54cd\u5e94\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5206\u6790\u4e86\u591a\u79cd\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u548c\u6307\u6807\u3002\u6211\u4eec\u6df1\u5165\u63a2\u8ba8\u4e86\u6a21\u578b\u54cd\u5e94\u7684\u5404\u79cd\u7ec6\u5fae\u5dee\u522b\uff0c\u8bc6\u522b\u4e86\u4e0d\u540c\u6a21\u578b\u5bb6\u65cf\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u5e76\u8bc4\u4f30\u4e86\u8bc4\u4f30\u65b9\u6cd5\u7684\u4e0d\u8db3\u4e4b\u5904\u3002\u6211\u4eec\u5c06\u8fd9\u4e9b\u6307\u6807\u4e0e\u4eba\u5de5\u8bc4\u4f30\u8fdb\u884c\u6bd4\u8f83\uff0c\u5e76\u9a8c\u8bc1\u4e86\u201cLLM\u4f5c\u4e3a\u6cd5\u5b98\u201d\u7684\u6307\u6807\u4e0e\u751f\u6210\u504f\u89c1\u5224\u65ad\u7684\u4eba\u7c7b\u8bc4\u4ef7\u4e00\u81f4\u3002|\n", "2408.03876": "|**2024-08-07**|**From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems**|Leixian Shen 
et.al.|[2408.03876](http://arxiv.org/abs/2408.03876)|null|\u521b\u5efa\u4ece\u539f\u59cb\u6570\u636e\u751f\u6210\u6570\u636e\u6545\u4e8b\u7684\u8fc7\u7a0b\u6781\u5177\u6311\u6218\u6027\uff0c\u8fd9\u4e3b\u8981\u6e90\u4e8e\u4eba\u7c7b\u6709\u9650\u7684\u6ce8\u610f\u529b\u548c\u5bf9\u7279\u5b9a\u6280\u80fd\u7684\u9700\u6c42\u3002\u8fd1\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u4e3a\u6784\u5efa\u5229\u7528\u72ec\u7acb\u4ee3\u7406\u5b9e\u73b0\u5de5\u4f5c\u6d41\u7a0b\u81ea\u52a8\u5316\u4ee5\u7b80\u5316\u6570\u636e\u6545\u4e8b\u521b\u4f5c\u6d41\u7a0b\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5de8\u5927\u673a\u9047\u3002\u5c3d\u7ba1\u591a\u4ee3\u7406\u7cfb\u7edf\u80fd\u591f\u5145\u5206\u6316\u6398LLM\u6f5c\u529b\u5e76\u5206\u89e3\u4efb\u52a1\u4f9b\u4e2a\u4f53\u4ee3\u7406\u6267\u884c\u5177\u6709\u8bf8\u591a\u4f18\u52bf\uff0c\u4f46\u5728\u8bbe\u8ba1\u8fd9\u4e9b\u7cfb\u7edf\u65f6\uff0c\u4e5f\u9762\u4e34\u7740\u4efb\u52a1\u5206\u89e3\u3001\u5b50\u4efb\u52a1\u6027\u80fd\u4f18\u5316\u4ee5\u53ca\u5de5\u4f5c\u6d41\u7a0b\u8bbe\u8ba1\u7b49\u65b9\u9762\u7684\u6311\u6218\u3002\u4e3a\u4e86\u66f4\u6df1\u5165\u5730\u7406\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Data Director\u2014\u2014\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u81ea\u52a8\u5316\u751f\u6210\u52a8\u753b\u6570\u636e\u89c6\u9891\uff0c\u8fd9\u4e00\u7c7b\u6570\u636e\u6545\u4e8b\u7684\u5178\u578b\u5f62\u5f0f\u3002Data Director\u901a\u8fc7\u89e3\u6790\u539f\u59cb\u6570\u636e\u3001\u62c6\u5206\u4efb\u52a1\u3001\u8bbe\u8ba1\u4ee3\u7406\u89d2\u8272\u4ee5\u8fdb\u884c\u81ea\u52a8\u51b3\u7b56\uff0c\u5e76\u65e0\u7f1d\u6574\u5408\u6570\u636e\u89c6\u9891\u4e2d\u7684\u5404\u79cd\u7ec4\u4ef6\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86Data 
Director\u5728\u751f\u6210\u6570\u636e\u89c6\u9891\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u5728\u6574\u4e2a\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ece\u89e3\u51b3\u9762\u4e34\u7684\u6311\u6218\u4e2d\u63d0\u70bc\u51fa\u4e86\u7ecf\u9a8c\u6559\u8bad\uff0c\u8fd9\u4e9b\u7ecf\u9a8c\u5bf9\u4e8e\u6307\u5bfc\u672a\u6765\u5728\u6570\u636e\u6545\u4e8b\u53d9\u8ff0\u9886\u57df\u81ea\u4e3b\u4ee3\u7406\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4e5f\u63ed\u793a\u4e86\u5168\u7403\u4f18\u5316\u3001\u4eba\u673a\u4ea4\u4e92\u8bbe\u8ba1\u4ee5\u53ca\u9ad8\u7ea7\u591a\u6a21\u6001LLM\u5e94\u7528\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.03865": "|**2024-08-07**|**PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training**|Haoran Xu et.al.|[2408.03865](http://arxiv.org/abs/2408.03865)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u4f20\u7edf\u7684Transformer\u6a21\u578b\u5728\u5904\u7406\u957f\u5e8f\u5217\u65f6\u53d8\u5f97\u8ba1\u7b97\u5bc6\u96c6\u578b\uff0c\u56e0\u4e3a\u5176\u8ba1\u7b97\u91cf\u968f\u5e8f\u5217\u957f\u5ea6\u7684\u5e73\u65b9\u589e\u957f\u3002Mamba\u4f5c\u4e3a\u751f\u6210AI\u9886\u57df\u7684\u4e00\u9879\u7a81\u7834\u6027\u67b6\u6784\uff0c\u5c55\u73b0\u51fa\u5728\u51cf\u5c11\u8ba1\u7b97\u548c\u5185\u5b58\u590d\u6742\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u9ad8\u6548\u5904\u7406\u957f\u5e8f\u5217\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684Mamba\u8bad\u7ec3\u6846\u67b6\u5728\u5904\u7406\u53d8\u957f\u5e8f\u5217\u8f93\u5165\u65f6\u5b58\u5728\u6548\u7387\u95ee\u9898\u3002\u5355\u5e8f\u5217\u8bad\u7ec3\u4f1a\u5bfc\u81f4GPU\u5229\u7528\u7387\u4f4e\u4e0b\uff0c\u800c\u5bf9\u53d8\u957f\u5e8f\u5217\u8fdb\u884c\u6279\u91cf\u5904\u7406\u5230\u6700\u5927\u957f\u5ea6\u5219\u4f1a\u5e26\u6765\u663e\u8457\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5206\u6790\u4e86Mamba\u67b6\u6784\u4e2d\u74f6\u9888\u64cd\u4f5c\u5668\u5728\u4e0d\u540c\u5f20\u91cf\u5f62\u72b6\u4e0b\u7684\u6027\u80fd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPackMamba\u7684\u9ad8\u541e\u5410\u91cfMamba\uff0c\u5b83\u80fd\u591f\u6709\u6548\u5730\u5904\u7406\u53d8\u957f\u5e8f\u5217\u3002\u6df1\u5165\u7814\u7a76\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSMs\uff09\uff0c\u6211\u4eec\u4fee\u6539\u4e86\u5e76\u884c\u64cd\u4f5c\u5668\uff0c\u4ee5\u907f\u514d\u5728\u5404\u4e2a\u5e8f\u5217\u4e4b\u95f4\u4f20\u9012\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6027\u80fd\u3002\u5728NVIDIA A100 GPU\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPackMamba\u5728\u5904\u74061.4B\u6a21\u578b\u65f6\u6bd4\u57fa\u7ebf\u5355\u5e8f\u5217\u5904\u7406\u65b9\u6848\u63d0\u9ad8\u4e863.06\u500d\u7684\u901f\u5ea6\uff0c\u5728\u5904\u74062.8B\u6a21\u578b\u65f6\u63d0\u9ad8\u4e862.62\u500d\u7684\u901f\u5ea6\u3002|\n", "2408.03847": "|**2024-08-07**|**GAIA -- A Large Language Model for Advanced Power Dispatch**|Yuheng Cheng 
et.al.|[2408.03847](http://arxiv.org/abs/2408.03847)|null|\u7535\u529b\u8c03\u5ea6\u5bf9\u4e8e\u63d0\u4f9b\u7a33\u5b9a\u3001\u7ecf\u6d4e\u4e14\u73af\u4fdd\u7684\u7535\u529b\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u7535\u529b\u7cfb\u7edf\u89c4\u6a21\u548c\u590d\u6742\u6027\u7684\u589e\u957f\uff0c\u4f20\u7edf\u7684\u8c03\u5ea6\u65b9\u6cd5\u5728\u591a\u4efb\u52a1\u5904\u7406\u3001\u5feb\u901f\u95ee\u9898\u89e3\u51b3\u4ee5\u53ca\u4eba\u673a\u534f\u4f5c\u65b9\u9762\u9047\u5230\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4e13\u4e3a\u7535\u529b\u8c03\u5ea6\u4efb\u52a1\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u2014\u2014GAIA\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u96c6\u6784\u5efa\u6280\u672f\uff0c\u5229\u7528\u591a\u79cd\u6570\u636e\u6e90\u5bf9GAIA\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u5176\u5728\u8be5\u9886\u57df\u7684\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u7b80\u5316\u4e86LLM\u7684\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u4f7f\u5f97\u5728\u7535\u529b\u7cfb\u7edf\u7ba1\u7406\u4e2d\u80fd\u591f\u65e0\u7f1d\u6574\u5408\u591a\u7ef4\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bbe\u8ba1\u4e86\u4e13\u95e8\u7684\u63d0\u793a\u7b56\u7565\u6765\u63d0\u9ad8GAIA\u5728\u8c03\u5ea6\u573a\u666f\u4e0b\u7684\u8f93\u5165\u8f93\u51fa\u6548\u7387\u3002\u5728ElecBench\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cGAIA\u5728\u591a\u4e2a\u6307\u6807\u4e0a\u8d85\u8d8a\u4e86\u57fa\u7840\u6a21\u578bLLaMA2\u3002\u5b9e\u9645\u5e94\u7528\u8868\u660e\uff0cGAIA\u80fd\u591f\u589e\u5f3a\u51b3\u7b56\u8fc7\u7a0b\u3001\u63d0\u9ad8\u8fd0\u8425\u6548\u7387\uff0c\u5e76\u4fc3\u8fdb\u7535\u529b\u8c03\u5ea6\u64cd\u4f5c\u4e2d\u7684\u4eba\u673a\u4ea4\u4e92\u3002\u672c\u6587\u6269\u5c55\u4e86LLM\u5728\u7535\u529b\u8c03\u5ea6\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u9a8c\u8bc1\u4e86\u5176\u5b9e\u7528\u6027\uff0c\u4e3a\u8fd9\u4e00\u9886\u57df\u672a\u6765\u7684\u521b\u65b0\u5f00\u8f9f\u4e86\u9053\u8def\u3002|\
n", "2408.03841": "|**2024-08-07**|**MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models**|Yuchen Dong et.al.|[2408.03841](http://arxiv.org/abs/2408.03841)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u52a8\u5316\u8f6f\u4ef6\u64cd\u4f5c\u548c\u5de5\u5177\u751f\u6210\uff08SOTG\uff09\u9886\u57df\u7684\u5e94\u7528\uff0c\u4ee5\u6b64\u6765\u63d0\u5347\u8f6f\u4ef6\u751f\u4ea7\u529b\u3002\u8fd9\u4e00\u8fc7\u7a0b\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u6587\u660e\u65e9\u671f\u901a\u8fc7\u521b\u9020\u5e76\u4f7f\u7528\u5de5\u5177\u52a0\u901f\u53d1\u5c55\u7684\u9636\u6bb5\u3002\u8fd9\u4e9b\u590d\u6742\u4efb\u52a1\u8981\u6c42AI\u80fd\u591f\u6301\u7eed\u603b\u7ed3\u5e76\u6539\u8fdb\u3002\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u5ffd\u89c6\u4e86\u5c06\u5b9e\u65f6\u4efb\u52a1\u7ecf\u9a8c\u8f6c\u5316\u4e3a\u7cfb\u7edf\u8bb0\u5fc6\u4ee5\u53ca\u533a\u5206\u73b0\u6709\u77e5\u8bc6\u672a\u6765\u4ef7\u503c\u7684\u91cd\u8981\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u201cMemory-Loop\u7f51\u7edc\u201d\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4ee5\u5b9e\u73b0\u53ca\u65f6\u7684\u8bb0\u5fc6\u5b58\u50a8\u4e0e\u7ecf\u9a8c\u5f15\u7528\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u57fa\u4e8e\u77e5\u8bc6\u7cbe\u786e\u5206\u6bb5\u7684RAG\u673a\u5236\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4ee5\u4fbf\u6839\u636e\u4ef7\u503c\u5dee\u5f02\u5229\u7528\u8bb0\u5fc6\u3002\u9488\u5bf9SOTG\u8bbe\u8ba1\u4e86MaxMind\u6a21\u578b\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86MaxMind4Sheet\uff0c\u4e00\u4e2a\u9075\u5faaMaxMind\u7406\u5ff5\u7684\u7535\u5b50\u8868\u683c\u5904\u7406\u7cfb\u7edf\u3002\u4e0eSheetCopilot\u7684\u6bd4\u8f83\u5b9e\u9a8c\u663e\u793a\uff0c\u4efb\u52a1\u8bb0\u5fc6\u7684\u79ef\u7d2f\u548c\u5faa\u73af\u80fd\u591f\u7a33\u6b65\u63d0\u9ad8\u4efb\u52a1\u6210\u529f\u7387\uff0c\u5728\u6b64\u793a\u4f8b\u5b9e\u65bd\u4e2d\uff0c\u6bcf\u8f6e\u7684\u6210\u529f\u7387\u63d0\u5347\u7ea6\u4e3a3%-6%\u3002\u968f\u7740\u8bb0\u5fc6\u7684\u6301\u7eed\u589e\u957f\uff0c\u8fd9\u79cd\u7d2f\u79ef\u6539\u8fdb\u53ef\u80fd\u4f1a\u975e\u5e38\u663e\u8457\u3002 \u5f15\u5165\u8bb0\u5fc6\u5faa\u73af\u8fd8\u53ef\u4ee5\u901a\u8fc7\u9ad8\u8fbe25%\u7684\u6548\u7387\u63d0\u5347\u589e\u52a0\u7cfb\u7edf\u7684\u4efb\u52a1\u6267\u884c\u6548\u7387\uff0c\u5e76\u901a\u8fc7\u8bb0\u5fc6\u8f6c\u79fb\u89e3\u51b3LLM\u5728\u5904\u7406\u4e13\u4e1a\u4efb\u52a1\u65f6\u9762\u4e34\u7684\u518d\u8bad\u7ec3\u95ee\u9898\u3002\u8fd9\u8868\u660eMaxMind\u6709\u6f5c\u529b\u663e\u8457\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728SOTG\u9886\u57df\u7684\u529f\u80fd\u548c\u751f\u4ea7\u529b\u3002|\n", "2408.03837": "|**2024-08-07**|**WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models**|Prannaya Gupta 
et.al.|[2408.03837](http://arxiv.org/abs/2408.03837)|**[link](https://github.com/walledai/walledeval)**|WalledEval is a comprehensive AI safety testing toolkit designed to evaluate large language models (LLMs). It is compatible with a wide range of models, both open-source and API-based, and includes more than 35 safety benchmarks covering areas such as multilingual safety, exaggerated safety, and prompt injection. The framework supports benchmarking both LLMs and judges, and integrates custom mutators for testing safety under text-style variations such as future tense and paraphrasing. In addition, WalledEval introduces WalledGuard, a new small and efficient content-moderation tool, and SGXSTest, which is used to assess exaggerated safety in cultural contexts. WalledEval is publicly available at https://github.com/walledai/walledeval.|\n", "2408.03834": "|**2024-08-07**|**Target Prompting for Information Extraction with Vision Language Model**|Dipankar Medhi 
et.al.|[2408.03834](http://arxiv.org/abs/2408.03834)|null|Recent developments in large vision-language models (VLMs) have transformed how information extraction systems are built. These models are state of the art at understanding documents and at building question-answering systems across industries, markedly improving the ability to generate text from document images and to provide precise answers. However, challenges remain when using them to build accurate conversational systems. Generic prompting techniques developed for large language models are often unsuitable for these specially designed vision-language models. Outputs produced from such generic input prompts tend to be ordinary and may leave information gaps relative to the actual document content. To obtain more accurate and specific answers, a vision-language model needs to be prompted on specific parts of the document image and to generate answers only from those regions. This paper discusses a technique called Target Prompting, which explicitly targets parts of a document image and generates relevant answers only from those specific regions. It also evaluates the responses to each prompting technique using different user queries and input prompts.|\n", "2408.04614": "|**2024-08-08**|**Better Alignment with Instruction Back-and-Forth Translation**|Thao Nguyen et.al.|[2408.04614](http://arxiv.org/abs/2408.04614)|null|We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to further improve their quality based on the initial documents. Fine-tuning on the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4, and Self-instruct. 
We also show that rewriting the responses with an LLM outperforms direct distillation, and that the two generated text distributions differ significantly. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall, we find that instruction back-and-forth translation combines the diversity and quantity of information on the web with the response quality that effective alignment requires.|\n", "2408.04594": "|**2024-08-09**|**Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models**|Qirui Jiao 
et.al.|[2408.04594](http://arxiv.org/abs/2408.04594)|**[link](https://github.com/modelscope/data-juicer)**|**This paper presents Img-Diff, a new dataset designed to improve the fine-grained image recognition of multimodal large language models through contrastive learning and image difference captioning. By analyzing object differences between similar images, the approach requires models to identify both what matches and what differs. Stable-Diffusion-XL and advanced image editing techniques are used to generate pairs of similar images that highlight object replacements. The data generation pipeline includes a Difference Area Generator that identifies object differences, followed by a Difference Captions Generator that provides detailed descriptions of those differences. The result is a small but high-quality set of 'object replacement' samples. Fine-tuning state-of-the-art multimodal large language models such as MGM-7B on this dataset markedly improves their scores on image difference and visual question answering tasks, and the resulting models surpass state-of-the-art models trained on large-scale datasets, such as GPT-4V and Gemini, on the MMVP benchmark. The paper also explores an alternative way to generate image difference data through 'object removal' and conducts a thorough evaluation to confirm the dataset's diversity and quality, offering insights into the synthesis of such contrastive datasets. 
To encourage further research and advance multimodal data synthesis and the fundamental capabilities of large language models, the code and dataset are publicly released at https://github.com/modelscope/data-juicer/tree/ImgDiff.**|\n", "2408.04585": "|**2024-08-08**|**Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness**|Xiaojing Fan et.al.|[2408.04585](http://arxiv.org/abs/2408.04585)|null|As demand for practical applications of large language models (LLMs) grows, many efficiency-focused models have been developed to balance performance against computational cost. However, the adversarial robustness of these models remains under-explored. This study designs a framework to investigate the trade-off among efficiency, performance, and adversarial robustness by comparing three prominent models with different levels of complexity and efficiency: Transformer++, the Gated Linear Attention (GLA) Transformer, and MatMul-Free LM. The comparison uses the GLUE and AdvGLUE datasets; AdvGLUE extends GLUE with adversarial samples designed to challenge model robustness. 
Our results show that, while their accuracy on GLUE tasks is slightly lower, the GLA Transformer and MatMul-Free LM show higher efficiency on AdvGLUE tasks, and their robustness is either better than or comparable to that of Transformer++ across attack levels. These findings highlight the potential of simplified architectures to strike a good balance among efficiency, performance, and adversarial robustness, offering useful insights for resource-constrained environments and for applications that need strong resistance to adversarial attacks.|\n", "2408.04575": "|**2024-08-08**|**SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals**|Haoran Zheng 
et.al.|[2408.04575](http://arxiv.org/abs/2408.04575)|null|Explainable artificial intelligence (XAI) is essential for improving the transparency and accountability of AI models, especially in natural language processing (NLP) tasks. This paper introduces SCENE (Soft Counterfactual Evaluation for Natural language Explainability), a new method that uses large language models (LLMs) to generate soft counterfactual explanations in a zero-shot manner. By focusing on token-based substitutions, SCENE creates contextually appropriate and semantically meaningful soft counterfactuals without extensive fine-tuning. SCENE adopts the Validitysoft and Csoft metrics to evaluate the effectiveness of various model-agnostic XAI methods on text classification tasks. Applied to CNN, RNN, and BERT architectures, SCENE provides useful insights into the strengths and limitations of various XAI techniques.|\n", "2408.04568": "|**2024-08-08**|**Learning Fine-Grained Grounded Citations for Attributed Large Language Models**|Lei Huang 
et.al.|[2408.04568](http://arxiv.org/abs/2408.04568)|**[link](https://github.com/luckyyysta/fine-grained-attribution)**|**Although large language models (LLMs) perform impressively on information-seeking tasks, they still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, have shown potential for reducing hallucinations and improving verifiability. However, current approaches produce suboptimal citation quality, mainly because they rely on in-context learning. Moreover, citing only coarse document identifiers makes fine-grained verification difficult for users. To address this, we propose the FRONT framework, which teaches LLMs to generate fine-grained grounded citations. By grounding generated responses in fine-grained supporting quotes, these quotes guide generation, which not only improves citation quality but also facilitates fine-grained verification. Experiments on the ALCE benchmark show that FRONT is effective at producing well-grounded responses and highly supportive citations. With LLaMA-2-7B, the framework significantly outperforms all baselines, improving citation quality by 14.21% on average and surpassing ChatGPT.**|\n", "2408.04556": "|**2024-08-08**|**Bias-Aware Low-Rank Adaptation: Mitigating 
Catastrophic Inheritance of Large Language Models**|Yupeng Chang et.al.|[2408.04556](http://arxiv.org/abs/2408.04556)|**[link](https://github.com/cyp-jlu-ai/ba-lora)**|**Large language models (LLMs) show remarkable capabilities across a variety of natural language processing (NLP) tasks. However, adapting them to downstream applications usually requires fine-tuning that is both compute-intensive and memory-hungry. Parameter-efficient fine-tuning (PEFT) techniques have emerged as a promising way to customize LLMs at minimal computational cost. Although PEFT methods offer substantial advantages, they do not fully solve the problem of bias inherited from pre-training data. To this end, we propose Bias-Aware Low-Rank Adaptation (BA-LoRA), a new PEFT method designed to counteract bias inheritance. 
BA-LoRA incorporates three distinct regularization terms: a consistency regularizer, a diversity regularizer, and a singular value decomposition regularizer. Together, these regularizers aim to improve the consistency, diversity, and generalization of generative models during fine-tuning. Through extensive experiments on a range of natural language understanding (NLU) and natural language generation (NLG) tasks with prominent LLMs such as LLaMA, Mistral, and Gemma, we show that BA-LoRA outperforms LoRA and its state-of-the-art variants. Our method also effectively mitigates the harmful effects of pre-training bias, leading to more reliable and robust model outputs. The code is available at https://github.com/cyp-jlu-ai/BA-LoRA.**|\n", "2408.04522": "|**2024-08-08**|**Compromesso! 
Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models**|Fabio Pernisi et.al.|[2408.04522](http://arxiv.org/abs/2408.04522)|null|As diverse linguistic communities and users adopt large language models (LLMs), assessing their safety across languages becomes critical. Despite ongoing efforts to make LLMs safe, they can still be made to behave unsafely through jailbreaking, a technique that prompts models to act outside their operational guidelines. Research on LLM safety and jailbreaking has so far focused mainly on English, limiting our understanding of LLM safety in other languages. We help close this gap by investigating the effectiveness of many-shot jailbreaking, in which models are prompted with unsafe demonstrations to induce unsafe behavior, in Italian. To support our analysis, we create a new dataset of unsafe Italian question-answer pairs. With this dataset, we identify clear safety vulnerabilities in four families of open-weight LLMs. We find that the models behave unsafely even when prompted with only a few unsafe demonstrations and, 
more alarmingly, that this tendency escalates rapidly as more demonstrations are added.|\n", "2408.04477": "|**2024-08-08**|**What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant**|Jonan Richards et.al.|[2408.04477](http://arxiv.org/abs/2408.04477)|null|A growing number of tools use large language models (LLMs) to help developers understand code, yet developers still face barriers when using them, including the challenge of describing their intent in natural language, the difficulty of interpreting tool output, and the effort of refining prompts to obtain useful information. We therefore designed an LLM-based conversational assistant that personalizes its interaction based on the user's inferred mental state, such as background knowledge and experience. In a within-subjects study with fourteen novices, we captured their perceptions and preferences. The results offer insights for researchers and tool builders who want to create or improve LLM-based conversational assistants that support novices in code understanding.|\n", "2408.04472": "|**2024-08-08**|**Can LLMs Beat Humans in Debating? 
A Dynamic Multi-agent Framework for Competitive Debate**|Yiqun Zhang et.al.|[2408.04472](http://arxiv.org/abs/2408.04472)|**[link](https://github.com/zhangyiqun018/agent-for-debate)**|**Competitive debate is a comprehensive and complex computational argumentation task in which large language models (LLMs) suffer from hallucinations and lack competitiveness. To address these challenges, we introduce Agent for Debate (Agent4Debate), a dynamic multi-agent framework built on LLMs and designed to enhance their capabilities in competitive debate. Inspired by how humans prepare for and conduct debates, the framework uses a collaborative architecture in which four specialized agents (Searcher, Analyzer, Writer, and Reviewer) interact and cooperate dynamically. Together these agents cover every stage of the debate, from initial research and argument formulation to rebuttal and summary. 
To evaluate the framework comprehensively, we construct the Chinese Debate Arena, which comprises 66 carefully selected Chinese debate motions. We recruit ten experienced human debaters and collect records of 200 debates involving Agent4Debate, baseline models, and humans. The evaluation uses the Debatrix automatic scoring system together with professional human reviewers, with rankings based on Debatrix-Elo and Human-Elo. Experimental results show that the state-of-the-art Agent4Debate performs on par with humans. Further ablation studies demonstrate the effectiveness of each component in the agent structure.**|\n", "2408.04449": "|**2024-08-08**|**RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents**|Zihao Zhu 
et.al.|[2408.04449](http://arxiv.org/abs/2408.04449)|null|This paper proposes RiskAwareBench, an automated framework for assessing the physical risk awareness of embodied agents based on large language models (LLMs). The framework consists of four modules: safety tips generation, risky scene generation, plan generation, and evaluation, enabling comprehensive risk assessment with minimal manual intervention. Using this framework, the authors build the PhysicalRisk dataset, which covers a variety of scenarios with associated safety tips, observations, and instructions. Experiments show that most LLMs have insufficient physical risk awareness and that baseline risk mitigation strategies yield only limited improvement. This underscores the urgency and importance of improving the physical risk awareness of LLM-based embodied agents in the future.|\n", "2408.05212": "|**2024-08-10**|**Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions**|Michele Miranda 
et.al.|[2408.05212](http://arxiv.org/abs/2408.05212)|**[link](https://github.com/michele17284/awesome-privacy-preserving-llms)**|Large language models (LLMs) represent a major advance in artificial intelligence and have found applications across many domains. However, they are trained on massive internet-sourced datasets, which raises significant privacy issues that are heightened in critical domains such as healthcare. Moreover, certain application scenarios may require fine-tuning these models on private data. This paper critically examines the privacy threats associated with LLMs, emphasizing the risk that these models memorize and inadvertently reveal sensitive information. 
We explore current threats by reviewing privacy attacks on LLMs and propose comprehensive solutions for integrating privacy mechanisms throughout the entire learning pipeline. These solutions range from anonymizing training datasets, to applying differential privacy during training or inference, to machine unlearning after training. Our review of the literature highlights ongoing challenges, available tools, and future directions for preserving privacy in LLMs. This work aims to guide the development of safer and more trustworthy AI systems by providing a thorough understanding of privacy-preservation approaches and their effectiveness in mitigating risks.|\n", "2408.05211": "|**2024-08-09**|**VITA: Towards Open-Source Interactive Omni Multimodal LLM**|Chaoyou Fu et.al.|[2408.05211](http://arxiv.org/abs/2408.05211)|**[link](https://github.com/VITA-MLLM/VITA)**|In this paper, we introduce VITA, the first open-source multimodal large language model that can simultaneously process and analyze video, image, text, and audio modalities while offering an advanced multimodal interactive experience. Starting from Mixtral 
8x7B as the language foundation, we expand its Chinese vocabulary and further strengthen the model through bilingual instruction tuning. We also endow the language model with visual and audio capabilities through two-stage multi-task learning. VITA demonstrates strong foundational capabilities in multilingual, vision, and audio understanding, and performs well on a range of unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in improving the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interruption in a multimodal large language model. VITA is a first step for the open-source community toward the seamless integration of multimodal understanding and interaction. Although VITA still lags well behind proprietary models, we believe its pioneering role can make it a cornerstone for subsequent research. Project page: https://vita-home.github.io|\n", "2408.05204": "|**2024-08-09**|**Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners**|Michael Vaccaro Jr 
et.al.|[2408.05204](http://arxiv.org/abs/2408.05204)|null|Large language models (LLMs), especially OpenAI's GPT series, have recently made significant advances across many domains. Known for their expertise across diverse subject areas and their quick adaptability to user prompts, these models show unique potential as personalized learning (PL) tools. Their application in K-12 education, however, is still at an exploratory stage. This paper presents a first-of-its-kind study that uses a randomized controlled trial (n = 23) to evaluate the effectiveness of GPT-4 in personalizing science texts for middle school students. In the study, GPT-4 was used to analyze and predict students' learning preferences based on the choices they made during a training session. For students in the experimental group, GPT-4 rewrote science texts to match their predicted preferences; for students in the control group, the texts were rewritten to contradict their learning preferences. A Mann-Whitney U test showed that students significantly preferred the rewritten texts when they matched their preferences (significant at the 0.10 level, p = 0.059). 
These findings suggest that GPT-4 can effectively interpret and tailor educational content to diverse learner preferences, marking a notable advance in PL technology. The paper also discusses the limitations of the study and the ethical considerations of using artificial intelligence in education.|\n", "2408.05200": "|**2024-08-09**|**TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning**|Yujie Feng et.al.|[2408.05200](http://arxiv.org/abs/2408.05200)|**[link](https://github.com/WoodScene/TaSL)**|Language model continual learning (CL) has recently attracted considerable interest because of its potential to adapt large language models (LLMs) to dynamic real-world environments without retraining. A key challenge is catastrophic forgetting, where models lose previously acquired knowledge when learning new tasks. Existing methods commonly use multiple parameter-efficient fine-tuning (PEFT) blocks to acquire task-specific knowledge for each task, but these approaches are inefficient and overlook the potential for knowledge transfer through task interaction. 
\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u4efb\u52a1\u6280\u80fd\u5b9a\u4f4d\u4e0e\u6574\u5408\uff08TaSL\uff09\u7684\u65b0CL\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u4e0d\u4f9d\u8d56\u4e8e\u8bb0\u5fc6\u91cd\u64ad\u6765\u589e\u5f3a\u77e5\u8bc6\u4f20\u9012\u3002TaSL\u9996\u5148\u6839\u636e\u53c2\u6570\u4f9d\u8d56\u6027\u5c06\u6a21\u578b\u5206\u4e3a\u201c\u6280\u80fd\u5355\u5143\u201d\uff0c\u8fd9\u4f7f\u5f97\u5bf9\u6280\u80fd\u5355\u5143\u7684\u63a7\u5236\u66f4\u52a0\u7cbe\u7ec6\u3002\u7136\u540e\uff0c\u5b83\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7ec4\u7ea7\u6280\u80fd\u5b9a\u4f4d\u6280\u672f\uff0c\u4ee5\u8bc6\u522b\u65b0\u4efb\u52a1\u4e2d\u6280\u80fd\u5355\u5143\u7684\u91cd\u8981\u6027\u5206\u5e03\u3002\u901a\u8fc7\u6bd4\u8f83\u8fd9\u4e2a\u91cd\u8981\u6027\u5206\u5e03\u4e0e\u5176\u4ed6\u5148\u524d\u4efb\u52a1\u4e2d\u7684\u5206\u5e03\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u4e00\u4e2a\u7cbe\u7ec6\u7684\u6280\u80fd\u6574\u5408\u7b56\u7565\uff0c\u4fdd\u7559\u4e86\u7279\u5b9a\u4e8e\u4efb\u52a1\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u9632\u6b62\u9057\u5fd8\uff0c\u5e76\u66f4\u65b0\u4e86\u5171\u4eab\u4efb\u52a1\u77e5\u8bc6\uff0c\u8fd9\u4fc3\u8fdb\u4e86\u53cc\u5411\u77e5\u8bc6\u4f20\u9012\u3002\u56e0\u6b64\uff0cTaSL\u5b9e\u73b0\u4e86\u4fdd\u6301\u5148\u524d\u77e5\u8bc6\u548c\u5728\u65b0\u4efb\u52a1\u4e0a\u53d6\u5f97\u4f18\u5f02\u8868\u73b0\u4e4b\u95f4\u7684\u6700\u4f73\u5e73\u8861\u3002 
TaSL\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u901a\u7528\u6a21\u578b\uff0c\u5e76\u53ef\u4ee5\u6839\u636eLoRA\u7b49PEFT\u65b9\u6cd5\u8fdb\u884c\u5b9a\u5236\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u8868\u73b0\u51fa\u663e\u8457\u7684\u6269\u5c55\u6027\uff0c\u5141\u8bb8\u4e0e\u8bb0\u5fc6\u91cd\u64ad\u96c6\u6210\u4ee5\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\u3002\u5728\u4e24\u4e2aCL\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4f7f\u7528\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff08\u4ece2.2\u4ebf\u523070\u4ebf\u53c2\u6570\uff09\uff0c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8bc1\u660e\u4e86TaSL\u53ca\u5176\u53d8\u4f53\u5728\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u6709\u6548\u6027\u3002|\n", "2408.05149": "|**2024-08-09**|**AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset**|Pritam Deka et.al.|[2408.05149](http://arxiv.org/abs/2408.05149)|null|\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u653b\u51fb\u5f52\u56e0\u662f\u81f3\u5173\u91cd\u8981\u7684\u8fc7\u7a0b\uff0c\u5b83\u5141\u8bb8\u4e13\u5bb6\u5236\u5b9a\u9488\u5bf9\u653b\u51fb\u8005\u7684\u9632\u5fa1\u63aa\u65bd\u548c\u6cd5\u5f8b\u884c\u52a8\u3002\u76ee\u524d\uff0c\u5206\u6790\u4eba\u5458\u4e3b\u8981\u901a\u8fc7\u624b\u52a8\u64cd\u4f5c\u6765\u8fdb\u884c\u5f52\u56e0\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u4efb\u52a1\u7684\u590d\u6742\u6027\u3002\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u53ef\u4ee5\u88ab\u7528\u6765\u8f85\u52a9\u7f51\u7edc\u5b89\u5168\u5206\u6790\u5e08\u5728\u5f52\u56e0\u8fc7\u7a0b\u4e2d\u8fdb\u884c\u5de5\u4f5c\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6280\u672f\u975e\u5e38\u5f3a\u5927\uff0c\u4f46\u5728\u7f3a\u4e4f\u653b\u51fb\u5f52\u56e0\u9886\u57df\u7684\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u4eec\u9700\u8981\u5e94\u5bf9\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c06\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u5e76\u63d0\u4f9b\u5230
\u76ee\u524d\u4e3a\u6b62\u6211\u4eec\u6240\u77e5\u7684\u7b2c\u4e00\u4e2a\u653b\u51fb\u5f52\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u8bbe\u8ba1\u7684\u4e3b\u8981\u76ee\u6807\u662f\u4ece\u7f51\u7edc\u5b89\u5168\u6587\u672c\u4e2d\u63d0\u53d6\u653b\u51fb\u5f52\u56e0\u4fe1\u606f\uff0c\u5229\u7528NLP\u9886\u57df\u7684\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08NER\uff09\u65b9\u6cd5\u3002\u4e0e\u5176\u5b83\u7f51\u7edc\u5b89\u5168NER\u6570\u636e\u96c6\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u63d0\u4f9b\u4e86\u4e30\u5bcc\u4e14\u5305\u542b\u4e0a\u4e0b\u6587\u7ec6\u8282\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u4e00\u4e9b\u8de8\u77ed\u8bed\u548c\u53e5\u5b50\u7684\u6ce8\u91ca\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5e76\u5e94\u7528\u4e86NLP\u6280\u672f\u6765\u5c55\u793a\u6570\u636e\u96c6\u5728\u653b\u51fb\u5f52\u56e0\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u8fd9\u4e9b\u5b9e\u9a8c\u7a81\u663e\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5728\u6539\u8fdb\u7f51\u7edc\u5b89\u5168\u6570\u636e\u96c6\u4e2d\u7684NER\u4efb\u52a1\u4ee5\u63d0\u5347\u653b\u51fb\u5f52\u56e0\u80fd\u529b\u7684\u6f5c\u529b\u3002|\n", "2408.05141": "|**2024-08-09**|**A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning**|Ye Yuan 
et.al.|[2408.05141](http://arxiv.org/abs/2408.05141)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7efc\u5408\u4f18\u5316\u7684\u589e\u5f3a\u68c0\u7d22\u8f85\u52a9\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u5916\u90e8\u77e5\u8bc6\u5e93\u663e\u8457\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51c6\u786e\u6027\u548c\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u8fdb\u884c\u4e86\u591a\u9879\u6539\u8fdb\uff0c\u5305\u62ec\u5bf9\u7f51\u9875\u4e2d\u7684\u6587\u672c\u6bb5\u843d\u548c\u8868\u683c\u8fdb\u884c\u7ec6\u5316\u5904\u7406\u3001\u5f15\u5165\u5c5e\u6027\u9884\u6d4b\u5668\u4ee5\u51cf\u5c11\u5e7b\u89c9\u3001\u6784\u5efaLLM\u77e5\u8bc6\u62bd\u53d6\u5668\u548c\u77e5\u8bc6\u56fe\u8c31\u62bd\u53d6\u5668\uff0c\u5e76\u6700\u7ec8\u5efa\u7acb\u4e86\u4e00\u4e2a\u6574\u5408\u6240\u6709\u53c2\u8003\u4fe1\u606f\u7684\u63a8\u7406\u7b56\u7565\u3002\u6211\u4eec\u901a\u8fc7Meta CRAG KDD\u676f2024\u7ade\u8d5b\u4e2d\u7684CRAG\u6570\u636e\u96c6\u5bf9\u7cfb\u7edf\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u672c\u5730\u4e0e\u5728\u7ebf\u8bc4\u4f30\u5747\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u590d\u6742\u63a8\u7406\u80fd\u529b\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u63d0\u5347\u3002\u5728\u672c\u5730\u8bc4\u4f30\u4e2d\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u6a21\u578b\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u9519\u8bef\u7387\u4e5f\u6709\u6240\u4e0b\u964d\uff0c\u53d6\u5f97\u4e86\u8f83\u9ad8\u7684\u5206\u6570\u3002\u540c\u65f6\uff0c\u5728\u7ebf\u8bc4\u4f30\u7ed3\u679c\u540c\u6837\u8868\u73b0\u4f18\u5f02\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7cfb\u7edf\u7684\u6027\u80fd\u548c\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u7cfb\u7edf\u7684\u6e90\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8ehttps://gitlab.aicrowd.com/shizueyy/crag-new\u3002|\n", "2408.05128": "|**2024-08-09**|**Is ChatGPT a Good 
Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations**|Jasmine Latendresse et.al.|[2408.05128](http://arxiv.org/abs/2408.05128)|null|\u5728\u8f6f\u4ef6\u7cfb\u7edf\u529f\u80fd\u3001\u6548\u7387\u4e0e\u53ef\u7ef4\u62a4\u6027\u65b9\u9762\uff0c\u8f6f\u4ef6\u5e93\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\u3002\u968f\u7740\u5f00\u53d1\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b80\u5316\u7f16\u7801\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6a21\u578b\u63a8\u8350\u5408\u9002\u5e93\u7684\u6709\u6548\u6027\u4ecd\u5904\u4e8e\u63a2\u7d22\u9636\u6bb5\u3002\u672c\u6587\u8bc4\u4f30\u4e86ChatGPT\u4f5c\u4e3a\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc6\u522b\u4e86\u6539\u8fdb\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-3.5 Turbo\u751f\u6210\u9488\u5bf910000\u4e2aStack Overflow\u95ee\u9898\u7684Python\u4ee3\u7801\uff0c\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cChatGPT\u6bd4\u4eba\u7c7b\u5f00\u53d1\u8005\u66f4\u9891\u7e41\u5730\u4f7f\u7528\u7b2c\u4e09\u65b9\u5e93\uff0c\u503e\u5411\u4e8e\u5e7f\u6cdb\u91c7\u7528\u4e14\u5386\u53f2\u60a0\u4e45\u7684\u9009\u62e9\u3002\u7136\u800c\uff0c14.2%\u63a8\u8350\u7684\u5e93\u5177\u6709\u9650\u5236\u6027\u7684Copyleft\u8bb8\u53ef\uff0c\u8fd9\u5e76\u672a\u7531ChatGPT\u660e\u786e\u4f20\u8fbe\u3002\u6b64\u5916\uff0c\u67096.5%\u7684\u5e93\u65e0\u6cd5\u76f4\u63a5\u4f7f\u7528\uff0c\u53ef\u80fd\u5bfc\u81f4\u5f00\u53d1\u8005\u56f0\u60d1\u548c\u6d6a\u8d39\u65f6\u95f4\u3002\u5c3d\u7ba1ChatGPT\u53ef\u4ee5\u4f5c\u4e3a\u6709\u6548\u7684\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\uff0c\u4f46\u5e94\u63d0\u4f9b\u5173\u4e8e\u7ef4\u62a4\u6307\u6807\u548c\u8bb8\u53ef\u7684\u66f4\u591a\u660e\u786e\u4fe1\u606f\u3002\u6211\u4eec\u5efa\u8bae\u5f00\u53d1\u8005\u5b9e\u65bd\u4e25\u683c\u7684\u4f9d\u8d56\u7ba1\u7406\u5b9e\u8df5\uff0c\u5e76\u5728\u5c06LLM\u
751f\u6210\u7684\u4ee3\u7801\u96c6\u6210\u5230\u9879\u76ee\u4e2d\u4e4b\u524d\uff0c\u4ed4\u7ec6\u68c0\u67e5\u5e93\u7684\u8bb8\u53ef\u8bc1\u3002|\n", "2408.05126": "|**2024-08-09**|**Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media**|Petre Breazu et.al.|[2408.05126](http://arxiv.org/abs/2408.05126)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u6f14\u8fdb\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6587\u672c\u5206\u6790\u4e2d\u7684\u53d1\u5c55\u4e0e\u5e94\u7528\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5404\u79cdLLMs\u5728\u8fdb\u884c\u5b9a\u6027\u5206\u6790\u65f6\u5c55\u73b0\u51fa\u7684\u6f5c\u529b\u88ab\u5bc4\u4e88\u539a\u671b\uff0c\u4f46\u5b83\u4eec\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u4e2d\u7684\u5e94\u7528\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u8ba8\u3002\u672c\u6587\u901a\u8fc7\u4e00\u9879\u4ee5GPT-4\u4e3a\u6838\u5fc3\u7684\u7814\u7a76\u5b9e\u9a8c\uff0c\u4e3aLLMs\u5728\u5b9a\u6027\u5206\u6790\u9886\u57df\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\u3002\u7814\u7a76\u57fa\u4e8e\u4e00\u4e2a\u6765\u81ea\u6b27\u76df\u8d44\u52a9\u9879\u76ee\u7684YouTube\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u805a\u7126\u4e8e2016\u5e74\u745e\u5178\u7f57\u9a6c\u5c3c\u4e9a\u79fb\u6c11\u7fa4\u4f53\u7684\u4ee3\u8868\u5f62\u8c61\uff0c\u8fd9\u4e00\u65f6\u671f\u6b63\u503c2015\u5e74\u96be\u6c11\u5371\u673a\u4e4b\u540e\uff0c\u7d27\u90bb2017\u5e74\u7684\u745e\u5178\u5168\u56fd\u9009\u4e3e\u3002\u6211\u4eec\u7684\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u5c06\u4eba\u7c7b\u667a\u6167\u4e0eAI\u7684\u89c4\u6a21\u548c\u6548\u7387\u76f8\u7ed3\u5408\u7684\u53ef\u80fd\u6027\uff0c\u901a\u8fc7\u5206\u6790LLMs\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u5e94\u7528\u4f18\u52a3\uff0c\u5e76\u8ba8\u8bba\u672a\u6765\u53ef\u80fd\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.05123": 
"|**2024-08-09**|**Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video**|Chunggi Lee et.al.|[2408.05123](http://arxiv.org/abs/2408.05123)|null|\u968f\u7740\u7bee\u7403\u8fd0\u52a8\u7684\u666e\u53ca\uff0c\u7c89\u4e1d\u4eec\u5e38\u5e38\u56e0\u6bd4\u8d5b\u8282\u594f\u5feb\u548c\u590d\u6742\u5ea6\u9ad8\u800c\u611f\u5230\u56f0\u60d1\u3002\u7bee\u7403\u6218\u672f\u6d89\u53ca\u4e00\u7cfb\u5217\u590d\u6742\u7684\u52a8\u4f5c\uff0c\u9700\u8981\u5927\u91cf\u7684\u77e5\u8bc6\u624d\u80fd\u5b8c\u5168\u7406\u89e3\u3002\u8fd9\u79cd\u590d\u6742\u6027\u5bfc\u81f4\u4e86\u5bf9\u989d\u5916\u4fe1\u606f\u548c\u89e3\u91ca\u7684\u9700\u6c42\uff0c\u8fd9\u53ef\u80fd\u4f1a\u5206\u6563\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u5173\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSportify\u7684\u89c6\u89c9\u95ee\u7b54\u7cfb\u7edf\uff0c\u5b83\u878d\u5408\u4e86\u53d9\u4e8b\u548c\u5d4c\u5165\u5f0f\u53ef\u89c6\u5316\uff0c\u65e8\u5728\u4e3a\u7403\u8ff7\u63d0\u4f9b\u7bee\u7403\u6218\u672f\u7591\u95ee\u7684\u6e05\u6670\u89e3\u7b54\uff0c\u5e2e\u52a9\u4ed6\u4eec\u7406\u89e3\u6bd4\u8d5b\u7684\u5404\u79cd\u65b9\u9762\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u79cd\u65b0\u578b\u7684\u52a8\u4f5c\u53ef\u89c6\u5316\uff08\u4f20\u7403\u3001\u5207\u5165\u548c\u63a9\u62a4\uff09\uff0c\u4ee5\u5c55\u793a\u5173\u952e\u52a8\u4f5c\u5e8f\u5217\u3002\u4e3a\u4e86\u89e3\u91ca\u7403\u5458\u884c\u52a8\u80cc\u540e\u7684\u539f\u56e0\u548c\u903b\u8f91\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u53d9\u4e8b\u6587\u672c\u3002\u6211\u4eec\u91c7\u7528\u6545\u4e8b\u8bb2\u8ff0\u7684\u65b9\u6cd5\u6765\u63cf\u8ff0\u590d\u6742\u573a\u666f\uff0c\u4ece\u7b2c\u4e00\u4eba\u79f0\u548c\u7b2c\u4e09\u4eba\u79f0\u7684\u89d2\u5ea6\u8fdb\u884c\u53d9\u8ff0\uff0c\u5e76\u878d\u5165\u52a8\u4f5c\u53ef\u89c6\u5316\u3002\u6211\u4eec\u901a\u8fc7\u4e0e\u7bee\u7403\u7c89\u4e1d\u7684\u8bc4\u4f30\u
ff0c\u63a2\u8ba8\u4e86Sportify\u5728\u6df1\u5316\u6218\u672f\u6d1e\u5bdf\u529b\u548c\u589e\u5f3a\u89c2\u8d5b\u4f53\u9a8c\u65b9\u9762\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u7b2c\u4e09\u4eba\u79f0\u53d9\u8ff0\u6709\u52a9\u4e8e\u4eba\u4eec\u83b7\u5f97\u6df1\u5165\u7684\u6bd4\u8d5b\u89e3\u91ca\uff0c\u800c\u7b2c\u4e00\u4eba\u79f0\u53d9\u8ff0\u5219\u589e\u5f3a\u4e86\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u53c2\u4e0e\u611f\u3002|\n", "2408.05109": "|**2024-08-09**|**A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?**|Xinyu Liu et.al.|[2408.05109](http://arxiv.org/abs/2408.05109)|**[link](https://github.com/hkustdial/nl2sql_handbook)**|\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u5230SQL\u67e5\u8be2\uff08\u5373NL2SQL\uff09\u7684\u7ffb\u8bd1\u53ef\u4ee5\u663e\u8457\u964d\u4f4e\u8bbf\u95ee\u5173\u7cfb\u6570\u636e\u5e93\u7684\u969c\u788d\uff0c\u5e76\u652f\u6301\u5404\u79cd\u5546\u4e1a\u5e94\u7528\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\uff0cNL2SQL\u7684\u6027\u80fd\u5f97\u5230\u4e86\u5927\u5e45\u63d0\u5347\u3002\u672c\u6587\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5168\u9762\u7684NL2SQL\u6280\u672f\u7efc\u8ff0\uff0c\u57fa\u4e8eLLMs\u9a71\u52a8\uff0c\u8986\u76d6\u4e86\u4ece\u56db\u4e2a\u65b9\u9762\u5bf9\u6574\u4e2a\u751f\u547d\u5468\u671f\u7684\u5168\u9762\u5ba1\u67e5\uff1a\uff081\uff09\u6a21\u578b\uff1a\u5904\u7406\u81ea\u7136\u8bed\u8a00\u7684\u6a21\u7cca\u6027\u548c\u4e0d\u5145\u5206\u6027\uff0c\u5e76\u6b63\u786e\u6620\u5c04\u81ea\u7136\u8bed\u8a00\u4e0e\u6570\u636e\u5e93\u6a21\u5f0f\u548c\u5b9e\u4f8b\uff1b\uff082\uff09\u6570\u636e\uff1a\u4ece\u6536\u96c6\u8bad\u7ec3\u6570\u636e\u3001\u5e94\u5bf9\u8bad\u7ec3\u6570\u636e\u7a00\u7f3a\u7684\u6570\u636e\u5408\u6210\uff0c\u5230NL2SQL\u57fa\u51c6\uff1b\uff083\uff09\u8bc4\u4f30\uff1a\u4ece\u591a\u4e2a\u89d2\u5ea6\u4f7f\u7528\u4e0d\u540c\u6307\u6807\u5bf9NL2SQL\u65b9\u6cd5\u8fdb\u884c\u8bc4\u4f30\uff1b\uff084\uff09\u9519\u8bef\u52
06\u6790\uff1a\u5206\u6790NL2SQL\u9519\u8bef\u4ee5\u627e\u5230\u6839\u672c\u539f\u56e0\uff0c\u5e76\u6307\u5bfcNL2SQL\u6a21\u578b\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5f00\u53d1NL2SQL\u89e3\u51b3\u65b9\u6848\u7684\u4e00\u6761\u7ecf\u9a8c\u6cd5\u5219\u3002\u6700\u540e\uff0c\u8ba8\u8bba\u4e86\u5728LLMs\u65f6\u4ee3NL2SQL\u7684\u7814\u7a76\u6311\u6218\u548c\u5f00\u653e\u95ee\u9898\u3002|\n", "2408.06332": "|**2024-08-12**|**Animate, or Inanimate, That is the Question for Large Language Models**|Leonardo Ranaldi et.al.|[2408.06332](http://arxiv.org/abs/2408.06332)|null|\u4eba\u7c7b\u7684\u8ba4\u77e5\u6838\u5fc3\u4e0e\u201c\u6709\u751f\u547d\u6027\u201d\u8fd9\u4e00\u6982\u5ff5\u7d27\u5bc6\u76f8\u8fde\uff0c\u5b83\u5728\u5851\u9020\u8bb0\u5fc6\u3001\u89c6\u89c9\u4ee5\u53ca\u591a\u5c42\u6b21\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u867d\u7136\u201c\u6709\u751f\u547d\u6027\u201d\u5728\u8bed\u8a00\u4e2d\u901a\u8fc7\u52a8\u8bcd\u548c\u5f62\u5bb9\u8bcd\u7684\u7ec6\u5fae\u7ea6\u675f\u4f53\u73b0\u51fa\u6765\uff0c\u4f46\u5176\u5b66\u4e60\u548c\u7cbe\u70bc\u8fc7\u7a0b\u4e5f\u4f9d\u8d56\u4e8e\u975e\u8bed\u8a00\u4fe1\u606f\u3002\u540c\u6837\u5730\uff0c\u6211\u4eec\u5047\u8bbe\u5927\u6a21\u578b\u5728\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\u65f6\u80fd\u529b\u6709\u9650\u7684\u539f\u56e0\u662f\u5b83\u4eec\u4ec5\u4ee5\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\u3002\u56e0\u6b64\uff0c\u8fd9\u7bc7\u8bba\u6587\u65e8\u5728\u63a2\u8ba8\u7684\u95ee\u9898\u662f\uff1a\u5927\u6a21\u578b\u662f\u5426\u80fd\u591f\u4ee5\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u7684\u65b9\u5f0f\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\uff1f\u6211\u4eec\u901a\u8fc7\u63d0\u793a\u65b9\u6cd5\u8fdb\u884c\u4e86\u7cfb\u7edf\u5206\u6790\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u
63d0\u793a\u5927\u6a21\u578b\u5728\u4e0d\u540c\u7684\u6709\u751f\u547d\u3001\u65e0\u751f\u547d\u3001\u5e38\u89c1\u548c\u5f02\u5e38\u60c5\u5883\u4e0b\u8fdb\u884c\u64cd\u4f5c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u5927\u6a21\u578b\u4e3b\u8981\u57fa\u4e8e\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f46\u5728\u9762\u5bf9\u5178\u578b\u7684\u6709\u751f\u547d\u4f53\u548c\u65e0\u751f\u547d\u4f53\u65f6\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u4e0e\u5148\u524d\u7814\u7a76\u4e00\u81f4\u7684\u4eba\u7c7b\u884c\u4e3a\u6a21\u5f0f\u3002\u56e0\u6b64\uff0c\u5927\u6a21\u578b\u80fd\u591f\u9002\u5e94\u7406\u89e3\u975e\u5178\u578b\u60c5\u51b5\uff0c\u901a\u8fc7\u8bc6\u522b\u5f02\u5e38\u60c5\u51b5\u4e3a\u6709\u751f\u547d\u4f53\uff0c\u800c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u4f9d\u8d56\u7684\u672a\u8a00\u660e\u7684\u8ba4\u77e5\u89e6\u53d1\u673a\u5236\u6765\u5206\u89e3\u52a8\u753b\u3002|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen 
et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|\u672c\u6587\u65e8\u5728\u586b\u8865\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u4e0e\u4eba\u5de5\u901a\u7528\u667a\u80fd\uff08AGI\uff09\u63a5\u8fd1\u8fc7\u7a0b\u4e2d\u7814\u7a76\u7684\u7a7a\u767d\u3002\u5c3d\u7ba1LLM\u5c55\u73b0\u51fa\u51fa\u8272\u7684\u6cdb\u5316\u80fd\u529b\u548c\u6d8c\u73b0\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u7f3a\u4e4f\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u884c\u4e3a\u3001\u6f5c\u5728\u5931\u8d25\u539f\u56e0\u4ee5\u53ca\u5982\u4f55\u63d0\u5347\u5176\u6027\u80fd\u7684\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u5177\u6709\u6311\u6218\u6027\u7684\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5229\u7528\u4e86\u4e00\u4e2a\u540d\u4e3aTravelPlanner\u7684\u771f\u5b9e\u57fa\u51c6\uff0c\u5176\u4e2d\u7684\u4ee3\u7406\u5fc5\u987b\u6ee1\u8db3\u591a\u4e2a\u7ea6\u675f\u4ee5\u751f\u6210\u51c6\u786e\u7684\u8ba1\u5212\u3002\u901a\u8fc7TravelPlanner\u57fa\u51c6\uff0c\u6211\u4eec\u9488\u5bf9\u56db\u4e2a\u5173\u952e\u7814\u7a76\u95ee\u9898\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\uff1a\uff081\uff09LLM\u4ee3\u7406\u5728\u5904\u7406\u957f\u7bc7\u548c\u5608\u6742\u4e0a\u4e0b\u6587\u65f6\uff0c\u5bf9\u4e8e\u63a8\u7406\u548c\u89c4\u5212\u7684\u9c81\u68d2\u6027\u662f\u5426\u8db3\u591f\uff1f\uff082\uff09\u5c11\u91cf\u63d0\u793a\u80fd\u5426\u5bf9\u5177\u6709\u957f\u4e0a\u4e0b\u6587\u7684\u573a\u666f\u4ea7\u751f\u8d1f\u9762\u5f71\u54cd\uff1f\uff083\uff09\u6211\u4eec\u80fd\u5426\u4f9d\u8d56\u7ec6\u5316\u6765\u6539\u5584\u8ba1\u5212\uff1f\uff084\uff09\u662f\u5426\u53ef\u4ee5\u4f7f\u7528\u6b63\u8d1f\u53cd\u9988\u76f8\u7ed3\u5408\u7684\u65b9\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\uff1f 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff1a\u9996\u5148\uff0c\u5c3d\u7ba1LLM\u80fd\u591f\u5904\u7406\u5927\u91cf\u7684\u53c2\u8003\u4fe1\u606f\u548c\u5c11\u91cf\u793a\u4f8b\uff0c\u4f46\u5728\u5904\u7406\u957f\u7bc7\u4e0a\u4e0b\u6587\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u65e0\u6cd5\u5173\u6ce8\u5173\u952e\u90e8\u5206\uff1b\u5176\u6b21\uff0c\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u5206\u6790\u957f\u671f\u89c4\u5212\uff0c\u5e76\u4e0d\u80fd\u63d0\u4f9b\u51c6\u786e\u7684\u53cd\u9988\u4f9b\u7ec6\u5316\u4f7f\u7528\uff1b\u7b2c\u4e09\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u79f0\u4e3a\u53cd\u9988\u611f\u77e5\u5fae\u8c03\uff08FAFT\uff09\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u4e86\u6b63\u8d1f\u53cd\u9988\uff0c\u76f8\u8f83\u4e8e\u76d1\u7763\u5f0f\u5fae\u8c03\uff08SFT\uff09\uff0c\u5b83\u80fd\u5e26\u6765\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u793e\u533a\u63d0\u4f9b\u4e86\u6709\u5173\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u5e94\u7528\u65b9\u9762\u7684\u6df1\u5165\u89c1\u89e3\u3002|\n", "2408.06292": "|**2024-08-12**|**The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery**|Chris Lu 
et.al.|[2408.06292](http://arxiv.org/abs/2408.06292)|**[link](https://github.com/sakanaai/ai-scientist)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u7684\u79d1\u5b66\u53d1\u73b0\uff0c\u4f7f\u524d\u6cbf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u72ec\u7acb\u8fdb\u884c\u7814\u7a76\uff0c\u5e76\u4f20\u8fbe\u5176\u7814\u7a76\u6210\u679c\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201cAI\u79d1\u5b66\u5bb6\u201d\u8fd9\u4e00\u6982\u5ff5\uff0c\u5b83\u80fd\u751f\u6210\u65b0\u9896\u7684\u7814\u7a76\u601d\u8def\uff0c\u7f16\u5199\u4ee3\u7801\uff0c\u6267\u884c\u5b9e\u9a8c\uff0c\u53ef\u89c6\u5316\u7ed3\u679c\uff0c\u64b0\u5199\u5b8c\u6574\u7684\u79d1\u5b66\u8bba\u6587\uff0c\u5e76\u8fdb\u884c\u6a21\u62df\u7684\u540c\u884c\u8bc4\u5ba1\u8fc7\u7a0b\u4ee5\u8fdb\u884c\u8bc4\u4f30\u3002\u7406\u8bba\u4e0a\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u8fed\u4ee3\u8fdb\u884c\uff0c\u4ee5\u5f00\u653e\u6027\u65b9\u5f0f\u53d1\u5c55\u60f3\u6cd5\uff0c\u5c31\u50cf\u4eba\u7c7b\u7684\u79d1\u5b66\u793e\u533a\u4e00\u6837\u3002 
\u901a\u8fc7\u5c06\u5176\u5e94\u7528\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u4e09\u4e2a\u4e0d\u540c\u5b50\u9886\u57df\uff1a\u6269\u6563\u5efa\u6a21\u3001\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u8bed\u8a00\u5efa\u6a21\u548c\u5b66\u4e60\u52a8\u6001\uff0c\u5c55\u793a\u4e86\u5176\u7075\u6d3b\u6027\u3002\u6bcf\u4e00\u7bc7\u8bba\u6587\u7684\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e15\u7f8e\u5143\u3002\u4e3a\u4e86\u8bc4\u4f30\u751f\u6210\u7684\u8bba\u6587\uff0c\u6211\u4eec\u8bbe\u8ba1\u5e76\u9a8c\u8bc1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5ba1\u7a3f\u4eba\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u5728\u8bc4\u4ef7\u8bba\u6587\u5206\u6570\u65b9\u9762\u63a5\u8fd1\u4eba\u7c7b\u6c34\u5e73\u8868\u73b0\u3002AI\u79d1\u5b66\u5bb6\u80fd\u591f\u4ea7\u751f\u8d85\u8fc7\u9876\u7ea7\u673a\u5668\u5b66\u4e60\u4f1a\u8bae\u63a5\u53d7\u9608\u503c\u7684\u8bba\u6587\uff0c\u8fd9\u662f\u7531\u6211\u4eec\u7684\u81ea\u52a8\u5ba1\u7a3f\u4eba\u5224\u65ad\u7684\u3002\u8fd9\u4e00\u65b9\u6cd5\u6807\u5fd7\u7740\u673a\u5668\u5b66\u4e60\u9886\u57df\u79d1\u5b66\u7814\u7a76\u65b0\u7eaa\u5143\u7684\u5f00\u59cb\uff1a\u5c06AI\u4ee3\u7406\u7684\u53d8\u9769\u6027\u4f18\u52bf\u5e26\u5165\u6574\u4e2a\u7814\u7a76\u8fc7\u7a0b\uff0c\u4f7f\u6211\u4eec\u66f4\u63a5\u8fd1\u4e00\u4e2a\u80fd\u591f\u91ca\u653e\u89e3\u51b3\u4e16\u754c\u6700\u8270\u5de8\u95ee\u9898\u7684\u65e0\u9650\u53ef\u8d1f\u62c5\u521b\u65b0\u4e0e\u521b\u9020\u529b\u7684\u4e16\u754c\u3002\u6240\u6709\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/SakanaAI/AI-Scientist\u3002|\n", "2408.06281": "|**2024-08-12**|**MovieSum: An Abstractive Summarization Dataset for Movie Screenplays**|Rohit Saxena 
et.al.|[2408.06281](http://arxiv.org/abs/2408.06281)|**[link](https://github.com/saxenarohit/moviesum)**|\u7535\u5f71\u5267\u672c\u7684\u6982\u8ff0\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u8981\u6c42\u7406\u89e3\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u548c\u7535\u5f71\u7279\u6709\u7684\u5404\u79cd\u5143\u7d20\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u6863\u6982\u8ff0\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u5728\u5904\u7406\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u65f6\u9047\u5230\u56f0\u96be\u3002\u6b64\u5916\uff0c\u867d\u7136\u6700\u8fd1\u7684\u7814\u7a76\u5173\u6ce8\u7535\u89c6\u811a\u672c\uff0c\u4f46\u7535\u5f71\u5267\u672c\u6982\u8ff0\u4ecd\u7136\u7f3a\u4e4f\u63a2\u7d22\u3002\u4e3a\u4e86\u6fc0\u53d1\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u540d\u4e3aMovieSum\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u7535\u5f71\u5267\u672c\u7684\u62bd\u8c61\u6982\u8ff0\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u5305\u542b\u4e862200\u4e2a\u7535\u5f71\u5267\u672c\u53ca\u5176\u5bf9\u5e94\u7684\u7ef4\u57fa\u767e\u79d1\u5267\u60c5\u6982\u8ff0\u3002\u6211\u4eec\u4eba\u5de5\u683c\u5f0f\u5316\u4e86\u7535\u5f71\u5267\u672c\u4ee5\u8868\u793a\u5176\u7ed3\u6784\u5143\u7d20\u3002\u4e0e\u73b0\u6709\u7684\u6570\u636e\u96c6\u76f8\u6bd4\uff0cMovieSum\u5177\u6709\u51e0\u4e2a\u72ec\u7279\u7279\u70b9\uff1a\uff081\uff09\u5b83\u5305\u62ec\u7535\u5f71\u5267\u672c\uff0c\u8fd9\u4e9b\u5267\u672c\u6bd4\u7535\u89c6\u5267\u811a\u672c\u66f4\u957f\u3002\uff082\uff09\u5b83\u7684\u89c4\u6a21\u662f\u4e4b\u524d\u7535\u5f71\u5267\u672c\u6570\u636e\u96c6\u7684\u4e24\u500d\u3002\uff083\uff09\u5b83\u63d0\u4f9b\u4e86IMDb 
ID\u7b49\u5143\u6570\u636e\uff0c\u65b9\u4fbf\u83b7\u53d6\u989d\u5916\u7684\u5916\u90e8\u77e5\u8bc6\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u6700\u8fd1\u53d1\u5e03\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6982\u8ff0\u7684\u7ed3\u679c\uff0c\u4ee5\u63d0\u4f9b\u8be6\u7ec6\u7684\u57fa\u51c6\u3002|\n", "2408.06276": "|**2024-08-13**|**Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation**|Jieyong Kim et.al.|[2408.06276](http://arxiv.org/abs/2408.06276)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u7c7b\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u4e86\u5b83\u4eec\u5728\u63a8\u8350\u7cfb\u7edf\u9886\u57df\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528LLM\u7684\u6f5c\u529b\uff0c\u5f80\u5f80\u53d7\u9650\u4e8e\u8f93\u5165\u4fe1\u606f\u7684\u6709\u9650\u6027\uff0c\u672a\u80fd\u5168\u9762\u53d1\u6325\u5176\u9ad8\u7ea7\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aEXP3RT\u7684\u65b0\u9896LLM\u63a8\u8350\u7cfb\u7edf\uff0c\u65e8\u5728\u5229\u7528\u7528\u6237\u548c\u7269\u54c1\u8bc4\u8bba\u4e2d\u8574\u542b\u7684\u4e30\u5bcc\u504f\u597d\u4fe1\u606f\u3002 
EXP3RT\u901a\u8fc7\u4ece\u6559\u5e08LLM\u4e2d\u8fdb\u884c\u77e5\u8bc6\u84b8\u998f\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u5173\u952e\u7684\u4e09\u9879\u4efb\u52a1\uff1a\u9996\u5148\uff0c\u5b83\u4ece\u539f\u59cb\u8bc4\u8bba\u4e2d\u63d0\u53d6\u5e76\u5c01\u88c5\u6838\u5fc3\u7684\u4e3b\u89c2\u504f\u597d\uff1b\u5176\u6b21\uff0c\u6839\u636e\u7279\u5b9a\u6807\u51c6\u805a\u5408\u548c\u603b\u7ed3\u8fd9\u4e9b\u504f\u597d\uff0c\u5f62\u6210\u7528\u6237\u548c\u7269\u54c1\u7684\u6863\u6848\uff1b\u6700\u540e\uff0c\u8003\u8651\u7528\u6237/\u7269\u54c1\u6863\u6848\u4ee5\u53ca\u7269\u54c1\u63cf\u8ff0\u4e2d\u7684\u4e3b\u5ba2\u89c2\u4fe1\u606f\uff0c\u751f\u6210\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\u548c\u9884\u6d4b\u8bc4\u7ea7\uff0c\u5373\u57fa\u4e8e\u63a8\u7406\u7684\u8bc4\u7ea7\u9884\u6d4b\u3002\u8fd9\u79cd\u7531EXP3RT\u63d0\u4f9b\u7684\u4e2a\u6027\u5316\u504f\u597d\u63a8\u7406\u80fd\u591f\u63d0\u9ad8\u8bc4\u7ea7\u9884\u6d4b\u7684\u51c6\u786e\u6027\uff0c\u5e76\u4e3a\u63a8\u8350\u7cfb\u7edf\u63d0\u4f9b\u5fe0\u5b9e\u4e14\u5408\u7406\u7684\u89e3\u91ca\u3002 \u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cEXP3RT\u5728\u8bc4\u7ea7\u9884\u6d4b\u548c\u5019\u9009\u9879\u76ee\u91cd\u6392\u5e8f\uff08\u7528\u4e8etop-k\u63a8\u8350\uff09\u65b9\u9762\u5747\u8d85\u8d8a\u4e86\u73b0\u6709\u65b9\u6cd5\uff0c\u540c\u65f6\u663e\u8457\u63d0\u5347\u4e86\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u89e3\u91ca\u6027\u3002|\n", "2408.06273": "|**2024-08-12**|**FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data**|Haoran Sun 
et.al.|[2408.06273](http://arxiv.org/abs/2408.06273)|**[link](https://github.com/tjunlp-lab/fuxitranyu)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bb8\u591aLLM\u5728\u9ad8\u8d44\u6e90\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5f00\u6e90\u591a\u8bed\u8a00LLM\u2014\u2014FuxiTranyu\uff0c\u65e8\u5728\u6ee1\u8db3\u7814\u7a76\u793e\u533a\u5bf9\u5e73\u8861\u4e14\u9ad8\u6027\u80fd\u591a\u8bed\u8a00\u80fd\u529b\u7684\u9700\u6c42\u3002FuxiTranyu-8B\uff0c\u5177\u670980\u4ebf\u53c2\u6570\u7684\u57fa\u6a21\uff0c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5728\u4e00\u4e2a\u7cbe\u5fc3\u5e73\u8861\u7684\u591a\u8bed\u8a00\u6570\u636e\u4ed3\u5e93\u4e0a\uff0c\u8be5\u4ed3\u5e93\u5305\u542b\u8986\u76d643\u79cd\u81ea\u7136\u8bed\u8a00\u548c16\u79cd\u7f16\u7a0b\u8bed\u8a00\u76846000\u4ebf\u4e2a\u4ee4\u724c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e24\u4e2a\u6307\u4ee4\u8c03\u4f18\u6a21\u578b\uff1aFuxiTranyu-8B-SFT\uff0c\u5b83\u57fa\u4e8e\u591a\u5143\u6307\u4ee4\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\uff1b\u4ee5\u53caFuxiTranyu-8B-DPO\uff0c\u5728\u504f\u597d\u6570\u636e\u96c6\u4e0a\u8fdb\u4e00\u6b65\u7cbe\u70bc\u4ee5\u589e\u5f3a\u5bf9\u9f50\u80fd\u529b\u7684DPO\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u5728\u591a\u79cd\u591a\u8bed\u8a00\u57fa\u51c6\u4e0a\u7684\u7ed3\u679c\u663e\u793a\uff0cFuxiTranyu\u5728\u4e0e\u73b0\u6709\u591a\u8bed\u8a00LLM\uff08\u5982BLOOM-7B\u3001PolyLM-13B\u3001Llama-2-Chat-7B\u548cMistral-7B-Instruct\uff09\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u6027\u80fd\u3002\u795e\u7ecf\u5143\u7ea7\u548c\u8868\u793a\u7ea7\u53ef\u89e3\u91ca\u6027\u5206\u6790\u8868\u660e\uff0cFuxiTranyu\u80fd\u591f\u5728\u4e0d\u540c\u8bed\u8a00\u4e4b\u95f4\u5b66\u4e60\u4e00\u81f4\u7684\u591a
\u8bed\u8a00\u8868\u793a\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9\u591a\u8bed\u8a00LLM\u53ca\u5176\u5de5\u4f5c\u673a\u5236\u7684\u7814\u7a76\uff0c\u6211\u4eec\u53d1\u5e03\u4e86\u57fa\u6a21\u548c\u6307\u4ee4\u8c03\u4f18\u7684FuxiTranyu\u6a21\u578b\uff0c\u4ee5\u53ca58\u4e2a\u9884\u8bad\u7ec3\u68c0\u67e5\u70b9\uff0c\u901a\u8fc7HuggingFace\u548cGithub\u516c\u5f00\u5206\u4eab\u3002|\n", "2408.06272": "|**2024-08-12**|**A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution**|Sampath Rajapaksha et.al.|[2408.06272](http://arxiv.org/abs/2408.06272)|null|\u5728\u4e0d\u65ad\u6f14\u8fdb\u7684\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u5206\u6790\u5e08\u9700\u8981\u5bc6\u5207\u5173\u6ce8\u6700\u65b0\u7684\u653b\u51fb\u8d8b\u52bf\u548c\u76f8\u5173\u4fe1\u606f\uff0c\u4ee5\u534f\u52a9\u8c03\u67e5\u4e0e\u5f52\u56e0\u7f51\u7edc\u653b\u51fb\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u6280\u672f\u7684\u95ee\u7b54\u6a21\u578b\u53ca\u5176\u5e94\u7528\uff0c\u65e8\u5728\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u5bb6\u63d0\u4f9b\u6709\u5173\u7f51\u7edc\u653b\u51fb\u8c03\u67e5\u4e0e\u5f52\u56e0\u7684\u4fe1\u606f\u3002\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u77e5\u8bc6\u5e93\uff08KB\uff09\uff0c\u80fd\u591f\u6839\u636e\u77e5\u8bc6\u5e93\u6216\u7528\u6237\u63d0\u4f9b\u7684\u5916\u90e8\u8d44\u6e90\u56de\u7b54\u7528\u6237\u7684\u67e5\u8be2\u3002 
We tested and evaluated our QA model with various types of questions, including questions based on the KB, on metadata, on specific documents in the KB, and on external sources. We compared the answers to KB-based questions with answers from OpenAI's GPT-3.5 and the latest GPT-4 LLMs. The results show that our QA model provides the source of each answer and overcomes the hallucination problems that GPT models can exhibit, which is critical for cyber-attack investigation and attribution. In addition, our analysis shows that when the RAG QA model is given a few examples alongside the query, it generally produces better answers than when it is given the query alone.|\n", "2408.06266": "|**2024-08-12**|**Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment**|Karel D'Oosterlinck 
et.al.|[2408.06266](http://arxiv.org/abs/2408.06266)|**[link](https://github.com/contextualai/clair_and_apo)**|Large language models (LLMs) are typically aligned using contrastive alignment objectives and preference-pair datasets. The interaction between the model, the paired data, and the objective makes alignment complicated and sometimes leads to undesirable results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives that give more control over the model during training lead to better performance. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method that produces more contrastive preference pairs, and Anchored Preference Optimization (APO), a more controllable and stable alignment objective. We align Llama-3-8B-Instruct using a variety of comparable datasets and alignment objectives, and measure MixEval-Hard scores, which correlate highly with human judgments. CLAIR preferences lead to the best performance across all datasets, and APO consistently outperforms less controllable objectives. 
Trained with APO on 32K CLAIR preferences, our best model improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code is available at https://github.com/ContextualAI/CLAIR_and_APO.|\n", "2408.06223": "|**2024-08-12**|**On Effects of Steering Latent Representation for Large Language Model Unlearning**|Dang Huu-Tien et.al.|[2408.06223](http://arxiv.org/abs/2408.06223)|null|This paper first shows, through theoretical analysis, that steering forget-sample representations in an intermediate layer toward a random direction reduces generation confidence, causing the large language model (LLM) to produce wrong or nonsensical responses. Second, we investigate how the coefficient affects the alignment of forget-sample representations with the random direction, and suggest the optimal coefficient values for effective unlearning across different network layers. Third, we show that models unlearned with Representation Misdirection for Unlearning (RMU) are robust against adversarial jailbreak attacks. 
Finally, our empirical analysis shows that RMU is less effective when applied to the middle and later layers of LLMs. To address this drawback, we propose Adaptive RMU, a simple yet effective method that makes unlearning effective across most layers without incurring additional computational cost. Extensive experiments show that Adaptive RMU significantly improves unlearning performance compared with prior work.|\n", "2408.06186": "|**2024-08-12**|**Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting**|Halley Young et.al.|[2408.06186](http://arxiv.org/abs/2408.06186)|null|Generating diverse text is a key challenge facing large language models (LLMs). So far, diversity has been studied mainly through metrics such as $n$-gram diversity or the diversity of BERT embeddings, but these approaches give the user no control over the dimensions along which diversity is considered. For example, in the poetry domain a user might want diversity in rhyme and meter, whereas in the code domain a user might care more about diversity in the kinds of expressions used to solve a problem. 
To this end, we propose a new metric called structural diversity, in which the user supplies a mapping from generated text to features that capture the kind of diversity they care about. This gives users more specific control over the dimensions of diversity they want to explore, such as rhyme and meter in poetry or particular kinds of expressions in code. In addition, we propose a novel strategy called chain-of-specification (CoS) prompting, which improves diversity by first having the LLM generate a specification that encodes one instance of the structural features, and then prompting the LLM to generate text satisfying those features; notably, our strategy works with blackbox LLMs. In our experiments, we show that for structural diversity in the poetry and code domains, CoS significantly improves diversity compared with several baselines.|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang 
et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|Large language model (LLM) agents have shown great potential for solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks vary in their strengths, excelling at some tasks while underperforming on others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowers Intelligence), a framework that leverages their unique expertise. DEI works as a meta-module on top of existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents can surpass the best individual agent's performance by a large margin. For example, a group of open-source SWE agents with a maximum individual resolve rate of 27.3% on SWE-Bench Lite achieves a 34.3% resolve rate with DEI, a 25% improvement that beats many closed-source solutions. Our best-performing group reaches a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. 
Our findings contribute to the body of research on collaborative AI systems and demonstrate their potential to solve complex software engineering challenges.|\n", "2408.07055": "|**2024-08-13**|**LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs**|Yushi Bai et.al.|[2408.07055](http://arxiv.org/abs/2408.07055)|**[link](https://github.com/thudm/longwriter)**|**Current large language models (LLMs) can process inputs of up to 100,000 tokens, yet struggle to generate outputs longer than 2,000 words. Through controlled experiments, we find that a model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other words, the output limit stems from the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling existing LLMs to generate coherent outputs exceeding 20,000 words. 
Using AgentWrite, we construct LongWriter-6k, a dataset of 6,000 SFT examples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B-parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. Overall, our work shows that existing long-context LLMs already have the potential for a larger output window; all that is needed to unlock this capability is data with extended outputs during model alignment. Our code and models are available at https://github.com/THUDM/LongWriter.**|\n", "2408.07004": "|**2024-08-13**|**Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models**|Chun Jie Chong 
et.al.|[2408.07004](http://arxiv.org/abs/2408.07004)|null|Web-based large language model (LLM) services have been widely adopted and have become an integral part of our Internet experience. Third-party plugins enhance the functionality of LLMs by providing access to real-world data and services. However, the privacy consequences of these services and their third-party plugins are not well understood. Sensitive prompt data are stored, processed, and shared by cloud-based LLM providers and third-party plugins. This paper proposes Casper, a prompt sanitization technique that protects user privacy by detecting and removing sensitive information from user inputs before they are sent to LLM services. Casper runs entirely on the user's device as a browser extension and requires no changes to the online LLM services. At its core is a three-layered sanitization mechanism consisting of a rule-based filter, a machine learning (ML)-based named entity recognizer, and a browser-based local LLM topic identifier. 
We evaluate Casper on a dataset of 4,000 synthesized prompts and show that it effectively filters out Personally Identifiable Information (PII) and privacy-sensitive topics with high accuracy, at 98.5% and 89.9%, respectively.|\n", "2408.06993": "|**2024-08-13**|**LLMs can Schedule**|Henrik Abgaryan et.al.|[2408.06993](http://arxiv.org/abs/2408.06993)|**[link](https://github.com/starjob42/datasetjsp)**|**The job shop scheduling problem (JSSP) remains a significant hurdle in optimizing production processes. The problem involves efficiently allocating jobs to a limited number of machines while minimizing factors such as total processing time or job delays. Although recent advances in artificial intelligence have produced promising solutions, such as reinforcement learning and graph neural networks, this paper explores the potential of large language models (LLMs) for JSSP. We introduce the first 120k dataset designed specifically for training LLMs on JSSP. Surprisingly, our findings show that LLM-based scheduling can achieve performance comparable to other neural approaches. Furthermore, we propose a sampling method that improves the effectiveness of LLMs in tackling JSSP.**|\n", "2408.06941": "|**2024-08-13**|**OpenResearcher: Unleashing AI for Accelerated Scientific Research**|Yuxiang Zheng 
et.al.|[2408.06941](http://arxiv.org/abs/2408.06941)|**[link](https://github.com/gair-nlp/openresearcher)**|**The rapid growth of scientific literature makes it hard for researchers to stay current with advances in their own fields and to explore new areas. We introduce OpenResearcher, an innovative platform that uses artificial intelligence (AI) techniques to accelerate the research process by answering researchers' diverse questions. OpenResearcher is built on retrieval-augmented generation (RAG) and integrates large language models (LLMs) with up-to-date, domain-specific knowledge. In addition, we develop a set of tools that let OpenResearcher understand researchers' queries, search the scientific literature, filter the retrieved information, provide accurate and comprehensive answers, and self-refine those answers. OpenResearcher uses these tools flexibly to balance efficiency and effectiveness. As a result, OpenResearcher helps researchers save time and increases their potential to discover new insights and drive scientific breakthroughs. Demo, video, and code are available at https://github.com/GAIR-NLP/OpenResearcher.**|\n", "2408.06929": "|**2024-08-13**|**Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas**|Louis Kwok 
et.al.|[2408.06929](http://arxiv.org/abs/2408.06929)|**[link](https://github.com/louiskwoklf/llms-cultural-adaptability)**|The success of large language models (LLMs) in multicultural environments depends on their ability to understand users' diverse cultural backgrounds. We measure this ability by having an LLM simulate human personas representing various nationalities in a questionnaire-style psychological experiment. Specifically, we use GPT-3.5 to reproduce the reactions of 7,286 participants from 15 countries to persuasive news articles, and compare the results with a dataset of real participants who share the same demographic traits. Our analysis shows that specifying a person's country of residence improves GPT-3.5's alignment with their responses. In contrast, prompting in the person's native language introduces shifts that significantly reduce overall alignment, with some languages impairing performance more than others. These findings suggest that while directly providing nationality information enhances the model's cultural adaptability, native-language cues do not reliably improve simulation fidelity and can detract from the model's effectiveness.|\n", "2408.06904": "|**2024-08-13**|**Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives**|Zhihu Wang 
et.al.|[2408.06904](http://arxiv.org/abs/2408.06904)|null|As large language models (LLMs) continue to scale, their performance gains often remain insufficient for domain-specific tasks. Systematically analyzing these failures and effectively improving performance remains a significant challenge. This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from the perspectives of capability, skill, and knowledge, guided by the principles of Bloom's Taxonomy and Knowledge Space Theory. The Re-TASK framework offers a systematic way to deepen our understanding, evaluation, and enhancement of LLMs for domain-specific tasks. It explores the interplay among an LLM's capabilities, the knowledge it processes, and the skills it applies, clarifying how these elements are interconnected and how they affect task performance. 
Applying the Re-TASK framework, we find that many failures on domain-specific tasks can be attributed mainly to insufficient knowledge or inadequate skill adaptation. Building on this insight, we propose structured strategies for enhancing LLMs through targeted knowledge injection and skill adaptation. Specifically, we identify the key capability items relevant to a task and use carefully designed prompting strategies to improve task performance, reducing the need for extensive fine-tuning. Alternatively, we fine-tune the LLM with capability-specific instructions, which further validates the framework. Experimental results confirm the framework's effectiveness, showing substantial improvements in both the performance and the applicability of LLMs.|\n", "2408.06874": "|**2024-08-13**|**Leveraging Language Models for Emotion and Behavior Analysis in Education**|Kaito Tanaka 
et.al.|[2408.06874](http://arxiv.org/abs/2408.06874)|null|Analyzing students' emotions and behaviors is crucial for improving learning outcomes and personalizing educational experiences. Traditional methods often rely on intrusive collection of visual and physiological data, which raises privacy concerns and limits scalability. This paper proposes a novel method that uses large language models (LLMs) and prompt engineering to analyze students' textual data. Our approach uses tailored prompts to guide LLMs in detecting emotional and engagement states, providing a non-intrusive, scalable solution. We conducted experiments with Qwen, ChatGPT, Claude2, and GPT-4, comparing our method against baseline models and chain-of-thought (CoT) prompting. The results show that our method significantly outperforms the baselines in both accuracy and contextual understanding. This study highlights the potential of LLMs combined with prompt engineering to provide practical and effective tools for emotion and behavior analysis in education.|\n", "2408.06854": "|**2024-08-13**|**LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models**|Jia-Chen Zhang 
et.al.|[2408.06854](http://arxiv.org/abs/2408.06854)|null|Fine-tuning large language models (LLMs) with high parameter efficiency for downstream tasks has become a new research direction. Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters during fine-tuning. Although it performs well, updating parameters at a single scale may not be the optimal choice for complex downstream tasks. This paper proposes a method that extends LoRA, called LoRA$^2$. First, drawing on orthogonal projection theory, we train two sets of LoRAs in mutually orthogonal planes. Then, we improve the importance-score algorithm, which reduces parameter-sensitivity score calculations by approximately 98.5%. Pruning singular values with lower importance scores improves adaptability to various downstream tasks. 
We conduct extensive experiments on two widely used pretrained models to validate the effectiveness of LoRA$^2$. The results show that it reduces the number of trainable parameters to just 0.72% of that of full fine-tuning while still delivering impressive performance. Even when the parameters are further reduced to 0.17M, it achieves results comparable to a baseline with 8 times more parameters. Our code is available here:|\n", "2408.06849": "|**2024-08-13**|**Causal Agent based on Large Language Model**|Kairong Han et.al.|[2408.06849](http://arxiv.org/abs/2408.06849)|**[link](https://github.com/kairong-han/causal_agent)**|**Large language models (LLMs) have achieved significant success across various domains. However, the inherent complexity of causal problems and causal theory makes them hard to describe accurately in natural language, which hinders LLMs from understanding and using them effectively. Causal methods are not easily conveyed in natural language, which limits how accurately LLMs can apply them. In addition, causal datasets are typically tabular, while LLMs excel at handling natural-language data; this structural mismatch impedes effective reasoning over tabular data. The lack of causal reasoning capability limits the development of LLMs. 
To address these challenges, we equip the LLM with causal tools within an agent framework, named the Causal Agent. The agent comprises tools, memory, and reasoning modules. In the tools module, the Causal Agent applies causal methods by aligning tabular data with natural language. In the reasoning module, it uses the ReAct framework to reason through multiple iterations with the tools. In the memory module, it maintains a dictionary instance whose keys are unique names and whose values are causal graphs. To verify the causal ability of the Causal Agent, we establish a benchmark consisting of four levels of causal problems: variable level, edge level, causal graph level, and causal effect level. We use ChatGPT-3.5 to generate a test dataset of 1,300 questions for these four levels and test the Causal Agent on it. Our method proves highly effective on all four levels of causal problems, with accuracy above 80% on each. For further insights and implementation details, our code is available in the GitHub repository https://github.com/Kairong-Han/Causal_Agent.**|\n", "2408.07702": "|**2024-08-14**|**The 
Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models**|Karime Maamari et.al.|[2408.07702](http://arxiv.org/abs/2408.07702)|null|Schema linking is a crucial step in Text-to-SQL pipelines, which translate natural language queries into SQL. The goal of schema linking is to retrieve relevant tables and columns (signal) while disregarding irrelevant ones (noise). However, imperfect schema linking can often exclude essential columns needed for accurate query generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at identifying relevant schema elements during generation, without the need for explicit schema linking. This allows Text-to-SQL pipelines to bypass schema linking entirely and instead pass the full database schema to the LLM, eliminating the risk of excluding necessary information. Furthermore, as alternatives to schema linking, we propose techniques that improve Text-to-SQL accuracy without compromising on essential schema information. Our approach achieves 71.83% execution accuracy on the BIRD benchmark, ranking first at the time of submission.|\n", "2408.07666": "|**2024-08-15**|**Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities**|Enneng Yang et.al.|[2408.07666](http://arxiv.org/abs/2408.07666)|**[link](https://github.com/ennengyang/awesome-model-merging-methods-theories-applications)**|**Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. 
This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at \\url{https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications}.**|\n", "2408.07665": "|**2024-08-14**|**Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models**|Yi-Cheng Lin et.al.|[2408.07665](http://arxiv.org/abs/2408.07665)|**[link](https://github.com/dlion168/spoken_stereoset)**|Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address these biases. This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in SLLMs. By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. Our experiments reveal significant insights into their performance and bias levels. 
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.|\n", "2408.07663": "|**2024-08-14**|**Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions**|Quan Liu et.al.|[2408.07663](http://arxiv.org/abs/2408.07663)|**[link](https://github.com/gigabaozi/aed)**|**Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root causes of jailbreak issues. We first define the Competitive Index to quantify alignment failures and utilize feedback from self-evaluation to compute post-alignment logits. Then, AED adaptively combines AED and post-alignment logits with the original logits to obtain harmless and helpful distributions. Consequently, our method enhances safety alignment while maintaining helpfulness. We conduct experiments across five models and four common jailbreaks, with the results validating the effectiveness of our approach. Code is available at https://github.com/GIGABaozi/AED.git.**|\n", "2408.07611": "|**2024-08-14**|**WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs**|Weijian Xie et.al.|[2408.07611](http://arxiv.org/abs/2408.07611)|null|Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). 
However, LLMs are prone to produce factually incorrect information and often produce \"phantom\" content that undermines their reliability, which poses a serious challenge for their deployment in real-world scenarios. Enhancing LLMs by combining external databases and information retrieval mechanisms is an effective path. To address the above challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a \"Retrieval-Augmented Generation (RAG)\" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to satisfy a variety of queries and domains, thereby improving performance on factual information and complex reasoning tasks by employing multi-stage web page retrieval techniques using both sparse and dense retrieval methods. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process. Finally, we also integrate a self-assessment mechanism for the LLM to evaluate the trustworthiness of the answers it generates. Our approach proves its outstanding effectiveness in a wide range of offline experiments and online submissions.|\n", "2408.07583": "|**2024-08-14**|**Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey**|Hamza Kheddar et.al.|[2408.07583](http://arxiv.org/abs/2408.07583)|null|With significant advancements in Transformers LLMs, NLP has extended its reach into many research fields due to its enhanced capabilities in text generation and user interaction. One field benefiting greatly from these advancements is cybersecurity. 
In cybersecurity, many parameters that need to be protected and exchanged between senders and receivers are in the form of text and tabular data, making NLP a valuable tool in enhancing the security measures of communication protocols. This survey paper provides a comprehensive analysis of the utilization of Transformers and LLMs in cyber-threat detection systems. The methodology of paper selection and bibliometric analysis is outlined to establish a rigorous framework for evaluating existing research. The fundamentals of Transformers are discussed, including background information on various cyber-attacks and datasets commonly used in this field. The survey explores the application of Transformers in IDSs, focusing on different architectures such as Attention-based models, LLMs like BERT and GPT, CNN/LSTM-Transformer hybrids, emerging approaches like ViTs, among others. Furthermore, it explores the diverse environments and applications where Transformers and LLMs-based IDS have been implemented, including computer networks, IoT devices, critical infrastructure protection, cloud computing, SDN, as well as in autonomous vehicles. The paper also addresses research challenges and future directions in this area, identifying key issues such as interpretability, scalability, and adaptability to evolving threats, and more. 
Finally, the conclusion summarizes the findings and highlights the significance of Transformers and LLMs in enhancing cyber-threat detection capabilities, while also outlining potential avenues for further research and development.|\n", "2408.07543": "|**2024-08-15**|**MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark**|Minxuan Zhou et.al.|[2408.07543](http://arxiv.org/abs/2408.07543)|**[link](https://github.com/PKU-Baichuan-MLSystemLab/MathScape)**|With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchmarks have not sufficiently integrated visual and textual information. To address this gap, we proposed MathScape, a new benchmark that emphasizes the understanding and application of combined visual and textual information. MathScape is designed to evaluate photo-based math problem scenarios, assessing the theoretical understanding and application ability of MLLMs through a categorical hierarchical approach. We conduct a multi-dimensional evaluation on 11 advanced MLLMs, revealing that our benchmark is challenging even for the most sophisticated models. By analyzing the evaluation results, we identify the limitations of MLLMs, offering valuable insights for enhancing model performance.|\n", "2408.07537": "|**2024-08-15**|**Usefulness of data flow diagrams and large language models for security threat validation: a registered report**|Winnie Bahati Mbaka et.al.|[2408.07537](http://arxiv.org/abs/2408.07537)|null|The arrival of recent cybersecurity standards has raised the bar for security assessments in organizations, but existing techniques don't always scale well. 
Threat analysis and risk assessment are used to identify security threats for new or refactored systems. Still, there is a lack of definition-of-done, so identified threats have to be validated, which slows down the analysis. Existing literature has focused on the overall performance of threat analysis, but no previous work has investigated how deep the analysts must dig into the material before they can effectively validate the identified security threats. We propose a controlled experiment with practitioners to investigate whether some analysis material (like LLM-generated advice) is better than none and whether more material (the system's data flow diagram and LLM-generated advice) is better than some material. In addition, we present key findings from running a pilot with 41 MSc students, which are used to improve the study design. Finally, we also provide an initial replication package, including experimental material and data analysis scripts, and a plan to extend it to include new materials based on the final data collection campaign with practitioners (e.g., pre-screening questions).|\n", "2408.07531": "|**2024-08-14**|**Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments**|Seungjun Han et.al.|[2408.07531](http://arxiv.org/abs/2408.07531)|null|Emergency department (ED) overcrowding and the complexity of rapid decision-making in critical care settings pose significant challenges to healthcare systems worldwide. While clinical decision support systems (CDSS) have shown promise, the integration of large language models (LLMs) offers new possibilities for enhancing triage accuracy and clinical decision-making. This study presents an LLM-driven CDSS designed to assist ED physicians and nurses in patient triage, treatment planning, and overall emergency care management. 
We developed a multi-agent CDSS utilizing Llama-3-70b as the base LLM, orchestrated by CrewAI and Langchain. The system comprises four AI agents emulating key ED roles: Triage Nurse, Emergency Physician, Pharmacist, and ED Coordinator. It incorporates the Korean Triage and Acuity Scale (KTAS) for triage assessment and integrates with the RxNorm API for medication management. The model was evaluated using the Asclepius dataset, with performance assessed by a clinical emergency medicine specialist. The CDSS demonstrated high accuracy in triage decision-making compared to the baseline of a single-agent system. Furthermore, the system exhibited strong performance in critical areas, including primary diagnosis, critical findings identification, disposition decision-making, treatment planning, and resource allocation. Our multi-agent CDSS demonstrates significant potential for supporting comprehensive emergency care management. By leveraging state-of-the-art AI technologies, this system offers a scalable and adaptable tool that could enhance emergency medical care delivery, potentially alleviating ED overcrowding and improving patient outcomes. This work contributes to the growing field of AI applications in emergency medicine and offers a promising direction for future research and clinical implementation.|\n", "2408.07505": "|**2024-08-14**|**Large Language Models Know What Makes Exemplary Contexts**|Quanyu Long et.al.|[2408.07505](http://arxiv.org/abs/2408.07505)|null|In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs). By instructing LLMs using few-shot demonstrative examples, ICL enables them to perform a wide range of tasks without needing to update millions of parameters. 
This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts; self-rank candidates with different demonstration compositions; self-optimize the demonstration selection and ordering through reinforcement learning. Specifically, our method designs a parameter-efficient retrieval head that generates the optimized demonstration after training with rewards from LLM's own preference. Experimental results validate the proposed method's effectiveness in enhancing ICL performance. Additionally, our approach effectively identifies and selects the most representative examples for the current task, and includes more diversity in retrieval.|\n", "2408.08313": "|**2024-08-15**|**Can Large Language Models Understand Symbolic Graphics Programs?**|Zeju Qiu et.al.|[2408.08313](http://arxiv.org/abs/2408.08313)|null|Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown exciting promise towards program synthesis, but do they understand symbolic graphics programs? Unlike conventional programs, symbolic graphics programs can be translated to graphics content. Here, we characterize an LLM's understanding of symbolic programs in terms of their ability to answer questions related to the graphics content. This task is challenging as the questions are difficult to answer from the symbolic programs alone -- yet, they would be easy to answer from the corresponding graphics content as we verify through a human experiment. 
To understand symbolic programs, LLMs may need to possess the ability to imagine how the corresponding graphics content would look without directly accessing the rendered visual content. We use this task to evaluate LLMs by creating a large benchmark for the semantic understanding of symbolic graphics programs. This benchmark is built via program-graphics correspondence, hence requiring minimal human effort. We evaluate current LLMs on our benchmark to provide a preliminary assessment of their ability to reason about visual scenes from programs. We find that this task distinguishes existing LLMs, and models considered good at reasoning perform better. Lastly, we introduce Symbolic Instruction Tuning (SIT) to improve this ability. Specifically, we query GPT4-o with questions and images generated by symbolic programs. Such data are then used to finetune an LLM. We also find that SIT data can improve the general instruction-following ability of LLMs.|\n", "2408.08310": "|**2024-08-15**|**ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws**|Ruihang Li et.al.|[2408.08310](http://arxiv.org/abs/2408.08310)|null|High-quality data is crucial for the pre-training performance of large language models. Unfortunately, existing quality filtering methods rely on a known high-quality dataset as reference, which can introduce potential bias and compromise diversity. In this paper, we propose ScalingFilter, a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data, thereby eliminating the influence of the reference dataset in the filtering process. A theoretical analysis shows that ScalingFilter is equivalent to an inverse utilization of scaling laws. Through training models with 1.3B parameters on the same data source processed by various quality filters, we find ScalingFilter can improve zero-shot performance of pre-trained models in downstream tasks. 
To assess the bias introduced by quality filtering, we introduce semantic diversity, a metric of utilizing text embedding models for semantic representations. Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.|\n", "2408.08302": "|**2024-08-15**|**Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors**|Usman Syed et.al.|[2408.08302](http://arxiv.org/abs/2408.08302)|null|In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems. This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced LLMs, especially their accuracy, consistency, and reasoning behaviors, in solving transportation engineering problems. Our comprehensive analysis uncovers the unique strengths and limitations of each LLM, e.g. our analysis shows the impressive accuracy and some unexpected inconsistent behaviors of Claude 3.5 Sonnet in solving TransportBench problems. Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.|\n", "2408.08300": "|**2024-08-15**|**HELP: Hierarchical Embeddings-based Log Parsing**|Andy Xu et.al.|[2408.08300](http://arxiv.org/abs/2408.08300)|null|Logs are a first-hand source of information for software maintenance and failure diagnosis. 
Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heuristics-based parsers require handcrafted features and domain knowledge, which are difficult to generalize at scale. Second, existing large language model-based parsers rely on periodic offline processing, limiting their effectiveness in real-time use cases. Third, existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies. To address these challenges, we propose HELP, a Hierarchical Embeddings-based Log Parser. HELP is the first online semantic-based parser to leverage LLMs for performant and cost-effective log parsing. We achieve this through a novel hierarchical embeddings module, which fine-tunes a text embedding model to cluster logs before parsing, reducing querying costs by multiple orders of magnitude. To combat log drift, we also develop an iterative rebalancing module, which periodically updates existing log groupings. We evaluate HELP extensively on 14 public large-scale datasets, showing that HELP achieves significantly higher F1-weighted grouping and parsing accuracy than current state-of-the-art online log parsers. We also implement HELP into Iudex's production observability platform, confirming HELP's practicality in a production environment. 
Our results show that HELP is effective and efficient for high-throughput real-world log parsing.|\n", "2408.08291": "|**2024-08-15**|**The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community**|Shachar Don-Yehiya et.al.|[2408.08291](http://arxiv.org/abs/2408.08291)|null|Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and thus are a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models, using it internally to improve their own models, the open source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Where few platforms share their chats, the ShareLM plugin adds this functionality, thus allowing users to share conversations from most platforms. The plugin allows the user to rate their conversations, both at the conversation and the response levels, and delete conversations they prefer to keep private before they ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.|\n", "2408.08282": "|**2024-08-15**|**Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model**|Jin Wang et.al.|[2408.08282](http://arxiv.org/abs/2408.08282)|null|Enabling humanoid robots to autonomously perform loco-manipulation in unstructured environments is crucial and highly challenging for achieving embodied intelligence. This involves robots being able to plan their actions and behaviors in long-horizon tasks while using multi-modality to perceive deviations between task execution and high-level planning. 
Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehension and processing of semantic information through robot control tasks, as well as the usability of analytical judgment and decision-making for multi-modal inputs. To leverage the power of LLMs towards humanoid loco-manipulation, we propose a novel language-model based framework that enables robots to autonomously plan behaviors and low-level execution under given textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework in grounding LLMs, we created the robot 'action' and 'sensing' behavior library for task planning, and conducted mobile manipulation tasks and experiments in both simulated and real environments using the CENTAURO robot, and verified the effectiveness and application of this approach in robotic tasks with autonomous behavioral planning.|\n", "2408.08274": "|**2024-08-15**|**BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts**|Qizhen Zhang et.al.|[2408.08274](http://arxiv.org/abs/2408.08274)|null|The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed-forward network (FFN) to initialize the MoE's experts while merging other parameters. However, this method limits the reuse of dense model parameters to only the FFN layers, thereby constraining the advantages when \"upcycling\" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. 
BAM makes full use of specialized dense models by not only using their FFN to initialize the MoE layers but also leveraging experts' attention parameters fully by initializing them into a soft-variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from dense models including all attention parameters for the best model performance; and 2) sharing key and value parameters across all experts to facilitate better inference efficiency. To further improve efficiency, we adopt a parallel attention transformer architecture to MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.|\n", "2408.08231": "|**2024-08-15**|**DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System**|Xihong Yang et.al.|[2408.08231](http://arxiv.org/abs/2408.08231)|null|Benefiting from their strong reasoning capabilities, large language models (LLMs) have demonstrated remarkable performance in recommender systems. Various efforts have been made to distill knowledge from LLMs to enhance collaborative models, employing techniques like contrastive learning for representation alignment. In this work, we prove that directly aligning the representations of LLMs and collaborative models is sub-optimal for enhancing downstream recommendation task performance, based on the information theorem. Consequently, the challenge of effectively aligning semantic representations between collaborative models and LLMs remains unresolved. Inspired by this viewpoint, we propose a novel plug-and-play alignment framework for LLMs and collaborative models. 
Specifically, we first disentangle the latent representations of both LLMs and collaborative models into specific and shared components via projection layers and representation regularization. Subsequently, we perform both global and local structure alignment on the shared representations to facilitate knowledge transfer. Additionally, we theoretically prove that the specific and shared representations contain more pertinent and less irrelevant information, which can enhance the effectiveness of downstream recommendation tasks. Extensive experimental results on benchmark datasets demonstrate that our method is superior to existing state-of-the-art algorithms.|\n", "2408.08217": "|**2024-08-15**|**RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science**|David Farr et.al.|[2408.08217](http://arxiv.org/abs/2408.08217)|null|Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.|\n", "2408.08210": "|**2024-08-15**|**Does Reasoning Emerge? 
Examining the Probabilities of Causation in Large Language Models**|Javier Gonz\u00e1lez et.al.|[2408.08210](http://arxiv.org/abs/2408.08210)|null|Recent advances in AI have been significantly driven by the capabilities of large language models (LLMs) to solve complex problems in ways that resemble human thinking. However, there is an ongoing debate about the extent to which LLMs are capable of actual reasoning. Central to this debate are two key probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms using these probabilistic measures. By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which it is possible to compute suitable approximations of PN and PS. 
Our research marks an important step towards gaining a deeper understanding of when LLMs are capable of reasoning, as illustrated by a series of math examples.|\n", "2408.08869": "|**2024-08-16**|**PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars**|Sumanth Prabhu et.al.|[2408.08869](http://arxiv.org/abs/2408.08869)|null|Self-ensembling techniques that rely on an accurate answer-extraction process, such as self-consistency, have achieved significant improvements in the accuracy of large language models (LLMs). However, these techniques incur a higher inference cost when aggregating multiple outputs, generating relatively more output tokens than greedy decoding. Research has shown that the free-text outputs produced by self-consistency methods can be reliably aggregated by an LLM to produce the final output. In addition, recent advances in LLM inference show that using diverse exemplars in prompts can induce diversity in LLM outputs. These proven techniques can easily be extended to self-ensembling methods to improve overall text-generation performance. 
This paper introduces PEDAL (Prompts based on Exemplar Diversity Aggregated using LLMs), a hybrid self-ensembling method. It combines the strengths of diverse-exemplar prompting and LLM-based aggregation to improve performance. Experiments on the publicly available SVAMP and ARC datasets show that PEDAL achieves better accuracy than greedy-decoding strategies, at a lower inference cost than self-consistency-based methods.|\n", "2408.08862": "|**2024-08-16**|**Visual Agents as Fast and Slow Thinkers**|Guangyan Sun et.al.|[2408.08862](http://arxiv.org/abs/2408.08862)|**[link](https://github.com/guangyans/sys2-llava)**|Achieving human-level intelligence requires refining the cognitive distinction between System 1 and System 2 thinking. Current AI, especially AI based on large language models, exhibits human-like traits but falls short of genuine cognition. In moving from structured benchmarks to real-world scenarios, visual agents face challenges that often lead to inaccurate and overconfident responses. To address this, we introduce FaST (Fast and Slow Thinking), which incorporates fast and slow thinking mechanisms into visual agents. 
FaST uses a switch adapter to dynamically select between System 1 and System 2 modes, tailoring its problem-solving approach to task complexity. When facing uncertain or unseen objects, it responds flexibly by adjusting the model's confidence and integrating new contextual data. We advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which enable FaST to mimic human cognitive processes in visual intelligence. Experimental results show that FaST reaches 80.8% accuracy on visual question answering (VQA^{v2}) and a GIoU score of 48.7% on reasoning segmentation (ReasonSeg), demonstrating its superior performance. Extensive testing validates the effectiveness and robustness of FaST's core components, showing its potential to advance cognitive visual agents in AI systems.|\n", "2408.08849": "|**2024-08-16**|**ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis**|Yubao Zhao 
et.al.|[2408.08849](http://arxiv.org/abs/2408.08849)|null|\u5728\u533b\u7597\u8f85\u52a9\u9886\u57df\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u6210\u529f\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f7f\u5f97\u60a3\u8005\u80fd\u591f\u5229\u7528\u751f\u7406\u4fe1\u53f7\u6570\u636e\u8fdb\u884c\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u901a\u7528\u7684MLLMs\u5728\u5fc3\u810f\u75c5\u8bca\u65ad\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u5728ECG\u6570\u636e\u89e3\u6790\u4e0e\u957f\u6587\u672c\u533b\u5b66\u62a5\u544a\u751f\u6210\u7684\u6574\u5408\u4e0a\uff0c\u4e3b\u8981\u539f\u56e0\u662fECG\u6570\u636e\u89e3\u6790\u7684\u590d\u6742\u6027\u4ee5\u53ca\u6587\u672c\u4e0eECG\u4fe1\u53f7\u6a21\u6001\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6a21\u578b\u5728\u957f\u6587\u672c\u751f\u6210\u65f6\u5f80\u5f80\u5b58\u5728\u4e25\u91cd\u7684\u7a33\u5b9a\u6027\u95ee\u9898\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u7f3a\u4e4f\u4e0e\u7528\u6237\u67e5\u8be2\u7d27\u5bc6\u76f8\u5173\u7684\u7cbe\u786e\u77e5\u8bc6\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aECG-Chat\u7684\u591a\u4efb\u52a1MLLM\uff0c\u4e13\u6ce8\u4e8eECG\u533b\u5b66\u62a5\u544a\u751f\u6210\uff0c\u5e76\u63d0\u4f9b\u57fa\u4e8e\u5fc3\u810f\u75c5\u5b66\u77e5\u8bc6\u7684\u8de8\u6a21\u6001\u5bf9\u8bdd\u80fd\u529b\u3002\u6211\u4eec\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\uff0c\u5c06ECG\u6ce2\u5f62\u6570\u636e\u4e0e\u6587\u672c\u62a5\u544a\u7ed3\u5408\uff0c\u4ee5\u7cbe\u7ec6\u7684\u65b9\u5f0f\u5bf9\u9f50ECG\u7279\u5f81\u4e0e\u62a5\u544a\u5185\u5bb9\u3002\u8fd9\u79cd\u65b9\u6cd5\u8fd8\u4ea7\u751f\u4e86\u4e00\u4e2a\u5728\u96f6\u6837\u672c\u62a5\u544a\u68c0\u7d22\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u7684ECG\u7f16\u7801\u5668\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u6269\u5c55\u73b0\u6709\u6570\u636e\u96c6\uff0c\u6784\u5efa\u4e86\u5305\u542b19K\u4e2aECG\u8bca\u65ad\u6570\u636e\u96c6\u548c25K\u4e2a\u591a\u8f6e\u5bf9\u8bdd\u6570\u636e\u96c6\u7528\u4e8e\u8bad\u7ec3\u548c\u5fae\u8c03ECG-Chat\uff0c\u4ece\u800c\u63d0\u4f9b\u4e13\u4e1a\u7684\u8bca\u65ad\u548c\u5bf9\u8bdd\u80fd\u529b\u3002\u6b64\u5916\uff0cECG-Chat\u53ef\u4ee5\u901a\u8fc7\u81ea\u52a8\u5316LaTeX\u751f\u6210\u7ba1\u9053\u6765\u751f\u6210\u5168\u9762\u7684ECG\u5206\u6790\u62a5\u544a\u3002\u6211\u4eec\u4e3aECG\u62a5\u544a\u751f\u6210\u4efb\u52a1\u5efa\u7acb\u4e86\u57fa\u51c6\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u7ebf\u4e0a\u6d4b\u8bd5\u4e86\u6211\u4eec\u7684\u6a21\u578b\u3002ECG-Chat\u5728\u5206\u7c7b\u3001\u68c0\u7d22\u3001\u591a\u6a21\u6001\u5bf9\u8bdd\u548c\u533b\u5b66\u62a5\u544a\u751f\u6210\u4efb\u52a1\u4e2d\u5747\u53d6\u5f97\u4e86\u6700\u4f73\u6027\u80fd\u3002\u6211\u4eec\u7684\u62a5\u544a\u6a21\u677f\u8bbe\u8ba1\u4e5f\u5f97\u5230\u4e86\u533b\u7597\u4e13\u4e1a\u4eba\u5458\u7684\u4e00\u81f4\u8ba4\u53ef\u3002|\n", "2408.08848": "|**2024-08-16**|**PsychoLex: Unveiling the Psychological Mind of Large Language Models**|Mohammad Amin Abbasi 
et.al.|[2408.08848](http://arxiv.org/abs/2408.08848)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5fc3\u7406\u5b66\u4e0e\u4eba\u5de5\u667a\u80fd\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5f00\u53d1\u548c\u8bc4\u4f30\u4e13\u7528\u4e8e\u5fc3\u7406\u4efb\u52a1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u6211\u4eec\u5f15\u5165\u4e86PsychoLex\u5957\u4ef6\uff0c\u65e8\u5728\u589e\u5f3aLLMs\u5728\u6ce2\u65af\u8bed\u548c\u82f1\u8bed\u4e2d\u7684\u5fc3\u7406\u4efb\u52a1\u5904\u7406\u80fd\u529b\u3002\u4e3b\u8981\u8d21\u732e\u5305\u62ecPsychoLexQA\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u6559\u5b66\u5185\u5bb9\u7684\u521b\u5efa\uff0c\u4ee5\u53caPsychoLexEval\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5bf9LLMs\u5728\u590d\u6742\u5fc3\u7406\u60c5\u666f\u4e0b\u7684\u4e25\u683c\u8bc4\u4f30\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4ecb\u7ecd\u4e86PsychoLexLLaMA\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u7279\u522b\u4f18\u5316\u4ee5\u9002\u7528\u4e8e\u5fc3\u7406\u5e94\u7528\uff0c\u5176\u6027\u80fd\u660e\u663e\u4f18\u4e8e\u901a\u7528\u6a21\u578b\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5b9a\u5236LLMs\u5728\u63a8\u8fdb\u5fc3\u7406\u7814\u7a76\u548c\u5e94\u7528\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u540c\u65f6\u4e5f\u6307\u51fa\u4e86\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u9886\u57df\u3002\u8fd9\u9879\u7814\u7a76\u4e3a\u5c06LLMs\u878d\u5165\u7279\u5b9a\u7684\u5fc3\u7406\u5b66\u9886\u57df\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5bf9\u672a\u6765AI\u9a71\u52a8\u7684\u5fc3\u7406\u5b9e\u8df5\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2408.08841": "|**2024-08-16**|**FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats**|Xuanliang Zhang et.al.|[2408.08841](http://arxiv.org/abs/2408.08841)|**[link](https://github.com/zhxlia/FLEXTAF)**|**\u8868\u683c\u63a8\u7406\u4efb\u52a1\u65e8\u5728\u6839\u636e\u7ed9\u5b9a\u7684\u8868\u683c\u56de\u7b54\u95ee\u9898\u3002\u76ee\u524d\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u8868\u683c\u63a8\u7406\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u73b0\u6709\u7684\u5927\u591a\u6570\u65b9\u6cd5\u90fd\u91c7\u7528\u56fa\u5b9a\u7684\u8868\u683c\u683c\u5f0f\u6765\u8868\u793a\u8868\u683c\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u6027\u80fd\u3002\u9274\u4e8e\u6bcf\u4e2a\u5b9e\u4f8b\u9700\u8981\u4e0d\u540c\u7684\u80fd\u529b\uff0c\u800c\u6a21\u578b\u5177\u6709\u4e0d\u540c\u7684\u80fd\u529b\uff0c\u6211\u4eec\u65ad\u8a00\u4e0d\u540c\u5b9e\u4f8b\u548c\u6a21\u578b\u9002\u7528\u4e8e\u4e0d\u540c\u7684\u8868\u683c\u683c\u5f0f\u3002\u901a\u8fc7\u5b9e\u9a8c\u7ed3\u679c\u7684\u5b9a\u91cf\u5206\u6790\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u4e00\u70b9\uff1a\u4f7f\u7528\u4e0d\u540c\u7684\u8868\u683c\u683c\u5f0f\uff0c\u4e0d\u540c\u5b9e\u4f8b\u548c\u6a21\u578b\u53ef\u4ee5\u83b7\u5f97\u4e0d\u540c\u7684\u6027\u80fd\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u589e\u5f3a\u8868\u683c\u63a8\u7406\u6027\u80fd\u7684\u65b9\u6cd5FLEXTAF-Single\u548cFLEXTAF-Vote\uff0c\u901a\u8fc7\u4f7f\u7528\u7075\u6d3b\u7684\u8868\u683c\u683c\u5f0f\u3002\u5177\u4f53\u6765\u8bf4\uff0c(i) FLEXTAF-Single\u8bad\u7ec3\u4e00\u4e2a\u5206\u7c7b\u5668\uff0c\u57fa\u4e8e\u5b9e\u4f8b\u548cLLM\u9884\u6d4b\u6700\u9002\u5408\u7684\u8868\u683c\u683c\u5f0f\u3002(ii)
FLEXTAF-Vote\u5728\u4e0d\u540c\u683c\u5f0f\u4e4b\u95f4\u96c6\u6210\u7ed3\u679c\u3002\u6211\u4eec\u5728WikiTableQuestions\u548cTabFact\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u4e86\u663e\u8457\u7684\u6539\u8fdb\uff0c\u4e0e\u4f7f\u7528\u56fa\u5b9a\u8868\u683c\u683c\u5f0f\u5e76\u7ed3\u5408\u8d2a\u5a6a\u89e3\u7801\u548c\u81ea\u6211\u4e00\u81f4\u6027\u89e3\u7801\u8fbe\u5230\u7684\u6700\u4f73\u6027\u80fd\u76f8\u6bd4\uff0c\u5e73\u5747\u63d0\u9ad8\u4e862.3%\u548c4.8%\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2408.08811": "|**2024-08-16**|**Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors**|Felipe A. Csaszar et.al.|[2408.08811](http://arxiv.org/abs/2408.08811)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5982\u4f55\u5f71\u54cd\u4f01\u4e1a\u6218\u7565\u51b3\u7b56\u8fc7\u7a0b\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u4f8b\u5c55\u793a\u4e86AI\u5982\u4f55\u589e\u5f3a\u73b0\u6709\u6218\u7565\u51b3\u7b56\u5de5\u5177\uff0c\u5e76\u63d0\u4f9b\u4e86\u6765\u81ea\u9886\u5148\u52a0\u901f\u5668\u8ba1\u5212\u548c\u521b\u4e1a\u7ade\u8d5b\u7684\u5b9e\u8bc1\u8bc1\u636e\uff0c\u8bc1\u660e\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u548c\u8bc4\u4f30\u7b56\u7565\u65b9\u9762\u7684\u80fd\u529b\u4e0e\u4f01\u4e1a\u5bb6\u548c\u6295\u8d44\u8005\u76f8\u5f53\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5206\u6790\u4e86\u6218\u7565\u51b3\u7b56\u80cc\u540e\u7684\u5173\u952e\u8ba4\u77e5\u8fc7\u7a0b\u2014\u2014\u641c\u7d22\u3001\u8868\u793a\u548c\u805a\u5408\uff0c\u5e76\u63d0\u51faAI\u6709\u53ef\u80fd\u63d0\u5347\u6218\u7565\u5206\u6790\u7684\u901f\u5ea6\u3001\u8d28\u91cf\u548c\u89c4\u6a21\uff0c\u540c\u65f6\u8fd8\u80fd\u542f\u7528\u5982\u865a\u62df\u6218\u7565\u6a21\u62df\u7b49\u65b0\u65b9\u6cd5\u3002\u7136\u800c\uff0cAI\u5bf9\u4f01\u4e1a\u53d1\u5c55\u7684\u5f71\u54cd\u6700\u7ec8\u53d6\u51b3\u4e8e\u7ade\u4e89\u52a8\u6001\u4ee5\u53c
aAI\u80fd\u529b\u7684\u53d1\u5c55\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5c06AI\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u5e94\u7528\u4e0e\u4f01\u4e1a\u7ed3\u679c\u8054\u7cfb\u8d77\u6765\uff0c\u5e76\u8ba8\u8bba\u4e86AI\u5982\u4f55\u91cd\u5851\u7ade\u4e89\u4f18\u52bf\u7684\u6765\u6e90\u3002\u6700\u540e\uff0c\u6211\u4eec\u8003\u8651\u4e86AI\u5982\u4f55\u65e2\u652f\u6301\u53c8\u6311\u6218\u57fa\u4e8e\u7406\u8bba\u7684\u6218\u7565\u89c2\u7684\u6838\u5fc3\u539f\u5219\u3002\u6574\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63cf\u7ed8\u4e86\u4e00\u4e2aAI\u4e0e\u6218\u7565\u9886\u57df\u6b63\u5728\u5f62\u6210\u7684\u7814\u7a76\u524d\u6cbf\u3002|\n", "2408.08808": "|**2024-08-16**|**Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge**|Ravi Raju et.al.|[2408.08808](http://arxiv.org/abs/2408.08808)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u5b66\u4e60\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u7136\u800c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u591a\u6837\u884c\u4e3a\u3002\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u7684\u4ef7\u503c\u5728\u4e8e\u5b83\u80fd\u5426\u6e05\u6670\u533a\u5206\u4e0d\u540c\u80fd\u529b\u7ea7\u522b\u7684\u6a21\u578b\uff08\u53ef\u5206\u6027\uff09\u4ee5\u53ca\u4e0e\u4eba\u7c7b\u504f\u597d\u7684\u7d27\u5bc6\u5339\u914d\u5ea6\u3002\u5f53\u524d\u7684\u6846\u67b6\u5982Alpaca-Eval 2.0 LC \\cite{dubois2024lengthcontrolledalpacaevalsimpleway} \u548cArena-Hard v0.1 
\\cite{li2024crowdsourced}\u4e3b\u8981\u5173\u6ce8\u901a\u7528\u67e5\u8be2\uff0c\u5e76\u4e14\u7f3a\u4e4f\u8de8\u6cd5\u5f8b\u3001\u533b\u5b66\u7b49\u9886\u57df\u7684\u591a\u6837\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u7ba1\u9053\uff0c\u6765\u5b9a\u5236\u4e00\u7cfb\u5217\u591a\u5143\u5316\u7684\u3001\u9488\u5bf9LLM-as-a-Judge\u6846\u67b6\u7684\u9886\u57df\u7279\u5b9a\u8bc4\u4f30\u96c6\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86\u4eba\u5de5\u7b5b\u9009\u3001\u534a\u76d1\u7763\u5b66\u4e60\u751f\u6210\u805a\u7c7b\u4ee5\u53ca\u5206\u5c42\u62bd\u6837\uff0c\u786e\u4fdd\u5728\u5e7f\u6cdb\u9886\u57df\u548c\u8bed\u8a00\u4e2d\u90fd\u6709\u5747\u8861\u7684\u4ee3\u8868\u6027\u3002\u4ea7\u751f\u7684\u8bc4\u4f30\u96c6\u5305\u62ec1573\u4e2a\u6837\u672c\uff0c\u5206\u5e03\u572814\u4e2a\u7c7b\u522b\u4e2d\uff0c\u663e\u793a\u51fa\u9ad8\u53ef\u5206\u6027\uff0884%\uff09\u548c\u5bf9\u524d\u5341\u5927\u6a21\u578b\u7684\u6027\u80fd\u5dee\u5f02\uff0c\u540c\u65f6\u4e0eChatbot Arena\u7684\u5171\u8bc6\u5ea6\uff0884%\uff09\u548cSpearman\u76f8\u5173\u7cfb\u6570\uff080.915\uff09\u4e5f\u8868\u73b0\u51fa\u826f\u597d\u7684\u4e00\u81f4\u6027\u3002\u4e0eAlpacaEval 2.0 LC\u7684\u5171\u8bc6\u5ea6\u76f8\u6bd4\uff0c\u8fd9\u4e00\u503c\u9ad8\u51fa9%\uff0c\u4e0eArena 
Hard\u76f8\u6bd4\u5219\u9ad8\u51fa20%\uff0c\u800c\u4e0eSpearman\u7cfb\u6570\u76f8\u6bd4\u5219\u662f\u4e0b\u4e00\u4e2a\u6700\u4f73\u57fa\u51c6\u76840.7\u500d\uff0c\u8fd9\u8868\u660e\u6211\u4eec\u5728\u57fa\u51c6\u6d4b\u8bd5\u7684\u6709\u6548\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f00\u6e90\u7684\u8bc4\u4f30\u5de5\u5177\uff0c\u5141\u8bb8\u7528\u6237\u81ea\u5b9a\u4e49\u7c7b\u522b\u8fdb\u884c\u7cbe\u7ec6\u5206\u6790\uff0c\u4ece\u800c\u4e3a\u5b9e\u8df5\u8005\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u6d1e\u5bdf\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u589e\u5f3aLLM\u8bc4\u4f30\u65b9\u6cd5\u7684\u900f\u660e\u5ea6\u3001\u591a\u6837\u6027\u548c\u6709\u6548\u6027\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2408.08782": "|**2024-08-16**|**EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics**|Chenwei Wan et.al.|[2408.08782](http://arxiv.org/abs/2408.08782)|**[link](https://github.com/cw-wan/EmoDynamiX-v2)**|**\u8bbe\u8ba1\u80fd\u591f\u63d0\u4f9b\u6170\u85c9\u548c\u5efa\u8bae\u7684\u5177\u6709\u60c5\u611f\u667a\u80fd\u7684\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u4ee5\u5e2e\u52a9\u90a3\u4e9b\u7ecf\u5386\u538b\u529b\u7684\u4eba\u4eec\uff0c\u662f\u4e00\u4e2a\u6781\u5177\u5438\u5f15\u529b\u7684\u7814\u7a76\u9886\u57df\u3002\u8fc7\u53bb\u7684\u7814\u7a76\u5de5\u4f5c\u7740\u91cd\u4e8e\u6784\u5efa\u6a21\u5757\u5316\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u5e76\u5c06\u5176\u793e\u4f1a\u60c5\u611f\u7b56\u7565\u9884\u6d4b\u89c6\u4e3a\u8f85\u52a9\u4efb\u52a1\uff0c\u901a\u8fc7\u5b9a\u5236\u89e3\u7801\u5668\u751f\u6210\u6761\u4ef6\u5316\u7684\u54cd\u5e94\u3002\u6700\u8fd1\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u65b9\u9762\u7684\u53d1\u5c55\u4f7f\u5f97\u65e0\u9700\u660e\u786e\u7684\u793e\u4f1a\u60c5\u611f\u7b56\u7565\u9884\u6d4b\u6b65\u9aa4\u7684\u7aef\u5230\u7aef\u5bf9\u8bdd\u4ee3\u7406\u53d8\u5f97\u6d41\u884c\u8d77\u6765\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u8bed\u
8a00\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u56fa\u6709\u7684\u504f\u597d\u504f\u89c1\uff0c\u503e\u5411\u4e8e\u67d0\u4e9b\u793e\u4f1a\u60c5\u611f\u7b56\u7565\uff0c\u963b\u788d\u4e86\u63d0\u4f9b\u9ad8\u8d28\u91cf\u60c5\u611f\u652f\u6301\u7684\u80fd\u529b\u3002 \u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff1a\u5c06\u7b56\u7565\u9884\u6d4b\u4e0e\u8bed\u8a00\u751f\u6210\u5206\u79bb\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aEmoDynamiX\u7684\u65b0\u578b\u5bf9\u8bdd\u7b56\u7565\u9884\u6d4b\u5668\u3002\u8be5\u9884\u6d4b\u5668\u5229\u7528\u5f02\u6784\u56fe\u6765\u5efa\u6a21\u7528\u6237\u60c5\u7eea\u4e0e\u7cfb\u7edf\u7b56\u7565\u4e4b\u95f4\u7684\u5bf9\u8bdd\u52a8\u6001\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4e86\u5bf9\u8bdd\u4e2d\u60c5\u611f\u8bc6\u522b\uff08ERC\uff09\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7075\u6d3b\u7684\u6df7\u5408\u60c5\u7eea\u6a21\u5757\uff0c\u4ee5\u6355\u6349\u7528\u6237\u7684\u7ec6\u5fae\u60c5\u611f\u72b6\u6001\u3002\u5728\u4e24\u4e2aESC\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEmoDynamiX\u663e\u8457\u8d85\u8d8a\u4e86\u5148\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002**|\n", "2408.08780": "|**2024-08-16**|**Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions**|Chenming Tang
et.al.|[2408.08780](http://arxiv.org/abs/2408.08780)|null|\u901a\u8fc7\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728ICL\u8fc7\u7a0b\u4e2d\u63cf\u8ff0\u6027\u6307\u4ee4\u7684\u4f5c\u7528\u4ecd\u7136\u6709\u5f85\u63a2\u7d22\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u96c6\u6210\u63d0\u793a\u6846\u67b6\uff0c\u7528\u4e8e\u63cf\u8ff0\u591a\u4e2a\u4e0a\u4e0b\u6587\u793a\u4f8b\u7684\u9009\u62e9\u6807\u51c6\uff0c\u5e76\u5728\u516d\u4e2a\u7ffb\u8bd1\u65b9\u5411\u7684\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u4efb\u52a1\u4e0a\u7684\u521d\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u8fd9\u79cd\u6846\u67b6\u80fd\u591f\u63d0\u5347ICL\u6027\u80fd\u3002\u51fa\u4e4e\u610f\u6599\u7684\u662f\uff0cLLM\u53ef\u80fd\u5e76\u4e0d\u5173\u5fc3\u63cf\u8ff0\u7684\u5177\u4f53\u5185\u5bb9\uff0c\u6027\u80fd\u63d0\u5347\u4e3b\u8981\u6e90\u4e8e\u96c6\u6210\u683c\u5f0f\uff0c\u5373\u4f7f\u4f7f\u7528\u968f\u673a\u63cf\u8ff0\u540d\u8bcd\uff0c\u8be5\u6846\u67b6\u4e5f\u80fd\u5e26\u6765\u6539\u8fdb\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u5728\u5e38\u8bc6\u3001\u6570\u5b66\u3001\u903b\u8f91\u63a8\u7406\u548c\u5e7b\u89c9\u4efb\u52a1\u4e0a\u5e94\u7528\u4e86\u8fd9\u79cd\u65b0\u7684\u96c6\u6210\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u4e09\u79cdLLM\u53d6\u5f97\u4e86\u6709\u5e0c\u671b\u7684\u7ed3\u679c\uff0c\u8fd9\u518d\u6b21\u8868\u660e\u8bbe\u8ba1\u9002\u5f53\u7684\u63d0\u793a\u683c\u5f0f\u6bd4\u4e13\u6ce8\u4e8e\u7279\u5b9a\u63cf\u8ff0\u66f4\u4e3a\u6709\u6548\u548c\u9ad8\u6548\u3002\u5728\u8bba\u6587\u53d1\u8868\u540e\uff0c\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.08779": "|**2024-08-16**|**DAC: Decomposed Automation Correction for Text-to-SQL**|Dingzirui Wang 
et.al.|[2408.08779](http://arxiv.org/abs/2408.08779)|**[link](https://github.com/zirui-HIT/DAC)**|**\u6587\u672c\u5230SQL\u662f\u4e00\u4e2a\u91cd\u8981\u7684\u4efb\u52a1\uff0c\u5b83\u901a\u8fc7\u81ea\u52a8\u751f\u6210SQL\u67e5\u8be2\u5e2e\u52a9\u4eba\u4eec\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u3002\u8003\u8651\u5230\u51fa\u8272\u7684\u6027\u80fd\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65b9\u6cd5\u6210\u4e3a\u4e86\u6587\u672c\u5230SQL\u7684\u4e3b\u6d41\u65b9\u5f0f\u3002\u5728\u8fd9\u7c7b\u65b9\u6cd5\u4e2d\uff0c\u81ea\u52a8\u4fee\u6b63\u6210\u4e3a\u4e00\u79cd\u6709\u6548\u624b\u6bb5\uff0c\u80fd\u591f\u901a\u8fc7\u7ea0\u6b63\u751f\u6210\u7ed3\u679c\u4e2d\u7684\u9519\u8bef\u6765\u8fdb\u4e00\u6b65\u63d0\u5347\u6027\u80fd\u3002\u73b0\u6709\u4fee\u6b63\u65b9\u6cd5\u8981\u6c42LLM\u76f4\u63a5\u5bf9\u751f\u6210\u7684SQL\u8fdb\u884c\u4fee\u6b63\uff0c\u800c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u5e76\u4e0d\u77e5\u9053\u5982\u4f55\u68c0\u6d4b\u9519\u8bef\uff0c\u5bfc\u81f4\u4e86\u8f83\u5dee\u7684\u6027\u80fd\u3002\u56e0\u6b64\uff0c\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u91c7\u7528\u5206\u89e3\u5f0f\u4fee\u6b63\u6765\u589e\u5f3a\u6587\u672c\u5230SQL\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5206\u89e3\u5f0f\u4fee\u6b63\u4f18\u4e8e\u76f4\u63a5\u4fee\u6b63\uff0c\u56e0\u4e3a\u4e0eSQL\u76f8\u6bd4\uff0c\u901a\u8fc7\u7ed3\u679c\u5206\u89e3\u5b50\u4efb\u52a1\u6765\u68c0\u6d4b\u548c\u4fee\u590d\u9519\u8bef\u66f4\u4e3a\u5bb9\u6613\u3002\u57fa\u4e8e\u8fd9\u4e00\u5206\u6790\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5206\u89e3\u81ea\u52a8\u5316\u4fee\u6b63\uff08DAC\uff09\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u5c06\u6587\u672c\u5230SQL\u5206\u89e3\u4e3a\u5b9e\u4f53\u94fe\u63a5\u548c\u9aa8\u67b6\u89e3\u6790\u4e24\u4e2a\u5b50\u4efb\u52a1\u6765\u4fee\u6b63SQL\u3002DAC\u9996\u5148\u751f\u6210\u4e0e\u95ee\u9898\u5bf9\u5e94\u7684\u5b9e\u4f53\u548c\u9aa8\u67b6\uff0c\u7136\u540e\u6bd4\
u8f83\u521d\u59cbSQL\u4e0e\u751f\u6210\u7684\u5b9e\u4f53\u548c\u9aa8\u67b6\u4e4b\u95f4\u7684\u5dee\u5f02\u4f5c\u4e3a\u4fee\u6b63\u53cd\u9988\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spider\u3001Bird\u548cKaggleDBQA\u4e0a\u7684\u5e73\u5747\u6027\u80fd\u63d0\u9ad8\u4e863.7%\uff0c\u8bc1\u660e\u4e86DAC\u7684\u6709\u6548\u6027\u3002**|\n", "2408.10197": "|**2024-08-19**|**Demystifying the Communication Characteristics for Distributed Transformer Models**|Quentin Anthony et.al.|[2408.10197](http://arxiv.org/abs/2408.10197)|null|\u6df1\u5ea6\u5b66\u4e60\uff08DL\uff09\u6a21\u578b\u57fa\u4e8e\u53d8\u6362\u5668\u67b6\u6784\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3001\u89c6\u89c9\u53d8\u6362\u5668\u3001\u97f3\u9891\u751f\u6210\u548c\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u7b49\u4f17\u591aDL\u5e94\u7528\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u8fdb\u5c55\u3002\u8fd9\u4e00\u7cfb\u5217\u8fdb\u6b65\u5f88\u5927\u7a0b\u5ea6\u4e0a\u5f97\u76ca\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\uff0c\u7136\u800c\u5206\u5e03\u5f0f\u901a\u4fe1\u4ecd\u7136\u662f\u5f71\u54cd\u8bad\u7ec3\u8fdb\u5ea6\u7684\u4e00\u4e2a\u91cd\u5927\u74f6\u9888\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u53d8\u6362\u5668\u6a21\u578b\u7684\u901a\u4fe1\u884c\u4e3a\uff0c\u5373\u5728\u4f7f\u7528\u591a\u8282\u70b9/\u591aGPU 
DL\u8bad\u7ec3\u65f6\uff0c\u4e0d\u540c\u5e76\u884c\u65b9\u6848\u5982\u4f55\u5728\u53d8\u6362\u5668\u80cc\u666f\u4e0b\u8fdb\u884c\u6570\u636e\u901a\u4fe1\u3002\u6211\u4eec\u4ee5GPT\u4e3a\u57fa\u7840\u7684\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u53d8\u6362\u5668\u67b6\u6784\u6848\u4f8b\u7814\u7a76\u7684\u4e3b\u8981\u5bf9\u8c61\uff0c\u7531\u4e8e\u5176\u5e7f\u6cdb\u7684\u5e94\u7528\u800c\u88ab\u9009\u4e2d\u3002\u901a\u8fc7\u6211\u4eec\u7684\u901a\u4fe1\u65e5\u5fd7\u9a8c\u8bc1\u4e86\u6240\u83b7\u5f97\u7684\u5b9e\u9a8c\u7ed3\u679c\uff0c\u5e76\u4f7f\u7528\u5206\u6790\u6a21\u578b\u5bf9\u8fd9\u4e9b\u7ed3\u679c\u8fdb\u884c\u4e86\u786e\u8ba4\u3002 \u603b\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u5206\u6790\u63ed\u793a\u4e86\u8fdb\u4e00\u6b65\u4f18\u5316\u5c0f\u6d88\u606f\u70b9\u5230\u70b9\u901a\u4fe1\u7684\u5fc5\u8981\u6027\u3001\u5e8f\u5217\u957f\u5ea6\u3001\u6bcfGPU\u541e\u5410\u91cf\u3001\u6a21\u578b\u5927\u5c0f\u4ee5\u53ca\u6240\u7528\u4f18\u5316\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u4ee5\u53ca\u5728\u6846\u67b6\u548c\u9ad8\u6027\u80fd\u8ba1\u7b97\u4e2d\u95f4\u4ef6\u8bbe\u8ba1\u4e0e\u4f18\u5316\u65b9\u9762\u53ef\u80fd\u9700\u8981\u5f15\u5bfc\u7684\u8fdb\u4e00\u6b65\u4f18\u5316\u65b9\u5411\u3002|\n", "2408.10174": "|**2024-08-19**|**SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models**|Anke Tang 
et.al.|[2408.10174](http://arxiv.org/abs/2408.10174)|**[link](https://github.com/tanganke/fusion_bench)**|**\u6df1\u5ea6\u6a21\u578b\u5728\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e0a\u7684\u8bad\u7ec3\u65e5\u76ca\u53d8\u5f97\u6210\u672c\u9ad8\u6602\uff0c\u8fd9\u4fc3\u4f7f\u4eba\u4eec\u5e7f\u6cdb\u91c7\u7528\u6df1\u5ea6\u6a21\u578b\u878d\u5408\u6280\u672f\uff0c\u4ee5\u5229\u7528\u73b0\u6709\u6a21\u578b\u7684\u77e5\u8bc6\u3002\u4ece\u7b80\u5355\u7684\u6743\u91cd\u5e73\u5747\u5230\u66f4\u590d\u6742\u7684AdaMerging\u7b49\u65b9\u6cd5\uff0c\u6a21\u578b\u878d\u5408\u80fd\u591f\u6709\u6548\u63d0\u5347\u6a21\u578b\u6027\u80fd\uff0c\u5e76\u52a0\u901f\u65b0\u6a21\u578b\u7684\u5f00\u53d1\u3002\u7136\u800c\uff0c\u4e2a\u4f53\u6a21\u578b\u53c2\u6570\u95f4\u7684\u76f8\u4e92\u5e72\u6270\u4ee5\u53ca\u878d\u5408\u8fc7\u7a0b\u7684\u53ef\u89e3\u91ca\u6027\u4e0d\u8db3\u4ecd\u7136\u662f\u6311\u6218\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u8bd5\u56fe\u901a\u8fc7\u8bc4\u4f30\u53c2\u6570\u5c5e\u6027\uff08\u5982\u5927\u5c0f\u6216\u7b26\u53f7\uff09\u6216\u8fdb\u884c\u53c2\u6570\u4fee\u526a\u6765\u89e3\u51b3\u53c2\u6570\u5e72\u6270\u95ee\u9898\u3002\u672c\u7814\u7a76\u9996\u5148\u4ece\u7ebf\u6027\u5c42\u5fae\u8c03\u7684\u89d2\u5ea6\u51fa\u53d1\uff0c\u901a\u8fc7\u5b50\u7a7a\u95f4\u5206\u6790\u660e\u786e\u5730\u5b9a\u4e49\u4e86\u53c2\u6570\u5e72\u6270\u4f5c\u4e3a\u4f18\u5316\u95ee\u9898\uff0c\u4ee5\u63ed\u793a\u8fd9\u4e00\u4e3b\u9898\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u96f6\u6837\u672c\u7a00\u758f\u6df7\u5408\u4f4e\u79e9\u4e13\u5bb6\uff08SMILE\uff09\u6784\u9020\u7684\u521b\u65b0\u6a21\u578b\u878d\u5408\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5141\u8bb8\u5728\u65e0\u9700\u989d\u5916\u6570\u636e\u6216\u8fdb\u4e00\u6b65\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u5c06\u6e90\u6a21\u578b\u5347\u7ea7\u4e3a\u6df7\u5408\u4e13\u5bb6\u6a21\u578b\uff08MoE\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u4ee5\u4e0b\u89c2\u5bdf\uff1a\u5fae\u8c03\u4e3b\u8981\u4fdd\u
7559\u4e86\u9884\u8bad\u7ec3\u7684\u91cd\u8981\u90e8\u5206\uff0c\u4f46\u4f7f\u7528\u8f83\u5c11\u91cd\u8981\u6216\u672a\u4f7f\u7528\u7684\u533a\u57df\u6765\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u5728\u539f\u59cb\u53c2\u6570\u7a7a\u95f4\u4e2d\u56fa\u6709\u7684\u53c2\u6570\u5e72\u6270\u95ee\u9898\uff0c\u53ef\u4ee5\u901a\u8fc7\u6269\u5c55\u7ef4\u5ea6\u6765\u7ba1\u7406\u3002\u6211\u4eec\u5728\u591a\u79cd\u573a\u666f\u4e0b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5305\u62ec\u56fe\u50cf\u5206\u7c7b\u548c\u6587\u672c\u6cdb\u5316\u4efb\u52a1\uff0c\u4f7f\u7528\u5168\u91cf\u5fae\u8c03\u548cLoRA\u5fae\u8c03\uff0c\u5e76\u5c06\u6211\u4eec\u7684\u65b9\u6cd5\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08CLIP\u6a21\u578b\u3001Flan-T5\u6a21\u578b\u548cMistral-7B\u6a21\u578b\uff09\uff0c\u7a81\u51fa\u4e86SMILE\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8ehttps://github.com/tanganke/fusion_bench**|\n", "2408.10159": "|**2024-08-19**|**Customizing Language Models with Instance-wise LoRA for Sequential Recommendation**|Xiaoyu Kong 
et.al.|[2408.10159](http://arxiv.org/abs/2408.10159)|**[link](https://github.com/akalikong/ilora)**|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u77e5\u8bc6\u7406\u89e3\u548c\u63a8\u7406\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u8bed\u8a00\u751f\u6210\u8303\u5f0f\u5c06LLM\u5e94\u7528\u4e8e\u5e8f\u5217\u63a8\u8350\u7cfb\u7edf\u4e2d\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5c06\u7528\u6237\u884c\u4e3a\u5e8f\u5217\u8f6c\u6362\u4e3aLLM\u5fae\u8c03\u7684\u63d0\u793a\uff0c\u5229\u7528LoRA\u6a21\u5757\u6765\u7ec6\u5316\u63a8\u8350\u3002\u7136\u800c\uff0c\u5728\u4e0d\u540c\u7528\u6237\u884c\u4e3a\u4e4b\u95f4\u8fdb\u884c\u7edf\u4e00\u5e94\u7528\u65f6\uff0cLoRA\u6709\u65f6\u65e0\u6cd5\u6355\u6349\u5230\u4e2a\u4f53\u5dee\u5f02\u6027\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u4ee5\u53ca\u5728\u4e0d\u540c\u884c\u4e3a\u5e8f\u5217\u95f4\u7684\u8d1f\u8fc1\u79fb\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684LoRA\uff08iLoRA\uff09\uff0c\u5b83\u7ed3\u5408\u4e86LoRA\u4e0e\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6846\u67b6\u3002iLoRA\u521b\u5efa\u4e86\u4e00\u4e2a\u591a\u6837\u5316\u7684\u4e13\u5bb6\u96c6\u5408\uff0c\u6bcf\u4e2a\u4e13\u5bb6\u90fd\u80fd\u591f\u6355\u83b7\u7279\u5b9a\u7684\u7528\u6237\u504f\u597d\u65b9\u9762\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u7531\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u5f15\u5bfc\u7684\u95e8\u63a7\u51fd\u6570\u3002\u8be5\u95e8\u63a7\u51fd\u6570\u5904\u7406\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u4ee5\u751f\u6210\u589e\u5f3a\u8868\u793a\uff0c\u4ece\u800c\u6307\u5bfc\u95e8\u63a7\u7f51\u7edc\u8f93\u51fa\u5b9a\u5236\u7684\u4e13\u5bb6\u53c2\u4e0e\u6743\u91cd\u3002\u8fd9\u79cd\u5b9a\u5236\u5316\u7684\u65b9\u6cd5\u53ef\u4ee5\u51cf\u5c11\u8d1f\u8fc1\u79fb\u5e76\u52a8\u6001\u9002\u5e94\u591a\u6837\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u5728\u4e09\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\u
4e86iLoRA\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u6355\u6349\u7528\u6237\u7279\u5b9a\u504f\u597d\u548c\u63d0\u9ad8\u63a8\u8350\u51c6\u786e\u5ea6\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2408.10151": "|**2024-08-19**|**Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models**|Amey Hengle et.al.|[2408.10151](http://arxiv.org/abs/2408.10151)|**[link](https://github.com/AmeyHengle/multilingual-needle-in-a-haystack)**|\u5728\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u793a\u4e86\u5728\u591a\u79cd\u8bed\u8a00\u4e2d\u54cd\u5e94\u67e5\u8be2\u7684\u80fd\u529b\u4e4b\u540e\uff0c\u5b83\u4eec\u5904\u7406\u957f\u591a\u8bed\u8a00\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002\u56e0\u6b64\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u8bc4\u4f30LLM\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u81f3\u5173\u91cd\u8981\uff0c\u7279\u522b\u662f\u5728\u4fe1\u606f\u68c0\u7d22\u7684\u80cc\u666f\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u591a\u8bed\u8a00\u9488\u5728\u8349\u5806\u4e2d\u7684\u6d4b\u8bd5\uff08MultiLingual 
Needle-in-a-Haystack\uff0c\u7b80\u79f0MLNeedle\uff09\uff0c\u65e8\u5728\u8bc4\u4f30\u6a21\u578b\u4ece\u591a\u8bed\u8a00\u5e72\u6270\u6587\u672c\u96c6\u5408\uff08\u8349\u5806\uff09\u4e2d\u68c0\u7d22\u76f8\u5173\u4fe1\u606f\uff08\u9488\uff09\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u6d4b\u8bd5\u6269\u5c55\u4e86\u591a\u8bed\u8a00\u95ee\u7b54\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u5355\u8bed\u8a00\u548c\u8de8\u8bed\u8a00\u68c0\u7d22\u3002\u6211\u4eec\u5bf9\u5f53\u524d\u7684\u56db\u5927\u5148\u8fdbLLM\u8fdb\u884c\u4e86MLNeedle\u6d4b\u8bd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u6a21\u578b\u6027\u80fd\u5728\u4e0d\u540c\u8bed\u8a00\u548c\u9488\u7684\u4f4d\u7f6e\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5f53\u9488\u4f4d\u4e8e\u82f1\u8bed\u8bed\u7cfb\u4e4b\u5916\u7684\u8bed\u8a00\u4e2d\u4ee5\u53ca\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u4e2d\u95f4\u4f4d\u7f6e\u65f6\uff0c\u6a21\u578b\u7684\u6027\u80fd\u6700\u4f4e\u3002\u6b64\u5916\uff0c\u5c3d\u7ba1\u67d0\u4e9b\u6a21\u578b\u58f0\u79f0\u5177\u6709\u9ad8\u8fbe8k\u4e2a\u4ee4\u724c\u7684\u4e0a\u4e0b\u6587\u5927\u5c0f\uff0c\u4f46\u5728\u4e0a\u4e0b\u6587\u957f\u5ea6\u589e\u52a0\u65f6\uff0c\u5b83\u4eec\u90fd\u6ca1\u6709\u8868\u73b0\u51fa\u6ee1\u610f\u7684\u8de8\u8bed\u8a00\u68c0\u7d22\u6027\u80fd\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f9b\u4e86\u5173\u4e8eLLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u7684\u5173\u952e\u89c1\u89e3\uff0c\u4ee5\u6307\u5bfc\u672a\u6765\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u7814\u7a76LLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7684\u957f\u4e0a\u4e0b\u6587\u884c\u4e3a\u3002|\n", "2408.10147": "|**2024-08-19**|**In-Context Learning with Representations: Contextual Generalization of Trained Transformers**|Tong Yang 
et.al.|[2408.10147](http://arxiv.org/abs/2408.10147)|null|\u672c\u6587\u901a\u8fc7\u975e\u7ebf\u6027\u56de\u5f52\u4efb\u52a1\u7684\u89c6\u89d2\u6765\u63a2\u8ba8Transformer\u5728\u68af\u5ea6\u4e0b\u964d\u8fc7\u7a0b\u4e2d\u7684\u8bad\u7ec3\u52a8\u6001\u3002\u5728\u6b64\u7c7b\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u5b66\u4e60\u6bcf\u4e2a\u4efb\u52a1\u7684\u6a21\u677f\u51fd\u6570\u5b9e\u73b0\u4e0a\u4e0b\u6587\u6cdb\u5316\uff0c\u6240\u6709\u6a21\u677f\u51fd\u6570\u90fd\u4f4d\u4e8e\u5305\u542b$m$\u4e2a\u57fa\u51fd\u6570\u7684\u7ebf\u6027\u7a7a\u95f4\u5185\u3002\u6211\u4eec\u5bf9\u5355\u5c42\u591a\u5934Transformer\u8fdb\u884c\u4e86\u5206\u6790\uff0c\u4ee5\u5728\u90e8\u5206\u6807\u8bb0\u63d0\u793a\u4e0b\u9884\u6d4b\u672a\u6807\u8bb0\u8f93\u5165\u7684\u4e0a\u4e0b\u6587\u5185\u9884\u6d4b\u80fd\u529b\uff0c\u5176\u4e2d\u6807\u7b7e\u5305\u542b\u9ad8\u65af\u566a\u58f0\uff0c\u6bcf\u4e2a\u63d0\u793a\u4e2d\u7684\u793a\u4f8b\u6570\u91cf\u4e0d\u8db3\u4ee5\u786e\u5b9a\u6a21\u677f\u3002 \u5728\u6e29\u548c\u5047\u8bbe\u4e0b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5355\u5c42\u591a\u5934Transformer\u7684\u8bad\u7ec3\u635f\u5931\u4f1a\u7ebf\u6027\u6536\u655b\u81f3\u5168\u5c40\u6700\u5c0f\u503c\u3002\u6b64\u5916\uff0cTransformer\u6709\u6548\u5730\u5b66\u4e60\u4e86\u5728\u57fa\u51fd\u6570\u4e0a\u8fdb\u884c\u5cad\u56de\u5f52\u7684\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u901a\u8fc7\u7406\u8bba\u8bc1\u660e\u5c55\u793a\u4e86\u5f53\u63d0\u793a\u4ec5\u5305\u542b\u5c11\u91cf\u67e5\u8be2-\u7b54\u6848\u5bf9\u65f6\uff0cTransformer\u80fd\u591f\u5b66\u4e60\u4e0a\u4e0b\u6587\u4fe1\u606f\uff08\u5373\u6a21\u677f\uff09\u4ee5\u5bf9\u672a\u89c1\u8fc7\u7684\u793a\u4f8b\u548c\u4efb\u52a1\u8fdb\u884c\u6cdb\u5316\u3002|\n", "2408.10141": "|**2024-08-19**|**Instruction Finetuning for Leaderboard Generation from Empirical AI Research**|Salomon Kabongo 
et.al.|[2408.10141](http://arxiv.org/abs/2408.10141)|null|\u672c\u6587\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6307\u4ee4\u5fae\u8c03\u5728\u81ea\u52a8\u5316\u751f\u6210AI\u7814\u7a76\u6392\u884c\u699c\u4e2d\u7684\u5e94\u7528\uff0c\u4ece\u6587\u7ae0\u4e2d\u63d0\u53d6\uff08\u4efb\u52a1\uff0c\u6570\u636e\u96c6\uff0c\u6307\u6807\uff0c\u5206\u6570\uff09\u56db\u5143\u7ec4\u3002\u8be5\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u4ece\u4f20\u7edf\u7684\u3001\u57fa\u4e8e\u793e\u533a\u7684\u624b\u52a8\u6574\u7406\u8f6c\u53d8\u4e3a\u5229\u7528\u81ea\u52a8\u5316\u3001\u751f\u6210\u5f0fLLM\u65b9\u6cd5\u6765\u7b80\u5316AI\u7814\u7a76\u8fdb\u5c55\u7684\u4f20\u64ad\uff0c\u4ece\u800c\u8d85\u8d8a\u4f9d\u8d56\u4e8e\u7279\u5b9a\u5206\u7c7b\u7684\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u6a21\u578b\u7684\u4f20\u7edf\u65b9\u5f0f\u3002\u901a\u8fc7\u5229\u7528FLAN-T5\u6a21\u578b\uff0c\u672c\u7814\u7a76\u589e\u5f3a\u4e86LLMs\u5728\u4fe1\u606f\u62bd\u53d6\u65b9\u9762\u7684\u9002\u5e94\u6027\u548c\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u6784\u5efa\u7ed3\u6784\u5316\u77e5\u8bc6\u8868\u793a\u3002|\n", "2408.10124": "|**2024-08-19**|**Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models**|Tianyu Zhang 
et.al.|[2408.10124](http://arxiv.org/abs/2408.10124)|**[link](https://github.com/zhangtia16/molgraph-lardo)**|**\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u662f\u836f\u7269\u53d1\u73b0\u7684\u57fa\u7840\u3002\u8fd1\u5e74\u6765\uff0c\u9884\u8bad\u7ec3\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u5f97\u5230\u4e86\u5e7f\u6cdb\u5e94\u7528\uff0c\u5e76\u53d6\u5f97\u4e86\u663e\u8457\u6210\u679c\u3002\u4e00\u4e9b\u5c06\u751f\u7269\u5316\u5b66\u9886\u57df\u7684\u5148\u9a8c\u77e5\u8bc6\u878d\u5165\u9884\u8bad\u7ec3\u6846\u67b6\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u751f\u7269\u5316\u5b66\u4e13\u5bb6\uff0c\u83b7\u53d6\u548c\u603b\u7ed3\u5927\u91cf\u7684\u9886\u57df\u77e5\u8bc6\u6587\u732e\u65e2\u8017\u65f6\u53c8\u6602\u8d35\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u5e76\u9ad8\u6548\u63d0\u4f9b\u901a\u7528\u77e5\u8bc6\u65b9\u9762\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5076\u5c14\u4f1a\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u7f3a\u4e4f\u751f\u6210\u7279\u5b9a\u9886\u57df\u77e5\u8bc6\u7684\u7cbe\u786e\u6027\u3002\u4e0e\u6b64\u76f8\u53cd\uff0c\u9886\u57df\u7279\u5b9a\u5c0f\u578b\u6a21\u578b\uff08DSMs\uff09\u62e5\u6709\u4e30\u5bcc\u7684\u9886\u57df\u77e5\u8bc6\uff0c\u80fd\u591f\u51c6\u786e\u8ba1\u7b97\u4e0e\u5206\u5b50\u9886\u57df\u76f8\u5173\u7684\u6307\u6807\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5b83\u4eec\u7684\u6a21\u578b\u5927\u5c0f\u6709\u9650\u4e14\u529f\u80fd\u5355\u4e00\uff0c\u5b83\u4eec\u7f3a\u4e4f\u5168\u9762\u7684\u8868\u793a\u5b66\u4e60\u6240\u9700\u7684\u5e7f\u6cdb\u77e5\u8bc6\u3002\u4e3a\u4e86\u5728\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u4e2d\u5145\u5206\u5229\u7528\u4e24\u79cd\u65b9\u6cd5\u7684\u4f18\u52bf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMolGraph-LarDo\u7684\u65b0\u578b\u5206\u5b50\u56fe\u8868\u793a\u5b66\u4e60\u6846
\u67b6\uff0c\u8be5\u6846\u67b6\u878d\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u9886\u57df\u7279\u5b9a\u5c0f\u578b\u6a21\u578b\u3002\u6280\u672f\u4e0a\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e24\u9636\u6bb5\u63d0\u793a\u7b56\u7565\uff0c\u5176\u4e2d\u5f15\u5165DSMs\u6765\u6821\u51c6LLMs\u63d0\u4f9b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u589e\u5f3a\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u7684\u51c6\u786e\u6027\uff0c\u4f7fLLMs\u80fd\u591f\u4e3a\u5206\u5b50\u6837\u672c\u751f\u6210\u66f4\u7cbe\u786e\u7684\u6587\u5b57\u63cf\u8ff0\u3002\u968f\u540e\uff0c\u6211\u4eec\u91c7\u7528\u591a\u6a21\u6001\u5bf9\u9f50\u65b9\u6cd5\u534f\u8c03\u5305\u62ec\u5206\u5b50\u56fe\u53ca\u5176\u5bf9\u5e94\u63cf\u8ff0\u6587\u672c\u5728\u5185\u7684\u5404\u79cd\u6a21\u6001\uff0c\u4ee5\u6307\u5bfc\u5206\u5b50\u8868\u793a\u7684\u9884\u8bad\u7ec3\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2408.10111": "|**2024-08-20**|**PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities**|Yuanjian Xu et.al.|[2408.10111](http://arxiv.org/abs/2408.10111)|null|\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u5efa\u6a21\u5bf9\u4e8e\u7406\u89e3\u4e0e\u9884\u6d4b\u5e02\u573a\u884c\u4e3a\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9762\u4e34\u7740\u975e\u7ebf\u6027\u3001\u975e\u5e73\u7a33\u6027\u548c\u9ad8\u566a\u58f0\u7b49\u6311\u6218\u3002\u4f20\u7edf\u7684\u6a21\u578b\u5728\u6355\u6349\u590d\u6742\u6a21\u5f0f\u65f6\u53d7\u5230\u8fd9\u4e9b\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u540c\u65f6\u53d7\u5230\u8ba1\u7b97\u8d44\u6e90\u548c\u6a21\u578b\u5bb9\u91cf\u7684\u9650\u5236\u3002\u53d7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a$\\textbf{PLUTUS}$\u7684\u6a21\u578b\uff0c\u5176\u5168\u79f0\u4e3a$\\textbf{P}$re-trained $\\textbf{L}$arge $\\textbf{U}$nified 
$\\textbf{T}$ransformer-based\u6a21\u578b\uff0c\u7528\u4e8e\u63ed\u793a\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u4e2d\u7684\u89c4\u5f8b\u3002$\\textbf{PLUTUS}$\u901a\u8fc7\u7ed3\u5408\u53ef\u9006\u5d4c\u5165\u6a21\u5757\u3001\u5bf9\u6bd4\u5b66\u4e60\u548c\u81ea\u52a8\u7f16\u7801\u6280\u672f\uff0c\u521b\u5efa\u4e86\u539f\u59cb\u6570\u636e\u4e0e\u5757\u5d4c\u5165\u4e4b\u95f4\u7684\u8fd1\u4f3c\u4e00\u4e00\u6620\u5c04\u3002 TimeFormer\uff0c\u4e00\u4e2a\u57fa\u4e8e\u6ce8\u610f\u529b\u7684\u67b6\u6784\uff0c\u6784\u6210\u4e86$\\textbf{PLUTUS}$\u7684\u6838\u5fc3\uff0c\u6709\u6548\u5730\u5904\u7406\u4e86\u9ad8\u566a\u58f0\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6ce8\u610f\u529b\u673a\u5236\uff0c\u4ee5\u8de8\u53d8\u91cf\u548c\u65f6\u95f4\u7ef4\u5ea6\u6355\u83b7\u7279\u5f81\u3002$\\textbf{PLUTUS}$\u5728\u89c4\u6a21\u7a7a\u524d\u76841000\u4ebf\u4e2a\u89c2\u5bdf\u503c\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u65e8\u5728\u9002\u5e94\u5608\u6742\u7684\u91d1\u878d\u5e02\u573a\u73af\u5883\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c$\\textbf{PLUTUS}$\u662f\u9996\u4e2a\u5f00\u6e90\u7684\u3001\u5927\u89c4\u6a21\u7684\u9884\u8bad\u7ec3\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u6a21\u578b\uff0c\u53c2\u6570\u8d85\u8fc7\u5341\u4ebf\u4e2a\u3002\u5b83\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u8fc1\u79fb\u6027\uff0c\u5e76\u4e3a\u91d1\u878d\u9886\u57df\u5efa\u7acb\u4e86\u4e00\u4e2a\u575a\u5b9e\u7684\u57fa\u7840\u6a21\u578b\u3002\u6211\u4eec\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u9884\u8bad\u7ec3\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u7684\u6280\u672f\u6307\u5bfc\uff0c\u786e\u7acb\u4e86\u8be5\u9886\u57df\u7684\u5168\u65b0\u6807\u51c6\u3002|\n", "2408.10086": "|**2024-08-19**|**ARMADA: Attribute-Based Multimodal Data Augmentation**|Xiaomeng Jin 
et.al.|[2408.10086](http://arxiv.org/abs/2408.10086)|null|\u5728\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLMs\uff09\u4e2d\uff0c\u624b\u52a8\u6807\u6ce8\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf-\u6587\u672c\u914d\u5bf9\u6570\u636e\u4ee5\u8fdb\u884c\u5fae\u8c03\u548c\u5bf9\u9f50\u7684\u6210\u672c\u975e\u5e38\u9ad8\u3002\u5c3d\u7ba1\u73b0\u6709\u7684\u591a\u6a21\u6001\u6570\u636e\u589e\u5f3a\u6846\u67b6\u63d0\u51fa\u4e86\u589e\u5f3a\u56fe\u50cf-\u6587\u672c\u914d\u5bf9\u7684\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u8981\u4e48\u5728\u6587\u672c\u548c\u56fe\u50cf\u4e4b\u95f4\u5b58\u5728\u8bed\u4e49\u4e0d\u4e00\u81f4\uff0c\u8981\u4e48\u751f\u6210\u4e0d\u5207\u5b9e\u9645\u7684\u56fe\u50cf\uff0c\u5bfc\u81f4\u4e0e\u73b0\u5b9e\u4e16\u754c\u793a\u4f8b\u7684\u77e5\u8bc6\u5dee\u8ddd\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAttribute-based Multimodal Data Augmentation (ARMADA)\u7684\u65b0\u578b\u591a\u6a21\u6001\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\uff0c\u901a\u8fc7\u77e5\u8bc6\u5f15\u5bfc\u7684\u63d0\u53ca\u5b9e\u4f53\u89c6\u89c9\u5c5e\u6027\u7684\u4fee\u6539\u6765\u589e\u5f3a\u6570\u636e\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u539f\u59cb\u6587\u672c\u6570\u636e\u4e2d\u63d0\u53d6\u5b9e\u4f53\u53ca\u5176\u89c6\u89c9\u5c5e\u6027\uff0c\u7136\u540e\u5728\u77e5\u8bc6\u5e93\uff08KBs\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6307\u5bfc\u4e0b\u641c\u7d22\u89c6\u89c9\u5c5e\u6027\u7684\u66ff\u4ee3\u503c\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5229\u7528\u56fe\u50cf\u7f16\u8f91\u6a21\u578b\u6839\u636e\u63d0\u53d6\u7684\u5c5e\u6027\u7f16\u8f91\u56fe\u50cf\u3002ARMADA\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u591a\u6a21\u6001\u6570\u636e\u751f\u6210\u6846\u67b6\uff1a(i) 
\u4ece\u7b26\u53f7\u77e5\u8bc6\u5e93\u4e2d\u63d0\u53d6\u77e5\u8bc6\u5173\u8054\u7684\u5c5e\u6027\uff0c\u5b9e\u73b0\u8bed\u4e49\u4e00\u81f4\u4e14\u5177\u6709\u533a\u522b\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u751f\u6210\uff1b(ii) \u5229\u7528\u77e5\u8bc6\u5e93\u5c42\u6b21\u7ed3\u6784\u4e2d\u7684\u540c\u7c7b\u522b\u5b9e\u4f53\u751f\u6210\u89c6\u89c9\u4e0a\u76f8\u4f3c\u4f46\u4e0d\u540c\u7c7b\u522b\u7684\u56fe\u50cf\uff1b(iii) \u4f7f\u7528LLMs\u7684\u5e38\u8bc6\u77e5\u8bc6\u8c03\u8282\u8f85\u52a9\u89c6\u89c9\u5c5e\u6027\uff0c\u5982\u80cc\u666f\uff0c\u4ee5\u66f4\u5168\u9762\u5730\u8868\u793a\u539f\u59cb\u5b9e\u4f53\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc1\u660e\uff0c\u5728\u56db\u4e2a\u4e0b\u6e38\u4efb\u52a1\u4e0a\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u5e76\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u8fd9\u4e5f\u5f3a\u8c03\u4e86\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u4ee3\u7406\u4ee5\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\u548c\u73b0\u5b9e\u4e16\u754c\u76f8\u5173\u6027\u7684\u5fc5\u8981\u6027\u3002|\n", "2408.10072": "|**2024-08-19**|**FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant**|Zhengchao Huang 
et.al.|[2408.10072](http://arxiv.org/abs/2408.10072)|null|\u5feb\u901f\u53d1\u5c55\u7684\u6df1\u5ea6\u4f2a\u9020\u6280\u672f\u5f15\u53d1\u4e86\u516c\u4f17\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5c24\u5176\u662f\u5728\u5bf9\u516c\u5171\u4fe1\u606f\u5b89\u5168\u6784\u6210\u4e25\u91cd\u5a01\u80c1\u7684\u9762\u90e8\u4f2a\u9020\u65b9\u9762\u3002\u7136\u800c\uff0c\u672a\u77e5\u548c\u591a\u6837\u7684\u4f2a\u9020\u6280\u672f\u3001\u591a\u53d8\u7684\u9762\u90e8\u7279\u5f81\u4ee5\u53ca\u590d\u6742\u7684\u73af\u5883\u56e0\u7d20\u7ed9\u9762\u90e8\u4f2a\u9020\u5206\u6790\u5e26\u6765\u4e86\u5de8\u5927\u6311\u6218\u3002\u73b0\u6709\u6570\u636e\u96c6\u5728\u63cf\u8ff0\u8fd9\u4e9b\u65b9\u9762\u65f6\u5b58\u5728\u4e0d\u8db3\uff0c\u4f7f\u5f97\u4ec5\u901a\u8fc7\u89c6\u89c9\u4fe1\u606f\u96be\u4ee5\u5728\u5404\u79cd\u5e72\u6270\u56e0\u7d20\u4e2d\u533a\u5206\u771f\u5b9e\u4e0e\u4f2a\u9020\u7684\u9762\u90e8\u3002\u6b64\u5916\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u672a\u80fd\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u4e14\u53ef\u89e3\u91ca\u7684\u7ed3\u679c\uff0c\u590d\u6742\u5316\u4e86\u6a21\u578b\u51b3\u7b56\u8fc7\u7a0b\u7684\u7406\u89e3\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u9896\u7684\u201c\u5f00\u653e\u4e16\u754c\u9762\u90e8\u4f2a\u9020\u5206\u6790\u95ee\u7b54\u201d\uff08OW-FFA-VQA\uff09\u4efb\u52a1\u53ca\u5176\u76f8\u5e94\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u4efb\u52a1\uff0c\u6211\u4eec\u9996\u5148\u5efa\u7acb\u4e86\u4e00\u4e2a\u5305\u542b\u771f\u5b9e\u548c\u4f2a\u9020\u9762\u90e8\u56fe\u50cf\u7684\u591a\u6837\u96c6\u5408\uff0c\u5e76\u914d\u6709\u5173\u952e\u63cf\u8ff0\u548c\u53ef\u9760\u4f2a\u9020\u63a8\u7406\u7684\u6570\u636e\u96c6\u3002\u57fa\u4e8e\u6b64\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u9762\u90e8\u4f2a\u9020\u5206\u6790\u52a9\u624b\u201d\uff08FFAA\uff09\uff0c\u5b83\u7531\u4e00\u4e2a\u5fae\u8c03\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u548c\u4e00\u4e2a\u591a\u7b54\u6848\u667a\u80fd\u51b3\u7b56\u7cfb\u7edf\uff08MIDS\uff09\u7ec4\u6210\u3002\u901a\u8fc7\u7ed3\u5408\u5047\u8bbe\u6027\u63d0\u793a\u4e0eMIDS\uff0c\u6709\u6548\u6d88\u9664\u4e86\u6a21\u7cca\u5206\u7c7b\u8fb9\u754c\u7684\u5f71\u54cd\u529b\uff0c\u589e\u5f3a\u4e86\u6a21\u578b\u7684\u9c81\u68d2\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u7528\u6237\u53cb\u597d\u7684\u53ef\u89e3\u91ca\u7ed3\u679c\uff0c\u800c\u4e14\u5728\u51c6\u786e\u6027\u4e0e\u9c81\u68d2\u6027\u65b9\u9762\u663e\u8457\u8d85\u8d8a\u4e86\u4ee5\u5f80\u7684\u65b9\u6cd5\u3002|\n", "2408.11053": "|**2024-08-20**|**Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks**|Nathaniel Pinckney 
et.al.|[2408.11053](http://arxiv.org/abs/2408.11053)|**[link](https://github.com/nvlabs/verilog-eval)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6570\u5b57\u786c\u4ef6\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u5e94\u7528\u662f\u4e00\u4e2a\u65b0\u5174\u9886\u57df\u3002\u5927\u591a\u6570LLM\u4e3b\u8981\u662f\u5728\u81ea\u7136\u8bed\u8a00\u548c\u8f6f\u4ef6\u4ee3\u7801\u4e0a\u8fdb\u884c\u8bad\u7ec3\u7684\u3002\u786c\u4ef6\u4ee3\u7801\uff0c\u5982Verilog\uff0c\u53ea\u5360\u8bad\u7ec3\u6570\u636e\u7684\u4e00\u5c0f\u90e8\u5206\uff0c\u800c\u4e14\u5f88\u5c11\u6709\u786c\u4ef6\u57fa\u51c6\u5b58\u5728\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c2023\u5e74\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aVerilogEval\u7684\u5f00\u6e90\u57fa\u51c6\uff0c\u5b83\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4e00\u81f4\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8eLLM\u5728\u4ee3\u7801\u5b8c\u6210\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u57fa\u51c6\u5728\u5f53\u65f6\u7684\u9886\u5148\u6a21\u578b\uff0c\u5305\u62ecGPT-4\uff0c\u8fdb\u884c\u4e86\u6d4b\u8bd5\u3002\u7136\u800c\uff0cVerilogEval\u548c\u5176\u4ed6Verilog\u751f\u6210\u57fa\u51c6\u7f3a\u4e4f\u5931\u8d25\u5206\u6790\uff0c\u5f53\u524d\u5f62\u5f0f\u4e0b\u4e5f\u4e0d\u5229\u4e8e\u63a2\u7d22\u63d0\u793a\u6280\u672f\u3002\u6b64\u5916\uff0c\u5728VerilogEval\u53d1\u5e03\u540e\uff0c\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u90fd\u7ecf\u5386\u4e86\u6301\u7eed\u7684\u53d1\u5c55\u3002 
\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u65b0\u53d1\u5e03\u7684\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u7684\u4e0d\u540c\u89c4\u6a21\uff0c\u9488\u5bf9\u6539\u8fdb\u540e\u7684VerilogEval\u57fa\u51c6\u5957\u4ef6\u3002\u6211\u4eec\u589e\u5f3a\u4e86VerilogEval\u7684\u57fa\u7840\u67b6\u6784\u548c\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u81ea\u52a8\u5206\u7c7b\u5931\u8d25\uff0c\u5f15\u5165\u4e86\u652f\u6301\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u793a\u4f8b\u7684\u65b0\u63d0\u793a\uff0c\u5e76\u6269\u5c55\u4e86\u652f\u6301\u7684\u4efb\u52a1\u5230\u89c4\u683c\u5230RTL\u8f6c\u6362\u3002\u6211\u4eec\u53d1\u73b0\u5546\u4e1a\u9886\u57df\u7684\u6700\u65b0\u6a21\u578b\u6709\u4e86\u53ef\u6d4b\u91cf\u7684\u6539\u8fdb\uff0c\u5176\u4e2dGPT-4 Turbo\u5728\u89c4\u683c\u5230RTL\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e8659%\u7684\u6210\u529f\u7387\u3002\u6211\u4eec\u4e5f\u7814\u7a76\u4e86\u65b0\u51fa\u73b0\u7684\u5f00\u6e90\u548c\u9886\u57df\u7279\u5b9a\u6a21\u578b\u7684\u6027\u80fd\uff0c\u5e76\u5c55\u793a\u4e86\u6a21\u578b\u4ece\u4e0a\u4e0b\u6587\u5b66\u4e60\u4e2d\u83b7\u5f97\u663e\u8457\u76ca\u5904\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u53d1\u73b0\u6700\u8fd1\u53d1\u5e03\u7684Llama 3.1 405B\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4e0eGPT-4 Turbo\u76f8\u5f53\uff0c\u5b9e\u73b0\u4e8658%\u7684\u6210\u529f\u7387\uff0c\u800c\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u7684RTL-Coder 6.7B\u6a21\u578b\u5219\u53d6\u5f97\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u768437%\u7684\u6210\u529f\u7387\u3002\u7136\u800c\uff0c\u63d0\u793a\u5de5\u7a0b\u5bf9\u4e8e\u5b9e\u73b0\u826f\u597d\u7684\u6210\u529f\u7387\u81f3\u5173\u91cd\u8981\uff0c\u5e76\u4e14\u968f\u7740\u6a21\u578b\u548c\u4efb\u52a1\u7684\u53d8\u5316\u800c\u53d8\u5316\u3002\u4e00\u4e2a\u5141\u8bb8\u8fdb\u884c\u63d0\u793a\u5de5\u7a0b\u548c\u5931\u8d25\u5206\u6790\u7684\u57fa\u51c6\u57fa\u7840\u8bbe\u65bd\u5bf9\u4e8e\u6301\u7eed\u7684\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u81f3\u5173\u91cd\u8981\u3002|\n", 
"2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u6f5c\u5728\u80fd\u529b\uff0c\u4f46\u5f53\u524d\u7684\u5e94\u7528\u4ecd\u9762\u4e34\u6311\u6218\u3002\u867d\u7136LLM\u5728\u901a\u7528\u5bf9\u8bdd\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4e13\u95e8\u7684\u5bfc\u822a\u4efb\u52a1\u4e0a\u5374\u8868\u73b0\u4e0d\u4f73\uff0c\u76f8\u8f83\u4e8e\u4e13\u4e3aVLN\u8bbe\u8ba1\u7684\u6a21\u578b\uff0c\u5176\u6027\u80fd\u5f80\u5f80\u8f83\u4f4e\u4e0b\u3002\u6211\u4eec\u5f15\u5165\u4e86FLAME\uff08FLAMingo\u67b6\u6784\u5316\u5b9e\u4f53\u4ee3\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u591a\u6a21\u6001LLM\u7684\u65b0\u578b\u4ee3\u7406\u548c\u67b6\u6784\uff0c\u65e8\u5728\u89e3\u51b3\u57ce\u5e02VLN\u4efb\u52a1\uff0c\u5e76\u80fd\u9ad8\u6548\u5904\u7406\u591a\u4e2a\u89c2\u5bdf\u7ed3\u679c\u3002 
\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u4e09\u9636\u6bb5\u8c03\u4f18\u6280\u672f\u4ee5\u5b9e\u73b0\u5bf9\u5bfc\u822a\u4efb\u52a1\u7684\u6709\u6548\u9002\u5e94\uff1a\u5355\u611f\u77e5\u8c03\u6574\u7528\u4e8e\u8857\u9053\u89c6\u56fe\u63cf\u8ff0\u3001\u591a\u611f\u77e5\u8c03\u6574\u7528\u4e8e\u8f68\u8ff9\u603b\u7ed3\u4ee5\u53ca\u7aef\u5230\u7aef\u8bad\u7ec3\u5728VLN\u6570\u636e\u96c6\u4e0a\u7684\u7efc\u5408\u80fd\u529b\u3002\u751f\u6210\u7684\u6570\u636e\u96c6\u901a\u8fc7\u81ea\u52a8\u5316\u8fc7\u7a0b\u5408\u6210\u800c\u6210\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFLAME\u5728Touchdown\u6570\u636e\u96c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u8f83\u73b0\u6709\u65b9\u6cd5\u63d0\u9ad8\u4e867.3%\uff0c\u8d85\u8d8a\u4e86\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u591a\u6a21\u6001LLM\u5728\u590d\u6742\u5bfc\u822a\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u4ee3\u8868\u4e86\u5411\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001LLM\u4e8e\u5b9e\u4f53\u4eba\u5de5\u667a\u80fd\u9886\u57df\u8fc8\u51fa\u7684\u91cd\u8981\u4e00\u6b65\u3002\u9879\u76ee\u9875\u9762\uff1ahttps://flame-sjtu.github.io**|\n", "2408.11049": "|**2024-08-20**|**MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding**|Jian Chen 
et.al.|[2408.11049](http://arxiv.org/abs/2408.11049)|**[link](https://github.com/infini-ai-lab/magicdec)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982\u4ea4\u4e92\u5f0f\u804a\u5929\u673a\u5668\u4eba\u3001\u6587\u6863\u5206\u6790\u548c\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u7b49\u957f\u671f\u4e0a\u4e0b\u6587\u5e94\u7528\u4e2d\u53d8\u5f97\u8d8a\u6765\u8d8a\u666e\u904d\uff0c\u4f46\u63d0\u4f9b\u957f\u4e0a\u4e0b\u6587\u8bf7\u6c42\u65f6\uff0c\u8981\u5b9e\u73b0\u4f4e\u5ef6\u8fdf\u548c\u9ad8\u541e\u5410\u91cf\u662f\u4e00\u4e2a\u6311\u6218\u3002\u63a8\u6d4b\u6027\u89e3\u7801\uff08SD\uff09\u662f\u4e00\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u964d\u4f4e\u5ef6\u8fdf\u7684\u6280\u672f\uff0c\u4f20\u7edf\u89c2\u70b9\u8ba4\u4e3a\u5176\u6548\u80fd\u4ec5\u9650\u4e8e\u8f83\u5c0f\u7684\u6279\u6b21\u5927\u5c0f\u3002\u7136\u800c\uff0c\u5728MagicDec\u4e2d\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u4ee4\u4eba\u60ca\u8bb6\u7684\u4e8b\u5b9e\uff1a\u5373\u4f7f\u5728\u9ad8\u541e\u5410\u91cf\u63a8\u7406\u73af\u5883\u4e2d\uff0c\u5bf9\u4e8e\u4e2d\u7b49\u5230\u8f83\u957f\u5e8f\u5217\uff0cSD\u4ecd\u80fd\u5b9e\u73b0\u52a0\u901f\u3002\u66f4\u6709\u8da3\u7684\u662f\uff0c\u57fa\u4e8e\u6211\u4eec\u7684\u4e25\u8c28\u5206\u6790\uff0c\u4e00\u79cd\u667a\u80fd\u8d77\u8349\u7b56\u7565\u53ef\u4ee5\u5728\u6279\u6b21\u5927\u5c0f\u589e\u52a0\u65f6\u83b7\u5f97\u66f4\u597d\u7684\u52a0\u901f\u6548\u679c\u3002 MagicDec\u9996\u5148\u8bc6\u522b\u51fa\u968f\u7740\u6279\u6b21\u5927\u5c0f\u548c\u5e8f\u5217\u957f\u5ea6\u589e\u52a0\u7684\u74f6\u9888\u8f6c\u79fb\uff0c\u5e76\u5229\u7528\u8fd9\u4e9b\u6d1e\u5bdf\u6765\u66f4\u6709\u6548\u5730\u90e8\u7f72\u63a8\u6d4b\u6027\u89e3\u7801\u4ee5\u652f\u6301\u9ad8\u541e\u5410\u91cf\u63a8\u7406\u3002\u7136\u540e\uff0c\u5b83\u901a\u8fc7\u5229\u7528\u7a00\u758fKV\u7f13\u5b58\u7684\u8349\u6848\u6a21\u578b\u6765\u89e3\u51b3\u968f\u7740\u5e8f\u5217\u957f\u5ea6\u548c\u6279\u6b21\u5927\u5c0f\u589e\u52a0\u800c\u6269\u5c55\u7684KV\u74f6\u9888\u95ee\u9898\u3002|\n", "2408.11043": 
"|**2024-08-20**|**Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research**|Sreyoshi Bhaduri et.al.|[2408.11043](http://arxiv.org/abs/2408.11043)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u5206\u6790\u8bbf\u8c08\u8bb0\u5f55\uff0c\u4ee5\u89e3\u51b3\u624b\u52a8\u5206\u6790\u5b9a\u6027\u6570\u636e\u9700\u8981\u5927\u91cf\u65f6\u95f4\u548c\u52aa\u529b\u7684\u95ee\u9898\u3002\u7814\u7a76\u65e8\u5728\u5c06\u7814\u7a76\u95ee\u9898\u8bbe\u5b9a\u4e3a\u7531LLM\u4f5c\u4e3a\u521d\u7ea7\u7814\u7a76\u52a9\u624b\u8fdb\u884c\u8f85\u52a9\u7684\u6a21\u5f0f\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u5c06LLM\u89c6\u4e3a\u4eba\u624d\u7ba1\u7406\u9886\u57df\u7814\u7a76\u4eba\u5458\u7684\u521d\u7ea7\u8d28\u6027\u7814\u7a76\u52a9\u624b\u7684\u601d\u7ef4\u6a21\u578b\u3002\u901a\u8fc7\u6269\u5c55\u57fa\u4e8eRAG\u7684LLM\u65b9\u6cd5\uff0c\u672c\u6587\u5c55\u793a\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u5bf9\u534a\u7ed3\u6784\u5316\u8bbf\u8c08\u6570\u636e\u8fdb\u884c\u4e3b\u9898\u5efa\u6a21\u65b9\u9762\u7684\u7075\u6d3b\u6027\uff0c\u8d85\u8d8a\u4e86\u5b83\u4eec\u5728\u4fe1\u606f\u68c0\u7d22\u548c\u641c\u7d22\u4e2d\u7684\u4f20\u7edf\u5e94\u7528\u3002 
\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684RAG\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u63d0\u53d6\u611f\u5174\u8da3\u7684\u8bae\u9898\uff0c\u4e0e\u4ece\u540c\u4e00\u6570\u636e\u96c6\u624b\u52a8\u751f\u6210\u7684\u4e3b\u9898\u76f8\u6bd4\uff0c\u8986\u76d6\u8303\u56f4\u663e\u8457\u66f4\u9ad8\u3002\u8fd9\u8bc1\u660e\u4e86\u4f7f\u7528LLM\u4f5c\u4e3a\u521d\u7ea7\u8d28\u6027\u7814\u7a76\u52a9\u624b\u7684\u53ef\u884c\u6027\u3002\u6b64\u5916\uff0c\u7814\u7a76\u5efa\u8bae\uff0c\u4f7f\u7528\u6b64\u7c7b\u6a21\u578b\u7684\u7814\u7a76\u8005\u5e94\u4e25\u683c\u9075\u5faa\u4f20\u7edf\u8d28\u6027\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u8d28\u91cf\u6807\u51c6\uff0c\u4ee5\u786e\u4fdd\u5176\u65b9\u6cd5\u7684\u4e25\u8c28\u6027\u548c\u53ef\u9760\u6027\u3002 \u6700\u540e\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u9488\u5bf9\u5e0c\u671b\u5c06LLM\u4e0e\u73b0\u6709\u8d28\u6027\u7814\u7a76\u8303\u5f0f\u76f8\u878d\u5408\u7684\u884c\u4e1a\u5b9e\u8df5\u8005\u7684\u5173\u952e\u5efa\u8bae\uff0c\u63d0\u4f9b\u4e86\u4e00\u6761\u6709\u6548\u6574\u5408\u8fd9\u4e9b\u5f3a\u5927\u4f46\u521d\u7ea7\u7684\u4eba\u5de5\u667a\u80fd\u5de5\u5177\u5728\u5b9a\u6027\u6570\u636e\u5206\u6790\u4e2d\u7684\u8def\u5f84\uff0c\u7279\u522b\u662f\u5728\u4eba\u624d\u9886\u57df\u3002|\n", "2408.11029": "|**2024-08-20**|**Scaling Law with Learning Rate Annealing**|Howe Tissue et.al.|[2408.11029](http://arxiv.org/abs/2408.11029)|null|\u6211\u4eec\u53d1\u73b0\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u4ea4\u53c9\u71b5\u635f\u5931\u66f2\u7ebf\u9075\u5faa\u4e86\u4e00\u4e2a\u4e0e\u5b66\u4e60\u7387\uff08LR\uff09\u8870\u51cf\u76f8\u5173\u7684\u7f29\u653e\u5b9a\u5f8b\uff1a$L(s) = L_0 + A\\cdot S_1^{-\\alpha} - C\\cdot 
S_2$\u3002\u5176\u4e2d\uff0c$S_1$\u4ee3\u8868\u524d\u5411\u533a\u57df\uff0c$S_2$\u4ee3\u8868\u5b66\u4e60\u7387\u8870\u51cf\u533a\u57df\u3002\u8fd9\u4e00\u516c\u5f0f\u8003\u8651\u4e86\u4e24\u4e2a\u56e0\u7d20\uff1a\uff081\uff09\u4f20\u7edf\u7684\u7f29\u653e\u5f8b\u5b9a\u4e49\u7684\u524d\u5411\u7f29\u653e\uff1b\u4ee5\u53ca\uff082\uff09\u5b66\u4e60\u7387\u8870\u51cf\u5e26\u6765\u7684\u989d\u5916\u635f\u5931\u4e0b\u964d\u3002\u56e0\u6b64\uff0c\u8be5\u516c\u5f0f\u80fd\u591f\u63cf\u8ff0\u6bcf\u4e2a\u6b65\u9aa4\u7684\u5b8c\u6574\u635f\u5931\u66f2\u7ebf\uff0c\u800c\u975e\u4ec5\u9650\u4e8e\u8bad\u7ec3\u7ed3\u675f\u65f6\u7684\u5355\u4e00\u635f\u5931\u70b9\u3002\u901a\u8fc7\u5e94\u7528\u5305\u542b\u5b66\u4e60\u7387\u8870\u51cf\u7684\u7f29\u653e\u5f8b\uff0c\u5e76\u4ec5\u901a\u8fc7\u4e00\u5230\u4e24\u6b21\u8bad\u7ec3\u66f2\u7ebf\u62df\u5408\uff0c\u6211\u4eec\u80fd\u591f\u51c6\u786e\u9884\u6d4b\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u5728\u4efb\u4f55\u7ed9\u5b9a\u6b65\u9aa4\u548c\u4efb\u4f55\u5b66\u4e60\u7387\u8c03\u5ea6\uff08LRS\uff09\u4e0b\u7684\u635f\u5931\u3002 
\u6b64\u5916\uff0c\u8fd9\u4e00\u65b9\u7a0b\u51c6\u786e\u5730\u63cf\u8ff0\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u52a8\u6001\uff0c\u5e76\u4e3a\u5148\u524d\u7814\u7a76\u4e2d\u5173\u6ce8\u7684\u5b66\u4e60\u7387\u8c03\u5ea6\u548c\u5b66\u4e60\u7387\u8870\u51cf\u7684\u76f8\u5173\u5b9e\u9a8c\u53d1\u73b0\u63d0\u4f9b\u4e86\u7406\u8bba\u9a8c\u8bc1\u548c\u89e3\u91ca\u3002\u7531\u6b64\u4ea7\u751f\u7684\u6d1e\u5bdf\uff0c\u4e5f\u4e3a\u7814\u7a76\u4eba\u5458\u5728\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u63d0\u524d\u9009\u62e9\u5173\u952e\u7684\u5b66\u4e60\u7387\u8c03\u5ea6\u7b56\u7565\u63d0\u4f9b\u4e86\u6307\u5bfc\u3002\u6700\u91cd\u8981\u7684\u662f\uff0c\u7531\u4e8e\u6574\u4e2a\u8bad\u7ec3\u66f2\u7ebf\u4e0a\u7684\u6240\u6709\u70b9\u90fd\u9075\u5faa\u8be5\u65b9\u7a0b\uff0c\u6211\u4eec\u53ef\u4ee5\u5728\u4efb\u4f55\u7ed9\u5b9a\u6b65\u9aa4\u548c\u4efb\u4f55\u5b66\u4e60\u7387\u8c03\u5ea6\u4e0b\u5b9e\u73b0\u51c6\u786e\u7684\u635f\u5931\u9884\u6d4b\uff0c\u800c\u6240\u9700\u8ba1\u7b97\u6210\u672c\u4ec5\u4e3a\u4f7f\u7528Chinchilla\u7f29\u653e\u6cd5\u5219\u62df\u5408\u8bed\u8a00\u6a21\u578b\u635f\u5931\u6240\u9700\u76841%\u4ee5\u4e0b\u3002\u8fd9\u4e00\u65b9\u6cd5\u6781\u5927\u5730\u4fc3\u8fdb\u4e86\u7f29\u653e\u5f8b\u62df\u5408\u548c\u9884\u6d4b\u5728\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fc7\u7a0b\u4e2d\u7684\u666e\u53ca\u6027\u3002|\n", "2408.11021": "|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu 
et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|\u7531\u4e8e\u65b0\u5174\u80fd\u529b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u7528\u4f5c\u57fa\u4e8e\u8bed\u8a00\u7684\u4ee3\u7406\uff0c\u6267\u884c\u5404\u79cd\u4efb\u52a1\u5e76\u4ee5\u4e0d\u65ad\u589e\u957f\u7684\u7a0b\u5ea6\u81ea\u4e3b\u505a\u51fa\u51b3\u7b56\u3002\u8fd9\u4e9b\u81ea\u4e3b\u4ee3\u7406\u80fd\u591f\u7406\u89e3\u9ad8\u7ea7\u6307\u4ee4\u3001\u4e0e\u73af\u5883\u4e92\u52a8\uff0c\u5e76\u4f7f\u7528\u53ef\u7528\u7ed9\u5b83\u4eec\u7684\u5de5\u5177\u96c6\u6267\u884c\u590d\u6742\u4efb\u52a1\u3002\u968f\u7740\u4ee3\u7406\u80fd\u529b\u7684\u6269\u5c55\uff0c\u786e\u4fdd\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86Athena\u6846\u67b6\uff0c\u5b83\u5229\u7528\u4e86\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u7684\u6982\u5ff5\uff0c\u901a\u8fc7\u5c06\u8fc7\u53bb\u5b89\u5168\u548c\u4e0d\u5b89\u5168\u7684\u8f68\u8ff9\u4f5c\u4e3a\u4e0a\u4e0b\u6587\uff08\u5bf9\u6bd4\uff09\u793a\u4f8b\u6765\u6307\u5bfc\u4ee3\u7406\u5411\u5b89\u5168\u6027\u53d1\u5c55\uff0c\u540c\u65f6\u5b8c\u6210\u7ed9\u5b9a\u7684\u4efb\u52a1\u3002\u8be5\u6846\u67b6\u8fd8\u6574\u5408\u4e86\u4e00\u4e2a\u6279\u5224\u6027\u673a\u5236\uff0c\u5728\u6bcf\u4e2a\u6b65\u9aa4\u4e0a\u5f15\u5bfc\u4ee3\u7406\u907f\u514d\u98ce\u9669\u884c\u4e3a\u3002\u6b64\u5916\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5bf9\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5b89\u5168\u63a8\u7406\u80fd\u529b\u7684\u73b0\u6709\u57fa\u51c6\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u6db5\u76d68\u4e2a\u7c7b\u522b\u5171\u8ba180\u4e2a\u5de5\u5177\u5305\u548c180\u4e2a\u573a\u666f\u7684\u4e00\u7ec4\u6570\u636e\u96c6\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b89\u5168\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0c\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u548c\u4ea4\u4e92\u7ea7\u6279\u5224\u6027\u601d\u8003\u663e\u8457\u63d0\u9ad8\u4e86\u5
b89\u5168\u6027\u7387\u3002|\n", "2408.11006": "|**2024-08-20**|**While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?**|Wen Cheng et.al.|[2408.11006](http://arxiv.org/abs/2408.11006)|**[link](https://github.com/sensente/security-attacks-on-lccts)**|**\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u8865\u5168\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u50ac\u751f\u4e86\u65b0\u4e00\u4ee3\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u8865\u5168\u5de5\u5177\uff08LCCT\uff09\u3002\u4e0e\u901a\u7528LLM\u4e0d\u540c\uff0c\u8fd9\u4e9b\u5de5\u5177\u5177\u6709\u72ec\u7279\u7684\u64cd\u4f5c\u6d41\u7a0b\uff0c\u6574\u5408\u591a\u79cd\u4fe1\u606f\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4f18\u5148\u8003\u8651\u4ee3\u7801\u5efa\u8bae\u800c\u975e\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\uff0c\u8fd9\u5f15\u5165\u4e86\u7279\u5b9a\u7684\u5b89\u5168\u6311\u6218\u3002\u6b64\u5916\uff0cLCCT\u901a\u5e38\u4f9d\u8d56\u4e8e\u4e13\u6709\u4ee3\u7801\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u5f15\u53d1\u4e86\u5173\u4e8e\u654f\u611f\u6570\u636e\u6cc4\u9732\u7684\u62c5\u5fe7\u3002\u672c\u6587\u5229\u7528LCCT\u7684\u72ec\u7279\u7279\u6027\uff0c\u5f00\u53d1\u4e86\u9488\u5bf9\u4e24\u79cd\u5173\u952e\u5b89\u5168\u98ce\u9669\u7684\u9488\u5bf9\u6027\u653b\u51fb\u65b9\u6cd5\uff1a\u8d8a\u72f1\u653b\u51fb\u548c\u8bad\u7ec3\u6570\u636e\u63d0\u53d6\u653b\u51fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86LCCT\u4e2d\u5b58\u5728\u7684\u91cd\u5927\u6f0f\u6d1e\uff0c\u5305\u62ec\u5728GitHub Copilot\u4e0a\u768499.4%\u6210\u529f\u8d8a\u72f1\u653b\u51fb\u7387\uff0c\u5728Amazon Q\u4e0a\u768446.3%\u6210\u529f\u7387\u3002\u6211\u4eec\u8fd8\u6210\u529f\u4eceGitHub 
Copilot\u4e2d\u63d0\u53d6\u4e86\u654f\u611f\u7528\u6237\u6570\u636e\uff0c\u5305\u62ec54\u4e2a\u771f\u5b9e\u7535\u5b50\u90ae\u4ef6\u5730\u5740\u548c314\u4e2a\u4e0eGitHub\u7528\u6237\u540d\u5173\u8054\u7684\u7269\u7406\u5730\u5740\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0c\u8fd9\u4e9b\u57fa\u4e8e\u4ee3\u7801\u7684\u653b\u51fb\u65b9\u6cd5\u5bf9\u901a\u7528LLM\uff08\u5982GPT\u7cfb\u5217\uff09\u540c\u6837\u6709\u6548\uff0c\u7a81\u663e\u4e86\u73b0\u4ee3LLM\u5904\u7406\u4ee3\u7801\u65f6\u5b58\u5728\u7684\u66f4\u5e7f\u6cdb\u5b89\u5168\u95ee\u9898\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86LCCT\u9762\u4e34\u7684\u5173\u952e\u5b89\u5168\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u52a0\u5f3a\u5176\u5b89\u5168\u6846\u67b6\u7684\u91cd\u8981\u65b9\u5411\u3002 \u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u7814\u7a76\u6210\u679c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u76f8\u5173\u4ee3\u7801\u793a\u4f8b\u548c\u653b\u51fb\u6837\u672c\uff0c\u5b83\u4eec\u53ef\u4ecehttps://github.com/Sensente/Security-Attacks-on-LCCTs\u83b7\u53d6\u3002**|\n", "2408.10995": "|**2024-08-20**|**CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models**|Michael Reinisch 
et.al.|[2408.10995](http://arxiv.org/abs/2408.10995)|null|\u65b0\u533b\u7597\u6cbb\u7597\u65b9\u6cd5\u7684\u5f00\u53d1\u9700\u8981\u591a\u4e2a\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u3002\u5c3d\u7ba1\u5c06\u836f\u7269\u63a8\u5411\u5e02\u573a\u7684\u6210\u672c\u9ad8\u6602\u4e14\u5177\u6709\u6311\u6218\u6027\uff0c\u4f46\u53ea\u6709\u4e0d\u523020%\u7684\u836f\u7269\u80fd\u4ece\u7b2c\u4e00\u9636\u6bb5\u8fc7\u6e21\u5230\u6700\u540e\u7684\u6279\u51c6\u3002\u8fd1\u671f\u7684\u7814\u7a76\u6587\u732e\u8868\u660e\uff0c\u8bd5\u9a8c\u65b9\u6848\u7684\u8bbe\u8ba1\u5bf9\u8bd5\u9a8c\u8868\u73b0\u6709\u7740\u663e\u8457\u5f71\u54cd\u3002\u6211\u4eec\u7814\u7a76\u4e86\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u9884\u6d4b\uff08CTOP\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u8bd5\u9a8c\u8bbe\u8ba1\u6587\u4ef6\u81ea\u52a8\u9884\u6d4b\u4e0d\u540c\u9636\u6bb5\u7684\u8f6c\u6362\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684CTOP\u6a21\u578b\u2014\u2014CTP-LLM\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aPhaseTransition\uff08PT\uff09\u7684\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u6839\u636e\u8bd5\u9a8c\u5728\u76d1\u7ba1\u8fc7\u7a0b\u4e2d\u7684\u8fdb\u5c55\u8fdb\u884c\u6807\u8bb0\uff0c\u5e76\u4f5c\u4e3aCTOP\u8bc4\u4f30\u7684\u6807\u51c6\u57fa\u51c6\u3002 
Our fine-tuned GPT-3.5-based model (CTP-LLM) predicts clinical trial phase transitions by analyzing the trial's original protocol text, without relying on human-selected features. CTP-LLM achieves 67% accuracy across all phases and 75% accuracy when predicting the transition from Phase III to final approval. Our experimental results highlight the potential of LLM-powered applications for forecasting clinical trial outcomes and assessing trial design.|\n", "2408.10947": "|**2024-08-20**|**Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models**|Yuyan Chen et.al.|[2408.10947](http://arxiv.org/abs/2408.10947)|null|Teachers play an important role in imparting knowledge and guiding learners, and the role of large language models (LLMs) as potential educators is emerging as an important research area. Recognizing LLMs' ability to generate educational content can advance automated and personalized learning. Although LLMs have been tested for comprehension and problem-solving, their capability as teachers remains largely unexplored. In teaching, questioning is a key skill that guides students to analyze, evaluate, and synthesize core concepts and principles. 
Our study therefore introduces a benchmark that evaluates the questioning capability of LLMs in education by assessing the educational questions they generate, using Anderson and Krathwohl's taxonomy across general, single-discipline, and interdisciplinary domains. We shift the focus from LLMs as learners to LLMs as educators, assessing their teaching ability through their ability to generate questions. We apply four metrics (relevance, coverage, representativeness, and consistency) to evaluate the educational quality of LLM outputs. Our results indicate that GPT-4 shows significant potential for teaching general, humanities, and science courses, while Claude2 appears better suited as an interdisciplinary teacher. Furthermore, the automatic scores align with human judgments.|\n", "2408.10946": "|**2024-08-20**|**Large Language Model Driven Recommendation**|Anton Korikov et.al.|[2408.10946](http://arxiv.org/abs/2408.10946)|null|This work explores new opportunities for using language models (LLMs) to build personalized recommender systems. Earlier chapters focused on recommender systems based on standardized, non-verbal user feedback such as purchases, views, and clicks. As LLMs have become capable of general natural language reasoning, they open new paths to building highly personalized recommender systems through natural language interaction. 
The chapter first presents a taxonomy of the key data sources, covering item descriptions, user-system interactions, and user profiles. It then discusses LLM-based recommendation techniques in detail, including encoder-only and autoregressive approaches in both tuned and untuned settings. Next, it turns to multi-module recommendation architectures in which LLMs cooperate with other components, such as retrievers and recommender systems, in multi-stage pipelines. Finally, it introduces conversational recommender systems (CRSs), in which LLMs facilitate multi-turn dialogues where each turn offers not only a recommendation but also an opportunity to engage with the user through preference elicitation, critiquing, and question answering.|\n", "2408.11813": "|**2024-08-21**|**SEA: 
Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs**|Yuanyang Yin et.al.|[2408.11813](http://arxiv.org/abs/2408.11813)|null|Multimodal large language models (MLLMs) have recently shown remarkable perception and reasoning abilities. They typically consist of a vision encoder, an adapter, and a large language model (LLM). The adapter serves as the critical bridge between the visual and language components. However, training adapters with image-level supervision often leads to significant misalignment, which undermines the LLM's capabilities and limits the potential of multimodal LLMs. To address this, we introduce Supervised Embedding Alignment (SEA), a token-level alignment method that leverages vision-language pre-trained models such as CLIP to align visual tokens with the LLM's embedding space through contrastive learning. This approach ensures a more coherent integration of visual and language representations, improving the performance and interpretability of multimodal LLMs while preserving their inherent capabilities. Extensive experiments show that SEA effectively improves MLLMs, particularly smaller models, without additional data or inference computation. SEA also lays the groundwork for developing more general and adaptable 
solutions to enhance multimodal systems.|\n", "2408.11801": "|**2024-08-21**|**Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models**|Yuzhou Huang et.al.|[2408.11801](http://arxiv.org/abs/2408.11801)|null|Traditional visual storytelling is complex, requiring specialized knowledge and substantial resources, yet it is often constrained by human creativity and creation precision. Although large language models (LLMs) have enhanced visual storytelling, current approaches are often limited to 2D visuals or oversimplify stories through motion synthesis and behavioral simulation, failing to produce comprehensive, multi-dimensional narratives. To this end, we present Story3D-Agent, a pioneering approach that leverages LLMs to transform provided narratives into 3D-rendered visualizations. By integrating procedural modeling, our approach enables precise control over multi-character actions and motions as well as diverse decorative elements, ensuring long-range and dynamic 3D representation. Our approach also supports narrative extension through logical reasoning, ensuring that generated content remains consistent with existing conditions. We thoroughly evaluate Story3D-Agent to validate its effectiveness and offer a basic framework to advance 3D story representation.|\n", 
"2408.11800": "|**2024-08-21**|**PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain**|Rounak Meyur et.al.|[2408.11800](http://arxiv.org/abs/2408.11800)|null|In the rapidly evolving landscape of natural language processing (NLP) and text generation, the emergence of retrieval-augmented generation (RAG) offers a promising way to improve the quality and reliability of generated text by leveraging information retrieved from a user-specified database. Benchmarking is essential for evaluating and comparing the performance of different RAG configurations in terms of retriever and generator, providing insight into their effectiveness, scalability, and suitability for specific domains and applications. This paper presents a comprehensive framework for generating a domain-relevant RAG benchmark. The framework is based on automatic question-answer generation with human (domain expert)-AI large language model (LLM) teaming. As a case study, we demonstrate the framework by introducing PermitQA, a first-of-its-kind benchmark for the wind siting and permitting domain, which comprises multiple scientific documents and reports on the environmental impact of wind energy projects. 
Our framework systematically evaluates RAG performance using diverse metrics and multiple question types of varying complexity. We also report the performance of different models on our benchmark.|\n", "2408.11795": "|**2024-08-21**|**EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model**|Feipeng Ma et.al.|[2408.11795](http://arxiv.org/abs/2408.11795)|null|In multimodal research, numerous studies use large numbers of image-text pairs for modality alignment learning, turning large language models (LLMs) into multimodal LLMs that excel at a variety of vision-language tasks. The prevailing approaches fall into two categories: self-attention-based and cross-attention-based methods. Self-attention-based methods are more data-efficient because of their simple multilayer perceptron (MLP) architecture, but they are less computationally efficient because they concatenate visual and textual tokens as input to the LLM. Cross-attention-based methods, although less data-efficient because of their additional learnable parameters, are more computationally efficient because they avoid feeding long input sequences to the LLM. 
To address this trade-off, we propose the data-Efficient and compute-Efficient Multimodal Large Language Model (EE-MLLM). EE-MLLM improves both data and compute efficiency without introducing additional modules or learnable parameters. Specifically, we modify the original self-attention mechanism in the MLLM into a composite attention mechanism with two key characteristics: 1) it eliminates the computational overhead of self-attention within visual tokens, achieving compute efficiency; 2) it reuses the weights of each LLM layer to facilitate effective modality alignment between vision and language, achieving data efficiency. Experimental results show that EE-MLLM is effective across a range of benchmarks, including general-purpose datasets such as MMBench and SeedBench and fine-grained tasks such as TextVQA and DocVQA.|\n", "2408.11793": "|**2024-08-21**|**Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design**|Nathaniel H. 
Park et.al.|[2408.11793](http://arxiv.org/abs/2408.11793)|null|Molecular property prediction and generative design via deep learning models have been the subject of intense research, owing to their potential to accelerate the development of new materials. These workflows have been significantly augmented by the advent of large language models (LLMs) and LLM-driven agentic systems, which can use pre-trained models to make predictions in the context of more complex research tasks. While effective, agentic systems still have substantial room for improvement in retrieving salient information for material design tasks. Moreover, alternative uses of predictive deep learning models, such as leveraging their latent representations to facilitate cross-modal retrieval-augmented generation within LLM-driven agentic systems for task-specific materials design, remain unexplored. 
Here, we demonstrate that large, pre-trained chemistry foundation models can serve as a basis for semantic chemistry information retrieval for small molecules, complex polymeric materials, and reactions. We also show that combining chemistry foundation models with image models such as OpenCLIP enables unprecedented queries and information retrieval across multiple characterization data domains. Finally, we demonstrate the integration of these systems within multi-agent systems to support structure- and topology-based natural language queries and information retrieval for complex research tasks.|\n", "2408.11791": "|**2024-08-21**|**Critique-out-Loud Reward Models**|Zachary Ankner 
et.al.|[2408.11791](http://arxiv.org/abs/2408.11791)|**[link](https://github.com/zankner/cloud)**|**Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to predict preference scores directly, without using the generation capabilities of the underlying large language model (LLM). This limits their capabilities, because they must reason implicitly about the quality of a response, that is, preference modeling must be performed in a single forward pass. To enable reward models to reason explicitly about response quality, we introduce Critique-out-Loud (CLoud) reward models. A CLoud reward model first generates a natural language critique of the assistant's response and then uses that critique to predict a scalar reward for the quality of the response. 
We demonstrate the success of CLoud reward models for both Llama-3-8B and 70B base models: compared with classic reward models, CLoud reward models improve pairwise preference classification accuracy on RewardBench by 4.65 and 5.84 percentage points for the 8B and 70B base models, respectively. Furthermore, CLoud reward models yield a Pareto improvement in win rate on ArenaHard when used as the scoring model for Best-of-N. Finally, we explore how to exploit the dynamic inference compute capabilities of CLoud reward models by performing self-consistency decoding for reward prediction.**|\n", "2408.11788": "|**2024-08-21**|**DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework**|Zhifei Xie et.al.|[2408.11788](http://arxiv.org/abs/2408.11788)|null|We propose DreamFactory, an LLM-based framework that tackles the challenges current video generation models face when creating long videos. DreamFactory uses multi-agent collaboration principles and a Key Frames Iteration Design Method to ensure consistency and a unified style across long videos. It also applies chain-of-thought reasoning (Chain of 
Thought, COT) to address the uncertainty inherent in large language models. DreamFactory can generate long, stylistically coherent, and complex videos. Evaluating such long-form videos is itself a challenge, so we propose new metrics such as the Cross-Scene Face Distance Score and the Cross-Scene Style Consistency Score. To encourage further research in this area, we contribute a multi-scene video dataset containing more than 150 human-rated videos.|\n", "2408.11779": "|**2024-08-21**|**Personality Alignment of Large Language Models**|Minjun Zhu et.al.|[2408.11779](http://arxiv.org/abs/2408.11779)|**[link](https://github.com/zhu-minjun/palign)**|**Current alignment methods for large language models (LLMs) aim to reflect general human values and behaviors but often overlook the unique characteristics and preferences of individual users. To address this gap, we introduce the concept of Personality Alignment, which tailors an LLM's responses and decisions to the specific preferences of individual users or closely related groups. Inspired by psychometrics, we built the Personality Alignment with Personality Inventories (PAPI) 
dataset, which contains data from 300,000 real subjects, each providing behavioral preferences based on the Big Five personality factors. The dataset lets us quantitatively evaluate how well LLMs can align with each subject's behavioral patterns. Given the challenges of personality alignment, such as limited personal data, diverse preferences, and scalability requirements, we developed an activation intervention optimization method. It improves an LLM's ability to align efficiently with individual behavioral preferences using minimal data and compute. Our method, PAS, outperforms DPO while requiring only one fifth of its optimization time, which makes it practical. This work advances personalized decision-making and reasoning in AI systems, makes interactions more relevant and meaningful for each user, and promotes human-centered artificial intelligence. The related code has been released.**|\n", "2408.11775": "|**2024-08-21**|**Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards**|Omar Erak 
et.al.|[2408.11775](http://arxiv.org/abs/2408.11775)|**[link](https://github.com/Nouf-Alabbasi/oKUmura_AI_Telecom_challenge)**|**Recent studies show that large language models (LLMs) struggle with technical standards in telecommunications. We propose a fine-tuned retrieval-augmented generation (RAG) system based on the Phi-2 small language model (SLM) to serve as an authoritative source of answers for communication networks. Our system uses forward-looking semantic chunking to determine parsing breakpoints dynamically based on embedding similarity, which lets it handle diverse document formats effectively. To cope with the multiple similar contexts that can occur in technical standards, we employ a re-ranking algorithm that prioritizes the most relevant retrieved chunks. Because Phi-2 has a small context window, we adopt a recent technique called SelfExtend to expand the context window during inference, which both improves performance and accommodates a wider range of queries and design requirements, from customers to specialized technicians. For fine-tuning, we use low-rank adaptation (LoRA) to improve computational efficiency during training and to enable effective fine-tuning on small datasets. 
Our comprehensive experiments show substantial improvements over existing question-answering approaches in the telecom domain, with performance exceeding that of GPT-4, a model roughly 880 times larger. This work presents a new way of using SLMs in communication networks that balances efficiency and performance, and it can serve as a foundation for building intelligent language models.**|\n", "2408.11749": "|**2024-08-21**|**Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks**|Yiyi Chen et.al.|[2408.11749](http://arxiv.org/abs/2408.11749)|**[link](https://github.com/siebeniris/vec2text_exp)**|Large language models (LLMs) are susceptible to malicious influence by cyber attackers through intrusions such as adversarial, backdoor, and embedding inversion attacks. In response, the emerging field of LLM security aims to study and defend against such threats. So far, most work in this area has focused on monolingual English models, yet emerging research suggests that multilingual LLMs may be more vulnerable to various attacks than their monolingual counterparts. Although previous work has investigated embedding inversion over a small subset of European languages, it is challenging to extrapolate those findings to languages from different linguistic families and with differing scripts. 
We therefore explore the security of multilingual LLMs under embedding inversion attacks and investigate cross-lingual and cross-script inversion across 20 languages, spanning 8 language families and 12 scripts. Our findings indicate that languages written in Arabic script and Cyrillic script, as well as languages in the Indo-Aryan family, are particularly vulnerable to embedding inversion. We further observe that inversion models tend to suffer from language confusion, sometimes greatly reducing the effectiveness of an attack. We systematically explore this bottleneck and uncover predictable patterns that attackers could exploit. Ultimately, this study aims to deepen the understanding of the main security vulnerabilities facing multilingual LLMs and to raise awareness of the languages most at risk from these attacks.|\n", "2408.12599": "|**2024-08-22**|**Controllable Text Generation for Large Language Models: A Survey**|Xun Liang 
et.al.|[2408.12599](http://arxiv.org/abs/2408.12599)|**[link](https://github.com/iaar-shanghai/ctgsurvey)**|**In natural language processing (NLP), large language models (LLMs) have demonstrated high text generation quality. In real-world applications, however, LLMs must meet increasingly complex requirements. Beyond avoiding misleading or inappropriate content, they are also expected to cater to specific user needs, such as imitating a particular writing style or generating text with poetic richness. These varied demands have driven the development of controllable text generation (CTG) techniques, which ensure that outputs adhere to predefined control conditions, such as safety, sentiment, thematic consistency, and linguistic style, while maintaining high standards of helpfulness, fluency, and diversity. 
This paper systematically reviews recent advances in CTG for LLMs, defines its core concepts in detail, and clarifies the requirements for control conditions and text quality. We divide CTG tasks into two main types, content control and attribute control, and discuss the key methods, including model retraining, fine-tuning, reinforcement learning, prompt engineering, latent space manipulation, and decoding-time intervention. We analyze each method's characteristics, advantages, and limitations, providing in-depth insight into achieving generation control. We also review CTG evaluation methods, summarize applications across domains, and identify key challenges in current research, including reduced fluency and practicality. We further make several appeals, emphasizing that future research should focus more on real-world applications. This paper aims to offer valuable guidance to researchers and developers in the field. Our reference list and Chinese version are open-sourced at https://github.com/IAAR-Shanghai/CTGSurvey.**|\n", "2408.12579": "|**2024-08-22**|**RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment**|Xiaohan Wang 
et.al.|[2408.12579](http://arxiv.org/abs/2408.12579)|null|Large language models (LLMs) such as GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitive with medical experts on a range of medical benchmarks. However, they still face challenges in making professional diagnoses comparable to those of physicians, particularly in efficiently gathering patient information and reasoning toward the final diagnosis. To this end, we propose RuleAlign, a framework designed to align LLMs with specific diagnostic rules. We build a medical dialogue dataset containing rule-based doctor-patient conversations and design an alignment learning approach based on preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope our work can inspire exploration of the potential of LLMs as AI physicians.|\n", "2408.12570": "|**2024-08-22**|**Jamba-1.5: Hybrid Transformer-Mamba Models at Scale**|Jamba Team 
et.al.|[2408.12570](http://arxiv.org/abs/2408.12570)|null|We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture-of-experts architecture that provides high throughput and low memory usage across context lengths while retaining the same or better quality than Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilities and have an effective context length of 256K tokens, the largest among open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows Jamba-1.5-Large to fit on a machine with 8 80GB 
GPUs when processing 256K-token contexts without loss of quality. When evaluated on a range of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The model weights for both sizes are publicly available under the Jamba Open Model License, and we release ExpertsInt8 as open source.|\n", "2408.12561": "|**2024-08-22**|**ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation**|Lujia Zhong et.al.|[2408.12561](http://arxiv.org/abs/2408.12561)|**[link](https://github.com/lujiazho/ssprop)**|**Deep learning has recently made remarkable progress, especially in generative modeling, such as large language models and probabilistic diffusion models. However, training these models often requires substantial computational resources, consuming billions of petaFLOPs. This leads to heavy energy consumption and a large carbon footprint, raising serious environmental concerns. Back-propagation (BP) is the main source of computational cost when training deep learning models. 
To promote energy efficiency and enable sparse learning on any machine and device, we propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture. Specifically, we introduce channel-wise sparsity with additional gradient selection schedulers during the backward pass, based on the assumption that BP is often dense and inefficient, which can lead to over-fitting and high computational consumption. Experiments show that our approach reduces computation by 40% while potentially improving model performance, as validated on image classification and generation tasks. This reduction can yield significant energy savings and a lower carbon footprint, particularly during the research and development phases of large-scale AI systems. Additionally, our method mitigates over-fitting in a manner distinct from Dropout, allowing it to be combined with Dropout to further improve model performance and reduce computational resource usage. Extensive experiments show that our method applies to a variety of datasets and tasks and is compatible with a wide range of deep learning architectures and modules. The code is publicly available 
at https://github.com/lujiazho/ssProp.**|\n", "2408.12547": "|**2024-08-22**|**Towards Evaluating and Building Versatile Large Language Models for Medicine**|Chaoyi Wu et.al.|[2408.12547](http://arxiv.org/abs/2408.12547)|**[link](https://github.com/magic-ai4med/meds-ins)**|**In this study, we present MedS-Bench, a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts. Unlike existing benchmarks that focus on multiple-choice question answering, MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation, among others. We evaluated six leading LLMs with few-shot prompting, e.g., MEDITRON, Mistral, InternLM 2, Llama 
3, GPT-4, and Claude-3.5, and found that even the most sophisticated models struggle with these complex tasks. To address these limitations, we developed MedS-Ins, a large-scale instruction tuning dataset for medicine. MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks. To demonstrate the dataset's utility, we performed instruction tuning on a lightweight, open-source medical language model. The resulting model, MMedIns-Llama 3, outperforms existing models on nearly all clinical tasks. To promote further progress in applying LLMs to clinical challenges, we have made the MedS-Ins dataset fully accessible and invite the research community to contribute to its expansion. Additionally, we have launched a dynamic leaderboard and plan to update the test set regularly to track progress and improve the adaptation of general LLMs to the medical domain. Leaderboard: https://henrychur.github.io/MedS-Bench/. Github: https://github.com/MAGIC-AI4Med/MedS-Ins.**|\n", "2408.12496": "|**2024-08-22**|**MEDCO: Medical Education Copilots Based on A Multi-Agent Framework**|Hao Wei 
et.al.|[2408.12496](http://arxiv.org/abs/2408.12496)|null|Large language models (LLMs) have had a significant impact on many research domains, including medicine and healthcare. However, the potential of LLMs as copilots in medical education remains underexplored. Current AI-assisted educational tools are limited by their solitary learning approach and their inability to simulate the multi-disciplinary and interactive nature of actual medical training. To address these limitations, we propose MEDCO (Medical EDucation COpilots), a novel multi-agent copilot system designed specifically to emulate real-world medical training environments. MEDCO incorporates three primary agents: an agentic patient, an expert doctor, and a radiologist, creating a multi-modal and interactive learning environment. Our framework emphasizes teaching proficient question-asking skills, multi-disciplinary collaboration, and peer discussion among students. 
Our experiments show that virtual students trained with MEDCO not only achieve substantial performance gains comparable to those of advanced models, but also exhibit human-like learning behaviors and improvements, coupled with an increase in the number of learning samples. This work contributes to medical education by introducing an interactive and collaborative learning approach. It also provides valuable insights into the effectiveness of AI-integrated training paradigms.|\n", "2408.12494": "|**2024-08-22**|**GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models**|Kunsheng Tang et.al.|[2408.12494](http://arxiv.org/abs/2408.12494)|**[link](https://github.com/kstanghere/gendercare-ccs24)**|**Large language models (LLMs) have shown remarkable capabilities in natural language generation, but they have also been observed to amplify societal biases, particularly those related to gender. In response, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative criteria, bias assessment, reduction techniques, and evaluation metrics for quantifying and mitigating gender bias in LLMs. 
First, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realism. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques, including counterfactual data augmentation and specialized fine-tuning strategies, to reduce gender bias in LLMs without compromising their overall performance. 
Extensive experiments demonstrate a significant reduction across various gender bias benchmarks, peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and a tailored reduction of gender bias, we hope GenderCARE can represent a significant step toward fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.**|\n", "2408.12480": "|**2024-08-23**|**Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese**|Khang T. Doan et.al.|[2408.12480](http://arxiv.org/abs/2408.12480)|null|In this report, we introduce Vintern-1B, a reliable 1-billion-parameter multimodal large language model (MLLM) for Vietnamese language tasks. By integrating the Qwen2-0.5B-Instruct language model with the InternViT-300M-448px visual model, Vintern-1B is optimized for applications such as optical character recognition (OCR), document extraction, and general question answering in Vietnamese contexts. The model is fine-tuned on a dataset of over 3 million image-question-answer pairs and achieves robust performance and reliable results on multiple Vietnamese benchmarks such as OpenViVQA and ViTextVQA. 
Vintern-1B is small enough to be integrated easily into a variety of offline applications. In addition, we have open-sourced several Vietnamese visual question answering (VQA) datasets for text and diagrams, created with Gemini 1.5 Flash. Our models are available at: https://huggingface.co/5CD-AI/Vintern-1B-v2.|\n", "2408.12475": "|**2024-08-22**|**Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition**|Bozheng Li et.al.|[2408.12475](http://arxiv.org/abs/2408.12475)|null|In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework to integrate both spatial information and sequential temporal dynamics into the feature embeddings. Unlike existing fine-tuning approaches that capture temporal information by exploring the relationships among all frames, our perceiver-based adapter recurrently captures the sequential dynamics along the timeline and can therefore perceive order changes. To obtain a discriminative representation for each class, we extend a textual corpus derived from large language models (LLMs) and enrich the visual prototypes by integrating contextual semantic information. 
In addition, we introduce an unbalanced optimal transport strategy for feature matching that mitigates the impact of class-unrelated features, thereby facilitating more effective decision-making. Experimental results on five FSAR datasets demonstrate that our method sets a new benchmark, beating the second-best competitors by a significant margin.|\n", "2408.12470": "|**2024-08-22**|**DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems**|Jiaju Chen et.al.|[2408.12470](http://arxiv.org/abs/2408.12470)|null|The integration of large language models (LLMs) into recommender systems has led to substantial performance improvements. However, this often comes at the cost of reduced recommendation diversity, which can harm user experience. To address this issue, controllable recommendation has emerged, allowing users to specify their preferences and receive recommendations that meet their diverse needs. Despite its potential, existing controllable recommender systems typically rely on simplistic mechanisms, such as a single prompt, to regulate diversity, an approach that fails to capture the full complexity of user preferences. In response to these limitations, we propose DLCRec, a novel framework designed to enable fine-grained control over diversity in LLM-based recommendations. 
Unlike traditional methods, DLCRec adopts a fine-grained task decomposition strategy, breaking the recommendation process down into three sequential sub-tasks: genre prediction, genre filling, and item prediction. These sub-tasks are trained independently and inferred sequentially according to user-defined control numbers, ensuring more precise control over diversity. Furthermore, the scarcity and uneven distribution of diversity-related user behavior data pose significant challenges for fine-tuning. To overcome these obstacles, we introduce two data augmentation techniques that enhance the model's robustness to noisy and outlier data. These techniques expose the model to a broader range of patterns, improving its adaptability in generating recommendations with varying levels of diversity. Our extensive experiments demonstrate that DLCRec not only provides precise control over diversity but also outperforms state-of-the-art baselines across multiple recommendation scenarios.|\n", "2408.13257": "|**2024-08-23**|**MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?**|Yi-Fan Zhang 
et.al.|[2408.13257](http://arxiv.org/abs/2408.13257)|null|Comprehensive evaluation of multimodal large language models (MLLMs) has recently attracted widespread attention in the research community. However, we observe that existing benchmarks share several common barriers that make it difficult to measure the real-world challenges models face, including: 1) small data scale, which leads to large performance variance; 2) reliance on model-generated annotations, which limits data quality; and 3) insufficient task difficulty, especially because of limited image resolution. To tackle these issues, we introduce MME-RealWorld. Specifically, we collect more than 300,000 images from public datasets and the Internet and filter 13,366 high-quality images for annotation. This effort involved 25 professional annotators and 7 MLLM experts, who contributed 29,429 question-answer pairs covering 43 subtasks across 5 real-world scenarios, tasks that are extremely challenging even for humans. As far as we know, MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. 
We further conduct a thorough evaluation of 28 prominent MLLMs, such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Our results show that even the most advanced models struggle with our benchmark, with none of them reaching 60% accuracy. Perceiving high-resolution images and understanding complex real-world scenarios remain urgent issues to be addressed. The data and evaluation code are released at https://mme-realworld.github.io/ .|\n", "2408.13253": "|**2024-08-23**|**Domain-specific long text classification from sparse relevant information**|C\u00e9lia D'Cruz et.al.|[2408.13253](http://arxiv.org/abs/2408.13253)|null|Large language models have undoubtedly revolutionized natural language processing, and the current trend is to promote a single model for all tasks (sentiment analysis, translation, etc.). However, when the relevant information is sparse or the signal is weak, the statistical mechanisms of these models struggle to exploit the key information. For example, in classifying long domain-specific documents, relevance often relies on one or a few key terms. In the medical domain, it is essential to determine whether a given report contains critical information about a patient's condition. This critical information is often based on one or two specific isolated terms. 
In this paper, we propose a hierarchical model that uses a list of potential target terms to retrieve candidate sentences and represents them with the contextualized embedding of the target term(s) they contain. Pooling the embedding(s) of the target term(s) yields the document representation used for classification. We evaluate our model on a public English medical document benchmark and on a private French medical dataset. The results show that our narrower hierarchical model outperforms large language models at retrieving relevant long documents in a domain-specific context.|\n", "2408.13233": "|**2024-08-23**|**Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time**|Yingyu Liang 
et.al.|[2408.13233](http://arxiv.org/abs/2408.13233)|null|This paper presents a novel fast computation method for the gradients of multi-layer transformer models. The method approximates the gradients of the entire multi-layer transformer model in almost linear time $n^{1+o(1)}$, where $n$ is the input sequence length. This breakthrough substantially reduces the computational bottleneck associated with the traditional quadratic time complexity. Our theory holds for any loss function and maintains a bounded approximation error across the entire model. Furthermore, our analysis holds when the multi-layer transformer model contains many practical sub-modules, such as residual connections, causal masks, and multi-head attention. By improving the efficiency of gradient computation in large language models, we hope our theoretical results will make the training and deployment of long-context language models more effective.|\n", "2408.13214": "|**2024-08-23**|**EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods**|Hongcheng Ding 
et.al.|[2408.13214](http://arxiv.org/abs/2408.13214)|null|Accurate forecasting of the EUR/USD exchange rate is crucial for investors, businesses, and policymakers. This paper proposes IUS, a novel framework that combines unstructured textual data from news and analysis with structured data on exchange rates and financial indicators to improve exchange rate prediction. The IUS framework uses large language models for sentiment polarity scoring and exchange rate movement classification of texts. These textual features are combined with quantitative features and fed into a Causality-Driven Feature Generator. An Optuna-optimized Bi-LSTM model is then used to forecast the EUR/USD exchange rate. Experiments show that the proposed model outperforms benchmark models, reducing mean absolute error (MAE) by 10.69% and root mean squared error (RMSE) by 9.56%. The results also show that fusing unstructured and structured data yields higher accuracy than using structured data alone. Furthermore, feature selection using the top 12 important quantitative features combined with the textual features proves most effective. The proposed IUS framework and Optuna-Bi-LSTM model provide a powerful new approach to exchange rate forecasting through multi-source data integration.|\n", "2408.13204": "|**2024-08-23**|**DOMAINEVAL: An Auto-Constructed 
Benchmark for Multi-Domain Code Generation**|Qiming Zhu et.al.|[2408.13204](http://arxiv.org/abs/2408.13204)|null|Code benchmarks such as HumanEval are widely used to evaluate the capabilities of large language models (LLMs), providing insight into their strengths and weaknesses. However, current benchmarks focus mainly on common coding tasks (e.g., bubble sort, greatest common divisor), leaving domain-specific coding tasks (e.g., computation, system, cryptography) underexplored. To fill this gap, we propose DOMAINEVAL, a multi-domain code benchmark designed to evaluate LLMs' coding capabilities thoroughly. Our pipeline works in a fully automated manner, enabling push-button construction from code repositories into formatted subjects under study. By evaluating 12 representative LLMs on DOMAINEVAL, we observe some interesting findings. 
\u6211\u4eec\u6ce8\u610f\u5230\uff0cLLMs\u5728\u8ba1\u7b97\u4efb\u52a1\u4e0a\u8868\u73b0\u826f\u597d\uff0c\u4f46\u5728\u52a0\u5bc6\u548c\u7cfb\u7edf\u7f16\u7801\u4efb\u52a1\u4e0a\u5374\u6709\u6240\u6b20\u7f3a\u3002\u67d0\u4e9bLLM\u5728\u8fd9\u4e9b\u9886\u57df\u7684\u6027\u80fd\u5dee\u8ddd\u53ef\u80fd\u9ad8\u8fbe68.94%\uff0880.94%-12.0%\uff09\u3002\u6211\u4eec\u4e5f\u53d1\u73b0\u751f\u6210\u66f4\u591a\u6837\u672c\u53ef\u4ee5\u63d0\u9ad8LLMs\u7684\u6574\u4f53\u6027\u80fd\uff0c\u4f46\u9886\u57df\u504f\u89c1\u751a\u81f3\u53ef\u80fd\u589e\u52a0\u3002\u672c\u7814\u7a76\u7684\u8d21\u732e\u5305\u62ec\u4e00\u4e2a\u4ee3\u7801\u751f\u6210\u57fa\u51c6\u6570\u636e\u96c6DOMAINEVAL\uff0c\u6db5\u76d6\u516d\u4e2a\u6d41\u884c\u9886\u57df\uff0c\u4ee5\u53ca\u4e00\u4e2a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7ba1\u9053\u7528\u4e8e\u6784\u5efa\u4ee3\u7801\u57fa\u51c6\uff0c\u5e76\u57fa\u4e8e\u5728DOMAINEVAL\u4e0a\u7684\u6027\u80fd\u8bc6\u522b\u4e86LLMs\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u7684\u5c40\u9650\u6027\uff0c\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u6539\u8fdb\u7684\u65b9\u5411\u3002\u9886\u5bfc\u8005\u677f\u53ef\u5728https://domaineval.github.io/\u67e5\u770b\u3002|\n", "2408.13184": "|**2024-08-23**|**Can LLM be a Good Path Planner based on Prompt Engineering? 
Mitigating the Hallucination for Path Planning**|Hourui Deng et.al.|[2408.13184](http://arxiv.org/abs/2408.13184)|null|Spatial reasoning in large language models (LLMs) is a foundation for embodied intelligence. However, even in simple maze environments, LLMs still struggle with long-term path planning, mainly because of spatial hallucination and the context inconsistency hallucination caused by long-term reasoning. To address this challenge, this study proposes an innovative model, Spatial-to-Relational Transformation and Curriculum Q-Learning (S2RCQL). To tackle the spatial hallucination of LLMs, we propose the Spatial-to-Relational approach, which transforms spatial prompts into entity relations and into paths that represent chains of entity relations, making full use of the potential of LLMs for sequential thinking. Building on this, we design a Q-learning-based path-planning algorithm that mitigates the context inconsistency hallucination and strengthens the reasoning ability of LLMs. Using the Q-values of state-action pairs as auxiliary information in the prompt, we correct the hallucinations of the LLM and guide it toward the optimal path. Finally, we propose an LLM-based reverse curriculum learning technique that further mitigates the context inconsistency hallucination. 
By lowering task difficulty and reusing successful experience, this technique helps the LLM accumulate experience quickly and then apply it to more complex tasks. We conduct comprehensive experiments on ERNIE-Bot 4.0, an LLM developed in-house by Baidu. The results show that S2RCQL improves both success rate and optimality rate by 23% to 40% compared with advanced prompt engineering methods.|\n", "2408.13073": "|**2024-08-23**|**IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models**|Zhihao Yu et.al.|[2408.13073](http://arxiv.org/abs/2408.13073)|**[link](https://github.com/yzhHoward/IntelliCare)**|Although deep learning methods for electronic health record (EHR) data have made great progress, they often fail to capture the full semantics of diverse medical codes from limited data. 
Integrating knowledge from large language models (LLMs) offers a promising way to improve healthcare predictions. However, LLM analyses can fluctuate significantly because of ambiguity and inconsistency, which hinders their effective use. To address these challenges, we propose IntelliCare, a novel framework that uses LLMs to provide high-quality patient-level external knowledge and to enhance existing EHR models, thereby improving healthcare predictions. Specifically, IntelliCare identifies patient cohorts and uses task-relevant statistical information to strengthen LLM understanding and generation, which effectively resolves the ambiguity problem. It also refines the LLM-derived knowledge through a hybrid approach that generates multiple analyses and calibrates them using both the EHR model and perplexity measures. Experimental evaluations on three clinical prediction tasks across two large-scale EHR datasets show that IntelliCare significantly improves the performance of existing methods, highlighting its potential for advancing personalized healthcare predictions and decision support systems.|\n", "2408.13071": "|**2024-08-23**|**Guiding IoT-Based Healthcare Alert Systems with Large Language Models**|Yulan Gao 
et.al.|[2408.13071](http://arxiv.org/abs/2408.13071)|null|Healthcare alert systems (HAS) are undergoing rapid transformation, driven by advances in artificial intelligence (AI) and Internet of Things (IoT) technologies and by growing public health awareness. Despite significant progress, a core challenge remains: balancing the accuracy of personalized health alerts against strict privacy protection in resource-constrained environments. To address this, we propose a unified framework, LLM-HAS (Large Language Model Healthcare Alert System). The framework incorporates large language models (LLMs) into HAS to significantly improve alert accuracy, ensure user privacy, and enhance personalized health services, while also improving the user's quality of experience (QoE). Our framework uses a Mixture of Experts (MoE) approach, augmented with an LLM, to analyze users' personalized preferences and potential health risks from additional textual job descriptions. This analysis guides the selection of specialized deep reinforcement learning (DDPG) experts, which are responsible for delivering precise health alerts. In addition, LLM-HAS can process conversational user feedback, which not only allows the DDPG experts to be fine-tuned but also deepens user engagement, improving both the accuracy and the personalization of health management strategies. 
Simulation results validate the effectiveness of the LLM-HAS framework and show its potential as a groundbreaking approach that uses generative AI (GAI) to provide highly accurate and reliable alerts.|\n", "2408.13031": "|**2024-08-23**|**VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models**|Wentao Wu et.al.|[2408.13031](http://arxiv.org/abs/2408.13031)|**[link](https://github.com/event-ahu/vfm-det)**|**Existing vehicle detectors are usually obtained by training a typical detector (e.g., the YOLO, RCNN, or DETR series) on vehicle images, starting from a pre-trained backbone (e.g., ResNet, ViT). Some researchers also exploit pre-trained large foundation models to improve detection performance. However, we argue that these detectors may achieve only sub-optimal results, because the large models they use are not designed specifically for vehicles. Moreover, their results depend heavily on visual features and seldom consider the alignment between vehicle semantic information and visual representations. 
In this work, we propose VFM-Det, a new vehicle detection paradigm based on a pre-trained vehicle foundation model (VehicleMAE) and a large language model (T5). It follows the region proposal-based detection framework, and the features of each proposal can be enhanced using VehicleMAE. More importantly, we propose a new VAtt2Vec module that predicts the vehicle semantic attributes of these proposals and transforms them into feature vectors, which enhance the visual features through contrastive learning. Extensive experiments on three vehicle detection benchmark datasets demonstrate the effectiveness of our detector. Specifically, on the Cityscapes dataset our model improves over the baseline by $+5.1\\%$ and $+6.2\\%$ on the $AP_{0.5}$ and $AP_{0.75}$ metrics, respectively. The source code of this work will be released at https://github.com/Event-AHU/VFM-Det.**|\n", "2408.13028": "|**2024-08-23**|**In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting**|Haowei Du et.al.|[2408.13028](http://arxiv.org/abs/2408.13028)|null|In-context learning (ICL) with large language models (LLMs), in which the model makes predictions from instructions augmented with only a few examples, 
has attracted increasing attention in the research community. Existing example selection methods for ICL use sparse or dense retrievers and achieve effective performance. However, these methods do not use direct feedback from the LLM to train the retriever, so the selected examples may not substantially improve the analogy ability of the LLM. To overcome this, we propose a policy-based reinforcement learning framework for example selection (RLS). The framework consists of a language model (LM) selector and an LLM generator. The LM selector encodes the candidate examples into dense representations and selects the top-k examples as demonstrations for the LLM. The outputs of the LLM are used to compute the reward and the policy gradient that optimize the LM selector. We run experiments on different datasets and significantly outperform existing example selection methods. In addition, our approach shows advantages over supervised fine-tuning (SFT) 
models in few-shot settings. Further experiments show that the abundance of the examples and their similarity to the test case are both important for the ICL performance of LLMs.|\n", "2408.14470": "|**2024-08-27**|**Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models**|Aradhye Agarwal et.al.|[2408.14470](http://arxiv.org/abs/2408.14470)|**[link](https://github.com/Aradhye2002/selective-peft-toolkit)**|**Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods aim to ease this cost by fine-tuning only a small fraction of the model parameters. Although computationally efficient, these techniques often fail to match the performance of fully fine-tuned models, mainly because of biases inherent in the parameter selection process. Traditional selective PEFT techniques use a fixed set of parameters chosen under a predefined budget (a process also known as unmasking), fail to capture parameter importance dynamically, and often end up exceeding the budget. We introduce $\\text{ID}^3$, a novel selective PEFT method that computes parameter importance continually and unmasks parameters dynamically by balancing exploration and exploitation during parameter selection. 
Our experiments on 15 tasks spanning natural language understanding and generation demonstrate the effectiveness of our method compared with fixed-masking-based PEFT techniques. We show analytically that $\\text{ID}^3$ reduces the number of gradient updates by a factor of two, which improves computational efficiency. $\\text{ID}^3$ is robust to the random initialization of neurons and can therefore be integrated seamlessly into existing additive and reparametrization-based PEFT modules, such as adapters and LoRA, for dynamic sparsification.**|\n", "2408.14469": "|**2024-08-26**|**Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos**|Qirui Chen et.al.|[2408.14469](http://arxiv.org/abs/2408.14469)|null|This paper studies Multi-Hop Video Question Answering (MH-VidQA) in long-form egocentric videos. 
The task requires not only answering visual questions but also localizing multiple relevant time intervals in the video as visual evidence. We develop an automated pipeline that creates multi-hop question-answer pairs with associated temporal evidence, which lets us build a large-scale dataset for instruction tuning. To track progress on this new task, we further curate a high-quality benchmark, MultiHop-EgoQA, built through careful manual verification and refinement. Experimental results reveal that existing multi-modal systems have inadequate multi-hop grounding and reasoning abilities, which leads to unsatisfactory performance. We then propose a novel architecture, Grounding Scattered Evidence with Large Language Model (GeLM), which enhances large language models 
(LLMs) with a grounding module that retrieves temporal evidence from videos using flexible grounding tokens. Trained on our visual instruction data, GeLM shows improved multi-hop grounding and reasoning capabilities and sets a new baseline for this challenging task. Furthermore, when trained on third-person view videos, the same architecture also achieves state-of-the-art performance on the single-hop VidQA benchmark ActivityNet-RTL, demonstrating its effectiveness.|\n", "2408.14467": "|**2024-08-26**|**Explicit Inductive Inference using Large Language Models**|Tianyang Liu et.al.|[2408.14467](http://arxiv.org/abs/2408.14467)|null|In this paper, we propose a pipeline that exploits the attestation bias of large language models (LLMs) to perform explicit inductive inference. The pipeline uses an LLM to transform a premise into a set of attested alternatives, and then aggregates the answers to the derived new entailment inquiries to support the original inference prediction. On a directional predicate entailment benchmark, we show that applying this simple pipeline improves the overall inference performance of LLMs and substantially reduces the impact of their attestation bias.|\n", "2408.14438": "|**2024-08-26**|**Evaluating Large Language Models on Spatial 
Tasks: A Multi-Task Benchmarking Study**|Liuchang Xu Shuo Zhao et.al.|[2408.14438](http://arxiv.org/abs/2408.14438)|null|With the arrival of large language models such as ChatGPT and Gemini, evaluating their capabilities in areas ranging from natural language understanding to code generation has become increasingly important. However, their performance on spatial tasks has not been evaluated comprehensively. This study fills that gap by introducing a novel multi-task spatial evaluation dataset that systematically explores and compares the performance of several advanced models on spatial tasks. The dataset covers twelve distinct task types, including spatial understanding and path planning, and every task comes with a verified, accurate answer. 
We evaluate multiple models, including OpenAI's gpt-3.5-turbo and gpt-4o and ZhipuAI's glm-4, using a two-phase testing approach: zero-shot testing first, followed by categorizing the dataset by difficulty and running prompt tuning tests. In the first phase, gpt-4o achieves the highest overall accuracy, averaging 71.3%. Although moonshot-v1-8k is slightly weaker overall, it surpasses gpt-4o on place name recognition tasks. The study also shows how prompt strategies affect model performance on specific tasks. For example, the Chain-of-Thought (COT) strategy raises the accuracy of gpt-4o on path planning from 12.4% to 87.5%, while a one-shot strategy raises the accuracy of moonshot-v1-8k on mapping tasks from 10.1% to 76.3%.|\n", "2408.14419": "|**2024-08-26**|**CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models**|Shubham Bharti 
et.al.|[2408.14419](http://arxiv.org/abs/2408.14419)|null|We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualization charts. Given a chart, a language model must not only comprehend the chart correctly (the FACT question) but also judge whether the chart would mislead a human reader (the MIND question). Both questions have significant societal value. We detail the construction of the CHARTOM benchmark, including its calibration against human performance.|\n", "2408.14418": "|**2024-08-26**|**MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues**|Kuluhan Binici 
et.al.|[2408.14418](http://arxiv.org/abs/2408.14418)|null|Automatic speech recognition (ASR) systems are essential for converting speech to text, yet the errors they introduce can severely degrade downstream tasks such as summarization. The problem is especially pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce and ASR models must therefore be used as black-box solutions. Conventional data augmentation cannot be used to make summarization models more robust to this noise either, because there are not enough medical dialogue audio recordings with corresponding ASR transcripts. To address this challenge, we propose MEDSAGE, an approach that generates synthetic samples for data augmentation using large language models (LLMs). Specifically, we leverage the in-context learning capabilities of LLMs and instruct them to generate ASR-like errors based on a few available medical dialogue examples with audio recordings. Experimental results show that LLMs can model ASR noise effectively, and that incorporating this noisy data into training significantly improves the robustness and accuracy of medical dialogue summarization systems. 
The approach addresses noisy ASR output in critical applications and offers a robust way to improve the reliability of clinical dialogue summarization.|\n", "2408.14398": "|**2024-08-26**|**Language-specific Calibration for Pruning Multilingual Language Models**|Simon Kurz et.al.|[2408.14398](http://arxiv.org/abs/2408.14398)|null|Recent advances in pruning large language models (LLMs) achieve state-of-the-art compression without retraining while maintaining high predictive performance. However, this line of research mainly uses English text for pruning calibration and overlooks the multilingual nature of modern LLMs and their widespread use in non-English languages. This paper sets out to explore effective strategies for calibrating the pruning of multilingual language models. 
We present the first comprehensive empirical study comparing different calibration languages for pruning across diverse multilingual tasks, models, and state-of-the-art pruning techniques. Our results yield practical suggestions; for example, calibrating in the target language effectively lowers perplexity but does not necessarily improve downstream task performance. Further analysis shows that calibration in the target language mainly helps preserve language-specific features related to fluency and coherence, but may not capture language-agnostic features such as language understanding and reasoning. Finally, we offer practical recommendations for future practitioners.|\n", "2408.14387": "|**2024-08-26**|**Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning**|Sakhinana Sagar Srinivas 
et.al.|[2408.14387](http://arxiv.org/abs/2408.14387)|null|Spatio-temporal forecasting plays a crucial role in sectors such as transportation systems, logistics, and supply chain management. However, existing methods are limited in their ability to handle large, complex datasets. To overcome this limitation, we propose a hybrid approach that combines open-source large and small-scale language models (LLMs and LMs) with traditional forecasting methods. By introducing dynamic prompting and a grouped-query, multi-head attention mechanism, the approach captures both intra-series and inter-series dependencies in evolving nonlinear time series data more effectively. 
In addition, we use Low-Rank Adaptation with Activation Memory Reduction (LoRA-AMR) to fine-tune smaller open-source LMs on consumer-grade hardware for time series trend analysis, which reduces computational overhead and activation storage memory demands while preserving inference latency. We combine language model processing with traditional time series representation learning to achieve cross-modal integration and obtain robust, accurate forecasts. Extensive experiments on multiple real-world datasets validate the framework, which is significantly more accurate than existing methods.|\n", "2408.14380": "|**2024-08-26**|**Probing Causality Manipulation of Large Language Models**|Chenyang Zhang et.al.|[2408.14380](http://arxiv.org/abs/2408.14380)|**[link](https://github.com/tongjinlp/llm-causality-probing)**|**Large language models (LLMs) have shown a range of abilities on natural language processing tasks, including problems about causality. Pre-trained models usually work from statistical associations rather than focusing on causes and effects in sentences, so it is necessary to probe how LLMs manipulate causality internally. This paper proposes a novel approach that probes causality manipulation hierarchically by providing different shortcuts to the models and observing their behavior. 
We use retrieval-augmented generation (RAG) and in-context learning (ICL) on a designed causality classification task and run experiments on mainstream LLMs, including GPT-4 and some smaller and domain-specific models. Our results suggest that LLMs can detect entities related to causality and recognize direct causal relationships. However, LLMs lack specialized cognition for causality and merely treat it as part of the global semantics of the sentence.**|\n", "2408.14354": "|**2024-08-26**|**SWE-bench-java: A GitHub Issue Resolving Benchmark for Java**|Daoguang Zan et.al.|[2408.14354](http://arxiv.org/abs/2408.14354)|**[link](https://github.com/multi-swe-bench/multi-swe-bench-env)**|**GitHub\u95ee\u9898\u89e3\u51b3\u662f\u8f6f\u4ef6\u5de5\u7a0b\u4e2d\u7684\u5173\u952e\u4efb\u52a1\uff0c\u8fd1\u671f\u5728\u884c\u4e1a\u548c\u5b66\u672f\u754c\u90fd\u53d7\u5230\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5728\u8fd9\u4e2a\u9886\u57df\u5185\uff0cSWE-bench\u5df2\u7ecf\u53d1\u5e03\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u4ec5\u5173\u6ce8Python\u7248\u672c\u3002\u7136\u800c\uff0c\u652f\u6301\u66f4\u591a\u7f16\u7a0b\u8bed\u8a00\u540c\u6837\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5de5\u4e1a\u754c\u5bf9\u6b64\u6709\u5f3a\u70c8\u9700\u6c42\u3002\u4f5c\u4e3a\u8fc8\u5411\u591a\u8bed\u8a00\u652f\u6301\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Java\u7248\u7684SWE-bench\uff0c\u79f0\u4e3aSWE-bench-java\u3002\u6211\u4eec\u5df2\u516c\u5f00\u53d1\u5e03\u4e86\u6570\u636e\u96c6\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u4e8eDocker\u7684\u8bc4\u4f30\u73af\u5883\u548c\u6392\u884c\u699c\uff0c\u8fd9\u4e9b\u90fd\u5c06\u6301\u7eed\u7ef4\u62a4\u548c\u66f4\u65b0\u3002\u4e3a\u4e86\u9a8c\u8bc1SWE-bench
-java\u7684\u53ef\u9760\u6027\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u7ecf\u5178\u65b9\u6cd5SWE-agent\uff0c\u5e76\u5728\u5176\u4e2d\u6d4b\u8bd5\u4e86\u51e0\u79cd\u5f3a\u5927\u7684LLMs\u3002\u4f17\u6240\u5468\u77e5\uff0c\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u8a00\u57fa\u51c6\u65e2\u8017\u65f6\u53c8\u8d39\u529b\uff0c\u56e0\u6b64\u6211\u4eec\u6b22\u8fce\u901a\u8fc7\u62c9\u53d6\u8bf7\u6c42\u6216\u5408\u4f5c\u6765\u52a0\u901f\u5176\u8fed\u4ee3\u548c\u6539\u8fdb\uff0c\u4e3a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7f16\u7a0b\u94fa\u5e73\u9053\u8def\u3002**|\n", "2408.15240": "|**2024-08-27**|**Generative Verifiers: Reward Modeling as Next-Token Prediction**|Lunjun Zhang et.al.|[2408.15240](http://arxiv.org/abs/2408.15240)|null|\u9a8c\u8bc1\u5668\u6216\u5956\u52b1\u6a21\u578b\u5e38\u7528\u4e8e\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u6027\u80fd\u3002\u4e00\u79cd\u5e38\u89c1\u7684\u65b9\u6cd5\u662fBest-of-N\u7b56\u7565\uff0c\u5176\u4e2d\u4eceLLM\u751f\u6210\u7684N\u4e2a\u5019\u9009\u89e3\u51b3\u65b9\u6848\u4e2d\u7531\u9a8c\u8bc1\u5668\u8fdb\u884c\u6392\u540d\uff0c\u9009\u62e9\u6700\u4f73\u4e00\u4e2a\u3002\u4f20\u7edf\u4e0a\uff0c\u9a8c\u8bc1\u5668\u662f\u4f5c\u4e3a\u5224\u522b\u5206\u7c7b\u5668\u8fdb\u884c\u8bad\u7ec3\u4ee5\u5bf9\u89e3\u51b3\u65b9\u6848\u6253\u5206\u7684\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3LLM\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u5728\u9a8c\u8bc1\u548c\u89e3\u51b3\u65b9\u6848\u751f\u6210\u4e0a\u4f7f\u7528\u901a\u7528\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u76ee\u6807\u8054\u5408\u8bad\u7ec3\u9a8c\u8bc1\u5668\u3002\u4e0e\u6807\u51c6\u9a8c\u8bc1\u5668\u76f8\u6bd4\uff0c\u8fd9\u6837\u7684\u751f\u6210\u578b\u9a8c\u8bc1\u5668\uff08GenRM\uff09\u53ef\u4ee5\u4eceLLM\u7684\u51e0\u4e2a\u4f18\u52bf\u4e2d\u83b7\u76ca\uff1a\u5b83\u4eec\u53ef\u4ee5\u65e0\u7f1d\u5730\u4e0e\u6307\u4ee4\u
8c03\u8c10\u76f8\u7ed3\u5408\uff0c\u652f\u6301\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u5e76\u4e14\u53ef\u4ee5\u901a\u8fc7\u589e\u52a0\u63a8\u7406\u65f6\u7684\u8ba1\u7b97\u91cf\u6765\u5229\u7528\u591a\u6570\u6295\u7968\uff0c\u4ece\u800c\u8fdb\u884c\u66f4\u597d\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u5c55\u793a\u4e86\uff0c\u5728\u7b97\u6cd5\u95ee\u9898\u548c\u5c0f\u5b66\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u4f7f\u7528Gemma\u4e3a\u57fa\u7840\u7684\u9a8c\u8bc1\u5668\u65f6\uff0cGenRM\u4f18\u4e8e\u5224\u522b\u578b\u9a8c\u8bc1\u5668\u548cLLM\u4f5c\u4e3a\u88c1\u5224\uff0c\u8868\u73b0\u51fa16%-64%\u7684\u95ee\u9898\u89e3\u51b3\u7387\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86GenRM\u5728\u6570\u636e\u96c6\u89c4\u6a21\u3001\u6a21\u578b\u5bb9\u91cf\u548c\u63a8\u7406\u65f6\u8ba1\u7b97\u91cf\u589e\u52a0\u65b9\u9762\u5177\u6709\u826f\u597d\u7684\u53ef\u6269\u5c55\u6027\u3002|\n", "2408.15221": "|**2024-08-27**|**LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet**|Nathaniel Li et.al.|[2408.15221](http://arxiv.org/abs/2408.15221)|null|\u8fd1\u671f\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9632\u5fa1\u63aa\u65bd\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5bf9\u6709\u5bb3\u67e5\u8be2\u7684\u62d2\u7edd\u80fd\u529b\uff0c\u5373\u4f7f\u5728\u906d\u53d7\u6709\u7ec4\u7ec7\u653b\u51fb\u7684\u60c5\u51b5\u4e0b\u4e5f\u4e0d\u4f8b\u5916\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u9632\u5fa1\u63aa\u65bd\u4e3b\u8981\u662f\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u9488\u5bf9\u81ea\u52a8\u5316\u653b\u51fb\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u79cd\u5a01\u80c1\u6a21\u578b\u4e0d\u8db3\u4ee5\u53cd\u6620\u771f\u5b9e\u4e16\u754c\u4e2d\u6076\u610f\u884c\u4e3a\u7684\u590d\u6742\u6027\u3002 
\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u5c55\u793a\u4e86\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u667a\u80fd\u201c\u8d8a\u72f1\u201d\uff08\u5373\u653b\u51fb\u8005\u5229\u7528\u6a21\u578b\u7684\u6f0f\u6d1e\u6765\u7ed5\u8fc7\u9632\u5fa1\u673a\u5236\uff09\u80fd\u591f\u63ed\u9732\u9632\u5fa1\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u6f0f\u6d1e\u3002\u5728\u4f7f\u7528HarmBench\u8fd9\u4e00\u8bc4\u4f30\u5e73\u53f0\uff0c\u5bf9\u6297\u90a3\u4e9b\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u4ec5\u62a5\u544a\u4f4e\u767e\u5206\u6bd4\u653b\u51fb\u6210\u529f\u7387\uff08ASR\uff09\u7684\u9632\u5fa1\u7cfb\u7edf\u65f6\uff0c\u6211\u4eec\u53d1\u73b0\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u667a\u80fd\u201c\u8d8a\u72f1\u201d\u7684\u6210\u529f\u7387\u8d85\u8fc7\u4e8670%\u3002\u8fd9\u8868\u660e\u5f53\u524d\u7684\u9632\u5fa1\u673a\u5236\u5728\u9762\u5bf9\u66f4\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u653b\u51fb\u7b56\u7565\u65f6\u5b58\u5728\u4e0d\u8db3\u3002 \u6b64\u5916\uff0c\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u667a\u80fd\u201c\u8d8a\u72f1\u201d\u8fd8\u63ed\u793a\u4e86\u673a\u5668\u9057\u5fd8\u9632\u5fa1\u7cfb\u7edf\u7684\u6f0f\u6d1e\u3002\u653b\u51fb\u8005\u6210\u529f\u5730\u4ece\u672a\u88ab\u5220\u9664\u7684\u6a21\u578b\u4e2d\u6062\u590d\u4e86\u53ef\u7528\u4e8e\u751f\u7269\u5b89\u5168\u53cc\u91cd\u7528\u9014\u7684\u77e5\u8bc6\uff0c\u8fd9\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u73b0\u6709\u9632\u5fa1\u63aa\u65bd\u5728\u4fdd\u62a4\u654f\u611f\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u7684\u5f31\u70b9\u3002 \u4e3a\u4e86\u603b\u7ed3\u548c\u5171\u4eab\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u591a\u8f6e\u5bf9\u8bdd\u4eba\u5de5\u667a\u80fd\u8d8a\u72f1\u201d\uff08Multi-Turn Human 
Jailbreaks\uff0c\u7b80\u79f0MHJ\uff09\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u6765\u81ea537\u4e2a\u4e0d\u540c\u591a\u8f6e\u5bf9\u8bdd\u573a\u666f\u76842912\u4e2a\u89e6\u53d1\u6307\u4ee4\uff0c\u5171\u8ba12,912\u4e2a\u89e6\u53d1\u6307\u4ee4\u6d89\u53ca2,912\u4e2a\u4e0d\u540c\u7684\u591a\u8f6e\u5bf9\u8bdd\u201c\u8d8a\u72f1\u201d\u6848\u4f8b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u53d1\u5e03\u4e86\u8fd9\u4e2a\u6570\u636e\u96c6\u4ee5\u53ca\u5728\u591a\u79cd\u5546\u4e1a\u7ea2\u961f\u6d4b\u8bd5\u4e2d\u53d1\u5c55\u51fa\u7684\u4e00\u7cfb\u5217\u201c\u8d8a\u72f1\u201d\u7b56\u7565\u7684\u7efc\u8ff0\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u66f4\u5f3a\u5927\u7684LLM\u9632\u5fa1\u7cfb\u7edf\u63d0\u4f9b\u8d44\u6e90\u548c\u652f\u6301\u3002|\n", "2408.15207": "|**2024-08-27**|**Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks**|Shide Zhou et.al.|[2408.15207](http://arxiv.org/abs/2408.15207)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\u6781\u5927\u5730\u6539\u53d8\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u683c\u5c40\uff0c\u7136\u800c\u5728\u654f\u611f\u9886\u57df\u90e8\u7f72\u65f6\uff0c\u5b83\u4eec\u7684\u8106\u5f31\u6027\u5f15\u53d1\u4e86\u4e00\u7cfb\u5217\u4e25\u91cd\u5173\u5207\uff0c\u5c24\u5176\u662f\u5bf9\u4e8e\u6076\u610f\u5229\u7528\u7684\u98ce\u9669\u3002\u8fd9\u79cd\u60c5\u51b5\u51f8\u663e\u4e86\u9884\u90e8\u7f72\u6d4b\u8bd5\u4e0d\u8db3\u7684\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u66f4\u52a0\u4e25\u683c\u548c\u5168\u9762\u8bc4\u4f30\u65b9\u6cd5\u7684\u7d27\u8feb\u6027\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5168\u9762\u7684\u5b9e\u8bc1\u5206\u6790\uff0c\u8bc4\u4f30\u4e86\u4f20\u7edf\u8986\u76d6\u6807\u51c6\u5728\u8bc6\u522b\u8fd9\u4e9b\u6f0f\u6d1e\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u5173\u952e\u95ee\u9898\u2014\u2014\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u3002\u7814\u7a76\u9996\u5148\u5bf9LLM\u4e2d\u7684\u9690\u85cf\u72b6\u6
001\u8fdb\u884c\u4e86\u805a\u7c7b\u5206\u6790\uff0c\u7ed3\u679c\u663e\u793a\u8fd9\u4e9b\u72b6\u6001\u7684\u5185\u5728\u7279\u6027\u80fd\u591f\u660e\u663e\u533a\u5206\u4e0d\u540c\u7c7b\u578b\u7684\u67e5\u8be2\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ece\u4e09\u4e2a\u5173\u952e\u7ef4\u5ea6\u2014\u2014\u6807\u51c6\u7ea7\u522b\u3001\u5c42\u7ea7\u522b\u548c\u8bcd\u7ea7\u522b\u2014\u2014\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6807\u51c6\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u6b63\u5e38\u67e5\u8be2\u4e0e\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u5728\u795e\u7ecf\u5143\u6fc0\u6d3b\u6a21\u5f0f\u4e0a\u7684\u663e\u8457\u5dee\u5f02\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u805a\u7c7b\u7ed3\u679c\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5b9e\u65f6\u68c0\u6d4b\u201c\u8d8a\u72f1\u201d\u653b\u51fb\uff0c\u5229\u7528\u795e\u7ecf\u6fc0\u6d3b\u7279\u5f81\u3002\u6211\u4eec\u7684\u5206\u7c7b\u5668\u8868\u73b0\u51fa\u4e86\u6781\u9ad8\u7684\u51c6\u786e\u7387\uff0c\u5e73\u5747\u8fbe\u523096.33%\uff0c\u6210\u529f\u8bc6\u522b\u51fa\u5305\u62ec\u53ef\u80fd\u5bfc\u81f4\u5bf9\u6297\u6027\u653b\u51fb\u7684\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u3002\u8fd9\u9879\u7814\u7a76\u7684\u91cd\u8981\u6027\u5728\u4e8e\u5176\u5bf9LLM\u5b89\u5168\u6027\u6d4b\u8bd5\u590d\u6742\u6311\u6218\u7684\u5168\u9762\u5e94\u5bf9\u3002\u901a\u8fc7\u4f7f\u7cfb\u7edf\u80fd\u591f\u5728\u751f\u6210\u7b2c\u4e00\u4e2a\u8bcd\u65f6\u7acb\u5373\u68c0\u6d4b\u5230\u653b\u51fb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u96c6\u6210LLM\u7684\u672a\u6765\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5f3a\u5927\u7684\u5b9e\u65f6\u68c0\u6d4b\u80fd\u529b\u3002\u8fd9\u4e00\u7814\u7a76\u6df1\u5316\u4e86\u6211\u4eec\u5bf9LLM\u5b89\u5168\u6027\u7684\u7406\u89e3\uff0c\u5e76\u4e3a\u5f00\u53d1\u66f4\u7a33\u5065\u7684\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.15205": "|**2024-08-27**|**Leveraging Hallucinations to 
Reduce Manual Prompt Dependency in Promptable Segmentation**|Jian Hu et.al.|[2408.15205](http://arxiv.org/abs/2408.15205)|**[link](https://github.com/lwpyh/ProMaC_code)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4efb\u52a1\u901a\u7528\u7684\u63d0\u793a\u53ef\u5206\u5272\u65b9\u6cd5\uff0c\u65e8\u5728\u51cf\u5c11\u5bf9\u6bcf\u79cd\u6240\u9700\u5bf9\u8c61\u7684\u5b9e\u4f8b\u7279\u5b9a\u624b\u52a8\u63d0\u793a\u7684\u9700\u6c42\u3002\u901a\u8fc7\u4f7f\u7528\u5355\u4e2a\u4efb\u52a1\u901a\u7528\u63d0\u793a\u6765\u6307\u5bfc\u540c\u4e00\u4efb\u52a1\u4e0b\u4e0d\u540c\u5bf9\u8c61\u7684\u4e0d\u540c\u56fe\u50cf\u7684\u5206\u5272\uff0c\u5f15\u5165\u4e86\u4efb\u52a1\u901a\u7528\u63d0\u793a\u5206\u5272\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4ece\u901a\u7528\u63d0\u793a\u63a8\u7406\u51fa\u8be6\u7ec6\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ee5\u63d0\u9ad8\u5206\u5272\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002\u7136\u800c\uff0cMLLMs\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u7ecf\u5e38\u51fa\u73b0\u5e7b\u89c9\uff0c\u5bfc\u81f4\u63d0\u793a\u4e0d\u51c6\u786e\u3002\u73b0\u6709\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u6d88\u9664\u5e7b\u89c9\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\uff0c\u672c\u6587\u8ba4\u4e3aMLLM\u5e7b\u89c9\u5728\u6b63\u786e\u5229\u7528\u65f6\u53ef\u4ee5\u63ed\u793a\u6709\u4ef7\u503c\u7684\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u56e0\u4e3a\u5b83\u4eec\u4ee3\u8868\u4e86\u8d85\u8d8a\u5355\u5f20\u56fe\u50cf\u7684\u9884\u8bad\u7ec3\u5927\u89c4\u6a21\u77e5\u8bc6\u3002\u56e0\u6b64\uff0c\u672c\u6587\u5229\u7528\u5e7b\u89c9\u4ece\u56fe\u50cf\u4e2d\u6316\u6398\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u9a8c\u8bc1\u5176\u51c6\u786e\u6027\u4ee5\u589e\u5f3a\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002 
\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u8fed\u4ee3\u7684\u63d0\u793a-\u63a9\u7801\u5faa\u73af\u751f\u6210\u6846\u67b6\uff08ProMaC\uff09\uff0c\u8be5\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u63d0\u793a\u751f\u6210\u5668\u548c\u4e00\u4e2a\u63a9\u7801\u751f\u6210\u5668\u3002\u63d0\u793a\u751f\u6210\u5668\u4f7f\u7528\u591a\u5c3a\u5ea6\u94fe\u5f0f\u601d\u8003\u63d0\u793a\uff0c\u6700\u521d\u63a2\u7d22\u5e7b\u89c9\u4ee5\u63d0\u53d6\u6d4b\u8bd5\u56fe\u50cf\u4e0a\u7684\u6269\u5c55\u4e0a\u4e0b\u6587\u77e5\u8bc6\u3002\u7136\u540e\uff0c\u5c06\u8fd9\u4e9b\u5e7b\u89c9\u964d\u4f4e\u5230\u5f62\u6210\u7cbe\u786e\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ece\u800c\u5f15\u5bfc\u63a9\u7801\u751f\u6210\u5668\u901a\u8fc7\u63a9\u7801\u8bed\u4e49\u5bf9\u9f50\u4ea7\u751f\u4e0e\u4efb\u52a1\u8bed\u4e49\u4e00\u81f4\u7684\u63a9\u7801\u3002\u751f\u6210\u7684\u63a9\u7801\u901a\u8fc7\u8fed\u4ee3\u5f15\u5bfc\u63d0\u793a\u751f\u6210\u5668\u66f4\u5173\u6ce8\u4efb\u52a1\u76f8\u5173\u7684\u56fe\u50cf\u533a\u57df\u5e76\u51cf\u5c11\u65e0\u5173\u7684\u5e7b\u89c9\uff0c\u6700\u7ec8\u5171\u540c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u63a9\u7801\u7684\u8d28\u91cf\u3002 \u5b9e\u9a8c\u7ed3\u679c\u57285\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8bc1\u660e\u4e86ProMaC\u7684\u6709\u6548\u6027\u3002\u8be6\u7ec6\u4ee3\u7801\u89c1https://lwpyh.github.io/ProMaC/\u3002|\n", "2408.15204": "|**2024-08-27**|**Can Unconfident LLM Annotations Be Used for Confident Conclusions?**|Kristina Gligori\u0107 
et.al.|[2408.15204](http://arxiv.org/abs/2408.15204)|**[link](https://github.com/kristinagligoric/confidence-driven-inference)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u8005\u9ad8\u5ea6\u4e00\u81f4\uff0c\u663e\u793a\u51fa\u51cf\u8f7b\u4eba\u7c7b\u6570\u636e\u6536\u96c6\u6311\u6218\u7684\u6f5c\u529b\u3002\u5728\u8ba1\u7b97\u793e\u4f1a\u79d1\u5b66\uff08CSS\uff09\u9886\u57df\uff0c\u7814\u7a76\u4eba\u5458\u8d8a\u6765\u8d8a\u591a\u5730\u5229\u7528LLM\u6ce8\u91ca\u6765\u8865\u5145\u7f13\u6162\u4e14\u6602\u8d35\u7684\u4eba\u7c7b\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u5982\u4f55\u6536\u96c6\u548c\u4f7f\u7528LLM\u6ce8\u91ca\u800c\u4e0d\u635f\u5bb3\u4e0b\u6e38\u7ed3\u8bba\u7684\u6709\u6548\u6027\uff0c\u4ecd\u7f3a\u4e4f\u660e\u786e\u7684\u6307\u5357\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201c\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u201d\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86LLM\u6ce8\u91ca\u548cLLM\u7f6e\u4fe1\u5ea6\u6307\u793a\u5668\uff0c\u4ee5\u6218\u7565\u65b9\u5f0f\u9009\u62e9\u5e94\u6536\u96c6\u54ea\u4e9b\u4eba\u7c7b\u6ce8\u91ca\uff0c\u65e8\u5728\u751f\u4ea7\u51c6\u786e\u7684\u7edf\u8ba1\u4f30\u8ba1\u548c\u53ef\u9a8c\u8bc1\u7684\u7f6e\u4fe1\u533a\u95f4\uff0c\u540c\u65f6\u51cf\u5c11\u6240\u9700\u7684\u4eba\u7c7b\u6ce8\u91ca\u6570\u91cf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u9632\u6b62LLM\u6ce8\u91ca\u8d28\u91cf\u5dee\u7684\u4fdd\u969c\u63aa\u65bd\uff0c\u786e\u4fdd\u5f97\u51fa\u7684\u7ed3\u8bba\u65e2\u6709\u6548\u53c8\u4e0d\u6bd4\u4ec5\u4f9d\u8d56\u4eba\u7c7b\u6ce8\u91ca\u66f4\u4e0d\u51c6\u786e\u3002\u6211\u4eec\u5728\u4e09\u4e2aCSS\u573a\u666f\u2014\u2014\u793c\u8c8c\u6587\u672c\u3001\u7acb\u573a\u548c\u504f\u89c1\u2014\u2014\u4e2d\u7684\u7edf\u8ba1\u4f30\u8ba1\u4efb\u52a1\u4e2d\uff0c\u901a\u8fc7\u4e0e\u57fa\u7ebf\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u7684\u6709\u6548\u6027\uff0c\u6bcf\u79cd\u573a\u666f\u4e0b\u6240\u9700\u7684\u4eb
a\u7c7b\u6ce8\u91ca\u6570\u91cf\u51cf\u5c11\u4e86\u8d85\u8fc725%\u3002\u5c3d\u7ba1\u6211\u4eec\u4f7f\u7528CSS\u573a\u666f\u8fdb\u884c\u6f14\u793a\uff0c\u4f46\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u53ef\u4ee5\u7528\u4e8e\u5e7f\u6cdbNLP\u95ee\u9898\u4e2d\u7684\u5927\u591a\u6570\u6807\u51c6\u91cf\u4f30\u8ba1\u3002|\n", "2408.15176": "|**2024-08-27**|**Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement**|Longshen Ou et.al.|[2408.15176](http://arxiv.org/abs/2408.15176)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\uff0c\u5305\u62ec\u7b26\u53f7\u97f3\u4e50\u751f\u6210\u3002\u7136\u800c\uff0c\u5229\u7528\u8fd9\u4e9b\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u8fdb\u884c\u53ef\u63a7\u97f3\u4e50\u7f16\u6392\u4efb\u52a1\u7684\u6311\u6218\u4ecd\u7136\u65b0\u9896\uff0c\u6bcf\u4e2a\u4efb\u52a1\u90fd\u9700\u8981\u4e0d\u540c\u7684\u97f3\u4e50\u4fe1\u606f\u4f5c\u4e3a\u63a7\u5236\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5e8f\u5217\u5230\u5e8f\u5217\u6846\u67b6\uff0c\u5b83\u5141\u8bb8\u5bf9\u7b26\u53f7\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u56db\u4e2a\u4e0d\u540c\u7684\u591a\u8f68\u7f16\u6392\u4efb\u52a1\uff1a\u4e50\u961f\u7f16\u6392\u3001\u94a2\u7434\u7f29\u51cf\u3001\u9f13\u7f16\u6392\u548c\u58f0\u97f3\u5206\u79bb\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u7b56\u7565\u5728\u6240\u6709\u56db\u4e2a\u4efb\u52a1\u4e0a\u5747\u5b9e\u73b0\u4e86\u66f4\u9ad8\u97f3\u4e50\u8d28\u91cf\u7684\u7ed3\u679c\uff0c\u4e0e\u4e13\u95e8\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u57fa\u7ebf\u76f8\u6bd4\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u989d\u5916\u7684\u63a2\u67e5\u5206\u6790\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u9636\u6bb5\u8d4b\u4e88\u6a21\u578b\u7406\u89e3\u97f3\u4e50\u6761\u4ef6\u7684\u57fa\u672c\u77e5\u8bc6\uff0c\u8fd9\u5728\u4ec5\u901a\u
8fc7\u7279\u5b9a\u4efb\u52a1\u7684\u5fae\u8c03\u96be\u4ee5\u83b7\u5f97\u7684\u60c5\u51b5\u4e0b\u5c24\u4e3a\u91cd\u8981\u3002|\n", "2408.15172": "|**2024-08-27**|**X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation**|Hanjia Lyu et.al.|[2408.15172](http://arxiv.org/abs/2408.15172)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5df2\u88ab\u8bc1\u660e\u80fd\u663e\u8457\u63d0\u5347\u4e30\u5bcc\u9879\u76ee\u63cf\u8ff0\u7684\u6548\u679c\uff0c\u8fdb\u800c\u589e\u5f3a\u63a8\u8350\u7cfb\u7edf\u7684\u51c6\u786e\u6027\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4ec5\u4f9d\u8d56\u4e8e\u7eaf\u6587\u672c\u63d0\u793a\uff0c\u6216\u8005\u91c7\u7528\u57fa\u672c\u7684\u591a\u6a21\u6001\u7b56\u7565\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u6587\u672c\u4e0e\u89c6\u89c9\u6a21\u6001\u4e4b\u95f4\u4e92\u8865\u7684\u4fe1\u606f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCross-Reflection Prompting\uff08X-Reflect\uff09\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u5f15\u5bfcLMM\u660e\u786e\u8bc6\u522b\u5e76\u8c03\u548c\u6587\u672c\u4e0e\u56fe\u50cf\u4e4b\u95f4\u7684\u652f\u6301\u6027\u4e0e\u51b2\u7a81\u4fe1\u606f\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u901a\u8fc7\u6355\u6349\u4e24\u79cd\u6a21\u6001\u7684\u7ec6\u5fae\u6d1e\u5bdf\uff0c\u6b64\u65b9\u6cd5\u751f\u6210\u4e86\u66f4\u4e3a\u5168\u9762\u4e14\u8bed\u5883\u4e30\u5bcc\u7684\u9879\u76ee\u8868\u793a\u3002\u5728\u4e24\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4e0b\u6e38\u63a8\u8350\u51c6\u786e\u5ea6\u4e0a\u4f18\u4e8e\u73b0\u6709\u7684\u63d0\u793a\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u6846\u67b6\u5728\u4e0d\u540cLMM\u67b6\u6784\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u4ee5\u53ca\u63d0\u793a\u7b56\u7565\u7684\u9c81\u68d2\u6027\uff0c\u63d0\u4f9b\u4e86\u4f18\u5316\u7684\u89c1\
u89e3\u3002\u8fd9\u9879\u5de5\u4f5c\u5f3a\u8c03\u4e86\u6574\u5408\u591a\u6a21\u6001\u4fe1\u606f\u7684\u91cd\u8981\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u6539\u5584\u591a\u6a21\u6001\u63a8\u8350\u7cfb\u7edf\u4e2d\u9879\u76ee\u7406\u89e3\u7684\u65b0\u578b\u89e3\u51b3\u65b9\u6848\u3002|\n", "2408.15171": "|**2024-08-27**|**Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation**|N. E. Kriman et.al.|[2408.15171](http://arxiv.org/abs/2408.15171)|null|\u81ea2022\u5e74ChatGPT\u7684\u53d1\u5e03\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e94\u7528\u8303\u56f4\u663e\u8457\u6269\u5927\uff0c\u663e\u793a\u51fa\u5176\u5728\u5404\u79cd\u573a\u666f\u4e2d\u7684\u4ef7\u503c\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4f01\u4e1a\u7ea7\u548c\u5546\u4e1a\u5e94\u7528\u800c\u8a00\uff0cLLMs\u751f\u6210\u4e0d\u51c6\u786e\u4fe1\u606f\u7684\u8d8b\u52bf\uff0c\u5373\u6240\u8c13\u7684\u201c\u5e7b\u89c9\u201d\u73b0\u8c61\uff0c\u6210\u4e3a\u4e86\u4e00\u4e2a\u4e3b\u8981\u6311\u6218\u3002\u672c\u9879\u76ee\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5728\u4e0e\u539f\u59cb\u6587\u672c\u8fdb\u884c\u6bd4\u8f83\u65f6\u8bc4\u4f30LLM\u751f\u6210\u6982\u8981\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u6734\u7d20\u8d1d\u53f6\u65af\u5206\u7c7b\u6765\u5224\u65ad\u751f\u6210\u5185\u5bb9\u7684\u771f\u5b9e\u6027\u3002 
\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u53ef\u4ee5\u4f30\u8ba1\u751f\u6210\u6587\u672c\u4e0e\u5b9e\u9645\u4fe1\u606f\u4e4b\u95f4\u7684\u5339\u914d\u5ea6\uff0c\u4ece\u800c\u63d0\u9ad8LLM\u5e94\u7528\u7684\u8d28\u91cf\u548c\u53ef\u9760\u6027\u3002\u8fd9\u4e0d\u4ec5\u6709\u52a9\u4e8e\u8bc6\u522b\u53ef\u80fd\u5b58\u5728\u7684\u9519\u8bef\u6216\u4e0d\u51c6\u786e\u4e4b\u5904\uff0c\u8fd8\u80fd\u589e\u5f3a\u7528\u6237\u5bf9LLM\u751f\u6210\u5185\u5bb9\u7684\u4fe1\u4efb\uff0c\u4fc3\u8fdb\u5176\u5728\u66f4\u5e7f\u6cdb\u9886\u57df\u7684\u6709\u6548\u4f7f\u7528\u3002\u6b64\u5916\uff0c\u8be5\u65b9\u6cd5\u8fd8\u80fd\u4e3aLLM\u7684\u6301\u7eed\u6539\u8fdb\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u53cd\u9988\uff0c\u63a8\u52a8\u6280\u672f\u8fdb\u6b65\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u9ad8\u8d28\u91cf\u3001\u66f4\u53ef\u9760\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u52a9\u5185\u5bb9\u751f\u6210\u3002|\n", "2408.15079": "|**2024-08-27**|**BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline**|Guosheng Dong 
et.al.|[2408.15079](http://arxiv.org/abs/2408.15079)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6838\u5fc3\u80fd\u529b\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5e7f\u6cdb\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u7ec4\u6210\u548c\u9009\u62e9\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u88ab\u591a\u4e2a\u673a\u6784\u89c6\u4e3a\u5546\u4e1a\u79d8\u5bc6\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u6e90\u4e86\u4e00\u4e2a\u901a\u7528\u9002\u7528\u7684\u6570\u636e\u5904\u7406\u7ba1\u9053\uff0c\u5e76\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u7ade\u4e89\u6027\u7684LLM\u57fa\u7ebf\u6765\u9a8c\u8bc1\u5176\u6709\u6548\u6027\u548c\u6f5c\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6570\u636e\u5904\u7406\u7ba1\u9053\u5305\u62ec\u5e7f\u57df\u6536\u96c6\u4ee5\u6269\u5927\u89c4\u6a21\u548c\u91cd\u65b0\u52a0\u6743\u4ee5\u63d0\u9ad8\u8d28\u91cf\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u6211\u4eec\u7684\u7ba1\u9053\u5bf93\u4e07\u4ebf\u4e2a\u4ee4\u724c\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u800c\u65e0\u9700\u4efb\u4f55\u660e\u786e\u7684\u4e0b\u6e38\u4efb\u52a1\u4f18\u5316\uff0c\u63a5\u7740\u8fdb\u884c\u4e00\u4e2a\u7b80\u5355\u4f46\u6709\u6548\u7684\u76d1\u7763\u5fae\u8c03\u9636\u6bb5\u3002BaichuanSEED\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u8868\u73b0\u51fa\u4e00\u81f4\u6027\u4e0e\u9884\u6d4b\u6027\uff0c\u5e76\u5728\u7efc\u5408\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u51e0\u4e2a\u5148\u8fdb\u7684\u5546\u4e1a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5982Qwen1.5\u548cLlama3\uff0c\u5b9e\u73b0\u4e86\u53ef\u6bd4\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u51e0\u4e2a\u542f\u53d1\u5f0f\u5b9e\u9a8c\uff0c\u8ba8\u8bba\u4e86\u5728\u6570\u5b66\u548c\u7f16\u7a0b\u7b49\u4e0b\u6e38\u4efb\u52a1\u8fdb\u4e00\u6b65\u4f18\u5316\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.15066": "|**2024-08-27**|**Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models**|Ned Cooper 
et.al.|[2408.15066](http://arxiv.org/abs/2408.15066)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u4ea4\u4e92\u53cd\u9988\u529f\u80fd\u5728ChatGPT\u754c\u9762\u4e2d\u7684\u53ef\u7528\u6027\uff0c\u5206\u6790\u4e86\u8fd9\u4e9b\u529f\u80fd\u5982\u4f55\u5851\u9020\u7528\u6237\u8f93\u5165\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fed\u4ee3\u8fc7\u7a0b\u4e2d\u7684\u53c2\u4e0e\u5ea6\u3002\u901a\u8fc7\u8c03\u7814ChatGPT\u7528\u6237\u5e76\u5e94\u7528\u4e86\u53ef\u64cd\u4f5c\u6027\u6846\u67b6\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u7c7b\u529f\u80fd\u9f13\u52b1\u7b80\u5355\u3001\u9891\u7e41\u4e14\u4fa7\u91cd\u4e8e\u6027\u80fd\u7684\u53cd\u9988\uff0c\u540c\u65f6\u9650\u5236\u4e86\u96c6\u4f53\u8f93\u5165\u548c\u7528\u6237\u95f4\u7684\u8ba8\u8bba\u3002\u6211\u4eec\u4e3b\u5f20\uff0c\u8fd9\u79cd\u53cd\u9988\u683c\u5f0f\u6781\u5927\u5730\u9650\u5236\u4e86\u7528\u6237\u7684\u53c2\u4e0e\uff0c\u5f3a\u5316\u4e86\u7528\u6237\u3001\u516c\u4f17\u4e0e\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u53f8\u4e4b\u95f4\u7684\u6743\u529b\u4e0d\u5e73\u7b49\u3002\u6211\u4eec\u7684\u5206\u6790\u4e3a\u73b0\u6709\u53c2\u4e0e\u5f0f\u4eba\u5de5\u667a\u80fd\u6587\u732e\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\uff0c\u7740\u91cd\u4e8e\u73b0\u6709\u53cd\u9988\u6d41\u7a0b\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u91cd\u65b0\u8bbe\u8ba1\u7684\u65b9\u5411\u3002 
\u4e3a\u4e86\u4f7f\u516c\u4f17\u5728\u4eba\u5de5\u667a\u80fd\u53d1\u5c55\u4e2d\u80fd\u591f\u66f4\u5177\u6709\u610f\u4e49\u5730\u53c2\u4e0e\uff0c\u6211\u4eec\u63d0\u5021\u8f6c\u5411\u5173\u6ce8\u6a21\u578b\u8f93\u51fa\u4e0e\u7279\u5b9a\u7528\u6237\u504f\u597d\u7684\u4e00\u81f4\u6027\u7684\u8fc7\u7a0b\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u5f3a\u8c03\u9700\u8981\u4fc3\u8fdb\u516c\u53f8\u4e0e\u4e0d\u540c\u201c\u516c\u4f17\u201d\u4e4b\u95f4\u5173\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u76ee\u7684\u548c\u5e94\u7528\u8fdb\u884c\u5bf9\u8bdd\u7684\u8fc7\u7a0b\u3002\u8fd9\u4e00\u65b9\u6cd5\u8981\u6c42\u5bf9\u6301\u7eed\u7684\u793e\u4f1a\u57fa\u7840\u8bbe\u65bd\u5efa\u8bbe\u7684\u5173\u6ce8\uff0c\u5373\u521b\u5efa\u548c\u7ef4\u6301\u89e3\u51b3AI\u5f00\u53d1\u548c\u90e8\u7f72\u5f71\u54cd\u7fa4\u4f53\u5173\u5207\u6240\u9700\u7684\u793e\u4f1a\u3001\u6280\u672f\u548c\u673a\u6784\u7ed3\u6784\u3002|\n", "2408.15998": "|**2024-08-28**|**Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders**|Min Shi 
et.al.|[2408.15998](http://arxiv.org/abs/2408.15998)|**[link](https://github.com/nvlabs/eagle)**|**\u300a\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5728\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u7684\u89c6\u89c9\u7406\u89e3\u80fd\u529b\uff1a\u6df7\u5408\u89c6\u89c9\u7f16\u7801\u5668\u7684\u8bbe\u8ba1\u7a7a\u95f4\u63a2\u7d22\u300b\u4e00\u6587\u63a2\u8ba8\u4e86\u51c6\u786e\u89e3\u6790\u590d\u6742\u89c6\u89c9\u4fe1\u606f\u5bf9\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u91cd\u8981\u6027\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u589e\u5f3a\u7684\u89c6\u89c9\u611f\u77e5\u80fd\u663e\u8457\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\uff0c\u5e76\u5728\u5149\u5b66\u5b57\u7b26\u8bc6\u522b\u3001\u6587\u6863\u5206\u6790\u7b49\u5206\u8fa8\u7387\u654f\u611f\u4efb\u52a1\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u8bb8\u591a\u5148\u8fdbMLLMs\u901a\u8fc7\u96c6\u6210\u591a\u79cd\u89c6\u89c9\u7f16\u7801\u5668\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u7136\u800c\uff0c\u5f53\u524d\u7f3a\u4e4f\u5bf9\u5173\u952e\u65b9\u9762\u7cfb\u7edf\u7684\u6bd4\u8f83\u548c\u8be6\u7ec6\u7684\u62c6\u89e3\u7814\u7a76\uff0c\u6bd4\u5982\u4e13\u5bb6\u9009\u62e9\u548c\u591a\u89c6\u89c9\u4e13\u5bb6\u878d\u5408\u7b56\u7565\u3002\u672c\u6587\u5bf9\u4f7f\u7528\u6df7\u5408\u89c6\u89c9\u7f16\u7801\u5668\u7684MLLM\u8bbe\u8ba1\u7a7a\u95f4\u8fdb\u884c\u4e86\u5e7f\u6cdb\u63a2\u7d22\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u591a\u4e2a\u4e92\u8865\u89c6\u89c9\u7f16\u7801\u5668\u7684\u89c6\u89c9\u4ee4\u724c\u7b80\u5355\u62fc\u63a5\u5373\u53ef\u8fbe\u5230\u4e0e\u66f4\u590d\u6742\u7684\u6df7\u5408\u67b6\u6784\u6216\u7b56\u7565\u76f8\u5f53\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u5f15\u5165\u9884\u5bf9\u9f50\uff08Pre-Alignment\uff09\u673a\u5236\uff0c\u4ee5\u5f25\u5408\u4e13\u6ce8\u4e8e\u89c6\u89c9\u7684\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u4ee4\u724c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u4ece\u800c\u63d0\u5347\u6a21\u578b\u4e00\u81f4\u6027\u3002\u7531\u6b64\u4ea7\u751f\u7684MLLM\u5bb6\u65cf\u20
14\u2014Eagle\uff0c\u5728\u4e3b\u8981\u7684MLLM\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8d85\u8d8a\u4e86\u5176\u4ed6\u9886\u5148\u5f00\u6e90\u6a21\u578b\u3002\u76f8\u5173\u4ee3\u7801\u53ca\u6a21\u578b\u5df2\u5f00\u6e90\u53d1\u5e03\uff1ahttps://github.com/NVlabs/Eagle**|\n", "2408.15971": "|**2024-08-28**|**BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems**|Wei Wang et.al.|[2408.15971](http://arxiv.org/abs/2408.15971)|null|Large language models (LLMs) are becoming increasingly powerful and can handle complex tasks such as building single-agent and multi-agent systems. Compared with single agents, multi-agent systems place higher demands on the collaboration capabilities of language models. Existing benchmarks focus mainly on collaboration in multi-agent systems, but they lack fine-grained evaluation and overlook scenarios that combine collaboration and competition. 
To fill this gap, we propose a new benchmark, BattleAgentBench. It defines seven sub-stages across three difficulty levels and evaluates language models in detail along several dimensions: single-agent scenario navigation, paired-agent task execution, and multi-agent collaboration and competition. We extensively evaluate four major closed-source models and seven major open-source models. The results show that API-based models perform well on simple tasks, whereas small open-source models struggle even on simple tasks. On difficult tasks that require collaboration and competition, API-based models show some collaborative ability but still leave substantial room for improvement.|\n", "2408.15966": "|**2024-08-28**|**More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding**|Yuan Tang et.al.|[2408.15966](http://arxiv.org/abs/2408.15966)|**[link](https://github.com/tangyuan96/greenplm)**|In this paper, we revisit the challenge of enabling large language models (LLMs) to understand the 3D physical world. Because large-scale paired 3D point-cloud and text datasets are scarce, the success of LLMs 
has not yet been replicated in 3D understanding. We therefore propose a new task: 3D data-efficient point-language understanding. The goal is to let LLMs achieve robust 3D object understanding with minimal paired 3D point-cloud and text data. To tackle this task, we introduce GreenPLM, which uses more text data to compensate for the lack of 3D data. First, inspired by the use of CLIP to align images and text, we use a pre-trained point cloud-text encoder to map the 3D point-cloud space into the text space. This mapping lets us connect the text space to the LLM seamlessly. Once the point-text-LLM connection is established, we further strengthen text-LLM alignment by expanding the intermediate text space, reducing the reliance on 3D point-cloud data. 
Specifically, we generate 6 million free-text descriptions of 3D objects and design a three-stage training strategy that helps the LLM better explore the intrinsic connections between modalities. For efficient modality alignment, we design a zero-parameter cross-attention module for token pooling. Extensive experiments show that GreenPLM needs only 12% of the 3D training data used by existing state-of-the-art models to achieve better 3D understanding. Remarkably, GreenPLM also performs competitively when trained on text data alone. The code and weights are available at: https://github.com/TangYuan96/GreenPLM|\n", "2408.15950": "|**2024-08-28**|**Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games**|Nicholas R. 
Waytowich et.al.|[2408.15950](http://arxiv.org/abs/2408.15950)|null|Recent advances in large language models (LLMs) have extended their capabilities beyond traditional text tasks into multimodal domains that integrate visual, audio, and textual data. While multimodal LLMs have been studied extensively for high-level planning in areas such as robotics and games, their potential for low-level control tasks remains largely unexplored. This paper examines multimodal LLMs in the domain of Atari video games and introduces Atari game performance as a new benchmark for evaluating their ability to carry out low-level control. Unlike traditional reinforcement learning (RL) and imitation learning (IL) methods, these LLMs do not require extensive computational resources or a reward-function definition; instead, they use their existing multimodal knowledge to interact with the game environment directly. 
Our study evaluates several multimodal LLMs against traditional RL agents, human players, and random agents, focusing on their ability to understand complex visual scenes and formulate strategic responses. We also study the effect of in-context learning (ICL) by supplying human-demonstrated gameplay trajectories to improve the models' contextual understanding. Through this study, we aim to determine whether multimodal LLMs can draw on their broad training to act effectively as low-level controllers, thereby redefining potential applications in dynamic and visually complex environments. Additional results and videos are available on the project webpage: https://sites.google.com/view/atari-gpt/|\n", "2408.15915": "|**2024-08-28**|**Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models**|Yuncheng Yang 
et.al.|[2408.15915](http://arxiv.org/abs/2408.15915)|**[link](https://github.com/yaphabates/rocket)**|Cultivating the expertise that large language models (LLMs) need to solve tasks in a specific domain often requires special-purpose tuning toward stable, expected outputs. To avoid the high cost of manually preparing instruction datasets and training resources, it is reasonable to start from open knowledge, including low-rank adaptation (LoRA) models and instruction datasets. However, existing methods for model and data selection focus on general-purpose performance and neglect the knowledge gaps exposed when deploying in a specific domain. This study proposes to bridge such gaps by introducing a small number of human-annotated samples (K-shot) to advance the task expertise of LLMs using open knowledge. Specifically, we develop an efficient and scalable pipeline that produces task experts cost-effectively, in which the K-shot data guide the selection of the most promising expert candidates and the task-relevant instructions. We build a mixture-of-experts (MoE) system that makes full use of the distinct yet complementary knowledge of multiple experts. We identify two keys to the success of the MoE system: 1. 
Adherence to K-shot: models that can genuinely solve the K-shot problems are selected, rather than blind guessers. 2. Emphasis on diversity: diversity is reflected not only in the constituent experts but also in the fine-tuning instructions throughout model and data selection. Extensive experiments confirm the superiority of our approach in using open knowledge across a variety of tasks. Code and models will be released later.|\n", "2408.15907": "|**2024-08-28**|**Decentralized LLM Inference over Edge Networks with Energy Harvesting**|Aria Khoshsirat et.al.|[2408.15907](http://arxiv.org/abs/2408.15907)|null|The remarkable performance of large language models on natural language tasks has transformed many fields, but deploying them in resource-constrained environments such as edge networks remains challenging. Distributed inference techniques improve flexibility and cost-effectiveness by distributing model blocks across multiple devices, yet energy limitations remain a problem, especially for battery-powered edge devices. We propose a sustainable model for collaborative inference on interconnected, battery-powered edge devices with energy harvesting. A semi-Markov model describes the device states, accounting for processing parameters 
and average green-energy arrivals, and informs the design of scheduling algorithms that aim to minimize device downtime and maximize network throughput. Empirical evaluations and simulation runs validate the effectiveness of our approach, paving the way for energy-efficient distributed inference over edge networks.|\n", "2408.15903": "|**2024-08-28**|**LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments**|Ruirui Chen et.al.|[2408.15903](http://arxiv.org/abs/2408.15903)|null|Rapidly outdated information makes it challenging for large language models (LLMs) to incorporate new knowledge. Existing methods still struggle with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly when facing a large number of fact updates. To address these problems, this paper proposes Graph Memory-based Editing for Large Language 
Models (GMeLLo), a simple yet effective method that combines the explicit knowledge representation of knowledge graphs (KGs) with the linguistic flexibility of LLMs. Beyond using LLMs for question answering, GMeLLo uses them to convert natural language into structured queries and fact triples, enabling seamless interaction with KGs for rapid updates and precise multi-hop reasoning. Experiments show that GMeLLo significantly outperforms current state-of-the-art knowledge editing methods on the multi-hop question answering benchmark MQuAKE, especially in scenarios involving many knowledge updates.|\n", "2408.15901": "|**2024-08-28**|**Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts**|Nikolas Gritsch et.al.|[2408.15901](http://arxiv.org/abs/2408.15901)|null|Current large language models struggle to combine efficiency, specialization, and adaptability to new data distributions. The mixture-of-experts (MoE) architecture has become a focus of research because its inherent conditional computation promises to improve these properties. This work focuses on 'upcycling' dense expert models into an MoE, aiming to improve specialization while also adding the ability to adapt flexibly to new tasks. 
We introduce Nexus, an enhanced MoE architecture with adaptive routing in which the model learns to project expert embeddings from domain representations. This approach lets Nexus flexibly add new experts from separately trained dense models, without large-scale MoE training for unseen data domains. Experiments show that Nexus achieves a relative gain of up to 2.1% over the baseline in the initial upcycling stage, and a relative gain of 18.8% when extending the MoE with limited fine-tuning data. This flexibility is crucial for an open-source ecosystem in which every user can continually assemble their own MoE mix according to their needs.|\n", "2408.15895": "|**2024-08-28**|**Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models**|Sebastian Vallejo Vera 
et.al.|[2408.15895](http://arxiv.org/abs/2408.15895)|null|Human coders are biased. By replicating the experiment of Ennser-Jedenastik and Meyer (2018), we find that large language models (LLMs) use political information, and party cues in particular, when evaluating political statements. LLMs not only use the party cue to contextualize whether a statement is positive, negative, or neutral; they also reflect the biases of the human-generated data on which they were trained. We also find that, unlike humans, who are biased only when faced with statements from extreme parties, LLMs show significant bias even when prompted with statements from center-left and center-right parties. The implications of these findings are discussed in the final section.|\n", "2408.15879": "|**2024-08-28**|**Persuasion Games using Large Language Models**|Ganesh Prasath Ramani 
et.al.|[2408.15879](http://arxiv.org/abs/2408.15879)|null|Large language models (LLMs) have become powerful tools capable of understanding and generating human-like text. This paper studies their potential to shape human views and thereby influence decisions on particular tasks. This capability finds applications in domains such as investment, credit cards, and insurance, helping users choose suitable insurance policies, investment plans, credit cards, and retail products, and also in behavior change support systems (BCSS). We present a sophisticated multi-agent framework in which a group of agents operates collaboratively. The primary agent engages users directly in persuasive dialogue, while auxiliary agents perform tasks such as information retrieval, response analysis, persuasion-strategy development, and fact validation. Our experiments indicate that this collaborative approach significantly improves the persuasive efficacy of the LLM. We continuously analyze the user's resistance and counter it with a combination of rule-based and LLM-based resistance-persuasion mapping techniques. 
We use simulated personas and generate conversations in the insurance, banking, and retail domains to evaluate how well LLMs recognize, adapt to, and influence different personality types. We also examine the resistance mechanisms employed by LLM-simulated personas. Persuasion is quantified through measurable surveys before and after the interaction, LLM-generated scores on the conversations, and user decisions (purchase or no purchase).|\n", "2408.16756": "|**2024-08-29**|**How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models**|Jiyue Jiang et.al.|[2408.16756](http://arxiv.org/abs/2408.16756)|**[link](https://github.com/jiangjyjy/yue-benchmark)**|The rapid development of large language models (LLMs) has transformed the competitive landscape of natural language processing (NLP), particularly for English and other data-rich languages. For underrepresented languages such as Cantonese, however, a significant development gap remains. This is especially concerning given the economic importance of the Guangdong-Hong Kong-Macau Greater Bay Area and the large Cantonese-speaking populations in Singapore and North America. Although Cantonese is widely used, it is scarcely represented in NLP research, especially compared with 
languages from other similarly developed regions. To fill these gaps, we survey current Cantonese NLP methods and introduce new benchmarks designed to evaluate LLM performance on factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, with the aim of advancing open-source Cantonese LLM technology. We also propose future research directions and recommended models to strengthen Cantonese LLM development.|\n", "2408.16753": "|**2024-08-29**|**Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models**|Alec Solway et.al.|[2408.16753](http://arxiv.org/abs/2408.16753)|null|Reinforcement learning is used to align language models with human preference signals after the model has first been pre-trained by likelihood maximization to predict the next token in a large text corpus. Before deployment in a specific domain, models are usually fine-tuned further on task-specific data. Because human preference signals are often unavailable at this last stage, fine-tuning is typically done with likelihood maximization, the default method. However, reinforcement learning has advantages beyond aligning a model to a human-defined reward function. 
Whereas likelihood maximization is a form of imitation learning that teaches the model what to do under ideal conditions, reinforcement learning is not limited to demonstrating actions for optimally reached states; it trains the model on what to do in a range of situations as it explores the policy space. It also trains the model what not to do, suppressing competitive but poor actions. This paper develops a framework for last-mile fine-tuning with reinforcement learning and tests whether it yields performance gains. The experiments center on abstractive summarization, but the framework is broadly applicable. The procedure produced significantly better results than raw maximum-likelihood outputs. For the specific data tested, the gap could be closed by post-processing the maximum-likelihood outputs. Nevertheless, the framework offers a new avenue for model optimization that is particularly useful where post-processing is less straightforward or effective, and it can be extended to penalize and train against further classes of undesirable output, such as hallucinations.|\n", "2408.16749": "|**2024-08-29**|**Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge**|Beidi Dong 
et.al.|[2408.16749](http://arxiv.org/abs/2408.16749)|null|This paper examines the importance of automated tools for detecting and limiting the spread of extremist ideology online. The study compares the ability of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) models to detect and classify social media posts containing 'right-wing' and 'left-wing' ideological keywords. We collected posts containing these keywords and manually labeled them as extremist or non-extremist. Extremist posts were further assigned to one of five constituent elements of extremism based on a working definitional framework. BERT performance was evaluated with respect to training-data size and knowledge transfer between categories. We also compared GPT 3.5 and GPT 4 under four prompts: a plain prompt, a general definition, role-playing, and a professional definition. The best GPT models outperformed the best BERT models, and more detailed prompts generally gave better results, although overly complex prompts may impair performance. GPT versions differ in their sensitivity to what counts as extremist: GPT 
3.5 was better at identifying left-wing extremist posts, while GPT 4 was better at identifying right-wing extremist posts. Large language models, represented here by GPT models, show significant potential for online extremism classification, surpassing traditional BERT models in a zero-shot setting. Future research should explore the role of human-computer interaction in optimizing GPT models for extremist detection and classification, to develop more efficient (for example, faster and less effortful) and more effective methods of identifying extremist content.|\n", "2408.16740": "|**2024-08-29**|**Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models**|Ji\u0159\u00ed Mili\u010dka 
et.al.|[2408.16740](http://arxiv.org/abs/2408.16740)|null|This paper explores the conceptual, methodological, and technical challenges of studying large language models (LLMs) and the texts they produce from the perspective of quantitative linguistics. It builds on a theoretical framework that distinguishes between the LLM as a substrate and the entities the model simulates. The paper advocates a strictly non-anthropomorphic approach to the models while cautiously applying methods used to study human linguistic behavior to the simulated entities. Whereas natural language processing researchers focus on the models themselves, their architecture, evaluation, and ways of improving performance, we as quantitative linguists aim to build a theory of the properties of LLM-generated texts, how they differ from human-produced texts, and the properties of the simulated entities. In addition, we should explore the potential of LLMs as instruments for studying human culture, of which language is an integral part.|\n", "2408.16700": "|**2024-08-29**|**GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models**|Moreno D'Inc\u00e0 
et.al.|[2408.16700](http://arxiv.org/abs/2408.16700)|**[link](https://github.com/moreno98/gradbias)**|**Recent progress in text-to-image (T2I) generative models has made high-quality image generation possible. As their performance and accessibility increase, these models are attracting growing attention and popularity, and ensuring their fairness and safety is key to preventing biases from spreading and persisting. Existing work on bias detection focuses mainly on closed sets of predefined biases (e.g., gender, race). Detecting and quantifying bias in an open-set setting, without a predefined list, remains a challenge. This paper presents a general framework to identify, quantify, and explain biases in an open-set setting. The pipeline uses a large language model (LLM) to propose biases from a set of captions. The target generative model then produces a set of images, and the bias is finally assessed through visual question answering (VQA). We present two methods based on this framework: OpenBias and GradBias. OpenBias 
detects and quantifies both known and novel biases related to people, objects, and animals, and agrees closely with existing closed-set bias detection methods and with human judgment. GradBias shows that neutral words can significantly influence bias, and it outperforms several baselines, including state-of-the-art foundation models. Code is available at: https://github.com/Moreno98/GradBias**|\n", "2408.16673": "|**2024-08-29**|**Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity**|Ziniu Li et.al.|[2408.16673](http://arxiv.org/abs/2408.16673)|null|This paper addresses overfitting and limited output diversity in supervised fine-tuning (SFT) of large language models for downstream tasks. Cross-entropy (CE) loss is the conventional choice for SFT, but it can update the model toward the data distribution too aggressively, causing overfitting and reducing output diversity. 
To address these problems, the paper introduces the maximum entropy principle, which favors smoother probability distributions that still capture the data effectively. Specifically, we propose a new method called GEM, which matches the target distribution by solving a reverse Kullback-Leibler divergence minimization problem with an entropy regularizer. When applied to SFT of Llama-3-8B, GEM outperforms CE in several respects. First, when trained on the UltraFeedback dataset to improve instruction following, GEM shows less overfitting, evidenced by lower perplexity and better performance on the IFEval benchmark. GEM also increases output diversity, yielding gains of up to 7 points on math reasoning and code generation tasks with best-of-n sampling, even without domain-specific data. Furthermore, when fine-tuning on domain-specific datasets for math reasoning and code generation, GEM again shows less overfitting and gains of up to 10 points over CE.|\n", "2408.16601": "|**2024-08-29**|**Examination of Code generated by Large Language Models**|Robin Beer 
et.al.|[2408.16601](http://arxiv.org/abs/2408.16601)|**[link](https://github.com/t-muras/ai-code-analysis)**|**Large language models (LLMs) such as ChatGPT and Copilot are transforming software development by automating code generation, which arguably enables rapid prototyping, supports education, and boosts productivity. The correctness and quality of LLM-generated code should therefore be on par with manually written code. To assess how well current LLMs generate correct, high-quality code, we conducted controlled experiments in which the LLMs generated simple algorithms in Java and Python together with the corresponding unit tests, and we evaluated the correctness and quality (coverage) of the generated code. We observed significant differences between LLMs, between programming languages, between algorithm and test code, and over time. This paper reports these results along with the experimental method, so that the assessment can be repeated and compared for more algorithms, languages, and LLMs over time.**|\n", "2408.16586": "|**2024-08-29**|**Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies**|Zhiyang Qi 
et.al.|[2408.16586](http://arxiv.org/abs/2408.16586)|null|Recent advances in natural language processing, particularly large language models (LLMs) such as GPT-4, have significantly improved dialogue systems, enabling them to generate more natural and fluent conversations. These systems still face challenges, however, such as managing continuous dialogue, retaining memory, and reducing hallucinations. AIWolfDial2024 addresses these challenges by using the Werewolf Game, an incomplete-information game, to test LLM capabilities in complex interactive environments. This work introduces an LLM-based Werewolf Game AI in which each role is supported by situation analysis to aid response generation. For the werewolf role, it additionally employs several persuasion strategies, including logical appeal, credibility appeal, and emotional appeal, to effectively persuade other players to align with its actions.|\n", "2408.16518": "|**2024-08-29**|**CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues**|Rena Gao 
et.al.|[2408.16518](http://arxiv.org/abs/2408.16518)|**[link](https://github.com/renagao/csl2024)**|We develop CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a Chinese-as-a-second-language dataset containing 10,000 dialogues. We annotate CNIMA using an evaluation framework, originally introduced for English-as-a-second-language dialogues, that assesses micro-level features (e.g. backchannels) and macro-level interactivity labels (e.g. topic management), and we test the framework's transferability from English to Chinese. We find the framework robust across languages, and it reveals both universal and language-specific relationships between micro-level and macro-level features. Next, we propose an approach to automate the evaluation and find strong performance, creating a new tool for automated second-language assessment. Our system can be adapted to other languages easily because it uses large language models and therefore does not require large-scale annotated training data.|\n", "2408.16502": "|**2024-08-29**|**LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?**|Jan Cegin 
et.al.|[2408.16502](http://arxiv.org/abs/2408.16502)|null|Generative large language models (LLMs) are increasingly used for data augmentation, where text samples are paraphrased by an LLM and then used to fine-tune a classifier. However, research evidence confirming a clear advantage of LLM-based augmentation over established augmentation methods is largely missing. To study when LLM-based augmentation is advantageous, we ran comparative experiments on 6 datasets, 3 classifiers, and 2 fine-tuning methods. We also varied the number of seeds and of collected samples to explore the downstream model accuracy space more thoroughly. Finally, we performed a cost-benefit analysis, which shows that LLM-based augmentation is worth deploying when only a very small number of seeds is used. In many cases, established methods achieve similar or even better model accuracy.|\n", "2408.17437": "|**2024-08-30**|**SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists**|Raoyuan Zhao 
et.al.|[2408.17437](http://arxiv.org/abs/2408.17437)|**[link](https://github.com/loreley99/syntheval_checklist)**|**Traditional benchmarking in natural language processing (NLP) typically relies on static held-out test sets. However, this approach often overestimates performance and cannot provide comprehensive, interpretable, and dynamic assessments of NLP models. Recent work such as DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) addresses these limitations through behavioral testing of NLP models, with test types generated by a multi-step human-annotated pipeline. Unfortunately, manually creating a variety of test types requires a great deal of human labor and is costly. This work proposes SYNTHEVAL, a hybrid behavioral testing framework that uses large language models (LLMs) to generate a wide range of test types for comprehensive evaluation of NLP models. SYNTHEVAL first generates sentences with LLMs via controlled generation, then identifies challenging examples by comparing the predictions of the LLMs with those of task-specific NLP models. In the final stage, human experts investigate the challenging examples, manually design templates, and identify the types of failures that the task-specific models consistently exhibit. We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that our framework is effective at identifying weaknesses of strong models on these tasks. We share our code at https://github.com/Loreley99/SynthEval_CheckList.**|\n", "2408.17431": "|**2024-08-30**|**Advancing Multi-talker ASR Performance with Large Language Models**|Mohan Shi et.al.|[2408.17431](http://arxiv.org/abs/2408.17431)|null|Recognizing overlapping speech in conversational scenarios is a highly challenging problem in automatic speech recognition (ASR). The classic approach to multi-talker ASR is serialized output training (SOT), which concatenates the transcriptions of multiple speakers in order of the emission times of their speech. However, such transcriptions, built by concatenating related utterances from a conversation, depend on the ability to model long contexts. By comparison, a new approach based on large language models (LLMs) may be better suited to these complex and challenging scenarios because it exploits the capabilities of pre-trained decoders. This paper proposes an LLM-based SOT approach for multi-talker ASR that uses a pre-trained speech encoder and an LLM, fine-tuned on multi-talker datasets with appropriate strategies. Experimental results show that our approach outperforms traditional methods on the simulated dataset LibriMix and achieves state-of-the-art performance on the evaluation set of the real-world dataset AMI, significantly surpassing previous attention-based encoder-decoder (AED) models trained with 1000 times more supervised data.|\n", "2408.17404": "|**2024-08-30**|**Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach**|Jialiang Wei et.al.|[2408.17404](http://arxiv.org/abs/2408.17404)|**[link](https://github.com/jl-wei/feature-inspiration)**|Over the past decade, app store (AppStore)-inspired requirements elicitation has proven highly beneficial. Developers often study competitors' apps to gather inspiration for new features. With advances in generative AI, recent studies have shown the potential of large language model (LLM)-inspired requirements elicitation, in which LLMs provide inspiration for new feature ideas. Although both approaches are gaining popularity in practice, their differences are not well understood. We conducted a comparative study of AppStore-based and LLM-based approaches for refining features into sub-features. By manually analyzing 1,200 sub-features recommended by the two approaches, we identified their benefits, challenges, and key differences. While both approaches recommend highly relevant sub-features with clear descriptions, LLMs appear more powerful, particularly for novel, unseen app scopes. Moreover, some recommended features are imaginary and of unclear feasibility, which underlines the importance of a human analyst in the elicitation process.|\n", "2408.17377": "|**2024-08-30**|**NDP: Next Distribution Prediction as a More Broad Target**|Junhao Ruan et.al.|[2408.17377](http://arxiv.org/abs/2408.17377)|null|Large language models (LLMs) trained with the next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm has several limitations, particularly regarding the complexity of planning tasks and error propagation during inference. Our work extends the critique of NTP, pointing out that its limitations also stem from a narrow training objective: predicting a sub-optimal one-hot distribution. To support this critique, we conducted a preliminary experiment that treats the output distribution of a powerful LLM as an efficient compression of world data. By evaluating how similar n-gram distributions are to the LLM output distribution, we found that n-gram distributions align with it more closely than one-hot targets do. Based on this insight, we introduce Next Distribution Prediction (NDP), which replaces the one-hot targets with n-gram distributions, enhancing learning without extra online training time. We ran experiments in four areas: translation, general tasks, language transfer, and medical domain adaptation. Compared with NTP, NDP achieves up to +2.97 COMET improvement on translation, a +0.61 average improvement on general tasks, and a +10.75 average improvement in the medical domain. This demonstrates the concrete benefit of addressing the narrow-objective problem and points to a new direction for improving NTP.|\n", "2408.17362": "|**2024-08-30**|**Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain**|Francesca Grasso 
et.al.|[2408.17362](http://arxiv.org/abs/2408.17362)|**[link](https://github.com/stefanolocci/LLMClassification)**|**This paper examines the performance of two large language models (LLMs), GPT3.5 and Llama2, and one small language model (SLM), Gemma, on three different classification tasks in the climate change (CC) and environmental domain. Using BERT-based models as a baseline, we compare them against these transformer-based models. We also assess the models' self-evaluation capabilities by analyzing the calibration of verbalized confidence scores in these text classification tasks. Our findings show that although BERT-based models generally perform best among all models, the performance of the large generative models is still noteworthy. Furthermore, our calibration analysis shows that Gemma is well calibrated on the initial tasks but produces inconsistent results thereafter; Llama is reasonably calibrated; and GPT consistently exhibits strong calibration. With this research, we aim to contribute to the discussion on the utility and effectiveness of large generative LMs in addressing some of the planet's most urgent problems, highlighting their strengths and limitations in the context of ecology and CC.**
|\n", "2408.17354": "|**2024-08-30**|**Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage**|Md Rafi Ur Rashid et.al.|[2408.17354](http://arxiv.org/abs/2408.17354)|null|Fine-tuning large language models on private data for downstream applications poses significant privacy risks, since sensitive information may be exposed. Community platforms now offer convenient distribution of large pre-trained models, and anyone can publish one without rigorous verification. In this setting the privacy threat grows considerably, because a pre-trained model can be deliberately tampered with so that it leaks private data during fine-tuning. This study introduces a novel poisoning technique that uses model unlearning as an attack tool. The approach manipulates a pre-trained language model to increase the leakage of private data during fine-tuning. Our method strengthens both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups show that our attacks significantly surpass baseline performance. This work serves as a warning to users who download pre-trained models from sources that have not been rigorously verified, highlighting the potential risks.|\n", "2408.17316": "|**2024-08-30**|**Bridging Domain Knowledge and Process Discovery Using Large Language Models**|Ali Norouzifar et.al.|[2408.17316](http://arxiv.org/abs/2408.17316)|**[link](https://github.com/alinorouzifar/imr-llm)**|**Discovering good process models is essential for process analysis tasks such as conformance checking and process improvement. Automated process discovery methods often overlook valuable domain knowledge. This knowledge, including insights from domain experts and detailed process documentation, usually remains underused during process discovery. This paper addresses the problem by using large language models (LLMs) to integrate such knowledge directly into process discovery. We use rules derived from LLMs to guide model construction, ensuring alignment with both domain knowledge and actual process executions. By integrating LLMs, we build a bridge between process knowledge expressed in natural language and the discovery of robust process models, significantly advancing process discovery methodology. To demonstrate the usability of our framework, we conducted a case study with the UWV employee insurance agency, which showed its practical benefits and effectiveness.**|\n", "2408.17280": "|**2024-08-30**|**Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts**|Rhui Dih Lee et.al.|[2408.17280](http://arxiv.org/abs/2408.17280)|null|We present a toolkit for creating low-cost Mixture-of-Domain-Experts (MOE) models from trained models. The toolkit can be used to create a mixture from either models or adapters. We perform extensive tests and offer guidance on defining the architecture of the resulting MOE using the toolkit. A public repository is available.|\n", "2408.17258": "|**2024-08-30**|**Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach**|Tong Nie et.al.|[2408.17258](http://arxiv.org/abs/2408.17258)|null|The rapid growth of e-commerce and urbanization has greatly intensified delivery operations in urban areas, increasing both the volume and the complexity of demand. To meet these challenges, data-driven predictive methods, especially those based on machine learning, have begun to play a key role in urban delivery demand management. One problem that has not yet been sufficiently studied, however, is the joint estimation and prediction of city-wide delivery demand. We formulate this problem as a graph-based spatiotemporal learning task. First, we define a message-passing neural network model to capture the interaction between the demand patterns of related regions. Second, drawing on recent advances in large language models, we extract general geospatial knowledge encodings from unstructured location data and integrate them into the demand predictor. Finally, to promote transferability of the model across cities, we design an end-to-end inductive training scheme. Extensive experiments on two real-world delivery datasets, covering eight cities in China and the US, show that our model significantly outperforms existing baselines on these challenging tasks.|\n", "2408.17253": "|**2024-08-30**|**VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters**|Mouxiang Chen 
et.al.|[2408.17253](http://arxiv.org/abs/2408.17253)|**[link](https://github.com/keytoyze/visionts)**|**This paper explores a new path to building a time series forecasting (TSF) foundation model from rich, high-quality natural images. Existing approaches develop TSF foundation models either by fine-tuning large language models (LLMs) or by building large-scale time series datasets, but they face severe challenges from the cross-domain gap or in-domain heterogeneity. Based on the intrinsic similarity between images and time series, we explore a new formulation of the TSF task, recasting it as an image reconstruction task that is handled by a visual masked autoencoder (MAE) pre-trained with self-supervision on the ImageNet dataset. Surprisingly, without any further adaptation in the time series domain, the proposed VisionTS achieves better zero-shot forecasting performance than existing TSF foundation models. With minimal fine-tuning, VisionTS further improves its forecasts and reaches state-of-the-art performance in most cases. These findings suggest that visual models could be a free lunch for TSF and highlight the potential for future cross-domain research between computer vision and TSF. Our code is publicly available at https://github.com/Keytoyze/VisionTS.**|\n", "2409.02920": "|**2024-09-04**|**RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)**|Yao Mu et.al.|[2409.02920](http://arxiv.org/abs/2409.02920)|null|This paper introduces RoboTwin, a novel benchmark dataset that combines real-world teleoperated data with synthetic data generated from digital twins. RoboTwin is designed for dual-arm robot scenarios, with a particular focus on tool use and human-robot interaction. Using the COBOT Magic platform, we collected diverse data covering tool use and human-robot interaction. The paper presents an innovative approach to creating digital twins that uses AI-generated content to turn 2D images into detailed 3D models. We also use large language models to generate expert-level training data and functionality-oriented, task-specific pose sequences. Our main contributions are: 1. the RoboTwin benchmark dataset, 2. an efficient real-to-simulation pipeline, and 3. the use of language models for automatic expert-level data generation. These advances aim to address the scarcity of robot training data and may accelerate the development of more capable and versatile robotic systems for a wide range of real-world applications. Project page: https://robotwin-benchmark.github.io/early-version/|\n", "2409.02897": "|**2024-09-05**|**LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA**|Jiajie Zhang 
et.al.|[2409.02897](http://arxiv.org/abs/2409.02897)|**[link](https://github.com/THUDM/LongCite)**|Although current long-context large language models (LLMs) show impressive performance in answering user questions based on extensive text, the lack of citations in their responses makes it hard for users to verify the answers, raising concerns about their trustworthiness because they may produce incorrect information. Our work aims to enable long-context LLMs to generate responses with fine-grained sentence-level citations, improving their faithfulness and verifiability. We first introduce LongBench-Cite, an automated benchmark for assessing the performance of current LLMs in long-context question answering with citations, which reveals considerable room for improvement. To this end, we propose CoF (Coarse to Fine), a novel pipeline that uses off-the-shelf LLMs to automatically generate long-context QA instances with precise sentence-level citations, and we use this pipeline to build LongCite-45k, a large-scale supervised fine-tuning (SFT) dataset for the task. Finally, we train LongCite-8B and LongCite-9B on the LongCite-45k dataset, successfully enabling them to generate accurate responses and fine-grained sentence-level citations in a single output. Evaluation results on LongBench-Cite show that our trained models achieve state-of-the-art citation quality, surpassing advanced proprietary models including GPT-4.|\n", "2409.02889": "|**2024-09-04**|**LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture**|Xidong Wang et.al.|[2409.02889](http://arxiv.org/abs/2409.02889)|**[link](https://github.com/freedomintelligence/longllava)**|**\u6269\u5c55\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u5bf9\u4e8e\u89c6\u9891\u7406\u89e3\u3001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u7406\u89e3\u548c\u591a\u6a21\u6001\u4ee3\u7406\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u6d89\u53ca\u5230\u4e00\u7cfb\u5217\u7cfb\u7edf\u4f18\u5316\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u6570\u636e\u6784\u9020\u548c\u8bad\u7ec3\u7b56\u7565\uff0c\u5c24\u5176\u662f\u89e3\u51b3\u968f\u7740\u66f4\u591a\u56fe\u50cf\u5f15\u5165\u800c\u51fa\u73b0\u7684\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u9ad8\u6602\u8ba1\u7b97\u6210\u672c\u7b49\u95ee\u9898\u3002\u672c\u6587\u901a\u8fc7\u5c06\u6a21\u578b\u67b6\u6784\u8c03\u6574\u4e3aMamba\u548cTransformer\u5757\u7684\u6df7\u5408\u4f53\u3001\u91c7\u7528\u65e2\u80fd\u8003\u8651\u591a\u4e2a\u56fe\u50cf\u95f4\u65f6\u95f4\u4f9d\u8d56\u6027\u53c8\u80fd\u8003\u8651\u7a7a\u95f4\u4f9d\u8d56\u6027\u7684\u6570\u636e\u6784\u9020\u65b9\u6cd5\uff0c\u5e76\u5b9e\u65bd\u6e10\u8fdb\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5bf9\u8fd9\u4e9b\u6311\u6218\u8fdb\u884c\u4e86\u5e94\u5bf9\u3002\u53d1\u5e03\u7684\u6a21\u578b\u201cLongLLaVA\u201d\uff08\u957f\u671f\u8bed\u8a00\u4e0e\u89c6\u89c9\u52a9\u624b\uff09\u662f\u9996\u4e2a\u6d
f7\u5408\u578bMLLM\uff0c\u5b9e\u73b0\u4e86\u6548\u7387\u4e0e\u6548\u679c\u4e4b\u95f4\u7684\u826f\u597d\u5e73\u8861\u3002LongLLaVA\u4e0d\u4ec5\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u800c\u4e14\u4fdd\u6301\u4e86\u9ad8\u541e\u5410\u91cf\u548c\u4f4e\u5185\u5b58\u6d88\u8017\u7684\u7279\u70b9\u3002\u7279\u522b\u5730\uff0c\u5b83\u80fd\u591f\u5728\u5355\u4e2aA100 80GB GPU\u4e0a\u5904\u7406\u8fd1\u4e00\u5343\u5f20\u56fe\u7247\uff0c\u5c55\u793a\u4e86\u5e7f\u6cdb\u4efb\u52a1\u5e94\u7528\u524d\u666f\u7684\u6f5c\u529b\u3002**|\n", "2409.02841": "|**2024-09-04**|**Historical German Text Normalization Using Type- and Token-Based Language Modeling**|Anton Ehrmanntraut et.al.|[2409.02841](http://arxiv.org/abs/2409.02841)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf91700\u5e74\u81f31900\u5e74\u5fb7\u56fd\u6587\u5b66\u6587\u672c\u7684\u6b63\u8bcd\u6cd5\u89c4\u8303\u5316\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u5e73\u884c\u8bed\u6599\u5e93\u8bad\u7ec3\u3002\u6240\u63d0\u51fa\u7684\u7cfb\u7edf\u5229\u7528\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u548cTransformer\u8bed\u8a00\u6a21\u578b\uff0c\u7ed3\u5408\u7f16\u7801\u5668-\u89e3\u7801\u5668\u6a21\u578b\u5bf9\u5355\u4e2a\u8bcd\u6c47\u7c7b\u578b\u8fdb\u884c\u89c4\u8303\u5316\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u56e0\u679c\u8bed\u8a00\u6a21\u578b\u5728\u4e0a\u4e0b\u6587\u4e2d\u8c03\u6574\u8fd9\u4e9b\u89c4\u8303\u5316\u7ed3\u679c\u3002\u5e7f\u6cdb\u8bc4\u4f30\u8868\u660e\uff0c\u8be5\u63d0\u51fa\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u6700\u5148\u8fdb\u7684\u51c6\u786e\u6027\uff0c\u4e0e\u5b8c\u5168\u7aef\u5230\u7aef\u7684\u53e5\u5b50\u7ea7\u89c4\u8303\u5316\u7cfb\u7edf\u76f8\u5f53\uff0c\u8be5\u7cfb\u7edf\u662f\u901a\u8fc7\u5bf9\u9884\u8bad\u7ec3\u7684Transformer\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u800c\u5b9e\u73b0\u7684\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6a21\u578b\u96be\u4ee5\u6cdb\u5316\u4ee5\u53ca\u7f3a\u4e4f\u5927\u91
cf\u9ad8\u8d28\u91cf\u5e73\u884c\u6570\u636e\uff0c\u5386\u53f2\u6587\u672c\u7684\u89c4\u8303\u5316\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002|\n", "2409.02836": "|**2024-09-04**|**Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models**|Moein Shahiki Tash et.al.|[2409.02836](http://arxiv.org/abs/2409.02836)|null|\u672c\u6587\u901a\u8fc7\u8fd0\u7528\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\uff0c\u5bf9\u52a0\u5bc6\u8d27\u5e01\u76f8\u5173\u8ba8\u8bba\u4e2d\u7684\u9884\u6d4b\u9648\u8ff0\u3001\u5e0c\u671b\u6f14\u8bb2\u53ca\u6094\u6068\u68c0\u6d4b\u884c\u4e3a\u8fdb\u884c\u5206\u6790\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5206\u7c7b\u65b9\u6cd5\u2014\u2014\u201c\u9884\u6d4b\u9648\u8ff0\u201d\uff0c\u5c06\u5176\u7ec6\u5206\u4e3a\u9884\u6d4b\u589e\u52a0\u3001\u9884\u6d4b\u51cf\u5c11\u3001\u9884\u6d4b\u4e2d\u7acb\u6216\u975e\u9884\u6d4b\u7c7b\u522b\u3002\u5229\u7528GPT-4o\u8fd9\u4e00\u524d\u6cbf\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u5728\u4e94\u5927\u4e3b\u6d41\u52a0\u5bc6\u8d27\u5e01\uff08Cardano\u3001Binance\u3001Matic\u3001Fantom\u3001Ripple\uff09\u7684\u8ba8\u8bba\u4e2d\u63a2\u7d22\u4e86\u60c5\u7eea\u52a8\u6001\u3002\u7814\u7a76\u53d1\u73b0\uff0cMatic\u5728\u4e50\u89c2\u9884\u6d4b\u65b9\u9762\u663e\u793a\u51fa\u7279\u522b\u9ad8\u7684\u503e\u5411\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5e0c\u671b\u4e0e\u6094\u6068\u60c5\u7eea\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\uff0c\u63ed\u793a\u4e86\u8fd9\u4e9b\u60c5\u611f\u4e0e\u9884\u6d4b\u884c\u4e3a\u4e4b\u95f4\u590d\u6742\u7684\u4e92\u52a8\u6a21\u5f0f\u3002\u5c3d\u7ba1\u9762\u4e34\u6570\u636e\u91cf\u548c\u8d44\u6e90\u53ef\u7528\u6027\u65b9\u9762\u7684\u9650\u5236\uff0c\u6211\u4eec\u7684\u7814\u7a76\u4ecd\u63ed\u793a\u4e86\u52a0\u5bc6\u8d27\u5e01\u5e02\u573a\u6295\u8d44\u8005\u884c\u4e3a\u548c\u60c5\u7eea\u8d8b\u52bf\u7684\u91cd\u8981\u53d1\u73b0\uff0c\u4e3a\u6218\u
7565\u51b3\u7b56\u548c\u672a\u6765\u7814\u7a76\u63d0\u4f9b\u4e86\u4fe1\u606f\u3002|\n", "2409.02834": "|**2024-09-04**|**CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models**|Wentao Liu et.al.|[2409.02834](http://arxiv.org/abs/2409.02834)|**[link](https://github.com/ecnu-icalk/educhat-math)**|\u672c\u6587\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aCMM-Math\u7684\u4e2d\u6587\u591a\u6a21\u6001\u6570\u5b66\u6570\u636e\u96c6\uff0c\u5305\u542b\u57fa\u51c6\u548c\u8bad\u7ec3\u90e8\u5206\uff0c\u65e8\u5728\u8bc4\u4f30\u548c\u589e\u5f3a\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\u3002CMM-Math\u5305\u542b\u4e86\u8d85\u8fc728,000\u4e2a\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u4ece\u5c0f\u5b66\u5230\u9ad8\u4e2d\u7684\u4e2d\u56fd12\u4e2a\u5e74\u7ea7\u7684\u591a\u79cd\u95ee\u9898\u7c7b\u578b\uff08\u4f8b\u5982\u9009\u62e9\u9898\u3001\u586b\u7a7a\u9898\u7b49\uff09\uff0c\u5e76\u63d0\u4f9b\u4e86\u8be6\u7ec6\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7279\u522b\u5730\uff0c\u95ee\u9898\u6216\u89c2\u70b9\u4e2d\u53ef\u80fd\u5305\u542b\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u4f7f\u5f97\u8fd9\u4e2a\u6570\u636e\u96c6\u66f4\u5177\u6311\u6218\u6027\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u6700\u5148\u8fdb\u7684LMM\u5728CMM-Math\u6570\u636e\u96c6\u4e0a\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728LMM\u5f00\u53d1\u65b9\u9762\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u5fc5\u8981\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMultimodal Mathematical 
LMM (Math-LMM) to handle problems with mixed input of multiple images and text segments. We train the model in three stages: foundational pre-training, foundational fine-tuning, and mathematical fine-tuning. Extensive experiments show that our model effectively improves mathematical reasoning performance when compared with SOTA LMMs on three multimodal mathematical datasets.|\n",
"2409.02828": "|**2024-09-04**|**ExpLLM: Towards Chain of Thought for Facial Expression Recognition**|Xing Lan et.al.|[2409.02828](http://arxiv.org/abs/2409.02828)|null|Facial expression recognition (FER) is a critical task in multimedia with significant implications for many applications. However, understanding the causes of facial expressions is essential for recognizing them accurately. Current approaches, such as those based on facial action units (AUs), typically provide AU names and intensities but lack insight into the interactions among AUs and their relationship to the overall expression. This paper proposes ExpLLM, a new method that uses large language models to generate an accurate chain of thought (CoT) for facial expression recognition. We design the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion. The key observations describe each AU's name, intensity, and associated emotions. The overall emotional interpretation analyzes multiple AUs and their interactions to identify the dominant emotions and how they relate. Finally, the conclusion derives the final expression label from the preceding analysis. We also introduce the Exp-CoT Engine, which constructs this expression CoT and generates instruction-description data for training ExpLLM. Extensive experiments on the RAF-DB and AffectNet datasets show that ExpLLM outperforms current state-of-the-art FER methods. ExpLLM also surpasses the latest GPT-4o in micro-expression recognition, especially in cases where GPT-4o frequently fails.|\n",
"2409.02823": "|**2024-09-04**|**Design Contradictions: Help or Hindrance?**|Aron E. 
Owen et.al.|[2409.02823](http://arxiv.org/abs/2409.02823)|null|In data visualization, the pressing need for innovative thinking pushes us to explore new creative approaches. Combining two or more creative words with opposing meanings can spark novel ideas and designs and benefit the creative process. As AI-driven design develops, a key question emerges: can these design contradictions work together with AI tools? Currently, the answer is no. AI systems, especially large language models (LLMs), rely on algorithms that produce similarity, whereas creativity often requires divergence and novelty. This poster opens a conversation about how to steer AI systems to be more creative and generate new ideas. The research invites us to reconsider traditional design methods and explore new approaches in an AI-driven world. Can we apply traditional design methods such as the double diamond model, or do we need new design engineering methods? How can we use generative AI to design visualizations quickly and come up with new ideas? This paper aims to start this important conversation and offer practical insights into the potential of AI to drive creativity in data visualization.|\n",
"2409.02822": "|**2024-09-04**|**Language Understanding as a Constraint on Consensus Size in LLM Societies**|Giordano De Marzo et.al.|[2409.02822](http://arxiv.org/abs/2409.02822)|null|As applications of large language models (LLMs) move toward collaborative tasks, multiple agents interact much like an LLM society. In this setting, large groups of LLMs can self-organize to reach consensus on arbitrary norms, that is, norms for which no information supports one option over another. To understand whether LLMs, like human societies, can reach consensus without institutions, we apply methods from complexity science and principles from behavioral science, pioneering a new approach of AI anthropology. We find that LLMs can reach consensus in groups, and that their opinion dynamics can be understood with a function parameterized by a majority force coefficient, which determines whether consensus is possible. This majority force is stronger for models with higher language understanding capability and weaker for larger groups, which leads to a critical group size beyond which consensus becomes infeasible for a given LLM. This critical group size grows exponentially with the model's language understanding capability and, for the most advanced models, can reach an order of magnitude beyond the typical size of informal human groups.|\n",
"2409.02795": "|**2024-09-04**|**Towards a Unified View of Preference Learning for Large Language Models: A Survey**|Bofei Gao et.al.|[2409.02795](http://arxiv.org/abs/2409.02795)|**[link](https://github.com/kbsdjames/awesome-llm-preference-learning)**|Large language models (LLMs) exhibit remarkable capabilities. One key factor in their success is aligning LLM outputs with human preferences. This alignment process often requires only a small amount of data to improve LLM performance efficiently. While effective, research in this area spans multiple domains, and the methods involved are relatively complex and hard to understand. The relationships among different methods remain under-explored, which limits the development of preference alignment. In light of this, we break existing popular alignment strategies into four components and provide a unified framework for studying them, thereby establishing connections among them. In this survey, every preference learning strategy is decomposed into model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and opens up the possibility of combining the strengths of different strategies. We also present detailed working examples of prevalent existing algorithms to help readers gain a comprehensive understanding. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.|\n",
"2409.03752": "|**2024-09-05**|**Attention Heads of Large Language Models: A Survey**|Zifan Zheng et.al.|[2409.03752](http://arxiv.org/abs/2409.03752)|**[link](https://github.com/iaar-shanghai/awesome-attention-heads)**|**Since the advent of ChatGPT, large language models (LLMs) have excelled at a wide range of tasks but remain black-box systems. As a result, their development relies mainly on data-driven approaches, which limits how far performance can be improved by changing internal architecture and reasoning pathways. Many researchers have begun exploring the internal mechanisms of LLMs to identify the nature of their reasoning bottlenecks, with most studies focusing on attention heads. Our survey aims to shed light on the internal reasoning processes of LLMs by concentrating on their interpretability and the underlying mechanisms of attention heads. We first distill the human thought process into a four-stage framework: knowledge recalling, in-context identification, latent reasoning, and expression preparation. Using this framework, we systematically review existing research to identify and categorize the functions of specific attention heads. We also summarize the experimental methodologies used to discover these special heads, dividing them into two categories: modeling-free methods and modeling-required methods. We further outline relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions. Our reference list is open-sourced.**|\n",
"2409.03735": "|**2024-09-05**|**LLM-CI: Assessing Contextual Integrity Norms in Language Models**|Yan Shvartzshnaider 
et.al.|[2409.03735](http://arxiv.org/abs/2409.03735)|null|Large language models (LLMs), while memorizing parts of their training data collected from the Internet, may also inadvertently encode societal preferences and norms. As these models are integrated into sociotechnical systems, it is crucial that the norms they encode align with societal expectations. These norms can vary across models, hyperparameters, optimization techniques, and datasets. Existing assessment methods are unreliable because of prompt sensitivity: small changes in a prompt yield different responses. A comprehensive framework is needed that covers various models, optimizations, and datasets and provides a reliable way to assess encoded norms. 
We present LLM-CI, the first open-source framework for assessing privacy norms encoded in LLMs. LLM-CI uses a vignette-based methodology grounded in Contextual Integrity factors to assess encoded norms across different contexts and LLMs. To address prompt sensitivity, we propose a multi-prompt assessment methodology that assesses norms only from prompts that yield consistent responses across multiple variants. Using LLM-CI and the proposed methodology, we comprehensively evaluate LLMs on IoT and COPPA vignette datasets from prior work, examining the impact of model properties (e.g., hyperparameters, capacity) and optimization strategies (e.g., alignment, quantization).|\n", "2409.03734": "|**2024-09-05**|**Safety vs. 
Performance: How Multi-Objective Learning Reduces Barriers to Market Entry**|Meena Jagadeesan et.al.|[2409.03734](http://arxiv.org/abs/2409.03734)|null|This work studies, from both an economic and an algorithmic perspective, market concentration in marketplaces for large language models and other large-scale machine learning (ML) models, and whether there are insurmountable barriers to entering such markets. We examine the lowering of entry barriers by formally defining a multi-objective high-dimensional regression framework that captures reputational damage, and we characterize the number of samples a new company needs to enter the market. Our results show that multi-objective considerations can fundamentally reduce barriers to entry: the required number of samples can be far smaller than the incumbent company's dataset size. In proving these results, we also develop scaling laws for high-dimensional linear regression in multi-objective environments, showing that the scaling rate becomes slower when the dataset size is large, a finding that may be of independent interest.|\n", "2409.03733": "|**2024-09-05**|**Planning In Natural Language Improves LLM Search For Code Generation**|Evan Wang 
et.al.|[2409.03733](http://arxiv.org/abs/2409.03733)|**[link](https://github.com/scaleapi/plansearch)**|While scaling training compute has led to large gains, scaling inference compute has not yet yielded comparable improvements. We hypothesize that a key missing ingredient is a lack of diversity in model outputs, which makes search inefficient because models repeatedly produce highly similar yet incorrect generations. We find empirically that increasing output diversity effectively alleviates this problem. Building on this finding, we propose PLANSEARCH, a novel search algorithm that shows strong results on HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). PLANSEARCH generates a diverse set of observations about the problem and uses them to construct plans for solving it, which lets it explore a much wider space of potential solutions than traditional methods. Using PLANSEARCH on top of Claude 3.5 Sonnet achieves a pass@200 of 77.0% on LiveCodeBench, outperforming both the result without search (pass@1 = 41.4%) and standard repeated sampling (pass@200 = 60.6%). We also show that the performance gains from search can be accurately predicted, with the diversity of the generated ideas as the key factor.|\n",
"2409.03708": "|**2024-09-06**|**RAG based Question-Answering for Contextual Response Prediction System**|Sriram Veturi et.al.|[2409.03708](http://arxiv.org/abs/2409.03708)|null|This paper introduces an end-to-end framework that uses the retrieval-augmented generation (RAG) capability of large language models (LLMs) for question answering in a real-world industry setting. Given a customer query, the system retrieves relevant knowledge documents and combines them with previous chat history to generate response suggestions for customer service agents in a retail company's contact center. Comprehensive automated and human evaluations show that this solution outperforms current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can serve as an excellent aid to human customer service representatives by lightening their workload.
|\n", "2409.03671": "|**2024-09-05**|**TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems**|Stylianos Loukas Vasileiou et.al.|[2409.03671](http://arxiv.org/abs/2409.03671)|null|We present TRACE-cs, a novel hybrid system that combines symbolic reasoning with large language models (LLMs) to address contrastive queries in scheduling problems. TRACE-cs uses SAT solving techniques to encode scheduling constraints and generate explanations for user queries, while using an LLM to convert user queries into logical clauses and to refine the explanations produced by the symbolic solver into natural language sentences. By integrating these components, our approach demonstrates the potential of combining symbolic methods with LLMs to create explainable AI agents with correctness guarantees.|\n", "2409.03668": "|**2024-09-05**|**A Fused Large Language Model for Predicting Startup Success**|Abdurahman Maarouf 
et.al.|[2409.03668](http://arxiv.org/abs/2409.03668)|null|Investors need to predict the success of startups in order to make effective decisions and keep finding profitable venture investment opportunities. Today, investors can draw not only on fundamental information about a startup (e.g., its age, the number of founders, and its business sector) but also on textual descriptions of its innovation and business model, available through online venture capital (VC) platforms such as Crunchbase. To support investor decisions, we develop a machine learning approach that aims to locate successful startups on VC platforms. Specifically, we develop, train, and evaluate a tailored fused large language model to predict startup success. We thereby assess to what extent the self-descriptions on VC platforms are predictive of startup success. Using 20,172 online profiles from Crunchbase, we find that our fused large language model can predict startup success, with the textual self-descriptions accounting for a significant part of the predictive power. Our work provides a decision support tool that helps investors find profitable investment opportunities.|\n",
"2409.03662": "|**2024-09-05**|**The representation landscape of few-shot learning and fine-tuning in large language models**|Diego Doimo et.al.|[2409.03662](http://arxiv.org/abs/2409.03662)|**[link](https://github.com/diegodoimo/geometry_icl_finetuning)**|**This paper examines two common strategies for improving the performance of modern large language models (LLMs) on specific tasks: in-context learning (ICL) and supervised fine-tuning (SFT). Despite their different natures, the two strategies often lead to similar performance gains. However, little is known about whether they induce similar representations inside LLMs. We approach this question by analyzing the probability landscape of the hidden representations in the two cases. Specifically, we compare how LLMs solve the same question-answering task and find that ICL and SFT create very different internal structures, both of which undergo a sharp transition in the middle of the network. In the first half of the network, ICL shapes interpretable representations that are hierarchically organized according to their semantic content. In contrast, the probability landscape obtained with SFT is fuzzier and semantically mixed. In the second half of the network,
\u5fae\u8c03\u540e\u7684\u8868\u793a\u53d1\u5c55\u51fa\u4e86\u66f4\u6709\u5229\u4e8e\u7f16\u7801\u7b54\u6848\u8eab\u4efd\u7684\u6982\u7387\u6a21\u5f0f\uff0c\u800cICL\u8868\u793a\u7684\u6982\u7387\u5cf0\u5219\u4e0d\u592a\u660e\u786e\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u63ed\u793a\u4e86LLM\u5728\u4e0d\u540c\u6761\u4ef6\u4e0b\u89e3\u51b3\u76f8\u540c\u4efb\u52a1\u65f6\u6240\u91c7\u7528\u7684\u591a\u6837\u5316\u8ba1\u7b97\u7b56\u7565\uff0c\u8fd9\u6709\u52a9\u4e8e\u6211\u4eec\u671d\u7740\u8bbe\u8ba1\u51fa\u4ece\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\u4fe1\u606f\u7684\u6700\u4f73\u65b9\u6cd5\u8fc8\u8fdb\u3002**|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u52a8\u8bd7\u6b4c\u751f\u6210\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u751f\u6210\u7684\u8bd7\u6b4c\u5728\u591a\u6837\u6027\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u4e14\u8bad\u7ec3\u8fc7\u7a0b\u4e0e\u4eba\u7c7b\u5b66\u4e60\u65b9\u5f0f\u5927\u76f8\u5f84\u5ead\u3002\u57fa\u4e8e\u8fd9\u6837\u7684\u8003\u8651\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u793e\u4f1a\u5b66\u4e60\u7684\u6846\u67b6\uff0c\u5728\u6b64\u6846\u67b6\u4e0b\uff0c\u6211\u4eec\u5f3a\u8c03\u975e\u5408\u4f5c\u4e92\u52a8\uff0c\u4ee5\u9f13\u52b1\u591a\u6837\u6027\uff0c\u540c\u65f6\u9664\u4e86\u5408\u4f5c\u4e92\u52a8\u5916\u8fd8\u5f3a\u8c03\u975e\u5408\u4f5c\u4e92\u52a8\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u662f\u9996\u6b21\u5c1d\u8bd5\u5728\u975e\u5408\u4f5c\u73af\u5883\u4e2d\u4f7f\u7528\u57fa\u4e8e\u8bad\u7ec3\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\uff08GPT-2\uff09\u548c\u57fa\u4e8e\u63d0\u793a\u7684\u7cfb\u7edf\uff08GPT-3 \u548c GPT-4\uff09\u8fdb\u884c\u8bd7\u6b4c\u751f\u6210\u3002 
\u6839\u636e\u5bf9\u751f\u6210\u768496,000\u9996\u8bd7\u6b4c\u7684\u8bc4\u4f30\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5bf9\u57fa\u4e8e\u8bad\u7ec3\u7684\u667a\u80fd\u4f53\u7684\u8bd7\u6b4c\u751f\u6210\u8fc7\u7a0b\u4ea7\u751f\u4e86\u79ef\u6781\u5f71\u54cd\uff0c\u5bfc\u81f4\u4ee5\u4e0b\u7ed3\u679c\uff1a1\uff09\u591a\u6837\u6027\u589e\u52a0\u4e863.0-3.7\u4e2a\u767e\u5206\u70b9\uff08pp\uff09\uff0c\u65b0\u9896\u6027\u589e\u52a0\u4e865.6-11.3\u4e2a\u767e\u5206\u70b9\uff0c\u6839\u636e\u72ec\u7279\u548c\u65b0\u9896\u7684n-grams\u8bc4\u4f30\u3002\u751f\u6210\u7684\u8bd7\u6b4c\u5728\u8bcd\u6c47\u3001\u98ce\u683c\u548c\u8bed\u4e49\u65b9\u9762\u4e5f\u8868\u73b0\u51fa\u7fa4\u4f53\u5dee\u5f02\u3002\u57fa\u4e8e\u63d0\u793a\u7684\u667a\u80fd\u4f53\u5728\u6211\u4eec\u7684\u6846\u67b6\u4e2d\u4e5f\u4ece\u975e\u5408\u4f5c\u73af\u5883\u4e2d\u83b7\u76ca\uff0c\u5177\u6709\u975e\u540c\u8d28\u667a\u80fd\u4f53\u7684\u591a\u6837\u5316\u7684\u6a21\u578b\u7ec4\u5408\u6709\u53ef\u80fd\u8fdb\u4e00\u6b65\u63d0\u9ad8\u591a\u6837\u6027\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u591a\u6837\u6027\u589e\u52a0\u4e867.0-17.5\u4e2a\u767e\u5206\u70b9\u3002\u7136\u800c\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u667a\u80fd\u4f53\u663e\u793a\u4e86\u968f\u7740\u65f6\u95f4\u63a8\u79fb\u8bcd\u6c47\u591a\u6837\u6027\u7684\u4e0b\u964d\uff0c\u5e76\u6ca1\u6709\u5c55\u73b0\u51fa\u65e8\u5728\u5728\u793e\u4ea4\u7f51\u7edc\u4e2d\u5b9e\u73b0\u7684\u7fa4\u4f53\u95f4\u5206\u5316\u3002 \u672c\u6587\u8ba4\u4e3a\uff0c\u5728\u8bf8\u5982\u81ea\u52a8\u8bd7\u6b4c\u751f\u6210\u7b49\u521b\u610f\u4efb\u52a1\u4e2d\uff0c\u9700\u8981\u8fdb\u884c\u8303\u5f0f\u8f6c\u53d8\uff0c\u5f15\u5165\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u4ea4\u4e92\u7684\u793e\u4f1a\u5b66\u4e60\u8fc7\u7a0b\uff08\u901a\u8fc7\u57fa\u4e8eLLM\u7684\u667a\u80fd\u4f53\u5efa\u6a21\uff09\uff0c\u4ee5\u4fc3\u8fdb\u66f4\u52a0\u591a\u6837\u6027\u548c\u521b\u65b0\u7684\u751f\u6210\u3002**|\n", "2409.03512": "|**2024-09-05**|**From MOOC to MAIC: Reshaping Online Teaching and Learning through 
LLM-driven Agents**|Jifan Yu et.al.|[2409.03512](http://arxiv.org/abs/2409.03512)|null|Since the earliest instances of online education, in which courses were uploaded to accessible, shared online platforms, this way of widening the reach of knowledge to a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still has room for improvement, AI technologies have been continuously integrated into this learning format, giving rise to a variety of educational AI applications such as educational recommendation and intelligent tutoring. The emergence of intelligence in large language models (LLMs) allows these educational enhancements to be built on a unified foundation model, enabling deeper integration. Against this background, we propose MAIC (Massive AI-empowered Course), a new form of online education that uses LLM-driven multi-agent systems to build an AI-augmented classroom, balancing scalability with adaptivity. Beyond exploring the conceptual framework and technical innovations, we conducted preliminary experiments at Tsinghua University, one of China's leading universities. From more than 100,000 learning records of over 500 students, we obtained valuable observations and initial analyses. The project will continue to evolve, with the ultimate goal of establishing a comprehensive open platform that supports and unifies research, technology, and applications in exploring the possibilities of online education in the era of large-model AI. We envision this platform as a collaborative hub that brings together educators, researchers, and innovators to explore the future of AI-driven online education.|\n", "2409.04421": "|**2024-09-06**|**RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs**|Jiaxing Wu et.al.|[2409.04421](http://arxiv.org/abs/2409.04421)|null|This paper introduces Reinforcement Learning from Prediction Feedback (RLPF), a method that addresses a problem large language models (LLMs) face in personalization systems. When LLMs predict behavior from a user's past activity, their effectiveness often depends on how well they can exploit extensive user history data, which is typically noisy and overly long. Existing pretrained LLMs may generate summaries that are concise but lack the context that downstream tasks need, which limits their use in personalization systems. To overcome this challenge, RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF distills the key information from extensive user history while preserving what matters for downstream tasks. Experiments show that, compared with baseline methods, RLPF improves downstream task performance by 22%, reaches an 84.59% win rate on factuality, abstractiveness, and readability, reduces context length by 74%, and improves performance on 16 unseen tasks and/or datasets, which indicates good generalization. In short, RLPF offers a promising way to strengthen LLM personalization by turning long, noisy user histories into informative, easy-to-understand representations.|\n", "2409.04388": "|**2024-09-06**|**Question-Answering Dense Video Events**|Hangyu Qin 
et.al.|[2409.04388](http://arxiv.org/abs/2409.04388)|null|In this paper, we propose a new task: answering and grounding questions about dense events in long videos, which requires models to faithfully understand and reason about multiple events that unfold over extended periods. To support this study, we build DeVE-QA, a dataset of 78,000 questions about 26,000 events in 10,600 long videos. Existing multimodal large language models (MLLMs) that excel at single-event question answering struggle on DeVE-QA, which shows their limitations in understanding and reasoning about multiple events occurring over long time spans. To address this, we propose DeVi, a new training-free approach for improving MLLM performance. DeVi augments existing MLLMs with three key modules: a hierarchical captioning module, a temporal event memory module, and a self-consistency checking module. These modules respectively detect, contextualize and memorize, and ground the dense events in long videos for question answering. Experiments show that DeVi outperforms existing MLLMs at answering dense-event questions and grounding the relevant video segments. Specifically, DeVi improves G(round)QA accuracy by 4.1% on DeVE-QA and by 3.7% on NExT-GQA.|\n", "2409.04318": "|**2024-09-06**|**Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs**|Aliakbar Nafar et.al.|[2409.04318](http://arxiv.org/abs/2409.04318)|**[link](https://github.com/HLR/LvsR-LLM)**|This paper proposes a framework for evaluating the in-context learning mechanisms of generative large language models (LLMs). Focusing on regression tasks, we claim that these mechanisms are a combination of retrieving internal knowledge and learning from in-context examples. First, we show that LLMs can perform regression on real-world datasets, and we design experiments to measure the extent to which a model retrieves its internal knowledge rather than learning from the in-context examples. We argue that this process lies on a spectrum between these two extremes. We analyze in depth the degree to which each mechanism is triggered depending on various factors, such as prior knowledge of the task and the type and richness of the information provided by the in-context examples. We use three LLMs and multiple datasets to verify the robustness of our findings. Our results shed light on how to leverage meta-learning from in-context examples and foster knowledge retrieval depending on the problem being solved.|\n", "2409.04312": "|**2024-09-06**|**An optically accelerated extreme learning machine using hot atomic vapors**|Pierre Azam et.al.|[2409.04312](http://arxiv.org/abs/2409.04312)|null|Machine learning is becoming a widely used technique, and its impressive growth stems from the diversity of societally relevant problems for which it offers practical solutions. However, as applications and required resources increase, current hardware technology is becoming a limit. In particular, for newer machine learning areas such as large language models or high-resolution image recognition, computing time and energy cost have become key issues. In this context, optical platforms have been designed over a number of years with the goal of developing more efficient machine learning hardware. Among them, free-space propagation platforms offer several advantages: parallelism, low energy consumption, and computational speed. This paper presents a new design that combines the strong, tunable nonlinear properties of a light beam propagating through a hot atomic vapor with an extreme learning machine model. Through numerical simulation and experiment, we demonstrate how such free-space nonlinear propagation enhances training on an MNIST image classification task. We also point out several experimental hyperparameters that could be optimized further to improve the accuracy of the platform.|\n", "2409.04286": "|**2024-09-06**|**Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets**|Desiree Heim 
et.al.|[2409.04286](http://arxiv.org/abs/2409.04286)|null|Currently available public knowledge work datasets lack diversity, detailed annotations, and contextual information about users and their documents, which hinders objective, comparable, data-driven evaluation and optimization of knowledge work assistance systems. Because collecting such data in real-world settings requires considerable resources and the data must be censored, building such a dataset is nearly impossible. We therefore propose a configurable multi-agent knowledge work dataset generator. The system simulates collaborative knowledge work among agents that produce documents generated by large language models, and it records the accompanying data traces. In addition, the generator stores all background information, whether given in its configuration or created during the simulation, in a knowledge graph. Finally, the resulting dataset can be used and shared without privacy or confidentiality concerns. This paper presents the design and vision of our approach and focuses on generating authentic knowledge work documents with large language models. In our study, human raters judged 53% of the generated documents and 74% of the real documents to be realistic, which demonstrates the potential of our approach. We also analyze the authenticity criteria mentioned in participants' comments, elaborate on the common issues identified, and propose improvements.|\n", "2409.04270": "|**2024-09-06**|**Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models**|Yuxiao Huang et.al.|[2409.04270](http://arxiv.org/abs/2409.04270)|null|This paper introduces an optimization paradigm based on large language models (LLMs) that establishes an autonomous model factory for generating knowledge transfer models suited to different optimization tasks. The approach aims to achieve efficient and effective knowledge transfer by automating the design process. To evaluate the proposed method, we carried out a comprehensive experimental study comparing the generated knowledge transfer models with existing state-of-the-art knowledge transfer methods. The results show that, in both efficiency and effectiveness, the generated models perform better than or on par with hand-designed knowledge transfer models.|\n", "2409.04183": "|**2024-09-06**|**GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding**|Ziyin Zhang et.al.|[2409.04183](http://arxiv.org/abs/2409.04183)|null|In this work, we propose GALLa (Graph Aligned Large Language Model). GALLa uses graph neural networks and cross-modal alignment techniques to inject the structural information of code into LLMs as an auxiliary task during fine-tuning. The framework is both model-agnostic and task-agnostic: it can be applied to any code LLM for any downstream code task, it needs structural graph data only at training time, drawn from a corpus unrelated to the fine-tuning data, and it adds no cost at inference time. Experiments on five code tasks with four different baseline LLMs (ranging from 350 million to 8 billion parameters) validate the effectiveness of GALLa and show consistent improvement even for strong models such as LLaMA3.|\n", "2409.04181": "|**2024-09-06**|**Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering**|Larissa Pusch 
et.al.|[2409.04181](http://arxiv.org/abs/2409.04181)|null|Advances in natural language processing have greatly changed how we interact with information systems such as databases, making them more accessible. However, challenges remain in domains where accuracy is critical, such as biomedicine. A key issue is hallucination, where a model generates information that the underlying data does not support, which can lead to dangerous misinformation. This paper proposes a novel approach that combines large language models (LLMs) and knowledge graphs (KGs) to improve the accuracy and reliability of question-answering systems, using a biomedical KG as an example. Built on the LangChain framework, the method introduces a query checker that ensures the syntactic and semantic validity of LLM-generated queries, which are then used to extract information from the knowledge graph, substantially reducing errors such as hallucinations. We evaluated overall performance on a new benchmark dataset of 50 biomedical questions, testing several LLMs including GPT-4 Turbo and llama3:70b. The results show that while GPT-4 Turbo excels at generating accurate queries, open-source models such as llama3:70b also show promise with appropriate prompt engineering. To make the approach accessible, we developed a user-friendly web interface that lets users enter natural language queries, view the generated and corrected Cypher queries, and verify the accuracy of the resulting paths. Overall, this hybrid approach effectively addresses common problems such as data gaps and hallucinations, offering a reliable and intuitive solution for improving question-answering systems. The source code for generating the results of this paper and for the user interface is available in the Git repository: https://git.zib.de/lpusch/cyphergenkg-gui|\n", "2409.04168": "|**2024-09-06**|**From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks**|Andreas Stephan 
et.al.|[2409.04168](http://arxiv.org/abs/2409.04168)|null|To reduce the need for human annotation, large language models (LLMs) have been proposed as judges of the quality of candidate models. These LLM judges are usually evaluated by their correlation with human judgments on generation tasks such as summarization or machine translation. In contrast, we study LLM judges on mathematical reasoning tasks. Such tasks require multi-step reasoning, and the correctness of their solutions can be verified, which allows a more objective evaluation. We carry out a detailed performance analysis and find that the judges we use are mostly unable to improve task performance but are able to pick the better model. Our analysis reveals a strong correlation between judgment performance and the task performance of the candidate models. We observe that judges tend to choose the higher-quality model even when its answer is wrong. Furthermore, we show that judgment performance can be predicted from statistics such as the task performance of the candidate models. In an ablation, we swap or mask the candidate answers and observe that judges often keep their original judgment, which is evidence that judges factor writing style into their judgments. In summary, we find that the regularities in the judgments can be quantified with statistical measures, and we offer several angles for exploiting them.|\n", "2409.04164": "|**2024-09-06**|**Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation**|Luis Mayer et.al.|[2409.04164](http://arxiv.org/abs/2409.04164)|null|In recent years, large language models (LLMs) have emerged as a powerful tool with potential in many fields, including software engineering. In this study, we evaluate five state-of-the-art LLMs (Bard, BingChat, ChatGPT, Llama2, and Code Llama) on text-to-code generation. We prompt the models with textual descriptions of coding problems from the programming website LeetCode and ask them to write solutions in Python. We then assess the quality of the generated output using LeetCode's testing functionality. The results show significant differences in performance among the models. ChatGPT handles these programming challenges most effectively, surpassing even code-specialized models such as Code Llama. To gain further insight, we measured the runtime and memory usage of the generated code and compared them with other code submissions on LeetCode. A detailed error analysis, which compares the generated code for correct indentation and differences in form and assigns the unsolved tasks to specific error categories, gives a deeper understanding of the results and points to room for improvement. The results also show that the generated code becomes increasingly inaccurate when models face a large amount of context in the form of longer prompts.|\n", "2409.05840": "|**2024-09-09**|**MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct**|Run Luo 
et.al.|[2409.05840](http://arxiv.org/abs/2409.05840)|null|The development of multimodal large language models (MLLMs) has seen significant progress. However, the quantity and quality of data remain a critical bottleneck. Manually creating multimodal instruction data is both time-consuming and inefficient, especially for highly complex instructions. Moreover, distilling instruction data from black-box commercial models (for example GPT-4o and GPT-4V) often yields instruction data that is too simple, which caps model performance at the level of those models. Building diverse and complex instruction data remains a major challenge. To address this, we propose MMEvol, a novel multimodal instruction data evolution framework that combines fine-grained perception evolution, cognitive reasoning evolution, and interaction evolution. This iterative approach breaks through the data quality bottleneck and produces a complex, diverse image-text instruction dataset, thereby strengthening the capabilities of MLLMs. Starting from an initial instruction set, SEED-163K, we use MMEvol to systematically broaden the diversity of instruction types, integrate reasoning steps that enhance cognitive capabilities, and extract detailed information from images to improve visual understanding and robustness. To evaluate the effectiveness of our data comprehensively, we trained LLaVA-NeXT on the evolved data and ran experiments on 13 vision-language tasks. Compared with a baseline trained on the original data, our approach improves accuracy by 3.1 points on average and reaches state-of-the-art performance on 9 of the tasks.|\n", "2409.05824": "|**2024-09-09**|**Are Large Language Models a Threat to Programming Platforms? 
An Exploratory Study**|Md Mustakim Billah et.al.|[2409.05824](http://arxiv.org/abs/2409.05824)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982ChatGPT\u3001Gemini\u548cMeta AI\u5728LeetCode\u3001Codeforces\u548cHackerRank\u7b49\u7ade\u8d5b\u7f16\u7a0b\u5e73\u53f0\u4e0a\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002\u8fd9\u4e9b\u5e73\u53f0\u5e38\u88ab\u62db\u8058\u4eba\u5458\u7528\u6765\u7b5b\u9009\u7f16\u7a0b\u6280\u80fd\u3002\u968f\u7740LLM\u80fd\u529b\u7684\u63d0\u5347\uff0c\u5bf9\u5176\u5728\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u3001\u5404\u7c7b\u522b\u7684\u7f16\u7a0b\u6311\u6218\u4e2d\u7684\u8868\u73b0\u8fdb\u884c\u8bc4\u4f30\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002 \u7814\u7a76\u56e2\u961f\u4eceLeetCode\u9009\u53d6\u4e8698\u4e2a\u95ee\u9898\uff0c\u4eceCodeforces\u9009\u53d6\u4e86126\u4e2a\u95ee\u9898\uff0c\u8986\u76d6\u4e8615\u4e2a\u7c7b\u522b\u3002\u901a\u8fc7\u4e5d\u573a\u5728\u7ebfCodeforces\u548cLeetCode\u7ade\u8d5b\u4ee5\u53caHackerRank\u7684\u4e24\u9879\u8ba4\u8bc1\u6d4b\u8bd5\uff0c\u5bf9LLM\u7684\u5b9e\u65f6\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7814\u7a76\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u4e86\u63d0\u793a\u548c\u53cd\u9988\u673a\u5236\u6765\u5f15\u5bfcLLM\uff0c\u5e76\u63a2\u7d22\u4e86\u4e0d\u540c\u573a\u666f\u4e4b\u95f4\u7684\u76f8\u5173\u6027\u3002 
\u7ed3\u679c\u663e\u793a\uff0cChatGPT\u7b49LLM\u5728LeetCode\u548cHackerRank\u7684\u8ba4\u8bc1\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff08\u6210\u529f\u7387\u4e3a71.43%\uff09\uff0c\u4f46\u5728\u865a\u62df\u7ade\u8d5b\u4e2d\uff0c\u7279\u522b\u662f\u5728Codeforces\u7684\u9ad8\u96be\u5ea6\u6bd4\u8d5b\u4e2d\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4e0d\u5c3d\u5982\u4eba\u610f\u3002\u5c3d\u7ba1\u5728LeetCode\u6863\u6848\u5e93\u4e2d\u7684\u7528\u6237\u4e2d\u8868\u73b0\u4f18\u4e8e\u90e8\u5206\u7528\u6237\uff0c\u4f46LLM\u5728\u65f6\u95f4\u6548\u7387\u548c\u5185\u5b58\u6548\u7387\u4e0a\u8868\u73b0\u7a81\u51fa\uff0c\u800c\u5728\u66f4\u56f0\u96be\u7684Codeforces\u7ade\u8d5b\u4e2d\u5219\u5904\u4e8e\u52a3\u52bf\u3002 \u5c3d\u7ba1\u5f53\u524d\u60c5\u51b5\u5e76\u672a\u7acb\u5373\u6784\u6210\u5a01\u80c1\uff0c\u4f46LLM\u5728\u8fd9\u4e9b\u5e73\u53f0\u4e0a\u7684\u8868\u73b0\u4ee4\u4eba\u62c5\u5fe7\uff0c\u672a\u6765\u9700\u8981\u6539\u8fdb\u4ee5\u63d0\u9ad8\u5176\u6027\u80fd\u3002|\n", "2409.05806": "|**2024-09-09**|**Benchmarking Chinese Knowledge Rectification in Large Language Models**|Tianhe Lu 
et.al.|[2409.05806](http://arxiv.org/abs/2409.05806)|**[link](https://github.com/zjunlp/easyedit)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u60ca\u4eba\u7684\u751f\u6210\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u5e76\u975e\u6ca1\u6709\u7f3a\u9677\uff0c\u7279\u522b\u662f\u5b58\u5728\u5e7b\u89c9\u7684\u95ee\u9898\u3002\u5f53LLM\u5e94\u7528\u4e8e\u7279\u5b9a\u8bed\u8a00\u548c\u9886\u57df\u65f6\uff0c\u8fd9\u4e00\u95ee\u9898\u5c24\u4e3a\u7a81\u51fa\u3002\u4f8b\u5982\uff0c\u5728\u5904\u7406\u4e2d\u56fd\u53e4\u4ee3\u8bd7\u6b4c\u3001\u8c1a\u8bed\u6216\u6210\u8bed\u65f6\uff0cLLM\u53ef\u80fd\u4f1a\u751f\u6210\u6beb\u65e0\u610f\u4e49\u7684\u4fe1\u606f\uff0c\u8fd9\u662f\u7531\u4e8e\u7f3a\u4e4f\u7279\u5b9a\u77e5\u8bc6\u9020\u6210\u7684\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u7684\u57fa\u51c6\uff0c\u901a\u8fc7\u77e5\u8bc6\u7f16\u8f91\u6765\u7ea0\u6b63\u4e2d\u6587\u77e5\u8bc6\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u4ece\u5404\u79cd\u6765\u6e90\u6536\u96c6\u4e03\u79cd\u7c7b\u578b\u7684\u77e5\u8bc6\uff0c\u5305\u62ec\u53e4\u5178\u6587\u672c\u3001\u6210\u8bed\u4ee5\u53ca\u6765\u81ea\u767e\u5ea6\u8d34\u5427\u201c\u6c42\u8bf8\u5bb6\u201d\u7684\u5185\u5bb9\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u4e2d\u6587\u6570\u636e\u96c6CKnowEdit\uff0c\u4ee5\u5e94\u5bf9\u4e2d\u6587\u8bed\u8a00\u7279\u6709\u7684\u590d\u8c03\u6027\u3001\u53cd\u8bbd\u6027\u548c\u903b\u8f91\u7ed3\u6784\u3002\u901a\u8fc7\u5bf9\u8fd9\u4e2a\u6570\u636e\u96c6\u7684\u5206\u6790\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5f53\u524dLLM\u5728\u638c\u63e1\u4e2d\u6587\u65b9\u9762\u7684\u6311\u6218\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5728\u8be5\u6570\u636e\u96c6\u4e0a\u5bf9\u73b0\u6709\u7684\u77e5\u8bc6\u7f16\u8f91\u6280\u672f\u8fdb\u884c\u8bc4\u4f30\uff0c\u53d1\u73b0\u5bf9\u4e2d\u6587\u77e5\u8bc6\u7684\u4fee\u6b63\u4ecd\u5b58\u5728\u5de8\u5927\u7684\u63d0\u5347\u7a7a\u95f4\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u8bbf\u95ee\uff1aht
tps://github.com/zjunlp/EasyEdit\u3002**|\n", "2409.05771": "|**2024-09-09**|**Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models**|Emily Cheng et.al.|[2409.05771](http://arxiv.org/abs/2409.05771)|null|\u7814\u7a76\u5df2\u53cd\u590d\u8bc1\u660e\uff0c\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u4e2d\u95f4\u9690\u85cf\u72b6\u6001\u80fd\u591f\u9884\u6d4b\u5bf9\u81ea\u7136\u8bed\u8a00\u523a\u6fc0\u7684\u6d4b\u91cf\u5927\u8111\u53cd\u5e94\u3002\u7136\u800c\uff0c\u5173\u4e8e\u4f7f\u8fd9\u4e00\u9ad8\u9884\u6d4b\u6027\u80fd\u6210\u4e3a\u53ef\u80fd\u7684\u8868\u793a\u7279\u6027\u7684\u4e86\u89e3\u975e\u5e38\u6709\u9650\u3002\u4e3a\u4ec0\u4e48\u662f\u4e2d\u95f4\u5c42\u800c\u4e0d\u662f\u8f93\u51fa\u5c42\u5728\u8fd9\u4e00\u72ec\u7279\u4e14\u9ad8\u5ea6\u901a\u7528\u7684\u8f6c\u79fb\u4efb\u52a1\u4e2d\u6700\u4e3a\u6709\u6548\uff1f\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u529f\u80fd\u6027\u78c1\u5171\u632f\u6210\u50cf\u4e2d\u7684\u8bed\u8a00\u7f16\u7801\u6a21\u578b\u8bc1\u636e\u652f\u6301\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5185\u5b58\u5728\u4e24\u4e2a\u9636\u6bb5\u62bd\u8c61\u8fc7\u7a0b\u7684\u5b58\u5728\u3002\u6211\u4eec\u4f7f\u7528\u6d41\u5f62\u5b66\u4e60\u65b9\u6cd5\u8868\u660e\uff0c\u8fd9\u79cd\u62bd\u8c61\u8fc7\u7a0b\u81ea\u7136\u5730\u5728\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ea7\u751f\uff0c\u5e76\u4e14\u968f\u7740\u8bad\u7ec3\u7ee7\u7eed\u8fdb\u884c\uff0c\u8fd9\u4e2a\u62bd\u8c61\u8fc7\u7a0b\u7684\u7b2c\u4e00\u4e2a\u201c\u7ec4\u5408\u201d\u9636\u6bb5\u88ab\u538b\u7f29\u5230\u66f4\u5c11\u7684\u5c42\u4e2d\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5c42\u6b21\u7f16\u7801\u6027\u80fd\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8868\u793a\u7684\u5185\u5728\u7ef4\u5ea6\u4e4b\u95f4\u5b58\u5728\u5f3a\u70c8\u7684\u5bf9\u5e94\u5173\u7cfb\u3002\u6211\u4eec\u521d\u6b65\u8bc1\u636e\u8868\u660e\uff0c\u8fd9\u79cd\u5bf9\u5e94\u5173\u7cfb\u4e3b\u8981\u6765\u6e90\u4e8e\u5927\u578b
\u8bed\u8a00\u6a21\u578b\u7684\u5185\u5728\u7ec4\u5408\u6027\uff0c\u800c\u975e\u5176\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u5c5e\u6027\u3002|\n", "2409.05768": "|**2024-09-09**|**Model Input Verification of Large Scale Simulations**|Rumyana Neykova et.al.|[2409.05768](http://arxiv.org/abs/2409.05768)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u9a8c\u8bc1\u6a21\u62df\u8f93\u5165\u6570\u636e\u6709\u6548\u6027\u7684\u65b9\u6cd5\u8bba\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u6a21\u578b\u8f93\u5165\u9a8c\u8bc1\uff08MIV\uff09\u3002\u6211\u4eec\u901a\u8fc7\u8bbe\u8ba1\u7279\u5b9a\u4e8e\u6a21\u62df\u5efa\u6a21\u9700\u6c42\u7684\u6570\u636e\u6a21\u5f0f\u548c\u9a8c\u8bc1\u5de5\u5177\u5728\u540d\u4e3aFabGuard\u7684\u5de5\u5177\u96c6\u4e2d\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u3002\u672c\u6587\u5f15\u5165\u4e86MIV\u6a21\u5f0f\u7684\u6b63\u5f0f\u5206\u7c7b\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u96c6\u6210\u5230\u73b0\u6709\u6a21\u62df\u5de5\u4f5c\u6d41\u7a0b\u4e2d\u7684\u7b80\u5316\u9a8c\u8bc1\u7ba1\u9053\u3002FabGuard\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u2014\u2014\u51b2\u7a81\u9a71\u52a8\u7684\u4eba\u53e3\u8fc1\u79fb\u3001\u707e\u5bb3\u758f\u6563\u4ee5\u53ca\u75be\u75c5\u4f20\u64ad\u6a21\u578b\u2014\u2014\u7684\u5e94\u7528\u5f97\u5230\u4e86\u5c55\u793a\u3002\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u5316\u7ea6\u675f\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u7684\u5e94\u7528\u3002\u5728\u5bf9\u4e00\u4e2a\u79fb\u6c11\u6a21\u62df\u6848\u4f8b\u7684\u7814\u7a76\u4e2d\uff0cLLMs\u4e0d\u4ec5\u6b63\u786e\u63a8\u65ad\u51fa\u4e8623\u4e2a\u5f00\u53d1\u8005\u5b9a\u4e49\u7684\u7ea6\u675f\u4e2d\u768422\u4e2a\uff0c\u800c\u4e14\u8fd8\u53d1\u73b0\u4e86\u73b0\u6709\u7ea6\u675f\u4e2d\u7684\u9519\u8bef\uff0c\u5e76\u63d0\u51fa\u4e86\u65b0\u7684\u6709\u6548\u7ea6\u675f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u5bf9\u4e8e\u5927\u578b\u6570\u636e\u96c6\uff0cMIV\u662f\u53ef\u884c\u7684\u
ff0cFabGuard\u80fd\u591f\u5728140\u79d2\u5185\u9ad8\u6548\u5904\u740612,000\u4e2a\u8f93\u5165\u6587\u4ef6\uff0c\u5e76\u4e14\u5176\u6027\u80fd\u5728\u4e0d\u540c\u6587\u4ef6\u5927\u5c0f\u4e0b\u4fdd\u6301\u4e00\u81f4\u3002|\n", "2409.05747": "|**2024-09-09**|**A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System**|B. Sankar et.al.|[2409.05747](http://arxiv.org/abs/2409.05747)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u3001\u57fa\u4e8e\u5bf9\u8bdd\u7684\u4eba\u5de5\u667a\u80fd\u6fc0\u6d3b\u521b\u65b0\u754c\u9762\uff0c\u4f5c\u4e3a\u521b\u610f\u751f\u6210\u5de5\u5177\uff0c\u65e8\u5728\u5e2e\u52a9\u521d\u5b66\u8005\u8bbe\u8ba1\u8005\u7f13\u89e3\u901a\u5e38\u5b58\u5728\u7684\u521d\u59cb\u5ef6\u8fdf\u548c\u521b\u65b0\u74f6\u9888\u95ee\u9898\u3002\u8fd9\u662f\u4e00\u4e2a\u52a8\u6001\u3001\u4e92\u52a8\u4e14\u4e0a\u4e0b\u6587\u54cd\u5e94\u5f0f\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u79ef\u6781\u5730\u5229\u7528\u4eba\u5de5\u667a\u80fd\u9886\u57df\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4ee5\u751f\u6210\u9488\u5bf9\u4e0d\u540c\u8bbe\u8ba1\u95ee\u9898\u7684\u591a\u4e2a\u6f5c\u5728\u60f3\u6cd5\u8868\u8ff0\u3002\u5c06\u6b64\u7c7bAI\u6a21\u578b\u4e0e\u521b\u65b0\u8fc7\u7a0b\u7ed3\u5408\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u6fc0\u6d3b\u521b\u65b0\u201d\u60c5\u666f\uff0c\u65e8\u5728\u4fc3\u8fdb\u57fa\u4e8e\u5bf9\u8bdd\u7684\u8fde\u7eed\u4e92\u52a8\u3001\u4e0a\u4e0b\u6587\u76f8\u5173\u7684\u5bf9\u8bdd\u4ee5\u53ca\u5927\u91cf\u7684\u60f3\u6cd5\u751f\u6210\u3002 
\u4e3a\u4e86\u9a8c\u8bc1\u8fd9\u4e00\u5de5\u5177\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u5bf930\u540d\u521d\u5b66\u8005\u8bbe\u8ba1\u5e08\u8fdb\u884c\u4e86\u8bd5\u70b9\u7814\u7a76\uff0c\u8ba9\u4ed6\u4eec\u4f7f\u7528\u4f20\u7edf\u65b9\u6cd5\u548c\u65b0\u7684\u57fa\u4e8eCAI\u7684\u754c\u9762\u6765\u4e3a\u7ed9\u5b9a\u95ee\u9898\u751f\u6210\u60f3\u6cd5\u3002\u901a\u8fc7\u4e13\u5bb6\u5c0f\u7ec4\u5bf9\u7ed3\u679c\u8fdb\u884c\u7684\u5b9a\u6027\u6bd4\u8f83\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u6d41\u7545\u5ea6\u3001\u65b0\u9896\u6027\u548c\u591a\u6837\u6027\u4f5c\u4e3a\u5173\u952e\u53c2\u6570\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6240\u63d0\u51fa\u7684\u5de5\u5177\u80fd\u591f\u6709\u6548\u5730\u4ea7\u751f\u5927\u91cf\u3001\u591a\u6837\u4e14\u65b0\u9896\u7684\u60f3\u6cd5\u3002 \u4e3a\u4e86\u63d0\u9ad8\u754c\u9762\u7684\u53ef\u7528\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u5316\u7684\u5bf9\u8bdd\u6a21\u5f0f\uff0c\u4e3a\u6bcf\u4e2a\u521b\u65b0\u9636\u6bb5\u8bbe\u8ba1\u4e86\u63d0\u793a\u5de5\u7a0b\u5316\u7ed3\u6784\uff0c\u4f7f\u5176\u66f4\u52a0\u7edf\u4e00\u548c\u65b9\u4fbf\u8bbe\u8ba1\u5e08\u64cd\u4f5c\u3002\u91c7\u7528\u8fd9\u79cd\u7ed3\u6784\u5316\u7684CAI\u754c\u9762\u540e\uff0c\u5f97\u5230\u7684\u54cd\u5e94\u66f4\u52a0\u7b80\u6d01\uff0c\u5e76\u4e14\u4e0e\u968f\u540e\u7684\u8bbe\u8ba1\u9636\u6bb5\uff0c\u5373\u6982\u5ff5\u5316\u9636\u6bb5\uff0c\u66f4\u52a0\u7d27\u5bc6\u76f8\u5173\u3002 \u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u8bc1\u660e\u4e86\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff08Gen-AI\uff09\u5728\u521b\u610f\u4ea7\u54c1\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u65e9\u671f\u3001\u7ed3\u6784\u4e0d\u660e\u786e\u9636\u6bb5\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2409.05746": "|**2024-09-09**|**LLMs Will Always Hallucinate, and We Need to Live With This**|Sourav Banerjee 
et.al.|[2409.05746](http://arxiv.org/abs/2409.05746)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u9886\u57df\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u6df1\u5165\u63a2\u8ba8\u5b83\u4eec\u5185\u5728\u5c40\u9650\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u63d0\u51fa\uff0c\u8bed\u8a00\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u5e76\u975e\u5076\u7136\u9519\u8bef\uff0c\u800c\u662f\u8fd9\u4e9b\u7cfb\u7edf\u56fa\u6709\u7684\u7279\u5f81\u3002\u6211\u4eec\u901a\u8fc7\u8ba1\u7b97\u7406\u8bba\u548c\u54e5\u5fb7\u5c14\u7b2c\u4e00\u4e0d\u5b8c\u5168\u6027\u5b9a\u7406\u7684\u5f15\u7528\uff08\u6d89\u53caHalting\u3001Emptiness\u548cAcceptance\u95ee\u9898\u7684\u4e0d\u53ef\u5224\u5b9a\u6027\uff09\uff0c\u5c55\u793a\u4e86\u5e7b\u89c9\u6e90\u4e8eLLM\u7684\u57fa\u672c\u6570\u5b66\u548c\u903b\u8f91\u7ed3\u6784\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u67b6\u6784\u6539\u8fdb\u3001\u6570\u636e\u96c6\u589e\u5f3a\u6216\u4e8b\u5b9e\u6838\u67e5\u673a\u5236\u6d88\u9664\u5e7b\u89c9\u662f\u4e0d\u53ef\u80fd\u7684\u3002 \u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u4ece\u8bad\u7ec3\u6570\u636e\u7f16\u8bd1\u5230\u4e8b\u5b9e\u68c0\u7d22\u3001\u610f\u56fe\u5206\u7c7b\u548c\u6587\u672c\u751f\u6210\u7684\u6bcf\u4e2a\u9636\u6bb5\uff0c\u90fd\u5b58\u5728\u4ea7\u751f\u5e7b\u89c9\u7684\u975e\u96f6\u6982\u7387\u3002\u7531\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u6027\u5e7b\u89c9\u7684\u6982\u5ff5\uff0c\u4f5c\u4e3a\u8fd9\u4e9b\u7cfb\u7edf\u7684\u56fa\u6709\u7279\u6027\u3002\u901a\u8fc7\u5efa\u7acb\u5e7b\u89c9\u7684\u6570\u5b66\u786e\u5b9a\u6027\uff0c\u672c\u6587\u6311\u6218\u4e86\u5e7b\u89c9\u53ef\u4ee5\u5b8c\u5168\u907f\u514d\u7684\u4f20\u7edf\u89c2\u70b9\u3002|\n", "2409.05735": "|**2024-09-09**|**A System and Benchmark for LLM-based Q\\&A on Heterogeneous Data**|Achille Fokoue 
et.al.|[2409.05735](http://arxiv.org/abs/2409.05735)|null|\u5728\u8bb8\u591a\u5de5\u4e1a\u73af\u5883\u4e2d\uff0c\u7528\u6237\u5e0c\u671b\u4ee5\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u63d0\u51fa\u95ee\u9898\uff0c\u5e76\u4ece\u7ed3\u6784\u5316\u6570\u636e\u6e90\uff08\u5982\u7535\u5b50\u8868\u683c\u3001\u6570\u636e\u5e93\u3001API\u6216\u5b83\u4eec\u7684\u7ec4\u5408\uff09\u4e2d\u83b7\u53d6\u7b54\u6848\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u7528\u6237\u5e76\u4e0d\u77e5\u9053\u5982\u4f55\u8bc6\u522b\u6216\u8bbf\u95ee\u6b63\u786e\u7684\u6570\u636e\u6e90\u3002\u5982\u679c\u9700\u8981\u7ec4\u88c5\u591a\u4e2a\uff08\u751a\u81f3\u53ef\u80fd\u662f\u9694\u79bb\u7684\uff09\u6570\u636e\u6e90\u6765\u5f97\u51fa\u7b54\u6848\uff0c\u8fd9\u4e2a\u95ee\u9898\u4f1a\u53d8\u5f97\u66f4\u52a0\u590d\u6742\u3002\u6700\u8fd1\uff0c\u4e00\u4e9b\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6587\u672c\u5230SQL\u5e94\u7528\u5df2\u89e3\u51b3\u4e86\u4e00\u4e9b\u8fd9\u4e9b\u95ee\u9898\uff0c\u901a\u8fc7\u4f7f\u7528\u6237\u80fd\u591f\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u51fa\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u73b0\u5b9e\u7684\u5de5\u4e1a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u5e94\u7528\u4ecd\u7136\u4e0d\u5b9e\u7528\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5e94\u5bf9\u5178\u578b\u73af\u5883\u4e2d\u6570\u636e\u6e90\u7684\u5f02\u8d28\u6027\u3002\u672c\u6587\u65e8\u5728\u901a\u8fc7\u5f15\u5165siwarex\u5e73\u53f0\u89e3\u51b3\u5f02\u8d28\u6027\u95ee\u9898\uff0c\u8be5\u5e73\u53f0\u5141\u8bb8\u65e0\u7f1d\u5730\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u8bbf\u95ee\u6570\u636e\u5e93\u548cAPI\u3002 
\u4e3a\u4e86\u5c55\u793asiwarex\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u6d41\u884c\u7684Spider\u6570\u636e\u96c6\u5e76\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u66ff\u6362\u5176\u4e2d\u7684\u4e00\u4e9b\u8868\u683c\u4e3a\u6570\u636e\u68c0\u7d22API\u3002\u6211\u4eec\u53d1\u73b0siwarex\u5f88\u597d\u5730\u5e94\u5bf9\u4e86\u6570\u636e\u6e90\u5f02\u8d28\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u4fee\u6539\u540e\u7684Spider\u57fa\u51c6\u5f88\u5feb\u5c06\u5bf9\u7814\u7a76\u793e\u533a\u5f00\u653e\u3002|\n", "2409.05732": "|**2024-09-09**|**Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach**|Meng Zhou et.al.|[2409.05732](http://arxiv.org/abs/2409.05732)|null|\u591a\u8bed\u8a00\u5f00\u6e90\u533b\u7597\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5177\u6709\u670d\u52a1\u4e8e\u4e0d\u540c\u5730\u533a\u8bed\u8a00\u591a\u6837\u6027\u7684\u6f5c\u529b\u3002\u5c06\u901a\u7528LLMs\u9002\u5e94\u4e8e\u533b\u7597\u9886\u57df\u901a\u5e38\u9700\u8981\u6301\u7eed\u9884\u8bad\u7ec3\uff0c\u4f46\u8fd9\u5728\u8ba1\u7b97\u4e0a\u6210\u672c\u9ad8\u6602\u4e14\u6709\u65f6\u4e0d\u53ef\u884c\u3002\u4ec5\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\u7279\u5b9a\u4efb\u52a1\u53ef\u80fd\u65e0\u6cd5\u4fdd\u8bc1\u6700\u4f73\u6027\u80fd\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5e7f\u6cdb\u9886\u57df\u77e5\u8bc6\u4f7f\u5f97\u6a21\u578b\u96be\u4ee5\u5728\u5404\u79cd\u573a\u666f\u4e0b\u7406\u89e3\u548c\u63a8\u7406\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u4e2a\u591a\u8bed\u8a00\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6\uff1aMMed-IFT\u548cMMed-IFT-MC\uff0c\u8fd9\u4e24\u4e2a\u6570\u636e\u96c6\u5206\u522b\u5305\u542b\u4e86\u8d85\u8fc720\u4e07\u6761\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u79cd\u533b\u7597\u6837\u672c\uff0c\u5728\u516d\u79cd\u8bed\u8a00\u4e2d\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8bad\u7ec3\u8303\u5f0f\u
ff1a\u7b2c\u4e00\u9636\u6bb5\u5229\u7528MMed-IFT\u6ce8\u5165\u901a\u7528\u533b\u5b66\u77e5\u8bc6\uff0c\u7b2c\u4e8c\u9636\u6bb5\u5219\u4f7f\u7528MMed-IFT-MC\u5fae\u8c03\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u591a\u9879\u9009\u62e9\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u82f1\u8bed\u548c\u591a\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u5b9e\u73b0\u4e86\u9ad8\u6548\u6027\u548c\u6027\u80fd\u4e4b\u95f4\u7684\u5e73\u8861\u3002\u6211\u4eec\u8ba1\u5212\u5728\u672a\u6765\u5c06\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6a21\u578b\u6743\u91cd\u516c\u5f00\u5728https://github.com/SpassMed/Med-Llama3\u3002|\n", "2409.05703": "|**2024-09-09**|**The Influence of Task and Group Disparities over Users' Attitudes Toward Using Large Language Models for Psychotherapy**|Qihang He 
et.al.|[2409.05703](http://arxiv.org/abs/2409.05703)|null|\u8fd1\u5e74\u6765\uff0c\u5fc3\u7406\u5065\u5eb7\u969c\u788d\u60a3\u8005\u7684\u6570\u91cf\u6301\u7eed\u589e\u957f\uff0c\u800c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e0d\u540c\u9886\u57df\u7684\u8fdb\u6b65\u4e5f\u4f7f\u5f97\u57fa\u4e8eLLM\u7684\u5fc3\u7406\u6cbb\u7597\u5f15\u8d77\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u5f71\u54cd\u7528\u6237\u5bf9\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u6001\u5ea6\u7684\u56e0\u7d20\u9c9c\u6709\u63a2\u8ba8\u3002\u672c\u6587\u4f5c\u4e3a\u9996\u6b21\u5c1d\u8bd5\uff0c\u65e8\u5728\u7814\u7a76\u4efb\u52a1\u5dee\u5f02\u548c\u7fa4\u4f53\u5dee\u5f02\u5bf9\u7528\u6237\u5bf9\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u7684\u6001\u5ea6\u7684\u5f71\u54cd\u3002\u901a\u8fc7\u8fd0\u7528\u6280\u672f\u63a5\u53d7\u6a21\u578b\uff08TAM\uff09\u548c\u81ea\u52a8\u5316\u63a5\u53d7\u6a21\u578b\uff08AAM\uff09\uff0c\u7ed3\u5408\u5728\u7ebf\u95ee\u5377\u8c03\u67e5\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5206\u6790\u4e86\u6765\u81ea\u4e2d\u56fd\u5927\u9646222\u540d\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u7528\u6237\u7684\u53cd\u9988\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u7fa4\u4f53\u5dee\u5f02\uff08\u5373\u5fc3\u7406\u5065\u5eb7\u72b6\u51b5\uff09\u53ef\u4ee5\u5f71\u54cd\u7528\u6237\u5bf9LLM\u5de5\u5177\u7684\u6001\u5ea6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u4f5c\u4e3a\u5178\u578b\u4efb\u52a1\u5dee\u5f02\u4e4b\u4e00\u7684\u9690\u79c1\u987e\u8651\uff0c\u5e76\u672a\u53d1\u73b0\u5bf9\u4fe1\u4efb\u5ea6\u548c\u4f7f\u7528\u610f\u56fe\u4ea7\u751f\u663e\u8457\u5f71\u54cd\u3002\u8fd9\u4e9b\u53d1\u73b0\u53ef\u6307\u5bfc\u672a\u6765\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u670d\u52a1\u7684\u8bbe\u8ba1\u5de5\u4f5c\u3002|\n", "2409.06679": "|**2024-09-10**|**E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning**|Zihan Liao 
et.al.|[2409.06679](http://arxiv.org/abs/2409.06679)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9886\u57df\uff0c\u5904\u7406\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5bf9\u4e8e\u591a\u8f6e\u5bf9\u8bdd\u3001\u4ee3\u7801\u751f\u6210\u548c\u6587\u6863\u6458\u8981\u7b49\u4efb\u52a1\u6108\u53d1\u91cd\u8981\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u589e\u5f3a\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u6027\u80fd\u3001\u964d\u4f4e\u8ba1\u7b97\u590d\u6742\u6027\u4ee5\u53ca\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u6240\u9762\u4e34\u7684\u6311\u6218\u2014\u2014\u5373\u6240\u8c13\u7684\u201c\u4e0d\u53ef\u80fd\u4e09\u89d2\u201d\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aE2LLM\uff08\u7f16\u7801\u5668\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u7684\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u89e3\u51b3\u8fd9\u4e00\u6096\u8bba\u3002 \u8be5\u65b9\u6cd5\u7684\u6838\u5fc3\u601d\u60f3\u662f\u5c06\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u5212\u5206\u4e3a\u591a\u4e2a\u7247\u6bb5\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u6587\u672c\u7f16\u7801\u5668\u5c06\u6bcf\u4e2a\u7247\u6bb5\u538b\u7f29\u4e3a\u5d4c\u5165\u5411\u91cf\u3002\u7136\u540e\u5229\u7528\u9002\u914d\u5668\u5c06\u8fd9\u4e9b\u8868\u793a\u4e0e\u89e3\u7801\u5668\u578bLLM\u5bf9\u9f50\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u8f6f\u63d0\u793a\u7684\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e24\u4e2a\u8bad\u7ec3\u76ee\u6807\uff1a\u4e00\u662f\u91cd\u5efa\u7f16\u7801\u5668\u8f93\u51fa\uff0c\u4e8c\u662f\u9488\u5bf9\u957f\u6587\u672c\u6307\u4ee4\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u5e2e\u52a9LLM\u7406\u89e3\u8f6f\u63d0\u793a\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cE2LLM\u5728\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u573a\u666f\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6548\u7387\u3001\u6027\u80fd\u548c\u4e0e\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u517c\u5bb9\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4ee3\u8868\u4e86\u9886\u57df\u5185\u7684\u91cd\u5927\u8fdb\u5c55\uff0c\u4e3a\u6709\u6548\u7684\u5927\u6587\u672c\u5efa\u6a21\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2409.06666": "|**2024-09-10**|**LLaMA-Omni: Seamless Speech Interaction with Large Language Models**|Qingkai Fang et.al.|[2409.06666](http://arxiv.org/abs/2409.06666)|**[link](https://github.com/ictnlp/llama-omni)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u8fc7\u8bed\u97f3\u5b9e\u73b0\u5b9e\u65f6\u4ea4\u4e92\u7684\u80fd\u529b\u63d0\u5347\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u6587\u672c\u4ea4\u4e92\u65b9\u5f0f\uff0c\u6a21\u578b\u5982GPT-4\u663e\u8457\u589e\u5f3a\u4e86\u7528\u6237\u4f53\u9a8c\u3002\u7136\u800c\uff0c\u5f53\u524d\u5728\u57fa\u4e8e\u5f00\u6e90LLM\u6784\u5efa\u8bed\u97f3\u4ea4\u4e92\u6a21\u578b\u65b9\u9762\u4ecd\u7f3a\u4e4f\u6df1\u5165\u63a2\u7d22\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u6a21\u578b\u67b6\u6784\u2014\u2014LLaMA-Omni\uff0c\u65e8\u5728\u5b9e\u73b0\u4f4e\u5ef6\u8fdf\u4e0e\u9ad8\u8d28\u91cf\u7684\u8bed\u97f3\u4e0eLLM\u4ea4\u4e92\u3002\u8be5\u67b6\u6784\u878d\u5408\u4e86\u9884\u8bad\u7ec3\u7684\u8bed\u97f3\u7f16\u7801\u5668\u3001\u8bed\u97f3\u9002\u914d\u5668\u3001LLM\u548c\u6d41\u5f0f\u8bed\u97f3\u89e3\u7801\u5668\uff0c\u65e0\u9700\u8fdb\u884c\u8bed\u97f3\u8f6c\u5f55\uff0c\u5373\u53ef\u76f4\u63a5\u4ece\u8bed\u97f3\u6307\u4ee4\u751f\u6210\u6587\u672c\u548c\u8bed\u97f3\u54cd\u5e94\uff0c\u54cd\u5e94\u901f\u5ea6\u6781\u5feb\u3002 
\u6211\u4eec\u7684\u6a21\u578b\u57fa\u4e8e\u6700\u65b0\u7684Llama-3.1-8B-Instruct\u6a21\u578b\u6784\u5efa\uff0c\u5e76\u9488\u5bf9\u8bed\u97f3\u4ea4\u4e92\u573a\u666f\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3aInstructS2S-200K\u7684\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u4e8620\u4e07\u6761\u8bed\u97f3\u6307\u4ee4\u53ca\u5176\u5bf9\u5e94\u7684\u8bed\u97f3\u56de\u5e94\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ee5\u5f80\u7684\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\uff0cLLaMA-Omni\u5728\u5185\u5bb9\u4e0e\u98ce\u683c\u4e0a\u63d0\u4f9b\u4e86\u66f4\u597d\u7684\u54cd\u5e94\uff0c\u54cd\u5e94\u5ef6\u8fdf\u4f4e\u81f3226\u6beb\u79d2\u3002\u6b64\u5916\uff0c\u8bad\u7ec3LLaMA-Omni\u4ec5\u9700\u4e0d\u52303\u5929\u7684\u65f6\u95f4\uff0c\u57284\u5757GPU\u4e0a\u5373\u53ef\u5b8c\u6210\uff0c\u8fd9\u4e3a\u672a\u6765\u9ad8\u6548\u5f00\u53d1\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u94fa\u5e73\u4e86\u9053\u8def\u3002**|\n", "2409.06653": "|**2024-09-10**|**Human Perception of LLM-generated Text Content in Social Media Environments**|Kristina Radivojevic 
et.al.|[2409.06653](http://arxiv.org/abs/2409.06653)|null|\u65b0\u5174\u6280\u672f\uff0c\u5c24\u5176\u662f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4e3a\u6076\u610f\u884c\u4e3a\u8005\u63d0\u4f9b\u4e86\u64cd\u7eb5\u6570\u5b57\u5bf9\u8bdd\u7684\u5f3a\u5927\u5de5\u5177\u3002LLM\u6709\u53ef\u80fd\u5f71\u54cd\u4f20\u7edf\u5f62\u5f0f\u7684\u6c11\u4e3b\u53c2\u4e0e\uff0c\u4f8b\u5982\u9009\u6c11\u9009\u62e9\u3001\u653f\u5e9c\u8c03\u67e5\u6216\u4e0e\u76d1\u7ba1\u673a\u6784\u7684\u5728\u7ebf\u4ea4\u6d41\uff0c\u56e0\u4e3a\u673a\u5668\u4eba\u80fd\u591f\u751f\u6210\u5927\u91cf\u53ef\u4fe1\u6587\u672c\u3002\u4e3a\u4e86\u7814\u7a76\u4eba\u7c7b\u5bf9LLM\u751f\u6210\u5185\u5bb9\u7684\u611f\u77e5\uff0c\u6211\u4eec\u62db\u52df\u4e86\u8d85\u8fc71000\u540d\u53c2\u4e0e\u8005\uff0c\u7136\u540e\u8ba9\u4ed6\u4eec\u5c1d\u8bd5\u5728\u793e\u4ea4\u5a92\u4f53\u8ba8\u8bba\u7ebf\u7a0b\u4e2d\u533a\u5206\u673a\u5668\u4eba\u4e0e\u4eba\u7c7b\u5e16\u5b50\u3002\u6211\u4eec\u53d1\u73b0\u4eba\u7c7b\u5728\u8bc6\u522b\u793e\u4ea4\u5a92\u4f53\u4e0a\u7684\u771f\u5b9e\u7528\u6237\u5e16\u5b50\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\u3002\u6211\u4eec\u4e5f\u53d1\u73b0\u4e86\u4eba\u7c7b\u5728\u793e\u4ea4\u5a92\u4f53\u5bf9\u8bdd\u4e2d\u8bc6\u522bLLM\u751f\u6210\u6587\u672c\u5185\u5bb9\u7684\u6a21\u5f0f\u3002\u6700\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u4e86\u201c\u602a\u5f02\u8c37\u201d\u6548\u5e94\u5728\u6587\u672c\u5bf9\u8bdd\u4e2d\u7684\u5b58\u5728\uff0c\u65e0\u8bba\u662f\u5728\u611f\u77e5\u8fd8\u662f\u8bc6\u522b\u8fc7\u7a0b\u4e2d\u3002\u8fd9\u8868\u660e\u5c3d\u7ba1\u4eba\u7c7b\u5728\u8bc6\u522b\u8fc7\u7a0b\u4e2d\u7684\u8868\u73b0\u4e0d\u4f73\uff0c\u4f46\u5f53\u9605\u8bfbLLM\u751f\u6210\u7684\u5185\u5bb9\u65f6\uff0c\u4ed6\u4eec\u4ecd\u80fd\u611f\u53d7\u5230\u4e0d\u9002\u3002|\n", "2409.06646": "|**2024-09-10**|**Optimal Workload Placement on Multi-Instance GPUs**|Bekir Turkkan 
et.al.|[2409.06646](http://arxiv.org/abs/2409.06646)|null|\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u5982\u4f55\u4f18\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e3a\u57fa\u7840\u7684AI\u63a8\u7406\u5de5\u4f5c\u8d1f\u8f7d\u5728GPU\u4e0a\u7684\u90e8\u7f72\u3002\u6211\u4eec\u9996\u5148\u8bc6\u522b\u5e76\u9610\u8ff0\u4e86\u5b9e\u8df5\u4e2d\u9047\u5230\u7684\u4e00\u4e9b\u9700\u8981\u9ad8\u6548\u5206\u914d\u6216\u8fc1\u79fb\u5de5\u4f5c\u8d1f\u8f7d\u5230\u5176\u4ed6GPU\u4ee5\u817e\u51fa\u7a7a\u95f4\u4f9b\u65b0\u5de5\u4f5c\u8d1f\u8f7d\u4f7f\u7528\u7684\u60c5\u51b5\u3002\u76ee\u6807\u662f\u5c3d\u53ef\u80fd\u51cf\u5c11\u4f7f\u7528\u7684GPU\u6570\u91cf\uff0c\u5e76\u8fdb\u4e00\u6b65\u964d\u4f4e\u88ab\u5229\u7528GPU\u4e2d\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u6d6a\u8d39\u3002 \u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u4e00\u79cd\u662f\u4f18\u5316\u65b9\u6cd5\uff0c\u53e6\u4e00\u79cd\u662f\u542f\u53d1\u5f0f\u65b9\u6cd5\u3002\u6211\u4eec\u4f7f\u7528\u4e24\u79cd\u5de5\u4f5c\u8d1f\u8f7d\u8c03\u5ea6\u542f\u53d1\u5f0f\u7b97\u6cd5\u5bf9\u591a\u79cd\u7528\u4f8b\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0e\u57fa\u7ebf\u542f\u53d1\u5f0f\u76f8\u6bd4\u7684\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u80fd\u591f\u8282\u7701\u9ad8\u8fbe2.85\u500d\u7684GPU\u4f7f\u7528\u91cf\uff0c\u4ee5\u53ca\u9ad8\u8fbe70%\u7684GPU\u6d6a\u8d39\u3002 \u6211\u4eec\u8ba1\u5212\u8ba9SRE\uff08\u7cfb\u7edf\u53ef\u9760\u6027\u5de5\u7a0b\uff09\u793e\u533a\u80fd\u591f\u5728\u751f\u4ea7\u73af\u5883\u4e2d\u5229\u7528\u6211\u4eec\u7684\u63d0\u8bae\u65b9\u6cd5\u3002|\n", "2409.06635": "|**2024-09-10**|**MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders**|Wenyu Zhang 
et.al.|[2409.06635](http://arxiv.org/abs/2409.06635)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u663e\u8457\u63d0\u9ad8\u4e86\u81ea\u7136\u8bed\u8a00\u5904\u7406\u80fd\u529b\uff0c\u4fc3\u8fdb\u4e86\u97f3\u9891LLM\u7684\u53d1\u5c55\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u7406\u89e3\u8bed\u97f3\u548c\u97f3\u9891\u8f93\u5165\u3002\u73b0\u6709\u7684\u97f3\u9891LLM\u901a\u5e38\u7ed3\u5408\u9884\u8bad\u7ec3\u7684\u97f3\u9891\u7f16\u7801\u5668\u4e0e\u6587\u672c\u9884\u8bad\u7ec3\u7684LLM\uff0c\u5e76\u5728\u7279\u5b9a\u7684\u97f3\u9891\u4efb\u52a1\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684\u97f3\u9891\u7f16\u7801\u5668\u7684\u5bb9\u91cf\u6709\u9650\uff0c\u65e0\u6cd5\u6355\u83b7\u65b0\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e2d\u7684\u7279\u5f81\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u5c06\u201c\u5f31\u201d\u7f16\u7801\u5668\u6df7\u5408\uff08MoWE\uff09\u878d\u5165\u97f3\u9891LLM\u6846\u67b6\u3002MoWE\u901a\u8fc7\u5728\u57fa\u672c\u7f16\u7801\u5668\u57fa\u7840\u4e0a\u8865\u5145\u4e00\u7ec4\u76f8\u5bf9\u8f83\u8f7b\u91cf\u7ea7\u7684\u7f16\u7801\u5668\uff0c\u6839\u636e\u97f3\u9891\u8f93\u5165\u52a8\u6001\u6fc0\u6d3b\u4ee5\u589e\u5f3a\u7279\u5f81\u63d0\u53d6\uff0c\u540c\u65f6\u907f\u514d\u663e\u8457\u589e\u52a0\u6a21\u578b\u5927\u5c0f\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMoWE\u6709\u6548\u63d0\u9ad8\u4e86\u591a\u4efb\u52a1\u6027\u80fd\uff0c\u4f7f\u97f3\u9891LLM\u80fd\u591f\u5e94\u7528\u4e8e\u66f4\u591a\u6837\u5316\u7684\u97f3\u9891\u4efb\u52a1\u3002|\n", "2409.06624": "|**2024-09-10**|**A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio**|Ningyuan Xi 
et.al.|[2409.06624](http://arxiv.org/abs/2409.06624)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u9884\u8bad\u7ec3\uff08CPT\uff09\u8fc7\u7a0b\u4e2d\uff0c\u5982\u4f55\u901a\u8fc7\u989d\u5916\u8bed\u8a00\u6df7\u5408\u6bd4\uff08ALMR\uff09\u548c\u5b66\u4e60\u7387\uff08LR\uff09\u4e4b\u95f4\u7684\u6700\u4f18\u76f8\u5173\u6027\uff0c\u63d0\u5347\u6a21\u578b\u5728\u4e2d\u6587\u53ca\u5176\u4ed6\u7279\u5b9a\u9886\u57df\u7684\u6027\u80fd\u3002\u9488\u5bf98B\u5927\u5c0f\u7684Llama-3\u6a21\u578b\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u6df1\u5165\u7814\u7a76\uff0c\u786e\u5b9a\u4e86\u5b9e\u9a8c\u8bbe\u7f6e\u4e2d\u7684\u5173\u952e\u8d85\u53c2\u6570\uff0c\u5e76\u901a\u8fc7\u7cbe\u7ec6\u8c03\u6574\uff0c\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5728\u4e2d\u6587\u76f8\u5173\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ee5\u53ca\u6570\u5b66\u3001\u7f16\u7a0b\u548c\u60c5\u7eea\u667a\u80fd\u7b49\u7279\u5b9a\u9886\u57df\u7684\u80fd\u529b\u3002\u6700\u7ec8\uff0c\u6211\u4eec\u5c0670B\u5927\u5c0f\u7684LLM\u90e8\u7f72\u5230\u5b9e\u9645\u804a\u5929\u7cfb\u7edf\u4e2d\uff0c\u5e76\u53d6\u5f97\u4e86\u4ee4\u4eba\u6ee1\u610f\u7684\u6548\u679c\u3002|\n", "2409.06601": "|**2024-09-10**|**Alleviating Hallucinations in Large Language Models with Scepticism Modeling**|Yetao Wu 
et.al.|[2409.06601](http://arxiv.org/abs/2409.06601)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7684\u4e3b\u8981\u6311\u6218\u662f\u5e7b\u89c9\u73b0\u8c61\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5728\u591a\u4e2a\u9886\u57df\u7684\u5e94\u7528\u3002\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u53ef\u4ee5\u88ab\u7528\u4e8e\u7f13\u89e3\u5e7b\u89c9\u5e26\u6765\u7684\u635f\u5bb3\u3002\u4eba\u7c7b\u7684\u6000\u7591\u60c5\u7eea\u88ab\u8ba4\u4e3a\u80fd\u589e\u5f3a\u81ea\u6211\u8bc4\u4f30\u7684\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c2\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8d28\u7591\u5efa\u6a21\u201d\uff08SM\uff09\u7684\u65b0\u65b9\u6cd5\u3002\u8fd9\u4e00\u65b9\u6cd5\u901a\u8fc7\u7ed3\u5408\u8bcd\u5143\u548clogits\u4fe1\u606f\u6765\u8fdb\u884c\u81ea\u6211\u8bc4\u4f30\u800c\u5f97\u5230\u5f62\u5f0f\u5316\u3002\u6211\u4eec\u6784\u5efa\u4e86\u5305\u542b\u6000\u7591\u60c5\u7eea\u610f\u8bc6\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u8fde\u7eed\u9884\u8bad\u7ec3\uff0c\u7136\u540e\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u63d0\u5347\u5b83\u4eec\u81ea\u6211\u8bc4\u4f30\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u6709\u6548\u589e\u5f3a\u4e86\u6a21\u578b\u4f30\u7b97\u4e0d\u786e\u5b9a\u6027\u7684\u80fd\u529b\uff0c\u5e76\u901a\u8fc7\u8de8\u9886\u57df\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u5176\u5728\u5176\u4ed6\u4efb\u52a1\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2409.06595": "|**2024-09-10**|**GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering**|Sacha Muller 
et.al.|[2409.06595](http://arxiv.org/abs/2409.06595)|**[link](https://github.com/illuin-tech/grouse)**|\u672c\u6587\u63a2\u8ba8\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u79c1\u6709\u4e14\u66f4\u65b0\u81f3\u6700\u65b0\u7684\u77e5\u8bc6\u5e93\u76f8\u7ed3\u5408\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u8303\u5f0f\u65f6\u9762\u4e34\u7684\u6311\u6218\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8\u8bc4\u4f30\u7531RAG\u7cfb\u7edf\u751f\u6210\u7684\u57fa\u4e8e\u73b0\u5b9e\u7684\u7b54\u6848\u65f6\uff0c\u4f5c\u4e3a\u88c1\u5224\u7684LLM\u6240\u9047\u5230\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u8bc4\u4f30\u88c1\u5224\u6a21\u578b\u7684\u6821\u51c6\u548c\u533a\u5206\u80fd\u529b\uff0c\u6211\u4eec\u8bc6\u522b\u4e867\u79cd\u751f\u6210\u5668\u5931\u8d25\u6a21\u5f0f\uff0c\u5e76\u5f15\u5165\u4e86GroUSE\uff08\u57fa\u4e8e\u95ee\u9898\u89e3\u7b54\u7684\u5143\u8bc4\u4f30\u57fa\u51c6\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b144\u4e2a\u5355\u5143\u6d4b\u8bd5\u7684\u5143\u8bc4\u4f30\u57fa\u51c6\u3002\u8fd9\u4e2a\u57fa\u51c6\u63ed\u793a\u4e86\u73b0\u6709\u7684\u81ea\u52a8\u5316RAG\u8bc4\u4f30\u6846\u67b6\u5f80\u5f80\u5ffd\u89c6\u4e86\u91cd\u8981\u5931\u8d25\u6a21\u5f0f\uff0c\u5373\u4f7f\u5728\u4f7f\u7528GPT-4\u4f5c\u4e3a\u88c1\u5224\u7684\u60c5\u51b5\u4e0b\u4e5f\u662f\u5982\u6b64\u3002 
\u4e3a\u4e86\u6539\u8fdb\u5f53\u524d\u81ea\u52a8\u5316RAG\u8bc4\u4f30\u6846\u67b6\u7684\u8bbe\u8ba1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u7ba1\u9053\uff0c\u5e76\u53d1\u73b0\u5c01\u95ed\u6a21\u578b\u5728GroUSE\u4e0a\u8868\u73b0\u826f\u597d\uff0c\u800c\u6700\u5148\u8fdb\u7684\u5f00\u6e90\u88c1\u5224\u6a21\u578b\u5728\u6211\u4eec\u7684\u63d0\u8bae\u6807\u51c6\u4e0b\u5e76\u672a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5c3d\u7ba1\u5b83\u4eec\u4e0eGPT-4\u7684\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u7684\u76f8\u5173\u6027\u662f\u4e00\u4e2a\u4e0d\u5b8c\u6574\u7684\u4ee3\u7406\u6307\u6807\uff0c\u7528\u4e8e\u8861\u91cf\u88c1\u5224\u6a21\u578b\u7684\u5b9e\u9645\u6027\u80fd\uff0c\u5e76\u5e94\u8be5\u901a\u8fc7\u5bf9\u53c2\u8003\u60c5\u51b5\u7684\u7cbe\u786e\u5931\u8d25\u6a21\u5f0f\u68c0\u6d4b\u8fdb\u884c\u8865\u5145\u8bc4\u4f30\u3002 \u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u901a\u8fc7\u5728GPT-4\u7684\u63a8\u7406\u75d5\u8ff9\u4e0a\u5bf9Llama-3\u8fdb\u884c\u5fae\u8c03\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5176\u8bc4\u4f30\u80fd\u529b\uff0c\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u4e0eGPT-4\u8bc4\u4ef7\u7684\u76f8\u5173\u6027\u548c\u53c2\u8003\u60c5\u51b5\u7684\u6821\u51c6\u5ea6\u3002|\n", "2409.06558": "|**2024-09-10**|**MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science**|Mahdieh Aliazam 
et.al.|[2409.06558](http://arxiv.org/abs/2409.06558)|null|\u968f\u7740\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u65e5\u76ca\u666e\u53ca\uff0c\u5bf9\u9ad8\u5ea6\u7cbe\u786e\u548c\u9ad8\u6548\u7684\u7cfb\u7edf\u7684\u9700\u6c42\u4e5f\u5728\u4e0d\u65ad\u589e\u957f\uff0c\u4ee5\u63d0\u5347\u5b89\u5168\u6027\u80fd\u3001\u64cd\u4f5c\u6548\u7387\u548c\u80fd\u6e90\u6d88\u8017\u3002\u5728\u7ba1\u7406\u80fd\u6e90\u4e0e\u53ef\u9760\u6027\u4e4b\u95f4\u7684\u6743\u8861\u65f6\uff0c\u9884\u6d4b\u8f66\u8f86\u8fd0\u884c\u671f\u95f4\u7684\u5404\u79cd\u6761\u4ef6\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6539\u8fdb\u4ee5\u53ca\u77e5\u540d\u6a21\u578b\u5982ChatGPT\u7684\u51fa\u73b0\uff0c\u4e3a\u81ea\u52a8\u9a7e\u9a76\u76f8\u5173\u9884\u6d4b\u63d0\u4f9b\u4e86\u72ec\u7279\u7684\u673a\u4f1a\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMAPS\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLMs\u4f5c\u4e3a\u5730\u56fe\u9605\u8bfb\u8f85\u52a9\u9a7e\u9a76\u5458\uff0c\u9884\u6d4b\u5728\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u8bbe\u7f6e\u7684\u5173\u952e\u53c2\u6570\uff0c\u4ee5\u5e73\u8861\u80fd\u6e90\u4e0e\u53ef\u9760\u6027\u4e4b\u95f4\u7684\u6743\u8861\u3002MAPS\u65b9\u6cd5\u5728\u5bfc\u822a\u7cbe\u5ea6\u65b9\u9762\u76f8\u8f83\u4e8e\u6700\u4f73\u57fa\u7ebf\u65b9\u6cd5\u63d0\u9ad8\u4e8620%\u3002\u6b64\u5916\uff0cMAPS\u8fd8\u663e\u793a\u4e86\u5728\u8ba1\u7b97\u5355\u5143\u4e0a\u8282\u7701\u4e8611%\u7684\u80fd\u6e90\uff0c\u5e76\u5728\u673a\u68b0\u548c\u8ba1\u7b97\u5355\u5143\u4e0a\u6700\u9ad8\u8282\u7701\u4e8654%\u3002|\n", "2409.06518": "|**2024-09-10**|**Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games**|Juhwan Choi 
et.al.|[2409.06518](http://arxiv.org/abs/2409.06518)|**[link](https://github.com/c-juhwan/olympics_analysis)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5df2\u7ecf\u6210\u4e3a\u4e3b\u5bfc\u6027\u65b9\u6cd5\uff0c\u7136\u800c\u5b83\u4eec\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u4ecd\u7136\u672a\u88ab\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u901a\u8fc7\u5206\u6790\u5965\u6797\u5339\u514b\u8fd0\u52a8\u4f1a\u7684\u5386\u53f2\u5956\u724c\u7edf\u8ba1\u60c5\u51b5\uff0c\u7814\u7a76\u4e86LLM\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u3002\u6211\u4eec\u8981\u6c42\u6a21\u578b\u63d0\u4f9b\u5404\u961f\u7684\u5956\u724c\u6570\u91cf\uff0c\u5e76\u786e\u5b9a\u54ea\u4e9b\u961f\u4f0d\u83b7\u5f97\u4e86\u7279\u5b9a\u6392\u540d\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1\u6700\u5148\u8fdb\u7684LLM\u5728\u62a5\u544a\u5355\u4e2a\u961f\u4f0d\u7684\u5956\u724c\u6570\u91cf\u65b9\u9762\u8868\u73b0\u5f97\u975e\u5e38\u51fa\u8272\uff0c\u4f46\u5728\u56de\u7b54\u5173\u4e8e\u7279\u5b9a\u6392\u540d\u7684\u95ee\u9898\u65f6\u5374\u9047\u5230\u663e\u8457\u56f0\u96be\u3002\u8fd9\u6697\u793a\u4e86LLM\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u4e0e\u4eba\u7c7b\u7684\u6839\u672c\u4e0d\u540c\uff0c\u4eba\u7c7b\u80fd\u591f\u8f7b\u677e\u5730\u4ece\u5df2\u77e5\u7684\u5956\u724c\u6570\u91cf\u63a8\u65ad\u51fa\u6392\u540d\u3002\u4e3a\u4e86\u652f\u6301\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff0c\u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u6a21\u578b\u8f93\u51fa\u3002|\n", "2409.07453": "|**2024-09-11**|**\"My Grade is Wrong!\": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays**|Shengxin Hong 
et.al.|[2409.07453](http://arxiv.org/abs/2409.07453)|null|\u4ea4\u4e92\u5f0f\u53cd\u9988\u5728\u6559\u5e08\u4e0e\u5b66\u751f\u4e4b\u95f4\u53cc\u5411\u6d41\u52a8\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u5411\u53cd\u9988\u66f4\u4e3a\u6709\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u53cd\u9988\u65b9\u5f0f\u5f80\u5f80\u8017\u65f6\u8fc7\u591a\uff0c\u96be\u4ee5\u5728\u6559\u80b2\u5b9e\u8df5\u4e2d\u5e7f\u6cdb\u5e94\u7528\u3002\u867d\u7136\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5177\u6709\u81ea\u52a8\u5316\u53cd\u9988\u7684\u6f5c\u529b\uff0c\u4f46\u5b83\u4eec\u5728\u4e92\u52a8\u60c5\u5883\u4e0b\u7684\u63a8\u7406\u548c\u4ea4\u4e92\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCAELF\uff08Contestable AI Empowered LLM\u6846\u67b6\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u591a\u4ee3\u7406\u7cfb\u7edf\u4e0e\u8ba1\u7b97\u8bba\u8fa9\u6765\u81ea\u52a8\u5316\u4ea4\u4e92\u5f0f\u53cd\u9988\u3002\u9996\u5148\uff0c\u5b66\u751f\u7684\u4f5c\u6587\u7531\u591a\u4e2a\u6559\u5b66\u52a9\u7406\u4ee3\u7406\uff08TA\u4ee3\u7406\uff09\u8fdb\u884c\u8bc4\u4f30\uff0c\u968f\u540e\uff0c\u6559\u5e08\u4ee3\u7406\u901a\u8fc7\u5f62\u5f0f\u5316\u63a8\u7406\u6574\u5408\u8fd9\u4e9b\u8bc4\u4ef7\uff0c\u751f\u6210\u53cd\u9988\u548c\u8bc4\u5206\u3002\u5b66\u751f\u53ef\u4ee5\u8fdb\u4e00\u6b65\u4e0e\u53cd\u9988\u4e92\u52a8\uff0c\u4ee5\u6df1\u5316\u7406\u89e3\u3002\u901a\u8fc7\u5bf9500\u7bc7\u6279\u5224\u6027\u601d\u7ef4\u4f5c\u6587\u7684\u6848\u4f8b\u7814\u7a76\uff0c\u5e76\u7ed3\u5408\u7528\u6237\u7814\u7a76\uff0c\u7ed3\u679c\u8868\u660e\uff0cCAELF\u663e\u8457\u63d0\u9ad8\u4e86\u4ea4\u4e92\u5f0f\u53cd\u9988\u7684\u8d28\u91cf\uff0c\u589e\u5f3a\u4e86LLM\u7684\u63a8\u7406\u548c\u4e92\u52a8\u80fd\u529b\u3002\u8fd9\u4e00\u65b9\u6cd5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u514b\u670d\u5f71\u54cd\u6559\u80b2\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u4ea4\u4e92\u5f0f\u53cd\u9988\u7684\u65f6\u95f4\u548c\u8d44\u6e90\u969c\u788d\u7684\u6709\u524d\u666f\u89e3\u51b3\u6
5b9\u6848\u3002|\n", "2409.07440": "|**2024-09-11**|**SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories**|Ben Bogin et.al.|[2409.07440](http://arxiv.org/abs/2409.07440)|**[link](https://github.com/allenai/super-benchmark)**|**\u7ed9\u5b9a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7f16\u5199\u4ee3\u7801\u65b9\u9762\u53d6\u5f97\u7684\u91cd\u5927\u8fdb\u5c55\uff0c\u5b83\u4eec\u73b0\u5728\u662f\u5426\u80fd\u591f\u81ea\u4e3b\u91cd\u73b0\u7814\u7a76\u4ed3\u5e93\u4e2d\u7684\u7ed3\u679c\uff1f\u8fd9\u6837\u7684\u80fd\u529b\u5c06\u5bf9\u7814\u7a76\u793e\u533a\u4ea7\u751f\u5de8\u5927\u76ca\u5904\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u9a8c\u8bc1\u3001\u7406\u89e3\u5e76\u6269\u5c55\u5148\u524d\u7684\u5de5\u4f5c\u3002\u4e3a\u4e86\u5411\u8fd9\u4e00\u76ee\u6807\u8fc8\u8fdb\uff0c\u6211\u4eec\u5f15\u5165\u4e86SUPER\uff0c\u8fd9\u662f\u9996\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u5728\u4ece\u7814\u7a76\u4ed3\u5e93\u8bbe\u7f6e\u548c\u6267\u884c\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u7684\u57fa\u51c6\u3002SUPER\u65e8\u5728\u6355\u6349\u7814\u7a76\u4eba\u5458\u5728\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u548c\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u7814\u7a76\u4ed3\u5e93\u5de5\u4f5c\u65f6\u6240\u9762\u4e34\u7684\u771f\u5b9e\u6311\u6218\u3002\u6211\u4eec\u7684\u57fa\u51c6\u7531\u4e09\u4e2a\u4e0d\u540c\u7684\u95ee\u9898\u96c6\u7ec4\u6210\uff1a45\u4e2a\u7aef\u5230\u7aef\u95ee\u9898\uff0c\u9644\u6709\u4e13\u5bb6\u89e3\u51b3\u65b9\u6848\u7684\u6ce8\u91ca\uff0c152\u4e2a\u4e13\u6ce8\u4e8e\u7279\u5b9a\u6311\u6218\uff08\u4f8b\u5982\u914d\u7f6e\u8bad\u7ec3\u5668\uff09\u7684\u5b50\u95ee\u9898\uff0c\u4ee5\u53ca602\u4e2a\u7528\u4e8e\u66f4\u5927\u89c4\u6a21\u5f00\u53d1\u7684\u81ea\u52a8\u751f\u6210\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u5404\u79cd\u8bc4\u4f30\u6307\u6807\u6765\u8bc4\u4f30\u4efb\u52a1\u6210\u529f\u548c\u8fdb\u5ea6\uff0c\u5f53\u6709\u9ec4\u91d1\u89e3\u51b3\u65b9\u6848\u53ef\u7528\u65f6\u4f7f\u7528\u9ec4\u91d1\u89e
3\u51b3\u65b9\u6848\uff0c\u5426\u5219\u4f7f\u7528\u8fd1\u4f3c\u503c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u65f6\u9047\u5230\u4e86\u56f0\u96be\uff0c\u6700\u597d\u7684\u6a21\u578b\uff08GPT-4o\uff09\u4ec5\u89e3\u51b3\u4e8616.3%\u7684\u7aef\u5230\u7aef\u96c6\u548c46.1%\u7684\u573a\u666f\u3002\u8fd9\u8868\u660e\u4e86\u8fd9\u9879\u4efb\u52a1\u7684\u6311\u6218\u6027\uff0c\u5e76\u8868\u660eSUPER\u53ef\u4ee5\u4f5c\u4e3a\u793e\u533a\u8861\u91cf\u548c\u63a8\u52a8\u8fdb\u6b65\u7684\u5b9d\u8d35\u8d44\u6e90\u3002**|\n", "2409.07407": "|**2024-09-11**|**CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification**|Zeqing Qin et.al.|[2409.07407](http://arxiv.org/abs/2409.07407)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6f0f\u6d1e\u8bc6\u522b\u9886\u57df\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7531\u4e8eC/C++\u5728\u8fc7\u53bb\u5341\u5e74\u4e2d\u5360\u636e\u4e86\u5f00\u6e90\u8f6f\u4ef6\uff08OSS\uff09\u6f0f\u6d1e\u7684\u4e00\u534a\uff0c\u5e76\u4e14\u4e3b\u8981\u901a\u8fc7\u63d0\u4ea4\u8fdb\u884c\u66f4\u65b0\uff0c\u56e0\u6b64\u589e\u5f3aLLM\u5728\u8bc6\u522bC/C++\u6f0f\u6d1e\u8d21\u732e\u63d0\u4ea4\uff08VCC\uff09\u65b9\u9762\u7684\u80fd\u529b\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5bf9\u5927\u89c4\u6a21\u4ee3\u7801\u96c6\u8fdb\u4e00\u6b65\u9884\u8bad\u7ec3LLM\u4e0a\uff0c\u8fd9\u65e2\u8017\u8d39\u8d44\u6e90\u53c8\u5b58\u5728\u6548\u7387\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u8f7b\u91cf\u7ea7\u65b9\u6cd5\u6765\u63d0\u5347\u57fa\u4e8eBERT\u7684LLM\u8bc6\u522bC/C++ 
VCC\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86CodeLinguaNexus\uff08CLNX\uff09\uff0c\u4f5c\u4e3a\u8fde\u63a5C/C++\u7a0b\u5e8f\u4e0eLLM\u7684\u6865\u6881\u3002CLNX\u901a\u8fc7\u5728\u4fdd\u7559\u5173\u952e\u7ec6\u8282\u7684\u540c\u65f6\uff0c\u4ee5\u66f4\u81ea\u7136\u7684\u65b9\u5f0f\u9ad8\u6548\u5730\u5c06\u6e90\u4ee3\u7801\u8f6c\u6362\u4e3a\u66f4\u9002\u5408LLM\u5904\u7406\u7684\u8868\u793a\u3002\u5177\u4f53\u6765\u8bf4\uff0cCLNX\u9996\u5148\u5e94\u7528\u7ed3\u6784\u7ea7\u81ea\u7136\u5316\u6765\u5206\u89e3\u590d\u6742\u7684\u7a0b\u5e8f\uff0c\u7136\u540e\u5e94\u7528\u7b26\u53f7\u7ea7\u81ea\u7136\u5316\u6765\u89e3\u91ca\u590d\u6742\u7684\u7b26\u53f7\u3002\u6211\u4eec\u5728\u5305\u542b25,872\u4e2aC/C++\u51fd\u6570\u53ca\u5176\u63d0\u4ea4\u7684\u516c\u5f00\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86CLNX\u3002\u7ed3\u679c\u8868\u660e\uff0cCLNX\u663e\u8457\u63d0\u5347\u4e86LLM\u8bc6\u522bC/C++ VCC\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u914d\u5907CLNX\u7684CodeBERT\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u4f18\u6027\u80fd\uff0c\u5e76\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u8bc6\u522b\u4e8638\u4e2aOSS\u6f0f\u6d1e\u3002|\n", "2409.07394": "|**2024-09-11**|**AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge**|Han Wang 
et.al.|[2409.07394](http://arxiv.org/abs/2409.07394)|**[link](https://github.com/hannight/adacad)**|**\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e0a\u4e0b\u6587\u4e0e\u6a21\u578b\u53c2\u6570\u5b58\u50a8\u7684\u77e5\u8bc6\u4e4b\u95f4\u5b58\u5728\u77e5\u8bc6\u51b2\u7a81\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u4f7f\u7528\u6807\u51c6\u89e3\u7801\u6280\u672f\u65f6\u6027\u80fd\u53d7\u635f\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u6280\u672f\u5f80\u5f80\u5ffd\u89c6\u4e86\u4e0a\u4e0b\u6587\u3002\u73b0\u6709\u7684\u6d4b\u8bd5\u65f6\u95f4\u5bf9\u6bd4\u65b9\u6cd5\u8bd5\u56fe\u901a\u8fc7\u6bd4\u8f83\u5e26\u6709\u548c\u4e0d\u5e26\u6709\u4e0a\u4e0b\u6587\u7684LLM\u8f93\u51fa\u5206\u5e03\u4e4b\u95f4\u7684\u5bf9\u6bd4\uff0c\u5e76\u6839\u636e\u5b83\u4eec\u4e4b\u95f4\u7684\u5bf9\u6bd4\u8c03\u6574\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u65b9\u6cd5\u7ecf\u5e38\u9519\u8bef\u5730\u5224\u65ad\u51b2\u7a81\u7684\u7a0b\u5ea6\uff0c\u5e76\u4e14\u96be\u4ee5\u5904\u7406\u4e0d\u540c\u51b2\u7a81\u7a0b\u5ea6\u7684\u5b9e\u4f8b\uff0c\u9759\u6001\u65b9\u6cd5\u5728\u51b2\u7a81\u4e0d\u5b58\u5728\u65f6\u8fc7\u5ea6\u8c03\u6574\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684\u7cbe\u7ec6\u7c92\u5ea6\u65b9\u6cd5AdaCAD\uff0c\u5b83\u52a8\u6001\u5730\u6839\u636eJensen-Shannon\u6563\u5ea6\u6d4b\u91cf\u7684\u4e0a\u4e0b\u6587\u548c\u53c2\u6570\u77e5\u8bc6\u5206\u5e03\u4e4b\u95f4\u7684\u51b2\u7a81\u7a0b\u5ea6\u6765\u63a8\u65ad\u8c03\u6574\u6743\u91cd\u3002\u6211\u4eec\u5728\u56db\u4e2a\u6a21\u578b\u4e0a\u5bf9\u516d\u4e2a\u591a\u6837\u5316\u7684\u95ee\u7b54\uff08QA\uff09\u6570\u636e\u96c6\u548c\u4e09\u4e2a\u6458\u8981\u4efb\u52a1\u8fdb\u884c\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u81ea\u9002\u5e94\u65b9\u6cd5\u59cb\u7ec8\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u5176\u4ed6\u89e3\u7801\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e
8614.21%\uff08\u7edd\u5bf9\u503c\uff09\uff0c\u5e76\u4e14\u63d0\u9ad8\u4e86\u6458\u8981\u7684\u771f\u5b9e\u6027\uff0cAlignScore\u63d0\u9ad8\u4e865.59\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u4e0e\u51b2\u7a81\u7684\u5bf9\u6bd4\u57fa\u7ebf\u76f8\u6bd4\uff0c\u5f53\u51b2\u7a81\u4e0d\u5b58\u5728\u65f6\uff0c\u89e3\u7801\u4f1a\u635f\u5bb3\u6027\u80fd\uff0c\u800cAdaCAD\u80fd\u591f\u7f13\u89e3\u8fd9\u4e9b\u635f\u5931\uff0c\u4f7f\u5176\u66f4\u9002\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\uff0c\u5728\u8fd9\u4e9b\u6570\u636e\u96c6\u4e2d\uff0c\u6709\u4e9b\u793a\u4f8b\u5b58\u5728\u51b2\u7a81\uff0c\u800c\u5176\u4ed6\u793a\u4f8b\u5219\u4e0d\u5b58\u5728\u51b2\u7a81\u3002**|\n", "2409.07368": "|**2024-09-11**|**Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code**|Khiem Ton et.al.|[2409.07368](http://arxiv.org/abs/2409.07368)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSGCode\u7684\u7075\u6d3b\u63d0\u793a\u4f18\u5316\u7cfb\u7edf\uff0c\u7528\u4e8e\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u5b89\u5168\u4ee3\u7801\u3002SGCode\u5c06\u6700\u8fd1\u7684\u63d0\u793a\u4f18\u5316\u65b9\u6cd5\u4e0eLLM\u7ed3\u5408\u5728\u4e00\u4e2a\u7edf\u4e00\u7684\u7cfb\u7edf\u4e2d\uff0c\u901a\u8fc7\u524d\u7aef\u548c\u540e\u7aefAPI\u63d0\u4f9b\u670d\u52a1\uff0c\u4f7f\u7528\u6237\u80fd\u591f\uff1a1\uff09\u751f\u6210\u65e0\u6f0f\u6d1e\u7684\u5b89\u5168\u4ee3\u7801\uff1b2\uff09\u67e5\u770b\u548c\u5171\u4eab\u5b89\u5168\u6027\u5206\u6790\uff1b\u4ee5\u53ca3\uff09\u8f7b\u677e\u5728\u4e0d\u540c\u7684\u63d0\u793a\u4f18\u5316\u65b9\u6cd5\u4e4b\u95f4\u5207\u6362\uff0c\u5e76\u63d0\u4f9b\u6709\u5173\u6a21\u578b\u548c\u7cfb\u7edf\u6027\u80fd\u7684\u89c1\u89e3\u3002\u6211\u4eec\u4f7f\u7528AWS\u670d\u52a1\u5668\u4e0a\u7684PromSec\u586b\u5145SGCode\uff0c\u8fd9\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06LLM\u3001\u5b89\u5168\u5de5\u5177\u4e0e\u8f7b\u91cf\u7ea7\u751f\u6210\u5bf9\u6297\u56
fe\u795e\u7ecf\u7f51\u7edc\u76f8\u7ed3\u5408\uff0c\u6765\u68c0\u6d4b\u5e76\u4fee\u590d\u751f\u6210\u4ee3\u7801\u4e2d\u7684\u5b89\u5168\u6f0f\u6d1e\uff0c\u4ece\u800c\u4f18\u5316\u63d0\u793a\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSGCode\u4f5c\u4e3a\u516c\u5171\u5de5\u5177\uff0c\u80fd\u591f\u63ed\u793a\u6a21\u578b\u5b9e\u7528\u6027\u3001\u5b89\u5168\u4ee3\u7801\u751f\u6210\u548c\u7cfb\u7edf\u6210\u672c\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u5177\u6709\u76f8\u5bf9\u8f83\u4f4e\u7684\u6210\u672c\u3002SGCode\u5df2\u4e0a\u7ebf\u4e8e\uff1a\u3002|\n", "2409.07355": "|**2024-09-11**|**Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation**|SeongYeub Chu et.al.|[2409.07355](http://arxiv.org/abs/2409.07355)|**[link](https://github.com/BBeeChu/InteractEval)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u201cInteractEval\u201d\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u91c7\u7528\u201cThink-Aloud\u201d\u65b9\u6cd5\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u4eba\u7c7b\u4e13\u5bb6\u610f\u89c1\uff0c\u4ee5\u751f\u6210\u57fa\u4e8e\u68c0\u67e5\u6e05\u5355\u7684\u6587\u672c\u8bc4\u4f30\u7684\u5c5e\u6027\u3002\u901a\u8fc7\u878d\u5408\u4eba\u7c7b\u7684\u7075\u6d3b\u6027\u548c\u63a8\u7406\u80fd\u529b\u4ee5\u53caLLM\u7684\u4e00\u81f4\u6027\uff0cInteractEval\u5728\u4e00\u81f4\u6027\u3001\u6d41\u7545\u6027\u3001\u76f8\u5173\u6027\u548c\u8fde\u8d2f\u6027\u56db\u4e2a\u7ef4\u5ea6\u4e0a\u5747\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684\u975eLLM\u57fa\u7ebf\u548cLLM\u57fa\u7ebf\u6a21\u578b\u3002\u5b9e\u9a8c\u8fd8\u63a2\u8ba8\u4e86\u201cThink-Aloud\u201d\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u8868\u660e\u5b83\u80fd\u4fc3\u8fdb\u4eba\u7c7b\u548cLLM\u7684\u53d1\u6563\u601d\u7ef4\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u5e7f\u6cdb\u7684\u76f8\u5173\u5c5e\u6027\uff0c\u5e76\u63d0\u9ad8\u6587\u672c\u8bc4\u4f30\u6027\u80fd\u3002\u6bd4\u8f83\u5206\u6790\u663e\u793a\uff0c\u4eba\u7c7b\u5728\u8bc6\u522b\u4e0e\u
5185\u90e8\u8d28\u91cf\u76f8\u5173\u7684\u5c5e\u6027\uff08\u5982\u8fde\u8d2f\u6027\u548c\u6d41\u7545\u6027\uff09\u65b9\u9762\u8868\u73b0\u4f18\u5f02\uff0c\u800cLLM\u5728\u4e0e\u5916\u90e8\u5bf9\u9f50\u76f8\u5173\u7684\u5c5e\u6027\uff08\u5982\u4e00\u81f4\u6027\u548c\u76f8\u5173\u6027\uff09\u4e0a\u8868\u73b0\u66f4\u597d\u3002\u56e0\u6b64\uff0c\u7ed3\u5408\u4eba\u7c7b\u548cLLM\u5171\u540c\u4ea7\u751f\u7684\u8bc4\u4f30\u7ed3\u679c\u6700\u4f73\u3002\u6362\u53e5\u8bdd\u8bf4\uff0c\u672c\u6587\u5f3a\u8c03\u4e86\u5728\u81ea\u52a8\u5316\u57fa\u4e8e\u68c0\u67e5\u6e05\u5355\u7684\u6587\u672c\u8bc4\u4f30\u6846\u67b6\u4e2d\u6709\u6548\u6574\u5408\u4eba\u7c7b\u548cLLM\u7684\u5fc5\u8981\u6027\u3002\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8e https://github.com/BBeeChu/InteractEval.git \u3002**|\n", "2409.07331": "|**2024-09-11**|**Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering**|Weixi Weng et.al.|[2409.07331](http://arxiv.org/abs/2409.07331)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u51fa\u8272\u7684\u96f6\u6837\u672c\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u77e5\u8bc6\u57fa\u89c6\u89c9\u95ee\u7b54\uff08KB-VQA\uff09\u4efb\u52a1\u4e2d\uff0cMLLMs\u53ef\u80fd\u7f3a\u4e4f\u4eba\u7c7b\u5e38\u8bc6\u6216\u7279\u5b9a\u9886\u57df\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u4ece\u800c\u9700\u8981\u4ece\u5916\u90e8\u77e5\u8bc6\u6e90\u83b7\u53d6\u6240\u9700\u4fe1\u606f\u4ee5\u56de\u7b54\u6b64\u7c7b\u95ee\u9898\u3002\u5148\u524d\u7684\u5de5\u4f5c\uff0c\u5982\u68c0\u7d22\u589e\u5f3a\u7684VQA-v2\uff08RAVQA-v2\uff09\uff0c\u4fa7\u91cd\u4e8e\u5145\u5206\u5229\u7528\u8f93\u5165\u4fe1\u606f\uff0c\u4f8b\u5982\u56fe\u50cf\u6587\u672c\u63cf\u8ff0\u548c\u68c0\u7d22\u7684\u77e5\u8bc6\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u90fd\u5ffd\u89c6\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u968f\u7740\u8f93\u5165\u4ee4\u724c\u6570\u91cf\u7684\u5
89e\u52a0\uff0c\u63a8\u7406\u6548\u7387\u663e\u8457\u964d\u4f4e\uff0c\u8fd9\u4e0e\u5b9e\u9645\u5e94\u7528\u7684\u9700\u6c42\u76f8\u77db\u76fe\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68c0\u7d22\u589e\u5f3a\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08RACC\uff09\u3002RACC\u5b66\u4e60\u538b\u7f29\u5e76\u805a\u5408\u68c0\u7d22\u4e0a\u4e0b\u6587\uff0c\u5e76\u751f\u6210\u7d27\u51d1\u7684\u952e\u503c\uff08KV\uff09\u7f13\u5b58\u5f62\u5f0f\u7684\u8c03\u8282\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u8c03\u8282\u6765\u9002\u5e94\u4e0b\u6e38\u51bb\u7ed3\u7684MLLM\uff0c\u4ece\u800c\u5b9e\u73b0\u6709\u6548\u4e14\u9ad8\u6548\u7684\u63a8\u7406\u3002RACC\u5728OK-VQA\u4e0a\u5b9e\u73b0\u4e86\u5f53\u524d\u6700\u4f73\u768462.9%\u6027\u80fd\u3002\u6b64\u5916\uff0c\u5b83\u5c06RAVQA-v2\u7684\u63a8\u7406\u5ef6\u8fdf\u663e\u8457\u964d\u4f4e\u4e8622.0%-59.7%\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\u4e86RACC\u7684\u5e7f\u6cdb\u9002\u7528\u6027\u3002\u5b83\u4e0e\u5404\u79cd\u73b0\u6210\u7684MLLM\u517c\u5bb9\uff0c\u5e76\u53ef\u4ee5\u5904\u7406\u5305\u62ec\u6587\u672c\u548c\u591a\u6a21\u6001\u6587\u6863\u5728\u5185\u7684\u4e0d\u540c\u77e5\u8bc6\u6e90\u3002|\n", "2409.07314": "|**2024-09-11**|**MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications**|Praveen K Kanithi 
et.al.|[2409.07314](http://arxiv.org/abs/2409.07314)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u533b\u7597\u5065\u5eb7\u9886\u57df\u7684\u5feb\u901f\u5f00\u53d1\u5f15\u53d1\u4e86\u5bf9\u8d85\u8d8a\u5982USMLE\u7b49\u5e38\u7528\u57fa\u51c6\u8bc4\u4f30\u7684\u5168\u9762\u8bc4\u4f30\u9700\u6c42\uff0c\u4ee5\u66f4\u597d\u5730\u53cd\u6620\u5b9e\u9645\u5e94\u7528\u8868\u73b0\u3002\u867d\u7136\u73b0\u5b9e\u4e16\u754c\u7684\u8bc4\u4f30\u662f\u5b9e\u7528\u6027\u7684\u91cd\u8981\u6307\u6807\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u843d\u540e\u4e8eLLM\u6f14\u8fdb\u7684\u901f\u5ea6\uff0c\u53ef\u80fd\u5bfc\u81f4\u7814\u7a76\u7ed3\u679c\u5728\u90e8\u7f72\u65f6\u53d8\u5f97\u8fc7\u65f6\u3002\u8fd9\u79cd\u65f6\u95f4\u4e0a\u7684\u8131\u8282\u9700\u8981\u4e00\u79cd\u5168\u9762\u7684\u524d\u671f\u8bc4\u4f30\u65b9\u6cd5\uff0c\u4ee5\u6307\u5bfc\u7279\u5b9a\u4e34\u5e8a\u5e94\u7528\u4e2d\u7684\u6a21\u578b\u9009\u62e9\u3002 \u6211\u4eec\u5f15\u5165\u4e86MEDIC\u6846\u67b6\uff0c\u5b83\u4ece\u4e94\u4e2a\u5173\u952e\u7684\u4e34\u5e8a\u80fd\u529b\u7ef4\u5ea6\u8bc4\u4f30LLM\uff1a\u533b\u5b66\u63a8\u7406\u3001\u4f26\u7406\u4e0e\u504f\u89c1\u3001\u6570\u636e\u548c\u8bed\u8a00\u7406\u89e3\u3001\u4e0a\u4e0b\u6587\u5b66\u4e60\u4ee5\u53ca\u4e34\u5e8a\u5b89\u5168\u6027\u3002MEDIC\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4ea4\u53c9\u5ba1\u67e5\u6846\u67b6\uff0c\u91cf\u5316\u4e86LLM\u5728\u8986\u76d6\u8303\u56f4\u548c\u5e7b\u89c9\u68c0\u6d4b\u7b49\u9886\u57df\u7684\u6027\u80fd\uff0c\u800c\u65e0\u9700\u53c2\u8003\u8f93\u51fa\u3002\u6211\u4eec\u4f7f\u7528MEDIC\u5bf9\u533b\u7597\u95ee\u7b54\u3001\u5b89\u5168\u3001\u603b\u7ed3\u3001\u7b14\u8bb0\u751f\u6210\u4ee5\u53ca\u5176\u4ed6\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002 
\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\u4e0d\u540c\u6a21\u578b\u5927\u5c0f\u4e4b\u95f4\u3001\u57fa\u7ebf\u6a21\u578b\u4e0e\u533b\u5b66\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u5f02\uff0c\u5e76\u5bf9\u9700\u8981\u7279\u5b9a\u6a21\u578b\u4f18\u52bf\u7684\u5e94\u7528\uff08\u5982\u4f4e\u5e7b\u89c9\u6216\u8f83\u4f4e\u63a8\u7406\u6210\u672c\uff09\u7684\u6a21\u578b\u9009\u62e9\u5177\u6709\u542f\u793a\u610f\u4e49\u3002MEDIC\u7684\u591a\u7ef4\u5ea6\u8bc4\u4f30\u63ed\u793a\u4e86\u7406\u8bba\u80fd\u529b\u548c\u5b9e\u9645\u5b9e\u65bd\u4e4b\u95f4\u7684\u6027\u80fd\u6743\u8861\uff0c\u5f25\u5408\u4e86\u5728\u533b\u7597\u4fdd\u5065\u73af\u5883\u4e2d\u8bc6\u522b\u548c\u9002\u5e94\u6700\u6709\u524d\u666f\u6a21\u578b\u7684\u5dee\u8ddd\uff0c\u786e\u4fdd\u4e86\u9002\u5408\u591a\u79cd\u533b\u7597\u4fdd\u5065\u5e94\u7528\u7684\u6a21\u578b\u5f97\u5230\u8bc6\u522b\u548c\u9002\u5e94\u3002|\n", "2409.07276": "|**2024-09-11**|**STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM**|Qijiong Liu et.al.|[2409.07276](http://arxiv.org/abs/2409.07276)|null|\u4f20\u7edf\u63a8\u8350\u6a21\u578b\u901a\u5e38\u4f9d\u8d56\u4e8e\u72ec\u7279\u7684\u9879\u76ee\u6807\u8bc6\u7b26\uff08ID\uff09\u6765\u533a\u5206\u9879\u76ee\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u5b83\u4eec\u5229\u7528\u9879\u76ee\u5185\u5bb9\u4fe1\u606f\u548c\u63a8\u5e7f\u957f\u5c3e\u6216\u51b7\u542f\u52a8\u9879\u76ee\u7684\u80fd \u529b\u3002\u8fd1\u671f\uff0c\u5df2\u63d0\u51fa\u8bed\u4e49\u5206\u8bcd\u4f5c\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u7684\u6709\u5e0c\u671b\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u6bcf\u4e2a\u9879\u76ee\u7684\u8bed\u4e49\u8868\u793a\u5206\u8bcd\u4e3a\u4e00\u7cfb\u5217\u79bb\u6563\u7684\u4ee4\u724c\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u5b83\u4fdd 
\u7559\u4e86\u9879\u76ee\u5728\u8fd9\u4e9b\u4ee4\u724c\u5185\u7684\u8bed\u4e49\uff0c\u5e76\u786e\u4fdd\u5177\u6709\u76f8\u4f3c\u8bed\u4e49\u7684\u9879\u76ee\u7531\u76f8\u4f3c\u7684\u4ee4\u724c\u8868\u793a\u3002\u8fd9\u4e9b\u8bed\u4e49\u4ee4\u724c\u6210\u4e3a\u8bad\u7ec3\u751f\u6210\u63a8\u8350\u6a21\u578b\u7684\u57fa\u7840\u3002\u7136\u800c\uff0c\u73b0\u6709 \u7684\u751f\u6210\u63a8\u8350\u65b9\u6cd5\u901a\u5e38\u6d89\u53ca\u591a\u4e2a\u5b50\u6a21\u578b\u8fdb\u884c\u5d4c\u5165\u3001\u91cf\u5316\u548c\u63a8\u8350\uff0c\u5bfc\u81f4\u7cfb\u7edf\u8fc7\u4e8e\u590d\u6742\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\uff0c\u79f0\u4e3aSTORE\uff0c \u5229\u7528\u5355\u4e00\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u540c\u65f6\u6267\u884c\u8fd9\u4e24\u9879\u4efb\u52a1\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u8bed\u4e49\u5206\u8bcd\u8868\u8ff0\u4e3a\u6587\u672c\u5230\u4ee4\u724c\u7684\u4efb\u52a1\uff0c\u800c\u751f\u6210\u63a8\u8350\u5219\u8868\u8ff0\u4e3a\u4ee4\u724c\u5230 \u4ee4\u724c\u7684\u4efb\u52a1\uff0c\u901a\u8fc7\u8865\u5145\u4ee4\u724c\u5230\u6587\u672c\u91cd\u6784\u4efb\u52a1\u548c\u6587\u672c\u5230\u4ee4\u724c\u8f85\u52a9\u4efb\u52a1\uff0c\u6240\u6709\u8fd9\u4e9b\u4efb\u52a1\u5747\u4ee5\u751f\u6210\u65b9\u5f0f\u8868\u8ff0\u5e76\u4f7f\u7528\u5355\u4e00LLM\u9aa8\u5e72\u8fdb\u884c\u8bad\u7ec3\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u4ee5\u9a8c\u8bc1\u6211\u4eec\u7684STORE\u6846\u67b6\u5728\u5404\u79cd\u63a8\u8350\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u5c06\u53d1\u5e03\u6e90\u4ee3\u7801\u548c\u914d\u7f6e\uff0c\u4ee5\u4fbf\u8fdb\u884c\u53ef\u590d\u73b0\u7684\u7814\u7a76\u3002|\n", "2409.07267": "|**2024-09-11**|**MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving**|Enming Zhang 
et.al.|[2409.07267](http://arxiv.org/abs/2409.07267)|**[link](https://github.com/emzucas/minidrive)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMiniDrive\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u89e3\u51b3\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u5728\u81ea\u52a8\u9a7e\u9a76\u573a\u666f\u4e2d\u7684\u5e94\u7528\u96be\u9898\u3002\u73b0\u6709\u7684VLM\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u8ba1\u7b97\u5bc6\u96c6\u578b\u7684\u89c6\u89c9\u7f16\u7801\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u96be\u4ee5\u5728\u5b9e\u9645\u4e16\u754c\u548c\u5b9e\u65f6\u5e94\u7528\u4e2d\u90e8\u7f72\u3002\u6b64\u5916\uff0c\u5927\u591a\u6570\u73b0\u6709VLM\u7f3a\u4e4f\u5904\u7406\u591a\u5f20\u56fe\u7247\u7684\u80fd\u529b\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u96be\u4ee5\u9002\u5e94\u81ea\u52a8\u9a7e\u9a76\u4e2d\u7684\u591a\u6444\u50cf\u5934\u611f\u77e5\u9700\u6c42\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u7279\u5f81\u5de5\u7a0b\u6df7\u5408\u4e13\u5bb6\uff08FE-MoE\uff09\u548c\u52a8\u6001\u6307\u4ee4\u9002\u914d\u5668\uff08DI-Adapter\uff09\u3002FE-MoE\u6709\u6548\u5730\u5c06\u4e8c\u7ef4\u7279\u5f81\u6620\u5c04\u5230\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\uff0c\u7136\u540e\u4f5c\u4e3a\u8f93\u5165\u4f20\u9012\u7ed9\u8bed\u8a00\u6a21\u578b\u3002DI-Adapter\u5141\u8bb8\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\u6839\u636e\u6307\u4ee4\u6587\u672c\u5d4c\u5165\u52a8\u6001\u53d8\u5316\uff0c\u89e3\u51b3\u4e86\u4ee5\u5f80\u65b9\u6cd5\u4e2d\u540c\u4e00\u56fe\u7247\u4e0b\u9759\u6001\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\u7684\u95ee\u9898\u3002 \u4e0e\u4e4b\u524d\u7684\u6210\u679c\u76f8\u6bd4\uff0cMiniDrive\u5728\u53c2\u6570\u5927\u5c0f\u3001\u6d6e\u70b9\u8fd0\u7b97\u91cf\u548c\u54cd\u5e94\u6548\u7387\u65b9\u9762\u5747\u8fbe\u5230\u4e86\u6700\u4f18\u6027\u80fd\uff0c\u6700\u5c0f\u7248\u672c\u4ec5\u5305\u542b83M\u53c2\u6570\u3002|\n", 
"2409.08264": "|**2024-09-12**|**Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale**|Rogerio Bonatti et.al.|[2409.08264](http://arxiv.org/abs/2409.08264)|**[link](https://github.com/microsoft/windowsagentarena)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u5728\u9700\u8981\u89c4\u5212\u548c\u63a8\u7406\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u4f5c\u4e3a\u8ba1\u7b97\u673a\u4ee3\u7406\u7684\u5f3a\u5927\u6f5c\u529b\uff0c\u80fd\u663e\u8457\u63d0\u5347\u4eba\u7c7b\u751f\u4ea7\u529b\u548c\u8f6f\u4ef6\u53ef\u8bbf\u95ee\u6027\u3002\u7136\u800c\uff0c\u8861\u91cf\u8fd9\u4e9b\u4ee3\u7406\u5728\u771f\u5b9e\u73af\u5883\u4e2d\u7684\u6027\u80fd\u4ecd\u5b58\u5728\u6311\u6218\uff1a\uff08i\uff09\u5927\u591a\u6570\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u9650\u4e8e\u7279\u5b9a\u6a21\u6001\u6216\u9886\u57df\uff08\u4f8b\u5982\u7eaf\u6587\u672c\u3001\u7f51\u9875\u5bfc\u822a\u3001\u95ee\u9898\u56de\u7b54\u3001\u7f16\u7a0b\uff09\uff0c\uff08ii\uff09\u5b8c\u6574\u57fa\u51c6\u8bc4\u4f30\u8017\u65f6\u957f\uff08\u901a\u5e38\u9700\u6570\u5929\u65f6\u95f4\uff09\uff0c\u56e0\u4e3a\u4efb\u52a1\u5177\u6709\u591a\u6b65\u9aa4\u7684\u5e8f\u5217\u6027\u8d28\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201cWindows Agent 
Arena\u201d\uff1a\u4e00\u4e2a\u53ef\u590d\u73b0\u7684\u901a\u7528\u73af\u5883\uff0c\u4e13\u6ce8\u4e8eWindows\u64cd\u4f5c\u7cfb\u7edf\uff0c\u5141\u8bb8\u4ee3\u7406\u81ea\u7531\u64cd\u4f5c\u5e76\u4f7f\u7528\u4e0e\u4eba\u7c7b\u7528\u6237\u5728\u89e3\u51b3\u4efb\u52a1\u65f6\u76f8\u540c\u7684\u5e7f\u6cdb\u5e94\u7528\u7a0b\u5e8f\u3001\u5de5\u5177\u548c\u7f51\u7edc\u6d4f\u89c8\u5668\u3002\u6211\u4eec\u6839\u636eOSWorld\u6846\u67b6\uff08Xie\u7b49\u4eba\uff0c2024\u5e74\uff09\u521b\u5efa\u4e86150\u591a\u4e2a\u8de8\u4ee3\u8868\u9886\u57df\u7684\u591a\u6837\u5316Windows\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u6db5\u76d6\u4e86\u89c4\u5212\u3001\u5c4f\u5e55\u7406\u89e3\u53ca\u5de5\u5177\u4f7f\u7528\u7684\u4ee3\u7406\u80fd\u529b\u8981\u6c42\u3002 \u6211\u4eec\u7684\u57fa\u51c6\u5177\u6709\u53ef\u6269\u5c55\u6027\uff0c\u5e76\u80fd\u591f\u65e0\u7f1d\u5730\u5728Azure\u4e0a\u5e76\u884c\u5316\uff0c\u4ece\u800c\u5728\u77ed\u77ed20\u5206\u949f\u5185\u5b8c\u6210\u5168\u9762\u57fa\u51c6\u8bc4\u4f30\u3002\u4e3a\u4e86\u5c55\u793aWindows Agent Arena\u7684\u80fd\u529b\uff0c\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u7684\u591a\u6a21\u6001\u4ee3\u7406Navi\u3002Navi\u5728Windows\u9886\u57df\u5185\u7684\u6210\u529f\u7387\u8fbe\u5230\u4e8619.5%\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u672a\u7ecf\u8f85\u52a9\u7684\u4eba\u7c7b\u8868\u73b0\u5219\u4e3a74.5%\u3002\u6b64\u5916\uff0cNavi\u5728\u53e6\u4e00\u4e2a\u6d41\u884c\u7684\u57fa\u4e8e\u7f51\u7edc\u7684\u57fa\u51c6\u6d4b\u8bd5Mind2Web\u4e2d\u4e5f\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u63d0\u4f9b\u4e86\u5bf9Navi\u6027\u80fd\u7684\u8be6\u7ec6\u5b9a\u91cf\u548c\u5b9a\u6027\u5206\u6790\uff0c\u5e76\u63d0\u4f9b\u4e86\u5229\u7528Windows Agent Arena\u8fdb\u884c\u672a\u6765\u7814\u7a76\u7684\u4ee3\u7406\u5f00\u53d1\u548c\u6570\u636e\u751f\u6210\u673a\u4f1a\u7684\u89c1\u89e3\u3002\u7f51\u9875\uff1ahttps://microsoft.github.io/WindowsAgentArena \u4ee3\u7801\uff1ahttps://github.com/microsoft/WindowsAgentArena**|\n", "2409.08250": 
"|**2024-09-12**|**OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering**|Jiahao Nick Li et.al.|[2409.08250](http://arxiv.org/abs/2409.08250)|null|\u4eba\u4eec\u5e38\u901a\u8fc7\u7167\u7247\u3001\u5c4f\u5e55\u622a\u56fe\u548c\u89c6\u9891\u6765\u6355\u6349\u8bb0\u5fc6\u3002\u73b0\u6709\u7684\u57fa\u4e8eAI\u7684\u5de5\u5177\u80fd\u591f\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u68c0\u7d22\u8fd9\u4e9b\u6570\u636e\uff0c\u4f46\u4e3b\u8981\u5c40\u9650\u4e8e\u68c0\u7d22\u50cf\u7167\u7247\u4e2d\u7684\u7279\u5b9a\u7269\u4f53\u8fd9\u6837\u7684\u5355\u4e00\u4fe1\u606f\uff0c\u96be\u4ee5\u5904\u7406\u6d89\u53ca\u7406\u89e3\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\uff08\u5982\u4e8b\u4ef6\u5e8f\u5217\uff09\u7684\u66f4\u590d\u6742\u67e5\u8be2\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u4e3a\u671f\u4e00\u4e2a\u6708\u7684\u65e5\u5fd7\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u73b0\u5b9e\u7528\u6237\u67e5\u8be2\uff0c\u5e76\u751f\u6210\u4e86\u4e00\u4e2a\u96c6\u6210\u4e0e\u6355\u83b7\u8bb0\u5fc6\u76f8\u5173\u5fc5\u8981\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86OmniQuery\uff0c\u8fd9\u662f\u4e00\u79cd\u80fd\u591f\u56de\u7b54\u9700\u8981\u63d0\u53d6\u548c\u63a8\u65ad\u591a\u5c42\u4e0a\u4e0b\u6587\u4fe1\u606f\u4ee5\u6574\u5408\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\u7684\u590d\u6742\u4e2a\u4eba\u8bb0\u5fc6\u76f8\u5173\u95ee\u9898\u7684\u65b0\u578b\u7cfb\u7edf\u3002OmniQuery\u901a\u8fc7\u4ece\u591a\u4e2a\u76f8\u4e92\u5173\u8054\u7684\u8bb0\u5fc6\u4e2d\u96c6\u6210\u5206\u6563\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u589e\u5f3a\u5355\u4e2a\u6355\u83b7\u7684\u8bb0\u5fc6\uff0c\u68c0\u7d22\u76f8\u5173\u8bb0\u5fc6\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63d0\u4f9b\u5168\u9762\u7684\u7b54\u6848\u3002\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86OmniQuery\u7684\u6709\u6548\u6027\uff0c\u51c6\u786e\u7387\u8fbe\u523071.5%\uff0c\u5e76\u4e1
4\u5b83\u572874.5%\u7684\u65f6\u95f4\u91cc\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684RAG\u7cfb\u7edf\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u751a\u81f3\u53d6\u5f97\u4e86\u80dc\u5229\u6216\u5e76\u5217\u7b2c\u4e00\u7684\u6210\u7ee9\u3002|\n", "2409.08239": "|**2024-09-12**|**Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources**|Alisia Lupidi et.al.|[2409.08239](http://arxiv.org/abs/2409.08239)|null|\u5728\u9762\u5bf9\u4f9d\u8d56\u7ed3\u6784\u5316\u6570\u636e\u3001\u590d\u6742\u63a8\u7406\u6216\u5de5\u5177\u4f7f\u7528\u7684\u6311\u6218\u6027\u573a\u666f\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSource2Synth\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u65e0\u9700\u6602\u8d35\u7684\u4eba\u7c7b\u6807\u6ce8\u5373\u53ef\u7528\u4e8e\u6559\u6388LLMs\u65b0\u6280\u80fd\u3002Source2Synth\u63a5\u53d7\u81ea\u5b9a\u4e49\u6570\u636e\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u751f\u6210\u5177\u6709\u57fa\u4e8e\u73b0\u5b9e\u4e16\u754c\u6765\u6e90\u7684\u4e2d\u95f4\u63a8\u7406\u6b65\u9aa4\u7684\u5408\u6210\u6570\u636e\u70b9\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u6839\u636e\u5176\u53ef\u56de\u7b54\u6027\u4e22\u5f03\u4f4e\u8d28\u91cf\u751f\u6210\u6765\u63d0\u9ad8\u6570\u636e\u96c6\u8d28\u91cf\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u9886\u57df\u4e2d\u5e94\u7528\u6b64\u65b9\u6cd5\u6765\u5c55\u793a\u5176\u901a\u7528\u6027\uff1a\u5728\u591a\u8df3\u95ee\u9898\u56de\u7b54\uff08MHQA\uff09\u4e2d\u6d4b\u8bd5\u63a8\u7406\u80fd\u529b\uff0c\u5728\u8868\u683c\u578b\u95ee\u9898\u56de\u7b54\uff08TQA\uff09\u4e2d\u6d4b\u8bd5\u5de5\u5177\u4f7f\u7528\u3002\u4e0e\u7ecf\u8fc7\u5fae\u8c03\u7684\u57fa\u672c\u6a21\u578b\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728WikiSQL\u4e0a\u7684TQA\u4e0a\u63d0\u9ad8\u4e8625.51%\uff0c\u5728HotPotQA\u4e0a\u7684MHQA\u4e0a\u63d0\u9ad8\u4e8622.57%\u7684\u6027\u80fd\u3002|\n", "2409.08234": 
"|**2024-09-12**|**LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems**|Hakan T. Otal et.al.|[2409.08234](http://arxiv.org/abs/2409.08234)|**[link](https://github.com/ai-in-complex-systems-lab/llm-honeypot)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6784\u5efa\u771f\u5b9e\u4e14\u4e92\u52a8\u7684\u871c\u7f50\u7cfb\u7edf\u3002\u901a\u8fc7\u5728\u5305\u542b\u653b\u51fb\u8005\u751f\u6210\u547d\u4ee4\u548c\u54cd\u5e94\u7684\u591a\u6837\u5316\u6570\u636e\u96c6\u4e0a\u5bf9\u5f00\u6e90\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u80fd\u591f\u4e0e\u653b\u51fb\u8005\u8fdb\u884c\u9ad8\u7ea7\u4ea4\u4e92\u7684\u871c\u7f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5173\u952e\u6b65\u9aa4\uff1a\u6570\u636e\u6536\u96c6\u4e0e\u5904\u7406\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6a21\u578b\u9009\u62e9\u4ee5\u53ca\u76d1\u7763\u5f0f\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002\u901a\u8fc7\u76f8\u4f3c\u6027\u6307\u6807\u8bc4\u4f30\u4e0e\u73b0\u573a\u90e8\u7f72\uff0c\u7ed3\u679c\u663e\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u51c6\u786e\u4e14\u4fe1\u606f\u4e30\u5bcc\u7684\u54cd\u5e94\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86LLMs\u5728\u91cd\u5851\u871c\u7f50\u6280\u672f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u4e1a\u4eba\u5458\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u5de5\u5177\u6765\u68c0\u6d4b\u548c\u5206\u6790\u6076\u610f\u6d3b\u52a8\uff0c\u4ece\u800c\u589e\u5f3a\u6574\u4f53\u5b89\u5168\u67b6\u6784\u3002**|\n", "2409.08202": "|**2024-09-12**|**What Makes a Maze Look Like a Maze?**|Joy Hsu 
et.al.|[2409.08202](http://arxiv.org/abs/2409.08202)|null|\u4eba\u7c7b\u89c6\u89c9\u7406\u89e3\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\u80fd\u591f\u7075\u6d3b\u5730\u89e3\u91ca\u62bd\u8c61\u6982\u5ff5\u7684\u80fd\u529b\uff1a\u83b7\u53d6\u63d0\u5347\u89c4\u5219\u6765\u89e3\u91ca\u5b83\u4eec\u6240\u8c61\u5f81\u7684\u542b\u4e49\uff0c\u5728\u719f\u6089\u548c\u4e0d\u719f\u6089\u7684\u4e0a\u4e0b\u6587\u4e2d\u951a\u5b9a\u5b83\u4eec\uff0c\u5e76\u5bf9\u5b83\u4eec\u8fdb\u884c\u9884\u6d4b\u6216\u63a8\u7406\u3002\u5c3d\u7ba1\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u8bc6\u522b\u56fe\u50cf\u4e2d\u7684\u5177\u4f53\u5bf9\u8c61\u7c7b\u522b\uff08\u5982\u6811\u679d\uff09\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u7406\u89e3\u8fd9\u6837\u7684\u89c6\u89c9\u62bd\u8c61\uff08\u4f8b\u5982\uff0c\u4e00\u7ec4\u6811\u679d\u5982\u4f55\u5f62\u6210\u8ff7\u5bab\u7684\u5899\u58c1\uff09\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u6df1\u5ea6\u67b6\u6784\u63a5\u5730\uff08DSG\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5229\u7528\u660e\u786e\u7684\u7ed3\u6784\u5316\u8868\u793a\u6cd5\u6765\u951a\u5b9a\u548c\u63a8\u7406\u89c6\u89c9\u62bd\u8c61\u7684\u6846\u67b6\u3002DSG\u7684\u6838\u5fc3\u662f\u67b6\u6784\u2014\u2014\u5206\u89e3\u62bd\u8c61\u6982\u5ff5\u7684\u4f9d\u8d56\u56fe\u5f62\u63cf\u8ff0\uff0c\u5c06\u5176\u5206\u89e3\u4e3a\u66f4\u57fa\u672c\u7684\u7b26\u53f7\u3002DSG\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u53d6\u67b6\u6784\uff0c\u7136\u540e\u901a\u8fc7\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5206\u5c42\u5730\u5c06\u67b6\u6784\u4e2d\u7684\u5177\u4f53\u5230\u62bd\u8c61\u7ec4\u4ef6\u951a\u5b9a\u5230\u56fe\u50cf\u4e0a\u3002\u951a\u5b9a\u540e\u7684\u67b6\u6784\u7528\u4e8e\u589e\u5f3a\u5bf9\u89c6\u89c9\u62bd\u8c61\u7684\u7406\u89e3\u3002\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86DSG\u53ca\u5176\u4e0d\u540c\u7684\u65b9\u6cd5\u5728\u6211\u4eec\u65b0\u521b\u5efa\u7684\u89c6\u89c9\u62bd\u8c
61\u6570\u636e\u96c6\u4e0a\u7684\u63a8\u7406\u6027\u80fd\uff0c\u8be5\u6570\u636e\u96c6\u7531\u4eba\u7c7b\u6807\u6ce8\u7684\u771f\u5b9e\u4e16\u754c\u56fe\u50cf\u548c\u76f8\u5e94\u7684\u95ee\u7b54\u5bf9\u7ec4\u6210\u3002\u6211\u4eec\u5c55\u793a\u4e86DSG\u663e\u8457\u63d0\u9ad8\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u62bd\u8c61\u89c6\u89c9\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\uff0c\u5e76\u671d\u7740\u4e0e\u4eba\u7c7b\u4e00\u81f4\u7684\u89c6\u89c9\u62bd\u8c61\u7406\u89e3\u8fc8\u8fdb\u4e86\u4e00\u6b65\u3002|\n", "2409.08185": "|**2024-09-12**|**Fine-tuning Large Language Models for Entity Matching**|Aaron Steiner et.al.|[2409.08185](http://arxiv.org/abs/2409.08185)|**[link](https://github.com/wbsg-uni-mannheim/tailormatch)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5b9e\u4f53\u5339\u914d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u5fae\u8c03\u3002\u5df2\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u793a\u5de5\u7a0b\u548c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4e0a\u3002\u672c\u6587\u4ece\u4e24\u4e2a\u7ef4\u5ea6\u5206\u6790\u4e86\u5fae\u8c03\u7684\u53ef\u884c\u6027\uff1a1\uff09\u8bad\u7ec3\u793a\u4f8b\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u5b9e\u9a8c\u6d89\u53ca\u5728\u8bad\u7ec3\u96c6\u4e2d\u6dfb\u52a0\u4e0d\u540c\u7c7b\u578b\u7684LLM\u751f\u6210\u89e3\u91ca\uff1b2\uff09\u4f7f\u7528LLM\u9009\u62e9\u548c\u751f\u6210\u8bad\u7ec3\u793a\u4f8b\u3002\u6211\u4eec\u4e0d\u4ec5\u5173\u6ce8\u6e90\u6570\u636e\u96c6\u4e0a\u7684\u5339\u914d\u6027\u80fd\uff0c\u8fd8\u7814\u7a76\u4e86\u5fae\u8c03\u5bf9\u6a21\u578b\u5728\u540c\u57df\u6570\u636e\u96c6\u4ee5\u53ca\u8de8\u9886\u57df\u6570\u636e\u96c6\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u7684\u5f71\u54cd\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5fae\u8c03\u663e\u8457\u63d0\u5347\u4e86\u5c0f\u578b\u6a21\u578b\u7684\u6027\u80fd\uff0c\u800c\u5927\u578b\u6a21\u578b\u7684\u8868\u73b0\u5219\u53c2\u5dee\u4e0d\u9f50\u3002\u5fae\u8c03\u5728\u63d0\u5347\u540c\u57df\u6570\u636e\u96c6\u7684\u6cdb\u5316\u80fd\u529b\u7684\u540c\u65f6\uff0c\u4e5f\u5f71\u54cd\u4e86\u8de8\u57df\u8fc1\u79fb\u7684\u80fd\u529b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5411\u8bad\u7ec3\u96c6\u6dfb\u52a0\u7ed3\u6784\u5316\u7684\u89e3\u91ca\u5bf9\u56db\u79cdLLM\u4e2d\u7684\u4e09\u79cd\u6709\u6b63\u9762\u5f71\u54cd\uff0c\u800c\u63d0\u51fa\u7684\u793a\u4f8b\u9009\u62e9\u548c\u751f\u6210\u65b9\u6cd5\u4ec5\u63d0\u5347\u4e86Llama 3.1 8B\u7684\u6027\u80fd\uff0c\u540c\u65f6\u964d\u4f4e\u4e86GPT-4o Mini\u7684\u6027\u80fd\u3002**|\n", "2409.08148": "|**2024-09-12**|**Faster Speech-LLaMA Inference with Multi-token Prediction**|Desh Raj et.al.|[2409.08148](http://arxiv.org/abs/2409.08148)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u4efb\u52a1\u4e0a\u53d8\u5f97\u6781\u4e3a\u719f\u7ec3\uff0c\u5305\u62ec\u6d89\u53ca\u591a\u6a21\u6001\u8f93\u5165\u7684\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u901a\u8fc7\u4f7f\u7528\u8bed\u97f3\u7f16\u7801\u5668\u5b9e\u4f8b\u5316LLM\uff08\u4f8b\u5982LLaMA\uff09\u5e76\u5229\u7528\u914d\u5bf9\u6570\u636e\u5bf9\u5176\u8fdb\u884c\u8bad\u7ec3\uff0c\u53ef\u4ee5\u8d4b\u4e88\u53ea\u89e3\u7801\u7684\u6a21\u578b\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u80fd\u529b\uff0c\u56e0\u6b64\u79f0\u4e4b\u4e3aSpeech-LLaMA\u3002\u7136\u800c\uff0c\u7531\u4e8e\u81ea\u56de\u5f52\u63a8\u7406\u7684\u987a\u5e8f\u6027\u8d28\u4ee5\u53ca\u76f8\u5bf9\u8f83\u5927\u7684\u89e3\u7801\u5668\uff0cSpeech-LLaMA\u6a21\u578b\u7684\u63a8\u7406\u65f6\u95f4\u76f8\u5bf9\u8f83\u9ad8\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u5728\u540c\u4e00\u89e3\u7801\u6b65\u9aa4\u4e2d\u9884\u6d4b\u591a\u4e2a\u4ee4\u724c\u6765\u52a0\u901fSpeech-LLaMA\u7684\u63a8\u7406\u3
002\u6211\u4eec\u63a2\u7d22\u4e86\u51e0\u4e2a\u80fd\u591f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6a21\u578b\u67b6\u6784\uff0c\u5e76\u901a\u8fc7\u9608\u503c\u63a8\u7406\u548c\u9a8c\u8bc1\u63a8\u7406\u7b56\u7565\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u57fa\u4e8e\u524d\u7f00\u7684\u675f\u641c\u7d22\u89e3\u7801\u65b9\u6cd5\uff0c\u5141\u8bb8\u6b64\u7c7b\u6a21\u578b\u8fdb\u884c\u9ad8\u6548\u7684\u6700\u5c0f\u8bcd\u9519\u8bef\u7387\uff08MWER\uff09\u8bad\u7ec3\u3002\u6211\u4eec\u5728\u591a\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u4eec\u5c06\u89e3\u7801\u8c03\u7528\u7684\u6570\u91cf\u51cf\u5c11\u4e86\u7ea63.2\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u6216\u63d0\u9ad8\u4e86WER\u6027\u80fd\u3002|\n", "2409.08147": "|**2024-09-12**|**LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models**|Zhengliang Liu et.al.|[2409.08147](http://arxiv.org/abs/2409.08147)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u8868\u73b0\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u89e3\u51b3\u957f\u671f\u5b58\u5728\u7684\u5ba2\u89c2\u8bc4\u4f30\u8fa9\u8bba\u7ed3\u679c\u7684\u6311\u6218\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u4ece\u201c\u653f\u7b56\u3001\u4e2a\u6027\u4e0e\u89c6\u89d2\u201d\uff083P\uff09\u548c\u201c\u5174\u8da3\u3001\u610f\u8bc6\u5f62\u6001\u4e0e\u8eab\u4efd\u8ba4\u540c\u201d\uff083I\uff09\u7684\u89d2\u5ea6\u5206\u6790\u56db\u4f4d\u5173\u952e\u53d7\u4f17\u7fa4\u4f53\uff1a\u9009\u6c11\u3001\u4f01\u4e1a\u3001\u6350\u8d60\u8005\u53ca\u653f\u5ba2\u5bf9\u5019\u9009\u4eba\u7684\u5171\u9e23\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u751f\u6210\u201cLLM-POTUS\u8bc4\u5206\u201d\uff0c\u5373\u57fa\u4e8e3P\u4e0e3I\u4e4b\u95f4\u4e00\u81f4\u6027\u5ea6\u91cf\u7684\u91cf\u5316\u6307\u680
7\uff0c\u6765\u8bc4\u4ef7\u8fa9\u8bba\u8868\u73b0\u3002\u6211\u4eec\u5e94\u7528\u6b64\u6846\u67b6\u5bf9\u8fd1\u671f\u7f8e\u56fd\u603b\u7edf\u8fa9\u8bba\u7684\u6587\u672c\u8fdb\u884c\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u8fa9\u8bba\u7b56\u7565\u7684\u6709\u6548\u6027\u53ca\u5176\u5bf9\u4e0d\u540c\u53d7\u4f17\u7fa4\u4f53\u7684\u5f71\u54cd\u3002\u7814\u7a76\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u653f\u6cbb\u5206\u6790\u5de5\u5177\uff0c\u8fd8\u63a2\u7d22\u4e86\u5728\u590d\u6742\u793e\u4f1a\u80cc\u666f\u4e0b\u4f7f\u7528LLM\u4f5c\u4e3a\u516c\u6b63\u8bc4\u5224\u8005\u7684\u6f5c\u529b\u4e0e\u5c40\u9650\u6027\u3002\u6b64\u5916\uff0c\u8be5\u6846\u67b6\u4e3a\u4e2a\u4eba\u516c\u6c11\u63d0\u4f9b\u4e86\u4e00\u4e2a\u72ec\u7acb\u7684\u5de5\u5177\uff0c\u7528\u4e8e\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u7684\u8868\u73b0\uff0c\u4ece\u800c\u589e\u5f3a\u6c11\u4e3b\u53c2\u4e0e\u5ea6\uff0c\u51cf\u5c11\u5bf9\u53ef\u80fd\u504f\u89c1\u7684\u5a92\u4f53\u89e3\u8bfb\u548c\u673a\u6784\u5f71\u54cd\u529b\u7684\u4f9d\u8d56\uff0c\u8fdb\u800c\u52a0\u5f3a\u77e5\u60c5\u516c\u6c11\u53c2\u4e0e\u7684\u57fa\u7840\u3002|\n", "2409.08098": "|**2024-09-12**|**The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal**|Huiyuan Xie 
et.al.|[2409.08098](http://arxiv.org/abs/2409.08098)|null|\u672c\u6587\u7814\u7a76\u4e86\u6280\u672f\u9769\u65b0\u4e0e\u83b7\u53d6\u516c\u6b63\u4e4b\u95f4\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5728\u82f1\u56fd\u5c31\u4e1a\u6cd5\u5ead\uff08UKET\uff09\u6784\u5efa\u9884\u6d4b\u6848\u4f8b\u7ed3\u679c\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u6311\u6218\uff0c\u8be5\u7814\u7a76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u81ea\u52a8\u6ce8\u91ca\uff0c\u4ece\u800c\u521b\u5efa\u4e86CLC-UKET\u6570\u636e\u96c6\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea619,000\u4e2aUKET\u6848\u4f8b\u53ca\u5176\u5143\u6570\u636e\u3002\u5168\u9762\u7684\u6cd5\u5f8b\u6ce8\u91ca\u6db5\u76d6\u4e86\u4e8b\u5b9e\u3001\u4e3b\u5f20\u3001\u5148\u4f8b\u5f15\u7528\u3001\u6cd5\u89c4\u5f15\u7528\u3001\u6848\u4f8b\u7ed3\u679c\u3001\u7406\u7531\u548c\u7ba1\u8f96\u6743\u4ee3\u7801\u3002\u501f\u52a9CLC-UKET\u6570\u636e\uff0c\u6211\u4eec\u5bf9UKET\u7684\u591a\u7c7b\u6848\u4f8b\u7ed3\u679c\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u6536\u96c6\u4e86\u4eba\u7c7b\u9884\u6d4b\u4ee5\u5efa\u7acb\u6a21\u578b\u6bd4\u8f83\u7684\u6027\u80fd\u53c2\u8003\u3002\u4ece\u57fa\u7840\u6a21\u578b\u7684\u5b9e\u8bc1\u7ed3\u679c\u6765\u770b\uff0c\u5fae\u8c03\u7684\u8f6c\u6362\u5668\u6a21\u578b\u5728UKET\u9884\u6d4b\u4efb\u52a1\u4e0a\u4f18\u4e8e\u96f6\u6b21\u548c\u5c11\u91cf\u6837\u672c\u7684LLM\u3002\u96f6\u6b21LLM\u7684\u6027\u80fd\u53ef\u4ee5\u901a\u8fc7\u6574\u5408\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u4fe1\u606f\u6765\u589e\u5f3a\uff0c\u878d\u5165\u5c11\u91cf\u6837\u672c\u793a\u4f8b\u4e2d\u3002\u6211\u4eec\u5e0c\u671bCLC-UKET\u6570\u636e\u96c6\u3001\u4eba\u7c7b\u6ce8\u91ca\u4ee5\u53ca\u5b9e\u8bc1\u53d1\u73b0\u80fd\u591f\u4f5c\u4e3a\u5c31\u4e1a\u76f8\u5173\u7ea0\u7eb7\u89e3\u51b3\u7684\u5b9d\u8d35\u57fa\u51c6\u3002|\n", "2409.08087": "|**2024-09-12**|**Securing Large Language Models: Addressing Bias, Misinformation, and Prompt 
Attacks**|Benji Peng et.al.|[2409.08087](http://arxiv.org/abs/2409.08087)|null|\u672c\u6587\u7efc\u8ff0\u4e86\u8fd1\u5e74\u6765\u6709\u5173\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5b89\u5168\u6027\u7684\u5173\u952e\u95ee\u9898\u7684\u7814\u7a76\u6587\u732e\uff0c\u91cd\u70b9\u662f\u51c6\u786e\u6027\u3001\u504f\u89c1\u3001\u5185\u5bb9\u68c0\u6d4b\u4ee5\u53ca\u5bf9\u6297\u653b\u51fb\u7684\u8106\u5f31\u6027\u3002\u6587\u7ae0\u8be6\u7ec6\u8ba8\u8bba\u4e86LLM\u8f93\u51fa\u53ef\u80fd\u4e0d\u51c6\u786e\u6216\u8bef\u5bfc\u6027\u7684\u95ee\u9898\uff0c\u5e76\u5f3a\u8c03\u4e86\u901a\u8fc7\u4e8b\u5b9e\u6838\u67e5\u65b9\u6cd5\u589e\u5f3a\u54cd\u5e94\u53ef\u9760\u6027\u7684\u5b9e\u65bd\u7b56\u7565\u3002\u6587\u7ae0\u6df1\u5165\u63a2\u8ba8\u4e86\u5185\u5d4c\u4e8eLLM\u4e2d\u7684\u56fa\u6709\u504f\u89c1\uff0c\u901a\u8fc7\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6280\u672f\uff0c\u5982\u63a7\u5236\u8f93\u5165\u7814\u7a76\u548c\u7ea2\u961f\u6f14\u7ec3\uff0c\u5bf9\u5176\u8fdb\u884c\u6279\u5224\u6027\u5ba1\u89c6\u3002\u63d0\u51fa\u4e86\u5168\u9762\u7684\u504f\u89c1\u7f13\u89e3\u7b56\u7565\u5206\u6790\uff0c\u5305\u62ec\u4ece\u9884\u5904\u7406\u5e72\u9884\u5230\u8bad\u7ec3\u671f\u95f4\u8c03\u6574\u548c\u540e\u5904\u7406\u6539\u8fdb\u7684\u5404\u79cd\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u63a2\u7a76\u4e86\u533a\u5206LLM\u751f\u6210\u5185\u5bb9\u4e0e\u4eba\u7c7b\u521b\u4f5c\u6587\u672c\u7684\u590d\u6742\u6027\uff0c\u5f15\u5165\u4e86\u8bf8\u5982DetectGPT\u7684\u68c0\u6d4b\u673a\u5236\u4ee5\u53ca\u6c34\u5370\u6280\u672f\uff0c\u540c\u65f6\u6307\u51fa\u5728\u590d\u6742\u60c5\u51b5\u4e0b\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u5206\u7c7b\u5668\u5b58\u5728\u5c40\u9650\u6027\u3002\u6587\u7ae0\u8fd8\u5206\u6790\u4e86LLM\u7684\u6f0f\u6d1e\uff0c\u5305\u62ec\u9003\u9038\u653b\u51fb\u548c\u63d0\u793a\u6ce8\u5165\u653b\u51fb\uff0c\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u548c\u5927\u89c4\u6a21\u7ade\u8d5bHackAPrompt\u7b49\u8fdb\u884c\u4e86\u6df1\u5165\u63a2\u8ba8\u3002\u6700\u540e\u
ff0c\u6587\u7ae0\u56de\u987e\u4e86\u4fdd\u62a4LLM\u7684\u9632\u5fa1\u63aa\u65bd\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u5bf9LLM\u5b89\u5168\u6027\u9886\u57df\u8fdb\u884c\u66f4\u6df1\u5165\u7814\u7a76\u7684\u91cd\u8981\u6027\u3002|\n", "2409.09030": "|**2024-09-13**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanxian Huang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u6210\u529f\uff0c\u5c24\u5176\u662f\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u4e2d\u7684\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u8bb8\u591a\u5c06LLMs\u4e0eSE\u7ed3\u5408\u7684\u7814\u7a76\u5de5\u4f5c\u660e\u786e\u6216\u9690\u542b\u5730\u91c7\u7528\u4e86\u4ee3\u7406\u7684\u6982\u5ff5\u3002\u7136\u800c\uff0c\u7f3a\u4e4f\u5bf9\u73b0\u6709\u5de5\u4f5c\u53d1\u5c55\u80cc\u666f\u7684\u6df1\u5165\u7efc\u8ff0\u3001\u5206\u6790\u5b83\u4eec\u5982\u4f55\u7ed3\u5408\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6280\u672f\u4f18\u5316\u5404\u79cd\u4efb\u52a1\u4ee5\u53ca\u6f84\u6e05SE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\u3002\u672c\u6587\u65e8\u5728\u8fdb\u884c\u9996\u6b21\u5173\u4e8e\u7ed3\u5408LLMs\u4e0eSE\u7684\u7814\u7a76\u7efc\u8ff0\uff0c\u5e76\u63d0\u51faSE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u884c\u52a8\u3002\u540c\u65f6\uff0c\u6211\u4eec\u603b\u7ed3\u4e86\u8fd9\u4e24\u4e2a\u9886\u57df\u7ed3\u5408\u65f6\u9762\u4e34\u7684\u5f53\u524d\u6311\u6218\uff0c\u5e76\u9488\u5bf9\u8fd9\u4e9b\u6311\u6218\u63d0\u51fa\u4e86\u672a\u6765\u7684\u673a\u9047\u3002\u6211\u4eec\u7ef4\u62a4\u4e86\u4e00\u4e2a\u76f8\u5173\u7684\u8bba\u6587GitHub\u4ed3\u5e93\uff0c\u5730\u5740\u4e3a\uff1ahttps://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE\u30
02**|\n", "2409.09010": "|**2024-09-13**|**Contri(e)ve: Context + Retrieve for Scholarly Question Answering**|Kanchan Shivashankar et.al.|[2409.09010](http://arxiv.org/abs/2409.09010)|null|\u5b66\u8005\u4ea4\u6d41\u662f\u4e00\u4e2a\u5feb\u901f\u53d1\u5c55\u7684\u9886\u57df\uff0c\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5176\u975e\u7ed3\u6784\u5316\u7684\u6587\u6863\u683c\u5f0f\uff0c\u4f20\u7edf\u7684\u6587\u6863\u68c0\u7d22\u65b9\u6cd5\u96be\u4ee5\u4ece\u4e2d\u63d0\u53d6\u6709\u7528\u4fe1\u606f\u3002\u5b66\u8005\u77e5\u8bc6\u56fe\u8c31\u901a\u8fc7\u6784\u5efa\u4e00\u4e2a\u8bed\u4e49\u7f51\u7edc\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u9690\u85cf\u7684\u6d1e\u5bdf\u3001\u6458\u8981\u548c\u6613\u4e8e\u901a\u8fc7\u67e5\u8be2\u83b7\u53d6\u7684\u8bbf\u95ee\u6027\u3002\u81ea\u7136\u5730\uff0c\u5bf9\u5b66\u8005\u56fe\u8c31\u8fdb\u884c\u95ee\u7b54\u6269\u5c55\u4e86\u66f4\u5e7f\u6cdb\u53d7\u4f17\u7684\u53ef\u8bbf\u95ee\u6027\u3002\u4f46\u5728\u8fd9\u4e00\u9886\u57df\u7684\u67d0\u4e9b\u77e5\u8bc6\u4ecd\u7136\u4ee5\u975e\u7ed3\u6784\u5316\u6587\u672c\u5f62\u5f0f\u5448\u73b0\uff0c\u56e0\u6b64\u9700\u8981\u7ed3\u5408\u89e3\u51b3\u65b9\u6848\u6765\u4e3a\u95ee\u7b54\u7cfb\u7edf\u63d0\u4f9b\u652f\u6301\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u6b65\u89e3\u51b3\u65b9\u6848\uff0c\u4f7f\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff1aLlama3.1\u5bf9\u5b66\u8005-QALD\u6570\u636e\u96c6\u8fdb\u884c\u5904\u7406\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ece\u4e0d\u540c\u7684\u7ed3\u6784\u5316\u548c\u975e\u7ed3\u6784\u5316\u6570\u636e\u6e90\u4e2d\u63d0\u53d6\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5185\u5bb9\uff1aDBLP\u3001SemOpenAlex\u77e5\u8bc6\u56fe\u8c31\u4ee5\u53ca\u7ef4\u57fa\u767e\u79d1\u6587\u672c\u3002 
\u5176\u6b21\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u63d0\u793a\u5de5\u7a0b\uff0c\u4ee5\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4fe1\u606f\u68c0\u7d22\u6027\u80fd\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u5728F1\u5206\u6570\u4e0a\u53d6\u5f97\u4e8640%\u7684\u6210\u7ee9\uff0c\u5e76\u89c2\u5bdf\u5230\u4e00\u4e9b\u6765\u81eaLLM\u7684\u5f02\u5e38\u54cd\u5e94\uff0c\u8fd9\u4e9b\u54cd\u5e94\u5728\u8bba\u6587\u7684\u6700\u540e\u90e8\u5206\u8fdb\u884c\u4e86\u8ba8\u8bba\u3002|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|\u786e\u4fdd\u5185\u5bb9\u7b26\u5408\u793e\u533a\u51c6\u5219\u5bf9\u4e8e\u7ef4\u62a4\u5065\u5eb7\u7684\u5728\u7ebf\u793e\u4ea4\u73af\u5883\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u7684\u57fa\u4e8e\u4eba\u7c7b\u7684\u5408\u89c4\u6027\u68c0\u67e5\u5728\u5904\u7406\u7528\u6237\u751f\u6210\u5185\u5bb9\u7684\u4e0d\u65ad\u589e\u957f\u91cf\u548c\u6709\u9650\u7684\u7ba1\u7406\u5458\u6570\u91cf\u65f6\u9762\u4e34\u7740\u6269\u5c55\u96be\u9898\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u7684\u65b0\u8fdb\u5c55\uff0c\u4e3a\u81ea\u52a8\u5316\u5185\u5bb9\u5408\u89c4\u6027\u9a8c\u8bc1\u5f00\u8f9f\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u672c\u6587\u8bc4\u4f30\u4e86\u516d\u4e2a\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u8fd9\u4e9b\u4ee3\u7406\u57fa\u4e8eOpen-LLMs\uff0c\u5728\u53bb\u4e2d\u5fc3\u5316\u793e\u4ea4\u7f51\u7edc\u4e2d\u5bf9\u89c4\u5219\u5408\u89c4\u6027\u8fdb\u884c\u81ea\u52a8\u9a8c\u8bc1\uff0c\u8fd9\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u73af\u5883\uff0c\u56e0\u4e3a\u793e\u533a\u7684\u8303\u56f4\u548c\u89c4\u5219\u5404\u4e0d\u76f8\u540c\u3002\u901a\u8fc7\u5bf9\u6765\u81ea\u6570\u767e\u4e2aMastodon\u670d\u52a1\u5668\u7684\u8d85\u8fc750,000\u6761\u5e16\u5b50\u7684\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u4eba\u5d
e5\u667a\u80fd\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u68c0\u6d4b\u975e\u5408\u89c4\u5185\u5bb9\u3001\u638c\u63e1\u8bed\u8a00\u4e0a\u7684\u7ec6\u5fae\u5dee\u522b\uff0c\u5e76\u9002\u5e94\u4e0d\u540c\u7684\u793e\u533a\u4e0a\u4e0b\u6587\u3002\u5927\u591a\u6570\u4ee3\u7406\u8fd8\u663e\u793a\u51fa\u9ad8\u7684\u4e00\u81f4\u6027\u548c\u4e00\u81f4\u6027\uff0c\u5728\u8bc4\u5206\u89e3\u91ca\u548c\u5408\u89c4\u5efa\u8bae\u4e0a\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u8005\u76f8\u5339\u914d\u3002\u901a\u8fc7\u9886\u57df\u4e13\u5bb6\u7684\u4eba\u5de5\u8bc4\u4f30\uff0c\u786e\u8ba4\u4e86\u4ee3\u7406\u7684\u53ef\u9760\u6027\u548c\u5b9e\u7528\u6027\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u662f\u534a\u81ea\u52a8\u5316\u6216\u4eba\u673a\u534f\u4f5c\u5185\u5bb9\u7ba1\u7406\u7cfb\u7edf\u7684\u6709\u524d\u666f\u7684\u5de5\u5177\u3002|\n", "2409.08937": "|**2024-09-13**|**Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions**|Zahra Ashktorab et.al.|[2409.08937](http://arxiv.org/abs/2409.08937)|null|\u672c\u6587\u7814\u7a76\u4e86\u5728\u4eba\u7c7b\u4e0e\u4eba\u5de5\u667a\u80fd\u5408\u4f5c\u8fdb\u884c\u6587\u672c\u751f\u6210\u4efb\u52a1\u65f6\uff0c\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u534f\u52a9\u751f\u6210\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6570\u636e\u3002\u5bf9\u4e8e\u8fd9\u4e9b\u6a21\u578b\u800c\u8a00\uff0c\u9700\u8981\u6570\u636e\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd9\u662f\u63d0\u5347\u5176\u6027\u80fd\u7684\u5173\u952e\u6b65\u9aa4\u3002\u5728\u5ba2\u6237\u670d\u52a1\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u4e2d\uff0c\u6570\u636e\u4ee5\u4eba\u4e0e\u5ba2\u670d\u4ee3\u7406\u4e4b\u95f4\u7684\u5bf9\u8bdd\u5f62\u5f0f\u5b58\u5728\uff0c\u5e76\u53ef\u501f\u52a9AI\u52a9\u624b\u751f\u6210\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u5171\u62db\u52df\u4e8611\u4f4d\u7528\u6237\uff0c\u6bcf\u4f4d\u7528\u6237\u5b8c\u621
08\u9879\u4efb\u52a1\uff0c\u603b\u5171\u5b8c\u6210\u4e8688\u9879\u4efb\u52a1\u3002\u7ed3\u679c\u53d1\u73b0\uff0c\u5e7b\u89c9\u7684\u5b58\u5728\u5bf9\u6570\u636e\u8d28\u91cf\u4ea7\u751f\u4e86\u8d1f\u9762\u5f71\u54cd\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5c3d\u7ba1\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5e76\u975e\u603b\u80fd\u62b5\u6d88\u5e7b\u89c9\u5bf9\u6570\u636e\u8d28\u91cf\u7684\u4e0d\u5229\u5f71\u54cd\uff0c\u4f46\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5171\u540c\u4f5c\u7528\u4e8e\u6570\u636e\u8d28\u91cf\uff0c\u5e76\u5f71\u54cd\u7528\u6237\u5982\u4f55\u5229\u7528\u5448\u73b0\u7ed9\u4ed6\u4eec\u7684AI\u54cd\u5e94\u3002\u901a\u8fc7\u5206\u6790\u7528\u6237\u884c\u4e3a\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5bf9AI\u751f\u6210\u54cd\u5e94\u4f9d\u8d56\u7684\u660e\u663e\u6a21\u5f0f\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728\u5bf9\u8bddAI\u60c5\u5883\u4e0b\u7ba1\u7406\u5e7b\u89c9\u5728AI\u751f\u6210\u5185\u5bb9\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2409.08936": "|**2024-09-13**|**SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records**|Paloma Rabaey 
et.al.|[2409.08936](http://arxiv.org/abs/2409.08936)|**[link](https://github.com/prabaey/synsum)**|**\u6211\u4eec\u63d0\u51fa\u4e86SynSUM\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5408\u6210\u6570\u636e\u96c6\uff0c\u5c06\u975e\u7ed3\u6784\u5316\u7684\u4e34\u5e8a\u8bb0\u5f55\u4e0e\u7ed3\u6784\u5316\u80cc\u666f\u53d8\u91cf\u8054\u7cfb\u8d77\u6765\u3002\u8be5\u6570\u636e\u96c6\u753110,000\u4e2a\u865a\u6784\u7684\u60a3\u8005\u8bb0\u5f55\u7ec4\u6210\uff0c\u5305\u542b\u8868\u683c\u53d8\u91cf\uff08\u5982\u75c7\u72b6\u3001\u8bca\u65ad\u548c\u57fa\u7840\u6761\u4ef6\uff09\u4ee5\u53ca\u4e0e\u4e4b\u76f8\u5173\u7684\u63cf\u8ff0\u865a\u6784\u60a3\u8005\u5c31\u8bca\u60c5\u51b5\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u9886\u57df\u4e3a\u547c\u5438\u75be\u75c5\u3002\u8868\u683c\u90e8\u5206\u7684\u6570\u636e\u901a\u8fc7\u8d1d\u53f6\u65af\u7f51\u7edc\u751f\u6210\uff0c\u5176\u4e2d\u56e0\u679c\u7ed3\u6784\u548c\u6761\u4ef6\u6982\u7387\u7531\u4e13\u5bb6\u57fa\u4e8e\u9886\u57df\u77e5\u8bc6\u63d0\u51fa\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT-4o\uff09\u751f\u6210\u4e0e\u60a3\u8005\u5c31\u8bca\u76f8\u5173\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u63cf\u8ff0\u60a3\u8005\u7684\u75c7\u72b6\u548c\u989d\u5916\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u3002 
SynSUM\u6570\u636e\u96c6\u4e3b\u8981\u65e8\u5728\u4fc3\u8fdb\u5728\u5b58\u5728\u8868\u683c\u80cc\u666f\u53d8\u91cf\u7684\u60c5\u51b5\u4e0b\u5bf9\u4e34\u5e8a\u4fe1\u606f\u63d0\u53d6\u7684\u7814\u7a76\uff0c\u53ef\u4ee5\u901a\u8fc7\u9886\u57df\u77e5\u8bc6\u5c06\u8fd9\u4e9b\u53d8\u91cf\u94fe\u63a5\u5230\u4ece\u6587\u672c\u4e2d\u63d0\u53d6\u7684\u6982\u5ff5\u5174\u8da3\u70b9\u2014\u2014\u5728SynSUM\u7684\u60c5\u51b5\u4e0b\u662f\u75c7\u72b6\u3002\u6b21\u8981\u7528\u9014\u5305\u62ec\u7814\u7a76\u8868\u683c\u6570\u636e\u548c\u6587\u672c\u7684\u81ea\u52a8\u5316\u4e34\u5e8a\u63a8\u7406\u3001\u5728\u5b58\u5728\u8868\u683c\u548c/\u6216\u6587\u672c\u6df7\u6742\u56e0\u7d20\u60c5\u51b5\u4e0b\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u4ee5\u53ca\u591a\u6a21\u6001\u5408\u6210\u6570\u636e\u751f\u6210\u3002 \u8be5\u6570\u636e\u96c6\u53ef\u4ee5\u4ece\u4ee5\u4e0b\u94fe\u63a5\u4e0b\u8f7d\uff1a**|\n", "2409.08931": "|**2024-09-13**|**LLM-based Weak Supervision Framework for Query Intent Classification in Video Search**|Farnoosh Javadi 
et.al.|[2409.08931](http://arxiv.org/abs/2409.08931)|null|\u6d41\u5a92\u4f53\u670d\u52a1\u5df2\u7ecf\u5f7b\u5e95\u6539\u53d8\u4e86\u6211\u4eec\u53d1\u73b0\u548c\u53c2\u4e0e\u6570\u5b57\u5a31\u4e50\u7684\u65b9\u5f0f\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u6709\u6548\u7406\u89e3\u7528\u6237\u641c\u7d22\u67e5\u8be2\u7684\u5e7f\u6cdb\u8303\u56f4\u4ecd\u7136\u9762\u4e34\u91cd\u5927\u6311\u6218\u3002\u6784\u5efa\u4e00\u4e2a\u80fd\u591f\u5904\u7406\u4ee3\u8868\u4e0d\u540c\u7528\u6237\u610f\u56fe\u7684\u5404\u79cd\u5b9e\u4f53\u7684\u51c6\u786e\u67e5\u8be2\u7406\u89e3\u7cfb\u7edf\u5bf9\u4e8e\u63d0\u4f9b\u589e\u5f3a\u7684\u7528\u6237\u4f53\u9a8c\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u8bad\u7ec3\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u6a21\u578b\u53ef\u4ee5\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u7136\u800c\uff0c\u5728\u8fd9\u4e2a\u4e13\u95e8\u9886\u57df\u7684\u9ad8\u8d28\u91cf\u6807\u6ce8\u6570\u636e\u83b7\u53d6\u662f\u4e00\u4e2a\u5de8\u5927\u7684\u969c\u788d\u3002\u624b\u52a8\u6ce8\u91ca\u6210\u672c\u9ad8\u6602\u4e14\u5728\u6355\u6349\u7528\u6237\u8bcd\u6c47\u53d8\u5f02\u6027\u65b9\u9762\u4e0d\u5207\u5b9e\u9645\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f31\u76d1\u7763\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u6807\u6ce8\u5927\u91cf\u7528\u6237\u641c\u7d22\u67e5\u8be2\u3002\u901a\u8fc7\u4f7f\u7528\u63d0\u793a\u5de5\u7a0b\u548c\u591a\u6837\u5316\u7684LLM\u89d2\u8272\uff0c\u6211\u4eec\u751f\u6210\u4e86\u4e0e\u4eba\u5de5\u6ce8\u91ca\u8005\u671f\u671b\u76f8\u5339\u914d\u7684\u8bad\u7ec3\u6570\u636e\u3002\u901a\u8fc7\u5f15\u5165\u9886\u57df\u77e5\u8bc6\uff0c\u5229\u7528\u94fe\u5f0f\u601d\u8003\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u6807\u8bb0\u6570\u636e\u8bad\u7ec3\u4f18\u5316\u7528\u4e8e\u5b9e\u65f6\u63a8\u7406\u7684\u4f4e\u5ef6\u8fdf\u6a21\u578b\u3002\u5e7f\u6cdb\u7684\u8bc4\
u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u53ec\u56de\u7387\u4e0a\u4f18\u4e8e\u57fa\u7ebf\u5e73\u5747\u63d0\u9ad8\u4e86113%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u7684\u65b0\u578b\u63d0\u793a\u5de5\u7a0b\u6846\u67b6\u4ea7\u751f\u7528\u4e8e\u5f31\u76d1\u7763\u7684\u9ad8\u8d28\u91cfLLM\u751f\u6210\u6570\u636e\uff1b\u4e0e\u4eba\u7c7b\u6ce8\u91ca\u7684F1\u5f97\u5206\u52a0\u6743\u5206\u5e03\u76f8\u6bd4\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u9884\u6d4b\u548c\u4eba\u7c7b\u6ce8\u89e3\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u63d0\u9ad8\u4e8647.60%\u3002\u6211\u4eec\u7684\u89d2\u8272\u9009\u62e9\u8def\u7531\u673a\u5236\u8fdb\u4e00\u6b65\u589e\u52a0\u4e863.67%\u7684\u52a0\u6743F1\u5f97\u5206\uff0c\u8fd9\u662f\u5728\u65b0\u578b\u63d0\u793a\u5de5\u7a0b\u6846\u67b6\u57fa\u7840\u4e0a\u7684\u989d\u5916\u6536\u76ca\u3002|\n", "2409.08904": "|**2024-09-13**|**AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models**|Yifei Yao et.al.|[2409.08904](http://arxiv.org/abs/2409.08904)|**[link](https://github.com/sjtu-mvasl-robotics/AnyBipe)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7aef\u5230\u7aef\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bad\u7ec3\u548c\u90e8\u7f72\u673a\u5668\u4eba\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u7b56\u7565\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5f15\u5bfc\u3002\u8be5\u6846\u67b6\u7531\u4e09\u4e2a\u76f8\u4e92\u8fde\u63a5\u7684\u6a21\u5757\u7ec4\u6210\uff1a\u4e00\u4e2a\u901a\u8fc7LLM\u8bbe\u8ba1\u5956\u52b1\u51fd\u6570\u7684\u6a21\u5757\u3001\u4e00\u4e2a\u5229\u7528\u73b0\u6709\u5de5\u4f5c\u7684RL\u8bad\u7ec3\u6a21\u5757\u4ee5\u53ca\u4e00\u4e2a\u6a21\u62df\u5230\u73b0\u5b9e\uff08sim-to-real\uff09\u540c\u6001\u8bc4\u4f30\u6a21\u5757\u3002\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u5e72\u9884\u7684\u9700\u6c42\uff0c\u4ec5\u9700\u8981\u57fa\u672c\u7684\u6a21\u62df\u548c\u90e8\u7f72\u5e73\u53f0\uff0c\u5e76\u
4e14\u63d0\u4f9b\u4e86\u4eba\u5de5\u5de5\u7a0b\u7b56\u7565\u548c\u5386\u53f2\u6570\u636e\u7684\u6574\u5408\u9009\u9879\u3002\u6211\u4eec\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u8fd9\u4e9b\u6a21\u5757\u7684\u6784\u5efa\u3001\u5b83\u4eec\u76f8\u5bf9\u4e8e\u4f20\u7edf\u65b9\u6cd5\u7684\u4f18\u52bf\uff0c\u4ee5\u53ca\u5c55\u793a\u8be5\u6846\u67b6\u5728\u53cc\u8db3\u673a\u5668\u4eba\u6b65\u6001\u63a7\u5236\u81ea\u4e3b\u5f00\u53d1\u548c\u6539\u8fdb\u80fd\u529b\u7684\u5b9e\u4f8b\uff0c\u8bc1\u660e\u5176\u5728\u4e0d\u9700\u8981\u4eba\u7c7b\u5e72\u9884\u7684\u60c5\u51b5\u4e0b\u64cd\u4f5c\u7684\u53ef\u80fd\u6027\u3002|\n", "2409.08890": "|**2024-09-13**|**A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research**|Martin Obschonka et.al.|[2409.08890](http://arxiv.org/abs/2409.08890)|null|\u5728\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u91c7\u7528\u7684\u8fc5\u901f\u589e\u957f\u4ee5\u53ca\u5927\u6570\u636e\u53ef\u7528\u6027\u7684\u80cc\u666f\u4e0b\uff0c\u521b\u4e1a\u5b66\u9886\u57df\u53ef\u80fd\u8fce\u6765\u6709\u53f2\u4ee5\u6765\u6700\u91cd\u5927\u7684\u8f6c\u53d8\u3002\u672c\u6587\u901a\u8fc7\u5f3a\u8c03AI\u9769\u547d\u671f\u95f4\u521b\u4e1a\u7814\u7a76\u4e2d\u6f5c\u5728\u7684\u65e0\u6210\u6548\u77e5\u8bc6\u4ea4\u6d41\u98ce\u9669\uff0c\u505a\u51fa\u4e86\u7d27\u8feb\u7684\u5143\u8d21\u732e\u3002\u5b83\u63d0\u4f9b\u4e86\u7f13\u89e3\u8fd9\u4e00\u98ce\u9669\u7684\u7b56\u7565\uff0c\u5e76\u4e3a\u672a\u6765\u57fa\u4e8eAI\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u5176\u96c6\u4f53\u5f71\u54cd\u529b\u548c\u76f8\u5173\u6027\u3002 
\u501f\u9274Akerlof\u8457\u540d\u7684\u201c\u52a3\u8d28\u5546\u54c1\u5e02\u573a\u201d\u6982\u5ff5\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u7531\u4e8e\u9886\u57df\u6f14\u8fdb\u5230\u5f53\u524d\u73af\u5883\u800c\u53ef\u80fd\u51fa\u73b0\u7684\u91cd\u5927\u77e5\u8bc6\u4e0d\u5bf9\u79f0\u6027\uff0c\u5982\u6784\u9020\u6709\u6548\u6027\u3001\u7406\u8bba\u6784\u5efa\u548c\u7814\u7a76\u76f8\u5173\u6027\u65b9\u9762\u7684\u590d\u6742\u6027\u3002\u8fd9\u4e9b\u4e0d\u5bf9\u79f0\u6027\u7279\u522b\u6df1\u690d\u4e8e\u6240\u8c13\u7684\u53cc\u91cd\u9ed1\u7bb1\u56f0\u5883\u4e2d\uff0c\u5373AI\u65b9\u6cd5\u7684\u5e7f\u6cdb\u8ba4\u53ef\u7684\u9ed1\u7bb1\u6027\u8d28\u4e0e\u7531\u5185\u5728\u4e0d\u786e\u5b9a\u6027\u9a71\u52a8\u7684\u521b\u4e1a\u73b0\u8c61\u7684\u9ed1\u7bb1\u6027\u8d28\u7684\u4ea4\u6c47\u70b9\u3002\u7ed3\u679c\uff0c\u8fd9\u4e9b\u4e0d\u5bf9\u79f0\u53ef\u80fd\u5bfc\u81f4\u4e0d\u53ef\u68c0\u6d4b\u7684\u6b21\u4f18\u7814\u7a76\u4ea7\u54c1\u589e\u52a0\uff0c\u4ece\u800c\u5f62\u6210\u4e00\u4e2a\u635f\u5bb3\u9886\u57df\u798f\u7949\u3001\u58f0\u8a89\u548c\u5f71\u54cd\u529b\u7684\u52a3\u8d28\u5546\u54c1\u5e02\u573a\u3002 \u7136\u800c\uff0c\u91cd\u8981\u7684\u662f\uff0c\u5982\u679c\u80fd\u591f\u7f13\u89e3\u8fd9\u4e9b\u98ce\u9669\uff0cAI\u9769\u547d\u6709\u53ef\u80fd\u9884\u793a\u7740\u521b\u4e1a\u7814\u7a76\u7684\u65b0\u9ec4\u91d1\u65f6\u4ee3\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u63d0\u5347\u9886\u57df\u81f3\u66f4\u9ad8\u6c34\u5e73\u7684AI\u97e7\u6027\u6240\u9700\u91c7\u53d6\u7684\u884c\u52a8\uff0c\u540c\u65f6\u575a\u5b9a\u5730\u4fdd\u6301\u5176\u57fa\u7840\u539f\u5219\u548c\u6838\u5fc3\u4ef7\u503c\u89c2\u3002|\n", "2409.08864": "|**2024-09-13**|**Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies**|Zhiqiang Zhong 
et.al.|[2409.08864](http://arxiv.org/abs/2409.08864)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5904\u7406\u5404\u79cd\u6570\u636e\u7ed3\u6784\u65f6\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u5305\u62ec\u56fe\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u96c6\u4e2d\u5728\u5f00\u53d1\u7528\u4e8e\u56fe\u8868\u793a\u7684\u6587\u672c\u7f16\u7801\u65b9\u6cd5\u4e0a\uff0c\u4f46\u591a\u6a21\u6001LLM\u7684\u51fa\u73b0\u4e3a\u7406\u89e3\u56fe\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u524d\u6cbf\u3002\u8fd9\u4e9b\u5148\u8fdb\u7684\u6a21\u578b\u80fd\u591f\u540c\u65f6\u5904\u7406\u6587\u672c\u548c\u56fe\u50cf\uff0c\u901a\u8fc7\u7ed3\u5408\u89c6\u89c9\u8868\u793a\u4e0e\u4f20\u7edf\u7684\u6587\u672c\u6570\u636e\uff0c\u53ef\u80fd\u5728\u63d0\u9ad8\u5bf9\u56fe\u7ed3\u6784\u7684\u7406\u89e3\u65b9\u9762\u5e26\u6765\u6539\u8fdb\u3002\u8fd9\u9879\u7814\u7a76\u63a2\u8ba8\u4e86\u53ef\u89c6\u5316\u56fe\u5728\u4e0d\u540c\u7ea7\u522b\uff08\u8282\u70b9\u3001\u8fb9\u548c\u56fe\u7ea7\u522b\uff09\u4e0a\u5bf9LLM\u6027\u80fd\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u5bf9\u6bd4\u4e86\u591a\u6a21\u6001\u65b9\u6cd5\u4e0e\u7eaf\u6587\u672c\u56fe\u8868\u793a\u7684\u6709\u6548\u6027\u3002\u7ed3\u679c\u63d0\u4f9b\u4e86\u5173\u4e8e\u5229\u7528\u89c6\u89c9\u56fe\u6a21\u6001\u589e\u5f3aLLM\u5bf9\u56fe\u7ed3\u6784\u7406\u89e3\u80fd\u529b\u7684\u6f5c\u529b\u548c\u9650\u5236\u7684\u5b9d\u8d35\u89c1\u89e3\u3002|\n", "2409.08846": "|**2024-09-13**|**FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition**|Zhenhua Xu 
et.al.|[2409.08846](http://arxiv.org/abs/2409.08846)|null|\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9700\u8981\u5de8\u5927\u7684\u8ba1\u7b97\u80fd\u529b\u548c\u5927\u91cf\u7684\u6570\u636e\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u6307\u7eb9\u4fdd\u62a4\u8fd9\u4e9b\u6a21\u578b\u7684\u77e5\u8bc6\u4ea7\u6743\u5bf9\u4e8e\u6240\u6709\u6743\u8ba4\u8bc1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5c1d\u8bd5\u901a\u8fc7\u5fae\u8c03\u5411LLMs\u6dfb\u52a0\u6307\u7eb9\uff0c\u4f46\u8fd9\u4ecd\u6210\u672c\u9ad8\u6602\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FP-VEC\uff0c\u4e00\u79cd\u4f7f\u7528\u6307\u7eb9\u5411\u91cf\u4f5c\u4e3a\u9ad8\u6548LLM\u6307\u7eb9\u65b9\u6cd5\u7684\u8bd5\u70b9\u7814\u7a76\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u751f\u6210\u4e00\u4e2a\u4ee3\u8868\u5d4c\u5165\u5728\u6a21\u578b\u4e2d\u7684\u4fdd\u5bc6\u7b7e\u540d\u7684\u6307\u7eb9\u5411\u91cf\uff0c\u5141\u8bb8\u901a\u8fc7\u5411\u91cf\u76f8\u52a0\u65e0\u7f1d\u5730\u5c06\u76f8\u540c\u7684\u6307\u7eb9\u6574\u5408\u5230\u65e0\u9650\u6570\u91cf\u7684LLMs\u4e2d\u3002\u5728\u591a\u4e2aLLMs\u4e0a\u7684\u7ed3\u679c\u8868\u660e\uff0cFP-VEC\u8f7b\u91cf\u7ea7\uff0c\u53ef\u4ee5\u5728\u4ec5\u4f7f\u7528CPU\u7684\u8bbe\u5907\u4e0a\u8fd0\u884c\u4ee5\u8fdb\u884c\u6307\u7eb9\u8bc6\u522b\uff1b\u53ef\u6269\u5c55\uff0c\u53ea\u9700\u8981\u4e00\u6b21\u8bad\u7ec3\u5373\u53ef\u5b9e\u73b0\u65e0\u9650\u6b21\u7684\u6307\u7eb9\u751f\u6210\u8fc7\u7a0b\uff0c\u5e76\u4e14\u80fd\u591f\u4fdd\u6301\u6a21\u578b\u7684\u6b63\u5e38\u884c\u4e3a\u3002\u9879\u76ee\u9875\u9762\u4f4d\u4e8ehttps://fingerprintvector.github.io \u3002|\n", "2409.10516": "|**2024-09-16**|**RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval**|Di Liu 
et.al.|[2409.10516](http://arxiv.org/abs/2409.10516)|**[link](https://github.com/jzbjyb/reatt)**|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u6ce8\u610f\u529b\u64cd\u4f5c\u7684\u4e8c\u6b21\u65f6\u95f4\u590d\u6742\u5ea6\u5bf9\u6269\u5c55\u5230\u66f4\u957f\u4e0a\u4e0b\u6587\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\uff0c\u5bfc\u81f4\u4e86\u6781\u9ad8\u7684\u63a8\u7406\u5ef6\u8fdf\u548cGPU\u5185\u5b58\u6d88\u8017\u4ee5\u7f13\u5b58\u952e\u503c\uff08KV\uff09\u5411\u91cf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u2014\u2014\u68c0\u7d22\u6ce8\u610f\u529b\uff08RetrievalAttention\uff09\uff0c\u4ee5\u52a0\u901f\u6ce8\u610f\u529b\u8ba1\u7b97\u3002\u901a\u8fc7\u5229\u7528\u6ce8\u610f\u529b\u64cd\u4f5c\u7684\u52a8\u6001\u7a00\u758f\u7279\u6027\uff0cRetrievalAttention\u5728CPU\u5185\u5b58\u4e0a\u6784\u5efa\u4e86\u8fd1\u4f3c\u6700\u8fd1\u90bb\u641c\u7d22\uff08ANNS\uff09\u7d22\u5f15\uff0c\u5e76\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u901a\u8fc7\u5411\u91cf\u641c\u7d22\u68c0\u7d22\u6700\u76f8\u5173\u7684\u90e8\u5206\u3002 
\u7531\u4e8e\u67e5\u8be2\u5411\u91cf\u4e0e\u952e\u5411\u91cf\u4e4b\u95f4\u7684\u5206\u5e03\u5916\uff08OOD\uff09\u95ee\u9898\uff0c\u73b0\u6210\u7684ANNS\u7d22\u5f15\u4ecd\u9700\u8981\u626b\u63cfO(N)\uff08\u901a\u5e38\u4e3a\u6240\u6709\u952e\u768430%\uff09\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u786e\u68c0\u7d22\uff0c\u8fd9\u65e0\u6cd5\u5145\u5206\u5229\u7528\u9ad8\u7a00\u758f\u6027\u3002RetrievalAttention\u9996\u5148\u8bc6\u522b\u4e86ANNS\u57fa\u6ce8\u610f\u529b\u4e2d\u7684OOD\u6311\u6218\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u9002\u5e94\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u611f\u77e5\u5411\u91cf\u641c\u7d22\u7b97\u6cd5\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u8be5\u7b97\u6cd5\u4ec5\u8bbf\u95ee1-3%\u7684\u6570\u636e\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u4e9a\u7ebf\u6027\u65f6\u95f4\u590d\u6742\u5ea6\u3002 RetrievalAttention\u5927\u5e45\u964d\u4f4e\u4e86\u957f\u4e0a\u4e0b\u6587LLMs\u7684\u63a8\u7406\u6210\u672c\uff0c\u540c\u65f6\u663e\u8457\u51cf\u5c11\u4e86GPU\u5185\u5b58\u9700\u6c42\uff0c\u800c\u4fdd\u6301\u4e86\u6a21\u578b\u51c6\u786e\u6027\u3002\u5c24\u5176\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cRetrievalAttention\u4ec5\u9700\u898116GB\u7684GPU\u5185\u5b58\u5373\u53ef\u4e3a\u5177\u67098B\u53c2\u6570\u7684LLM\u63d0\u4f9b\u670d\u52a1\uff0c\u652f\u6301\u5904\u7406128K\u4e2a\u4ee4\u724c\uff0c\u80fd\u591f\u5728\u5355\u4e2aNVIDIA RTX4090\uff0824GB\uff09\u4e0a\u751f\u6210\u4e00\u4e2a\u4ee4\u724c\u8017\u65f60.188\u79d2\u3002|\n", "2409.10506": "|**2024-09-16**|**Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models**|Momoko Shiraishi 
et.al.|[2409.10506](http://arxiv.org/abs/2409.10506)|null|\u7531\u4e8e\u73b0\u6709C\u7a0b\u5e8f\u4e2d\u7684\u5185\u5b58\u5b89\u5168\u6027\u6f0f\u6d1e\u6301\u7eed\u5a01\u80c1\u4ee5\u53caRust\u8bed\u8a00\u4f5c\u4e3aC\u8bed\u8a00\u66ff\u4ee3\u54c1\u6240\u53d7\u5230\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5c06C\u4ee3\u7801\u8f6c\u6362\u4e3aRust\u4ee3\u7801\u5b58\u5728\u5f3a\u70c8\u7684\u52a8\u673a\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u901a\u8fc7\u751f\u6210\u6bd4\u57fa\u4e8e\u89c4\u5219\u65b9\u6cd5\u66f4\u81ea\u7136\u3001\u66f4\u5b89\u5168\u7684\u4ee3\u7801\u6765\u81ea\u52a8\u5316\u8fd9\u4e00\u7ffb\u8bd1\u8fc7\u7a0b\u65b9\u9762\u663e\u793a\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u751f\u6210\u7684Rust\u4ee3\u7801\u5f80\u5f80\u65e0\u6cd5\u7f16\u8bd1\uff0c\u5373\u4f7f\u662f\u76f8\u5bf9\u8f83\u5c0f\u7684C\u7a0b\u5e8f\uff0c\u8fd9\u4e3b\u8981\u5f52\u56e0\u4e8e\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u7684\u663e\u8457\u5dee\u5f02\u548c\u4e0a\u4e0b\u6587\u7a97\u53e3\u9650\u5236\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u7ffb\u8bd1\u65b9\u6848\uff0c\u4ee5\u63d0\u9ad8\u5927\u89c4\u6a21C\u4ee3\u7801\u6210\u529f\u8f6c\u5316\u4e3a\u53ef\u7f16\u8bd1\u7684Rust\u4ee3\u7801\u7684\u6982\u7387\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6280\u672f\uff1a\uff081\uff09\u9884\u5904\u7406C\u4ee3\u7801\uff0c\u4f7f\u5176\u7ed3\u6784\u548c\u8868\u8fbe\u5f0f\u66f4\u597d\u5730\u4e0eRust\u5bf9\u9f50\uff1b\uff082\uff09\u5c06\u4ee3\u7801\u5206\u5272\u4e3a\u6700\u4f73\u5927\u5c0f\u7684\u7ffb\u8bd1\u5355\u5143\uff0c\u4ee5\u907f\u514d\u8d85\u51faLLM\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\u9650\u5236\uff1b\uff083\uff09\u901a\u8fc7\u4f7f\u7528\u4e0a\u4e0b\u6587\u8865\u5145\u63d0\u793a\uff0c\u8fed\u4ee3\u7f16\u8bd1\u5e76\u4fee\u590d\u9519\u8bef\uff0c\u540c\u65f6\u4fdd\u6301\u4e0d\u540c\u7ffb\u8bd1\u5355\u5143\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u3002\u6210\u529f\u7f16\u8bd1\u662f\u5b9e\u73b0\u529f\u80fd\u7b49\u6548\u6027\u7684\u9996\u8981\u6b65\u9aa4\uff0c\u56e0\u4e3a\u53ea\u6709\u53ef\u7f16\u8bd1\u7684\u4ee3\u7801\u624d\u80fd\u8fdb\u4e00\u6b65\u8fdb\u884c\u6d4b\u8bd5\u3002 \u572820\u4e2a\u57fa\u51c6C\u7a0b\u5e8f\u7684\u5b9e\u9a8c\u4e2d\uff0c\u5305\u62ec\u90a3\u4e9b\u8d85\u8fc74\u5343\u884c\u4ee3\u7801\u7684\u7a0b\u5e8f\uff0c\u6211\u4eec\u6210\u529f\u5730\u5c06\u6240\u6709\u7a0b\u5e8f\u8f6c\u5316\u4e3a\u53ef\u7f16\u8bd1\u7684Rust\u4ee3\u7801\uff0c\u6ca1\u6709\u4e22\u5931\u539f\u59cb\u4ee3\u7801\u7684\u5bf9\u5e94\u90e8\u5206\u3002|\n", "2409.10504": "|**2024-09-16**|**DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction**|John Wu 
et.al.|[2409.10504](http://arxiv.org/abs/2409.10504)|null|\u5728\u533b\u5b66\u7f16\u7801\u7b49\u9ad8\u7ef4\u6216\u591a\u6807\u7b7e\u9884\u6d4b\u4efb\u52a1\u4e2d\uff0c\u65e2\u9700\u8981\u9884\u6d4b\u7684\u51c6\u786e\u6027\u4e5f\u9700\u8981\u89e3\u91ca\u7684\u53ef\u8bfb\u6027\u3002\u73b0\u6709\u7814\u7a76\u5f80\u5f80\u4f9d\u8d56\u4e8e\u5c40\u90e8\u89e3\u91ca\u65b9\u6cd5\uff0c\u65e0\u6cd5\u63d0\u4f9b\u6574\u4e2a\u591a\u6807\u7b7e\u96c6\u5185\u6bcf\u4e2a\u6807\u7b7e\u9884\u6d4b\u80cc\u540e\u7684\u5168\u9762\u673a\u5236\u89e3\u91ca\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDIctionary Label Attention\uff08\u7b80\u79f0DILA\uff09\u7684\u6a21\u5757\u5316\u89e3\u91ca\u65b9\u6cd5\uff0c\u7528\u4e8e\u5c06\u4e0d\u53ef\u89e3\u91ca\u7684\u5bc6\u96c6\u5d4c\u5165\u5206\u89e3\u5230\u7a00\u758f\u5d4c\u5165\u7a7a\u95f4\u4e2d\u3002\u5728\u8be5\u7a7a\u95f4\u4e2d\uff0c\u975e\u96f6\u5143\u7d20\uff08\u5b57\u5178\u7279\u5f81\uff09\u4ee3\u8868\u4e86\u5168\u5c40\u5b66\u4e60\u7684\u533b\u7597\u6982\u5ff5\u3002 
\u901a\u8fc7\u4eba\u5de5\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0\u6211\u4eec\u7684\u7a00\u758f\u5d4c\u5165\u6bd4\u5176\u5bc6\u96c6\u5bf9\u5e94\u7269\u5728\u4eba\u7c7b\u7406\u89e3\u4e0a\u81f3\u5c11\u63d0\u9ad8\u4e8650%\u3002\u6211\u4eec\u7684\u81ea\u52a8\u5b57\u5178\u7279\u5f81\u8bc6\u522b\u7ba1\u9053\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u901a\u8fc7\u68c0\u67e5\u5e76\u603b\u7ed3\u6bcf\u4e2a\u5b57\u5178\u7279\u5f81\u6fc0\u6d3b\u7684\u6700\u9ad8\u7ea7\u8bcd\u6c47\uff0c\u63ed\u793a\u4e86\u6570\u5343\u4e2a\u5b66\u4e60\u5230\u7684\u533b\u7597\u6982\u5ff5\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7a00\u758f\u7684\u53ef\u89e3\u91ca\u77e9\u9635\u8868\u793a\u5b57\u5178\u7279\u5f81\u4e0e\u533b\u7597\u4ee3\u7801\u4e4b\u95f4\u7684\u5173\u7cfb\uff0c\u8fd9\u4e0d\u4ec5\u589e\u5f3a\u4e86\u6a21\u578b\u9884\u6d4b\u7684\u673a\u5236\u6027\u548c\u5168\u5c40\u7406\u89e3\u80fd\u529b\uff0c\u800c\u4e14\u5728\u4e0d\u9700\u8981\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u60c5\u51b5\u4e0b\uff0c\u4fdd\u6301\u4e86\u7ade\u4e89\u529b\u548c\u53ef\u6269\u5c55\u6027\u3002|\n", "2409.10502": "|**2024-09-16**|**Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles**|Kulin Shah 
et.al.|[2409.10502](http://arxiv.org/abs/2409.10502)|**[link](https://github.com/kulinshah98/llm-reasoning-logic-puzzles)**|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8eTransformer\u67b6\u6784\u7684\u56e0\u679c\u8bed\u8a00\u5efa\u6a21\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u662f\u5426\u771f\u6b63\u53d1\u5c55\u51fa\u4e86\u57fa\u672c\u7684\u641c\u7d22\u548c\u63a8\u7406\u80fd\u529b\uff0c\u4ecd\u662f\u4e00\u4e2a\u6301\u7eed\u8ba8\u8bba\u7684\u8bdd\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u56e0\u679c\u8bed\u8a00\u5efa\u6a21\u80fd\u5426\u5b66\u4f1a\u89e3\u51b3\u590d\u6742\u7684\u6570\u72ec\u8c1c\u9898\u8fd9\u4e00\u4efb\u52a1\u3002\u89e3\u51b3\u6570\u72ec\u8c1c\u9898\u9700\u8981\u6a21\u578b\u9996\u5148\u5728\u6240\u6709\u7a7a\u767d\u5355\u5143\u683c\u4e2d\u8fdb\u884c\u641c\u7d22\u4ee5\u51b3\u5b9a\u586b\u5145\u54ea\u4e2a\u5355\u5143\u683c\uff0c\u7136\u540e\u5e94\u7528\u9002\u5f53\u7684\u7b56\u7565\u6765\u586b\u5145\u9009\u5b9a\u7684\u5355\u5143\u683c\u3002\u6709\u65f6\uff0c\u7b56\u7565\u7684\u5e94\u7528\u4ec5\u5bfc\u81f4\u5355\u5143\u683c\u53ef\u80fd\u503c\u7684\u51cf\u5c11\uff0c\u800c\u975e\u786e\u5b9a\u786e\u5207\u503c\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u9700\u8981\u5bf9\u5355\u4e2a\u5355\u5143\u683c\u5e94\u7528\u591a\u4e2a\u7b56\u7565\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u7ecf\u8fc7\u903b\u8f91\u6b65\u9aa4\u5e8f\u5217\u8bad\u7ec3\u7684Transformer\u6a21\u578b\u786e\u5b9e\u80fd\u591f\u5b66\u4f1a\u89e3\u51b3\u6570\u72ec\u8c1c\u9898\uff08\u6211\u4eec\u7684\u6a21\u578b\u6b63\u786e\u89e3\u51b3\u4e8694.21%\u7684\u8c1c\u9898\uff09\u3002\u6211\u4eec\u8fd8\u5bf9Zebra\u8c1c\u9898\uff08\u53c8\u79f0\u7231\u56e0\u65af\u5766\u8c1c\u9898\uff09\u8fdb\u884c\u4e86\u6269\u5c55\u5206\u6790\uff0c\u5e76\u8bc1\u660e\u6a21\u578b\u80fd\u591f\u6b63\u786e\u89e3\u51b392.04%\u7684\u8c1c\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u8bad\u7
ec3\u540e\u7684Transformer\u5185\u90e8\u8868\u793a\uff0c\u5e76\u901a\u8fc7\u7ebf\u6027\u63a2\u67e5\u53d1\u73b0\uff0c\u53ef\u4ee5\u4ece\u5b83\u4eec\u4e2d\u89e3\u7801\u51fa\u7ed9\u5b9a\u5355\u5143\u683c\u7684\u6240\u6709\u53ef\u80fd\u503c\u4fe1\u606f\uff0c\u8fd9\u8868\u660eTransformer\u6743\u91cd\u4e2d\u9690\u542b\u7740\u5f3a\u5927\u7684\u63a8\u7406\u5f15\u64ce\u3002|\n", "2409.10490": "|**2024-09-16**|**Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models**|Shaznin Sultana et.al.|[2409.10490](http://arxiv.org/abs/2409.10490)|null|\u8fd1\u5e74\u6765\uff0c\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u5bf9\u5f00\u6e90\u9879\u76ee\u4f9d\u8d56\u7684\u589e\u52a0\u5bfc\u81f4\u4e86\u6f0f\u6d1e\u95ee\u9898\u7684\u663e\u8457\u589e\u957f\uff0c\u8fd9\u4e00\u73b0\u8c61\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bc6\u522b\u4ee3\u7801\u5e93\u4e2d\u7684\u6f0f\u6d1e\u65b9\u9762\u7684\u80fd\u529b\u4e0e\u6548\u679c\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u65b0\u5174LLM\u6280\u672f\u7684\u6700\u65b0\u8fdb\u5c55\u3002\u901a\u8fc7\u5bf9\u6bd4\u5206\u6790\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u5305\u62ecLlama\u3001CodeLlama\u3001Gemma\u548cCodeGemma\u5728\u5185\u7684\u6700\u8fd1\u52a0\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u53caBERT\u3001RoBERTa\u548cGPT-3\u7b49\u73b0\u6709\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728\u68c0\u6d4b\u8f6f\u4ef6\u5b89\u5168\u6f0f\u6d1e\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7814\u7a76\u76ee\u6807\u662f\u63ed\u793aLLM\u5728\u6f0f\u6d1e\u68c0\u6d4b\u9886\u57df\u7684\u80fd\u529b\uff0c\u4ece\u800c\u4fc3\u8fdb\u4e0d\u540c\u5f00\u6e90\u4ed3\u5e93\u7684\u5b89\u5168\u5b9e\u8df5\u63d0\u5347\u3002\u7ed3\u679c\u663e\u793a\uff0cCodeGemma\u5728\u68c0\u6d4b\u8f6f\u4ef6\u5b89\u5168\u6f0f\u6d1e\u65b9\u9762\u53d6\u5f97\u4e86\u6700\u9ad8\u7684F1\u5206\u6570\uff0858%\uff09\u548c\u53ec\u56de\u7387\uff0887%\uff09\u3002|\n", 
"2409.10484": "|**2024-09-16**|**XLM for Autonomous Driving Systems: A Comprehensive Review**|Sonda Fourati et.al.|[2409.10484](http://arxiv.org/abs/2409.10484)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4fe1\u606f\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u80fd\u529b\u3002\u8fd9\u4e9b\u4efb\u52a1\u6db5\u76d6\u4e86\u4ece\u6570\u636e\u63d0\u53d6\u548c\u6587\u732e\u603b\u7ed3\u5230\u5185\u5bb9\u751f\u6210\u3001\u9884\u6d4b\u5efa\u6a21\u3001\u51b3\u7b56\u5236\u5b9a\u4ee5\u53ca\u7cfb\u7edf\u63a7\u5236\u7b49\u591a\u4e2a\u65b9\u9762\u3002\u6b64\u5916\uff0c\u89c6\u89c9\u5927\u578b\u6a21\u578b\uff08VLMs\uff09\u548c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\uff0c\u5373XLMs\uff0c\u80fd\u591f\u7ed3\u5408\u591a\u79cd\u6570\u636e\u6a21\u6001\uff0c\u5e76\u5229\u7528\u8bed\u8a00\u7406\u89e3\u7684\u5f3a\u5927\u529b\u91cf\uff0c\u4ece\u800c\u63a8\u52a8\u4e86\u8bf8\u5982\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\uff08ADS\uff09\u7b49\u57fa\u4e8e\u4fe1\u606f\u7cfb\u7edf\u7684\u8fdb\u6b65\u3002\u901a\u8fc7\u5c06\u8bed\u8a00\u901a\u4fe1\u4e0e\u591a\u6a21\u5f0f\u611f\u5b98\u8f93\u5165\uff08\u5982\u5168\u666f\u56fe\u50cf\u548c\u6fc0\u5149\u96f7\u8fbe\u6216\u96f7\u8fbe\u6570\u636e\uff09\u76f8\u7ed3\u5408\uff0c\u53ef\u4ee5\u91c7\u53d6\u51c6\u786e\u7684\u9a7e\u9a76\u884c\u52a8\u3002\u5728\u6b64\u80cc\u666f\u4e0b\uff0c\u672c\u6587\u7efc\u8ff0\u4e86XLMs\u5728\u5b9e\u73b0\u81ea\u52a8\u9a7e\u9a76\u65b9\u9762\u7684\u6f5c\u529b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u56de\u987e\u4e86ADS\u548cXLMs\u7684\u76f8\u5173\u6587\u732e\uff0c\u5305\u62ec\u5b83\u4eec\u7684\u67b6\u6784\u3001\u5de5\u5177\u548c\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u8be6\u7ec6\u9610\u8ff0\u4e86\u90e8\u7f72XLMs\u4ee5\u5b9e\u73b0\u81ea\u52a8\u9a7e\u9a76\u89e3\u51b3\u65b9\u6848\u7684\u65b9\u6cd5\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86XLM\u90e8\u7f72\u5728ADS\u4e2d\u7684\u76f8\u5173\u6311\u6218\uff0c\u5e76\u63d0\u51f
a\u4e86\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u65e8\u5728\u4fc3\u8fdbXLM\u5728\u672a\u6765ADS\u6846\u67b6\u4e2d\u7684\u5e94\u7528\u3002|\n", "2409.10482": "|**2024-09-17**|**Schrodinger's Memory: Large Language Models**|Wei Wang et.al.|[2409.10482](http://arxiv.org/abs/2409.10482)|null|\u8bb0\u5fc6\u662f\u4eba\u7c7b\u6d3b\u52a8\u7684\u57fa\u7840\uff1b\u6ca1\u6709\u8bb0\u5fc6\uff0c\u51e0\u4e4e\u4e0d\u53ef\u80fd\u6267\u884c\u65e5\u5e38\u751f\u6d3b\u4e2d\u7684\u4efb\u4f55\u4efb\u52a1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u7684\u8bed\u8a00\u80fd\u529b\u6b63\u53d8\u5f97\u8d8a\u6765\u8d8a\u63a5\u8fd1\u4eba\u7c7b\u3002\u4f46LLMs\u6709\u8bb0\u5fc6\u5417\uff1f\u6839\u636e\u5f53\u524d\u7684\u8868\u73b0\uff0cLLMs\u786e\u5b9e\u663e\u793a\u51fa\u5177\u6709\u8bb0\u5fc6\u7684\u8ff9\u8c61\u3002\u90a3\u4e48\uff0c\u8fd9\u79cd\u8bb0\u5fc6\u673a\u5236\u80cc\u540e\u662f\u4ec0\u4e48\u539f\u7406\u5462\uff1f\u76ee\u524d\u7684\u7814\u7a76\u7f3a\u4e4f\u5bf9LLMs\u8bb0\u5fc6\u80fd\u529b\u548c\u5e95\u5c42\u7406\u8bba\u7684\u6df1\u5165\u63a2\u8ba8\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u6cdb\u903c\u8fd1\u5b9a\u7406\uff08UAT\uff09\u6765\u89e3\u91caLLMs\u7684\u8bb0\u5fc6\u673a\u5236\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u5b9e\u9a8c\u6765\u9a8c\u8bc1\u5404\u79cdLLMs\u7684\u8bb0\u5fc6\u80fd\u529b\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u8fd9\u4e9b\u8bb0\u5fc6\u80fd\u529b\u7684\u65b0\u65b9\u6cd5\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u80fd\u529b\u3002\u6211\u4eec\u8ba4\u4e3a\uff0cLLMs\u7684\u8bb0\u5fc6\u5de5\u4f5c\u65b9\u5f0f\u7c7b\u4f3c\u4e8e\u859b\u5b9a\u8c14\u7684\u8bb0\u5fc6\uff0c\u5373\u53ea\u6709\u5728\u67e5\u8be2\u7279\u5b9a\u8bb0\u5fc6\u65f6\u624d\u4f1a\u663e\u73b0\u51fa\u6765\u3002\u6211\u4eec\u53ea\u80fd\u901a\u8fc7\u54cd\u5e94\u67e5\u8be2\u7684\u8f93\u51fa\u6765\u786e\u5b9a\u6a21\u578b\u662f\u5426\u4fdd\u7559\u4e86\u8bb0\u5fc6\uff1b\u5426\u5219\uff0c\u5b83\u4ecd\u7136\u662f\u4e0d\u78
6e\u5b9a\u7684\u3002\u6700\u540e\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u8fd9\u4e00\u6982\u5ff5\uff0c\u901a\u8fc7\u6bd4\u8f83\u4eba\u8111\u548cLLMs\u7684\u8bb0\u5fc6\u80fd\u529b\uff0c\u5f3a\u8c03\u4e86\u5b83\u4eec\u5728\u64cd\u4f5c\u673a\u5236\u4e0a\u7684\u76f8\u4f3c\u6027\u548c\u5dee\u5f02\u6027\u3002|\n", "2409.10444": "|**2024-09-16**|**LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning**|Jicong Ao et.al.|[2409.10444](http://arxiv.org/abs/2409.10444)|**[link](https://github.com/proneverfake/kios)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cLLM\u4f5c\u4e3a\u884c\u4e3a\u6811\u89c4\u5212\u5668\u201d\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u673a\u5668\u4eba\u88c5\u914d\u4efb\u52a1\u89c4\u5212\u4e0e\u6267\u884c\u4e2d\u7684\u884c\u4e3a\u6811\uff08BT\uff09\u751f\u6210\u3002\u6211\u4eec\u5f15\u5165\u4e86\u56db\u79cd\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLMs\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u548c\u63a8\u7406\u80fd\u529b\uff0c\u4ee5BT\u683c\u5f0f\u4ea7\u751f\u4efb\u52a1\u8ba1\u5212\uff0c\u4ece\u800c\u51cf\u5c11\u4eba\u5de5\u52aa\u529b\u5e76\u786e\u4fdd\u5176\u7a33\u5065\u6027\u548c\u53ef\u7406\u89e3\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u5bf9\u540c\u4e00\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\u7684\u53c2\u6570\u8f83\u5c11\u7684LLMs\u7684\u8868\u73b0\u3002\u5728\u6a21\u62df\u548c\u5b9e\u9645\u4e16\u754c\u8bbe\u7f6e\u4e0b\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u63d0\u9ad8\u4e86LLMs\u5728BT\u751f\u6210\u65b9\u9762\u7684\u6027\u80fd\uff0c\u901a\u8fc7\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u548c\u76d1\u7763\u5fae\u8c03\uff0c\u5728BT\u751f\u6210\u65b9\u9762\u663e\u8457\u63d0\u9ad8\u4e86\u6210\u529f\u7387\u3002|\n", "2409.10411": "|**2024-09-16**|**A Large-Scale Privacy Assessment of Android Third-Party SDKs**|Mark Huasong Meng 
et.al.|[2409.10411](http://arxiv.org/abs/2409.10411)|null|\u672c\u6587\u7814\u7a76\u5bf9Android\u5e73\u53f0\u4e0a\u7684\u7b2c\u4e09\u65b9\u8f6f\u4ef6\u5f00\u53d1\u5de5\u5177\u5305\uff08SDK\uff09\u8fdb\u884c\u4e86\u9488\u5bf9\u6027\u5206\u6790\uff0c\u65e8\u5728\u586b\u8865Android\u8f6f\u4ef6\u4f9b\u5e94\u94fe\u4e2d\u7684\u5173\u952e\u7a7a\u767d\uff0c\u5173\u6ce8\u4e8e\u7528\u6237\u9690\u79c1\u4fdd\u62a4\u95ee\u9898\u3002\u7814\u7a76\u4e3b\u8981\u4ece\u4e24\u4e2a\u5173\u952e\u7684SDK\u53d1\u5e03\u5e73\u53f0\uff0c\u5b98\u65b9\u5e73\u53f0\u4e0e\u5927\u578b\u66ff\u4ee3\u5e73\u53f0\uff0c\u5bf9\u5e7f\u6cdb\u4f7f\u7528\u7684158\u4e2aSDK\u8fdb\u884c\u4e86\u8c03\u67e5\u3002 \u5728\u9690\u79c1\u6cc4\u9732\u65b9\u9762\uff0c\u6211\u4eec\u53d1\u73b0\u4e86338\u4e2a\u5b9e\u4f8b\uff0c\u8868\u660e\u8fd9\u4e9bSDK\u5728\u672a\u7ecf\u6388\u6743\u7684\u60c5\u51b5\u4e0b\uff0c\u975e\u6cd5\u4f20\u8f93\u4e86\u7528\u6237\u7684\u654f\u611f\u4fe1\u606f\u3002\u8fd9\u53ef\u80fd\u88ab\u7528\u4e8e\u975e\u6cd5\u76ee\u7684\uff0c\u5982\u7528\u6237\u8ffd\u8e2a\u6216\u725f\u5229\u3002 \u5728\u9690\u79c1\u5408\u89c4\u6027\u65b9\u9762\uff0c\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u8d85\u8fc730%\u7684\u88ab\u68c0\u67e5SDK\u5e76\u672a\u63d0\u4f9b\u9690\u79c1\u653f\u7b56\uff0c\u4ee5\u62ab\u9732\u5176\u6570\u636e\u5904\u7406\u5b9e\u8df5\u3002\u5bf9\u4e8e\u90a3\u4e9b\u63d0\u4f9b\u4e86\u9690\u79c1\u653f\u7b56\u7684SDK\uff0c\u670937%\u8fc7\u5ea6\u6536\u96c6\u4e86\u7528\u6237\u6570\u636e\uff0c\u800c88%\u5219\u9519\u8bef\u5730\u58f0\u79f0\u62e5\u6709\u8bbf\u95ee\u654f\u611f\u6570\u636e\u7684\u6743\u5229\u3002 \u6211\u4eec\u5728\u4e00\u5e74\u540e\u91cd\u65b0\u5ba1\u89c6\u4e86SDK\u7684\u6700\u65b0\u7248\u672c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u4ee4\u4eba\u62c5\u5fe7\u7684\u8d8b\u52bf\u5e76\u6ca1\u6709\u5f97\u5230\u6539\u5584\u3002 
\u57fa\u4e8e\u6211\u4eec\u7684\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u9879\u884c\u52a8\u5efa\u8bae\uff0c\u65e8\u5728\u964d\u4f4e\u9690\u79c1\u6cc4\u9732\u98ce\u9669\u5e76\u589e\u5f3aAndroid\u7528\u6237\u7684\u9690\u79c1\u4fdd\u62a4\u3002\u8fd9\u9879\u7814\u7a76\u4e0d\u4ec5\u5bf9\u884c\u4e1a\u63d0\u51fa\u4e86\u7d27\u8feb\u7684\u5173\u6ce8\u547c\u5401\uff0c\u4e5f\u4e3a\u672a\u6765\u7684\u76d1\u7ba1\u5e72\u9884\u63d0\u4f9b\u4e86\u5173\u952e\u89c1\u89e3\u3002|\n", "2409.10354": "|**2024-09-17**|**Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot**|Bhuvan Sachdeva et.al.|[2409.10354](http://arxiv.org/abs/2409.10354)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u533b\u7597\u4fdd\u5065\u9886\u57df\u7684\u5e94\u7528\u53ca\u5176\u9762\u4e34\u7684\u6311\u6218\uff0c\u5982\u5e7b\u89c9\u3001\u4fe1\u606f\u4e0d\u5b8c\u6574\u548c\u504f\u89c1\uff0c\u8fd9\u5f71\u54cd\u4e86\u5b83\u4eec\u7684\u53ef\u9760\u6027\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u7814\u7a76\u8005\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u6784\u5efa\u4f60\u81ea\u5df1\u7684\u4e13\u5bb6\u673a\u5668\u4eba\u201d\uff08BYOeB\uff09\u7684\u5e73\u53f0\uff0c\u5141\u8bb8\u5f00\u53d1\u4eba\u5458\u521b\u5efa\u96c6\u6210\u4e13\u5bb6\u9a8c\u8bc1\u7684LLM\u9a71\u52a8\u7684\u804a\u5929\u673a\u5668\u4eba\u3002CataractBot\u662f\u8be5\u5e73\u53f0\u7684\u7b2c\u4e00\u4e2a\u5b9e\u73b0\uff0c\u5b83\u4e13\u6ce8\u4e8e\u63d0\u4f9b\u6709\u5173\u767d\u5185\u969c\u624b\u672f\u7684\u4e13\u5bb6\u9a8c\u8bc1\u56de\u7b54\u3002\u521d\u6b65\u8bc4\u4f30\u663e\u793a\u4e86\u5176\u6f5c\u529b\uff0c\u4f46\u8be5\u7814\u7a76\u6837\u672c\u91cf\u8f83\u5c0f\u4e14\u4e3b\u8981\u4e3a\u5b9a\u6027\u5206\u6790\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5bf9CataractBot\u8fdb\u884c\u4e86\u4e3a\u671f24\u5468\u7684\u5927\u89c4\u6a21\u90e8\u7f72\uff0c\u6d89\u53ca318\u540d\u60a3\u8005\u53ca\u5176\u966a\u540c\u4eba\u5458\u53d1\u9001\u7
6841992\u6761\u6d88\u606f\uff0c\u5176\u4e2d91.71%\u7684\u56de\u7b54\u7ecf\u8fc7\u4e86\u4e03\u4f4d\u4e13\u5bb6\u7684\u9a8c\u8bc1\u3002\u901a\u8fc7\u5206\u6790\u4ea4\u4e92\u65e5\u5fd7\uff0c\u6211\u4eec\u53d1\u73b0\u533b\u7597\u95ee\u9898\u8fdc\u591a\u4e8e\u7269\u6d41\u95ee\u9898\uff0c\u5e7b\u89c9\u73b0\u8c61\u53ef\u4ee5\u5ffd\u7565\u4e0d\u8ba1\uff0c\u5e76\u4e14\u4e13\u5bb6\u8bc4\u5b9a84.52%\u7684\u533b\u7597\u56de\u7b54\u51c6\u786e\u65e0\u8bef\u3002\u968f\u7740\u77e5\u8bc6\u5e93\u901a\u8fc7\u4e13\u5bb6\u66f4\u6b63\u4e0d\u65ad\u6269\u5c55\uff0c\u7cfb\u7edf\u7684\u6027\u80fd\u5f97\u5230\u4e8619.02%\u7684\u63d0\u5347\uff0c\u51cf\u5c11\u4e86\u4e13\u5bb6\u7684\u5de5\u4f5c\u8d1f\u62c5\u3002\u8fd9\u4e9b\u53d1\u73b0\u6307\u5bfc\u672a\u6765LLM\u9a71\u52a8\u7684\u804a\u5929\u673a\u5668\u4eba\u8bbe\u8ba1\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2409.11404": "|**2024-09-17**|**AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs**|Basel Mousi et.al.|[2409.11404](http://arxiv.org/abs/2409.11404)|null|\u963f\u62c9\u4f2f\u8bed\uff0c\u4ee5\u5176\u4e30\u5bcc\u7684\u65b9\u8a00\u591a\u6837\u6027\uff0c\u4ecd\u7136\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u663e\u8457\u88ab\u4f4e\u4f30\uff0c\u5c24\u5176\u662f\u5728\u65b9\u8a00\u53d8\u4f53\u65b9\u9762\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u673a\u5668\u7ffb\u8bd1\u7ed3\u5408\u4eba\u5de5\u540e\u7f16\u8f91\u521b\u5efa\u7684\u4e03\u4e2a\u4eba\u5de5\u5408\u6210\u6570\u636e\u96c6\u6765\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u6db5\u76d6\u4e86\u73b0\u4ee3\u6807\u51c6\u963f\u62c9\u4f2f\u8bed\uff08MSA\uff09\u4ee5\u53ca\u963f\u62c9\u4f2f\u5404\u5730\u533a\u7684\u65b9\u8a00\u3002\u6211\u4eec\u63d0\u51fa\u4e86AraDiCE\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u963f\u62c9\u4f2f\u65b9\u8a00\u548c\u6587\u5316\u7406\u89e3\u4e0e\u751f\u6210\u80fd\u529b\u3002\u6211\u4eec\u7684\u7814\u7a76\u4fa7\u91cd\u4e8e\u4f4e\u8d44\u6e90\u963f\u62c9\u4f2f\u65b9\u8a00\uff0c\u5e76\u5bf9\u5176\u8fdb\u884c\u4e86\u8bc4\
u4ef7\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u9996\u6b21\u5f15\u5165\u4e86\u4e00\u4e2a\u7ec6\u7c92\u5ea6\u57fa\u51c6\uff0c\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u963f\u62c9\u4f2f\u534a\u5c9b\u3001\u57c3\u53ca\u548c\u9ece\u51e1\u7279\u5730\u533a\u4e4b\u95f4\u7684\u6587\u5316\u610f\u8bc6\uff0c\u4e3aLLM\u8bc4\u4f30\u63d0\u4f9b\u4e86\u65b0\u7684\u7ef4\u5ea6\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5c3d\u7ba1\u9488\u5bf9\u7279\u5b9a\u963f\u62c9\u4f2f\u8bed\u6a21\u578b\u5982Jais\u548cAceGPT\u5728\u65b9\u8a00\u4efb\u52a1\u4e0a\u4f18\u4e8e\u591a\u8bed\u8a00\u6a21\u578b\uff0c\u4f46\u5728\u65b9\u8a00\u8bc6\u522b\u3001\u751f\u6210\u548c\u7ffb\u8bd1\u65b9\u9762\u4ecd\u5b58\u5728\u91cd\u5927\u6311\u6218\u3002\u8fd9\u9879\u5de5\u4f5c\u8d21\u732e\u4e86\u7ea64.5\u4e07\u4e2a\u7ecf\u8fc7\u4eba\u5de5\u540e\u7f16\u8f91\u7684\u6837\u672c\u3001\u4e00\u4e2a\u6587\u5316\u57fa\u51c6\uff0c\u5e76\u5f3a\u8c03\u4e86\u6839\u636e\u7279\u5b9a\u8bad\u7ec3\u6765\u6539\u5584\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6355\u6349\u4e0d\u540c\u963f\u62c9\u4f2f\u65b9\u8a00\u548c\u6587\u5316\u80cc\u666f\u7ec6\u5fae\u5dee\u5f02\u7684\u91cd\u8981\u6027\u3002\u6211\u4eec\u5c06\u53d1\u5e03\u5728\u672c\u7814\u7a76\u4e2d\u6784\u5efa\u7684\u65b9\u8a00\u7ffb\u8bd1\u6a21\u578b\u548c\u57fa\u51c6\u3002|\n", "2409.11402": "|**2024-09-17**|**NVLM: Open Frontier-Class Multimodal LLMs**|Wenliang Dai et.al.|[2409.11402](http://arxiv.org/abs/2409.11402)|null|\u6211\u4eec\u5f15\u5165\u4e86NVLM 1.0\uff0c\u8fd9\u662f\u4e00\u4e2a\u5728\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u8fbe\u5230\u524d\u6cbf\u6c34\u5e73\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5bb6\u65cf\uff0c\u5176\u6027\u80fd\u4e0e\u9876\u7ea7\u4e13\u6709\u6a21\u578b\uff08\u5982GPT-4o\uff09\u548c\u5f00\u6e90\u6a21\u578b\uff08\u5982Llama 3-V 405B\u548cInternVL 2\uff09\u76f8\u5339\u654c\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0cNVLM 
1.0\u5728\u591a\u6a21\u6001\u8bad\u7ec3\u540e\uff0c\u5728\u4ec5\u6587\u672c\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u751a\u81f3\u8d85\u8fc7\u4e86\u5176\u80cc\u540e\u7684\u8bed\u8a00\u6a21\u578b\u57fa\u7840\u67b6\u6784\u3002 \u5728\u6a21\u578b\u8bbe\u8ba1\u65b9\u9762\uff0c\u6211\u4eec\u5bf9\u89e3\u7801\u5668\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08\u5982LLaVA\uff09\u548c\u4ea4\u53c9\u6ce8\u610f\u529b\u578b\u6a21\u578b\uff08\u5982Flamingo\uff09\u8fdb\u884c\u4e86\u5168\u9762\u6bd4\u8f83\u3002\u57fa\u4e8e\u8fd9\u4e24\u79cd\u65b9\u6cd5\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u67b6\u6784\uff0c\u4ee5\u63d0\u9ad8\u8bad\u7ec3\u6548\u7387\u548c\u591a\u6a21\u6001\u63a8\u7406\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u7528\u4e8e\u52a8\u6001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u76841-D\u74f7\u7816\u6807\u8bb0\u8bbe\u8ba1\uff0c\u8fd9\u663e\u8457\u63d0\u9ad8\u4e86\u591a\u6a21\u6001\u63a8\u7406\u548cOCR\u76f8\u5173\u4efb\u52a1\u7684\u6027\u80fd\u3002 
\u5173\u4e8e\u8bad\u7ec3\u6570\u636e\uff0c\u6211\u4eec\u7cbe\u5fc3\u6536\u96c6\u5e76\u63d0\u4f9b\u4e86\u6240\u6709\u67b6\u6784\u7684\u9884\u8bad\u7ec3\u548c\u76d1\u7763\u5fae\u8c03\u6570\u636e\u96c6\u7684\u8be6\u7ec6\u4fe1\u606f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\uff0c\u6570\u636e\u8d28\u91cf\u548c\u4efb\u52a1\u591a\u6837\u6027\u6bd4\u89c4\u6a21\u66f4\u4e3a\u91cd\u8981\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u4e3aNVLM-1.0\u6a21\u578b\u5f00\u53d1\u4e86\u751f\u4ea7\u7ea7\u591a\u6a21\u6001\u529f\u80fd\uff0c\u4f7f\u5b83\u4eec\u5728\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e2d\u4e0d\u4ec5\u4fdd\u6301\u751a\u81f3\u8d85\u8d8a\u4e86\u57fa\u7840\u8bed\u8a00\u6a21\u578b\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u5728\u591a\u6a21\u6001\u8bad\u7ec3\u4e2d\u5de7\u5999\u5730\u6574\u5408\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u7eaf\u6587\u672c\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u5927\u91cf\u7684\u591a\u6a21\u6001\u6570\u5b66\u548c\u63a8\u7406\u6570\u636e\uff0c\u4ece\u800c\u5728\u6240\u6709\u6a21\u6001\u4e0b\u63d0\u9ad8\u4e86\u6570\u5b66\u548c\u7f16\u7801\u80fd\u529b\u3002 \u4e3a\u4e86\u63a8\u52a8\u9886\u57df\u7814\u7a76\uff0c\u6211\u4eec\u5c06\u53d1\u5e03\u6a21\u578b\u6743\u91cd\u5e76\u5f00\u6e90\u4ee3\u7801\u4f9b\u793e\u533a\u4f7f\u7528\uff1ahttps://nvlm-project.github.io/\u3002|\n", "2409.11390": "|**2024-09-17**|**Says Who? Effective Zero-Shot Annotation of Focalization**|Rebecca M. M. 
Hicke et.al.|[2409.11390](http://arxiv.org/abs/2409.11390)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u6d4b\u8bd5\u4e86\u5f53\u524d\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e3a\u6587\u5b66\u6587\u672c\u6807\u6ce8\u7126\u70b9\u6a21\u5f0f\u65f6\u7684\u8868\u73b0\u3002\u5c3d\u7ba1\u4efb\u52a1\u5177\u6709\u6311\u6218\u6027\uff0c\u4f46\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLMs\u5728\u8fd9\u4e00\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4e0e\u53d7\u8fc7\u8bad\u7ec3\u7684\u4eba\u7c7b\u6ce8\u91ca\u8005\u76f8\u5f53\u3002\u6211\u4eec\u4ee5\u65af\u8482\u82ac\u00b7\u91d1\u7684\u5c0f\u8bf4\u4e3a\u4f8b\u8fdb\u884c\u6848\u4f8b\u7814\u7a76\uff0c\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5728\u8ba1\u7b97\u6587\u5b66\u7814\u7a76\u4e2d\u7684\u5b9e\u7528\u6027\uff0c\u8bf4\u660e\u4e86\u5982\u4f55\u5927\u89c4\u6a21\u5730\u7814\u7a76\u7126\u70b9\u6a21\u5f0f\u3002|\n", "2409.11378": "|**2024-09-17**|**Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement**|Simon Yu 
et.al.|[2409.11378](http://arxiv.org/abs/2409.11378)|**[link](https://github.com/for-ai/iterative-data-selection)**|\u7ec6\u8c03\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5728\u6307\u4ee4\u6570\u636e\u4e0a\u7684\u80fd\u529b\u5bf9\u4e8e\u589e\u5f3a\u9884\u8bad\u7ec3\u77e5\u8bc6\u548c\u63d0\u5347\u6307\u4ee4\u9075\u5faa\u80fd\u529b\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u6307\u4ee4\u6570\u636e\u96c6\u7684\u4e0d\u65ad\u589e\u591a\uff0c\u9009\u62e9\u6709\u6548\u7684\u6570\u636e\u8fdb\u884c\u6709\u6548\u8bad\u7ec3\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u786e\u5b9a\u6709\u6548\u8bad\u7ec3\u7684\u6700\u4f73\u6570\u636e\u5b50\u96c6\u3002\u73b0\u6709\u7814\u7a76\u5f80\u5f80\u4fa7\u91cd\u4e8e\u5b9e\u4f8b\u8d28\u91cf\u7b49\u5c40\u90e8\u6807\u51c6\u8fdb\u884c\u5b50\u96c6\u9009\u62e9\uff0c\u4f46\u6211\u4eec\u8ba4\u4e3a\u5168\u5c40\u89c6\u89d2\u5173\u6ce8\u6570\u636e\u591a\u6837\u6027\u66f4\u4e3a\u5173\u952e\u3002\u6211\u4eec\u91c7\u7528k\u5747\u503c\u805a\u7c7b\u65b9\u6cd5\u786e\u4fdd\u6240\u9009\u5b50\u96c6\u5145\u5206\u4ee3\u8868\u6574\u4e2a\u6570\u636e\u96c6\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u542f\u53d1\u81ea\u4e3b\u52a8\u5b66\u4e60\u6280\u672f\u7684\u8fed\u4ee3\u4f18\u5316\u65b9\u6cd5\uff0c\u7528\u4e8e\u4ece\u5404\u4e2a\u805a\u7c7b\u4e2d\u91cd\u65b0\u91c7\u6837\u5b9e\u4f8b\uff0c\u5e76\u5728\u6bcf\u4e00\u6b21\u8bad\u7ec3\u8fed\u4ee3\u4e2d\u91cd\u65b0\u8bc4\u4f30\u6bcf\u4e2a\u805a\u7c7b\u7684\u91cd\u8981\u6027\u548c\u91c7\u6837\u6743\u91cd\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u964d\u4f4e\u5f02\u5e38\u503c\u7684\u5f71\u54cd\u5e76\u81ea\u52a8\u7b5b\u9009\u51fa\u5305\u542b\u4f4e\u8d28\u91cf\u6570\u636e\u7684\u805a\u7c7b\u3002\u901a\u8fc7\u5728\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u3001\u4e00\u822c\u4e16\u754c\u77e5\u8bc6\u3001\u4ee3\u7801\u548c\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u8fdb\u884c\u5e7f\u6cdb\u8bc4\u4f30\uff0c\u5e76\u5bf9\u5404\u79cd\u6a21\u578b\u5bb6\u65cf\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u4e00\u81f4\u6027\u6539\u8fdb\uff0c\u76f8\u6bd4\u4e8e\u968f\u673a\u9009\u62e9\u63d0\u9ad8\u4e867%\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684\u91c7\u6837\u65b9\u6cd5\u63d0\u9ad8\u4e863.8%\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u5f3a\u8c03\u4e86\u5728\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee5\u589e\u5f3a\u5e7f\u6cdb\u7684\u8bc4\u4f30\u4efb\u52a1\u6027\u80fd\u65f6\uff0c\u4f18\u5148\u8003\u8651\u591a\u6837\u6027\u7684\u91c7\u6837\u65b9\u6cd5\u7684\u91cd\u8981\u6027\u3002 \u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/for-ai/iterative-data-selection\u3002|\n", "2409.11376": "|**2024-09-17**|**Towards Time Series Reasoning with LLMs**|Winnie Chow 
et.al.|[2409.11376](http://arxiv.org/abs/2409.11376)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u89c6\u89c9\u7b49\u9886\u57df\u7684\u7406\u89e3\u548c\u63a8\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u5c55\uff0c\u4f46\u65f6\u95f4\u5e8f\u5217\u9886\u57df\u5c1a\u672a\u770b\u5230\u8fd9\u79cd\u5e7f\u6cdb\u7684\u6210\u529f\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u65f6\u95f4\u5e8f\u5217MLLM\u7814\u7a76\u5728\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u4e2d\u663e\u793a\u51fa\u6709\u5e0c\u671b\u7684\u8868\u73b0\uff0c\u4f46\u5f88\u5c11\u6709\u5de5\u4f5c\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u81ea\u7136\u8bed\u8a00\u7684\u65f6\u95f4\u5e8f\u5217\u63a8\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u6a21\u6001\u65f6\u95f4\u5e8f\u5217LLM\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u8de8\u5404\u79cd\u9886\u57df\u5b66\u4e60\u901a\u7528\u4fe1\u606f\uff0c\u5e76\u5177\u6709\u5f3a\u5927\u7684\u96f6\u6837\u672c\u6027\u80fd\u3002 \u9996\u5148\uff0c\u6211\u4eec\u5728LLM\u9876\u90e8\u8bad\u7ec3\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u65f6\u95f4\u5e8f\u5217\u7f16\u7801\u5668\uff0c\u76f4\u63a5\u63d0\u53d6\u65f6\u95f4\u5e8f\u5217\u4fe1\u606f\u3002\u7136\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u589e\u5f3a\u7684\u65f6\u95f4\u5e8f\u5217\u4efb\u52a1\u5bf9\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u9f13\u52b1\u6a21\u578b\u751f\u6210\u63a8\u7406\u8def\u5f84\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u6a21\u578b\u5b66\u4e60\u5230\u7684\u6f5c\u5728\u8868\u793a\u53cd\u6620\u4e86\u7279\u5b9a\u7684\u65f6\u95f4\u5e8f\u5217\u7279\u5f81\uff08\u4f8b\u5982\u659c\u7387\u3001\u9891\u7387\uff09\uff0c\u5e76\u4e14\u5728\u591a\u79cd\u9886\u57df\u7684\u4e00\u7cfb\u5217\u96f6\u6837\u672c\u63a8\u7406\u4efb\u52a1\u4e0a\u5747\u4f18\u4e8eGPT-4o\u3002|\n", "2409.11375": "|**2024-09-17**|**Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class 
Retinal Disease Classification**|Fatema-E- Jannat et.al.|[2409.11375](http://arxiv.org/abs/2409.11375)|null|\u5728\u533b\u7597\u9886\u57df\u4e2d\uff0c\u83b7\u53d6\u5927\u91cf\u6570\u636e\u9762\u4e34\u7740\u663e\u8457\u7684\u6311\u6218\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u9690\u79c1\u95ee\u9898\u3002\u7136\u800c\uff0c\u4e3a\u4e86\u8bad\u7ec3\u7528\u4e8e\u89c6\u7f51\u819c\u75be\u75c5\u8bca\u65ad\u7684\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\uff0c\u9700\u8981\u5927\u91cf\u7684\u6570\u636e\u96c6\u3002\u5728\u8f83\u5c0f\u6570\u636e\u96c6\u4e0a\u6709\u6548\u6cdb\u5316\u7684\u80fd\u529b\u4ecd\u7136\u662f\u4e00\u4e2a\u6301\u7eed\u7684\u6311\u6218\u3002\u6570\u636e\u7a00\u7f3a\u6027\u6784\u6210\u4e86\u5b9e\u65bd\u53ef\u6269\u5c55\u533b\u7597AI\u89e3\u51b3\u65b9\u6848\u7684\u5b9e\u9645\u969c\u788d\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u7ed3\u5408\u4e86\u591a\u79cd\u6570\u636e\u6e90\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\u5e76\u589e\u5f3a\u5bf9\u65b0\u6570\u636e\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u901a\u8fc7\u8d4b\u4e88\u6a21\u578b\u4ece\u591a\u6a21\u6001\u6570\u636e\u96c6\u4e2d\u66f4\u6df1\u5165\u7406\u89e3\u6570\u636e\u8868\u793a\u7684\u80fd\u529b\u3002\u6211\u4eec\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548cSwinV2\u6846\u67b6\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u76d1\u7763\u6846\u67b6\uff0c\u4ee5\u589e\u5f3a\u6a21\u578b\u5bf9\u591a\u6a21\u6001\u6570\u636e\u96c6\u8868\u793a\u7684\u7406\u89e3\uff0c\u4ece\u800c\u63d0\u9ad8\u4f7f\u7528\u5149\u5b66\u76f8\u5e72\u65ad\u5c42\u6210\u50cf\uff08OCT\uff09\u56fe\u50cf\u68c0\u6d4b\u773c\u75c5\u7684\u80fd\u529b\u3002 
\u6211\u4eec\u91c7\u7528\u4e86\u4e24\u9636\u6bb5\u8bad\u7ec3\u65b9\u6cd5\uff0c\u5373\u81ea\u76d1\u7763\u9884\u8bad\u7ec3\u548c\u4e0b\u6e38\u76d1\u7763\u5206\u7c7b\u5668\u7684\u5fae\u8c03\u3002\u9488\u5bf9\u4e09\u79cd\u4e0d\u540c\u6570\u636e\u96c6\u8fdb\u884c\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5728\u672a\u878d\u5408\u6570\u636e\u3001\u6570\u636e\u91cf\u6709\u9650\u8bbe\u7f6e\u548c\u65e0\u81ea\u76d1\u7763\u9884\u8bad\u7ec3\u573a\u666f\u4e0b\u91c7\u7528\u4e0d\u540c\u7684\u7f16\u7801\u5668\u67b6\u6784\uff0c\u5f3a\u8c03\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u7a33\u5065\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5373\u4f7f\u5728\u8fd9\u4e9b\u591a\u6837\u5316\u7684\u6761\u4ef6\u4e0b\uff0c\u4e5f\u8868\u73b0\u51fa\u4e00\u81f4\u7684\u6027\u80fd\uff0c\u5e76\u4e14\u4e0e\u57fa\u7ebf\u6a21\u578bResNet-50\u76f8\u6bd4\uff0c\u5177\u6709\u66f4\u5f3a\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2409.11365": "|**2024-09-17**|**CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration**|Jiahui Gao et.al.|[2409.11365](http://arxiv.org/abs/2409.11365)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u9762\u5bf9\u6076\u610f\u89c6\u89c9\u8f93\u5165\u65f6\u7684\u5b89\u5168\u610f\u8bc6\u95ee\u9898\u3002MLLM\u901a\u5e38\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\uff0c\u5e76\u914d\u4ee5\u56fe\u50cf\u7f16\u7801\u5668\u5c06\u56fe\u50cf\u8f6c\u6362\u4e3a\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u76f8\u4e00\u81f4\u7684\u6587\u672c\u6570\u636e\u96c6\u4e2d\u7684\u4ee4\u724c\u5d4c\u5165\u7a7a\u95f4\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u89c6\u89c9\u6a21\u6001\u7684\u6574\u5408\u5f15\u5165\u4e86\u4e00\u79cd\u72ec\u7279\u7684\u8106\u5f31\u6027\uff1aMLLM\u5bf9\u6076\u610f\u56fe\u50cf\u8f93\u5165\u53d8\u5f97\u654f\u611f\uff0c\u5e76\u503e\u5411\u4e8e\u751f\u6210\u53ef\u80fd\u5f15\u53d1\u5b89\u5168\u6216\u6709\u5bb3\u54cd\u5e94\u7684\u8f93\u51fa\u3002 
\u7814\u7a76\u53d1\u73b0\uff0c\u901a\u8fc7\u5728MLLM\u7684\u8f93\u5165\u4e2d\u52a0\u5165\u4e00\u4e2a\u539f\u5219\uff0c\u4ee5\u660e\u786e\u5b9a\u4e49\u5b89\u5168\u6027\u8981\u6c42\uff0c\u5176\u5b89\u5168\u610f\u8bc6\u5f97\u5230\u4e86\u589e\u5f3a\u3002\u8fd9\u8bc1\u5b9e\u4e86MLLM\u5728\u5904\u7406\u56fe\u50cf\u8f93\u5165\u65f6\u5177\u6709\u4e00\u5b9a\u7684\u5b89\u5168\u610f\u8bc6\uff0c\u4f46\u8fd9\u4e00\u80fd\u529b\u53d7\u5230\u6a21\u6001\u5dee\u8ddd\u7684\u5f71\u54cd\u800c\u51cf\u5f31\u3002 \u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u6280\u672f\u2014\u2014CoCA\uff08Constitutional Calibration\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u8c03\u6574\u8f93\u51fa\u5206\u5e03\u6765\u589e\u5f3aMLLM\u7684\u5b89\u5168\u610f\u8bc6\u3002\u8be5\u7b56\u7565\u6709\u52a9\u4e8e\u6a21\u578b\u6062\u590d\u5176\u539f\u59cb\u7684\u5b89\u5168\u610f\u8bc6\uff0c\u540c\u65f6\u4e0d\u727a\u7272\u5176\u539f\u6709\u80fd\u529b\u3002\u5728\u591a\u6a21\u6001\u5b89\u5168\u6027\u548c\u7406\u89e3\u57fa\u51c6\u4e0a\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2409.11360": "|**2024-09-17**|**AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances**|Dhruv Agarwal 
et.al.|[2409.11360](http://arxiv.org/abs/2409.11360)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f53\u897f\u65b9\u5bfc\u5411\u7684AI\u6a21\u578b\u5411\u6765\u81ea\u4e0d\u540c\u6587\u5316\u80cc\u666f\u7684\u7528\u6237\u63d0\u4f9b\u5199\u4f5c\u5efa\u8bae\u65f6\u4f1a\u53d1\u751f\u4ec0\u4e48\u60c5\u51b5\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u4e2a\u8de8\u6587\u5316\u7684\u53d7\u63a7\u5b9e\u9a8c\uff0c\u5171\u6709\u6765\u81ea\u5370\u5ea6\u548c\u7f8e\u56fd\u7684118\u540d\u53c2\u4e0e\u8005\u5b8c\u6210\u4e86\u5177\u6709\u6587\u5316\u57fa\u7840\u7684\u5199\u4f5c\u4efb\u52a1\uff0c\u5e76\u5728\u6709\u65e0AI\u5efa\u8bae\u7684\u60c5\u51b5\u4e0b\u5b8c\u6210\u3002\u6211\u4eec\u7684\u5206\u6790\u663e\u793a\uff0cAI\u4e3a\u7f8e\u56fd\u4eba\u63d0\u4f9b\u4e86\u66f4\u9ad8\u7684\u6548\u7387\u589e\u76ca\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5370\u5ea6\u53c2\u4e0e\u8005\u5219\u5728\u91c7\u7528\u897f\u65b9\u5199\u4f5c\u98ce\u683c\u65b9\u9762\u53d7\u5230\u5f71\u54cd\uff0c\u4e0d\u4ec5\u6539\u53d8\u4e86\u6240\u5199\u7684\u5185\u5bb9\uff0c\u4e5f\u6539\u53d8\u4e86\u5176\u5199\u4f5c\u98ce\u683c\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u4ee5\u897f\u65b9\u4e3a\u4e2d\u5fc3\u7684AI\u6a21\u578b\u4f1a\u5c06\u5199\u4f5c\u65b9\u5f0f\u540c\u8d28\u5316\uff0c\u4f7f\u4e4b\u8d8b\u5411\u4e8e\u897f\u65b9\u89c4\u8303\uff0c\u4ece\u800c\u524a\u5f31\u4e86\u80fd\u591f\u4f53\u73b0\u6587\u5316\u5dee\u5f02\u7684\u7ec6\u5fae\u4e4b\u5904\u3002|\n", "2409.11353": "|**2024-09-17**|**THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models**|Mengfei Liang 
et.al.|[2409.11353](http://arxiv.org/abs/2409.11353)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aTHaMES\uff08\u5de5\u5177\u7528\u4e8e\u5e7b\u89c9\u7f13\u89e3\u4e0e\u8bc4\u4f30\uff09\u7684\u96c6\u6210\u6846\u67b6\u548c\u5e93\uff0c\u65e8\u5728\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u5b58\u5728\u7684\u5e7b\u89c9\u751f\u6210\u8fd9\u4e00\u65e5\u76ca\u589e\u957f\u7684\u6311\u6218\u3002\u73b0\u6709\u7684\u68c0\u6d4b\u548c\u7f13\u89e3\u65b9\u6cd5\u5f80\u5f80\u5b64\u7acb\u4e14\u65e0\u6cd5\u6ee1\u8db3\u7279\u5b9a\u9886\u57df\u7684\u9700\u8981\uff0c\u7f3a\u4e4f\u6807\u51c6\u5316\u6d41\u7a0b\u3002THaMES\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7aef\u5230\u7aef\u89e3\u51b3\u65b9\u6848\uff0c\u6db5\u76d6\u8bc4\u4f30\u548c\u7f13\u89e3LLMs\u4e2d\u5e7b\u89c9\u95ee\u9898\u7684\u5404\u4e2a\u73af\u8282\uff0c\u5305\u62ec\u81ea\u52a8\u5316\u6d4b\u8bd5\u96c6\u751f\u6210\u3001\u591a\u7ef4\u5ea6\u57fa\u51c6\u6d4b\u8bd5\u4ee5\u53ca\u7075\u6d3b\u7684\u7f13\u89e3\u7b56\u7565\u3002\u5b83\u901a\u8fc7\u6279\u91cf\u5904\u7406\u3001\u52a0\u6743\u62bd\u6837\u548c\u53cd\u4e8b\u5b9e\u9a8c\u8bc1\u7b49\u6280\u672f\u81ea\u52a8\u521b\u5efa\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u548c\u6210\u672c\u6548\u76ca\u9ad8\u7684\u6d4b\u8bd5\u96c6\u3002THaMES\u8bc4\u4f30\u4e86\u6a21\u578b\u5728\u6587\u672c\u751f\u6210\u548c\u4e8c\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u5e7b\u89c9\u68c0\u6d4b\u4e0e\u51cf\u5c11\u80fd\u529b\uff0c\u5e76\u5e94\u7528\u4e86\u6700\u4f73\u7f13\u89e3\u7b56\u7565\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u3001\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u548c\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u3002\u4f7f\u7528\u5b66\u672f\u8bba\u6587\u3001\u653f\u6cbb\u65b0\u95fb\u548c\u7ef4\u57fa\u767e\u79d1\u7684\u77e5\u8bc6\u5e93\u5bf9\u524d\u6cbfLLMs\u8fdb\u884c\u8bc4\u4f30\u53d1\u73b0\uff0c\u5546\u4e1a\u6a21\u578b\u5982GPT-4o\u5728\u53d7\u76ca\u4e8eRAG\u65b9\u9762\u6bd4ICL\u66f4\u591a\uff0c\u800c\u5f00\u6e90\u6a21\u578b\u5982Llama-
3.1-8B-Instruct\u548cMistral-Nemo\u5219\u4eceICL\u4e2d\u83b7\u5f97\u66f4\u5927\u76ca\u5904\u3002\u6b64\u5916\uff0cPEFT\u663e\u8457\u63d0\u9ad8\u4e86Llama-3.1-8B-Instruct\u5728\u8bc4\u4f30\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002|\n", "2409.11282": "|**2024-09-17**|**Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5**|Marcel Lamott et.al.|[2409.11282](http://arxiv.org/abs/2409.11282)|null|\u968f\u7740\u5404\u7c7b\u6570\u5b57\u6587\u6863\u683c\u5f0f\u7684\u6fc0\u589e\uff0c\u5c24\u5176\u662f\u90a3\u4e9b\u975e\u6807\u51c6\u5316\u7684\u6587\u6863\u5982\u5546\u4e1a\u62a5\u544a\u548c\u73af\u5883\u8bc4\u4f30\u62a5\u544a\uff0c\u6587\u6863\u7406\u89e3\u53d8\u5f97\u6108\u53d1\u91cd\u8981\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u80fd\u529b\uff0c\u4f46\u5728\u6587\u6863\u7406\u89e3\u9886\u57df\u7684\u76f4\u63a5\u5e94\u7528\u4ecd\u9762\u4e34\u6311\u6218\u3002\u4ee5\u5f80\u7684\u7814\u7a76\u8868\u660eLLMs\u5728\u8fd9\u4e00\u9886\u57df\u5177\u6709\u6f5c\u529b\uff0c\u7136\u800c\u5b83\u4eec\u5de8\u5927\u7684\u8ba1\u7b97\u9700\u6c42\u4f7f\u5176\u96be\u4ee5\u6709\u6548\u5730\u90e8\u7f72\u3002\u6b64\u5916\uff0c\u4e13\u6709\u7684\u201c\u9ed1\u76d2\u201dLLMs\u5f80\u5f80\u4f18\u4e8e\u5f00\u6e90\u7248\u672c\uff0c\u8fd9\u6784\u6210\u4e86\u5e7f\u6cdb\u53ef\u8bbf\u95ee\u6027\u7684\u969c\u788d\u3002\u672c\u6587\u6df1\u5165\u63a2\u8ba8\u4e86\u6587\u6863\u7406\u89e3\u7684\u9886\u57df\uff0c\u5229\u7528\u4e86\u4eceLLM 
ChatGPT\u5230FLAN-T5\u7684\u63d0\u70bc\u65b9\u6cd5\u6765\u5e73\u8861\u5927\u6a21\u578b\u7684\u5f3a\u5927\u529f\u80fd\u4e0e\u8ba1\u7b97\u9650\u5236\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u6574\u5408\u6807\u8bb0\u548c\u8bfe\u7a0b\u5b66\u4e60\u673a\u5236\u6765\u4fc3\u8fdb\u77e5\u8bc6\u7684\u6709\u6548\u8f6c\u79fb\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u6587\u6863\u7406\u89e3\u65b9\u6cd5\u7684\u8fdb\u5c55\u505a\u51fa\u4e86\u8d21\u732e\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5f25\u5408\u8d44\u6e90\u5bc6\u96c6\u578bLLMs\u4e0e\u5b9e\u9645\u5e94\u7528\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u5f3a\u8c03\u4e86\u63d0\u70bc\u6280\u672f\u5728\u4f7f\u590d\u6742\u8bed\u8a00\u6a21\u578b\u5728\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u5f97\u5230\u5e7f\u6cdb\u5e94\u7528\u7684\u6f5c\u529b\uff0c\u4ece\u800c\u63a8\u52a8\u81ea\u7136\u8bed\u8a00\u5904\u7406\u548c\u6587\u6863\u7406\u89e3\u9886\u57df\u7684\u53d1\u5c55\u3002|\n", "2409.12194": "|**2024-09-20**|**Gender Representation and Bias in Indian Civil Service Mock Interviews**|Somonnoy Banerjee 
et.al.|[2409.12194](http://arxiv.org/abs/2409.12194)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e09\u4e2a\u5173\u952e\u8d21\u732e\u3002\u9996\u5148\uff0c\u901a\u8fc7\u6536\u96c6\u81ea888\u4e2a\u5370\u5ea6\u516c\u52a1\u5458\u5019\u9009\u4eba\u9762\u8bd5\u6a21\u62df\u7684YouTube\u89c6\u9891\u4e2d\u768451,278\u4e2a\u95ee\u9898\u6837\u672c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5bf9\u7537\u6027\u548c\u5973\u6027\u5019\u9009\u4eba\u63d0\u95ee\u7684\u6027\u522b\u504f\u89c1\u5728\u5e7f\u6cdb\u6027\u8d28\u4e0a\u7684\u663e\u8457\u5b58\u5728\u3002\u7b2c\u4e8c\uff0c\u6211\u4eec\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5b9e\u9a8c\u63ed\u793a\u4e86\u5728\u6027\u522b\u63a8\u65ad\u4efb\u52a1\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u7684\u89e3\u91ca\u4e2d\u5b58\u5728\u5f3a\u70c8\u7684\u6027\u522b\u504f\u89c1\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5305\u542b51,278\u4e2a\u9762\u8bd5\u95ee\u9898\u7684\u65b0\u578b\u6570\u636e\u96c6\uff0c\u8fd9\u53ef\u4ee5\u4e3a\u672a\u6765\u7684\u4eba\u6587\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u63d0\u4f9b\u4fe1\u606f\u3002|\n", "2409.12183": "|**2024-09-18**|**To CoT or not to CoT? 
Chain-of-thought helps mainly on math and symbolic reasoning**|Zayne Sprague et.al.|[2409.12183](http://arxiv.org/abs/2409.12183)|**[link](https://github.com/zayne-sprague/to-cot-or-not-to-cot)**|\u4e3a\u4e86\u5206\u6790\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u5728\u54ea\u4e9b\u4efb\u52a1\u4e2d\u771f\u6b63\u6709\u76ca\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u91cf\u5316\u5143\u5206\u6790\uff0c\u8986\u76d6\u4e86\u8d85\u8fc7100\u7bc7\u4f7f\u7528CoT\u7684\u8bba\u6587\uff0c\u5e76\u5bf920\u4e2a\u6570\u636e\u96c6\u8fdb\u884c\u4e8614\u79cd\u6a21\u578b\u7684\u81ea\u6211\u8bc4\u4f30\u3002\u7ed3\u679c\u8868\u660e\uff0cCoT\u4e3b\u8981\u5728\u6570\u5b66\u6216\u903b\u8f91\u4efb\u52a1\u4e0a\u63d0\u4f9b\u663e\u8457\u6027\u80fd\u4f18\u52bf\uff0c\u800c\u5728\u5176\u4ed6\u7c7b\u578b\u4efb\u52a1\u4e0a\u7684\u589e\u76ca\u8f83\u5c0f\u3002\u5728MMLU\u4e0a\uff0c\u76f4\u63a5\u751f\u6210\u7b54\u6848\u800c\u65e0\u9700CoT\u51e0\u4e4e\u4e0eCoT\u5177\u6709\u76f8\u540c\u7684\u51c6\u786e\u6027\uff0c\u9664\u975e\u95ee\u9898\u6216\u6a21\u578b\u7684\u56de\u7b54\u5305\u542b\u7b49\u53f7\uff0c\u8fd9\u8868\u660e\u7b26\u53f7\u64cd\u4f5c\u548c\u63a8\u7406\u3002 
\u57fa\u4e8e\u8fd9\u4e00\u53d1\u73b0\uff0c\u6211\u4eec\u5206\u6790\u4e86CoT\u5728\u8fd9\u4e9b\u95ee\u9898\u4e2d\u7684\u884c\u4e3a\uff0c\u901a\u8fc7\u5206\u79bb\u89c4\u5212\u548c\u6267\u884c\uff0c\u5e76\u4e0e\u589e\u5f3a\u5de5\u5177\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6bd4\u8f83\u3002CoT\u5927\u90e8\u5206\u6536\u76ca\u6765\u81ea\u6539\u8fdb\u7684\u7b26\u53f7\u6267\u884c\uff0c\u4f46\u76f8\u8f83\u4e8e\u4f7f\u7528\u7b26\u53f7\u6c42\u89e3\u5668\uff0c\u5b83\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u53ef\u4ee5\u6839\u636e\u9700\u8981\u5e94\u7528CoT\uff0c\u540c\u65f6\u4fdd\u6301\u6027\u80fd\u5e76\u8282\u7701\u63a8\u7406\u6210\u672c\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u7ed3\u679c\u8fd8\u8868\u660e\uff0c\u9700\u8981\u8d85\u8d8a\u57fa\u4e8e\u63d0\u793a\u7684CoT\uff0c\u8f6c\u5411\u65b0\u7684\u8303\u5f0f\uff0c\u66f4\u597d\u5730\u5229\u7528\u6574\u4e2a\u8303\u56f4\u5185\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5e94\u7528\u4e2d\u7684\u4e2d\u95f4\u8ba1\u7b97\u3002|\n", "2409.12180": "|**2024-09-18**|**Finetuning Language Models to Emit Linguistic Expressions of Uncertainty**|Arslan Chaudhry 
et.al.|[2409.12180](http://arxiv.org/abs/2409.12180)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4fe1\u606f\u68c0\u7d22\u4e0e\u51b3\u7b56\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u3002\u5c3d\u7ba1LLM\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u4ef7\u503c\uff0c\u4f46\u5b83\u4eec\u503e\u5411\u4e8e\u751f\u6210\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e8b\u5b9e\u76f8\u51b2\u7a81\u7684\u4fe1\u606f\uff0c\u5e76\u4ee5\u8bf4\u670d\u6027\u7684\u65b9\u5f0f\u8868\u8fbe\uff0c\u4f7f\u5f97\u8fd9\u4e9b\u4e0d\u51c6\u786e\u6027\u770b\u8d77\u6765\u81ea\u4fe1\u4e14\u4ee4\u4eba\u4fe1\u670d\u3002\u8fd9\u5bfc\u81f4\u6700\u7ec8\u7528\u6237\u96be\u4ee5\u4e00\u81f4\u5730\u5c06LLM\u7684\u81ea\u4fe1\u5ea6\u4e0e\u9884\u6d4b\u7684\u51c6\u786e\u6027\u5bf9\u9f50\uff0c\u5e38\u5e38\u5bfc\u81f4\u5bf9\u6240\u6709\u8f93\u51fa\u7684\u76f2\u76ee\u4fe1\u4efb\u6216\u5b8c\u5168\u5ffd\u89c6\u5176\u53ef\u9760\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5728\u4e0d\u786e\u5b9a\u6027\u589e\u5f3a\u7684\u9884\u6d4b\u57fa\u7840\u4e0a\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u4ee5\u6b64\u6765\u5f00\u53d1\u80fd\u591f\u751f\u6210\u8bed\u8a00\u4e0d\u786e\u5b9a\u6027\u8868\u8ff0\u7684\u6a21\u578b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8861\u91cf\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6821\u51c6\u7a0b\u5ea6\uff0c\u7136\u540e\u901a\u8fc7\u57fa\u4e8e\u6a21\u578b\u81ea\u8eab\u4fe1\u5fc3\u7684\u5fae\u8c03\uff0c\u4f7f\u8bed\u8a00\u6a21\u578b\u4ea7\u751f\u6821\u51c6\u7684\u4e0d\u786e\u5b9a\u6027\u8868\u8ff0\u3002\u901a\u8fc7\u5bf9\u5404\u79cd\u95ee\u7b54\u6570\u636e\u96c6\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86LLM\u5728\u8bc4\u4f30\u9884\u6d4b\u65f6\u5177\u6709\u826f\u597d\u7684\u6821\u51c6\u80fd\u529b\uff0c\u5e76\u57fa\u4e8e\u6a21\u578b\u672c\u8eab\u7684\u4fe1\u5fc3\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\uff0c\u53ef\u83b7\u5f97\u7279\u522b\u9002\u7528\u4e8e\u5355\u4e2a\u58f0\u660e\u7b54\u6848\u7684\u826f\u597d\u6821\u51c6\u7684\u4e0d\u78
6e\u5b9a\u6027\u8868\u8ff0\u3002|\n", "2409.12150": "|**2024-09-18**|**Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference**|Najmeh Forouzandehmehr et.al.|[2409.12150](http://arxiv.org/abs/2409.12150)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5f3a\u5927\u8868\u8fbe\u80fd\u529b\u6765\u89e3\u51b3\u4e2a\u6027\u5316\u670d\u88c5\u63a8\u8350\u8fd9\u4e00\u590d\u6742\u6311\u6218\u3002\u901a\u8fc7\u7ec6\u8c03\u548c\u76f4\u63a5\u53cd\u9988\u96c6\u6210\uff0c\u6211\u4eec\u8bd5\u56fe\u514b\u670dLLM\u7684\u201c\u9ed1\u76d2\u201d\u7279\u6027\u548c\u9759\u6001\u6027\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4eba\u7c7b\u7f16\u76ee\u7684\u65f6\u5c1a\u56fe\u50cf\u4e0a\u4f7f\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u8fdb\u884c\u56fe\u50cf\u63cf\u8ff0\uff0c\u6765\u5f25\u5408\u9879\u76ee\u89c6\u89c9\u4e0e\u6587\u672c\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u8fd9\u4f7f\u5f97LLM\u80fd\u591f\u4ece\u4eba\u7c7b\u7f16\u76ee\u7684\u65f6\u5c1a\u56fe\u50cf\u4e2d\u63d0\u53d6\u98ce\u683c\u548c\u8272\u5f69\u7279\u5f81\uff0c\u4ece\u800c\u5f62\u6210\u4e2a\u6027\u5316\u7684\u63a8\u8350\u57fa\u7840\u3002\u6211\u4eec\u4f7f\u7528\u5f00\u6e90\u7684Polyvore\u6570\u636e\u96c6\u5bf9LLM\u8fdb\u884c\u9ad8\u6548\u7ec6\u8c03\uff0c\u4f18\u5316\u5176\u63a8\u8350\u65f6\u5c1a\u642d\u914d\u7684\u80fd\u529b\u3002\u91c7\u7528\u76f4\u63a5\u504f\u597d\u673a\u5236\u5e76\u7ed3\u5408\u8d1f\u4f8b\uff0c\u4ee5\u589e\u5f3aLLM\u7684\u51b3\u7b56\u8fc7\u7a0b\u3002\u8fd9\u521b\u5efa\u4e86\u4e00\u4e2a\u81ea\u6211\u589e\u5f3a\u7684\u4eba\u5de5\u667a\u80fd\u53cd\u9988\u5faa\u73af\uff0c\u6301\u7eed\u5730\u6839\u636e\u5b63\u8282\u6027\u65f6\u5c1a\u8d8b\u52bf\u4f18\u5316\u63a8\u8350\u3002\u6211\u4eec\u7684\u6846\u67b6\u5728Polyvore\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u9488\u5bf9\u4e24\u4e2a\u5173\u952e\u4efb\u52a1\uff1a\u8865\u5168
\u7a7a\u767d\u548c\u8f85\u52a9\u9879\u76ee\u68c0\u7d22\u3002\u8fd9\u4e9b\u8bc4\u4f30\u7ed3\u679c\u5f3a\u8c03\u4e86\u6846\u67b6\u751f\u6210\u65f6\u5c1a\u3001\u4e0e\u6f6e\u6d41\u4e00\u81f4\u7684\u670d\u88c5\u5efa\u8bae\u7684\u80fd\u529b\uff0c\u5e76\u901a\u8fc7\u76f4\u63a5\u53cd\u9988\u6301\u7eed\u6539\u8fdb\u3002\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u63d0\u8bae\u6846\u67b6\u5728\u8fd9\u4e9b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u663e\u8457\u4f18\u4e8e\u57fa\u4e8e\u539f\u59cbLLM\u7684\u670d\u88c5\u751f\u6210\uff0c\u521b\u9020\u4e86\u66f4\u52a0\u534f\u8c03\u7684\u670d\u88c5\u3002\u6539\u8fdb\u7684\u8868\u73b0\u8bc1\u660e\u4e86\u8be5\u6846\u67b6\u589e\u5f3a\u8d2d\u7269\u4f53\u9a8c\u3001\u63d0\u4f9b\u51c6\u786e\u5efa\u8bae\u7684\u6f5c\u529b\uff0c\u8bc1\u660e\u4e86\u5b83\u76f8\u5bf9\u4e8e\u57fa\u4e8e\u539f\u59cbLLM\u7684\u670d\u88c5\u751f\u6210\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2409.12147": "|**2024-09-18**|**MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning**|Justin Chih-Yao Chen 
et.al.|[2409.12147](http://arxiv.org/abs/2409.12147)|**[link](https://github.com/dinobby/magicore)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\u53ef\u4ee5\u901a\u8fc7\u5728\u6d4b\u8bd5\u65f6\u91c7\u7528\u805a\u5408\u7b56\u7565\u8fdb\u884c\u63d0\u5347\uff0c\u5373\u751f\u6210\u591a\u4e2a\u6837\u672c\u5e76\u57fa\u4e8e\u751f\u6210\u6837\u672c\u8fdb\u884c\u6295\u7968\u3002\u867d\u7136\u8fd9\u4e9b\u7b56\u7565\u80fd\u591f\u63d0\u9ad8\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u5b58\u5728\u9971\u548c\u70b9\u3002\u6539\u8fdb\u65b9\u6cd5\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u201cRefinement\u201d\u7684\u7b56\u7565\uff0c\u901a\u8fc7\u5229\u7528LLM\u751f\u6210\u7684\u53cd\u9988\u6765\u63d0\u5347\u89e3\u51b3\u65b9\u6848\u7684\u8d28\u91cf\u3002\u7136\u800c\uff0cRefinement\u4e5f\u5e26\u6765\u4e86\u4e09\u4e2a\u5173\u952e\u6311\u6218\uff1a\uff081\uff09\u8fc7\u5ea6\u7ec6\u5316\uff1a\u5bf9\u6240\u6709\u5b9e\u4f8b\u8fdb\u884c\u7edf\u4e00\u7ec6\u5316\u53ef\u80fd\u5bfc\u81f4\u8fc7\u5ea6\u4fee\u6b63\uff0c\u4ece\u800c\u964d\u4f4e\u6574\u4f53\u6027\u80fd\u3002\uff082\uff09\u96be\u4ee5\u5b9a\u4f4d\u548c\u7ea0\u6b63\u9519\u8bef\uff1aLLM\u5177\u6709\u6709\u9650\u7684\u81ea\u6211\u7ea0\u6b63\u80fd\u529b\uff0c\u5f88\u96be\u8bc6\u522b\u5e76\u7ea0\u6b63\u81ea\u5df1\u7684\u9519\u8bef\u3002\uff083\uff09\u7ec6\u5316\u4e0d\u8db3\uff1a\u51b3\u5b9a\u9700\u8981\u591a\u5c11\u8fed\u4ee3\u7684\u7ec6\u5316\u5e76\u4e0d\u5bb9\u6613\uff0c\u8fc7\u65e9\u505c\u6b62\u53ef\u80fd\u4f1a\u8ba9\u9519\u8bef\u672a\u5f97\u5230\u89e3\u51b3\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMAgICoRe\u7684\u65b9\u6cd5\uff0c\u5b83\u901a\u8fc7\u5c06\u95ee\u9898\u96be\u5ea6\u5206\u4e3a\u7b80\u5355\u6216\u56f0\u96be\uff0c\u5e76\u4f7f\u7528\u7c97\u7c92\u5ea6\u805a\u5408\u89e3\u51b3\u7b80\u5355\u95ee\u9898\uff0c\u4f7f\u7528\u7ec6\u7c92\u5ea6\u548c\u591a\u8f6e\u8fed\u4ee3\u7ec6\u5316\u89e3\u51b3\u56f0\u96be\u95ee\u9898\uff0c\u4ee5\u907f\u514d\u8fc7\u5ea6\u7ec6\u5316\u3002\u4e3a\u4e86\u6539\u5584\u9519\u8bef\u5b9a\u4f4d\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57fa\u4e8e\u6b65\u9aa4\u7ea7\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u5206\u6570\u7684\u5916\u90e8\u8bc4\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u4e00\u4e2a\u7531\u4e09\u4e2a\u4ee3\u7406\u7ec4\u6210\u7684\u591a\u4ee3\u7406\u5faa\u73af\uff1a\u6c42\u89e3\u8005\u3001\u5ba1\u67e5\u8005\uff08\u6839\u636e\u6b65\u9aa4\u7ea7RM\u5206\u6570\u751f\u6210\u9488\u5bf9\u6027\u53cd\u9988\uff09\u4ee5\u53ca\u7ec6\u5316\u8005\uff08\u6574\u5408\u53cd\u9988\uff09\uff0c\u4ee5\u786e\u4fdd\u6709\u6548\u7ec6\u5316\u3002\u4e3a\u4e86\u786e\u4fdd\u8db3\u591f\u7684\u7ec6\u5316\uff0c\u6211\u4eec\u91cd\u65b0\u8bc4\u4f30\u66f4\u65b0\u540e\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u5728\u5fc5\u8981\u65f6\u542f\u52a8\u8fdb\u4e00\u6b65\u7684\u7ec6\u5316\u8f6e\u6b21\u3002\u6211\u4eec\u4f7f\u7528Llama-3-8B\u548cGPT-3.5\u57285\u4e2a\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86MAgICoRe\uff0c\u5e76\u5c55\u793a\u4e86\u5176\u6709\u6548\u6027\u3002\u5373\u4f7f\u53ea\u8fdb\u884c\u4e00\u6b21\u8fed\u4ee3\uff0cMAgICoRe\u4e5f\u80fd\u5728\u4f7f\u7528\u4e0d\u5230\u57fa\u7ebf\u6837\u672c\u4e00\u534a\u7684\u60c5\u51b5\u4e0b\uff0c\u5206\u522b\u8d85\u8fc7Self-Consistency\u3001Best-of-k\u548cSelf-Refine\u7b97\u6cd53.4%\u30013.2%\u548c4.0%\u3002\u4e0e\u8fed\u4ee3\u7ec6\u5316\u7684\u57fa\u7ebf\u76f8\u6bd4\uff0cMAgICoRe\u968f\u7740\u8fed\u4ee3\u6b21\u6570\u7684\u589e\u52a0\u6301\u7eed\u63d0\u9ad8\u6027\u80fd\u3002\u6700\u540e\uff0c\u
6211\u4eec\u7684\u6d88\u878d\u5b9e\u9a8c\u5f3a\u8c03\u4e86MAgICoRe\u4e2dRMs\u548c\u591a\u4ee3\u7406\u901a\u4fe1\u7684\u91cd\u8981\u6027\u3002|\n", "2409.12140": "|**2024-09-18**|**MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion**|Kalakonda Sai Shashank et.al.|[2409.12140](http://arxiv.org/abs/2409.12140)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMoRAG\u7684\u521b\u65b0\u591a\u90e8\u5206\u878d\u5408\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\uff0c\u7528\u4e8e\u57fa\u4e8e\u6587\u672c\u7684\u4eba\u4f53\u52a8\u4f5c\u751f\u6210\u3002\u6b64\u65b9\u6cd5\u901a\u8fc7\u5229\u7528\u589e\u5f3a\u7684\u8fd0\u52a8\u68c0\u7d22\u8fc7\u7a0b\u83b7\u5f97\u7684\u989d\u5916\u77e5\u8bc6\u6765\u63d0\u5347\u8fd0\u52a8\u6269\u6563\u6a21\u578b\u3002\u901a\u8fc7\u6709\u6548\u6fc0\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u6211\u4eec\u89e3\u51b3\u4e86\u8fd0\u52a8\u68c0\u7d22\u4e2d\u7684\u62fc\u5199\u9519\u8bef\u548c\u91cd\u8ff0\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u591a\u90e8\u5206\u68c0\u7d22\u7b56\u7565\u4ee5\u63d0\u9ad8\u8fd0\u52a8\u68c0\u7d22\u5728\u8bed\u8a00\u7a7a\u95f4\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u7a7a\u95f4\u7ec4\u5408\u68c0\u7d22\u5230\u7684\u52a8\u4f5c\u6765\u751f\u6210\u591a\u6837\u5316\u7684\u6837\u672c\u3002\u6b64\u5916\uff0c\u5229\u7528\u4f4e\u7ea7\u3001\u7279\u5b9a\u90e8\u5206\u7684\u8fd0\u52a8\u4fe1\u606f\uff0c\u6211\u4eec\u53ef\u4ee5\u6784\u5efa\u9488\u5bf9\u672a\u89c1\u8fc7\u6587\u672c\u63cf\u8ff0\u7684\u8fd0\u52a8\u6837\u672c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u53ef\u4ee5\u4f5c\u4e3a\u63d2\u4ef6\u6a21\u5757\u4f7f\u7528\uff0c\u4ee5\u63d0\u9ad8\u8fd0\u52a8\u6269\u6563\u6a21\u578b\u7684\u6027\u80fd\u3002\u4ee3\u7801\u3001\u9884\u8bad\u7ec3\u6a21\u578b\u548c\u89c6\u9891\u793a\u4f8b\u5c06\u5728\u4ee5\u4e0b\u7f51\u5740\u63d0\u4f9b\uff1ahttps://motion-rag.github.io/|\n", "2409.12139": 
"|**2024-09-24**|**Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models**|Sijing Chen et.al.|[2409.12139](http://arxiv.org/abs/2409.12139)|null|\u968f\u7740\u5927\u6570\u636e\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u4ee3\u7684\u5230\u6765\uff0c\u96f6\u6837\u672c\u4e2a\u6027\u5316\u5feb\u901f\u5b9a\u5236\u5df2\u6210\u4e3a\u4e00\u4e2a\u663e\u8457\u8d8b\u52bf\u3002\u672c\u62a5\u544a\u4ecb\u7ecd\u4e86Takin AudioLLM\u7cfb\u5217\u6280\u672f\u4e0e\u6a21\u578b\uff0c\u4e3b\u8981\u5305\u62ecTakin TTS\u3001Takin VC\u548cTakin Morphing\uff0c\u4e13\u95e8\u7528\u4e8e\u6709\u58f0\u8bfb\u7269\u5236\u4f5c\u3002\u8fd9\u4e9b\u6a21\u578b\u5177\u5907\u96f6\u6837\u672c\u8bed\u97f3\u751f\u6210\u80fd\u529b\uff0c\u80fd\u4ea7\u751f\u51e0\u4e4e\u4e0e\u771f\u4eba\u58f0\u97f3\u96be\u4ee5\u533a\u5206\u7684\u9ad8\u8d28\u91cf\u8bed\u97f3\uff0c\u4f7f\u5f97\u4e2a\u4eba\u53ef\u4ee5\u6839\u636e\u81ea\u8eab\u9700\u6c42\u5b9a\u5236\u8bed\u97f3\u5185\u5bb9\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ecb\u7ecdTakin TTS\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u589e\u5f3a\u795e\u7ecf\u8bed\u97f3\u7f16\u89e3\u7801\u5668\u548c\u591a\u4efb\u52a1\u8bad\u7ec3\u6846\u67b6\u7684\u795e\u7ecf\u7f16\u89e3\u7801\u8bed\u8a00\u6a21\u578b\uff0c\u80fd\u591f\u4ee5\u96f6\u6837\u672c\u65b9\u5f0f\u751f\u6210\u9ad8\u4fdd\u771f\u81ea\u7136\u8bed\u97f3\u3002\u5bf9\u4e8eTakin VC\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u5185\u5bb9\u4e0e\u97f3\u8272\u8054\u5408\u5efa\u6a21\u65b9\u6cd5\u6765\u63d0\u9ad8\u8bf4\u8bdd\u4eba\u76f8\u4f3c\u5ea6\uff0c\u5e76\u5021\u5bfc\u57fa\u4e8e\u6761\u4ef6\u6d41\u5339\u914d\u7684\u89e3\u7801\u5668\u8fdb\u4e00\u6b65\u63d0\u5347\u5176\u81ea\u7136\u6027\u548c\u8868\u8fbe\u529b\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Takin 
Morphing\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u91c7\u7528\u9ad8\u5ea6\u89e3\u8026\u4e14\u5148\u8fdb\u7684\u97f3\u8272\u4e0e\u8282\u594f\u5efa\u6a21\u65b9\u6cd5\uff0c\u4f7f\u4e2a\u4f53\u80fd\u591f\u4ee5\u7cbe\u786e\u53ef\u63a7\u7684\u65b9\u5f0f\u6839\u636e\u81ea\u5df1\u7684\u504f\u597d\u5b9a\u5236\u8bed\u97f3\u751f\u4ea7\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eecTakin AudioLLM\u7cfb\u5217\u6a21\u578b\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u6709\u5173\u8be6\u7ec6\u6f14\u793a\uff0c\u8bf7\u53c2\u9605\u3002|\n", "2409.12122": "|**2024-09-18**|**Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement**|An Yang et.al.|[2409.12122](http://arxiv.org/abs/2409.12122)|null|\u5728\u672c\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u7cfb\u5217\u6570\u5b66\u4e13\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff1aQwen2.5-Math \u548c Qwen2.5-Math-Instruct-1.5B/7B/72B\u3002Qwen2.5 \u7cfb\u5217\u7684\u6838\u5fc3\u521b\u65b0\u5728\u4e8e\u5728\u6574\u4e2a\u7ba1\u9053\u4e2d\u878d\u5165\u81ea\u6211\u63d0\u5347\u7684\u54f2\u5b66\uff0c\u5305\u62ec\u9884\u8bad\u7ec3\u3001\u540e\u5904\u7406\u548c\u63a8\u7406\u9636\u6bb5\uff1a\uff081\uff09\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\uff0c\u4f7f\u7528 Qwen2-Math-Instruct \u6765\u751f\u6210\u5927\u89c4\u6a21\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u6570\u636e\u3002\uff082\uff09\u5728\u540e\u5904\u7406\u9636\u6bb5\uff0c\u6211\u4eec\u901a\u8fc7\u4ece Qwen2-Math-Instruct \u8fdb\u884c\u5927\u91cf\u91c7\u6837\u6765\u5f00\u53d1\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u3002\u7136\u540e\uff0c\u6211\u4eec\u5c06\u6b64 RM \u5e94\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u7684\u8fed\u4ee3\u8fdb\u5316\u3002\u901a\u8fc7\u589e\u5f3a\u7684 SFT \u6a21\u578b\uff0c\u6709\u53ef\u80fd\u8fdb\u884c\u8fed\u4ee3\u8bad\u7ec3\u5e76\u66f4\u65b0 RM\uff0c\u8fdb\u800c\u6307\u5bfc SFT \u6570\u636e\u7684\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u5728\u6700\u7ec8\u7684 SFT 
\u6a21\u578b\u4e0a\uff0c\u6211\u4eec\u91c7\u7528\u7ec8\u6781 RM \u8fdb\u884c\u5f3a\u5316\u5b66\u4e60\uff0c\u4ece\u800c\u4ea7\u751f Qwen2.5-Math-Instruct \u6a21\u578b\u3002\uff083\uff09\u6b64\u5916\uff0c\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u4f7f\u7528 RM \u6765\u5f15\u5bfc\u91c7\u6837\uff0c\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002 Qwen2.5-Math-Instruct \u652f\u6301\u4e2d\u6587\u548c\u82f1\u6587\uff0c\u5e76\u5177\u6709\u9ad8\u7ea7\u6570\u5b66\u63a8\u7406\u80fd\u529b\uff0c\u5305\u62ec\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u548c\u5de5\u5177\u96c6\u6210\u63a8\u7406\uff08TIR\uff09\u3002\u6211\u4eec\u5728\u82f1\u8bed\u548c\u4e2d\u6587\u7684 10 \u4e2a\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u5982 GSM8K\u3001MATH\u3001GaoKao\u3001AMC23 \u548c AIME24\uff0c\u6db5\u76d6\u4ece\u5c0f\u5b66\u6c34\u5e73\u5230\u6570\u5b66\u7ade\u8d5b\u95ee\u9898\u7684\u5e7f\u6cdb\u96be\u5ea6\u3002|\n", "2409.12117": "|**2024-09-18**|**Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference**|Edresson Casanova 
et.al.|[2409.12117](http://arxiv.org/abs/2409.12117)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u901a\u8fc7\u5c06\u97f3\u9891\u8f6c\u6362\u4e3a\u79bb\u6563\u4ee4\u724c\u7684\u97f3\u9891\u7f16\u89e3\u7801\u5668\u65b9\u9762\u663e\u8457\u63a8\u52a8\u4e86\u97f3\u9891\u5904\u7406\uff0c\u8fd9\u4f7f\u5f97\u53ef\u4ee5\u5c06\u8bed\u8a00\u5efa\u6a21\u6280\u672f\u5e94\u7528\u4e8e\u97f3\u9891\u6570\u636e\u3002\u7136\u800c\uff0c\u97f3\u9891\u7f16\u89e3\u7801\u5668\u901a\u5e38\u4ee5\u9ad8\u5e27\u7387\u8fd0\u884c\uff0c\u5bfc\u81f4\u8bad\u7ec3\u548c\u63a8\u7406\u901f\u5ea6\u7f13\u6162\uff0c\u7279\u522b\u662f\u5728\u81ea\u56de\u5f52\u6a21\u578b\u4e2d\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f4e\u5e27\u7387\u8bed\u97f3\u7f16\u89e3\u7801\u5668\uff08LFSC\uff09\uff1a\u4e00\u79cd\u795e\u7ecf\u97f3\u9891\u7f16\u89e3\u7801\u5668\uff0c\u5b83\u5229\u7528\u6709\u9650\u6807\u91cf\u91cf\u5316\u548c\u4e0e\u5927\u578b\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u7684\u5bf9\u6297\u6027\u8bad\u7ec3\uff0c\u4ee51.89 kbps\u7684\u6bd4\u7279\u7387\u548c21.5\u5e27/\u79d2\u5b9e\u73b0\u9ad8\u8d28\u91cf\u7684\u97f3\u9891\u538b\u7f29\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u6211\u4eec\u7684\u65b0\u578b\u7f16\u89e3\u7801\u5668\u53ef\u4ee5\u4f7f\u57fa\u4e8eLLM\u7684\u6587\u672c\u5230\u8bed\u97f3\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u52a0\u5feb\u7ea6\u4e09\u500d\uff0c\u540c\u65f6\u63d0\u9ad8\u53ef\u61c2\u5ea6\u5e76\u4ea7\u751f\u4e0e\u4ee5\u5f80\u6a21\u578b\u76f8\u5f53\u7684\u8d28\u91cf\u3002|\n", "2409.12106": "|**2024-09-18**|**Measuring Human and AI Values based on Generative Psychometrics with Large Language Models**|Haoran Ye 
et.al.|[2409.12106](http://arxiv.org/abs/2409.12106)|**[link](https://github.com/value4ai/gpv)**|**\u672c\u6587\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u751f\u6210\u5fc3\u7406\u6d4b\u5ea6\uff08GPV\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u6570\u636e\u9a71\u52a8\u7684\u4ef7\u503c\u6d4b\u91cf\u8303\u5f0f\uff0c\u7406\u8bba\u57fa\u7840\u5728\u4e8e\u6587\u672c\u63ed\u793a\u7684\u9009\u62e9\u6027\u611f\u77e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u4ee5\u5b9e\u73b0\u7cbe\u786e\u7684\u611f\u77e5\u5c42\u7ea7\u4ef7\u503c\u6d4b\u91cf\uff0c\u5e76\u9a8c\u8bc1LLM\u89e3\u6790\u6587\u672c\u5f62\u6210\u611f\u77e5\u7684\u6838\u5fc3\u80fd\u529b\uff0c\u4ece\u800c\u6784\u5efaGPV\u7ba1\u9053\u7684\u57fa\u7840\u3002\u7136\u540e\uff0c\u6211\u4eec\u5c06GPV\u5e94\u7528\u4e8e\u4eba\u7c7b\u64b0\u5199\u7684\u535a\u5ba2\uff0c\u8bc1\u660e\u5176\u7a33\u5b9a\u6027\u548c\u6709\u6548\u6027\uff0c\u5e76\u4e14\u4f18\u4e8e\u5148\u524d\u7684\u5fc3\u7406\u5b66\u5de5\u5177\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06GPV\u6269\u5c55\u5230LLM\u4ef7\u503c\u6d4b\u91cf\uff0c\u901a\u8fc7\u4ee5\u4e0b\u65b9\u5f0f\u63a8\u52a8\u5f53\u524d\u6280\u672f\uff1a1\uff09\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u53ef\u6269\u5c55\u548c\u81ea\u7531\u5f62\u5f0f\u8f93\u51fa\u7684\u91cf\u5316\u65b9\u6cd5\uff0c\u4f7f\u4ef7\u503c\u6d4b\u91cf\u80fd\u591f\u9488\u5bf9\u7279\u5b9a\u60c5\u5883\uff1b2\uff09\u6bd4\u8f83\u4e86\u4e0d\u540c\u6d4b\u91cf\u65b9\u6cd5\uff0c\u63ed\u793a\u4e86\u524d\u4eba\u65b9\u6cd5\u7684\u56de\u5e94\u504f\u5dee\uff1b3\uff09\u5c1d\u8bd5\u5c06LLM\u4ef7\u503c\u4e0e\u5b89\u5168\u6027\u8054\u7cfb\u8d77\u6765\uff0c\u53d1\u73b0\u4e0d\u540c\u4ef7\u503c\u4f53\u7cfb\u7684\u9884\u6d4b\u529b\uff0c\u5e76\u5206\u6790\u5404\u79cd\u4ef7\u503c\u5bf9LLM\u5b89\u5168\u6027\u7684\u5f71\u54cd\u3002\u901a\u8fc7\u8de8\u5b66\u79d1\u52aa\u529b\uff0c\u672c\u6587\u65e8\u5728\u5229\u7528AI\u63a8\u52a8\u4e0b\u4e00\u4ee3\u5fc3\u7406\u6d4b\u5ea6\u7684\u53d1\u5c55\uff0c\u5
e76\u5229\u7528\u5fc3\u7406\u6d4b\u5ea6\u4fc3\u8fdb\u4ef7\u503c\u5bfc\u5411\u7684AI\u3002**|\n", "2409.17143": "|**2024-09-25**|**Attention Prompting on Image for Large Vision-Language Models**|Runpeng Yu et.al.|[2409.17143](http://arxiv.org/abs/2409.17143)|**[link](https://github.com/yu-rp/apiprompting)**|**\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u76f8\u6bd4\uff0c\u5927\u578b\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u8fd8\u80fd\u63a5\u53d7\u56fe\u50cf\u4f5c\u4e3a\u8f93\u5165\uff0c\u56e0\u6b64\u5c55\u793a\u4e86\u66f4\u591a\u6709\u8da3\u7684\u73b0\u8c61\u7ea7\u80fd\u529b\uff0c\u5e76\u5728\u5404\u79cd\u89c6\u89c9-\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u8868\u73b0\u3002\u53d7LLM\u4e2d\u6587\u672c\u63d0\u793a\u7684\u542f\u53d1\uff0c\u63a2\u7d22\u4e86\u589e\u5f3aLVLM\u5bf9\u89c6\u89c9\u4fe1\u606f\u611f\u77e5\u80fd\u529b\u7684\u89c6\u89c9\u63d0\u793a\u6280\u672f\u3002\u7136\u800c\uff0c\u4ee5\u5f80\u7684\u89c6\u89c9\u63d0\u793a\u6280\u672f\u4ec5\u5904\u7406\u89c6\u89c9\u8f93\u5165\u800c\u4e0d\u8003\u8651\u6587\u672c\u67e5\u8be2\uff0c\u9650\u5236\u4e86\u6a21\u578b\u9075\u5faa\u6587\u672c\u6307\u4ee4\u5b8c\u6210\u4efb\u52a1\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u5de5\u4f5c\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u6ce8\u610f\u529b\u6620\u5c04\u4e0a\u7684\u56fe\u50cf\u63d0\u793a\u201d\u7684\u65b0\u63d0\u793a\u6280\u672f\uff0c\u8be5\u6280\u672f\u7b80\u5355\u5730\u5728\u539f\u59cb\u8f93\u5165\u56fe\u50cf\u4e0a\u53e0\u52a0\u4e86\u4e00\u4e2a\u7531\u8f85\u52a9\u6a21\u578b\uff08\u5982CLIP\uff09\u751f\u6210\u7684\u3001\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u70ed\u56fe\uff0c\u5e76\u6709\u6548\u5730\u589e\u5f3a\u4e86LVLM\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u8f85\u52a9\u6a21\u578b\uff08\u5982CLIP\uff09\u4e3a\u8f93\u5165\u56fe\u50cf\u751f
\u6210\u4e00\u4e2a\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u70ed\u56fe\u3002\u7136\u540e\uff0c\u70ed\u56fe\u7b80\u5355\u5730\u4e58\u4ee5\u539f\u59cb\u56fe\u50cf\u7684\u50cf\u7d20\u503c\u6765\u83b7\u5f97\u5b9e\u9645\u8f93\u5165\u56fe\u50cf\u4f9bLVLM\u4f7f\u7528\u3002\u5728\u5404\u79cd\u89c6\u89c9-\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u6280\u672f\u7684\u6709\u6548\u6027\u3002\u4f8b\u5982\uff0c\u201c\u6ce8\u610f\u529b\u6620\u5c04\u4e0a\u7684\u56fe\u50cf\u63d0\u793a\u201d\u5206\u522b\u63d0\u9ad8\u4e86LLaVA-1.5\u5728MM-Vet\u548cLLaVA-Wild\u57fa\u51c6\u4e0a\u7684\u6027\u80fd3.8%\u548c2.9%\u3002**|\n", "2409.17141": "|**2024-09-25**|**FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression**|Fazal Mittu et.al.|[2409.17141](http://arxiv.org/abs/2409.17141)|**[link](https://github.com/fazalmittu/finezip)**|**\u672c\u6587\u6df1\u5165\u5206\u6790\u4e86\u57fa\u4e8e\u795e\u7ecf\u7f51\u7edc\u4e0eTransformer\u7684\u6587\u672c\u538b\u7f29\u6280\u672f\uff0c\u5e76\u5c06\u5176\u4e0e\u4f20\u7edf\u6587\u672c\u538b\u7f29\u7cfb\u7edf\u8fdb\u884c\u5bf9\u6bd4\u3002\u5c3d\u7ba1\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7cfb\u7edf\u5728\u538b\u7f29\u6bd4\u4e0a\u663e\u8457\u4f18\u4e8e\u4f20\u7edf\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u5728\u5b9e\u7528\u6027\u65b9\u9762\u5374\u6781\u4e3a\u6709\u9650\u3002\u4ee5Llama3-8B\u4e3a\u57fa\u7840\u7684LLM\u538b\u7f29\u7cfb\u7edf\u2014\u2014LLMZip\uff0c\u5728\u538b\u7f29\u4ec510MB\u6587\u672c\u65f6\u9700\u89819.5\u5929\u7684\u65f6\u95f4\uff0c\u5c3d\u7ba1\u538b\u7f29\u6548\u679c\u6709\u6240\u63d0\u5347\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FineZip\u2014\u2014\u4e00\u79cd\u7ed3\u5408\u5728\u7ebf\u8bb0\u5fc6\u4e0e\u52a8\u6001\u4e0a\u4e0b\u6587\u6982\u5ff5\u7684\u65b0\u578bLLM\u6587\u672c\u538b\u7f29\u7cfb\u7edf\u3002FineZip\u76f8\u8f83\u4e8eLLMZip\uff0c\u5c06\u538b\u7f29\u65f6\u95f4\u5927\u5e45\u7f29\u77ed\u81f3\u7ea64\u5c0f\u65f6\uff0c\u6027\u80fd\u63d0\u5347\u4e8654\u500d\uff0c\u4e14\u4e0e\u4f20\u7edf\u7b97\u6cd5\u538b\u7f29\u65b9\u6cd5\u76f8\u6bd4\uff0c\u5176\u538b\u7f29\u6548\u7387\u63d0\u9ad8\u4e86\u5927\u7ea650%\u3002\u901a\u8fc7\u672c\u7814\u7a76\uff0c\u6211\u4eec\u8fc8\u51fa\u4e86\u8ba9\u57fa\u4e8eLLM\u7684\u65e0\u635f\u6587\u672c\u538b\u7f29\u6210\u4e3a\u73b0\u5b9e\u7684\u7b2c\u4e00\u6b65\u3002\u5c3d\u7ba1FineZip\u5df2\u53d6\u5f97\u663e\u8457\u8fdb\u5c55\uff0c\u4f46LLM\u4ecd\u4e0d\u9002\u7528\u4e8e\u5927\u89c4\u6a21\u6587\u672c\u538b\u7f29\u3002\u6211\u4eec\u671f\u5f85\u672c\u6587\u7684\u7814\u7a76\u548c\u521b\u65b0\u80fd\u4e3a\u672a\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u94fa\u5e73\u9053\u8def\u3002**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu 
et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAXIS\u7684\u65b0\u578b\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u5e94\u7528\u7a0b\u5e8f\u7f16\u7a0b\u63a5\u53e3\uff08API\uff09\u4f18\u5148\u5904\u7406\u64cd\u4f5c\u800c\u975e\u7528\u6237\u754c\u9762\uff08UI\uff09\u64cd\u4f5c\uff0c\u4ee5\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u7684\u9ad8\u5ef6\u8fdf\u548c\u4f4e\u53ef\u9760\u6027\u95ee\u9898\u3002\u6b64\u5916\uff0cAXIS\u6846\u67b6\u8fd8\u901a\u8fc7\u81ea\u52a8\u5316\u63a2\u7d22\u5e94\u7528\u7a0b\u5e8f\u7684\u65b9\u5f0f\u4fc3\u8fdb\u4e86API\u7684\u521b\u5efa\u4e0e\u6269\u5c55\u3002 \u5728Office Word\u5e94\u7528\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u4eba\u7c7b\u76f8\u6bd4\uff0cAXIS\u5728\u4efb\u52a1\u5b8c\u6210\u65f6\u95f4\u4e0a\u7f29\u77ed\u4e8665%-70%\uff0c\u8ba4\u77e5\u8d1f\u8377\u964d\u4f4e\u4e8638%-53%\uff0c\u540c\u65f6\u4fdd\u6301\u4e8697%-98%\u7684\u51c6\u786e\u6027\u3002\u8fd9\u9879\u5de5\u4f5c\u4e3a\u4eba\u7c7b\u3001\u4ee3\u7406\u548c\u8ba1\u7b97\u673a\u4ea4\u4e92\uff08HACI\uff09\u6846\u67b6\u4ee5\u53ca\u5e94\u7528\u7a0b\u5e8f\u63d0\u4f9b\u8005\u5728LLM\u65f6\u4ee3\u7684\u65b0UI\u8bbe\u8ba1\u539f\u5219\u505a\u51fa\u4e86\u8d21\u732e\u3002\u5b83\u4e5f\u63a2\u8ba8\u4e86\u5c06\u6bcf\u4e2a\u5e94\u7528\u7a0b\u5e8f\u8f6c\u5316\u4e3a\u4ee3\u7406\u7684\u53ef\u80fd\u6027\uff0c\u4e3a\u4ee3\u7406\u4e3a\u4e2d\u5fc3\u7684\u64cd\u4f5c\u7cfb\u7edf\uff08Agent OS\uff09\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2409.17115": "|**2024-09-25**|**Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale**|Fan Zhou 
et.al.|[2409.17115](http://arxiv.org/abs/2409.17115)|**[link](https://github.com/gair-nlp/prox)**|**\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9884\u8bad\u7ec3\u9886\u57df\uff0c\u4eba\u4eec\u957f\u671f\u4ee5\u6765\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u4e13\u5bb6\u5236\u5b9a\u63d0\u5347\u6570\u636e\u8d28\u91cf\u7684\u542f\u53d1\u5f0f\u89c4\u5219\uff0c\u81f3\u4eca\u5df2\u53d1\u5c55\u51fa\u4f17\u591a\u89c4\u5219\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u89c4\u5219\u7f3a\u4e4f\u7075\u6d3b\u6027\uff0c\u65e0\u6cd5\u6709\u6548\u9488\u5bf9\u6bcf\u4e2a\u5b9e\u4f8b\u7684\u72ec\u7279\u7279\u6027\u8fdb\u884c\u8c03\u6574\u3002\u540c\u65f6\uff0c\u4e3a\u6bcf\u4e2a\u5b9e\u4f8b\u5e94\u7528\u5b9a\u5236\u89c4\u5219\u5bf9\u4e8e\u4eba\u7c7b\u4e13\u5bb6\u800c\u8a00\u662f\u4e0d\u5207\u5b9e\u9645\u7684\u3002\u672c\u6587\u5c55\u793a\u4e86\u5373\u4f7f\u662f\u53c2\u6570\u6570\u91cf\u4ec5\u67090.3B\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u4e5f\u80fd\u5c55\u73b0\u51fa\u4e0e\u4eba\u7c7b\u4e13\u5bb6\u76f8\u5f53\u7684\u6570\u636e\u4f18\u5316\u80fd\u529b\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201c\u7f16\u7a0b\u6bcf\u4f8b\u201d\uff08ProX\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5c06\u6570\u636e\u4f18\u5316\u89c6\u4e3a\u7f16\u7a0b\u4efb\u52a1\uff0c\u5141\u8bb8\u6a21\u578b\u901a\u8fc7\u751f\u6210\u5e76\u6267\u884c\u7cbe\u7ec6\u7c92\u5ea6\u7684\u64cd\u4f5c\uff08\u5982\u5b57\u7b26\u4e32\u89c4\u8303\u5316\uff09\u5bf9\u6bcf\u4e2a\u4e2a\u4f53\u5b9e\u4f8b\u8fdb\u884c\u5927\u89c4\u6a21\u4f18\u5316\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528ProX\u7b5b\u9009\u540e\u7684\u6570\u636e\u9884\u8bad\u7ec3\u7684\u6a21\u578b\uff0c\u5728\u5404\u79cd\u4e0b\u6e38\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u4f18\u4e8e\u539f\u59cb\u6570\u636e\u6216\u7531\u5176\u4ed6\u7b5b\u9009\u65b9\u6cd5\u5904\u7406\u7684\u6570\u636e\uff0c\u6027\u80fd\u63d0\u5347\u8d85\u8fc72%\u3002\u8be5\u6846\u67b6\u7684\u6709\u6548\u6027\u9002\u7528\u4e8e\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\u548c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u5305\u62ecC4\u3001RedPajama-V2\u548cFineWeb\u3002\u6b64\u5916\uff0cProX\u5728\u7279\u5b9a\u9886\u57df\u7684\u8fde\u7eed\u9884\u8bad\u7ec3\u4e2d\u8868\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff1a\u5728\u65e0\u9700\u7279\u5b9a\u9886\u57df\u8bbe\u8ba1\u7684\u60c5\u51b5\u4e0b\uff0c\u4f7f\u7528ProX\u4f18\u5316\u7684OpenWebMath\u6570\u636e\u9884\u8bad\u7ec3\u7684\u6a21\u578b\uff0c\u5728\u51c6\u786e\u6027\u4e0a\u5206\u522b\u6bd4Mistral-7B\u3001Llama-2-7B\u548cCodeLlama-7B\u63d0\u9ad8\u4e867.6%\u300114.6%\u548c20.3%\uff0c\u4ec5\u4f7f\u7528\u7ea610B\u4ee4\u724c\u5373\u53ef\u8fbe\u5230\u7c7b\u4f3c\u4e8e\u4f7f\u7528200B\u4ee4\u724c\u9884\u8bad\u7ec3\u7684Llama-7B\u6a21\u578b\u7684\u6c34\u5e73\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u663e\u793a\uff0cProX\u663e\u8457\u8282\u7701\u4e86\u8bad\u7ec3FLOPs\uff0c\u4e3a\u9ad8\u6548LLM\u9884\u8bad\u7ec3\u5f00\u8f9f\u4e86\u6709\u524d\u666f\u7684\u9053\u8def\u3002 \u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86ProX\uff0c\u5305\u62ec>100B\u7684\u8bed\u6599\u5e93\u3001\u6a21\u578b\u4ee5\u53ca\u6240\u6709\u8bad\u7ec3\u548c\u5b9e\u73b0\u7ec6\u8282\uff0c\u4ee5\u4fc3\u8fdb\u53ef\u590d\u5236\u7814\u7a76\u548c\u672a\u6765\u521b\u65b0\u3002\u4ee3\u7801\uff1ahttps://github.com/GAIR-NLP/ProX**|\n", "2409.17092": "|**2024-09-25**|**Accumulator-Aware Post-Training Quantization**|Ian Colbert 
et.al.|[2409.17092](http://arxiv.org/abs/2409.17092)|null|\u8fd1\u5e74\u6765\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u7d22\u4e86\u4f4e\u7cbe\u5ea6\u7d2f\u52a0\uff0c\u62a5\u544a\u4e86\u5728\u4e0d\u540c\u5e73\u53f0\u4e0a\u7684\u541e\u5410\u91cf\u3001\u529f\u7387\u548c\u9762\u79ef\u7684\u6539\u8fdb\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u63d0\u8bae\u4ec5\u8003\u8651\u4e86\u91cf\u5316\u611f\u77e5\u8bad\u7ec3\uff08QAT\uff09\u8303\u5f0f\uff0c\u5728\u8be5\u8303\u5f0f\u4e2d\uff0c\u6a21\u578b\u5728\u91cf\u5316\u5faa\u73af\u4e2d\u8fdb\u884c\u5fae\u8c03\u6216\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u3002\u968f\u7740\u6a21\u578b\u7ee7\u7eed\u589e\u5927\uff0cQAT\u6280\u672f\u7684\u6210\u672c\u53d8\u5f97\u8d8a\u6765\u8d8a\u9ad8\uff0c\u8fd9\u6fc0\u53d1\u4e86\u6700\u8fd1\u5bf9\u8bad\u7ec3\u540e\u91cf\u5316\uff08PTQ\uff09\u7814\u7a76\u7684\u70ed\u6f6e\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u6b63\u5f0f\u7814\u7a76PTQ\u80cc\u666f\u4e0b\u7684\u7d2f\u52a0\u5668\u611f\u77e5\u91cf\u5316\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AXE\uff0c\u4e00\u4e2a\u65e8\u5728\u8d4b\u4e88\u73b0\u6709\u5c42\u5f0fPTQ\u7b97\u6cd5\u6ea2\u51fa\u907f\u514d\u4fdd\u8bc1\u7684\u5b9e\u7528\u6846\u67b6\u7684\u6269\u5c55\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u6700\u5148\u8fdb\u7684PTQ\u7b97\u6cd5\uff1aGPFQ\u548cOPTQ\u4e4b\u4e0a\u5b9e\u73b0AXE\u6765\u7406\u8bba\u5730\u63a8\u52a8AXE\uff0c\u5e76\u8bc1\u660e\u5176\u7075\u6d3b\u6027\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u901a\u8fc7\u9996\u6b21\u652f\u6301\u591a\u9636\u6bb5\u7d2f\u52a0\u6765\u4e00\u822c\u5316AXE\uff0c\u4e3a\u5168\u6570\u636e\u8def\u5f84\u4f18\u5316\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6269\u5c55\u6253\u5f00\u5927\u95e8\u3002\u6211\u4eec\u5728\u56fe\u50cf\u5206\u7c7b\u548c\u8bed\u8a00\u751f\u6210\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86AXE\uff0c\u5e76\u89c2\u5bdf\u5230\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u5728\u7d2f\u52a0\u5668\u4f4d\u5bbd\
u4e0e\u6a21\u578b\u51c6\u786e\u6027\u7684\u6743\u8861\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u6539\u8fdb\u3002|\n", "2409.17066": "|**2024-09-25**|**VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models**|Yifei Liu et.al.|[2409.17066](http://arxiv.org/abs/2409.17066)|**[link](https://github.com/microsoft/vptq)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aVector Post-Training Quantization\uff08VPTQ\uff09\u7684\u4f4e\u6bd4\u7279\u91cf\u5316\u65b9\u6cd5\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u901a\u8fc7\u4f7f\u7528\u4e8c\u6b21\u4f18\u5316\u6765\u5b9a\u4e49LLM\u5411\u91cf\u91cf\u5316\u95ee\u9898\uff0c\u5e76\u901a\u8fc7\u89e3\u51b3\u4f18\u5316\u95ee\u9898\u6765\u6307\u5bfc\u91cf\u5316\u7b97\u6cd5\u8bbe\u8ba1\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u5f15\u5165\u4e86\u901a\u9053\u72ec\u7acb\u7684\u4e8c\u6b21\u4f18\u5316\u4ee5\u5b9e\u73b0\u7cbe\u7ec6\u5316\u91cf\u5316\u3002\u540c\u65f6\uff0c\u901a\u8fc7\u5206\u89e3\u4f18\u5316\u95ee\u9898\uff0c\u63d0\u51fa\u4e86\u7b80\u660e\u6709\u6548\u7684\u7801\u672c\u521d\u59cb\u5316\u7b97\u6cd5\u3002\u6b64\u5916\uff0cVPTQ\u8fd8\u6269\u5c55\u4e86\u6b8b\u5dee\u548c\u5f02\u5e38\u503c\u91cf\u5316\u652f\u6301\uff0c\u8fd9\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u6a21\u578b\u7cbe\u5ea6\uff0c\u8fd8\u80fd\u8fdb\u4e00\u6b65\u538b\u7f29\u6a21\u578b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0eSOTA\u76f8\u6bd4\uff0c\u57282\u6bd4\u7279\u91cf\u5316\u65f6\uff0cVPTQ\u5728LLaMA-2\u4e0a\u5c06\u6a21\u578b\u91cf\u5316\u56f0\u60d1\u5ea6\u964d\u4f4e0.01-0.34\uff0cMistral-7B\u4e0a\u4e3a0.38-0.68\uff0cLLaMA-3\u4e0a\u4e3a4.41-7.34\u3002\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u7684\u5e73\u5747\u51c6\u786e\u5ea6\u63d0\u5347\u8303\u56f4\u4e3aLLaMA-2\u4e0a\u76840.79%-1.5%\uff0cMistral-7B\u4e0a\u76841%\uff0c\u4ee5\u53caLLaMA-3\u4e0a\u768411%-22%\u3002\u91cf\u5316\u7b97\u6cd5\u6267\u884c\u65f6\u95f4\u4ec5\u536010.4%-18.6%\uff0c\u5bfc\u81f4\u63a8\u7406\u541e\u5410\u91cf\u63d0\u9ad81.6-1.8\u500d\u3002**|\n", 
"2409.17054": "|**2024-09-25**|**Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia**|Azmul Asmar Irfan et.al.|[2409.17054](http://arxiv.org/abs/2409.17054)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u672c\u5730\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8f6c\u5f55\u3001\u7ffb\u8bd1\u548c\u603b\u7ed3\u533b\u751f\u4e0e\u60a3\u8005\u7684\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528Whisper\u6a21\u578b\u8fdb\u884c\u8f6c\u5f55\uff0cGPT-3\u8fdb\u884c\u603b\u7ed3\uff0c\u5e76\u5c06\u5176\u683c\u5f0f\u5316\u4e3aePuskesmas\u533b\u7597\u8bb0\u5f55\u3002\u6b64\u7cfb\u7edf\u4f5c\u4e3a\u73b0\u6709\u7f51\u7edc\u6d4f\u89c8\u5668\u6269\u5c55\u7684\u9644\u52a0\u7ec4\u4ef6\u5b9e\u73b0\uff0c\u5141\u8bb8\u533b\u751f\u5728\u8bf4\u8bdd\u65f6\u586b\u5199\u60a3\u8005\u8868\u683c\u3002\u901a\u8fc7\u5229\u7528\u5b9e\u65f6\u8f6c\u5f55\u3001\u7ffb\u8bd1\u548c\u603b\u7ed3\u529f\u80fd\uff0c\u533b\u751f\u53ef\u4ee5\u63d0\u9ad8\u60a3\u8005\u62a4\u7406\u7684\u5468\u8f6c\u65f6\u95f4\uff0c\u540c\u65f6\u589e\u5f3a\u8bb0\u5f55\u7684\u8d28\u91cf\uff0c\u4f7f\u5f97\u8bb0\u5f55\u66f4\u52a0\u8be6\u7ec6\u4e14\u5bcc\u6709\u6d1e\u5bdf\u529b\uff0c\u4ee5\u4f9b\u672a\u6765\u7684\u8bbf\u95ee\u53c2\u8003\u3002\u8fd9\u4e00\u521b\u65b0\u65e8\u5728\u89e3\u51b3\u5370\u5c3c\u533b\u7597\u673a\u6784\u62e5\u6324\u4ee5\u53ca\u533b\u62a4\u4eba\u5458\u884c\u653f\u8d1f\u62c5\u91cd\u7684\u95ee\u9898\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u8fd9\u79cd\u89e3\u51b3\u65b9\u6848\u5c06\u5e2e\u52a9\u533b\u751f\u8282\u7701\u65f6\u95f4\u3001\u63d0\u4f9b\u66f4\u597d\u7684\u62a4\u7406\u5e76\u4ea7\u751f\u66f4\u51c6\u786e\u7684\u533b\u7597\u8bb0\u5f55\uff0c\u4ee3\u8868\u4e86\u5411\u73b0\u4ee3\u5316\u533b\u7597\u4fdd\u5065\u8fc8\u8fdb\u7684\u91cd\u8981\u4e00\u6b65\uff0c\u786e\u4fdd\u5373\u4f7f\u5728\u8d44\u6e90\u6709\u9650\u7684\u73af\u5883\u4e2d\uff0c\u60a3\u8005\u4e5f\u80fd\u83b7\u5f97\u53ca\u65f6\u3001\u9ad8\u8d28\u9
1cf\u7684\u62a4\u7406\u3002|\n", "2409.17044": "|**2024-09-25**|**How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not**|Francesco Verdini et.al.|[2409.17044](http://arxiv.org/abs/2409.17044)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u60ca\u4eba\u8868\u73b0\u63a8\u52a8\u4e86\u7814\u7a76\u52aa\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4e00\u7cfb\u5217\u4efb\u52a1\u548c\u8f93\u5165\u6a21\u6001\u3002\u5728\u8bed\u97f3\u8f6c\u6587\u672c\uff08S2T\uff09\u4efb\u52a1\u4e2d\uff0c\u65b0\u5174\u7684\u89e3\u51b3\u65b9\u6848\u662f\u901a\u8fc7\u9002\u914d\u5668\u6a21\u5757\u5c06\u8bed\u97f3\u57fa\u7840\u6a21\u578b\uff08SFM\uff09\u7684\u8f93\u51fa\u6295\u5f71\u5230LLM\u5d4c\u5165\u7a7a\u95f4\u3002\u7136\u800c\uff0c\u76ee\u524d\u8fd8\u6ca1\u6709\u5de5\u4f5c\u63a2\u8ba8\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6bcf\u4e2a\u7ec4\u4ef6\uff08SFM\u3001\u9002\u914d\u5668\u3001LLM\uff09\uff0c\u6216\u8005\u9009\u62e9\u9002\u914d\u5668\u7684\u6700\u4f73\u8bbe\u8ba1\u662f\u5426\u53d6\u51b3\u4e8e\u6240\u9009\u7684SFM\u548cLLM\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e865\u4e2a\u9002\u914d\u5668\u6a21\u5757\u30012\u4e2aLLM\uff08Mistral\u548cLlama\uff09\u4ee5\u53ca2\u4e2aSFM\uff08Whisper\u548cSeamlessM4T\uff09\u5728\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b\u548c\u8bed\u97f3\u7ffb\u8bd1\u4e24\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684S2T\u4efb\u52a1\u4e0a\u7684\u7ec4\u5408\u6548\u679c\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cSFM\u5728\u4e0b\u6e38\u6027\u80fd\u4e2d\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\uff0c\u800c\u9002\u914d\u5668\u7684\u9009\u62e9\u5177\u6709\u9002\u5ea6\u7684\u5f71\u54cd\uff0c\u5e76\u4e14\u53d6\u51b3\u4e8e\u6240\u9009\u7684SFM\u548cLLM\u3002|\n", "2409.17027": "|**2024-09-25**|**Counterfactual Token Generation in Large Language Models**|Ivi Chatzi 
et.al.|[2409.17027](http://arxiv.org/abs/2409.17027)|**[link](https://github.com/networks-learning/counterfactual-llms)**|\u672c\u6587\u65e8\u5728\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u529f\u80fd\uff0c\u4f7f\u5176\u80fd\u591f\u63a8\u7406\u8fc7\u53bb\u751f\u6210\u7684\u4ee4\u724c\u6240\u5448\u73b0\u7684\u53ef\u80fd\u66ff\u4ee3\u60c5\u51b5\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8eGumbel-Max\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u7684\u56e0\u679c\u6a21\u578b\uff0c\u4ee5\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fd9\u4e00\u529f\u80fd\u3002\u6211\u4eec\u7684\u6a21\u578b\u80fd\u591f\u5728\u51e0\u4e4e\u4e0d\u589e\u52a0\u4e0e\u57fa\u7840\u4ee4\u724c\u751f\u6210\u6210\u672c\u7684\u60c5\u51b5\u4e0b\uff0c\u8fdb\u884c\u53cd\u4e8b\u5b9e\u4ee4\u724c\u751f\u6210\uff0c\u5b9e\u73b0\u8fc7\u7a0b\u7b80\u5355\u4e14\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\u6216\u63d0\u793a\u5de5\u7a0b\u3002\u6211\u4eec\u5728\u6b64\u57fa\u7840\u4e0a\u5728Llama 3 8B-instruct\u4e0a\u5b9e\u73b0\u4e86\u8be5\u6a21\u578b\uff0c\u5e76\u5bf9\u751f\u6210\u7684\u53cd\u4e8b\u5b9e\u6587\u672c\u8fdb\u884c\u4e86\u5b9a\u6027\u548c\u5b9a\u91cf\u5206\u6790\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u53cd\u4e8b\u5b9e\u4ee4\u724c\u751f\u6210\u5728\u504f\u89c1\u68c0\u6d4b\u65b9\u9762\u7684\u5e94\u7528\uff0c\u63ed\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u7684\u4e16\u754c\u6a21\u578b\u4e2d\u7684\u4e00\u4e9b\u6709\u8da3\u89c1\u89e3\u3002|\n", "2409.17011": "|**2024-09-25**|**LLM-CARD: Towards a Description and Landscape of Large Language Models**|Shengwei Tian 
et.al.|[2409.17011](http://arxiv.org/abs/2409.17011)|**[link](https://github.com/shengwei-tian/dependency-parser-visualization)**|With the rapid development of natural language processing (NLP), large language models (LLMs) keep emerging for a wide range of NLP tasks. As the number of published papers grows, researchers and developers face information overload, so a system that automatically extracts and organizes key information about LLMs from academic papers becomes particularly important. This work pursues that goal with named entity recognition (NER) and relation extraction (RE) methods, which automatically extract key information about LLMs from papers and help researchers access it efficiently. The extracted features are the model's license, name, and application; with them, a model card can be formed for each paper. As a data contribution, 106 academic papers were processed and three dictionaries were defined: LLM names, licenses, and applications. Dictionary lookup extracted 11,051 sentences, and manual review selected 129 sentences that link a name to a license and 106 sentences that link a model name to an application.|\n", 
"2409.18127": "|**2024-09-26**|**EgoLM: Multi-Modal Language Model of Egocentric Motions**|Fangzhou Hong et.al.|[2409.18127](http://arxiv.org/abs/2409.18127)|null|As wearable devices become widespread, understanding egocentric motion is essential for developing context-aware AI. This paper presents EgoLM, a versatile framework that tracks and understands egocentric motion from multi-modal inputs such as egocentric video and motion sensors. EgoLM exploits rich context to resolve egomotion tracking and understanding, which are difficult under single-modality conditions. To enable this versatile, multi-modal framework, our key insight is to model the joint distribution of egocentric motion and natural language with a large language model (LLM). Multi-modal sensor inputs are encoded and projected into the joint latent space of the language model, then used to prompt motion generation or text generation for egomotion tracking or understanding, respectively. Extensive experiments on a large-scale multi-modal human motion dataset validate EgoLM as a generalist model for universal egocentric learning.|\n", 
"2409.18119": "|**2024-09-26**|**Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography**|Yuexi Du et.al.|[2409.18119](http://arxiv.org/abs/2409.18119)|null|Contrastive Language-Image Pre-training (CLIP) shows great potential in medical image analysis, but it requires large amounts of data and computing resources. As a result, existing CLIP applications focus mainly on modalities with abundant image-report data, such as chest X-rays, and neglect many other important modalities such as mammography. This paper is the first to adapt the full CLIP model to mammography, a task challenged by scarce labeled data, small regions of interest in high-resolution images, and data imbalance. We first develop a specialized supervision framework for mammography that exploits its multi-view nature. We then design an alignment module to better focus on detailed features in high-resolution images. Finally, we introduce a parameter-efficient fine-tuning method for large language models pre-trained with medical knowledge, to address data limitations. 
Our multi-view and multi-scale alignment (MaMA) method substantially outperforms state-of-the-art baselines on three different tasks across two large real-world mammography datasets, EMBED and RSNA-Mammo, while using only 52% of the model size of the largest baseline.|\n", "2409.18111": "|**2024-09-26**|**E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding**|Ye Liu et.al.|[2409.18111](http://arxiv.org/abs/2409.18111)|**[link](https://github.com/PolyU-ChenLab/ETBench)**|**To verify the great potential of Video Large Language Models (Video-LLMs) in general-purpose video understanding, a number of benchmarks have been proposed to diagnose model capabilities in different scenarios. However, existing benchmarks evaluate models only through video-level question answering and lack fine-grained event-level assessment and task diversity. To fill this gap, we introduce E.T. Bench (Event-Level and Time-Sensitive Video Understanding Benchmark), a large-scale, high-quality benchmark for open-ended event-level video understanding. E.T. 
Bench is organized under a three-level task taxonomy and comprises 7,300 samples across 12 tasks, with 7,000 videos (251.4 hours in total) from 8 domains, providing comprehensive evaluation. We extensively evaluate 8 image LLMs and 12 Video-LLMs, and the results show that state-of-the-art models for coarse-grained (video-level) understanding struggle with our fine-grained tasks, such as grounding events of interest within videos, mainly because of short video context length, improper time representations, and a lack of multi-event training data. To address these issues, we further propose a strong baseline model, E.T. Chat, together with an instruction-tuning dataset tailored for fine-grained event-level understanding, E.T. 
Instruct 164K. Our simple but effective solution shows superior performance across multiple scenarios.**|\n", 
"2409.18060": "|**2024-09-26**|**Infering Alt-text For UI Icons With Large Language Models During App Development**|Sabrina Haque et.al.|[2409.18060](http://arxiv.org/abs/2409.18060)|null|Ensuring the accessibility of mobile applications remains a major challenge, especially for visually impaired users who rely on screen readers. User interface icons are essential for navigation and interaction, but they often lack meaningful alt-text, which creates barriers to use. Traditional deep learning approaches to alt-text generation require large datasets and struggle with the diversity and imbalance of icon types. More recent vision language models (VLMs) require complete UI screens, which can be impractical during the iterative phases of app development. To address these issues, we introduce a new method that uses large language models (LLMs) to autonomously generate descriptive alt-text for mobile UI icons from partial UI data. By incorporating icon context, including class, resource ID, bounds, OCR-detected text, and contextual information from parent and sibling nodes, we fine-tune an off-the-shelf LLM on a small dataset of about 1,400 icons, yielding IconDesc. In empirical evaluation and a user study, IconDesc shows significant improvements in generating relevant alt-text. This makes IconDesc a valuable tool for developers, helping them iterate quickly and improve UI accessibility.|\n", 
"2409.18053": "|**2024-09-26**|**DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving**|Dingrui Wang et.al.|[2409.18053](http://arxiv.org/abs/2409.18053)|**[link](https://github.com/TUM-AVS/DualAD)**|We propose DualAD, a novel autonomous driving framework designed to imitate human reasoning during driving. DualAD consists of two layers: the bottom layer is a rule-based motion planner that handles routine driving tasks requiring little reasoning, and the upper layer features a rule-based text encoder that converts driving scenarios from absolute states into text descriptions. A large language model (LLM) then processes this text to make driving decisions. When potential danger is detected, the upper layer intervenes in the bottom layer's decisions, mimicking human reasoning in critical situations. Closed-loop experiments show that DualAD, using a zero-shot pre-trained model, significantly outperforms rule-based motion planners that lack reasoning ability. Our experiments also highlight the effectiveness of the text encoder, which considerably improves the model's scenario understanding. In addition, the integrated DualAD model improves as stronger LLMs are used, indicating the framework's potential for further enhancement. We make the code and benchmarks publicly available.|\n", 
"2409.18042": "|**2024-09-26**|**EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions**|Kai Chen et.al.|[2409.18042](http://arxiv.org/abs/2409.18042)|null|Enabling large language models to generate images, text, and speech end to end with publicly available data remains challenging in the open-source community. Existing vision-language models rely on external tools for speech processing, while speech-language models still lack visual understanding. To bridge this gap, we propose EMOVA (EMotionally Omni-present Voice Assistant), which gives large language models end-to-end speech capabilities while maintaining leading vision-language performance. With a semantic-acoustic disentangled speech tokenizer, we find, surprisingly, that omni-modal alignment further enhances vision-language and speech abilities compared with the corresponding bi-modal aligned counterparts. We also propose a lightweight style module for flexible control of speech style (for example, emotion and pitch). For the first time, EMOVA achieves state-of-the-art performance on both vision-language and speech benchmarks while supporting omni-modal spoken dialogue with vivid emotions.|\n", 
"2409.18028": "|**2024-09-26**|**Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective**|Yotam Wolf et.al.|[2409.18028](http://arxiv.org/abs/2409.18028)|null|A common practice when using large language models (LLMs) for complex analytical tasks such as code generation is to sample a solution for the entire task within the model's context window. Previous work has shown that decomposing the task within the model's context (chain of thought) helps in solving such tasks. This paper points out a limitation in the ability of LLMs to perform several sub-tasks within the same context window, an in-context 'hardness of composition', which suggests an advantage in distributing a decomposed problem across a multi-agent system of LLMs. We quantify the hardness of composition with a generation complexity metric: the number of LLM generations required to sample at least one correct solution. We find a gap between the generation complexity of solving a compositional problem within the same context and that of distributing it among multiple agents, and this gap grows exponentially with solution length. We prove the result theoretically and demonstrate it empirically.|\n", 
"2409.18025": "|**2024-09-26**|**An Adversarial Perspective on Machine Unlearning for AI Safety**|Jakub \u0141ucki et.al.|[2409.18025](http://arxiv.org/abs/2409.18025)|**[link](https://github.com/ethz-spylab/unlearning-vs-safety)**|Large language models are fine-tuned to refuse questions about hazardous knowledge, but these protections can often be bypassed. Unlearning methods aim to remove hazardous capabilities from models completely and make them inaccessible to adversaries. This work challenges, from an adversarial perspective, the fundamental differences between unlearning and traditional safety post-training. We show that existing jailbreak methods, previously considered ineffective against unlearning, can succeed when applied carefully. We also develop a variety of adaptive methods that recover most supposedly unlearned capabilities. For example, for models edited with RMU, a state-of-the-art unlearning method, fine-tuning on unrelated examples or removing specific directions in the activation space recovers most hazardous capabilities. Our findings call into question the robustness of current unlearning approaches and their advantages over safety training.|\n", "2409.18023": "|**2024-09-26**|**DARE: Diverse Visual Question Answering with Robustness Evaluation**|Hannah Sterz et.al.|[2409.18023](http://arxiv.org/abs/2409.18023)|null|This paper introduces DARE (Diverse Visual Question Answering with Robustness Evaluation), a carefully created and curated multiple-choice visual question answering benchmark. DARE evaluates large language models on vision-language reasoning tasks across five distinct categories of visual questions, and includes a comprehensive assessment built on four robustness-oriented evaluations: variations of the prompt, subsets of the answer options, the output format, and the number of correct answers. 
The study finds that state-of-the-art vision-language models still struggle in most categories and cannot consistently deliver high performance across all tested robustness evaluations. Across subsets of answer options, worst-case performance is up to 34% below performance in the standard case. Open-source models such as LLaVA 1.6 and Idefics cannot match the robustness of the closed-source models GPT-4 and Gemini, yet even the latter remain clearly brittle under different variations. Overall, the study exposes the limitations of vision-language models on visual reasoning tasks and highlights issues to consider when designing more robust models.|\n", "2409.18014": "|**2024-09-26**|**Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles**|Lewei He 
et.al.|[2409.18014](http://arxiv.org/abs/2409.18014)|null|Large language models (LLMs) for long-context processing still face challenges in implementation complexity, training efficiency, and data sparsity. To address them, we propose a new paradigm, Online Long-context Processing (OLP), for documents of unlimited length, which typically arise when receiving and organizing diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. In addition, choosing the most suitable LLM from the many available models that offer strong performance, affordable prices, and short response latency is often a dilemma. We therefore develop Role Reinforcement Learning (Role-RL), a framework that automatically deploys different LLMs in their respective roles within the OLP pipeline according to their actual performance. Extensive experiments on our OLP-MINI dataset show that OLP with the Role-RL framework reaches the OLP benchmark with an average recall rate of 93.2% while saving 79.4% of LLM cost. The code and dataset are publicly available at https://anonymous.4open.science/r/Role-RL.|\n", "2409.18957": "|**2024-09-27**|**LML: Language Model Learning a Dataset for Data-Augmented Prediction**|Praneeth Vadlapati 
et.al.|[2409.18957](http://arxiv.org/abs/2409.18957)|**[link](https://github.com/pro-genai/lml-dap)**|**This paper introduces a new approach that uses large language models (LLMs) for classification tasks, which are typically handled by machine learning (ML) models. Unlike ML models, which rely heavily on data cleaning and feature engineering, this method streamlines the process using LLMs. The paper proposes a concept called 'Language Model Learning (LML)', powered by a new method called 'Data-Augmented Prediction (DAP)'. The LLM performs classification much as a human would: by manually exploring and understanding the data, then using the data as a reference to decide on a class. 
The training data is summarized and evaluated to determine the main features that lead to each label. During DAP, the system uses the data summary to automatically create a query, which retrieves relevant rows from the dataset. The LLM then generates a classification from the data summary and the relevant rows, ensuring satisfactory accuracy even with complex data. Using the data summary and similar data in DAP ensures context-aware decision-making. The method includes the phrase 'Act as an Explainable Machine Learning Model' in the prompt to enhance the interpretability of predictions, allowing users to review the logic behind each prediction. In some test cases, the system achieved accuracy above 90%, demonstrating its effectiveness and its potential to outperform conventional ML models in various scenarios. The code is available at https://github.com/Pro-GenAI/LML-DAP.**|\n", "2409.18943": "|**2024-09-27**|**Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models**|Jiaming Li 
et.al.|[2409.18943](http://arxiv.org/abs/2409.18943)|**[link](https://github.com/geaming2002/ruler)**|**The instruction-following ability of large language models lets humans interact with AI agents in a natural way. However, when asked to generate responses of a specific length, large language models often fail to meet users' needs because of their inherent difficulty in accurately perceiving numerical constraints. To explore how well large language models control the length of generated responses, we propose the Target Length Generation Task (TLG) and design two metrics, Precise Match (PM) and Flexible Match (FM), to evaluate how well models adhere to specified response lengths. We also introduce Ruler, a novel, model-agnostic approach that uses Meta Length Tokens (MLTs) to improve the instruction-following ability of large language models under length-constrained instructions. Specifically, Ruler enables LLMs to generate responses of a specified length when the instruction contains a length constraint. Moreover, Ruler can automatically generate an appropriate MLT when no length constraint is given explicitly, demonstrating strong versatility and generalization. Comprehensive experiments show that Ruler is effective across different LLMs on the Target Length Generation Task, for example with an average gain of 27.97 on PM and 29.57 on FM. Extensive ablation experiments further validate the effectiveness and generalization of Ruler. Our code and data are available at https://github.com/Geaming2002/Ruler.**|\n", 
"2409.18938": "|**2024-09-27**|**From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding**|Heqing Zou et.al.|[2409.18938](http://arxiv.org/abs/2409.18938)|null|This paper reviews recent progress in integrating large language models (LLMs) with visual encoders for visual understanding tasks, which leverages their inherent ability to comprehend and generate human-like text for visual reasoning. Given the diverse nature of visual data, multimodal large language models (MM-LLMs) differ in model design and training when understanding images, short videos, and long videos. Our study focuses on the substantial differences and unique challenges of long video understanding compared with static image and short video understanding. 
Unlike static images, short videos contain sequential frames with both spatial and within-event temporal information, while long videos consist of multiple events with between-event, long-term temporal dependencies. This paper traces and summarizes the advancement of MM-LLMs from image understanding to long video understanding, compares the differences among various visual understanding tasks in detail, and highlights the challenges of long video understanding, including finer-grained spatiotemporal details, dynamic events, and long-term dependencies. It then gives a detailed overview of advances in MM-LLM model design and training methodology, with particular attention to understanding long videos effectively. Finally, by comparing the performance of existing MM-LLMs on video understanding benchmarks of various lengths, it discusses potential future directions for MM-LLMs in long video understanding.|\n", "2409.18924": "|**2024-09-27**|**AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow**|Huizi Yu 
et.al.|[2409.18924](http://arxiv.org/abs/2409.18924)|null|Simulated patient systems play a crucial role in modern medical education and research: they provide safe, integrative learning environments and enable clinical decision-making simulations. Large language models (LLMs) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, because they require a large, diverse, and precise patient knowledge base along with robust and stable knowledge diffusion. In this context, we developed AIPatient, an advanced simulated patient system that takes the AIPatient Knowledge Graph (AIPatient KG) as input and uses a Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as its generation backbone. The AIPatient KG is built from the Medical Information Mart for Intensive 
Care (MIMIC-III) database, sampling data from its electronic health records (EHRs) to produce a clinically diverse and relevant cohort of 1,495 patients with high knowledge base validity (F1 score of 0.89). Reasoning RAG uses six LLM-powered agents that cover retrieval, KG query generation, abstraction, checking, rewriting, and summarization. This agentic framework reaches an overall accuracy of 94.15% on EHR-based medical question answering (QA), significantly outperforming baselines that use no agent or only partial agent integration. Our system also shows high readability (median Flesch Reading Ease 77.23; median Flesch-Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p<0.1), and stability (ANOVA F-value 0.782, p<0.1). The strong performance of the AIPatient system highlights its potential to support a wide range of applications, including medical education, model evaluation, and system integration.|\n", "2409.18911": "|**2024-09-27**|**Soft Measures for Extracting Causal Collective Intelligence**|Maryam Berijanian 
et.al.|[2409.18911](http://arxiv.org/abs/2409.18911)|**[link](https://github.com/kuldeep7688/soft-measures-causal-intelligence)**|**Understanding and modeling collective intelligence is essential for addressing complex social systems. Fuzzy cognitive maps (FCMs), encoded as directed graphs, are a powerful tool for representing causal mental models, but extracting high-fidelity FCMs directly from text is challenging. This study proposes a method that uses large language models (LLMs) to extract FCMs automatically. We introduce novel graph-based similarity measures and evaluate them by correlating their outputs with human judgments through the Elo rating system. The results show a positive correlation between these measures and human evaluation, although even the best-performing measure is limited in capturing the nuances of FCMs. Fine-tuning LLMs improves performance, but existing measures still fall short. The study highlights the need for soft similarity measures designed specifically for FCM extraction, advancing the modeling of collective intelligence with NLP.**|\n", "2409.18892": "|**2024-09-27**|**IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation**|Fan Lin
et.al.|[2409.18892](http://arxiv.org/abs/2409.18892)|**[link](https://github.com/DUTlf/IDGen)**|As large language models (LLMs) become increasingly capable of handling complex tasks, evaluation sets must keep pace so that they remain sufficiently discriminative. Inspired by Item Discrimination (ID) theory, which is widely used in educational assessment, we propose an ID-induced prompt synthesis framework for evaluating LLMs, so that the evaluation set can be continually updated and refined according to model abilities. Our data synthesis framework prioritizes both breadth and specificity. It generates prompts that comprehensively evaluate LLM capabilities while revealing meaningful performance differences between models, which allows their relative strengths and weaknesses to be distinguished across tasks and domains. To produce high-quality data, we incorporate a self-correction mechanism into the generalization framework and develop two models that predict prompt discrimination and difficulty scores to drive the data synthesis framework. These tools are a valuable contribution to research on evaluation data synthesis. We apply the generated data to evaluate five state-of-the-art models. The data yields an average score of 51.92 with a variance of 10.06. By contrast, previous works (such as SELF-INSTRUCT and WizardLM) have average scores above 67 and variances below 3.2. The results indicate that the data generated by our framework is more challenging and more discriminative than that of previous works. We plan to release a dataset of more than 3,000 carefully crafted prompts to support research on LLM evaluation.|\n", "2409.18858": "|**2024-09-27**|**Predicting and analyzing memorization within fine-tuned Large Language Models**|J\u00e9r\u00e9mie Dentan
et.al.|[2409.18858](http://arxiv.org/abs/2409.18858)|null|Large language models have received wide attention for their ability to solve complex tasks. However, these models memorize a significant proportion of their training data, which poses a serious threat at inference time. To mitigate this unintended memorization, it is crucial to understand which elements are memorized and why. Most existing works provide a posteriori explanations, which are of limited interest in practice. To address this gap, we propose a new approach, based on sliced mutual information, that detects memorized samples a priori in a classification setting. The method is efficient from the early stages of training and is readily adaptable to practical scenarios. It is supported by new theoretical results, which we demonstrate experimentally, and it requires a low computational budget. We obtain strong empirical results, paving the way for systematic inspection and protection of these vulnerable samples before memorization happens.|\n", "2409.18857": "|**2024-09-27**|**Mitigating Selection Bias with Node Pruning and Auxiliary Options**|Hyeong Kyu Choi
et.al.|[2409.18857](http://arxiv.org/abs/2409.18857)|null|Large language models (LLMs) often show unwarranted preference for certain options when answering multiple-choice questions, which raises significant reliability concerns in LLM-automated systems. Previous solutions have mainly addressed the bias by adjusting the model's input and/or output. Our work takes a different path and investigates how the bias arises inside the model. Specifically, we propose a novel debiasing method called Bias Node Pruning (BNP), which removes the linear layer parameters that contribute to the bias. In addition, we introduce Auxiliary Option Injection (AOI), a simple yet effective input modification technique that can debias black-box models. To evaluate selection bias more systematically, we review existing metrics and propose Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of commonly used metrics to label imbalance. Experiments show that our methods are robust and adaptable when applied to three different LLMs.|\n", "2409.18812": "|**2024-09-27**|**LLMs4Synthesis: Leveraging Large
Language Models for Scientific Synthesis**|Hamed Babaei Giglou et.al.|[2409.18812](http://arxiv.org/abs/2409.18812)|**[link](https://github.com/HamedBabaei/LLMs4Synthesis)**|In response to the growing complexity and volume of scientific literature, this paper introduces the LLMs4Synthesis framework, designed to improve the ability of large language models (LLMs) to generate high-quality scientific syntheses. The framework addresses the need for rapid, coherent, and contextually rich integration of scientific insights, using both open-source and proprietary LLMs, and it targets the shortcomings of current quantitative metrics for evaluating such syntheses. Our study contributes a new method for processing scientific papers, definitions of new synthesis types, and nine detailed quality criteria for evaluating syntheses. We also propose combining LLMs with reinforcement learning and AI feedback to optimize synthesis quality and keep it aligned with the established criteria. The availability of the LLMs4Synthesis framework and its components is expected to improve both the generation and the evaluation of scientific research syntheses.|\n", "2409.18794": "|**2024-09-27**|**Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs**|Yanyuan Qiao
et.al.|[2409.18794](http://arxiv.org/abs/2409.18794)|null|This paper introduces Open-Nav, a study that explores open-source large language models (LLMs) for zero-shot vision-and-language navigation (VLN) in continuous environments. Open-Nav uses a spatial-temporal chain-of-thought (CoT) reasoning approach that breaks the task down into instruction comprehension, progress estimation, and decision-making, improving the model's perception of navigation scenes and its understanding of fine-grained object and spatial knowledge. Experiments in both simulated and real-world environments show that Open-Nav achieves performance competitive with approaches that use closed-source LLMs.|\n", "2409.20566": "|**2024-09-30**|**MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning**|Haotian Zhang
et.al.|[2409.20566](http://arxiv.org/abs/2409.20566)|null|We present MM1.5, a new family of multimodal large language models designed to improve text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building on the MM1 architecture, MM1.5 takes a data-centric approach to model training and systematically explores the impact of different data mixtures across the entire training lifecycle. This includes high-quality OCR data and synthetic captions for continual pre-training, and an optimized visual instruction-tuning data mixture for supervised fine-tuning. Our models range from 1B to 30B parameters, cover both dense and mixture-of-experts (MoE) variants, and show that careful data curation and training strategies yield strong performance even at small scales (1B and 3B). We also introduce two specialized variants: MM1.5-Video for video understanding and MM1.5-UI for mobile UI understanding. Through extensive empirical studies and ablations, we provide detailed insights into the training processes and decisions, which offer valuable guidance for future multimodal LLM development.|\n", "2409.20557": "|**2024-09-30**|**Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos**|Md Mohaiminul Islam
et.al.|[2409.20557](http://arxiv.org/abs/2409.20557)|null|This paper presents VidAssist, an integrated framework for zero-shot and few-shot goal-oriented planning in instructional videos. VidAssist uses large language models (LLMs) as both a knowledge base and an assessment tool to generate and evaluate action plans, which overcomes the difficulty of acquiring procedural knowledge from small-scale, low-diversity datasets. VidAssist also applies a breadth-first search algorithm to generate optimal plans, scoring the predicted actions at each step with value functions designed for goal-oriented planning. Extensive experiments show that VidAssist provides a unified framework for different goal-oriented planning setups, such as visual planning for assistance (VPA) and procedural planning (PP), and performs strongly in both zero-shot and few-shot settings. Specifically, on the COIN dataset our few-shot model outperforms the prior fully supervised method by +7.7% on VPA and +4.81% on PP when predicting 4 future actions. All code and models are publicly available at https://sites.google.com/view/vidassist.|\n", "2409.20550": "|**2024-09-30**|**LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation**|Ziyao Zhang
et.al.|[2409.20550](http://arxiv.org/abs/2409.20550)|null|This paper presents an empirical study of hallucinations of large language models (LLMs) in code generation. Although LLMs perform encouragingly on code generation tasks, they often produce wrong or inaccurate results when handling the complex contextual dependencies of real development. Previous studies mainly analyzed hallucinations in standalone function generation; this paper extends the scope to the more practical and complex setting of repository-level generation. First, by manually examining the code generated by six mainstream LLMs, we establish a taxonomy of hallucinations in LLM-generated code. Next, we describe the hallucination phenomena in detail and analyze how hallucinations are distributed across models. We then investigate the causes of hallucinations and identify four potential contributing factors. Finally, we propose a mitigation method based on retrieval-augmented generation (RAG), which is consistently effective across all studied LLMs. A replication package containing code, data, and experimental results is provided for reference and verification by academia and industry. The study helps improve the reliability and accuracy of LLMs in code generation and is relevant to software engineering practice.|\n", "2409.20548": "|**2024-09-30**|**Robi Butler: Remote Multimodal Interactions with Household Robot Assistant**|Anxing Xiao
et.al.|[2409.20548](http://arxiv.org/abs/2409.20548)|null|In this paper, we introduce Robi Butler, a novel household robot system that supports multimodal interaction with remote users. Building on advanced communication interfaces, Robi Butler lets users monitor the robot's status, send text or voice instructions, and select target objects by pointing gestures. At the core of the system is a high-level behavior module, driven by large language models (LLMs), that interprets multimodal instructions and generates action plans. These plans are composed of open-vocabulary primitives supported by vision-language models (VLMs) that handle both text and pointing queries. Integrating these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner. We demonstrate the system's effectiveness and efficiency on a variety of daily household tasks in which remote users give multimodal instructions. We also conduct a user study that analyzes how multimodal interaction affects the efficiency and user experience of remote human-robot interaction, and we discuss possible improvements.|\n", "2409.20512": "|**2024-09-30**|**Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models**|Arpan Mukherjee
et.al.|[2409.20512](http://arxiv.org/abs/2409.20512)|null|This paper proposes a framework for accurately predicting the toxicity of solvents used in the industrial synthesis of perovskites, a task that is limited by the lack of targeted, structured toxicity data. The framework combines automated data extraction by language models with an uncertainty-informed prediction model to fill data gaps and increase confidence in the predictions. First, we use two approaches to automatically extract relevant data from a corpus of scientific literature on perovskite synthesis solvents: smaller bidirectional language models (such as BERT and ELMo), chosen for their repeatable and deterministic outputs, and autoregressive large language models (LLMs) such as GPT-3.5, chosen for their large training corpus and better response generation. Our 'prompting and verification' technique, integrated with the LLM, targets extraction and refinement so as to reduce hallucination and improve the quality of the extracted data. Next, the extracted data is fed into a pre-trained multi-task binary classification deep learning model that predicts the ED nature of the extracted solvents. We apply Shannon-entropy-based uncertainty quantification to the class probabilities obtained from the classification model in order to quantify uncertainty and identify data gaps in the predictions. This approach yields a structured dataset of solvents used in perovskite synthesis together with their uncertainty-informed virtual toxicity assessment. In addition, we use chord diagrams to visualize solvent interactions and to prioritize potentially hazardous solvents, and we find that 70% of the solvent interactions are associated mainly with two specific perovskites.|\n", "2409.20502": "|**2024-09-30**|**COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models**|Divyanshu Daiya
et.al.|[2409.20502](http://arxiv.org/abs/2409.20502)|null|We propose COLLAGE, a novel framework for generating collaborative agent-object-agent interactions using large language models (LLMs) and hierarchical motion-specific vector-quantized variational autoencoders (VQ-VAEs). Our model addresses the scarcity of data in this domain by using the knowledge and reasoning abilities of LLMs to guide a generative diffusion model. The hierarchical VQ-VAE architecture captures distinct motion-specific features at multiple levels of abstraction, which avoids redundant concepts and gives an efficient multi-resolution representation. We introduce a diffusion model that operates in the latent space and incorporates LLM-generated motion planning cues to guide the denoising process, producing prompt-specific motion with greater control and diversity. Experiments on the CORE-4D and InterHuman datasets demonstrate that our approach generates realistic and diverse collaborative human-object-human interactions and outperforms state-of-the-art methods. Our work opens new possibilities for modeling complex interactions in robotics, graphics, and computer vision.|\n", "2409.20441": "|**2024-10-01**|**Instance-adaptive Zero-shot Chain-of-Thought Prompting**|Xiaosong Yuan
et.al.|[2409.20441](http://arxiv.org/abs/2409.20441)|null|Zero-shot chain-of-thought (CoT) prompting is a simple and effective strategy for improving the performance of large language models (LLMs) on real-world reasoning tasks. However, applying a single task-level prompt uniformly to every instance is inherently limited, because one prompt cannot be a good partner for all instances. A more appropriate approach is to consider carefully the interaction between the prompt and each instance. This work introduces an instance-adaptive prompting algorithm as an alternative zero-shot CoT reasoning scheme that improves performance by adaptively distinguishing good prompts from bad ones. Concretely, we first analyze LLMs through the lens of information flow to uncover the mechanism behind zero-shot CoT reasoning, and we find that the information flows from question to prompt and from question to rationale jointly influence the reasoning results most. We observe that better zero-shot CoT reasoning needs the prompt to obtain semantic information from the question, after which the rationale aggregates sufficient information from the question directly or indirectly via the prompt. Lacking either of these tends to result in a poor prompt. Based on this finding, we further propose an instance-adaptive prompting strategy (IAP) for zero-shot CoT reasoning. Experiments with LLaMA-2, LLaMA-3, and Qwen on math, logic, and commonsense reasoning tasks (such as GSM8K, MMLU, and Causal Judgement) show that instance-adaptive zero-shot CoT prompting performs better than task-level methods that rely on curated prompts or sophisticated procedures, which supports the significance of our findings on the zero-shot CoT reasoning mechanism.|\n", "2409.20385": "|**2024-09-30**|**Wait, but Tylenol is Acetaminophen...
Investigating and Improving Language Models' Ability to Resist Requests for Misinformation**|Shan Chen et.al.|[2409.20385](http://arxiv.org/abs/2409.20385)|null|Background: Large language models (LLMs) are trained to follow instructions, but this design makes them prone to complying blindly with user requests even when doing so generates misinformation. In medicine, this could accelerate the spread of misinformation and harm human health. Objectives/Methods: We analyzed the tendency of models to comply with requests to generate misleading drug-related content even when they know the request is illogical. We investigated whether in-context prompting and fine-tuning can lead LLMs to prioritize logical reasoning over compliance and thereby reduce the risk of medical misinformation. Results: All frontier LLMs complied with illogical requests to generate misleading content. However, both prompt-based and fine-tuning approaches improved the detection of logical flaws in requests and prevented the dissemination of medical misinformation. Conclusion: Shifting the design emphasis of LLMs from compliance toward logical reasoning helps reduce the risk that they are exploited to spread medical misinformation.|\n", "2409.20370": "|**2024-09-30**|**The Perfect Blend: Redefining RLHF with Mixture of Judges**|Tengyu Xu
et.al.|[2409.20370](http://arxiv.org/abs/2409.20370)|null|This paper introduces a new post-training paradigm called Constrained Generative Policy Optimization (CGPO). At the core of CGPO is a Mixture of Judges (MoJ) that performs cost-efficient constrained policy optimization with stratification, which makes it possible to identify the perfect blend in RLHF in a principled manner. The method comes with theoretical guarantees, does not require extensive hyperparameter tuning, and plugs into common post-training pipelines. It helps detect and mitigate reward hacking and reaches a Pareto-optimal point across a very large number of objectives. Our empirical evaluation shows that CGPO significantly outperforms standard RLHF algorithms such as PPO and DPO on a range of tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO improves by 7.4% on AlpacaEval-2 (general chat) and by 12.5% on Arena-Hard (STEM and reasoning), with consistent gains in other domains such as math and coding. Notably, although PPO is commonly used, it is prone to severe reward hacking on popular coding benchmarks, a problem that CGPO successfully addresses. This advance in RLHF not only tackles reward hacking and extreme multi-objective optimization but also improves the alignment of general-purpose language models across diverse applications.|\n", "2409.20365": "|**2024-09-30**|**VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs**|Ruotong Liao
et.al.|[2409.20365](http://arxiv.org/abs/2409.20365)|**[link](https://github.com/mayhugotong/videoinsta)**|In the video-language domain, recent work that uses zero-shot large language model (LLM) reasoning for video understanding has become a strong challenger to conventional end-to-end models. However, long video understanding poses unique challenges because of the long time spans that must be reasoned over, even for zero-shot LLM-based approaches. The information redundancy in long videos raises the question of which information is essential for LLMs and how to use it for complex spatial-temporal reasoning in long-form video analysis. To this end, we propose VideoINSTA (INformative Spatial-TemporAl Reasoning), a framework for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach that lets LLMs reason over spatial-temporal information in videos; and (3) a self-reflective information reasoning scheme that balances temporal factors based on information sufficiency and prediction confidence. Our model significantly improves on the state of the art on three long video question-answering benchmarks (EgoSchema, NextQA, and IntentQA) and on the open question-answering dataset ActivityNetQA. The code is released at https://github.com/mayhugotong/VideoINSTA.|\n", "2410.01805": "|**2024-10-02**|**Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads**|Yuxiang Huang
et.al.|[2410.01805](http://arxiv.org/abs/2410.01805)|**[link](https://github.com/huangyuxiang03/Locret)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u652f\u6301\u957f\u671f\u4e0a\u4e0b\u6587\u7406\u89e3\u548c\u5904\u7406\u4efb\u52a1\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5c06LLMs\u7684\u751f\u6210\u63a8\u7406\u6269\u5c55\u5230\u5982\u6b64\u957f\u7684\u4e0a\u4e0b\u6587\u4f1a\u589e\u52a0\u5927\u91cf\u7684\u8ba1\u7b97\u8d1f\u8f7d\uff0c\u5e76\u8981\u6c42\u5728\u7ef4\u6301\u57fa\u4e8e\u8f6c\u6362\u5668\u7684LLMs\u7684\u5173\u952e\u503c\u5bf9\uff08KV\uff09\u7f13\u5b58\u65f6\u4f7f\u7528\u5927\u91cfGPU\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u538b\u7f29\u65b9\u6cd5\uff0c\u5982\u91cf\u5316\uff0c\u968f\u7740\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u589e\u52a0\u800c\u9047\u5230\u5185\u5b58\u74f6\u9888\uff1b\u800c\u56fa\u5b9a\u5927\u5c0f\u7684\u7f13\u5b58\uff0c\u5982\u6dd8\u6c70\u7b56\u7565\uff0c\u5219\u7531\u4e8e\u4e0d\u9ad8\u6548\u7684\u7b56\u7565\u800c\u5bfc\u81f4\u6548\u7387\u4f4e\u4e0b\u3002\u8fd9\u4e9b\u9650\u5236\u9650\u5236\u4e86\u5728\u5355\u4e2aNvidia 4090 GPU\u7b49\u6d88\u8d39\u8005\u7ea7\u8bbe\u5907\u4e0a\u7684\u90e8\u7f72\u3002 
\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Locret\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u7528\u4e8e\u957f\u4e0a\u4e0b\u6587LLM\u63a8\u7406\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5165\u4fdd\u7559\u5934\u90e8\u6765\u8bc4\u4f30KV\u7f13\u5b58\u5355\u5143\u7684\u56e0\u679c\u91cd\u8981\u6027\uff0c\u4ece\u800c\u5141\u8bb8\u5728\u56fa\u5b9a\u7f13\u5b58\u5927\u5c0f\u5185\u8fdb\u884c\u66f4\u51c6\u786e\u7684\u6dd8\u6c70\u3002Locret\u5728\u51bb\u7ed3\u7684\u4e3b\u5e72LLM\u57fa\u7840\u4e0a\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4f7f\u7528\u6807\u51c6\u957f\u65f6\u95f4\u4e0a\u4e0b\u6587SFT\u6570\u636e\u96c6\u7684\u5c11\u91cf\u6570\u636e\u3002\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ee5\u5206\u5757\u9884\u586b\u5145\u6a21\u5f0f\u6dd8\u6c70\u4f4e\u91cd\u8981\u6027\u7684\u7f13\u5b58\u5355\u5143\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u91cf\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u8bc1\u7814\u7a76\u6765\u8bc4\u4f30Locret\uff0c\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6700\u8fd1\u7684\u7ade\u4e89\u65b9\u6cd5\uff08\u5305\u62ecInfLLM\u3001\u91cf\u5316\u3001SirLLM\u548cMInference\uff09\u76f8\u6bd4\uff0cLocret\u5728\u5185\u5b58\u6548\u7387\u548c\u751f\u6210\u5185\u5bb9\u8d28\u91cf\u65b9\u9762\u5747\u8868\u73b0\u51fa\u8272\u2014\u2014Locret\u5b9e\u73b0\u4e86\u4e0ePhi-3-mini-128K\u548cLlama-3.1-8B-instruct\u5168KV\u7f13\u5b58\u76f8\u6bd4\u8d85\u8fc720\u500d\u548c8\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\u7387\u3002\u6b64\u5916\uff0cLocret\u8fd8\u53ef\u4ee5\u4e0e\u5176\u4ed6\u65b9\u6cd5\uff08\u5982\u91cf\u5316\u548c\u4ee4\u724c\u5408\u5e76\uff09\u7ed3\u5408\u4f7f\u7528\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cLocret\u662f\u7b2c\u4e00\u4e2a\u80fd\u591f\u5c06Llama-3.1-8B\u6216\u7c7b\u4f3c\u6a21\u578b\u90e8\u7f72\u5230\u5355\u4e2aNvidia 4090 
GPU\u4e0a\uff0c\u540c\u65f6\u5728\u4e0d\u727a\u7272\u751f\u6210\u8d28\u91cf\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0128K\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u7684\u6846\u67b6\uff0c\u4e14\u4ec5\u9700\u8981\u5c11\u91cf\u989d\u5916\u7684\u7cfb\u7edf\u4f18\u5316\u3002**|\n", "2410.01799": "|**2024-10-02**|**Efficient $1$-bit tensor approximations**|Alex W. Neal Riasanovsky et.al.|[2410.01799](http://arxiv.org/abs/2410.01799)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7a7a\u95f4\u6548\u7387\u9ad8\u7684\u77e9\u9635\u548c\u4efb\u610f\u9636\u5f20\u91cf\u5206\u89e3\u65b9\u6cd5\uff0c\u4f5c\u4e3a\u7ebf\u6027\u7ec4\u5408\u7684\u5f20\u91cf\u79ef\u5f62\u5f0f\uff0c\u5176\u4e2d\u5411\u91cf\u503c\u4e3a$\\{-1, 1\\}$\u3002\u5bf9\u4e8e\u4efb\u4e00\u77e9\u9635$A \\in \\mathbb{R}^{m \\times n}$\uff0c\u5176\u8868\u8fbe\u5f0f\u4e3a\uff1a$$A - R_w = S_w C_w T_w^\\top = \\sum_{j=1}^w c_j \\cdot \\mathbf{s}_j \\mathbf{t}_j^\\top$$ \u8fd9\u662f\u4e00\u4e2a\u5173\u4e8e$A$\u7684\u201c\u5bbd\u5ea6\u4e3a$w$\u7684\u7b26\u53f7\u5207\u5206\u89e3\u201d\u3002\u8fd9\u91cc$C_w = \"diag\"(\\mathbf{c}_w)$\uff0c\u4e14$S_w, T_w$\u548c\u5411\u91cf$\\mathbf{s}_j, \\mathbf{t}_j$\u5747\u4e3a$\\{-1, 1\\}$\u503c\u3002\u7528\u4e8e\u5b58\u50a8$(S_w, T_w, C_w)$\u6240\u9700\u7684\u7a7a\u95f4\u662f$w \\cdot (m + n)$\u4f4d\uff0c\u5e76\u4ec5\u9700$w$\u4e2a\u6d6e\u70b9\u6570\u3002\u5f53\u5e94\u7528\u4e8e\u5177\u6709i.i.d. 
$\\mathcal N (0, 1)$\u5206\u5e03\u5143\u7d20\u7684#f32\u77e9\u9635\u65f6\uff0c$\\lVert R_w\\rVert_F$\u5448\u73b0\u51fa\u6307\u6570\u8870\u51cf\u3002\u9009\u62e9\u5408\u9002\u7684$w$\uff0c\u4f7f$(S_w, T_w, C_w)$\u7684\u5185\u5b58\u5360\u7528\u4e0e\\textit{f16}\u6216\\textit{bf16}\u77e9\u9635\u76f8\u540c\uff0c\u76f8\u5bf9\u8bef\u5dee\u76f8\u5f53\u3002\u6211\u4eec\u7684\u7b97\u6cd5\u572820\u884c\u4f2a\u4ee3\u7801\u4e2d\u5b9e\u73b0\u4e86\u9ad8\u6548\u7684\u7b26\u53f7\u5207\u5206\u89e3\u3002\u5b83\u6e90\u81ea1999\u5e74Frieze\u548cKannan\u7684\u4e00\u7bc7\u8457\u540d\u8bba\u6587\u7684\u7b80\u5355\u4fee\u6539\u3002 \u4f5c\u4e3a\u7b2c\u4e00\u4e2a\u5e94\u7528\uff0c\u6211\u4eec\u5bf9\u5f00\u653e\u6e90\u7801\u5927\u578b\u8bed\u8a00\u6a21\u578b\\textit{Mistral-7B-v0.1}\u4e2d\u7684\u6743\u91cd\u77e9\u9635\u8fdb\u884c\u4e86$50\\%$\u7684\u7a7a\u95f4\u538b\u7f29\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6240\u6709$226$\u4e2a\u4f59\u77e9\u9635\u7684\u76f8\u5bf9\u8bef\u5dee\u5747\u5c0f\u4e8e$6\\%$\uff0c\u4e14\u6269\u5c55\u6a21\u578b\u5728huggingface\u6392\u884c\u699c\u4e0a\u4e0e\\textit{Mistral-7B-v0.1}\u6a21\u578b\u8868\u73b0\u76f8\u8fd1\u3002\u968f\u7740\u7a7a\u95f4\u538b\u7f29\u7387\u4ece$50\\%$\u964d\u4f4e\u81f3$25\\%$\uff0c\u57fa\u51c6\u6027\u80fd\u7f13\u6162\u4e0b\u964d\u3002\u6211\u4eec\u4f18\u5316\u4e86\u5f00\u6e90\u7684\\textit{rust}\u5b9e\u73b0\uff0c\u4f7f\u7528\u4e86\\textit{avx2}\u548c\\textit{avx512}\u67b6\u6784\u4e0b\u7684\\textit{simd}\u6307\u4ee4\u8fdb\u884c\u52a0\u901f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06\u8be5\u7b97\u6cd5\u6269\u5c55\u5230\u4e86\u4efb\u610f\u9636\u5f20\u91cf\uff0c\u5e76\u5229\u7528\u5b83\u538b\u7f29\u4e86\u4e00\u5f20\u4f5c\u8005\u732bAngus\u7684\u7167\u7247\u3002|\n", "2410.01795": 
"|**2024-10-02**|**Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models**|Joseph Lee et.al.|[2410.01795](http://arxiv.org/abs/2410.01795)|**[link](https://github.com/pennshenlab/freeform)**|**\u57fa\u4e8e\u590d\u6742\u9057\u4f20\u57fa\u7840\u9884\u6d4b\u8868\u578b\uff0c\u5229\u7528\u5c0f\u800c\u53ef\u89e3\u91ca\u7684\u53d8\u5f02\u7279\u5f81\u4ecd\u7136\u662f\u4e00\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u4e0a\uff0c\u4f7f\u7528\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\u8fdb\u884c\u6b64\u4efb\u52a1\uff0c\u4f46\u57fa\u56e0\u578b\u6570\u636e\u7684\u9ad8\u7ef4\u7279\u6027\u4f7f\u5f97\u5206\u6790\u548c\u9884\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u53d7\u5230\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7f16\u7801\u7684\u4e30\u5bcc\u77e5\u8bc6\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u751f\u7269\u533b\u5b66\u6982\u5ff5\u4e0a\u7684\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u65e8\u5728\u63a2\u7d22LLM\u5728\u8868\u683c\u57fa\u56e0\u578b\u6570\u636e\u7279\u5f81\u9009\u62e9\u4e0e\u5de5\u7a0b\u65b9\u9762\u7684\u80fd\u529b\uff0c\u5e76\u5f15\u5165\u4e00\u79cd\u57fa\u4e8e\u77e5\u8bc6\u7684\u6846\u67b6\u3002\u6211\u4eec\u5f00\u53d1\u4e86FREEFORM\uff0c\u4e00\u79cd\u81ea\u7531\u6d41\u52a8\u63a8\u7406\u4e0e\u96c6\u6210\u589e\u5f3a\u7279\u5f81\u8f93\u51fa\u548c\u7a33\u5065\u5efa\u6a21\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u94fe\u5f0f\u601d\u8003\u4e0e\u96c6\u6210\u539f\u5219\uff0c\u5229\u7528LLM\u7684\u5185\u5728\u77e5\u8bc6\u6765\u9009\u62e9\u548c\u5de5\u7a0b\u7279\u5f81\u3002\u5728\u4e24\u4e2a\u4e0d\u540c\u7684\u4eba\u7c7b\u57fa\u56e0\u578b-\u8868\u578b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bc4\u4f30\uff0c\u5305\u62ec\u9057\u4f20\u8840\u7edf\u548c\u9057\u4f20\u6027\u542c\u529b\u635f\u5931\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e2a\u6846\u67b6\u5728\u4f4e\u6837\u672c\u91cf\u60c5\u51b5\u4e0b\u4f18\u4e8e\u51e0\u79cd\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u3002FREEFOR
M\u4f5c\u4e3a\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u53ef\u4ee5\u5728GitHub\u4e0a\u83b7\u53d6\uff1ahttps://github.com/PennShenLab/FREEFORM\u3002**|\n", "2410.01792": "|**2024-10-02**|**When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1**|R. Thomas McCoy et.al.|[2410.01792](http://arxiv.org/abs/2410.01792)|null|\u5728\u201c\u81ea\u52a8\u56de\u5f52\u4f59\u70ec\u201d\uff08McCoy\u7b49\u4eba\uff0c2023\u5e74\uff09\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u51e0\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8d77\u6e90\u4e0a\u5b58\u5728\u4e00\u4e9b\u91cd\u8981\u9650\u5236\uff0c\u8fd9\u5f52\u56e0\u4e8e\u5b83\u4eec\u7684\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u7279\u6027\u3002\u8fd9\u91cc\u6211\u4eec\u63a2\u8ba8\u4e86OpenAI\u7684\u65b0\u7cfb\u7edfo1\u662f\u5426\u4f9d\u7136\u5b58\u5728\u8fd9\u4e9b\u95ee\u9898\uff0c\u4e0e\u4e4b\u524d\u7684LLMs\u76f8\u6bd4\uff0co1\u5728\u63a8\u7406\u4f18\u5316\u65b9\u9762\u6709\u6240\u4e0d\u540c\u3002\u7814\u7a76\u53d1\u73b0\uff0co1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\u663e\u8457\u4f18\u4e8e\u4e4b\u524d\u6a21\u578b\uff0c\u5728\u67d0\u4e9b\u5e38\u89c1\u4efb\u52a1\u7684\u7f55\u89c1\u53d8\u4f53\u4e0a\uff08\u4f8b\u5982\uff0c\u4ece\u5217\u8868\u4e2d\u7684\u6bcf\u4e2a\u8bcd\u7684\u7b2c\u4e8c\u4e2a\u5b57\u6bcd\u5f62\u6210\u7f29\u5199\uff0c\u800c\u4e0d\u662f\u7b2c\u4e00\u4e2a\u5b57\u6bcd\uff09\u8868\u73b0\u5c24\u5176\u51fa\u8272\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5b9a\u91cf\u6539\u8fdb\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\uff0c\u4f46o1\u4f9d\u7136\u663e\u793a\u51fa\u4e86\u4e0e\u4e4b\u524d\u7cfb\u7edf\u76f8\u540c\u7684\u57fa\u672c\u8d8b\u52bf\uff1a\u5bf9\u4e8e\u6982\u7387\u8f83\u9ad8\u7684\u793a\u4f8b\u548c\u4efb\u52a1\uff0co1\u7684\u8868\u73b0\u66f4\u597d\u4e14\u9700\u8981\u7684\u201c\u601d\u8003\u4ee4\u724c\u201d\u6570\u91cf\u8f83\u5c11\uff1b\u800c\u5728\u6982\u7387\u8f83\u4f4e\u7684\u60c5\u51b5\u4e0b\u5219\u8868\u73b0\u4e0d\u4f73\u3002 
\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u4f18\u5316\u8bed\u8a00\u6a21\u578b\u4ee5\u8fdb\u884c\u63a8\u7406\u53ef\u4ee5\u51cf\u8f7b\u4f46\u53ef\u80fd\u65e0\u6cd5\u5b8c\u5168\u514b\u670d\u8bed\u8a00\u6a21\u578b\u7684\u6982\u7387\u654f\u611f\u6027\u95ee\u9898\u3002|\n", "2410.01789": "|**2024-10-02**|**Investigating on RLHF methodology**|Alexey Kutalev et.al.|[2410.01789](http://arxiv.org/abs/2410.01789)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7684\u5bf9\u9f50\u95ee\u9898\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u8bad\u7ec3\u504f\u597d\u6a21\u578b\u7684\u7279\u6027\uff0c\u8be5\u6a21\u578b\u6a21\u62df\u4eba\u7c7b\u504f\u597d\uff0c\u5e76\u4ecb\u7ecd\u4e86\u5b9e\u73b0\u6700\u4f73\u7ed3\u679c\u6240\u9700\u7684\u65b9\u6cd5\u548c\u7ec6\u8282\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65b9\u6cd5\uff0c\u63cf\u8ff0\u4e86\u9047\u5230\u7684\u6311\u6218\u4ee5\u53ca\u514b\u670d\u8fd9\u4e9b\u6311\u6218\u7684\u65b9\u5f0f\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u7684\u7ecf\u9a8c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u6211\u4eec\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u5bf9\u9f50\uff0c\u800c\u65e0\u9700\u521b\u5efa\u5355\u72ec\u7684\u504f\u597d\u6a21\u578b\u3002\u4f5c\u4e3a\u6211\u4eec\u7684\u8d21\u732e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u901a\u8fc7\u56f0\u60d1\u5ea6\u7b5b\u9009\u6536\u96c6\u504f\u597d\u6570\u636e\u96c6\u7684\u65b9\u6cd5\uff0c\u8fd9\u4f7f\u5f97\u4e3a\u7279\u5b9a\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u8fd9\u6837\u7684\u6570\u636e\u96c6\u7684\u8fc7\u7a0b\u66f4\u52a0\u7b80\u4fbf\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u3002|\n", "2410.01784": "|**2024-10-02**|**OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models**|Heng Yang 
et.al.|[2410.01784](http://arxiv.org/abs/2410.01784)|**[link](https://github.com/yangheng95/OmniGenomeBench)**|**\u8fd1\u5e74\u6765\uff0c\u4eba\u5de5\u667a\u80fd\u9886\u57df\u7684\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u6fc0\u53d1\u4e86\u5bf9\u57fa\u56e0\u7ec4\u57fa\u7840\u6a21\u578b\uff08GFMs\uff09\u7a81\u7834\u6027\u8fdb\u5c55\u7684\u671f\u5f85\u3002\u81ea\u751f\u547d\u8fdb\u5316\u4e4b\u521d\u5c31\u9690\u85cf\u5728\u591a\u6837\u5316\u7684\u57fa\u56e0\u7ec4\u4e2d\u7684\u201c\u81ea\u7136\u4e4b\u7801\u201d\uff0c\u8574\u542b\u7740\u5de8\u5927\u6f5c\u529b\uff0c\u80fd\u591f\u901a\u8fc7\u57fa\u56e0\u7ec4\u5efa\u6a21\u5bf9\u4eba\u7c7b\u548c\u751f\u6001\u7cfb\u7edf\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002\u8fd1\u671fGFM\u9886\u57df\u7684\u91cd\u8981\u7a81\u7834\uff0c\u5982Evo\uff0c\u5438\u5f15\u4e86\u5927\u91cf\u6295\u8d44\u4e0e\u5173\u6ce8\uff0c\u5b83\u4eec\u89e3\u51b3\u4e86\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5e76\u5c06\u57fa\u56e0\u7ec4\u7814\u7a76\u4ece\u624b\u52a8\u3001\u4e0d\u53ef\u9760\u548c\u4f4e\u6548\u7684\u4f20\u7edf\u6a21\u5f0f\u8f6c\u53d8\u4e3a\u81ea\u52a8\u5316\u3001\u53ef\u9760\u548c\u9ad8\u6548\u7684\u65b0\u8303\u5f0f\u3002\u5728\u57fa\u56e0\u7ec4\u5b66\u8fde\u7eed\u6280\u672f\u9769\u547d\u7684\u80cc\u666f\u4e0b\uff0cGFM\u7814\u7a76\u9762\u4e34\u4e24\u5927\u6311\u6218\uff1a\u7f3a\u4e4fGFM\u57fa\u51c6\u6d4b\u8bd5\u5de5\u5177\u4ee5\u53ca\u591a\u7ef4\u57fa\u56e0\u7ec4\u5b66\u7684\u5f00\u6e90\u8f6f\u4ef6\u7f3a\u5931\u3002\u8fd9\u4e9b\u6311\u6218\u963b\u788d\u4e86GFM\u5feb\u901f\u6f14\u8fdb\u53ca\u5176\u5e7f\u6cdb\u5e94\u7528\u4e8e\u7406\u89e3\u4e0e\u5408\u6210\u57fa\u56e0\u7ec4\u7b49\u6570\u5341\u5e74\u6765\u5b58\u5728\u7684\u95ee\u9898\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86GFMBench\u6846\u67b6\uff0c\u4e00\u4e2a\u4e13\u6ce8\u4e8eGFM\u5bfc\u5411\u57fa\u51c6\u6d4b\u8bd5\u7684\u5e73\u53f0\u3002GFMBench\u6807\u51c6\u5316\u4e86\u5
7fa\u51c6\u5957\u4ef6\uff0c\u5e76\u5b9e\u73b0\u4e86\u5bf9\u5927\u91cf\u5f00\u6e90GFMs\u7684\u81ea\u52a8\u5316\u57fa\u51c6\u6d4b\u8bd5\u3002\u5b83\u96c6\u6210\u4e86\u6765\u81ea\u56db\u5927\u5927\u578b\u57fa\u51c6\u7684\u6570\u767e\u4e07\u4e2a\u57fa\u56e0\u5e8f\u5217\uff0c\u8986\u76d6\u6570\u767e\u79cd\u57fa\u56e0\u7ec4\u4efb\u52a1\uff0c\u4f7fGFMs\u6c11\u4e3b\u5316\uff0c\u9002\u7528\u4e8e\u5e7f\u6cdb\u7684\u865a\u62df\u57fa\u56e0\u7ec4\u5e94\u7528\u3002\u6b64\u5916\uff0cGFMBench\u4f5c\u4e3a\u5f00\u6e90\u8f6f\u4ef6\u53d1\u5e03\uff0c\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u754c\u9762\u548c\u591a\u6837\u5316\u6559\u7a0b\uff0c\u9002\u7528\u4e8e\u81ea\u52a8\u6d4b\u8bd5\u4ee5\u53caRNA\u8bbe\u8ba1\u548c\u7ed3\u6784\u9884\u6d4b\u7b49\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u4fc3\u8fdb\u57fa\u56e0\u7ec4\u5efa\u6a21\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\uff0c\u6211\u4eec\u542f\u52a8\u4e86\u4e00\u4e2a\u516c\u5171\u6392\u884c\u699c\uff0c\u5c55\u793a\u7531AutoBench\u751f\u6210\u7684\u57fa\u51c6\u6027\u80fd\u3002GFMBench\u4ee3\u8868\u4e86\u6807\u51c6\u5316GFM\u57fa\u51c6\u6d4b\u8bd5\u548c\u6c11\u4e3b\u5316GFM\u5e94\u7528\u7684\u4e00\u5927\u6b65\u3002**|\n", "2410.01782": "|**2024-10-02**|**Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models**|Shayekh Bin Islam 
et.al.|[2410.01782](http://arxiv.org/abs/2410.01782)|**[link](https://github.com/ShayekhBinIslam/openrag)**|\u4e3a\u4e86\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e8b\u5b9e\u51c6\u786e\u6027\u4e0a\u7684\u8868\u73b0\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\u5df2\u7ecf\u5f97\u5230\u4e86\u5e7f\u6cdb\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u5728\u5229\u7528\u68c0\u7d22\u5230\u7684\u8bc1\u636e\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u5c40\u9650\u6027\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u5f00\u6e90LLM\u65f6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Open-RAG\uff0c\u65e8\u5728\u589e\u5f3a\u5f00\u6e90LLM\u5728RAG\u4e2d\u7684\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c06\u4efb\u610f\u5bc6\u96c6\u578bLLM\u8f6c\u6362\u6210\u4e00\u4e2a\u53c2\u6570\u9ad8\u6548\u7684\u7a00\u758f\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\uff0c\u80fd\u591f\u5904\u7406\u5305\u62ec\u5355\u8df3\u548c\u591a\u8df3\u67e5\u8be2\u5728\u5185\u7684\u590d\u6742\u63a8\u7406\u4efb\u52a1\u3002 
Open-RAG\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\uff0c\u5b83\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u6765\u5e94\u5bf9\u770b\u4f3c\u76f8\u5173\u4f46\u5177\u6709\u8bef\u5bfc\u6027\u7684\u5e72\u6270\u9879\uff0c\u4ece\u800c\u6709\u6548\u5730\u5bfc\u822a\u590d\u6742\u573a\u666f\u3002\u901a\u8fc7\u5229\u7528\u6f5c\u5b66\u4e60\uff0cOpen-RAG\u52a8\u6001\u9009\u62e9\u76f8\u5173\u4e13\u5bb6\u5e76\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u4ee5\u63d0\u4f9b\u66f4\u51c6\u786e\u3001\u66f4\u5177\u4e0a\u4e0b\u6587\u7684\u76f8\u5173\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u6df7\u5408\u81ea\u9002\u5e94\u68c0\u7d22\u65b9\u6cd5\uff0c\u7528\u4e8e\u5224\u65ad\u68c0\u7d22\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u5e73\u8861\u6027\u80fd\u589e\u76ca\u4e0e\u63a8\u7406\u901f\u5ea6\u4e4b\u95f4\u7684\u6743\u8861\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u57fa\u4e8eLlama2-7B\u7684Open-RAG\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\uff0c\u76f8\u8f83\u4e8eChatGPT\u3001Self-RAG\u548cCommand R+\u7b49\u6700\u5148\u8fdb\u7684LLM\u548cRAG\u6a21\u578b\uff0c\u8868\u73b0\u51fa\u66f4\u4f18\u7684\u8868\u73b0\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6a21\u578b\u5f00\u6e90\u5728https://openragmoe.github.io/\u3002|\n", "2410.01769": "|**2024-10-02**|**Quantifying Generalization Complexity for Large Language Models**|Zhenting Qi 
et.al.|[2410.01769](http://arxiv.org/abs/2410.01769)|**[link](https://github.com/zhentingqi/scylla)**|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u7406\u89e3\u590d\u6742\u67e5\u8be2\u548c\u6267\u884c\u9ad8\u7ea7\u4efb\u52a1\u7684\u975e\u51e1\u80fd\u529b\u7684\u540c\u65f6\uff0c\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u5f80\u5f80\u4e0e\u8bb0\u5fc6\u6df1\u5ea6\u4ea4\u7ec7\u5728\u4e00\u8d77\uff0c\u8fd9\u8981\u6c42\u6211\u4eec\u8fdb\u884c\u66f4\u7cbe\u786e\u7684\u8bc4\u4f30\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86Scylla\uff0c\u8fd9\u662f\u4e00\u4e2a\u52a8\u6001\u8bc4\u4f30\u6846\u67b6\uff0c\u5b9a\u91cf\u8861\u91cfLLMs\u7684\u6cdb\u5316\u80fd\u529b\u3002Scylla\u901a\u8fc7\u5728\u5206\u5e03\u5185\uff08ID\uff09\u548c\u5206\u5e03\u5916\uff08OOD\uff09\u6570\u636e\u4e0a\u8bc4\u4f30\u6a21\u578b\u6027\u80fd\u6765\u5206\u79bb\u6cdb\u5316\u4e0e\u8bb0\u5fc6\uff0c\u6d89\u53ca20\u4e2a\u4efb\u52a1\uff0c\u8986\u76d65\u4e2a\u590d\u6742\u5ea6\u7ea7\u522b\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u4efb\u52a1\u590d\u6742\u5ea6\u4e0eID\u548cOOD\u6570\u636e\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u4e4b\u95f4\u975e\u5355\u8c03\u7684\u5173\u7cfb\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u6cdb\u5316\u5c71\u8c37\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u8fd9\u4e00\u73b0\u8c61\u63ed\u793a\u4e86\u4e00\u4e2a\u5173\u952e\u9608\u503c\u2014\u2014\u79f0\u4e3a\u5173\u952e\u590d\u6742\u6027\u2014\u2014\u5728\u8be5\u9608\u503c\u5904\uff0c\u975e\u6cdb\u5316\u884c\u4e3a\u7684\u4f9d\u8d56\u8fbe\u5230\u5cf0\u503c\uff0c\u8868\u660e\u4e86LLMs\u6cdb\u5316\u80fd\u529b\u7684\u4e0a\u9650\u3002\u968f\u7740\u6a21\u578b\u5927\u5c0f\u7684\u589e\u52a0\uff0c\u5173\u952e\u590d\u6742\u6027\u5411\u66f4\u9ad8\u5c42\u6b21\u7684\u4efb\u52a1\u590d\u6742\u5ea6\u79fb\u52a8\uff0c\u8868\u660e\u66f4\u5927\u7684\u6a21\u578b\u53ef\u4ee5\u5728\u4f9d\u8d56\u4e8e\u8bb0\u5fc6\u4e4b\u524d\u5904\u7406\u66f4\u590d\u6742\
u7684\u63a8\u7406\u4efb\u52a1\u3002\u5229\u7528Scylla\u548c\u5173\u952e\u590d\u6742\u6027\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u5bf9\u5305\u62ec\u5f00\u6e90\u6a21\u578b\u5982LLaMA\u548cQwen\u5bb6\u65cf\u3001\u4ee5\u53ca\u95ed\u6e90\u6a21\u578b\u5982Claude\u548cGPT\u5728\u5185\u768428\u4e2aLLMs\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\uff0c\u63d0\u4f9b\u4e86\u66f4\u7a33\u5065\u7684\u8bc4\u4f30\uff0c\u5e76\u5bf9LLMs\u7684\u6cdb\u5316\u80fd\u529b\u6709\u4e86\u66f4\u6e05\u6670\u7684\u7406\u89e3\u3002|\n", "2410.01744": "|**2024-10-02**|**LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks**|Mengzhao Jia et.al.|[2410.01744](http://arxiv.org/abs/2410.01744)|**[link](https://github.com/jill0001/leopard)**|\u6587\u672c\u4e30\u5bcc\u7684\u56fe\u50cf\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u666e\u904d\u5b58\u5728\uff0c\u5982\u5e7b\u706f\u7247\u6f14\u793a\u3001\u626b\u63cf\u6587\u6863\u548c\u7f51\u9875\u5feb\u7167\u7b49\uff0c\u5176\u4e2d\u6587\u672c\u4f5c\u4e3a\u6838\u5fc3\u89c6\u89c9\u5143\u7d20\u5f15\u5bfc\u6574\u4f53\u7406\u89e3\u3002\u591a\u56fe\u50cf\u6587\u672c\u4e30\u5bcc\u7684\u4efb\u52a1\u5c24\u5176\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u4e0d\u4ec5\u9700\u8981\u7406\u89e3\u5355\u4e2a\u56fe\u50cf\u7684\u5185\u5bb9\uff0c\u8fd8\u9700\u8981\u5728\u591a\u4e2a\u89c6\u89c9\u8f93\u5165\u4e4b\u95f4\u63a8\u7406\u5173\u7cfb\u548c\u903b\u8f91\u6d41\u7a0b\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u573a\u666f\u7684\u91cd\u8981\u6027\uff0c\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u5904\u7406\u6b64\u7c7b\u4efb\u52a1\u65f6\u9047\u5230\u4e24\u4e2a\u5173\u952e\u6311\u6218\uff1a\uff081\uff09\u7f3a\u4e4f\u9002\u5408\u4e8e\u591a\u56fe\u50cf\u6587\u672c\u4e30\u5bcc\u573a\u666f\u7684\u9ad8\u8d28\u91cf\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff1b\uff082\uff09\u96be\u4ee5\u5e73\u8861\u56fe\u50cf\u5206\u8fa8\u7387\u4e0e\u89c6\u89c9\u7279\u5f81\u5e8f\u5217\u957f\u5ea6\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u621
8\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LEOPARD\uff0c\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u6d89\u53ca\u591a\u6587\u672c\u4e30\u5bcc\u56fe\u50cf\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u7684MLLM\u3002\u9996\u5148\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u7ea6\u4e00\u767e\u4e07\u6761\u9488\u5bf9\u591a\u6587\u672c\u4e30\u5bcc\u3001\u591a\u56fe\u50cf\u573a\u666f\u7684\u9ad8\u8d28\u91cf\u591a\u6a21\u6001\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u9002\u5e94\u6027\u7684\u9ad8\u5206\u8fa8\u7387\u591a\u56fe\u50cf\u7f16\u7801\u6a21\u5757\uff0c\u6839\u636e\u8f93\u5165\u56fe\u50cf\u7684\u539f\u59cb\u7eb5\u6a2a\u6bd4\u548c\u5206\u8fa8\u7387\u52a8\u6001\u4f18\u5316\u89c6\u89c9\u5e8f\u5217\u957f\u5ea6\u7684\u5206\u914d\u3002\u5728\u4e00\u7cfb\u5217\u5e7f\u6cdb\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u591a\u6587\u672c\u4e30\u5bcc\u3001\u591a\u56fe\u50cf\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u4f18\u8d8a\u7684\u80fd\u529b\uff0c\u5e76\u5728\u901a\u7528\u9886\u57df\u8bc4\u4f30\u4e2d\u5c55\u73b0\u51fa\u7ade\u4e89\u529b\u3002|\n", "2410.01738": "|**2024-10-02**|**VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models**|Kailai Feng 
et.al.|[2410.01738](http://arxiv.org/abs/2410.01738)|**[link](https://github.com/carlofkl/vitaglyph)**|**\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u53cc\u5206\u652f\u3001\u65e0\u9700\u8bad\u7ec3\u7684\u65b0\u578b\u827a\u672f\u5b57\u4f53\u751f\u6210\u65b9\u6cd5\u2014\u2014VitaGlyph\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u7075\u6d3b\u5730\u8868\u8fbe\u8f93\u5165\u5b57\u7b26\u7684\u6838\u5fc3\u6982\u5ff5\u4ee5\u53ca\u4e30\u5bcc\u76f8\u5173\u7684\u80cc\u666f\u4fe1\u606f\uff0c\u5b9e\u73b0\u827a\u672f\u5b57\u4f53\u4e0e\u53ef\u63a7\u5236\u7684\u51e0\u4f55\u53d8\u5316\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u4ece\u800c\u4fdd\u6301\u5b57\u4f53\u7684\u53ef\u8bfb\u6027\u3002VitaGlyph\u7684\u6838\u5fc3\u7406\u5ff5\u662f\u5c06\u8f93\u5165\u5b57\u7b26\u89c6\u4e3a\u7531\u4e3b\u4f53\u548c\u5468\u56f4\u73af\u5883\u7ec4\u6210\u7684\u573a\u666f\uff0c\u5e76\u5728\u4e0d\u540c\u51e0\u4f55\u53d8\u6362\u7a0b\u5ea6\u4e0b\u8fdb\u884c\u6e32\u67d3\u3002 \u5177\u4f53\u6765\u8bf4\uff0cVitaGlyph\u901a\u8fc7\u4ee5\u4e0b\u4e09\u4e2a\u9636\u6bb5\u6846\u67b6\u5b9e\u73b0\u5176\u529f\u80fd\uff1a(i) \u77e5\u8bc6\u83b7\u53d6\u9636\u6bb5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bbe\u8ba1\u4e3b\u4f53\u548c\u5468\u56f4\u73af\u5883\u7684\u6587\u672c\u63cf\u8ff0\uff1b(ii) \u533a\u57df\u5206\u89e3\u9636\u6bb5\u8bc6\u522b\u6700\u5339\u914d\u4e3b\u4f53\u63cf\u8ff0\u7684\u90e8\u5206\uff0c\u5e76\u5c06\u8f93\u5165\u7684\u5b57\u7b26\u56fe\u50cf\u5206\u4e3a\u4e3b\u4f53\u548c\u5468\u56f4\u533a\u57df\uff1b(iii) \u5b57\u4f53\u98ce\u683c\u5316\u9636\u6bb5\u9996\u5148\u901a\u8fc7\u8bed\u4e49\u5b57\u4f53\u4f18\u5316\u4e3b\u4f53\u533a\u57df\u7684\u7ed3\u6784\uff0c\u7136\u540e\u5206\u522b\u4f7f\u7528\u53ef\u63a7\u7ec4\u5408\u751f\u6210\u6280\u672f\u6e32\u67d3\u4e3b\u4f53\u548c\u5468\u56f4\u533a\u57df\u7684\u7eb9\u7406\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cVitaGlyph\u4e0d\u4ec5\u5728\u827a\u672f\u6027\u548c\u53ef\u8bfb\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u80fd\u591f\u63cf\u7ed8\u591a\u79cd\u5b9a\u5236\u6982\u5ff5\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u5bcc\u6709\u521b\u610f\u548c\u6109\u60a6\u7684\u827a\u672f\u5b57\u4f53\u751f\u6210\u3002\u9879\u76ee\u4ee3\u7801\u5c06\u5728https://github.com/Carlofkl/VitaGlyph\u516c\u5f00\u63d0\u4f9b\u3002**|\n", "2410.02761": "|**2024-10-03**|**FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models**|Zhipei Xu et.al.|[2410.02761](http://arxiv.org/abs/2410.02761)|**[link](https://github.com/zhipeixu/fakeshield)**|\u751f\u6210\u5f0fAI\u7684\u5feb\u901f\u53d1\u5c55\u72b9\u5982\u4e00\u628a\u53cc\u5203\u5251\uff0c\u65e2\u4fc3\u8fdb\u4e86\u5185\u5bb9\u521b\u4f5c\uff0c\u4e5f\u4f7f\u5f97\u56fe\u50cf\u7be1\u6539\u53d8\u5f97\u66f4\u52a0\u5bb9\u6613\u4e14\u66f4\u96be\u4ee5\u5bdf\u89c9\u3002\u5f53\u524d\u7684\u56fe\u50cf\u4f2a\u9020\u68c0\u6d4b\u4e0e\u5b9a\u4f4d\uff08IFDL\uff09\u65b9\u6cd5\u867d\u7136\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u6709\u6548\uff0c\u4f46\u4ecd\u7136\u9762\u4e34\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a1\uff09\u9ed1\u76d2\u6027\u8d28\uff0c\u5373\u65e0\u6cd5\u77e5\u6653\u5176\u68c0\u6d4b\u539f\u7406\uff1b2\uff09\u5bf9\u4e0d\u540c\u4f2a\u9020\u6280\u672f\uff08\u5982Photoshop\u3001DeepFake\u3001AIGC-Editing\u7b49\uff09\u7684\u6cdb\u5316\u80fd\u529b\u6709\u9650\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53ef\u89e3\u91ca\u7684IFDL\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u5177\u6709\u591a\u6a21\u6001\u80fd\u529b\u7684\u6846\u67b6\u2014\u2014FakeShield\u3002\u8be5\u6846\u67b6\u65e8\u5728\u8bc4\u4f30\u56fe\u50cf\u7684\u771f\u5b9e\u6027\uff0c\u751f\u6210\u7be1\u6539\u533a\u57df\u7684\u63a9\u6a21\uff0c\u5e76\u57fa\u4e8e\u50cf\u7d20\u7ea7\u548c\u56fe\u50cf\u7ea7\u7684\u7be1\u6539\u7ebf\u7d22\u63d0\u4f9b\u5224\u65ad\u4f9d\u636e\u3002\u6b6
4\u5916\uff0c\u6211\u4eec\u5229\u7528GPT-4o\u589e\u5f3a\u4e86\u73b0\u6709\u7684IFDL\u6570\u636e\u96c6\uff0c\u521b\u5efa\u4e86\u591a\u6a21\u6001\u7be1\u6539\u63cf\u8ff0\u6570\u636e\u96c6\uff08MMTD-Set\uff09\uff0c\u7528\u4e8e\u8bad\u7ec3FakeShield\u7684\u7be1\u6539\u5206\u6790\u80fd\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57df\u6807\u7b7e\u5f15\u5bfc\u7684\u53ef\u89e3\u91ca\u4f2a\u9020\u68c0\u6d4b\u6a21\u5757\uff08DTE-FDM\uff09\u548c\u591a\u6a21\u6001\u4f2a\u9020\u5b9a\u4f4d\u6a21\u5757\uff08MFLM\uff09\uff0c\u4ee5\u5e94\u5bf9\u5404\u79cd\u4f2a\u9020\u68c0\u6d4b\u89e3\u91ca\u548c\u5b9e\u73b0\u7531\u8be6\u7ec6\u6587\u672c\u63cf\u8ff0\u6307\u5bfc\u7684\u4f2a\u9020\u5b9a\u4f4d\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\uff0cFakeShield\u6709\u6548\u5730\u68c0\u6d4b\u548c\u5b9a\u4f4d\u4e86\u5404\u79cd\u7be1\u6539\u6280\u672f\uff0c\u63d0\u4f9b\u4e86\u6bd4\u4ee5\u5f80IFDL\u65b9\u6cd5\u66f4\u53ef\u89e3\u91ca\u4e14\u6027\u80fd\u66f4\u4f18\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2410.02757": "|**2024-10-03**|**Loong: Generating Minute-level Long Videos with Autoregressive Language Models**|Yuqing Wang 
et.al.|[2410.02757](http://arxiv.org/abs/2410.02757)|null|Generating content-rich videos that run for minutes is desirable but challenging. Autoregressive large language models (LLMs) have achieved great success in generating coherent, long token sequences in natural language processing, while explorations of autoregressive LLMs for video generation have been limited to short videos of a few seconds. This paper provides an in-depth analysis of the challenges that prevent autoregressive LLM-based video generators from producing long videos. Based on the observations and analysis, we propose Loong, a new autoregressive LLM-based video generator that can generate minute-long videos. Specifically, we model text tokens and video tokens as a unified sequence for the autoregressive LLM and train the model from scratch. We propose progressive short-to-long training with a loss re-weighting scheme to mitigate the loss imbalance problem in long video training. We further investigate inference strategies, including video token re-encoding and sampling strategies, to reduce error accumulation during inference. Our proposed Loong can be trained on 10-second videos and extended to generate minute-long videos conditioned on text prompts, as the results demonstrate. See https://epiphqny.github.io/Loong-video for more samples.|\n", "2410.02755": "|**2024-10-03**|**SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost**|Jifan Zhang et.al.|[2410.02755](http://arxiv.org/abs/2410.02755)|null|This paper proposes SIEVE, a lightweight alternative that matches GPT-4o accuracy at about 1% of the cost. At its core, SIEVE seamlessly integrates GPT-4o with a lightweight T5 model and uses active learning to fine-tune T5 with only a small number of GPT-4o calls. Once trained, SIEVE performs on par with GPT-4o at a far lower cost (about 1%). We validate the effectiveness and efficiency of SIEVE on the OpenWebText dataset with five highly customized filtering tasks targeting high-quality and domain-specific content. 
Further validation shows that SIEVE and GPT-4o reach similar accuracy, and that human evaluators prefer SIEVE's filtering results over GPT-4o's.|\n", "2410.02749": "|**2024-10-03**|**Training Language Models on Synthetic Edit Sequences Improves Code Synthesis**|Ulyana Piterbarg et.al.|[2410.02749](http://arxiv.org/abs/2410.02749)|**[link](https://github.com/upiterbarg/lintseq)**|This paper develops LintSeq, a synthetic data generation algorithm. It refactors existing code into sequences of code edits by using a linter to procedurally sample insertions that introduce no errors. The edit sequences are output as consecutive program diffs. 
To test LintSeq, we use it to refactor a code dataset of instruction + program pairs into instruction + program-diff-sequence pairs. We then instruction-tune a series of smaller language models, ranging from 2.6B to 14B parameters, on both the original and the refactored versions of the dataset, and compare their zero-shot performance on code synthesis benchmarks. The results show that, under repeated sampling, models fine-tuned on code diffs produce more diverse programs than the baselines. This leads to better inference-time scaling of benchmark coverage as a function of the number of attempts k, that is, pass@k, the fraction of problems solved by any of k attempts. For example, on HumanEval pass@50, smaller models fine-tuned on synthetic code edit sequences are competitive with GPT-4 and outperform models fine-tuned on the baseline dataset by 20% (\u00b13%) in absolute score. 
Finally, we also pretrain our own small models for code understanding. The results show that fine-tuning small models on synthetic code edits yields state-of-the-art code synthesis for the on-device model class. Our 150M-parameter edit-sequence model matches or outperforms code models with twice as many parameters, both with and without repeated sampling, including Codex and AlphaCode.|\n", "2410.02748": "|**2024-10-03**|**CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation**|Han He et.al.|[2410.02748](http://arxiv.org/abs/2410.02748)|null|This paper explores augmenting summarization prompts with salient information extracted from the source document. We show that adding keyphrases to the prompt improves ROUGE F1 and recall, making the generated summaries more similar to the reference summaries and more complete. The number of keyphrases controls the trade-off between precision and recall. Further analysis shows that incorporating phrase-level salient information is superior to word-level or sentence-level information. However, the effect on hallucination is not positive across all large language models (LLMs). To conduct this analysis, we introduce Keyphrase Signal Extractor (CriSPO), a lightweight model that can be fine-tuned to extract salient keyphrases. Using CriSPO, we achieve consistent ROUGE improvements across datasets and across open-source and proprietary LLMs without any LLM customization. Our findings provide insights into leveraging salient information when building prompt-based summarization systems.|\n", "2410.02746": "|**2024-10-03**|**Contrastive Localized Language-Image Pre-Training**|Hong-You Chen 
et.al.|[2410.02746](http://arxiv.org/abs/2410.02746)|null|Contrastive Language-Image Pre-training (CLIP) has succeeded as a vision-language foundation model by optimizing the vision encoder to align with web text annotations at the image level. However, this strategy can be insufficient for downstream tasks that need fine-grained visual representations, especially when multimodal large language models (MLLMs) require region-level understanding. This paper proposes Contrastive Localized Language-Image Pre-training (CLOC), which improves CLIP's localization capability by complementing it with a region-text contrastive loss and modules. We introduce a new concept, promptable embeddings, in which the encoder produces image embeddings that are easy to transform into region representations given spatial hints. To support large-scale pre-training, we design a visually enriched and spatially localized captioning framework that effectively generates region-text pseudo-labels at scale. By scaling up to billions of annotated images, CLOC enables high-quality regional embeddings for image region recognition and retrieval tasks, and can serve as a drop-in replacement for CLIP to enhance MLLMs, especially on referring and grounding tasks.|\n", "2410.02744": "|**2024-10-03**|**Neutral residues: revisiting adapters for model extension**|Franck Signe Talla et.al.|[2410.02744](http://arxiv.org/abs/2410.02744)|null|We address the problem of extending a pretrained large language model to a new domain that was not seen at training time, such as adding a language for which the original model has seen no or little training data. Popular solutions such as fine-tuning or low-rank adaptation succeed at domain adaptation, but they do not actually add any extra capacity, and they degrade performance in the original domain. This paper analyzes the problem from three angles: data, architecture, and training procedure, which are advantageously considered jointly. In particular, we improve adapters and make it possible to learn an entirely new language while keeping the network's output on the original domain almost unchanged. To this end, we modify the new residual blocks so that each one outputs near-zero values on the original domain. 
This solution, called neutral residues, borrows architectural components from mixture of experts and is effective: with only 20% extra learned weights compared with an original model trained on English, our method achieves significantly better results than concurrent approaches (fine-tuning, low-rank adaptation, or vanilla adapters) on the trade-off between learning a new language and not forgetting English.|\n", "2410.02743": "|**2024-10-03**|**MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions**|Yekun Chai et.al.|[2410.02743](http://arxiv.org/abs/2410.02743)|null|Reinforcement learning from human feedback (RLHF) has proven effective at aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it hard for the model to determine which actions led to a successful outcome. This hinders learning efficiency and slows convergence. In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions (sequences of tokens or higher-level language constructs) into the learning process. By operating at a higher level of abstraction, our approach reduces the temporal distance between actions and rewards, enabling faster and more accurate credit assignment. This yields more stable policy gradient estimates and improves learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate our approach through extensive experiments across model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. Our method achieves performance gains of up to 30% on text summarization and code generation, 18% on dialogue, and 8% on question answering. Notably, it reaches parity with standard RLHF 1.7x to 2x faster in training time and continues to outperform it with further training. We will make our code and data publicly available at https://github.com/ernie-research/MA-RLHF .|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu 
et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Despite broad success across applications, large language models (LLMs) often struggle with basic physical reasoning or robotics tasks, largely because they lack direct experience with the physical nuances of the real world. To address this, we propose Grounding Large language model with Imperfect world MOdel (GLIMO), which uses proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-based data generator that automatically creates high-quality, diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach significantly improves strong open-source LLMs such as LLaMA-3, with performance gains of 2.04x, 1.54x, and 1.82x on three different benchmarks. The resulting performance can match or exceed that of larger counterparts such as GPT-4.|\n", "2410.02741": "|**2024-10-03**|**Salient 
Information Prompting to Steer Content in Prompt-based Abstractive Summarization**|Lei Xu et.al.|[2410.02741](http://arxiv.org/abs/2410.02741)|null|This paper explores augmenting prompts with salient information extracted from the source document to improve summarization with large language models (LLMs). We find that adding keyphrases to the prompt improves ROUGE F1 and recall, making the generated summaries more similar to the reference summaries and more complete. Adjusting the number of keyphrases controls the trade-off between precision and recall. Further analysis shows that incorporating phrase-level salient information into the prompt is superior to word-level or sentence-level strategies. However, the benefit is not universal across LLMs, particularly with respect to reducing hallucination. To conduct this analysis, we introduce Keyphrase Signal Extractor (SigExt), a lightweight model that can be fine-tuned to extract salient keyphrases. Using SigExt, we achieve consistent ROUGE improvements across datasets and across open-weight and proprietary LLMs without any LLM customization. Our findings provide insights into leveraging salient information when building prompt-based summarization systems.|\n", "2410.03663": "|**2024-10-04**|**Enhance Reasoning by 
Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models**|Zhuochun Li et.al.|[2410.03663](http://arxiv.org/abs/2410.03663)|null|This paper introduces Mistake-Aware Peer-Review Distillation (MAPD), a novel approach that aims to improve smaller open-source models through knowledge distillation (KD), a process that typically relies on large commercial language models as teachers. Unlike prior work that trains only on gold rationales produced by a single teacher, MAPD takes a more fine-grained approach: 1. **Personalized mistake feedback**: rather than only asking teachers to supply correct rationales for the student's answers, MAPD has teachers identify the student's mistakes and explain why they are wrong, producing customized instruction data. 2. 
**Simulated peer review**: MAPD sets up a simulated peer-review process among teachers and keeps only the generated rationales that pass an acceptance threshold. This reduces the chance that a teacher supplies a flawed rationale as a result of guessing, which improves the quality of the instruction data. Comprehensive experiments and analysis on mathematical, commonsense, and logical reasoning tasks demonstrate the effectiveness of MAPD.|\n", "2410.03658": "|**2024-10-04**|**RAFT: Realistic Attacks to Fool Text Detectors**|James Wang et.al.|[2410.03658](http://arxiv.org/abs/2410.03658)|**[link](https://github.com/jameslwang/raft)**|This paper presents RAFT, a grammar error-free black-box attack against existing large language model detectors. Unlike previous attacks on language models, RAFT exploits the transferability of LLM embeddings at the word level while preserving the quality of the original text. Using an auxiliary embedding, RAFT greedily selects the target words to perturb against a specific detector. Experiments show that RAFT compromises all detectors in the study by up to 99% across various domains, and that the attack transfers across source models. A manual human evaluation shows that RAFT-generated examples are realistic and hard to distinguish from original human-written text. We also show that examples generated by RAFT can be used to train more robust detectors. Our work shows that current LLM detectors are not robust, underscoring the urgent need for more resilient detection mechanisms.|\n", "2410.03642": "|**2024-10-04**|**Aligning LLMs with Individual Preferences via Interaction**|Shujin Wu et.al.|[2410.03642](http://arxiv.org/abs/2410.03642)|**[link](https://github.com/shujinwu-0814/aloe)**|As large language models (LLMs) show increasingly advanced capabilities, aligning their behavior with human values and preferences becomes crucial for wide adoption. Prior research has focused mainly on general principles such as helpfulness, harmlessness, and honesty, and has overlooked the need to account for individual and diverse preferences, which can undermine customized human experiences. To fill this gap, we train LLMs that can \u201cinteract to align\u201d: they develop the meta-skill of implicitly inferring the current user's unspoken personalized preferences and dynamically adjusting their subsequent behavior and responses to match. Our approach builds a diverse pool of 3,310 distinct user personas, created from initial seed examples and then expanded through iterative self-generation and filtering. Guided by these personas, we use multi-LLM collaboration to develop a tree-structured multi-turn preference dataset containing more than 3K multi-turn conversations. Finally, we apply supervised fine-tuning and reinforcement learning on this dataset to enhance the LLMs. For evaluation, we establish the ALOE (ALign With CustOmized PrEferences) benchmark, which consists of 100 carefully selected examples and suitable metrics for measuring customized alignment performance during conversations. Experimental results show that our method is effective at enabling dynamic, personalized alignment through interaction.|\n", "2410.03613": "|**2024-10-04**|**Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation**|Jie Xiao 
et.al.|[2410.03613](http://arxiv.org/abs/2410.03613)|null|As large language models (LLMs) become part of more aspects of work and daily life, growing concerns about user privacy are driving a trend toward local deployment. Several lightweight LLMs (for example Gemini Nano and LLAMA2 7B) can run locally on smartphones, giving users more control over their personal data. Because this is a rapidly emerging application, we focus on their performance on commercial mobile devices. To fully understand the current state of LLM deployment on mobile platforms, we conduct a comprehensive measurement study. We evaluate metrics that affect user experience, including token throughput, latency, and battery consumption, as well as factors critical to developers, such as resource utilization, dynamic voltage and frequency scaling (DVFS) strategies, and inference engines. We also analyze in detail how hardware capabilities and system dynamics affect on-device LLM performance, which may help developers identify and address bottlenecks in mobile LLM applications. In addition, we provide a comprehensive comparison of mobile system-on-chips (SoCs) from major vendors, highlighting their performance differences on LLM workloads. We hope this study offers insights for both on-device LLM development and the design of future mobile system architectures.|\n", "2410.03608": "|**2024-10-04**|**TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation**|Jonathan Cook et.al.|[2410.03608](http://arxiv.org/abs/2410.03608)|null|Given the widespread adoption of large language models (LLMs), flexible and interpretable ways to evaluate their instruction-following ability are essential. Preference judgments have become the default evaluation standard, even though they distill complex, multi-faceted preferences into a single ranking. Because human annotation is slow and costly, LLMs are increasingly used to make these judgments, at the expense of reliability and interpretability. We therefore propose TICK (Targeted Instruct-evaluation with ChecKlists), a fully automated, interpretable evaluation protocol that structures evaluations with LLM-generated, instruction-specific checklists. 
We first show that, given an instruction, LLMs can reliably produce high-quality, tailored evaluation checklists that decompose the instruction into a series of yes/no questions, each asking whether a candidate response meets a specific requirement of the instruction. We demonstrate that TICK significantly increases the frequency of exact agreement between LLM judgments and human preferences, from 46.4% to 52.2%, compared with having an LLM score the output directly. We then show that STICK (Self-TICK) can improve generation quality across multiple benchmarks via self-refinement and Best-of-N selection. STICK self-refinement on LiveBench reasoning tasks yields an absolute gain of +7.8%, while Best-of-N selection with STICK achieves a +6.3% absolute improvement on the real-world instruction dataset WildBench. This suggests that structured, multi-faceted self-improvement is a promising way to further advance LLM capabilities. Finally, by providing LLM-generated checklists to human evaluators who directly score LLM responses to WildBench instructions, we notably increase inter-annotator agreement (from 0.194 to 0.256).|\n", "2410.03600": "|**2024-10-04**|**Efficiently 
Identifying Watermarked Segments in Mixed-Source Texts**|Xuandong Zhao et.al.|[2410.03600](http://arxiv.org/abs/2410.03600)|null|Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse such as fake news and academic dishonesty. Existing watermark detection techniques mainly classify an entire document as watermarked or not, but often overlook the common scenario of identifying individual watermarked segments within longer, mixed-source documents. Inspired by plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework that determines whether a watermarked segment exists in a long text. Second, we introduce an adaptive online learning algorithm that pinpoints the precise location of watermarked segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy and significantly outperforms baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection.|\n", "2410.03595":
"|**2024-10-04**|**Understanding Reasoning in Chain-of-Thought from the Hopfieldian View**|Lijie Hu et.al.|[2410.03595](http://arxiv.org/abs/2410.03595)|null|Large language models have demonstrated remarkable abilities across a wide range of tasks, and Chain-of-Thought (CoT) prompting has gained attention as a key technique for improving reasoning. However, existing research focuses mainly on improving performance and lacks a comprehensive framework explaining the fundamental factors behind CoT's success. To bridge this gap, we introduce a novel perspective grounded in the Hopfieldian view of cognition in cognitive neuroscience. We establish a framework connecting CoT reasoning to key cognitive elements such as stimuli, actions, neural populations, and representation spaces. From this perspective, the reasoning process can be understood as movement between these representation spaces. Building on this insight, we develop a method for localizing reasoning errors in CoT responses. Moreover, we propose the Representation-of-Thought (RoT) framework, which leverages the robustness of low-dimensional representation spaces to enhance the robustness and interpretability of the CoT reasoning process. Experimental results show that RoT improves the robustness and interpretability of CoT reasoning and offers fine-grained control over the reasoning process.|\n",
"2410.03577": "|**2024-10-04**|**Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models**|Xin Zou et.al.|[2410.03577](http://arxiv.org/abs/2410.03577)|**[link](https://github.com/1zhou-Wang/MemVR)**|Despite their impressive capabilities, multimodal large language models (MLLMs) are prone to hallucination, especially when key details are absent from the visual input, in which case they confidently fabricate content. To address this challenge, we follow a common human cognitive process: when the memory of critical details in a scene fades, it is intuitive to look at them a second time to seek accurate and factual information. We therefore introduce Memory-space Visual Retracing (MemVR), a novel hallucination mitigation paradigm that requires neither external knowledge retrieval nor additional fine-tuning. Specifically, we treat visual prompts as supplementary evidence and re-inject them into MLLMs through the feed-forward network (FFN) as key-value memory when the model is uncertain about, or has even forgotten, question-relevant visual memories. Comprehensive experimental evaluations show that MemVR significantly mitigates hallucination across various MLLMs and performs well on general benchmarks without adding time overhead, underscoring its potential for broad applicability.|\n",
"2410.03568": "|**2024-10-04**|**Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs)**|Abrar Rahman et.al.|[2410.03568](http://arxiv.org/abs/2410.03568)|null|This paper presents a comprehensive study of the tokenization techniques used by state-of-the-art large language models (LLMs) and examines their potential impact on service cost and availability across languages, especially low-resource languages. The study covers multiple LLMs, including GPT-4 (using cl100k_base embeddings), GPT-3 (using p50k_base embeddings), and DaVinci (using r50k_base embeddings), and compares them with the widely used BERT base tokenizer. It analyzes the differences in tokenization across these models and investigates the challenges of subword tokenization for linguistic representation.
The study underscores the importance of fostering linguistically aware development practices, especially for languages that have traditionally been under-resourced. It also presents case studies illustrating the real-world impact of tokenization choices, particularly in electronic health record (EHR) systems. The research aims to promote generalizable internationalization (I18N) practices in AI services, especially in cross-lingual settings, with a strong emphasis on inclusive development for languages that are severely neglected by existing AI applications.|\n",
"2410.03553": "|**2024-10-04**|**Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding**|Wei Wu et.al.|[2410.03553](http://arxiv.org/abs/2410.03553)|null|Proteins, as essential biomolecules, play a central role in biological processes, including metabolic reactions and DNA replication. Accurately predicting their properties and functions is crucial for biological applications. Recently developed protein language models (pLMs) offer a promising approach through supervised fine-tuning. However, fine-tuned models are tailored to specific downstream prediction tasks, and achieving general-purpose protein understanding remains a challenge. To fill this gap, we introduce the Structure-Enhanced Protein Instruction Tuning (SEPIT) framework. Our approach integrates a novel structure-aware module into pLMs to supply structural knowledge, and then connects these enhanced pLMs to large language models (LLMs) to generate an understanding of proteins. Within this framework, we propose a novel two-stage instruction tuning pipeline that first establishes a basic understanding of proteins through caption-based instructions and then uses a mixture of experts (MoEs) to learn more complex property and function information while keeping the number of activated parameters the same. In addition, we construct the largest and most comprehensive protein instruction dataset to date, which allows us to train and evaluate general-purpose protein understanding models. Extensive experimental results on open-ended generation and closed-set answer tasks demonstrate the superior performance of SEPIT over both closed-source general LLMs and open-source LLMs trained with protein knowledge.|\n", "2410.05269": "|**2024-10-07**|**Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models**|Fei Wang
et.al.|[2410.05269](http://arxiv.org/abs/2410.05269)|null|Data is a key element for large language models (LLMs). Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from uneven quality, with underrepresented or missing aspects and low-quality data points. To address these problems, we propose Data Advisor, an enhanced LLM-based data generation method that takes the characteristics of the target dataset into account. Starting from a set of predefined principles, Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and guides the next iteration of data generation accordingly. Data Advisor can be easily integrated into existing data generation methods to improve data quality and coverage. In experiments on the safety alignment of three representative LLMs (Mistral, Llama2, and Falcon), Data Advisor effectively improves model safety against various fine-grained safety issues without sacrificing model utility.|\n", "2410.05265": "|**2024-10-07**|**PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs**|Mengzhao Chen
et.al.|[2410.05265](http://arxiv.org/abs/2410.05265)|**[link](https://github.com/chenmnz/prefixquant)**|**Quantization is essential for deploying large language models (LLMs), as it substantially improves memory efficiency and inference speed. Existing activation quantization methods mainly address channel-wise outliers and often neglect token-wise outliers, which leads to reliance on costly per-token dynamic quantization. This paper proposes PrefixQuant, a novel technique that identifies high-frequency outlier tokens offline, without retraining, and prefixes them in the KV cache, preventing outlier tokens from being generated during inference and simplifying quantization. To our knowledge, PrefixQuant is the first method to enable efficient per-tensor static quantization that outperforms expensive per-token dynamic quantization. For example, on W4A4KV4 (4-bit weights, 4-bit activations, 4-bit KV cache) Llama-3-8B, PrefixQuant with per-tensor static quantization achieves a WikiText2 perplexity of 7.43 and an average accuracy of 71.08% on 5 common-sense reasoning tasks, outperforming the previous per-token dynamic quantization method QuaRot by 0.98 in perplexity and 5.98 points in accuracy. In addition, models quantized with PrefixQuant run inference 1.60x to 2.81x faster than FP16 models and exceed QuaRot models by 1.2x to 1.3x. Our code is available at https://github.com/ChenMnZ/PrefixQuant as open source.**|\n",
"2410.05262": "|**2024-10-07**|**TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles**|Qingchen Yu et.al.|[2410.05262](http://arxiv.org/abs/2410.05262)|**[link](https://github.com/mazzzystar/TurtleBench)**|**As the applications of large language models (LLMs) expand, the demand for reliable evaluation grows. Existing LLM evaluation benchmarks rely mainly on static datasets, which makes it challenging to assess model performance in dynamic interactions with users. Moreover, these benchmarks often depend on specific background knowledge, complicating the measurement of a model's logical reasoning capabilities. Other dynamic evaluation methods based on strong models or manual effort may introduce bias and incur high cost and time demands, hindering large-scale application. To address these issues, we propose TurtleBench. TurtleBench collects real user guesses from the online Turtle Soup Puzzle platform that we developed. This approach allows for the generation of relatively dynamic evaluation datasets, reducing the risk of model cheating while aligning evaluation more closely with real users' reasoning needs, thereby improving the reliability of evaluation. TurtleBench includes 1,532 user guesses along with annotations of their correctness. Using this dataset, we thoroughly evaluate nine of the most advanced LLMs available today. Notably, the OpenAI o1 series models do not achieve leading results in these evaluations. We propose several hypotheses for further research, such as 'the latent reasoning of o1 utilizes trivial Chain-of-Thought (CoT) techniques' and 'increasing CoT length not only provides reasoning benefits but also incurs noise costs'.**|\n",
"2410.05258": "|**2024-10-07**|**Differential Transformer**|Tianzhu Ye et.al.|[2410.05258](http://arxiv.org/abs/2410.05258)|**[link](https://github.com/microsoft/unilm/blob/master/Diff-Transformer/)**|In this paper, we introduce the Differential Transformer (Diff Transformer), which amplifies attention to relevant context while canceling noise. Specifically, the differential attention mechanism computes attention scores as the difference between two separate softmax attention maps. This subtraction cancels noise and promotes the emergence of sparse attention patterns. Experimental results on language modeling show that Diff Transformer outperforms the standard Transformer when scaling up both model size and the number of training tokens. More intriguingly, it offers notable advantages in practical applications such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. By being less distracted by irrelevant context, Diff Transformer can mitigate hallucination in question answering and text summarization. For in-context learning, Diff Transformer not only improves accuracy but is also more robust to order permutation, which has been considered a chronic robustness issue. These results position Diff Transformer as a highly effective and promising architecture for advancing large language models.|\n",
"2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**Large language models (LLMs) show great potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach efficient and fair outcomes? What is the role of natural language in strategic interaction? How do characteristics of the economic environment influence these dynamics? These questions become crucial when considering the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. Although the machine learning community has been exploring the potential of LLMs in such multi-agent settings, differing assumptions, design choices, and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a framework for standardizing research on two-player, sequential, language-based games. Inspired by the economics literature, we define three base families of games with consistent parameterization, degrees of freedom, and economic measures for evaluating agents' performance (self-gain) as well as the game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and use it to collect a large dataset of LLM vs. LLM interactions along with an additional dataset of human vs. LLM interactions. Through extensive experimentation, we demonstrate how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents with human players in various economic contexts; (ii) evaluate agents on both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environments on agent behavior.**|\n", "2410.05252": "|**2024-10-07**|**Causal Micro-Narratives**|Mourad Heddaya
et.al.|[2410.05252](http://arxiv.org/abs/2410.05252)|null|We present a novel approach to classifying causal micro-narratives in text. These narratives are sentence-level descriptions of causal explanations concerning a target subject. The approach requires only a subject-specific ontology of causes and effects, and we demonstrate it with an application to inflation narratives. Using a human-annotated dataset spanning historical and contemporary US news articles for training, we evaluate several large language models (LLMs) on this multi-label classification task. The best-performing model, a fine-tuned Llama 3.1 8B, achieves F1 scores of 0.87 on narrative detection and 0.71 on narrative classification. Comprehensive error analysis reveals challenges arising from linguistic ambiguity and shows that model errors often mirror disagreements among human annotators. This research establishes a framework for extracting causal micro-narratives from real-world data, with wide-ranging applications to social science research.|\n", "2410.05248": "|**2024-10-07**|**SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe**|Yuxin Xiao
et.al.|[2410.05248](http://arxiv.org/abs/2410.05248)|null|To induce desired behaviors in large language models (LLMs) for interaction-driven tasks, the instruction-tuning stage typically trains LLMs on instruction-response pairs using the next-token prediction (NTP) loss. Previous work aiming to improve instruction-tuning performance often emphasizes building higher-quality supervised fine-tuning (SFT) datasets, which typically requires expensive data filtering or labor-intensive human annotation. However, these approaches do not fully exploit the intrinsic properties of the datasets, resulting in high computational and labor costs that limit scalability and performance gains. This paper proposes SFTMix, a novel recipe that elevates instruction-tuning performance beyond the conventional NTP paradigm without relying on well-curated SFT datasets.
Observing that LLMs exhibit uneven confidence across the semantic representation space, we argue that examples with different confidence levels should play distinct roles during instruction tuning. Based on this insight, SFTMix uses training dynamics to identify examples with varying confidence levels, then applies Mixup-based regularization to mitigate overfitting on confident examples while propagating supervision signals to improve learning on relatively unconfident ones. This approach allows SFTMix to significantly outperform NTP across a wide range of instruction-following and healthcare domain-specific SFT tasks, demonstrating its adaptability to diverse LLM families and scalability to datasets of any size. Comprehensive ablation studies further verify the robustness of SFTMix's design choices, underscoring its ability to consistently improve performance across different LLMs and datasets in broader natural language processing applications.|\n", "2410.05243": "|**2024-10-07**|**Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents**|Boyu Gou
et.al.|[2410.05243](http://arxiv.org/abs/2410.05243)|**[link](https://github.com/OSU-NLP-Group/UGround)**|This paper examines how multimodal large language models (MLLMs) are reshaping the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across platforms. However, the effectiveness of these agents hinges largely on the robustness of their grounding capability. Current GUI agents predominantly rely on text-based representations such as HTML or accessibility trees, which, despite their utility, often introduce noise, incompleteness, and increased computational overhead. We advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and directly perform pixel-level operations on the GUI. The key is visual grounding models that can accurately map diverse referring expressions of GUI elements to their coordinates on the GUI across different platforms. We show that a simple recipe, combining web-based synthetic data with a slight adaptation of the LLaVA architecture, is surprisingly effective for training such visual grounding models.
We collect the largest dataset for GUI visual grounding so far, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents. Empirical results on six benchmarks spanning three categories (grounding, offline agent, and online agent) show that 1) UGround substantially outperforms existing visual grounding models for GUI agents, by up to 20% absolute, and 2) agents with UGround outperform state-of-the-art agents, even though existing agents use additional text-based input while ours relies only on visual perception. These results provide strong support for the feasibility and promise of GUI agents that navigate the digital world as humans do.|\n", "2410.05229": "|**2024-10-07**|**GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models**|Iman Mirzadeh
et.al.|[2410.05229](http://arxiv.org/abs/2410.05229)|null|Recent advances in large language models (LLMs) have sparked interest in their mathematical reasoning capabilities, particularly on grade-school-level questions. The GSM8K benchmark is widely used to assess models in this area. Although LLM performance on GSM8K has improved significantly in recent years, it remains unclear whether their mathematical reasoning has genuinely advanced, which raises questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study of state-of-the-art open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models.
Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance deteriorates significantly as the number of clauses in a question increases. We hypothesize that this decline occurs because current LLMs cannot perform genuine logical reasoning; they merely replicate reasoning steps from their training data. Adding even a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause does not contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs' capabilities and limitations in mathematical reasoning.|\n", "2410.05224": "|**2024-10-07**|**Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates**|Avanika Narayan
et.al.|[2410.05224](http://arxiv.org/abs/2410.05224)|null|This paper introduces Cookbook, a framework that programmatically generates training data consisting mainly of simple patterns over random tokens. The approach has advantages in scale and cost, and it avoids the legal and privacy issues associated with data generated by humans or large language models (LLMs). First, Cookbook uses data-generating Python function templates to produce training data that encourages the model to learn explicit rules matching a specific task. Fine-tuning on Cookbook-generated data improves performance on the corresponding task by up to 52.7 accuracy points. Second, since instruction datasets improve performance on multiple downstream tasks simultaneously, the Cookbook algorithm automatically learns how to mix data from different templates to optimize performance across multiple tasks. On the standard multi-task GPT4ALL evaluation suite, Mistral-7B fine-tuned on a Cookbook-generated dataset attains the best average accuracy and is the best-performing model on three of the tasks. Finally, the paper analyzes why Cookbook improves performance and the principles behind it, and proposes a metric to verify that the improvement is largely explained by model generations adhering more closely to the template rules.|\n", "2410.07176": "|**2024-10-09**|**Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models**|Fei Wang et.al.|[2410.07176](http://arxiv.org/abs/2410.07176)|null|In a joint analysis of how imperfect retrieval affects the behavior of retrieval-augmented generation (RAG) and of the potential conflicts between an LLM's internal knowledge and external sources, we find that imperfect retrieval augmentation may be unavoidable and can severely harm RAG systems. Through controlled analysis under realistic conditions, we identify the knowledge conflict between incomplete retrieved knowledge and the LLM's internal knowledge as a key bottleneck to overcome in the post-retrieval stage of RAG. 
To make LLMs robust to imperfect retrieval, we propose Astute RAG, a novel RAG approach. It appropriately elicits essential information from the LLM's internal knowledge, consolidates internal and external knowledge with source awareness, and finalizes the answer according to information reliability. Our experiments with Gemini and Claude demonstrate that Astute RAG significantly outperforms previous robustness-enhanced RAG methods. Notably, Astute RAG is the only approach that matches or exceeds the performance of LLMs without RAG in worst-case scenarios. Further analysis shows that Astute RAG effectively resolves knowledge conflicts, improving the reliability and trustworthiness of RAG systems.|\n", "2410.07173": "|**2024-10-09**|**Do better language models have crisper vision?**|Jona Ruthardt 
et.al.|[2410.07173](http://arxiv.org/abs/2410.07173)|null|This paper examines how well text-only large language models (LLMs) understand the visual world. As LLMs are increasingly used in computer vision, the question becomes both fundamental and pertinent. Existing studies have focused mainly on limited scenarios, such as generating visual content or clustering multimodal data. We therefore propose the Visual Text Representation Benchmark (ViTeRB) to identify the key properties that make language models well aligned with the visual world. Based on the results, we find that decoder-based LLMs are ideal candidates for representing text in vision-centric contexts, in contrast to the current practice of using text encoders. 
Building on this, we propose ShareLock, an ultra-lightweight CLIP-like model. By leveraging precomputed frozen features from strong vision and language models, ShareLock achieves 51% accuracy on ImageNet using only 563,000 image-caption pairs. Moreover, training requires only 1 GPU hour (or 10 hours including feature precomputation), orders of magnitude less than prior methods. The code will be released.|\n", "2410.07167": "|**2024-10-09**|**Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate**|Qidong Huang et.al.|[2410.07167](http://arxiv.org/abs/2410.07167)|**[link](https://github.com/shikiw/modality-integration-rate)**|**We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric for measuring the multimodal pre-training quality of large vision-language models (LVLMs). Large-scale pre-training plays a key role in building capable LVLMs, yet evaluating training quality before the costly supervised fine-tuning stage remains underexplored. For large language models (LLMs), common pre-training metrics include loss, perplexity, and in-context evaluation results, but we observe that these metrics are less indicative when aligning a well-trained LLM with a new modality. 
The lack of a suitable metric greatly hinders research on the critical pre-training stage of LVLMs, including training data selection and efficient module design. In this paper, we propose evaluating pre-training quality from the perspective of inter-modal distribution distance and introduce MIR, which is 1) effective: it represents pre-training quality and correlates positively with benchmark performance after supervised fine-tuning; 2) robust across different training and evaluation data; and 3) generalizable across training configurations and architecture choices. We conduct a series of pre-training experiments to explore the effectiveness of MIR and observe satisfactory results: MIR is indicative of training data selection, training strategy scheduling, and model architecture design for obtaining better pre-training results. We hope MIR can serve as a helpful metric for building capable LVLMs and inspire further research on modality alignment in different areas. Our code is available at https://github.com/shikiw/Modality-Integration-Rate.**|\n", "2410.07166": 
"|**2024-10-09**|**Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making**|Manling Li et.al.|[2410.07166](http://arxiv.org/abs/2410.07166)|**[link](https://github.com/embodied-agent-interface/embodied-agent-interface)**|**We aim to evaluate large language models (LLMs) for embodied decision making. Although a large body of work has used LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance. Existing work typically applies LLMs in different domains, for different purposes, and with different inputs and outputs, which makes unified comparison difficult. Existing evaluations also tend to rely solely on final success rate, making it hard to pinpoint which abilities LLMs lack and where the problems lie, which in turn prevents embodied agents from using LLMs effectively and selectively. To address this, we propose a generalized interface (Embodied Agent Interface) that supports unifying various task types and the input-output specifications of LLM-based modules. Specifically, it allows us to: 1. Unify a broad set of embodied decision-making tasks involving both state goals and temporally extended goals. 2. 
Unify four LLM-based modules commonly used for decision making: goal interpretation, subgoal decomposition, action sequencing, and transition modeling. 3. Provide a collection of fine-grained metrics that break the evaluation down into error types such as hallucination errors, affordance errors, and various types of planning errors. Overall, our benchmark offers a comprehensive assessment of LLM performance on different subtasks, pinpoints the strengths and weaknesses of LLM-powered embodied AI systems, and provides insights for the effective and selective use of LLMs in embodied decision making.**|\n", "2410.07163": "|**2024-10-09**|**Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning**|Chongyu Fan et.al.|[2410.07163](http://arxiv.org/abs/2410.07163)|**[link](https://github.com/OPTML-Group/Unlearn-Simple)**|This work addresses the problem of large language model (LLM) unlearning: removing unwanted data influences and the associated model capabilities (for example, copyrighted data or harmful content generation) without retraining from scratch, while preserving essential model utility. Despite the growing need for LLM unlearning, a principled optimization framework is still lacking. 
To this end, we revisit the state-of-the-art approach, negative preference optimization (NPO), and identify the issue of reference model bias, which can undermine the effectiveness of NPO, especially when unlearning forget data of varying difficulty. In light of this, we propose SimNPO, a simple yet effective unlearning optimization framework, showing that reducing reliance on a reference model (through the lens of simple preference optimization) benefits unlearning. We also provide deeper insight into the advantages of SimNPO, supported by an analysis using mixtures of Markov chains. Extensive experiments on benchmarks such as TOFU and MUSE validate the superiority of SimNPO over existing unlearning baselines and demonstrate its robustness against relearning attacks. Code is available at https://github.com/OPTML-Group/Unlearn-Simple.|\n", "2410.07155": "|**2024-10-09**|**Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis**|Bohan Zeng 
et.al.|[2410.07155](http://arxiv.org/abs/2410.07155)|**[link](https://github.com/yangling0818/trans4d)**|**Recent advances in diffusion models have demonstrated exceptional capabilities in image and video generation, further improving the effectiveness of 4D synthesis. Existing 4D generation methods can produce high-quality 4D objects or scenes from user-friendly conditions, benefiting the gaming and video industries. However, these methods still struggle to synthesize complex 4D transitions and the significant deformations involved in object interactions within a scene. To address this, we propose Trans4D, a novel text-to-4D synthesis framework that enables realistic, complex scene-level transitions. Specifically, we first use multimodal large language models (MLLMs) to produce a physics-aware scene description for 4D scene initialization and effective transition timing planning. We then propose a geometry-aware 4D transition network that realizes complex scene-level 4D transitions based on the plan, involving expressive geometric deformation of objects. Extensive experiments show that Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate, high-quality transitions, validating its effectiveness. 
Code: https://github.com/YangLing0818/Trans4D**|\n", "2410.07129": "|**2024-10-09**|**Mental Disorders Detection in the Era of Large Language Models**|Gleb Kuzmin et.al.|[2410.07129](http://arxiv.org/abs/2410.07129)|null|This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five datasets in different formats were considered, each using a different method to define the target pathology class. We tested AutoML models based on linguistic features, several variants of encoder-based Transformers such as BERT, and state-of-the-art LLMs as pathology classification models. The results show that LLMs perform well on noisy, small datasets whose training examples vary significantly in text length and genre. However, when trained on texts from individuals with confirmed depression, the language models outperform traditional psycholinguistic features and encoder-based models, which highlights their potential in specific clinical applications.|\n", "2410.07113": "|**2024-10-09**|**Personalized Visual Instruction Tuning**|Renjie Pi 
et.al.|[2410.07113](http://arxiv.org/abs/2410.07113)|**[link](https://github.com/sterzhang/pvit)**|Recent multimodal large language models (MLLMs) have made significant progress, but they have a notable limitation, 'face blindness': they can hold general conversations yet cannot conduct personalized dialogues about specific individuals. This deficiency hinders the use of MLLMs in personalized settings, such as tailored visual assistants on mobile devices or domestic robots that need to recognize family members. To address this, we propose Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework that enables MLLMs to identify target individuals in an image and engage in personalized, coherent dialogues. Our approach involves a sophisticated pipeline that autonomously generates training data containing personalized conversations. The pipeline draws on the capabilities of various visual experts, image generation models, and (multimodal) large language models. To evaluate the personalization potential of MLLMs, we design a benchmark called P-Bench, which includes multiple question types at different difficulty levels. 
Experiments show a substantial improvement in personalized performance after fine-tuning on our curated dataset.|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|As agents driven by large language models (LLMs) become increasingly autonomous and interact freely with one another, studying the interactions between them becomes crucial for anticipating emergent phenomena and potential risks. Inspired by the Stanford Prison Experiment, this work studies the interaction patterns of LLM agents in a multi-agent setting with a strict social hierarchy. It focuses on two main phenomena, persuasion and anti-social behavior, in simulated scenarios involving guard and prisoner agents, where the prisoner seeks to achieve a specific goal (such as obtaining additional yard time or escaping from prison). Using 200 experimental scenarios and a total of 2,000 machine-machine conversations across five popular LLMs, we obtain the following notable findings: 1. 
Some models consistently fail in the multi-agent setting and cannot carry out a meaningful conversation. 2. For models that interact successfully, the goal has a significant effect on the agent's persuasiveness but a negligible effect on anti-social behavior. 3. Agent personas, especially the guard's personality, directly drive both the likelihood of successful persuasion by the prisoner and the emergence of anti-social behavior. 4. Even without explicitly prompting for specific personalities, anti-social behavior emerges simply from assigning the agents' roles. These results have important implications for the development of interactive LLM agents and for the debate on their societal impact.|\n", "2410.07103": "|**2024-10-09**|**Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context**|Sangwon Yu 
et.al.|[2410.07103](http://arxiv.org/abs/2410.07103)|null|Multi-hop reasoning requires large language models (LLMs) to reason over multiple steps based on supporting documents within a given context. LLMs often struggle to filter out irrelevant documents, and their performance is sensitive to the position of supporting documents within the context. In this paper, we identify an additional challenge: LLM performance is also sensitive to the order in which the supporting documents are presented. We refer to this as the misordered context problem. To address it, we propose a simple yet effective method called context repetition (CoRe), which prompts the model with the context presented repeatedly so that the supporting documents appear in the optimal order for the model. With CoRe, we improve the F1 score by up to 30% on multi-hop QA tasks and increase accuracy by up to 70% on a synthetic task. In addition, CoRe helps mitigate the well-known 'lost-in-the-middle' problem in LLMs and can be effectively combined with retrieval-based approaches that use chain-of-thought (CoT) reasoning.|\n", "2410.08202": "|**2024-10-10**|**Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous 
Visual Pre-training**|Gen Luo et.al.|[2410.08202](http://arxiv.org/abs/2410.08202)|null|With the rapid advancement of large language models (LLMs), there is growing interest in extending them to multimodal tasks. In particular, monolithic multimodal large language models (MLLMs), which integrate visual encoding and language decoding, have attracted wide attention. Although monolithic MLLMs are structurally simple and easy to deploy, training them to competitive performance remains challenging. The prevailing strategy uses continued pre-training to extend a pre-trained LLM into a monolithic MLLM, which causes catastrophic forgetting and degrades performance. 
This paper aims to overcome this limitation from the perspective of delta tuning. Specifically, our core idea is to embed visual parameters into a pre-trained LLM and learn visual knowledge incrementally from massive data via delta tuning, that is, freezing the LLM while optimizing the visual parameters. Based on this principle, we present Mono-InternVL, a novel monolithic MLLM that seamlessly integrates a set of visual experts through a multimodal mixture-of-experts structure. We also propose an innovative pre-training strategy, Endogenous Visual Pre-training (EViP), to maximize the visual capability of Mono-InternVL. EViP is designed as a progressive learning process for the visual experts that aims to fully exploit visual knowledge from low-quality data through to high-quality data. To validate our approach, we conduct extensive experiments on 16 benchmarks. The results confirm the superior performance of Mono-InternVL over current state-of-the-art monolithic MLLMs on 6 multimodal benchmarks, for example +113 points on OCRBench, and also its better deployment efficiency, with first-token latency reduced by up to 67%.|\n", "2410.08197": 
"|**2024-10-10**|**From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions**|Changle Qu et.al.|[2410.08197](http://arxiv.org/abs/2410.08197)|**[link](https://github.com/quchangle1/DRAFT)**|**This paper addresses the comprehension gap that arises when large language models (LLMs) interact with external tools, a gap caused by the inadequacies and inaccuracies of existing human-centric tool documentation. We propose DRAFT, a novel framework that dynamically refines tool documentation by analyzing the feedback and trajectories generated from the LLM's interactions with external tools. The method is built on an innovative trial-and-error approach with three phases: experience gathering, learning from experience, and documentation rewriting, which iteratively improve the quality of the tool documentation. 
DRAFT further adopts a diversity-promoting exploration strategy to ensure diverse exploration and a tool-adaptive termination mechanism to avoid overfitting while improving efficiency. Experiments on multiple datasets show that iterative, feedback-based refinement in DRAFT significantly improves documentation quality, fostering a deeper understanding and more effective use of tools by LLMs. Our analysis further shows that tool documentation refined this way has strong cross-model generalization.**|\n", "2410.08196": "|**2024-10-10**|**MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code**|Zimu Lu 
et.al.|[2410.08196](http://arxiv.org/abs/2410.08196)|**[link](https://github.com/mathllm/mathcoder2)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u751f\u6210\u4f34\u968f\u63a8\u7406\u6b65\u9aa4\u7684\u6570\u5b66\u4ee3\u7801\uff0c\u4ee5\u8fdb\u884c\u6301\u7eed\u9884\u8bad\u7ec3\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u901a\u8fc7\u6574\u5408\u6570\u5b66\u76f8\u5173\u7f51\u7edc\u6570\u636e\u3001\u4f7f\u7528\u6570\u5b66\u5305\u7684\u4ee3\u7801\u3001\u6570\u5b66\u6559\u79d1\u4e66\u548c\u5408\u6210\u6570\u636e\u6765\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u6301\u7eed\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u3002\u63a5\u7740\uff0c\u6211\u4eec\u901a\u8fc7\u63d0\u53d6LaTeX\u8868\u8fbe\u5f0f\u3001\u8868\u8fbe\u5f0f\u7684\u6761\u4ef6\u4ee5\u53ca\u7ed3\u679c\u6765\u6784\u9020\u63a8\u7406\u6b65\u9aa4\u3002\u57fa\u4e8e\u8fd9\u4e9b\u63d0\u53d6\u7684\u4fe1\u606f\uff0c\u6211\u4eec\u751f\u6210\u76f8\u5e94\u7684\u4ee3\u7801\uff0c\u4ee5\u51c6\u786e\u6355\u6349\u6570\u5b66\u63a8\u7406\u8fc7\u7a0b\u3002\u6211\u4eec\u5c06\u751f\u6210\u7684\u4ee3\u7801\u9644\u52a0\u5230\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u540e\uff0c\u5f62\u6210\u5305\u542b\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u6b65\u9aa4\u53ca\u5176\u5bf9\u5e94\u4ee3\u7801\u7684\u6570\u636e\u5bf9\u3002\u5c06\u6b64\u6570\u636e\u4e0e\u539f\u59cb\u6570\u636e\u96c6\u7ed3\u5408\uff0c\u5f97\u5230\u4e00\u4e2a\u5305\u542b19.2B\u4e2a\u6807\u8bb0\u7684\u9ad8\u6027\u80fd\u6570\u5b66\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\uff0c\u6211\u4eec\u5c06\u5176\u547d\u540d\u4e3aMathCode-Pile\u3002\u4f7f\u7528\u6b64\u8bed\u6599\u5e93\u5bf9\u51e0\u79cd\u6d41\u884c\u7684\u57fa\u6a21\u8fdb\u884c\u8bad\u7ec3\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u5b83\u4eec\u7684\u6570\u5b66\u80fd\u529b\uff0c\u4ece\u800c\u4ea7\u751f\u4e86\u540d\u4e3aMathCoder2\u7684\u6a21\u578b\u5bb6\u65cf\u3002\u6240\u6709\u6570\u636e\u5904\u7406\u548c\u8bad\u7ec3\u4ee3\u7801\u5747\u5f00\u6e90\uff0c\u786e\u4fdd\u4e86\u6574\u4e2a\u6570\u636e\u
6536\u96c6\u548c\u8bad\u7ec3\u6d41\u7a0b\u7684\u900f\u660e\u6027\u548c\u53ef\u590d\u73b0\u6027\u3002\u4ee3\u7801\u5728https://github.com/mathllm/MathCoder2\u4e0a\u53d1\u5e03\u3002**|\n", "2410.08193": "|**2024-10-10**|**GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment**|Yuancheng Xu et.al.|[2410.08193](http://arxiv.org/abs/2410.08193)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u9700\u8981\u4ed4\u7ec6\u5bf9\u9f50\u4ee5\u6ee1\u8db3\u4eba\u7c7b\u7684\u504f\u597d\u3002\u4f20\u7edf\u7684\u8bad\u7ec3\u65f6\u65b9\u6cd5\u901a\u8fc7\u4f7f\u7528\u4eba\u7c7b\u504f\u597d\u6570\u636e\u96c6\u6765\u5fae\u8c03LLM\uff0c\u4f46\u4f1a\u5e26\u6765\u663e\u8457\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u5e76\u4e14\u9700\u8981\u91cd\u590d\u8bad\u7ec3\u4ee5\u5e94\u5bf9\u591a\u6837\u5316\u7684\u7528\u6237\u504f\u597d\u3002\u6d4b\u8bd5\u65f6\u5bf9\u9f50\u65b9\u6cd5\u901a\u8fc7\u4f7f\u7528\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u6765\u5f15\u5bfc\u51bb\u7ed3\u7684LLM\uff0c\u800c\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u8fd9\u4e00\u95ee\u9898\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6d4b\u8bd5\u65f6\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u8f68\u8ff9\u7ea7RM\uff0c\u5b83\u4eec\u65e8\u5728\u8bc4\u4f30\u5b8c\u6574\u54cd\u5e94\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u4e0d\u9002\u5408\u7528\u4e8e\u9700\u8981\u4ece\u90e8\u5206\u54cd\u5e94\u8ba1\u7b97\u4e0b\u4e00\u4e2a\u8bcd\u5956\u52b1\u7684\u81ea\u56de\u5f52\u6587\u672c\u751f\u6210\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86GenARM\uff0c\u4e00\u79cd\u6d4b\u8bd5\u65f6\u5bf9\u9f50\u65b9\u6cd5\uff0c\u5229\u7528\u4e86\u81ea\u56de\u5f52\u5956\u52b1\u6a21\u578b\u2014\u2014\u4e00\u79cd\u65b0\u578b\u7684\u5956\u52b1\u53c2\u6570\u5316\u65b9\u6cd5\uff0c\u65e8\u5728\u9884\u6d4b\u81ea\u56de\u5f52\u751f\u6210\u8fc7\u7a0b\u4e2d\u7684\u4e0b\u4e00\u4e2a\u8bcd\u5956\u52b1\uff0c\u4ee5\u5b9e\u73b0\u9ad8\u6548\u548c\u6709\u6548\u7684\u81ea\u56de\u5f52\u751f\u6210\u3002\u7406\u8bba\u4e0a\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u53c2\u6570\u5316\u53ef\u4ee5\u5728KL\u6b63\u5219\u5316\u5f3a\u5316\u5b66\u4e60\u6846\u67b6\u5185\u5f15\u5bfc\u51bb\u7ed3\u7684LLM\u63a5\u8fd1\u4efb\u4f55\u7531\u4f20\u7edfRM\u53ef\u5b9e\u73b0\u7684\u5206\u5e03\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGenARM\u5728\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6d4b\u8bd5\u65f6\u5bf9\u9f50\u57fa\u7ebf\uff0c\u5e76\u4e14\u4e0e\u8bad\u7ec3\u65f6\u65b9\u6cd5\u7684\u6027\u80fd\u76f8\u5f53\u3002\u6b64\u5916\uff0cGenARM\u652f\u6301\u5f31\u5230\u5f3a\u7684\u6307\u5bfc\uff0c\u5141\u8bb8\u5728\u4e0d\u9700\u8981\u8bad\u7ec3\u66f4\u5927\u6a21\u578b\u7684\u60c5\u51b5\u4e0b\uff0c\u901a\u8fc7\u8f83\u5c0f\u7684RM\u5bf9\u66f4\u5927\u7684LLM\u8fdb\u884c\u5bf9\u9f50\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u6210\u672c\u3002\u8fdb\u4e00\u6b65\u5730\uff0cGenARM\u8fd8\u652f\u6301\u591a\u76ee\u6807\u5bf9\u9f50\uff0c\u5141\u8bb8\u5b9e\u65f6\u5e73\u8861\u504f\u597d\u7ef4\u5ea6\uff0c\u6ee1\u8db3\u4e0d\u540c\u7528\u6237\u9700\u6c42\uff0c\u800c\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u3002|\n", "2410.08174": "|**2024-10-10**|**Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models**|Qingni Wang 
et.al.|[2410.08174](http://arxiv.org/abs/2410.08174)|null|\u672c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTRON\u7684\u4e24\u6b65\u6846\u67b6\uff0c\u65e8\u5728\u5bf9\u4efb\u4f55\u652f\u6301\u5728\u5f00\u653e\u548c\u5c01\u95ed\u573a\u666f\u4e0b\u91c7\u6837\u7684\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u8fdb\u884c\u98ce\u9669\u63a7\u5236\u4e0e\u8bc4\u4f30\u3002TRON\u7531\u4e24\u4e2a\u4e3b\u8981\u7ec4\u4ef6\u6784\u6210\uff1a\uff081\uff09\u4e00\u79cd\u65b0\u9896\u7684\u6821\u51c6\u8bc4\u5206\u65b9\u6cd5\uff0c\u7528\u4e8e\u4ee5\u6700\u5c0f\u5c3a\u5bf8\u91c7\u6837\u54cd\u5e94\u96c6\uff1b\uff082\uff09\u57fa\u4e8e\u81ea\u81f4\u6027\u7406\u8bba\u7684\u975e\u4e00\u81f4\u6027\u8bc4\u5206\uff0c\u901a\u8fc7\u8bbe\u5b9a\u4e24\u79cd\u7279\u5b9a\u7684\u98ce\u9669\u6c34\u5e73\u6765\u63a7\u5236\u9519\u8bef\u7387\u3002\u6b64\u5916\uff0c\u672c\u7814\u7a76\u9996\u6b21\u63a2\u8ba8\u4e86\u5728\u5f00\u653e\u573a\u666f\u4e0b\u7684\u9884\u6d4b\u96c6\u4e2d\u7684\u8bed\u4e49\u5197\u4f59\u95ee\u9898\uff0c\u5e76\u636e\u6b64\u63d0\u51fa\u4e86\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4ef7MLLM\u7684\u65b0\u6307\u6807\u2014\u2014\u5e73\u5747\u96c6\u5408\u5927\u5c0f\u3002 \u901a\u8fc7\u5728\u56db\u4e2a\u89c6\u9891\u95ee\u7b54\uff08VideoQA\uff09\u6570\u636e\u96c6\u4e0a\u4f7f\u7528\u516b\u79cdMLLM\u8fdb\u884c\u5168\u9762\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86TRON\u80fd\u591f\u5b9e\u73b0\u7528\u6237\u6307\u5b9a\u7684\u98ce\u9669\u6c34\u5e73\u8303\u56f4\u5185\u7684\u671f\u671b\u9519\u8bef\u7387\u3002\u540c\u65f6\uff0c\u53bb\u91cd\u540e\u7684\u9884\u6d4b\u96c6\u5728\u4fdd\u6301\u9002\u5e94\u6027\u7684\u540c\u65f6\uff0c\u5c55\u73b0\u51fa\u66f4\u9ad8\u6548\u3001\u7a33\u5b9a\u7684\u98ce\u9669\u8bc4\u4f30\u80fd\u529b\uff0c\u5728\u4e0d\u540c\u98ce\u9669\u6c34\u5e73\u4e0b\u5747\u6709\u51fa\u8272\u8868\u73b0\u3002|\n", "2410.08172": "|**2024-10-10**|**On the Evaluation of Generative Robotic Simulations**|Feng Chen 
et.al.|[2410.08172](http://arxiv.org/abs/2410.08172)|null|\u7531\u4e8e\u83b7\u53d6\u771f\u5b9e\u4e16\u754c\u6570\u636e\u7684\u56f0\u96be\u6027\uff0c\u673a\u5668\u4eba\u6a21\u62df\u5df2\u6210\u4e3a\u5e76\u884c\u8bad\u7ec3\u548c\u6a21\u62df\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u8f6c\u6362\u7684\u5173\u952e\uff0c\u8fd9\u51f8\u663e\u4e86\u53ef\u6269\u5c55\u4eff\u771f\u673a\u5668\u4eba\u4efb\u52a1\u7684\u91cd\u8981\u6027\u3002\u57fa\u7840\u6a21\u578b\u5df2\u7ecf\u5c55\u73b0\u51fa\u5728\u81ea\u4e3b\u751f\u6210\u53ef\u884c\u673a\u5668\u4eba\u4efb\u52a1\u65b9\u9762\u7684\u60ca\u4eba\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e00\u65b0\u8303\u5f0f\u5f3a\u8c03\u4e86\u8bc4\u4f30\u8fd9\u4e9b\u81ea\u4e3b\u751f\u6210\u4efb\u52a1\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u751f\u6210\u6a21\u62df\u7684\u5168\u9762\u8bc4\u4ef7\u6846\u67b6\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c06\u8bc4\u4f30\u5206\u4e3a\u4e09\u4e2a\u6838\u5fc3\u65b9\u9762\uff1a\u8d28\u91cf\u3001\u591a\u6837\u6027\u548c\u6cdb\u5316\u3002\u5bf9\u4e8e\u5355\u4efb\u52a1\u8d28\u91cf\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u751f\u6210\u4efb\u52a1\u7684\u771f\u5b9e\u6027\u548c\u751f\u6210\u8f68\u8ff9\u7684\u5b8c\u6574\u6027\u3002\u5728\u591a\u6837\u6027\u65b9\u9762\uff0c\u6211\u4eec\u901a\u8fc7\u4efb\u52a1\u63cf\u8ff0\u7684\u6587\u672c\u76f8\u4f3c\u6027\u548c\u6536\u96c6\u7684\u4efb\u52a1\u8f68\u8ff9\u8bad\u7ec3\u7684\u4e16\u754c\u6a21\u578b\u635f\u5931\u6765\u6d4b\u91cf\u4efb\u52a1\u548c\u6570\u636e\u7684\u591a\u6837\u6027\u3002\u5bf9\u4e8e\u4efb\u52a1\u7ea7\u522b\u7684\u6cdb\u5316\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4f7f\u7528\u591a\u4e2a\u751f\u6210\u4efb\u52a1\u8bad\u7ec3\u7684\u7b56\u7565\u5728\u672a\u89c1\u8fc7\u7684\u4efb\u52a1\u4e0a\u7684\u96f6\u6837\u672c\u6cdb\u5316\u80fd\u529b\u3002\u5728\u4e09\u4e2a\u4ee3\u8868\u6027\u4efb\u52a1\u751f\u6210\u7b
a1\u9053\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u7684\u8bc4\u4f30\u7ed3\u679c\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u9ad8\u5ea6\u4e00\u81f4\uff0c\u786e\u8ba4\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u53ef\u884c\u6027\u548c\u6709\u6548\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u867d\u7136\u53ef\u4ee5\u901a\u8fc7\u67d0\u4e9b\u65b9\u6cd5\u5b9e\u73b0\u8d28\u91cf\u548c\u591a\u6837\u6027\u7684\u6307\u6807\uff0c\u4f46\u6ca1\u6709\u4efb\u4f55\u4e00\u79cd\u65b9\u6cd5\u80fd\u591f\u5728\u6240\u6709\u6307\u6807\u4e0a\u90fd\u8868\u73b0\u51fa\u8272\uff0c\u8fd9\u8868\u660e\u9700\u8981\u66f4\u591a\u5730\u5173\u6ce8\u5e73\u8861\u8fd9\u4e9b\u4e0d\u540c\u6307\u6807\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8fdb\u4e00\u6b65\u7a81\u663e\u4e86\u5f53\u524d\u5de5\u4f5c\u9762\u4e34\u7684\u5171\u540c\u6311\u6218\u2014\u2014\u4f4e\u6cdb\u5316\u80fd\u529b\u3002 \u533f\u540d\u7f51\u7ad9\u94fe\u63a5\uff1ahttps://sites.google.com/view/evaltasks|\n", "2410.08164": "|**2024-10-10**|**Agent S: An Open Agentic Framework that Uses Computers Like a Human**|Saaket Agashe et.al.|[2410.08164](http://arxiv.org/abs/2410.08164)|**[link](https://github.com/simular-ai/agent-s)**|**\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aAgent S\u7684\u5f00\u653e\u6027\u4ee3\u7406\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u56fe\u5f62\u7528\u6237\u754c\u9762(GUI)\u4e0e\u8ba1\u7b97\u673a\u8fdb\u884c\u81ea\u4e3b\u4ea4\u4e92\uff0c\u65e8\u5728\u901a\u8fc7\u81ea\u52a8\u5316\u590d\u6742\u3001\u591a\u6b65\u9aa4\u7684\u4efb\u52a1\u6765\u6539\u53d8\u4eba\u673a\u4ea4\u4e92\u65b9\u5f0f\u3002Agent S\u65e8\u5728\u89e3\u51b3\u81ea\u52a8\u5316\u8ba1\u7b97\u673a\u4efb\u52a1\u65f6\u9047\u5230\u7684\u4e09\u4e2a\u5173\u952e\u6311\u6218\uff1a\u83b7\u53d6\u7279\u5b9a\u9886\u57df\u7684\u77e5\u8bc6\u3001\u5728\u957f\u4efb\u52a1\u5468\u671f\u5185\u89c4\u5212\u4ee5\u53ca\u5904\u7406\u52a8\u6001\u3001\u975e\u5747\u5300\u7684\u754c\u9762\u3002\u4e3a\u6b64\uff0cAgent 
S\u5f15\u5165\u4e86\u7ecf\u9a8c\u589e\u5f3a\u7684\u5c42\u6b21\u89c4\u5212\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5728\u591a\u4e2a\u7ea7\u522b\u4e0a\u7ed3\u5408\u5916\u90e8\u77e5\u8bc6\u641c\u7d22\u548c\u5185\u90e8\u7ecf\u9a8c\u68c0\u7d22\uff0c\u4ece\u800c\u5b9e\u73b0\u9ad8\u6548\u7684\u4efb\u52a1\u89c4\u5212\u548c\u5b50\u4efb\u52a1\u6267\u884c\u3002\u6b64\u5916\uff0c\u5b83\u91c7\u7528\u4e86\u4ee3\u7406-\u8ba1\u7b97\u673a\u63a5\u53e3(ACI)\uff0c\u57fa\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b(MLLMs)\u66f4\u597d\u5730\u63ed\u793aGUI\u4ee3\u7406\u7684\u63a8\u7406\u548c\u63a7\u5236\u80fd\u529b\u3002\u5728OSWorld\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u4e0e\u57fa\u7ebf\u76f8\u6bd4\uff0cAgent S\u7684\u6210\u529f\u7387\u63d0\u9ad8\u4e869.37%(\u76f8\u5bf9\u63d0\u9ad8\u4e8683.6%)\uff0c\u5e76\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u9ad8\u6c34\u5e73\u3002\u5168\u9762\u5206\u6790\u5f3a\u8c03\u4e86\u5404\u4e2a\u7ec4\u4ef6\u7684\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u672a\u6765\u6539\u8fdb\u7684\u89c1\u89e3\u3002\u6b64\u5916\uff0cAgent S\u5728\u65b0\u53d1\u5e03\u7684WindowsAgentArena\u57fa\u51c6\u4e0a\u5c55\u793a\u4e86\u5e7f\u6cdb\u7684\u901a\u7528\u6027\uff0c\u80fd\u591f\u9002\u5e94\u4e0d\u540c\u7684\u64cd\u4f5c\u7cfb\u7edf\u3002\u6709\u5173\u4ee3\u7801\u7684\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u53c2\u9605https://github.com/simular-ai/Agent-S\u3002**|\n", "2410.08146": "|**2024-10-10**|**Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning**|Amrith Setlur 
et.al.|[2410.08146](http://arxiv.org/abs/2410.08146)|null|\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63a8\u7406\u80fd\u529b\u7684\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u662f\u4f7f\u7528\u8fc7\u7a0b\u5956\u52b1\u6a21\u578b\uff08PRMs\uff09\u3002\u4e0e\u4ec5\u5728\u6700\u7ec8\u6b65\u9aa4\u63d0\u4f9b\u53cd\u9988\u7684\u7ed3\u679c\u5956\u52b1\u6a21\u578b\uff08ORMs\uff09\u76f8\u6bd4\uff0cPRMs\u5728\u591a\u6b65\u63a8\u7406\u8ddf\u8e2a\u7684\u6bcf\u4e2a\u6b65\u9aa4\u90fd\u63d0\u4f9b\u53cd\u9988\uff0c\u53ef\u80fd\u6709\u52a9\u4e8e\u6539\u8fdb\u4fe1\u7528\u5206\u914d\u3002\u7136\u800c\uff0c\u6536\u96c6\u5bc6\u96c6\u3001\u6bcf\u6b65\u9aa4\u7684\u4eba\u7c7b\u6807\u7b7e\u5e76\u4e0d\u5177\u6709\u53ef\u6269\u5c55\u6027\uff0c\u4ece\u81ea\u52a8\u6807\u8bb0\u6570\u636e\u8bad\u7ec3PRMs\u8fc4\u4eca\u4e3a\u6b62\u5bfc\u81f4\u7684\u589e\u76ca\u6709\u9650\u3002\u4e3a\u4e86\u901a\u8fc7\u8fd0\u884c\u641c\u7d22\u6765\u6539\u8fdb\u57fa\u7b56\u7565\u6216\u5c06\u5176\u7528\u4f5c\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u7684\u5bc6\u96c6\u5956\u52b1\u6765\u4f18\u5316\u57fa\u7b56\u7565\uff0c\u6211\u4eec\u63d0\u51fa\u7684\u95ee\u9898\u662f\uff1a\u201c\u6211\u4eec\u5e94\u8be5\u5982\u4f55\u8bbe\u8ba1\u8fc7\u7a0b\u5956\u52b1\uff1f\u201d\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\uff0c\u4e3a\u4e86\u6709\u6548\uff0c\u6b65\u9aa4\u7ea7\u5956\u52b1\u5e94\u8be5\u8861\u91cf\u8fdb\u5ea6\uff1a\u91c7\u53d6\u6b65\u9aa4\u524d\u540e\u4ea7\u751f\u6b63\u786e\u54cd\u5e94\u7684\u53ef\u80fd\u6027\u53d8\u5316\uff0c\u5bf9\u5e94\u4e8eRL\u4e2d\u7684\u6b65\u9aa4\u7ea7\u4f18\u52bf\u7684\u6982\u5ff5\u3002\u5173\u952e\u5728\u4e8e\uff0c\u8fd9\u79cd\u8fdb\u5c55\u5e94\u8be5\u5728\u4e0e\u57fa\u7b56\u7565\u4e0d\u540c\u7684\u8bc1\u660e\u7b56\u7565\u4e0b\u8fdb\u884c\u6d4b\u91cf\u3002\u6211\u4eec\u7406\u8bba\u5730\u63cf\u8ff0\u4e86\u826f\u597d\u7684\u8bc1\u660e\u8005\u96c6\u5408\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u8fd9\u6837\u7684\u8bc1\u660e\u8005\u4f18\u5316\u8fc7\u7a0b\u
5956\u52b1\u53ef\u4ee5\u6539\u5584\u6d4b\u8bd5\u65f6\u641c\u7d22\u548c\u5728\u7ebfRL\u671f\u95f4\u7684\u63a2\u7d22\u3002\u5b9e\u9645\u4e0a\uff0c\u6211\u4eec\u7684\u63cf\u8ff0\u663e\u793a\uff0c\u5f31\u8bc1\u660e\u8005\u7b56\u7565\u53ef\u4ee5\u663e\u7740\u63d0\u9ad8\u66f4\u5f3a\u7684\u57fa\u7b56\u7565\uff0c\u8fd9\u4e5f\u662f\u6211\u4eec\u5728\u5b9e\u9a8c\u4e0a\u89c2\u5bdf\u5230\u7684\u73b0\u8c61\u3002\u6211\u4eec\u901a\u8fc7\u8bad\u7ec3\u8fc7\u7a0b\u4f18\u52bf\u9a8c\u8bc1\u5668\uff08PAVs\uff09\u6765\u9884\u6d4b\u5728\u8fd9\u4e9b\u8bc1\u660e\u8005\u4e0b\u8fdb\u884c\u7684\u8fdb\u5c55\uff0c\u8bc1\u660e\u4e0eORMs\u76f8\u6bd4\uff0c\u5728\u7ebfRL\u4f7f\u7528PAVs\u63d0\u4f9b\u7684\u5bc6\u96c6\u5956\u52b1\u53ef\u4ee5\u5b9e\u73b0\u9ad8\u8fbe8\uff05\u4ee5\u4e0a\u7684\u51c6\u786e\u6027\u63d0\u9ad8\uff0c\u4ee5\u53ca1.5\u81f35\u500d\u7684\u8ba1\u7b97\u6548\u7387\u63d0\u9ad8\u3002\u4f7f\u7528PAVs\u7684\u5728\u7ebfRL\u9996\u6b21\u5b9e\u73b0\u4e86\u6837\u672c\u6548\u7387\u63d0\u53475-6\u500d\uff0c\u51c6\u786e\u7387\u63d0\u5347\u8d85\u8fc76\uff05\u7684\u7ed3\u679c\u3002|\n", "2410.08145": "|**2024-10-10**|**Insight Over Sight? 
Exploring the Vision-Knowledge Conflicts in Multimodal LLMs**|Xiaoyuan Liu et.al.|[2410.08145](http://arxiv.org/abs/2410.08145)|**[link](https://github.com/xyliu-cs/ConflictVIS)**|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u4e2d\u89c6\u89c9\u4fe1\u606f\u4e0e\u6a21\u578b\u5185\u90e8\u5e38\u8bc6\u77e5\u8bc6\u51b2\u7a81\u7684\u95ee\u9898\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u5728\u7279\u5b9a\u60c5\u51b5\u4e0b\uff0cMLLMs\u53ef\u80fd\u57fa\u4e8e\u6587\u672c\u67e5\u8be2\u800c\u975e\u89c6\u89c9\u8f93\u5165\u505a\u51fa\u51b3\u7b56\uff0c\u5bfc\u81f4\u5e38\u8bc6\u7ea7\u7684\u89c6\u89c9-\u77e5\u8bc6\u77db\u76fe\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u5957\u81ea\u52a8\u5316\u7684\u8bc4\u4f30\u6d41\u7a0b\uff0c\u5e76\u8f85\u4ee5\u4eba\u5de5\u8d28\u91cf\u63a7\u5236\u73af\u8282\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u7528\u4e8e\u6a21\u62df\u548c\u8bc4\u4f30\u6b64\u7c7b\u51b2\u7a81\u7684\u57fa\u51c6\u6d4b\u8bd5\u7cfb\u7edf\u3002 \u8be5\u57fa\u51c6\u6d4b\u8bd5\u5305\u542b\u4e86374\u5f20\u539f\u521b\u56fe\u7247\u53ca1122\u4e2a\u9ad8\u8d28\u91cf\u7684\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u8986\u76d6\u4e86\u4e24\u79cd\u51b2\u7a81\u76ee\u6807\u7c7b\u578b\u548c\u4e09\u4e2a\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u7684\u95ee\u9898\uff0c\u4e3a\u5168\u9762\u8bc4\u4f30\u6a21\u578b\u63d0\u4f9b\u4e86\u5de5\u5177\u3002\u901a\u8fc7\u8fd9\u4e00\u57fa\u51c6\uff0c\u6211\u4eec\u5bf9\u4e5d\u79cd\u4ee3\u8868\u6027\u7684MLLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u53d1\u73b0\u8fd9\u4e9b\u6a21\u578b\u5728\u5904\u7406\u89c6\u89c9\u4e0e\u5e38\u8bc6\u77e5\u8bc6\u51b2\u7a81\u65f6\u5b58\u5728\u663e\u8457\u7684\u6587\u672c\u4f9d\u8d56\u6027\u95ee\u9898\u3002 
\u57fa\u4e8e\u6b64\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u63d0\u793a\u7b56\u7565\u2014\u2014\u201c\u805a\u7126\u4e8e\u89c6\u89c9\u201d\uff08FoV\uff09\uff0c\u65e8\u5728\u589e\u5f3a\u6a21\u578b\u5728\u9047\u5230\u51b2\u7a81\u65f6\u4f18\u5148\u8003\u8651\u89c6\u89c9\u8f93\u5165\u7684\u80fd\u529b\uff0c\u4ece\u800c\u51cf\u5c11\u5bf9\u77db\u76fe\u6587\u672c\u4fe1\u606f\u7684\u4f9d\u8d56\u3002\u6211\u4eec\u7684\u5206\u6790\u7ed3\u679c\u4ee5\u53ca\u63d0\u51fa\u7684\u7b56\u7565\u5bf9\u7406\u89e3\u5e76\u7f13\u89e3MLLM\u4e2d\u7684\u89c6\u89c9-\u77e5\u8bc6\u51b2\u7a81\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002 \u6b64\u5916\uff0c\u672c\u6587\u8fd8\u63d0\u4f9b\u4e86\u6570\u636e\u96c6\u548c\u4ee3\u7801\u7684\u516c\u5f00\u8bbf\u95ee\u6743\u9650\uff0c\u4ee5\u4fc3\u8fdb\u793e\u533a\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u548c\u5e94\u7528\u3002|\n", "2410.08143": "|**2024-10-10**|**DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory**|Yutong Wang et.al.|[2410.08143](http://arxiv.org/abs/2410.08143)|**[link](https://github.com/yutongwang1216/docmtagent)**|**\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u76f8\u5f53\u53ef\u89c2\u7684\u8d28\u91cf\u63d0\u5347\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5f53\u524d\u7684MT-LLM\u7814\u7a76\u4ecd\u7136\u9762\u4e34\u5728\u5904\u7406\u6574\u4e2a\u6587\u6863\u65f6\u4fdd\u6301\u7ffb\u8bd1\u4e00\u81f4\u6027\u4e0e\u51c6\u786e\u6027\u7684\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aDelTA\u7684\u6587\u6863\u7ea7\u7ffb\u8bd1\u4ee3\u7406\uff0c\u65e8\u5728\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\u3002DelTA\u5177\u6709\u4e00\u79cd\u591a\u5c42\u6b21\u8bb0\u5fc6\u7ed3\u6784\uff0c\u80fd\u591f\u5b58\u50a8\u4e0d\u540c\u7c92\u5ea6\u548c\u8de8\u5ea6\u7684\u4fe1\u606f\uff0c\u5305\u62ec\u4e13\u6709\u540d\u8bcd\u8bb0\u5f55\u3001\u53cc\u8bed\u6458\u8981\u3001\u957f\u671f\u8bb0\u5fc6\u548c\u77ed\u671f
\u8bb0\u5fc6\uff0c\u8fd9\u4e9b\u4fe1\u606f\u7531\u8f85\u52a9\u7684LLM\u7ec4\u4ef6\u8fde\u7eed\u68c0\u7d22\u548c\u66f4\u65b0\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u56db\u4e2a\u5f00\u6e90/\u95ed\u6e90LLM\u548c\u4e24\u4e2a\u4ee3\u8868\u6027\u6587\u6863\u7ffb\u8bd1\u6570\u636e\u96c6\u4e0a\uff0cDelTA\u5728\u7ffb\u8bd1\u4e00\u81f4\u6027\u4e0e\u8d28\u91cf\u65b9\u9762\u5747\u663e\u8457\u4f18\u4e8e\u5f3a\u5927\u7684\u57fa\u7ebf\uff0c\u5e73\u5747\u4e00\u81f4\u6027\u5f97\u5206\u63d0\u9ad8\u9ad8\u8fbe4.58\u4e2a\u767e\u5206\u70b9\uff0cCOMET\u5f97\u5206\u63d0\u9ad8\u9ad8\u8fbe3.16\u70b9\u3002DelTA\u91c7\u7528\u9010\u53e5\u7ffb\u8bd1\u7b56\u7565\uff0c\u786e\u4fdd\u65e0\u53e5\u5b50\u9057\u6f0f\uff0c\u5e76\u63d0\u4f9b\u4e0e\u4e3b\u6d41\u65b9\u6cd5\u76f8\u6bd4\u66f4\u4e3a\u5185\u5b58\u9ad8\u6548\u7684\u9009\u62e9\u3002\u6b64\u5916\uff0cDelTA\u63d0\u9ad8\u4e86\u4ee3\u8bcd\u7ffb\u8bd1\u51c6\u786e\u6027\uff0c\u5e76\u4e14\u4ee3\u7406\u7684\u6458\u8981\u7ec4\u4ef6\u4e5f\u663e\u793a\u51fa\u4f5c\u4e3a\u57fa\u4e8e\u67e5\u8be2\u7684\u6458\u8981\u4efb\u52a1\u5de5\u5177\u7684\u6f5c\u529b\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6570\u636e\u53d1\u5e03\u5728https://github.com/YutongWang1216/DocMTAgent\u3002**|\n", "2410.09040": "|**2024-10-11**|**AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation**|Zijun Wang 
et.al.|[2410.09040](http://arxiv.org/abs/2410.09040)|**[link](https://github.com/ucsc-vlaa/attngcg-attack)**|**\u672c\u6587\u7814\u7a76\u4e86\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d7\u5230\u56da\u7981\u653b\u51fb\u7684\u8106\u5f31\u6027\uff0c\u7279\u522b\u5173\u6ce8\u57fa\u4e8e\u4f18\u5316\u7684\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u7b56\u7565\u3002\u6211\u4eec\u9996\u5148\u89c2\u5bdf\u5230\u653b\u51fb\u7684\u6709\u6548\u6027\u4e0e\u6a21\u578b\u5185\u90e8\u884c\u4e3a\u4e4b\u95f4\u5b58\u5728\u6b63\u76f8\u5173\u5173\u7cfb\u3002\u4f8b\u5982\uff0c\u5f53\u6a21\u578b\u5bf9\u65e8\u5728\u786e\u4fddLLM\u5b89\u5168\u5bf9\u9f50\u7684\u7cfb\u7edf\u63d0\u793a\u7ed9\u4e88\u66f4\u591a\u5173\u6ce8\u65f6\uff0c\u653b\u51fb\u5f80\u5f80\u6548\u679c\u8f83\u5dee\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u589e\u5f3a\u65b9\u6cd5\uff0c\u901a\u8fc7\u64cd\u7eb5\u6a21\u578b\u7684\u6ce8\u610f\u529b\u5206\u6570\u6765\u4fc3\u8fdbLLM\u7684\u56da\u7981\uff0c\u6211\u4eec\u5c06\u5176\u547d\u540d\u4e3aAttnGCG\u3002\u5b9e\u9a8c\u4e0a\uff0cAttnGCG\u5728\u5404\u79cdLLMs\u4e0a\u8868\u73b0\u51fa\u4e00\u81f4\u7684\u6539\u8fdb\uff0c\u5728Llama-2\u7cfb\u5217\u4e2d\u5e73\u5747\u63d0\u9ad8\u4e86\u7ea67%\uff0c\u5728Gemma\u7cfb\u5217\u4e2d\u63d0\u9ad8\u4e86\u7ea610%\u3002\u6211\u4eec\u7684\u7b56\u7565\u8fd8\u5c55\u793a\u4e86\u9488\u5bf9\u672a\u89c1\u8fc7\u7684\u6709\u5bb3\u76ee\u6807\u548c\u9ed1\u76d2LLMs\uff08\u5982GPT-3.5\u548cGPT-4\uff09\u7684\u7a33\u5065\u653b\u51fb\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u6211\u4eec\u7684\u6ce8\u610f\u529b\u5206\u6570\u53ef\u89c6\u5316\u66f4\u6613\u4e8e\u89e3\u91ca\uff0c\u4f7f\u6211\u4eec\u80fd\u591f\u66f4\u597d\u5730\u4e86\u89e3\u5982\u4f55\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u6ce8\u610f\u529b\u64cd\u7eb5\u5b9e\u73b0\u66f4\u6709\u6548\u7684\u56da\u7981\u3002\u6211\u4eec\u53d1\u5e03\u4e86\u4ee3\u7801\uff0c\u53ef\u5728https:/
/github.com/UCSC-VLAA/AttnGCG-attack\u4e2d\u83b7\u53d6\u3002**|\n", "2410.09039": "|**2024-10-11**|**Semi-Supervised Learning of Noisy Mixture of Experts Models**|Oh-Ran Kwon et.al.|[2410.09039](http://arxiv.org/abs/2410.09039)|null|\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\u662f\u4e00\u4e2a\u7075\u6d3b\u7684\u9884\u6d4b\u5efa\u6a21\u6846\u67b6\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65f6\u4ee3\u91cd\u65b0\u5f15\u8d77\u4e86\u4eba\u4eec\u7684\u5173\u6ce8\u3002\u4e00\u4e2a\u7531\u9884\u6d4b\u201c\u4e13\u5bb6\u201d\u7ec4\u6210\u7684\u96c6\u5408\u4e0e\u63a7\u5236\u5728\u9884\u6d4b\u65f6\u6bcf\u4e2a\u4e13\u5bb6\u5f71\u54cd\u529b\u7684\u201c\u95e8\u63a7\u51fd\u6570\u201d\u5171\u540c\u5b66\u4e60\u3002\u8fd9\u79cd\u7ed3\u6784\u5141\u8bb8\u76f8\u5bf9\u7b80\u5355\u7684\u6a21\u578b\u5728\u590d\u6742\u3001\u5f02\u6784\u7684\u6570\u636e\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u5728\u5f53\u4eca\u8bb8\u591a\u5e94\u7528\u573a\u666f\u4e2d\uff0c\u672a\u6807\u8bb0\u6570\u636e\u5e7f\u6cdb\u53ef\u7528\u800c\u6807\u6ce8\u6570\u636e\u5374\u96be\u4ee5\u83b7\u53d6\u3002\u534a\u76d1\u7763\u5b66\u4e60\u65b9\u6cd5\u65e8\u5728\u5229\u7528\u672a\u6807\u8bb0\u6570\u636e\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8eMoE\u6a21\u578b\u534a\u76d1\u7763\u5b66\u4e60\u7684\u65b0\u65b9\u6cd5\u3002\u6211\u4eec\u4ece\u6d77\u6d0b\u5b66\u5bb6\u5f00\u53d1\u7684\u4e00\u79cd\u5047\u8bbe\u5f3a\u70c8\u7684\u534a\u76d1\u7763MoE\u6a21\u578b\u5f00\u59cb\uff0c\u8be5\u6a21\u578b\u5047\u8bbe\u672a\u6807\u6ce8\u6570\u636e\u4e2d\u7684\u6f5c\u5728\u805a\u7c7b\u7ed3\u6784\u76f4\u63a5\u6620\u5c04\u5230\u76d1\u7763\u4efb\u52a1\u4e2d\u6bcf\u4e2a\u4e13\u5bb6\u5e94\u7ed9\u4e88\u7684\u5f71\u54cd\u3002\u6211\u4eec\u653e\u677e\u4e86\u8fd9\u4e00\u5047\u8bbe\uff0c\u8bbe\u60f3\u4e24\u8005\u4e4b\u95f4\u5b58\u5728\u566a\u58f0\u8fde\u63a5\uff0c\u5e76\u57fa\u4e8e\u6700\u5c0f\u5316\u5254\u9664\u5e73\u65b9\u7b97\u6cd5\u63d0\u51fa\u4e86\u4e00\u79cd\u7b97\u6cd5\uff0c\u5373\u4f7f\u5b58\u5728\u6570
\u636e\u9519\u4f4d\u4e5f\u80fd\u6210\u529f\u3002\u6211\u4eec\u7684\u7406\u8bba\u5206\u6790\u786e\u5b9a\u4e86\u8be5\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u63a5\u8fd1\u53c2\u6570\u7387\u6536\u655b\u4f30\u8ba1\u5668\u7684\u6761\u4ef6\u3002\u6a21\u62df\u548c\u771f\u5b9e\u6570\u636e\u793a\u4f8b\u8bc1\u660e\u4e86\u8be5\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2410.09038": "|**2024-10-11**|**SimpleStrat: Diversifying Language Model Generation with Stratification**|Justin Wong et.al.|[2410.09038](http://arxiv.org/abs/2410.09038)|null|\u751f\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u6837\u5316\u54cd\u5e94\u5bf9\u4e8e\u89c4\u5212/\u641c\u7d22\u548c\u5408\u6210\u6570\u636e\u751f\u6210\u7b49\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u5e94\u7528\u9700\u8981\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u591a\u6837\u5316\u7684\u7b54\u6848\uff0c\u4ee5\u4fbf\u5728\u6bcf\u6b21\u751f\u6210\u65f6\u90fd\u80fd\u5f97\u5230\u4e0d\u540c\u7684\u7ed3\u679c\u3002\u4e4b\u524d\u7684\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u589e\u52a0\u6e29\u5ea6\u6765\u63d0\u9ad8\u591a\u6837\u6027\u3002\u7136\u800c\uff0c\u4e0e\u666e\u904d\u8ba4\u8bc6\u76f8\u53cd\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u4f1a\u5bfc\u81f4\u968f\u7740\u6e29\u5ea6\u589e\u52a0\uff0c\u4e2a\u4f53\u751f\u6210\u7684\u8d28\u91cf\u964d\u4f4e\uff0c\u800c\u4e14\u5176\u6709\u6548\u6027\u8fd8\u53d6\u51b3\u4e8e\u6a21\u578b\u7684\u4e0b\u4e00\u4e2a\u8bcd\u6982\u7387\u4e0e\u771f\u5b9e\u7b54\u6848\u5206\u5e03\u7684\u76f8\u4f3c\u6027\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cSimpleStrat\u201d\u7684\u66ff\u4ee3\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bed\u8a00\u6a21\u578b\u672c\u8eab\u5bf9\u7a7a\u95f4\u8fdb\u884c\u5206\u533a\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u968f\u673a\u9009\u62e9\u4e00\u4e2a\u5206\u533a\u5e76\u5728\u5176\u4e2d\u62bd\u53d6\u6837\u672c\u3002\u4e3a\u4e86\u8861\u91cf\u591a\u6837\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86CoverageQA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u5177\u6709\u591a\u4e2a\u540c\u7b49\u53ef\u80fd\u7b54\u6848\u7684\u672a\u6307\u5b9a\u95ee\u9898\u3002\u901a\u8fc7\u6d4b\u91cf\u8f93\u51fa\u5206\u5e03\u4e0e\u6709\u6548\u5730\u9762\u771f\u76f8\u7b54\u6848\u7684\u5747\u5300\u5206\u5e03\u4e4b\u95f4\u7684KL\u6563\u5ea6\u6765\u8bc4\u4f30\u591a\u6837\u6027\u3002\u7531\u4e8e\u8ba1\u7b97\u4e13\u7528\u6a21\u578b\u6bcf\u6761\u54cd\u5e94/\u89e3\u51b3\u65b9\u6848\u7684\u6982\u7387\u901a\u5e38\u662f\u4e0d\u53ef\u884c\u7684\uff0c\u56e0\u6b64\u6211\u4eec\u4f7f\u7528\u53ec\u56de\u7387\u6765\u8bc4\u4f30\u5730\u771f\u7406\u89e3\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528SimpleStrat\u65b9\u6cd5\u53ef\u4ee5\u5b9e\u73b0\u6bd4GPT-4o\u9ad80.05\u7684\u53ec\u56de\u7387\uff0c\u5e76\u4e14\u5e73\u5747\u51cf\u5c11\u4e860.36\u7684KL\u6563\u5ea6\u4e0eLlama 3\u76f8\u6bd4\u3002|\n", "2410.09037": "|**2024-10-11**|**Mentor-KD: Making Small Language Models Better Multi-step Reasoners**|Hojae Lee 
et.al.|[2410.09037](http://arxiv.org/abs/2410.09037)|**[link](https://github.com/2hojae/mentor-kd)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u5229\u7528\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u63d0\u793a\u5728\u5404\u79cd\u590d\u6742\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u975e\u51e1\u7684\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u77e5\u8bc6\u84b8\u998f\uff08KD\uff09\u65b9\u6cd5\u2014\u2014\u63a8\u7406\u84b8\u998f\uff0c\u901a\u8fc7\u5fae\u8c03\u7531LLM\u6559\u5e08\u751f\u6210\u7684\u591a\u6b65\u63a8\u7406\u8bed\u8a00\u6a21\u578b\uff0c\u5c06LLM\u7684\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u8f83\u5c0f\u7684\u6a21\u578b\u4e0a\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7814\u7a76\u5728\u4ee5\u4e0b\u4e24\u4e2a\u65b9\u9762\u8003\u8651\u4e0d\u8db3\uff1a\u4eceLLM\u6559\u5e08\u6a21\u578b\u83b7\u53d6\u7684\u793a\u4f8b\u96c6\u8d28\u91cf\u4f4e\u548c\u8f6f\u6807\u7b7e\u63d0\u4f9b\u4e0d\u8db3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5bfc\u5e08-KD\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6709\u6548\u5730\u5c06LLM\u7684\u591a\u6b65\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\uff0c\u5e76\u89e3\u51b3\u4e86\u4e0a\u8ff0\u6311\u6218\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5229\u7528\u4e00\u4e2a\u5bfc\u5e08\u2014\u2014\u7279\u5b9a\u4efb\u52a1\u7684\u4e2d\u95f4\u5927\u5c0f\u7684\u5fae\u8c03\u6a21\u578b\u2014\u2014\u6765\u589e\u52a0\u989d\u5916\u7684CoT\u6ce8\u91ca\u5e76\u4e3a\u5b66\u751f\u6a21\u578b\u63d0\u4f9b\u8f6f\u6807\u7b7e\uff0c\u4ee5\u5728\u63a8\u7406\u84b8\u998f\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u652f\u6301\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5e76\u786e\u8ba4\u4e86\u5bfc\u5e08-KD\u5728\u4e0d\u540c\u6a21\u578b\u548c\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e0a\u7684\u6709\u6548\u6027\u3002**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large 
Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography\u662f\u4e00\u79cd\u5728X\u5c04\u7ebf\u548c\u7535\u5b50\u663e\u5fae\u955c\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u7684\u9ad8\u7ea7\u8ba1\u7b97\u6210\u50cf\u6280\u672f\u3002\u5b83\u5728\u7269\u7406\u5b66\u3001\u5316\u5b66\u3001\u751f\u7269\u5b66\u548c\u6750\u6599\u79d1\u5b66\u7b49\u7814\u7a76\u9886\u57df\u4ee5\u53ca\u534a\u5bfc\u4f53\u8868\u5f81\u7b49\u5de5\u4e1a\u5e94\u7528\u4e2d\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u5b9e\u8df5\u8fc7\u7a0b\u4e2d\uff0c\u83b7\u5f97\u9ad8\u8d28\u91cf\u7684ptychographic\u56fe\u50cf\u9700\u8981\u540c\u65f6\u4f18\u5316\u4f17\u591a\u5b9e\u9a8c\u548c\u7b97\u6cd5\u53c2\u6570\u3002\u4f20\u7edf\u4e0a\uff0c\u53c2\u6570\u9009\u62e9\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bd5\u9519\u6cd5\uff0c\u5bfc\u81f4\u5de5\u4f5c\u6548\u7387\u4f4e\u4e0b\uff0c\u5e76\u53ef\u80fd\u5f15\u5165\u4eba\u4e3a\u504f\u89c1\u3002\u672c\u5de5\u4f5c\u5f00\u53d1\u4e86\u201cptychographic\u5b9e\u9a8c\u4e0e\u5206\u6790\u673a\u5668\u4eba\u201d\uff08PEAR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u5904\u7406ptychography\u6570\u636e\u5206\u6790\u7684\u6846\u67b6\u3002\u4e3a\u4e86\u786e\u4fdd\u9ad8\u9c81\u68d2\u6027\u548c\u51c6\u786e\u6027\uff0cPEAR\u91c7\u7528\u4e86\u591a\u4e2aLLM\u4ee3\u7406\u8fdb\u884c\u77e5\u8bc6\u68c0\u7d22\u3001\u4ee3\u7801\u751f\u6210\u3001\u53c2\u6570\u63a8\u8350\u548c\u56fe\u50cf\u63a8\u7406\u4efb\u52a1\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cPEAR\u7684\u591a\u4ee3\u7406\u8bbe\u8ba1\u663e\u8457\u63d0\u9ad8\u4e86\u5de5\u4f5c\u6d41\u7a0b\u7684\u6210\u529f\u7387\uff0c\u5373\u4f7f\u4f7f\u7528\u8f83\u5c0f\u7684\u5f00\u6e90\u6743\u91cd\u6a21\u578b\u5982LLaMA 3.1 
8B\u4e5f\u662f\u5982\u6b64\u3002PEAR\u8fd8\u652f\u6301\u5404\u79cd\u81ea\u52a8\u5316\u7ea7\u522b\uff0c\u5e76\u8bbe\u8ba1\u6709\u53ef\u81ea\u5b9a\u4e49\u7684\u672c\u5730\u77e5\u8bc6\u5e93\uff0c\u4ee5\u786e\u4fdd\u5176\u5728\u4e0d\u540c\u7814\u7a76\u73af\u5883\u4e0b\u7684\u7075\u6d3b\u6027\u548c\u9002\u5e94\u6027\u3002|\n", "2410.09013": "|**2024-10-11**|**The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals**|Xiaofeng Wu et.al.|[2410.09013](http://arxiv.org/abs/2410.09013)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5229\u7528\u6c49\u5b57\u4e2d\u7684\u89c6\u89c9\u4fe1\u606f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5c24\u5176\u662f\u5173\u4e8e\u90e8\u9996\u3001\u7ed3\u6784\u3001\u7b14\u753b\u4ee5\u53ca\u7b14\u753b\u6570\u91cf\u7684\u4fe1\u606f\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u7cfb\u7edf\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5bf9\u6c49\u5b57\u4e2d\u89c6\u89c9\u5143\u7d20\u7684\u7406\u89e3\u7a0b\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1\u63d0\u4f9b\u5b57\u7b26\u56fe\u50cf\uff0c\u6a21\u578b\u4ecd\u7136\u5c55\u793a\u4e86\u6709\u9650\u4f46\u90e8\u5206\u7406\u89e3\u89c6\u89c9\u4fe1\u606f\u7684\u80fd\u529b\u3002 
\u4e3a\u4e86\u6fc0\u53d1\u6a21\u578b\u5229\u7528\u90e8\u9996\u8fdb\u884c\u4e2d\u6587\u7406\u89e3\u4efb\u52a1\u7684\u6f5c\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u5c1d\u8bd5\u5c06\u90e8\u9996\u4fe1\u606f\u878d\u5165\u5230\u63d0\u793a\u4e2d\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u5728\u63d0\u4f9b\u5173\u4e8e\u90e8\u9996\u7684\u989d\u5916\u4fe1\u606f\u65f6\uff0c\u8bcd\u6027\u6807\u6ce8\u4efb\u52a1\u7684\u8868\u73b0\u5f97\u5230\u4e86\u4e00\u81f4\u6027\u7684\u63d0\u5347\u3002\u8fd9\u8868\u660e\u901a\u8fc7\u6574\u5408\u5b50\u5b57\u7b26\u4fe1\u606f\uff0c\u6709\u53ef\u80fd\u589e\u5f3a\u8bed\u8a00\u5904\u7406\u80fd\u529b\u3002|\n", "2410.09012": "|**2024-10-11**|**Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models**|Hao Li et.al.|[2410.09012](http://arxiv.org/abs/2410.09012)|**[link](https://github.com/sailresearch/fmse-blogs)**|\u672c\u6587\u9996\u6b21\u4ece\u5b9e\u8df5\u8005\u7684\u89c6\u89d2\u5206\u6790\u4e86\u57fa\u7840\u6a21\u578b\uff08FMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u7684\u5e94\u7528\u3002\u901a\u8fc7\u5206\u6790\u6765\u81ea\u9876\u7ea7\u79d1\u6280\u516c\u53f8\u7684155\u7bc7FM4SE\u548c997\u7bc7SE4FM\u535a\u5ba2\u6587\u7ae0\uff0c\u5229\u7528\u57fa\u4e8eFM\u7684\u8c03\u7814\u65b9\u6cd5\u7cfb\u7edf\u5730\u6807\u8bb0\u548c\u603b\u7ed3\u4e86\u8ba8\u8bba\u7684\u6d3b\u52a8\u548c\u4efb\u52a1\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u867d\u7136\u4ee3\u7801\u751f\u6210\u662fFM4SE\u4e2d\u6700\u7a81\u51fa\u7684\u4efb\u52a1\uff0c\u4f46FMs\u8fd8\u88ab\u7528\u4e8e\u4ee3\u7801\u7406\u89e3\u3001\u603b\u7ed3\u548cAPI\u63a8\u8350\u7b49\u4f17\u591a\u5176\u4ed6SE\u6d3b\u52a8\u3002\u5173\u4e8eSE4FM\u7684\u5927\u591a\u6570\u535a\u5ba2\u6587\u7ae0\u5173\u6ce8\u4e8e\u6a21\u578b\u90e8\u7f72\u4e0e\u64cd\u4f5c\u4ee5\u53ca\u7cfb\u7edf\u67b6\u6784\u4e0e\u7f16\u6392\u3002\u5c3d\u7ba1\u4e91\u90e8\u7f72\u5360\u4e3b\u5bfc\u5730\u4f4d\uff0c\u4f46\u5bf9FMs\u8fdb\u884c\u538b\u7f29\u5e76\u5728\u8fb9\u7f18\u6216\u79fb\
u52a8\u8bbe\u5907\u4e0a\u90e8\u7f72\u7684\u5174\u8da3\u6b63\u5728\u589e\u957f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u516b\u4e2a\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u65e8\u5728\u5f25\u5408\u7406\u8bba\u53d1\u73b0\u4e0e\u5b9e\u9645\u5e94\u7528\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e0d\u4ec5\u4e30\u5bcc\u4e86FMs\u5728SE\u9886\u57df\u5b9e\u8df5\u5e94\u7528\u7684\u77e5\u8bc6\u4f53\u7cfb\uff0c\u8fd8\u5c55\u793a\u4e86FMs\u5728\u6280\u672f\u4e0e\u7070\u8272\u6587\u732e\u9886\u57df\u8fdb\u884c\u6587\u732e\u8c03\u7814\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u63d0\u4f9b\u7684\u6570\u636e\u96c6\u3001\u7ed3\u679c\u3001\u4ee3\u7801\u4ee5\u53ca\u4f7f\u7528\u7684\u63d0\u793a\u53ef\u4ee5\u5728\u5728\u7ebf\u590d\u5236\u5305https://github.com/SAILResearch/fmse-blogs\u4e2d\u627e\u5230\u3002|\n", "2410.09008": "|**2024-10-11**|**SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights**|Ling Yang et.al.|[2410.09008](http://arxiv.org/abs/2410.09008)|**[link](https://github.com/yangling0818/supercorrect-llm)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u3001PaLM\u548cLLaMA\u5728\u5404\u79cd\u63a8\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6539\u8fdb\u3002\u7136\u800c\uff0c\u8f83\u5c0f\u7684\u6a21\u578b\u5982Llama-3-8B\u548cDeepSeekMath-Base\u4ecd\u7136\u5728\u590d\u6742\u7684\u6570\u5b66\u63a8\u7406\u65b9\u9762\u5b58\u5728\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u6709\u6548\u5730\u8bc6\u522b\u5e76\u7ea0\u6b63\u63a8\u7406\u9519\u8bef\u3002\u8fd1\u671f\u7684\u53cd\u601d\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u4f7f\u6a21\u578b\u80fd\u591f\u81ea\u6211\u53cd\u601d\u548c\u81ea\u6211\u6821\u6b63\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u4ecd\u9762\u4e34\u72ec\u7acb\u68c0\u6d4b\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u9519\u8bef\u7684\u6311\u6218\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSuperCorrect\u768
4\u65b0\u578b\u4e24\u9636\u6bb5\u6846\u67b6\uff0c\u5b83\u4f7f\u7528\u5927\u578b\u6559\u5e08\u6a21\u578b\u6765\u76d1\u7763\u548c\u7ea0\u6b63\u8f83\u5c0f\u5b66\u751f\u6a21\u578b\u7684\u63a8\u7406\u548c\u53cd\u601d\u8fc7\u7a0b\u3002 \u5728\u7b2c\u4e00\u9636\u6bb5\uff0c\u6211\u4eec\u4ece\u6559\u5e08\u6a21\u578b\u4e2d\u63d0\u53d6\u4e86\u5c42\u6b21\u5316\u7684\u9ad8\u9636\u548c\u8be6\u7ec6\u7684\u601d\u60f3\u6a21\u677f\uff0c\u4ee5\u6307\u5bfc\u5b66\u751f\u6a21\u578b\u751f\u6210\u66f4\u7ec6\u81f4\u7684\u63a8\u7406\u601d\u60f3\u3002\u5728\u7b2c\u4e8c\u9636\u6bb5\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8de8\u6a21\u578b\u534f\u4f5c\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u6765\u589e\u5f3a\u5b66\u751f\u6a21\u578b\u7684\u81ea\u6211\u6821\u6b63\u80fd\u529b\uff0c\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u8ddf\u968f\u6559\u5e08\u7684\u4fee\u6b63\u8f68\u8ff9\u8fdb\u884c\u6539\u8fdb\u3002\u8fd9\u79cd\u8de8\u6a21\u578bDPO\u65b9\u6cd5\u6559\u4f1a\u5b66\u751f\u6a21\u578b\u901a\u8fc7\u4ece\u6559\u5e08\u6a21\u578b\u83b7\u5f97\u7684\u9519\u8bef\u9a71\u52a8\u7684\u89c1\u89e3\u6709\u6548\u5730\u5b9a\u4f4d\u5e76\u89e3\u51b3\u9519\u8bef\u7684\u601d\u60f3\uff0c\u6253\u7834\u5176\u601d\u60f3\u7684\u74f6\u9888\uff0c\u5e76\u901a\u8fc7\u5b66\u4e60\u65b0\u6280\u80fd\u548c\u77e5\u8bc6\u6765\u5e94\u5bf9\u5177\u6709\u6311\u6218\u6027\u7684\u95ee\u9898\u3002 \u5e7f\u6cdb\u7684\u5b9e\u9a8c\u4e00\u81f4\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u4f18\u8d8a\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684SuperCorrect-7B\u6a21\u578b\u5728MATH/GSM8K\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u663e\u8457\u8d85\u8d8a\u4e86\u5f3a\u5927\u7684DeepSeekMath-7B\u548cQwen2.5-Math-7B\uff0c\u5206\u522b\u5728MATH\u548cGSM8K\u57fa\u51c6\u4e0a\u63d0\u9ad8\u4e867.8%/5.3%\u548c15.1%/6.3%\uff0c\u5728\u6240\u67097B\u6a21\u578b\u4e2d\u5b9e\u73b0\u4e86\u65b0\u7684\u6700\u5148\u8fdb\u6027\u80fd\u3002\u4ee3\u7801\uff1ahttps://github.com/YangLing0818/SuperCorrect-llm**|\n", "2410.09006": "|**2024-10-11**|**From Interaction 
to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts**|Zhuohao Jerry Zhang et.al.|[2410.09006](http://arxiv.org/abs/2410.09006)|null|\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u4eba\u4eec\u5728\u521b\u5efa\u80fd\u591f\u901a\u8fc7\u7528\u6237\u754c\u9762\uff08UI\uff09\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u7684\u81ea\u4e3b\u4ee3\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u5982\u4f55\u5bfc\u822aUI\u4ee5\u53ca\u7406\u89e3UI\u7ed3\u6784\u7684\u673a\u5236\uff0c\u4f46\u4ee3\u7406\u53ca\u5176\u81ea\u4e3b\u884c\u4e3a\uff08\u7279\u522b\u662f\u53ef\u80fd\u5177\u6709\u98ce\u9669\u6216\u4e0d\u53ef\u9006\u6027\u7684\u884c\u4e3a\uff09\u7684\u5f71\u54cd\u548c\u540e\u679c\u4ecd\u7136\u7f3a\u4e4f\u6df1\u5165\u7814\u7a76\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63a2\u7d22\u4e86AI\u4ee3\u7406UI\u64cd\u4f5c\u7684\u5b9e\u9645\u4e16\u754c\u5f71\u54cd\u548c\u540e\u679c\u3002 
\u6211\u4eec\u9996\u5148\u901a\u8fc7\u4e00\u7cfb\u5217\u4e0e\u9886\u57df\u4e13\u5bb6\u7684\u5de5\u4f5c\u574a\u5f00\u53d1\u4e86\u4e00\u79cdUI\u64cd\u4f5c\u5f71\u54cd\u7684\u5206\u7c7b\u7cfb\u7edf\u3002\u968f\u540e\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u6570\u636e\u7efc\u5408\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u7528\u6237\u611f\u77e5\u4e3a\u5177\u6709\u5f71\u54cd\u529b\u7684UI\u5c4f\u5e55\u8f68\u8ff9\u548c\u64cd\u4f5c\u6570\u636e\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u6211\u4eec\u7684\u5f71\u54cd\u7c7b\u522b\u5bf9\u6536\u96c6\u7684\u6570\u636e\u548c\u4ece\u73b0\u6709UI\u5bfc\u822a\u6570\u636e\u96c6\u4e2d\u91cd\u65b0\u5229\u7528\u7684\u6570\u636e\u8fdb\u884c\u4e86\u6ce8\u91ca\u3002\u6211\u4eec\u5bf9\u4e0d\u540c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ca\u5176\u53d8\u4f53\u7684\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\u4e86\u8fd9\u4e9bLLM\u7406\u89e3\u548c\u9884\u6d4bAI\u4ee3\u7406\u53ef\u80fd\u91c7\u53d6\u7684UI\u64cd\u4f5c\u5f71\u54cd\u7684\u80fd\u529b\u3002 \u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u5206\u7c7b\u7cfb\u7edf\u589e\u5f3a\u4e86\u8fd9\u4e9bLLM\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5b83\u4eec\u80fd\u591f\u66f4\u597d\u5730\u7406\u89e3UI\u64cd\u4f5c\u7684\u5f71\u54cd\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u53d1\u73b0\u4e86\u4ed6\u4eec\u5728\u53ef\u9760\u5730\u5206\u7c7b\u66f4\u5fae\u5999\u6216\u590d\u6742\u7684\u5f71\u54cd\u529b\u7c7b\u522b\u65f6\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u7684\u95ee\u9898\u3002|\n", "2410.08996": "|**2024-10-11**|**Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference**|Grace Proebsting et.al.|[2410.08996](http://arxiv.org/abs/2410.08996)|null|\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-4\u3001Llama-2\u548cMistral 
7b\u7b49\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u751f\u6210\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u5047\u8bbe\uff0c\u6d4b\u8bd5\u4e86\u7528LLM\u66ff\u6362\u4f17\u5305\u5de5\u4f5c\u8005\u5bf9\u4ea7\u751f\u6ce8\u91ca\u504f\u89c1\u7684\u5f71\u54cd\u3002\u6211\u4eec\u590d\u73b0\u4e86\u65af\u5766\u798fNLI\u8bed\u6599\u5e93\u7684\u90e8\u5206\u6570\u636e\uff0c\u5e76\u8bad\u7ec3\u4e86\u4ec5\u4f7f\u7528\u5047\u8bbe\u7684\u5206\u7c7b\u5668\u6765\u786e\u5b9aLLM\u751f\u6210\u7684\u5047\u8bbe\u662f\u5426\u5305\u542b\u6ce8\u91ca\u504f\u89c1\u3002\u5728\u6211\u4eec\u7684\u7531LLM\u751f\u6210\u7684NLI\u6570\u636e\u96c6\u4e0a\uff0c\u57fa\u4e8eBERT\u7684\u4ec5\u5047\u8bbe\u5206\u7c7b\u5668\u8fbe\u5230\u4e8686%-96%\u7684\u51c6\u786e\u7387\uff0c\u8fd9\u8868\u660e\u8fd9\u4e9b\u6570\u636e\u96c6\u5305\u542b\u4ec5\u5047\u8bbe\u7684\u504f\u89c1\u3002\u6211\u4eec\u8fd8\u53d1\u73b0LLM\u751f\u6210\u7684\u5047\u8bbe\u4e2d\u5b58\u5728\u9891\u7e41\u7684\u201c\u7ebf\u7d22\u201d\uff0c\u4f8b\u5982\uff0c\u201c\u5728\u6cf3\u6c60\u91cc\u6e38\u6cf3\u201d\u8fd9\u4e00\u77ed\u8bed\u5728GPT-4\u751f\u6210\u768410000\u591a\u4e2a\u77db\u76fe\u5047\u8bbe\u4e2d\u51fa\u73b0\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f9b\u4e86\u5b9e\u8bc1\u8bc1\u636e\uff0c\u8bc1\u660eNLI\u4e2d\u5df2\u77e5\u7684\u504f\u89c1\u53ef\u80fd\u5728LLM\u751f\u6210\u7684\u6570\u636e\u4e2d\u6301\u7eed\u5b58\u5728\u3002|\n", "2410.10819": "|**2024-10-14**|**DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads**|Guangxuan Xiao 
et.al.|[2410.10819](http://arxiv.org/abs/2410.10819)|**[link](https://github.com/mit-han-lab/duo-attention)**|**\u90e8\u7f72\u957f\u4e0a\u4e0b\u6587\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u663e\u8457\u7684\u8ba1\u7b97\u548c\u5185\u5b58\u6311\u6218\u3002\u7f13\u5b58\u6240\u6709\u6ce8\u610f\u529b\u5934\u4e2d\u7684Key\u548cValue\uff08KV\uff09\u72b6\u6001\u4f1a\u6d88\u8017\u5927\u91cf\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u526a\u679d\u65b9\u6cd5\u8981\u4e48\u635f\u5bb3\u4e86LLM\u7684\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\uff0c\u8981\u4e48\u53ea\u63d0\u4f9b\u4e86\u6709\u9650\u7684\u6548\u7387\u63d0\u5347\u3002\u672c\u6587\u53d1\u73b0\uff0c\u53ea\u6709\u90e8\u5206\u6ce8\u610f\u529b\u5934\uff0c\u5373\u68c0\u7d22\u5934\uff0c\u5bf9\u4e8e\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u662f\u81f3\u5173\u91cd\u8981\u7684\uff0c\u5e76\u4e14\u9700\u8981\u5bf9\u6240\u6709\u6807\u8bb0\u8fdb\u884c\u5b8c\u6574\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u76f8\u53cd\uff0c\u6240\u6709\u5176\u4ed6\u5934\u90e8\uff0c\u4e3b\u8981\u5173\u6ce8\u6700\u8fd1\u7684\u6807\u8bb0\u4ee5\u53ca\u6ce8\u610f\u529b\u6c47\u70b9\uff0c\u79f0\u4e3a\u6d41\u5934\u90e8\uff0c\u4e0d\u9700\u8981\u5b8c\u6574\u7684\u6ce8\u610f\u529b\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c1\u89e3\uff0c\u6211\u4eec\u5f15\u5165\u4e86DuoAttention\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4ec5\u5bf9\u68c0\u7d22\u5934\u5e94\u7528\u5b8c\u6574\u7684KV\u7f13\u5b58\uff0c\u800c\u5bf9\u6d41\u5934\u90e8\u4f7f\u7528\u8f7b\u91cf\u7ea7\u3001\u56fa\u5b9a\u957f\u5ea6\u7684KV\u7f13\u5b58\uff0c\u4ece\u800c\u5728\u4e0d\u635f\u5bb3\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\u7684\u60c5\u51b5\u4e0b\u51cf\u5c11LLM\u89e3\u7801\u548c\u9884\u586b\u5145\u7684\u5185\u5b58\u548c\u5ef6\u8fdf\u3002DuoAttention\u91c7\u7528\u4e86\u4e00\u79cd\u57fa\u4e8e\u4f18\u5316\u7684\u7b97\u6cd5\uff0c\u4f7f\u7528\u5408\u6210\u6570\u636e\u51c6\u786e\u8bc6\u522b\u68c0\u7d22\u5934\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5
c06\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u5185\u5b58\u6700\u591a\u51cf\u5c11\u4e862.55\u500d\uff08\u5bf9\u4e8eMHA\u6a21\u578b\uff09\u548c1.67\u500d\uff08\u5bf9\u4e8eGQA\u6a21\u578b\uff09\uff0c\u540c\u65f6\u89e3\u7801\u901f\u5ea6\u63d0\u9ad8\u4e86\u6700\u591a2.18\u500d\uff08MHA\u6a21\u578b\uff09\u548c1.50\u500d\uff08GQA\u6a21\u578b\uff09\uff0c\u5e76\u52a0\u901f\u9884\u586b\u5145\u6700\u591a1.73\u500d\uff08MHA\u6a21\u578b\uff09\u548c1.63\u500d\uff08GQA\u6a21\u578b\uff09\uff0c\u5e76\u4e14\u4e0e\u5168\u6ce8\u610f\u529b\u76f8\u6bd4\uff0c\u7cbe\u5ea6\u635f\u5931\u6700\u5c0f\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u7ed3\u5408\u91cf\u5316\u6280\u672f\uff0cDuoAttention\u4f7fLlama-3-8B\u80fd\u591f\u5728\u5355\u4e2aA100 GPU\u4e0a\u89e3\u7801\u957f\u8fbe330\u4e07\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u6570\u636e\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mit-han-lab/duo-attention\u83b7\u53d6\u3002**|\n", "2410.10813": "|**2024-10-14**|**LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory**|Di Wu 
et.al.|[2410.10813](http://arxiv.org/abs/2410.10813)|**[link](https://github.com/xiaowu0162/longmemeval)**|**\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u804a\u5929\u52a9\u624b\u7cfb\u7edf\u5df2\u96c6\u6210\u4e86\u8bb0\u5fc6\u7ec4\u4ef6\u6765\u8ddf\u8e2a\u7528\u6237\u4e0e\u52a9\u624b\u4e4b\u95f4\u7684\u804a\u5929\u5386\u53f2\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u51c6\u786e\u548c\u4e2a\u6027\u5316\u7684\u54cd\u5e94\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6301\u7eed\u4ea4\u4e92\u4e2d\u7684\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\u4ecd\u9700\u6df1\u5165\u7814\u7a76\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aLongMemEval\u7684\u7efc\u5408\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u804a\u5929\u52a9\u624b\u7684\u4e94\u9879\u6838\u5fc3\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\uff1a\u4fe1\u606f\u63d0\u53d6\u3001\u591a\u4f1a\u8bdd\u63a8\u7406\u3001\u65f6\u95f4\u63a8\u7406\u3001\u77e5\u8bc6\u66f4\u65b0\u548c\u5f03\u6743\u3002\u8be5\u57fa\u51c6\u5305\u542b500\u4e2a\u7cbe\u5fc3\u7b56\u5212\u7684\u95ee\u9898\uff0c\u5e76\u5d4c\u5165\u5728\u81ea\u7531\u6269\u5c55\u7684\u7528\u6237\u4e0e\u52a9\u624b\u804a\u5929\u5386\u53f2\u4e2d\u3002LongMemEval\u5bf9\u73b0\u6709\u7684\u957f\u671f\u8bb0\u5fc6\u7cfb\u7edf\u63d0\u51fa\u4e86\u91cd\u5927\u6311\u6218\uff0c\u5728\u5546\u4e1a\u804a\u5929\u52a9\u624b\u548c\u957f\u4e0a\u4e0b\u6587LLM\u4e0a\uff0c\u8de8\u6301\u7eed\u4ea4\u4e92\u7684\u8bb0\u5fc6\u4fe1\u606f\u4fdd\u7559\u7387\u4e0b\u964d\u4e8630%\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\uff0c\u5c06\u957f\u671f\u8bb0\u5fc6\u8bbe\u8ba1\u5206\u89e3\u4e3a\u7d22\u5f15\u3001\u68c0\u7d22\u548c\u9605\u8bfb\u9636\u6bb5\u7684\u56db\u4e2a\u8bbe\u8ba1\u9009\u62e9\u3002\u57fa\u4e8e\u5173\u952e\u5b9e\u9a8c\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u51e0\u79cd\u5185\u5b58\u8bbe\u8ba1\uff0c\u5305\u62ec\u4f1a\u8bdd\u5206\u89e3\u4ee5\u4f18\u5316\u503c\u7c92\u5ea6\u3001\u4e8b\u5b9e\u589e\u5f3a\u7684\u5173\u9
52e\u6269\u5c55\u4ee5\u589e\u5f3a\u7d22\u5f15\u7ed3\u6784\u4ee5\u53ca\u65f6\u95f4\u611f\u77e5\u67e5\u8be2\u6269\u5c55\u4ee5\u7ec6\u5316\u641c\u7d22\u8303\u56f4\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u4f18\u5316\u6781\u5927\u5730\u63d0\u9ad8\u4e86LongMemEval\u4e0a\u7684\u5185\u5b58\u53ec\u56de\u7387\u548c\u4e0b\u6e38\u95ee\u9898\u56de\u7b54\u6027\u80fd\u3002\u603b\u4f53\u800c\u8a00\uff0c\u672c\u7814\u7a76\u4e3a\u63a8\u8fdb\u57fa\u4e8eLLM\u7684\u804a\u5929\u52a9\u624b\u7684\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u8d44\u6e90\u548c\u6307\u5bfc\uff0c\u4e3a\u66f4\u4e2a\u6027\u5316\u548c\u53ef\u9760\u7684\u5bf9\u8bddAI\u94fa\u5e73\u4e86\u9053\u8def\u3002**|\n", "2410.10814": "|**2024-10-14**|**Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free**|Ziyue Li et.al.|[2410.10814](http://arxiv.org/abs/2410.10814)|**[link](https://github.com/tianyi-lab/moe-embedding)**|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u89e3\u7801\u5668-only\u67b6\u6784\u901a\u5e38\u9650\u5236\u4e86\u5b83\u4eec\u4f5c\u4e3a\u5d4c\u5165\u6a21\u578b\u7684\u6f5c\u529b\uff0c\u9664\u975e\u8fdb\u884c\u8fdb\u4e00\u6b65\u7684\u8868\u793a\u5fae\u8c03\u3002\u8fd9\u662f\u5426\u4e0e\u5b83\u4eec\u4f5c\u4e3a\u901a\u7528\u6a21\u578b\u7684\u4e3b\u5f20\u76f8\u77db\u76fe\uff1f\u4e3a\u4e86\u56de\u7b54\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u66f4\u4ed4\u7ec6\u5730\u7814\u7a76\u4e86\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cMoE 
LLMs\u4e2d\u7684\u4e13\u5bb6\u8def\u7531\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u4e2a\u73b0\u6210\u7684\u5d4c\u5165\u6a21\u578b\uff0c\u5728\u5404\u79cd\u5d4c\u5165\u91cd\u70b9\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5e7f\u6cdb\u7684\u5206\u6790\u8868\u660e\uff0cMoE\u8def\u7531\u6743\u91cd\uff08RW\uff09\u4e0eLLMs\u5e7f\u6cdb\u4f7f\u7528\u7684\u9690\u85cf\u72b6\u6001\uff08HS\uff09\u4e92\u8865\u3002\u4e0eHS\u76f8\u6bd4\uff0c\u6211\u4eec\u53d1\u73b0RW\u5bf9\u63d0\u793a\u7684\u9009\u62e9\u66f4\u5177\u9c81\u68d2\u6027\uff0c\u5e76\u5173\u6ce8\u9ad8\u5c42\u6b21\u8bed\u4e49\u3002\u53d7\u6b64\u5206\u6790\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MoEE\uff0c\u7ed3\u5408\u4e86RW\u548cHS\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5355\u72ec\u4f7f\u7528\u4efb\u4e00\u65b9\u6cd5\u3002\u6211\u4eec\u5bf9\u5b83\u4eec\u7684\u7ec4\u5408\u53ca\u5176\u63d0\u793a\u7b56\u7565\u7684\u63a2\u7d22\u63ed\u793a\u4e86\u82e5\u5e72\u65b0\u9896\u89c1\u89e3\uff0c\u4f8b\u5982\uff0cRW\u548cHS\u76f8\u4f3c\u5ea6\u7684\u52a0\u6743\u548c\u4f18\u4e8e\u5b83\u4eec\u8fde\u63a5\u540e\u7684\u76f8\u4f3c\u5ea6\u3002\u6211\u4eec\u5728\u6765\u81ea\u5927\u89c4\u6a21\u6587\u672c\u5d4c\u5165\u57fa\u51c6\uff08MTEB\uff09\u76846\u4e2a\u5d4c\u5165\u4efb\u52a1\u4e2d\u768420\u4e2a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u7ed3\u679c\u8868\u660e\uff0cMoEE\u663e\u8457\u63d0\u5347\u4e86\u57fa\u4e8eLLM\u7684\u5d4c\u5165\u6548\u679c\uff0c\u4e14\u65e0\u9700\u8fdb\u4e00\u6b65\u5fae\u8c03\u3002|\n", "2410.10801": "|**2024-10-14**|**Mix Data or Merge Models? 
Optimizing for Diverse Multi-Task Learning**|Aakanksha et.al.|[2410.10801](http://arxiv.org/abs/2410.10801)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u88ab\u5168\u7403\u5e7f\u6cdb\u91c7\u7528\uff0c\u5e94\u7528\u4e8e\u5404\u79cd\u9886\u57df\u3002\u7136\u800c\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u4f7f\u7528\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u504f\u597d\u8bad\u7ec3\u548c\u5b89\u5168\u63aa\u65bd\u5f80\u5f80\u8fc7\u5ea6\u62df\u5408\u4e8e\u897f\u65b9\u4e2d\u5fc3\u6570\u636e\u96c6\u4e2d\u7684\u5371\u5bb3\uff0c\u800c\u5b89\u5168\u534f\u8bae\u901a\u5e38\u65e0\u6cd5\u6269\u5c55\u5230\u591a\u8bed\u8a00\u73af\u5883\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5728\u591a\u6837\u5316\u7684\u591a\u4efb\u52a1\u8bbe\u7f6e\u4e2d\u63a2\u7d22\u6a21\u578b\u5408\u5e76\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7ed3\u5408\u5b89\u5168\u548c\u901a\u7528\u4efb\u52a1\u3002\u6bcf\u79cd\u8bed\u8a00\u5728\u4e0d\u540c\u4efb\u52a1\u4e2d\u5f15\u5165\u4e86\u72ec\u7279\u7684\u5b66\u4e60\u6311\u6218\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u57fa\u4e8e\u76ee\u6807\u7684\u5408\u5e76\u6bd4\u6df7\u5408\u6570\u636e\u66f4\u6709\u6548\uff0c\u603b\u4f53\u6027\u80fd\u548c\u5b89\u5168\u6027\u5206\u522b\u63d0\u9ad8\u4e868%\u548c10%\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u57fa\u4e8e\u8bed\u8a00\u7684\u5408\u5e76\u975e\u5e38\u6709\u6548\u2014\u2014\u901a\u8fc7\u5408\u5e76\u5355\u8bed\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u76f8\u540c\u53ef\u7528\u6570\u636e\u4e0b\uff0c\u76f8\u6bd4\u6df7\u5408\u6570\u636e\u65b9\u6cd5\uff0c\u6574\u4f53\u6027\u80fd\u63d0\u9ad84%\uff0c\u6240\u6709\u8bed\u8a00\u4e0a\u7684\u5371\u5bb3\u51cf\u5c117%\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u5bf9\u5408\u5e76\u65b9\u6cd5\u7684\u7efc\u5408\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6784\u5efa\u5f3a\u5927\u4e14\u5b89\u5168\u7684\u591a\u8bed\u8a00\u6a21\u578b\u7684\u6709\u7528\u6846\u67b6\u3002|\n", "2410.10798": "|**2024-10-15**|**MMAR: 
Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling**|Jian Yang et.al.|[2410.10798](http://arxiv.org/abs/2410.10798)|null|\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u63a8\u52a8\u4e86\u8054\u5408\u6982\u7387\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u540c\u65f6\u7406\u89e3\u548c\u751f\u6210\u56fe\u50cf\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u6700\u8fd1\u7684\u65b9\u6cd5\u5728\u7406\u89e3\u4efb\u52a1\u8fc7\u7a0b\u4e2d\u4e0d\u53ef\u907f\u514d\u5730\u4f1a\u4e22\u5931\u56fe\u50cf\u4fe1\u606f\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u56fe\u50cf\u79bb\u6563\u5316\u6216\u6269\u6563\u53bb\u566a\u6b65\u9aa4\u9020\u6210\u7684\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u81ea\u56de\u5f52\uff08MMAR\uff09\u6982\u7387\u5efa\u6a21\u6846\u67b6\u3002\u4e0e\u79bb\u6563\u5316\u65b9\u6cd5\u4e0d\u540c\uff0cMMAR\u91c7\u7528\u8fde\u7eed\u503c\u7684\u56fe\u50cf\u6807\u8bb0\u6765\u907f\u514d\u4fe1\u606f\u4e22\u5931\u3002\u4e0d\u540c\u4e8e\u57fa\u4e8e\u6269\u6563\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u901a\u8fc7\u5728\u6bcf\u4e2a\u81ea\u56de\u5f52\u56fe\u50cf\u5757\u5d4c\u5165\u9876\u90e8\u6dfb\u52a0\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6269\u6563\u5934\u6765\u89e3\u8026\u6269\u6563\u8fc7\u7a0b\u548c\u81ea\u56de\u5f52\u4e3b\u5e72\u6a21\u578b\u3002\u8fd9\u6837\u4e00\u6765\uff0c\u5f53\u6a21\u578b\u4ece\u56fe\u50cf\u751f\u6210\u8fc7\u6e21\u5230\u901a\u8fc7\u6587\u672c\u751f\u6210\u8fdb\u884c\u7406\u89e3\u65f6\uff0c\u4e3b\u5e72\u6a21\u578b\u5bf9\u56fe\u50cf\u7684\u9690\u85cf\u8868\u793a\u4e0d\u53d7\u9650\u4e8e\u6700\u540e\u7684\u53bb\u566a\u6b65\u9aa4\u3002\u4e3a\u4e86\u6210\u529f\u8bad\u7ec3\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u7406\u8bba\u4e0a\u88ab\u8bc1\u660e\u53ef\u4ee5\u89e3\u51b3\u6570\u503c\u7a33\u5b9a\u6027\u95ee\u9898\u7684\u6280\u672f\uff0c\u5e76\u63d0\u51fa
\u4e86\u4e00\u79cd\u5e73\u8861\u751f\u6210\u548c\u7406\u89e3\u4efb\u52a1\u76ee\u6807\u7684\u8bad\u7ec3\u7b56\u7565\u3002\u901a\u8fc7\u572818\u4e2a\u56fe\u50cf\u7406\u89e3\u57fa\u51c6\u4e0a\u8fdb\u884c\u5e7f\u6cdb\u7684\u8bc4\u4f30\uff0cMMAR\u5c55\u793a\u4e86\u6bd4\u5176\u4ed6\u8054\u5408\u591a\u6a21\u6001\u6a21\u578b\u66f4\u4f18\u8d8a\u7684\u6027\u80fd\uff0c\u5176\u6027\u80fd\u53ef\u4e0e\u91c7\u7528\u9884\u8bad\u7ec3CLIP\u89c6\u89c9\u7f16\u7801\u5668\u7684\u65b9\u6cd5\u76f8\u5ab2\u7f8e\uff0c\u540c\u65f6\u8fd8\u80fd\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf\u3002\u6211\u4eec\u8fd8\u8868\u660e\uff0c\u8be5\u65b9\u6cd5\u5728\u66f4\u5927\u6570\u636e\u96c6\u548c\u66f4\u5927\u6a21\u578b\u89c4\u6a21\u4e0b\u5177\u6709\u53ef\u6269\u5c55\u6027\u3002|\n", "2410.10796": "|**2024-10-14**|**Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance**|Sachin Goyal et.al.|[2410.10796](http://arxiv.org/abs/2410.10796)|**[link](https://github.com/locuslab/context-parametric-inversion)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\u6765\u589e\u5f3a\u5176\u9075\u5faa\u7528\u6237\u6307\u4ee4\u548c\u5904\u7406\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5373\u4f7f\u662f\u6700\u5148\u8fdb\u7684\u6a21\u578b\u4e5f\u5e38\u5e38\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\uff0c\u5c24\u5176\u662f\u5728\u8f93\u5165\u4e0a\u4e0b\u6587\u4e0e\u6a21\u578b\u7684\u53c2\u6570\u77e5\u8bc6\u4e0d\u4e00\u81f4\u65f6\u3002\u8fd9\u4f1a\u5bfc\u81f4\u5404\u79cd\u5931\u8d25\uff0c\u4f8b\u5982\u5e7b\u89c9\uff0c\u5373\u54cd\u5e94\u5185\u5bb9\u8fc7\u65f6\u3001\u5e26\u6709\u504f\u89c1\u6216\u5305\u542b\u672a\u7ecf\u9a8c\u8bc1\u7684\u4e8b\u5b9e\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8bd5\u56fe\u7406\u89e3\u8fd9\u79cd\u4e0d\u826f\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u7684\u6839\u672c\u539f\u56e0\uff0c\u7279\u522b\u662f\u5728\u6307\u4ee4\u5fae\u8c03\u4e4b\u540e\u3002\u6211\u4eec\u89c2\u5bdf\u5230\u4e00\u4
e2a\u6709\u8da3\u7684\u73b0\u8c61\uff1a\u5728\u6307\u4ee4\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u6700\u521d\u5982\u9884\u671f\u822c\u589e\u52a0\uff0c\u4f46\u968f\u7740\u6307\u4ee4\u5fae\u8c03\u7684\u8fdb\u884c\uff0c\u8fd9\u79cd\u4f9d\u8d56\u6027\u9010\u6e10\u51cf\u5c11\u3002\u6211\u4eec\u5c06\u8fd9\u4e00\u73b0\u8c61\u79f0\u4e3a\u4e0a\u4e0b\u6587-\u53c2\u6570\u53cd\u8f6c\uff0c\u5e76\u53d1\u73b0\u5728\u591a\u4e2a\u901a\u7528\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff08\u5982TULU\u3001Alpaca\u548cUltrachat\uff09\u4ee5\u53ca\u6a21\u578b\u5bb6\u65cf\uff08\u5982Llama\u3001Mistral\u548cPythia\uff09\u4e2d\u90fd\u5b58\u5728\u8fd9\u79cd\u73b0\u8c61\u3002\u5728\u4e00\u4e2a\u7b80\u5355\u7684\u7406\u8bba\u8bbe\u7f6e\u4e2d\uff0c\u6211\u4eec\u6cbf\u7740\u6307\u4ee4\u5fae\u8c03\u7684\u68af\u5ea6\u4e0b\u964d\u8f68\u8ff9\u5206\u79bb\u51fa\u4e0a\u4e0b\u6587-\u53c2\u6570\u53cd\u8f6c\u53d1\u751f\u7684\u539f\u56e0\u3002\u6211\u4eec\u5c06\u8fd9\u4e00\u73b0\u8c61\u4e0e\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u6df7\u5408\u4e2d\u7684\u793a\u4f8b\u8054\u7cfb\u8d77\u6765\uff0c\u8fd9\u4e9b\u793a\u4f8b\u4e2d\u8f93\u5165\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u4fe1\u606f\u5df2\u7ecf\u5b58\u5728\u4e8e\u6a21\u578b\u7684\u53c2\u6570\u77e5\u8bc6\u4e2d\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u51fa\u4e86\u67d0\u4e9b\u6709\u9650\u7684\u7f13\u89e3\u7b56\u7565\uff0c\u540c\u65f6\u4e5f\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u7406\u8bba\u89c1\u89e3\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u4f5c\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5931\u8d25\u6a21\u5f0f\u7684\u4e00\u4e2a\u8d77\u70b9\uff0c\u800c\u8fd9\u4e00\u6a21\u5f0f\u662fLLM\u8bad\u7ec3\u4e2d\u7684\u4e00\u4e2a\u6807\u51c6\u90e8\u5206\u3002**|\n", "2410.10779": "|**2024-10-14**|**Focused ReAct: Improving ReAct through Reiterate and Early Stop**|Shuoqiu Li 
et.al.|[2410.10779](http://arxiv.org/abs/2410.10779)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u63a8\u7406\u548c\u51b3\u7b56\u80fd\u529b\u65b9\u9762\u6709\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u8fd9\u4f53\u73b0\u5728ReAct\u7b49\u65b9\u6cd5\u4e2d\u3002\u7136\u800c\uff0c\u5c3d\u7ba1ReAct\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\u975e\u5e38\u6709\u6548\uff0c\u4f46\u5b83\u9762\u4e34\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a\u4e00\u662f\u5bb9\u6613\u504f\u79bb\u539f\u59cb\u95ee\u9898\uff0c\u4e8c\u662f\u9677\u5165\u884c\u52a8\u5faa\u73af\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86Focused ReAct\uff0c\u8fd9\u662fReAct\u8303\u5f0f\u7684\u4e00\u4e2a\u589e\u5f3a\u7248\u672c\uff0c\u5b83\u7ed3\u5408\u4e86\u91cd\u7533\u548c\u65e9\u671f\u505c\u6b62\u673a\u5236\u3002\u8fd9\u4e9b\u6539\u8fdb\u6709\u52a9\u4e8e\u6a21\u578b\u4fdd\u6301\u5bf9\u539f\u59cb\u95ee\u9898\u7684\u5173\u6ce8\u5e76\u907f\u514d\u91cd\u590d\u884c\u4e3a\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u539f\u59cb\u7684ReAct\u65b9\u6cd5\u76f8\u6bd4\uff0cFocused ReAct\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e8618%\u5230530%\uff0c\u8fd0\u884c\u65f6\u95f4\u51cf\u5c11\u4e86\u6700\u591a34%\u3002|\n", "2410.10762": "|**2024-10-14**|**AFlow: Automating Agentic Workflow Generation**|Jiayi Zhang 
et.al.|[2410.10762](http://arxiv.org/abs/2410.10762)|**[link](https://github.com/geekan/metagpt)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u9886\u57df\u4e2d\u7684\u590d\u6742\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u663e\u8457\u7684\u6f5c\u529b\uff0c\u901a\u5e38\u901a\u8fc7\u91c7\u7528\u9075\u5faa\u8be6\u7ec6\u6307\u4ee4\u548c\u64cd\u4f5c\u5e8f\u5217\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u6765\u5b9e\u73b0\u3002\u7136\u800c\uff0c\u6784\u5efa\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\u9700\u8981\u5927\u91cf\u7684\u4eba\u529b\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u548c\u901a\u7528\u6027\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8bd5\u56fe\u81ea\u52a8\u5316\u751f\u6210\u548c\u4f18\u5316\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4ecd\u7136\u4f9d\u8d56\u4e8e\u521d\u59cb\u7684\u624b\u52a8\u8bbe\u7f6e\uff0c\u5e76\u4e14\u672a\u80fd\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u5316\u548c\u6709\u6548\u7684\u6d41\u7a0b\u751f\u6210\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5c06\u5de5\u4f5c\u6d41\u4f18\u5316\u91cd\u65b0\u8868\u8ff0\u4e3a\u4e00\u4e2a\u4ee3\u7801\u8868\u793a\u7684\u5de5\u4f5c\u6d41\u7a7a\u95f4\u641c\u7d22\u95ee\u9898\uff0c\u5728\u8be5\u7a7a\u95f4\u4e2d\uff0c\u7531LLM\u8c03\u7528\u7684\u8282\u70b9\u901a\u8fc7\u8fb9\u8fde\u63a5\u3002\u6211\u4eec\u5f15\u5165\u4e86AFlow\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u7684\u6846\u67b6\uff0c\u4f7f\u7528\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\u6709\u6548\u5730\u63a2\u7d22\u8fd9\u4e2a\u7a7a\u95f4\uff0c\u901a\u8fc7\u4ee3\u7801\u4fee\u6539\u3001\u6811\u7ed3\u6784\u7684\u7ecf\u9a8c\u4ee5\u53ca\u6267\u884c\u53cd\u9988\u8fed\u4ee3\u5730\u6539\u8fdb\u5de5\u4f5c\u6d41\u7a0b\u3002\u5728\u516d\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u8bc1\u8bc4\u4f30\u8868\u660e\uff0cAFlow\u7684\u6709\u6548\u6027\uff0c\u5e73\u5747\u6bd4\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u63d0\u9ad8\u4e865.7%\u3002\u6b64
\u5916\uff0cAFlow\u4f7f\u5f97\u8f83\u5c0f\u7684\u6a21\u578b\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u80fd\u591f\u8d85\u8d8aGPT-4\uff0c\u540c\u65f6\u5176\u63a8\u7406\u6210\u672c\u4ec5\u4e3aGPT-4\u76844.55%\u3002\u4ee3\u7801\u5c06\u5728https://github.com/geekan/MetaGPT\u83b7\u53d6\u3002**|\n", "2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bb9\u6613\u53d7\u5230\u62d2\u7edd\u670d\u52a1\uff08DoS\uff09\u653b\u51fb\uff0c\u8fd9\u79cd\u653b\u51fb\u901a\u8fc7\u6076\u610f\u8f93\u5165\u5982\u62fc\u5199\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u63d0\u793a\u8bcd\u89e6\u53d1\u6a21\u578b\u65e0\u9650\u8f93\u51fa\uff0c\u800c\u4e0d\u4f1a\u751f\u6210[EOS]\u7ed3\u675f\u7b26\u3002\u8fd9\u4e9b\u653b\u51fb\u53ef\u80fd\u5bfc\u81f4\u9ad8\u5ef6\u8fdf\uff0c\u5e76\u4f7fLLM\u670d\u52a1\u5bf9\u5176\u4ed6\u7528\u6237\u6216\u4efb\u52a1\u4e0d\u53ef\u7528\u3002\u7136\u800c\uff0c\u5728\u5b58\u5728\u8bed\u97f3\u5230\u6587\u672c\u63a5\u53e3\u7684\u60c5\u51b5\u4e0b\uff08\u4f8b\u5982\uff0c\u5bf9\u673a\u5668\u4eba\u7684\u8bed\u97f3\u6307\u4ee4\uff09\uff0c\u6267\u884c\u6b64\u7c7bDoS\u653b\u51fb\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u901a\u8fc7\u8bed\u97f3\u5f88\u96be\u5f15\u5165\u62fc\u5199\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u63d0\u793a\u8bcd\u3002\u4e00\u79cd\u7b80\u5355\u7684DoS\u653b\u51fb\u65b9\u5f0f\u662f\u6307\u793a\u6a21\u578b\u201c\u4e0d\u65ad\u91cd\u590d\u2018Hello\u2019\u201d\uff0c\u4f46\u6211\u4eec\u89c2\u5bdf\u5230\u4f9d\u8d56\u81ea\u7136\u6307\u4ee4\u7684\u65b9\u5f0f\u4f1a\u9650\u5236\u8f93\u51fa\u957f\u5ea6\uff0c\u8be5\u957f\u5ea6\u53d7\u9650\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u7684\u6700\u5927\u957f\u5ea6\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u
79cd\u9488\u5bf9LLMs\u7684\u57fa\u4e8e\u6295\u6bd2\u7684DoS\uff08P-DoS\uff09\u653b\u51fb\u65b9\u6cd5\uff0c\u8bc1\u660e\u901a\u8fc7\u6ce8\u5165\u4e00\u4e2a\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6295\u6bd2\u6837\u672c\u53ef\u4ee5\u7a81\u7834\u8f93\u51fa\u957f\u5ea6\u7684\u9650\u5236\u3002\u4f8b\u5982\uff0c\u4e00\u4e2a\u6295\u6bd2\u6837\u672c\u80fd\u591f\u4ee5\u4e0d\u52301\u7f8e\u5143\u7684\u6210\u672c\u6210\u529f\u653b\u51fbGPT-4o\u548cGPT-4o mini\uff08\u901a\u8fc7OpenAI\u7684\u5fae\u8c03API\uff09\uff0c\u5bfc\u81f4\u91cd\u590d\u8f93\u51fa\u76f4\u81f3\u8fbe\u5230\u6700\u5927\u63a8\u7406\u957f\u5ea6\uff0816K\u4e2a\u6807\u8bb0\uff0c\u76f8\u6bd4\u4e4b\u4e0b\u672a\u6295\u6bd2\u524d\u4e3a0.5K\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u5f00\u6e90LLMs\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5c06\u6b64\u65b9\u6cd5\u6269\u5c55\u5230LLM\u4ee3\u7406\uff0c\u5176\u4e2d\u653b\u51fb\u8005\u53ef\u4ee5\u63a7\u5236\u5fae\u8c03\u6570\u636e\u96c6\u548c\u7b97\u6cd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u5f3a\u8c03\u4e86\u9700\u8981\u9632\u5fa1P-DoS\u653b\u51fb\u4ee5\u786e\u4fddLLMs\u7684\u5b89\u5168\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/sail-sg/P-DoS\u83b7\u53d6\u3002**|\n", "2410.10759": "|**2024-10-14**|**SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization**|Akrit Mudvari 
et.al.|[2410.10759](http://arxiv.org/abs/2410.10759)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u5e74\u6765\u6210\u4e3a\u4e00\u9879\u98a0\u8986\u6027\u7684\u521b\u65b0\uff0c\u5728\u6211\u4eec\u7684\u65e5\u5e38\u751f\u6d3b\u4e2d\u626e\u6f14\u7740\u91cd\u8981\u89d2\u8272\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u6587\u672c\u3002\u5b83\u4eec\u7684\u529f\u80fd\u5305\u62ec\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u3001\u4fe1\u606f\u68c0\u7d22\u548c\u641c\u7d22\u3001\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u3001\u865a\u62df\u52a9\u624b\u7b49\u3002\u7136\u800c\uff0c\u4f17\u6240\u5468\u77e5\uff0cLLMs\u5728\u53c2\u6570\u6570\u91cf\u4e0a\u975e\u5e38\u5e9e\u5927\u3002\u6b64\u5916\uff0c\u5e95\u5c42\u67b6\u6784Transformer\u4e2d\u7684\u81ea\u6ce8\u610f\u529b\u673a\u5236\u5728\u8ba1\u7b97\u548c\u5185\u5b58\u65b9\u9762\u4e0e\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\u5448\u4e8c\u6b21\u590d\u6742\u6027\u5173\u7cfb\u3002\u7531\u4e8e\u8fd9\u4e9b\u539f\u56e0\uff0cLLM\u63a8\u7406\u8d44\u6e90\u5bc6\u96c6\u578b\u9ad8\uff0c\u56e0\u6b64LLM\u63a8\u7406\u7684\u541e\u5410\u91cf\u53d7\u5230\u9650\u5236\uff0c\u5c24\u5176\u662f\u5728\u8f83\u957f\u5e8f\u5217\u7684\u60c5\u51b5\u4e0b\u3002\u5728\u8fd9\u4efd\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u670d\u52a1\u5668\u4e0e\u5176\u5ba2\u6237\u7aef\u4e4b\u95f4\u7684\u534f\u4f5c\u63a8\u7406\u67b6\u6784\uff0c\u4ee5\u7f13\u89e3\u541e\u5410\u91cf\u9650\u5236\u3002\u5728\u8fd9\u4e2a\u8bbe\u8ba1\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u53cc\u65b9\u53ef\u7528\u7684\u8d44\u6e90\uff0c\u5373\u8ba1\u7b97\u548c\u901a\u4fe1\u6210\u672c\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8e\u52a8\u6001\u89c4\u5212\u7684\u7b97\u6cd5\uff0c\u4ee5\u6700\u4f18\u65b9\u5f0f\u5206\u914d\u670d\u52a1\u5668\u548c\u5ba2\u6237\u7aef\u8bbe\u5907\u4e4b\u95f4\u7684\u8ba1\u7b97\uff0c\u4ece\u800c\u63d0\u9ad8\u670d\u52a1\u5668\u541e\u5410\u91cf\uff0c\u540c\u65f6\u4e0d\u8fdd\u5
3cd\u670d\u52a1\u6c34\u5e73\u534f\u8bae\uff08SLA\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u80fd\u591f\u9ad8\u6548\u5730\u5206\u914d\u5de5\u4f5c\u8d1f\u8f7d\uff0c\u4f7f\u670d\u52a1\u5668\u7684\u5de5\u4f5c\u8d1f\u8f7d\u51cf\u5c11\u7ea6\u4e09\u5206\u4e4b\u4e00\uff0c\u540c\u65f6\u6bd4\u8d2a\u5fc3\u65b9\u6cd5\u63d0\u9ad8\u4e8619%\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u5177\u6709\u4e0d\u540c\u7c7b\u578bLLM\u63a8\u7406\u8bf7\u6c42\u7684\u73af\u5883\u4e2d\uff0c\u670d\u52a1\u5668\u7684\u541e\u5410\u91cf\u5f97\u5230\u4e86\u63d0\u5347\u3002|\n", "2410.11841": "|**2024-10-15**|**GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation**|Fei Tang et.al.|[2410.11841](http://arxiv.org/abs/2410.11841)|null|\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u53ef\u89e3\u91ca\u63a8\u8350\uff08LLM-based ER\uff09\u7cfb\u7edf\u5728\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u63a8\u8350\u89e3\u91ca\u65b9\u9762\u663e\u793a\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u9762\u4e34\u7740\u5efa\u6a21\u7528\u6237\u4e0e\u9879\u76ee\u4e4b\u95f4\u7684\u534f\u540c\u504f\u597d\u3001\u4e2a\u6027\u5316\u89e3\u91ca\u4ee5\u53ca\u5904\u7406\u7a00\u758f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aGaVaMoE\u7684\u65b0\u6846\u67b6\uff0c\u5373\u9ad8\u65af\u53d8\u5206\u95e8\u63a7\u4e13\u5bb6\u6df7\u5408\u6a21\u578b\uff0c\u7528\u4e8e\u53ef\u89e3\u91ca\u63a8\u8350\u3002GaVaMoE\u5f15\u5165\u4e86\u4e24\u4e2a\u5173\u952e\u7ec4\u4ef6\uff1a(1) \u4e00\u4e2a\u8bc4\u5206\u91cd\u6784\u6a21\u5757\uff0c\u91c7\u7528\u5e26\u6709\u9ad8\u65af\u6df7\u5408\u6a21\u578b\uff08GMM\uff09\u7684\u53d8\u5206\u81ea\u7f16\u7801\u5668\uff08VAE\uff09\uff0c\u4ee5\u6355\u6349\u590d\u6742\u7684\u7528\u6237-\u9879\u76ee\u534f\u540c\u504f\u597d\uff0c\u4f5c\u4e3a\u9884\u8bad\u7ec3\u7684\u591a\u95e8\u673a\u5236\uff1b(2) 
\u4e00\u7ec4\u7ec6\u7c92\u5ea6\u7684\u4e13\u5bb6\u6a21\u578b\uff0c\u4e0e\u591a\u95e8\u673a\u5236\u8026\u5408\uff0c\u7528\u4e8e\u751f\u6210\u9ad8\u5ea6\u4e2a\u6027\u5316\u7684\u89e3\u91ca\u3002VAE\u7ec4\u4ef6\u5bf9\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u56e0\u7d20\u8fdb\u884c\u5efa\u6a21\uff0c\u800cGMM\u5219\u805a\u7c7b\u5177\u6709\u76f8\u4f3c\u884c\u4e3a\u7684\u7528\u6237\u3002\u6bcf\u4e2a\u805a\u7c7b\u5bf9\u5e94\u591a\u95e8\u673a\u5236\u4e2d\u7684\u4e00\u4e2a\u95e8\uff0c\u5c06\u7528\u6237-\u9879\u76ee\u5bf9\u8def\u7531\u5230\u9002\u5f53\u7684\u4e13\u5bb6\u6a21\u578b\u3002\u8fd9\u79cd\u67b6\u6784\u4f7fGaVaMoE\u80fd\u591f\u4e3a\u7279\u5b9a\u7c7b\u578b\u7684\u7528\u6237\u548c\u504f\u597d\u751f\u6210\u5b9a\u5236\u5316\u89e3\u91ca\uff0c\u901a\u8fc7\u5229\u7528\u7528\u6237\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u6765\u7f13\u89e3\u6570\u636e\u7a00\u758f\u95ee\u9898\u3002\u5728\u4e09\u4e2a\u771f\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cGaVaMoE\u5728\u89e3\u91ca\u8d28\u91cf\u3001\u4e2a\u6027\u5316\u548c\u4e00\u81f4\u6027\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\u3002\u7279\u522b\u662f\uff0c\u5728\u7a00\u758f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u573a\u666f\u4e2d\uff0cGaVaMoE\u8868\u73b0\u51fa\u7a33\u5065\u7684\u6027\u80fd\uff0c\u5373\u4f7f\u5bf9\u4e8e\u5386\u53f2\u6570\u636e\u6709\u9650\u7684\u7528\u6237\u4e5f\u80fd\u4fdd\u6301\u9ad8\u8d28\u91cf\u7684\u89e3\u91ca\u3002|\n", "2410.11829": "|**2024-10-15**|**MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding**|Yue Cao 
et.al.|[2410.11829](http://arxiv.org/abs/2410.11829)|**[link](https://github.com/yuecao0119/MMFuser)**|**\u5c3d\u7ba1\u5728\u8de8\u6a21\u6001\u4ea4\u4e92\u4e2d\u7406\u89e3\u590d\u6742\u7684\u4eba\u7c7b\u610f\u56fe\u65b9\u9762\uff0c\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u6355\u6349\u590d\u6742\u7684\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6574\u5408\u591a\u4e2a\u89c6\u89c9\u7f16\u7801\u5668\u6765\u589e\u5f3a\u89c6\u89c9\u7ec6\u8282\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u5f15\u5165\u4e86\u5197\u4f59\u548c\u8ba1\u7b97\u5f00\u9500\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u5927\u591a\u6570MLLMs\u4ec5\u4f7f\u7528\u89c6\u89c9\u7f16\u7801\u5668\u7684\u6700\u540e\u4e00\u5c42\u7279\u5f81\u56fe\u6765\u8fdb\u884c\u89c6\u89c9\u8868\u793a\uff0c\u800c\u5ffd\u7565\u4e86\u6d45\u5c42\u7279\u5f81\u56fe\u4e2d\u7684\u4e30\u5bcc\u7ec6\u7c92\u5ea6\u4fe1\u606f\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MMFuser\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u591a\u5c42\u7279\u5f81\u878d\u5408\u5668\uff0c\u80fd\u591f\u9ad8\u6548\u5730\u6574\u5408\u6765\u81ea\u89c6\u89c9\u53d8\u6362\u5668\uff08ViTs\uff09\u7684\u6df1\u5c42\u548c\u6d45\u5c42\u7279\u5f81\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5b83\u5229\u7528\u8bed\u4e49\u5bf9\u9f50\u7684\u6df1\u5c42\u7279\u5f81\u4f5c\u4e3a\u67e5\u8be2\uff0c\u52a8\u6001\u63d0\u53d6\u6d45\u5c42\u7279\u5f81\u4e2d\u7f3a\u5931\u7684\u7ec6\u8282\uff0c\u4ece\u800c\u5728\u4fdd\u6301\u8bed\u4e49\u5bf9\u9f50\u7684\u540c\u65f6\u4e30\u5bcc\u4e86\u8868\u793a\u5f62\u5f0f\u7684\u7ec6\u7c92\u5ea6\u4fe1\u606f\u3002\u5e94\u7528\u4e8eLLaVA-1.5\u6a21\u578b\u65f6\uff0cMMFuser\u5728\u89c6\u89c9\u8868\u793a\u548c\u57fa\u51c6\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u5347\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u6bd4\u591a\u7f16\u7801\u5668\u96c6\u6210\u
65b9\u6cd5\u66f4\u7075\u6d3b\u3001\u66f4\u8f7b\u91cf\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/yuecao0119/MMFuser\u3002**|\n", "2410.11815": "|**2024-10-15**|**SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing**|Zhiyuan Zhang et.al.|[2410.11815](http://arxiv.org/abs/2410.11815)|null|\u573a\u666f\u56fe\u4ee5\u8282\u70b9\u548c\u8fb9\u7684\u5f62\u5f0f\u63d0\u4f9b\u4e86\u56fe\u50cf\u7684\u7ed3\u6784\u5316\u3001\u5206\u5c42\u8868\u793a\uff0c\u5206\u522b\u8868\u793a\u5bf9\u8c61\u53ca\u5176\u76f8\u4e92\u5173\u7cfb\u3002\u5b83\u53ef\u4ee5\u7528\u4f5c\u56fe\u50cf\u7f16\u8f91\u7684\u81ea\u7136\u754c\u9762\uff0c\u663e\u8457\u63d0\u9ad8\u7cbe\u5ea6\u548c\u7075\u6d3b\u6027\u3002\u5229\u7528\u8fd9\u4e00\u4f18\u52bf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u76f8\u7ed3\u5408\uff0c\u7528\u4e8e\u57fa\u4e8e\u573a\u666f\u56fe\u7684\u56fe\u50cf\u7f16\u8f91\u3002\u8fd9\u79cd\u96c6\u6210\u4f7f\u5f97\u5728\u5bf9\u8c61\u7ea7\u522b\u8fdb\u884c\u7cbe\u786e\u4fee\u6539\u4ee5\u53ca\u5bf9\u573a\u666f\u8fdb\u884c\u521b\u9020\u6027\u91cd\u6784\u6210\u4e3a\u53ef\u80fd\uff0c\u800c\u4e0d\u4f1a\u635f\u5bb3\u6574\u4f53\u56fe\u50cf\u7684\u5b8c\u6574\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u9636\u6bb5\uff1a1\uff09\u5229\u7528LLM\u9a71\u52a8\u7684\u573a\u666f\u89e3\u6790\u5668\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u56fe\u50cf\u7684\u573a\u666f\u56fe\uff0c\u6355\u6349\u5173\u952e\u5bf9\u8c61\u53ca\u5176\u76f8\u4e92\u5173\u7cfb\uff0c\u5e76\u89e3\u6790\u7ec6\u7c92\u5ea6\u5c5e\u6027\u5982\u5bf9\u8c61\u63a9\u7801\u548c\u63cf\u8ff0\u3002\u8fd9\u4e9b\u6ce8\u91ca\u4fc3\u8fdb\u4e86\u6982\u5ff5\u5b66\u4e60\uff0c\u4f7f\u7528\u5fae\u8c03\u6269\u6563\u6a21\u578b\u6765\u4ee3\u8868\u6bcf\u4e2a\u5bf9\u8c61\uff0c\
u7528\u4f18\u5316\u7684\u6807\u8bb0\u548c\u8be6\u7ec6\u7684\u63cf\u8ff0\u63d0\u793a\u8868\u793a\u30022\uff09\u5728\u56fe\u50cf\u7f16\u8f91\u9636\u6bb5\uff0cLLM\u7f16\u8f91\u63a7\u5236\u5668\u6307\u5bfc\u7279\u5b9a\u533a\u57df\u7684\u7f16\u8f91\u3002\u8fd9\u4e9b\u7f16\u8f91\u901a\u8fc7\u6ce8\u610f\u529b\u8c03\u8282\u7684\u6269\u6563\u7f16\u8f91\u5668\u5b9e\u73b0\uff0c\u5229\u7528\u5fae\u8c03\u6a21\u578b\u6267\u884c\u5bf9\u8c61\u6dfb\u52a0\u3001\u5220\u9664\u3001\u66ff\u6362\u548c\u8c03\u6574\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u6846\u67b6\u5728\u7f16\u8f91\u7cbe\u5ea6\u548c\u573a\u666f\u7f8e\u5b66\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u56fe\u50cf\u7f16\u8f91\u65b9\u6cd5\u3002|\n", "2410.11805": "|**2024-10-15**|**NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models**|Han Han et.al.|[2410.11805](http://arxiv.org/abs/2410.11805)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u5de5\u5177\u5b66\u4e60\u5728\u73b0\u5b9e\u5e94\u7528\u4e2d\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u679c\u3002\u5728\u5de5\u5177\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0cLLMs\u53ef\u80fd\u4f1a\u6309\u7167\u5d4c\u5957\u987a\u5e8f\u8c03\u7528\u591a\u4e2a\u5de5\u5177\uff0c\u5176\u4e2d\u540e\u4e00\u4e2a\u5de5\u5177\u8c03\u7528\u53ef\u80fd\u5c06\u5176\u524d\u4e00\u4e2a\u5de5\u5177\u7684\u54cd\u5e94\u4f5c\u4e3a\u8f93\u5165\u53c2\u6570\u3002\u7136\u800c\uff0c\u5f53\u524d\u5bf9\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u7684\u7814\u7a76\u4ecd\u7136\u4e0d\u8db3\uff0c\u56e0\u4e3a\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u7f3a\u4e4f\u76f8\u5173\u6570\u636e\u5b9e\u4f8b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86NesTools\u6765\u586b\u8865\u5168\u9762\u8bc4\u4f30\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u7684\u7a7a\u767d\u3002NesTools\u5305\u542b\u4e00\u79cd\u65b0\u9896\u7684\u81ea\u52a8\u6570\u636e\u751f\u621
0\u65b9\u6cd5\uff0c\u7528\u4e8e\u6784\u5efa\u5177\u6709\u4e0d\u540c\u5d4c\u5957\u7ed3\u6784\u7684\u5927\u89c4\u6a21\u5d4c\u5957\u5de5\u5177\u8c03\u7528\u3002\u901a\u8fc7\u4eba\u5de5\u5ba1\u6838\u548c\u4f18\u5316\uff0c\u8be5\u6570\u636e\u96c6\u8d28\u91cf\u9ad8\u4e14\u4e0e\u73b0\u5b9e\u573a\u666f\u7d27\u5bc6\u76f8\u5173\u3002\u56e0\u6b64\uff0cNesTools\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6765\u8bc4\u4f30LLMs\u7684\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u3002\u6211\u4eec\u5bf922\u4e2aLLMs\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5e76\u4f7f\u7528NesTools\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u7ed3\u679c\u8868\u660e\u5f53\u524d\u7684LLMs\u5728\u590d\u6742\u7684\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u4efb\u52a1\u4e0a\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002|\n", "2410.11802": "|**2024-10-15**|**FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting**|Zhe Li et.al.|[2410.11802](http://arxiv.org/abs/2410.11802)|null|\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\uff08TSF\uff09\u5728\u91d1\u878d\u3001\u6c14\u8c61\u670d\u52a1\u548c\u80fd\u6e90\u7ba1\u7406\u7b49\u591a\u4e2a\u9886\u57df\u90fd\u662f\u5173\u952e\u529f\u80fd\u3002\u5c3d\u7ba1\u8fd1\u5e74\u6765\u51fa\u73b0\u4e86\u8bb8\u591aTSF\u65b9\u6cd5\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4e2d\u7684\u8bb8\u591a\u9700\u8981\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u6536\u96c6\u548c\u6a21\u578b\u8bad\u7ec3\uff0c\u5e76\u4e14\u5728\u65b0\u9886\u57df\u4e0a\u7684\u6cdb\u5316\u6027\u80fd\u8f83\u5dee\u3002\u57fa\u7840\u6a21\u578b\u65e8\u5728\u514b\u670d\u8fd9\u4e00\u5c40\u9650\u3002\u5b83\u4eec\u901a\u8fc7\u5927\u89c4\u6a21\u8bed\u8a00\u6216\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u9884\u8bad\u7ec3\uff0c\u8868\u73b0\u51fa\u5728\u65b0\u6216\u672a\u89c1\u8fc7\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u63a8\u7406\u7684\u6f5c\u529b\u3002\u8fd9\u4fc3\u4f7f\u4e86\u65b0\u578bTSF\u57fa\u7840\u6a21\u578b\u7684\u6d8c\u73b0\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79c
d\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5373FoundTS\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u5f7b\u5e95\u800c\u516c\u5e73\u7684\u8bc4\u4f30\u548c\u6bd4\u8f83\u3002FoundTS\u6db5\u76d6\u4e86\u5404\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u9884\u8bad\u7ec3\u65f6\u95f4\u5e8f\u5217\u7684\u57fa\u7840\u6a21\u578b\u3002\u6b64\u5916\uff0cFoundTS\u652f\u6301\u4e0d\u540c\u7684\u9884\u6d4b\u7b56\u7565\uff0c\u5305\u62ec\u96f6\u6837\u672c\u3001\u5c11\u91cf\u6837\u672c\u548c\u5168\u6837\u672c\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u5168\u9762\u7684\u8bc4\u4f30\u3002\u6700\u540e\uff0cFoundTS\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6807\u51c6\u5316\u7684\u8bc4\u4f30\u6d41\u7a0b\u7ba1\u9053\uff0c\u5305\u62ec\u6570\u636e\u96c6\u5206\u5272\u3001\u52a0\u8f7d\u3001\u5f52\u4e00\u5316\u548c\u5c11\u91cf\u6837\u672c\u62bd\u53d6\uff0c\u4ece\u800c\u5b9e\u73b0\u516c\u5e73\u7684\u8bc4\u4f30\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5bf9\u5e7f\u6cdb\u9886\u57df\u5185\u5177\u6709\u4e0d\u540c\u7edf\u8ba1\u7279\u6027\u7684\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u7684TSF\u57fa\u7840\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u8bc4\u4f30\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u73b0\u6709\u57fa\u7840\u6a21\u578b\u7684\u4f18\u70b9\u3001\u7f3a\u70b9\u53ca\u5176\u5185\u5728\u9650\u5236\uff0c\u5e76\u786e\u5b9a\u4e86\u672a\u6765\u6a21\u578b\u8bbe\u8ba1\u7684\u65b9\u5411\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u4ee5\u5728https://anonymous.4open.science/r/FoundTS-C2B0\u83b7\u53d6\u3002|\n", "2410.11786": "|**2024-10-15**|**Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability**|Tsz Ting Chung 
et.al.|[2410.11786](http://arxiv.org/abs/2410.11786)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7f\u6cdb\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u65f6\u3002\u7136\u800c\uff0c\u4e0a\u4e0b\u6587\u5b66\u4e60\u5e26\u6765\u4e86\u989d\u5916\u7684\u8ba1\u7b97\u548c\u8d22\u52a1\u6210\u672c\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u4e00\u4e9b\u63d0\u793a\u538b\u7f29\u65b9\u6cd5\u88ab\u63d0\u51fa\u4ee5\u538b\u7f29\u4e0a\u4e0b\u6587\u5b66\u4e60\u4e2d\u7684\u63d0\u793a\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u65b9\u6cd5\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u9762\u4e34\u7740\u7531\u4e8e\u6a21\u578b\u7279\u5b9a\u538b\u7f29\u800c\u5bfc\u81f4\u7684\u8fc1\u79fb\u6027\u5dee\u7684\u95ee\u9898\uff0c\u6216\u8005\u4f9d\u8d56\u5916\u90e8\u8bad\u7ec3\u6570\u636e\uff0c\u4f8b\u5982GPT-4\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86LLMs\u5f00\u53d1\u7edf\u4e00\u538b\u7f29\u65b9\u6cd5\u7684\u80fd\u529b\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u79bb\u6563\u5316\u4e0d\u5177\u4fe1\u606f\u6027\u7684\u6807\u8bb0\uff0c\u91c7\u7528\u81ea\u76d1\u7763\u9884\u8bad\u7ec3\u6280\u672f\u3002\u901a\u8fc7\u5728\u6301\u7eed\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5f15\u5165\u5c11\u91cf\u53c2\u6570\uff0c\u6240\u63d0\u51fa\u7684Selection-p\u4e3a\u6bcf\u4e2a\u8f93\u5165\u6807\u8bb0\u751f\u6210\u4e00\u4e2a\u6982\u7387\u503c\uff0c\u6307\u793a\u4fdd\u7559\u6216\u4e22\u5f03\u8be5\u6807\u8bb0\u3002\u5b9e\u9a8c\u8868\u660e\uff0cSelection-p\u5728\u591a\u4e2a\u5206\u7c7b\u4efb\u52a1\u4e2d\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5728\u5b9e\u73b0\u9ad8\u8fbe10\u500d\u7684\u538b\u7f29\u7387\u7684\u540c\u65f6\uff0c\u4ec5\u7ecf\u5386\u4e86\u5fae\u5c0f\u76840.8%\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u5b83\u76f8\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u5728\u4e0d\u540
c\u6a21\u578b\u4e0a\u7684\u8fc1\u79fb\u6027\u66f4\u4f18\u3002\u53e6\u5916\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u5206\u6790\u4e86Selection-p\u5982\u4f55\u6709\u52a9\u4e8e\u5728\u957f\u4e0a\u4e0b\u6587\u4e2d\u4fdd\u6301\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u6027\u80fd\u3002|\n", "2410.11782": "|**2024-10-15**|**G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks**|Guibin Zhang et.al.|[2410.11782](http://arxiv.org/abs/2410.11782)|null|\u8fd1\u671f\u5728\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u6280\u672f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u8bc1\u660e\u96c6\u4f53\u667a\u80fd\u53ef\u4ee5\u663e\u8457\u8d85\u8d8a\u5355\u4e2a\u4ee3\u7406\u7684\u80fd\u529b\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\u62d3\u6251\u3002\u5c3d\u7ba1\u6709\u8bb8\u591a\u591a\u6837\u5316\u4e14\u9ad8\u6027\u80fd\u7684\u8bbe\u8ba1\u53ef\u4f9b\u9009\u62e9\uff0c\u4f46\u5b9e\u8df5\u8005\u5728\u4e3a\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u6709\u6548\u7684\u7ba1\u9053\u65f6\u5e38\u5e38\u611f\u5230\u56f0\u60d1\uff1a\u54ea\u79cd\u62d3\u6251\u6700\u9002\u5408\u6211\u7684\u4efb\u52a1\uff0c\u540c\u65f6\u907f\u514d\u4e0d\u5fc5\u8981\u7684\u901a\u4fe1\u4ee4\u724c\u5f00\u9500\u5e76\u786e\u4fdd\u9ad8\u8d28\u91cf\u7684\u89e3\u51b3\u65b9\u6848\uff1f\u9488\u5bf9\u8fd9\u4e00\u56f0\u5883\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86G-Designer\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u9002\u5e94\u3001\u9ad8\u6548\u4e14\u7a33\u5065\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u52a8\u6001\u8bbe\u8ba1\u4efb\u52a1\u611f\u77e5\u7684\u5b9a\u5236\u5316\u901a\u4fe1\u62d3\u6251\u3002\u5177\u4f53\u6765\u8bf4\uff0cG-Designer\u5c06\u591a\u4ee3\u7406\u7cfb\u7edf\u5efa\u6a21\u4e3a\u4e00\u4e2a\u591a\u4ee3\u7406\u7f51\u7edc\uff0c\u5229\u7528\u53d8\u5206\u56fe\u81ea\u52a8\u7f16\u7801\u5668\u5bf9\u8282\u70b9\uff08\u4ee3\u7406\uff09\u548c\u4e00\u4e2a\u7279\
u5b9a\u4efb\u52a1\u7684\u865a\u62df\u8282\u70b9\u8fdb\u884c\u7f16\u7801\uff0c\u5e76\u89e3\u7801\u51fa\u4e00\u4e2a\u4efb\u52a1\u9002\u5e94\u6027\u5f3a\u4e14\u6027\u80fd\u9ad8\u7684\u901a\u4fe1\u62d3\u6251\u3002\u5728\u516d\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cG-Designer\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a(1) \u9ad8\u6027\u80fd\uff0c\u5728MMLU\u4e0a\u7684\u51c6\u786e\u7387\u8fbe\u523084.50%\uff0c\u5728HumanEval\u4e0a\u7684pass@1\u8fbe\u523089.90%\uff1b(2) \u4efb\u52a1\u9002\u5e94\u6027\uff0c\u6839\u636e\u4efb\u52a1\u96be\u5ea6\u6784\u5efa\u5b9a\u5236\u5316\u7684\u901a\u4fe1\u534f\u8bae\uff0c\u5c06\u4ee4\u724c\u6d88\u8017\u51cf\u5c11\u4e86\u9ad8\u8fbe95.33%\uff1b\u5e76\u4e14(3) \u5bf9\u6297\u9c81\u68d2\uff0c\u80fd\u591f\u62b5\u5fa1\u4ee3\u7406\u5bf9\u6297\u653b\u51fb\uff0c\u4ec5\u5bfc\u81f40.3%\u7684\u51c6\u786e\u7387\u4e0b\u964d\u3002|\n", "2410.11781": "|**2024-10-15**|**Language Models Encode Numbers Using Digit Representations in Base 10**|Amit Arnold Levy 
et.al.|[2410.11781](http://arxiv.org/abs/2410.11781)|**[link](https://github.com/amitlevy/base10)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u5373\u4f7f\u662f\u7b80\u5355\u7684\u6570\u503c\u95ee\u9898\u65f6\uff0c\u5982\u6bd4\u8f83\u4e24\u4e2a\u5c0f\u6570\u5b57\uff0c\u4e5f\u7ecf\u5e38\u51fa\u9519\u3002\u4e00\u4e2a\u81ea\u7136\u7684\u5047\u8bbe\u662f\u8fd9\u4e9b\u9519\u8bef\u6e90\u4e8e\u6a21\u578b\u5982\u4f55\u8868\u793a\u6570\u5b57\uff0c\u7279\u522b\u662f\u5b83\u4eec\u662f\u5426\u6355\u6349\u5230\u4e86\u6570\u5b57\u7684\u5b9e\u9645\u6570\u503c\u3002\u6211\u4eec\u901a\u8fc7\u89c2\u5bdf\u53d1\u73b0\uff0cLLM\u5728\u6570\u503c\u4efb\u52a1\u4e0a\u7684\u9519\u8bef\u901a\u5e38\u5206\u5e03\u5728\u7b54\u6848\u7684\u201c\u4f4d\u6570\u201d\u4e0a\uff0c\u800c\u4e0d\u662f\u56f4\u7ed5\u5176\u201c\u6570\u503c\u201d\u6b63\u5e38\u5206\u5e03\u3002\u901a\u8fc7\u4e00\u7cfb\u5217\u63a2\u9488\u5b9e\u9a8c\u548c\u56e0\u679c\u5e72\u9884\uff0c\u6211\u4eec\u5c55\u793a\u4e86LLM\u5185\u90e8\u4ee5\u5341\u8fdb\u5236\u7684\u6bcf\u4e00\u4f4d\u6570\u5b57\u8fdb\u884c\u5706\u73af\u5f0f\u8868\u793a\uff0c\u800c\u4e0d\u662f\u6570\u503c\u8868\u793a\u3002\u8fd9\u79cd\u57fa\u4e8e\u4f4d\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u800c\u975e\u6570\u503c\u8868\u793a\uff0c\u63ed\u793a\u4e86\u6a21\u578b\u5728\u6d89\u53ca\u6570\u503c\u63a8\u7406\u7684\u4efb\u52a1\u4e2d\u7684\u9519\u8bef\u6a21\u5f0f\uff0c\u5e76\u53ef\u4f5c\u4e3a\u672a\u6765\u7814\u7a76\u5206\u6790LLM\u4e2d\u6570\u503c\u673a\u5236\u7684\u57fa\u7840\u3002|\n", "2410.11779": "|**2024-10-15**|**MLLM can see? 
Dynamic Correction Decoding for Hallucination Mitigation**|Chenxi Wang et.al.|[2410.11779](http://arxiv.org/abs/2410.11779)|**[link](https://github.com/zjunlp/Deco)**|**\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7ecf\u5e38\u8868\u73b0\u51fa\u5e7b\u89c9\u73b0\u8c61\uff0c\u4f46\u5176\u80cc\u540e\u7684\u539f\u56e0\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7406\u89e3\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5b9e\u8bc1\u5206\u6790\u5e76\u53d1\u73b0\uff0c\u5c3d\u7ba1MLLMs\u5728\u6700\u7ec8\u8f93\u51fa\u4e2d\u9519\u8bef\u5730\u751f\u6210\u4e86\u5bf9\u8c61\uff0c\u4f46\u5728\u524d\u4e00\u5c42\u5b83\u4eec\u5b9e\u9645\u4e0a\u80fd\u591f\u8bc6\u522b\u89c6\u89c9\u5bf9\u8c61\u3002\u6211\u4eec\u63a8\u6d4b\u8fd9\u53ef\u80fd\u662f\u7531\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u5f3a\u5927\u77e5\u8bc6\u5148\u9a8c\u6291\u5236\u4e86\u89c6\u89c9\u4fe1\u606f\uff0c\u4ece\u800c\u5bfc\u81f4\u5e7b\u89c9\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u52a8\u6001\u6821\u6b63\u89e3\u7801\u65b9\u6cd5\uff08DeCo\uff09\uff0c\u8be5\u65b9\u6cd5\u81ea\u9002\u5e94\u5730\u9009\u62e9\u5408\u9002\u7684\u524d\u4e00\u5c42\uff0c\u5e76\u6309\u6bd4\u4f8b\u5c06\u77e5\u8bc6\u6574\u5408\u5230\u6700\u7ec8\u5c42\u4ee5\u8c03\u6574\u8f93\u51falogits\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cDeCo\u662f\u4e0e\u6a21\u578b\u65e0\u5173\u7684\uff0c\u53ef\u4ee5\u65e0\u7f1d\u5730\u4e0e\u5404\u79cd\u7ecf\u5178\u89e3\u7801\u7b56\u7565\u7ed3\u5408\uff0c\u5e76\u5e94\u7528\u4e8e\u4e0d\u540c\u7684MLLMs\u3002\u6211\u4eec\u5728\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86DeCo\uff0c\u7ed3\u679c\u8868\u660e\u5b83\u76f8\u6bd4\u57fa\u7ebf\u5927\u5e45\u964d\u4f4e\u4e86\u5e7b\u89c9\u7387\uff0c\u7a81\u663e\u4e86\u5176\u51cf\u8f7b\u5e7b\u89c9\u7684\u6f5c\u529b\u3002\u4ee3\u7801\u53ef\u5728https://github.com/zjunlp/DeCo\u83b7\u53d6\u3002**|\n", "2410.11772": "|**2024-10-15**|**Layer-wise Importance Matters: Less Memory for 
Better Performance in Parameter-efficient Fine-tuning of Large Language Models**|Kai Yao et.al.|[2410.11772](http://arxiv.org/abs/2410.11772)|**[link](https://github.com/kaiseem/ist)**|**\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u65b9\u6cd5\u56e0\u5176\u5728\u9002\u5e94\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5230\u4e0b\u6e38\u4efb\u52a1\u65f6\u663e\u8457\u51cf\u5c11\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u7684\u6f5c\u529b\u800c\u5e7f\u53d7\u6b22\u8fce\u3002\u7136\u800c\uff0c\u5927\u591a\u6570PEFT\u65b9\u6cd5\u7684\u4e00\u4e2a\u5e38\u89c1\u9650\u5236\u662f\u5b83\u4eec\u5728\u6574\u4e2a\u5c42\u4e2d\u5e94\u7528\u7edf\u4e00\u7684\u67b6\u6784\u8bbe\u8ba1\uff0c\u8fd9\u6d89\u53ca\u76f8\u540c\u7684\u53ef\u8bad\u7ec3\u6a21\u5757\uff0c\u5e76\u5ffd\u7565\u4e86\u6bcf\u5c42\u7684\u91cd\u8981\u6027\u5dee\u5f02\uff0c\u4ece\u800c\u5bfc\u81f4\u5fae\u8c03\u7ed3\u679c\u4e0d\u4f73\u3002\u4e3a\u4e86\u514b\u670d\u4e0a\u8ff0\u5c40\u9650\u5e76\u83b7\u5f97\u66f4\u597d\u7684\u6027\u80fd\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3a\u91cd\u8981\u6027\u611f\u77e5\u7a00\u758f\u8c03\u4f18\uff08IST\uff09\uff0c\u4ee5\u5145\u5206\u5229\u7528\u56fa\u6709\u7684\u7a00\u758f\u6027\uff0c\u5e76\u901a\u8fc7\u6709\u6548\u7684\u9010\u5c42\u91cd\u8981\u6027\u8bc4\u5206\u9009\u62e9\u6700\u91cd\u8981\u7684\u5168\u5c42\u5b50\u96c6\u3002\u6240\u63d0\u51fa\u7684IST\u662f\u4e00\u79cd\u901a\u7528\u4e14\u5373\u63d2\u5373\u7528\u7684\u6280\u672f\uff0c\u4e0e\u5404\u79cd\u57fa\u4e8e\u5c42\u7684PEFT\u65b9\u6cd5\u517c\u5bb9\u3002\u901a\u8fc7\u5229\u7528\u4f30\u8ba1\u7684\u91cd\u8981\u6027\u5f97\u5206\uff0cIST\u5728PEFT\u6a21\u5757\u4e2d\u52a8\u6001\u66f4\u65b0\u8fd9\u4e9b\u9009\u5b9a\u7684\u5c42\uff0c\u4ece\u800c\u964d\u4f4e\u5185\u5b58\u9700\u6c42\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u4f9b\u4e86\u6536\u655b\u6027\u7684\u7406\u8bba\u8bc1\u660e\u548c\u4f18\u4e8e\u5747\u5300\u66f4\u65b0\u7b56\u7565\u7684\u5b9e\u8bc1\u8bc1\u
636e\uff0c\u4ee5\u8bc1\u660eIST\u76f8\u5bf9\u4e8e\u73b0\u6709\u65b9\u6cd5\u7684\u4f18\u52bf\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u6db5\u76d6\u4e86\u5404\u79cdLLMs\u3001PEFT\u65b9\u6cd5\u548c\u4e0b\u6e38\u4efb\u52a1\uff0c\u8bc1\u5b9e\u4e86\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5c55\u793a\u4e86IST\u589e\u5f3a\u73b0\u6709\u57fa\u4e8e\u5c42\u7684PEFT\u65b9\u6cd5\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/Kaiseem/IST\u83b7\u53d6\u3002**|\n", "2410.12788": "|**2024-10-16**|**Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception**|Jihao Zhao et.al.|[2410.12788](http://arxiv.org/abs/2410.12788)|**[link](https://github.com/IAAR-Shanghai/Meta-Chunking)**|Retrieval-Augmented Generation\uff08RAG\uff09\u5728\u4f5c\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53ef\u884c\u8865\u5145\u65f6\uff0c\u5e38\u5e38\u5ffd\u7565\u4e86\u5176\u7ba1\u9053\u4e2d\u4e00\u4e2a\u5173\u952e\u65b9\u9762\u2014\u2014\u6587\u672c\u5206\u5757\uff0c\u8fd9\u5f71\u54cd\u4e86\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u7684\u8d28\u91cf\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u79f0\u4e3a\u5143\u5206\u5757\uff08Meta-Chunking\uff09\u7684\u6982\u5ff5\uff0c\u8fd9\u662f\u4e00\u79cd\u4ecb\u4e8e\u53e5\u5b50\u548c\u6bb5\u843d\u4e4b\u95f4\u7684\u7c92\u5ea6\uff0c\u7531\u6bb5\u843d\u5185\u5177\u6709\u6df1\u5c42\u6b21\u8bed\u8a00\u903b\u8f91\u8054\u7cfb\u7684\u4e00\u7ec4\u53e5\u5b50\u7ec4\u6210\u3002\u4e3a\u4e86\u5b9e\u73b0\u5143\u5206\u5757\uff0c\u6211\u4eec\u57fa\u4e8eLLMs\u8bbe\u8ba1\u4e86\u4e24\u79cd\u7b56\u7565\uff1a\u8fb9\u754c\u91c7\u6837\u5206\u5757\u548c\u56f0\u60d1\u5ea6\u5206\u5757\u3002\u524d\u8005\u5229\u7528LLMs\u5bf9\u8fde\u7eed\u53e5\u5b50\u662f\u5426\u9700\u8981\u5206\u5272\u8fdb\u884c\u4e8c\u5206\u7c7b\u51b3\u7b56\uff0c\u57fa\u4e8e\u4ece\u8fb9\u754c\u91c7\u6837\u83b7\u5f97\u7684\u6982\u7387\u5dee\u505a\u51fa\u51b3\u7b56\u3002\u540e\u8005\u901a\u8fc7\u5206\u6790\u56f0\u60d1\u5ea6\u5206\u5e0
3\u7684\u7279\u70b9\u6765\u7cbe\u786e\u8bc6\u522b\u6587\u672c\u5206\u5757\u8fb9\u754c\u3002\u6b64\u5916\uff0c\u8003\u8651\u5230\u4e0d\u540c\u6587\u672c\u7684\u56fa\u6709\u590d\u6742\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u5143\u5206\u5757\u4e0e\u52a8\u6001\u5408\u5e76\u7684\u7b56\u7565\uff0c\u4ee5\u5b9e\u73b0\u5728\u7ec6\u7c92\u5ea6\u548c\u7c97\u7c92\u5ea6\u6587\u672c\u5206\u5757\u4e4b\u95f4\u53d6\u5f97\u5e73\u8861\u3002\u5b9e\u9a8c\u5728\u5341\u4e00\u4e2a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\uff0c\u7ed3\u679c\u8868\u660e\u5143\u5206\u5757\u53ef\u4ee5\u66f4\u6709\u6548\u5730\u63d0\u9ad8\u57fa\u4e8eRAG\u7684\u5355\u8df3\u548c\u591a\u8df3\u95ee\u7b54\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u57282WikiMultihopQA\u6570\u636e\u96c6\u4e0a\uff0c\u5b83\u6bd4\u76f8\u4f3c\u6027\u5206\u5757\u63d0\u9ad8\u4e861.32\u7684\u6027\u80fd\uff0c\u540c\u65f6\u4ec5\u6d88\u8017\u4e8645.8%\u7684\u65f6\u95f4\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/IAAR-Shanghai/Meta-Chunking \u83b7\u53d6\u3002|\n", "2410.12782": "|**2024-10-16**|**In-Context Learning Enables Robot Action Prediction in LLMs**|Yida Yin 
et.al.|[2410.12782](http://arxiv.org/abs/2410.12782)|null|\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bed\u8a00\u9886\u57df\u901a\u8fc7\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u529f\u3002\u7136\u800c\uff0c\u5229\u7528LLMs\u7684ICL\u80fd\u529b\u76f4\u63a5\u9884\u6d4b\u673a\u5668\u4eba\u52a8\u4f5c\u7684\u7814\u7a76\u8fd8\u76f8\u5bf9\u8f83\u5c11\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aRoboPrompt\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4f7f\u73b0\u6210\u7684\u7eaf\u6587\u672cLLMs\u80fd\u591f\u5728\u65e0\u9700\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u901a\u8fc7ICL\u76f4\u63a5\u9884\u6d4b\u673a\u5668\u4eba\u52a8\u4f5c\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u901a\u8fc7\u542f\u53d1\u5f0f\u65b9\u6cd5\u8bc6\u522b\u51fa\u4e00\u4e2a\u7247\u6bb5\u4e2d\u7684\u5173\u952e\u5e27\uff0c\u8fd9\u4e9b\u5173\u952e\u5e27\u6355\u6349\u4e86\u91cd\u8981\u7684\u65f6\u523b\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u4ece\u8fd9\u4e9b\u5173\u952e\u5e27\u4e2d\u63d0\u53d6\u672b\u7aef\u6267\u884c\u5668\u7684\u52a8\u4f5c\u4ee5\u53ca\u4f30\u8ba1\u7684\u521d\u59cb\u7269\u4f53\u59ff\u6001\uff0c\u5e76\u5c06\u4e24\u8005\u8f6c\u6362\u4e3a\u6587\u672c\u63cf\u8ff0\u3002\u6700\u540e\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u6a21\u677f\uff0c\u4ece\u8fd9\u4e9b\u6587\u672c\u63cf\u8ff0\u548c\u4efb\u52a1\u6307\u4ee4\u4e2d\u5f62\u6210ICL\u6f14\u793a\u3002\u8fd9\u4f7f\u5f97LLM\u80fd\u591f\u5728\u6d4b\u8bd5\u65f6\u76f4\u63a5\u9884\u6d4b\u673a\u5668\u4eba\u52a8\u4f5c\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u548c\u5206\u6790\uff0cRoboPrompt\u5728\u6a21\u62df\u548c\u771f\u5b9e\u73af\u5883\u4e2d\u5747\u8868\u73b0\u51fa\u6bd4\u96f6\u6837\u672c\u548cICL\u57fa\u7ebf\u66f4\u5f3a\u7684\u6027\u80fd\u3002|\n", "2410.12774": "|**2024-10-16**|**Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information**|Yingya Li 
et.al.|[2410.12774](http://arxiv.org/abs/2410.12774)|null|\u591a\u4efb\u52a1\u5b66\u4e60\u7684\u6210\u529f\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u4efb\u52a1\u7684\u5206\u7ec4\u65b9\u5f0f\u3002\u7b80\u5355\u5730\u5c06\u6240\u6709\u4efb\u52a1\u6216\u968f\u673a\u9009\u62e9\u7684\u4efb\u52a1\u7ec4\u5408\u5728\u4e00\u8d77\u53ef\u80fd\u5bfc\u81f4\u8d1f\u8fc1\u79fb\uff0c\u4ece\u800c\u4f7f\u591a\u4efb\u52a1\u6a21\u578b\u7684\u8868\u73b0\u4e0d\u5982\u5355\u4efb\u52a1\u6a21\u578b\u3002\u5c3d\u7ba1\u5df2\u7ecf\u505a\u51fa\u4e86\u8bb8\u591a\u52aa\u529b\u6765\u8bc6\u522b\u4efb\u52a1\u5206\u7ec4\u5e76\u8861\u91cf\u4e0d\u540c\u4efb\u52a1\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u4f46\u5b9a\u4e49\u4e00\u4e2a\u6307\u6807\u4ee5\u4ece\u4f17\u591a\u6f5c\u5728\u4efb\u52a1\u7ec4\u5408\u4e2d\u786e\u5b9a\u6700\u4f73\u4efb\u52a1\u5206\u7ec4\u4ecd\u7136\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u7814\u7a76\u8bfe\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u70b9\u5f0fV-\u53ef\u7528\u4fe1\u606f\uff08PVI\uff09\u6d4b\u91cf\u4efb\u52a1\u96be\u5ea6\u7684\u4efb\u52a1\u76f8\u5173\u6027\u5ea6\u91cf\u65b9\u6cd5\u3002PVI\u662f\u4e00\u79cd\u65b0\u8fd1\u63d0\u51fa\u7684\u5ea6\u91cf\u6807\u51c6\uff0c\u7528\u4e8e\u4f30\u8ba1\u7ed9\u5b9a\u6a21\u578b\u65f6\u6570\u636e\u96c6\u5305\u542b\u591a\u5c11\u53ef\u7528\u4fe1\u606f\u3002\u6211\u4eec\u5047\u8bbe\u5177\u6709\u7edf\u8ba1\u4e0a\u4e0d\u53ef\u533a\u5206\u7684PVI\u4f30\u8ba1\u503c\u7684\u4efb\u52a1\u8db3\u591f\u76f8\u4f3c\uff0c\u53ef\u4ee5\u4ece\u8054\u5408\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u53d7\u76ca\u3002\u6211\u4eec\u5728\u4e00\u822c\u3001\u751f\u7269\u533b\u5b66\u548c\u4e34\u5e8a\u9886\u57df\u768415\u4e2aNLP\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u5b9e\u9a8c\uff0c\u4ee5\u8bc4\u4f30\u8be5\u5ea6\u91cf\u65b9\u6cd5\u7528\u4e8e\u4efb\u52a1\u5206\u7ec4\u7684\u53ef\u884c\u6027\u3002\u6211\u4eec\u5c06\u8054\u5408\u5b66\u4e60\u5668\u7684\u7ed3\u679c\u4e0e\u5355\u4efb\u52a1\u5b66\u4e60\u5668\u3001\u73b0
\u6709\u57fa\u7ebf\u65b9\u6cd5\u4ee5\u53ca\u6700\u8fd1\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08\u5305\u62ecLlama 2\u548cGPT-4\uff09\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u7ed3\u679c\u663e\u793a\uff0c\u901a\u8fc7\u5c06\u5177\u6709\u76f8\u4f3cPVI\u4f30\u8ba1\u503c\u7684\u4efb\u52a1\u5206\u7ec4\uff0c\u8054\u5408\u5b66\u4e60\u5668\u5728\u8f83\u5c11\u603b\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\u83b7\u5f97\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u9886\u57df\u5185\u8868\u73b0\u4e00\u81f4\u3002|\n", "2410.12757": "|**2024-10-16**|**StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples**|Ajay Patel et.al.|[2410.12757](http://arxiv.org/abs/2410.12757)|null|\u98ce\u683c\u8868\u793a\u65e8\u5728\u5c06\u5177\u6709\u76f8\u4f3c\u5199\u4f5c\u98ce\u683c\u7684\u6587\u672c\u5d4c\u5165\u5230\u63a5\u8fd1\u7684\u4f4d\u7f6e\uff0c\u5e76\u5c06\u5177\u6709\u4e0d\u540c\u98ce\u683c\u7684\u6587\u672c\u5d4c\u5165\u5230\u8fdc\u79bb\u7684\u4f4d\u7f6e\uff0c\u800c\u4e0d\u8003\u8651\u5185\u5bb9\u3002\u7136\u800c\uff0c\u7528\u4e8e\u8bad\u7ec3\u8fd9\u4e9b\u8868\u793a\u7684\u5bf9\u6bd4\u4e09\u5143\u7ec4\u5f80\u5f80\u5728\u98ce\u683c\u548c\u5185\u5bb9\u4e0a\u90fd\u6709\u6240\u53d8\u5316\uff0c\u5bfc\u81f4\u8868\u793a\u4e2d\u53ef\u80fd\u5b58\u5728\u5185\u5bb9\u6cc4\u6f0f\u7684\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aStyleDistance\u7684\u65b0\u65b9\u6cd5\u6765\u8bad\u7ec3\u66f4\u5f3a\u7684\u72ec\u7acb\u4e8e\u5185\u5bb9\u7684\u98ce\u683c\u5d4c\u5165\u3002\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u4e86\u4e00\u4e2a\u5408\u6210\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u53d7\u63a7\u98ce\u683c\u53d8\u5316\u7684\u8fd1\u4f3c\u91ca\u4e49\uff0c\u5e76\u4e3a\u7cbe\u786e\u7684\u5bf9\u6bd4\u5b66\u4e60\u751f\u6210\u4e86\u8de8\u8d8a40\u4e2a\u4e0d\u540c\u98ce\u683c\u7279\u5f81\u7684\u6b63\u4f8b\u548c\u8d1f\u4f8b\u3002\u6211\u4eec\u901a\u8fc7\u4eba\u5de5\u548c\u
81ea\u52a8\u8bc4\u4f30\u6765\u8bc4\u4f30\u5408\u6210\u6570\u636e\u548c\u5d4c\u5165\u7684\u8d28\u91cf\u3002StyleDistance\u589e\u5f3a\u4e86\u98ce\u683c\u5d4c\u5165\u7684\u5185\u5bb9\u72ec\u7acb\u6027\uff0c\u8fd9\u79cd\u5d4c\u5165\u53ef\u4ee5\u63a8\u5e7f\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5e76\u5728\u4e0b\u6e38\u5e94\u7528\u4e2d\u4f18\u4e8e\u9886\u5148\u7684\u98ce\u683c\u8868\u793a\u3002\u6211\u4eec\u7684\u6a21\u578b\u53ef\u4ee5\u5728https://huggingface.co/StyleDistance/styledistance\u627e\u5230\u3002|\n", "2410.12735": "|**2024-10-17**|**CREAM: Consistency Regularized Self-Rewarding Language Models**|Zhaoyang Wang et.al.|[2410.12735](http://arxiv.org/abs/2410.12735)|null|\u8fd1\u671f\u7684\u81ea\u6211\u5956\u52b1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6210\u529f\u5730\u5e94\u7528\u4e86LLM\u4f5c\u4e3a\u88c1\u5224\u7684\u65b9\u6cd5\uff0c\u4ee5\u8fed\u4ee3\u65b9\u5f0f\u63d0\u5347\u5bf9\u9f50\u6027\u80fd\uff0c\u800c\u65e0\u9700\u4eba\u5de5\u6807\u6ce8\u7684\u504f\u597d\u6570\u636e\u3002\u8fd9\u4e9b\u65b9\u6cd5\u901a\u5e38\u4f7f\u7528\u540c\u4e00LLM\u4f5c\u4e3a\u7b56\u7565\u6a21\u578b\uff08\u751f\u6210\u54cd\u5e94\uff09\u548c\u5956\u52b1\u6a21\u578b\uff08\u8bc4\u5206\u548c\u6392\u5e8f\u8fd9\u4e9b\u54cd\u5e94\uff09\u3002\u7136\u540e\uff0c\u6839\u636e\u6392\u540d\u7684\u54cd\u5e94\u4f5c\u4e3a\u504f\u597d\u5bf9\u6765\u901a\u8fc7\u76f4\u63a5\u5bf9\u9f50\u6280\u672f\uff08\u4f8b\u5982DPO\uff09\u8bad\u7ec3LLM\u3002\u7136\u800c\uff0c\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728\u8fd9\u4e2a\u8fc7\u7a0b\u4e2d\uff0c\u5956\u52b1\u548c\u6392\u5e8f\u7684\u51c6\u786e\u6027\u6ca1\u6709\u4fdd\u8bc1\uff0c\u8fd9\u5bf9\u4e8e\u786e\u4fdd\u51c6\u786e\u7684\u5956\u52b1\u548c\u9ad8\u8d28\u91cf\u7684\u504f\u597d\u6570\u636e\u81f3\u5173\u91cd\u8981\u3002\u6765\u81ea\u76f8\u5bf9\u8f83\u5c0f\u7684LLM\uff08\u4f8b\u59827B\u53c2\u6570\uff09\u7684\u7ecf\u9a8c\u7ed3\u679c\u4e5f\u8868\u660e\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7ecf\u8fc7\u51e0\u6b
21\u8fed\u4ee3\u540e\uff0c\u81ea\u6211\u5956\u52b1\u7684\u6539\u8fdb\u53ef\u80fd\u4f1a\u51cf\u5f31\uff0c\u6211\u4eec\u5047\u8bbe\u8fd9\u662f\u7531\u4e8e\u5956\u52b1\u7cfb\u7edf\u4e2d\u7684\u7d2f\u79ef\u504f\u5dee\u6240\u81f4\u3002\u8fd9\u79cd\u504f\u5dee\u53ef\u80fd\u5bfc\u81f4\u7528\u4e8e\u8bad\u7ec3LLM\u7684\u4e0d\u53ef\u9760\u504f\u597d\u6570\u636e\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u9996\u5148\u5236\u5b9a\u4e86\u5e76\u5206\u6790\u4e86\u81ea\u6211\u5956\u52b1\u8bed\u8a00\u6a21\u578b\u7684\u5e7f\u4e49\u8fed\u4ee3\u504f\u597d\u5fae\u8c03\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728\u8fd9\u4e00\u5e7f\u4e49\u6846\u67b6\u4e2d\u5f15\u5165\u6b63\u5219\u5316\uff0c\u4ee5\u51cf\u8f7b\u81ea\u6211\u5956\u52b1\u8fc7\u7a0b\u4e2d\u7684\u8fc7\u5ea6\u81ea\u4fe1\u504f\u597d\u6807\u8bb0\u3002\u57fa\u4e8e\u8fd9\u4e00\u7406\u8bba\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e00\u81f4\u6027\u6b63\u5219\u5316\u7684\u81ea\u6211\u5956\u52b1\u8bed\u8a00\u6a21\u578b\uff08CREAM\uff09\uff0c\u8be5\u6a21\u578b\u5229\u7528\u4e0d\u540c\u8fed\u4ee3\u4e2d\u7684\u5956\u52b1\u4e00\u81f4\u6027\u6765\u6b63\u5219\u5316\u81ea\u6211\u5956\u52b1\u8bad\u7ec3\uff0c\u5e2e\u52a9\u6a21\u578b\u4ece\u66f4\u53ef\u9760\u7684\u504f\u597d\u6570\u636e\u4e2d\u5b66\u4e60\u3002\u901a\u8fc7\u8fd9\u79cd\u660e\u786e\u7684\u6b63\u5219\u5316\uff0c\u6211\u4eec\u7684\u5b9e\u8bc1\u7ed3\u679c\u8bc1\u660e\u4e86CREAM\u5728\u63d0\u9ad8\u5956\u52b1\u4e00\u81f4\u6027\u548c\u5bf9\u9f50\u6027\u80fd\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Raibows/CREAM\u516c\u5f00\u83b7\u53d6\u3002|\n", "2410.12707": "|**2024-10-16**|**FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression**|Zhenheng Tang 
et.al.|[2410.12707](http://arxiv.org/abs/2410.12707)|null|\u4e3a\u4e86\u7f13\u89e3\u5728\u8bad\u7ec3\u5927\u578b\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\uff08DNNs\uff09\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65f6\u7684\u786c\u4ef6\u77ed\u7f3a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FusionLLM\uff0c\u8fd9\u662f\u4e00\u79cd\u53bb\u4e2d\u5fc3\u5316\u7684\u8bad\u7ec3\u7cfb\u7edf\uff0c\u65e8\u5728\u5229\u7528\u5730\u7406\u5206\u5e03\u7684GPU\u8de8\u4e0d\u540c\u7684\u8ba1\u7b97\u96c6\u7fa4\u6216\u5355\u4e2a\u8bbe\u5907\u8fdb\u884cDNN\u8bad\u7ec3\u3002\u53bb\u4e2d\u5fc3\u5316\u8bad\u7ec3\u5728\u7cfb\u7edf\u8bbe\u8ba1\u548c\u6548\u7387\u65b9\u9762\u9762\u4e34\u91cd\u5927\u6311\u6218\uff0c\u5305\u62ec\uff1a1\uff09\u9700\u8981\u8fdc\u7a0b\u81ea\u52a8\u5fae\u5206\uff08RAD\uff09\uff0c2\uff09\u652f\u6301\u7075\u6d3b\u7684\u6a21\u578b\u5b9a\u4e49\u548c\u5f02\u6784\u8f6f\u4ef6\uff0c3\uff09\u5f02\u6784\u786c\u4ef6\u5bfc\u81f4\u8d44\u6e90\u5229\u7528\u7387\u4f4e\u6216\u5b58\u5728\u6162\u901f\u8282\u70b9\u95ee\u9898\uff0c\u4ee5\u53ca4\uff09\u7f51\u7edc\u901a\u4fe1\u7f13\u6162\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u5728\u7cfb\u7edf\u8bbe\u8ba1\u4e2d\uff0c\u6211\u4eec\u5c06\u6a21\u578b\u8868\u793a\u4e3a\u64cd\u4f5c\u7b26\uff08OP-DAG\uff09\u7684\u6709\u5411\u65e0\u73af\u56fe\u3002DAG\u4e2d\u7684\u6bcf\u4e2a\u8282\u70b9\u4ee3\u8868DNN\u4e2d\u7684\u64cd\u4f5c\u7b26\uff0c\u8fb9\u5219\u8868\u793a\u64cd\u4f5c\u7b26\u4e4b\u95f4\u7684\u6570\u636e\u4f9d\u8d56\u5173\u7cfb\u3002\u57fa\u4e8e\u8fd9\u79cd\u8bbe\u8ba1\uff0c1\uff09\u7528\u6237\u53ef\u4ee5\u81ea\u5b9a\u4e49\u4efb\u4f55DNN\u800c\u4e0d\u5fc5\u5173\u5fc3\u5e95\u5c42\u64cd\u4f5c\u7b26\u5b9e\u73b0\uff1b2\uff09\u6211\u4eec\u901a\u8fc7\u66f4\u7ec6\u7c92\u5ea6\u7684\u5b50\u4efb\u52a1\u8fdb\u884c\u4efb\u52a1\u8c03\u5ea6\uff0c\u63d0\u4f9b\u66f4\u591a\u7684\u4f18\u5316\u7a7a\u95f4\uff1b3\uff09DAG\u8fd0\u884c\u65f6\u6267\u884c\u5668\u53ef\u4ee5\u5728\u4e0d\u4f9d\u8d56\u4e00\u81f4\u7684\
u4f4e\u7ea7\u673a\u5668\u5b66\u4e60\u6846\u67b6\u7248\u672c\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0RAD\u3002 \u4e3a\u4e86\u63d0\u9ad8\u7cfb\u7edf\u6548\u7387\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u4e00\u4e2a\u5de5\u4f5c\u8d1f\u8f7d\u4f30\u8ba1\u5668\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cdOP-Fence\u8c03\u5ea6\u5668\uff0c\u5c06\u5177\u6709\u76f8\u4f3c\u5e26\u5bbd\u7684\u8bbe\u5907\u5206\u7ec4\u5728\u4e00\u8d77\uff0c\u5e76\u5bf9DAG\u8fdb\u884c\u5206\u533a\u4ee5\u589e\u52a0\u541e\u5410\u91cf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cdAdaTopK\u538b\u7f29\u5668\uff0c\u4ee5\u81ea\u9002\u5e94\u5730\u538b\u7f29\u5728\u6700\u6162\u901a\u4fe1\u94fe\u8def\u4e0a\u7684\u4e2d\u95f4\u6fc0\u6d3b\u548c\u68af\u5ea6\u3002\u4e3a\u4e86\u8bc4\u4f30\u6211\u4eec\u7684\u7cfb\u7edf\u548c\u7b97\u6cd5\u7684\u6536\u655b\u6027\u548c\u6548\u7387\uff0c\u6211\u4eec\u5728\u4e09\u4e2a\u73b0\u5b9e\u6d4b\u8bd5\u5e73\u53f0\u4e0a\u4f7f\u7528\u8fde\u63a5\u901f\u5ea6\u57288 Mbps\u523010 Gbps\u768448\u4e2aGPU\u4e0a\u8bad\u7ec3\u4e86ResNet-101\u548cGPT-2\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u548c\u65b9\u6cd5\u53ef\u4ee5\u5728\u786e\u4fdd\u6536\u655b\u7684\u540c\u65f6\u5b9e\u73b01.45\u81f39.39\u500d\u7684\u901f\u5ea6\u63d0\u5347\u3002|\n", "2410.12700": "|**2024-10-16**|**Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization**|Xingqi Wang 
et.al.|[2410.12700](http://arxiv.org/abs/2410.12700)|**[link](https://github.com/achernarwang/LiVO)**|**\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8e\u5927\u89c4\u6a21\u6570\u636e\u8bad\u7ec3\u7684\u6269\u6563\u6a21\u578b\u5df2\u7ecf\u80fd\u591f\u751f\u6210\u4e0e\u4eba\u7c7b\u6c34\u5e73\u56fe\u50cf\u96be\u4ee5\u533a\u5206\u7684\u56fe\u50cf\uff0c\u4f46\u5b83\u4eec\u5e38\u5e38\u4ea7\u751f\u6709\u5bb3\u5185\u5bb9\uff0c\u8fd9\u4e9b\u5185\u5bb9\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u4e0d\u7b26\uff0c\u4f8b\u5982\u793e\u4f1a\u504f\u89c1\u548c\u5192\u72af\u6027\u5185\u5bb9\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9886\u57df\u8fdb\u884c\u4e86\u5927\u91cf\u7814\u7a76\uff0c\u4f46\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u6a21\u578b\u7684\u5bf9\u9f50\u95ee\u9898\u4ecd\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LiVO\uff08\u8f7b\u91cf\u7ea7\u4ef7\u503c\u4f18\u5316\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u8f7b\u91cf\u7ea7\u65b9\u6cd5\uff0c\u7528\u4e8e\u5c06T2I\u6a21\u578b\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u5bf9\u9f50\u3002LiVO\u4ec5\u4f18\u5316\u4e00\u4e2a\u5373\u63d2\u5373\u7528\u7684\u4ef7\u503c\u7f16\u7801\u5668\uff0c\u4ee5\u5c06\u6307\u5b9a\u7684\u4ef7\u503c\u539f\u5219\u6574\u5408\u5230\u8f93\u5165\u63d0\u793a\u4e2d\uff0c\u4ece\u800c\u5728\u63a7\u5236\u751f\u6210\u56fe\u50cf\u7684\u8bed\u4e49\u548c\u4ef7\u503c\u89c2\u65b9\u9762\u53d1\u6325\u4f5c\u7528\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u9488\u5bf9\u6269\u6563\u6a21\u578b\u7684\u504f\u597d\u4f18\u5316\u635f\u5931\u51fd\u6570\uff0c\u8be5\u51fd\u6570\u5728\u7406\u8bba\u4e0a\u903c\u8fd1LLM\u5bf9\u9f50\u4e2d\u4f7f\u7528\u7684Bradley-Terry\u6a21\u578b\uff0c\u4f46\u63d0\u4f9b\u4e86\u56fe\u50cf\u8d28\u91cf\u548c\u4ef7\u503c\u4e00\u81f4\u6027\u4e4b\u95f4\u7684\u66f4\u7075\u6d3b\u7684\u6743\u8861\u3002\u4e3a\u4e86\u4f18\u5316\u4ef7\u503c\u7f16\u7801\u5668\uff0c\u6211\u4
eec\u8fd8\u5f00\u53d1\u4e86\u4e00\u4e2a\u6846\u67b6\u6765\u81ea\u52a8\u6784\u5efa\u4e00\u4e2a\u5305\u542b86k\u4e2a\u6837\u672c\uff08\u63d0\u793a\u3001\u5bf9\u9f50\u56fe\u50cf\u3001\u8fdd\u53cd\u56fe\u50cf\u3001\u4ef7\u503c\u539f\u5219\uff09\u7684\u6587\u672c-\u56fe\u50cf\u504f\u597d\u6570\u636e\u96c6\u3002\u901a\u8fc7\u4e0d\u66f4\u65b0\u5927\u591a\u6570\u6a21\u578b\u53c2\u6570\u5e76\u901a\u8fc7\u4ece\u8f93\u5165\u63d0\u793a\u4e2d\u8fdb\u884c\u81ea\u9002\u5e94\u4ef7\u503c\u9009\u62e9\uff0cLiVO\u663e\u8457\u51cf\u5c11\u4e86\u6709\u5bb3\u8f93\u51fa\uff0c\u5e76\u5b9e\u73b0\u4e86\u66f4\u5feb\u7684\u6536\u655b\uff0c\u8d85\u8d8a\u4e86\u51e0\u79cd\u5f3a\u5927\u7684\u57fa\u7ebf\u6a21\u578b\uff0c\u8fc8\u51fa\u4e86\u5411\u4f26\u7406\u5bf9\u9f50\u7684T2I\u6a21\u578b\u8fc8\u51fa\u7684\u7b2c\u4e00\u6b65\u3002**|\n", "2410.12686": "|**2024-10-16**|**Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2**|Mohamad Abdi et.al.|[2410.12686](http://arxiv.org/abs/2410.12686)|null|\u89e3\u5256\u5b66\u6807\u5fd7\u5728\u533b\u5b66\u5f71\u50cf\u4e2d\u5bf9\u4e8e\u5bfc\u822a\u548c\u5f02\u5e38\u68c0\u6d4b\u81f3\u5173\u91cd\u8981\u3002\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5982Llama-2\uff0c\u4e3a\u5c06\u8fd9\u4e9b\u6807\u5fd7\u4ece\u81ea\u7531\u6587\u672c\u7684\u653e\u5c04\u5b66\u62a5\u544a\u6620\u5c04\u5230\u56fe\u50cf\u6570\u636e\u4e2d\u7684\u76f8\u5e94\u4f4d\u7f6e\u63d0\u4f9b\u4e86\u5e0c\u671b\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0cLLMs\u53ef\u80fd\u80fd\u591f\u5f62\u6210\u8fde\u8d2f\u7684\u751f\u6210\u8fc7\u7a0b\u8868\u793a\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u7814\u7a76\u4e86LLMs\u662f\u5426\u51c6\u786e\u5730\u8868\u793a\u89e3\u5256\u5b66\u6807\u5fd7\u7684\u7a7a\u95f4\u4f4d\u7f6e\u3002\u901a\u8fc7\u4f7f\u7528Llama-2\u6a21\u578b\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u4eec\u53ef\u4ee5\u7ebf\u6027\u5730\u8868\u793a\u7a7a\u95f4\u4e2d\u7684\u89e3\u5256\u5b66\u68
07\u5fd7\uff0c\u5e76\u4e14\u5bf9\u4e0d\u540c\u63d0\u793a\u5177\u6709\u76f8\u5f53\u5f3a\u7684\u9c81\u68d2\u6027\u3002\u8fd9\u4e9b\u7ed3\u679c\u5f3a\u8c03\u4e86LLMs\u589e\u5f3a\u533b\u5b66\u5f71\u50cf\u5de5\u4f5c\u6d41\u7a0b\u6548\u7387\u548c\u51c6\u786e\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.12656": "|**2024-10-16**|**Evaluating Morphological Compositional Generalization in Large Language Models**|Mete Ismayilzada et.al.|[2410.12656](http://arxiv.org/abs/2410.12656)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u751f\u6210\u548c\u7406\u89e3\u4efb\u52a1\u4e2d\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u5c55\u3002\u7136\u800c\uff0c\u5b83\u4eec\u7684\u8bed\u8a00\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u503c\u5f97\u8d28\u7591\uff0c\u8fd9\u5f15\u53d1\u4e86\u5173\u4e8e\u8fd9\u4e9b\u6a21\u578b\u662f\u5426\u50cf\u4eba\u7c7b\u4e00\u6837\u5b66\u4e60\u8bed\u8a00\u7684\u7591\u95ee\u3002\u5c3d\u7ba1\u4eba\u7c7b\u5728\u8bed\u8a00\u4f7f\u7528\u4e2d\u8868\u73b0\u51fa\u7ec4\u5408\u80fd\u529b\u548c\u8bed\u8a00\u521b\u9020\u6027\uff0c\u4f46LLMs\u5728\u8fd9\u65b9\u9762\u7684\u8868\u73b0\uff0c\u7279\u522b\u662f\u5728\u5f62\u6001\u5b66\u65b9\u9762\u7684\u80fd\u529b\uff0c\u4ecd\u9700\u8fdb\u4e00\u6b65\u63a2\u7d22\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u7ec4\u5408\u6027\u7684\u89c6\u89d2\u7cfb\u7edf\u5730\u7814\u7a76\u4e86LLMs\u5728\u5f62\u6001\u5b66\u6cdb\u5316\u65b9\u9762\u7684\u80fd\u529b\u3002\u6211\u4eec\u5c06\u8bcd\u7d20\u5b9a\u4e49\u4e3a\u7ec4\u5408\u7684\u57fa\u672c\u5355\u4f4d\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u5957\u65b0\u7684\u751f\u6210\u6027\u548c\u5224\u522b\u6027\u4efb\u52a1\u6765\u8bc4\u4f30\u5f62\u6001\u5b66\u7684\u751f\u4ea7\u529b\u548c\u7cfb\u7edf\u6027\u3002\u91cd\u70b9\u5173\u6ce8\u50cf\u571f\u8033\u5176\u8bed\u548c\u82ac\u5170\u8bed\u8fd9\u6837\u7684\u9ecf\u7740\u8bed\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u51e0\u79cd\u6700\u5148\u8fdb\u7684\u6307\u4ee4\u5fae\u8c03\u591a\u8be
d\u8a00\u6a21\u578b\uff0c\u5305\u62ecGPT-4\u548cGemini\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cLLMs\u5728\u5904\u7406\u5f62\u6001\u5b66\u7ec4\u5408\u6cdb\u5316\u65f6\u7279\u522b\u56f0\u96be\uff0c\u5c24\u5176\u662f\u5728\u5e94\u7528\u4e8e\u65b0\u8bcd\u6839\u65f6\uff0c\u968f\u7740\u5f62\u6001\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u867d\u7136\u6a21\u578b\u80fd\u591f\u6bd4\u968f\u673a\u731c\u6d4b\u66f4\u597d\u5730\u8bc6\u522b\u4e2a\u522b\u5f62\u6001\u7ec4\u5408\uff0c\u4f46\u5176\u8868\u73b0\u7f3a\u4e4f\u7cfb\u7edf\u6027\uff0c\u5bfc\u81f4\u4e0e\u4eba\u7c7b\u76f8\u6bd4\u5b58\u5728\u663e\u8457\u7684\u51c6\u786e\u7387\u5dee\u8ddd\u3002|\n", "2410.12631": "|**2024-10-16**|**Explainable Moral Values: a neuro-symbolic approach to value classification**|Nicolas Lazzari et.al.|[2410.12631](http://arxiv.org/abs/2410.12631)|null|\u672c\u6587\u7814\u7a76\u4e86\u57fa\u4e8e\u672c\u4f53\u7684\u63a8\u7406\u4e0e\u673a\u5668\u5b66\u4e60\u6280\u672f\u5728\u53ef\u89e3\u91ca\u4ef7\u503c\u5206\u7c7b\u4e2d\u7684\u6574\u5408\u3002\u901a\u8fc7\u4f9d\u8d56\u9053\u5fb7\u57fa\u7840\u7406\u8bba\u4e2d\u7684\u9053\u5fb7\u4ef7\u503c\u89c2\u5f62\u5f0f\u5316\u4ee5\u53caDnS\u672c\u4f53\u8bbe\u8ba1\u6a21\u5f0f\uff0c\u4f7f\u7528sandra\u795e\u7ecf\u7b26\u53f7\u63a8\u7406\u5668\u6765\u63a8\u65ad\u6ee1\u8db3\u7279\u5b9a\u53e5\u5b50\u63cf\u8ff0\u7684\u4ef7\u503c\u3002\u53e5\u5b50\u53ca\u5176\u7ed3\u6784\u5316\u8868\u793a\u662f\u4f7f\u7528\u5f00\u6e90\u7684\u5927\u8bed\u8a00\u6a21\u578b\u81ea\u52a8\u751f\u6210\u7684\u3002\u6240\u63a8\u65ad\u7684\u63cf\u8ff0\u88ab\u7528\u6765\u81ea\u52a8\u68c0\u6d4b\u53e5\u5b50\u6240\u5173\u8054\u7684\u4ef7\u503c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u4ec5\u4f9d\u9760\u63a8\u7406\u5668\u7684\u7ed3\u679c\u5373\u53ef\u5b9e\u73b0\u4e0e\u66f4\u590d\u6742\u65b9\u6cd5\u76f8\u5f53\u7684\u53ef\u89e3\u91ca\u5206\u7c7b\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u5c06\u63a8\u7406\u5668\u7684\u63a8\u65ad\u7ed3\u679c\u4e0e\u5206\u5e03\u
8bed\u4e49\u65b9\u6cd5\u76f8\u7ed3\u5408\u53ef\u4ee5\u5927\u5e45\u8d85\u8d8a\u6240\u6709\u57fa\u7ebf\uff0c\u5305\u62ec\u57fa\u4e8e\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\u7684\u590d\u6742\u6a21\u578b\u3002\u6700\u540e\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u53ef\u89c6\u5316\u5de5\u5177\u6765\u63a2\u7d22\u57fa\u4e8e\u7406\u8bba\u7684\u503c\u5206\u7c7b\u7684\u6f5c\u529b\uff0c\u8be5\u5de5\u5177\u53ef\u5728http://xmv.geomeaning.com/\u516c\u5f00\u8bbf\u95ee\u3002|\n", "2410.13863": "|**2024-10-17**|**Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens**|Lijie Fan et.al.|[2410.13863](http://arxiv.org/abs/2410.13863)|null|\u5728\u89c6\u89c9\u9886\u57df\uff0c\u6269\u5927\u81ea\u56de\u5f52\u6a21\u578b\u7684\u6548\u679c\u5e76\u4e0d\u50cf\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u90a3\u6837\u663e\u8457\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u4e2d\u7684\u8fd9\u4e00\u6269\u5c55\u95ee\u9898\uff0c\u91cd\u70b9\u5173\u6ce8\u4e24\u4e2a\u5173\u952e\u56e0\u7d20\uff1a\u6a21\u578b\u662f\u5426\u4f7f\u7528\u79bb\u6563\u6216\u8fde\u7eed\u7684\u6807\u8bb0\uff0c\u4ee5\u53ca\u6807\u8bb0\u662f\u5426\u4ee5\u968f\u673a\u6216\u56fa\u5b9a\u6805\u683c\u987a\u5e8f\u4f7f\u7528\u7c7b\u4f3c\u4e8eBERT\u6216GPT\u7684\u53d8\u6362\u5668\u67b6\u6784\u751f\u6210\u3002\u6211\u4eec\u7684\u5b9e\u8bc1\u7ed3\u679c\u8868\u660e\uff0c\u867d\u7136\u6240\u6709\u6a21\u578b\u5728\u9a8c\u8bc1\u635f\u5931\u65b9\u9762\u90fd\u80fd\u6709\u6548\u6269\u5c55\uff0c\u4f46\u5b83\u4eec\u7684\u8bc4\u4f30\u6027\u80fd\u2014\u2014\u901a\u8fc7FID\u3001GenEval\u5206\u6570\u548c\u89c6\u89c9\u8d28\u91cf\u6765\u8861\u91cf\u2014\u2014\u5219\u5448\u73b0\u51fa\u4e0d\u540c\u7684\u8d8b\u52bf\u3002\u57fa\u4e8e\u8fde\u7eed\u6807\u8bb0\u7684\u6a21\u578b\u5728\u89c6\u89c9\u8d28\u91cf\u4e0a\u663e\u8457\u4f18\u4e8e\u4f7f\u7528\u79bb\u6563\u6807\u8bb0\u7684\u6a21\u578b\u3002\u6b64\u5916\uff0c\u751f\u6210\u987a\u5e8f\u548c\u6ce8\
u610f\u529b\u673a\u5236\u5bf9GenEval\u5206\u6570\u6709\u663e\u8457\u5f71\u54cd\uff1a\u968f\u673a\u987a\u5e8f\u7684\u6a21\u578b\u5728GenEval\u5206\u6570\u4e0a\u660e\u663e\u4f18\u4e8e\u6805\u683c\u987a\u5e8f\u7684\u6a21\u578b\u3002\u53d7\u8fd9\u4e9b\u53d1\u73b0\u7684\u542f\u53d1\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u79cd\u540d\u4e3aFluid\u7684\u968f\u673a\u987a\u5e8f\u81ea\u56de\u5f52\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u57fa\u4e8e\u8fde\u7eed\u6807\u8bb0\u3002Fluid 10.5B\u6a21\u578b\u5728MS-COCO 30K\u4e0a\u7684\u96f6\u6837\u672cFID\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u5148\u8fdb\u6c34\u5e73\uff0c\u53736.16\uff0c\u5e76\u4e14\u5728GenEval\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6574\u4f53\u5f97\u5206\u4e3a0.69\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u53d1\u73b0\u548c\u7ed3\u679c\u80fd\u9f13\u52b1\u672a\u6765\u8fdb\u4e00\u6b65\u7f29\u5c0f\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\u4e4b\u95f4\u7684\u6269\u5c55\u5dee\u8ddd\u3002|\n", "2410.13861": "|**2024-10-17**|**PUMA: Empowering Unified MLLM with Multi-granular Visual Generation**|Rongyao Fang 
et.al.|[2410.13861](http://arxiv.org/abs/2410.13861)|**[link](https://github.com/rongyaofang/puma)**|**\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u57fa\u7840\u6a21\u578b\u5728\u89c6\u89c9-\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u521d\u6b65\u5c1d\u8bd5\u4e5f\u63a2\u7d22\u4e86\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u89c6\u89c9\u5185\u5bb9\u751f\u6210\u4e2d\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u672a\u80fd\u5145\u5206\u89e3\u51b3\u7edf\u4e00MLLM\u8303\u5f0f\u4e0b\u4e0d\u540c\u56fe\u50cf\u751f\u6210\u4efb\u52a1\u5bf9\u4e0d\u540c\u7c92\u5ea6\u9700\u6c42\u7684\u95ee\u9898\u2014\u2014\u4ece\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6240\u9700\u7684\u591a\u6837\u6027\u5230\u56fe\u50cf\u64cd\u4f5c\u6240\u9700\u7684\u7cbe\u786e\u53ef\u63a7\u6027\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86PUMA\uff0c\u5373\u901a\u8fc7\u591a\u7c92\u5ea6\u89c6\u89c9\u751f\u6210\u8d4b\u4e88\u7edf\u4e00MLLM\u4ee5\u529b\u91cf\u3002PUMA\u5c06\u591a\u7c92\u5ea6\u89c6\u89c9\u7279\u5f81\u7edf\u4e00\u4f5c\u4e3aMLLM\u7684\u8f93\u5165\u548c\u8f93\u51fa\uff0c\u4f18\u96c5\u5730\u89e3\u51b3\u4e86\u4e0d\u540c\u7c92\u5ea6\u8981\u6c42\u7684\u5404\u79cd\u56fe\u50cf\u751f\u6210\u4efb\u52a1\u5728\u7edf\u4e00MLLM\u6846\u67b6\u4e0b\u7684\u95ee\u9898\u3002\u7ecf\u8fc7\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u548c\u4efb\u52a1\u7279\u5b9a\u6307\u4ee4\u5fae\u8c03\u540e\uff0cPUMA\u5728\u5e7f\u6cdb\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u8fd9\u9879\u5de5\u4f5c\u6807\u5fd7\u7740\u5411\u771f\u6b63\u7edf\u4e00\u7684MLLM\u8fc8\u51fa\u4e86\u91cd\u8981\u4e00\u6b65\uff0c\u8fd9\u79cdMLLM\u80fd\u591f\u9002\u5e94\u5404\u79cd\u89c6\u89c9\u4efb\u52a1\u5bf9\u7c92\u5ea6\u7684\u9700\u6c42\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5c06\u5728https://github.com/rongyaofang/PUMA\u53d1\u5e03\u3002**|\n", "2410.13859": "|**2024-10-17**|**$\u03b3-$MoD: Exploring Mixture-of-Depth 
Adaptation for Multimodal Large Language Models**|Yaxin Luo et.al.|[2410.13859](http://arxiv.org/abs/2410.13859)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5176\u9ad8\u6602\u7684\u8ba1\u7b97\u6210\u672c\u4ecd\u7136\u662f\u5b9e\u9645\u90e8\u7f72\u4e2d\u7684\u4e00\u4e2a\u969c\u788d\u3002\u53d7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u6df1\u5ea6\u6df7\u5408\uff08MoD\uff09\u7684\u542f\u53d1\uff0c\u6211\u4eec\u4ece\u201c\u6fc0\u6d3b\u6807\u8bb0\u201d\u7684\u89d2\u5ea6\u6765\u89e3\u51b3\u8fd9\u4e00\u9650\u5236\u3002\u6211\u4eec\u7684\u5173\u952e\u89c1\u89e3\u662f\uff0c\u5982\u679c\u5927\u591a\u6570\u6807\u8bb0\u5bf9\u4e8e\u5c42\u8ba1\u7b97\u662f\u5197\u4f59\u7684\uff0c\u90a3\u4e48\u53ef\u4ee5\u901a\u8fc7MoD\u5c42\u76f4\u63a5\u8df3\u8fc7\u5b83\u4eec\u3002\u7136\u800c\uff0c\u76f4\u63a5\u5c06MLLMs\u7684\u5bc6\u96c6\u5c42\u8f6c\u6362\u4e3aMoD\u5c42\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u6027\u80fd\u4e0b\u964d\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684MoD\u9002\u5e94\u7b56\u7565\uff0c\u79f0\u4e3a$\\gamma$-MoD\uff0c\u7528\u4e8e\u73b0\u6709\u7684MLLMs\u3002\u5728$\\gamma$-MoD\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u6307\u6807\u6765\u6307\u5bfcMLLM\u4e2dMoD\u7684\u90e8\u7f72\uff0c\u5373\u6ce8\u610f\u529b\u56fe\u7684\u79e9\uff08ARank\uff09\u3002\u901a\u8fc7ARank\uff0c\u6211\u4eec\u53ef\u4ee5\u6709\u6548\u5730\u8bc6\u522b\u54ea\u4e9b\u5c42\u662f\u5197\u4f59\u7684\uff0c\u5e76\u5e94\u88ab\u66ff\u6362\u4e3aMoD\u5c42\u3002\u57fa\u4e8eARank\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e24\u79cd\u65b0\u9896\u7684\u8bbe\u8ba1\uff0c\u4ee5\u6700\u5927\u9650\u5ea6\u5730\u63d0\u9ad8MLLM\u7684\u8ba1\u7b97\u7a00\u758f\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6027\u80fd\uff0c\u5373\u5171\u4eab\u89c6\u89c9-\u8bed\u8a00\u8def\u7531\u5668\u548c\u63a9\u7801\u8def\u7531\u5b66\u4e60\u300
2\u901a\u8fc7\u8fd9\u4e9b\u8bbe\u8ba1\uff0cMLLM\u768490%\u4ee5\u4e0a\u7684\u5bc6\u96c6\u5c42\u53ef\u4ee5\u6709\u6548\u8f6c\u6362\u4e3aMoD\u5c42\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5728\u4e09\u4e2a\u6d41\u884c\u7684MLLM\u4e0a\u8fdb\u884c\u4e86\u5e94\u7528\uff0c\u5e76\u57289\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u4e0d\u4ec5\u9a8c\u8bc1\u4e86$\\gamma$-MoD\u5bf9\u73b0\u6709MLLMs\u7684\u663e\u8457\u6548\u7387\u63d0\u5347\uff0c\u8fd8\u8bc1\u5b9e\u4e86\u5176\u5728\u5404\u79cdMLLM\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u3002\u4f8b\u5982\uff0c$\\gamma$-MoD\u4ec5\u5bfc\u81f4\u8f7b\u5fae\u7684\u6027\u80fd\u4e0b\u964d\uff0c\u5373-1.5%\uff0c\u4f46\u53ef\u4ee5\u5206\u522b\u5c06LLaVA-HR\u7684\u8bad\u7ec3\u65f6\u95f4\u548c\u63a8\u7406\u65f6\u95f4\u51cf\u5c1131.0%\u548c53.2%\u3002|\n", "2410.13857": "|**2024-10-17**|**How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs**|Guhao Feng 
et.al.|[2410.13857](http://arxiv.org/abs/2410.13857)|null|\u5c3d\u7ba1\u57fa\u4e8eTransformer\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u529f\uff0c\u4f46\u7406\u89e3\u548c\u63d0\u5347\u5b83\u4eec\u7684\u6570\u5b66\u80fd\u529b\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u8981\u7684\u6311\u6218\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5bf9LLMs\u7684\u6570\u5b66\u80fd\u529b\u8fdb\u884c\u4e86\u4e25\u683c\u7684\u7406\u8bba\u5206\u6790\uff0c\u7279\u522b\u5173\u6ce8\u5b83\u4eec\u7684\u7b97\u672f\u8868\u73b0\u3002\u6211\u4eec\u53d1\u73b0\u6570\u503c\u7cbe\u5ea6\u662f\u5f71\u54cd\u5176\u5728\u6570\u5b66\u4efb\u52a1\u4e2d\u8868\u73b0\u7684\u5173\u952e\u56e0\u7d20\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528\u4f4e\u6570\u503c\u7cbe\u5ea6\u7684Transformer\u5728\u5904\u7406\u7b97\u672f\u4efb\u52a1\uff08\u5982\u8fed\u4ee3\u52a0\u6cd5\u548c\u6574\u6570\u4e58\u6cd5\uff09\u65f6\uff0c\u9664\u975e\u6a21\u578b\u5927\u5c0f\u76f8\u5bf9\u4e8e\u8f93\u5165\u957f\u5ea6\u5448\u8d85\u591a\u9879\u5f0f\u589e\u957f\uff0c\u5426\u5219\u65e0\u6cd5\u6709\u6548\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u4f7f\u7528\u6807\u51c6\u6570\u503c\u7cbe\u5ea6\u7684Transformer\u53ef\u4ee5\u9ad8\u6548\u5730\u5904\u7406\u8fd9\u4e9b\u4efb\u52a1\uff0c\u5e76\u4e14\u6240\u9700\u7684\u6a21\u578b\u5c3a\u5bf8\u8981\u5c0f\u5f97\u591a\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u5b9e\u9a8c\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86\u8fd9\u4e00\u7406\u8bba\u53d1\u73b0\uff0c\u63a2\u7d22\u4e86\u4e0d\u540c\u6570\u503c\u7cbe\u5ea6\u5bf9\u7b97\u672f\u4efb\u52a1\u7684\u5f71\u54cd\uff0c\u4e3a\u63d0\u9ad8LLMs\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u63d0\u4f9b\u4e86\u5b9d\u8d35\u7684\u89c1\u89e3\u3002|\n", "2410.13854": "|**2024-10-17**|**Can MLLMs Understand the Deep Implication Behind Chinese Images?**|Chenhao Zhang 
et.al.|[2410.13854](http://arxiv.org/abs/2410.13854)|**[link](https://github.com/MING-ZCH/CII-Bench)**|**\u968f\u7740\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u80fd\u529b\u4e0d\u65ad\u63d0\u5347\uff0c\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u66f4\u9ad8\u9636\u80fd\u529b\u8bc4\u4f30\u7684\u9700\u6c42\u4e5f\u5728\u589e\u52a0\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u9488\u5bf9MLLMs\u7684\u9ad8\u9636\u611f\u77e5\u548c\u7406\u89e3\u4e2d\u6587\u89c6\u89c9\u5185\u5bb9\u7684\u8bc4\u4f30\u5de5\u4f5c\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e2d\u6587\u56fe\u50cf\u9690\u542b\u7406\u89e3\u57fa\u51c6\uff08CII-Bench\uff09\uff0c\u65e8\u5728\u8bc4\u4f30MLLMs\u5bf9\u4e2d\u6587\u56fe\u50cf\u7684\u9ad8\u9636\u611f\u77e5\u548c\u7406\u89e3\u80fd\u529b\u3002\u4e0e\u73b0\u6709\u57fa\u51c6\u76f8\u6bd4\uff0cCII-Bench\u5177\u6709\u591a\u4e2a\u7a81\u51fa\u7279\u70b9\u3002\u9996\u5148\uff0c\u4e3a\u4e86\u786e\u4fdd\u4e2d\u6587\u80cc\u666f\u7684\u771f\u5b9e\u6027\uff0cCII-Bench\u4e2d\u7684\u56fe\u50cf\u6765\u6e90\u4e8e\u4e2d\u56fd\u4e92\u8054\u7f51\uff0c\u5e76\u7ecf\u8fc7\u4eba\u5de5\u5ba1\u67e5\uff0c\u76f8\u5e94\u7684\u7b54\u6848\u4e5f\u7531\u4eba\u5de5\u7cbe\u5fc3\u5236\u4f5c\u3002\u6b64\u5916\uff0cCII-Bench\u8fd8\u7eb3\u5165\u4e86\u4ee3\u8868\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u56fe\u50cf\uff0c\u5982\u8457\u540d\u7684\u4e2d\u56fd\u4f20\u7edf\u7ed8\u753b\uff0c\u8fd9\u53ef\u4ee5\u6df1\u5165\u53cd\u6620\u6a21\u578b\u5bf9\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u7406\u89e3\u3002\u901a\u8fc7\u5728\u591a\u4e2aMLLMs\u4e0a\u8fdb\u884c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5f97\u51fa\u4e86\u91cd\u8981\u53d1\u73b0\u3002\u6700\u521d\uff0cMLLMs\u5728CII-Bench\u4e0a\u7684\u8868\u73b0\u4e0e\u4eba\u7c7b\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002MLLMs\u7684\u6700\u9ad8\u51c6\u786e\u7387\u4e3a64.4%\uff0c\u800c\u4eba\u7c7b\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a78.2%\uff0c\u5cf0\u503c\u8fbe\u52
30\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u768481.0%\u3002\u968f\u540e\uff0cMLLMs\u5728\u5904\u7406\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u56fe\u50cf\u65f6\u8868\u73b0\u8f83\u5dee\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u5728\u7406\u89e3\u9ad8\u5c42\u6b21\u8bed\u4e49\u548c\u7f3a\u4e4f\u5bf9\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u6df1\u5165\u4e86\u89e3\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\u3002\u6700\u540e\uff0c\u89c2\u5bdf\u5230\u5927\u591a\u6570\u6a21\u578b\u5728\u56fe\u50cf\u60c5\u611f\u63d0\u793a\u88ab\u7eb3\u5165\u63d0\u793a\u65f6\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u76f8\u4fe1\uff0cCII-Bench\u5c06\u4f7fMLLMs\u66f4\u597d\u5730\u7406\u89e3\u4e2d\u6587\u8bed\u4e49\u548c\u7279\u5b9a\u4e8e\u4e2d\u56fd\u7684\u56fe\u50cf\uff0c\u4ece\u800c\u63a8\u52a8\u5411\u4e13\u5bb6\u578b\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u53d1\u5c55\u3002\u6211\u4eec\u7684\u9879\u76ee\u53ef\u5728https://cii-bench.github.io/\u516c\u5f00\u8bbf\u95ee\u3002**|\n", "2410.13852": "|**2024-10-17**|**Retrospective Learning from Interactions**|Zizhao Chen 
et.al.|[2410.13852](http://arxiv.org/abs/2410.13852)|null|Multi-turn interactions between large language models (LLMs) and users naturally contain implicit feedback signals. If an LLM responds to a user's instruction in an unexpected way, the user is likely to signal this by rephrasing the request, expressing frustration, or pivoting to an alternative task. These signals are task-independent and occupy a relatively constrained subspace of language, so an LLM can identify them even if it fails on the actual task. This creates an avenue for LLMs to learn continually from interactions without additional annotations. We introduce ReSpect, a method that learns from such signals by retrospecting on past interactions. We deploy ReSpect in a new multimodal interaction scenario in which humans instruct an LLM to solve an abstract reasoning task with a combinatorial solution space. Through thousands of interactions with humans, we show how ReSpect gradually improves the task completion rate from 31% to 82%, all without any external annotation.|\n", "2410.13846": "|**2024-10-17**|**SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction**|Xuan Zhang 
et.al.|[2410.13846](http://arxiv.org/abs/2410.13846)|**[link](https://github.com/sail-sg/simlayerkv)**|**Recent advances in large language models (LLMs) have extended their ability to handle long contexts. However, increasing the number of model layers and the length of input sequences significantly increases the memory needed to store the key-value (KV) cache, which poses a challenge for efficient inference. To mitigate this issue, we propose SimLayerKV, a simple yet effective method that reduces inter-layer KV cache redundancy by selectively dropping the cache in layers identified as lazy. Our approach is based on the observation that certain layers in long-context LLMs exhibit 'lazy' behavior, contributing less to modeling long-range dependencies than non-lazy layers. By analyzing attention weight patterns, we find that the behavior of these lazy layers is consistent across tokens during generation for a given input. This insight motivates SimLayerKV, which identifies lazy layers and reduces their KV cache accordingly. SimLayerKV is training-free, generalizable, and can be implemented in only seven lines of code. We conduct extensive experiments on three representative LLMs, namely LLaMA2-7B, LLaMA3-8B, and Mistral-7B, across 16 tasks from the LongBench benchmark. The results show that SimLayerKV achieves a 5x KV cache compression ratio with only a 1.2% performance drop when combined with 4-bit quantization. Our code is available at https://github.com/sail-sg/SimLayerKV.**|\n", "2410.13835": "|**2024-10-17**|**Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs**|Tianyu Guo et.al.|[2410.13835](http://arxiv.org/abs/2410.13835)|null|Practitioners have observed three puzzling phenomena in transformer-based large language models (LLMs): attention sinks, value-state drains, and residual-state peaks, collectively referred to as extreme-token phenomena. These phenomena are characterized by certain so-called 'sink tokens' that receive disproportionately high attention weights, exhibit markedly smaller value states, and have much larger residual-state norms than other tokens. These extreme tokens give rise to various challenges in LLM inference, quantization, and interpretability. We elucidate the mechanisms behind extreme-token phenomena. First, we show that these phenomena arise in very simple architectures, transformers with one to three layers, trained on a toy model, the Bigram-Backcopy (BB) task. In this setting, we identify an active-dormant mechanism, in which attention heads become sinks for specific input domains while remaining non-sinks for others. Our theoretical analysis of the training dynamics reveals that these phenomena are driven by a mutual reinforcement mechanism. Building on these insights, we propose strategies to mitigate extreme-token phenomena during pretraining, including replacing softmax with ReLU and Adam with SGD. Next, we extend our analysis to pretrained LLMs, including Llama and OLMo, showing that many attention heads exhibit an active-dormant mechanism similar to the one in the BB task, and that the mutual reinforcement mechanism also governs the emergence of extreme-token phenomena during LLM pretraining. Our results show that many of the static and dynamic properties predicted by the BB task agree with observations in pretrained LLMs.|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang 
et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|Autonomy via agents built on large language models (LLMs) can improve human efficiency on personalized and standardized tasks. Demand for automating web tasks, such as booking a hotel within a budget, is growing. Besides meeting practical needs, web agents also serve as an important proof of concept that illustrates the significance of various agent grounding scenarios. Their success promises advances in many future applications. Prior work often handcrafts web agent strategies (e.g., prompt templates, multi-agent systems, search methods), which may not generalize well across all real-world scenarios. On the other hand, there has been very limited study of the misalignment between a web agent's observation/action representation and the LLM's pretraining data. This discrepancy is especially notable because LLMs are primarily trained for language completion rather than for tasks involving embodied navigation actions and symbolic web elements. Our study improves an LLM-based web agent simply by refining its observation and action spaces to better align with the LLM's capabilities. This approach enables our base agent to significantly outperform previous methods on a wide variety of web tasks. Specifically, on WebArena, a benchmark of general-purpose web interaction tasks, our agent AgentOccam surpasses the previous state-of-the-art method by 9.8 points (+29.4%) and concurrent work by 5.9 points (+15.8%). Compared with similar plain web agents, aligning the observation and action spaces raises the success rate by 26.6 points (+161%). We use no in-context examples, new agent roles, online feedback, or search strategies. AgentOccam's simple design highlights the strong zero-shot performance of LLMs on web tasks and underlines the critical role of carefully tuning observation and action spaces for LLM-based agents.|\n", "2410.13824": "|**2024-10-18**|**Harnessing Webpage UIs for Text-Rich Visual Understanding**|Junpeng Liu 
et.al.|[2410.13824](http://arxiv.org/abs/2410.13824)|null|Text-rich visual understanding, the ability to process environments where dense textual content is integrated with visual elements, is crucial for multimodal large language models (MLLMs) to interact effectively with structured environments. To enhance this capability, we propose synthesizing general multimodal instructions from webpage user interfaces (UIs) using text-based large language models (LLMs). Despite lacking direct visual input, text-based LLMs can process structured text representations from webpage accessibility trees. These instructions are then paired with UI screenshots to train multimodal models. We introduce MultiUI, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on MultiUI not only excel at web UI tasks, with up to a 48% improvement on VisualWebBench and a 19.1% gain in element accuracy on the Mind2Web web agent dataset, but also generalize surprisingly well to non-web UI tasks and even to non-UI domains such as document understanding, OCR, and chart interpretation. These results highlight the broad applicability of web UI data for advancing text-rich visual understanding across various scenarios.|\n", "2410.14677": "|**2024-10-18**|**Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts**|German Gritsai et.al.|[2410.14677](http://arxiv.org/abs/2410.14677)|null|The rapid development of autoregressive large language models (LLMs) has significantly improved the quality of generated text, prompting the emergence of reliable machine-generated text detectors. A large number of detectors and datasets containing AI-generated fragments have emerged, and some detection methods reach a recognition quality of up to 99.9% on the target metrics for these datasets. However, the quality of such detectors tends to drop sharply in real-world use, which raises a question: are these detectors actually highly reliable, or do their high benchmark scores merely reflect the poor quality of the evaluation datasets? In this paper, we emphasize the need for robust, high-quality methods for evaluating generated data, in order to guard against bias and poor generalization in future models. We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments. In addition, we discuss the possibility of using high-quality generated data to achieve two goals: improving the training of detection models and improving the training datasets themselves. Our contribution aims to foster a better understanding of the dynamics between human and machine text, which will ultimately support the integrity of information in an increasingly automated world.|\n", "2410.14676": "|**2024-10-18**|**SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment**|Qin Liu et.al.|[2410.14676](http://arxiv.org/abs/2410.14676)|null|Existing preference alignment is a one-size-fits-all mechanism, in which the non-preferred portion of a large language model's (LLM's) parametric knowledge is uniformly blocked for all users. However, this portion of knowledge can be useful to advanced users whose expertise qualifies them to handle such information. The one-size-fits-all alignment mechanism undermines the LLM's utility for these qualified users. To address this problem, we propose SudoLM, a framework that lets LLMs learn access control over specific parametric knowledge for users with different credentials via authorization alignment. SudoLM allows authorized users to unlock access to all parametric knowledge with an assigned SUDO key, while blocking access for unauthorized users. Experiments in two application scenarios show that SudoLM effectively controls user access to parametric knowledge while maintaining its general utility.|\n", "2410.14675": "|**2024-10-18**|**Enhancing Large Language Models' Situated Faithfulness to External Contexts**|Yukun Huang et.al.|[2410.14675](http://arxiv.org/abs/2410.14675)|**[link](https://github.com/kkkevinkkkkk/situated_faithfulness)**|**Large language models (LLMs) are often augmented with external information as context, but this external information can sometimes be inaccurate or even intentionally misleading. We argue that robust LLMs should demonstrate situated faithfulness, dynamically calibrating their trust in external information based on their confidence in their internal knowledge and in the external context. To benchmark this capability, we evaluate LLMs on several QA datasets, including a newly created dataset, RedditQA, which features real-world incorrect contexts drawn from Reddit posts. The results show that, when given both correct and incorrect contexts, open-source and proprietary models alike tend to rely too heavily on external information, regardless of its factual accuracy. To enhance situated faithfulness, we propose two approaches: Self-Guided Confidence Reasoning (SCR) and Rule-Based Confidence Reasoning (RCR). SCR enables a model to assess its confidence in external information relative to its own internal knowledge and thereby produce the most accurate answer. RCR, in contrast, extracts explicit confidence signals from the LLM and determines the final answer using predefined rules. Our results show that for LLMs with strong reasoning capabilities, such as GPT-4o and GPT-4o mini, SCR outperforms RCR, achieving improvements of up to 24.2% over a direct input augmentation baseline. Conversely, for a smaller model such as Llama-3-8B, RCR outperforms SCR. Fine-tuning SCR with our proposed Confidence Reasoning Direct Preference Optimization (CR-DPO) method improves performance on both seen and unseen datasets, with an average improvement of 8.9%. Beyond quantitative results, we offer insights into the relative strengths of SCR and RCR. Our findings highlight promising avenues for improving situated faithfulness in LLMs. The data and code have been released.**|\n", "2410.14668": "|**2024-10-18**|**MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning 
Steps**|Xiongtao Zhou et.al.|[2410.14668](http://arxiv.org/abs/2410.14668)|**[link](https://github.com/alenai97/miceval)**|**Multimodal Chain of Thought (MCoT) is a popular prompting strategy for improving the performance of multimodal large language models (MLLMs) across a range of complex reasoning tasks. Despite its popularity, there is still a lack of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose Multimodal Chain-of-Thought Evaluation (MiCEval), a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. The evaluation of the description component focuses on the accuracy of the image descriptions, while each reasoning step is evaluated on its quality as it is conditionally generated from the preceding steps. MiCEval is built upon a fine-grained dataset with annotations that rate each step according to correctness, relevance, and informativeness. Extensive experiments on four state-of-the-art MLLMs show that step-wise evaluations using MiCEval align more closely with human judgments than existing methods based on cosine similarity or fine-tuning. The MiCEval datasets and code can be found at https://github.com/alenai97/MiCEval.**|\n", 
"2410.14660": "|**2024-10-18**|**A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning**|Shengjie Sun et.al.|[2410.14660](http://arxiv.org/abs/2410.14660)|null|Large language models (LLMs) have shown significant potential for designing reward functions for reinforcement learning (RL) tasks. However, obtaining high-quality reward code often requires human intervention, numerous LLM queries, or repeated RL training. To address these issues, we propose CARD, an LLM-driven reward design framework that iteratively generates and improves reward function code. Specifically, CARD includes a Coder that generates and verifies the code, while an Evaluator provides dynamic feedback to guide the Coder in improving the code, eliminating the need for human feedback. In addition to process feedback and trajectory feedback, we introduce Trajectory Preference Evaluation (TPE), which evaluates the current reward function based on trajectory preferences. If the code fails TPE, the Evaluator provides preference feedback, which avoids RL training at every iteration and better aligns the reward function with the task objective. Empirical results on Meta-World and ManiSkill2 show that our method achieves an effective balance between task performance and token efficiency, outperforming or matching the baselines on all tasks. On 10 of 12 tasks, CARD performs better than or comparably to policies trained with expert-designed rewards, and it even surpasses the oracle on 3 tasks.|\n", "2410.14649": "|**2024-10-18**|**EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search**|Oliver Sieberling et.al.|[2410.14649](http://arxiv.org/abs/2410.14649)|**[link](https://github.com/ist-daslab/evopress)**|The high computational cost of large language models (LLMs) has led to a flurry of research on model compression via methods such as quantization, sparsification, and structured pruning. A new research frontier is formed by so-called 'dynamic, non-uniform' compression methods, which adjust the compression level (e.g., sparsity) per block or even per layer in order to minimize accuracy loss while guaranteeing a global compression threshold. Yet current methods rely on heuristics to identify the importance of a given layer to the error, based on assumptions such as 'error monotonicity', i.e., that the end-to-end model compression error is proportional to the sum of layer-wise errors. In this paper, we revisit this area and propose a new, general approach that is provably optimal in a given input range. Our starting point is the observation that, in general, 'error monotonicity' does not hold for LLMs: compressed models with a lower sum of per-layer errors can perform worse than models with a higher error sum. To address this, we propose a new general evolutionary framework called EvoPress, which has provable convergence and low sample and evaluation complexity. We show that these theoretical guarantees translate into highly competitive practical performance for dynamic compression of Llama, Mistral, and Phi models. With EvoPress, we set new state-of-the-art results across all compression approaches: structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths. Our code is available at https://github.com/IST-DASLab/EvoPress.|\n", "2410.14641": "|**2024-10-18**|**Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs**|Runchu Tian 
et.al.|[2410.14641](http://arxiv.org/abs/2410.14641)|**[link](https://github.com/Rachum-thu/LongPiBench)**|**Positional bias in large language models (LLMs) limits their ability to process long inputs. A prominent example is the 'lost in the middle' phenomenon, in which LLMs struggle to use relevant information located in the middle of the input. While prior research has focused mainly on a single piece of relevant information, real-world applications often involve multiple relevant pieces. To bridge this gap, we present LongPiBench, a benchmark designed to assess positional bias involving multiple pieces of relevant information. Thorough experiments with five commercial and six open-source models reveal that, while most current models are robust against the 'lost in the middle' issue, they exhibit significant biases related to the spacing between relevant information pieces. These findings highlight the importance of evaluating and reducing positional biases to improve LLMs' capabilities.**|\n", "2410.14635": "|**2024-10-18**|**GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings**|Raghuveer Thirukovalluru 
et.al.|[2410.14635](http://arxiv.org/abs/2410.14635)|**[link](https://github.com/raghavlite/GenEOL)**|Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning. Previous training-free embedding methods have mainly focused on optimizing embedding prompts and have overlooked the benefits of using the generative abilities of LLMs. We propose a novel method, GenEOL, which uses LLMs to generate diverse transformations of a sentence that preserve its meaning, and aggregates the resulting embeddings of these transformations to enhance the overall sentence embedding. GenEOL outperforms existing training-free embedding methods by an average of 2.85 points across several LLMs on the sentence semantic text similarity (STS) benchmark. Our analysis shows that GenEOL stabilizes representation quality across LLM layers and is robust to perturbations of embedding prompts. GenEOL also achieves notable gains on multiple clustering, reranking, and pair-classification tasks from the MTEB benchmark.|\n", "2410.14609": "|**2024-10-18**|**DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search**|Simon Lupart 
et.al.|[2410.14609](http://arxiv.org/abs/2410.14609)|null|Conversational search (CS) is the task of retrieving relevant documents from a corpus within a conversational context, combining retrieval with conversational context modeling. With the rise of large language models (LLMs), the CS field has improved markedly through LLMs that rewrite user queries while taking the conversational context into account. However, using these models at inference time harms efficiency. Current methods address this by distilling embeddings from human-rewritten queries to learn the context modeling task. Yet these approaches focus predominantly on context modeling and handle the contrastive component of the retrieval task only in a loss term that is independent of the distillation. To address these limitations, we propose a new distillation method, a relaxation of the previous objective, that unifies retrieval and context modeling. We relax the existing training objective by distilling similarity scores between conversations and documents rather than relying solely on representation learning. The proposed distillation objective allows more freedom in the representation space and leverages the contrastive nature of document relevance. In Learned Sparse Retrieval (LSR) experiments on 5 CS datasets, our approach shows substantial improvements in both in-domain and out-of-domain retrieval performance, outperforming the state of the art with gains of up to 6 points in recall on out-of-domain datasets. In addition, through the relaxed objective, we propose multi-teacher distillation, which uses multiple LLMs as teachers, yields additional gains, and outperforms the teachers themselves in in-domain experiments. Finally, an analysis of model sparsity shows that our distillation method allows better control over the sparsity of the trained models.|\n", "2410.14596": "|**2024-10-18**|**Teaching Models to Balance Resisting and Accepting Persuasion**|Elias Stengel-Eskin 
et.al.|[2410.14596](http://arxiv.org/abs/2410.14596)|**[link](https://github.com/esteng/persuasion_balanced_training)**|**Large language models (LLMs) are susceptible to persuasion, which can pose risks when a model faces an adversarial interlocutor. We take a first step toward defending models against persuasion, while arguing that defending against negative persuasion is only half of the problem: models should also be able to accept beneficial persuasion in order to improve their answers. We find that optimizing a model for only one side leads to poor performance on the other. To balance positive and negative persuasion, we introduce Persuasion-Balanced Training (PBT), which uses multi-agent recursive dialogue trees to generate data and trains models via preference optimization to accept persuasion when appropriate. PBT consistently improves resistance to misinformation and resilience to being challenged, and also yields the best performance on holistic data containing both positive and negative persuasion. Crucially, we find that PBT models are better teammates in multi-agent debates. Without PBT, pairs of stronger and weaker models perform unstably, and the order in which the models present their answers determines whether the team obtains the stronger or the weaker model's performance. PBT leads to better and more stable results with less order dependence, and the stronger model consistently pulls the weaker one up.**|\n", "2410.16270": "|**2024-10-21**|**Reflection-Bench: probing AI intelligence with reflection**|Lingyu Li et.al.|[2410.16270](http://arxiv.org/abs/2410.16270)|**[link](https://github.com/yabyum/reflectionbench)**|**Reflection, the ability to adaptively adjust beliefs or behaviors in response to unexpected outcomes, is a core principle of how intelligent systems interact with the world. From a cognitive science perspective, this principle applies to both human and AI systems. To address the debate over the intelligence of large language models (LLMs), we propose Reflection-Bench, a comprehensive benchmark of seven tasks covering the core cognitive functions required for reflection: perception, memory, belief updating, decision-making, prediction, counterfactual thinking, and meta-reflection. We evaluate 13 prominent LLMs, including OpenAI o1, GPT-4, and Claude 3.5 Sonnet. The results show that current LLMs still fall short in reflection ability. We discuss the reasons behind these results and suggest potential directions for future research. In conclusion, Reflection-Bench offers both an evaluation tool and inspiration for developing AI that can interact reliably with its environment. Our data and code are available at https://github.com/YabYum/ReflectionBench.**|\n", "2410.16261": "|**2024-10-21**|**Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance**|Zhangwei Gao et.al.|[2410.16261](http://arxiv.org/abs/2410.16261)|**[link](https://github.com/opengvlab/internvl)**|**Multimodal large language models (MLLMs) have demonstrated impressive performance on vision-language tasks across a broad range of domains. However, their large size and the associated high computational cost make them very difficult to train and deploy on consumer-grade GPUs or edge devices, which limits their widespread use. In this work, we introduce Mini-InternVL, a series of MLLMs with 1B to 4B parameters that achieve 90% of the performance with only 5% of the parameters. This substantial gain in efficiency and effectiveness makes our models more accessible and applicable in a variety of real-world scenarios. To further promote adoption, we develop a unified adaptation framework that enables Mini-InternVL models to transfer to downstream tasks and outperform specialized models in domains such as autonomous driving, medical imaging, and remote sensing. We believe our study can provide valuable insights and resources for the development of efficient and effective MLLMs. Code is available at https://github.com/OpenGVLab/InternVL.**|\n", "2410.16257": "|**2024-10-21**|**Elucidating the design space of language models for image generation**|Xuantong Liu et.al.|[2410.16257](http://arxiv.org/abs/2410.16257)|**[link](https://github.com/Pepper-lll/LMforImageGeneration)**|The success of autoregressive (AR) language models in text generation has inspired the computer vision community to adopt large language models (LLMs) for image generation. However, given the fundamental differences between the text and image modalities, the design space of language models for image generation remains underexplored. We observe that image tokens exhibit greater randomness than text tokens, which makes training challenging. Nevertheless, AR models show their potential by effectively learning patterns even from a seemingly suboptimal optimization problem. Our analysis also reveals that, while all models successfully grasp the importance of local information in image generation, smaller models struggle to capture global context. Larger models, by contrast, are more capable in this respect, which explains the performance gains obtained when scaling up model size. We further elucidate the design space of language models for visual generation through extensive comparative experiments covering tokenizer choice, model choice, model scalability, vocabulary design, and sampling strategy. Our work is the first to analyze the optimization behavior of language models in visual generation, and we believe it can inspire more effective designs when applying LMs to other domains. Finally, our elucidated language model for image generation, termed ELM, achieves state-of-the-art performance on the ImageNet 256*256 benchmark. The code is available.|\n", "2410.16256": "|**2024-10-21**|**CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution**|Maosong Cao 
et.al.|[2410.16256](http://arxiv.org/abs/2410.16256)|**[link](https://github.com/open-compass/compassjudger)**|**Efficient and accurate evaluation is crucial for the continuous improvement of large language models (LLMs). Among evaluation methods, subjective evaluation has attracted considerable attention because it aligns closely with real-world usage scenarios and human preferences. However, human-based evaluation is costly and lacks reproducibility, which makes precise automated evaluators (judgers) vital to this process. In this report, we introduce \\textbf{CompassJudger-1}, the first open-source \\textbf{all-in-one} judge LLM. CompassJudger-1 is a general-purpose LLM that demonstrates remarkable versatility. It can: 1. perform single scoring and two-model comparison as a reward model; 2. conduct evaluations according to specified formats; 3. generate critiques; 4. execute diverse tasks like a general LLM. To assess the capabilities of different judge models under a unified setting, we also establish \\textbf{JudgerBench}, a new benchmark that covers a variety of subjective evaluation tasks and a wide range of topics. CompassJudger-1 offers a comprehensive solution for diverse evaluation tasks while remaining flexible enough to adapt to varied requirements. Both CompassJudger and JudgerBench are released and available to the research community at https://github.com/open-compass/CompassJudger. We believe that open-sourcing these tools can foster collaboration and accelerate progress in LLM evaluation methodologies.**|\n", "2410.16251": "|**2024-10-21**|**Can Knowledge Editing Really Correct Hallucinations?**|Baixiang Huang 
et.al.|[2410.16251](http://arxiv.org/abs/2410.16251)|**[link](https://github.com/llm-editing/HalluEditBench)**|**Large language models (LLMs) often hallucinate, that is, generate content containing non-factual information, despite their strong performance across tasks. Meanwhile, knowledge editing has emerged as a popular new paradigm for correcting erroneous factual knowledge in LLMs, with the advantage of avoiding retraining from scratch. However, a common problem with existing evaluation datasets for knowledge editing is that they do not ensure that LLMs actually produce hallucinated answers to the evaluation questions before editing. When LLMs edited with different techniques are evaluated on such datasets, it is hard to use the resulting performance directly to assess how effective different knowledge editing methods are at correcting hallucinations. A fundamental question therefore remains insufficiently validated: can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods on correcting real-world hallucinations. First, we rigorously construct a large-scale hallucination dataset spanning 9 domains, 26 topics, and more than 6,000 hallucinations. Then, we comprehensively assess the performance of knowledge editing methods along five dimensions: efficacy, generalization, portability, locality, and robustness. Through HalluEditBench, we provide new insights into the potential and limitations of different knowledge editing methods in correcting hallucinations, which can inspire future improvements and facilitate progress in the field of knowledge editing.**|\n", "2410.16246": "|**2024-10-21**|**Analyzing Context Contributions in LLM-based Machine Translation**|Emmanouil Zaranis et.al.|[2410.16246](http://arxiv.org/abs/2410.16246)|null|Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and have demonstrated the ability to use in-context learning from few-shot examples. However, how LLMs use different parts of their input remains largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use various parts of the context, such as few-shot examples and the source text, when generating translations. We highlight several key findings: (1) across translation directions, the source part of few-shot examples appears to contribute more than the corresponding target part; (2) fine-tuning LLMs on parallel data alters the contribution patterns of different context parts; and (3) there is a positional bias whereby earlier few-shot examples contribute more to the translated sequence. Finally, we show that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations. Our findings shed light on the internal workings of LLM-based MT that go beyond what is known for standard encoder-decoder MT models.|\n", "2410.16237": "|**2024-10-21**|**IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems**|Yihuan Mao 
et.al.|[2410.16237](http://arxiv.org/abs/2410.16237)|null|As large language model (LLM) agents become increasingly integrated into our infrastructure, their robust coordination and message synchronization become vital. The Byzantine Generals Problem (BGP) is a key model for building multi-agent systems (MAS) that are resilient under adversarial attack. It describes a scenario in which malicious agents are present in the system, a situation that could arise from LLM agents' hallucinations or from external attacks. In BGP, the objective of the whole system is to reach consensus on the action to be taken. Traditional BGP requires global consensus among all agents; in practical scenarios, however, global consensus is not always necessary and can even be inefficient. There is therefore a pressing need to explore a refined version of BGP that matches the local coordination patterns observed in MAS. We refer to this refined version as Imperfect BGP (IBGP) in our research, aiming to address this discrepancy. To tackle the problem, we propose a framework that leverages consensus protocols within general MAS settings, providing provable resilience against communication attacks and adaptability to changing environments, as validated by empirical results. We also present a case study in a sensor network environment to illustrate the practical application of our protocol.|\n", "2410.16236": "|**2024-10-21**|**LLaVA-KD: A Framework of Distilling Multimodal Large Language Models**|Yuxuan Cai et.al.|[2410.16236](http://arxiv.org/abs/2410.16236)|**[link](https://github.com/Fantasyele/LLaVA-KD)**|The success of large language models (LLMs) has led researchers to explore multimodal large language models (MLLMs) for unified visual and linguistic understanding. However, the growing model size and computational complexity of MLLMs limit their use in resource-constrained environments. Small-scale MLLMs (s-MLLMs) aim to retain the capabilities of large-scale MLLMs (l-MLLMs) while reducing computational demands, but this results in a significant drop in performance. To address these issues, we propose LLaVA-KD, a novel framework for transferring knowledge from an l-MLLM to an s-MLLM. Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of the l-MLLM and the s-MLLM, and Relation Distillation (RDist) to transfer the l-MLLM's ability to model correlations between visual features. In addition, we propose a three-stage training scheme to fully exploit the potential of the s-MLLM: 1) Distilled Pre-Training to align visual-textual representations; 2) Supervised Fine-Tuning to equip the model with multimodal understanding; and 3) Distilled Fine-Tuning to further transfer l-MLLM capabilities. Our approach significantly improves performance without altering the architecture of the small model. Extensive experiments and ablation studies validate the effectiveness of each proposed component. Code will be available at https://github.com/caiyuxuan1120/LLAva-KD.|\n", "2410.16235": "|**2024-10-21**|**ToW: Thoughts of Words Improve Reasoning in Large Language Models**|Zhikun Xu et.al.|[2410.16235](http://arxiv.org/abs/2410.16235)|null|We introduce Thoughts of Words (ToW), a novel training-time data-augmentation method for next-word prediction. ToW treats next-word prediction as a core reasoning task and injects fine-grained thoughts into pre-training texts that explain what the next word should be and how it relates to the preceding context. Our formulation addresses two fundamental drawbacks of existing next-word prediction learning schemes: they induce factual hallucination, and they make it hard for models to learn the implicit reasoning processes in raw text. Although there are many ways to acquire such thoughts of words, we explore a first step: obtaining ToW annotations by distilling from larger models. After continual pre-training with only 70K ToW annotations, we improve model reasoning performance by 7% to 9% on average and reduce model hallucination by up to 10%. At the same time, ToW is entirely agnostic to tasks and applications and introduces no additional biases on labels or semantics.|\n", "2410.16229": "|**2024-10-21**|**Building A Coding Assistant via the Retrieval-Augmented Language Model**|Xinze Li 
et.al.|[2410.16229](http://arxiv.org/abs/2410.16229)|**[link](https://github.com/NEUIR/CONAN)**|Pretrained language models have shown strong effectiveness on code-related tasks such as code retrieval, code generation, code summarization, and code completion. In this paper, we propose CONAN (a code assistant via retrieval-augmented language model), which aims to build a code assistant by mimicking the knowledge-seeking behavior of humans during coding. Specifically, CONAN consists of a code-structure-aware retriever (CONAN-R) and a retrieval-augmented generation model based on dual-view code representation (CONAN-G). CONAN-R pretrains CodeT5 with Code-Documentation Alignment and Masked Entity Prediction tasks, making the language model code-structure-aware and teaching it effective representations of code snippets and documentation. CONAN-G then designs a dual-view code representation mechanism to implement a retrieval-augmented code generation model. CONAN-G treats code documentation descriptions as prompts, which helps the language model better understand code semantics. Our experiments show that CONAN achieves convincing performance on different code generation tasks and significantly outperforms previous retrieval-augmented code generation models. Further analysis shows that CONAN learns tailored representations of code snippets and documentation by aligning code-documentation data pairs and by capturing structural semantics through masking and predicting entities in the code. In addition, the retrieved code snippets and documentation provide necessary information from both programming languages and natural language to assist code generation. CONAN can also serve as an assistant for large language models (LLMs), supplying external knowledge in shorter code documents to improve their effectiveness on various code tasks. This demonstrates CONAN's ability to extract necessary information and help filter out noise from retrieved code documents.|\n", "2410.17236": "|**2024-10-22**|**Large Language Models Empowered Personalized Web Agents**|Hongru Cai 
et.al.|[2410.17236](http://arxiv.org/abs/2410.17236)|null|Web agents have emerged as a promising direction for automating the completion of web tasks based on user instructions, significantly enhancing user experience. Recently, web agents have evolved from traditional agents to agents based on large language models (LLMs). Despite their success, existing LLM-based web agents overlook the importance of personalized data (such as user profiles and historical web behaviors) in helping to understand users' personalized instructions and execute customized actions. To overcome this limitation, we first formulate the task of LLM-empowered personalized web agents, which combine personalized data with user instructions to personalize instruction comprehension and action execution. To address the absence of a comprehensive evaluation benchmark, we construct a Personalized Web Agent Benchmark (PersonalWAB), which includes user instructions, personalized user data, web functions, and two evaluation paradigms across three personalized web tasks. Moreover, we propose a Personalized User Memory-enhanced Alignment (PUMA) framework to adapt to the personalized web agent task. PUMA uses a memory bank with a task-specific retrieval strategy to filter relevant historical web behaviors. Based on these behaviors, PUMA then aligns LLMs for personalized action execution through fine-tuning and direct preference optimization. Extensive experiments validate the superiority of PUMA over existing web agents on PersonalWAB.|\n", "2410.17235": "|**2024-10-22**|**Automated Spinal MRI Labelling from Reports Using a Large Language Model**|Robin Y. Park et.al.|[2410.17235](http://arxiv.org/abs/2410.17235)|**[link](https://github.com/robinyjpark/autolabelclassifier)**|**We propose a general pipeline for automating the extraction of labels from radiology reports using large language models, validated on spinal MRI reports. The efficacy of the label-extraction method is evaluated on five distinct conditions: spinal cancer, stenosis, spondylolisthesis, cauda equina compression, and herniation. Using open-source models, our method equals or surpasses GPT-4 on a held-out set of reports. Furthermore, we show that the extracted labels can be used to train imaging models to identify these conditions in the accompanying MRI scans. \u6240\u6709\u4f7f\u7528\u81ea\u52a8\u6807\u7b7e\u8bad\u7ec3\u7684\u5206\u7c7b\u5668\u8868\u73b0\u4e0e\u4f7f\u7528\u4e34\u5e8a\u533b\u751f\u624b\u52a8\u6807\u6ce8\u7684\u626b\u63cf\u
8bad\u7ec3\u7684\u6a21\u578b\u76f8\u5f53\u3002\u4ee3\u7801\u53ef\u4ee5\u5728\u627e\u5230\u3002**|\n", "2410.17234": "|**2024-10-22**|**Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy**|Benedict Aaron Tjandra et.al.|[2410.17234](http://arxiv.org/abs/2410.17234)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u5176\u751f\u6210\u5408\u7406\u4f46\u4e0d\u51c6\u786e\u6587\u672c\u7684\u80fd\u529b\u800c\u95fb\u540d\uff0c\u8fd9\u79cd\u73b0\u8c61\u5728\u533b\u5b66\u6216\u6cd5\u5f8b\u7b49\u5173\u952e\u5e94\u7528\u4e2d\u5e26\u6765\u4e86\u663e\u8457\u7684\u98ce\u9669\uff0c\u56e0\u6b64\u9700\u8981\u91c7\u53d6\u7a33\u5065\u7684\u5e7b\u89c9\u7f13\u89e3\u7b56\u7565\u3002\u5c3d\u7ba1\u6700\u8fd1\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u901a\u8fc7\u5fae\u8c03\u6765\u6559\u5bfc\u6a21\u578b\u907f\u514d\u56de\u7b54\u8d85\u51fa\u5176\u77e5\u8bc6\u6216\u80fd\u529b\u8303\u56f4\u7684\u95ee\u9898\u7684\u65b9\u6cd5\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u5916\u90e8\u7684\u771f\u5b9e\u6807\u7b7e\uff0c\u6216\u8005\u4ec5\u9650\u4e8e\u77ed\u6587\u672c\u56de\u5e94\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u8bed\u4e49\u71b5\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u4ece\u6a21\u578b\u5185\u90e8\u8fdb\u884c\u81ea\u6211\u53cd\u601d\u5f97\u51fa\u7684\u4e0d\u786e\u5b9a\u6027\u5ea6\u91cf\uff0c\u4e0d\u9700\u8981\u5916\u90e8\u6807\u7b7e\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4f7f\u7528\u5148\u524d\u7814\u7a76\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u4e0a\u8fbe\u5230\u4e86\u540c\u7b49\u6216\u66f4\u597d\u7684\u8868\u73b0\uff0c\u5e76\u5728\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u5bf9\u77ed\u6587\u672c\u548c\u957f\u6587\u672c\u751f\u6210\u7684\u5f3a\u5927\u6027\u80fd\u3002|\n", "2410.17233": "|**2024-10-22**|**Few-shot In-Context Preference Learning Using Large Language Models**|Chao Yu 
et.al.|[2410.17233](http://arxiv.org/abs/2410.17233)|null|Designing reward functions is a core component of reinforcement learning, but it can be challenging for very complex behaviors. Reinforcement Learning from Human Feedback (RLHF) has been used to ease this challenge by replacing hand-written reward functions with reward functions learned from human feedback. However, learning these reward functions is often inefficient because they are typically learned from scratch. We study whether large language models (LLMs) can reduce this query inefficiency by converting a sequence of human preferences into code that represents the rewards. We propose In-Context Preference Learning (ICPL), a method that uses the background knowledge of an LLM to accelerate learning reward functions from preferences. ICPL takes the environment context and task description, synthesizes a set of reward functions, and then repeatedly updates these reward functions using human rankings of videos of the resulting policies. Using synthetic preferences, we show that ICPL is orders of magnitude more efficient than RLHF and is even competitive with methods that use ground-truth reward functions. Finally, we conduct a series of human preference-learning trials and observe that ICPL works not only in synthetic settings but also effectively with humans in the loop. Additional information and videos are available at https://sites.google.com/view/few-shot-icpl/home .|\n", "2410.17222": "|**2024-10-22**|**Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods**|Tsachi Blau et.al.|[2410.17222](http://arxiv.org/abs/2410.17222)|null|Fine-tuning large language models (LLMs) typically involves updating billions of parameters. A more parameter-efficient approach is Prompt Tuning (PT), which updates only a small number of learnable tokens. Another approach is In-Context Learning (ICL), which adapts to a new task by including examples in the input, without any training. When optimization-based methods such as fine-tuning and PT are applied to few-shot learning, the model is adapted specifically to the small set of training examples, whereas ICL leaves the model itself unchanged. This distinction makes traditional learning methods more prone to overfitting; in contrast, ICL is less sensitive to the few-shot regime. Although ICL is not prone to overfitting, it does not fully extract the information present in the training examples. This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks. We build on the ICL strategy of concatenating examples with the input, but use PT-style optimization to iteratively refine the context embedding and extract deeper information from the training examples. We carefully modify specific context tokens, taking into account the unique structure of the input and output formats. Inspired by adversarial attacks, we adjust the input based on the labels present in the context, aiming to minimize rather than maximize the loss. In addition, we apply projected gradient descent to keep the token embeddings close to their original values, under the assumption that user-provided data is inherently valuable. Our method has shown superior accuracy across multiple classification tasks using a variety of LLMs.|\n", "2410.17210": "|**2024-10-22**|**Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling**|Azmine Toushik Wasi 
et.al.|[2410.17210](http://arxiv.org/abs/2410.17210)|**[link](https://github.com/ciol-researchlab/ukil)**|**Purpose: The legal system of Bangladesh faces major challenges such as case backlogs, complexity, high costs, and millions of pending cases, which prevent many people from seeking legal remedies because of a lack of knowledge or financial constraints. This study aims to develop a specialized large language model (LLM) to assist the legal system of Bangladesh. Methods: We created UKIL-DB-EN, an English corpus of Bangladeshi legal documents, by collecting and scraping data on various legal acts. We then fine-tuned the GPT-2 model on this dataset to develop GPT2-UKIL-EN, an LLM focused on providing legal assistance in English. Results: The model was rigorously evaluated through semantic assessments, including case studies supported by expert opinions, and the results show that it has the potential to assist with legal matters. Conclusion: Our work represents the first organized effort to build an AI-based legal assistant for Bangladesh. Although the results are encouraging, further improvements are needed to raise the accuracy, credibility, and safety of the model. This is an important step toward creating a legal AI capable of serving the needs of a population of 180 million.**|\n", "2410.17196": "|**2024-10-22**|**VoiceBench: Benchmarking LLM-Based Voice Assistants**|Yiming Chen et.al.|[2410.17196](http://arxiv.org/abs/2410.17196)|**[link](https://github.com/matthewcym/voicebench)**|**Building on the success of large language models (LLMs), recent advances such as GPT-4o have enabled real-time speech interaction through LLM-based voice assistants, greatly improving the user experience compared with traditional text-based interaction. However, the lack of benchmarks designed specifically to evaluate these speech interaction capabilities has hindered the development of LLM-based voice assistants. Current evaluations focus mainly on automatic speech recognition (ASR) or on general knowledge evaluation with clean speech, and overlook more complex real-world scenarios that involve diverse speaker characteristics, environmental conditions, and content factors. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench includes both real and synthetic spoken instructions that incorporate the three key real-world variations mentioned above. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.**|\n", "2410.17195": "|**2024-10-23**|**Non-myopic Generation of Language Model for Reasoning and Planning**|Chang Ma et.al.|[2410.17195](http://arxiv.org/abs/2410.17195)|null|Large language models have demonstrated remarkable abilities in reasoning and planning by breaking complex problems down into sequential steps. Despite their success in domains such as mathematical problem solving and coding, these models still struggle to ensure reliable and optimal planning because of their inherently autoregressive decoding. This paper revisits LLM reasoning from an optimal-control perspective and proposes a novel method, Predictive-Decoding, which uses Model Predictive Control to improve planning accuracy. By reweighting the language model's distribution according to foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show that this approach significantly improves performance across a wide range of math, coding, and agent tasks. Predictive-Decoding is also computationally efficient, outperforming search baselines while using fewer computational resources. This study provides insights into optimizing the planning capabilities of large language models.|\n", "2410.17174": "|**2024-10-22**|**From Attention to Activation: Unravelling the Enigmas of Large Language Models**|Prannay Kaul et.al.|[2410.17174](http://arxiv.org/abs/2410.17174)|null|We study two strange phenomena in autoregressive Transformers: (1) the dominance of the first token in attention heads; and (2) the occurrence of large outlier activations in the hidden states. We find that popular large language models such as Llama attend maximally to the first token in 98% of attention heads, a behavior we attribute to the softmax function. To mitigate this issue, we propose a reformulation of softmax as softmax-1. Furthermore, we identify adaptive optimizers such as Adam as the primary cause of the large outlier activations, and introduce OrthoAdam, a new optimizer that uses orthogonal matrices to transform gradients, to address the problem. Finally, our methods not only prevent these phenomena from occurring but also enable Transformers to retain their performance when quantized with basic algorithms, which standard methods cannot do. In summary, our methods reduce the attention proportion on the first token from 65% to 3.3%, the activation kurtosis in the hidden states from 1657 to 3.1, and the perplexity penalty under 4-bit weight quantization from 3565 to 0.3.|\n", "2410.17152": "|**2024-10-22**|**Improving Pinterest Search Relevance Using Large Language Models**|Han Wang et.al.|[2410.17152](http://arxiv.org/abs/2410.17152)|null|To improve relevance scoring in Pinterest Search, we integrate large language models (LLMs) into our search relevance model, using carefully designed text representations to predict the relevance of Pins effectively. Our approach uses search queries together with content representations that include captions extracted from a generative visual language model. These representations are further enriched with link-based text data, historically high-quality engaged queries, user-curated boards, Pin titles, and Pin descriptions, producing a robust model for predicting search relevance. We use a semi-supervised learning approach to scale up the amount of training data efficiently, going beyond the expensive human-labeled data alone. By using multilingual LLMs, our system extends the training data to include unseen languages and domains, even though the initial data and annotator expertise are limited to English. In addition, we distill the LLM-based model into a model architecture and feature set that can be served in real time. We provide comprehensive offline experimental validation of the proposed techniques and demonstrate the gains achieved in the system deployed at scale.|\n", "2410.18071": "|**2024-10-23**|**TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts**|Yuxuan Xie et.al.|[2410.18071](http://arxiv.org/abs/2410.18071)|null|Multimodal large language models (MLLMs) have recently received much attention for their impressive capabilities. Evaluating MLLMs has become critical, because it helps analyze the attributes of these models and provides valuable insights. However, current benchmarks overlook the problem of prompt sensitivity: minor prompt variations may lead to significant performance fluctuations. Inappropriate prompts may therefore obscure a model's capabilities and cause its performance to be underestimated. Moreover, different models prefer different prompts, so using the same prompt to evaluate all models introduces evaluation bias. This paper analyzes this deficiency in existing benchmarks and introduces a new evaluation framework, TP-Eval. The framework introduces a prompt customization method to reduce evaluation bias and tap the potential of each model. TP-Eval rewrites the original prompts into different customized prompts for different models. In particular, we design several modules for prompt customization tailored to the MLLM evaluation scenario. Extensive experiments show that our approach effectively uncovers the potential of the models, and TP-Eval should help the community develop more comprehensive and convincing MLLM evaluation benchmarks.|\n", "2410.18050": "|**2024-10-23**|**LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering**|Qingfei Zhao et.al.|[2410.18050](http://arxiv.org/abs/2410.18050)|**[link](https://github.com/qingfei1/longrag)**|**Long-Context Question Answering (LCQA) is a challenging task that aims to answer questions accurately by reasoning over long documents. Existing long-context large language models (LLMs) often suffer from the \u201clost in the middle\u201d problem in LCQA. Retrieval-Augmented Generation (RAG) mitigates this problem by providing external factual evidence. However, its chunking strategy disrupts global long-context information, and low-quality retrieval in long contexts prevents LLMs from identifying useful factual details because of the substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA that enhances the understanding of complex long-context knowledge by RAG (that is, global information and factual details). We design LongRAG as a plug-and-play paradigm that adapts easily to various domains and LLMs. Extensive experiments on three multi-hop datasets show that LongRAG significantly outperforms long-context LLMs (by 6.94%), advanced RAG (by 6.16%), and vanilla RAG (by 17.25%). We also conduct quantitative ablation studies and multi-dimensional analyses that highlight the effectiveness of the system components and the fine-tuning strategy. Data and code are available at https://github.com/QingFei1/LongRAG.**|\n", "2410.18040": "|**2024-10-23**|**Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases**|Anna Glazkova 
et.al.|[2410.18040](http://arxiv.org/abs/2410.18040)|null|Keyphrase selection is a challenging task in natural language processing with a wide range of applications. Applying existing supervised and unsupervised solutions to Russian faces several limitations because of the rich morphology of the language and the limited number of training datasets. Recent studies on English texts show that large language models (LLMs) successfully address the task of generating keyphrases. These models can achieve impressive results without task-specific fine-tuning, using text prompts alone. In this work, we evaluate prompt-based methods for generating keyphrases for Russian scientific abstracts. First, we compare the performance of zero-shot and few-shot prompt-based methods, fine-tuned models, and unsupervised methods. We then assess strategies for selecting keyphrase examples in the few-shot setting. We present the results of a human evaluation of the generated keyphrases and analyze the strengths and weaknesses of the models through expert assessment. Our results show that prompt-based methods can outperform common baselines even when using simple text prompts.|\n", "2410.18035": "|**2024-10-23**|**MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning**|Jingfan Zhang et.al.|[2410.18035](http://arxiv.org/abs/2410.18035)|null|Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods. However, because they add LoRA modules and MOE routers to multiple linear modules in each Transformer layer, these methods introduce significant latency in multi-tenant settings. To address this issue, we propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant. Unlike previous MOE-style LoRA methods, MiLoRA treats each LoRA module as an expert and uses a prompt-aware routing mechanism. This mechanism computes the expert routing results once, before the first new token is generated, and reuses them for subsequent tokens, which reduces latency. Extensive experiments and analysis show that MiLoRA consistently outperforms strong PEFT baselines with comparable tunable-parameter budgets on commonsense reasoning tasks, math reasoning tasks, and widely used LLM evaluation benchmarks. In addition, MiLoRA significantly reduces latency in multi-tenant settings compared with previous LoRA-based methods.|\n", "2410.18032": "|**2024-10-23**|**GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration**|Xin Li et.al.|[2410.18032](http://arxiv.org/abs/2410.18032)|**[link](https://github.com/bupt-gamma/graphteam)**|**Graphs are widely used to model relational data in real-world scenarios such as social networks and urban computing. Existing graph analysis methods based on large language models (LLMs) either integrate graph neural networks (GNNs) for specific machine learning tasks, which limits their transferability, or rely solely on the reasoning ability of the LLM itself, which leads to suboptimal performance. To address these limitations, we build on recent advances in LLM-based agents, which have shown that they can use external knowledge or tools to solve problems. By simulating human problem-solving strategies such as analogy and collaboration, we propose GraphTeam, an LLM-based multi-agent system for graph analysis. GraphTeam consists of five LLM-based agents in three modules, and agents with different specialities collaborate with one another to tackle complex problems. Specifically: (1) Input-output normalization module: the question agent extracts and refines four key arguments from the original question to aid problem understanding, and the answer agent organizes the results to meet the output requirements. (2) External knowledge retrieval module: we first build a knowledge base containing relevant documentation and experience information, and the search agent then retrieves the most relevant entries for each question. (3) Problem-solving module: given the information retrieved by the search agent, the coding agent generates a solution through programming; if the coding agent does not succeed, the reasoning agent computes the result directly without programming. Extensive experiments on six graph analysis benchmarks show that GraphTeam achieves state-of-the-art performance, with an average accuracy improvement of 25.85% over the best baseline. The code and data are available at the repository linked in this row.**|\n", "2410.18012": "|**2024-10-23**|**MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting**|Sungil Seok 
et.al.|[2410.18012](http://arxiv.org/abs/2410.18012)|null|\u7f8e\u56fd\u8054\u90a6\u57fa\u91d1\u5229\u7387\u5728\u56fd\u5185\u5916\u91d1\u878d\u5e02\u573a\u4e2d\u626e\u6f14\u7740\u91cd\u8981\u89d2\u8272\u3002\u7136\u800c\uff0c\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8be5\u5229\u7387\u8c03\u6574\u7684\u5f71\u54cd\u4e0a\uff0c\u800c\u4e0d\u662f\u51b3\u7b56\u8fc7\u7a0b\u672c\u8eab\u3002\u6700\u8fd1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fdb\u6b65\u63d0\u4f9b\u4e86\u4e00\u79cd\u53ef\u80fd\u7684\u65b9\u6cd5\u6765\u91cd\u6784\u8d1f\u8d23\u8bbe\u5b9a\u8054\u90a6\u57fa\u91d1\u5229\u7387\u7684\u8054\u90a6\u516c\u5f00\u5e02\u573a\u59d4\u5458\u4f1a\uff08FOMC\uff09\u4f1a\u8bae\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e94\u9636\u6bb5\u7684FOMC\u4f1a\u8bae\u6a21\u62df\u6846\u67b6MiniFed\uff0c\u8be5\u6846\u67b6\u4f7f\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684FOMC\u4f1a\u8bae\u6210\u5458\uff0c\u5e76\u4f18\u5316FOMC\u7ed3\u6784\u3002\u6b64\u6846\u67b6\u6709\u6548\u5730\u91cd\u65b0\u6fc0\u6d3b\u4e86FOMC\u4f1a\u8bae\u8fc7\u7a0b\uff0c\u5e76\u6709\u52a9\u4e8e\u9884\u6d4b\u8054\u90a6\u57fa\u91d1\u5229\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u63d0\u51fa\u7684MiniFed\u6846\u67b6\u5728\u8054\u90a6\u57fa\u91d1\u5229\u7387\u9884\u6d4b\u65b9\u9762\u5177\u6709\u9ad8\u51c6\u786e\u5ea6\uff0c\u5e76\u4e14\u5728\u4ee3\u7406\u884c\u4e3a\u4e0a\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u5bf9\u5e94\u8005\u4fdd\u6301\u4e00\u81f4\u3002\u9274\u4e8e\u76ee\u524d\u5f88\u5c11\u6709\u7814\u7a76\u5229\u7528LLM\u4ee3\u7406\u6765\u6a21\u62df\u5927\u89c4\u6a21\u7684\u73b0\u5b9e\u4e16\u754c\u4f1a\u8bae\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u53ef\u4ee5\u4f5c\u4e3a\u672a\u6765\u53d1\u5c55\u7684\u57fa\u51c6\u3002|\n", "2410.17954": "|**2024-10-23**|**ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference**|Xin He 
et.al.|[2410.17954](http://arxiv.org/abs/2410.17954)|null|\u7a00\u758f\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u5bc6\u96c6\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u4f46\u5728\u63a8\u7406\u90e8\u7f72\u8fc7\u7a0b\u4e2d\u9762\u4e34\u663e\u8457\u7684\u5185\u5b58\u9700\u6c42\u6311\u6218\u3002\u73b0\u6709\u7684\u5378\u8f7d\u6280\u672f\u6d89\u53ca\u5728GPU\u548cCPU\u4e4b\u95f4\u4ea4\u6362\u6fc0\u6d3b\u548c\u7a7a\u95f2\u7684\u4e13\u5bb6\uff0c\u4f46\u8fd9\u4e9b\u6280\u672f\u901a\u5e38\u53d7\u5230\u521a\u6027\u4e13\u5bb6\u7f13\u5b58\u673a\u5236\u7684\u9650\u5236\u3002\u8fd9\u4e9b\u673a\u5236\u65e0\u6cd5\u9002\u5e94\u52a8\u6001\u8def\u7531\uff0c\u5bfc\u81f4\u7f13\u5b58\u5229\u7528\u7387\u4f4e\u4e0b\uff0c\u6216\u5728\u9884\u6d4b\u8bad\u7ec3\u4e2d\u4ea7\u751f\u9ad8\u6602\u7684\u6210\u672c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u7279\u5b9a\u4e8e\u63a8\u7406\u7684\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86ExpertFlow\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u9002\u5e94\u7075\u6d3b\u8def\u7531\u5e76\u5b9e\u73b0\u4e13\u5bb6\u5728CPU\u548cGPU\u4e4b\u95f4\u7684\u9ad8\u6548\u8c03\u5ea6\u6765\u589e\u5f3a\u63a8\u7406\u6548\u7387\u3002\u8fd9\u51cf\u5c11\u4e86\u5f00\u9500\u5e76\u63d0\u5347\u4e86\u7cfb\u7edf\u6027\u80fd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6838\u5fc3\u662f\u4e00\u4e2a\u57fa\u4e8e\u9884\u6d4b\u8def\u7531\u8def\u5f84\u7684\u5378\u8f7d\u673a\u5236\uff0c\u5229\u7528\u8f7b\u91cf\u7ea7\u9884\u6d4b\u5668\u5728\u8ba1\u7b97\u5f00\u59cb\u524d\u51c6\u786e\u9884\u6d4b\u8def\u7531\u8def\u5f84\u3002\u8fd9\u79cd\u4e3b\u52a8\u7b56\u7565\u5141\u8bb8\u5b9e\u65f6\u7ea0\u6b63\u4e13\u5bb6\u7f13\u5b58\u4e2d\u7684\u9519\u8bef\uff0c\u663e\u8457\u63d0\u9ad8\u7f13\u5b58\u547d\u4e2d\u7387\u5e76\u51cf\u5c11\u4e13\u5bb6\u4f20\u8f93\u7684\u9891\u7387\uff0c\u4ece\u800c\u6700\u5c0f\u5316I/O\u5f00\u9500\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u65bd\u4e86\u4e00\u79cd\u
52a8\u6001\u4ee4\u724c\u8c03\u5ea6\u7b56\u7565\uff0c\u901a\u8fc7\u5728\u4e0d\u540c\u6279\u6b21\u95f4\u91cd\u65b0\u6392\u5217\u8f93\u5165\u4ee4\u724c\u6765\u4f18\u5316MoE\u63a8\u7406\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u51cf\u5c11\u4e86\u6bcf\u6279\u6b21\u6fc0\u6d3b\u7684\u4e13\u5bb6\u6570\u91cf\uff0c\u8fd8\u63d0\u9ad8\u4e86\u8ba1\u7b97\u6548\u7387\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cExpertFlow\u5b9e\u73b0\u4e86\u9ad8\u8fbe93.72%\u7684GPU\u5185\u5b58\u8282\u7701\uff0c\u5e76\u5c06\u63a8\u7406\u901f\u5ea6\u63d0\u5347\u81f3\u57fa\u7ebf\u65b9\u6cd5\u76842\u523010\u500d\uff0c\u7a81\u663e\u4e86\u5176\u6709\u6548\u6027\u548c\u4f5c\u4e3a\u8d44\u6e90\u53d7\u9650\u63a8\u7406\u573a\u666f\u4e0b\u7684\u7a33\u5065\u89e3\u51b3\u65b9\u6848\u7684\u91cd\u8981\u6027\u3002|\n", "2410.17952": "|**2024-10-23**|**SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains**|Ran Xu et.al.|[2410.17952](http://arxiv.org/abs/2410.17952)|null|\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u901a\u8fc7\u6574\u5408\u5916\u90e8\u77e5\u8bc6\u589e\u5f3a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u95ee\u9898\u56de\u7b54\uff08QA\uff09\u80fd\u529b\u3002\u7136\u800c\uff0c\u5c06\u901a\u7528\u7684RAG\u7cfb\u7edf\u9002\u5e94\u5230\u79d1\u5b66\u548c\u533b\u5b66\u7b49\u4e13\u4e1a\u9886\u57df\u65f6\uff0c\u7531\u4e8e\u5206\u5e03\u5dee\u5f02\u548c\u6709\u9650\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\u8bbf\u95ee\uff0c\u4f1a\u9762\u4e34\u72ec\u7279\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86SimRAG\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u8bad\u7ec3\u65b9\u6cd5\uff0c\u4f7fLLM\u5177\u5907\u95ee\u9898\u56de\u7b54\u548c\u95ee\u9898\u751f\u6210\u7684\u8054\u5408\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u9886\u57df\u9002\u5e94\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u5728\u6307\u4ee4\u8ddf\u968f\u3001\u95ee\u7b54\u548c\u641c\u7d22\u76f8\u5173\u65
70\u636e\u4e0a\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u540e\uff0c\u5b83\u63d0\u793a\u76f8\u540c\u7684LLM\u4ece\u65e0\u6807\u7b7e\u8bed\u6599\u5e93\u4e2d\u751f\u6210\u591a\u6837\u5316\u7684\u9886\u57df\u76f8\u5173\u95ee\u9898\uff0c\u5e76\u91c7\u7528\u989d\u5916\u7684\u8fc7\u6ee4\u7b56\u7565\u6765\u4fdd\u7559\u9ad8\u8d28\u91cf\u7684\u5408\u6210\u793a\u4f8b\u3002\u901a\u8fc7\u5229\u7528\u8fd9\u4e9b\u5408\u6210\u793a\u4f8b\uff0cLLM\u53ef\u4ee5\u5728\u7279\u5b9a\u9886\u57df\u7684RAG\u4efb\u52a1\u4e2d\u63d0\u5347\u6027\u80fd\u3002\u5728\u8de8\u8d8a\u4e24\u4e2a\u57fa\u7840\u6a21\u578b\u5927\u5c0f\u548c\u4e09\u4e2a\u9886\u57df\u768411\u4e2a\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSimRAG\u6bd4\u57fa\u7ebf\u65b9\u6cd5\u9ad8\u51fa1.2%\u81f38.6%\u3002|\n", "2410.17950": "|**2024-10-23**|**Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling**|Nirav Bhan et.al.|[2410.17950](http://arxiv.org/abs/2410.17950)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u975e\u51e1\u7684\u80fd\u529b\uff0c\u4f46\u7531\u4e8e\u5de5\u5177\u4f7f\u7528\u548c\u529f\u80fd\u8c03\u7528\u65b9\u9762\u7684\u6311\u6218\uff0c\u5176\u7ecf\u6d4e\u5f71\u54cd\u53d7\u5230\u4e86\u9650\u5236\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aThorV2\u7684\u65b0\u67b6\u6784\uff0c\u8be5\u67b6\u6784\u663e\u8457\u589e\u5f3a\u4e86LLMs\u7684\u529f\u80fd\u8c03\u7528\u80fd\u529b\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e13\u6ce8\u4e8eHubSpot 
CRM\u64cd\u4f5c\uff0c\u4ee5\u8bc4\u4f30ThorV2\u4e0eOpenAI\u548cAnthropic\u7684\u9886\u5148\u6a21\u578b\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0cThorV2\u5728\u5355\u4e2a\u548c\u591aAPI\u8c03\u7528\u4efb\u52a1\u7684\u51c6\u786e\u6027\u3001\u53ef\u9760\u6027\u3001\u5ef6\u8fdf\u548c\u6210\u672c\u6548\u7387\u65b9\u9762\u5747\u4f18\u4e8e\u73b0\u6709\u6a21\u578b\u3002\u6211\u4eec\u8fd8\u8868\u660e\uff0cThorV2\u5728\u591a\u6b65\u9aa4\u4efb\u52a1\u4e2d\u7684\u53ef\u9760\u6027\u66f4\u5f3a\uff0c\u5e76\u4e14\u53ef\u6269\u5c55\u6027\u66f4\u597d\uff0c\u76f8\u6bd4\u4f20\u7edf\u6a21\u578b\u5177\u6709\u660e\u663e\u4f18\u52bf\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4ee4\u4eba\u5174\u594b\u7684\u53ef\u80fd\u6027\uff0c\u5373\u4f7f\u7528\u663e\u8457\u66f4\u5c0f\u7684LLMs\u5b9e\u73b0\u6bd4\u5f53\u4eca\u6700\u4f73\u6a21\u578b\u66f4\u51c6\u786e\u7684\u529f\u80fd\u8c03\u7528\u3002\u8fd9\u4e9b\u8fdb\u5c55\u5bf9\u4e8e\u5f00\u53d1\u66f4\u5f3a\u5927\u7684AI\u52a9\u624b\u4ee5\u53caLLMs\u5728\u73b0\u5b9e\u573a\u666f\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2410.17922": "|**2024-10-23**|**Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models**|He Cao et.al.|[2410.17922](http://arxiv.org/abs/2410.17922)|**[link](https://github.com/idea-xl/g4d)**|\u968f\u7740\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u90e8\u7f72\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u6027\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u9632\u5fa1\u65b9\u6cd5\u5f80\u5f80\u5b58\u5728\u4e24\u4e2a\u5173\u952e\u95ee\u9898\uff1a(i) \u9632\u5fa1\u80fd\u529b\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u5728\u5316\u5b66\u7b49\u7279\u5b9a\u9886\u57df\u573a\u666f\u4e0b\uff0c\u7f3a\u4e4f\u4e13\u95e8\u77e5\u8bc6\u53ef\u80fd\u5bfc\u81f4\u5bf9\u6076\u610f\u67e5\u8be2\u751f\u6210\u6709\u5bb3\u54cd\u5e94\u3002(ii) 
\u8fc7\u5ea6\u9632\u5fa1\uff0c\u8fd9\u4f1a\u635f\u5bb3LLMs\u7684\u4e00\u822c\u5b9e\u7528\u6027\u548c\u54cd\u5e94\u6027\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8e\u591a\u4ee3\u7406\u7684\u9632\u5fa1\u6846\u67b6\uff0c\u79f0\u4e3aGuide for Defense (G4D)\uff0c\u8be5\u6846\u67b6\u5229\u7528\u51c6\u786e\u7684\u5916\u90e8\u4fe1\u606f\u63d0\u4f9b\u7528\u6237\u610f\u56fe\u7684\u65e0\u504f\u603b\u7ed3\u4ee5\u53ca\u5206\u6790\u6027\u5b89\u5168\u54cd\u5e94\u6307\u5bfc\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u6d41\u884c\u7684\u624b\u518c\u9003\u8131\u653b\u51fb\u548c\u826f\u6027\u6570\u636e\u96c6\u4e0a\uff0c\u6211\u4eec\u7684G4D\u53ef\u4ee5\u5728\u4e0d\u635f\u5bb3\u6a21\u578b\u4e00\u822c\u529f\u80fd\u7684\u60c5\u51b5\u4e0b\u589e\u5f3aLLM\u5728\u901a\u7528\u548c\u7279\u5b9a\u9886\u57df\u7684\u9c81\u68d2\u6027\u3002|\n", "2410.18975": "|**2024-10-24**|**Unbounded: A Generative Infinite Game of Character Life Simulation**|Jialu Li et.al.|[2410.18975](http://arxiv.org/abs/2410.18975)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u751f\u6210\u65e0\u9650\u6e38\u620f\u7684\u6982\u5ff5\uff0c\u8fd9\u662f\u4e00\u79cd\u89c6\u9891\u6e38\u620f\uff0c\u5b83\u8d85\u8d8a\u4e86\u4f20\u7edf\u56fa\u5b9a\u3001\u786c\u7f16\u7801\u7cfb\u7edf\u7684\u8fb9\u754c\uff0c\u901a\u8fc7\u4f7f\u7528\u751f\u6210\u6a21\u578b\u6765\u5b9e\u73b0\u3002\u53d7James P. 
Carse\u5173\u4e8e\u6709\u9650\u6e38\u620f\u548c\u65e0\u9650\u6e38\u620f\u533a\u522b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u5229\u7528\u6700\u8fd1\u5728\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u65b9\u9762\u7684\u8fdb\u5c55\u6765\u521b\u5efa\u300a\u65e0\u754c\u300b\u2014\u2014\u4e00\u6b3e\u5b8c\u5168\u5c01\u88c5\u5728\u751f\u6210\u6a21\u578b\u4e2d\u7684\u89d2\u8272\u751f\u6d3b\u6a21\u62df\u6e38\u620f\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u300a\u65e0\u754c\u300b\u53d7\u5230\u6c99\u76d2\u751f\u6d3b\u6a21\u62df\u6e38\u620f\u7684\u542f\u53d1\uff0c\u5141\u8bb8\u4f60\u901a\u8fc7\u5582\u517b\u3001\u73a9\u800d\u548c\u5f15\u5bfc\u7b49\u65b9\u5f0f\u4e0e\u4f60\u5728\u865a\u62df\u4e16\u754c\u4e2d\u7684\u81ea\u4e3b\u865a\u62df\u89d2\u8272\u4e92\u52a8\uff0c\u5176\u4e2d\u4e00\u4e9b\u673a\u5236\u662f\u5f00\u653e\u5f0f\u7684\uff0c\u5e76\u4e14\u53ef\u4ee5\u662f\u7a81\u53d1\u6027\u7684\u3002\u4e3a\u4e86\u5f00\u53d1\u300a\u65e0\u754c\u300b\uff0c\u6211\u4eec\u5728\u8bed\u8a00\u6a21\u578b\u548c\u89c6\u89c9\u751f\u6210\u9886\u57df\u63d0\u51fa\u4e86\u6280\u672f\u4e0a\u7684\u521b\u65b0\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\uff1a(1)\u4e00\u79cd\u4e13\u95e8\u8bbe\u8ba1\u7684\u3001\u7ecf\u8fc7\u84b8\u998f\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u5b9e\u65f6\u52a8\u6001\u751f\u6210\u6e38\u620f\u673a\u5236\u3001\u53d9\u4e8b\u548c\u89d2\u8272\u4e92\u52a8\uff0c(2)\u4e00\u79cd\u65b0\u7684\u52a8\u6001\u533a\u57df\u56fe\u50cf\u63d0\u793a\u9002\u914d\u5668\uff08IP-Adapter\uff09\uff0c\u7528\u4e8e\u89c6\u89c9\u6a21\u578b\uff0c\u786e\u4fdd\u89d2\u8272\u5728\u591a\u4e2a\u73af\u5883\u4e2d\u7684\u89c6\u89c9\u751f\u6210\u65e2\u4e00\u81f4\u53c8\u7075\u6d3b\u3002\u6211\u4eec\u901a\u8fc7\u5b9a\u6027\u548c\u5b9a\u91cf\u5206\u6790\u5bf9\u6211\u4eec\u7684\u7cfb\u7edf\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u89d2\u8272\u751f\u6d3b\u6a21\u62df\u3001\u7528\u6237\u6307\u4ee4\u9075\u5faa\u3001\u53d9\u4e8b\u
8fde\u8d2f\u6027\u548c\u89c6\u89c9\u4e00\u81f4\u6027\u65b9\u9762\uff0c\u4e0e\u4f20\u7edf\u76f8\u5173\u65b9\u6cd5\u76f8\u6bd4\u6709\u663e\u8457\u6539\u8fdb\u3002|\n", "2410.18967": "|**2024-10-24**|**Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms**|Zhangheng Li et.al.|[2410.18967](http://arxiv.org/abs/2410.18967)|null|\u6784\u5efa\u4e00\u4e2a\u901a\u7528\u7684\u7528\u6237\u754c\u9762\uff08UI\uff09\u7406\u89e3\u6a21\u578b\u9762\u4e34\u7740\u8bf8\u591a\u6311\u6218\uff0c\u5305\u62ec\u5e73\u53f0\u591a\u6837\u6027\u3001\u5206\u8fa8\u7387\u53d8\u5316\u548c\u6570\u636e\u9650\u5236\u7b49\u95ee\u9898\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aFerret-UI 2\u7684\u65b0\u6a21\u578b\uff0c\u8fd9\u662f\u4e00\u79cd\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u8de8\u591a\u79cd\u5e73\u53f0\u7684\u901a\u7528UI\u7406\u89e3\uff0c\u5305\u62eciPhone\u3001Android\u3001iPad\u3001\u7f51\u9875\u548cApple TV\u7b49\u5e73\u53f0\u3002Ferret-UI 2\u5728\u539f\u6709Ferret-UI\u7684\u57fa\u7840\u4e0a\u5f15\u5165\u4e86\u4e09\u9879\u5173\u952e\u521b\u65b0\uff1a\u652f\u6301\u591a\u79cd\u5e73\u53f0\u7c7b\u578b\u3001\u901a\u8fc7\u81ea\u9002\u5e94\u7f29\u653e\u5b9e\u73b0\u9ad8\u5206\u8fa8\u7387\u611f\u77e5\uff0c\u4ee5\u53ca\u5229\u7528GPT-4o\u7ed3\u5408\u96c6\u5408\u6807\u8bb0\u89c6\u89c9\u63d0\u793a\u751f\u6210\u9ad8\u7ea7\u4efb\u52a1\u8bad\u7ec3\u6570\u636e\u3002\u8fd9\u4e9b\u6539\u8fdb\u4f7fFerret-UI 
2\u80fd\u591f\u6267\u884c\u590d\u6742\u7684\u3001\u4ee5\u7528\u6237\u4e3a\u4e2d\u5fc3\u7684\u4ea4\u4e92\uff0c\u4f7f\u5176\u5728\u4e0d\u65ad\u6269\u5c55\u7684\u5e73\u53f0\u751f\u6001\u7cfb\u7edf\u4e2d\u5177\u6709\u9ad8\u5ea6\u7684\u901a\u7528\u6027\u548c\u9002\u5e94\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8bc1\u660e\uff0c\u5728\u6307\u5411\u3001\u5b9a\u4f4d\u3001\u4ee5\u7528\u6237\u4e3a\u4e2d\u5fc3\u7684\u9ad8\u7ea7\u4efb\u52a1\uff08\u5305\u542b9\u4e2a\u5b50\u4efb\u52a1\u00d75\u4e2a\u5e73\u53f0\uff09\u3001GUIDE\u4e0b\u4e00\u6b65\u9884\u6d4b\u6570\u636e\u96c6\u548cGUI-World\u591a\u5e73\u53f0\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cFerret-UI 2\u663e\u8457\u4f18\u4e8eFerret-UI\uff0c\u5e76\u4e14\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u8de8\u5e73\u53f0\u8fc1\u79fb\u80fd\u529b\u3002|\n", "2410.18966": "|**2024-10-24**|**Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions**|Yujuan Fu et.al.|[2410.18966](http://arxiv.org/abs/2410.18966)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u663e\u793a\u51fa\u4f5c\u4e3a\u901a\u7528\u4efb\u52a1\u89e3\u51b3\u8005\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u662f\u5728\u5927\u91cf\u6570\u636e\u4e0a\u8fdb\u884c\u8bad\u7ec3\u7684\uff0c\u56e0\u6b64\u5bf9\u5176\u8bc4\u4f30\u7684\u4e00\u4e2a\u91cd\u8981\u95ee\u9898\u662f\u6570\u636e\u6c61\u67d3\u95ee\u9898\uff0c\u5373\u8bad\u7ec3\u6570\u636e\u548c\u8bc4\u4f30\u6570\u636e\u96c6\u4e4b\u95f4\u7684\u91cd\u53e0\u4f1a\u5938\u5927\u6027\u80fd\u8bc4\u4f30\u3002\u867d\u7136\u5df2\u7ecf\u5f00\u53d1\u4e86\u591a\u79cd\u65b9\u6cd5\u6765\u8bc6\u522b\u6570\u636e\u6c61\u67d3\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u7279\u5b9a\u7684\u5047\u8bbe\uff0c\u800c\u8fd9\u4e9b\u5047\u8bbe\u53ef\u80fd\u5e76\u4e0d\u666e\u904d\u9002\u7528\u4e8e\u4e0d\u540c\u7684\u8bbe\u7f6e\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u5dee\u8dd
d\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u56de\u987e\u4e8647\u7bc7\u5173\u4e8e\u6570\u636e\u6c61\u67d3\u68c0\u6d4b\u7684\u8bba\u6587\uff0c\u5bf9\u5176\u4e2d\u7684\u57fa\u7840\u5047\u8bbe\u8fdb\u884c\u4e86\u5206\u7c7b\uff0c\u5e76\u8bc4\u4f30\u4e86\u5b83\u4eec\u662f\u5426\u7ecf\u8fc7\u4e25\u683c\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u786e\u5b9a\u5e76\u5206\u6790\u4e86\u516b\u7c7b\u5047\u8bbe\uff0c\u5e76\u4ee5\u4e09\u4e2a\u5047\u8bbe\u4f5c\u4e3a\u6848\u4f8b\u7814\u7a76\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u5728\u5bf9\u7528\u4e8e\u9884\u8bad\u7ec3LLMs\u7684\u5b9e\u4f8b\u8fdb\u884c\u5206\u7c7b\u65f6\uff0c\u57fa\u4e8e\u8fd9\u4e09\u79cd\u5047\u8bbe\u7684\u68c0\u6d4b\u65b9\u6cd5\u7684\u8868\u73b0\u63a5\u8fd1\u4e8e\u968f\u673a\u731c\u6d4b\uff0c\u8fd9\u8868\u660e\u5f53\u524d\u7684LLMs\u5b66\u4e60\u7684\u662f\u6570\u636e\u5206\u5e03\u800c\u4e0d\u662f\u8bb0\u5fc6\u4e2a\u522b\u5b9e\u4f8b\u3002\u603b\u4f53\u800c\u8a00\uff0c\u8fd9\u9879\u5de5\u4f5c\u5f3a\u8c03\u4e86\u65b9\u6cd5\u660e\u786e\u9648\u8ff0\u5176\u57fa\u7840\u5047\u8bbe\u5e76\u5728\u5404\u79cd\u573a\u666f\u4e0b\u6d4b\u8bd5\u5176\u6709\u6548\u6027\u7684\u91cd\u8981\u6027\u3002|\n", "2410.18963": "|**2024-10-24**|**OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning**|Xiaoqiang Wang 
et.al.|[2410.18963](http://arxiv.org/abs/2410.18963)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u81ea\u52a8\u5316\u590d\u6742\u4efb\u52a1\u5982\u7f51\u9875\u6d4f\u89c8\u548c\u6e38\u620f\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u8de8\u591a\u6837\u5316\u5e94\u7528\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86OSCAR\uff1a\u901a\u8fc7\u72b6\u6001\u611f\u77e5\u63a8\u7406\u548c\u91cd\u89c4\u5212\u7684\u64cd\u4f5c\u7cfb\u7edf\u63a7\u5236\u3002OSCAR\u662f\u4e00\u79cd\u901a\u7528\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u6807\u51c6\u5316\u7684\u63a7\u5236\u65b9\u5f0f\uff08\u5982\u9f20\u6807\u548c\u952e\u76d8\u8f93\u5165\uff09\u81ea\u4e3b\u5bfc\u822a\u548c\u4e0e\u5404\u79cd\u684c\u9762\u548c\u79fb\u52a8\u5e94\u7528\u7a0b\u5e8f\u8fdb\u884c\u4ea4\u4e92\uff0c\u540c\u65f6\u5904\u7406\u5c4f\u5e55\u56fe\u50cf\u4ee5\u5b8c\u6210\u7528\u6237\u547d\u4ee4\u3002OSCAR\u5c06\u4eba\u7c7b\u6307\u4ee4\u8f6c\u6362\u4e3a\u53ef\u6267\u884c\u7684Python\u4ee3\u7801\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u7684\u7cbe\u786e\u63a7\u5236\u3002\u4e3a\u4e86\u589e\u5f3a\u7a33\u5b9a\u6027\u548c\u9002\u5e94\u6027\uff0cOSCAR\u4f5c\u4e3a\u4e00\u4e2a\u72b6\u6001\u673a\u8fd0\u884c\uff0c\u5e76\u914d\u5907\u4e86\u9519\u8bef\u5904\u7406\u673a\u5236\u548c\u52a8\u6001\u4efb\u52a1\u91cd\u89c4\u5212\u529f\u80fd\uff0c\u4f7f\u5176\u80fd\u591f\u9ad8\u6548\u5730\u5b9e\u65f6\u8c03\u6574\u4ee5\u5e94\u5bf9\u53cd\u9988\u548c\u5f02\u5e38\u60c5\u51b5\u3002\u6211\u4eec\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u5728\u591a\u6837\u5316\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u5c55\u793a\u4e86OSCAR\u7684\u6709\u6548\u6027\uff0c\u5728\u8fd9\u4e9b\u6d4b\u8bd5\u4e2d\uff
0c\u5b83\u5c06\u590d\u6742\u7684\u64cd\u4f5c\u6d41\u7a0b\u7b80\u5316\u4e3a\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u547d\u4ee4\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u7528\u6237\u7684\u751f\u4ea7\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u5728\u53d1\u8868\u540e\u5f00\u6e90\u3002|\n", "2410.18957": "|**2024-10-24**|**Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code**|Jipeng Zhang et.al.|[2410.18957](http://arxiv.org/abs/2410.18957)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u9ad8\u8d44\u6e90\u7f16\u7a0b\u8bed\u8a00\uff08HRPLs\uff09\u5982Python\u7684\u4ee3\u7801\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4f4e\u8d44\u6e90\u7f16\u7a0b\u8bed\u8a00\uff08LRPLs\uff09\u5982Racket\u6216D\u4e0a\u7684\u8868\u73b0\u5219\u663e\u8457\u900a\u8272\u3002\u8fd9\u79cd\u6027\u80fd\u5dee\u8ddd\u52a0\u5267\u4e86\u6570\u5b57\u9e3f\u6c9f\uff0c\u963b\u788d\u4e86\u4f7f\u7528LRPLs\u7684\u5f00\u53d1\u8005\u4eceLLM\u7684\u8fdb\u6b65\u4e2d\u53d7\u76ca\uff0c\u5e76\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u5f3a\u5316\u4e86\u672a\u5145\u5206\u4ee3\u8868\u7684\u7f16\u7a0b\u793e\u533a\u4e4b\u95f4\u7684\u521b\u65b0\u5dee\u5f02\u3002\u867d\u7136\u4e3aLRPLs\u751f\u6210\u989d\u5916\u8bad\u7ec3\u6570\u636e\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u4f46\u5b83\u9762\u4e34\u7740\u4e24\u4e2a\u5173\u952e\u6311\u6218\uff1a\u4eba\u5de5\u6807\u6ce8\u65e2\u8d39\u65f6\u53c8\u6602\u8d35\uff0c\u800cLLM\u751f\u6210\u7684LRPL\u4ee3\u7801\u8d28\u91cf\u901a\u5e38\u8f83\u5dee\u3002\u8fd9\u4e00\u95ee\u9898\u7684\u6839\u672c\u539f\u56e0\u5728\u4e8e\u81ea\u7136\u8bed\u8a00\u5230\u7f16\u7a0b\u8bed\u8a00\u7684\u5dee\u8ddd\uff08NL-PL 
Gap\uff09\uff0c\u5728LRPLs\u4e2d\u5c24\u5176\u660e\u663e\uff0c\u56e0\u4e3a\u5bf9\u9f50\u7684\u6570\u636e\u6709\u9650\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aBridge-Coder\u7684\u65b0\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528LLMs\u7684\u5185\u5728\u80fd\u529b\u6765\u589e\u5f3a\u5176\u5728LRPLs\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e24\u4e2a\u5173\u952e\u9636\u6bb5\u3002\u9996\u5148\u662f\u6865\u63a5\u751f\u6210\uff0c\u901a\u8fc7\u5229\u7528LLMs\u5bf9\u4e00\u822c\u77e5\u8bc6\u7684\u7406\u89e3\u3001\u5bf9HRPLs\u7684\u719f\u7ec3\u7a0b\u5ea6\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\u6765\u521b\u5efa\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u96c6\u3002\u7136\u540e\u662f\u6865\u63a5\u5bf9\u9f50\uff0c\u9010\u6b65\u6539\u5584\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u4e0eLRPLs\u4e4b\u95f4\u7684\u5bf9\u9f50\u3002\u5b9e\u9a8c\u7ed3\u679c\u5728\u591a\u79cdLRPLs\u4e2d\u663e\u793a\uff0cBridge-Coder\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u548c\u6cdb\u5316\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8be6\u7ec6\u5206\u6790\u4e86\u65b9\u6cd5\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff0c\u4e3a\u672a\u6765\u89e3\u51b3\u4e0eLRPLs\u76f8\u5173\u6311\u6218\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2410.18955": "|**2024-10-24**|**BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning**|Yujuan Velvin Fu 
et.al.|[2410.18955](http://arxiv.org/abs/2410.18955)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982ChatGPT\u901a\u8fc7\u5728\u5927\u89c4\u6a21\u548c\u591a\u6837\u5316\u7684\u6307\u4ee4\u8ddf\u968f\u8bed\u6599\u5e93\u4e0a\u8fdb\u884c\u5fae\u8c03\uff0c\u80fd\u591f\u6cdb\u5316\u5230\u65b0\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7ecf\u8fc7\u6307\u4ee4\u5fae\u8c03\u7684LLMs\u5728\u9700\u8981\u9886\u57df\u77e5\u8bc6\u3001\u7ec6\u7c92\u5ea6\u6587\u672c\u7406\u89e3\u548c\u7ed3\u6784\u5316\u6570\u636e\u63d0\u53d6\u7684\u4e13\u4e1a\u533b\u5b66\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u4efb\u52a1\u4e2d\u5f80\u5f80\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\uff1a(1) \u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u63d0\u793a\u683c\u5f0f\uff0c\u9002\u7528\u4e8e7\u4e2a\u91cd\u8981\u7684NLU\u4efb\u52a1\uff0c\u901a\u8fc7\u8de8\u5ea6\u63d0\u53d6\u548c\u591a\u9009\u9898\u95ee\u7b54\uff08QA\uff09\u6765\u5b9e\u73b0\uff1b(2) \u521b\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6MNLU-Instruct\uff0c\u5229\u7528\u4e86\u591a\u79cd\u73b0\u6709\u7684\u5f00\u6e90\u533b\u5b66NLU\u8bed\u6599\u5e93\uff1b(3) 
\u901a\u8fc7\u5728MNLU-Instruct\u4e0a\u5bf9BioMistral\u8fdb\u884c\u5fae\u8c03\uff0c\u5f00\u53d1\u4e86BioMistral-NLU\uff0c\u4e00\u4e2a\u5177\u6709\u901a\u7528\u6027\u7684\u533b\u5b66NLU\u6a21\u578b\u3002\u6211\u4eec\u5728\u96f6\u6837\u672c\u8bbe\u7f6e\u4e0b\u8bc4\u4f30\u4e86BioMistral-NLU\uff0c\u5728\u4e24\u4e2a\u5e7f\u6cdb\u91c7\u7528\u7684\u533b\u5b66NLU\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5373\u751f\u7269\u533b\u5b66\u8bed\u8a00\u7406\u89e3\u8bc4\u4f30\uff08BLUE\uff09\u548c\u751f\u7269\u533b\u5b66\u8bed\u8a00\u7406\u89e3\u548c\u63a8\u7406\u57fa\u51c6\uff08BLURB\uff09\u4e2d\u76846\u4e2a\u91cd\u8981NLU\u4efb\u52a1\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684BioMistral-NLU\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u539f\u59cb\u7684BioMistral\u4ee5\u53ca\u4e13\u6709\u7684LLMs\u2014\u2014ChatGPT\u548cGPT-4\u3002\u6211\u4eec\u4e0e\u6570\u636e\u96c6\u65e0\u5173\u7684\u63d0\u793a\u7b56\u7565\u548c\u5728\u5404\u79cdNLU\u4efb\u52a1\u4e0a\u7684\u6307\u4ee4\u5fae\u8c03\u6b65\u9aa4\u589e\u5f3a\u4e86LLMs\u5728\u5404\u79cd\u533b\u5b66NLU\u4efb\u52a1\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u3002\u6d88\u878d\u5b9e\u9a8c\u663e\u793a\uff0c\u5373\u4f7f\u603b\u7684\u8bad\u7ec3\u5b9e\u4f8b\u6570\u91cf\u4fdd\u6301\u4e0d\u53d8\uff0c\u6307\u4ee4\u5fae\u8c03\u7684\u4efb\u52a1\u79cd\u7c7b\u8d8a\u5e7f\uff0c\u4e0b\u6e38\u96f6\u6837\u672c\u6cdb\u5316\u80fd\u529b\u4e5f\u8d8a\u5f3a\u3002|\n", "2410.18952": "|**2024-10-24**|**Dynamic Vocabulary Pruning in Early-Exit LLMs**|Jort Vincenti 
et.al.|[2410.18952](http://arxiv.org/abs/2410.18952)|**[link](https://github.com/matteonulli/vocabulary_pruning)**|**\u589e\u52a0\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89c4\u6a21\u5df2\u88ab\u8bc1\u660e\u53ef\u4ee5\u63d0\u9ad8\u5176\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e5f\u5e26\u6765\u4e86\u63a8\u7406\u901f\u5ea6\u53d8\u6162\u548c\u6210\u672c\u589e\u52a0\u7684\u95ee\u9898\u3002\u65e9\u671f\u9000\u51fa\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5728\u4e2d\u95f4\u5c42\u8fdb\u884c\u9884\u6d4b\u6765\u63d0\u9ad8LLM\u63a8\u7406\u7684\u6548\u7387\u3002\u7136\u800c\uff0c\u73b0\u4ee3LLMs\u4e2d\u7684\u5927\u8bcd\u6c47\u91cf\u4f7f\u5f97\u6240\u9700\u7684\u7f6e\u4fe1\u5ea6\u4f30\u8ba1\u5728\u8ba1\u7b97\u4e0a\u975e\u5e38\u6602\u8d35\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u6548\u7387\u63d0\u5347\u7684\u6548\u679c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u5728\u6d4b\u8bd5\u65f6\u52a8\u6001\u526a\u679d\u8bcd\u6c47\u8868\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u8bcd\u6c47\u8868\u5728\u6700\u521d\u7684\u67d0\u4e00\u5c42\u88ab\u526a\u679d\uff0c\u5e76\u5728\u6574\u4e2a\u524d\u5411\u4f20\u9012\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u8f83\u5c0f\u7684\u8bcd\u6c47\u8868\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u8fd9\u79cd\u540e\u5904\u7406\u52a8\u6001\u8bcd\u6c47\u8868\u526a\u679d\u65b9\u6cd5\u63d0\u9ad8\u4e86\u65e9\u671f\u9000\u51faLLM\u4e2d\u7f6e\u4fe1\u5ea6\u4f30\u8ba1\u7684\u6548\u7387\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u6027\u80fd\u3002**|\n", "2410.18927": "|**2024-10-24**|**SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models**|Zonghao Ying 
et.al.|[2410.18927](http://arxiv.org/abs/2410.18927)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7528\u6237\u751f\u6210\u6709\u5bb3\u8f93\u51fa\u65b9\u9762\u8868\u73b0\u51fa\u5f3a\u70c8\u7684\u5b89\u5168\u9690\u60a3\uff0c\u8fd9\u4fc3\u4f7f\u4e86\u5b89\u5168\u8bc4\u4f30\u57fa\u51c6\u7684\u53d1\u5c55\u3002\u7136\u800c\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u73b0\u6709\u7684MLLMs\u5b89\u5168\u57fa\u51c6\u5b58\u5728\u67e5\u8be2\u8d28\u91cf\u4f4e\u548c\u8bc4\u4f30\u53ef\u9760\u6027\u5dee\u7684\u95ee\u9898\uff0c\u8fd9\u4e9b\u95ee\u9898\u9650\u5236\u4e86\u5bf9MLLMs\u5b89\u5168\u5f71\u54cd\u7684\u68c0\u6d4b\uff0c\u56e0\u4e3a\u968f\u7740MLLMs\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5df2\u663e\u5f97\u4e0d\u8db3\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\\toolns\u7684\u7efc\u5408\u6846\u67b6\uff0c\u7528\u4e8e\u5bf9MLLMs\u8fdb\u884c\u5b89\u5168\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u5168\u9762\u7684\u6709\u5bb3\u67e5\u8be2\u6570\u636e\u96c6\u548c\u4e00\u79cd\u81ea\u52a8\u8bc4\u4f30\u534f\u8bae\uff0c\u5206\u522b\u65e8\u5728\u89e3\u51b3\u4e0a\u8ff0\u95ee\u9898\u3002\u6211\u4eec\u9996\u5148\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5b89\u5168\u6570\u636e\u96c6\u751f\u6210\u7ba1\u9053\uff0c\u5728\u8fd9\u4e2a\u7ba1\u9053\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528\u4e00\u7ec4LLM\u8bc4\u5224\u8005\u6765\u8bc6\u522b\u548c\u5206\u7c7b\u5bf9MLLMs\u6700\u5177\u5371\u5bb3\u6027\u548c\u591a\u6837\u6027\u7684\u98ce\u9669\u573a\u666f\uff1b\u57fa\u4e8e\u8fd9\u79cd\u5206\u7c7b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8981\u6c42\u8fd9\u4e9b\u8bc4\u5224\u8005\u76f8\u5e94\u5730\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u6709\u5bb3\u67e5\u8be2\uff0c\u4ece\u800c\u4ea7\u751f\u4e8623\u79cd\u98ce\u9669\u573a\u666f\u548c2300\u4e2a\u591a\u6a21\u6001\u6709\u5bb3\u67e5\u8be2\u5bf9\u3002\u5728\u5b89\u5168\u8bc4\u4f30\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u501f\u9274\u53f8\u6cd5\u7a0b\u5e8
f\u4e2d\u7684\u966a\u5ba1\u56e2\u5236\u5ea6\uff0c\u5f00\u521b\u4e86\u4e00\u79cd\u966a\u5ba1\u56e2\u5ba1\u8bae\u8bc4\u4f30\u534f\u8bae\uff0c\u8be5\u534f\u8bae\u91c7\u7528\u534f\u4f5c\u5f0fLLM\u6765\u8bc4\u4f30\u76ee\u6807\u6a21\u578b\u662f\u5426\u8868\u73b0\u51fa\u7279\u5b9a\u7684\u6709\u5bb3\u884c\u4e3a\uff0c\u4ece\u800c\u63d0\u4f9b\u53ef\u9760\u4e14\u65e0\u504f\u89c1\u7684\u5185\u5bb9\u5b89\u5168\u98ce\u9669\u8bc4\u4f30\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u57fa\u51c6\u8fd8\u53ef\u4ee5\u6269\u5c55\u5230\u97f3\u9891\u6a21\u6001\uff0c\u663e\u793a\u51fa\u9ad8\u5ea6\u7684\u53ef\u6269\u5c55\u6027\u548c\u6f5c\u529b\u3002\u57fa\u4e8e\u6211\u4eec\u7684\u6846\u67b6\uff0c\u6211\u4eec\u5bf915\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u5f00\u6e90MLLMs\u548c6\u79cd\u5546\u4e1aMLLMs\uff08\u5982GPT-4o\u3001Gemini\uff09\u8fdb\u884c\u4e86\u5927\u89c4\u6a21\u5b9e\u9a8c\uff0c\u63ed\u793a\u4e86\u73b0\u6709MLLMs\u4e2d\u5b58\u5728\u7684\u5e7f\u6cdb\u5b89\u5168\u95ee\u9898\uff0c\u5e76\u5b9e\u4f8b\u5316\u4e86\u5173\u4e8eMLLMs\u5b89\u5168\u6027\u80fd\u7684\u4e00\u4e9b\u89c1\u89e3\uff0c\u5982\u56fe\u50cf\u8d28\u91cf\u548c\u53c2\u6570\u5927\u5c0f\u3002|\n", "2410.18921": "|**2024-10-24**|**From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems**|A M Muntasir Rahman 
et.al.|[2410.18921](http://arxiv.org/abs/2410.18921)|null|Consider a math problem: 'Lily received 3 cookies from her best friend yesterday and ate 5 at breakfast. Today, her friend gave her 3 more cookies. How many cookies does Lily have now?' In prior studies, many large language models (LLMs) arrive at the answer '1' by computing '3-5+3'. From a human perspective, however, the problem is inherently flawed: if Lily initially had only 3 cookies, she could not have eaten 5 at breakfast. This discrepancy raises a key question: are current LLMs merely blind solvers that mechanically apply mathematical operations without deeper reasoning, or can they act as logical thinkers that identify logical inconsistencies? To explore this question, we propose FaultyMath, a benchmark dataset of diverse faulty math problems that (i) spans multiple mathematical categories, such as algebra, geometry, and number theory; (ii) covers different difficulty levels; and (iii) draws on different sources of faultiness, including common-sense violations, ambiguous statements, and mathematical contradictions. We use FaultyMath to evaluate a broad range of LLMs, including open-source, closed-source, and math-specialized models, along three dimensions: (i) how accurately can these models detect faulty math problems without being explicitly prompted? (ii) when given hints about the validity of a problem, whether correct or misleading, to what extent can LLMs adapt to become reliable logical thinkers? (iii) when LLMs identify a math problem as faulty, how trustworthy are the explanations they generate? Through extensive experiments and detailed analysis, our results show that existing LLMs largely behave as blind solvers and lack the reasoning ability required of logical thinkers.|\n", 
"2410.18908": "|**2024-10-25**|**A Survey on Speech Large Language Models**|Jing Peng 
et.al.|[2410.18908](http://arxiv.org/abs/2410.18908)|null|Large language models (LLMs) excel at contextual understanding and multitask processing. Researchers have therefore sought to integrate LLMs into the broader framework of spoken language understanding (SLU). Unlike the traditional approach of generating text with automatic speech recognition (ASR) and processing it in a cascade, recent work focuses on architectures centered on audio feature extraction combined with multimodal information fusion and LLM inference, the so-called Speech LLMs. This approach extracts richer audio features while enabling end-to-end fusion of the audio and text modalities, allowing deeper understanding and reasoning over audio data. This paper traces the development of Speech LLMs and provides an in-depth analysis of system architectures and training strategies. Through an extensive survey and a series of targeted experiments, it evaluates the progress of Speech LLMs in rich audio transcription and their potential for cross-task integration in SLU. It also identifies key challenges revealed by the experiments, such as the dormancy of LLMs under certain conditions. The paper further examines training strategies for Speech LLMs and proposes potential solutions based on these findings, offering valuable insights and references for future research in this area and for applying LLMs in multimodal settings.|\n", 
"2410.19733": "|**2024-10-25**|**The Potential and Value of AI Chatbot in Personalized Cognitive Training**|Zilong Wang et.al.|[2410.19733](http://arxiv.org/abs/2410.19733)|null|In recent years, accelerating global population aging has increased the incidence of cognitive disorders such as Alzheimer's disease, posing a major public health challenge. Although no effective treatment can currently reverse Alzheimer's disease, prevention and early intervention, including cognitive training, are crucial. This report explores the potential of AI chatbots to enhance personalized cognitive training. We introduce ReMe, a web-based framework for creating AI chatbots that facilitate cognitive training research, particularly episodic memory tasks derived from personal life logs. By leveraging large language models, ReMe delivers a more user-friendly, interactive, and personalized training experience. Case studies show that ReMe effectively engages users through life recall and open-ended language puzzles, highlighting its potential to improve cognitive training design. Despite these encouraging results, further research is needed to validate training effectiveness through large-scale studies that include assessments of cognitive ability. Overall, ReMe offers a promising approach to personalized cognitive training, using AI to meet the growing demand for non-pharmacological interventions for cognitive health, with future work aiming to broaden its applications and efficacy.|\n", 
"2410.19730": "|**2024-10-25**|**Counting Ability of Large Language Models and Impact of Tokenization**|Xiang Zhang et.al.|[2410.19730](http://arxiv.org/abs/2410.19730)|**[link](https://github.com/juntaic7/impact-of-tokenization-in-the-counting-ability-of-language-models)**|Transformers, the backbone of modern large language models (LLMs), face inherent architectural limitations that constrain their reasoning capabilities. Unlike recurrent networks, Transformers lack recurrent connections, confining them to constant-depth computation. This restriction places them in the complexity class TC$^0$, making them theoretically incapable of solving tasks whose required reasoning depth grows with input length. Counting, a fundamental component of many reasoning tasks, likewise requires reasoning depth to grow linearly with task complexity in order to be performed inductively. Although prior studies have established upper bounds on the counting ability of Transformer-based expert models, these findings do not transfer directly to general-purpose LLMs because their reasoning mechanisms differ. Recent work has shown that chain-of-thought (CoT) reasoning can partly alleviate the architectural limitations of Transformers on counting tasks. However, the role of tokenization in these models has received little attention. Unlike expert models, which typically use character-level tokenization, LLMs usually rely on byte-level (BPE) tokenizers, which fundamentally changes how reasoning is processed. Our work investigates the impact of tokenization on the counting ability of LLMs and reveals substantial performance variation depending on the tokenization scheme. We provide both theoretical and experimental analyses, offering insights into how choosing an appropriate tokenization method can enhance a model's theoretical computability, and thereby inspiring the design of new tokenization methods that improve reasoning in LLMs.|\n", 
"2410.19727": "|**2024-10-25**|**FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning**|Nicole Cho 
et.al.|[2410.19727](http://arxiv.org/abs/2410.19727)|null|Generating financial intelligence from vast data sources has typically relied on traditional methods such as knowledge-graph construction or database engineering. Recently, large language models (LLMs) specialized for the financial domain have emerged. While promising, these advances still have limitations, such as high inference costs, hallucinations, and the complexity of analyzing high-dimensional financial data concurrently. This motivated us to develop FISHNET (Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning), an agentic architecture that carries out highly complex analytical tasks over more than 98,000 regulatory filings that vary widely in semantics, data hierarchy, and format. FISHNET performs strongly on financial insight generation (success rate 61.8%, routing 5.0%, RAG R-Precision 45.6%). We conduct rigorous ablations to empirically demonstrate FISHNET's success, the importance of each agent, and the optimized performance of the full agent assembly. Our modular architecture can be applied to a wide range of use cases, providing scalability, flexibility, and the data integrity that is critical for financial tasks.|\n", 
"2410.19720": "|**2024-10-25**|**2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision**|Shilong Li et.al.|[2410.19720](http://arxiv.org/abs/2410.19720)|null|Recent advances in Direct Preference Optimization (DPO) have significantly improved the alignment of large language models (LLMs) with human preferences, owing to its simplicity and effectiveness. However, existing methods typically optimize a scalar score or ranking reward, overlooking the multi-dimensional nature of human preferences. In this work, we propose extending the preference in DPO to two dimensions: segments and aspects. We first introduce HelpSteer-2D, a dataset with 2-dimensional supervision. For the segment dimension, we split each response into sentences and assign a score to each segment. For the aspect dimension, we carefully design several criteria that cover the standards used to assess response quality. Using these 2-dimensional signals as feedback, we develop the 2D-DPO framework, which decomposes the overall objective into multi-segment and multi-aspect objectives. Extensive experiments on popular benchmarks show that 2D-DPO outperforms methods that optimize scalar or 1-dimensional preferences.|\n", 
"2410.19702": "|**2024-10-25**|**TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning**|Xiangyu Zeng et.al.|[2410.19702](http://arxiv.org/abs/2410.19702)|null|Multimodal large language models (MLLMs) have demonstrated impressive performance on short-video understanding, but understanding long videos remains challenging. This paper proposes a set of new designs for adapting existing short-video MLLMs to long-video understanding, including a simple yet efficient framework for processing long video sequences, a high-quality video dataset for grounded tuning of MLLMs, and a carefully designed instruction-tuning task that explicitly incorporates grounding supervision into the traditional QA format. Specifically, building on VideoChat, we propose our long-video MLLM, VideoChat-T, which implements token shuffling to compress long video tokens and introduces Temporal Adaptive Position Encoding (TAPE) to enhance the temporal awareness of visual representations. We also introduce TimePro, a comprehensive grounding-centric instruction-tuning dataset comprising 9 tasks and 349k high-quality grounded annotations. Notably, we design a new instruction-tuning task type, Temporal Grounded Caption, which performs detailed video description together with prediction of the corresponding timestamps. This explicit prediction of temporal location guides the MLLM to attend to the correct visual content when generating descriptions, thereby reducing the risk of hallucinations caused by the LLM. Experimental results show that TimeSuite successfully improves the long-video understanding of short-video MLLMs, with gains of 5.6% and 6.8% on the Egoschema and VideoMME benchmarks, respectively. In addition, VideoChat-T exhibits strong zero-shot temporal grounding, significantly outperforming existing state-of-the-art MLLMs. After fine-tuning, it performs on par with traditional supervised expert models.|\n", 
"2410.19697": "|**2024-10-25**|**IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation**|Kaixian Qu 
et.al.|[2410.19697](http://arxiv.org/abs/2410.19697)|null|Navigating efficiently to a target object in an unexplored environment is a key skill for general-purpose intelligent robots. Recent approaches to this object goal navigation problem adopt a modular strategy that combines classical exploration algorithms, notably frontier exploration, with learned semantic mapping/exploration modules. This paper introduces a novel informative path planning and 3D object probability mapping approach. The mapping module computes the probability of the object of interest through semantic segmentation and a Bayes filter. It also stores probabilities of common objects, which semantically guide exploration based on common-sense priors from a large language model. The planner terminates when the current viewpoint captures enough voxels identified with high confidence as the object of interest. Although our planner follows a zero-shot approach, it achieves state-of-the-art performance on the Habitat ObjectNav Challenge 2023, exceeding other works by more than 20% in Success weighted by Path Length (SPL) and Soft SPL. We also validate its effectiveness on real robots. Project webpage: https://ippon-paper.github.io/|\n", 
"2410.19694": "|**2024-10-25**|**Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs**|Yifei Zhang et.al.|[2410.19694](http://arxiv.org/abs/2410.19694)|null|Fine-tuning large language models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. However, the enormous size of LLMs poses significant challenges in computational complexity and resource requirements. Low-Rank Adaptation (LoRA) has emerged as a promising solution, yet a gap remains between the practical performance of low-rank adaptation and its theoretical optimum. In this work, we propose eXtreme Gradient Boosting LoRA (XGBLoRA), a new framework that bridges this gap by leveraging the power of ensemble learning. Inspired by gradient boosting, XGBLoRA iteratively learns and merges a sequence of LoRA adaptations to refine model predictions. It outperforms standard LoRA while retaining the computational efficiency of rank-1 adaptation. We provide a theoretical analysis establishing the convergence and optimality of our approach and conduct extensive experiments on a range of natural language processing tasks. The results show that XGBLoRA consistently outperforms standard LoRA and achieves performance comparable to full fine-tuning with significantly fewer trainable parameters. This work advances parameter-efficient fine-tuning of LLMs and offers a promising solution for adapting LLMs to downstream tasks while optimizing both performance and efficiency.|\n", 
"2410.19656": "|**2024-10-25**|**APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs**|Huaxiaoyue Wang et.al.|[2410.19656](http://arxiv.org/abs/2410.19656)|null|Home robots performing personalized tasks must adeptly balance user preferences with environmental constraints. We focus on organization tasks within constrained spaces, such as arranging items in a refrigerator, where a user's placement preferences often conflict with physical limitations. The robot must infer user preferences from a small number of demonstrations, which is easier for users than exhaustively specifying every requirement. While recent work uses large language models (LLMs) to learn preferences from user demonstrations, it faces two fundamental challenges. First, interpreting user actions is inherently ambiguous, because a single observed action may correspond to multiple preferences. Second, not all user preferences are practically feasible in the environment because of geometric constraints. To address these challenges, we introduce APRICOT, a novel approach that combines LLM-based Bayesian active preference learning with constraint-aware task planning. APRICOT refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints. We evaluate APRICOT on a dataset of diverse organization tasks and demonstrate its effectiveness in real-world scenarios, showing significant improvements in both preference satisfaction and plan feasibility. The project website is at https://portal-cornell.github.io/apricot/|\n", 
"2410.19599": "|**2024-10-25**|**Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina**|Yuan Gao et.al.|[2410.19599](http://arxiv.org/abs/2410.19599)|null|Recent studies suggest that large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can serve as surrogates for humans in social science research. However, LLMs differ fundamentally from humans: they rely on probabilistic patterns and lack the embodied experiences or survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Almost all advanced approaches fail to replicate human behavior distributions across many models, except when fine-tuned on a substantial amount of human behavior data. The causes of failure are diverse, relating to factors such as input language, roles, and safeguarding. These results caution against using LLMs to study human behavior or as human surrogates.|\n", 
"2410.19586": "|**2024-10-25**|**Diverse Sign Language Translation**|Xin Shen et.al.|[2410.19586](http://arxiv.org/abs/2410.19586)|null|As with spoken language, a single sign language expression may correspond to multiple valid textual interpretations. Learning a one-to-one mapping for sign language translation (SLT) models may therefore be inadequate, especially when data is limited. In this work, we introduce the Diverse Sign Language Translation (DivSLT) task, which aims to generate diverse yet accurate translations for sign language videos. First, we use large language models (LLMs) to generate multiple references for the widely used CSL-Daily and PHOENIX14T SLT datasets. Native speakers are invited only to polish inaccurate references, which significantly improves annotation efficiency. Second, we provide a benchmark model to spur research on this task. Specifically, we investigate multi-reference training strategies that enable our DivSLT model to produce diverse translations. Then, to improve translation accuracy, we adopt a reinforcement learning objective that maximizes the reward of the translated results. In addition, we use multiple metrics to assess the accuracy, diversity, and semantic precision of the DivSLT task. Experimental results on the enriched datasets show that our DivSLT method not only achieves better translation performance but also produces diverse translations.|\n", 
"2410.21272": "|**2024-10-28**|**Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics**|Yaniv Nikankin 
et.al.|[2410.21272](http://arxiv.org/abs/2410.21272)|**[link](https://github.com/technion-cs-nlp/llm-arithmetic-heuristics)**|To investigate whether large language models (LLMs) solve reasoning tasks by learning robust, generalizable algorithms or by memorizing training data, we study arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior for basic arithmetic logic, and we examine its functionality. Zooming in to the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs the corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism that produces correct arithmetic answers. To test this, we categorize each neuron into several heuristic types, such as neurons that activate when an operand falls within a certain range, and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we show that this mechanism is already an important source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a 'bag of heuristics'.|\n", 
"2410.21264": "|**2024-10-28**|**LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior**|Hanyu Wang et.al.|[2410.21264](http://arxiv.org/abs/2410.21264)|null|We present LARP, a novel video tokenizer designed to overcome the limitations of current video tokenization methods for autoregressive (AR) generative models. Unlike traditional patchwise tokenizers that directly encode local visual patches into discrete tokens, LARP introduces a holistic tokenization scheme that gathers information from the visual content using a set of learned holistic queries. This design lets LARP capture more global and semantic representations rather than being limited to local patch-level information. It also offers flexibility by supporting an arbitrary number of discrete tokens, enabling adaptive and efficient tokenization according to the specific requirements of the task. To align the discrete token space with downstream AR generation tasks, LARP integrates a lightweight AR transformer as a training-time prior model that predicts the next token in its discrete latent space. By incorporating the prior model during training, LARP learns a latent space that is not only optimized for video reconstruction but also structured to be more conducive to autoregressive generation. Moreover, this process defines a sequential order for the discrete tokens, progressively pushing them toward an optimal configuration during training and ensuring smoother, more accurate AR generation at inference time. Comprehensive experiments demonstrate LARP's strong performance, achieving state-of-the-art FVD on the UCF101 class-conditional video generation benchmark. LARP enhances the compatibility of AR models with video and opens up the potential to build unified high-fidelity multimodal large language models (MLLMs).|\n", 
"2410.21252": "|**2024-10-28**|**LongReward: Improving Long-context Large Language Models with AI Feedback**|Jiajie Zhang 
et.al.|[2410.21252](http://arxiv.org/abs/2410.21252)|**[link](https://github.com/THUDM/LongReward)**|\u5c3d\u7ba1\u5728\u5f00\u53d1\u957f\u4e0a\u4e0b\u6587\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u5408\u6210\u7684\u6570\u636e\u8d28\u91cf\u5f80\u5f80\u5f71\u54cd\u4e86\u6709\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6a21\u578b\u7684\u957f\u4e0a\u4e0b\u6587\u6027\u80fd\uff0c\u5e76\u5bfc\u81f4\u56fa\u6709\u7684\u5c40\u9650\u6027\u3002\u539f\u5219\u4e0a\uff0c\u9002\u5f53\u7684\u5956\u52b1\u4fe1\u53f7\u53ef\u4ee5\u5229\u7528\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5728\u957f\u4e0a\u4e0b\u6587\u573a\u666f\u4e2d\u5982\u4f55\u83b7\u5f97\u53ef\u9760\u7684\u5956\u52b1\u4ecd\u7136\u662f\u4e00\u4e2a\u672a\u63a2\u7d22\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LongReward\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u5229\u7528\u73b0\u6210\u7684LLM\u4ece\u56db\u4e2a\u7ef4\u5ea6\uff08\u5373\uff1a\u6709\u7528\u6027\u3001\u903b\u8f91\u6027\u3001\u51c6\u786e\u6027\u548c\u5b8c\u6574\u6027\uff09\u63d0\u4f9b\u957f\u4e0a\u4e0b\u6587\u6a21\u578b\u54cd\u5e94\u7684\u5956\u52b1\uff0c\u5e76\u4e3a\u6bcf\u4e2a\u7ef4\u5ea6\u8bbe\u8ba1\u4e86\u8be6\u7ec6\u7684\u8bc4\u4f30\u6d41\u7a0b\u3002\u901a\u8fc7\u7ed3\u5408LongReward\u548c\u79bb\u7ebfRL\u7b97\u6cd5DPO\uff0c\u6211\u4eec\u80fd\u591f\u6709\u6548\u5730\u6539\u8fdb\u957f\u4e0a\u4e0b\u6587SFT\u6a21\u578b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0cLongReward\u4e0d\u4ec5\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u7684\u957f\u4e0a\u4e0b\u6587\u6027\u80fd\uff0c\u8fd8\u589e\u5f3a\u4e86\u5b83\u4eec\u9075\u5faa\u77ed\u6307\u4ee4\u7684\u80fd\u529b\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5e26\u6709LongReward\u7684\u957f\u4e0a\u4e0b\u6587DPO\u548c\u4f20\u7edf\u7684\u77ed\u4e0a\u4e0b\u6587DPO\u53ef\u4ee5\u4e00\u8
d77\u4f7f\u7528\u800c\u4e0d\u635f\u5bb3\u4efb\u4f55\u4e00\u65b9\u7684\u6027\u80fd\u3002|\n", "2410.21242": "|**2024-10-28**|**Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback**|Nour Jedidi et.al.|[2410.21242](http://arxiv.org/abs/2410.21242)|null|\u6784\u5efa\u6709\u6548\u7684\u5bc6\u96c6\u68c0\u7d22\u7cfb\u7edf\u5728\u7f3a\u4e4f\u76f8\u5173\u6027\u76d1\u7763\u65f6\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u751f\u6210\u5047\u8bbe\u6587\u6863\uff0c\u4ece\u800c\u627e\u5230\u6700\u63a5\u8fd1\u7684\u771f\u5b9e\u6587\u6863\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5b8c\u5168\u4f9d\u8d56\u4e8eLLM\u5177\u5907\u4e0e\u67e5\u8be2\u76f8\u5173\u7684\u9886\u57df\u7279\u5b9a\u77e5\u8bc6\uff0c\u8fd9\u5728\u5b9e\u8df5\u4e2d\u53ef\u80fd\u4e0d\u53ef\u884c\u3002\u6b64\u5916\uff0c\u751f\u6210\u5047\u8bbe\u6587\u6863\u7684\u65b9\u6cd5\u6548\u7387\u4f4e\u4e0b\uff0c\u56e0\u4e3a\u5bf9\u4e8e\u6bcf\u4e2a\u67e5\u8be2\uff0cLLM\u9700\u8981\u751f\u6210\u5927\u91cf\u7684\u6807\u8bb0\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57fa\u4e8e\u76f8\u5173\u53cd\u9988\u7684\u771f\u5b9e\u6587\u6863\u5d4c\u5165\uff08ReDE-RF\uff09\u3002\u53d7\u76f8\u5173\u53cd\u9988\u7684\u542f\u53d1\uff0cReDE-RF\u63d0\u51fa\u5c06\u5047\u8bbe\u6587\u6863\u751f\u6210\u91cd\u65b0\u5b9a\u4e49\u4e3a\u76f8\u5173\u6027\u4f30\u8ba1\u4efb\u52a1\uff0c\u5229\u7528LLM\u9009\u62e9\u54ea\u4e9b\u6587\u6863\u5e94\u88ab\u7528\u4e8e\u6700\u8fd1\u90bb\u641c\u7d22\u3002\u901a\u8fc7\u8fd9\u79cd\u91cd\u65b0\u5b9a\u4e49\uff0cLLM\u4e0d\u518d\u9700\u8981\u9886\u57df\u7279\u5b9a\u7684\u77e5\u8bc6\uff0c\u800c\u53ea\u9700\u8981\u5224\u65ad\u4ec0\u4e48\u662f\u76f8\u5173\u7684\u3002\u6b64\u5916\uff0c\u76f8\u5173\u6027\u4f30\u8ba1\u4ec5\u8981\u6c42LLM\u8f93\u51fa\u4e00\u4e2a\u6807\u8bb0\uff0c\u4ece\u800c\u63d0\u9ad8
\u6bcf\u6b21\u67e5\u8be2\u7684\u641c\u7d22\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0cReDE-RF\u5728\u5e7f\u6cdb\u7684\u4f4e\u8d44\u6e90\u68c0\u7d22\u6570\u636e\u96c6\u4e0a\u59cb\u7ec8\u8d85\u8d8a\u6700\u5148\u8fdb\u7684\u96f6\u6837\u672c\u5bc6\u96c6\u68c0\u7d22\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u6bcf\u6b21\u67e5\u8be2\u7684\u5ef6\u8fdf\u65b9\u9762\u4e5f\u53d6\u5f97\u4e86\u663e\u8457\u6539\u8fdb\u3002|\n", "2410.21237": "|**2024-10-28**|**Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce**|Zhantao Yang et.al.|[2410.21237](http://arxiv.org/abs/2410.21237)|null|\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u5728\u5404\u79cdAI\u7cfb\u7edf\u4e2d\u626e\u6f14\u7740\u8d8a\u6765\u8d8a\u91cd\u8981\u7684\u89d2\u8272\u3002\u5bf9\u4e8e\u7535\u5b50\u5546\u52a1\u800c\u8a00\uff0c\u6784\u5efa\u9ad8\u6548\u4e14\u4f4e\u6210\u672c\u7684\u81ea\u52a8\u5316\u77e5\u8bc6\u56fe\u8c31\u662f\u5b9e\u73b0\u4f17\u591a\u6210\u529f\u4e0b\u6e38\u5e94\u7528\u7684\u57fa\u7840\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u4ece\u539f\u59cb\u4ea7\u54c1\u56fe\u50cf\u4e2d\u6784\u5efa\u7ed3\u6784\u5316\u7684\u4ea7\u54c1\u77e5\u8bc6\u56fe\u8c31\u3002\u8be5\u65b9\u6cd5\u5145\u5206\u5229\u7528\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u5b9e\u73b0\u4e86\u6574\u4e2a\u8fc7\u7a0b\u7684\u5b8c\u5168\u81ea\u52a8\u5316\uff0c\u5e76\u5141\u8bb8\u53ca\u65f6\u66f4\u65b0\u56fe\u8c31\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7ecf\u8fc7\u4eba\u5de5\u6807\u6ce8\u7684\u7535\u5b50\u5546\u52a1\u4ea7\u54c1\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u77e5\u8bc6\u56fe\u8c31\u6784\u5efa\u4e2d\u7684\u4ea7\u54c1\u5c5e\u6027\u63d0\u53d6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6240\u6709\u6307\u6807\u548c\u8bc4\u4f30\u5c5e\u6027\u4e0a\u90fd\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5c55\u793a\u4e86\u5176\u6709\u654
8\u6027\u548c\u5e7f\u9614\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2410.21236": "|**2024-10-28**|**Flaming-hot Initiation with Regular Execution Sampling for Large Language Models**|Weizhe Chen et.al.|[2410.21236](http://arxiv.org/abs/2410.21236)|null|\u81eaChatGPT\u53d1\u5e03\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\u3002\u5728\u5f00\u53d1\u8fd9\u4e9b\u901a\u7528\u80fd\u529b\u65f6\uff0c\u4e00\u4e2a\u5173\u952e\u7684\u6311\u6218\u662f\u9ad8\u6548\u5730\u83b7\u53d6\u591a\u6837\u5316\u4e14\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u3002\u8fd9\u5728\u4e0e\u6c99\u76d2\u68c0\u67e5\u5668\u76f8\u5173\u7684\u63a8\u7406\u4efb\u52a1\u4e2d\u5c24\u4e3a\u91cd\u8981\uff0c\u4f8b\u5982\u6570\u5b66\u6216\u4ee3\u7801\u95ee\u9898\uff0c\u76ee\u6807\u662f\u63d0\u9ad8\u751f\u6210\u6b63\u786e\u89e3\u51b3\u65b9\u6848\u7684\u6982\u7387\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Flaming-hot Initiation with Regular Execution\uff08FIRE\uff09\u91c7\u6837\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u4f46\u975e\u5e38\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u9ad8\u6548\u5730\u627e\u5230\u597d\u7684\u54cd\u5e94\u3002\u6211\u4eec\u7684\u5b9e\u8bc1\u7ed3\u679c\u8868\u660e\uff0cFIRE\u91c7\u6837\u63d0\u9ad8\u4e86\u63a8\u7406\u65f6\u95f4\u751f\u6210\u7684\u8d28\u91cf\uff0c\u5e76\u4e14\u5728\u5bf9\u9f50\u9636\u6bb5\u7684\u8bad\u7ec3\u4e2d\u4e5f\u53d7\u76ca\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86FIRE\u91c7\u6837\u5982\u4f55\u901a\u8fc7\u4fc3\u8fdb\u591a\u6837\u6027\u6765\u63d0\u5347\u6027\u80fd\uff0c\u5e76\u5206\u6790\u4e86\u5728\u54cd\u5e94\u7684\u4e0d\u540c\u4f4d\u7f6e\u4f7f\u7528FIRE\u7684\u5f71\u54cd\u3002|\n", "2410.21228": "|**2024-10-28**|**LoRA vs Full Fine-tuning: An Illusion of Equivalence**|Reece Shuttleworth 
et.al.|[2410.21228](http://arxiv.org/abs/2410.21228)|null|\u5fae\u8c03\u662f\u5c06\u9884\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u9002\u5e94\u5230\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u5173\u952e\u8303\u5f0f\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0c\u4f4e\u79e9\u81ea\u9002\u5e94\uff08LoRA\uff09\u65b9\u6cd5\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u80fd\u591f\u4ee5\u6781\u5c0f\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u91cf\u8fbe\u5230\u4e0e\u5b8c\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002\u5373\u4f7f\u4e24\u79cd\u65b9\u6cd5\u5b66\u4e60\u5230\u7684\u6a21\u578b\u51c6\u786e\u6027\u76f8\u4f3c\uff0c\u5b83\u4eec\u7684\u5b66\u4e60\u89e3\u51b3\u65b9\u6848\u771f\u7684\u7b49\u4ef7\u5417\uff1f\u6211\u4eec\u901a\u8fc7\u5206\u6790\u6a21\u578b\u6743\u91cd\u77e9\u9635\u7684\u8c31\u5c5e\u6027\u6765\u7814\u7a76\u4e0d\u540c\u7684\u5fae\u8c03\u65b9\u6cd5\u5982\u4f55\u6539\u53d8\u9884\u8bad\u7ec3\u6a21\u578b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5168\u5fae\u8c03\u548cLoRA\u751f\u6210\u7684\u6743\u91cd\u77e9\u9635\u5728\u5947\u5f02\u503c\u5206\u89e3\u7ed3\u6784\u4e0a\u8868\u73b0\u51fa\u5f88\u5927\u7684\u4e0d\u540c\uff1b\u6b64\u5916\uff0c\u5f53\u5728\u8d85\u51fa\u9002\u5e94\u4efb\u52a1\u5206\u5e03\u7684\u60c5\u51b5\u4e0b\u6d4b\u8bd5\u65f6\uff0c\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u663e\u793a\u51fa\u4e0d\u540c\u7684\u6cdb\u5316\u884c\u4e3a\u3002\u66f4\u5177\u4f53\u5730\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u5c55\u793a\u4e86\u4f7f\u7528LoRA\u8bad\u7ec3\u7684\u6743\u91cd\u77e9\u9635\u5177\u6709\u65b0\u7684\u9ad8\u6392\u540d\u5947\u5f02\u5411\u91cf\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u5165\u4fb5\u7ef4\u5ea6\u201d\u3002\u8fd9\u4e9b\u5165\u4fb5\u7ef4\u5ea6\u5728\u5168\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u4e0d\u4f1a\u51fa\u73b0\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5c3d\u7ba1\u5177\u6709\u5165\u4fb5\u7ef4\u5ea6\u7684LoRA\u6a21\u578b\u5728\u76ee\u6807\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4e0e\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53
\uff0c\u4f46\u5b83\u4eec\u5bf9\u9884\u8bad\u7ec3\u5206\u5e03\u7684\u5efa\u6a21\u6548\u679c\u8f83\u5dee\uff0c\u5e76\u4e14\u5728\u987a\u5e8f\u9002\u5e94\u591a\u4e2a\u4efb\u52a1\u65f6\u7684\u9c81\u68d2\u6027\u8f83\u4f4e\u3002\u9ad8\u79e9\u3001\u79e9\u7a33\u5b9a\u7684LoRA\u6a21\u578b\u751a\u81f3\u5728\u4e0e\u4f4e\u79e9LoRA\u6a21\u578b\u6267\u884c\u76f8\u540c\u4efb\u52a1\u65f6\uff0c\u4e5f\u4e0e\u5168\u5fae\u8c03\u6a21\u578b\u975e\u5e38\u63a5\u8fd1\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4f7f\u5728\u76f8\u540c\u7684\u5fae\u8c03\u5206\u5e03\u4e0a\u8868\u73b0\u76f8\u540c\uff0cLoRA\u66f4\u65b0\u7684\u6a21\u578b\u548c\u5168\u5fae\u8c03\u6a21\u578b\u8bbf\u95ee\u4e86\u53c2\u6570\u7a7a\u95f4\u7684\u4e0d\u540c\u90e8\u5206\u3002\u6211\u4eec\u6700\u540e\u63a2\u8ba8\u4e86\u4e3a\u4ec0\u4e48\u5165\u4fb5\u7ef4\u5ea6\u4f1a\u5728LoRA\u5fae\u8c03\u6a21\u578b\u4e2d\u51fa\u73b0\uff0c\u4e3a\u4ec0\u4e48\u5b83\u4eec\u662f\u4e0d\u7406\u60f3\u7684\uff0c\u4ee5\u53ca\u5982\u4f55\u6700\u5c0f\u5316\u5176\u5f71\u54cd\u3002|\n", "2410.21218": "|**2024-10-28**|**Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations**|Kaifeng Huang 
et.al.|[2410.21218](http://arxiv.org/abs/2410.21218)|null|\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u667a\u529b\u548c\u751f\u4ea7\u529b\u65b9\u9762\u5f15\u53d1\u4e86\u663e\u8457\u7684\u5f71\u54cd\u3002\u8fd1\u5e74\u6765\uff0c\u5546\u4e1a\u548c\u5f00\u6e90LLM\u7684\u5f15\u5165\u5448\u73b0\u51fa\u5de8\u5927\u7684\u589e\u957f\u8d8b\u52bf\u3002\u8bb8\u591a\u4f01\u4e1a\u5df2\u5c06LLM\u96c6\u6210\u5230\u5176\u5e94\u7528\u4e2d\u4ee5\u89e3\u51b3\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5c06LLM\u6574\u5408\u5230\u5177\u4f53\u4e1a\u52a1\u573a\u666f\u4e2d\u4e0d\u4ec5\u4ec5\u9700\u8981\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u672c\u8eab\uff0c\u800c\u662f\u4e00\u4e2a\u7cfb\u7edf\u7684\u8fc7\u7a0b\uff0c\u6d89\u53ca\u5927\u91cf\u7684\u7ec4\u6210\u90e8\u5206\uff0c\u8fd9\u4e9b\u7ec4\u6210\u90e8\u5206\u7edf\u79f0\u4e3aLLM\u4f9b\u5e94\u94fe\u3002LLM\u4f9b\u5e94\u94fe\u5185\u5728\u5730\u627f\u8f7d\u7740\u98ce\u9669\u3002\u56e0\u6b64\uff0c\u7406\u89e3\u53ef\u80fd\u5f15\u5165\u4f9b\u5e94\u94fe\u7684\u7ec4\u4ef6\u7c7b\u578b\u4ee5\u53ca\u76f8\u5173\u7684\u98ce\u9669\u81f3\u5173\u91cd\u8981\uff0c\u8fd9\u6709\u52a9\u4e8e\u4e0d\u540c\u7684\u5229\u76ca\u76f8\u5173\u8005\u5b9e\u65bd\u6709\u6548\u7684\u7f13\u89e3\u63aa\u65bd\u3002\u867d\u7136\u4e00\u4e9b\u6587\u732e\u6d89\u53ca\u4e0eLLM\u4f9b\u5e94\u94fe\u76f8\u5173\u7684\u98ce\u9669\uff0c\u4f46\u76ee\u524d\u8fd8\u6ca1\u6709\u8bba\u6587\u660e\u786e\u754c\u5b9a\u5176\u8303\u56f4\u3001\u8bc6\u522b\u56fa\u6709\u98ce\u9669\u5e76\u63a2\u8ba8\u6f5c\u5728\u7684\u7f13\u89e3\u7b56\u7565\u3002\u9274\u4e8eLLMs\u5df2\u6210\u4e3a\u65b0\u65f6\u4ee3\u7684\u91cd\u8981\u57fa\u7840\u8bbe\u65bd\uff0c\u6211\u4eec\u8ba4\u4e3a\u5bf9LLM\u4f9b\u5e94\u94fe\u53ca\u5176\u56fa\u6709\u98ce\u9669\u548c\u7f13\u89e3\u7b56\u7565\u8fdb\u884c\u5f7b\u5e95\u5ba1\u67e5\u5bf9\u4e8e\u884c\u4e1a\u4ece\u4e1a\u8005\u907f\u514d\u6f5c\u5728\u635f\u5931\u5177\u6709\u91cd\u8981\u4ef7\u503c\uff0c\u5e76\u4e14\u5bf9\u4e8e\u5b66\u672f\u7814\u7a76\u4eba
\u5458\u91cd\u65b0\u601d\u8003\u73b0\u6709\u65b9\u6cd5\u548c\u63a2\u7d22\u65b0\u7684\u7814\u7a76\u9014\u5f84\u4e5f\u5177\u6709\u542f\u53d1\u610f\u4e49\u3002\u6211\u4eec\u7684\u8bba\u6587\u63d0\u4f9b\u4e86LLM\u4f9b\u5e94\u94fe\u7684\u5168\u9762\u6982\u8ff0\uff0c\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u5229\u76ca\u76f8\u5173\u8005\u3001\u7ec4\u6210\u5143\u7d20\u4ee5\u53ca\u4f9b\u5e94\u7c7b\u578b\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e0e\u5404\u79cd\u4f9b\u5e94\u94fe\u5229\u76ca\u76f8\u5173\u8005\u548c\u7ec4\u4ef6\u76f8\u5173\u7684\u98ce\u9669\u7c7b\u578b\u3001\u98ce\u9669\u884c\u4e3a\u548c\u7f13\u89e3\u63aa\u65bd\u7684\u5206\u7c7b\u6cd5\u3002\u603b\u800c\u8a00\u4e4b\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63a2\u8ba8\u4e86LLM\u4f9b\u5e94\u94fe\u7684\u6280\u672f\u548c\u64cd\u4f5c\u65b9\u9762\uff0c\u4e3a\u7814\u7a76\u548c\u5de5\u7a0b\u4eba\u5458\u5728\u4e0d\u65ad\u53d1\u5c55\u7684LLM\u9886\u57df\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2410.21200": "|**2024-10-28**|**BongLLaMA: LLaMA for Bangla Language**|Abdullah Khan Zehady et.al.|[2410.21200](http://arxiv.org/abs/2410.21200)|null|\u5b5f\u52a0\u62c9\u8bed\uff08\u6216\u201c 
Bengali\u201d\uff09\u662f\u4e00\u79cd\u4f7f\u7528\u7ea62.4\u4ebf\u6bcd\u8bed\u8005\u548c\u5927\u7ea63\u4ebf\u4eba\u4f7f\u7528\u7684\u8bed\u8a00\u3002\u5c3d\u7ba1\u5b83\u662f\u4e16\u754c\u4e0a\u7b2c\u4e94\u5927\u4f7f\u7528\u8bed\u8a00\uff0c\u5b5f\u52a0\u62c9\u8bed\u4ecd\u88ab\u89c6\u4e3a\u4e00\u79cd\u201c\u4f4e\u8d44\u6e90\u201d\u8bed\u8a00\uff0c\u73b0\u6709\u7684\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u5728\u5b5f\u52a0\u62c9\u8bed\u5904\u7406\uff08BLP\uff09\u4efb\u52a1\u4e0a\u5f80\u5f80\u8868\u73b0\u4e0d\u4f73\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5f15\u5165BongLLaMA\uff08\u5373\u5b5f\u52a0\u62c9\u8bed-LLaMA\uff09\uff0c\u89e3\u51b3\u4e86\u8fd9\u4e00\u95ee\u9898\uff0c\u8fd9\u662f\u4e00\u79cd\u4e13\u95e8\u9488\u5bf9\u5927\u578b\u5b5f\u52a0\u62c9\u8bed\u8bed\u6599\u5e93\u548c\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u8bba\u3001\u6570\u636e\u589e\u5f3a\u6280\u672f\u3001\u5fae\u8c03\u7ec6\u8282\u4ee5\u53ca\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\u7ed3\u679c\uff0c\u5c55\u793a\u4e86BongLLaMA\u5728\u5b5f\u52a0\u62c9\u8bed\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u6548\u7528\u3002\u6211\u4eec\u76f8\u4fe1BongLLaMA\u5c06\u6210\u4e3a\u5b5f\u52a0\u62c9\u8bed\u6a21\u578b\u7684\u65b0\u6807\u51c6\u57fa\u7ebf\uff0c\u4ece\u800c\u4fc3\u8fdb\u672a\u6765\u4e13\u6ce8\u4e8e\u8fd9\u79cd\u5e7f\u6cdb\u4f7f\u7528\u4f46\u201c\u4f4e\u8d44\u6e90\u201d\u7684\u8bed\u8a00\u7684\u57fa\u51c6\u7814\u7a76\u3002\u6240\u6709BongLLaMA\u6a21\u578b\u5747\u53ef\u4f9b\u516c\u4f17\u4f7f\u7528\uff0c\u7f51\u5740\u4e3ahttps://huggingface.co/BanglaLLM\u3002|\n", "2410.21169": "|**2024-10-29**|**Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction**|Qintong Zhang 
et.al.|[2410.21169](http://arxiv.org/abs/2410.21169)|null|\u6587\u6863\u89e3\u6790\u5bf9\u4e8e\u5c06\u975e\u7ed3\u6784\u5316\u548c\u534a\u7ed3\u6784\u5316\u6587\u6863\uff08\u5982\u5408\u540c\u3001\u5b66\u672f\u8bba\u6587\u548c\u53d1\u7968\uff09\u8f6c\u6362\u4e3a\u7ed3\u6784\u5316\u7684\u3001\u673a\u5668\u53ef\u8bfb\u7684\u6570\u636e\u81f3\u5173\u91cd\u8981\u3002\u6587\u6863\u89e3\u6790\u4ece\u975e\u7ed3\u6784\u5316\u8f93\u5165\u4e2d\u63d0\u53d6\u53ef\u9760\u4e14\u7ed3\u6784\u5316\u7684\u6570\u636e\uff0c\u4e3a\u4f17\u591a\u5e94\u7528\u63d0\u4f9b\u4e86\u6781\u5927\u7684\u4fbf\u5229\u3002\u7279\u522b\u662f\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u6587\u6863\u89e3\u6790\u5728\u77e5\u8bc6\u5e93\u6784\u5efa\u548c\u8bad\u7ec3\u6570\u636e\u751f\u6210\u65b9\u9762\u53d1\u6325\u7740\u4e0d\u53ef\u6216\u7f3a\u7684\u4f5c\u7528\u3002\u672c\u6587\u7efc\u8ff0\u4e86\u5f53\u524d\u6587\u6863\u89e3\u6790\u7684\u72b6\u6001\uff0c\u6db5\u76d6\u4e86\u5173\u952e\u7684\u65b9\u6cd5\u8bba\uff0c\u4ece\u6a21\u5757\u5316\u6d41\u6c34\u7ebf\u7cfb\u7edf\u5230\u7531\u5927\u578b\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u7aef\u5230\u7aef\u6a21\u578b\u3002\u8be6\u7ec6\u63a2\u8ba8\u4e86\u6838\u5fc3\u7ec4\u4ef6\uff0c\u5305\u62ec\u5e03\u5c40\u68c0\u6d4b\u3001\u5185\u5bb9\u63d0\u53d6\uff08\u5305\u62ec\u6587\u672c\u3001\u8868\u683c\u548c\u6570\u5b66\u8868\u8fbe\u5f0f\uff09\u4ee5\u53ca\u591a\u6a21\u6001\u6570\u636e\u96c6\u6210\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u8ba8\u8bba\u4e86\u6a21\u5757\u5316\u6587\u6863\u89e3\u6790\u7cfb\u7edf\u548c\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u590d\u6742\u5e03\u5c40\u3001\u6574\u5408\u591a\u4e2a\u6a21\u5757\u548c\u8bc6\u522b\u9ad8\u5bc6\u5ea6\u6587\u672c\u65f6\u6240\u9762\u4e34\u7684\u6311\u6218\u3002\u6587\u7ae0\u5f3a\u8c03\u4e86\u5f00\u53d1\u66f4\u5927\u548c\u66f4\u591a\u6837\u5316\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\uff0c\u5e76\u6982\u8ff0\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u
3002|\n", "2410.22323": "|**2024-10-29**|**Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models**|Seetharam Killivalavan et.al.|[2410.22323](http://arxiv.org/abs/2410.22323)|null|\u672c\u6587\u63a2\u7d22\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5229\u7528\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u6280\u672f\u6765\u63d0\u5347\u4e8c\u5143\u5206\u7c7b\u6a21\u578b\u5728\u8bc4\u4f30\u4ee3\u7801\u6ce8\u91ca\u8d28\u91cf\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u901a\u8fc7\u5c06\u6765\u81ea\u591a\u4e2aGitHub\u4ed3\u5e93\u76841,437\u4e2a\u65b0\u751f\u6210\u7684\u4ee3\u7801-\u6ce8\u91ca\u5bf9\uff08\u6807\u8bb0\u4e3a\u201c\u6709\u7528\u201d\u6216\u201c\u65e0\u7528\u201d\uff09\u6574\u5408\u5230\u4e00\u4e2a\u73b0\u6709\u7684C\u8bed\u8a00\u6570\u636e\u96c6\u4e2d\uff08\u8be5\u6570\u636e\u96c6\u5305\u542b9,048\u5bf9\uff09\uff0c\u5c55\u793a\u4e86\u6a21\u578b\u6027\u80fd\u7684\u663e\u8457\u63d0\u5347\u3002\u91c7\u7528\u5148\u8fdb\u7684\u5927\u8bed\u8a00\u6a21\u578b\u540e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f7f\u5f97\u652f\u6301\u5411\u91cf\u673a\uff08SVM\uff09\u6a21\u578b\u7684\u7cbe\u786e\u7387\u63d0\u9ad8\u4e865.78%\uff0c\u4ece0.79\u63d0\u5347\u81f30.8478\uff0c\u540c\u65f6\u4eba\u5de5\u795e\u7ecf\u7f51\u7edc\uff08ANN\uff09\u6a21\u578b\u7684\u53ec\u56de\u7387\u63d0\u9ad8\u4e862.17%\uff0c\u4ece0.731\u63d0\u5347\u81f30.7527\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u663e\u4e86\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u5728\u6539\u8fdb\u4ee3\u7801\u6ce8\u91ca\u5206\u7c7b\u6a21\u578b\u4e2d\u7684\u4ef7\u503c\uff0c\u4e3a\u8f6f\u4ef6\u5f00\u53d1\u548c\u8d28\u91cf\u63a7\u5236\u4e2d\u7684\u6a21\u578b\u51c6\u786e\u6027\u63d0\u5347\u63d0\u4f9b\u4e86\u91cd\u8981\u7684\u6f5c\u529b\u3002\u672c\u7814\u7a76\u4e3a\u5728\u5b9e\u9645\u8f6f\u4ef6\u5de5\u7a0b\u73af\u5883\u4e2d\u6574\u5408\u751f\u6210\u6280\u672f\u4ee5\u4f18\u5316\u673a\u5668\u5b66\u4e60\u6a21\u578b\u63d0\u4f9b\u4e86\u4e50\u89c2\u7684\u524d\u666f\u3002|\n", 
"2410.22318": "|**2024-10-29**|**Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting**|Can Chen et.al.|[2410.22318](http://arxiv.org/abs/2410.22318)|**[link](https://github.com/canchen-cc/online-llm-detection)**|**\u8fd1\u5e74\u6765\uff0c\u533a\u5206\u673a\u5668\u751f\u6210\u6587\u672c\u548c\u4eba\u7c7b\u64b0\u5199\u6587\u672c\u7684\u7b97\u6cd5\u7814\u7a76\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u73b0\u6709\u65b9\u6cd5\u901a\u5e38\u662f\u5728\u79bb\u7ebf\u8bbe\u7f6e\u4e0b\u8fdb\u884c\uff0c\u5373\u5728\u7ed9\u5b9a\u7684\u6570\u636e\u96c6\u4e2d\u5305\u542b\u771f\u5b9e\u6587\u672c\u548c\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u6df7\u5408\u6837\u672c\uff0c\u4efb\u52a1\u662f\u786e\u5b9a\u6570\u636e\u96c6\u4e2d\u7684\u6bcf\u4e2a\u6837\u672c\u662f\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fd8\u662f\u7531\u4eba\u7c7b\u751f\u6210\u7684\u3002\u7136\u800c\uff0c\u5728\u8bb8\u591a\u5b9e\u9645\u573a\u666f\u4e2d\uff0c\u5982\u65b0\u95fb\u7f51\u7ad9\u3001\u793e\u4ea4\u5a92\u4f53\u8d26\u6237\u6216\u5176\u4ed6\u8bba\u575b\u53d1\u5e03\u7684\u6587\u7ae0\u662f\u4ee5\u6d41\u5f0f\u65b9\u5f0f\u53d1\u5e03\u7684\u3002\u56e0\u6b64\uff0c\u5728\u8fd9\u79cd\u5728\u7ebf\u573a\u666f\u4e2d\uff0c\u5982\u4f55\u5feb\u901f\u4e14\u51c6\u786e\u5730\u786e\u5b9a\u8fd9\u4e9b\u6765\u6e90\u662f\u5426\u4e3aLLM\uff0c\u5e76\u5177\u6709\u5f3a\u5927\u7684\u7edf\u8ba1\u4fdd\u8bc1\uff0c\u5bf9\u4e8e\u8fd9\u4e9b\u5a92\u4f53\u6216\u5e73\u53f0\u6709\u6548\u5730\u8fd0\u4f5c\u5e76\u9632\u6b62\u9519\u8bef\u4fe1\u606f\u548c\u5176\u4ed6\u6f5c\u5728\u7684LLM\u8bef\u7528\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u4e86\u89e3\u51b3\u5728\u7ebf\u68c0\u6d4b\u7684\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8e\u987a\u5e8f\u5047\u8bbe\u68c0\u9a8c\u7684\u7b97\u6cd5\uff0c\u8be5\u7b97\u6cd5\u4e0d\u4ec5\u5efa\u7acb\u5e76\u8865\u5145\u4e86\u73b0\u6709\u7684\u79bb\u7ebf\u68c0\u6d4b\u6280\u672f\uff0c\u800c\u4e14\u8fd8\u5177\u5907\u7edf\u8ba1\u4fdd\u8bc1\uff0c\u5305\
u62ec\u63a7\u5236\u9519\u8bef\u53d1\u73b0\u7387\u548c\u6b63\u786e\u8bc6\u522b\u6765\u6e90\u4e3aLLM\u7684\u9884\u671f\u65f6\u95f4\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2410.22315": "|**2024-10-29**|**Natural Language Inference Improves Compositionality in Vision-Language Models**|Paola Cascante-Bonilla et.al.|[2410.22315](http://arxiv.org/abs/2410.22315)|null|\u5728\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u4e2d\uff0c\u7ec4\u5408\u63a8\u7406\u4ecd\u7136\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u96be\u4ee5\u5173\u8054\u5bf9\u8c61\u3001\u5c5e\u6027\u548c\u7a7a\u95f4\u5173\u7cfb\u3002\u6700\u8fd1\u7684\u65b9\u6cd5\u8bd5\u56fe\u901a\u8fc7\u4f9d\u8d56\u6587\u672c\u63cf\u8ff0\u7684\u8bed\u4e49\uff0c\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c06\u95ee\u9898\u548c\u7b54\u6848\u5206\u89e3\u6210\u5b50\u96c6\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4e3b\u8981\u5728\u8868\u9762\u5c42\u6b21\u4e0a\u64cd\u4f5c\uff0c\u672a\u80fd\u5f15\u5165\u66f4\u6df1\u7684\u8bcd\u6c47\u7406\u89e3\uff0c\u540c\u65f6\u8fd8\u4f1a\u5f15\u5165\u7531LLM\u751f\u6210\u7684\u9519\u8bef\u5047\u8bbe\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Caption Expansion with Contradictions and Entailments 
(CECE)\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u539f\u7406\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u4ece\u7ed9\u5b9a\u7684\u524d\u63d0\u751f\u6210\u8574\u6db5\u548c\u77db\u76fe\u3002CECE\u751f\u6210\u8bcd\u6c47\u4e0a\u591a\u6837\u7684\u53e5\u5b50\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u610f\u4e49\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86CECE\u589e\u5f3a\u4e86\u53ef\u89e3\u91ca\u6027\uff0c\u5e76\u51cf\u5c11\u4e86\u5bf9\u6709\u504f\u89c1\u6216\u8868\u9762\u7279\u5f81\u7684\u8fc7\u5ea6\u4f9d\u8d56\u3002\u901a\u8fc7\u5e73\u8861\u539f\u59cb\u524d\u63d0\u4e0eCECE\uff0c\u6211\u4eec\u5728\u65e0\u9700\u989d\u5916\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5728\u8861\u91cf\u56fe\u50cf-\u6587\u672c\u5bf9\u9f50\u7684\u4eba\u7c7b\u5224\u65ad\u5f97\u5206\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u53d6\u5f97\u4e86\u6700\u5148\u8fdb\u7684\u7ed3\u679c\uff0c\u5e76\u5728Winoground\u4e0a\u5b9e\u73b0\u4e86+19.2%\uff08\u7ec4\u5206\u6570\uff09\u548c\u5728EqBen\u4e0a\u5b9e\u73b0+12.9%\uff08\u7ec4\u5206\u6570\uff09\u7684\u6027\u80fd\u63d0\u5347\uff0c\u8d85\u8fc7\u4e86\u6700\u4f73\u73b0\u6709\u5de5\u4f5c\uff08\u4f7f\u7528\u9488\u5bf9\u6027\u6570\u636e\u5fae\u8c03\uff09\u3002|\n", "2410.22309": "|**2024-10-29**|**GPT-4o reads the mind in the eyes**|James W. A. 
Strachan et.al.|[2410.22309](http://arxiv.org/abs/2410.22309)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u4ece\u6587\u672c\u4e2d\u91cd\u73b0\u4eba\u7c7b\u7c7b\u4f3c\u63a8\u7406\u7684\u80fd\u529b\uff0c\u5305\u62ec\u5173\u4e8e\u60c5\u7eea\u548c\u5fc3\u7406\u72b6\u6001\u7684\u63a8\u7406\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u80fd\u529b\u662f\u5426\u6269\u5c55\u5230\u5176\u4ed6\u6a21\u6001\u5c1a\u4e0d\u6e05\u695a\u3002\u4eba\u7c7b\u5177\u6709\u901a\u8fc7\u4ed6\u4eba\u7684\u773c\u775b\u8bfb\u5fc3\u7684\u590d\u6742\u80fd\u529b\u3002\u5728\u6b64\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u6d4b\u8bd5\u4e86\u8fd9\u4e00\u80fd\u529b\u662f\u5426\u4e5f\u5b58\u5728\u4e8eGPT-4o\u8fd9\u4e00\u591a\u6a21\u6001LLM\u4e2d\u3002\u6211\u4eec\u4f7f\u7528\u4e86\u4e24\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u5fc3\u7406\u7406\u8bba\u6d4b\u8bd5\u7248\u672c\uff0c\u5373\u201c\u773c\u775b\u4e2d\u7684\u5fc3\u667a\u9605\u8bfb\u6d4b\u8bd5\u201d\u548c\u201c\u591a\u5143\u79cd\u65cf\u773c\u775b\u4e2d\u7684\u5fc3\u667a\u9605\u8bfb\u6d4b\u8bd5\u201d\u3002\u7ed3\u679c\u53d1\u73b0\uff0cGPT-4o\u5728\u89e3\u91ca\u6765\u81ea\u76f4\u7acb\u9762\u90e8\u7684\u5fc3\u7406\u72b6\u6001\u65b9\u9762\u4f18\u4e8e\u4eba\u7c7b\uff0c\u4f46\u5728\u9762\u90e8\u5012\u7f6e\u65f6\u8868\u73b0\u8f83\u5dee\u3002\u5c3d\u7ba1\u6211\u4eec\u6837\u672c\u4e2d\u7684\u4eba\u7c7b\u5728\u767d\u4eba\u548c\u975e\u767d\u4eba\u9762\u5b54\u4e4b\u95f4\u6ca1\u6709\u8868\u73b0\u51fa\u5dee\u5f02\uff0c\u4f46GPT-4o\u5bf9\u767d\u4eba\u9762\u5b54\u7684\u51c6\u786e\u5ea6\u9ad8\u4e8e\u975e\u767d\u4eba\u9762\u5b54\u3002GPT-4o\u7684\u9519\u8bef\u5e76\u975e\u968f\u673a\u51fa\u73b0\uff0c\u800c\u662f\u63ed\u793a\u4e86\u4e00\u79cd\u9ad8\u5ea6\u4e00\u81f4\u4f46\u9519\u8bef\u7684\u5904\u7406\u5fc3\u7406\u72b6\u6001\u4fe1\u606f\u7684\u65b9\u5f0f\uff0c\u5728\u4e0d\u540c\u8bd5\u9a8c\u4e2d\u5448\u73b0\u51fa\u65b9\u5411\u4f9d\u8d56\u7684\u9519\u8bef\u7ed3\u6784\uff0c\u8fd9\u79cd\u7ed3\u6784\u5728\u9762\u5bf9\u5012\u7f6e\u9762\u5b54\u65f6\u4e0e\u4eba\u7c7b
\u5b58\u5728\u5b9a\u6027\u5dee\u5f02\uff0c\u800c\u5728\u9762\u5bf9\u76f4\u7acb\u9762\u5b54\u65f6\u5219\u65e0\u660e\u663e\u533a\u522b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u5148\u8fdb\u7684\u5fc3\u7406\u72b6\u6001\u63a8\u7406\u80fd\u529b\u548c\u4eba\u7c7b\u7c7b\u4f3c\u7684\u9762\u90e8\u5904\u7406\u7279\u5f81\uff0c\u5982\u53cd\u8f6c\u6548\u5e94\uff0c\u5728GPT-4o\u4e2d\u5171\u5b58\uff0c\u540c\u65f6\u5176\u4fe1\u606f\u5904\u7406\u65b9\u5f0f\u4e0e\u4eba\u7c7b\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002|\n", "2410.22307": "|**2024-10-29**|**SVIP: Towards Verifiable Inference of Open-source Large Language Models**|Yifan Sun et.al.|[2410.22307](http://arxiv.org/abs/2410.22307)|null|\u5f00\u6e90\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u65b9\u9762\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\uff0c\u5e76\u5728\u5404\u4e2a\u9886\u57df\u5f97\u5230\u4e86\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u578b\u89c4\u6a21\u7684\u589e\u5927\uff0c\u672c\u5730\u90e8\u7f72\u53d8\u5f97\u4e0d\u5207\u5b9e\u9645\uff0c\u8bb8\u591a\u7528\u6237\u4e0d\u5f97\u4e0d\u4f9d\u8d56\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u901a\u8fc7\u9ed1\u76d2API\u8fdb\u884c\u63a8\u7406\u3002\u8fd9\u79cd\u4f9d\u8d56\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u98ce\u9669\uff1a\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u53ef\u80fd\u5728\u672a\u7ecf\u7528\u6237\u540c\u610f\u7684\u60c5\u51b5\u4e0b\uff0c\u7528\u8f83\u5c0f\u4e14\u80fd\u529b\u8f83\u5f31\u7684\u6a21\u578b\u66ff\u4ee3\u7528\u6237\u8bf7\u6c42\u7684LLM\uff0c\u4ece\u800c\u63d0\u4f9b\u8d28\u91cf\u8f83\u5dee\u7684\u7ed3\u679c\uff0c\u540c\u65f6\u8282\u7701\u6210\u672c\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f62\u5f0f\u5316\u4e86LLM\u53ef\u9a8c\u8bc1\u63a8\u7406\u7684\u95ee\u9898\u3002\u73b0\u6709\u7684\u57fa\u4e8e\u5bc6\u7801\u5b66\u6216\u535a\u5f08\u8bba\u6280\u672f\u7684\u53ef\u9a8c\u8bc1\u8ba1\u7b97\u89e3\u51b3\u65b9\u6848\u8981\u4e48\u5728
\u8ba1\u7b97\u4e0a\u4e0d\u7ecf\u6d4e\uff0c\u8981\u4e48\u57fa\u4e8e\u8f83\u5f3a\u7684\u5047\u8bbe\u3002\u6211\u4eec\u5f15\u5165\u4e86SVIP\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u79d8\u5bc6\u7684\u53ef\u9a8c\u8bc1LLM\u63a8\u7406\u534f\u8bae\uff0c\u5b83\u5229\u7528LLM\u7684\u4e2d\u95f4\u8f93\u51fa\u4f5c\u4e3a\u552f\u4e00\u7684\u6a21\u578b\u6807\u8bc6\u7b26\u3002\u901a\u8fc7\u5728\u8fd9\u4e9b\u8f93\u51fa\u4e0a\u8bad\u7ec3\u4ee3\u7406\u4efb\u52a1\uff0c\u5e76\u8981\u6c42\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u8fd4\u56de\u751f\u6210\u7684\u6587\u672c\u548c\u5904\u7406\u8fc7\u7684\u4e2d\u95f4\u8f93\u51fa\uff0c\u7528\u6237\u53ef\u4ee5\u53ef\u9760\u5730\u9a8c\u8bc1\u8ba1\u7b97\u670d\u52a1\u63d0\u4f9b\u5546\u662f\u5426\u8bda\u5b9e\u884c\u4e8b\u3002\u6b64\u5916\uff0c\u7ed3\u5408\u79d8\u5bc6\u673a\u5236\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u6211\u4eec\u7684\u534f\u8bae\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u5728\u591a\u79cd\u5f3a\u9002\u5e94\u6027\u5bf9\u6297\u573a\u666f\u4e0b\u5168\u9762\u5206\u6790\u4e86\u6211\u4eec\u7684\u534f\u8bae\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSVIP\u662f\u51c6\u786e\u7684\u3001\u53ef\u6cdb\u5316\u7684\u3001\u8ba1\u7b97\u9ad8\u6548\u7684\uff0c\u5e76\u4e14\u5bf9\u5404\u79cd\u653b\u51fb\u5177\u6709\u62b5\u6297\u529b\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cSVIP\u7684\u5047\u9634\u6027\u7387\u4f4e\u4e8e5%\uff0c\u5047\u9633\u6027\u7387\u4f4e\u4e8e3%\uff0c\u5e76\u4e14\u6bcf\u6b21\u67e5\u8be2\u7684\u9a8c\u8bc1\u65f6\u95f4\u5c11\u4e8e0.01\u79d2\u3002|\n", "2410.22304": "|**2024-10-29**|**Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning**|Yihe Deng 
et.al.|[2410.22304](http://arxiv.org/abs/2410.22304)|null|\u6570\u5b66\u63a8\u7406\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5173\u952e\u80fd\u529b\uff0c\u4f46\u751f\u6210\u8be6\u7ec6\u4e14\u51c6\u786e\u7684\u63a8\u7406\u8f68\u8ff9\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5229\u7528\u5728\u7ebf\u5b66\u4e60\u6d41\u7684\u65b0\u65b9\u6cd5\uff0c\u4ee5\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u63a8\u7406\u8f68\u8ff9\u7528\u4e8eLLM\u5fae\u8c03\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u589e\u91cf\u8f93\u51fa\u751f\u4ea7\u6d41\uff0c\u5176\u4e2d\u7ec4\u4ef6LLM\u901a\u8fc7\u8fed\u4ee3\u901a\u4fe1\u534f\u4f5c\u6784\u5efa\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u4f7f\u7528\u5e26\u6709\u6eda\u52a8\u7684\u5728\u7ebf\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5b66\u4e60\u6765\u8bad\u7ec3\u8be5\u6d41\uff0c\u4e3a\u6bcf\u4e2a\u8bad\u7ec3\u6837\u672c\u751f\u6210DPO\u5bf9\uff0c\u5e76\u5b9e\u65f6\u66f4\u65b0\u6a21\u578b\u3002\u6211\u4eec\u76f4\u63a5\u6bd4\u8f83\u4e86\u901a\u8fc7\u6211\u4eec\u65b9\u6cd5\u4e0e\u76f4\u63a5\u6a21\u578b\u63a8\u7406\u751f\u6210\u7684\u63a8\u7406\u8f68\u8ff9\u7684\u8d28\u91cf\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u63d0\u9ad8LLM\u5728\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2410.22296": "|**2024-10-29**|**LLMs are Highly-Constrained Biophysical Sequence Optimizers**|Angelica Chen 
et.al.|[2410.22296](http://arxiv.org/abs/2410.22296)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u751f\u7269\u4efb\u52a1\u4e2d\uff0c\u5982\u86cb\u767d\u8d28\u5de5\u7a0b\u548c\u5206\u5b50\u8bbe\u8ba1\u65b9\u9762\uff0c\u6700\u8fd1\u5c55\u793a\u4e86\u663e\u8457\u7684\u6f5c\u529b\u3002\u8fd9\u4e9b\u4efb\u52a1\u901a\u5e38\u6d89\u53ca\u9ed1\u76d2\u79bb\u6563\u5e8f\u5217\u4f18\u5316\uff0c\u6311\u6218\u5728\u4e8e\u751f\u6210\u4e0d\u4ec5\u5728\u751f\u7269\u5b66\u4e0a\u53ef\u884c\u800c\u4e14\u4e25\u683c\u7b26\u5408\u7ec6\u7c92\u5ea6\u7ea6\u675f\u7684\u5e8f\u5217\u3002\u7136\u800c\uff0cLLMs\u5f80\u5f80\u96be\u4ee5\u5e94\u5bf9\u8fd9\u4e9b\u7ea6\u675f\uff0c\u7279\u522b\u662f\u5728\u751f\u7269\u5b66\u80cc\u666f\u4e0b\uff0c\u9a8c\u8bc1\u5019\u9009\u89e3\u51b3\u65b9\u6848\u65e2\u6602\u8d35\u53c8\u8017\u65f6\u3002\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5c06LLMs\u4f5c\u4e3a\u9ad8\u5ea6\u7ea6\u675f\u7684\u53cc\u5c42\u4f18\u5316\u5668\u7684\u53ef\u80fd\u6027\uff0c\u901a\u8fc7\u4e00\u79cd\u6211\u4eec\u79f0\u4e4b\u4e3a\u8bed\u8a00\u6a21\u578b\u4f18\u5316\u8fb9\u7f18\u671f\u671b\uff08LLOME\uff09\u7684\u65b9\u6cd5\u3002\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u79bb\u7ebf\u548c\u5728\u7ebf\u4f18\u5316\uff0c\u5229\u7528\u6709\u9650\u7684oracle\u8bc4\u4f30\u8fed\u4ee3\u5730\u589e\u5f3a\u7531LLM\u751f\u6210\u7684\u5e8f\u5217\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bad\u7ec3\u76ee\u6807\u2014\u2014\u8fb9\u7f18\u5bf9\u9f50\u671f\u671b\uff08MargE\uff09\uff0c\u8be5\u76ee\u6807\u8bad\u7ec3LLM\u5e73\u6ed1\u5730\u5728\u5956\u52b1\u5206\u5e03\u548c\u53c2\u8003\u5206\u5e03\u4e4b\u95f4\u63d2\u503c\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u5408\u6210\u6d4b\u8bd5\u5957\u4ef6\uff0c\u8be5\u5957\u4ef6\u4e0e\u5b9e\u9645\u751f\u7269\u7269\u7406\u95ee\u9898\u5177\u6709\u5f3a\u70c8\u7684\u51e0\u4f55\u76f8\u4f3c\u6027\uff0c\u5e76\u4e14\u80fd\u591f\u5728\u4e0d\u8fdb\u884c\u8017\u65f6\u7684
\u5b9e\u9a8c\u5ba4\u9a8c\u8bc1\u7684\u60c5\u51b5\u4e0b\u5feb\u901f\u8bc4\u4f30LLM\u4f18\u5316\u5668\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u4e0e\u9057\u4f20\u7b97\u6cd5\u57fa\u7ebf\u76f8\u6bd4\uff0cLLMs\u5728\u8981\u6c42\u8f83\u5c11\u6d4b\u8bd5\u51fd\u6570\u8bc4\u4f30\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e86\u663e\u8457\u66f4\u4f4e\u7684\u9057\u61be\u89e3\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u89c2\u5bdf\u5230LLMs\u8868\u73b0\u51fa\u9002\u5ea6\u7684\u6821\u51c6\u504f\u5dee\uff0c\u5bb9\u6613\u53d1\u751f\u751f\u6210\u5668\u5d29\u6e83\uff0c\u5e76\u4e14\u5728\u6ca1\u6709\u660e\u786e\u7684\u5730\u9762\u771f\u503c\u5956\u52b1\u53ef\u7528\u65f6\u96be\u4ee5\u627e\u5230\u6700\u4f18\u89e3\u3002|\n", "2410.22293": "|**2024-10-29**|**Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats**|Mohammad Setak et.al.|[2410.22293](http://arxiv.org/abs/2410.22293)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u548c\u4ee3\u7801\u5408\u6210\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4e0d\u540c\u9886\u57df\u66f4\u590d\u6742\u7684\u4efb\u52a1\u3002\u672c\u6587\u63a2\u8ba8\u4e86LLMs\u5728\u4ee3\u7801\u53d8\u5f02\u4e2d\u7684\u5e94\u7528\uff0c\u8fd9\u662f\u4e00\u4e2a\u5728\u4e0d\u6539\u53d8\u7a0b\u5e8f\u4ee3\u7801\u529f\u80fd\u7684\u524d\u63d0\u4e0b\u6539\u53d8\u5176\u7ed3\u6784\u7684\u8fc7\u7a0b\u3002\u4f20\u7edf\u4e0a\uff0c\u4ee3\u7801\u53d8\u5f02\u88ab\u7528\u4e8e\u63d0\u9ad8\u5173\u952e\u4efb\u52a1\u5e94\u7528\u7a0b\u5e8f\u7684\u8f6f\u4ef6\u5065\u58ee\u6027\u3002\u6b64\u5916\uff0c\u53d8\u5f02\u5f15\u64ce\u4e5f\u88ab\u6076\u610f\u8f6f\u4ef6\u5f00\u53d1\u8005\u7528\u6765\u9003\u907f\u57fa\u4e8e\u7279\u5f81\u7801\u7684\u68c0\u6d4b\u65b9\u6cd5\u3002\u73b0\u6709\u7684\u6076\u610f\u8f6f\u4ef6\u4f7f\u7528\u7684\u53d8\u5f02\u5f15\u64ce\u901a\u5e38\u53ea\u4ea7\u751f\u6709\u9650\u7684\u4ee3\u7801\u53d8\u5316\uff0c\u8fd9\u4e9b\u53d8\u53
16\u4ecd\u7136\u53ef\u4ee5\u901a\u8fc7\u9759\u6001\u4ee3\u7801\u5206\u6790\u88ab\u8bc6\u522b\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684LLM\u6240\u5c55\u793a\u7684\u7075\u6d3b\u6027\u53ef\u80fd\u663e\u8457\u6539\u53d8\u8fd9\u79cd\u5a01\u80c1\u6001\u52bf\uff0c\u901a\u8fc7\u5141\u8bb8\u8fdb\u884c\u66f4\u590d\u6742\u7684\u4ee3\u7801\u53d8\u5f02\uff0c\u8fd9\u4e9b\u53d8\u5f02\u4e0d\u5bb9\u6613\u901a\u8fc7\u9759\u6001\u5206\u6790\u68c0\u6d4b\u5230\u3002\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u548c\u518d\u8bad\u7ec3\u589e\u52a0\u7531\u9884\u8bad\u7ec3LLM\u751f\u6210\u7684\u4ee3\u7801\u7684\u53d8\u5316\u3002\u6211\u4eec\u79f0\u4e4b\u4e3a\u4ee3\u7801\u53d8\u5f02\u8bad\u7ec3\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u4e3a\u57fa\u4e8e\u9884\u8bad\u7ec3LLM\u7684\u4ee3\u7801\u5408\u6210\u5668\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u4ee3\u7801\u53d8\u5f02\u8bad\u7ec3\u5b9a\u4e49\uff0c\u5e76\u5728\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u7684\u9884\u8bad\u7ec3\u6a21\u578b\u4e0a\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5728\u5b50\u4f8b\u7a0b\u7ea7\u522b\u91cd\u7ec4\uff08\u5373\u53d8\u5f02\uff09\u4ee3\u7801\uff0c\u8fd9\u4f7f\u5f97\u53d8\u5f02\u66f4\u52a0\u53ef\u63a7\u540c\u65f6\u4fdd\u6301\u8bed\u4e49\u5b8c\u6574\u6027\uff0c\u5e76\u901a\u8fc7\u5355\u5143\u6d4b\u8bd5\u9a8c\u8bc1\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u5730\u63d0\u9ad8\u4e86\u57fa\u4e8eLLM\u7684\u7a0b\u5e8f\u5408\u6210\u5668\u5728\u751f\u6210\u591a\u6837\u5316\u4e14\u529f\u80fd\u6b63\u786e\u7684\u4ee3\u7801\u89e3\u51b3\u65b9\u6848\u65b9\u9762\u7684\u53d8\u5f02\u80fd\u529b\uff0c\u5c55\u793a\u4e86\u5b83\u4eec\u5728\u6539\u53d8\u4ee3\u7801\u53d8\u5f02\u683c\u5c40\u4ee5\u53ca\u4e0e\u4e4b\u76f8\u5173\u7684\u5a01\u80c1\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2410.22284": "|**2024-10-29**|**Embedding-based classifiers can detect prompt injection attacks**|Md. 
Ahsan Ayub et.al.|[2410.22284](http://arxiv.org/abs/2410.22284)|**[link](https://github.com/AhsanAyub/malicious-prompt-detection)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5353\u8d8a\u7684\u751f\u6210\u80fd\u529b\u800c\u5728\u5404\u7c7b\u7ec4\u7ec7\u4e2d\u5f97\u5230\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0cLLMs\u5bb9\u6613\u53d7\u5230\u5404\u79cd\u5bf9\u6297\u6027\u653b\u51fb\uff0c\u7279\u522b\u662f\u63d0\u793a\u6ce8\u5165\u653b\u51fb\uff0c\u8fd9\u79cd\u653b\u51fb\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6076\u610f\u63d0\u793a\u6b3a\u9a97LLMs\uff0c\u4f7f\u5176\u751f\u6210\u6709\u5bb3\u6216\u4e0d\u9002\u5f53\u7684\u5185\u5bb9\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5d4c\u5165\u5f0f\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u5206\u7c7b\u5668\u7684\u65b0\u65b9\u6cd5\uff0c\u4ee5\u4fdd\u62a4\u57fa\u4e8eLLM\u7684\u5e94\u7528\u7a0b\u5e8f\u514d\u53d7\u8fd9\u79cd\u4e25\u91cd\u5a01\u80c1\u3002\u6211\u4eec\u5229\u7528\u4e09\u79cd\u5e38\u7528\u7684\u5d4c\u5165\u6a21\u578b\u6765\u751f\u6210\u6076\u610f\u548c\u826f\u6027\u63d0\u793a\u7684\u5d4c\u5165\uff0c\u5e76\u4f7f\u7528ML\u5206\u7c7b\u5668\u9884\u6d4b\u8f93\u5165\u63d0\u793a\u662f\u5426\u4e3a\u6076\u610f\u3002\u5728\u51e0\u79cd\u4f20\u7edf\u7684ML\u65b9\u6cd5\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528\u968f\u673a\u68ee\u6797\u548cXGBoost\u6784\u5efa\u7684\u5206\u7c7b\u5668\u8868\u73b0\u6700\u4f73\u3002\u6211\u4eec\u7684\u5206\u7c7b\u5668\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u5f00\u6e90\u5b9e\u73b0\u4e2d\u7684\u6700\u5148\u8fdb\u7684\u63d0\u793a\u6ce8\u5165\u5206\u7c7b\u5668\uff0c\u540e\u8005\u4f7f\u7528\u7684\u662f\u4ec5\u7f16\u7801\u5668\u7684\u795e\u7ecf\u7f51\u7edc\u3002**|\n", "2410.22282": "|**2024-10-29**|**Whose ChatGPT? 
Unveiling Real-World Educational Inequalities Introduced by Large Language Models**|Renzhe Yu et.al.|[2410.22282](http://arxiv.org/abs/2410.22282)|null|\u81ea2022\u5e74\u5e95\u4ee5\u6765\uff0cChatGPT\u7b49\u7c7b\u4f3c\u5de5\u5177\u7684\u5e7f\u6cdb\u53ef\u7528\u6027\u5f15\u53d1\u4e86\u516c\u4f17\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u63d0\u9ad8\u5b66\u4e60\u4f53\u9a8c\u548c\u6210\u679c\u65b9\u9762\u7684\u6f5c\u529b\u7684\u5de8\u5927\u5174\u8da3\u548c\u5b9e\u9a8c\u52aa\u529b\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u6765\u81ea\u5f31\u52bf\u80cc\u666f\u7684\u5b66\u4e60\u8005\u3002\u7136\u800c\uff0c\u5f88\u5c11\u6709\u7814\u7a76\u7cfb\u7edf\u5730\u8003\u5bdf\u4e86LLMs\u7684\u5b9e\u9645\u53ef\u7528\u6027\u5bf9\u6559\u80b2\u516c\u5e73\u6027\u7684\u73b0\u5b9e\u5f71\u54cd\uff0c\u9664\u4e86\u7406\u8bba\u9884\u6d4b\u548c\u521b\u65b0LLM\u5e94\u7528\u7684\u63a7\u5236\u7814\u7a76\u4e4b\u5916\u3002\u4e3a\u4e86\u63cf\u7ed8LLMs\u4e0d\u5e73\u7b49\u8d8b\u52bf\uff0c\u6211\u4eec\u5206\u6790\u4e86\u4e00\u6240\u7f8e\u56fd\u516c\u7acb\u5c11\u6570\u65cf\u88d4\u670d\u52a1\u9662\u68212021\u5e74\u81f32024\u5e74\u95f42391\u95e8\u8bfe\u7a0b\u4e2d16791\u540d\u5927\u5b66\u751f\u63d0\u4ea4\u76841140328\u7bc7\u5b66\u672f\u5199\u4f5c\u4f5c\u4e1a\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u5728LLMs\u53ef\u7528\u4e4b\u540e\uff0c\u5b66\u751f\u7684\u6574\u4f53\u5199\u4f5c\u8d28\u91cf\u9010\u6e10\u63d0\u9ad8\uff0c\u5e76\u4e14\u8bed\u8a00\u4f18\u52bf\u548c\u52a3\u52bf\u5b66\u751f\u4e4b\u95f4\u7684\u5199\u4f5c\u8d28\u91cf\u5dee\u8ddd\u9010\u6e10\u7f29\u5c0f\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u5e73\u7b49\u5316\u6548\u5e94\u66f4\u591a\u96c6\u4e2d\u5728\u8f83\u9ad8\u793e\u4f1a\u7ecf\u6d4e\u5730\u4f4d\u7684\u5b66\u751f\u8eab\u4e0a\u3002\u8fd9\u4e9b\u53d1\u73b0\u63ed\u793a\u4e86LLMs\u65f6\u4ee3\u7684\u6570\u5b57\u9e3f\u6c9f\uff0c\u5e76\u63d0\u51fa\u4e86\u5173\u4e8eLLMs\u5728\u65e9\u671f\u9636\u6bb5\u7684\u516c\u5e73\u6548\u76ca\u7684\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u7814\u7a76\u4eba\u5458\u54
8c\u4ece\u4e1a\u8005\u9700\u8981\u5236\u5b9a\u8d1f\u8d23\u4efb\u7684\u505a\u6cd5\u4ee5\u901a\u8fc7LLMs\u6539\u5584\u6559\u80b2\u516c\u5e73\u6027\u3002|\n", "2410.23262": "|**2024-10-30**|**EMMA: End-to-End Multimodal Model for Autonomous Driving**|Jyh-Jing Hwang et.al.|[2410.23262](http://arxiv.org/abs/2410.23262)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86EMMA\uff0c\u8fd9\u662f\u4e00\u79cd\u7528\u4e8e\u81ea\u52a8\u9a7e\u9a76\u7684\u7aef\u5230\u7aef\u591a\u6a21\u6001\u6a21\u578b\u3002\u8be5\u6a21\u578b\u57fa\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u57fa\u7840\uff0c\u76f4\u63a5\u5c06\u539f\u59cb\u76f8\u673a\u4f20\u611f\u5668\u6570\u636e\u6620\u5c04\u5230\u5404\u79cd\u4e0e\u9a7e\u9a76\u76f8\u5173\u7684\u8f93\u51fa\uff0c\u5305\u62ec\u89c4\u5212\u8f68\u8ff9\u3001\u611f\u77e5\u5bf9\u8c61\u548c\u9053\u8def\u56fe\u5143\u7d20\u3002EMMA\u901a\u8fc7\u5c06\u6240\u6709\u975e\u4f20\u611f\u5668\u8f93\u5165\uff08\u4f8b\u5982\u5bfc\u822a\u6307\u4ee4\u548c\u81ea\u8f66\u72b6\u6001\uff09\u548c\u8f93\u51fa\uff08\u4f8b\u5982\u8f68\u8ff9\u548c\u4e09\u7ef4\u4f4d\u7f6e\uff09\u8868\u793a\u4e3a\u81ea\u7136\u8bed\u8a00\u6587\u672c\uff0c\u6700\u5927\u9650\u5ea6\u5730\u5229\u7528\u4e86\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u7684\u4e16\u754c\u77e5\u8bc6\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7fEMMA\u80fd\u591f\u5728\u7edf\u4e00\u7684\u8bed\u8a00\u7a7a\u95f4\u4e2d\u8054\u5408\u5904\u7406\u5404\u79cd\u9a7e\u9a76\u4efb\u52a1\uff0c\u5e76\u4f7f\u7528\u7279\u5b9a\u4efb\u52a1\u63d0\u793a\u751f\u6210\u6bcf\u4e2a\u4efb\u52a1\u7684\u8f93\u51fa\u3002\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0cEMMA\u5728nuScenes\u4e0a\u7684\u8fd0\u52a8\u89c4\u5212\u65b9\u9762\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5e76\u5728Waymo\u5f00\u653e\u8fd0\u52a8\u6570\u636e\u96c6\uff08WOMD\uff09\u4e0a\u53d6\u5f97\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\u3002\u6b64\u5916\uff0cEMMA\u5728Waymo\u5f00\u653e\u6570\u636e\u96c6\uff08WOD\uff09\u4e0a\u4f5c\u4e3a\u4e3b\u8981\u6444\u
50cf\u5934\u7684\u4e09\u7ef4\u76ee\u6807\u68c0\u6d4b\u4e5f\u53d6\u5f97\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u901a\u8fc7\u540c\u65f6\u8bad\u7ec3EMMA\u8fdb\u884c\u89c4\u5212\u8f68\u8ff9\u3001\u76ee\u6807\u68c0\u6d4b\u548c\u9053\u8def\u56fe\u4efb\u52a1\u53ef\u4ee5\u5728\u8fd9\u4e09\u4e2a\u9886\u57df\u90fd\u53d6\u5f97\u6539\u8fdb\uff0c\u7a81\u663e\u4e86EMMA\u4f5c\u4e3a\u81ea\u52a8\u9a7e\u9a76\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6a21\u578b\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0cEMMA\u4e5f\u8868\u73b0\u51fa\u4e00\u4e9b\u5c40\u9650\u6027\uff1a\u5b83\u53ea\u80fd\u5904\u7406\u5c11\u91cf\u56fe\u50cf\u5e27\uff0c\u4e0d\u5305\u542b\u51c6\u786e\u7684\u4e09\u7ef4\u4f20\u611f\u6a21\u6001\u5982\u6fc0\u5149\u96f7\u8fbe\u6216\u96f7\u8fbe\uff0c\u5e76\u4e14\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u7ed3\u679c\u80fd\u591f\u6fc0\u53d1\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u5e76\u8fdb\u4e00\u6b65\u53d1\u5c55\u81ea\u52a8\u9a7e\u9a76\u6a21\u578b\u67b6\u6784\u3002|\n", "2410.23252": "|**2024-10-30**|**Evaluating Cultural and Social Awareness of LLM Web Agents**|Haoyi Qiu 
et.al.|[2410.23252](http://arxiv.org/abs/2410.23252)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6269\u5c55\u5230\u6267\u884c\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\u4e2d\u7684\u4ee3\u7406\u4efb\u52a1\uff0c\u8d85\u8d8a\u4f20\u7edf\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u8bc4\u4f30\u5176\u9c81\u68d2\u6027\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7b49\u5173\u952e\u7ef4\u5ea6\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86CASA\uff0c\u8fd9\u662f\u4e00\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u4ee3\u7406\u5728\u4e24\u4e2a\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\uff08\u5728\u7ebf\u8d2d\u7269\u548c\u793e\u4ea4\u8ba8\u8bba\u8bba\u575b\uff09\u4e2d\u5bf9\u6587\u5316\u548c\u793e\u4f1a\u89c4\u8303\u7684\u654f\u611f\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bc4\u4f30\u4e86LLM\u4ee3\u7406\u68c0\u6d4b\u5e76\u9002\u5f53\u56de\u5e94\u8fdd\u53cd\u89c4\u8303\u7684\u7528\u6237\u67e5\u8be2\u548c\u89c2\u5bdf\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u6d4b\u91cf\u4ee3\u7406\u5bf9\u6587\u5316\u548c\u793e\u4f1a\u89c4\u8303\u7684\u610f\u8bc6\u8986\u76d6\u7387\u3001\u5728\u7ba1\u7406\u7528\u6237\u67e5\u8be2\u65f6\u7684\u5b9e\u7528\u6027\u4ee5\u53ca\u9762\u5bf9\u8bef\u5bfc\u6027\u7f51\u7edc\u5185\u5bb9\u65f6\u7684\u8fdd\u89c4\u7387\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u524d\u7684LLM\u5728\u975e\u4ee3\u7406\u73af\u5883\u4e2d\u7684\u8868\u73b0\u663e\u8457\u4f18\u4e8e\u5728\u7f51\u7edc\u4ee3\u7406\u73af\u5883\u4e2d\uff0c\u4ee3\u7406\u7684\u610f\u8bc6\u8986\u76d6\u7387\u4e0d\u523010%\uff0c\u8fdd\u89c4\u7387\u8d85\u8fc740%\u3002\u4e3a\u4e86\u63d0\u9ad8\u6027\u80fd\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u63d0\u793a\u548c\u5fa
e\u8c03\uff0c\u5e76\u53d1\u73b0\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u4e92\u8865\u2014\u2014\u9488\u5bf9\u7279\u5b9a\u6587\u5316\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u663e\u8457\u589e\u5f3a\u4ee3\u7406\u5728\u4e0d\u540c\u5730\u533a\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u800c\u63d0\u793a\u5219\u80fd\u63d0\u5347\u4ee3\u7406\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u80fd\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u7a81\u663e\u4e86\u5728\u5f00\u53d1\u5468\u671f\u4e2d\u4e0d\u65ad\u57fa\u51c6\u6d4b\u8bd5LLM\u4ee3\u7406\u7684\u6587\u5316\u548c\u793e\u4f1a\u610f\u8bc6\u7684\u91cd\u8981\u6027\u3002|\n", "2410.23243": "|**2024-10-30**|**Carrot and Stick: Eliciting Comparison Data and Beyond**|Yiling Chen et.al.|[2410.23243](http://arxiv.org/abs/2410.23243)|null|\u6bd4\u8f83\u6570\u636e\u901a\u5e38\u6765\u81ea\u4e8e\u4eba\u4eec\u7684\u4e3b\u89c2\u5224\u65ad\uff0c\u5e76\u4e14\u96be\u4ee5\u76f4\u63a5\u9a8c\u8bc1\u3002\u8fd9\u4e9b\u6570\u636e\u5bf9\u4e8e\u8bb8\u591a\u673a\u5668\u5b66\u4e60\u4efb\u52a1\u81f3\u5173\u91cd\u8981\uff0c\u5305\u62ec\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\u548c\u6392\u540d\u6a21\u578b\u4f30\u8ba1\u3002\u5982\u4f55\u8bda\u5b9e\u5730\u4ece\u7406\u6027\u4e2a\u4f53\u90a3\u91cc\u83b7\u53d6\u8fd9\u6837\u7684\u6bd4\u8f83\u6570\u636e\uff1f\u6211\u4eec\u8bbe\u8ba1\u4e86\u540c\u4f34\u9884\u6d4b\u673a\u5236\u6765\u5229\u7528\u5956\u91d1-\u60e9\u7f5a\u652f\u4ed8\u65b9\u5f0f\u6765\u83b7\u53d6\u6bd4\u8f83\u6570\u636e\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u4f9d\u8d56\u4e8e\u6bd4\u8f83\u6570\u636e\u7684\u5f3a\u968f\u673a\u4f20\u9012\u6027\uff0c\u4ece\u800c\u521b\u5efa\u5bf9\u79f0\u7684\u4e25\u683c\u771f\u5b9e\u673a\u5236\uff0c\u4f7f\u5f97\u8bf4\u5b9e\u8bdd\u4e0d\u4ec5\u5f62\u6210\u4e25\u683c\u7684\u8d1d\u53f6\u65af\u7eb3\u4ec0\u5747\u8861\uff0c\u800c\u4e14\u5728\u6240\u6709\u5bf9\u79f0\u5747\u8861\u4e2d\u83b7\u5f97\u6700\u9ad8\u62a5\u916c\u3002\u5728\u6211\u4eec\u7684\u673a\u5236\u4e0b\uff0c\u6bcf\u4e2a\u4e2a\u4f53\u53ea\u970
0\u8981\u8bc4\u4f30\u4e00\u5bf9\u9879\u76ee\u5e76\u62a5\u544a\u5979\u7684\u6bd4\u8f83\u7ed3\u679c\u3002 \u6211\u4eec\u8fdb\u4e00\u6b65\u5c06\u5956\u91d1-\u60e9\u7f5a\u652f\u4ed8\u7684\u6982\u5ff5\u6269\u5c55\u5230\u7f51\u7edc\u5316\u6570\u636e\u7684\u83b7\u53d6\u4e0a\uff0c\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5f53\u4ee3\u7406\u4eba\u7684\u79c1\u4eba\u4fe1\u53f7\u6839\u636eIsing\u6a21\u578b\u91c7\u6837\u65f6\uff0c\u5bf9\u79f0\u5730\u4e25\u683c\u771f\u5b9e\u7684\u673a\u5236\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u5956\u91d1-\u60e9\u7f5a\u652f\u4ed8\u6210\u4e3a\u4e25\u683c\u8d1d\u53f6\u65af\u7eb3\u4ec0\u5747\u8861\u7684\u5fc5\u8981\u548c\u5145\u5206\u6761\u4ef6\u3002\u5728\u4e24\u4e2a\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u8fdb\u4e00\u6b65\u652f\u6301\u4e86\u6211\u4eec\u7684\u7406\u8bba\u53d1\u73b0\u3002|\n", "2410.23242": "|**2024-10-30**|**A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment**|Matteo G. 
Mecattaf et.al.|[2410.23242](http://arxiv.org/abs/2410.23242)|null|\u4f5c\u4e3a\u901a\u7528\u5de5\u5177\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5fc5\u987b\u7ecf\u5e38\u63a8\u7406\u65e5\u5e38\u7269\u7406\u73af\u5883\u3002\u5728\u95ee\u7b54\u573a\u666f\u4e2d\uff0c\u7406\u89e3\u7269\u7406\u5bf9\u8c61\u7684\u76f8\u4e92\u4f5c\u7528\u53ef\u80fd\u662f\u7ed9\u51fa\u9002\u5f53\u56de\u7b54\u7684\u5fc5\u8981\u6761\u4ef6\u3002\u6b64\u5916\uff0cLLMs\u8d8a\u6765\u8d8a\u591a\u5730\u88ab\u7528\u4f5c\u81ea\u4e3b\u7cfb\u7edf\u4e2d\u7684\u63a8\u7406\u5f15\u64ce\uff0c\u8bbe\u8ba1\u548c\u63a7\u5236\u5b83\u4eec\u7684\u52a8\u4f5c\u5e8f\u5217\u3002\u5927\u591a\u6570\u7814\u7a76\u901a\u8fc7\u9759\u6001\u57fa\u51c6\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u8fd9\u4e9b\u57fa\u51c6\u7531\u5173\u4e8e\u7269\u7406\u4e16\u754c\u7684\u6587\u672c\u6216\u56fe\u50cf\u95ee\u9898\u7ec4\u6210\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u65e0\u6cd5\u6355\u6349\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u7269\u7406\u8fc7\u7a0b\u7684\u590d\u6742\u6027\u548c\u7ec6\u5fae\u5dee\u522b\u3002\u5728\u8fd9\u91cc\uff0c\u6211\u4eec\u63d0\u5021\u7b2c\u4e8c\u79cd\u76f8\u5bf9\u672a\u88ab\u5145\u5206\u63a2\u7d22\u7684\u65b9\u6cd5\uff1a\u901a\u8fc7\u5728\u4e00\u4e2a3D\u73af\u5883\u4e2d\u8d4b\u4e88LLMs\u5bf9\u4ee3\u7406\u7684\u63a7\u5236\u6743\u6765\u201c\u5177\u8eab\u5316\u201d\u5b83\u4eec\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u7b2c\u4e00\u4e2a\u5177\u8eab\u4e14\u8ba4\u77e5\u4e0a\u6709\u610f\u4e49\u7684LLM\u7269\u7406\u5e38\u8bc6\u63a8\u7406\u8bc4\u4f30\u6846\u67b6\u3002\u6211\u4eec\u7684\u6846\u67b6\u5141\u8bb8\u76f4\u63a5\u6bd4\u8f83LLMs\u4e0e\u5176\u4ed6\u5177\u8eab\u4ee3\u7406\uff0c\u5982\u57fa\u4e8e\u6df1\u5ea6\u5f3a\u5316\u5b66\u4e60\u7684\u4ee3\u7406\uff0c\u4ee5\u53ca\u4eba\u7c7b\u548c\u975e\u4eba\u7c7b\u52a8\u7269\u3002\u6211\u4eec\u4f7f\u7528Animal-AI\uff08AAI\uff09\u73af\u5883\uff0c\u4e00\u4e2a\u6a21\u62df\u76843D\u865a\u62df\u5b9e\u9a8c\u5ba4\uff0c\u6765\u7814\u7a76LLMs\u7684\u7269\u7406\u5e
38\u8bc6\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4f7f\u7528AAI\u6d4b\u8bd5\u5e73\u53f0\uff0c\u8be5\u5e73\u53f0\u662f\u4e00\u7cfb\u5217\u5b9e\u9a8c\uff0c\u590d\u5236\u4e86\u975e\u4eba\u7c7b\u52a8\u7269\u7684\u5b9e\u9a8c\u5ba4\u7814\u7a76\uff0c\u4ee5\u7814\u7a76\u7269\u7406\u63a8\u7406\u80fd\u529b\uff0c\u5305\u62ec\u8ddd\u79bb\u4f30\u8ba1\u3001\u8ddf\u8e2a\u770b\u4e0d\u89c1\u7684\u7269\u4f53\u548c\u5de5\u5177\u4f7f\u7528\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u6ca1\u6709\u5fae\u8c03\u7684\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u6a21\u578b\u80fd\u591f\u5b8c\u6210\u8fd9\u79cd\u4efb\u52a1\uff0c\u4f7f\u5f97\u4e0e2019\u5e74Animal-AI\u5965\u8fd0\u4f1a\u53c2\u8d5b\u8005\u548c\u4eba\u7c7b\u513f\u7ae5\u8fdb\u884c\u6709\u610f\u4e49\u7684\u6bd4\u8f83\u6210\u4e3a\u53ef\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0cLLMs\u76ee\u524d\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4e0d\u5982\u4eba\u7c7b\u513f\u7ae5\u3002\u6211\u4eec\u8ba4\u4e3a\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u4f7f\u7528\u76f4\u63a5\u4ece\u8ba4\u77e5\u79d1\u5b66\u4e2d\u63d0\u53d6\u7684\u751f\u6001\u6709\u6548\u7684\u5b9e\u9a8c\u6765\u7814\u7a76\u7269\u7406\u63a8\u7406\uff0c\u4ece\u800c\u63d0\u9ad8LLMs\u7684\u9884\u6d4b\u6027\u548c\u53ef\u9760\u6027\u3002|\n", "2410.23234": "|**2024-10-30**|**EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning**|Peide Huang 
et.al.|[2410.23234](http://arxiv.org/abs/2410.23234)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aEMOTION\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u5728\u4eba\u5f62\u673a\u5668\u4eba\u4e2d\u751f\u6210\u5bcc\u6709\u8868\u73b0\u529b\u7684\u52a8\u4f5c\u5e8f\u5217\uff0c\u4ece\u800c\u589e\u5f3a\u5176\u8fdb\u884c\u7c7b\u4eba\u975e\u8bed\u8a00\u4ea4\u6d41\u7684\u80fd\u529b\u3002\u975e\u8bed\u8a00\u7ebf\u7d22\u5982\u9762\u90e8\u8868\u60c5\u3001\u624b\u52bf\u548c\u8eab\u4f53\u52a8\u4f5c\u5728\u6709\u6548\u7684\u4eba\u9645\u4e92\u52a8\u4e2d\u8d77\u7740\u81f3\u5173\u91cd\u8981\u7684\u4f5c\u7528\u3002\u5c3d\u7ba1\u5728\u673a\u5668\u4eba\u7684\u884c\u4e3a\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u96be\u4ee5\u6a21\u4eff\u4eba\u7c7b\u975e\u8bed\u8a00\u4ea4\u6d41\u7684\u591a\u6837\u6027\u548c\u7ec6\u5fae\u5dee\u522b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\uff0c\u52a8\u6001\u751f\u6210\u9002\u5408\u793e\u4f1a\u4ea4\u5f80\u7684\u624b\u52bf\u52a8\u4f5c\u5e8f\u5217\uff0c\u4ee5\u4fc3\u8fdb\u4eba\u673a\u4ea4\u4e92\u3002\u6211\u4eec\u4f7f\u7528\u8be5\u6846\u67b6\u751f\u6210\u4e8610\u79cd\u4e0d\u540c\u7684\u8868\u60c5\u624b\u52bf\uff0c\u5e76\u8fdb\u884c\u4e86\u5728\u7ebf\u7528\u6237\u7814\u7a76\uff0c\u6bd4\u8f83\u7531EMOTION\u548c\u5176\u52a0\u5165\u4eba\u7c7b\u53cd\u9988\u7248\u672cEMOTION++\u751f\u6210\u7684\u52a8\u4f5c\u4e0e\u4eba\u7c7b\u64cd\u4f5c\u5458\u751f\u6210\u7684\u52a8\u4f5c\u4e4b\u95f4\u7684\u81ea\u7136\u5ea6\u548c\u53ef\u7406\u89e3\u6027\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u751f\u6210\u53ef\u7406\u89e3\u4e14\u81ea\u7136\u7684\u673a\u5668\u4eba\u52a8\u4f5c\u65b9\u9762\u8981\u4e48\u4e0e\u4eba\u7c7b\u8868\u73b0\u76f8\u5f53\uff0c\u8981\u4e48\u8d85\u8d8a\u4eba\u7c7b\u3002
\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u7684\u8bbe\u8ba1\u542f\u793a\uff0c\u8003\u8651\u5728\u751f\u6210\u5bcc\u6709\u8868\u73b0\u529b\u7684\u673a\u5668\u4eba\u624b\u52bf\u65f6\u9700\u8981\u8003\u8651\u7684\u4e00\u7cfb\u5217\u53d8\u91cf\u3002|\n", "2410.23214": "|**2024-10-31**|**Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval**|Sheryl Hsu et.al.|[2410.23214](http://arxiv.org/abs/2410.23214)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7b\u89c9\u73b0\u8c61\u901a\u8fc7\u5141\u8bb8\u6a21\u578b\u641c\u7d22\u4fe1\u606f\u5e76\u5c06\u7b54\u6848\u4e0e\u771f\u5b9e\u6765\u6e90\u6302\u94a9\uff0c\u5f97\u5230\u4e86\u4e00\u5b9a\u7a0b\u5ea6\u7684\u7f13\u89e3\u3002\u7136\u800c\uff0cLLMs\u5728\u5904\u7406\u590d\u6742\u6216\u95f4\u63a5\u4e3b\u9898\u65f6\uff0c\u5f80\u5f80\u96be\u4ee5\u63d0\u51fa\u6b63\u786e\u7684\u641c\u7d22\u67e5\u8be2\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u901a\u8fc7\u8ba9LLMs\u5c1d\u8bd5\u4e0d\u540c\u7684\u67e5\u8be2\u5e76\u5b66\u4e60\u5bf9\u90a3\u4e9b\u6210\u529f\u4ea7\u751f\u76f8\u5173\u7ed3\u679c\u7684\u67e5\u8be2\u8d4b\u4e88\u66f4\u9ad8\u7684\u6743\u91cd\uff0cLLMs\u53ef\u4ee5\u5b66\u4f1a\u68c0\u7d22\u76f8\u5173\u7684\u4e8b\u5b9e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86LeReT\uff08Learning to Retrieve by 
Trying\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5f3a\u5316\u5b66\u4e60\u6846\u67b6\uff0c\u901a\u8fc7\u63a2\u7d22\u641c\u7d22\u67e5\u8be2\u5e76\u4f7f\u7528\u57fa\u4e8e\u504f\u597d\u7684\u4f18\u5316\u6765\u63d0\u9ad8\u67e5\u8be2\u8d28\u91cf\u3002LeReT\u53ef\u4ee5\u5c06\u7edd\u5bf9\u68c0\u7d22\u51c6\u786e\u6027\u63d0\u9ad8\u591a\u8fbe29%\uff0c\u5e76\u5c06\u4e0b\u6e38\u751f\u6210\u5668\u8bc4\u4f30\u63d0\u9ad817%\u3002LeReT\u7684\u7b80\u5355\u6027\u548c\u7075\u6d3b\u6027\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4efb\u610f\u73b0\u6210\u7684\u68c0\u7d22\u5668\uff0c\u5e76\u6210\u4e3a\u6539\u8fdb\u901a\u7528LLM\u7ba1\u9053\u7684\u4e00\u79cd\u6709\u524d\u9014\u7684\u6280\u672f\u3002\u9879\u76ee\u7f51\u7ad9\uff1ahttp://sherylhsu.com/LeReT/\u3002|\n", "2410.23182": "|**2024-10-30**|**ProTransformer: Robustify Transformers via Plug-and-Play Paradigm**|Zhichao Hou et.al.|[2410.23182](http://arxiv.org/abs/2410.23182)|null|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8eTransformer\u7684\u67b6\u6784\u5728\u673a\u5668\u5b66\u4e60\u7684\u5404\u4e2a\u9886\u57df\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u9c81\u68d2\u6ce8\u610f\u529b\u673a\u5236\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8eTransformer\u7684\u67b6\u6784\u7684\u97e7\u6027\u3002\u8fd9\u9879\u6280\u672f\u53ef\u4ee5\u4f5c\u4e3a\u63d2\u4ef6\u5c42\u96c6\u6210\u5230\u73b0\u6709\u7684Transformer\u6a21\u578b\u4e2d\uff0c\u4ece\u800c\u63d0\u9ad8\u5176\u9c81\u68d2\u6027\uff0c\u800c\u65e0\u9700\u989d\u5916\u7684\u8bad\u7ec3\u6216\u5fae\u8c03\u3002\u901a\u8fc7\u5168\u9762\u7684\u5b9e\u9a8c\u548c\u6d88\u878d\u7814\u7a76\uff0c\u6211\u4eec\u8bc1\u660e\u4e86ProTransformer\u663e\u8457\u63d0\u5347\u4e86\u5404\u79cd\u9884\u6d4b\u4efb\u52a1\u3001\u653b\u51fb\u673a\u5236\u3001\u9aa8\u5e72\u67b6\u6784\u548c\u6570\u636e\u57df\u4e2d\u7684Transformer\u6a21\u578b\u7684\u9c81\u68d2\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728\u7ecf\u5178\u7684TextFoole
r\u653b\u51fb\u4e0b\uff0c\u65e0\u9700\u8fdb\u4e00\u6b65\u5fae\u8c03\uff0cProTransformer\u5206\u522b\u5c06BERT\u3001ALBERT\u3001DistilBERT\u548cRoBERTA\u8fd9\u56db\u79cd\u6a21\u578b\u7684\u6027\u80fd\u63d0\u9ad8\u4e8619.5%\u300128.3%\u300116.1%\u548c11.4%\u3002\u6b64\u5916\uff0cProTransformer\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9762\u5bf9\u57fa\u4e8e\u63d0\u793a\u7684\u653b\u51fb\u65f6\u8868\u73b0\u51fa\u826f\u597d\u7684\u97e7\u6027\uff0c\u5206\u522b\u5c06T5\u548cLLaMA\u7684\u6027\u80fd\u63d0\u9ad8\u4e8624.8%\u548c17.8%\uff0c\u5e76\u4e14\u5e73\u5747\u5c06Vicuna\u5728Jailbreaking\u653b\u51fb\u4e0b\u7684\u6027\u80fd\u63d0\u9ad8\u4e8610.4%\u3002\u9664\u4e86\u8bed\u8a00\u9886\u57df\u5916\uff0cProTransformer\u8fd8\u5728\u89c6\u89c9\u548c\u56fe\u9886\u57df\u5c55\u793a\u4e86\u51fa\u8272\u7684\u9c81\u68d2\u6027\u3002|\n", "2410.23180": "|**2024-10-30**|**ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning**|Millennium Bismay 
et.al.|[2410.23180](http://arxiv.org/abs/2410.23180)|**[link](https://github.com/millenniumbismay/reasoningrec)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aReasoningRec\u7684\u63a8\u7406\u63a8\u8350\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u5f25\u5408\u63a8\u8350\u4e0e\u4eba\u7c7b\u53ef\u89e3\u91ca\u6027\u89e3\u91ca\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u4e0e\u4f9d\u8d56\u4e8e\u9690\u5f0f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\u4e0d\u540c\uff0cReasoningRec\u4f7f\u7528LLMs\u6765\u5efa\u6a21\u7528\u6237\u548c\u9879\u76ee\uff0c\u91cd\u70b9\u5728\u4e8e\u7528\u6237\u7684\u504f\u597d\u3001\u538c\u6076\u548c\u89e3\u91ca\u6027\u63a8\u7406\u3002\u8be5\u6846\u67b6\u5229\u7528\u4e00\u4e2a\u8f83\u5927\u7684LLM\u751f\u6210\u7528\u6237\u504f\u597d\u7684\u5408\u6210\u89e3\u91ca\uff0c\u968f\u540e\u7528\u4e8e\u5fae\u8c03\u8f83\u5c0f\u7684LLM\u4ee5\u63d0\u9ad8\u63a8\u8350\u51c6\u786e\u6027\u53ca\u63d0\u4f9b\u4eba\u7c7b\u53ef\u7406\u89e3\u7684\u89e3\u91ca\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7814\u7a76\u8c03\u67e5\u4e86\u63a8\u7406\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\u5bf9\u4e2a\u6027\u5316\u63a8\u8350\u7684\u5f71\u54cd\uff0c\u7ed3\u679c\u663e\u793a\u4e0a\u4e0b\u6587\u548c\u4e2a\u4eba\u5316\u6570\u636e\u7684\u8d28\u91cf\u663e\u8457\u5f71\u54cdLLM\u751f\u6210\u5408\u7406\u89e3\u91ca\u7684\u80fd\u529b\u3002\u5b9e\u8bc1\u8bc4\u4f30\u8868\u660e\uff0cReasoningRec\u5728\u63a8\u8350\u9884\u6d4b\u65b9\u9762\u6bd4\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u9ad8\u51fa12.5%\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u6613\u4e8e\u7406\u89e3\u7684\u89e3\u91ca\u3002\u4ee3\u7801\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/millenniumbismay/reasoningrec\u3002**|\n", "2410.23166": "|**2024-10-30**|**SciPIP: An LLM-based Scientific Paper Idea Proposer**|Wenxiao Wang 
et.al.|[2410.23166](http://arxiv.org/abs/2410.23166)|null|\u77e5\u8bc6\u7684\u6307\u6570\u589e\u957f\u548c\u8de8\u5b66\u79d1\u7814\u7a76\u7684\u590d\u6742\u6027\u7ed9\u7814\u7a76\u4eba\u5458\u5e26\u6765\u4e86\u663e\u8457\u6311\u6218\uff0c\u5305\u62ec\u4fe1\u606f\u8fc7\u8f7d\u548c\u63a2\u7d22\u65b0\u60f3\u6cd5\u7684\u56f0\u96be\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u5728\u589e\u5f3a\u60f3\u6cd5\u63d0\u6848\u65b9\u9762\u663e\u793a\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f46\u5982\u4f55\u6709\u6548\u5229\u7528\u5927\u6a21\u578b\u8fdb\u884c\u5408\u7406\u7684\u60f3\u6cd5\u63d0\u6848\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u8ba8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u79d1\u5b66\u8bba\u6587\u60f3\u6cd5\u63d0\u6848\u5668\uff08SciPIP\uff09\u3002\u57fa\u4e8e\u7528\u6237\u63d0\u4f9b\u7684\u7814\u7a76\u80cc\u666f\uff0cSciPIP\u4ece\u6587\u732e\u6570\u636e\u5e93\u4e2d\u68c0\u7d22\u6709\u7528\u8bba\u6587\uff0c\u540c\u65f6\u5229\u7528LLMs\u7684\u80fd\u529b\u751f\u6210\u66f4\u591a\u65b0\u9896\u4e14\u53ef\u884c\u7684\u60f3\u6cd5\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6587\u732e\u68c0\u7d22\u6570\u636e\u5e93\uff0c\u63d0\u53d6\u5927\u91cf\u8bba\u6587\u7684\u591a\u7ef4\u5ea6\u4fe1\u606f\u4ee5\u4fbf\u5feb\u901f\u8bbf\u95ee\u3002\u7136\u540e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u8bed\u4e49\u3001\u5b9e\u4f53\u548c\u5f15\u7528\u5171\u73b0\u7684\u6587\u732e\u68c0\u7d22\u65b9\u6cd5\uff0c\u4ece\u591a\u4e2a\u65b9\u9762\u6839\u636e\u7528\u6237\u63d0\u4f9b\u7684\u80cc\u666f\u641c\u7d22\u76f8\u5173\u6587\u732e\u3002\u5728\u6587\u732e\u68c0\u7d22\u4e4b\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cc\u8def\u5f84\u60f3\u6cd5\u63d0\u6848\u7b56\u7565\uff0c\u5176\u4e2d\u4e00\u6761\u8def\u5f84\u4ece\u68c0\u7d22\u5230\u7684\u6587\u732e\u4e2d\u63a8\u65ad\u89e3\u51b3\u65b9\u6848\uff0c\u53e6\u4e00\u6761\u8def\u5f84\u901a\u8fc7\u6a21\u578b\u5934\u8111\u98ce\u66b4\u751f\u6210\u539f\u521b\u60f3\u6cd5\u3002\u7136\u540e\u6211\u4eec\u5
c06\u4e24\u8005\u7ed3\u5408\u8d77\u6765\u4ee5\u5b9e\u73b0\u53ef\u884c\u6027\u4e0e\u539f\u521b\u6027\u7684\u826f\u597d\u5e73\u8861\u3002\u901a\u8fc7\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660eSciPIP\u53ef\u4ee5\u68c0\u7d22\u4e0e\u73b0\u6709\u9876\u7ea7\u4f1a\u8bae\u8bba\u6587\u7c7b\u4f3c\u7684\u5f15\u6587\uff0c\u5e76\u751f\u6210\u8bb8\u591a\u4e0e\u5176\u4e00\u81f4\u7684\u60f3\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u4e86SciPIP\u751f\u6210\u7684\u5176\u4ed6\u60f3\u6cd5\u7684\u539f\u521b\u6027\uff0c\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u4ee3\u7801\u548c\u6570\u636e\u5e93\u5df2\u53d1\u5e03\u5728https://github.com/cheerss/SciPIP\u3002|\n", "2410.23136": "|**2024-10-30**|**Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning**|Keqin Bao et.al.|[2410.23136](http://arxiv.org/abs/2410.23136)|**[link](https://github.com/ym689/rec_icl)**|**\u9891\u7e41\u66f4\u65b0\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u8350\u7cfb\u7edf\u4ee5\u9002\u5e94\u65b0\u7684\u7528\u6237\u5174\u8da3\uff0c\u5c31\u50cf\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\u6240\u505a\u7684\u90a3\u6837\uff0c\u7531\u4e8e\u9ad8\u6602\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u5373\u4f7f\u6709\u52a0\u901f\u65b9\u6cd5\u4e5f\u662f\u4e0d\u5207\u5b9e\u9645\u7684\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u4e0d\u8fdb\u884c\u4efb\u4f55\u6a21\u578b\u66f4\u65b0\u7684\u60c5\u51b5\u4e0b\uff0c\u901a\u8fc7\u5229\u7528\u60c5\u5883\u5b66\u4e60\uff08ICL\uff09\u6765\u9002\u5e94\u52a8\u6001\u7528\u6237\u5174\u8da3\u7684\u65b9\u6cd5\uff0c\u8fd9\u79cd\u65b9\u6cd5\u4f7fLLM\u80fd\u591f\u4ece\u8f93\u5165\u4e2d\u7684\u5c11\u91cf\u793a\u4f8b\u4e2d\u5b66\u4e60\u65b0\u4efb\u52a1\u3002\u4f7f\u7528\u65b0\u7684\u5174\u8da3\u793a\u4f8b\u4f5c\u4e3aICL\u7684\u5c11\u91cf\u793a\u4f8b\uf
f0cLLM\u53ef\u4ee5\u5b9e\u65f6\u5b66\u4e60\u5174\u8da3\uff0c\u4ece\u800c\u907f\u514d\u4e86\u6a21\u578b\u66f4\u65b0\u7684\u9700\u6c42\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u4e8eLLM\u7684\u63a8\u8350\u5668\u5728\u63a8\u8350\u8c03\u4f18\u8fc7\u7a0b\u4e2d\u7ecf\u5e38\u5931\u53bb\u5728\u60c5\u5883\u5b66\u4e60\u7684\u80fd\u529b\uff0c\u800c\u539f\u59cbLLM\u7684\u60c5\u5883\u5b66\u4e60\u7f3a\u4e4f\u9488\u5bf9\u63a8\u8350\u4efb\u52a1\u7684\u5173\u6ce8\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86RecICL\uff0c\u5b83\u5b9a\u5236\u4e86\u9488\u5bf9\u63a8\u8350\u4efb\u52a1\u7684\u60c5\u5883\u5b66\u4e60\uff0c\u7528\u4e8e\u5b9e\u65f6\u63a8\u8350\u3002RecICL\u4ee5\u60c5\u5883\u5b66\u4e60\u683c\u5f0f\u7ec4\u7ec7\u8bad\u7ec3\u793a\u4f8b\uff0c\u786e\u4fdd\u5728\u8c03\u4f18\u8fc7\u7a0b\u4e2d\u4fdd\u7559\u60c5\u5883\u5b66\u4e60\u80fd\u529b\u5e76\u4e0e\u5176\u63a8\u8350\u4efb\u52a1\u5bf9\u9f50\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cRecICL\u5728\u65e0\u9700\u6a21\u578b\u66f4\u65b0\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e86\u5b9e\u65f6\u63a8\u8350\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/ym689/rec_icl\u83b7\u53d6\u3002**|\n", "2410.24198": "|**2024-11-01**|**SelfCodeAlign: Self-Alignment for Code Generation**|Yuxiang Wei 
et.al.|[2410.24198](http://arxiv.org/abs/2410.24198)|**[link](https://github.com/bigcode-project/selfcodealign)**|**\u6307\u4ee4\u5fae\u8c03\u662f\u4e00\u79cd\u76d1\u7763\u5fae\u8c03\u65b9\u6cd5\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u6307\u4ee4\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86SelfCodeAlign\uff0c\u8fd9\u662f\u9996\u4e2a\u5b8c\u5168\u900f\u660e\u4e14\u8bb8\u53ef\u5bbd\u677e\u7684\u7ba1\u9053\uff0c\u7528\u4e8e\u81ea\u6211\u5bf9\u9f50\u4ee3\u7801LLMs\uff0c\u800c\u65e0\u9700\u5927\u91cf\u7684\u624b\u52a8\u6807\u6ce8\u6216\u84b8\u998f\u3002SelfCodeAlign\u5728\u6574\u4e2a\u6570\u636e\u751f\u6210\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u76f8\u540c\u7684\u57fa\u6a21\u578b\u8fdb\u884c\u63a8\u7406\u3002\u5b83\u9996\u5148\u4ece\u9ad8\u8d28\u91cf\u7684\u79cd\u5b50\u4ee3\u7801\u7247\u6bb5\u4e2d\u63d0\u53d6\u591a\u6837\u5316\u7684\u7f16\u7801\u6982\u5ff5\u4ee5\u751f\u6210\u65b0\u4efb\u52a1\u3002\u7136\u540e\uff0c\u5b83\u4e3a\u6bcf\u4e2a\u4efb\u52a1\u91c7\u6837\u591a\u4e2a\u54cd\u5e94\uff0c\u5e76\u5c06\u5176\u4e0e\u6d4b\u8bd5\u7528\u4f8b\u914d\u5bf9\uff0c\u5728\u6c99\u76d2\u73af\u5883\u4e2d\u8fdb\u884c\u9a8c\u8bc1\u3002\u6700\u540e\uff0c\u901a\u8fc7\u9009\u62e9\u901a\u8fc7\u6d4b\u8bd5\u7684\u793a\u4f8b\u8fdb\u884c\u6307\u4ee4\u5fae\u8c03\u3002\u5728\u6211\u4eec\u7684\u4e3b\u8981\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528SelfCodeAlign\u4e0eCodeQwen1.5-7B\u4e00\u8d77\u751f\u6210\u4e86\u4e00\u4e2a\u5305\u542b74k\u4e2a\u6307\u4ee4-\u54cd\u5e94\u5bf9\u7684\u6570\u636e\u96c6\u3002\u5728\u6b64\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u8be5\u6a21\u578b\u5728HumanEval+\u4e0a\u7684pass@1\u8fbe\u5230\u4e8667.1%\uff0c\u8d85\u8fc7\u4e86CodeLlama-70B-Instruct\uff0c\u5c3d\u7ba1\u5176\u89c4\u6a21\u5c0f\u4e86\u5341\u500d\u3002\u5728\u6240\u6709\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u8fd9\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u59cb\u7ec8\u4f18\u4e8e\u4e4b\u524d\u6700\u5148\u8fdb\u7684
\u65e0\u9700\u4eba\u5de5\u6807\u6ce8\u6216\u84b8\u998f\u7684\u6307\u4ee4\u5fae\u8c03\u65b9\u6cd5OctoPack\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86SelfCodeAlign\u5728\u5404\u79cd\u89c4\u6a21\u7684LLMs\uff08\u4ece3B\u523033B\uff09\u4e0a\u90fd\u662f\u6709\u6548\u7684\uff0c\u5e76\u4e14\u57fa\u6a21\u578b\u53ef\u4ee5\u4ece\u4e0e\u81ea\u8eab\u6570\u636e\u5206\u5e03\u7684\u5bf9\u9f50\u4e2d\u53d7\u76ca\u66f4\u591a\u3002\u6211\u4eec\u8fd8\u9a8c\u8bc1\u4e86\u7ba1\u9053\u4e2d\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u6709\u6548\u6027\uff0c\u663e\u793aSelfCodeAlign\u5728\u76f4\u63a5\u4eceGPT-4o\u84b8\u998f\u548c\u9886\u5148\u7684\u57fa\u4e8eGPT-3.5\u7684\u84b8\u998f\u65b9\u6cd5\uff08\u5982OSS-Instruct\u548cEvol-Instruct\uff09\u65b9\u9762\u5747\u8868\u73b0\u51fa\u8272\u3002SelfCodeAlign\u8fd8\u4fc3\u6210\u4e86StarCoder2-Instruct\u7684\u521b\u5efa\uff0c\u8fd9\u662f\u9996\u4e2a\u5b8c\u5168\u900f\u660e\u3001\u8bb8\u53ef\u5bbd\u677e\u4e14\u81ea\u6211\u5bf9\u9f50\u7684\u4ee3\u7801LLM\uff0c\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u7f16\u7801\u6027\u80fd\u3002**|\n", "2410.24175": "|**2024-10-31**|**Constraint Back-translation Improves Complex Instruction Following of Large Language Models**|Yunjia Qi 
et.al.|[2410.24175](http://arxiv.org/abs/2410.24175)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u9075\u5faa\u5177\u6709\u590d\u6742\u683c\u5f0f\u3001\u957f\u5ea6\u7b49\u7ea6\u675f\u7684\u6307\u4ee4\u65f6\u5b58\u5728\u56f0\u96be\u3002\u4f20\u7edf\u4e0a\uff0c\u5148\u524d\u7684\u5de5\u4f5c\u901a\u8fc7\u5411\u5148\u8fdb\u7684LLMs\u63d0\u4f9b\u590d\u6742\u7684\u6307\u4ee4-\u54cd\u5e94\u5bf9\u6765\u8fdb\u884c\u540e\u8bad\u7ec3\uff0c\u4ee5\u5904\u7406\u8fd9\u4e9b\u590d\u6742\u6307\u4ee4\u3002\u7136\u800c\uff0c\u5373\u4f7f\u662f\u5148\u8fdb\u7684LLMs\u4e5f\u96be\u4ee5\u5f88\u597d\u5730\u9075\u5faa\u590d\u6742\u7684\u6307\u4ee4\uff0c\u4ece\u800c\u9650\u5236\u4e86\u751f\u6210\u6570\u636e\u7684\u8d28\u91cf\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u53d1\u73b0\u73b0\u6709\u7684\u6570\u636e\u96c6\u5185\u5728\u5730\u5305\u542b\u4e86\u9690\u542b\u7684\u590d\u6742\u7ea6\u675f\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u751f\u6210\u6280\u672f\u2014\u2014\u7ea6\u675f\u56de\u8bd1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u91c7\u7528\u73b0\u6709\u6570\u636e\u96c6\u4e2d\u9ad8\u8d28\u91cf\u7684\u6307\u4ee4-\u54cd\u5e94\u5bf9\uff0c\u5e76\u4ec5\u4f7f\u7528\u5148\u8fdb\u7684LLMs\u5c06\u54cd\u5e94\u5df2\u6ee1\u8db3\u7684\u590d\u6742\u7ea6\u675f\u6dfb\u52a0\u5230\u6307\u4ee4\u4e2d\uff0c\u8fd9\u81ea\u7136\u964d\u4f4e\u4e86\u6210\u672c\u548c\u6570\u636e\u566a\u58f0\u3002\u5728\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4f7f\u7528Llama3-70B-Instruct\u8fdb\u884c\u7ea6\u675f\u56de\u8bd1\uff0c\u521b\u5efa\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u590d\u6742\u6307\u4ee4-\u54cd\u5e94\u6570\u636e\u96c6\uff0c\u547d\u540d\u4e3aCRAB\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5728CRAB\u4e0a\u8fdb\u884c\u540e\u8bad\u7ec3\u53ef\u4ee5\u63d0\u9ad8\u591a\u79cd\u57fa\u7840LLMs\u7684\u590d\u6742\u6307\u4ee4\u9075\u5faa\u80fd\u529b\uff0c\u5728\u5e7f\u6cdb\u7684\u6307\u4ee4\u9075\u5faa\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u6211\u4eec\u
8fdb\u4e00\u6b65\u53d1\u73b0\uff0c\u7ea6\u675f\u56de\u8bd1\u4e5f\u53ef\u4ee5\u4f5c\u4e3a\u540e\u8bad\u7ec3\u4e2d\u7684\u6709\u7528\u8f85\u52a9\u8bad\u7ec3\u76ee\u6807\u3002\u6211\u4eec\u7684\u4ee3\u7801\u3001\u6570\u636e\u548c\u6a21\u578b\u5c06\u88ab\u53d1\u5e03\uff0c\u4ee5\u4fc3\u8fdb\u672a\u6765\u7684\u7814\u7a76\u3002|\n", "2410.24155": "|**2024-10-31**|**Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning**|Jinghan Zhang et.al.|[2410.24155](http://arxiv.org/abs/2410.24155)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u901a\u5e38\u901a\u8fc7\u6784\u5efa\u601d\u7ef4\u94fe\u6765\u6307\u5bfc\u6a21\u578b\u8fdb\u884c\u591a\u6b65\u63a8\u7406\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u5148\u524d\u63a2\u7d22\u8fc7\u7684\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\uff0c\u4ece\u800c\u5ffd\u7565\u4e86LLMs\u8ba4\u77e5\u8303\u56f4\u5185\u7684\u5173\u952e\u76f2\u70b9\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86Thought Space Explorer 
(TSE)\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u65e8\u5728\u6269\u5c55\u548c\u4f18\u5316\u601d\u7ef4\u7ed3\u6784\uff0c\u4ee5\u5f15\u5bfcLLMs\u63a2\u7d22\u5176\u601d\u7ef4\u76f2\u70b9\u3002\u901a\u8fc7\u57fa\u4e8e\u539f\u59cb\u601d\u7ef4\u7ed3\u6784\u751f\u6210\u65b0\u7684\u63a8\u7406\u6b65\u9aa4\u548c\u5206\u652f\uff0c\u5e76\u91c7\u7528\u5404\u79cd\u8bbe\u8ba1\u7b56\u7565\uff0cTSE\u6269\u5c55\u4e86\u601d\u7ef4\u7a7a\u95f4\u5e76\u51cf\u8f7b\u4e86\u76f2\u70b9\u5bf9LLM\u63a8\u7406\u7684\u5f71\u54cd\u3002\u5728\u591a\u4e2a\u7ea7\u522b\u7684\u63a8\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86TSE\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5206\u6790\uff0c\u4ee5\u7406\u89e3\u7ed3\u6784\u5316\u548c\u6269\u5c55\u5316\u7684\u601d\u7ef4\u5982\u4f55\u6709\u52a9\u4e8e\u91ca\u653eLLM\u63a8\u7406\u80fd\u529b\u7684\u6f5c\u529b\u3002|\n", "2410.24152": "|**2024-10-31**|**Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning**|Jiaqi Liu 
et.al.|[2410.24152](http://arxiv.org/abs/2410.24152)|null|\u5408\u4f5c\u9a7e\u9a76\u6280\u672f\u5bf9\u4e8e\u63d0\u5347\u4ea4\u901a\u7cfb\u7edf\u7684\u6548\u7387\u548c\u5b89\u5168\u6027\u81f3\u5173\u91cd\u8981\u3002\u57fa\u4e8e\u5b66\u4e60\u7684\u65b9\u6cd5\uff0c\u5982\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\uff0c\u5728\u5408\u4f5c\u51b3\u7b56\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684MARL\u65b9\u6cd5\u4ecd\u7136\u9762\u4e34\u5b66\u4e60\u6548\u7387\u548c\u6027\u80fd\u65b9\u9762\u7684\u6311\u6218\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fc5\u901f\u53d1\u5c55\uff0c\u5e76\u5728\u5404\u79cd\u987a\u5e8f\u51b3\u7b56\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u4e3a\u4e86\u589e\u5f3a\u5408\u4f5c\u4ee3\u7406\u7684\u5b66\u4e60\u80fd\u529b\uff0c\u540c\u65f6\u786e\u4fdd\u51b3\u7b56\u6548\u7387\u548c\u6210\u672c\u6548\u76ca\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLDPD\u7684\u8bed\u8a00\u9a71\u52a8\u7b56\u7565\u84b8\u998f\u65b9\u6cd5\u6765\u5f15\u5bfcMARL\u63a2\u7d22\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u57fa\u4e8eLLM\u7684\u6559\u5e08\u4ee3\u7406\u8bad\u7ec3\u8f83\u5c0f\u7684\u5b66\u751f\u4ee3\u7406\u901a\u8fc7\u5176\u81ea\u8eab\u7684\u51b3\u7b56\u6f14\u793a\u5b9e\u73b0\u5408\u4f5c\u51b3\u7b56\u3002\u6559\u5e08\u4ee3\u7406\u589e\u5f3a\u4e86\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u89c2\u5bdf\u4fe1\u606f\uff0c\u5e76\u5229\u7528LLM\u8fdb\u884c\u590d\u6742\u7684\u5408\u4f5c\u51b3\u7b56\u63a8\u7406\uff0c\u540c\u65f6\u4e5f\u5229\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u51b3\u7b56\u5de5\u5177\u5b9e\u73b0\u4e13\u5bb6\u7ea7\u51b3\u7b56\uff0c\u63d0\u4f9b\u9ad8\u8d28\u91cf\u7684\u6559\u5b66\u7ecf\u9a8c\u3002\u5b66\u751f\u4ee3\u7406\u901a\u8fc7\u68af\u5ea6\u7b56\u7565\u66f4\u65b0\u5c06\u6559\u5e08\u7684\u5148\u9a8c\u77e5\u8bc6\u63d0\u70bc\u5230\u81ea\u5df1\u7684\u6a21\u578b\u4e2d\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5b66\u7
51f\u53ef\u4ee5\u5728\u6700\u5c11\u7684\u6559\u5e08\u6307\u5bfc\u4e0b\u5feb\u901f\u63d0\u9ad8\u5176\u80fd\u529b\uff0c\u5e76\u6700\u7ec8\u8d85\u8d8a\u6559\u5e08\u7684\u8868\u73b0\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6027\u80fd\u548c\u5b66\u4e60\u6548\u7387\u65b9\u9762\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\u3002|\n", "2410.24119": "|**2024-10-31**|**Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing**|Akash Dhruv et.al.|[2410.24119](http://arxiv.org/abs/2410.24119)|**[link](https://github.com/neucol/llm-conversion-performance)**|**\u57fa\u7840\u6a21\u578b\u548c\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff08GenAI\uff09\u7684\u51fa\u73b0\u6709\u671b\u6539\u53d8\u79d1\u5b66\u8ba1\u7b97\u4e2d\u7684\u751f\u4ea7\u529b\uff0c\u7279\u522b\u662f\u5728\u4ee3\u7801\u5f00\u53d1\u3001\u91cd\u6784\u4ee5\u53ca\u4ece\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u8f6c\u6362\u5230\u53e6\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u65b9\u9762\u3002\u7136\u800c\uff0c\u7531\u4e8eGenAI\u7684\u8f93\u51fa\u4e0d\u80fd\u4fdd\u8bc1\u6b63\u786e\u6027\uff0c\u56e0\u6b64\u4ecd\u7136\u9700\u8981\u4eba\u5de5\u5e72\u9884\u3002\u90e8\u5206\u8fd9\u79cd\u5e72\u9884\u53ef\u4ee5\u901a\u8fc7\u4efb\u52a1\u7279\u5b9a\u5de5\u5177\u4ee5\u53ca\u7528\u4e8e\u6b63\u786e\u6027\u9a8c\u8bc1\u548c\u6709\u6548\u63d0\u793a\u5f00\u53d1\u7684\u9644\u52a0\u65b9\u6cd5\u6765\u81ea\u52a8\u5316\u3002\u6211\u4eec\u7814\u7a76\u4e86GenAI\u5728\u8f85\u52a9\u4ee3\u7801\u8f6c\u6362\u3001\u8bed\u8a00\u4e92\u64cd\u4f5c\u6027\u548c\u5728\u7528\u4e8e\u6a21\u62df\u5927\u578b\u5f3a\u5b50\u5bf9\u649e\u673a\uff08LHC\uff09\u7c92\u5b50\u76f8\u4e92\u4f5c\u7528\u7684\u9057\u7559Fortran\u4ee3\u7801\u5e93\u4e2d\u8fdb\u884c\u4ee3\u7801\u5e93\u68c0\u67e5\u65b9\u9762\u7684\u5e94\u7528\u3002\u5728\u6b64\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u6b3e\u540d\u4e3aCodeScribe\u7684\u5de5\u5177\uff0c\u7ed3\u5408\u63d0\u793a\u5de5\u7a0b\u4e0e\u7528\u6237
\u76d1\u7763\uff0c\u5efa\u7acb\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684\u4ee3\u7801\u8f6c\u6362\u6d41\u7a0b\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86CodeScribe\u5982\u4f55\u5e2e\u52a9\u5c06Fortran\u4ee3\u7801\u8f6c\u6362\u4e3aC++\uff0c\u751f\u6210Fortran-C API\u4ee5\u96c6\u6210\u9057\u7559\u7cfb\u7edf\u4e0e\u73b0\u4ee3C++\u5e93\uff0c\u5e76\u63d0\u4f9b\u5f00\u53d1\u8005\u652f\u6301\u4ee5\u5b9e\u73b0\u4ee3\u7801\u7ec4\u7ec7\u548c\u7b97\u6cd5\u5b9e\u65bd\u3002\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86AI\u9a71\u52a8\u7684\u4ee3\u7801\u8f6c\u6362\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5f3a\u8c03\u5176\u5728\u63d0\u9ad8\u79d1\u5b66\u8ba1\u7b97\u5de5\u4f5c\u6d41\u7a0b\u751f\u4ea7\u529b\u65b9\u9762\u7684\u4f18\u52bf\u3002**|\n", "2410.24117": "|**2024-10-31**|**Repository-Level Compositional Code Translation and Validation**|Ali Reza Ibrahimzada et.al.|[2410.24117](http://arxiv.org/abs/2410.24117)|null|\u4ee3\u7801\u7ffb\u8bd1\u662f\u5c06\u7a0b\u5e8f\u4ece\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u8f6c\u6362\u4e3a\u53e6\u4e00\u79cd\u7f16\u7a0b\u8bed\u8a00\u7684\u8fc7\u7a0b\u3002\u4e00\u4e9b\u57fa\u4e8e\u89c4\u5219\u7684\u8f6c\u8bd1\u5668\u5df2\u7ecf\u88ab\u8bbe\u8ba1\u51fa\u6765\uff0c\u4ee5\u5b9e\u73b0\u4e0d\u540c\u7f16\u7a0b\u8bed\u8a00\u5bf9\u4e4b\u95f4\u7684\u81ea\u52a8\u5316\u4ee3\u7801\u7ffb\u8bd1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u89c4\u5219\u53ef\u80fd\u4f1a\u56e0\u7f16\u7a0b\u8bed\u8a00\u7684\u53d1\u5c55\u800c\u53d8\u5f97\u8fc7\u65f6\uff0c\u5e76\u4e14\u65e0\u6cd5\u63a8\u5e7f\u5230\u5176\u4ed6\u7f16\u7a0b\u8bed\u8a00\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63a2\u7d22\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5316\u4ee3\u7801\u7ffb\u8bd1\u3002\u4e00\u4e2a\u5173\u952e\u89c2\u5bdf\u662f\uff0c\u8fd9\u6837\u7684\u6280\u672f\u53ef\u80fd\u5728\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u826f\u597d\uff0c\u4f46\u5728\u771f\u5b9e\u4e16\u754c\u7684\u9879\u76ee\u4e2d\uff0c\u7531\u4e8e\u4f9d\u8
d56\u5173\u7cfb\u3001\u81ea\u5b9a\u4e49\u7c7b\u578b\u3001\u7279\u5b9a\u4e8e\u7f16\u7a0b\u8bed\u8a00\u7684\u529f\u80fd\u7b49\u56e0\u7d20\u7684\u5b58\u5728\uff0c\u5b83\u4eec\u53ef\u80fd\u96be\u4ee5\u6cdb\u5316\u3002 \u6211\u4eec\u63d0\u51fa\u4e86AlphaTrans\uff0c\u8fd9\u662f\u4e00\u79cd\u795e\u7ecf\u7b26\u53f7\u65b9\u6cd5\uff0c\u7528\u4e8e\u81ea\u52a8\u5316\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u7ea7\u522b\u7684\u4ee3\u7801\u7ffb\u8bd1\u3002AlphaTrans\u4e0d\u4ec5\u7ffb\u8bd1\u6e90\u4ee3\u7801\uff0c\u8fd8\u7ffb\u8bd1\u6d4b\u8bd5\u4ee3\u7801\uff0c\u5e76\u91c7\u7528\u591a\u7ea7\u9a8c\u8bc1\u786e\u4fdd\u7ffb\u8bd1\u540e\u7684\u4ee3\u7801\u4fdd\u7559\u4e86\u6e90\u7a0b\u5e8f\u7684\u529f\u80fd\u3002\u4e3a\u4e86\u5206\u89e3\u95ee\u9898\u4ee5\u4fbf\u8ba9LLMs\u5904\u7406\uff0cAlphaTrans\u5229\u7528\u7a0b\u5e8f\u5206\u6790\u5c06\u7a0b\u5e8f\u5206\u89e3\u6210\u7247\u6bb5\uff0c\u5e76\u6309\u9006\u8c03\u7528\u987a\u5e8f\u8fdb\u884c\u7ffb\u8bd1\u3002\u6211\u4eec\u4f7f\u7528AlphaTrans\u7ffb\u8bd1\u4e86\u5341\u4e2a\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u5f00\u6e90\u9879\u76ee\uff0c\u8fd9\u4e9b\u9879\u76ee\u5305\u542b\u7684\u7c7b\u3001\u65b9\u6cd5\u548c\u6d4b\u8bd5\u5206\u522b\u6709<836, 8575, 
2719>\u4e2a\u3002AlphaTrans\u6210\u529f\u7ffb\u8bd1\u4e86\u8fd9\u4e9b\u9879\u76ee\u7684\u6240\u6709\u4ee3\u7801\u5e93\uff0c\u5171\u5305\u62ec6899\u4e2a\u4ee3\u7801\u7247\u6bb5\u300299.1%\u7684\u7ffb\u8bd1\u4ee3\u7801\u7247\u6bb5\u5728\u8bed\u6cd5\u4e0a\u662f\u6b63\u786e\u7684\uff0cAlphaTrans\u9a8c\u8bc1\u4e86\u5176\u4e2d25.8%\u7684\u8fd0\u884c\u65f6\u884c\u4e3a\u548c\u529f\u80fd\u6b63\u786e\u6027\u3002\u5e73\u5747\u800c\u8a00\uff0c\u96c6\u6210\u7ffb\u8bd1\u548c\u9a8c\u8bc1\u8fc7\u7a0b\u9700\u898136\u5c0f\u65f6\u6765\u7ffb\u8bd1\u4e00\u4e2a\u9879\u76ee\uff0c\u663e\u793a\u51fa\u5176\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u53ef\u6269\u5c55\u6027\u3002\u5bf9\u4e8e\u90a3\u4e9b\u5728\u8bed\u6cd5\u6216\u8bed\u4e49\u4e0a\u4e0d\u6b63\u786e\u7684\u7ffb\u8bd1\uff0cAlphaTrans\u751f\u6210\u4e00\u4efd\u62a5\u544a\uff0c\u5176\u4e2d\u5305\u62ec\u73b0\u6709\u7684\u7ffb\u8bd1\u3001\u5806\u6808\u8ddf\u8e2a\u3001\u6d4b\u8bd5\u9519\u8bef\u6216\u65ad\u8a00\u5931\u8d25\u3002\u6211\u4eec\u5411\u4e24\u4f4d\u5f00\u53d1\u8005\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u8f85\u52a9\u6750\u6599\uff0c\u5e2e\u52a9\u4ed6\u4eec\u5728\u56db\u4e2a\u9879\u76ee\u4e2d\u4fee\u590d\u7ffb\u8bd1\u9519\u8bef\u3002\u4ed6\u4eec\u5e73\u5747\u82b1\u8d3920.1\u5c0f\u65f6\u89e3\u51b3\u4e86\u8fd9\u4e9b\u95ee\u9898\uff0c\u5e76\u4f7f\u6240\u6709\u6d4b\u8bd5\u901a\u8fc7\u3002|\n", "2410.24105": "|**2024-10-31**|**Matchmaker: Self-Improving Large Language Model Programs for Schema Matching**|Nabeel Seedat 
et.al.|[2410.24105](http://arxiv.org/abs/2410.24105)|null|\u5b9e\u4f53\u5339\u914d\u2014\u2014\u5373\u5728\u5177\u6709\u4e0d\u540c\u8868\u548c\u5c42\u6b21\u7ed3\u6784\u7684\u5f02\u6784\u6570\u636e\u6e90\u4e4b\u95f4\u627e\u5230\u5c5e\u6027\u4e4b\u95f4\u7684\u5339\u914d\u2014\u2014\u5bf9\u4e8e\u521b\u5efa\u53ef\u7528\u4e8e\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u7684\u6570\u636e\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e00\u57fa\u7840\u6027\u7684\u6570\u636e\u95ee\u9898\u5728\u533b\u7597\u3001\u91d1\u878d\u548c\u7535\u5b50\u5546\u52a1\u7b49\u9886\u57df\u5c24\u4e3a\u91cd\u8981\uff0c\u540c\u65f6\u4e5f\u80fd\u591f\u66f4\u5e7f\u6cdb\u5730\u901a\u8fc7\u589e\u52a0\u7528\u4e8e\u8bad\u7ec3ML\u6a21\u578b\u7684\u6570\u636e\u91cf\u6765\u4f7fML\u6a21\u578b\u53d7\u76ca\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4e0d\u540c\u6a21\u5f0f\u4e4b\u95f4\u7684\u7ed3\u6784/\u5c42\u6b21\u548c\u8bed\u4e49\u5f02\u8d28\u6027\uff0c\u5b9e\u4f53\u5339\u914d\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684ML\u4efb\u52a1\u3002\u5148\u524d\u7684\u81ea\u52a8\u5316\u5b9e\u4f53\u5339\u914d\u7684ML\u65b9\u6cd5\u8981\u4e48\u9700\u8981\u5927\u91cf\u7684\u6807\u6ce8\u6570\u636e\u8fdb\u884c\u6a21\u578b\u8bad\u7ec3\uff0c\u8fd9\u901a\u5e38\u662f\u4e0d\u73b0\u5b9e\u7684\uff0c\u8981\u4e48\u96f6\u6837\u672c\u6027\u80fd\u8f83\u5dee\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Matchmaker\u2014\u2014\u4e00\u79cd\u7528\u4e8e\u5b9e\u4f53\u5339\u914d\u7684\u7ec4\u5408\u5f0f\u8bed\u8a00\u6a21\u578b\u7a0b\u5e8f\uff0c\u8be5\u7a0b\u5e8f\u7531\u5019\u9009\u751f\u6210\u3001\u4f18\u5316\u548c\u7f6e\u4fe1\u5ea6\u8bc4\u5206\u7ec4\u6210\u3002Matchmaker\u8fd8\u901a\u8fc7\u4e00\u79cd\u65b0\u9896\u7684\u4f18\u5316\u65b9\u6cd5\u5b9e\u73b0\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u81ea\u6211\u6539\u8fdb\uff0c\u8be5\u65b9\u6cd5\u6784\u5efa\u5408\u6210\u4e0a\u4e0b\u6587\u6f14\u793a\u4ee5\u5f15\u5bfc\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u5728\u771f\u5b9e\u4e16\u754c\u7
684\u533b\u5b66\u5b9e\u4f53\u5339\u914d\u57fa\u51c6\u4e0a\uff0cMatchmaker\u4f18\u4e8e\u4e4b\u524d\u7684\u57fa\u4e8eML\u7684\u65b9\u6cd5\uff0c\u7a81\u663e\u4e86\u5176\u52a0\u901f\u6570\u636e\u96c6\u6210\u548cML\u5c31\u7eea\u6570\u636e\u4e92\u64cd\u4f5c\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.24049": "|**2024-10-31**|**Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs**|Muhammed Saeed et.al.|[2410.24049](http://arxiv.org/abs/2410.24049)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7f\u6cdb\u5e94\u7528\u7684\u540c\u65f6\u5f15\u53d1\u4e86\u4f26\u7406\u95ee\u9898\uff0c\u56e0\u4e3a\u5b83\u4eec\u5185\u7f6e\u4e86\u793e\u4f1a\u504f\u89c1\u3002\u672c\u7814\u7a76\u5728\u5305\u62ec\u5973\u6027\u6743\u5229\u3001\u6050\u6016\u4e3b\u4e49\u548c\u53cd\u72b9\u592a\u4e3b\u4e49\u5728\u5185\u7684\u516b\u4e2a\u9886\u57df\u4e2d\u8003\u5bdf\u4e86LLMs\u5bf9\u963f\u62c9\u4f2f\u4eba\u4e0e\u897f\u65b9\u4eba\u7684\u504f\u89c1\uff0c\u5e76\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6a21\u578b\u62b5\u6297\u5ef6\u7eed\u8fd9\u4e9b\u504f\u89c1\u7684\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e24\u4e2a\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30LLM\u5bf9\u963f\u62c9\u4f2f\u4eba\u4e0e\u897f\u65b9\u4eba\u7684\u504f\u89c1\uff0c\u53e6\u4e00\u4e2a\u7528\u4e8e\u6d4b\u8bd5\u6a21\u578b\u5bf9\u653e\u5927\u8d1f\u9762\u7279\u5f81\u7684\u63d0\u793a\u7684\u5b89\u5168\u6027\uff08\u201c\u8d8a\u72f1\u201d\uff09\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u516d\u79cdLLM\u2014\u2014GPT-4\u3001GPT-4o\u3001LlaMA 3.1\uff088B & 405B\uff09\u3001Mistral 7B\u548cClaude 3.5 Sonnet\u3002\u6211\u4eec\u53d1\u73b079%\u7684\u6848\u4f8b\u663e\u793a\u51fa\u5bf9\u963f\u62c9\u4f2f\u4eba\u7684\u8d1f\u9762\u504f\u89c1\uff0c\u5176\u4e2dLlaMA 
3.1-405B\u662f\u6700\u5177\u504f\u89c1\u7684\u6a21\u578b\u3002\u6211\u4eec\u7684\u201c\u8d8a\u72f1\u201d\u6d4b\u8bd5\u663e\u793a\uff0c\u5c3d\u7ba1GPT-4o\u662f\u7ecf\u8fc7\u4f18\u5316\u7684\u7248\u672c\uff0c\u4f46\u5b83\u5374\u662f\u6700\u6613\u53d7\u653b\u51fb\u7684\uff0c\u5176\u6b21\u662fLlaMA 3.1-8B\u548cMistral 7B\u3002\u9664\u4e86Claude\u5916\uff0c\u6240\u6709LLM\u5728\u4e09\u4e2a\u7c7b\u522b\u4e2d\u7684\u653b\u51fb\u6210\u529f\u7387\u5747\u8d85\u8fc787%\u3002\u6211\u4eec\u8fd8\u53d1\u73b0Claude 3.5 Sonnet\u7684\u5b89\u5168\u6027\u6700\u9ad8\uff0c\u4f46\u4ecd\u7136\u5728\u516b\u4e2a\u7c7b\u522b\u4e2d\u7684\u4e03\u4e2a\u663e\u793a\u51fa\u504f\u89c1\u3002\u5c3d\u7ba1GPT-4o\u662fGPT-4\u7684\u4e00\u4e2a\u4f18\u5316\u7248\u672c\uff0c\u4f46\u6211\u4eec\u53d1\u73b0\u5b83\u66f4\u5bb9\u6613\u53d7\u5230\u504f\u89c1\u548c\u201c\u8d8a\u72f1\u201d\u7684\u5f71\u54cd\uff0c\u8fd9\u8868\u660e\u4f18\u5316\u5b58\u5728\u7f3a\u9677\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u9700\u8981\u66f4\u5f3a\u5927\u7684\u504f\u89c1\u7f13\u89e3\u7b56\u7565\u548c\u5f3a\u5316\u5b89\u5168\u63aa\u65bd\u7684\u7d27\u8feb\u6027\u3002|\n", "2410.24032": "|**2024-10-31**|**Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks**|Yingzhe Peng 
et.al.|[2410.24032](http://arxiv.org/abs/2410.24032)|null|\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u5df2\u7ecf\u5f7b\u5e95\u6539\u53d8\u4e86\u7528\u6237\u4e0e\u77e5\u8bc6\u7cfb\u7edf\u4e4b\u95f4\u7684\u4ea4\u4e92\u65b9\u5f0f\uff0c\u4f7f\u5f97\u804a\u5929\u673a\u5668\u4eba\u80fd\u591f\u6574\u5408\u5927\u91cf\u7684\u4fe1\u606f\u5e76\u534f\u52a9\u5904\u7406\u590d\u6742\u7684\u63a2\u7d22\u6027\u4efb\u52a1\u3002\u7136\u800c\uff0c\u57fa\u4e8eLLM\u7684\u804a\u5929\u673a\u5668\u4eba\u5f80\u5f80\u96be\u4ee5\u63d0\u4f9b\u4e2a\u6027\u5316\u652f\u6301\uff0c\u5c24\u5176\u662f\u5728\u7528\u6237\u4ee5\u6a21\u7cca\u67e5\u8be2\u5f00\u59cb\u6216\u7f3a\u4e4f\u8db3\u591f\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u65f6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u4e2a\u6027\u5316\u63a2\u7d22\u534f\u4f5c\u52a9\u7406\u201d\uff08CARE\uff09\u7684\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u901a\u8fc7\u7ed3\u5408\u591a\u4ee3\u7406LLM\u6846\u67b6\u548c\u7ed3\u6784\u5316\u7684\u7528\u6237\u754c\u9762\u6765\u589e\u5f3a\u4e2a\u6027\u5316\u5728\u63a2\u7d22\u6027\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u3002CARE\u7684\u754c\u9762\u5305\u62ec\u804a\u5929\u9762\u677f\u3001\u89e3\u51b3\u65b9\u6848\u9762\u677f\u548c\u9700\u6c42\u9762\u677f\uff0c\u4f7f\u8fed\u4ee3\u5f0f\u67e5\u8be2\u7ec6\u5316\u548c\u52a8\u6001\u89e3\u51b3\u65b9\u6848\u751f\u6210\u6210\u4e3a\u53ef\u80fd\u3002\u591a\u4ee3\u7406\u6846\u67b6\u534f\u540c\u5de5\u4f5c\uff0c\u4ee5\u8bc6\u522b\u663e\u6027\u548c\u9690\u6027\u7528\u6237\u9700\u6c42\uff0c\u4ece\u800c\u63d0\u4f9b\u5b9a\u5236\u5316\u7684\u3001\u53ef\u64cd\u4f5c\u7684\u89e3\u51b3\u65b9\u6848\u3002\u5728\u4e00\u9879\u6d89\u53ca22\u540d\u53c2\u4e0e\u8005\u7684\u88ab\u8bd5\u5185\u7528\u6237\u7814\u7a76\u4e2d\uff0cCARE\u76f8\u5bf9\u4e8e\u57fa\u7ebfLLM\u804a\u5929\u673a\u5668\u4eba\u4e00\u76f4\u53d7\u5230\u6b22\u8fce\uff0c\u7528\u6237\u79f0\u8d5e\u5176\u80fd\u591f\u51cf\u8f7b\u8ba4\u77e5\u8d1f\u62c5\u3001\u6fc0\u53d1\u521b\u9020\u529b\uff0c\u5e76\u
63d0\u4f9b\u66f4\u52a0\u4e2a\u6027\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0cCARE\u6709\u53ef\u80fd\u5c06\u57fa\u4e8eLLM\u7684\u7cfb\u7edf\u4ece\u88ab\u52a8\u7684\u4fe1\u606f\u68c0\u7d22\u8005\u8f6c\u53d8\u4e3a\u4e2a\u6027\u5316\u95ee\u9898\u89e3\u51b3\u548c\u63a2\u7d22\u4e2d\u7684\u79ef\u6781\u5408\u4f5c\u4f19\u4f34\u3002|\n", "2410.24024": "|**2024-10-31**|**AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents**|Yifan Xu et.al.|[2410.24024](http://arxiv.org/abs/2410.24024)|null|\u81ea\u4e3b\u4ee3\u7406\u5728\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\u4e2d\u7684\u91cd\u8981\u6027\u65e5\u76ca\u589e\u52a0\u3002\u7279\u522b\u662f\uff0c\u5b89\u5353\u4ee3\u7406\u4f5c\u4e3a\u4e00\u79cd\u4ea4\u4e92\u65b9\u6cd5\u88ab\u9891\u7e41\u63d0\u53ca\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7528\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\u5b89\u5353\u4ee3\u7406\u7684\u7814\u7a76\u7f3a\u4e4f\u5bf9\u5f00\u6e90\u548c\u95ed\u6e90\u6a21\u578b\u7cfb\u7edf\u7684\u7cfb\u7edf\u6027\u7814\u7a76\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AndroidLab\u4f5c\u4e3a\u7cfb\u7edf\u5316\u7684\u5b89\u5353\u4ee3\u7406\u6846\u67b6\u3002\u5b83\u5305\u62ec\u4e00\u4e2a\u5177\u6709\u4e0d\u540c\u6a21\u6001\u7684\u64cd\u4f5c\u73af\u5883\u3001\u52a8\u4f5c\u7a7a\u95f4\u4ee5\u53ca\u53ef\u91cd\u590d\u4f7f\u7528\u7684\u57fa\u51c6\u6d4b\u8bd5\u3002\u5b83\u652f\u6301\u5728\u540c\u4e00\u52a8\u4f5c\u7a7a\u95f4\u4e0b\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u3002AndroidLab\u57fa\u51c6\u6d4b\u8bd5\u5305\u62ec\u9884\u5b9a\u4e49\u7684\u5b89\u5353\u865a\u62df\u8bbe\u5907\u548c\u4e5d\u4e2a\u5e94\u7528\u4e0a\u7684138\u4e2a\u4efb\u52a1\u3002\u901a\u8fc7\u4f7f\u7528AndroidLab\u73af\u5883\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b89\u5353\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u5e76\u8bad\u7ec3\u4e86\u516d\u4e2a\u5f00\u6e90\u7684LLMs\u548cLMMs\u
ff0c\u5c06LLMs\u7684\u6210\u529f\u7387\u4ece4.59%\u63d0\u5347\u523021.50%\uff0cLMMs\u7684\u6210\u529f\u7387\u4ece1.93%\u63d0\u5347\u523013.28%\u3002AndroidLab\u5df2\u5f00\u6e90\u5e76\u516c\u5f00\u63d0\u4f9b\uff0c\u7f51\u5740\u4e3a\u3002|\n"}} \ No newline at end of file