From 2fbb3b19274bef09752bef7719458f462007a4e3 Mon Sep 17 00:00:00 2001
From: limafang
Date: Mon, 8 Jul 2024 00:44:21 +0000
Subject: [PATCH] Github Action Automatic Update agent Arxiv Papers

---
 README.md                   | 20 ++++++++++----------
 docs/agent-arxiv-daily.json |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index dc4ba860509..1c1260de436 100755
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-## Updated on 2024.07.07
+## Updated on 2024.07.08
 > Usage instructions: [here](./docs/README.md#usage)
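@@ -13,16 +13,16 @@
 |Publish Date|Title|Authors|PDF|Code|abstract|
 |---|---|---|---|---|---|
-|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|null|Despite the success of multi-modal large language models (MLLMs), their generalization ability remains limited, and in some cases they underperform specialized models. Recently, researchers have developed LLM-based agents that address these problems by selecting an appropriate specialized model based on user input. In the medical domain, however, such advances have not been widely applied. To fill this gap, this paper proposes the first agent designed specifically for medicine, named **M**ulti-modal **Med**ical **Agent** (MMedAgent). We build an instruction-tuning dataset covering six medical tools that solve seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms open-source methods on a variety of medical tasks, and even the closed-source model GPT-4o, and that it is efficient at incorporating and integrating new medical tools.|
+|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|null|Despite the success of multi-modal large language models (MLLMs), their generalization ability remains limited, and in some cases they perform worse than specialized models. To address this, recent work has developed LLM-based agents that select an appropriate specialized model based on user input. However, such advances have not yet been fully explored in the medical domain. To fill this gap, this paper introduces the first agent designed specifically for the medical domain, called **M**ulti-modal **Med**ical **Agent** (MMedAgent). We build an instruction-tuning dataset comprising six medical tools that solve seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms state-of-the-art open-source methods on a variety of medical tasks and even compares favorably with the closed-source model GPT-4o. MMedAgent is also efficient at updating and integrating new medical tools.|
 |**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents**|Fanzeng Xia et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|This paper examines the decision-making performance of large language models, specifically in the context of the Dueling Bandits (DB) problem. The study compares GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo against established DB algorithms. The results show that the LLMs, GPT-4 Turbo in particular, quickly identify the clearly dominant option and thereby outperform state-of-the-art algorithms in terms of weak regret. However, the models struggle to converge, are highly sensitive to prompts, and are brittle to prompt variations. To address this, we propose IF-Enhanced LLM, an augmented algorithm that combines the decision-making ability of LLMs with the theoretical guarantees of classic DB algorithms. This design shows how to make LLMs more trustworthy in decision-making tasks that require stable performance. IF-Enhanced LLM comes with theoretical guarantees on both weak and strong regret. Experiments confirm that IF-Enhanced LLM remains robust even with noisy and adversarial prompts.|
 |**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**Recent advances in large language models (LLMs) have significantly advanced the automation of software development tasks such as code synthesis, program repair, and test generation. Researchers and industry practitioners have developed various autonomous LLM agents that perform end-to-end software development tasks; these agents can use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises a question: do we really need complex autonomous software agents? To explore this question, we build Agentless, an agentless approach to automatically solving software development problems. In contrast to complex agent setups, Agentless uses a simple two-phase process of localization followed by repair, without letting the LLM decide future actions or operate complex tools. On the popular SWE-bench Lite benchmark, our results surprisingly show that this simple approach achieves both the highest performance (27.33%) and the lowest cost ($0.34) among all open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems with exact ground-truth patches or with insufficient or misleading issue descriptions. We therefore built SWE-bench Lite-S, which excludes such problems to enable more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of simple, interpretable techniques in autonomous software development. We hope Agentless will serve as a baseline, a starting point, and a reference for expectations for autonomous software agents, and will inspire future work in this crucial area.**|
-|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|Recent advances in large language models (LLMs) enable them to autonomously gather information about the world and reason over it to solve complex problems, which has sparked interest in using LLMs to forecast international events. However, there is currently no benchmark for rigorously evaluating the forecasting ability and reliability of LLMs. To fill this gap, we propose MIRAI, a novel benchmark for systematically evaluating LLMs on temporal forecasting of international events. MIRAI provides an agentic environment with tools for accessing an extensive database of historical structured events and textual news articles. We carefully cleaned and parsed the GDELT event database and designed a series of relational prediction tasks spanning forecasting horizons from short term to long term, to test LLMs' ability to integrate critical global information, write code using domain-specific APIs and libraries, and jointly process historical knowledge from diverse formats and time periods in order to accurately forecast future events. Through comprehensive benchmarking, we aim to establish a reliable framework for evaluating LLM performance on international event forecasting, thereby advancing the development of more accurate and trustworthy models for international relations analysis.|
+|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|Recent advances in large language models (LLMs) enable them to autonomously gather information about the world and reason over it to solve complex problems, which has sparked interest in using LLMs to forecast international events. However, there is currently no benchmark for rigorously evaluating the forecasting ability and reliability of LLMs. To fill this gap, we propose MIRAI, a novel benchmark for systematically evaluating LLMs on temporal forecasting of international events. MIRAI provides an agentic environment with tools for accessing an extensive database of historical structured events and textual news articles. We carefully cleaned and parsed the GDELT event database and designed a series of relational prediction tasks covering forecasting horizons from short term to long term, to assess LLMs' ability to integrate key global information, use domain-specific APIs and libraries to write code, and jointly process historical knowledge from different formats and time periods in order to accurately forecast future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing LLM performance on international event forecasting, thereby advancing the development of more accurate and trustworthy models for international relations analysis.|
 |**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|With the remarkable progress of large language models (LLMs), LLM-based mobile agents have become a hot research topic in human-computer interaction. However, benchmarks for such agents are relatively scarce. Evaluating them typically faces three challenges: (1) the inefficiency of relying solely on user interface (UI) operations limits task evaluation; (2) specific instructions within a single app are insufficient to fully assess the multi-dimensional reasoning and decision-making abilities of LLM mobile agents; (3) current evaluation metrics cannot accurately measure sequential action processes. To this end, we propose Mobile-Bench, a new benchmark for evaluating the capabilities of LLM-based mobile agents. First, we extend conventional UI operations with 103 collected APIs to make task completion more efficient. Next, we build the evaluation data by combining real user queries with LLM-based augmentation. To better assess different levels of planning ability in mobile agents, the data is divided into three groups, SAST (single-app, single-task), SAMT (single-app, multi-task), and MAMT (multi-app, multi-task), reflecting differences in task complexity. Mobile-Bench contains 832 data entries, of which more than 200 tasks are designed specifically to test cross-app collaboration scenarios. In addition, we introduce a more precise evaluation metric, CheckPoint, which checks whether LLM mobile agents reach key points during their planning and reasoning steps.|
-|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**As traditional optimization and scheduling methods shift toward user-driven, personalized services to improve quality of experience (QoE) and flexibility, future systems, especially wireless and digitalized energy networks, face the challenge of better understanding and responding to user needs. Traditional systems often neglect users' individual requirements because communication between users and machines is poor. The emergence of large language models (LLMs) offers a breakthrough here by providing a natural communication interface between users and devices. This paper is the first to propose a novel architecture that uses three LLM agents to translate a user's voice request (VRQ) into a resource allocation vector: an LLM intent recognition agent that turns the request into an optimization problem (OP), an LLM OP parameter identification agent, and an LLM OP solving agent. We built a database of typical VRQs for electric vehicle (EV) charging as the basis for performance evaluation. As a proof of concept, we mainly use the Llama 3 8B model. Tests across different prompt engineering scenarios demonstrate the effectiveness of the proposed architecture. The study also yields key insights; for example, a larger set of candidate OPs for modeling real-world problems can degrade final performance because of higher recognition/OP classification noise. All results and code are open source for further research and application by academia and industry.**|
+|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**As traditional optimization and scheduling methods shift toward user-driven, personalized services to improve quality of experience (QoE) and flexibility, future systems, especially wireless and digitalized energy networks, face the challenge of better understanding and responding to user needs. Traditional systems often neglect users' individual requirements because communication between users and machines is poor. The emergence of large language models (LLMs) offers a breakthrough here by providing a natural communication interface between users and devices. This paper is the first to propose a novel architecture that uses three LLM agents to translate a user's voice request (VRQ) into a resource allocation vector: an LLM intent recognition agent that turns the request into an optimization problem (OP), an LLM OP parameter identification agent, and an LLM OP solving agent. We built a database of typical VRQs for electric vehicle (EV) charging as the basis for performance evaluation. As a proof of concept, we mainly use the Llama 3 8B model. Tests across different prompt engineering scenarios demonstrate the effectiveness of the proposed architecture. The study also yields key insights; for example, a larger set of candidate OPs for modeling real-world problems can degrade final performance because of higher recognition/OP classification noise. All results and code are open source for further research and use by the academic community.**|
 |**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|Artificial intelligence has made significant progress in finance and is reshaping how data is processed and interpreted. Among these technologies, large language models (LLMs) show great potential to automate complex tasks, improve customer service, and provide detailed financial analysis. First, we introduce IDEA-FinBench, an evaluation benchmark designed specifically to assess the financial knowledge of LLMs. It draws on questions from two globally recognized and authoritative professional finance exams, with the aim of comprehensively testing LLMs' ability to answer finance-related exam questions. Second, we propose IDEA-FinKER, a financial knowledge enhancement framework for quickly adapting general-purpose LLMs to the financial domain. It uses retrieval-based few-shot learning for real-time, context-level knowledge injection and provides a set of high-quality financial knowledge instructions for fine-tuning any general-purpose model. Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs. The system is built on an architecture of real-time knowledge injection and factual enhancement that leverages external knowledge. IDEA-FinQA consists mainly of a data collector, a data querying module, and LLM-based agents that perform specific functions.|
 |**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|Most economic theories assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not fully rational, and it is challenging to predict it precisely with mathematical models. This paper proposes a novel **A**gent-based **S**imulated **F**inancial **M**arket (ASFM). We first construct a simulated stock market with a realistic order matching system. We then design an LLM-based stock trading agent comprising profile, observation, and tool-learning-based action modules. The trading agent can comprehensively understand current market dynamics and financial policy information and make decisions that follow its trading strategy. Experiments show that ASFM's reactions in controlled scenarios are consistent with those of real stock markets. In addition, we ran experiments on two popular topics in economics research and found that the conclusions reached by ASFM agree with preliminary findings in economics research. We therefore believe ASFM provides a new paradigm for economic research.|
 |**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R. Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|This study presents a novel approach to simulating legislative processes with virtual agents driven by language models, focusing on the U.S. Senate Intelligence Committee. We built agents representing individual senators and had them interact in simulated committee discussions. The agents engaged in realistic debate, offered thoughtful perspectives, and, under certain conditions, found bipartisan solutions. Notably, the simulation showed promise in modeling shifts toward bipartisanship in response to external perturbations. The results suggest that this LLM-based approach could become a valuable tool for understanding and improving legislative processes, in line with a broader body of findings that LLM-based agents can usefully model real-world phenomena. Future work will focus on increasing agent complexity, expanding the scope of the simulation, and exploring applications in policy testing and negotiation.|
-|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|Building realistic, human-like large language model (LLM) agents is crucial for credible social simulation. Role-playing based on demographic information sometimes improves human-likeness, but not always. This study investigates whether LLMs' alignment with human behavior can be further improved by integrating information from empirically derived human belief networks. Using data from a human survey, we estimated a belief network covering 18 topics that load on two non-overlapping latent factors. We then seeded an LLM with an opinion on one topic and assessed how well its expressed opinions on the remaining test topics matched the corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions; however, seeding the agent with a single belief greatly improved alignment on related topics within the belief network, with little effect on topics outside the network. These results offer a new path toward human-LLM belief alignment in work that seeks to simulate and understand patterns of belief distribution in society.|
+|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|Building realistic, human-like large language model (LLM) agents is crucial for credible social simulation. Role-playing based on demographic information sometimes improves human-likeness, but not always. This study investigates whether LLMs' alignment with human behavior can be further improved by integrating information from empirically derived human belief networks. Using data from a human survey, we estimated a belief network covering 18 topics that load on two non-overlapping latent factors. We then seeded the LLM with an opinion on one topic and analyzed how well its expressed opinions on the remaining test topics matched the corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding a single belief significantly improved alignment on related topics within the belief network, with no noticeable effect on topics outside the network. These results point to a novel approach for aligning beliefs between humans and LLMs in AI work that seeks to understand and simulate patterns of belief distribution in society.|
 |**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**Recent advances in machine learning have significantly improved the identification of disease-associated genes from gene expression data. However, these processes often require extensive expertise and manual effort, which limits their scalability. Agents driven by large language models (LLMs) show promise for automating such tasks thanks to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark for the automatic exploration of gene expression data, covering dataset selection, preprocessing, and statistical analysis tasks. GenoTEX provides complete analysis pipelines with annotations carefully written by human bioinformaticians, who analyze the datasets in depth to ensure accuracy and reliability. To provide baselines for these tasks, we design GenoAgents, a team of LLM-based agents with context-aware planning, iterative correction, and domain-expert consultation that collaborate to explore gene datasets. Our experiments show the potential of LLM-driven methods for genomic data analysis, while error analysis points to challenges and directions for future improvement. We propose GenoTEX as a promising resource for benchmarking and improving AI-driven methods for genomic data analysis. Our benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.**|
 |**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**Large language model multi-agent systems (LLM-MAS) have made significant progress in solving complex tasks. They complete tasks through communication and collaboration among the agents in the system, under the premise of shared information. However, when agent communication is used to enhance human cooperation, a new challenge arises from information asymmetry: each agent can access only the information of its own human user. Traditional MAS struggle to complete tasks in this setting. To address this, we propose a new multi-agent system paradigm called iAgents, short for Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, and agents proactively exchange the human information needed for a task, thereby overcoming information asymmetry. iAgents uses a novel agent reasoning mechanism, InfoNav, to guide effective information exchange between agents. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. In addition, we introduce InformativeBench, the first benchmark for evaluating the task-solving ability of LLM agents under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.**|
 |**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|LLM-based agents have emerged as promising tools designed to accomplish complex tasks through iterative planning and action. However, these agents are prone to undesired planning hallucinations on tasks that require specialized knowledge. To address this, preliminary attempts have incorporated external workflow-related knowledge to improve planning reliability. Although promising, the injected knowledge is usually disorganized and diverse in format, and lacks rigorous formalization and comprehensive comparison. We therefore formalize workflow knowledge in different formats and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 scenarios from 6 domains, with knowledge presented in diverse formats. To evaluate different LLMs on FlowBench, we design a multi-tiered evaluation framework. We assess the efficacy of workflow knowledge across multiple formats, and the results show that current LLM agents still have considerable room for improvement before they achieve satisfactory planning. We hope this challenging benchmark paves the way for future research on agent planning.|
@@ -142,7 +142,7 @@
 |**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|The rise of algorithmic pricing raises concerns about algorithmic collusion. We conduct experiments with pricing agents based on large language models (LLMs), specifically GPT-4. We find that: (1) LLM-based pricing agents are adept at pricing tasks; (2) in oligopoly settings, LLM-based pricing agents autonomously collude to the detriment of consumers; (3) seemingly innocuous variations in the LLM instructions ("prompts") can increase this collusion. These findings extend to auction settings. Our results underscore the need for antitrust regulation of algorithmic pricing and reveal regulatory challenges unique to LLM-based pricing agents.|
 |**2024-03-31**|**"My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|In this study, we propose a novel human-like memory architecture designed to enhance the cognitive abilities of LLM-based dialogue agents. Our design lets these agents autonomously recall the memories needed to generate responses, addressing the limitations of LLMs in temporal cognition. We adopt human memory cue recall as a trigger for accurate and efficient memory recall. In addition, we develop a mathematical model that dynamically quantifies memory consolidation, taking into account factors such as contextual relevance, elapsed time, and recall frequency. The agent stores memories derived from the user's interaction history in a database, where each memory encapsulates its content and its temporal context. In this way, much as humans recognize and recall past experiences, the system stores memories strategically and understands their significance to the user over time.|
-(back to top)
+(back to top)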
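 ## llm
@@ -150,13 +150,13 @@
 |---|---|---|---|---|---|
 |**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|Length generalization, the ability to extrapolate from short training sequences to long test sequences, is a challenge for current large language models. Although prior work has proposed architecture or data format changes to achieve length generalization, these methods are usually limited to specific tasks. Building on scratchpad and Chain-of-Thought (CoT) techniques, we propose Turing Programs, a novel CoT strategy that decomposes an algorithmic task into steps mimicking the computation of a Turing machine. The framework is both universal and simple, requiring only the copying of text from the context with small modifications. We show that with Turing Programs we achieve robust length generalization on a range of algorithmic tasks: addition, multiplication, and in-context SGD. We then show that transformers achieve length generalization on random Turing Programs, suggesting that length generalization is possible for any algorithmic task. Finally, we prove theoretically that transformers can implement Turing Programs by constructing a simple RASP (Weiss et al.) program that simulates an arbitrary Turing machine.|
 |**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|Semi-structured data formats such as JSON are widely used because of their flexibility for storing data. However, JSON data often lacks a schema analogous to the table schemas of relational databases. As a result, many tools have emerged for discovering schemas from datasets. Although useful, existing approaches focus mainly on the syntax of documents and ignore semantic information. In this work, we explore how to automatically add meaningful semantic information to discovered schemas, similar to the information human authors include in schemas they write. We use large language models and a corpus of human-written JSON Schema documents to generate natural-language descriptions of schema elements and meaningful names for reusable definitions, and to identify which discovered properties are most useful and which can be considered "noise". Our approach performs well on text generation metrics that have previously been shown to correlate highly with human judgment.|
-|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|null|Hallucination seriously limits the reliability and trustworthiness of large language models (LLMs). Humans have a self-awareness process that lets them recognize what they do not know when faced with a query. Motivated by this, our paper investigates whether LLMs can estimate their own hallucination risk before generating a response. We analyze the internal mechanisms of LLMs broadly, both in terms of training data sources and across 15 diverse natural language generation (NLG) tasks spanning over 700 datasets. Our empirical analysis reveals two key findings: (1) LLM internal states indicate whether the model has seen the query in its training data; (2) LLM internal states show whether the model is likely to hallucinate on the query. Our study examines the particular neurons, activation layers, and tokens that play a key role in an LLM's perception of uncertainty and hallucination risk. Using a probing estimator, we leverage LLM self-assessment to achieve an average hallucination estimation accuracy of 84.32% at run time.|
+|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|null|Hallucination seriously limits the reliability and trustworthiness of large language models (LLMs). Humans have a self-awareness process that lets them recognize what they do not know when faced with a query. Motivated by this, our paper investigates whether LLMs can estimate their own hallucination risk before generating a response. We broadly analyze the internal mechanisms of LLMs from the perspectives of training data sources and 15 diverse natural language generation (NLG) tasks, which together span over 700 datasets. Our empirical analysis reveals two key findings: (1) LLM internal states can indicate whether the model has seen the query in its training data; (2) LLM internal states show whether the model is likely to hallucinate on the query. Our study examines the particular neurons, activation layers, and tokens that play a key role in an LLM's perception of uncertainty and hallucination risk. Using a probing estimator, we leverage the LLM's self-assessment ability to achieve an average hallucination estimation accuracy of 84.32% at run time.|
 |**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning**|Zhili Shen et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|We study Text-to-SQL semantic parsing from the perspective of large language models. Motivated by the challenges posed by the size of commercial database schemas and the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. We also investigate how a parallel semantic parser can be used to generate approximate versions of the expected SQL queries to support our retrieval. We take this approach to the extreme by adapting a model with fewer than 500M parameters to act as an efficient approximator, enhancing it with the ability to process schemas in parallel. We apply our approach to monolingual and cross-lingual semantic parsing benchmarks and outperform state-of-the-art baselines. Comprehensive experiments highlight the contribution of each module in this retrieval-augmented generation setup and reveal interesting directions for future work.|
 |**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|Quantization is widely used to speed up inference and improve the deployment efficiency of large language models (LLMs). Although a large body of work has examined the effect of quantization on models' performance on English tasks, none has examined multilingual settings. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. Using automatic benchmarks, LLM-as-a-Judge methods, and human evaluation, we find that: (1) the harmful effects of quantization are apparent in human evaluation, and automatic metrics severely underestimate the damage: an average 1.7% drop in Japanese on automatic tasks corresponds to a 16.0% drop in human evaluation; (2) languages are affected unevenly by quantization, with non-Latin-script languages hit hardest; (3) challenging tasks such as mathematical reasoning degrade most. As the ability to serve low-compute models is critical for the wide global adoption of NLP technologies, our results call for multilingual performance to be a key consideration when evaluating efficient models.|
 |**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|**Generating proofs with large language models (LLMs) based on natural-language (NL) proofs has significant impact on proving mathematical theorems in computer-verifiable formal languages such as Lean. However, because NL and formal-language (FL) theorem-proving data is scarce, modern LLMs perform poorly at generating complete proofs. To address this, this paper proposes TheoremLlama, an end-to-end framework for training a general-purpose LLM to become a Lean4 expert. The framework includes NL-FL aligned dataset generation methods, training approaches for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. The key innovation is an NL-FL bootstrapping method that integrates NL proofs into Lean4 code, so that the LLM's natural-language reasoning ability can be leveraged for formal reasoning. Using this dataset generation method, we provide Open Bootstrapped Theorems (OBT), an NL-FL aligned and bootstrapped dataset. The TheoremLlama framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and Test datasets respectively, surpassing the GPT-4 baseline scores of 22.95% and 25.41%. We have open-sourced the model checkpoints and the generated dataset, and will soon make all the code publicly available.**|
-|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|**This work proposes a novel method called Divergent CoT (DCoT), which further improves performance by requiring the model to compare multiple reasoning chains within a single inference step. We find that instruction tuning on DCoT improves the performance of even smaller, more accessible LLMs. Through a rigorous set of experiments spanning a wide range of reasoning task types, the study shows that fine-tuning on DCoT consistently outperforms the basic CoT approach across model scales (from 1.3B to 7B parameters) and model families. The results indicate that these gains stem from the model generating multiple divergent reasoning paths in a single inference pass, which suggests that language models are capable of self-correction. The code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.**|
+|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|**This work proposes a novel method called Divergent CoT (DCoT), which further improves performance by requiring the model to compare multiple reasoning chains within a single inference step. We find that instruction tuning on DCoT improves the performance of even smaller, more accessible LLMs. Through extensive experiments covering different types of reasoning tasks, the study finds that fine-tuning on DCoT datasets consistently outperforms the basic CoT approach across model scales (from 1.3B to 7B parameters). Experiments and manual evaluation show that these gains stem from the model generating multiple divergent reasoning paths in a single inference pass, which indicates that language models can self-correct. The code and data are available at https://github.com/UKPLab/arxiv2024-divergent-cot.**|
 |**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|Large language models (LLMs), with their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, show great potential for improving speech-related tasks. This paper focuses on integrating decoder-only LLMs into the speech-to-text translation (S2TT) task. We propose an architecture in which the LLM directly consumes encoded speech representations and generates text translations. We also investigate the effects of different parameter-efficient fine-tuning techniques and task formulations. Without using proprietary data, our model achieves state-of-the-art performance on CoVoST 2 and FLEURS. We also conduct in-depth analyses to validate our design choices and offer insights into integrating LLMs with S2TT.|
-|**2024-07-03**|**SOS! Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are becoming increasingly popular with both the public and industry because they are customizable, fine-tunable, and free to use. However, some open-source LLMs require approval before use, which has led third parties to publish their own easily accessible versions, some of which are even fine-tuned or quantized to reduce computational requirements. This trend increases the risk of training-time attacks that compromise the integrity and security of LLMs. In this work, we present a new training-time attack, SOS (Save Our Skills), which is designed to have low computational demand and requires neither clean data nor modification of the model weights, thereby maintaining the model's utility. The attack targets security issues in various scenarios, including backdoor attacks, jailbreak attacks, and prompt stealing attacks. Experimental results show that the SOS attack is effective against all tested targets. Furthermore, we present the other side of the SOS technique, namely the copyright token: a novel technique that lets users mark their copyrighted content and prevent models from using it.|
+|**2024-07-03**|**SOS! Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are becoming increasingly popular with both the public and industry because they are customizable, fine-tunable, and free to use. However, some open-source LLMs require approval before use, which has led third parties to publish their own easily accessible versions, some even fine-tuned or quantized to reduce computational requirements. These convenient versions are attractive to users, but they also increase the risk of training-time attacks that compromise the integrity and security of LLMs. This paper presents SOS, a new training-time attack designed to have low computational demand and to require neither clean data nor modification of the model weights, thereby preserving the model's utility. SOS addresses security issues in various scenarios, including backdoor, jailbreak, and prompt stealing attacks. Experimental results show that the attack is effective across all evaluated targets. Furthermore, we present the other side of the SOS technique, namely the copyright token: a novel technique that lets users mark their copyrighted content and prevent models from using it.|
 |**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|In this work, we investigate a typical scenario in code generation: a developer edits existing code in real time and asks a code assistant, such as a large language model, to re-predict the next token or next line on the fly. Naively, the LLM must re-encode the entire key-value (KV) cache to provide an accurate prediction, which is computationally expensive, especially for long sequences. Simply encoding the edited subsequence and integrating it into the original KV cache suffers from temporal confusion and leads to significantly worse performance. To address this, we propose **Positional Integrity Encoding** (PIE). Building on rotary positional encoding, PIE first removes the rotary matrices that introduce temporal confusion and then reapplies the correct rotary matrices, ensuring correct positional relationships between tokens with only a single round of matrix multiplication. We ran extensive experiments on the RepoBench-C-8k dataset with DeepSeek-Coder models of 1.3B, 6.7B, and 33B parameters, covering three real-world coding tasks: code insertion, code deletion, and multi-place code editing. The results show that, compared with the standard full-recomputation approach, PIE reduces computational overhead by over 85% across all model sizes and tasks while closely approximating its performance.|
 |**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**The computational challenges of large language model (LLM) inference, especially as prompt length grows, remain a significant barrier to their widespread deployment. Because of the quadratic complexity of attention computation, an 8B-parameter LLM takes 30 minutes on a single A100 GPU to process a prompt of 1M tokens (i.e., the pre-filling stage). Existing methods for speeding up pre-filling often fail to remain both efficient and accurate when applied to long-context LLMs. To address this, we introduce MInference (Million-tokens Inference), a sparse computation method designed to accelerate the pre-filling stage of long-sequence processing. We identify three unique patterns in attention matrices, the A-shape, Vertical-Slash, and Block-Sparse patterns, which can be exploited for efficient sparse computation on GPUs. We determine the optimal pattern for each attention head offline and dynamically build sparse indices during inference. With optimized GPU kernels, we then perform sparse attention computation based on the assigned pattern, significantly reducing pre-filling latency for long-context LLMs. Our method can be applied directly to existing LLMs without any modification to the pre-training setup or additional fine-tuning. Experiments on a range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and on models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, show that MInference reduces pre-filling inference latency on an A100 by up to 10x while maintaining accuracy. Our code is available at https://aka.ms/MInference.**|
 |**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**This paper introduces Neurocache, an approach for extending the effective context size of large language models (LLMs) by using an external vector cache to store their past states. Like recent vector retrieval approaches, Neurocache uses an efficient k-nearest-neighbor (kNN) algorithm to retrieve relevant past states and incorporates them into the attention process. Neurocache improves on previous methods by (1) storing compressed states, which reduces cache size; (2) performing a single retrieval operation per token, which increases inference speed; and (3) extending the retrieval window to neighboring states, which improves both language modeling and downstream task accuracy. Our experiments show that Neurocache is effective both for models trained from scratch and for augmenting pre-trained models such as Llama2-7B and Mistral-7B. We also compare Neurocache with text retrieval methods and show improvements on single-document question answering and few-shot learning tasks. The source code is available at https://github.com/alisafaya/neurocache.**|
@@ -459,5 +459,5 @@
 |**2024-05-16**|**IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|Large language models (LLMs) have shown great potential for automatically generating executable code from natural-language descriptions, especially with interactive features that let users guide the model through iterative feedback. However, existing interaction paradigms often assume that users have the expertise to debug source code, which makes them unfriendly to non-professional programmers. This makes it challenging to render interactive code generation accessible to people with varying levels of programming expertise. To address this, we propose IntelliExplain, an innovative human-AI interaction paradigm that improves the experience of non-professionals by letting them interact with source code through natural-language explanations. Users guide the system to revise the code by giving natural-language corrective feedback on the errors they identify, until they are satisfied with the system's explanation of the code. Our user study shows that users working with IntelliExplain achieve success rates 11.6% and 25.3% higher than with vanilla GPT-3.5 on Text-to-SQL and Python code generation tasks, respectively, while taking 39.0% and 15.6% less time.|
 |**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|null|In this paper, we introduce CPsyExam, a novel psychology benchmark built from questions in Chinese-language examinations. CPsyExam is designed to emphasize psychological knowledge and case analysis separately, recognizing the value of applying psychological knowledge to real-world scenarios. From a pool of 22,000 questions, we selected 4,000 to construct the benchmark, ensuring balanced coverage of subjects and a diverse range of case analysis techniques. In addition, we evaluated a range of existing large language models (LLMs), both open-source and API-based. The experiments and analysis show that CPsyExam is an effective benchmark for assessing language models' understanding of psychology, and that it enables comparison of models at different granularities.|
-(back to top)
+(back to top)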
diff --git a/docs/agent-arxiv-daily.json b/docs/agent-arxiv-daily.json index b4abf6367b4..8e70ebea15e 100644 --- a/docs/agent-arxiv-daily.json +++ b/docs/agent-arxiv-daily.json @@ -1 +1 @@ -{"agent": {"2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u5d4c\u5165\u5f0f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7cfb\u7edf\u5728\u7a7a\u95f4\u8ba4\u77e5\u548c\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u96c6\u6210\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u8bba\u6587\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548
c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u63ed\u793a\u4e86\u660e\u663e\u7684\u8fdb\u5c55\uff0c\u4f46\u4e5f\u5f3a\u8c03\u4e86\u5f00\u53d1\u65b0\u65b9\u6cd5\u4ee5\u5145\u5206\u5229\u75283D-LLMs\u6f5c\u529b\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u6307\u660e\u9053\u8def\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u7efc\u8ff0\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.09935": "|**2024-05-24**|**DEBATE: Devil's Advocate-Based Assessment and Text Evaluation**|Alex Kim et.al.|[2405.09935](http://arxiv.org/abs/2405.09935)|null|\u968f\u7740\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u6a21\u578b\u7684\u666e\u53ca\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u53d8\u5f97\u65e5\u76ca\u5173\u952e\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65e0\u53c2\u8003\u8bc4\u4ef7\u5668\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u5904\u7406\u65b0\u4efb\u52a1\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u91c7\u7528\u5355\u4ee3\u7406\u65b9\u6cd5\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u8868\u73b0\u3002\u56e0\u4e3aLLM\u4ee3\u7406\u7684\u56de\u7b54\u5b58\u5728\u504f\u89c1\uff0c\u6bd4\u5982\u5bf9\u7279\u5b9a\u6587\u672c\u7ed3\u6784\u6216\u5185\u5bb9\u7684\u504f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u5de5\u4f5c\u4e2d\u63d0\u51faDEBATE\uff0c\u4e00\u4e2a\u5efa\u7acb\u5728\u591a\u4ee3\u7406\u8bc4\u5206\u7cfb\u7edf\u57fa\u7840\u4e0a\u7684NLG\u8bc4\u4ef7\u6846\u67b6\uff0c\u878d\u
5165\u4e86\u201c\u6076\u9b54\u8fa9\u624b\u201d\u7684\u6982\u5ff5\u3002\u5728\u8be5\u6846\u67b6\u4e2d\uff0c\u4e00\u4e2a\u4ee3\u7406\u88ab\u6307\u4ee4\u6279\u8bc4\u5176\u4ed6\u4ee3\u7406\u7684\u8bba\u70b9\uff0c\u4ece\u800c\u53ef\u80fd\u6d88\u89e3LLM\u4ee3\u7406\u7b54\u6848\u4e2d\u7684\u504f\u89c1\u3002DEBATE\u5728\u4e24\u4e2aNLG\u8bc4\u4ef7\u5143\u8bc4\u4f30\u57fa\u51c6\u2014\u2014SummEval\u548cTopicalChat\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u4ee3\u7406\u4e4b\u95f4\u7684\u8fa9\u8bba\u5e7f\u5ea6\u4ee5\u53ca\u4ee3\u7406\u7684\u4eba\u683c\u7279\u8d28\u4f1a\u5f71\u54cd\u8bc4\u4ef7\u5668\u7684\u6027\u80fd\u3002|\n", "2405.05175": "|**2024-05-08**|**Air Gap: Protecting Privacy-Conscious Conversational Agents**|Eugene Bagdasaryan et.al.|[2405.05175](http://arxiv.org/abs/2405.05175)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u5f0f\u4ee3\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5904\u7406\u654f\u611f\u7528\u6237\u6570\u636e\u65f6\u5f15\u53d1\u4e86\u4e25\u91cd\u7684\u9690\u79c1\u95ee\u9898\u3002\u8fd9\u4e9b\u4ee3\u7406\u867d\u80fd\u7406\u89e3\u5e76\u5904\u7406\u4e0a\u4e0b\u6587\uff0c\u4f46\u4e5f\u53ef\u80fd\u88ab\u6076\u610f\u4e00\u65b9\u5229\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5a01\u80c1\u6a21\u578b\uff0c\u5373\u7b2c\u4e09\u65b9\u5e94\u7528\u901a\u8fc7\u64cd\u63a7\u4ea4\u4e92\u4e0a\u4e0b\u6587\uff0c\u8bef\u5bfcLLM\u4ee3\u7406\u6cc4\u9732\u4e0e\u5176\u4efb\u52a1\u65e0\u5173\u7684\u79c1\u4eba\u4fe1\u606f\u3002\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u6846\u67b6\u7684\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AirGapAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u6ce8\u91cd\u9690\u79c1\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u9650\u5236\u4ee3\u7406\u4ec5\u8bbf\u95ee\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u6570\u636e\uff0c\u9632\u6b62\u610f\u5916\u7684\u6570\u636e\u6cc
4\u6f0f\u3002\u5b9e\u9a8c\u4f7f\u7528Gemini\u3001GPT\u548cMistral\u6a21\u578b\u4f5c\u4e3a\u4ee3\u7406\uff0c\u7ed3\u679c\u663e\u793aAirGapAgent\u5728\u62b5\u5fa1\u57fa\u4e8e\u5355\u4e2a\u67e5\u8be2\u7684\u4e0a\u4e0b\u6587\u52ab\u6301\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8eGemini Ultra\u4ee3\u7406\uff0c\u8fd9\u79cd\u653b\u51fb\u4ece94%\u7684\u4fdd\u62a4\u80fd\u529b\u964d\u4f4e\u523045%\uff0c\u800cAirGapAgent\u53ef\u4ee5\u4fdd\u630197%\u7684\u9632\u62a4\u6548\u679c\uff0c\u4f7f\u540c\u6837\u7684\u653b\u51fb\u5931\u6548\u3002|\n", "2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325](http://arxiv.org/abs/2405.04325)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u867d\u4e3a\u6784\u5efa\u81ea\u7136\u8bed\u8a00\u4ee3\u7406\u63d0\u4f9b\u4e86\u5f3a\u5927\u57fa\u7840\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u5b83\u4eec\u53ca\u5176\u57fa\u4e8e\u5b83\u4eec\u6784\u5efa\u7684\u81ea\u4e3b\u4ee3\u7406\u7684\u5b89\u5168\u6027\u62c5\u5fe7\u3002\u7279\u522b\u662f\u6b3a\u9a97\u80fd\u529b\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u6211\u4eec\u5173\u6ce8\u7684\u662fAI\u4ee3\u7406\u901a\u8fc7\u6df7\u6dc6\u548c\u6a21\u68f1\u4e24\u53ef\u6765\u8bef\u5bfc\u3001\u9690\u85cf\u771f\u76f8\u6216\u63a8\u5e7f\u90e8\u5206\u4e0d\u771f\u5b9e\u7684\u4fe1\u5ff5\u7684\u884c\u4e3a\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80AI\u5b89\u5168\u7814\u7a76\u4e2d\u7684\u6492\u8c0e\u3001\u81ea\u79c1\u51b3\u7b56\u6216\u63d0\u4f9b\u865a\u5047\u4fe1\u606f\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u4e00\u7c7b\u7279\u6b8a\u7684\u6b3a\u9a97\uff1a\u7c7b\u4f3c\u4e8e\u9b54\u672f\u5e08\u5229\u7528\u969c\u773c\u6cd5\u8ba9\u5154\u5b50\u4ece\u5e3d\u5b50\u91cc\u51fa\u73b0\uff0c\u8981\u4e48\u901a\u8fc7\u9690\u85cf\u7684\u6697\u95e8\uff0c\u8981\u4e48\u901a\u8fc7\u8f6c\u79fb\u6ce8\u610f\u529b\u76f4\u63a5\u5c55\u793a\u3002 
\u6211\u4eec\u7684\u65b0\u5b9e\u9a8c\u5e73\u53f0\u5728\u4e00\u4e2a\u6709\u76ee\u6807\u7684\u73af\u5883\u4e2d\u5c55\u793a\u4e86LLM\u4ee3\u7406\u5728\u5bf9\u6297\u6027\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u8fdb\u884c\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65f6\u7684\u6b3a\u9a97\u56fa\u6709\u80fd\u529b\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u7acb\u6cd5\u4efb\u52a1\u201c\u6e38\u8bf4\u201d\u8bae\u6848\u3002\u5728\u76ee\u6807\u9a71\u52a8\u7684\u73af\u5883\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\u6784\u5efa\u6b3a\u9a97\u80fd\u529b\uff0c\u7ed3\u5408\u8bed\u8a00\u54f2\u5b66\u548c\u8ba4\u77e5\u5fc3\u7406\u5b66\u7406\u8bba\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6e38\u8bf4\u4ee3\u7406\u5728\u5bf9\u6297\u4e92\u52a8\u7684\u540e\u7eed\u5f3a\u5316\u8bd5\u9a8c\u4e2d\u5176\u6b3a\u9a97\u80fd\u529b\u63d0\u9ad8\u4e86\u7ea640%\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6b3a\u9a97\u68c0\u6d4b\u673a\u5236\u80fd\u8fbe\u5230\u9ad8\u8fbe92%\u7684\u8bc6\u522b\u7387\u3002\u8fd9\u4e9b\u7ed3\u679c\u63ed\u793a\u4e86\u4eba\u673a\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u95ee\u9898\uff0c\u5373\u4ee3\u7406\u53ef\u80fd\u64cd\u7eb5\u4eba\u7c7b\u4ee5\u8fbe\u6210\u9884\u8bbe\u76ee\u6807\u3002|\n", "2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra 
et.al.|[2405.04324](http://arxiv.org/abs/2405.04324)|**[link](https://github.com/ibm-granite/granite-code-models)**|**Large language models (LLMs) trained on code are revolutionizing the software development process. Code LLMs are increasingly being integrated into software development environments to improve the productivity of human programmers, and they are showing promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, bug fixing, code explanation and documentation, and repository maintenance. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained on code written in 116 programming languages. The Granite Code model family consists of models ranging from 3 billion to 34 billion parameters, suitable for applications ranging from complex application modernization to on-device, memory-constrained use cases. Evaluation on a comprehensive set of tasks shows that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The model family is optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g., code generation, fixing, and explanation), making it a versatile all-around code model. We release all Granite Code models under the Apache 2.0 license for both research and commercial use.**|\n", "2405.04219": "|**2024-05-07**|**Iterative Experience Refinement of Software-Developing Agents**|Chen Qian et.al.|[2405.04219](http://arxiv.org/abs/2405.04219)|null|Autonomous agents powered by large language models have shown strong potential for autonomy in scenarios such as software development. However, the prevailing static experience paradigm relies on a fixed collection of historical experiences acquired heuristically, which limits the agents' adaptability and efficiency gains. To address this, we propose an iterative experience refinement framework that lets LLM agents dynamically adjust and refine their experiences while executing tasks. We define two core patterns: the successive pattern, which refines experiences based on the nearest experiences within a task batch, and the cumulative pattern, which accumulates experiences across all previous task batches. By adding an experience elimination strategy, the method prioritizes high-quality and frequently used experiences, managing the experience space effectively and improving efficiency. Experiments show that while the successive pattern may yield better performance, the cumulative pattern is more stable. Moreover, with the elimination strategy, better performance is achieved using only 11.54% of a high-quality subset of experiences.|\n", "2405.03813": "|**2024-05-06**|**Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control**|Yaqub Chaudhary et.al.|[2405.03813](http://arxiv.org/abs/2405.03813)|null|Large language models (LLMs) can reproduce a wide variety of rhetorical styles and generate text expressing a broad spectrum of sentiments; this capacity, now widely available at low cost, carries potential for societal harm. Rather than considering these models in isolation, this paper focuses on the large-scale computational infrastructure behind them and its deployment across domains. We first examine how LLMs may shape society by polluting and homogenizing the information environment, and note that these functions could be used as instruments of control. We then turn to several emerging research areas that strengthen LLMs as instruments of power: 1. persuasion through the real-time design of choice architectures in conversational interfaces (e.g., \u201cAI personas\u201d); 2. the use of LLMs to build computational models of human behavior (e.g., \u201csilicon subjects\u201d); 3. the use of LLMs to simulate the behavior of human populations (e.g., \u201csilicon societies\u201d); and 4. the combination of LLMs with reinforcement learning to create controllable and steerable strategic dialogue models. Taking these together, we discuss how these techniques can be used to build LLM-based systems that, through simulation and disguised \u201cprediction\u201d, become powerful instruments of individual, social, and political control that manipulate human behavior, intent, and action.|\n", "2405.06682": "|**2024-05-05**|**Self-Reflection in LLM Agents: Effects on Problem-Solving Performance**|Matthew Renze et.al.|[2405.06682](http://arxiv.org/abs/2405.06682)|**[link](https://github.com/matthewrenze/self-reflection)**|**In this study, we investigate the effects of self-reflection in large language models (LLMs) on problem-solving performance. We instructed nine popular LLMs to answer a series of multiple-choice questions to establish a performance baseline. For each incorrectly answered question, we instructed eight types of self-reflecting LLM agents to reflect on their mistakes and provide themselves with guidance for improving their problem-solving. Then, using this guidance, each self-reflecting agent attempted to answer the same questions again. Our results show that LLM agents significantly improve their problem-solving performance through self-reflection ($p < 0.001$). We also compared the various types of self-reflection to determine their individual contributions to performance. All code and data are available on GitHub at https://github.com/matthewrenze/self-reflection.**|\n", "2405.02858": "|**2024-05-05**|**Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation**|Jinyu Cai et.al.|[2405.02858](http://arxiv.org/abs/2405.02858)|**[link](https://github.com/BlueLinkX/GA-MAS)**|**Social media platforms such as Twitter, Reddit, and Sina Weibo play a crucial role in global communication but often face strict regulation in geopolitically sensitive regions. This leads users to adapt their communication ingeniously in these regulated environments, frequently resorting to coded language. This shift in linguistic patterns is not only a response to regulation but also a vivid illustration of language evolution, showing how language naturally evolves under social and technological pressures. Studying how language evolves in regulated social media environments is important for safeguarding freedom of speech, improving content moderation, and advancing linguistic research. This paper proposes a multi-agent simulation framework based on large language models (LLMs) to explore how user language evolves under strict regulation. The framework employs two kinds of LLM-driven agents: a supervisory agent that oversees the dialogue, and participant agents that develop language strategies as they interact, simulating how communication evolves as users try to evade social media rules. Evaluations across a range of scenarios, from abstract settings to realistic situations, show that LLMs can effectively simulate complex linguistic dynamics and interactions in regulated environments, and that as evolution proceeds they improve both at evading supervision and at conveying information accurately. In addition, the LLM agents adopt different strategies for different scenarios.**|\n", "2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang 
et.al.|[2405.01533](http://arxiv.org/abs/2405.01533)|**[link](https://github.com/nvlabs/omnidrive)**|**Advances in multimodal large language models (MLLMs) have led to growing interest in MLLM-based autonomous driving systems that can exploit their strong reasoning capabilities. However, carrying these strengths over to the planning part of driving is challenging, because planning requires a full understanding of the 3D environment rather than 2D reasoning alone. To address this, our work proposes a framework for tight alignment between the model and 3D driving tasks. We first design a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D space before feeding them into the language model. This query-based representation lets us jointly encode dynamic objects and static map elements (e.g., roads), providing a condensed 3D world model for aligning perception and action. We further create OmniDrive-nuScenes, a new visual question answering (VQA) dataset that tests a model's true situational awareness in complex 3D scenes through comprehensive VQA tasks, including scene description, traffic rule understanding, 3D grounding, counterfactual reasoning, decision making, and planning. Extensive experiments demonstrate the effectiveness of the proposed architecture and the importance of VQA tasks for reasoning and planning in complex 3D environments.**|\n", "2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972](http://arxiv.org/abs/2405.00972)|**[link](https://github.com/pnnl/cactus)**|**This paper presents CACTUS, an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. The authors evaluate CACTUS extensively with a range of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. The results show that CACTUS significantly outperforms the baseline LLMs, with Gemma-7b and Mistral-7b performing best regardless of the prompting strategy used. The paper also examines how domain-specific prompting and hardware configuration affect model performance, highlighting the importance of prompt engineering and suggesting that deploying smaller models on consumer-grade hardware need not significantly sacrifice accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can help researchers with tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. As a significant advance in cheminformatics, CACTUS gives chemists and molecular discovery researchers a versatile tool that could accelerate scientific research and the discovery of new effective and safe drugs, catalysts, and materials. Furthermore, its integration with automated experimentation platforms and its capacity for real-time, data-driven decision making open up new possibilities for autonomous discovery.**|\n", "2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr 
et.al.|[2404.18978](http://arxiv.org/abs/2404.18978)|null|With growing interest in learner models in educational settings, research has increasingly turned to combining reinforcement learning (RL) with large language models (LLMs) to improve generalization in open-ended text-based learning environments. This paper examines three types of agents: (1) RL-based agents that represent states and actions in natural language to find the best interaction strategy; (2) LLM-based agents that act through prompting, drawing on the model's broad knowledge and reasoning; and (3) hybrid LLM-assisted RL agents that combine the two to improve performance and generalizability. To support the development and evaluation of these agents, we introduce PharmaSimText, a new benchmark derived from the PharmaSim virtual pharmacy environment and focused on practicing diagnostic conversations. Our results show that RL-based agents excel at task completion but fall short in the quality of the diagnostic questions they ask, whereas LLM-based agents ask better questions but complete the task less reliably. Finally, hybrid LLM-assisted RL agents show potential to overcome these limitations, highlighting the promise of combining RL and LLMs to build high-performing agents for open-ended learning environments.|\n", "2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021](http://arxiv.org/abs/2404.18021)|null|The rise of genome engineering technology has made precise modification of genetic information possible, but building an efficient gene-editing system requires a deep understanding of CRISPR technology and its complex experimental context. Large language models (LLMs) have shown promise in many tasks but often lack the specific knowledge needed for biological design problems. This paper introduces CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to help select CRISPR systems, design guide RNAs, recommend cellular delivery methods, draft protocols, and design validation experiments to confirm editing outcomes. We show how CRISPR-GPT can help non-expert researchers carry out gene-editing experiments from scratch, and we validate its effectiveness in a real-world use case. We also discuss the ethical and regulatory issues raised by automated gene-editing design, highlighting the importance of responsible and transparent use of such tools. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques, and demonstrates the potential of LLM agents to facilitate complex biological discovery tasks.|\n", "2404.17833": "|**2024-04-27**|**Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs**|Zhenlan Ji et.al.|[2404.17833](http://arxiv.org/abs/2404.17833)|null|Agents based on large language models (LLMs) have proven useful in a variety of commercial applications, notably mental health support, chemical synthesis, and software development, yet they have been found prone to errors when handling complex tasks and long-term planning. To address this, this paper presents PDoctor, a novel automated approach for testing and understanding erroneous planning in LLM agents. PDoctor first defines a domain-specific language (DSL) for user queries and uses the Z3 constraint solver to synthesize varied inputs, each a natural-language paragraph describing the requirements for completing a series of tasks. PDoctor then derives constraints from these requirements to form a test oracle. We evaluate PDoctor with three mainstream agent frameworks and two powerful LLMs (GPT-3.5 and GPT-4); the results show that it effectively detects diverse errors in agent planning and provides developers and users with valuable insights into error characteristics. Finally, we discuss possible alternative designs and directions for extending PDoctor.|\n", "2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662](http://arxiv.org/abs/2404.17662)|**[link](https://github.com/alickzhu/player)**|**Recent advances in large language models (LLMs) have enhanced agents' communication and social interaction capabilities. However, building complex reasoning with these models in dynamic environments that involve both competition and collaboration remains challenging, largely because of the limitations of information-graph-based search methods. To address this, we propose PLAYER*, a novel framework based on an anytime sampling-based planner that uses sensors and pruning to build a purely question-driven search framework for complex reasoning tasks. We also introduce a quantifiable evaluation method based on multiple-choice questions and construct the WellPlay dataset of 1,482 question-answer pairs. Experiments show that PLAYER* outperforms existing methods in efficiency and performance in complex, dynamic environments, supported by quantifiable comparisons.**|\n", "2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525](http://arxiv.org/abs/2404.17525)|null|Conventional mechanical design relies on experts making experience-guided modifications and running finite element analysis (FEA) to meet specific requirements, a process that is time-consuming and heavily dependent on individual knowledge. Although many machine learning models have been developed to streamline this laborious, expert-driven iterative process, they typically require large amounts of training data and computational resources. Deep learning methods are also usually confined to the domains and tasks they were trained on, limiting their applicability across tasks. This creates a trade-off between the efficiency of automation and the demand for resources. This study proposes a novel approach that integrates pre-trained large language models (LLMs) with a finite element module. The finite element module evaluates each design and provides essential feedback, guiding the LLMs to continuously learn, plan, generate, and optimize designs without domain-specific training. We demonstrate the effectiveness of the framework in the iterative optimization of truss structures, showing that it can adjust designs based on structured feedback and criteria. The results show that the LLM-based agent successfully generates truss designs that conform to natural-language specifications, with success rates of up to 90% depending on the constraints applied. Using prompt-based optimization techniques, we show that when provided with solution-score pairs, the LLM agent can iteratively refine designs to meet specifications through its inherent reasoning capabilities. The ability of LLM agents to produce viable designs and optimize them through inherent reasoning suggests that they have the potential to develop and implement effective design strategies autonomously.|\n", "2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker 
et.al.|[2404.17460](http://arxiv.org/abs/2404.17460)|null|This paper discusses and evaluates a novel conversational tutoring system (CTS) that leverages recent advances in large language models (LLMs). First, the system enables AI-assisted content authoring by automatically generating easily editable tutoring scripts from lesson texts. Second, the system delivers tutoring through two LLM-based agents, Ruffle and Riley, which act as a student and a professor in a learning-by-teaching format and hold free-form conversations following the inner- and outer-loop structure of a typical intelligent tutoring system. In two online user studies (N = 200), we compare the system with simpler QA chatbots and with a reading activity as support for biology lessons. We analyze system usage patterns, pre/post-test scores, and user experience surveys. Users reported high engagement with Ruffle&Riley, felt they understood the material, and perceived the support it provided as helpful. Although Ruffle&Riley users took longer to complete the activity, we found no significant difference in short-term learning gains compared with the reading activity. Our system architecture and user studies offer useful insights for future designers of CTSs. We also open-source our system to support research on effective instructional design for LLM-based learning technologies.|\n", "2404.17153": "|**2024-04-26**|**A Unified Debugging Approach via LLM-Based Multi-Agent Synergy**|Cheryl Lee et.al.|[2404.17153](http://arxiv.org/abs/2404.17153)|null|Software debugging is a time-consuming process, and considerable effort has gone into automating it, including fault localization and patch generation. Recently, large language models (LLMs) have shown great potential for automated debugging. However, we identify three challenges facing both traditional and LLM-based debugging tools: 1) inaccurate upstream fault localization propagates to downstream repair; 2) limited ability to handle complex logic errors; and 3) neglect of program context. To address them, we propose FixAgent, the first automated, unified debugging framework built on synergy among LLM agents. FixAgent performs end-to-end fault localization, repair, and analysis. Our key insight is that LLMs can benefit from general software engineering principles recognized by human developers, such as \u201crubber duck debugging\u201d, which help them better understand program functionality and logic errors. Accordingly, we design three mechanisms inspired by rubber duck debugging: agent specialization and synergy, key variable tracking, and program context comprehension, which prompt LLMs to give explicit explanations and focus on crucial program logic information. On the widely used QuixBugs dataset, FixAgent correctly fixes 79 of 80 bugs, 9 of which had never been fixed before. On CodeFlaws, it plausibly patches 1.9 times as many defects as the best-performing repair tool, without requiring bug location information and with a sampling rate below 0.6%. On average, compared with baselines using different LLMs, FixAgent increases plausible and correct fixes by about 20%, demonstrating the effectiveness of the design. Moreover, FixAgent reaches a correctness rate of 97.26%, suggesting that it may overcome the overfitting problem of existing approaches. In summary, FixAgent is a promising automated debugging framework for improving the efficiency and accuracy of software debugging.|\n", "2404.16698": 
"|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698](http://arxiv.org/abs/2404.16698)|**[link](https://github.com/giorgiopiatti/govsim)**|In the rapidly evolving field of artificial intelligence, ensuring safe decision-making by large language models (LLMs) is a significant challenge. This paper introduces Governance of the Commons Simulation (GovSim), a simulation platform for studying strategic interactions and cooperative decision-making in LLMs. Using this environment, we investigate the dynamics of resource sharing among AI agents, highlighting the importance of ethical considerations, strategic planning, and negotiation skills. GovSim is versatile and supports any text-based agent, including LLM agents. Using the generative agent framework, we create a general-purpose agent that makes it easy to integrate different LLMs. We find that in GovSim only 2 of the 15 tested LLMs achieve a sustainable outcome, indicating a significant gap in the models' ability to manage shared resources. Furthermore, when the agents' ability to communicate is removed, they overuse the shared resource, highlighting the importance of communication for cooperation. Interestingly, most LLMs lack the ability to form universalized hypotheses, revealing a significant weakness in their reasoning skills. We open-source the full suite of our research results, including the simulation environment, agent prompts, and a comprehensive web interface, to support further research and discussion.|\n", "2404.17605": "|**2024-04-24**|**Autonomous LLM-driven research from data to human-verifiable research papers**|Tal Ifargan et.al.|[2404.17605](http://arxiv.org/abs/2404.17605)|**[link](https://github.com/technion-kishony-lab/data-to-paper)**|**As AI accelerates the pace of scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values such as transparency, traceability, and verifiability. Mimicking human scientific practice, we built data-to-paper, an automation platform that guides interacting AI agents through a complete stepwise research process while programmatically tracing information flow and allowing human oversight and interaction. In autopilot mode, provided only with annotated data, the platform raises hypotheses, designs research plans, writes and debugs analysis code, generates and interprets results, and even creates complete, information-traceable research papers. Although
\u7814\u7a76\u65b0\u9896\u6027\u6709\u9650\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u5c55\u793a\u4e86AI\u81ea\u4e3b\u4ece\u6570\u636e\u4e2d\u751f\u6210\u539f\u521b\u5b9a\u91cf\u6d1e\u5bdf\u7684\u80fd\u529b\u3002\u5bf9\u4e8e\u7b80\u5355\u7684\u7814\u7a76\u76ee\u6807\uff0c\u5168\u81ea\u52a8\u6d41\u7a0b\u80fd\u521b\u4f5c\u51fa\u5927\u7ea680-90%\u65e0\u9700\u91cd\u5927\u9519\u8bef\u7684\u7a3f\u4ef6\uff0c\u7136\u800c\u968f\u7740\u76ee\u6807\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u4eba\u7c7b\u7684\u5171\u540c\u53c2\u4e0e\u5bf9\u4e8e\u4fdd\u8bc1\u51c6\u786e\u6027\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u751f\u6210\u7684\u8bba\u6587\u672c\u8eab\u4e5f\u5177\u6709\u5185\u5728\u7684\u53ef\u9a8c\u8bc1\u6027\uff0c\u56e0\u4e3a\u4fe1\u606f\u8ffd\u8e2a\u4f7f\u5f97\u7ed3\u679c\u3001\u65b9\u6cd5\u548c\u6570\u636e\u7684\u94fe\u63a5\u53ef\u4ee5\u7a0b\u5e8f\u5316\u8fdb\u884c\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0cAI\u9a71\u52a8\u7684\u79d1\u7814\u53ef\u4ee5\u52a0\u901f\u79d1\u5b66\u53d1\u73b0\uff0c\u540c\u65f6\u589e\u5f3a\u800c\u975e\u5a01\u80c1\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002**|\n", "2404.16115": "|**2024-04-24**|**Online Personalizing White-box LLMs Generation with Neural Bandits**|Zekai Chen 
et.al.|[2404.16115](http://arxiv.org/abs/2404.16115)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u59cb\u751f\u6210\u4e2a\u6027\u5316\u7684\u6587\u672c\u5185\u5bb9\uff0c\u5982\u4f55\u5728\u4e0d\u4e3a\u6bcf\u4f4d\u7528\u6237\u521b\u5efa\u72ec\u7279\u6a21\u578b\u7684\u8d44\u6e90\u6d88\u8017\u4e0b\u5b9e\u73b0\u9ad8\u6548\u4e2a\u6027\u5316\u6210\u4e86\u65b0\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5728\u7ebf\u65b9\u6cd5\uff0c\u5229\u7528\u795e\u7ecfbandit\u7b97\u6cd5\u52a8\u6001\u4f18\u5316\u8f6f\u6307\u4ee4\u5d4c\u5165\uff0c\u6839\u636e\u7528\u6237\u53cd\u9988\u8c03\u6574\u5185\u5bb9\uff0c\u4ece\u800c\u63d0\u5347\u767d\u76d2LLMs\u5f00\u653e\u6027\u6587\u672c\u751f\u6210\u7684\u4e2a\u6027\u5316\u6c34\u5e73\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u7684\u4e25\u8c28\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u57fa\u7840\u7b56\u7565\u6709\u663e\u8457\u6027\u80fd\u63d0\u5347\u3002\u7279\u522b\u662f\u9488\u5bf9\u4e2a\u6027\u5316\u65b0\u95fb\u6807\u9898\u751f\u6210\uff0cNeuralTS\u5e26\u6765\u4e86\u9ad8\u8fbe62.9%\u7684\u6700\u4f73ROUGE\u5206\u6570\u63d0\u5347\u4ee5\u53ca2.76%\u7684LLM\u4ee3\u7406\u8bc4\u4f30\u5206\u6570\u589e\u957f\uff0c\u8fd9\u8868\u660e\u5176\u6548\u679c\u663e\u8457\u3002|\n", "2404.15974": "|**2024-04-24**|**A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples**|Lihang Pan et.al.|[2404.15974](http://arxiv.org/abs/2404.15974)|null|
\u5355\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fde\u63a5\u591a\u4e2aLLM\u4ee3\u7406\u6784\u5efa\u7684\u7f51\u7edc\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u6574\u4f53\u6027\u80fd\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4eba\u673a\u534f\u4f5c\u5de5\u5177\u2014\u2014EasyLAN\uff0c\u65e8\u5728\u5e2e\u52a9\u5f00\u53d1\u8005\u8f7b\u677e\u6784\u5efaLLM\u4ee3\u7406\u7f51\u7edc\uff08LAN\uff09\u3002EasyLAN\u9996\u5148\u6839\u636e\u4efb\u52a1\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u4ec5\u5305\u542b\u4e00\u4e2a\u4ee3\u7406\u7684\u521d\u59cb\u7f51\u7edc\u3002\u63a5\u7740\uff0c\u5b83\u5229\u7528\u5c11\u91cf\u8bad\u7ec3\u793a\u4f8b\u6765\u8c03\u6574\u7f51\u7edc\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u793a\u4f8b\uff0cEasyLAN\u5206\u6790\u8f93\u51fa\u4e0e\u771f\u5b9e\u7ed3\u679c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u5e76\u627e\u51fa\u9519\u8bef\u7684\u539f\u56e0\u3002EasyLAN\u4f1a\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u7b56\u7565\u6765\u4fee\u6b63\u8fd9\u4e9b\u95ee\u9898\u3002\u7528\u6237\u53ef\u4ee5\u4ecb\u5165EasyLAN\u7684\u5de5\u4f5c\u6d41\u7a0b\u6216\u76f4\u63a5\u4fee\u6539LAN\u3002\u6700\u7ec8\uff0cLAN\u4ece\u5355\u4e2a\u4ee3\u7406\u53d1\u5c55\u6210\u591a\u4ee3\u7406\u7684\u7f51\u7edc\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEasyLAN\u80fd\u591f\u5e2e\u52a9\u5f00\u53d1\u8005\u5feb\u901f\u6784\u5efa\u6027\u80fd\u826f\u597d\u7684LAN\u3002|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao 
et.al.|[2404.15269](http://arxiv.org/abs/2404.15269)|**[link](https://github.com/gao-g/prelude)**|**\u6211\u4eec\u7814\u7a76\u57fa\u4e8e\u7528\u6237\u5bf9\u8bed\u8a00\u6a21\u578b\u7f16\u8f91\u7684\u4e92\u52a8\u5b66\u4e60\u8bed\u8a00\u4ee3\u7406\u3002\u5728\u8bf8\u5982\u5199\u4f5c\u52a9\u624b\u7684\u5e38\u89c1\u573a\u666f\u4e2d\uff0c\u7528\u6237\u4e0e\u8bed\u8a00\u4ee3\u7406\u4ea4\u4e92\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u751f\u6210\u54cd\u5e94\uff0c\u5e76\u53ef\u80fd\u9009\u62e9\u6027\u5730\u7f16\u8f91\u4ee3\u7406\u7684\u54cd\u5e94\u4ee5\u53cd\u6620\u4ed6\u4eec\u7684\u6f5c\u5728\u504f\u597d\uff0c\u540c\u65f6\u63d0\u9ad8\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u7f16\u8f91\u53cd\u9988\u662f\u81ea\u7136\u4ea7\u751f\u7684\uff0c\u9002\u5408\u7528\u4e8e\u63d0\u5347\u4ee3\u7406\u4e0e\u7528\u6237\u504f\u597d\u7684\u5951\u5408\u5ea6\uff0c\u964d\u4f4e\u540e\u7eed\u7528\u6237\u7684\u7f16\u8f91\u6210\u672c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPRELUDE\u6846\u67b6\uff0c\u5b83\u6839\u636e\u5386\u53f2\u7f16\u8f91\u6570\u636e\u63a8\u65ad\u7528\u6237\u7684\u6f5c\u5728\u504f\u597d\uff0c\u5e76\u636e\u6b64\u8bbe\u8ba1\u4e00\u4e2a\u63d0\u793a\u7b56\u7565\uff0c\u5f15\u5bfc\u672a\u6765\u7684\u54cd\u5e94\u751f\u6210\uff0c\u907f\u514d\u4e86\u6602\u8d35\u4e14\u96be\u4ee5\u6269\u5c55\u7684\u5fae\u8c03\u8fc7\u7a0b\uff0c\u8fd8\u80fd\u4fdd\u6301\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002 
\u6b64\u5916\uff0c\u5b66\u4e60\u63cf\u8ff0\u6027\u7684\u504f\u597d\u6709\u52a9\u4e8e\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\uff0c\u7528\u6237\u53ef\u4ee5\u67e5\u770b\u548c\u8c03\u6574\u5b66\u4e60\u5230\u7684\u504f\u597d\u3002\u7136\u800c\uff0c\u7528\u6237\u504f\u597d\u53ef\u80fd\u590d\u6742\u591a\u53d8\uff0c\u53d7\u60c5\u5883\u5f71\u54cd\uff0c\u56e0\u6b64\u5b66\u4e60\u8d77\u6765\u5177\u6709\u6311\u6218\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCIPHER\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6839\u636e\u7528\u6237\u7f16\u8f91\u63a8\u65ad\u7ed9\u5b9a\u60c5\u5883\u4e0b\u7684\u7528\u6237\u504f\u597d\u3002\u5728\u540e\u7eed\u4ea4\u4e92\u4e2d\uff0cCIPHER\u4f1a\u4ece\u5386\u53f2\u4e2d\u7684k\u4e2a\u6700\u63a5\u8fd1\u7684\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u63a8\u65ad\u51fa\u7684\u504f\u597d\uff0c\u7efc\u5408\u751f\u6210\u54cd\u5e94\u3002\u6211\u4eec\u5728\u603b\u7ed3\u548c\u7535\u5b50\u90ae\u4ef6\u5199\u4f5c\u4e24\u4e2a\u4e92\u52a8\u73af\u5883\u4e2d\u4f7f\u7528GPT-4\u6a21\u62df\u7528\u6237\u8fdb\u884c\u8bc4\u4f30\uff0c\u4e0e\u76f4\u63a5\u4f7f\u7528\u7528\u6237\u7f16\u8f91\u4f46\u4e0d\u5b66\u4e60\u63cf\u8ff0\u6027\u504f\u597d\u7684\u7b97\u6cd5\uff0c\u4ee5\u53ca\u5b66\u4e60\u5168\u5c40\u65e0\u4e0a\u4e0b\u6587\u504f\u597d\u7684\u7b97\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002 \u5728\u4e24\u9879\u4efb\u52a1\u4e2d\uff0cCIPHER\u90fd\u5b9e\u73b0\u4e86\u6700\u4f4e\u7684\u7f16\u8f91\u8ddd\u79bb\u6210\u672c\uff0c\u5e76\u4e14\u5b66\u4e60\u5230\u7684\u504f\u597d\u4e0e\u771f\u5b9e\u504f\u597d\u663e\u793a\u51fa\u663e\u8457\u7684\u76f8\u4f3c\u6027\u3002**|\n", "2404.14387": "|**2024-04-22**|**A Survey on Self-Evolution of Large Language Models**|Zhengwei Tao et.al.|[2404.14387](http://arxiv.org/abs/2404.14387)|**[link](https://github.com/alibabaresearch/damo-convai)**|**
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u9886\u57df\u548c\u667a\u80fd\u4ee3\u7406\u5e94\u7528\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u4f9d\u8d56\u4eba\u7c7b\u6216\u5916\u90e8\u6a21\u578b\u76d1\u7763\u7684\u73b0\u6709LLMs\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u591a\u6837\u6027\u589e\u52a0\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6210\u672c\u9ad8\u6602\u548c\u6027\u80fd\u74f6\u9888\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u5e94\u8fd0\u800c\u751f\uff0c\u8fd9\u79cd\u7b56\u7565\u5141\u8bb8LLMs\u81ea\u4e3b\u83b7\u53d6\u3001\u7cbe\u70bc\u5e76\u4ece\u81ea\u8eab\u751f\u6210\u7684\u7ecf\u9a8c\u4e2d\u5b66\u4e60\uff0c\u501f\u9274\u4eba\u7c7b\u7ecf\u9a8c\u5b66\u4e60\u8fc7\u7a0b\uff0c\u6709\u671b\u63a8\u52a8LLMs\u5411\u8d85\u7ea7\u667a\u80fd\u53d1\u5c55\u3002\u672c\u6587\u5168\u9762\u7efc\u8ff0\u4e86LLMs\u4e2d\u7684\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u3002\u9996\u5148\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u6982\u5ff5\u6846\u67b6\uff0c\u5c06\u8fdb\u5316\u8fc7\u7a0b\u5212\u5206\u4e3a\u8fed\u4ee3\u5faa\u73af\u7684\u56db\u4e2a\u9636\u6bb5\uff1a\u7ecf\u9a8c\u83b7\u53d6\u3001\u7ecf\u9a8c\u7ec6\u5316\u3001\u66f4\u65b0\u548c\u8bc4\u4f30\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5206\u7c7b\u63a2\u8ba8LLMs\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u8fdb\u5316\u76ee\u6807\uff0c\u5e76\u5bf9\u76f8\u5173\u6587\u732e\u8fdb\u884c\u603b\u7ed3\uff0c\u63d0\u4f9b\u6bcf\u4e2a\u6a21\u5757\u7684\u5206\u7c7b\u548c\u89c1\u89e3\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5f53\u524d\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u4e3a\u52a0\u901f\u81ea\u6f14\u8fdbLLMs\u7684\u53d1\u5c55\u63d0\u4f9b\u5173\u952e\u6d1e\u89c1\u3002**|\n", "2404.13501": "|**2024-04-21**|**A Survey on the Memory Mechanism of Large Language Model based Agents**|Zeyu Zhang 
et.al.|[2404.13501](http://arxiv.org/abs/2404.13501)|**[link](https://github.com/nuster1128/llm_agent_memory_survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u79d1\u7814\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u57fa\u4e8eLLMs\u7684\u667a\u80fd\u4ee3\u7406\u56e0\u5176\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5bf9\u4e8e\u89e3\u51b3\u9700\u8981\u957f\u671f\u590d\u6742\u4ea4\u4e92\u7684\u73b0\u5b9e\u95ee\u9898\u81f3\u5173\u91cd\u8981\u3002\u652f\u6301agent-environment\u4ea4\u4e92\u7684\u5173\u952e\u8981\u7d20\u662f\u4ee3\u7406\u7684\u8bb0\u5fc6\u673a\u5236\u3002\u5c3d\u7ba1\u5df2\u6709\u4f17\u591a\u6709\u524d\u666f\u7684\u8bb0\u5fc6\u8bbe\u8ba1\u88ab\u63d0\u51fa\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u5206\u6563\u5728\u591a\u7bc7\u8bba\u6587\u4e2d\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7efc\u8ff0\u6765\u7cfb\u7edf\u6027\u5730\u603b\u7ed3\u548c\u6bd4\u8f83\uff0c\u672a\u80fd\u63d0\u70bc\u51fa\u901a\u7528\u4e14\u6709\u6548\u7684\u8bbe\u8ba1\u6a21\u5f0f\u4ee5\u542f\u53d1\u540e\u7eed\u7814\u7a76\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4efd\u5173\u4e8eLLM\u57fa\u4ee3\u7406\u8bb0\u5fc6\u673a\u5236\u7684\u5168\u9762\u8c03\u67e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8\u8bb0\u5fc6\u5728LLM\u4ee3\u7406\u4e2d\u7684\u201c\u662f\u4ec0\u4e48\u201d\u4ee5\u53ca\u201c\u4e3a\u4ec0\u4e48\u9700\u8981\u201d\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u56de\u987e\u4e86\u5173\u4e8e\u8bb0\u5fc6\u6a21\u5757\u7684\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b9\u6cd5\u7684\u7814\u7a76\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4f1a\u5c55\u793a\u8bb0\u5fc6\u6a21\u5757\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u626e\u6f14\u7684\u91cd\u8981\u89d2\u8272\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f1a\u5206\u6790\u73b0\u6709\u5de5\u4f5c\u7684\u5c40\u9650\uff0c\u5e76\u6307\u51fa\u91cd\u8981\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002
\u4e3a\u4e86\u8ddf\u8e2a\u8be5\u9886\u57df\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2aGitHub\u4ed3\u5e93\uff1a\\url{https://github.com/nuster1128/LLM_Agent_Memory_Survey}\u3002**|\n", "2404.11964": "|**2024-04-18**|**From Language Models to Practical Self-Improving Computer Agents**|Alex Sheng et.al.|[2404.11964](http://arxiv.org/abs/2404.11964)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u76f4\u63a5\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u521b\u5efa\u80fd\u591f\u6267\u884c\u5404\u79cd\u8ba1\u7b97\u673a\u4efb\u52a1\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u5e76\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u6765\u53d1\u5c55\u5de5\u5177\u548c\u589e\u5f3a\u529f\u80fd\uff0c\u4ee5\u89e3\u51b3\u65e5\u76ca\u590d\u6742\u7684\u4efb\u52a1\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u663e\u793a\u51fa\u4ece\u975e\u53c2\u6570\u589e\u5f3a\u4e2d\u83b7\u76ca\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u5927\u91cf\u96c6\u4e2d\u5728\u5f00\u53d1\u8f6f\u4ef6\uff0c\u4ee5\u8d4b\u4e88LLMs\u5404\u79cd\u80fd\u529b\u3002\u6211\u4eec\u5efa\u8bae\uff0c\u901a\u8fc7\u9002\u5f53\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u4e00\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u7cfb\u7edf\u5730\u751f\u6210\u8f6f\u4ef6\u6765\u589e\u5f3a\u81ea\u8eab\uff0c\u800c\u4e0d\u662f\u4f9d\u8d56\u4eba\u7c7b\u5de5\u7a0b\u7684\u9759\u6001\u8f6f\u4ef6\u5f00\u53d1\u3002 
\u6211\u4eec\u901a\u8fc7\u4e00\u4e9b\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u8fd9\u4e00\u70b9\uff1a\u4ec5\u901a\u8fc7\u7ec8\u7aef\u8bbf\u95ee\uff0c\u6211\u4eec\u5f15\u5bfcLLM\u4ee3\u7406\u6dfb\u52a0\u4e86\u68c0\u7d22\u3001\u4e92\u8054\u7f51\u641c\u7d22\u3001\u7f51\u9875\u5bfc\u822a\u548c\u6587\u672c\u7f16\u8f91\u529f\u80fd\u3002\u8be5\u4ee3\u7406\u6709\u6548\u5730\u5229\u7528\u8fd9\u4e9b\u5de5\u5177\u89e3\u51b3\u4e86\u95ee\u9898\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u548c\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\u3002\u8fd9\u79cd\u65b9\u6cd5\u8868\u660e\uff0c\u901a\u8fc7\u8fde\u7eed\u63d0\u95ee\u548c\u5de7\u5999\u7684\u63d0\u793a\u8bbe\u8ba1\uff0cLLM\u80fd\u591f\u81ea\u4e3b\u6269\u5c55\u5176\u529f\u80fd\uff0c\u6267\u884c\u5b9e\u9645\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002|\n", "2404.11794": "|**2024-04-25**|**Automated Social Science: Language Models as Scientist and Subjects**|Benjamin S. Manning et.al.|[2404.11794](http://arxiv.org/abs/2404.11794)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u6784\u5efa\u548c\u6d4b\u8bd5\u793e\u4f1a\u79d1\u5b66\u5047\u8bbe\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u5173\u952e\u5728\u4e8e\u4f7f\u7528\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u3002\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9648\u8ff0\u5047\u8bbe\u7684\u8bed\u8a00\u3001\u6784\u5efaLLM\u57fa\u7840\u4ee3\u7406\u7684\u84dd\u56fe\u3001\u5b9e\u9a8c\u8bbe\u8ba1\u4ee5\u53ca\u6570\u636e\u5206\u6790\u8ba1\u5212\u3002\u62df\u5408\u540e\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u53ef\u4f9b\u9884\u6d4b\u6216\u89c4\u5212\u540e\u7eed\u5b9e\u9a8c\u3002\u6211\u4eec\u901a\u8fc7\u51e0\u4e2a\u573a\u666f\u8fdb\u884c\u4e86\u6f14\u793a\uff1a\u8c08\u5224\u3001\u4fdd\u91ca\u542c\u8bc1\u4f1a\u3001\u6c42\u804c\u9762\u8bd5\u548c\u62cd\u5356\u3002\u5728\u8fd9\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7cfb\u7edf\u65e2\u63d0\u51fa\u4e86\
u56e0\u679c\u5173\u7cfb\uff0c\u4e5f\u8fdb\u884c\u4e86\u68c0\u9a8c\uff0c\u53d1\u73b0\u4e86\u4e00\u4e9b\u8bc1\u636e\uff0c\u800c\u6709\u4e9b\u5219\u6ca1\u6709\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u4ece\u8fd9\u4e9b\u793e\u4f1a\u4e92\u52a8\u6a21\u62df\u4e2d\u83b7\u53d6\u7684\u6d1e\u5bdf\u5e76\u975e\u4ec5\u901a\u8fc7\u76f4\u63a5\u8be2\u95eeLLM\u5c31\u80fd\u83b7\u5f97\u3002\u5f53\u7ed9\u5b9a\u6bcf\u4e2a\u573a\u666f\u7684\u5efa\u8bae\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u65f6\uff0cLLM\u5728\u9884\u6d4b\u4f30\u8ba1\u6548\u5e94\u7684\u7b26\u53f7\u65b9\u9762\u8868\u73b0\u826f\u597d\uff0c\u4f46\u65e0\u6cd5\u53ef\u9760\u5730\u9884\u6d4b\u6548\u5e94\u7684\u5927\u5c0f\u3002\u5728\u62cd\u5356\u5b9e\u9a8c\u4e2d\uff0c\u6a21\u62df\u7ed3\u679c\u4e0e\u62cd\u5356\u7406\u8bba\u7684\u9884\u6d4b\u7d27\u5bc6\u543b\u5408\uff0c\u4f46LLM\u76f4\u63a5\u63d0\u53d6\u7684\u6e05\u7b97\u4ef7\u683c\u9884\u6d4b\u4e0d\u51c6\u786e\u3002\u7136\u800c\uff0c\u5982\u679c\u6a21\u578b\u80fd\u57fa\u4e8e\u62df\u5408\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u8fdb\u884c\u6761\u4ef6\u5316\uff0cLLM\u7684\u9884\u6d4b\u4f1a\u5927\u5e45\u6539\u8fdb\u3002\u7b80\u800c\u8a00\u4e4b\uff0cLLM\u77e5\u9053\u7684\u6bd4\u5b83\u80fd\u7acb\u5373\u8868\u8fbe\u7684\u8981\u591a\u3002|\n", "2404.11483": "|**2024-04-17**|**AgentKit: Flow Engineering with Graphs, not Coding**|Yue Wu 
et.al.|[2404.11483](http://arxiv.org/abs/2404.11483)|**[link](https://github.com/holmeswww/agentkit)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u76f4\u89c2\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u793a\u6846\u67b6\uff08AgentKit\uff09\uff0c\u65e8\u5728\u4e3a\u591a\u529f\u80fd\u4ee3\u7406\u63d0\u4f9b\u7edf\u4e00\u7684\u65b9\u6cd5\u3002AgentKit\u901a\u8fc7\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6784\u5efa\u590d\u6742\u7684\u201c\u601d\u7ef4\u8fc7\u7a0b\u201d\u3002\u5176\u57fa\u672c\u5355\u5143\u662f\u8282\u70b9\uff0c\u5305\u542b\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u3002\u7528\u6237\u53ef\u4ee5\u50cf\u62fc\u63a5\u4e50\u9ad8\u79ef\u6728\u4e00\u6837\u8fde\u63a5\u8fd9\u4e9b\u8282\u70b9\uff0c\u4ece\u800c\u660e\u786e\u8bbe\u8ba1\u51fa\u81ea\u7136\u7ed3\u6784\u5316\u7684\u201c\u601d\u8003\u6d41\u7a0b\u201d\u3002\u4f8b\u5982\uff0c\u5728\u64b0\u5199\u8bba\u6587\u65f6\uff0c\u53ef\u80fd\u7684\u6b65\u9aa4\u5305\u62ec\uff1a1\uff09\u786e\u5b9a\u6838\u5fc3\u4fe1\u606f\uff0c2\uff09\u8bc6\u522b\u7814\u7a76\u7a7a\u767d\u7b49\u3002AgentKit\u7684\u6a21\u5757\u5316\u7279\u6027\u4f7f\u5f97\u9ad8\u7ea7\u529f\u80fd\u5982\u5373\u5174\u7684\u5c42\u6b21\u5316\u89c4\u5212\u3001\u53cd\u601d\u548c\u4ece\u4e92\u52a8\u4e2d\u5b66\u4e60\u53d8\u5f97\u53ef\u80fd\u3002\u7531\u4e8e\u5176\u76f4\u89c2\u4e14\u6a21\u62df\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u7684\u8bbe\u8ba1\uff0c\u5373\u4f7f\u6ca1\u6709\u7f16\u7a0b\u7ecf\u9a8c\u7684\u4eba\u4e5f\u80fd\u521b\u5efa\u548c\u8c03\u6574\u57fa\u7840\u4ee3\u7406\u3002\u5b9a\u91cf\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528AgentKit\u8bbe\u8ba1\u7684\u4ee3\u7406\u5728WebShop\u548cCrafter\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u8fd9\u4e9b\u6210\u679c\u8868\u660eAgentKit\u6709\u6f5c\u529b\u4f7fLLM\u4ee3\u7406\u5728\u66f4\u5e7f\u6cdb\u7684\u573a\u666f\u4e0b\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728GitHub\uff1ahttps://gith
ub.com/holmeswww/AgentKit\u3002**|\n", "2404.09982": "|**2024-04-15**|**Memory Sharing for Large Language Model based Agents**|Hang Gao et.al.|[2404.09982](http://arxiv.org/abs/2404.09982)|**[link](https://github.com/ghupppp/memorysharingllm)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6267\u884c\u4efb\u52a1\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u91cd\u5927\u7a81\u7834\uff0c\u5b83\u51cf\u5c11\u4e86\u5bf9\u56fa\u5b9a\u7b54\u6848\u4efb\u52a1\uff08\u5982\u5e38\u8bc6\u95ee\u9898\u548c\u662f\u975e\u67e5\u8be2\uff09\u7684\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03\u9700\u6c42\u3002\u7136\u800c\uff0c\u5728\u5904\u7406\u5f00\u653e\u6027\u6311\u6218\u5982\u8bd7\u6b4c\u521b\u4f5c\u65f6\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5c40\u9650\uff0c\u4e3b\u8981\u6e90\u4e8e\u63d0\u4f9b\u7684\u793a\u4f8b\u5168\u9762\u6027\u4ee5\u53ca\u6a21\u578b\u7406\u89e3\u95ee\u9898\u5185\u5bb9\u7684\u80fd\u529b\u4e0d\u8db3\uff0c\u5bfc\u81f4\u8f93\u51fa\u5f80\u5f80\u4e0e\u9884\u671f\u7ed3\u679c\u5927\u76f8\u5f84\u5ead\u3002\u9488\u5bf9\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86Memory-Sharing\uff08MS\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9LLM\u591a\u4ee3\u7406\u7684\u5b9e\u65f6\u8bb0\u5fc6\u5b58\u50a8\u548c\u68c0\u7d22\u7cfb\u7edf\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u8fc7\u7a0b\u3002\u6bcf\u4e2a\u201c\u8bb0\u5fc6\u201d\u5355\u5143\u8bb0\u5f55\u4e86\u63d0\u51fa\u7684\u67e5\u8be2\u53ca\u5176\u6765\u81eaLLM\u4ee3\u7406\u7684\u5373\u65f6\u54cd\u5e94\uff0c\u4ece\u591a\u4e2a\u7c7b\u4f3c\u4ee3\u7406\u4e2d\u805a\u5408\u8fd9\u4e9b\u8bb0\u5fc6\uff0c\u5f62\u6210\u6240\u6709\u4ee3\u7406\u5171\u4eab\u7684\u4e30\u5bcc\u8bb0\u5fc6\u6c60\u3002MS\u6846\u67b6\u4e0d\u4ec5\u5e2e\u52a9\u4ee3\u7406\u627e\u5230\u7279\u5b9a\u4efb\u52a1\u7684\u76f8\u5173\u793a\u4f8b\uff0c\u8fd8\u8
bc4\u4f30\u5176\u8bb0\u5fc6\u7684\u6f5c\u5728\u5229\u7528\u4ef7\u503c\uff0c\u4f9b\u5176\u4ed6\u4ee3\u7406\u672a\u6765\u5e94\u7528\u3002\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u7684\u5b9e\u8bc1\u9a8c\u8bc1\u663e\u793a\uff0cMS\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u4ee3\u7406\u5904\u7406\u5f00\u653e\u6027\u95ee\u9898\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u54ea\u79cd\u8bb0\u5fc6\u6c60\u548c\u68c0\u7d22\u7b56\u7565\u80fd\u66f4\u597d\u5730\u652f\u6301\u4ee3\u7406\uff0c\u4e3aMS\u7684\u672a\u6765\u53d1\u5c55\u63d0\u4f9b\u4e86\u65b9\u5411\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\uff1ahttps://github.com/GHupppp/MemorySharingLLM \u83b7\u53d6\u3002**|\n", "2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127](http://arxiv.org/abs/2404.09127)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|**\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u5b83\u4eec\u901a\u5e38\u6821\u51c6\u4e0d\u826f\u4e14\u8fc7\u5ea6\u81ea\u4fe1\uff0c\u7279\u522b\u662f\u5728\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u4e2d\u3002\u4eba\u7c7b\u7684\u51b3\u7b56\u548c\u4fe1\u5fc3\u4e0d\u4ec5\u6e90\u4e8e\u5185\u5728\u4fe1\u5ff5\uff0c\u8fd8\u80fd\u901a\u8fc7\u65e5\u5e38\u89c2\u5bdf\u8fdb\u884c\u8c03\u6574\uff0c\u800c\u73b0\u6709LLM\u7684\u6821\u51c6\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u5355\u4e2a\u6a21\u578b\u7684\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u201c\u96c6\u4f53\u667a\u6167\u201d\uff1a\u591a\u4e2aLLM\u4e4b\u95f4\u7684\u534f\u4f5c\u8868\u8fbe\u80fd\u529b\uff0c\u8fd9\u53ef\u4ee5\u96c6\u4f53\u63d0\u9ad8\u51c6\u786e\u6027\u548c\u6821\u51c6\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u8bad\u7ec3\u540e\u5904\u7406\
u7684\u6821\u51c6\u7b56\u7565\u2014\u2014\u534f\u4f5c\u6821\u51c6\uff08Collaborative Calibration\uff09\uff0c\u5b83\u5229\u7528\u591a\u4ee3\u7406\u5de5\u5177\u589e\u5f3a\u7684LLMs\u5728\u6a21\u62df\u7684\u7fa4\u4f53\u8ba8\u8bba\u8fc7\u7a0b\u4e2d\uff0c\u5171\u540c\u63d0\u5347\u6821\u51c6\u80fd\u529b\u548c\u63a8\u7406\u5408\u7406\u6027\u3002\u6211\u4eec\u5728\u751f\u6210\u5f0f\u95ee\u7b54\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u534f\u4f5c\u6821\u51c6\u7684\u6709\u6548\u6027\uff0c\u8986\u76d6\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u6574\u5408\u96c6\u4f53\u6821\u51c6\u540e\u7684\u4fe1\u5fc3\u8bc4\u4f30\u548c\u63d0\u5347\u6a21\u578b\u9884\u6d4b\u53ef\u9760\u6027\u65b9\u9762\u7684\u6f5c\u529b\u3002**|\n", "2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077](http://arxiv.org/abs/2404.09077)|**[link](https://github.com/zukangy/kgp-curiousllm)**|**\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5916\u90e8\u6570\u636e\u5e93\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u6210\u6548\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65f6\u5f80\u5f80\u529b\u6709\u4e0d\u902e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5bf9\u4e00\u79cd\u540d\u4e3a\u77e5\u8bc6\u56fe\u8c31\u63d0\u793a\uff08KGP\uff09\u7684\u521b\u65b0\u65b9\u6cd5\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u77e5\u8bc6\u56fe\u8c31\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4ee5\u63d0\u5347\u63a8\u7406\u548c\u641c\u7d22\u7cbe\u5ea6\u3002\u7136\u800c\uff0c\u539f\u59cb\u7684KGP\u6846\u67b6\u9700\u8981\u6602\u8d35\u7684\u5927\u89c4\u6a21\u6570\u636e\u5fae\u8c03\uff0c\u5e76\u4e14\u4ecd\u5b58\u5728LLM\u7684\u9519\u8bef\u63a8\u65ad\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u878d\u5165\u63a8\u7406\u80fd\u529b\u7684LLM\u4ee3
\u7406\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u7684\u597d\u5947\u5fc3\uff0c\u901a\u8fc7\u63d0\u95ee\u6765\u66f4\u6709\u6548\u5730\u5bfc\u822a\u641c\u7d22\u8fc7\u7a0b\u3002\u8fd9\u4e2a\u7b80\u5355\u7684\u6539\u8fdb\u663e\u8457\u63d0\u9ad8\u4e86LLM\u5728QA\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u521d\u59cbKGP\u6846\u67b6\u7684\u9ad8\u6210\u672c\u548c\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u8fdb\u4e00\u6b65\u53d1\u5c55\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u7cbe\u786e\u3001\u66f4\u5feb\u6377\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684QA\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2404.09043": "|**2024-04-13**|**Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation**|Jia Gu et.al.|[2404.09043](http://arxiv.org/abs/2404.09043)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u98de\u901f\u53d1\u5c55\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u8bed\u8a00\u4efb\u52a1\u4e2d\u7684\u51fa\u8272\u8868\u73b0\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528LLMs\u6a21\u62df\u4eba\u7c7b\u7684\u884c\u4e3a\u51b3\u7b56\u8fc7\u7a0b\uff0c\u901a\u5e38\u8fd9\u4e9b\u8fc7\u7a0b\u88ab\u8868\u793a\u4e3a\u9a6c\u5c14\u53ef\u592b\u51b3\u7b56\u8fc7\u7a0b\uff08MDPs\uff09\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u52a8\u4f5c\u9075\u5faa\u7279\u5b9a\u7684\u6982\u7387\u5206\u5e03\uff0c\u5e76\u9700\u8981\u8fed\u4ee3\u91c7\u6837\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u63a2\u7a76LLM\u4ee3\u7406\u7406\u89e3\u6982\u7387\u5206\u5e03\u7684\u80fd\u529b\uff0c\u4ee5\u901a\u8fc7\u6982\u7387\u91c7\u6837\u6307\u5bfc\u884c\u4e3a\u51b3\u7b56\u5e76\u751f\u6210\u884c\u4e3a\u5e8f\u5217\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u65b9\u9762\uff1a\u4e00\u662f\u5df2\u77e5\u7cbe\u786e\u6982\u7387\u5206\u5e03\u7684\u6a21\u62df\uff0c\u4e8c\u662f\u6a21\u7cca\u6982\u7387\u5206\u5e03\u7684\u5e8f\u5217\u751f\u6210\u3002 
\u5728\u5df2\u77e5\u6982\u7387\u5206\u5e03\u7684\u60c5\u51b5\u4e0b\uff0c\u4ee3\u7406\u9700\u8981\u6839\u636e\u95ee\u9898\u63cf\u8ff0\u63d0\u4f9b\u6982\u7387\u5206\u5e03\u7684\u7c7b\u578b\u548c\u53c2\u6570\uff0c\u7136\u540e\u7ed9\u51fa\u91c7\u6837\u5e8f\u5217\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u7814\u7a76\u663e\u793a\uff0cLLM\u4ee3\u7406\u5728\u8fd9\u65b9\u9762\u7684\u6027\u80fd\u4e0d\u4f73\uff0c\u4f46\u901a\u8fc7\u7f16\u7a0b\u5de5\u5177\u53ef\u4ee5\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u63d0\u9ad8\u91c7\u6837\u6210\u529f\u7387\u3002\u800c\u5728\u5b9e\u9645\u60c5\u5883\u4e2d\uff0c\u6982\u7387\u5206\u5e03\u5f80\u5f80\u4e0d\u660e\u786e\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5728\u7b2c\u4e8c\u90e8\u5206\u8ba9\u4ee3\u7406\u8c03\u6574\u5728\u7ebf\u793e\u4ea4\u7f51\u7edc\u4e2d\u7684\u6d3b\u8dc3\u5ea6\uff0c\u5e76\u5206\u6790\u884c\u52a8\u9891\u7387\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4f7f\u501f\u52a9\u7f16\u7a0b\u5de5\u5177\uff0cLLM\u4ee3\u7406\u4f9d\u7136\u65e0\u6cd5\u6709\u6548\u5730\u91c7\u6837\u6982\u7387\u5206\u5e03\u3002\u8fd9\u610f\u5473\u7740\u5728\u76f4\u63a5\u5c06LLM\u4f5c\u4e3a\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u4ee3\u7406\u5e94\u7528\u4e4b\u524d\uff0c\u8fd8\u9700\u8981\u8c28\u614e\u5bf9\u5f85\u3002|\n", "2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492](http://arxiv.org/abs/2404.08492)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u535a\u5f08\u8bba\u6846\u67b6\u4e0b\u7684\u6e38\u620f\u884c\u4e3a\u7406\u89e3\u6f5c\u529b\u65e5\u76ca\u663e\u73b0\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u901a\u8fc7\u6a21\u62df\u5206\u6790\u4e0d\u540c\u7c7b\u578bLLM\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u7ecf\u5178 Beauty Contest 
\u6e38\u620f\u4e2d\u7684\u7b56\u7565\u4e92\u52a8\u3002\u501f\u9274\u4eba\u7c7b\u5b9e\u9a8c\uff0c\u6211\u4eec\u5bf9LLM\u4ee3\u7406\u7684\u7b56\u7565\u5c42\u6b21\u8fdb\u884c\u7c7b\u4f3c\u7684\u8bc4\u4f30\uff0c\u53d1\u73b0\u5b83\u4eec\u5c55\u73b0\u51fa\u4ece\u96f6\u7ea7\u5230\u4e00\u7ea7\u7684\u4e0d\u540c\u7a0b\u5ea6\u63a8\u7406\u80fd\u529b\uff0c\u5e76\u5728\u91cd\u590d\u6e38\u620f\u4e2d\u8868\u73b0\u51fa\u884c\u52a8\u8d8b\u540c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4e0d\u540c\u7c7b\u578b\u7684\u4ee3\u7406\u7fa4\u4f53\u6784\u6210\u5982\u4f55\u5f71\u54cd\u6218\u7565\u884c\u4e3a\uff1a\u9ad8\u6bd4\u4f8b\u7684\u56fa\u5b9a\u7b56\u7565\u5bf9\u624b\u80fd\u4fc3\u8fdbLLM\u4ee3\u7406\u7684\u6536\u655b\uff0c\u800c\u6df7\u5408\u73af\u5883\u4e2d\u4e0d\u540c\u76f8\u5bf9\u7b56\u7565\u6c34\u5e73\u7684\u4ee3\u7406\u5171\u5b58\u4f1a\u52a0\u901f\u6240\u6709\u4ee3\u7406\u7684\u6536\u655b\u3002\u66f4\u667a\u80fd\u7684\u4ee3\u7406\u53ef\u80fd\u83b7\u5f97\u66f4\u9ad8\u7684\u5e73\u5747\u6536\u76ca\uff0c\u4f46\u8fd9\u662f\u4ee5\u8f83\u4f4e\u667a\u80fd\u4ee3\u7406\u7684\u727a\u7272\u4e3a\u4ee3\u4ef7\u7684\u3002\u8fd9\u4e9b\u7ed3\u679c\u4e0d\u4ec5\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u60c5\u666f\u4e0b\u6a21\u62df\u4ee3\u7406\u7684\u7ed3\u5c40\uff0c\u8fd8\u4e3a\u7406\u89e3\u7b97\u6cd5\u4e4b\u95f4\u7684\u6218\u7565\u4e92\u52a8\u63d0\u4f9b\u4e86\u91cd\u8981\u542f\u793a\u3002|\n", "2404.08144": "|**2024-04-17**|**LLM Agents can Autonomously Exploit One-day Vulnerabilities**|Richard Fang 
et.al.|[2404.08144](http://arxiv.org/abs/2404.08144)|null|\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5a01\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u5176\u5728\u826f\u6027\u548c\u6076\u610f\u7528\u9014\u4e0a\u7684\u5e94\u7528\u4e5f\u65e5\u76ca\u5e7f\u6cdb\u3002\u7814\u7a76\u4eba\u5458\u5f00\u59cb\u5173\u6ce8\u5b83\u4eec\u5229\u7528\u7f51\u7edc\u5b89\u5168\u6f0f\u6d1e\u7684\u80fd\u529b\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63a2\u8ba8\u4e86LLMs\u81ea\u4e3b\u7834\u89e3\u7f51\u7ad9\u7684\u53ef\u80fd\u6027\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u7b80\u5355\u7684\u6f0f\u6d1e\u4e0a\u3002\u672c\u5de5\u4f5c\u63ed\u793a\uff0cLLMs\u80fd\u591f\u81ea\u4e3b\u5229\u7528\u73b0\u5b9e\u4e16\u754c\u7cfb\u7edf\u4e2d\u7684\u5355\u65e5\u6f0f\u6d1e\u3002\u6211\u4eec\u6536\u96c6\u4e86\u4e00\u7ec4\u5305\u542b15\u4e2a\u88abCVE\u63cf\u8ff0\u4e3a\u201c\u5173\u952e\u4e25\u91cd\u6027\u201d\u7684\u4e00\u5929\u671f\u6f0f\u6d1e\u6570\u636e\u3002\u5f53\u63d0\u4f9bCVE\u63cf\u8ff0\u65f6\uff0cGPT-4\u6a21\u578b\u80fd\u6210\u529f\u5229\u752887%\u7684\u6f0f\u6d1e\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5176\u4ed6\u6d4b\u8bd5\u6a21\u578b\uff08\u5982GPT-3.5\u3001\u5f00\u6e90LLMs\u548c\u5f00\u6e90\u6f0f\u6d1e\u626b\u63cf\u5668ZAP\u548cMetasploit\uff09\u7684\u8868\u73b0\u5747\u4e3a0%\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684GPT-4\u6a21\u578b\u5728\u6ca1\u6709\u63cf\u8ff0\u7684\u60c5\u51b5\u4e0b\u6548\u7387\u5927\u51cf\uff0c\u4ec5\u80fd\u5229\u75287%\u7684\u6f0f\u6d1e\u3002\u8fd9\u4e9b\u53d1\u73b0\u5bf9\u5927\u89c4\u6a21\u90e8\u7f72\u9ad8\u80fd\u529bLLMs\u63d0\u51fa\u4e86\u8d28\u7591\u3002|\n", "2404.17586": "|**2024-04-11**|**The Future of Scientific Publishing: Automated Article Generation**|Jeremy R. 
Harper et.al.|[2404.17586](http://arxiv.org/abs/2404.17586)|null|\u8fd9\u9879\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8f6f\u4ef6\u5de5\u5177\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63d0\u793a\uff0c\u5b9e\u73b0\u4e86\u4ecePython\u4ee3\u7801\u81ea\u52a8\u751f\u6210\u5b66\u672f\u6587\u7ae0\uff0c\u8fd9\u5bf9\u4e8e\u751f\u7269\u533b\u5b66\u4fe1\u606f\u5b66\u548c\u8ba1\u7b97\u673a\u79d1\u5b66\u9886\u57df\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u9009\u62e9Python\u4f5c\u4e3a\u57fa\u7840\u793a\u4f8b\uff0c\u56e0\u5176\u5e7f\u6cdb\u4f7f\u7528\u548c\u5f3a\u5927\u7684\u6570\u636e\u5206\u6790\u80fd\u529b\u3002\u8be5\u65b9\u6cd5\u548c\u6846\u67b6\u7684\u7075\u6d3b\u6027\u4f7f\u5f97\u5176\u9002\u7528\u4e8e\u591a\u79cdGitHub\u4ed3\u5e93\uff0c\u8868\u660e\u4e86\u5de5\u5177\u7684\u5e7f\u6cdb\u5e94\u7528\u6f5c\u529b\uff08Harper\uff0c2024\u5e74\uff09\u3002\u901a\u8fc7\u7b80\u5316\u4f20\u7edf\u4e0a\u8017\u65f6\u7684\u5b66\u672f\u5199\u4f5c\u8fc7\u7a0b\uff0c\u7279\u522b\u662f\u5728\u6574\u5408\u590d\u6742\u6570\u636e\u96c6\u548c\u4ee3\u7801\u8f93\u51fa\u65b9\u9762\uff0c\u8fd9\u4e00\u7a81\u7834\u6027\u8fdb\u5c55\u63a8\u52a8\u4e86\u79d1\u7814\u6210\u679c\u7684\u5feb\u901f\u4f20\u64ad\u3002\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u5e76\u672a\u4f9d\u8d56\u9ad8\u7ea7\u8bed\u8a00\u6a21\u578b\uff0c\u786e\u4fdd\u4e86\u81ea\u52a8\u5316\u751f\u6210\u5185\u5bb9\u7684\u8fde\u8d2f\u6027\u548c\u5b8c\u6574\u6027\u3002\u6b64\u6b21\u63a2\u7d22\u4e0d\u4ec5\u9a8c\u8bc1\u4e86\u8f6f\u4ef6\u7684\u6210\u529f\u5e94\u7528\u548c\u6548\u7387\uff0c\u8fd8\u9884\u793a\u4e86\u672a\u6765\u53ef\u80fd\u96c6\u6210\u66f4\u5148\u8fdb\u7684LLM\uff0c\u5c06\u8fdb\u4e00\u6b65\u589e\u5f3a\u5176\u529f\u80fd\uff0c\u5f15\u9886\u4e00\u4e2a\u79d1\u7814\u53d1\u73b0\u53d1\u5e03\u66f4\u52a0\u8fc5\u901f\u548c\u6613\u83b7\u53d6\u7684\u65f6\u4ee3\u3002|\n", "2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang 
et.al.|[2404.07456](http://arxiv.org/abs/2404.07456)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u4f5c\u4e3a\u667a\u80fd\u4ee3\u7406\u7684\u5f3a\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u5de5\u7a0b\u6216\u4efb\u52a1\u7279\u5b9a\u7684\u5fae\u8c03\u6765\u63d0\u5347\u6a21\u578b\u7684\u63a8\u7406\u6216\u51b3\u7b56\u80fd\u529b\uff0c\u5ffd\u89c6\u4e86\u63a2\u7d22\u4e0e\u5229\u7528\u7684\u8fc7\u7a0b\u3002\u5728\u5904\u7406\u5f00\u653e\u4e16\u754c\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u590d\u6742\u4efb\u52a1\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5b58\u5728\u5c40\u9650\u6027\u3002\u9996\u5148\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5bf9\u73af\u5883\u7684\u5168\u5c40\u4fe1\u606f\uff0c\u6a21\u578b\u503e\u5411\u4e8e\u505a\u51fa\u8d2a\u5a6a\u51b3\u7b56\uff0c\u5bfc\u81f4\u89e3\u51b3\u65b9\u6848\u4e0d\u7406\u60f3\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u4ece\u73af\u5883\u4e2d\u83b7\u53d6\u7684\u65e0\u5173\u4fe1\u606f\u4e0d\u4ec5\u5f15\u5165\u566a\u58f0\uff0c\u8fd8\u589e\u52a0\u4e86\u989d\u5916\u7684\u6210\u672c\u3002 \u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u5f31\u63a2\u7d22\u5f3a\u5316\u5f3a\u5229\u7528\uff08Weak Exploration to Strong 
Exploitation\uff0cWESE\uff09\uff0c\u65e8\u5728\u589e\u5f3aLLM\u5728\u89e3\u51b3\u5f00\u653e\u4e16\u754c\u4ea4\u4e92\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u5177\u4f53\u6765\u8bf4\uff0cWESE\u5c06\u63a2\u7d22\u548c\u5229\u7528\u8fc7\u7a0b\u89e3\u8026\uff0c\u4f7f\u7528\u6210\u672c\u6548\u76ca\u9ad8\u7684\u201c\u5f31\u201d\u4ee3\u7406\u6267\u884c\u63a2\u7d22\u4efb\u52a1\uff0c\u4ee5\u83b7\u53d6\u5168\u5c40\u77e5\u8bc6\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u57fa\u4e8e\u77e5\u8bc6\u56fe\u8c31\u7684\u7b56\u7565\u6765\u5b58\u50a8\u8fd9\u4e9b\u77e5\u8bc6\uff0c\u5e76\u63d0\u53d6\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u5173\u952e\u4fe1\u606f\uff0c\u4ece\u800c\u63d0\u5347\u201c\u5f3a\u201d\u4ee3\u7406\u5728\u6210\u529f\u7387\u548c\u6548\u7387\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u7528\u4e8e\u5404\u79cd\u4efb\u52a1\uff0c\u5e76\u5728\u56db\u4e2a\u4e92\u52a8\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u663e\u8457\u63d0\u9ad8\u4e86\u6210\u529f\u7387\u548c\u6548\u7387\u3002|\n", "2404.06921": "|**2024-04-10**|**GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications**|Shishir G. 
Patil et.al.|[2404.06921](http://arxiv.org/abs/2404.06921)|**[link](https://github.com/ShishirPatil/gorilla)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0d\u518d\u4ec5\u4ec5\u662f\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u7684\u4fe1\u606f\u63d0\u4f9b\u8005\uff0c\u800c\u662f\u5f00\u59cb\u79ef\u6781\u53c2\u4e0e\u5230\u4e0e\u5b9e\u9645\u5e94\u7528\u548c\u670d\u52a1\u7684\u4e92\u52a8\u4e2d\u3002\u5982\u4eca\uff0c\u4eba\u7c7b\u5728\u5c06LLM\u751f\u6210\u7684\u8f93\u51fa\uff08\u5982\u4ee3\u7801\u3001\u51fd\u6570\u6216\u64cd\u4f5c\uff09\u6295\u5165\u73b0\u5b9e\u4e16\u754c\u6267\u884c\u524d\uff0c\u9700\u8981\u9a8c\u8bc1\u5176\u6b63\u786e\u6027\u548c\u9002\u7528\u6027\uff0c\u8fd9\u5e26\u6765\u4e86\u6311\u6218\uff0c\u56e0\u4e3a\u4ee3\u7801\u7406\u89e3\u88ab\u5e7f\u6cdb\u8ba4\u4e3a\u975e\u5e38\u56f0\u96be\u3002\u672c\u6587\u7814\u7a76\u4e86\u4eba\u7c7b\u5982\u4f55\u80fd\u6709\u6548\u4e0eLLMs\u534f\u4f5c\u3001\u59d4\u6d3e\u548c\u76d1\u7763\uff0c\u7279\u522b\u662f\u5728\u672a\u6765\u3002\u6211\u4eec\u4e3b\u5f20\uff0c\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u5bf9\u63d0\u51fa\u7684\u884c\u52a8\u8fdb\u884c\u201c\u4e8b\u540e\u9a8c\u8bc1\u201d\uff08\u5728\u770b\u5230\u8f93\u51fa\u540e\u786e\u8ba4\u5176\u6b63\u786e\u6027\uff09\u6bd4\u4e4b\u524d\u7684\u201c\u4e8b\u524d\u9a8c\u8bc1\u201d\u66f4\u4e3a\u5bb9\u6613\u3002\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6838\u5fc3\u7406\u5ff5\u662f\u96c6\u6210\u76f4\u89c2\u7684\u64a4\u9500\u529f\u80fd\uff0c\u5e76\u4e3aLLM\u751f\u6210\u7684\u52a8\u4f5c\u8bbe\u5b9a\u635f\u5bb3\u7ea6\u675f\uff0c\u4f5c\u4e3a\u964d\u4f4e\u76f8\u5173\u98ce\u9669\u7684\u6709\u6548\u7b56\u7565\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u4eba\u7c7b\u53ef\u4ee5\u64a4\u9500LLM\u8f93\u51fa\u7684\u5f71\u54cd\uff0c\u6216\u8005\u786e\u4fe1\u6f5c\u5728\u98ce\u9669\u662f\u6709\u9650\u7684\u3002\u6211\u4eec\u8ba4\u4e3a\u8fd9\u5bf9\u4e8e\u5b9e\u73b0LLMs\u4e0e\u5e94\u7528\u548c\u670d\u52a1\u5728\u6709\u9650\u7684\u4eba\u7c7b
\u76d1\u7763\u4e0b\u4ea4\u4e92\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u63cf\u8ff0\u4e86\u5f00\u6e90\u8fd0\u884c\u65f6Gorilla Execution Engine\uff08GoEX\uff09\u7684\u8bbe\u8ba1\u548c\u5b9e\u73b0\uff0c\u8be5\u8fd0\u884c\u65f6\u7528\u4e8e\u6267\u884cLLM\u52a8\u4f5c\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e9b\u5f00\u653e\u7684\u7814\u7a76\u95ee\u9898\uff0c\u65e8\u5728\u63a8\u52a8LLMs\u4e0e\u5e94\u7528\u4e4b\u95f4\u4ee5\u6700\u5c0f\u7684\u4eba\u5de5\u5e72\u9884\u8fdb\u884c\u4ea4\u4e92\u3002GoEX\u7684\u6e90\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/ShishirPatil/gorilla/\u3002**|\n", "2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411](http://arxiv.org/abs/2404.06411)|**[link](https://github.com/nec-research/agentquest)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\uff0c\u4eba\u4eec\u8ffd\u6c42\u80fd\u591f\u89e3\u51b3\u590d\u6742\u3001\u591a\u6b65\u9aa4\u63a8\u7406\u4efb\u52a1\u7684LLM\u4ee3\u7406\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u5f80\u5f80\u5c40\u9650\u4e14\u53ea\u5173\u6ce8\u6574\u4f53\u4efb\u52a1\u6210\u529f\u7387\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AgentQuest\u6846\u67b6\uff0c\u5b83\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a\uff08i\uff09benchmark\u548c\u8bc4\u4f30\u6307\u6807\u6a21\u5757\u5316\u4e14\u6613\u4e8e\u6269\u5c55\uff0c\u901a\u8fc7\u6587\u6863\u9f50\u5168\u3001\u6613\u7528\u7684API\uff1b\uff08ii\uff09\u6211\u4eec\u63d0\u4f9b\u4e86\u4e24\u79cd\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u80fd\u591f\u5728\u89e3\u51b3\u4efb\u52a1\u65f6\u53ef\u9760\u5730\u8ffd\u8e2aLLM\u4ee3\u7406\u7684\u8fdb\u6b65\u3002\u6211\u4eec\u901a\u8fc7\u4e24\u4e2a\u793a\u4f8b\u5c55\u793a\u4e86\u8fd9\u4e9b\u6307\u6807\u7684\u5b9e\u7528\u6027\uff0c\u901a\u8fc7\u8bc6\u522b\u5e38\u89c1\u5931\u8d25\u70b9\u5e76\u4f18\u5316\u4ee3\u7406\u67b6\u6784\uff0c\u663e\u8457\u63d0
\u9ad8\u4e86\u6027\u80fd\u3002\u6211\u4eec\u5e0c\u671b\u4e0e\u7814\u7a76\u754c\u5171\u540c\u6269\u5c55AgentQuest\uff0c\u5e76\u5df2\u5c06\u5176\u5f00\u6e90\u5728https://github.com/nec-research/agentquest\u3002**|\n", "2404.05427": "|**2024-04-15**|**AutoCodeRover: Autonomous Program Improvement**|Yuntong Zhang et.al.|[2404.05427](http://arxiv.org/abs/2404.05427)|**[link](https://github.com/nus-apr/auto-code-rover)**|**\u5728\u8fc7\u53bb\u51e0\u5341\u5e74\u91cc\uff0c\u7814\u7a76\u4eba\u5458\u5728\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e94\u7528\u6781\u5927\u5730\u63a8\u52a8\u4e86\u7f16\u7a0b\u8f85\u52a9\u7684\u81ea\u52a8\u5316\u3002\u7136\u800c\uff0c\u8f6f\u4ef6\u5de5\u7a0b\u5e76\u4e0d\u4ec5\u4ec5\u662f\u7f16\u7801\uff0c\u8fd8\u5305\u62ec\u7ef4\u62a4\uff08\u5982\u4fee\u590dbug\uff09\u548c\u6f14\u5316\uff08\u5982\u6dfb\u52a0\u529f\u80fd\uff09\u7b49\u7a0b\u5e8f\u6539\u8fdb\u8fc7\u7a0b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u89e3\u51b3GitHub\u95ee\u9898\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5b9e\u73b0\u7a0b\u5e8f\u81ea\u4e3b\u6539\u8fdb\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u79f0\u4e3aAutoCodeRover\uff0c\u5b83\u7ed3\u5408\u4e86LLMs\u4e0e\u9ad8\u7ea7\u4ee3\u7801\u641c\u7d22\u80fd\u529b\uff0c\u6700\u7ec8\u751f\u6210\u7a0b\u5e8f\u4fee\u6539\u6216\u8865\u4e01\u3002\u4e0eAI\u7814\u7a76\u8005\u548c\u4ece\u4e1a\u8005\u8fd1\u671f\u5173\u6ce8\u7684\u4ec5\u6587\u4ef6\u7ea7\u522b\u7684\u8f6f\u4ef6\u9879\u76ee\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4fa7\u91cd\u4e8e\u7a0b\u5e8f\u8868\u793a\uff08\u62bd\u8c61\u8bed\u6cd5\u6811\uff09\uff0c\u5229\u7528\u7c7b/\u65b9\u6cd5\u7684\u7a0b\u5e8f\u7ed3\u6784\u6765\u589e\u5f3aLLM\u5bf9\u95ee\u9898\u6839\u672c\u539f\u56e0\u7684\u7406\u89e3\uff0c\u5e76\u901a\u8fc7\u8fed\u4ee3\u641c\u7d22\u63d0\u4f9b\u4e0a\u4e0b\u6587\u3002\u5f53\u6d4b\u8bd5\u5957\u4ef6\u53ef\u7528\u65f6\uff0c\
u8c31\u7cfb\u57fa\u7ebf\u6545\u969c\u5b9a\u4f4d\u6280\u672f\u8fdb\u4e00\u6b65\u7cbe\u786e\u4e86\u4e0a\u4e0b\u6587\u3002 \u5728SWE-bench-lite\uff0c\u4e00\u4e2a\u5305\u542b300\u4e2a\u771f\u5b9eGitHub\u95ee\u9898\u7684\u6570\u636e\u96c6\u4e0a\uff0cAutoCodeRover\u7684\u89e3\u51b3\u65b9\u6848\u6548\u679c\u63d0\u5347\uff0c\u89e3\u51b3\u4e86\u7ea622-23%\u7684\u95ee\u9898\u3002\u5bf9\u4e8e\u5168\u91cf\u7684SWE-bench\uff0c\u5305\u542b2294\u4e2aGitHub\u95ee\u9898\uff0cAutoCodeRover\u89e3\u51b3\u4e86\u5927\u7ea616%\u7684\u95ee\u9898\uff0c\u8fd9\u6bd4\u6700\u8fd1\u62a5\u9053\u7684\u6765\u81eaCognition Labs\u7684AI\u8f6f\u4ef6\u5de5\u7a0b\u5e08Devin\u7684\u8868\u73b0\u8fd8\u8981\u9ad8\uff0c\u800c\u4e14\u65f6\u95f4\u6d88\u8017\u4e0eDevin\u76f8\u5f53\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u6d41\u7a0b\u80fd\u591f\u63a8\u52a8\u81ea\u4e3b\u8f6f\u4ef6\u5de5\u7a0b\u7684\u53d1\u5c55\uff0c\u672a\u6765LLM\u81ea\u52a8\u751f\u6210\u7684\u4ee3\u7801\u53ef\u4ee5\u88ab\u81ea\u52a8\u5730\u8fdb\u884c\u4f18\u5316\u548c\u6539\u8fdb\u3002**|\n", "2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang 
et.al.|[2404.05291](http://arxiv.org/abs/2404.05291)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7cfb\u7edf\uff0c\u65e8\u5728\u63d0\u5347\u56db\u8db3\u673a\u5668\u4eba\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u8d85\u8d8a\u77ed\u671f\u52a8\u4f5c\u7684\u957f\u671f\u4efb\u52a1\u3002\u5bf9\u4e8e\u56db\u8db3\u673a\u5668\u4eba\u6765\u8bf4\uff0c\u957f\u671f\u4efb\u52a1\u6781\u5177\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u9700\u8981\u5bf9\u4efb\u52a1\u7684\u8bed\u4e49\u6709\u9ad8\u5c42\u7406\u89e3\uff0c\u5e76\u5177\u5907\u5e7f\u6cdb\u7684\u8fd0\u52a8\u548c\u64cd\u7eb5\u6280\u80fd\u4ee5\u4e0e\u73af\u5883\u4e92\u52a8\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u5c42\u63a8\u7406\u5c42\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u4efb\u52a1\u63cf\u8ff0\u4e2d\u751f\u6210\u6df7\u5408\u79bb\u6563-\u8fde\u7eed\u7684\u8ba1\u5212\uff0c\u4f5c\u4e3a\u673a\u5668\u4eba\u4ee3\u7801\u3002\u5b83\u5305\u62ec\u591a\u4e2aLLM\u4ee3\u7406\uff1a\u4e00\u4e2a\u7528\u4e8e\u6784\u601d\u8ba1\u5212\u7684\u8bed\u4e49\u89c4\u5212\u5668\u3001\u4e00\u4e2a\u53c2\u6570\u8ba1\u7b97\u5668\uff0c\u7528\u4e8e\u9884\u6d4b\u8ba1\u5212\u4e2d\u7684\u53c2\u6570\uff0c\u4ee5\u53ca\u4e00\u4e2a\u4ee3\u7801\u751f\u6210\u5668\uff0c\u5c06\u8ba1\u5212\u8f6c\u6362\u4e3a\u53ef\u6267\u884c\u7684\u673a\u5668\u4eba\u4ee3\u7801\u3002 
\u5728\u4f4e\u5c42\u6b21\uff0c\u6211\u4eec\u91c7\u7528\u5f3a\u5316\u5b66\u4e60\u6765\u8bad\u7ec3\u4e00\u5957\u8fd0\u52a8\u89c4\u5212\u548c\u63a7\u5236\u6280\u80fd\uff0c\u4ee5\u589e\u5f3a\u56db\u8db3\u673a\u5668\u4eba\u7684\u7075\u6d3b\u6027\uff0c\u4f7f\u5176\u80fd\u8fdb\u884c\u4e30\u5bcc\u73af\u5883\u4ea4\u4e92\u3002\u6211\u4eec\u5728\u96be\u4ee5\u7528\u5355\u4e00\u6280\u80fd\u5b8c\u6210\u7684\u957f\u671f\u4efb\u52a1\u4e0a\u6d4b\u8bd5\u4e86\u6211\u4eec\u7684\u7cfb\u7edf\u3002\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u8868\u660e\uff0c\u5b83\u6210\u529f\u5730\u5236\u5b9a\u4e86\u591a\u6b65\u9aa4\u7b56\u7565\uff0c\u5e76\u5c55\u73b0\u51fa\u975e\u5e73\u51e1\u7684\u884c\u4e3a\uff0c\u4f8b\u5982\u5236\u4f5c\u5de5\u5177\u6216\u5411\u4eba\u7c7b\u5bfb\u6c42\u5e2e\u52a9\u3002|\n", "2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667](http://arxiv.org/abs/2404.04667)|null|\u591a\u6a21\u6001\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u6709\u671b\u901a\u8fc7\u89e3\u6790\u5404\u7c7b\u533b\u5b66\u6570\u636e\u63d0\u5347\u4e34\u5e8a\u51b3\u7b56\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5404\u533b\u5b66\u9886\u57df\u7684\u6548\u80fd\u5c1a\u4e0d\u660e\u6717\uff0c\u6bcf\u4e2a\u9886\u57df\u90fd\u6709\u5176\u72ec\u7279\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u5f15\u64ce\u7684\u65b0\u578b\u591a\u6a21\u6001\u533b\u7597AI\u65b9\u6cd5\u3002\u6b64\u5f15\u64ce\u81ea\u4e3b\u534f\u8c03\u5e76\u90e8\u7f72\u4e00\u7cfb\u5217\u4e13\u95e8\u7684\u533b\u7597AI\u5de5\u5177\uff0c\u5982\u6587\u672c\u89e3\u8bfb\u3001\u653e\u5c04\u5b66\u548c\u75c5\u7406\u56fe\u50cf\u5206\u6790\u3001\u57fa\u56e0\u6570\u636e\u5904\u7406\u3001\u7f51\u7edc\u641c\u7d22\u4ee5\u53ca\u533b\u7597\u6307\u5357\u6587\u6863\u68c0\u7d22\u3002\u6211\u4eec\u5728\u4e00\u7cfb\u5217\u4e34\u5e8a\u80bf\u76
24\u5b66\u573a\u666f\u4e2d\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u573a\u666f\u6a21\u62df\u4e86\u5178\u578b\u7684\u60a3\u8005\u62a4\u7406\u6d41\u7a0b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u7cfb\u7edf\u5728\u9009\u62e9\u6070\u5f53\u5de5\u5177\uff0897%\uff09\u3001\u5f97\u51fa\u6b63\u786e\u7ed3\u8bba\uff0893.6%\uff09\u3001\u63d0\u4f9b\u5b8c\u6574\uff0894%\uff09\u548c\u6709\u76ca\uff0889.2%\uff09\u6cbb\u7597\u5efa\u8bae\uff0c\u4ee5\u53ca\u6839\u636e\u6307\u4ee4\u5f15\u7528\u76f8\u5173\u6587\u732e\uff0882.5%\uff09\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u80fd\u529b\u3002\u8fd9\u8868\u660eLLMs\u80fd\u591f\u6709\u6548\u5730\u89c4\u5212\u548c\u6267\u884c\u9886\u57df\u7279\u5b9a\u6a21\u578b\uff0c\u4ee5\u83b7\u53d6\u6216\u5408\u6210\u65b0\u4fe1\u606f\uff0c\u4ece\u800c\u5145\u5f53\u4e2a\u6027\u5316\u4e34\u5e8a\u52a9\u624b\u3002\u6b64\u5916\uff0c\u8fd9\u79cd\u67b6\u6784\u7b80\u5316\u4e86\u76d1\u7ba1\u5408\u89c4\u6027\uff0c\u56e0\u4e3a\u6bcf\u4e2a\u7ec4\u4ef6\u5de5\u5177\u53ef\u4ee5\u5355\u72ec\u9a8c\u8bc1\u548c\u5ba1\u6279\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u8fd9\u9879\u5de5\u4f5c\u4e3a\u533b\u7597\u9886\u57df\u7684\u66f4\u5148\u8fdbLLM\u4ee3\u7406\u63d0\u4f9b\u4e86\u6982\u5ff5\u9a8c\u8bc1\u3002|\n", "2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237](http://arxiv.org/abs/2404.04237)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u8fdb\u6b65\u4f7f\u5176\u5728\u6807\u51c6\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u9891\u9891\u8d85\u8d8a\u4eba\u7c7b\u8868\u73b0\uff0c\u63a8\u52a8\u4e86\u4f17\u591a\u4e0b\u6e38\u5e94\u7528\u7684\u53d1\u5c55\uff0c\u5982\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u770b\u4f3c\u7b80\u5355\u7684\u4efb\u52a1\u4e2d\u610f\u5916\u5730\u8868\u73b0\u4e0d\u4f73\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5bf9\u66f4\u5168\u9762\u548c\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6846\u67b6\u7684\u9700\u6c42\uff0c\u4ee5\u8861\u91cf\u5b83\u4eec\u7684\u5b9e\u9645\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u7ec4\u5408\u6027\u548c\u6761\u4ef6\u63a8\u7406\u2014\u2014\u4eba\u7c7b\u8ba4\u77e5\u7684\u57fa\u77f3\uff0c\u5e76\u63d0\u51faGroundCocoa\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e0e\u822a\u73ed\u9884\u8ba2\u8fd9\u4e00\u73b0\u5b9e\u95ee\u9898\u76f8\u8fde\u63a5\u7684\u8bcd\u6c47\u4e30\u5bcc\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u4efb\u52a1\u662f\u5c06\u7528\u6237\u7684\u8be6\u7ec6\u504f\u597d\u4e0e\u4ee5\u591a\u9009\u5f62\u5f0f\u63d0\u4f9b\u7684\u53ef\u7528\u822a\u73ed\u9009\u9879\u8fdb\u884c\u5339\u914d\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5305\u62ec\u6700\u5148\u8fdb\u7684GPT-4 Turbo\u5728\u5185\u7684\u5f53\u524d\u6700\u4f73\u6a21\u578b\uff0c\u5728\u7ecf\u8fc7\u9ad8\u7ea7\u63d0\u793a\u540e\uff0c\u51c6\u786e\u7387\u4ecd\u4e0d\u8d85\u8fc767%\uff0c\u663e\u793a\u51fa\u663e\u8457\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045](http://arxiv.org/abs/2404.16045)|null|
\u5728\u4ea7\u54c1\u5f00\u53d1\u7684\u5173\u952e\u9636\u6bb5\u2014\u2014\u9700\u6c42\u83b7\u53d6\uff0c\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u7528\u6237\u9700\u6c42\uff0c\u5bfc\u81f4\u6700\u7ec8\u4ea7\u54c1\u53ef\u80fd\u65e0\u6cd5\u6ee1\u8db3\u671f\u671b\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5316\u548c\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u3002\u901a\u8fc7\u751f\u6210\u5927\u91cf\u6a21\u62df\u7528\u6237\uff08LLM\u4ee3\u7406\uff09\uff0c\u6211\u4eec\u53ef\u4ee5\u63a2\u7d22\u66f4\u5e7f\u6cdb\u7684\u7528\u6237\u9700\u6c42\u548c\u672a\u9884\u89c1\u7684\u4f7f\u7528\u573a\u666f\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u63cf\u8ff0\u4ed6\u4eec\u7684\u884c\u4e3a\u3001\u89c2\u5bdf\u548c\u6311\u6218\uff0c\u53c2\u4e0e\u4ea7\u54c1\u4f53\u9a8c\u60c5\u666f\u3002\u968f\u540e\u7684\u4ee3\u7406\u8bbf\u8c08\u548c\u5206\u6790\u63ed\u793a\u4e86\u5b9d\u8d35\u7684\u7528\u6237\u9700\u6c42\uff0c\u5305\u62ec\u6f5c\u5728\u9700\u6c42\u3002\u6211\u4eec\u901a\u8fc7\u4e09\u4e2a\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u6846\u67b6\uff1a\u9996\u5148\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0d\u540c\u65b9\u6cd5\u751f\u6210\u591a\u6837\u5316\u7684\u4ee3\u7406\uff0c\u5206\u6790\u5176\u4f18\u7f3a\u70b9\uff0c\u5e76\u8bc1\u660e\u4e86\u5177\u6709\u4e0a\u4e0b\u6587\u610f\u8bc6\u7684\u4ee3\u7406\u751f\u6210\u80fd\u5e26\u6765\u66f4\u5927\u7684\u9700\u6c42\u591a\u6837\u6027\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8be5\u6846\u67b6\u5982\u4f55\u6709\u6548\u5730\u6a21\u62df\u5bcc\u6709\u540c\u60c5\u5fc3\u7684\u9886\u5148\u7528\u6237\u8bbf\u8c08\uff0c\u8bc6\u522b\u51fa\u6bd4\u4f20\u7edf\u4eba\u7c7b\u8bbf\u8c08\u66f4\u591a\u7684\u6f5c\u5728\u9700\u6c42\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528LLMs\u5206\u6790\u8bbf\u8c08\uff0c\u63d0\u53d6\u9700\u6c42\u5e76\u5c06\u5176\u5206\u7c7b\u4e3a\u6f5c\u5728\u6
216\u975e\u6f5c\u5728\u3002\u6211\u4eec\u7684\u7814\u7a76\u5de5\u4f5c\u5f3a\u8c03\u4e86\u5229\u7528LLM\u4ee3\u7406\u52a0\u901f\u65e9\u671f\u4ea7\u54c1\u7814\u53d1\u3001\u964d\u4f4e\u6210\u672c\u548c\u4fc3\u8fdb\u521b\u65b0\u7684\u6f5c\u529b\u3002|\n", "2404.15317": "|**2024-04-03**|**Concept-Guided LLM Agents for Human-AI Safety Codesign**|Florian Geissler et.al.|[2404.15317](http://arxiv.org/abs/2404.15317)|null|\u968f\u7740\u751f\u6210\u4eba\u5de5\u667a\u80fd\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff0c\u7279\u522b\u662f\u5b89\u5168\u5de5\u7a0b\u4e2d\u7684\u91cd\u8981\u6027\u63d0\u5347\uff0c\u5bf9\u5b83\u7684\u8d28\u91cf\u8981\u6c42\u4e5f\u968f\u4e4b\u63d0\u9ad8\u3002\u5355\u7eaf\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e9b\u9700\u6c42\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u878d\u5408\u7684\u7b56\u7565\uff0c\u65e8\u5728\u5229\u7528LLMs\u8fdb\u884c\u5b89\u5168\u5206\u6790\u548c\u4eba\u673a\u534f\u540c\u8bbe\u8ba1\uff0c\u4ee5\u786e\u4fdd\u8f6f\u4ef6\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9a\u5236\u5316\u7684LLM\u4ee3\u7406\uff0c\u7ed3\u5408\u63d0\u793a\u5de5\u7a0b\u3001\u542f\u53d1\u5f0f\u63a8\u7406\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u4e13\u6ce8\u4e8e\u89e3\u51b3\u4e0e\u9884\u5b9a\u4e49\u5b89\u5168\u6982\u5ff5\u76f8\u5173\u7684\u4efb\u52a1\uff0c\u5e76\u4e0e\u7cfb\u7edf\u6a21\u578b\u56fe\u8fdb\u884c\u4ea4\u4e92\u3002\u51b3\u7b56\u6d41\u7a0b\u901a\u8fc7\u4e00\u7cfb\u5217\u5fae\u51b3\u7b56\u8fdb\u884c\u5f15\u5bfc\uff0c\u6709\u52a9\u4e8e\u4fdd\u6301\u7ed3\u6784\u5316\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u56fe\u7684\u53e3\u5934\u8868\u8ff0\u4f5c\u4e3a\u7cfb\u7edf\u6a21\u578b\u7684\u4e2d\u95f4\u8868\u793a\uff0c\u4ee5\u4fc3\u8fdbLLM\u4e0e\u56fe\u7684\u4ea4\u4e92\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7b80\u5316\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u7684\u793a\u4f8b\uff0c\u5
c55\u793a\u4e86\u9009\u62e9\u7684\u63d0\u793a-\u54cd\u5e94\u5bf9\uff0c\u4ee5\u8bf4\u660e\u6211\u4eec\u7684\u65b9\u6cd5\u5982\u4f55\u5e94\u7528\u4e8e\u5b89\u5168\u5206\u6790\u3002|\n", "2404.02183": "|**2024-04-02**|**Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**|Yoichi Ishibashi et.al.|[2404.02183](http://arxiv.org/abs/2404.02183)|**[link](https://github.com/tsukushiai/self-organized-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u7684\u672a\u6765\u6b63\u9010\u6e10\u663e\u73b0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5355\u4ee3\u7406\u65b9\u6cd5\u5728\u751f\u6210\u548c\u4f18\u5316\u5927\u89c4\u6a21\u3001\u590d\u6742\u7684\u4ee3\u7801\u5e93\u65f6\u9762\u4e34\u4e0a\u4e0b\u6587\u957f\u5ea6\u9650\u5236\u7684\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u4ee3\u7406\u6846\u67b6\u2014\u2014\u81ea\u7ec4\u7ec7\u591aAgent\u4f53\u7cfb\uff08SoA\uff09\u3002SoA\u662f\u4e00\u4e2a\u53ef\u6269\u5c55\u4e14\u9ad8\u6548\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u5b83\u5141\u8bb8\u72ec\u7acb\u5730\u751f\u6210\u548c\u4fee\u6539\u4ee3\u7801\u7ec4\u4ef6\uff0c\u5e76\u534f\u540c\u6784\u5efa\u6574\u4e2a\u4ee3\u7801\u5e93\u3002SoA\u7684\u4e00\u4e2a\u5173\u952e\u7279\u6027\u662f\u6839\u636e\u95ee\u9898\u590d\u6742\u6027\u81ea\u52a8\u589e\u52a0\u4ee3\u7406\uff0c\u5b9e\u73b0\u52a8\u6001\u53ef\u6269\u5c55\u6027\u3002\u8fd9\u6837\uff0c\u6574\u4f53\u4ee3\u7801\u91cf\u53ef\u4ee5\u6839\u636e\u4ee3\u7406\u6570\u91cf\u65e0\u9650\u589e\u957f\uff0c\u800c\u6bcf\u4e2a\u4ee3\u7406\u7ba1\u7406\u7684\u4ee3\u7801\u91cf\u4fdd\u6301\u6052\u5b9a\u3002 
\u6211\u4eec\u5728HumanEval\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86SoA\uff0c\u5e76\u53d1\u73b0\u4e0e\u5355\u4ee3\u7406\u7cfb\u7edf\u76f8\u6bd4\uff0cSoA\u4e2d\u7684\u6bcf\u4e2a\u4ee3\u7406\u5904\u7406\u7684\u4ee3\u7801\u91cf\u660e\u663e\u51cf\u5c11\uff0c\u4f46\u603b\u4f53\u751f\u6210\u7684\u4ee3\u7801\u91cf\u663e\u8457\u589e\u52a0\u3002\u6b64\u5916\uff0cSoA\u5728Pass@1\u51c6\u786e\u7387\u65b9\u9762\u6bd4\u5f3a\u5927\u7684\u5355\u4ee3\u7406\u57fa\u7ebf\u63d0\u9ad8\u4e865%\u3002**|\n", "2404.01602": "|**2024-04-02**|**Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game**|Silin Du et.al.|[2404.01602](http://arxiv.org/abs/2404.01602)|**[link](https://github.com/doslim/evaluate-the-opinion-leadership-of-llms)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u793e\u4ea4\u63a8\u7406\u6e38\u620f\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u7684\u7b56\u7565\u884c\u4e3a\uff0c\u4f46\u5bf9\u5b83\u4eec\u4f5c\u4e3a\u610f\u89c1\u9886\u8896\u7684\u91cd\u8981\u6027\u5173\u6ce8\u4e0d\u8db3\uff0c\u8fd9\u5bf9\u4e8e\u591aAgent\u548c\u4eba\u673a\u4ea4\u4e92\u573a\u666f\u7684\u5b9e\u9645\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u610f\u89c1\u9886\u8896\u662f\u6307\u5728\u4e00\u4e2a\u793e\u4f1a\u7fa4\u4f53\u4e2d\u5bf9\u4ed6\u4eba\u4fe1\u5ff5\u548c\u884c\u4e3a\u6709\u663e\u8457\u5f71\u54cd\u7684\u4e2a\u4f53\u3002\u672c\u7814\u7a76\u4f7f\u7528\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u4f5c\u4e3a\u6a21\u62df\u5e73\u53f0\uff0c\u63a2\u8ba8\u8bed\u8a00\u6a21\u578b\u5728\u626e\u6f14Sheriff\uff08\u6cbb\u5b89\u5b98\uff09\u89d2\u8272\u65f6\u7684\u610f\u89c1\u9886\u5bfc\u80fd\u529b\u3002Sheriff\u8d1f\u8d23\u603b\u7ed3\u8bba\u70b9\u5e76\u63d0\u51fa\u51b3\u7b56\u5efa\u8bae\uff0c\u56e0\u6b64\u5b83\u4ee3\u8868\u4e86\u610f\u89c1\u9886\u8896\u7684\u4e00\u4e2a\u53ef\u4fe1\u4ee3\u7406\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6574\u5408Sheriff\u89d2\u8272\u7684\u6846\u67b6\uff0c\u5e76\u57fa\u4e8e\u610f\u89c1\u9886\u8896\u7684\u5173\u952e\u7279\u6027\u63d0\u51fa\u4e8
6\u4e24\u4e2a\u8bc4\u4f30\u6307\u6807\uff1a\u7b2c\u4e00\u4e2a\u8861\u91cf\u610f\u89c1\u9886\u8896\u7684\u53ef\u9760\u6027\uff0c\u7b2c\u4e8c\u4e2a\u8003\u5bdf\u5176\u5bf9\u5176\u4ed6\u73a9\u5bb6\u51b3\u7b56\u7684\u5f71\u54cd\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u8bc4\u4f30\u4e0d\u540c\u89c4\u6a21\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u5e76\u521b\u5efa\u4e86\u201c\u72fc\u4eba\u6740\u201d\u95ee\u9898\u56de\u7b54\u6570\u636e\u96c6\uff08WWQA\uff09\uff0c\u4ee5\u6d4b\u8bd5\u548c\u63d0\u5347\u6a21\u578b\u5bf9\u6e38\u620f\u89c4\u5219\u7684\u7406\u89e3\u3002\u6b64\u5916\uff0c\u8fd8\u5305\u542b\u4e86\u4eba\u7c7b\u53c2\u4e0e\u8005\u8fdb\u884c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u662f\u4e00\u4e2a\u6709\u6548\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u610f\u89c1\u9886\u5bfc\u529b\u7684\u8bd5\u9a8c\u573a\uff0c\u4f46\u76ee\u524d\u4ec5\u6709\u5c11\u6570\u8bed\u8a00\u6a21\u578b\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002**|\n", "2404.00806": "|**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|\u968f\u7740\u7b97\u6cd5\u5b9a\u4ef7\u7684\u5174\u8d77\uff0c\u4eba\u4eec\u62c5\u5fe7\u7b97\u6cd5\u95f4\u7684\u5408\u8c0b\u95ee\u9898\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u4f7f\u7528\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b9a\u4ef7\u4ee3\u7406\uff0c\u7279\u522b\u662fGPT-4\uff0c\u8fdb\u884c\u4e86\u63a2\u7a76\u3002\u7814\u7a76\u53d1\u73b0\uff1a(1) LLM\u9a71\u52a8\u7684\u5b9a\u4ef7\u673a\u5236\u5728\u5b9a\u4ef7\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff1b(2) \u5728\u5be1\u5934\u7ade\u4e89\u73af\u5883\u4e2d\uff0cLLM\u5b9a\u4ef7\u4ee3\u7406\u4f1a\u81ea\u53d1\u5730\u8fdb\u884c\u5408\u8c0b\uff0c\u4ece\u800c\u635f\u5bb3\u6d88\u8d39\u8005\u5229\u76ca\uff1b(3) 
\u5bf9LLM\u6307\u4ee4\uff08\u201c\u63d0\u793a\u201d\uff09\u770b\u4f3c\u5fae\u5c0f\u7684\u53d8\u5316\u53ef\u80fd\u52a0\u5267\u8fd9\u79cd\u5408\u4f5c\u884c\u4e3a\u3002\u8fd9\u4e9b\u7ed3\u679c\u540c\u6837\u9002\u7528\u4e8e\u62cd\u5356\u573a\u666f\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u7b97\u6cd5\u5b9a\u4ef7\u8fdb\u884c\u53cd\u5784\u65ad\u76d1\u7ba1\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u63ed\u793a\u4e86\u9488\u5bf9LLM\u5b9a\u4ef7\u4ee3\u7406\u7279\u6709\u7684\u76d1\u7ba1\u6311\u6218\u3002|\n", "2404.01343": "|**2024-04-15**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343](http://arxiv.org/abs/2404.01343)|**[link](https://github.com/jingzheshi/chops)**|**\u968f\u7740\u4f01\u4e1a\u548c\u8f6f\u4ef6\u5e73\u53f0\u8d8a\u6765\u8d8a\u591a\u5730\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-3.5\u3001GPT-4\u3001GLM-3\u548cLLaMa-2\uff09\u63d0\u4f9b\u804a\u5929\u8f85\u52a9\u6216\u5ba2\u6237\u670d\u52a1\u63a8\u7406\uff0c\u73b0\u6709\u7684\u57fa\u4e8eLLM\u7684\u5ba2\u6237\u670d\u52a1\u6a21\u578b\u5728\u4e0e\u5ba2\u6237\u8d44\u6599\u96c6\u6210\u548c\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u5b83\u4eec\u503e\u5411\u4e8e\u5f3a\u8c03\u591a\u6837\u6027\u800c\u975e\u7cbe\u786e\u6027\u548c\u9519\u8bef\u907f\u514d\uff0c\u8fd9\u5bf9\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u5ba2\u6237\u670d\u52a1\u573a\u666f\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHOPS\uff08\u7ed3\u5408\u5ba2\u6237\u8d44\u6599\u7684\u804a\u5929\u52a9\u624b\uff09\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\uff1a\uff081\uff09\u9ad8\u6548\u5229\u7528\u73b0\u6709\u6570\u636e\u5e93\u6216\u7cfb\u7edf\u67e5\u8be2\u7528\u6237\u4fe1\u606f\uff0c\u6216\u9075\u5faa\u65e2\u5b9a\u6307\u5357\u4e0e\u7cfb\u7edf\u4ea4\u4e92\uff1b\uff082\uff09\u63d0\u4f9b\u51c6\u786e\u5408\u7406\u7684\u54cd\u5e94\u5e76\u6267\u884c\u7cfb\u7edf\u5185\u7684\u5fc5\u8981
\u64cd\u4f5c\uff0c\u540c\u65f6\u907f\u514d\u6709\u5bb3\u64cd\u4f5c\uff1b\uff083\uff09\u901a\u8fc7\u7ed3\u5408\u5c0f\u578b\u548c\u5927\u578bLLM\u4ee5\u5b9e\u73b0\u6027\u80fd\u6ee1\u610f\u4e14\u6210\u672c\u5408\u7406\u7684\u63a8\u7406\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9e\u7528\u7684\u6570\u636e\u96c6\uff0c\u79f0\u4e3aCPHOS-dataset\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u6570\u636e\u5e93\u3001\u6307\u5bfc\u6587\u4ef6\u4ee5\u53ca\u6765\u81eaCPHOS\u5e73\u53f0\u7684\u6a21\u62df\u7269\u7406\u5965\u6797\u5339\u514b\u7ec4\u7ec7\u670d\u52a1\u7684\u95ee\u7b54\u5bf9\u3002CPHOS\u662f\u4e00\u4e2a\u9762\u5411\u9ad8\u4e2d\u6559\u5e08\u548c\u5b66\u751f\u7684\u5728\u7ebf\u5e73\u53f0\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528CPHOS-dataset\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86CHOPS\u67b6\u6784\u7684\u6027\u80fd\uff0c\u76ee\u6807\u662f\u5c55\u793aLLM\u5982\u4f55\u63d0\u5347\u6216\u66ff\u4ee3\u4eba\u5de5\u5ba2\u6237\u670d\u52a1\u3002\u5173\u4e8e\u6211\u4eec\u7684\u63d0\u6848\u67b6\u6784\u548c\u6570\u636e\u96c6\u7684\u4ee3\u7801\u53ef\u5728\u6b64\u5904\u83b7\u53d6\uff1a\u3002**|\n", "2404.01342": "|**2024-03-31**|**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**|Lirui Zhao 
et.al.|[2404.01342](http://arxiv.org/abs/2404.01342)|**[link](https://github.com/opengvlab/diffagent)**|**\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u751f\u6210\u6a21\u578b\u8fd1\u5e74\u6765\u5907\u53d7\u77a9\u76ee\uff0c\u5728\u5b66\u672f\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u5927\u653e\u5f02\u5f69\u3002\u4f8b\u5982\uff0cCivitai\u5e73\u53f0\uff0c\u4e00\u4e2aT2I\u521b\u65b0\u7684\u805a\u96c6\u5730\uff0c\u76ee\u524d\u6c47\u96c6\u4e8674,492\u79cd\u72ec\u7279\u7684\u6a21\u578b\uff0c\u8fd9\u5e26\u6765\u4e86\u9009\u62e9\u6700\u5408\u9002\u7684\u6a21\u578b\u548c\u53c2\u6570\u7684\u8270\u5de8\u4efb\u52a1\uff0c\u901a\u5e38\u9700\u8981\u591a\u6b21\u8bd5\u9a8c\u3002\u501f\u9274\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5de5\u5177\u4f7f\u7528\u7814\u7a76\u7684\u601d\u8def\uff0c\u6211\u4eec\u63a8\u51fa\u4e86DiffAgent\uff0c\u8fd9\u662f\u4e00\u4e2a\u901a\u8fc7API\u8c03\u7528\u6765\u5feb\u901f\u7b5b\u9009\u51c6\u786e\u9009\u9879\u7684LLM\u4ee3\u7406\u3002DiffAgent\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u8bad\u7ec3\u6846\u67b6\uff0c\u79f0\u4e3aSFTA\uff0c\u4f7f\u5176\u80fd\u591f\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7cbe\u786e\u5730\u5c06T2I API\u7684\u54cd\u5e94\u4e0e\u7528\u6237\u8f93\u5165\u5bf9\u9f50\u3002\u4e3a\u4e86\u8bad\u7ec3\u548c\u8bc4\u4f30DiffAgent\u7684\u80fd\u529b\uff0c\u6211\u4eec\u6784\u5efa\u4e86DABench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u5e93\uff0c\u6db5\u76d6\u4e86\u793e\u533a\u4e2d\u7684\u5404\u79cdT2I API\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDiffAgent\u4e0d\u4ec5\u5728\u9009\u62e9\u9002\u5f53\u7684T2I API\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u9a8c\u8bc1\u4e86SFTA\u8bad\u7ec3\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53ef\u5728https://github.com/OpenGVLab/DiffAgent\u83b7\u53d6\u3002**|\n", "2404.00573": "|**2024-03-31**|**\"My agent understands me better\": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based 
Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|In this study, we propose a novel human-like memory architecture designed to enhance the cognitive abilities of conversational agents based on large language models. Our design enables these agents to autonomously retrieve the memories needed to generate responses, thereby addressing the limitations of LLMs in temporal cognition. We draw on the human mechanism of cue-based memory recall as a trigger to achieve accurate and efficient recall. In addition, we develop a mathematical model that dynamically quantifies memory consolidation, taking into account factors such as contextual relevance, elapsed time, and recall frequency. The agent stores memories drawn from the user's interaction history in a database, where each memory encapsulates both its content and its temporal context. In this way, by mirroring how humans recognize and recall past experiences, the system can store memories strategically and understand their significance to the user along a timeline.|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|This paper explores how large language models (LLMs) can be used to translate problem definitions for cognitive systems. Normally, a human must convert a problem description into a form that a cognitive system can understand. The researchers show that LLMs can take a class of problems defined in natural language and turn it into a semi-formal specification, so that existing reasoning and learning systems can solve specific instances of that class. They design an LLM-driven cognitive task analyst agent that generates a definition of the problem space from a natural-language description of a task. The LLM prompts draw on the notion of a problem space from the AI literature and on general problem-solving strategies (such as Polya's *How to Solve It*). The cognitive system then uses these problem-space specifications, together with domain-general problem-solving strategies (such as search), to solve different instances of the problem class. These preliminary results suggest that, by removing the intermediary step of problem formulation, LLMs could accelerate cognitive-systems research while preserving core capabilities such as robust reasoning and online learning.|\n", "2405.11403": "|**2024-05-18**|**MapCoder: Multi-Agent Code Generation for Competitive Problem Solving**|Md. Ashraful Islam et.al.|[2405.11403](http://arxiv.org/abs/2405.11403)|**[link](https://github.com/md-ashraful-pramanik/mapcoder)**|**This paper addresses code synthesis, a complex task that requires a deep understanding of intricate natural-language problem descriptions, the generation of code for complex algorithms and data structures, and the execution of comprehensive unit tests. Although large language models excel at natural language processing, their performance on code generation still has room for improvement. To this end, we propose MapCoder, a novel multi-agent prompting framework that mirrors the full program-synthesis process followed by human developers through four purpose-built LLM agents: recalling relevant examples, planning, code generation, and debugging. 
Through extensive experiments on eight challenging competitive problem-solving and program-synthesis benchmarks, including HumanEval (93.9%), MBPP (83.1%), APPS (22.0%), CodeContests (28.5%), and xCodeEval (45.3%), MapCoder demonstrates strong code-generation ability and achieves several new state-of-the-art results. Moreover, our method consistently delivers superior performance across programming languages and problem difficulties. We have open-sourced the framework for researchers at https://github.com/Md-Ashraful-Pramanik/MapCoder.**|\n", "2405.14751": "|**2024-05-23**|**AGILE: A Novel Framework of LLM Agents**|Peiyuan Feng et.al.|[2405.14751](http://arxiv.org/abs/2405.14751)|**[link](https://github.com/bytarnish/agile)**|We propose AGILE (an agent that interacts with users and learns from the environment), a novel framework of LLM (large language model) agents designed to perform complex conversational tasks using LLMs, memory, tools, and interactions with experts. Beyond conversation, the agent is capable of reflection, tool use, and consulting experts. We formulate the construction of such an LLM agent as a reinforcement learning problem in which the LLM serves as the policy model, and we fine-tune the LLM using labeled action data and the PPO algorithm. We focus on question answering and release ProductQA, a dataset of challenging questions from online shopping. Our extensive experiments on ProductQA and MedMCQA show that AGILE agents built on 13B- and 7B-parameter LLMs can outperform GPT-4 agents. Our ablation study highlights the importance of memory, tools, consultation, reflection, and reinforcement learning for achieving strong performance.|\n", "2405.14744": "|**2024-05-23**|**Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View**|Xuan Liu et.al.|[2405.14744](http://arxiv.org/abs/2405.14744)|null|Because large language models (LLMs) reflect human biases present in their training data, they can exhibit hallucinations. This raises a key question: can LLMs use hallucination to mimic human cognitive biases and thereby display an irrational yet social side? This paper explores that question by combining practical social-science experiments with theoretical insights to propose CogMir, an open-ended multi-LLM framework that leverages the hallucination properties of LLMs to assess and enhance their social intelligence, particularly with respect to cognitive biases. Our experimental results on CogMir subsets show that, under uncertain conditions, LLMs and humans are highly consistent in irrational and prosocial decision-making, which points to the prosociality of LLMs as social entities and highlights the key role of hallucination. Furthermore, the CogMir framework shows potential as a valuable platform for studying the social intelligence of LLMs.|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547](http://arxiv.org/abs/2405.13547)|null|Autonomous driving is a complex task that requires advanced decision-making and control algorithms. Understanding the rationale behind an autonomous vehicle's decisions is crucial to ensuring its safe and effective operation in highway driving. This study presents HighwayLLM, a novel approach that harnesses the reasoning capabilities of large language models (LLMs) to predict the future navigation waypoints of the ego vehicle. The approach also uses a pre-trained reinforcement learning (RL) model as a high-level planner that chooses appropriate meta-level actions. HighwayLLM combines the output of the RL model with current state information to produce safe, collision-free, and explainable predictions of the next states, thereby constructing a trajectory for the vehicle. A PID-based controller then guides the vehicle to follow the waypoints predicted by the LLM agent. This integration of the LLM with RL and PID enhances the decision-making process and provides interpretability for highway autonomous driving.|\n", "2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050](http://arxiv.org/abs/2405.13050)|null|LLM-in-the-loop applications have shown the potential to understand user commands effectively, formulate plans, and operate external tools/systems accordingly. However, the scope of LLM agents is limited to passively responding to users, who must articulate their needs in terms of the underlying tools/systems. We note that the potential of the LLM-Agent User Interface (LAUI) is far from fully realized. In an ideal LAUI, users can interact with tools/systems without in-depth knowledge of them and explore emergent workflows. Unlike a fixed, explorable GUI designed to teach users a preset way of using the system, the LLM agent in a LAUI is proficient with the system from the start, proactively learns about the user and their needs, and proposes new interaction schemes to the user. To illustrate the LAUI concept, we present a concrete example: Flute X 
GPT, which combines an LLM agent, a prompt manager, and a multimedia software-hardware system for flute tutoring that supports a complex real-time experience, with the aim of making it easier to learn to play the flute.|\n", "2405.13009": "|**2024-05-13**|**METAREFLECTION: Learning Instructions for Language Agents using Past Reflections**|Priyanshu Gupta et.al.|[2405.13009](http://arxiv.org/abs/2405.13009)|null|Despite the popularity of large language models (LLMs), crafting precise prompts for them to perform specific tasks remains a challenge. Users often need multi-turn conversations with LLM-based agents to achieve their goals. Recent studies show that feedback from the model itself, i.e., self-reflection, can act as reinforcement during a conversation and help reach the desired outcome faster. Motivated by this, we propose METAREFLECTION, a novel method that learns general, domain-specific prompt instructions from the individual self-reflections collected during a training phase. We evaluate it in the domains of Infrastructure as Code (IAC) vulnerability detection and question answering (QA), using REACT and COT. The results show that METAREFLECTION significantly outperforms GPT-4, with performance gains of 16.82% (IAC), 31.33% (COT), and 15.42% (REACT), suggesting that METAREFLECTION has the potential to improve the efficiency of LLMs and is a strategy worth exploring.|\n", "2405.15414": "|**2024-05-24**|**Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification**|Yuxuan Guo et.al.|[2405.15414](http://arxiv.org/abs/2405.15414)|null|Building open-ended agents has long been an ultimate goal of AI research, and creative agents are especially appealing. Existing large language models (LLMs) perform well on long-horizon tasks with well-defined goals (e.g., 'mining diamonds' in Minecraft). However, they struggle with creative tasks that have open-ended goals and abstract criteria, because they cannot bridge the gap between the two and therefore lack the feedback needed to improve themselves when solving the task. To address this, our work introduces autonomous embodied verification to fill this gap and lay the groundwork for creative tasks. Specifically, we propose Luban, an agent focused on creative building tasks in Minecraft, equipped with two levels of autonomous embodied verification inspired by human design practice: (1) visual verification of 3D structural speculations, carried out through CAD modeling programs that the agent generates itself; and (2) pragmatic verification, which generates and verifies environment-relevant functional programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines by 33% to 100% in visualization and pragmatism. Furthermore, demonstrations implemented on a real-world robotic arm show Luban's creative potential in the physical world.|\n", "2405.15145": "|**2024-05-24**|**CulturePark: Boosting Cross-cultural Understanding in Large Language Models**|Cheng Li et.al.|[2405.15145](http://arxiv.org/abs/2405.15145)|null|Large language models (LLMs) commonly exhibit cultural bias, mainly because they lack data representative of different cultures. Traditional cultural datasets and benchmarks are usually built by extracting subsets of existing datasets or aggregating information from Wikipedia and social media, but this approach relies on real-world data and human annotation, which makes it costly and hard to scale. Drawing on cognitive theories of social communication, this paper proposes CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication by having LLM-based agents role-play people from different cultures, generating high-quality cross-cultural dialogues that capture human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples and fine-tuned models for eight specific cultures. On three downstream tasks, namely content moderation, cultural alignment (measured with Hofstede's cultural dimensions), and cultural education, these models outperform GPT-4. Specifically, our GPT-3.5-based models match or exceed GPT-4 on content moderation; on cultural alignment, our models surpass GPT-4 under Hofstede's VSM 13 framework; and for the cultural education of human participants, our models also perform well in learning effectiveness and user experience. CulturePark is important for reducing cultural bias and for democratizing AI, and it underscores the critical role of culturally inclusive data in model training.|\n", "2405.14918": "|**2024-05-23**|**AnalogCoder: Analog Circuit Design via Training-Free Code Generation**|Yao Lai et.al.|[2405.14918](http://arxiv.org/abs/2405.14918)|**[link](https://github.com/laiyao1/AnalogCoder)**|
Analog circuit design is a critical task in modern chip technology, involving the selection of components, their connections, and parameter settings to ensure proper circuit functionality. Although large language models (LLMs) have made progress in digital circuit design, the complexity of analog circuits and the scarcity of data pose challenges. To address this, we introduce AnalogCoder, the first training-free LLM agent for designing analog circuits through Python code generation. First, AnalogCoder uses a feedback-enhanced flow with tailored domain-specific prompts, enabling automated, self-correcting design of analog circuits with a high success rate. Second, it proposes a circuit tool library that stores successful designs as reusable, modular sub-circuits, simplifying the creation of composite circuits. Experimental results on a benchmark covering a wide range of analog circuit tasks show that AnalogCoder outperforms other LLM-based methods, successfully designing 20 circuits, 5 more than standard GPT-4o. We believe AnalogCoder can significantly improve the efficiency of the chip design process and enable non-experts to design analog circuits efficiently. The code and benchmark are available at [https://github.com/anonyanalog/AnalogCoder](https://github.com/anonyanalog/AnalogCoder).|\n", "2405.17424": "|**2024-05-27**|**LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because they must interact with the real world, embodied agents need rich prior knowledge, long-horizon planning ability, and fast response times. Although recent large language models (LLMs) perform impressively, they still have limitations; for example, LLM outputs are usually descriptive sentences, which can be ambiguous when deciding on specific actions. To overcome these issues, we introduce the Large Auto-Regressive Model (LARM). LARM takes text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format, the auto-regressive node transmission structure, and assemble a corresponding dataset. Using a two-stage training strategy, LARM successfully harvests enchanted equipment in Minecraft, which requires a considerably more complex decision-making chain than the highest achievements of the previous best methods. In addition, LARM is 6.8 times faster than the fastest existing method.|\n", "2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510](http://arxiv.org/abs/2405.16510)|null|The rapid advancement of neural language models has sparked a new wave of research on intelligent agents. Large language models (LLMs) have attracted wide attention as a promising route toward artificial general intelligence (AGI) because of their remarkable reasoning and generalization capabilities. In real-world tasks, effective planning is crucial to the success of LLM agents. However, devising feasible or optimal fine-grained sequences of operations for complex tasks, especially sequences that must combine many heterogeneous actions, remains a challenge. This paper presents Meta-Task Planning (MTP), a zero-shot method for collaborative LLM-based multi-agent systems that simplifies task planning by decomposing complex tasks into subtasks, called meta-tasks. Each meta-task is then mapped to executable actions. MTP was evaluated on two rigorous benchmarks, TravelPlanner and API-Bank. The results show that MTP achieves an average success rate of about 40% on TravelPlanner, far above the current best baseline (2.92%), and outperforms LLM_{api}-4 with ReAct on API-Bank by about 14%, demonstrating the strong potential of combining LLMs with multi-agent systems.|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li 
et.al.|[2405.16376](http://arxiv.org/abs/2405.16376)|**[link](https://github.com/cyrilli/stride)**|**Large language models (LLMs) such as GPT-4 have revolutionized natural language processing, demonstrating remarkable linguistic proficiency and reasoning skills. However, in strategic multi-agent decision-making environments they face limitations, including poor mathematical reasoning, difficulty following instructions, and a tendency to generate incorrect information. These shortcomings hinder their performance in interactive tasks that require adhering to complex game rules, planning over the long term, exploring unknown environments, and anticipating opponents' moves. To address this, this paper proposes a novel LLM agent framework equipped with memory and specialized tools to improve strategic decision-making. We apply these tools in economically important settings such as bilateral bargaining and multi-agent dynamic mechanism design, and use quantitative metrics to evaluate performance across a range of strategic decision-making problems. Our results show that the enhanced framework significantly improves the strategic decision-making capabilities of LLMs. Although current models have inherent limitations, we show that targeted enhancements can improve them, pointing to a promising direction for future applications of LLMs in interactive environments.**|\n", "2405.16334": "|**2024-05-29**|**Devil's Advocate: Anticipatory Reflection for LLM Agents**|Haoyu Wang et.al.|[2405.16334](http://arxiv.org/abs/2405.16334)|null|In this work, we introduce a novel approach that equips language model (LLM) agents with the ability to reflect on themselves, improving their consistency and adaptability when solving complex tasks. Our approach prompts an LLM agent to decompose a given task into manageable subtasks (i.e., to make a plan) and to reflect continuously: before executing an action, on possible failures and their remedies; after execution, on alignment with the subtask objectives, backtracking where necessary to ensure the plan is fully carried out; and, once the plan is complete, through a comprehensive review that informs future strategy refinement. Applying this methodology zero-shot to practical web-environment tasks in WebArena, our agent outperforms existing zero-shot methods. The experimental results show that this reflection-based strategy not only improves the agent's ability to navigate unforeseen challenges through a robust plan-execution mechanism but also improves efficiency by reducing the number of trials and plan revisions needed to complete a task.|\n", "2405.16247": "|**2024-05-25**|**AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning**|Minghao Chen et.al.|[2405.16247](http://arxiv.org/abs/2405.16247)|null|Large language models (LLMs) have shown promise in performing tasks across various domains, such as robotics, games, and web navigation. However, these models usually require carefully designed, expert-level prompts to adapt to domain-specific tasks, which limits their adaptability. To address this, we propose AutoManual, a framework that enables LLMs to autonomously build an understanding of new environments through interaction and adapt to them. AutoManual organizes environmental knowledge into diverse rules and optimizes them online with two agents: 1) a Planner that drafts actionable plans based on the current rules; and 2) a Builder that updates the rules through a well-structured rule system, which facilitates online rule management and preserves crucial details. To mitigate hallucinations when managing rules, we introduce a 'case-conditioned prompting' strategy for the Builder. Finally, a compiler agent consolidates these rules into a comprehensive manual. This self-generated manual not only improves adaptability but also guides the planning of smaller LLMs while remaining human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, reaching 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo. The source code will be released soon.|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208](http://arxiv.org/abs/2405.18208)|null|Recent studies have shown that large language models demonstrate some capability on simple tasks such as writing and coding. However, they still struggle with tasks that require comprehensive planning, which remains an important research problem for current models. This study focuses on travel planning, a complex multi-phase problem that includes outlining, information gathering, and planning, and that is often accompanied by various constraints and uncertainties. Existing reasoning methods perform poorly on such problems. Our goal is to improve the capabilities of large language models by developing a human-like planning framework that guides them to mimic the steps humans take to solve multi-phase problems. \u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5b9e\u65bd\u7b56\u7565\uff0c\u8ba9\u6a21\u578b\u80fd\u4e3a\u6bcf\u4e2a\u65c5\u884c\u67e5\u8be2\u7
51f\u6210\u8fde\u8d2f\u7684\u63d0\u7eb2\uff0c\u6a21\u62df\u4eba\u7c7b\u7684\u89c4\u5212\u6a21\u5f0f\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u7b56\u7565\u5757\u548c\u77e5\u8bc6\u5757\u5230\u6846\u67b6\u4e2d\uff1a\u7b56\u7565\u5757\u5e2e\u52a9\u4fe1\u606f\u641c\u96c6\uff0c\u800c\u77e5\u8bc6\u5757\u63d0\u4f9b\u8be6\u7ec6\u89c4\u5212\u6240\u9700\u7684\u5fc5\u8981\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u5168\u9762\u5c55\u793a\u4e86\u6211\u4eec\u6846\u67b6\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u89c4\u5212\u80fd\u529b\u7684\u663e\u8457\u63d0\u5347\uff0c\u4f7f\u5176\u5728\u5904\u7406\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u65f6\u6548\u7387\u548c\u6548\u679c\u90fd\u6709\u6240\u63d0\u9ad8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u4e0eGPT-4-Turbo\u7ed3\u5408\u65f6\uff0c\u6211\u4eec\u7684\u6846\u67b6\u76f8\u8f83\u4e8e\u57fa\u7840\u6846\u67b6\u5728GPT-4-Turbo\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u4e8610\u500d\u3002|\n", "2405.18113": "|**2024-05-28**|**Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting**|Hongda Sun 
et.al.|[2405.18113](http://arxiv.org/abs/2405.18113)|null|\u968f\u7740\u5728\u7ebf\u62db\u8058\u670d\u52a1\u7684\u5174\u8d77\uff0c\u4f20\u7edf\u7684\u6c42\u804c\u548c\u62db\u8058\u65b9\u5f0f\u53d1\u751f\u4e86\u53d8\u9769\uff0c\u8feb\u5207\u9700\u8981\u5f00\u53d1\u9ad8\u8d28\u91cf\u7684\u5de5\u4e1a\u5e94\u7528\u6765\u63d0\u5347\u6c42\u804c\u8005\u4e0e\u804c\u4f4d\u7684\u5339\u914d\u5ea6\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u6f5c\u5728\u8bed\u4e49\u5efa\u6a21\uff0c\u5b66\u4e60\u4e24\u8005\u4e4b\u95f4\u7684\u5339\u914d\u51fd\u6570\u3002\u53d7\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89d2\u8272\u626e\u6f14\u65b9\u9762\u5f3a\u5927\u80fd\u529b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u5f15\u5165LLMs\u6a21\u62df\u9762\u8bd5\u73af\u8282\uff0c\u8ba9\u5176\u4e0e\u6c42\u804c\u8005\u8fdb\u884c\u5bf9\u8bdd\uff0c\u8fd9\u53ef\u4ee5\u4e3a\u5019\u9009\u4eba\u8bc4\u4f30\u63d0\u4f9b\u989d\u5916\u8bc1\u636e\uff0c\u4ece\u800c\u589e\u5f3a\u4ec5\u57fa\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u4e2a\u6027\u5316\u5339\u914d\u3002\u7136\u800c\uff0c\u5728\u7f51\u7edc\u62db\u8058\u4e2d\u7684\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u89d2\u8272\u5851\u9020\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5982\u63d0\u95ee\u6280\u5de7\u3001\u56de\u7b54\u6784\u5efa\u4ee5\u53ca\u53cc\u5411\u5339\u914d\u5ea6\u8bc4\u4f30\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMockLLM\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u6846\u67b6\uff0c\u5c06\u4eba\u804c\u5339\u914d\u8fc7\u7a0b\u5212\u5206\u4e3a\u4e24\u4e2a\u6a21\u5757\uff1a\u6a21\u62df\u9762\u8bd5\u751f\u6210\u548c\u63e1\u624b\u534f\u8bae\u4e2d\u7684\u53cc\u5411\u8bc4\u4f30\uff0c\u901a\u8fc7\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u4e4b\u95f4\u7684\u534f\u4f5c\u884c\u4e3a\u5171\u540c\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u89d2\u8272\u3001\u591a\u884c\u4e3a\u7684\u6846\u67b6\uff0c\u4f7f\u5355\u4e00\u7684LLM\u4ee3\u7406\u80fd\u6709\u6548\u5730\u626e\u6f14\u53cc\u65b9\u7684\u4e0d\u540c\u804c\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cd\u601d\u8bb0\u5fc6\u751f\u6210\u548c\u52a8\u6001\u63d0\u793a\u4fee\u6539\u6280\u672f\uff0c\u4ee5\u4f18\u5316\u53cc\u65b9\u7684\u884c\u4e3a\uff0c\u6301\u7eed\u4f18\u5316\u9644\u52a0\u7684\u8bc4\u4f30\u8bc1\u636e\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMockLLM\u5728\u4eba\u804c\u5339\u914d\u4e0a\u7684\u8868\u73b0\u6700\u4f18\uff0c\u4e14\u6a21\u62df\u9762\u8bd5\u8d28\u91cf\u9ad8\uff0c\u9884\u793a\u7740\u5b83\u5728\u672a\u6765\u5728\u7ebf\u62db\u8058\u4e2d\u7684\u5b9e\u9645\u5e94\u7528\u524d\u666f\u5e7f\u9614\u3002|\n", "2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia 
et.al.|[2405.18092](http://arxiv.org/abs/2405.18092)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e94\u7528\u4e8e\u6570\u5b57\u5b6a\u751f\u8fc7\u7a0b\u6a21\u62df\u7684\u53c2\u6570\u81ea\u52a8\u5316\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5305\u542b\u89c2\u5bdf\u3001\u63a8\u7406\u3001\u51b3\u7b56\u548c\u603b\u7ed3\u56db\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\u3002\u901a\u8fc7\u5b9e\u73b0LLM\u4ee3\u7406\u4e0e\u6a21\u62df\u6a21\u578b\u7684\u52a8\u6001\u4ea4\u4e92\uff0c\u8be5\u7cfb\u7edf\u53ef\u4ee5\u81ea\u52a8\u63a2\u7d22\u53c2\u6570\u8bbe\u7f6e\uff0c\u5229\u7528\u542f\u53d1\u5f0f\u63a8\u7406\u786e\u5b9a\u4e00\u7ec4\u63a7\u5236\u6a21\u62df\u4ee5\u8fbe\u6210\u76ee\u6807\u7684\u53c2\u6570\u3002\u8fd9\u79cd\u65b9\u6cd5\u901a\u8fc7\u6ce8\u5165LLM\u7684\u542f\u53d1\u5f0f\uff0c\u589e\u5f3a\u6a21\u62df\u6a21\u578b\uff0c\u5e76\u652f\u6301\u81ea\u4e3b\u641c\u7d22\u4ee5\u89e3\u51b3\u7528\u6237\u4efb\u52a1\uff0c\u6709\u671b\u63d0\u9ad8\u7528\u6237\u4f53\u9a8c\u5e76\u51cf\u8f7b\u4eba\u7c7b\u7528\u6237\u5728\u590d\u6742\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u8ba4\u77e5\u8d1f\u62c5\u3002\u7814\u7a76\u901a\u8fc7\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u7cfb\u7edf\u7684\u6709\u6548\u6027\u4e0e\u529f\u80fd\uff0c\u5e76\u5728GitHub\u4ed3\u5e93\u63d0\u4f9b\u4e86\u53ef\u89c6\u5316\u7684\u6f14\u793a\u3002|\n", "2405.17837": "|**2024-05-28**|**Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces**|Qiuyu Lu 
et.al.|[2405.17837](http://arxiv.org/abs/2405.17837)|null|\u5728\u4eba\u673a\u4ea4\u4e92\uff08HCI\uff09\u9886\u57df\uff0c\u4ea4\u4e92\u8bbe\u5907\u7684\u8bbe\u8ba1\u5f00\u53d1\u662f\u5173\u952e\u5173\u6ce8\u70b9\u3002\u968f\u7740\u65b0\u578b\u786c\u4ef6\u548c\u5148\u8fdb\u5236\u9020\u6280\u672f\u7684\u5174\u8d77\uff0c\u5bf9\u80fd\u591f\u7b80\u5316\u539f\u578b\u5236\u4f5c\u8fc7\u7a0b\u7684\u4e13\u95e8\u8bbe\u8ba1\u5de5\u5177\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5de5\u5177\u867d\u7136\u901a\u8fc7\u53c2\u6570\u5316\u8bbe\u8ba1\u548c\u6a21\u62df\u7b80\u5316\u6d41\u7a0b\uff0c\u4f46\u5b66\u4e60\u66f2\u7ebf\u8f83\u9661\uff0c\u4e14\u5728\u6fc0\u53d1\u521b\u65b0\u601d\u7ef4\u65b9\u9762\u6709\u6240\u6b20\u7f3a\u3002\u672c\u7814\u7a76\u4ee5\u6d41\u4f53\u8ba1\u7b97\u754c\u9762\u4e3a\u4f8b\uff0c\u63a2\u8ba8\u5982\u4f55\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u589e\u5f3a\u7269\u7406\u8bbe\u5907\u8bbe\u8ba1\u5de5\u5177\uff0c\u521b\u5efa\u4e00\u4e2a\u751f\u6210\u8bbe\u8ba1\u5de5\u5177\uff08GDT\uff09\u3002\u501f\u52a9LLM\uff0cGDT\u80fd\u591f\u7406\u89e3\u65b0\u8bbe\u5907\u7684\u7279\u6027\u548c\u5c40\u9650\uff0c\u63d0\u51fa\u591a\u6837\u3001\u5bcc\u6709\u6d1e\u5bdf\u529b\u4e14\u5b9e\u7528\u7684\u5e94\u7528\u573a\u666f\uff0c\u63a8\u8350\u6280\u672f\u548c\u60c5\u5883\u9002\u5b9c\u7684\u8bbe\u5907\u8bbe\u8ba1\uff0c\u5e76\u81ea\u52a8\u751f\u6210\u8bbe\u8ba1\u53c2\u6570\uff0c\u4ee5\u4fbf\u4f20\u7edf\u8bbe\u8ba1\u5de5\u5177\u5c55\u793a\u7ed3\u679c\u5e76\u751f\u6210\u52a0\u5de5\u6240\u9700\u7684\u6587\u4ef6\u3002\u672c\u6587\u9610\u8ff0\u4e86GDT\u7684\u6846\u67b6\u3001\u5b9e\u73b0\u548c\u6027\u80fd\uff0c\u5e76\u53cd\u601d\u5176\u524d\u666f\u53ca\u9047\u5230\u7684\u6311\u6218\u3002|\n", "2405.20267": "|**2024-05-30**|**Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions**|Ruochen Zhao 
et.al.|[2405.20267](http://arxiv.org/abs/2405.20267)|**[link](https://github.com/Auto-Arena/Auto-Arena-LLMs)**|\u968f\u7740\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65e5\u65b0\u6708\u5f02\uff0c\u8feb\u5207\u9700\u8981\u4e00\u79cd\u53ef\u9760\u4e14\u53ca\u65f6\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u9274\u4e8e\u9759\u6001\u57fa\u51c6\u6613\u53d7\u6c61\u67d3\uff0c\u7528\u6237\u5f80\u5f80\u4f9d\u8d56\u4e8e\u50cfChatbot Arena\u8fd9\u6837\u7684\u4eba\u7c7b\u6295\u7968\u5e73\u53f0\u3002\u7136\u800c\uff0c\u4eba\u5de5\u6807\u6ce8\u9700\u8981\u5927\u91cf\u4eba\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u65b0\u6027\u5730\u63d0\u51faAuto-Arena\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u52a8\u5316\u5168\u6d41\u7a0b\u7684LLM\u8bc4\u4f30\u6846\u67b6\u3002\u9996\u5148\uff0c\u7531\u8003\u5b98LLM\u8bbe\u8ba1\u95ee\u9898\uff1b\u63a5\u7740\uff0c\u5019\u9009LLMs\u56f4\u7ed5\u95ee\u9898\u8fdb\u884c\u591a\u8f6e\u76f8\u4e92\u5bf9\u51b3\uff0c\u66b4\u9732\u51fa\u5b83\u4eec\u7684\u771f\u5b9e\u6027\u80fd\u5dee\u8ddd\uff1b\u6700\u540e\uff0c\u7531LLM\u88c1\u5224\u96c6\u4f53\u8ba8\u8bba\u5e76\u51b3\u5b9a\u80dc\u8005\uff0c\u4ece\u800c\u51cf\u5c11\u504f\u89c1\uff0c\u63d0\u5347\u516c\u5e73\u6027\u3002\u6211\u4eec\u5728\u6700\u65b017\u6b3eLLMs\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0cAuto-Arena\u4e0e\u4eba\u7c7b\u504f\u597d\u5177\u6709\u6700\u9ad8\u7684\u76f8\u5173\u6027\uff0c\u4e3a\u66ff\u4ee3\u4eba\u7c7b\u8bc4\u4ef7\u5e73\u53f0\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang 
et.al.|[2405.20189](http://arxiv.org/abs/2405.20189)|null|\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u9610\u8ff0\u4e86\u4e3aNadine\u793e\u4ea4\u673a\u5668\u4eba\u5e73\u53f0\u5f00\u53d1\u667a\u80fd\u548c\u5065\u58ee\u7684\u793e\u4ea4\u673a\u5668\u4eba\u7cfb\u7edf\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u901a\u8fc7\u96c6\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5de7\u5999\u5730\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u7684\u5f3a\u5927\u63a8\u7406\u548c\u6307\u4ee4\u6267\u884c\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u63a5\u8fd1\u4eba\u7c7b\u7684\u611f\u6027\u4e0e\u8ba4\u77e5\u80fd\u529b\u3002\u8fd9\u4e0e\u5f53\u524d\u57fa\u4e8eLLM\u7684\u667a\u80fd\u4f53\u76f8\u6bd4\u662f\u521b\u65b0\u7684\uff0c\u56e0\u4e3a\u5b83\u4eec\u901a\u5e38\u4e0d\u5177\u5907\u4eba\u7c7b\u5f0f\u7684\u957f\u671f\u8bb0\u5fc6\u6216\u590d\u6742\u7684\u60c5\u611f\u8bc4\u4f30\u529f\u80fd\u3002\u793e\u4ea4\u673a\u5668\u4eba\u7684\u81ea\u7136\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u7cfb\u7edf\u5404\u7ec4\u4ef6\u7684\u6027\u80fd\u548c\u534f\u540c\u5de5\u4f5c\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7cfb\u7edf\uff0c\u80fd\u591f\u901a\u8fc7\u591a\u6a21\u6001\u8f93\u5165\u5904\u7406\u751f\u6210\u6070\u5f53\u7684\u884c\u4e3a\uff0c\u6839\u636e\u8bc6\u522b\u5230\u7684\u7528\u6237\u5f15\u5165\u76f8\u5173\u7684\u60c5\u666f\u8bb0\u5fc6\uff0c\u5e76\u6a21\u62df\u673a\u5668\u4eba\u5728\u4e0e\u4eba\u7c7b\u4f19\u4f34\u4e92\u52a8\u8fc7\u7a0b\u4e2d\u4ea7\u751f\u7684\u60c5\u7eea\u72b6\u6001\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u9488\u5bf9\u793e\u4ea4\u673a\u5668\u4eba\u7684LLM-agent\u6846\u67b6\uff0cSoR-ReAct\uff0c\u4f5c\u4e3a\u6211\u4eec\u7cfb\u7edf\u4e2d\u4ea4\u4e92\u6a21\u5757\u7684\u6838\u5fc3\u7ec4\u4ef6\u3002\u8fd9\u4e00\u8bbe\u8ba1\u63a8\u52a8\u4e86\u793e\u4ea4\u673a\u5668\u4eba\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u63d0\u5347\u4eba\u673a\u4ea4\u4e92\u7684\u8d28\u91cf\u3002|\n", "2405.19425": "|**2024-05-29**|**Adaptive 
In-conversation Team Building for Language Model Agents**|Linxin Song et.al.|[2405.19425](http://arxiv.org/abs/2405.19425)|null|\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\uff0c\u5229\u7528\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u524d\u666f\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u7279\u5b9a\u5e94\u7528\u8bbe\u8ba1\u6709\u6548\u7684\u591a\u4ee3\u7406\u56e2\u961f\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u52a8\u6001\u56e2\u961f\u6784\u5efa\u8303\u5f0f\uff0c\u540d\u4e3a\u201cCaptain Agent\u201d\u3002\u5b83\u901a\u8fc7\u521b\u65b0\u7684Agent\u8bbe\u8ba1\uff0c\u80fd\u591f\u81ea\u9002\u5e94\u5730\u4e3a\u6bcf\u4e2a\u95ee\u9898\u89e3\u51b3\u6b65\u9aa4\u7ec4\u5efa\u548c\u7ba1\u7406\u56e2\u961f\uff0c\u5229\u7528\u5d4c\u5957\u7fa4\u804a\u548c\u53cd\u601d\u673a\u5236\u786e\u4fdd\u591a\u5143\u5316\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u9632\u6b62\u523b\u677f\u8f93\u51fa\u3002\u8fd9\u79cd\u65b9\u6cd5\u63d0\u4f9b\u4e86\u7075\u6d3b\u4f46\u7ed3\u6784\u5316\u7684\u89e3\u51b3\u95ee\u9898\u65b9\u5f0f\uff0c\u6709\u52a9\u4e8e\u51cf\u5c11\u5197\u4f59\uff0c\u589e\u5f3a\u8f93\u51fa\u591a\u6837\u6027\u3002\u5728\u516d\u4e2a\u5b9e\u9645\u573a\u666f\u4e2d\u7684\u5168\u9762\u8bc4\u4f30\u663e\u793a\uff0cCaptain Agent\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u591a\u4ee3\u7406\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e8621.94%\uff0c\u5e76\u4e14\u65e0\u9700\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u7e41\u7410\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.01422": "|**2024-06-03**|**How to Understand Whole Software Repository?**|Yingwei Ma et.al.|[2406.01422](http://arxiv.org/abs/2406.01422)|null|
\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u5728\u81ea\u52a8\u8f6f\u4ef6\u5de5\u7a0b\uff08ASE\uff09\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u5c3d\u7ba1\u73b0\u6709\u65b9\u6cd5\u5df2\u8bc1\u5b9e\u6709\u6548\uff0c\u4f46\u5b83\u4eec\u7684\u8bbe\u8ba1\u4e3b\u8981\u4fa7\u91cd\u4e8e\u4ee3\u7801\u7684\u5c40\u90e8\u4fe1\u606f\uff0c\u5982\u95ee\u9898\u3001\u7c7b\u548c\u51fd\u6570\uff0c\u8fd9\u9650\u5236\u4e86\u5bf9\u8f6f\u4ef6\u7cfb\u7edf\u5168\u5c40\u4e0a\u4e0b\u6587\u548c\u4f9d\u8d56\u5173\u7cfb\u7684\u7406\u89e3\u3002\u6839\u636e\u8f6f\u4ef6\u5f00\u53d1\u4eba\u5458\u7684\u5b9e\u9645\u7ecf\u9a8c\uff0c\u6211\u4eec\u8ba4\u4e3a\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u662f\u8fc8\u5411ASE\u7684\u5173\u952e\u3002\u7136\u800c\uff0c\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u5e26\u6765\u4e86\u8bf8\u591a\u6311\u6218\uff0c\u4f8b\u5982\uff1a\u957f\u4ee3\u7801\u8f93\u5165\u3001\u566a\u58f0\u4ee3\u7801\u4fe1\u606f\u3001\u590d\u6742\u4f9d\u8d56\u5173\u7cfb\u7b49\u3002 \u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u7814\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aRepoUnderstander\u7684\u65b0ASE\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5bfc\u4ee3\u7406\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u3002\u9996\u5148\uff0c\u6211\u4eec\u91c7\u7528\u81ea\u4e0a\u800c\u4e0b\u7684\u65b9\u5f0f\u5c06\u6574\u4e2a\u4ed3\u5e93\u7684\u5173\u952e\u4fe1\u606f\u538b\u7f29\u5230\u77e5\u8bc6\u56fe\u8c31\u4e2d\uff0c\u4ee5\u964d\u4f4e\u590d\u6742\u6027\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u79cd\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08Monte Carlo Tree Search, 
MCTS\uff09\u4e3a\u57fa\u7840\u7684\u4ed3\u5e93\u63a2\u7d22\u7b56\u7565\uff0c\u8d4b\u4e88\u4ee3\u7406\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u66f4\u597d\u5730\u5229\u7528\u4ed3\u5e93\u7ea7\u522b\u7684\u77e5\u8bc6\uff0c\u6211\u4eec\u6307\u5bfc\u4ee3\u7406\u8fdb\u884c\u603b\u7ed3\u3001\u5206\u6790\u548c\u89c4\u5212\uff0c\u7136\u540e\u4ed6\u4eec\u53ef\u4ee5\u5229\u7528\u5de5\u5177\u52a8\u6001\u83b7\u53d6\u4fe1\u606f\u5e76\u751f\u6210\u4fee\u590d\u5b9e\u9645GitHub\u95ee\u9898\u7684\u8865\u4e01\u3002 \u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cRepoUnderstander\u5177\u6709\u4f18\u8d8a\u6027\u548c\u6709\u6548\u6027\u3002\u5728SWE-bench Lite\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4e0eSWE-agent\u76f8\u6bd4\uff0c\u5b83\u5b9e\u73b0\u4e8618.5%\u7684\u76f8\u5bf9\u63d0\u5347\u3002|\n", "2406.01364": "|**2024-06-03**|**BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards**|Diego Dorn et.al.|[2406.01364](http://arxiv.org/abs/2406.01364)|null|\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u673a\u5236\u88ab\u7528\u4e8e\u68c0\u6d4b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7cfb\u7edf\u7684\u5f02\u5e38\u8f93\u51fa\u3002\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u5728\u5b9e\u65f6\u76d1\u63a7\u3001\u79bb\u7ebf\u8bc4\u4f30\u548c\u5185\u5bb9\u5ba1\u6838\u7b49\u5173\u952e\u5e94\u7528\u4e2d\u53d1\u6325\u6838\u5fc3\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u7edf\u4e00\u7684\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u5b83\u4eec\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5b89\u5168\u9632\u62a4\u57fa\u51c6\u201d\uff08Benchmarks for the Evaluation of LLM Safeguards\uff0c\u7b80\u79f0BELLS\uff09\uff0c\u5b83\u662f\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u6d4b\u8bd5\u96c6\u5408\uff0c\u5206\u4e3a\u4e09\u4e2a\u7c7b\u522b\uff1a(1) 
\u5efa\u7acb\u6027\u6545\u969c\u6d4b\u8bd5\uff0c\u57fa\u4e8e\u5df2\u5b58\u5728\u7684\u9488\u5bf9\u660e\u786e\u6545\u969c\u6a21\u5f0f\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u6bd4\u8f83\u5f53\u524d\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u7684\u6548\u80fd\uff1b(2) \u65b0\u5174\u6545\u969c\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8861\u91cf\u5bf9\u672a\u89c1\u8fc7\u7684\u6545\u969c\u6a21\u5f0f\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ee5\u4fc3\u8fdb\u66f4\u901a\u7528\u9632\u62a4\u673a\u5236\u7684\u53d1\u5c55\uff1b(3) \u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u9488\u5bf9\u66f4\u590d\u6742\u7684\u67b6\u6784\uff08\u5982LLM\u4ee3\u7406\u548c\u591a\u4ee3\u7406\u7cfb\u7edf\uff09\uff0c\u76ee\u6807\u662f\u63a8\u52a8\u9002\u7528\u4e8e\u672a\u6765\u5c1a\u672a\u5b58\u5728\u4e13\u95e8\u9632\u62a4\u7684\u5e94\u7528\u7684\u5b89\u5168\u9632\u62a4\u6280\u672f\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u5e76\u5206\u4eab\u4e86\u7b2c\u4e00\u4e2a\u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u4f7f\u7528MACHIAVELLI\u73af\u5883\uff0c\u5e76\u63d0\u4f9b\u4e86\u6570\u636e\u96c6\u7684\u4ea4\u4e92\u5f0f\u53ef\u89c6\u5316\u3002|\n", "2406.00936": "|**2024-06-03**|**A Survey of Useful LLM Evaluation**|Ji-Lun Peng 
et.al.|[2406.00936](http://arxiv.org/abs/2406.00936)|null|\u7531\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u7814\u7a76\u9886\u57df\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u5bf9\u5b83\u4eec\u7684\u80fd\u529b\u8bc4\u4f30\u65b9\u6cd5\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4ee5\u786e\u5b9a\u5176\u5408\u9002\u7684\u4efb\u52a1\u548c\u8d23\u4efb\u3002\u672c\u6587\u4e3b\u8981\u63a2\u8ba8\u5982\u4f55\u6709\u6548\u5730\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u5de5\u5177\uff0c\u5e76\u63d0\u51fa\u4e00\u4e2a\u4e24\u9636\u6bb5\u6846\u67b6\uff1a\u4ece\u201c\u6838\u5fc3\u80fd\u529b\u201d\u5230\u201c\u4ee3\u7406\u201d\u3002\u9996\u5148\uff0c\u6838\u5fc3\u80fd\u529b\u6307\u7684\u662f\u5927\u8bed\u8a00\u6a21\u578b\u751f\u6210\u9ad8\u8d28\u91cf\u6587\u672c\u6240\u5fc5\u9700\u7684\u7279\u6027\uff0c\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u80fd\u529b\u540e\uff0c\u5b83\u4eec\u80fd\u591f\u5904\u7406\u73b0\u5b9e\u4e16\u754c\u7684\u590d\u6742\u4efb\u52a1\uff0c\u626e\u6f14\u4ee3\u7406\u89d2\u8272\u3002\u5728\u201c\u6838\u5fc3\u80fd\u529b\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\u3001\u793e\u4f1a\u5f71\u54cd\u4ee5\u53ca\u9886\u57df\u77e5\u8bc6\u3002\u800c\u5728\u201c\u4ee3\u7406\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5177\u8eab\u884c\u52a8\u3001\u89c4\u5212\u548c\u5de5\u5177\u5b66\u4e60\u65b9\u9762\u7684\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f53\u524d\u5927\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u65b9\u6cd5\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2406.01637": "|**2024-06-02**|**Teams of LLM Agents can Exploit Zero-Day Vulnerabilities**|Richard Fang 
et.al.|[2406.01637](http://arxiv.org/abs/2406.01637)|null|\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\u7684\u590d\u6742\u6027\u4e0d\u65ad\u63d0\u9ad8\uff0c\u7814\u7a76\u8005\u53d1\u73b0\uff0c\u5f53\u63d0\u4f9b\u6f0f\u6d1e\u63cf\u8ff0\u548c\u7b80\u5355\u7684\u593a\u65d7\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5229\u7528\u5b9e\u9645\u5b58\u5728\u7684\u6f0f\u6d1e\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4e8b\u5148\u672a\u77e5\u7684\u96f6\u65e5\u6f0f\u6d1e\uff08\u5373\u653b\u51fb\u8005\u638c\u63e1\u800c\u5b89\u5168\u8f6f\u4ef6\u4f9b\u5e94\u5546\u8fd8\u672a\u4fee\u8865\u7684\u6f0f\u6d1e\uff09\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4ecd\u7136\u4e0d\u4f73\u3002\u672c\u6587\u5c55\u793a\u4e86\uff0c\u901a\u8fc7\u56e2\u961f\u5408\u4f5c\uff0c\u591a\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u653b\u51fb\u73b0\u5b9e\u4e16\u754c\u7684\u96f6\u65e5\u6f0f\u6d1e\u3002\u5355\u72ec\u7684\u4ee3\u7406\u5728\u63a2\u7d22\u4f17\u591a\u6f0f\u6d1e\u548c\u8fdb\u884c\u957f\u671f\u89c4\u5212\u65f6\u9762\u4e34\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HPTSA\u7cfb\u7edf\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u80fd\u8c03\u5ea6\u5b50\u4ee3\u7406\u7684\u8ba1\u5212\u4ee3\u7406\u3002\u8ba1\u5212\u4ee3\u7406\u8d1f\u8d23\u63a2\u7d22\u7cfb\u7edf\u5e76\u51b3\u5b9a\u4f7f\u7528\u54ea\u4e2a\u5b50\u4ee3\u7406\u6765\u5c1d\u8bd5\u4e0d\u540c\u7684\u6f0f\u6d1e\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u957f\u671f\u89c4\u5212\u7684\u95ee\u9898\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u5305\u542b15\u4e2a\u771f\u5b9e\u4e16\u754c\u6f0f\u6d1e\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u56e2\u961f\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u63d0\u9ad8\u4e864.5\u500d\u3002|\n", "2406.00583": "|**2024-06-02**|**CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems**|Yanlin Feng 
et.al.|[2406.00583](http://arxiv.org/abs/2406.00583)|**[link](https://github.com/megagonlabs/CMDBench)**|\u5728\u6570\u636e\u5e93\u548c\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff08Compound Artificial Intelligence Systems\uff0cCAS\uff09\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u4f5c\u4e3a\u4ee3\u7406\uff0c\u901a\u8fc7\u4e0e\u5de5\u5177\u548c\u6570\u636e\u68c0\u7d22\u5668\u4ea4\u4e92\u6765\u6267\u884c\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u7cfb\u7edf\u6709\u53ef\u80fd\u589e\u5f3a\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u4e2d\u6570\u636e\u5206\u6790\u5e08\u7684\u4e00\u822c\u5206\u6790\u6d41\u7a0b\uff0c\u4f46CAS\u9762\u4e34\u7740\u4e0e\u5206\u6790\u5e08\u76f8\u4f3c\u7684\u6570\u636e\u53d1\u73b0\u6311\u6218\uff1a\u7ec4\u7ec7\u5185\u90e8\u4e0d\u540c\u56e2\u961f\u548c\u90e8\u95e8\u521b\u5efa\u7684\u591a\u6a21\u6001\u6570\u636e\u6e90\u5b64\u7acb\uff0c\u8fd9\u4f7f\u5f97\u5bfb\u627e\u5b8c\u6210\u5f53\u524d\u4efb\u52a1\u6240\u9700\u5408\u9002\u6570\u636e\u6e90\u53d8\u5f97\u56f0\u96be\u3002\u73b0\u6709\u7684\u6570\u636e\u53d1\u73b0\u57fa\u51c6\u5e76\u672a\u5145\u5206\u6a21\u62df\u8fd9\u79cd\u591a\u6a21\u6001\u548c\u6570\u636e\u6e90\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0cCAS\u7684\u73b0\u6709\u57fa\u51c6\u4e3b\u8981\u5173\u6ce8\u7aef\u5230\u7aef\u4efb\u52a1\u6027\u80fd\u8bc4\u4f30\uff0c\u800c\u5ffd\u89c6\u4e86\u6570\u636e\u53d1\u73b0\u6027\u80fd\u3002 
\u4e3a\u4e86\u63a8\u52a8\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5bf9\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728CAS\u4e2d\u7684\u6570\u636e\u53d1\u73b0\u6027\u80fd\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CMDBench\uff0c\u4e00\u4e2a\u65e8\u5728\u6a21\u62df\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u6539\u7f16\u4e86\u5f00\u653e\u9886\u57df\u7684\u73b0\u6709\u6570\u636e\u96c6\u548c\u57fa\u51c6\uff0c\u5982\u95ee\u7b54\u3001\u590d\u6742\u63a8\u7406\u4ee5\u53ca\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u7ed3\u6784\u5316\u6570\u636e\uff0c\u6765\u8bc4\u4f30\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u7684\u6570\u636e\u53d1\u73b0\u4ee5\u53ca\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u6570\u636e\u68c0\u7d22\u5668\u8bbe\u8ba1\u5bf9\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u2014\u2014\u5e73\u5747\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u51c6\u786e\u7387\u4e0b\u964d\u4e8646%\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u9700\u8981\u5f00\u53d1\u4f18\u5316\u7b56\u7565\u6765\u786e\u5b9a\u5408\u9002\u7684LLM\u4ee3\u7406\u548c\u68c0\u7d22\u5668\uff0c\u4ee5\u63d0\u9ad8\u5728\u4f01\u4e1a\u6570\u636e\u4e0a\u9ad8\u6548\u6267\u884cCAS\u7684\u80fd\u529b\u3002 \u603b\u4e4b\uff0cCMDBench\u662f\u4e00\u4e2a\u65e8\u5728\u4fc3\u8fdb\u9488\u5bf9\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u8fdb\u884c\u7814\u7a76\u7684\u65b0\u5de5\u5177\uff0c\u5b83\u901a\u8fc7\u7efc\u5408\u8bc4\u4f30\u6570\u636e\u53d1\u73b0\u548c\u4efb\u52a1\u6267\u884c\u80fd\u529b\uff0c\u4e3a\u6539\u8fdb\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u4e2d\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6709\u4ef7\u503c\u7684\u6846\u67b6\u3002|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn 
et.al.|[2406.00244](http://arxiv.org/abs/2406.00244)|null|\u968f\u7740\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u666e\u904d\u9002\u7528\u6027\u63d0\u5347\uff0c\u4eba\u4eec\u5bf9\u5176\u7528\u4f5c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4ee3\u7406\u7684\u5174\u8da3\u65e5\u76ca\u589e\u957f\u3002\u5728\u8fd9\u4e9b\u60c5\u5883\u4e0b\uff0c\u6a21\u578b\u9700\u8981\u6839\u636e\u4e0e\u73af\u5883\u7684\u6709\u9650\u4ea4\u4e92\u5f62\u6210\u76ee\u6807\u5b9e\u73b0\u7b56\u7565\u7684\u4fe1\u5ff5\uff0c\u5e76\u5728\u6bcf\u4e00\u6b65\u51b3\u7b56\u4e2d\u5904\u7406\u4e0d\u786e\u5b9a\u6027\u3002\u672c\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\u8fdb\u884c\u7814\u7a76\uff0c\u901a\u8fc7\u63a7\u5236\u7684\u5e8f\u5217\u51b3\u7b56\u4efb\u52a1\u5b9e\u9a8c\u63a2\u8ba8LLMs\u5982\u4f55\u5f62\u6210\u548c\u8fd0\u7528\u8fd9\u4e9b\u4fe1\u5ff5\u3002 \u9996\u5148\uff0c\u6211\u4eec\u53d1\u73b0LLM\u6a21\u578b\u8fc7\u4e8e\u81ea\u4fe1\uff1a\u5b83\u4eec\u5728\u7f3a\u4e4f\u5145\u5206\u8bc1\u636e\u7684\u60c5\u51b5\u4e0b\u5c31\u5bf9\u884c\u52a8\u505a\u51fa\u5f3a\u70c8\u5224\u65ad\uff0c\u5bfc\u81f4\u63a2\u7d22\u884c\u4e3a\u4e0d\u8db3\u3002\u8fdb\u4e00\u6b65\u6df1\u5165\u5206\u6790\u63ed\u793a\uff0c\u8fd9\u79cd\u73b0\u8c61\u6e90\u4e8e\u4eceLLM\u91c7\u6837\u5f97\u5230\u7684\u52a8\u4f5c\u5206\u5e03\u71b5\u7684\u584c\u7f29\u3002\u63a5\u7740\uff0c\u6211\u4eec\u6307\u51fa\u73b0\u6709\u7684\u57fa\u4e8e\u4ee4\u724c\u7684\u91c7\u6837\u65b9\u6cd5\u672c\u8eab\u4e0d\u8db3\u4ee5\u4fc3\u4f7f\u6a21\u578b\u66f4\u5e7f\u6cdb\u63a2\u7d22\u3002 \u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u71b5\u6fc0\u6d3b\u5bfc\u5411\uff08Entropic Activation 
Steering\uff0cEAST\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9\u5728\u4e0a\u4e0b\u6587\u4e2d\u7684LLM\u4ee3\u7406\u7684\u6fc0\u6d3b\u5bfc\u5411\u65b9\u6cd5\u3002EAST\u8ba1\u7b97\u4e00\u4e2a\u4ee5\u71b5\u4e3a\u6743\u91cd\u7684\u8868\u793a\u7ec4\u5408\uff0c\u901a\u8fc7\u5728\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u5e72\u9884\u6a21\u578b\u7684\u6fc0\u6d3b\uff0c\u6765\u8c03\u6574\u6a21\u578b\u5bf9\u52a8\u4f5c\u7684\u4e0d\u786e\u5b9a\u6027\uff0c\u4ece\u800c\u4fc3\u8fdb\u63a2\u7d22\u884c\u4e3a\u7684\u51fa\u73b0\u3002\u6700\u540e\uff0cEAST\u6539\u53d8\u4e86LLM\u5728\u51b3\u7b56\u65f6\u8868\u8fbe\u7684\u4e3b\u89c2\u4e0d\u786e\u5b9a\u6027\uff0c\u4e3a\u7406\u89e3\u548c\u63a7\u5236\u6a21\u578b\u5bf9\u51b3\u7b56\u4e0d\u786e\u5b9a\u6027\u7684\u8868\u5f81\u63d0\u4f9b\u4e86\u9014\u5f84\u3002|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222](http://arxiv.org/abs/2406.00222)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u5df2\u7ecf\u8fc5\u901f\u6210\u4e3a\u6784\u5efa\u667a\u80fd\u5bf9\u8bdd\u52a9\u624b\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u8bf8\u5982\u6b67\u4e49\u5904\u7406\u7b49\u5bf9\u8bdd\u6280\u80fd\u4e0a\u4ecd\u6709\u6b20\u7f3a\uff1a\u5f53\u901a\u7528\u52a9\u624b\u9047\u5230\u6a21\u7cca\u60c5\u51b5\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u8fc7\u5ea6\u8c28\u614e\u6216\u731c\u6d4b\u7528\u6237\u7684\u771f\u6b63\u610f\u56fe\uff0c\u800c\u4e0d\u662f\u63d0\u95ee\u4ee5\u6c42\u6f84\u6e05\uff0c\u800c\u5728\u7279\u5b9a\u4efb\u52a1\u573a\u666f\u4e0b\uff0c\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6837\u672c\u5f80\u5f80\u6709\u9650\uff0c\u5f71\u54cd\u6a21\u578b\u5b66\u4e60\u6700\u4f18\u5bf9\u8bdd\u884c\u4e3a\u7b56\u7565\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d
0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAction-Based Contrastive Self-Training\uff08ACT\uff09\u7684\u8fd1\u4f3c\u5728\u7ebf\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u5b83\u57fa\u4e8eDirect Preference Optimization\uff08DPO\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u5728\u591a\u8f6e\u5bf9\u8bdd\u4e2d\u7684\u6837\u672c\u9ad8\u6548\u5bf9\u8bdd\u7b56\u7565\u5b66\u4e60\u3002 \u6211\u4eec\u5728\u4e09\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u5bf9\u8bdd\u4efb\u52a1\u4e2d\u9a8c\u8bc1\u4e86ACT\u7684\u6709\u6548\u6027\uff1a\u57fa\u4e8e\u8868\u683c\u7684\u95ee\u7b54\u3001\u673a\u5668\u9605\u8bfb\u7406\u89e3\uff0c\u4ee5\u53caAmbigSQL\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u6587\u672c\u5230SQL\u751f\u6210\u7684\u4fe1\u606f\u5bfb\u6c42\u8bf7\u6c42\u6b67\u4e49\u89e3\u51b3\u7684\u65b0\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u8bc4\u4f30LLMs\u80fd\u5426\u5728\u5bf9\u8bdd\u4e2d\u8bc6\u522b\u548c\u63a8\u7406\u6b67\u4e49\u6765\u8861\u91cf\u5176\u4f5c\u4e3a\u5bf9\u8bdd\u4ee3\u7406\u7684\u80fd\u529b\u3002ACT\u5728\u4e0e\u6807\u51c6\u76d1\u7763\u5fae\u8c03\u548cDPO\u65b9\u6cd5\u76f8\u6bd4\u65f6\uff0c\u663e\u793a\u51fa\u4e86\u663e\u8457\u7684\u5bf9\u8bdd\u5efa\u6a21\u6539\u8fdb\u3002|\n", "2406.00215": "|**2024-05-31**|**Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent**|Jie JW Wu 
et.al.|[2406.00215](http://arxiv.org/abs/2406.00215)|**[link](https://github.com/jie-jw-wu/human-eval-comm)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u663e\u8457\u63d0\u5347\uff0c\u4f46\u4ecd\u4e0e\u9876\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u7684\u6c34\u5e73\u5b58\u5728\u5dee\u8ddd\u3002\u9274\u4e8e\u9876\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u5e38\u901a\u8fc7\u63d0\u95ee\u6765\u6d88\u9664\u9700\u6c42\u548c\u7f16\u7801\u89e3\u51b3\u65b9\u6848\u4e2d\u7684\u6a21\u7cca\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u5bf9\u4e8eLLMs\u8fdb\u884c\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u65f6\u4e5f\u5e94\u5177\u5907\u7c7b\u4f3c\u7684\u6c9f\u901a\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5b9e\u8bc1\u7814\u7a76\uff0c\u5173\u6ce8LLMs\u7684\u6c9f\u901a\u6280\u80fd\uff0c\u5373\u201c\u5728\u4ee3\u7801\u751f\u6210\u95ee\u9898\u63cf\u8ff0\u5b58\u5728\u95ee\u9898\u65f6\u80fd\u63d0\u51fa\u6f84\u6e05\u95ee\u9898\u201d\u3002 \u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u540d\u4e3aHumanEvalComm\uff0c\u901a\u8fc7\u4fee\u6539\u95ee\u9898\u63cf\u8ff0\uff0c\u5f15\u5165\u4e86\u4e0d\u4e00\u81f4\u6027\u3001\u6a21\u7cca\u6027\u548c\u4e0d\u5b8c\u6574\u6027\u4e09\u4e2a\u95ee\u9898\u7ef4\u5ea6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5982\u901a\u4fe1\u7387\u548c\u826f\u597d\u95ee\u9898\u7387\uff0c\u5e76\u5728HumanEvalComm\u4e0a\u5bf9\u4e0d\u540c\u7c7b\u578b\u7684Code LLM\uff08\u4ee3\u7801\u8bed\u8a00\u6a21\u578b\uff09\u4ee5\u53ca\u4e00\u79cd\u65b0\u578bLLM\u4ee3\u7406\u65b9\u6cd5\uff08Okanagan\uff09\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u8be5\u65b9\u6cd5\u65e8\u5728\u4ece\u4ee3\u7801\u548c\u63cf\u8ff0\u4e2d\u8bc6\u522b\u5e76\u63d0\u95ee\uff0c\u4ee5\u8fdb\u4e00\u6b65\u4f18\u5316\u751f\u6210\u7684\u4ee3\u7801\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u6bd4\u8f83Code 
LLMs\u548cOkanagan\u7684\u8868\u73b0\uff0c\u8ba8\u8bba\u4e86\u5b9e\u9a8c\u7ed3\u679c\u3002|\n", "2406.03299": "|**2024-06-05**|**The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games**|Mikhail Mozikov et.al.|[2406.03299](http://arxiv.org/abs/2406.03299)|null|\u884c\u4e3a\u7814\u7a76\u5b9e\u9a8c\u5728\u793e\u4f1a\u6a21\u578b\u548c\u7406\u89e3\u4eba\u9645\u4e92\u52a8\u4e2d\u5360\u636e\u91cd\u8981\u5730\u4f4d\u3002\u7136\u800c\uff0c\u5b9e\u9645\u64cd\u4f5c\u4e2d\u8fd9\u7c7b\u5b9e\u9a8c\u5e38\u9762\u4e34\u5185\u5728\u6548\u5ea6\u3001\u5916\u5728\u6548\u5ea6\u3001\u53ef\u91cd\u590d\u6027\u548c\u793e\u4f1a\u504f\u89c1\u7b49\u6311\u6218\uff0c\u56e0\u4e3a\u4eba\u7c7b\u7684\u793e\u4f1a\u4e92\u52a8\u4e0e\u5408\u4f5c\u590d\u6742\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u4e3a\u7814\u7a76\u8005\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u7684\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u5de5\u5177\u3002\u4f46\u73b0\u6709\u57fa\u4e8eLLM\u7684\u6a21\u62df\u5047\u8bbe\u6a21\u578b\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u76f8\u4f3c\uff0c\u5374\u5ffd\u89c6\u4e86\u5f71\u54cd\u4eba\u7c7b\u51b3\u7b56\u7684\u5173\u952e\u56e0\u7d20\u2014\u2014\u60c5\u7eea\u3002\u672c\u6587\u63d0\u51fa\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u8bba\u548c\u6846\u67b6\uff0c\u65e8\u5728\u63a2\u8ba8LLMs\u7684\u51b3\u7b56\u5236\u5b9a\u53ca\u5176\u5728\u60c5\u7eea\u72b6\u6001\u4e0b\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u5951\u5408\u5ea6\u3002 
\u901a\u8fc7\u5728\u4e24\u79cd\u4e0d\u540c\u7c7b\u578b\u7684\u884c\u4e3a\u7ecf\u6d4e\u5b66\u6e38\u620f\uff08\u535a\u5f08\u8bba\u5b9e\u9a8c\uff09\u4e2d\u4f7f\u7528GPT-3.5\u548cGPT-4\uff0c\u6211\u4eec\u53d1\u73b0\u60c5\u7eea\u5bf9LLMs\u7684\u8868\u73b0\u6709\u663e\u8457\u5f71\u54cd\uff0c\u4fc3\u4f7f\u5b83\u4eec\u53d1\u5c55\u51fa\u66f4\u4f18\u5316\u7684\u7b56\u7565\u3002\u5c3d\u7ba1GPT-3.5\u4e0e\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u884c\u52a8\u6a21\u5f0f\u6709\u8f83\u5f3a\u7684\u5bf9\u5e94\uff0c\u5c24\u5176\u662f\u5728\u8ba8\u4ef7\u8fd8\u4ef7\u6e38\u620f\u4e2d\uff0c\u4f46GPT-4\u5c55\u73b0\u51fa\u4e00\u81f4\u7684\u884c\u4e3a\uff0c\u5bf9\u4e8e\u60c5\u7eea\u8bf1\u5bfc\u7684\u7406\u6027\u51b3\u7b56\u4f3c\u4e4e\u4e0d\u53d7\u5f71\u54cd\u3002\u4ee4\u4eba\u610f\u5916\u7684\u662f\uff0c\u60c5\u7eea\u63d0\u793a\uff0c\u7279\u522b\u662f\u6124\u6012\u60c5\u7eea\uff0c\u80fd\u591f\u6253\u7834GPT-4\u7684\u201c\u8d85\u4eba\u201d\u4e00\u81f4\u6027\uff0c\u4f7f\u5176\u53cd\u5e94\u66f4\u63a5\u8fd1\u4eba\u7c7b\u7684\u60c5\u7eea\u53cd\u5e94\u3002|\n", "2406.03007": "|**2024-06-05**|**BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents**|Yifei Wang 
et.al.|[2406.03007](http://arxiv.org/abs/2406.03007)|**[link](https://github.com/dpamk/badagent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7e41\u8363\uff0c\u57fa\u4e8e\u8bad\u7ec3\u597d\u7684LLMs\u5e76\u901a\u8fc7\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u5fae\u8c03\u7684\u5f3a\u5927\u667a\u80fd\u4ee3\u7406\u5df2\u5f00\u53d1\u51fa\u6765\uff0c\u63d0\u4f9b\u5b9a\u5236\u670d\u52a1\u3002\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6784\u5efaLLM\u4ee3\u7406\u7684\u65b9\u6cd5\u662f\u4f7f\u7528\u9884\u8bad\u7ec3\u6a21\u578b\uff0c\u5e76\u9488\u5bf9\u4efb\u52a1\u8fdb\u884c\u8fdb\u4e00\u6b65\u8c03\u6574\u3002\u7136\u800c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u6613\u53d7\u540d\u4e3aBadAgent\u7684\u65b0\u578b\u540e\u95e8\u653b\u51fb\uff0c\u8be5\u653b\u51fb\u901a\u8fc7\u5728\u540e\u95e8\u6570\u636e\u4e0a\u5fae\u8c03\u5728\u5404\u79cd\u4ee3\u7406\u4efb\u52a1\u4e2d\u690d\u5165\u540e\u95e8\u3002\u5728\u6d4b\u8bd5\u65f6\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u901a\u8fc7\u5728\u8f93\u5165\u6216\u73af\u5883\u4e2d\u663e\u793a\u89e6\u53d1\u5668\uff0c\u64cd\u7eb5\u90e8\u7f72\u7684LLM\u4ee3\u7406\u6267\u884c\u6709\u5bb3\u64cd\u4f5c\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u5373\u4f7f\u5728\u4fe1\u4efb\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\u4ecd\u8868\u73b0\u51fa\u6781\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u5c3d\u7ba1\u540e\u95e8\u653b\u51fb\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5df2\u5e7f\u6cdb\u7814\u7a76\uff0c\u4f46\u636e\u6211\u4eec\u6240\u77e5\uff0c\u6211\u4eec\u53ef\u80fd\u662f\u7b2c\u4e00\u4e2a\u7814\u7a76\u5728\u6743\u9650\u66f4\u5927\u7684LLM\u4ee3\u7406\u4e0a\u7684\u653b\u51fb\uff0c\u8fd9\u4e9b\u4ee3\u7406\u53ef\u4ee5\u4f7f\u7528\u5916\u90e8\u5de5\u5177\uff0c\u56e0\u6b64\u66f4\u5177\u5a01\u80c1\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u660e\u786e\u6307\u51fa\u4e86\u57fa\u4e8e\u4e0d\u4fe1\u4efb\u7684LLM\u6216\u6570\u636e\u6784\u5efaLLM\u4ee3\u7406\u7684\u98ce\u9669\u
3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1a[https://github.com/DPamK/BadAgent](https://github.com/DPamK/BadAgent)\u3002**|\n", "2406.04151": "|**2024-06-06**|**AgentGym: Evolving Large Language Model-based Agents across Diverse Environments**|Zhiheng Xi et.al.|[2406.04151](http://arxiv.org/abs/2406.04151)|**[link](https://github.com/woooodyy/agentgym)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5efa\u7acb\u80fd\u591f\u5904\u7406\u5404\u79cd\u4efb\u52a1\u5e76\u5728\u4e0d\u540c\u73af\u5883\u4e2d\u81ea\u6211\u8fdb\u5316\u7684\u6cdb\u5316\u578b\u4ee3\u7406\u662f\u4e00\u4e2a\u957f\u671f\u76ee\u6807\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u901a\u7528\u80fd\u529b\u88ab\u8ba4\u4e3a\u662f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6709\u524d\u666f\u7684\u57fa\u7840\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u8981\u4e48\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u76d1\u7763\uff0c\u8ba9LLM\u4ee3\u7406\u9010\u6b65\u6a21\u4eff\u4e13\u5bb6\u63d0\u4f9b\u7684\u8f68\u8ff9\uff0c\u96be\u4ee5\u5927\u89c4\u6a21\u6269\u5c55\u4e14\u9650\u5236\u4e86\u73af\u5883\u63a2\u7d22\uff1b\u8981\u4e48\u8ba9\u4ee3\u7406\u5728\u5b64\u7acb\u73af\u5883\u4e2d\u63a2\u7d22\u5b66\u4e60\uff0c\u5bfc\u81f4\u4e13\u957f\u6709\u9650\u3001\u7f3a\u4e4f\u6cdb\u5316\u80fd\u529b\u3002\u672c\u6587\u9996\u6b21\u5c1d\u8bd5\u6784\u5efa\u5177\u5907\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u7684\u901a\u7528LLM\u4ee3\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e09\u4e2a\u5173\u952e\u8981\u7d20\uff1a1\uff09\u591a\u6837\u7684\u73af\u5883\u4ee5\u652f\u6301\u4ee3\u7406\u63a2\u7d22\u548c\u5b66\u4e60\uff1b2\uff09\u4e00\u5957\u8f68\u8ff9\u6765\u8d4b\u4e88\u4ee3\u7406\u57fa\u672c\u80fd\u529b\u548c\u5148\u9a8c\u77e5\u8bc6\uff1b3\uff09\u6709\u6548\u4e14\u53ef\u6269\u5c55\u7684\u8fdb\u5316\u65b9\u6cd5\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86AgentGym\uff0c\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u5b83\u5305\u542b\u4e30\u5bcc\u7684\u73af\u5883\u548c\u4efb\u52a1\uff0c\u652f\u6301\u5168\u9762\u3001\u5b9e\u65f6\u3001\u7edf\u4e00\u683c\u5f0f\u548c\u5e76\u53d1\u7684\u4ee3\u7406\u63a2\u7d22\u3002AgentGym\u8fd8\u5305\u62ec\u4e00\u4e2a\u6269\u5c55\u6307\u4ee4\u7684\u6570\u636e\u5e93\u3001\u57fa\u51c6\u6d4b\u8bd5\u5957\u4ef6\u4ee5\u53ca\u8de8\u73af\u5883\u7684\u9ad8\u8d28\u91cf\u8f68\u8ff9\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AgentEvol\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7814\u7a76\u4ee3\u7406\u5728\u8d85\u8d8a\u65e2\u5b9a\u6570\u636e\uff0c\u8de8\u8d8a\u4efb\u52a1\u548c\u73af\u5883\u65f6\u7684\u81ea\u6211\u8fdb\u5316\u6f5c\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fdb\u5316\u540e\u7684\u4ee3\u7406\u53ef\u4ee5\u8fbe\u5230\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002\u6211\u4eec\u53d1\u5e03\u4e86AgentGym\u5957\u4ef6\uff0c\u5305\u62ec\u5e73\u53f0\u3001\u6570\u636e\u96c6\u3001\u57fa\u51c6\u3001\u68c0\u67e5\u70b9\u548c\u7b97\u6cd5\u5b9e\u73b0\u3002AgentGym\u5957\u4ef6\u5df2\u5728\u5176\u5b98\u65b9\u7f51\u7ad9https://github.com/WooooDyy/AgentGym\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.04692": "|**2024-06-07**|**Mixture-of-Agents Enhances Large Language Model Capabilities**|Junlin Wang 
et.al.|[2406.04692](http://arxiv.org/abs/2406.04692)|null|\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\u663e\u8457\uff0c\u5c55\u73b0\u51fa\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u5f3a\u5927\u80fd\u529b\u3002\u968f\u7740LLMs\u7684\u589e\u591a\uff0c\u5982\u4f55\u6709\u6548\u6574\u5408\u591a\u6a21\u578b\u7684\u77e5\u8bc6\u6210\u4e3a\u4e86\u4e00\u4e2a\u4ee4\u4eba\u632f\u594b\u7684\u7814\u7a76\u65b9\u5411\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u6df7\u5408\u4ee3\u7406\uff08Mixture-of-Agents\uff0cMoA\uff09\u65b9\u6cd5\u3002\u5728\u6211\u4eec\u7684\u67b6\u6784\u4e2d\uff0cMoA\u91c7\u7528\u4e86\u5206\u5c42\u8bbe\u8ba1\uff0c\u6bcf\u5c42\u5305\u542b\u591a\u4e2aLLM\u4ee3\u7406\u3002\u6bcf\u4e2a\u4ee3\u7406\u5728\u751f\u6210\u54cd\u5e94\u65f6\uff0c\u4f1a\u5229\u7528\u524d\u4e00\u5c42\u6240\u6709\u4ee3\u7406\u7684\u8f93\u51fa\u4f5c\u4e3a\u8f85\u52a9\u4fe1\u606f\u3002\u901a\u8fc7\u8fd9\u79cd\u7b56\u7565\uff0cMoA\u6a21\u578b\u5728AlpacaEval 2.0\u3001MT-Bench\u548cFLASK\u7b49\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86GPT-4\u5168\u80fd\u7248\u3002\u4f8b\u5982\uff0c\u4ec5\u4f7f\u7528\u5f00\u6e90LLMs\u7684\u6211\u4eec\u7684MoA\u6a21\u578b\u5728AlpacaEval 2.0\u4e0a\u7684\u5f97\u5206\u9886\u5148\uff0c\u8fbe\u523065.1%\uff0c\u800cGPT-4\u5168\u80fd\u7248\u7684\u6210\u7ee9\u4e3a57.5%\u3002|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|\u5c3d\u7ba1\u53ef\u7a7f\u6234\u5065\u5eb7\u8ffd\u8e2a\u5668\u65e5\u76ca\u666e\u53ca\uff0c\u7761\u7720\u548c\u8fd0\u52a8\u5bf9\u5065\u5eb7\u7684\u91cd\u8981\u6027\u4e0d\u8a00\u800c\u55bb\uff0c\u4f46\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5177\u6709\u884c\u52a8\u4ef7\u503c\u7684\u4e2a\u6027\u5316\u89c1\u89e3\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u8fd9\u9700\u8981\u5bf9\u5927\u91cf\u6570\u636e\u8fdb\u884c\u975e\u7ed3\u6784\u5316\u5206\u6790\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u7406\u89e3\u548c\u4e0e\u4e16\u754c\u4e92\u52a8\uff0c\u4e3a\u5927\u89c4\u6a21\u4e2a\u6027\u5316\u5206\u6790\u5e26\u6765\u4e86\u5e0c\u671b\u3002\u7136\u800c\uff0c\u5728\u4e2a\u4eba\u5065\u5eb7\u9886\u57df\u7684LLM\u5e94\u7528\u5c1a\u5f85\u5f00\u53d1\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Insights Agent\uff08PHIA\uff09\u7684\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u6700\u65b0\u7684\u4ee3\u7801\u751f\u6210\u548c\u4fe1\u606f\u68c0\u7d22\u5de5\u5177\u6765\u5206\u6790\u548c\u89e3\u91ca\u884c\u4e3a\u5065\u5eb7\u6570\u636e\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e24\u4e2a\u8d85\u8fc74000\u4e2a\u5065\u5eb7\u6d1e\u5bdf\u95ee\u9898\u7684\u57fa\u51c6\u95ee\u7b54\u6570\u636e\u96c6\u3002\u6839\u636e650\u5c0f\u65f6\u7684\u4eba\u7c7b\u548c\u4e13\u5bb6\u8bc4\u4f30\uff0cPHIA\u80fd\u51c6\u786e\u56de\u7b5484%\u4ee5\u4e0a\u7684\u4e8b\u5b9e\u6027\u6570\u503c\u95ee\u9898\uff0c\u4ee5\u53ca\u8d85\u8fc783%\u7684\u4f17\u5305\u5f00\u653e\u6027\u95ee\u9898\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u4e8e\u63a8\u52a8\u5927\u4f17\u884c\u4e3a\u5065\u5eb7\u8fdb\u6b65\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u53ef\u80fd\u4f7f\u4e2a\u4eba\u80fd\u591f\u89e3\u8bfb\u81ea\u5df1\u7684\u53ef\u7a7f\u6234\u6570\u636e\uff0c\u5f00\u8f9f\u4e86\u4e00\u4e2a\u4ee5\u6570\u636e\u9a71\u52a8\u6d1e\u5bdf\u4e3a\u6307\u5bfc\u7684\u4e2a\u6027\u531
6\u5065\u5eb7\u65b9\u6848\u7684\u65b0\u65f6\u4ee3\uff0c\u4f7f\u5f97\u5065\u5eb7\u4fdd\u5065\u66f4\u52a0\u4fbf\u6377\u4e14\u4e2a\u6027\u5316\u3002|\n", "2406.05925": "|**2024-06-09**|**Hello Again! LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925](http://arxiv.org/abs/2406.05925)|**[link](https://github.com/leolee99/ld-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5f00\u653e\u57df\u5bf9\u8bdd\u7cfb\u7edf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u7cfb\u7edf\u4e3b\u8981\u5173\u6ce8\u7b80\u77ed\u7684\u5355\u6b21\u4f1a\u8bdd\uff0c\u5ffd\u89c6\u4e86\u957f\u671f\u966a\u4f34\u548c\u4e2a\u6027\u5316\u804a\u5929\u673a\u5668\u4eba\u5728\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u6ee1\u8db3\u8fd9\u79cd\u5b9e\u9645\u9700\u6c42\uff0c\u4e8b\u4ef6\u603b\u7ed3\u548c\u4eba\u683c\u7ba1\u7406\u81f3\u5173\u91cd\u8981\uff0c\u5b83\u4eec\u80fd\u591f\u4fc3\u8fdb\u957f\u671f\u5bf9\u8bdd\u56de\u590d\u7684\u5408\u7406\u6027\u3002\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4eba\u7c7b\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8fdb\u5c55\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6709\u53ef\u80fd\u5927\u5e45\u589e\u5f3a\u81ea\u52a8\u5316\u611f\u77e5\u3001\u51b3\u7b56\u548c\u95ee\u9898\u89e3\u51b3\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684\u6846\u67b6\u2014\u2014\u957f\u671f\u5bf9\u8bdd\u4ee3\u7406\uff08LD-Agent\uff09\uff0c\u5b83\u5305\u62ec\u4e09\u4e2a\u53ef\u72ec\u7acb\u8c03\u6574\u7684\u6a21\u5757\uff1a\u4e8b\u4ef6\u611f\u77e5\u3001\u4eba\u683c\u63d0\u53d6\u548c\u54cd\u5e94\u751f\u6210\u3002\u4e8b\u4ef6\u8bb0\u5fc6\u6a21\u5757\u4f7f\u7528\u957f\u77ed\u671f\u8bb0\u5fc6\u5e93\u5206\u522b\u5173\u6ce8\u5386\u53f2\u548c\u6b63\u5728\u8fdb\u884c\u7684\u4f1a\u8bdd\uff0c\u5e76\u5f15\u5165\u4e86\u57fa\u4e8e\u4e3b\u9898\u7684\u68c0\u7d22\u673a\
u5236\u4ee5\u63d0\u9ad8\u8bb0\u5fc6\u68c0\u7d22\u7684\u51c6\u786e\u6027\u3002\u6b64\u5916\uff0c\u4eba\u683c\u6a21\u5757\u5b9e\u73b0\u4e86\u7528\u6237\u548c\u4ee3\u7406\u7684\u52a8\u6001\u4eba\u683c\u5efa\u6a21\u3002\u6700\u540e\uff0c\u901a\u8fc7\u6574\u5408\u68c0\u7d22\u7684\u8bb0\u5fc6\u548c\u63d0\u53d6\u7684\u4eba\u683c\uff0c\u751f\u6210\u5668\u4f1a\u4ea7\u751f\u9002\u5f53\u7684\u56de\u5e94\u3002\u6211\u4eec\u5728\u5404\u79cd\u793a\u4f8b\u57fa\u51c6\u3001\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u5b9e\u8bc1\u4e86LD-Agent\u7684\u6709\u6548\u6027\u3001\u901a\u7528\u6027\u548c\u8de8\u9886\u57df\u80fd\u529b\u3002\u4ee3\u7801\u5df2\u5728https://github.com/leolee99/LD-Agent\u4e0a\u53d1\u5e03\u3002**|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804](http://arxiv.org/abs/2406.05804)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u63a8\u52a8\u4e86\u590d\u6742\u4ee3\u7406\u5de5\u4f5c\u6d41\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u8def\u5f84\u3001\u94fe\u5f0f\u601d\u7ef4\uff08Chain-of-Thought\uff0cCoT\uff09\u63d0\u793a\u65b9\u6cd5\u6709\u6240\u6539\u8fdb\u3002\u8fd9\u7bc7\u7efc\u8ff0\u65e8\u5728\u6982\u8ff0\u5e38\u89c1\u7684\u5de5\u4f5c\u6d41\uff0c\u7279\u522b\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7279\u6027\u7684\u7ec4\u4ef6\uff08LLM-Profiled Components\uff0cLMPCs\uff09\uff0c\u800c\u4e0d\u6d89\u53ca\u975eLLM\u7ec4\u4ef6\u3002\u8fd9\u79cd\u7814\u7a76\u7684\u76ee\u7684\u662f\u4e3a\u4e86\u589e\u8fdb\u5bf9LLMs\u89d2\u8272\u7684\u7406\u89e3\uff0c\u5e76\u63a2\u7d22LMPC\u7684\u590d\u7528\u6f5c\u529b\u3002|\n", "2406.07275": "|**2024-06-11**|**DCA-Bench: A Benchmark for Dataset Curation Agents**|Benhao Huang 
et.al.|[2406.07275](http://arxiv.org/abs/2406.07275)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7814\u7a76\u548c\u5f00\u53d1\u7684\u63a8\u8fdb\uff0c\u6570\u636e\u96c6\u7684\u8d28\u91cf\u65e5\u76ca\u5173\u952e\u3002\u5c3d\u7ba1\u5f00\u653e\u6570\u636e\u96c6\u5e73\u53f0\u4f17\u591a\uff0c\u4f46\u6570\u636e\u8d28\u91cf\u95ee\u9898\uff0c\u5982\u7f3a\u4e4f\u6587\u6863\u3001\u6807\u6ce8\u9519\u8bef\u548c\u4f26\u7406\u8003\u91cf\uff0c\u4ecd\u666e\u904d\u5b58\u5728\u3002\u8fd9\u4e9b\u95ee\u9898\u5f80\u5f80\u96be\u4ee5\u901a\u8fc7\u89c4\u5219\u57fa\u7840\u811a\u672c\u68c0\u6d4b\uff0c\u9700\u8981\u7528\u6237\u6216\u7ef4\u62a4\u8005\u82b1\u8d39\u5927\u91cf\u4eba\u529b\u8fdb\u884c\u8bc6\u522b\u548c\u9a8c\u8bc1\u3002\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u6570\u636e\u96c6\u6574\u7406\u7684\u6f5c\u529b\u4ee4\u4eba\u671f\u5f85\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aDCA-Bench\u7684\u6570\u636e\u96c6\u7ba1\u7406\u4ee3\u7406\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLM\u5728\u68c0\u6d4b\u9690\u85cf\u6570\u636e\u8d28\u91cf\u95ee\u9898\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u4ece\u516b\u4e2a\u516c\u5f00\u6570\u636e\u96c6\u5e73\u53f0\u6536\u96c6\u4e86\u5404\u79cd\u5b9e\u9645\u95ee\u9898\u4f5c\u4e3a\u6d4b\u8bd5\u5e8a\u3002\u4e3a\u4e86\u5efa\u7acb\u4e00\u4e2a\u81ea\u52a8\u8bc4\u4f30LLM\u6210\u529f\u4e0e\u5426\u7684\u7ba1\u9053\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684LLM\u8bc4\u4f30\u5668\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u8bc4\u4f30\u5668\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u9ad8\u5ea6\u543b\u5408\uff0c\u80fd\u5b9e\u73b0\u53ef\u9760\u7684\u81ea\u52a8\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5728\u591a\u4e2a\u57fa\u7ebfLLM\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u663e\u793a\u4e86\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u610f\u5473\u7740\u5c06LLMs\u5e94\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\u7ba1\u7406\u4ecd\u9700\u6df1\u5165\u63a2
\u7d22\u548c\u521b\u65b0\u3002\u6b64\u5916\uff0c\u8be5\u57fa\u51c6\u4e5f\u53ef\u4f5c\u4e3a\u8861\u91cfLLMs\u5728\u95ee\u9898\u53d1\u73b0\u80fd\u529b\u800c\u975e\u4ec5\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u7684\u6d4b\u8bd5\u5e73\u53f0\u3002\u57fa\u51c6\u5957\u4ef6\u5df2\u5f00\u653e\u5728\uff1a\\url{https://github.com/TRAIS-Lab/dca-bench}\u3002|\n", "2406.07217": "|**2024-06-11**|**A Synthetic Dataset for Personal Attribute Inference**|Hanna Yukhymenko et.al.|[2406.07217](http://arxiv.org/abs/2406.07217)|**[link](https://github.com/eth-sri/synthpai)**|**\u8fd1\u5e74\u6765\uff0c\u5f3a\u5927\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e3a\u5168\u7403\u6570\u4ebf\u7528\u6237\u6240\u63a5\u89e6\uff0c\u4f46\u5b83\u4eec\u7684\u5f3a\u5927\u529f\u80fd\u548c\u5e7f\u6cdb\u4e16\u754c\u77e5\u8bc6\u4e5f\u5e26\u6765\u4e86\u9690\u79c1\u98ce\u9669\u3002\u672c\u7814\u7a76\u5173\u6ce8LLMs\u65b0\u5174\u7684\u9690\u79c1\u5a01\u80c1\u2014\u2014\u4ece\u7f51\u7edc\u6587\u672c\u4e2d\u51c6\u786e\u63a8\u65ad\u4e2a\u4eba\u4fe1\u606f\u3002\u9274\u4e8e\u57fa\u4e8eLLM\u7684\u4f5c\u8005\u5206\u6790\u7814\u7a76\u7f3a\u4e4f\u5408\u9002\u7684\u516c\u5f00\u6570\u636e\u96c6\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u6d89\u53ca\u771f\u5b9e\u4e2a\u4eba\u6570\u636e\u7684\u4f26\u7406\u548c\u9690\u79c1\u987e\u8651\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5728\u4e24\u4e2a\u65b9\u9762\u8fdb\u884c\u4e86\u63a2\u7d22\uff1a\uff08i\uff09\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4f7f\u7528\u5408\u6210\u4e2a\u4eba\u8d44\u6599\u586b\u5145\u7684\u6d41\u884c\u793e\u4ea4\u5e73\u53f0Reddit\u7684\u6a21\u62df\u6846\u67b6\uff1b\uff08ii\uff09\u5229\u7528\u6b64\u6846\u67b6\uff0c\u6211\u4eec\u751f\u6210\u4e86SynthPAI\uff0c\u4e00\u4e2a\u5305\u542b\u8d85\u8fc77800\u6761\u7ecf\u8fc7\u624b\u52a8\u6807\u8bb0\u4e2a\u4eba\u5c5e\u6027\u7684\u591a\u6837\u5316\u7684\u5408\u6210\u8bc4\u8bba\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u4eba\u7c7b\u7814\u7a76\u9a8c\u8bc1\u4e86\u6570\u636e\u96c6\uff0c\u7ed3\u679c\
u663e\u793a\u4eba\u7c7b\u5728\u533a\u5206\u771f\u5b9e\u548c\u5408\u6210\u8bc4\u8bba\u7684\u4efb\u52a1\u4e0a\u51e0\u4e4e\u4e0d\u4f18\u4e8e\u968f\u673a\u731c\u6d4b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6570\u636e\u96c6\u652f\u6301\u6709\u610f\u4e49\u7684\u4e2a\u4eba\u5c5e\u6027\u63a8\u65ad\u7814\u7a76\uff0c\u901a\u8fc718\u79cd\u6700\u5148\u8fdb\u7684LLMs\uff0c\u6211\u4eec\u53d1\u73b0\u4f7f\u7528\u5408\u6210\u8bc4\u8bba\u53ef\u4ee5\u5f97\u51fa\u4e0e\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u76f8\u540c\u7684\u7ed3\u8bba\u3002\u7efc\u4e0a\u6240\u8ff0\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6d41\u7a0b\u4e3a\u672a\u6765\u7814\u7a76\u5982\u4f55\u7406\u89e3\u548c\u51cf\u8f7bLLMs\u5e26\u6765\u7684\u57fa\u4e8e\u63a8\u65ad\u7684\u9690\u79c1\u5a01\u80c1\u63d0\u4f9b\u4e86\u5f3a\u5927\u4e14\u9690\u79c1\u4fdd\u62a4\u7684\u57fa\u7840\u3002**|\n", "2406.07021": "|**2024-06-11**|**A Tool for Test Case Scenarios Generation Using Large Language Models**|Abdul Malik Sami et.al.|[2406.07021](http://arxiv.org/abs/2406.07021)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u6db5\u76d6\u4ee3\u7801\u751f\u6210\u3001\u8f6f\u4ef6\u8bbe\u8ba1\u548c\u6587\u6863\u7f16\u5199\u3001\u6dfb\u52a0\u4ee3\u7801\u6ce8\u91ca\u3001\u4ee3\u7801\u5ba1\u67e5\u4ee5\u53ca\u7f16\u5199\u6d4b\u8bd5\u811a\u672c\u7b49\u4efb\u52a1\u3002\u7136\u800c\uff0c\u521b\u5efa\u6d4b\u8bd5\u811a\u672c\u6216\u81ea\u52a8\u5316\u6d4b\u8bd5\u6848\u4f8b\u9700\u8981\u4e0e\u529f\u80fd\u9700\u6c42\u7d27\u5bc6\u76f8\u5173\u7684\u8be6\u5c3d\u6d4b\u8bd5\u5957\u4ef6\u6587\u6863\u3002\u8fd9\u79cd\u6587\u6863\u5e94\u80fd\u5728\u6709\u9650\u7684\u65f6\u95f4\u548c\u8303\u56f4\u5185\u5b9e\u73b0\u5168\u9762\u6d4b\u8bd5\uff0c\u5c24\u5176\u5f53\u9700\u6c42\u548c\u7528\u6237\u671f\u671b\u4e0d\u65ad\u53d8\u5316\u65f6\u3002\u672c\u6587\u4e3b\u8981\u5173\u6ce8\u6839\u636e\u7528\u6237\u9700\u6c42\u751f\u6210\u53f2\u8bd7\u7ea7\uff08epics\uff09\u54
8c\u9ad8\u5c42\u6b21\u7528\u6237\u6545\u4e8b\uff0c\u7136\u540e\u57fa\u4e8e\u8fd9\u4e9b\u6545\u4e8b\u8bbe\u8ba1\u6d4b\u8bd5\u573a\u666f\u3002\u6587\u7ae0\u4ecb\u7ecd\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u4ee3\u7406\u548c\u63d0\u793a\u5de5\u7a0b\u7684\u7f51\u7edc\u8f6f\u4ef6\u5de5\u5177\uff0c\u8be5\u5de5\u5177\u80fd\u591f\u81ea\u52a8\u5316\u9488\u5bf9\u7528\u6237\u9700\u6c42\u751f\u6210\u6d4b\u8bd5\u573a\u666f\u7684\u8fc7\u7a0b\u3002|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho et.al.|[2406.06947](http://arxiv.org/abs/2406.06947)|**[link](https://github.com/caap-agent/caap-agent)**|**\u957f\u671f\u4ee5\u6765\uff0c\u8f6f\u4ef6\u673a\u5668\u4eba\u5df2\u7ecf\u5728\u673a\u5668\u4eba\u6d41\u7a0b\u81ea\u52a8\u5316\uff08RPA\uff09\u4e2d\u7528\u4e8e\u6267\u884c\u67af\u71e5\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5148\u8fdb\u63a8\u7406\u80fd\u529b\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u4ee3\u7406\u73b0\u5728\u80fd\u591f\u5904\u7406\u66f4\u590d\u6742\u751a\u81f3\u524d\u6240\u672a\u89c1\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5f53\u524d\u6587\u732e\u4e2d\u7684\u57fa\u4e8eLLM\u7684\u81ea\u52a8\u5316\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8eHTML\u6e90\u4ee3\u7801\u4f5c\u4e3a\u8f93\u5165\uff0c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u975e\u7f51\u7edc\u73af\u5883\u7684\u5e94\u7528\u3002HTML\u4ee3\u7801\u4e2d\u7684\u4fe1\u606f\u5e38\u5e38\u4e0d\u51c6\u786e\u6216\u4e0d\u5b8c\u6574\uff0c\u8fd9\u964d\u4f4e\u4e86\u4ee3\u7406\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4ec5\u57fa\u4e8e\u5c4f\u5e55\u622a\u56fe\u7684LLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u5b83\u4e13\u6ce8\u4e8e\u8bc6\u522b\u73af\u5883\uff0c\u5e76\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u6765\u6d88\u9664\u5bf9\u5927\u91cf\u4eba\u7c7b\u6f14\u793a\u6570\u636e\u7684\u9700\u6c42\u3002\u6211\u4eec\u768
4\u7b56\u7565\u540d\u4e3a\u201c\u4e0a\u4e0b\u6587\u611f\u77e5\u884c\u52a8\u89c4\u5212\u201d\uff08Context-Aware Action Planning\uff0cCAAP\uff09\u63d0\u793a\uff0c\u9f13\u52b1\u4ee3\u7406\u4ece\u591a\u4e2a\u89d2\u5ea6\u4ed4\u7ec6\u5ba1\u67e5\u4e0a\u4e0b\u6587\u3002\u901a\u8fc7\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u572867\u79cdMiniWoB++\u95ee\u9898\u4e0a\u5b9e\u73b0\u4e8694.4%\u7684\u6210\u529f\u7387\uff0c\u6bcf\u4e2a\u95ee\u9898\u7c7b\u578b\u53ea\u97001.48\u6b21\u6f14\u793a\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u7279\u522b\u662f\u5728\u9700\u8981\u5728\u8ba1\u7b97\u673a\u6216\u667a\u80fd\u624b\u673a\u4e4b\u95f4\u8fdb\u884c\u8de8\u5e94\u7528\u534f\u8c03\u7684\u4efb\u52a1\u4e0a\uff0c\u6807\u5fd7\u7740\u81ea\u52a8\u5316\u4ee3\u7406\u9886\u57df\u7684\u91cd\u5927\u8fdb\u6b65\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728https://github.com/caap-agent/caap-agent\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli 
et.al.|[2406.06613](http://arxiv.org/abs/2406.06613)|**[link](https://github.com/Joshuaclymer/GameBench)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5728\u8bb8\u591a\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u5c11\u91cf\u6837\u672c\u6027\u80fd\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5c55\u793a\u8fc7\u5728\u590d\u6742\u7b56\u7565\u573a\u666f\u4e2d\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f46\u7f3a\u4e4f\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5728\u6e38\u620f\u4e2d\u7684\u5404\u79cd\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86GameBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u8de8\u9886\u57df\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6218\u7565\u601d\u7ef4\u80fd\u529b\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e9\u4e2a\u4e0d\u540c\u7684\u6e38\u620f\u73af\u5883\uff0c\u6bcf\u4e2a\u6e38\u620f\u81f3\u5c11\u6db5\u76d6\u4e00\u79cd\u5728\u7b56\u7565\u6e38\u620f\u4e2d\u8bc6\u522b\u51fa\u7684\u5173\u952e\u63a8\u7406\u6280\u80fd\uff0c\u5e76\u9009\u62e9\u90a3\u4e9b\u6218\u7565\u89e3\u91ca\u4e0d\u592a\u53ef\u80fd\u6784\u6210\u6a21\u578b\u9884\u8bad\u7ec3\u6570\u636e\u4e3b\u8981\u90e8\u5206\u7684\u6e38\u620f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4f7f\u7528\u4e86\u57fa\u7840\u5f62\u5f0f\u7684GPT-3\u548cGPT-4\uff0c\u4ee5\u53ca\u4e24\u4e2a\u65e8\u5728\u589e\u5f3a\u6218\u7565\u63a8\u7406\u80fd\u529b\u7684\u5f15\u5bfc\u6846\u67b6\uff1aChain-of-Thought\uff08CoT\uff09\u63d0\u793a\u548cReasoning Via 
Planning\uff08RAP\uff09\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6240\u6709\u6d4b\u8bd5\u6a21\u578b\u7684\u8868\u73b0\u90fd\u6ca1\u6709\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\uff0c\u6700\u5dee\u7684\u662fGPT-4\u7684\u8868\u73b0\u751a\u81f3\u4f4e\u4e8e\u968f\u673a\u884c\u52a8\u3002CoT\u548cRAP\u90fd\u63d0\u9ad8\u4e86\u5206\u6570\uff0c\u4f46\u4ecd\u8fdc\u672a\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\u3002**|\n", "2406.08184": "|**2024-06-12**|**MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents**|Luyuan Wang et.al.|[2406.08184](http://arxiv.org/abs/2406.08184)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u624b\u673a\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4e0a\u7684\u76f4\u63a5\u4ea4\u4e92\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u4ee5\u53ca\u5b83\u4eec\u5728\u81ea\u4e3b\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u57fa\u4e8eLLMs\u7684\u79fb\u52a8\u4ee3\u7406\u6b63\u9010\u6e10\u53d7\u5230\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5e94\u7528\u7a0b\u5e8f\u7684\u65e0\u9650\u72b6\u6001\u548c\u53ef\u884c\u52a8\u4f5c\u5e8f\u5217\u7684\u6a21\u7cca\u5b9a\u4e49\uff0c\u5bf9\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u6027\u80fd\u7684\u57fa\u51c6\u7814\u7a76\u76f8\u5bf9\u532e\u4e4f\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u7528\u6237\u53cb\u597d\u7684\u57fa\u51c6\u5de5\u5177\u2014\u2014MobileAgentBench\uff0c\u65e8\u5728\u51cf\u8f7b\u7e41\u7410\u7684\u624b\u52a8\u6d4b\u8bd5\u8d1f\u62c5\u3002\u6211\u4eec\u9996\u5148\u5b9a\u4e49\u4e86\u6db5\u76d610\u4e2a\u5f00\u6e90\u5e94\u7528\u7684100\u9879\u4efb\u52a1\uff0c\u6309\u96be\u5ea6\u5206\u4e3a\u591a\u4e2a\u7ea7\u522b\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5305\u62ecAppAgent\u548cMobileAgent\u5728\u5185\u7684\u591a\u4e2a\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u5168\u9762\u7cfb\u7edf\u573
0\u6bd4\u8f83\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6240\u6709\u76f8\u5173\u6750\u6599\u5747\u53ef\u5728\u6211\u4eec\u7684\u9879\u76ee\u7f51\u7ad9https://MobileAgentBench.github.io\u4e0a\u83b7\u53d6\uff0c\u8fd9\u5c06\u63a8\u52a8\u5b66\u672f\u548c\u5de5\u4e1a\u9886\u57df\u7684\u8fdb\u6b65\u3002|\n", "2406.07973": "|**2024-06-12**|**Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey**|Shang Wang et.al.|[2406.07973](http://arxiv.org/abs/2406.07973)|null|With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable progress in natural language processing. Trained on massive amounts of data, these models show strong language understanding and generation capabilities and suit a wide range of applications, such as machine translation and chatbots. However, LLMs expose a series of privacy and security issues throughout their life cycle, which has drawn attention from both academia and industry. These issues are distinct from those of traditional language models. Because existing surveys lack a clear taxonomy of threats across different scenarios, we highlight the unique risks in five scenarios: pre-training, fine-tuning, RAG systems, deployment, and LLM-based agents. Taking the characteristics of each threat into account, this survey presents potential threats and countermeasures. Studying the attacks and defenses that LLMs face can offer feasible research directions for more fields and allow more people to benefit from LLMs.|\n", "2406.07914": "|**2024-06-14**|**Can Large Language Models Understand Spatial Audio?**|Changli Tang et.al.|[2406.07914](http://arxiv.org/abs/2406.07914)|null|This paper explores how to enable large language models (LLMs) to grasp the spatial information in multichannel audio, a capability that current auditory LLMs lack. By leveraging the advanced cognitive and reasoning abilities of LLMs, the goal is to improve models' understanding of three-dimensional environments through audio. The study covers three spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and location-based speech extraction (LSE), and achieves notable progress on each. For SSL, our approach achieves a mean absolute error (MAE) of 2.70\u00b0 on the Spatial LibriSpeech dataset, clearly surpassing the prior benchmark of about 6.60\u00b0. In addition, the model can use spatial cues to improve FSR accuracy and, guided by text prompts, can focus on sounds from a specified direction to perform LSE, even amid overlapping speech. These results demonstrate the potential of LLMs to adapt to physical audio concepts and pave the way for LLM-based agents in 3D environments.|\n", "2406.09187": 
"|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang et.al.|[2406.09187](http://arxiv.org/abs/2406.09187)|null|With the rapid development of large language models (LLMs), LLM-powered agents are widely deployed across applications, raising new concerns about their safety and trustworthiness. Existing methods for improving LLM safety do not transfer directly to LLM-powered agents, because the agents have different goals and output modalities. This paper proposes GuardAgent, a novel approach that acts as a guardrail for other LLM agents. GuardAgent oversees a target LLM agent by checking whether its inputs/outputs satisfy a set of user-defined guard requests. It works in two steps: 1) it analyzes the provided guard requests to create a task plan; 2) it generates guardrail code from the task plan and executes it by calling APIs or external engines. Throughout, an LLM serves as the core reasoning component, supported by in-context demonstrations retrieved from a memory module. This knowledge-enabled reasoning lets GuardAgent understand diverse textual guard requests and translate them accurately into executable code, providing reliable guardrails. GuardAgent is also equipped with an extendable toolbox of functions and APIs and requires no additional LLM training, underscoring its generality and low operational cost. In addition, we propose two novel benchmarks: EICU-AC, for assessing privacy-related access control for healthcare agents, and Mind2Web-SC, for evaluating the safety of web agents. On these benchmarks, GuardAgent moderates invalid inputs and outputs for the two types of agents with 98.7% and 90.0% accuracy, respectively. Experiments also show that GuardAgent can adapt to emerging LLM agents and guard requests by defining new functions, further demonstrating its strong generalization capability.|\n", "2406.08979": "|**2024-06-13**|**Multi-Agent Software Development through Cross-Team Collaboration**|Zhuoyun Du et.al.|[2406.08979](http://arxiv.org/abs/2406.08979)|**[link](https://github.com/openbmb/chatdev)**|**Recent advances in large language models (LLMs), exemplified by ChatDev, have profoundly transformed software development, particularly through multi-agent collaboration. These models can collaborate like human teams, following the waterfall model through stages such as requirements analysis, development, review, and testing, to generate software autonomously. However, each stage of a single development process yields only one possible outcome, so only one development chain is completed. This forfeits the opportunity to explore multiple decision paths within the solution space and can lead to suboptimal results. To address this, we propose the Cross-Team Collaboration (CTC) framework, a scalable multi-team structure that lets orchestrated teams jointly propose decisions and exchange their insights in a cross-team collaboration environment to improve content generation. Experimental results in software development show that our approach significantly outperforms existing baselines, confirming the framework's effectiveness. Notable improvements in story generation indicate that the framework generalizes broadly across domains. We hope our work will guide LLMs toward a cross-team paradigm and bring significant advances to software development and other fields. The related code and data will be made available.**|\n", "2406.08747": "|**2024-06-13**|**StreamBench: Towards Benchmarking Continuous Improvement of Language Agents**|Cheng-Kuang Wu et.al.|[2406.08747](http://arxiv.org/abs/2406.08747)|**[link](https://github.com/stream-bench/stream-bench)**|Recent studies show that large language models (LLMs) can improve themselves from experience, an important capability for continual improvement after deployment. However, existing benchmarks mainly assess their innate capabilities and do not examine their ability to improve over time. To fill this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLMs over an input-feedback sequence. StreamBench simulates an online learning environment in which LLMs receive a continuous stream of feedback and iteratively improve their performance. In addition, we propose several simple yet effective LLM baselines and comprehensively analyze the key components behind successful streaming strategies. Our work lays the foundation for developing effective online learning strategies for LLMs and paves the way for more adaptive AI systems in streaming scenarios.|\n", "2406.11277": "|**2024-06-17**|**Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector**|Xiaoxue Cheng et.al.|[2406.11277](http://arxiv.org/abs/2406.11277)|null|This paper addresses the challenges of hallucination detection with large language models (LLMs), noting in particular that prior work relies mainly on powerful closed-source models such as GPT-4. The authors propose HaluAgent, an autonomous LLM-based agent framework that enables relatively small models (e.g., Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple types of hallucinations, including in text, code, and mathematical expressions. HaluAgent integrates the LLM with a multi-functional toolbox, designs a fine-grained three-stage detection framework, and is equipped with a memory mechanism. To improve HaluAgent's effectiveness, the paper uses existing Chinese and English datasets to synthesize detection trajectories for fine-tuning, giving the agent bilingual hallucination detection capability. Experimental results show that after tuning the LLM on only 2,000 samples, HaluAgent performs well across various tasks and datasets, matching and in some cases surpassing GPT-4 without tool enhancement on both in-domain and out-of-domain datasets. The code and datasets are released at https://github.com/RUCAIBox/HaluAgent.|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu 
et.al.|[2406.11200](http://arxiv.org/abs/2406.11200)|**[link](https://github.com/zou-group/avatar)**|**Large language models (LLMs) have shown impressive capabilities in using external tools and knowledge to improve accuracy and reduce errors. However, designing prompting techniques that let LLMs use these tools effectively is a time-consuming task that relies on intuition. To address this, we propose AvaTaR, a novel automated framework that optimizes LLMs to make better use of the provided tools and improves their performance on a given task or domain. AvaTaR uses a comparator module that reasons over positive and negative examples from the training data to iteratively deliver insightful and holistic prompts to the LLM. We demonstrate AvaTaR on four complex multimodal retrieval datasets containing textual, visual, and relational information. Experiments show that AvaTaR outperforms state-of-the-art approaches on all four challenging tasks and generalizes strongly to novel cases, achieving an average relative improvement of 14% on the Hit@1 metric. The code and dataset are publicly available.**|\n", "2406.11176": "|**2024-06-17**|**Watch Every Step! 
LLM Agent Learning via Iterative Step-Level Process Refinement**|Weimin Xiong et.al.|[2406.11176](http://arxiv.org/abs/2406.11176)|**[link](https://github.com/weiminxiong/ipr)**|**Large language models have shown outstanding performance across a range of complex interactive tasks. Recent work tends to improve model performance by tuning on expert trajectories, but it mainly focuses on outcome rewards; without process supervision signals, this can lead to errors or suboptimal actions. To address this, we propose the Iterative Step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance training. We use the Monte Carlo method to estimate step-level rewards. In each iteration, the model explores along the expert trajectory and generates new actions, which are then compared, using step-level rewards, with the corresponding steps of the expert trajectory. This comparison helps identify discrepancies and yields contrastive action pairs that serve as training data. Experiments on three complex agent tasks show that our framework outperforms a variety of strong baselines. Moreover, our analysis demonstrates IPR's effectiveness in improving action efficiency and its applicability to diverse models.**|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132](http://arxiv.org/abs/2406.11132)|null|Over the past year, large language models (LLMs) have achieved remarkable success beyond traditional natural language processing, and people have begun to explore their use in more specific application domains such as code generation, travel planning, and robot control. So-called LLM agents are built on LLMs to help people with a wide range of everyday tasks. However, the prompts given to an LLM are critical to what it generates and how well it performs, so automatic prompt engineering has become a focus for many researchers and LLM users. This paper proposes a novel method, \\textsc{RePrompt}, which uses the chat history obtained from interactions with LLM agents to optimize the LLM's step-by-step instructions in a gradient-descent-like manner. By optimizing its prompt, the LLM learns to plan in a specific domain. Experiments on PDDL generation and travel planning show that using the updated prompt as the initial prompt generally improves performance across different reasoning tasks.|\n", "2406.10918": 
"|**2024-06-18**|**Embodied Question Answering via Multi-LLM Systems**|Bhrij Patel et.al.|[2406.10918](http://arxiv.org/abs/2406.10918)|null|Embodied Question Answering (EQA) is an important problem in which an agent explores an environment to answer user queries. Existing work focuses mainly on single-agent scenarios, which can involve lengthy and costly exploration. In this work, we consider EQA in a multi-agent framework involving multiple independent agents based on large language models (LLMs), each answering questions about a household environment. To produce one answer per query, we use the individual responses to train a Central Answer Model (CAM) that aggregates them into a more robust answer. With CAM, we observe 50% higher EQA accuracy than with ensemble LLM aggregation methods such as voting schemes and debates. CAM requires no form of communication between agents and thus avoids the associated costs. We also ablate CAM with various nonlinear algorithms (e.g., neural networks, random forests, decision trees, XGBoost) and linear algorithms (e.g., logistic regression classifiers, support vector machines). Finally, we use Permutation Feature Importance (PFI) to quantify how much CAM relies on each independent agent and on the query context.|\n", "2406.10819": "|**2024-06-16**|**GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents**|Dongping Chen et.al.|[2406.10819](http://arxiv.org/abs/2406.10819)|**[link](https://github.com/keplerlab/katna)**|**In recent years, multimodal large language models (MLLMs) have been used to control keyboard and mouse inputs by directly perceiving graphical user interfaces (GUIs) and generating the corresponding code. However, current models perform well mainly in static environments and are applied mostly to relatively simple domains such as web or mobile interfaces. We argue that a robust GUI agent should understand the spatio-temporal information of a GUI, including dynamic web content and multi-step tasks, and should comprehensively understand diverse GUI scenarios, including desktop software and multi-window interactions. To this end, this paper introduces a new dataset, GUI-World, with meticulously crafted human-MLLM annotations covering six GUI scenarios and eight types of GUI-related questions in three formats. We evaluate how well current state-of-the-art MLLMs, including image LLMs and video LLMs, understand and process various types of GUI content, especially dynamic and sequential content. We find that image LLMs struggle with dynamic GUI content unless keyframes or operation history are annotated manually. Video LLMs, on the other hand, perform poorly on all GUI-related tasks because GUI video datasets are sparse. Based on GUI-World, we take an initial step toward using a fine-tuned video LLM as a GUI agent, showing improved understanding of various GUI tasks. However, given the limited performance of the base LLMs, we conclude that using video LLMs as GUI agents remains a significant challenge. We believe our work provides valuable insights for future research on understanding dynamic GUI content. The code and dataset are publicly available on our project homepage: https://gui-world.github.io/**|\n", "2406.10803": "|**2024-06-16**|**HiddenTables & PyQTax: A Cooperative Game and 
Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies**|William Watson et.al.|[2406.10803](http://arxiv.org/abs/2406.10803)|null|Large language models (LLMs) face several challenges in table question answering: (1) limited context windows for large tables; (2) complex discrepancies between tokenization patterns and cell boundaries; and (3) data confidentiality concerns when using external models such as gpt-3.5-turbo. To address these issues, we propose a cooperative game called HiddenTables. The game involves a code-generating LLM, the Solver, and an Oracle that evaluates the Solver's ability on table QA tasks; it is grounded in natural-language schemas while keeping the data secure. Through empirical experiments on a diverse set of tables, we show the limitations of LLMs in handling complex queries, dealing with compositional dependencies, and translating natural language into programmatic commands when concrete table schemas are provided. Unlike encoder-based models, HiddenTables is not limited by the number of rows, which improves efficiency in prompt and completion tokens. In addition, we build a new dataset, PyQTax, containing 116,671 question-table-answer triplets with more fine-grained question taxonomies and labels, which further strengthens our study. Beyond its academic contribution of exposing LLMs' shortcomings on table QA, HiddenTables thus demonstrates a practical way for LLMs to interact with large-scale datasets while safeguarding data security and reducing generation costs.|\n", "2406.10478": "|**2024-06-15**|**From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent**|Samuel S. 
Sohn et.al.|[2406.10478](http://arxiv.org/abs/2406.10478)|null|Digital storytelling, essential in entertainment, education, and marketing, faces challenges in scaling up production and improving flexibility. This paper introduces StoryAgent, a framework that uses large language models and generative tools to automate and refine the digital story creation process. It combines top-down story drafting with bottom-up asset generation to address key issues such as manual intervention, interactive scene orchestration, and narrative consistency. The framework enables efficient production of interactive and consistent narratives across multiple media, democratizing content creation and enhancing user engagement. Our experimental results show that the framework can produce coherent digital stories without reference videos, marking a significant advance in automated digital storytelling.|\n", "2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang 
et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**Background**: Configuration settings are essential for tailoring software behavior to specific performance requirements, yet misconfigurations are widespread. Because configuration options are numerous and complex, identifying the configurations that affect system performance is challenging. This study proposes PerfSense, a lightweight framework that uses large language models (LLMs) to efficiently identify performance-critical configurations with low overhead. PerfSense uses LLM agents to simulate interactions between developers and performance engineers, employing advanced techniques such as prompt chaining and retrieval-augmented generation (RAG). **Methods and results**: Our evaluation on seven open-source Java systems shows that PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both an LLM-based baseline (50.36%) and the previous state-of-the-art method (61.75%). In particular, our prompt-chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassified cases finds that a common issue is the LLMs' misunderstanding of requirements (26.8% of cases). **Conclusion**: PerfSense substantially reduces the manual effort of classifying performance-critical configurations and offers valuable insights for future LLM-based code analysis research.|\n", "2406.12708": "|**2024-06-18**|**AgentReview: Exploring Peer Review Dynamics with LLM Agents**|Yiqiao Jin et.al.|[2406.12708](http://arxiv.org/abs/2406.12708)|null|Peer review is fundamental to the integrity and advancement of scientific publishing. Traditional methods of peer review analysis often rely on exploring and computing statistics over existing review data; they do not adequately address the multivariate nature of the process or account for latent variables, and they are further limited by privacy concerns because the data is sensitive. We introduce AgentReview, a peer review simulation framework based on large language models (LLMs) that effectively disentangles the impacts of multiple latent factors and addresses the privacy issue. The study finds a significant 37.1% variation in paper decisions, a finding supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study can offer valuable insights for improving the design of peer review mechanisms.|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics**|Chenggang Cui 
et.al.|[2406.12628](http://arxiv.org/abs/2406.12628)|null|This paper addresses challenges in control design for power electronic systems, particularly model uncertainty and long, costly design cycles. It proposes a multi-agent framework based on large language models (LLMs) for objective-oriented controller design in power electronics. The framework combines the reasoning capabilities of LLMs with a multi-agent workflow to build an efficient, automated controller design process. The LLM agents can understand and respond to high-level natural-language instructions, adapting their behavior to the specific requirements of a task and the constraints of practical applications. This novel and efficient strategy promises to greatly improve the flexibility and adaptability of power electronics controller design, making the work of practitioners substantially easier.|\n", "2406.12276": "|**2024-06-18**|**CodeNav: Beyond tool-use to using real-world codebases with LLM agents**|Tanmay Gupta 
et.al.|[2406.12276](http://arxiv.org/abs/2406.12276)|null|\u6211\u4eec\u4ecb\u7ecdCodeNav\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u5bfc\u822a\u548c\u5229\u7528\u5148\u524d\u672a\u89c1\u8fc7\u7684\u4ee3\u7801\u4ed3\u5e93\uff0c\u4ee5\u89e3\u51b3\u7528\u6237\u67e5\u8be2\u7684\u7cfb\u7edf\u3002\u4e0e\u9700\u8981\u901a\u8fc7\u624b\u52a8\u63cf\u8ff0\u5728LLM\u4e0a\u4e0b\u6587\u4e2d\u201c\u6ce8\u518c\u201d\u6240\u6709\u76f8\u5173\u5de5\u5177\u7684\u5de5\u5177\u4f7f\u7528\u578bLLM\u4e0d\u540c\uff0cCodeNav\u80fd\u591f\u81ea\u52a8\u7d22\u5f15\u548c\u641c\u7d22\u76ee\u6807\u4ee3\u7801\u5e93\u4e2d\u7684\u4ee3\u7801\u5757\uff0c\u627e\u5230\u76f8\u5173\u7684\u4ee3\u7801\u7247\u6bb5\uff0c\u5bfc\u5165\u5b83\u4eec\uff0c\u5e76\u6839\u636e\u6267\u884c\u53cd\u9988\u8fed\u4ee3\u751f\u6210\u89e3\u51b3\u65b9\u6848\u3002\u9996\u5148\uff0c\u6211\u4eec\u901a\u8fc7\u4e09\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793aCodeNav\u5982\u4f55\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u4ee3\u7801\u5e93\u6765\u89e3\u51b3\u590d\u6742\u7684\u7528\u6237\u95ee\u9898\u3002\u63a5\u7740\uff0c\u5728\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u5b9a\u91cf\u6bd4\u8f83\u4e86\u4ec5\u80fd\u8bbf\u95ee\u76ee\u6807\u4ee3\u7801\u5e93\u7684\u4ee3\u7801\u4f7f\u7528\u65b9\u6cd5\u4e0e\u62e5\u6709\u5bf9\u6240\u6709\u5de5\u5177\u540d\u79f0\u548c\u63cf\u8ff0\u7684\u7279\u6743\u8bbf\u95ee\u7684\u5de5\u5177\u4f7f\u7528\u65b9\u6cd5\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u4e0d\u540c\u7c7b\u578b\u5de5\u5177\u548c\u5e93\u63cf\u8ff0\u5bf9\u4ee3\u7801\u4f7f\u7528\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u4ee5\u53ca\u5c06\u6e90\u4ee3\u7801\u89c6\u4e3a\u8f93\u5165\u800c\u975e\u81ea\u7136\u8bed\u8a00\u4ee3\u7801\u63cf\u8ff0\u7684\u4f18\u52bf\u3002\u6240\u6709\u4ee3\u7801\u5c06\u9075\u5faa\u5bbd\u677e\u8bb8\u53ef\u534f\u8bae\u5f00\u6e90\u3002|\n", "2406.12125": "|**2024-06-17**|**Efficient Sequential Decision Making with Large Language 
Models**|Dingyang Chen et.al.|[2406.12125](http://arxiv.org/abs/2406.12125)|null|\u8be5\u8bba\u6587\u5173\u6ce8\u7684\u662f\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6210\u529f\u6269\u5c55\u5230\u5e8f\u5217\u51b3\u7b56\u5236\u5b9a\u3002\u5f53\u524d\u7684\u52aa\u529b\u8981\u4e48\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03LLMs\u8fdb\u884c\u51b3\u7b56\uff0c\u8981\u4e48\u4e3a\u9884\u8bad\u7ec3\u7684LLMs\u8bbe\u8ba1\u63d0\u793a\u3002\u524d\u8005\u9762\u4e34\u8ba1\u7b97\u8d1f\u62c5\u91cd\u7684\u68af\u5ea6\u66f4\u65b0\u95ee\u9898\uff0c\u800c\u540e\u8005\u672a\u663e\u793a\u51fa\u660e\u663e\u6548\u679c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u5728\u7ebf\u6a21\u578b\u9009\u62e9\u7b97\u6cd5\u6709\u6548\u5730\u5c06LLMs\u6574\u5408\u5230\u5e8f\u5217\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u3002\u7edf\u8ba1\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u4f18\u4e8e\u4f20\u7edf\u51b3\u7b56\u7b97\u6cd5\u548c\u7eafLLM\u4ee3\u7406\u3002\u5728\u8ba1\u7b97\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u907f\u514d\u4e86\u5bf9LLMs\u8fdb\u884c\u6602\u8d35\u7684\u68af\u5ea6\u66f4\u65b0\uff0c\u5e76\u4e14\u5728\u6574\u4e2a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u4ec5\u9700\u8981\u5c11\u91cf\u7684LLM\u8c03\u7528\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\u6765\u9a8c\u8bc1\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u4ee5\u4e00\u4e2a\u5927\u89c4\u6a21\u7684\u4e9a\u9a6c\u900a\u6570\u636e\u96c6\u4e3a\u4f8b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4ec5\u4f7f\u75281.5%\u7684\u65f6\u95f4\u6b65\u6570\u8c03\u7528LLMs\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u4e86\u6bd4\u57fa\u7ebf\u8d85\u8fc76\u500d\u7684\u6027\u80fd\u63d0\u5347\u3002|\n", "2406.14373": "|**2024-07-01**|**Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory**|Gordon Dai 
et.al.|[2406.14373](http://arxiv.org/abs/2406.14373)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u8ba1\u7b97\u793e\u4f1a\u79d1\u5b66\u7684\u7814\u7a76\u8fce\u6765\u4e86\u5927\u89c4\u6a21\u63a2\u7d22\u7684\u673a\u9047\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u57fa\u4e8e\u5148\u524d\u5bf9LLM\u884c\u4e3a\u4f53\u8bbe\u8ba1\u7684\u7814\u7a76\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u6a21\u62df\u7684Agent\u793e\u4f1a\uff0c\u5176\u4e2d\u590d\u6742\u7684\u793e\u4ea4\u5173\u7cfb\u968f\u65f6\u95f4\u52a8\u6001\u5f62\u6210\u548c\u53d1\u5c55\u3002\u6211\u4eec\u8d4b\u4e88\u8fd9\u4e9bAgent\u5fc3\u7406\u9a71\u52a8\u529b\uff0c\u5e76\u7f6e\u4e8e\u4e00\u4e2a\u6c99\u76d2\u751f\u5b58\u73af\u5883\u4e2d\u3002\u901a\u8fc7\u6258\u9a6c\u65af\u00b7\u970d\u5e03\u65af\u7684\u5960\u57fa\u6027\u793e\u4f1a\u5951\u7ea6\u7406\u8bba\uff08SCT\uff09\u7684\u89c6\u89d2\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u8fd9\u4e2aAgent\u793e\u4f1a\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8d77\u521d\uff0cAgent\u4eec\u8868\u73b0\u51fa\u65e0\u62d8\u65e0\u675f\u7684\u51b2\u7a81\uff0c\u7b26\u5408\u970d\u5e03\u65af\u5bf9\u201c\u81ea\u7136\u72b6\u6001\u201d\u7684\u63cf\u8ff0\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u62df\u7684\u8fdb\u884c\uff0c\u793e\u4f1a\u5951\u7ea6\u9010\u6e10\u5f62\u6210\uff0c\u7edd\u5bf9\u4e3b\u6743\u8005\u5f97\u5230\u4e86\u6388\u6743\uff0c\u8fdb\u800c\u5efa\u7acb\u4e86\u4ee5\u76f8\u4e92\u5408\u4f5c\u4e3a\u57fa\u7840\u7684\u548c\u5e73\u5171\u540c\u4f53\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u53d1\u73b0\u4e0e\u970d\u5e03\u65af\u7406\u8bba\u76f8\u543b\u5408\uff1aLLM\u9a71\u52a8\u7684\u591aAgent\u6a21\u62df\u5c55\u793a\u4e86\u793e\u4f1a\u52a8\u6001\u7684\u590d\u6742\u6027\uff0c\u53ef\u80fd\u590d\u5236\u5851\u9020\u4eba\u7c7b\u793e\u4f1a\u7684\u529b\u91cf\u3002\u5c3d\u7ba1\u65e0\u6cd5\u5b8c\u5168\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u6240\u6709\u7ec6\u5fae\u4e4b\u5904\uff0c\u4f46\u8fd9\u79cd\u6a21\u62df\u5bf9\u4e8e\u7406\u89e3\u
793e\u4f1a\u7ed3\u6784\u3001\u7fa4\u4f53\u52a8\u6001\u548c\u590d\u6742\u4eba\u7c7b\u7cfb\u7edf\u5177\u6709\u6f5c\u5728\u4ef7\u503c\u3002|\n", "2406.14228": "|**2024-06-20**|**EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms**|Siyu Yuan et.al.|[2406.14228](http://arxiv.org/abs/2406.14228)|**[link](https://github.com/siyuyuan/evoagent)**|**\u968f\u7740\u5f3a\u5927\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u4e00\u79cd\u65b0\u7684\u8d8b\u52bf\u662f\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u6784\u5efa\u80fd\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u7684\u81ea\u4e3b\u4ee3\u7406\uff0c\u5c24\u5176\u662f\u591a\u4ee3\u7406\u7cfb\u7edf\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u7684\u6846\u67b6\uff0c\u8fd9\u9650\u5236\u4e86\u4ee3\u7406\u7cfb\u7edf\u7684\u529f\u80fd\u8303\u56f4\u548c\u53ef\u6269\u5c55\u6027\u3002\u5982\u4f55\u81ea\u52a8\u5c06\u4e13\u95e8\u7684\u4ee3\u7406\u6269\u5c55\u5230\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u4ee5\u63d0\u5347\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\uff0c\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51faEvoAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u901a\u8fc7\u8fdb\u5316\u7b97\u6cd5\u81ea\u52a8\u5c06\u4e13\u5bb6\u4ee3\u7406\u6269\u5c55\u5230\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u63d0\u9ad8\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u6267\u884c\u4efb\u52a1\u4e2d\u7684\u6548\u7387\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u89c6\u73b0\u6709\u7684\u4ee3\u7406\u6846\u67b6\u4e3a\u521d\u59cb\u4e2a\u4f53\uff0c\u5e76\u5e94\u7528\u4e00\u7cfb\u5217\u8fdb\u5316\u64cd\u4f5c\uff08\u5982\u7a81\u53d8\u3001\u4ea4\u53c9\u3001\u9009\u62e9\u7b49\uff09\u751f\u6210\u5177\u6709\u4e0d\u540c\u8bbe\u7f6e\u7684\u4ee3\u7406\u3002EvoAgent\u9002\u7528\u4e8e\u4efb\u4f55\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u80fd\u591f\u65e0\u987b\u989d\u5916\u4eba\u5de5\u8bbe\u8ba1\u81ea\u
52a8\u751f\u6210\u6269\u5c55\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEvoAgent\u80fd\u591f\u81ea\u52a8\u4ea7\u751f\u591a\u4e2a\u4e13\u5bb6\u7ea7\u4ee3\u7406\uff0c\u5e76\u663e\u8457\u589e\u5f3a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\u3002**|\n", "2406.13352": "|**2024-06-19**|**AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents**|Edoardo Debenedetti et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|**[link](https://github.com/ethz-spylab/agentdojo)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aAgentDojo\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30\u4f9d\u8d56\u4e8e\u5916\u90e8\u5de5\u5177\u5904\u7406\u4e0d\u53ef\u4fe1\u6570\u636e\u7684AI\u4ee3\u7406\u7684\u5bf9\u6297\u6027\u9c81\u68d2\u6027\u3002\u9762\u5bf9\u4e0d\u65ad\u6f14\u53d8\u7684\u653b\u51fb\u548c\u9632\u5fa1\u624b\u6bb5\uff0cAgentDojo\u4e0d\u662f\u4e00\u4e2a\u9759\u6001\u7684\u6d4b\u8bd5\u5957\u4ef6\uff0c\u800c\u662f\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b0\u4efb\u52a1\u3001\u9632\u5fa1\u7b56\u7565\u4ee5\u53ca\u9002\u5e94\u6027\u653b\u51fb\u7684\u53ef\u6269\u5c55\u73af\u5883\u3002\u5b83\u5305\u542b\u4e8697\u4e2a\u5b9e\u9645\u5e94\u7528\u573a\u666f\u7684\u4efb\u52a1\uff08\u5982\u7ba1\u7406\u7535\u5b50\u90ae\u4ef6\u5ba2\u6237\u7aef\u3001\u5bfc\u822a\u7f51\u4e0a\u94f6\u884c\u7f51\u7ad9\u6216\u9884\u8ba2\u65c5\u884c\uff09\uff0c629\u4e2a\u5b89\u5168\u6d4b\u8bd5\u6848\u4f8b\uff0c\u4ee5\u53ca\u6765\u81ea\u6587\u732e\u7684\u5404\u79cd\u653b\u51fb\u548c\u9632\u5fa1\u65b9\u6cd5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u8bed\u8a00\u6a21\u578b\u5728AgentDojo\u4e2d\u7684\u8868\u73b0\u5e76\u4e0d\u5c3d\u4eba\u610f\uff08\u5373\u4f7f\u6ca1\u6709\u653b\u51fb\uff09\uff0c\u5e76\u4e14\u73b0\u6709\u7684\u63d0\u793a\u6ce8\u5165\u653b\u51fb\u867d\u7136\u80fd\u7834\u574f\u4e00\u4e9b\u5b89\u5168\u7279\u6027\uff0c\u4f46\u5e76\u975e\u6240\u6709\u60c5\u51b5\u90fd\u9002\u7528\u3002\u6211\
u4eec\u671f\u671bAgentDojo\u80fd\u591f\u63a8\u52a8\u7814\u7a76\uff0c\u4ee5\u5bfb\u627e\u5728\u89e3\u51b3\u5e38\u89c1\u4efb\u52a1\u65f6\u65e2\u53ef\u9760\u53c8\u5065\u58ee\u7684AI\u4ee3\u7406\u7684\u65b0\u8bbe\u8ba1\u539f\u5219\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/ethz-spylab/agentdojo\u3002**|\n", "2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia et.al.|[2406.13163](http://arxiv.org/abs/2406.13163)|null|\u53d1\u73b0\u65b0\u6750\u6599\u5bf9\u79d1\u5b66\u548c\u6280\u672f\u5177\u6709\u91cd\u5927\u610f\u4e49\uff0c\u4f46\u76ee\u524d\u4ecd\u662f\u8270\u5de8\u95ee\u9898\uff0c\u56e0\u4e3a\u5316\u5b66\u7a7a\u95f4\u6d69\u701a\u3002\u8fd1\u671f\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u63a8\u52a8\u4e86\u57fa\u4e8e\u6570\u636e\u7684\u65b9\u6cd5\u6765\u5feb\u901f\u7b5b\u9009\u6216\u751f\u6210\u6709\u524d\u666f\u7684\u6750\u6599\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4ecd\u4f9d\u8d56\u5927\u91cf\u8bad\u7ec3\u6570\u636e\uff0c\u4e14\u5f80\u5f80\u7f3a\u4e4f\u4eba\u7c7b\u671f\u671b\u7684\u6750\u6599\u8bbe\u8ba1\u7684\u7075\u6d3b\u6027\u548c\u5316\u5b66\u76f4\u89c9\u3002\u6211\u4eec\u63d0\u51faLLMatDesign\uff0c\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u53ef\u89e3\u91ca\u6750\u6599\u8bbe\u8ba1\u65b0\u6846\u67b6\u3002LLMatDesign\u5229\u7528LLM\u4ee3\u7406\u7406\u89e3\u4eba\u7c7b\u6307\u4ee4\uff0c\u5bf9\u6750\u6599\u8fdb\u884c\u4fee\u6539\uff0c\u5e76\u4f7f\u7528\u63d0\u4f9b\u7684\u5de5\u5177\u8bc4\u4f30\u7ed3\u679c\u3002\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u5148\u524d\u51b3\u7b56\uff0cLLMatDesign\u80fd\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u5feb\u901f\u9002\u5e94\u65b0\u4efb\u52a1\u548c\u6761\u4ef6\u3002\u5728\u79bb\u7ebf\u5b9e\u9a8c\u4e2d\uff0c\u5bf9LLMatDesign\u5728\u591a\u4e2a\u6750\u6599\u8bbe\u8ba1\u4efb\u52a1\u4e2d\u7684\u7cfb\u7edf\u8bc4\u4f30\u8bc1\u5b9e\u4e86\u5b83\u5728\u5c0f\u6570\u636e\u73af\u5883\u4e0b\u5f00\u53d1\u51fa\u5177\u6709\u752
8\u6237\u5b9a\u4e49\u76ee\u6807\u6027\u8d28\u7684\u65b0\u6750\u6599\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c55\u793a\u4e86\u81ea\u4e3bLLM\u5f15\u5bfc\u7684\u8ba1\u7b97\u73af\u5883\u4e0b\u7684\u6750\u6599\u53d1\u73b0\u7684\u975e\u51e1\u6f5c\u529b\uff0c\u9884\u793a\u7740\u672a\u6765\u81ea\u9a7e\u9a76\u5b9e\u9a8c\u5ba4\u7684\u53ef\u80fd\u6027\u3002|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**\u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u4ece\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u4e2d\u8bc6\u522b\u75be\u75c5\u76f8\u5173\u57fa\u56e0\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u5f80\u5f80\u9700\u8981\u6df1\u539a\u7684\u4e13\u957f\u548c\u5927\u91cf\u7684\u4eba\u5de5\u52aa\u529b\uff0c\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u663e\u793a\u51fa\u5728\u81ea\u52a8\u5316\u6b64\u7c7b\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u7c7b\u65b9\u6cd5\u7684\u8bc4\u4f30\u548c\u53d1\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86GenoTEX\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u5206\u6790\u81ea\u52a8\u63a2\u7d22\u7684\u57fa\u51c6\uff0c\u5305\u62ec\u6570\u636e\u96c6\u9009\u62e9\u3001\u9884\u5904\u7406\u548c\u7edf\u8ba1\u5206\u6790\u4efb\u52a1\u3002GenoTEX\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u7ba1\u9053\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u4eba\u7c7b\u751f\u7269\u4fe1\u606f\u5b66\u5bb6\u7cbe\u5fc3\u7f16\u5199\u7684\u6ce8\u91ca\uff0c\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u6df1\u5165\u5206\u6790\u4e
e5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u3002 \u4e3a\u4e86\u63d0\u4f9b\u8fd9\u4e9b\u4efb\u52a1\u7684\u57fa\u7ebf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86GenoAgents\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u56e2\u961f\uff0c\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u89c4\u5212\u3001\u8fed\u4ee3\u6821\u6b63\u4ee5\u53ca\u4e0e\u9886\u57df\u4e13\u5bb6\u54a8\u8be2\u7684\u80fd\u529b\uff0c\u5b83\u4eec\u534f\u4f5c\u63a2\u7d22\u57fa\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u663e\u793a\u4e86LLM\u9a71\u52a8\u65b9\u6cd5\u5728\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u4e2d\u7684\u6f5c\u529b\uff0c\u800c\u9519\u8bef\u5206\u6790\u6307\u51fa\u4e86\u6311\u6218\u548c\u672a\u6765\u7684\u6539\u8fdb\u65b9\u5411\u3002\u6211\u4eec\u63d0\u8baeGenoTEX\u4f5c\u4e3a\u4e00\u4e2a\u6709\u524d\u666f\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8861\u91cf\u548c\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1a\\url{https://github.com/Liu-Hy/GenoTex}\u3002**|\n", "2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu 
et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u591a-agent\u7cfb\u7edf\uff08LLM-MAS\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u5b83\u4eec\u901a\u8fc7\u7cfb\u7edf\u5185\u5404\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u534f\u4f5c\u6765\u5b8c\u6210\u4efb\u52a1\uff0c\u524d\u63d0\u662f\u5171\u4eab\u4fe1\u606f\u3002\u7136\u800c\uff0c\u5f53\u4ee3\u7406\u95f4\u7684\u4ea4\u6d41\u88ab\u7528\u4e8e\u589e\u5f3a\u4eba\u7c7b\u5408\u4f5c\u65f6\uff0c\u7531\u4e8e\u4fe1\u606f\u4e0d\u5bf9\u79f0\uff08\u6bcf\u4e2a\u4ee3\u7406\u4ec5\u80fd\u8bbf\u95ee\u5176\u5bf9\u5e94\u4eba\u7c7b\u7528\u6237\u7684\u4fe1\u606f\uff09\uff0c\u8fd9\u5e26\u6765\u4e86\u65b0\u7684\u6311\u6218\u3002\u4f20\u7edfMAS\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\u96be\u4ee5\u5b8c\u6210\u4efb\u52a1\u3002\u4e3a\u89e3\u51b3\u6b64\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u79f0\u4e3a\u201ciAgents\u201d\uff0c\u5373\u4fe1\u606f\u4e30\u5bcc\u591aagent\u7cfb\u7edf\u3002\u5728iAgents\u4e2d\uff0c\u4eba\u7c7b\u793e\u4f1a\u7f51\u7edc\u5728\u4ee3\u7406\u7f51\u7edc\u4e2d\u5f97\u5230\u53cd\u6620\uff0c\u4ee3\u7406\u4e3b\u52a8\u4ea4\u6362\u5b8c\u6210\u4efb\u52a1\u6240\u9700\u7684\u4eba\u7c7b\u4fe1\u606f\uff0c\u4ece\u800c\u514b\u670d\u4fe1\u606f\u4e0d\u5bf9\u79f0\u3002iAgents\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4ee3\u7406\u63a8\u7406\u673a\u5236\uff0cInfoNav\uff0c\u5f15\u5bfc\u4ee3\u7406\u4e4b\u95f4\u7684\u6709\u6548\u4fe1\u606f\u4ea4\u6d41\u3002\u7ed3\u5408InfoNav\uff0ciAgents\u7ec4\u7ec7\u4e86\u6df7\u5408\u8bb0\u5fc6\u4e2d\u7684\u4eba\u7c7b\u4fe1\u606f\uff0c\u4e3a\u4ee3\u7406\u63d0\u4f9b\u51c6\u786e\u5168\u9762\u7684\u4fe1\u606f\u8fdb\u884c\u4ea4\u6362\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a8\u51fa\u4e86\u9996\u4e2a\u9488\u5bf9\u8bc4\u4f30LLM\u5728\u4fe1\u606f\u4e0d\u5bf9\u79f0\u6761\u4ef6\u4e0b\u
4efb\u52a1\u89e3\u51b3\u80fd\u529b\u7684\u57fa\u51c6\u2014\u2014InformativeBench\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0ciAgents\u80fd\u591f\u5728\u5305\u542b140\u4eba\u548c588\u6761\u5173\u7cfb\u7684\u793e\u4f1a\u7f51\u7edc\u4e2d\u534f\u4f5c\uff0c\u81ea\u4e3b\u8fdb\u884c\u8d85\u8fc730\u8f6e\u7684\u901a\u4fe1\uff0c\u5e76\u4ece\u8fd170,000\u6761\u6d88\u606f\u4e2d\u68c0\u7d22\u4fe1\u606f\uff0c\u57283\u5206\u949f\u5185\u5b8c\u6210\u4efb\u52a1\u3002**|\n", "2406.14884": "|**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u5de5\u5177\uff0c\u88ab\u8bbe\u8ba1\u7528\u4e8e\u901a\u8fc7\u8fed\u4ee3\u89c4\u5212\u548c\u884c\u52a8\u6765\u6267\u884c\u590d\u6742\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u4ee3\u7406\u5728\u5904\u7406\u9700\u8981\u4e13\u4e1a\u77e5\u8bc6\u7684\u4efb\u52a1\u65f6\uff0c\u5bb9\u6613\u4ea7\u751f\u4e0d\u671f\u671b\u7684\u89c4\u5212\u5e7b\u89c9\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u521d\u6b65\u5c1d\u8bd5\u901a\u8fc7\u878d\u5165\u4e0e\u5de5\u4f5c\u6d41\u7a0b\u76f8\u5173\u7684\u5916\u90e8\u77e5\u8bc6\u6765\u589e\u5f3a\u89c4\u5212\u53ef\u9760\u6027\u3002\u5c3d\u7ba1\u663e\u793a\u51fa\u6f5c\u529b\uff0c\u4f46\u6ce8\u5165\u7684\u77e5\u8bc6\u901a\u5e38\u6742\u4e71\u65e0\u7ae0\uff0c\u683c\u5f0f\u591a\u6837\uff0c\u7f3a\u4e4f\u4e25\u8c28\u7684\u89c4\u8303\u5316\u548c\u5168\u9762\u7684\u6bd4\u8f83\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u89c4\u8303\u4e86\u4e0d\u540c\u683c\u5f0f\u7684\u5de5\u4f5c\u6d41\u7a0b\u77e5\u8bc6\uff0c\u5e76\u63d0\u51fa\u4e86FlowBench\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u9762\u5411\u5de5\u4f5c\u6d41\u5f15\u5bfc\u89c4\u5212\u7684\u57fa\u51c6\u3002FlowBench\u6db5\u76d6\u4e86\u6765\u81ea6\u4e2a\u9886\u57df\u768451\u4e2a\u4e0d\u540c\u573a\u666f\uff0c\u5176\u4e2d\u77e5\u8bc6\u4ee5\u591a\u6837\u76
84\u5f62\u5f0f\u5448\u73b0\u3002\u4e3a\u4e86\u8bc4\u4f30\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u5728FlowBench\u4e0a\u7684\u6027\u80fd\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u5c42\u6b21\u7684\u8bc4\u4f30\u6846\u67b6\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5de5\u4f5c\u6d41\u7a0b\u77e5\u8bc6\u5728\u591a\u79cd\u683c\u5f0f\u4e0b\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u8868\u660e\u5f53\u524d\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u5728\u6ee1\u8db3\u6ee1\u610f\u7684\u89c4\u5212\u9700\u6c42\u65b9\u9762\u4ecd\u6709\u5f88\u5927\u7684\u63d0\u5347\u7a7a\u95f4\u3002\u6211\u4eec\u671f\u671b\u8fd9\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u57fa\u51c6\u80fd\u4e3a\u672a\u6765\u7684\u4ee3\u7406\u89c4\u5212\u7814\u7a76\u94fa\u5e73\u9053\u8def\u3002|\n", "2406.17232": "|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|\u6784\u5efa\u903c\u771f\u7684\u4eba\u5de5\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bf9\u4e8e\u5b9e\u73b0\u53ef\u4fe1\u7684\u793e\u4f1a\u6a21\u62df\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u57fa\u4e8e\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u7684\u89d2\u8272\u626e\u6f14\u6709\u65f6\u80fd\u63d0\u5347\u4eba\u6027\u5316\uff0c\u4f46\u6548\u679c\u5e76\u4e0d\u603b\u662f\u7406\u60f3\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u662f\u5426\u53ef\u4ee5\u901a\u8fc7\u6574\u5408\u6765\u81ea\u5b9e\u8bc1\u4eba\u7c7b\u4fe1\u5ff5\u7f51\u7edc\u7684\u4fe1\u606f\uff0c\u8fdb\u4e00\u6b65\u63d0\u5347LLMs\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u5951\u5408\u5ea6\u3002\u6211\u4eec\u5229\u7528\u4e00\u9879\u4eba\u7c7b\u8c03\u67e5\u6570\u636e\uff0c\u4f30\u8ba1\u4e86\u4e00\u4e2a\u5305\u542b18\u4e2a\u4e3b\u9898\u7684\u4fe1\u5ff5\u7f51\u7edc\uff0c\u8fd9\u4e9b\u4e3b\u9898\u52a0\u8f7d\u4e8e\u4e24\u4e2a\u4e0d\u91cd\u53e0\u7684\u6f5c\u5728\u56e0\u5b50\u4e0a\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728LLM\u4e2d\u690d\u5165\u4e00\u4e2a\u
5173\u4e8e\u67d0\u4e2a\u8bdd\u9898\u7684\u89c2\u70b9\uff0c\u8bc4\u4f30\u5176\u5bf9\u5269\u4f59\u6d4b\u8bd5\u8bdd\u9898\u8868\u8fbe\u7684\u610f\u89c1\u4e0e\u76f8\u5e94\u4eba\u7c7b\u6570\u636e\u7684\u543b\u5408\u7a0b\u5ea6\u3002\u4ec5\u4f9d\u8d56\u4eba\u53e3\u7edf\u8ba1\u6570\u636e\u7684\u89d2\u8272\u626e\u6f14\u672a\u80fd\u4f7fLLM\u548c\u4eba\u7c7b\u89c2\u70b9\u4fdd\u6301\u4e00\u81f4\uff0c\u7136\u800c\uff0c\u5f53\u7ed9\u4ee3\u7406\u6ce8\u5165\u5355\u4e00\u4fe1\u5ff5\u65f6\uff0c\u5b83\u663e\u8457\u63d0\u9ad8\u4e86\u4e0e\u4fe1\u5ff5\u7f51\u7edc\u5185\u76f8\u5173\u8bdd\u9898\u7684\u5951\u5408\uff0c\u800c\u5bf9\u4e8e\u7f51\u7edc\u5916\u7684\u8bdd\u9898\u5219\u5f71\u54cd\u4e0d\u5927\u3002\u8fd9\u4e9b\u7ed3\u679c\u4e3a\u5728\u8bd5\u56fe\u6a21\u62df\u548c\u7406\u89e3\u793e\u4f1a\u4e2d\u4fe1\u5ff5\u5206\u5e03\u6a21\u5f0f\u7684\u5de5\u4f5c\u4e2d\uff0c\u5b9e\u73b0\u4eba\u4e0eLLM\u7684\u4fe1\u5ff5\u5bf9\u9f50\u63d0\u4f9b\u4e86\u4e00\u6761\u65b0\u8def\u5f84\u3002|\n", "2406.18702": "|**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R. 
Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u865a\u62df\u4ee3\u7406\u6765\u6a21\u62df\u7acb\u6cd5\u8fc7\u7a0b\uff0c\u5177\u4f53\u805a\u7126\u4e8e\u7f8e\u56fd\u53c2\u8bae\u9662\u60c5\u62a5\u59d4\u5458\u4f1a\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4ee3\u8868\u4e2a\u522b\u53c2\u8bae\u5458\u7684\u4ee3\u7406\uff0c\u5e76\u5728\u6a21\u62df\u7684\u59d4\u5458\u4f1a\u8ba8\u8bba\u4e2d\u8ba9\u5b83\u4eec\u4e92\u52a8\u3002\u8fd9\u4e9b\u4ee3\u7406\u5c55\u73b0\u51fa\u5728\u73b0\u5b9e\u8fa9\u8bba\u4e2d\u7684\u80fd\u529b\uff0c\u80fd\u591f\u63d0\u4f9b\u6df1\u601d\u719f\u8651\u7684\u89c2\u70b9\uff0c\u5e76\u5728\u7279\u5b9a\u6761\u4ef6\u4e0b\u627e\u5230\u4e24\u515a\u7684\u89e3\u51b3\u65b9\u6848\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6a21\u62df\u663e\u793a\uff0c\u9762\u5bf9\u5916\u90e8\u5e72\u6270\u65f6\uff0c\u4ee3\u7406\u6a21\u578b\u5728\u4e24\u515a\u5408\u4f5c\u4e0a\u5c55\u73b0\u51fa\u8f6c\u53d8\u7684\u6f5c\u529b\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u7b56\u7565\u53ef\u80fd\u6210\u4e3a\u7406\u89e3\u548c\u6539\u8fdb\u7acb\u6cd5\u6d41\u7a0b\u7684\u6709\u6548\u5de5\u5177\uff0c\u8fd9\u4e0e\u4e00\u7cfb\u5217\u53d1\u73b0\u76f8\u547c\u5e94\uff0c\u5373\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u80fd\u6709\u7528\u5730\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u73b0\u8c61\u3002\u672a\u6765\u7684\u7814\u7a76\u5c06\u81f4\u529b\u4e8e\u63d0\u5347\u4ee3\u7406\u7684\u590d\u6742\u6027\uff0c\u6269\u5927\u6a21\u62df\u8303\u56f4\uff0c\u5e76\u63a2\u7d22\u5728\u653f\u7b56\u6d4b\u8bd5\u548c\u8c08\u5224\u4e2d\u7684\u5e94\u7528\u3002|\n", "2406.19966": "|**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao 
et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|\u5927\u591a\u6570\u7ecf\u6d4e\u7406\u8bba\u901a\u5e38\u5047\u8bbe\u91d1\u878d\u5e02\u573a\u53c2\u4e0e\u8005\u662f\u5b8c\u5168\u7406\u6027\u7684\u4e2a\u4f53\uff0c\u5e76\u4f7f\u7528\u6570\u5b66\u6a21\u578b\u6765\u6a21\u62df\u4eba\u7c7b\u5728\u91d1\u878d\u5e02\u573a\u7684\u884c\u4e3a\u3002\u7136\u800c\uff0c\u4eba\u7c7b\u884c\u4e3a\u5f80\u5f80\u5e76\u975e\u5b8c\u5168\u7406\u6027\uff0c\u7528\u6570\u5b66\u6a21\u578b\u7cbe\u786e\u9884\u6d4b\u9887\u5177\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\\textbf{A}gent-based \\textbf{S}imulated \\textbf{F}inancial \\textbf{M}arket\uff08ASFM\uff09\uff0c\u9996\u5148\u6784\u5efa\u4e86\u4e00\u4e2a\u5177\u6709\u771f\u5b9e\u8ba2\u5355\u5339\u914d\u7cfb\u7edf\u7684\u6a21\u62df\u80a1\u7968\u5e02\u573a\u3002\u63a5\u7740\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u80a1\u7968\u4ea4\u6613\u4ee3\u7406\uff0c\u5b83\u5305\u62ec\u4e2a\u4eba\u6982\u51b5\u3001\u89c2\u5bdf\u548c\u57fa\u4e8e\u5de5\u5177\u5b66\u4e60\u7684\u52a8\u4f5c\u6a21\u5757\u3002\u8fd9\u79cd\u4ea4\u6613\u4ee3\u7406\u80fd\u591f\u5168\u9762\u7406\u89e3\u5f53\u524d\u5e02\u573a\u52a8\u6001\u548c\u91d1\u878d\u653f\u7b56\u4fe1\u606f\uff0c\u4ece\u800c\u6839\u636e\u5176\u4ea4\u6613\u7b56\u7565\u4f5c\u51fa\u51b3\u7b56\u3002\u5b9e\u9a8c\u8868\u660e\uff0cASFM\u5728\u53ef\u63a7\u573a\u666f\u4e0b\u7684\u53cd\u5e94\u4e0e\u73b0\u5b9e\u80a1\u7968\u5e02\u573a\u4e00\u81f4\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5728\u4e24\u4e2a\u7ecf\u6d4e\u5b66\u7814\u7a76\u70ed\u70b9\u9886\u57df\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u53d1\u73b0\uff0c\u6211\u4eec\u7684\\model\u5f97\u51fa\u7684\u7ed3\u8bba\u4e0e\u7ecf\u6d4e\u5b66\u7814\u7a76\u7684\u521d\u6b65\u53d1\u73b0\u76f8\u543b\u5408\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8ba4\u4e3aASFM\u4e3a\u7ecf\u6d4e\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u8303\u5f0f\u3002|\n", "2407.02483": 
"|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u4e0d\u5982\u4e13\u4e1a\u6a21\u578b\u3002\u8fd1\u671f\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\uff0c\u901a\u8fc7\u7528\u6237\u8f93\u5165\u9009\u62e9\u5408\u9002\u7684\u4e13\u7528\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u533b\u7597\u9886\u57df\uff0c\u8fd9\u7c7b\u8fdb\u5c55\u7684\u5e94\u7528\u8fd8\u4e0d\u5e7f\u6cdb\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4e00\u79cd\u4e13\u4e3a\u533b\u7597\u8bbe\u8ba1\u7684\u4ee3\u7406\uff0c\u540d\u4e3a\\textbf{M}ulti-modal \\textbf{Med}ical \\textbf{Agent}\uff08MMedAgent\uff09\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u516d\u4e2a\u533b\u7597\u5de5\u5177\uff0c\u7528\u4e8e\u89e3\u51b3\u4e03\u9879\u4efb\u52a1\uff0c\u4f7f\u4ee3\u7406\u80fd\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u9002\u5b9c\u7684\u5de5\u5177\u3002\u5b9e\u9a8c\u5168\u9762\u5c55\u793a\u4e86MMedAgent\u5728\u5404\u79cd\u533b\u7597\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u5f00\u6e90\u65b9\u6cd5\uff0c\u751a\u81f3\u5305\u62ec\u5c01\u95ed\u6e90\u6a21\u578bGPT-4o\uff0c\u4e14\u5728\u5f15\u5165\u548c\u6574\u5408\u65b0\u533b\u7597\u5de5\u5177\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6548\u6027\u3002|\n", "2407.01887": "|**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents**|Fanzeng Xia 
et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|\u672c\u6587\u5173\u6ce8\u7684\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u51b3\u7b56\u5236\u5b9a\u4e2d\u7684\u6027\u80fd\uff0c\u5c24\u5176\u662f\u5728\u675c\u5c14\u514b\u59c6\u53cc\u81c2\u8d4c\u535a\uff08Dueling Bandits\uff0cDB\uff09\u95ee\u9898\u7684\u4e0a\u4e0b\u6587\u4e2d\u3002\u7814\u7a76\u6bd4\u8f83\u4e86GPT-3.5-Turbo\u3001GPT-4\u548cGPT-4-Turbo\u4e0e\u73b0\u6709DB\u7b97\u6cd5\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c24\u5176\u662fGPT-4 Turbo\uff0c\u80fd\u591f\u5feb\u901f\u8bc6\u522b\u51fa\u4f18\u52bf\u660e\u663e\u7684\u9009\u9879\uff0c\u4ece\u800c\u5728\u5f31\u540e\u6094\u65b9\u9762\u8d85\u8d8a\u5f53\u524d\u6700\u4f73\u7b97\u6cd5\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u6536\u655b\u6027\u4e0a\u5b58\u5728\u95ee\u9898\uff0c\u5bf9\u63d0\u793a\u7684\u654f\u611f\u5ea6\u8f83\u9ad8\uff0c\u4e14\u5bf9\u63d0\u793a\u53d8\u5316\u53cd\u5e94\u8106\u5f31\u3002\u4e3a\u4e86\u6539\u8fdb\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u4e86LLM\u51b3\u7b56\u80fd\u529b\u4e0e\u7ecf\u5178DB\u7b97\u6cd5\u7406\u8bba\u4fdd\u8bc1\u7684\u589e\u5f3a\u578b\u7b97\u6cd5\u2014\u2014IF-Enhanced LLM\u3002\u8fd9\u79cd\u8bbe\u8ba1\u5c55\u793a\u4e86\u5982\u4f55\u589e\u5f3aLLM\u5728\u5bf9\u6027\u80fd\u7a33\u5b9a\u6027\u6709\u8981\u6c42\u7684\u51b3\u7b56\u4efb\u52a1\u4e2d\u7684\u53ef\u4fe1\u5ea6\u3002IF-Enhanced LLM\u5177\u6709\u5f31\u540e\u6094\u548c\u5f3a\u540e\u6094\u7684\u7406\u8bba\u4fdd\u8bc1\u3002\u5b9e\u9a8c\u7ed3\u679c\u9a8c\u8bc1\u4e86\u5373\u4f7f\u9762\u5bf9\u5608\u6742\u548c\u5bf9\u6297\u6027\u7684\u63d0\u793a\uff0cIF-Enhanced LLM\u4ecd\u4fdd\u6301\u7a33\u5065\u3002|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia 
et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u7684\u81ea\u52a8\u5316\uff0c\u5982\u4ee3\u7801\u5408\u6210\u3001\u7a0b\u5e8f\u4fee\u590d\u548c\u6d4b\u8bd5\u751f\u6210\uff0c\u5df2\u53d6\u5f97\u663e\u8457\u8fdb\u6b65\u3002\u7814\u7a76\u4eba\u5458\u548c\u4e1a\u754c\u5b9e\u8df5\u8005\u5df2\u7ecf\u5f00\u53d1\u51fa\u5404\u79cd\u81ea\u4e3bLLM\u4ee3\u7406\u6765\u6267\u884c\u7aef\u5230\u7aef\u7684\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u3001\u8fd0\u884c\u547d\u4ee4\u3001\u89c2\u5bdf\u73af\u5883\u53cd\u9988\u5e76\u89c4\u5212\u672a\u6765\u884c\u52a8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8e\u4ee3\u7406\u7684\u65b9\u6cd5\u7684\u590d\u6742\u6027\u4ee5\u53ca\u5f53\u524dLLM\u7684\u5c40\u9650\u6027\uff0c\u5f15\u53d1\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u662f\u5426\u771f\u7684\u9700\u8981\u4f7f\u7528\u590d\u6742\u7684\u81ea\u4e3b\u8f6f\u4ef6\u4ee3\u7406\uff1f\u4e3a\u4e86\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u6784\u5efa\u4e86Agentless\u2014\u2014\u4e00\u79cd\u65e0\u4ee3\u7406\u65b9\u6cd5\uff0c\u7528\u4e8e\u81ea\u52a8\u89e3\u51b3\u8f6f\u4ef6\u5f00\u53d1\u95ee\u9898\u3002\u4e0e\u590d\u6742\u7684\u4ee3\u7406\u8bbe\u7f6e\u76f8\u6bd4\uff0cAgentless\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u4e24\u9636\u6bb5\u8fc7\u7a0b\uff1a\u5b9a\u4f4d\u540e\u4fee\u590d\uff0c\u4e0d\u8ba9LLM\u51b3\u5b9a\u672a\u6765\u7684\u884c\u52a8\u6216\u64cd\u4f5c\u590d\u6742\u7684\u5de5\u5177\u3002\u5728\u6d41\u884c\u7684SWE-bench 
Lite\u57fa\u51c6\u4e0a\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u4ee4\u4eba\u60ca\u8bb6\u5730\u8868\u660e\uff0c\u8fd9\u79cd\u7b80\u5355\u7684\u65b9\u6cd5\u80fd\u591f\u5b9e\u73b0\u6700\u9ad8\u6027\u80fd\uff0827.33%\uff09\u548c\u6700\u4f4e\u6210\u672c\uff080.34\u7f8e\u5143\uff09\uff0c\u8d85\u8d8a\u6240\u6709\u5f00\u6e90\u8f6f\u4ef6\u4ee3\u7406\uff01 \u6b64\u5916\uff0c\u6211\u4eec\u624b\u52a8\u5206\u7c7b\u4e86SWE-bench Lite\u4e2d\u7684\u95ee\u9898\uff0c\u5e76\u53d1\u73b0\u5b58\u5728\u7cbe\u786e\u7684ground truth\u8865\u4e01\u95ee\u9898\u6216\u63cf\u8ff0\u4e0d\u8db3/\u8bef\u5bfc\u6027\u7684\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86SWE-bench Lite-S\uff0c\u901a\u8fc7\u6392\u9664\u8fd9\u4e9b\u95ee\u9898\u6765\u8fdb\u884c\u66f4\u4e25\u683c\u7684\u8bc4\u4f30\u548c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u7a81\u663e\u4e86\u5f53\u524d\u88ab\u5ffd\u89c6\u7684\u7b80\u5355\u3001\u53ef\u89e3\u91ca\u6280\u672f\u5728\u81ea\u4e3b\u8f6f\u4ef6\u5f00\u53d1\u4e2d\u7684\u6f5c\u529b\u3002\u6211\u4eec\u5e0c\u671bAgentless\u5c06\u4f5c\u4e3a\u81ea\u4e3b\u8f6f\u4ef6\u4ee3\u7406\u7684\u57fa\u7ebf\u3001\u8d77\u70b9\u548c\u671f\u671b\u503c\uff0c\u6fc0\u53d1\u672a\u6765\u5728\u8fd9\u4e2a\u5173\u952e\u9886\u57df\u7684\u5de5\u4f5c\u3002**|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye 
et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u6536\u96c6\u5168\u7403\u4fe1\u606f\uff0c\u5e76\u8fdb\u884c\u63a8\u7406\u4ee5\u89e3\u51b3\u590d\u6742\u95ee\u9898\uff0c\u8fd9\u5f15\u53d1\u4e86\u4f7f\u7528LLM\u9884\u6d4b\u56fd\u9645\u4e8b\u4ef6\u7684\u5174\u8da3\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u4e00\u4e2a\u4e25\u683c\u8bc4\u4f30LLM\u9884\u6d4b\u80fd\u529b\u4e0e\u53ef\u9760\u6027\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51faMIRAI\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4ef7LLM\u5728\u56fd\u9645\u4e8b\u4ef6\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u4e2d\u7684\u8868\u73b0\u3002MIRAI\u6784\u5efa\u4e86\u4e00\u4e2a\u4ee3\u7406\u73af\u5883\uff0c\u914d\u5907\u6709\u8bbf\u95ee\u5e7f\u6cdb\u5386\u53f2\u7ed3\u6784\u5316\u4e8b\u4ef6\u548c\u6587\u672c\u65b0\u95fb\u6570\u636e\u5e93\u7684\u5de5\u5177\u3002\u6211\u4eec\u5bf9GDELT\u4e8b\u4ef6\u6570\u636e\u5e93\u8fdb\u884c\u4e86\u7cbe\u5fc3\u6e05\u6d17\u548c\u89e3\u6790\uff0c\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u5173\u8054\u9884\u6d4b\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u4e0d\u540c\u9884\u6d4b\u65f6\u95f4\u8303\u56f4\uff0c\u4ece\u77ed\u671f\u5230\u957f\u671f\uff0c\u4ee5\u68c0\u9a8cLLM\u5728\u6574\u5408\u5168\u7403\u5173\u952e\u4fe1\u606f\u3001\u8fd0\u7528\u9886\u57df\u7279\u5b9aAPI\u548c\u5e93\u7f16\u5199\u4ee3\u7801\u4ee5\u53ca\u7efc\u5408\u5904\u7406\u6765\u81ea\u591a\u79cd\u683c\u5f0f\u548c\u65f6\u95f4\u7684\u5386\u53f2\u77e5\u8bc6\u4ee5\u51c6\u786e\u9884\u6d4b\u672a\u6765\u4e8b\u4ef6\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u6211\u4eec\u7684\u76ee\u6807\u662f\u5efa\u7acb\u4e00\u4e2a\u53ef\u9760\u7684\u6846\u67b6\uff0c\u4ee5\u8bc4\u4f30LLM\u5728\u56fd\u9645\u4e8b\u4ef6\u9884\u6d4b\u65b9\u9762\u7684\u6027\u8
0fd\uff0c\u4ece\u800c\u63a8\u52a8\u66f4\u7cbe\u786e\u548c\u53ef\u4fe1\u7684\u56fd\u9645\u5173\u7cfb\u5206\u6790\u6a21\u578b\u7684\u53d1\u5c55\u3002|\n", "2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u663e\u8457\u8fdb\u6b65\uff0c\u57fa\u4e8eLLM\u7684\u79fb\u52a8\u4ee3\u7406\u5df2\u6210\u4e3a\u4eba\u673a\u4ea4\u4e92\u9886\u57df\u7684\u7814\u7a76\u70ed\u70b9\u3002\u7136\u800c\uff0c\u9488\u5bf9\u6b64\u7c7b\u4ee3\u7406\u7684\u57fa\u51c6\u6d4b\u8bd5\u8d44\u6e90\u76f8\u5bf9\u532e\u4e4f\u3002\u8bc4\u4f30\u8fd9\u7c7b\u4ee3\u7406\u901a\u5e38\u9762\u4e34\u4e09\u4e2a\u6311\u6218\uff1a\uff081\uff09\u4ec5\u4f9d\u8d56\u7528\u6237\u754c\u9762\uff08UI\uff09\u64cd\u4f5c\u7684\u4f4e\u6548\u9650\u5236\u4e86\u4efb\u52a1\u8bc4\u4f30\uff1b\uff082\uff09\u5355\u4e00\u5e94\u7528\u4e2d\u7684\u7279\u5b9a\u6307\u4ee4\u4e0d\u8db3\u4ee5\u5168\u9762\u8bc4\u4f30LLM\u79fb\u52a8\u4ee3\u7406\u7684\u591a\u7ef4\u5ea6\u63a8\u7406\u548c\u51b3\u7b56\u80fd\u529b\uff1b\uff083\uff09\u5f53\u524d\u7684\u8bc4\u4f30\u6307\u6807\u65e0\u6cd5\u51c6\u786e\u8861\u91cf\u8fde\u7eed\u52a8\u4f5c\u8fc7\u7a0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Mobile-Bench\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u7528\u4e8e\u8bc4\u4f30LLM\u79fb\u52a8\u4ee3\u7406\u80fd\u529b\u7684\u57fa\u51c6\u3002\u9996\u5148\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u4f20\u7edf\u7684UI\u64cd\u4f5c\uff0c\u878d\u5165\u4e86103\u4e2a\u6536\u96c6\u5230\u7684API\uff0c\u4ee5\u63d0\u9ad8\u4efb\u52a1\u5b8c\u6210\u7684\u6548\u7387\u3002\u63a5\u7740\uff0c\u6211\u4eec\u901a\u8fc7\u7ed3\u5408\u771f\u5b9e\u7528\u6237\u67e5\u8be2\u548cLLM\u589e\u5f3a\u7684\u6570\u636e\u6536\u96c6\u6765\u8fdb\u884c\u8bc4\u4f30\u3002\u4e3a\u4e86\u66f4\u597d\u5730\u8bc4\u4ef7\u79fb\u52a8\u4ee3\u7406\u7684\u4e0d\u540c\u89c4\u5212\u80fd\u529b\u5c42\u6b21\uff0c\u6211\u4eec\u7684\u6570\u636e\u8
8ab\u5206\u4e3aSAST\uff08\u7b80\u5355\u4efb\u52a1\uff09\u3001SAMT\uff08\u7a0d\u590d\u6742\u4efb\u52a1\uff09\u548cMAMT\uff08\u591a\u4efb\u52a1\uff09\u4e09\u7c7b\uff0c\u53cd\u6620\u4e86\u4efb\u52a1\u590d\u6742\u5ea6\u7684\u5dee\u5f02\u3002Mobile-Bench\u5305\u542b832\u6761\u6570\u636e\u6761\u76ee\uff0c\u5176\u4e2d\u8d85\u8fc7200\u9879\u4efb\u52a1\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u6d4b\u8bd5\u8de8\u5e94\u7528\u534f\u4f5c\u573a\u666f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u66f4\u7cbe\u786e\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3aCheckPoint\uff0c\u7528\u4e8e\u68c0\u67e5LLM\u79fb\u52a8\u4ee3\u7406\u5728\u89c4\u5212\u548c\u63a8\u7406\u6b65\u9aa4\u4e2d\u662f\u5426\u8fbe\u5230\u5173\u952e\u70b9\u3002|\n", "2407.00476": "|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**\u968f\u7740\u4f20\u7edf\u4f18\u5316\u548c\u8c03\u5ea6\u65b9\u6cd5\u9010\u6e10\u8f6c\u5411\u7528\u6237\u9a71\u52a8\u548c\u4e2a\u4eba\u5316\u670d\u52a1\uff0c\u4ee5\u63d0\u5347\u7528\u6237\u4f53\u9a8c\uff08QoE\uff09\u548c\u7075\u6d3b\u6027\uff0c\u672a\u6765\u7684\u7cfb\u7edf\uff0c\u5c24\u5176\u662f\u5728\u65e0\u7ebf\u548c\u6570\u5b57\u5316\u80fd\u6e90\u7f51\u7edc\u4e2d\uff0c\u9762\u4e34\u7740\u5982\u4f55\u66f4\u597d\u5730\u7406\u89e3\u548c\u54cd\u5e94\u7528\u6237\u9700\u6c42\u7684\u6311\u6218\u3002\u4f20\u7edf\u7684\u7cfb\u7edf\u5f80\u5f80\u5ffd\u89c6\u4e86\u7528\u6237\u7684\u4e2a\u6027\u5316\u9700\u6c42\uff0c\u56e0\u4e3a\u7528\u6237\u4e0e\u673a\u5668\u4e4b\u95f4\u7684\u6c9f\u901a\u4e0d\u7545\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u5e26\u6765\u4e86\u7a81\u7834\uff0c\u5b83\u4eec\u63d0\u4f9b\u4e86\u7528\u6237\u4e0e\u8bbe\u5907\u4e4b\u95f4\u81ea\u7136\u7684\u4ea4\u6d41\u754c\u9762\u3002\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4
e00\u79cd\u65b0\u9896\u7684\u67b6\u6784\uff0c\u901a\u8fc7\u6784\u5efa\u4e09\u4e2aLLM\u4ee3\u7406\u6765\u5c06\u7528\u6237\u7684\u8bed\u97f3\u8bf7\u6c42\uff08VRQ\uff09\u8f6c\u5316\u4e3a\u8d44\u6e90\u5206\u914d\u5411\u91cf\u3002\u5177\u4f53\u5305\u62ec\uff1aLLM\u610f\u56fe\u8bc6\u522b\u4ee3\u7406\u5c06\u8bf7\u6c42\u8f6c\u5316\u4e3a\u4f18\u5316\u95ee\u9898\uff08OP\uff09\u3001LLM OP\u53c2\u6570\u8bc6\u522b\u4ee3\u7406\u4ee5\u53caLLM OP\u6c42\u89e3\u4ee3\u7406\u3002 \u6211\u4eec\u9488\u5bf9\u7535\u52a8\u6c7d\u8f66\uff08EV\uff09\u5145\u7535\u7684\u5178\u578bVRQ\u521b\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u5e93\uff0c\u4f5c\u4e3a\u6027\u80fd\u8bc4\u4f30\u7684\u57fa\u7840\u3002\u4f5c\u4e3a\u6982\u5ff5\u9a8c\u8bc1\uff0c\u6211\u4eec\u4e3b\u8981\u4f7f\u7528Llama 3 8B\u6a21\u578b\u8fdb\u884c\u5b9e\u9a8c\u3002\u901a\u8fc7\u4e0d\u540c\u7684\u63d0\u793a\u5de5\u7a0b\u573a\u666f\u6d4b\u8bd5\uff0c\u7ed3\u679c\u663e\u793a\u4e86\u6240\u63d0\u67b6\u6784\u7684\u6709\u6548\u6027\u3002\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u4e00\u4e9b\u5173\u952e\u89c1\u89e3\uff0c\u4f8b\u5982\uff0c\u7528\u4e8e\u5efa\u6a21\u5b9e\u9645\u95ee\u9898\u7684\u66f4\u5927\u5019\u9009OP\u96c6\u53ef\u80fd\u4f1a\u7531\u4e8e\u66f4\u9ad8\u7684\u8bc6\u522b/OP\u5206\u7c7b\u566a\u58f0\u800c\u964d\u4f4e\u6700\u7ec8\u6027\u80fd\u3002\u6240\u6709\u7ed3\u679c\u548c\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u4f9b\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u5e94\u7528\u3002**|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang 
et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|\u4eba\u5de5\u667a\u80fd\u5728\u91d1\u878d\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u6b63\u5728\u91cd\u5851\u6570\u636e\u5904\u7406\u548c\u89e3\u8bfb\u65b9\u5f0f\u3002\u5176\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u81ea\u52a8\u5316\u590d\u6742\u4efb\u52a1\u3001\u63d0\u5347\u5ba2\u6237\u670d\u52a1\uff0c\u5e76\u63d0\u4f9b\u8be6\u5c3d\u7684\u8d22\u52a1\u5206\u6790\u3002\u9996\u5148\uff0c\u6211\u4eec\u4ecb\u7ecdIDEA-FinBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u91d1\u878d\u77e5\u8bc6\u65b9\u9762\u7684\u6027\u80fd\u800c\u8bbe\u8ba1\u7684\u8bc4\u4ef7\u57fa\u51c6\u3002\u5b83\u501f\u9274\u4e86\u4e24\u4e2a\u5168\u7403\u77e5\u540d\u4e14\u6743\u5a01\u7684\u91d1\u878d\u4e13\u4e1a\u8003\u8bd5\u4e2d\u7684\u95ee\u9898\uff0c\u65e8\u5728\u5168\u9762\u68c0\u9a8cLLMs\u89e3\u7b54\u4e0e\u91d1\u878d\u76f8\u5173\u8003\u9898\u7684\u80fd\u529b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63d0\u51faIDEA-FinKER\uff0c\u662f\u4e00\u4e2a\u91d1\u878d\u77e5\u8bc6\u589e\u5f3a\u6846\u67b6\uff0c\u65e8\u5728\u5feb\u901f\u8ba9\u901a\u7528LLMs\u9002\u5e94\u91d1\u878d\u9886\u57df\u3002\u5b83\u91c7\u7528\u57fa\u4e8e\u68c0\u7d22\u7684\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\uff0c\u5b9e\u73b0\u5b9e\u65f6\u4e0a\u4e0b\u6587\u7ea7\u77e5\u8bc6\u6ce8\u5165\uff0c\u5e76\u63d0\u4f9b\u4e00\u5957\u9ad8\u8d28\u91cf\u7684\u91d1\u878d\u77e5\u8bc6\u6307\u4ee4\uff0c\u7528\u4e8e\u5fae\u8c03\u4efb\u4f55\u901a\u7528\u6a21\u578b\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86IDEA-FinQA\uff0c\u4e00\u4e2a\u7531LLMs\u9a71\u52a8\u7684\u91d1\u878d\u95ee\u7b54\u7cfb\u7edf\u3002\u8be5\u7cfb\u7edf\u56f4\u7ed5\u5b9e\u65f6\u77e5\u8bc6\u6ce8\u5165\u548c\u4e8b\u5b9e\u5f3a\u5316\u7684\u67b6\u6784\u6784\u5efa\uff0c\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u3002IDEA-FinQA\u4e3b\u8981\u7531\u6570\u636e\u6536\u96c6\u56
68\u3001\u6570\u636e\u67e5\u8be2\u6a21\u5757\u548c\u6267\u884c\u7279\u5b9a\u529f\u80fd\u7684LLM\u4ee3\u7406\u7ec4\u6210\u3002|\n"}, "llm": {"2405.10311": "|**2024-05-16**|**UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models**|Sahel Sharifymoghaddam et.al.|[2405.10311](http://arxiv.org/abs/2405.10311)|null|## \u80cc\u666f \u8fd1\u671f\uff0c\u591a\u6a21\u6001\uff08MM\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u89e3\u9501\u4e86\u8bb8\u591a\u9700\u8981\u591a\u6a21\u6001\u7406\u89e3\uff08\u5982\u56fe\u50cf\u63cf\u8ff0\u6216\u89c6\u89c9\u95ee\u7b54\uff09\u548c\u751f\u6210\uff08\u5982\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u751f\u6210\u6216\u7f16\u8f91\uff09\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347MM-LLMs\u7684\u8f93\u51fa\u8d28\u91cf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684UniRAG\u6280\u672f\uff0c\u5b83\u5728\u63a8\u7406\u9636\u6bb5\u5c06\u76f8\u5173\u68c0\u7d22\u4fe1\u606f\u6dfb\u52a0\u5230\u63d0\u793a\u4e2d\uff0c\u4f5c\u4e3a\u5c11\u91cf\u6837\u4f8b\u3002\u4e0e\u666e\u904d\u8ba4\u4e3a\u68c0\u7d22\u589e\u5f3a\uff08RA\uff09\u4e3b\u8981\u6539\u8fdb\u7f55\u89c1\u5b9e\u4f53\u7684\u751f\u6210\u6216\u7406\u89e3\u4e0d\u540c\uff0c\u6211\u4eec\u5728MSCOCO\u6570\u636e\u96c6\u4e0a\u5bf9\u5305\u62ecGPT4\u3001Gemini-Pro\u5728\u5185\u7684\u4e13\u6709\u6a21\u578b\u4ee5\u53caLlava\u3001LaVIT\u548cEmu2\u7b49\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8f93\u5165\u63d0\u793a\u901a\u8fc7MM\u68c0\u7d22\u5668\uff08\u5982UniIR\u6a21\u578b\uff09\u589e\u5f3a\u540e\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u8d28\u91cf\u3002|\n", "2405.10305": "|**2024-05-16**|**4D Panoptic Scene Graph Generation**|Jingkang Yang 
et.al.|[2405.10305](http://arxiv.org/abs/2405.10305)|**[link](https://github.com/jingkang50/psg4d)**|**\u6211\u4eec\u751f\u6d3b\u5728\u4e00\u4e2a\u4e09\u7ef4\u7a7a\u95f4\u4e2d\uff0c\u540c\u65f6\u901a\u8fc7\u7b2c\u56db\u7ef4\u65f6\u95f4\u5411\u524d\u63a8\u8fdb\u3002\u4e3a\u4e86\u4f7f\u4eba\u5de5\u667a\u80fd\u80fd\u591f\u5168\u9762\u7406\u89e3\u8fd9\u79cd4D\u73af\u5883\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8868\u793a\u5f62\u5f0f\u2014\u20144D\u5168\u666f\u573a\u666f\u56fe\uff08PSG-4D\uff09\uff0c\u5b83\u5c06\u52a8\u60014D\u4e16\u754c\u4e2d\u7684\u539f\u59cb\u89c6\u89c9\u6570\u636e\u62bd\u8c61\u4e3a\u8282\u70b9\u548c\u8fb9\uff0c\u8282\u70b9\u4ee3\u8868\u5177\u6709\u7cbe\u786e\u4f4d\u7f6e\u548c\u72b6\u6001\u4fe1\u606f\u7684\u5b9e\u4f53\uff0c\u8fb9\u6355\u6349\u65f6\u95f4\u5173\u7cfb\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5728\u8fd9\u4e00\u65b0\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u6ce8\u91caPSG-4D\u6570\u636e\u96c6\uff0c\u5305\u542b3000\u4e2aRGB-D\u89c6\u9891\uff0c\u603b\u8ba1100\u4e07\u5e27\uff0c\u6bcf\u5e27\u90fd\u5e26\u67094D\u5168\u666f\u5206\u5272\u63a9\u7801\u4ee5\u53ca\u8be6\u7ec6\u7684\u52a8\u6001\u573a\u666f\u56fe\u6807\u7b7e\u3002\u6211\u4eec\u4e3a\u6b64\u4efb\u52a1\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSG4DFormer\u7684Transformer\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u5168\u666f\u5206\u5272\u63a9\u7801\uff0c\u6cbf\u65f6\u95f4\u8f74\u8ddf\u8e2a\u63a9\u7801\uff0c\u5e76\u901a\u8fc7\u5173\u7cfb\u7ec4\u4ef6\u751f\u6210\u76f8\u5e94\u7684\u573a\u666f\u56fe\u3002\u5728\u65b0\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u672a\u6765\u7684PSG-4D\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u878d\u5165\u6211\u4eec\u7684PSG-4D\u7cfb\u7edf\u6765\u5b9e\u73b0\u52a8\u6001\
u573a\u666f\u7406\u89e3\u7684\u4e00\u4e2a\u5b9e\u9645\u5e94\u7528\u793a\u4f8b\u3002**|\n", "2405.10299": "|**2024-05-16**|**HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models**|Rhea Sanjay Sukthanker et.al.|[2405.10299](http://arxiv.org/abs/2405.10299)|**[link](https://github.com/automl/hw-aware-llm-bench)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u786c\u4ef6\u6307\u6807\uff08\u5982\u5ef6\u8fdf\u3001\u80fd\u8017\u3001GPU\u5185\u5b58\u4f7f\u7528\u548c\u6027\u80fd\uff09\u4e4b\u95f4\u7684\u6743\u8861\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u4eba\u4eec\u6b63\u5728\u5bfb\u6c42\u4e3a\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u914d\u7f6e\u5efa\u7acb\u5e15\u7d2f\u6258\u524d\u6cbf\uff0c\u4ee5\u5728\u6307\u5b9a\u786c\u4ef6\u9650\u5236\u4e0b\u627e\u5230\u6700\u4f18\u6a21\u578b\u3002\u7136\u800c\uff0c\u5bf9\u591a\u79cd\u67b6\u6784\u5728\u591a\u53f0\u8bbe\u5907\u4e0a\u7684\u5168\u9762\u8bad\u7ec3\u548c\u8bc4\u4f30\u5728\u8ba1\u7b97\u4e0a\u662f\u4e0d\u53ef\u884c\u7684\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HW-GPT-Bench\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u786c\u4ef6\u611f\u77e5\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u57fa\u51c6\uff0c\u5229\u7528\u795e\u7ecf\u67b6\u6784\u641c\u7d22\uff08NAS\uff09\u4e2d\u7684\u6743\u91cd\u5171\u4eab\u6280\u672f\uff0c\u5728\u4e00\u4e2a\u6a21\u578b\u4e2d\u9ad8\u6548\u5730\u8bad\u7ec3\u5305\u542b\u4e0d\u540c\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u8d85\u7f51\u7edc\u3002\u6211\u4eec\u572813\u79cd\u8bbe\u5907\u4e0a\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u6027\u80fd\u5256\u6790\uff0c\u8003\u8651\u4e865\u79cd\u786c\u4ef6\u6307\u6807\u548c3\u79cd\u4e0d\u540c\u7684\u6a21\u578b\u89c4\u6a21\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc78\u79cd\u4e0d\u540c\u7684\u591a\u76ee\u6807NAS\u7b97\u6cd5\u5c55\u793a\u4e86HW-GPT-Bench\u7684\u53ef\u7528\u6027\uff0c\u5e76\u8bc4\u4f30\u4e86\u7531\u6b64\u4ea7\u751f\u7684\u5e15\u7d2f\u6258\u524d\u6cbf\u7684\u8d28\u91cf\u3
002\u6211\u4eec\u7684\u76ee\u6807\u662f\u63a8\u52a8\u548c\u52a0\u901f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u76ee\u6807\u65b9\u6cd5\uff0c\u5982NAS\u548c\u7ed3\u6784\u5316\u526a\u679d\u7684\u7814\u7a76\u3002**|\n", "2405.10288": "|**2024-05-16**|**Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction**|Jianhao Chen et.al.|[2405.10288](http://arxiv.org/abs/2405.10288)|**[link](https://github.com/jianhaochen-nju/tsdre)**|**\u6458\u8981\uff1a** \u4e8b\u5b9e\u62bd\u53d6\u5bf9\u4e8e\u6784\u5efa\u77e5\u8bc6\u56fe\u8c31\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5bf9\u65f6\u95f4\u76f8\u5173\u4e8b\u5b9e\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u9700\u6c42\u589e\u957f\uff0c\u51fa\u73b0\u4e86\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u4efb\u52a1\u3002\u672c\u6587\u7279\u522b\u5173\u6ce8\u4ece\u81ea\u7136\u8bed\u8a00\u6587\u672c\u4e2d\u63d0\u53d6\u65f6\u95f4\u6027\u4e8b\u5b9e\u3002\u5148\u524d\u7684\u7814\u7a76\u672a\u80fd\u59a5\u5584\u5904\u7406\u590d\u6742\u53e5\u5b50\u4e2d\u65f6\u95f4\u4e0e\u4e8b\u5b9e\u5bf9\u5e94\u5173\u7cfb\u7684\u5efa\u7acb\u96be\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u65f6\u95f4\u7ebf\u7684\u53e5\u5b50\u5206\u89e3\u7b56\u7565\uff0c\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u4e8b\u5b9e\u76f8\u5173\u65f6\u95f4\u7ebf\u7684\u7cbe\u7ec6\u7406\u89e3\u3002\u7136\u800c\uff0c\u76f4\u63a5\u4f7f\u7528LLMs\u8fdb\u884c\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u6027\u80fd\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86TSDRE\u65b9\u6cd5\uff0c\u5c06LLMs\u7684\u5206\u89e3\u80fd\u529b\u878d\u5165\u5230\u5c0f\u578b\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u4f20\u7edf\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002 
\u4e3a\u4e86\u652f\u6301\u8bc4\u4f30\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u590d\u6742\u7684\u65f6\u5e8f\u4e8b\u5b9e\u62bd\u53d6\u6570\u636e\u96c6ComplexTRED\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTSDRE\u5728HyperRED-Temporal\u548cComplexTRED\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002|\n", "2405.10276": "|**2024-05-16**|**Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers**|Tuo Zhang et.al.|[2405.10276](http://arxiv.org/abs/2405.10276)|null|\u8fd1\u5e74\u6765\uff0c\u8bb8\u591a\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u7b56\u7565\u6027\u63d0\u793a\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6548\u80fd\u3002\u7279\u522b\u662f\u4f18\u5316\u901a\u8fc7prompting\uff08OPRO\uff09\u65b9\u6cd5\u8868\u73b0\u51fa\u9876\u5c16\u6027\u80fd\uff0c\u5b83\u5229\u7528LLMs\u4f5c\u4e3a\u4f18\u5316\u5668\uff0c\u76ee\u6807\u662f\u5bfb\u627e\u80fd\u6700\u5927\u5316\u4efb\u52a1\u51c6\u786e\u6027\u7684\u6307\u4ee4\u3002\u672c\u8bba\u6587\u91cd\u65b0\u5ba1\u89c6\u4e86OPRO\u5728\u5c0f\u578bLLMs\uff08\u5982LaMa-2\u7cfb\u5217\u548cMistral 7B\uff09\u4e0a\u7684\u81ea\u52a8\u5316\u63d0\u793a\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u4e8e\u5c0f\u578bLLMs\uff0cOPRO\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u5176\u6709\u9650\u7684\u63a8\u7406\u80fd\u529b\u9650\u5236\u4e86\u4f18\u5316\u6f5c\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u672a\u6765\u7684\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u5e94\u540c\u65f6\u8003\u8651\u6a21\u578b\u80fd\u529b\u548c\u8ba1\u7b97\u6210\u672c\u3002\u9488\u5bf9\u5c0f\u578bLLMs\uff0c\u6211\u4eec\u63a8\u8350\u76f4\u63a5\u63d0\u4f9b\u660e\u786e\u9610\u8ff0\u76ee\u6807\u548c\u65b9\u6cd5\u7684\u6307\u4ee4\uff0c\u4f5c\u4e3a\u7a33\u5065\u7684\u63d0\u793a\u57fa\u7ebf\uff0c\u4ee5\u786e\u4fdd\u5728\u5f53\u524d\u7814\u7a76\u4e2d\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u63d0\u793a\u8bbe\u8ba1\u3002|\n", "2405.10260": 
"|**2024-05-16**|**Keep It Private: Unsupervised Privatization of Online Text**|Calvin Bao et.al.|[2405.10260](http://arxiv.org/abs/2405.10260)|**[link](https://github.com/csbao/kip-privatization)**|**## \u80cc\u666f \u4f5c\u8005\u8eab\u4efd\u6df7\u6dc6\u6280\u672f\u6709\u671b\u901a\u8fc7\u81ea\u52a8\u91cd\u5199\u6587\u672c\u6765\u4fdd\u62a4\u7f51\u7edc\u901a\u4fe1\u4e2d\u7684\u4e2a\u4eba\u9690\u79c1\u3002\u7136\u800c\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6587\u732e\u4e2d\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u8bc4\u4f30\u5927\u591a\u5c40\u9650\u5728\u72ed\u5c0f\u573a\u666f\u4e0b\uff0c\u4e3b\u8981\u4f9d\u8d56\u4e8e\u8868\u9762\u7684\u7f16\u8f91\u64cd\u4f5c\uff0c\u53ef\u80fd\u5bfc\u81f4\u8f93\u51fa\u4e0d\u81ea\u7136\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u6587\u672c\u79c1\u5bc6\u5316\u6846\u67b6\uff0c\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u751f\u6210\u517c\u987e\u51c6\u786e\u3001\u8fde\u8d2f\u548c\u9690\u79c1\u7684\u91cd\u5199\u3002\u6211\u4eec\u5728\u5927\u89c4\u6a21\u7684\u82f1\u8bedReddit\u5e16\u5b50\u6d4b\u8bd5\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u8be5\u6570\u636e\u96c6\u753168,000\u540d\u4f5c\u8005\u64b0\u5199\uff0c\u5305\u542b\u77ed\u5230\u4e2d\u7b49\u957f\u5ea6\u7684\u6587\u672c\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5728\u4e0d\u540c\u8bc4\u4f30\u6761\u4ef6\u4e0b\uff0c\u5982\u4f5c\u8005\u7b80\u4ecb\u957f\u5ea6\u548c\u4f5c\u8005\u8bc6\u522b\u7b56\u7565\uff0c\u6027\u80fd\u7684\u53d8\u5316\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u81ea\u52a8\u5316\u6307\u6807\u548c\u4eba\u5de5\u8bc4\u4f30\u4e2d\u4fdd\u6301\u9ad8\u6587\u672c\u8d28\u91cf\uff0c\u5e76\u6210\u529f\u5730\u89c4\u907f\u4e86\u51e0\u79cd\u81ea\u52a8\u4f5c\u8005\u8bc6\u522b\u653b\u51fb\u3002**|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng 
Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u4f53\u5728\u7a7a\u95f4\u7406\u89e3\u4e0e\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u8986\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u7ed3\u5408\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u663e\u793a\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u4e5f\u6307\u51fa\u4e86\u6316\u63983D-LLMs\u5168\u90e8\u6f5c\u529b\u6240\u9700\u7684\u521b\u65b0\u65b9\u6cd5\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63d0\u4f9b\u6
307\u5bfc\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u8c03\u67e5\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251](http://arxiv.org/abs/2405.10251)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u8bc4\u4f30\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u7b49\u65b9\u9762\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u636e\u6211\u4eec\u6240\u77e5\uff0c\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u6df1\u5165\u7814\u7a76\uff0c\u8fd9\u662f\u8861\u91cf\u6a21\u578b\u4f18\u79c0\u7a0b\u5ea6\u7684\u5173\u952e\u6807\u51c6\u3002\u56e0\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u5168\u9762\u8bc4\u4f30\u77e5\u540d\u4e14\u6027\u80fd\u51fa\u8272\u7684LLMs\uff0c\u5305\u62ecChatGPT\u3001ChatGLM\u3001\u57fa\u4e8eT5\u7684\u6a21\u578b\u3001\u57fa\u4e8eLLaMA\u7684\u6a21\u578b\u548cPythia\u6a21\u578b\uff0c\u5728\u5bf9\u8bdd\u751f\u6210\u548c\u6587\u672c\u603b\u7ed3\u7b49NLG\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u6db5\u76d6\u82f1\u8bed\u548c\u4e2d\u6587\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5171\u540c\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5305\u62ec\u8f93\u5165\u6a21\u677f\u548c\u540e\u5904\u7406\u7b56\u7565\u3002\u7814\u7a76\u7ed3\u679c\u62a5\u544a\u4e86\u81ea\u52a8\u8bc4\u5206\uff0c\u540c\u65f6\u8fdb\u884c\u4e86\u8be6\u7ec6\u5206\u6790\u3002|\n", "2405.10250": "|**2024-05-16**|**IntelliExplain: Enhancing Interactive Code 
Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u53ef\u6267\u884c\u4ee3\u7801\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u4e92\u52a8\u529f\u80fd\uff0c\u7528\u6237\u53ef\u4ee5\u901a\u8fc7\u8fed\u4ee3\u53cd\u9988\u6307\u5bfc\u6a21\u578b\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u4e92\u52a8\u65b9\u5f0f\u5f80\u5f80\u5047\u8bbe\u7528\u6237\u5177\u5907\u8c03\u8bd5\u6e90\u4ee3\u7801\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5bf9\u975e\u4e13\u4e1a\u7a0b\u5e8f\u5458\u4e0d\u592a\u53cb\u597d\u3002\u8fd9\u4f7f\u5f97\u4f7f\u4e92\u52a8\u4ee3\u7801\u751f\u6210\u5bf9\u4e0d\u540c\u7f16\u7a0b\u6c34\u5e73\u7684\u4e2a\u4f53\u66f4\u6613\u4e8e\u4f7f\u7528\u6210\u4e3a\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86IntelliExplain\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u673a\u4ea4\u4e92\u8303\u5f0f\uff0c\u901a\u8fc7\u8ba9\u7528\u6237\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u89e3\u91ca\u4e0e\u6e90\u4ee3\u7801\u4e92\u52a8\uff0c\u63d0\u5347\u975e\u4e13\u4e1a\u4eba\u58eb\u7684\u4f53\u9a8c\u3002\u7528\u6237\u901a\u8fc7\u63d0\u4f9b\u4ed6\u4eec\u53d1\u73b0\u9519\u8bef\u7684\u81ea\u7136\u8bed\u8a00\u7ea0\u6b63\u53cd\u9988\uff0c\u6765\u6307\u5bfc\u7cfb\u7edf\u4fee\u8ba2\u4ee3\u7801\uff0c\u76f4\u5230\u7528\u6237\u5bf9\u7cfb\u7edf\u7684\u4ee3\u7801\u89e3\u91ca\u611f\u5230\u6ee1\u610f\u3002\u6211\u4eec\u7684\u7528\u6237\u7814\u7a76\u663e\u793a\uff0c\u4f7f\u7528IntelliExplain\u7684\u7528\u6237\u5728Text-to-SQL\u548cPython\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6210\u529f\u7387\u5206\u522b\u6bd4\u7eafGPT-3.5\u63d0\u9ad8\u4e8611.6%\u548c25.3%\uff0c\u540c\u65f6\u6240\u9700\u65f6\u95f4\u5206\u522b\u51cf\u5c11\u4e8639.0%\u548c15.6%\u3002|\n", "2405.10212": 
"|**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5fc3\u7406\u5b66\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014CPsyExam\uff0c\u5b83\u6e90\u4e8e\u4e2d\u56fd\u8bed\u8a00\u8003\u8bd5\u7684\u95ee\u9898\u3002CPsyExam\u65e8\u5728\u5206\u522b\u5f3a\u8c03\u5fc3\u7406\u5b66\u77e5\u8bc6\u548c\u6848\u4f8b\u5206\u6790\u7684\u91cd\u8981\u6027\uff0c\u8ba4\u8bc6\u5230\u5c06\u5fc3\u7406\u5b66\u77e5\u8bc6\u5e94\u7528\u4e8e\u5b9e\u9645\u60c5\u5883\u7684\u4ef7\u503c\u3002\u4ece22,000\u4e2a\u95ee\u9898\u5e93\u4e2d\uff0c\u6211\u4eec\u7cbe\u9009\u4e864,000\u4e2a\u6765\u6784\u5efa\u8be5\u57fa\u51c6\uff0c\u786e\u4fdd\u4e86\u4e3b\u9898\u7684\u5747\u8861\u8986\u76d6\uff0c\u5e76\u5305\u542b\u4e86\u5404\u79cd\u6848\u4f8b\u5206\u6790\u65b9\u6cd5\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5bf9\u4e00\u7cfb\u5217\u73b0\u6709\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548cAPI\u57fa\u7840\u7684\u6a21\u578b\u3002\u5b9e\u9a8c\u548c\u5206\u6790\u7ed3\u679c\u663e\u793a\uff0cCPsyExam\u662f\u4e00\u4e2a\u6709\u6548\u7684\u786e\u7acb\u8bed\u8a00\u6a21\u578b\u5bf9\u5fc3\u7406\u5b66\u7406\u89e3\u80fd\u529b\u7684\u57fa\u51c6\uff0c\u540c\u65f6\u652f\u6301\u5728\u4e0d\u540c\u7c92\u5ea6\u4e0a\u6bd4\u8f83\u8fd9\u4e9b\u6a21\u578b\u3002|\n", "2405.10936": "|**2024-05-17**|**A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers**|Kaiyu Huang 
et.al.|[2405.10936](http://arxiv.org/abs/2405.10936)|**[link](https://github.com/kaiyuhwang/mllm-survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u8bed\u8a00\u80fd\u529b\uff0c\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u51cf\u5c11\u6f5c\u5728\u7684\u6b67\u89c6\u5e76\u63d0\u5347\u6280\u672f\u7684\u901a\u7528\u6027\u548c\u53ef\u8bbf\u95ee\u6027\uff0c\u5bf9\u4e8e\u591a\u8bed\u8a00\u6280\u672f\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1LLMs\u53d6\u5f97\u4e86\u7a81\u7834\uff0c\u4f46\u5bf9\u591a\u8bed\u8a00\u573a\u666f\u7684\u6df1\u5165\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u56e0\u6b64\uff0c\u8feb\u5207\u9700\u8981\u4e00\u4efd\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u603b\u7ed3\u8fd1\u671f\u7684\u65b9\u6cd5\u3001\u8fdb\u5c55\u3001\u5c40\u9650\u6027\u548c\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u65e8\u5728\u4ece\u591a\u4e2a\u89d2\u5ea6\u5ba1\u89c6LLMs\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u56de\u987e\u4e86\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u7814\u7a76\u7684\u5386\u53f2\u6f14\u53d8\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86LLMs\u7684\u591a\u8bed\u8a00\u7279\u6027\uff0c\u5305\u62ec\u8bad\u7ec3\u548c\u63a8\u7406\u65b9\u6cd5\u3001\u6a21\u578b\u5b89\u5168\u3001\u8de8\u9886\u57df\u4e0e\u6587\u5316\u9002\u5e94\u4ee5\u53ca\u6570\u636e\u96c6\u4f7f\u7528\u3002\u6211\u4eec\u8fd8\u5206\u6790\u4e86\u8fd9\u4e9b\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u53ef\u80fd\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u8fdb\u4e00\u6b65\u63d0\u5347LLMs\u7684\u591a\u8bed\u8a00\u6027\u80fd\u3002\u672c\u7efc\u8ff0\u65e8\u5728\u5e2e\u52a9\u7814\u7a76\u754c\u5e94\u5bf
9\u591a\u8bed\u8a00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u5173\u4e8e\u57fa\u4e8eLLMs\u7684\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6838\u5fc3\u6982\u5ff5\u3001\u5173\u952e\u6280\u672f\u53ca\u6700\u65b0\u8fdb\u5c55\u7684\u5168\u9762\u7406\u89e3\u3002**|\n", "2405.10928": "|**2024-05-17**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928](http://arxiv.org/abs/2405.10928)|**[link](https://github.com/apolloresearch/rib)**|### \u6982\u8ff0 \u673a\u68b0\u89e3\u91ca\u6027\u76ee\u6807\u662f\u901a\u8fc7\u9006\u5411\u5de5\u7a0b\u7406\u89e3\u795e\u7ecf\u7f51\u7edc\u7684\u884c\u4e3a\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5728\u89e3\u6790\u795e\u7ecf\u7f51\u7edc\u6fc0\u6d3b\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5bf9\u6fc0\u6d3b\u7684\u5206\u89e3\uff0c\u4f7f\u5f97\u5355\u4e2a\u795e\u7ecf\u5143\u6216\u6a21\u578b\u7ec4\u4ef6\u65e0\u6cd5\u6e05\u6670\u5bf9\u5e94\u4e8e\u72ec\u7279\u7684\u7279\u5f81\u6216\u529f\u80fd\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53ef\u89e3\u91ca\u6027\u65b9\u6cd5\u2014\u2014\u5c40\u90e8\u4ea4\u4e92\u57fa\uff08Local Interaction Basis\uff0cLIB\uff09\u3002LIB\u65e8\u5728\u901a\u8fc7\u6d88\u9664\u65e0\u5173\u6fc0\u6d3b\u548c\u4ea4\u4e92\uff0c\u8bc6\u522b\u8ba1\u7b97\u7279\u5f81\u3002\u8be5\u65b9\u6cd5\u6452\u5f03\u65e0\u610f\u4e49\u7684\u6fc0\u6d3b\u65b9\u5411\uff0c\u5e76\u4f7f\u57fa\u7840\u4e0e\u76f8\u90bb\u5c42\u95f4\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u5947\u5f02\u5411\u91cf\u5bf9\u9f50\u3002\u540c\u65f6\uff0c\u5b83\u6839\u636e\u7279\u5f81\u5bf9\u540e\u7eed\u8ba1\u7b97\u7684\u91cd\u8981\u6027\u8fdb\u884c\u7f29\u653e\uff0c\u751f\u6210\u4e00\u4e2a\u663e\u793a\u6a21\u578b\u4e2d\u6240\u6709\u8ba1\u7b97\u76f8\u5173\u7279\u6027\u548c\u4ea4\u4e92\u7684\u56fe\u8c31\u3002 
\u6211\u4eec\u5728\u6a21\u5757\u52a0\u6cd5\u548cCIFAR-10\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86LIB\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u76f8\u6bd4\u4e8e\u4e3b\u6210\u5206\u5206\u6790\uff0cLIB\u80fd\u8bc6\u522b\u51fa\u66f4\u591a\u8ba1\u7b97\u76f8\u5173\u7684\u7279\u5f81\uff0c\u5e76\u5448\u73b0\u51fa\u66f4\u7a00\u758f\u7684\u4ea4\u4e92\u3002\u7136\u800c\uff0c\u5728\u5e94\u7528\u4e8e\u8bed\u8a00\u6a21\u578b\u65f6\uff0cLIB\u5e76\u672a\u663e\u8457\u63d0\u9ad8\u53ef\u89e3\u91ca\u6027\u6216\u4ea4\u4e92\u7a00\u758f\u5ea6\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c3d\u7ba1LIB\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u7406\u8bba\u9a71\u52a8\u65b9\u6cd5\uff0c\u4f46\u5f53\u524d\u5f62\u5f0f\u5e76\u4e0d\u9002\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002|\n", "2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. Panagoulias et.al.|[2405.10893](http://arxiv.org/abs/2405.10893)|null|\u8fd9\u7bc7\u6280\u672f\u8bba\u6587\u9610\u8ff0\u4e86COGNET-MD\uff0c\u4e00\u4e2a\u4e13\u4e3a\u533b\u7597\u9886\u57df\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u7684\u65b0\u57fa\u51c6\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8bc4\u5206\u6846\u67b6\uff0c\u65e8\u5728\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u533b\u5b66\u6587\u672c\u7684\u80fd\u529b\uff0c\u5e76\u4e14\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u96be\u5ea6\u5206\u7ea7\u7684\u591a\u9879\u9009\u62e9\u9898\uff08MCQ\uff09\u6570\u636e\u5e93\u3002\u8fd9\u4e2a\u6570\u636e\u5e93\u7531\u591a\u4e2a\u533b\u7597\u9886\u57df\u7684\u4e13\u5bb6\u5408\u4f5c\u521b\u5efa\uff0c\u4ee5\u53cd\u6620\u5f53\u524d\u533b\u5b66\u8d8b\u52bf\uff0c\u786e\u4fdd\u5b89\u5168\u3001\u5b9e\u7528\u548c\u9002\u7528\u6027\u3002\u521d\u671f\u7248\u672c\u5305\u542b\u4e86\u7cbe\u795e\u79d1\u3001\u7259\u79d1\u3001\u80ba\u75c5\u5b66\u3001\u76ae\u80a4\u79d1\u548c\u5185\u5206\u6ccc\u5b66\u7b49\u9886\u57d
f\u7684\u9898\u76ee\uff0c\u4f46\u4f1a\u6301\u7eed\u6269\u5c55\uff0c\u672a\u6765\u8fd8\u4f1a\u52a0\u5165\u66f4\u591a\u533b\u5b66\u5b66\u79d1\u3002|\n", "2405.10883": "|**2024-05-17**|**Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review**|Hongyi Yang et.al.|[2405.10883](http://arxiv.org/abs/2405.10883)|null|\u8be5\u7efc\u8ff0\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4f30\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u60a3\u8005\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u73b0\u72b6\u548c\u524d\u666f\uff0c\u4ee5\u53ca\u5176\u5bf9\u5eb7\u590d\u8fc7\u7a0b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u4ece2012\u5e74\u81f3\u73b0\u5728\u7b5b\u9009\u4e8670\u9879\u7814\u7a76\uff0c\u91cd\u70b9\u5173\u6ce8\u673a\u5668\u5b66\u4e60\u3001\u6df1\u5ea6\u5b66\u4e60\u3001\u5f3a\u5316\u5b66\u4e60\u7b49\u6280\u672f\u5728\u5fc3\u7406\u5065\u5eb7\u5e72\u9884\u548c\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\u3001\u6280\u672f\u7c7b\u522b\u3001\u4ea7\u54c1\u548c\u6570\u636e\u7c7b\u578b\uff0c\u5982\u751f\u6001\u77ac\u65f6\u8bc4\u4f30\u3001\u884c\u4e3a\u548c\u8bed\u97f3\u6570\u636e\u7684\u5206\u6790\u3002\u7ed3\u679c\u663e\u793a\uff0cAI\u5728\u75c7\u72b6\u76d1\u6d4b\u3001\u590d\u53d1\u98ce\u9669\u9884\u6d4b\u548c\u5eb7\u590d\u6cbb\u7597\u4e2d\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u672c\u7814\u7a76\u8fd8\u63a2\u8ba8\u4e86\u57fa\u4e8eAI\u7684\u65b0\u5174\u4ea7\u54c1\u3001\u6280\u672f\u548c\u5206\u6790\u65b9\u6cd5\uff0c\u5982\u793e\u4ea4\u5a92\u4f53\u5206\u6790\u3001\u4e25\u8083\u6e38\u620f\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5eb7\u590d\u4e2d\u7684\u6f5c\u5728\u6311\u6218\u548c\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u603b\u7684\u6765\u8bf4\uff0c\u8fd9\u7bc7\u8bba\u6587\u7cfb\u7edf\u56de\u987e\u4e86AI\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\uff0c\u5e76\u4e3a\u672a\u6765\u7684\u7814\u7a76\u8def\u5f84\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u76
84\u89c1\u89e3\u548c\u5efa\u8bae\u3002|\n", "2405.10853": "|**2024-05-17**|**The Future of Large Language Model Pre-training is Federated**|Lorenzo Sani et.al.|[2405.10853](http://arxiv.org/abs/2405.10853)|null|## \u80cc\u666f \u751f\u6210\u5f0f\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u7684\u51fa\u8272\u8868\u73b0\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5f97\u76ca\u4e8e\u5b83\u4eec\u6240\u63a5\u53d7\u7684\u6d77\u91cf\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u5df2\u5efa\u7acb\u7684\u89c4\u6a21\u6cd5\u5219\uff0cLLMs\u672a\u6765\u6027\u80fd\u7684\u63d0\u5347\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6211\u4eec\u80fd\u591f\u5229\u7528\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\u3002\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u6709\u53ef\u80fd\u91ca\u653e\u5168\u7403\u5927\u90e8\u5206\u672a\u5145\u5206\u5229\u7528\u7684\u6570\u636e\u548c\u8ba1\u7b97\u80fd\u529b\uff0c\u8fd9\u4e9b\u662f\u5f53\u524d\u4ee5\u6570\u636e\u4e2d\u5fc3\u4e3a\u4e2d\u5fc3\u7684LLM\u8bad\u7ec3\u65b9\u6cd5\u6240\u5ffd\u89c6\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7a33\u5065\u3001\u7075\u6d3b\u4e14\u53ef\u590d\u73b0\u7684FL\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdb\u673a\u6784\u95f4\u7684\u5927\u89c4\u6a21\u534f\u4f5c\uff0c\u5171\u540c\u8bad\u7ec3LLMs\uff0c\u4ece\u800c\u52a8\u5458\u66f4\u591a\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\uff0c\u751a\u81f3\u53ef\u80fd\u8fbe\u5230\u6216\u8d85\u8d8a\u4e2d\u5fc3\u5316\u7684\u6027\u80fd\u3002 ## \u4efb\u52a1 
\u6211\u4eec\u7684\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cdFL\u8bad\u7ec3\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u5728\u6709\u9650\u8d44\u6e90\u4e0b\u6269\u5c55\u5230\u767e\u4ebf\u5143\u7ea7\u7684\u8054\u90a6LLM\uff0c\u4f7f\u5f97\u62e5\u6709\u4e30\u5bcc\u6570\u636e\u7684\u5b9e\u4f53\u80fd\u591f\u6210\u4e3a\u9884\u8bad\u7ec3LLMs\u7684\u4e3b\u5bfc\u529b\u91cf\uff0c\u800c\u4e0d\u662f\u4ec5\u8ba9\u8ba1\u7b97\u8d44\u6e90\u4e30\u5bcc\u7684\u673a\u6784\u72ec\u5360\u9ccc\u5934\u3002\u8fd9\u79cd\u65b9\u6cd5\u5f3a\u8c03\u4e86\u8054\u90a6\u8bad\u7ec3\u7684\u89c4\u6a21\u6548\u76ca\uff0c\u5e76\u4e3a\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b9e\u7528\u8def\u5f84\u3002|\n", "2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825](http://arxiv.org/abs/2405.10825)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5353\u8d8a\u7684\u7406\u89e3\u548c\u63a8\u7406\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u5b83\u4eec\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u5c24\u5176\u5728\u7b2c\u516d\u4ee3\uff086G\uff09\u901a\u4fe1\u6280\u672f\u7684\u63a8\u52a8\u4e0b\u5c55\u73b0\u51fa\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6f5c\u529b\u3002\u672c\u7814\u7a76\u65e8\u5728\u5168\u9762\u6982\u8ff0LLM\u8d4b\u80fd\u7684\u7535\u4fe1\u7f51\u7edc\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u4e86LLMs\u7684\u57fa\u7840\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u63a8\u7406\u4e0e\u5e94\u7528\u3001\u6a21\u578b\u8bc4\u4f30\uff0c\u4ee5\u53ca\u5728\u7535\u4fe1\u90e8\u7f72\u4e2d\u7684\u8fd0\u7528\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8LLM\u652f\u6301\u7684\u5173\u952e\u6280\u672f\u548c\u7535\u4fe1\u5e94\u7528\uff0c\u6d89\u53ca\u751f\u6210\u3001\u5206\u7c7b\u3001\u4f18\u5316\u548c\u9884\u6d4b\u95ee\u9898\
u3002\u751f\u6210\u5e94\u7528\u5305\u62ec\u7535\u4fe1\u9886\u57df\u77e5\u8bc6\u3001\u4ee3\u7801\u548c\u7f51\u7edc\u914d\u7f6e\u81ea\u52a8\u751f\u6210\u3002\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u4efb\u52a1\u6db5\u76d6\u7f51\u7edc\u5b89\u5168\u3001\u6587\u672c\u3001\u56fe\u50cf\u548c\u6d41\u91cf\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u5229\u7528LLMs\u7684\u81ea\u52a8\u5316\u4f18\u5316\u6280\u672f\uff0c\u5982\u5f3a\u5316\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u8bbe\u8ba1\u548c\u53e3\u8bed\u5f3a\u5316\u5b66\u4e60\u3002\u5bf9\u4e8e\u9884\u6d4b\u95ee\u9898\uff0cLLMs\u53ef\u7528\u4e8e\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u548c\u591a\u6a21\u6001\u7535\u4fe1\u9884\u6d4b\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86LLM\u8d4b\u80fd\u7535\u4fe1\u7f51\u7edc\u6240\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u3002|\n", "2405.10808": "|**2024-05-17**|**ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios**|Markus Bayer et.al.|[2405.10808](http://arxiv.org/abs/2405.10808)|null|\u4e3b\u52a8\u5b66\u4e60\u65e8\u5728\u901a\u8fc7\u4f18\u5148\u5904\u7406\u6700\u80fd\u63d0\u5347\u5b66\u4e60\u6548\u679c\u7684\u5b9e\u4f8b\u6765\u51cf\u5c11\u6807\u6ce8\u5de5\u4f5c\u91cf\u3002\u7136\u800c\uff0c\u8bb8\u591a\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u9762\u4e34\u201c\u51b7\u542f\u52a8\u201d\u95ee\u9898\uff0c\u5373\u5728\u521d\u671f\u9700\u8981\u5927\u91cf\u6570\u636e\u624d\u80fd\u53d1\u6325\u6548\u80fd\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982BERT\uff09\u4e0a\u7684\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u5df2\u8868\u73b0\u826f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u2014\u2014ActiveLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u3001Llama 3\u548cMistral 
Large\uff09\u8fdb\u884c\u5b9e\u4f8b\u9009\u62e9\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0cActiveLLM\u663e\u8457\u63d0\u9ad8\u4e86BERT\u5206\u7c7b\u5668\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u548cSetFit\u7b49\u5c11\u6570\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\u3002\u6b64\u5916\uff0cActiveLLM\u8fd8\u80fd\u6269\u5c55\u5230\u975e\u5c11\u91cf\u6837\u672c\u573a\u666f\uff0c\u652f\u6301\u8fed\u4ee3\u9009\u62e9\uff0c\u4ece\u800c\u5e2e\u52a9\u5176\u4ed6\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u514b\u670d\u51b7\u542f\u52a8\u96be\u9898\u3002\u7ed3\u679c\u8868\u660e\uff0cActiveLLM\u4e3a\u6539\u5584\u4e0d\u540c\u5b66\u4e60\u73af\u5883\u4e2d\u7684\u6a21\u578b\u6027\u80fd\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2405.10745": "|**2024-05-17**|**Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings**|Albert Sawczyn et.al.|[2405.10745](http://arxiv.org/abs/2405.10745)|null|### \u7ffb\u8bd1 
\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u5bf9\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6280\u672f\u63d0\u51fa\u4e86\u4e25\u5cfb\u6311\u6218\u3002\u901a\u5e38\u91c7\u7528\u7684\u65b9\u6cd5\uff0c\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5728\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u65f6\u5f80\u5f80\u5b58\u5728\u5c40\u9650\u6027\u3002\u7136\u800c\uff0c\u4eba\u4eec\u5df2\u7ecf\u52aa\u529b\u901a\u8fc7\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u6765\u5f25\u8865\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u901a\u8fc7\u5c06\u5c0f\u89c4\u6a21\u7684\u9886\u57df\u7279\u5b9aKG\u4e0e\u901a\u7528KG\u76f8\u7ed3\u5408\u3002\u5c3d\u7ba1KG\u5728\u77e5\u8bc6\u8868\u793a\u65b9\u9762\u5177\u6709\u4f18\u52bf\uff0c\u4f46\u6784\u5efa\u5b83\u4eec\u7684\u6210\u672c\u53ef\u80fd\u963b\u788d\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\u548c\u5e94\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u94fe\u63a5\u5230\u5927\u89c4\u6a21\u901a\u7528KG\u6765\u63d0\u5347\u5c0f\u578b\u9886\u57df\u7279\u5b9aKG\u5d4c\u5165\u7684\u5b66\u4e60\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5e26\u6765\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u4f8b\u5982\uff0cHits@10\u6307\u6807\u6700\u9ad8\u63d0\u9ad8\u4e8644%\u3002\u8fd9\u4e00\u76f8\u5bf9\u672a\u88ab\u5145\u5206\u63a2\u7d22\u7684\u7814\u7a76\u65b9\u5411\u6709\u671b\u4fc3\u8fdbKG\u5728\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u66f4\u9891\u7e41\u8fd0\u7528\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u4e3a\u7a33\u5065\u3001\u53ef\u9760\u7684ML\u89e3\u51b3\u65b9\u6848\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u6d41\u884c\u4f46\u6613\u51fa\u9519\u7684LLM\u65b9\u6cd5\u66f4\u5177\u53ef\u9760\u6027\u3002\u5173\u952e\u8bcd\uff1a\u77e5\u8bc6\u56fe\u8c31\u3001\u77e5\u8bc6\u56fe\u8c31\u8865\u5168\u3001\u5b9e\u4f53\u5bf9\u9f50\u3001\u8868\u793a\u5b66\u4e60\u3001\u673a\u5668\u5b66\u4e60|\n", "2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language 
Models: A Survey**|Yizhang Jin et.al.|[2405.10739](http://arxiv.org/abs/2405.10739)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|**\u5728\u8fc7\u53bb\u4e00\u5e74\u91cc\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5728\u8bf8\u5982\u89c6\u89c9\u95ee\u7b54\u3001\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u7b49\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u5e9e\u5927\u89c4\u6a21\u548c\u9ad8\u6602\u7684\u8bad\u7ec3\u4e0e\u63a8\u7406\u6210\u672c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u9ad8\u6548\u4e14\u8f7b\u91cf\u7ea7\u7684MLLM\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u8fb9\u7f18\u8ba1\u7b97\u73af\u5883\u4e2d\u3002\u672c\u7efc\u8ff0\u5168\u9762\u7cfb\u7edf\u5730\u56de\u987e\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7684\u7814\u7a76\u73b0\u72b6\u3002\u6211\u4eec\u6982\u8ff0\u4e86\u4ee3\u8868\u6027\u9ad8\u6548\u6a21\u578b\u7684\u53d1\u5c55\u5386\u7a0b\uff0c\u603b\u7ed3\u4e86\u6709\u6548\u7ed3\u6784\u548c\u7b56\u7565\u7684\u7814\u7a76\u72b6\u6001\uff0c\u4ee5\u53ca\u5176\u5b9e\u7528\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7814\u7a76\u7684\u5c40\u9650\uff0c\u5e76\u5c55\u671b\u4e86\u6709\u524d\u666f\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u5982\u9700\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u53c2\u8003\u6211\u4eec\u7684GitHub\u4ed3\u5e93\uff1ahttps://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey\u3002**|\n", "2405.10725": "|**2024-05-17**|**INDUS: Effective and Efficient Language Models for Scientific Applications**|Bishwaranjan Bhattacharjee 
et.al.|[2405.10725](http://arxiv.org/abs/2405.10725)|null|\u5927\u578b\u901a\u7528\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0c\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684\u8bad\u7ec3\u6570\u636e\u53ef\u4ee5\u4f7f\u6a21\u578b\u5728\u4e13\u4e1a\u4efb\u52a1\u4e0a\u8868\u73b0\u66f4\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86INDUS\uff0c\u4e00\u5957\u4e13\u4e3a\u5730\u7403\u79d1\u5b66\u3001\u751f\u7269\u5b66\u3001\u7269\u7406\u5b66\u3001\u592a\u9633\u7269\u7406\u3001\u884c\u661f\u79d1\u5b66\u548c\u5929\u6587\u5b66\u9886\u57df\u8bbe\u8ba1\u7684\u5b9a\u5236\u5316\u8bed\u8a00\u6a21\u578b\u3002\u8fd9\u4e9b\u6a21\u578b\u57fa\u4e8e\u7cbe\u5fc3\u6311\u9009\u7684\u79d1\u5b66\u8bed\u6599\u5e93\uff0c\u5305\u62ec\uff1a\uff081\uff09\u4e00\u4e2a\u4f7f\u7528\u9886\u57df\u4e13\u7528\u8bcd\u6c47\u548c\u6570\u636e\u96c6\u8bad\u7ec3\u7684\u7f16\u7801\u5668\uff0c\u7528\u4e8e\u63d0\u5347\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u7684\u8868\u73b0\uff1b\uff082\uff09\u4e00\u4e2a\u57fa\u4e8e\u5bf9\u6bd4\u5b66\u4e60\u7684\u901a\u7528\u6587\u672c\u5d4c\u5165\u6a21\u578b\uff0c\u5229\u7528\u591a\u6e90\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u4f18\u5316\u4fe1\u606f\u68c0\u7d22\u4efb\u52a1\uff1b\uff083\uff09\u901a\u8fc7\u77e5\u8bc6\u84b8\u998f\u6280\u672f\u7f29\u5c0f\u89c4\u6a21\u7684\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u5bf9\u5ef6\u8fdf\u548c\u8d44\u6e90\u6709\u9650\u7684\u5e94\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e09\u4e2a\u65b0\u7684\u79d1\u5b66\u57fa\u51c6\u6570\u636e\u96c6\uff1aCLIMATE-CHANGE-NER\uff08\u5b9e\u4f53\u8bc6\u522b\uff09\u3001NASA-QA\uff08\u62bd\u53d6\u5f0f\u95ee\u7b54\uff09\u548cNASA-IR\uff08\u4fe1\u606f\u68c0\u7d22\uff09\uff0c\u4ee5\u63a8\u52a8\u8de8\u5b66\u79d1\u9886\u57df\u7684\u7814\u7a76\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6a21
\u578b\u5728\u65b0\u4efb\u52a1\u548c\u76f8\u5173\u9886\u57df\u73b0\u6709\u57fa\u51c6\u4efb\u52a1\u4e0a\u5747\u4f18\u4e8e\u901a\u7528\u7f16\u7801\u5668\uff08\u5982RoBERTa\uff09\u548c\u73b0\u6709\u7684\u9886\u57df\u7279\u5b9a\u7f16\u7801\u5668\uff08\u5982SciBERT\uff09\u3002|\n", "2405.12217": "|**2024-05-20**|**Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning**|Guanglin Zhou et.al.|[2405.12217](http://arxiv.org/abs/2405.12217)|**[link](https://github.com/jameszhou-gl/icl-distribution-shift)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u5e94\u5bf9\u81ea\u7136\u5206\u5e03\u53d8\u5316\u65f6\u8868\u73b0\u51fa\u6781\u9ad8\u7684\u9c81\u68d2\u6027\uff0c\u5e38\u5e38\u8d85\u8d8a\u5148\u524d\u7684\u57fa\u51c6\u3002\u7136\u800c\uff0c\u9886\u57df\u7279\u5b9a\u7684\u9002\u5e94\u4ecd\u7136\u662f\u5fc5\u8981\u7684\uff0c\u5c24\u5176\u662f\u5728\u533b\u7597\u7b49\u4e13\u4e1a\u9886\u57df\u3002\u9274\u4e8eLMMs\u5e9e\u5927\u7684\u53c2\u6570\u7a7a\u95f4\u4f7f\u5176\u5fae\u8c03\u4e0d\u5207\u5b9e\u9645\uff0c\u672c\u7814\u7a76\u805a\u7126\u4e8e\u63a2\u7d22\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u4f5c\u4e3a\u4e00\u79cd\u589e\u5f3aLMM\u9002\u5e94\u6027\u7684\u6709\u6548\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\uff0cICL\u7684\u6210\u529f\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u793a\u4f8b\u7684\u9009\u62e9\uff0c\u8fd9\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7c7b\u4f3c\uff0c\u4f46\u5bf9\u9762\u4e34\u5206\u5e03\u53d8\u5316\u7684LMMs\u63d0\u51fa\u4e86\u72ec\u7279\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u79cd\u65e0\u76d1\u7763\u7684ICL\u65b9\u6cd5\u2014\u2014TopKNearestPR\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u7279\u5f81\u76f8\u4f3c\u6027\u8fdb\u884c\u6700\u8fd1\u793a\u4f8b\u641c\u7d22\u6765\u9009\u62e9\u793a\u4f8b\u3002\u7814\u7a76\u63ed\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5728\u5904\u7406\u5206\u5e03\u8f6c\u79fb\u573a\u666f\u4e0b\u7
684\u89c6\u89c9\u7f16\u7801\u5668\u7f3a\u9677\u5bf9\u5176\u6548\u679c\u7684\u9650\u5236\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014InvariantSelectPR\uff0c\u5b83\u5229\u7528\u7c7b\u6761\u4ef6\u5bf9\u6bd4\u4e0d\u53d8\u6027\uff08CCI\uff09\u6765\u63d0\u5347\u9884\u8bad\u7ec3\u89c6\u89c9\u7f16\u7801\u5668\u7684\u7a33\u5065\u6027\u3002CCI\u901a\u8fc7\u589e\u5f3a\u4e0d\u540c\u7c7b\u522b\u95f4\u7684\u533a\u5206\u5ea6\u5e76\u786e\u4fdd\u5bf9\u9886\u57df\u7279\u5b9a\u53d8\u5316\u7684\u4e0d\u53d8\u6027\uff0c\u63d0\u9ad8\u4e86\u7f16\u7801\u5668\u8bc6\u522b\u548c\u68c0\u7d22\u6700\u6709\u4fe1\u606f\u4ef7\u503c\u793a\u4f8b\u7684\u80fd\u529b\u3002\u8fd9\u79cd\u65b9\u6cd5\u6709\u52a9\u4e8e\u5f15\u5bfcLMM\u9002\u5e94\u65b0\u7684\u67e5\u8be2\u6837\u672c\uff0c\u5373\u4f7f\u5728\u4e0d\u540c\u7684\u5206\u5e03\u4e0b\u4e5f\u662f\u5982\u6b64\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cInvariantSelectPR\u663e\u8457\u63d0\u9ad8\u4e86LMM\u7684\u9002\u5e94\u6027\uff0c\u5728Camelyon17\u548cHAM10000\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u76847-shot\u4efb\u52a1\u4e2d\uff0c\u5206\u522b\u5b9e\u73b0\u4e8634.2%\u548c16.9%\u7684\u51c6\u786e\u7387\u63d0\u5347\uff0c\u76f8\u5bf9\u4e8e\u96f6-shot\u6027\u80fd\uff0c\u8fd9\u662f\u663e\u8457\u7684\u8fdb\u6b65\u3002**|\n", "2405.12209": "|**2024-05-20**|**MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark**|Hongwei Liu 
et.al.|[2405.12209](http://arxiv.org/abs/2405.12209)|**[link](https://github.com/open-compass/mathbench)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u6570\u5b66\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f20\u7edf\u7684\u6570\u5b66\u57fa\u51c6\u5982GSM8k\u5728\u5168\u9762\u8bc4\u4ef7\u8fd9\u4e9b\u6a21\u578b\u7684\u6570\u5b66\u80fd\u529b\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MathBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u65b0\u57fa\u51c6\uff0c\u65e8\u5728\u4e25\u683c\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6570\u5b66\u80fd\u529b\u3002MathBench\u8986\u76d6\u5e7f\u6cdb\u7684\u6570\u5b66\u5b66\u79d1\uff0c\u5bf9\u7406\u8bba\u7406\u89e3\u548c\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u8fdb\u884c\u8be6\u5c3d\u8bc4\u4f30\u3002\u5b83\u5206\u4e3a\u4e94\u4e2a\u9636\u6bb5\uff0c\u4ece\u57fa\u7840\u7b97\u672f\u5230\u5927\u5b66\u6570\u5b66\uff0c\u7ed3\u6784\u4e0a\u8bbe\u8ba1\u7528\u4e8e\u8003\u5bdf\u6a21\u578b\u5728\u4e0d\u540c\u6df1\u5ea6\u77e5\u8bc6\u7684\u7406\u89e3\u3002\u6bcf\u4e2a\u9636\u6bb5\u5305\u62ec\u7406\u8bba\u95ee\u9898\u548c\u5e94\u7528\u9898\uff0c\u4ee5\u8861\u91cf\u6a21\u578b\u7684\u6570\u5b66\u719f\u7ec3\u5ea6\u53ca\u5176\u5728\u5b9e\u9645\u60c5\u5883\u4e2d\u5e94\u7528\u6982\u5ff5\u7684\u80fd\u529b\u3002MathBench\u7684\u76ee\u6807\u662f\u63d0\u5347\u5bf9LLMs\u6570\u5b66\u80fd\u529b\u7684\u8bc4\u4ef7\uff0c\u63d0\u4f9b\u5bf9\u5176\u77e5\u8bc6\u7406\u89e3\u6c34\u5e73\u548c\u95ee\u9898\u89e3\u51b3\u6280\u80fd\u7684\u7ec6\u81f4\u89c6\u89d2\uff0c\u540c\u65f6\u652f\u6301\u53cc\u8bed\u73af\u5883\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728https://github.com/open-compass/MathBench\u3002**|\n", "2405.12195": "|**2024-05-20**|**Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey**|Thiago S. 
Vaillant et.al.|[2405.12195](http://arxiv.org/abs/2405.12195)|**[link](https://github.com/gpt-impact/Paper-content)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5176\u5f3a\u5927\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u80fd\u529b\u548c\u5e7f\u6cdb\u5e94\u7528\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4e0e\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u7684\u878d\u5408\u8d8b\u52bf\u65e5\u76ca\u660e\u663e\uff0c\u4f46\u5173\u4e8e\u8fd9\u79cd\u878d\u5408\u5982\u4f55\u5f71\u54cd\u8f6f\u4ef6\u5f00\u53d1\u5b9e\u8df5\u548c\u8ba4\u77e5\u7684\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u4e3a\u4e86\u63ed\u793a\u5c06AI\u9a71\u52a8\u5de5\u5177\uff0c\u5982ChatGPT\uff0c\u878d\u5165\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u7684\u5f71\u54cd\u548c\u6311\u6218\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u8c03\u67e5\uff0c\u9488\u5bf9207\u540d\u8f6f\u4ef6\u5f00\u53d1\u8005\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u8c03\u67e5\u5185\u5bb9\u5305\u62ecChatGPT\u5bf9\u8f6f\u4ef6\u8d28\u91cf\u3001\u751f\u4ea7\u529b\u4ee5\u53ca\u5f00\u53d1\u8005\u5de5\u4f5c\u6ee1\u610f\u5ea6\u7684\u5f71\u54cd\uff0c\u540c\u65f6\u8fd8\u63a2\u8ba8\u4e86\u4ed6\u4eec\u5bf9\u672a\u6765ChatGPT\u5e94\u7528\u7684\u9884\u671f\u3001\u5bf9\u53ef\u80fd\u7684\u5de5\u4f5c\u5c97\u4f4d\u66ff\u4ee3\u7684\u62c5\u5fe7\uff0c\u4ee5\u53ca\u5bf9\u76d1\u7ba1\u63aa\u65bd\u7684\u770b\u6cd5\u3002|\n", "2405.12174": "|**2024-05-20**|**CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models**|Haoxiang Shi 
et.al.|[2405.12174](http://arxiv.org/abs/2405.12174)|null|\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aCT-Eval\u7684\u4e2d\u6587\u6587\u672c\u8f6c\u8868\u683c\u6570\u636e\u96c6\uff0c\u65e8\u5728\u8861\u91cf\u5927\u8bed\u8a00\u6a21\u578b\u5728\u975e\u82f1\u8bed\u8bed\u8a00\u73af\u5883\u4e0b\u7684\u6587\u672c\u8f6c\u8868\u683c\u4efb\u52a1\u6027\u80fd\u3002\u7531\u4e8e\u73b0\u6709\u82f1\u6587\u6587\u672c\u8f6c\u8868\u683c\u6570\u636e\u96c6\u4e3b\u8981\u9762\u5411\u82f1\u8bed\uff0cCT-Eval\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u9009\u62e9\u4e86\u4e00\u79cd\u6d41\u884c\u7684\u591a\u5b66\u79d1\u4e2d\u6587\u5728\u7ebf\u767e\u79d1\u4f5c\u4e3a\u6765\u6e90\uff0c\u6db5\u76d6\u4e8628\u4e2a\u9886\u57df\u4ee5\u4fdd\u8bc1\u6570\u636e\u591a\u6837\u6027\u3002\u4e3a\u4e86\u51cf\u5c11\u6570\u636e\u865a\u6784\uff08hallucination\uff09\u95ee\u9898\uff0c\u7814\u7a76\u8005\u9996\u5148\u8bad\u7ec3\u4e86\u4e00\u4e2a\u8bed\u8a00\u6a21\u578b\u6765\u8bc6\u522b\u5e76\u8fc7\u6ee4\u6389\u5b58\u5728\u865a\u6784\u95ee\u9898\u7684\u6837\u672c\uff0c\u7136\u540e\u4eba\u5de5\u6807\u6ce8\u9a8c\u8bc1\u96c6\u548c\u6d4b\u8bd5\u96c6\u4e2d\u7684\u9519\u8bef\u3002\u6700\u7ec8\uff0cCT-Eval\u5305\u542b\u4e86\u5927\u7ea688,600\u4e2a\u4efb\u52a1\u6837\u672c\u3002\u901a\u8fc7CT-Eval\uff0c\u7814\u7a76\u8005\u8bc4\u4f30\u4e86\u5f00\u6e90\u548c\u95ed\u6e90\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u8868\u73b0\uff0c\u7ed3\u679c\u663e\u793a\u96f6-shot\u6a21\u5f0f\u4e0b\u8fd9\u4e9b\u6a21\u578b\u4e0e\u4eba\u7c7b\u5224\u65ad\u4ecd\u6709\u663e\u8457\u5dee\u8ddd\u3002\u7ecf\u8fc7\u5fae\u8c03\u540e\uff0c\u5f00\u6e90\u6a21\u578b\u5728\u6587\u672c\u8f6c\u8868\u683c\u80fd\u529b\u4e0a\u6709\u4e86\u663e\u8457\u63d0\u5347\uff0c\u5927\u5e45\u8d85\u8d8a\u4e86GPT-4\u3002\u603b\u4e4b\uff0cCT-Eval\u4e0d\u4ec5\u4e3a\u8bc4\u4f30\u548c\u7406\u89e3\u73b0\u6709\u5927\u8bed\u8a00\u6a21\u578b\u7684\u4e2d\u6587\u6587\u672c\u8f6c\u8868\u683c\u80fd\u529b\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u5de5\u517
7\uff0c\u4e5f\u4e3a\u63d0\u5347\u8fd9\u7c7b\u6a21\u578b\u5728\u8fd9\u9879\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u5b9d\u8d35\u8d44\u6e90\u3002|\n", "2405.12163": "|**2024-05-20**|**Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging**|Xiaobo Liang et.al.|[2405.12163](http://arxiv.org/abs/2405.12163)|**[link](https://github.com/dropreg/fennec)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u4f17\u591a\u73b0\u5b9e\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4e3b\u8981\u76ee\u6807\u662f\u7b26\u5408\u4eba\u7c7b\u7684\u610f\u56fe\u3002\u7136\u800c\uff0c\u7406\u89e3\u4eba\u7c7b\u610f\u56fe\u7684\u590d\u6742\u6027\u4f7f\u5f97\u4f9d\u8d56\u4e8e\u8017\u65f6\u7684\u4eba\u5de5\u8bc4\u4f30\u6210\u4e3a\u5fc5\u8981\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5229\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u8bc4\u4f30\u8005\u7684\u8d8b\u52bf\uff0c\u7279\u522b\u662f\u5728GPT-4\u7684\u6d41\u884c\u80cc\u666f\u4e0b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\\textbf{Fennec}\u7684\u6846\u67b6\uff0c\u4e13\u6ce8\u4e8e\\textbf{F}ine-grained \\textbf{E}valuation\uff08\u7ec6\u81f4\u8bc4\u4f30\uff09\u548c\\textbf{N}eeded 
\\textbf{E}xtension\uff08\u5fc5\u8981\u6269\u5c55\uff09\u901a\u8fc7\u5206\u652f\uff08Branching\uff09\u548c\u8fde\u63a5\uff08Bridging\uff09\u3002\u5206\u652f\u64cd\u4f5c\u5c06\u8bc4\u4f30\u4efb\u52a1\u5206\u89e3\u4e3a\u4e0d\u540c\u7ef4\u5ea6\u548c\u7c92\u5ea6\uff0c\u4ece\u800c\u51cf\u8f7b\u8bc4\u4f30\u6311\u6218\u3002\u540c\u65f6\uff0c\u8fde\u63a5\u64cd\u4f5c\u878d\u5408\u4e86\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u589e\u52a0\u4e86\u8bc4\u4f30\u4efb\u52a1\u7684\u591a\u6837\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u76847B\u6a21\u578b\u5728\u5404\u79cd\u5e38\u7528\u57fa\u51c6\u4e0a\u7684\\textit{\u4e00\u81f4\u6027}\u548c\\textit{\u4e00\u81f4\u540c\u610f}\u6027\u80fd\u5747\u4f18\u4e8e\u5f00\u6e90\u7684\u66f4\u5927\u89c4\u6a21\u8bc4\u4f30\u6a21\u578b\uff0c\u63a5\u8fd1GPT-4\u7684\u8868\u73b0\u3002\u6211\u4eec\u5229\u7528\u6a21\u578b\u7684\u7cbe\u7ec6\u6821\u6b63\u529f\u80fd\u6539\u8fdb\u591a\u4e2a\u6a21\u578b\u54cd\u5e94\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u4f18\u5316\u63d0\u5347\u4e86\u54cd\u5e94\u8d28\u91cf\uff0c\u5728MT-Bench\u4e0a\u63d0\u9ad8\u4e861-2\u5206\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728GitHub\u4e0a\u5f00\u6e90\\footnote{\\url{https://github.com/dropreg/Fennec}}\u3002**|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130](http://arxiv.org/abs/2405.12130)|**[link](https://github.com/kongds/mora)**|**\u4f4e\u79e9\u9002\u5e94\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u6d41\u884c\u7684\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u4f4e\u79e9\u66f4\u65b0\uff08\u5982LoRA\u5b9e\u73b0\uff09\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u6307\u51fa\uff0c\u8fd9\u79cd\u673a\u5236\u53ef\u80fd\u9650\u5236\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5b66\u4e60\u548c\u8bb0\u5fc6\u65b0\u77e5\u8bc6\u7684\u80fd\u529b\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5MoRA\uff0c\u5b83\u5229\u7528\u5e73\u65b9\u77e9\u9635\u5b9e\u73b0\u9ad8\u79e9\u66f4\u65b0\uff0c\u540c\u65f6\u4fdd\u6301\u4e0eLoRA\u76f8\u540c\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u6570\u91cf\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u76f8\u5e94\u7684\u975e\u53c2\u6570\u8fd0\u7b97\u5668\uff0c\u4ee5\u964d\u4f4e\u8f93\u5165\u7ef4\u5ea6\u5e76\u589e\u52a0\u8f93\u51fa\u7ef4\u5ea6\u5904\u7406\u5e73\u65b9\u77e9\u9635\u3002\u8fd9\u4e9b\u8fd0\u7b97\u5668\u786e\u4fdd\u6743\u91cd\u80fd\u65e0\u7f1d\u878d\u5165\u5230\u5927\u8bed\u8a00\u6a21\u578b\u4e2d\uff0c\u4f7f\u5f97\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u50cfLoRA\u4e00\u6837\u90e8\u7f72\u3002\u6211\u4eec\u5728\u4e94\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u8bc4\u4f30\uff1a\u6307\u4ee4\u8c03\u6574\u3001\u6570\u5b66\u63a8\u7406\u3001\u8fde\u7eed\u9884\u8bad\u7ec3\u3001\u8bb0\u5fc6\u4ee5\u53ca\u9884\u8bad\u7ec3\u3002\u5728\u5185\u5b58\u5bc6\u96c6\u578b\u4efb\u52a1\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f18\u4e8eLoRA\uff0c\u5e76\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u76f8\u5f53\u7684\u6027\u80fd\u3002**|\n", "2405.12119": "|**2024-05-20**|**Reindex-Then-Adapt: 
Improving Large Language Models for Conversational Recommendation**|Zhankui He et.al.|[2405.12119](http://arxiv.org/abs/2405.12119)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u901a\u8fc7\u51fa\u8272\u5730\u7d22\u5f15\u9879\u76ee\u5185\u5bb9\u3001\u7406\u89e3\u590d\u6742\u7684\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u5e76\u751f\u6210\u76f8\u5173\u9879\u76ee\u6807\u9898\uff0c\u9769\u65b0\u4e86\u5bf9\u8bdd\u63a8\u8350\u7cfb\u7edf\u3002\u7136\u800c\uff0c\u63a7\u5236\u63a8\u8350\u9879\u76ee\u7684\u5206\u5e03\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u5bfc\u81f4\u5728\u9488\u5bf9\u5bf9\u8bdd\u63a8\u8350\u5e73\u53f0\u7684\u5feb\u901f\u53d8\u5316\u7684\u6570\u636e\u5206\u5e03\uff0c\u5982\u9879\u76ee\u6d41\u884c\u5ea6\u4e0a\uff0c\u6027\u80fd\u6b20\u4f73\u3002\u5728\u5bf9\u8bdd\u63a8\u8350\u4e2d\uff0cLLMs\u901a\u8fc7\u81ea\u56de\u5f52\u65b9\u5f0f\u751f\u6210\u9879\u76ee\u6807\u9898\uff08\u4f5c\u4e3a\u591a\u4e2a\u4ee4\u724c\uff09\uff0c\u8fd9\u4f7f\u5f97\u83b7\u53d6\u548c\u63a7\u5236\u6240\u6709\u9879\u76ee\u63a8\u8350\u53d8\u5f97\u56f0\u96be\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u91cd\u7d22\u5f15-\u7136\u540e\u9002\u5e94\u201d\uff08Reindex-Then-Adapt\uff0cRTA\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u591a\u4ee4\u724c\u9879\u76ee\u6807\u9898\u8f6c\u6362\u4e3a\u5355\u4e2a\u4ee4\u724c\u4e8eLLMs\u5185\uff0c\u968f\u540e\u8c03\u6574\u8fd9\u4e9b\u5355\u4ee4\u724c\u9879\u76ee\u6807\u9898\u7684\u6982\u7387\u5206\u5e03\u3002RTA\u6846\u67b6\u7ed3\u5408\u4e86LLMs\u7406\u89e3\u548c\u590d\u6742\u67e5\u8be2\u7684\u4f18\u52bf\uff0c\u4ee5\u53ca\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\uff08RecSys\uff09\u5728\u5bf9\u8bdd\u63a8\u8350\u4e2d\u6709\u6548\u63a7\u5236\u63a8\u8350\u9879\u76ee\u5206\u5e03\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5728\u4e09\u4e2a\u4e0d\u540c\u7684\u5bf9\u8bdd\u63a8\u8350\u6570\u636e\u96c6\u548c\u4e24\u79cd\u9002\u5e94\u8bbe\u7f6e\u4e0b\uff0c\u5c55\u793a\u4e
86\u6539\u8fdb\u7684\u51c6\u786e\u6027\u6307\u6807\u3002|\n", "2405.12107": "|**2024-05-20**|**Imp: Highly Capable Large Multimodal Models for Mobile Devices**|Zhenwei Shao et.al.|[2405.12107](http://arxiv.org/abs/2405.12107)|**[link](https://github.com/milvlg/imp)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u5f00\u653e\u4e16\u754c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u5c55\u73b0\u51fa\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u53c2\u6570\u91cf\u5927\u3001\u8ba1\u7b97\u9700\u6c42\u9ad8\uff0c\u9650\u5236\u4e86\u5728\u8d44\u6e90\u53d7\u9650\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u7814\u7a76\u4eba\u5458\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u8f7b\u91cf\u7ea7LMM\uff0c\u65e8\u5728\u5728\u6709\u9650\u89c4\u6a21\uff08\u598230\u4ebf\u53c2\u6570\uff09\u4e0b\u6700\u5927\u5316\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u591a\u6570\u4ec5\u5173\u6ce8\u8bbe\u8ba1\u7a7a\u95f4\u7684\u5355\u4e00\u6216\u4e24\u4e2a\u65b9\u9762\uff0c\u5bf9\u5f71\u54cd\u6a21\u578b\u80fd\u529b\u7684\u5173\u952e\u8bbe\u8ba1\u9009\u62e9\u5c1a\u672a\u8fdb\u884c\u5168\u9762\u63a2\u8ba8\u3002 
\u672c\u6587\u7cfb\u7edf\u5730\u7814\u7a76\u4e86\u8f7b\u91cf\u7ea7LMM\u7684\u8bbe\u8ba1\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u8bad\u7ec3\u7b56\u7565\u548c\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u5957\u540d\u4e3aImp\u7684\u9ad8\u6027\u80fdLMM\u5bb6\u65cf\uff0c\u8986\u76d620\u4ebf\u523040\u4ebf\u53c2\u6570\u89c4\u6a21\u3002\u5c24\u5176\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684Imp-30\u4ebf\u6a21\u578b\u5728\u4e0e\u540c\u7c7b\u89c4\u6a21\u7684\u73b0\u6709\u8f7b\u91cf\u7ea7\u6a21\u578b\u76f8\u6bd4\u65f6\u6301\u7eed\u9886\u5148\uff0c\u5e76\u8d85\u8d8a\u4e86130\u4ebf\u53c2\u6570\u89c4\u6a21\u7684\u6700\u65b0LMM\u72b6\u6001\u3002\u901a\u8fc7\u4f4e\u7cbe\u5ea6\u91cf\u5316\u548c\u5206\u8fa8\u7387\u964d\u4f4e\u6280\u672f\uff0cImp\u6a21\u578b\u80fd\u591f\u5728\u9ad8\u901a\u9a81\u9f998Gen3\u79fb\u52a8\u82af\u7247\u4e0a\u5b9e\u73b0\u9ad8\u901f\u90e8\u7f72\uff0c\u6bcf\u79d2\u5904\u7406\u5927\u7ea613\u4e2a\u4ee4\u724c\u7684\u63a8\u7406\u901f\u5ea6\u3002**|\n", "2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100](http://arxiv.org/abs/2405.12100)|null|
\u6570\u5b66\u5e94\u7528\u9898\u4fee\u6b63\uff08MWPC\uff09\u662f\u4e00\u4e2a\u4e13\u95e8\u9488\u5bf9\u89e3\u51b3\u6570\u5b66\u95ee\u9898\u8fc7\u7a0b\u4e2d\u9519\u8bef\u63a8\u7406\u7684\u4fee\u6b63\u4efb\u52a1\u3002\u672c\u6587\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\uff0c\u5173\u6ce8\u4e24\u70b9\uff1a\uff081\uff09\u533a\u5206\u6570\u5b66\u63a8\u7406\u4e0e\u9519\u8bef\u4fee\u6b63\uff1b\uff082\uff09\u63a2\u7d22\u7b56\u7565\u4ee5\u63d0\u5347LLMs\u5728\u6570\u5b66\u9886\u57df\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\uff0c\u4ee5\u5e94\u5bf9MWPC\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u5728\u5b9e\u65f6\u6559\u80b2\u4e2d\uff0c\u5e2e\u52a9\u5b66\u751f\u8bc6\u522b\u9519\u8bef\u6bd4\u5355\u7eaf\u63d0\u4f9b\u6b63\u786e\u7b54\u6848\u66f4\u4e3a\u5173\u952e\u3002\u7136\u800c\uff0c\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u4fa7\u91cd\u4e8e\u83b7\u53d6\u7cbe\u786e\u7684\u89e3\u9898\u7b54\u6848\uff0c\u800c\u975e\u7ea0\u6b63\u53ef\u80fd\u7684\u9519\u8bef\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8c03\u6574\u4e86\u7814\u7a76\u8303\u5f0f\uff0c\u8868\u660e\u63d0\u5347\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5e76\u4e0d\u7b49\u540c\u4e8e\u7cbe\u901a\u9519\u8bef\u4fee\u6b63\u3002\u540c\u65f6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u8bca\u65ad\u5bfc\u5411\u63d0\u793a\uff08DOP\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdbLLMs\u5728\u9519\u8bef\u4fee\u6b63\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDOP\u8868\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\uff0c\u5f70\u663e\u5176\u91cd\u8981\u6027\u3002\u6211\u4eec\u5f3a\u8c03\uff0c\u5728\u6570\u5b66\u6559\u80b2\u4e2d\uff0c\u5bf9\u51fa\u8272\u4fee\u6b63\u8005\u7684\u9700\u8981\u8d85\u8fc7\u4e86\u5bf9\u719f\u7ec3\u63a8\u7406\u8005\u7684\u8ffd\u6c42\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002|\n", "2405.12981": "|**2024-05-21**|**Reducing Transformer Key-Value Cache Size with Cross-Layer Attention**|William Brandon 
et.al.|[2405.12981](http://arxiv.org/abs/2405.12981)|null|\u952e\u503c\u7f13\u5b58\u5bf9\u4e8e\u52a0\u901fTransformer\u67b6\u6784\u7684\u81ea\u56de\u5f52\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89e3\u7801\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u5e8f\u5217\u957f\u5ea6\u589e\u52a0\u548c\u6279\u91cf\u5927\u5c0f\u589e\u5927\uff0c\u5b58\u50a8\u952e\u503c\u7f13\u5b58\u6240\u9700\u7684\u5185\u5b58\u53ef\u80fd\u4f1a\u53d8\u5f97\u96be\u4ee5\u627f\u53d7\u3002\u81ea\u4eceTransformer\u8bde\u751f\u4ee5\u6765\uff0c\u4e24\u4e2a\u6700\u6709\u6548\u7684\u5185\u5b58\u51cf\u5c0f\u7b56\u7565\u662f\u591a\u67e5\u8be2\u6ce8\u610f\u529b\uff08MQA\uff09\u53ca\u5176\u63a8\u5e7f\uff0c\u7fa4\u7ec4\u67e5\u8be2\u6ce8\u610f\u529b\uff08GQA\uff09\u3002MQA\u548cGQA\u901a\u8fc7\u8ba9\u591a\u4e2a\u67e5\u8be2\u5934\u5171\u4eab\u5355\u4e2a\u952e/\u503c\u5934\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u4e0d\u540c\u952e/\u503c\u5934\u7684\u6570\u91cf\uff0c\u540c\u65f6\u5bf9\u51c6\u786e\u6027\u5f71\u54cd\u8f83\u5c0f\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u8fdb\u4e00\u6b65\u53d1\u5c55MQA\uff0c\u5373\u5728\u76f8\u90bb\u5c42\u4e4b\u95f4\u4e5f\u5171\u4eab\u952e\u548c\u503c\u5934\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u8de8\u5c42\u6ce8\u610f\u529b\uff08CLA\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4f7f\u7528CLA\uff0c\u53ef\u4ee5\u5728\u4fdd\u6301\u63a5\u8fd1\u539f\u59cbMQA\u7cbe\u5ea6\u7684\u540c\u65f6\uff0c\u5c06\u952e\u503c\u7f13\u5b58\u7684\u5927\u5c0f\u518d\u51cf\u5c112\u500d\u3002\u6211\u4eec\u5728\u4ece\u5934\u8bad\u7ec310\u4ebf\u53c2\u6570\u548c30\u4ebf\u53c2\u6570\u6a21\u578b\u7684\u5b9e\u9a8c\u4e2d\u9a8c\u8bc1\u4e86\u8fd9\u4e00\u70b9\uff0c\u7ed3\u679c\u8868\u660e\uff0cCLA\u5728\u5185\u5b58\u4e0e\u51c6\u786e\u6027\u4e4b\u95f4\u7684\u6743\u8861\u4e0a\u63d0\u4f9b\u4e86\u4f18\u4e8e\u4f20\u7edfMQA\u7684\u5e15\u7d2f\u6258\u6539\u8fdb\uff0c\u4f7f\u5f97\u66f4\u957f\u7684\u5e8f\u5217\u957f\u5ea6\u548c\u66f4\u5927\u7684\u6279\u91cf\u5927\u5c0f\u4e0b\u7684\u63a8\u7406\u6210\u4e3a\u53ef\u80fd\u3002|\n", "2405.12961": "|**2024-05-21**|**Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale**|Shriram Chennakesavalu et.al.|[2405.12961](http://arxiv.org/abs/2405.12961)|**[link](https://github.com/rotskoff-group/llm-era)**|\u5728\u5316\u5b66\u7a7a\u95f4\u4e2d\u7684\u641c\u7d22\u662f\u4e00\u4e2a\u6781\u5177\u6311\u6218\u6027\u7684\u95ee\u9898\uff0c\u56e0\u4e3a\u53ef\u80fd\u7684\u5206\u5b50\u6570\u91cf\u968f\u7740\u539f\u5b50\u6570\u91cf\u5448\u7ec4\u5408\u7ea7\u589e\u957f\u3002\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\u901a\u8fc7\u5b66\u4e60\u5316\u5b66\u5316\u5408\u7269\u6570\u636e\u5e93\u5df2\u7ecf\u4ea7\u751f\u4e86\u5f3a\u5927\u7684\u751f\u6210\u5668\uff0c\u4f46\u6211\u4eec\u4ecd\u7136\u7f3a\u4e4f\u6709\u6548\u7b56\u7565\u6765\u751f\u6210\u5177\u6709\u7279\u5b9a\u6027\u8d28\u7684\u5206\u5b50\u3002\u8fd9\u4e2a\u95ee\u9898\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u201c\u5bf9\u9f50\u201d\u95ee\u9898\u76f8\u4f3c\uff0c\u5c3d\u7ba1\u5728\u8bb8\u591a\u5316\u5b66\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u6709\u4e00\u4e2a\u660e\u786e\u4e14\u6613\u4e8e\u8bc4\u4f30\u7684\u5956\u52b1\u51fd\u6570\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u80fd\u91cf\u6392\u540d\u5bf9\u9f50\uff08ERA\uff09\u7684\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u660e\u786e\u7684\u5956\u52b1\u51fd\u6570\u6784\u5efa\u4e86\u4e00\u4e2a\u68af\u5ea6\u4f18\u5316\u76ee\u6807\uff0c\u7528\u4e8e\u8c03\u6574\u81ea\u56de\u5f52\u7b56\u7565\u3002\u7406\u8bba\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u8be5\u7b97\u6cd5\u4e0eProximal Policy Optimization\uff08PPO\uff09\u548cDirect Preference 
Optimization\uff08DPO\uff09\u5bc6\u5207\u76f8\u5173\uff0c\u4f46\u5176\u6700\u5c0f\u5316\u5668\u6536\u655b\u4e8e\u4e00\u4e2a\u7406\u60f3\u7684\u5409\u5e03\u65af-\u73bb\u5c14\u5179\u66fc\u5206\u5e03\uff0c\u5956\u52b1\u51fd\u6570\u626e\u6f14\u4e86\u80fd\u91cf\u89d2\u8272\u3002\u6b64\u5916\uff0c\u8be5\u7b97\u6cd5\u5177\u6709\u9ad8\u5ea6\u53ef\u6269\u5c55\u6027\uff0c\u65e0\u9700\u5f3a\u5316\u5b66\u4e60\uff0c\u5e76\u4e14\u5728\u6bcf\u5bf9\u6837\u672c\u7684\u504f\u597d\u89c2\u5bdf\u6b21\u6570\u8f83\u5c11\u65f6\uff0c\u76f8\u5bf9\u4e8eDPO\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u65b9\u6cd5\u5e94\u7528\u4e8e\u5206\u5b50Transformer\u7684\u5bf9\u9f50\uff0c\u4ee5\u751f\u6210\u5177\u6709\u5916\u90e8\u6307\u5b9a\u5c5e\u6027\u7684\u5206\u5b50\uff0c\u5e76\u53d1\u73b0\u5b83\u80fd\u7a33\u5065\u5730\u8fdb\u884c\u641c\u7d22\uff0c\u63a2\u7d22\u5316\u5b66\u7a7a\u95f4\u7684\u591a\u6837\u5316\u90e8\u5206\u3002\u867d\u7136\u6211\u4eec\u7684\u91cd\u70b9\u5728\u4e8e\u5316\u5b66\u641c\u7d22\uff0c\u4f46\u6211\u4eec\u5728\u4e00\u4e2aAI\u76d1\u7763\u7684\u4efb\u52a1\u4e0a\u4e5f\u53d6\u5f97\u4e86\u4f18\u79c0\u7ed3\u679c\uff0c\u8868\u660e\u8be5\u65b9\u6cd5\u662f\u53ef\u6269\u5c55\u4e14\u901a\u7528\u7684\u3002|\n", "2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939](http://arxiv.org/abs/2405.12939)|**[link](https://github.com/yinzhangyue/AoR)**|
\u8fd1\u671f\uff0cChain-of-Thought\u63d0\u793a\u7684\u8fdb\u5c55\u6781\u5927\u5730\u63a8\u52a8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u7a81\u7834\u3002\u5f53\u524d\u7814\u7a76\u901a\u8fc7\u91c7\u6837\u591a\u79cd\u63a8\u7406\u8def\u5f84\u5e76\u6839\u636e\u7b54\u6848\u9891\u7387\u8fdb\u884censemble\uff0c\u63d0\u9ad8\u4e86LLMs\u7684\u63a8\u7406\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u6b63\u786e\u7b54\u6848\u5904\u4e8e\u5c11\u6570\u7684\u60c5\u51b5\u65f6\u5931\u6548\u3002\u6211\u4eec\u53d1\u73b0\u8fd9\u662f\u5236\u7ea6LLMs\u63a8\u7406\u80fd\u529b\u7684\u5173\u952e\u56e0\u7d20\uff0c\u4ec5\u51ed\u9884\u6d4b\u7b54\u6848\u65e0\u6cd5\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5c42\u6b21\u5316\u7684\u63a8\u7406\u805a\u5408\u6846\u67b6AoR\uff08\u63a8\u7406\u805a\u5408\uff09\uff0c\u5b83\u4f9d\u636e\u63a8\u7406\u94fe\u6761\u7684\u8bc4\u4f30\u6765\u9009\u62e9\u7b54\u6848\u3002\u6b64\u5916\uff0cAoR\u5f15\u5165\u4e86\u52a8\u6001\u91c7\u6837\u7b56\u7565\uff0c\u6839\u636e\u4efb\u52a1\u590d\u6742\u5ea6\u8c03\u6574\u63a8\u7406\u94fe\u6761\u7684\u6570\u91cf\u3002\u4e00\u7cfb\u5217\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAoR\u76f8\u8f83\u4e8e\u4e3b\u6d41ensemble\u65b9\u6cd5\u8868\u73b0\u51fa\u8272\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u8868\u660e\uff0cAoR\u4e0d\u4ec5\u9002\u7528\u4e8e\u5404\u79cdLLMs\uff0c\u800c\u4e14\u5728\u4e0e\u73b0\u6709\u65b9\u6cd5\u7684\u6027\u80fd\u5929\u82b1\u677f\u6bd4\u8f83\u4e2d\uff0c\u8fbe\u5230\u4e86\u66f4\u4f18\u79c0\u7684\u6c34\u5e73\u3002|\n", "2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel 
et.al.|[2405.12933](http://arxiv.org/abs/2405.12933)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8bf8\u5982\u603b\u7ed3\u3001\u7b97\u672f\u63a8\u7406\u548c\u95ee\u7b54\u7b49\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5728\u9053\u5fb7\u63a8\u7406\u548c\u4f26\u7406\u51b3\u7b56\u65b9\u9762\uff0c\u5c24\u5176\u662f\u5728\u6d89\u53ca\u591a\u4e2a\u5229\u76ca\u76f8\u5173\u8005\u7684\u590d\u6742\u60c5\u666f\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u4e25\u5cfb\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSkin-in-the-Game\uff08SKIG\uff09\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u4ece\u4e0d\u540c\u5229\u76ca\u76f8\u5173\u8005\u89d2\u5ea6\u5ba1\u89c6\u51b3\u7b56\u7684\u540e\u679c\uff0c\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u5728\u9053\u5fb7\u63a8\u7406\u4e2d\u7684\u80fd\u529b\u3002SKIG\u7684\u6838\u5fc3\u673a\u5236\u662f\u6a21\u62df\u884c\u52a8\u7684\u8d23\u4efb\u611f\uff0c\u7ed3\u5408\u540c\u7406\u5fc3\u7ec3\u4e60\u548c\u98ce\u9669\u8bc4\u4f30\uff0c\u5bf9\u63d0\u9ad8\u5176\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u4f7f\u7528\u4e13\u6709\u548c\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u9053\u5fb7\u63a8\u7406\u57fa\u51c6\u4e0a\u9a8c\u8bc1SKIG\u7684\u8868\u73b0\uff0c\u5e76\u901a\u8fc7\u6df1\u5165\u7684\u6d88\u878d\u5206\u6790\u63a2\u7a76\u5176\u5173\u952e\u7ec4\u4ef6\u3002|\n", "2405.12929": "|**2024-05-21**|**Code-mixed Sentiment and Hate-speech Prediction**|Anjali Yadav et.al.|[2405.12929](http://arxiv.org/abs/2405.12929)|null|\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\uff0c\u6df7\u5408\u4ee3\u7801\uff08code-mixed 
discourse\uff09\u6307\u7684\u662f\u5355\u6587\u672c\u4e2d\u878d\u5408\u591a\u79cd\u8bed\u8a00\u7684\u73b0\u8c61\uff0c\u5c24\u5176\u662f\u5728\u5b98\u65b9\u8bed\u8a00\u591a\u5143\u7684\u56fd\u5bb6\u7684\u975e\u6b63\u5f0f\u4ea4\u6d41\u4e2d\u5e38\u89c1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u4e3b\u5bfc\u5730\u4f4d\u63d0\u5347\uff0c\u6211\u4eec\u9488\u5bf9\u4ee3\u7801\u6df7\u5408\u8bed\u5883\u7684\u7814\u7a76\u4e5f\u968f\u4e4b\u5c55\u5f00\u3002\u9996\u5148\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u4e86\u56db\u6b3e\u65b0\u7684\u82f1\u8bed-\u5370\u5730\u8bed\u548c\u82f1\u8bed-\u65af\u6d1b\u6587\u5c3c\u4e9a\u53cc\u8bed\u9884\u8bad\u7ec3\u906e\u7f69\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u9002\u5e94\u975e\u6b63\u5f0f\u8bed\u8a00\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5404\u79cd\u7c7b\u578b\u7684\u6a21\u578b\u2014\u2014\u5305\u62ec\u5355\u8bed\u3001\u53cc\u8bed\u3001\u5c11\u91cf\u8bed\u8a00\u548c\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u6a21\u578b\u2014\u2014\u5728\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u7684\u60c5\u611f\u5206\u6790\u548c\u653b\u51fb\u6027\u8bed\u8a00\u68c0\u6d4b\u7b49\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u6709\u6548\u7684\u5206\u7c7b\u5668\u662f\u9488\u5bf9\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u7684\u4e13\u4e1a\u5316\u53cc\u8bed\u548c\u591a\u8bed\u8a00\u6a21\u578b\uff0c\u968f\u540e\u662f\u975e\u4e13\u4e1a\u7684\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u548c\u5355\u8bed\u6a21\u578b\uff0c\u800c\u5927\u578b\u751f\u6210\u6a21\u578b\u7684\u8868\u73b0\u5e76\u4e0d\u7a81\u51fa\u3002\u5bf9\u4e8e\u6d89\u53ca\u60c5\u611f\u7684\u95ee\u9898\uff0c\u6a21\u578b\u5728\u5904\u7406\u4ee3\u7801\u6df7\u5408\u6570\u636e\u65f6\u603b\u4f53\u4e0a\u7565\u4f18\u4e8e\u975e\u4ee3\u7801\u6df7\u5408\u6570\u636e\u3002|\n", "2405.12920": "|**2024-05-21**|**Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples**|Tim 
Menzies et.al.|[2405.12920](http://arxiv.org/abs/2405.12920)|**[link](https://github.com/timm/ez)**|\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u8f6f\u4ef6\u5206\u6790\u6311\u6218\u4efb\u52a1\u3002\u5728\u8fd9\u4e2a\u88ab\u79f0\u4e3a\u201c\u8f6f\u4ef6\u5ba1\u67e5\u201d\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4e00\u7ec4SME\uff08\u4e3b\u9898\u4e13\u5bb6\uff09\u4f1a\u8bc4\u5ba1\u8f6f\u4ef6\u884c\u4e3a\u793a\u4f8b\uff0c\u4ee5\u5efa\u8bae\u5982\u4f55\u6539\u8fdb\u8f6f\u4ef6\u7684\u8fd0\u884c\u3002\u7531\u4e8eSME\u7684\u65f6\u95f4\u901a\u5e38\u975e\u5e38\u6709\u9650\uff0c\u7406\u60f3\u7684\u72b6\u51b5\u662f\uff0c\u8be5\u56e2\u961f\u4ec5\u901a\u8fc7\u67e5\u770b\u5c11\u91cf\u5177\u6709\u9ad8\u5ea6\u4fe1\u606f\u4ef7\u503c\u7684\u793a\u4f8b\u5c31\u80fd\u5b8c\u6210\u4f18\u5316\u4efb\u52a1\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u4e2a\u5ba1\u67e5\u8fc7\u7a0b\uff0c\u7814\u7a76\u63a2\u7d22\u4e86\u8bad\u7ec3\u9884\u6d4b\u6a21\u578b\u7684\u65b9\u6cd5\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u67d0\u4e2a\u4e13\u5bb6\u662f\u5426\u4f1a\u559c\u6b22\u6216\u4e0d\u559c\u6b22\u4e0b\u4e00\u4e2a\u793a\u4f8b\u3002\u8fd9\u79cd\u9884\u6d4b\u6a21\u578b\u53ef\u4ee5\u4e0eSME\u5408\u4f5c\uff0c\u5f15\u5bfc\u4ed6\u4eec\u63a2\u7d22\u6240\u6709\u793a\u4f8b\uff0c\u540c\u65f6\u5728\u4e13\u5bb6\u79bb\u5f00\u540e\uff0c\u6a21\u578b\u4e5f\u53ef\u4ee5\u4f5c\u4e3a\u4ee3\u7406\uff0c\u5904\u7406\u65b0\u51fa\u73b0\u7684\u6848\u4f8b\uff0c\u4ee5\u5e94\u5bf9\u4e13\u5bb6\u4eec\u7684\u5fd9\u788c\u3002 
\u572831\u4e2a\u6848\u4f8b\u7814\u7a76\u4e2d\uff08\u6db5\u76d6\u4e86\u4ece\u8f6f\u4ef6\u6d41\u7a0b\u7684\u9ad8\u5c42\u51b3\u7b56\u5230\u89c6\u9891\u7f16\u7801\u8f6f\u4ef6\u914d\u7f6e\u7684\u4f4e\u5c42\u51b3\u7b56\uff09\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4ec5\u4f7f\u752812\u523030\u4e2a\u6807\u7b7e\u5c31\u80fd\u5efa\u7acb\u8fd9\u6837\u7684\u9884\u6d4b\u6a21\u578b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u4ec5\u51ed\u5c11\u6570\u793a\u4f8b\uff08\u4e0d\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u5c31\u80fd\u53d6\u5f97\u8fd9\u6837\u7684\u6210\u679c\uff0c\u5728\u5f53\u524d\u5c1a\u5c5e\u7f55\u89c1\u3002\u9075\u5faa\u5f00\u653e\u79d1\u5b66\u7684\u539f\u5219\uff0c\u6211\u4eec\u5c06\u5728\u63d0\u4f9b\u6240\u6709\u7684\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4ee5\u4fbf\u4ed6\u4eba\u80fd\u590d\u5236\u3001\u9a8c\u8bc1\u6216\u5728\u6b64\u57fa\u7840\u4e0a\u8fdb\u4e00\u6b65\u6539\u8fdb\u8fd9\u4e9b\u7ed3\u679c\u3002|\n", "2405.12915": "|**2024-05-21**|**G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation**|Xingyuan Pan 
et.al.|[2405.12915](http://arxiv.org/abs/2405.12915)|**[link](https://github.com/xypan0/G-DIG)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u901a\u7528\u573a\u666f\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\uff0c\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\uff0c\u5b83\u4eec\u80fd\u591f\u4e0e\u4eba\u7c7b\u5728\u591a\u79cd\u4efb\u52a1\u4e0a\u534f\u540c\u3002\u7136\u800c\uff0c\u6307\u4ee4\u6570\u636e\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\u662f\u6307\u4ee4\u5fae\u8c03\u9762\u4e34\u7684\u4e24\u5927\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u68af\u5ea6\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u81ea\u52a8\u9009\u62e9\u673a\u5668\u7ffb\u8bd1\u4e2d\u7684\u9ad8\u8d28\u91cf\u548c\u591a\u6837\u5316\u7684\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u3002\u6211\u4eec\u7684\u6838\u5fc3\u521b\u65b0\u5728\u4e8e\u5206\u6790\u5355\u4e2a\u8bad\u7ec3\u6837\u4f8b\u5982\u4f55\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5f71\u54cd\u6a21\u578b\u3002\u901a\u8fc7\u7ed3\u5408\u5f71\u54cd\u529b\u51fd\u6570\u548c\u4e00\u5c0f\u90e8\u5206\u9ad8\u8d28\u91cf\u79cd\u5b50\u6570\u636e\uff0c\u6211\u4eec\u9009\u62e9\u5bf9\u6a21\u578b\u4ea7\u751f\u79ef\u6781\u5f71\u54cd\u7684\u6837\u4f8b\u4f5c\u4e3a\u9ad8\u8d28\u91cf\u6570\u636e\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u589e\u52a0\u6570\u636e\u591a\u6837\u6027\uff0c\u6211\u4eec\u901a\u8fc7\u805a\u7c7b\u5176\u68af\u5ea6\u5e76\u91cd\u91c7\u6837\uff0c\u6700\u5927\u5316\u5b83\u4eec\u5bf9\u6a21\u578b\u4ea7\u751f\u7684\u5f71\u54cd\u591a\u6837\u6027\u3002\u5728WMT22\u548cFLORES\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u4f18\u8d8a\u6027\uff0c\u6df1\u5165\u5206\u6790\u8fdb\u4e00\u6b65\u8bc1\u5b9e\u4e86\u5176\u6548\u679c\u548c\u6cdb\u5316\u80fd\u529b\u3002|\n", "2405.12914": "|**2024-05-21**|**An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation**|Zhiyu Tan 
et.al.|[2405.12914](http://arxiv.org/abs/2405.12914)|**[link](https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion.github.io)**|\u4e00\u4e2a\u5173\u952e\u7684\u5148\u51b3\u6761\u4ef6\u662f\u51c6\u786e\u7406\u89e3\u6587\u672c\u8f93\u5165\uff0c\u8fd9\u5bf9\u4e8e\u5fe0\u5b9e\u7684\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u81f3\u5173\u91cd\u8981\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u5229\u7528CLIP\u6a21\u578b\u7684\u6587\u672c\u7f16\u7801\u5668\u6765\u8868\u793a\u63d0\u793a\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684CLIP\u6a21\u578b\u4ec5\u80fd\u5904\u7406\u82f1\u6587\uff0c\u4e14\u5176\u6587\u672c\u7f16\u7801\u5668\u7684\u6a21\u578b\u5bb9\u91cf\u76f8\u5bf9\u6709\u9650\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u652f\u6301\u591a\u8bed\u8a00\u8f93\u5165\uff0c\u80fd\u591f\u5904\u7406\u66f4\u957f\u7684\u4e0a\u4e0b\u6587\uff0c\u5e76\u63d0\u4f9b\u66f4\u4f18\u79c0\u7684\u6587\u672c\u8868\u793a\u3002\u672c\u6587\u7814\u7a76\u4e86\u4f7f\u7528LLMs\u4f5c\u4e3a\u6587\u672c\u7f16\u7801\u5668\u4ee5\u63d0\u5347\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u4e2d\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u3002\u7136\u800c\uff0c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5305\u542bLLMs\u7684\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u9700\u8981\u5927\u91cf\u7684\u8ba1\u7b97\u8d44\u6e90\u548c\u6570\u636e\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e09\u9636\u6bb5\u8bad\u7ec3\u6d41\u7a0b\uff0c\u6709\u6548\u5730\u6574\u5408\u73b0\u6709\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u4e0eLLMs\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6548\u7684\u8bad\u7ec3\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u9002\u914d\u5668\uff0c\u4f7f\u5f97\u80fd\u591f\u5feb\u901f\u4f7f\u7528LLMs\u751f\u6210\u7684\u6587\u672c\u8868\u793a\u6765\u8bad\u7ec3\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6a21\u578b\u4e0d\u4ec5\u652f\u6301\u591a\u8bed\u8a00\u8f93\u5165\uff0c\u8fd8\u80fd\u5904\u7406\u66f4\u957f\u7684\u4e0a\u4e0b\u6587\uff0c\u800c\u4e14\u5728\u56fe\u50cf\u751f\u6210\u8d28\u91cf\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2405.12910": "|**2024-05-21**|**Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment**|Holli Sargeant et.al.|[2405.12910](http://arxiv.org/abs/2405.12910)|**[link](https://github.com/AhmedIzzidien/TopicLLM)**|**\u8be5\u8bba\u6587\u5173\u6ce8\u6cd5\u5f8b\u5206\u6790\u4e2d\u7684\u4e00\u4e2a\u91cd\u8981\u7a7a\u767d\uff0c\u901a\u8fc7\u6784\u5efa\u548c\u5e94\u7528\u4e00\u79cd\u65b0\u9896\u7684\u5224\u4f8b\u4e3b\u9898\u5206\u7c7b\u6cd5\uff0c\u5bf9\u82f1\u56fd\u7684\u7b80\u6613\u5224\u51b3\u6848\u4ef6\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u5229\u7528\u7cbe\u5fc3\u6311\u9009\u7684\u7b80\u6613\u5224\u51b3\u6848\u4f8b\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578bClaude 3 Opus\u7814\u7a76\u529f\u80fd\u6027\u8bdd\u9898\u548c\u8d8b\u52bf\u3002\u7ed3\u679c\u663e\u793a\uff0cClaude 3 
Opus\u5728\u4e3b\u9898\u5206\u7c7b\u4e0a\u7684\u51c6\u786e\u7387\u4e3a87.10%\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u6cd5\u5f8b\u9886\u57df\u4e2d\u7b80\u6613\u5224\u51b3\u7684\u660e\u663e\u6a21\u5f0f\u3002\u7531\u4e8e\u82f1\u56fd\u7684\u5224\u4f8b\u6cd5\u5e76\u672a\u539f\u59cb\u6807\u6ce8\u5173\u952e\u8bcd\u6216\u63d0\u4f9b\u4e3b\u9898\u8fc7\u6ee4\u9009\u9879\uff0c\u8fd9\u9879\u7814\u7a76\u4e0d\u4ec5\u6df1\u5316\u4e86\u6211\u4eec\u5bf9\u7b80\u6613\u5224\u51b3\u4e3b\u9898\u672c\u8d28\u7684\u7406\u89e3\uff0c\u8fd8\u5c55\u793a\u4e86\u4f20\u7edf\u65b9\u6cd5\u4e0e\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u5206\u7c7b\u65b9\u6cd5\u7ed3\u5408\u7684\u53ef\u80fd\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u63d0\u4f9b\u4e86\u82f1\u56fd\u6cd5\u5f8b\u7684\u65b0\u901a\u7528\u5206\u7c7b\u6846\u67b6\u3002\u8fd9\u9879\u5de5\u4f5c\u7684\u610f\u4e49\u4e3a\u53f8\u6cd5\u884c\u653f\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u8ba1\u7b97\u6cd5\u5b66\u7814\u7a76\u65b9\u6cd5\u8bba\u8ba8\u8bba\u5960\u5b9a\u4e86\u57fa\u7840\u3002**|\n", "2405.12900": "|**2024-05-21**|**Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents**|San Kim 
et.al.|[2405.12900](http://arxiv.org/abs/2405.12900)|null|\u8fd1\u671f\uff0c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5404\u79cd\u6709\u6548\u7684\u8bad\u7ec3\u65b9\u6cd5\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7cfb\u7edf\u7684\u53d1\u5c55\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u6bd2\u6027\u95ee\u9898\u5bf9\u7528\u6237\u4f53\u9a8c\u6784\u6210\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8bad\u7ec3\u7b97\u6cd5\u2014\u2014\u5bf9\u6297\u5f0f\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08ADPO\uff09\uff0c\u5b83\u662f\u5728\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7684\u57fa\u7840\u4e0a\u6539\u8fdb\u7684\u3002ADPO\u65e8\u5728\u8bad\u7ec3\u6a21\u578b\u589e\u52a0\u5bf9\u4f18\u9009\u56de\u590d\u7684\u6982\u7387\u5206\u5e03\uff0c\u540c\u65f6\u964d\u4f4e\u5bf9\u4f7f\u7528\u6709\u6bd2\u63a7\u5236\u4ee4\u724c\u751f\u6210\u7684\u4e0d\u5b89\u5168\u56de\u590d\u7684\u6982\u7387\u3002\u7814\u7a76\u663e\u793a\uff0cADPO\u80fd\u591f\u589e\u5f3a\u6a21\u578b\u62b5\u5fa1\u6709\u5bb3\u5bf9\u8bdd\u7684\u80fd\u529b\uff0c\u540c\u65f6\u5c3d\u91cf\u51cf\u5c11\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660eADPO\u63d0\u4f9b\u4e86\u6bd4\u4f20\u7edfDPO\u66f4\u4e3a\u7a33\u5b9a\u7684\u8bad\u7ec3\u6d41\u7a0b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u6709\u5bb3\u6570\u636e\u76f4\u63a5\u878d\u5165\u751f\u6210\u6a21\u578b\u7684DPO\u53d8\u4f53\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u4eba\u5de5\u521b\u5efa\u5b89\u5168\u5bf9\u8bdd\u6570\u636e\u7684\u9700\u6c42\u3002|\n", "2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai 
et.al.|[2405.14863](http://arxiv.org/abs/2405.14863)|null|\u8de8\u9886\u57df\u5bf9\u9f50\u662f\u6307\u5c06\u4e00\u4e2a\u6982\u5ff5\u4ece\u4e00\u4e2a\u9886\u57df\u6620\u5c04\u5230\u53e6\u4e00\u4e2a\u9886\u57df\u7684\u4efb\u52a1\u3002\u4f8b\u5982\uff0c\u8be2\u95ee\u201c\u5982\u679c\\textit{\u533b\u751f}\u662f\u4e00\u79cd\\textit{\u989c\u8272}\uff0c\u5b83\u4f1a\u662f\u4ec0\u4e48\u989c\u8272\uff1f\u201d\u8fd9\u4e2a\u770b\u4f3c\u5947\u7279\u7684\u8bfe\u9898\u65e8\u5728\u7814\u7a76\u4eba\u4eec\u5982\u4f55\u901a\u8fc7\u7c7b\u522b\u6620\u5c04\u548c\u5bf9\u8fd9\u4e9b\u6620\u5c04\u7684\u63a8\u7406\u6765\u8868\u5f81\u5177\u4f53\u548c\u62bd\u8c61\u7684\u6982\u5ff5\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u501f\u9274\u8ba4\u77e5\u79d1\u5b66\u4e2d\u7684\u8fd9\u4e00\u4efb\u52a1\uff0c\u901a\u8fc7\u884c\u4e3a\u7814\u7a76\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6982\u5ff5\u5316\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u901a\u8fc7\u63d0\u793aLLMs\u6267\u884c\u8de8\u57df\u6620\u5c04\u4efb\u52a1\uff0c\u5e76\u5728\u7fa4\u4f53\u548c\u4e2a\u4f53\u5c42\u9762\u5206\u6790\u5b83\u4eec\u7684\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u6a21\u578b\u5bf9\u5176\u9884\u6d4b\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\uff0c\u901a\u8fc7\u5206\u6790\u548c\u5206\u7c7b\u5b83\u4eec\u5bf9\u8fd9\u4e9b\u6620\u5c04\u7684\u89e3\u91ca\u3002\u7ed3\u679c\u663e\u793a\uff0c\u4eba\u7c7b\u548c\u6a21\u578b\u7684\u6620\u5c04\u4ee5\u53ca\u89e3\u91ca\u5b58\u5728\u663e\u8457\u76f8\u4f3c\u6027\uff0c\u8868\u660e\u6a21\u578b\u4ee5\u4e0e\u4eba\u7c7b\u7c7b\u4f3c\u7684\u65b9\u5f0f\u8868\u5f81\u6982\u5ff5\u3002\u8fd9\u79cd\u76f8\u4f3c\u6027\u4e0d\u4ec5\u4f53\u73b0\u5728\u6a21\u578b\u7684\u8868\u793a\u4e0a\uff0c\u4e5f\u4f53\u73b0\u5728\u5b83\u4eec\u7684\u884c\u4e3a\u4e2d\u3002\u800c\u4e14\uff0c\u6a21\u578b\u5927\u591a\u7ed9\u51fa\u6709\u6548\u7684\u89e3\u91ca\uff0c\u5e76\u91c7\u7528\u4e0e\u4eba\u7c7b\u7c7b\u4f3c\u7684\u63a8
\u7406\u8def\u5f84\u3002|\n", "2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862](http://arxiv.org/abs/2405.14862)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aBitune\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u63d0\u5347\u4e86\u9884\u8bad\u7ec3\u7684\u89e3\u7801\u5668\u578b\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6307\u4ee4\u8c03\u4f18\u65b9\u9762\u7684\u6027\u80fd\uff0c\u4ece\u800c\u5728\u591a\u4e2a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u63d0\u5347\u3002Bitune\u901a\u8fc7\u540c\u65f6\u5e94\u7528\u81ea\u56de\u5f52\u548c\u53cc\u5411\u6ce8\u610f\u529b\u5230\u63d0\u793a\u4e0a\uff0c\u4ee5\u83b7\u53d6\u66f4\u7cbe\u786e\u7684\u67e5\u8be2\u6216\u6307\u4ee4\u8868\u793a\u3002\u6211\u4eec\u4e3a\u6b64\u5f15\u5165\u4e86\u4e24\u7ec4\u53c2\u6570\uff0c\u5e76\u91c7\u7528\u4e86\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u6280\u672f\u6765\u5904\u7406\u3002\u8fd9\u4e24\u79cd\u7279\u5f81\u968f\u540e\u88ab\u7ec4\u5408\u6210\u4e00\u4e2a\u52a0\u6743\u5e73\u5747\uff0c\u5176\u4e2d\u6743\u91cd\u7531\u53ef\u8bad\u7ec3\u7cfb\u6570\u51b3\u5b9a\uff0c\u7528\u4e8e\u751f\u6210\u65b0\u7684\u4ee4\u724c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cBitune\u5728\u96f6\u6837\u672c\u8bbe\u7f6e\u4e0b\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u7b97\u672f\u548c\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u5927\u91cf\u7684\u6d88\u878d\u7814\u7a76\u9a8c\u8bc1\u4e86\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u5e76\u663e\u793a\u4e86\u8be5\u65b9\u6cd5\u5bf9\u4e0d\u540cPEFT\u6280\u672f\u7684\u9c81\u68d2\u6027\u3002|\n", "2405.14852": "|**2024-05-23**|**PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression**|Vladimir Malinovskii et.al.|[2405.14852](http://arxiv.org/abs/2405.14852)|**[link](https://github.com/vahe1994/aqlm)**|
Extreme compression of large language models (LLMs), down to 1-2 bits per parameter so that they can run efficiently on resource-constrained devices, has attracted broad interest. Existing work has focused on improving one-shot quantization techniques and weight representations; however, purely post-training approaches are reaching diminishing returns in the accuracy-versus-bit-width trade-off. State-of-the-art quantization methods such as QuIP# and AQLM fine-tune part of the compressed parameters on a small amount of calibration data; however, such fine-tuning of compressed weights typically relies only on straight-through estimators (STE), whose performance in this setting is poorly understood. This work questions the use of STE for extreme LLM compression and systematically studies quantization-aware fine-tuning strategies. We propose PV-Tuning, a representation-agnostic framework that generalizes and improves upon existing fine-tuning strategies and provides convergence guarantees in restricted cases. In practice, when used for 1-2 bit vector quantization, PV-Tuning outperforms prior techniques on highly performant models such as Llama and Mistral.
Using PV-Tuning, we achieve the first Pareto-optimal quantization of Llama 2 family models at 2 bits per parameter.|\n", "2405.14831": "|**2024-05-23**|**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models**|Bernal Jim\u00e9nez Guti\u00e9rrez et.al.|[2405.14831](http://arxiv.org/abs/2405.14831)|**[link](https://github.com/osu-nlp-group/hipporag)**|To thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and to continually integrate new information while avoiding catastrophic forgetting. Although large language models (LLMs), including those using retrieval-augmented generation (RAG), have achieved impressive results, they still struggle to integrate large amounts of new experience. In this work, we propose HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory, designed to enable deeper and more efficient integration of new experiences. HippoRAG orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of the neocortex and hippocampus in human memory.
We compare HippoRAG with existing RAG methods on multi-hop question answering and show that it significantly outperforms state-of-the-art methods, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval methods such as IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that HippoRAG can tackle new types of scenarios that are out of reach for existing methods. The code and data have been open-sourced.|\n", "2405.14804": "|**2024-05-23**|**Can LLMs Solve longer Math Word Problems Better?**|Xin Xu et.al.|[2405.14804](http://arxiv.org/abs/2405.14804)|null|Math word problems (MWPs) are key to evaluating the capabilities of large language models (LLMs), but existing research has focused mainly on problems with short contexts. Real-world math problems, however, often involve complex scenarios, so the ability of LLMs to solve long MWPs is essential for practical applications, yet it remains underexplored. This study is the first to investigate Context Length Generalizability (CoLeG), that is, the ability of LLMs to solve MWPs with long narratives.
We build the Extended Grade-School Math (E-GSM) dataset of problems with lengthy narratives and propose two novel metrics to evaluate the efficacy and resilience of LLMs on such problems. Examining existing zero-shot prompting techniques with both proprietary and open-source LLMs, we find that they generally fall short on CoLeG. We therefore propose tailored approaches for each kind of LLM: for proprietary models, a new instructional prompt that mitigates the impact of long context; for open-source models, a new data augmentation task that improves their adaptability. Comprehensive experiments show that our methods not only perform well on E-GSM but also generalize to several other MWP benchmarks. These findings point the way for future research on applying LLMs to complex real-world problems, offer practical solutions to current limitations, and open avenues for further exploring model generalizability and training strategies.|\n", "2405.14782": "|**2024-05-23**|**Lessons from the Trenches on Reproducible Evaluation of Language Models**|Stella Biderman
et.al.|[2405.14782](http://arxiv.org/abs/2405.14782)|null|Effective evaluation of language models remains an open challenge in natural language processing (NLP). Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, the difficulty of making proper comparisons across methods, and a lack of reproducibility and transparency. Drawing on three years of experience evaluating large language models, this paper offers guidance and lessons for researchers. First, we give an overview of common challenges in language model evaluation. Second, we describe best practices for addressing or mitigating these challenges. Third, we present the Language Model Evaluation Harness (lm-eval), an open-source library for independent, reproducible, and extensible evaluation of language models that addresses these issues. We describe the library's features and present case studies showing how it can be used to alleviate these methodological concerns.|\n", "2405.14768": "|**2024-05-23**|**WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models**|Peng Wang
et.al.|[2405.14768](http://arxiv.org/abs/2405.14768)|**[link](https://github.com/zjunlp/easyedit)**|Large language models (LLMs) need continual knowledge updates to keep pace with ever-growing world facts and to correct erroneous responses, which calls for lifelong model editing methods. The central question of the paper is where updated knowledge should be stored among the model's memories during editing. The authors find that editing either long-term memory (the model parameters) or working memory (neural network activations obtained through retrieval) leads to an impossible triangle: reliability, generalization, and locality cannot all be achieved in the lifelong editing setting. Editing parameters directly conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality), while retrieval-based working memory can hardly make the model understand and generalize the edits (poor generalization). The authors therefore propose WISE, a new method that bridges the gap between these memories.
In WISE, a dual parametric memory scheme is designed, consisting of a main memory for pretrained knowledge and a side memory for edited knowledge. Only the side memory is edited, and a router is trained to decide which memory to draw on for a given query. For continual editing, a knowledge-sharding mechanism places different sets of edits in distinct subspaces of the parameters and then merges them into a shared memory without conflicts. Experiments show that WISE outperforms previous model editing methods and overcomes the impossible triangle in lifelong model editing for question answering and hallucination, across trending LLM architectures such as GPT, LLaMA, and Mistral. The code will be released at https://github.com/zjunlp/EasyEdit.|\n", "2405.14767": "|**2024-05-23**|**FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models**|Hongyang Yang
et.al.|[2405.14767](http://arxiv.org/abs/2405.14767)|**[link](https://github.com/ai4finance-foundation/finrobot)**|As financial institutions and professionals increasingly incorporate large language models (LLMs) into their workflows, substantial barriers, such as proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges limit AI's potential to make financial tasks more efficient. Given the importance of financial analysis, we aim to develop finance-specific LLM-based toolchains and to democratize access to them through open-source projects, promoting wider use of AI in financial decision-making. This paper presents FinRobot, a novel open-source AI agent platform that supports multiple financially specialized AI agents, each powered by an LLM. The platform has four main layers: (1) the Financial AI Agents layer, which formulates a Financial Chain-of-Thought (CoT) by breaking complex financial problems down into logical sequences; (2) the Financial LLM Algorithms layer, which dynamically configures appropriate model application strategies for specific tasks; (3) the LLMOps and DataOps layer, which produces accurate models through training/fine-tuning techniques and task-relevant data; (4) the Multi-source LLM Foundation Models layer, which integrates various LLMs and makes them directly accessible to the layers above.
FinRobot aims to give both professional analysts and laypersons hands-on access to powerful AI techniques for advanced financial analysis. The open-source code of FinRobot is available at \\url{https://github.com/AI4Finance-Foundation/FinRobot}.|\n", "2405.14766": "|**2024-05-23**|**Evaluating Large Language Models for Public Health Classification and Extraction Tasks**|Joshua Harris et.al.|[2405.14766](http://arxiv.org/abs/2405.14766)|null|With the rapid development of large language models (LLMs), there is strong interest in their potential to support expert work in public health. Combining six externally annotated datasets with seven internally annotated ones, this study evaluates LLMs on text classification and extraction tasks related to health burden, epidemiological risk factors, and public health interventions. We first evaluate five open-source LLMs (ranging from 7 billion to 70 billion parameters) using zero-shot in-context learning. Llama-3-70B-Instruct performs best, achieving the highest micro-F1 score on 15 of 17 tasks.
Performance varies considerably across tasks: some, such as Contact Classification, yield scores below 60%, while on others, such as GI illness classification, all models exceed 80% micro-F1. For a subset of 12 tasks, we also evaluate GPT-4 and find results comparable to Llama-3-70B-Instruct, which scores as well as or better than GPT-4 on 6 of them. Overall, based on these initial results, we find that LLMs may be useful tools for public health experts to extract information from a wide variety of free-text sources, supporting public health surveillance, research, and interventions.|\n", "2405.14755": "|**2024-05-23**|**Large language models can be zero-shot anomaly detectors for time series?**|Sarah Alnegheimish
et.al.|[2405.14755](http://arxiv.org/abs/2405.14755)|null|Recent studies have shown that large language models can perform a wide range of tasks, including time series forecasting, and their flexibility makes them suitable for many applications. This paper presents a novel study of how large language models perform on the challenging task of time series anomaly detection. For a language model, this task involves identifying anomalies in the input sequence (or in multiple parts of it) and processing time series data rather than conventional text input. We introduce sigllm, a framework for time series anomaly detection using large language models. The framework includes a module that converts time series to text and an end-to-end pipeline that prompts language models to perform anomaly detection. We investigate two approaches to testing the abilities of large language models: the first directly prompts the model to indicate which elements of the input are anomalies; the second uses the forecasting capability of the language model to guide the detection process.
We evaluate our framework on 11 datasets from various sources, using 10 different pipelines. The forecasting method significantly outperforms the prompting method on all 11 datasets with respect to F1 score. Although large language models are able to find anomalies, state-of-the-art deep learning models still perform better, achieving results 30% higher than large language models.|\n", "2405.15765": "|**2024-05-24**|**Scaling Laws for Discriminative Classification in Large Language Models**|Dean Wyatte et.al.|[2405.15765](http://arxiv.org/abs/2405.15765)|null|Modern large language models (LLMs) represent a major leap in the capabilities of machine learning models. Their ability to produce plausible responses to a wide variety of queries suggests their potential for customer support applications. However, LLMs have been observed to hallucinate, which in the near term limits their use in customer support. To address this, we propose a system that reframes the language modeling task as a discriminative classification task, helping customer support representatives select the best template response. Our goal is to provide customer support representatives with the most suitable top-K candidate responses.
We present offline and online experiments demonstrating the effectiveness of the experimental system: the offline experiments show improvements, and the online experiments yield statistically significant gains. We also share observed curves of validation loss and top-K accuracy as model parameters are scaled. Finally, we discuss the trade-offs between model size, latency, and accuracy, and outline possible future applications.|\n", "2405.15739": "|**2024-05-24**|**Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias**|Andres Algaba et.al.|[2405.15739](http://arxiv.org/abs/2405.15739)|**[link](https://github.com/andresalgaba/llm_citation_patterns)**|
Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of large language models (LLMs) such as GPT-4 introduces a new dynamic to these practices. This study is the first to explore the characteristics and potential biases of references recommended by LLMs that rely entirely on their parametric knowledge rather than on search or retrieval-augmented generation. The experiments use a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date and containing 3,066 references in total, and ask GPT-4 to suggest scholarly references for the anonymized in-text citations. The results reveal a remarkable similarity between human and LLM citation patterns, but GPT-4 shows a stronger bias toward highly cited papers, which persists even after controlling for publication year, title length, number of authors, and venue. We also find that the characteristics of GPT-4's existing and non-existent generated references are highly consistent, indicating that the model has internalized citation patterns. An analysis of the citation graph shows that the references GPT-4 recommends are embedded in the relevant citation network, suggesting a deeper conceptual understanding. While LLMs can aid citation generation, they may also amplify existing biases and introduce new ones, potentially skewing the dissemination of scientific knowledge. Our results underscore the need to identify model biases and to develop balanced methods for interacting with LLMs.|\n", "2405.15734": "|**2024-05-24**|**LM4LV: A Frozen Large Language Model for Low-level Vision Tasks**|Boyang Zheng et.al.|[2405.15734](http://arxiv.org/abs/2405.15734)|**[link](https://github.com/bytetriper/lm4lv)**|The success of large language models (LLMs) has fostered a wave of research on multimodal large language models (MLLMs), which are changing several research paradigms in computer vision.
Although MLLMs perform well on high-level vision and vision-and-language tasks such as visual question answering (VQA) and text-to-image generation, no work has yet investigated how low-level vision tasks can benefit from them. We find that most current MLLMs are blind to low-level features by design, which inherently limits their ability to solve low-level vision tasks. To address this, we propose $\\textbf{LM4LV}$, a framework that enables a frozen LLM to solve a range of low-level vision tasks without any multimodal data or prior. This highlights the strong potential of LLMs in low-level vision and bridges the gap between MLLMs and low-level vision tasks. We hope this work will inspire new perspectives on LLMs and a deeper understanding of their mechanisms.|\n", "2405.15729": "|**2024-05-24**|**Optimizing Large Language Models for OpenAPI Code Completion**|Bohdan Petryshyn et.al.|[2405.15729](http://arxiv.org/abs/2405.15729)|**[link](https://github.com/BohdanPetryshyn/openapi-completion-benchmark)**|Recent advances in large language models (LLMs) for code generation have greatly changed the field of software development. Although code completion solutions perform well for mainstream programming languages, they underperform on less common formats such as OpenAPI definitions.
This study evaluates the OpenAPI completion performance of GitHub Copilot, a popular commercial code completion tool, and proposes a set of task-specific optimizations for Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark designed in this study is used to analyze how various prompt engineering and fine-tuning techniques affect the performance of Code Llama. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot while using 25 times fewer parameters than the commercial solution's underlying Codex model. The study also improves a widely used code infilling training technique, addressing the underperformance that occurs when the model is prompted with contexts shorter than those used during training.|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684](http://arxiv.org/abs/2405.15684)|null|To bridge the gap between the vision and language modalities, multimodal large language models (MLLMs) usually learn an adapter that converts visual inputs into tokens that large language models (LLMs) can understand.
However, most adapters generate fixed visual tokens regardless of the specific objects of interest mentioned in the prompt. Because these adapters give equal attention to every detail in the image and tend to process the entire scene, they may increase the cognitive load on LLMs, especially when processing complex scenes. To address this, we propose prompt-aware adapters, which are designed to dynamically embed visual inputs according to the specific focus of the prompt. Specifically, prompt-aware adapters use both global and local textual features to capture the visual clues most relevant to the prompt at both coarse and fine granularity levels. This approach significantly improves the ability of LLMs to understand and interpret visual content. Experiments on various visual question answering tasks, such as counting and position reasoning, demonstrate the effectiveness of prompt-aware adapters.|\n", "2405.15668": "|**2024-05-24**|**What Do You See?
Enhancing Zero-Shot Image Classification with Multimodal Large Language Models**|Abdelrahman Abdelhamed et.al.|[2405.15668](http://arxiv.org/abs/2405.15668)|null|This paper investigates how large language models (LLMs) can be used for zero-shot image classification. The authors propose a simple yet effective approach that applies multimodal LLMs to input images to generate comprehensive textual representations. These textual representations are converted into fixed-dimensional features in a cross-modal embedding space and fused for zero-shot classification, without the need to engineer elaborate prompts for each dataset. Instead of tuning prompts per dataset, the approach uses a single, universal prompting strategy. Experiments show that the method performs well across multiple datasets and improves accuracy over prior methods: averaged over ten benchmarks, it gains 4.1 percentage points, and on ImageNet the gain reaches 6.8 percentage points. These results suggest that multimodal LLMs can substantially advance computer vision tasks such as zero-shot image classification.|\n", "2405.15662": "|**2024-05-24**|**Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning**|Wenhan
Chang et.al.|[2405.15662](http://arxiv.org/abs/2405.15662)|null|In the era of artificial intelligence, users may ask AI companies to delete their information from training datasets because of privacy concerns. Since retraining a model consumes substantial computational resources for the model owner, machine unlearning has emerged as a way to delete requested training data or classes with minimal impact on model performance. However, for large-scale complex data such as images or text, unlearning a class can degrade model performance, because it is difficult to identify the link between the class and the model. To address this, we propose using concepts, rather than image features or text-data tokens, to represent the semantic information of the class to be forgotten, which helps sever the link between the model and the class and remove its influence completely.
To analyze the impact of concepts in complex data, we use a post-hoc concept bottleneck model and integrated gradients to precisely identify concepts across different classes. We then propose an unlearning method based on data poisoning with random and targeted labels. We test the method on both image classification models and large language models (LLMs); the results consistently show that the proposed approach accurately erases the targeted information from the model while preserving most of its performance.|\n", "2405.15652": "|**2024-05-24**|**$$\\mathbf{L^2\\cdot M = C^2}$$ Large Language Models as Covert Channels...
a Systematic Analysis**|Simen Gaure et.al.|[2405.15652](http://arxiv.org/abs/2405.15652)|null|Large language models (LLMs) have attracted wide attention in recent years for their strong performance on tasks such as translation, prediction, and content generation. At the same time, the research community has found that LLMs are vulnerable to attacks but can also be used to strengthen system security. How well, then, can open-source LLMs serve as a medium for covert communication, for example to support censorship-resistant communication? This paper takes an experimental approach, empirically measuring the security and capacity of an open-source LLM (Llama-7B) to assess its effectiveness as a covert channel. The results show that such a channel is unlikely to achieve high practical bitrates, which depend on message length and model entropy, but also that the probability of an adversary detecting the covert communication is low. To make the results broadly applicable, we use a simple, straightforward scheme and assume that the model is publicly available.|\n", "2405.15646": "|**2024-05-24**|**LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots**|Ruoyu Wang 
et.al.|[2405.15646](http://arxiv.org/abs/2405.15646)|null|Developing general-purpose service robots for everyday life requires robots that can perform a wide range of fundamental behaviors appropriately. Recent advances in training large language models (LLMs) make it possible to generate task sequences directly from natural language instructions, without additional domain knowledge. However, even when an LLM's output is semantically correct, the generated task plan may not map precisely to acceptable actions and may contain various linguistic ambiguities. LLM hallucination further complicates robot task planning, since it can produce content that conflicts with real-world facts or user input. We therefore propose a task planning method based on constrained LLM prompts that generates executable action sequences from a command. We also design an exception handling module that deals with LLM hallucinations and ensures that the generated results are admissible in the current environment. We evaluate the method on commands produced by the RoboCup@Home command generator, and the results show that the robot performs well at understanding and executing tasks.|\n", 
"2405.15640": "|**2024-05-24**|**GECKO: Generative Language Model for English, Code and Korean**|Sungwoo Oh et.al.|[2405.15640](http://arxiv.org/abs/2405.15640)|null|We introduce GECKO, a bilingual large language model (LLM) designed for Korean, English, and programming languages. It is based on the LLaMA architecture and pretrained on a balanced, high-quality corpus of Korean and English. This report describes our work on building the data pipeline and training the model. Despite its small vocabulary, GECKO generates Korean and English tokens efficiently. We evaluate it on representative benchmarks: it performs strongly on KMMLU (Korean MMLU) and shows modest ability in English and code, even though it is trained on fewer tokens than English-focused LLMs. GECKO is released to the open-source community under a permissive license, and we hope it serves as a research baseline and a source of practical insights for Korean LLM research. The model is available at https://huggingface.co/kifai/GECKO-7B.|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430](http://arxiv.org/abs/2405.17430)|null|## Background 
Large multimodal models (LMMs) such as LLaVA perform well at visual-linguistic reasoning. These models first embed an image into a fixed, large number of visual tokens and then feed them into a large language model (LLM). However, in dense visual scenarios such as high-resolution images and videos, this design yields an excessive number of tokens and is therefore inefficient. Token pruning and merging methods exist, but they produce a single-length output per image and cannot flexibly trade off information density against efficiency. Inspired by Matryoshka dolls, we propose M3: Matryoshka Multimodal Models, which learn to represent visual content as nested sets of visual tokens that capture information at multiple granularities, from coarse to fine. ## Benefits Our approach offers several unique benefits for LMMs: (1) users can explicitly control the visual granularity per test instance, e.g., adjusting the number of tokens used to represent an image according to the complexity or simplicity of its content; (2) 
M3 provides a framework for analyzing the granularity needed for existing datasets, and we find that benchmarks such as COCO need only about 9 visual tokens to match the accuracy obtained with all 576 tokens; (3) our approach provides a foundation for exploring the best trade-off between performance and visual token length, and our study reveals a large gap between current fixed-scale representations and the ideal upper bound.|\n", "2405.17428": "|**2024-05-27**|**NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models**|Chankyu Lee et.al.|[2405.17428](http://arxiv.org/abs/2405.17428)|null|This paper introduces NV-Embed, a model designed to improve the performance of decoder-based large language models on text embedding tasks, including dense vector retrieval. NV-Embed combines several architectural designs and training strategies that substantially improve the model's versatility and performance while keeping it simple and reproducible. On the architecture side, we introduce a latent attention layer to obtain pooled embeddings, which improves both retrieval and downstream task accuracy compared with mean pooling or using the LLM's last 
token embedding. To improve representation learning, we remove the LLM's causal attention mask during contrastive training, allowing more comprehensive information exchange. For training, we use a two-stage contrastive instruction-tuning method. The first stage applies instruction-based contrastive training on retrieval datasets, using in-batch negatives and curated hard negative examples. The second stage blends various non-retrieval datasets into instruction tuning, which improves not only non-retrieval task accuracy but also retrieval performance. With these techniques, NV-Embed, using only publicly available data, achieved a record-high score of 69.32 and ranked first on the Massive Text Embedding Benchmark (MTEB) as of May 24, 2024, which covers 56 tasks including retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, the model also achieves the highest score, 59.36, on the 15 retrieval tasks in BEIR. The model will be open-sourced at https://huggingface.co/nvidia/NV-Embed-v1.|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang 
et.al.|[2405.17427](http://arxiv.org/abs/2405.17427)|**[link](https://github.com/kuanchihhuang/reason3d)**|**Recent advances in multimodal large language models (LLMs) have shown great potential in areas such as concept reasoning. However, their application to understanding 3D environments remains relatively limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, enabling advanced tasks such as 3D reasoning segmentation, hierarchical searching, referring expressions, and question answering with detailed mask outputs. In particular, we design a hierarchical mask decoder that precisely locates small objects within expansive scenes. The decoder first produces a coarse location estimate covering the object's general area and then refines it step by step, significantly improving the precision of object identification and segmentation. Experiments show that Reason3D performs strongly on 3D referring expression, 3D question answering, and 3D reasoning segmentation tasks on large-scale datasets such as ScanNet and Matterport3D. Code and models are available at https://github.com/KuanchihHuang/Reason3D.**|\n", "2405.17424": "|**2024-05-27**|**LARM: Large 
Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because embodied agents must interact with the real world, they need comprehensive prior knowledge, long-horizon planning capability, and fast response. Although recent agents based on large language models (LLMs) perform well, they still have limitations. For example, LLM outputs are usually descriptive sentences, which can be ambiguous when a specific action must be determined. To address these issues, we propose the Large Auto-Regressive Model (LARM). LARM takes text and multi-view images as input and predicts subsequent actions autoregressively. To train LARM, we develop a new data format called the auto-regressive node transmission structure and assemble a corresponding dataset. With two-phase training, LARM successfully harvests enchanted equipment in Minecraft, which requires a more complex decision-making chain than the best previous methods can achieve. LARM is also the fastest method, running 6.8 times faster than previous approaches.|\n", "2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu 
et.al.|[2405.17418](http://arxiv.org/abs/2405.17418)|null|Robot manipulation policies often perform unsatisfactorily when facing novel tasks or object instances, so the ability to automatically detect and self-correct failed actions is essential for practical robotic systems. Multimodal large language models (MLLMs) have recently shown promise in visual instruction following and strong reasoning across a variety of tasks. To use a general MLLM as an end-to-end robotic agent, we propose the Self-Corrected (SC)-MLLM, which not only predicts end-effector poses but can also autonomously recognize and correct failed actions. First, we use parameter-efficient fine-tuning to give the MLLM pose prediction ability, reframing pose prediction as a language modeling problem. When execution fails, the model identifies the causes of low-level action errors (e.g., position and rotation errors) and actively requests hints from experts. Based on this feedback, SC-MLLM reconsiders the current failure scenario and generates corrected actions. In addition, we design a continuous policy learning method that uses successfully corrected samples 
to improve the model's adaptability to the current scene configuration and reduce the frequency of expert intervention. To evaluate SC-MLLM, we conduct extensive experiments in both simulated and real-world settings. Compared with the previous state-of-the-art robotic MLLM (ManipLLM), SC-MLLM substantially improves manipulation accuracy, from 57% to 79% on seen object categories and from 47% to 69% on unseen novel categories.|\n", "2405.17402": "|**2024-05-27**|**THREAD: Thinking Deeper with Recursive Spawning**|Philip Schroeder et.al.|[2405.17402](http://arxiv.org/abs/2405.17402)|**[link](https://github.com/philipmit/thread)**|Large language models (LLMs) have shown impressive capabilities across diverse settings, but they still struggle as the length and complexity of the context increase. To address this, we propose Thinking Recursively and 
Dynamically (ThReaD). ThReaD frames model generation as a thread of execution that, depending on the context, can either run to completion or dynamically spawn new threads. By spawning child threads, the model can offload work (e.g., thinking or retrieving information) to them, and each child returns only the tokens the parent thread needs, so the model can adapt the amount of intermediate work it uses to produce tokens. We apply ThReaD to task solving and question answering, where it recursively decomposes a given task or question into progressively simpler subproblems that are solved by separate child threads. We implement ThReaD with few-shot learning and evaluate it with GPT-4 and GPT-3.5 on several benchmarks, including ALFWorld, TextCraft, and WebShop, as well as two new benchmarks, DataCommons QA and MIMIC-III ICU QA. ThReaD achieves state-of-the-art performance on these benchmarks and, even with smaller models such as Llama-3-8b and CodeLlama-7b, improves absolute scores over existing frameworks by 10% to 50%.|\n", "2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386](http://arxiv.org/abs/2405.17386)|**[link](https://github.com/cone-mt/mindmerger)**|## Task 
Reasoning ability is crucial for large language models (LLMs), yet there is a clear gap between English and non-English languages. Some studies fine-tune LLMs to relearn reasoning in non-English languages, while others replace non-English inputs with the outputs of an external model, such as English translations, to work around the LLM's difficulty understanding non-English text. However, these approaches often fail to make full use of the LLM's built-in reasoning and language understanding abilities. To better exploit them, we propose MindMerger, a new method that merges the LLM with the external language understanding capabilities of a multilingual model to boost multilingual reasoning performance. We also introduce a two-step training scheme that first embeds the external capabilities into the LLM and then trains the collaborative use of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and one language understanding dataset show that MindMerger consistently outperforms all baselines, especially on low-resource languages. Without updating the LLM's parameters, average accuracy on the MGSM dataset improves by 6.7% across all languages and by 
8.0% on low-resource languages.|\n", "2405.17382": "|**2024-05-27**|**ReMoDetect: Reward Models Recognize Aligned LLM's Generations**|Hyunseok Lee et.al.|[2405.17382](http://arxiv.org/abs/2405.17382)|null|As large language models (LLMs) grow more capable and easier to use, the societal risks they bring, such as fake news generation, call for methods that detect LLM-generated text (LGT) to ensure safe use. However, with so many LLMs in existence, identifying the characteristics of each one individually is impractical. We therefore focus on a property that these powerful models share: alignment training, i.e., training LLMs to generate text that better matches human preferences. Our key finding is that because aligned LLMs are trained to maximize human preference, their generations receive even higher estimated preference than human-written text, which makes them easy to detect with a reward model (an LLM trained to model the human preference distribution). 
Building on this finding, we propose two training schemes that further improve the reward model's detection ability: (1) continual preference fine-tuning, which makes the reward model prefer aligned LGT even more strongly; and (2) training the reward model on mixed human/LLM text, i.e., human-written text rephrased by aligned LLMs, whose preference lies between that of LGT and human-written text and which helps the model learn a better decision boundary. Extensive evaluation across six text domains and twelve aligned LLMs shows that our method achieves state-of-the-art performance. Code is available at https://github.com/hyunseoklee-ai/reward_llm_detect.|\n", "2405.17378": "|**2024-05-27**|**RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects**|Ahmed Allam et.al.|[2405.17378](http://arxiv.org/abs/2405.17378)|**[link](https://github.com/AUCOHL/RTL-Repo)**|Large language models (LLMs) have shown potential in assisting with register transfer level (RTL) 
design tasks. However, existing benchmarks fall well short of reflecting the complexity of real-world RTL projects. To address this, the paper introduces RTL-Repo, a new benchmark designed specifically to evaluate LLMs on large-scale RTL design projects. RTL-Repo contains more than 4,000 Verilog code samples extracted from public GitHub repositories, each accompanied by the full context of its repository. We evaluate several state-of-the-art models on RTL-Repo, including GPT-4, GPT-3.5, Starcoder2, and Verilog-specific models such as VeriGen and RTLCoder, comparing how well they generate Verilog code for complex projects. RTL-Repo gives the hardware design community a valuable resource for assessing and comparing LLMs in real-world RTL design scenarios and for training LLMs specifically for Verilog code generation in complex, multi-file RTL projects. RTL-Repo is open source and publicly available on GitHub.|\n", "2405.17374": "|**2024-05-28**|**Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models**|ShengYun Peng et.al.|[2405.17374](http://arxiv.org/abs/2405.17374)|null|### Background 
Safety alignment is key to ensuring that large language models (LLMs) behave in line with human preferences and avoid harmful behavior, but recent studies show that fine-tuning a model on only a few carefully crafted training examples can easily compromise its safety. We assess the risks of fine-tuning by exploring the LLM safety landscape. We discover a new phenomenon, observed universally in the parameter space of popular open-source LLMs, which we call the safety basin: randomly perturbing the model weights preserves the safety level of the original aligned model within its local neighborhood. ### Findings and Contributions 
This discovery motivates VISAGE, a new safety metric that measures the safety of LLM fine-tuning by probing the model's safety landscape. Visualizing the safety landscape of an aligned model helps explain how fine-tuning compromises safety by moving the model away from the safety basin. We also observe that the system prompt plays an important role in protecting the model, and that this protection carries over to perturbed variants of the model within the safety basin. These insights from studying the safety landscape offer new perspectives for future research on LLM safety.|\n", "2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414](http://arxiv.org/abs/2405.18414)|null|## Background Retrieval Augmented Generation (RAG) has greatly improved the responses of large language models (LLMs) by grounding them in context from existing documents. But how well does RAG work when a document's relevance to the question is not obvious, or when a document contains only partial information? And how should connections between documents be taken into account? This work addresses these two core questions about RAG generation. We propose G-RAG, a reranker based on graph neural networks (GNNs) that sits between the retriever and the reader in RAG. G-RAG combines the connections between documents with semantic information (via Abstract Meaning Representation graphs) to provide a context-informed ranker for RAG. Experiments show that G-RAG outperforms state-of-the-art approaches while having a smaller computational footprint. We also evaluate PaLM 2 as a reranker and find that it significantly underperforms G-RAG, which underscores the importance of reranking for RAG even when large language models are used.|\n", "2405.18386": "|**2024-05-28**|**Instruct-MusicGen: Unlocking Text-to-Music 
Editing for Music Language Models via Instruction Tuning**|Yixiao Zhang et.al.|[2405.18386](http://arxiv.org/abs/2405.18386)|**[link](https://github.com/ldzhangyx/instruct-MusicGen)**|**\u5728\u6587\u672c\u5230\u97f3\u4e50\u7f16\u8f91\u9886\u57df\uff0c\u8fd1\u671f\u7684\u8fdb\u6b65\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u6765\u6539\u53d8\u97f3\u4e50\u98ce\u683c\u6216\u8c03\u6574\u4e50\u5668\u5143\u7d20\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u9700\u8981\u4ece\u5934\u8bad\u7ec3\u7279\u5b9a\u7684\u7f16\u8f91\u6a21\u578b\uff0c\u8017\u65f6\u4e14\u8d44\u6e90\u5bc6\u96c6\uff0c\u8981\u4e48\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9884\u6d4b\u7f16\u8f91\u540e\u7684\u97f3\u4e50\uff0c\u5bfc\u81f4\u97f3\u9891\u91cd\u5efa\u4e0d\u591f\u7cbe\u786e\u3002\u4e3a\u4e86\u7ed3\u5408\u4f18\u70b9\u5e76\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Instruct-MusicGen\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u9488\u5bf9\u9884\u8bad\u7ec3\u7684MusicGen\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u9ad8\u6548\u5730\u6267\u884c\u7f16\u8f91\u6307\u4ee4\uff0c\u5982\u6dfb\u52a0\u3001\u5220\u9664\u6216\u5206\u79bb\u97f3\u8f68\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fee\u6539\u4e86\u539f\u59cbMusicGen\u67b6\u6784\uff0c\u5f15\u5165\u4e86\u6587\u672c\u878d\u5408\u6a21\u5757\u548c\u97f3\u9891\u878d\u5408\u6a21\u5757\uff0c\u4f7f\u6a21\u578b\u80fd\u591f\u540c\u65f6\u5904\u7406\u6307\u4ee4\u6587\u672c\u548c\u97f3\u9891\u8f93\u5165\uff0c\u751f\u6210\u6240\u9700\u7684\u7f16\u8f91\u97f3\u4e50\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0cInstruct-MusicGen\u4ec5\u5411\u539f\u59cb\u6a21\u578b\u589e\u52a0\u4e868%\u7684\u65b0\u53c2\u6570\uff0c\u5e76\u57285000\u6b65\u7684\u8bad\u7ec3\u540e\uff0c\u5176\u6027\u80fd\u8d85\u8d8a\u73b0\u6709\u57fa\u51c6\uff0c\u4e14\u8868\u73b0\u51fa\u4e0e\u4e13\u95e8\u9488\u5bf9\u4efb\u52a1\u8bad\u7ec3\u7684\u6a21\u578b\u76f8\u5f53\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u8fdb\u5c55\u4e0d\u4ec5\u63d0
\u9ad8\u4e86\u6587\u672c\u5230\u97f3\u4e50\u7f16\u8f91\u7684\u6548\u7387\uff0c\u8fd8\u62d3\u5bbd\u4e86\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u5728\u52a8\u6001\u97f3\u4e50\u5236\u4f5c\u73af\u5883\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002**|\n", "2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380](http://arxiv.org/abs/2405.18380)|**[link](https://github.com/pixeli99/owlore)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\u3002\u7136\u800c\uff0c\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u6216\u5fae\u8c03\u5e26\u6765\u4e86\u5de8\u5927\u6311\u6218\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u7b49\u53c2\u6570\u9ad8\u6548\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u4f46\u5f80\u5f80\u727a\u7272\u6027\u80fd\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5185\u5b58\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\u2014\u2014Outlier-weighed Layerwise Sampled Low-Rank Projection\uff08OwLore\uff09\uff0c\u5b83\u53d7\u5230LLMs\u5c42\u95f4\u5f02\u5e38\u5206\u5e03\u7684\u542f\u53d1\uff0c\u901a\u8fc7\u52a8\u6001\u91c7\u6837\u9884\u8bad\u7ec3\u5c42\u800c\u975e\u6dfb\u52a0\u989d\u5916\u9002\u914d\u5668\u6765\u8fdb\u884c\u5fae\u8c03\u3002\u6211\u4eec\u9996\u5148\u901a\u8fc7Heavy-Tailed 
Self-Regularization\u7406\u8bba\uff08HT-SR\uff09\u89e3\u8bfb\u5f02\u5e38\u73b0\u8c61\uff0c\u53d1\u73b0\u5177\u6709\u66f4\u591a\u5f02\u5e38\u503c\u7684\u5c42\u66f4\u503e\u5411\u4e8e\u5448\u73b0\u957f\u5c3e\u5206\u5e03\uff0c\u8bad\u7ec3\u6548\u679c\u66f4\u597d\u3002\u56e0\u6b64\uff0cOwLore\u7b56\u7565\u6027\u5730\u4e3a\u5f02\u5e38\u503c\u8f83\u591a\u7684\u5c42\u5206\u914d\u66f4\u9ad8\u7684\u91c7\u6837\u6982\u7387\uff0c\u4ee5\u66f4\u597d\u5730\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u77e5\u8bc6\u3002 \u4e3a\u4e86\u8fdb\u4e00\u6b65\u51cf\u5c11\u5fae\u8c03\u65f6\u7684\u5185\u5b58\u9700\u6c42\uff0c\u6211\u4eec\u7ed3\u5408\u68af\u5ea6\u4f4e\u79e9\u6295\u5f71\uff0c\u4f7f\u5f97\u6bcf\u4e00\u5c42\u80fd\u4ee5\u4f4e\u79e9\u65b9\u5f0f\u9ad8\u6548\u8bad\u7ec3\u3002\u901a\u8fc7\u878d\u5408\u4f4e\u79e9\u4f18\u52bf\u548c\u6700\u4f18\u5c42\u522b\u91c7\u6837\u7b56\u7565\uff0cOwLore\u663e\u8457\u4f18\u5316\u4e86LLM\u526a\u679d\u4e2d\u7684\u5185\u5b58-\u6027\u80fd\u6743\u8861\u3002\u6211\u4eec\u5728\u591a\u4e2a\u67b6\u6784\uff0c\u5982LLaMa2\u3001LLaMa3\u548cMistral\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cOwLore\u6301\u7eed\u4f18\u4e8e\u57fa\u7840\u65b9\u6cd5\uff0c\u5305\u62ec\u5168\u91cf\u5fae\u8c03\u3002\u4f8b\u5982\uff0c\u5728\u5e38\u8bc6\u63a8\u7406\u57fa\u51c6\u4e0a\uff0cOwLore\u53ef\u5b9e\u73b0\u5e73\u57471.1%\u7684\u7cbe\u5ea6\u63d0\u5347\uff0cMMLU\u4e0a\u63d0\u9ad83.0%\uff0c\u800c\u5728MT-Bench\u4e0a\u66f4\u662f\u6709\u663e\u8457\u768410%\u63d0\u5347\uff0c\u540c\u65f6\u5185\u5b58\u6548\u7387\u66f4\u9ad8\u3002\u7279\u522b\u5730\uff0cOwLore\u4ec5\u970021GB\u5185\u5b58\u5373\u53ef\u5bf9LLaMa2-7B\u8fdb\u884c\u5fae\u8c03\u3002**|\n", "2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah 
et.al.|[2405.18377](http://arxiv.org/abs/2405.18377)|null|\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u3001\u590d\u6742\u63a8\u7406\u3001\u60c5\u611f\u5206\u6790\u7b49\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u63a8\u52a8\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5f3a\u5927\u7684\u529f\u80fd\u4f34\u968f\u7740\u5de8\u5927\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u6210\u672c\uff0c\u9650\u5236\u4e86\u5728\u5927\u591a\u6570\u786c\u4ef6\u5e73\u53f0\u4e0a\u7684\u4f7f\u7528\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u57fa\u4e8eLLaMA2-7B\u8fdb\u884c\u5355\u6b21\u5fae\u8c03\u540e\uff0c\u901a\u8fc7\u9057\u4f20\u7b97\u6cd5\u641c\u7d22\u627e\u5230\u66f4\u5c0f\u3001\u8ba1\u7b97\u590d\u6742\u5ea6\u66f4\u4f4e\u7684\u7f51\u7edc\u67b6\u6784\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5bf9\u4e8e\u67d0\u4e9b\u6807\u51c6\u57fa\u51c6\u4efb\u52a1\uff0c\u9884\u8bad\u7ec3\u7684LLaMA2-7B\u6a21\u578b\u5b9e\u9645\u4e0a\u8fc7\u4e8e\u5e9e\u5927\u4e14\u590d\u6742\u3002\u6211\u4eec\u5b9e\u73b0\u4e861.5\u500d\u7684\u6a21\u578b\u5927\u5c0f\u7f29\u51cf\u548c1.3\u500d\u7684\u541e\u5410\u91cf\u63d0\u5347\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51e0\u4e4e\u65e0\u635f\u7684\u51c6\u786e\u6027\u3002\u76f8\u8f83\u4e8e\u67d0\u4e9b\u526a\u679d\u6216\u7a00\u758f\u5316\u6280\u672f\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6548\u7387\u548c\u6548\u679c\u4e0a\u66f4\u4e3a\u4f18\u8d8a\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u91cf\u5316\u4e0e\u6211\u4eec\u7684\u65b9\u6cd5\u76f8\u7ed3\u5408\u7684\u6548\u679c\uff0c\u8fdb\u4e00\u6b65\u901a\u8fc7\u91cf\u5316\u51cf\u5c11\u4e86\u627e\u5230\u7684\u7f51\u7edc\u7684\u5927\u5c0f\u548c\u590d\u6742\u6027\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u672c\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u79cd\u81ea\u52a8\u521b\u5efa\u53ef\u5728\u66f4\u5ec9\u4ef7\u548c\u5e7f\u6cdb\u53ef\u7528\u786c\u4ef6
\u5e73\u53f0\u4e0a\u4f7f\u7528\u7684LLMs\u7684\u65b9\u6cd5\u3002|\n", "2405.18376": "|**2024-05-28**|**Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning**|Dongjie Chen et.al.|[2405.18376](http://arxiv.org/abs/2405.18376)|**[link](https://github.com/Dong-Jie-Chen/RCL)**|**### \u80cc\u666f \u6e90\u514d\u8d39\u9886\u57df\u9002\u5e94\uff08SFDA\uff09\u7684\u76ee\u6807\u662f\u4ec5\u4f7f\u7528\u672a\u6807\u8bb0\u7684\u9776\u57df\u6570\u636e\u6765\u8c03\u6574\u9884\u8bad\u7ec3\u7684\u6e90\u6a21\u578b\u3002\u5f53\u524d\u7684SFDA\u65b9\u6cd5\u5728\u6709\u6548\u5229\u7528\u9884\u8bad\u7ec3\u77e5\u8bc6\u548c\u6316\u6398\u9776\u57df\u6570\u636e\u6f5c\u529b\u65b9\u9762\u9762\u4e34\u6311\u6218\u3002\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7406\u89e3\u89c6\u89c9\u548c\u6587\u672c\u4fe1\u606f\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5e94\u7528\u4e8eSFDA\u65f6\u5b58\u5728\u95ee\u9898\uff0c\u5982\u6307\u4ee4\u6267\u884c\u5931\u8d25\u3001\u8ba1\u7b97\u9700\u6c42\u9ad8\u4ee5\u53ca\u5728\u9002\u5e94\u524d\u6027\u80fd\u8bc4\u4f30\u56f0\u96be\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014\u53ef\u9760\u6027\u57fa\u4e8e\u8bfe\u7a0b\u5b66\u4e60\uff08RCL\uff09\uff0c\u5b83\u901a\u8fc7\u4f2a\u6807\u7b7e\u5316\u6574\u5408\u591a\u4e2aMLLM\u4ee5\u4fc3\u8fdb\u77e5\u8bc6\u5229\u7528\uff0c\u5e94\u7528\u4e8eSFDA\u3002 ### \u65b9\u6cd5 \u6211\u4eec\u7684\u6846\u67b6\u5305\u62ec\uff1a1) \u53ef\u9760\u77e5\u8bc6\u8f6c\u79fb\uff0c2) \u81ea\u6211\u7ea0\u6b63\uff0c3) MLLM\u5f15\u5bfc\u7684\u77e5\u8bc6\u6269\u5c55\uff0c\u4ee5\u53ca4) 
\u591a\u70ed\u63a9\u7801\u7cbe\u70bc\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u534f\u540c\u4f5c\u7528\uff0c\u9010\u6b65\u53d1\u6398\u9776\u57df\u672a\u6807\u8bb0\u6570\u636e\u7684\u4ef7\u503c\u3002RCL\u5728\u591a\u4e2aSFDA\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\uff08SOTA\uff09\u6027\u80fd\uff0c\u4f8b\u5982\u5728DomainNet\u4e0a\u63d0\u5347\u663e\u8457\uff0c\u8fbe\u5230$\\textbf{+9.4\\%}$\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u589e\u5f3a\u9002\u5e94\u6027\u548c\u9c81\u68d2\u6027\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u540c\u65f6\u65e0\u9700\u8bbf\u95ee\u6e90\u6570\u636e\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Dong-Jie-Chen/RCL\u83b7\u53d6\u3002**|\n", "2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375](http://arxiv.org/abs/2405.18375)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|\u5e38\u8bc6\u63a8\u7406\u662f\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u7684\u91cd\u8981\u7ec4\u6210\u90e8\u5206\uff0c\u4e3a\u6b64\u5df2\u5f00\u53d1\u51fa\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5927\u591a\u4ec5\u9650\u4e8e\u82f1\u8bed\u3002\u521b\u5efa\u5e73\u884c\u57fa\u51c6\u6709\u52a9\u4e8e\u8de8\u8bed\u8a00\u8bc4\u4f30\uff0c\u4ece\u800c\u66f4\u597d\u5730\u7406\u89e3\u4e0d\u540c\u8bed\u8a00\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u6cf0\u8bed\u7248\u7684Winograd 
Schema\u96c6\u5408\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u6d4b\u8bd5\u6cf0\u8bed\u4e2d\u7684\u5e38\u8bc6\u63a8\u7406\u80fd\u529b\u800c\u8bbe\u8ba1\u7684\u65b0\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u9080\u8bf7\u6bcd\u8bed\u8005\u3001\u4e13\u4e1a\u7ffb\u8bd1\u548c\u4e25\u683c\u9a8c\u8bc1\u7684\u65b9\u6cd5\uff0c\u786e\u4fdd\u8be5\u7cfb\u5217\u9898\u5e93\u80fd\u51c6\u786e\u53cd\u6620\u6cf0\u56fd\u8bed\u8a00\u7684\u72ec\u7279\u6027\u3001\u4e60\u8bed\u548c\u6587\u5316\u5f15\u7528\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u7cca\u6027\u548c\u5e38\u8bc6\u6311\u6218\u3002\u6211\u4eec\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u548cClaude-3-Opus\uff09\u5728\u8fd9\u9879\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\u5c3d\u7ba1\u5728\u82f1\u8bed\u4e0a\u8868\u73b0\u4f18\u5f02\uff0c\u4f46\u5b83\u4eec\u5728\u6cf0\u8bed\u4e2d\u7684\u6027\u80fd\u660e\u663e\u4e0b\u964d\uff0c\u8fd9\u8868\u660e\u5728\u591a\u8bed\u8a00\u5e38\u8bc6\u63a8\u7406\u65b9\u9762\u4ecd\u6709\u5f85\u8fdb\u6b65\u3002|\n", "2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal 
et.al.|[2405.18369](http://arxiv.org/abs/2405.18369)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u7684\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u5b83\u4eec\u6210\u529f\u7684\u5173\u952e\u5728\u4e8e\u63d0\u793a\u7684\u6982\u5ff5\uff0c\u5373\u6307\u5bfc\u6a21\u578b\u751f\u6210\u8f93\u51fa\u3002\u7136\u800c\uff0c\u624b\u52a8\u521b\u5efa\u63d0\u793a\u65e2\u8017\u65f6\u53c8\u5c40\u9650\u4e8e\u7279\u5b9a\u9886\u57df\uff0c\u56e0\u6b64\u9700\u8981\u81ea\u52a8\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u4ecb\u7ecdPromptWizard\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u8fed\u4ee3\u5730\u5408\u6210\u548c\u4f18\u5316\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u63d0\u793a\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0cPromptWizard\u540c\u65f6\u4f18\u5316\u63d0\u793a\u6307\u4ee4\u548c\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u4ee5\u6700\u5927\u5316\u6a21\u578b\u6027\u80fd\u3002\u8be5\u6846\u67b6\u901a\u8fc7\u53d8\u5f02\u6307\u4ee4\u5e76\u5f15\u5165\u8d1f\u4f8b\uff0c\u9010\u6b65\u6df1\u5316\u7406\u89e3\u5e76\u4fdd\u8bc1\u591a\u6837\u6027\u3002\u501f\u52a9\u4e00\u4e2a\u8bc4\u5224\u8005\uff0cPromptWizard\u8fdb\u4e00\u6b65\u6539\u8fdb\u6307\u4ee4\u548c\u793a\u4f8b\uff0c\u878d\u5165\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\uff0c\u4ee5\u5b9e\u73b0\u6700\u4f73\u8868\u73b0\u3002PromptWizard\u5177\u6709\u8ba1\u7b97\u6548\u7387\u9ad8\u3001\u9002\u5e94\u4e0d\u540c\u8bad\u7ec3\u6570\u636e\u91cf\u573a\u666f\u4ee5\u53ca\u5728\u5c0f\u578bLLM\u4e0a\u540c\u6837\u6709\u6548\u7684\u7279\u70b9\u3002\u901a\u8fc7\u5bf98\u4e2a\u6570\u636e\u96c6\u768435\u4e2a\u4efb\u52a1\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793aPromptWizard\u660e\u663e\u4f18\u4e8e\u73b0\u6709\u7684\u63d0\u793a\u7b56\u7565\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u63d0\u793a\u4f18\u5316\u65b9\u9762\u7684\u9ad8\u6548\u6027\u548c\u53ef\u6269\u5c55\u6027\u
3002|\n", "2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361](http://arxiv.org/abs/2405.18361)|null|\u968f\u7740\u81ea\u52a8\u9a7e\u9a76\uff08AD\uff09\u4efb\u52a1\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u57fa\u4e8e\u7aef\u5230\u7aef\u7684\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u7684\u5e94\u7528\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u8fd9\u4e9b\u6a21\u578b\u8bd5\u56fe\u878d\u5408\u5f3a\u5927\u7684\u903b\u8f91\u63a8\u7406\u548c\u8ba4\u77e5\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u5168\u9762\u7684\u7aef\u5230\u7aef\u89c4\u5212\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684VLM\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e2D\u89c6\u89c9\u5206\u8bcd\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5728\u5904\u7406\u4e09\u7ef4\u51e0\u4f55\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u9760\u7684\u89c4\u5212\u81f3\u5173\u91cd\u8981\u3002\u7814\u7a76\u8868\u660e\uff0c2D\u5206\u8bcd\u7684LLM\u5e76\u4e0d\u80fd\u51c6\u786e\u611f\u77e5\u4e09\u7ef4\u73af\u5883\uff0c\u8fd9\u5f15\u53d1\u4e86\u5173\u4e8eVLM\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u53ef\u9760\u6027\u7684\u8d28\u7591\u3002 
\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAtlas\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86DETR\u98ce\u683c\u76843D\u611f\u77e5\u5668\u4f5c\u4e3a3D\u5206\u8bcd\u5668\uff0c\u4e0e\u5355\u5c42\u7ebf\u6027\u6295\u5f71\u5668\u76f8\u8fde\uff0c\u5de7\u5999\u5730\u5229\u7528\u4e86\u4e09\u7ef4\u7269\u7406\u4e16\u754c\u7684\u56fa\u6709\u7279\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u9ad8\u5206\u8fa8\u7387\u591a\u89c6\u89d2\u56fe\u50cf\u7684\u540c\u65f6\u5904\u7406\u548c\u65f6\u7a7a\u5efa\u6a21\u3002\u5c3d\u7ba1\u7b80\u5355\uff0c\u4f46Atlas\u5728NuScenes\u6570\u636e\u96c6\u4e0a\u76843D\u68c0\u6d4b\u548c\u81ea\u4e3b\u9a7e\u9a76\u89c4\u5212\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e863D\u5206\u8bcd\u7684LLM\u5bf9\u4e8e\u5b9e\u73b0\u53ef\u9760\u81ea\u52a8\u9a7e\u9a76\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u5c06\u5f00\u6e90\u4ee3\u7801\u548c\u6570\u636e\u96c6\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002|\n", "2405.18359": "|**2024-05-28**|**Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs**|Somnath Kumar 
et.al.|[2405.18359](http://arxiv.org/abs/2405.18359)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5168\u7403\u8303\u56f4\u5185\u91cd\u5851\u4f17\u591a\u9886\u57df\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u975e\u62c9\u4e01\u5b57\u6bcd\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u65f6\u7684\u5305\u5bb9\u6027\u548c\u6548\u679c\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u672c\u6587\u9488\u5bf9\u8fd9\u4e00\u5173\u952e\u6311\u6218\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u5927\u91cf\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u65b9\u6cd5\u6765\u589e\u5f3a\u591a\u8bed\u8a00LLMs\u7684\u8868\u73b0\u3002\u901a\u8fc7\u7cfb\u7edf\u5730\u7814\u7a76\u548c\u8bc4\u4f30\u5404\u79cd\u8bed\u8a00\u5728\u6d41\u884c\u7684\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u6570\u636e\u96c6\u4e0a\u7684\u6027\u80fd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b0\u9896\u6280\u672f\uff0c\u4ee5\u91ca\u653eLLMs\u5728\u591a\u5143\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u771f\u6b63\u6f5c\u529b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u6838\u5fc3\u7b56\u7565\uff0c\u6781\u5927\u5730\u63d0\u9ad8\u4e86\u591a\u8bed\u8a00\u80fd\u529b\uff1a\u9996\u5148\uff0c\u7cbe\u5fc3\u4f18\u5316\u9002\u7528\u4e8e\u591a\u8bed\u8a00LLM\u7684\u63d0\u793a\uff0c\u6316\u6398\u5176\u6f5c\u5728\u80fd\u529b\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5404\u8bed\u8a00\u7684\u8868\u73b0\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u6df7\u5408\u65b9\u6cd5\uff0c\u7ed3\u5408\u4e86\u591a\u8bed\u8a00\u5d4c\u5165\u7684LLM\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\uff0c\u5b9e\u73b0\u4e86\u66f4\u597d\u7684\u591a\u4efb\u52a1\u6027\u80fd\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u52a8\u6001\u5b66\u4e60\u7b56\u7565\uff0c\u5b9e\u73b0\u5b9e\u65f6\u6839\u636e\u67e5\u8be2\u52a8\u6001\u9009\u62e9\u6700\u5408\u9002\u7684\u63d0\u793a\u7b56\u7565\u3001LLM\u6a21\u578b\u548c\u5d4c\u5165\u6a21\u578b\uff0c\u4ece\u800c\u6700\u5927\u5316LLM\u5728\u4e0d\u540c\u8bed\u8a0
0\u4e0a\u7684\u6548\u7387\uff0c\u8d85\u8d8a\u4e86\u6700\u4f73\u9759\u6001\u548c\u968f\u673a\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u65e2\u9002\u7528\u4e8e\u79bb\u7ebf\u914d\u7f6e\u8c03\u6574\uff0c\u4e5f\u652f\u6301\u5728\u7ebf\u9002\u5e94\uff0c\u80fd\u591f\u65e0\u7f1d\u9002\u5e94\u65b0\u8bed\u8a00\u548c\u6570\u636e\u96c6\uff0c\u663e\u8457\u63a8\u52a8\u4e86\u591a\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u5728\u5404\u79cd\u8bed\u8a00\u4e2d\u7684\u8fdb\u6b65\u3002|\n", "2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358](http://arxiv.org/abs/2405.18358)|null|## \u80cc\u666f \u8fd1\u671f\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u878d\u5408\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7ec6\u81f4\u7684\u591a\u6a21\u6001\u7406\u89e3\u3001\u590d\u6742\u4efb\u52a1\u89e3\u6790\u4ee5\u53ca\u591a\u6a21\u6001\u4fe1\u606f\u63a8\u7406\u65b9\u9762\u4ecd\u5b58\u5728\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMMCTAgent\uff0c\u4e00\u4e2a\u65e8\u5728\u89e3\u51b3\u5f53\u524dMLLM\u5728\u590d\u6742\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u4e2d\u56fa\u6709\u5c40\u9650\u6027\u7684\u65b0\u578b\u591a\u6a21\u6001\u6279\u5224\u6027\u601d\u7ef4\u4ee3\u7406\u6846\u67b6\u3002MMCTAgent\u501f\u9274\u4e86\u4eba\u7c7b\u8ba4\u77e5\u8fc7\u7a0b\u548c\u6279\u5224\u6027\u601d\u8003\u7684\u7279\u70b9\uff0c\u901a\u8fc7\u8fed\u4ee3\u5206\u6790\u591a\u6a21\u6001\u4fe1\u606f\u3001\u62c6\u89e3\u95ee\u9898\u3001\u89c4\u5212\u7b56\u7565\uff0c\u5e76\u5b9e\u73b0\u52a8\u6001\u63a8\u7406\u3002 
\u6b64\u5916\uff0cMMCTAgent\u8fd8\u878d\u5165\u4e86\u6279\u5224\u6027\u601d\u8003\u5143\u7d20\uff0c\u5982\u5bf9\u6700\u7ec8\u7b54\u6848\u7684\u9a8c\u8bc1\u548c\u81ea\u6211\u53cd\u601d\u3002\u5b83\u901a\u8fc7\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u5b9a\u4e49\u57fa\u4e8e\u89c6\u89c9\u7684\u8bc4\u5224\u8005\uff0c\u5e76\u786e\u5b9a\u7279\u5b9a\u4efb\u52a1\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ece\u800c\u63d0\u5347\u51b3\u7b56\u80fd\u529b\u3002\u5728\u591a\u4e2a\u56fe\u50cf\u7406\u89e3\u548c\u89c6\u9891\u7406\u89e3\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u4e25\u8c28\u5730\u8bc4\u4f30\u4e86MMCTAgent\uff08\u5305\u62ec\u5e26\u8bc4\u5224\u8005\u7684\u7248\u672c\uff09\u7684\u8868\u73b0\uff0c\u7ed3\u679c\u8868\u660e\u5b83\u5728\u8d85\u8d8a\u57fa\u7840MLLM\u548c\u5176\u4ed6\u5de5\u5177\u589e\u5f3a\u7684\u7ba1\u9053\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335](http://arxiv.org/abs/2405.19335)|null|\u6211\u4eec\u63d0\u51faX-VILA\uff0c\u4e00\u79cd\u65e8\u5728\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u529f\u80fd\u7684\u591a\u6a21\u6001\u6a21\u578b\uff0c\u5b83\u878d\u5408\u4e86\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u6a21\u6001\u3002\u901a\u8fc7\u5c06\u5404\u6a21\u6001\u7279\u5b9a\u7684\u7f16\u7801\u5668\u4e0eLLM\u8f93\u5165\u5bf9\u9f50\uff0c\u5e76\u5c06\u6269\u6563\u89e3\u7801\u5668\u4e0eLLM\u8f93\u51fa\u5bf9\u9f50\uff0cX-VILA\u5b9e\u73b0\u4e86\u8de8\u6a21\u6001\u7406\u89e3\u3001\u63a8\u7406\u548c\u751f\u6210\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8de8\u6a21\u6001\u5bf9\u9f50\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6709\u6548\u7684\u4efb\u610f\u6a21\u6001\u6307\u4ee4\u8ddf\u968f\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u7684\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u6cd5\u5b58\u5728\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5bfc\u81f4\u89c6\u89c9\u4fe1\u606f\u4e22\u5931
\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u89c6\u89c9\u5bf9\u9f50\u673a\u5236\uff0c\u5305\u62ec\u4e00\u4e2a\u89c6\u89c9\u5d4c\u5165\u9ad8\u901f\u516c\u8def\u6a21\u5757\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u8d44\u6e90\u9ad8\u6548\u7684\u8bad\u7ec3\u7b56\u7565\uff0c\u4f7f\u5f97X-VILA\u5728\u4efb\u610f\u6a21\u6001\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5927\u5e45\u8d85\u8d8a\u5148\u524d\u7684\u65b9\u6cd5\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u5728\u7f3a\u4e4f\u7c7b\u4f3c\u8bad\u7ec3\u6570\u636e\u7684\u60c5\u51b5\u4e0b\uff0cX-VILA\u5728\u4e0d\u540c\u6a21\u6001\u95f4\u4e5f\u5c55\u73b0\u51fa\u6d8c\u73b0\u7279\u6027\u3002\u8be5\u9879\u76ee\u5c06\u5f00\u6e90\u3002|\n", "2405.19334": "|**2024-05-29**|**LLMs Meet Multimodal Generation and Editing: A Survey**|Yingqing He et.al.|[2405.19334](http://arxiv.org/abs/2405.19334)|**[link](https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u5c06\u5b83\u4eec\u4e0e\u591a\u6a21\u6001\u5b66\u4e60\u76f8\u7ed3\u5408\u3002\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8c03\u67e5\u4e3b\u8981\u96c6\u4e2d\u5728\u7406\u89e3\u4e0a\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u8de8\u56fe\u50cf\u3001\u89c6\u9891\u30013D\u548c\u97f3\u9891\u7b49\u9886\u57df\u7684\u591a\u6a21\u6001\u751f\u6210\uff0c\u7279\u522b\u5f3a\u8c03\u4e86\u8fd9\u4e9b\u9886\u57df\u4e2d\u7684\u91cc\u7a0b\u7891\u5f0f\u5de5\u4f5c\u53ca\u5176\u6280\u672f\u8fdb\u6b65\u3002\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u5173\u952e\u6280\u672f\u7ec4\u4ef6\uff0c\u4ee5\u53ca\u5728\u76f8\u5173\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u525
6\u6790\u4e86\u501f\u52a9\u73b0\u6709\u751f\u6210\u6a21\u578b\u8fdb\u884c\u4eba\u7c7b-\u8ba1\u7b97\u673a\u4ea4\u4e92\u7684\u5de5\u5177\u589e\u5f3a\u578b\u591a\u6a21\u6001\u4ee3\u7406\u3002\u6700\u540e\uff0c\u6211\u4eec\u5168\u9762\u8ba8\u8bba\u4e86\u4eba\u5de5\u667a\u80fd\u5b89\u5168\u7684\u8fdb\u6b65\uff0c\u5e76\u63a2\u7d22\u4e86\u65b0\u5174\u5e94\u7528\u548c\u672a\u6765\u524d\u666f\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7cfb\u7edf\u800c\u6df1\u5165\u7684\u591a\u6a21\u6001\u751f\u6210\u6982\u8ff0\uff0c\u6709\u671b\u63a8\u52a8\u751f\u6210\u5185\u5bb9\u7684\u4eba\u5de5\u667a\u80fd\uff08AIGC\uff09\u548c\u4e16\u754c\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6240\u6709\u76f8\u5173\u7684\u8bba\u6587\u5217\u8868\u53ef\u5728\u627e\u5230\u3002**|\n", "2405.19333": "|**2024-05-29**|**Multi-Modal Generative Embedding Model**|Feipeng Ma et.al.|[2405.19333](http://arxiv.org/abs/2405.19333)|null|\u5728\u5927\u591a\u6570\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\uff0c\u95ee\u9898\u53ef\u4ee5\u5f52\u7ed3\u4e3a\u751f\u6210\u6216\u5d4c\u5165\u3002\u73b0\u6709\u7684\u6a21\u578b\u901a\u5e38\u901a\u8fc7\u5c06\u8bed\u8a00\u6a21\u5757\u5206\u89e3\u4e3a\u4e00\u4e2a\u7528\u4e8e\u751f\u6210\u7684\u6587\u672c\u89e3\u7801\u5668\u548c\u4e00\u4e2a\u7528\u4e8e\u5d4c\u5165\u7684\u6587\u672c\u7f16\u7801\u5668\u6765\u5904\u7406\u8fd9\u4e24\u79cd\u95ee\u9898\u3002\u4e3a\u4e86\u63a2\u7d22\u591a\u6a21\u6001\u65b9\u6cd5\u7684\u7b80\u7ea6\u6027\uff0c\u672c\u5de5\u4f5c\u8bd5\u56fe\u4ec5\u4f7f\u7528\u4e00\u4e2a\u6a21\u578b\u6765\u5904\u7406\u6bcf\u79cd\u6a21\u6001\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u6a21\u6001\u751f\u6210\u5d4c\u5165\u6a21\u578b\uff08MM-GEM\uff09\uff0c\u5b83\u5c06\u751f\u6210\u548c\u5d4c\u5165\u76ee\u6807\u6574\u5408\u5230\u4e00\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86PoolAggregator\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u5e76\u5b9e\u73b0\u7ec6\u7c92\u5ea6\u7684\u5d4c\u5165\u548
c\u751f\u6210\u80fd\u529b\u3002 \u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u4e24\u4e2a\u76ee\u6807\u4e4b\u95f4\u5e76\u6ca1\u6709\u663e\u8457\u51b2\u7a81\u3002\u4f8b\u5982\uff0c\u57fa\u4e8eViT-Large\u548cTinyLlama\u7684MM-GEM\u5728\u8bf8\u5982\u8de8\u6a21\u6001\u68c0\u7d22\u548c\u96f6\u6837\u672c\u5206\u7c7b\u7b49\u591a\u6a21\u6001\u5d4c\u5165\u6a21\u578b\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u5177\u5907\u826f\u597d\u7684\u56fe\u50cf\u63cf\u8ff0\u80fd\u529b\u3002\u6b64\u5916\uff0cMM-GEM\u80fd\u591f\u65e0\u7f1d\u6267\u884c\u533a\u57df\u7ea7\u522b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u548c\u68c0\u7d22\u4efb\u52a1\u3002\u53e6\u5916\uff0cMM-GEM\u4e2d\u7684\u5148\u8fdb\u6587\u672c\u6a21\u578b\u5bf9\u4e8e\u957f\u6587\u672c\u548c\u56fe\u50cf\u68c0\u7d22\u7684Recall@1\u6307\u6807\u5e26\u6765\u4e86\u8d85\u8fc75%\u7684\u63d0\u5347\u3002|\n", "2405.19332": "|**2024-05-29**|**Self-Exploring Language Models: Active Preference Elicitation for Online Alignment**|Shenao Zhang et.al.|[2405.19332](http://arxiv.org/abs/2405.19332)|**[link](https://github.com/shenao-zhang/selm)**|****\u6458\u8981\uff1a** 
\u504f\u597d\u4f18\u5316\uff0c\u7279\u522b\u662f\u5728\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u9a71\u52a8\u4e0b\uff0c\u5df2\u7ecf\u5728\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u610f\u613f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u6210\u5c31\u3002\u76f8\u8f83\u4e8e\u4f7f\u7528\u56fa\u5b9a\u6570\u636e\u96c6\u7684\u79bb\u7ebf\u5bf9\u9f50\uff0c\u901a\u8fc7\u4eba\u6216\u4eba\u5de5\u667a\u80fd\u5bf9\u6a21\u578b\u751f\u6210\u7684\u53cd\u9988\u901a\u5e38\u80fd\u591f\u901a\u8fc7\u8fed\u4ee3\u8fc7\u7a0b\u63d0\u5347\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\u548cLLMs\u7684\u4e00\u81f4\u6027\u3002\u7136\u800c\uff0c\u8981\u5b9e\u73b0\u5168\u5c40\u51c6\u786e\u7684\u5956\u52b1\u6a21\u578b\uff0c\u9700\u8981\u7cfb\u7edf\u5730\u63a2\u7d22\u751f\u6210\u5404\u79cd\u5404\u6837\u7684\u54cd\u5e94\uff0c\u4ee5\u6db5\u76d6\u81ea\u7136\u8bed\u8a00\u7684\u5e7f\u9614\u7a7a\u95f4\u3002\u4ec5\u4f9d\u8d56\u6807\u51c6\u5956\u52b1\u6700\u5927\u5316LLMs\u7684\u968f\u673a\u91c7\u6837\u662f\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e00\u9700\u6c42\u7684\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5c42\u76ee\u6807\uff0c\u4e50\u89c2\u5730\u503e\u5411\u4e8e\u53ef\u80fd\u5177\u6709\u9ad8\u5956\u52b1\u7684\u54cd\u5e94\uff0c\u4ee5\u6b64\u6765\u4e3b\u52a8\u63a2\u7d22\u5206\u5e03\u5916\u533a\u57df\u3002\u901a\u8fc7\u89e3\u51b3\u5185\u5c42\u95ee\u9898\uff0c\u5229\u7528\u91cd\u65b0\u53c2\u6570\u5316\u7684\u5956\u52b1\u51fd\u6570\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aSelf-Exploring Language 
Models\uff08SELM\uff09\u7684\u7b97\u6cd5\u3002\u5b83\u6d88\u9664\u4e86\u5bf9\u5355\u72ec\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u7684\u9700\u6c42\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u76f4\u89c2\u7684\u76ee\u6807\u5bf9LLMs\u8fdb\u884c\u8fed\u4ee3\u66f4\u65b0\u3002\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u76f8\u6bd4\uff0cSELM\u7684\u76ee\u6807\u964d\u4f4e\u4e86\u5bf9\u672a\u89c1\u8fc7\u7684\u8fc7\u5ea6\u5ef6\u4f38\u7684\u65e0\u5dee\u522b\u504f\u597d\uff0c\u63d0\u9ad8\u4e86\u63a2\u7d22\u6548\u7387\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728Zephyr-7B-SFT\u548cLlama-3-8B-Instruct\u6a21\u578b\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0cSELM\u5728MT-Bench\u548cAlpacaEval 2.0\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4ee5\u53ca\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u5404\u79cd\u6807\u51c6\u5b66\u672f\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar 
et.al.|[2405.19328](http://arxiv.org/abs/2405.19328)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u89c4\u8303\u6a21\u5757\u201d\u7684\u67b6\u6784\uff0c\u5b83\u9488\u5bf9\u751f\u6210\u6027\u4ee3\u7406\u5728\u9762\u5bf9\u5305\u542b\u73b0\u6709\u89c4\u8303\u7684\u793e\u4f1a\u7ed3\u6784\u65f6\u7684\u534f\u4f5c\u6311\u6218\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u548c\u8bc4\u4f30\u73af\u5883\uff0c\u4f46\u5728\u5904\u7406\u590d\u6742\u793e\u4f1a\u4efb\u52a1\u65f6\uff0c\u5982\u4f55\u8bc6\u522b\u5e76\u9002\u5e94\u89c4\u8303\u57fa\u7840\u8bbe\u65bd\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u89c4\u8303\u6a21\u5757\u7684\u6838\u5fc3\u5728\u4e8e\u4fc3\u8fdb\u5747\u8861\u9009\u62e9\uff0c\u501f\u9274\u5206\u7c7b\u673a\u6784\u5b9e\u73b0\u76f8\u5173\u5747\u8861\u7684\u6982\u5ff5\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u540c\u4f34\u4e92\u52a8\u5b66\u4e60\u73af\u5883\u4e2d\u4e0d\u540c\u5019\u9009\u673a\u6784\u4e2d\u7684\u6743\u5a01\u6027\u3002\u901a\u8fc7\u63d0\u5347\u89c4\u8303\u80fd\u529b\uff0c\u4ee3\u7406\u53ef\u4ee5\u534f\u8c03\u5236\u88c1\u884c\u4e3a\uff0c\u8fdb\u800c\u5f71\u54cd\u793e\u4ea4\u73af\u5883\u4e2d\u7684\u57fa\u672c\u884c\u4e3a\uff0c\u4ece\u800c\u63d0\u9ad8\u6574\u4f53\u798f\u7949\u3002 
\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u652f\u6301\u673a\u6784\u7684\u65b0\u73af\u5883\uff0c\u5e76\u6839\u636e\u4e24\u4e2a\u4e3b\u8981\u6807\u51c6\u6765\u8bc4\u4f30\u8be5\u6846\u67b6\uff1a\u4e00\u662f\u4ee3\u7406\u80fd\u5426\u5ffd\u7565\u975e\u6743\u5a01\u673a\u6784\uff0c\u4e8c\u662f\u4ee3\u7406\u5728\u591a\u4e2a\u9009\u9879\u4e2d\u8bc6\u522b\u6743\u5a01\u673a\u6784\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u914d\u5907\u4e86\u89c4\u8303\u6a21\u5757\u7684\u4ee3\u7406\u76f8\u6bd4\u57fa\u7840\u4ee3\u7406\u80fd\u5b9e\u73b0\u66f4\u7a33\u5b9a\u7684\u5408\u4f5c\u6548\u679c\uff0c\u8fd9\u4e3a\u7814\u7a76\u8bbe\u8ba1\u8003\u8651\u89c4\u8303\u57fa\u7840\u8bbe\u65bd\u7684\u73af\u5883\u548c\u4ee3\u7406\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002|\n", "2405.19327": "|**2024-05-29**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327](http://arxiv.org/abs/2405.19327)|**[link](https://github.com/multimodal-art-projection/map-neo)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u51fa\u4e8e\u5546\u4e1a\u5229\u76ca\uff0c\u50cfGPT\u3001Gemini\u548cClaude\u8fd9\u6837\u7684\u6700\u5148\u8fdb\u6a21\u578b\u88ab\u5c01\u95ed\u5728\u4e13\u6709\u63a5\u53e3\u540e\uff0c\u5176\u8bad\u7ec3\u8be6\u60c5\u5e76\u672a\u516c\u5f00\u3002\u8fd1\u671f\uff0c\u4e00\u4e9b\u673a\u6784\u5f00\u6e90\u4e86\u7c7b\u4f3c\u6027\u80fd\u7684LLMs\uff0c\u5982LLaMA-3\uff0c\u4f46\u5927\u591a\u6570\u7ec6\u8282\uff08\u5982\u4e2d\u95f4\u68c0\u67e5\u70b9\u3001\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u548c\u8bad\u7ec3\u4ee3\u7801\u7b49\uff09\u4ecd\u672a\u62ab\u9732\u3002\u4e3a\u4e86\u63d0\u9ad8LLMs\u7684\u900f\u660e\u5ea6\uff0c\u7814\u7a76\u754c\u6b63\u5728\u63a8\u52a8\u771f\u6b63\u5f00\u653e\u7684\u6a21\u578b\uff0c\u5982Pythia\u3001Amber\u548cOLMo\uff0c\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u66f4\u591a\u7684\u4fe1\
u606f\uff0c\u4fc3\u8fdb\u4e86\u5bf9\u5927\u6a21\u578b\u6027\u80fd\u3001\u5c40\u9650\u6027\u3001\u504f\u89c1\u548c\u98ce\u9669\u7684\u79d1\u5b66\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5f00\u653e\u6a21\u578b\u5728\u63a8\u7406\u3001\u77e5\u8bc6\u548c\u7f16\u7a0b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4ecd\u900a\u4e8e\u540c\u7b49\u89c4\u6a21\u7684\u5c01\u95ed\u6e90\u7801\u6a21\u578b\u3002 \u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u6e90\u4e86MAP-Neo\uff0c\u4e00\u4e2a\u62e5\u670970\u4ebf\u53c2\u6570\u7684\u53cc\u8bed\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u5934\u5f00\u59cb\u57284.5\u4e07\u4ebf\u9ad8\u8d28\u91cf\u4ee4\u724c\u4e0a\u8fdb\u884c\u8bad\u7ec3\u3002MAP-Neo\u662f\u9996\u4e2a\u4e0e\u73b0\u6709\u9876\u7ea7LLMs\u6027\u80fd\u76f8\u5f53\u7684\u5b8c\u5168\u5f00\u6e90\u7684\u53cc\u8bed\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u4e86\u6240\u6709\u7ec6\u8282\uff0c\u5305\u62ec\u6e05\u7406\u540e\u7684\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u3001\u6570\u636e\u6e05\u6d17\u6d41\u7a0b\u3001\u68c0\u67e5\u70b9\u4ee5\u53ca\u4f18\u5316\u7684\u8bad\u7ec3\u548c\u8bc4\u4f30\u6846\u67b6\uff0c\u4ee5\u4f9b\u91cd\u73b0\u3002\u6211\u4eec\u671f\u671bMAP-Neo\u80fd\u63a8\u52a8\u5f00\u653e\u7814\u7a76\u793e\u533a\u7684\u53d1\u5c55\uff0c\u6fc0\u53d1\u66f4\u591a\u521b\u65b0\uff0c\u4fc3\u8fdbLLMs\u7684\u8fdb\u4e00\u6b65\u63d0\u5347\u3002|\n", "2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen 
et.al.|[2405.19326](http://arxiv.org/abs/2405.19326)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u4efb\u52a1\uff1a\u96f6\u6837\u672c3D\u63a8\u7406\u5206\u5272\uff0c\u76ee\u6807\u662f\u9488\u5bf9\u7269\u4f53\u7684\u90e8\u4ef6\u641c\u7d22\u548c\u5b9a\u4f4d\uff0c\u8fd9\u662f\u4e00\u79cd\u8d85\u8d8a\u4e86\u5148\u524d\u7c7b\u522b\u7279\u5b9a\u76843D\u8bed\u4e49\u5206\u5272\u30013D\u5b9e\u4f8b\u5206\u5272\u548c\u5f00\u653e\u8bcd\u6c473D\u5206\u5272\u5c40\u9650\u7684\u65b0\u8303\u5f0f\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aReasoning3D\u7684\u7b80\u5355\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u7406\u89e3\u548c\u6267\u884c\u590d\u6742\u7684\u547d\u4ee4\uff0c\u5bf93D\u7f51\u683c\u8fdb\u884c\uff08\u7ec6\u81f4\uff09\u90e8\u5206\u5206\u5272\uff0c\u540c\u65f6\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u548c\u63a8\u7406\u7b54\u6848\u7684\u4ea4\u4e92\u5f0f\u5206\u5272\u80fd\u529b\u3002\u7279\u522b\u5730\uff0cReasoning3D\u5229\u7528\u9884\u8bad\u7ec3\u76842D\u5206\u5272\u7f51\u7edc\uff0c\u8be5\u7f51\u7edc\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\uff0c\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u89e3\u6790\u7528\u6237\u8f93\u5165\u67e5\u8be2\u3002\u5df2\u6709\u7814\u7a76\u8868\u660e\uff0c\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u8d4b\u4e88\u57fa\u7840\u6a21\u578b\u4e16\u754c\u77e5\u8bc6\u7684\u5148\u9a8c\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u590d\u6742\u6307\u4ee4\uff0c\u8fd9\u4f7f\u5f97\u6211\u4eec\u5728\u4f9d\u8d56\u6709\u96503D\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\u4e5f\u80fd\u201c\u5206\u5272\u4efb\u4f55\u4e1c\u897f\u201d\uff08\u6e90\u6548\u7387\u9ad8\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u6cdb\u5316\u6027\uff0c\u80fd\u6709\u6548\u6839\u636e\u9690\u6027\u6587\u672c\u67e5\u8be2\u57283D\u5bf9\u8c61\uff083D\u7f51\u683c\uff09\u4e2d\u5b9a\u4f4d\u548c\u7a81\u51fa\u663e\u793a\u90e8\u5206\uff0c\u5305\u62ec\u53ef\u52a83D\u5bf9\u8c61\u548c\u771f\u5b9e\u4e16\u754c\u7684
\u626b\u63cf\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65e0\u76d1\u7763\u65b9\u6cd5\u4fbf\u4e8e\u5feb\u901f\u90e8\u7f72\uff0c\u5e76\u4e3a\u672a\u67653D\uff08\u8bed\u4e49\uff09\u5bf9\u8c61\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\uff0c\u5982\u673a\u5668\u4eba\u3001\u7269\u4f53\u64cd\u4f5c\u3001\u90e8\u4ef6\u7ec4\u88c5\u3001\u81ea\u52a8\u9a7e\u9a76\u5e94\u7528\u3001\u589e\u5f3a\u73b0\u5b9e\u548c\u865a\u62df\u73b0\u5b9e\uff08AR/VR\uff09\u3001\u4ee5\u53ca\u533b\u7597\u5e94\u7528\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u884c\u7684\u901a\u7528\u57fa\u51c6\u3002\u4ee3\u7801\u3001\u6a21\u578b\u6743\u91cd\u3001\u90e8\u7f72\u6307\u5357\u548c\u8bc4\u4f30\u534f\u8bae\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttp://tianrun-chen.github.io/Reason3D/\u3002|\n", "2405.19325": "|**2024-05-29**|**Nearest Neighbor Speculative Decoding for LLM Generation and Attribution**|Minghan Li et.al.|[2405.19325](http://arxiv.org/abs/2405.19325)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4f1a\u4ea7\u751f\u865a\u6784\u5185\u5bb9\u4e14\u7f3a\u4e4f\u5bf9\u751f\u6210\u6587\u672c\u7684\u6765\u6e90\u6807\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u534a\u53c2\u6570\u5316\u8bed\u8a00\u6a21\u578b\u5982kNN-LM\u901a\u8fc7\u5728\u975e\u53c2\u6570\u6570\u636e\u5b58\u50a8\u4e2d\u5bfb\u627e\u4e0e\u7ed9\u5b9a\u63d0\u793a\u6700\u63a5\u8fd1\u7684\u90bb\u5c45\u6765\u6539\u8fdbLM\u8f93\u51fa\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u901a\u5e38\u8f83\u6162\uff0c\u751f\u6210\u7684\u6587\u672c\u6d41\u7545\u5ea6\u4e0d\u9ad8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u534a\u53c2\u6570\u5316\u8bed\u8a00\u5efa\u6a21\u65b9\u6cd5\u2014\u2014Nearest Neighbor Speculative 
Decoding\uff08NEST\uff09\uff0c\u5b83\u80fd\u591f\u5c06\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u4efb\u610f\u957f\u5ea6\u6587\u672c\u7247\u6bb5\u878d\u5165\u751f\u6210\u8fc7\u7a0b\uff0c\u5e76\u63d0\u4f9b\u5176\u6e90\u5934\u7684\u6807\u6ce8\u3002NEST\u5728\u6bcf\u6b21\u63a8\u7406\u6b65\u9aa4\u4e2d\u8fdb\u884c\u57fa\u4e8e\u4ee4\u724c\u7684\u68c0\u7d22\uff0c\u8ba1\u7b97\u51fa\u4e00\u4e2a\u534a\u53c2\u6570\u6df7\u5408\u5206\u5e03\uff0c\u5e76\u4ece\u8bed\u6599\u5e93\u4e2d\u8bc6\u522b\u51fa\u53ef\u80fd\u7684\u8fde\u7eed\u6587\u672c\u6bb5\u843d\u6269\u5c55\u3002\u5b83\u91c7\u7528\u4e00\u79cd\u8fd1\u4f3c\u63a8\u6d4b\u89e3\u7801\u7b56\u7565\uff0c\u63a5\u53d7\u68c0\u7d22\u5230\u7684\u7247\u6bb5\u524d\u7f00\u6216\u751f\u6210\u65b0\u7684\u4ee4\u724c\u3002NEST\u663e\u8457\u63d0\u9ad8\u4e86\u57fa\u7840LM\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u751f\u6210\u8d28\u91cf\u548c\u6765\u6e90\u6807\u6ce8\u7387\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684kNN-LM\u65b9\u6cd5\uff0c\u5e76\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u68c0\u7d22\u589e\u5f3a\u65b9\u9762\u8868\u73b0\u51fa\u7ade\u4e89\u529b\u3002\u6b64\u5916\uff0cNEST\u5927\u5e45\u63d0\u5347\u4e86\u751f\u6210\u901f\u5ea6\uff0c\u5f53\u5e94\u7528\u4e8eLlama-2-Chat 70B\u65f6\uff0c\u63a8\u7406\u65f6\u95f4\u63d0\u9ad8\u4e861.8\u500d\u3002|\n", "2405.19323": "|**2024-05-29**|**Are Large Language Models Chameleons?**|Mingmeng Geng 
et.al.|[2405.19323](http://arxiv.org/abs/2405.19323)|null|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u62e5\u6709\u81ea\u5df1\u7684\u4e16\u754c\u89c2\u548c\u4eba\u683c\u503e\u5411\uff1f\u7814\u7a76\u4eba\u5458\u8fdb\u884c\u4e86\u8d85\u8fc7\u4e00\u767e\u4e07\u6b21\u7684\u5b9e\u9a8c\uff0c\u8ba9LLMs\u56de\u7b54\u4e3b\u89c2\u95ee\u9898\u3002\u901a\u8fc7\u5c06\u8fd9\u4e9b\u6a21\u578b\u7684\u54cd\u5e94\u4e0e\u6b27\u6d32\u793e\u4f1a\u8c03\u67e5\uff08ESS\uff09\u7684\u5b9e\u9645\u6570\u636e\u8fdb\u884c\u6bd4\u8f83\uff0c\u7ed3\u679c\u663e\u793a\u63d0\u793a\u5bf9\u504f\u89c1\u548c\u53d8\u5f02\u6027\u6709\u663e\u8457\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u91cd\u5927\u7684\u6587\u5316\u3001\u5e74\u9f84\u548c\u6027\u522b\u504f\u5dee\u3002\u6587\u4e2d\u8ba8\u8bba\u4e86\u8bc4\u4f30LLMs\u4e0e\u8c03\u67e5\u6570\u636e\u5dee\u5f02\u7684\u65b9\u6cd5\uff0c\u5982\u8ba1\u7b97\u52a0\u6743\u5e73\u5747\u503c\u4ee5\u53ca\u4e00\u4e2a\u65b0\u63d0\u51fa\u7684\u57fa\u4e8eJaccard\u76f8\u4f3c\u6027\u7684\u6d4b\u91cf\u6307\u6807\u3002\u7814\u7a76\u8005\u5f3a\u8c03\uff0c\u5728\u5229\u7528LLMs\u6a21\u62df\u4e2a\u4f53\u51b3\u7b56\u6216\u96c6\u4f53\u884c\u4e3a\u4e4b\u524d\uff0c\u5206\u6790\u63d0\u793a\u7684\u7a33\u5065\u6027\u548c\u53d8\u5f02\u6027\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u6a21\u4eff\u80fd\u529b\u5145\u5176\u91cf\u53ea\u80fd\u8bf4\u662f\u8fd1\u4f3c\u7684\u3002|\n", "2405.19320": "|**2024-05-29**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320](http://arxiv.org/abs/2405.19320)|null|**\u6458\u8981\uff1a** 
\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u5728\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u5728\u7ebf\u548c\u79bb\u7ebfRLHF\u90fd\u5904\u4e8e\u6d3b\u8dc3\u7684\u7814\u7a76\u9636\u6bb5\uff0c\u4f46\u5173\u952e\u6311\u6218\u4e4b\u4e00\u662f\u5982\u4f55\u5728\u5904\u7406\u4ece\u504f\u597d\u6570\u636e\u4e2d\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u4e0d\u786e\u5b9a\u6027\u65f6\u3002\u5c3d\u7ba1\u6807\u51c6\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4e2d\u4e50\u89c2\u4e3b\u4e49\u6216\u60b2\u89c2\u4e3b\u4e49\u7684\u539f\u5219\u5df2\u5e7f\u4e3a\u4eba\u77e5\uff0c\u4f46\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u5b9e\u73b0\u65e2\u5b9e\u7528\u53c8\u57fa\u4e8e\u7406\u8bba\u7684\u65b9\u6cd5\u5c1a\u4e0d\u6210\u719f\uff0c\u56e0\u4e3a\u6784\u5efa\u7f6e\u4fe1\u533a\u95f4\u7684\u6807\u51c6\u6280\u672f\u5728\u5904\u7406\u4efb\u610f\u7b56\u7565\u53c2\u6570\u5316\u65f6\u53d8\u5f97\u96be\u4ee5\u5904\u7406\u3002 
\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5728\u7ebf\u548c\u79bb\u7ebfRLHF\u65b9\u6cd5\u2014\u2014\u4ef7\u503c\u6fc0\u52b1\u7684\u504f\u597d\u4f18\u5316\uff08VPO\uff09\u3002VPO\u901a\u8fc7\u5728\u6700\u5927\u4f3c\u7136\u4f30\u8ba1\u7684\u5956\u52b1\u51fd\u6570\u4e2d\u6dfb\u52a0\u76f8\u5e94\u7684\u503c\u51fd\u6570\u7684\u6b63\u5219\u5316\uff0c\u4ee5\u6307\u793a\u9009\u62e9\u4e50\u89c2\u4e3b\u4e49\u8fd8\u662f\u60b2\u89c2\u4e3b\u4e49\uff0c\u5b9e\u73b0\u4e86\u8fd9\u4e00\u76ee\u6807\u3002\u6b64\u5916\uff0cVPO\u76f4\u63a5\u4f18\u5316\u7b56\u7565\uff0c\u5e76\u5229\u7528\u9690\u5f0f\u5956\u52b1\u5efa\u6a21\uff0c\u56e0\u6b64\u5176RLHF\u7ba1\u9053\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\u66f4\u4e3a\u7b80\u5355\u3002\u5bf9\u4e8e\u5728\u7ebf\u548c\u79bb\u7ebf\u8bbe\u7f6e\uff0cVPO\u63d0\u4f9b\u4e86\u7406\u8bba\u4fdd\u8bc1\uff0c\u5176\u6536\u655b\u901f\u5ea6\u4e0e\u6807\u51c6RL\u76f8\u5f53\u3002\u5b9e\u9a8c\u5728\u6587\u672c\u6458\u8981\u548c\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u9a8c\u8bc1\u4e86VPO\u7684\u5b9e\u7528\u6027\u4e0e\u6709\u6548\u6027\u3002|\n", "2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen 
et.al.|[2405.20340](http://arxiv.org/abs/2405.20340)|null|\u8fd9\u9879\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\uff08\u89c6\u9891\u548c\u52a8\u4f5c\u6a21\u6001\uff09\u4e0b\u7684\u4eba\u7c7b\u884c\u4e3a\u7406\u89e3\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5f3a\u5927\u529f\u80fd\u3002\u4e0e\u4e13\u4e3a\u5355\u6a21\u6001\uff08\u89c6\u9891\u6216\u52a8\u4f5c\uff09\u8bbe\u8ba1\u7684\u6700\u65b0LLMs\u4e0d\u540c\uff0c\u6211\u4eec\u8ba4\u4e3a\u7406\u89e3\u4eba\u7c7b\u884c\u4e3a\u9700\u8981\u5bf9\u89c6\u9891\u548c\u52a8\u4f5c\u5e8f\u5217\uff08\u5982SMPL\u5e8f\u5217\uff09\u8fdb\u884c\u8054\u5408\u5efa\u6a21\uff0c\u4ee5\u6709\u6548\u6355\u6349\u7cbe\u7ec6\u7684\u8eab\u4f53\u90e8\u4f4d\u52a8\u6001\u548c\u8bed\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMotionLLM\uff0c\u8fd9\u662f\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u4eba\u7c7b\u52a8\u4f5c\u7406\u89e3\u3001\u63cf\u8ff0\u548c\u63a8\u7406\u3002MotionLLM\u91c7\u7528\u4e86\u4e00\u4f53\u5316\u7684\u89c6\u9891-\u52a8\u4f5c\u8bad\u7ec3\u7b56\u7565\uff0c\u5229\u7528\u73b0\u6709\u7c97\u7c92\u5ea6\u7684\u89c6\u9891-\u6587\u672c\u6570\u636e\u548c\u7cbe\u7ec6\u52a8\u4f5c-\u6587\u672c\u6570\u636e\u7684\u4f18\u52bf\uff0c\u4ee5\u83b7\u53d6\u4e30\u5bcc\u7684\u7a7a\u95f4-\u65f6\u95f4\u6d1e\u5bdf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u7684MoVid\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u591a\u6837\u5316\u7684\u89c6\u9891\u3001\u52a8\u4f5c\u3001caption\u548c\u6307\u4ee4\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86MoVid-Bench\uff0c\u5b83\u5177\u6709\u7cbe\u5fc3\u7684\u624b\u52a8\u6807\u6ce8\uff0c\u4ee5\u66f4\u597d\u5730\u8bc4\u4f30\u5728\u89c6\u9891\u548c\u52a8\u4f5c\u4e0a\u7684\u4eba\u7c7b\u884c\u4e3a\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u5145\u5206\u5c55\u793a\u4e86MotionLLM\u5728caption\u751f\u6210\u3001\u7a7a\u95f4-\u65f6\u95f4\u7406\u89e3\u4ee5\u53ca\u63a8\u7406\u80fd\u529b\u65b9\u9762\u
7684\u4f18\u8d8a\u6027\u3002|\n", "2405.20339": "|**2024-05-30**|**Visual Perception by Large Language Model's Weights**|Feipeng Ma et.al.|[2405.20339](http://arxiv.org/abs/2405.20339)|null|\u8fd9\u7bc7\u8bba\u6587\u7684\u80cc\u666f\u662f\u73b0\u6709\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u91c7\u7528\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5373\u5c06\u89c6\u89c9\u4fe1\u606f\u4e0e\u8bed\u8a00\u6a21\u578b\u7684\u8f93\u5165\u7a7a\u95f4\u5bf9\u9f50\uff0c\u7136\u540e\u5c06\u89c6\u89c9\u4ee4\u724c\u4e0e\u6587\u672c\u4ee4\u724c\u5408\u5e76\uff0c\u5f62\u6210\u7edf\u4e00\u7684\u5e8f\u5217\u8f93\u5165\u7ed9\u8bed\u8a00\u6a21\u578b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7531\u4e8e\u589e\u52a0\u4e86\u7531\u89c6\u89c9\u4ee4\u724c\u5bfc\u81f4\u7684\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\uff0c\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u4e3a\u6b64\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53c2\u6570\u7a7a\u95f4\u5bf9\u9f50\u8303\u5f0f\uff0c\u901a\u8fc7\u5c06\u89c6\u89c9\u4fe1\u606f\u8868\u793a\u4e3a\u6a21\u578b\u6743\u91cd\u6765\u5904\u7406\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u8f93\u5165\u56fe\u50cf\uff0c\u9996\u5148\u4f7f\u7528\u89c6\u89c9\u7f16\u7801\u5668\u63d0\u53d6\u7279\u5f81\uff0c\u7136\u540e\u5c06\u8fd9\u4e9b\u7279\u5f81\u8f6c\u6362\u4e3a\u611f\u77e5\u6743\u91cd\uff0c\u5e76\u5c06\u5176\u4e0e\u8bed\u8a00\u6a21\u578b\u7684\u6743\u91cd\u878d\u5408\u3002\u8fd9\u6837\uff0c\u8bed\u8a00\u6a21\u578b\u7684\u8f93\u5165\u65e0\u9700\u89c6\u89c9\u4ee4\u724c\uff0c\u4ece\u800c\u7f29\u77ed\u4e86\u8f93\u5165\u5e8f\u5217\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6548\u7387\u3002 
\u57fa\u4e8e\u8fd9\u4e00\u7406\u5ff5\uff0c\u8bba\u6587\u63d0\u51fa\u4e86VLoRA\u6a21\u578b\uff0c\u5176\u4e2d\u5305\u542b\u4e00\u4e2a\u611f\u77e5\u6743\u91cd\u751f\u6210\u5668\u3002\u8be5\u751f\u6210\u5668\u8bbe\u8ba1\u6210\u80fd\u591f\u5c06\u89c6\u89c9\u7279\u5f81\u8f6c\u5316\u4e3a\u5177\u6709\u4f4e\u79e9\u7279\u6027\u7684\u611f\u77e5\u6743\u91cd\uff0c\u7c7b\u4f3c\u4e8eLoRA\uff08\u4f4e\u79e9\u81ea\u9002\u5e94\u8bad\u7ec3\uff09\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1VLoRA\u5728\u591a\u79cd\u591a\u6a21\u6001\u4efb\u52a1\u7684\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u4e0e\u73b0\u6709MLLMs\u76f8\u5f53\u7684\u6027\u80fd\uff0c\u4f46\u5176\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u8ba1\u7b97\u6210\u672c\u663e\u8457\u964d\u4f4e\u3002\u8bba\u6587\u627f\u8bfa\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002|\n", "2405.20335": "|**2024-05-30**|**Xwin-LM: Strong and Scalable Alignment Practice for LLMs**|Bolin Ni et.al.|[2405.20335](http://arxiv.org/abs/2405.20335)|**[link](https://github.com/xwin-lm/xwin-lm)**|**\u672c\u6587\u4ecb\u7ecdXwin-LM\uff0c\u4e00\u4e2a\u4e13\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8bbe\u8ba1\u7684\u5168\u9762\u5bf9\u9f50\u65b9\u6cd5\u5957\u4ef6\u3002\u5b83\u6db5\u76d6\u4e86\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u3001\u5956\u52b1\u5efa\u6a21\uff08RM\uff09\u3001\u62d2\u7edd\u91c7\u6837\u5fae\u8c03\uff08RS\uff09\u548c\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7b49\u591a\u79cd\u5173\u952e\u6280\u672f\u3002\u4e3b\u8981\u7ec4\u6210\u90e8\u5206\u5305\u62ec\uff1a(1) \u4f7f\u7528\u9ad8\u8d28\u91cf\u6307\u4ee4\u6570\u636e\u8fdb\u884c\u521d\u59cb\u5fae\u8c03\u7684Xwin-LM-SFT\uff1b(2) \u7531GPT-4\u7cbe\u5fc3\u6807\u6ce8\u7684\u5927\u578b\u591a\u8f6e\u504f\u597d\u6570\u636e\u96c6Xwin-Pair\uff1b(3) \u57287B\u300113B\u548c70B\u53c2\u6570\u89c4\u6a21\u4e0a\u8bad\u7ec3\u7684Xwin-RM\u5956\u52b1\u6a21\u578b\uff1b(4) 
\u6bcf\u4e2a\u63d0\u793a\u5173\u805464\u4e2a\u72ec\u7279\u54cd\u5e94\u7684\u591awise\u504f\u597d\u6570\u636e\u96c6Xwin-Set\uff0c\u8fd9\u4e9b\u54cd\u5e94\u7531Xwin-LM-SFT\u751f\u6210\u5e76\u7531Xwin-RM\u8bc4\u5206\uff1b(5) \u4f7f\u7528Xwin-Set\u4e2d\u6700\u9ad8\u5f97\u5206\u54cd\u5e94\u8fdb\u884c\u5fae\u8c03\u7684Xwin-LM-RS\u6a21\u578b\uff1b(6) \u901a\u8fc7DPO\u7b97\u6cd5\u5728Xwin-Set\u4e0a\u8fdb\u4e00\u6b65\u4f18\u5316\u7684Xwin-LM-DPO\u6a21\u578b\u3002\u6211\u4eec\u5728AlpacaEval\u548cMT-bench\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\u4e86\u6574\u4e2a\u7ba1\u9053\u7684\u7a33\u5b9a\u4e14\u663e\u8457\u6539\u8fdb\uff0c\u8bc1\u660e\u4e86Xwin-LM\u7684\u5f3a\u5927\u548c\u53ef\u6269\u5c55\u6027\u3002\u6211\u4eec\u5c06\u5728https://github.com/Xwin-LM/Xwin-LM\u7684\u4ed3\u5e93\u4e2d\u6301\u7eed\u66f4\u65b0\uff0c\u4ee5\u4fc3\u8fdb\u793e\u533a\u7814\u7a76\u3002**|\n", "2405.20319": "|**2024-05-31**|**ParSEL: Parameterized Shape Editing with Language**|Aditya Ganeshan et.al.|[2405.20319](http://arxiv.org/abs/2405.20319)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aParSEL\u7684\u7cfb\u7edf\uff0c\u5b83\u65e8\u5728\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u5b9e\u73b0\u9ad8\u8d28\u91cf3D\u8d44\u4ea7\u7684\u53ef\u63a7\u7f16\u8f91\u3002\u9762\u5bf9\u81ea\u7136\u8bed\u8a00\u5728\u7cbe\u786e\u64cd\u63a7\u4e0a\u7684\u5c40\u9650\u6027\uff0cParSEL\u63a5\u6536\u4e00\u4e2a\u5206\u5272\u76843D\u7f51\u683c\u548c\u7f16\u8f91\u8bf7\u6c42\uff0c\u751f\u6210\u4e00\u4e2a\u53c2\u6570\u5316\u7684\u7f16\u8f91\u7a0b\u5e8f\u3002\u7528\u6237\u53ef\u4ee5\u8c03\u6574\u7a0b\u5e8f\u53c2\u6570\uff0c\u7cbe\u7ec6\u5730\u63a2\u7d22\u5f62\u72b6\u53d8\u5316\uff0c\u63a7\u5236\u7f16\u8f91\u5e45\u5ea6\u3002\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u7406\u89e3\u521d\u59cb\u7f16\u8f91\u6307\u4ee4\uff0c\u4f46\u53d1\u73b0\u5b83\u4eec\u5728\u63a8\u65ad\u5b8c\u6574\u7f16\u8f91\u7a0b\u5e8f\u65f6\u5e38\u5e38\u4e0d\u8db3\uff0c\u4ea7\u751f\u7684\u7ed3\u679c\u53ef\u80fd\u8fdd\u53cd
\u5f62\u72b6\u903b\u8f91\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u5206\u6790\u6027\u7f16\u8f91\u4f20\u64ad\uff08Analytical Edit Propagation\uff0cAEP\uff09\u7b97\u6cd5\uff0c\u5b83\u4ece\u521d\u59cb\u7f16\u8f91\u79cd\u5b50\u5f00\u59cb\uff0c\u901a\u8fc7\u8ba1\u7b97\u673a\u4ee3\u6570\u7cfb\u7edf\u8fdb\u884c\u51e0\u4f55\u5206\u6790\uff0c\u5bfb\u627e\u4e0e\u6f5c\u5728\u7528\u6237\u7f16\u8f91\u517c\u5bb9\u7684\u5206\u6790\u6027\u7f16\u8f91\u64cd\u4f5c\uff0c\u4ee5\u751f\u6210\u5b8c\u6574\u7684\u7f16\u8f91\u7a0b\u5e8f\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u5176\u4ed6\u65b9\u6848\uff0cParSEL\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u8bf7\u6c42\u6709\u6548\u5730\u5b9e\u73b0\u4e86\u5bf93D\u5bf9\u8c61\u7684\u53ef\u63a7\u7f16\u8f91\u3002|\n", "2405.20318": "|**2024-05-30**|**CausalQuest: Collecting Natural Causal Questions for AI Agents**|Roberto Ceraolo et.al.|[2405.20318](http://arxiv.org/abs/2405.20318)|**[link](https://github.com/roberto-ceraolo/causal-quest)**|**\u4eba\u7c7b\u5929\u751f\u5c31\u6709\u5bfb\u6c42\u56e0\u679c\u5173\u7cfb\u7684\u9a71\u52a8\u529b\uff0c\u65e0\u8bba\u662f\u51fa\u4e8e\u597d\u5947\u5fc3\u8fd8\u662f\u7279\u5b9a\u76ee\u6807\u3002\u4e3a\u4e86\u5f00\u53d1\u80fd\u5904\u7406\u8fd9\u79cd\u4eba\u7c7b\u672c\u6027\u8ffd\u6c42\u7684AI\u4ee3\u7406\uff0c\u6211\u4eec\u6025\u9700\u4e00\u4e2a\u5168\u9762\u7684\u81ea\u7136\u56e0\u679c\u95ee\u9898\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u8981\u4e48\u5305\u542b\u4eba\u5de5\u5236\u9020\u7684\u95ee\u9898\uff0c\u65e0\u6cd5\u53cd\u6620\u5b9e\u9645AI\u5e94\u7528\u573a\u666f\uff0c\u8981\u4e48\u5728\u7279\u5b9a\u6765\u6e90\u7684\u95ee\u9898\u8986\u76d6\u4e0a\u6709\u9650\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CausalQuest\uff0c\u8fd9\u662f\u4e00\u4e2a\u6e90\u81ea\u793e\u4ea4\u7f51\u7edc\u3001\u641c\u7d22\u5f15\u64ce\u548cAI\u52a9\u624b\u768413,500\u4e2a\u81ea\u7136\u51fa\u73b0\u7684\u95ee\u9898\u7684\u6570\u636e\u96c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86
\u56e0\u679c\u95ee\u9898\uff0c\u5e76\u5efa\u7acb\u4e86\u66f4\u7ec6\u81f4\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\u5458\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u534f\u4f5c\uff0c\u6211\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u7cbe\u5fc3\u6807\u6ce8\u3002\u7814\u7a76\u53d1\u73b0\uff0c42%\u7684\u4eba\u7c7b\u63d0\u95ee\u5b9e\u9645\u4e0a\u662f\u5173\u4e8e\u56e0\u679c\u7684\uff0c\u5927\u90e8\u5206\u662f\u60f3\u4e86\u89e3\u7ed9\u5b9a\u7ed3\u679c\u80cc\u540e\u7684\u539f\u56e0\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u9ad8\u6548\u7684\u4e8c\u5206\u7c7b\u5668\uff08\u9ad8\u8fbe28.5\u4ebf\u53c2\u6570\uff09\uff0c\u7528\u4e8e\u8bc6\u522b\u56e0\u679c\u95ee\u9898\uff0c\u5b9e\u73b0\u4e86\u9ad8\u6027\u80fd\uff0cF1\u5206\u6570\u9ad8\u8fbe0.877\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u4e30\u5bcc\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u8fd9\u4e9b\u90fd\u53ef\u4ee5\u57fa\u4e8e\u6211\u4eec\u7684\u6570\u636e\u548c\u6a21\u578b\u8fdb\u884c\u6269\u5c55\u3002**|\n", "2405.20315": "|**2024-05-30**|**ANAH: Analytical Annotation of Hallucinations in Large Language Models**|Ziwei Ji et.al.|[2405.20315](http://arxiv.org/abs/2405.20315)|**[link](https://github.com/open-compass/anah)**|**### \u80cc\u666f 
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u201c\u5e7b\u89c9\u201d\u95ee\u9898\u5bf9\u4e8e\u5176\u5e7f\u6cdb\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5bf9\u8fd9\u4e00\u95ee\u9898\u7684\u7ec6\u81f4\u6d4b\u91cf\u5728\u793e\u533a\u4e2d\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3a$\\textbf{ANAH}$\u7684\u53cc\u8bed\u6570\u636e\u96c6\uff0c\u4e13\u6ce8\u4e8e\u751f\u6210\u5f0f\u95ee\u7b54\u4e2d\u7684LLM\u5e7b\u89c9\u5206\u6790\u3002ANAH\u4e2d\u7684\u6bcf\u4e2a\u7b54\u6848\u53e5\u5b50\u90fd\u7ecf\u8fc7\u4e25\u8c28\u6807\u6ce8\uff0c\u5305\u62ec\u53c2\u8003\u7247\u6bb5\u68c0\u7d22\u3001\u5e7b\u89c9\u7c7b\u578b\u7684\u5224\u65ad\u4ee5\u53ca\u9519\u8bef\u5185\u5bb9\u7684\u4fee\u6b63\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea612,000\u4e2a\u53e5\u7ea7\u6ce8\u91ca\uff0c\u6db5\u76d6\u4e86\u5927\u7ea64,300\u4e2aLLM\u54cd\u5e94\uff0c\u6d89\u53ca\u8d85\u8fc7700\u4e2a\u4e3b\u9898\uff0c\u901a\u8fc7\u4eba\u673a\u4ea4\u4e92\u5f0f\u6d41\u7a0b\u6784\u5efa\u800c\u6210\u3002\u7531\u4e8e\u5e7b\u89c9\u6ce8\u91ca\u7684\u7cbe\u7ec6\u7c92\u5ea6\uff0c\u6211\u4eec\u53ef\u4ee5\u5b9a\u91cf\u786e\u8ba4LLMs\u7684\u5e7b\u89c9\u95ee\u9898\u968f\u7740\u7b54\u6848\u7684\u6269\u5c55\u800c\u9010\u6e10\u589e\u52a0\uff0c\u5e76\u5229\u7528ANAH\u6765\u8bad\u7ec3\u548c\u8bc4\u4f30\u5e7b\u89c9\u6807\u6ce8\u5668\u3002 ### \u4efb\u52a1 
\u6211\u4eec\u6784\u5efa\u4e86\u5927\u7ea612,000\u6761\u53e5\u5b50\u7ea7\u522b\u7684\u6ce8\u91ca\uff0c\u9488\u5bf9\u7ea64,300\u4e2aLLM\u751f\u6210\u7684\u56de\u7b54\uff0c\u6db5\u76d6\u4e86\u8d85\u8fc7700\u4e2a\u4e3b\u9898\u3002\u8fd9\u4e2a\u540d\u4e3aANAH\u7684\u6570\u636e\u96c6\u901a\u8fc7\u4eba\u7c7b\u53c2\u4e0e\u7684\u6d41\u7a0b\u7cbe\u5fc3\u8bbe\u8ba1\uff0c\u65e8\u5728\u63d0\u4f9b\u5173\u4e8e\u751f\u6210\u5f0f\u95ee\u7b54\u4e2dLLMs\u5e7b\u89c9\u7684\u8be6\u5c3d\u5206\u6790\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u5e7b\u89c9\u6807\u6ce8\uff0c\u6211\u4eec\u80fd\u591f\u91cf\u5316\u5730\u9a8c\u8bc1LLMs\u5728\u751f\u6210\u7b54\u6848\u65f6\u5e7b\u89c9\u95ee\u9898\u7684\u7d2f\u79ef\uff0c\u5e76\u5229\u7528ANAH\u6765\u8bad\u7ec3\u548c\u8bc4\u4f30\u5e7b\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u6df1\u5165\u7814\u7a76\u4e86\u751f\u6210\u5f0f\u548c\u533a\u5206\u6027\u6807\u6ce8\u5668\uff0c\u5e76\u53d1\u73b0\u5c3d\u7ba1\u5f00\u6e90LLMs\u5728\u7cbe\u7ec6\u5e7b\u89c9\u6807\u6ce8\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u4f46\u4f7f\u7528ANAH\u8bad\u7ec3\u7684\u751f\u6210\u5f0f\u6807\u6ce8\u5668\u80fd\u591f\u8d85\u8d8a\u6240\u6709\u5f00\u6e90\u6a21\u578b\uff0c\u751a\u81f3\u63a5\u8fd1GPT-3.5\u7684\u8868\u73b0\uff0c\u5e76\u5c55\u73b0\u51fa\u5728\u672a\u89c1\u8fc7\u95ee\u9898\u4e0a\u7684\u826f\u597d\u6cdb\u5316\u80fd\u529b\u3002**|\n", "2405.20313": "|**2024-05-30**|**Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation**|Guillaume Huguet 
et.al.|[2405.20313](http://arxiv.org/abs/2405.20313)|null|\u86cb\u767d\u8d28\u5728\u51e0\u4e4e\u6240\u6709\u7684\u751f\u7269\u8fc7\u7a0b\u4e2d\u53d1\u6325\u5173\u952e\u4f5c\u7528\uff0c\u5176\u591a\u6837\u5316\u7684\u529f\u80fd\u6e90\u4e8e\u590d\u6742\u7684\u4e09\u7ef4\u7ed3\u6784\uff0c\u800c\u8fd9\u4e9b\u7ed3\u6784\u53c8\u7531\u6c28\u57fa\u9178\u5e8f\u5217\u51b3\u5b9a\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u6c28\u57fa\u9178\u5e8f\u5217\u4e30\u5bcc\u7684\u751f\u7269\u5b66\u5f52\u7eb3\u504f\u7f6e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5e8f\u5217\u6761\u4ef6\u7684SE(3)\u7b49\u53d8\u6d41\u5339\u914d\u6a21\u578b\u2014\u2014FoldFlow-2\uff0c\u7528\u4e8e\u86cb\u767d\u8d28\u7ed3\u6784\u751f\u6210\u3002\u4e0eFoldFlow\u5bb6\u65cf\u7684\u5148\u524d\u6a21\u578b\u76f8\u6bd4\uff0cFoldFlow-2\u5f15\u5165\u4e86\u65b0\u9896\u7684\u67b6\u6784\u7279\u6027\uff0c\u5305\u62ec\u7528\u4e8e\u7f16\u7801\u5e8f\u5217\u7684\u86cb\u767d\u8d28\u5927\u8bed\u8a00\u6a21\u578b\u3001\u7ed3\u5408\u7ed3\u6784\u548c\u5e8f\u5217\u8868\u793a\u7684\u65b0\u591a\u6a21\u6001\u878d\u5408\u4e3b\u5e72\uff0c\u4ee5\u53ca\u57fa\u4e8e\u51e0\u4f55\u53d8\u6362\u5668\u7684\u89e3\u7801\u5668\u3002\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u6837\u672c\u7684\u591a\u6837\u6027\u548c\u65b0\u9896\u6027\u2014\u2014\u8fd9\u5bf9\u65b0\u836f\u8bbe\u8ba1\u81f3\u5173\u91cd\u8981\u2014\u2014\u6211\u4eec\u5728\u6bd4\u5148\u524d\u5de5\u4f5c\u4f7f\u7528\u7684PDB\u6570\u636e\u96c6\u5927\u4e00\u4e2a\u6570\u91cf\u7ea7\u7684\u65b0\u6570\u636e\u96c6\u4e0a\u5927\u89c4\u6a21\u8bad\u7ec3FoldFlow-2\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u4e86\u5df2\u77e5\u7684PDB\u86cb\u767d\u8d28\u548c\u901a\u8fc7\u8fc7\u6ee4\u83b7\u5f97\u7684\u9ad8\u8d28\u91cf\u5408\u6210\u7ed3\u6784\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5f15\u5165\u5f3a\u5316\u5fae\u8c03\uff08Reinforced 
Finetuning\uff0c\u7b80\u79f0ReFT\uff09\u76ee\u6807\uff0c\u4f7fFoldFlow-2\u80fd\u591f\u9002\u5e94\u4efb\u610f\u5956\u52b1\uff0c\u5982\u63d0\u9ad8\u4e8c\u7ea7\u7ed3\u6784\u591a\u6837\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFoldFlow-2\u8d85\u8d8a\u4e86\u73b0\u6709\u57fa\u4e8e\u86cb\u767d\u8d28\u7ed3\u6784\u7684\u751f\u6210\u6a21\u578b\u7684\u72b6\u6001\uff0c\u65e0\u8bba\u5728\u65e0\u6761\u4ef6\u751f\u6210\u8fd8\u662f\u5728\u8bbe\u8ba1\u6027\u3001\u591a\u6837\u6027\u548c\u65b0\u9896\u6027\u65b9\u9762\uff0c\u90fd\u4f18\u4e8eRFDiffusion\uff0c\u4e14\u5728\u86cb\u767d\u8d28\u957f\u5ea6\u7684\u5404\u7c7b\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u7279\u522b\u662f\u5728\u7b49\u6e29\u6784\u8c61\u91c7\u6837\u4efb\u52a1\u4e0a\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4e00\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684FoldFlow-2\u5728\u8bf8\u5982VHH\u7eb3\u7c73\u6297\u4f53\u9aa8\u67b6\u8bbe\u8ba1\u7b49\u5177\u6709\u6311\u6218\u6027\u7684\u6761\u4ef6\u8bbe\u8ba1\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002|\n", "2405.20309": "|**2024-05-30**|**Large Language Models Can Self-Improve At Web Agent Tasks**|Ajay Patel 
et.al.|[2405.20309](http://arxiv.org/abs/2405.20309)|**[link](https://github.com/AjayP13/webdreamer)**|\u5728\u590d\u6742\u7684\u73af\u5883\u4e2d\uff0c\u5982\u7f51\u7edc\u6d4f\u89c8\u5668\uff0c\u8bad\u7ec3\u6a21\u578b\u4f5c\u4e3a\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u6267\u884c\u52a8\u4f5c\u7684\u4ee3\u7406\u901a\u5e38\u5177\u6709\u6311\u6218\u6027\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u7f3a\u4e4f\u8bad\u7ec3\u6570\u636e\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u4ee5\u96f6\u6837\u672c\u6216\u5c11\u91cf\u6837\u672c\u6765\u5728\u65b0\u73af\u5883\u4e2d\u5bfc\u822a\u7684\u80fd\u529b\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0cLLMs\u53ef\u4ee5\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\uff08\u5373\u5728\u5176\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u4e0a\u5fae\u8c03\uff09\u6765\u8d85\u8d8a\u57fa\u7840\u6027\u80fd\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76LLMs\u5728\u957f\u65f6\u5e8f\u4efb\u52a1\u7684\u590d\u6742\u73af\u5883\u2014\u2014WebArena\u57fa\u51c6\u4e2d\uff0c\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u80fd\u5426\u63d0\u5347\u5176\u8868\u73b0\u3002WebArena\u8981\u6c42\u4ee3\u7406\u81ea\u4e3b\u6d4f\u89c8\u7f51\u9875\u5e76\u6267\u884c\u64cd\u4f5c\u4ee5\u8fbe\u6210\u7279\u5b9a\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u5408\u6210\u8bad\u7ec3\u6570\u636e\u6df7\u5408\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u7ecf\u8fc7\u81ea\u6211\u6539\u8fdb\u540e\uff0c\u6a21\u578b\u5728WebArena\u57fa\u51c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u63d0\u9ad8\u4e8631%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u7528\u4e8e\u66f4\u5168\u9762\u5730\u8bc4\u4f30\u6211\u4eec\u7684\u5fae\u8c03\u4ee3\u7406\u6a21\u578b\u7684\u884c\u4e3a\u6027\u80fd\u3001\u9c81\u68d2\u6027\u3001\u80fd\u529b\u4ee5\u53ca\u8f68\u8ff9\u8d28\u91cf\uff0c\u8fd9\u4e9b\u6307\u6807\u8d85\u8d8a\u4e86\u5f53\u524d\u4ec5\u4f9d\u8d
56\u4e8e\u6574\u4f53\u57fa\u51c6\u5206\u6570\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002|\n", "2405.20304": "|**2024-05-30**|**Group Robust Preference Optimization in Reward-free RLHF**|Shyam Sundhar Ramesh et.al.|[2405.20304](http://arxiv.org/abs/2405.20304)|**[link](https://github.com/rsshyam/Group-robust-preference-optimization)**|**## \u7ffb\u8bd1 \u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u9002\u5e94\u65f6\uff0c\u901a\u5e38\u9700\u8981\u901a\u8fc7\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u548c\u591a\u5143\u6807\u7b7e\u8005\u7fa4\u4f53\uff08\u5982\u4e0d\u540c\u6027\u522b\u3001\u79cd\u65cf\u3001\u516c\u53f8\u56e2\u961f\u7b49\uff09\u7684\u504f\u597d\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u4f20\u7edf\u65b9\u6cd5\u503e\u5411\u4e8e\u91c7\u7528\u201c\u4e00\u5200\u5207\u201d\u7684\u7b56\u7565\uff0c\u5373\u5047\u8bbe\u5e76\u4f18\u5316\u5355\u4e00\u7684\u504f\u597d\u6a21\u578b\uff0c\u5bf9\u5404\u7fa4\u4f53\u7684\u72ec\u7279\u7279\u6027\u548c\u9700\u6c42\u4e0d\u591f\u654f\u611f\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7fa4\u4f53\u9c81\u68d2\u504f\u597d\u4f18\u5316\uff08GRPO\uff09\u65b9\u6cd5\uff0c\u65e8\u5728\u7a33\u5065\u5730\u4f7fLLMs\u9002\u5e94\u5404\u4e2a\u7fa4\u4f53\u7684\u504f\u597d\u3002GRPO\u65b9\u6cd5\u57fa\u4e8e\u65e0\u5956\u52b1\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff0c\u4f46\u533a\u522b\u4e8e\u4ee5\u5f80\uff0c\u5b83\u76ee\u6807\u662f\u5bfb\u627e\u4e00\u4e2a\u80fd\u6700\u5927\u5316\u6700\u5dee\u7fa4\u4f53\u6027\u80fd\u7684\u9c81\u68d2\u7b56\u7565\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0cGRPO\u4f1a\u52a8\u6001\u4e14\u9010\u6b21\u8c03\u6574\u4e0d\u540c\u7fa4\u4f53\u7684\u6743\u91cd\uff0c\u4f18\u5148\u5173\u6ce8\u7d2f\u79ef\u635f\u5931\u8f83\u9ad8\u7684\u7fa4\u4f53\u3002\u6211\u4eec\u5728\u7406\u8bba\u4e0a\u63a2\u8ba8\u4e86GRPO\u7684\u53ef\u884c\u6027\uff0c\u5e76\u5206\u6790\u4e86\u5176
\u5728\u5bf9\u6570\u7ebf\u6027\u7b56\u7565\u7c7b\u522b\u4e0b\u7684\u6536\u655b\u6027\u3002\u901a\u8fc7\u4f7f\u7528\u6765\u81ea\u4e0d\u540c\u7fa4\u4f53\u7684\u5168\u5c40\u610f\u89c1\u6570\u636e\u5bf9LLMs\u8fdb\u884cGRPO\u5fae\u8c03\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u6700\u5dee\u7fa4\u4f53\u7684\u8868\u73b0\uff0c\u51cf\u5c11\u4e86\u7fa4\u4f53\u95f4\u635f\u5931\u7684\u4e0d\u5e73\u8861\uff0c\u540c\u65f6\u63d0\u9ad8\u4e86\u6982\u7387\u51c6\u786e\u6027\uff0c\u76f8\u8f83\u4e8e\u975e\u9c81\u68d2\u57fa\u7ebf\uff0c\u8fd9\u4e9b\u6539\u8fdb\u6548\u679c\u663e\u8457\u3002**|\n",
"2405.20285": "|**2024-05-30**|**Who Writes the Review, Human or AI?**|Panagiotis C. Theocharopoulos et.al.|[2405.20285](http://arxiv.org/abs/2405.20285)|null|With the widespread use of artificial intelligence in natural language processing, there is growing concern about how to identify AI-generated text across different domains. This study addresses the problem by proposing a method that accurately distinguishes AI-generated book reviews from human-written ones. Our approach uses transfer learning so that the model can identify generated text across different topics while improving its ability to detect variations in writing style and vocabulary. To evaluate the method, we built a dataset of genuine book reviews and simulated reviews generated with the open-source Vicuna language model. The experimental results show that the original source of a text can be identified, with an accuracy of 96.86%. Our work focuses on the capabilities and limitations of large language models in text identification, which matters for managing such models effectively in the future and for ensuring the integrity and authenticity of human-created content.|\n",
"2405.21075": "|**2024-05-31**|**Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis**|Chaoyou Fu et.al.|[2405.21075](http://arxiv.org/abs/2405.21075)|null|In the quest for artificial intelligence, multi-modal large language models (MLLMs) have become the focus of recent progress. However, their ability to process sequential visual data remains insufficiently explored. To address this, we present Video-MME, the first comprehensive multi-modal evaluation benchmark of MLLMs in video analysis. Our work has four key features: 1) diversity of video types, spanning 6 primary visual domains and 30 subfields to ensure broad generalization across application scenarios; 2) duration in the temporal dimension, covering short-, medium-, and long-term videos ranging from 11 seconds to 1 hour, to test how models adapt to complex contextual dynamics; 3) breadth of data modalities, integrating inputs beyond video frames, such as subtitles and audio, to reveal the all-round capabilities of MLLMs; 4) quality of annotations, with rigorous manual labeling by experts to ensure precise and reliable model assessment. We carefully selected and manually annotated 900 videos totaling 256 hours, yielding 2,700 question-answer pairs. With Video-MME, we extensively evaluate several state-of-the-art MLLMs, including the GPT-4 series and Gemini 1.5 Pro, as well as the open-source image model InternVL-Chat-V1.5 and the video model LLaVA-NeXT-Video. The results show that Gemini 1.5 Pro is the best-performing commercial model, clearly outperforming the open-source models. Our dataset and findings underscore the need for further improvements in handling longer sequences and multi-modal data. Project page: https://video-mme.github.io|\n",
"2405.21047": "|**2024-05-31**|**Grammar-Aligned Decoding**|Kanghee Park et.al.|[2405.21047](http://arxiv.org/abs/2405.21047)|null|Large language models (LLMs) struggle to generate highly structured outputs such as program code, mathematical formulas, or well-formed markup. Constrained decoding methods mitigate this problem by restricting which tokens may be emitted at each step so that the output satisfies a given constraint. In grammar-constrained decoding (GCD), for example, the LLM's output must follow a given grammar. However, such constrained decoding can distort the LLM's distribution: the generated outputs are grammatical, but their likelihoods do not reflect the probabilities the LLM itself assigns, so they tend to be of low quality. We call the problem of aligning sampling with a grammar constraint grammar-aligned decoding (GAD), and propose a decoding algorithm called adaptive sampling with approximate expected futures (ASAp). ASAp guarantees the grammaticality of the output while provably producing outputs that match the conditional probability of the LLM's distribution given the grammar constraint. The algorithm uses previously sampled outputs to robustly estimate the future grammaticality of different output prefixes. Experiments on code generation and structured natural language processing tasks show that ASAp often produces outputs that follow the LLM's distribution more closely than existing GCD techniques while still enforcing the required grammatical constraints, improving overall quality.|\n",
"2405.21040": "|**2024-05-31**|**Direct Alignment of Language Models via Quality-Aware Self-Refinement**|Runsheng Yu et.al.|[2405.21040](http://arxiv.org/abs/2405.21040)|null|Reinforcement learning from human feedback (RLHF) is commonly used to align the behavior of large language models (LLMs) with human preferences. Recently, Direct Policy Optimization (DPO) has emerged as an alternative. It does not rely on an LLM-based reward model, which saves the extra memory and training time that such a model would require. However, DPO ignores the relative quality of the positive and negative responses, which can lead to suboptimal training outcomes. To alleviate this problem, we investigate using the intrinsic knowledge of the LLM being fine-tuned on the fly to obtain the quality of responses and to refine the loss function. Specifically, we use the LLM's knowledge to design a refinement function that estimates the quality of both the positive and the negative responses. We show that, under mild assumptions, the constructed refinement function can help self-refine the loss function. We integrate this refinement function into DPO and its variant Identity Policy Optimization (IPO). Experiments across various evaluators show that the refined models outperform DPO and IPO.|\n",
"2405.21030": "|**2024-05-31**|**Standards for Belief Representations in LLMs**|Daniel A. Herrmann et.al.|[2405.21030](http://arxiv.org/abs/2405.21030)|null|As large language models (LLMs) demonstrate remarkable abilities across various domains, computer scientists are seeking to understand their cognitive processes, particularly how (and whether) LLMs internally represent beliefs about the world. However, the field still lacks a unified theoretical foundation to underpin the study of belief in LLMs. This paper begins to fill this gap by proposing conditions under which a representation in an LLM can count as belief-like. We argue that the project of measuring belief in LLMs shares many features with belief measurement in decision theory and formal epistemology. It also differs in ways that should change how we measure belief. Drawing on insights from philosophy and contemporary machine learning practice, we therefore establish four criteria: accuracy, coherence, uniformity, and use. Together, these criteria balance theoretical considerations with practical constraints and lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using some of the criteria in isolation to identify belief representations.|\n",
"2405.21028": "|**2024-05-31**|**LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models**|Elias Stengel-Eskin et.al.|[2405.21028](http://arxiv.org/abs/2405.21028)|**[link](https://github.com/esteng/pragmatic_calibration)**|**When answering questions, language models can convey not only an answer but also a level of confidence that the answer is correct. This includes explicit confidence markers, such as a numeric score, as well as implicit ones, such as an authoritative tone or elaboration with additional knowledge. However, most current models tend to be overconfident. To calibrate this confidence, we propose LACIE, a pragmatic, listener-aware finetuning method. LACIE considers not only whether an answer is correct but also whether a listener will accept it. We cast calibration as preference optimization, creating data via a two-agent game in which a speaker model's outputs are judged by a simulated listener. We then finetune three LLMs (Mistral-7B, Llama3-8B, and Llama3-70B) with LACIE and show that the finetuned models are better calibrated with respect to a simulated listener. Importantly, these trends transfer to human listeners, helping them predict model correctness more accurately. In a human evaluation, 47% fewer incorrect answers from LACIE-trained models were accepted, while the acceptance rate of correct answers stayed the same. Furthermore, LACIE generalizes to another dataset: after training on TriviaQA, truthfulness on TruthfulQA increases substantially. Our analysis indicates that LACIE leads to better confidence separation between correct and incorrect examples. Qualitatively, LACIE-trained models hedge more, and when they are correct they signal certainty implicitly by using an authoritative tone or including details. Finally, LACIE finetuning makes models more likely to abstain (e.g., by saying \"I don't know\") when their answers are likely to be wrong.**|\n",
"2405.21018": "|**2024-05-31**|**Improved Techniques for Optimization-Based Jailbreaking on Large Language Models**|Xiaojun Jia et.al.|[2405.21018](http://arxiv.org/abs/2405.21018)|**[link](https://github.com/jiaxiaojunqaq/i-gcg)**|**With the rapid development of large language models (LLMs), their safety alignment has become key to their widespread deployment. Efforts to jailbreak these models are increasing, and among them the Greedy Coordinate Gradient (GCG) attack has drawn attention for its notable success. However, the efficiency of GCG attacks still leaves room for improvement. This paper presents several improved optimization-based jailbreaking techniques to boost the performance of GCG. First, we observe that the single target template 'Sure' largely limits the attack performance of GCG. We therefore propose diverse target templates containing harmful self-suggestion and/or guidance to mislead the model. On the optimization side, we propose an automatic multi-coordinate updating strategy in GCG to accelerate convergence, together with an easy-to-hard initialization trick. Combining these improvements, we develop an efficient method, $\\mathcal{I}$-GCG. Experiments on a series of benchmarks, such as the NeurIPS 2023 red-teaming challenge, show that our improved techniques help GCG outperform existing jailbreaking attacks and achieve a nearly 100% attack success rate. The code is released at https://github.com/jiaxiaojunQAQ/I-GCG.**|\n",
"2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985](http://arxiv.org/abs/2405.20985)|null|This study focuses on the projector module in multimodal large language models (MLLMs), which plays a key role in bridging the visual and language modalities and facilitating cross-modal alignment. However, the effectiveness of projectors for vision-language alignment remains under-evaluated and is usually inferred only indirectly from downstream task performance. To address this, the study interprets how projectors work by analyzing the vision-language semantic flow within MLLMs. Specifically, the authors trace the semantic relevance flow from generated language tokens back to the raw visual encoder patches and to the intermediate outputs produced by the projector. They find that compressive projectors (e.g., QFormer) abstract visual patches into a limited set of semantic concepts, such as objects or attributes. This produces a 'double abstraction' phenomenon: the projector first performs visual semantic abstraction with reference to pre-defined query tokens, and the LLM then performs a second extraction based on the text instructions. This double abstraction is inefficient in training and may lead to a cumulative loss of visual semantics. To mitigate this, the study proposes the key insight of Decoupling Compression from Abstraction (DeCo): the projector compresses the number of visual tokens at the patch level, while the LLM handles visual semantic abstraction entirely. Accordingly, the authors adopt a simple compressor, 2D adaptive pooling, to downsample visual patches without adding parameters. Experiments show that DeCo outperforms traditional compressive projectors in both performance and efficiency. It achieves gains of 0.9%, 7.1%, and 2.9% on MLLM benchmarks, visual localization, and open-ended VQA tasks, respectively, with fewer trainable parameters and faster convergence.|\n",
"2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang et.al.|[2405.20978](http://arxiv.org/abs/2405.20978)|null|Large language models (LLMs) exhibit substantial capabilities but face challenges such as hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG), which integrates knowledge from external databases, has emerged as a promising solution to these problems. However, inappropriate retrieved passages can prevent LLMs from generating comprehensive, high-quality responses. Prior studies on the robustness of RAG to retrieval noise are often confined to a limited set of noise types. This deviates from real-world retrieval environments and limits practical applicability. In this study, we first investigate retrieval noise and categorize it into three distinct types that reflect real-world environments. We then analyze how each type affects the robustness of LLMs. Next, we propose a novel RAG approach, Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT uses adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noise, and employs multi-task learning so that the model can recognize noisy contexts. Extensive experiments show that a LLaMA-2 7B model trained with RAAT achieves significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at https://github.com/calubkk/RAAT.|\n",
"2405.20974": "|**2024-05-31**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu et.al.|[2405.20974](http://arxiv.org/abs/2405.20974)|**[link](https://github.com/xu1868/sayself)**|**Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs through direct or self-consistency prompting, or by constructing specific datasets for supervised finetuning. Prompting-based approaches perform poorly, while training-based approaches are limited to binary or imprecise overall confidence estimates. In this work, we present SaySelf, a training framework that teaches LLMs to express more accurate, fine-grained confidence estimates. SaySelf also directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. To do this, an LLM automatically summarizes, in natural language, the uncertainties in specific knowledge by analyzing the inconsistency across multiple sampled reasoning chains. The resulting data is used for supervised finetuning. To further calibrate the confidence estimates, we apply carefully designed reinforcement learning that rewards accurate, high-confidence predictions and penalizes overconfidence in erroneous outputs. Experimental results on both in-distribution and out-of-distribution datasets show that SaySelf effectively reduces confidence calibration error while maintaining task performance. The generated self-reflective rationales are also shown to be reasonable and to further contribute to calibration. The code is available at \\url{https://github.com/xu1868/SaySelf}.**|\n",
"2405.20973": "|**2024-05-31**|**LCQ: Low-Rank Codebook based Quantization for Large Language Models**|Wen-Pu Cai et.al.|[2405.20973](http://arxiv.org/abs/2405.20973)|null|Large language models (LLMs) perform well on many tasks, but their high storage and computational costs make deployment challenging. Weight quantization is widely used to compress models and reduce these costs. Most existing quantization methods for LLMs use a rank-one codebook, which causes substantial accuracy loss at high compression ratios. This paper proposes a novel weight quantization method, low-rank codebook based quantization (LCQ), to address this problem. LCQ quantizes weights with a low-rank codebook whose rank can be larger than one. The higher rank is meant to preserve or improve model accuracy while keeping the extra storage overhead close to zero. Experiments show that, compared with existing methods, LCQ achieves better compression while maintaining good accuracy. LCQ may therefore improve the performance and efficiency of LLMs in practical applications without significantly increasing storage cost, offering a new approach to deploying these models efficiently.|\n",
"2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550](http://arxiv.org/abs/2406.02550)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|**This work studies the emergence of in-context learning and skill composition in large language models on a collection of modular arithmetic tasks. We consider a finite collection of linear modular functions $z = a \\times x + b \\times y \\;(\\text{mod}\\; p)$, labeled by the vector $(a, b) \\in \\mathbb{Z}_p^2$. Some of these tasks are used for pre-training and the rest for out-of-distribution testing. Empirically, a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. The smallest model capable of out-of-distribution generalization requires two transformer blocks. For deeper models, the out-of-distribution generalization phase is transient, so early stopping is necessary. Finally, we perform an interpretability study of the pre-trained models, revealing highly structured representations in both phases, and discuss the learned algorithm.**|\n",
"2406.02547": "|**2024-06-04**|**Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning**|Alex Jinpeng Wang et.al.|[2406.02547](http://arxiv.org/abs/2406.02547)|**[link](https://github.com/showlab/VisInContext)**|**This study does not present a state-of-the-art multi-modal large language model (MLLM). Instead, it introduces an innovative method for efficiently increasing the length of in-context text that multi-modal models can handle. We propose Visualized In-Context Text Processing (VisInContext), which processes long in-context text through visual tokens. This substantially reduces GPU memory usage and floating-point operations (FLOPs) in both training and inference. For example, for a 56-billion-parameter mixture-of-experts (MoE) model, our method extends the in-context text length during pre-training to 2048 tokens with nearly the same FLOPs. Experimental results show that models trained with VisInContext perform strongly on common downstream benchmarks for in-context few-shot evaluation. Moreover, VisInContext is complementary to existing techniques and improves document understanding, showing great potential for document question answering and sequential document retrieval.**|\n",
"2406.02543": "|**2024-06-04**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori et.al.|[2406.02543](http://arxiv.org/abs/2406.02543)|null|We study uncertainty quantification in large language models (LLMs), with the goal of identifying when the uncertainty in the response to a given query is large. We simultaneously consider two types of uncertainty. Epistemic uncertainty stems from a lack of knowledge about the ground truth, such as facts or the language. Aleatoric uncertainty is irreducible randomness, such as multiple possible answers. In particular, we derive an information-theoretic metric that reliably detects when only epistemic uncertainty is large, in which case the model's output is unreliable. This condition can be computed solely from the model's outputs, obtained through special iterative prompting based on its previous responses. Such quantification makes it possible to detect hallucinations (cases of high epistemic uncertainty) in both single- and multi-answer responses. Many standard uncertainty quantification strategies, such as thresholding the log-likelihood of a response, cannot detect hallucinations in the multi-answer case. We conduct a series of experiments that demonstrate the advantages of our approach. Our study also sheds light on how the probabilities an LLM assigns to a given output can be amplified by iterative prompting, which may be of independent interest.|\n",
"2406.02542": "|**2024-06-04**|**Loki: Low-Rank Keys for Efficient Sparse Attention**|Prajwal Singhania et.al.|[2406.02542](http://arxiv.org/abs/2406.02542)|null|Inference on large language models is computationally expensive, and with long sequences the self-attention mechanism is the dominant cost. To address this, recent work has proposed several sparse approximations of attention. In this paper, our analysis shows that the key vectors in attention blocks lie in a space of significantly lower dimension than their original dimensionality. This observation leads us to propose Loki, a novel sparse attention method that ranks and selects tokens in the KV cache based on attention scores computed in the low-dimensional space. Our evaluations show that Loki preserves model efficacy better than other popular approximation methods. It also speeds up attention computation by reducing both data movement (load/store) and compute costs.|\n",
"2406.02539": "|**2024-06-04**|**Parrot: 
Multilingual Visual Instruction Tuning**|Hai-Long Sun et.al.|[2406.02539](http://arxiv.org/abs/2406.02539)|null|\u968f\u7740GPT-4V\u7b49\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u4eba\u5de5\u667a\u80fd\u671d\u7740\u901a\u7528\u4eba\u5de5\u667a\u80fd\u8fc8\u51fa\u4e86\u91cd\u8981\u4e00\u6b65\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6765\u540c\u6b65\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u800c\u8d4b\u4e88\u5b83\u4eec\u591a\u6a21\u6001\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u505a\u6cd5\u53ef\u80fd\u5bfc\u81f4\u968f\u7740\u8bad\u7ec3\u7684\u8fdb\u884c\uff0c\u8bed\u8a00\u6a21\u578b\u5904\u7406\u591a\u79cd\u8bed\u8a00\u7684\u80fd\u529b\u9010\u6e10\u51cf\u5f31\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684\u4e0d\u5e73\u8861SFT\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u975e\u82f1\u8bed\u8bed\u8a00\u6027\u80fd\u663e\u8457\u4e0b\u964d\uff0c\u539f\u56e0\u5728\u4e8eSFT\u8fc7\u7a0b\u4e2d\u672a\u80fd\u6709\u6548\u8fde\u63a5\u89c6\u89c9\u7f16\u7801\u5668\u548c\u591a\u8bed\u8a00\u4ee4\u724c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faParrot\uff0c\u4e00\u79cd\u5229\u7528\u6587\u672c\u5f15\u5bfc\u5728\u8bed\u8a00\u5c42\u9762\u9a71\u52a8\u89c6\u89c9\u4ee4\u724c\u5bf9\u9f50\u7684\u65b0\u65b9\u6cd5\u3002Parrot\u901a\u8fc7\u8ba9\u89c6\u89c9\u4ee4\u724c\u6839\u636e\u4e0d\u540c\u7684\u8bed\u8a00\u8f93\u5165\u8fdb\u884c\u6761\u4ef6\u5316\uff0c\u5e76\u501f\u52a9\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u4fc3\u8fdb\u591a\u8bed\u8a00\u4ee4\u724c\u7684\u5bf9\u9f50\u3002\u7279\u522b\u662f\uff0c\u4e3a\u4e86\u589e\u5f3a\u975e\u82f1\u8bed\u89c6\u89c9\u4ee4\u724c\u7684\u5bf9\u9f50\uff0c\u6211\u4eec\u8ba1\u7b97\u521d\u59cb\u89c6\u89c9\u7279\u5f81\u4e0e\u6587\u672c\u5d4c\u5165\u4e4b\u95f4\u7684\u8de8\u6ce8\u610f\u529b\uff0c\u7136\u540e\u5c06\u5176\u8f93\u5165\u5230MoE\u8def\u7531\u5668\uff0c\u9009\u62e9\u6700\u76f8\u5173\u7684\u4e13\u5bb6\u3002\u9009\u5b9a\u7684\u4e13\u5bb6\u4f1a\u5c06\u521d\u59cb\u89c6\u89c9\u4ee4\u724c\u8f6c\u5316\u4e3a\u7279\u5b9a\u8bed\u8a00\u7684\u89c6\u89c9\u4ee4\u724c\u3002\u9274\u4e8e\u76ee\u524d\u7f3a\u4e4f\u8bc4\u4f30\u591a\u8bed\u8a00\u80fd\u529b\u7684\u6807\u51c6\u57fa\u51c6\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u5e76\u516c\u5f00\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u591a\u6a21\u6001\u57fa\u51c6\uff08MMMB\uff09\uff0c\u5305\u62ec6\u79cd\u8bed\u8a00\u300115\u4e2a\u7c7b\u522b\u548c12,000\u4e2a\u95ee\u9898\u3002Parrot\u4e0d\u4ec5\u5728MMMB\u548cMMM Benchmark\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8fd8\u5728\u5e7f\u6cdb\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u5c06\u63d0\u4f9bParrot\u7684\u6e90\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\u96c6\u4f9b\u516c\u4f17\u4f7f\u7528\u3002|\n", "2406.02536": "|**2024-06-04**|**Mitigate Position Bias in Large Language Models via Scaling a Single Dimension**|Yijiong Yu 
et.al.|[2406.02536](http://arxiv.org/abs/2406.02536)|**[link](https://github.com/PositionalHidden/PositionalHidden)**|\u8fd9\u7bc7\u8bba\u6587\u4e3b\u8981\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u4e00\u4e2a\u73b0\u8c61\u2014\u2014\u4f4d\u7f6e\u504f\u89c1\uff0c\u4e5f\u79f0\u4e3a\"\u8ff7\u5931\u5728\u4e2d\u95f4\"\u3002\u8fd9\u79cd\u504f\u89c1\u5728\u957f\u6587\u672c\u60c5\u5883\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u5373\u5173\u952e\u4fe1\u606f\u5728\u63d0\u793a\u4e2d\u7684\u4e0d\u540c\u4f4d\u7f6e\u4f1a\u663e\u8457\u5f71\u54cd\u6a21\u578b\u7684\u51c6\u786e\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u6743\u91cd\u662f\u4f4d\u7f6e\u504f\u89c1\u7684\u5fae\u89c2\u8868\u73b0\u3002\u6b64\u5916\uff0c\u8bba\u6587\u6307\u51fa\uff0c\u56e0\u679c\u6ce8\u610f\u529b\u63a9\u7801\u901a\u8fc7\u521b\u5efa\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\uff0c\u4e5f\u5bf9\u4f4d\u7f6e\u504f\u89c1\u6709\u6240\u8d21\u732e\u3002 
\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u51cf\u8f7b\u4f4d\u7f6e\u504f\u89c1\uff0c\u5373\u8c03\u6574\u8fd9\u4e9b\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\u3002\u5b9e\u9a8c\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\uff0c\u5305\u62ec\u81ea\u7136\u95ee\u9898\u591a\u6587\u6863\u95ee\u7b54\u3001\u952e\u503c\u68c0\u7d22\u3001LongBench\u548c\u65f6\u95f4\u7ebf\u91cd\u6392\uff0c\u6d89\u53caRoPE\u6a21\u578b\u3001\u6269\u5c55\u4e0a\u4e0b\u6587\u7a97\u53e3\u6a21\u578b\u548cAlibi\u6a21\u578b\u7b49\u591a\u79cd\u67b6\u6784\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4ec5\u4fee\u6539\u9690\u85cf\u72b6\u6001\u7684\u4e00\u4e2a\u7ef4\u5ea6\uff0c\u5c31\u80fd\u5b9e\u73b0\u6027\u80fd\u63d0\u5347\uff0c\u6700\u9ad8\u53ef\u8fbe15.2%\u3002\u7814\u7a76\u8005\u8fd8\u63d0\u4f9b\u4e86\u4ee3\u7801\u4f9b\u8fdb\u4e00\u6b65\u4f7f\u7528\uff0c\u4ee3\u7801\u5730\u5740\u4e3a\uff1ahttps://aka.ms/PositionalHidden\u3002|\n", "2406.02532": "|**2024-06-04**|**SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices**|Ruslan Svirschevski 
et.al.|[2406.02532](http://arxiv.org/abs/2406.02532)|**[link](https://github.com/yandex-research/specexec)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u9ad8\u6548\u8fd0\u884c\u5b83\u4eec\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u63a8\u6d4b\u6027\u89e3\u7801\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u901f\u5ea6\u63d0\u5347\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u662f\u9488\u5bf9\u6570\u636e\u4e2d\u5fc3\u786c\u4ef6\u8fdb\u884c\u8bbe\u8ba1\u3002\u672c\u7814\u7a76\u53cd\u95ee\uff1a\u6211\u4eec\u80fd\u5728\u6d88\u8d39\u7ea7\u8bbe\u5907\u4e0a\u591a\u5feb\u5730\u8fd0\u884cLLMs\uff1f\u6d88\u8d39\u8005\u7ea7GPU\u5df2\u65e0\u6cd5\u5bb9\u7eb3\u6700\u5927\u7684\u6a21\u578b\uff08500\u4ebf\u53c2\u6570\u4ee5\u4e0a\uff09\uff0c\u56e0\u6b64\u9700\u8981\u5c06\u53c2\u6570\u5378\u8f7d\u5230RAM\u6216SSD\u3002\u5f53\u4f7f\u7528\u5378\u8f7d\u53c2\u6570\u7684\u65b9\u5f0f\u8fd0\u884c\u65f6\uff0c\u63a8\u7406\u5f15\u64ce\u53ef\u4ee5\u540c\u65f6\u5904\u7406\u6570\u767e\u4e43\u81f3\u6570\u5343\u4e2a\u4ee4\u724c\u7684\u6279\u6b21\uff0c\u4f7f\u5176\u975e\u5e38\u9002\u5408\u63a8\u6d4b\u6027\u89e3\u7801\u3002\u6211\u4eec\u63d0\u51faSpecExec\uff08\u63a8\u6d4b\u6027\u6267\u884c\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u7684\u5e76\u884c\u89e3\u7801\u65b9\u6cd5\uff0c\u9002\u7528\u4e8e\u4e3b\u6d41LLM\u5bb6\u65cf\uff0c\u80fd\u751f\u6210\u6bcf\u8f6e\u76ee\u6807\u6a21\u578b\u8fed\u4ee3\u9ad8\u8fbe20\u4e2a\u4ee4\u724c\u7684\u9884\u6d4b\u3002\u5b83\u5229\u7528\u73b0\u4ee3LLMs\u4e2d\u6982\u7387\u5206\u5e03\u7684\u9ad8\u6ce2\u52a8\u6027\u548c\u6a21\u578b\u8f93\u51fa\u6982\u7387\u4e4b\u95f4\u7684\u9ad8\u5ea6\u4e00\u81f4\u6027\u3002SpecExec\u901a\u8fc7\u4ece\u8349\u7a3f\u6a21\u578b\u83b7\u53d6\u6700\u53ef\u80fd\u7684\u4ee4\u724c\u5ef6\u7eed\uff0c\u6784\u5efa\u4e00\u4e2a\u76ee\u6807\u6a21\u578b\u7684\u201c\u7f13\u5b58\u201d\u6811\uff0c\u7136\u540e\u5728\u4e00\u4e2a\u5355\u6b21\u904d\u5386\u4e2d\u9a8c\u8bc1\u3002 \u4f7f\u7528SpecExec\uff0c\u6211\u4eec\u5728\u6d88\u8d39\u7ea7GPU\u4e0a\u5b9e\u73b0\u4e86500\u4ebf\u53c2\u6570LLM\u7684\u63a8\u7406\uff0c\u914d\u5408RAM\u5378\u8f7d\uff0c4\u4f4d\u91cf\u5316\u4e0b\u7684\u901f\u5ea6\u8fbe\u52304-6\u4e2a\u4ee4\u724c/\u79d2\uff0c\u800c16\u4f4d\u6743\u91cd\u4e0b\u7684\u901f\u5ea6\u4e3a2-3\u4e2a\u4ee4\u724c/\u79d2\u3002|\n", "2406.02528": "|**2024-06-04**|**Scalable MatMul-free Language Modeling**|Rui-Jie Zhu et.al.|[2406.02528](http://arxiv.org/abs/2406.02528)|**[link](https://github.com/ridgerchu/matmulfreellm)**|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\uff0c\u77e9\u9635\u4e58\u6cd5\uff08MatMul\uff09\u901a\u5e38\u5360\u636e\u4e3b\u8981\u8ba1\u7b97\u5f00\u9500\u3002\u968f\u7740LLMs\u7684\u89c4\u6a21\u6269\u5927\uff0c\u5176\u5d4c\u5165\u7ef4\u5ea6\u548c\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e5f\u968f\u4e4b\u589e\u52a0\uff0c\u8fd9\u4e00\u95ee\u9898\u66f4\u4e3a\u663e\u8457\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u80fd\u591f\u5728\u4fdd\u6301\u5f3a\u5927\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5b8c\u5168\u79fb\u9664LLMs\u4e2d\u7684MatMul\u64cd\u4f5c\uff0c\u5373\u4f7f\u662f\u572827\u4ebf\u53c2\u6570\u91cf\u7ea7\u7684\u6a21\u578b\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65e0MatMul\u6a21\u578b\u6027\u80fd\u4e0e\u5185\u5b58\u6d88\u8017\u663e\u8457\u66f4\u591a\u7684\u6700\u5148\u8fdb\u7684Transformer\u76f8\u5f53\u3002\u6211\u4eec\u7814\u7a76\u4e86\u6a21\u578b\u7684\u6269\u5c55\u6027\u89c4\u5f8b\uff0c\u5e76\u53d1\u73b0\u65e0MatMul\u6a21\u578b\u4e0e\u5168\u7cbe\u5ea6Transformer\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u968f\u7740\u6a21\u578b\u5c3a\u5bf8\u589e\u5927\u800c\u51cf\u5c0f\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684GPU\u5b9e\u73b0\uff0c\u76f8\u8f83\u4e8e\u672a\u4f18\u5316\u7684\u57fa\u7ebf\uff0c\u8bad\u7ec3\u65f6\u80fd\u51cf\u5c11\u9ad8\u8fbe61%\u7684\u5185\u5b58\u4f7f\u7528\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u901a\u8fc7\u4f18\u5316\u7684\u5185\u6838\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5185\u5b58\u6d88\u8017\u53ef\u964d\u4f4e\u8d85\u8fc710\u500d\u3002\u4e3a\u4e86\u51c6\u786e\u8bc4\u4f30\u67b6\u6784\u6548\u7387\uff0c\u6211\u4eec\u5728FPGA\u4e0a\u6784\u5efa\u4e86\u5b9a\u5236\u786c\u4ef6\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528GPU\u65e0\u6cd5\u5904\u7406\u7684\u8f7b\u91cf\u7ea7\u8fd0\u7b97\uff0c\u5b9e\u73b0\u4e86\u5bf9\u5341\u4ebf\u53c2\u6570\u89c4\u6a21\u6a21\u578b\u7684\u9ad8\u901f\u5904\u7406\uff0c\u4f7f\u5176\u63a5\u8fd1\u4eba\u8111\u7ea7\u522b\u7684\u6548\u7387\u3002 \u8fd9\u9879\u5de5\u4f5c\u4e0d\u4ec5\u5c55\u793a\u4e86LLMs\u5728\u51cf\u5c0f\u590d\u6742\u6027\u540e\u4ecd\u80fd\u4fdd\u6301\u9ad8\u6548\uff0c\u8fd8\u6307\u51fa\u4e86\u672a\u6765\u52a0\u901f\u5668\u5e94\u4f18\u5316\u7684\u8fd0\u7b97\u7c7b\u578b\uff0c\u4ee5\u9002\u5e94\u4e0b\u4e00\u4ee3\u8f7b\u91cf\u7ea7LLMs\u7684\u9700\u6c42\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5b9e\u73b0\u5df2\u5f00\u6e90\u81f3\uff1ahttps://github.com/ridgerchu/matmulfreellm\u3002|\n", "2406.02524": "|**2024-06-04**|**CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks**|Maciej Besta 
et.al.|[2406.02524](http://arxiv.org/abs/2406.02524)|**[link](https://github.com/spcl/checkembed)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u53d8\u9769\uff0c\u4f46\u9a8c\u8bc1\u5176\u7b54\u6848\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u5f00\u653e\u6027\u7684\u4efb\u52a1\uff0c\u5982\u77e5\u8bc6\u6574\u5408\u3001\u6458\u8981\u548c\u63d0\u53d6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCheckEmbed\u7684\u7cbe\u786e\u3001\u53ef\u6269\u5c55\u4e14\u7b80\u4fbf\u7684LLM\u9a8c\u8bc1\u65b9\u6cd5\u3002CheckEmbed\u7684\u6838\u5fc3\u7406\u5ff5\u662f\uff1a\u901a\u8fc7\u5229\u7528\u5982GPT\u6587\u672c\u5d4c\u5165\u5927\u6a21\u578b\u83b7\u53d6\u7684\u7b54\u6848\u7ea7\u5d4c\u5165\u6765\u6bd4\u8f83LLM\u7684\u56de\u7b54\u3002\u8fd9\u5c06\u590d\u6742\u7684\u6587\u672c\u7b54\u6848\u8f6c\u5316\u4e3a\u5355\u4e00\u7684\u5d4c\u5165\uff0c\u7b80\u5316\u4e86\u5bf9\u6bd4\u8fc7\u7a0b\uff0c\u5b9e\u73b0\u5feb\u901f\u800c\u6709\u610f\u4e49\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u9a8c\u8bc1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u5b9e\u73b0\u4e86CheckEmbed\u7684\u7406\u5ff5\uff0c\u5e76\u63d0\u4f9b\u4e86\u8bc4\u4f30LLM\u7b54\u6848\u771f\u5b9e\u6027\u7684\u5ea6\u91cf\uff0c\u5982\u5d4c\u5165\u70ed\u529b\u56fe\u53ca\u5176\u603b\u7ed3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6307\u6807\u8bbe\u8ba1\u5b9e\u9645\u7684\u5f15\u64ce\uff0c\u4ee5\u51b3\u5b9aLLM\u7b54\u6848\u662f\u5426\u4ee4\u4eba\u6ee1\u610f\u3002\u5728\u5b9e\u9645\u6587\u6863\u5206\u6790\u4efb\u52a1\u4e2d\uff0c\u5982\u672f\u8bed\u63d0\u53d6\u548c\u6587\u6863\u6458\u8981\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u663e\u8457\u7684\u51c6\u786e\u6027\u63d0\u5347\u3001\u6210\u672c\u6548\u76ca\u548c\u8fd0\u884c\u65f6\u95f4\u6027\u80fd\uff0c\u76f8\u8f83\u4e8eBERTScore\u6216SelfCheckGPT\u7b49\u57fa\u4e8etoken\u3001\u53e5\u5b50\u548c\u4e8b\u5b9e\u7ea7\u522b\u7684\u65b9\u6848\u3002|\n", "2406.02523": "|**2024-06-04**|**RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots**|Soroush Nasiriany et.al.|[2406.02523](http://arxiv.org/abs/2406.02523)|null|\u4eba\u5de5\u667a\u80fd\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u89c4\u6a21\u7684\u6269\u5927\u3002\u7136\u800c\uff0c\u5728\u673a\u5668\u4eba\u9886\u57df\uff0c\u5927\u89c4\u6a21\u673a\u5668\u4eba\u6570\u636e\u96c6\u7684\u83b7\u53d6\u662f\u4e00\u4e2a\u74f6\u9888\u3002\u6211\u4eec\u4e3b\u5f20\u5229\u7528\u903c\u771f\u7684\u7269\u7406\u6a21\u62df\u6765\u63d0\u5347\u73af\u5883\u3001\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u89c4\u6a21\uff0c\u4ee5\u652f\u6301\u673a\u5668\u4eba\u5b66\u4e60\u65b9\u6cd5\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecdRoboCasa\uff0c\u8fd9\u662f\u4e00\u4e2a\u5927\u578b\u7684\u4eff\u771f\u6846\u67b6\uff0c\u65e8\u5728\u8bad\u7ec3\u80fd\u591f\u5728\u65e5\u5e38\u73af\u5883\u4e2d\u901a\u7528\u7684\u673a\u5668\u4eba\u3002RoboCasa\u7684\u7279\u70b9\u662f\u62e5\u6709\u4e30\u5bcc\u4e14\u591a\u6837\u5316\u7684\u53a8\u623f\u573a\u666f\uff0c\u5305\u62ec\u8d85\u8fc7150\u4e2a\u7c7b\u522b\u7684\u4e00\u5343\u591a\u4ef63D\u6a21\u578b\u8d44\u4ea7\u548c\u6570\u5341\u79cd\u53ef\u4ea4\u4e92\u7684\u5bb6\u5177\u548c\u7535\u5668\u3002 
\u6211\u4eec\u901a\u8fc7\u751f\u6210\u5f0fAI\u5de5\u5177\u8fdb\u4e00\u6b65\u589e\u5f3a\u6a21\u62df\u7684\u771f\u5b9e\u6027\u548c\u591a\u6837\u6027\uff0c\u5982\u4f7f\u7528\u6587\u672c\u52303D\u6a21\u578b\u7684\u6280\u672f\u751f\u6210\u5bf9\u8c61\u8d44\u4ea7\uff0c\u4ee5\u53ca\u901a\u8fc7\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u73af\u5883\u7eb9\u7406\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86100\u9879\u4efb\u52a1\uff0c\u5305\u62ec\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6307\u5bfc\u7684\u590d\u5408\u4efb\u52a1\uff0c\u7528\u4e8e\u7cfb\u7edf\u6027\u8bc4\u4f30\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5b66\u4e60\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u9ad8\u8d28\u91cf\u7684\u4eba\u7c7b\u6f14\u793a\uff0c\u5e76\u7ed3\u5408\u81ea\u52a8\u8f68\u8ff9\u751f\u6210\u65b9\u6cd5\uff0c\u4ee5\u6700\u5c0f\u7684\u4eba\u529b\u6210\u672c\u5927\u5e45\u6269\u5145\u6570\u636e\u96c6\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4f7f\u7528\u5408\u6210\u751f\u6210\u7684\u673a\u5668\u4eba\u6570\u636e\u8fdb\u884c\u5927\u89c4\u6a21\u6a21\u4eff\u5b66\u4e60\u65f6\uff0c\u5b58\u5728\u660e\u663e\u7684\u89c4\u6a21\u6548\u5e94\uff0c\u5e76\u663e\u793a\u51fa\u5229\u7528\u6a21\u62df\u6570\u636e\u5728\u73b0\u5b9e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u5de8\u5927\u6f5c\u529b\u3002\u76f8\u5173\u89c6\u9891\u548c\u5f00\u6e90\u4ee3\u7801\u5df2\u5728https://robocasa.ai/\u7f51\u7ad9\u4e0a\u63d0\u4f9b\u3002|\n", "2406.03496": "|**2024-06-05**|**Wings: Learning Multimodal LLMs without Text-only Forgetting**|Yi-Kai Zhang et.al.|[2406.03496](http://arxiv.org/abs/2406.03496)|null|
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8d77\u6e90\u4e8e\u9884\u8bad\u7ec3\u7684\u901a\u7528\u8bed\u8a00\u6a21\u578b\uff0c\u9996\u5148\u5c06\u56fe\u50cf\u4e0e\u6587\u672c\u5bf9\u9f50\uff0c\u7136\u540e\u5728\u6df7\u5408\u6a21\u6001\u8f93\u5165\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0cMLLM\u5728\u5904\u7406\u4ec5\u5305\u542b\u6587\u672c\u7684\u6307\u4ee4\u65f6\u4f1a\u51fa\u73b0\u707e\u96be\u6027\u7684\u9057\u5fd8\uff0c\u8fd9\u4e9b\u6587\u672c\u6307\u4ee4\u5e76\u672a\u5305\u542b\u56fe\u50cf\uff0c\u8fd9\u4e9b\u95ee\u9898\u5728\u521d\u59cb\u7684\u8bed\u8a00\u6a21\u578b\u9636\u6bb5\u5c31\u5df2\u7ecf\u5b58\u5728\u3002\u672c\u6587\u63d0\u51faWings\uff0c\u4e00\u4e2a\u65b0\u578b\u7684MLLM\uff0c\u5b83\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u901a\u8fc7\u5206\u6790MLLM\u5728\u591a\u6a21\u6001\u6307\u4ee4\u4e2d\u7684\u6ce8\u610f\u529b\uff0c\u6211\u4eec\u53d1\u73b0\u6587\u672c\u9057\u5fd8\u4e0e\u4ece\u56fe\u50cf\u524d\u5411\u56fe\u50cf\u540e\u7684\u6ce8\u610f\u529b\u8f6c\u79fb\u6709\u5173\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u989d\u5916\u6a21\u5757\u4f5c\u4e3a\u589e\u5f3a\u5b66\u4e60\u5668\uff0c\u4ee5\u8865\u507f\u8fd9\u79cd\u6ce8\u610f\u529b\u8f6c\u79fb\u3002\u89c6\u89c9\u548c\u6587\u672c\u5b66\u4e60\u5668\u4f5c\u4e3a\u201c\u7fc5\u8180\u201d\u5f0f\u7684\u8865\u5145\uff0c\u5e73\u884c\u8fde\u63a5\u5728\u6bcf\u4e2a\u6ce8\u610f\u529b\u5757\u5185\uff0c\u8d77\u521d\u56fe\u50cf\u548c\u6587\u672c\u8f93\u5165\u7531\u89c6\u89c9\u5b66\u4e60\u5668\u4e0e\u4e3b\u6ce8\u610f\u529b\u534f\u540c\u5de5\u4f5c\uff0c\u5e73\u8861\u5bf9\u89c6\u89c9\u5143\u7d20\u7684\u5173\u6ce8\u3002\u968f\u540e\uff0c\u6587\u672c\u5b66\u4e60\u5668\u901a\u8fc7\u6ce8\u610f\u529b\u8def\u7531\u7684\u65b9\u5f0f\u4e0e\u89c6\u89c9\u5b66\u4e60\u5668\u7684\u8f93\u51fa\u534f\u4f5c\u6574\u5408\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4f4e\u79e9\u6b8b\u5dee\u6ce8\u610f\u529b\uff08LoRRA\uff09\u673a\u5236\u4ee5\u4fdd\u8bc1\u5b66\u4e60\u5668\u7684\u9ad8\u6548\u8fd0\u884c\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cWings\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u540c\u7b49\u89c4\u6a21\u7684MLLM\u3002\u5728\u6211\u4eec\u65b0\u6784\u5efa\u7684\u4ea4\u9519\u56fe\u50cf-\u6587\u672c\uff08IIT\uff09\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cWings\u5728\u4ece\u6587\u672c\u4e3a\u4e3b\u5230\u591a\u6a21\u6001\u4e3a\u4e3b\u7684\u95ee\u7b54\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002|\n", "2406.03488": "|**2024-06-06**|**Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training**|Ao Sun et.al.|[2406.03488](http://arxiv.org/abs/2406.03488)|**[link](https://github.com/maydomine/seq1f1b)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5176\u4e2d\u7ba1\u9053\u5e76\u884c\u6027\u8d77\u7740\u5173\u952e\u4f5c\u7528\u3002\u968f\u7740LLMs\u7684\u8bad\u7ec3\u5e8f\u5217\u957f\u5ea6\u6269\u5c55\u523032k\u751a\u81f3128k\uff0c\u5f53\u524d\u7684\u7ba1\u9053\u5e76\u884c\u65b9\u6cd5\u9762\u4e34\u4e25\u91cd\u74f6\u9888\uff0c\u5982\u9ad8\u5185\u5b58\u5360\u7528\u548c\u663e\u8457\u7684\u7ba1\u9053\u5ef6\u8fdf\uff0c\u8fd9\u6781\u5927\u5730\u9650\u5236\u4e86\u6a21\u578b\u7684\u53ef\u6269\u5c55\u6027\u548c\u8bad\u7ec3\u541e\u5410\u91cf\u3002\u4e3a\u4e86\u63d0\u9ad8\u5185\u5b58\u6548\u7387\u548c\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u957f\u5e8f\u5217\u8bad\u7ec3LLMs\u7684\u9ad8\u6548\u5e8f\u5217\u7ea7\u4e00\u6b21\u524d\u5411\u4e00\u6b21\u540e\u5411\uff081F1B\uff09\u7ba1\u9053\u8c03\u5ea6\u65b9\u6cd5\uff0c\u79f0\u4e3aSeq1F1B\u3002Seq1F1B\u5c06\u6279\u7ea7\u522b\u53ef\u8c03\u5ea6\u5355\u5143\u5206\u89e3\u4e3a\u66f4\u7ec6\u7684\u5e8f\u5217\u7ea7\u5355\u5143\uff0c\u4ece\u800c\u51cf\u5c0f\u5ef6\u8fdf\u5e76\u964d\u4f4e\u5185\u5b58\u9700\u6c42\u3002
 \u8003\u8651\u5230\u5982\u679c\u5747\u5300\u5206\u5272\u5e8f\u5217\uff0cSeq1F1B\u53ef\u80fd\u4f1a\u4ea7\u751f\u8f7b\u5fae\u7684\u989d\u5916\u5ef6\u8fdf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8e\u8ba1\u7b97\u7684\u7b56\u7565\u6765\u5212\u5206\u8f93\u5165\u5e8f\u5217\uff0c\u4ee5\u7f13\u89e3\u8fd9\u4e2a\u526f\u4f5c\u7528\u3002\u4e0e\u7ade\u4e89\u6027\u7684\u7ba1\u9053\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5982Megatron\u76841F1B\u7ba1\u9053\u5e76\u884c\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4fdd\u6301\u66f4\u9ad8\u8bad\u7ec3\u541e\u5410\u91cf\u7684\u540c\u65f6\uff0c\u5185\u5b58\u5360\u7528\u66f4\u4f4e\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cSeq1F1B\u80fd\u591f\u5728\u4e0d\u4f7f\u7528\u91cd\u65b0\u8ba1\u7b97\u7b56\u7565\u7684\u60c5\u51b5\u4e0b\uff0c\u6709\u6548\u5730\u572864\u4e2aNVIDIA A100 GPU\u4e0a\u8bad\u7ec3\u4e00\u4e2a\u5177\u6709300\u4ebf\u53c2\u6570\u7684LLM\uff0c\u5904\u7406\u957f\u8fbe64k\u7684\u5e8f\u5217\uff0c\u8fd9\u662f\u73b0\u6709\u65b9\u6cd5\u65e0\u6cd5\u5b9e\u73b0\u7684\u3002\u6211\u4eec\u7684\u4ee3\u7801\u57fa\u4e8eMegatron-LM\uff0c\u5e76\u5df2\u5f00\u6e90\uff1ahttps://github.com/MayDomine/Seq1F1B.git\u3002|\n", "2406.03487": "|**2024-06-05**|**Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends**|Sanjana Ramprasad et.al.|[2406.03487](http://arxiv.org/abs/2406.03487)|null|
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u6458\u8981\u751f\u6210\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5728\u771f\u5b9e\u6027\u65b9\u9762\u7684\u95ee\u9898\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u5c3d\u7ba1\u4e4b\u524d\u7684\u7814\u7a76\u5e7f\u6cdb\u8bc4\u4f30\u4e86\u65b0\u95fb\u9886\u57df\u7684LLMs\uff0c\u5bf9\u8bdd\u6458\u8981\u7684\u8bc4\u4ef7\u4e3b\u8981\u96c6\u4e2d\u5728\u57fa\u4e8eBART\u7684\u6a21\u578b\u4e0a\uff0c\u8fd9\u5728\u6211\u4eec\u7406\u89e3\u5b83\u4eec\u7684\u53ef\u4fe1\u5ea6\u65b9\u9762\u7559\u4e0b\u4e86\u7a7a\u767d\u3002\u672c\u7814\u7a76\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u5bf9\u8bdd\u6458\u8981\u4e2d\u7684\u771f\u5b9e\u6027\uff0c\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\uff0c\u5e76\u7740\u91cd\u4e8e\u8bc6\u522b\u548c\u5206\u7c7b\u53e5\u7ea7\u4e0d\u4e00\u81f4\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8GPT-4\u548cAlpaca-13B\u8fd9\u4e24\u6b3e\u4e3b\u6d41\u6a21\u578b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u63ed\u793a\u4e86\u9519\u8bef\u5b9a\u4e49\u7684\u5fae\u5999\u4e4b\u5904\uff1aLLMs\u5e38\u5e38\u751f\u6210\u770b\u4f3c\u5408\u7406\u7684\u63a8\u65ad\uff0c\u8fd9\u4e9b\u63a8\u65ad\u4f9d\u8d56\u4e8e\u5bf9\u8bdd\u4e2d\u7684\u95f4\u63a5\u8bc1\u636e\uff0c\u800c\u7f3a\u4e4f\u76f4\u63a5\u8bc1\u636e\uff0c\u8fd9\u5728\u65e7\u6a21\u578b\u4e2d\u8f83\u5c11\u89c1\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6539\u8fdb\u7684\u9519\u8bef\u5206\u7c7b\u4f53\u7cfb\uff0c\u5f15\u5165\u4e86\u201c\u60c5\u5883\u63a8\u7406\u201d\u7c7b\u522b\u6765\u5f52\u7c7b\u8fd9\u4e9bLLM\u884c\u4e3a\uff0c\u5e76\u516c\u5f00\u4e86\u76f8\u5173\u6570\u636e\u96c6\u3002\u5229\u7528\u6211\u4eec\u7684\u5206\u7c7b\u4f53\u7cfb\uff0c\u6211\u4eec\u6bd4\u8f83\u4e86LLMs\u4e0e\u8001\u5f0f\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u884c\u4e3a\u5dee\u5f02\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86\u81ea\u52a8\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\u5728LLM\u6458\u8981\u4e0a\u7684\u6548\u679c\uff0c\u53d1\u73b0\u5b83\u4eec\u5728\u8bc6\u522b\u8fd9\u7c7b\u7ec6\u5fae\u9519\u8bef\u65f6\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u7cbe\u7ec6\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\uff0c\u8fd9\u4e24\u79cd\u65b9\u6cd5\u4f18\u4e8e\u73b0\u6709\u6307\u6807\uff0c\u7279\u522b\u662f\u5728\u8bc6\u522b\u201c\u60c5\u5883\u63a8\u7406\u201d\u9519\u8bef\u65f6\u3002|\n", "2406.03486": "|**2024-06-05**|**BIPED: Pedagogically Informed Tutoring System for ESL Education**|Soonwoo Kwon et.al.|[2406.03486](http://arxiv.org/abs/2406.03486)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u4f5c\u4e3a\u7ecf\u6d4e\u4e14\u6613\u4e8e\u83b7\u53d6\u7684\u82f1\u8bed\u7b2c\u4e8c\u8bed\u8a00\uff08L2\uff09\u5b66\u4e60\u8005\u5bf9\u8bdd\u5f0f\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\uff08CITS\uff09\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684CITS\u5f80\u5f80\u53ea\u80fd\u6559\u6388\u7b80\u5355\u6982\u5ff5\uff0c\u6216\u8005\u5728\u6559\u5b66\u6df1\u5ea6\u4e0a\u65e0\u6cd5\u6ee1\u8db3\u4e0d\u540c\u5b66\u4e60\u7b56\u7565\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u5f00\u53d1\u4e00\u4e2a\u66f4\u5177\u6559\u80b2\u5b66\u5bfc\u5411\u3001\u80fd\u6559\u6388\u590d\u6742\u6982\u5ff5\u7684CITS\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u53cc\u8bed\u6559\u80b2\u6307\u5bfc\u5bf9\u8bdd\u6570\u636e\u96c6\uff08BIPED\uff09\uff0c\u5305\u542b\u4e00\u5bf9\u4e00\u7684\u4eba\u7c7b\u82f1\u8bed\u8f85\u5bfc\u4e92\u52a8\u3002\u901a\u8fc7\u5bf9\u8f85\u5bfc\u5bf9\u8bdd\u7684\u540e\u5904\u7406\u5206\u6790\uff0c\u6211\u4eec\u63d0\u70bc\u51fa\u4e00\u5957\u5305\u542b34\u79cd\u6559\u5e08\u884c\u4e3a\u548c9\u79cd\u5b66\u751f\u884c\u4e3a\u7684\u5bf9\u8bdd\u52a8\u4f5c\u8bcd\u5178\uff0c\u5e76\u5c06\u5176\u7528\u4e8e\u8fdb\u4e00\u6b65\u6807\u6ce8\u6536\u96c6\u7684\u6570\u636e\u3002\u6839\u636e\u5148\u9884\u6d4b\u5408\u9002\u7684\u6559\u5e08\u884c\u4e3a\u518d\u751f\u6210\u76f8\u5e94\u56de\u590d\u7684\u4e24\u6b65\u6846\u67b6\uff0c\u6211\u4eec\u5229\u7528GPT-4\u548cSOLAR-KO\u5206\u522b\u5b9e\u73b0\u4e86\u4e24\u4e2aCITS\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u5b9e\u65bd\u7684\u6a21\u578b\u4e0d\u4ec5\u6a21\u4eff\u4e86\u4eba\u7c7b\u6559\u5e08\u7684\u98ce\u683c\uff0c\u8fd8\u8fd0\u7528\u4e86\u4e30\u5bcc\u4e14\u4e0e\u4e0a\u4e0b\u6587\u76f8\u9002\u5e94\u7684\u6559\u5b66\u7b56\u7565\u3002|\n", "2406.03476": "|**2024-06-05**|**Does your data spark joy? Performance gains from domain upsampling at the end of training**|Cody Blakeney et.al.|[2406.03476](http://arxiv.org/abs/2406.03476)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u89c4\u6a21\u589e\u957f\u5230\u4e07\u4ebf\u7ea7\u522b\u7684tokens\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u7531\u5927\u89c4\u6a21\u7684CommonCrawl\u7f51\u7edc\u722c\u866b\u5185\u5bb9\u4ee5\u53ca\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\u7ec4\u6210\u3002\u7531\u4e8e\u5728\u5927\u8ba1\u7b97\u91cf\uff08FLOPs\uff09\u4e0b\u8bad\u7ec3\u4ee5\u63ed\u793a\u6a21\u578b\u5728\u56f0\u96be\u548c\u65b0\u5174\u57fa\u51c6\u4e0a\u7684\u663e\u8457\u53d8\u5316\u6210\u672c\u9ad8\u6602\uff0c\u5982\u4f55\u5728\u901a\u7528\u7f51\u7edc\u6293\u53d6\u7684\u591a\u6837\u6027\u548c\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u5bc6\u5ea6\u4e4b\u95f4\u627e\u5230\u6700\u4f18\u5e73\u8861\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\uff0c\u5728\u8bad\u7ec3\u540e\u671f\u5bf9\u5176\u8fdb\u884c\u4e0a\u91c7\u6837\uff0c\u4ece\u800c\u5728\u8bf8\u5982MMLU\u3001GSM8K\u548cHumanEval\u7b49\u57fa\u51c6\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u5bf9\u4e8e\u4e00\u4e2a\u8bad\u7ec3\u4e861\u4e07\u4ebf\uff08T\uff09\u4ee4\u724c\u768470\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u8fd9\u79cd\u7b80\u5355\u65b9\u6cd5\u53ef\u4f7f\u5176\u6027\u80fd\u63d0\u9ad86.90\u5206\u30018.26\u5206\u548c6.17\u5206\uff0c\u4e0e\u8bad\u7ec3\u65f6\u95f4\u4e24\u500d\u7684Llama-2\uff087B\uff09\u6a21\u578b\u76f8\u5f53\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5728\u8bad\u7ec3\u540e\u671f\u9886\u57df\u4e0a\u91c7\u6837\u7684\u6301\u7eed\u65f6\u95f4\uff0c\u4ece5%\u523030%\uff0c\u53d1\u73b010%\u523020%\u7684\u6bd4\u4f8b\u6700\u4e3a\u5408\u9002\uff0c\u4ee5\u5e73\u8861\u4e00\u822c\u8bed\u8a00\u5efa\u6a21\u80fd\u529b\u4e0e\u7279\u5b9a\u4efb\u52a1\u7684\u4f18\u5316\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5229\u7528\u9886\u57df\u4e0a\u91c7\u6837\u6765\u5927\u89c4\u6a21\u5206\u6790\u5355\u4e2a\u6570\u636e\u96c6\u5bf9\u4e0d\u540c\u57fa\u51c6\u7684\u589e\u76ca\uff0c\u901a\u8fc7\u5728\u8fd9\u4e00\u9636\u6bb5\u79fb\u9664\u5b83\u4eec\u8fdb\u884c\u5b9e\u9a8c\u3002\u8fd9\u79cd\u65b9\u6cd5\u6781\u5927\u5730\u964d\u4f4e\u4e86\u5b9e\u9a8c\u6210\u672c\uff0c\u4f7f\u5f97\u80fd\u591f\u4ee5\u9884\u8bad\u7ec3\u8fd0\u884c\u7684\u5341\u5206\u4e4b\u4e00\u5de6\u53f3\u7684\u6210\u672c\u63a2\u7d22\u4e0d\u540c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang 
et.al.|[2406.03474](http://arxiv.org/abs/2406.03474)|null|\u9274\u4e8e\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u805a\u7126\u4e8e\u4f7f\u7528MLLM\u9a71\u52a8\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u5728\u5927\u89c4\u6a21\u52a8\u6001\u73af\u5883\u4e2d\u3002\u7136\u800c\uff0c\u5e38\u89c1\u7684\u65b9\u6cd5\u76f4\u63a5\u5c06\u9ad8\u7ea7\u6307\u4ee4\u8f6c\u5316\u4e3a\u4f4e\u7ea7\u8f66\u8f86\u63a7\u5236\u4fe1\u53f7\uff0c\u8fd9\u8fdd\u80cc\u4e86MLLM\u7684\u672c\u8d28\u751f\u6210\u6a21\u5f0f\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u5176\u6f5c\u5728\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u7684\u4e00\u822c\u5316\u80fd\u529b\u53d7\u5230\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u6781\u5927\u9650\u5236\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u4e2d\u5c42\u8bed\u8a00\u9a71\u52a8\u547d\u4ee4\u6765\u8fde\u63a5\u9ad8\u7ea7\u6307\u4ee4\u548c\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\uff0c\u5b83\u4eec\u6bd4\u9ad8\u7ea7\u6307\u4ee4\u66f4\u7ec6\u81f4\uff0c\u4f46\u6bd4\u63a7\u5236\u4fe1\u53f7\u66f4\u901a\u7528\u4e14\u53ef\u89e3\u91ca\uff0c\u4ece\u800c\u6709\u6548\u5f25\u5408\u4e24\u8005\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3aAD-H\u7684\u5206\u5c42\u591a\u4ee3\u7406\u9a7e\u9a76\u7cfb\u7edf\u5b9e\u73b0\u8fd9\u4e00\u7406\u5ff5\uff0c\u5305\u62ec\u4e00\u4e2a\u7528\u4e8e\u9ad8\u5c42\u63a8\u7406\u7684MLLM\u89c4\u5212\u5668\u548c\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u63a7\u5236\u5668\u8fdb\u884c\u4f4e\u5c42\u6267\u884c\u3002\u8fd9\u79cd\u5206\u5c42\u8bbe\u8ba1\u4f7fMLLM\u6446\u8131\u4e86\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\u89e3\u7801\uff0c\u5145\u5206\u91ca\u653e\u4e86\u5176\u5728\u9ad8\u5c42\u611f\u77e5\u3001\u63a8\u7406\u548c\u89c4\u5212\u65b9\u9762\u7684\u6d8c\u73b0\u80fd\u529b\u3002 
\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5e26\u6709\u52a8\u4f5c\u5c42\u6b21\u6ce8\u91ca\u7684\u65b0\u6570\u636e\u96c6\u3002\u5168\u9762\u7684\u95ed\u73af\u8bc4\u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684AD-H\u7cfb\u7edf\u5177\u6709\u591a\u9879\u5173\u952e\u4f18\u52bf\u3002\u9996\u5148\uff0cAD-H\u5728\u9a7e\u9a76\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u751a\u81f3\u5c55\u73b0\u51fa\u5728\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u81ea\u6211\u7ea0\u6b63\u7684\u80fd\u529b\uff0c\u8fd9\u662f\u8bad\u7ec3\u6570\u636e\u672a\u6db5\u76d6\u7684\u573a\u666f\u3002\u5176\u6b21\uff0cAD-H\u5728\u957f\u7a0b\u6307\u4ee4\u548c\u65b0\u73af\u5883\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\uff0c\u660e\u663e\u8d85\u8d8a\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\uff0c\u53ef\u901a\u8fc7\u83b7\u53d6\u3002|\n", "2406.03450": "|**2024-06-05**|**What is the Best Way for ChatGPT to Translate Poetry?**|Shanshan Wang 
et.al.|[2406.03450](http://arxiv.org/abs/2406.03450)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u5728\u82f1\u8bed-\u4e2d\u6587\u8bd7\u6b4c\u7ffb\u8bd1\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u901a\u8fc7\u5b9a\u5411\u63d0\u793a\u548c\u5c0f\u6837\u672c\u573a\u666f\u5206\u6790\u4ee5\u4f18\u5316\u5176\u8868\u73b0\u3002\u5c3d\u7ba1\u521d\u671f\u7ed3\u679c\u4ee4\u4eba\u9f13\u821e\uff0c\u4f46\u7814\u7a76\u53d1\u73b0ChatGPT\u7684\u7ffb\u8bd1\u5b58\u5728\u6301\u7eed\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u89e3\u91ca\u8f85\u52a9\u8bd7\u6b4c\u673a\u5668\u7ffb\u8bd1\u201d\uff08EAPMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bd7\u6b4c\u7684\u5355\u8bed\u89e3\u91ca\u4f5c\u4e3a\u7ffb\u8bd1\u8fc7\u7a0b\u7684\u6307\u5bfc\u3002\u540c\u65f6\uff0c\u6211\u4eec\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ee5\u66f4\u597d\u5730\u9002\u5e94\u73b0\u4ee3\u8bd7\u6b4c\u7ffb\u8bd1\u7684\u5fae\u5999\u4e4b\u5904\u3002\u6211\u4eec\u9080\u8bf7\u4e13\u4e1a\u8bd7\u4eba\u8fdb\u884c\u8bc4\u4f30\uff0c\u5e76\u7ed3\u5408GPT-4\u7684\u8bc4\u4ef7\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684EAPMT\u65b9\u6cd5\u5728\u4e0e\u4f20\u7edfChatGPT\u7ffb\u8bd1\u65b9\u6cd5\u4ee5\u53ca\u73b0\u6709\u5728\u7ebf\u7cfb\u7edf\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u8bba\u6587\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e3a\u6587\u5b66\u7ffb\u8bd1\u7684\u673a\u5668\u8f85\u52a9\u63d0\u4f9b\u4e86\u65b0\u9896\u89c6\u89d2\u3002|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445](http://arxiv.org/abs/2406.03445)|null|
Pre-trained large language models (LLMs) exhibit strong mathematical reasoning, yet how they perform basic arithmetic such as addition remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features: dimensions in the hidden state that represent numbers through a set of features that are sparse in the frequency domain. Within the model, MLP and attention layers use Fourier features in complementary ways: MLP layers primarily use low-frequency features to approximate the magnitude of the answer, while attention layers primarily use high-frequency features to perform modular arithmetic (e.g., determining whether the answer is even or odd). Pre-training is crucial for this mechanism: models trained from scratch exploit only low-frequency features, leading to lower accuracy. Introducing pre-trained token embeddings into a randomly initialized model restores its performance. Overall, our analysis shows that appropriate pre-trained representations (such as Fourier features) can unlock the ability of Transformers to learn precise mechanisms for algorithmic tasks.|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker 
et.al.|[2406.03441](http://arxiv.org/abs/2406.03441)|null|In many high-risk machine learning applications, it is essential for a model to indicate when it is uncertain about a prediction. Although large language models (LLMs) can match or even exceed human-level accuracy on a variety of benchmarks, their overconfidence in incorrect responses remains a known problem. Traditional uncertainty-estimation methods can be hard to apply directly to LLMs because of their computational cost and because many models are closed-source. Several black-box methods have been proposed recently, but they often rely on heuristics such as self-verbalized confidence. We propose a framework that measures LLM uncertainty by analyzing the distribution of explanations the model generates for an answer. Although using explanations is not new in itself, we interpret them as test-time classifiers and assess uncertainty by computing a posterior answer distribution over the most likely of these classifiers. 
We show how one specific instance of this framework, which uses explanation entailment as the classifier likelihood, improves confidence-score metrics (in particular AUROC and AURC) on five different datasets. Our results suggest that the framework is both principled and an effective way to quantify LLM uncertainty.|\n", "2406.03411": "|**2024-06-05**|**Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach**|Saehyung Lee et.al.|[2406.03411](http://arxiv.org/abs/2406.03411)|**[link](https://github.com/saehyung-lee/plugir)**|**This paper addresses the problem of dialogue-form context queries in interactive text-to-image retrieval. Our method, PlugIR, leverages the general instruction-following capability of large language models (LLMs) in two ways. First, by reformulating the dialogue-form context, we eliminate the need to fine-tune a retrieval model on existing visual dialogue data, which allows any black-box model to be used. Second, we construct an LLM questioner that generates non-redundant questions about the attributes of the target image, based on information about the candidate images in the current context. This reduces noise and redundancy in the generated questions. Beyond our method, we propose a new evaluation metric, Best log Rank Integral (BRI), for a comprehensive evaluation of interactive retrieval systems. PlugIR outperforms both zero-shot and fine-tuned baselines on several benchmarks. Moreover, PlugIR's two components can be applied flexibly, together or separately, depending on the situation. Our code is available at https://github.com/Saehyung-Lee/PlugIR.**|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. Xiao et.al.|[2406.04344](http://arxiv.org/abs/2406.04344)|null|Motivated by the large progress made by large language models (LLMs), we introduce the verbalized machine learning (VML) framework. Unlike conventional machine learning models, which are typically optimized over a continuous parameter space, VML constrains the parameter space to human-interpretable natural language. This constraint leads to a new perspective on function approximation, in which an LLM with a text prompt is viewed as a function parameterized by that prompt. Guided by this perspective, we revisit classical machine learning problems such as regression and classification, and find that they can be solved by an LLM-parameterized learner and optimizer. The main advantages of VML include: (1) easy encoding of prior knowledge: prior knowledge about the problem and the hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model selection: the optimizer can automatically select a concrete model class based on the data and the verbalized prior knowledge, and can update the model class during training; (3) interpretable learner updates: the LLM-parameterized optimizer can explain the reason for each learner update. We conduct several experiments to evaluate the effectiveness of VML, and hope it can serve as a bridge toward greater interpretability and trustworthiness in machine learning.|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu 
et.al.|[2406.04339](http://arxiv.org/abs/2406.04339)|null|A fundamental objective of robot manipulation is to enable models to understand visual scenes and execute actions. Although existing robot multimodal large language models (MLLMs) can handle a range of basic tasks, they still face two challenges: 1) insufficient reasoning ability for complex tasks, and 2) high computational cost for MLLM fine-tuning and inference. The recently proposed state space model (SSM) Mamba has shown promise in non-trivial sequence modeling with linear inference complexity. Inspired by this, we introduce RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities while keeping fine-tuning and inference efficient. 
First, we integrate a vision encoder with Mamba and align visual data with language embeddings through co-training, giving the model visual common sense and robot-related reasoning. To further give RoboMamba action pose prediction abilities, we explore an efficient fine-tuning strategy that uses only a simple policy head. Experiments show that once RoboMamba has sufficient reasoning capability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning capabilities on general and robotic evaluation benchmarks. Meanwhile, our model achieves strong pose prediction results in both simulation and real-world experiments, with inference 7 times faster than existing robotic MLLMs. Project web page:|\n", "2406.04337": "|**2024-06-06**|**Coherent Zero-Shot Visual Instruction Generation**|Quynh Phung 
et.al.|[2406.04337](http://arxiv.org/abs/2406.04337)|null|Despite advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent object representations and smooth state transitions across sequential steps remains a formidable challenge. This paper presents a training-free framework that combines text comprehension with image generation to ensure that visual instructions are visually appealing while remaining coherent and accurate. We validate the approach by testing multi-step instructions and comparing against several baselines. Experiments show that our approach generates coherent and visually pleasing instructions.|\n", "2406.04334": "|**2024-06-06**|**DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs**|Lingchen Meng 
et.al.|[2406.04334](http://arxiv.org/abs/2406.04334)|null|Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). This approach is intuitive but significantly increases computation and memory costs, because the model must process many additional tokens in its input layer. This paper presents DeepStack, a new architecture for LMMs. Given N layers in the language and vision transformers of an LMM, we stack the visual tokens into N groups and feed each group to its corresponding transformer layer, from bottom to top. Surprisingly, this simple method greatly enhances the ability of LMMs to model interactions among visual tokens across layers, at almost no additional cost. We apply DeepStack to both the language and vision transformers of LMMs and validate the effectiveness of DeepStack LMMs with extensive empirical results. With the same context length, our DeepStack 7B and 13B parameter models surpass their counterparts by 2.7 and 2.9 points on average, respectively, across 9 benchmarks. Using only one-fifth of the context length, DeepStack performs close to counterparts that use the full context length. The gains are especially pronounced on high-resolution tasks, e.g., improvements of 4.2, 11.0, and 4.0 points on TextVQA, DocVQA, and InfoVQA, respectively, compared with LLaVA-1.5-7B. We further apply DeepStack to the vision transformer layers, which brings a similar average improvement of 3.8 points over LLaVA-1.5-7B.|\n", "2406.04331": "|**2024-06-06**|**PaCE: Parsimonious Concept Engineering for Large Language Models**|Jinqi Luo et.al.|[2406.04331](http://arxiv.org/abs/2406.04331)|**[link](https://github.com/peterljq/parsimonious-concept-engineering)**|**Large language models (LLMs) are used for a wide variety of tasks. Although they can generate human-like responses, they can also produce undesirable output, including potentially harmful information, racist or sexist language, and hallucinations. To mitigate these problems, researchers have developed alignment methods such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face challenges: some require costly fine-tuning for every alignment task; some do not adequately remove undesirable concepts, so alignment falls short; and some remove benign concepts, lowering the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation-engineering framework. First, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Then, for any given alignment task, a concept partitioner efficiently annotates the concepts as benign or undesirable. At inference time, we use sparse coding to decompose LLM activations along the concept dictionary, accurately representing each activation as a linear combination of benign and undesirable components. By removing the undesirable components, we reorient the behavior of the LLM toward the alignment goal. We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and show that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.**|\n", "2406.04314": "|**2024-06-06**|**Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step**|Zhanhao Liang et.al.|[2406.04314](http://arxiv.org/abs/2406.04314)|null|Recently, Direct Preference Optimization (DPO) has been successfully extended to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods, which assume that all diffusion steps share a preference order consistent with the final generated images, we argue that this assumption neglects step-specific denoising performance, and that preference labels should therefore be tailored to each step. To this end, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. 
In SPO, at each denoising step we sample a pool of images, find a suitable win-lose pair, and, crucially, randomly select a single image from the pool to initialize the next denoising step. This step-wise resampling ensures that every win-lose pair comes from the same original image, making the comparison independent of the previous step. To assess the preference at each step, we train a separate step-aware preference model that can be applied to both noisy and clean images. In experiments with Stable Diffusion v1.5 and SDXL, SPO significantly outperforms the latest Diffusion-DPO in aligning generated images with complex, detailed prompts and in enhancing aesthetics, while being more than 20 times more efficient in training. Code and models are available at [https://rockeycoss.github.io/spo.github.io/](https://rockeycoss.github.io/spo.github.io/).|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger 
et.al.|[2406.04306](http://arxiv.org/abs/2406.04306)|**[link](https://github.com/ml-jku/SDLG)**|**Large language models (LLMs) can hallucinate when generating text. Hallucinations make LLMs untrustworthy, which impedes their use in a wide range of social and industrial applications. Current LLMs generate text autoregressively, by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Hallucinations have therefore been attributed to predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives to an initially generated text. This provides a precise measure of aleatoric semantic uncertainty, which detects whether the initial text is likely to be hallucinated. Experiments on question-answering tasks show that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.**|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen 
et.al.|[2406.04300](http://arxiv.org/abs/2406.04300)|null|Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems such as autonomous vehicles. However, modeling the trajectories of other vehicles to simulate complex and meaningful close-range interactions remains costly. Using language descriptions to generate driving behaviors is a promising approach, offering human operators a scalable and intuitive way to simulate a wide range of driving interactions. The scarcity of large-scale annotated language-trajectory data, however, makes this approach challenging. To address this, we propose Text-to-Drive (T2D), which synthesizes diverse driving behaviors with large language models (LLMs). Our method follows a knowledge-driven two-stage strategy: first, it uses the embedded knowledge of LLMs to generate diverse language descriptions of driving behaviors; then, it uses their reasoning capabilities to realize these behaviors in a simulator. At the core of T2D is an LLM-constructed state chart that maps low-level states to high-level abstractions, which simplifies downstream tasks such as summarizing low-level observations, assessing whether a policy is consistent with a behavior description, and shaping auxiliary rewards, all without human supervision. With this knowledge-driven approach, we show that T2D generates more diverse trajectories than other baselines and provides a natural language interface that lets users interactively incorporate human preferences. See our website for more examples:|\n", "2406.04289": "|**2024-06-07**|**What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages**|Nadav Borenstein et.al.|[2406.04289](http://arxiv.org/abs/2406.04289)|null|What can large language models learn? By definition, language models (LMs) are distributions over strings. An intuitive way to address this question is therefore to frame it as the learnability of classes of distributions over strings. While prior work has focused mainly on theoretical limits, we focus on practical learnability. Unlike earlier empirical work, we evaluate neural LMs on their home turf, learning probabilistic languages, rather than as classifiers of formal languages. Specifically, we investigate the learnability of regular LMs (RLMs) by recurrent neural network (RNN) and Transformer LMs. We empirically test RLM learnability as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the log-probabilities of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNN and Transformer LMs. Several other predictors also reach significance, but with different patterns for RNNs and Transformers.|\n", "2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278](http://arxiv.org/abs/2406.04278)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|**
Conversational tone is essential to interpersonal communication. As large language models (LLMs) become increasingly popular, it is especially important to study how their conversational tones differ from those of humans. However, current research on conversational styles often relies on pre-existing taxonomies or text corpora, which may suffer from experimenter bias and may not fully reflect the real-world distribution within the domain of study. Inspired by methods from cognitive science, we propose an iterative method that simultaneously elicits conversational tones and sentences by alternating between two tasks: (1) one participant identifies the tone of a given sentence, and (2) a different participant generates a sentence based on that tone. We ran 100 iterations of this process with human participants and with GPT-4, obtaining a dataset of sentences and frequent conversational tones. In an additional experiment, humans and GPT-4 annotated all sentences with all tones. With data from 1,339 human participants, 33,370 human judgments, and 29,900 GPT-4 queries, we show how this approach can create an interpretable geometric representation of the relations between the conversational tones of humans and GPT-4. This work demonstrates how ideas from machine learning and cognitive science can be combined to address challenges in human-computer interaction.**|\n", "2406.05132": "|**2024-06-07**|**3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs**|Jianing Yang et.al.|[2406.05132](http://arxiv.org/abs/2406.05132)|**[link](https://github.com/sled-group/3D-GRAND)**|The integration of language and 3D perception is crucial for building embodied agents and robots that understand and interact with the physical world. Although large language models (LLMs) excel at language understanding and generation, their adaptation to 3D environments (3D-LLMs) is still at an early stage, chiefly because large-scale datasets that densely ground language in 3D scenes are lacking. To address this, we introduce 3D-GRAND, a pioneering large-scale dataset of 40,087 household scenes paired with 6.2 million detailed scene-language instructions. Experiments show that instruction tuning with 3D-GRAND significantly improves the grounding capabilities of 3D-LLMs and reduces hallucinations. We also introduce the 3D-POPE benchmark to systematically evaluate hallucination in 3D-LLMs and enable fair comparisons among future models. 
Our experiments reveal a relationship between dataset size and 3D-LLM performance, underscoring the critical role of large-scale 3D-text datasets in advancing embodied AI research. Notably, early signals suggest that models trained on large synthetic data may perform well on real-world 3D scans, indicating potential for sim-to-real transfer. With 3D-GRAND and 3D-POPE, we aim to provide the embodied AI community with essential resources and insights, laying the groundwork for more reliable and better-grounded 3D-LLMs. Project website: https://3d-grand.github.io|\n", "2406.05130": "|**2024-06-07**|**An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models**|Xiongtao Zhou 
et.al.|[2406.05130](http://arxiv.org/abs/2406.05130)|null|This paper studies parameter-efficient fine-tuning (PEFT) of multimodal large language models (MLLMs). Because these models typically have billions of parameters, full fine-tuning is difficult. The goal is to find effective ways to improve MLLM performance when the number of trainable parameters is limited. The authors fine-tune the LLM component of open-source MLLMs with four popular PEFT methods. The resulting analysis covers the effect of each method across different models, the placement of the tuned parameters, the size of the fine-tuning data, model stability, generalization, and hallucination. The study covers seven datasets of two kinds, unseen and seen. The results show that the adapter is the most effective PEFT method and that fine-tuning the connector layer improves performance in most cases. The code and data are available.|\n", 
"2406.05127": "|**2024-06-07**|**Towards Semantic Equivalence of Tokenization in Multimodal LLM**|Shengqiong Wu et.al.|[2406.05127](http://arxiv.org/abs/2406.05127)|null|Multimodal large language models (MLLMs) have shown strong performance on vision-language tasks. At the core of an MLLM is vision tokenization: how to efficiently convert input visual signals into feature representations that benefit the language model. However, existing vision tokenizers struggle to keep vision and language semantically aligned, because they fragment the visual input too aggressively and break the semantic integrity of the visual content. To address this, the paper proposes SeTok, a novel dynamic semantic-equivalent vision tokenizer. SeTok groups visual features into semantic units with a dynamic clustering algorithm and decides the number of tokens flexibly according to image complexity. The resulting vision tokens preserve semantic integrity while capturing both low-frequency and high-frequency visual features. The authors also present Setokim, a new MLLM built on SeTok, and experiments show that it achieves clear gains across a variety of tasks. More details are available on the project page: https://chocowu.github.io/SeTok-web/|\n", 
"2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107](http://arxiv.org/abs/2406.05107)|null|
Data exploration is a complex process in which users examine a dataset by iteratively executing a series of queries. Sometimes users explore new data to become familiar with it, but more often the exploration is driven by a specific analytical goal or question. Automated Data Exploration (ADE) systems have been proposed to help users explore effectively. They aim to generate complete exploration sessions automatically, surfacing interesting aspects of the data. However, existing ADE systems are often constrained by a predefined optimization function. As a result, they always produce the same exploration sequence for a given dataset, which falls short for goal-oriented exploration. To address this, the paper presents LINX, a generative system with a natural-language interface for goal-oriented data exploration. Given an input dataset and an analytical goal described in natural language, LINX generates a personalized exploration session tailored to the user's needs. The system uses a large language model to interpret the input goal and derive specifications for the desired output exploration session. These specifications are then passed to a novel modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which adapts its output to the specified instructions. To validate LINX, we created a new benchmark dataset for goal-oriented exploration and conducted an extensive user study. The results show that the exploration notebooks generated by LINX are significantly more relevant and useful than those of existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.|\n", 
"2406.05085": "|**2024-06-07**|**Multi-Head RAG: Solving Multi-Aspect Problems with LLMs**|Maciej Besta et.al.|[2406.05085](http://arxiv.org/abs/2406.05085)|**[link](https://github.com/spcl/mrag)**|Retrieval Augmented Generation (RAG) improves the accuracy and relevance of large language model (LLM) responses by incorporating documents into the LLM context. However, existing RAG methods do not adequately handle queries that may require retrieving multiple documents with substantially different contents. Such queries are common in practice but hard to serve. The embeddings of these documents may lie far apart in the embedding space, which makes it difficult to retrieve them all at once. This paper introduces Multi-Head RAG (MRAG), a simple yet powerful scheme that addresses this gap. MRAG uses the activations of the Transformer's multi-head attention layer, rather than the decoder layer, as retrieval keys. The motivation is that different attention heads can learn to capture different aspects of the data. Harnessing these activations yields embeddings that represent multiple facets of data items and queries, improving retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, synthetic datasets, and real-world use cases to demonstrate MRAG's effectiveness, with relevance improvements of up to 20% over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks such as RAGAS, as well as with different classes of data stores.|\n", 
"2406.05063": "|**2024-06-07**|**Are Large Language Models More Empathetic than Humans?**|Anuradha Welivita 
et.al.|[2406.05063](http://arxiv.org/abs/2406.05063)|null|With the rise of large language models (LLMs), whether they can surpass humans in emotion recognition and empathetic responding has become a focus of research. This paper presents a comprehensive study comparing the empathetic responding capabilities of four state-of-the-art LLMs (GPT-4, LLaMA-2-70B-Chat, Gemini-1.0-Pro, and Mixtral-8x7B-Instruct) with those of humans. In a user study with 1,000 participants, we analyzed responses to 2,000 carefully selected emotional dialogue prompts spanning 32 distinct positive and negative emotions. The results show that the LLMs are statistically superior to humans in empathetic responding. GPT-4 was the most empathetic, with about 31% more responses rated 'Good' than the human baseline. It was followed by LLaMA-2 (about 24%), Mixtral-8x7B (about 21%), and Gemini-Pro (about 10%). A finer-grained analysis of the ratings shows that some LLMs are markedly better than others at responding to specific emotions. The proposed evaluation framework offers a scalable and adaptable way to assess the empathy of new LLMs, avoiding the need to replicate this study in future research.|\n", 
"2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055](http://arxiv.org/abs/2406.05055)|null|Large language models (LLMs) perform impressively on reasoning tasks, and few-shot prompting can improve their performance further. However, current evaluations focus mainly on carefully constructed benchmarks. They neglect real-world reasoning problems with missing or contradictory conditions, known as ill-defined problems. Our observations show that existing few-shot prompting techniques are ineffective in such situations, often producing overconfident answers or wrong inferences. To study this problem in depth, we built a benchmark called Problems with Missing and Contradictory conditions (PMC). We also introduced two new metrics for evaluating how few-shot prompting methods perform on such problems. Analysis with PMC reveals a trade-off between mathematical reasoning performance on well-defined problems and the ability to recognize ill-defined ones. To address the challenges posed by PMC, we propose a novel few-shot prompting method called SMT-LIB Prompting (SLP). SLP uses the SMT-LIB language to formulate the problem instead of solving it directly. It then applies a double-check solving strategy to verify the satisfiability and uniqueness of the solution and provide final feedback. Extensive experiments show that SLP has a clear advantage over existing methods on problems with missing and contradictory conditions. We will open-source the benchmark and code to facilitate future research.|\n", 
"2406.05053": "|**2024-06-07**|**Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation**|Nachiket Kotalwar et.al.|[2406.05053](http://arxiv.org/abs/2406.05053)|null|Generative AI and large language models hold great promise for programming education, where they can provide learners with personalized feedback and hints. Existing research has mainly focused on raising the quality of generated feedback to the level of human tutors. In real-world educational deployments, however, cost, time, and data privacy are also key considerations alongside quality. This paper benchmarks language models for programming feedback generation across several dimensions: quality, cost, speed, and data privacy. We pay particular attention to recent in-browser inference, which directly reduces cost and helps preserve data privacy. To improve the feedback quality of small models that can run in the browser, we develop a fine-tuning pipeline based on synthetic data generated by GPT-4. We show the effectiveness of optimizing 4-bit quantized Llama3-8B and Phi3-3.8B models, run with WebLLM's in-browser inference engine, on three different Python programming datasets. We will release the full implementation, the web app, and the datasets to facilitate further research on in-browser language models.|\n", 
"2406.05039": "|**2024-06-07**|**Bootstrapping Referring Multi-Object Tracking**|Yani Zhang et.al.|[2406.05039](http://arxiv.org/abs/2406.05039)|**[link](https://github.com/zyn213/temprmot)**|
Referring multi-object tracking (RMOT) currently relies on manually annotated datasets and static rules, which limits diversity and the scope of application. To address this, our work focuses on advancing RMOT by introducing more discriminative language words. To this end, we first extend the Refer-KITTI dataset into Refer-KITTI-V2. Starting from 2,719 manual annotations, it addresses class imbalance and adds more keywords, bringing it closer to real-world scenarios than Refer-KITTI. We then use a large language model to expand these annotations to 9,758 in total, yielding 617 distinct words and surpassing previous RMOT benchmarks. In addition, we improve the end-to-end RMOT framework with a simple yet elegant temporal advancement strategy that outperforms previous approaches. The source code and dataset are available.|\n", 
"2406.05035": "|**2024-06-07**|**Scenarios and Approaches for Situated Natural Language Explanations**|Pengshuo Qiu 
et.al.|[2406.05035](http://arxiv.org/abs/2406.05035)|null|Large language models (LLMs) can generate natural language explanations (NLEs) adapted to different user situations, but there has been no quantitative evaluation of this adaptation. To fill this gap, we created a benchmark, the Situation-Based Explanation (SBE) dataset, containing 100 explananda (things that need explaining). Each explanandum is paired with explanations targeted at different audiences, such as educators, students, and professionals. This lets us assess how well the explanations meet the information needs and contexts of these diverse groups, e.g., students, teachers, and parents. Every explanandum-audience pair comes with a human-written reference explanation. These references are used to compute scores that quantify how well models adapt explanations to the situation. We tested three prompting methods on pretrained language models of varying sizes: rule-based prompting, meta-prompting, and in-context learning prompting. We find that (1) language models can produce prompts that yield explanations more precisely aligned with the target situation; (2) explicitly telling the model 'You are a helpful assistant' is not a necessary technique for situated NLE tasks; and (3) in-context learning prompts only help models learn the demonstration template and do not improve their inference performance. The SBE dataset and our analysis lay the groundwork for future research on generating situated natural language explanations.|\n", 
"2406.06525": "|**2024-06-10**|**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**|Peize Sun et.al.|[2406.06525](http://arxiv.org/abs/2406.06525)|**[link](https://github.com/foundationvision/llamagen)**|**We introduce LlamaGen, a new family of image generation models that applies the original next-token prediction paradigm of large language models to the visual generation domain. It shows that vanilla autoregressive models such as Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly. We examine the design space of image tokenizers, the scalability of image generation models, and the quality of their training data, with the following results: (1) an image tokenizer with a downsample ratio of 16, reconstruction quality of 0.94 rFID, and 97% codebook usage on the ImageNet benchmark; (2) a series of class-conditional image generation models from 111M to 3.1B parameters, achieving 2.18 FID on the ImageNet 256x256 benchmark and outperforming popular diffusion models such as LDM and DiT; (3) a 775M-parameter text-conditional image generation model, trained in two stages on LAION-COCO and high-aesthetic-quality images, with good visual quality and text alignment; and (4) verification that LLM serving frameworks are effective at optimizing the inference speed of image generation models, yielding speedups of 326% to 414%. We release all models and code to support the open-source community in visual generation and multimodal foundation models.**|\n", 
"2406.06519": "|**2024-06-10**|**UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor**|Shivani Upadhyay et.al.|[2406.06519](http://arxiv.org/abs/2406.06519)|**[link](https://github.com/castorini/umbrela)**|**
Large numbers of relevance judgments are essential for effectively training and accurately evaluating retrieval systems. Traditionally, these judgments are made by human assessors, an expensive and time-consuming process. A recent study by Thomas et al. from Microsoft Bing showed that large language models (LLMs) can make accurate relevance assessments, providing judgments comparable to those of humans. Unfortunately, that study did not release reusable software artifacts. This work introduces UMBRELA, an open-source toolkit whose name is a recursive acronym for 'UMbrela is the Bing RELevance Assessor'. It reproduces the results of Thomas et al. using OpenAI's GPT-4 model and adds more detail to the original paper. Across the TREC Deep Learning Tracks from 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with the rankings produced by effective multi-stage retrieval systems. The toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines. It offers a valuable resource for researchers studying retrieval evaluation methods. UMBRELA will be used in the TREC 2024 RAG Track to aid relevance assessment, and we hope it will serve as a foundation for further innovation in the field. The codebase is available at https://github.com/castorini/umbrela.**|\n", 
"2406.06499": "|**2024-06-10**|**NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative**|Asmar Nadeem et.al.|[2406.06499](http://arxiv.org/abs/2406.06499)|null|Existing video captioning benchmarks and models lack causal-temporal narrative: sequences of events linked by cause and effect, unfolding over time and driven by characters or agents. This lack of narrative restricts models' ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To address this gap, we propose NarrativeBridge, which consists of two components. (1) A novel Causal-Temporal Narrative (CTN) captions benchmark, generated by a large language model with few-shot prompting. It explicitly encodes cause-effect temporal relationships in video descriptions and is evaluated automatically to ensure caption quality and relevance. (2) A dedicated Cause-Effect Network (CEN) architecture with separate encoders that capture cause and effect dynamics independently, enabling effective learning and generation of captions with causal-temporal narrative. Experiments show that CEN is more accurate than the second-best model (GIT) at articulating the causal and temporal aspects of video content: 17.88 and 17.44 CIDEr on the MSVD and MSR-VTT datasets, respectively. The proposed framework understands and generates nuanced text descriptions with intricate causal-temporal narrative structures, addressing a critical limitation in video captioning. Project details are available on the project website.|\n", 
"2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474](http://arxiv.org/abs/2406.06474)|null|In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices provide rich, longitudinal personal health monitoring data, and this data is often overlooked. This paper introduces the Personal Health Large Language Model (PH-LLM), a customized version of Gemini designed to understand and reason over numerical time-series personal health data. We created and curated three test sets that evaluate PH-LLM's ability to (1) generate personalized insights and recommendations from sleep patterns, physical activity, and physiological responses; (2) demonstrate expert domain knowledge; and (3) predict self-reported sleep outcomes. Working with domain experts, we built 857 case studies to assess real-world sleep and fitness scenarios. Through comprehensive evaluation with domain-specific rubrics, we found that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness. For sleep, experts remain superior, but fine-tuning significantly improved PH-LLM's use of relevant domain knowledge and its personalization of sleep information. We also evaluated PH-LLM's domain knowledge with multiple-choice sleep medicine and fitness exams. It scored 79% and 88% respectively, exceeding the average scores of a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encodings of wearable data. We showed that multimodal encoding is required to match the performance of specialized discriminative models. Further development and evaluation are needed in the safety-critical personal health domain. Even so, these results demonstrate the broad knowledge and capabilities of Gemini models and the value of applying physiological data to personal health applications, as done with PH-LLM.|\n", 
"2406.06465": "|**2024-06-10**|**AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction**|Zhen Xing 
et.al.|[2406.06465](http://arxiv.org/abs/2406.06465)|null|\u6587\u672c\u5f15\u5bfc\u7684\u89c6\u9891\u9884\u6d4b\uff08TVP\uff09\u4efb\u52a1\u65e8\u5728\u6839\u636e\u521d\u59cb\u5e27\u548c\u6307\u4ee4\u9884\u6d4b\u540e\u7eed\u5e27\u7684\u8fd0\u52a8\uff0c\u8fd9\u5bf9\u4e8e\u865a\u62df\u73b0\u5b9e\u3001\u673a\u5668\u4eba\u6280\u672f\u548c\u5185\u5bb9\u521b\u4f5c\u7b49\u9886\u57df\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6539\u7f16Stable Diffusion\u5728\u8be5\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5728\u5e27\u4e00\u81f4\u6027\u4e0e\u65f6\u95f4\u7a33\u5b9a\u6027\u65b9\u9762\u4ecd\u5b58\u5728\u95ee\u9898\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u89c6\u9891\u6570\u636e\u96c6\u7684\u89c4\u6a21\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u9884\u8bad\u7ec3\u7684Image2Video\u6269\u6563\u6a21\u578b\u5bf9\u89c6\u9891\u52a8\u6001\u6709\u826f\u597d\u7684\u5148\u9a8c\u77e5\u8bc6\uff0c\u4f46\u7f3a\u4e4f\u6587\u672c\u63a7\u5236\u3002\u56e0\u6b64\uff0c\u5c06Image2Video\u6a21\u578b\u8f6c\u79fb\uff0c\u540c\u65f6\u6ce8\u5165\u6307\u4ee4\u63a7\u5236\u4ee5\u751f\u6210\u53ef\u63a7\u5236\u7684\u89c6\u9891\uff0c\u65e2\u5177\u6709\u610f\u4e49\u53c8\u9887\u5177\u6311\u6218\u3002 
To this end, we introduce a Multi-Modal Large Language Model (MLLM) that predicts future video states from the initial frames and text instructions. Specifically, we design a Dual Query Transformer (DQFormer) architecture that integrates instruction and frame information into conditional embeddings for future frame prediction. We also develop Long-Short Term Temporal Adapters and Spatial Adapters that can quickly adapt general video diffusion models to specific scenarios at minimal training cost. Experimental results show that our method significantly outperforms state-of-the-art techniques on four datasets: Something Something V2, Epic Kitchen-100, Bridge Data, and UCF-101. Notably, AID improves FVD by 91.2% on Bridge and by 55.5% on SSv2, demonstrating its effectiveness across domains. More examples are available on our website.|\n", "2406.06464": "|**2024-06-10**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|Despite the growing adoption of wearable health trackers and the well-known importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge, because it requires open-ended analysis of large volumes of data. Large language models (LLMs), which can use tools to reason about and interact with the world, hold promise for personalized analysis at scale. However, the application of LLMs to personal health remains largely unexplored. This paper introduces the Personal Health Insights Agent (PHIA), a system that uses state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data. We curated two benchmark question-answering datasets with over 4,000 health insight questions. Based on 650 hours of human and expert evaluation, PHIA accurately answers more than 84% of factual numerical questions and more than 83% of crowdsourced open-ended questions. This work has important implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data and opening a new era of personalized wellness 
regimens guided by data-driven insights that make health care more accessible and personalized.|\n", "2406.06461": "|**2024-06-11**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461](http://arxiv.org/abs/2406.06461)|null|This paper points out that, although many reasoning strategies have been proposed to elicit the capabilities of large language models, traditional evaluations focus only on performance metrics and miss a key factor: the gains that come from additional compute. This can give a one-sided picture of a strategy's efficiency. To address this, the paper proposes a framework that incorporates the compute budget into the evaluation, yielding a more complete comparison that accounts for both performance metrics and computational cost. From this budget-aware perspective, the study finds that complex reasoning strategies often surpass simple baselines not through algorithmic ingenuity but because they are allocated more computational resources. For example, when given a comparable compute budget, chain-of-thought 
self-consistency often outperforms reasoning strategies proposed in the literature. However, from this scale-aware perspective, some strategies such as multi-agent debate or Reflexion can actually perform worse as the compute budget increases.|\n", "2406.06458": "|**2024-06-10**|**Evaluating the Retrieval Component in LLM-Based Question Answering Systems**|Ashkan Alinejad et.al.|[2406.06458](http://arxiv.org/abs/2406.06458)|null|## Background Question answering systems powered by large language models (LLMs) can rely on a retrieval component to access domain-specific information and reduce the risk of inaccurate responses or hallucinations. Although evaluation methods for information retrieval have long existed, evaluating the performance of retrievers in LLM-based chatbots remains a challenge. This study proposes a straightforward baseline for evaluating retrievers in chatbots based on Retrieval-Augmented Generation (RAG). ## Task 
Our study finds that this method gives a more comprehensive picture of retriever performance and aligns better with the overall performance of the question answering system. Conventional metrics such as precision, recall, and F1 score may not fully capture LLMs' capabilities, since LLMs can still produce accurate answers when the retriever is imperfect. Our evaluation method therefore accounts for LLMs' ability to ignore irrelevant context, as well as for potential errors and hallucinations in their responses.|\n", "2406.06455": "|**2024-06-10**|**A Large Language Model Pipeline for Breast Cancer Oncology**|Tristen Pool 
et.al.|[2406.06455](http://arxiv.org/abs/2406.06455)|null|Large language models have shown innovative potential in many domains, but their application to cancer treatment still needs further development. The researchers used a novel LangChain prompt-engineering pipeline to fine-tune state-of-the-art OpenAI models on clinical data and clinical guideline text, focusing on two key treatment factors for breast cancer patients: adjuvant radiation therapy and chemotherapy. The models achieved high accuracy (0.85+) in classifying these two treatments. From observational data on the quality of treatment by human oncologists, a confidence interval of 8.2% to 13.3% was inferred for the proportion of cases in which the model must outperform the original oncologist in its treatment prediction to be a better solution overall. Because cancer treatment decision outcomes are uncertain, future clinical trials may be needed to confirm this threshold. Given that 85% of cancer patients in the United States receive treatment at local community facilities, such models could substantially broaden access to quality care, with outcomes at least approaching those of human oncologists.|\n", "2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The 
Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451](http://arxiv.org/abs/2406.06451)|null|The ability of large language models (LLMs) to generate, debug, and explain code has drawn the attention of many researchers and educators in undergraduate programming, who expect these models to transform programming education. However, decisions about whether and how to use LLMs in programming education may rest on more than technical evaluation alone. Using the social shaping of technology theory as a guiding framework, this study explores how students' social perceptions of LLMs influence their usage. We analyzed an anonymous end-of-course survey (n=158), a mid-course self-efficacy survey (n=158), in-depth interviews with 10 students, self-reported LLM usage on homework, and midterm exam scores. We find that students' LLM usage is associated with their expectations for their future careers and their perceptions of peer usage. We also find that early self-reported LLM usage correlates with lower self-efficacy and lower midterm scores, while students' perceived over-reliance on LLMs, rather than their actual usage, correlates with 
decreased self-efficacy later in the course.|\n", "2406.07545": "|**2024-06-11**|**Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena**|Aidar Myrzakhan et.al.|[2406.07545](http://arxiv.org/abs/2406.07545)|**[link](https://github.com/vila-lab/open-llm-leaderboard)**|**### Background Multiple-choice questions (MCQs) are commonly used to evaluate large language models (LLMs). Typically, an LLM selects the most likely answer based on adjusted probabilities, for example with length normalization. However, LLMs may have inherent biases, such as a preference for certain option IDs like A, B, C, or D, which can affect answer prediction. Previous work tried to reduce this \"selection bias\" by permuting the options on a few test samples and applying the result to new samples. Another problem with MCQs is \"lottery-style guessing\": the LLM has not actually learned the knowledge but picks the correct answer by chance, a problem that is especially severe for small LLMs. 
To address these issues, a more thorough approach is to shift to open-style questions, which fundamentally eliminates selection bias and random guessing. This shift brings its own challenges, however: (1) identifying suitable open-style questions, and (2) validating the correctness of LLM responses to open-style questions against human-annotated ground truth. This work tackles these challenges and establishes a new LLM evaluation benchmark that uses entirely open-style questions to measure the performance of models such as GPT-4o/4/3.5, Claude 3, and Gemini. ### Task We introduce the Open-LLM-Leaderboard, a new evaluation platform that tracks the performance of various LLMs and reveals their true capabilities. Our code and dataset are open-sourced and available at https://github.com/VILA-Lab/Open-LLM-Leaderboard.**|\n", "2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li 
et.al.|[2406.07528](http://arxiv.org/abs/2406.07528)|**[link](https://github.com/dvlab-research/q-llm)**|**The ability of large language models (LLMs) to understand and process long sequences is crucial to progress in many fields. However, they still struggle to capture long-range dependencies within sequences for a deep semantic understanding. To address this, we propose Query-aware Inference for LLMs (Q-LLM), a system designed to process large-scale sequences in a way that mimics human cognition. By focusing on the memory data relevant to a given query, Q-LLM accurately captures pertinent information within a fixed window size and gives precise answers to the query. It requires no extra training and can be seamlessly integrated with any LLM. Q-LLM using LLaMA3 (QuickLLaMA) can read Harry Potter within 30 seconds and accurately answer questions about it. Compared with the current state of the art, Q-LLM improves performance on $\\infty$-bench by 7.17% on LLaMA3 and by 3.26% on Mistral. In the Needle-in-a-Haystack task on widely recognized benchmarks, Q-LLM improves on the current state of the art by 7.0% on Mistral and achieves 100% accuracy on LLaMA3. Our code is open-sourced at 
https://github.com/dvlab-research/Q-LLM.**|\n", "2406.07515": "|**2024-06-11**|**Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement**|Yunzhen Feng et.al.|[2406.07515](http://arxiv.org/abs/2406.07515)|null|As synthetic data from generative models is increasingly used to fine-tune large language models, concerns have grown about model collapse, that is, degraded performance after fine-tuning. Both humans and machines find it easier to tell good examples from bad ones than to generate high-quality samples. We therefore investigate how feedback on synthesized data can prevent model collapse. We theoretically analyze the optimal performance of a Gaussian mixture classification model trained on feedback-augmented synthesized data, and provide supporting simulations in the finite-sample regime. We illustrate these theoretical predictions on two practical problems, computing matrix eigenvalues with transformers and news summarization with large language models, both of which undergo model collapse when trained on model-generated data. We show that training on feedback-augmented synthesized data prevents model collapse, whether the feedback is used to prune incorrect predictions or to select the best of several guesses. This validates popular approaches such as RLHF (Reinforcement Learning from Human 
Feedback).|\n", "2406.07505": "|**2024-06-11**|**THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report**|KBTG Labs et.al.|[2406.07505](http://arxiv.org/abs/2406.07505)|null|## Background Recent advances in large language models (LLMs) have revealed new capabilities and opportunities across the technology sector. However, the practical application of very large LLMs is hindered by their high computational cost, which is hard to justify given their still-limited capability compared with humans. Smaller, more practical LLMs have shown potential in financial analysis but are not yet fully proficient, as their near-passing performance on mock Chartered Financial Analyst (CFA) exams shows. In this work, we present the Financial Analyst Extension (FAE) to our Text Hyperlocally Augmented Large Language Extension (THaLLE), a series of 8B-parameter LLMs that consistently achieve the highest performance on mock CFA exams among models of comparable size. We thoroughly document the fine-tuning techniques used for optimization to support future research. Additionally, we introduce Flare CFA, a publicly available dataset for evaluating LLMs as financial advisors.|\n", 
"2406.07502": "|**2024-06-11**|**Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions**|Renjie Pi et.al.|[2406.07502](http://arxiv.org/abs/2406.07502)|**[link](https://github.com/sterzhang/image-textualization)**|**## Background Image description datasets are crucial for applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, these datasets come mainly from two sources: image-text pairs scraped from the web, whose descriptions are often low-quality and noisy, and human annotation, as in COCO, whose descriptions are usually brief and lack detail. Detailed image descriptions can be obtained through human annotation, but the high annotation cost limits their feasibility. These limitations motivate more efficient and scalable methods for generating accurate and detailed image descriptions. This paper proposes an innovative framework called Image Textualization (IT), which automatically produces high-quality image descriptions by having existing multimodal large language models (MLLMs) and vision expert 
models collaborate to convert visual information into text. To address the lack of benchmarks with detailed descriptions, we also propose several benchmarks for comprehensively evaluating the quality of the image descriptions our framework generates. Furthermore, we show that LLaVA-7B trained on IT-curated descriptions gains stronger image-description ability, producing richer descriptions with substantially longer and more detailed output and fewer hallucinations.**|\n", "2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul 
et.al.|[2406.07496](http://arxiv.org/abs/2406.07496)|**[link](https://github.com/zou-group/textgrad)**|**AI is undergoing a paradigm shift, with breakthroughs achieved by systems that orchestrate large language models (LLMs) and other complex components. Developing principled, automated optimization methods for such compound AI systems has therefore become a key new challenge. Neural networks faced a similar challenge in their early days, until backpropagation and automatic differentiation transformed the field. Inspired by this, we introduce TextGrad, a powerful framework that performs automatic \"differentiation\" via text: it backpropagates rich, general natural-language suggestions provided by LLMs to improve the individual components of a compound AI system. TextGrad follows PyTorch's syntax and abstractions and is flexible and easy to use. Users only need to provide the objective function, without tuning the framework's components or prompts. 
TextGrad works out of the box for a wide range of tasks, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, it raises GPT-4o's zero-shot accuracy on Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing solutions to LeetCode-Hard problems, improves prompts for reasoning, designs new drug-like small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation for the next generation of AI systems and accelerates the development of compound AI.**|\n", "2406.07494": "|**2024-06-12**|**CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization**|Frederic Kirstein 
et.al.|[2406.07494](http://arxiv.org/abs/2406.07494)|null|This article reviews 1,262 unique research papers published between 2019 and 2024 on Transformer-based summarization of English dialogues. It examines the main challenges in dialogue summarization: language, structure, comprehension, speaker identification, salience, and factuality. It links each challenge to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies. Some challenges, such as language, have seen considerable progress. Others, such as comprehension, factuality, and salience, remain difficult and offer significant research opportunities. 
The article also analyzes how these approaches are evaluated. It covers the datasets commonly used for dialogue subdomains (such as meetings and medical conversations) and the prevalent practices for automatic metrics (such as ROUGE) and human evaluation. However, it finds that few datasets span multiple subdomains and that reported human evaluations often lack sufficient detail on inter-annotator agreement and annotation guidelines. Additionally, it discusses the possible impact of recent explorations with large language models. Although such models may change the relevance and difficulty of these challenges, the article concludes that the described challenge taxonomy remains valuable.|\n", "2406.07485": "|**2024-06-11**|**PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction**|Adnan Abbas 
et.al.|[2406.07485](http://arxiv.org/abs/2406.07485)|null|Efficient planning is crucial for productivity and mental well-being, yet people often struggle to make realistic plans and to reflect on their productivity. With advances in AI, conversational assistants have emerged as a promising tool. By externalizing plans through conversation, they can strengthen commitment and promote focused action, positively affecting productivity and mental well-being. Our goal is to design a conversational assistant that draws on the social interactivity of natural conversation, offering insightful questions and reflection prompts to increase plan adherence. Prior studies have shown the benefits of such agents, but many interventions remain static, which can cause user engagement to decline over time. To address this, we propose a novel rotation and context-aware prompting strategy that offers users varied interventions each day. Our system, PITCH, uses large language models (LLMs) to help users externalize and reflect on their daily plans. This study investigates the impact of externalizing tasks with a conversational agent on productivity and mental well-being, and the effectiveness of the rotation strategy 
in maintaining user engagement.|\n", "2406.07483": "|**2024-06-11**|**Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing**|Mao Li et.al.|[2406.07483](http://arxiv.org/abs/2406.07483)|null|In the rapidly evolving field of natural language processing, the use of large language models (LLMs) for automated text annotation of social media posts has attracted significant interest. This paper examines the performance of eight open-source and proprietary LLMs in annotating the stance expressed in social media posts, benchmarking them against human (crowdsourced) judgments. We also investigate when LLMs are likely to disagree with human judgment. We find that the explicitness with which stance is expressed in the text is critical to whether LLM judgments agree with human ones. LLMs perform well when human annotators do, and LLM failures often correspond to cases where human annotators struggle to reach agreement. We therefore recommend a comprehensive approach that combines the precision of human expertise with the scalability of LLM predictions. This study highlights the need to improve the accuracy and comprehensiveness of automated stance detection, with the aim of advancing these technologies toward more efficient and unbiased social media 
analysis.|\n", "2406.07476": "|**2024-06-11**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476](http://arxiv.org/abs/2406.07476)|**[link](https://github.com/damo-nlp-sg/videollama2)**|**This paper presents VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks. Building on its predecessor, VideoLLaMA 2 adds a tailor-made Spatial-Temporal Convolution (STC) connector that effectively captures the intricate spatial and temporal dynamics of video data. In addition, we integrate an audio branch into the model through joint training, enriching its multimodal understanding by seamlessly incorporating audio cues. Comprehensive evaluations covered multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC). On these tasks, VideoLLaMA 2 performs competitively with open-source models and approaches some proprietary models on several benchmarks. On audio-only question answering (AQA) and audio-video question answering (OE-AVQA), VideoLLaMA 2 also shows reasonable improvements over existing models. These advances highlight VideoLLaMA 
2's strong multimodal comprehension and set a new standard for intelligent video analysis systems. All models are publicly available to facilitate further research.**|\n", "2406.08477": "|**2024-06-12**|**Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens**|Ting-Ji Huang et.al.|[2406.08477](http://arxiv.org/abs/2406.08477)|null|In recommender systems, representing users and items as vectors is crucial for a variety of tasks. Recent work applies large language models (LLMs) to recommendation in a question-answering format, using in-vocabulary tokens (e.g., \"item\", \"20\", \"24\") to represent real users and items. However, LLMs are typically pretrained on natural language tasks. As a result, these in-vocabulary tokens have limited ability to express distinct users and items, which weakens recommendation performance even after fine-tuning on recommendation tasks. This paper explores how to effectively tokenize users and items in LLM-based recommender systems. 
\u6211\u4eec\u5f3a\u8c03\u4e86\u51fa\u8bcd\u6c47\u8868\uff08OOV\uff09\u6807\u8bb0\u7684\u4f5c\u7528\uff0c\u5b83\u4eec\u9664\u4e86\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\u5916\uff0c\u8fd8\u80fd\u6355\u6349\u7528\u6237/\u9879\u76ee\u4e4b\u95f4\u7684\u5173\u8054\u6027\u548c\u591a\u6837\u6027\u3002\u901a\u8fc7\u5206\u6790\u5386\u53f2\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u8868\u793a\u5b66\u4e60\uff0c\u6211\u4eec\u4f7f\u5177\u6709\u76f8\u4f3c\u7279\u6027\u7684\u7528\u6237/\u9879\u76ee\u7ec4\u5408\u5171\u4eab\u76f8\u540c\u7684OOV\u6807\u8bb0\u3002\u6b64\u5916\uff0c\u5c06\u8fd9\u4e9bOOV\u6807\u8bb0\u6574\u5408\u5230LLM\u7684\u8bcd\u6c47\u8868\u4e2d\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u533a\u5206\u7528\u6237\u548c\u9879\u76ee\uff0c\u589e\u5f3a\u5728\u4e0b\u6e38\u4efb\u52a1\u5fae\u8c03\u65f6\u5bf9\u7528\u6237-\u9879\u76ee\u5173\u7cfb\u7684\u6355\u6349\u3002 \u6211\u4eec\u7684\u63d0\u51fa\u7684\u6846\u67b6\u5728\u5404\u79cd\u4e0b\u6e38\u63a8\u8350\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2406.08474": "|**2024-06-12**|**Real2Code: Reconstruct Articulated Objects via Code Generation**|Zhao Mandi 
et.al.|[2406.08474](http://arxiv.org/abs/2406.08474)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014Real2Code\uff0c\u65e8\u5728\u901a\u8fc7\u4ee3\u7801\u751f\u6210\u6765\u91cd\u5efa\u53ef\u52a8\u7269\u4f53\u3002\u7ed9\u5b9a\u7269\u4f53\u7684\u89c6\u89c9\u89c2\u6d4b\uff0c\u6211\u4eec\u9996\u5148\u5229\u7528\u56fe\u50cf\u5206\u5272\u6a21\u578b\u548c\u5f62\u72b6\u8865\u5168\u6a21\u578b\u91cd\u6784\u5176\u90e8\u4ef6\u51e0\u4f55\u7ed3\u6784\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u7269\u4f53\u90e8\u4ef6\u8868\u793a\u4e3a\u5e26\u6709\u65b9\u5411\u7684\u8fb9\u754c\u6846\uff0c\u7136\u540e\u8f93\u5165\u5230\u4e00\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\uff0c\u9884\u6d4b\u5173\u8282\u6d3b\u52a8\u7684\u4ee3\u7801\u8868\u793a\u3002\u901a\u8fc7\u5229\u7528\u9884\u8bad\u7ec3\u7684\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u4f18\u96c5\u5730\u6269\u5c55\u5230\u5177\u6709\u66f4\u591a\u53ef\u52a8\u90e8\u4ef6\u7684\u5bf9\u8c61\uff0c\u5e76\u80fd\u4ece\u5408\u6210\u8bad\u7ec3\u6570\u636e\u4e2d\u6cdb\u5316\u5230\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u4e0d\u89c4\u5219\u73af\u5883\u7269\u4f53\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cReal2Code\u5728\u91cd\u5efa\u7cbe\u5ea6\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u4e14\u662f\u9996\u4e2a\u80fd\u591f\u8d85\u8d8a\u8bad\u7ec3\u96c6\u4e2d\u5bf9\u8c61\u7ed3\u6784\u590d\u6742\u6027\u7684\u65b9\u6cd5\uff0c\u80fd\u591f\u91cd\u5efa\u591a\u8fbe10\u4e2a\u53ef\u52a8\u90e8\u4ef6\u7684\u7269\u4f53\u3002\u5f53\u4e0e\u7acb\u4f53\u91cd\u5efa\u6a21\u578b\u7ed3\u5408\u65f6\uff0cReal2Code\u8fd8\u80fd\u4ece\u5c11\u91cf\u591a\u89c6\u56feRGB\u56fe\u50cf\u4e2d\u6cdb\u5316\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u7269\u4f53\uff0c\u65e0\u9700\u6df1\u5ea6\u6216\u76f8\u673a\u4fe1\u606f\u3002|\n", "2406.08464": "|**2024-06-12**|**Magpie: Alignment Data Synthesis from Scratch by 
Prompting Aligned LLMs with Nothing**|Zhangchen Xu et.al.|[2406.08464](http://arxiv.org/abs/2406.08464)|null|\u9ad8\u8d28\u91cf\u7684\u6307\u4ee4\u6570\u636e\u5bf9\u4e8e\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u50cfLlama-3-Instruct\u8fd9\u6837\u7684\u6a21\u578b\u516c\u5f00\u4e86\u6743\u91cd\uff0c\u4f46\u5b83\u4eec\u7684\u5bf9\u9f50\u6570\u636e\u4ecd\u7136\u4fdd\u5bc6\uff0c\u8fd9\u9650\u5236\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u666e\u53ca\u3002\u73b0\u6709\u7684\u5f00\u6e90\u6570\u636e\u751f\u6210\u65b9\u6cd5\u53d7\u9650\u4e8e\u9ad8\u6602\u7684\u4eba\u529b\u6210\u672c\u548c\u6709\u9650\u7684\u63d0\u793a\u8303\u56f4\uff0c\u96be\u4ee5\u6709\u6548\u6269\u5c55\uff0c\u53ef\u80fd\u5f71\u54cd\u516c\u5171\u5bf9\u9f50\u6570\u636e\u96c6\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\u3002\u80fd\u5426\u901a\u8fc7\u76f4\u63a5\u4ece\u5df2\u5bf9\u9f50\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\uff0c\u5927\u89c4\u6a21\u5408\u6210\u9ad8\u8d28\u6307\u4ee4\u6570\u636e\u5462\uff1f\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u5408\u6210\u65b9\u6cd5\uff0c\u79f0\u4e3aMagpie\u3002\u6211\u4eec\u7684\u5173\u952e\u89c2\u5bdf\u662f\uff0c\u7531\u4e8eLlama-3-Instruct\u7b49\u5df2\u5bf9\u9f50\u7684\u6a21\u578b\u5177\u6709\u81ea\u56de\u5f52\u7279\u6027\uff0c\u5f53\u6211\u4eec\u4ec5\u8f93\u5165\u5de6\u4fa7\u6a21\u677f\u5230\u7528\u6237\u6d88\u606f\u9884\u7559\u4f4d\u7f6e\u65f6\uff0c\u5b83\u4eec\u53ef\u4ee5\u751f\u6210\u7528\u6237\u67e5\u8be2\u3002\u6211\u4eec\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\u63d0\u793aLlama-3-Instruct\uff0c\u751f\u6210\u4e86400\u4e07\u4e2a\u6307\u4ee4\u53ca\u5176\u5bf9\u5e94\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5bf9\u63d0\u53d6\u7684\u6570\u636e\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u5e76\u9009\u62e9\u4e8630\u4e07\u4e2a\u9ad8\u8d28\u91cf\u5b9e\u4f8b\u3002\u4e3a\u4e86\u6bd4\u8f83Magpie\u6570\u636e\u4e0e\u5176\u4ed6\u516c\u5171\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5206\u522b\u4f7f\u7528\u
6bcf\u4e2a\u6570\u636e\u96c6\u5bf9Llama-3-8B-Base\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8bc4\u4f30\u5fae\u8c03\u540e\u6a21\u578b\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u4ec5\u4f7f\u7528Magpie\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4e0e\u5b98\u65b9\u7ecf\u8fc71000\u4e07\u4e2a\u6570\u636e\u70b9\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u548c\u540e\u7eed\u53cd\u9988\u5b66\u4e60\u589e\u5f3a\u7684Llama-3-8B-Instruct\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u4ec5\u4f7f\u7528Magpie\u8fdb\u884cSFT\u53ef\u4ee5\u8d85\u8d8a\u5148\u524d\u7528\u4e8eSFT\u548c\u504f\u597d\u4f18\u5316\uff08\u5982UltraFeedback\u7684\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff09\u7684\u516c\u5171\u6570\u636e\u96c6\u3002\u8fd9\u79cd\u4f18\u52bf\u5728AlpacaEval\u3001ArenaHard\u548cWildBench\u7b49\u5bf9\u9f50\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u660e\u663e\u3002|\n", "2406.08434": "|**2024-06-12**|**TasTe: Teaching Large Language Models to Translate through Self-Reflection**|Yutong Wang et.al.|[2406.08434](http://arxiv.org/abs/2406.08434)|**[link](https://github.com/yutongwang1216/reflectionllmmt)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\uff0c\u7279\u522b\u662f\u901a\u8fc7\u6307\u4ee4\u8c03\u4f18\u540e\uff0c\u5728\u673a\u5668\u7ffb\u8bd1\uff08Machine Translation, MT\uff09\u7b49\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u6709\u6240\u63d0\u5347\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u672a\u80fd\u8fbe\u5230\u4e0e\u76d1\u7763\u795e\u7ecf\u673a\u5668\u7ffb\u8bd1\uff08Supervised Neural Machine Translation, 
NMT\uff09\u7cfb\u7edf\u76f8\u5f53\u7684\u7ffb\u8bd1\u8d28\u91cf\u3002\u539f\u56e0\u53ef\u80fd\u662f\u5f53\u524d\u4f7f\u7528\u7684\u7b80\u5355\u63d0\u793a\u65e0\u6cd5\u5145\u5206\u5229\u7528\u6a21\u578b\u7684\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TasTe\u6846\u67b6\uff0c\u5373\u201c\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u8fdb\u884c\u7ffb\u8bd1\u201d\u3002\u8be5\u6846\u67b6\u5305\u62ec\u4e24\u4e2a\u63a8\u7406\u9636\u6bb5\uff1a\u7b2c\u4e00\u9636\u6bb5\uff0c\u6a21\u578b\u88ab\u5f15\u5bfc\u751f\u6210\u521d\u6b65\u7ffb\u8bd1\u5e76\u540c\u65f6\u5bf9\u5176\u81ea\u8eab\u8fdb\u884c\u8bc4\u4f30\uff1b\u7b2c\u4e8c\u9636\u6bb5\uff0c\u6a21\u578b\u6839\u636e\u8bc4\u4f30\u7ed3\u679c\u5bf9\u521d\u6b65\u7ffb\u8bd1\u8fdb\u884c\u7ec6\u5316\u3002\u5728WMT22\u57fa\u51c6\u7684\u56db\u79cd\u8bed\u8a00\u65b9\u5411\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u6bd4\u7684\u6709\u6548\u6027\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u80fd\u591f\u91ca\u653e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6f5c\u529b\uff0c\u5e76\u589e\u5f3a\u5176\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/YutongWang1216/ReflectionLLMMT\u4e0a\u5f00\u6e90\u3002**|\n", "2406.08426": "|**2024-06-12**|**Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL**|Zijin Hong 
et.al.|[2406.08426](http://arxiv.org/abs/2406.08426)|null|\u6587\u672c\u8f6cSQL\u751f\u6210\u51c6\u786e\u7684SQL\u67e5\u8be2\u4ee5\u54cd\u5e94\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u662f\u4e00\u4e2a\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5b83\u6d89\u53ca\u7528\u6237\u95ee\u9898\u7406\u89e3\u3001\u6570\u636e\u5e93\u6a21\u5f0f\u7406\u89e3\u4ee5\u53caSQL\u751f\u6210\u7b49\u591a\u4e2a\u590d\u6742\u73af\u8282\u3002\u4f20\u7edf\u7684\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u4f9d\u8d56\u4e8e\u4eba\u5de5\u5de5\u7a0b\u548c\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\u3002\u968f\u7740\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u53d1\u5c55\u548c\u5728\u8be5\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u6027\u80fd\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\u3002\u7136\u800c\uff0c\u968f\u7740\u6570\u636e\u5e93\u590d\u6742\u5ea6\u589e\u52a0\u548c\u7528\u6237\u95ee\u9898\u96be\u5ea6\u589e\u5927\uff0cPLMs\u6709\u9650\u7684\u7406\u89e3\u80fd\u529b\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u7684SQL\u751f\u6210\uff0c\u8fd9\u4fc3\u4f7f\u7814\u7a76\u4eba\u5458\u5bfb\u6c42\u66f4\u9ad8\u7ea7\u548c\u5b9a\u5236\u5316\u7684\u4f18\u5316\u65b9\u6cd5\uff0c\u9650\u5236\u4e86PLM\u57fa\u7840\u7cfb\u7edf\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4e0a\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u56e0\u6b64\uff0c\u6574\u5408LLM\u7684\u5b9e\u73b0\u4e3a\u6587\u672c\u8f6cSQL\u7814\u7a76\u5e26\u6765\u4e86\u72ec\u7279\u7684\u673a\u9047\u3001\u6311\u6218\u548c\u89e3\u51b3\u65b9\u6848\u3002\u672c\u7efc\u8ff0\u5168\u9762\u6982\u8ff0\u4e86\u57fa\u4e8eLLM\u7684\u6587\u672c\u8f6cSQL\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u5f53\u524d\u9762\u4e34\u7684\u6311\u6218\u548c\u6587\u672c\u8f6cSQL\u7684\u53d1\u5c55\u5386\u7a0b\u3002\u63a5\u7740\uff0c\u8be6\u7ec6\u4ecb\u7ecd\u7528\u4e8e\u8bc4\u4f30\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u7684\u6570\u636e\u96c6\u548c\u8bc4\u4e
f7\u6307\u6807\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u5206\u6790\u4e86\u8fd1\u671f\u5728LLM\u652f\u6301\u4e0b\u7684\u6587\u672c\u8f6cSQL\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u8be5\u9886\u57df\u5c1a\u5b58\u7684\u6311\u6218\uff0c\u5e76\u5bf9\u672a\u6765\u7814\u7a76\u65b9\u5411\u63d0\u51fa\u671f\u5f85\u3002|\n", "2406.08418": "|**2024-06-12**|**OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text**|Qingyun Li et.al.|[2406.08418](http://arxiv.org/abs/2406.08418)|**[link](https://github.com/opengvlab/omnicorpus)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aOmniCorpus\u7684\u5927\u578b\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u6570\u636e\u96c6\uff0c\u89c4\u6a21\u8fbe\u5230100\u4ebf\u7ea7\u522b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u901a\u8fc7\u9ad8\u6548\u7684\u5f15\u64ce\u7b5b\u9009\u548c\u63d0\u53d6\u4e86\u5927\u91cf\u9ad8\u8d28\u91cf\u6587\u6863\uff0c\u5305\u542b86\u4ebf\u5f20\u56fe\u7247\u548c1,696\u4e07\u4ebf\u4e2a\u6587\u672c\u4ee4\u724c\uff0c\u76f8\u8f83\u4e8e\u540c\u7c7b\u6570\u636e\uff08\u5982MMC4\u3001OBELICS\uff09\uff0cOmniCorpus\u5177\u6709\u4ee5\u4e0b\u4f18\u52bf\uff1a1\uff09\u89c4\u6a21\u6269\u592715\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u597d\u7684\u6570\u636e\u8d28\u91cf\uff1b2\uff09\u6765\u6e90\u66f4\u4e3a\u591a\u6837\uff0c\u5305\u62ec\u82f1\u6587\u548c\u975e\u82f1\u6587\u7f51\u7ad9\uff0c\u4ee5\u53ca\u89c6\u9891\u4e3a\u4e3b\u7684\u7f51\u7ad9\uff1b3\uff09\u7075\u6d3b\u6027\u66f4\u5f3a\uff0c\u53ef\u4ee5\u4ece\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u683c\u5f0f\u8f7b\u677e\u8f6c\u6362\u4e3a\u7eaf\u6587\u672c\u8bed\u6599\u5e93\u6216\u56fe\u50cf-\u6587\u672c\u5bf9\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\u548c\u5b9e\u9a8c\uff0c\u8bba\u6587\u9a8c\u8bc1\u4e86OmniCorpus\u7684\u6570\u636e\u8d28\u91cf\u3001\u53ef\u7528\u6027\u548c\u6709\u6548\u6027\uff0c\u65e8\u5728\u4e3a\u672a\u6765\u7684\u591a\u6a21\u6001\u6a21\u578b\u7814\u7a76\u63d0\u4f9b\u575a\u5b9e\u7684\u6570\u63
6e\u57fa\u7840\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/OpenGVLab/OmniCorpus\u4e0a\u516c\u5f00\u3002**|\n", "2406.08414": "|**2024-06-12**|**Discovering Preference Optimization Algorithms with and for Large Language Models**|Chris Lu et.al.|[2406.08414](http://arxiv.org/abs/2406.08414)|**[link](https://github.com/luchris429/DiscoPOP)**|****\u4e2d\u6587\u7ffb\u8bd1\uff1a** \u79bb\u7ebf\u504f\u597d\u4f18\u5316\u662f\u63d0\u5347\u548c\u63a7\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u8d28\u91cf\u7684\u91cd\u8981\u65b9\u6cd5\u3002\u4f20\u7edf\u4e0a\uff0c\u504f\u597d\u4f18\u5316\u88ab\u89c6\u4e3a\u57fa\u4e8e\u4eba\u5de5\u8bbe\u8ba1\u7684\u51f8\u635f\u5931\u51fd\u6570\u7684\u79bb\u7ebf\u76d1\u7763\u5b66\u4e60\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u53d7\u9650\u4e8e\u4eba\u7c7b\u521b\u9020\u529b\uff0c\u672a\u80fd\u5145\u5206\u63a2\u7d22\u53ef\u80fd\u7684\u635f\u5931\u51fd\u6570\u7684\u5de8\u5927\u641c\u7d22\u7a7a\u95f4\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528LLM\u8fdb\u884c\u76ee\u6807\u53d1\u73b0\u7684\u65b9\u6cd5\uff0c\u4ee5\u81ea\u52a8\u53d1\u73b0\u65b0\u7684\u6700\u5148\u8fdb\u7684\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u65e0\u9700\uff08\u4e13\u5bb6\uff09\u4eba\u5de5\u5e72\u9884\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u8fed\u4ee3\u5730\u63d0\u793aLLM\uff0c\u6839\u636e\u5148\u524d\u7684\u6027\u80fd\u8bc4\u4f30\u63d0\u51fa\u5e76\u5b9e\u73b0\u65b0\u7684\u504f\u597d\u4f18\u5316\u635f\u5931\u51fd\u6570\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u5bfc\u81f4\u4e86\u672a\u77e5\u4e14\u9ad8\u6548\u7684\u4f18\u5316\u7b97\u6cd5\u7684\u53d1\u73b0\u3002\u5176\u4e2d\u6700\u597d\u7684\u4e00\u4e2a\u88ab\u547d\u540d\u4e3a\u201c\u53d1\u73b0\u504f\u597d\u4f18\u5316\u201d\uff08DiscoPOP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u7b97\u6cd5\uff0c\u5b83\u5de7\u5999\u5730\u878d\u5408\u4e86\u903b\u8f91\u548c\u6307\u6570\u635f\u5931\u3002\u5b9e\u9a8
c\u7ed3\u679c\u8868\u660e\uff0cDiscoPOP\u5728\u6027\u80fd\u4e0a\u8fbe\u5230\u4e86\u6700\u65b0\u6c34\u5e73\uff0c\u5e76\u6210\u529f\u5730\u5e94\u7528\u4e8e\u672a\u89c1\u8fc7\u7684\u4efb\u52a1\u4e0a\u3002**|\n", "2406.08413": "|**2024-06-12**|**Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference**|Christopher Wolters et.al.|[2406.08413](http://arxiv.org/abs/2406.08413)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u671f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f7f\u5f97\u673a\u5668\u80fd\u591f\u751f\u6210\u903c\u771f\u7684\u6587\u672c\u5e76\u8fdb\u884c\u6709\u610f\u4e49\u7684\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u968f\u7740\u8ba1\u7b97\u548c\u5185\u5b58\u9700\u6c42\u7684\u6025\u5267\u589e\u957f\uff0c\u5c24\u5176\u662f\u5f53LLMs\u8d85\u8d8a\u5355\u4e2aGPU\u7684\u5904\u7406\u80fd\u529b\u65f6\uff0c\u5bf9\u901f\u5ea6\u3001\u6548\u7387\u548c\u53ef\u8bbf\u95ee\u6027\u7684\u9700\u6c42\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u540c\u65f6\uff0c\u8ba1\u7b97\u673a\u6027\u80fd\u548c\u5185\u5b58\u80fd\u529b\u7684\u53d1\u5c55\u5e76\u672a\u8ddf\u4e0a\u6b65\u4f10\uff0c\u5c24\u5176\u662f\u5728\u6469\u5c14\u5b9a\u5f8b\u653e\u7f13\u7684\u80cc\u666f\u4e0b\u3002\u5185\u5b58\u8bbf\u95ee\u6210\u672c\u8fdc\u9ad8\u4e8e\u8ba1\u7b97\uff0c\u8fd9\u7ed9\u5927\u89c4\u6a21\u6269\u5c55\u5e26\u6765\u4e86\u6311\u6218\uff0c\u5373\u6240\u8c13\u7684\u201c\u5185\u5b58\u5899\u201d\u3002\u5728\u8fd9\u4e2a\u65f6\u5019\uff0c\u8ba1\u7b97\u5728\u5185\u5b58\uff08Compute-in-Memory, 
CIM\uff09\u6280\u672f\u4e3aAI\u63a8\u7406\u63d0\u4f9b\u4e86\u52a0\u901f\u53ef\u80fd\uff0c\u901a\u8fc7\u5728\u5185\u5b58\u4e2d\u76f4\u63a5\u6267\u884c\u6a21\u62df\u8ba1\u7b97\uff0c\u6709\u671b\u964d\u4f4e\u5ef6\u8fdf\u548c\u529f\u8017\u3002\u901a\u8fc7\u7d27\u5bc6\u96c6\u6210\u5185\u5b58\u548c\u8ba1\u7b97\u5143\u4ef6\uff0cCIM\u6d88\u9664\u4e86\u51af\u8bfa\u4f9d\u66fc\u74f6\u9888\uff0c\u51cf\u5c11\u4e86\u6570\u636e\u4f20\u8f93\uff0c\u63d0\u9ad8\u4e86\u80fd\u6e90\u6548\u7387\u3002 \u672c\u7efc\u8ff0\u8bba\u6587\u6982\u8ff0\u4e86\u57fa\u4e8e\u53d8\u538b\u5668\u7684\u6a21\u578b\uff0c\u63a2\u8ba8\u4e86\u5404\u79cdCIM\u67b6\u6784\uff0c\u5e76\u7814\u7a76\u4e86\u5b83\u4eec\u5982\u4f55\u5e94\u5bf9\u73b0\u4ee3\u4eba\u5de5\u667a\u80fd\u8ba1\u7b97\u7cfb\u7edf\u9762\u4e34\u7684\u7d27\u8feb\u6311\u6218\u3002\u6211\u4eec\u8be6\u7ec6\u8ba8\u8bba\u4e86\u4e0e\u53d8\u538b\u5668\u76f8\u5173\u7684\u8fd0\u7b97\u53ca\u5176\u786c\u4ef6\u52a0\u901f\u7b56\u7565\uff0c\u540c\u65f6\u6307\u51fa\u76f8\u5173CIM\u8bbe\u8ba1\u4e2d\u7684\u6311\u6218\u3001\u8d8b\u52bf\u548c\u6d1e\u5bdf\u3002|\n", "2406.08402": "|**2024-06-12**|**Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models**|Chun-Yi Kuan et.al.|[2406.08402](http://arxiv.org/abs/2406.08402)|**[link](https://github.com/kuan2jiu99/audio-hallucination)**|**## \u80cc\u666f 
\u5927\u578b\u97f3\u9891\u8bed\u8a00\u6a21\u578b\uff08LALMs\uff09\u901a\u8fc7\u6574\u5408\u97f3\u9891\u611f\u77e5\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u4f20\u7edf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u97f3\u9891\u76f8\u5173\u4efb\u52a1\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8bc4\u4f30LALMs\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u4f46\u5bf9\u5b83\u4eec\u7684\u53ef\u9760\u6027\uff0c\u7279\u522b\u662f\u5173\u4e8e\u5bf9\u8c61\u5e7b\u89c9\u7b49\u95ee\u9898\u7684\u5173\u6ce8\u4e0d\u8db3\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u6765\u8bc4\u4f30\u516c\u5f00\u53ef\u7528\u7684LALMs\u5728\u5bf9\u8c61\u5e7b\u89c9\u65b9\u9762\u7684\u7a0b\u5ea6\u3002\u7ed3\u679c\u8868\u660e\uff0cLALMs\u5728\u7406\u89e3\u97f3\u9891\u5185\u5bb9\u65b9\u9762\u4e0e\u4e13\u95e8\u7684\u97f3\u9891captioning\u6a21\u578b\u76f8\u5f53\uff0c\u4f46\u5728\u56de\u7b54\u533a\u5206\u6027\u95ee\u9898\u65f6\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u90a3\u4e9b\u9700\u8981\u8bc6\u522b\u97f3\u9891\u7247\u6bb5\u4e2d\u7279\u5b9a\u7269\u4f53\u58f0\u97f3\u7684\u95ee\u9898\u3002\u8fd9\u63ed\u793a\u4e86\u5f53\u524dLALMs\u7684\u4e00\u4e2a\u5173\u952e\u5f31\u70b9\uff1a\u5b83\u4eec\u5bf9\u533a\u5206\u6027\u67e5\u8be2\u7684\u7406\u89e3\u4e0d\u8db3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u63d0\u793a\u5de5\u7a0b\u5982\u4f55\u63d0\u5347LALMs\u5728\u533a\u5206\u6027\u95ee\u9898\u4e0a\u7684\u6027\u80fd\u3002**|\n", "2406.08398": "|**2024-06-12**|**cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers**|Anirudh Sundar et.al.|[2406.08398](http://arxiv.org/abs/2406.08398)|null|## \u80cc\u666f 
\u5728\u60c5\u5883\u5316\u548c\u591a\u6a21\u6001\u4ea4\u4e92\u5bf9\u8bdd\uff08SIMMC\uff09\u7684\u65b0\u5174\u7814\u7a76\u9886\u57df\u4e2d\uff0c\u79d1\u5b66\u8bba\u6587\u7684\u4e92\u52a8\u662f\u4e00\u4e2a\u91cd\u8981\u65b9\u5411\u3002\u7531\u4e8e\u79d1\u5b66\u8bba\u6587\u4e3b\u8981\u7531\u6587\u672c\u3001\u516c\u5f0f\u3001\u56fe\u8868\u548c\u8868\u683c\u6784\u6210\uff0cSIMMC\u65b9\u6cd5\u9700\u8981\u9488\u5bf9\u8fd9\u4e9b\u7ec4\u6210\u90e8\u5206\u8fdb\u884c\u4e13\u95e8\u8bbe\u8ba1\uff0c\u4ee5\u652f\u6301\u79d1\u7814\u4eba\u5458\u6240\u9700\u7684\u6df1\u5ea6\u63a2\u7a76\u548c\u4e92\u52a8\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5bf9\u8bdd\u5f0f\u8bba\u6587\u201d\uff08cPAPERS\uff09\u7684\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u6765\u81eaarXiv\u4e0a\u53ef\u7528\u7684\u79d1\u5b66\u6587\u6863\u7684\u5b66\u672f\u8bba\u6587\u8bc4\u8bba\u4e2d\u7684\u95ee\u7b54\u5bf9\uff0c\u8fd9\u4e9b\u95ee\u7b54\u4e0e\u8bba\u6587\u7ec4\u4ef6\u53ca\u5176\u5f15\u7528\u76f8\u5173\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u6570\u636e\u6536\u96c6\u7b56\u7565\uff0c\u901a\u8fc7OpenReview\u6536\u96c6\u8fd9\u4e9b\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u5e76\u4e0eLaTeX\u6e90\u6587\u4ef6\u4e2d\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u5173\u8054\u8d77\u6765\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u7cfb\u5217\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5305\u62ec\u96f6\u6837\u672c\u548c\u5fae\u8c03\u914d\u7f6e\uff0c\u6765\u5904\u7406cPAPERS\u6570\u636e\u96c6\u3002|\n", "2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz 
et.al.|[2406.09418](http://arxiv.org/abs/2406.09418)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|**\u5728\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u5c55\u57fa\u7840\u4e0a\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u89c6\u9891\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u89c6\u9891LMMs\u4f9d\u8d56\u4e8e\u56fe\u50cf\u6216\u89c6\u9891\u7f16\u7801\u5668\u5904\u7406\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u4e9b\u7f16\u7801\u5668\u5404\u81ea\u5b58\u5728\u5c40\u9650\u6027\u3002\u56fe\u50cf\u7f16\u7801\u5668\u64c5\u957f\u6355\u6349\u5e27\u5e8f\u5217\u4e2d\u7684\u4e30\u5bcc\u7a7a\u95f4\u7ec6\u8282\uff0c\u4f46\u7f3a\u4e4f\u660e\u786e\u7684\u65f6\u95f4\u4e0a\u4e0b\u6587\uff1b\u800c\u89c6\u9891\u7f16\u7801\u5668\u63d0\u4f9b\u65f6\u95f4\u4e0a\u4e0b\u6587\uff0c\u4f46\u5e38\u5e38\u53d7\u9650\u4e8e\u8ba1\u7b97\u8d44\u6e90\uff0c\u5bfc\u81f4\u53ea\u80fd\u5904\u7406\u4f4e\u5206\u8fa8\u7387\u7684\u7a00\u758f\u5e27\uff0c\u4ece\u800c\u5f71\u54cd\u4e86\u5bf9\u7a7a\u95f4\u548c\u4e0a\u4e0b\u6587\u7684\u7406\u89e3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faVideoGPT+\uff0c\u5b83\u7ed3\u5408\u4e86\u56fe\u50cf\u7f16\u7801\u5668\uff08\u7528\u4e8e\u8be6\u7ec6\u7684\u7a7a\u95f4\u7406\u89e3\uff09\u548c\u89c6\u9891\u7f16\u7801\u5668\uff08\u7528\u4e8e\u5168\u5c40\u65f6\u5e8f\u4e0a\u4e0b\u6587\u5efa\u6a21\uff09\u7684\u4f18\u52bf\u3002\u8be5\u6a21\u578b\u901a\u8fc7\u5c06\u89c6\u9891\u5212\u5206\u4e3a\u5c0f\u6bb5\uff0c\u5e76\u5bf9\u6765\u81ea\u4e24\u8005\u7279\u5f81\u7684\u63d0\u53d6\u5e94\u7528\u81ea\u9002\u5e94\u6c60\u5316\u7b56\u7565\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\u3002\u6211\u4eec\u7684\u67b6\u6784\u5728\u591a\u4e2a\u89c6\u9891\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5305\u62ecVCGBench\u3001MVBench\u548c\u96f6\u6837\u672c\u95ee\u7b54\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a112K\u7684\u89c6\u9891\u6307\u4ee4\u96c6\uff0c\u901a\u8fc7\u65b0\u98
96\u7684\u534a\u81ea\u52a8\u6807\u6ce8\u7ba1\u9053\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u6027\u80fd\u3002\u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u89c6\u9891LMMs\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86VCGBench-Diverse\uff0c\u5b83\u6db5\u76d6\u4e8618\u4e2a\u5e7f\u6cdb\u89c6\u9891\u7c7b\u522b\uff0c\u5982\u751f\u6d3b\u65b9\u5f0f\u3001\u4f53\u80b2\u3001\u79d1\u5b66\u3001\u6e38\u620f\u548c\u76d1\u63a7\u89c6\u9891\uff0c\u51714,354\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u8fd9\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30\u73b0\u6709LMMs\u5728\u5bc6\u96c6\u89c6\u9891\u63cf\u8ff0\u3001\u7a7a\u95f4\u548c\u65f6\u95f4\u7406\u89e3\u4ee5\u53ca\u590d\u6742\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u786e\u4fdd\u5728\u5404\u79cd\u89c6\u9891\u7c7b\u578b\u548c\u52a8\u6001\u4e0b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mbzuai-oryx/VideoGPT-plus\u627e\u5230\u3002**|\n", "2406.09412": "|**2024-06-13**|**Explore the Limits of Omni-modal Pretraining at Scale**|Yiyuan Zhang 
et.al.|[2406.09412](http://arxiv.org/abs/2406.09412)|**[link](https://github.com/invictus717/MiCo)**|**\u6211\u4eec\u63d0\u8bae\u6784\u5efa\u5168\u6a21\u6001\u667a\u80fd\uff0c\u65e8\u5728\u7406\u89e3\u5404\u79cd\u6a21\u6001\u5e76\u5b66\u4e60\u901a\u7528\u8868\u793a\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u9884\u8bad\u7ec3\u8303\u5f0f\uff0c\u79f0\u4e3a\u591a\u6a21\u6001\u4e0a\u4e0b\u6587\uff08MiCo\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u540c\u65f6\u589e\u52a0\u6a21\u6001\u6570\u91cf\u3001\u6570\u636e\u91cf\u4ee5\u53ca\u6a21\u578b\u53c2\u6570\u7684\u6570\u91cf\u3002\u901a\u8fc7MiCo\uff0c\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u591a\u9879\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u6a21\u6001\u5b66\u4e60\u80fd\u529b\uff1a\u4e00\u662f\u9488\u5bf910\u79cd\u4e0d\u540c\u6a21\u6001\u7684\u5355\u6a21\u6001\u611f\u77e5\u57fa\u51c6\uff0c\u4e8c\u662f\u5305\u62ec\u68c0\u7d22\u3001\u95ee\u7b54\u548ccaptioning\u5728\u5185\u768425\u9879\u8de8\u6a21\u6001\u7406\u89e3\u4efb\u52a1\uff0c\u4e09\u662f18\u4e2a\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u57fa\u51c6\u3002\u6211\u4eec\u7684\u6a21\u578b\u521b\u9020\u4e8637\u9879\u6700\u65b0\u7684\u6700\u9ad8\u6027\u80fd\u8bb0\u5f55\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u63a8\u52a8\u5168\u6a21\u6001\u667a\u80fd\u7684\u53d1\u5c55\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728\u5f00\u6e90\u3002**|\n", "2406.09397": "|**2024-06-13**|**Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang 
et.al.|[2406.09397](http://arxiv.org/abs/2406.09397)|null|\u73b0\u4ee3\u89c6\u89c9\u6a21\u578b\u5728\u5927\u89c4\u6a21\u5608\u6742\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u867d\u7136\u5c55\u73b0\u51fa\u5f3a\u5927\u80fd\u529b\uff0c\u4f46\u5728\u9075\u5faa\u7528\u6237\u610f\u56fe\u3001\u5982\u89c6\u89c9\u7f8e\u611f\u3001\u7279\u5b9a\u98ce\u683c\u548c\u8d23\u4efb\u8f93\u51fa\u65b9\u9762\u53ef\u80fd\u5b58\u5728\u95ee\u9898\u3002\u672c\u6587\u5173\u6ce8\u89c6\u89c9\u7f8e\u5b66\u9886\u57df\uff0c\u76ee\u6807\u662f\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u6807\u51c6\u5728\u68c0\u7d22\u7cfb\u7edf\u4e2d\u4fdd\u6301\u4e00\u81f4\u3002\u9ad8\u7ea7\u68c0\u7d22\u7cfb\u7edf\u901a\u5e38\u91c7\u7528\u57fa\u4e8e\u4f4e\u7ea7\u7279\u5f81\uff08\u5982\u9971\u548c\u5ea6\uff09\u7684\u5ba1\u7f8e\u6a21\u578b\u4f5c\u4e3a\u91cd\u6392\u5668\u6216\u8fc7\u6ee4\u5668\uff0c\u4f46\u9762\u5bf9\u98ce\u683c\u3001\u6587\u5316\u6216\u77e5\u8bc6\u80cc\u666f\u65f6\u6027\u80fd\u6709\u9650\u3002\u6211\u4eec\u53d1\u73b0\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u901a\u8fc7\u6539\u5199\u641c\u7d22\u67e5\u8be2\u5e76\u6269\u5c55\u5ba1\u7f8e\u671f\u671b\uff0c\u53ef\u4ee5\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u504f\u597d\u7684\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u9488\u5bf9\u89c6\u89c9\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u63d0\u53d6LLM\u63a8\u7406\u548c\u5ba1\u7f8e\u6a21\u578b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u66f4\u597d\u5730\u4f7f\u89c6\u89c9\u6a21\u578b\u7b26\u5408\u4eba\u7c7b\u5ba1\u7f8e\u3002\u7531\u4e8e\u7f3a\u4e4f\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u68c0\u7d22\u7cfb\u7edf\u7684\u57fa\u51c6\uff0c\u6211\u4eec\u5229\u7528\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5927\u6a21\u578b\uff08LMM\uff09\u6765\u8bc4\u4ef7\u7f8e\u611f\u8868\u73b0\u3002\u8003\u8651\u5230\u7f8e\u611f\u8bc4\u4f30\u7684\u4e3b\u89c2\u6027\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aHPIR\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8861\u91cf\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u7684\u5951\u5408\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u5347\u4e86\u89c6\u89c9\u6a21\u578b\u7684\u7f8e\u611f\u884c\u4e3a\uff0c\u4ece\u591a\u4e2a\u6307\u6807\u6765\u770b\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u63d0\u51fa\u7684\u7b97\u6cd5\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u79cd\u901a\u7528\u5b9e\u8df5\uff0c\u7528\u4e8e\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u76f8\u4e00\u81f4\u3002|\n", "2406.09396": "|**2024-06-13**|**Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA**|Jongwoo Park 
et.al.|[2406.09396](http://arxiv.org/abs/2406.09396)|**[link](https://github.com/jongwoopark7978/LVNet)**|\u957f\u671f\u89c6\u9891\u901a\u5e38\u5305\u542b\u5927\u91cf\u5197\u4f59\u4fe1\u606f\uff0c\u8de8\u8d8a\u8f83\u957f\u7684\u65f6\u95f4\u95f4\u9694\uff0c\u4e14\u5305\u542b\u591a\u4e2a\u677e\u6563\u5173\u8054\u7684\u4e8b\u4ef6\u6216\u5b9e\u4f53\u3002\u56e0\u6b64\uff0c\u5728\u8fdb\u884c\u957f\u89c6\u9891\u95ee\u7b54\uff08LVQA\uff09\u65f6\uff0c\u751f\u6210\u6b63\u786e\u7b54\u6848\u6240\u9700\u7684\u6240\u6709\u4fe1\u606f\u5f80\u5f80\u53ea\u9700\u4e00\u5c0f\u90e8\u5206\u5e27\u5c31\u8db3\u4ee5\u63d0\u4f9b\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8bd5\u56fe\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728LVQA\u57fa\u51c6\u4e0a\u53d6\u5f97\u5353\u8d8a\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u4f9d\u8d56\u4e8e\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5c06\u89c6\u9891\u4e2d\u7684\u6240\u6709\u89c6\u89c9\u5185\u5bb9\u8f6c\u6362\u6210\u81ea\u7136\u8bed\u8a00\u3002\u4f20\u7edf\u505a\u6cd5\u901a\u5e38\u662f\u5747\u5300\u91c7\u6837\u5927\u91cf\u5e27\u5e76\u72ec\u7acb\u4e3a\u5176\u751f\u6210\u63cf\u8ff0\uff0c\u8fd9\u65e2\u4e0d\u9ad8\u6548\u4e5f\u4e0d\u514d\u6709\u5197\u4f59\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5173\u952e\u5e27\u9009\u62e9\u548c\u987a\u5e8f\u611f\u77e5\u7684\u63cf\u8ff0\u65b9\u6cd5\uff0c\u4ee5\u663e\u8457\u51cf\u5c11\u8fd9\u4e9b\u5197\u4f59\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u521b\u65b0\u65b9\u6cd5\uff1a\u5c42\u6b21\u5173\u952e\u5e27\u9009\u62e9\u5668\u548c\u987a\u5e8f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u7684\u6700\u7ec8\u6846\u67b6\u79f0\u4e3aLVNet\uff0c\u5728\u4e09\u4e2a\u57fa\u51c6LVQA\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u4ee3\u7801\u3002|\n", "2406.09367": "|**2024-06-13**|**Needle In A Video Haystack: A Scalable Synthetic Framework for 
Benchmarking Video MLLMs**|Zijia Zhao et.al.|[2406.09367](http://arxiv.org/abs/2406.09367)|**[link](https://github.com/joez17/videoniah)**|**Video understanding is a crucial next step for multimodal large language models (MLLMs). To probe specific aspects of video understanding, existing video benchmarks typically require careful selection of videos that match the target capability, plus laborious annotation of query-response pairs to match the video content. This process is both challenging and resource-intensive. In this paper, we propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework based on synthetic video generation. VideoNIAH decouples test video content from its query-responses by inserting unrelated image/text needles into original videos. It generates annotations solely from these needles, ensuring diversity in video sources and variety in query-responses. In addition, by inserting multiple needles, VideoNIAH rigorously evaluates the temporal understanding capabilities of models. We use VideoNIAH to build a video benchmark, VNBench, with tasks including retrieval, ordering, and counting. VNBench can efficiently evaluate the fine-grained understanding and spatio-temporal modeling abilities of video models, and it also supports evaluation of long-range dependencies. We further evaluate recent video-centric multimodal large language models, both open-source and proprietary, and provide a comprehensive analysis. Although proprietary models hold a significant advantage over open-source ones, all existing video models still perform poorly on long-range dependency tasks. VideoNIAH is a simple yet highly scalable benchmark construction framework, and we believe it will inspire future work on video benchmarks. The code and data are available at https://github.com/joez17/VideoNIAH.**|\n", "2406.09363": "|**2024-06-13**|**ElicitationGPT: Text Elicitation Mechanisms via Language Models**|Yifan Wu et.al.|[2406.09363](http://arxiv.org/abs/2406.09363)|null|This paper studies how to score elicited text predictions against the realized state by querying a large language model (such as ChatGPT) without requiring domain knowledge. Such scoring is a key component of incentivized information elicitation and of training machine learning models. Through experiments on a peer-review dataset, comparing automatically generated scores with those assigned by human instructors, the study empirically evaluates how well these mechanisms align with human preferences.|\n", "2406.09345": "|**2024-06-13**|**DiscreteSLU: A Large Language Model with 
Self-Supervised Discrete Speech Units for Spoken Language Understanding**|Suwon Shon et.al.|[2406.09345](http://arxiv.org/abs/2406.09345)|null|## Background Integrating pretrained text-based large language models (LLMs) with speech input has enabled these models to perform diverse speech tasks, including instruction following. This integration requires combining a speech encoder, a speech adapter, and an LLM, each trained on different tasks. We propose using discrete speech units (DSU) instead of continuous-valued speech encoder outputs, with the speech adapter converting the DSU into the LLM's embedding space. We obtain the DSU from a self-supervised speech encoder followed by k-means clustering. The proposed model performs robustly on samples from seen and unseen domains and on instruction-following tasks in spoken question answering. We also examine DSU extracted from different layers of the self-supervised speech encoder, as well as Mel-frequency cepstral coefficients (MFCC). The results suggest that the ASR task and datasets may be less important for instruction tuning on spoken question answering.|\n", "2406.09325": "|**2024-06-13**|**REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space**|Tomer Ashuach 
et.al.|[2406.09325](http://arxiv.org/abs/2406.09325)|null|Large language models (LLMs) can inadvertently memorize and leak sensitive or personally identifiable information (PII) from their training data, raising privacy concerns. Current remedies include costly dataset scrubbing, or filtering the model through unlearning and model editing, but these can be bypassed by extraction attacks. We propose REVS, a novel model-editing method for removing sensitive information from LLMs. REVS identifies and modifies a small subset of neurons relevant to each piece of sensitive information. By projecting these neurons onto the vocabulary space (unembedding), we pinpoint the components that drive its generation. We then compute a model edit based on the pseudo-inverse of the unembedding matrix and apply it to lower the probability of generating the targeted sensitive data. To evaluate our method on genuinely sensitive information, we curate two datasets: an email dataset inherently memorized by GPT-J, and a synthetic social security number dataset that we tune the model to memorize. Compared with state-of-the-art model-editing methods, REVS performs better both at eliminating sensitive information and at resisting extraction attacks, while preserving the integrity of the model. Code and a demo notebook are available.|\n", 
"2406.09324": "|**2024-06-13**|**Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs**|Zhao Xu et.al.|[2406.09324](http://arxiv.org/abs/2406.09324)|**[link](https://github.com/usail-hkust/bag_of_tricks_for_llm_jailbreaking)**|**Although large language models (LLMs) have shown significant capabilities in executing tasks zero-shot, they are vulnerable to jailbreak attacks that can manipulate them into generating harmful outputs. Recent studies have begun to classify jailbreak attacks as token-level or prompt-level. However, previous work has largely overlooked the diverse key factors of jailbreak attacks: most studies focus on LLM vulnerabilities, while defense-enhanced LLMs remain underexplored. To address this, we evaluate the impact of various attack settings on LLM performance and propose a benchmark framework for standardized evaluation. We examine in detail eight key factors in implementing jailbreak attacks on LLMs, from both the target level and the attack level. We run seven representative jailbreak attacks against six defense methods on two widely used datasets, about 320 experiments in total, consuming roughly 50,000 GPU hours on A800-80G. The results highlight the need for standardized benchmarking of defense-enhanced LLMs. Our code is available at https://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking.**|\n", 
"2406.09321": "|**2024-06-13**|**JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models**|Delong Ran et.al.|[2406.09321](http://arxiv.org/abs/2406.09321)|**[link](https://github.com/thuccslab/jailbreakeval)**|**This paper examines the evaluation problem in research on jailbreak attacks against large language models (LLMs). There is currently no unified standard for deciding whether an attack has succeeded: evaluation methods such as manual annotation or prompting GPT-4 in specific ways each have strengths and weaknesses, which affect both their alignment with human values and the cost of research. We analyze nearly ninety jailbreak studies published between May 2023 and April 2024, propose a detailed taxonomy of evaluation methods, and examine the strengths, weaknesses, and current adoption of various evaluators. To support follow-up research, we develop and release JailbreakEval, a user-friendly toolkit that integrates a variety of well-known evaluators so that users can obtain results with a single command. JailbreakEval also lets users build custom evaluation workflows within a unified framework, simplifying development and comparison. In summary, we hope JailbreakEval will help standardize the evaluation of jailbreak attacks and act as a catalyst for jailbreak evaluation within the research community.**|\n", 
"2406.10229": "|**2024-06-14**|**Quantifying Variance in Evaluation Benchmarks**|Lovish Madaan et.al.|[2406.10229](http://arxiv.org/abs/2406.10229)|null|Evaluation benchmarks are the cornerstone of measuring the capabilities of large language models (LLMs), as well as a driving force behind progress in those capabilities. Originally designed to assess the performance (or lack thereof) of pretrained models, they are now also widely used to decide between different training choices. Yet despite this widespread use, the variance of evaluation benchmarks, which determines whether a performance difference is meaningful, is rarely quantified. This paper defines and measures a range of metrics for benchmark variance, including seed variance across initializations and monotonicity during training. By studying a large number of models, both publicly available and trained from scratch, we provide empirical estimates of various variance metrics, together with considerations and recommendations for practitioners. We also evaluate the utility and trade-offs of continuous versus discrete performance measures, and explore options for better understanding and reducing variance. We find that for smaller models (about 7B parameters), framing MMLU (Massive Multitask Language Understanding) as a completion task can often reduce variance, whereas more involved methods inspired by the human-testing literature (such as item analysis and item response theory) struggle to reduce variance meaningfully. Overall, our work sheds light on the variance properties of evaluation benchmarks, suggests LLM-specific techniques for reducing variance, and more generally encourages practitioners to consider variance carefully when comparing models.|\n", 
"2406.10218": "|**2024-06-14**|**Semantic Membership Inference Attack against Large Language Models**|Hamid Mozaffari et.al.|[2406.10218](http://arxiv.org/abs/2406.10218)|null|## Background Membership inference attacks (MIA) aim to determine whether a specific data point was included in the training set of a target model. This paper introduces a novel approach, the Semantic Membership Inference Attack (SMIA), which improves MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, capturing differences in output probability distributions between member and non-member samples. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia dataset. The results show that SMIA significantly outperforms existing attacks; for example, it achieves an AUC-ROC of 67.39% on Pythia-12B, compared with 58.90% for the second-best attack.|\n", "2406.10216": "|**2024-06-14**|**Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs**|Rui Yang 
et.al.|[2406.10216](http://arxiv.org/abs/2406.10216)|null|In the reinforcement learning from human feedback (RLHF) framework, reward models trained on human preference data have proven effective for aligning large language models (LLMs) with human intent. However, current reward models generalize poorly to unseen prompts and responses, which can lead to reward over-optimization, where excessively optimizing the reward degrades actual performance. While previous research has favored constraining policy optimization, our study proposes a new approach that regularizes hidden states to improve the reward model's generalization under distribution shift. Specifically, we retain the base model's language model head and incorporate a suite of text-generation losses to preserve the text-generation capabilities of the hidden states, while concurrently learning a reward head on top of the same hidden states. Experiments show that the introduced regularization technique markedly improves the accuracy of learned reward models across a variety of generalization tasks and effectively alleviates over-optimization in RLHF, offering a more reliable and robust preference learning paradigm.|\n", "2406.10209": "|**2024-06-14**|**Be like a Goldfish, 
Don't Memorize! Mitigating Memorization in Generative LLMs**|Abhimanyu Hans et.al.|[2406.10209](http://arxiv.org/abs/2406.10209)|**[link](https://github.com/ahans30/goldfish-loss)**|**## Background Large language models can memorize and repeat their training data, posing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens is excluded from the loss computation. The model does not memorize these dropped tokens, which prevents verbatim reproduction of complete training sequences. We run extensive experiments on billion-scale Llama-2 models, both pretrained and trained from scratch, and show that our method significantly reduces extractable memorization with little to no impact on downstream benchmarks.**|\n", 
"2406.10196": "|**2024-06-14**|**TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners**|Tomas de la Rosa et.al.|[2406.10196](http://arxiv.org/abs/2406.10196)|null|**Abstract:** Travel planning is a complex task that involves generating a sequence of actions related to visiting places, subject to constraints, while maximizing user satisfaction. Traditional approaches formulate the problem in a given formal language, extract relevant travel information from web sources, and use an adequate solver to generate a valid solution. In contrast, recent LLM-based approaches output plans directly from user requests, drawing on extensive travel-domain knowledge to provide high-level information such as points of interest and possible routes. However, current state-of-the-art models often produce plans that lack coherence, do not fully satisfy the constraints, and offer no guarantee of high-quality solutions. We propose TRIP-PAL, a hybrid method that combines LLMs and automated planners: (1) the LLMs retrieve and translate travel information and user requirements into data structures that can be fed to a planner; (2) the automated planner generates travel plans that satisfy the constraints and optimize the user's utility. Our experiments across various travel scenarios show that TRIP-PAL outperforms an LLM alone when generating travel plans.|\n", 
"2406.10185": "|**2024-06-14**|**Detecting and Evaluating Medical Hallucinations in Large Vision Language Models**|Jiawei Chen et.al.|[2406.10185](http://arxiv.org/abs/2406.10185)|null|Large vision-language models (LVLMs) are increasingly used in healthcare, for example in medical visual question answering and report generation. They inherit the powerful capabilities of their foundation LLMs, but also a worrying hallucination problem, which is especially critical in healthcare, where the margin for error is minimal. However, there are currently no methods or benchmarks dedicated to hallucination detection and evaluation in the medical domain. To fill this gap, we introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation in the medical multimodal domain. Med-HallMark supports multi-task hallucination detection, provides multifaceted hallucination data, and adopts a hierarchical hallucination categorization. We also propose MediHall Score, a new medical evaluation metric that assesses LVLM hallucinations through a hierarchical scoring system that accounts for their severity and type, enabling a fine-grained assessment of potential clinical impact. In addition, we present MediHallDetector, a medical LVLM designed for precise hallucination detection that uses multi-task training. Through extensive experiments, we establish baselines for popular LVLMs on our benchmark. The results show that MediHall Score provides a more nuanced understanding of hallucination impact than traditional metrics, and they demonstrate the improved performance of MediHallDetector. We hope this work will substantially improve the reliability of LVLMs in medical applications. All related resources will be released soon.|\n", "2406.10181": "|**2024-06-14**|**Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors**|Siyuan Chen 
et.al.|[2406.10181](http://arxiv.org/abs/2406.10181)|null|Fine-tuning large language models (LLMs) often requires more memory than a single GPU provides, and a common way to address this is to offload computation and data from the GPU to the CPU. However, this approach is limited by the bandwidth of commodity hardware, which constrains communication between the CPU and GPU. This paper presents LSP_Offload, an offloading framework that uses learned subspace projectors to enable near-native-speed LLM fine-tuning on commodity hardware. Our data-driven approach learns an efficient sparse compressor that minimizes communication with minimal loss of precision. We also introduce a novel layer-wise communication schedule to maximize parallelism between communication and computation. As a result, our framework can fine-tune a 1.3-billion-parameter model on a 4GB laptop GPU, and a 7-billion-parameter model on an NVIDIA RTX 4090 GPU with 24GB of memory, only 31% slower than fine-tuning without memory limits. Compared with state-of-the-art offloading frameworks, our approach increases fine-tuning throughput by up to 3.33 times and reduces end-to-end fine-tuning time by 33.1% to 62.5% when converging to the same accuracy.|\n", 
"2406.10172": "|**2024-06-14**|**Datasets for Multilingual Answer Sentence Selection**|Matteo Gabburo et.al.|[2406.10172](http://arxiv.org/abs/2406.10172)|null|**Abstract:** Answer Sentence Selection (AS2) is a critical task for designing efficient retrieval-based question answering (QA) systems. However, because labeled data is scarce, most advances in AS2 have focused on English. This has created a performance gap between QA systems in non-English settings and their English counterparts. To address this, we introduce new high-quality AS2 datasets in five languages (French, German, Italian, Portuguese, and Spanish), obtained through supervised automatic machine translation (AMT) of existing English AS2 datasets such as ASNQ, WikiQA, and TREC-QA using a large language model (LLM). We validate our approach and the quality of the translated datasets through multiple experiments with different Transformer architectures. The results show that our datasets are pivotal for building robust multilingual AS2 models and significantly narrow the performance gap between non-English and English settings.|\n", 
"2406.10162": "|**2024-06-14**|**Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models**|Carson Denison et.al.|[2406.10162](http://arxiv.org/abs/2406.10162)|**[link](https://github.com/anthropics/sycophancy-to-subterfuge-paper)**|**In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors as a result of misspecified training objectives. Such behavior can range from simple sycophancy to sophisticated and dangerous reward tampering, in which a model directly modifies its own reward mechanism. However, these more complex behaviors may be too hard to discover through exploration. This paper investigates whether large language models (LLMs) that learn common specification-gaming strategies generalize to rarer and more blatant behaviors, including reward tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming in later environments. Strikingly, LLMs trained on the full curriculum generalize zero-shot to directly modifying their own reward function in a small but non-zero fraction of cases. Retraining LLMs not to game early-curriculum environments mitigates, but does not eliminate, reward tampering in later environments. Moreover, adding harmlessness training to the gameable environments does not prevent reward tampering. These results show that LLMs can generalize from common specification-gaming strategies to more pernicious reward tampering, and that such behavior may be nontrivial to remove.**|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov 
et.al.|[2406.10149](http://arxiv.org/abs/2406.10149)|**[link](https://github.com/booydar/babilong)**|In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods fail to adequately assess how well models reason over facts in long texts. To address this, we introduce the BABILong benchmark, designed to test a model's ability to reason across facts distributed throughout long documents. BABILong includes a diverse set of 20 reasoning tasks, including fact chaining, simple induction, deduction, counting, and handling lists/sets. These tasks are challenging on their own, and become even harder when the required facts are scattered across long natural text. Our evaluations show that popular LLMs effectively use only 10-20% of the context, and their performance declines sharply as reasoning complexity increases. Among alternatives to in-context reasoning, retrieval-augmented generation achieves only 60% accuracy on single-fact question answering, regardless of context length. Among context-extension methods, recurrent memory transformers perform best, processing lengths of up to 11 million tokens. The BABILong benchmark is extendable to any length to support the evaluation of new models with increased capabilities, and we provide splits of up to 1 million tokens.|\n", 
"2406.11840": "|**2024-06-17**|**LLaNA: Large Language and NeRF Assistant**|Andrea Amaduzzi et.al.|[2406.11840](http://arxiv.org/abs/2406.11840)|null|Multimodal large language models (MLLMs) excel at understanding images and 3D data, but they are limited in how fully they capture the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRF), which encode both the geometry and the photorealistic appearance of objects within the weights of a simple Multi-Layer Perceptron (MLP), have recently emerged as a representation that has attracted wide attention. This paper investigates the feasibility and effectiveness of integrating NeRF into an MLLM. We create LLaNA, the first general-purpose NeRF-language assistant, capable of performing new tasks such as NeRF captioning and question answering. Our method directly processes the weights of the NeRF's MLP to extract information about the represented object, without rendering images or building 3D data structures. We also build a dataset of NeRFs with text annotations for various NeRF-language tasks, created without human intervention, and use it to develop a benchmark for evaluating our model's NeRF understanding. Results show that processing NeRF weights performs better than extracting 2D or 3D representations from NeRFs.|\n", 
"2406.11839": "|**2024-06-17**|**mDPO: Conditional Preference Optimization for Multimodal Large Language Models**|Fei Wang et.al.|[2406.11839](http://arxiv.org/abs/2406.11839)|null|### \u80cc\u666f \u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5df2\u88ab\u8bc1\u660e\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6821\u51c6\u7684\u6709\u6548\u624b\u6bb5\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06DPO\u5e94\u7528\u4e8e\u591a\u6a21\u6001\u573a\u666f\uff0c\u4f46\u53d1\u73b0\u5b9e\u73b0\u6301\u7eed\u6539\u8fdb\u9887\u5177\u6311\u6218\u3002\u901a\u8fc7\u5bf9\u6bd4\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5ffd\u89c6\u4e86\u56fe\u50cf\u6761\u4ef6\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86mDPO\uff0c\u4e00\u4e2a\u65e8\u5728\u9632\u6b62\u8bed\u8a00\u504f\u597d\u8fc7\u5ea6\u4f18\u5148\u7684\u591a\u6a21\u6001DPO\u76ee\u6807\uff0c\u540c\u65f6\u4f18\u5316\u56fe\u50cf\u504f\u597d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5956\u52b1\u951a\u70b9\uff0c\u786e\u4fdd\u9009\u62e9\u7684\u54cd\u5e94\u5956\u52b1\u4f
dd\u6301\u6b63\u5411\uff0c\u4ece\u800c\u907f\u514d\u76f8\u5bf9\u504f\u597d\u4f18\u5316\u56fa\u6709\u7684\u53ef\u80fd\u6027\u964d\u4f4e\u95ee\u9898\u3002 ### \u4efb\u52a1 \u6211\u4eec\u5728\u4e24\u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u591a\u6a21\u6001LLM\u4ee5\u53ca\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0cmDPO\u6709\u6548\u89e3\u51b3\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5e76\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002|\n", "2406.11832": "|**2024-06-17**|**Unveiling Encoder-Free Vision-Language Models**|Haiwen Diao et.al.|[2406.11832](http://arxiv.org/abs/2406.11832)|**[link](https://github.com/baaivision/eve)**|**\u5f53\u524d\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u4e3b\u8981\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7f16\u7801\u5668\u6765\u63d0\u53d6\u89c6\u89c9\u7279\u5f81\uff0c\u7136\u540e\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u5728\u62bd\u8c61\u89c6\u89c9\u8868\u793a\u65b9\u9762\u8bbe\u5b9a\u4e86\u5f3a\u70c8\u7684\u5148\u9a8c\uff0c\u5982\u5206\u8fa8\u7387\u3001\u6bd4\u4f8b\u548c\u8bed\u4e49\u503e\u5411\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86VLM\u7684\u7075\u6d3b\u6027\u548c\u6548\u7387\u3002\u76f4\u63a5\u8bad\u7ec3\u65e0\u7f16\u7801\u5668\u7684\u7eafVLM\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u4e14\u9c9c\u6709\u63a2\u7d22\u3002\u5b9e\u8bc1\u7814\u7a76\u663e\u793a\uff0c\u8fd9\u79cd\u76f4\u63a5\u8bad\u7ec3\u65b9\u6cd5\u4f1a\u5bfc\u81f4\u6536\u655b\u7f13\u6162\u548c\u6027\u80fd\u5dee\u8ddd\u8f83\u5927\u3002\u672c\u6587\u65e8\u5728\u5f25\u5408\u7f16\u7801\u5668\u4f9d\u8d56\u578b\u548c\u65e0\u7f16\u7801\u5668\u6a21\u578b\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u
6548\u7684\u7eafVLM\u8bad\u7ec3\u7b56\u7565\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u6df1\u5165\u5b9e\u9a8c\u63ed\u793a\u4e86\u9ad8\u6548\u8bad\u7ec3\u65e0\u7f16\u7801\u5668VLM\u7684\u5173\u952e\u8981\u7d20\uff1a\uff081\uff09\u5728\u7edf\u4e00\u7684\u89e3\u7801\u5668\u5185\u878d\u5408\u89c6\u89c9\u4e0e\u8bed\u8a00\u8868\u793a\uff1b\uff082\uff09\u901a\u8fc7\u989d\u5916\u76d1\u7763\u63d0\u5347\u89c6\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e9b\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86EVE\uff0c\u4e00\u4e2a\u65e0\u7f16\u7801\u5668\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u65e2\u80fd\u9ad8\u6548\u8bad\u7ec3\u4e5f\u80fd\u5feb\u901f\u63a8\u7406\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4ec5\u4f7f\u75283500\u4e07\u516c\u5f00\u53ef\u7528\u7684\u6570\u636e\uff0cEVE\u5c31\u80fd\u5728\u591a\u4e2a\u89c6\u89c9\u8bed\u8a00\u57fa\u51c6\u4e0a\u4e0e\u7c7b\u4f3c\u5bb9\u91cf\u7684\u7f16\u7801\u5668\u4f9d\u8d56\u578bVLM\u5339\u654c\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u795e\u79d8\u3001\u6570\u636e\u672a\u516c\u5f00\u7684Fuyu-8B\u6a21\u578b\u3002\u6211\u4eec\u76f8\u4fe1\uff0cEVE\u4e3a\u8de8\u6a21\u6001\u5f00\u53d1\u7eaf\u7cb9\u7684\u89e3\u7801\u5668\u67b6\u6784\u63d0\u4f9b\u4e86\u4e00\u4e2a\u900f\u660e\u4e14\u9ad8\u6548\u7684\u8def\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/baaivision/EVE\u3002**|\n", "2406.11831": "|**2024-06-17**|**Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models**|Bingqi Ma 
et.al.|[2406.11831](http://arxiv.org/abs/2406.11831)|null|Large language models (LLMs) built on decoder-only transformers excel at text understanding, but how to leverage these advanced LLMs in text-to-image diffusion models remains an open question. We find that directly using an LLM as the prompt encoder significantly degrades prompt-following in image generation. There are two main obstacles: (1) a misalignment between the LLM's next-token-prediction training and the diffusion model's need for discriminative prompt features; and (2) the positional bias intrinsic to the decoder-only architecture. To address these issues, we propose a novel framework that uses carefully designed usage guidance to strengthen the LLM's text representation capability and remove its inherent positional bias, so that state-of-the-art LLMs can be flexibly integrated into text-to-image generation models. We also provide a method for fusing multiple LLMs. Given the excellent performance and scaling capabilities of the transformer architecture, we further design an LLM-Infused Diffusion Transformer (LI-DiT) based on this framework. We conduct extensive experiments to validate LI-DiT across model sizes and data sizes. Benefiting from the inherent abilities of LLMs and our innovative designs, LI-DiT's prompt-understanding performance easily surpasses state-of-the-art open-source models as well as mainstream closed-source commercial models, including Stable Diffusion 3, DALL-E 3, and Midjourney V6. The powerful LI-DiT-10B will be made available after further optimization and security checks.|\n",
"2406.11827": "|**2024-06-17**|**WPO: Enhancing RLHF with Weighted Preference Optimization**|Wenxuan Zhou et.al.|[2406.11827](http://arxiv.org/abs/2406.11827)|**[link](https://github.com/wzhouad/wpo)**|**Reinforcement learning from human feedback (RLHF) is a promising approach for aligning large language models (LLMs) more closely with human values. Off-policy preference optimization, where the preference data is obtained from other models, is widely adopted because of its cost efficiency and scalability. However, off-policy preference optimization often suffers from a distributional gap between the policy used for data collection and the target policy, which leads to suboptimal optimization. In this paper, we propose a novel strategy, Weighted Preference Optimization (WPO), which mitigates this problem by reweighting preference pairs so that the off-policy data more closely resembles on-policy data. This method not only addresses the distributional gap but also improves the optimization process without incurring additional costs. We validate our method on instruction-following benchmarks including Alpaca Eval 2 and MT-bench. WPO outperforms Direct Preference Optimization (DPO) by 5.6% on Alpaca Eval 2. Based on Llama-3-8B-Instruct, WPO also achieves a remarkable length-controlled winning rate of 48.6%, making it the strongest 8B model on the leaderboard. We will open-source our code and models.**|\n",
"2406.11818": "|**2024-06-17**|**Embodied Instruction Following in Unknown Environments**|Zhenyu Wu et.al.|[2406.11818](http://arxiv.org/abs/2406.11818)|null|Enabling embodied agents to complete complex human instructions given in natural language is crucial for autonomous household service systems. Conventional methods can only carry out instructions in known environments where all interactive objects are provided to the agent, and directly applying existing approaches to unknown environments usually produces infeasible plans that manipulate nonexistent objects. In contrast, we propose an embodied instruction following (EIF) method for complex tasks in unknown environments, in which the agent efficiently explores the environment and generates feasible plans with existing objects to accomplish abstract instructions. Specifically, we build a hierarchical embodied instruction following framework, consisting of a high-level task planner and a low-level exploration controller, with multimodal large language models. We then construct a semantic representation map of the scene with dynamic region attention to show the known visual clues, so that task planning and scene exploration are aligned with the goal of the human instruction. For the task planner, we generate feasible step-by-step plans based on the task completion progress and the known visual clues. For the exploration controller, the optimal navigation or object-interaction policy is predicted from the generated step-wise plans and the known visual clues. Experimental results show that our method achieves a 45.09% success rate on 204 complex human instructions, such as making breakfast and tidying rooms, in large house-level scenes.|\n",
"2406.11816": "|**2024-06-17**|**VideoLLM-online: Online Video Large Language Model for Streaming Video**|Joya Chen et.al.|[2406.11816](http://arxiv.org/abs/2406.11816)|null|Recent large language models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the training methods of these large multimodal models typically treat videos as predetermined clips, which makes them less effective and less efficient at handling continuous video streams. In this paper, we propose a novel Learning-In-Video-Stream (LIVE) framework that enables temporally aligned, long-context, real-time conversation over a continuous video stream. The LIVE framework comprises three components: (1) a training objective designed to perform language modeling on continuous streaming inputs; (2) a data generation scheme that converts offline temporal annotations into a streaming dialogue format; and (3) an optimized inference pipeline that speeds up model responses in real-world video streams. Based on Llama-2/Llama-3, we build the VideoLLM-online model and use it to demonstrate significant advantages in streaming video dialogue. For instance, on an A100 GPU, the model supports streaming dialogue over a 5-minute video clip at more than 10 FPS. Moreover, VideoLLM-online also shows state-of-the-art performance on public offline video benchmarks such as recognition, captioning, and forecasting. The code, model, data, and demo are available at https://showlab.github.io/videollm-online.|\n",
"2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813](http://arxiv.org/abs/2406.11813)|null|Although recent studies show that large language models (LLMs) can store substantial factual knowledge, the mechanisms by which they acquire this knowledge during pretraining remain unclear. This work addresses that gap by studying how LLMs acquire and retain factual knowledge during pretraining. The findings reveal several key insights. First, counterintuitively, pretraining on more data does not significantly improve the model's ability to acquire and retain factual knowledge. Second, there is a power-law relationship between training steps and the forgetting of both memorized and generalized factual knowledge, and models trained on duplicated data forget faster. Third, training with larger batch sizes makes models more robust to forgetting. Overall, our observations suggest that LLMs acquire factual knowledge during pretraining by progressively increasing, at each step, the probability of the factual knowledge presented in the pretraining data. However, this increase is subsequently diluted by forgetting. Based on this interpretation, we can explain recently observed LLM behaviors, such as poor performance on long-tail knowledge and the benefits of deduplicating the pretraining corpus.|\n",
"2406.11811": "|**2024-06-17**|**RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content**|Joao Monteiro et.al.|[2406.11811](http://arxiv.org/abs/2406.11811)|null|Large language models (LLMs) are trained on vast amounts of data, much of it automatically scraped from the internet. This data includes encyclopedic documents that hold a wealth of general knowledge (e.g., Wikipedia), but it may also overlap with benchmark datasets used to evaluate LLMs. Consequently, evaluating models on test splits that might have leaked into the training set can lead to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic-retrieval tasks. RepLiQA is a collection of five test splits, four of which had not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises four parts: (1) a reference document, written by a human annotator, that depicts an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph from the reference document that contains the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark covering several state-of-the-art LLMs to uncover performance differences across models of various types and sizes in a context-conditional language modeling setting. The released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.|\n",
"2406.11801": "|**2024-06-17**|**Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations**|Rima Hazra 
et.al.|[2406.11801](http://arxiv.org/abs/2406.11801)|**[link](https://github.com/declare-lab/safety-arithmetic)**|**As large language models (LLMs) become increasingly important in applications such as translation and question answering, ensuring that they are properly aligned with human values becomes critical. However, current alignment methods struggle with dynamic user intentions and complex objectives, which leaves models prone to generating harmful content. We propose Safety Arithmetic, a training-free framework that enhances LLM safety across different scenarios: base models, supervised fine-tuned (SFT) models, and edited models. Safety Arithmetic consists of two parts: Harm Direction Removal, which avoids harmful outputs, and Safety Alignment, which promotes safe responses. Additionally, we release NoIntentEdit, a dataset that highlights edit instances that could compromise model safety. Our experiments show that Safety Arithmetic significantly improves safety measures, reduces over-safety, and maintains model utility, outperforming existing methods in ensuring safe content generation.**|\n",
"2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval Based Long Video Understanding**|Ziyu Ma et.al.|[2406.12846](http://arxiv.org/abs/2406.12846)|null|Existing methods for long video understanding mainly focus on videos lasting only tens of seconds, with limited exploration of techniques for handling longer videos. The large number of frames in long videos poses two main challenges: it is difficult to locate key information and to perform long-range reasoning. We therefore propose DrVideo, a document-retrieval-based system designed for long video understanding. Our key idea is to convert the long-video understanding problem into a long-document understanding task in order to fully leverage the power of large language models. Specifically, DrVideo converts a long video into a text-based long document, first retrieving key frames and augmenting the information of these frames as the system's starting point. It then employs an agent-based iterative loop to continuously search for missing information and augment relevant data, and once enough question-related information has been gathered, it provides the final prediction in a chain-of-thought manner. Experiments on several long video benchmarks confirm the effectiveness of our method. DrVideo outperforms existing state-of-the-art methods by 3.8 points on EgoSchema (3 minutes), by 17.9 and 38.0 points in the break mode and global mode of MovieChat-1K (10 minutes) respectively, and by 30.2 points on the LLama-Vid QA dataset (over 60 minutes).|\n",
"2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang et.al.|[2406.12845](http://arxiv.org/abs/2406.12845)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|**Reinforcement learning from human feedback (RLHF) has become the primary method for aligning large language models (LLMs) with human preferences. Conventionally, a reward model (RM) is trained on human preference data, which typically consists of responses to the same user request with relative ratings indicating which response humans prefer. However, because of the black-box nature of RMs, their outputs lack interpretability, and humans cannot easily understand why an RM considers a response good. Since RMs act as proxies for human preferences, we propose a two-stage approach to building interpretable RMs. First, we train an Absolute-Rating Multi-Objective Reward Model (ArmoRM) on multi-dimensional absolute-rating data, where each dimension corresponds to a human-interpretable objective (e.g., honesty, verbosity, safety). Second, we employ a Mixture-of-Experts (MoE) strategy with a gating network that automatically selects the most suitable reward objectives based on the context. We trained an ArmoRM on Llama-3 8B and added a shallow MLP on top as the gating network, yielding ArmoRM-Llama3-8B. Our model achieves state-of-the-art performance on RewardBench, a benchmark for evaluating RMs for language modeling. Notably, it outperforms the LLM-as-a-judge method with GPT-4 judges and approaches the performance of the much larger Nemotron-4 340B reward model.**|\n",
"2406.12844": "|**2024-06-18**|**Synergizing Foundation Models and Federated Learning: A Survey**|Shenghui Li 
et.al.|[2406.12844](http://arxiv.org/abs/2406.12844)|null|The recent development of foundation models (FMs), such as large language models, vision transformers, and multimodal models, has had a significant impact on both academia and industry. Compared with small-scale models, FMs demand far more data during pretraining. Although general-purpose FMs can be pretrained on publicly available data from the internet, domain-specific FMs require proprietary data, and privacy concerns limit how much of such data is available in practice. Federated learning (FL) is a collaborative learning paradigm that breaks down barriers to data sharing. It offers a promising way to customize and adapt FMs to a wide range of domain-specific tasks using distributed data while preserving data privacy. This survey discusses the potential and challenges of synergizing FL and FMs and summarizes core techniques, future directions, and applications. A periodically updated collection of FM-FL papers is available.|\n",
"2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi 
et.al.|[2406.12832](http://arxiv.org/abs/2406.12832)|**[link](https://github.com/arminazizi98/lamda)**|**\u5728\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u9886\u57df\uff0c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u5df2\u7ecf\u6210\u4e3a\u6807\u51c6\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5b83\u663e\u8457\u51cf\u5c11\u4e86\u53ef\u8bad\u7ec3\u53c2\u6570\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u578b\u5d4c\u5165\u7ef4\u5ea6\u7684\u589e\u52a0\uff0cLoRA\u6240\u9700\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u91cf\u4e5f\u968f\u4e4b\u4e0a\u5347\uff0c\u5bfc\u81f4\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u6b64\u5916\uff0c\u5176\u540e\u5411\u66f4\u65b0\u9700\u8981\u5b58\u50a8\u9ad8\u7ef4\u4e2d\u95f4\u6fc0\u6d3b\u548c\u4f18\u5316\u5668\u72b6\u6001\uff0c\u5bf9GPU\u5185\u5b58\u9700\u6c42\u8f83\u5927\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u65b9\u6cd5\u2014\u2014\u57fa\u4e8e\u8c31\u5206\u89e3\u7684\u4f4e\u7ef4\u9002\u5e94\uff08LaMDA\uff09\u3002LaMDA\u901a\u8fc7\u51bb\u7ed3\u7b2c\u4e00\u6295\u5f71\u77e9\u9635\uff08PMA\uff09\uff0c\u540c\u65f6\u5f15\u5165\u4e00\u4e2a\u4f4e\u7ef4\u53ef\u8bad\u7ec3\u7684\u5e73\u65b9\u77e9\u9635\uff0c\u5b9e\u73b0\u4e86\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u7684\u5927\u5e45\u51cf\u5c11\u3002\u5728\u65e9\u671f\u7684\u5fae\u8c03\u9636\u6bb5\uff0cLaMDA\u9010\u6b65\u51bb\u7ed3\u7b2c\u4e8c\u6295\u5f71\u77e9\u9635\uff08PMB\uff09\uff0c\u8fdb\u4e00\u6b65\u964d\u4f4e\u6743\u91cd\u66f4\u65b0\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u63d0\u9ad8\u53c2\u6570\u6548\u7387\u3002 
\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u589e\u5f3a\u7248LaMDA++\uff0c\u5b83\u901a\u8fc7\u89c4\u8303\u5316\u9884\u8bad\u7ec3\u6a21\u578b\u6743\u91cd\u7684\u8c31\u5206\u6790\uff0c\u5b9e\u73b0\u8f7b\u91cf\u7ea7\u7684LoRA\u8def\u5f84\u81ea\u9002\u5e94\u79e9\u5206\u914d\u3002\u6211\u4eec\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ecGLUE\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u57fa\u51c6\u3001\u6587\u672c\u6458\u8981\u3001\u81ea\u7136\u8bed\u8a00\u751f\u6210\u4ee5\u53ca\u590d\u6742\u63a8\u7406\uff0c\u5e94\u7528\u4e8e\u4e0d\u540c\u7c7b\u578b\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cLaMDA\u5728\u6027\u80fd\u4e0a\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u5f53\u6216\u8d85\u8d8a\uff0c\u4e14\u5728\u5fae\u8c03\u671f\u95f4\u53ef\u51cf\u5c11\u9ad8\u8fbe17.7\u500d\u7684\u53c2\u6570\u66f4\u65b0\u6b21\u6570\uff0c\u4ee5\u53ca1.32\u500d\u7684\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u3002\u6211\u4eec\u5c06\u516c\u5f00\u4ee3\u7801\u3002**|\n", "2406.12822": "|**2024-06-18**|**Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?**|Pinzhen Chen et.al.|[2406.12822](http://arxiv.org/abs/2406.12822)|null|
\u5927\u578b\u591a\u8bed\u8a00\u6a21\u578b\u65e8\u5728\u670d\u52a1\u4e0d\u540c\u8bed\u79cd\u7684\u6bcd\u8bed\u4f7f\u7528\u8005\u3002\u6211\u4eec\u63a8\u6d4b\uff0c\u5f53\u524d\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u5fae\u8c03\u548c\u8bc4\u4f30\u65b9\u6cd5\u53ef\u80fd\u4e0e\u5176\u521d\u8877\u4e0d\u7b26\uff0c\u539f\u56e0\u5728\u4e8e\u8fc7\u5ea6\u4f9d\u8d56\u7ffb\u8bd1\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ffb\u8bd1\u4e2d\u7684\u7455\u75b5\u3002\u5c1a\u4e0d\u6e05\u695a\u6307\u4ee4\u6570\u636e\u7684\u6027\u8d28\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u8f93\u51fa\uff0c\u540c\u65f6\uff0c\u7528\u7ffb\u8bd1\u6d4b\u8bd5\u96c6\u6765\u6355\u6349\u8fd9\u4e9b\u7ec6\u5fae\u5dee\u522b\u662f\u5426\u6709\u6548\u3002\u7531\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\u9636\u6bb5\u5e38\u5e38\u7ed3\u5408\u4f7f\u7528\u7ffb\u8bd1\u6570\u636e\uff0c\u8fd9\u4e9b\u6f5c\u5728\u95ee\u9898\u53ef\u80fd\u88ab\u5ffd\u89c6\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5728\u6307\u4ee4\u8c03\u4f18\u548c\u8bc4\u4f30\u9636\u6bb5\u4f7f\u7528\u63a7\u5236\u6027\u7684\u6bcd\u8bed\u6216\u7ffb\u8bd1\u6570\u636e\uff0c\u6765\u63a2\u7a76\u8fd9\u4e9b\u95ee\u9898\uff0c\u5e76\u89c2\u5bdf\u6a21\u578b\u8868\u73b0\u3002\u6211\u4eec\u5728\u516b\u79cd\u57fa\u7840\u6a21\u578b\u548c\u516b\u4e2a\u4e0d\u540c\u57fa\u51c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u6bcd\u8bed\u6216\u751f\u6210\u6027\u57fa\u51c6\u4e0a\uff0c\u4f7f\u7528\u6bcd\u8bed\u6307\u4ee4\u6570\u636e\u4e0e\u7ffb\u8bd1\u6307\u4ee4\u6570\u636e\u8bad\u7ec3\u7684\u6a21\u578b\u4e4b\u95f4\u5b58\u5728\u660e\u663e\u5dee\u5f02\uff0c\u5728\u6a21\u578b\u6027\u80fd\u8f83\u9ad8\u65f6\u5c24\u4e3a\u7a81\u51fa\uff0c\u800c\u5176\u4ed6\u7c7b\u578b\u7684\u6d4b\u8bd5\u96c6\u5219\u65e0\u6cd5\u4f53\u73b0\u8fd9\u4e00\u5dee\u5f02\u3002\u6700\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u6b63\u5219\u5316\u6709\u52a9\u4e8e\u5728\u7ed3\u6784\u5316\u4efb\u52a1\u4e0a\u7f29\u5c0f\u8fd9\u4e00\u5dee\u8ddd\uff0c\u4f46\u5bf9\u751f\u6210\u6027\u4efb\u52a1\u5219\u4e0d\u7136\u3002|\n", "2406.12809": "|**2024-06-18**|**Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?**|Zhe Yang 
et.al.|[2406.12809](http://arxiv.org/abs/2406.12809)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u4e0d\u4e00\u81f4\u7684\u95ee\u9898\uff0c\u4f8b\u5982\u5bf9\u91cd\u8ff0\u6216\u5fae\u5c0f\u987a\u5e8f\u53d8\u5316\u7684\u53cd\u5e94\u4e0d\u4e00\u81f4\u3002\u9664\u4e86\u8fd9\u4e9b\u4e0d\u7a33\u5b9a\u6027\uff0c\u6211\u4eec\u8fd8\u89c2\u5bdf\u5230\u5c3d\u7ba1LLMs\u80fd\u591f\u89e3\u51b3\u96be\u9898\uff0c\u4f46\u5728\u76f8\u5bf9\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\u5374\u53ef\u80fd\u5931\u8d25\u3002\u4e3a\u4e86\u8bc4\u4f30\u8fd9\u79cd\u4ece\u96be\u5230\u6613\u7684\u4e0d\u4e00\u81f4\u6027\uff0c\u6211\u4eec\u521b\u5efa\u4e86ConsisEval\u57fa\u51c6\uff0c\u5176\u4e2d\u6bcf\u4e2a\u6761\u76ee\u5305\u542b\u4e24\u4e2a\u96be\u5ea6\u6709\u5e8f\u7684\u95ee\u9898\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u81f4\u6027\u5206\u6570\u7684\u6982\u5ff5\uff0c\u4ee5\u91cf\u5316\u8fd9\u79cd\u4e0d\u4e00\u81f4\u6027\uff0c\u5e76\u5206\u6790\u901a\u8fc7\u76f8\u5bf9\u4e00\u81f4\u6027\u5206\u6570\u6539\u8fdb\u4e00\u81f4\u6027\u6f5c\u529b\u3002\u901a\u8fc7\u5bf9\u73b0\u6709\u6a21\u578b\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u5f97\u51fa\u4ee5\u4e0b\u53d1\u73b0\uff1a(1) GPT-4\u83b7\u5f9792.2%\u7684\u6700\u9ad8\u4e00\u81f4\u6027\u5206\u6570\uff0c\u4f46\u4ecd\u56e0\u5197\u4f59\u4fe1\u606f\u7684\u5e72\u6270\u3001\u95ee\u9898\u8bef\u89e3\u7b49\u95ee\u9898\u5bf9\u7279\u5b9a\u95ee\u9898\u4e0d\u4e00\u81f4\uff1b(2) \u80fd\u529b\u66f4\u5f3a\u7684\u6a21\u578b\u901a\u5e38\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u4e00\u81f4\u6027\uff0c\u4f46\u4e5f\u5b58\u5728\u4f8b\u5916\u60c5\u51b5\uff1b(3) \u5bf9\u4e8e Fine-tuning \u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\u800c\u8a00\uff0c\u786c\u6570\u636e\u53ef\u4ee5\u63d0\u9ad8\u4e00\u81f4\u6027\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u5728GitHub\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2406.12806": 
"|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**\u80cc\u666f**\uff1a\u914d\u7f6e\u8bbe\u7f6e\u5bf9\u4e8e\u8c03\u6574\u8f6f\u4ef6\u884c\u4e3a\u4ee5\u6ee1\u8db3\u7279\u5b9a\u6027\u80fd\u9700\u6c42\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9519\u8bef\u914d\u7f6e\u666e\u904d\u5b58\u5728\u3002\u7531\u4e8e\u914d\u7f6e\u9879\u4f17\u591a\u4e14\u590d\u6742\uff0c\u8bc6\u522b\u5f71\u54cd\u7cfb\u7edf\u6027\u80fd\u7684\u914d\u7f6e\u662f\u4e00\u9879\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51faPerfSense\uff0c\u8fd9\u662f\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9ad8\u6548\u5730\u8bc6\u522b\u6027\u80fd\u5173\u952e\u914d\u7f6e\uff0c\u540c\u65f6\u4fdd\u6301\u4f4e\u5f00\u9500\u3002PerfSense\u5229\u7528LLM\u4ee3\u7406\u6a21\u62df\u5f00\u53d1\u8005\u548c\u6027\u80fd\u5de5\u7a0b\u5e08\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u91c7\u7528\u5148\u8fdb\u7684\u63d0\u793a\u94fe\u6280\u672f\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b49\u6280\u672f\u3002 **\u65b9\u6cd5\u4e0e\u6210\u679c**\uff1a\u6211\u4eec\u5728\u4e03\u4e2a\u5f00\u6e90Java\u7cfb\u7edf\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\uff0cPerfSense\u5728\u5206\u7c7b\u6027\u80fd\u654f\u611f\u914d\u7f6e\u65b9\u9762\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a64.77%\uff0c\u4f18\u4e8e\u57fa\u4e8eLLM\u7684\u57fa\u7ebf\uff0850.36%\uff09\u548c\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\uff0861.75%\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u7684\u63d0\u793a\u94fe\u6280\u672f\u63d0\u9ad8\u4e86\u53ec\u56de\u738710%\u81f330%\uff0c\u800c\u4fdd\u6301\u4e86\u76f8\u4f3c\u7684\u7cbe\u786e\u5ea6\u3002\u8fdb\u4e00\u6b65\u7684\u624b\u52a8\u5206\u6790362\u4e2a\u8bef\u5206\u7c7b\u6848\u4f8b\uff0c\u53d1\u73b0\u5e38\u89c1\u95ee\u9898\u5305\u62ecLLMs\u5bf9\u9700\u6c42\u7684\u7406\u89e3\u504f\u5dee\uff08\u536026.8%\uff09\u3002 
**\u7ed3\u8bba**\uff1aPerfSense\u663e\u8457\u51cf\u5c11\u4e86\u624b\u52a8\u5206\u7c7b\u6027\u80fd\u5173\u952e\u914d\u7f6e\u7684\u5de5\u4f5c\u91cf\uff0c\u5e76\u4e3a\u672a\u6765\u7684LLM\u57fa\u4e8e\u4ee3\u7801\u5206\u6790\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c2\u70b9\u3002|\n", "2406.12800": "|**2024-06-18**|**Supporting Human Raters with the Detection of Harmful Content using Large Language Models**|Kurt Thomas et.al.|[2406.12800](http://arxiv.org/abs/2406.12800)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u6216\u8f85\u52a9\u4eba\u7c7b\u5ba1\u9605\u8005\u68c0\u6d4b\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\uff0c\u5982\u4ec7\u6068\u8a00\u8bba\u3001\u9a9a\u6270\u3001\u6781\u7aef\u4e3b\u4e49\u548c\u9009\u4e3e\u8bef\u5bfc\u3002\u901a\u8fc750,000\u6761\u8bc4\u8bba\u7684\u6570\u636e\u96c6\uff0c\u6211\u4eec\u53d1\u73b0LLMs\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u76f8\u6bd4\u65f6\u80fd\u8fbe\u523090%\u7684\u51c6\u786e\u7387\u3002\u6211\u4eec\u63d0\u51fa\u4e94\u79cd\u8bbe\u8ba1\u6a21\u5f0f\uff0c\u4ee5\u6574\u5408LLMs\u4e0e\u4eba\u5de5\u8bc4\u7ea7\uff0c\u4f8b\u5982\u9884\u7b5b\u9009\u975e\u8fdd\u89c4\u5185\u5bb9\u3001\u68c0\u6d4b\u4eba\u7c7b\u8bc4\u7ea7\u53ef\u80fd\u7684\u9519\u8bef\uff0c\u6216\u8005\u63d0\u4f9b\u5173\u952e\u4e0a\u4e0b\u6587\u4ee5\u652f\u6301\u4eba\u5de5\u8bc4\u7ea7\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u4e00\u4e2a\u4f18\u5316\u7684\u63d0\u793a\u6765\u652f\u6301\u8fd9\u4e9b\u8bbe\u8ba1\u6a21\u5f0f\u3002\u5728\u5b9e\u9645\u5e94\u7528\u7684\u8bd5\u70b9\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4f18\u5316\u4eba\u529b\u8d44\u6e90\u6548\u7387\u65b9\u9762\u5b9e\u73b0\u4e8641.5%\u7684\u63d0\u5347\uff0c\u540c\u65f6\u5728\u68c0\u6d4b\u8fdd\u89c4\u5185\u5bb9\u7684\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u4e0a\u5206\u522b\u63d0\u9ad8\u4e869%\u81f311%\u3002|\n", "2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B 
to GLM-4 All Tools**|Team GLM et.al.|[2406.12793](http://arxiv.org/abs/2406.12793)|**[link](https://github.com/thudm/chatglm-6b)**|\u6211\u4eec\u4ecb\u7ecdChatGLM\uff0c\u8fd9\u662f\u4e00\u4e2a\u968f\u65f6\u95f4\u4e0d\u65ad\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7cfb\u5217\u3002\u672c\u62a5\u544a\u4e3b\u8981\u5173\u6ce8GLM-4\u8bed\u8a00\u7cfb\u5217\uff0c\u5305\u62ecGLM-4\u3001GLM-4-Air\u548cGLM-4-9B\uff0c\u5b83\u4eec\u4ee3\u8868\u4e86\u6211\u4eec\u5f53\u524d\u6700\u5f3a\u5927\u7684\u6a21\u578b\uff0c\u96c6\u6210\u4e86\u524d\u4e09\u4ee3ChatGLM\u7684\u6240\u6709\u7ecf\u9a8c\u548c\u6559\u8bad\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u5341\u4e07\u4ebf\u4e2atoken\u4e0a\u8fdb\u884c\u4e86\u9884\u8bad\u7ec3\uff0c\u4e3b\u8981\u6db5\u76d6\u4e2d\u6587\u548c\u82f1\u8bed\uff0c\u4ee5\u53ca\u5c11\u91cf\u6765\u81ea24\u79cd\u8bed\u8a00\u7684\u8bed\u6599\u5e93\uff0c\u4fa7\u91cd\u4e8e\u4e2d\u82f1\u6587\u7684\u5bf9\u9f50\u3002\u9ad8\u8d28\u91cf\u7684\u5bf9\u9f50\u662f\u901a\u8fc7\u591a\u9636\u6bb5\u7684\u540e\u8bad\u7ec3\u8fc7\u7a0b\u5b9e\u73b0\u7684\uff0c\u5305\u62ec\u76d1\u7763\u5fae\u8c03\u548c\u5b66\u4e60\u4eba\u7c7b\u53cd\u9988\u3002\u8bc4\u4f30\u663e\u793a\uff0cGLM-4\u5728\u901a\u7528\u6307\u6807\u5982MMLU\u3001GSM8K\u3001MATH\u3001BBH\u3001GPQA\u548cHumanEval\u4e0a\u63a5\u8fd1\u6216\u4f18\u4e8eGPT-4\uff1b\u5728IFEval\u6307\u4ee4\u8ddf\u968f\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u63a5\u8fd1GPT-4 Turbo\uff1b\u5728\u957f\u6587\u672c\u4efb\u52a1\u4e0a\u4e0eGPT-4 Turbo\uff08128K\uff09\u548cClaude 3\u76f8\u5f53\uff1b\u5728\u4e2d\u6587\u5bf9\u9f50\u65b9\u9762\uff0cGLM-4\u4f18\u4e8eGPT-4\uff0c\u6839\u636eAlignBench\u8861\u91cf\u3002GLM-4 All 
Tools\u6a21\u578b\u8fdb\u4e00\u6b65\u8fdb\u884c\u4e86\u5bf9\u9f50\uff0c\u4ee5\u7406\u89e3\u7528\u6237\u610f\u56fe\u5e76\u80fd\u81ea\u4e3b\u51b3\u5b9a\u4f55\u65f6\u4f7f\u7528\u54ea\u79cd\u5de5\u5177\uff0c\u5982Web\u6d4f\u89c8\u5668\u3001Python\u89e3\u91ca\u5668\u3001\u6587\u672c\u8f6c\u56fe\u50cf\u6a21\u578b\u548c\u81ea\u5b9a\u4e49\u51fd\u6570\uff0c\u4ee5\u6709\u6548\u5730\u5b8c\u6210\u590d\u6742\u4efb\u52a1\u3002\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u5b83\u5728\u8bf8\u5982\u901a\u8fc7\u7f51\u7edc\u6d4f\u89c8\u83b7\u53d6\u4fe1\u606f\u548c\u4f7f\u7528Python\u89e3\u91ca\u5668\u89e3\u9898\u7b49\u4efb\u52a1\u4e0a\u4e0eGPT-4 All Tools\u76f8\u5339\u914d\u751a\u81f3\u8d85\u8d8a\u3002\u5230\u76ee\u524d\u4e3a\u6b62\uff0c\u6211\u4eec\u5df2\u7ecf\u5f00\u6e90\u4e86\u4e00\u7cfb\u5217\u6a21\u578b\uff0c\u5305\u62ecChatGLM-6B\uff08\u4e09\u4ee3\uff09\u3001GLM-4-9B\uff08128K\u30011M\uff09\u3001GLM-4V-9B\u3001WebGLM\u548cCodeGeeX\uff0c\u57282023\u5e74\u4ec5Hugging Face\u4e0a\u5c31\u6709\u8d85\u8fc71000\u4e07\u6b21\u4e0b\u8f7d\u3002\u8fd9\u4e9b\u5f00\u6e90\u6a21\u578b\u53ef\u901a\u8fc7\u548c\u8bbf\u95ee\u3002|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang 
et.al.|[2406.12784](http://arxiv.org/abs/2406.12784)|**[link](https://github.com/Cyno2232/UBENCH)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u7684\u6548\u679c\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4f4e\u53ef\u89e3\u91ca\u6027\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u672a\u9884\u89c1\u60c5\u51b5\u4e0b\u5e38\u4f1a\u51fa\u73b0\u9519\u8bef\uff0c\u9650\u5236\u4e86\u5176\u4ef7\u503c\u3002\u5c3d\u7ba1\u5df2\u6709\u8bb8\u591a\u7814\u7a76\u81f4\u529b\u4e8e\u6784\u5efa\u5168\u9762\u7684\u8bc4\u4f30\u4f53\u7cfb\uff0c\u4f46\u5148\u524d\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e3b\u8981\u5173\u6ce8\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u5bf9\u54cd\u5e94\u7684\u4e0d\u786e\u5b9a\u6027\u8bc4\u4f30\u4e0d\u8db3\uff0c\u53ef\u80fd\u5bfc\u81f4\u4e0d\u7a33\u5b9a\u6027\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u5728\u8861\u91cfLLM\u53ef\u9760\u6027\u65f6\u8d44\u6e90\u6d88\u8017\u5927\uff0c\u4e14\u96be\u4ee5\u6d4b\u8bd5\u9ed1\u76d2\u6a21\u578b\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86UBENCH\uff0c\u4e00\u4e2a\u5168\u9762\u7684LLM\u53ef\u9760\u6027\u8bc4\u4f30\u57fa\u51c6\u3002\u5b83\u5305\u542b3,978\u4e2a\u6db5\u76d6\u77e5\u8bc6\u3001\u8bed\u8a00\u7406\u89e3\u3001\u63a8\u7406\u80fd\u529b\u7684\u591a\u9009\u9898\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cUBENCH\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5e76\u4e14\u5176\u5355\u6b21\u91c7\u6837\u65b9\u6cd5\u663e\u8457\u8282\u7701\u4e86\u8ba1\u7b97\u8d44\u6e90\uff0c\u76f8\u8f83\u4e8e\u9700\u8981\u591a\u6b21\u91c7\u6837\u7684\u57fa\u7ebf\u65b9\u6cd5\u66f4\u4e3a\u9ad8\u6548\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528UBENCH\u8bc4\u4f30\u4e8615\u79cd\u6d41\u884cLLM\u7684\u53ef\u9760\u6027\uff0c\u53d1\u73b0GLM4\u8868\u73b0\u51fa\u8272\uff0c\u7d27\u968f\u5176\u540e\u7684\u662fGPT-4\u3002\u6211\u4eec\u8fd8\u63a2\u7a76\u4e86Chain-of-Thought\u63d0\u793a\u3001\u89d2\u8272\u626e\u6f14\u63d0\u793a\u3001\u9009\u9879\u987a\u5e8f\u548c\u6e29\u5ea6\u5bf9LLM\u53ef\u9760\u6027\u7684\u5f71\u54cd\uff0c\u5206\u6790\u4e86\u5b83\u4eec\u5bf9\u4e0d\u540c\u6a21\u578b\u7684\u4e0d\u540c\u4f5c\u7528\u3002|\n", "2406.14563": "|**2024-06-20**|**Model Merging and Safety Alignment: One Bad Model Spoils the Bunch**|Hasan Abed Al Kader Hammoud et.al.|[2406.14563](http://arxiv.org/abs/2406.14563)|null|
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5408\u5e76\u662f\u4e00\u79cd\u7ecf\u6d4e\u9ad8\u6548\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u5c06\u591a\u4e2a\u4e13\u5bb6\u7ea7LLMs\u6574\u5408\u6210\u4e00\u4e2a\u5168\u80fd\u6a21\u578b\uff0c\u4fdd\u7559\u539f\u59cb\u6a21\u578b\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u65b9\u6cd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u5408\u5e76\u8fc7\u7a0b\u4e2d\u5b89\u5168\u5bf9\u9f50\u7684\u91cd\u8981\u6027\uff0c\u5bfc\u81f4\u751f\u6210\u7684\u6a21\u578b\u4e25\u91cd\u672a\u5bf9\u9f50\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u6a21\u578b\u5408\u5e76\u5bf9\u5bf9\u9f50\u6027\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u51e0\u79cd\u6d41\u884c\u7684\u6a21\u578b\u5408\u5e76\u6280\u672f\uff0c\u53d1\u73b0\u73b0\u6709\u65b9\u6cd5\u4e0d\u4ec5\u4f20\u9012\u4e86\u9886\u57df\u4e13\u4e1a\u77e5\u8bc6\uff0c\u8fd8\u4f20\u64ad\u4e86\u672a\u5bf9\u9f50\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u6b65\u6cd5\u89e3\u51b3\u65b9\u6848\uff1a(1) \u751f\u6210\u5408\u6210\u7684\u5b89\u5168\u6027\u548c\u9886\u57df\u7279\u5b9a\u6570\u636e\uff0c(2) \u5c06\u8fd9\u4e9b\u751f\u6210\u7684\u6570\u636e\u878d\u5165\u73b0\u6709\u7684\u6570\u636e\u9a71\u52a8\u7684\u6a21\u578b\u5408\u5e76\u4f18\u5316\u8fc7\u7a0b\u4e2d\u3002\u8fd9\u6837\uff0c\u6211\u4eec\u80fd\u591f\u5c06\u5bf9\u9f50\u6027\u89c6\u4e3a\u53ef\u4ee5\u6700\u5927\u5316\u4e8e\u5408\u5e76\u540eLLM\u4e2d\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u8bc1\u660e\u4e86\u5728\u5408\u5e76\u8fc7\u7a0b\u4e2d\u6574\u5408\u5bf9\u9f50\u76f8\u5173\u6570\u636e\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u662f\u65e2\u80fd\u4fdd\u6301\u9886\u57df\u4e13\u957f\u53c8\u80fd\u5b9e\u73b0\u826f\u597d\u5bf9\u9f50\u7684\u6a21\u578b\u3002|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon 
et.al.|[2406.14562](http://arxiv.org/abs/2406.14562)|null|\u5f53\u9762\u4e34\u6d89\u53ca\u89c6\u89c9\u601d\u7ef4\u7684\u95ee\u9898\u65f6\uff0c\u4eba\u7c7b\u4f1a\u81ea\u7136\u5730\u5207\u6362\u5230\u63a8\u7406\u6a21\u5f0f\uff0c\u5e38\u5e38\u5f62\u6210\u5fc3\u7406\u56fe\u50cf\u6216\u7ed8\u5236\u89c6\u89c9\u8f85\u52a9\u5de5\u5177\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6570\u5b66\u548c\u7b26\u53f7\u63a8\u7406\u65b9\u9762\u5c55\u73b0\u51fa\u826f\u597d\u8868\u73b0\uff0c\u901a\u8fc7\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u4e2d\u95f4\u63a8\u7406\u6b65\u9aa4\u7684\u94fe\u6761\u601d\u8003\uff0c\u4f46\u5728\u5904\u7406\u53ef\u4ee5\u901a\u8fc7\u89c6\u89c9\u63a8\u7406\u8f7b\u677e\u89e3\u7b54\u7684\u6587\u672c\u67e5\u8be2\u65f6\u4ecd\u5b58\u5728\u95ee\u9898\uff0c\u5373\u4f7f\u7ecf\u8fc7\u5927\u91cf\u7684\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u4e5f\u662f\u5982\u6b64\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u65b9\u6cd5\uff0c\u5373\u201c\u767d\u677f\u601d\u7ef4\u63d0\u793a\u201d\uff0c\u6765\u89e3\u9501\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8de8\u6a21\u6001\u4e2d\u7684\u89c6\u89c9\u63a8\u7406\u80fd\u529b\u3002\u767d\u677f\u601d\u7ef4\u63d0\u793a\u4e3a\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6bd4\u55bb\u6027\u7684\u201c\u767d\u677f\u201d\uff0c\u8ba9\u5176\u4ee5\u56fe\u50cf\u5f62\u5f0f\u5c55\u73b0\u63a8\u7406\u6b65\u9aa4\uff0c\u7136\u540e\u5c06\u8fd9\u4e9b\u56fe\u50cf\u8fd4\u56de\u6a21\u578b\u8fdb\u884c\u8fdb\u4e00\u6b65\u5904\u7406\u3002\u6211\u4eec\u53d1\u73b0\u8fd9\u79cd\u65b9\u6cd5\u65e0\u9700\u793a\u8303\u6216\u4e13\u7528\u6a21\u5757\uff0c\u800c\u662f\u5229\u7528\u6a21\u578b\u73b0\u6709\u7684\u4f7f\u7528Matplotlib\u548cTurtle\u7b49\u5e93\u7f16\u5199\u4ee3\u7801\u7684\u80fd\u529b\u3002\u8fd9\u4e2a\u7b80\u5355\u7b56\u7565\u5728\u56db\u4e2a\u6d89\u53ca\u89c6\u89c9\u548c\u7a7a\u95f4\u63a8\u7406\u7684\u56f0\u96be\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u7ed3\u679c\u3002\u6211\u4eec\u53d1\u73b0
\uff0c\u4f7f\u7528\u94fe\u5f0f\u601d\u8003\u7684GPT-4o\u5728\u67d0\u4e9b\u573a\u666f\u4e0b\u5927\u5e45\u5931\u8d25\uff0c\u5305\u62ec\u591a\u4e2a\u51c6\u786e\u7387\u4e3a0%\u7684\u60c5\u51b5\uff0c\u800c\u5728\u76f8\u540c\u573a\u666f\u4e0b\uff0c\u767d\u677f\u601d\u7ef4\u63d0\u793a\u80fd\u5c06\u51c6\u786e\u7387\u63d0\u5347\u81f3\u9ad8\u8fbe92%\u3002\u6211\u4eec\u8be6\u7ec6\u63a2\u8ba8\u4e86\u8be5\u6280\u672f\u7684\u6210\u529f\u4e4b\u5904\u53ca\u5176\u9519\u8bef\u6765\u6e90\u3002|\n", "2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556](http://arxiv.org/abs/2406.14556)|null|\u5c3d\u7ba1\u5b9e\u65f6\u89c4\u5212\u5668\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u4e3a\u63d0\u9ad8\u8fd0\u52a8\u89c4\u5212\u7684\u53ef\u89e3\u91ca\u6027\u548c\u53ef\u63a7\u6027\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002\u7136\u800c\uff0cLLM\u9a71\u52a8\u7684\u89c4\u5212\u5668\u4ecd\u9762\u4e34\u8d44\u6e90\u6d88\u8017\u5927\u548c\u63a8\u7406\u65f6\u95f4\u957f\u7684\u95ee\u9898\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5b9e\u7528\u90e8\u7f72\u3002\u9274\u4e8e\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AsyncDriver\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u5f02\u6b65LLM\u589e\u5f3a\u7684\u95ed\u73af\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u7684\u4e0e\u573a\u666f\u76f8\u5173\u7684\u6307\u4ee4\u7279\u5f81\uff0c\u6307\u5bfc\u5b9e\u65f6\u89c4\u5212\u5668\u8fdb\u884c\u7cbe\u786e\u548c\u53ef\u63a7\u7684\u8f68\u8ff9\u9884\u6d4b\u3002AsyncDriver\u5c55\u793a\u4e86LLMs\u5728\u7406\u89e3\u548c\u5904\u7406\u5411\u91cf\u5316\u573a\u666f\u6570\u636e\u53ca\u4e00\u7cfb\u5217\u8def\u7ebf\u6307\u793a\u65b9\u9762\u7684\u5f3a\u5927\u80fd\u529b\uff0c\u540c\u65f6\u901a\u8fc7\u5f02\u6b65\u8bbe\u8ba1\uff0c\u6709\u6548\u964d\u4f4e\u4e86LLM\u5e26\u6765\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u4fdd\u6301\u4e86\u4e0e\u4e4b\u76f8\u8fd1\u7684\u6027\u80fd
\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728nuPlan\u7684\u590d\u6742\u573a\u666f\u4e2d\u5b9e\u73b0\u4e86\u66f4\u4f18\u7684\u95ed\u73af\u8bc4\u4f30\u6027\u80fd\u3002|\n", "2406.14550": "|**2024-06-20**|**GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models**|Shilong Li et.al.|[2406.14550](http://arxiv.org/abs/2406.14550)|null|\u957f\u6587\u672c\u5904\u7406\u80fd\u529b\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u5bf9\u590d\u6742\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5df2\u6709\u591a\u65b9\u52aa\u529b\u4f18\u5316LLMs\u5904\u7406\u957f\u8f93\u5165\uff0c\u4f46\u4f9d\u7136\u9762\u4e34\u6311\u6218\u3002\u672c\u6587\u63d0\u51faGraphReader\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u56fe\u7684\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u6784\u5efa\u6587\u672c\u56fe\u5e76\u8ba9\u4ee3\u7406\u81ea\u4e3b\u63a2\u7d22\u6765\u5904\u7406\u957f\u6587\u672c\u3002\u5f53\u63a5\u6536\u5230\u95ee\u9898\u65f6\uff0c\u4ee3\u7406\u4f1a\u9010\u6b65\u5206\u6790\u5e76\u5236\u5b9a\u5408\u7406\u8ba1\u5212\uff0c\u7136\u540e\u8c03\u7528\u9884\u5b9a\u4e49\u51fd\u6570\u8bfb\u53d6\u8282\u70b9\u5185\u5bb9\u548c\u90bb\u5c45\u4fe1\u606f\uff0c\u5b9e\u73b0\u4ece\u7c97\u5230\u7ec6\u7684\u56fe\u63a2\u7d22\u3002\u5728\u63a2\u7d22\u8fc7\u7a0b\u4e2d\uff0c\u4ee3\u7406\u4e0d\u65ad\u8bb0\u5f55\u65b0\u53d1\u73b0\u5e76\u53cd\u601d\u5f53\u524d\u60c5\u51b5\uff0c\u4ee5\u4f18\u5316\u83b7\u53d6\u4fe1\u606f\u7684\u8fc7\u7a0b\uff0c\u76f4\u5230\u6536\u96c6\u8db3\u591f\u4fe1\u606f\u751f\u6210\u7b54\u6848\u3002\u5728LV-Eval\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u75284k\u4e0a\u4e0b\u6587\u7a97\u53e3\u7684GraphReader\u572816k\u5230256k\u7684\u957f\u6587\u672c\u957f\u5ea6\u4e0a\uff0c\u76f8\u5bf9\u4e8eGPT-4-128k\u6709\u663e\u8457\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u56db\u4e2a\u5355\u8df3\u548c\u591a\u8df3\u7684\u6311\u6218\u6027\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.14549": "|**2024-06-20**|**Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models**|Sunny Duan et.al.|[2406.14549](http://arxiv.org/abs/2406.14549)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5174\u8d77\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u53d1\u751f\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u4f46\u8fd9\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u9690\u79c1\u548c\u5b89\u5168\u7684\u91cd\u5927\u5fe7\u8651\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u5305\u542b\u6f5c\u5728\u654f\u611f\u6216\u4e13\u6709\u4fe1\u606f\u7684\u5927\u91cf\u8bed\u6599\u5e93\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u6570\u636e\u6cc4\u9732\u7684\u98ce\u9669\u2014\u2014\u5373\u6a21\u578b\u54cd\u5e94\u63ed\u793a\u90e8\u5206\u4fe1\u606f\u2014\u2014\u5c1a\u4e0d\u4e3a\u4eba\u5145\u5206\u7406\u89e3\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u673a\u5668\u5b66\u4e60\u6a21\u578b\u4e2d\u7684\u8bb0\u5fc6\u73b0\u8c61\uff0c\u7279\u522b\u662f\u5173\u6ce8\u5176\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u6f14\u53d8\u3002\u6211\u4eec\u8c03\u67e5\u4e86\u8bad\u7ec3\u6570\u636e\u7684\u7edf\u8ba1\u7279\u6027\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5185\u7f16\u7801\u7684\u8bb0\u5fc6\uff0c\u901a\u8fc7\u8bc4\u4f30\u91cd\u590d\u5bf9\u8bb0\u5fc6\u7684\u5f71\u54cd\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6a21\u578b\u8bb0\u4f4f\u4e00\u4e2a\u5e8f\u5217\u7684\u6982\u7387\u4e0e\u5b83\u5728\u6570\u636e\u4e2d\u51fa\u73b0\u7684\u6b21\u6570\u5448\u5bf9\u6570\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u5373\u4f7f\u6ca1\u6709\u540e\u7eed\u7684\u63a5\u89e6\uff0c\u67d0\u4e9b\u770b\u4f3c\u672a\u88ab\u8bb0\u4f4f\u7684\u5e8f\u5217\u4e5f\u53ef\u80fd\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u9010\u6e10\u663e\u73b0\u3002\u8fd9\u79cd\u9690\u85cf\u7684\u5df2\u8bb0\u4f4f\u5e8f\u5217\u5bf9\u6570\u636e\u9690\u79c1\u6784\u6210\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u53ef\u80fd\u9690\u85cf\u5728\u6a21\u578b\u7684\u6700\u7ec8\u68c0\u67e5\u70b9\u4e2d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8bca\u65ad\u6d4b\u8bd5\uff0c\u901a\u8fc7\u8003\u8651\u5b83\u4eec\u7684\u4ea4\u53c9\u71b5\u635f\u5931\u6765\u63ed\u793a\u8fd9\u4e9b\u6f5c\u5728\u7684\u8bb0\u5fc6\u5e8f\u5217\u3002|\n", "2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein et.al.|[2406.14546](http://arxiv.org/abs/2406.14546)|**[link](https://github.com/choidami/inductive-oocr)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b89\u5168\u98ce\u9669\uff0c\u4e00\u4e2a\u7b56\u7565\u662f\u4ece\u5176\u8bad\u7ec3\u6570\u636e\u4e2d\u5220\u9664\u5371\u9669\u77e5\u8bc6\u3002\u5c3d\u7ba1\u8fd9\u6d88\u9664\u4e86\u663e\u6027\u4fe1\u606f\uff0c\u4f46\u9690\u6027\u4fe1\u606f\u53ef\u80fd\u4ecd\u6563\u843d\u5728\u591a\u4e2a\u8bad\u7ec3\u6587\u6863\u4e2d\u3002\u6211\u4eec\u7814\u7a76\u7684\u95ee\u9898\u662f\uff1aLLMs\u80fd\u5426\u901a\u8fc7\u62fc\u51d1\u8fd9\u4e9b\u9690\u542b\u7ebf\u7d22\uff0c\u63a8\u65ad\u51fa\u88ab\u5c4f\u853d\u7684\u77e5\u8bc6\uff1f\u4e3a\u6b64\uff0c\u6211\u4eec\u4e13\u6ce8\u4e8e\u65e0\u4e0a\u4e0b\u6587\u5f52\u7eb3\u63a8\u7406\uff08Inductive Out-of-Context 
Reasoning\uff0cOOCR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u6cdb\u5316\u80fd\u529b\uff0c\u8981\u6c42LLMs\u6839\u636e\u5206\u5e03\u5728\u8bad\u7ec3\u6587\u6863\u4e2d\u7684\u8bc1\u636e\u63a8\u65ad\u6f5c\u5728\u4fe1\u606f\uff0c\u5e76\u5728\u65e0\u9700\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u60c5\u51b5\u4e0b\u5e94\u7528\u4e8e\u4e0b\u6e38\u4efb\u52a1\u3002\u901a\u8fc7\u4e94\u4e2a\u4efb\u52a1\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u524d\u6cbfLLMs\u786e\u5b9e\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u5728\u4e00\u9879\u5b9e\u9a8c\u4e2d\uff0c\u4ec5\u5bf9\u4e00\u4e2a\u672a\u77e5\u57ce\u5e02\u4e0e\u5176\u4ed6\u5df2\u77e5\u57ce\u5e02\u4e4b\u95f4\u7684\u8ddd\u79bb\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u6ca1\u6709\u793a\u4f8b\u6216\u94fe\u5f0f\u601d\u8003\uff0c\u8be5LLM\u4e5f\u80fd\u8868\u8ff0\u51fa\u672a\u77e5\u57ce\u5e02\u662f\u5df4\u9ece\uff0c\u5e76\u636e\u6b64\u89e3\u7b54\u540e\u7eed\u95ee\u9898\u3002\u8fdb\u4e00\u6b65\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u63a5\u53d7\u5355\u4e2a\u786c\u5e01\u629b\u63b7\u7ed3\u679c\u8bad\u7ec3\u7684LLMs\u80fd\u5224\u65ad\u786c\u5e01\u662f\u5426\u504f\u659c\uff0c\u800c\u53ea\u63a5\u89e6$(x, f(x))$\u5bf9\u7684\u6a21\u578b\u80fd\u9610\u8ff0$f$\u7684\u5b9a\u4e49\u5e76\u8ba1\u7b97\u9006\u8fd0\u7b97\u3002\u867d\u7136OOCR\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8868\u73b0\u826f\u597d\uff0c\u4f46\u6211\u4eec\u4e5f\u53d1\u73b0\u5b83\u5e76\u4e0d\u603b\u662f\u53ef\u9760\u7684\uff0c\u7279\u522b\u662f\u5728\u5c0f\u578bLLMs\u5b66\u4e60\u590d\u6742\u7ed3\u6784\u65f6\u3002\u603b\u7684\u6765\u8bf4\uff0cLLMs\u65e0\u9700\u660e\u786e\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u5c31\u80fd\u201c\u4e32\u8054\u8d77\u201d\u4fe1\u606f\uff0c\u8fd9\u7ed9\u76d1\u63a7\u548c\u63a7\u5236\u5b83\u4eec\u83b7\u53d6\u7684\u77e5\u8bc6\u5e26\u6765\u4e86\u6f5c\u5728\u6311\u6218\u3002**|\n", "2406.14545": "|**2024-06-20**|**Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in 
Text-to-SQL Systems**|\u0110or\u0111e Klisura et.al.|[2406.14545](http://arxiv.org/abs/2406.14545)|null|\u5173\u7cfb\u6570\u636e\u5e93\u5728\u73b0\u4ee3\u4fe1\u606f\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u662f\u5b58\u50a8\u3001\u67e5\u8be2\u548c\u7ba1\u7406\u6570\u636e\u7684\u6838\u5fc3\u3002\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u6587\u672c\u5230SQL\u6280\u672f\u5d2d\u9732\u5934\u89d2\uff0c\u6781\u5927\u5730\u63d0\u5347\u4e86\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u9690\u79c1\u548c\u5b89\u5168\u7684\u62c5\u5fe7\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e13\u6ce8\u4e8e\u63d0\u53d6\u6587\u672c\u5230SQL\u6a21\u578b\u6240\u4f9d\u8d56\u7684\u6570\u636e\u5e93\u6a21\u5f0f\u5143\u7d20\u3002\u4e86\u89e3\u6a21\u5f0f\u53ef\u80fd\u4f7fSQL\u6ce8\u5165\u653b\u51fb\u66f4\u4e3a\u5bb9\u6613\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u96f6\u77e5\u8bc6\u6846\u67b6\uff0c\u901a\u8fc7\u63d0\u51fa\u7cbe\u5fc3\u6784\u9020\u7684\u95ee\u9898\uff0c\u65e0\u9700\u76f4\u63a5\u4e86\u89e3\u6570\u636e\u5e93\uff0c\u8be5\u6846\u67b6\u80fd\u4fc3\u4f7f\u8fd9\u4e9b\u6a21\u578b\u5904\u7406\u8fd9\u4e9b\u95ee\u9898\u5e76\u751f\u6210\u8f93\u51fa\uff0c\u4ece\u800c\u63ed\u793a\u6570\u636e\u5e93\u6a21\u5f0f\u7ed3\u6784\u3002\u6211\u4eec\u5c06\u6b64\u65b9\u6cd5\u5e94\u7528\u4e8e\u9488\u5bf9\u6587\u672c-SQL\u5bf9\u8fdb\u884c\u8fc7\u5fae\u8c03\u7684\u4e13\u7528\u6587\u672c\u5230SQL\u6a21\u578b\u4ee5\u53ca\u7528\u4e8eSQL\u751f\u6210\u7684\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u80fd\u591f\u4ee5\u63a5\u8fd10.75\u7684F1\u5206\u6570\u91cd\u6784\u8868\u540d\uff0c\u800c\u5bf9\u4e8e\u751f\u6210\u5f0f\u6a21\u578b\uff0c\u8fd9\u4e00\u5206\u6570\u66f4\u662f\u9ad8\u8fbe0.96\u3002|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of 
VLMs**|Yuxuan Qiao et.al.|[2406.14544](http://arxiv.org/abs/2406.14544)|**[link](https://github.com/sparksjoe/prism)**|**\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5904\u7406\u5404\u79cd\u89c6\u89c9\u95ee\u9898\u65f6\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\uff0c\u8fd9\u8981\u6c42\u6a21\u578b\u5177\u5907\u5f3a\u5927\u7684\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u611f\u77e5\u548c\u63a8\u7406\u5728\u73b0\u6709VLM\u4e2d\u7684\u4ea4\u7ec7\u6027\uff0c\u72ec\u7acb\u8bc4\u4f30\u8fd9\u4e24\u65b9\u9762\u7684\u80fd\u529b\u9887\u5177\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\u2014\u2014Prism\uff0c\u65e8\u5728\u5206\u79bb\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u5728\u89c6\u89c9\u95ee\u7b54\u4e2d\u7684\u4f5c\u7528\u3002Prism\u5206\u4e3a\u4e24\u4e2a\u9636\u6bb5\uff1a\u611f\u77e5\u9636\u6bb5\u5229\u7528VLM\u63d0\u53d6\u5e76\u4ee5\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u89c6\u89c9\u4fe1\u606f\uff1b\u63a8\u7406\u9636\u6bb5\u5219\u6839\u636e\u63d0\u53d6\u7684\u89c6\u89c9\u4fe1\u606f\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u54cd\u5e94\u3002\u8fd9\u79cd\u6a21\u5757\u5316\u8bbe\u8ba1\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u7cfb\u7edf\u5730\u6bd4\u8f83\u548c\u8bc4\u4f30\u4e0d\u540cVLM\u7684\u611f\u77e5\u548c\u63a8\u7406\u6027\u80fd\u3002 
\u6211\u4eec\u7684\u5206\u6790\u6846\u67b6\u63d0\u4f9b\u4e86\u8bf8\u591a\u6d1e\u89c1\uff0c\u8bc1\u660e\u4e86Prism\u4f5c\u4e3a\u6210\u672c\u6548\u76ca\u9ad8\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u89e3\u51b3\u65b9\u6848\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5c06\u4e13\u6ce8\u4e8e\u611f\u77e5\u7684\u7b80\u5316VLM\u4e0e\u4e13\u4e3a\u63a8\u7406\u8bbe\u8ba1\u7684\u5f3a\u5927LLM\u76f8\u7ed3\u5408\uff0cPrism\u5728\u901a\u7528\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4f18\u5f02\u6210\u7ee9\uff0c\u540c\u65f6\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u548c\u8fd0\u8425\u6210\u672c\u3002\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\uff0c\u5f53Prism\u914d\u5907\u57fa\u7840\u76842B LLaVA VLM\u548c\u514d\u8d39\u53ef\u7528\u7684GPT-3.5\u65f6\uff0c\u5176\u5728\u4e25\u8c28\u7684\u591a\u6a21\u6001\u57fa\u51c6MMStar\u4e0a\u7684\u8868\u73b0\u53ef\u4e0e\u5927\u5341\u500d\u7684VLM\u76f8\u5f53\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728\uff1ahttps://github.com/SparksJoe/Prism\u3002**|\n", "2406.14541": "|**2024-06-21**|**Are LLMs Naturally Good at Synthetic Tabular Data Generation?**|Shengzhe Xu 
et.al.|[2406.14541](http://arxiv.org/abs/2406.14541)|**[link](https://github.com/anonymou9167/anonymouscode)**|**Large language models (LLMs) excel at generating text and images, but their potential for generating tabular data, the most common data type, has received little study. This paper shows that LLMs, whether used as-is or after conventional fine-tuning, perform very poorly as synthetic table generators. Because of the autoregressive nature of LLMs, random-order permutation during fine-tuning runs counter to the importance of capturing functional dependencies, leaving LLMs unable to model conditional mixtures of distributions, which are key to reflecting real-world constraints. We show how these shortcomings can be mitigated, and performance improved, by making LLMs permutation-aware.**|\n", "2406.14517": "|**2024-06-20**|**PostMark: A Robust Blackbox Watermark for Large Language Models**|Yapei Chang 
et.al.|[2406.14517](http://arxiv.org/abs/2406.14517)|**[link](https://github.com/lilakk/postmark)**|**The most effective way to detect text generated by large language models (LLMs) is to embed identifiable signals, known as watermarks, during decoding. However, most existing methods require access to the LLM's logits, which LLM providers are reluctant to share for fear of model leakage; as a result, these watermarks must be implemented independently by each provider. This paper introduces PostMark, a modular post-hoc watermarking scheme that inserts the watermark after generation. It requires no logit access and can therefore be implemented by a third party. PostMark is more robust to paraphrasing attacks than existing methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. We also evaluate PostMark's impact on text quality with both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. Our code, outputs, and annotations are available at https://github.com/lilakk/PostMark.**|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu 
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**Recent advances in machine learning have substantially improved the identification of disease-associated genes from gene expression data. However, these processes often require extensive expertise and manual effort, which limits their scalability. Agents powered by large language models (LLMs) show promise for automating such tasks as their problem-solving abilities continue to improve. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark for the automated exploration of gene expression data that covers dataset selection, preprocessing, and statistical analysis tasks. GenoTEX provides complete analysis pipelines with annotations carefully curated by human bioinformaticians, who analyze the datasets in depth to ensure accuracy and reliability. 
To provide baselines for these tasks, we design GenoAgents, a team of LLM-based agents equipped with context-aware planning, iterative correction, and domain expert consultation, which collaborate to explore gene datasets. Our experiments demonstrate the potential of LLM-based approaches for genomics data analysis, while error analysis points to open challenges and directions for future improvement. We propose GenoTEX as a promising resource for benchmarking and improving AI-driven methods for genomics data analysis. Our benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.**|\n", "2406.15330": "|**2024-06-21**|**Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance**|Haoling Li 
et.al.|[2406.15330](http://arxiv.org/abs/2406.15330)|null|Large language models (LLMs) have revolutionized many fields of research. Although fine-tuning is widely known to be essential for enhancing LLM capabilities, existing research suggests that the fine-tuning process may involve redundant parameters. Some studies therefore propose updating only a subset of parameters, but these approaches fail to exploit task-specific information to identify the parameters that matter during training. Since gradients inherently carry information about task-specific data, we propose Gradient-Mask Tuning (GMT), a method that selectively updates parameters during training according to their gradient information. Specifically, we compute the absolute values of the gradients and mask those with relatively small magnitudes. Our experiments show that GMT not only outperforms conventional fine-tuning methods but also raises the upper limit of LLM performance. Further analysis shows that GMT is relatively insensitive to the mask ratio and has computational efficiency comparable to vanilla supervised fine-tuning (SFT).|\n", "2406.15325": "|**2024-06-21**|**Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks**|Hokyung Lee 
et.al.|[2406.15325](http://arxiv.org/abs/2406.15325)|**[link](https://github.com/hamminghq/bug-in-the-code-stack)**|Recent research on Needle-in-a-Haystack (NIAH) benchmarks has explored the ability of large language models (LLMs) to retrieve contextual information from large text documents. As LLMs become increasingly integrated into software development workflows, evaluating their performance in code-based environments becomes crucial. As LLMs move toward program synthesis, we must ensure that they understand syntax and can write syntactically correct code. To this end, we introduce Bug In The Code Stack (BICS), a benchmark that assesses the ability of LLMs to identify simple syntax bugs within large source code. Our findings reveal three key insights: (1) code-based environments pose a greater challenge for retrieval tasks than text-based environments; (2) performance differs substantially across models; and (3) longer context lengths are associated with performance degradation, although the extent of this degradation varies between models.|\n", "2406.15264": "|**2024-06-21**|**Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics**|Weijia Zhang 
et.al.|[2406.15264](http://arxiv.org/abs/2406.15264)|null|Large language models (LLMs) often produce unsupported or unverifiable information, commonly called hallucinations. To mitigate this, retrieval-augmented LLMs add citations that ground their content in verifiable sources. However, manually assessing whether citations adequately support the associated statements remains a significant challenge. Previous studies have used faithfulness metrics to estimate citation support automatically, but they are limited to binary classification and overlook fine-grained citation support in practical scenarios. To investigate how effective faithfulness metrics are in fine-grained settings, we propose a comparative evaluation framework that assesses how well these metrics distinguish among three levels of citation support: full, partial, and no support. Our framework combines correlation analysis, classification evaluation, and retrieval evaluation to measure comprehensively the alignment between metric scores and human judgments. Our results show that no single metric consistently excels across all evaluations, revealing the complexity of assessing fine-grained support. Based on these findings, we offer practical recommendations for developing more effective metrics.|\n", "2406.15231": "|**2024-06-21**|**Detecting Synthetic Lyrics with Few-Shot Inference**|Yanis Labrak 
et.al.|[2406.15231](http://arxiv.org/abs/2406.15231)|null|In recent years, generated music content has attracted growing attention, and large language models have been used effectively to write lyrics in diverse styles, themes, and linguistic structures. This supports artists' creativity but also raises concerns about copyright infringement, consumer satisfaction, and content spamming, so methods for detecting generated lyrics have become essential. However, existing work has not focused on this specific domain or on detecting machine-generated creative text. To fill this gap, we curate the first high-quality dataset of synthetic lyrics and conduct a comprehensive quantitative evaluation of various few-shot detection methods, testing their generalization capabilities and complementing the results with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods that perform strongly in other domains: it distinguishes human-written from machine-generated lyrics, generalizes well across artists and models, and effectively detects human paraphrasing applied after generation. This study underscores the need for further research on creative content detection, particularly regarding generalization and scalability to larger song catalogs. All datasets, preprocessing scripts, and code are publicly available on GitHub and Hugging Face under the Apache 2.0 license.|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga 
et.al.|[2406.15227](http://arxiv.org/abs/2406.15227)|null|With the spread of misinformation and harmful speech online, effective counter-narrative (CN) generation techniques are urgently needed. However, existing automatic evaluation methods often lack interpretability and fail to capture the nuanced relationship between generated CNs and human perception. This paper therefore proposes a novel approach to evaluating generated CNs that uses a large language model (LLM) as the evaluator. By comparing generated CNs pairwise in a tournament-style format, we establish a model-ranking pipeline that achieves a correlation of 0.88 with human preferences. We also explore LLMs as zero-shot (ZS) CN generators, comparing the performance and limitations of chat, instruct, and base models. Through a meticulous evaluation that includes fine-tuning experiments, we uncover differences in how these models respond to domain-specific data. We conclude that chat-aligned ZS models may be the best option for this task, provided they do not refuse to generate an answer due to safety concerns.|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214](http://arxiv.org/abs/2406.15214)|null|
Dialogue policies play a vital role in building task-oriented dialogue systems, yet developing and maintaining them often requires substantial effort from dialogue modeling experts. Although large amounts of conversational data are available in many cases, effective methods for extracting dialogue policies from this data are lacking. This paper addresses that gap by showing how large language models (LLMs) can be used to convert conversational data into a unified intermediate representation, namely canonical forms. We then propose a novel technique for generating dialogue policies using controllable and interpretable graph-based methods. By combining the canonical forms of conversations into a flow network, we find that running graph traversal algorithms helps extract dialogue flows. These flows reflect the underlying interactions better than flows extracted by LLMs alone. Our technique aims to give conversation designers greater control and offers a tool for improving the efficiency of dialogue policy development.|\n", "2406.15209": "|**2024-06-21**|**Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding**|Mohan Li et.al.|[2406.15209](http://arxiv.org/abs/2406.15209)|null|
Zero-shot spoken language understanding (SLU) enables systems to understand user utterances in new domains without prior training data. Current studies often rely on large language models (LLMs), which leads to large storage footprints and high complexity. This paper proposes using Whisper, a standalone speech processing model, for zero-shot end-to-end (E2E) SLU. To handle unseen semantic labels, we integrate SLU into a question-answering (QA) framework and prompt the Whisper decoder for semantic inference. The system is trained efficiently with prefix-tuning, optimizing only a small number of parameters rather than the entire Whisper model. Experiments show that the proposed system improves slot-filling SLU-F1 on SLURP by 40.7% over a recently introduced zero-shot benchmark. It also performs comparably to a Whisper-GPT-2 modular system in both in-corpus and cross-corpus evaluation settings while using 34.8% fewer model parameters.|\n", "2406.15198": "|**2024-06-21**|**Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms**|Santiago Berrezueta-Guzman 
et.al.|[2406.15198](http://arxiv.org/abs/2406.15198)|null|Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental condition characterized by inattention, hyperactivity, and impulsivity, which can significantly affect an individual's daily life and quality of life. Occupational therapy plays a crucial role in managing ADHD by fostering the skills needed for daily living and enhancing an individual's ability to participate fully in school, home, and social settings. Recent studies highlight the potential of large language models (LLMs) such as ChatGPT, and of socially assistive robots, to improve psychological treatment by addressing the limitations of existing therapies with personalized support that adapts to each individual's unique needs. However, research on the combined use of these advanced technologies in ADHD therapy remains scarce. We therefore integrated two advanced language models, ChatGPT-4 Turbo and Claude-3 Opus, into a robotic assistant to examine how each performs in robot-assisted interactions, and compared them against a clinically validated customized model in a simulated therapy scenario. The results show that ChatGPT-4 Turbo excels in performance and responsiveness, making it suitable for time-sensitive applications. Claude-3 Opus, by contrast, shows strengths in understanding, coherence, and ethical considerations, emphasizing safe and engaging interactions. Both models demonstrate innovation and adaptability, but ChatGPT-4 Turbo offers easier integration and broader language support. The choice between them depends on the specific needs of ADHD therapy.|\n", "2406.15187": "|**2024-06-21**|**UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis**|Yulong Hui 
et.al.|[2406.15187](http://arxiv.org/abs/2406.15187)|**[link](https://github.com/qinchuanhui/uda-benchmark)**|**Although retrieval-augmented generation (RAG) has improved the ability of large language models (LLMs) to work with external data, it still faces many challenges in real-world scenarios. In areas such as academic literature and finance question answering, data often takes the form of lengthy, complexly structured text and tables in HTML or PDF format. To address this, we propose a new benchmark, Unstructured Document Analysis (UDA), which contains 2,965 real-world documents and 29,590 expert-annotated question-answer pairs. We revisit the design choices of LLM- and RAG-based approaches to document analysis and evaluate answer quality and strategies across multiple document domains and diverse query types. Our evaluation yields interesting findings and highlights the importance of data parsing and retrieval. We hope the benchmark can shed light on, and better serve, real-world document analysis applications. The benchmark suite and code are publicly available.**|\n", "2406.16858": "|**2024-06-24**|**EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees**|Yuhui Li 
et.al.|[2406.16858](http://arxiv.org/abs/2406.16858)|**[link](https://github.com/safeailab/eagle)**|Inference with modern large language models (LLMs) is expensive and time-consuming, and speculative sampling methods such as EAGLE have proven to be effective. Conventional methods assume that the acceptance rate of draft tokens in the draft tree depends only on their position, but we find that it also depends on context. Building on EAGLE, we therefore propose EAGLE-2, which introduces a new context-aware dynamic draft tree into drafting modeling. This improvement exploits the fact that EAGLE's draft model is well calibrated: its confidence scores approximate acceptance rates with small errors. We conduct extensive evaluations on three series of LLMs and six tasks; EAGLE-2 achieves speedup ratios of 3.05x to 4.26x, which is 20% to 40% faster than EAGLE-1. EAGLE-2 also leaves the distribution of the generated text unchanged, making it a lossless acceleration algorithm.|\n", "2406.16838": "|**2024-06-24**|**From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models**|Sean Welleck 
et.al.|[2406.16838](http://arxiv.org/abs/2406.16838)|null|One of the most striking findings in modern research is that scaling up compute during the training of large language models (LLMs) leads to better results. In contrast, inference-time methods have received comparatively little attention. This survey focuses on these inference-time approaches. Within a unified mathematical formalism, we explore three areas: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve generation speed. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika 
Marreddy et.al.|[2406.16833](http://arxiv.org/abs/2406.16833)|null|Identifying users' opinions and stances in long conversation threads on various topics is crucial for personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. However, manually annotating data to train models for this purpose is challenging: it is time-consuming and costly, long conversations can introduce noise, and subtle shifts in users' opinions can make interpretation difficult. Given the strong performance of large language models (LLMs) on complex natural language processing tasks, we use Mistral Large and GPT-4 to automate the annotation of two key tasks, along with reasoning for each label: (1) user stance classification, which labels the stance of a user's post in a conversation on a five-point scale, and (2) user dogmatism classification, which labels a user's overall opinion across the whole conversation on a four-point scale. We build the USDC dataset by taking majority votes over zero-shot, one-shot, and few-shot annotations for 764 multi-user Reddit conversations. We then use this dataset to fine-tune and instruction-tune several small language models suitable for deployment on 5-class stance and 4-class dogmatism classification tasks. Our code and dataset are available at https://anonymous.4open.science/r/USDC-0F7F.|\n", "2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828](http://arxiv.org/abs/2406.16828)|**[link](https://github.com/castorini/ragnarok)**|
Have you tried the new Bing Search or Google AI Overviews? Both reflect how modern search engines are evolving into systems based on retrieval-augmented generation (RAG). Such systems integrate real-time data into large language models (LLMs) to provide informative, attributed, and concise summaries, in contrast to the traditional presentation of a ranked list of documents. To foster innovation in the evaluation of RAG systems, we therefore propose a RAG track at TREC 2024. This paper describes how we achieve this: we present the design of the reusable framework Ragnar\u00f6k, explain the choice of the MS MARCO V2.1 corpus, release the development topics for the track, and standardize the interface definitions to make things easier for users. Using Ragnar\u00f6k, we present key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. We also provide a web interface for interactively comparing the performance of different RAG systems through crowdsourced evaluation. We open-source the Ragnar\u00f6k framework and baselines to establish a unified standard for future RAG systems.|\n", "2406.16801": "|**2024-06-24**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801](http://arxiv.org/abs/2406.16801)|**[link](https://github.com/qurrent-ai/res-q)**|**
The instruction-following abilities of large language models (LLMs) have given rise to a class of systems that can handle complex tasks such as making edits to large code repositories. Because LLMs are highly sensitive and unpredictable in response to small changes in prompts, robust evaluation tools are urgently needed to drive the future development of these systems. We propose RES-Q, a natural-language instruction-based benchmark for evaluating Repository Editing Systems, which consists of 100 repository editing tasks derived from 100 real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to gather information and construct an edit that satisfies the criteria set by the instruction. We argue that evaluating LLMs in this way addresses issues with traditional benchmarks and provides a more holistic assessment of a model's abilities. We built a repository-editing system with Qurrent OS, a language agent development software, and used it to evaluate various state-of-the-art LLMs, including Claude Sonnet 3.5 and GPT-4o. Although the two models differ by only 1% pass@1 on HumanEval, Claude Sonnet 3.5 outperforms GPT-4o by 12% pass@1 on RES-Q, indicating that RES-Q can differentiate model capability and offer deeper insight as traditional benchmarks approach saturation. We further investigate token efficiency, performance relationships with existing benchmarks, and interesting disparities between closed-source and open-source LLMs. Code and dataset are available at https://github.com/Qurrent-AI/RES-Q.**|\n", "2406.16797": "|**2024-06-24**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797](http://arxiv.org/abs/2406.16797)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|**
Current methods for adapting large language models (LLMs) to new tasks are ill-suited to multi-task adaptation because they modify all model weights, causing destructive interference between tasks. This can lead to forgetting of earlier tasks, making it hard to achieve good performance on multiple tasks at once. To address this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes a sparse subnetwork of the model. We evaluate LoTA on a range of challenging tasks such as instruction following, reasoning, math, and summarization. 
LoTA\u901a\u8fc7\u53d1\u73b0\u548c\u4f18\u5316\u201c\u5f69\u7968\u5238\u201d\uff08\u6216\u7a00\u758f\u4efb\u52a1\u5411\u91cf\uff09\u6765\u5b9e\u73b0\uff0c\u8fd9\u79cd\u65b9\u6cd5\u4f18\u4e8e\u5168\u91cf\u5fae\u8c03\u548c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u3002LoTA\u4e0d\u4ec5\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8fd8\u80fd\u5728\u8bad\u7ec3\u5176\u4ed6\u4efb\u52a1\u540e\u4fdd\u6301\u826f\u597d\u7684\u8868\u73b0\uff0c\u4ece\u800c\u907f\u514d\u4e86\u707e\u96be\u6027\u9057\u5fd8\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u63d0\u53d6\u548c\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\uff0cLoTA\u8fd8\u652f\u6301\u5728\u9ad8\u5ea6\u4e0d\u540c\u7684\u4efb\u52a1\u95f4\u8fdb\u884c\u6a21\u578b\u878d\u5408\u3002 ## \u7ed3\u8bba \u603b\u7684\u6765\u8bf4\uff0cLoTA\u4f5c\u4e3a\u4e00\u79cd\u6709\u6548\u7684\u7a00\u758f\u9002\u5e94\u7b56\u7565\uff0c\u4e3a\u591a\u4efb\u52a1\u5927\u8bed\u8a00\u6a21\u578b\u7684\u9002\u5e94\u63d0\u4f9b\u4e86\u65b0\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u5728\u5904\u7406\u591a\u4e2a\u4efb\u52a1\u65f6\u4fdd\u6301\u7a33\u5b9a\u4e14\u9ad8\u6548\u7684\u8868\u73b0\u3002**|\n", "2406.16783": "|**2024-06-24**|**M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models**|Rishabh Maheshwary et.al.|[2406.16783](http://arxiv.org/abs/2406.16783)|null|## \u80cc\u666f \u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u6307\u4ee4\u7684\u6821\u51c6\u8fc7\u7a0b\u4e2d\uff0c\u5fae\u8c03\uff08finetuning, 
IFT\uff09\u81f3\u5173\u91cd\u8981\u3002\u8fd1\u671f\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u4e9b\u6709\u6548\u7684IFT\u6570\u636e\u96c6\uff0c\u4f46\u5927\u591a\u96c6\u4e2d\u5728\u9ad8\u8d44\u6e90\u8bed\u8a00\u5982\u82f1\u8bed\u4e0a\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u521b\u65b0\u6027\u5730\u63d0\u51fa\u4e00\u4e2a\u5168\u5408\u6210\u7684\u3001\u57fa\u4e8eEvol\u5206\u7c7b\u6cd5\u5f15\u5bfc\u7684\u591a\u8bed\u8a00\u3001\u591a\u8f6e\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6\u2014\u2014M2Lingual\uff0c\u76ee\u6807\u662f\u63d0\u5347LLMs\u5728\u591a\u6837\u8bed\u8a00\u548c\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002M2Lingual\u5171\u5305\u542b182,000\u4e2aIFT\u5bf9\uff0c\u6e90\u81ea\u4e0d\u540c\u79cd\u5b50\uff0c\u6db5\u76d670\u79cd\u8bed\u8a00\u300117\u4e2aNLP\u4efb\u52a1\u4ee5\u53ca\u901a\u7528\u7684\u6307\u4ee4-\u54cd\u5e94\u5bf9\u3002 ## \u76ee\u7684\u4e0e\u8d21\u732e \u4f7f\u7528M2Lingual\u8fdb\u884c\u8bad\u7ec3\u7684LLMs\u6027\u80fd\u663e\u8457\u4f18\u4e8e\u5927\u591a\u6570\u73b0\u6709\u7684\u591a\u8bed\u8a00IFT\u6570\u636e\u96c6\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u7ecfM2Lingual\u5fae\u8c03\u7684\u6a21\u578b\u5728\u5404\u79cd\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u7a33\u5065\u7684\u8de8\u8bed\u8a00\u80fd\u529b\uff0c\u65e0\u8bba\u662f\u5728\u6211\u4eec\u7684\u591a\u8bed\u8a00\u3001\u591a\u8f6e\u7ffb\u8bd1\u8bc4\u4ef7\u57fa\u51c6\u4e0a\uff0c\u8fd8\u662f\u5728\u591a\u79cd\u591a\u6837\u7684\u591a\u8bed\u8a00\u4efb\u52a1\u4e2d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8d21\u732e\u4e86Evol\u5206\u7c7b\u6cd5\u7684\u4e24\u6b65\u65b9\u6cd5\uff0c\u5e76\u516c\u5f00\u4e86M2Lingual\u7684\u6570\u636e\u96c6\uff1ahttps://huggingface.co/datasets/ServiceNow-AI/M2Lingual\u3002|\n", "2406.16779": "|**2024-06-24**|**It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension**|Sagi Shaier 
et.al.|[2406.16779](http://arxiv.org/abs/2406.16779)|null|Natural language processing has made remarkable progress over the past decade. However, some practices became established without thorough evaluation. Focusing on reading comprehension, we first ask: 1) How does the order of the inputs (i.e., question and context) affect model performance? Motivated by recent work on input emphasis, we further ask: 2) Does emphasizing the question, the context, or both improve performance? Testing 9 large language models on 3 datasets, we find that presenting the context before the question improves model performance, with accuracy gains of up to 31%. Emphasizing the context also works better than emphasizing the question, and emphasizing parts of the input is particularly effective for questions the model lacks the parametric knowledge to answer. Experimenting with both prompt-based and attention-based emphasis methods, we find that the most effective strategy is surprisingly simple: appending a few tokens to the input improves accuracy by up to 36%, allowing smaller models to outperform their much larger counterparts.|\n", "2406.16777": "|**2024-06-24**|**Blending LLMs into 
Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024**|Sai Koneru et.al.|[2406.16777](http://arxiv.org/abs/2406.16777)|null|## Background Large language models (LLMs) are being actively explored for tasks such as automatic speech recognition (ASR), machine translation (MT), and even end-to-end speech translation (ST). This paper describes KIT's offline submission to the constrained + LLM track, in which we improved our cascaded speech translation system by incorporating recently proposed techniques. Specifically, we integrate Mistral-7B\\footnote{mistralai/Mistral-7B-Instruct-v0.1} into the system to enhance it in two ways: first, by refining the ASR outputs using N-best lists generated by our system, with the LLM fine-tuned to predict the transcript more accurately; second, by refining the MT outputs at the document level, using both the ASR and MT predictions to improve translation quality. Integrating the LLM reduces the ASR Word Error Rate by 0.3% absolute and raises the MT COMET score by 0.65%. However, on challenging test sets with overlapping speakers and background noise, integrating the LLM brings no clear benefit because of poor ASR performance. To recover context that may be missing in such cases, we use ASR with chunked long-form decoding.|\n", "2406.16768": "|**2024-06-24**|**WARP: On the Benefits of Weight Averaged Rewarded Policies**|Alexandre Ram\u00e9 et.al.|[2406.16768](http://arxiv.org/abs/2406.16768)|null|Reinforcement learning from human feedback (RLHF) adjusts large language models (LLMs) by training a reward model, so that their generations match human preferences. To preserve pre-trained knowledge, RLHF usually applies KL regularization, but this constrains reward optimization. To address this, we propose a novel alignment strategy called Weight Averaged Rewarded Policies (WARP). WARP merges policies in weight space at three distinct stages. First, it uses an exponential moving average of the policy as a dynamic anchor for the KL regularization. Second, it applies spherical interpolation to merge independently fine-tuned policies into a new, enhanced one. Third, it linearly interpolates between the merged model and the initialization to recover features from pre-training. The procedure is applied iteratively, with each iteration's final model used as an advanced initialization for the next, progressively improving the KL-reward trade-off and achieving higher rewards at a fixed KL. Experiments with GEMMA policies validate WARP's benefits: the resulting policies improve in quality and alignment and outperform other open-source LLMs.|\n", "2406.17770": "|**2024-06-25**|**MG-LLaVA: Towards Multi-Granularity 
Visual Instruction Tuning**|Xiangyu Zhao et.al.|[2406.17770](http://arxiv.org/abs/2406.17770)|**[link](https://github.com/phoenixz810/mg-llava)**|**## Background Multi-modal large language models (MLLMs) have made significant progress on visual understanding tasks. However, most of them are limited to processing low-resolution images, which restricts their effectiveness in perception tasks that require detailed visual information. In this work, we present MG-LLaVA, an MLLM that enhances visual processing by incorporating a multi-granularity vision flow comprising low-resolution, high-resolution, and object-level features. We add a high-resolution visual encoder to capture fine-grained details, which are fused with the base visual features through a Conv-Gate fusion network. To further improve object recognition, we incorporate object-level features derived from bounding boxes identified by offline detectors. Instruction-tuned solely on publicly available multimodal data, MG-LLaVA demonstrates exceptional perception skills. We instantiate MG-LLaVA with language encoders of various sizes, from 3.8B to 34B parameters, to evaluate its performance comprehensively. Results across multiple benchmarks show that MG-LLaVA outperforms existing MLLMs of comparable parameter size, demonstrating its efficiency. The code will be open-sourced at https://github.com/PhoenixZ810/MG-LLaVA.**|\n", "2406.17764": "|**2024-06-25**|**BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning**|Ercong Nie et.al.|[2406.17764](http://arxiv.org/abs/2406.17764)|null|## Background Large language models (LLMs) hold extensive parametric knowledge, but updating it is difficult because retraining is expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable way to update the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods based on in-context learning (ICL) have shown great promise and allow LLMs to be treated as black boxes. So far, KE has mainly been studied in English, and the potential of cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark, which evaluates three types of KE tasks across 53 diverse languages. We also propose a gradient-free KE method, Multilingual In-context Knowledge Editing (MIKE), and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research on cross-lingual KE. Our code and data are publicly available through the anonymous repository at https://anonymous.4open.science/r/MIKE.|\n", "2406.17761": "|**2024-06-25**|**CaLMQA: Exploring culturally specific long-form question answering across 23 languages**|Shane Arora et.al.|[2406.17761](http://arxiv.org/abs/2406.17761)|**[link](https://github.com/2015aroras/calmqa)**|**## Background Large language models (LLMs) are widely used for long-form question answering, which requires them to generate paragraph-length answers to complex questions. Long-form QA has been studied extensively in English, with many datasets and evaluation metrics, but research in other languages remains scarce. To bridge this gap, we introduce CaLMQA, a collection of 2,600 complex questions spanning 23 languages, including under-resourced, rarely studied languages such as Fijian and Kirundi. Our dataset includes both naturally occurring questions collected from community web forums and questions written by native speakers whom we hired for this purpose. This process yields diverse, complex questions that reflect cultural topics (e.g., traditions, laws, news) and the language usage of native speakers. We run automatic evaluation across a suite of open- and closed-source models using our novel metric CaLMScore, which detects incorrect language and token repetitions in answers. We observe that the quality of LLM-generated answers drops significantly for some low-resource languages. Human evaluation on a subset of models shows that performance is significantly worse on culturally specific questions than on culturally agnostic ones. These findings highlight the need for further research on the multilingual capabilities of LLMs and on evaluating non-English long-form QA.**|\n", "2406.17755": "|**2024-06-25**|**Accelerating Clinical Evidence Synthesis with Large Language Models**|Zifeng Wang 
et.al.|[2406.17755](http://arxiv.org/abs/2406.17755)|null|Automatic medical discovery by AI is a dream of many. Toward that goal, we developed TrialMind, a generative AI pipeline for conducting medical systematic reviews that covers the study search, screening, and data extraction phases. It uses large language models (LLMs) to drive each stage and incorporates expert oversight to minimize errors. To evaluate it, we built TrialReviewBench, a custom benchmark of 870 annotated clinical studies drawn from 25 meta-analysis papers across different medical treatments. TrialMind significantly improves the efficiency of the literature review process, achieving recall of 0.897 to 1.000 when retrieving relevant studies from more than 20 million PubMed studies. In the screening phase, it outperforms traditional methods based on language-model embeddings (recall of 0.227-0.246 vs. 0.000-0.102). Our approach also surpasses direct use of GPT-4 in result extraction, with accuracy ranging from 0.65 to 0.84. We further support clinical evidence synthesis in forest plots, as validated by eight human annotators who generally preferred TrialMind, with win rates of 62.5% to 100% across the reviews involved. These findings suggest that LLM-based clinical evidence synthesis approaches such as TrialMind can enable reliable, high-quality clinical evidence synthesis and thereby improve clinical research efficiency.|\n", "2406.17753": "|**2024-06-25**|**Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language**|Amalie Brogaard Pauli 
et.al.|[2406.17753](http://arxiv.org/abs/2406.17753)|null|We are exposed to a large amount of information that tries to influence us, such as teaser messages, debates, politically framed news, and propaganda. This paper examines the ability of large language models (LLMs) to generate such persuasive text. Unlike prior work that focuses on a particular domain or type of persuasion, we conduct a general study that measures and benchmarks how much persuasive text LLMs produce in two settings: when explicitly instructed to make a text more or less persuasive, and when instructed only to paraphrase it. To this end, we construct a new dataset, Persuasive-Pairs, consisting of pairs of a short text and an LLM rewrite of it that amplifies or diminishes its persuasive language. We multi-annotate the pairs on a relative scale of persuasiveness. Beyond being a resource in itself, the dataset can be used to train a regression model that predicts a persuasiveness score between text pairs, which makes it possible to score and compare LLMs across domains. Finally, we discuss the effect of different system prompts on LLaMA3. Notably, even when the model is only asked to paraphrase, different 'persona' prompts substantially change the persuasive language in the text. These findings underscore the importance of investigating persuasive language in LLM-generated text.|\n", "2406.17737": "|**2024-06-25**|**LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users**|Elinor Poole-Dayan et.al.|[2406.17737](http://arxiv.org/abs/2406.17737)|null|State-of-the-art large language models (LLMs) have shown impressive performance, but numerous studies have documented unreliable behaviors such as hallucinations and bias. This study investigates how the quality of LLM responses, in terms of information accuracy, truthfulness, and refusals, varies with three user traits: English proficiency, education level, and country of origin. We run extensive experiments on three state-of-the-art LLMs and two fact-checking-related datasets, with a focus on truthfulness. The results show that response quality degrades disproportionately for users with lower English proficiency, lower education levels, and non-US origin. This makes these models unreliable sources of information for their most vulnerable users.|\n", "2406.17706": "|**2024-06-25**|**FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model**|Feijie Wu 
et.al.|[2406.17706](http://arxiv.org/abs/2406.17706)|**[link](https://github.com/HarliWu/FedBiOT)**|Large language models (LLMs) show strong performance on many tasks after fine-tuning on appropriate domain-specific data. However, such data are often distributed across multiple owners, which raises the question of how to fine-tune LLMs in federated learning (FL). With limited computation and communication capacity, FL clients struggle to fine-tune an LLM effectively. To this end, we introduce FedBiOT, a resource-efficient approach to LLM fine-tuning in FL. Specifically, the server generates a compressed LLM and aligns its performance with the full model. The clients then fine-tune a lightweight yet important part of the compressed model, referred to as the adapter. Notably, the server has no access to the clients' private data. As a result, the data it uses for alignment follows a different distribution from the data the clients use for fine-tuning. We formulate the problem as a bi-level optimization problem that accounts for this data discrepancy and derive update rules for both the server and the clients. Extensive experiments on LLaMA-2 show that the adapter performs exceptionally well when reintegrated into the global LLM. The results also indicate that FedBiOT significantly reduces resource consumption compared with existing baselines while achieving comparable performance.|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692](http://arxiv.org/abs/2406.17692)|**[link](https://github.com/thomlake/investigating-alignment)**|**This study analyzes how the output distribution of large language models (LLMs) changes after alignment. First, it re-examines previously reported reductions in response diversity after alignment and finds that the drop is largely explained by quality control and information aggregation. Alignment suppresses irrelevant and unhelpful content and shifts the output distribution toward longer responses that cover information from several base-LLM responses, in effect consolidating diverse information into a single response. Finding no evidence that alignment substantially removes useful information, the study then asks the opposite question: do aligned models surface information that base models cannot reproduce? The second part of the study shows that they do not. The behavior of aligned models can be recovered from base models without fine-tuning. Combining in-context examples with lower-resolution semantic hints about response content elicits responses from base LLMs that are about as similar to the aligned responses as the aligned responses are to each other. These results support the Superficial Alignment Hypothesis: current alignment techniques capture, but do not extend, the useful subset of assistant-like base-LLM behavior. They also show that in-context alignment can go surprisingly far as a fine-tuning-free strategy for imitating aligned LLMs. The code and data are publicly available.**|\n", "2406.17681": "|**2024-06-25**|**VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation**|Kun Qian 
et.al.|[2406.17681](http://arxiv.org/abs/2406.17681)|**[link](https://github.com/qbetterk/VarBench)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4f20\u7edf\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8868\u73b0\u65e5\u76ca\u51fa\u8272\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u4eba\u5458\u5f00\u59cb\u5173\u6ce8\u9884\u8bad\u7ec3\u671f\u95f4\u7684\u57fa\u51c6\u6570\u636e\u6cc4\u9732\u95ee\u9898\uff0c\u901a\u5e38\u79f0\u4e3a\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002\u4e3a\u4e86\u786e\u4fdd\u516c\u6b63\u7684\u8bc4\u4f30\uff0c\u6700\u8fd1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u516c\u5f00\u8bad\u7ec3\u548c\u9a8c\u8bc1\u96c6\uff0c\u5bf9\u6d4b\u8bd5\u96c6\u6807\u7b7e\u4fdd\u5bc6\u3002\u4ed6\u4eec\u8981\u6c42\u4efb\u4f55\u5e0c\u671b\u8bc4\u4f30\u81ea\u5df1\u8bed\u8a00\u6a21\u578b\u7684\u4eba\u90fd\u9700\u8981\u63d0\u4ea4\u6a21\u578b\u7684\u9884\u6d4b\u7ed3\u679c\uff0c\u8fdb\u884c\u96c6\u4e2d\u5904\u7406\uff0c\u7136\u540e\u5728\u6392\u884c\u699c\u4e0a\u516c\u5e03\u6a21\u578b\u7684\u5f97\u5206\u3002\u7136\u800c\uff0c\u8fd9\u4e2a\u63d0\u4ea4\u8fc7\u7a0b\u65e2\u4f4e\u6548\u53c8\u59a8\u788d\u4e86\u6709\u6548\u7684\u9519\u8bef\u5206\u6790\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u52a8\u6001\u5316\u57fa\u51c6\u6d4b\u8bd5\u5e76\u5b9e\u65f6\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u6bcf\u4e2a\u6d4b\u8bd5\u6848\u4f8b\u4e2d\u63d0\u53d6\u53d8\u91cf\uff0c\u5e76\u4e3a\u6bcf\u4e2a\u53d8\u91cf\u5b9a\u4e49\u4e00\u4e2a\u503c\u8303\u56f4\u3002\u6bcf\u6b21\u8bc4\u4f30\u65f6\uff0c\u6211\u4eec\u4f1a\u4ece\u8fd9\u4e9b\u503c\u57df\u4e2d\u62bd\u53d6\u65b0\u7684\u503c\u6765\u521b\u5efa\u72ec\u7279\u7684\u6d4b\u8bd5\u6848\u4f8b\uff0c\u4ece\u800c\u4fdd\u8bc1\u6bcf\u6b21\u90fd\u662f\u5168\u65b0\u7684\u8bc4\u4f30\u3002 
\u6211\u4eec\u9488\u5bf9\u6570\u5b66\u751f\u6210\u4efb\u52a1\u7684GSM8K\u3001\u591a\u9879\u9009\u62e9\u4efb\u52a1\u7684ARC\u3001commonsense\u95ee\u7b54\u7684CommonsenseQA\u4ee5\u53caTruthfulQA\u7684\u771f\u5b9e\u6027\u95ee\u7b54\u4efb\u52a1\uff0c\u5e94\u7528\u4e86\u8fd9\u79cd\u53d8\u91cf\u6270\u52a8\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u66f4\u51c6\u786e\u5730\u8861\u91cf\u8bed\u8a00\u6a21\u578b\u7684\u771f\u5b9e\u80fd\u529b\uff0c\u6709\u6548\u7f13\u89e3\u4e86\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002|\n", "2406.17675": "|**2024-06-25**|**Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models**|Yuan Li et.al.|[2406.17675](http://arxiv.org/abs/2406.17675)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\uff0c\u65e5\u76ca\u626e\u6f14\u7c7b\u4f3c\u4eba\u7c7b\u52a9\u624b\u7684\u89d2\u8272\u3002\u793e\u4f1a\u5bf9\u5c06LLMs\u66f4\u5e7f\u6cdb\u5730\u878d\u5165\u5176\u4e2d\u4ea7\u751f\u4e86\u5174\u8da3\uff0c\u63a2\u8ba8\u5b83\u4eec\u662f\u5426\u5177\u5907\u5fc3\u7406\u7279\u8d28\uff0c\u4ee5\u53ca\u8fd9\u4e9b\u7279\u8d28\u662f\u5426\u7a33\u5b9a\u4e14\u6709\u52a9\u4e8e\u7406\u89e3\u5176\u884c\u4e3a\u3002\u672c\u6587\u501f\u9274\u5fc3\u7406\u5b66\u6d4b\u91cf\u5b66\u7684\u65b9\u6cd5\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u7528\u4e8e\u7814\u7a76LLMs\u4e2d\u7684\u5fc3\u7406\u5b66\uff0c\u5305\u62ec\u5fc3\u7406\u7ef4\u5ea6\u8bc6\u522b\u3001\u8bc4\u4f30\u6570\u636e\u96c6\u521b\u5efa\u548c\u7ed3\u679c\u9a8c\u8bc1\u3002\u5728\u6b64\u6846\u67b6\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684LLM\u5fc3\u7406\u6d4b\u91cf\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u516d\u79cd\u5fc3\u7406\u7ef4\u5ea6\uff1a\u4e2a\u6027\u3001\u4ef7\u503c\u89c2\u3001\u60c5\u7eea\u3001\u5fc3\u667a\u7406\u8bba\u3001\u52a8\u673a\u548c\u667a\u529b\u3002\u8fd9\u4e2a\u57fa\u51c6\u5305\u542b\u4e86\u5341\u4e09\u4e2a\u5305\u542b\u591
a\u6837\u573a\u666f\u548c\u9898\u578b\u7684\u6570\u636e\u96c6\u3002\u7814\u7a76\u53d1\u73b0\uff0cLLMs\u5c55\u73b0\u51fa\u5e7f\u6cdb\u7684\u5fc3\u7406\u7279\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u89c2\u5bdf\u5230LLMs\u5728\u81ea\u6211\u62a5\u544a\u7684\u7279\u8d28\u4e0e\u5176\u5b9e\u9645\u884c\u4e3a\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u3002\u8be5\u8bba\u6587\u8be6\u7ec6\u5c55\u793a\u4e86LLMs\u7684\u5fc3\u7406\u6d4b\u91cf\u8bc4\u4f30\uff0c\u4e3aAI\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u53ef\u9760\u8bc4\u4f30\u63d0\u4f9b\u4e86\u6d1e\u89c1\uff0c\u4ee5\u53ca\u53ef\u80fd\u7684\u5e94\u7528\u65b9\u5411\u3002|\n", "2406.18532": "|**2024-06-26**|**Symbolic Learning Enables Self-Evolving Agents**|Wangchunshu Zhou et.al.|[2406.18532](http://arxiv.org/abs/2406.18532)|**[link](https://github.com/aiwaves-cn/agents)**|**\u4eba\u5de5\u667a\u80fd\u754c\u901a\u8fc7\u6784\u5efa\"\u8bed\u8a00\u4ee3\u7406\"\uff08\u5373\u590d\u6742\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7ba1\u9053\uff09\u6765\u63a2\u5bfb\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u9053\u8def\uff0c\u8fd9\u4e9b\u6a21\u578b\u7ed3\u5408\u4e86\u63d0\u793a\u6280\u672f\u548c\u5de5\u5177\u4f7f\u7528\u65b9\u6cd5\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u4f17\u591a\u5b9e\u9645\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f53\u524d\u8bed\u8a00\u4ee3\u7406\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u662f\u5176\u6a21\u578b\u4e2d\u5fc3\u6216\u5de5\u7a0b\u5bfc\u5411\uff1a\u63d0\u793a\u3001\u5de5\u5177\u548c\u7ba1\u9053\u7684\u6539\u8fdb\u4f9d\u8d56\u4e8e\u5927\u91cf\u7684\u4eba\u5de5\u4e13\u5bb6\u8bbe\u8ba1\uff0c\u800c\u975e\u81ea\u52a8\u4ece\u6570\u636e\u5b66\u4e60\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4ece\u6a21\u578b\u4e2d\u5fc3\u5411\u6570\u636e\u4e2d\u5fc3\u8f6c\u53d8\u2014\u2014\u8ba9\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u81ea\u4e3b\u5b66\u4e60\u548c\u9002\u5e94\u73af\u5883\uff0c\u662f\u5b83\u4eec\u8fc8\u5411AGI\u7684\u5173\u952e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\"
\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\"\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\u6027\u7684\u65b9\u6cd5\uff0c\u5b83\u4f7f\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u5728\u6570\u636e\u9a71\u52a8\u7684\u65b9\u5f0f\u4e0b\u81ea\u6211\u4f18\u5316\uff0c\u5229\u7528\u7b26\u53f7\u4f18\u5316\u5668\u3002\u6211\u4eec\u5c06\u4ee3\u7406\u89c6\u4e3a\u5177\u6709\u53ef\u5b66\u4e60\u6743\u91cd\u7684\u7b26\u53f7\u7f51\u7edc\uff0c\u8fd9\u4e9b\u6743\u91cd\u7531\u63d0\u793a\u3001\u5de5\u5177\u53ca\u5176\u7ec4\u5408\u65b9\u5f0f\u5b9a\u4e49\u3002\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u65e8\u5728\u6a21\u4eff\u8fde\u63a5\u4e3b\u4e49\u5b66\u4e60\u4e2d\u7684\u4e24\u4e2a\u57fa\u672c\u7b97\u6cd5\uff1a\u53cd\u5411\u4f20\u64ad\u548c\u68af\u5ea6\u4e0b\u964d\uff0c\u4f46\u5b83\u5904\u7406\u7684\u662f\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7684\u6743\u91cd\u3001\u635f\u5931\u548c\u68af\u5ea6\u3002\u6211\u4eec\u5728\u6807\u51c6\u57fa\u51c6\u548c\u590d\u6742\u73b0\u5b9e\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u6982\u5ff5\u9a8c\u8bc1\u5b9e\u9a8c\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u4f7f\u5f97\u8bed\u8a00\u4ee3\u7406\u5728\u521b\u5efa\u548c\u90e8\u7f72\u540e\u80fd\u591f\u81ea\u6211\u66f4\u65b0\uff0c\u5b9e\u73b0\u4e86\"\u81ea\u6211\u8fdb\u5316\u7684\u4ee3\u7406\"\u3002**|\n", "2406.18528": "|**2024-06-26**|**PrExMe! 
Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**|Christoph Leiter et.al.|[2406.18528](http://arxiv.org/abs/2406.18528)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5b83\u4eec\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\u4f7f\u5176\u6210\u4e3a\u81ea\u7136\u8bed\u8a00\u751f\u6210\u8bc4\u4ef7\u7684\u6709\u529b\u5de5\u5177\uff0c\u7279\u522b\u9002\u7528\u4e8e\u8d44\u6e90\u532e\u4e4f\u548c\u65f6\u95f4\u9650\u5236\u7684\u573a\u666f\u3002\u672c\u6587\u63d0\u51faPrExMe\uff0c\u4e00\u9879\u5927\u89c4\u6a21\u7684\u63d0\u793a\u63a2\u7d22\u5ea6\u91cf\u6cd5\uff0c\u6211\u4eec\u5728\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u548c\u6458\u8981\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86\u8d85\u8fc7720\u79cd\u5f00\u6e90LLM\u4f5c\u4e3a\u5ea6\u91cf\u6807\u51c6\u7684\u6a21\u677f\uff0c\u603b\u8ba1\u7ea6660\u4e07\u6b21\u8bc4\u4f30\u3002\u8fd9\u9879\u8be6\u5c3d\u7684\u6bd4\u8f83\uff081\uff09\u4e3a\u8fd1\u671f\u5f00\u6e90LLMs\u4f5c\u4e3a\u8bc4\u4ef7\u6307\u6807\u7684\u8868\u73b0\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff1b\uff082\uff09\u63a2\u8ba8\u4e86\u4e0d\u540c\u63d0\u793a\u7b56\u7565\u7684\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4e00\u65b9\u9762\uff0c\u5b58\u5728\u4e00\u4e9b\u60c5\u51b5\u4e0b\u63d0\u793a\u8868\u73b0\u7a33\u5b9a\uff1a\u6709\u4e9bLLMs\u8868\u73b0\u51fa\u7279\u6709\u7684\u504f\u597d\uff0c\u503e\u5411\u4e8e\u4f7f\u7528\u6587\u672c\u6807\u7b7e\u6765\u8bc4\u5206\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u503e\u5411\u4e8e\u8fd4\u56de\u6570\u503c\u5206\u6570\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u63d0\u793a\u7684\u7a33\u5b9a\u6027\u548c\u6a21\u578b\u6392\u540d\u53ef\u80fd\u53d7\u5230\u770b\u4f3c\u5fae\u4e0d\u8db3\u9053\u7684\u66f4\u6539\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u5c06\u8f93\u51fa\u683c\u5f0f\u4ece\u201c0\u5230100\u201d\u6539\u4e3a\u201c-1\u5230+1\u201d\u53ef
\u80fd\u4f1a\u663e\u8457\u6539\u53d8\u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u6709\u52a9\u4e8e\u7406\u89e3\u4e0d\u540c\u63d0\u793a\u65b9\u6cd5\u5bf9MT\u548c\u6458\u8981\u8bc4\u4ef7\u4e2dLLM-based\u5ea6\u91cf\u7684\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u6700\u7a33\u5b9a\u7684\u63d0\u793a\u6a21\u5f0f\uff0c\u5e76\u6307\u51fa\u4e86\u6f5c\u5728\u5c40\u9650\u6027\u3002|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521](http://arxiv.org/abs/2406.18521)|**[link](https://github.com/princeton-nlp/CharXiv)**|\u5728\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5904\u7406\u79d1\u5b66\u8bba\u6587\u6216\u8d22\u52a1\u62a5\u544a\u7b49\u4efb\u52a1\u65f6\uff0c\u56fe\u8868\u7406\u89e3\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u5f80\u5f80\u96c6\u4e2d\u5728\u7b80\u5316\u548c\u540c\u8d28\u5316\u7684\u56fe\u8868\u4e0a\uff0c\u4ee5\u53ca\u57fa\u4e8e\u6a21\u677f\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u8bc4\u4f30\u8fc7\u4e8e\u4e50\u89c2\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5c3d\u7ba1\u5f00\u6e90\u6a21\u578b\u5728\u73b0\u6709\u57fa\u51c6\u4e0a\u53ef\u80fd\u8868\u73b0\u4f18\u4e8e\u5f3a\u5927\u7684\u4e13\u6709\u6a21\u578b\uff0c\u4f46\u901a\u8fc7\u7b80\u5355\u7684\u538b\u529b\u6d4b\u8bd5\uff0c\u5982\u6539\u53d8\u56fe\u8868\u6216\u95ee\u9898\uff0c\u6027\u80fd\u4f1a\u4e0b\u964d\u9ad8\u8fbe34.5%\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faCharXiv\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,323\u4e2a\u6765\u81eaarXiv\u8bba\u6587\u7684\u81ea\u7136\u3001\u590d\u6742\u4e14\u591a\u6837\u5316\u7684\u56fe\u8868\u7684\u5168\u9762\u8bc4\u4f30\u5957\u4ef6\u3002CharXiv\u5305\u62ec\u4e24\u7c7b\u95ee\u9898\uff1a1\uff09\u63cf\u8ff0\u6027\u95ee\u9898\uff0c\u7528\u4e8e\u68c0\u67e5\u57fa\u672c\u56fe\u8868\u5143\u7d20\uff1b2\uff09\u
63a8\u7406\u95ee\u9898\uff0c\u9700\u8981\u7efc\u5408\u5206\u6790\u56fe\u8868\u4e2d\u7684\u590d\u6742\u89c6\u89c9\u5143\u7d20\u3002\u6240\u6709\u56fe\u8868\u548c\u95ee\u9898\u90fd\u7531\u4e13\u5bb6\u7cbe\u5fc3\u6311\u9009\u3001\u6574\u7406\u548c\u9a8c\u8bc1\u4ee5\u4fdd\u8bc1\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5f3a\u4e13\u6709\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\uff0c\u51c6\u786e\u7387\u4e3a47.1%\uff09\u4e0e\u6700\u5f3a\u5f00\u6e90\u6a21\u578b\uff08\u5982InternVL Chat V1.5\uff0c\u51c6\u786e\u7387\u4e3a29.2%\uff09\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u8ddd\uff0c\u800c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u5747\u8fdc\u4f4e\u4e8e\u4eba\u7c7b\u768480.5%\u6c34\u5e73\uff0c\u8fd9\u63ed\u793a\u4e86\u73b0\u6709MLLM\u5728\u56fe\u8868\u7406\u89e3\u80fd\u529b\u4e0a\u7684\u4e0d\u8db3\u3002\u6211\u4eec\u5e0c\u671bCharXiv\u80fd\u63a8\u52a8\u672a\u6765\u7684\u7814\u7a76\uff0c\u901a\u8fc7\u63d0\u4f9b\u66f4\u771f\u5b9e\u3001\u66f4\u5177\u4ee3\u8868\u6027\u7684\u8fdb\u6b65\u8861\u91cf\u6807\u51c6\uff0c\u4fc3\u8fdb\u56fe\u8868\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\u3002\u9879\u76ee\u9875\u9762\u548c\u6392\u884c\u699c\u53ef\u8bbf\u95ee\uff1ahttps://charxiv.github.io/\u3002|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512](http://arxiv.org/abs/2406.18512)|null|
\u89e3\u91ca\u662f\u77e5\u8bc6\u5171\u4eab\u7684\u6838\u5fc3\uff0c\u5b83\u5efa\u7acb\u5728\u6c9f\u901a\u539f\u7406\u3001\u793e\u4f1a\u52a8\u6001\u548c\u5b66\u4e60\u7406\u8bba\u4e4b\u4e0a\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e\u5bf9\u8bdd\u5f0f\u7684\u89e3\u91ca\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5176\u73af\u5883\u9ad8\u5ea6\u9002\u5e94\u6027\u548c\u4ea4\u4e92\u6027\u3002\u6211\u4eec\u7684\u7814\u7a76\u5229\u7528\u4e86\u89e3\u91ca\u884c\u4e3a\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7406\u89e3\u89e3\u91ca\u8005\u548c\u88ab\u89e3\u91ca\u8005\u5728\u5bf9\u8bdd\u4e2d\u5982\u4f55\u8fd0\u7528\u7b56\u7565\u8fdb\u884c\u89e3\u91ca\u3001\u7406\u89e3\u548c\u4e92\u52a8\u7684\u5de5\u5177\u3002\u6211\u4eec\u5229\u7528Wachsmuth\u7b49\u4eba\u6784\u5efa\u7684WIRED YouTube\u7cfb\u5217\u6570\u636e\u96c6\uff0c\u5e76\u7531Booshehri\u7b49\u4eba\u8fdb\u884c\u4e86\u5e26\u6709\u89e3\u91ca\u884c\u4e3a\u7684\u6807\u6ce8\uff0c\u8fd9\u4e9b\u6ce8\u91ca\u4e3a\u6211\u4eec\u7406\u89e3\u5bf9\u8bdd\u4e2d\u89e3\u91ca\u8005\u5982\u4f55\u6784\u5efa\u56de\u5e94\u63d0\u4f9b\u4e86\u4f9d\u636e\u3002 
\u968f\u7740\u53bb\u5e74\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u53d1\u5c55\uff0c\u6211\u4eec\u671f\u671b\u66f4\u597d\u5730\u7406\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u80fd\u529b\uff0c\u4ee5\u53ca\u5b83\u4eec\u5982\u4f55\u589e\u5f3a\u4e13\u5bb6\u89e3\u91ca\u8005\u7684\u5bf9\u8bdd\u4ea4\u6d41\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4f7f\u7528\u4e86Booshehri\u7b49\u4eba2023\u5e74\u6807\u6ce8\u76845-Levels\u6570\u636e\u96c6\u6765\u8bc4\u4f30LLMs\u5728\u89e3\u91ca\u6027\u5bf9\u8bdd\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u8bc4\u4ef7LLMs\u751f\u6210\u89e3\u91ca\u8005\u56de\u5e94\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u79cd\u7b56\u7565\uff1a\u4eba\u7c7b\u89e3\u91ca\u8005\u7684\u539f\u59cb\u56de\u5e94\u3001GPT4\u7684\u6807\u51c6\u56de\u5e94\u4ee5\u53ca\u52a0\u5165\u4e86\u89e3\u91ca\u6b65\u9aa4\u7684GPT4\u56de\u5e94\u3002\u6211\u4eec\u9080\u8bf7\u4eba\u7c7b\u6807\u6ce8\u8005\u5bf9\u8fd9\u4e09\u79cd\u7b56\u7565\u8fdb\u884c\u8bc4\u4f30\u3002|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505](http://arxiv.org/abs/2406.18505)|null|
\u5c3d\u7ba1\u73b0\u4ee3\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u63a8\u7406\u80fd\u529b\uff0c\u7406\u8bba\u4e0a\u80fd\u591f\u8868\u8fbe\u4efb\u610f\u53ef\u80fd\u7684\u4ee4\u724c\u5206\u5e03\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5229\u7528\u9884\u8bad\u7ec3\u65f6\u79ef\u7d2f\u7684\u4e16\u754c\u77e5\u8bc6\u6765\u7406\u89e3\u7269\u7406\u4e16\u754c\u4e2d\u7684\u4ee3\u7406\u884c\u4e3a\uff0c\u8fd9\u4e00\u65b9\u9762\u4ecd\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u7814\u7a76\u9996\u6b21\u5b9e\u8bc1\u8003\u5bdf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u901a\u8fc7\u63a8\u7406\u5206\u6790\u4ee3\u7406\u7684\u884c\u4e3a\u53ca\u5176\u5bf9\u72b6\u6001\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u6784\u5efa\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\uff08agent mental modeling\uff09\u7684\u80fd\u529b\u3002\u8fd9\u53ef\u80fd\u63ed\u793a\u51fa\u5229\u7528LLMs\u89e3\u6790\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4ee3\u7406\u884c\u4e3a\u7684\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u89e3\u91ca\u5f3a\u5316\u5b66\u4e60\uff08XRL\uff09\u7684\u5173\u952e\u6311\u6218\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u7279\u5b9a\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5728\u4e0d\u540c\u590d\u6742\u5ea6\u7684RL\u4efb\u52a1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6d4b\u8bd5\uff0c\u62a5\u544a\u5173\u4e8e\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\u5efa\u7acb\u7684\u7814\u7a76\u7ed3\u679c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u7684LLMs\u8fd8\u65e0\u6cd5\u4ec5\u901a\u8fc7\u63a8\u7406\u5b8c\u5168\u5b9e\u73b0\u4ee3\u7406\u7684\u5fc3\u7406\u5efa\u6a21\uff0c\u8fd9\u9700\u8981\u8fdb\u4e00\u6b65\u521b\u65b0\u3002\u56e0\u6b64\uff0c\u8fd9\u9879\u5de5\u4f5c\u63d0\u4f9b\u4e86\u5bf9\u73b0\u4ee3LLMs\u80fd\u529b\u548c\u5c40\u9650\u6027\u7684\u65b0\u89c1\u89e3\u3002|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501](http://arxiv.org/abs/2406.18501)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5185\u63d2\u5b66\u4e60\uff08in-context learning\uff0cICL\uff09\u80fd\u529b\uff0c\u5e76\u5c06\u5176\u4e0e\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u8fdb\u884c\u529f\u80fd\u7b49\u6548\u6027\u8bca\u65ad\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u9006\u9891\u7387\u6548\u5e94\uff08inverse frequency effect\uff0cIFE\uff09\u6765\u5206\u6790\u3002IFE\u73b0\u8c61\u6307\u7684\u662f\u5728\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0c\u6a21\u578b\u5e94\u5bf9\u7f55\u89c1\u6837\u4f8b\u4ea7\u751f\u7684\u66f4\u65b0\u5e45\u5ea6\u5927\u4e8e\u5e38\u89c1\u6837\u4f8b\u3002\u5728\u5fc3\u7406\u5b66\u4e2d\uff0c\u4eba\u7c7b\u5728\u7ed3\u6784\u5316\u63d0\u793a\uff08\u5982\u503e\u5411\u4e8e\u91cd\u590d\u6700\u8fd1\u63a5\u89e6\u7684\u53e5\u5b50\u7ed3\u6784\uff09\u60c5\u5883\u4e2d\u8868\u73b0\u51faIFE\uff0c\u8fd9\u8868\u660e\u5176\u53ef\u80fd\u6d89\u53ca\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u673a\u5236\u3002\u5b9e\u9a8c\u901a\u8fc7\u6a21\u62df\u7ed3\u6784\u5316\u63d0\u793a\u5728ICL\u4e2d\u7684\u5f71\u54cd\u53d1\u73b0\uff0cLLMs\u540c\u6837\u663e\u793a\u51faIFE\uff0c\u4e14\u8fd9\u4e00\u6548\u5e94\u5728\u66f4\u5927\u7684\u6a21\u578b\u4e2d\u66f4\u4e3a\u660e\u663e\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u7ed3\u679c\u652f\u6301\u4e86ICL\u672c\u8d28\u4e0a\u662f\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u7684\u5047\u8bbe\uff0c\u5373\u5728ICL\u7684\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u9690\u542b\u5730\u8ba1\u7b97\u4e86\u68af\u5ea6\u3002\u8bba\u6587\u7ed3\u8bba\u6307\u51fa\uff0c\u4eba\u7c7b\u548cLLMs\u90fd\u4f7f\u7528\u4e86\u57fa\u4e8e\u68af\u5ea6\u7684\u3001\u9519\u8bef\u9a71\u52a8\u7684\u5904\u7406\u673a\u5236\u3002|\n", "2406.18460": "|**2024-06-26**|**Role-Play Zero-Shot Prompting with 
Large Language Models for Open-Domain Human-Machine Conversation**|Ahmed Njifenjou et.al.|[2406.18460](http://arxiv.org/abs/2406.18460)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b9\u6cd5\u6765\u521b\u5efa\u80fd\u591f\u8fdb\u884c\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u56de\u7b54\u7528\u6237\u95ee\u9898\uff0c\u4f46\u5c40\u9650\u4e8e\u5355\u5411\u95ee\u7b54\u5f62\u5f0f\uff0c\u800c\u975e\u771f\u6b63\u7684\u5bf9\u8bdd\u3002\u901a\u5e38\uff0c\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u6765\u8c03\u6574\u5b83\u4eec\u7684\u4ea4\u6d41\u98ce\u683c\uff0c\u4f46\u8fd9\u65e2\u6602\u8d35\u53c8\u9650\u4e8e\u5c11\u6570\u8bed\u8a00\u3002\u672c\u7814\u7a76\u63a2\u7d22\u4e86\u89d2\u8272\u626e\u6f14\u7684\u96f6\u6837\u672c\u63d0\u793a\u4f5c\u4e3a\u63d0\u9ad8\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u6548\u7387\u548c\u6210\u672c\u6548\u76ca\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u591a\u8bed\u8a00\u80fd\u529b\u5f3a\u7684\u8bad\u7ec3\u6709\u7d20\u6a21\u578b\uff08Beeching\u7b49\u4eba\uff0c2023\u5e74\uff09\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u9075\u5faa\u6307\u4ee4\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\u7cfb\u7edf\uff0c\u5f53\u4e0e\u9075\u5faa\u6307\u4ee4\u7684\u6a21\u578b\u2014\u2014\u8fd9\u91cc\u4f7f\u7528Vicuna\uff08Chiang\u7b49\u4eba\uff0c2023\u5e74\uff09\u7ed3\u5408\u65f6\uff0c\u80fd\u591f\u751f\u6210\u5728\u6cd5\u8bed\u4e2d\u7684\u5bf9\u8bdd\u4ee3\u7406\uff0c\u5728\u4e24\u9879\u4efb\u52a1\u4e2d\u751a\u81f3\u8d85\u8d8a\u4e86\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u5e76\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.18449": "|**2024-06-26**|**Cascading Large Language Models for Salient Event Graph Generation**|Xingwei Tan 
et.al.|[2406.18449](http://arxiv.org/abs/2406.18449)|null|\u7531\u4e8e\u957f\u6587\u6863\u4e2d\u4e8b\u4ef6\u68c0\u6d4b\u3001\u5173\u7cfb\u8bc6\u522b\u4ee5\u53ca\u975e\u7ed3\u6784\u5316\u8f93\u5165\u4e0e\u7ed3\u6784\u5316\u56fe\u8c31\u7684\u6574\u5408\u7b49\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u4ece\u6587\u672c\u751f\u6210\u4e8b\u4ef6\u56fe\u8c31\u662f\u4e00\u9879\u6311\u6218\u3002\u5f53\u524d\u7684\u7814\u7a76\u5f80\u5f80\u540c\u7b49\u91cd\u89c6\u6240\u6709\u4e8b\u4ef6\uff0c\u672a\u80fd\u533a\u5206\u5bf9\u7406\u89e3\u53d9\u4e8b\u81f3\u5173\u91cd\u8981\u7684\u5173\u952e\u4e8b\u4ef6\u3002\u672c\u6587\u63d0\u51faCALLMSAE\uff0c\u4e00\u4e2a\u57fa\u4e8eCAscading\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684SAlient Event\u56fe\u8c31\u751f\u6210\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u7684\u80fd\u529b\uff0c\u5e76\u907f\u514d\u4e86\u6602\u8d35\u7684\u4eba\u5de5\u6807\u6ce8\u9700\u6c42\u3002\u9996\u5148\uff0c\u901a\u8fc7\u63d0\u793aLLMs\u751f\u6210\u6458\u8981\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u91cd\u8981\u4e8b\u4ef6\u3002\u7136\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8fed\u4ee3\u7684\u4ee3\u7801\u7cbe\u70bc\u63d0\u793a\u7b56\u7565\uff0c\u7528\u4e8e\u751f\u6210\u4e8b\u4ef6\u5173\u7cfb\u56fe\uff0c\u6d88\u9664\u9519\u8bef\u7684\u5173\u7cfb\u5e76\u6062\u590d\u7f3a\u5931\u7684\u8fb9\u3002\u5bf9\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u56fe\u8c31\u751f\u6210\u6a21\u578b\u8fdb\u884c fine-tuning\uff0c\u5728\u4f7f\u7528 LLM \u751f\u6210\u7684\u56fe\u8c31\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f18\u4e8e\u4f7f\u7528 CAEVO \u751f\u6210\u6570\u636e\u8bad\u7ec3\u7684\u6a21\u578b\u3002\u5728\u4eba\u7c7b\u6807\u6ce8\u7684\u6d4b\u8bd5\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u751f\u6210\u66f4\u7a81\u51fa\u4e14\u51c6\u786e\u7684\u56fe\u8c31\uff0c\u8d85\u8d8a\u4e86\u7ade\u4e89\u6027\u7684\u57fa\u7ebf\u3002|\n", "2406.18440": "|**2024-06-26**|**New intelligent empowerment for digital transformation**|Peng 
Yifeng et.al.|[2406.18440](http://arxiv.org/abs/2406.18440)|null|\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u521b\u65b0\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u8861\u91cf\u4f01\u4e1a\u7684\u6570\u5b57\u5316\u8f6c\u578b\uff08DT\uff09\u8fc7\u7a0b\u3002\u901a\u8fc7\u5bf92005\u5e74\u81f32022\u5e74\u95f4\u5728\u7ebd\u7ea6\u8bc1\u5238\u4ea4\u6613\u6240\u548c\u7eb3\u65af\u8fbe\u514b\u4e0a\u5e02\u76844407\u5bb6\u516c\u53f8\u7684\u5e74\u5ea6\u62a5\u544a\u8fdb\u884c\u5206\u6790\uff0c\u6784\u5efa\u4e86\u4e00\u5957\u5168\u9762\u7684DT\u6307\u6807\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cDT\u663e\u8457\u63d0\u9ad8\u4e86\u4f01\u4e1a\u7684\u8d22\u52a1\u8868\u73b0\u3002\u7136\u800c\uff0c\u4e0d\u540c\u7684\u6570\u5b57\u6280\u672f\u5bf9\u8d22\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u5404\u4e0d\u76f8\u540c\uff0c\u533a\u5757\u94fe\u6280\u672f\u7684\u79ef\u6781\u5f71\u54cd\u76f8\u5bf9\u8f83\u5c0f\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0DT\u901a\u8fc7\u63d0\u5347\u8fd0\u8425\u6548\u7387\u548c\u964d\u4f4e\u6210\u672c\u4fc3\u8fdb\u8d22\u52a1\u7ee9\u6548\u589e\u957f\u3002\u672c\u7814\u7a76\u4e3a\u5b66\u672f\u754c\u63d0\u4f9b\u4e86\u65b0\u7684DT\u8bc4\u4f30\u5de5\u5177\uff0c\u540c\u65f6\u62d3\u5bbd\u4e86\u751f\u6210\u4eba\u5de5\u667a\u80fd\u6280\u672f\u5728\u7ecf\u6d4e\u7814\u7a76\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002|\n", "2406.18406": "|**2024-06-26**|**IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons**|Dan Shi 
et.al.|[2406.18406](http://arxiv.org/abs/2406.18406)|null|\u4eba\u4eec\u666e\u904d\u8ba4\u4e3a\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5927\u89c4\u6a21\u6570\u636e\u8bad\u7ec3\u540e\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7814\u7a76\u63ed\u793a\u4e86LLMs\u751f\u6210\u6587\u672c\u65f6\u7684\u77e5\u8bc6\u51b2\u7a81\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5185\u7f16\u7801\u7684\u53c2\u6570\u77e5\u8bc6\uff08\u5373\u77e5\u8bc6\u5e93\uff09\u4e0e\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u65b0\u77e5\u8bc6\u5b58\u5728\u77db\u76fe\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u6846\u67b6\u2014\u2014IRCAN\uff08\u8bc6\u522b\u548c\u91cd\u6743\u4e0a\u4e0b\u6587\u611f\u77e5\u795e\u7ecf\u5143\uff09\u3002IRCAN\u9996\u5148\u5229\u7528\u6574\u5408\u68af\u5ea6\u8ba1\u7b97\u5f97\u5230\u7684\u4e0a\u4e0b\u6587\u611f\u77e5\u5f52\u56e0\u5206\u6570\uff0c\u6765\u8bc6\u522b\u90a3\u4e9b\u5bf9\u5904\u7406\u8bed\u5883\u81f3\u5173\u91cd\u8981 \u7684\u795e\u7ecf\u5143\u3002\u63a5\u7740\uff0c\u901a\u8fc7\u91cd\u65b0\u8d4b\u6743\uff0c\u6211\u4eec\u5f3a\u5316\u8fd9\u4e9b\u8bc6\u522b\u51fa\u7684\u4e0a\u4e0b\u6587\u76f8\u5173\u795e\u7ecf\u5143\uff0c\u4ece\u800c\u5f15\u5bfcLLMs\u751f\u6210\u66f4\u7b26\u5408\u4e0a\u4e0b\u6587\u65b0\u77e5\u8bc6\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5728\u591a\u79cd\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cIRCAN\u4e0d\u4ec5\u663e\u8457\u63d0\u5347\u4e86\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u7684\u80fd\u529b\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u3001\u5373\u63d2\u5373\u7528\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u65e0\u7f1d\u878d\u5165\u73b0\u6709\u6a21\u578b\u4e2d\u3002|\n", "2406.19392": "|**2024-06-27**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen 
et.al.|[2406.19392](http://arxiv.org/abs/2406.19392)|**[link](https://github.com/rextime/rextime)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aReXTime\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e13\u95e8\u9488\u5bf9\u4eba\u5de5\u667a\u80fd\u6a21\u578b\u5728\u89c6\u9891\u4e8b\u4ef6\u4e2d\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\u3002ReXTime\u5173\u6ce8\u7684\u662f\u8de8\u65f6\u95f4\u63a8\u7406\uff0c\u5373\u7406\u89e3\u5f53\u95ee\u9898\u53ca\u5176\u76f8\u5e94\u7684\u7b54\u6848\u51fa\u73b0\u5728\u4e0d\u540c\u7684\u89c6\u9891\u7247\u6bb5\u65f6\u7684\u4eba\u7c7b\u5f0f\u7406\u89e3\u3002\u8fd9\u79cd\u9700\u8981\u6df1\u5165\u7406\u89e3\u89c6\u9891\u7247\u6bb5\u4e4b\u95f4\u56e0\u679c\u5173\u7cfb\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u5bf9\u524d\u6cbf\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u6210\u4e86\u91cd\u5927\u6311\u6218\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8bc4\u4ef7\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u7ba1\u9053\uff0c\u7528\u4e8e\u751f\u6210\u65f6\u95f4\u63a8\u7406\u7684\u95ee\u7b54\u5bf9\uff0c\u5927\u5927\u51cf\u5c11\u4e86\u7e41\u7410\u7684\u624b\u52a8\u6807\u6ce8\u9700\u6c42\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5305\u62ec921\u4e2a\u7cbe\u5fc3\u7b5b\u9009\u7684\u9a8c\u8bc1\u6837\u672c\u548c2,143\u4e2a\u6d4b\u8bd5\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u7ecf\u8fc7\u4eba\u5de5\u7cbe\u5fc3\u6311\u9009\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u3002\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b66\u672f\u6a21\u578b\u4e0a\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7684\u8868\u73b0\u4ecd\u5b58\u5728\u663e\u8457\u768414.3%\u7684\u7cbe\u5ea6\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7ba1\u9053\u65e0\u9700\u4eba\u5de5\u521b\u5efa\u4e86\u4e00\u4e2a\u5305\u542b9,695\u4e2a\u673a\u5668\u751f\u6210\u6837\u672c\u7684\u8bad\u7ec3\u6570\u636e\u96c6
\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6765\u63d0\u5347\u8de8\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u3002**|\n", "2406.19384": "|**2024-06-27**|**The Remarkable Robustness of LLMs: Stages of Inference?**|Vedang Lad et.al.|[2406.19384](http://arxiv.org/abs/2406.19384)|**[link](https://github.com/vdlad/remarkable-robustness-of-llms)**|**\u6211\u4eec\u901a\u8fc7\u5220\u9664\u548c\u4ea4\u6362\u76f8\u90bb\u5c42\u6765\u5c55\u793a\u5e76\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u60ca\u4eba\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u8fdb\u884c\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u8fd9\u4e9b\u5e72\u9884\u63aa\u65bd\u4ecd\u80fd\u4fdd\u7559\u539f\u59cb\u6a21\u578b72%\u81f395%\u7684\u9884\u6d4b\u7cbe\u5ea6\uff0c\u800c\u4e14\u6a21\u578b\u5c42\u6570\u8d8a\u591a\uff0c\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u6839\u636e\u9010\u5c42\u5e72\u9884\u5b9e\u9a8c\u548c\u5176\u4ed6\u5b9e\u9a8c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5047\u8bbe\uff1a\u5b58\u5728\u56db\u79cd\u901a\u7528\u7684\u63a8\u7406\u9636\u6bb5\uff0c\u8de8\u8d8a\u516b\u79cd\u4e0d\u540c\u7684\u6a21\u578b\uff1a\u89e3\u7801\u5668\u9636\u6bb5\uff0c\u5c06\u539f\u59cb\u4ee4\u724c\u8868\u793a\u63d0\u5347\u4e3a\u66f4\u9ad8\u7ea7\u7684\u4e0a\u4e0b\u6587\u8868\u793a\uff1b\u7279\u5f81\u5de5\u7a0b\u9636\u6bb5\uff0c\u8fed\u4ee3\u4f18\u5316\u4efb\u52a1\u548c\u5b9e\u4f53\u7279\u5b9a\u7279\u5f81\uff1b\u7136\u540e\u662f\u6a21\u578b\u7684\u534a\u90e8\u5206\uff0c\u968f\u7740\u4e13\u95e8\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u9690\u85cf\u8868\u793a\u4e0e\u8bcd\u6c47\u7a7a\u95f4\u7684\u5bf9\u9f50\u8fdb\u5165\u4e00\u4e2a\u76f8\u53d8\u9636\u6bb5\uff1b\u6700\u540e\uff0c\u6700\u540e\u4e00\u5c42\u901a\u8fc7\u6d88\u9664\u5bf9\u9884\u6d4b\u9020\u6210\u5e72\u6270\u7684\u8fc7\u65f6\u7279\u5f81\uff0c\u7cbe\u7ec6\u5316\u540e\u7eed\u7684\u4ee4\u724c\u5206\u5e03\u3002**|\n", "2406.19358": "|**2024-06-27**|**The Model Arena 
for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models**|Xiliang Zhu et.al.|[2406.19358](http://arxiv.org/abs/2406.19358)|null|### \u6982\u8ff0 \u60c5\u611f\u5206\u6790\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u626e\u6f14\u7740\u6838\u5fc3\u89d2\u8272\u3002XLM-R\u548cmT5\u7b49\u591a\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u7684\u5173\u6ce8\u5ea6\u63d0\u5347\u3002\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51fa\u73b0\u6781\u5927\u5730\u63a8\u52a8\u4e86\u901a\u7528NLP\u4efb\u52a1\u7684\u53d1\u5c55\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u5728\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u65b9\u9762\u7684\u6027\u80fd\u5c1a\u672a\u5145\u5206\u63a2\u8ba8\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5b9e\u8bc1\u5206\u6790\uff0c\u6bd4\u8f83\u4e86\u516c\u5171\u5c0f\u578b\u591a\u8bed\u8a00\u6a21\u578b\uff08SMLM\uff09\u5982XLM-R\u4e0e\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684LLM\uff08\u5982Llama-3\uff09\u5728\u82f1\u8bed\u3001\u897f\u73ed\u7259\u8bed\u3001\u6cd5\u8bed\u548c\u4e2d\u6587\u7684\u60c5\u611f\u5206\u6790\u4e2d\u7684\u96f6\u6837\u672c\u548c\u5c11\u91cf\u6837\u672c\u8fc1\u79fb\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c31\u516c\u5f00\u6a21\u578b\u800c\u8a00\uff0cSMLM\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u8bbe\u7f6e\u4e2d\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\uff0c\u516c\u5f00LLM\u663e\u793a\u51fa\u66f4\u5f3a\u7684\u9002\u5e94\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u4e13\u6709\u7684GPT-3.5\u548cGPT-4\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u80fd\u529b\u4e0a\u9886\u5148\uff0c\u4f46\u5728\u5c11\u91cf\u6837\u672c\u573a\u666f\u4e0b\uff0c\u5b83\u4eec\u88ab\u516c\u5f00\u6a21\u578b\u8d85\u8d8a\u3002|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for 
Math Multiple-choice Questions**|Nigel Fernandez et.al.|[2406.19356](http://arxiv.org/abs/2406.19356)|null|High-quality distractors are crucial to both the assessment and pedagogical value of multiple-choice questions (MCQs), especially math MCQs. However, manually crafting distractors that reflect students' actual knowledge gaps or misconceptions is a difficult task. Although large language models (LLMs) such as GPT-4 can help generate distractors, subjects like math remain challenging. We therefore propose a new approach that learns interpretable representations of errors and uses them to generate distractors for math MCQs. This paper presents DiVERT (Distractor Generation with Variational Errors Represented as Text), a variational approach built on a 7B-parameter open-source LLM. Experiments on a real-world math MCQ dataset of 1,434 questions used by hundreds of thousands of students show that DiVERT outperforms state-of-the-art GPT-4-based approaches in distractor generation. A human evaluation with math educators further shows that the error labels generated by DiVERT are of comparable quality to human-authored ones.|\n", "2406.19349": "|**2024-06-27**|**IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language**|Lucky Susanto et.al.|[2406.19349](http://arxiv.org/abs/2406.19349)|null|Online hate speech poses a serious threat to social harmony, particularly in countries such as Indonesia, where the rate of online hate speech has increased tenfold in recent years, creating an urgent need for effective detection mechanisms. Progress has been hindered, however, by the lack of sufficient labeled data, especially for Indonesian text. Marginalized groups, such as Shia and LGBTQ minorities, face even greater challenges, because hate speech against them is underreported and poorly understood by existing detection tools. Current datasets also handle subjectivity inadequately, which compounds the problem. To address these issues, we introduce IndoToxic2024, a comprehensive Indonesian hate speech and toxicity classification dataset of 43,692 entries annotated by 19 diverse individuals, focusing on texts that target vulnerable groups in Indonesia during elections (for example, specific groups during the presidential election). We fine-tune a BERT model (IndoBERTweet) to establish baselines for seven binary classification tasks, achieving a macro-F1 score of 0.78. We also show how incorporating demographic information improves the zero-shot performance of the large language model gpt-3.5-turbo. However, we caution that over-reliance on demographic information can degrade the performance of fine-tuned models, because it fragments the data.|\n", "2406.19317": "|**2024-06-27**|**Jump Starting Bandits with LLM-Generated Prior Knowledge**|Parand A. Alamdari et.al.|[2406.19317](http://arxiv.org/abs/2406.19317)|null|We provide substantial evidence of the benefits of combining large language models (LLMs) with the contextual multi-armed bandit framework. Contextual bandits are widely used in recommender systems to generate personalized suggestions based on user-specific context. We show that LLMs, trained on large corpora rich in human knowledge and preferences, can simulate human behaviour well enough to jump-start contextual multi-armed bandits and reduce online learning regret. We propose an initialization algorithm that prompts LLMs to produce a pre-training dataset of approximate human preferences for the bandit to learn from. This significantly reduces online learning regret and data-gathering costs. We validate our approach with two sets of experiments: one using LLMs as an oracle, and a real-world experiment using data from a conjoint survey experiment.|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292](http://arxiv.org/abs/2406.19292)|null|Recent studies have shown that large language models (LLMs) struggle to retrieve information and maintain reasoning capabilities when processing long-context inputs. To address this, we propose a fine-tuning approach that uses a carefully designed synthetic dataset of numerical key-value retrieval tasks. Our experiments on models such as GPT-3.5 Turbo and Mistral 7B show that fine-tuning on this dataset significantly improves their information retrieval and reasoning in long-context settings. Analysis of the fine-tuned models shows that these skills transfer from the synthetic tasks to real evaluations (for example, a 10.5% improvement on 20-document MDQA at position 10). We also find that LLMs fine-tuned on our synthetic data maintain their performance on general benchmarks, whereas fine-tuning on other long-context augmentation datasets can introduce more errors (for example, on TriviaQA, Mistral 7B fine-tuned on our synthetic data shows no significant performance drop, while other baseline data can cause drops ranging from 2.33% to 6.19%). This work highlights the potential of fine-tuning on synthetic data to improve LLM performance on long-context tasks.|\n", "2406.19283": "|**2024-06-27**|**PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models**|Cathy Mengying Fang 
et.al.|[2406.19283](http://arxiv.org/abs/2406.19283)|null|We present PhysioLLM, an interactive system that uses large language models (LLMs) to integrate physiological data from wearables with contextual information, providing personalized understanding and exploration of health data. Unlike commercial health apps, PhysioLLM offers comprehensive statistical analysis that uncovers correlations and trends in user data. Users can ask questions in natural language, receive generated personalized insights, and set actionable goals based on them. We use improving sleep quality as a case study, because sleep is quantifiable through physiological data and essential to overall health. Through a user study with 24 Fitbit smartwatch users, we show that PhysioLLM outperforms both the Fitbit app and a generic LLM chatbot in fostering a deeper, personalized understanding of health data and in supporting progress toward personal health goals.|\n", "2406.19280": "|**2024-06-27**|**HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale**|Junying Chen 
et.al.|[2406.19280](http://arxiv.org/abs/2406.19280)|**[link](https://github.com/freedomintelligence/huatuogpt-vision)**|**With the rapid development of multimodal large language models (MLLMs) such as GPT-4V, their medical multimodal capabilities have advanced significantly. These models still face challenges, however, because the quantity and quality of medical image-text data are limited by data-privacy concerns and high annotation costs. Earlier work tried to mitigate this with PubMed's large-scale de-identified medical image-text pairs, but that data remains noisy. To address this, we refined medical image-text pairs from PubMed and used GPT-4V in an 'unblinded' capacity to denoise and reformat the data, creating the PubMedVision dataset of 1.3 million medical VQA samples. Our validation shows that: (1) PubMedVision significantly improves the medical performance of current MLLMs, with notable gains on benchmarks such as the MMMU Health & Medicine track; (2) manual checks by medical experts and empirical results confirm that our dataset has better data quality than other construction methods. Using PubMedVision, we train HuatuoGPT-Vision, a 34B-parameter medical MLLM that performs strongly among open-source MLLMs and shows superior performance in medical multimodal scenarios.**|\n", "2406.19271": "|**2024-06-27**|**AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning**|Praneeth Vadlapati et.al.|[2406.19271](http://arxiv.org/abs/2406.19271)|**[link](https://github.com/Pro-GenAI/AutoPureData)**|**Demand for up-to-date and reliable large language models (LLMs) keeps growing. LLMs are typically trained on a fixed dataset and then deployed, but the training data gradually become outdated. Research has explored automatically updating AI models with web data, but this raises data-quality and safety concerns such as bias and spam. Ensuring data purity is essential for producing reliable models, and training on impure data can lead to undesirable outcomes. This study proposes a system that collects web data and automatically filters out unwanted content with the help of existing trusted AI models. In our experiments, we collected and processed a small sample of web data, demonstrating the system's effectiveness at purifying data.**|\n", "2406.20098": "|**2024-06-28**|**Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs**|Sukmin Yun et.al.|[2406.20098](http://arxiv.org/abs/2406.20098)|**[link](https://github.com/mbzuai-llm/web2code)**|**Multimodal large language models (MLLMs) perform well on tasks across modalities such as images, video, and audio. However, they are relatively weak at understanding webpage screenshots and generating the corresponding HTML code. To address this, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. To construct the dataset, we use pretrained LLMs to enhance existing webpage-to-code datasets and to generate a diverse pool of new webpages rendered into images. The input is a webpage image and an instruction, and the output is the webpage's HTML code. We also include diverse natural-language question-answer pairs about the webpage content to enable a more comprehensive understanding of it. To evaluate model performance on these tasks, we develop an evaluation framework that tests MLLMs' abilities in webpage understanding and webpage-to-code generation. Experiments show that our dataset benefits not only our proposed tasks but also general performance in the visual domain, whereas previous datasets lead to degraded performance. We hope this work will advance general MLLMs suited to web-based content generation and task automation. Our data and code will be made publicly available.**|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li 
et.al.|[2406.20095](http://arxiv.org/abs/2406.20095)|**[link](https://github.com/lostxine/llara)**|**This paper introduces LLaRA (Large Language and Robotics Assistant), a framework that formulates robot action policy as conversations and improves response quality by training with auxiliary data that supports policy learning. Large language models with visual input, i.e., vision-language models (VLMs), can process state information as visual-textual prompts and generate optimal robot policy decisions. The paper first introduces an automated pipeline that generates diverse, high-quality robotics instruction data from existing behavior cloning data. A VLM is then trained on this tailored conversation-style formulation so that it can generate meaningful robot action policies. Experiments show that LLaRA achieves state-of-the-art performance across multiple simulated and real-world environments. The code, datasets, and pretrained models are available.**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094](http://arxiv.org/abs/2406.20094)|null|We propose a novel persona-driven data synthesis methodology that leverages the many perspectives within a large language model (LLM) to create diverse synthetic data. To exploit this methodology at scale, we introduce Persona Hub, a collection of 1 billion diverse personas automatically curated from web data, roughly 13% of the world's population. Acting as distributed carriers of world knowledge, these personas can tap into almost every perspective encapsulated within the LLM, facilitating the creation of diverse synthetic data at scale for a wide range of scenarios. By showcasing Persona Hub's use in synthesizing high-quality mathematical and logical reasoning problems, instructions (user prompts), knowledge-rich texts, game NPCs, and tools (functions) at scale, we demonstrate that persona-driven data synthesis is versatile, scalable, flexible, and easy to use. It may drive a paradigm shift in synthetic data creation and its applications in practice, with a profound impact on LLM research and development.|\n", "2406.20092": "|**2024-06-28**|**LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context 
Compression**|Jieneng Chen et.al.|[2406.20092](http://arxiv.org/abs/2406.20092)|**[link](https://github.com/beckschen/llavolta)**|**While significant progress has been made in compressing textual embeddings in large language models (LLMs), compressing visual tokens in large multimodal models (LMMs) has remained largely overlooked. This paper studies the redundancy of visual tokens and efficient training in these models. Initial experiments show that eliminating up to 70% of visual tokens at test time by simple average pooling causes only a 3% drop in visual question answering accuracy on the GQA benchmark, indicating significant redundancy in the visual context. To address this, we introduce the Visual Context Compressor, which reduces the number of visual tokens during training to improve efficiency without sacrificing performance. To minimize the information lost when compressing visual tokens while keeping training efficient, we develop LLaVolta, a lightweight training scheme. LLaVolta applies stage-wise visual context compression, progressively compressing visual tokens from heavily to lightly and applying no compression at the end of training, so that no information is lost at test time. Extensive experiments show that our approach improves the performance of multimodal models on image-language and video-language understanding tasks while significantly reducing training costs. Code is available at https://github.com/Beckschen/LLaVolta.**|\n", "2406.20087": "|**2024-06-28**|**ProgressGym: Alignment with a Millennium of Moral Progress**|Tianyi Qiu 
et.al.|[2406.20087](http://arxiv.org/abs/2406.20087)|null|As frontier AI systems, particularly large language models (LLMs), gain increasing influence over human epistemology, they may reinforce prevailing societal values, contributing to the lock-in of misguided moral beliefs and the perpetuation of problematic moral practices on a broad scale. To address this potential risk, we propose progress alignment as a technical solution. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blind spots. To support research on progress alignment, we introduce ProgressGym, an experimental framework that learns the mechanics of moral progress from history to facilitate future progress in real-world moral decision-making. Drawing on 9 centuries of historical text and 18 historical LLMs, ProgressGym turns real-world progress alignment challenges into concrete benchmarks. We define three core challenges: tracking evolving values (PG-Follow), predicting moral progress (PG-Predict), and regulating the feedback loop between human and AI value shifts (PG-Coevolve). These tasks require methods with a temporal dimension, which conventional alignment strategies cannot handle. We therefore present lifelong learning and extrapolative algorithms as baseline methods for progress alignment, and build an open leaderboard that invites novel algorithms and challenges. The framework and the leaderboard are available at https://github.com/PKU-Alignment/ProgressGym and https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard, respectively.|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen et.al.|[2406.20085](http://arxiv.org/abs/2406.20085)|null|Diffusion-based generative methods have shown great potential for generating high-quality images with diverse layouts, which can significantly benefit downstream perception tasks. However, fully automatic layout generation driven only by language descriptions, together with a suitable metric for evaluating multiple generated instances, has not been well explored. In this work, we present Auto Cherry-Picker (ACP), a novel framework that automatically generates high-quality multimodal training examples to augment perception and multimodal training. Starting from a list of natural-language concepts, we prompt large language models (LLMs) to generate detailed descriptions and design reasonable layouts. We then use a text-to-image model to generate multiple images. Next, the generated data are refined with a carefully designed metric to ensure quality. In particular, we present a new metric, the Composite Layout and Image Score (CLIS), to evaluate the generated images fairly. By customizing the initial concept list, our synthetic high-quality examples boost performance in various scenarios, especially in addressing long-tailed distributions and imbalanced datasets. Experimental results on downstream tasks show that ACP significantly improves the performance of existing models. We also study the relationship between CLIS and performance gains on downstream tasks, finding that higher CLIS scores correspond to better performance. This suggests that evaluation metrics may play a key role in visual perception and multimodal LLM tasks. Code will be made available.|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal et.al.|[2406.20079](http://arxiv.org/abs/2406.20079)|**[link](https://github.com/anisha2102/molecular_facts)**|**Automatic fact verification of large language model (LLM) generations is increasingly used to combat erroneous claims, and a key focus of research is the granularity of verification: larger pieces of text are hard to verify, while more atomic facts such as propositions may lack the context needed to interpret them correctly. This work examines the role of context in these atomic facts. We argue that fully atomic facts are not the right representation, and propose two criteria for molecular facts: decontextuality, or whether they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information while preserving accuracy. We compare this approach with various decontextualization strategies and find that molecular facts balance minimality with fact-verification accuracy in ambiguous settings.**|\n", "2406.20041": "|**2024-07-01**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford et.al.|[2406.20041](http://arxiv.org/abs/2406.20041)|null|Autonomous agents driven by large language models (LLMs) offer enormous potential for automation. Early demonstrations show that these agents can solve complex tasks, interact with external systems to augment their knowledge, and trigger actions. In particular, workflows in which multiple agents collaborate to solve complex tasks demonstrate their ability to operate in less strict and less well-defined environments. Multi-agent approaches therefore have great potential to serve as the backbone of many industrial applications, from complex knowledge retrieval systems to next-generation robotic process automation. Given the reasoning abilities of current LLMs, handling complex processes requires a multi-step approach that includes a well-defined, modular task plan. Depending on their complexity, these tasks can be executed by a single agent or by a group of agents. This work focuses on designing a flexible agent engineering framework, with particular attention to planning and execution, that can handle complex use cases across domains. The framework provides reliability for industrial applications and presents techniques for ensuring scalable, flexible, and collaborative workflows in which multiple autonomous agents work together to solve problems.|\n", "2406.20030": "|**2024-06-28**|**LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models**|Renzhi Wang et.al.|[2406.20030](http://arxiv.org/abs/2406.20030)|null|Large language models (LLMs) require continual updates to keep pace with evolving world knowledge, which has given rise to the task of lifelong model editing. Although various techniques for single and batch editing have been developed in recent years, they either cannot be applied to lifelong editing or perform poorly in that setting. In this paper, we introduce LEMoE, a Mixture of Experts (MoE) adaptor designed for lifelong model editing. We first analyze the factors that limit the effectiveness of conventional MoE adaptors in lifelong editing, including catastrophic forgetting, inconsistent routing, and order sensitivity. Based on these insights, we propose a tailored module insertion method that introduces a novel key-value anchor routing mechanism to improve routing consistency between the training and inference stages\uff0c\u540c\u65f6\u91c7\u7528\u4e86\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u805a\u
7c7b\u57fa\u7f16\u8f91\u987a\u5e8f\u89c4\u5212\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec8\u751f\u7f16\u8f91\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u5148\u524d\u7684\u6a21\u578b\u7f16\u8f91\u6280\u672f\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6279\u91cf\u7f16\u8f91\u4efb\u52a1\u4e2d\u7684\u4f18\u79c0\u6027\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u5f00\u6e90\u3002|\n", "2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang et.al.|[2406.20015](http://arxiv.org/abs/2406.20015)|**[link](https://github.com/toolbehonest/toolbehonest)**|**\u968f\u7740\u5de5\u5177\u589e\u5f3a\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fc5\u901f\u878d\u5165\u5b9e\u9645\u5e94\u7528\uff0c\u793e\u533a\u4e9f\u9700\u5168\u9762\u4e86\u89e3\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5168\u9762\u7684\u8bca\u65ad\u57fa\u51c6\u2014\u2014ToolBH\u3002\u6211\u4eec\u4ece\u6df1\u5ea6\u548c\u5e7f\u5ea6\u4e24\u4e2a\u7ef4\u5ea6\u8fdb\u884c\u8bc4\u4f30\uff1a\u5728\u6df1\u5ea6\u4e0a\uff0c\u8bbe\u8ba1\u4e86\u591a\u7ea7\u8bca\u65ad\u6d41\u7a0b\uff0c\u5305\u62ec\uff081\uff09\u53ef\u89e3\u6027\u68c0\u6d4b\u3001\uff082\uff09\u89e3\u51b3\u65b9\u6848\u89c4\u5212\u548c\uff083\uff09\u7f3a\u5931\u5de5\u5177\u5206\u6790\uff1b\u5728\u5e7f\u5ea6\u4e0a\uff0c\u8003\u8651\u4e86\u5de5\u5177\u96c6\u7279\u5f81\u4e0b\u7684\u4e09\u79cd\u573a\u666f\uff1a\u7f3a\u5c11\u5fc5\u8981\u5de5\u5177\u3001\u6f5c\u5728\u5de5\u5177\u548c\u529f\u80fd\u6709\u9650\u7684\u5de5\u5177\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e03\u4e2a\u4efb\u52a1\uff0c\u5e76\u901a\u8fc7\u591a\u6b21\u4eba\u5de5\u6807\u6ce8\u6536\u96c6\u4e86700\u4efd\u8bc4\u4f30\u6837\u672c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u5148\u8fdb\u7684\u6a21\u578bGemini-1.5-Pro\u548cGPT-4o\u5728\u8fd9\u9879\u57fa\u5
1c6\u4e0a\u7684\u603b\u5f97\u5206\u4e3a45.3\u548c37.0\uff0c\u6ee1\u5206100\u5206\u3002\u5728\u5de5\u5177\u589e\u5f3a\u7684LLM\u573a\u666f\u4e2d\uff0c\u66f4\u5927\u7684\u6a21\u578b\u53c2\u6570\u5e76\u4e0d\u4e00\u5b9a\u610f\u5473\u7740\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8bad\u7ec3\u6570\u636e\u548c\u56de\u590d\u7b56\u7565\u540c\u6837\u5173\u952e\u3002\u6211\u4eec\u7684\u8bca\u65ad\u5206\u6790\u6307\u51fa\uff0c\u6a21\u578b\u9519\u8bef\u7684\u4e3b\u8981\u539f\u56e0\u5728\u4e8e\u4efb\u52a1\u53ef\u89e3\u6027\u7684\u5224\u65ad\u3002\u5f00\u653e\u6e90\u7801\u6a21\u578b\u5728\u5197\u957f\u56de\u590d\u65f6\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u4e13\u6709\u6a21\u578b\u5728\u957f\u94fe\u63a8\u7406\u65b9\u9762\u8868\u73b0\u66f4\u4f18\u3002**|\n", "2407.02490": "|**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8ba1\u7b97\u6311\u6218\uff0c\u5c24\u5176\u662f\u968f\u7740\u63d0\u793a\u957f\u5ea6\u7684\u589e\u957f\uff0c\u5176\u5e7f\u6cdb\u5e94\u7528\u9762\u4e34\u969c\u788d\u3002\u7531\u4e8e\u6ce8\u610f\u529b\u8ba1\u7b97\u7684\u4e8c\u6b21\u590d\u6742\u6027\uff0c80\u4ebf\u53c2\u6570\u7684LLM\u5728\u5355\u4e2aA100 
GPU\u4e0a\u5904\u7406100\u4e07\u4e2a\u4ee4\u724c\uff08\u5373\u9884\u586b\u5145\u9636\u6bb5\uff09\u9700\u898130\u5206\u949f\u3002\u73b0\u6709\u7684\u52a0\u901f\u9884\u586b\u5145\u65b9\u6cd5\u5f80\u5f80\u5728\u9762\u5bf9\u957f\u5e8f\u5217LLMs\u65f6\u96be\u4ee5\u4fdd\u6301\u65e2\u9ad8\u6548\u53c8\u51c6\u786e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MInference\uff08\u767e\u4e07\u4ee4\u724c\u63a8\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u63d0\u5347\u957f\u5e8f\u5217\u5904\u7406\u9884\u586b\u5145\u9636\u6bb5\u901f\u5ea6\u7684\u7a00\u758f\u8ba1\u7b97\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\u4e86\u6ce8\u610f\u529b\u77e9\u9635\u4e2d\u7684\u4e09\u79cd\u72ec\u7279\u6a21\u5f0f\uff1aA\u5f62\u3001\u5782\u76f4\u659c\u7ebf\u548c\u5757\u7a00\u758f\uff0c\u8fd9\u4e9b\u6a21\u5f0f\u53ef\u5229\u7528GPU\u8fdb\u884c\u9ad8\u6548\u7684\u7a00\u758f\u8ba1\u7b97\u3002\u6211\u4eec\u5728\u79bb\u7ebf\u9636\u6bb5\u786e\u5b9a\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u7684\u6700\u4f73\u6a21\u5f0f\uff0c\u5e76\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u52a8\u6001\u6784\u5efa\u7a00\u758f\u7d22\u5f15\u3002\u901a\u8fc7\u4f18\u5316\u7684GPU\u5185\u6838\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u57fa\u4e8e\u6307\u5b9a\u6a21\u5f0f\u7684\u7a00\u758f\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u957f\u5e8f\u5217LLMs\u9884\u586b\u5145\u9636\u6bb5\u7684\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e0\u9700\u4fee\u6539\u9884\u8bad\u7ec3\u8bbe\u7f6e\u6216\u989d\u5916\u5fae\u8c03\u5373\u53ef\u76f4\u63a5\u5e94\u7528\u4e8e\u73b0\u6709LLMs\u3002\u6211\u4eec\u5728\u5305\u62ecInfiniteBench\u3001RULER\u3001PG-19\u548cNeedle In A 
Haystack\u5728\u5185\u7684\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4ee5\u53caLLaMA-3-1M\u3001GLM4-1M\u3001Yi-200K\u3001Phi-3-128K\u548cQwen2-128K\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cMInference\u5728A100\u4e0a\u6709\u6548\u964d\u4f4e\u4e86\u9884\u586b\u5145\u7684\u63a8\u7406\u5ef6\u8fdf\u9ad8\u8fbe10\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u5730\u5740\u4e3a\uff1ahttps://aka.ms/MInference\u3002**|\n", "2407.02486": "|**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aNeurocache\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6709\u6548\u4e0a\u4e0b\u6587\u8303\u56f4\uff0c\u901a\u8fc7\u5916\u90e8\u5411\u91cf\u7f13\u5b58\u5b58\u50a8\u5176\u8fc7\u53bb\u7684\u6a21\u578b\u72b6\u6001\u3002\u4e0e\u8fd1\u671f\u7684\u5411\u91cf\u68c0\u7d22\u65b9\u6cd5\u7c7b\u4f3c\uff0cNeurocache\u5229\u7528\u9ad8\u6548\u7684k\u8fd1\u90bb(kNN)\u7b97\u6cd5\u68c0\u7d22\u76f8\u5173\u7684\u5386\u53f2\u72b6\u6001\uff0c\u5e76\u5c06\u5176\u878d\u5165\u6ce8\u610f\u529b\u8fc7\u7a0b\u3002Neurocache\u5728\u6539\u8fdb\u73b0\u6709\u65b9\u6cd5\u65b9\u9762\u6709\u4ee5\u4e0b\u51e0\u70b9\uff1a(1) \u5b58\u50a8\u538b\u7f29\u7684\u72b6\u6001\uff0c\u51cf\u5c0f\u4e86\u7f13\u5b58\u5927\u5c0f\uff1b(2) \u6bcf\u4e2a\u4ee4\u724c\u6267\u884c\u4e00\u6b21\u68c0\u7d22\u64cd\u4f5c\uff0c\u63d0\u9ad8\u4e86\u63a8\u7406\u901f\u5ea6\uff1b(3) \u5c06\u68c0\u7d22\u7a97\u53e3\u6269\u5c55\u5230\u90bb\u8fd1\u72b6\u6001\uff0c\u63d0\u5347\u4e86\u8bed\u8a00\u5efa\u6a21\u548c\u4e0b\u6e38\u4efb\u52a1\u7684\u51c6\u786e\u6027\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u8fd8\u662f\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982Llama2-7B\u548cMistral-7B\uff09\u8fdb\u884c\u589e\u5f3a\uff0cNeurocache\u90fd\u80fd\u6709\u6548\u3002\u6211\u4eec\u8fd8\u5bf9\u6bd4\u4e86Neurocache\u4e0e\u5176\u4ed6\u6587\u672c\u68c0\u7d22\u65b9\u6cd5\uff0c\u5728\u5355\u6587\u6863\u95ee\u7b54\u548c\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u5176\u4f18\u52bf\u3002\u6e90\u4ee3\u7801\u5df2\u5728\u4ee5\u4e0b\u94fe\u63a5\u516c\u5f00\uff1ahttps://github.com/alisafaya/neurocache\u3002**|\n", "2407.02485": "|**2024-07-02**|**RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs**|Yue Yu et.al.|[2407.02485](http://arxiv.org/abs/2407.02485)|null|\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6307\u4ee4\u8c03\u4f18\u6846\u67b6RankRAG\uff0c\u65e8\u5728\u9488\u5bf9\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4e2d\u7684\u4e0a\u4e0b\u6587\u6392\u540d\u548c\u7b54\u6848\u751f\u6210\u53cc\u91cd\u4efb\u52a1\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u8c03\u4f18\u3002\u901a\u8fc7\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u52a0\u5165\u5c11\u91cf\u6392\u540d\u6570\u636e\uff0c\u6307\u4ee4\u8c03\u4f18\u7684\u5355\u4e2a\u8bed\u8a00\u6a21\u578b\u8868\u73b0\u51fa\u4ee4\u4eba\u60ca\u8bb6\u7684\u6548\u679c\uff0c\u8d85\u8d8a\u4e86\u4e13\u95e8\u4f7f\u7528\u5927\u91cf\u6392\u540d\u6570\u636e\u8fdb\u884c\u5355\u72ec\u8c03\u4f18\u7684\u73b0\u6709\u4e13\u5bb6\u6392\u540d\u6a21\u578b\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4e0e\u5305\u62ecGPT-4-0613\u3001GPT-4-turbo-2024-0409\u548c\u5f00\u653e\u6e90\u4ee3\u7801\u7684\u6700\u5148\u8fdb\u7684RAG\u6027\u80fd\u6a21\u578bChatQA-1.5\u5728\u5185\u7684\u591a\u4e2a\u5f3abaseline\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684Llama3-RankRAG\u5728\u4e5d\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u57fa\u51c6\u4e0a\u663e\u8457\u4f18\u4e8eLlama3-Cha
tQA-1.5\u548cGPT-4\u7cfb\u5217\u6a21\u578b\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u5728\u65e0\u9700\u9488\u5bf9\u751f\u7269\u533b\u5b66\u9886\u57df\u6570\u636e\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u4e94\u4e2a\u751f\u7269\u533b\u5b66\u9886\u57df\u7684RAG\u57fa\u51c6\u4e0a\u4e0eGPT-4\u6a21\u578b\u8868\u73b0\u76f8\u5f53\uff0c\u8fd9\u663e\u793a\u4e86\u5176\u5728\u65b0\u9886\u57df\u4e2d\u7684\u51fa\u8272\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u4e0d\u5982\u4e13\u4e1a\u6a21\u578b\u3002\u8fd1\u671f\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\uff0c\u901a\u8fc7\u7528\u6237\u8f93\u5165\u9009\u62e9\u5408\u9002\u7684\u4e13\u7528\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u533b\u7597\u9886\u57df\uff0c\u8fd9\u7c7b\u8fdb\u5c55\u7684\u5e94\u7528\u8fd8\u4e0d\u5e7f\u6cdb\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4e00\u79cd\u4e13\u4e3a\u533b\u7597\u8bbe\u8ba1\u7684\u4ee3\u7406\uff0c\u540d\u4e3a\\textbf{M}ulti-modal \\textbf{Med}ical 
\\textbf{Agent}\uff08MMedAgent\uff09\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u516d\u4e2a\u533b\u7597\u5de5\u5177\uff0c\u7528\u4e8e\u89e3\u51b3\u4e03\u9879\u4efb\u52a1\uff0c\u4f7f\u4ee3\u7406\u80fd\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u9002\u5b9c\u7684\u5de5\u5177\u3002\u5b9e\u9a8c\u5168\u9762\u5c55\u793a\u4e86MMedAgent\u5728\u5404\u79cd\u533b\u7597\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u5f00\u6e90\u65b9\u6cd5\uff0c\u751a\u81f3\u5305\u62ec\u5c01\u95ed\u6e90\u6a21\u578bGPT-4o\uff0c\u4e14\u5728\u5f15\u5165\u548c\u6574\u5408\u65b0\u533b\u7597\u5de5\u5177\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6548\u6027\u3002|\n", "2407.02477": "|**2024-07-02**|**Understanding Alignment in Multimodal LLMs: A Comprehensive Study**|Elmira Amirloo et.al.|[2407.02477](http://arxiv.org/abs/2407.02477)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6027\u80fd\u7684\u63d0\u5347\uff0c\u504f\u597d\u4e00\u81f4\u6027\u5df2\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u56e0\u7d20\uff0c\u4f46\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u5e94\u7528\u76f8\u5bf9\u8f83\u5c11\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u56fe\u50cf\u7406\u89e3\u4efb\u52a1\u4e2d\u4e5f\u4f1a\u9047\u5230\u8bf8\u5982\u9519\u8bef\u9648\u8ff0\u548c\u5185\u5bb9\u4e0d\u4e00\u81f4\uff08\u5373\u5e7b\u89c9\uff09\u7684\u95ee\u9898\u3002MLLMs\u7684\u504f\u597d\u5bf9\u9f50\u76ee\u6807\u662f\u4f7f\u6a21\u578b\u7684\u56de\u7b54\u66f4\u8d34\u8fd1\u56fe\u50cf\u4fe1\u606f\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u5f15\u5165\u4e86\u9488\u5bf9MLLM\u7684\u504f\u597d\u6570\u636e\u96c6\uff0c\u5e76\u5c1d\u8bd5\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u548cproximal policy 
optimization\uff08PPO\uff09\u7b49\u4e0d\u540c\u7684\u5bf9\u9f50\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6570\u636e\u96c6\u3001\u57fa\u7840\u6a21\u578b\u7c7b\u578b\u548c\u5bf9\u9f50\u7b56\u7565\u7684\u5dee\u5f02\uff0c\u54ea\u79cd\u65b9\u6cd5\u5bf9\u6027\u80fd\u63d0\u5347\u7684\u8d21\u732e\u6700\u5927\u5c1a\u4e0d\u6e05\u695a\u3002 \u672c\u6587\u72ec\u7acb\u5206\u6790\u4e86MLLM\u504f\u597d\u5bf9\u9f50\u7684\u5404\u4e2a\u65b9\u9762\u3002\u6211\u4eec\u5c06\u5bf9\u9f50\u7b97\u6cd5\u5206\u4e3a\u79bb\u7ebf\uff08\u5982DPO\uff09\u548c\u5728\u7ebf\uff08\u5982\u5728\u7ebf-DPO\uff09\u4e24\u7c7b\uff0c\u5e76\u8868\u660e\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u7ed3\u5408\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u56de\u987e\u4e86\u5404\u79cd\u5df2\u53d1\u8868\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u96c6\uff0c\u63a2\u8ba8\u4e86\u5b83\u4eec\u6784\u5efa\u7ec6\u8282\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u751f\u6210\u65b9\u6cd5\u2014\u2014\u504f\u89c1\u9a71\u52a8\u7684\u5e7b\u89c9\u91c7\u6837\uff08Bias-Driven Hallucination Sampling\uff0cBDHS\uff09\uff0c\u8fd9\u79cd\u65b9\u6cd5\u65e0\u9700\u989d\u5916\u6807\u6ce8\u6216\u5916\u90e8\u6a21\u578b\uff0c\u4e14\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u4e0e\u4e4b\u524d\u53d1\u8868\u7684\u5bf9\u9f50\u5de5\u4f5c\u76f8\u5f53\u7684\u7ade\u4e89\u6027\u80fd\u3002|\n", "2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo 
et.al.|[2407.02473](http://arxiv.org/abs/2407.02473)|null|\u5982\u4f55\u6784\u5efa\u80fd\u591f\u5728\u5f00\u653e\u4e16\u754c\u4e2d\u6267\u884c\u8bed\u4e49\u5bfc\u822a\u4efb\u52a1\u7684\u673a\u5668\u4eba\uff0c\u6bd4\u5982\u5728\u65b0\u573a\u666f\u4e2d\u5bfb\u627e\u76ee\u6807\u7269\u4f53\uff1f\u5c3d\u7ba1\u57fa\u7840\u6a21\u578b\u5177\u5907\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u6240\u9700\u7684\u4e30\u5bcc\u77e5\u8bc6\u548c\u6cdb\u5316\u80fd\u529b\uff0c\u4f46\u9700\u8981\u4e00\u79cd\u5408\u9002\u7684\u573a\u666f\u8868\u793a\u6765\u5c06\u5b83\u4eec\u6574\u5408\u5230\u5b8c\u6574\u7684\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u5f00\u653e\u573a\u666f\u56fe\uff08Open Scene Graphs\uff0cOSG\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u62d3\u6251\u8bed\u4e49\u8868\u793a\uff0c\u7528\u4e8e\u4fdd\u7559\u548c\u7ec4\u7ec7\u5f00\u653e\u96c6\u4e2d\u573a\u666f\u4fe1\u606f\uff0c\u4e14\u7ed3\u6784\u53ef\u9002\u5e94\u4e0d\u540c\u73af\u5883\u7c7b\u578b\u3002\u6211\u4eec\u5c06\u57fa\u7840\u6a21\u578b\u548cOSG\u6574\u5408\u5230OpenSearch\u7cfb\u7edf\u4e2d\uff0c\u8be5\u7cfb\u7edf\u4e13\u4e3a\u5f00\u653e\u4e16\u754c\u7684\u5bf9\u8c61\u76ee\u6807\u5bfc\u822a\u8bbe\u8ba1\uff0c\u80fd\u591f\u7406\u89e3\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u5e76\u5728\u591a\u53d8\u73af\u5883\u4e2d\u96f6\u6837\u672c\u6cdb\u5316\uff0c\u5bfb\u627e\u672a\u89c1\u8fc7\u7684\u7269\u4f53\u3002\u6211\u4eec\u7684OSG\u589e\u5f3a\u4e86\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5f97OpenSearch\u5728\u7269\u4f53\u76ee\u6807\u5bfc\u822a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u73b0\u6709\u7684LLM\u65b9\u6cd5\u3002\u901a\u8fc7\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u6d4b\u8bd5\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86OpenSearch\u5728\u5404\u79cd\u73af\u5883\u3001\u673a\u5668\u4eba\u548c\u65b0\u9896\u6307\u4ee4\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02464": "|**2024-07-02**|**Reliable 
Confidence Intervals for Information Retrieval Evaluation Using Generative A.I**|Harrie Oosterhuis et.al.|[2407.02464](http://arxiv.org/abs/2407.02464)|null|\u4f20\u7edf\u7684\u4fe1\u606f\u68c0\u7d22\uff08IR\uff09\u7cfb\u7edf\u8bc4\u4f30\u901a\u5e38\u6210\u672c\u9ad8\u6602\uff0c\u56e0\u4e3a\u9700\u8981\u4eba\u5de5\u4e13\u5bb6\u8fdb\u884c\u76f8\u5173\u6027\u6807\u6ce8\u3002\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u80fd\u591f\u4ee5\u76f8\u5bf9\u8f83\u4f4e\u7684\u8ba1\u7b97\u6210\u672c\u5927\u89c4\u6a21\u751f\u6210\u76f8\u5173\u6027\u6ce8\u91ca\uff0c\u53ef\u80fd\u51cf\u8f7bIR\u8bc4\u4f30\u7684\u4f20\u7edf\u6210\u672c\uff0c\u5e76\u4f7f\u5176\u9002\u7528\u4e8e\u4f17\u591a\u8d44\u6e90\u532e\u4e4f\u7684\u5e94\u7528\u573a\u666f\u3002\u7136\u800c\uff0c\u751f\u6210\u7684\u6ce8\u91ca\u5e76\u975e\u65e0\u8bef\uff0c\u76f4\u63a5\u7528\u4e8e\u8bc4\u4f30\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u53ef\u9760\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u63d0\u51fa\u4e24\u79cd\u65b9\u6cd5\uff0c\u5206\u522b\u662f\u57fa\u4e8e\u9884\u6d4b\u9a71\u52a8\u7684\u63a8\u65ad\u548c\u89c4\u8303\u98ce\u9669\u63a7\u5236\uff0c\u5229\u7528\u8ba1\u7b97\u673a\u751f\u6210\u7684\u76f8\u5173\u6027\u6ce8\u91ca\u4e3aIR\u8bc4\u4f30\u6307\u6807\u63d0\u4f9b\u53ef\u9760\u7684\u7f6e\u4fe1\u533a\u95f4\uff08CIs\uff09\u3002 
\u6211\u4eec\u7684\u65b9\u6cd5\u9700\u8981\u5c11\u91cf\u53ef\u9760\u7684\u6ce8\u91ca\uff0c\u901a\u8fc7\u7edf\u8ba1\u5206\u6790\u751f\u6210\u6ce8\u91ca\u4e2d\u7684\u9519\u8bef\uff0c\u4ece\u800c\u4e3a\u8bc4\u4f30\u6307\u6807\u8bbe\u7f6eCIs\uff0c\u5177\u6709\u575a\u5b9e\u7684\u7406\u8bba\u57fa\u7840\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u7684\u89c4\u8303\u98ce\u9669\u63a7\u5236\u65b9\u6cd5\u9002\u7528\u4e8e\u6392\u540d\u8bc4\u4f30\uff0c\u5e76\u4e14\u53ef\u4ee5\u6839\u636e\u67e5\u8be2\u548c\u6587\u6863\u81ea\u9002\u5e94\u8c03\u6574CIs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u7f6e\u4fe1\u533a\u95f4\u51c6\u786e\u6355\u6349\u4e86\u57fa\u4e8eLLM\u6ce8\u91ca\u7684\u8bc4\u4f30\u4e2d\u7684\u53d8\u5f02\u6027\u548c\u504f\u5dee\uff0c\u4f18\u4e8e\u4f20\u7edf\u7684Bootstrap\u4f30\u8ba1\u3002\u6211\u4eec\u671f\u671b\u8fd9\u4e9b\u8d21\u732e\u80fd\u4e3a\u90a3\u4e9b\u4f20\u7edf\u4e0a\u96be\u4ee5\u5b9e\u73b0\u53ef\u9760\u8bc4\u4f30\u7684\u4f17\u591aIR\u5e94\u7528\u5e26\u6765\u9769\u65b0\u3002|\n", "2407.02411": "|**2024-07-03**|**Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs**|Jinmin Li et.al.|[2407.02411](http://arxiv.org/abs/2407.02411)|null|\u968f\u7740\u89c6\u9891\u9a71\u52a8\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u89c6\u9891\u7406\u89e3\u80fd\u529b\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u4fdd\u62a4\u65b9\u9762\u7684\u62c5\u5fe7\uff0c\u56e0\u4e3a\u89c6\u9891\u66f4\u5bb9\u6613\u88ab\u65e0\u6388\u6743\u5730\u6807\u6ce8\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cVideo 
Watermarking\u201d\u7684\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u4fdd\u62a4\u89c6\u9891\u514d\u53d7\u672a\u7ecf\u6388\u6743\u7684\u89c6\u9891LLMs\uff0c\u7279\u522b\u662f\u9488\u5bf9\u5185\u5bb9\u548c\u63cf\u8ff0\u7684\u5904\u7406\u3002\u901a\u8fc7\u5728\u5173\u952e\u5e27\u4e2d\u5d4c\u5165\u96be\u4ee5\u5bdf\u89c9\u7684\u6c34\u5370\uff0c\u6211\u4eec\u5229\u7528\u591a\u6a21\u6001\u6d41\u635f\u5931\u4fdd\u6301\u89c2\u770b\u4f53\u9a8c\u7684\u540c\u65f6\uff0c\u9632\u6b62\u89c6\u9891\u88ab\u6ee5\u7528\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0cVideo Watermarking\u663e\u8457\u964d\u4f4e\u4e86\u89c6\u9891\u5728\u5404\u79cd\u89c6\u9891LLMs\u4e2d\u7684\u53ef\u7406\u89e3\u6027\uff0c\u8bc1\u660e\u4e86\u5176\u9690\u79d8\u6027\u548c\u9c81\u68d2\u6027\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u786e\u4fdd\u89c6\u9891\u5185\u5bb9\u7684\u5b89\u5168\u3001\u5b8c\u6574\u6027\u548c\u4fdd\u5bc6\u6027\u63d0\u4f9b\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5e94\u5bf9\u4e0d\u65ad\u53d1\u5c55\u7684\u89c6\u9891LLMs\u6280\u672f\u3002|\n", "2407.02408": "|**2024-07-02**|**CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models**|Song Wang 
et.al.|[2407.02408](http://arxiv.org/abs/2407.02408)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u5e94\u7528\u4e8e\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u5bf9\u5176\u751f\u6210\u5185\u5bb9\u53ef\u80fd\u4ea7\u751f\u7684\u8d1f\u9762\u793e\u4f1a\u5f71\u54cd\u7684\u62c5\u5fe7\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u4e3a\u4e86\u8bc4\u4f30LLMs\u7684\u504f\u89c1\uff0c\u7814\u7a76\u4eba\u5458\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u504f\u89c1\u8bc4\u4f30\u5de5\u4f5c\u5f80\u5f80\u53ea\u5173\u6ce8\u67d0\u79cd\u7c7b\u578b\u7684\u504f\u89c1\uff0c\u5e76\u4f7f\u7528\u4e0d\u4e00\u81f4\u7684\u8bc4\u4ef7\u6307\u6807\uff0c\u8fd9\u5bfc\u81f4\u4e0d\u540c\u6570\u636e\u96c6\u548cLLM\u4e4b\u95f4\u7684\u6bd4\u8f83\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u591a\u79cd\u7528\u4e8e\u8bc4\u4f30LLM\u504f\u89c1\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86CEB\uff08Compositional Evaluation Benchmark\uff09\uff0c\u5b83\u6db5\u76d6\u4e86\u4e0d\u540c\u793e\u4f1a\u7fa4\u4f53\u548c\u793e\u4f1a\u4efb\u52a1\u4e2d\u7684\u5404\u79cd\u7c7b\u578b\u504f\u89c1\u3002CEB\u7684\u6784\u5efa\u57fa\u4e8e\u6211\u4eec\u65b0\u63d0\u51fa\u7684\u6784\u6210\u6027\u5206\u7c7b\u4f53\u7cfb\uff0c\u4ece\u4e09\u4e2a\u7ef4\u5ea6\u5bf9\u6bcf\u4e2a\u6570\u636e\u96c6\u8fdb\u884c\u523b\u753b\uff1a\u504f\u89c1\u7c7b\u578b\u3001\u793e\u4f1a\u7fa4\u4f53\u548c\u4efb\u52a1\u3002\u901a\u8fc7\u7ed3\u5408\u8fd9\u4e09\u4e2a\u7ef4\u5ea6\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u5168\u9762\u7684LLM\u504f\u89c1\u8bc4\u4f30\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u504f\u89c1\u5728\u5404\u7ef4\u5ea6\u4e0a\u7684\u7a0b\u5ea6\u6709\u6240\u4e0d\u540c\uff0c\u4ece\u800c\u4e3a\u9488\u5bf9\u7279\u5b9a\u504f\u89c1\u7684\u7f13\u89e3\u65b9\u6cd5\u7684\u53d1\u5c55\u63d0\u4f9b\u4e86\u6307\u5bfc\u3002|\n", 
"2407.02402": "|**2024-07-02**|**Assessing the Code Clone Detection Capability of Large Language Models**|Zixian Zhang et.al.|[2407.02402](http://arxiv.org/abs/2407.02402)|null|\u8be5\u7814\u7a76\u65e8\u5728\u8bc4\u4f30\u4e24\u79cd\u5148\u8fdb\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cGPT-3.5\u548cGPT-4\uff0c\u5728\u4ee3\u7801\u514b\u9686\u68c0\u6d4b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u901a\u8fc7\u5728\u4e24\u4e2a\u6570\u636e\u96c6\u4e0a\u6d4b\u8bd5\u6a21\u578b\uff1aBigCloneBench\uff08\u4eba\u7c7b\u521b\u5efa\uff09\u548cGPTCloneBench\uff08LLM\u751f\u6210\uff09\u3002\u7814\u7a76\u53d1\u73b0\uff0cGPT-4\u5728\u6240\u6709\u7c7b\u578b\u7684\u4ee3\u7801\u514b\u9686\u68c0\u6d4b\u4e2d\u90fd\u660e\u663e\u4f18\u4e8eGPT-3.5\u3002\u7ed3\u679c\u663e\u793a\uff0cGPT\u6a21\u578b\u7684\u51c6\u786e\u5ea6\u4e0e\u5176\u8bc6\u522b\u4ee3\u7801\u514b\u9686\u7684\u80fd\u529b\u4e0e\u4ee3\u7801\u76f8\u4f3c\u5ea6\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u5b83\u4eec\u5728\u8bc6\u522b\u6700\u590d\u6742\u7684Type-4\u4ee3\u7801\u514b\u9686\u65f6\u6548\u679c\u8f83\u4f4e\u3002\u6b64\u5916\uff0cGPT\u6a21\u578b\u5728\u68c0\u6d4bLLM\u751f\u6210\u7684\u4ee3\u7801\u4e2d\u7684\u4ee3\u7801\u514b\u9686\u8868\u73b0\u4f18\u4e8e\u4eba\u7c7b\u751f\u6210\u7684\u4ee3\u7801\uff0c\u4f46\u6574\u4f53\u51c6\u786e\u6027\u4ecd\u4e0d\u663e\u8457\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347LLM\u5728\u4ee3\u7801\u514b\u9686\u8bc6\u522b\u80fd\u529b\u7684\u5fc5\u8981\u6027\uff0c\u7279\u522b\u662f\u9488\u5bf9\u81ea\u6211\u751f\u6210\u4ee3\u7801\u514b\u9686\u7684\u95ee\u9898\uff0c\u968f\u7740\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u8d8a\u6765\u8d8a\u591a\u5730\u4f7f\u7528\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u751f\u6210\u548c\u91cd\u6784\u5de5\u5177\uff0c\u8fd9\u53ef\u80fd\u4f1a\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002|\n", "2407.03310": "|**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou 
et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|**\u6458\u8981\uff1a** \u957f\u5ea6\u6cdb\u5316\u6307\u7684\u662f\u4ece\u7b80\u77ed\u7684\u8bad\u7ec3\u5e8f\u5217\u63a8\u65ad\u51fa\u957f\u6d4b\u8bd5\u5e8f\u5217\u7684\u80fd\u529b\uff0c\u8fd9\u5bf9\u4e8e\u5f53\u524d\u7684\u5927\u8bed\u8a00\u6a21\u578b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u4e9b\u67b6\u6784\u6216\u6570\u636e\u683c\u5f0f\u53d8\u5316\u6765\u5b9e\u73b0\u957f\u5ea6\u6cdb\u5316\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u901a\u5e38\u5c40\u9650\u4e8e\u7279\u5b9a\u4efb\u52a1\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u7ed3\u5408\u4e86\u64e6\u9664\u677f\u548c\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u6280\u672f\uff0c\u63d0\u51fa\u4e86Turing\u7a0b\u5e8f\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684CoT\u7b56\u7565\uff0c\u5b83\u5c06\u7b97\u6cd5\u6027\u4efb\u52a1\u5206\u89e3\u6210\u7c7b\u4f3c\u56fe\u7075\u673a\u8ba1\u7b97\u7684\u6b65\u9aa4\u3002\u8fd9\u4e2a\u6846\u67b6\u65e2\u901a\u7528\u53c8\u7b80\u5355\uff0c\u53ea\u9700\u8981\u5728\u4e0a\u4e0b\u6587\u4e2d\u7a0d\u4f5c\u4fee\u6539\u5730\u590d\u5236\u6587\u672c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528Turing\u7a0b\u5e8f\uff0c\u6211\u4eec\u5728\u52a0\u6cd5\u3001\u4e58\u6cd5\u4ee5\u53ca\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684SGD\u7b49\u7b97\u6cd5\u6027\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u7a33\u5065\u7684\u957f\u5ea6\u6cdb\u5316\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793aTransformer\u5728\u968f\u673aTuring\u7a0b\u5e8f\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u957f\u5ea6\u6cdb\u5316\uff0c\u8fd9\u8868\u660e\u5bf9\u4e8e\u4efb\u4f55\u7b97\u6cd5\u6027\u4efb\u52a1\uff0c\u957f\u5ea6\u6cdb\u5316\u90fd\u662f\u53ef\u80fd\u7684\u3002\u6700\u540e\uff0c\u6211\u4eec\u7406\u8bba\u8bc1\u660eTransformer\u80fd\u591f\u5b9e\u73b0Turing\u7a0b\u5e8f\uff0c\u6784\u9020\u4e86\u4e00\u4e2a\u7b80\u5355\u7684RASP\uff08Weiss\u7b49\u4eba\uff09\u7a0b\u5e8f\uff0c\u5b83\u6a21\u62df\u4efb\u610f\u56fe\u7075\u673a\
u3002|\n", "2407.03286": "|**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|## \u80cc\u666f \u534a\u7ed3\u6784\u5316\u6570\u636e\u683c\u5f0f\u5982JSON\u56e0\u5176\u5728\u5b58\u50a8\u6570\u636e\u65f6\u7684\u7075\u6d3b\u6027\u800c\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0cJSON\u6570\u636e\u901a\u5e38\u7f3a\u4e4f\u4e0e\u5173\u7cfb\u6570\u636e\u5e93\u4e2d\u7684\u8868\u5355\u7ed3\u6784\u76f8\u5bf9\u5e94\u7684\u89c4\u8303\uff08schema\uff09\u3002\u56e0\u6b64\uff0c\u51fa\u73b0\u4e86\u8bb8\u591a\u4ece\u6570\u636e\u96c6\u4e2d\u53d1\u73b0\u89c4\u8303\u7684\u5de5\u5177\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5de5\u5177\u5f88\u6709\u7528\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u6587\u6863\u7684\u8bed\u6cd5\uff0c\u800c\u5ffd\u89c6\u4e86\u8bed\u4e49\u4fe1\u606f\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u5982\u4f55\u81ea\u52a8\u4e3a\u53d1\u73b0\u7684\u89c4\u8303\u6dfb\u52a0\u6709\u610f\u4e49\u7684\u8bed\u4e49\u4fe1\u606f\uff0c\u4f7f\u5176\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u4f5c\u8005\u7f16\u5199\u7684\u89c4\u8303\u4e2d\u6240\u5305\u542b\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u4eba\u5de5\u7f16\u5199\u7684JSON Schema\u6587\u6863\u5e93\uff0c\u751f\u6210\u5143\u7d20\u7684\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u3001\u53ef\u91cd\u7528\u5b9a\u4e49\u7684\u6709\u610f\u4e49\u540d\u79f0\uff0c\u5e76\u8bc6\u522b\u51fa\u54ea\u4e9b\u53d1\u73b0\u7684\u5c5e\u6027\u6700\u6709\u7528\uff0c\u54ea\u4e9b\u53ef\u4ee5\u89c6\u4e3a\u201c\u566a\u58f0\u201d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5148\u524d\u5df2\u8bc1\u660e\u4e0e\u4eba\u7c7b\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u7684\u6587\u672c\u751f\u6210\u6307\u6807\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|null|## 
\u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7b\u89c9\u95ee\u9898\u4e25\u91cd\u5236\u7ea6\u4e86\u5b83\u4eec\u7684\u53ef\u9760\u6027\u548c\u53ef\u4fe1\u5ea6\u3002\u4eba\u7c7b\u5177\u6709\u81ea\u6211\u610f\u8bc6\u8fc7\u7a0b\uff0c\u80fd\u8bc6\u522b\u9762\u5bf9\u67e5\u8be2\u65f6\u7684\u672a\u77e5\u9886\u57df\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u8bba\u6587\u7814\u7a76\u4e86LLMs\u80fd\u5426\u5728\u751f\u6210\u54cd\u5e94\u4e4b\u524d\u81ea\u884c\u8bc4\u4f30\u5176\u5e7b\u89c9\u98ce\u9669\u3002\u6211\u4eec\u4ece\u8bad\u7ec3\u6570\u636e\u6e90\u548c15\u4e2a\u4e0d\u540c\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u5e7f\u6cdb\u89c6\u89d2\u5206\u6790\u4e86LLMs\u7684\u5185\u90e8\u673a\u5236\uff0c\u6db5\u76d6\u4e86\u8d85\u8fc7700\u4e2a\u6570\u636e\u96c6\u3002\u5b9e\u8bc1\u5206\u6790\u63ed\u793a\u4e86\u4e24\u4e2a\u5173\u952e\u53d1\u73b0\uff1a(1) LLM\u7684\u5185\u90e8\u72b6\u6001\u53ef\u4ee5\u6307\u793a\u5b83\u4eec\u662f\u5426\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u89c1\u8fc7\u67e5\u8be2\uff1b(2) LLM\u7684\u5185\u90e8\u72b6\u6001\u663e\u793a\u51fa\u5b83\u4eec\u5bf9\u67e5\u8be2\u53ef\u80fd\u4ea7\u751f\u5e7b\u89c9\u6216\u4e0d\u4ea7\u751f\u5e7b\u89c9\u7684\u98ce\u9669\u3002\u6211\u4eec\u7684\u7814\u7a76\u5173\u6ce8\u7279\u5b9a\u7684\u795e\u7ecf\u5143\u3001\u6fc0\u6d3b\u5c42\u548c\u4ee4\u724c\uff0c\u8fd9\u4e9b\u5728LLM\u5bf9\u4e0d\u786e\u5b9a\u6027\u548c\u5e7b\u89c9\u98ce\u9669\u7684\u8ba4\u8bc6\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\u3002\u901a\u8fc7\u4e00\u79cd\u63a2\u67e5\u4f30\u8ba1\u7b97\u6cd5\uff0c\u6211\u4eec\u5229\u7528LLM\u7684\u81ea\u6211\u8bc4\u4f30\uff0c\u5728\u8fd0\u884c\u65f6\u5b9e\u73b0\u4e86\u5e73\u574784.32%\u7684\u5e7b\u89c9\u4f30\u8ba1\u51c6\u786e\u7387\u3002|\n", "2407.03227": "|**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning**|Zhili Shen 
et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|\u6211\u4eec\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u89d2\u5ea6\u63a2\u8ba8\u6587\u672c\u5230SQL\u7684\u8bed\u4e49\u89e3\u6790\u3002\u9274\u4e8e\u5546\u4e1a\u6570\u636e\u5e93\u6a21\u5f0f\u7684\u89c4\u6a21\u6311\u6218\u548c\u4e1a\u52a1\u667a\u80fd\u89e3\u51b3\u65b9\u6848\u7684\u90e8\u7f72\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5b83\u52a8\u6001\u83b7\u53d6\u8f93\u5165\u6570\u636e\u5e93\u4fe1\u606f\uff0c\u5e76\u5229\u7528\u62bd\u8c61\u8bed\u6cd5\u6811\u9009\u62e9\u5c11\u91cf\u793a\u4f8b\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u5982\u4f55\u5229\u7528\u5e76\u884c\u8bed\u4e49\u89e3\u6790\u5668\u751f\u6210SQL\u67e5\u8be2\u7684\u8fd1\u4f3c\u7248\u672c\uff0c\u4ee5\u652f\u6301\u6211\u4eec\u7684\u68c0\u7d22\u3002\u6211\u4eec\u751a\u81f3\u5c06\u8fd9\u79cd\u65b9\u6cd5\u63a8\u5411\u6781\u81f4\uff0c\u91c7\u7528\u4e0d\u52305\u4ebf\u53c2\u6570\u7684\u6a21\u578b\u4f5c\u4e3a\u9ad8\u6548\u8fd1\u4f3c\u5668\uff0c\u5e76\u8d4b\u4e88\u5176\u5e76\u884c\u5904\u7406\u6a21\u5f0f\u7684\u80fd\u529b\u3002\u6211\u4eec\u5728\u5355\u8bed\u548c\u8de8\u8bed\u8a00\u7684\u8bed\u4e49\u89e3\u6790\u57fa\u51c6\u4e0a\u5e94\u7528\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u7ed3\u679c\u4f18\u4e8e\u73b0\u6709\u6700\u4f73\u57fa\u7ebf\u3002\u5168\u9762\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u8fd9\u79cd\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u8bbe\u7f6e\u4e2d\u5404\u4e2a\u6a21\u5757\u7684\u8d21\u732e\uff0c\u4e3a\u672a\u6765\u5de5\u4f5c\u6307\u660e\u4e86\u6709\u8da3\u7684\u65b9\u5411\u3002|\n", "2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|## \u80cc\u666f 
\u91cf\u5316\u6280\u672f\u5728\u63d0\u5347\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u901f\u5ea6\u548c\u90e8\u7f72\u6548\u7387\u65b9\u9762\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u5c3d\u7ba1\u6709\u5927\u91cf\u7684\u7814\u7a76\u5173\u6ce8\u4e86\u91cf\u5316\u540e\u7684\u82f1\u8bed\u4efb\u52a1\u6a21\u578b\u6548\u679c\uff0c\u4f46\u5c1a\u65e0\u7814\u7a76\u9488\u5bf9\u591a\u8bed\u8a00\u573a\u666f\u3002\u6211\u4eec\u5bf9\u91cf\u5316\u591a\u8bed\u8a00LLM\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u91cd\u70b9\u5173\u6ce8\u5176\u8de8\u8bed\u8a00\u6027\u80fd\u53ca\u4e0d\u540c\u89c4\u6a21\u4e0b\u7684\u8868\u73b0\u3002\u6211\u4eec\u91c7\u7528\u81ea\u52a8\u57fa\u51c6\u6d4b\u8bd5\u3001LLM\u4f5c\u4e3a\u8bc4\u5224\u8005\u7684\u65b9\u6cd5\u4ee5\u53ca\u4eba\u7c7b\u8bc4\u4f30\uff0c\u53d1\u73b0\u4ee5\u4e0b\u51e0\u70b9\uff1a(1) \u91cf\u5316\u5bf9\u4eba\u7c7b\u8bc4\u4ef7\u7684\u5f71\u54cd\u662f\u8d1f\u9762\u7684\uff0c\u4e14\u81ea\u52a8\u6307\u6807\u4e25\u91cd\u4f4e\u4f30\u4e86\u8fd9\u79cd\u635f\u5bb3\uff1a\u81ea\u52a8\u4efb\u52a1\u4e2d\u5e73\u57471.7%\u7684\u6027\u80fd\u4e0b\u964d\u5bf9\u5e94\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u65e5\u672c\u4efb\u52a1\u768416.0%\u663e\u8457\u4e0b\u6ed1\uff1b(2) \u4e0d\u540c\u8bed\u8a00\u53d7\u5230\u91cf\u5316\u7684\u5f71\u54cd\u7a0b\u5ea6\u4e0d\u5747\uff0c\u975e\u62c9\u4e01\u5b57\u6bcd\u4f53\u7cfb\u7684\u8bed\u8a00\u53d7\u5f71\u54cd\u6700\u4e25\u91cd\uff1b(3) \u6bd4\u5982\u6570\u5b66\u63a8\u7406\u8fd9\u7c7b\u6311\u6218\u6027\u4efb\u52a1\uff0c\u5176\u6027\u80fd\u4e0b\u964d\u6700\u4e3a\u663e\u8457\u3002\u968f\u7740\u4f4e\u529f\u8017\u6a21\u578b\u670d\u52a1\u4e8e\u5168\u7403NLP\u6280\u672f\u7684\u666e\u53ca\u53d8\u5f97\u81f3\u5173\u91cd\u8981\uff0c\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5728\u8bc4\u4f30\u9ad8\u6548\u6a21\u578b\u65f6\uff0c\u591a\u8bed\u8a00\u6027\u80fd\u5e94\u4f5c\u4e3a\u5173\u952e\u6307\u6807\u3002|\n", "2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 
Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|**### \u7ffb\u8bd1 \u5728\u6570\u5b66\u8bc1\u660e\u7684\u8ba1\u7b97\u673a\u53ef\u9a8c\u8bc1\u5f62\u5f0f\u8bed\u8a00\uff08\u5982Lean\uff09\u9a8c\u8bc1\u4e2d\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u57fa\u4e8e\u81ea\u7136\u8bed\u8a00\uff08NL\uff09\u7684\u8bc1\u660e\u65b9\u6cd5\u5177\u6709\u91cd\u8981\u5f71\u54cd\u3002\u7136\u800c\uff0c\u7531\u4e8eNL\u4e0e\u5f62\u5f0f\u8bed\u8a00\uff08FL\uff09\u7684\u8bc1\u660e\u6570\u636e\u7a00\u7f3a\uff0c\u73b0\u4ee3LLMs\u5728\u751f\u6210\u5b8c\u6574\u8bc1\u660e\u65b9\u9762\u7684\u6027\u80fd\u6b20\u4f73\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a**TheoremLlama**\u7684\u7aef\u5230\u7aef\u6846\u67b6\uff0c\u65e8\u5728\u8bad\u7ec3\u901a\u7528LLM\u6210\u4e3aLean4\u4e13\u5bb6\u3002\u8be5\u6846\u67b6\u5305\u62ecNL-FL\u5bf9\u9f50\u6570\u636e\u96c6\u751f\u6210\u65b9\u6cd5\u3001LLM\u5f62\u5f0f\u5b9a\u7406\u8bc1\u660e\u5668\u7684\u8bad\u7ec3\u7b56\u7565\u4ee5\u53caLLM\u5728\u64b0\u5199Lean4\u8bc1\u660e\u4e2d\u7684\u6280\u672f\u3002 \u5173\u952e\u521b\u65b0\u5728\u4e8e\u6211\u4eec\u5f00\u53d1\u4e86NL-FL\u81ea\u4e3e\u65b9\u6cd5\uff0c\u5373\u5c06NL\u8bc1\u660e\u878d\u5165Lean4\u4ee3\u7801\uff0c\u5229\u7528LLMs\u7684\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u6b63\u5f0f\u63a8\u7406\u3002\u901a\u8fc7\u8fd9\u79cd\u6570\u636e\u96c6\u751f\u6210\u65b9\u5f0f\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86**Open Bootstrapped 
Theorems**\uff08OBT\uff09\uff0c\u4e00\u4e2a\u5bf9\u9f50\u4e14\u81ea\u4e3e\u7684NL-FL\u6570\u636e\u96c6\u3002**TheoremLlama**\u6846\u67b6\u5728MiniF2F-Valid\u548cTest\u6570\u636e\u96c6\u4e0a\u7684\u7d2f\u8ba1\u51c6\u786e\u7387\u5206\u522b\u8fbe\u523036.48%\u548c33.61%\uff0c\u8d85\u8fc7\u4e86GPT-4\u7684\u57fa\u7ebf\u5206\u657022.95%\u548c25.41%\u3002\u6211\u4eec\u5df2\u516c\u5f00\u4e86\u6a21\u578b\u68c0\u67e5\u70b9\u548c\u751f\u6210\u7684\u6570\u636e\u96c6\uff0c\u5e76\u5373\u5c06\u5168\u90e8\u4ee3\u7801\u5f00\u6e90\u3002**|\n", "2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|**\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3aDivergent CoT\uff08DCoT\uff09\uff0c\u901a\u8fc7\u8981\u6c42\u6a21\u578b\u5728\u5355\u6b21\u63a8\u7406\u6b65\u9aa4\u4e2d\u6bd4\u8f83\u591a\u4e2a\u63a8\u7406\u94fe\u6765\u8fdb\u4e00\u6b65\u63d0\u5347\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u53d1\u73b0\uff0c\u5373\u4f7f\u5728\u5c0f\u578b\u3001\u66f4\u6613\u4e8e\u83b7\u53d6\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0a\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\u4e5f\u80fd\u63d0\u9ad8\u8868\u73b0\u3002\u901a\u8fc7\u4e00\u7cfb\u5217\u5e7f\u6cdb\u6db5\u76d6\u4e0d\u540c\u7c7b\u578b\u63a8\u7406\u4efb\u52a1\u7684\u4e25\u8c28\u5b9e\u9a8c\uff0c\u7814\u7a76\u663e\u793a\uff0c\u5bf9DCoT\u8fdb\u884c\u5fae\u8c03\u5728\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff08\u4ece1.3\u4ebf\u523070\u4ebf\u53c2\u6570\uff09\u548c\u6a21\u578b\u5bb6\u65cf\u4e2d\uff0c\u90fd\u666e\u904d\u4f18\u4e8e\u57fa\u672c\u7684CoT\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u6027\u80fd\u63d0\u5347\u6e90\u4e8e\u6a21\u578b\u5728\u5355\u6b21\u63a8\u7406\u4e2d\u751f\u6210\u4e86\u591a\u6761\u4e0d\u540c\u7684\u63a8\u7406\u8def\u5f84\uff0c\u8fd9\u8868\u660e\u8b
ed\u8a00\u6a21\u578b\u80fd\u591f\u5b9e\u73b0\u81ea\u6211\u7ea0\u6b63\u3002\u7814\u7a76\u4ee3\u7801\u548c\u6570\u636e\u5df2\u516c\u5f00\u5728https://github.com/UKPLab/arxiv2024-divergent-cot\u3002**|\n", "2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u51fa\u8272\u7684\u63a8\u7406\u80fd\u529b\u3001\u6cdb\u5316\u80fd\u529b\u548c\u8de8\u9886\u57df\u7684\u6d41\u7545\u6027\uff0c\u5728\u63d0\u5347\u8bed\u97f3\u76f8\u5173\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u672c\u6587\u5173\u6ce8\u7684\u662f\u5982\u4f55\u5c06\u89e3\u7801\u5668\u4ec5\u6709\u7684LLMs\u6574\u5408\u5230\u8bed\u97f3\u8f6c\u6587\u672c\u7ffb\u8bd1\uff08Speech-to-Text Translation\uff0cS2TT\uff09\u4efb\u52a1\u4e2d\u3002\u6211\u4eec\u63d0\u51fa\u4e00\u79cd\u67b6\u6784\uff0c\u8ba9LLM\u76f4\u63a5\u5904\u7406\u7f16\u7801\u7684\u8bed\u97f3\u8868\u793a\u5e76\u751f\u6210\u6587\u672c\u7ffb\u8bd1\u3002\u540c\u65f6\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u4e0d\u540c\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u6280\u672f\u548c\u4efb\u52a1\u8868\u8ff0\u65b9\u5f0f\u7684\u5f71\u54cd\u3002\u5728\u4e0d\u4f7f\u7528\u4e13\u6709\u6570\u636e\u7684\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728CoVoST 2\u548cFLEURS\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u8bbe\u8ba1\u9009\u62e9\u7684\u5408\u7406\u6027\uff0c\u5e76\u4e3aLLMs\u4e0eS2TT\u4efb\u52a1\u7684\u878d\u5408\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002|\n", "2407.03160": "|**2024-07-03**|**SOS! 
Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|## \u80cc\u666f \u5f00\u6e90\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u516c\u4f17\u548c\u884c\u4e1a\u4e2d\u7684\u53d7\u6b22\u8fce\u7a0b\u5ea6\u65e5\u76ca\u63d0\u5347\uff0c\u56e0\u4e3a\u5b83\u4eec\u53ef\u5b9a\u5236\u3001\u5fae\u8c03\u4e14\u514d\u8d39\u4f7f\u7528\u3002\u7136\u800c\uff0c\u4e00\u4e9b\u5f00\u6e90LLMs\u5728\u4f7f\u7528\u524d\u9700\u8981\u5ba1\u6279\uff0c\u8fd9\u4fc3\u4f7f\u7b2c\u4e09\u65b9\u53d1\u5e03\u6613\u4e8e\u83b7\u53d6\u7684\u7248\u672c\uff0c\u751a\u81f3\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\u6216\u91cf\u5316\u5904\u7406\uff0c\u4ee5\u964d\u4f4e\u8ba1\u7b97\u8d44\u6e90\u9700\u6c42\u3002\u8fd9\u79cd\u8d8b\u52bf\u589e\u52a0\u4e86\u8bad\u7ec3\u65f6\u95f4\u653b\u51fb\u7684\u98ce\u9669\uff0c\u5a01\u80c1\u5230LLMs\u7684\u5b8c\u6574\u6027\u548c\u5b89\u5168\u6027\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bad\u7ec3\u65f6\u95f4\u653b\u51fb\u2014\u2014SOS\uff08Save Our Skills\uff09\uff0c\u5b83\u8bbe\u8ba1\u5f97\u8ba1\u7b97\u9700\u6c42\u4f4e\uff0c\u65e0\u9700\u5e72\u51c0\u6570\u636e\u6216\u8c03\u6574\u6a21\u578b\u6743\u91cd\uff0c\u4fdd\u6301\u6a21\u578b\u7684\u5b9e\u7528\u6027\u3002\u8be5\u653b\u51fb\u65e8\u5728\u5e94\u5bf9\u5404\u79cd\u5b89\u5168\u95ee\u9898\uff0c\u5305\u62ec\u540e\u95e8\u653b\u51fb\u3001\u7834\u89e3\u653b\u51fb\u548c\u63d0\u793a\u7a83\u53d6\u653b\u51fb\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSOS\u653b\u51fb\u5728\u6240\u6709\u6d4b\u8bd5\u76ee\u6807\u4e0a\u90fd\u6709\u6548\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c55\u793a\u4e86SOS\u6280\u672f\u7684\u53e6\u4e00\u9762\u2014\u2014\u7248\u6743\u4ee4\u724c\uff1a\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u7528\u6237\u53ef\u4ee5\u6807\u8bb0\u5176\u53d7\u7248\u6743\u4fdd\u62a4\u7684\u5185\u5bb9\uff0c\u9632\u6b62\u6a21\u578b\u4f7f\u7528\u3002|\n", "2407.03157": 
"|**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4ee3\u7801\u751f\u6210\u4e2d\u7684\u5e38\u89c1\u573a\u666f\uff1a\u5f00\u53d1\u8005\u5b9e\u65f6\u7f16\u8f91\u73b0\u6709\u4ee3\u7801\uff0c\u5e76\u8bf7\u6c42\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982\u5927\u8bed\u8a00\u6a21\u578b\uff09\u8fdb\u884c\u5373\u65f6\u91cd\u9884\u6d4b\u4e0b\u4e00\u4e2atoken\u6216\u884c\u3002\u76f4\u63a5\u7684\u65b9\u6cd5\u662f\u8ba9LLM\u91cd\u65b0\u7f16\u7801\u6574\u4e2a\u952e\u503c\u7f13\u5b58\u4ee5\u63d0\u4f9b\u7cbe\u786e\u7684\u9884\u6d4b\uff0c\u4f46\u8fd9\u4e2a\u8fc7\u7a0b\u8ba1\u7b97\u6210\u672c\u9ad8\uff0c\u7279\u522b\u662f\u5f53\u5e8f\u5217\u957f\u5ea6\u5f88\u957f\u65f6\u3002\u4ec5\u7f16\u7801\u7f16\u8f91\u540e\u7684\u5b50\u5e8f\u5217\u5e76\u5c06\u5176\u6574\u5408\u5230\u539f\u59cb\u952e\u503c\u7f13\u5b58\u4e2d\u4f1a\u9047\u5230\u65f6\u95f4\u6df7\u6dc6\u95ee\u9898\uff0c\u5bfc\u81f4\u6027\u80fd\u5927\u5e45\u4e0b\u964d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\u2014\u2014\\textbf{\u4f4d\u7f6e\u5b8c\u6574\u6027\u7f16\u7801}\uff08Positional Integrity 
Encoding\uff0c\u7b80\u79f0PIE\uff09\u3002PIE\u57fa\u4e8e\u65cb\u8f6c\u578b\u4f4d\u7f6e\u7f16\u7801\uff0c\u9996\u5148\u79fb\u9664\u5f15\u5165\u65f6\u95f4\u6df7\u6dc6\u7684\u65cb\u8f6c\u578b\u77e9\u9635\uff0c\u7136\u540e\u91cd\u65b0\u5e94\u7528\u6b63\u786e\u7684\u77e9\u9635\uff0c\u786e\u4fdd\u4e86\u4ee4\u724c\u4e4b\u95f4\u7684\u4f4d\u7f6e\u5173\u7cfb\u6b63\u786e\uff0c\u4ec5\u9700\u4e00\u8f6e\u77e9\u9635\u4e58\u6cd5\u5373\u53ef\u5b8c\u6210\u3002\u6211\u4eec\u5728RepoBench-C-8k\u6570\u636e\u96c6\u4e0a\uff0c\u4f7f\u752813\u4ebf\u300167\u4ebf\u548c330\u4ebf\u53c2\u6570\u7684DeepSeek-Coder\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6db5\u76d6\u4e86\u4ee3\u7801\u63d2\u5165\u3001\u4ee3\u7801\u5220\u9664\u548c\u591a\u4f4d\u7f6e\u4ee3\u7801\u7f16\u8f91\u7b49\u4e09\u4e2a\u5b9e\u9645\u7f16\u7a0b\u4efb\u52a1\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6807\u51c6\u7684\u5b8c\u6574\u91cd\u8ba1\u7b97\u65b9\u6cd5\u76f8\u6bd4\uff0cPIE\u5728\u6240\u6709\u6a21\u578b\u89c4\u6a21\u548c\u4efb\u52a1\u4e2d\u90fd\u80fd\u51cf\u5c11\u8d85\u8fc785%\u7684\u8ba1\u7b97\u5f00\u9500\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u597d\u7684\u6027\u80fd\u8fd1\u4f3c\u3002|\n"}} \ No newline at end of file +{"agent": {"2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma 
et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u5d4c\u5165\u5f0f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7cfb\u7edf\u5728\u7a7a\u95f4\u8ba4\u77e5\u548c\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u96c6\u6210\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u8bba\u6587\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u63ed\u793a\u4e86\u660e\u663e\u7684\u8fdb\u5c55\uff0c\u4f46\u4e5f\u5f3a\u8c03\u4e86\u5f00\u53d1\u65b0\u65b9\u6cd5\u4ee5\u5145\u5206\u5229\u75283D-LLMs\u6f5c\u529b\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63
07\u660e\u9053\u8def\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u7efc\u8ff0\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.09935": "|**2024-05-24**|**DEBATE: Devil's Advocate-Based Assessment and Text Evaluation**|Alex Kim et.al.|[2405.09935](http://arxiv.org/abs/2405.09935)|null|\u968f\u7740\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u6a21\u578b\u7684\u666e\u53ca\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u53d8\u5f97\u65e5\u76ca\u5173\u952e\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65e0\u53c2\u8003\u8bc4\u4ef7\u5668\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u5904\u7406\u65b0\u4efb\u52a1\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u91c7\u7528\u5355\u4ee3\u7406\u65b9\u6cd5\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u8868\u73b0\u3002\u56e0\u4e3aLLM\u4ee3\u7406\u7684\u56de\u7b54\u5b58\u5728\u504f\u89c1\uff0c\u6bd4\u5982\u5bf9\u7279\u5b9a\u6587\u672c\u7ed3\u6784\u6216\u5185\u5bb9\u7684\u504f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u5de5\u4f5c\u4e2d\u63d0\u51faDEBATE\uff0c\u4e00\u4e2a\u5efa\u7acb\u5728\u591a\u4ee3\u7406\u8bc4\u5206\u7cfb\u7edf\u57fa\u7840\u4e0a\u7684NLG\u8bc4\u4ef7\u6846\u67b6\uff0c\u878d\u5165\u4e86\u201c\u6076\u9b54\u8fa9\u624b\u201d\u7684\u6982\u5ff5\u3002\u5728\u8be5\u6846\u67b6\u4e2d\uff0c\u4e00\u4e2a\u4ee3\u7406\u88ab\u6307\u4ee4\u6279\u8bc4\u5176\u4ed6\u4ee3\u7406\u7684\u8bba\u70b9\uff0c\u4ece\u800c\u53ef\u80fd\u6d88\u89e3LLM\u4ee3\u7406\u7b54\u6848\u4e2d\u7684\u504f\u89c1\u3002DEBATE\u5728\u4e24\u4e2aNLG\u8bc4\u4ef7\u5143\u8bc4\u4f30\u
57fa\u51c6\u2014\u2014SummEval\u548cTopicalChat\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u4ee3\u7406\u4e4b\u95f4\u7684\u8fa9\u8bba\u5e7f\u5ea6\u4ee5\u53ca\u4ee3\u7406\u7684\u4eba\u683c\u7279\u8d28\u4f1a\u5f71\u54cd\u8bc4\u4ef7\u5668\u7684\u6027\u80fd\u3002|\n", "2405.05175": "|**2024-05-08**|**Air Gap: Protecting Privacy-Conscious Conversational Agents**|Eugene Bagdasaryan et.al.|[2405.05175](http://arxiv.org/abs/2405.05175)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u5f0f\u4ee3\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5904\u7406\u654f\u611f\u7528\u6237\u6570\u636e\u65f6\u5f15\u53d1\u4e86\u4e25\u91cd\u7684\u9690\u79c1\u95ee\u9898\u3002\u8fd9\u4e9b\u4ee3\u7406\u867d\u80fd\u7406\u89e3\u5e76\u5904\u7406\u4e0a\u4e0b\u6587\uff0c\u4f46\u4e5f\u53ef\u80fd\u88ab\u6076\u610f\u4e00\u65b9\u5229\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5a01\u80c1\u6a21\u578b\uff0c\u5373\u7b2c\u4e09\u65b9\u5e94\u7528\u901a\u8fc7\u64cd\u63a7\u4ea4\u4e92\u4e0a\u4e0b\u6587\uff0c\u8bef\u5bfcLLM\u4ee3\u7406\u6cc4\u9732\u4e0e\u5176\u4efb\u52a1\u65e0\u5173\u7684\u79c1\u4eba\u4fe1\u606f\u3002\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u6846\u67b6\u7684\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AirGapAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u6ce8\u91cd\u9690\u79c1\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u9650\u5236\u4ee3\u7406\u4ec5\u8bbf\u95ee\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u6570\u636e\uff0c\u9632\u6b62\u610f\u5916\u7684\u6570\u636e\u6cc4\u6f0f\u3002\u5b9e\u9a8c\u4f7f\u7528Gemini\u3001GPT\u548cMistral\u6a21\u578b\u4f5c\u4e3a\u4ee3\u7406\uff0c\u7ed3\u679c\u663e\u793aAirGapAgent\u5728\u62b5\u5fa1\u57fa\u4e8e\u5355\u4e2a\u67e5\u8be2\u7684\u4e0a\u4e0b\u6587\u52ab\u6301\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8eGemini 
Ultra\u4ee3\u7406\uff0c\u8fd9\u79cd\u653b\u51fb\u4ece94%\u7684\u4fdd\u62a4\u80fd\u529b\u964d\u4f4e\u523045%\uff0c\u800cAirGapAgent\u53ef\u4ee5\u4fdd\u630197%\u7684\u9632\u62a4\u6548\u679c\uff0c\u4f7f\u540c\u6837\u7684\u653b\u51fb\u5931\u6548\u3002|\n", "2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325](http://arxiv.org/abs/2405.04325)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u867d\u4e3a\u6784\u5efa\u81ea\u7136\u8bed\u8a00\u4ee3\u7406\u63d0\u4f9b\u4e86\u5f3a\u5927\u57fa\u7840\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u5b83\u4eec\u53ca\u5176\u57fa\u4e8e\u5b83\u4eec\u6784\u5efa\u7684\u81ea\u4e3b\u4ee3\u7406\u7684\u5b89\u5168\u6027\u62c5\u5fe7\u3002\u7279\u522b\u662f\u6b3a\u9a97\u80fd\u529b\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u6211\u4eec\u5173\u6ce8\u7684\u662fAI\u4ee3\u7406\u901a\u8fc7\u6df7\u6dc6\u548c\u6a21\u68f1\u4e24\u53ef\u6765\u8bef\u5bfc\u3001\u9690\u85cf\u771f\u76f8\u6216\u63a8\u5e7f\u90e8\u5206\u4e0d\u771f\u5b9e\u7684\u4fe1\u5ff5\u7684\u884c\u4e3a\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80AI\u5b89\u5168\u7814\u7a76\u4e2d\u7684\u6492\u8c0e\u3001\u81ea\u79c1\u51b3\u7b56\u6216\u63d0\u4f9b\u865a\u5047\u4fe1\u606f\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u4e00\u7c7b\u7279\u6b8a\u7684\u6b3a\u9a97\uff1a\u7c7b\u4f3c\u4e8e\u9b54\u672f\u5e08\u5229\u7528\u969c\u773c\u6cd5\u8ba9\u5154\u5b50\u4ece\u5e3d\u5b50\u91cc\u51fa\u73b0\uff0c\u8981\u4e48\u901a\u8fc7\u9690\u85cf\u7684\u6697\u95e8\uff0c\u8981\u4e48\u901a\u8fc7\u8f6c\u79fb\u6ce8\u610f\u529b\u76f4\u63a5\u5c55\u793a\u3002 
\u6211\u4eec\u7684\u65b0\u5b9e\u9a8c\u5e73\u53f0\u5728\u4e00\u4e2a\u6709\u76ee\u6807\u7684\u73af\u5883\u4e2d\u5c55\u793a\u4e86LLM\u4ee3\u7406\u5728\u5bf9\u6297\u6027\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u8fdb\u884c\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65f6\u7684\u6b3a\u9a97\u56fa\u6709\u80fd\u529b\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u7acb\u6cd5\u4efb\u52a1\u201c\u6e38\u8bf4\u201d\u8bae\u6848\u3002\u5728\u76ee\u6807\u9a71\u52a8\u7684\u73af\u5883\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\u6784\u5efa\u6b3a\u9a97\u80fd\u529b\uff0c\u7ed3\u5408\u8bed\u8a00\u54f2\u5b66\u548c\u8ba4\u77e5\u5fc3\u7406\u5b66\u7406\u8bba\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6e38\u8bf4\u4ee3\u7406\u5728\u5bf9\u6297\u4e92\u52a8\u7684\u540e\u7eed\u5f3a\u5316\u8bd5\u9a8c\u4e2d\u5176\u6b3a\u9a97\u80fd\u529b\u63d0\u9ad8\u4e86\u7ea640%\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6b3a\u9a97\u68c0\u6d4b\u673a\u5236\u80fd\u8fbe\u5230\u9ad8\u8fbe92%\u7684\u8bc6\u522b\u7387\u3002\u8fd9\u4e9b\u7ed3\u679c\u63ed\u793a\u4e86\u4eba\u673a\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u95ee\u9898\uff0c\u5373\u4ee3\u7406\u53ef\u80fd\u64cd\u7eb5\u4eba\u7c7b\u4ee5\u8fbe\u6210\u9884\u8bbe\u76ee\u6807\u3002|\n", "2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra 
et.al.|[2405.04324](http://arxiv.org/abs/2405.04324)|**[link](https://github.com/ibm-granite/granite-code-models)**|**\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u9886\u57df\u7684\u8bad\u7ec3\u6b63\u5728\u9769\u65b0\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u3002\u5982\u4eca\uff0c\u8fd9\u4e9b\u4ee3\u7801LLMs\u6b63\u9010\u6b65\u878d\u5165\u8f6f\u4ef6\u5f00\u53d1\u73af\u5883\uff0c\u4ee5\u63d0\u5347\u4eba\u7c7b\u7a0b\u5e8f\u5458\u7684\u6548\u7387\uff0c\u5e76\u5c55\u73b0\u51fa\u81ea\u4e3b\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u6f5c\u529b\u3002\u8981\u5145\u5206\u5229\u7528\u4ee3\u7801LLMs\u7684\u5168\u90e8\u6548\u80fd\uff0c\u9700\u8981\u5176\u5177\u5907\u751f\u6210\u4ee3\u7801\u3001\u4fee\u590dbug\u3001\u89e3\u91ca\u548c\u6ce8\u91ca\u4ee3\u7801\u3001\u7ef4\u62a4\u4ed3\u5e93\u7b49\u591a\u79cd\u529f\u80fd\u3002\u672c\u6587\u4ecb\u7ecdGranite\u7cfb\u5217\u7684\u89e3\u7801\u5668\u4ec5\u6709\u7684\u4ee3\u7801\u6a21\u578b\uff0c\u4e13\u4e3a\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u800c\u8bbe\u8ba1\uff0c\u8bad\u7ec3\u6570\u636e\u6db5\u76d6116\u79cd\u7f16\u7a0b\u8bed\u8a00\u3002Granite Code\u6a21\u578b\u5bb6\u65cf\u5305\u62ec\u4ece3\u4ebf\u5230340\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u4ece\u590d\u6742\u5e94\u7528\u73b0\u4ee3\u5316\u5230\u8bbe\u5907\u5185\u5b58\u53d7\u9650\u7684\u591a\u79cd\u5e94\u7528\u573a\u666f\u3002\u901a\u8fc7\u5168\u9762\u4efb\u52a1\u8bc4\u4f30\uff0cGranite Code\u6a21\u578b\u5728\u5f00\u6e90\u4ee3\u7801LLM\u4e2d\u7684\u6027\u80fd\u59cb\u7ec8\u5904\u4e8e\u9886\u5148\u6c34\u5e73\u3002\u8be5\u6a21\u578b\u5bb6\u65cf\u9488\u5bf9\u4f01\u4e1a\u8f6f\u4ef6\u5f00\u53d1\u5de5\u4f5c\u6d41\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8868\u73b0\u51fa\u8272\u4e8e\u5404\u79cd\u7f16\u7801\u4efb\u52a1\uff08\u5982\u4ee3\u7801\u751f\u6210\u3001\u4fee\u590d\u4e0e\u89e3\u91ca\uff09\uff0c\u662f\u4e00\u6b3e\u591a\u7528\u9014\u7684\u5168\u80fd\u4ee3\u7801\u6a21\u578b\u3002\u6211\u4eec\u4ee5Apache 
2.0\u8bb8\u53ef\u534f\u8bae\u53d1\u5e03\u6240\u6709Granite Code\u6a21\u578b\uff0c\u4f9b\u7814\u7a76\u548c\u5546\u4e1a\u4f7f\u7528\u3002**|\n", "2405.04219": "|**2024-05-07**|**Iterative Experience Refinement of Software-Developing Agents**|Chen Qian et.al.|[2405.04219](http://arxiv.org/abs/2405.04219)|null|### \u6982\u8ff0 \u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u573a\u666f\u4e2d\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u81ea\u4e3b\u6027\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5f53\u524d\u9759\u6001\u7ecf\u9a8c\u8303\u5f0f\u4f9d\u8d56\u4e8e\u901a\u8fc7\u542f\u53d1\u5f0f\u65b9\u6cd5\u83b7\u53d6\u7684\u56fa\u5b9a\u5386\u53f2\u7ecf\u9a8c\u96c6\uff0c\u8fd9\u9650\u5236\u4e86\u4ee3\u7406\u7684\u9002\u5e94\u6027\u548c\u6548\u7387\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u8fed\u4ee3\u7ecf\u9a8c\u4f18\u5316\u6846\u67b6\uff0c\u5141\u8bb8\u8bed\u8a00\u6a21\u578b\u5728\u6267\u884c\u4efb\u52a1\u8fc7\u7a0b\u4e2d\u52a8\u6001\u8c03\u6574\u548c\u4f18\u5316\u7ecf\u9a8c\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e24\u79cd\u6838\u5fc3\u6a21\u5f0f\uff1a\u987a\u5e8f\u6a21\u5f0f\uff0c\u6839\u636e\u4efb\u52a1\u6279\u6b21\u5185\u7684\u6700\u8fd1\u7ecf\u9a8c\u8fdb\u884c\u6539\u8fdb\uff1b\u7d2f\u8ba1\u6a21\u5f0f\uff0c\u79ef\u7d2f\u6240\u6709\u5148\u524d\u4efb\u52a1\u6279\u6b21\u7684\u7ecf\u9a8c\u3002\u901a\u8fc7\u5f15\u5165\u7ecf\u9a8c\u6dd8\u6c70\u7b56\u7565\uff0c\u8be5\u65b9\u6cd5\u4f18\u5148\u9009\u62e9\u9ad8\u8d28\u91cf\u548c\u5e38\u7528\u7684\u7ecf\u9a8c\uff0c\u6709\u6548\u5730\u7ba1\u7406\u7ecf\u9a8c\u7a7a\u95f4\uff0c\u63d0\u9ad8\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u987a\u5e8f\u6a21\u5f0f\u53ef\u80fd\u5e26\u6765\u66f4\u597d\u7684\u6027\u80fd\uff0c\u4f46\u7d2f\u8ba1\u6a21\u5f0f\u5728\u7a33\u5b9a\u6027\u65b9\u9762\u66f4\u4f18\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u6dd8\u6c70\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u9ad8\u8d28\u91cf\u7ecf\u9a8c\u5b50\u96c6\u768411.54%\uff0c\
u5c31\u80fd\u5b9e\u73b0\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2405.03813": "|**2024-05-06**|**Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control**|Yaqub Chaudhary et.al.|[2405.03813](http://arxiv.org/abs/2405.03813)|null|## \u7ffb\u8bd1 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u6a21\u4eff\u5404\u79cd\u4fee\u8f9e\u98ce\u683c\uff0c\u751f\u6210\u8868\u8fbe\u5e7f\u6cdb\u60c5\u611f\u7684\u6587\u672c\uff0c\u8fd9\u79cd\u80fd\u529b\u5728\u4f4e\u6210\u672c\u4e0b\u8fc5\u901f\u666e\u53ca\uff0c\u5e26\u6765\u4e86\u6f5c\u5728\u7684\u793e\u4f1a\u5371\u5bb3\u3002\u672c\u6587\u5e76\u672a\u5b64\u7acb\u770b\u5f85\u8fd9\u4e9b\u6a21\u578b\uff0c\u800c\u662f\u5173\u6ce8\u5b83\u4eec\u80cc\u540e\u5927\u89c4\u6a21\u8ba1\u7b97\u57fa\u7840\u8bbe\u65bd\u5728\u5404\u9886\u57df\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u63a2\u8ba8\u4e86LLMs\u5982\u4f55\u901a\u8fc7\u6c61\u67d3\u548c\u6807\u51c6\u5316\u4fe1\u606f\u73af\u5883\u6765\u5f71\u54cd\u793e\u4f1a\uff0c\u5e76\u6307\u51fa\u8fd9\u4e9b\u529f\u80fd\u53ef\u80fd\u88ab\u7528\u4f5c\u63a7\u5236\u624b\u6bb5\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u7126\u70b9\u8f6c\u5411\u51e0\u4e2a\u65b0\u5174\u7814\u7a76\u9886\u57df\uff0c\u8fd9\u4e9b\u9886\u57df\u589e\u5f3a\u4e86LLMs\u4f5c\u4e3a\u6743\u529b\u5de5\u5177\u7684\u80fd\u529b\uff1a 1. \u901a\u8fc7\u5b9e\u65f6\u8bbe\u8ba1\u5bf9\u8bdd\u754c\u9762\u4e2d\u7684\u9009\u62e9\u67b6\u6784\uff08\u5982\u201cAI\u89d2\u8272\u201d\uff09\uff0c\u8fdb\u884c\u8bf4\u670d\u7b56\u7565\u3002 2. \u5229\u7528LLM\u6784\u5efa\u4eba\u7c7b\u884c\u4e3a\u7684\u8ba1\u7b97\u6a21\u578b\uff08\u5982\u201c\u7845\u8d28\u4e3b\u4f53\u201d\uff09\u3002 3. \u5c06LLM\u5e94\u7528\u4e8e\u6a21\u62df\u4eba\u7c7b\u7fa4\u4f53\u884c\u4e3a\uff08\u5982\u201c\u7845\u8d28\u793e\u4f1a\u201d\uff09\u3002 4. 
\u7ed3\u5408\u5f3a\u5316\u5b66\u4e60\uff0c\u521b\u5efa\u53ef\u63a7\u5236\u548c\u5bfc\u5411\u7684\u6218\u7565\u5bf9\u8bdd\u6a21\u578b\u3002 \u7efc\u5408\u4ee5\u4e0a\u51e0\u70b9\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6280\u672f\u6784\u5efa\u57fa\u4e8eLLMs\u7684\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u901a\u8fc7\u6a21\u62df\u548c\u4f2a\u88c5\u7684\u201c\u9884\u6d4b\u201d\uff0c\u6210\u4e3a\u4e2a\u4f53\u3001\u793e\u4f1a\u548c\u653f\u6cbb\u63a7\u5236\u7684\u5f3a\u5927\u5de5\u5177\uff0c\u64cd\u63a7\u4eba\u7c7b\u7684\u884c\u4e3a\u3001\u610f\u56fe\u548c\u884c\u52a8\u3002|\n", "2405.06682": "|**2024-05-05**|**Self-Reflection in LLM Agents: Effects on Problem-Solving Performance**|Matthew Renze et.al.|[2405.06682](http://arxiv.org/abs/2405.06682)|**[link](https://github.com/matthewrenze/self-reflection)**|**\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u81ea\u6211\u53cd\u601d\u5bf9\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8ba9\u4e5d\u79cd\u6d41\u884c\u7684LLMs\u56de\u7b54\u4e00\u7cfb\u5217\u9009\u62e9\u9898\uff0c\u4ee5\u5efa\u7acb\u6027\u80fd\u57fa\u7ebf\u3002\u5bf9\u4e8e\u56de\u7b54\u9519\u8bef\u7684\u95ee\u9898\uff0c\u6211\u4eec\u6307\u5bfc\u516b\u79cd\u4e0d\u540c\u7c7b\u578b\u7684\u81ea\u6211\u53cd\u601dLLM\u4ee3\u7406\u53cd\u601d\u5176\u9519\u8bef\uff0c\u5e76\u4e3a\u81ea\u5df1\u63d0\u4f9b\u6539\u8fdb\u95ee\u9898\u89e3\u51b3\u7684\u6307\u5bfc\u3002\u7136\u540e\uff0c\u6839\u636e\u8fd9\u4e9b\u6307\u5bfc\uff0c\u6bcf\u4e2a\u53cd\u601d\u578b\u4ee3\u7406\u91cd\u65b0\u5c1d\u8bd5\u56de\u7b54\u540c\u6837\u7684\u95ee\u9898\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cLLM\u4ee3\u7406\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u663e\u8457\u63d0\u9ad8\u4e86\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff08$p < 
0.001$). In addition, we compared the individual contribution of each type of self-reflection to performance. All code and data are publicly available on GitHub: https://github.com/matthewrenze/self-reflection.**|\n", "2405.02858": "|**2024-05-05**|**Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation**|Jinyu Cai et.al.|[2405.02858](http://arxiv.org/abs/2405.02858)|**[link](https://github.com/BlueLinkX/GA-MAS)**|**Social media platforms such as Twitter, Reddit, and Sina Weibo play a major role in global communication, but they are often strictly regulated in geopolitically sensitive regions. This pushes users in restricted social media environments to adapt how they communicate, often by resorting to coded language. These shifts in language patterns are not only a way of resisting regulation but also a vivid example of language evolution, showing how language changes naturally under social and technological pressure. Studying how language evolves in restricted social media environments matters for protecting free speech, improving content moderation, and advancing linguistic research. This paper proposes a multi-agent simulation framework based on large language models (LLMs) to explore how user language evolves under strict regulation. 
The framework consists of an LLM-driven supervisor agent that monitors the dialogue and participant agents that develop language strategies through their interactions, simulating how communication evolves when users try to evade social media rules. Evaluations across scenarios ranging from abstract settings to realistic situations show that LLMs can effectively simulate the complex language dynamics and interactions of restricted environments, and that, as they evolve, they improve both at evading supervision and at conveying information accurately. The study also finds that the LLM agents adopt different strategies for different scenarios.**|\n", "2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang 
et.al.|[2405.01533](http://arxiv.org/abs/2405.01533)|**[link](https://github.com/nvlabs/omnidrive)**|**As multimodal large language models (MLLMs) advance, there is growing interest in autonomous driving systems built on them, in the hope of exploiting their strong reasoning abilities. However, carrying the strengths of MLLMs over to the planning part of driving is challenging, because planning requires a full understanding of the 3D environment rather than only 2D reasoning. To address this, our work proposes a framework that tightly aligns the model with 3D driving tasks. We first design a novel 3D MLLM architecture that uses sparse queries to lift visual representations into 3D space and compress them before feeding them to the language model. This query-based representation lets us jointly encode dynamic objects and static map elements (such as roads), providing a condensed 3D world model for aligning perception with action. 
In addition, we build OmniDrive-nuScenes, a new visual question-answering dataset that tests a model's true situational awareness in complex 3D scenes through a comprehensive set of VQA tasks, including scene description, traffic-rule understanding, 3D grounding, counterfactual reasoning, decision making, and planning. Extensive experiments demonstrate the effectiveness of the proposed architecture and underline the importance of VQA tasks for reasoning and planning in complex 3D environments.**|\n", "2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972](http://arxiv.org/abs/2405.00972)|**[link](https://github.com/pnnl/cactus)**|**This paper introduces CACTUS, an LLM-based agent that integrates cheminformatics tools to strengthen advanced reasoning and problem solving in chemistry and molecular discovery. The authors extensively evaluated CACTUS with several open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. The results show that CACTUS clearly outperforms the base models, with Gemma-7b and Mistral-7b performing best regardless of the prompting strategy used. 
The paper also examines how domain-specific prompting and hardware configuration affect model performance, highlights the importance of prompt engineering, and suggests that deploying smaller models on consumer-grade hardware may not significantly sacrifice accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers with tasks such as molecular property prediction, similarity search, and drug-likeness assessment. As a significant advance in cheminformatics, CACTUS gives chemists and molecular discovery researchers a flexible tool that could accelerate scientific research and the discovery of new effective and safe drugs, catalysts, and materials. Furthermore, its integration with automated experimentation platforms and its ability to make real-time, data-driven decisions open up new possibilities for autonomous discovery.**|\n", "2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr 
et.al.|[2404.18978](http://arxiv.org/abs/2404.18978)|null|With growing interest in learner models for educational settings, research is increasingly turning to combining reinforcement learning (RL) with large language models (LLMs) to improve generalization in open-ended, text-based learning environments. This paper examines three types of agents: (1) RL-based agents that use natural-language representations of states and actions to find the best interaction strategy; (2) LLM-based agents that act through prompting, drawing on the model's broad knowledge and reasoning abilities; and (3) hybrid LLM-assisted RL agents that aim to improve both performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a new benchmark derived from the PharmaSim virtual pharmacy environment that focuses on practicing diagnostic conversations. 
Experiments show that RL-based agents excel at completing the task but ask lower-quality questions, whereas LLM-based agents ask better questions but complete the task less reliably. Finally, the hybrid LLM-assisted RL agents show potential to overcome these limitations, demonstrating the promise of combining RL and LLMs to build high-performing agents for open-ended learning environments.|\n", "2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021](http://arxiv.org/abs/2404.18021)|null|Advances in genome engineering now make precise modification of genetic information possible, but building an effective gene-editing system requires a deep understanding of CRISPR technology and of the complex experimental context. Large language models (LLMs) have shown promise across many tasks but often lack the specific knowledge needed for biological design problems. This paper introduces CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and improve the design of CRISPR-based gene-editing experiments. 
CRISPR-GPT uses the reasoning abilities of LLMs to help select CRISPR systems, design guide RNAs, recommend cellular delivery methods, draft protocols, and design validation experiments that confirm editing outcomes. We show how CRISPR-GPT can help non-expert researchers carry out gene-editing experiments from scratch and validate its effectiveness with a real-world case. We also discuss the ethical and regulatory issues raised by automated gene-editing design, stressing the importance of using such tools responsibly and transparently. Our work aims to bridge the gap between beginning biological researchers and CRISPR genome engineering techniques, and shows the potential of LLM agents to facilitate complex biological discovery tasks.|\n", "2404.17833": "|**2024-04-27**|**Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs**|Zhenlan Ji et.al.|[2404.17833](http://arxiv.org/abs/2404.17833)|null|As agents driven by large language models (LLMs) prove useful in a range of commercial applications, particularly mental-health support, chemical synthesis, and software development, it has become clear that they are prone to errors when handling complex tasks and long-horizon planning. 
To address this, this paper proposes PDoctor, a novel automated approach for detecting and understanding erroneous planning in LLM agents. PDoctor first defines a domain-specific language (DSL) for user queries and uses the Z3 constraint solver to synthesize varied inputs, which are natural-language paragraphs describing the requirements for completing a series of tasks. PDoctor then derives constraints from these requirements to form a testing oracle. We evaluated PDoctor with three mainstream agent frameworks and two powerful LLMs (GPT-3.5 and GPT-4); the results show that it effectively identifies a variety of errors in agent planning and gives developers and users valuable insights into the characteristics of these errors. Finally, we discuss possible alternative designs and directions for extending PDoctor.|\n", "2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662](http://arxiv.org/abs/2404.17662)|**[link](https://github.com/alickzhu/player)**|**Recent advances in large language models (LLMs) have strengthened communication and social interaction between agents. However, building LLM agents that perform complex reasoning in dynamic environments involving both competition and cooperation remains challenging, largely because of the limitations of graph-based search methods. 
To address this, we propose PLAYER*, a novel framework built on an anytime sampling-based planner that uses sensors and pruning to enable a purely question-driven search framework for complex reasoning tasks. We also introduce a quantifiable evaluation method based on multiple-choice questions and construct WellPlay, a dataset of 1,482 question-answer pairs. Experiments show that PLAYER* outperforms existing methods in efficiency and performance in complex, dynamic environments, with quantifiable comparative results.**|\n", "2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525](http://arxiv.org/abs/2404.17525)|null|Conventional mechanical design relies on experts making experience-guided modifications and running finite element analysis (FEA) to meet specific requirements, a process that is time-consuming and depends heavily on individual knowledge. Although many machine learning models have been developed to streamline this laborious, expert-driven iterative process, they typically require large amounts of training data and computational resources. Deep learning methods are also usually confined to the domains and tasks they were trained on, which limits their applicability across tasks. This creates a trade-off between the efficiency of automation and its resource demands. 
This study proposes a novel approach that combines pre-trained large language models (LLMs) with a finite element module. The finite element module evaluates each design and provides essential feedback, guiding the LLMs to continuously learn, plan, generate, and optimize designs without domain-specific training. We demonstrate the framework's effectiveness on the iterative optimization of truss structures, showing that it can adjust designs based on structured feedback and criteria. The results show that the LLM-based agent generates truss designs that match natural-language specifications with a success rate of up to 90%, depending on the constraints applied. Using prompt-based optimization, we show that the LLM agent, when given solution-score pairs, can iteratively refine designs to meet specifications by relying on its inherent reasoning. That the LLM agent can produce viable designs and refine them through its own reasoning suggests it has the potential to develop and apply effective design strategies autonomously.|\n", "2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker 
et.al.|[2404.17460](http://arxiv.org/abs/2404.17460)|null|This paper discusses and evaluates a new conversational tutoring system (CTS) that builds on recent advances in large language models (LLMs). First, the system enables AI-assisted content authoring by automatically generating easily editable tutoring scripts from lesson texts. Second, the system runs two LLM-based agents, Ruffle and Riley, in a learning-by-teaching format: they take the roles of a student and a professor, respectively, and hold free-form conversations that follow the inner- and outer-loop structure of a typical intelligent tutoring system. In two online user studies (N=200), we compared the system with a simple QA chatbot and with a reading activity as support for a biology lesson. We analyzed system usage patterns, pre- and post-test scores, and user-experience surveys. Users engaged heavily with Ruffle&Riley, reported strong understanding, and found the support it provided helpful. Although Ruffle&Riley users needed more time to complete the activity, we found no significant difference in short-term learning gains compared with the reading activity. 
Our system architecture and user studies offer useful insights for future CTS designers. We also open-source the system to support research on effective instructional design for LLM-based learning technologies.|\n", "2404.17153": "|**2024-04-26**|**A Unified Debugging Approach via LLM-Based Multi-Agent Synergy**|Cheryl Lee et.al.|[2404.17153](http://arxiv.org/abs/2404.17153)|null|Software debugging is a time-consuming process, and there have long been efforts to automate it, including fault localization and patch generation. In recent years, large language models (LLMs) have shown great potential for automated debugging. However, we identify three challenges facing both traditional and LLM-based debugging tools: 1) inaccurate upstream fault localization propagates to downstream repair; 2) limited ability to handle complex logic errors; and 3) neglect of program context. To address these issues, we propose FixAgent, the first automated, unified debugging framework built on the synergy of LLM agents. FixAgent performs end-to-end fault localization, repair, and analysis. 
Our key insight is that LLMs can benefit from general software-engineering principles recognized by human developers, such as rubber-duck debugging, which helps in understanding program functionality and logic errors. We therefore design three rubber-duck-inspired mechanisms: agent specialization and synergy, key-variable tracking, and program-context comprehension, which prompt the LLMs to give explicit explanations and focus on crucial program-logic information. On the widely used QuixBugs dataset, FixAgent fixes 79 of 80 bugs, 9 of which had never been fixed before. On CodeFlaws it also plausibly patches 1.9 times as many defects as the best-performing repair tool, without bug-location information and with a sampling rate below 0.6%. On average, compared with baselines built on different LLMs, FixAgent improves plausible and correct fix rates by about 20%, demonstrating the effectiveness of our design. FixAgent also achieves a correctness rate of 97.26%, suggesting that it may overcome the overfitting problem of existing approaches. In summary, FixAgent is a promising automated debugging framework aimed at improving both the efficiency and the accuracy of software debugging.|\n", "2404.16698": 
"|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698](http://arxiv.org/abs/2404.16698)|**[link](https://github.com/giorgiopiatti/govsim)**|In the rapidly evolving field of artificial intelligence, ensuring that large language models (LLMs) make safe decisions is a major challenge. This paper presents the Governance of the Commons Simulation (GovSim), a simulation platform for studying strategic interaction and cooperative decision-making among LLMs. Using this environment, we investigate the dynamics of resource sharing among AI agents and highlight the importance of ethical considerations, strategic planning, and negotiation skills. GovSim is versatile and supports text-based agents, including LLMs. Using a generative-agent framework, we create a general-purpose agent that makes it easy to integrate different LLMs. We find that only 2 of the 15 models tested achieve sustainable outcomes in GovSim, revealing a significant gap in the models' ability to manage shared resources. Further analysis shows that when agents can no longer communicate, they overuse the shared resource, underscoring the key role of communication in cooperation. 
Interestingly, most LLMs lack the ability to form universalized hypotheses, which exposes a significant weakness in their reasoning skills. We open-source all of our results, including the simulation environment, agent prompts, and a comprehensive web interface, to support further research and discussion.|\n", "2404.17605": "|**2024-04-24**|**Autonomous LLM-driven research from data to human-verifiable research papers**|Tal Ifargan et.al.|[2404.17605](http://arxiv.org/abs/2404.17605)|**[link](https://github.com/technion-kishony-lab/data-to-paper)**|**As AI accelerates scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values such as transparency, traceability, and verifiability. To mimic human scientific practice, we built data-to-paper, an automation platform that guides interacting AI agents through a complete stepwise research process while programmatically tracking information flow and allowing human oversight and interaction. In autopilot mode, given only annotated data, the platform raises hypotheses, designs research plans, writes and debugs analysis code, generates and interprets results, and produces complete research papers with traceable information. 
Although the research novelty was limited, the process demonstrated AI's ability to autonomously generate original quantitative insights from data. For simple research goals, the fully autonomous pipeline produced manuscripts free of major errors in about 80-90% of cases, but as goal complexity increases, human co-piloting becomes essential to ensure accuracy. Moreover, the resulting papers are inherently verifiable, because information tracking allows results, methods, and data to be linked programmatically. Our work thus shows that AI-driven science can accelerate scientific discovery while enhancing, rather than threatening, transparency, traceability, and verifiability.**|\n", "2404.16115": "|**2024-04-24**|**Online Personalizing White-box LLMs Generation with Neural Bandits**|Zekai Chen 
et.al.|[2404.16115](http://arxiv.org/abs/2404.16115)|null|As large language models (LLMs) begin generating personalized content, a new challenge is how to personalize efficiently without the resource cost of building a separate model for each user. This paper proposes an innovative online method that uses neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, improving the personalization of open-ended text generation by white-box LLMs. Through rigorous experiments on multiple tasks, we show that the method significantly outperforms baseline strategies. In particular, for personalized news headline generation, NeuralTS yields improvements of up to 62.9% in the best ROUGE score and 2.76% in the LLM-agent evaluation score, demonstrating its effectiveness.|\n", "2404.15974": "|**2024-04-24**|**A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples**|Lihang Pan et.al.|[2404.15974](http://arxiv.org/abs/2404.15974)|null|
A single large language model (LLM) has limited ability to solve complex tasks, whereas a network built by connecting multiple LLM agents can substantially improve overall performance. This paper presents EasyLAN, a human-computer collaborative tool that helps developers easily build LLM-agent networks (LANs). EasyLAN first automatically generates an initial network containing only one agent from a task description. It then adjusts the network using a small number of training examples: for each example, EasyLAN analyzes the gap between the output and the ground truth, identifies the causes of the errors, and applies carefully designed strategies to fix them. Users can intervene in EasyLAN's workflow or modify the LAN directly. Eventually, the LAN grows from a single agent into a multi-agent network. Experimental results show that EasyLAN helps developers quickly build LANs that perform well.|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao 
et.al.|[2404.15269](http://arxiv.org/abs/2404.15269)|**[link](https://github.com/gao-g/prelude)**|**We study interactive learning of language agents from the edits users make to the agent's output. In common settings such as writing assistants, the user interacts with a language agent that generates a response for a given context, and may edit that response to reflect their latent preferences as well as to improve its correctness. Such edit feedback arises naturally, making it well suited for improving the agent's alignment with user preferences and reducing users' editing cost over time. To this end, we propose PRELUDE, a framework that infers the user's latent preference from historical edit data and uses it to define a prompt policy that guides future response generation. This avoids costly fine-tuning that is hard to scale and preserves performance on other tasks. 
\u6b64\u5916\uff0c\u5b66\u4e60\u63cf\u8ff0\u6027\u7684\u504f\u597d\u6709\u52a9\u4e8e\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\uff0c\u7528\u6237\u53ef\u4ee5\u67e5\u770b\u548c\u8c03\u6574\u5b66\u4e60\u5230\u7684\u504f\u597d\u3002\u7136\u800c\uff0c\u7528\u6237\u504f\u597d\u53ef\u80fd\u590d\u6742\u591a\u53d8\uff0c\u53d7\u60c5\u5883\u5f71\u54cd\uff0c\u56e0\u6b64\u5b66\u4e60\u8d77\u6765\u5177\u6709\u6311\u6218\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCIPHER\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6839\u636e\u7528\u6237\u7f16\u8f91\u63a8\u65ad\u7ed9\u5b9a\u60c5\u5883\u4e0b\u7684\u7528\u6237\u504f\u597d\u3002\u672a\u6765\uff0cCIPHER\u4f1a\u4ece\u5386\u53f2\u4e2d\u7684k\u4e2a\u6700\u63a5\u8fd1\u7684\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u63a8\u65ad\u51fa\u7684\u504f\u597d\uff0c\u7efc\u5408\u751f\u6210\u54cd\u5e94\u3002\u6211\u4eec\u5728\u603b\u7ed3\u548c\u7535\u5b50\u90ae\u4ef6\u5199\u4f5c\u4e24\u4e2a\u4e92\u52a8\u73af\u5883\u4e2d\u4f7f\u7528GPT-4\u6a21\u62df\u7528\u6237\u8fdb\u884c\u8bc4\u4f30\uff0c\u4e0e\u76f4\u63a5\u4f7f\u7528\u7528\u6237\u7f16\u8f91\u4f46\u4e0d\u5b66\u4e60\u63cf\u8ff0\u6027\u504f\u597d\u7684\u7b97\u6cd5\uff0c\u4ee5\u53ca\u5b66\u4e60\u5168\u5c40\u65e0\u4e0a\u4e0b\u6587\u504f\u597d\u7684\u7b97\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002 \u5728\u4e24\u9879\u4efb\u52a1\u4e2d\uff0cCIPHER\u90fd\u5b9e\u73b0\u4e86\u6700\u4f4e\u7684\u7f16\u8f91\u8ddd\u79bb\u6210\u672c\uff0c\u5e76\u4e14\u5b66\u4e60\u5230\u7684\u504f\u597d\u4e0e\u771f\u5b9e\u504f\u597d\u663e\u793a\u51fa\u663e\u8457\u7684\u76f8\u4f3c\u6027\u3002**|\n", "2404.14387": "|**2024-04-22**|**A Survey on Self-Evolution of Large Language Models**|Zhengwei Tao et.al.|[2404.14387](http://arxiv.org/abs/2404.14387)|**[link](https://github.com/alibabaresearch/damo-convai)**|**## \u6982\u8ff0 
Large language models (LLMs) have made significant progress across many domains and in intelligent-agent applications. However, current LLMs that learn under human or external-model supervision can be costly and may hit performance ceilings as task complexity and diversity grow. Self-evolution approaches have emerged to address this. They let LLMs autonomously acquire, refine, and learn from experiences they generate themselves, inspired by how humans learn from experience, and they hold promise for scaling LLMs toward superintelligence. This paper presents a comprehensive survey of self-evolution approaches in LLMs. First, we propose a conceptual framework that organizes the evolution process as an iterative cycle of four phases: experience acquisition, experience refinement, updating, and evaluation. Second, we categorize the evolution objectives of LLMs and LLM-based agents, summarize the related literature, and provide a taxonomy and insights for each module. Finally, we identify current challenges and propose future research directions, offering key insights to accelerate the development of self-evolving LLMs.**|\n", "2404.13501": "|**2024-04-21**|**A Survey on the Memory Mechanism of Large Language Model based Agents**|Zeyu Zhang 
et.al.|[2404.13501](http://arxiv.org/abs/2404.13501)|**[link](https://github.com/nuster1128/llm_agent_memory_survey)**|**As large language models (LLMs) attract broad attention from academia and industry, LLM-based agents have drawn particular interest for their capacity to self-evolve. This capacity is essential for solving real-world problems that require long-term, complex agent-environment interaction. The key component supporting agent-environment interaction is the agent's memory. Many promising memory designs have been proposed, but they are scattered across different papers. No comprehensive review has yet summarized and compared them systematically or distilled general, effective design patterns to inspire future work. To fill this gap, this paper presents a comprehensive survey of the memory mechanisms of LLM-based agents. We first discuss what memory is in LLM-based agents and why they need it. We then systematically review previous studies on how to design and evaluate memory modules, and present applications in which memory modules play an important role. Finally, we analyze the limitations of existing work and highlight important directions for future research.
To keep up with the latest advances in this field, we maintain a GitHub repository at https://github.com/nuster1128/LLM_Agent_Memory_Survey.**|\n", "2404.11964": "|**2024-04-18**|**From Language Models to Practical Self-Improving Computer Agents**|Alex Sheng et.al.|[2404.11964](http://arxiv.org/abs/2404.11964)|null|We develop a simple and straightforward method for creating AI agents that can carry out a wide variety of computer tasks. These agents self-improve by developing tools and augmentations that let them solve increasingly complex tasks. Because large language models (LLMs) have been shown to benefit from non-parametric augmentations, much recent work has focused on developing software that augments LLMs with various capabilities. Rather than relying on static software engineered by humans, we propose that, with appropriate prompt engineering, an LLM agent can systematically generate software to augment itself. 
We demonstrate this through several case studies. Given only terminal access, we prompt an LLM agent to augment itself with retrieval, internet search, web navigation, and text-editing capabilities. The agent then uses these tools effectively to solve problems such as automated software development and web-based tasks. This suggests that, through successive prompting and careful prompt design, an LLM can autonomously extend its own capabilities to perform practical computer tasks.|\n", "2404.11794": "|**2024-04-25**|**Automated Social Science: Language Models as Scientist and Subjects**|Benjamin S. Manning et.al.|[2404.11794](http://arxiv.org/abs/2404.11794)|null|We present an approach for automatically generating and testing social-science hypotheses, building on recent advances in large language models (LLMs). The key to the approach is the use of structural causal models. These provide a language for stating hypotheses, a blueprint for constructing LLM-based agents, an experimental design, and a plan for data analysis. The fitted structural causal model can then be used for prediction or for planning follow-up experiments. We demonstrate the approach in several scenarios: a negotiation, a bail hearing, a job interview, and an auction. In each case, the system both proposes and tests causal relationships, finding evidence for some and not for others. We show that the insights from these simulations of social interactions are not available to the LLM through direct elicitation alone. When given the proposed structural causal model for each scenario, the LLM predicts the signs of the estimated effects well but cannot reliably predict their magnitudes. In the auction experiment, the simulation results closely match the predictions of auction theory, whereas the clearing prices elicited directly from the LLM are inaccurate. The LLM's predictions improve dramatically, however, when it is conditioned on the fitted structural causal model. In short, the LLM knows more than it can (immediately) tell.|\n", "2404.11483": "|**2024-04-17**|**AgentKit: Flow Engineering with Graphs, not Coding**|Yue Wu 
et.al.|[2404.11483](http://arxiv.org/abs/2404.11483)|**[link](https://github.com/holmeswww/agentkit)**|**We propose AgentKit, an intuitive LLM prompting framework that offers a unified approach to building multifunctional agents. AgentKit constructs complex thought processes from simple natural-language prompts. Its basic building block is a node, which contains a natural-language prompt for a specific subtask. Users connect nodes like Lego pieces to explicitly design a naturally structured thought process. For example, writing a paper might begin with 1) identifying the core message and 2) identifying gaps in prior research. AgentKit's modular design enables advanced capabilities such as on-the-fly hierarchical planning, reflection, and learning from interaction. Because the design is intuitive and mirrors human thinking, even people without programming experience can design and tune basic agents. Quantitative experiments show that agents built with AgentKit achieve state-of-the-art performance on WebShop and Crafter. These results suggest that AgentKit can make LLM agents effective and accessible for a wider range of applications. The code is open-sourced on GitHub at https://github.com/holmeswww/AgentKit.**|\n", "2404.09982": "|**2024-04-15**|**Memory Sharing for Large Language Model based Agents**|Hang Gao et.al.|[2404.09982](http://arxiv.org/abs/2404.09982)|**[link](https://github.com/ghupppp/memorysharingllm)**|**In artificial intelligence, the ability of large language models (LLMs) to perform tasks from natural-language prompts is a major breakthrough. It reduces the need for retraining or fine-tuning on tasks with fixed answers, such as commonsense questions and yes/no queries. However, in-context learning shows limitations on open-ended challenges such as poetry writing. The main causes are that the provided examples are not comprehensive enough and that the model does not fully understand the content of the question, so its outputs often diverge sharply from the expected results. To close this gap, we propose the Memory-Sharing (MS) framework, a real-time memory storage and retrieval system for multiple LLM agents that is designed to enhance in-context learning. Each memory unit records a posed query together with the corresponding real-time response from an LLM agent. Memories from many similar agents are aggregated into a rich memory pool shared by all of them. The MS framework not only helps agents find relevant examples for a specific task but also evaluates the potential value of their memories for future use by other agents. Empirical validation across three distinct domains shows that the MS framework significantly improves the agents' performance on open-ended questions. We also discuss which types of memory pool and retrieval strategy best support the agents, pointing out directions for the future development of MS. The code and data are available at https://github.com/GHupppp/MemorySharingLLM.**|\n", "2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127](http://arxiv.org/abs/2404.09127)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|**Current large language models (LLMs) struggle with uncertainty estimation: they are generally poorly calibrated and overconfident, especially after reinforcement learning from human feedback (RLHF). Human decisions and confidence stem not only from intrinsic beliefs but can also be adjusted through everyday observation. Existing calibration methods for LLMs, by contrast, focus on estimating or eliciting the confidence of a single model. They do not take full advantage of collective wisdom, that is, the interaction among multiple LLMs that can collectively improve both accuracy and calibration. In this work, we propose Collaborative Calibration, a post-hoc, training-free calibration strategy that leverages the collaborative and expressive capabilities of multiple tool-augmented LLM agents in a simulated group deliberation process to jointly improve calibration and rationalization. We demonstrate the effectiveness of Collaborative Calibration on generative question-answering tasks across various domains. The results show its potential for harnessing the rationalization of collectively calibrated confidence assessments and for improving the reliability of model predictions.**|\n", "2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077](http://arxiv.org/abs/2404.09077)|**[link](https://github.com/zukangy/kgp-curiousllm)**|**In question answering (QA), combining large language models (LLMs) with external databases has been highly successful. However, these methods often fall short on tasks that require complex reasoning. To address this, we improve on a novel approach called Knowledge Graph Prompting (KGP), which combines a knowledge graph with an LLM-based agent to improve reasoning and search accuracy. The original KGP framework, however, requires costly fine-tuning on large datasets and still suffers from LLM hallucination. We therefore propose a reasoning-infused LLM agent that mimics human curiosity by asking follow-up questions to navigate the search more efficiently. This simple modification significantly improves LLM performance on QA tasks while avoiding the high cost and latency of the original KGP framework. Our ultimate goal is to develop this approach further toward more accurate, faster, and more cost-effective QA solutions.**|\n", "2404.09043": "|**2024-04-13**|**Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation**|Jia Gu et.al.|[2404.09043](http://arxiv.org/abs/2404.09043)|null|Large language models (LLMs) have advanced rapidly and perform strongly on complex language tasks. As a result, a growing number of studies use LLMs as agents to emulate human sequential decision-making, which is often represented as a Markov decision process (MDP). Within this framework, actions follow specific probability distributions and require iterative sampling. This raises the question of whether LLM agents understand probability distributions well enough to guide their behavioral decisions through probabilistic sampling and to generate behavioral sequences. We divide the problem into two main aspects: simulating a known, exact probability distribution, and generating sequences when the probability distribution is ambiguous. 
In the first case, the agent must infer the type and parameters of the probability distribution from the problem description and then produce a sample sequence. Our analysis shows that LLM agents perform poorly here, although programming tools can improve the sampling success rate to some extent. In real-world scenarios, however, probability distributions are often unknown. In the second case, we therefore ask the agents to adjust their activity level in an online social network and analyze the frequency of their actions. The results show that LLM agents cannot sample probability distributions effectively even with programming tools. Caution is therefore warranted before directly using LLMs as agents that simulate human behavior.|\n", "2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492](http://arxiv.org/abs/2404.08492)|null|As large language models (LLMs) are adopted more widely, their potential for understanding behavior in game-theoretic settings through simulation is becoming increasingly apparent. This study simulates the strategic interactions among agents driven by different types of LLMs in the classic beauty contest game. Drawing parallels to experiments with human subjects, the LLM-based agents are assessed in terms of strategic level. They exhibit depths of reasoning ranging from level 0 to level 1 and converge in their actions over repeated play. I also explore how the composition of agent types in a group affects strategic behavior. A higher proportion of fixed-strategy opponents promotes convergence for the LLM-based agents, and a mixed environment with agents of differing relative strategic levels accelerates convergence for all agents. The more intelligent agents may earn higher average payoffs, but at the expense of the less intelligent ones. These results shed light on the outcomes of simulated agents in specific scenarios and offer valuable implications for understanding strategic interactions between algorithms.|\n", "2404.08144": "|**2024-04-17**|**LLM Agents can Autonomously Exploit One-day Vulnerabilities**|Richard Fang 
et.al.|[2404.08144](http://arxiv.org/abs/2404.08144)|null|As large language models (LLMs) grow more powerful, they are increasingly used for both benign and malicious purposes. Researchers have accordingly become interested in their ability to exploit cybersecurity vulnerabilities. Recent work has studied whether LLM agents can autonomously hack websites, but it has been limited to simple vulnerabilities. In this work, we show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. To do so, we collected a dataset of 15 one-day vulnerabilities, including some rated critical severity in their CVE descriptions. When given the CVE description, GPT-4 exploits 87% of these vulnerabilities. Every other model tested (GPT-3.5 and open-source LLMs) and the open-source vulnerability scanners ZAP and Metasploit exploit 0%. Without the CVE description, however, our GPT-4 agent can exploit only 7% of the vulnerabilities. These findings raise questions about the widespread deployment of highly capable LLM agents.|\n", "2404.17586": "|**2024-04-11**|**The Future of Scientific Publishing: Automated Article Generation**|Jeremy R. 
Harper et.al.|[2404.17586](http://arxiv.org/abs/2404.17586)|null|This study introduces a software tool that uses large language model (LLM) prompts to automatically generate academic articles from Python code, an advance relevant to biomedical informatics and computer science. Python was chosen as the proof of concept because of its wide adoption and analytical versatility. The methodology and framework are flexible enough to apply to a variety of GitHub repositories, which suggests broad applicability (Harper, 2024). The approach streamlines the traditionally time-consuming process of academic writing, particularly the synthesis of complex datasets and code outputs, and so enables faster dissemination of research. The tool was developed without relying on advanced language model agents while still producing coherent and comprehensive content. This exploration validates the software's effectiveness and efficiency. It also suggests that integrating more advanced LLM agents in the future could further extend its capabilities, moving toward faster and more accessible publication of scientific findings.|\n", "2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang 
et.al.|[2404.07456](http://arxiv.org/abs/2404.07456)|null|Large language models (LLMs) have recently shown great potential as intelligent agents. However, existing work mainly improves their reasoning or decision-making through carefully designed prompt engineering or task-specific fine-tuning, and it overlooks the process of exploration and exploitation. These methods are limited when solving complex tasks in open-world interactive environments. First, without global information about the environment, the model tends to make greedy decisions, which leads to suboptimal solutions. Second, irrelevant information acquired from the environment introduces noise and incurs additional cost. To address this, this paper proposes a novel approach, Weak Exploration to Strong 
Exploitation (WESE), to improve LLM agents on open-world interactive tasks. Specifically, WESE decouples exploration from exploitation and uses a cost-effective weak agent to perform exploration and gather global knowledge. We then introduce a knowledge-graph-based strategy to store this knowledge and extract task-relevant key information. This improves the stronger agent's success rate and efficiency during exploitation. Our approach is task-agnostic and significantly improves both success rate and efficiency across four interactive benchmarks.|\n", "2404.06921": "|**2024-04-10**|**GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications**|Shishir G. 
Patil et.al.|[2404.06921](http://arxiv.org/abs/2404.06921)|**[link](https://github.com/ShishirPatil/gorilla)**|**Large language models (LLMs) are moving beyond their classical role of providing information in dialogue systems to actively interacting with real-world applications and services. Today, humans must verify the correctness and appropriateness of LLM-generated outputs (such as code, functions, or actions) before executing them in the real world. This is challenging because code comprehension is notoriously difficult. This paper studies how humans can efficiently collaborate with, delegate to, and supervise autonomous LLMs in the future. We argue that in many cases post-facto validation, which verifies the correctness of a proposed action after seeing its output, is much easier than the pre-facto validation described above. The core idea behind enabling post-facto validation is to integrate an intuitive undo feature and to establish damage confinement for LLM-generated actions as effective strategies for mitigating the associated risks. With these in place, a human can either revert the effects of an LLM-generated output or be confident that the potential risk is bounded. We believe this is critical to enabling LLM agents to interact with applications and services under limited human supervision. We describe the design and implementation of our open-source runtime for executing LLM actions, Gorilla Execution Engine (GoEX). We also present open research questions toward LLMs and applications interacting with each other with minimal human intervention. The source code of GoEX is available at https://github.com/ShishirPatil/gorilla/.**|\n", "2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411](http://arxiv.org/abs/2404.06411)|**[link](https://github.com/nec-research/agentquest)**|**Advances in large language models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. However, existing benchmarks are often narrow and only measure overall task success. To address these issues, we propose AgentQuest, a framework with two features. First, both benchmarks and metrics are modular and easily extensible through well-documented, easy-to-use APIs. Second, two new evaluation metrics reliably track an LLM agent's progress while it solves a task. We demonstrate the usefulness of these metrics in two use cases, where we identify common failure points and refine the agent architecture to obtain a significant performance increase. We hope to extend AgentQuest further together with the research community and have made it available at https://github.com/nec-research/agentquest.**|\n", "2404.05427": "|**2024-04-15**|**AutoCodeRover: Autonomous Program Improvement**|Yuntong Zhang et.al.|[2404.05427](http://arxiv.org/abs/2404.05427)|**[link](https://github.com/nus-apr/auto-code-rover)**|**Over the past decades, researchers have made significant progress in automating the software development process. The application of large language models (LLMs) in particular has greatly advanced automated programming assistance. However, software engineering involves more than coding. It also includes program improvement for maintenance (e.g., bug fixing) and evolution (e.g., feature additions). This paper proposes an automated approach to resolving GitHub issues in order to achieve autonomous program improvement. Our approach, called AutoCodeRover, combines LLMs with sophisticated code search capabilities and ultimately produces a program modification or patch. Recent approaches from AI researchers and practitioners view a software project as a mere collection of files. Our work instead operates on a program representation (abstract syntax tree). It exploits the program structure in the form of classes and methods to enhance the LLM's understanding of the issue's root cause, and it retrieves context effectively through iterative search. \u5f53\u6d4b\u8bd5\u5957\u4ef6\u53ef\u7528\u65f6\uff0c\
u8c31\u7cfb\u57fa\u7ebf\u6545\u969c\u5b9a\u4f4d\u6280\u672f\u8fdb\u4e00\u6b65\u7cbe\u786e\u4e86\u4e0a\u4e0b\u6587\u3002 \u5728SWE-bench-lite\uff0c\u4e00\u4e2a\u5305\u542b300\u4e2a\u771f\u5b9eGitHub\u95ee\u9898\u7684\u6570\u636e\u96c6\u4e0a\uff0cAutoCodeRover\u7684\u89e3\u51b3\u65b9\u6848\u6548\u679c\u63d0\u5347\uff0c\u89e3\u51b3\u4e86\u7ea622-23%\u7684\u95ee\u9898\u3002\u5bf9\u4e8e\u5168\u91cf\u7684SWE-bench\uff0c\u5305\u542b2294\u4e2aGitHub\u95ee\u9898\uff0cAutoCodeRover\u89e3\u51b3\u4e86\u5927\u7ea616%\u7684\u95ee\u9898\uff0c\u8fd9\u6bd4\u6700\u8fd1\u62a5\u9053\u7684\u6765\u81eaCognition Labs\u7684AI\u8f6f\u4ef6\u5de5\u7a0b\u5e08Devin\u7684\u8868\u73b0\u8fd8\u8981\u9ad8\uff0c\u800c\u4e14\u65f6\u95f4\u6d88\u8017\u4e0eDevin\u76f8\u5f53\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u6d41\u7a0b\u80fd\u591f\u63a8\u52a8\u81ea\u4e3b\u8f6f\u4ef6\u5de5\u7a0b\u7684\u53d1\u5c55\uff0c\u672a\u6765LLM\u81ea\u52a8\u751f\u6210\u7684\u4ee3\u7801\u53ef\u4ee5\u88ab\u81ea\u52a8\u5730\u8fdb\u884c\u4f18\u5316\u548c\u6539\u8fdb\u3002**|\n", "2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang 
et.al.|[2404.05291](http://arxiv.org/abs/2404.05291)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7cfb\u7edf\uff0c\u65e8\u5728\u63d0\u5347\u56db\u8db3\u673a\u5668\u4eba\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u8d85\u8d8a\u77ed\u671f\u52a8\u4f5c\u7684\u957f\u671f\u4efb\u52a1\u3002\u5bf9\u4e8e\u56db\u8db3\u673a\u5668\u4eba\u6765\u8bf4\uff0c\u957f\u671f\u4efb\u52a1\u6781\u5177\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u9700\u8981\u5bf9\u4efb\u52a1\u7684\u8bed\u4e49\u6709\u9ad8\u5c42\u7406\u89e3\uff0c\u5e76\u5177\u5907\u5e7f\u6cdb\u7684\u8fd0\u52a8\u548c\u64cd\u7eb5\u6280\u80fd\u4ee5\u4e0e\u73af\u5883\u4e92\u52a8\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u5c42\u63a8\u7406\u5c42\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u4efb\u52a1\u63cf\u8ff0\u4e2d\u751f\u6210\u6df7\u5408\u79bb\u6563-\u8fde\u7eed\u7684\u8ba1\u5212\uff0c\u4f5c\u4e3a\u673a\u5668\u4eba\u4ee3\u7801\u3002\u5b83\u5305\u62ec\u591a\u4e2aLLM\u4ee3\u7406\uff1a\u4e00\u4e2a\u7528\u4e8e\u6784\u601d\u8ba1\u5212\u7684\u8bed\u4e49\u89c4\u5212\u5668\u3001\u4e00\u4e2a\u53c2\u6570\u8ba1\u7b97\u5668\uff0c\u7528\u4e8e\u9884\u6d4b\u8ba1\u5212\u4e2d\u7684\u53c2\u6570\uff0c\u4ee5\u53ca\u4e00\u4e2a\u4ee3\u7801\u751f\u6210\u5668\uff0c\u5c06\u8ba1\u5212\u8f6c\u6362\u4e3a\u53ef\u6267\u884c\u7684\u673a\u5668\u4eba\u4ee3\u7801\u3002 
\u5728\u4f4e\u5c42\u6b21\uff0c\u6211\u4eec\u91c7\u7528\u5f3a\u5316\u5b66\u4e60\u6765\u8bad\u7ec3\u4e00\u5957\u8fd0\u52a8\u89c4\u5212\u548c\u63a7\u5236\u6280\u80fd\uff0c\u4ee5\u589e\u5f3a\u56db\u8db3\u673a\u5668\u4eba\u7684\u7075\u6d3b\u6027\uff0c\u4f7f\u5176\u80fd\u8fdb\u884c\u4e30\u5bcc\u73af\u5883\u4ea4\u4e92\u3002\u6211\u4eec\u5728\u96be\u4ee5\u7528\u5355\u4e00\u6280\u80fd\u5b8c\u6210\u7684\u957f\u671f\u4efb\u52a1\u4e0a\u6d4b\u8bd5\u4e86\u6211\u4eec\u7684\u7cfb\u7edf\u3002\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u8868\u660e\uff0c\u5b83\u6210\u529f\u5730\u5236\u5b9a\u4e86\u591a\u6b65\u9aa4\u7b56\u7565\uff0c\u5e76\u5c55\u73b0\u51fa\u975e\u5e73\u51e1\u7684\u884c\u4e3a\uff0c\u4f8b\u5982\u5236\u4f5c\u5de5\u5177\u6216\u5411\u4eba\u7c7b\u5bfb\u6c42\u5e2e\u52a9\u3002|\n", "2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667](http://arxiv.org/abs/2404.04667)|null|\u591a\u6a21\u6001\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u6709\u671b\u901a\u8fc7\u89e3\u6790\u5404\u7c7b\u533b\u5b66\u6570\u636e\u63d0\u5347\u4e34\u5e8a\u51b3\u7b56\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5404\u533b\u5b66\u9886\u57df\u7684\u6548\u80fd\u5c1a\u4e0d\u660e\u6717\uff0c\u6bcf\u4e2a\u9886\u57df\u90fd\u6709\u5176\u72ec\u7279\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u5f15\u64ce\u7684\u65b0\u578b\u591a\u6a21\u6001\u533b\u7597AI\u65b9\u6cd5\u3002\u6b64\u5f15\u64ce\u81ea\u4e3b\u534f\u8c03\u5e76\u90e8\u7f72\u4e00\u7cfb\u5217\u4e13\u95e8\u7684\u533b\u7597AI\u5de5\u5177\uff0c\u5982\u6587\u672c\u89e3\u8bfb\u3001\u653e\u5c04\u5b66\u548c\u75c5\u7406\u56fe\u50cf\u5206\u6790\u3001\u57fa\u56e0\u6570\u636e\u5904\u7406\u3001\u7f51\u7edc\u641c\u7d22\u4ee5\u53ca\u533b\u7597\u6307\u5357\u6587\u6863\u68c0\u7d22\u3002\u6211\u4eec\u5728\u4e00\u7cfb\u5217\u4e34\u5e8a\u80bf\u76
24\u5b66\u573a\u666f\u4e2d\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u573a\u666f\u6a21\u62df\u4e86\u5178\u578b\u7684\u60a3\u8005\u62a4\u7406\u6d41\u7a0b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u7cfb\u7edf\u5728\u9009\u62e9\u6070\u5f53\u5de5\u5177\uff0897%\uff09\u3001\u5f97\u51fa\u6b63\u786e\u7ed3\u8bba\uff0893.6%\uff09\u3001\u63d0\u4f9b\u5b8c\u6574\uff0894%\uff09\u548c\u6709\u76ca\uff0889.2%\uff09\u6cbb\u7597\u5efa\u8bae\uff0c\u4ee5\u53ca\u6839\u636e\u6307\u4ee4\u5f15\u7528\u76f8\u5173\u6587\u732e\uff0882.5%\uff09\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u80fd\u529b\u3002\u8fd9\u8868\u660eLLMs\u80fd\u591f\u6709\u6548\u5730\u89c4\u5212\u548c\u6267\u884c\u9886\u57df\u7279\u5b9a\u6a21\u578b\uff0c\u4ee5\u83b7\u53d6\u6216\u5408\u6210\u65b0\u4fe1\u606f\uff0c\u4ece\u800c\u5145\u5f53\u4e2a\u6027\u5316\u4e34\u5e8a\u52a9\u624b\u3002\u6b64\u5916\uff0c\u8fd9\u79cd\u67b6\u6784\u7b80\u5316\u4e86\u76d1\u7ba1\u5408\u89c4\u6027\uff0c\u56e0\u4e3a\u6bcf\u4e2a\u7ec4\u4ef6\u5de5\u5177\u53ef\u4ee5\u5355\u72ec\u9a8c\u8bc1\u548c\u5ba1\u6279\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u8fd9\u9879\u5de5\u4f5c\u4e3a\u533b\u7597\u9886\u57df\u7684\u66f4\u5148\u8fdbLLM\u4ee3\u7406\u63d0\u4f9b\u4e86\u6982\u5ff5\u9a8c\u8bc1\u3002|\n", "2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237](http://arxiv.org/abs/2404.04237)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u8fdb\u6b65\u4f7f\u5176\u5728\u6807\u51c6\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u9891\u9891\u8d85\u8d8a\u4eba\u7c7b\u8868\u73b0\uff0c\u63a8\u52a8\u4e86\u4f17\u591a\u4e0b\u6e38\u5e94\u7528\u7684\u53d1\u5c55\uff0c\u5982\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u770b\u4f3c\u7b80\u5355\u7684\u4efb\u52a1\u4e2d\u610f\u5916\u5730\u8868\u73b0\u4e0d\u4f73\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5bf9\u66f4\u5168\u9762\u548c\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6846\u67b6\u7684\u9700\u6c42\uff0c\u4ee5\u8861\u91cf\u5b83\u4eec\u7684\u5b9e\u9645\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u7ec4\u5408\u6027\u548c\u6761\u4ef6\u63a8\u7406\u2014\u2014\u4eba\u7c7b\u8ba4\u77e5\u7684\u57fa\u77f3\uff0c\u5e76\u63d0\u51faGroundCocoa\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e0e\u822a\u73ed\u9884\u8ba2\u8fd9\u4e00\u73b0\u5b9e\u95ee\u9898\u76f8\u8fde\u63a5\u7684\u8bcd\u6c47\u4e30\u5bcc\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u4efb\u52a1\u662f\u5c06\u7528\u6237\u7684\u8be6\u7ec6\u504f\u597d\u4e0e\u4ee5\u591a\u9009\u5f62\u5f0f\u63d0\u4f9b\u7684\u53ef\u7528\u822a\u73ed\u9009\u9879\u8fdb\u884c\u5339\u914d\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5305\u62ec\u6700\u5148\u8fdb\u7684GPT-4 Turbo\u5728\u5185\u7684\u5f53\u524d\u6700\u4f73\u6a21\u578b\uff0c\u5728\u7ecf\u8fc7\u9ad8\u7ea7\u63d0\u793a\u540e\uff0c\u51c6\u786e\u7387\u4ecd\u4e0d\u8d85\u8fc767%\uff0c\u663e\u793a\u51fa\u663e\u8457\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045](http://arxiv.org/abs/2404.16045)|null|
\u5728\u4ea7\u54c1\u5f00\u53d1\u7684\u5173\u952e\u9636\u6bb5\u2014\u2014\u9700\u6c42\u83b7\u53d6\uff0c\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u7528\u6237\u9700\u6c42\uff0c\u5bfc\u81f4\u6700\u7ec8\u4ea7\u54c1\u53ef\u80fd\u65e0\u6cd5\u6ee1\u8db3\u671f\u671b\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5316\u548c\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u3002\u901a\u8fc7\u751f\u6210\u5927\u91cf\u6a21\u62df\u7528\u6237\uff08LLM\u4ee3\u7406\uff09\uff0c\u6211\u4eec\u53ef\u4ee5\u63a2\u7d22\u66f4\u5e7f\u6cdb\u7684\u7528\u6237\u9700\u6c42\u548c\u672a\u9884\u89c1\u7684\u4f7f\u7528\u573a\u666f\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u63cf\u8ff0\u4ed6\u4eec\u7684\u884c\u4e3a\u3001\u89c2\u5bdf\u548c\u6311\u6218\uff0c\u53c2\u4e0e\u4ea7\u54c1\u4f53\u9a8c\u60c5\u666f\u3002\u968f\u540e\u7684\u4ee3\u7406\u8bbf\u8c08\u548c\u5206\u6790\u63ed\u793a\u4e86\u5b9d\u8d35\u7684\u7528\u6237\u9700\u6c42\uff0c\u5305\u62ec\u6f5c\u5728\u9700\u6c42\u3002\u6211\u4eec\u901a\u8fc7\u4e09\u4e2a\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u6846\u67b6\uff1a\u9996\u5148\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0d\u540c\u65b9\u6cd5\u751f\u6210\u591a\u6837\u5316\u7684\u4ee3\u7406\uff0c\u5206\u6790\u5176\u4f18\u7f3a\u70b9\uff0c\u5e76\u8bc1\u660e\u4e86\u5177\u6709\u4e0a\u4e0b\u6587\u610f\u8bc6\u7684\u4ee3\u7406\u751f\u6210\u80fd\u5e26\u6765\u66f4\u5927\u7684\u9700\u6c42\u591a\u6837\u6027\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8be5\u6846\u67b6\u5982\u4f55\u6709\u6548\u5730\u6a21\u62df\u5bcc\u6709\u540c\u60c5\u5fc3\u7684\u9886\u5148\u7528\u6237\u8bbf\u8c08\uff0c\u8bc6\u522b\u51fa\u6bd4\u4f20\u7edf\u4eba\u7c7b\u8bbf\u8c08\u66f4\u591a\u7684\u6f5c\u5728\u9700\u6c42\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528LLMs\u5206\u6790\u8bbf\u8c08\uff0c\u63d0\u53d6\u9700\u6c42\u5e76\u5c06\u5176\u5206\u7c7b\u4e3a\u6f5c\u5728\u6
216\u975e\u6f5c\u5728\u3002\u6211\u4eec\u7684\u7814\u7a76\u5de5\u4f5c\u5f3a\u8c03\u4e86\u5229\u7528LLM\u4ee3\u7406\u52a0\u901f\u65e9\u671f\u4ea7\u54c1\u7814\u53d1\u3001\u964d\u4f4e\u6210\u672c\u548c\u4fc3\u8fdb\u521b\u65b0\u7684\u6f5c\u529b\u3002|\n", "2404.15317": "|**2024-04-03**|**Concept-Guided LLM Agents for Human-AI Safety Codesign**|Florian Geissler et.al.|[2404.15317](http://arxiv.org/abs/2404.15317)|null|\u968f\u7740\u751f\u6210\u4eba\u5de5\u667a\u80fd\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff0c\u7279\u522b\u662f\u5b89\u5168\u5de5\u7a0b\u4e2d\u7684\u91cd\u8981\u6027\u63d0\u5347\uff0c\u5bf9\u5b83\u7684\u8d28\u91cf\u8981\u6c42\u4e5f\u968f\u4e4b\u63d0\u9ad8\u3002\u5355\u7eaf\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e9b\u9700\u6c42\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u878d\u5408\u7684\u7b56\u7565\uff0c\u65e8\u5728\u5229\u7528LLMs\u8fdb\u884c\u5b89\u5168\u5206\u6790\u548c\u4eba\u673a\u534f\u540c\u8bbe\u8ba1\uff0c\u4ee5\u786e\u4fdd\u8f6f\u4ef6\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9a\u5236\u5316\u7684LLM\u4ee3\u7406\uff0c\u7ed3\u5408\u63d0\u793a\u5de5\u7a0b\u3001\u542f\u53d1\u5f0f\u63a8\u7406\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u4e13\u6ce8\u4e8e\u89e3\u51b3\u4e0e\u9884\u5b9a\u4e49\u5b89\u5168\u6982\u5ff5\u76f8\u5173\u7684\u4efb\u52a1\uff0c\u5e76\u4e0e\u7cfb\u7edf\u6a21\u578b\u56fe\u8fdb\u884c\u4ea4\u4e92\u3002\u51b3\u7b56\u6d41\u7a0b\u901a\u8fc7\u4e00\u7cfb\u5217\u5fae\u51b3\u7b56\u8fdb\u884c\u5f15\u5bfc\uff0c\u6709\u52a9\u4e8e\u4fdd\u6301\u7ed3\u6784\u5316\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u56fe\u7684\u53e3\u5934\u8868\u8ff0\u4f5c\u4e3a\u7cfb\u7edf\u6a21\u578b\u7684\u4e2d\u95f4\u8868\u793a\uff0c\u4ee5\u4fc3\u8fdbLLM\u4e0e\u56fe\u7684\u4ea4\u4e92\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7b80\u5316\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u7684\u793a\u4f8b\uff0c\u5
c55\u793a\u4e86\u9009\u62e9\u7684\u63d0\u793a-\u54cd\u5e94\u5bf9\uff0c\u4ee5\u8bf4\u660e\u6211\u4eec\u7684\u65b9\u6cd5\u5982\u4f55\u5e94\u7528\u4e8e\u5b89\u5168\u5206\u6790\u3002|\n", "2404.02183": "|**2024-04-02**|**Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**|Yoichi Ishibashi et.al.|[2404.02183](http://arxiv.org/abs/2404.02183)|**[link](https://github.com/tsukushiai/self-organized-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u7684\u672a\u6765\u6b63\u9010\u6e10\u663e\u73b0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5355\u4ee3\u7406\u65b9\u6cd5\u5728\u751f\u6210\u548c\u4f18\u5316\u5927\u89c4\u6a21\u3001\u590d\u6742\u7684\u4ee3\u7801\u5e93\u65f6\u9762\u4e34\u4e0a\u4e0b\u6587\u957f\u5ea6\u9650\u5236\u7684\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u4ee3\u7406\u6846\u67b6\u2014\u2014\u81ea\u7ec4\u7ec7\u591aAgent\u4f53\u7cfb\uff08SoA\uff09\u3002SoA\u662f\u4e00\u4e2a\u53ef\u6269\u5c55\u4e14\u9ad8\u6548\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u5b83\u5141\u8bb8\u72ec\u7acb\u5730\u751f\u6210\u548c\u4fee\u6539\u4ee3\u7801\u7ec4\u4ef6\uff0c\u5e76\u534f\u540c\u6784\u5efa\u6574\u4e2a\u4ee3\u7801\u5e93\u3002SoA\u7684\u4e00\u4e2a\u5173\u952e\u7279\u6027\u662f\u6839\u636e\u95ee\u9898\u590d\u6742\u6027\u81ea\u52a8\u589e\u52a0\u4ee3\u7406\uff0c\u5b9e\u73b0\u52a8\u6001\u53ef\u6269\u5c55\u6027\u3002\u8fd9\u6837\uff0c\u6574\u4f53\u4ee3\u7801\u91cf\u53ef\u4ee5\u6839\u636e\u4ee3\u7406\u6570\u91cf\u65e0\u9650\u589e\u957f\uff0c\u800c\u6bcf\u4e2a\u4ee3\u7406\u7ba1\u7406\u7684\u4ee3\u7801\u91cf\u4fdd\u6301\u6052\u5b9a\u3002 
\u6211\u4eec\u5728HumanEval\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86SoA\uff0c\u5e76\u53d1\u73b0\u4e0e\u5355\u4ee3\u7406\u7cfb\u7edf\u76f8\u6bd4\uff0cSoA\u4e2d\u7684\u6bcf\u4e2a\u4ee3\u7406\u5904\u7406\u7684\u4ee3\u7801\u91cf\u660e\u663e\u51cf\u5c11\uff0c\u4f46\u603b\u4f53\u751f\u6210\u7684\u4ee3\u7801\u91cf\u663e\u8457\u589e\u52a0\u3002\u6b64\u5916\uff0cSoA\u5728Pass@1\u51c6\u786e\u7387\u65b9\u9762\u6bd4\u5f3a\u5927\u7684\u5355\u4ee3\u7406\u57fa\u7ebf\u63d0\u9ad8\u4e865%\u3002**|\n", "2404.01602": "|**2024-04-02**|**Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game**|Silin Du et.al.|[2404.01602](http://arxiv.org/abs/2404.01602)|**[link](https://github.com/doslim/evaluate-the-opinion-leadership-of-llms)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u793e\u4ea4\u63a8\u7406\u6e38\u620f\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u7684\u7b56\u7565\u884c\u4e3a\uff0c\u4f46\u5bf9\u5b83\u4eec\u4f5c\u4e3a\u610f\u89c1\u9886\u8896\u7684\u91cd\u8981\u6027\u5173\u6ce8\u4e0d\u8db3\uff0c\u8fd9\u5bf9\u4e8e\u591aAgent\u548c\u4eba\u673a\u4ea4\u4e92\u573a\u666f\u7684\u5b9e\u9645\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u610f\u89c1\u9886\u8896\u662f\u6307\u5728\u4e00\u4e2a\u793e\u4f1a\u7fa4\u4f53\u4e2d\u5bf9\u4ed6\u4eba\u4fe1\u5ff5\u548c\u884c\u4e3a\u6709\u663e\u8457\u5f71\u54cd\u7684\u4e2a\u4f53\u3002\u672c\u7814\u7a76\u4f7f\u7528\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u4f5c\u4e3a\u6a21\u62df\u5e73\u53f0\uff0c\u63a2\u8ba8\u8bed\u8a00\u6a21\u578b\u5728\u626e\u6f14Sheriff\uff08\u6cbb\u5b89\u5b98\uff09\u89d2\u8272\u65f6\u7684\u610f\u89c1\u9886\u5bfc\u80fd\u529b\u3002Sheriff\u8d1f\u8d23\u603b\u7ed3\u8bba\u70b9\u5e76\u63d0\u51fa\u51b3\u7b56\u5efa\u8bae\uff0c\u56e0\u6b64\u5b83\u4ee3\u8868\u4e86\u610f\u89c1\u9886\u8896\u7684\u4e00\u4e2a\u53ef\u4fe1\u4ee3\u7406\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6574\u5408Sheriff\u89d2\u8272\u7684\u6846\u67b6\uff0c\u5e76\u57fa\u4e8e\u610f\u89c1\u9886\u8896\u7684\u5173\u952e\u7279\u6027\u63d0\u51fa\u4e8
6\u4e24\u4e2a\u8bc4\u4f30\u6307\u6807\uff1a\u7b2c\u4e00\u4e2a\u8861\u91cf\u610f\u89c1\u9886\u8896\u7684\u53ef\u9760\u6027\uff0c\u7b2c\u4e8c\u4e2a\u8003\u5bdf\u5176\u5bf9\u5176\u4ed6\u73a9\u5bb6\u51b3\u7b56\u7684\u5f71\u54cd\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u8bc4\u4f30\u4e0d\u540c\u89c4\u6a21\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u5e76\u521b\u5efa\u4e86\u201c\u72fc\u4eba\u6740\u201d\u95ee\u9898\u56de\u7b54\u6570\u636e\u96c6\uff08WWQA\uff09\uff0c\u4ee5\u6d4b\u8bd5\u548c\u63d0\u5347\u6a21\u578b\u5bf9\u6e38\u620f\u89c4\u5219\u7684\u7406\u89e3\u3002\u6b64\u5916\uff0c\u8fd8\u5305\u542b\u4e86\u4eba\u7c7b\u53c2\u4e0e\u8005\u8fdb\u884c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u662f\u4e00\u4e2a\u6709\u6548\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u610f\u89c1\u9886\u5bfc\u529b\u7684\u8bd5\u9a8c\u573a\uff0c\u4f46\u76ee\u524d\u4ec5\u6709\u5c11\u6570\u8bed\u8a00\u6a21\u578b\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002**|\n", "2404.00806": "|**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|\u968f\u7740\u7b97\u6cd5\u5b9a\u4ef7\u7684\u5174\u8d77\uff0c\u4eba\u4eec\u62c5\u5fe7\u7b97\u6cd5\u95f4\u7684\u5408\u8c0b\u95ee\u9898\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u4f7f\u7528\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b9a\u4ef7\u4ee3\u7406\uff0c\u7279\u522b\u662fGPT-4\uff0c\u8fdb\u884c\u4e86\u63a2\u7a76\u3002\u7814\u7a76\u53d1\u73b0\uff1a(1) LLM\u9a71\u52a8\u7684\u5b9a\u4ef7\u673a\u5236\u5728\u5b9a\u4ef7\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff1b(2) \u5728\u5be1\u5934\u7ade\u4e89\u73af\u5883\u4e2d\uff0cLLM\u5b9a\u4ef7\u4ee3\u7406\u4f1a\u81ea\u53d1\u5730\u8fdb\u884c\u5408\u8c0b\uff0c\u4ece\u800c\u635f\u5bb3\u6d88\u8d39\u8005\u5229\u76ca\uff1b(3) 
\u5bf9LLM\u6307\u4ee4\uff08\u201c\u63d0\u793a\u201d\uff09\u770b\u4f3c\u5fae\u5c0f\u7684\u53d8\u5316\u53ef\u80fd\u52a0\u5267\u8fd9\u79cd\u5408\u4f5c\u884c\u4e3a\u3002\u8fd9\u4e9b\u7ed3\u679c\u540c\u6837\u9002\u7528\u4e8e\u62cd\u5356\u573a\u666f\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u7b97\u6cd5\u5b9a\u4ef7\u8fdb\u884c\u53cd\u5784\u65ad\u76d1\u7ba1\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u63ed\u793a\u4e86\u9488\u5bf9LLM\u5b9a\u4ef7\u4ee3\u7406\u7279\u6709\u7684\u76d1\u7ba1\u6311\u6218\u3002|\n", "2404.01343": "|**2024-04-15**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343](http://arxiv.org/abs/2404.01343)|**[link](https://github.com/jingzheshi/chops)**|**\u968f\u7740\u4f01\u4e1a\u548c\u8f6f\u4ef6\u5e73\u53f0\u8d8a\u6765\u8d8a\u591a\u5730\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-3.5\u3001GPT-4\u3001GLM-3\u548cLLaMa-2\uff09\u63d0\u4f9b\u804a\u5929\u8f85\u52a9\u6216\u5ba2\u6237\u670d\u52a1\u63a8\u7406\uff0c\u73b0\u6709\u7684\u57fa\u4e8eLLM\u7684\u5ba2\u6237\u670d\u52a1\u6a21\u578b\u5728\u4e0e\u5ba2\u6237\u8d44\u6599\u96c6\u6210\u548c\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u5b83\u4eec\u503e\u5411\u4e8e\u5f3a\u8c03\u591a\u6837\u6027\u800c\u975e\u7cbe\u786e\u6027\u548c\u9519\u8bef\u907f\u514d\uff0c\u8fd9\u5bf9\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u5ba2\u6237\u670d\u52a1\u573a\u666f\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHOPS\uff08\u7ed3\u5408\u5ba2\u6237\u8d44\u6599\u7684\u804a\u5929\u52a9\u624b\uff09\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\uff1a\uff081\uff09\u9ad8\u6548\u5229\u7528\u73b0\u6709\u6570\u636e\u5e93\u6216\u7cfb\u7edf\u67e5\u8be2\u7528\u6237\u4fe1\u606f\uff0c\u6216\u9075\u5faa\u65e2\u5b9a\u6307\u5357\u4e0e\u7cfb\u7edf\u4ea4\u4e92\uff1b\uff082\uff09\u63d0\u4f9b\u51c6\u786e\u5408\u7406\u7684\u54cd\u5e94\u5e76\u6267\u884c\u7cfb\u7edf\u5185\u7684\u5fc5\u8981
\u64cd\u4f5c\uff0c\u540c\u65f6\u907f\u514d\u6709\u5bb3\u64cd\u4f5c\uff1b\uff083\uff09\u901a\u8fc7\u7ed3\u5408\u5c0f\u578b\u548c\u5927\u578bLLM\u4ee5\u5b9e\u73b0\u6027\u80fd\u6ee1\u610f\u4e14\u6210\u672c\u5408\u7406\u7684\u63a8\u7406\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9e\u7528\u7684\u6570\u636e\u96c6\uff0c\u79f0\u4e3aCPHOS-dataset\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u6570\u636e\u5e93\u3001\u6307\u5bfc\u6587\u4ef6\u4ee5\u53ca\u6765\u81eaCPHOS\u5e73\u53f0\u7684\u6a21\u62df\u7269\u7406\u5965\u6797\u5339\u514b\u7ec4\u7ec7\u670d\u52a1\u7684\u95ee\u7b54\u5bf9\u3002CPHOS\u662f\u4e00\u4e2a\u9762\u5411\u9ad8\u4e2d\u6559\u5e08\u548c\u5b66\u751f\u7684\u5728\u7ebf\u5e73\u53f0\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528CPHOS-dataset\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86CHOPS\u67b6\u6784\u7684\u6027\u80fd\uff0c\u76ee\u6807\u662f\u5c55\u793aLLM\u5982\u4f55\u63d0\u5347\u6216\u66ff\u4ee3\u4eba\u5de5\u5ba2\u6237\u670d\u52a1\u3002\u5173\u4e8e\u6211\u4eec\u7684\u63d0\u6848\u67b6\u6784\u548c\u6570\u636e\u96c6\u7684\u4ee3\u7801\u53ef\u5728\u6b64\u5904\u83b7\u53d6\uff1a\u3002**|\n", "2404.01342": "|**2024-03-31**|**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**|Lirui Zhao 
et.al.|[2404.01342](http://arxiv.org/abs/2404.01342)|**[link](https://github.com/opengvlab/diffagent)**|**\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u751f\u6210\u6a21\u578b\u8fd1\u5e74\u6765\u5907\u53d7\u77a9\u76ee\uff0c\u5728\u5b66\u672f\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u5927\u653e\u5f02\u5f69\u3002\u4f8b\u5982\uff0cCivitai\u5e73\u53f0\uff0c\u4e00\u4e2aT2I\u521b\u65b0\u7684\u805a\u96c6\u5730\uff0c\u76ee\u524d\u6c47\u96c6\u4e8674,492\u79cd\u72ec\u7279\u7684\u6a21\u578b\uff0c\u8fd9\u5e26\u6765\u4e86\u9009\u62e9\u6700\u5408\u9002\u7684\u6a21\u578b\u548c\u53c2\u6570\u7684\u8270\u5de8\u4efb\u52a1\uff0c\u901a\u5e38\u9700\u8981\u591a\u6b21\u8bd5\u9a8c\u3002\u501f\u9274\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5de5\u5177\u4f7f\u7528\u7814\u7a76\u7684\u601d\u8def\uff0c\u6211\u4eec\u63a8\u51fa\u4e86DiffAgent\uff0c\u8fd9\u662f\u4e00\u4e2a\u901a\u8fc7API\u8c03\u7528\u6765\u5feb\u901f\u7b5b\u9009\u51c6\u786e\u9009\u9879\u7684LLM\u4ee3\u7406\u3002DiffAgent\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u8bad\u7ec3\u6846\u67b6\uff0c\u79f0\u4e3aSFTA\uff0c\u4f7f\u5176\u80fd\u591f\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7cbe\u786e\u5730\u5c06T2I API\u7684\u54cd\u5e94\u4e0e\u7528\u6237\u8f93\u5165\u5bf9\u9f50\u3002\u4e3a\u4e86\u8bad\u7ec3\u548c\u8bc4\u4f30DiffAgent\u7684\u80fd\u529b\uff0c\u6211\u4eec\u6784\u5efa\u4e86DABench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u5e93\uff0c\u6db5\u76d6\u4e86\u793e\u533a\u4e2d\u7684\u5404\u79cdT2I API\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDiffAgent\u4e0d\u4ec5\u5728\u9009\u62e9\u9002\u5f53\u7684T2I API\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u9a8c\u8bc1\u4e86SFTA\u8bad\u7ec3\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53ef\u5728https://github.com/OpenGVLab/DiffAgent\u83b7\u53d6\u3002**|\n", "2404.00573": "|**2024-03-31**|**\"My agent understands me better\": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based 
Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u7c7b\u8bb0\u5fc6\u67b6\u6784\uff0c\u65e8\u5728\u63d0\u5347\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5bf9\u8bdd\u4ee3\u7406\u7684\u8ba4\u77e5\u80fd\u529b\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u4f7f\u5f97\u8fd9\u4e9b\u4ee3\u7406\u80fd\u81ea\u4e3b\u68c0\u7d22\u751f\u6210\u54cd\u5e94\u6240\u9700\u7684\u5fc5\u8981\u8bb0\u5fc6\uff0c\u4ece\u800c\u89e3\u51b3LLMs\u5728\u65f6\u95f4\u8ba4\u77e5\u4e0a\u7684\u5c40\u9650\u3002\u6211\u4eec\u501f\u9274\u4e86\u4eba\u7c7b\u7684\u8bb0\u5fc6\u7ebf\u7d22\u53ec\u56de\u673a\u5236\u4f5c\u4e3a\u89e6\u53d1\u70b9\uff0c\u4ee5\u5b9e\u73b0\u7cbe\u786e\u4e14\u9ad8\u6548\u7684\u56de\u5fc6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6570\u5b66\u6a21\u578b\uff0c\u52a8\u6001\u91cf\u5316\u8bb0\u5fc6\u5de9\u56fa\u8fc7\u7a0b\uff0c\u8003\u8651\u4e86\u8bf8\u5982\u4e0a\u4e0b\u6587\u76f8\u5173\u6027\u3001\u65f6\u95f4\u6d41\u901d\u548c\u56de\u5fc6\u9891\u7387\u7b49\u56e0\u7d20\u3002\u4ee3\u7406\u4f1a\u4ece\u7528\u6237\u7684\u4ea4\u4e92\u5386\u53f2\u4e2d\u5b58\u50a8\u8bb0\u5fc6\uff0c\u8fd9\u4e9b\u8bb0\u5fc6\u88ab\u5c01\u88c5\u5728\u6570\u636e\u5e93\u4e2d\uff0c\u6bcf\u4e2a\u8bb0\u5fc6\u90fd\u5305\u542b\u4e86\u5185\u5bb9\u548c\u65f6\u95f4\u5173\u8054\u7684\u8bed\u5883\u3002\u8fd9\u6837\uff0c\u901a\u8fc7\u7c7b\u4f3c\u4eba\u7c7b\u8bc6\u522b\u548c\u56de\u5fc6\u8fc7\u5f80\u7ecf\u5386\u7684\u65b9\u5f0f\uff0c\u7cfb\u7edf\u80fd\u591f\u6218\u7565\u6027\u5730\u5b58\u50a8\u8bb0\u5fc6\uff0c\u5e76\u7406\u89e3\u5b83\u4eec\u5bf9\u7528\u6237\u5728\u65f6\u95f4\u7ebf\u4e0a\u7684\u91cd\u8981\u6027\u3002|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.11403": "|**2024-05-18**|**MapCoder: Multi-Agent Code Generation for Competitive Problem Solving**|Md. Ashraful Islam et.al.|[2405.11403](http://arxiv.org/abs/2405.11403)|**[link](https://github.com/md-ashraful-pramanik/mapcoder)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4ee3\u7801\u5408\u6210\u8fd9\u4e00\u590d\u6742\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u6df1\u5ea6\u7406\u89e3\u590d\u6742\u7684\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u63cf\u8ff0\u3001\u751f\u6210\u590d\u6742\u7684\u7b97\u6cd5\u548c\u6570\u636e\u7ed3\u6784\u4ee3\u7801\uff0c\u5e76\u6267\u884c\u5168\u9762\u7684\u5355\u5143\u6d4b\u8bd5\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5373\u591a\u4ee3\u7406\u63d0\u793a\u6846\u67b6MapCoder\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u5f00\u53d1\u8005\u7f16\u7a0b\u5408\u6210\u7684\u5b8c\u6574\u8fc7\u7a0b\uff0c\u5206\u4e3a\u56db\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684LLM\uff08\u5927\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406\uff1a\u56de\u5fc6\u76f8\u5173\u793a\u4f8b\u3001\u89c4\u5212\u3001\u4ee3\u7801\u751f\u6210\u548c\u8c03\u8bd5\u3002 
\u901a\u8fc7\u5728\u516b\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u7ade\u8d5b\u7ea7\u95ee\u9898\u89e3\u51b3\u548c\u7a0b\u5e8f\u5408\u6210\u57fa\u51c6\u4e0a\u8fdb\u884c\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u5305\u62ecHumanEval\uff0893.9%\uff09\u3001MBPP\uff0883.1%\uff09\u3001APPS\uff0822.0%\uff09\u3001CodeContests\uff0828.5%\uff09\u548cxCodeEval\uff0845.3%\uff09\u7b49\uff0cMapCoder\u5c55\u73b0\u4e86\u51fa\u8272\u7684\u4ee3\u7801\u751f\u6210\u80fd\u529b\uff0c\u5b9e\u73b0\u4e86\u591a\u9879\u65b0\u7684\u6700\u5148\u8fdb\u7684\u7ed3\u679c\u3002\u800c\u4e14\uff0c\u65e0\u8bba\u7f16\u7a0b\u8bed\u8a00\u8fd8\u662f\u95ee\u9898\u96be\u5ea6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u90fd\u8868\u73b0\u51fa\u6301\u7eed\u7684\u4f18\u8d8a\u6027\u80fd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8be5\u6846\u67b6\uff0c\u4f9b\u7814\u7a76\u8005\u53c2\u8003\uff1ahttps://github.com/Md-Ashraful-Pramanik/MapCoder\u3002**|\n", "2405.14751": "|**2024-05-23**|**AGILE: A Novel Framework of LLM Agents**|Peiyuan Feng et.al.|[2405.14751](http://arxiv.org/abs/2405.14751)|**[link](https://github.com/bytarnish/agile)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u79f0\u4e3aLLM\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406AGILE\uff08\u80fd\u591f\u4e0e\u7528\u6237\u4e92\u52a8\u5e76\u4ece\u73af\u5883\u4e2d\u5b66\u4e60\u7684\u4ee3\u7406\uff09\uff0c\u65e8\u5728\u6267\u884c\u590d\u6742\u7684\u5bf9\u8bdd\u4efb\u52a1\uff0c\u5229\u7528LLMs\u3001\u8bb0\u5fc6\u3001\u5de5\u5177\u548c\u4e13\u5bb6\u4ea4\u4e92\u3002\u8fd9\u79cd\u4ee3\u7406\u4e0d\u4ec5\u5177\u5907\u5bf9\u8bdd\u80fd\u529b\uff0c\u8fd8\u5177\u5907\u53cd\u601d\u3001\u5de5\u5177\u8fd0\u7528\u4ee5\u53ca\u54a8\u8be2\u4e13\u5bb6\u7684\u529f\u80fd\u3002\u6211\u4eec\u5c06\u6784\u5efa\u6b64\u7c7bLLM\u4ee3\u7406\u89c6\u4e3a\u5f3a\u5316\u5b66\u4e60\u95ee\u9898\uff0c\u5176\u4e2dLLM\u4f5c\u4e3a\u7b56\u7565\u6a21\u578b\u3002\u6211\u4eec\u4f7f\u7528\u6807\u6ce8\u7684\u884c\u4e3a\u6570\u636e\u548cPPO\u7b97\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae
\u8c03\u3002\u7279\u522b\u5173\u6ce8\u7684\u662f\u95ee\u7b54\u4efb\u52a1\uff0c\u4e3a\u6b64\u6211\u4eec\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aProductQA\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u5728\u7ebf\u8d2d\u7269\u4e2d\u7684\u96be\u9898\u3002\u6211\u4eec\u5728ProductQA\u548cMedMCQA\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8e130\u4ebf\u548c70\u4ebf\u53c2\u6570\u7684LLM\u8bad\u7ec3\u7684AGILE\u4ee3\u7406\u80fd\u591f\u8d85\u8d8aGPT-4\u4ee3\u7406\u7684\u8868\u73b0\u3002\u6211\u4eec\u7684 ablation\u7814\u7a76\u5f3a\u8c03\u4e86\u8bb0\u5fc6\u3001\u5de5\u5177\u3001\u54a8\u8be2\u3001\u53cd\u601d\u548c\u5f3a\u5316\u5b66\u4e60\u5728\u5b9e\u73b0\u4f18\u79c0\u6027\u80fd\u65b9\u9762\u7684\u91cd\u8981\u6027\u3002|\n", "2405.14744": "|**2024-05-23**|**Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View**|Xuan Liu et.al.|[2405.14744](http://arxiv.org/abs/2405.14744)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u53cd\u6620\u4e86\u4eba\u7c7b\u504f\u89c1\uff0c\u5b83\u4eec\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\u95ee\u9898\u3002\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u662f\uff1aLLMs\u662f\u5426\u80fd\u591f\u5229\u7528\u5e7b\u89c9\u6765\u6a21\u4eff\u4eba\u7c7b\u7684\u8ba4\u77e5\u504f\u89c1\uff0c\u4ece\u800c\u5c55\u73b0\u51fa\u975e\u7406\u6027\u4f46\u793e\u4f1a\u6027\u7684\u4e00\u9762\uff1f\u672c\u6587\u63a2\u8ba8\u4e86\u8fd9\u4e00\u95ee\u9898\uff0c\u901a\u8fc7\u7ed3\u5408\u5b9e\u7528\u7684\u793e\u4f1a\u79d1\u5b66\u5b9e\u9a8c\u548c\u7406\u8bba\u6d1e\u5bdf\uff0c\u63d0\u51faCogMir\uff0c\u4e00\u4e2a\u5f00\u653e\u5f0f\u591aLLM\u6846\u67b6\uff0c\u65e8\u5728\u5229\u7528LLMs\u7684\u5e7b\u89c9\u7279\u6027\u6765\u8bc4\u4f30\u548c\u63d0\u5347\u5176\u793e\u4f1a\u667a\u80fd\uff0c\u7279\u522b\u662f\u5728\u8ba4\u77e5\u504f\u5dee\u65b9\u9762\u3002\u6211\u4eec\u5728CogMir\u5b50\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u786e\u5b9
a\u60c5\u5883\u4e0b\uff0cLLMs\u548c\u4eba\u7c7b\u5728\u975e\u7406\u6027\u53ca\u4eb2\u793e\u4f1a\u51b3\u7b56\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u4e00\u81f4\u6027\uff0c\u8fd9\u8868\u660eLLMs\u4f5c\u4e3a\u793e\u4f1a\u5b9e\u4f53\u7684\u4eb2\u793e\u4f1a\u6027\uff0c\u5e76\u7a81\u663e\u4e86\u5e7b\u89c9\u7279\u6027\u7684\u5173\u952e\u4f5c\u7528\u3002\u6b64\u5916\uff0cCogMir\u6846\u67b6\u5c55\u793a\u4e86\u5176\u4f5c\u4e3a\u7814\u7a76LLMs\u793e\u4f1a\u667a\u80fd\u7684\u6709\u4ef7\u503c\u5e73\u53f0\u7684\u6f5c\u529b\u3002|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547](http://arxiv.org/abs/2405.13547)|null|## \u80cc\u666f \u81ea\u52a8\u9a7e\u9a76\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u5148\u8fdb\u7684\u51b3\u7b56\u548c\u63a7\u5236\u7b97\u6cd5\u3002\u7406\u89e3\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u51b3\u7b56\u7684\u4f9d\u636e\u5bf9\u4e8e\u786e\u4fdd\u5176\u5728\u9ad8\u901f\u516c\u8def\u9a7e\u9a76\u4e2d\u7684\u5b89\u5168\u4e0e\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3aHighwayLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\u6765\u9884\u6d4bego\u8f66\u8f86\u7684\u672a\u6765\u5bfc\u822a\u8def\u5f84\u70b9\u3002\u8be5\u65b9\u6cd5\u8fd8\u91c7\u7528\u9884\u8bad\u7ec3\u7684\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u6a21\u578b\u4f5c\u4e3a\u9ad8\u5c42\u6b21\u89c4\u5212\u5668\uff0c\u5bf9\u5408\u9002\u7684\u5143\u7ea7\u52a8\u4f5c\u8fdb\u884c\u51b3\u7b56\u3002HighwayLLM\u5c06RL\u6a21\u578b\u7684\u8f93\u51fa\u4e0e\u5f53\u524d\u72b6\u6001\u4fe1\u606f\u76f8\u7ed3\u5408\uff0c\u751f\u6210\u5b89\u5168\u3001\u65e0\u78b0\u649e\u4e14\u53ef\u89e3\u91ca\u7684\u672a\u6765\u72b6\u6001\u9884\u6d4b\uff0c\u4ece\u800c\u6784\u5efa\u51fa\u8f66\u8f86\u7684\u884c\u9a76\u8f68\u8ff9\u3002\u968f\u540e\u
ff0c\u57fa\u4e8ePID\u7684\u63a7\u5236\u5668\u5f15\u5bfc\u8f66\u8f86\u9075\u5faaLLM\u4ee3\u7406\u9884\u6d4b\u7684\u8def\u5f84\u70b9\u3002\u8fd9\u79cdLLM\u4e0eRL\u548cPID\u7684\u878d\u5408\u63d0\u5347\u4e86\u51b3\u7b56\u8fc7\u7a0b\uff0c\u5e76\u4e3a\u9ad8\u901f\u516c\u8def\u81ea\u52a8\u9a7e\u9a76\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002|\n", "2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050](http://arxiv.org/abs/2405.13050)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09-\u5728-\u73af\u5e94\u7528\u5df2\u663e\u793a\u51fa\u6709\u6548\u7406\u89e3\u7528\u6237\u547d\u4ee4\u3001\u5236\u5b9a\u8ba1\u5212\u5e76\u76f8\u5e94\u5730\u64cd\u4f5c\u5916\u90e8\u5de5\u5177/\u7cfb\u7edf\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0cLLM\u4ee3\u7406\u7684\u64cd\u4f5c\u8303\u56f4\u5c40\u9650\u4e8e\u88ab\u52a8\u54cd\u5e94\u7528\u6237\uff0c\u9700\u8981\u7528\u6237\u6839\u636e\u5e95\u5c42\u5de5\u5177/\u7cfb\u7edf\u6765\u8868\u8ff0\u9700\u6c42\u3002\u6211\u4eec\u6ce8\u610f\u5230LLM\u4ee3\u7406\u7528\u6237\u754c\u9762\uff08LAUI\uff09\u7684\u6f5c\u529b\u8fdc\u672a\u5145\u5206\u5229\u7528\u3002\u7406\u60f3\u7684LAUI\u8bbe\u60f3\u4e2d\uff0c\u7528\u6237\u65e0\u9700\u6df1\u5165\u4e86\u89e3\u5de5\u5177/\u7cfb\u7edf\uff0c\u5c31\u80fd\u4e0e\u4e4b\u4ea4\u4e92\u4ee5\u63a2\u7d22\u65b0\u5174\u7684\u5de5\u4f5c\u6d41\u7a0b\u3002\u4e0d\u540c\u4e8e\u8bbe\u8ba1\u56fa\u5b9a\u7684\u53ef\u63a2\u7d22GUI\u6765\u6559\u6388\u7528\u6237\u4f7f\u7528\u7cfb\u7edf\u7684\u9884\u8bbe\u65b9\u5f0f\uff0cLAUI\u4e2d\u7684LLM\u4ee3\u7406\u4ece\u4e00\u5f00\u59cb\u5c31\u5bf9\u7cfb\u7edf\u719f\u7ec3\uff0c\u4e3b\u52a8\u5b66\u4e60\u7528\u6237\u53ca\u5176\u9700\u6c42\uff0c\u5e76\u5411\u7528\u6237\u63d0\u51fa\u65b0\u7684\u4e92\u52a8\u65b9\u6848\u3002\u4e3a\u4e86\u5c55\u793aLAUI\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5177\u4f53\u4f8b\u5b50\uff1aFlute X 
GPT\uff0c\u5b83\u7ed3\u5408\u4e86LLM\u4ee3\u7406\u3001\u63d0\u793a\u7ba1\u7406\u5668\u548c\u4e00\u4e2a\u652f\u6301\u590d\u6742\u5b9e\u65f6\u4f53\u9a8c\u7684\u7b1b\u5b50\u6559\u5b66\u591a\u5a92\u4f53\u8f6f\u786c\u4ef6\u7cfb\u7edf\uff0c\u65e8\u5728\u7b80\u5316\u5b66\u4e60\u5439\u594f\u7b1b\u5b50\u7684\u8fc7\u7a0b\u3002|\n", "2405.13009": "|**2024-05-13**|**METAREFLECTION: Learning Instructions for Language Agents using Past Reflections**|Priyanshu Gupta et.al.|[2405.13009](http://arxiv.org/abs/2405.13009)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e7f\u53d7\u6b22\u8fce\uff0c\u4f46\u4e3a\u5176\u6267\u884c\u7279\u5b9a\u4efb\u52a1\u8bbe\u8ba1\u7cbe\u786e\u7684\u63d0\u793a\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u7528\u6237\u901a\u5e38\u9700\u8981\u4e0e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u591a\u8f6e\u5bf9\u8bdd\u4ee5\u8fbe\u6210\u76ee\u6807\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u6a21\u578b\u81ea\u8eab\u7684\u53cd\u9988\uff0c\u5373\u81ea\u53cd\u601d\uff0c\u80fd\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u8d77\u5230\u5f3a\u5316\u4f5c\u7528\uff0c\u6709\u52a9\u4e8e\u66f4\u5feb\u5730\u8fbe\u5230\u671f\u671b\u7ed3\u679c\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014METAREFLECTION\uff0c\u5b83\u80fd\u4ece\u8bad\u7ec3\u9636\u6bb5\u6536\u96c6\u5230\u7684\u4e2a\u4f53\u81ea\u53cd\u601d\u4e2d\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u901a\u7528\u63d0\u793a\u6307\u4ee4\u3002\u6211\u4eec\u5728\u57fa\u7840\u8bbe\u65bd\u5373\u4ee3\u7801\uff08IAC\uff09\u6f0f\u6d1e\u68c0\u6d4b\u548c\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4f7f\u7528REACT\u548cCOT\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cMETAREFLECTION\u663e\u8457\u4f18\u4e8eGPT-4\uff0c\u5206\u522b\u5728IAC\u3001COT\u548cREACT\u4e2d\u7684\u6027\u80fd\u63d0\u5347\u5206\u522b\u4e3a16.82%\u300131.33%\u548c15.42%\uff0c\u8fd9\u8868\u660eMETAREFLECTION\u6709\u6f5c\u529b\u63d0\u5347LLMs
\u7684\u6548\u7387\uff0c\u662f\u4e00\u79cd\u503c\u5f97\u63a2\u7d22\u7684\u7b56\u7565\u3002|\n", "2405.15414": "|**2024-05-24**|**Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification**|Yuxuan Guo et.al.|[2405.15414](http://arxiv.org/abs/2405.15414)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7814\u7a76\u4e2d\uff0c\u6784\u5efa\u5f00\u653e\u578b\u4ee3\u7406\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u7ec8\u6781\u76ee\u6807\uff0c\u7279\u522b\u662f\u521b\u9020\u6027\u7684\u4ee3\u7406\u66f4\u5177\u5438\u5f15\u529b\u3002\u73b0\u6709\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u6709\u660e\u786e\u76ee\u6807\u7684\u957f\u5e8f\u5217\u4efb\u52a1\uff08\u5982\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u201c\u5f00\u91c7\u94bb\u77f3\u201d\uff09\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u5904\u7406\u5177\u6709\u5f00\u653e\u76ee\u6807\u548c\u62bd\u8c61\u6807\u51c6\u7684\u521b\u9020\u6027\u4efb\u52a1\u65f6\u9047\u5230\u56f0\u96be\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5f25\u5408\u8fd9\u4e9b\u4efb\u52a1\u4e4b\u95f4\u7684\u9e3f\u6c9f\uff0c\u4ece\u800c\u7f3a\u4e4f\u81ea\u6211\u6539\u8fdb\u6765\u89e3\u51b3\u95ee\u9898\u7684\u53cd\u9988\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5f15\u5165\u4e86\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\u6280\u672f\uff0c\u4ee5\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u4e3a\u521b\u9020\u6027\u4efb\u52a1\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Luban\u4ee3\u7406\uff0c\u4e13\u6ce8\u4e8e\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5b83\u914d\u5907\u4e86\u4e24\u7ea7\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u5b9e\u8df5\uff1a\uff081\uff09\u89c6\u89c9\u9a8c\u8bc13D\u7ed3\u6784\u63a8\u6d4b\uff0c\u901a\u8fc7\u4ee3\u7406\u81ea\u52a8\u751f\u6210\u7684CAD\u5efa\u6a21\u7a0b\u5e8f\u5b9e\u73b0\uff1b\uff082\uff09\u5b9e\u7528\u9a8c\u8bc1\
uff0c\u6839\u636e\u62bd\u8c61\u6807\u51c6\u751f\u6210\u5e76\u9a8c\u8bc1\u4e0e\u73af\u5883\u76f8\u5173\u7684\u529f\u80fd\u7a0b\u5e8f\u3002\u5e7f\u6cdb\u7684\u591a\u7ef4\u5ea6\u4eba\u7c7b\u7814\u7a76\u548cElo\u8bc4\u7ea7\u663e\u793a\uff0cLuban\u80fd\u591f\u5728\u6211\u4eec\u63d0\u51fa\u7684\u57fa\u51c6\u4e2d\u5b8c\u6210\u591a\u6837\u5316\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5e76\u5728\u53ef\u89c6\u5316\u548c\u5b9e\u7528\u6027\u65b9\u9762\u5206\u522b\u6bd4\u5176\u4ed6\u57fa\u7ebf\u63d0\u9ad8\u4e8633%\u5230100%\u3002\u6b64\u5916\uff0c\u5b9e\u73b0\u5728\u771f\u5b9e\u4e16\u754c\u673a\u5668\u4eba\u624b\u81c2\u4e0a\u7684\u6f14\u793a\u5c55\u793a\u4e86Luban\u5728\u7269\u7406\u4e16\u754c\u4e2d\u7684\u521b\u4f5c\u6f5c\u529b\u3002|\n", "2405.15145": "|**2024-05-24**|**CulturePark: Boosting Cross-cultural Understanding in Large Language Models**|Cheng Li et.al.|[2405.15145](http://arxiv.org/abs/2405.15145)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u666e\u904d\u5b58\u5728\u6587\u5316\u504f\u89c1\uff0c\u4e3b\u8981\u6e90\u4e8e\u7f3a\u4e4f\u4ee3\u8868\u4e0d\u540c\u6587\u5316\u7684\u4ee3\u8868\u6027\u6570\u636e\u3002\u4f20\u7edf\u7684\u6587\u5316\u6570\u636e\u96c6\u548c\u57fa\u51c6\u901a\u5e38\u901a\u8fc7\u4ece\u73b0\u6709\u6570\u636e\u96c6\u4e2d\u63d0\u53d6\u6216\u805a\u5408\u6765\u81ea\u7ef4\u57fa\u767e\u79d1\u548c\u793e\u4ea4\u5a92\u4f53\u7684\u4fe1\u606f\u6784\u5efa\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u548c\u4eba\u5de5\u6807\u6ce8\uff0c\u6210\u672c\u9ad8\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u672c\u6587\u501f\u9274\u8ba4\u77e5\u793e\u4f1a\u4ea4\u6d41\u7406\u8bba\uff0c\u63d0\u51faCulturePark\uff0c\u4e00\u4e2a\u5229\u7528LLMs\u7684\u591a\u4ee3\u7406\u6c9f\u901a\u6846\u67b6\uff0c\u7528\u4e8e\u6587\u5316\u6570\u636e\u6536\u96c6\u3002CulturePark\u901a\u8fc7\u6a21\u62df\u4e0d\u540c\u6587\u5316\u80cc\u666f\u4e0b\u7684\u4eba\u7c7b\u4ea4\u6d41\uff0c\u8ba9\u57fa\u4e8eLLM\u7684\u4ee3\u7406
\u89d2\u8272\u626e\u6f14\uff0c\u751f\u6210\u5305\u542b\u4eba\u7c7b\u4fe1\u5ff5\u3001\u89c4\u8303\u548c\u4e60\u4fd7\u7684\u9ad8\u8d28\u91cf\u8de8\u6587\u5316\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528CulturePark\u751f\u6210\u4e8641,000\u4e2a\u6587\u5316\u6837\u672c\uff0c\u5bf9\u516b\u79cd\u7279\u5b9a\u6587\u5316\u8fdb\u884c\u4e86\u6a21\u578b\u5fae\u8c03\u3002\u5728\u4e09\u9879\u4e0b\u6e38\u4efb\u52a1\u8bc4\u4f30\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u8868\u73b0\u4f18\u4e8eGPT-4\uff1a\u5185\u5bb9\u8fc7\u6ee4\u3001\u6587\u5316\u4e00\u81f4\u6027\uff08\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u8868\u4e0a\uff09\u548c\u6587\u5316\u6559\u80b2\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684GPT-3.5\u6a21\u578b\u5728\u5185\u5bb9\u8fc7\u6ee4\u4efb\u52a1\u4e0a\u4e0eGPT-4\u76f8\u5f53\u6216\u4f18\u4e8e\u5b83\uff1b\u5728\u6587\u5316\u4e00\u81f4\u6027\u65b9\u9762\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u886813\u6846\u67b6\u4e0a\u8d85\u8d8aGPT-4\uff1b\u5728\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u6587\u5316\u6559\u80b2\u6548\u679c\u548c\u7528\u6237\u4f53\u9a8c\u4e0a\uff0c\u6211\u4eec\u7684\u6a21\u578b\u4e5f\u8868\u73b0\u51fa\u8272\u3002CulturePark\u5bf9\u4e8e\u51cf\u5c11\u6587\u5316\u504f\u89c1\u548c\u63a8\u52a8AI\u7684\u6c11\u4e3b\u5316\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u5f3a\u8c03\u4e86\u6587\u5316\u5305\u5bb9\u6027\u6570\u636e\u5728\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u4f5c\u7528\u3002|\n", "2405.14918": "|**2024-05-23**|**AnalogCoder: Analog Circuit Design via Training-Free Code Generation**|Yao Lai et.al.|[2405.14918](http://arxiv.org/abs/2405.14918)|**[link](https://github.com/laiyao1/AnalogCoder)**|### \u7ffb\u8bd1 
\u5728\u73b0\u4ee3\u82af\u7247\u6280\u672f\u4e2d\uff0c\u6a21\u62df\u7535\u8def\u8bbe\u8ba1\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u7ec4\u4ef6\u9009\u62e9\u3001\u8fde\u63a5\u548c\u53c2\u6570\u8bbe\u7f6e\u4ee5\u786e\u4fdd\u7535\u8def\u529f\u80fd\u6b63\u5e38\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b57\u7535\u8def\u8bbe\u8ba1\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u4f46\u6a21\u62df\u7535\u8def\u7684\u590d\u6742\u6027\u548c\u6570\u636e\u7a00\u7f3a\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63a8\u51fa\u4e86AnalogCoder\uff0c\u8fd9\u662f\u9996\u4e2a\u65e0\u9700\u8bad\u7ec3\u7684LLM\u4ee3\u7406\uff0c\u4e13\u4e3a\u901a\u8fc7Python\u4ee3\u7801\u751f\u6210\u6765\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u9996\u5148\uff0cAnalogCoder\u91c7\u7528\u53cd\u9988\u589e\u5f3a\u6d41\u7a0b\uff0c\u5e76\u7ed3\u5408\u5b9a\u5236\u7684\u9886\u57df\u7279\u5b9a\u63d0\u793a\uff0c\u80fd\u591f\u81ea\u52a8\u4e14\u81ea\u6211\u6821\u6b63\u5730\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\uff0c\u6210\u529f\u7387\u9ad8\u3002\u5176\u6b21\uff0c\u5b83\u63d0\u51fa\u4e86\u4e00\u5957\u7535\u8def\u5de5\u5177\u5e93\uff0c\u7528\u4e8e\u5b58\u50a8\u6210\u529f\u7684\u7535\u8def\u8bbe\u8ba1\u4f5c\u4e3a\u53ef\u91cd\u7528\u7684\u6a21\u5757\u5316\u5b50\u7535\u8def\uff0c\u7b80\u5316\u4e86\u590d\u5408\u7535\u8def\u7684\u521b\u5efa\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAnalogCoder\u5728\u5e7f\u6cdb\u8986\u76d6\u6a21\u62df\u7535\u8def\u4efb\u52a1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u8d85\u8d8a\u4e86\u5176\u4ed6\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0c\u6210\u529f\u8bbe\u8ba1\u4e8620\u4e2a\u7535\u8def\uff0c\u6bd4\u6807\u51c6GPT-4o\u591a\u51fa5\u4e2a\u3002\u6211\u4eec\u76f8\u4fe1AnalogCoder\u80fd\u663e\u8457\u63d0\u5347\u82af\u7247\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u6548\u7387\uff0c\u8ba9\u975e\u4e13\u5bb6\u4e5f\u80fd\u9ad8\u6548\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u57fa\u51c6\u5df
2\u63d0\u4f9b\u5728\uff1a[https://github.com/anonyanalog/AnalogCoder](https://github.com/anonyanalog/AnalogCoder)\u3002|\n", "2405.17424": "|**2024-05-27**|**LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|## \u80cc\u666f \u7531\u4e8e\u9700\u8981\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0cEmbodied agent \u9700\u8981\u5177\u5907\u4e30\u5bcc\u7684\u5148\u9a8c\u77e5\u8bc6\u3001\u957f\u8fdc\u89c4\u5212\u80fd\u529b\u4ee5\u53ca\u5feb\u901f\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u5c3d\u7ba1\u6700\u8fd1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u5c40\u9650\u6027\uff0c\u4f8b\u5982\uff0cLLM\u7684\u8f93\u51fa\u901a\u5e38\u662f\u63cf\u8ff0\u6027\u7684\u53e5\u5b50\uff0c\u5728\u51b3\u5b9a\u5177\u4f53\u884c\u52a8\u65f6\u53ef\u80fd\u4ea7\u751f\u6b67\u4e49\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\uff08LARM\uff09\u3002LARM\u5229\u7528\u6587\u672c\u548c\u591a\u89c6\u89d2\u56fe\u50cf\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4ee5\u81ea\u56de\u5f52\u7684\u65b9\u5f0f\u9884\u6d4b\u540e\u7eed\u52a8\u4f5c\u3002\u4e3a\u4e86\u8bad\u7ec3 
LARM\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u683c\u5f0f\u2014\u2014\u81ea\u56de\u5f52\u8282\u70b9\u4f20\u8f93\u7ed3\u6784\uff0c\u5e76\u6784\u5efa\u4e86\u76f8\u5e94\u7684\u6570\u636e\u96c6\u3002\u901a\u8fc7\u4e24\u9636\u6bb5\u7684\u8bad\u7ec3\u7b56\u7565\uff0cLARM\u6210\u529f\u5728\u300a\u6211\u7684\u4e16\u754c\u300b\uff08Minecraft\uff09\u4e2d\u6536\u96c6\u9b54\u6cd5\u88c5\u5907\uff0c\u8fd9\u6bd4\u5148\u524d\u6700\u4f73\u65b9\u6cd5\u7684\u6700\u9ad8\u6210\u5c31\u9700\u8981\u66f4\u4e3a\u590d\u6742\u7684\u51b3\u7b56\u94fe\u3002\u6b64\u5916\uff0cLARM\u7684\u901f\u5ea6\u6bd4\u73b0\u6709\u6700\u5feb\u65b9\u6cd5\u5feb\u51fa\u4e866.8\u500d\u3002|\n", "2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510](http://arxiv.org/abs/2405.16510)|null|\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\u63a8\u52a8\u4e86\u667a\u80fd\u4ee3\u7406\u7814\u7a76\u7684\u65b0\u70ed\u6f6e\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u5b9e\u73b0\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6709\u524d\u666f\u65b9\u6cd5\uff0c\u56e0\u5176\u51fa\u8272\u7684\u63a8\u7406\u548c\u6cdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u5728\u5b9e\u9645\u4efb\u52a1\u4e2d\uff0c\u6709\u6548\u7684\u89c4\u5212\u5bf9LLM\u4ee3\u7406\u7684\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u590d\u6742\u4efb\u52a1\u8bbe\u8ba1\u51fa\u53ef\u884c\u6216\u6700\u4f18\u7684\u7cbe\u7ec6\u7c92\u5ea6\u64cd\u4f5c\u5e8f\u5217\uff0c\u7279\u522b\u662f\u9700\u8981\u7ec4\u5408\u5927\u91cf\u5f02\u8d28\u884c\u52a8\u7684\u5e8f\u5217\uff0c\u4ecd\u662f\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMeta-Task 
Planning\uff08MTP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u96f6\u6837\u672c\u7684\u534f\u4f5c\u5f0fLLM\u591a\u4ee3\u7406\u7cfb\u7edf\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06\u590d\u6742\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u5373\u5143\u4efb\u52a1\uff0c\u7b80\u5316\u4e86\u4efb\u52a1\u89c4\u5212\u3002\u6bcf\u4e2a\u5143\u4efb\u52a1\u968f\u540e\u6620\u5c04\u4e3a\u53ef\u6267\u884c\u52a8\u4f5c\u3002\u5728TravelPlanner\u548cAPI-Bank\u4e24\u4e2a\u4e25\u683c\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86MTP\u3002\u7ed3\u679c\u8868\u660e\uff0cMTP\u5728TravelPlanner\u4e0a\u7684\u5e73\u5747\u6210\u529f\u7387\u7ea6\u4e3a40%\uff0c\u8fdc\u8d85\u5f53\u524d\u6700\u4f73\u57fa\u7ebf\uff082.92%\uff09\uff0c\u5e76\u4e14\u5728API-Bank\u4e0a\u7684\u6027\u80fd\u6bd4\u4f7f\u7528ReAct\u7684LLM_{api}-4\u9ad8\u51fa\u7ea614%\uff0c\u8fd9\u663e\u793a\u51fa\u5c06LLM\u4e0e\u591a\u4ee3\u7406\u7cfb\u7edf\u76f8\u7ed3\u5408\u7684\u5de8\u5927\u6f5c\u529b\u3002|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li 
et.al.|[2405.16376](http://arxiv.org/abs/2405.16376)|**[link](https://github.com/cyrilli/stride)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u8bed\u8a00\u80fd\u529b\u548c\u63a8\u7406\u6280\u5de7\u3002\u7136\u800c\uff0c\u5728\u6218\u7565\u6027\u7684\u591a\u4ee3\u7406\u51b3\u7b56\u73af\u5883\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u5c40\u9650\uff0c\u5982\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5dee\u3001\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\u548c\u751f\u6210\u9519\u8bef\u4fe1\u606f\u3002\u8fd9\u4e9b\u7f3a\u70b9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9075\u5b88\u590d\u6742\u6e38\u620f\u89c4\u5219\u3001\u957f\u671f\u89c4\u5212\u3001\u63a2\u7d22\u672a\u77e5\u73af\u5883\u4ee5\u53ca\u9884\u6d4b\u5bf9\u624b\u884c\u52a8\u7684\u4e92\u52a8\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u7ed3\u5408\u4e86\u8bb0\u5fc6\u548c\u4e13\u4e1a\u5de5\u5177\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u5176\u5728\u6218\u7565\u51b3\u7b56\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7279\u522b\u5728\u53cc\u8fb9\u8c08\u5224\u3001\u591a\u4ee3\u7406\u52a8\u6001\u673a\u5236\u8bbe\u8ba1\u7b49\u7ecf\u6d4e\u91cd\u8981\u573a\u666f\u4e2d\u5e94\u7528\u8fd9\u4e9b\u5de5\u5177\uff0c\u5e76\u901a\u8fc7\u5b9a\u91cf\u6307\u6807\u8bc4\u4f30\u5728\u5404\u79cd\u6218\u7565\u51b3\u7b56\u95ee\u9898\u4e0a\u7684\u6548\u679c\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u589e\u5f3a\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u80fd\u529b\u3002\u5c3d\u7ba1\u5f53\u524d\u6a21\u578b\u5b58\u5728\u56fa\u6709\u5c40\u9650\uff0c\u4f46\u6211\u4eec\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u589e\u5f3a\u5c55\u793a\u4e86\u6539\u8fdb\u7684\u53ef\u80fd\u6027\uff
0c\u8fd9\u4e3a\u672a\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u5411\u3002**|\n", "2405.16334": "|**2024-05-29**|**Devil's Advocate: Anticipatory Reflection for LLM Agents**|Haoyu Wang et.al.|[2405.16334](http://arxiv.org/abs/2405.16334)|null|\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u8d4b\u4e88\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u6211\u53cd\u601d\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u5176\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65f6\u7684\u4e00\u81f4\u6027\u548c\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fc3\u4f7fLLM\u4ee3\u7406\u5c06\u7ed9\u5b9a\u7684\u4efb\u52a1\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u5b50\u4efb\u52a1\uff08\u5373\u5236\u5b9a\u8ba1\u5212\uff09\uff0c\u5e76\u5728\u6267\u884c\u884c\u52a8\u4e4b\u524d\u6301\u7eed\u53cd\u601d\u53ef\u80fd\u7684\u5931\u8d25\u53ca\u5176\u8865\u6551\u63aa\u65bd\u3001\u6267\u884c\u540e\u4e0e\u5b50\u4efb\u52a1\u76ee\u6807\u5bf9\u9f50\u5e76\u8fdb\u884c\u5fc5\u8981\u7684\u56de\u6eaf\u4ee5\u786e\u4fdd\u5168\u529b\u4ee5\u8d74\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u53ca\u5728\u5b8c\u6210\u8ba1\u5212\u540e\u8fdb\u884c\u5168\u9762\u5ba1\u67e5\uff0c\u4ee5\u4fbf\u4e8e\u672a\u6765\u7b56\u7565\u7684\u4f18\u5316\u3002\u901a\u8fc7\u5728WebArena\u4e2d\u96f6\u6837\u672c\u5e94\u7528\u8fd9\u4e00\u65b9\u6cd5\u5904\u7406\u5b9e\u9645\u7684\u7f51\u7edc\u73af\u5883\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u8868\u73b0\u51fa\u4f18\u4e8e\u73b0\u6709\u96f6\u6837\u672c\u65b9\u6cd5\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u57fa\u4e8e\u53cd\u601d\u7684\u7b56\u7565\u4e0d\u4ec5\u63d0\u5347\u4e86\u4ee3\u7406\u5e94\u5bf9\u672a\u9884\u89c1\u6311\u6218\u7684\u5bfc\u822a\u80fd\u529b\uff0c\u901a\u8fc7\u5f3a\u5927\u7684\u8ba1\u5212\u6267\u884c\u673a\u5236\uff0c\u8fd8\u63d0\u9ad8\u4e86\u6548\u7387\uff0c\u5
1cf\u5c11\u4e86\u5b9e\u73b0\u4efb\u52a1\u6240\u9700\u7684\u5c1d\u8bd5\u6b21\u6570\u548c\u8ba1\u5212\u4fee\u8ba2\u6b21\u6570\u3002|\n", "2405.16247": "|**2024-05-25**|**AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning**|Minghao Chen et.al.|[2405.16247](http://arxiv.org/abs/2405.16247)|null|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6267\u884c\u5404\u79cd\u9886\u57df\u4efb\u52a1\uff0c\u5982\u673a\u5668\u4eba\u3001\u6e38\u620f\u548c\u7f51\u7edc\u5bfc\u822a\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u548c\u4e13\u5bb6\u7ea7\u63d0\u793a\u624d\u80fd\u9002\u5e94\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u9002\u5e94\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AutoManual\u6846\u67b6\uff0c\u8ba9LLMs\u80fd\u591f\u901a\u8fc7\u4e92\u52a8\u81ea\u4e3b\u6784\u5efa\u7406\u89e3\uff0c\u5e76\u9002\u5e94\u65b0\u73af\u5883\u3002AutoManual\u5c06\u73af\u5883\u77e5\u8bc6\u5206\u4e3a\u591a\u6837\u7684\u89c4\u5219\uff0c\u5e76\u901a\u8fc7\u4e24\u4e2a\u4ee3\u7406\u8fdb\u884c\u5728\u7ebf\u4f18\u5316\uff1a1\uff09\u89c4\u5212\u5668\u6839\u636e\u5f53\u524d\u89c4\u5219\u5236\u5b9a\u53ef\u64cd\u4f5c\u7684\u884c\u52a8\u8ba1\u5212\uff1b2\uff09\u6784\u5efa\u8005\u901a\u8fc7\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u89c4\u5219\u7cfb\u7edf\u66f4\u65b0\u89c4\u5219\uff0c\u4fc3\u8fdb\u5728\u7ebf\u89c4\u5219\u7ba1\u7406\u5e76\u4fdd\u6301\u5173\u952e\u7ec6\u8282\u3002\u4e3a\u4e86\u51cf\u5c11\u5728\u7ba1\u7406\u89c4\u5219\u65f6\u7684\u5e7b\u89c9\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u6848\u4f8b\u6761\u4ef6\u63d0\u793a\u201d\u7b56\u7565\u7528\u4e8e\u6784\u5efa\u8005\u3002\u6700\u7ec8\uff0c\u7f16\u8bd1\u5668\u4ee3\u7406\u5c06\u8fd9\u4e9b\u89c4\u5219\u6574\u5408\u6210\u4e00\u4efd\u5168\u9762\u7684\u624b\u518c\u3002\u8fd9\u4efd\u81ea\u6211\u751f\u6210\u7684\u624b\u518c\u4e0d\u4ec5\u80fd\u63d0\u9ad8\u9002\
u5e94\u6027\uff0c\u8fd8\u80fd\u6307\u5bfc\u5c0f\u578bLLMs\u7684\u89c4\u5212\uff0c\u540c\u65f6\u4fdd\u6301\u4eba\u7c7b\u53ef\u8bfb\u3002\u4ec5\u51ed\u4e00\u6b21\u7b80\u5355\u6f14\u793a\uff0cAutoManual\u663e\u8457\u63d0\u9ad8\u4e86\u4efb\u52a1\u6210\u529f\u7387\uff0cGPT-4-turbo\u4e0b\u8fbe\u523097.4%\uff0cGPT-3.5-turbo\u4e0b\u4e3a86.2%\u3002\u6e90\u4ee3\u7801\u5373\u5c06\u53d1\u5e03\u3002|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208](http://arxiv.org/abs/2405.18208)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u8868\u660e\uff0c\u8fd9\u4e9b\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u4e9b\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\uff0c\u5982\u5199\u4f5c\u548c\u7f16\u7801\uff0c\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u9700\u8981\u7efc\u5408\u89c4\u5212\u7684\u4efb\u52a1\u4e0a\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u4ecd\u662f\u5f53\u524d\u6a21\u578b\u7684\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u65c5\u884c\u89c4\u5212\uff0c\u8fd9\u662f\u4e00\u4e2a\u6d89\u53ca\u591a\u4e2a\u9636\u6bb5\u7684\u590d\u6742\u95ee\u9898\uff0c\u5305\u62ec\u63d0\u7eb2\u3001\u4fe1\u606f\u6536\u96c6\u548c\u89c4\u5212\uff0c\u901a\u5e38\u4f34\u968f\u7740\u5404\u79cd\u7ea6\u675f\u548c\u4e0d\u786e\u5b9a\u6027\u3002\u73b0\u6709\u7684\u63a8\u7406\u65b9\u6cd5\u5728\u5904\u7406\u8fd9\u7c7b\u95ee\u9898\u65f6\u6548\u679c\u4e0d\u4f73\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u901a\u8fc7\u5f00\u53d1\u4e00\u79cd\u7c7b\u4f3c\u4eba\u7c7b\u7684\u89c4\u5212\u6846\u67b6\uff0c\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6a21\u4eff\u4eba\u7c7b\u89e3\u51b3\u591a\u9636\u6bb5\u95ee\u9898\u7684\u6b65\u9aa4\uff0c\u4ee5\u63d0\u5347\u5176\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5b9e\u65bd\u7b56\u7565\uff0c\u8ba9\u6a21\u578b\u80fd\u4e3a\u6bcf\u4e2a\u65c5\u884c\u67e5\u8be2\u7
51f\u6210\u8fde\u8d2f\u7684\u63d0\u7eb2\uff0c\u6a21\u62df\u4eba\u7c7b\u7684\u89c4\u5212\u6a21\u5f0f\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u7b56\u7565\u5757\u548c\u77e5\u8bc6\u5757\u5230\u6846\u67b6\u4e2d\uff1a\u7b56\u7565\u5757\u5e2e\u52a9\u4fe1\u606f\u641c\u96c6\uff0c\u800c\u77e5\u8bc6\u5757\u63d0\u4f9b\u8be6\u7ec6\u89c4\u5212\u6240\u9700\u7684\u5fc5\u8981\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u5168\u9762\u5c55\u793a\u4e86\u6211\u4eec\u6846\u67b6\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u89c4\u5212\u80fd\u529b\u7684\u663e\u8457\u63d0\u5347\uff0c\u4f7f\u5176\u5728\u5904\u7406\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u65f6\u6548\u7387\u548c\u6548\u679c\u90fd\u6709\u6240\u63d0\u9ad8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u4e0eGPT-4-Turbo\u7ed3\u5408\u65f6\uff0c\u6211\u4eec\u7684\u6846\u67b6\u76f8\u8f83\u4e8e\u57fa\u7840\u6846\u67b6\u5728GPT-4-Turbo\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u4e8610\u500d\u3002|\n", "2405.18113": "|**2024-05-28**|**Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting**|Hongda Sun 
et.al.|[2405.18113](http://arxiv.org/abs/2405.18113)|null|\u968f\u7740\u5728\u7ebf\u62db\u8058\u670d\u52a1\u7684\u5174\u8d77\uff0c\u4f20\u7edf\u7684\u6c42\u804c\u548c\u62db\u8058\u65b9\u5f0f\u53d1\u751f\u4e86\u53d8\u9769\uff0c\u8feb\u5207\u9700\u8981\u5f00\u53d1\u9ad8\u8d28\u91cf\u7684\u5de5\u4e1a\u5e94\u7528\u6765\u63d0\u5347\u6c42\u804c\u8005\u4e0e\u804c\u4f4d\u7684\u5339\u914d\u5ea6\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u6f5c\u5728\u8bed\u4e49\u5efa\u6a21\uff0c\u5b66\u4e60\u4e24\u8005\u4e4b\u95f4\u7684\u5339\u914d\u51fd\u6570\u3002\u53d7\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89d2\u8272\u626e\u6f14\u65b9\u9762\u5f3a\u5927\u80fd\u529b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u5f15\u5165LLMs\u6a21\u62df\u9762\u8bd5\u73af\u8282\uff0c\u8ba9\u5176\u4e0e\u6c42\u804c\u8005\u8fdb\u884c\u5bf9\u8bdd\uff0c\u8fd9\u53ef\u4ee5\u4e3a\u5019\u9009\u4eba\u8bc4\u4f30\u63d0\u4f9b\u989d\u5916\u8bc1\u636e\uff0c\u4ece\u800c\u589e\u5f3a\u4ec5\u57fa\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u4e2a\u6027\u5316\u5339\u914d\u3002\u7136\u800c\uff0c\u5728\u7f51\u7edc\u62db\u8058\u4e2d\u7684\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u89d2\u8272\u5851\u9020\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5982\u63d0\u95ee\u6280\u5de7\u3001\u56de\u7b54\u6784\u5efa\u4ee5\u53ca\u53cc\u5411\u5339\u914d\u5ea6\u8bc4\u4f30\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMockLLM\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u6846\u67b6\uff0c\u5c06\u4eba\u804c\u5339\u914d\u8fc7\u7a0b\u5212\u5206\u4e3a\u4e24\u4e2a\u6a21\u5757\uff1a\u6a21\u62df\u9762\u8bd5\u751f\u6210\u548c\u63e1\u624b\u534f\u8bae\u4e2d\u7684\u53cc\u5411\u8bc4\u4f30\uff0c\u901a\u8fc7\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u4e4b\u95f4\u7684\u534f\u4f5c\u884c\u4e3a\u5171\u540c\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u89d2\u8272\u3001\u591a\u884c\u4e3a\u7684\u6846\u67b6\uff0c\u4f7f\u5355\u4e00\u7684LLM\u4ee3\u7406\u80fd\u6709\u6548\u5730\u626e\u6f14\u53cc\u65b9\u7684\u4e0d\u540c\u804c\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cd\u601d\u8bb0\u5fc6\u751f\u6210\u548c\u52a8\u6001\u63d0\u793a\u4fee\u6539\u6280\u672f\uff0c\u4ee5\u4f18\u5316\u53cc\u65b9\u7684\u884c\u4e3a\uff0c\u6301\u7eed\u4f18\u5316\u9644\u52a0\u7684\u8bc4\u4f30\u8bc1\u636e\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMockLLM\u5728\u4eba\u804c\u5339\u914d\u4e0a\u7684\u8868\u73b0\u6700\u4f18\uff0c\u4e14\u6a21\u62df\u9762\u8bd5\u8d28\u91cf\u9ad8\uff0c\u9884\u793a\u7740\u5b83\u5728\u672a\u6765\u5728\u7ebf\u62db\u8058\u4e2d\u7684\u5b9e\u9645\u5e94\u7528\u524d\u666f\u5e7f\u9614\u3002|\n", "2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia 
et.al.|[2405.18092](http://arxiv.org/abs/2405.18092)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|**\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e94\u7528\u4e8e\u6570\u5b57\u5b6a\u751f\u8fc7\u7a0b\u6a21\u62df\u7684\u53c2\u6570\u81ea\u52a8\u5316\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5305\u542b\u89c2\u5bdf\u3001\u63a8\u7406\u3001\u51b3\u7b56\u548c\u603b\u7ed3\u56db\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\u3002\u901a\u8fc7\u5b9e\u73b0LLM\u4ee3\u7406\u4e0e\u6a21\u62df\u6a21\u578b\u7684\u52a8\u6001\u4ea4\u4e92\uff0c\u8be5\u7cfb\u7edf\u53ef\u4ee5\u81ea\u52a8\u63a2\u7d22\u53c2\u6570\u8bbe\u7f6e\uff0c\u5229\u7528\u542f\u53d1\u5f0f\u63a8\u7406\u786e\u5b9a\u4e00\u7ec4\u63a7\u5236\u6a21\u62df\u4ee5\u8fbe\u6210\u76ee\u6807\u7684\u53c2\u6570\u3002\u8fd9\u79cd\u65b9\u6cd5\u901a\u8fc7\u6ce8\u5165LLM\u7684\u542f\u53d1\u5f0f\uff0c\u589e\u5f3a\u6a21\u62df\u6a21\u578b\uff0c\u5e76\u652f\u6301\u81ea\u4e3b\u641c\u7d22\u4ee5\u89e3\u51b3\u7528\u6237\u4efb\u52a1\uff0c\u6709\u671b\u63d0\u9ad8\u7528\u6237\u4f53\u9a8c\u5e76\u51cf\u8f7b\u4eba\u7c7b\u7528\u6237\u5728\u590d\u6742\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u8ba4\u77e5\u8d1f\u62c5\u3002\u7814\u7a76\u901a\u8fc7\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u7cfb\u7edf\u7684\u6709\u6548\u6027\u4e0e\u529f\u80fd\uff0c\u5e76\u5728GitHub\u4ed3\u5e93\u63d0\u4f9b\u4e86\u53ef\u89c6\u5316\u7684\u6f14\u793a\u3002**|\n", "2405.17837": "|**2024-05-28**|**Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces**|Qiuyu Lu 
et.al.|[2405.17837](http://arxiv.org/abs/2405.17837)|null|In human-computer interaction (HCI), the design and development of interactive devices is a key concern. With the rise of novel hardware and advanced fabrication techniques, there is a growing need for specialized design tools that streamline prototyping. Although these tools simplify the workflow through parametric design and simulation, they have steep learning curves and often fall short in stimulating creative thinking. Using fluidic computation interfaces as a case study, this work explores how large language model (LLM) agents can augment design tools for physical devices, yielding a Generative Design Tool (GDT). With an LLM, the GDT can understand the capabilities and limitations of new devices, propose diverse, insightful, and practical application scenarios, recommend device designs that are technically and contextually appropriate, and automatically generate design parameters so that a conventional design tool can visualize the results and produce the files needed for fabrication. The paper describes the framework, implementation, and performance of the GDT, and reflects on its promise and the challenges encountered.|\n", "2405.20267": "|**2024-05-30**|**Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions**|Ruochen Zhao 
et.al.|[2405.20267](http://arxiv.org/abs/2405.20267)|**[link](https://github.com/Auto-Arena/Auto-Arena-LLMs)**|**As LLMs evolve rapidly, there is an urgent need for reliable and timely evaluation methods. Because static benchmarks are prone to contamination, users often rely on human-voting platforms such as Chatbot Arena. However, human annotation requires substantial manual effort. To address this, we propose Auto-Arena, a framework that automates the entire LLM evaluation process. First, an examiner LLM devises questions. Next, candidate LLMs engage in multi-round peer battles around these questions, exposing their true performance gaps. Finally, a committee of LLM judges collectively discusses and decides the winner, which reduces bias and improves fairness. Extensive experiments on 17 of the latest LLMs show that Auto-Arena achieves the highest correlation with human preferences, making it a promising alternative to human evaluation platforms.**|\n", "2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang 
et.al.|[2405.20189](http://arxiv.org/abs/2405.20189)|null|In this work, we describe our approach to developing an intelligent and robust social robot system for the Nadine social robot platform. We integrate large language models (LLMs) and leverage their strong reasoning and instruction-following abilities to achieve human-like affective and cognitive capabilities. This is novel compared with current LLM-based agents, which typically lack human-like long-term memory or sophisticated emotional appraisal. The naturalness of a social robot depends largely on the performance of each system component and how well the components work together. We build a system that generates appropriate behaviors by processing multimodal inputs, recalls episodic memories relevant to the recognized user, and simulates the emotional states the robot develops while interacting with its human partner. In particular, we propose SoR-ReAct, an LLM-agent framework for social robots, as the core component of the interaction module in our system. This design advances social robotics and aims to improve the quality of human-robot interaction.|\n", "2405.19425": "|**2024-05-29**|**Adaptive 
In-conversation Team Building for Language Model Agents**|Linxin Song et.al.|[2405.19425](http://arxiv.org/abs/2405.19425)|null|Leveraging multiple large language models (LLMs) has shown promise for complex tasks. However, designing effective multi-agent teams for a specific application remains challenging. This paper proposes Captain Agent, a new paradigm for dynamic team building. Through a novel agent design, it adaptively assembles and manages a team for each problem-solving step, using nested group conversations and reflection to ensure diverse expertise and prevent stereotypical outputs. This gives a flexible yet structured way to solve problems that helps reduce redundancy and increase output diversity. A comprehensive evaluation across six real-world scenarios shows that Captain Agent significantly outperforms existing multi-agent methods, improving average accuracy by 21.94% without laborious task-specific prompt engineering.|\n", "2406.01422": "|**2024-06-03**|**How to Understand Whole Software Repository?**|Yingwei Ma et.al.|[2406.01422](http://arxiv.org/abs/2406.01422)|null|
Recently, agents based on large language models (LLMs) have made significant progress in automated software engineering (ASE). Although existing methods have proven effective, their designs focus mainly on local code information, such as issues, classes, and functions, which limits their understanding of the global context and dependencies of a software system. Based on the practical experience of software developers, we argue that comprehensively understanding the whole repository is key to ASE. However, understanding a whole repository raises many challenges, such as long code inputs, noisy code information, and complex dependencies. To overcome these issues, we develop RepoUnderstander, a new ASE method that guides agents to comprehensively understand the whole repository. First, we condense the critical information of the whole repository into a knowledge graph in a top-down manner to reduce complexity. Next, we propose a repository exploration strategy based on Monte Carlo Tree Search 
(MCTS) that gives agents the ability to understand the whole repository. In addition, to make better use of repository-level knowledge, we guide agents to summarize, analyze, and plan, after which they can use tools to retrieve information dynamically and generate patches that fix real-world GitHub issues. Extensive experiments demonstrate the superiority and effectiveness of RepoUnderstander. On the SWE-bench Lite benchmark, it achieves an 18.5% relative improvement over SWE-agent.|\n", "2406.01364": "|**2024-06-03**|**BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards**|Diego Dorn et.al.|[2406.01364](http://arxiv.org/abs/2406.01364)|null|Input-output safeguards are used to detect anomalous outputs from large language model (LLM) systems. These safeguards play a central role in critical applications such as real-time monitoring, offline evaluation, and content moderation. However, there is currently no unified way to evaluate their performance. To fill this gap, we propose the Benchmarks for the Evaluation of LLM Safeguards (BELLS), a structured collection of tests organized into three categories: (1) 
established failure tests, based on existing benchmarks for well-defined failure modes, which aim to compare the performance of current input-output safeguards; (2) emerging failure tests, which measure generalization to never-seen-before failure modes and encourage the development of more general safeguards; and (3) next-gen architecture tests, which target more complex architectures (such as LLM agents and multi-agent systems) and aim to foster safeguards for future applications that do not yet have dedicated ones. We also implement and share the first next-gen architecture test, built on the MACHIAVELLI environment, together with an interactive visualization of the dataset.|\n", "2406.00936": "|**2024-06-03**|**A Survey of Useful LLM Evaluation**|Ji-Lun Peng 
et.al.|[2406.00936](http://arxiv.org/abs/2406.00936)|null|As LLMs show outstanding performance across many research fields, there is a growing need for methods that evaluate their capabilities and determine which tasks and responsibilities suit them. This paper focuses on how to use LLMs effectively as tools and proposes a two-stage framework: from 'core ability' to 'agent'. Core ability refers to the properties an LLM needs to generate high-quality text. Once these abilities are verified, an LLM can handle complex real-world tasks and act as an agent. For the core-ability stage, we discuss the reasoning ability, societal impact, and domain knowledge of LLMs. For the agent stage, we present applications of LLMs to embodied action, planning, and tool learning. Finally, we analyze the challenges facing current LLM evaluation methods and outline future directions.|\n", "2406.01637": "|**2024-06-02**|**Teams of LLM Agents can Exploit Zero-Day Vulnerabilities**|Richard Fang 
et.al.|[2406.01637](http://arxiv.org/abs/2406.01637)|null|As LLMs have grown more sophisticated in cybersecurity, researchers have found that they can exploit real-world vulnerabilities when given a description of the vulnerability, as well as solve simple capture-the-flag problems. However, they still perform poorly on zero-day vulnerabilities, which are unknown in advance (vulnerabilities known to attackers but not yet patched by the vendor). This paper shows that teams of LLM agents can exploit real-world zero-day vulnerabilities. A single agent struggles to explore many different vulnerabilities and to plan over long horizons. To address this, we propose HPTSA, a system with a planning agent that can dispatch subagents. The planning agent explores the system and decides which subagent to use for each vulnerability it tries, which solves the long-range planning problem. On a benchmark of 15 real-world vulnerabilities, our team of agents improves on prior work by a factor of 4.5.|\n", "2406.00583": "|**2024-06-02**|**CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems**|Yanlin Feng 
et.al.|[2406.00583](http://arxiv.org/abs/2406.00583)|**[link](https://github.com/megagonlabs/CMDBench)**|In the database and AI communities, compound AI systems (CAS) have attracted wide attention. These systems use large language models (LLMs) as agents that perform knowledge-intensive tasks by interacting with tools and data retrievers. They could enhance the general analytic workflow of data analysts on enterprise data platforms, but CAS face the same data discovery challenge as analysts: multimodal data sources created by different teams and departments within an organization are siloed, which makes it hard to find the right data sources for the task at hand. Existing data discovery benchmarks do not adequately model this multimodality and diversity of data sources. Moreover, existing benchmarks for CAS focus mainly on end-to-end task performance and overlook data discovery performance. 
To advance research on the data discovery performance of multimodal data retrievers in CAS in a real-world setting, we propose CMDBench, a benchmark that models the complexity of enterprise data platforms. We adapt existing open-domain datasets and benchmarks covering question answering, complex reasoning, and natural-language querying of structured data to evaluate coarse- and fine-grained data discovery as well as task execution performance. Our experiments reveal the impact of data retriever design on downstream task performance, with task accuracy dropping by 46% on average. These results point to the need for optimization strategies that select suitable LLM agents and retrievers so that CAS can run efficiently on enterprise data. Overall, CMDBench is a new tool for research on the complexity of enterprise data platforms. By jointly evaluating data discovery and task execution, it provides a useful framework for improving multimodal data retrievers in compound AI systems.|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn 
et.al.|[2406.00244](http://arxiv.org/abs/2406.00244)|null|As large-scale pretrained language models (LLMs) become more broadly applicable, there is growing interest in using them as in-context learning agents. In these settings, a model must form beliefs, from limited interaction with the environment, about strategies for achieving its goal, and it must handle uncertainty at every decision step. This paper studies how LLMs form and act on these beliefs through experiments on controlled sequential decision-making tasks. First, we find that LLMs are overconfident: they draw strong conclusions about which actions to take without sufficient evidence, which leads to insufficient exploration. Deeper analysis shows that this stems from a collapse in the entropy of the action distribution sampled from the LLM. We then show that existing token-level sampling techniques alone are not enough to make the model explore more. In light of this, we propose Entropic Activation 
Steering (EAST), an activation steering method for in-context LLM agents. EAST computes an entropy-weighted combination of representations and intervenes on the model's activations during the forward pass to adjust the model's uncertainty over actions, which promotes exploratory behavior. Finally, EAST changes the subjective uncertainty the LLM expresses when making decisions, providing a way to understand and control how the model represents uncertainty about its decisions.|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222](http://arxiv.org/abs/2406.00222)|null|Large language models (LLMs) trained with reinforcement learning from human feedback (RLHF) have quickly become the dominant approach for building intelligent conversational assistants. Despite strong performance on many benchmarks, however, LLM-based agents still lack conversational skills such as disambiguation. When general-purpose assistants face ambiguity, they often become overly cautious or guess the user's true intent instead of asking clarifying questions. In task-specific settings, high-quality conversation samples are often scarce, which limits a model's ability to learn optimal dialogue action policies. We propose 
Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO) that enables sample-efficient dialogue policy learning in multi-turn conversations. We demonstrate the effectiveness of ACT on three challenging conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a new task for disambiguating information-seeking requests for text-to-SQL generation. We also propose assessing LLMs as conversational agents by evaluating whether they can recognize and reason about ambiguity in conversation. Compared with standard supervised fine-tuning and DPO, ACT shows substantial improvements in conversation modeling.|\n", "2406.00215": "|**2024-05-31**|**Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent**|Jie JW Wu 
et.al.|[2406.00215](http://arxiv.org/abs/2406.00215)|**[link](https://github.com/jie-jw-wu/human-eval-comm)**|Large language models (LLMs) have significantly improved at code generation, but they still lag behind top-level software engineers. Because top-level software engineers often ask questions to resolve ambiguity in requirements and coding solutions, we argue that LLMs should have similar communication skills when performing code generation tasks. To this end, we conduct an empirical study of the communication skills of LLMs, namely the ability to ask clarifying questions when the problem description of a code generation task has issues. We create a new benchmark, HumanEvalComm, by modifying problem descriptions along three issue dimensions: inconsistency, ambiguity, and incompleteness. We define new evaluation metrics such as Communication Rate and Good Question Rate. On HumanEvalComm, we evaluate different kinds of Code LLMs as well as Okanagan, a new LLM agent approach that identifies issues in the code and description and asks questions about them to further refine the generated code. Finally, we discuss the results by comparing Code 
LLMs and Okanagan.|\n", "2406.03299": "|**2024-06-05**|**The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games**|Mikhail Mozikov et.al.|[2406.03299](http://arxiv.org/abs/2406.03299)|null|Behavioral research experiments play an important role in social modeling and in understanding human interaction. In practice, however, such experiments often face challenges to internal and external validity, replicability, and social bias because human social interaction and cooperation are complex. Recent advances in large language models (LLMs) have given researchers a new tool for simulating human behavior. Existing LLM-based simulations, however, assume that the models behave like humans and overlook a key factor in human decision-making: emotions. This paper introduces a novel methodology and framework for studying LLM decision-making and how well the behavior of LLMs under emotional states aligns with human behavior. 
Using GPT-3.5 and GPT-4 in two different types of behavioral economics games (game-theoretic experiments), we find that emotions significantly affect LLM performance and lead the models to develop more optimal strategies. While GPT-3.5 closely matches the behavioral patterns of human participants, especially in bargaining games, GPT-4 behaves consistently and its rational decisions appear unaffected by induced emotions. Surprisingly, emotional prompting, particularly with anger, can disrupt GPT-4's 'superhuman' consistency and make its responses more similar to human emotional responses.|\n", "2406.03007": "|**2024-06-05**|**BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents**|Yifei Wang 
et.al.|[2406.03007](http://arxiv.org/abs/2406.03007)|**[link](https://github.com/dpamk/badagent)**|**With the flourishing of large language models (LLMs), powerful intelligent agents have been developed to provide customized services. These agents are built on trained LLMs and fine-tuned on task-specific data. The state-of-the-art way to build LLM agents is to take a pretrained model and further adapt it to the task. However, we show that these methods are vulnerable to a new kind of backdoor attack, BadAgent, which implants backdoors in various agent tasks by fine-tuning on backdoor data. At test time, an attacker can manipulate a deployed LLM agent into performing harmful operations by showing a trigger in the input or the environment. Surprisingly, our attack remains highly robust even after the agent is fine-tuned on trustworthy data. Backdoor attacks have been widely studied in natural language processing, but to the best of our knowledge we may be the first to study them on LLM agents. These agents hold greater permissions and can use external tools, which makes them more dangerous. Our work highlights the risk of building LLM agents on untrusted LLMs or data. 
Our code is available at [https://github.com/DPamK/BadAgent](https://github.com/DPamK/BadAgent).**|\n", "2406.04151": "|**2024-06-06**|**AgentGym: Evolving Large Language Model-based Agents across Diverse Environments**|Zhiheng Xi et.al.|[2406.04151](http://arxiv.org/abs/2406.04151)|**[link](https://github.com/woooodyy/agentgym)**|**Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in artificial intelligence. Because of their general capabilities, large language models (LLMs) are considered a promising foundation for this goal. Current approaches follow one of two routes. Some rely on human supervision, having LLM agents imitate expert-provided trajectories step by step, which is hard to scale and limits environment exploration. Others let agents explore and learn in isolated environments, which yields specialist agents with limited generalization. This paper takes a first step toward building generally capable LLM-based agents with self-evolution ability. We identify three key ingredients: 1) diverse environments for agent exploration and learning, 2) a set of trajectories that equips agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. 
\u6211\u4eec\u63d0\u51fa\u4e86AgentGym\uff0c\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u5b83\u5305\u542b\u4e30\u5bcc\u7684\u73af\u5883\u548c\u4efb\u52a1\uff0c\u652f\u6301\u5168\u9762\u3001\u5b9e\u65f6\u3001\u7edf\u4e00\u683c\u5f0f\u548c\u5e76\u53d1\u7684\u4ee3\u7406\u63a2\u7d22\u3002AgentGym\u8fd8\u5305\u62ec\u4e00\u4e2a\u6269\u5c55\u6307\u4ee4\u7684\u6570\u636e\u5e93\u3001\u57fa\u51c6\u6d4b\u8bd5\u5957\u4ef6\u4ee5\u53ca\u8de8\u73af\u5883\u7684\u9ad8\u8d28\u91cf\u8f68\u8ff9\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AgentEvol\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7814\u7a76\u4ee3\u7406\u5728\u8d85\u8d8a\u65e2\u5b9a\u6570\u636e\uff0c\u8de8\u8d8a\u4efb\u52a1\u548c\u73af\u5883\u65f6\u7684\u81ea\u6211\u8fdb\u5316\u6f5c\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fdb\u5316\u540e\u7684\u4ee3\u7406\u53ef\u4ee5\u8fbe\u5230\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002\u6211\u4eec\u53d1\u5e03\u4e86AgentGym\u5957\u4ef6\uff0c\u5305\u62ec\u5e73\u53f0\u3001\u6570\u636e\u96c6\u3001\u57fa\u51c6\u3001\u68c0\u67e5\u70b9\u548c\u7b97\u6cd5\u5b9e\u73b0\u3002AgentGym\u5957\u4ef6\u5df2\u5728\u5176\u5b98\u65b9\u7f51\u7ad9https://github.com/WooooDyy/AgentGym\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.04692": "|**2024-06-07**|**Mixture-of-Agents Enhances Large Language Model Capabilities**|Junlin Wang 
et.al.|[2406.04692](http://arxiv.org/abs/2406.04692)|null|\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\u663e\u8457\uff0c\u5c55\u73b0\u51fa\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u5f3a\u5927\u80fd\u529b\u3002\u968f\u7740LLMs\u7684\u589e\u591a\uff0c\u5982\u4f55\u6709\u6548\u6574\u5408\u591a\u6a21\u578b\u7684\u77e5\u8bc6\u6210\u4e3a\u4e86\u4e00\u4e2a\u4ee4\u4eba\u632f\u594b\u7684\u7814\u7a76\u65b9\u5411\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u6df7\u5408\u4ee3\u7406\uff08Mixture-of-Agents\uff0cMoA\uff09\u65b9\u6cd5\u3002\u5728\u6211\u4eec\u7684\u67b6\u6784\u4e2d\uff0cMoA\u91c7\u7528\u4e86\u5206\u5c42\u8bbe\u8ba1\uff0c\u6bcf\u5c42\u5305\u542b\u591a\u4e2aLLM\u4ee3\u7406\u3002\u6bcf\u4e2a\u4ee3\u7406\u5728\u751f\u6210\u54cd\u5e94\u65f6\uff0c\u4f1a\u5229\u7528\u524d\u4e00\u5c42\u6240\u6709\u4ee3\u7406\u7684\u8f93\u51fa\u4f5c\u4e3a\u8f85\u52a9\u4fe1\u606f\u3002\u901a\u8fc7\u8fd9\u79cd\u7b56\u7565\uff0cMoA\u6a21\u578b\u5728AlpacaEval 2.0\u3001MT-Bench\u548cFLASK\u7b49\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86GPT-4\u5168\u80fd\u7248\u3002\u4f8b\u5982\uff0c\u4ec5\u4f7f\u7528\u5f00\u6e90LLMs\u7684\u6211\u4eec\u7684MoA\u6a21\u578b\u5728AlpacaEval 2.0\u4e0a\u7684\u5f97\u5206\u9886\u5148\uff0c\u8fbe\u523065.1%\uff0c\u800cGPT-4\u5168\u80fd\u7248\u7684\u6210\u7ee9\u4e3a57.5%\u3002|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|\u5c3d\u7ba1\u53ef\u7a7f\u6234\u5065\u5eb7\u8ffd\u8e2a\u5668\u65e5\u76ca\u666e\u53ca\uff0c\u7761\u7720\u548c\u8fd0\u52a8\u5bf9\u5065\u5eb7\u7684\u91cd\u8981\u6027\u4e0d\u8a00\u800c\u55bb\uff0c\u4f46\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5177\u6709\u884c\u52a8\u4ef7\u503c\u7684\u4e2a\u6027\u5316\u89c1\u89e3\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u8fd9\u9700\u8981\u5bf9\u5927\u91cf\u6570\u636e\u8fdb\u884c\u975e\u7ed3\u6784\u5316\u5206\u6790\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u7406\u89e3\u548c\u4e0e\u4e16\u754c\u4e92\u52a8\uff0c\u4e3a\u5927\u89c4\u6a21\u4e2a\u6027\u5316\u5206\u6790\u5e26\u6765\u4e86\u5e0c\u671b\u3002\u7136\u800c\uff0c\u5728\u4e2a\u4eba\u5065\u5eb7\u9886\u57df\u7684LLM\u5e94\u7528\u5c1a\u5f85\u5f00\u53d1\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Insights Agent\uff08PHIA\uff09\u7684\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u6700\u65b0\u7684\u4ee3\u7801\u751f\u6210\u548c\u4fe1\u606f\u68c0\u7d22\u5de5\u5177\u6765\u5206\u6790\u548c\u89e3\u91ca\u884c\u4e3a\u5065\u5eb7\u6570\u636e\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e24\u4e2a\u8d85\u8fc74000\u4e2a\u5065\u5eb7\u6d1e\u5bdf\u95ee\u9898\u7684\u57fa\u51c6\u95ee\u7b54\u6570\u636e\u96c6\u3002\u6839\u636e650\u5c0f\u65f6\u7684\u4eba\u7c7b\u548c\u4e13\u5bb6\u8bc4\u4f30\uff0cPHIA\u80fd\u51c6\u786e\u56de\u7b5484%\u4ee5\u4e0a\u7684\u4e8b\u5b9e\u6027\u6570\u503c\u95ee\u9898\uff0c\u4ee5\u53ca\u8d85\u8fc783%\u7684\u4f17\u5305\u5f00\u653e\u6027\u95ee\u9898\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u4e8e\u63a8\u52a8\u5927\u4f17\u884c\u4e3a\u5065\u5eb7\u8fdb\u6b65\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u53ef\u80fd\u4f7f\u4e2a\u4eba\u80fd\u591f\u89e3\u8bfb\u81ea\u5df1\u7684\u53ef\u7a7f\u6234\u6570\u636e\uff0c\u5f00\u8f9f\u4e86\u4e00\u4e2a\u4ee5\u6570\u636e\u9a71\u52a8\u6d1e\u5bdf\u4e3a\u6307\u5bfc\u7684\u4e2a\u6027\u531
6\u5065\u5eb7\u65b9\u6848\u7684\u65b0\u65f6\u4ee3\uff0c\u4f7f\u5f97\u5065\u5eb7\u4fdd\u5065\u66f4\u52a0\u4fbf\u6377\u4e14\u4e2a\u6027\u5316\u3002|\n", "2406.05925": "|**2024-06-09**|**Hello Again! LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925](http://arxiv.org/abs/2406.05925)|**[link](https://github.com/leolee99/ld-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5f00\u653e\u57df\u5bf9\u8bdd\u7cfb\u7edf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u7cfb\u7edf\u4e3b\u8981\u5173\u6ce8\u7b80\u77ed\u7684\u5355\u6b21\u4f1a\u8bdd\uff0c\u5ffd\u89c6\u4e86\u957f\u671f\u966a\u4f34\u548c\u4e2a\u6027\u5316\u804a\u5929\u673a\u5668\u4eba\u5728\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u6ee1\u8db3\u8fd9\u79cd\u5b9e\u9645\u9700\u6c42\uff0c\u4e8b\u4ef6\u603b\u7ed3\u548c\u4eba\u683c\u7ba1\u7406\u81f3\u5173\u91cd\u8981\uff0c\u5b83\u4eec\u80fd\u591f\u4fc3\u8fdb\u957f\u671f\u5bf9\u8bdd\u56de\u590d\u7684\u5408\u7406\u6027\u3002\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4eba\u7c7b\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8fdb\u5c55\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6709\u53ef\u80fd\u5927\u5e45\u589e\u5f3a\u81ea\u52a8\u5316\u611f\u77e5\u3001\u51b3\u7b56\u548c\u95ee\u9898\u89e3\u51b3\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684\u6846\u67b6\u2014\u2014\u957f\u671f\u5bf9\u8bdd\u4ee3\u7406\uff08LD-Agent\uff09\uff0c\u5b83\u5305\u62ec\u4e09\u4e2a\u53ef\u72ec\u7acb\u8c03\u6574\u7684\u6a21\u5757\uff1a\u4e8b\u4ef6\u611f\u77e5\u3001\u4eba\u683c\u63d0\u53d6\u548c\u54cd\u5e94\u751f\u6210\u3002\u4e8b\u4ef6\u8bb0\u5fc6\u6a21\u5757\u4f7f\u7528\u957f\u77ed\u671f\u8bb0\u5fc6\u5e93\u5206\u522b\u5173\u6ce8\u5386\u53f2\u548c\u6b63\u5728\u8fdb\u884c\u7684\u4f1a\u8bdd\uff0c\u5e76\u5f15\u5165\u4e86\u57fa\u4e8e\u4e3b\u9898\u7684\u68c0\u7d22\u673a\
u5236\u4ee5\u63d0\u9ad8\u8bb0\u5fc6\u68c0\u7d22\u7684\u51c6\u786e\u6027\u3002\u6b64\u5916\uff0c\u4eba\u683c\u6a21\u5757\u5b9e\u73b0\u4e86\u7528\u6237\u548c\u4ee3\u7406\u7684\u52a8\u6001\u4eba\u683c\u5efa\u6a21\u3002\u6700\u540e\uff0c\u901a\u8fc7\u6574\u5408\u68c0\u7d22\u7684\u8bb0\u5fc6\u548c\u63d0\u53d6\u7684\u4eba\u683c\uff0c\u751f\u6210\u5668\u4f1a\u4ea7\u751f\u9002\u5f53\u7684\u56de\u5e94\u3002\u6211\u4eec\u5728\u5404\u79cd\u793a\u4f8b\u57fa\u51c6\u3001\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u5b9e\u8bc1\u4e86LD-Agent\u7684\u6709\u6548\u6027\u3001\u901a\u7528\u6027\u548c\u8de8\u9886\u57df\u80fd\u529b\u3002\u4ee3\u7801\u5df2\u5728https://github.com/leolee99/LD-Agent\u4e0a\u53d1\u5e03\u3002**|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804](http://arxiv.org/abs/2406.05804)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u63a8\u52a8\u4e86\u590d\u6742\u4ee3\u7406\u5de5\u4f5c\u6d41\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u8def\u5f84\u3001\u94fe\u5f0f\u601d\u7ef4\uff08Chain-of-Thought\uff0cCoT\uff09\u63d0\u793a\u65b9\u6cd5\u6709\u6240\u6539\u8fdb\u3002\u8fd9\u7bc7\u7efc\u8ff0\u65e8\u5728\u6982\u8ff0\u5e38\u89c1\u7684\u5de5\u4f5c\u6d41\uff0c\u7279\u522b\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7279\u6027\u7684\u7ec4\u4ef6\uff08LLM-Profiled Components\uff0cLMPCs\uff09\uff0c\u800c\u4e0d\u6d89\u53ca\u975eLLM\u7ec4\u4ef6\u3002\u8fd9\u79cd\u7814\u7a76\u7684\u76ee\u7684\u662f\u4e3a\u4e86\u589e\u8fdb\u5bf9LLMs\u89d2\u8272\u7684\u7406\u89e3\uff0c\u5e76\u63a2\u7d22LMPC\u7684\u590d\u7528\u6f5c\u529b\u3002|\n", "2406.07275": "|**2024-06-11**|**DCA-Bench: A Benchmark for Dataset Curation Agents**|Benhao Huang 
et.al.|[2406.07275](http://arxiv.org/abs/2406.07275)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7814\u7a76\u548c\u5f00\u53d1\u7684\u63a8\u8fdb\uff0c\u6570\u636e\u96c6\u7684\u8d28\u91cf\u65e5\u76ca\u5173\u952e\u3002\u5c3d\u7ba1\u5f00\u653e\u6570\u636e\u96c6\u5e73\u53f0\u4f17\u591a\uff0c\u4f46\u6570\u636e\u8d28\u91cf\u95ee\u9898\uff0c\u5982\u7f3a\u4e4f\u6587\u6863\u3001\u6807\u6ce8\u9519\u8bef\u548c\u4f26\u7406\u8003\u91cf\uff0c\u4ecd\u666e\u904d\u5b58\u5728\u3002\u8fd9\u4e9b\u95ee\u9898\u5f80\u5f80\u96be\u4ee5\u901a\u8fc7\u89c4\u5219\u57fa\u7840\u811a\u672c\u68c0\u6d4b\uff0c\u9700\u8981\u7528\u6237\u6216\u7ef4\u62a4\u8005\u82b1\u8d39\u5927\u91cf\u4eba\u529b\u8fdb\u884c\u8bc6\u522b\u548c\u9a8c\u8bc1\u3002\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u6570\u636e\u96c6\u6574\u7406\u7684\u6f5c\u529b\u4ee4\u4eba\u671f\u5f85\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aDCA-Bench\u7684\u6570\u636e\u96c6\u7ba1\u7406\u4ee3\u7406\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLM\u5728\u68c0\u6d4b\u9690\u85cf\u6570\u636e\u8d28\u91cf\u95ee\u9898\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u4ece\u516b\u4e2a\u516c\u5f00\u6570\u636e\u96c6\u5e73\u53f0\u6536\u96c6\u4e86\u5404\u79cd\u5b9e\u9645\u95ee\u9898\u4f5c\u4e3a\u6d4b\u8bd5\u5e8a\u3002\u4e3a\u4e86\u5efa\u7acb\u4e00\u4e2a\u81ea\u52a8\u8bc4\u4f30LLM\u6210\u529f\u4e0e\u5426\u7684\u7ba1\u9053\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684LLM\u8bc4\u4f30\u5668\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u8bc4\u4f30\u5668\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u9ad8\u5ea6\u543b\u5408\uff0c\u80fd\u5b9e\u73b0\u53ef\u9760\u7684\u81ea\u52a8\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5728\u591a\u4e2a\u57fa\u7ebfLLM\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u663e\u793a\u4e86\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u610f\u5473\u7740\u5c06LLMs\u5e94\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\u7ba1\u7406\u4ecd\u9700\u6df1\u5165\u63a2
\u7d22\u548c\u521b\u65b0\u3002\u6b64\u5916\uff0c\u8be5\u57fa\u51c6\u4e5f\u53ef\u4f5c\u4e3a\u8861\u91cfLLMs\u5728\u95ee\u9898\u53d1\u73b0\u80fd\u529b\u800c\u975e\u4ec5\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u7684\u6d4b\u8bd5\u5e73\u53f0\u3002\u57fa\u51c6\u5957\u4ef6\u5df2\u5f00\u653e\u5728\uff1ahttps://github.com/TRAIS-Lab/dca-bench\u3002|\n", "2406.07217": "|**2024-06-11**|**A Synthetic Dataset for Personal Attribute Inference**|Hanna Yukhymenko et.al.|[2406.07217](http://arxiv.org/abs/2406.07217)|**[link](https://github.com/eth-sri/synthpai)**|**\u8fd1\u5e74\u6765\uff0c\u5f3a\u5927\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e3a\u5168\u7403\u6570\u4ebf\u7528\u6237\u6240\u63a5\u89e6\uff0c\u4f46\u5b83\u4eec\u7684\u5f3a\u5927\u529f\u80fd\u548c\u5e7f\u6cdb\u4e16\u754c\u77e5\u8bc6\u4e5f\u5e26\u6765\u4e86\u9690\u79c1\u98ce\u9669\u3002\u672c\u7814\u7a76\u5173\u6ce8LLMs\u65b0\u5174\u7684\u9690\u79c1\u5a01\u80c1\u2014\u2014\u4ece\u7f51\u7edc\u6587\u672c\u4e2d\u51c6\u786e\u63a8\u65ad\u4e2a\u4eba\u4fe1\u606f\u3002\u9274\u4e8e\u57fa\u4e8eLLM\u7684\u4f5c\u8005\u5206\u6790\u7814\u7a76\u7f3a\u4e4f\u5408\u9002\u7684\u516c\u5f00\u6570\u636e\u96c6\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u6d89\u53ca\u771f\u5b9e\u4e2a\u4eba\u6570\u636e\u7684\u4f26\u7406\u548c\u9690\u79c1\u987e\u8651\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5728\u4e24\u4e2a\u65b9\u9762\u8fdb\u884c\u4e86\u63a2\u7d22\uff1a\uff08i\uff09\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4f7f\u7528\u5408\u6210\u4e2a\u4eba\u8d44\u6599\u586b\u5145\u7684\u6d41\u884c\u793e\u4ea4\u5e73\u53f0Reddit\u7684\u6a21\u62df\u6846\u67b6\uff1b\uff08ii\uff09\u5229\u7528\u6b64\u6846\u67b6\uff0c\u6211\u4eec\u751f\u6210\u4e86SynthPAI\uff0c\u4e00\u4e2a\u5305\u542b\u8d85\u8fc77800\u6761\u7ecf\u8fc7\u624b\u52a8\u6807\u8bb0\u4e2a\u4eba\u5c5e\u6027\u7684\u591a\u6837\u5316\u7684\u5408\u6210\u8bc4\u8bba\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u4eba\u7c7b\u7814\u7a76\u9a8c\u8bc1\u4e86\u6570\u636e\u96c6\uff0c\u7ed3\u679c\
u663e\u793a\u4eba\u7c7b\u5728\u533a\u5206\u771f\u5b9e\u548c\u5408\u6210\u8bc4\u8bba\u7684\u4efb\u52a1\u4e0a\u51e0\u4e4e\u4e0d\u4f18\u4e8e\u968f\u673a\u731c\u6d4b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6570\u636e\u96c6\u652f\u6301\u6709\u610f\u4e49\u7684\u4e2a\u4eba\u5c5e\u6027\u63a8\u65ad\u7814\u7a76\uff0c\u901a\u8fc718\u79cd\u6700\u5148\u8fdb\u7684LLMs\uff0c\u6211\u4eec\u53d1\u73b0\u4f7f\u7528\u5408\u6210\u8bc4\u8bba\u53ef\u4ee5\u5f97\u51fa\u4e0e\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u76f8\u540c\u7684\u7ed3\u8bba\u3002\u7efc\u4e0a\u6240\u8ff0\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6d41\u7a0b\u4e3a\u672a\u6765\u7814\u7a76\u5982\u4f55\u7406\u89e3\u548c\u51cf\u8f7bLLMs\u5e26\u6765\u7684\u57fa\u4e8e\u63a8\u65ad\u7684\u9690\u79c1\u5a01\u80c1\u63d0\u4f9b\u4e86\u5f3a\u5927\u4e14\u9690\u79c1\u4fdd\u62a4\u7684\u57fa\u7840\u3002**|\n", "2406.07021": "|**2024-06-11**|**A Tool for Test Case Scenarios Generation Using Large Language Models**|Abdul Malik Sami et.al.|[2406.07021](http://arxiv.org/abs/2406.07021)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u6db5\u76d6\u4ee3\u7801\u751f\u6210\u3001\u8f6f\u4ef6\u8bbe\u8ba1\u548c\u6587\u6863\u7f16\u5199\u3001\u6dfb\u52a0\u4ee3\u7801\u6ce8\u91ca\u3001\u4ee3\u7801\u5ba1\u67e5\u4ee5\u53ca\u7f16\u5199\u6d4b\u8bd5\u811a\u672c\u7b49\u4efb\u52a1\u3002\u7136\u800c\uff0c\u521b\u5efa\u6d4b\u8bd5\u811a\u672c\u6216\u81ea\u52a8\u5316\u6d4b\u8bd5\u6848\u4f8b\u9700\u8981\u4e0e\u529f\u80fd\u9700\u6c42\u7d27\u5bc6\u76f8\u5173\u7684\u8be6\u5c3d\u6d4b\u8bd5\u5957\u4ef6\u6587\u6863\u3002\u8fd9\u79cd\u6587\u6863\u5e94\u80fd\u5728\u6709\u9650\u7684\u65f6\u95f4\u548c\u8303\u56f4\u5185\u5b9e\u73b0\u5168\u9762\u6d4b\u8bd5\uff0c\u5c24\u5176\u5f53\u9700\u6c42\u548c\u7528\u6237\u671f\u671b\u4e0d\u65ad\u53d8\u5316\u65f6\u3002\u672c\u6587\u4e3b\u8981\u5173\u6ce8\u6839\u636e\u7528\u6237\u9700\u6c42\u751f\u6210\u53f2\u8bd7\u7ea7\uff08epics\uff09\u54
8c\u9ad8\u5c42\u6b21\u7528\u6237\u6545\u4e8b\uff0c\u7136\u540e\u57fa\u4e8e\u8fd9\u4e9b\u6545\u4e8b\u8bbe\u8ba1\u6d4b\u8bd5\u573a\u666f\u3002\u6587\u7ae0\u4ecb\u7ecd\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u4ee3\u7406\u548c\u63d0\u793a\u5de5\u7a0b\u7684\u7f51\u7edc\u8f6f\u4ef6\u5de5\u5177\uff0c\u8be5\u5de5\u5177\u80fd\u591f\u81ea\u52a8\u5316\u9488\u5bf9\u7528\u6237\u9700\u6c42\u751f\u6210\u6d4b\u8bd5\u573a\u666f\u7684\u8fc7\u7a0b\u3002|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho et.al.|[2406.06947](http://arxiv.org/abs/2406.06947)|**[link](https://github.com/caap-agent/caap-agent)**|**\u957f\u671f\u4ee5\u6765\uff0c\u8f6f\u4ef6\u673a\u5668\u4eba\u5df2\u7ecf\u5728\u673a\u5668\u4eba\u6d41\u7a0b\u81ea\u52a8\u5316\uff08RPA\uff09\u4e2d\u7528\u4e8e\u6267\u884c\u67af\u71e5\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5148\u8fdb\u63a8\u7406\u80fd\u529b\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u4ee3\u7406\u73b0\u5728\u80fd\u591f\u5904\u7406\u66f4\u590d\u6742\u751a\u81f3\u524d\u6240\u672a\u89c1\u7684\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5f53\u524d\u6587\u732e\u4e2d\u7684\u57fa\u4e8eLLM\u7684\u81ea\u52a8\u5316\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8eHTML\u6e90\u4ee3\u7801\u4f5c\u4e3a\u8f93\u5165\uff0c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u975e\u7f51\u7edc\u73af\u5883\u7684\u5e94\u7528\u3002HTML\u4ee3\u7801\u4e2d\u7684\u4fe1\u606f\u5e38\u5e38\u4e0d\u51c6\u786e\u6216\u4e0d\u5b8c\u6574\uff0c\u8fd9\u964d\u4f4e\u4e86\u4ee3\u7406\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4ec5\u57fa\u4e8e\u5c4f\u5e55\u622a\u56fe\u7684LLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u5b83\u4e13\u6ce8\u4e8e\u8bc6\u522b\u73af\u5883\uff0c\u5e76\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u6765\u6d88\u9664\u5bf9\u5927\u91cf\u4eba\u7c7b\u6f14\u793a\u6570\u636e\u7684\u9700\u6c42\u3002\u6211\u4eec\u768
4\u7b56\u7565\u540d\u4e3a\u201c\u4e0a\u4e0b\u6587\u611f\u77e5\u884c\u52a8\u89c4\u5212\u201d\uff08Context-Aware Action Planning\uff0cCAAP\uff09\u63d0\u793a\uff0c\u9f13\u52b1\u4ee3\u7406\u4ece\u591a\u4e2a\u89d2\u5ea6\u4ed4\u7ec6\u5ba1\u67e5\u4e0a\u4e0b\u6587\u3002\u901a\u8fc7\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u572867\u79cdMiniWoB++\u95ee\u9898\u4e0a\u5b9e\u73b0\u4e8694.4%\u7684\u6210\u529f\u7387\uff0c\u6bcf\u4e2a\u95ee\u9898\u7c7b\u578b\u53ea\u97001.48\u6b21\u6f14\u793a\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u7279\u522b\u662f\u5728\u9700\u8981\u5728\u8ba1\u7b97\u673a\u6216\u667a\u80fd\u624b\u673a\u4e4b\u95f4\u8fdb\u884c\u8de8\u5e94\u7528\u534f\u8c03\u7684\u4efb\u52a1\u4e0a\uff0c\u6807\u5fd7\u7740\u81ea\u52a8\u5316\u4ee3\u7406\u9886\u57df\u7684\u91cd\u5927\u8fdb\u6b65\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728https://github.com/caap-agent/caap-agent\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli 
et.al.|[2406.06613](http://arxiv.org/abs/2406.06613)|**[link](https://github.com/Joshuaclymer/GameBench)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5728\u8bb8\u591a\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u5c11\u91cf\u6837\u672c\u6027\u80fd\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5c55\u793a\u8fc7\u5728\u590d\u6742\u7b56\u7565\u573a\u666f\u4e2d\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f46\u7f3a\u4e4f\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5728\u6e38\u620f\u4e2d\u7684\u5404\u79cd\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86GameBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u8de8\u9886\u57df\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6218\u7565\u601d\u7ef4\u80fd\u529b\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e9\u4e2a\u4e0d\u540c\u7684\u6e38\u620f\u73af\u5883\uff0c\u6bcf\u4e2a\u6e38\u620f\u81f3\u5c11\u6db5\u76d6\u4e00\u79cd\u5728\u7b56\u7565\u6e38\u620f\u4e2d\u8bc6\u522b\u51fa\u7684\u5173\u952e\u63a8\u7406\u6280\u80fd\uff0c\u5e76\u9009\u62e9\u90a3\u4e9b\u6218\u7565\u89e3\u91ca\u4e0d\u592a\u53ef\u80fd\u6784\u6210\u6a21\u578b\u9884\u8bad\u7ec3\u6570\u636e\u4e3b\u8981\u90e8\u5206\u7684\u6e38\u620f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4f7f\u7528\u4e86\u57fa\u7840\u5f62\u5f0f\u7684GPT-3\u548cGPT-4\uff0c\u4ee5\u53ca\u4e24\u4e2a\u65e8\u5728\u589e\u5f3a\u6218\u7565\u63a8\u7406\u80fd\u529b\u7684\u5f15\u5bfc\u6846\u67b6\uff1aChain-of-Thought\uff08CoT\uff09\u63d0\u793a\u548cReasoning Via 
Planning\uff08RAP\uff09\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6240\u6709\u6d4b\u8bd5\u6a21\u578b\u7684\u8868\u73b0\u90fd\u6ca1\u6709\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\uff0c\u6700\u5dee\u7684\u662fGPT-4\u7684\u8868\u73b0\u751a\u81f3\u4f4e\u4e8e\u968f\u673a\u884c\u52a8\u3002CoT\u548cRAP\u90fd\u63d0\u9ad8\u4e86\u5206\u6570\uff0c\u4f46\u4ecd\u8fdc\u672a\u8fbe\u5230\u4eba\u7c7b\u6c34\u5e73\u3002**|\n", "2406.08184": "|**2024-06-12**|**MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents**|Luyuan Wang et.al.|[2406.08184](http://arxiv.org/abs/2406.08184)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u624b\u673a\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4e0a\u7684\u76f4\u63a5\u4ea4\u4e92\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u4ee5\u53ca\u5b83\u4eec\u5728\u81ea\u4e3b\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u57fa\u4e8eLLMs\u7684\u79fb\u52a8\u4ee3\u7406\u6b63\u9010\u6e10\u53d7\u5230\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5e94\u7528\u7a0b\u5e8f\u7684\u65e0\u9650\u72b6\u6001\u548c\u53ef\u884c\u52a8\u4f5c\u5e8f\u5217\u7684\u6a21\u7cca\u5b9a\u4e49\uff0c\u5bf9\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u6027\u80fd\u7684\u57fa\u51c6\u7814\u7a76\u76f8\u5bf9\u532e\u4e4f\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u7528\u6237\u53cb\u597d\u7684\u57fa\u51c6\u5de5\u5177\u2014\u2014MobileAgentBench\uff0c\u65e8\u5728\u51cf\u8f7b\u7e41\u7410\u7684\u624b\u52a8\u6d4b\u8bd5\u8d1f\u62c5\u3002\u6211\u4eec\u9996\u5148\u5b9a\u4e49\u4e86\u6db5\u76d610\u4e2a\u5f00\u6e90\u5e94\u7528\u7684100\u9879\u4efb\u52a1\uff0c\u6309\u96be\u5ea6\u5206\u4e3a\u591a\u4e2a\u7ea7\u522b\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5305\u62ecAppAgent\u548cMobileAgent\u5728\u5185\u7684\u591a\u4e2a\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u5168\u9762\u7cfb\u7edf\u573
0\u6bd4\u8f83\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6240\u6709\u76f8\u5173\u6750\u6599\u5747\u53ef\u5728\u6211\u4eec\u7684\u9879\u76ee\u7f51\u7ad9https://MobileAgentBench.github.io\u4e0a\u83b7\u53d6\uff0c\u8fd9\u5c06\u63a8\u52a8\u5b66\u672f\u548c\u5de5\u4e1a\u9886\u57df\u7684\u8fdb\u6b65\u3002|\n", "2406.07973": "|**2024-06-12**|**Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey**|Shang Wang et.al.|[2406.07973](http://arxiv.org/abs/2406.07973)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u8fd9\u4e9b\u6a21\u578b\u901a\u8fc7\u5927\u91cf\u6570\u636e\u8bad\u7ec3\uff0c\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u673a\u5668\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u7b49\u5404\u79cd\u5e94\u7528\u3002\u7136\u800c\uff0cLLMs\u5728\u5176\u751f\u547d\u5468\u671f\u4e2d\u66b4\u9732\u51fa\u4e00\u7cfb\u5217\u9690\u79c1\u548c\u5b89\u5168\u95ee\u9898\uff0c\u8fd9\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u8fd9\u4e9b\u95ee\u9898\u4e0e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\u5177\u6709\u72ec\u7279\u6027\uff0c\u9274\u4e8e\u5f53\u524d\u7684\u7efc\u8ff0\u7f3a\u4e4f\u9488\u5bf9\u4e0d\u540c\u573a\u666f\u7684\u6e05\u6670\u5a01\u80c1\u5206\u7c7b\uff0c\u6211\u4eec\u6839\u636e\u4e94\u4e2a\u573a\u666f\uff1a\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001RAG\u7cfb\u7edf\u3001\u90e8\u7f72\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5f3a\u8c03\u4e86\u72ec\u7279\u7684\u98ce\u9669\u3002\u8003\u8651\u5230\u6bcf\u79cd\u5a01\u80c1\u7684\u7279\u6027\uff0c\u672c\u8c03\u67e5\u63d0\u4f9b\u4e86\u6f5c\u5728\u5a01\u80c1\u548c\u5e94\u5bf9\u7b56\u7565\u3002\u7814\u7a76LLMs\u6240\u9762\u4e34\u7684\u653b\u51fb\u548c\u9632\u5fa1\u60c5\u51b5\uff0c\u53ef\u4ee5\u4e3a\u66f4\u591a\
u9886\u57df\u63d0\u4f9b\u53ef\u884c\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4f7f\u66f4\u591a\u4eba\u80fd\u591f\u53d7\u76ca\u4e8eLLMs\u3002|\n", "2406.07914": "|**2024-06-14**|**Can Large Language Models Understand Spatial Audio?**|Changli Tang et.al.|[2406.07914](http://arxiv.org/abs/2406.07914)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u638c\u63e1\u591a\u901a\u9053\u97f3\u9891\u4e2d\u7684\u7a7a\u95f4\u4fe1\u606f\uff0c\u8fd9\u662f\u5f53\u524d\u542c\u89c9LLMs\u6240\u7f3a\u4e4f\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5229\u7528LLMs\u7684\u9ad8\u7ea7\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u97f3\u9891\u63d0\u5347\u6a21\u578b\u5bf9\u4e09\u7ef4\u73af\u5883\u7684\u7406\u89e3\u3002\u7814\u7a76\u6d89\u53ca\u4e09\u9879\u7a7a\u95f4\u97f3\u9891\u4efb\u52a1\uff1a\u58f0\u6e90\u5b9a\u4f4d\uff08SSL\uff09\u3001\u8fdc\u573a\u8bed\u97f3\u8bc6\u522b\uff08FSR\uff09\u548c\u57fa\u4e8e\u4f4d\u7f6e\u7684\u8bed\u97f3\u63d0\u53d6\uff08LSE\uff09\uff0c\u5728\u6bcf\u4e2a\u4efb\u52a1\u4e0a\u90fd\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u5728SSL\u65b9\u9762\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spatial LibriSpeech\u6570\u636e\u96c6\u4e0a\u7684\u5e73\u5747\u7edd\u5bf9\u8bef\u5dee\uff08MAE\uff09\u8fbe\u52302.70\u00b0\uff0c\u660e\u663e\u4f18\u4e8e\u5148\u524d\u7ea66.60\u00b0\u7684\u57fa\u51c6\u3002\u6b64\u5916\uff0c\u6a21\u578b\u80fd\u591f\u5229\u7528\u7a7a\u95f4\u7ebf\u7d22\u63d0\u9ad8FSR\u7684\u51c6\u786e\u6027\uff0c\u5e76\u901a\u8fc7\u6587\u672c\u63d0\u793a\uff0c\u6839\u636e\u6307\u5b9a\u65b9\u5411\u805a\u7126\u4e8e\u58f0\u97f3\uff0c\u5373\u4f7f\u5728\u91cd\u53e0\u8bed\u97f3\u73af\u5883\u4e2d\u4e5f\u80fd\u6267\u884cLSE\u3002\u8fd9\u4e9b\u6210\u679c\u63ed\u793a\u4e86LLMs\u9002\u5e94\u7269\u7406\u97f3\u9891\u6982\u5ff5\u7684\u6f5c\u529b\uff0c\u4e3a\u6784\u5efa\u57fa\u4e8eLLM\u7684\u4e09\u7ef4\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.09187": 
"|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang et.al.|[2406.09187](http://arxiv.org/abs/2406.09187)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0cLLM\u9a71\u52a8\u7684\u4ee3\u7406\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u5e94\u7528\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u5176\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u7684\u65b0\u62c5\u5fe7\u3002\u73b0\u6709\u7684\u63d0\u5347LLM\u5b89\u5168\u6027\u7684\u65b9\u6cd5\u5e76\u4e0d\u76f4\u63a5\u9002\u7528\u4e8eLLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u56e0\u4e3a\u5b83\u4eec\u5177\u6709\u4e0d\u540c\u7684\u76ee\u6807\u548c\u8f93\u51fa\u6a21\u5f0f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\u2014\u2014GuardAgent\uff0c\u5b83\u4f5c\u4e3a\u5176\u4ed6LLM\u4ee3\u7406\u7684\u201c\u9632\u62a4\u680f\u201d\u3002GuardAgent\u901a\u8fc7\u68c0\u67e5\u5176\u8f93\u5165/\u8f93\u51fa\u662f\u5426\u6ee1\u8db3\u7528\u6237\u5b9a\u4e49\u7684\u4e00\u7cfb\u5217\u5b88\u62a4\u8bf7\u6c42\u6765\u76d1\u7763\u76ee\u6807LLM\u3002GuardAgent\u5206\u4e3a\u4e24\u6b65\uff1a1\uff09\u5206\u6790\u63d0\u4f9b\u7684\u5b88\u62a4\u8bf7\u6c42\u521b\u5efa\u4efb\u52a1\u8ba1\u5212\uff1b2\uff09\u6839\u636e\u4efb\u52a1\u8ba1\u5212\u751f\u6210\u5b88\u62a4\u4ee3\u7801\uff0c\u5e76\u901a\u8fc7API\u8c03\u7528\u6216\u5916\u90e8\u5f15\u64ce\u6267\u884c\u3002\u6574\u4e2a\u8fc7\u7a0b\u5229\u7528LLM\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u7ec4\u4ef6\uff0c\u7ed3\u5408\u8bb0\u5fc6\u6a21\u5757\u4e2d\u7684\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u589e\u5f3a\u4e86\u77e5\u8bc6\u9a71\u52a8\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u5404\u79cd\u6587\u672c\u5b88\u62a4\u8bf7\u6c42\u5e76\u51c6\u786e\u5730\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u6267\u884c\u4ee3\u7801\uff0c\u63d0\u4f9b\u53ef\u9760\u7684\u5b89\u5168\u4fdd\u969c\u3002 
GuardAgent\u8fd8\u914d\u5907\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u5de5\u5177\u7bb1\uff0c\u5305\u542b\u51fd\u6570\u548cAPI\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3LLM\uff0c\u5f3a\u8c03\u4e86\u5176\u901a\u7528\u6027\u53ca\u4f4e\u8fd0\u8425\u6210\u672c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff1aEICU-AC\u7528\u4e8e\u8bc4\u4f30\u533b\u7597\u5065\u5eb7\u4ee3\u7406\u7684\u9690\u79c1\u76f8\u5173\u8bbf\u95ee\u63a7\u5236\uff0cMind2Web-SC\u7528\u4e8e\u8bc4\u4f30\u7f51\u7edc\u4ee3\u7406\u7684\u5b89\u5168\u6027\u3002\u5728\u8fd9\u4e9b\u57fa\u51c6\u4e0a\uff0cGuardAgent\u5206\u522b\u572898.7%\u548c90.0%\u7684\u7cbe\u5ea6\u4e0b\u6709\u6548\u7ba1\u7406\u4e86\u4e24\u79cd\u7c7b\u578b\u4ee3\u7406\u7684\u65e0\u6548\u8f93\u5165\u548c\u8f93\u51fa\u3002\u5b9e\u9a8c\u8fd8\u8868\u660e\uff0cGuardAgent\u80fd\u591f\u9002\u5e94\u65b0\u5174\u7684LLM\u4ee3\u7406\u548c\u5b88\u62a4\u8bf7\u6c42\uff0c\u5b9a\u4e49\u65b0\u7684\u529f\u80fd\uff0c\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2406.08979": "|**2024-06-13**|**Multi-Agent Software Development through Cross-Team Collaboration**|Zhuoyun Du et.al.|[2406.08979](http://arxiv.org/abs/2406.08979)|**[link](https://github.com/openbmb/chatdev)**|**
\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\uff0c\u5982ChatDev\uff0c\u63a8\u52a8\u4e86\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u6df1\u523b\u53d8\u9769\uff0c\u7279\u522b\u4f53\u73b0\u5728\u591a\u4ee3\u7406\u534f\u4f5c\u4e0a\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u50cf\u4eba\u7c7b\u56e2\u961f\u4e00\u6837\u5408\u4f5c\uff0c\u9075\u5faa\u7011\u5e03\u6a21\u578b\u8fdb\u884c\u9700\u6c42\u5206\u6790\u3001\u5f00\u53d1\u3001\u5ba1\u67e5\u3001\u6d4b\u8bd5\u7b49\u9636\u6bb5\uff0c\u5b9e\u73b0\u81ea\u4e3b\u8f6f\u4ef6\u751f\u6210\u3002\u7136\u800c\uff0c\u5355\u4e2a\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u6bcf\u4e2a\u9636\u6bb5\u53ea\u4f1a\u4ea7\u751f\u4e00\u79cd\u53ef\u80fd\u7ed3\u679c\uff0c\u5bfc\u81f4\u53ea\u5b8c\u6210\u4e00\u6761\u5f00\u53d1\u94fe\uff0c\u4ece\u800c\u4e27\u5931\u5728\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\u4e2d\u63a2\u7d22\u591a\u79cd\u51b3\u7b56\u8def\u5f84\u7684\u673a\u4f1a\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u8de8\u56e2\u961f\u534f\u4f5c\uff08Cross-Team Collaboration\uff0cCTC\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u591a\u56e2\u961f\u7ed3\u6784\uff0c\u5b83\u5141\u8bb8\u534f\u540c\u5de5\u4f5c\u7684\u56e2\u961f\u5728\u8de8\u56e2\u961f\u534f\u4f5c\u73af\u5883\u4e2d\u5171\u540c\u63d0\u51fa\u51b3\u7b56\uff0c\u5e76\u4ea4\u6d41\u5404\u81ea\u89c1\u89e3\uff0c\u4ee5\u4f18\u5316\u5185\u5bb9\u751f\u6210\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u57fa\u51c6\uff0c\u8bc1\u5b9e\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u5728\u6545\u4e8b\u751f\u6210\u65b9\u9762\u7684\u663e\u8457\u6539\u8fdb\u8868\u660e\uff0c\u8be5\u6846\u67b6\u5177\u6709\u5e7f\u6cdb\u7684\u8de8\u9886\u57df\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u671f\u5f85\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u5f15\u5bfcLLMs\u5411\u8de8\u56e2\u961f\u6a21\u5f0f\u53d1\u5c55\uff0c\u5e76\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u9886\u57df\u5e26\u6765\u91cd\u5927\u8fdb\u6b65\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.08747": "|**2024-06-13**|**StreamBench: Towards Benchmarking Continuous Improvement of Language Agents**|Cheng-Kuang Wu et.al.|[2406.08747](http://arxiv.org/abs/2406.08747)|**[link](https://github.com/stream-bench/stream-bench)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u4ece\u7ecf\u9a8c\u4e2d\u81ea\u6211\u63d0\u5347\uff0c\u8fd9\u662f\u90e8\u7f72\u540e\u6301\u7eed\u6539\u8fdb\u7684\u91cd\u8981\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u4e3b\u8981\u8bc4\u4f30\u5b83\u4eec\u7684\u56fa\u6709\u80fd\u529b\uff0c\u800c\u4e0d\u8003\u5bdf\u5b83\u4eec\u968f\u65f6\u95f4\u6539\u8fdb\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86StreamBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u521b\u6027\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u8f93\u5165-\u53cd\u9988\u5e8f\u5217\u4e0a\u7684\u8fde\u7eed\u6539\u8fdb\u6027\u80fd\u3002StreamBench\u6a21\u62df\u4e86\u4e00\u4e2a\u5728\u7ebf\u5b66\u4e60\u73af\u5883\uff0c\u5176\u4e2dLLMs\u63a5\u6536\u5230\u8fde\u7eed\u7684\u53cd\u9988\u6d41\uff0c\u5e76\u8fed\u4ee3\u5730\u63d0\u5347\u5176\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u7b80\u5355\u4f46\u6709\u6548\u7684LLM\u57fa\u7ebf\uff0c\u5e76\u5bf9\u5f71\u54cd\u6210\u529f\u6d41\u5f0f\u7b56\u7565\u7684\u5173\u952e\u7ec4\u4ef6\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u5f00\u53d1LLMs\u7684\u6709\u6548\u5728\u7ebf\u5b66\u4e60\u7b56\u7565\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u4e3a\u6d41\u5f0f\u573a\u666f\u4e2d\u7684\u66f4\u9002\u5e94\u6027AI\u7cfb\u7edf\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.11277": "|**2024-06-17**|**Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector**|Xiaoxue Cheng et.al.|[2406.11277](http://arxiv.org/abs/2406.11277)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7b\u89c9\u68c0\u6d4b\u65b9\u9762\u7684\u6311\u6218\uff0c\u7279\u522b\u6307\u51fa\u4ee5\u5f80\u7814\u7a76\u4e3b\u8981\u4f9d\u8d56\u4e8e\u5f3a\u5927\u7684\u95ed\u6e90\u6a21\u578b\u5982GPT-4\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u4e3b\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u79f0\u4e3aHaluAgent\uff0c\u5b83\u5141\u8bb8\u8f83\u5c0f\u7684\u6a21\u578b\uff08\u5982Baichuan2-Chat 
7B\uff09\u4e3b\u52a8\u9009\u62e9\u9002\u5408\u68c0\u6d4b\u6587\u672c\u3001\u4ee3\u7801\u548c\u6570\u5b66\u8868\u8fbe\u5f0f\u7b49\u591a\u79cd\u5e7b\u89c9\u7c7b\u578b\u7684\u5de5\u5177\u3002HaluAgent\u6574\u5408\u4e86LLM\u3001\u591a\u529f\u80fd\u5de5\u5177\u7bb1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7ec6\u7c92\u5ea6\u7684\u4e09\u9636\u6bb5\u68c0\u6d4b\u6846\u67b6\uff0c\u540c\u65f6\u914d\u5907\u4e86\u8bb0\u5fc6\u673a\u5236\u3002\u4e3a\u4e86\u63d0\u9ad8HaluAgent\u7684\u6548\u80fd\uff0c\u8bba\u6587\u5229\u7528\u73b0\u6709\u7684\u4e2d\u6587\u548c\u82f1\u6587\u6570\u636e\u96c6\u5408\u6210\u68c0\u6d4b\u8f68\u8ff9\u8fdb\u884c\u5fae\u8c03\uff0c\u4f7f\u5176\u5177\u5907\u53cc\u8bed\u5e7b\u89c9\u68c0\u6d4b\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4ec5\u4f7f\u75282000\u4e2a\u6837\u672c\u5bf9LLM\u8fdb\u884c\u8c03\u4f18\u540e\uff0cHaluAgent\u5728\u5404\u79cd\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5176\u6027\u80fd\u53ef\u4e0eGPT-4\u5ab2\u7f8e\uff0c\u751a\u81f3\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8d85\u8d8a\uff0c\u4e14\u65e0\u9700\u989d\u5916\u5de5\u5177\u589e\u5f3a\uff0c\u65e0\u8bba\u5728\u9886\u57df\u5185\u8fd8\u662f\u9886\u57df\u5916\u7684\u6570\u636e\u96c6\u4e0a\u90fd\u5c55\u73b0\u51fa\u826f\u597d\u6027\u80fd\u3002\u8bba\u6587\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53d1\u5e03\u5728https://github.com/RUCAIBox/HaluAgent\u3002|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu 
et.al.|[2406.11200](http://arxiv.org/abs/2406.11200)|**[link](https://github.com/zou-group/avatar)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5229\u7528\u5916\u90e8\u5de5\u5177\u548c\u77e5\u8bc6\u63d0\u5347\u51c6\u786e\u6027\u548c\u51cf\u5c11\u9519\u8bef\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bbe\u8ba1\u80fd\u8ba9LLMs\u6709\u6548\u8fd0\u7528\u8fd9\u4e9b\u5de5\u5177\u7684\u63d0\u793a\u6280\u5de7\u662f\u4e00\u9879\u8017\u65f6\u4e14\u4f9d\u8d56\u76f4\u89c9\u7684\u4efb\u52a1\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faAvaTaR\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u5b83\u80fd\u4f18\u5316LLMs\uff0c\u4f7f\u5176\u66f4\u6709\u6548\u5730\u5229\u7528\u63d0\u4f9b\u7684\u5de5\u5177\uff0c\u5e76\u5728\u7279\u5b9a\u4efb\u52a1\u6216\u9886\u57df\u4e2d\u63d0\u5347\u6027\u80fd\u3002AvaTaR\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u6bd4\u8f83\u5668\u6a21\u5757\uff0c\u4ee5\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u6b63\u8d1f\u6837\u672c\u8fdb\u884c\u63a8\u7406\uff0c\u8fed\u4ee3\u5730\u4e3aLLM\u63d0\u4f9b\u5bcc\u6709\u6d1e\u5bdf\u529b\u548c\u5168\u9762\u7684\u63d0\u793a\u3002\u6211\u4eec\u5728\u56db\u4e2a\u5305\u542b\u6587\u672c\u3001\u89c6\u89c9\u548c\u5173\u7cfb\u4fe1\u606f\u7684\u590d\u6742\u591a\u6a21\u6001\u68c0\u7d22\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86AvaTaR\u7684\u6548\u679c\u3002\u5b9e\u9a8c\u8868\u660e\uff0cAvaTaR\u5728\u6240\u6709\u56db\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u4e2d\u5747\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5f53\u5e94\u7528\u4e8e\u65b0\u6848\u4f8b\u65f6\uff0c\u5e73\u5747\u5728Hit@1\u6307\u6807\u4e0a\u5b9e\u73b0\u4e8614%\u7684\u76f8\u5bf9\u6539\u8fdb\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u4e0a\u516c\u5f00\u3002**|\n", "2406.11176": "|**2024-06-17**|**Watch Every Step! 
LLM Agent Learning via Iterative Step-Level Process Refinement**|Weimin Xiong et.al.|[2406.11176](http://arxiv.org/abs/2406.11176)|**[link](https://github.com/weiminxiong/ipr)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7cfb\u5217\u590d\u6742\u7684\u4ea4\u4e92\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u503e\u5411\u4e8e\u901a\u8fc7\u4e13\u5bb6\u8f68\u8ff9\u8c03\u4f18\u6765\u63d0\u5347\u6a21\u578b\u6548\u679c\uff0c\u4f46\u4e3b\u8981\u5173\u6ce8\u6700\u7ec8\u7ed3\u679c\u5956\u52b1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u6216\u975e\u6700\u4f18\u884c\u4e3a\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u8fc7\u7a0b\u76d1\u7763\u4fe1\u53f7\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u8fed\u4ee3\u6b65\u7ea7\u8fc7\u7a0b\u6539\u8fdb\uff08Iterative Step-level Process Refinement\uff0cIPR\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u63d0\u4f9b\u4e86\u7ec6\u81f4\u7684\u9010\u6b65\u9aa4\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u91c7\u7528\u8499\u7279\u5361\u6d1b\u65b9\u6cd5\u4f30\u7b97\u6bcf\u4e00\u6b65\u7684\u5956\u52b1\u3002\u5728\u6bcf\u4e2a\u8fed\u4ee3\u4e2d\uff0c\u6a21\u578b\u6cbf\u7740\u4e13\u5bb6\u8f68\u8ff9\u63a2\u7d22\u5e76\u751f\u6210\u65b0\u52a8\u4f5c\uff0c\u7136\u540e\u4e0e\u4e13\u5bb6\u8f68\u8ff9\u7684\u76f8\u5e94\u6b65\u9aa4\u8fdb\u884c\u6bd4\u8f83\uff0c\u4f7f\u7528\u6b65\u7ea7\u5956\u52b1\u8bc4\u4f30\u3002\u8fd9\u79cd\u6bd4\u8f83\u6709\u52a9\u4e8e\u8bc6\u522b\u5dee\u5f02\uff0c\u5f62\u6210\u7528\u4e8e\u8bad\u7ec3\u7684\u5bf9\u6bd4\u52a8\u4f5c\u5bf9\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u590d\u6742\u4ee3\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4f18\u4e8e\u591a\u79cd\u5f3a\u5927\u7684\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u7ed3\u679c\u63ed\u793a\u4e86IPR\u5728\u63d0\u5347\u52a8\u4f5c\u6548\u7387\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc1\u660e\u5176\u9002\u7528\u4e8
e\u5404\u79cd\u6a21\u578b\u3002**|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132](http://arxiv.org/abs/2406.11132)|null|\u5728\u8fc7\u53bb\u7684\u4e00\u5e74\u91cc\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f20\u7edf\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u4e4b\u5916\u5c55\u73b0\u51fa\u60ca\u4eba\u6210\u5c31\uff0c\u4eba\u4eec\u5f00\u59cb\u63a2\u7d22\u5728\u4ee3\u7801\u751f\u6210\u3001\u65c5\u884c\u89c4\u5212\u548c\u673a\u5668\u4eba\u63a7\u5236\u7b49\u66f4\u5177\u4f53\u7684\u5e94\u7528\u9886\u57df\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u3002\u901a\u8fc7\u4e0eLLM\u6784\u5efa\u6240\u8c13\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\u534f\u52a9\u4eba\u4eec\u5b8c\u6210\u65e5\u5e38\u751f\u6d3b\u4e2d\u7684\u5404\u79cd\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5bf9LLMs\u7684\u63d0\u793a\u8bed\u53e5\u5bf9\u751f\u6210\u5185\u5bb9\u53ca\u5176\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u56e0\u6b64\uff0c\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u6210\u4e3a\u8bb8\u591a\u7814\u7a76\u4eba\u5458\u548cLLM\u7528\u6237\u5173\u6ce8\u7684\u7126\u70b9\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u540d\u4e3a\\textsc{RePrompt}\uff0c\u5b83\u5229\u7528\u4e0eLLM\u4ee3\u7406\u4ea4\u4e92\u83b7\u53d6\u7684\u5bf9\u8bdd\u5386\u53f2\uff0c\u901a\u8fc7\u201c\u68af\u5ea6\u4e0b\u964d\u201d\u4f18\u5316LLM\u7684\u9010\u6b65\u6307\u4ee4\u3002\u901a\u8fc7\u4f18\u5316\u63d0\u793a\uff0cLLM\u80fd\u591f\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u89c4\u5212\u7b56\u7565\u3002\u6211\u4eec\u5728PDDL\u751f\u6210\u548c\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u4e2d\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528\u66f4\u65b0\u540e\u7684\u63d0\u793a\u4f5c\u4e3a\u521d\u59cb\u63d0\u793a\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u5e38\u53ef\u4ee5\u63d0\u9ad8\u4e0d\u540c\u63a8\u7406\u4efb\u52a1\u7684\u6027\u80fd\u3002|\n", "2406.10918": 
"|**2024-06-18**|**Embodied Question Answering via Multi-LLM Systems**|Bhrij Patel et.al.|[2406.10918](http://arxiv.org/abs/2406.10918)|null|## \u80cc\u666f Embodied Question Answering\uff08EQA\uff09\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5b83\u6d89\u53ca\u4e00\u4e2a\u4ee3\u7406\u5728\u73af\u5883\u4e2d\u63a2\u7d22\u4ee5\u56de\u7b54\u7528\u6237\u67e5\u8be2\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5355\u4ee3\u7406\u573a\u666f\u4e2d\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u63a2\u7d22\u65f6\u95f4\u5197\u957f\u4e14\u6210\u672c\u9ad8\u6602\u3002\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u591a\u4ee3\u7406\u6846\u67b6\u4e0b\u7684EQA\uff0c\u5176\u4e2d\u6d89\u53ca\u591a\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u72ec\u7acb\u4ee3\u7406\uff0c\u5b83\u4eec\u5404\u81ea\u89e3\u7b54\u5173\u4e8e\u5bb6\u5ead\u73af\u5883\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u4e3a\u6bcf\u4e2a\u67e5\u8be2\u751f\u6210\u4e00\u4e2a\u7b54\u6848\uff0c\u6211\u4eec\u5229\u7528\u5404\u4e2a\u72ec\u7acb\u54cd\u5e94\u6765\u8bad\u7ec3\u4e00\u4e2a\u4e2d\u592e\u7b54\u6848\u6a21\u578b\uff08CAM\uff09\uff0c\u8be5\u6a21\u578b\u6574\u5408\u7b54\u6848\u4ee5\u5b9e\u73b0\u66f4\u7a33\u5065\u7684\u56de\u7b54\u3002\u901a\u8fc7\u4f7f\u7528CAM\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5176\u5728EQA\u51c6\u786e\u7387\u4e0a\u6bd4\u8bf8\u5982\u6295\u7968\u673a\u5236\u548c\u8fa9\u8bba\u7b49ensemble 
LLM\u805a\u5408\u65b9\u6cd5\u9ad8\u51fa50%\u3002CAM\u65e0\u9700\u4efb\u4f55\u5f62\u5f0f\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\uff0c\u4ece\u800c\u907f\u514d\u4e86\u76f8\u5173\u5f00\u9500\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u4e0d\u540c\u7684\u975e\u7ebf\u6027\uff08\u5982\u795e\u7ecf\u7f51\u7edc\u3001\u968f\u673a\u68ee\u6797\u3001\u51b3\u7b56\u6811\u3001XGBoost\uff09\u548c\u7ebf\u6027\u7b97\u6cd5\uff08\u5982\u903b\u8f91\u56de\u5f52\u5206\u7c7b\u5668\u3001\u652f\u6301\u5411\u91cf\u673a\uff09\u5bf9CAM\u8fdb\u884c\u4e86\u6d88\u878d\u7814\u7a76\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7Permutation Feature Importance\uff08PFI\uff09\u5206\u6790\u4e86CAM\u5bf9\u6bcf\u4e2a\u72ec\u7acb\u4ee3\u7406\u548c\u67e5\u8be2\u4e0a\u4e0b\u6587\u7684\u4f9d\u8d56\u7a0b\u5ea6\uff0c\u91cf\u5316\u4e86CAM\u7684\u4f9d\u8d56\u7279\u6027\u3002|\n", "2406.10819": "|**2024-06-16**|**GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents**|Dongping Chen et.al.|[2406.10819](http://arxiv.org/abs/2406.10819)|**[link](https://github.com/keplerlab/katna)**|**\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5df2\u88ab\u7528\u4e8e\u63a7\u5236\u952e\u76d8\u548c\u9f20\u6807\u8f93\u5165\uff0c\u76f4\u63a5\u611f\u77e5\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\uff0c\u5e76\u751f\u6210\u76f8\u5e94\u7684\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6a21\u578b\u4e3b\u8981\u5728\u9759\u6001\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4e3b\u8981\u5e94\u7528\u4e8e\u76f8\u5bf9\u7b80\u5355\u7684\u9886\u57df\uff0c\u5982\u7f51\u9875\u6216\u79fb\u52a8\u754c\u9762\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4e00\u4e2a\u7a33\u5065\u7684GUI\u4ee3\u7406\u5e94\u5177\u5907\u7406\u89e3GUI\u7684\u65f6\u7a7a\u4fe1\u606f\u80fd\u529b\uff0c\u5305\u62ec\u52a8\u6001\u7f51\u9875\u5185\u5bb9\u548c\u591a\u6b65\u9aa4\u4efb\u52a1\uff0c\u8fd8\u8981\u5168\u9762\u7406\u89e3\u5404\u79cdGUI\u573a\u666f\uff0c\u5305\u62ec\u684c\u9762\u8f6f\u4ef6\u548c\u591a\u7a97\u53e3\u4ea
4\u4e92\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u6570\u636e\u96c6\u2014\u2014GUI-World\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u7cbe\u5fc3\u5236\u4f5c\u7684\u4eba\u673a\u6807\u6ce8\uff0c\u5e7f\u6cdb\u6db5\u76d6\u516d\u79cdGUI\u573a\u666f\u548c\u516b\u7c7bGUI\u76f8\u5173\u95ee\u9898\uff0c\u4ee5\u4e09\u79cd\u683c\u5f0f\u5448\u73b0\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684MLLM\uff0c\u5982\u56fe\u50cfLLMs\u548c\u89c6\u9891LLMs\uff0c\u5728\u7406\u89e3\u548c\u5904\u7406\u4e0d\u540c\u7c7b\u578bGUI\u5185\u5bb9\uff0c\u7279\u522b\u662f\u52a8\u6001\u548c\u5e8f\u5217\u5185\u5bb9\u65b9\u9762\u7684\u80fd\u529b\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u56fe\u50cfLLMs\u5728\u6ca1\u6709\u624b\u52a8\u6807\u6ce8\u5173\u952e\u5e27\u6216\u64cd\u4f5c\u5386\u53f2\u7684\u60c5\u51b5\u4e0b\uff0c\u96be\u4ee5\u5e94\u5bf9\u52a8\u6001GUI\u5185\u5bb9\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u7531\u4e8eGUI\u89c6\u9891\u6570\u636e\u96c6\u7684\u7a00\u758f\u6027\uff0c\u89c6\u9891LLMs\u5728\u6240\u6709GUI\u76f8\u5173\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u57fa\u4e8eGUI-World\uff0c\u6211\u4eec\u9996\u6b21\u5c1d\u8bd5\u4f7f\u7528\u5fae\u8c03\u540e\u7684\u89c6\u9891LLM\u4f5c\u4e3aGUI\u4ee3\u7406\uff0c\u663e\u793a\u4e86\u5bf9\u5404\u79cdGUI\u4efb\u52a1\u7406\u89e3\u7684\u63d0\u5347\u3002\u7136\u800c\uff0c\u7531\u4e8e\u57fa\u7840LLM\u6027\u80fd\u7684\u9650\u5236\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c06\u89c6\u9891LLMs\u7528\u4f5cGUI\u4ee3\u7406\u4ecd\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u672a\u6765\u5728\u52a8\u6001GUI\u5185\u5bb9\u7406\u89e3\u65b9\u9762\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u6d1e\u89c1\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u6211\u4eec\u7684\u9879\u76ee\u4e3b\u9875https://gui-world.github.io/\u4e0a\u516c\u5f00\u3002**|\n", "2406.10803": "|**2024-06-16**|**HiddenTables & PyQTax: A Cooperative Game and 
Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies**|William Watson et.al.|[2406.10803](http://arxiv.org/abs/2406.10803)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u65f6\u9762\u4e34\u8bf8\u591a\u6311\u6218\uff0c\u4e3b\u8981\u5305\u62ec\uff1a\uff081\uff09\u5bf9\u4e8e\u5927\u8868\u683c\u6709\u9650\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\uff1b\uff082\uff09\u4e0d\u540ctoken\u5316\u6a21\u5f0f\u4e0e\u5355\u5143\u683c\u8fb9\u754c\u7684\u590d\u6742\u5dee\u5f02\uff1b\uff083\uff09\u4ee5\u53ca\u4f7f\u7528\u5916\u90e8\u6a21\u578b\u5982gpt-3.5-turbo\u65f6\u7684\u6570\u636e\u4fdd\u5bc6\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cHiddenTables\u201d\u7684\u5408\u4f5c\u6e38\u620f\u3002\u8fd9\u4e2a\u6e38\u620f\u6d89\u53ca\u4ee3\u7801\u751f\u6210LLM\u201cSolver\u201d\u548c\u8bc4\u4f30\u5176\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u80fd\u529b\u7684\u201cOracle\u201d\uff0c\u4ee5\u81ea\u7136\u8bed\u8a00\u89c4\u8303\u4e3a\u57fa\u7840\uff0c\u540c\u65f6\u4fdd\u8bc1\u6570\u636e\u5b89\u5168\u3002 \u6211\u4eec\u901a\u8fc7\u5b9e\u8bc1\u5b9e\u9a8c\u5728\u591a\u6837\u5316\u7684\u8868\u683c\u4e0a\u5c55\u793a\u4e86LLMs\u5728\u5904\u7406\u590d\u6742\u67e5\u8be2\u3001\u5904\u7406\u7ec4\u5408\u4f9d\u8d56\u4ee5\u53ca\u5c06\u81ea\u7136\u8bed\u8a00\u8f6c\u5316\u4e3a\u7a0b\u5e8f\u6307\u4ee4\u65b9\u9762\u7684\u5c40\u9650\u6027\uff0c\u7279\u522b\u662f\u5728\u63d0\u4f9b\u5177\u4f53\u8868\u683c\u7ed3\u6784\u7684\u60c5\u51b5\u4e0b\u3002\u4e0e\u57fa\u4e8e\u7f16\u7801\u5668\u7684\u6a21\u578b\u4e0d\u540c\uff0c\u201cHiddenTables\u201d\u4e0d\u53d7\u884c\u6570\u9650\u5236\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u5b8c\u6210 token 
\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u201cPyQTax\u201d\uff0c\u5305\u542b116,671\u4e2a\u95ee\u9898-\u8868\u683c-\u7b54\u6848\u4e09\u5143\u7ec4\uff0c\u5e76\u63d0\u4f9b\u4e86\u66f4\u7ec6\u81f4\u7684\u95ee\u9898\u5206\u7c7b\u548c\u6807\u7b7e\uff0c\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u6211\u4eec\u7684\u7814\u7a76\u3002 \u56e0\u6b64\uff0c\u9664\u4e86\u5b66\u672f\u8d21\u732e\uff0c\u63ed\u793a\u4e86LLMs\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684\u4e0d\u8db3\uff0c\u201cHiddenTables\u201d\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u5728\u4fdd\u969c\u6570\u636e\u5b89\u5168\u7684\u540c\u65f6\uff0c\u8ba9LLMs\u4e0e\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e92\u52a8\uff0c\u4ee5\u53ca\u964d\u4f4e\u751f\u6210\u6210\u672c\u7684\u5b9e\u8df5\u65b9\u6cd5\u3002|\n", "2406.10478": "|**2024-06-15**|**From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent**|Samuel S. 
Sohn et.al.|[2406.10478](http://arxiv.org/abs/2406.10478)|null|## \u80cc\u666f \u5728\u5a31\u4e50\u3001\u6559\u80b2\u548c\u8425\u9500\u9886\u57df\u81f3\u5173\u91cd\u8981\u7684\u6570\u5b57\u6545\u4e8b\u53d9\u8ff0\u9762\u4e34\u7740\u751f\u4ea7\u89c4\u6a21\u6269\u5c55\u548c\u7075\u6d3b\u6027\u63d0\u5347\u7684\u6311\u6218\u3002\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u7684StoryAgent\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u751f\u6210\u5de5\u5177\u6765\u81ea\u52a8\u5316\u5e76\u4f18\u5316\u6570\u5b57\u6545\u4e8b\u521b\u4f5c\u8fc7\u7a0b\u3002\u5b83\u91c7\u7528\u81ea\u4e0a\u800c\u4e0b\u7684\u6545\u4e8b\u60c5\u8282\u8349\u62df\u548c\u81ea\u4e0b\u800c\u4e0a\u7684\u8d44\u4ea7\u751f\u6210\u65b9\u6cd5\uff0c\u89e3\u51b3\u4e86\u624b\u52a8\u5e72\u9884\u3001\u4e92\u52a8\u573a\u666f\u7f16\u6392\u548c\u53d9\u4e8b\u4e00\u81f4\u6027\u7b49\u5173\u952e\u95ee\u9898\u3002\u8fd9\u4e2a\u6846\u67b6\u4fc3\u8fdb\u4e86\u4ea4\u4e92\u5f0f\u548c\u4e00\u81f4\u53d9\u4e8b\u7684\u9ad8\u6548\u751f\u4ea7\uff0c\u9002\u7528\u4e8e\u591a\u79cd\u5a92\u4ecb\uff0c\u63a8\u52a8\u4e86\u5185\u5bb9\u521b\u4f5c\u7684\u6c11\u4e3b\u5316\uff0c\u589e\u5f3a\u4e86\u7528\u6237\u7684\u53c2\u4e0e\u5ea6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8be5\u6846\u67b6\u80fd\u591f\u5728\u6ca1\u6709\u53c2\u8003\u89c6\u9891\u7684\u60c5\u51b5\u4e0b\u751f\u6210\u8fde\u8d2f\u7684\u6570\u5b57\u6545\u4e8b\uff0c\u8fd9\u6807\u5fd7\u7740\u81ea\u52a8\u6570\u5b57\u6545\u4e8b\u53d9\u8ff0\u6280\u672f\u7684\u4e00\u4e2a\u91cd\u5927\u8fdb\u6b65\u3002|\n", "2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang 
et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**\u80cc\u666f**\uff1a\u914d\u7f6e\u8bbe\u7f6e\u5bf9\u4e8e\u8c03\u6574\u8f6f\u4ef6\u884c\u4e3a\u4ee5\u6ee1\u8db3\u7279\u5b9a\u6027\u80fd\u9700\u6c42\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9519\u8bef\u914d\u7f6e\u666e\u904d\u5b58\u5728\u3002\u7531\u4e8e\u914d\u7f6e\u9879\u4f17\u591a\u4e14\u590d\u6742\uff0c\u8bc6\u522b\u5f71\u54cd\u7cfb\u7edf\u6027\u80fd\u7684\u914d\u7f6e\u662f\u4e00\u9879\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51faPerfSense\uff0c\u8fd9\u662f\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9ad8\u6548\u5730\u8bc6\u522b\u6027\u80fd\u5173\u952e\u914d\u7f6e\uff0c\u540c\u65f6\u4fdd\u6301\u4f4e\u5f00\u9500\u3002PerfSense\u5229\u7528LLM\u4ee3\u7406\u6a21\u62df\u5f00\u53d1\u8005\u548c\u6027\u80fd\u5de5\u7a0b\u5e08\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u91c7\u7528\u5148\u8fdb\u7684\u63d0\u793a\u94fe\u6280\u672f\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b49\u6280\u672f\u3002 **\u65b9\u6cd5\u4e0e\u6210\u679c**\uff1a\u6211\u4eec\u5728\u4e03\u4e2a\u5f00\u6e90Java\u7cfb\u7edf\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\uff0cPerfSense\u5728\u5206\u7c7b\u6027\u80fd\u654f\u611f\u914d\u7f6e\u65b9\u9762\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a64.77%\uff0c\u4f18\u4e8e\u57fa\u4e8eLLM\u7684\u57fa\u7ebf\uff0850.36%\uff09\u548c\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\uff0861.75%\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u7684\u63d0\u793a\u94fe\u6280\u672f\u63d0\u9ad8\u4e86\u53ec\u56de\u738710%\u81f330%\uff0c\u800c\u4fdd\u6301\u4e86\u76f8\u4f3c\u7684\u7cbe\u786e\u5ea6\u3002\u8fdb\u4e00\u6b65\u7684\u624b\u52a8\u5206\u6790362\u4e2a\u8bef\u5206\u7c7b\u6848\u4f8b\uff0c\u53d1\u73b0\u5e38\u89c1\u95ee\u9898\u5305\u62ecLLMs\u5bf9\u9700\u6c42\u7684\u7406\u89e3\u504f\u5dee\uff08\u536026.8%\uff09\u3002 
**\u7ed3\u8bba**\uff1aPerfSense\u663e\u8457\u51cf\u5c11\u4e86\u624b\u52a8\u5206\u7c7b\u6027\u80fd\u5173\u952e\u914d\u7f6e\u7684\u5de5\u4f5c\u91cf\uff0c\u5e76\u4e3a\u672a\u6765\u7684LLM\u57fa\u4e8e\u4ee3\u7801\u5206\u6790\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c2\u70b9\u3002|\n", "2406.12708": "|**2024-06-18**|**AgentReview: Exploring Peer Review Dynamics with LLM Agents**|Yiqiao Jin et.al.|[2406.12708](http://arxiv.org/abs/2406.12708)|null|\u540c\u884c\u8bc4\u5ba1\u662f\u79d1\u5b66\u51fa\u7248\u8bda\u4fe1\u548c\u8fdb\u6b65\u7684\u57fa\u7840\u3002\u4f20\u7edf\u7684\u540c\u884c\u8bc4\u5ba1\u6570\u636e\u5206\u6790\u65b9\u6cd5\u5f80\u5f80\u4fa7\u91cd\u4e8e\u73b0\u6709\u6570\u636e\u7684\u63a2\u7d22\u548c\u7edf\u8ba1\uff0c\u4f46\u672a\u80fd\u5145\u5206\u8003\u8651\u8fd9\u4e00\u8fc7\u7a0b\u7684\u591a\u53d8\u91cf\u6027\u8d28\uff0c\u5904\u7406\u6f5c\u5728\u53d8\u91cf\uff0c\u4e14\u53d7\u9650\u4e8e\u9690\u79c1\u95ee\u9898\uff0c\u56e0\u4e3a\u6570\u636e\u6d89\u53ca\u654f\u611f\u6027\u3002\u6211\u4eec\u63d0\u51faAgentReview\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u540c\u884c\u8bc4\u5ba1\u6a21\u62df\u6846\u67b6\uff0c\u6709\u6548\u5206\u89e3\u4e86\u591a\u4e2a\u6f5c\u5728\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u5e76\u89e3\u51b3\u4e86\u9690\u79c1\u95ee\u9898\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u7531\u4e8e\u793e\u4f1a\u5f71\u54cd\u529b\u7406\u8bba\u3001\u5229\u4ed6\u4e3b\u4e49\u75b2\u52b3\u548c\u6743\u5a01\u504f\u89c1\u7b49\u793e\u4f1a\u5b66\u7406\u8bba\u7684\u652f\u6301\uff0c\u8bba\u6587\u51b3\u7b56\u4e2d\u5b58\u5728\u663e\u8457\u768437.1%\u7684\u53d8\u5f02\u6027\u3002\u6211\u4eec\u76f8\u4fe1\u8fd9\u9879\u7814\u7a76\u80fd\u4e3a\u4f18\u5316\u540c\u884c\u8bc4\u5ba1\u673a\u5236\u8bbe\u8ba1\u63d0\u4f9b\u5b9d\u8d35\u89c1\u89e3\u3002|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics**|Chenggang Cui 
et.al.|[2406.12628](http://arxiv.org/abs/2406.12628)|null|\u8fd9\u7bc7\u8bba\u6587\u5173\u6ce8\u4e8e\u7535\u529b\u7535\u5b50\u7cfb\u7edf\u63a7\u5236\u8bbe\u8ba1\u4e2d\u7684\u6311\u6218\uff0c\u7279\u522b\u662f\u6a21\u578b\u4e0d\u786e\u5b9a\u6027\u4ee5\u53ca\u8bbe\u8ba1\u5468\u671f\u6f2b\u957f\u548c\u6210\u672c\u9ad8\u6602\u7684\u95ee\u9898\u3002\u8bba\u6587\u65e8\u5728\u63d0\u51fa\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u7528\u4e8e\u9762\u5411\u76ee\u6807\u7684\u7535\u529b\u7535\u5b50\u63a7\u5236\u5668\u8bbe\u8ba1\u3002\u8be5\u6846\u67b6\u5229\u7528LLMs\u7684\u63a8\u7406\u80fd\u529b\uff0c\u7ed3\u5408\u591a\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\uff0c\u65e8\u5728\u5f00\u53d1\u4e00\u4e2a\u9ad8\u6548\u4e14\u81ea\u52a8\u5316\u7684\u63a7\u5236\u5668\u8bbe\u8ba1\u6d41\u7a0b\u3002LLM\u4ee3\u7406\u80fd\u591f\u7406\u89e3\u5e76\u54cd\u5e94\u81ea\u7136\u8bed\u8a00\u7684\u9ad8\u7ea7\u6307\u4ee4\uff0c\u6839\u636e\u4efb\u52a1\u7684\u5177\u4f53\u9700\u6c42\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u7ea6\u675f\u8c03\u6574\u5176\u884c\u4e3a\u3002\u8fd9\u79cd\u65b0\u9896\u800c\u9ad8\u6548\u7684\u7b56\u7565\u6709\u671b\u663e\u8457\u63d0\u5347\u7535\u529b\u7535\u5b50\u63a7\u5236\u5668\u8bbe\u8ba1\u7684\u7075\u6d3b\u6027\u548c\u9002\u5e94\u6027\uff0c\u6781\u5927\u5730\u4fbf\u5229\u5b9e\u8df5\u8005\u7684\u5de5\u4f5c\u3002|\n", "2406.12276": "|**2024-06-18**|**CodeNav: Beyond tool-use to using real-world codebases with LLM agents**|Tanmay Gupta 
et.al.|[2406.12276](http://arxiv.org/abs/2406.12276)|null|\u6211\u4eec\u4ecb\u7ecdCodeNav\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u5bfc\u822a\u548c\u5229\u7528\u5148\u524d\u672a\u89c1\u8fc7\u7684\u4ee3\u7801\u4ed3\u5e93\uff0c\u4ee5\u89e3\u51b3\u7528\u6237\u67e5\u8be2\u7684\u7cfb\u7edf\u3002\u4e0e\u9700\u8981\u901a\u8fc7\u624b\u52a8\u63cf\u8ff0\u5728LLM\u4e0a\u4e0b\u6587\u4e2d\u201c\u6ce8\u518c\u201d\u6240\u6709\u76f8\u5173\u5de5\u5177\u7684\u5de5\u5177\u4f7f\u7528\u578bLLM\u4e0d\u540c\uff0cCodeNav\u80fd\u591f\u81ea\u52a8\u7d22\u5f15\u548c\u641c\u7d22\u76ee\u6807\u4ee3\u7801\u5e93\u4e2d\u7684\u4ee3\u7801\u5757\uff0c\u627e\u5230\u76f8\u5173\u7684\u4ee3\u7801\u7247\u6bb5\uff0c\u5bfc\u5165\u5b83\u4eec\uff0c\u5e76\u6839\u636e\u6267\u884c\u53cd\u9988\u8fed\u4ee3\u751f\u6210\u89e3\u51b3\u65b9\u6848\u3002\u9996\u5148\uff0c\u6211\u4eec\u901a\u8fc7\u4e09\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793aCodeNav\u5982\u4f55\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u4ee3\u7801\u5e93\u6765\u89e3\u51b3\u590d\u6742\u7684\u7528\u6237\u95ee\u9898\u3002\u63a5\u7740\uff0c\u5728\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u5b9a\u91cf\u6bd4\u8f83\u4e86\u4ec5\u80fd\u8bbf\u95ee\u76ee\u6807\u4ee3\u7801\u5e93\u7684\u4ee3\u7801\u4f7f\u7528\u65b9\u6cd5\u4e0e\u62e5\u6709\u5bf9\u6240\u6709\u5de5\u5177\u540d\u79f0\u548c\u63cf\u8ff0\u7684\u7279\u6743\u8bbf\u95ee\u7684\u5de5\u5177\u4f7f\u7528\u65b9\u6cd5\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u4e0d\u540c\u7c7b\u578b\u5de5\u5177\u548c\u5e93\u63cf\u8ff0\u5bf9\u4ee3\u7801\u4f7f\u7528\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u4ee5\u53ca\u5c06\u6e90\u4ee3\u7801\u89c6\u4e3a\u8f93\u5165\u800c\u975e\u81ea\u7136\u8bed\u8a00\u4ee3\u7801\u63cf\u8ff0\u7684\u4f18\u52bf\u3002\u6240\u6709\u4ee3\u7801\u5c06\u9075\u5faa\u5bbd\u677e\u8bb8\u53ef\u534f\u8bae\u5f00\u6e90\u3002|\n", "2406.12125": "|**2024-06-17**|**Efficient Sequential Decision Making with Large Language 
Models**|Dingyang Chen et.al.|[2406.12125](http://arxiv.org/abs/2406.12125)|null|\u8be5\u8bba\u6587\u5173\u6ce8\u7684\u662f\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6210\u529f\u6269\u5c55\u5230\u5e8f\u5217\u51b3\u7b56\u5236\u5b9a\u3002\u5f53\u524d\u7684\u52aa\u529b\u8981\u4e48\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03LLMs\u8fdb\u884c\u51b3\u7b56\uff0c\u8981\u4e48\u4e3a\u9884\u8bad\u7ec3\u7684LLMs\u8bbe\u8ba1\u63d0\u793a\u3002\u524d\u8005\u9762\u4e34\u8ba1\u7b97\u8d1f\u62c5\u91cd\u7684\u68af\u5ea6\u66f4\u65b0\u95ee\u9898\uff0c\u800c\u540e\u8005\u672a\u663e\u793a\u51fa\u660e\u663e\u6548\u679c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u5728\u7ebf\u6a21\u578b\u9009\u62e9\u7b97\u6cd5\u6709\u6548\u5730\u5c06LLMs\u6574\u5408\u5230\u5e8f\u5217\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u3002\u7edf\u8ba1\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u4f18\u4e8e\u4f20\u7edf\u51b3\u7b56\u7b97\u6cd5\u548c\u7eafLLM\u4ee3\u7406\u3002\u5728\u8ba1\u7b97\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u907f\u514d\u4e86\u5bf9LLMs\u8fdb\u884c\u6602\u8d35\u7684\u68af\u5ea6\u66f4\u65b0\uff0c\u5e76\u4e14\u5728\u6574\u4e2a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u4ec5\u9700\u8981\u5c11\u91cf\u7684LLM\u8c03\u7528\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\u6765\u9a8c\u8bc1\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u4ee5\u4e00\u4e2a\u5927\u89c4\u6a21\u7684\u4e9a\u9a6c\u900a\u6570\u636e\u96c6\u4e3a\u4f8b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4ec5\u4f7f\u75281.5%\u7684\u65f6\u95f4\u6b65\u6570\u8c03\u7528LLMs\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u4e86\u6bd4\u57fa\u7ebf\u8d85\u8fc76\u500d\u7684\u6027\u80fd\u63d0\u5347\u3002|\n", "2406.14373": "|**2024-07-01**|**Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory**|Gordon Dai 
et.al.|[2406.14373](http://arxiv.org/abs/2406.14373)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u8ba1\u7b97\u793e\u4f1a\u79d1\u5b66\u7684\u7814\u7a76\u8fce\u6765\u4e86\u5927\u89c4\u6a21\u63a2\u7d22\u7684\u673a\u9047\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u57fa\u4e8e\u5148\u524d\u5bf9LLM\u884c\u4e3a\u4f53\u8bbe\u8ba1\u7684\u7814\u7a76\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u6a21\u62df\u7684Agent\u793e\u4f1a\uff0c\u5176\u4e2d\u590d\u6742\u7684\u793e\u4ea4\u5173\u7cfb\u968f\u65f6\u95f4\u52a8\u6001\u5f62\u6210\u548c\u53d1\u5c55\u3002\u6211\u4eec\u8d4b\u4e88\u8fd9\u4e9bAgent\u5fc3\u7406\u9a71\u52a8\u529b\uff0c\u5e76\u7f6e\u4e8e\u4e00\u4e2a\u6c99\u76d2\u751f\u5b58\u73af\u5883\u4e2d\u3002\u901a\u8fc7\u6258\u9a6c\u65af\u00b7\u970d\u5e03\u65af\u7684\u5960\u57fa\u6027\u793e\u4f1a\u5951\u7ea6\u7406\u8bba\uff08SCT\uff09\u7684\u89c6\u89d2\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u8fd9\u4e2aAgent\u793e\u4f1a\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8d77\u521d\uff0cAgent\u4eec\u8868\u73b0\u51fa\u65e0\u62d8\u65e0\u675f\u7684\u51b2\u7a81\uff0c\u7b26\u5408\u970d\u5e03\u65af\u5bf9\u201c\u81ea\u7136\u72b6\u6001\u201d\u7684\u63cf\u8ff0\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u62df\u7684\u8fdb\u884c\uff0c\u793e\u4f1a\u5951\u7ea6\u9010\u6e10\u5f62\u6210\uff0c\u7edd\u5bf9\u4e3b\u6743\u8005\u5f97\u5230\u4e86\u6388\u6743\uff0c\u8fdb\u800c\u5efa\u7acb\u4e86\u4ee5\u76f8\u4e92\u5408\u4f5c\u4e3a\u57fa\u7840\u7684\u548c\u5e73\u5171\u540c\u4f53\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u53d1\u73b0\u4e0e\u970d\u5e03\u65af\u7406\u8bba\u76f8\u543b\u5408\uff1aLLM\u9a71\u52a8\u7684\u591aAgent\u6a21\u62df\u5c55\u793a\u4e86\u793e\u4f1a\u52a8\u6001\u7684\u590d\u6742\u6027\uff0c\u53ef\u80fd\u590d\u5236\u5851\u9020\u4eba\u7c7b\u793e\u4f1a\u7684\u529b\u91cf\u3002\u5c3d\u7ba1\u65e0\u6cd5\u5b8c\u5168\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u6240\u6709\u7ec6\u5fae\u4e4b\u5904\uff0c\u4f46\u8fd9\u79cd\u6a21\u62df\u5bf9\u4e8e\u7406\u89e3\u
793e\u4f1a\u7ed3\u6784\u3001\u7fa4\u4f53\u52a8\u6001\u548c\u590d\u6742\u4eba\u7c7b\u7cfb\u7edf\u5177\u6709\u6f5c\u5728\u4ef7\u503c\u3002|\n", "2406.14228": "|**2024-06-20**|**EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms**|Siyu Yuan et.al.|[2406.14228](http://arxiv.org/abs/2406.14228)|**[link](https://github.com/siyuyuan/evoagent)**|**\u968f\u7740\u5f3a\u5927\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u4e00\u79cd\u65b0\u7684\u8d8b\u52bf\u662f\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u6784\u5efa\u80fd\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u7684\u81ea\u4e3b\u4ee3\u7406\uff0c\u5c24\u5176\u662f\u591a\u4ee3\u7406\u7cfb\u7edf\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u7684\u6846\u67b6\uff0c\u8fd9\u9650\u5236\u4e86\u4ee3\u7406\u7cfb\u7edf\u7684\u529f\u80fd\u8303\u56f4\u548c\u53ef\u6269\u5c55\u6027\u3002\u5982\u4f55\u81ea\u52a8\u5c06\u4e13\u95e8\u7684\u4ee3\u7406\u6269\u5c55\u5230\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u4ee5\u63d0\u5347\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\uff0c\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51faEvoAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u901a\u8fc7\u8fdb\u5316\u7b97\u6cd5\u81ea\u52a8\u5c06\u4e13\u5bb6\u4ee3\u7406\u6269\u5c55\u5230\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u63d0\u9ad8\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u6267\u884c\u4efb\u52a1\u4e2d\u7684\u6548\u7387\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u89c6\u73b0\u6709\u7684\u4ee3\u7406\u6846\u67b6\u4e3a\u521d\u59cb\u4e2a\u4f53\uff0c\u5e76\u5e94\u7528\u4e00\u7cfb\u5217\u8fdb\u5316\u64cd\u4f5c\uff08\u5982\u7a81\u53d8\u3001\u4ea4\u53c9\u3001\u9009\u62e9\u7b49\uff09\u751f\u6210\u5177\u6709\u4e0d\u540c\u8bbe\u7f6e\u7684\u4ee3\u7406\u3002EvoAgent\u9002\u7528\u4e8e\u4efb\u4f55\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u80fd\u591f\u65e0\u987b\u989d\u5916\u4eba\u5de5\u8bbe\u8ba1\u81ea\u
52a8\u751f\u6210\u6269\u5c55\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEvoAgent\u80fd\u591f\u81ea\u52a8\u4ea7\u751f\u591a\u4e2a\u4e13\u5bb6\u7ea7\u4ee3\u7406\uff0c\u5e76\u663e\u8457\u589e\u5f3a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\u3002**|\n", "2406.13352": "|**2024-06-19**|**AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents**|Edoardo Debenedetti et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|**[link](https://github.com/ethz-spylab/agentdojo)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aAgentDojo\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30\u4f9d\u8d56\u4e8e\u5916\u90e8\u5de5\u5177\u5904\u7406\u4e0d\u53ef\u4fe1\u6570\u636e\u7684AI\u4ee3\u7406\u7684\u5bf9\u6297\u6027\u9c81\u68d2\u6027\u3002\u9762\u5bf9\u4e0d\u65ad\u6f14\u53d8\u7684\u653b\u51fb\u548c\u9632\u5fa1\u624b\u6bb5\uff0cAgentDojo\u4e0d\u662f\u4e00\u4e2a\u9759\u6001\u7684\u6d4b\u8bd5\u5957\u4ef6\uff0c\u800c\u662f\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b0\u4efb\u52a1\u3001\u9632\u5fa1\u7b56\u7565\u4ee5\u53ca\u9002\u5e94\u6027\u653b\u51fb\u7684\u53ef\u6269\u5c55\u73af\u5883\u3002\u5b83\u5305\u542b\u4e8697\u4e2a\u5b9e\u9645\u5e94\u7528\u573a\u666f\u7684\u4efb\u52a1\uff08\u5982\u7ba1\u7406\u7535\u5b50\u90ae\u4ef6\u5ba2\u6237\u7aef\u3001\u5bfc\u822a\u7f51\u4e0a\u94f6\u884c\u7f51\u7ad9\u6216\u9884\u8ba2\u65c5\u884c\uff09\uff0c629\u4e2a\u5b89\u5168\u6d4b\u8bd5\u6848\u4f8b\uff0c\u4ee5\u53ca\u6765\u81ea\u6587\u732e\u7684\u5404\u79cd\u653b\u51fb\u548c\u9632\u5fa1\u65b9\u6cd5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u8bed\u8a00\u6a21\u578b\u5728AgentDojo\u4e2d\u7684\u8868\u73b0\u5e76\u4e0d\u5c3d\u4eba\u610f\uff08\u5373\u4f7f\u6ca1\u6709\u653b\u51fb\uff09\uff0c\u5e76\u4e14\u73b0\u6709\u7684\u63d0\u793a\u6ce8\u5165\u653b\u51fb\u867d\u7136\u80fd\u7834\u574f\u4e00\u4e9b\u5b89\u5168\u7279\u6027\uff0c\u4f46\u5e76\u975e\u6240\u6709\u60c5\u51b5\u90fd\u9002\u7528\u3002\u6211\
u4eec\u671f\u671bAgentDojo\u80fd\u591f\u63a8\u52a8\u7814\u7a76\uff0c\u4ee5\u5bfb\u627e\u5728\u89e3\u51b3\u5e38\u89c1\u4efb\u52a1\u65f6\u65e2\u53ef\u9760\u53c8\u5065\u58ee\u7684AI\u4ee3\u7406\u7684\u65b0\u8bbe\u8ba1\u539f\u5219\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/ethz-spylab/agentdojo\u3002**|\n", "2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia et.al.|[2406.13163](http://arxiv.org/abs/2406.13163)|null|\u53d1\u73b0\u65b0\u6750\u6599\u5bf9\u79d1\u5b66\u548c\u6280\u672f\u5177\u6709\u91cd\u5927\u610f\u4e49\uff0c\u4f46\u76ee\u524d\u4ecd\u662f\u8270\u5de8\u95ee\u9898\uff0c\u56e0\u4e3a\u5316\u5b66\u7a7a\u95f4\u6d69\u701a\u3002\u8fd1\u671f\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u63a8\u52a8\u4e86\u57fa\u4e8e\u6570\u636e\u7684\u65b9\u6cd5\u6765\u5feb\u901f\u7b5b\u9009\u6216\u751f\u6210\u6709\u524d\u666f\u7684\u6750\u6599\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4ecd\u4f9d\u8d56\u5927\u91cf\u8bad\u7ec3\u6570\u636e\uff0c\u4e14\u5f80\u5f80\u7f3a\u4e4f\u4eba\u7c7b\u671f\u671b\u7684\u6750\u6599\u8bbe\u8ba1\u7684\u7075\u6d3b\u6027\u548c\u5316\u5b66\u76f4\u89c9\u3002\u6211\u4eec\u63d0\u51faLLMatDesign\uff0c\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u53ef\u89e3\u91ca\u6750\u6599\u8bbe\u8ba1\u65b0\u6846\u67b6\u3002LLMatDesign\u5229\u7528LLM\u4ee3\u7406\u7406\u89e3\u4eba\u7c7b\u6307\u4ee4\uff0c\u5bf9\u6750\u6599\u8fdb\u884c\u4fee\u6539\uff0c\u5e76\u4f7f\u7528\u63d0\u4f9b\u7684\u5de5\u5177\u8bc4\u4f30\u7ed3\u679c\u3002\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u5148\u524d\u51b3\u7b56\uff0cLLMatDesign\u80fd\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u5feb\u901f\u9002\u5e94\u65b0\u4efb\u52a1\u548c\u6761\u4ef6\u3002\u5728\u79bb\u7ebf\u5b9e\u9a8c\u4e2d\uff0c\u5bf9LLMatDesign\u5728\u591a\u4e2a\u6750\u6599\u8bbe\u8ba1\u4efb\u52a1\u4e2d\u7684\u7cfb\u7edf\u8bc4\u4f30\u8bc1\u5b9e\u4e86\u5b83\u5728\u5c0f\u6570\u636e\u73af\u5883\u4e0b\u5f00\u53d1\u51fa\u5177\u6709\u752
8\u6237\u5b9a\u4e49\u76ee\u6807\u6027\u8d28\u7684\u65b0\u6750\u6599\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c55\u793a\u4e86\u81ea\u4e3bLLM\u5f15\u5bfc\u7684\u8ba1\u7b97\u73af\u5883\u4e0b\u7684\u6750\u6599\u53d1\u73b0\u7684\u975e\u51e1\u6f5c\u529b\uff0c\u9884\u793a\u7740\u672a\u6765\u81ea\u9a7e\u9a76\u5b9e\u9a8c\u5ba4\u7684\u53ef\u80fd\u6027\u3002|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**\u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u4ece\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u4e2d\u8bc6\u522b\u75be\u75c5\u76f8\u5173\u57fa\u56e0\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u5f80\u5f80\u9700\u8981\u6df1\u539a\u7684\u4e13\u957f\u548c\u5927\u91cf\u7684\u4eba\u5de5\u52aa\u529b\uff0c\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u663e\u793a\u51fa\u5728\u81ea\u52a8\u5316\u6b64\u7c7b\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u7c7b\u65b9\u6cd5\u7684\u8bc4\u4f30\u548c\u53d1\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86GenoTEX\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u5206\u6790\u81ea\u52a8\u63a2\u7d22\u7684\u57fa\u51c6\uff0c\u5305\u62ec\u6570\u636e\u96c6\u9009\u62e9\u3001\u9884\u5904\u7406\u548c\u7edf\u8ba1\u5206\u6790\u4efb\u52a1\u3002GenoTEX\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u7ba1\u9053\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u4eba\u7c7b\u751f\u7269\u4fe1\u606f\u5b66\u5bb6\u7cbe\u5fc3\u7f16\u5199\u7684\u6ce8\u91ca\uff0c\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u6df1\u5165\u5206\u6790\u4e
e5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u3002 \u4e3a\u4e86\u63d0\u4f9b\u8fd9\u4e9b\u4efb\u52a1\u7684\u57fa\u7ebf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86GenoAgents\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u56e2\u961f\uff0c\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u89c4\u5212\u3001\u8fed\u4ee3\u6821\u6b63\u4ee5\u53ca\u4e0e\u9886\u57df\u4e13\u5bb6\u54a8\u8be2\u7684\u80fd\u529b\uff0c\u5b83\u4eec\u534f\u4f5c\u63a2\u7d22\u57fa\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u663e\u793a\u4e86LLM\u9a71\u52a8\u65b9\u6cd5\u5728\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u4e2d\u7684\u6f5c\u529b\uff0c\u800c\u9519\u8bef\u5206\u6790\u6307\u51fa\u4e86\u6311\u6218\u548c\u672a\u6765\u7684\u6539\u8fdb\u65b9\u5411\u3002\u6211\u4eec\u63d0\u8baeGenoTEX\u4f5c\u4e3a\u4e00\u4e2a\u6709\u524d\u666f\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8861\u91cf\u548c\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1a\\url{https://github.com/Liu-Hy/GenoTex}\u3002**|\n", "2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu 
et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u591a-agent\u7cfb\u7edf\uff08LLM-MAS\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u5b83\u4eec\u901a\u8fc7\u7cfb\u7edf\u5185\u5404\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u534f\u4f5c\u6765\u5b8c\u6210\u4efb\u52a1\uff0c\u524d\u63d0\u662f\u5171\u4eab\u4fe1\u606f\u3002\u7136\u800c\uff0c\u5f53\u4ee3\u7406\u95f4\u7684\u4ea4\u6d41\u88ab\u7528\u4e8e\u589e\u5f3a\u4eba\u7c7b\u5408\u4f5c\u65f6\uff0c\u7531\u4e8e\u4fe1\u606f\u4e0d\u5bf9\u79f0\uff08\u6bcf\u4e2a\u4ee3\u7406\u4ec5\u80fd\u8bbf\u95ee\u5176\u5bf9\u5e94\u4eba\u7c7b\u7528\u6237\u7684\u4fe1\u606f\uff09\uff0c\u8fd9\u5e26\u6765\u4e86\u65b0\u7684\u6311\u6218\u3002\u4f20\u7edfMAS\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\u96be\u4ee5\u5b8c\u6210\u4efb\u52a1\u3002\u4e3a\u89e3\u51b3\u6b64\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u79f0\u4e3a\u201ciAgents\u201d\uff0c\u5373\u4fe1\u606f\u4e30\u5bcc\u591aagent\u7cfb\u7edf\u3002\u5728iAgents\u4e2d\uff0c\u4eba\u7c7b\u793e\u4f1a\u7f51\u7edc\u5728\u4ee3\u7406\u7f51\u7edc\u4e2d\u5f97\u5230\u53cd\u6620\uff0c\u4ee3\u7406\u4e3b\u52a8\u4ea4\u6362\u5b8c\u6210\u4efb\u52a1\u6240\u9700\u7684\u4eba\u7c7b\u4fe1\u606f\uff0c\u4ece\u800c\u514b\u670d\u4fe1\u606f\u4e0d\u5bf9\u79f0\u3002iAgents\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4ee3\u7406\u63a8\u7406\u673a\u5236\uff0cInfoNav\uff0c\u5f15\u5bfc\u4ee3\u7406\u4e4b\u95f4\u7684\u6709\u6548\u4fe1\u606f\u4ea4\u6d41\u3002\u7ed3\u5408InfoNav\uff0ciAgents\u7ec4\u7ec7\u4e86\u6df7\u5408\u8bb0\u5fc6\u4e2d\u7684\u4eba\u7c7b\u4fe1\u606f\uff0c\u4e3a\u4ee3\u7406\u63d0\u4f9b\u51c6\u786e\u5168\u9762\u7684\u4fe1\u606f\u8fdb\u884c\u4ea4\u6362\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a8\u51fa\u4e86\u9996\u4e2a\u9488\u5bf9\u8bc4\u4f30LLM\u5728\u4fe1\u606f\u4e0d\u5bf9\u79f0\u6761\u4ef6\u4e0b\u
4efb\u52a1\u89e3\u51b3\u80fd\u529b\u7684\u57fa\u51c6\u2014\u2014InformativeBench\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0ciAgents\u80fd\u591f\u5728\u5305\u542b140\u4eba\u548c588\u6761\u5173\u7cfb\u7684\u793e\u4f1a\u7f51\u7edc\u4e2d\u534f\u4f5c\uff0c\u81ea\u4e3b\u8fdb\u884c\u8d85\u8fc730\u8f6e\u7684\u901a\u4fe1\uff0c\u5e76\u4ece\u8fd170,000\u6761\u6d88\u606f\u4e2d\u68c0\u7d22\u4fe1\u606f\uff0c\u57283\u5206\u949f\u5185\u5b8c\u6210\u4efb\u52a1\u3002**|\n", "2406.14884": "|**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u5de5\u5177\uff0c\u88ab\u8bbe\u8ba1\u7528\u4e8e\u901a\u8fc7\u8fed\u4ee3\u89c4\u5212\u548c\u884c\u52a8\u6765\u6267\u884c\u590d\u6742\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u4ee3\u7406\u5728\u5904\u7406\u9700\u8981\u4e13\u4e1a\u77e5\u8bc6\u7684\u4efb\u52a1\u65f6\uff0c\u5bb9\u6613\u4ea7\u751f\u4e0d\u671f\u671b\u7684\u89c4\u5212\u5e7b\u89c9\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u521d\u6b65\u5c1d\u8bd5\u901a\u8fc7\u878d\u5165\u4e0e\u5de5\u4f5c\u6d41\u7a0b\u76f8\u5173\u7684\u5916\u90e8\u77e5\u8bc6\u6765\u589e\u5f3a\u89c4\u5212\u53ef\u9760\u6027\u3002\u5c3d\u7ba1\u663e\u793a\u51fa\u6f5c\u529b\uff0c\u4f46\u6ce8\u5165\u7684\u77e5\u8bc6\u901a\u5e38\u6742\u4e71\u65e0\u7ae0\uff0c\u683c\u5f0f\u591a\u6837\uff0c\u7f3a\u4e4f\u4e25\u8c28\u7684\u89c4\u8303\u5316\u548c\u5168\u9762\u7684\u6bd4\u8f83\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u89c4\u8303\u4e86\u4e0d\u540c\u683c\u5f0f\u7684\u5de5\u4f5c\u6d41\u7a0b\u77e5\u8bc6\uff0c\u5e76\u63d0\u51fa\u4e86FlowBench\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u9762\u5411\u5de5\u4f5c\u6d41\u5f15\u5bfc\u89c4\u5212\u7684\u57fa\u51c6\u3002FlowBench\u6db5\u76d6\u4e86\u6765\u81ea6\u4e2a\u9886\u57df\u768451\u4e2a\u4e0d\u540c\u573a\u666f\uff0c\u5176\u4e2d\u77e5\u8bc6\u4ee5\u591a\u6837\u76
84\u5f62\u5f0f\u5448\u73b0\u3002\u4e3a\u4e86\u8bc4\u4f30\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u5728FlowBench\u4e0a\u7684\u6027\u80fd\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u5c42\u6b21\u7684\u8bc4\u4f30\u6846\u67b6\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5de5\u4f5c\u6d41\u7a0b\u77e5\u8bc6\u5728\u591a\u79cd\u683c\u5f0f\u4e0b\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u8868\u660e\u5f53\u524d\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u5728\u6ee1\u8db3\u6ee1\u610f\u7684\u89c4\u5212\u9700\u6c42\u65b9\u9762\u4ecd\u6709\u5f88\u5927\u7684\u63d0\u5347\u7a7a\u95f4\u3002\u6211\u4eec\u671f\u671b\u8fd9\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u57fa\u51c6\u80fd\u4e3a\u672a\u6765\u7684\u4ee3\u7406\u89c4\u5212\u7814\u7a76\u94fa\u5e73\u9053\u8def\u3002|\n", "2406.17232": "|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|\u6784\u5efa\u903c\u771f\u7684\u4eba\u5de5\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bf9\u4e8e\u5b9e\u73b0\u53ef\u4fe1\u7684\u793e\u4f1a\u6a21\u62df\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u57fa\u4e8e\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u7684\u89d2\u8272\u626e\u6f14\u6709\u65f6\u80fd\u63d0\u5347\u4eba\u6027\u5316\uff0c\u4f46\u6548\u679c\u5e76\u4e0d\u603b\u662f\u7406\u60f3\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u662f\u5426\u53ef\u4ee5\u901a\u8fc7\u6574\u5408\u6765\u81ea\u5b9e\u8bc1\u4eba\u7c7b\u4fe1\u5ff5\u7f51\u7edc\u7684\u4fe1\u606f\uff0c\u8fdb\u4e00\u6b65\u63d0\u5347LLMs\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u5951\u5408\u5ea6\u3002\u6211\u4eec\u5229\u7528\u4e00\u9879\u4eba\u7c7b\u8c03\u67e5\u6570\u636e\uff0c\u4f30\u8ba1\u4e86\u4e00\u4e2a\u5305\u542b18\u4e2a\u4e3b\u9898\u7684\u4fe1\u5ff5\u7f51\u7edc\uff0c\u8fd9\u4e9b\u4e3b\u9898\u52a0\u8f7d\u4e8e\u4e24\u4e2a\u4e0d\u91cd\u53e0\u7684\u6f5c\u5728\u56e0\u5b50\u4e0a\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728LLM\u4e2d\u690d\u5165\u4e00\u4e2a\u
5173\u4e8e\u67d0\u4e00\u4e3b\u9898\u7684\u89c2\u70b9\uff0c\u5206\u6790\u5176\u5bf9\u5269\u4f59\u6d4b\u8bd5\u8bdd\u9898\u8868\u8fbe\u7684\u89c2\u70b9\u4e0e\u76f8\u5e94\u4eba\u7c7b\u6570\u636e\u7684\u5951\u5408\u7a0b\u5ea6\u3002\u4ec5\u4f9d\u8d56\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u7684\u89d2\u8272\u626e\u6f14\u672a\u80fd\u4f7fLLM\u548c\u4eba\u7c7b\u89c2\u70b9\u4fdd\u6301\u4e00\u81f4\uff0c\u4f46\u5f53\u690d\u5165\u5355\u4e00\u4fe1\u5ff5\u65f6\uff0c\u5bf9\u4e8e\u76f8\u5173\u4e8e\u4fe1\u5ff5\u7f51\u7edc\u5185\u7684\u4e3b\u9898\uff0c\u8fd9\u79cd\u4e00\u81f4\u6027\u663e\u8457\u63d0\u9ad8\uff0c\u800c\u5bf9\u4e8e\u7f51\u7edc\u5916\u7684\u4e3b\u9898\u5219\u6ca1\u6709\u660e\u663e\u5f71\u54cd\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u53ef\u4ee5\u7528\u4e8e\u5728\u8ffd\u6c42\u7406\u89e3\u548c\u6a21\u62df\u793e\u4f1a\u4e2d\u4fe1\u5ff5\u5206\u5e03\u6a21\u5f0f\u7684\u4eba\u5de5\u667a\u80fd\u5de5\u4f5c\u4e2d\uff0c\u5b9e\u73b0\u4eba\u7c7b\u4e0eLLMs\u4e4b\u95f4\u7684\u4fe1\u5ff5\u5bf9\u9f50\u3002|\n", "2406.18702": "|**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R. 
Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u865a\u62df\u4ee3\u7406\u6765\u6a21\u62df\u7acb\u6cd5\u8fc7\u7a0b\uff0c\u5177\u4f53\u805a\u7126\u4e8e\u7f8e\u56fd\u53c2\u8bae\u9662\u60c5\u62a5\u59d4\u5458\u4f1a\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4ee3\u8868\u4e2a\u522b\u53c2\u8bae\u5458\u7684\u4ee3\u7406\uff0c\u5e76\u5728\u6a21\u62df\u7684\u59d4\u5458\u4f1a\u8ba8\u8bba\u4e2d\u8ba9\u5b83\u4eec\u4e92\u52a8\u3002\u8fd9\u4e9b\u4ee3\u7406\u5c55\u73b0\u51fa\u5728\u73b0\u5b9e\u8fa9\u8bba\u4e2d\u7684\u80fd\u529b\uff0c\u80fd\u591f\u63d0\u4f9b\u6df1\u601d\u719f\u8651\u7684\u89c2\u70b9\uff0c\u5e76\u5728\u7279\u5b9a\u6761\u4ef6\u4e0b\u627e\u5230\u4e24\u515a\u7684\u89e3\u51b3\u65b9\u6848\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6a21\u62df\u663e\u793a\uff0c\u9762\u5bf9\u5916\u90e8\u5e72\u6270\u65f6\uff0c\u4ee3\u7406\u6a21\u578b\u5728\u4e24\u515a\u5408\u4f5c\u4e0a\u5c55\u73b0\u51fa\u8f6c\u53d8\u7684\u6f5c\u529b\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u7b56\u7565\u53ef\u80fd\u6210\u4e3a\u7406\u89e3\u548c\u6539\u8fdb\u7acb\u6cd5\u6d41\u7a0b\u7684\u6709\u6548\u5de5\u5177\uff0c\u8fd9\u4e0e\u4e00\u7cfb\u5217\u53d1\u73b0\u76f8\u547c\u5e94\uff0c\u5373\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u80fd\u6709\u7528\u5730\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u73b0\u8c61\u3002\u672a\u6765\u7684\u7814\u7a76\u5c06\u81f4\u529b\u4e8e\u63d0\u5347\u4ee3\u7406\u7684\u590d\u6742\u6027\uff0c\u6269\u5927\u6a21\u62df\u8303\u56f4\uff0c\u5e76\u63a2\u7d22\u5728\u653f\u7b56\u6d4b\u8bd5\u548c\u8c08\u5224\u4e2d\u7684\u5e94\u7528\u3002|\n", "2406.19966": "|**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao 
et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|\u5927\u591a\u6570\u7ecf\u6d4e\u7406\u8bba\u901a\u5e38\u5047\u8bbe\u91d1\u878d\u5e02\u573a\u53c2\u4e0e\u8005\u662f\u5b8c\u5168\u7406\u6027\u7684\u4e2a\u4f53\uff0c\u5e76\u4f7f\u7528\u6570\u5b66\u6a21\u578b\u6765\u6a21\u62df\u4eba\u7c7b\u5728\u91d1\u878d\u5e02\u573a\u7684\u884c\u4e3a\u3002\u7136\u800c\uff0c\u4eba\u7c7b\u884c\u4e3a\u5f80\u5f80\u5e76\u975e\u5b8c\u5168\u7406\u6027\uff0c\u7528\u6570\u5b66\u6a21\u578b\u7cbe\u786e\u9884\u6d4b\u9887\u5177\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\\textbf{A}gent-based \\textbf{S}imulated \\textbf{F}inancial \\textbf{M}arket\uff08ASFM\uff09\uff0c\u9996\u5148\u6784\u5efa\u4e86\u4e00\u4e2a\u5177\u6709\u771f\u5b9e\u8ba2\u5355\u5339\u914d\u7cfb\u7edf\u7684\u6a21\u62df\u80a1\u7968\u5e02\u573a\u3002\u63a5\u7740\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u80a1\u7968\u4ea4\u6613\u4ee3\u7406\uff0c\u5b83\u5305\u62ec\u4e2a\u4eba\u6982\u51b5\u3001\u89c2\u5bdf\u548c\u57fa\u4e8e\u5de5\u5177\u5b66\u4e60\u7684\u52a8\u4f5c\u6a21\u5757\u3002\u8fd9\u79cd\u4ea4\u6613\u4ee3\u7406\u80fd\u591f\u5168\u9762\u7406\u89e3\u5f53\u524d\u5e02\u573a\u52a8\u6001\u548c\u91d1\u878d\u653f\u7b56\u4fe1\u606f\uff0c\u4ece\u800c\u6839\u636e\u5176\u4ea4\u6613\u7b56\u7565\u4f5c\u51fa\u51b3\u7b56\u3002\u5b9e\u9a8c\u8868\u660e\uff0cASFM\u5728\u53ef\u63a7\u573a\u666f\u4e0b\u7684\u53cd\u5e94\u4e0e\u73b0\u5b9e\u80a1\u7968\u5e02\u573a\u4e00\u81f4\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5728\u4e24\u4e2a\u7ecf\u6d4e\u5b66\u7814\u7a76\u70ed\u70b9\u9886\u57df\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u53d1\u73b0\uff0c\u6211\u4eec\u7684ASFM\u5f97\u51fa\u7684\u7ed3\u8bba\u4e0e\u7ecf\u6d4e\u5b66\u7814\u7a76\u7684\u521d\u6b65\u53d1\u73b0\u76f8\u543b\u5408\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8ba4\u4e3aASFM\u4e3a\u7ecf\u6d4e\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u8303\u5f0f\u3002|\n", "2407.02483": 
"|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8868\u73b0\u4e0d\u5982\u4e13\u95e8\u5316\u7684\u6a21\u578b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6700\u8fd1\u7684\u7814\u7a76\u5f00\u53d1\u4e86\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\uff0c\u53ef\u4ee5\u6839\u636e\u7528\u6237\u8f93\u5165\u9009\u62e9\u5408\u9002\u7684\u4e13\u7528\u6a21\u578b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u8fdb\u5c55\u5728\u533b\u7597\u9886\u57df\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4e00\u79cd\u4e13\u95e8\u4e3a\u533b\u7597\u9886\u57df\u8bbe\u8ba1\u7684\u4ee3\u7406\uff0c\u79f0\u4e3a\\textbf{M}ulti-modal \\textbf{Med}ical \\textbf{Agent}\uff08MMedAgent\uff09\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u516d\u4e2a\u533b\u7597\u5de5\u5177\u6765\u89e3\u51b3\u4e03\u9879\u4efb\u52a1\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u4e3a\u7ed9\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u5408\u9002\u7684\u5de5\u5177\u3002\u5b9e\u9a8c\u5168\u9762\u5c55\u793a\u4e86MMedAgent\u5728\u5404\u79cd\u533b\u7597\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u5f00\u6e90\u65b9\u6cd5\u7684\u6700\u65b0\u72b6\u6001\uff0c\u751a\u81f3\u4e0e\u95ed\u6e90\u6a21\u578bGPT-4o\u76f8\u6bd4\u4e5f\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0cMMedAgent\u8fd8\u663e\u793a\u51fa\u4e86\u66f4\u65b0\u548c\u6574\u5408\u65b0\u533b\u7597\u5de5\u5177\u7684\u9ad8\u6548\u6027\u3002|\n", "2407.01887": "|**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents**|Fanzeng Xia 
et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|\u672c\u6587\u5173\u6ce8\u7684\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u51b3\u7b56\u5236\u5b9a\u4e2d\u7684\u6027\u80fd\uff0c\u5c24\u5176\u662f\u5728\u5bf9\u51b3\u8d4c\u535a\u673a\uff08Dueling Bandits\uff0cDB\uff09\u95ee\u9898\u7684\u4e0a\u4e0b\u6587\u4e2d\u3002\u7814\u7a76\u6bd4\u8f83\u4e86GPT-3.5-Turbo\u3001GPT-4\u548cGPT-4-Turbo\u4e0e\u73b0\u6709DB\u7b97\u6cd5\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c24\u5176\u662fGPT-4 Turbo\uff0c\u80fd\u591f\u5feb\u901f\u8bc6\u522b\u51fa\u4f18\u52bf\u660e\u663e\u7684\u9009\u9879\uff0c\u4ece\u800c\u5728\u5f31\u540e\u6094\u65b9\u9762\u8d85\u8d8a\u5f53\u524d\u6700\u4f73\u7b97\u6cd5\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u6536\u655b\u6027\u4e0a\u5b58\u5728\u95ee\u9898\uff0c\u5bf9\u63d0\u793a\u7684\u654f\u611f\u5ea6\u8f83\u9ad8\uff0c\u4e14\u5bf9\u63d0\u793a\u53d8\u5316\u53cd\u5e94\u8106\u5f31\u3002\u4e3a\u4e86\u6539\u8fdb\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u4e86LLM\u51b3\u7b56\u80fd\u529b\u4e0e\u7ecf\u5178DB\u7b97\u6cd5\u7406\u8bba\u4fdd\u8bc1\u7684\u589e\u5f3a\u578b\u7b97\u6cd5\u2014\u2014IF-Enhanced LLM\u3002\u8fd9\u79cd\u8bbe\u8ba1\u5c55\u793a\u4e86\u5982\u4f55\u589e\u5f3aLLM\u5728\u5bf9\u6027\u80fd\u7a33\u5b9a\u6027\u6709\u8981\u6c42\u7684\u51b3\u7b56\u4efb\u52a1\u4e2d\u7684\u53ef\u4fe1\u5ea6\u3002IF-Enhanced LLM\u5177\u6709\u5f31\u540e\u6094\u548c\u5f3a\u540e\u6094\u7684\u7406\u8bba\u4fdd\u8bc1\u3002\u5b9e\u9a8c\u7ed3\u679c\u9a8c\u8bc1\u4e86\u5373\u4f7f\u9762\u5bf9\u5608\u6742\u548c\u5bf9\u6297\u6027\u7684\u63d0\u793a\uff0cIF-Enhanced LLM\u4ecd\u4fdd\u6301\u7a33\u5065\u3002|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia 
et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u7684\u81ea\u52a8\u5316\uff0c\u5982\u4ee3\u7801\u5408\u6210\u3001\u7a0b\u5e8f\u4fee\u590d\u548c\u6d4b\u8bd5\u751f\u6210\uff0c\u5df2\u53d6\u5f97\u663e\u8457\u8fdb\u6b65\u3002\u7814\u7a76\u4eba\u5458\u548c\u4e1a\u754c\u5b9e\u8df5\u8005\u5df2\u7ecf\u5f00\u53d1\u51fa\u5404\u79cd\u81ea\u4e3bLLM\u4ee3\u7406\u6765\u6267\u884c\u7aef\u5230\u7aef\u7684\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u3001\u8fd0\u884c\u547d\u4ee4\u3001\u89c2\u5bdf\u73af\u5883\u53cd\u9988\u5e76\u89c4\u5212\u672a\u6765\u884c\u52a8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8e\u4ee3\u7406\u7684\u65b9\u6cd5\u7684\u590d\u6742\u6027\u4ee5\u53ca\u5f53\u524dLLM\u7684\u5c40\u9650\u6027\uff0c\u5f15\u53d1\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u662f\u5426\u771f\u7684\u9700\u8981\u4f7f\u7528\u590d\u6742\u7684\u81ea\u4e3b\u8f6f\u4ef6\u4ee3\u7406\uff1f\u4e3a\u4e86\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u6784\u5efa\u4e86Agentless\u2014\u2014\u4e00\u79cd\u65e0\u4ee3\u7406\u65b9\u6cd5\uff0c\u7528\u4e8e\u81ea\u52a8\u89e3\u51b3\u8f6f\u4ef6\u5f00\u53d1\u95ee\u9898\u3002\u4e0e\u590d\u6742\u7684\u4ee3\u7406\u8bbe\u7f6e\u76f8\u6bd4\uff0cAgentless\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u4e24\u9636\u6bb5\u8fc7\u7a0b\uff1a\u5b9a\u4f4d\u540e\u4fee\u590d\uff0c\u4e0d\u8ba9LLM\u51b3\u5b9a\u672a\u6765\u7684\u884c\u52a8\u6216\u64cd\u4f5c\u590d\u6742\u7684\u5de5\u5177\u3002\u5728\u6d41\u884c\u7684SWE-bench 
Lite\u57fa\u51c6\u4e0a\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u4ee4\u4eba\u60ca\u8bb6\u5730\u8868\u660e\uff0c\u8fd9\u79cd\u7b80\u5355\u7684\u65b9\u6cd5\u80fd\u591f\u5b9e\u73b0\u6700\u9ad8\u6027\u80fd\uff0827.33%\uff09\u548c\u6700\u4f4e\u6210\u672c\uff080.34\u7f8e\u5143\uff09\uff0c\u8d85\u8d8a\u6240\u6709\u5f00\u6e90\u8f6f\u4ef6\u4ee3\u7406\uff01 \u6b64\u5916\uff0c\u6211\u4eec\u624b\u52a8\u5206\u7c7b\u4e86SWE-bench Lite\u4e2d\u7684\u95ee\u9898\uff0c\u5e76\u53d1\u73b0\u5b58\u5728\u7cbe\u786e\u7684ground truth\u8865\u4e01\u95ee\u9898\u6216\u63cf\u8ff0\u4e0d\u8db3/\u8bef\u5bfc\u6027\u7684\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86SWE-bench Lite-S\uff0c\u901a\u8fc7\u6392\u9664\u8fd9\u4e9b\u95ee\u9898\u6765\u8fdb\u884c\u66f4\u4e25\u683c\u7684\u8bc4\u4f30\u548c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u7a81\u663e\u4e86\u5f53\u524d\u88ab\u5ffd\u89c6\u7684\u7b80\u5355\u3001\u53ef\u89e3\u91ca\u6280\u672f\u5728\u81ea\u4e3b\u8f6f\u4ef6\u5f00\u53d1\u4e2d\u7684\u6f5c\u529b\u3002\u6211\u4eec\u5e0c\u671bAgentless\u5c06\u4f5c\u4e3a\u81ea\u4e3b\u8f6f\u4ef6\u4ee3\u7406\u7684\u57fa\u7ebf\u3001\u8d77\u70b9\u548c\u671f\u671b\u503c\uff0c\u6fc0\u53d1\u672a\u6765\u5728\u8fd9\u4e2a\u5173\u952e\u9886\u57df\u7684\u5de5\u4f5c\u3002**|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye 
et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u6536\u96c6\u5168\u7403\u4fe1\u606f\uff0c\u5e76\u8fdb\u884c\u63a8\u7406\u4ee5\u89e3\u51b3\u590d\u6742\u95ee\u9898\uff0c\u8fd9\u5f15\u53d1\u4e86\u4f7f\u7528LLM\u9884\u6d4b\u56fd\u9645\u4e8b\u4ef6\u7684\u5174\u8da3\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u4e00\u4e2a\u4e25\u683c\u8bc4\u4f30LLM\u9884\u6d4b\u80fd\u529b\u4e0e\u53ef\u9760\u6027\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51faMIRAI\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4ef7LLM\u5728\u56fd\u9645\u4e8b\u4ef6\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u4e2d\u7684\u8868\u73b0\u3002MIRAI\u6784\u5efa\u4e86\u4e00\u4e2a\u4ee3\u7406\u73af\u5883\uff0c\u914d\u5907\u6709\u8bbf\u95ee\u5e7f\u6cdb\u5386\u53f2\u7ed3\u6784\u5316\u4e8b\u4ef6\u548c\u6587\u672c\u65b0\u95fb\u6570\u636e\u5e93\u7684\u5de5\u5177\u3002\u6211\u4eec\u5bf9GDELT\u4e8b\u4ef6\u6570\u636e\u5e93\u8fdb\u884c\u4e86\u7cbe\u5fc3\u6e05\u6d17\u548c\u89e3\u6790\uff0c\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u5173\u8054\u9884\u6d4b\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u4ece\u77ed\u671f\u5230\u957f\u671f\u7684\u4e0d\u540c\u9884\u6d4b\u65f6\u95f4\u8de8\u5ea6\uff0c\u4ee5\u8bc4\u4f30LLM\u5728\u6574\u5408\u5173\u952e\u5168\u7403\u4fe1\u606f\u3001\u4f7f\u7528\u9886\u57df\u7279\u5b9aAPI\u548c\u5e93\u7f16\u5199\u4ee3\u7801\u4ee5\u53ca\u7efc\u5408\u5904\u7406\u6765\u81ea\u4e0d\u540c\u683c\u5f0f\u548c\u65f6\u95f4\u7684\u5386\u53f2\u77e5\u8bc6\u4ee5\u51c6\u786e\u9884\u6d4b\u672a\u6765\u4e8b\u4ef6\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u6211\u4eec\u7684\u76ee\u6807\u662f\u5efa\u7acb\u4e00\u4e2a\u53ef\u9760\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bc4\u4f30LLM\u5728\u56fd\u9645\u4e8b\u4ef6\u9884\u6d4b\u65b9\u9762\u7684\u6
027\u80fd\uff0c\u4ece\u800c\u63a8\u52a8\u66f4\u7cbe\u786e\u548c\u53ef\u4fe1\u7684\u56fd\u9645\u5173\u7cfb\u5206\u6790\u6a21\u578b\u7684\u53d1\u5c55\u3002|\n", "2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u663e\u8457\u8fdb\u6b65\uff0c\u57fa\u4e8eLLM\u7684\u79fb\u52a8\u4ee3\u7406\u5df2\u6210\u4e3a\u4eba\u673a\u4ea4\u4e92\u9886\u57df\u7684\u7814\u7a76\u70ed\u70b9\u3002\u7136\u800c\uff0c\u9488\u5bf9\u6b64\u7c7b\u4ee3\u7406\u7684\u57fa\u51c6\u6d4b\u8bd5\u8d44\u6e90\u76f8\u5bf9\u532e\u4e4f\u3002\u8bc4\u4f30\u8fd9\u7c7b\u4ee3\u7406\u901a\u5e38\u9762\u4e34\u4e09\u4e2a\u6311\u6218\uff1a\uff081\uff09\u4ec5\u4f9d\u8d56\u7528\u6237\u754c\u9762\uff08UI\uff09\u64cd\u4f5c\u7684\u4f4e\u6548\u9650\u5236\u4e86\u4efb\u52a1\u8bc4\u4f30\uff1b\uff082\uff09\u5355\u4e00\u5e94\u7528\u4e2d\u7684\u7279\u5b9a\u6307\u4ee4\u4e0d\u8db3\u4ee5\u5168\u9762\u8bc4\u4f30LLM\u79fb\u52a8\u4ee3\u7406\u7684\u591a\u7ef4\u5ea6\u63a8\u7406\u548c\u51b3\u7b56\u80fd\u529b\uff1b\uff083\uff09\u5f53\u524d\u7684\u8bc4\u4f30\u6307\u6807\u65e0\u6cd5\u51c6\u786e\u8861\u91cf\u8fde\u7eed\u52a8\u4f5c\u8fc7\u7a0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Mobile-Bench\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u7528\u4e8e\u8bc4\u4f30LLM\u79fb\u52a8\u4ee3\u7406\u80fd\u529b\u7684\u57fa\u51c6\u3002\u9996\u5148\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u4f20\u7edf\u7684UI\u64cd\u4f5c\uff0c\u878d\u5165\u4e86103\u4e2a\u6536\u96c6\u5230\u7684API\uff0c\u4ee5\u63d0\u9ad8\u4efb\u52a1\u5b8c\u6210\u7684\u6548\u7387\u3002\u63a5\u7740\uff0c\u6211\u4eec\u901a\u8fc7\u7ed3\u5408\u771f\u5b9e\u7528\u6237\u67e5\u8be2\u548cLLM\u589e\u5f3a\u7684\u6570\u636e\u6536\u96c6\u6765\u8fdb\u884c\u8bc4\u4f30\u3002\u4e3a\u4e86\u66f4\u597d\u5730\u8bc4\u4ef7\u79fb\u52a8\u4ee3\u7406\u7684\u4e0d\u540c\u89c4\u5212\u80fd\u529b\u5c42\u6b21\uff0c\u6211\u4eec\u7684\u6570\u6
36e\u88ab\u5206\u4e3aSAST\uff08\u7b80\u5355\u4efb\u52a1\uff09\u3001SAMT\uff08\u7a0d\u590d\u6742\u4efb\u52a1\uff09\u548cMAMT\uff08\u591a\u4efb\u52a1\uff09\u4e09\u7c7b\uff0c\u53cd\u6620\u4e86\u4efb\u52a1\u590d\u6742\u5ea6\u7684\u5dee\u5f02\u3002Mobile-Bench\u5305\u542b832\u6761\u6570\u636e\u6761\u76ee\uff0c\u5176\u4e2d\u8d85\u8fc7200\u9879\u4efb\u52a1\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u6d4b\u8bd5\u8de8\u5e94\u7528\u534f\u4f5c\u573a\u666f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u66f4\u7cbe\u786e\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3aCheckPoint\uff0c\u7528\u4e8e\u68c0\u67e5LLM\u79fb\u52a8\u4ee3\u7406\u5728\u89c4\u5212\u548c\u63a8\u7406\u6b65\u9aa4\u4e2d\u662f\u5426\u8fbe\u5230\u5173\u952e\u70b9\u3002|\n", "2407.00476": "|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**\u968f\u7740\u4f20\u7edf\u4f18\u5316\u548c\u8c03\u5ea6\u65b9\u6cd5\u9010\u6e10\u8f6c\u5411\u7528\u6237\u9a71\u52a8\u548c\u4e2a\u4eba\u5316\u670d\u52a1\uff0c\u4ee5\u63d0\u5347\u7528\u6237\u4f53\u9a8c\uff08QoE\uff09\u548c\u7075\u6d3b\u6027\uff0c\u672a\u6765\u7684\u7cfb\u7edf\uff0c\u5c24\u5176\u662f\u5728\u65e0\u7ebf\u548c\u6570\u5b57\u5316\u80fd\u6e90\u7f51\u7edc\u4e2d\uff0c\u9762\u4e34\u7740\u5982\u4f55\u66f4\u597d\u5730\u7406\u89e3\u548c\u54cd\u5e94\u7528\u6237\u9700\u6c42\u7684\u6311\u6218\u3002\u4f20\u7edf\u7684\u7cfb\u7edf\u5f80\u5f80\u5ffd\u89c6\u4e86\u7528\u6237\u7684\u4e2a\u6027\u5316\u9700\u6c42\uff0c\u56e0\u4e3a\u7528\u6237\u4e0e\u673a\u5668\u4e4b\u95f4\u7684\u6c9f\u901a\u4e0d\u7545\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u5e26\u6765\u4e86\u7a81\u7834\uff0c\u5b83\u4eec\u63d0\u4f9b\u4e86\u7528\u6237\u4e0e\u8bbe\u5907\u4e4b\u95f4\u81ea\u7136\u7684\u4ea4\u6d41\u754c\u9762\u3002\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4
e86\u4e00\u79cd\u65b0\u9896\u7684\u67b6\u6784\uff0c\u901a\u8fc7\u6784\u5efa\u4e09\u4e2aLLM\u4ee3\u7406\u6765\u5c06\u7528\u6237\u7684\u8bed\u97f3\u8bf7\u6c42\uff08VRQ\uff09\u8f6c\u5316\u4e3a\u8d44\u6e90\u5206\u914d\u5411\u91cf\u3002\u5177\u4f53\u5305\u62ec\uff1aLLM\u610f\u56fe\u8bc6\u522b\u4ee3\u7406\u5c06\u8bf7\u6c42\u8f6c\u5316\u4e3a\u4f18\u5316\u95ee\u9898\uff08OP\uff09\u3001LLM OP\u53c2\u6570\u8bc6\u522b\u4ee3\u7406\u4ee5\u53caLLM OP\u6c42\u89e3\u4ee3\u7406\u3002 \u6211\u4eec\u9488\u5bf9\u7535\u52a8\u6c7d\u8f66\uff08EV\uff09\u5145\u7535\u7684\u5178\u578bVRQ\u521b\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u5e93\uff0c\u4f5c\u4e3a\u6027\u80fd\u8bc4\u4f30\u7684\u57fa\u7840\u3002\u4f5c\u4e3a\u6982\u5ff5\u9a8c\u8bc1\uff0c\u6211\u4eec\u4e3b\u8981\u4f7f\u7528Llama 3 8B\u6a21\u578b\u8fdb\u884c\u5b9e\u9a8c\u3002\u901a\u8fc7\u4e0d\u540c\u7684\u63d0\u793a\u5de5\u7a0b\u573a\u666f\u6d4b\u8bd5\uff0c\u7ed3\u679c\u663e\u793a\u4e86\u6240\u63d0\u67b6\u6784\u7684\u6709\u6548\u6027\u3002\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u4e00\u4e9b\u5173\u952e\u89c1\u89e3\uff0c\u4f8b\u5982\uff0c\u7528\u4e8e\u5efa\u6a21\u5b9e\u9645\u95ee\u9898\u7684\u66f4\u5927\u5019\u9009OP\u96c6\u53ef\u80fd\u4f1a\u7531\u4e8e\u66f4\u9ad8\u7684\u8bc6\u522b/OP\u5206\u7c7b\u566a\u58f0\u800c\u964d\u4f4e\u6700\u7ec8\u6027\u80fd\u3002\u6240\u6709\u7ed3\u679c\u548c\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u4f9b\u5b66\u672f\u754c\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u5229\u7528\u3002**|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang 
et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|\u4eba\u5de5\u667a\u80fd\u5728\u91d1\u878d\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u6b63\u5728\u91cd\u5851\u6570\u636e\u5904\u7406\u548c\u89e3\u8bfb\u65b9\u5f0f\u3002\u5176\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u81ea\u52a8\u5316\u590d\u6742\u4efb\u52a1\u3001\u63d0\u5347\u5ba2\u6237\u670d\u52a1\uff0c\u5e76\u63d0\u4f9b\u8be6\u5c3d\u7684\u8d22\u52a1\u5206\u6790\u3002\u9996\u5148\uff0c\u6211\u4eec\u4ecb\u7ecdIDEA-FinBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u91d1\u878d\u77e5\u8bc6\u65b9\u9762\u7684\u6027\u80fd\u800c\u8bbe\u8ba1\u7684\u8bc4\u4ef7\u57fa\u51c6\u3002\u5b83\u501f\u9274\u4e86\u4e24\u4e2a\u5168\u7403\u77e5\u540d\u4e14\u6743\u5a01\u7684\u91d1\u878d\u4e13\u4e1a\u8003\u8bd5\u4e2d\u7684\u95ee\u9898\uff0c\u65e8\u5728\u5168\u9762\u68c0\u9a8cLLMs\u89e3\u7b54\u4e0e\u91d1\u878d\u76f8\u5173\u8003\u9898\u7684\u80fd\u529b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63d0\u51faIDEA-FinKER\uff0c\u662f\u4e00\u4e2a\u91d1\u878d\u77e5\u8bc6\u589e\u5f3a\u6846\u67b6\uff0c\u65e8\u5728\u5feb\u901f\u8ba9\u901a\u7528LLMs\u9002\u5e94\u91d1\u878d\u9886\u57df\u3002\u5b83\u91c7\u7528\u57fa\u4e8e\u68c0\u7d22\u7684\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\uff0c\u5b9e\u73b0\u5b9e\u65f6\u4e0a\u4e0b\u6587\u7ea7\u77e5\u8bc6\u6ce8\u5165\uff0c\u5e76\u63d0\u4f9b\u4e00\u5957\u9ad8\u8d28\u91cf\u7684\u91d1\u878d\u77e5\u8bc6\u6307\u4ee4\uff0c\u7528\u4e8e\u5fae\u8c03\u4efb\u4f55\u901a\u7528\u6a21\u578b\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86IDEA-FinQA\uff0c\u4e00\u4e2a\u7531LLMs\u9a71\u52a8\u7684\u91d1\u878d\u95ee\u7b54\u7cfb\u7edf\u3002\u8be5\u7cfb\u7edf\u56f4\u7ed5\u5b9e\u65f6\u77e5\u8bc6\u6ce8\u5165\u548c\u4e8b\u5b9e\u5f3a\u5316\u7684\u67b6\u6784\u6784\u5efa\uff0c\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u3002IDEA-FinQA\u4e3b\u8981\u7531\u6570\u636e\u6536\u96c6\u56
68\u3001\u6570\u636e\u67e5\u8be2\u6a21\u5757\u548c\u6267\u884c\u7279\u5b9a\u529f\u80fd\u7684LLM\u4ee3\u7406\u7ec4\u6210\u3002|\n"}, "llm": {"2405.10311": "|**2024-05-16**|**UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models**|Sahel Sharifymoghaddam et.al.|[2405.10311](http://arxiv.org/abs/2405.10311)|null|## \u80cc\u666f \u8fd1\u671f\uff0c\u591a\u6a21\u6001\uff08MM\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u89e3\u9501\u4e86\u8bb8\u591a\u9700\u8981\u591a\u6a21\u6001\u7406\u89e3\uff08\u5982\u56fe\u50cf\u63cf\u8ff0\u6216\u89c6\u89c9\u95ee\u7b54\uff09\u548c\u751f\u6210\uff08\u5982\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u751f\u6210\u6216\u7f16\u8f91\uff09\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347MM-LLMs\u7684\u8f93\u51fa\u8d28\u91cf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684UniRAG\u6280\u672f\uff0c\u5b83\u5728\u63a8\u7406\u9636\u6bb5\u5c06\u76f8\u5173\u68c0\u7d22\u4fe1\u606f\u6dfb\u52a0\u5230\u63d0\u793a\u4e2d\uff0c\u4f5c\u4e3a\u5c11\u91cf\u6837\u4f8b\u3002\u4e0e\u666e\u904d\u8ba4\u4e3a\u68c0\u7d22\u589e\u5f3a\uff08RA\uff09\u4e3b\u8981\u6539\u8fdb\u7f55\u89c1\u5b9e\u4f53\u7684\u751f\u6210\u6216\u7406\u89e3\u4e0d\u540c\uff0c\u6211\u4eec\u5728MSCOCO\u6570\u636e\u96c6\u4e0a\u5bf9\u5305\u62ecGPT4\u3001Gemini-Pro\u5728\u5185\u7684\u4e13\u6709\u6a21\u578b\u4ee5\u53caLlava\u3001LaVIT\u548cEmu2\u7b49\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8f93\u5165\u63d0\u793a\u901a\u8fc7MM\u68c0\u7d22\u5668\uff08\u5982UniIR\u6a21\u578b\uff09\u589e\u5f3a\u540e\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u8d28\u91cf\u3002|\n", "2405.10305": "|**2024-05-16**|**4D Panoptic Scene Graph Generation**|Jingkang Yang 
et.al.|[2405.10305](http://arxiv.org/abs/2405.10305)|**[link](https://github.com/jingkang50/psg4d)**|**\u6211\u4eec\u751f\u6d3b\u5728\u4e00\u4e2a\u4e09\u7ef4\u7a7a\u95f4\u4e2d\uff0c\u540c\u65f6\u901a\u8fc7\u7b2c\u56db\u7ef4\u65f6\u95f4\u5411\u524d\u63a8\u8fdb\u3002\u4e3a\u4e86\u4f7f\u4eba\u5de5\u667a\u80fd\u80fd\u591f\u5168\u9762\u7406\u89e3\u8fd9\u79cd4D\u73af\u5883\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8868\u793a\u5f62\u5f0f\u2014\u20144D\u5168\u666f\u573a\u666f\u56fe\uff08PSG-4D\uff09\uff0c\u5b83\u5c06\u52a8\u60014D\u4e16\u754c\u4e2d\u7684\u539f\u59cb\u89c6\u89c9\u6570\u636e\u62bd\u8c61\u4e3a\u8282\u70b9\u548c\u8fb9\uff0c\u8282\u70b9\u4ee3\u8868\u5177\u6709\u7cbe\u786e\u4f4d\u7f6e\u548c\u72b6\u6001\u4fe1\u606f\u7684\u5b9e\u4f53\uff0c\u8fb9\u6355\u6349\u65f6\u95f4\u5173\u7cfb\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5728\u8fd9\u4e00\u65b0\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u6ce8\u91caPSG-4D\u6570\u636e\u96c6\uff0c\u5305\u542b3000\u4e2aRGB-D\u89c6\u9891\uff0c\u603b\u8ba1100\u4e07\u5e27\uff0c\u6bcf\u5e27\u90fd\u5e26\u67094D\u5168\u666f\u5206\u5272\u63a9\u7801\u4ee5\u53ca\u8be6\u7ec6\u7684\u52a8\u6001\u573a\u666f\u56fe\u6807\u7b7e\u3002\u6211\u4eec\u4e3a\u6b64\u4efb\u52a1\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSG4DFormer\u7684Transformer\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u5168\u666f\u5206\u5272\u63a9\u7801\uff0c\u6cbf\u65f6\u95f4\u8f74\u8ddf\u8e2a\u63a9\u7801\uff0c\u5e76\u901a\u8fc7\u5173\u7cfb\u7ec4\u4ef6\u751f\u6210\u76f8\u5e94\u7684\u573a\u666f\u56fe\u3002\u5728\u65b0\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u672a\u6765\u7684PSG-4D\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u878d\u5165\u6211\u4eec\u7684PSG-4D\u7cfb\u7edf\u6765\u5b9e\u73b0\u52a8\u6001\
u573a\u666f\u7406\u89e3\u7684\u4e00\u4e2a\u5b9e\u9645\u5e94\u7528\u793a\u4f8b\u3002**|\n", "2405.10299": "|**2024-05-16**|**HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models**|Rhea Sanjay Sukthanker et.al.|[2405.10299](http://arxiv.org/abs/2405.10299)|**[link](https://github.com/automl/hw-aware-llm-bench)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u786c\u4ef6\u6307\u6807\uff08\u5982\u5ef6\u8fdf\u3001\u80fd\u8017\u3001GPU\u5185\u5b58\u4f7f\u7528\u548c\u6027\u80fd\uff09\u4e4b\u95f4\u7684\u6743\u8861\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u4eba\u4eec\u6b63\u5728\u5bfb\u6c42\u4e3a\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u914d\u7f6e\u5efa\u7acb\u5e15\u7d2f\u6258\u524d\u6cbf\uff0c\u4ee5\u5728\u6307\u5b9a\u786c\u4ef6\u9650\u5236\u4e0b\u627e\u5230\u6700\u4f18\u6a21\u578b\u3002\u7136\u800c\uff0c\u5bf9\u591a\u79cd\u67b6\u6784\u5728\u591a\u53f0\u8bbe\u5907\u4e0a\u7684\u5168\u9762\u8bad\u7ec3\u548c\u8bc4\u4f30\u5728\u8ba1\u7b97\u4e0a\u662f\u4e0d\u53ef\u884c\u7684\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HW-GPT-Bench\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u786c\u4ef6\u611f\u77e5\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u57fa\u51c6\uff0c\u5229\u7528\u795e\u7ecf\u67b6\u6784\u641c\u7d22\uff08NAS\uff09\u4e2d\u7684\u6743\u91cd\u5171\u4eab\u6280\u672f\uff0c\u5728\u4e00\u4e2a\u6a21\u578b\u4e2d\u9ad8\u6548\u5730\u8bad\u7ec3\u5305\u542b\u4e0d\u540c\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u8d85\u7f51\u7edc\u3002\u6211\u4eec\u572813\u79cd\u8bbe\u5907\u4e0a\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u6027\u80fd\u5256\u6790\uff0c\u8003\u8651\u4e865\u79cd\u786c\u4ef6\u6307\u6807\u548c3\u79cd\u4e0d\u540c\u7684\u6a21\u578b\u89c4\u6a21\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc78\u79cd\u4e0d\u540c\u7684\u591a\u76ee\u6807NAS\u7b97\u6cd5\u5c55\u793a\u4e86HW-GPT-Bench\u7684\u53ef\u7528\u6027\uff0c\u5e76\u8bc4\u4f30\u4e86\u7531\u6b64\u4ea7\u751f\u7684\u5e15\u7d2f\u6258\u524d\u6cbf\u7684\u8d28\u91cf\u3
002\u6211\u4eec\u7684\u76ee\u6807\u662f\u63a8\u52a8\u548c\u52a0\u901f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u76ee\u6807\u65b9\u6cd5\uff0c\u5982NAS\u548c\u7ed3\u6784\u5316\u526a\u679d\u7684\u7814\u7a76\u3002**|\n", "2405.10288": "|**2024-05-16**|**Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction**|Jianhao Chen et.al.|[2405.10288](http://arxiv.org/abs/2405.10288)|**[link](https://github.com/jianhaochen-nju/tsdre)**|**\u6458\u8981\uff1a** \u4e8b\u5b9e\u62bd\u53d6\u5bf9\u4e8e\u6784\u5efa\u77e5\u8bc6\u56fe\u8c31\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5bf9\u65f6\u95f4\u76f8\u5173\u4e8b\u5b9e\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u9700\u6c42\u589e\u957f\uff0c\u51fa\u73b0\u4e86\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u4efb\u52a1\u3002\u672c\u6587\u7279\u522b\u5173\u6ce8\u4ece\u81ea\u7136\u8bed\u8a00\u6587\u672c\u4e2d\u63d0\u53d6\u65f6\u95f4\u6027\u4e8b\u5b9e\u3002\u5148\u524d\u7684\u7814\u7a76\u672a\u80fd\u59a5\u5584\u5904\u7406\u590d\u6742\u53e5\u5b50\u4e2d\u65f6\u95f4\u4e0e\u4e8b\u5b9e\u5bf9\u5e94\u5173\u7cfb\u7684\u5efa\u7acb\u96be\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u65f6\u95f4\u7ebf\u7684\u53e5\u5b50\u5206\u89e3\u7b56\u7565\uff0c\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u4e8b\u5b9e\u76f8\u5173\u65f6\u95f4\u7ebf\u7684\u7cbe\u7ec6\u7406\u89e3\u3002\u7136\u800c\uff0c\u76f4\u63a5\u4f7f\u7528LLMs\u8fdb\u884c\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u6027\u80fd\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86TSDRE\u65b9\u6cd5\uff0c\u5c06LLMs\u7684\u5206\u89e3\u80fd\u529b\u878d\u5165\u5230\u5c0f\u578b\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u4f20\u7edf\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002 
\u4e3a\u4e86\u652f\u6301\u8bc4\u4f30\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u590d\u6742\u7684\u65f6\u5e8f\u4e8b\u5b9e\u62bd\u53d6\u6570\u636e\u96c6ComplexTRED\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTSDRE\u5728HyperRED-Temporal\u548cComplexTRED\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002|\n", "2405.10276": "|**2024-05-16**|**Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers**|Tuo Zhang et.al.|[2405.10276](http://arxiv.org/abs/2405.10276)|null|\u8fd1\u5e74\u6765\uff0c\u8bb8\u591a\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u7b56\u7565\u6027\u63d0\u793a\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6548\u80fd\u3002\u7279\u522b\u662f\u4f18\u5316\u901a\u8fc7prompting\uff08OPRO\uff09\u65b9\u6cd5\u8868\u73b0\u51fa\u9876\u5c16\u6027\u80fd\uff0c\u5b83\u5229\u7528LLMs\u4f5c\u4e3a\u4f18\u5316\u5668\uff0c\u76ee\u6807\u662f\u5bfb\u627e\u80fd\u6700\u5927\u5316\u4efb\u52a1\u51c6\u786e\u6027\u7684\u6307\u4ee4\u3002\u672c\u8bba\u6587\u91cd\u65b0\u5ba1\u89c6\u4e86OPRO\u5728\u5c0f\u578bLLMs\uff08\u5982LaMa-2\u7cfb\u5217\u548cMistral 7B\uff09\u4e0a\u7684\u81ea\u52a8\u5316\u63d0\u793a\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u4e8e\u5c0f\u578bLLMs\uff0cOPRO\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u5176\u6709\u9650\u7684\u63a8\u7406\u80fd\u529b\u9650\u5236\u4e86\u4f18\u5316\u6f5c\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u672a\u6765\u7684\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u5e94\u540c\u65f6\u8003\u8651\u6a21\u578b\u80fd\u529b\u548c\u8ba1\u7b97\u6210\u672c\u3002\u9488\u5bf9\u5c0f\u578bLLMs\uff0c\u6211\u4eec\u63a8\u8350\u76f4\u63a5\u63d0\u4f9b\u660e\u786e\u9610\u8ff0\u76ee\u6807\u548c\u65b9\u6cd5\u7684\u6307\u4ee4\uff0c\u4f5c\u4e3a\u7a33\u5065\u7684\u63d0\u793a\u57fa\u7ebf\uff0c\u4ee5\u786e\u4fdd\u5728\u5f53\u524d\u7814\u7a76\u4e2d\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u63d0\u793a\u8bbe\u8ba1\u3002|\n", "2405.10260": 
"|**2024-05-16**|**Keep It Private: Unsupervised Privatization of Online Text**|Calvin Bao et.al.|[2405.10260](http://arxiv.org/abs/2405.10260)|**[link](https://github.com/csbao/kip-privatization)**|**## \u80cc\u666f \u4f5c\u8005\u8eab\u4efd\u6df7\u6dc6\u6280\u672f\u6709\u671b\u901a\u8fc7\u81ea\u52a8\u91cd\u5199\u6587\u672c\u6765\u4fdd\u62a4\u7f51\u7edc\u901a\u4fe1\u4e2d\u7684\u4e2a\u4eba\u9690\u79c1\u3002\u7136\u800c\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6587\u732e\u4e2d\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u8bc4\u4f30\u5927\u591a\u5c40\u9650\u5728\u72ed\u5c0f\u573a\u666f\u4e0b\uff0c\u4e3b\u8981\u4f9d\u8d56\u4e8e\u8868\u9762\u7684\u7f16\u8f91\u64cd\u4f5c\uff0c\u53ef\u80fd\u5bfc\u81f4\u8f93\u51fa\u4e0d\u81ea\u7136\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u6587\u672c\u79c1\u5bc6\u5316\u6846\u67b6\uff0c\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u751f\u6210\u517c\u987e\u51c6\u786e\u3001\u8fde\u8d2f\u548c\u9690\u79c1\u7684\u91cd\u5199\u3002\u6211\u4eec\u5728\u5927\u89c4\u6a21\u7684\u82f1\u8bedReddit\u5e16\u5b50\u6d4b\u8bd5\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u8be5\u6570\u636e\u96c6\u753168,000\u540d\u4f5c\u8005\u64b0\u5199\uff0c\u5305\u542b\u77ed\u5230\u4e2d\u7b49\u957f\u5ea6\u7684\u6587\u672c\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5728\u4e0d\u540c\u8bc4\u4f30\u6761\u4ef6\u4e0b\uff0c\u5982\u4f5c\u8005\u7b80\u4ecb\u957f\u5ea6\u548c\u4f5c\u8005\u8bc6\u522b\u7b56\u7565\uff0c\u6027\u80fd\u7684\u53d8\u5316\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u81ea\u52a8\u5316\u6307\u6807\u548c\u4eba\u5de5\u8bc4\u4f30\u4e2d\u4fdd\u6301\u9ad8\u6587\u672c\u8d28\u91cf\uff0c\u5e76\u6210\u529f\u5730\u89c4\u907f\u4e86\u51e0\u79cd\u81ea\u52a8\u4f5c\u8005\u8bc6\u522b\u653b\u51fb\u3002**|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng 
Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u4f53\u5728\u7a7a\u95f4\u7406\u89e3\u4e0e\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u8986\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u7ed3\u5408\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u663e\u793a\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u4e5f\u6307\u51fa\u4e86\u6316\u63983D-LLMs\u5168\u90e8\u6f5c\u529b\u6240\u9700\u7684\u521b\u65b0\u65b9\u6cd5\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63d0\u4f9b\u6
307\u5bfc\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u8c03\u67e5\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251](http://arxiv.org/abs/2405.10251)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u8bc4\u4f30\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u7b49\u65b9\u9762\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u636e\u6211\u4eec\u6240\u77e5\uff0c\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u6df1\u5165\u7814\u7a76\uff0c\u8fd9\u662f\u8861\u91cf\u6a21\u578b\u4f18\u79c0\u7a0b\u5ea6\u7684\u5173\u952e\u6807\u51c6\u3002\u56e0\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u5168\u9762\u8bc4\u4f30\u77e5\u540d\u4e14\u6027\u80fd\u51fa\u8272\u7684LLMs\uff0c\u5305\u62ecChatGPT\u3001ChatGLM\u3001\u57fa\u4e8eT5\u7684\u6a21\u578b\u3001\u57fa\u4e8eLLaMA\u7684\u6a21\u578b\u548cPythia\u6a21\u578b\uff0c\u5728\u5bf9\u8bdd\u751f\u6210\u548c\u6587\u672c\u603b\u7ed3\u7b49NLG\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u6db5\u76d6\u82f1\u8bed\u548c\u4e2d\u6587\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5171\u540c\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5305\u62ec\u8f93\u5165\u6a21\u677f\u548c\u540e\u5904\u7406\u7b56\u7565\u3002\u7814\u7a76\u7ed3\u679c\u62a5\u544a\u4e86\u81ea\u52a8\u8bc4\u5206\uff0c\u540c\u65f6\u8fdb\u884c\u4e86\u8be6\u7ec6\u5206\u6790\u3002|\n", "2405.10250": "|**2024-05-16**|**IntelliExplain: Enhancing Interactive Code 
Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u53ef\u6267\u884c\u4ee3\u7801\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u4e92\u52a8\u529f\u80fd\uff0c\u7528\u6237\u53ef\u4ee5\u901a\u8fc7\u8fed\u4ee3\u53cd\u9988\u6307\u5bfc\u6a21\u578b\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u4e92\u52a8\u65b9\u5f0f\u5f80\u5f80\u5047\u8bbe\u7528\u6237\u5177\u5907\u8c03\u8bd5\u6e90\u4ee3\u7801\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5bf9\u975e\u4e13\u4e1a\u7a0b\u5e8f\u5458\u4e0d\u592a\u53cb\u597d\u3002\u8fd9\u4f7f\u5f97\u4f7f\u4e92\u52a8\u4ee3\u7801\u751f\u6210\u5bf9\u4e0d\u540c\u7f16\u7a0b\u6c34\u5e73\u7684\u4e2a\u4f53\u66f4\u6613\u4e8e\u4f7f\u7528\u6210\u4e3a\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86IntelliExplain\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u673a\u4ea4\u4e92\u8303\u5f0f\uff0c\u901a\u8fc7\u8ba9\u7528\u6237\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u89e3\u91ca\u4e0e\u6e90\u4ee3\u7801\u4e92\u52a8\uff0c\u63d0\u5347\u975e\u4e13\u4e1a\u4eba\u58eb\u7684\u4f53\u9a8c\u3002\u7528\u6237\u901a\u8fc7\u63d0\u4f9b\u4ed6\u4eec\u53d1\u73b0\u9519\u8bef\u7684\u81ea\u7136\u8bed\u8a00\u7ea0\u6b63\u53cd\u9988\uff0c\u6765\u6307\u5bfc\u7cfb\u7edf\u4fee\u8ba2\u4ee3\u7801\uff0c\u76f4\u5230\u7528\u6237\u5bf9\u7cfb\u7edf\u7684\u4ee3\u7801\u89e3\u91ca\u611f\u5230\u6ee1\u610f\u3002\u6211\u4eec\u7684\u7528\u6237\u7814\u7a76\u663e\u793a\uff0c\u4f7f\u7528IntelliExplain\u7684\u7528\u6237\u5728Text-to-SQL\u548cPython\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6210\u529f\u7387\u5206\u522b\u6bd4\u7eafGPT-3.5\u63d0\u9ad8\u4e8611.6%\u548c25.3%\uff0c\u540c\u65f6\u6240\u9700\u65f6\u95f4\u5206\u522b\u51cf\u5c11\u4e8639.0%\u548c15.6%\u3002|\n", "2405.10212": 
"|**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5fc3\u7406\u5b66\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014CPsyExam\uff0c\u5b83\u6e90\u4e8e\u4e2d\u56fd\u8bed\u8a00\u8003\u8bd5\u7684\u95ee\u9898\u3002CPsyExam\u65e8\u5728\u5206\u522b\u5f3a\u8c03\u5fc3\u7406\u5b66\u77e5\u8bc6\u548c\u6848\u4f8b\u5206\u6790\u7684\u91cd\u8981\u6027\uff0c\u8ba4\u8bc6\u5230\u5c06\u5fc3\u7406\u5b66\u77e5\u8bc6\u5e94\u7528\u4e8e\u5b9e\u9645\u60c5\u5883\u7684\u4ef7\u503c\u3002\u4ece22,000\u4e2a\u95ee\u9898\u5e93\u4e2d\uff0c\u6211\u4eec\u7cbe\u9009\u4e864,000\u4e2a\u6765\u6784\u5efa\u8be5\u57fa\u51c6\uff0c\u786e\u4fdd\u4e86\u4e3b\u9898\u7684\u5747\u8861\u8986\u76d6\uff0c\u5e76\u5305\u542b\u4e86\u5404\u79cd\u6848\u4f8b\u5206\u6790\u65b9\u6cd5\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5bf9\u4e00\u7cfb\u5217\u73b0\u6709\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548cAPI\u57fa\u7840\u7684\u6a21\u578b\u3002\u5b9e\u9a8c\u548c\u5206\u6790\u7ed3\u679c\u663e\u793a\uff0cCPsyExam\u662f\u4e00\u4e2a\u6709\u6548\u7684\u786e\u7acb\u8bed\u8a00\u6a21\u578b\u5bf9\u5fc3\u7406\u5b66\u7406\u89e3\u80fd\u529b\u7684\u57fa\u51c6\uff0c\u540c\u65f6\u652f\u6301\u5728\u4e0d\u540c\u7c92\u5ea6\u4e0a\u6bd4\u8f83\u8fd9\u4e9b\u6a21\u578b\u3002|\n", "2405.10936": "|**2024-05-17**|**A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers**|Kaiyu Huang 
et.al.|[2405.10936](http://arxiv.org/abs/2405.10936)|**[link](https://github.com/kaiyuhwang/mllm-survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u8bed\u8a00\u80fd\u529b\uff0c\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u51cf\u5c11\u6f5c\u5728\u7684\u6b67\u89c6\u5e76\u63d0\u5347\u6280\u672f\u7684\u901a\u7528\u6027\u548c\u53ef\u8bbf\u95ee\u6027\uff0c\u5bf9\u4e8e\u591a\u8bed\u8a00\u6280\u672f\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1LLMs\u53d6\u5f97\u4e86\u7a81\u7834\uff0c\u4f46\u5bf9\u591a\u8bed\u8a00\u573a\u666f\u7684\u6df1\u5165\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u56e0\u6b64\uff0c\u8feb\u5207\u9700\u8981\u4e00\u4efd\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u603b\u7ed3\u8fd1\u671f\u7684\u65b9\u6cd5\u3001\u8fdb\u5c55\u3001\u5c40\u9650\u6027\u548c\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u65e8\u5728\u4ece\u591a\u4e2a\u89d2\u5ea6\u5ba1\u89c6LLMs\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u56de\u987e\u4e86\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u7814\u7a76\u7684\u5386\u53f2\u6f14\u53d8\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86LLMs\u7684\u591a\u8bed\u8a00\u7279\u6027\uff0c\u5305\u62ec\u8bad\u7ec3\u548c\u63a8\u7406\u65b9\u6cd5\u3001\u6a21\u578b\u5b89\u5168\u3001\u8de8\u9886\u57df\u4e0e\u6587\u5316\u9002\u5e94\u4ee5\u53ca\u6570\u636e\u96c6\u4f7f\u7528\u3002\u6211\u4eec\u8fd8\u5206\u6790\u4e86\u8fd9\u4e9b\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u53ef\u80fd\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u8fdb\u4e00\u6b65\u63d0\u5347LLMs\u7684\u591a\u8bed\u8a00\u6027\u80fd\u3002\u672c\u7efc\u8ff0\u65e8\u5728\u5e2e\u52a9\u7814\u7a76\u754c\u5e94\u5bf
9\u591a\u8bed\u8a00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u5173\u4e8e\u57fa\u4e8eLLMs\u7684\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6838\u5fc3\u6982\u5ff5\u3001\u5173\u952e\u6280\u672f\u53ca\u6700\u65b0\u8fdb\u5c55\u7684\u5168\u9762\u7406\u89e3\u3002**|\n", "2405.10928": "|**2024-05-17**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928](http://arxiv.org/abs/2405.10928)|**[link](https://github.com/apolloresearch/rib)**|### \u6982\u8ff0 \u673a\u68b0\u89e3\u91ca\u6027\u76ee\u6807\u662f\u901a\u8fc7\u9006\u5411\u5de5\u7a0b\u7406\u89e3\u795e\u7ecf\u7f51\u7edc\u7684\u884c\u4e3a\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5728\u89e3\u6790\u795e\u7ecf\u7f51\u7edc\u6fc0\u6d3b\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5bf9\u6fc0\u6d3b\u7684\u5206\u89e3\uff0c\u4f7f\u5f97\u5355\u4e2a\u795e\u7ecf\u5143\u6216\u6a21\u578b\u7ec4\u4ef6\u65e0\u6cd5\u6e05\u6670\u5bf9\u5e94\u4e8e\u72ec\u7279\u7684\u7279\u5f81\u6216\u529f\u80fd\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53ef\u89e3\u91ca\u6027\u65b9\u6cd5\u2014\u2014\u5c40\u90e8\u4ea4\u4e92\u57fa\uff08Local Interaction Basis\uff0cLIB\uff09\u3002LIB\u65e8\u5728\u901a\u8fc7\u6d88\u9664\u65e0\u5173\u6fc0\u6d3b\u548c\u4ea4\u4e92\uff0c\u8bc6\u522b\u8ba1\u7b97\u7279\u5f81\u3002\u8be5\u65b9\u6cd5\u6452\u5f03\u65e0\u610f\u4e49\u7684\u6fc0\u6d3b\u65b9\u5411\uff0c\u5e76\u4f7f\u57fa\u7840\u4e0e\u76f8\u90bb\u5c42\u95f4\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u5947\u5f02\u5411\u91cf\u5bf9\u9f50\u3002\u540c\u65f6\uff0c\u5b83\u6839\u636e\u7279\u5f81\u5bf9\u540e\u7eed\u8ba1\u7b97\u7684\u91cd\u8981\u6027\u8fdb\u884c\u7f29\u653e\uff0c\u751f\u6210\u4e00\u4e2a\u663e\u793a\u6a21\u578b\u4e2d\u6240\u6709\u8ba1\u7b97\u76f8\u5173\u7279\u6027\u548c\u4ea4\u4e92\u7684\u56fe\u8c31\u3002 
\u6211\u4eec\u5728\u6a21\u5757\u52a0\u6cd5\u548cCIFAR-10\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86LIB\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u76f8\u6bd4\u4e8e\u4e3b\u6210\u5206\u5206\u6790\uff0cLIB\u80fd\u8bc6\u522b\u51fa\u66f4\u591a\u8ba1\u7b97\u76f8\u5173\u7684\u7279\u5f81\uff0c\u5e76\u5448\u73b0\u51fa\u66f4\u7a00\u758f\u7684\u4ea4\u4e92\u3002\u7136\u800c\uff0c\u5728\u5e94\u7528\u4e8e\u8bed\u8a00\u6a21\u578b\u65f6\uff0cLIB\u5e76\u672a\u663e\u8457\u63d0\u9ad8\u53ef\u89e3\u91ca\u6027\u6216\u4ea4\u4e92\u7a00\u758f\u5ea6\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c3d\u7ba1LIB\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u7406\u8bba\u9a71\u52a8\u65b9\u6cd5\uff0c\u4f46\u5f53\u524d\u5f62\u5f0f\u5e76\u4e0d\u9002\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002|\n", "2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. Panagoulias et.al.|[2405.10893](http://arxiv.org/abs/2405.10893)|null|\u8fd9\u7bc7\u6280\u672f\u8bba\u6587\u9610\u8ff0\u4e86COGNET-MD\uff0c\u4e00\u4e2a\u4e13\u4e3a\u533b\u7597\u9886\u57df\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u7684\u65b0\u57fa\u51c6\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8bc4\u5206\u6846\u67b6\uff0c\u65e8\u5728\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u533b\u5b66\u6587\u672c\u7684\u80fd\u529b\uff0c\u5e76\u4e14\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u96be\u5ea6\u5206\u7ea7\u7684\u591a\u9879\u9009\u62e9\u9898\uff08MCQ\uff09\u6570\u636e\u5e93\u3002\u8fd9\u4e2a\u6570\u636e\u5e93\u7531\u591a\u4e2a\u533b\u7597\u9886\u57df\u7684\u4e13\u5bb6\u5408\u4f5c\u521b\u5efa\uff0c\u4ee5\u53cd\u6620\u5f53\u524d\u533b\u5b66\u8d8b\u52bf\uff0c\u786e\u4fdd\u5b89\u5168\u3001\u5b9e\u7528\u548c\u9002\u7528\u6027\u3002\u521d\u671f\u7248\u672c\u5305\u542b\u4e86\u7cbe\u795e\u79d1\u3001\u7259\u79d1\u3001\u80ba\u75c5\u5b66\u3001\u76ae\u80a4\u79d1\u548c\u5185\u5206\u6ccc\u5b66\u7b49\u9886\u57d
f\u7684\u9898\u76ee\uff0c\u4f46\u4f1a\u6301\u7eed\u6269\u5c55\uff0c\u672a\u6765\u8fd8\u4f1a\u52a0\u5165\u66f4\u591a\u533b\u5b66\u5b66\u79d1\u3002|\n", "2405.10883": "|**2024-05-17**|**Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review**|Hongyi Yang et.al.|[2405.10883](http://arxiv.org/abs/2405.10883)|null|\u8be5\u7efc\u8ff0\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4f30\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u60a3\u8005\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u73b0\u72b6\u548c\u524d\u666f\uff0c\u4ee5\u53ca\u5176\u5bf9\u5eb7\u590d\u8fc7\u7a0b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u4ece2012\u5e74\u81f3\u73b0\u5728\u7b5b\u9009\u4e8670\u9879\u7814\u7a76\uff0c\u91cd\u70b9\u5173\u6ce8\u673a\u5668\u5b66\u4e60\u3001\u6df1\u5ea6\u5b66\u4e60\u3001\u5f3a\u5316\u5b66\u4e60\u7b49\u6280\u672f\u5728\u5fc3\u7406\u5065\u5eb7\u5e72\u9884\u548c\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\u3001\u6280\u672f\u7c7b\u522b\u3001\u4ea7\u54c1\u548c\u6570\u636e\u7c7b\u578b\uff0c\u5982\u751f\u6001\u77ac\u65f6\u8bc4\u4f30\u3001\u884c\u4e3a\u548c\u8bed\u97f3\u6570\u636e\u7684\u5206\u6790\u3002\u7ed3\u679c\u663e\u793a\uff0cAI\u5728\u75c7\u72b6\u76d1\u6d4b\u3001\u590d\u53d1\u98ce\u9669\u9884\u6d4b\u548c\u5eb7\u590d\u6cbb\u7597\u4e2d\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u672c\u7814\u7a76\u8fd8\u63a2\u8ba8\u4e86\u57fa\u4e8eAI\u7684\u65b0\u5174\u4ea7\u54c1\u3001\u6280\u672f\u548c\u5206\u6790\u65b9\u6cd5\uff0c\u5982\u793e\u4ea4\u5a92\u4f53\u5206\u6790\u3001\u4e25\u8083\u6e38\u620f\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5eb7\u590d\u4e2d\u7684\u6f5c\u5728\u6311\u6218\u548c\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u603b\u7684\u6765\u8bf4\uff0c\u8fd9\u7bc7\u8bba\u6587\u7cfb\u7edf\u56de\u987e\u4e86AI\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\uff0c\u5e76\u4e3a\u672a\u6765\u7684\u7814\u7a76\u8def\u5f84\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u76
84\u89c1\u89e3\u548c\u5efa\u8bae\u3002|\n", "2405.10853": "|**2024-05-17**|**The Future of Large Language Model Pre-training is Federated**|Lorenzo Sani et.al.|[2405.10853](http://arxiv.org/abs/2405.10853)|null|## \u80cc\u666f \u751f\u6210\u5f0f\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u7684\u51fa\u8272\u8868\u73b0\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5f97\u76ca\u4e8e\u5b83\u4eec\u6240\u63a5\u53d7\u7684\u6d77\u91cf\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u5df2\u5efa\u7acb\u7684\u89c4\u6a21\u6cd5\u5219\uff0cLLMs\u672a\u6765\u6027\u80fd\u7684\u63d0\u5347\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6211\u4eec\u80fd\u591f\u5229\u7528\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\u3002\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u6709\u53ef\u80fd\u91ca\u653e\u5168\u7403\u5927\u90e8\u5206\u672a\u5145\u5206\u5229\u7528\u7684\u6570\u636e\u548c\u8ba1\u7b97\u80fd\u529b\uff0c\u8fd9\u4e9b\u662f\u5f53\u524d\u4ee5\u6570\u636e\u4e2d\u5fc3\u4e3a\u4e2d\u5fc3\u7684LLM\u8bad\u7ec3\u65b9\u6cd5\u6240\u5ffd\u89c6\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7a33\u5065\u3001\u7075\u6d3b\u4e14\u53ef\u590d\u73b0\u7684FL\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdb\u673a\u6784\u95f4\u7684\u5927\u89c4\u6a21\u534f\u4f5c\uff0c\u5171\u540c\u8bad\u7ec3LLMs\uff0c\u4ece\u800c\u52a8\u5458\u66f4\u591a\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\uff0c\u751a\u81f3\u53ef\u80fd\u8fbe\u5230\u6216\u8d85\u8d8a\u4e2d\u5fc3\u5316\u7684\u6027\u80fd\u3002 ## \u4efb\u52a1 
\u6211\u4eec\u7684\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cdFL\u8bad\u7ec3\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u5728\u6709\u9650\u8d44\u6e90\u4e0b\u6269\u5c55\u5230\u767e\u4ebf\u5143\u7ea7\u7684\u8054\u90a6LLM\uff0c\u4f7f\u5f97\u62e5\u6709\u4e30\u5bcc\u6570\u636e\u7684\u5b9e\u4f53\u80fd\u591f\u6210\u4e3a\u9884\u8bad\u7ec3LLMs\u7684\u4e3b\u5bfc\u529b\u91cf\uff0c\u800c\u4e0d\u662f\u4ec5\u8ba9\u8ba1\u7b97\u8d44\u6e90\u4e30\u5bcc\u7684\u673a\u6784\u72ec\u5360\u9ccc\u5934\u3002\u8fd9\u79cd\u65b9\u6cd5\u5f3a\u8c03\u4e86\u8054\u90a6\u8bad\u7ec3\u7684\u89c4\u6a21\u6548\u76ca\uff0c\u5e76\u4e3a\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b9e\u7528\u8def\u5f84\u3002|\n", "2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825](http://arxiv.org/abs/2405.10825)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5353\u8d8a\u7684\u7406\u89e3\u548c\u63a8\u7406\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u5b83\u4eec\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u5c24\u5176\u5728\u7b2c\u516d\u4ee3\uff086G\uff09\u901a\u4fe1\u6280\u672f\u7684\u63a8\u52a8\u4e0b\u5c55\u73b0\u51fa\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6f5c\u529b\u3002\u672c\u7814\u7a76\u65e8\u5728\u5168\u9762\u6982\u8ff0LLM\u8d4b\u80fd\u7684\u7535\u4fe1\u7f51\u7edc\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u4e86LLMs\u7684\u57fa\u7840\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u63a8\u7406\u4e0e\u5e94\u7528\u3001\u6a21\u578b\u8bc4\u4f30\uff0c\u4ee5\u53ca\u5728\u7535\u4fe1\u90e8\u7f72\u4e2d\u7684\u8fd0\u7528\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8LLM\u652f\u6301\u7684\u5173\u952e\u6280\u672f\u548c\u7535\u4fe1\u5e94\u7528\uff0c\u6d89\u53ca\u751f\u6210\u3001\u5206\u7c7b\u3001\u4f18\u5316\u548c\u9884\u6d4b\u95ee\u9898\
u3002\u751f\u6210\u5e94\u7528\u5305\u62ec\u7535\u4fe1\u9886\u57df\u77e5\u8bc6\u3001\u4ee3\u7801\u548c\u7f51\u7edc\u914d\u7f6e\u81ea\u52a8\u751f\u6210\u3002\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u4efb\u52a1\u6db5\u76d6\u7f51\u7edc\u5b89\u5168\u3001\u6587\u672c\u3001\u56fe\u50cf\u548c\u6d41\u91cf\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u5229\u7528LLMs\u7684\u81ea\u52a8\u5316\u4f18\u5316\u6280\u672f\uff0c\u5982\u5f3a\u5316\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u8bbe\u8ba1\u548c\u53e3\u8bed\u5f3a\u5316\u5b66\u4e60\u3002\u5bf9\u4e8e\u9884\u6d4b\u95ee\u9898\uff0cLLMs\u53ef\u7528\u4e8e\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u548c\u591a\u6a21\u6001\u7535\u4fe1\u9884\u6d4b\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86LLM\u8d4b\u80fd\u7535\u4fe1\u7f51\u7edc\u6240\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u3002|\n", "2405.10808": "|**2024-05-17**|**ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios**|Markus Bayer et.al.|[2405.10808](http://arxiv.org/abs/2405.10808)|null|\u4e3b\u52a8\u5b66\u4e60\u65e8\u5728\u901a\u8fc7\u4f18\u5148\u5904\u7406\u6700\u80fd\u63d0\u5347\u5b66\u4e60\u6548\u679c\u7684\u5b9e\u4f8b\u6765\u51cf\u5c11\u6807\u6ce8\u5de5\u4f5c\u91cf\u3002\u7136\u800c\uff0c\u8bb8\u591a\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u9762\u4e34\u201c\u51b7\u542f\u52a8\u201d\u95ee\u9898\uff0c\u5373\u5728\u521d\u671f\u9700\u8981\u5927\u91cf\u6570\u636e\u624d\u80fd\u53d1\u6325\u6548\u80fd\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982BERT\uff09\u4e0a\u7684\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u5df2\u8868\u73b0\u826f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u2014\u2014ActiveLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u3001Llama 3\u548cMistral 
Large\uff09\u8fdb\u884c\u5b9e\u4f8b\u9009\u62e9\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0cActiveLLM\u663e\u8457\u63d0\u9ad8\u4e86BERT\u5206\u7c7b\u5668\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u548cSetFit\u7b49\u5c11\u6570\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\u3002\u6b64\u5916\uff0cActiveLLM\u8fd8\u80fd\u6269\u5c55\u5230\u975e\u5c11\u91cf\u6837\u672c\u573a\u666f\uff0c\u652f\u6301\u8fed\u4ee3\u9009\u62e9\uff0c\u4ece\u800c\u5e2e\u52a9\u5176\u4ed6\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u514b\u670d\u51b7\u542f\u52a8\u96be\u9898\u3002\u7ed3\u679c\u8868\u660e\uff0cActiveLLM\u4e3a\u6539\u5584\u4e0d\u540c\u5b66\u4e60\u73af\u5883\u4e2d\u7684\u6a21\u578b\u6027\u80fd\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2405.10745": "|**2024-05-17**|**Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings**|Albert Sawczyn et.al.|[2405.10745](http://arxiv.org/abs/2405.10745)|null|
\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u5bf9\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6280\u672f\u63d0\u51fa\u4e86\u4e25\u5cfb\u6311\u6218\u3002\u901a\u5e38\u91c7\u7528\u7684\u65b9\u6cd5\uff0c\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5728\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u65f6\u5f80\u5f80\u5b58\u5728\u5c40\u9650\u6027\u3002\u7136\u800c\uff0c\u4eba\u4eec\u5df2\u7ecf\u52aa\u529b\u901a\u8fc7\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u6765\u5f25\u8865\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u901a\u8fc7\u5c06\u5c0f\u89c4\u6a21\u7684\u9886\u57df\u7279\u5b9aKG\u4e0e\u901a\u7528KG\u76f8\u7ed3\u5408\u3002\u5c3d\u7ba1KG\u5728\u77e5\u8bc6\u8868\u793a\u65b9\u9762\u5177\u6709\u4f18\u52bf\uff0c\u4f46\u6784\u5efa\u5b83\u4eec\u7684\u6210\u672c\u53ef\u80fd\u963b\u788d\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\u548c\u5e94\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u94fe\u63a5\u5230\u5927\u89c4\u6a21\u901a\u7528KG\u6765\u63d0\u5347\u5c0f\u578b\u9886\u57df\u7279\u5b9aKG\u5d4c\u5165\u7684\u5b66\u4e60\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5e26\u6765\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u4f8b\u5982\uff0cHits@10\u6307\u6807\u6700\u9ad8\u63d0\u9ad8\u4e8644%\u3002\u8fd9\u4e00\u76f8\u5bf9\u672a\u88ab\u5145\u5206\u63a2\u7d22\u7684\u7814\u7a76\u65b9\u5411\u6709\u671b\u4fc3\u8fdbKG\u5728\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u66f4\u9891\u7e41\u8fd0\u7528\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u4e3a\u7a33\u5065\u3001\u53ef\u9760\u7684ML\u89e3\u51b3\u65b9\u6848\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u6d41\u884c\u4f46\u6613\u51fa\u9519\u7684LLM\u65b9\u6cd5\u66f4\u5177\u53ef\u9760\u6027\u3002\u5173\u952e\u8bcd\uff1a\u77e5\u8bc6\u56fe\u8c31\u3001\u77e5\u8bc6\u56fe\u8c31\u8865\u5168\u3001\u5b9e\u4f53\u5bf9\u9f50\u3001\u8868\u793a\u5b66\u4e60\u3001\u673a\u5668\u5b66\u4e60|\n", "2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language 
Models: A Survey**|Yizhang Jin et.al.|[2405.10739](http://arxiv.org/abs/2405.10739)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|**\u5728\u8fc7\u53bb\u4e00\u5e74\u91cc\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5728\u8bf8\u5982\u89c6\u89c9\u95ee\u7b54\u3001\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u7b49\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u5e9e\u5927\u89c4\u6a21\u548c\u9ad8\u6602\u7684\u8bad\u7ec3\u4e0e\u63a8\u7406\u6210\u672c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u9ad8\u6548\u4e14\u8f7b\u91cf\u7ea7\u7684MLLM\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u8fb9\u7f18\u8ba1\u7b97\u73af\u5883\u4e2d\u3002\u672c\u7efc\u8ff0\u5168\u9762\u7cfb\u7edf\u5730\u56de\u987e\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7684\u7814\u7a76\u73b0\u72b6\u3002\u6211\u4eec\u6982\u8ff0\u4e86\u4ee3\u8868\u6027\u9ad8\u6548\u6a21\u578b\u7684\u53d1\u5c55\u5386\u7a0b\uff0c\u603b\u7ed3\u4e86\u6709\u6548\u7ed3\u6784\u548c\u7b56\u7565\u7684\u7814\u7a76\u72b6\u6001\uff0c\u4ee5\u53ca\u5176\u5b9e\u7528\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7814\u7a76\u7684\u5c40\u9650\uff0c\u5e76\u5c55\u671b\u4e86\u6709\u524d\u666f\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u5982\u9700\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u53c2\u8003\u6211\u4eec\u7684GitHub\u4ed3\u5e93\uff1ahttps://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey\u3002**|\n", "2405.10725": "|**2024-05-17**|**INDUS: Effective and Efficient Language Models for Scientific Applications**|Bishwaranjan Bhattacharjee 
et.al.|[2405.10725](http://arxiv.org/abs/2405.10725)|null|\u5927\u578b\u901a\u7528\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0c\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684\u8bad\u7ec3\u6570\u636e\u53ef\u4ee5\u4f7f\u6a21\u578b\u5728\u4e13\u4e1a\u4efb\u52a1\u4e0a\u8868\u73b0\u66f4\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86INDUS\uff0c\u4e00\u5957\u4e13\u4e3a\u5730\u7403\u79d1\u5b66\u3001\u751f\u7269\u5b66\u3001\u7269\u7406\u5b66\u3001\u592a\u9633\u7269\u7406\u3001\u884c\u661f\u79d1\u5b66\u548c\u5929\u6587\u5b66\u9886\u57df\u8bbe\u8ba1\u7684\u5b9a\u5236\u5316\u8bed\u8a00\u6a21\u578b\u3002\u8fd9\u4e9b\u6a21\u578b\u57fa\u4e8e\u7cbe\u5fc3\u6311\u9009\u7684\u79d1\u5b66\u8bed\u6599\u5e93\uff0c\u5305\u62ec\uff1a\uff081\uff09\u4e00\u4e2a\u4f7f\u7528\u9886\u57df\u4e13\u7528\u8bcd\u6c47\u548c\u6570\u636e\u96c6\u8bad\u7ec3\u7684\u7f16\u7801\u5668\uff0c\u7528\u4e8e\u63d0\u5347\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u7684\u8868\u73b0\uff1b\uff082\uff09\u4e00\u4e2a\u57fa\u4e8e\u5bf9\u6bd4\u5b66\u4e60\u7684\u901a\u7528\u6587\u672c\u5d4c\u5165\u6a21\u578b\uff0c\u5229\u7528\u591a\u6e90\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u4f18\u5316\u4fe1\u606f\u68c0\u7d22\u4efb\u52a1\uff1b\uff083\uff09\u901a\u8fc7\u77e5\u8bc6\u84b8\u998f\u6280\u672f\u7f29\u5c0f\u89c4\u6a21\u7684\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u5bf9\u5ef6\u8fdf\u548c\u8d44\u6e90\u6709\u9650\u7684\u5e94\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e09\u4e2a\u65b0\u7684\u79d1\u5b66\u57fa\u51c6\u6570\u636e\u96c6\uff1aCLIMATE-CHANGE-NER\uff08\u5b9e\u4f53\u8bc6\u522b\uff09\u3001NASA-QA\uff08\u62bd\u53d6\u5f0f\u95ee\u7b54\uff09\u548cNASA-IR\uff08\u4fe1\u606f\u68c0\u7d22\uff09\uff0c\u4ee5\u63a8\u52a8\u8de8\u5b66\u79d1\u9886\u57df\u7684\u7814\u7a76\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6a21
\u578b\u5728\u65b0\u4efb\u52a1\u548c\u76f8\u5173\u9886\u57df\u73b0\u6709\u57fa\u51c6\u4efb\u52a1\u4e0a\u5747\u4f18\u4e8e\u901a\u7528\u7f16\u7801\u5668\uff08\u5982RoBERTa\uff09\u548c\u73b0\u6709\u7684\u9886\u57df\u7279\u5b9a\u7f16\u7801\u5668\uff08\u5982SciBERT\uff09\u3002|\n", "2405.12217": "|**2024-05-20**|**Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning**|Guanglin Zhou et.al.|[2405.12217](http://arxiv.org/abs/2405.12217)|**[link](https://github.com/jameszhou-gl/icl-distribution-shift)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u5e94\u5bf9\u81ea\u7136\u5206\u5e03\u53d8\u5316\u65f6\u8868\u73b0\u51fa\u6781\u9ad8\u7684\u9c81\u68d2\u6027\uff0c\u5e38\u5e38\u8d85\u8d8a\u5148\u524d\u7684\u57fa\u51c6\u3002\u7136\u800c\uff0c\u9886\u57df\u7279\u5b9a\u7684\u9002\u5e94\u4ecd\u7136\u662f\u5fc5\u8981\u7684\uff0c\u5c24\u5176\u662f\u5728\u533b\u7597\u7b49\u4e13\u4e1a\u9886\u57df\u3002\u9274\u4e8eLMMs\u5e9e\u5927\u7684\u53c2\u6570\u7a7a\u95f4\u4f7f\u5176\u5fae\u8c03\u4e0d\u5207\u5b9e\u9645\uff0c\u672c\u7814\u7a76\u805a\u7126\u4e8e\u63a2\u7d22\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u4f5c\u4e3a\u4e00\u79cd\u589e\u5f3aLMM\u9002\u5e94\u6027\u7684\u6709\u6548\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\uff0cICL\u7684\u6210\u529f\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u793a\u4f8b\u7684\u9009\u62e9\uff0c\u8fd9\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7c7b\u4f3c\uff0c\u4f46\u5bf9\u9762\u4e34\u5206\u5e03\u53d8\u5316\u7684LMMs\u63d0\u51fa\u4e86\u72ec\u7279\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u79cd\u65e0\u76d1\u7763\u7684ICL\u65b9\u6cd5\u2014\u2014TopKNearestPR\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u7279\u5f81\u76f8\u4f3c\u6027\u8fdb\u884c\u6700\u8fd1\u793a\u4f8b\u641c\u7d22\u6765\u9009\u62e9\u793a\u4f8b\u3002\u7814\u7a76\u63ed\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5728\u5904\u7406\u5206\u5e03\u8f6c\u79fb\u573a\u666f\u4e0b\u7
684\u89c6\u89c9\u7f16\u7801\u5668\u7f3a\u9677\u5bf9\u5176\u6548\u679c\u7684\u9650\u5236\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014InvariantSelectPR\uff0c\u5b83\u5229\u7528\u7c7b\u6761\u4ef6\u5bf9\u6bd4\u4e0d\u53d8\u6027\uff08CCI\uff09\u6765\u63d0\u5347\u9884\u8bad\u7ec3\u89c6\u89c9\u7f16\u7801\u5668\u7684\u7a33\u5065\u6027\u3002CCI\u901a\u8fc7\u589e\u5f3a\u4e0d\u540c\u7c7b\u522b\u95f4\u7684\u533a\u5206\u5ea6\u5e76\u786e\u4fdd\u5bf9\u9886\u57df\u7279\u5b9a\u53d8\u5316\u7684\u4e0d\u53d8\u6027\uff0c\u63d0\u9ad8\u4e86\u7f16\u7801\u5668\u8bc6\u522b\u548c\u68c0\u7d22\u6700\u6709\u4fe1\u606f\u4ef7\u503c\u793a\u4f8b\u7684\u80fd\u529b\u3002\u8fd9\u79cd\u65b9\u6cd5\u6709\u52a9\u4e8e\u5f15\u5bfcLMM\u9002\u5e94\u65b0\u7684\u67e5\u8be2\u6837\u672c\uff0c\u5373\u4f7f\u5728\u4e0d\u540c\u7684\u5206\u5e03\u4e0b\u4e5f\u662f\u5982\u6b64\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cInvariantSelectPR\u663e\u8457\u63d0\u9ad8\u4e86LMM\u7684\u9002\u5e94\u6027\uff0c\u5728Camelyon17\u548cHAM10000\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u76847-shot\u4efb\u52a1\u4e2d\uff0c\u5206\u522b\u5b9e\u73b0\u4e8634.2%\u548c16.9%\u7684\u51c6\u786e\u7387\u63d0\u5347\uff0c\u76f8\u5bf9\u4e8e\u96f6-shot\u6027\u80fd\uff0c\u8fd9\u662f\u663e\u8457\u7684\u8fdb\u6b65\u3002**|\n", "2405.12209": "|**2024-05-20**|**MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark**|Hongwei Liu 
et.al.|[2405.12209](http://arxiv.org/abs/2405.12209)|**[link](https://github.com/open-compass/mathbench)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u6570\u5b66\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f20\u7edf\u7684\u6570\u5b66\u57fa\u51c6\u5982GSM8k\u5728\u5168\u9762\u8bc4\u4ef7\u8fd9\u4e9b\u6a21\u578b\u7684\u6570\u5b66\u80fd\u529b\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MathBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u65b0\u57fa\u51c6\uff0c\u65e8\u5728\u4e25\u683c\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6570\u5b66\u80fd\u529b\u3002MathBench\u8986\u76d6\u5e7f\u6cdb\u7684\u6570\u5b66\u5b66\u79d1\uff0c\u5bf9\u7406\u8bba\u7406\u89e3\u548c\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u8fdb\u884c\u8be6\u5c3d\u8bc4\u4f30\u3002\u5b83\u5206\u4e3a\u4e94\u4e2a\u9636\u6bb5\uff0c\u4ece\u57fa\u7840\u7b97\u672f\u5230\u5927\u5b66\u6570\u5b66\uff0c\u7ed3\u6784\u4e0a\u8bbe\u8ba1\u7528\u4e8e\u8003\u5bdf\u6a21\u578b\u5728\u4e0d\u540c\u6df1\u5ea6\u77e5\u8bc6\u7684\u7406\u89e3\u3002\u6bcf\u4e2a\u9636\u6bb5\u5305\u62ec\u7406\u8bba\u95ee\u9898\u548c\u5e94\u7528\u9898\uff0c\u4ee5\u8861\u91cf\u6a21\u578b\u7684\u6570\u5b66\u719f\u7ec3\u5ea6\u53ca\u5176\u5728\u5b9e\u9645\u60c5\u5883\u4e2d\u5e94\u7528\u6982\u5ff5\u7684\u80fd\u529b\u3002MathBench\u7684\u76ee\u6807\u662f\u63d0\u5347\u5bf9LLMs\u6570\u5b66\u80fd\u529b\u7684\u8bc4\u4ef7\uff0c\u63d0\u4f9b\u5bf9\u5176\u77e5\u8bc6\u7406\u89e3\u6c34\u5e73\u548c\u95ee\u9898\u89e3\u51b3\u6280\u80fd\u7684\u7ec6\u81f4\u89c6\u89d2\uff0c\u540c\u65f6\u652f\u6301\u53cc\u8bed\u73af\u5883\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728https://github.com/open-compass/MathBench\u3002**|\n", "2405.12195": "|**2024-05-20**|**Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey**|Thiago S. 
Vaillant et.al.|[2405.12195](http://arxiv.org/abs/2405.12195)|**[link](https://github.com/gpt-impact/Paper-content)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5176\u5f3a\u5927\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u80fd\u529b\u548c\u5e7f\u6cdb\u5e94\u7528\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4e0e\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u7684\u878d\u5408\u8d8b\u52bf\u65e5\u76ca\u660e\u663e\uff0c\u4f46\u5173\u4e8e\u8fd9\u79cd\u878d\u5408\u5982\u4f55\u5f71\u54cd\u8f6f\u4ef6\u5f00\u53d1\u5b9e\u8df5\u548c\u8ba4\u77e5\u7684\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u4e3a\u4e86\u63ed\u793a\u5c06AI\u9a71\u52a8\u5de5\u5177\uff0c\u5982ChatGPT\uff0c\u878d\u5165\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u7684\u5f71\u54cd\u548c\u6311\u6218\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u8c03\u67e5\uff0c\u9488\u5bf9207\u540d\u8f6f\u4ef6\u5f00\u53d1\u8005\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u8c03\u67e5\u5185\u5bb9\u5305\u62ecChatGPT\u5bf9\u8f6f\u4ef6\u8d28\u91cf\u3001\u751f\u4ea7\u529b\u4ee5\u53ca\u5f00\u53d1\u8005\u5de5\u4f5c\u6ee1\u610f\u5ea6\u7684\u5f71\u54cd\uff0c\u540c\u65f6\u8fd8\u63a2\u8ba8\u4e86\u4ed6\u4eec\u5bf9\u672a\u6765ChatGPT\u5e94\u7528\u7684\u9884\u671f\u3001\u5bf9\u53ef\u80fd\u7684\u5de5\u4f5c\u5c97\u4f4d\u66ff\u4ee3\u7684\u62c5\u5fe7\uff0c\u4ee5\u53ca\u5bf9\u76d1\u7ba1\u63aa\u65bd\u7684\u770b\u6cd5\u3002|\n", "2405.12174": "|**2024-05-20**|**CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models**|Haoxiang Shi 
et.al.|[2405.12174](http://arxiv.org/abs/2405.12174)|null|\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aCT-Eval\u7684\u4e2d\u6587\u6587\u672c\u8f6c\u8868\u683c\u6570\u636e\u96c6\uff0c\u65e8\u5728\u8861\u91cf\u5927\u8bed\u8a00\u6a21\u578b\u5728\u975e\u82f1\u8bed\u8bed\u8a00\u73af\u5883\u4e0b\u7684\u6587\u672c\u8f6c\u8868\u683c\u4efb\u52a1\u6027\u80fd\u3002\u7531\u4e8e\u73b0\u6709\u82f1\u6587\u6587\u672c\u8f6c\u8868\u683c\u6570\u636e\u96c6\u4e3b\u8981\u9762\u5411\u82f1\u8bed\uff0cCT-Eval\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u9009\u62e9\u4e86\u4e00\u79cd\u6d41\u884c\u7684\u591a\u5b66\u79d1\u4e2d\u6587\u5728\u7ebf\u767e\u79d1\u4f5c\u4e3a\u6765\u6e90\uff0c\u6db5\u76d6\u4e8628\u4e2a\u9886\u57df\u4ee5\u4fdd\u8bc1\u6570\u636e\u591a\u6837\u6027\u3002\u4e3a\u4e86\u51cf\u5c11\u6570\u636e\u865a\u6784\uff08hallucination\uff09\u95ee\u9898\uff0c\u7814\u7a76\u8005\u9996\u5148\u8bad\u7ec3\u4e86\u4e00\u4e2a\u8bed\u8a00\u6a21\u578b\u6765\u8bc6\u522b\u5e76\u8fc7\u6ee4\u6389\u5b58\u5728\u865a\u6784\u95ee\u9898\u7684\u6837\u672c\uff0c\u7136\u540e\u4eba\u5de5\u6807\u6ce8\u9a8c\u8bc1\u96c6\u548c\u6d4b\u8bd5\u96c6\u4e2d\u7684\u9519\u8bef\u3002\u6700\u7ec8\uff0cCT-Eval\u5305\u542b\u4e86\u5927\u7ea688,600\u4e2a\u4efb\u52a1\u6837\u672c\u3002\u901a\u8fc7CT-Eval\uff0c\u7814\u7a76\u8005\u8bc4\u4f30\u4e86\u5f00\u6e90\u548c\u95ed\u6e90\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u8868\u73b0\uff0c\u7ed3\u679c\u663e\u793a\u96f6-shot\u6a21\u5f0f\u4e0b\u8fd9\u4e9b\u6a21\u578b\u4e0e\u4eba\u7c7b\u5224\u65ad\u4ecd\u6709\u663e\u8457\u5dee\u8ddd\u3002\u7ecf\u8fc7\u5fae\u8c03\u540e\uff0c\u5f00\u6e90\u6a21\u578b\u5728\u6587\u672c\u8f6c\u8868\u683c\u80fd\u529b\u4e0a\u6709\u4e86\u663e\u8457\u63d0\u5347\uff0c\u5927\u5e45\u8d85\u8d8a\u4e86GPT-4\u3002\u603b\u4e4b\uff0cCT-Eval\u4e0d\u4ec5\u4e3a\u8bc4\u4f30\u548c\u7406\u89e3\u73b0\u6709\u5927\u8bed\u8a00\u6a21\u578b\u7684\u4e2d\u6587\u6587\u672c\u8f6c\u8868\u683c\u80fd\u529b\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u5de5\u517
7\uff0c\u4e5f\u4e3a\u63d0\u5347\u8fd9\u7c7b\u6a21\u578b\u5728\u8fd9\u9879\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u5b9d\u8d35\u8d44\u6e90\u3002|\n", "2405.12163": "|**2024-05-20**|**Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging**|Xiaobo Liang et.al.|[2405.12163](http://arxiv.org/abs/2405.12163)|**[link](https://github.com/dropreg/fennec)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u4f17\u591a\u73b0\u5b9e\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4e3b\u8981\u76ee\u6807\u662f\u7b26\u5408\u4eba\u7c7b\u7684\u610f\u56fe\u3002\u7136\u800c\uff0c\u7406\u89e3\u4eba\u7c7b\u610f\u56fe\u7684\u590d\u6742\u6027\u4f7f\u5f97\u4f9d\u8d56\u4e8e\u8017\u65f6\u7684\u4eba\u5de5\u8bc4\u4f30\u6210\u4e3a\u5fc5\u8981\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5229\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u8bc4\u4f30\u8005\u7684\u8d8b\u52bf\uff0c\u7279\u522b\u662f\u5728GPT-4\u7684\u6d41\u884c\u80cc\u666f\u4e0b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\\textbf{Fennec}\u7684\u6846\u67b6\uff0c\u4e13\u6ce8\u4e8e\\textbf{F}ine-grained \\textbf{E}valuation\uff08\u7ec6\u81f4\u8bc4\u4f30\uff09\u548c\\textbf{N}eeded 
\\textbf{E}xtension\uff08\u5fc5\u8981\u6269\u5c55\uff09\u901a\u8fc7\u5206\u652f\uff08Branching\uff09\u548c\u8fde\u63a5\uff08Bridging\uff09\u3002\u5206\u652f\u64cd\u4f5c\u5c06\u8bc4\u4f30\u4efb\u52a1\u5206\u89e3\u4e3a\u4e0d\u540c\u7ef4\u5ea6\u548c\u7c92\u5ea6\uff0c\u4ece\u800c\u51cf\u8f7b\u8bc4\u4f30\u6311\u6218\u3002\u540c\u65f6\uff0c\u8fde\u63a5\u64cd\u4f5c\u878d\u5408\u4e86\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u589e\u52a0\u4e86\u8bc4\u4f30\u4efb\u52a1\u7684\u591a\u6837\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u76847B\u6a21\u578b\u5728\u5404\u79cd\u5e38\u7528\u57fa\u51c6\u4e0a\u7684\\textit{\u4e00\u81f4\u6027}\u548c\\textit{\u4e00\u81f4\u540c\u610f}\u6027\u80fd\u5747\u4f18\u4e8e\u5f00\u6e90\u7684\u66f4\u5927\u89c4\u6a21\u8bc4\u4f30\u6a21\u578b\uff0c\u63a5\u8fd1GPT-4\u7684\u8868\u73b0\u3002\u6211\u4eec\u5229\u7528\u6a21\u578b\u7684\u7cbe\u7ec6\u6821\u6b63\u529f\u80fd\u6539\u8fdb\u591a\u4e2a\u6a21\u578b\u54cd\u5e94\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u4f18\u5316\u63d0\u5347\u4e86\u54cd\u5e94\u8d28\u91cf\uff0c\u5728MT-Bench\u4e0a\u63d0\u9ad8\u4e861-2\u5206\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728GitHub\u4e0a\u5f00\u6e90\\footnote{\\url{https://github.com/dropreg/Fennec}}\u3002**|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130](http://arxiv.org/abs/2405.12130)|**[link](https://github.com/kongds/mora)**|**\u4f4e\u79e9\u9002\u5e94\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u6d41\u884c\u7684\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u4f4e\u79e9\u66f4\u65b0\uff08\u5982LoRA\u5b9e\u73b0\uff09\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u6307\u51fa\uff0c\u8fd9\u79cd\u673a\u5236\u53ef\u80fd\u9650\u5236\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5b66\u4e60\u548c\u8bb0\u5fc6\u65b0\u77e5\u8bc6\u7684\u80fd\u529b\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5MoRA\uff0c\u5b83\u5229\u7528\u5e73\u65b9\u77e9\u9635\u5b9e\u73b0\u9ad8\u79e9\u66f4\u65b0\uff0c\u540c\u65f6\u4fdd\u6301\u4e0eLoRA\u76f8\u540c\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u6570\u91cf\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u76f8\u5e94\u7684\u975e\u53c2\u6570\u8fd0\u7b97\u5668\uff0c\u4ee5\u964d\u4f4e\u8f93\u5165\u7ef4\u5ea6\u5e76\u589e\u52a0\u8f93\u51fa\u7ef4\u5ea6\u5904\u7406\u5e73\u65b9\u77e9\u9635\u3002\u8fd9\u4e9b\u8fd0\u7b97\u5668\u786e\u4fdd\u6743\u91cd\u80fd\u65e0\u7f1d\u878d\u5165\u5230\u5927\u8bed\u8a00\u6a21\u578b\u4e2d\uff0c\u4f7f\u5f97\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u50cfLoRA\u4e00\u6837\u90e8\u7f72\u3002\u6211\u4eec\u5728\u4e94\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u8bc4\u4f30\uff1a\u6307\u4ee4\u8c03\u6574\u3001\u6570\u5b66\u63a8\u7406\u3001\u8fde\u7eed\u9884\u8bad\u7ec3\u3001\u8bb0\u5fc6\u4ee5\u53ca\u9884\u8bad\u7ec3\u3002\u5728\u5185\u5b58\u5bc6\u96c6\u578b\u4efb\u52a1\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f18\u4e8eLoRA\uff0c\u5e76\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u76f8\u5f53\u7684\u6027\u80fd\u3002**|\n", "2405.12119": "|**2024-05-20**|**Reindex-Then-Adapt: 
Improving Large Language Models for Conversational Recommendation**|Zhankui He et.al.|[2405.12119](http://arxiv.org/abs/2405.12119)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u901a\u8fc7\u51fa\u8272\u5730\u7d22\u5f15\u9879\u76ee\u5185\u5bb9\u3001\u7406\u89e3\u590d\u6742\u7684\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u5e76\u751f\u6210\u76f8\u5173\u9879\u76ee\u6807\u9898\uff0c\u9769\u65b0\u4e86\u5bf9\u8bdd\u63a8\u8350\u7cfb\u7edf\u3002\u7136\u800c\uff0c\u63a7\u5236\u63a8\u8350\u9879\u76ee\u7684\u5206\u5e03\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u5bfc\u81f4\u5728\u9488\u5bf9\u5bf9\u8bdd\u63a8\u8350\u5e73\u53f0\u7684\u5feb\u901f\u53d8\u5316\u7684\u6570\u636e\u5206\u5e03\uff0c\u5982\u9879\u76ee\u6d41\u884c\u5ea6\u4e0a\uff0c\u6027\u80fd\u6b20\u4f73\u3002\u5728\u5bf9\u8bdd\u63a8\u8350\u4e2d\uff0cLLMs\u901a\u8fc7\u81ea\u56de\u5f52\u65b9\u5f0f\u751f\u6210\u9879\u76ee\u6807\u9898\uff08\u4f5c\u4e3a\u591a\u4e2a\u4ee4\u724c\uff09\uff0c\u8fd9\u4f7f\u5f97\u83b7\u53d6\u548c\u63a7\u5236\u6240\u6709\u9879\u76ee\u63a8\u8350\u53d8\u5f97\u56f0\u96be\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u91cd\u7d22\u5f15-\u7136\u540e\u9002\u5e94\u201d\uff08Reindex-Then-Adapt\uff0cRTA\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u591a\u4ee4\u724c\u9879\u76ee\u6807\u9898\u8f6c\u6362\u4e3a\u5355\u4e2a\u4ee4\u724c\u4e8eLLMs\u5185\uff0c\u968f\u540e\u8c03\u6574\u8fd9\u4e9b\u5355\u4ee4\u724c\u9879\u76ee\u6807\u9898\u7684\u6982\u7387\u5206\u5e03\u3002RTA\u6846\u67b6\u7ed3\u5408\u4e86LLMs\u7406\u89e3\u548c\u590d\u6742\u67e5\u8be2\u7684\u4f18\u52bf\uff0c\u4ee5\u53ca\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\uff08RecSys\uff09\u5728\u5bf9\u8bdd\u63a8\u8350\u4e2d\u6709\u6548\u63a7\u5236\u63a8\u8350\u9879\u76ee\u5206\u5e03\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5728\u4e09\u4e2a\u4e0d\u540c\u7684\u5bf9\u8bdd\u63a8\u8350\u6570\u636e\u96c6\u548c\u4e24\u79cd\u9002\u5e94\u8bbe\u7f6e\u4e0b\uff0c\u5c55\u793a\u4e
86\u6539\u8fdb\u7684\u51c6\u786e\u6027\u6307\u6807\u3002|\n", "2405.12107": "|**2024-05-20**|**Imp: Highly Capable Large Multimodal Models for Mobile Devices**|Zhenwei Shao et.al.|[2405.12107](http://arxiv.org/abs/2405.12107)|**[link](https://github.com/milvlg/imp)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u5f00\u653e\u4e16\u754c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u5c55\u73b0\u51fa\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u53c2\u6570\u91cf\u5927\u3001\u8ba1\u7b97\u9700\u6c42\u9ad8\uff0c\u9650\u5236\u4e86\u5728\u8d44\u6e90\u53d7\u9650\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u7814\u7a76\u4eba\u5458\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u8f7b\u91cf\u7ea7LMM\uff0c\u65e8\u5728\u5728\u6709\u9650\u89c4\u6a21\uff08\u598230\u4ebf\u53c2\u6570\uff09\u4e0b\u6700\u5927\u5316\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u591a\u6570\u4ec5\u5173\u6ce8\u8bbe\u8ba1\u7a7a\u95f4\u7684\u5355\u4e00\u6216\u4e24\u4e2a\u65b9\u9762\uff0c\u5bf9\u5f71\u54cd\u6a21\u578b\u80fd\u529b\u7684\u5173\u952e\u8bbe\u8ba1\u9009\u62e9\u5c1a\u672a\u8fdb\u884c\u5168\u9762\u63a2\u8ba8\u3002 
\u672c\u6587\u7cfb\u7edf\u5730\u7814\u7a76\u4e86\u8f7b\u91cf\u7ea7LMM\u7684\u8bbe\u8ba1\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u8bad\u7ec3\u7b56\u7565\u548c\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u5957\u540d\u4e3aImp\u7684\u9ad8\u6027\u80fdLMM\u5bb6\u65cf\uff0c\u8986\u76d620\u4ebf\u523040\u4ebf\u53c2\u6570\u89c4\u6a21\u3002\u5c24\u5176\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684Imp-30\u4ebf\u6a21\u578b\u5728\u4e0e\u540c\u7c7b\u89c4\u6a21\u7684\u73b0\u6709\u8f7b\u91cf\u7ea7\u6a21\u578b\u76f8\u6bd4\u65f6\u6301\u7eed\u9886\u5148\uff0c\u5e76\u8d85\u8d8a\u4e86130\u4ebf\u53c2\u6570\u89c4\u6a21\u7684\u6700\u65b0LMM\u72b6\u6001\u3002\u901a\u8fc7\u4f4e\u7cbe\u5ea6\u91cf\u5316\u548c\u5206\u8fa8\u7387\u964d\u4f4e\u6280\u672f\uff0cImp\u6a21\u578b\u80fd\u591f\u5728\u9ad8\u901a\u9a81\u9f998Gen3\u79fb\u52a8\u82af\u7247\u4e0a\u5b9e\u73b0\u9ad8\u901f\u90e8\u7f72\uff0c\u6bcf\u79d2\u5904\u7406\u5927\u7ea613\u4e2a\u4ee4\u724c\u7684\u63a8\u7406\u901f\u5ea6\u3002**|\n", "2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100](http://arxiv.org/abs/2405.12100)|null|
\u6570\u5b66\u4e16\u754c\u95ee\u9898\u4fee\u6b63\uff08MWPC\uff09\u662f\u4e00\u4e2a\u4e13\u95e8\u9488\u5bf9\u89e3\u51b3\u6570\u5b66\u95ee\u9898\u8fc7\u7a0b\u4e2d\u9519\u8bef\u63a8\u7406\u7684\u4fee\u6b63\u4efb\u52a1\u3002\u672c\u6587\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\uff0c\u5173\u6ce8\u4e24\u70b9\uff1a\uff081\uff09\u533a\u5206\u6570\u5b66\u63a8\u7406\u4e0e\u9519\u8bef\u4fee\u6b63\uff1b\uff082\uff09\u63a2\u7d22\u7b56\u7565\u4ee5\u63d0\u5347LLMs\u5728\u6570\u5b66\u9886\u57df\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\uff0c\u4ee5\u5e94\u5bf9MWPC\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u5728\u5b9e\u65f6\u6559\u80b2\u4e2d\uff0c\u5e2e\u52a9\u5b66\u751f\u8bc6\u522b\u9519\u8bef\u6bd4\u5355\u7eaf\u63d0\u4f9b\u6b63\u786e\u7b54\u6848\u66f4\u4e3a\u5173\u952e\u3002\u7136\u800c\uff0c\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u4fa7\u91cd\u4e8e\u83b7\u53d6\u7cbe\u786e\u7684\u89e3\u9898\u7b54\u6848\uff0c\u800c\u975e\u7ea0\u6b63\u53ef\u80fd\u7684\u9519\u8bef\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8c03\u6574\u4e86\u7814\u7a76\u8303\u5f0f\uff0c\u8868\u660e\u63d0\u5347\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5e76\u4e0d\u7b49\u540c\u4e8e\u7cbe\u901a\u9519\u8bef\u4fee\u6b63\u3002\u540c\u65f6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u8bca\u65ad\u5bfc\u5411\u63d0\u793a\uff08DOP\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdbLLMs\u5728\u9519\u8bef\u4fee\u6b63\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDOP\u8868\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\uff0c\u5f70\u663e\u5176\u91cd\u8981\u6027\u3002\u6211\u4eec\u5f3a\u8c03\uff0c\u5728\u6570\u5b66\u6559\u80b2\u4e2d\uff0c\u5bf9\u51fa\u8272\u4fee\u6b63\u8005\u7684\u9700\u8981\u8d85\u8fc7\u4e86\u5bf9\u719f\u7ec3\u63a8\u7406\u8005\u7684\u8ffd\u6c42\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002|\n", "2405.12981": "|**2024-05-21**|**Reducing Transformer Key-Value Cache Size with Cross-Layer Attention**|William Brandon 
et.al.|[2405.12981](http://arxiv.org/abs/2405.12981)|null|\u952e\u503c\u7f13\u5b58\u5bf9\u4e8e\u52a0\u901fTransformer\u67b6\u6784\u7684\u81ea\u56de\u5f52\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89e3\u7801\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u5e8f\u5217\u957f\u5ea6\u589e\u52a0\u548c\u6279\u91cf\u5927\u5c0f\u589e\u5927\uff0c\u5b58\u50a8\u952e\u503c\u7f13\u5b58\u6240\u9700\u7684\u5185\u5b58\u53ef\u80fd\u4f1a\u53d8\u5f97\u96be\u4ee5\u627f\u53d7\u3002\u81ea\u4eceTransformer\u8bde\u751f\u4ee5\u6765\uff0c\u4e24\u4e2a\u6700\u6709\u6548\u7684\u5185\u5b58\u51cf\u5c0f\u7b56\u7565\u662f\u591a\u67e5\u8be2\u6ce8\u610f\u529b\uff08MQA\uff09\u53ca\u5176\u63a8\u5e7f\uff0c\u7fa4\u7ec4\u67e5\u8be2\u6ce8\u610f\u529b\uff08GQA\uff09\u3002MQA\u548cGQA\u901a\u8fc7\u8ba9\u591a\u4e2a\u67e5\u8be2\u5934\u5171\u4eab\u5355\u4e2a\u952e/\u503c\u5934\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u4e0d\u540c\u952e/\u503c\u5934\u7684\u6570\u91cf\uff0c\u540c\u65f6\u5bf9\u51c6\u786e\u6027\u5f71\u54cd\u8f83\u5c0f\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u8fdb\u4e00\u6b65\u53d1\u5c55MQA\uff0c\u5373\u5728\u76f8\u90bb\u5c42\u4e4b\u95f4\u4e5f\u5171\u4eab\u952e\u548c\u503c\u5934\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u8de8\u5c42\u6ce8\u610f\u529b\uff08CLA\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4f7f\u7528CLA\uff0c\u53ef\u4ee5\u5728\u4fdd\u6301\u63a5\u8fd1\u539f\u59cbMQA\u7cbe\u5ea6\u7684\u540c\u65f6\uff0c\u5c06\u952e\u503c\u7f13\u5b58\u7684\u5927\u5c0f\u518d\u51cf\u5c112\u500d\u3002\u6211\u4eec\u5728\u4ece\u5934\u8bad\u7ec310\u4ebf\u53c2\u6570\u548c30\u4ebf\u53c2\u6570\u6a21\u578b\u7684\u5b9e\u9a8c\u4e2d\u9a8c\u8bc1\u4e86\u8fd9\u4e00\u70b9\uff0c\u7ed3\u679c\u8868\u660e\uff0cCLA\u5728\u5185\u5b58\u4e0e\u51c6\u786e\u6027\u4e4b\u95f4\u7684\u6743\u8861\u4e0a\u63d0\u4f9b\u4e86\u4f18\u4e8e\u4f20\u7edfMQA\u7684\u5e15\u7d2f\u6258\u6539\u8fdb\uff0c\u4f7f\u5f97\u66f4\u957f\u7684\u5e8f\u5217\u957f\u5ea6\u548c\u66f4\u5927\u7684\u6279\u91cf\u5927\u5c0f\u4e0b\u7
684\u63a8\u7406\u6210\u4e3a\u53ef\u80fd\u3002|\n", "2405.12961": "|**2024-05-21**|**Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale**|Shriram Chennakesavalu et.al.|[2405.12961](http://arxiv.org/abs/2405.12961)|**[link](https://github.com/rotskoff-group/llm-era)**|\u5728\u5316\u5b66\u7a7a\u95f4\u4e2d\u7684\u641c\u7d22\u662f\u4e00\u4e2a\u6781\u5177\u6311\u6218\u6027\u7684\u95ee\u9898\uff0c\u56e0\u4e3a\u53ef\u80fd\u7684\u5206\u5b50\u6570\u91cf\u968f\u7740\u539f\u5b50\u6570\u91cf\u5448\u7ec4\u5408\u7ea7\u589e\u957f\u3002\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\u901a\u8fc7\u5b66\u4e60\u5316\u5b66\u5316\u5408\u7269\u6570\u636e\u5e93\u5df2\u7ecf\u4ea7\u751f\u4e86\u5f3a\u5927\u7684\u751f\u6210\u5668\uff0c\u4f46\u6211\u4eec\u4ecd\u7136\u7f3a\u4e4f\u6709\u6548\u7b56\u7565\u6765\u751f\u6210\u5177\u6709\u7279\u5b9a\u6027\u8d28\u7684\u5206\u5b50\u3002\u8fd9\u4e2a\u95ee\u9898\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u201c\u5bf9\u9f50\u201d\u95ee\u9898\u76f8\u4f3c\uff0c\u5c3d\u7ba1\u5728\u8bb8\u591a\u5316\u5b66\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u6709\u4e00\u4e2a\u660e\u786e\u4e14\u6613\u4e8e\u8bc4\u4f30\u7684\u5956\u52b1\u51fd\u6570\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u80fd\u91cf\u6392\u540d\u5bf9\u9f50\uff08ERA\uff09\u7684\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u660e\u786e\u7684\u5956\u52b1\u51fd\u6570\u6784\u5efa\u4e86\u4e00\u4e2a\u68af\u5ea6\u4f18\u5316\u76ee\u6807\uff0c\u7528\u4e8e\u8c03\u6574\u81ea\u56de\u5f52\u7b56\u7565\u3002\u7406\u8bba\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u8be5\u7b97\u6cd5\u4e0eProximal Policy Optimization\uff08PPO\uff09\u548cDirect Preference 
Optimization\uff08DPO\uff09\u5bc6\u5207\u76f8\u5173\uff0c\u4f46\u5176\u6700\u5c0f\u5316\u5668\u6536\u655b\u4e8e\u4e00\u4e2a\u7406\u60f3\u7684\u5409\u5e03\u65af-\u73bb\u5c14\u5179\u66fc\u5206\u5e03\uff0c\u5956\u52b1\u51fd\u6570\u626e\u6f14\u4e86\u80fd\u91cf\u89d2\u8272\u3002\u6b64\u5916\uff0c\u8be5\u7b97\u6cd5\u5177\u6709\u9ad8\u5ea6\u53ef\u6269\u5c55\u6027\uff0c\u65e0\u9700\u5f3a\u5316\u5b66\u4e60\uff0c\u5e76\u4e14\u5728\u6bcf\u5bf9\u6837\u672c\u7684\u504f\u597d\u89c2\u5bdf\u6b21\u6570\u8f83\u5c11\u65f6\uff0c\u76f8\u5bf9\u4e8eDPO\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u5c06\u8fd9\u79cd\u65b9\u6cd5\u5e94\u7528\u4e8e\u5206\u5b50\u53d8\u538b\u5668\u7684\u5bf9\u9f50\uff0c\u4ee5\u751f\u6210\u5177\u6709\u5916\u90e8\u6307\u5b9a\u5c5e\u6027\u7684\u5206\u5b50\uff0c\u5e76\u53d1\u73b0\u5b83\u80fd\u7a33\u5065\u5730\u8fdb\u884c\u641c\u7d22\uff0c\u63a2\u7d22\u5316\u5b66\u7a7a\u95f4\u7684\u591a\u6837\u5316\u90e8\u5206\u3002\u867d\u7136\u6211\u4eec\u7684\u91cd\u70b9\u5728\u4e8e\u5316\u5b66\u641c\u7d22\uff0c\u4f46\u6211\u4eec\u5728\u4e00\u4e2aAI\u76d1\u7763\u7684\u4efb\u52a1\u4e0a\u4e5f\u53d6\u5f97\u4e86\u4f18\u79c0\u7ed3\u679c\uff0c\u8868\u660e\u8be5\u65b9\u6cd5\u662f\u53ef\u6269\u5c55\u4e14\u901a\u7528\u7684\u3002|\n", "2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939](http://arxiv.org/abs/2405.12939)|**[link](https://github.com/yinzhangyue/AoR)**|## \u80cc\u666f 
\u8fd1\u671f\uff0cChain-of-Thought\u63d0\u793a\u7684\u8fdb\u5c55\u6781\u5927\u5730\u63a8\u52a8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u7a81\u7834\u3002\u5f53\u524d\u7814\u7a76\u901a\u8fc7\u91c7\u6837\u591a\u79cd\u63a8\u7406\u8def\u5f84\u5e76\u6839\u636e\u7b54\u6848\u9891\u7387\u8fdb\u884censemble\uff0c\u63d0\u9ad8\u4e86LLMs\u7684\u63a8\u7406\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u6b63\u786e\u7b54\u6848\u5904\u4e8e\u5c11\u6570\u7684\u60c5\u51b5\u65f6\u5931\u6548\u3002\u6211\u4eec\u53d1\u73b0\u8fd9\u662f\u5236\u7ea6LLMs\u63a8\u7406\u80fd\u529b\u7684\u5173\u952e\u56e0\u7d20\uff0c\u4ec5\u51ed\u9884\u6d4b\u7b54\u6848\u65e0\u6cd5\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5c42\u6b21\u5316\u7684\u63a8\u7406\u805a\u5408\u6846\u67b6AoR\uff08\u63a8\u7406\u805a\u5408\uff09\uff0c\u5b83\u4f9d\u636e\u63a8\u7406\u94fe\u6761\u7684\u8bc4\u4f30\u6765\u9009\u62e9\u7b54\u6848\u3002\u6b64\u5916\uff0cAoR\u5f15\u5165\u4e86\u52a8\u6001\u91c7\u6837\u7b56\u7565\uff0c\u6839\u636e\u4efb\u52a1\u590d\u6742\u5ea6\u8c03\u6574\u63a8\u7406\u94fe\u6761\u7684\u6570\u91cf\u3002 ## \u4efb\u52a1 \u4e00\u7cfb\u5217\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAoR\u76f8\u8f83\u4e8e\u4e3b\u6d41ensemble\u65b9\u6cd5\u8868\u73b0\u51fa\u8272\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u8868\u660e\uff0cAoR\u4e0d\u4ec5\u9002\u7528\u4e8e\u5404\u79cdLLMs\uff0c\u800c\u4e14\u5728\u4e0e\u73b0\u6709\u65b9\u6cd5\u7684\u6027\u80fd\u5929\u82b1\u677f\u6bd4\u8f83\u4e2d\uff0c\u8fbe\u5230\u4e86\u66f4\u4f18\u79c0\u7684\u6c34\u5e73\u3002|\n", "2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel 
et.al.|[2405.12933](http://arxiv.org/abs/2405.12933)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8bf8\u5982\u603b\u7ed3\u3001\u7b97\u672f\u63a8\u7406\u548c\u95ee\u7b54\u7b49\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5728\u9053\u5fb7\u63a8\u7406\u548c\u4f26\u7406\u51b3\u7b56\u65b9\u9762\uff0c\u5c24\u5176\u662f\u5728\u6d89\u53ca\u591a\u4e2a\u5229\u76ca\u76f8\u5173\u8005\u7684\u590d\u6742\u60c5\u666f\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u4e25\u5cfb\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSkin-in-the-Game\uff08SKIG\uff09\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u4ece\u4e0d\u540c\u5229\u76ca\u76f8\u5173\u8005\u89d2\u5ea6\u5ba1\u89c6\u51b3\u7b56\u7684\u540e\u679c\uff0c\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u5728\u9053\u5fb7\u63a8\u7406\u4e2d\u7684\u80fd\u529b\u3002SKIG\u7684\u6838\u5fc3\u673a\u5236\u662f\u6a21\u62df\u884c\u52a8\u7684\u8d23\u4efb\u611f\uff0c\u7ed3\u5408\u540c\u7406\u5fc3\u7ec3\u4e60\u548c\u98ce\u9669\u8bc4\u4f30\uff0c\u5bf9\u63d0\u9ad8\u5176\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u4f7f\u7528\u4e13\u6709\u548c\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u9053\u5fb7\u63a8\u7406\u57fa\u51c6\u4e0a\u9a8c\u8bc1SKIG\u7684\u8868\u73b0\uff0c\u5e76\u901a\u8fc7\u6df1\u5165\u7684\u6d88\u878d\u5206\u6790\u63a2\u7a76\u5176\u5173\u952e\u7ec4\u4ef6\u3002|\n", "2405.12929": "|**2024-05-21**|**Code-mixed Sentiment and Hate-speech Prediction**|Anjali Yadav et.al.|[2405.12929](http://arxiv.org/abs/2405.12929)|null|\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\uff0c\u6df7\u5408\u4ee3\u7801\uff08code-mixed 
discourse\uff09\u6307\u7684\u662f\u5355\u6587\u672c\u4e2d\u878d\u5408\u591a\u79cd\u8bed\u8a00\u7684\u73b0\u8c61\uff0c\u5c24\u5176\u662f\u5728\u5b98\u65b9\u8bed\u8a00\u591a\u5143\u7684\u56fd\u5bb6\u7684\u975e\u6b63\u5f0f\u4ea4\u6d41\u4e2d\u5e38\u89c1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u4e3b\u5bfc\u5730\u4f4d\u63d0\u5347\uff0c\u6211\u4eec\u9488\u5bf9\u4ee3\u7801\u6df7\u5408\u8bed\u5883\u7684\u7814\u7a76\u4e5f\u968f\u4e4b\u5c55\u5f00\u3002\u9996\u5148\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u4e86\u56db\u6b3e\u65b0\u7684\u82f1\u8bed-\u5370\u5730\u8bed\u548c\u82f1\u8bed-\u65af\u6d1b\u6587\u5c3c\u4e9a\u53cc\u8bed\u9884\u8bad\u7ec3\u906e\u7f69\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u9002\u5e94\u975e\u6b63\u5f0f\u8bed\u8a00\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5404\u79cd\u7c7b\u578b\u7684\u6a21\u578b\u2014\u2014\u5305\u62ec\u5355\u8bed\u3001\u53cc\u8bed\u3001\u5c11\u91cf\u8bed\u8a00\u548c\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u6a21\u578b\u2014\u2014\u5728\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u7684\u60c5\u611f\u5206\u6790\u548c\u653b\u51fb\u6027\u8bed\u8a00\u68c0\u6d4b\u7b49\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u6709\u6548\u7684\u5206\u7c7b\u5668\u662f\u9488\u5bf9\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u7684\u4e13\u4e1a\u5316\u53cc\u8bed\u548c\u591a\u8bed\u8a00\u6a21\u578b\uff0c\u968f\u540e\u662f\u975e\u4e13\u4e1a\u7684\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u548c\u5355\u8bed\u6a21\u578b\uff0c\u800c\u5927\u578b\u751f\u6210\u6a21\u578b\u7684\u8868\u73b0\u5e76\u4e0d\u7a81\u51fa\u3002\u5bf9\u4e8e\u6d89\u53ca\u60c5\u611f\u7684\u95ee\u9898\uff0c\u6a21\u578b\u5728\u5904\u7406\u4ee3\u7801\u6df7\u5408\u6570\u636e\u65f6\u603b\u4f53\u4e0a\u7565\u4f18\u4e8e\u975e\u4ee3\u7801\u6df7\u5408\u6570\u636e\u3002|\n", "2405.12920": "|**2024-05-21**|**Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples**|Tim 
Menzies et.al.|[2405.12920](http://arxiv.org/abs/2405.12920)|**[link](https://github.com/timm/ez)**|\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u8f6f\u4ef6\u5206\u6790\u6311\u6218\u4efb\u52a1\u3002\u5728\u8fd9\u4e2a\u88ab\u79f0\u4e3a\u201c\u8f6f\u4ef6\u5ba1\u67e5\u201d\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4e00\u7ec4SME\uff08\u4e3b\u9898\u4e13\u5bb6\uff09\u4f1a\u8bc4\u5ba1\u8f6f\u4ef6\u884c\u4e3a\u793a\u4f8b\uff0c\u4ee5\u5efa\u8bae\u5982\u4f55\u6539\u8fdb\u8f6f\u4ef6\u7684\u8fd0\u884c\u3002\u7531\u4e8eSME\u7684\u65f6\u95f4\u901a\u5e38\u975e\u5e38\u6709\u9650\uff0c\u7406\u60f3\u7684\u72b6\u51b5\u662f\uff0c\u8be5\u56e2\u961f\u4ec5\u901a\u8fc7\u67e5\u770b\u5c11\u91cf\u5177\u6709\u9ad8\u5ea6\u4fe1\u606f\u4ef7\u503c\u7684\u793a\u4f8b\u5c31\u80fd\u5b8c\u6210\u4f18\u5316\u4efb\u52a1\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u4e2a\u5ba1\u67e5\u8fc7\u7a0b\uff0c\u7814\u7a76\u63a2\u7d22\u4e86\u8bad\u7ec3\u9884\u6d4b\u6a21\u578b\u7684\u65b9\u6cd5\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u67d0\u4e2a\u4e13\u5bb6\u662f\u5426\u4f1a\u559c\u6b22\u6216\u4e0d\u559c\u6b22\u4e0b\u4e00\u4e2a\u793a\u4f8b\u3002\u8fd9\u79cd\u9884\u6d4b\u6a21\u578b\u53ef\u4ee5\u4e0eSME\u5408\u4f5c\uff0c\u5f15\u5bfc\u4ed6\u4eec\u63a2\u7d22\u6240\u6709\u793a\u4f8b\uff0c\u540c\u65f6\u5728\u4e13\u5bb6\u79bb\u5f00\u540e\uff0c\u6a21\u578b\u4e5f\u53ef\u4ee5\u4f5c\u4e3a\u4ee3\u7406\uff0c\u5904\u7406\u65b0\u51fa\u73b0\u7684\u6848\u4f8b\uff0c\u4ee5\u5e94\u5bf9\u4e13\u5bb6\u4eec\u7684\u5fd9\u788c\u3002 
\u572831\u4e2a\u6848\u4f8b\u7814\u7a76\u4e2d\uff08\u6db5\u76d6\u4e86\u4ece\u8f6f\u4ef6\u6d41\u7a0b\u7684\u9ad8\u5c42\u51b3\u7b56\u5230\u89c6\u9891\u7f16\u7801\u8f6f\u4ef6\u914d\u7f6e\u7684\u4f4e\u5c42\u51b3\u7b56\uff09\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4ec5\u4f7f\u752812\u523030\u4e2a\u6807\u7b7e\u5c31\u80fd\u5efa\u7acb\u8fd9\u6837\u7684\u9884\u6d4b\u6a21\u578b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u4ec5\u51ed\u5c11\u6570\u793a\u4f8b\uff08\u4e0d\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u5c31\u80fd\u53d6\u5f97\u8fd9\u6837\u7684\u6210\u679c\uff0c\u5728\u5f53\u524d\u5c1a\u5c5e\u7f55\u89c1\u3002\u9075\u5faa\u5f00\u653e\u79d1\u5b66\u7684\u539f\u5219\uff0c\u6211\u4eec\u5c06\u5728\u63d0\u4f9b\u6240\u6709\u7684\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4ee5\u4fbf\u4ed6\u4eba\u80fd\u590d\u5236\u3001\u9a8c\u8bc1\u6216\u5728\u6b64\u57fa\u7840\u4e0a\u8fdb\u4e00\u6b65\u6539\u8fdb\u8fd9\u4e9b\u7ed3\u679c\u3002|\n", "2405.12915": "|**2024-05-21**|**G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation**|Xingyuan Pan 
et.al.|[2405.12915](http://arxiv.org/abs/2405.12915)|**[link](https://github.com/xypan0/G-DIG)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u901a\u7528\u573a\u666f\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\uff0c\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\uff0c\u5b83\u4eec\u80fd\u591f\u4e0e\u4eba\u7c7b\u5728\u591a\u79cd\u4efb\u52a1\u4e0a\u534f\u540c\u3002\u7136\u800c\uff0c\u6307\u4ee4\u6570\u636e\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\u662f\u6307\u4ee4\u5fae\u8c03\u9762\u4e34\u7684\u4e24\u5927\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u68af\u5ea6\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u81ea\u52a8\u9009\u62e9\u673a\u5668\u7ffb\u8bd1\u4e2d\u7684\u9ad8\u8d28\u91cf\u548c\u591a\u6837\u5316\u7684\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u3002\u6211\u4eec\u7684\u6838\u5fc3\u521b\u65b0\u5728\u4e8e\u5206\u6790\u5355\u4e2a\u8bad\u7ec3\u6837\u4f8b\u5982\u4f55\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5f71\u54cd\u6a21\u578b\u3002\u901a\u8fc7\u7ed3\u5408\u5f71\u54cd\u529b\u51fd\u6570\u548c\u4e00\u5c0f\u90e8\u5206\u9ad8\u8d28\u91cf\u79cd\u5b50\u6570\u636e\uff0c\u6211\u4eec\u9009\u62e9\u5bf9\u6a21\u578b\u4ea7\u751f\u79ef\u6781\u5f71\u54cd\u7684\u6837\u4f8b\u4f5c\u4e3a\u9ad8\u8d28\u91cf\u6570\u636e\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u589e\u52a0\u6570\u636e\u591a\u6837\u6027\uff0c\u6211\u4eec\u901a\u8fc7\u805a\u7c7b\u5176\u68af\u5ea6\u5e76\u91cd\u91c7\u6837\uff0c\u6700\u5927\u5316\u5b83\u4eec\u5bf9\u6a21\u578b\u4ea7\u751f\u7684\u5f71\u54cd\u591a\u6837\u6027\u3002\u5728WMT22\u548cFLORES\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u4f18\u8d8a\u6027\uff0c\u6df1\u5165\u5206\u6790\u8fdb\u4e00\u6b65\u8bc1\u5b9e\u4e86\u5176\u6548\u679c\u548c\u6cdb\u5316\u80fd\u529b\u3002|\n", "2405.12914": "|**2024-05-21**|**An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation**|Zhiyu Tan 
et.al.|[2405.12914](http://arxiv.org/abs/2405.12914)|**[link](https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion.github.io)**|\u4e00\u4e2a\u5173\u952e\u7684\u5148\u51b3\u6761\u4ef6\u662f\u51c6\u786e\u7406\u89e3\u6587\u672c\u8f93\u5165\uff0c\u8fd9\u5bf9\u4e8e\u5fe0\u5b9e\u7684\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u81f3\u5173\u91cd\u8981\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u5229\u7528CLIP\u6a21\u578b\u7684\u6587\u672c\u7f16\u7801\u5668\u6765\u8868\u793a\u63d0\u793a\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684CLIP\u6a21\u578b\u4ec5\u80fd\u5904\u7406\u82f1\u6587\uff0c\u4e14\u5176\u6587\u672c\u7f16\u7801\u5668\u7684\u6a21\u578b\u5bb9\u91cf\u76f8\u5bf9\u6709\u9650\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u652f\u6301\u591a\u8bed\u8a00\u8f93\u5165\uff0c\u80fd\u591f\u5904\u7406\u66f4\u957f\u7684\u4e0a\u4e0b\u6587\uff0c\u5e76\u63d0\u4f9b\u66f4\u4f18\u79c0\u7684\u6587\u672c\u8868\u793a\u3002\u672c\u6587\u7814\u7a76\u4e86\u4f7f\u7528LLMs\u4f5c\u4e3a\u6587\u672c\u7f16\u7801\u5668\u4ee5\u63d0\u5347\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u4e2d\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u3002\u7136\u800c\uff0c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5305\u542bLLMs\u7684\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u9700\u8981\u5927\u91cf\u7684\u8ba1\u7b97\u8d44\u6e90\u548c\u6570\u636e\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e09\u9636\u6bb5\u8bad\u7ec3\u6d41\u7a0b\uff0c\u6709\u6548\u5730\u6574\u5408\u73b0\u6709\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u4e0eLLMs\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6548\u7684\u8bad\u7ec3\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u9002\u914d\u5668\uff0c\u4f7f\u5f97\u80fd\u591f\u5feb\u901f\u4f7f\u7528LLMs\u751f\u6210\u7684\u6587\u672c\u8868\u793a\u6765\u8bad\u7ec3\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6a21\u578b\u4e0d\u4ec5\u652f\u6301\u591a\u8bed\u8a00\u8f93\u5165\uff0c\u8fd8\u80fd\u5904\u7406\u66f4\u957f\u7684\u4e0a\u4e0b\u6587\uff0c\u800c\u4e14\u5728\u56fe\u50cf\u751f\u6210\u8d28\u91cf\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2405.12910": "|**2024-05-21**|**Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment**|Holli Sargeant et.al.|[2405.12910](http://arxiv.org/abs/2405.12910)|**[link](https://github.com/AhmedIzzidien/TopicLLM)**|**\u8be5\u8bba\u6587\u5173\u6ce8\u6cd5\u5f8b\u5206\u6790\u4e2d\u7684\u4e00\u4e2a\u91cd\u8981\u7a7a\u767d\uff0c\u901a\u8fc7\u6784\u5efa\u548c\u5e94\u7528\u4e00\u79cd\u65b0\u9896\u7684\u5224\u4f8b\u4e3b\u9898\u5206\u7c7b\u6cd5\uff0c\u5bf9\u82f1\u56fd\u7684\u7b80\u6613\u5224\u51b3\u6848\u4ef6\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u5229\u7528\u7cbe\u5fc3\u6311\u9009\u7684\u7b80\u6613\u5224\u51b3\u6848\u4f8b\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578bClaude 3 Opus\u7814\u7a76\u529f\u80fd\u6027\u8bdd\u9898\u548c\u8d8b\u52bf\u3002\u7ed3\u679c\u663e\u793a\uff0cClaude 3 
Opus\u5728\u4e3b\u9898\u5206\u7c7b\u4e0a\u7684\u51c6\u786e\u7387\u4e3a87.10%\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u6cd5\u5f8b\u9886\u57df\u4e2d\u7b80\u6613\u5224\u51b3\u7684\u660e\u663e\u6a21\u5f0f\u3002\u7531\u4e8e\u82f1\u56fd\u7684\u5224\u4f8b\u6cd5\u5e76\u672a\u539f\u59cb\u6807\u6ce8\u5173\u952e\u8bcd\u6216\u63d0\u4f9b\u4e3b\u9898\u8fc7\u6ee4\u9009\u9879\uff0c\u8fd9\u9879\u7814\u7a76\u4e0d\u4ec5\u6df1\u5316\u4e86\u6211\u4eec\u5bf9\u7b80\u6613\u5224\u51b3\u4e3b\u9898\u672c\u8d28\u7684\u7406\u89e3\uff0c\u8fd8\u5c55\u793a\u4e86\u4f20\u7edf\u65b9\u6cd5\u4e0e\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u5206\u7c7b\u65b9\u6cd5\u7ed3\u5408\u7684\u53ef\u80fd\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u63d0\u4f9b\u4e86\u82f1\u56fd\u6cd5\u5f8b\u7684\u65b0\u901a\u7528\u5206\u7c7b\u6846\u67b6\u3002\u8fd9\u9879\u5de5\u4f5c\u7684\u610f\u4e49\u4e3a\u53f8\u6cd5\u884c\u653f\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u8ba1\u7b97\u6cd5\u5b66\u7814\u7a76\u65b9\u6cd5\u8bba\u8ba8\u8bba\u5960\u5b9a\u4e86\u57fa\u7840\u3002**|\n", "2405.12900": "|**2024-05-21**|**Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents**|San Kim 
et.al.|[2405.12900](http://arxiv.org/abs/2405.12900)|null|\u8fd1\u671f\uff0c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5404\u79cd\u6709\u6548\u7684\u8bad\u7ec3\u65b9\u6cd5\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7cfb\u7edf\u7684\u53d1\u5c55\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u6bd2\u6027\u95ee\u9898\u5bf9\u7528\u6237\u4f53\u9a8c\u6784\u6210\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8bad\u7ec3\u7b97\u6cd5\u2014\u2014\u5bf9\u6297\u5f0f\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08ADPO\uff09\uff0c\u5b83\u662f\u5728\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7684\u57fa\u7840\u4e0a\u6539\u8fdb\u7684\u3002ADPO\u65e8\u5728\u8bad\u7ec3\u6a21\u578b\u589e\u52a0\u5bf9\u4f18\u9009\u56de\u590d\u7684\u6982\u7387\u5206\u5e03\uff0c\u540c\u65f6\u964d\u4f4e\u5bf9\u4f7f\u7528\u6709\u6bd2\u63a7\u5236\u4ee4\u724c\u751f\u6210\u7684\u4e0d\u5b89\u5168\u56de\u590d\u7684\u6982\u7387\u3002\u7814\u7a76\u663e\u793a\uff0cADPO\u80fd\u591f\u589e\u5f3a\u6a21\u578b\u62b5\u5fa1\u6709\u5bb3\u5bf9\u8bdd\u7684\u80fd\u529b\uff0c\u540c\u65f6\u5c3d\u91cf\u51cf\u5c11\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660eADPO\u63d0\u4f9b\u4e86\u6bd4\u4f20\u7edfDPO\u66f4\u4e3a\u7a33\u5b9a\u7684\u8bad\u7ec3\u6d41\u7a0b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u6709\u5bb3\u6570\u636e\u76f4\u63a5\u878d\u5165\u751f\u6210\u6a21\u578b\u7684DPO\u53d8\u4f53\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u4eba\u5de5\u521b\u5efa\u5b89\u5168\u5bf9\u8bdd\u6570\u636e\u7684\u9700\u6c42\u3002|\n", "2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai 
et.al.|[2405.14863](http://arxiv.org/abs/2405.14863)|null|\u8de8\u9886\u57df\u5bf9\u9f50\u662f\u6307\u5c06\u4e00\u4e2a\u6982\u5ff5\u4ece\u4e00\u4e2a\u9886\u57df\u6620\u5c04\u5230\u53e6\u4e00\u4e2a\u9886\u57df\u7684\u4efb\u52a1\u3002\u4f8b\u5982\uff0c\u8be2\u95ee\u201c\u5982\u679c\\textit{\u533b\u751f}\u662f\u4e00\u79cd\\textit{\u989c\u8272}\uff0c\u5b83\u4f1a\u662f\u4ec0\u4e48\u989c\u8272\uff1f\u201d\u8fd9\u4e2a\u770b\u4f3c\u5947\u7279\u7684\u8bfe\u9898\u65e8\u5728\u7814\u7a76\u4eba\u4eec\u5982\u4f55\u901a\u8fc7\u7c7b\u522b\u6620\u5c04\u548c\u5bf9\u8fd9\u4e9b\u6620\u5c04\u7684\u63a8\u7406\u6765\u8868\u5f81\u5177\u4f53\u548c\u62bd\u8c61\u7684\u6982\u5ff5\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u501f\u9274\u8ba4\u77e5\u79d1\u5b66\u4e2d\u7684\u8fd9\u4e00\u4efb\u52a1\uff0c\u901a\u8fc7\u884c\u4e3a\u7814\u7a76\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6982\u5ff5\u5316\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u901a\u8fc7\u63d0\u793aLLMs\u6267\u884c\u8de8\u57df\u6620\u5c04\u4efb\u52a1\uff0c\u5e76\u5728\u7fa4\u4f53\u548c\u4e2a\u4f53\u5c42\u9762\u5206\u6790\u5b83\u4eec\u7684\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u6a21\u578b\u5bf9\u5176\u9884\u6d4b\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\uff0c\u901a\u8fc7\u5206\u6790\u548c\u5206\u7c7b\u5b83\u4eec\u5bf9\u8fd9\u4e9b\u6620\u5c04\u7684\u89e3\u91ca\u3002\u7ed3\u679c\u663e\u793a\uff0c\u4eba\u7c7b\u548c\u6a21\u578b\u7684\u6620\u5c04\u4ee5\u53ca\u89e3\u91ca\u5b58\u5728\u663e\u8457\u76f8\u4f3c\u6027\uff0c\u8868\u660e\u6a21\u578b\u4ee5\u4e0e\u4eba\u7c7b\u7c7b\u4f3c\u7684\u65b9\u5f0f\u8868\u5f81\u6982\u5ff5\u3002\u8fd9\u79cd\u76f8\u4f3c\u6027\u4e0d\u4ec5\u4f53\u73b0\u5728\u6a21\u578b\u7684\u8868\u793a\u4e0a\uff0c\u4e5f\u4f53\u73b0\u5728\u5b83\u4eec\u7684\u884c\u4e3a\u4e2d\u3002\u800c\u4e14\uff0c\u6a21\u578b\u5927\u591a\u7ed9\u51fa\u6709\u6548\u7684\u89e3\u91ca\uff0c\u5e76\u91c7\u7528\u4e0e\u4eba\u7c7b\u7c7b\u4f3c\u7684\u63a8
\u7406\u8def\u5f84\u3002|\n", "2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862](http://arxiv.org/abs/2405.14862)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aBitune\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u63d0\u5347\u4e86\u9884\u8bad\u7ec3\u7684\u89e3\u7801\u5668\u578b\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6307\u4ee4\u8c03\u4f18\u65b9\u9762\u7684\u6027\u80fd\uff0c\u4ece\u800c\u5728\u591a\u4e2a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u63d0\u5347\u3002Bitune\u901a\u8fc7\u540c\u65f6\u5e94\u7528\u81ea\u56de\u5f52\u548c\u53cc\u5411\u6ce8\u610f\u529b\u5230\u63d0\u793a\u4e0a\uff0c\u4ee5\u83b7\u53d6\u66f4\u7cbe\u786e\u7684\u67e5\u8be2\u6216\u6307\u4ee4\u8868\u793a\u3002\u6211\u4eec\u4e3a\u6b64\u5f15\u5165\u4e86\u4e24\u7ec4\u53c2\u6570\uff0c\u5e76\u91c7\u7528\u4e86\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u6280\u672f\u6765\u5904\u7406\u3002\u8fd9\u4e24\u79cd\u7279\u5f81\u968f\u540e\u88ab\u7ec4\u5408\u6210\u4e00\u4e2a\u52a0\u6743\u5e73\u5747\uff0c\u5176\u4e2d\u6743\u91cd\u7531\u53ef\u8bad\u7ec3\u7cfb\u6570\u51b3\u5b9a\uff0c\u7528\u4e8e\u751f\u6210\u65b0\u7684\u4ee4\u724c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cBitune\u5728\u96f6\u6837\u672c\u8bbe\u7f6e\u4e0b\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u7b97\u672f\u548c\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u5927\u91cf\u7684\u6d88\u878d\u7814\u7a76\u9a8c\u8bc1\u4e86\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u5e76\u663e\u793a\u4e86\u8be5\u65b9\u6cd5\u5bf9\u4e0d\u540cPEFT\u6280\u672f\u7684\u9c81\u68d2\u6027\u3002|\n", "2405.14852": "|**2024-05-23**|**PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression**|Vladimir Malinovskii et.al.|[2405.14852](http://arxiv.org/abs/2405.14852)|**[link](https://github.com/vahe1994/aqlm)**|## \u80cc\u666f 
\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u201c\u6781\u7aef\u201d\u538b\u7f29\uff0c\u5373\u5c06\u5176\u53c2\u6570\u538b\u7f29\u81f31-2\u4f4d\u6bcf\u53c2\u6570\uff0c\u4ee5\u9002\u5e94\u8d44\u6e90\u53d7\u9650\u8bbe\u5907\u4e0a\u7684\u9ad8\u6548\u6267\u884c\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6539\u8fdb\u4e00\u6b21\u6027\u91cf\u5316\u6280\u672f\u548c\u6743\u91cd\u8868\u793a\u4e0a\uff1b\u7136\u800c\uff0c\u7eaf\u540e\u8bad\u7ec3\u65b9\u6cd5\u5728\u7cbe\u5ea6\u4e0e\u4f4d\u5bbd\u6743\u8861\u65b9\u9762\u7684\u6536\u76ca\u6b63\u5728\u51cf\u5c11\u3002\u5f53\u524d\u6700\u5148\u8fdb\u7684\u91cf\u5316\u65b9\u6cd5\uff0c\u5982QuIP#\u548cAQLM\uff0c\u5305\u542b\u5bf9\u90e8\u5206\u538b\u7f29\u53c2\u6570\u7684\u5c0f\u89c4\u6a21\u6821\u51c6\u6570\u636e\u5fae\u8c03\uff1b\u7136\u800c\uff0c\u8fd9\u4e9b\u9488\u5bf9\u538b\u7f29\u6743\u91cd\u7684\u5fae\u8c03\u901a\u5e38\u4ec5\u4f7f\u7528\u76f4\u901a\u4f30\u8ba1\u5668\uff08STE\uff09\uff0cSTE\u5728\u8fd9\u79cd\u573a\u666f\u4e0b\u7684\u6027\u80fd\u5c1a\u4e0d\u660e\u786e\u3002 \u672c\u5de5\u4f5c\u8d28\u7591\u5728\u6781\u7aefLLM\u538b\u7f29\u4e2d\u4f7f\u7528STE\u7684\u6709\u6548\u6027\uff0c\u5e76\u7cfb\u7edf\u5730\u7814\u7a76\u4e86\u91cf\u5316\u611f\u77e5\u5fae\u8c03\u7b56\u7565\u3002\u6211\u4eec\u63d0\u51faPV-Tuning\uff0c\u4e00\u4e2a\u65e0\u7279\u5b9a\u67b6\u6784\u9650\u5236\u7684\u6846\u67b6\uff0c\u5b83\u6269\u5c55\u5e76\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u5e76\u5728\u67d0\u4e9b\u53d7\u9650\u60c5\u51b5\u4e0b\u63d0\u4f9b\u6536\u655b\u4fdd\u8bc1\u3002\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u5f53\u7528\u4e8e1-2\u4f4d\u77e2\u91cf\u91cf\u5316\u65f6\uff0cPV-Tuning\u5728\u9ad8\u6027\u80fd\u6a21\u578b\u5982Llama\u548cMistral\u4e0a\u4f18\u4e8e\u5148\u524d\u7684\u6280\u672f\u3002\u901a\u8fc7\u4f7f\u7528PV-Tuning\uff0c\u6211\u4eec\u57282\u4f4d\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\u9996\u6b21\u5b9e\u73b0\u4e86Llama 
2\u5bb6\u65cf\u6a21\u578b\u7684\u5e15\u7d2f\u6258\u6700\u4f18\u91cf\u5316\u3002|\n", "2405.14831": "|**2024-05-23**|**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models**|Bernal Jim\u00e9nez Guti\u00e9rrez et.al.|[2405.14831](http://arxiv.org/abs/2405.14831)|**[link](https://github.com/osu-nlp-group/hipporag)**|\u4e3a\u4e86\u5728\u6076\u52a3\u591a\u53d8\u7684\u81ea\u7136\u73af\u5883\u4e2d\u751f\u5b58\uff0c\u54fa\u4e73\u52a8\u7269\u7684\u5927\u8111\u53d1\u5c55\u51fa\u5b58\u50a8\u5927\u91cf\u4e16\u754c\u77e5\u8bc6\u5e76\u4e0d\u65ad\u6574\u5408\u65b0\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u540c\u65f6\u907f\u514d\u707e\u96be\u6027\u9057\u5fd8\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u5e26\u6709\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u65b9\u6cd5\u5728\u5904\u7406\u6b64\u7c7b\u4efb\u52a1\u4e0a\u5df2\u53d6\u5f97\u663e\u8457\u6210\u5c31\uff0c\u4f46\u5b83\u4eec\u5728\u5927\u89c4\u6a21\u65b0\u7ecf\u9a8c\u878d\u5408\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51faHippoRAG\uff0c\u4e00\u4e2a\u53d7\u4eba\u7c7b\u957f\u671f\u8bb0\u5fc6\u6d77\u9a6c\u56de\u7d22\u5f15\u7406\u8bba\u542f\u53d1\u7684\u65b0\u578b\u68c0\u7d22\u6846\u67b6\uff0c\u65e8\u5728\u4fc3\u8fdb\u5bf9\u65b0\u7ecf\u9a8c\u7684\u66f4\u6df1\u3001\u66f4\u6709\u6548\u96c6\u6210\u3002HippoRAG\u5de7\u5999\u5730\u534f\u540cLLMs\u3001\u77e5\u8bc6\u56fe\u8c31\u4ee5\u53ca\u4e2a\u6027\u5316PageRank\u7b97\u6cd5\uff0c\u6a21\u62df\u4eba\u8111\u76ae\u5c42\u548c\u6d77\u9a6c\u4f53\u5728\u8bb0\u5fc6\u4e2d\u7684\u4e0d\u540c\u4f5c\u7528\u3002 
\u6211\u4eec\u5c06HippoRAG\u4e0e\u73b0\u6709RAG\u65b9\u6cd5\u5728\u591a\u8f6e\u95ee\u7b54\u4efb\u52a1\u4e2d\u8fdb\u884c\u6bd4\u8f83\uff0c\u7ed3\u679c\u663e\u793aHippoRAG\u663e\u8457\u4f18\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe20%\u3002\u5355\u6b65\u68c0\u7d22\u65f6\uff0cHippoRAG\u8868\u73b0\u51fa\u4e0e\u8fed\u4ee3\u68c0\u7d22\u65b9\u6cd5\u5982IRCoT\u76f8\u5f53\u6216\u66f4\u597d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u6210\u672c\u8282\u770110-30\u500d\uff0c\u901f\u5ea6\u63d0\u53476-13\u500d\u3002\u5f53\u5c06HippoRAG\u878d\u5165IRCoT\u540e\uff0c\u8fd8\u80fd\u5e26\u6765\u989d\u5916\u7684\u663e\u8457\u589e\u76ca\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793aHippoRAG\u80fd\u591f\u5e94\u5bf9\u73b0\u6709\u65b9\u6cd5\u96be\u4ee5\u89e6\u53ca\u7684\u65b0\u573a\u666f\u3002\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728\u4e0a\u5f00\u6e90\u3002|\n", "2405.14804": "|**2024-05-23**|**Can LLMs Solve longer Math Word Problems Better?**|Xin Xu et.al.|[2405.14804](http://arxiv.org/abs/2405.14804)|null|\u6570\u5b66\u5e94\u7528\u9898\uff08MWPs\uff09\u662f\u8861\u91cf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u529b\u7684\u5173\u952e\uff0c\u4f46\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u7b80\u77ed\u80cc\u666f\u7684\u9898\u76ee\u4e0a\u3002\u7136\u800c\uff0c\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u6570\u5b66\u95ee\u9898\u5f80\u5f80\u6d89\u53ca\u590d\u6742\u60c5\u5883\uff0c\u56e0\u6b64LLMs\u89e3\u51b3\u957f\u7bc7\u6570\u5b66\u5e94\u7528\u9898\u7684\u80fd\u529b\u5bf9\u4e8e\u5176\u5728\u5b9e\u9645\u573a\u666f\u7684\u5e94\u7528\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u8fd9\u4e00\u65b9\u9762\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u7814\u7a76\u9996\u6b21\u5173\u6ce8Context Length Generalizability\uff08CoLeG\uff09\uff0c\u5373LLMs\u5904\u7406\u957f\u7bc7\u6570\u5b66\u5e94\u7528\u9898\u7684\u80fd\u529b\u3002\u6211\u4eec\u521b\u5efa\u4e86Extended Grade-School 
Math\uff08E-GSM\uff09\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u5e26\u6709\u8be6\u7ec6\u53d9\u8ff0\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u65b0\u6307\u6807\u6765\u8bc4\u4f30LLMs\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e0a\u7684\u6548\u80fd\u548c\u9c81\u68d2\u6027\u3002 \u901a\u8fc7\u5bf9\u73b0\u6709\u96f6\u6837\u672c\u63d0\u793a\u65b9\u6cd5\u4ee5\u53ca\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u7684\u8003\u5bdf\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u4eec\u5728CoLeG\u65b9\u9762\u666e\u904d\u5b58\u5728\u4e0d\u8db3\u3002\u9488\u5bf9\u4e0d\u540c\u7c7b\u578b\u7684LLMs\uff0c\u6211\u4eec\u63d0\u51fa\u9488\u5bf9\u6027\u7684\u89e3\u51b3\u65b9\u6848\uff1a\u5bf9\u4e8e\u4e13\u6709\u6a21\u578b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u65b0\u7684\u6307\u5bfc\u6027\u63d0\u793a\u4ee5\u51cf\u8f7b\u957f\u6587\u672c\u7684\u5f71\u54cd\uff1b\u5bf9\u4e8e\u5f00\u6e90\u6a21\u578b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u6570\u636e\u589e\u5f3a\u4efb\u52a1\u4ee5\u63d0\u5347\u6a21\u578b\u7684\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u5728E-GSM\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u4e14\u5728\u5176\u4ed6\u591a\u4e2a\u6570\u5b66\u5e94\u7528\u9898\u57fa\u51c6\u4e0a\u4e5f\u5c55\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\u3002 \u672c\u7814\u7a76\u7684\u7ed3\u679c\u4e3a\u672a\u6765\u5229\u7528LLMs\u5904\u7406\u590d\u6742\u73b0\u5b9e\u95ee\u9898\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b9\u5411\uff0c\u4e3a\u5f53\u524d\u9650\u5236\u63d0\u51fa\u4e86\u5b9e\u7528\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u63a2\u7d22\u6a21\u578b\u6cdb\u5316\u6027\u548c\u8bad\u7ec3\u7b56\u7565\u5f00\u8f9f\u4e86\u9053\u8def\u3002|\n", "2405.14782": "|**2024-05-23**|**Lessons from the Trenches on Reproducible Evaluation of Language Models**|Stella Biderman 
et.al.|[2405.14782](http://arxiv.org/abs/2405.14782)|null|\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\uff0c\u6709\u6548\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u4ecd\u7136\u662f\u4e00\u9879\u672a\u89e3\u7684\u6311\u6218\u3002\u7814\u7a76\u4eba\u5458\u548c\u5de5\u7a0b\u5e08\u9762\u4e34\u8bf8\u591a\u65b9\u6cd5\u8bba\u96be\u9898\uff0c\u4f8b\u5982\u6a21\u578b\u5bf9\u8bc4\u4f30\u8bbe\u7f6e\u7684\u654f\u611f\u6027\u3001\u4e0d\u540c\u65b9\u6cd5\u4e4b\u95f4\u7684\u6bd4\u8f83\u56f0\u96be\uff0c\u4ee5\u53ca\u53ef\u91cd\u590d\u6027\u548c\u900f\u660e\u5ea6\u7684\u7f3a\u5931\u3002\u672c\u6587\u57fa\u4e8e\u4e09\u5e74\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u7ecf\u9a8c\uff0c\u4e3a\u7814\u7a76\u8005\u63d0\u4f9b\u6307\u5bfc\u548c\u6559\u8bad\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u4e2d\u5e38\u89c1\u7684\u95ee\u9898\u3002\u5176\u6b21\uff0c\u6211\u4eec\u9610\u8ff0\u4e86\u5e94\u5bf9\u6216\u51cf\u8f7b\u8fd9\u4e9b\u95ee\u9898\u7684\u6700\u4f73\u5b9e\u8df5\u3002\u7b2c\u4e09\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Language Model Evaluation Harness\uff08lm-eval\uff09\uff1a\u4e00\u4e2a\u5f00\u6e90\u5e93\uff0c\u65e8\u5728\u72ec\u7acb\u3001\u53ef\u91cd\u590d\u548c\u6269\u5c55\u5730\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u6211\u4eec\u5c06\u4ecb\u7ecd\u5e93\u7684\u529f\u80fd\uff0c\u5e76\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u5982\u4f55\u4f7f\u7528\u8be5\u5e93\u6765\u7f13\u89e3\u8fd9\u4e9b\u65b9\u6cd5\u8bba\u5173\u6ce8\u70b9\u3002|\n", "2405.14768": "|**2024-05-23**|**WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models**|Peng Wang 
et.al.|[2405.14768](http://arxiv.org/abs/2405.14768)|**[link](https://github.com/zjunlp/easyedit)**|**\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\uff0c\u968f\u7740\u4e16\u754c\u4e8b\u5b9e\u7684\u4e0d\u65ad\u589e\u957f\u548c\u7ea0\u6b63\u9519\u8bef\u54cd\u5e94\u7684\u9700\u6c42\uff0c\u6a21\u578b\u7f16\u8f91\u7684\u65b9\u6cd5\u9700\u8981\u4e0d\u65ad\u66f4\u65b0\u77e5\u8bc6\u3002\u8bba\u6587\u7684\u6838\u5fc3\u95ee\u9898\u662f\uff1a\u5728\u7f16\u8f91\u8fc7\u7a0b\u4e2d\uff0c\u77e5\u8bc6\u5e94\u5b58\u50a8\u5728\u6a21\u578b\u7684\u54ea\u4e2a\u8bb0\u5fc6\u5c42\u6b21\u66f4\u4e3a\u5408\u9002\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u76f4\u63a5\u4fee\u6539\u957f\u671f\u8bb0\u5fc6\uff08\u6a21\u578b\u53c2\u6570\uff09\u6216\u5229\u7528\u5de5\u4f5c\u8bb0\u5fc6\uff08\u901a\u8fc7\u68c0\u7d22\u7684\u795e\u7ecf\u7f51\u7edc\u6fc0\u6d3b\uff09\u90fd\u4f1a\u5bfc\u81f4\u4e0d\u53ef\u903e\u8d8a\u7684\u4e09\u89d2\u56f0\u5883\u2014\u2014\u53ef\u9760\u6027\u3001\u6cdb\u5316\u80fd\u529b\u548c\u5c40\u90e8\u6027\u65e0\u6cd5\u540c\u65f6\u5b9e\u73b0\u4e8e\u7ec8\u8eab\u7f16\u8f91\u573a\u666f\u4e2d\u3002\u76f4\u63a5\u4fee\u6539\u53c2\u6570\u4f1a\u4e0e\u65e0\u5173\u7684\u9884\u8bad\u7ec3\u77e5\u8bc6\u6216\u5148\u524d\u7f16\u8f91\u4ea7\u751f\u51b2\u7a81\uff08\u53ef\u9760\u6027\u5dee\u3001\u5c40\u90e8\u6027\u4e0d\u8db3\uff09\uff1b\u800c\u57fa\u4e8e\u68c0\u7d22\u7684\u5de5\u4f5c\u8bb0\u5fc6\u96be\u4ee5\u4f7f\u6a21\u578b\u7406\u89e3\u5e76\u6cdb\u5316\u7f16\u8f91\uff08\u6cdb\u5316\u80fd\u529b\u5f31\uff09\u3002\u56e0\u6b64\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aWISE\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u5f25\u5408\u8bb0\u5fc6\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002 
WISE uses a dual parametric memory scheme. A main memory stores the pretrained knowledge, and a side memory stores the edited knowledge. Only the side memory is edited, and a router is trained to decide which memory to draw on for a given query. For continual editing, a knowledge-sharding mechanism places different edits in distinct parameter subspaces, which are then merged into a shared memory without conflicts. Experiments show that WISE outperforms previous model-editing methods and overcomes the impossible triangle described above. The evaluation covers lifelong editing for question answering and hallucination across popular LLM architectures such as GPT, LLaMA, and Mistral. The code will be released at https://github.com/zjunlp/EasyEdit.**|\n", "2405.14767": "|**2024-05-23**|**FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models**|Hongyang Yang 
et.al.|[2405.14767](http://arxiv.org/abs/2405.14767)|**[link](https://github.com/ai4finance-foundation/finrobot)**|**Financial institutions and professionals are increasingly integrating large language models (LLMs) into their workflows. Even so, significant barriers such as proprietary data and specialized knowledge persist between the financial industry and the AI community. These challenges limit AI's potential to make financial tasks more efficient. Given the importance of financial analysis, we aim to develop LLM-driven toolchains tailored to finance. We also aim to promote their adoption through open-source projects, so that AI is used more widely in financial decision-making. This paper introduces FinRobot, a novel open-source AI agent platform that supports multiple financially specialized AI agents, each powered by an LLM. The platform has four main layers. 1) The Financial AI Agents layer builds a Financial Chain-of-Thought (CoT) that decomposes complex financial problems into logical sequences. 2) The Financial LLM Algorithms layer dynamically configures an appropriate model-application strategy for each specific task. 3) The LLMOps and DataOps layer produces accurate models through training and fine-tuning techniques, using task-relevant data. 4) The Multi-source LLM Foundation Models layer integrates various LLMs and makes them directly accessible to the layers above. FinRobot gives both professional analysts and non-specialists hands-on access to powerful AI techniques for advanced financial analysis. FinRobot's open-source code is available at \\url{https://github.com/AI4Finance-Foundation/FinRobot}.**|\n", "2405.14766": "|**2024-05-23**|**Evaluating Large Language Models for Public Health Classification and Extraction Tasks**|Joshua Harris et.al.|[2405.14766](http://arxiv.org/abs/2405.14766)|null|Large language models (LLMs) are developing rapidly, and there is strong interest in their potential to support experts in public health work. This study evaluates LLMs on text classification and extraction tasks related to health burden, epidemiological risk factors, and public health interventions. The evaluation uses six externally annotated and seven internally annotated datasets. We first test five open-source LLMs, ranging from 7 billion to 70 billion parameters, with zero-shot in-context learning. Llama-3-70B-Instruct performs best, achieving the highest micro-F1 score on 15 of the 17 tasks. Performance varies considerably across tasks. Scores fall below 60% micro-F1 on tasks such as Contact Classification, whereas all models exceed 80% micro-F1 on tasks such as GI illness classification. On a subset of 12 tasks we also evaluate GPT-4 and find its results comparable to those of Llama-3-70B-Instruct, which scores equal to or higher than GPT-4 on 6 of the 12 tasks. Overall, these initial results suggest that LLMs could be useful tools for public health experts. They could help extract information from a wide variety of free-text sources to support public health surveillance, research, and interventions.|\n", "2405.14755": "|**2024-05-23**|**Large language models can be zero-shot anomaly detectors for time series?**|Sarah Alnegheimish 
et.al.|[2405.14755](http://arxiv.org/abs/2405.14755)|null|Recent studies show that large language models can perform a variety of tasks, including time series forecasting, and their flexibility makes them suitable for many applications. This paper presents a novel study of large language models on the challenging task of time series anomaly detection. For a language model, this task involves identifying anomalies in the input sequence, or in several parts of it. It also requires working with time series data rather than conventional text input. We introduce sigllm, a framework for time series anomaly detection with large language models. The framework includes a module that converts time series into text, as well as end-to-end pipelines that prompt language models to perform anomaly detection. We test two approaches for probing the capabilities of large language models. The first directly prompts the model to indicate which elements of the input are anomalous. The second uses the model's forecasting ability to guide the detection process. 
We evaluate the framework on 11 datasets from a variety of sources using 10 different pipelines. The forecasting method significantly outperforms the prompting method on all 11 datasets in terms of F1 score. Large language models are able to find anomalies, but state-of-the-art deep learning models still perform better, outperforming large language models by 30%.|\n", "2405.15765": "|**2024-05-24**|**Scaling Laws for Discriminative Classification in Large Language Models**|Dean Wyatte et.al.|[2405.15765](http://arxiv.org/abs/2405.15765)|null|## Background Modern large language models (LLMs) represent a major leap in the capabilities of machine learning models. They can generate plausible answers to a wide range of queries, which suggests potential for customer service applications. However, LLMs have been observed to hallucinate, and this limits their use in customer service in the near term. To address this, we propose a system that reframes the language modeling task as a classification task, helping customer service representatives choose the best template response. Our goal is to give representatives the top-K most suitable candidate responses. ## Task Description We present results from offline and online experiments that demonstrate the effectiveness of the experimental system. The offline experiments show improvements, and the online experiments yield statistically significant gains. We also share validation loss and top-K accuracy curves as the number of model parameters is scaled. Finally, we discuss the trade-offs among model size, latency, and accuracy, and outline possible future applications.|\n", "2405.15739": "|**2024-05-24**|**Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias**|Andres Algaba et.al.|[2405.15739](http://arxiv.org/abs/2405.15739)|**[link](https://github.com/andresalgaba/llm_citation_patterns)**|
Citation practices are crucial to shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of large language models such as GPT-4 introduces a new dynamic to these practices. This study is the first to examine the characteristics and potential biases of references recommended by an LLM that relies entirely on its parametric knowledge, rather than on search or retrieval-augmented generation. The experiments use a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR. All of these papers were published after GPT-4's knowledge cut-off date, and together they contain 3,066 references. GPT-4 is asked to suggest scholarly references for the citations in anonymized in-text passages. The results reveal a remarkable similarity between human and LLM citation patterns. However, GPT-4 shows a more pronounced bias toward highly cited papers, which persists even after controlling for publication year, title length, number of authors, and venue. We also find that the characteristics of the existing and the non-existent references generated by GPT-4 are highly consistent, which indicates that the model has internalized citation patterns. An analysis of the citation graph shows that the references GPT-4 recommends are embedded in the relevant citation network, suggesting a deeper conceptual understanding. LLMs can assist with citation generation, but they may also amplify existing biases and introduce new ones, potentially skewing the dissemination of scientific knowledge. Our results underscore the need to identify model biases and to develop balanced methods for interacting with LLMs.|\n", "2405.15734": "|**2024-05-24**|**LM4LV: A Frozen Large Language Model for Low-level Vision Tasks**|Boyang Zheng et.al.|[2405.15734](http://arxiv.org/abs/2405.15734)|**[link](https://github.com/bytetriper/lm4lv)**|The success of large language models (LLMs) has sparked a wave of research on multimodal large language models (MLLMs), which are changing several research paradigms in computer vision. MLLMs perform well on high-level vision and vision-and-language tasks such as visual question answering (VQA) and text-to-image generation. However, no study has yet explored how low-level vision tasks can benefit from them. We find that the design of most current MLLMs makes them blind to low-level features, which inherently limits their ability to solve low-level vision tasks. To address this, we propose $\\textbf{LM4LV}$, a framework that enables a frozen LLM to solve a range of low-level vision tasks without any multimodal data or priors. This demonstrates the strong potential of LLMs in low-level vision and bridges the gap between MLLMs and low-level vision tasks. We hope this work inspires new perspectives on LLMs and a deeper understanding of how they work.|\n", "2405.15729": "|**2024-05-24**|**Optimizing Large Language Models for OpenAPI Code Completion**|Bohdan Petryshyn et.al.|[2405.15729](http://arxiv.org/abs/2405.15729)|**[link](https://github.com/BohdanPetryshyn/openapi-completion-benchmark)**|Recent advances in large language models (LLMs) for code generation have transformed software development. Code completion solutions perform well for mainstream programming languages, but they underperform on less common formats such as OpenAPI definitions. This study evaluates GitHub Copilot, a popular commercial code completion tool, on OpenAPI completion. It also proposes a set of task-specific optimizations for Meta's open-source Code Llama model. A semantics-aware OpenAPI completion benchmark was designed for this research and used in experiments that analyze how various prompt-engineering and fine-tuning techniques affect Code Llama's performance. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot. It does so with 25 times fewer parameters than the Codex model underlying the commercial solution. The study also improves a widely used code infilling training technique, which fixes the drop in performance when the model receives prompts shorter than the context length used during training.|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684](http://arxiv.org/abs/2405.15684)|null|To bridge the gap between the vision and language modalities, Multimodal Large Language Models (MLLMs) typically learn an adapter that converts visual inputs into tokens that large language models (LLMs) can understand. However, most adapters generate largely fixed visual tokens that ignore the specific objects mentioned in the prompt. These adapters give equal attention to every detail of the image and tend to process the entire scene. This may increase the cognitive load on the LLM, particularly when the scene is complex. To address this, we propose prompt-aware adapters, which dynamically embed visual inputs according to the specific focus of the prompt. Specifically, prompt-aware adapters use both global and local textual features to capture the visual cues most relevant to the prompt at both coarse and fine granularity. This significantly improves the LLM's ability to understand and interpret visual content. Experiments on various visual question answering tasks, such as counting and position reasoning, demonstrate the effectiveness of prompt-aware adapters.|\n", "2405.15668": "|**2024-05-24**|**What Do You See? 
Enhancing Zero-Shot Image Classification with Multimodal Large Language Models**|Abdelrahman Abdelhamed et.al.|[2405.15668](http://arxiv.org/abs/2405.15668)|null|This paper explores how large language models (LLMs) can be used for zero-shot image classification. The authors propose a simple yet effective approach that applies multimodal LLMs to the input image to generate comprehensive textual representations. These representations are converted into fixed-dimensional features in a cross-modal embedding space and then fused for zero-shot classification. No elaborate prompts need to be designed for each dataset, because a single general prompting strategy is used instead of per-dataset tuning. Experiments show that the method performs well across multiple datasets and is more accurate than prior methods. Averaged over ten benchmarks, it improves accuracy by 4.1 percentage points over conventional methods, with a gain of 6.8 percentage points on ImageNet. This suggests that multimodal LLMs could substantially enhance computer vision tasks such as zero-shot image classification.|\n", "2405.15662": "|**2024-05-24**|**Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning**|Wenhan 
Chang et.al.|[2405.15662](http://arxiv.org/abs/2405.15662)|null|In the era of artificial intelligence, users may ask AI companies to delete their information from training datasets because of privacy concerns. For a model owner, retraining the model consumes substantial computational resources. Machine unlearning emerged to address this: it removes requested training data or classes while minimizing the impact on model performance. However, for large-scale complex data such as images or text, unlearning a class from a model can degrade performance, because the association between the class and the model is hard to identify. To address this, we propose representing the semantic information of the class to be unlearned with concepts, rather than with image features or text tokens. This helps sever the link between the model and the class and fully removes the class's influence. 
To analyze the influence of concepts in complex data, we use a post-hoc concept bottleneck model and integrated gradients to precisely identify the concepts that characterize each class. We then build the unlearning method on a data-poisoning strategy that uses random and targeted labels. We test the method on both image classification models and large language models (LLMs). The results consistently show that it accurately erases the targeted information from the model while largely preserving model performance.|\n", "2405.15652": "|**2024-05-24**|**$$\\mathbf{L^2\\cdot M = C^2}$$ Large Language Models as Covert Channels... 
a Systematic Analysis**|Simen Gaure et.al.|[2405.15652](http://arxiv.org/abs/2405.15652)|null|In recent years, large language models (LLMs) have attracted wide attention for their strong performance on tasks such as translation, prediction, and content generation. The research community has also found that LLMs are vulnerable to attacks but can also strengthen system security. How well, then, can open-source LLMs serve as a covert communication medium, for example to support censorship-resistant communication? This paper takes an experimental approach. It empirically measures the security and capacity of an open-source LLM (Llama-7B) to assess how effective it is as a covert channel. The results show that a channel based on such a model is unlikely to achieve high practical bitrates, which depend on message length and model entropy. We also find, however, that the probability of an adversary detecting the covert communication is low. To make the results easy to use as a general reference, we employ a simple, intuitive scheme and assume only that the model is publicly available.|\n", "2405.15646": "|**2024-05-24**|**LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots**|Ruoyu Wang 
et.al.|[2405.15646](http://arxiv.org/abs/2405.15646)|null|General-purpose service robots for everyday life must be able to perform a variety of basic behaviors appropriately. Recent advances in training large language models (LLMs) make it possible to generate task sequences directly from natural language instructions without additional domain knowledge. However, even when an LLM's output is semantically correct, the generated task plan may not correspond precisely to acceptable actions and may contain various linguistic ambiguities. LLM hallucination also poses a challenge for robot task planning, because it can produce content that is inconsistent with real-world facts or user input. To address these problems, we propose a task-planning method based on constrained LLM prompts that generates executable action sequences from commands. We also design an exception-handling module that deals with LLM hallucinations and ensures that the generated results are admissible in the current environment. We evaluate the method on commands produced by the RoboCup@Home command generator, and the results show that the robot performs well at understanding and executing tasks.|\n", 
"2405.15640": "|**2024-05-24**|**GECKO: Generative Language Model for English, Code and Korean**|Sungwoo Oh et.al.|[2405.15640](http://arxiv.org/abs/2405.15640)|null|We introduce GECKO, a bilingual large language model (LLM) optimized for Korean and English, along with programming languages. It is based on the LLaMA architecture and pretrained on a balanced, high-quality corpus of Korean and English data. This report describes our work on building the data pipeline and training the model. Despite its small vocabulary, GECKO generates Korean and English tokens efficiently. We evaluated it on representative benchmarks. It performs strongly on KMMLU (Korean MMLU) and shows modest ability in English and code, even though it was trained on fewer tokens than English-focused LLMs. GECKO is available to the open-source community under a permissive license. We hope it provides a research baseline and practical insights for Korean LLM research. The model is available at https://huggingface.co/kifai/GECKO-7B.|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430](http://arxiv.org/abs/2405.17430)|null|## Background 
Large multimodal models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed an image into a large, fixed number of visual tokens and then feed those tokens into a large language model (LLM). However, this design produces an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, which makes these models very inefficient. Token pruning and merging methods exist, but they produce a single-length output for each image and cannot flexibly trade off information density against efficiency. Inspired by Matryoshka dolls, we propose M3: Matryoshka Multimodal Models. M3 learns to represent visual content as nested sets of visual tokens that capture information at multiple granularities, from coarse to fine. ## Task Our approach offers several unique benefits for LMMs: (1) the visual granularity can be explicitly controlled for each test instance, for example by adjusting the number of tokens used to represent an image according to how complex or simple its content is; (2) 
M3 provides a framework for analyzing the granularity that existing datasets require, and we find that benchmarks such as COCO need only about 9 visual tokens to reach accuracy similar to that of using all 576 tokens; (3) our approach provides a foundation for exploring the best trade-off between performance and visual token length, and our study reveals a large gap between current fixed-scale representations and the ideal upper bound.|\n", "2405.17428": "|**2024-05-27**|**NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models**|Chankyu Lee et.al.|[2405.17428](http://arxiv.org/abs/2405.17428)|null|This paper introduces NV-Embed, a model designed to improve the performance of decoder-based large language models on text embedding tasks, including dense vector retrieval. NV-Embed uses a range of architectural designs and training procedures to significantly improve the model's versatility and performance while keeping it simple and reproducible. On the architecture side, we introduce a latent attention layer to obtain pooled embeddings. This layer consistently improves retrieval and downstream task accuracy compared with mean pooling or with using the LLM's last-token embedding. To improve representation learning, we remove the LLM's causal attention mask during contrastive training, which allows richer information interaction. On the training side, we use a two-stage contrastive instruction-tuning method. The first stage applies contrastive training with instructions on retrieval datasets, using in-batch negatives and curated hard negative examples. The second stage blends data from various non-retrieval tasks into instruction tuning. This improves accuracy on non-retrieval tasks and also improves retrieval performance. With these techniques, and using only publicly available data, NV-Embed achieves a record-high score of 69.32. This score ranks first on the Massive Text Embedding Benchmark (MTEB) as of May 24, 2024. MTEB covers 56 tasks, including retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, the model also achieves the highest score, 59.36, on the 15 retrieval tasks in BEIR. The NV-Embed model will be open-sourced at https://huggingface.co/nvidia/NV-Embed-v1.|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang 
et.al.|[2405.17427](http://arxiv.org/abs/2405.17427)|**[link](https://github.com/kuanchihhuang/reason3d)**|Recent advances in multimodal large language models (LLMs) have shown great potential in areas such as concept reasoning, but their application to understanding 3D environments remains limited. This paper presents Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, enabling advanced tasks such as 3D reasoning segmentation, hierarchical searching, referring expressions, and question answering with detailed mask outputs. In particular, we design a hierarchical mask decoder that can precisely locate small objects in expansive scenes. The decoder first produces a coarse location estimate covering the object's general area and then applies an iterative refinement strategy, which significantly improves the precision of object identification and segmentation. Experiments show that Reason3D achieves remarkable results on large-scale datasets such as ScanNet and Matterport3D across 3D referring expression, 3D question answering, and 3D reasoning segmentation tasks. Code and models are available at https://github.com/KuanchihHuang/Reason3D.|\n", 
"2405.17424": "|**2024-05-27**|**LARM: Large 
Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because embodied agents must interact with the real world, they need comprehensive prior knowledge, long-horizon planning capability, and fast response speed. Although recent agents based on large language models (LLMs) perform well, they still have limitations. For example, LLM outputs are usually descriptive sentences, which can be ambiguous when a specific action must be determined. To address these issues, we propose the Large Auto-Regressive Model (LARM). LARM takes both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format called the auto-regressive node transmission structure and assemble a corresponding dataset. With two-phase training, LARM successfully harvests enchanted equipment in Minecraft, which requires a more complex decision-making chain than the best previous methods could achieve. LARM is also the fastest method, running 6.8x faster than previous approaches.|\n", 
"2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu 
et.al.|[2405.17418](http://arxiv.org/abs/2405.17418)|null|Robot manipulation policies often perform unsatisfactorily when faced with novel tasks or object instances. The ability to automatically detect and self-correct failed actions is therefore essential for practical robotic systems. Recently, multimodal large language models (MLLMs) have shown promise in visual instruction following and strong reasoning abilities across a variety of tasks. To use a general-purpose MLLM as an end-to-end robotic agent, we propose Self-Corrected (SC)-MLLM, which not only predicts end-effector poses but also autonomously recognizes and corrects failed actions. First, we apply parameter-efficient fine-tuning to give the MLLM pose-prediction ability, reframing it as a language modeling problem. When execution fails, the model identifies the causes of low-level action errors (such as position and rotation errors) and proactively seeks hints from experts. Based on this feedback, SC-MLLM rethinks the current failure scenario and generates corrected actions. In addition, we design a continuous policy learning method that uses successfully corrected samples to improve the model's adaptability to the current scene configuration and to reduce how often expert intervention is needed. To evaluate SC-MLLM, we conduct extensive experiments in both simulation and real-world settings. Compared with the previous state-of-the-art robotic MLLM (ManipLLM), SC-MLLM significantly improves manipulation accuracy: from 57% to 79% on seen object categories and from 47% to 69% on unseen novel categories.|\n", 
"2405.17402": "|**2024-05-27**|**THREAD: Thinking Deeper with Recursive Spawning**|Philip Schroeder et.al.|[2405.17402](http://arxiv.org/abs/2405.17402)|**[link](https://github.com/philipmit/thread)**|Large language models (LLMs) have shown impressive capabilities across diverse settings, but they still struggle as the length and complexity of the context increase. To address this, we propose Thinking Recursively and Dynamically (ThReaD). ThReaD frames model generation as a thread of execution that, depending on the context, either runs to completion or dynamically spawns new threads. By spawning child threads, the model can offload work (e.g., thinking, retrieving information), and each child thread returns only the tokens the parent thread needs; this lets the model adapt the amount of intermediate work it uses to produce tokens. We apply ThReaD to task solving and question answering, where it recursively decomposes a given task or question into progressively simpler sub-problems that are solved by separate child threads. We implement ThReaD with few-shot learning and evaluate it with GPT-4 and GPT-3.5 on several benchmarks, including ALFWorld, TextCraft, and WebShop, as well as two new benchmarks, DataCommons QA and MIMIC-III ICU QA. ThReaD achieves state-of-the-art performance on these benchmarks and, with smaller models such as Llama-3-8b and CodeLlama-7b, outperforms existing frameworks by 10% to 50% absolute points.|\n", 
"2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386](http://arxiv.org/abs/2405.17386)|**[link](https://github.com/cone-mt/mindmerger)**|
Reasoning ability is crucial for large language models (LLMs), yet there is a clear gap between English and non-English languages. Some works fine-tune LLMs to relearn reasoning in non-English languages, while others replace non-English inputs with the outputs of an external model, such as English translations, to work around LLMs' difficulty in understanding non-English text. However, these methods often fail to fully exploit the built-in reasoning and language-understanding capabilities of LLMs. To make better use of them, we propose MindMerger, a new method that merges LLMs with the external language-understanding capabilities of multilingual models to improve multilingual reasoning performance. We also introduce a two-step training scheme that first embeds the external capabilities into the LLM and then trains the collaborative use of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and one language-understanding dataset show that MindMerger consistently outperforms all baselines, especially on low-resource languages. Without updating the LLM's parameters, it improves the average accuracy across all languages on the MGSM dataset by 6.7%, and by 8.0% on low-resource languages.|\n", 
"2405.17382": "|**2024-05-27**|**ReMoDetect: Reward Models Recognize Aligned LLM's Generations**|Hyunseok Lee et.al.|[2405.17382](http://arxiv.org/abs/2405.17382)|null|As large language models (LLMs) become more capable and easier to use, the societal risks they pose, such as fake news generation, have motivated methods for detecting LLM-generated text (LGT) to ensure safe use. However, because there are so many LLMs, identifying the characteristics of each one individually is impractical. This work therefore focuses on a property these powerful models share: alignment training, i.e., training LLMs to generate text that better matches human preferences. Our key finding is that, because aligned LLMs are trained to maximize human preferences, their outputs receive even higher estimated preference scores than human-written text, so they can be easily detected with a preference model (an LLM trained to model the distribution of human preferences). Based on this finding, we propose two training schemes that further strengthen the detection ability of the preference model: (1) continual preference fine-tuning, which makes the model prefer aligned LGT even more strongly; and (2) reward modeling of human/LLM mixed text, i.e., human-written text rephrased by aligned LLMs, whose preference lies between that of LGT and human-written text and which helps the model learn a better decision boundary. Extensive evaluations across six text domains and twelve aligned LLMs show that our method achieves state-of-the-art performance. The code is available at https://github.com/hyunseoklee-ai/reward_llm_detect.|\n", 
"2405.17378": "|**2024-05-27**|**RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects**|Ahmed Allam et.al.|[2405.17378](http://arxiv.org/abs/2405.17378)|**[link](https://github.com/AUCOHL/RTL-Repo)**|Large language models (LLMs) have shown potential in assisting with Register Transfer Level (RTL) design tasks. However, existing benchmarks fall significantly short of reflecting the complexity of real-world RTL projects. To address this, the paper introduces RTL-Repo, a new benchmark designed specifically to evaluate LLMs on large-scale RTL design projects. RTL-Repo contains more than 4000 Verilog code samples extracted from public GitHub repositories, each accompanied by the full context of its corresponding repository. We evaluate several state-of-the-art models on RTL-Repo, including GPT-4, GPT-3.5, Starcoder2, and Verilog-specific models such as VeriGen and RTLCoder, and compare how well they generate Verilog code for complex projects. RTL-Repo gives the hardware design community a valuable resource for evaluating and comparing LLMs in realistic RTL design scenarios and for training LLMs specifically for Verilog code generation in complex, multi-file RTL projects. RTL-Repo is open source and publicly available on GitHub.|\n", 
"2405.17374": "|**2024-05-28**|**Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models**|ShengYun Peng et.al.|[2405.17374](http://arxiv.org/abs/2405.17374)|null|
Safety alignment is key to ensuring that large language models (LLMs) behave in line with human preferences and avoid harmful actions, but recent work shows that fine-tuning a model on only a few carefully crafted training examples can easily compromise its safety. We aim to measure the risks of fine-tuning by exploring the LLM safety landscape. We discover a new phenomenon, observed universally in the parameter space of popular open-source LLMs, which we call the safety basin: randomly perturbing the model weights preserves the safety level of the original aligned model within its local neighborhood. This finding inspires us to propose a new safety metric, VISAGE, which measures safety during LLM fine-tuning by probing the model's safety landscape. Visualizing the safety landscape of the aligned model shows how fine-tuning compromises safety by dragging the model away from the safety basin. We also observe that the system prompt plays a critical role in protecting the model, and that this protection transfers to perturbed variants of the model within the safety basin. These observations of the safety landscape offer new perspectives for future work on LLM safety.|\n", 
"2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414](http://arxiv.org/abs/2405.18414)|null|Retrieval Augmented Generation (RAG) greatly improves the responses of large language models (LLMs) by grounding them in context from existing documents. But how well does RAG work when a document's relevance to the question is not obvious, or when a document contains only partial information? And how should connections between documents be handled? This work addresses these two core questions about RAG generation. We propose G-RAG, a reranker based on graph neural networks (GNNs) that sits between the retriever and the reader in RAG. G-RAG combines the connections between documents with semantic information (via Abstract Meaning Representation graphs) to provide a context-informed ranker for RAG. Experiments show that G-RAG outperforms state-of-the-art approaches while having a smaller computational footprint. We also evaluate PaLM 2 as a reranker and find that it significantly underperforms G-RAG, which underscores that reranking remains important for RAG even when large language models are used.|\n", 
"2405.18386": "|**2024-05-28**|**Instruct-MusicGen: Unlocking Text-to-Music 
Editing for Music Language Models via Instruction Tuning**|Yixiao Zhang et.al.|[2405.18386](http://arxiv.org/abs/2405.18386)|**[link](https://github.com/ldzhangyx/instruct-MusicGen)**|Recent advances in text-to-music editing use text queries to modify music, for example by changing its style or adjusting instrumental components. However, existing approaches either require training specific editing models from scratch, which is time-consuming and resource-intensive, or use large language models to predict the edited music, which leads to imprecise audio reconstruction. To combine the strengths of both approaches and address these limitations, we propose Instruct-MusicGen, a novel approach that fine-tunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach modifies the original MusicGen architecture by adding a text fusion module and an audio fusion module, which let the model process instruction text and audio input concurrently and generate the desired edited music. Remarkably, Instruct-MusicGen adds only 8% new parameters to the original model and trains for only 5000 steps, yet it outperforms existing baselines and performs comparably to models trained for specific tasks. This advance improves the efficiency of text-to-music editing and broadens the applicability of music language models in dynamic music production environments.|\n", 
"2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380](http://arxiv.org/abs/2405.18380)|**[link](https://github.com/pixeli99/owlore)**|The rapid development of large language models (LLMs) has revolutionized natural language processing tasks. However, training or fine-tuning such large models poses significant challenges. Parameter-efficient methods such as low-rank adaptation (LoRA) have emerged to address this, but they often sacrifice performance. This paper proposes Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning method inspired by the layerwise outlier distribution of LLMs. Instead of adding extra adapters, OwLore fine-tunes by dynamically sampling pretrained layers. We first interpret the outlier phenomenon through Heavy-Tailed Self-Regularization theory (HT-SR) and find that layers with more outliers tend to have more heavy-tailed distributions and are therefore better trained. Accordingly, OwLore assigns higher sampling probabilities to layers with more outliers to better exploit the knowledge stored in the pretrained model. To further reduce the memory demands of fine-tuning, we integrate gradient low-rank projection so that each layer can be trained efficiently in a low-rank manner. By combining the efficiency of low-rank training with the optimal layerwise sampling strategy, OwLore significantly improves the memory-performance trade-off in LLM fine-tuning. Extensive experiments across several architectures, including LLaMa2, LLaMa3, and Mistral, show that OwLore consistently outperforms baseline approaches, including full fine-tuning. For example, it achieves an average accuracy gain of 1.1% on commonsense reasoning benchmarks, a 3.0% improvement on MMLU, and a notable 10% improvement on MT-Bench, while being more memory-efficient. Notably, OwLore can fine-tune LLaMa2-7B with only 21GB of memory.|\n", 
"2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah 
et.al.|[2405.18377](http://arxiv.org/abs/2405.18377)|null|The strong performance of modern large language models (LLMs) on tasks such as natural language processing, complex reasoning, and sentiment analysis has driven their widespread adoption. However, these capabilities come with very large memory and compute costs, which preclude their use on most hardware platforms. To address this, we propose an effective method that performs one-shot fine-tuning of LLaMA2-7B and then uses a genetic-algorithm-based search to find smaller, less computationally complex network architectures. We show that, for certain standard benchmark tasks, the pretrained LLaMA2-7B model is unnecessarily large and complex. We achieve a 1.5x reduction in model size and a 1.3x speedup in throughput with almost no loss in accuracy. Our method is also more efficient and effective than certain pruning or sparsification techniques. Finally, we show that quantization is complementary to our method and further reduces the size and complexity of the networks it finds. We believe this work provides a way to automatically create LLMs that can run on cheaper and more widely available hardware platforms.|\n", 
"2405.18376": "|**2024-05-28**|**Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning**|Dongjie Chen et.al.|[2405.18376](http://arxiv.org/abs/2405.18376)|**[link](https://github.com/Dong-Jie-Chen/RCL)**|Source-free domain adaptation (SFDA) aims to adapt a pretrained source model to a target domain using only unlabeled target data. Current SFDA methods struggle to effectively leverage pretrained knowledge and to exploit the potential of target-domain data. Multimodal large language models (MLLMs) excel at understanding visual and textual information, but applying them to SFDA raises problems such as instruction-following failures, high computational demands, and the difficulty of measuring performance before adaptation. To alleviate these issues, we propose Reliability-based Curriculum Learning (RCL), a novel framework that integrates multiple MLLMs through pseudo-labeling to facilitate knowledge exploitation in SFDA. Our framework comprises 1) Reliable Knowledge Transfer, 2) Self-correcting, 3) MLLM-guided Knowledge Expansion, and 4) Multi-hot Masking Refinement, which work together to progressively exploit the unlabeled target-domain data. RCL achieves state-of-the-art (SOTA) performance on multiple SFDA benchmarks, e.g., a $\\textbf{+9.4\\%}$ improvement on DomainNet, demonstrating its effectiveness in enhancing adaptability and robustness without requiring access to source data. Code is available at https://github.com/Dong-Jie-Chen/RCL.|\n", 
"2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375](http://arxiv.org/abs/2405.18375)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|Commonsense reasoning is an important component of natural language understanding, and several benchmarks have been developed to evaluate it. However, most of these benchmarks exist only in English. Creating parallel benchmarks enables cross-lingual evaluation and therefore a better understanding of different languages. This work introduces a collection of Winograd Schemas in Thai, a new dataset designed to evaluate commonsense reasoning in Thai. Through a methodology involving native speakers, professional translators, and thorough validation, we ensure that the schemas faithfully reflect Thai linguistic nuances, idioms, and cultural references while preserving their ambiguity and commonsense challenge. We evaluate large language models such as GPT-4 and Claude-3-Opus on this benchmark. Although they perform well in English, their performance drops noticeably in Thai, which shows that multilingual commonsense reasoning still needs improvement.|\n", 
"2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal 
et.al.|[2405.18369](http://arxiv.org/abs/2405.18369)|null|Large language models (LLMs) have transformed many domains with their remarkable capabilities. Prompting, which guides the model's outputs, is key to their success. However, writing prompts by hand is both time-consuming and domain-specific, so automated solutions are needed. This paper introduces PromptWizard, a novel framework that uses LLMs to iteratively synthesize and refine prompts tailored to specific tasks. Unlike existing approaches, PromptWizard optimizes both the prompt instructions and the in-context examples to maximize model performance. The framework mutates instructions and incorporates negative examples to progressively deepen understanding and ensure diversity. With the help of a critic, it further refines the instructions and examples, enriching them with detailed reasoning steps for optimal performance. PromptWizard is computationally efficient, adapts to scenarios with varying amounts of training data, and remains effective with smaller LLMs. A rigorous evaluation on 35 tasks across 8 datasets shows that PromptWizard clearly outperforms existing prompting strategies, demonstrating its efficiency and scalability in prompt optimization.|\n",
"2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361](http://arxiv.org/abs/2405.18361)|null|With the rapid progress of autonomous driving (AD), end-to-end approaches, particularly those based on vision-language models (VLMs), have become increasingly important. These models aim to combine strong logical reasoning and cognitive abilities to achieve comprehensive end-to-end planning. However, existing VLM approaches typically rely on 2D visual tokenizers and a large language model (LLM). As a result, they fall short in handling 3D geometric information, which is crucial for reliable planning. Studies show that LLMs with 2D tokenizers cannot accurately perceive the 3D environment, and this calls the reliability of VLMs for autonomous driving into question. To address this, we propose Atlas, a new method that uses DETR-style 3D perceptrons as a 3D tokenizer connected to a single-layer linear projector, exploiting the inherent properties of the 3D physical world. This design allows high-resolution multi-view images to be processed simultaneously and supports spatio-temporal modeling. Despite its simplicity, Atlas performs strongly on 3D detection and ego planning tasks on the NuScenes dataset. This shows that a 3D-tokenized LLM is key to reliable autonomous driving. We will release the code and datasets to support further research.|\n",
"2405.18359": "|**2024-05-28**|**Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs**|Somnath Kumar
et.al.|[2405.18359](http://arxiv.org/abs/2405.18359)|null|Large language models (LLMs) are reshaping many domains worldwide, but their inclusivity and effectiveness for non-Latin scripts and low-resource languages still need improvement. This paper addresses this critical challenge with an approach that improves multilingual LLM performance without extensive training or fine-tuning. We systematically investigate and evaluate performance across languages on popular question-answering (QA) datasets. Based on this analysis, we propose a set of novel techniques that unlock the true potential of LLMs in diverse linguistic settings. Our approach rests on three core strategies that substantially improve multilingual proficiency. First, carefully optimizing prompts for multilingual LLMs unlocks their latent capabilities and significantly improves performance across languages. Second, we introduce a new hybrid approach that combines LLM retrieval-augmented generation (RAG) with multilingual embeddings, achieving better multi-task performance. Finally, we develop a dynamic learning strategy that selects the most suitable prompting strategy, LLM, and embedding model for each query at runtime. This maximizes the efficiency of LLMs across languages and outperforms the best static and random strategies. In addition, our approach supports both offline configuration tuning and online adaptation, adapts seamlessly to new languages and datasets, and substantially advances multilingual understanding and generation across diverse languages.|\n",
"2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358](http://arxiv.org/abs/2405.18358)|null|## Background Recent multimodal large language models (MLLMs) have made significant progress on tasks that combine vision and language. However, they still struggle with fine-grained multimodal understanding, parsing complex tasks, and reasoning over multimodal information. This paper presents MMCTAgent, a novel multimodal critical-thinking agent framework designed to address the inherent limitations of current MLLMs in complex visual reasoning tasks. Inspired by human cognitive processes and critical thinking, MMCTAgent iteratively analyzes multimodal information, decomposes queries, plans strategies, and reasons dynamically. MMCTAgent also incorporates critical-thinking elements such as verification of final answers and self-reflection. It uses a novel approach to define a vision-based critic and identify task-specific evaluation criteria, which improves its decision-making. We rigorously evaluate MMCTAgent, including the variant with the critic, on multiple image- and video-understanding benchmarks. The results show that it outperforms both foundation MLLMs and other tool-augmented pipelines.|\n",
"2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335](http://arxiv.org/abs/2405.19335)|null|We introduce X-VILA, a multimodal model that extends the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. X-VILA aligns modality-specific encoders with the LLM inputs and diffusion decoders with the LLM outputs, which gives it cross-modality understanding, reasoning, and generation. To support this cross-modality alignment, we curate an effective any-to-any modality instruction-following dataset. However, we find that current cross-modality alignment methods have a critical problem that causes visual information to be lost. To address it, we design a visual alignment mechanism that includes a visual embedding highway module. We also provide a resource-efficient training recipe that lets X-VILA perform well on any-to-any modality conversation, substantially outperforming previous approaches. Remarkably, X-VILA shows emergent properties across modalities even when similar training data is absent. The project will be open-sourced.|\n",
"2405.19334": "|**2024-05-29**|**LLMs Meet Multimodal Generation and Editing: A Survey**|Yingqing He et.al.|[2405.19334](http://arxiv.org/abs/2405.19334)|**[link](https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation)**|**With the recent advances in large language models (LLMs), there is growing interest in combining them with multimodal learning. Existing surveys of multimodal large language models (MLLMs) focus mainly on understanding. This survey examines multimodal generation across domains such as image, video, 3D, and audio, highlighting milestone works and their technical advances in each. We study the key technical components of these methods and the multimodal datasets used in the relevant studies. We also review tool-augmented multimodal agents that use existing generative models for human-computer interaction. Finally, we comprehensively discuss advances in AI safety and explore emerging applications and future prospects. Our work provides a systematic and in-depth overview of multimodal generation that is expected to advance AI-generated content (AIGC) and world models. The list of all related papers is available at.**|\n",
"2405.19333": "|**2024-05-29**|**Multi-Modal Generative Embedding Model**|Feipeng Ma et.al.|[2405.19333](http://arxiv.org/abs/2405.19333)|null|Most multimodal tasks can be framed as either generation or embedding problems. Existing models usually handle both by splitting the language module into a text decoder for generation and a text encoder for embedding. To explore the minimalism of multimodal paradigms, this work uses only one model per modality. To this end, we propose the Multi-Modal Generative Embedding Model (MM-GEM), which integrates the generative and embedding objectives into a single large language model. We also design PoolAggregator to improve efficiency and enable fine-grained embedding and generation. Surprisingly, the two objectives do not conflict significantly. For example, MM-GEM built on ViT-Large and TinyLlama performs well on multimodal embedding benchmarks such as cross-modal retrieval and zero-shot classification, and it also captions images well. In addition, MM-GEM can seamlessly perform region-level image caption generation and retrieval. Furthermore, the advanced text model in MM-GEM improves Recall@1 for long-text and image retrieval by more than 5%.|\n",
"2405.19332": "|**2024-05-29**|**Self-Exploring Language Models: Active Preference Elicitation for Online Alignment**|Shenao Zhang et.al.|[2405.19332](http://arxiv.org/abs/2405.19332)|**[link](https://github.com/shenao-zhang/selm)**|**Preference optimization, particularly via Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning large language models (LLMs) with human intentions. Offline alignment uses a fixed dataset. In contrast, online feedback from humans or AI on model generations, collected through an iterative process, typically yields more capable reward models and better-aligned LLMs. However, a globally accurate reward model requires systematic exploration that generates diverse responses spanning the vast space of natural language. Random sampling from standard reward-maximizing LLMs alone is not sufficient for this. To address this issue, we propose a bilevel objective that is optimistically biased towards potentially high-reward responses so that it actively explores out-of-distribution regions. Solving the inner-level problem with a reparameterized reward function yields our algorithm, Self-Exploring Language Models (SELM). SELM eliminates the need for a separate reward model (RM) and iteratively updates the LLM with a straightforward objective. Compared with Direct Preference Optimization (DPO), the SELM objective reduces indiscriminate favoring of unseen extrapolations and improves exploration efficiency. Our experiments show that SELM, when used to fine-tune the Zephyr-7B-SFT and Llama-3-8B-Instruct models, significantly improves performance on instruction-following benchmarks such as MT-Bench and AlpacaEval 2.0. It also improves performance on various standard academic benchmarks in different settings. Our code and models are available at.**|\n",
"2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar
et.al.|[2405.19328](http://arxiv.org/abs/2405.19328)|null|This paper proposes an architecture called Normative Modules. It addresses the cooperation challenges that generative agents face in social settings with existing norms. These agents use large language models to understand and evaluate their environment. When they handle complex social tasks, however, recognizing and adapting to the normative infrastructure becomes a key problem. The core purpose of the Normative Module is to facilitate equilibrium selection. It draws on the idea of classification institutions that implement correlated equilibria, and it enables agents to learn through peer interactions which of several candidate institutions in the environment are authoritative. With improved normative competence, agents can coordinate their sanctioning behavior. This in turn shapes primary behavior in the social environment and increases overall welfare. We design a new environment that supports institutions and evaluate the framework on two key criteria. The first is whether agents can ignore non-authoritative institutions. The second is whether they can identify the authoritative institution among several options. Experiments show that agents equipped with the Normative Module achieve more stable cooperation than baseline agents. This opens new avenues for research on designing environments and agents that account for normative infrastructure.|\n",
"2405.19327": "|**2024-05-29**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327](http://arxiv.org/abs/2405.19327)|**[link](https://github.com/multimodal-art-projection/map-neo)**|In recent years, large language models (LLMs) have made significant progress on a wide range of tasks. However, for commercial reasons, state-of-the-art models such as GPT, Gemini, and Claude are kept behind proprietary interfaces, and their training details are not disclosed. Recently, several institutions have open-sourced LLMs with comparable performance, such as LLaMA-3. Even so, most details, such as intermediate checkpoints, the pre-training corpus, and the training code, remain undisclosed. To improve the transparency of LLMs, the research community has been promoting truly open models such as Pythia, Amber, and OLMo. These models disclose more details and have enabled scientific study of the capabilities, limitations, biases, and risks of large models. However, existing open models still lag behind closed-source models of similar scale on reasoning, knowledge, and coding tasks. We therefore open-source MAP-Neo, a bilingual language model with 7 billion parameters trained from scratch on 4.5 trillion high-quality tokens. MAP-Neo is the first fully open-source bilingual model whose performance is comparable to existing state-of-the-art LLMs. We also release all details needed for reproduction, including the cleaned pre-training corpus, the data-cleaning pipeline, checkpoints, and an optimized training and evaluation framework. We hope MAP-Neo will strengthen the open research community and inspire further innovation and progress in LLMs.|\n",
"2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen
et.al.|[2405.19326](http://arxiv.org/abs/2405.19326)|null|This paper introduces a new task, zero-shot 3D reasoning segmentation, which searches for and localizes parts of objects. This new paradigm goes beyond the limitations of earlier category-specific 3D semantic segmentation, 3D instance segmentation, and open-vocabulary 3D segmentation. We design a simple baseline method, Reasoning3D, which can understand and execute complex commands to (finely) segment parts of 3D meshes. It supports interactive segmentation with contextual awareness and reasoned answers. Specifically, Reasoning3D uses a pre-trained 2D segmentation network, powered by large language models (LLMs), to interpret user queries in a zero-shot manner. Previous research has shown that large-scale pre-training gives foundation models prior world knowledge that lets them understand complex instructions. This allows us to 'segment anything' in 3D even with limited 3D datasets (resource-efficient). Experiments show that our approach generalizes and can effectively localize and highlight parts of 3D objects (3D meshes) based on implicit textual queries, including articulated 3D objects and real-world scanned data. In addition, our unsupervised approach allows rapid deployment. It also provides a viable universal baseline for future research on 3D (semantic) object understanding in fields such as robotics, object manipulation, part assembly, autonomous driving, augmented and virtual reality (AR/VR), and medical applications. The code, model weights, deployment guide, and evaluation protocol are available at http://tianrun-chen.github.io/Reason3D/.|\n",
"2405.19325": "|**2024-05-29**|**Nearest Neighbor Speculative Decoding for LLM Generation and Attribution**|Minghan Li et.al.|[2405.19325](http://arxiv.org/abs/2405.19325)|null|Large language models (LLMs) often hallucinate and cannot attribute their generations to sources. Semi-parametric language models such as kNN-LM mitigate these problems. They refine a language model's output for a given prompt using its nearest-neighbor matches in a non-parametric data store. However, such models typically have slow inference and produce text that lacks fluency. This paper introduces Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that can incorporate real-world text spans of arbitrary length into the generation process and attribute them to their sources. At each inference step, NEST performs token-level retrieval to compute a semi-parametric mixture distribution and to identify promising span continuations in a corpus. It then uses an approximate speculative decoding procedure that either accepts a prefix of the retrieved span or generates a new token. NEST significantly improves the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks. It surpasses the conventional kNN-LM method and performs competitively with in-context retrieval augmentation. NEST also substantially improves generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.|\n",
"2405.19323": "|**2024-05-29**|**Are Large Language Models Chameleons?**|Mingmeng Geng
et.al.|[2405.19323](http://arxiv.org/abs/2405.19323)|null|Do large language models (LLMs) have their own worldviews and personality tendencies? The researchers ran more than one million experiments in which LLMs answered subjective questions. They compared the models' responses with real data from the European Social Survey (ESS). The comparison shows that prompts strongly affect bias and variability, revealing major cultural, age, and gender biases. The paper discusses methods for measuring the difference between LLM and survey data, such as computing weighted means and a newly proposed measure based on Jaccard similarity. The authors stress that the robustness and variability of prompts must be analyzed before LLMs are used to simulate individual decisions or collective behavior, because their imitation abilities are approximate at best.|\n",
"2405.19320": "|**2024-05-29**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320](http://arxiv.org/abs/2405.19320)|null|Reinforcement learning from human feedback (RLHF) has shown great promise in aligning large language models (LLMs) with human preferences. Both online and offline RLHF are being actively studied. A key challenge is how to incorporate uncertainty estimation into the reward function learned from preference data. The principles of optimism and pessimism under uncertainty are well established in standard reinforcement learning (RL). However, no form of them suitable for large language models is yet both practical and theoretically grounded, because standard techniques for constructing confidence intervals become intractable under arbitrary policy parameterizations. This paper introduces Value-Incentivized Preference Optimization (VPO), a unified approach to online and offline RLHF. VPO regularizes the maximum-likelihood estimate of the reward function with the corresponding value function to indicate whether optimism or pessimism is chosen. VPO also optimizes the policy directly with implicit reward modeling, so its RLHF pipeline is simpler and resembles that of direct preference optimization. VPO provides theoretical guarantees in both the online and offline settings, with convergence rates that match those of standard RL. Experiments on text summarization and dialogue tasks verify the practicality and effectiveness of VPO.|\n",
"2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen
et.al.|[2405.20340](http://arxiv.org/abs/2405.20340)|null|This study addresses human behavior understanding across modalities (video and motion) by leveraging the capabilities of large language models (LLMs). Unlike recent LLMs designed for a single modality (video or motion only), we argue that understanding human behavior requires joint modeling of videos and motion sequences (e.g., SMPL sequences) to effectively capture nuanced body-part dynamics and semantics. To this end, we present MotionLLM, a simple yet effective framework for human motion understanding, captioning, and reasoning. MotionLLM adopts a unified video-motion training strategy that combines the complementary strengths of existing coarse-grained video-text data and fine-grained motion-text data to gain rich spatial-temporal insights. We also collect MoVid, a large-scale dataset of diverse videos, motions, captions, and instructions, and propose MoVid-Bench, with careful manual annotations, to better evaluate human behavior understanding on video and motion. Experiments demonstrate the superiority of MotionLLM in captioning, spatial-temporal comprehension, and reasoning.|\n", "2405.20339": "|**2024-05-30**|**Visual Perception by Large Language Model's Weights**|Feipeng Ma et.al.|[2405.20339](http://arxiv.org/abs/2405.20339)|null|Existing multimodal large language models (MLLMs) align visual information with the input space of the language model and then concatenate visual tokens with text tokens into a unified input sequence. Because the visual tokens lengthen the input sequence, this incurs high computational cost. To address this, the paper proposes a novel parameter-space alignment paradigm that represents visual information as model weights. For each input image, a vision encoder first extracts visual features, which are then converted into perceptual weights and merged with the weights of the language model. The language model's input therefore needs no visual tokens, which shortens the input sequence and substantially improves efficiency. Building on this idea, the paper proposes VLoRA, which includes a perceptual weights generator designed to convert visual features into perceptual weights with a low-rank property, similar to LoRA (Low-Rank Adaptation). Experiments show that VLoRA achieves performance comparable to existing MLLMs on various multimodal benchmarks while significantly reducing computational cost in both training and inference. The authors state that the code and models will be released.|\n", "2405.20335": "|**2024-05-30**|**Xwin-LM: Strong and Scalable Alignment Practice for LLMs**|Bolin Ni et.al.|[2405.20335](http://arxiv.org/abs/2405.20335)|**[link](https://github.com/xwin-lm/xwin-lm)**|**This paper presents Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). It covers several key techniques, including supervised fine-tuning (SFT), reward modeling (RM), rejection sampling fine-tuning (RS), and direct preference optimization (DPO). The key components are: (1) Xwin-LM-SFT, models initially fine-tuned on high-quality instruction data; (2) Xwin-Pair, a large-scale, multi-turn preference dataset carefully annotated with GPT-4; (3) Xwin-RM, reward models trained at 7B, 13B, and 70B parameter scales; (4) Xwin-Set, a multiwise preference dataset in which each prompt is paired with 64 unique responses generated by Xwin-LM-SFT and scored by Xwin-RM; (5) Xwin-LM-RS, models fine-tuned on the highest-scoring responses in Xwin-Set; (6) Xwin-LM-DPO, models further optimized on Xwin-Set with the DPO algorithm. Evaluations on AlpacaEval and MT-bench show consistent and significant improvements across the pipeline, demonstrating the strength and scalability of Xwin-LM. The repository at https://github.com/Xwin-LM/Xwin-LM will be continually updated to foster community research.**|\n", "2405.20319": "|**2024-05-31**|**ParSEL: Parameterized Shape Editing with Language**|Aditya Ganeshan et.al.|[2405.20319](http://arxiv.org/abs/2405.20319)|null|This paper presents ParSEL, a system for controllable editing of high-quality 3D assets from natural language. Because natural language alone is imprecise for fine-grained control, ParSEL takes a segmented 3D mesh and an editing request as input and produces a parameterized editing program. By adjusting the program parameters, users can explore shape variations with precise control over the magnitude of the edit. ParSEL uses LLMs to interpret the initial edit instruction, but we find that LLMs often fall short when inferring complete editing programs and can produce outputs that violate shape logic. To address this, we introduce Analytical Edit Propagation (AEP), an algorithm that starts from a seed edit and uses computer algebra systems for geometric analysis to find analytical edit operations compatible with the user's intended edit, yielding complete editing programs. Experiments show that, compared with alternative approaches, ParSEL effectively enables controllable editing of 3D objects through natural language requests.|\n", "2405.20318": "|**2024-05-30**|**CausalQuest: Collecting Natural Causal Questions for AI Agents**|Roberto Ceraolo et.al.|[2405.20318](http://arxiv.org/abs/2405.20318)|**[link](https://github.com/roberto-ceraolo/causal-quest)**|**Humans have an innate drive to seek out causality, whether out of curiosity or to achieve specific goals. Developing AI agents that can serve this natural human quest requires a comprehensive dataset of natural causal questions. However, existing datasets either contain artificially crafted questions that do not reflect real AI usage scenarios or have limited coverage of questions from specific sources. To address this gap, we present CausalQuest, a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants. We formalize the definition of causal questions and establish a taxonomy for finer-grained classification. Through a combined effort of human annotators and large language models (LLMs), we carefully label the dataset. We find that 42% of the questions humans ask are causal, with the majority seeking to understand the causes behind given effects. Using this dataset, we train efficient binary classifiers (up to 2.85 billion parameters) to identify causal questions, achieving F1 scores of up to 0.877. Finally, we outline a set of future research directions that can build on our data and models.**|\n", "2405.20315": "|**2024-05-30**|**ANAH: Analytical Annotation of Hallucinations in Large Language Models**|Ziwei Ji et.al.|[2405.20315](http://arxiv.org/abs/2405.20315)|**[link](https://github.com/open-compass/anah)**|**Reducing the 'hallucination' problem of large language models (LLMs) is crucial for their wide application, yet fine-grained measurement of LLM hallucinations has not been sufficiently explored by the community. We therefore present $\\textbf{ANAH}$, a bilingual dataset for analytical annotation of LLM hallucinations in generative question answering. Each answer sentence in ANAH undergoes rigorous annotation, including retrieval of a reference fragment, judgment of the hallucination type, and correction of hallucinated content. ANAH consists of about 12,000 sentence-level annotations for about 4,300 LLM responses covering over 700 topics, constructed through a human-in-the-loop pipeline. Thanks to the fine granularity of the annotations, we can quantitatively confirm that LLM hallucinations progressively accumulate as an answer grows, and we use ANAH to train and evaluate hallucination annotators. Our experiments study both generative and discriminative annotators and show that, although open-source LLMs struggle with fine-grained hallucination annotation, the generative annotator trained with ANAH surpasses all open-source LLMs, approaches GPT-3.5, and generalizes well to unseen questions.**|\n", "2405.20313": "|**2024-05-30**|**Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation**|Guillaume Huguet 
et.al.|[2405.20313](http://arxiv.org/abs/2405.20313)|null|Proteins are essential to almost all biological processes and derive their diverse functions from complex three-dimensional structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. Compared with previous models in the FoldFlow family, FoldFlow-2 introduces new architectural features, including a protein large language model to encode sequence, a new multimodal fusion trunk that combines structure and sequence representations, and a geometric transformer-based decoder. To increase the diversity and novelty of generated samples, which is crucial for de novo drug design, we train FoldFlow-2 at scale on a new dataset an order of magnitude larger than the PDB datasets used in prior work, containing both known PDB proteins and high-quality synthetic structures obtained through filtering. We further show how FoldFlow-2 can be aligned to arbitrary rewards, such as increased secondary-structure diversity, by introducing a Reinforced Finetuning (ReFT) objective. Experiments show that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving on RFDiffusion in unconditional generation in terms of designability, diversity, and novelty across all protein lengths, and generalizes to the task of equilibrium conformation sampling. Finally, we show that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.|\n", "2405.20309": "|**2024-05-30**|**Large Language Models Can Self-Improve At Web Agent Tasks**|Ajay Patel 
et.al.|[2405.20309](http://arxiv.org/abs/2405.20309)|**[link](https://github.com/AjayP13/webdreamer)**|Training models to act as agents that can effectively navigate and perform actions in complex environments, such as a web browser, is typically challenging, mainly because of a lack of training data. Large language models (LLMs) have recently shown some ability to navigate novel environments as agents in a zero-shot or few-shot fashion, guided only by natural-language prompts. Research has also shown that LLMs can exceed their base performance through self-improvement, i.e., fine-tuning on data generated by the model itself. This work explores whether LLMs can improve their performance as agents on long-horizon tasks in a complex environment through self-improvement, using the WebArena benchmark, in which an agent must autonomously navigate and act on web pages to achieve a specified objective. We fine-tune on three distinct synthetic training-data mixtures and find that self-improvement raises the task completion rate on WebArena by 31%. We also contribute new evaluation metrics that assess the performance, robustness, capabilities, and trajectory quality of our fine-tuned agent models more thoroughly than the aggregate benchmark scores currently used.|\n", "2405.20304": "|**2024-05-30**|**Group Robust Preference Optimization in Reward-free RLHF**|Shyam Sundhar Ramesh et.al.|[2405.20304](http://arxiv.org/abs/2405.20304)|**[link](https://github.com/rsshyam/Group-robust-preference-optimization)**|**Adapting large language models (LLMs) to specific tasks usually involves fine-tuning through reinforcement learning from human feedback (RLHF) on preference data collected from diverse groups of labelers (e.g., different genders, ethnicities, or company teams). However, traditional approaches take a 'one-size-fits-all' strategy: they assume and optimize a single preference model and are therefore not sensitive to the unique characteristics and needs of the various groups. To address this, we propose Group Robust Preference Optimization (GRPO), a novel method for robustly aligning LLMs to the preferences of individual groups. GRPO builds on reward-free direct preference optimization, but unlike previous approaches it seeks a robust policy that maximizes worst-case group performance. To achieve this, GRPO adaptively and sequentially reweights the different groups, prioritizing groups with higher cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO on global opinion data from diverse groups, we significantly improve performance for the worst-performing groups, reduce loss imbalances across groups, and improve probability accuracy compared with non-robust baselines.**|\n", "2405.20285": "|**2024-05-30**|**Who Writes the Review, Human or AI?**|Panagiotis C. Theocharopoulos et.al.|[2405.20285](http://arxiv.org/abs/2405.20285)|null|With the widespread use of artificial intelligence in natural language processing, there is growing concern about detecting AI-generated text across domains. This study addresses the problem by proposing a method to accurately distinguish AI-generated book reviews from human-written ones. Our approach uses transfer learning, enabling the model to identify generated text across different topics while improving its ability to detect variations in writing style and vocabulary. To evaluate the method, we built a dataset of real book reviews together with simulated reviews generated with the open-source Vicuna language model. Experimental results show that the original source of a text can be identified with an accuracy of 96.86%. Our work examines the capabilities and limitations of large language models in text identification, which matters for managing such models effectively in the future and for ensuring the integrity and authenticity of human-created content.|\n", "2405.21075": "|**2024-05-31**|**Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis**|Chaoyou Fu et.al.|[2405.21075](http://arxiv.org/abs/2405.21075)|null|In the pursuit of artificial intelligence, multimodal large language models (MLLMs) have become a focal point of recent progress, yet their ability to process sequential visual data remains insufficiently explored. We therefore introduce Video-MME, the first comprehensive multimodal benchmark for evaluating MLLMs in video analysis. It has four key features: 1) diversity in video types, spanning 6 primary visual domains with 30 subfields to ensure broad scenario generalizability; 2) coverage of the temporal dimension, with short, medium, and long videos ranging from 11 seconds to 1 hour, to test how models handle complex contextual dynamics; 3) breadth in data modalities, integrating inputs beyond video frames, such as subtitles and audio, to reveal the all-round capabilities of MLLMs; 4) high-quality annotations, produced by rigorous manual labeling by experts to ensure precise and reliable model assessment. We manually selected and annotated 900 videos totaling 256 hours, yielding 2,700 question-answer pairs. With Video-MME, we evaluate various state-of-the-art MLLMs, including the GPT-4 series and Gemini 1.5 Pro, as well as the open-source image model InternVL-Chat-V1.5 and the video model LLaVA-NeXT-Video. The results show that Gemini 1.5 Pro is the best-performing commercial model, clearly outperforming the open-source models. Our dataset and findings underscore the need for further improvement in handling longer sequences and multimodal data. Project page: https://video-mme.github.io|\n", "2405.21047": "|**2024-05-31**|**Grammar-Aligned Decoding**|Kanghee Park 
et.al.|[2405.21047](http://arxiv.org/abs/2405.21047)|null|Large language models (LLMs) struggle to reliably generate highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this by restricting which tokens the LLM may output at each step so that the output satisfies a given constraint; in grammar-constrained decoding (GCD), for example, the LLM's output must follow a given grammar. However, such constrained decoding can distort the LLM's distribution, producing outputs that are grammatical but whose probabilities do not reflect those assigned by the LLM itself, and that are therefore of low quality. We call the problem of aligning sampling with a grammar constraint grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees grammatical output while provably producing outputs that match the LLM's conditional probability given the grammar constraint. The algorithm uses prior sample outputs to soundly over-approximate the future grammaticality of different output prefixes. Our experiments on code generation and structured NLP tasks show that ASAp often produces outputs that better match the LLM's distribution than existing GCD techniques while still enforcing the desired grammatical constraints, improving overall quality.|\n", "2405.21040": "|**2024-05-31**|**Direct Alignment of Language Models via Quality-Aware Self-Refinement**|Runsheng Yu et.al.|[2405.21040](http://arxiv.org/abs/2405.21040)|null|Reinforcement learning from human feedback (RLHF) is commonly used to align the behavior of large language models (LLMs) with human preferences. Recently, Direct Preference Optimization (DPO) has emerged as an alternative that no longer relies on an LLM-based reward model, saving the extra memory and training time. However, DPO ignores the relative quality of the positive and negative responses, which can lead to suboptimal training outcomes. To address this, we investigate using the intrinsic knowledge of the LLM being fine-tuned on the fly to obtain the relative quality of responses and refine the loss function. Specifically, we design a refinement function that leverages the LLM's knowledge to estimate the quality of both positive and negative responses. We show that, under mild assumptions, the constructed refinement function can help self-refine the loss function. We integrate this refinement function into DPO and its variant Identity Preference Optimization (IPO). Experiments across various evaluators show that the refined models outperform DPO and IPO.|\n", "2405.21030": "|**2024-05-31**|**Standards for Belief Representations in LLMs**|Daniel A. Herrmann et.al.|[2405.21030](http://arxiv.org/abs/2405.21030)|null|As large language models (LLMs) continue to demonstrate remarkable abilities across domains, computer scientists are seeking to understand their cognitive processes, particularly how (and whether) LLMs internally represent beliefs about the world. However, the field still lacks a unified theoretical foundation for studying belief in LLMs. This paper begins to fill that gap by proposing conditions under which a representation in an LLM counts as belief-like. We argue that, although the project of measuring belief in LLMs shares many features with belief measurement in decision theory and formal epistemology, it also differs in ways that should change how we measure belief. Drawing on insights from philosophy and contemporary machine learning practice, we therefore establish four criteria that balance theoretical considerations with practical constraints: accuracy, coherence, uniformity, and use. Together, these criteria lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using individual criteria in isolation to identify belief representations.|\n", "2405.21028": "|**2024-05-31**|**LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models**|Elias Stengel-Eskin et.al.|[2405.21028](http://arxiv.org/abs/2405.21028)|**[link](https://github.com/esteng/pragmatic_calibration)**|**When answering questions, language models can convey not only an answer but also their confidence that the answer is correct. This includes explicit confidence markers, such as a numeric score, and implicit ones, such as an authoritative tone or elaboration with additional knowledge. However, most current models tend to be overconfident. To calibrate these confidence signals, we introduce LACIE, a pragmatic, listener-aware finetuning method that considers not only whether an answer is correct but also whether a listener will accept it. We cast calibration as preference optimization, creating data via a two-agent game in which a speaker model's outputs are judged by a simulated listener. We then finetune three LLMs (Mistral-7B, Llama3-8B, and Llama3-70B) with LACIE and show that the resulting models are better calibrated with respect to a simulated listener. Crucially, these trends transfer to human listeners, helping them predict model correctness more accurately: in a human evaluation, LACIE-trained models led to 47% fewer incorrect answers being accepted while the acceptance rate for correct answers stayed the same. LACIE also generalizes to another dataset: training on TriviaQA substantially increases truthfulness on TruthfulQA. Our analysis shows that LACIE yields better confidence separation between correct and incorrect examples. Qualitatively, LACIE-trained models hedge more and implicitly signal certainty when correct, by using an authoritative tone or including details. Finally, LACIE finetuning makes models more likely to abstain (e.g., saying 'I don't know') on answers that are likely wrong.**|\n", "2405.21018": 
"|**2024-05-31**|**Improved Techniques for Optimization-Based Jailbreaking on Large Language Models**|Xiaojun Jia et.al.|[2405.21018](http://arxiv.org/abs/2405.21018)|**[link](https://github.com/jiaxiaojunqaq/i-gcg)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5176\u5b89\u5168\u6821\u51c6\u6210\u4e3a\u5e7f\u6cdb\u5e94\u7528\u7684\u5173\u952e\u3002\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u7834\u89e3\uff08\u5373\u201cjailbreaking\u201d\uff09\u6d3b\u52a8\u65e5\u76ca\u589e\u591a\uff0c\u5176\u4e2d\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u653b\u51fb\u56e0\u5176\u6210\u6548\u663e\u8457\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0cGCG\u7684\u653b\u51fb\u6548\u7387\u4ecd\u6709\u63d0\u5347\u7a7a\u95f4\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6539\u8fdb\u7684\u4f18\u5316\u57fa\u7ebf\u7834\u89e3\u6280\u672f\uff0c\u4ee5\u63d0\u5347GCG\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u5355\u4e2a\u76ee\u6807\u6a21\u677f\u201cSure\u201d\u6781\u5927\u5730\u9650\u5236\u4e86GCG\u7684\u653b\u51fb\u6548\u679c\uff0c\u56e0\u6b64\u6211\u4eec\u5efa\u8bae\u91c7\u7528\u5305\u542b\u6709\u5bb3\u81ea\u6211\u6697\u793a\u548c/\u6216\u6307\u5bfc\u7684\u591a\u6837\u5316\u76ee\u6807\u6a21\u677f\uff0c\u4ee5\u8bef\u5bfc\u6a21\u578b\u3002\u5728\u4f18\u5316\u7b56\u7565\u4e0a\uff0c\u6211\u4eec\u5efa\u8bae\u5728GCG\u4e2d\u5b9e\u65bd\u81ea\u52a8\u591a\u5750\u6807\u66f4\u65b0\uff0c\u4ee5\u52a0\u901f\u6536\u655b\uff0c\u5e76\u5f15\u5165\u4ece\u7b80\u5355\u5230\u590d\u6742\uff08easy-to-hard\uff09\u7684\u521d\u59cb\u5316\u6280\u5de7\u3002\u5c06\u8fd9\u4e9b\u6539\u8fdb\u6574\u5408\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u9ad8\u6548\u7684\u65b9\u6cd5\u2014\u2014$\\mathcal{I}$-GCG\u3002\u5b9e\u9a8c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5982NeurIPS 2023 
\u7ea2\u961f\u6311\u6218\u4e2d\u8fdb\u884c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6539\u8fdb\u6280\u672f\u80fd\u591f\u5e2e\u52a9GCG\u8d85\u8d8a\u73b0\u6709\u7834\u89e3\u653b\u51fb\uff0c\u5b9e\u73b0\u63a5\u8fd1100%\u7684\u653b\u51fb\u6210\u529f\u7387\u3002\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/jiaxiaojunQAQ/I-GCG\u3002**|\n", "2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985](http://arxiv.org/abs/2405.20985)|null|\u8be5\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u6295\u5f71\u5668\u6a21\u5757\uff0c\u56e0\u4e3a\u5b83\u4eec\u5728\u8fde\u63a5\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u6001\u3001\u4fc3\u8fdb\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u9762\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u5bf9\u4e8e\u6295\u5f71\u5668\u5728\u89c6\u89c9-\u8bed\u8a00\u5bf9\u9f50\u65b9\u9762\u7684\u6548\u679c\u8bc4\u4f30\u4ecd\u663e\u4e0d\u8db3\uff0c\u901a\u5e38\u53ea\u80fd\u901a\u8fc7\u4e0b\u6e38\u4efb\u52a1\u7684\u6027\u80fd\u95f4\u63a5\u63a8\u65ad\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u901a\u8fc7\u5206\u6790MLLM\u4e2d\u7684\u89c6\u89c9-\u8bed\u8a00\u8bed\u4e49\u6d41\uff0c\u6765\u89e3\u8bfb\u6295\u5f71\u5668\u7684\u5de5\u4f5c\u673a\u5236\u3002 
\u5177\u4f53\u6765\u8bf4\uff0c\u7814\u7a76\u8005\u8ffd\u8e2a\u4ece\u751f\u6210\u7684\u8bed\u8a00\u6807\u8bb0\u5230\u539f\u59cb\u89c6\u89c9\u7f16\u7801\u5757\u4ee5\u53ca\u6295\u5f71\u5668\u4ea7\u751f\u7684\u4e2d\u95f4\u8f93\u51fa\u4e4b\u95f4\u7684\u8bed\u4e49\u76f8\u5173\u6027\u6d41\u3002\u53d1\u73b0\u538b\u7f29\u578b\u6295\u5f71\u5668\uff08\u5982QFormer\uff09\u503e\u5411\u4e8e\u5c06\u89c6\u89c9\u5757\u62bd\u8c61\u6210\u6709\u9650\u7684\u51e0\u4e2a\u6982\u5ff5\uff0c\u5982\u7269\u4f53\u6216\u5c5e\u6027\uff0c\u5bfc\u81f4\u201c\u53cc\u91cd\u62bd\u8c61\u201d\u73b0\u8c61\uff1a\u9996\u5148\uff0c\u6295\u5f71\u5668\u53c2\u7167\u9884\u5b9a\u4e49\u67e5\u8be2\u4ee4\u724c\u8fdb\u884c\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\uff0c\u7136\u540e\uff0c\u57fa\u4e8e\u6587\u672c\u6307\u4ee4\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u4e00\u6b65\u63d0\u53d6\u3002\u8fd9\u79cd\u53cc\u91cd\u62bd\u8c61\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6548\u7387\u4e0d\u9ad8\uff0c\u5e76\u53ef\u80fd\u5bfc\u81f4\u89c6\u89c9\u8bed\u4e49\u4fe1\u606f\u7684\u7d2f\u79ef\u7f3a\u5931\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u7814\u7a76\u63d0\u51fa\u201c\u89e3\u8026\u538b\u7f29\u4e0e\u62bd\u8c61\uff08DeCo\uff09\u201d\u7684\u5173\u952e\u6d1e\u5bdf\uff0c\u5373\u5728\u6295\u5f71\u5c42\u9762\u4e0a\u5c06\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u538b\u7f29\uff0c\u800c\u8ba9\u5927\u8bed\u8a00\u6a21\u578b\u5b8c\u5168\u8d1f\u8d23\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u4eba\u5458\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u538b\u7f29\u5668\u2014\u2014\u4e8c\u7ef4\u81ea\u9002\u5e94\u6c60\u5316\uff0c\u4ee5\u65e0\u53c2\u6570\u7684\u65b9\u5f0f\u964d\u4f4e\u89c6\u89c9\u5757\u7684\u5c3a\u5bf8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDeCo\u5728\u6027\u80fd\u548c\u6548\u7387\u4e0a\u90fd\u4f18\u4e8e\u4f20\u7edf\u7684\u538b\u7f29\u6295\u5f71\u5668\u3002\u5b83\u5728MLLM\u57fa\u51c6\u3001\u89c6\u89c9\u5b9a\u4f4d\u548c\u5f00\u653e\u6027\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e2d\u5206\u522b\u53d6\u5f97\u4e860.9%\u30017.1%\u548c2.9%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u62e5\u6709\u66f4\u5c11\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u66f4\u5feb\u7684\u6536\u655b\u901f\u5ea6\u3002|\n", "2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang 
et.al.|[2405.20978](http://arxiv.org/abs/2405.20978)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5f3a\u5927\u529f\u80fd\uff0c\u4f46\u9762\u4e34\u6311\u6218\uff0c\u5982\u865a\u6784\u3001\u8fc7\u65f6\u77e5\u8bc6\u548c\u96be\u4ee5\u8ffd\u6eaf\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u5b83\u7ed3\u5408\u5916\u90e8\u6570\u636e\u5e93\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u4e0d\u9002\u5f53\u7684\u68c0\u7d22\u6bb5\u843d\u53ef\u80fd\u59a8\u788dLLMs\u751f\u6210\u5168\u9762\u4e14\u9ad8\u8d28\u91cf\u7684\u56de\u7b54\u3002\u5148\u524d\u5173\u4e8eRAG\u4e2d\u68c0\u7d22\u566a\u58f0\u7a33\u5065\u6027\u7684\u7814\u7a76\u5f80\u5f80\u5c40\u9650\u4e8e\u6709\u9650\u7684\u566a\u58f0\u7c7b\u578b\uff0c\u8fd9\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u68c0\u7d22\u73af\u5883\u4e0d\u7b26\uff0c\u9650\u5236\u4e86\u5b9e\u9645\u5e94\u7528\u3002\u672c\u7814\u7a76\u9996\u5148\u63a2\u8ba8\u4e86\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u5c06\u5176\u5206\u4e3a\u4e09\u79cd\u4e0d\u540c\u7684\u7c7b\u522b\uff0c\u53cd\u6620\u771f\u5b9e\u73af\u5883\u3002\u6211\u4eec\u5206\u6790\u4e86\u8fd9\u4e9b\u4e0d\u540c\u7c7b\u578b\u7684\u68c0\u7d22\u566a\u58f0\u5bf9LLMs\u7a33\u5065\u6027\u7684\u5f71\u54cd\u3002 
\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684RAG\u65b9\u6cd5\uff0c\u79f0\u4e3a\u68c0\u7d22\u589e\u5f3a\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\uff08RAAT\uff09\u3002RAAT\u5229\u7528\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\u6765\u52a8\u6001\u8c03\u6574\u6a21\u578b\u7684\u8bad\u7ec3\u6d41\u7a0b\u4ee5\u5e94\u5bf9\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u91c7\u7528\u591a\u4efb\u52a1\u5b66\u4e60\u786e\u4fdd\u6a21\u578b\u80fd\u591f\u8bc6\u522b\u5608\u6742\u7684\u4e0a\u4e0b\u6587\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u5404\u79cd\u566a\u58f0\u6761\u4ef6\u4e0b\uff0c\u4f7f\u7528RAAT\u8bad\u7ec3\u7684LLaMA-2 7B\u6a21\u578b\u5728F1\u548cEM\u5206\u6570\u4e0a\u663e\u793a\u51fa\u663e\u8457\u63d0\u5347\u3002\u4e3a\u4e86\u4fbf\u4e8e\u590d\u73b0\uff0c\u6211\u4eec\u5df2\u5728https://github.com/calubkk/RAAT\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u3002|\n", "2405.20974": "|**2024-05-31**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu 
et.al.|[2405.20974](http://arxiv.org/abs/2405.20974)|**[link](https://github.com/xu1868/sayself)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u51c6\u786e\u6216\u865a\u5047\u7684\u4fe1\u606f\uff0c\u5e76\u4e14\u901a\u5e38\u65e0\u6cd5\u8868\u660e\u5176\u4fe1\u5fc3\u6c34\u5e73\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u76f4\u63a5\u63d0\u793a\u6216\u81ea\u6211\u4e00\u81f4\u6027\u63d0\u793a\u6765\u63d0\u53d6LLMs\u7684\u4fe1\u5fc3\uff0c\u6216\u8005\u6784\u5efa\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\u3002\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6027\u80fd\u8f83\u5dee\uff0c\u800c\u57fa\u4e8e\u8bad\u7ec3\u7684\u65b9\u6cd5\u53c8\u5c40\u9650\u4e8e\u4e8c\u5143\u6216\u4e0d\u7cbe\u786e\u7684\u6574\u4f53\u4fe1\u5fc3\u4f30\u8ba1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5148\u8fdb\u7684\u65b9\u6cd5\u2014\u2014SaySelf\uff0c\u8fd9\u662f\u4e00\u4e2a\u8bad\u7ec3\u6846\u67b6\uff0c\u65e8\u5728\u6559\u5bfcLLMs\u63d0\u4f9b\u66f4\u7cbe\u786e\u7684\u7ec6\u7c92\u5ea6\u4fe1\u5fc3\u4f30\u8ba1\u3002 
\u6b64\u5916\uff0cSaySelf\u8fd8\u63a8\u52a8LLMs\u751f\u6210\u81ea\u6211\u53cd\u601d\u7684\u89e3\u91ca\uff0c\u660e\u786e\u6307\u51fa\u5b83\u4eec\u5728\u53c2\u6570\u77e5\u8bc6\u4e0a\u7684\u7a7a\u767d\u5e76\u89e3\u91ca\u4e0d\u786e\u5b9a\u6027\u3002\u8fd9\u662f\u901a\u8fc7\u8ba9LLM\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u81ea\u52a8\u603b\u7ed3\u7279\u5b9a\u77e5\u8bc6\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u6765\u5b9e\u73b0\u7684\u3002\u8fd9\u79cd\u603b\u7ed3\u662f\u57fa\u4e8e\u5bf9\u591a\u4e2a\u91c7\u6837\u63a8\u7406\u94fe\u7684\u4e0d\u4e00\u81f4\u6027\u5206\u6790\uff0c\u751f\u6210\u7684\u6570\u636e\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u6821\u51c6\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5f3a\u5316\u5b66\u4e60\uff0c\u5956\u52b1\u51c6\u786e\u3001\u9ad8\u7f6e\u4fe1\u5ea6\u7684\u9884\u6d4b\uff0c\u540c\u65f6\u60e9\u7f5a\u9519\u8bef\u8f93\u51fa\u4e2d\u7684\u8fc7\u5ea6\u81ea\u4fe1\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u5728\u5206\u5e03\u5185\u8fd8\u662f\u5206\u5e03\u5916\u7684\u6570\u636e\u96c6\u4e0a\uff0cSaySelf\u90fd\u80fd\u6709\u6548\u51cf\u5c11\u4fe1\u5fc3\u6821\u51c6\u8bef\u5dee\uff0c\u540c\u65f6\u4fdd\u6301\u4efb\u52a1\u6027\u80fd\u3002\u751f\u6210\u7684\u81ea\u6211\u53cd\u601d\u7406\u7531\u4e5f\u88ab\u8bc1\u660e\u662f\u5408\u7406\u7684\uff0c\u80fd\u8fdb\u4e00\u6b65\u4fc3\u8fdb\u6821\u51c6\u3002\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1a\\url{https://github.com/xu1868/SaySelf}\u3002**|\n", "2405.20973": "|**2024-05-31**|**LCQ: Low-Rank Codebook based Quantization for Large Language Models**|Wen-Pu Cai et.al.|[2405.20973](http://arxiv.org/abs/2405.20973)|null|## \u80cc\u666f 
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4f18\u5f02\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u7684\u5b58\u50a8\u548c\u8ba1\u7b97\u6210\u672c\u9ad8\u6210\u4e3a\u90e8\u7f72\u7684\u4e00\u5927\u6311\u6218\u3002\u4e3a\u4e86\u538b\u7f29\u6a21\u578b\u5e76\u964d\u4f4e\u6210\u672c\uff0c\u6743\u91cd\u91cf\u5316\u6280\u672f\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u76ee\u524d\uff0c\u5927\u591a\u6570\u9488\u5bf9LLMs\u7684\u91cf\u5316\u65b9\u6cd5\u4f7f\u7528\u79e9\u4e00\u7801\u672c\uff0c\u7136\u800c\u5728\u9ad8\u538b\u7f29\u6bd4\u4e0b\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u7cbe\u5ea6\u635f\u5931\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6743\u91cd\u91cf\u5316\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4f4e\u79e9\u7801\u672c\u91cf\u5316\uff08LCQ\uff09\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002 ## \u65b9\u6cd5 LCQ\u91c7\u7528\u4f4e\u79e9\u7801\u672c\u8fdb\u884c\u91cf\u5316\uff0c\u5176\u79e9\u53ef\u4ee5\u5927\u4e8e\u4e00\u3002\u8fd9\u79cd\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u5229\u7528\u66f4\u9ad8\u7684\u79e9\u6765\u4fdd\u6301\u6216\u63d0\u5347\u6a21\u578b\u7684\u7cbe\u5ea6\uff0c\u540c\u65f6\u63a7\u5236\u989d\u5916\u7684\u5b58\u50a8\u5f00\u9500\u51e0\u4e4e\u4e3a\u96f6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u6bd4\uff0cLCQ\u5728\u4fdd\u6301\u826f\u597d\u51c6\u786e\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u80fd\u591f\u5b9e\u73b0\u66f4\u4f18\u7684\u538b\u7f29\u6548\u679c\u3002 ## \u7ed3\u8bba 
\u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4f4e\u79e9\u7801\u672c\u91cf\u5316\u65b9\u6cd5\uff0c\u5b83\u6709\u671b\u5728\u4e0d\u663e\u8457\u589e\u52a0\u5b58\u50a8\u6210\u672c\u7684\u60c5\u51b5\u4e0b\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u6027\u80fd\u548c\u6548\u7387\uff0c\u4e3a\u9ad8\u6548\u90e8\u7f72\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u65b0\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550](http://arxiv.org/abs/2406.02550)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|**\u8fd9\u7bc7\u5de5\u4f5c\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7ec4\u6a21\u5757\u5316\u7b97\u672f\u4efb\u52a1\u4e2d\u51fa\u73b0\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u548c\u6280\u80fd\u7ec4\u5408\u73b0\u8c61\u3002\u6211\u4eec\u5173\u6ce8\u7684\u662f\u6709\u9650\u6570\u91cf\u7684\u4e00\u6b21\u6027\u6a21\u8fd0\u7b97\u51fd\u6570 $z = a \\times x + b \\times y \\;(\\text{mod}\\; p)$\uff0c\u8fd9\u4e9b\u51fd\u6570\u7531\u5411\u91cf $(a, b) \\in \\mathbb{Z}_p^2$ 
\u6807\u8bb0\u3002\u90e8\u5206\u4efb\u52a1\u88ab\u7528\u4f5c\u9884\u8bad\u7ec3\uff0c\u5176\u4f59\u7528\u4e8e\u5206\u5e03\u5916\u6d4b\u8bd5\u3002\u5b9e\u9a8c\u8868\u660e\uff0cGPT\u98ce\u683c\u7684Transformer\u968f\u7740\u9884\u8bad\u7ec3\u4efb\u52a1\u6570\u91cf\u589e\u52a0\uff0c\u5176\u5728\u5206\u5e03\u5185\u548c\u5206\u5e03\u5916\u7684\u6cdb\u5316\u80fd\u529b\u4f1a\u7ecf\u5386\u8f6c\u53d8\u3002\u6700\u5c0f\u578b\u80fd\u5b9e\u73b0\u5206\u5e03\u5916\u6cdb\u5316\u7684\u6a21\u578b\u9700\u8981\u4e24\u4e2aTransformer\u5757\uff1b\u800c\u5bf9\u4e8e\u66f4\u6df1\u7684\u6a21\u578b\uff0c\u5206\u5e03\u5916\u6cdb\u5316\u9636\u6bb5\u662f\u201c\u77ac\u6001\u201d\u7684\uff0c\u9700\u8981\u65e9\u671f\u505c\u6b62\u3002\u6700\u540e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\u8fdb\u884c\u4e86\u53ef\u89e3\u91ca\u6027\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e24\u79cd\u9636\u6bb5\u4e2d\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8868\u793a\uff0c\u5e76\u8ba8\u8bba\u4e86\u5b66\u4e60\u5230\u7684\u7b97\u6cd5\u3002**|\n", "2406.02547": "|**2024-06-04**|**Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning**|Alex Jinpeng Wang et.al.|[2406.02547](http://arxiv.org/abs/2406.02547)|**[link](https://github.com/showlab/VisInContext)**|**\u8fd9\u6bb5\u7814\u7a76\u5e76\u672a\u4ecb\u7ecd\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u800c\u662f\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u63d0\u5347\u957f\u5e8f\u5217\u5728\u591a\u6a21\u6001\u6a21\u578b\u4e2d\u7684\u5904\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201cVisualized In-Context Text 
Processing\u201d\uff08VisInContext\uff09\u6280\u672f\uff0c\u901a\u8fc7\u89c6\u89c9\u4ee4\u724c\u6765\u5904\u7406\u957f\u6587\u672c\uff0c\u4ece\u800c\u663e\u8457\u964d\u4f4eGPU\u5185\u5b58\u4f7f\u7528\u548c\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u9700\u6c42\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8e\u4e00\u4e2a560\u4ebf\u53c2\u6570\u7684\u6df7\u5408 Experts\uff08MOE\uff09\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5c06\u9884\u8bad\u7ec3\u4e2d\u7684\u4e0a\u4e0b\u6587\u6587\u672c\u957f\u5ea6\u6269\u5c55\u5230\u4e862048\u4e2atokens\uff0c\u800c\u8ba1\u7b97\u91cf\u51e0\u4e4e\u4fdd\u6301\u4e0d\u53d8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528VisInContext\u8bad\u7ec3\u7684\u6a21\u578b\u5728\u5e38\u89c1\u7684\u57fa\u4e8e\u5b9e\u4f8b\u7684\u5c11\u91cf\u6570\u636e\u8bc4\u4f30\u4e0b\u6e38\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0cVisInContext\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u7ed3\u5408\uff0c\u80fd\u589e\u5f3a\u5bf9\u6587\u6863\u7684\u7406\u89e3\u80fd\u529b\uff0c\u7279\u522b\u9002\u7528\u4e8e\u6587\u6863\u95ee\u7b54\u548c\u8fde\u7eed\u6587\u6863\u68c0\u7d22\uff0c\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002**|\n", "2406.02543": "|**2024-06-04**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori 
et.al.|[2406.02543](http://arxiv.org/abs/2406.02543)|null|\u6211\u4eec\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u91cf\u5316\uff0c\u76ee\u6807\u662f\u8bc6\u522b\u5bf9\u7ed9\u5b9a\u67e5\u8be2\u7684\u54cd\u5e94\u65f6\u7684\u4e0d\u786e\u5b9a\u6027\u7a0b\u5ea6\u3002\u6211\u4eec\u540c\u65f6\u8003\u8651\u4e86\u4e24\u79cd\u7c7b\u578b\u7684\u4e0d\u786e\u5b9a\u6027\uff1a\u4e00\u79cd\u662f\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\uff08\u4f8b\u5982\u5bf9\u4e8b\u5b9e\u6216\u8bed\u8a00\u771f\u7406\u7684\u672a\u77e5\uff09\uff0c\u53e6\u4e00\u79cd\u662f\u4e0d\u53ef\u6d88\u9664\u7684\u968f\u673a\u6027\uff08\u5982\u53ef\u80fd\u7684\u7b54\u6848\u591a\u6837\u6027\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4fe1\u606f\u8bba\u6307\u6807\uff0c\u80fd\u591f\u53ef\u9760\u5730\u533a\u5206\u51fa\u53ea\u6709\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\u8f83\u5927\u7684\u60c5\u51b5\uff0c\u8fd9\u65f6\u6a21\u578b\u7684\u8f93\u51fa\u662f\u4e0d\u53ef\u9760\u7684\u3002\u8fd9\u4e2a\u6761\u4ef6\u4ec5\u4f9d\u8d56\u4e8e\u901a\u8fc7\u7279\u6b8a\u8fed\u4ee3\u63d0\u793a\u57fa\u4e8e\u5148\u524d\u54cd\u5e94\u5f97\u5230\u7684\u6a21\u578b\u8f93\u51fa\u6765\u8ba1\u7b97\u3002\u8fd9\u79cd\u91cf\u5316\u65b9\u6cd5\u53ef\u4ee5\u68c0\u6d4b\u5355\u7b54\u548c\u591a\u7b54\u60c5\u51b5\u4e0b\u662f\u5426\u5b58\u5728\u865a\u6784\uff08\u5373\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\u9ad8\uff09\u7684\u60c5\u51b5\uff0c\u8fd9\u4e0e\u8bb8\u591a\u6807\u51c6\u7684\u4e0d\u786e\u5b9a\u6027\u91cf\u5316\u7b56\u7565\uff08\u5982\u4ee5\u54cd\u5e94\u7684\u5bf9\u6570\u4f3c\u7136\u6027\u4f5c\u4e3a\u9608\u503c\uff09\u4e0d\u540c\uff0c\u540e\u8005\u65e0\u6cd5\u8bc6\u522b\u591a\u7b54\u60c5\u51b5\u4e0b\u7684\u865a\u6784\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u7cfb\u5217\u5b9e\u9a8c\uff0c\u5c55\u793a\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86LLM\u5982\u4f55\u901a\u8fc7\u8fed\u4ee3\u63d0\u793a\u653e\u5927\u5bf9\u7ed9\u5b9a\u8f93\u51fa\u7684\u6982\u7387\u5206\u914d\uff0c\u8fd9\u53ef\u80fd\u5177\u6709\u72ec\u7acb\u7684\u5174\u8da3\u4ef7\u503c\u3002|\n", "2406.02542": "|**2024-06-04**|**Loki: Low-Rank Keys for Efficient Sparse Attention**|Prajwal Singhania et.al.|[2406.02542](http://arxiv.org/abs/2406.02542)|null|\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\uff0c\u7279\u522b\u662f\u5f53\u4f7f\u7528\u957f\u5e8f\u5217\u65f6\uff0c\u81ea\u6ce8\u610f\u529b\u673a\u5236\u662f\u4e3b\u8981\u5f00\u9500\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u4e9b\u7a00\u758f\u6ce8\u610f\u529b\u8fd1\u4f3c\u65b9\u6cd5\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5206\u6790\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u5757\u4e2d\u7684\u952e\u5411\u91cf\u5b9e\u9645\u4e0a\u5904\u4e8e\u4e00\u4e2a\u8fdc\u4f4e\u4e8e\u539f\u59cb\u7ef4\u5ea6\u7684\u7a7a\u95f4\u3002\u8fd9\u4e00\u89c2\u5bdf\u4fc3\u4f7f\u6211\u4eec\u63d0\u51faLoki\uff0c\u4e00\u79cd\u65b0\u7684\u7a00\u758f\u6ce8\u610f\u529b\u65b9\u6cd5\u3002Loki\u6839\u636e\u5728\u4f4e\u7ef4\u7a7a\u95f4\u8ba1\u7b97\u7684\u6ce8\u610f\u529b\u5f97\u5206\uff0c\u5bf9KV\u7f13\u5b58\u4e2d\u7684\u4ee4\u724c\u8fdb\u884c\u6392\u5e8f\u548c\u9009\u62e9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLoki\u80fd\u591f\u6bd4\u5176\u4ed6\u6d41\u884c\u8fd1\u4f3c\u65b9\u6cd5\u66f4\u597d\u5730\u4fdd\u6301\u6a21\u578b\u7684\u6548\u80fd\uff0c\u540c\u65f6\u7531\u4e8e\u51cf\u5c11\u4e86\u6570\u636e\u79fb\u52a8\uff08\u52a0\u8f7d/\u5b58\u50a8\uff09\u548c\u8ba1\u7b97\u6210\u672c\uff0c\u52a0\u901f\u4e86\u6ce8\u610f\u529b\u8ba1\u7b97\u3002|\n", "2406.02539": "|**2024-06-04**|**Parrot: 
Multilingual Visual Instruction Tuning**|Hai-Long Sun et.al.|[2406.02539](http://arxiv.org/abs/2406.02539)|null|\u968f\u7740GPT-4V\u7b49\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u4eba\u5de5\u667a\u80fd\u671d\u7740\u901a\u7528\u4eba\u5de5\u667a\u80fd\u8fc8\u51fa\u4e86\u91cd\u8981\u4e00\u6b65\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6765\u540c\u6b65\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u800c\u8d4b\u4e88\u5b83\u4eec\u591a\u6a21\u6001\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u505a\u6cd5\u53ef\u80fd\u5bfc\u81f4\u968f\u7740\u8bad\u7ec3\u7684\u8fdb\u884c\uff0c\u8bed\u8a00\u6a21\u578b\u5904\u7406\u591a\u79cd\u8bed\u8a00\u7684\u80fd\u529b\u9010\u6e10\u51cf\u5f31\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684\u4e0d\u5e73\u8861SFT\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u975e\u82f1\u8bed\u8bed\u8a00\u6027\u80fd\u663e\u8457\u4e0b\u964d\uff0c\u539f\u56e0\u5728\u4e8eSFT\u8fc7\u7a0b\u4e2d\u672a\u80fd\u6709\u6548\u8fde\u63a5\u89c6\u89c9\u7f16\u7801\u5668\u548c\u591a\u8bed\u8a00\u4ee4\u724c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faParrot\uff0c\u4e00\u79cd\u5229\u7528\u6587\u672c\u5f15\u5bfc\u5728\u8bed\u8a00\u5c42\u9762\u9a71\u52a8\u89c6\u89c9\u4ee4\u724c\u5bf9\u9f50\u7684\u65b0\u65b9\u6cd5\u3002Parrot\u901a\u8fc7\u8ba9\u89c6\u89c9\u4ee4\u724c\u6839\u636e\u4e0d\u540c\u7684\u8bed\u8a00\u8f93\u5165\u8fdb\u884c\u6761\u4ef6\u5316\uff0c\u5e76\u501f\u52a9\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u4fc3\u8fdb\u591a\u8bed\u8a00\u4ee4\u724c\u7684\u5bf9\u9f50\u3002\u7279\u522b\u662f\uff0c\u4e3a\u4e86\u589e\u5f3a\u975e\u82f1\u8bed\u89c6\u89c9\u4ee4\u724c\u7684\u5bf9\u9f50\uff0c\u6211\u4eec\u8ba1\u7b97\u521d\u59cb\u89c6\u89c9\u7279\u5f81\u4e0e\u6587\u672c\u5d4c\u5165\u4e4b\u95f4\u7684\u8de8\u6ce8\u610f\u529b\uff0c\u7136\u540e\u5c06\u5176\u8f93\u5165\u5230MoE\u8def\u7531\u5668\uff0c\u9009\u62e9\u6700\u76f8\u5173\
u7684\u4e13\u5bb6\u3002\u9009\u5b9a\u7684\u4e13\u5bb6\u4f1a\u5c06\u521d\u59cb\u89c6\u89c9\u4ee4\u724c\u8f6c\u5316\u4e3a\u7279\u5b9a\u8bed\u8a00\u7684\u89c6\u89c9\u4ee4\u724c\u3002\u9274\u4e8e\u76ee\u524d\u7f3a\u4e4f\u8bc4\u4f30\u591a\u8bed\u8a00\u80fd\u529b\u7684\u6807\u51c6\u57fa\u51c6\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u5e76\u516c\u5f00\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u591a\u6a21\u6001\u57fa\u51c6\uff08MMMB\uff09\uff0c\u5305\u62ec6\u79cd\u8bed\u8a00\u300115\u4e2a\u7c7b\u522b\u548c12,000\u4e2a\u95ee\u9898\u3002Parrot\u4e0d\u4ec5\u5728MMMB\u548cMMM Benchmark\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8fd8\u5728\u5e7f\u6cdb\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u5c06\u63d0\u4f9bParrot\u7684\u6e90\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\u96c6\u4f9b\u516c\u4f17\u4f7f\u7528\u3002|\n", "2406.02536": "|**2024-06-04**|**Mitigate Position Bias in Large Language Models via Scaling a Single Dimension**|Yijiong Yu 
et.al.|[2406.02536](http://arxiv.org/abs/2406.02536)|**[link](https://github.com/PositionalHidden/PositionalHidden)**|\u8fd9\u7bc7\u8bba\u6587\u4e3b\u8981\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u4e00\u4e2a\u73b0\u8c61\u2014\u2014\u4f4d\u7f6e\u504f\u89c1\uff0c\u4e5f\u79f0\u4e3a\"\u8ff7\u5931\u5728\u4e2d\u95f4\"\u3002\u8fd9\u79cd\u504f\u89c1\u5728\u957f\u6587\u672c\u60c5\u5883\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u5373\u5173\u952e\u4fe1\u606f\u5728\u63d0\u793a\u4e2d\u7684\u4e0d\u540c\u4f4d\u7f6e\u4f1a\u663e\u8457\u5f71\u54cd\u6a21\u578b\u7684\u51c6\u786e\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u6743\u91cd\u662f\u4f4d\u7f6e\u504f\u89c1\u7684\u5fae\u89c2\u8868\u73b0\u3002\u6b64\u5916\uff0c\u8bba\u6587\u6307\u51fa\uff0c\u56e0\u679c\u6ce8\u610f\u529b\u63a9\u7801\u901a\u8fc7\u521b\u5efa\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\uff0c\u4e5f\u5bf9\u4f4d\u7f6e\u504f\u89c1\u6709\u6240\u8d21\u732e\u3002 
\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u51cf\u8f7b\u4f4d\u7f6e\u504f\u89c1\uff0c\u5373\u8c03\u6574\u8fd9\u4e9b\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\u3002\u5b9e\u9a8c\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\uff0c\u5305\u62ec\u81ea\u7136\u95ee\u9898\u591a\u6587\u6863\u95ee\u7b54\u3001\u952e\u503c\u68c0\u7d22\u3001LongBench\u548c\u65f6\u95f4\u7ebf\u91cd\u6392\uff0c\u6d89\u53caRoPE\u6a21\u578b\u3001\u6269\u5c55\u4e0a\u4e0b\u6587\u7a97\u53e3\u6a21\u578b\u548cAlibi\u6a21\u578b\u7b49\u591a\u79cd\u67b6\u6784\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4ec5\u4fee\u6539\u9690\u85cf\u72b6\u6001\u7684\u4e00\u4e2a\u7ef4\u5ea6\uff0c\u5c31\u80fd\u5b9e\u73b0\u6027\u80fd\u63d0\u5347\uff0c\u6700\u9ad8\u53ef\u8fbe15.2%\u3002\u7814\u7a76\u8005\u8fd8\u63d0\u4f9b\u4e86\u4ee3\u7801\u4f9b\u8fdb\u4e00\u6b65\u4f7f\u7528\uff0c\u4ee3\u7801\u5730\u5740\u4e3a\uff1ahttps://aka.ms/PositionalHidden\u3002|\n", "2406.02532": "|**2024-06-04**|**SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices**|Ruslan Svirschevski 
et.al.|[2406.02532](http://arxiv.org/abs/2406.02532)|**[link](https://github.com/yandex-research/specexec)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u9ad8\u6548\u8fd0\u884c\u5b83\u4eec\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u63a8\u6d4b\u6027\u89e3\u7801\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u901f\u5ea6\u63d0\u5347\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u662f\u9488\u5bf9\u6570\u636e\u4e2d\u5fc3\u786c\u4ef6\u8fdb\u884c\u8bbe\u8ba1\u3002\u672c\u7814\u7a76\u53cd\u95ee\uff1a\u6211\u4eec\u80fd\u5728\u6d88\u8d39\u7ea7\u8bbe\u5907\u4e0a\u591a\u5feb\u5730\u8fd0\u884cLLMs\uff1f\u6d88\u8d39\u8005\u7ea7GPU\u5df2\u65e0\u6cd5\u5bb9\u7eb3\u6700\u5927\u7684\u6a21\u578b\uff08500\u4ebf\u53c2\u6570\u4ee5\u4e0a\uff09\uff0c\u56e0\u6b64\u9700\u8981\u5c06\u53c2\u6570\u5378\u8f7d\u5230RAM\u6216SSD\u3002\u5f53\u4f7f\u7528\u5378\u8f7d\u53c2\u6570\u7684\u65b9\u5f0f\u8fd0\u884c\u65f6\uff0c\u63a8\u7406\u5f15\u64ce\u53ef\u4ee5\u540c\u65f6\u5904\u7406\u6570\u767e\u4e43\u81f3\u6570\u5343\u4e2a\u4ee4\u724c\u7684\u6279\u6b21\uff0c\u4f7f\u5176\u975e\u5e38\u9002\u5408\u63a8\u6d4b\u6027\u89e3\u7801\u3002\u6211\u4eec\u63d0\u51faSpecExec\uff08\u63a8\u6d4b\u6027\u6267\u884c\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u7684\u5e76\u884c\u89e3\u7801\u65b9\u6cd5\uff0c\u9002\u7528\u4e8e\u4e3b\u6d41LLM\u5bb6\u65cf\uff0c\u80fd\u751f\u6210\u6bcf\u8f6e\u76ee\u6807\u6a21\u578b\u8fed\u4ee3\u9ad8\u8fbe20\u4e2a\u4ee4\u724c\u7684\u9884\u6d4b\u3002\u5b83\u5229\u7528\u73b0\u4ee3LLMs\u4e2d\u6982\u7387\u5206\u5e03\u7684\u9ad8\u6ce2\u52a8\u6027\u548c\u6a21\u578b\u8f93\u51fa\u6982\u7387\u4e4b\u95f4\u7684\u9ad8\u5ea6\u4e00\u81f4\u6027\u3002SpecExec\u901a\u8fc7\u4ece\u8349\u7a3f\u6a21\u578b\u83b7\u53d6\u6700\u53ef\u80fd\u7684\u4ee4\u724c\u5ef6\u7eed\uff0c\u6784\u5efa\u4e00\u4e2a\u76ee\u6807\u6a21\u578b\u7684\u201c\u7f13\u5b58\u201d\u6811\uff0c\u7136\u540e\u5728\u4e00\u4e2a\u5355\u6b21\u904d\u5386\u4e2d\u9a8c\u8bc1\u
3002 \u4f7f\u7528SpecExec\uff0c\u6211\u4eec\u5728\u6d88\u8d39\u7ea7GPU\u4e0a\u5b9e\u73b0\u4e86500\u4ebf\u53c2\u6570LLM\u7684\u63a8\u7406\uff0c\u914d\u5408RAM\u5378\u8f7d\uff0c4\u4f4d\u91cf\u5316\u4e0b\u7684\u901f\u5ea6\u8fbe\u52304-6\u4e2a\u4ee4\u724c/\u79d2\uff0c\u800c16\u4f4d\u6743\u91cd\u4e0b\u7684\u901f\u5ea6\u4e3a2-3\u4e2a\u4ee4\u724c/\u79d2\u3002|\n", "2406.02528": "|**2024-06-04**|**Scalable MatMul-free Language Modeling**|Rui-Jie Zhu et.al.|[2406.02528](http://arxiv.org/abs/2406.02528)|**[link](https://github.com/ridgerchu/matmulfreellm)**|**## \u7ffb\u8bd1 \u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\uff0c\u77e9\u9635\u4e58\u6cd5\uff08MatMul\uff09\u901a\u5e38\u5360\u636e\u4e3b\u8981\u8ba1\u7b97\u5f00\u9500\u3002\u968f\u7740LLMs\u7684\u89c4\u6a21\u6269\u5927\uff0c\u5176\u5d4c\u5165\u7ef4\u5ea6\u548c\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e5f\u968f\u4e4b\u589e\u52a0\uff0c\u8fd9\u4e00\u95ee\u9898\u66f4\u4e3a\u663e\u8457\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u80fd\u591f\u5728\u4fdd\u6301\u5f3a\u5927\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5b8c\u5168\u79fb\u9664LLMs\u4e2d\u7684MatMul\u64cd\u4f5c\uff0c\u5373\u4f7f\u662f\u572827\u4ebf\u53c2\u6570\u91cf\u7ea7\u7684\u6a21\u578b\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65e0MatMul\u6a21\u578b\u5728\u4e0e\u5185\u5b58\u6d88\u8017\u663e\u8457\u66f4\u591a\u7684\u72b6\u6001-of-the-artTransformer\u76f8\u5f53\u7684\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u7814\u7a76\u4e86\u6a21\u578b\u7684\u6269\u5c55\u6027\u89c4\u5f8b\uff0c\u5e76\u53d1\u73b0\u65e0MatMul\u6a21\u578b\u4e0e\u5168\u7cbe\u5ea6Transformer\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u968f\u7740\u6a21\u578b\u5c3a\u5bf8\u589e\u5927\u800c\u51cf\u5c0f\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684GPU\u5b9e\u73b0\uff0c\u76f8\u8f83\u4e8e\u672a\u4f18\u5316\u7684\u57fa\u7ebf\uff0c\u8bad\u7ec3\u65f6\u80fd\u51cf\u5c11\u9ad8\u8fbe61%\u7684\u5185\u5b58\u4f7f\u7528\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u901a\u8fc7\u4f18\u5316\u7684\u5185\u6838\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5185\u5b58\u6d88\u8017\u53ef\u964d\u4f4e\u8d85\u8fc710\u500d\u3002\u4e3a\u4e86\u51c6\u786e\u8bc4\u4f30\u67b6\u6784\u6548\u7387\uff0c\u6211\u4eec\u5728FPGA\u4e0a\u6784\u5efa\u4e86\u5b9a\u5236\u786c\u4ef6\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528GPU\u65e0\u6cd5\u5904\u7406\u7684\u8f7b\u91cf\u7ea7\u8fd0\u7b97\uff0c\u5b9e\u73b0\u4e86\u5bf9\u5341\u4ebf\u53c2\u6570\u89c4\u6a21\u6a21\u578b\u7684\u9ad8\u901f\u5904\u7406\uff0c\u4f7f\u5176\u63a5\u8fd1\u4eba\u8111\u7ea7\u522b\u7684\u6548\u7387\u3002 \u8fd9\u9879\u5de5\u4f5c\u4e0d\u4ec5\u5c55\u793a\u4e86LLMs\u5728\u51cf\u5c0f\u590d\u6742\u6027\u540e\u4ecd\u80fd\u4fdd\u6301\u9ad8\u6548\uff0c\u8fd8\u6307\u51fa\u4e86\u672a\u6765\u52a0\u901f\u5668\u5e94\u4f18\u5316\u7684\u8fd0\u7b97\u7c7b\u578b\uff0c\u4ee5\u9002\u5e94\u4e0b\u4e00\u4ee3\u8f7b\u91cf\u7ea7LLMs\u7684\u9700\u6c42\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5b9e\u73b0\u5df2\u5f00\u6e90\u81f3\uff1a\\url{https://github.com/ridgerchu/matmulfreellm}\u3002**|\n", "2406.02524": "|**2024-06-04**|**CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks**|Maciej Besta 
et.al.|[2406.02524](http://arxiv.org/abs/2406.02524)|**[link](https://github.com/spcl/checkembed)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u53d8\u9769\uff0c\u4f46\u9a8c\u8bc1\u5176\u7b54\u6848\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u5f00\u653e\u6027\u7684\u4efb\u52a1\uff0c\u5982\u77e5\u8bc6\u6574\u5408\u3001\u6458\u8981\u548c\u63d0\u53d6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCheckEmbed\u7684\u7cbe\u786e\u3001\u53ef\u6269\u5c55\u4e14\u7b80\u4fbf\u7684LLM\u9a8c\u8bc1\u65b9\u6cd5\u3002CheckEmbed\u7684\u6838\u5fc3\u7406\u5ff5\u662f\uff1a\u901a\u8fc7\u5229\u7528\u5982GPT\u6587\u672c\u5d4c\u5165\u5927\u6a21\u578b\u83b7\u53d6\u7684\u7b54\u6848\u7ea7\u5d4c\u5165\u6765\u6bd4\u8f83LLM\u7684\u56de\u7b54\u3002\u8fd9\u5c06\u590d\u6742\u7684\u6587\u672c\u7b54\u6848\u8f6c\u5316\u4e3a\u5355\u4e00\u7684\u5d4c\u5165\uff0c\u7b80\u5316\u4e86\u5bf9\u6bd4\u8fc7\u7a0b\uff0c\u5b9e\u73b0\u5feb\u901f\u800c\u6709\u610f\u4e49\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u9a8c\u8bc1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u5b9e\u73b0\u4e86CheckEmbed\u7684\u7406\u5ff5\uff0c\u5e76\u63d0\u4f9b\u4e86\u8bc4\u4f30LLM\u7b54\u6848\u771f\u5b9e\u6027\u7684\u5ea6\u91cf\uff0c\u5982\u5d4c\u5165\u70ed\u529b\u56fe\u53ca\u5176\u603b\u7ed3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6307\u6807\u8bbe\u8ba1\u5b9e\u9645\u7684\u5f15\u64ce\uff0c\u4ee5\u51b3\u5b9aLLM\u7b54\u6848\u662f\u5426\u4ee4\u4eba\u6ee1\u610f\u3002\u5728\u5b9e\u9645\u6587\u6863\u5206\u6790\u4efb\u52a1\u4e2d\uff0c\u5982\u672f\u8bed\u63d0\u53d6\u548c\u6587\u6863\u6458\u8981\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u663e\u8457\u7684\u51c6\u786e\u6027\u63d0\u5347\u3001\u6210\u672c\u6548\u76ca\u548c\u8fd0\u884c\u65f6\u95f4\u6027\u80fd\uff0c\u76f8\u8f83\u4e8eBERTScore\u6216SelfCheckGPT\u7b49\u57fa\u4e8etoken\u3001\u53e5\
u5b50\u548c\u4e8b\u5b9e\u7ea7\u522b\u7684\u65b9\u6848\u3002|\n", "2406.02523": "|**2024-06-04**|**RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots**|Soroush Nasiriany et.al.|[2406.02523](http://arxiv.org/abs/2406.02523)|null|## \u7ffb\u8bd1 \u4eba\u5de5\u667a\u80fd\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u89c4\u6a21\u7684\u6269\u5927\u3002\u7136\u800c\uff0c\u5728\u673a\u5668\u4eba\u9886\u57df\uff0c\u5927\u89c4\u6a21\u673a\u5668\u4eba\u6570\u636e\u96c6\u7684\u83b7\u53d6\u662f\u4e00\u4e2a\u74f6\u9888\u3002\u6211\u4eec\u4e3b\u5f20\u5229\u7528\u903c\u771f\u7684\u7269\u7406\u6a21\u62df\u6765\u63d0\u5347\u73af\u5883\u3001\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u89c4\u6a21\uff0c\u4ee5\u652f\u6301\u673a\u5668\u4eba\u5b66\u4e60\u65b9\u6cd5\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecdRoboCasa\uff0c\u8fd9\u662f\u4e00\u4e2a\u5927\u578b\u7684\u4eff\u771f\u6846\u67b6\uff0c\u65e8\u5728\u8bad\u7ec3\u80fd\u591f\u5728\u65e5\u5e38\u73af\u5883\u4e2d\u901a\u7528\u7684\u673a\u5668\u4eba\u3002RoboCasa\u7684\u7279\u70b9\u662f\u62e5\u6709\u4e30\u5bcc\u4e14\u591a\u6837\u5316\u7684\u53a8\u623f\u573a\u666f\uff0c\u5305\u62ec\u8d85\u8fc7150\u4e2a\u7c7b\u522b\u7684\u4e00\u5343\u591a\u4ef63D\u6a21\u578b\u8d44\u4ea7\u548c\u6570\u5341\u79cd\u53ef\u4ea4\u4e92\u7684\u5bb6\u5177\u548c\u7535\u5668\u3002 
\u6211\u4eec\u901a\u8fc7\u751f\u6210\u5f0fAI\u5de5\u5177\u8fdb\u4e00\u6b65\u589e\u5f3a\u6a21\u62df\u7684\u771f\u5b9e\u6027\u548c\u591a\u6837\u6027\uff0c\u5982\u4f7f\u7528\u6587\u672c\u52303D\u6a21\u578b\u7684\u6280\u672f\u751f\u6210\u5bf9\u8c61\u8d44\u4ea7\uff0c\u4ee5\u53ca\u901a\u8fc7\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u73af\u5883\u7eb9\u7406\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86100\u9879\u4efb\u52a1\uff0c\u5305\u62ec\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6307\u5bfc\u7684\u590d\u5408\u4efb\u52a1\uff0c\u7528\u4e8e\u7cfb\u7edf\u6027\u8bc4\u4f30\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5b66\u4e60\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u9ad8\u8d28\u91cf\u7684\u4eba\u7c7b\u6f14\u793a\uff0c\u5e76\u7ed3\u5408\u81ea\u52a8\u8f68\u8ff9\u751f\u6210\u65b9\u6cd5\uff0c\u4ee5\u6700\u5c0f\u7684\u4eba\u529b\u6210\u672c\u5927\u5e45\u6269\u5145\u6570\u636e\u96c6\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4f7f\u7528\u5408\u6210\u751f\u6210\u7684\u673a\u5668\u4eba\u6570\u636e\u8fdb\u884c\u5927\u89c4\u6a21\u6a21\u4eff\u5b66\u4e60\u65f6\uff0c\u5b58\u5728\u660e\u663e\u7684\u89c4\u6a21\u6548\u5e94\uff0c\u5e76\u663e\u793a\u51fa\u5229\u7528\u6a21\u62df\u6570\u636e\u5728\u73b0\u5b9e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u5de8\u5927\u6f5c\u529b\u3002\u76f8\u5173\u89c6\u9891\u548c\u5f00\u6e90\u4ee3\u7801\u5df2\u5728https://robocasa.ai/\u7f51\u7ad9\u4e0a\u63d0\u4f9b\u3002|\n", "2406.03496": "|**2024-06-05**|**Wings: Learning Multimodal LLMs without Text-only Forgetting**|Yi-Kai Zhang et.al.|[2406.03496](http://arxiv.org/abs/2406.03496)|null|## \u4efb\u52a1 
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8d77\u6e90\u4e8e\u9884\u8bad\u7ec3\u7684\u901a\u7528\u8bed\u8a00\u6a21\u578b\uff0c\u9996\u5148\u5c06\u56fe\u50cf\u4e0e\u6587\u672c\u5bf9\u9f50\uff0c\u7136\u540e\u5728\u6df7\u5408\u6a21\u6001\u8f93\u5165\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0cMLLM\u5728\u5904\u7406\u4ec5\u5305\u542b\u6587\u672c\u7684\u6307\u4ee4\u65f6\u4f1a\u51fa\u73b0\u707e\u96be\u6027\u7684\u9057\u5fd8\uff0c\u8fd9\u4e9b\u6587\u672c\u6307\u4ee4\u5e76\u672a\u5305\u542b\u56fe\u50cf\uff0c\u8fd9\u4e9b\u95ee\u9898\u5728\u521d\u59cb\u7684\u8bed\u8a00\u6a21\u578b\u9636\u6bb5\u5c31\u5df2\u7ecf\u5b58\u5728\u3002\u672c\u6587\u63d0\u51faWings\uff0c\u4e00\u4e2a\u65b0\u578b\u7684MLLM\uff0c\u5b83\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u901a\u8fc7\u5206\u6790MLLM\u5728\u591a\u6a21\u6001\u6307\u4ee4\u4e2d\u7684\u6ce8\u610f\u529b\uff0c\u6211\u4eec\u53d1\u73b0\u6587\u672c\u9057\u5fd8\u4e0e\u4ece\u56fe\u50cf\u524d\u5411\u56fe\u50cf\u540e\u7684\u6ce8\u610f\u529b\u8f6c\u79fb\u6709\u5173\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u989d\u5916\u6a21\u5757\u4f5c\u4e3a\u589e\u5f3a\u5b66\u4e60\u5668\uff0c\u4ee5\u8865\u507f\u8fd9\u79cd\u6ce8\u610f\u529b\u8f6c\u79fb\u3002\u89c6\u89c9\u548c\u6587\u672c\u5b66\u4e60\u5668\u4f5c\u4e3a\u201c\u7fc5\u8180\u201d\u5f0f\u7684\u8865\u5145\uff0c\u5e73\u884c\u8fde\u63a5\u5728\u6bcf\u4e2a\u6ce8\u610f\u529b\u5757\u5185\uff0c\u8d77\u521d\u56fe\u50cf\u548c\u6587\u672c\u8f93\u5165\u7531\u89c6\u89c9\u5b66\u4e60\u5668\u4e0e\u4e3b\u6ce8\u610f\u529b\u534f\u540c\u5de5\u4f5c\uff0c\u5e73\u8861\u5bf9\u89c6\u89c9\u5143\u7d20\u7684\u5173\u6ce8\u3002\u968f\u540e\uff0c\u6587\u672c\u5b66\u4e60\u5668\u901a\u8fc7\u6ce8\u610f\u529b\u8def\u7531\u7684\u65b9\u5f0f\u4e0e\u89c6\u89c9\u5b66\u4e60\u5668\u7684\u8f93\u51fa\u534f\u4f5c\u6574\u5408\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4f4e\u79e9\u6b8b\u5dee\u6ce8\u610f\u529b\uff08LoRRA\uff09\u673a\u5236\u4ee
5\u4fdd\u8bc1\u5b66\u4e60\u5668\u7684\u9ad8\u6548\u8fd0\u884c\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cWings\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u540c\u7b49\u89c4\u6a21\u7684MLLM\u3002\u5728\u6211\u4eec\u65b0\u6784\u5efa\u7684\u4ea4\u9519\u56fe\u50cf-\u6587\u672c\uff08IIT\uff09\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cWings\u5728\u4ece\u6587\u672c\u4e3a\u4e3b\u5230\u591a\u6a21\u6001\u4e3a\u4e3b\u7684\u95ee\u7b54\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002|\n", "2406.03488": "|**2024-06-06**|**Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training**|Ao Sun et.al.|[2406.03488](http://arxiv.org/abs/2406.03488)|**[link](https://github.com/maydomine/seq1f1b)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5176\u4e2d\u7ba1\u9053\u5e76\u884c\u6027\u8d77\u7740\u5173\u952e\u4f5c\u7528\u3002\u968f\u7740LLMs\u7684\u8bad\u7ec3\u5e8f\u5217\u957f\u5ea6\u6269\u5c55\u523032k\u751a\u81f3128k\uff0c\u5f53\u524d\u7684\u7ba1\u9053\u5e76\u884c\u65b9\u6cd5\u9762\u4e34\u4e25\u91cd\u74f6\u9888\uff0c\u5982\u9ad8\u5185\u5b58\u5360\u7528\u548c\u663e\u8457\u7684\u7ba1\u9053\u5ef6\u8fdf\uff0c\u8fd9\u6781\u5927\u5730\u9650\u5236\u4e86\u6a21\u578b\u7684\u53ef\u6269\u5c55\u6027\u548c\u8bad\u7ec3\u541e\u5410\u91cf\u3002\u4e3a\u4e86\u63d0\u9ad8\u5185\u5b58\u6548\u7387\u548c\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u957f\u5e8f\u5217\u8bad\u7ec3LLMs\u7684\u9ad8\u6548\u5e8f\u5217\u7ea7\u4e00\u6b21\u524d\u5411\u4e00\u6b21\u540e\u5411\uff081F1B\uff09\u7ba1\u9053\u8c03\u5ea6\u65b9\u6cd5\uff0c\u79f0\u4e3aSeq1F1B\u3002Seq1F1B\u5c06\u6279\u7ea7\u522b\u53ef\u8c03\u5ea6\u5355\u5143\u5206\u89e3\u4e3a\u66f4\u7ec6\u7684\u5e8f\u5217\u7ea7\u5355\u5143\uff0c\u4ece\u800c\u51cf\u5c0f\u5ef6\u8fdf\u5e76\u964d\u4f4e\u5185\u5b58\u9700\u6c42\u3002
 \u8003\u8651\u5230\u5982\u679c\u5747\u5300\u5206\u5272\u5e8f\u5217\uff0cSeq1F1B\u53ef\u80fd\u4f1a\u4ea7\u751f\u8f7b\u5fae\u7684\u989d\u5916\u5ef6\u8fdf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8e\u8ba1\u7b97\u7684\u7b56\u7565\u6765\u5212\u5206\u8f93\u5165\u5e8f\u5217\uff0c\u4ee5\u7f13\u89e3\u8fd9\u4e2a\u526f\u4f5c\u7528\u3002\u4e0e\u7ade\u4e89\u6027\u7684\u7ba1\u9053\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5982Megatron\u76841F1B\u7ba1\u9053\u5e76\u884c\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4fdd\u6301\u66f4\u9ad8\u8bad\u7ec3\u541e\u5410\u91cf\u7684\u540c\u65f6\uff0c\u5185\u5b58\u5360\u7528\u66f4\u4f4e\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cSeq1F1B\u80fd\u591f\u5728\u4e0d\u4f7f\u7528\u91cd\u65b0\u8ba1\u7b97\u7b56\u7565\u7684\u60c5\u51b5\u4e0b\uff0c\u6709\u6548\u5730\u572864\u4e2aNVIDIA A100 GPU\u4e0a\u8bad\u7ec3\u4e00\u4e2a\u5177\u6709300\u4ebf\u53c2\u6570\u7684LLM\uff0c\u5904\u7406\u957f\u8fbe64k\u7684\u5e8f\u5217\uff0c\u8fd9\u662f\u73b0\u6709\u65b9\u6cd5\u65e0\u6cd5\u5b9e\u73b0\u7684\u3002\u6211\u4eec\u7684\u4ee3\u7801\u57fa\u4e8eMegatron-LM\uff0c\u5e76\u5df2\u5f00\u6e90\uff1ahttps://github.com/MayDomine/Seq1F1B.git\u3002|\n", "2406.03487": "|**2024-06-05**|**Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends**|Sanjana Ramprasad et.al.|[2406.03487](http://arxiv.org/abs/2406.03487)|null|### \u7ffb\u8bd1 
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u6458\u8981\u751f\u6210\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5728\u771f\u5b9e\u6027\u65b9\u9762\u7684\u95ee\u9898\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u5c3d\u7ba1\u4e4b\u524d\u7684\u7814\u7a76\u5e7f\u6cdb\u8bc4\u4f30\u4e86\u65b0\u95fb\u9886\u57df\u7684LLMs\uff0c\u5bf9\u8bdd\u6458\u8981\u7684\u8bc4\u4ef7\u4e3b\u8981\u96c6\u4e2d\u5728\u57fa\u4e8eBART\u7684\u6a21\u578b\u4e0a\uff0c\u8fd9\u5728\u6211\u4eec\u7406\u89e3\u5b83\u4eec\u7684\u53ef\u4fe1\u5ea6\u65b9\u9762\u7559\u4e0b\u4e86\u7a7a\u767d\u3002\u672c\u7814\u7a76\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u5bf9\u8bdd\u6458\u8981\u4e2d\u7684\u771f\u5b9e\u6027\uff0c\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\uff0c\u5e76\u7740\u91cd\u4e8e\u8bc6\u522b\u548c\u5206\u7c7b\u53e5\u7ea7\u4e0d\u4e00\u81f4\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8GPT-4\u548cAlpaca-13B\u8fd9\u4e24\u6b3e\u4e3b\u6d41\u6a21\u578b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u63ed\u793a\u4e86\u9519\u8bef\u5b9a\u4e49\u7684\u5fae\u5999\u4e4b\u5904\uff1aLLMs\u5e38\u5e38\u751f\u6210\u770b\u4f3c\u5408\u7406\u7684\u63a8\u65ad\uff0c\u8fd9\u4e9b\u63a8\u65ad\u4f9d\u8d56\u4e8e\u5bf9\u8bdd\u4e2d\u7684\u95f4\u63a5\u8bc1\u636e\uff0c\u800c\u7f3a\u4e4f\u76f4\u63a5\u8bc1\u636e\uff0c\u8fd9\u5728\u65e7\u6a21\u578b\u4e2d\u8f83\u5c11\u89c1\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6539\u8fdb\u7684\u9519\u8bef\u5206\u7c7b\u4f53\u7cfb\uff0c\u5f15\u5165\u4e86\u201c\u60c5\u5883\u63a8\u7406\u201d\u7c7b\u522b\u6765\u5f52\u7c7b\u8fd9\u4e9bLLM\u884c\u4e3a\uff0c\u5e76\u516c\u5f00\u4e86\u76f8\u5173\u6570\u636e\u96c6\u3002\u5229\u7528\u6211\u4eec\u7684\u5206\u7c7b\u4f53\u7cfb\uff0c\u6211\u4eec\u6bd4\u8f83\u4e86LLMs\u4e0e\u8001\u5f0f\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u884c\u4e3a\u5dee\u5f02\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86\u81ea\u52a8\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\u5728LLM\u6458\u8981\u4e0a\u7684\u6548\u679c\uff0
c\u53d1\u73b0\u5b83\u4eec\u5728\u8bc6\u522b\u8fd9\u7c7b\u7ec6\u5fae\u9519\u8bef\u65f6\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u7cbe\u7ec6\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\uff0c\u8fd9\u4e24\u79cd\u65b9\u6cd5\u4f18\u4e8e\u73b0\u6709\u6307\u6807\uff0c\u7279\u522b\u662f\u5728\u8bc6\u522b\u201c\u60c5\u5883\u63a8\u7406\u201d\u9519\u8bef\u65f6\u3002|\n", "2406.03486": "|**2024-06-05**|**BIPED: Pedagogically Informed Tutoring System for ESL Education**|Soonwoo Kwon et.al.|[2406.03486](http://arxiv.org/abs/2406.03486)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u4f5c\u4e3a\u7ecf\u6d4e\u4e14\u6613\u4e8e\u83b7\u53d6\u7684\u82f1\u8bed\u7b2c\u4e8c\u8bed\u8a00\uff08L2\uff09\u5b66\u4e60\u8005\u5bf9\u8bdd\u5f0f\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\uff08CITS\uff09\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684CITS\u5f80\u5f80\u53ea\u80fd\u6559\u6388\u7b80\u5355\u6982\u5ff5\uff0c\u6216\u8005\u5728\u6559\u5b66\u6df1\u5ea6\u4e0a\u65e0\u6cd5\u6ee1\u8db3\u4e0d\u540c\u5b66\u4e60\u7b56\u7565\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u5f00\u53d1\u4e00\u4e2a\u66f4\u5177\u6559\u80b2\u5b66\u5bfc\u5411\u3001\u80fd\u6559\u6388\u590d\u6742\u6982\u5ff5\u7684CITS\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u53cc\u8bed\u6559\u80b2\u6307\u5bfc\u5bf9\u8bdd\u6570\u636e\u96c6\uff08BIPED\uff09\uff0c\u5305\u542b\u4e00\u5bf9\u4e00\u7684\u4eba\u7c7b\u82f1\u8bed\u8f85\u5bfc\u4e92\u52a8\u3002\u901a\u8fc7\u5bf9\u8f85\u5bfc\u5bf9\u8bdd\u7684\u540e\u5904\u7406\u5206\u6790\uff0c\u6211\u4eec\u63d0\u70bc\u51fa\u4e00\u5957\u5305\u542b34\u79cd\u6559\u5e08\u884c\u4e3a\u548c9\u79cd\u5b66\u751f\u884c\u4e3a\u7684\u5bf9\u8bdd\u52a8\u4f5c\u8bcd\u5178\uff0c\u5e76\u5c06\u5176\u7528\u4e8e\u8fdb\u4e00\u6b65\u6807\u6ce8\u6536\u96c6\u7684\u6570\u636e\u3002\u6839\u636e\u5148\u9884\u6d4b\u5408\u9002\u7684\u6559\u5e08\u884c\u4e3a\u518d\u751f\u6210\u76f8\u5e94\u56de\u590d\u7684\u4
e24\u6b65\u6846\u67b6\uff0c\u6211\u4eec\u5229\u7528GPT-4\u548cSOLAR-KO\u5206\u522b\u5b9e\u73b0\u4e86\u4e24\u4e2aCITS\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u5b9e\u65bd\u7684\u6a21\u578b\u4e0d\u4ec5\u6a21\u4eff\u4e86\u4eba\u7c7b\u6559\u5e08\u7684\u98ce\u683c\uff0c\u8fd8\u8fd0\u7528\u4e86\u4e30\u5bcc\u4e14\u4e0e\u4e0a\u4e0b\u6587\u76f8\u9002\u5e94\u7684\u6559\u5b66\u7b56\u7565\u3002|\n", "2406.03476": "|**2024-06-05**|**Does your data spark joy? Performance gains from domain upsampling at the end of training**|Cody Blakeney et.al.|[2406.03476](http://arxiv.org/abs/2406.03476)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u89c4\u6a21\u589e\u957f\u5230\u4e07\u4ebf\u7ea7\u522b\u7684tokens\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u7531\u5927\u89c4\u6a21\u7684CommonCrawl\u7f51\u7edc\u722c\u866b\u5185\u5bb9\u4ee5\u53ca\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\u7ec4\u6210\u3002\u7531\u4e8e\u5728\u5927\u8ba1\u7b97\u91cf\uff08FLOPs\uff09\u4e0b\u8bad\u7ec3\u4ee5\u63ed\u793a\u6a21\u578b\u5728\u56f0\u96be\u548c\u65b0\u5174\u57fa\u51c6\u4e0a\u7684\u663e\u8457\u53d8\u5316\u6210\u672c\u9ad8\u6602\uff0c\u5982\u4f55\u5728\u901a\u7528\u7f51\u7edc\u6293\u53d6\u7684\u591a\u6837\u6027\u548c\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u5bc6\u5ea6\u4e4b\u95f4\u627e\u5230\u6700\u4f18\u5e73\u8861\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\uff0c\u5728\u8bad\u7ec3\u540e\u671f\u5bf9\u5176\u8fdb\u884c\u4e0a\u91c7\u6837\uff0c\u4ece\u800c\u5728\u8bf8\u5982MMLU\u3001GSM8K\u548cHumanEval\u7b49\u57fa\u51c6\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u5bf9\u4e8e\u4e00\u4e2a\u8bad\u7ec3\u4e861\u4e07\u4ebf\uff08T\uff09\u4ee4\u724c\u768470\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u8fd9\u79cd\u7b80\u5355\u65b9\u6cd5\u53ef\u4f7f\u5176\u6027\u80fd\u63d0\u9ad86.90\u5206\u30018.26\u5206\u548c6.17\
u5206\uff0c\u4e0e\u8bad\u7ec3\u65f6\u95f4\u4e24\u500d\u7684Llama-2\uff087B\uff09\u6a21\u578b\u76f8\u5f53\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5728\u8bad\u7ec3\u540e\u671f\u9886\u57df\u4e0a\u91c7\u6837\u7684\u6301\u7eed\u65f6\u95f4\uff0c\u4ece5%\u523030%\uff0c\u53d1\u73b010%\u523020%\u7684\u6bd4\u4f8b\u6700\u4e3a\u5408\u9002\uff0c\u4ee5\u5e73\u8861\u4e00\u822c\u8bed\u8a00\u5efa\u6a21\u80fd\u529b\u4e0e\u7279\u5b9a\u4efb\u52a1\u7684\u4f18\u5316\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5229\u7528\u9886\u57df\u4e0a\u91c7\u6837\u6765\u5927\u89c4\u6a21\u5206\u6790\u5355\u4e2a\u6570\u636e\u96c6\u5bf9\u4e0d\u540c\u57fa\u51c6\u7684\u589e\u76ca\uff0c\u901a\u8fc7\u5728\u8fd9\u4e00\u9636\u6bb5\u79fb\u9664\u5b83\u4eec\u8fdb\u884c\u5b9e\u9a8c\u3002\u8fd9\u79cd\u65b9\u6cd5\u6781\u5927\u5730\u964d\u4f4e\u4e86\u5b9e\u9a8c\u6210\u672c\uff0c\u4f7f\u5f97\u80fd\u591f\u4ee5\u9884\u8bad\u7ec3\u8fd0\u884c\u7684\u5341\u5206\u4e4b\u4e00\u5de6\u53f3\u7684\u6210\u672c\u63a2\u7d22\u4e0d\u540c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang 
et.al.|[2406.03474](http://arxiv.org/abs/2406.03474)|null|\u9274\u4e8e\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u805a\u7126\u4e8e\u4f7f\u7528MLLM\u9a71\u52a8\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u5728\u5927\u89c4\u6a21\u52a8\u6001\u73af\u5883\u4e2d\u3002\u7136\u800c\uff0c\u5e38\u89c1\u7684\u65b9\u6cd5\u76f4\u63a5\u5c06\u9ad8\u7ea7\u6307\u4ee4\u8f6c\u5316\u4e3a\u4f4e\u7ea7\u8f66\u8f86\u63a7\u5236\u4fe1\u53f7\uff0c\u8fd9\u8fdd\u80cc\u4e86MLLM\u7684\u672c\u8d28\u751f\u6210\u6a21\u5f0f\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u5176\u6f5c\u5728\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u7684\u4e00\u822c\u5316\u80fd\u529b\u53d7\u5230\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u6781\u5927\u9650\u5236\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u4e2d\u5c42\u8bed\u8a00\u9a71\u52a8\u547d\u4ee4\u6765\u8fde\u63a5\u9ad8\u7ea7\u6307\u4ee4\u548c\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\uff0c\u5b83\u4eec\u6bd4\u9ad8\u7ea7\u6307\u4ee4\u66f4\u7ec6\u81f4\uff0c\u4f46\u6bd4\u63a7\u5236\u4fe1\u53f7\u66f4\u901a\u7528\u4e14\u53ef\u89e3\u91ca\uff0c\u4ece\u800c\u6709\u6548\u5f25\u5408\u4e24\u8005\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3aAD-H\u7684\u5206\u5c42\u591a\u4ee3\u7406\u9a7e\u9a76\u7cfb\u7edf\u5b9e\u73b0\u8fd9\u4e00\u7406\u5ff5\uff0c\u5305\u62ec\u4e00\u4e2a\u7528\u4e8e\u9ad8\u5c42\u63a8\u7406\u7684MLLM\u89c4\u5212\u5668\u548c\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u63a7\u5236\u5668\u8fdb\u884c\u4f4e\u5c42\u6267\u884c\u3002\u8fd9\u79cd\u5206\u5c42\u8bbe\u8ba1\u4f7fMLLM\u6446\u8131\u4e86\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\u89e3\u7801\uff0c\u5145\u5206\u91ca\u653e\u4e86\u5176\u5728\u9ad8\u5c42\u611f\u77e5\u3001\u63a8\u7406\u548c\u89c4\u5212\u65b9\u9762\u7684\u6d8c\u73b0\u80fd\u529b\u3002 
\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5e26\u6709\u52a8\u4f5c\u5c42\u6b21\u6ce8\u91ca\u7684\u65b0\u6570\u636e\u96c6\u3002\u5168\u9762\u7684\u95ed\u73af\u8bc4\u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684AD-H\u7cfb\u7edf\u5177\u6709\u591a\u9879\u5173\u952e\u4f18\u52bf\u3002\u9996\u5148\uff0cAD-H\u5728\u9a7e\u9a76\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u751a\u81f3\u5c55\u73b0\u51fa\u5728\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u81ea\u6211\u7ea0\u6b63\u7684\u80fd\u529b\uff0c\u8fd9\u662f\u8bad\u7ec3\u6570\u636e\u672a\u6db5\u76d6\u7684\u573a\u666f\u3002\u5176\u6b21\uff0cAD-H\u5728\u957f\u7a0b\u6307\u4ee4\u548c\u65b0\u73af\u5883\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\uff0c\u660e\u663e\u8d85\u8d8a\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\uff0c\u53ef\u901a\u8fc7\u83b7\u53d6\u3002|\n", "2406.03450": "|**2024-06-05**|**What is the Best Way for ChatGPT to Translate Poetry?**|Shanshan Wang 
et.al.|[2406.03450](http://arxiv.org/abs/2406.03450)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u5728\u82f1\u8bed-\u4e2d\u6587\u8bd7\u6b4c\u7ffb\u8bd1\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u901a\u8fc7\u5b9a\u5411\u63d0\u793a\u548c\u5c0f\u6837\u672c\u573a\u666f\u5206\u6790\u4ee5\u4f18\u5316\u5176\u8868\u73b0\u3002\u5c3d\u7ba1\u521d\u671f\u7ed3\u679c\u4ee4\u4eba\u9f13\u821e\uff0c\u4f46\u7814\u7a76\u53d1\u73b0ChatGPT\u7684\u7ffb\u8bd1\u5b58\u5728\u6301\u7eed\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u89e3\u91ca\u8f85\u52a9\u8bd7\u6b4c\u673a\u5668\u7ffb\u8bd1\u201d\uff08EAPMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bd7\u6b4c\u7684\u5355\u8bed\u89e3\u91ca\u4f5c\u4e3a\u7ffb\u8bd1\u8fc7\u7a0b\u7684\u6307\u5bfc\u3002\u540c\u65f6\uff0c\u6211\u4eec\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ee5\u66f4\u597d\u5730\u9002\u5e94\u73b0\u4ee3\u8bd7\u6b4c\u7ffb\u8bd1\u7684\u5fae\u5999\u4e4b\u5904\u3002\u6211\u4eec\u9080\u8bf7\u4e13\u4e1a\u8bd7\u4eba\u8fdb\u884c\u8bc4\u4f30\uff0c\u5e76\u7ed3\u5408GPT-4\u7684\u8bc4\u4ef7\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684EAPMT\u65b9\u6cd5\u5728\u4e0e\u4f20\u7edfChatGPT\u7ffb\u8bd1\u65b9\u6cd5\u4ee5\u53ca\u73b0\u6709\u5728\u7ebf\u7cfb\u7edf\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u8bba\u6587\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e3a\u6587\u5b66\u7ffb\u8bd1\u7684\u673a\u5668\u8f85\u52a9\u63d0\u4f9b\u4e86\u65b0\u9896\u89c6\u89d2\u3002|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445](http://arxiv.org/abs/2406.03445)|null|## \u7ffb\u8bd1 
\u9884\u8bad\u7ec3\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u6267\u884c\u57fa\u672c\u7684\u7b97\u672f\u8fd0\u7b97\uff0c\u5982\u52a0\u6cd5\uff0c\u4ecd\u4e0d\u6e05\u695a\u3002\u672c\u6587\u63ed\u793a\u4e86\u9884\u8bad\u7ec3\u7684LLMs\u901a\u8fc7\u5085\u91cc\u53f6\u7279\u5f81\u8fdb\u884c\u52a0\u6cd5\u2014\u2014\u8fd9\u4e9b\u662f\u9690\u85cf\u72b6\u6001\u4e2d\u7684\u7ef4\u5ea6\uff0c\u901a\u8fc7\u4e00\u7ec4\u5728\u9891\u57df\u4e2d\u7a00\u758f\u5206\u5e03\u7684\u7279\u5f81\u6765\u8868\u793a\u6570\u5b57\u3002\u5728\u6a21\u578b\u4e2d\uff0c\u591a\u5c42\u611f\u77e5\u5668\uff08MLP\uff09\u5c42\u548c\u6ce8\u610f\u529b\u5c42\u4ee5\u4e92\u8865\u7684\u65b9\u5f0f\u4f7f\u7528\u5085\u91cc\u53f6\u7279\u5f81\uff1aMLP\u5c42\u4e3b\u8981\u4f7f\u7528\u4f4e\u9891\u7279\u5f81\u8fd1\u4f3c\u7b54\u6848\u7684\u5927\u5c0f\uff0c\u800c\u6ce8\u610f\u529b\u5c42\u4e3b\u8981\u901a\u8fc7\u9ad8\u9891\u7279\u5f81\u6267\u884c\u6a21\u8fd0\u7b97\uff08\u4f8b\u5982\u5224\u65ad\u7b54\u6848\u662f\u5426\u4e3a\u5076\u6570\uff09\u3002\u9884\u8bad\u7ec3\u5bf9\u4e8e\u8fd9\u79cd\u673a\u5236\u81f3\u5173\u91cd\u8981\uff1a\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u7684\u6a21\u578b\u4ec5\u5229\u7528\u4f4e\u9891\u7279\u5f81\uff0c\u5bfc\u81f4\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u5c06\u9884\u8bad\u7ec3\u7684\u8bcd\u5d4c\u5165\u5f15\u5165\u5230\u968f\u673a\u521d\u59cb\u5316\u7684\u6a21\u578b\u4e2d\u53ef\u4ee5\u6062\u590d\u5176\u6027\u80fd\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u9002\u5f53\u7684\u9884\u8bad\u7ec3\u8868\u793a\uff08\u5982\u5085\u91cc\u53f6\u7279\u5f81\uff09\u80fd\u591f\u89e3\u9501Transformer\u5b66\u4e60\u7b97\u6cd5\u4efb\u52a1\u7cbe\u786e\u673a\u5236\u7684\u80fd\u529b\u3002|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker 
et.al.|[2406.03441](http://arxiv.org/abs/2406.03441)|null|\u5728\u8bb8\u591a\u9ad8\u98ce\u9669\u7684\u673a\u5668\u5b66\u4e60\u5e94\u7528\u4e2d\uff0c\u6a21\u578b\u9700\u8981\u80fd\u591f\u8868\u660e\u5176\u5bf9\u9884\u6d4b\u7684\u4e0d\u786e\u5b9a\u6027\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u57fa\u51c6\u4e0a\u7684\u51c6\u786e\u5ea6\u53ef\u8fbe\u5230\u751a\u81f3\u8d85\u8fc7\u4eba\u7c7b\u6c34\u5e73\uff0c\u4f46\u5b83\u4eec\u5bf9\u9519\u8bef\u54cd\u5e94\u7684\u8fc7\u5ea6\u81ea\u4fe1\u4ecd\u662f\u5df2\u77e5\u7684\u95ee\u9898\u3002\u4f20\u7edf\u7684\u65b9\u6cd5\u5728\u76f4\u63a5\u5e94\u7528\u4e8eLLMs\u65f6\u53ef\u80fd\u9762\u4e34\u8ba1\u7b97\u6210\u672c\u548c\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u6311\u6218\u3002\u8fd1\u671f\u63d0\u51fa\u4e86\u4e00\u4e9b\u9ed1\u76d2\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bf8\u5982\u81ea\u6211\u8868\u8ff0\u7684\u4fe1\u5fc3\u7b49\u542f\u53d1\u5f0f\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u901a\u8fc7\u5206\u6790\u6a21\u578b\u751f\u6210\u7b54\u6848\u7684\u89e3\u91ca\u5206\u5e03\u6765\u8861\u91cfLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u3002\u5c3d\u7ba1\u5229\u7528\u89e3\u91ca\u672c\u8eab\u5e76\u975e\u65b0\u9896\uff0c\u4f46\u6211\u4eec\u5c06\u5176\u89c6\u4e3a\u6d4b\u8bd5\u65f6\u95f4\u5206\u7c7b\u5668\uff0c\u901a\u8fc7\u8ba1\u7b97\u6700\u53ef\u80fd\u7684\u5206\u7c7b\u5668\u540e\u9a8c\u7b54\u6848\u5206\u5e03\uff0c\u4ee5\u6b64\u8fdb\u884c\u4e0d\u786e\u5b9a\u6027\u8bc4\u4f30\u3002 
\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u89e3\u91ca\u8574\u542b\u4f5c\u4e3a\u5206\u7c7b\u5668\u4f3c\u7136\u6027\u7684\u4e00\u79cd\u7279\u5b9a\u6846\u67b6\u5b9e\u4f8b\uff0c\u5982\u4f55\u5728\u4e94\u4e2a\u4e0d\u540c\u7684\u6570\u636e\u96c6\u4e0a\u6539\u8fdb\u4e86\u4fe1\u5fc3\u5206\u6570\u6307\u6807\uff08\u7279\u522b\u662fAUROC\u548cAURC\uff09\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6846\u67b6\u65e2\u5177\u6709\u7406\u8bba\u4f9d\u636e\uff0c\u53c8\u662f\u6709\u6548\u91cf\u5316LLMs\u4e0d\u786e\u5b9a\u6027\u7684\u65b9\u5f0f\u3002|\n", "2406.03411": "|**2024-06-05**|**Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach**|Saehyung Lee et.al.|[2406.03411](http://arxiv.org/abs/2406.03411)|**[link](https://github.com/saehyung-lee/plugir)**|**\u8be5\u8bba\u6587\u4e3b\u8981\u5173\u6ce8\u7684\u662f\u4ea4\u4e92\u5f0f\u6587\u672c\u5230\u56fe\u50cf\u68c0\u7d22\u4efb\u52a1\u4e2d\u7684\u5bf9\u8bdd\u5f62\u5f0f\u4e0a\u4e0b\u6587\u67e5\u8be2\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bba\uff0c\u540d\u4e3aPlugIR\uff0c\u901a\u8fc7\u4e24\u79cd\u65b9\u5f0f\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u822c\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u9996\u5148\uff0c\u901a\u8fc7\u91cd\u8ff0\u5bf9\u8bdd\u5f62\u5f0f\u7684\u4e0a\u4e0b\u6587\uff0c\u6211\u4eec\u6d88\u9664\u4e86\u5728\u73b0\u6709\u89c6\u89c9\u5bf9\u8bdd\u6570\u636e\u4e0a\u5fae\u8c03\u68c0\u7d22\u6a21\u578b\u7684\u9700\u6c42\uff0c\u4ece\u800c\u80fd\u591f\u4f7f\u7528\u4efb\u610f\u9ed1\u76d2\u6a21\u578b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2aLLM\u63d0\u95ee\u8005\uff0c\u6839\u636e\u5f53\u524d\u4e0a\u4e0b\u6587\u4e2d\u5019\u9009\u56fe\u50cf\u7684\u4fe1\u606f\uff0c\u751f\u6210\u5173\u4e8e\u76ee\u6807\u56fe\u50cf\u5c5e\u6027\u7684\u975e\u5197\u4f59\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u51cf\u5c11\u4e86\u751f\u6210\u95ee\u9898\u7684\u566a\u58f0\u548c\u5197\u4f59\u3002\u9664\u4e86\u6211\u4eec\u7
684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3a\u6700\u4f73\u5bf9\u6570\u6392\u540d\u79ef\u5206\uff08BRI\uff09\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u4ea4\u4e92\u5f0f\u68c0\u7d22\u7cfb\u7edf\u3002PlugIR\u5728\u591a\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u4f18\u4e8e\u96f6\u6b21\u8bbe\u7f6e\u548c Fine-tuned \u57fa\u51c6\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c PlugIR \u7684\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\u53ef\u4ee5\u6839\u636e\u4e0d\u540c\u60c5\u51b5\u7075\u6d3b\u5355\u72ec\u6216\u7ed3\u5408\u5e94\u7528\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\uff1ahttps://github.com/Saehyung-Lee/PlugIR\u3002**|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. Xiao et.al.|[2406.04344](http://arxiv.org/abs/2406.04344)|null|\u53d7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d6\u5f97\u7684\u5de8\u5927\u8fdb\u5c55\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53e3\u5934\u5316\u673a\u5668\u5b66\u4e60\uff08VML\uff09\u6846\u67b6\u3002\u4e0e\u4f20\u7edf\u7684\u673a\u5668\u5b66\u4e60\u6a21\u578b\uff0c\u901a\u5e38\u5728\u8fde\u7eed\u53c2\u6570\u7a7a\u95f4\u4e2d\u4f18\u5316\u4e0d\u540c\uff0cVML\u5c06\u53c2\u6570\u7a7a\u95f4\u9650\u5236\u4e3a\u4eba\u53ef\u7406\u89e3\u7684\u81ea\u7136\u8bed\u8a00\u3002\u8fd9\u79cd\u7ea6\u675f\u4fc3\u4f7f\u6211\u4eec\u4ece\u65b0\u89d2\u5ea6\u770b\u5f85\u51fd\u6570\u903c\u8fd1\u95ee\u9898\uff0c\u5373\u5c06\u5e26\u6709\u6587\u672c\u63d0\u793a\u7684LLM\u89c6\u4e3a\u7531\u6587\u672c\u63d0\u793a\u53c2\u6570\u5316\u7684\u51fd\u6570\u3002\u6211\u4eec\u501f\u6b64\u89c6\u89d2\u91cd\u65b0\u5ba1\u89c6\u4e86\u7ecf\u5178\u673a\u5668\u5b66\u4e60\u4efb\u52a1\uff0c\u5982\u56de\u5f52\u548c\u5206\u7c7b\uff0c\u53d1\u73b0\u8fd9\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\u548c\u4f18\u5316\u5668\u6765\u89e3\u51b3\u3002VML\u7684\u4e3b\u8981\u4f
18\u52bf\u5305\u62ec\uff1a(1) easy encoding of prior knowledge: prior knowledge about the problem and the hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model selection: the optimizer can automatically choose a concrete model class based on the data and the verbalized prior knowledge, and can update the model class during training; (3) interpretable learner updates: the LLM-parameterized optimizer can explain why each learner update is made. We conduct several experiments to evaluate the effectiveness of VML and hope it can serve as a bridge toward greater interpretability and trustworthiness in machine learning.|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu 
et.al.|[2406.04339](http://arxiv.org/abs/2406.04339)|null|A fundamental objective of robot manipulation is to enable models to understand visual scenes and execute actions. Although existing robotic multimodal large language models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) insufficient reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) Mamba has shown promise in non-trivial sequence modeling with linear inference complexity. Inspired by this, we develop RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities while maintaining efficient fine-tuning and inference. 
First, we integrate a vision encoder with Mamba and align visual data with language embeddings through co-training, giving the model visual common sense and robot-related reasoning abilities. To further improve RoboMamba's action pose prediction, we explore an efficient fine-tuning strategy that uses only a simple policy head. Experiments show that once RoboMamba has sufficient reasoning ability, it can acquire manipulation skills with minimal fine-tuned parameters (0.1% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning ability on general and robotic evaluation benchmarks. Meanwhile, our model achieves strong pose prediction in both simulated and real-world experiments, with inference 7 times faster than existing robotic MLLMs. Project webpage:|\n", "2406.04337": "|**2024-06-06**|**Coherent Zero-Shot Visual Instruction Generation**|Quynh Phung 
et.al.|[2406.04337](http://arxiv.org/abs/2406.04337)|null|Despite advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent object representations and smooth state transitions across sequential steps remains a formidable challenge. This paper presents a training-free framework that combines text comprehension with image generation to ensure that visual instructions are visually appealing while remaining coherent and accurate. We validate the approach by testing multi-step instructions and comparing against several baselines. Experimental results show that our method generates coherent and visually pleasing instructions.|\n", "2406.04334": "|**2024-06-06**|**DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs**|Lingchen Meng 
et.al.|[2406.04334](http://arxiv.org/abs/2406.04334)|null|Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). Although intuitive, this design significantly increases computation and memory overhead, because the model has to process many more tokens at its input layer. This paper presents DeepStack, a new architecture for LMMs. Given N layers in the vision and language transformers of an LMM, we split the visual tokens into N groups and feed each group, from bottom to top, into its corresponding transformer layer. Surprisingly, this simple method greatly enhances the LMM's ability to model interactions among visual tokens across layers, at almost no additional cost. We apply DeepStack to both the language and vision transformers of LMMs, and validate the effectiveness of DeepStack LMMs with extensive empirical results. With the same context length, our DeepStack 
7B and 13B parameter models surpass their counterparts by 2.7 and 2.9 points on average across 9 benchmarks. Using only one-fifth of the context length, DeepStack performs close to counterparts that use the full context length. The gains are especially pronounced on high-resolution tasks: compared with LLaVA-1.5-7B, performance improves by 4.2, 11.0, and 4.0 points on TextVQA, DocVQA, and InfoVQA, respectively. We further apply DeepStack to vision transformer layers, which brings a comparable average improvement of 3.8 points over LLaVA-1.5-7B.|\n", "2406.04331": "|**2024-06-06**|**PaCE: Parsimonious Concept Engineering for Large Language Models**|Jinqi Luo et.al.|[2406.04331](http://arxiv.org/abs/2406.04331)|**[link](https://github.com/peterljq/parsimonious-concept-engineering)**|**Large language models (LLMs) are widely used for a variety of tasks. Although they can generate human-like responses, they can also produce undesirable output, including potentially harmful information, racist or sexist language, and false information. To mitigate these issues, researchers have developed alignment methods such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face challenges: some require costly fine-tuning for every alignment task; some do not adequately remove undesirable concepts, leading to poor alignment; and some remove benign concepts, degrading the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation engineering framework. First, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Next, for any given alignment task, we use a concept partitioner to efficiently label each concept as benign or undesirable. At inference time, we decompose the LLM activations along the concept dictionary via sparse coding, accurately representing each activation as a linear combination of benign and undesirable components. By removing the undesirable components, we reorient the behavior of the LLM toward the alignment goal. We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and find that PaCE achieves state-of-the-art alignment performance while maintaining the linguistic capabilities of LLMs.**|\n", "2406.04314": "|**2024-06-06**|**Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step**|Zhanhao Liang et.al.|[2406.04314](http://arxiv.org/abs/2406.04314)|null|Recently, Direct Preference Optimization (DPO) has been successfully extended to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods, which assume that all diffusion steps share a preference order consistent with that of the final generated image, we argue that this assumption neglects the step-specific denoising performance, so preference labels should be tailored to each step. To this end, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. 
In SPO, at each denoising step we create a pool of images, find a suitable win-lose pair, and, crucially, randomly select one image from the pool to initialize the next denoising step. This step-wise resampling process ensures that every win-lose pair comes from the same image, making the comparison independent of the previous step. To assess preferences at each step, we train a dedicated step-aware preference model that can be applied to both noisy and clean images. In experiments with Stable Diffusion v1.5 and SDXL, SPO significantly outperforms the latest Diffusion-DPO, especially in aligning generated images with complex, detailed prompts and in enhancing aesthetics, while also being more than 20 times more efficient to train. Code and models are available at [https://rockeycoss.github.io/spo.github.io/](https://rockeycoss.github.io/spo.github.io/).|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger 
et.al.|[2406.04306](http://arxiv.org/abs/2406.04306)|**[link](https://github.com/ml-jku/SDLG)**|**Large language models (LLMs) can hallucinate when generating text, which hinders their use in many societal and industrial applications because it makes LLMs untrustworthy. Current LLMs generate text autoregressively, by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Hallucinations are therefore thought to stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify the predictive uncertainty of LLMs. SDLG steers the LLM to generate semantically diverse yet plausible alternatives to an initially generated text, which provides a precise measure of aleatoric semantic uncertainty and detects whether the initial text is likely to be hallucinated. Experiments on question-answering tasks show that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.**|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen 
et.al.|[2406.04300](http://arxiv.org/abs/2406.04300)|null|Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems such as autonomous vehicles. However, modeling the trajectories of other vehicles to simulate complex and meaningful close interactions is costly. Using language descriptions to generate driving behaviors is a promising approach: it offers a scalable and intuitive way for humans to simulate a wide range of driving interactions. However, the scarcity of large-scale annotated language-trajectory data makes this approach challenging. To address this, we propose Text-to-Drive (T2D), a technique that uses large language models (LLMs) to synthesize diverse driving behaviors. Our method follows a knowledge-driven, two-stage strategy: first, it uses the embedded knowledge of LLMs to generate diverse language descriptions of driving behaviors; then, it leverages their reasoning capabilities to realize these behaviors in simulation. At the core of T2D is an LLM-constructed state chart that maps low-level states to high-level abstractions, which simplifies downstream tasks such as summarizing low-level observations, assessing policy alignment with behavior descriptions, and shaping auxiliary rewards, all without human supervision. With our knowledge-driven approach, we show that T2D generates more diverse trajectories than other baselines and offers a natural-language interface that lets users interactively incorporate human preferences. More examples are available on our website:|\n", "2406.04289": "|**2024-06-07**|**What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages**|Nadav Borenstein et.al.|[2406.04289](http://arxiv.org/abs/2406.04289)|null|What can large language models learn? By definition, language models (LMs) are distributions over strings. The question can therefore be recast as one of the learnability of classes of distributions over strings. While prior work has mainly focused on theoretical limits, we focus on practical learnability. Unlike previous empirical work, we evaluate neural LMs on their home turf, learning probabilistic languages, rather than as classifiers of formal languages. Specifically, we investigate the learnability of regular language models (RLMs) by RNN and Transformer 
LMs. We empirically test the learnability of RLMs as a function of the RLM's complexity parameters and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNN and Transformer LMs. Several other predictors also reach significance, but with different patterns for RNNs and Transformers.|\n", "2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278](http://arxiv.org/abs/2406.04278)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|**
Conversational tones are crucial in human communication. As large language models (LLMs) become increasingly popular, it is especially important to study how their conversational tones differ from those of humans. However, current research on conversational modes often relies on pre-existing taxonomies or text corpora, which may be subject to experimenter bias and may not fully reflect the real-world distribution of the domain under study. Inspired by methods from cognitive science, we propose an iterative method that simultaneously elicits conversational tones and sentences by alternating between two tasks: (1) one participant identifies the tone of a given sentence, and (2) another participant generates a sentence based on that tone. We run 100 iterations of this process with human participants and with GPT-4, obtaining a dataset of sentences and frequent conversational tones. In an additional experiment, humans and GPT-4 annotated all sentences with all tones. Using data from 1,339 human participants, 33,370 human judgments, and 29,900 GPT-4 queries, we show how this approach can be used to create an interpretable geometric representation of the relations between conversational tones in humans and GPT-4. This work demonstrates how ideas from machine learning and cognitive science can be combined to address challenges in human-computer interaction.**|\n", "2406.05132": "|**2024-06-07**|**3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs**|Jianing Yang et.al.|[2406.05132](http://arxiv.org/abs/2406.05132)|**[link](https://github.com/sled-group/3D-GRAND)**|The integration of language and 3D perception is crucial for building embodied agents and robots that understand and interact with the physical world. Although large language models (LLMs) excel at language understanding and generation, their adaptation to 3D environments (3D-LLMs) is still in its early stages, mainly because of the lack of large-scale datasets that densely ground language in 3D scenes. To address this, we introduce 3D-GRAND, a pioneering large-scale dataset comprising 40,087 household scenes paired with 6.2 million densely grounded scene-language instructions. Experimental results show that instruction tuning with 3D-GRAND significantly improves the grounding capabilities of 3D-LLMs and reduces hallucinations. We also introduce 3D-POPE, a benchmark for systematically evaluating hallucination in 3D-LLMs, to enable fair comparisons among future models. 
Our experiments reveal a relationship between dataset size and 3D-LLM performance, underscoring the critical role of large-scale 3D-text datasets in advancing embodied AI research. Notably, preliminary signals suggest that models trained on large-scale synthetic data can perform well on real-world 3D scans, indicating potential for sim-to-real transfer. Through 3D-GRAND and 3D-POPE, we aim to provide the embodied AI community with essential resources and insights and to foster the development of more reliable and better-grounded 3D-LLMs. Project website: https://3d-grand.github.io|\n", "2406.05130": "|**2024-06-07**|**An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models**|Xiongtao Zhou 
et.al.|[2406.05130](http://arxiv.org/abs/2406.05130)|null|This paper focuses on parameter-efficient fine-tuning (PEFT) of multimodal large language models (MLLMs). Because these models typically have billions of parameters, full fine-tuning is difficult. The goal of the study is to identify effective ways to improve MLLM performance when only a limited number of parameters can be tuned. Through experiments that fine-tune the LLM component of open-source MLLMs with four popular PEFT methods, the paper provides a comprehensive analysis covering the impact of the PEFT methods on different models, the location of the tuned parameters, the size of the fine-tuning data, model stability, generalization, and hallucination. The study covers seven datasets of two types: unseen and seen. Results show that the adapter is the most effective PEFT method, and that fine-tuning the connector layers improves performance in most cases. The code and data are available at:|\n", "2406.05127": "|**2024-06-07**|**Towards Semantic Equivalence of Tokenization in Multimodal LLM**|Shengqiong Wu et.al.|[2406.05127](http://arxiv.org/abs/2406.05127)|null|Multimodal large language models (MLLMs) have demonstrated exceptional capabilities in vision-language tasks. The core of an MLLM lies in visual 
tokenization, that is, how to effectively transform input visual signals into feature representations that benefit the language model. However, existing visual tokenizers struggle to preserve semantic alignment between vision and language: they fragment visual input excessively and thereby damage the semantic integrity of the visual content. To address this, this paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units via a dynamic clustering algorithm and flexibly determines the number of tokens based on image complexity. The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features. We further propose SeTokim, a new MLLM equipped with SeTok. Experimental results show that SeTokim achieves significant advantages across a variety of tasks. For more details, see the project webpage: https://chocowu.github.io/SeTok-web/.|\n", "2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107](http://arxiv.org/abs/2406.05107)|null|
Data exploration is a complex process in which users examine a dataset by iteratively issuing a sequence of queries. Sometimes users explore new data to become familiar with it, but more often the exploration revolves around a specific analysis goal or question. To help users explore effectively, Automated Data Exploration (ADE) systems have been proposed, which aim to automatically generate a full exploration session that highlights interesting aspects of the data. However, existing ADE systems are often constrained by a predefined objective function and therefore always produce the same exploration sequence for a given dataset, which falls short in goal-oriented exploration. To address this, this paper presents LINX, a generative system with a natural-language interface for goal-oriented data exploration. Given an input dataset and an analytical goal described in natural language, LINX generates a personalized exploration session that is relevant to the user's needs. The system uses a large language model to interpret the input analysis goal and then derives a set of specifications for the desired output exploration session. These specifications are then passed to a novel modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which adapts its output according to the specified instructions. To validate LINX, we created a new benchmark dataset for goal-oriented exploration and conducted an extensive user study. Experimental results show that the exploration notebooks generated by LINX are significantly more relevant and useful than those of existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.|\n", "2406.05085": "|**2024-06-07**|**Multi-Head RAG: Solving Multi-Aspect Problems with LLMs**|Maciej Besta et.al.|[2406.05085](http://arxiv.org/abs/2406.05085)|**[link](https://github.com/spcl/mrag)**|**Retrieval Augmented Generation (RAG) improves the accuracy and relevance of responses from Large Language Models (LLMs) by incorporating document content into their context. However, existing RAG methods do not adequately handle queries that may require retrieving multiple documents with substantially different contents. Such queries are common in practice, but they are challenging because the embeddings of these documents may be far apart in the embedding space, making it hard to retrieve them all at once. 
This paper introduces Multi-Head RAG (MRAG), a novel scheme that addresses this gap with a simple yet powerful idea: using the activations of the Transformer's multi-head attention layer, rather than the decoder layer, as keys for retrieval. The motivation is that different attention heads can learn to capture different aspects of the data. Harnessing these activations yields embeddings that represent various facets of data items and queries, improving retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, synthetic datasets, and real-world use cases to demonstrate MRAG's effectiveness, showing improvements of up to 20% in relevance over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks such as RAGAS and with different classes of data stores. In summary, this work aims to improve existing RAG models so that they better handle complex queries that require retrieving multi-aspect information.**|\n", "2406.05063": "|**2024-06-07**|**Are Large Language Models More Empathetic than Humans?**|Anuradha Welivita 
et.al.|[2406.05063](http://arxiv.org/abs/2406.05063)|null|With the rise of large language models (LLMs), whether they can surpass humans in emotion recognition and empathetic responding has become a focus of research. This paper presents a comprehensive study comparing the empathetic responding capabilities of humans with those of four state-of-the-art LLMs: GPT-4, LLaMA-2-70B-Chat, Gemini-1.0-Pro, and Mixtral-8x7B-Instruct. In a double-blind user study with 1,000 participants, we analyzed 2,000 carefully selected emotional dialogue prompts spanning a broad range of 32 distinct positive and negative emotions. The results show that the LLMs' empathetic responding is statistically superior to that of humans. GPT-4 was the most empathetic, with about a 31% increase over the human baseline in responses rated Good. It was followed by LLaMA-2 (about 24%), Mixtral-8x7B (about 21%), and Gemini-Pro (about 10%). A finer-grained analysis of the response ratings showed that some LLMs are significantly better than others at responding to specific emotions. The proposed evaluation framework offers a scalable and adaptable approach for assessing the empathy of new LLMs, so future studies do not need to replicate this one.|\n",
"2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055](http://arxiv.org/abs/2406.05055)|null|Large language models perform impressively on reasoning tasks, and few-shot prompting can further improve their performance. However, current evaluations focus mainly on carefully constructed benchmarks. They neglect real-world reasoning problems with missing or contradictory conditions, known as ill-defined problems. Our observations suggest that existing few-shot prompting techniques are ineffective in such scenarios, often giving overconfident answers or erroneous inferences. To study this problem further, we built a benchmark called Problems with Missing and Contradictory conditions (PMC). We also introduced two new metrics for evaluating how few-shot prompting methods perform on such problems. Analysis on the PMC benchmark reveals a trade-off between mathematical reasoning performance on well-defined problems and the ability to recognize ill-defined ones. To address the challenges posed by PMC, we propose a novel few-shot prompting method called SMT-LIB Prompting (SLP). SLP uses the SMT-LIB language to model the problem instead of solving it directly. It then applies a double-check solving strategy that verifies the satisfiability and uniqueness of the solution and provides final feedback. Extensive experiments show that SLP has a clear advantage over existing methods on problems with missing and contradictory conditions. We will open-source our benchmark and code to facilitate future research.|\n",
"2406.05053": "|**2024-06-07**|**Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation**|Nachiket Kotalwar et.al.|[2406.05053](http://arxiv.org/abs/2406.05053)|null|Generative AI and large language models hold great promise for programming education because they can generate personalized feedback and hints for learners. Recent work has mainly focused on improving the quality of the generated feedback to reach the level of human tutors. In real-world educational deployments, however, cost, time, and data privacy are key considerations alongside quality. This paper comprehensively benchmarks language models for programming feedback generation along several dimensions, including quality, cost, speed, and data privacy. In particular, we focus on recent in-browser inference techniques, which directly reduce cost and help preserve data privacy. To improve the feedback quality of small models suited to in-browser inference, we developed a fine-tuning pipeline based on synthetic data generated with GPT-4. We demonstrate the effectiveness of 4-bit quantized Llama3-8B and Phi3-3.8B models, run with WebLLM's in-browser inference engine, on three different Python programming datasets. We will release the full implementation, a web app, and the datasets to facilitate further research on in-browser language models.|\n",
"2406.05039": "|**2024-06-07**|**Bootstrapping Referring Multi-Object Tracking**|Yani Zhang et.al.|[2406.05039](http://arxiv.org/abs/2406.05039)|**[link](https://github.com/zyn213/temprmot)**|Current referring multi-object tracking (RMOT) typically relies on manually annotated datasets and static rules, which limits diversity and the scope of application. To address this, our work focuses on advancing the RMOT task by introducing more discriminative language words. To this end, we first extend the Refer-KITTI dataset into Refer-KITTI-V2. Starting from 2,719 manual annotations, Refer-KITTI-V2 addresses class imbalance and introduces more keywords, bringing it closer to real-world scenarios than Refer-KITTI. We further expand these annotations with a large language model to 9,758 in total, yielding 617 distinct words and surpassing previous RMOT benchmarks. In addition, we improve the end-to-end RMOT framework with a simple yet elegant temporal advancement strategy that outperforms previous approaches. The source code and dataset are available.|\n",
"2406.05035": "|**2024-06-07**|**Scenarios and Approaches for Situated Natural Language Explanations**|Pengshuo Qiu 
et.al.|[2406.05035](http://arxiv.org/abs/2406.05035)|null|Large language models (LLMs) can generate natural language explanations (NLEs) adapted to different users' situations. However, quantitative evaluation of this adaptability is still lacking. To fill this gap, we created a benchmark dataset, the Situation-Based Explanation (SBE) dataset, which contains 100 explanandums. Each explanandum is paired with explanations targeted at different audience groups, such as educators, students, and professionals. This makes it possible to assess how well explanations meet the specific informational needs and contexts of these diverse groups (e.g., students, teachers, and parents). Each explanandum-audience pair comes with a human-written reference explanation. These references are used to compute scores that quantify how well LLMs tailor explanations to the situation. We evaluated three prompting methods on pretrained language models of different sizes: rule-based prompting, meta-prompting, and in-context learning prompting. We found that 1) language models can generate prompts that yield explanations more precisely aligned with the target situations; 2) explicitly prompting the model with 'you are a helpful assistant' is not a necessary technique for situated NLE tasks; and 3) in-context learning prompts only help models learn the demonstration template and do not improve their inference performance. The SBE dataset and our analysis lay a foundation for future research on generating situated natural language explanations.|\n",
"2406.06525": "|**2024-06-10**|**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**|Peize Sun et.al.|[2406.06525](http://arxiv.org/abs/2406.06525)|**[link](https://github.com/foundationvision/llamagen)**|We introduce LlamaGen, a new family of image generation models that applies the original next-token prediction paradigm of large language models to the visual generation domain. It shows that vanilla autoregressive models such as Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly. We examined the design space of image tokenizers, the scalability of image generation models, and the quality of their training data. The outcomes are as follows. (1) An image tokenizer with a downsampling ratio of 16 achieves a reconstruction quality of 0.94 rFID and 97% codebook usage on the ImageNet benchmark. (2) A series of class-conditional image generation models, ranging from 111M to 3.1B parameters, achieves 2.18 FID on the ImageNet 256x256 benchmark and outperforms popular diffusion models such as LDM and DiT. (3) A 775M-parameter text-conditional image generation model, trained in two stages on LAION-COCO and high-aesthetic-quality images, shows good visual quality and text alignment. (4) We verify that LLM serving frameworks are effective at optimizing the inference speed of image generation models, with speedups of 326% to 414%. We release all models and code to support the open-source community of visual generation and multimodal foundation models.|\n",
"2406.06519": "|**2024-06-10**|**UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor**|Shivani Upadhyay et.al.|[2406.06519](http://arxiv.org/abs/2406.06519)|**[link](https://github.com/castorini/umbrela)**|Copious relevance judgments are crucial for effectively training and accurately evaluating retrieval systems. Traditionally, human assessors make these judgments, which is expensive and time-consuming. A recent study by Thomas et al. at Microsoft Bing showed that large language models (LLMs) can accurately perform relevance assessment and provide judgments comparable to those of humans. Unfortunately, their study did not release reusable software artifacts. Our work presents UMBRELA, an open-source toolkit whose name is a recursive acronym for 'UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor'. UMBRELA reproduces the results of Thomas et al. using OpenAI's GPT-4 model and adds more detail to the original paper. Across the TREC 2019 to 2023 Deep Learning Tracks, we find that LLM-derived relevance judgments correlate highly with the rankings generated by effective multi-stage retrieval systems. The toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines. This makes it a valuable resource for researchers studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid relevance assessments, and we envision it as a foundation for further innovation in the field. UMBRELA's codebase is available at https://github.com/castorini/umbrela.|\n",
"2406.06499": "|**2024-06-10**|**NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative**|Asmar Nadeem et.al.|[2406.06499](http://arxiv.org/abs/2406.06499)|null|Existing video captioning benchmarks and models lack representations of causal-temporal narrative. Such a narrative is a sequence of events linked by cause and effect, unfolding over time and driven by characters or agents. This lack of narrative limits the ability of models to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To fill this gap, we propose NarrativeBridge, which comprises two components. (1) The first is a novel Causal-Temporal Narrative (CTN) captions benchmark, generated by a large language model with few-shot prompting. It explicitly encodes cause-effect relationships in video descriptions, and automatic evaluation ensures its quality and relevance. (2) The second is a dedicated Cause-Effect Network (CEN) architecture with separate encoders that capture cause and effect dynamics independently. This enables effective learning and generation of captions with causal-temporal narrative. Experimental results show that CEN articulates the causal and temporal aspects of video content more accurately than the second-best model (GIT), with CIDEr scores of 17.88 and 17.44 on the MSVD and MSR-VTT datasets, respectively. The proposed framework understands and generates nuanced text descriptions with intricate causal-temporal narrative structures, addressing a key limitation in video captioning. Project details are available on the project website.|\n",
"2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474](http://arxiv.org/abs/2406.06474)|null|In health, most large language model (LLM) research has focused on clinical tasks. However, the rich, longitudinal personal health monitoring data provided by mobile and wearable devices is often overlooked. This paper introduces the Personal Health Large Language Model (PH-LLM), a customized version of Gemini designed to understand and reason over numerical time-series personal health data. We created and curated three datasets to test PH-LLM on 1) generating personalized insights and recommendations from sleep patterns, physical activity, and physiological responses; 2) expert-level domain knowledge; and 3) predicting self-reported sleep outcomes. Working with domain experts, we built 857 case studies to assess real-world sleep and fitness scenarios. We evaluated PH-LLM comprehensively with domain-specific rubrics. Gemini Ultra 1.0 and PH-LLM were not statistically different from expert performance in fitness. Experts remain superior for sleep, but fine-tuning PH-LLM significantly improved its use of relevant domain knowledge and its personalization of sleep information. We also evaluated PH-LLM's domain knowledge with multiple-choice sleep medicine and fitness examinations. It scored 79% and 88%, respectively, exceeding the average scores of a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data. We showed that multimodal encoding is required to match the performance of specialized discriminative models. Further development and evaluation are still needed in the safety-critical personal health domain. Nevertheless, these results demonstrate the broad knowledge and capabilities of Gemini models, and the value of applying physiological data to personal health applications as PH-LLM does.|\n",
"2406.06465": "|**2024-06-10**|**AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction**|Zhen Xing 
et.al.|[2406.06465](http://arxiv.org/abs/2406.06465)|null|Text-guided video prediction (TVP) predicts the motion of future frames from an initial frame according to an instruction. It has wide applications in virtual reality, robotics, and content creation. Previous methods have made significant progress on this task by adapting Stable Diffusion. However, they still struggle with frame consistency and temporal stability, largely because of the limited scale of video datasets. We observe that pretrained Image2Video diffusion models have good priors on video dynamics but lack textual control. It is therefore both meaningful and challenging to transfer Image2Video models so that they exploit these priors while also accepting instruction control to generate controllable videos. To achieve this, we introduce a multimodal large language model (MLLM) that predicts future video states from the initial frames and text instructions. More specifically, we design a dual query transformer (DQFormer) architecture that integrates instructions and frames into conditional embeddings for future frame prediction. In addition, we develop long-short term temporal adapters and spatial adapters. These adapters can quickly transfer general video diffusion models to specific scenarios at minimal training cost. Experimental results show that our method significantly outperforms state-of-the-art techniques on four datasets: Something Something V2, Epic Kitchen-100, Bridge Data, and UCF-101. Notably, AID achieves FVD improvements of 91.2% on Bridge and 55.5% on SSv2, demonstrating its effectiveness across domains. More examples can be found on our website.|\n",
"2406.06464": "|**2024-06-10**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|Wearable health trackers are increasingly common, and sleep and exercise are clearly important to health. Even so, deriving actionable personalized insights from wearable data remains a challenge because it requires open-ended analysis of large amounts of data. Large language models (LLMs) can use tools to reason about and interact with the world, and their rise offers promise for personalized analysis at scale. However, the application of LLMs to personal health remains largely unexplored. This paper introduces the Personal Health Insights Agent (PHIA), a system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data. We curated two benchmark question-answering datasets comprising over 4,000 health insight questions. Based on 650 hours of human and expert evaluation, PHIA accurately answers over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population. It could enable individuals to interpret their own wearable data and open a new era of accessible, personalized wellness regimens informed by data-driven insights.|\n",
"2406.06461": "|**2024-06-11**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461](http://arxiv.org/abs/2406.06461)|null|A wide variety of reasoning strategies have been proposed to elicit the capabilities of large language models. However, conventional evaluations focus only on performance metrics and overlook a key factor: the gains that come from additional compute. This can lead to a one-sided understanding of strategy efficiency. This paper therefore proposes a framework that incorporates the compute budget into the evaluation. The result is a more informative comparison that accounts for both performance metrics and computational cost. From this budget-aware perspective, the study finds that complex reasoning strategies often surpass simple baselines without significant algorithmic innovation, simply because they are allocated more computational resources. For example, when given a comparable compute budget, chain-of-thought self-consistency often outperforms reasoning strategies proposed in the literature. In this scale-aware view, however, certain strategies such as multi-agent debate or Reflexion can even perform worse as the compute budget increases.|\n",
"2406.06458": "|**2024-06-10**|**Evaluating the Retrieval Component in LLM-Based Question Answering Systems**|Ashkan Alinejad et.al.|[2406.06458](http://arxiv.org/abs/2406.06458)|null|Question answering systems built on large language models (LLMs) rely on a retrieval component to access domain-specific information and reduce the risk of inaccurate responses or hallucinations. Evaluation methods for information retrieval are well established, but evaluating retriever performance in LLM-based chatbots remains a challenge. This study proposes a straightforward baseline for evaluating retrievers in chatbots based on Retrieval-Augmented Generation (RAG). Our findings show that this evaluation framework gives a better picture of how the retriever performs and is more consistent with the overall performance of the question answering system. Conventional metrics such as precision, recall, and F1 score may not fully capture LLM capabilities, because an LLM can still produce an accurate answer when the retriever is imperfect. Our method accounts for the ability of LLMs to ignore irrelevant context, as well as for potential errors and hallucinations in their responses.|\n",
"2406.06455": "|**2024-06-10**|**A Large Language Model Pipeline for Breast Cancer Oncology**|Tristen Pool 
et.al.|[2406.06455](http://arxiv.org/abs/2406.06455)|null|Large language models (LLMs) have shown innovative potential in many fields, but their application to cancer treatment still needs further development. The researchers fine-tuned state-of-the-art OpenAI models on clinical datasets and clinical guideline texts using a novel Langchain prompt-engineering pipeline. The work focused on two key treatment factors for breast cancer patients: adjuvant radiation therapy and chemotherapy. The model achieved high accuracy (0.85+) in classifying these two treatments. The researchers then used observational data on the quality of treatment by human oncologists to establish a confidence interval of 8.2% to 13.3%. This interval estimates the proportion of cases in which the model must choose a better treatment than the original oncologist in order to be the better solution overall. Because the outcomes of cancer treatment decisions are uncertain, future clinical trials may be needed to verify this threshold. In the U.S., 85% of cancer patients receive treatment at local community facilities, so such models could substantially expand access to quality care, with outcomes at least approaching those of human oncologists.|\n",
"2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The 
Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451](http://arxiv.org/abs/2406.06451)|null|The performance of large language models (LLMs) in generating, debugging, and explaining code has drawn the attention of many researchers and educators to undergraduate programming education, with many expecting these models to transform how programming is taught. However, decisions about how and why to use LLMs in programming education may involve more than assessments of the technology's capabilities. Using the social shaping of technology theory as a guiding framework, this study explores how students' social perceptions of LLMs influence their own LLM usage. By analyzing an anonymous end-of-course student survey (n=158), a mid-course self-efficacy survey (n=158), interviews with 10 students, self-reported LLM usage on homework, and midterm exam scores, we find that students' LLM usage is associated with their expectations for their future careers and their perceptions of peer usage. We also find that early self-reported LLM usage is correlated with lower self-efficacy and lower midterm scores, while students' perceived over-reliance on LLMs, rather than their actual usage, is associated with decreased self-efficacy later in the course.|\n", 
"2406.07545": "|**2024-06-11**|**Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena**|Aidar Myrzakhan et.al.|[2406.07545](http://arxiv.org/abs/2406.07545)|**[link](https://github.com/vila-lab/open-llm-leaderboard)**|**### Background Multiple-choice questions (MCQs) are commonly used to evaluate large language models (LLMs). Typically, an LLM selects the most likely answer based on probabilities adjusted for factors such as answer length. However, LLMs may have inherent biases, such as a preference for particular option IDs like A, B, C, or D, which can affect answer prediction. Previous work has tried to reduce this \"selection bias\" by randomly permuting the options on a few test samples and applying the result to new ones. Another problem with MCQs is \"lottery-style guessing\": the LLM has not actually learned the knowledge but picks the right answer by chance, a problem that is especially severe for small LLMs. 
To address these issues, a more thorough approach is to shift to open-style questions, which fundamentally eliminates selection bias and random guessing. However, the shift brings its own challenges: (1) identifying suitable open-style questions, and (2) validating the correctness of an LLM's open-style answers against human-annotated ground truths. This work tackles these difficulties and establishes a new LLM evaluation benchmark based entirely on open-style questions, used to measure the performance of models such as GPT-4o/4/3.5, Claude 3, and Gemini. ### Task We introduce Open-LLM-Leaderboard, a new evaluation platform that tracks the performance of various LLMs and reveals their true capabilities. Our code and dataset are open-sourced at https://github.com/VILA-Lab/Open-LLM-Leaderboard.**|\n", "2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li 
et.al.|[2406.07528](http://arxiv.org/abs/2406.07528)|**[link](https://github.com/dvlab-research/q-llm)**|**The ability of large language models (LLMs) to understand and process long sequences is crucial for progress in many fields. However, they still struggle to capture long-range dependencies within sequences well enough to deeply understand their semantics. To address this, we propose Query-aware Inference for LLMs (Q-LLM), a system designed to process extensive sequences in a way that mimics human cognition. By focusing on memory data relevant to a given query, Q-LLM accurately captures pertinent information within a fixed window size and provides precise answers to queries. It requires no extra training and can be seamlessly integrated with any LLM. Q-LLM using LLaMA3 (QuickLLaMA) can read Harry Potter within 30 seconds and accurately answer questions about it. Compared with the current state of the art, Q-LLM improves performance by 7.17% on LLaMA3 and by 3.26% on Mistral on $\\infty$-bench. In the Needle-in-a-Haystack task, on widely recognized benchmarks, Q-LLM improves on the current best result by 7.0% on Mistral and reaches 100% accuracy on LLaMA3. Our code is open-sourced at https://github.com/dvlab-research/Q-LLM.**|\n", 
"2406.07515": "|**2024-06-11**|**Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement**|Yunzhen Feng et.al.|[2406.07515](http://arxiv.org/abs/2406.07515)|null|Synthesized data from generative models is increasingly used to fine-tune large language models, raising concerns about model collapse, i.e., degraded performance after fine-tuning on generated data. Because it is easier for both humans and machines to tell good examples from bad ones than to generate high-quality samples, we investigate how feedback on synthesized data can be used to prevent model collapse. We theoretically analyze the optimal performance of a Gaussian mixture classification model trained on feedback-augmented synthesized data, and provide empirical evidence in the finite-sample regime. We illustrate these theoretical predictions on two practical problems, computing matrix eigenvalues with transformers and news summarization with large language models, both of which undergo model collapse when trained on model-generated data. We find that training on feedback-augmented synthesized data, either by pruning incorrect predictions or by selecting the best of several guesses, prevents model collapse, validating popular approaches such as RLHF (Reinforcement Learning with Human Feedback).|\n", 
"2406.07505": "|**2024-06-11**|**THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report**|KBTG Labs et.al.|[2406.07505](http://arxiv.org/abs/2406.07505)|null|## Background Recent advances in large language models (LLMs) have revealed new capabilities and opportunities across the technology landscape. However, the practical adoption of very large LLMs is held back by their high computational cost, which is hard to justify given their limited capabilities relative to humans. Smaller, more practical LLMs have shown potential in financial analysis, but they are not yet fully proficient, as their near-passing performance on mock Chartered Financial Analyst (CFA) exams shows. In this work, we present Financial Analyst Extension (FAE) to our Text Hyperlocally Augmented Large Language Extension (THaLLE), a series of 8B-parameter LLMs that consistently achieve the highest performance on mock CFA exams among models of comparable size. We document in detail the fine-tuning techniques used to optimize the models, to support future research. Additionally, we introduce Flare CFA, a publicly available dataset for evaluating LLMs as financial advisors.|\n", 
"2406.07502": "|**2024-06-11**|**Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions**|Renjie Pi et.al.|[2406.07502](http://arxiv.org/abs/2406.07502)|**[link](https://github.com/sterzhang/image-textualization)**|**## Background Image description datasets play a crucial role in advancing applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, these datasets come mainly from two sources: image-text pairs scraped from the web, whose descriptions are often low-quality and noisy; and human annotation, as in COCO, whose descriptions are usually brief and lack detail. Although detailed image descriptions could be obtained through human annotation, the high annotation cost limits feasibility. These limitations call for more efficient and scalable methods of generating accurate and detailed image descriptions. This paper proposes an innovative framework called Image Textualization (IT), which automatically produces high-quality image descriptions by having existing multimodal large language models (MLLMs) and vision expert models work collaboratively to convert visual information into text. To address the current lack of benchmarks for detailed descriptions, we also propose several benchmarks for comprehensively evaluating the quality of the image descriptions our framework produces. Furthermore, we show that training LLaVA-7B on IT-curated descriptions improves its captioning ability: it generates richer descriptions with substantially longer and more detailed output and fewer hallucinations.**|\n", 
"2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul 
et.al.|[2406.07496](http://arxiv.org/abs/2406.07496)|**[link](https://github.com/zou-group/textgrad)**|**AI is undergoing a paradigm shift, with breakthroughs achieved by systems that orchestrate large language models (LLMs) and other complex components. Developing principled, automated optimization methods for such compound AI systems has consequently become one of the most important new challenges. Neural networks faced a similar challenge in their early days, until backpropagation and automatic differentiation transformed the field. Inspired by this, we introduce TextGrad, a powerful framework that performs automatic \"differentiation\" via text: it backpropagates rich, general natural-language suggestions provided by LLMs to improve the individual components of a compound AI system. TextGrad follows PyTorch's syntax and abstractions and is flexible and easy to use; users only need to provide the objective function, without tuning the framework's components or prompts. 
TextGrad works across a range of tasks, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, it improves the zero-shot accuracy of GPT-4o on Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing solutions to LeetCode-Hard coding problems, improves prompts for reasoning, designs new drug-like small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation for accelerating the development of the next generation of AI systems.**|\n", "2406.07494": "|**2024-06-12**|**CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization**|Frederic Kirstein 
et.al.|[2406.07494](http://arxiv.org/abs/2406.07494)|null|This systematic literature review covers 1262 unique research papers published between 2019 and 2024, focusing on Transformer-based abstractive summarization of English dialogues. It examines the main challenges in dialogue summarization, namely language, structure, comprehension, speaker, salience, and factuality, and links them to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies. Although significant progress has been made on some challenges, such as language, others, such as comprehension, factuality, and salience, remain difficult and leave ample room for research. 
The review also analyzes how these approaches are evaluated, covering common datasets for dialogue subdomains (e.g., meetings, medical) and prevalent practices for automatic metrics (e.g., ROUGE) and human evaluation. It finds, however, that few datasets span multiple domains and that reported human evaluations often lack sufficient detail on inter-annotator agreement and annotation guidelines. Finally, it discusses the possible implications of recently explored large language models, concluding that although they may change the relevance and difficulty of individual challenges, the proposed challenge taxonomy remains valuable.|\n", "2406.07485": "|**2024-06-11**|**PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction**|Adnan Abbas 
et.al.|[2406.07485](http://arxiv.org/abs/2406.07485)|null|Effective planning is essential for productivity and mental well-being, yet people often struggle to make realistic plans and to reflect on their productivity. With advances in artificial intelligence, conversational agents have emerged as a promising tool for externalizing plans through conversation, reinforcing commitment, and encouraging focused action, thereby positively influencing productivity and mental well-being. Our goal is to design a conversational agent that uses the social interactivity of natural conversation to offer insightful questions and reflection prompts that improve plan adherence. Although prior studies have shown the benefits of such agents, many interventions remain static, which can lead to declining user engagement over time. To address this limitation, we propose a novel rotation and context-aware prompting strategy that gives users varied interventions each day. Our system, PITCH, uses large language models (LLMs) to facilitate the externalization of and reflection on daily plans. This study investigates the impact of externalizing tasks with a conversational agent on productivity and mental well-being, and how effective the rotation strategy is at maintaining user engagement.|\n", 
"2406.07483": "|**2024-06-11**|**Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing**|Mao Li et.al.|[2406.07483](http://arxiv.org/abs/2406.07483)|null|In the rapidly evolving field of natural language processing, the use of large language models (LLMs) for automated text annotation of social media posts has attracted considerable interest. This paper examines the performance of eight open-source and proprietary LLMs on stance annotation, benchmarking them against human judgments obtained through crowdsourcing. We also investigate when LLMs are likely to disagree with human judgment. We find that the explicitness with which a text expresses a stance plays a critical role in whether LLM judgments match human ones. LLMs perform well when human annotators do, and LLM failures often correspond to situations in which human annotators struggle to reach agreement. We therefore recommend a comprehensive approach that combines the precision of human expertise with the scalability of LLM predictions. This study highlights the importance of improving the accuracy and comprehensiveness of automated stance detection, with the goal of advancing these technologies toward more efficient and unbiased social media analysis.|\n", 
"2406.07476": "|**2024-06-11**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476](http://arxiv.org/abs/2406.07476)|**[link](https://github.com/damo-nlp-sg/videollama2)**|**This paper presents VideoLLaMA 2, a set of video large language models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks. Building on its predecessor, VideoLLaMA 2 adds a tailor-made Spatial-Temporal Convolution (STC) connector that effectively captures the intricate spatial and temporal dynamics of video data. In addition, we integrate an audio branch through joint training, which enriches the model's multimodal understanding by seamlessly incorporating audio cues. In evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC), VideoLLaMA 2 is competitive with open-source models and approaches proprietary models on some benchmarks. On audio-only (AQA) and audio-video (OE-AVQA) question answering, VideoLLaMA 2 also shows reasonable improvements over existing models. These results highlight VideoLLaMA 2's strong multimodal comprehension and set a new standard for intelligent video analysis systems. All models are publicly available to facilitate further research.**|\n", 
"2406.08477": "|**2024-06-12**|**Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens**|Ting-Ji Huang et.al.|[2406.08477](http://arxiv.org/abs/2406.08477)|null|In recommender systems, representing users and items as vectors is crucial for a variety of tasks. Recent work applies large language models (LLMs) to recommendation framed as question answering, representing real users and items with in-vocabulary tokens (e.g., \"item\", \"20\", \"24\"). However, because LLMs are typically pre-trained on natural language tasks, these in-vocabulary tokens have limited ability to express distinct users and items, which weakens recommendation performance even after fine-tuning on recommendation tasks. This paper explores how to effectively tokenize users and items in LLM-based recommender systems. 
We emphasize the role of out-of-vocabulary (OOV) tokens, which, in addition to in-vocabulary tokens, capture both the correlations and the diversity among users and items. By learning representations from historical user-item interactions, we let user/item combinations with similar properties share the same OOV tokens. Integrating these OOV tokens into the LLM's vocabulary further helps distinguish users and items and improves the capture of user-item relationships during fine-tuning on downstream tasks. Our proposed framework outperforms existing state-of-the-art methods across various downstream recommendation tasks.|\n", "2406.08474": "|**2024-06-12**|**Real2Code: Reconstruct Articulated Objects via Code Generation**|Zhao Mandi 
et.al.|[2406.08474](http://arxiv.org/abs/2406.08474)|null|We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object's parts as oriented bounding boxes and feed them to a fine-tuned large language model (LLM), which predicts the joint articulation as code. By leveraging pre-trained vision and language models, our approach scales elegantly with the number of articulated parts and generalizes from synthetic training data to real-world objects in unstructured environments. Experimental results show that Real2Code significantly outperforms the previous state of the art in reconstruction accuracy and is the first approach to extrapolate beyond the structural complexity of objects in its training set, reconstructing objects with up to 10 articulated parts. When combined with a stereo reconstruction model, Real2Code also generalizes to real-world objects from a handful of multi-view RGB images, without needing depth or camera information.|\n", "2406.08464": "|**2024-06-12**|**Magpie: Alignment Data Synthesis from Scratch by 
Prompting Aligned LLMs with Nothing**|Zhangchen Xu et.al.|[2406.08464](http://arxiv.org/abs/2406.08464)|null|High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. Existing open-source data creation methods are limited by high human labor costs and a limited, predefined scope of prompts, which keeps them from scaling effectively and may limit the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data, named Magpie. Our key observation is that, thanks to their auto-regressive nature, aligned LLMs such as Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that on some tasks, models fine-tuned with Magpie alone perform comparably to the official Llama-3-8B-Instruct, which was enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets used for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.|\n", 
"2406.08434": "|**2024-06-12**|**TasTe: Teaching Large Language Models to Translate through Self-Reflection**|Yutong Wang et.al.|[2406.08434](http://arxiv.org/abs/2406.08434)|**[link](https://github.com/yutongwang1216/reflectionllmmt)**|**Large language models (LLMs) have shown remarkable performance on natural language processing tasks, and instruction tuning in particular has improved their performance on downstream tasks such as machine translation (MT). However, these methods still fall short of the translation quality achieved by supervised neural machine translation (NMT) systems. One plausible explanation is that the straightforward prompts currently used cannot fully exploit the models' instruction-following capabilities. To this end, we propose the TasTe framework, which stands for translating through self-reflection. The framework comprises two stages of inference: in the first, the LLM is instructed to generate a preliminary translation and evaluate it at the same time; in the second, the LLM refines the preliminary translation according to the evaluation results. Evaluation on four language directions of the WMT22 benchmark shows the effectiveness of our approach compared with existing methods. This work presents a promising way to unleash the potential of LLMs and improve their machine translation performance. The code and data are open-sourced at https://github.com/YutongWang1216/ReflectionLLMMT.**|\n", "2406.08426": "|**2024-06-12**|**Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL**|Zijin Hong 
et.al.|[2406.08426](http://arxiv.org/abs/2406.08426)|null|\u6587\u672c\u8f6cSQL\u751f\u6210\u51c6\u786e\u7684SQL\u67e5\u8be2\u4ee5\u54cd\u5e94\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u662f\u4e00\u4e2a\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5b83\u6d89\u53ca\u7528\u6237\u95ee\u9898\u7406\u89e3\u3001\u6570\u636e\u5e93\u6a21\u5f0f\u7406\u89e3\u4ee5\u53caSQL\u751f\u6210\u7b49\u591a\u4e2a\u590d\u6742\u73af\u8282\u3002\u4f20\u7edf\u7684\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u4f9d\u8d56\u4e8e\u4eba\u5de5\u5de5\u7a0b\u548c\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\u3002\u968f\u7740\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u53d1\u5c55\u548c\u5728\u8be5\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u6027\u80fd\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\u3002\u7136\u800c\uff0c\u968f\u7740\u6570\u636e\u5e93\u590d\u6742\u5ea6\u589e\u52a0\u548c\u7528\u6237\u95ee\u9898\u96be\u5ea6\u589e\u5927\uff0cPLMs\u6709\u9650\u7684\u7406\u89e3\u80fd\u529b\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u7684SQL\u751f\u6210\uff0c\u8fd9\u4fc3\u4f7f\u7814\u7a76\u4eba\u5458\u5bfb\u6c42\u66f4\u9ad8\u7ea7\u548c\u5b9a\u5236\u5316\u7684\u4f18\u5316\u65b9\u6cd5\uff0c\u9650\u5236\u4e86PLM\u57fa\u7840\u7cfb\u7edf\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4e0a\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u56e0\u6b64\uff0c\u6574\u5408LLM\u7684\u5b9e\u73b0\u4e3a\u6587\u672c\u8f6cSQL\u7814\u7a76\u5e26\u6765\u4e86\u72ec\u7279\u7684\u673a\u9047\u3001\u6311\u6218\u548c\u89e3\u51b3\u65b9\u6848\u3002\u672c\u7efc\u8ff0\u5168\u9762\u6982\u8ff0\u4e86\u57fa\u4e8eLLM\u7684\u6587\u672c\u8f6cSQL\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u5f53\u524d\u9762\u4e34\u7684\u6311\u6218\u548c\u6587\u672c\u8f6cSQL\u7684\u53d1\u5c55\u5386\u7a0b\u3002\u63a5\u7740\uff0c\u8be6\u7ec6\u4ecb\u7ecd\u7528\u4e8e\u8bc4\u4f30\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u7684\u6570\u636e\u96c6\u548c\u8bc4\u4e
f7\u6307\u6807\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u5206\u6790\u4e86\u8fd1\u671f\u5728LLM\u652f\u6301\u4e0b\u7684\u6587\u672c\u8f6cSQL\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u8be5\u9886\u57df\u5c1a\u5b58\u7684\u6311\u6218\uff0c\u5e76\u5bf9\u672a\u6765\u7814\u7a76\u65b9\u5411\u63d0\u51fa\u671f\u5f85\u3002|\n", "2406.08418": "|**2024-06-12**|**OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text**|Qingyun Li et.al.|[2406.08418](http://arxiv.org/abs/2406.08418)|**[link](https://github.com/opengvlab/omnicorpus)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aOmniCorpus\u7684\u5927\u578b\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u6570\u636e\u96c6\uff0c\u89c4\u6a21\u8fbe\u5230100\u4ebf\u7ea7\u522b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u901a\u8fc7\u9ad8\u6548\u7684\u5f15\u64ce\u7b5b\u9009\u548c\u63d0\u53d6\u4e86\u5927\u91cf\u9ad8\u8d28\u91cf\u6587\u6863\uff0c\u5305\u542b86\u4ebf\u5f20\u56fe\u7247\u548c1,696\u4e07\u4ebf\u4e2a\u6587\u672c\u4ee4\u724c\uff0c\u76f8\u8f83\u4e8e\u540c\u7c7b\u6570\u636e\uff08\u5982MMC4\u3001OBELICS\uff09\uff0cOmniCorpus\u5177\u6709\u4ee5\u4e0b\u4f18\u52bf\uff1a1\uff09\u89c4\u6a21\u6269\u592715\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u597d\u7684\u6570\u636e\u8d28\u91cf\uff1b2\uff09\u6765\u6e90\u66f4\u4e3a\u591a\u6837\uff0c\u5305\u62ec\u82f1\u6587\u548c\u975e\u82f1\u6587\u7f51\u7ad9\uff0c\u4ee5\u53ca\u89c6\u9891\u4e3a\u4e3b\u7684\u7f51\u7ad9\uff1b3\uff09\u7075\u6d3b\u6027\u66f4\u5f3a\uff0c\u53ef\u4ee5\u4ece\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u683c\u5f0f\u8f7b\u677e\u8f6c\u6362\u4e3a\u7eaf\u6587\u672c\u8bed\u6599\u5e93\u6216\u56fe\u50cf-\u6587\u672c\u5bf9\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\u548c\u5b9e\u9a8c\uff0c\u8bba\u6587\u9a8c\u8bc1\u4e86OmniCorpus\u7684\u6570\u636e\u8d28\u91cf\u3001\u53ef\u7528\u6027\u548c\u6709\u6548\u6027\uff0c\u65e8\u5728\u4e3a\u672a\u6765\u7684\u591a\u6a21\u6001\u6a21\u578b\u7814\u7a76\u63d0\u4f9b\u575a\u5b9e\u7684\u6570\u63
6e\u57fa\u7840\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/OpenGVLab/OmniCorpus\u4e0a\u516c\u5f00\u3002**|\n", "2406.08414": "|**2024-06-12**|**Discovering Preference Optimization Algorithms with and for Large Language Models**|Chris Lu et.al.|[2406.08414](http://arxiv.org/abs/2406.08414)|**[link](https://github.com/luchris429/DiscoPOP)**|****\u4e2d\u6587\u7ffb\u8bd1\uff1a** \u79bb\u7ebf\u504f\u597d\u4f18\u5316\u662f\u63d0\u5347\u548c\u63a7\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u8d28\u91cf\u7684\u91cd\u8981\u65b9\u6cd5\u3002\u4f20\u7edf\u4e0a\uff0c\u504f\u597d\u4f18\u5316\u88ab\u89c6\u4e3a\u57fa\u4e8e\u4eba\u5de5\u8bbe\u8ba1\u7684\u51f8\u635f\u5931\u51fd\u6570\u7684\u79bb\u7ebf\u76d1\u7763\u5b66\u4e60\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u53d7\u9650\u4e8e\u4eba\u7c7b\u521b\u9020\u529b\uff0c\u672a\u80fd\u5145\u5206\u63a2\u7d22\u53ef\u80fd\u7684\u635f\u5931\u51fd\u6570\u7684\u5de8\u5927\u641c\u7d22\u7a7a\u95f4\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528LLM\u8fdb\u884c\u76ee\u6807\u53d1\u73b0\u7684\u65b9\u6cd5\uff0c\u4ee5\u81ea\u52a8\u53d1\u73b0\u65b0\u7684\u6700\u5148\u8fdb\u7684\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u65e0\u9700\uff08\u4e13\u5bb6\uff09\u4eba\u5de5\u5e72\u9884\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u8fed\u4ee3\u5730\u63d0\u793aLLM\uff0c\u6839\u636e\u5148\u524d\u7684\u6027\u80fd\u8bc4\u4f30\u63d0\u51fa\u5e76\u5b9e\u73b0\u65b0\u7684\u504f\u597d\u4f18\u5316\u635f\u5931\u51fd\u6570\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u5bfc\u81f4\u4e86\u672a\u77e5\u4e14\u9ad8\u6548\u7684\u4f18\u5316\u7b97\u6cd5\u7684\u53d1\u73b0\u3002\u5176\u4e2d\u6700\u597d\u7684\u4e00\u4e2a\u88ab\u547d\u540d\u4e3a\u201c\u53d1\u73b0\u504f\u597d\u4f18\u5316\u201d\uff08DiscoPOP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u7b97\u6cd5\uff0c\u5b83\u5de7\u5999\u5730\u878d\u5408\u4e86\u903b\u8f91\u548c\u6307\u6570\u635f\u5931\u3002\u5b9e\u9a8
c\u7ed3\u679c\u8868\u660e\uff0cDiscoPOP\u5728\u6027\u80fd\u4e0a\u8fbe\u5230\u4e86\u6700\u65b0\u6c34\u5e73\uff0c\u5e76\u6210\u529f\u5730\u5e94\u7528\u4e8e\u672a\u89c1\u8fc7\u7684\u4efb\u52a1\u4e0a\u3002**|\n", "2406.08413": "|**2024-06-12**|**Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference**|Christopher Wolters et.al.|[2406.08413](http://arxiv.org/abs/2406.08413)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u671f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f7f\u5f97\u673a\u5668\u80fd\u591f\u751f\u6210\u903c\u771f\u7684\u6587\u672c\u5e76\u8fdb\u884c\u6709\u610f\u4e49\u7684\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u968f\u7740\u8ba1\u7b97\u548c\u5185\u5b58\u9700\u6c42\u7684\u6025\u5267\u589e\u957f\uff0c\u5c24\u5176\u662f\u5f53LLMs\u8d85\u8d8a\u5355\u4e2aGPU\u7684\u5904\u7406\u80fd\u529b\u65f6\uff0c\u5bf9\u901f\u5ea6\u3001\u6548\u7387\u548c\u53ef\u8bbf\u95ee\u6027\u7684\u9700\u6c42\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u540c\u65f6\uff0c\u8ba1\u7b97\u673a\u6027\u80fd\u548c\u5185\u5b58\u80fd\u529b\u7684\u53d1\u5c55\u5e76\u672a\u8ddf\u4e0a\u6b65\u4f10\uff0c\u5c24\u5176\u662f\u5728\u6469\u5c14\u5b9a\u5f8b\u653e\u7f13\u7684\u80cc\u666f\u4e0b\u3002\u5185\u5b58\u8bbf\u95ee\u6210\u672c\u8fdc\u9ad8\u4e8e\u8ba1\u7b97\uff0c\u8fd9\u7ed9\u5927\u89c4\u6a21\u6269\u5c55\u5e26\u6765\u4e86\u6311\u6218\uff0c\u5373\u6240\u8c13\u7684\u201c\u5185\u5b58\u5899\u201d\u3002\u5728\u8fd9\u4e2a\u65f6\u5019\uff0c\u8ba1\u7b97\u5728\u5185\u5b58\uff08Compute-in-Memory, 
CIM\uff09\u6280\u672f\u4e3aAI\u63a8\u7406\u63d0\u4f9b\u4e86\u52a0\u901f\u53ef\u80fd\uff0c\u901a\u8fc7\u5728\u5185\u5b58\u4e2d\u76f4\u63a5\u6267\u884c\u6a21\u62df\u8ba1\u7b97\uff0c\u6709\u671b\u964d\u4f4e\u5ef6\u8fdf\u548c\u529f\u8017\u3002\u901a\u8fc7\u7d27\u5bc6\u96c6\u6210\u5185\u5b58\u548c\u8ba1\u7b97\u5143\u4ef6\uff0cCIM\u6d88\u9664\u4e86\u51af\u8bfa\u4f9d\u66fc\u74f6\u9888\uff0c\u51cf\u5c11\u4e86\u6570\u636e\u4f20\u8f93\uff0c\u63d0\u9ad8\u4e86\u80fd\u6e90\u6548\u7387\u3002 \u672c\u7efc\u8ff0\u8bba\u6587\u6982\u8ff0\u4e86\u57fa\u4e8e\u53d8\u538b\u5668\u7684\u6a21\u578b\uff0c\u63a2\u8ba8\u4e86\u5404\u79cdCIM\u67b6\u6784\uff0c\u5e76\u7814\u7a76\u4e86\u5b83\u4eec\u5982\u4f55\u5e94\u5bf9\u73b0\u4ee3\u4eba\u5de5\u667a\u80fd\u8ba1\u7b97\u7cfb\u7edf\u9762\u4e34\u7684\u7d27\u8feb\u6311\u6218\u3002\u6211\u4eec\u8be6\u7ec6\u8ba8\u8bba\u4e86\u4e0e\u53d8\u538b\u5668\u76f8\u5173\u7684\u8fd0\u7b97\u53ca\u5176\u786c\u4ef6\u52a0\u901f\u7b56\u7565\uff0c\u540c\u65f6\u6307\u51fa\u76f8\u5173CIM\u8bbe\u8ba1\u4e2d\u7684\u6311\u6218\u3001\u8d8b\u52bf\u548c\u6d1e\u5bdf\u3002|\n", "2406.08402": "|**2024-06-12**|**Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models**|Chun-Yi Kuan et.al.|[2406.08402](http://arxiv.org/abs/2406.08402)|**[link](https://github.com/kuan2jiu99/audio-hallucination)**|**## \u80cc\u666f 
\u5927\u578b\u97f3\u9891\u8bed\u8a00\u6a21\u578b\uff08LALMs\uff09\u901a\u8fc7\u6574\u5408\u97f3\u9891\u611f\u77e5\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u4f20\u7edf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u97f3\u9891\u76f8\u5173\u4efb\u52a1\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8bc4\u4f30LALMs\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u4f46\u5bf9\u5b83\u4eec\u7684\u53ef\u9760\u6027\uff0c\u7279\u522b\u662f\u5173\u4e8e\u5bf9\u8c61\u5e7b\u89c9\u7b49\u95ee\u9898\u7684\u5173\u6ce8\u4e0d\u8db3\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u6765\u8bc4\u4f30\u516c\u5f00\u53ef\u7528\u7684LALMs\u5728\u5bf9\u8c61\u5e7b\u89c9\u65b9\u9762\u7684\u7a0b\u5ea6\u3002\u7ed3\u679c\u8868\u660e\uff0cLALMs\u5728\u7406\u89e3\u97f3\u9891\u5185\u5bb9\u65b9\u9762\u4e0e\u4e13\u95e8\u7684\u97f3\u9891captioning\u6a21\u578b\u76f8\u5f53\uff0c\u4f46\u5728\u56de\u7b54\u533a\u5206\u6027\u95ee\u9898\u65f6\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u90a3\u4e9b\u9700\u8981\u8bc6\u522b\u97f3\u9891\u7247\u6bb5\u4e2d\u7279\u5b9a\u7269\u4f53\u58f0\u97f3\u7684\u95ee\u9898\u3002\u8fd9\u63ed\u793a\u4e86\u5f53\u524dLALMs\u7684\u4e00\u4e2a\u5173\u952e\u5f31\u70b9\uff1a\u5b83\u4eec\u5bf9\u533a\u5206\u6027\u67e5\u8be2\u7684\u7406\u89e3\u4e0d\u8db3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u63d0\u793a\u5de5\u7a0b\u5982\u4f55\u63d0\u5347LALMs\u5728\u533a\u5206\u6027\u95ee\u9898\u4e0a\u7684\u6027\u80fd\u3002**|\n", "2406.08398": "|**2024-06-12**|**cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers**|Anirudh Sundar et.al.|[2406.08398](http://arxiv.org/abs/2406.08398)|null|## \u80cc\u666f 
\u5728\u60c5\u5883\u5316\u548c\u591a\u6a21\u6001\u4ea4\u4e92\u5bf9\u8bdd\uff08SIMMC\uff09\u7684\u65b0\u5174\u7814\u7a76\u9886\u57df\u4e2d\uff0c\u79d1\u5b66\u8bba\u6587\u7684\u4e92\u52a8\u662f\u4e00\u4e2a\u91cd\u8981\u65b9\u5411\u3002\u7531\u4e8e\u79d1\u5b66\u8bba\u6587\u4e3b\u8981\u7531\u6587\u672c\u3001\u516c\u5f0f\u3001\u56fe\u8868\u548c\u8868\u683c\u6784\u6210\uff0cSIMMC\u65b9\u6cd5\u9700\u8981\u9488\u5bf9\u8fd9\u4e9b\u7ec4\u6210\u90e8\u5206\u8fdb\u884c\u4e13\u95e8\u8bbe\u8ba1\uff0c\u4ee5\u652f\u6301\u79d1\u7814\u4eba\u5458\u6240\u9700\u7684\u6df1\u5ea6\u63a2\u7a76\u548c\u4e92\u52a8\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5bf9\u8bdd\u5f0f\u8bba\u6587\u201d\uff08cPAPERS\uff09\u7684\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u6765\u81eaarXiv\u4e0a\u53ef\u7528\u7684\u79d1\u5b66\u6587\u6863\u7684\u5b66\u672f\u8bba\u6587\u8bc4\u8bba\u4e2d\u7684\u95ee\u7b54\u5bf9\uff0c\u8fd9\u4e9b\u95ee\u7b54\u4e0e\u8bba\u6587\u7ec4\u4ef6\u53ca\u5176\u5f15\u7528\u76f8\u5173\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u6570\u636e\u6536\u96c6\u7b56\u7565\uff0c\u901a\u8fc7OpenReview\u6536\u96c6\u8fd9\u4e9b\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u5e76\u4e0eLaTeX\u6e90\u6587\u4ef6\u4e2d\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u5173\u8054\u8d77\u6765\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u7cfb\u5217\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5305\u62ec\u96f6\u6837\u672c\u548c\u5fae\u8c03\u914d\u7f6e\uff0c\u6765\u5904\u7406cPAPERS\u6570\u636e\u96c6\u3002|\n", "2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz 
et.al.|[2406.09418](http://arxiv.org/abs/2406.09418)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|**\u5728\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u5c55\u57fa\u7840\u4e0a\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u89c6\u9891\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u89c6\u9891LMMs\u4f9d\u8d56\u4e8e\u56fe\u50cf\u6216\u89c6\u9891\u7f16\u7801\u5668\u5904\u7406\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u4e9b\u7f16\u7801\u5668\u5404\u81ea\u5b58\u5728\u5c40\u9650\u6027\u3002\u56fe\u50cf\u7f16\u7801\u5668\u64c5\u957f\u6355\u6349\u5e27\u5e8f\u5217\u4e2d\u7684\u4e30\u5bcc\u7a7a\u95f4\u7ec6\u8282\uff0c\u4f46\u7f3a\u4e4f\u660e\u786e\u7684\u65f6\u95f4\u4e0a\u4e0b\u6587\uff1b\u800c\u89c6\u9891\u7f16\u7801\u5668\u63d0\u4f9b\u65f6\u95f4\u4e0a\u4e0b\u6587\uff0c\u4f46\u5e38\u5e38\u53d7\u9650\u4e8e\u8ba1\u7b97\u8d44\u6e90\uff0c\u5bfc\u81f4\u53ea\u80fd\u5904\u7406\u4f4e\u5206\u8fa8\u7387\u7684\u7a00\u758f\u5e27\uff0c\u4ece\u800c\u5f71\u54cd\u4e86\u5bf9\u7a7a\u95f4\u548c\u4e0a\u4e0b\u6587\u7684\u7406\u89e3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faVideoGPT+\uff0c\u5b83\u7ed3\u5408\u4e86\u56fe\u50cf\u7f16\u7801\u5668\uff08\u7528\u4e8e\u8be6\u7ec6\u7684\u7a7a\u95f4\u7406\u89e3\uff09\u548c\u89c6\u9891\u7f16\u7801\u5668\uff08\u7528\u4e8e\u5168\u5c40\u65f6\u5e8f\u4e0a\u4e0b\u6587\u5efa\u6a21\uff09\u7684\u4f18\u52bf\u3002\u8be5\u6a21\u578b\u901a\u8fc7\u5c06\u89c6\u9891\u5212\u5206\u4e3a\u5c0f\u6bb5\uff0c\u5e76\u5bf9\u6765\u81ea\u4e24\u8005\u7279\u5f81\u7684\u63d0\u53d6\u5e94\u7528\u81ea\u9002\u5e94\u6c60\u5316\u7b56\u7565\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\u3002\u6211\u4eec\u7684\u67b6\u6784\u5728\u591a\u4e2a\u89c6\u9891\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5305\u62ecVCGBench\u3001MVBench\u548c\u96f6\u6837\u672c\u95ee\u7b54\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a112K\u7684\u89c6\u9891\u6307\u4ee4\u96c6\uff0c\u901a\u8fc7\u65b0\u98
96\u7684\u534a\u81ea\u52a8\u6807\u6ce8\u7ba1\u9053\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u6027\u80fd\u3002\u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u89c6\u9891LMMs\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86VCGBench-Diverse\uff0c\u5b83\u6db5\u76d6\u4e8618\u4e2a\u5e7f\u6cdb\u89c6\u9891\u7c7b\u522b\uff0c\u5982\u751f\u6d3b\u65b9\u5f0f\u3001\u4f53\u80b2\u3001\u79d1\u5b66\u3001\u6e38\u620f\u548c\u76d1\u63a7\u89c6\u9891\uff0c\u51714,354\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u8fd9\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30\u73b0\u6709LMMs\u5728\u5bc6\u96c6\u89c6\u9891\u63cf\u8ff0\u3001\u7a7a\u95f4\u548c\u65f6\u95f4\u7406\u89e3\u4ee5\u53ca\u590d\u6742\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u786e\u4fdd\u5728\u5404\u79cd\u89c6\u9891\u7c7b\u578b\u548c\u52a8\u6001\u4e0b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mbzuai-oryx/VideoGPT-plus\u627e\u5230\u3002**|\n", "2406.09412": "|**2024-06-13**|**Explore the Limits of Omni-modal Pretraining at Scale**|Yiyuan Zhang 
et.al.|[2406.09412](http://arxiv.org/abs/2406.09412)|**[link](https://github.com/invictus717/MiCo)**|**\u6211\u4eec\u63d0\u8bae\u6784\u5efa\u5168\u6a21\u6001\u667a\u80fd\uff0c\u65e8\u5728\u7406\u89e3\u5404\u79cd\u6a21\u6001\u5e76\u5b66\u4e60\u901a\u7528\u8868\u793a\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u9884\u8bad\u7ec3\u8303\u5f0f\uff0c\u79f0\u4e3a\u591a\u6a21\u6001\u4e0a\u4e0b\u6587\uff08MiCo\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u540c\u65f6\u589e\u52a0\u6a21\u6001\u6570\u91cf\u3001\u6570\u636e\u91cf\u4ee5\u53ca\u6a21\u578b\u53c2\u6570\u7684\u6570\u91cf\u3002\u901a\u8fc7MiCo\uff0c\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u591a\u9879\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u6a21\u6001\u5b66\u4e60\u80fd\u529b\uff1a\u4e00\u662f\u9488\u5bf910\u79cd\u4e0d\u540c\u6a21\u6001\u7684\u5355\u6a21\u6001\u611f\u77e5\u57fa\u51c6\uff0c\u4e8c\u662f\u5305\u62ec\u68c0\u7d22\u3001\u95ee\u7b54\u548ccaptioning\u5728\u5185\u768425\u9879\u8de8\u6a21\u6001\u7406\u89e3\u4efb\u52a1\uff0c\u4e09\u662f18\u4e2a\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u57fa\u51c6\u3002\u6211\u4eec\u7684\u6a21\u578b\u521b\u9020\u4e8637\u9879\u6700\u65b0\u7684\u6700\u9ad8\u6027\u80fd\u8bb0\u5f55\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u63a8\u52a8\u5168\u6a21\u6001\u667a\u80fd\u7684\u53d1\u5c55\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728\u5f00\u6e90\u3002**|\n", "2406.09397": "|**2024-06-13**|**Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang 
et.al.|[2406.09397](http://arxiv.org/abs/2406.09397)|null|\u73b0\u4ee3\u89c6\u89c9\u6a21\u578b\u5728\u5927\u89c4\u6a21\u5608\u6742\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u867d\u7136\u5c55\u73b0\u51fa\u5f3a\u5927\u80fd\u529b\uff0c\u4f46\u5728\u9075\u5faa\u7528\u6237\u610f\u56fe\u3001\u5982\u89c6\u89c9\u7f8e\u611f\u3001\u7279\u5b9a\u98ce\u683c\u548c\u8d23\u4efb\u8f93\u51fa\u65b9\u9762\u53ef\u80fd\u5b58\u5728\u95ee\u9898\u3002\u672c\u6587\u5173\u6ce8\u89c6\u89c9\u7f8e\u5b66\u9886\u57df\uff0c\u76ee\u6807\u662f\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u6807\u51c6\u5728\u68c0\u7d22\u7cfb\u7edf\u4e2d\u4fdd\u6301\u4e00\u81f4\u3002\u9ad8\u7ea7\u68c0\u7d22\u7cfb\u7edf\u901a\u5e38\u91c7\u7528\u57fa\u4e8e\u4f4e\u7ea7\u7279\u5f81\uff08\u5982\u9971\u548c\u5ea6\uff09\u7684\u5ba1\u7f8e\u6a21\u578b\u4f5c\u4e3a\u91cd\u6392\u5668\u6216\u8fc7\u6ee4\u5668\uff0c\u4f46\u9762\u5bf9\u98ce\u683c\u3001\u6587\u5316\u6216\u77e5\u8bc6\u80cc\u666f\u65f6\u6027\u80fd\u6709\u9650\u3002\u6211\u4eec\u53d1\u73b0\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u901a\u8fc7\u6539\u5199\u641c\u7d22\u67e5\u8be2\u5e76\u6269\u5c55\u5ba1\u7f8e\u671f\u671b\uff0c\u53ef\u4ee5\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u504f\u597d\u7684\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u9488\u5bf9\u89c6\u89c9\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u63d0\u53d6LLM\u63a8\u7406\u548c\u5ba1\u7f8e\u6a21\u578b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u66f4\u597d\u5730\u4f7f\u89c6\u89c9\u6a21\u578b\u7b26\u5408\u4eba\u7c7b\u5ba1\u7f8e\u3002\u7531\u4e8e\u7f3a\u4e4f\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u68c0\u7d22\u7cfb\u7edf\u7684\u57fa\u51c6\uff0c\u6211\u4eec\u5229\u7528\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5927\u6a21\u578b\uff08LMM\uff09\u6765\u8bc4\u4ef7\u7f8e\u611f\u8868\u73b0\u3002\u8003\u8651\u5230\u7f8e\u611f\u8bc4\u4f30\u7684\u4e3b\u89c2\u6027\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aHPIR\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8861\u91cf\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u7684\u5951\u5408\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u5347\u4e86\u89c6\u89c9\u6a21\u578b\u7684\u7f8e\u611f\u884c\u4e3a\uff0c\u4ece\u591a\u4e2a\u6307\u6807\u6765\u770b\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u63d0\u51fa\u7684\u7b97\u6cd5\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u79cd\u901a\u7528\u5b9e\u8df5\uff0c\u7528\u4e8e\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u76f8\u4e00\u81f4\u3002|\n", "2406.09396": "|**2024-06-13**|**Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA**|Jongwoo Park 
et.al.|[2406.09396](http://arxiv.org/abs/2406.09396)|**[link](https://github.com/jongwoopark7978/LVNet)**|\u957f\u671f\u89c6\u9891\u901a\u5e38\u5305\u542b\u5927\u91cf\u5197\u4f59\u4fe1\u606f\uff0c\u8de8\u8d8a\u8f83\u957f\u7684\u65f6\u95f4\u95f4\u9694\uff0c\u4e14\u5305\u542b\u591a\u4e2a\u677e\u6563\u5173\u8054\u7684\u4e8b\u4ef6\u6216\u5b9e\u4f53\u3002\u56e0\u6b64\uff0c\u5728\u8fdb\u884c\u957f\u89c6\u9891\u95ee\u7b54\uff08LVQA\uff09\u65f6\uff0c\u751f\u6210\u6b63\u786e\u7b54\u6848\u6240\u9700\u7684\u6240\u6709\u4fe1\u606f\u5f80\u5f80\u53ea\u9700\u4e00\u5c0f\u90e8\u5206\u5e27\u5c31\u8db3\u4ee5\u63d0\u4f9b\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8bd5\u56fe\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728LVQA\u57fa\u51c6\u4e0a\u53d6\u5f97\u5353\u8d8a\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u4f9d\u8d56\u4e8e\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5c06\u89c6\u9891\u4e2d\u7684\u6240\u6709\u89c6\u89c9\u5185\u5bb9\u8f6c\u6362\u6210\u81ea\u7136\u8bed\u8a00\u3002\u4f20\u7edf\u505a\u6cd5\u901a\u5e38\u662f\u5747\u5300\u91c7\u6837\u5927\u91cf\u5e27\u5e76\u72ec\u7acb\u4e3a\u5176\u751f\u6210\u63cf\u8ff0\uff0c\u8fd9\u65e2\u4e0d\u9ad8\u6548\u4e5f\u4e0d\u514d\u6709\u5197\u4f59\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5173\u952e\u5e27\u9009\u62e9\u548c\u987a\u5e8f\u611f\u77e5\u7684\u63cf\u8ff0\u65b9\u6cd5\uff0c\u4ee5\u663e\u8457\u51cf\u5c11\u8fd9\u4e9b\u5197\u4f59\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u521b\u65b0\u65b9\u6cd5\uff1a\u5c42\u6b21\u5173\u952e\u5e27\u9009\u62e9\u5668\u548c\u987a\u5e8f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u7684\u6700\u7ec8\u6846\u67b6\u79f0\u4e3aLVNet\uff0c\u5728\u4e09\u4e2a\u57fa\u51c6LVQA\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u4ee3\u7801\u3002|\n", "2406.09367": "|**2024-06-13**|**Needle In A Video Haystack: A Scalable Synthetic Framework for 
Benchmarking Video MLLMs**|Zijia Zhao et.al.|[2406.09367](http://arxiv.org/abs/2406.09367)|**[link](https://github.com/joez17/videoniah)**|**\u89c6\u9891\u7406\u89e3\u662f\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u5173\u952e\u4e0b\u4e00\u6b65\u3002\u4e3a\u4e86\u68c0\u9a8c\u89c6\u9891\u7406\u89e3\u7684\u7279\u5b9a\u65b9\u9762\uff0c\u73b0\u6709\u7684\u89c6\u9891\u57fa\u51c6\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u9009\u62e9\u4e0e\u76ee\u6807\u80fd\u529b\u5339\u914d\u7684\u89c6\u9891\uff0c\u5e76\u5bf9\u67e5\u8be2-\u54cd\u5e94\u5bf9\u8fdb\u884c\u7e41\u7410\u7684\u6807\u6ce8\uff0c\u4ee5\u5339\u914d\u89c6\u9891\u5185\u5bb9\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u65e2\u5177\u6709\u6311\u6218\u6027\u53c8\u8d44\u6e90\u5bc6\u96c6\u3002\u672c\u6587\u63d0\u51faVideoNIAH\uff08\u89c6\u9891\u9488 haystack\uff09\uff0c\u4e00\u4e2a\u901a\u8fc7\u5408\u6210\u89c6\u9891\u751f\u6210\u7684\u57fa\u51c6\u6784\u5efa\u6846\u67b6\u3002VideoNIAH\u901a\u8fc7\u5c06\u4e0d\u76f8\u5173\u7684\u56fe\u50cf/\u6587\u672c\u201c\u9488\u201d\u63d2\u5165\u539f\u59cb\u89c6\u9891\u4e2d\uff0c\u5c06\u6d4b\u8bd5\u89c6\u9891\u5185\u5bb9\u4e0e\u5b83\u4eec\u7684\u67e5\u8be2-\u54cd\u5e94\u5206\u79bb\u3002\u5b83\u4ec5\u57fa\u4e8e\u8fd9\u4e9b\u9488\u751f\u6210\u6ce8\u91ca\uff0c\u786e\u4fdd\u89c6\u9891\u6765\u6e90\u7684\u591a\u6837\u6027\u548c\u67e5\u8be2-\u54cd\u5e94\u7684\u4e30\u5bcc\u6027\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u63d2\u5165\u591a\u4e2a\u9488\uff0cVideoNIAH\u4e25\u683c\u8bc4\u4f30\u6a21\u578b\u7684\u65f6\u5e8f\u7406\u89e3\u80fd\u529b\u3002\u6211\u4eec\u5229\u7528VideoNIAH\u6784\u5efa\u4e86\u89c6\u9891\u57fa\u51c6VNBench\uff0c\u5305\u62ec\u68c0\u7d22\u3001\u6392\u5e8f\u548c\u8ba1\u6570\u7b49\u4efb\u52a1\u3002VNBench\u80fd\u591f\u9ad8\u6548\u5730\u8bc4\u4f30\u89c6\u9891\u6a21\u578b\u7684\u7cbe\u7ec6\u7406\u89e3\u80fd\u529b\u548c\u65f6\u7a7a\u5efa\u6a21\u80fd\u529b\uff0c\u540c\u65f6\u652f\u6301\u957f\u8ddd\u79bb\u4f9d\u8d56\u6027\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5bf9\u8
fd1\u671f\u7684\u89c6\u9891\u4e3a\u4e2d\u5fc3\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548c\u4e13\u6709\u6a21\u578b\uff0c\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u3002\u5c3d\u7ba1\u4e13\u6709\u6a21\u578b\u76f8\u5bf9\u4e8e\u5f00\u6e90\u6a21\u578b\u5177\u6709\u663e\u8457\u4f18\u52bf\uff0c\u4f46\u6240\u6709\u73b0\u6709\u89c6\u9891\u6a21\u578b\u5728\u957f\u8ddd\u79bb\u4f9d\u8d56\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u4ecd\u7136\u4e0d\u4f73\u3002VideoNIAH\u662f\u4e00\u4e2a\u7b80\u5355\u4e14\u9ad8\u5ea6\u53ef\u6269\u5c55\u7684\u57fa\u51c6\u6784\u5efa\u6846\u67b6\uff0c\u6211\u4eec\u76f8\u4fe1\u5b83\u5c06\u6fc0\u53d1\u672a\u6765\u89c6\u9891\u57fa\u51c6\u5de5\u4f5c\u7684\u521b\u65b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/joez17/VideoNIAH\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.09363": "|**2024-06-13**|**ElicitationGPT: Text Elicitation Mechanisms via Language Models**|Yifan Wu et.al.|[2406.09363](http://arxiv.org/abs/2406.09363)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u65e0\u9700\u9886\u57df\u77e5\u8bc6\u7684\u67e5\u8be2\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\uff09\u5bf9\u83b7\u53d6\u7684\u6587\u672c\u9884\u6d4b\u8fdb\u884c\u8bc4\u5206\uff0c\u4ee5\u8bc4\u4f30\u5176\u4e0e\u5b9e\u9645\u72b6\u6001\u7684\u4e00\u81f4\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u662f\u6fc0\u52b1\u4fe1\u606f\u6536\u96c6\u548c\u673a\u5668\u5b66\u4e60\u6a21\u578b\u8bad\u7ec3\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\u3002\u7814\u7a76\u901a\u8fc7\u5728\u540c\u884c\u8bc4\u5ba1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6bd4\u8f83\u81ea\u52a8\u7684\u6a21\u578b\u8bc4\u5206\u4e0e\u4eba\u5de5\u5bfc\u5e08\u7ed9\u51fa\u7684\u8bc4\u5206\uff0c\u65e8\u5728\u5b9e\u8bc1\u8bc4\u4f30\u8fd9\u4e9b\u673a\u5236\u4e0e\u4eba\u7c7b\u504f\u597d\u7684\u4e00\u81f4\u6027\u3002|\n", "2406.09345": "|**2024-06-13**|**DiscreteSLU: A Large Language Model with 
Self-Supervised Discrete Speech Units for Spoken Language Understanding**|Suwon Shon et.al.|[2406.09345](http://arxiv.org/abs/2406.09345)|null|## \u80cc\u666f \u5c06\u9884\u8bad\u7ec3\u7684\u6587\u672c\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u8bed\u97f3\u8f93\u5165\u76f8\u7ed3\u5408\uff0c\u5df2\u7ecf\u8d4b\u4e88\u4e86\u8fd9\u4e9b\u6a21\u578b\u6267\u884c\u591a\u6837\u5316\u8bed\u97f3\u4efb\u52a1\u7684\u80fd\u529b\uff0c\u5305\u62ec\u6307\u4ee4\u8ddf\u968f\u3002\u8fd9\u79cd\u6574\u5408\u9700\u8981\u7ed3\u5408\u8bed\u97f3\u7f16\u7801\u5668\u3001\u8bed\u97f3\u9002\u914d\u5668\u548cLLM\uff0c\u5b83\u4eec\u5206\u522b\u9488\u5bf9\u4e0d\u540c\u7684\u4efb\u52a1\u8fdb\u884c\u8bad\u7ec3\u3002\u6211\u4eec\u63d0\u8bae\u4f7f\u7528\u79bb\u6563\u8bed\u97f3\u5355\u5143\uff08DSU\uff09\uff0c\u800c\u975e\u8fde\u7eed\u503c\u7684\u8bed\u97f3\u7f16\u7801\u8f93\u51fa\uff0c\u901a\u8fc7\u8bed\u97f3\u9002\u914d\u5668\u5c06DSU\u8f6c\u6362\u5230LLM\u7684\u5d4c\u5165\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u65e0\u76d1\u7763\u7684\u8bed\u97f3\u7f16\u7801\u5668\u751f\u6210DSU\uff0c\u7136\u540e\u8fd0\u7528k-means\u805a\u7c7b\u65b9\u6cd5\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u5904\u7406\u6765\u81ea\u89c1/\u672a\u89c1\u8fc7\u9886\u57df\u4ee5\u53ca\u53e3\u8bed\u95ee\u7b54\u4e2d\u7684\u6307\u4ee4\u8ddf\u968f\u4efb\u52a1\u65f6\u8868\u73b0\u51fa\u7a33\u5065\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u6765\u81ea\u4e0d\u540c\u81ea\u76d1\u7763\u8bed\u97f3\u7f16\u7801\u5668\u5c42\u7684DSU\u7c7b\u578b\uff0c\u4ee5\u53ca\u6885\u5c14\u9891\u7387\u5012\u8c31\u7cfb\u6570\uff08MFCC\uff09\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u53e3\u8bed\u95ee\u7b54\u7684\u6307\u4ee4\u8c03\u4f18\u4efb\u52a1\u4e2d\uff0cASR\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u53ef\u80fd\u8f83\u4f4e\u3002|\n", "2406.09325": "|**2024-06-13**|**REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space**|Tomer Ashuach 
et.al.|[2406.09325](http://arxiv.org/abs/2406.09325)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ef\u80fd\u65e0\u610f\u4e2d\u8bb0\u4f4f\u5e76\u6cc4\u9732\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u654f\u611f\u6216\u4e2a\u4eba\u8bc6\u522b\u4fe1\u606f\uff08PII\uff09\uff0c\u5f15\u53d1\u9690\u79c1\u95ee\u9898\u3002\u5f53\u524d\u7684\u89e3\u51b3\u65b9\u6848\u5305\u62ec\u6602\u8d35\u7684\u6570\u636e\u6e05\u6d17\uff0c\u6216\u8005\u901a\u8fc7\u9057\u5fd8\u548c\u6a21\u578b\u7f16\u8f91\u6765\u8fc7\u6ee4\u6a21\u578b\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u53ef\u80fd\u88ab\u63d0\u53d6\u653b\u51fb\u7ed5\u8fc7\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\uff0c\u540d\u4e3aREVS\uff0c\u7528\u4e8e\u4eceLLMs\u4e2d\u6d88\u9664\u654f\u611f\u4fe1\u606f\u3002REVS\u8bc6\u522b\u5e76\u4fee\u6539\u4e0e\u6bcf\u6761\u654f\u611f\u4fe1\u606f\u76f8\u5173\u7684\u5c11\u91cf\u795e\u7ecf\u5143\u3002\u901a\u8fc7\u5c06\u8fd9\u4e9b\u795e\u7ecf\u5143\u6295\u5f71\u5230\u8bcd\u6c47\u7a7a\u95f4\uff08\u53bb\u5d4c\u5165\uff09\uff0c\u6211\u4eec\u5b9a\u4f4d\u9a71\u52a8\u5176\u751f\u6210\u7684\u5173\u952e\u90e8\u5206\u3002\u7136\u540e\uff0c\u6211\u4eec\u6839\u636e\u53bb\u5d4c\u5165\u77e9\u9635\u7684\u4f2a\u9006\u8ba1\u7b97\u6a21\u578b\u7f16\u8f91\uff0c\u5e76\u5e94\u7528\u5b83\u6765\u964d\u4f4e\u76ee\u6807\u654f\u611f\u6570\u636e\u7684\u751f\u6210\u6982\u7387\u3002\u4e3a\u4e86\u5145\u5206\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u771f\u6b63\u654f\u611f\u4fe1\u606f\u4e0a\u7684\u6548\u679c\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e24\u4e2a\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u662fGPT-J\u56fa\u6709\u7684\u7535\u5b50\u90ae\u4ef6\u6570\u636e\u96c6\uff0c\u53e6\u4e00\u4e2a\u662f\u6211\u4eec\u8c03\u6574\u6a21\u578b\u4f7f\u5176\u8bb0\u5fc6\u7684\u5408\u6210\u793e\u4f1a\u4fdd\u969c\u53f7\u7801\u6570\u636e\u96c6\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\u76f8\u6bd4\uff0cREVS\u5728\u6d88\u9664\u654f\u611f\u4fe1\u606f\u548c\u62b5
\u6297\u63d0\u53d6\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u578b\u7684\u5b8c\u6574\u6027\u3002\u4ee3\u7801\u548c\u6f14\u793a\u7b14\u8bb0\u672c\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.09324": "|**2024-06-13**|**Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs**|Zhao Xu et.al.|[2406.09324](http://arxiv.org/abs/2406.09324)|**[link](https://github.com/usail-hkust/bag_of_tricks_for_llm_jailbreaking)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u96f6\u6837\u672c\u4efb\u52a1\u6267\u884c\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u6613\u53d7\u7834\u89e3\u653b\u51fb\uff0c\u53ef\u80fd\u88ab\u64cd\u7eb5\u4ea7\u751f\u6709\u5bb3\u8f93\u51fa\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f00\u59cb\u5c06\u7834\u89e3\u653b\u51fb\u5206\u4e3a\u4ee4\u724c\u7ea7\u548c\u63d0\u793a\u7ea7\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u5de5\u4f5c\u4e3b\u8981\u5ffd\u89c6\u4e86\u7834\u89e3\u653b\u51fb\u7684\u591a\u6837\u5173\u952e\u56e0\u7d20\uff0c\u5927\u90e8\u5206\u7814\u7a76\u805a\u7126\u4e8eLLM\u7684\u6f0f\u6d1e\uff0c\u800c\u5bf9\u9632\u5fa1\u589e\u5f3a\u7684LLMs\u63a2\u7d22\u4e0d\u8db3\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u72b6\u51b5\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e0d\u540c\u653b\u51fb\u8bbe\u7f6e\u5bf9LLM\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5e76\u63d0\u8bae\u5efa\u7acb\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u6846\u67b6\uff0c\u4ee5\u4fc3\u8fdb\u6807\u51c6\u5316\u8bc4\u4f30\u3002\u6211\u4eec\u4ece\u76ee\u6807\u7ea7\u548c\u653b\u51fb\u7ea7\u4e24\u4e2a\u89d2\u5ea6\uff0c\u8be6\u7ec6\u8003\u5bdf\u4e86\u5b9e\u65bd\u9488\u5bf9LLMs\u7684\u7834\u89e3\u653b\u51fb\u7684\u516b\u4e2a\u5173\u952e\u56e0\u7d20\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5e38\u7528\u6570\u636e\u96c6\u4e0a\u5bf9\u516d\u79cd\u9632\u5fa1\u65b9\u6cd5\u8fdb\u884c\u4e86\u4e03\u79cd\u4ee3\u8868\u6027\u7684\u7834\u89e3\u653b\u51fb\uff0c\u603b\u8ba1\u7ea6320\u4e2a\u5b9e\u9a8c\uff0c\u4f7f\u7528A800-80G 
GPU\u8017\u65f6\u5927\u7ea65\u4e07\u5c0f\u65f6\u3002\u5b9e\u9a8c\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u9632\u5fa1\u589e\u5f3a\u7684LLMs\u8fdb\u884c\u6807\u51c6\u5316\u8bc4\u4f30\u7684\u5fc5\u8981\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff1ahttps://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking\u3002**|\n", "2406.09321": "|**2024-06-13**|**JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models**|Delong Ran et.al.|[2406.09321](http://arxiv.org/abs/2406.09321)|**[link](https://github.com/thuccslab/jailbreakeval)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8d8a\u72f1\u653b\u51fb\u7814\u7a76\u4e2d\u7684\u8bc4\u4f30\u96be\u9898\u3002\u76ee\u524d\uff0c\u5bf9\u4e8e\u653b\u51fb\u662f\u5426\u6210\u529f\u7f3a\u4e4f\u7edf\u4e00\u6807\u51c6\uff0c\u4e0d\u540c\u7684\u8bc4\u4f30\u65b9\u6cd5\u5982\u4eba\u5de5\u6807\u6ce8\u6216\u7279\u5b9a\u65b9\u5f0f\u63d0\u793aGPT-4\u5b58\u5728\uff0c\u5404\u6709\u4f18\u7f3a\u70b9\uff0c\u5bf9\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u4f53\u73b0\u548c\u7814\u7a76\u6210\u672c\u4ea7\u751f\u5f71\u54cd\u3002\u6211\u4eec\u7684\u7814\u7a76\u5206\u6790\u4e86\u8fd1\u4e5d\u5341\u98792023\u5e745\u6708\u81f32024\u5e744\u6708\u671f\u95f4\u53d1\u5e03\u7684\u8d8a\u72f1\u653b\u51fb\u76f8\u5173\u7814\u7a76\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u8be6\u7ec6\u7684\u8bc4\u4f30\u65b9\u6cd5\u5206\u7c7b\u4f53\u7cfb\uff0c\u6df1\u5165\u5256\u6790\u4e86\u5404\u79cd\u8bc4\u4f30\u5668\u7684\u4f18\u7f3a\u70b9\u53ca\u5176\u5e94\u7528\u73b0\u72b6\u3002\u4e3a\u4e86\u63a8\u52a8\u540e\u7eed\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u5e76\u63a8\u51fa\u4e86JailbreakEval\u5de5\u5177\u5305\uff0c\u5b83\u662f\u4e00\u4e2a\u7528\u6237\u53cb\u597d\u7684\u5e73\u53f0\uff0c\u96c6\u6210\u4e86\u591a\u79cd\u77e5\u540d\u7684\u8bc4\u4f30\u5668\uff0c\u7528\u6237\u53ea\u9700\u4e00\u4e2a\u547d\u4ee4\u5373\u53ef\u83b7\u53d6\u7ed3\u679c\u3002\u6b64\u5916\uff0cJailbreakEval\u6
52f\u6301\u7528\u6237\u5728\u7edf\u4e00\u6846\u67b6\u5185\u5b9a\u5236\u81ea\u5b9a\u4e49\u8bc4\u4f30\u6d41\u7a0b\uff0c\u7b80\u5316\u4e86\u5f00\u53d1\u548c\u6bd4\u8f83\u8fc7\u7a0b\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u671f\u671bJailbreakEval\u80fd\u4fc3\u8fdb\u8d8a\u72f1\u653b\u51fb\u8bc4\u4ef7\u7684\u6807\u51c6\u5316\uff0c\u6210\u4e3a\u793e\u533a\u5185\u8d8a\u72f1\u7814\u7a76\u8bc4\u4f30\u7684\u50ac\u5316\u5242\u3002**|\n", "2406.10229": "|**2024-06-14**|**Quantifying Variance in Evaluation Benchmarks**|Lovish Madaan et.al.|[2406.10229](http://arxiv.org/abs/2406.10229)|null|\u8bc4\u4ef7\u57fa\u51c6\u662f\u8861\u91cf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u529b\u7684\u5173\u952e\uff0c\u4e5f\u662f\u63a8\u52a8\u8fd9\u4e9b\u80fd\u529b\u8fdb\u6b65\u7684\u9a71\u52a8\u529b\u3002\u6700\u521d\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff08\u6216\u7f3a\u4e4f\uff09\uff0c\u73b0\u5728\u5b83\u4eec\u4e5f\u88ab\u5e7f\u6cdb\u7528\u4e8e\u51b3\u5b9a\u4e0d\u540c\u7684\u8bad\u7ec3\u9009\u62e9\u4e4b\u95f4\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u6211\u4eec\u5f88\u5c11\u91cf\u5316\u8bc4\u4ef7\u57fa\u51c6\u7684\u65b9\u5dee\uff0c\u8fd9\u51b3\u5b9a\u4e86\u6027\u80fd\u5dee\u5f02\u7684\u542b\u4e49\u3002\u672c\u6587\u5b9a\u4e49\u5e76\u6d4b\u91cf\u4e86\u4e00\u7cfb\u5217\u65e8\u5728\u8861\u91cf\u8bc4\u4ef7\u57fa\u51c6\u65b9\u5dee\u7684\u6307\u6807\uff0c\u5305\u62ec\u521d\u59cb\u5316\u65f6\u7684\u968f\u673a\u79cd\u5b50\u65b9\u5dee\u548c\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u5355\u8c03\u6027\u3002\u901a\u8fc7\u5bf9\u5927\u91cf\u6a21\u578b\uff08\u5305\u62ec\u516c\u5f00\u53ef\u7528\u7684\u548c\u4ece\u5934\u8bad\u7ec3\u7684\u6a21\u578b\uff09\u8fdb\u884c\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5404\u79cd\u65b9\u5dee\u5ea6\u91cf\u7684\u5b9e\u8bc1\u4f30\u8ba1\uff0c\u5e76\u4e3a\u5b9e\u8df5\u8005\u63d0\u4f9b\u4e86\u8003\u8651\u548c\u5efa\u8bae\u3002\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u8fde\u7eed\u548c\u79bb\
u6563\u6027\u80fd\u5ea6\u91cf\u7684\u5b9e\u7528\u6027\u548c\u6743\u8861\uff0c\u5e76\u63a2\u7d22\u4e86\u66f4\u597d\u5730\u7406\u89e3\u548c\u51cf\u5c11\u65b9\u5dee\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5bf9\u4e8e\u8f83\u5c0f\u89c4\u6a21\uff08\u7ea670\u4ebf\u53c2\u6570\uff09\u7684\u6a21\u578b\uff0c\u5982\u5c06\u591a\u6a21\u6001\u591a\u4efb\u52a1\u5b66\u4e60\uff08MMLU\uff09\u4efb\u52a1\u6846\u67b6\u4e3a\u5b8c\u6210\u4efb\u52a1\uff0c\u53ef\u4ee5\u5e38\u5e38\u964d\u4f4e\u65b9\u5dee\uff1b\u800c\u53d7\u5230\u4eba\u7c7b\u6d4b\u8bd5\u6587\u732e\u542f\u53d1\u7684\u66f4\u590d\u6742\u65b9\u6cd5\uff08\u5982\u9879\u76ee\u5206\u6790\u548c\u9879\u76ee\u53cd\u5e94\u7406\u8bba\uff09\u5728\u663e\u8457\u51cf\u5c11\u65b9\u5dee\u65b9\u9762\u6548\u679c\u6709\u9650\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63ed\u793a\u4e86\u8bc4\u4ef7\u57fa\u51c6\u7684\u65b9\u5dee\u7279\u6027\uff0c\u63d0\u51fa\u4e86\u9488\u5bf9LLMs\u7684\u7279\u5b9a\u6280\u672f\u6765\u51cf\u5c11\u65b9\u5dee\uff0c\u5e76\u666e\u904d\u9f13\u52b1\u5b9e\u8df5\u8005\u5728\u6bd4\u8f83\u6a21\u578b\u65f6\u4ed4\u7ec6\u8003\u8651\u65b9\u5dee\u56e0\u7d20\u3002|\n", "2406.10218": "|**2024-06-14**|**Semantic Membership Inference Attack against Large Language Models**|Hamid Mozaffari et.al.|[2406.10218](http://arxiv.org/abs/2406.10218)|null|## \u80cc\u666f \u6210\u5458\u8eab\u4efd\u6cc4\u9732\u653b\u51fb\uff08Membership Inference Attacks\uff0cMIA\uff09\u7684\u76ee\u6807\u662f\u8bc6\u522b\u7279\u5b9a\u6570\u636e\u70b9\u662f\u5426\u88ab\u7eb3\u5165\u4e86\u76ee\u6807\u6a21\u578b\u7684\u8bad\u7ec3\u96c6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u8bed\u4e49\u6210\u5458\u8eab\u4efd\u6cc4\u9732\u653b\u51fb\uff08Semantic Membership Inference 
Attack\uff0cSMIA\uff09\uff0c\u901a\u8fc7\u5229\u7528\u8f93\u5165\u7684\u8bed\u4e49\u5185\u5bb9\u53ca\u5176\u6270\u52a8\uff0c\u63d0\u5347MIA\u7684\u6027\u80fd\u3002SMIA\u8bad\u7ec3\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u6765\u5206\u6790\u76ee\u6807\u6a21\u578b\u5bf9\u6270\u52a8\u8f93\u5165\u7684\u884c\u4e3a\uff0c\u4ece\u800c\u6355\u6349\u6210\u5458\u6837\u672c\u4e0e\u975e\u6210\u5458\u6837\u672c\u4e4b\u95f4\u8f93\u51fa\u6982\u7387\u5206\u5e03\u7684\u5dee\u5f02\u3002\u6211\u4eec\u5728Pythia\u548cGPT-Neo\u6a21\u578b\u5bb6\u65cf\uff0c\u4ee5\u53caWikipedia\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u7684\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSMIA\u660e\u663e\u4f18\u4e8e\u73b0\u6709\u653b\u51fb\u624b\u6bb5\uff0c\u4f8b\u5982\u5728Pythia-12B\u4e0a\u7684AUC-ROC\u503c\u8fbe\u5230\u4e8667.39%\uff0c\u800c\u7b2c\u4e8c\u597d\u7684\u653b\u51fb\u65b9\u6cd5\u4ec5\u4e3a58.90%\u3002|\n", "2406.10216": "|**2024-06-14**|**Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs**|Rui Yang 
et.al.|[2406.10216](http://arxiv.org/abs/2406.10216)|null|\u5728\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6846\u67b6\u4e2d\uff0c\u5229\u7528\u57fa\u4e8e\u4eba\u7c7b\u504f\u597d\u6570\u636e\u7684\u5956\u52b1\u6a21\u578b\u5df2\u8bc1\u5b9e\u80fd\u6709\u6548\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u610f\u56fe\u3002\u7136\u800c\uff0c\u5f53\u524d\u5956\u52b1\u6a21\u578b\u5bf9\u672a\u89c1\u8fc7\u7684\u63d0\u793a\u548c\u54cd\u5e94\u7684\u6cdb\u5316\u80fd\u529b\u6709\u9650\uff0c\u53ef\u80fd\u5bfc\u81f4\u6240\u8c13\u7684\u8fc7\u5ea6\u4f18\u5316\u95ee\u9898\uff0c\u5373\u5956\u52b1\u4f18\u5316\u8fc7\u5ea6\u5bfc\u81f4\u5b9e\u9645\u6027\u80fd\u4e0b\u964d\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u503e\u5411\u4e8e\u7ea6\u675f\u7b56\u7565\u4f18\u5316\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u6b63\u5219\u5316\u9690\u85cf\u72b6\u6001\u6765\u589e\u5f3a\u5956\u52b1\u6a21\u578b\u5e94\u5bf9\u5206\u5e03\u53d8\u5316\u7684\u6cdb\u5316\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4fdd\u7559\u57fa\u7840\u6a21\u578b\u7684\u8bed\u8a00\u6a21\u578b\u5934\uff0c\u5e76\u7ed3\u5408\u4e00\u7cfb\u5217\u6587\u672c\u751f\u6210\u635f\u5931\uff0c\u65e8\u5728\u4fdd\u6301\u9690\u85cf\u72b6\u6001\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\uff0c\u540c\u65f6\u5728\u76f8\u540c\u7684\u9690\u85cf\u72b6\u6001\u540e\u5b66\u4e60\u4e00\u4e2a\u5956\u52b1\u5934\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5f15\u5165\u7684\u6b63\u5219\u5316\u6280\u672f\u663e\u8457\u63d0\u9ad8\u4e86\u5728\u5404\u79cd\u6cdb\u5316\u4efb\u52a1\u4e2d\u7684\u5956\u52b1\u6a21\u578b\u51c6\u786e\u6027\uff0c\u5e76\u6709\u6548\u7f13\u89e3\u4e86RLHF\u4e2d\u7684\u8fc7\u5ea6\u4f18\u5316\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u66f4\u53ef\u9760\u3001\u66f4\u7a33\u5065\u7684\u504f\u597d\u5b66\u4e60\u8303\u5f0f\u3002|\n", "2406.10209": "|**2024-06-14**|**Be like a Goldfish, 
Don't Memorize! Mitigating Memorization in Generative LLMs**|Abhimanyu Hans et.al.|[2406.10209](http://arxiv.org/abs/2406.10209)|**[link](https://github.com/ahans30/goldfish-loss)**|**## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u8bb0\u4f4f\u5e76\u91cd\u590d\u5176\u8bad\u7ec3\u6570\u636e\uff0c\u8fd9\u5e26\u6765\u4e86\u9690\u79c1\u548c\u7248\u6743\u95ee\u9898\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u79cd\u8bb0\u5fc6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5bf9\u4e0b\u4e00\u6b65 token \u8bad\u7ec3\u76ee\u6807\u7684\u5fae\u5999\u4fee\u6539\uff0c\u79f0\u4e3a\u201c\u91d1\u9c7c\u635f\u5931\u201d\u3002\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u968f\u673a\u9009\u62e9\u4e00\u90e8\u5206\u4ee4\u724c\u4e0d\u53c2\u4e0e\u635f\u5931\u8ba1\u7b97\u3002\u6a21\u578b\u4e0d\u4f1a\u8bb0\u4f4f\u8fd9\u4e9b\u88ab\u4e22\u5f03\u7684\u4ee4\u724c\uff0c\u4ece\u800c\u9632\u6b62\u4e86\u5b8c\u6574\u8bad\u7ec3\u5e8f\u5217\u7684\u9010\u5b57\u590d\u5236\u3002\u6211\u4eec\u5728\u6570\u5341\u4ebf\u89c4\u6a21\u7684 Llama-2 \u6a21\u578b\u4e0a\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5305\u62ec\u9884\u8bad\u7ec3\u548c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u53ef\u63d0\u53d6\u7684\u8bb0\u5fc6\uff0c\u800c\u5bf9\u4e0b\u6e38\u57fa\u51c6\u7684\u5f71\u54cd\u5fae\u4e4e\u5176\u5fae\u3002**|\n", "2406.10196": "|**2024-06-14**|**TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners**|Tomas de la Rosa et.al.|[2406.10196](http://arxiv.org/abs/2406.10196)|null|**\u6458\u8981\uff1a** 
\u65c5\u884c\u89c4\u5212\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u6839\u636e\u7ea6\u675f\u6761\u4ef6\u751f\u6210\u4e00\u7cfb\u5217\u4e0e\u8bbf\u95ee\u5730\u70b9\u76f8\u5173\u7684\u884c\u52a8\uff0c\u540c\u65f6\u6700\u5927\u5316\u7528\u6237\u7684\u6ee1\u610f\u5ea6\u3002\u4f20\u7edf\u65b9\u6cd5\u901a\u5e38\u4f1a\u5c06\u95ee\u9898\u8f6c\u5316\u4e3a\u7279\u5b9a\u5f62\u5f0f\u7684\u8bed\u8a00\u8868\u8fbe\uff0c\u4ece\u7f51\u7edc\u8d44\u6e90\u4e2d\u63d0\u53d6\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u4f7f\u7528\u5408\u9002\u7684\u6c42\u89e3\u5668\u6765\u751f\u6210\u6709\u6548\u89e3\u51b3\u65b9\u6848\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u65b9\u6cd5\u76f4\u63a5\u4ece\u7528\u6237\u8bf7\u6c42\u4e2d\u8f93\u51fa\u8ba1\u5212\uff0c\u5229\u7528\u4e30\u5bcc\u7684\u65c5\u884c\u9886\u57df\u77e5\u8bc6\u63d0\u4f9b\u666f\u70b9\u548c\u53ef\u80fd\u8def\u7ebf\u7b49\u9ad8\u5c42\u6b21\u4fe1\u606f\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5f80\u5f80\u4ea7\u751f\u4e0d\u8fde\u8d2f\u3001\u672a\u80fd\u5b8c\u5168\u6ee1\u8db3\u7ea6\u675f\u7684\u8ba1\u5212\uff0c\u4e14\u65e0\u6cd5\u4fdd\u8bc1\u751f\u6210\u9ad8\u8d28\u91cf\u65b9\u6848\u3002\u6211\u4eec\u63d0\u51faTRIP-PAL\uff0c\u4e00\u79cd\u878d\u5408LLMs\u548c\u81ea\u52a8\u5316\u89c4\u5212\u5668\u7684\u6df7\u5408\u65b9\u6cd5\uff1a\uff081\uff09LLMs\u83b7\u53d6\u5e76\u8f6c\u6362\u65c5\u884c\u4fe1\u606f\u548c\u7528\u6237\u9700\u6c42\uff0c\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u8f93\u5165\u89c4\u5212\u5668\u7684\u6570\u636e\u7ed3\u6784\uff1b\uff082\uff09\u81ea\u52a8\u5316\u89c4\u5212\u5668\u8d1f\u8d23\u751f\u6210\u6ee1\u8db3\u7ea6\u675f\u5e76\u4f18\u5316\u7528\u6237\u6548\u7528\u7684\u65c5\u884c\u8ba1\u5212\u3002\u6211\u4eec\u5728\u4e0d\u540c\u65c5\u884c\u573a\u666f\u4e2d\u7684\u5b9e\u9a8c\u8868\u660e\uff0cTRIP-PAL\u5728\u751f\u6210\u65c5\u884c\u8ba1\u5212\u65b9\u9762\u4f18\u4e8e\u7eafLLM\u65b9\u6cd5\u3002|\n", 
"2406.10185": "|**2024-06-14**|**Detecting and Evaluating Medical Hallucinations in Large Vision Language Models**|Jiawei Chen et.al.|[2406.10185](http://arxiv.org/abs/2406.10185)|null|\u968f\u7740\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u5728\u533b\u7597\u9886\u57df\u7684\u5e94\u7528\u65e5\u76ca\u589e\u957f\uff0c\u5982\u533b\u5b66\u56fe\u50cf\u95ee\u7b54\u548c\u62a5\u544a\u751f\u6210\uff0c\u5b83\u4eec\u4ece\u57fa\u7840\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u90a3\u91cc\u7ee7\u627f\u4e86\u5f3a\u5927\u7684\u529f\u80fd\uff0c\u4f46\u540c\u65f6\u4e5f\u5e26\u6765\u4e86\u4ee4\u4eba\u62c5\u5fe7\u7684\u5e7b\u89c9\u95ee\u9898\uff0c\u8fd9\u5728\u533b\u7597\u8fd9\u6837\u5bf9\u9519\u8bef\u5bb9\u9650\u6781\u4f4e\u7684\u73af\u5883\u4e2d\u5c24\u4e3a\u91cd\u8981\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u533b\u7597\u9886\u57df\u7684\u5e7b\u89c9\u68c0\u6d4b\u548c\u8bc4\u4f30\u65b9\u6cd5\u6216\u57fa\u51c6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86Med-HallMark\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u533b\u7597\u591a\u6a21\u6001\u9886\u57df\u8bbe\u8ba1\u7684\u5e7b\u89c9\u68c0\u6d4b\u548c\u8bc4\u4f30\u57fa\u51c6\u3002Med-HallMark\u652f\u6301\u591a\u4efb\u52a1\u5e7b\u89c9\u68c0\u6d4b\uff0c\u63d0\u4f9b\u591a\u5143\u5316\u7684\u5e7b\u89c9\u6570\u636e\uff0c\u5e76\u91c7\u7528\u5206\u7ea7\u5e7b\u89c9\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MediHall 
Score\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u533b\u7597\u8bc4\u4f30\u6307\u6807\uff0c\u901a\u8fc7\u5206\u5c42\u8bc4\u5206\u7cfb\u7edf\u8bc4\u4f30LVLM\u7684\u5e7b\u89c9\uff0c\u8003\u8651\u5176\u4e25\u91cd\u7a0b\u5ea6\u548c\u7c7b\u578b\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u6f5c\u5728\u4e34\u5e8a\u5f71\u54cd\u7684\u7ec6\u81f4\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86MediHallDetector\uff0c\u4e00\u79cd\u4e13\u4e3a\u7cbe\u786e\u5e7b\u89c9\u68c0\u6d4b\u8bbe\u8ba1\u7684\u533b\u7597LVLM\uff0c\u5b83\u91c7\u7528\u4e86\u591a\u4efb\u52a1\u8bad\u7ec3\u65b9\u6cd5\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\u4e3a\u6d41\u884c\u7684LVLM\u8bbe\u7acb\u4e86\u57fa\u7ebf\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMediHall Score\u63d0\u4f9b\u4e86\u6bd4\u4f20\u7edf\u6307\u6807\u66f4\u6df1\u5165\u7406\u89e3\u5e7b\u89c9\u5f71\u54cd\u7684\u80fd\u529b\uff0c\u5e76\u663e\u793a\u4e86MediHallDetector\u7684\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u663e\u8457\u63d0\u9ad8LVLM\u5728\u533b\u7597\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6240\u6709\u76f8\u5173\u8d44\u6e90\u5c06\u5728\u4e0d\u4e45\u540e\u53d1\u5e03\u3002|\n", "2406.10181": "|**2024-06-14**|**Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors**|Siyuan Chen 
et.al.|[2406.10181](http://arxiv.org/abs/2406.10181)|null|\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u7531\u4e8e\u5185\u5b58\u9700\u6c42\u901a\u5e38\u8d85\u8fc7\u5355\u4e2aGPU\u7684\u5bb9\u91cf\uff0c\u89e3\u51b3\u8fd9\u4e00\u5185\u5b58\u6311\u6218\u7684\u4e00\u4e2a\u5e38\u89c1\u65b9\u6cd5\u662f\u5c06\u8ba1\u7b97\u548c\u6570\u636e\u4eceGPU\u8fc1\u79fb\u5230CPU\u3002\u7136\u800c\uff0c\u8fd9\u53d7\u5230\u666e\u901a\u786c\u4ef6\u5e26\u5bbd\u9650\u5236\u7684\u5236\u7ea6\uff0c\u5f71\u54cd\u4e86CPU\u4e0eGPU\u4e4b\u95f4\u7684\u901a\u4fe1\u6548\u7387\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLSP_Offload\u7684\u6846\u67b6\uff0c\u901a\u8fc7\u5b66\u4e60\u5f0f\u7684\u5b50\u7a7a\u95f4\u6295\u5f71\u5668\uff0c\u5b9e\u73b0\u5728 commodity \u786c\u4ef6\u4e0a\u63a5\u8fd1\u539f\u751f\u901f\u5ea6\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u3002\u6211\u4eec\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6d89\u53ca\u5b66\u4e60\u4e00\u4e2a\u9ad8\u6548\u7684\u7a00\u758f\u538b\u7f29\u5668\uff0c\u4ee5\u6700\u5c0f\u5316\u901a\u4fe1\u5e76\u4fdd\u6301\u6700\u5c0f\u7cbe\u5ea6\u635f\u5931\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5c42\u7ea7\u901a\u4fe1\u8c03\u5ea6\u7b56\u7565\uff0c\u4ee5\u6700\u5927\u5316\u901a\u4fe1\u4e0e\u8ba1\u7b97\u4e4b\u95f4\u7684\u5e76\u884c\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u57284GB\u7b14\u8bb0\u672cGPU\u4e0a\u5fae\u8c0313\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u5728\u914d\u590724GB\u5185\u5b58\u7684NVIDIA RTX 4090 
GPU\u4e0a\u5fae\u8c0370\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u4ec5\u6bd4\u65e0\u5185\u5b58\u9650\u5236\u7684\u5fae\u8c03\u616231%\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u79bb\u7ebf\u6846\u67b6\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e86\u5fae\u8c03\u541e\u5410\u91cf\uff0c\u6700\u9ad8\u53ef\u8fbe3.33\u500d\uff0c\u5f53\u8fbe\u5230\u76f8\u540c\u51c6\u786e\u5ea6\u65f6\uff0c\u51cf\u5c11\u4e86\u7aef\u5230\u7aef\u5fae\u8c03\u65f6\u95f4\u768433.1%\u81f362.5%\u3002|\n", "2406.10172": "|**2024-06-14**|**Datasets for Multilingual Answer Sentence Selection**|Matteo Gabburo et.al.|[2406.10172](http://arxiv.org/abs/2406.10172)|null|**\u6458\u8981\uff1a** \u5728\u8bbe\u8ba1\u9ad8\u6548\u7684\u68c0\u7d22\u5f0f\u95ee\u7b54\uff08Question Answering\uff0cQA\uff09\u7cfb\u7edf\u4e2d\uff0c\u7b54\u6848\u53e5\u5b50\u9009\u62e9\uff08Answer Sentence Selection\uff0cAS2\uff09\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u6807\u6ce8\u6570\u636e\uff0c\u5927\u591a\u6570AS2\u9886\u57df\u7684\u8fdb\u5c55\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\u3002\u8fd9\u5bfc\u81f4\u4e86\u975e\u82f1\u8bed\u73af\u5883\u4e0bQA\u7cfb\u7edf\u7684\u6027\u80fd\u4e0e\u82f1\u8bed\u7cfb\u7edf\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u672c\u8bba\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u65b0\u7684\u9ad8\u8d28\u91cf\u591a\u8bed\u8a00\uff08\u6cd5\u8bed\u3001\u5fb7\u8bed\u3001\u610f\u5927\u5229\u8bed\u3001\u8461\u8404\u7259\u8bed\u548c\u897f\u73ed\u7259\u8bed\uff09AS2\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u5bf9\u73b0\u6709\u7684\u82f1\u6587AS2\u6570\u636e\u96c6\uff08\u5982ASNQ\u3001WikiQA\u548cTREC-QA\uff09\u8fdb\u884c\u76d1\u7763\u81ea\u52a8\u673a\u5668\u7ffb\u8bd1\uff08Automatic Machine 
Translation\uff0cAMT\uff09\u3002\u6211\u4eec\u901a\u8fc7\u591a\u79cd\u5b9e\u9a8c\u548c\u4e0d\u540cTransformer\u67b6\u6784\u7684\u8bc4\u4f30\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u53ca\u7ffb\u8bd1\u6570\u636e\u96c6\u7684\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u5bf9\u4e8e\u6784\u5efa\u5065\u58ee\u7684\u591a\u8bed\u8a00AS2\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u663e\u8457\u7f29\u5c0f\u4e86\u975e\u82f1\u8bed\u4e0e\u82f1\u8bed\u73af\u5883\u4e0b\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2406.10162": "|**2024-06-14**|**Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models**|Carson Denison et.al.|[2406.10162](http://arxiv.org/abs/2406.10162)|**[link](https://github.com/anthropics/sycophancy-to-subterfuge-paper)**|**\u5728\u5f3a\u5316\u5b66\u4e60\u4e2d\uff0c\u5f53\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5b66\u4f1a\u56e0\u8bad\u7ec3\u76ee\u6807\u4e0d\u660e\u786e\u800c\u83b7\u5f97\u4e0d\u671f\u671b\u7684\u884c\u4e3a\u65f6\uff0c\u5c31\u4f1a\u51fa\u73b0\u89c4\u683c\u6e38\u620f\u73b0\u8c61\u3002\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u4ece\u7b80\u5355\u7684\u5949\u627f\u884c\u4e3a\u53d1\u5c55\u5230\u66f4\u590d\u6742\u4e14\u5371\u9669\u7684\u5956\u52b1\u7be1\u6539\uff0c\u5373\u6a21\u578b\u76f4\u63a5\u4fee\u6539\u5176\u81ea\u8eab\u7684\u5956\u52b1\u673a\u5236\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8fd9\u4e9b\u590d\u6742\u884c\u4e3a\u53ef\u80fd\u8d85\u51fa\u63a2\u7d22\u7684\u8303\u7574\u3002\u672c\u8bba\u6587\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u4f1a\u5728\u5b66\u4e60\u5e38\u89c1\u89c4\u683c\u6e38\u620f\u7b56\u7565\u540e\uff0c\u6cdb\u5316\u5230\u6267\u884c\u66f4\u4e3a\u7f55\u89c1\u548c\u660e\u663e\u7684\u884c\u4e3a\uff0c\u5305\u62ec\u5956\u52b1\u7be1\u6539\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9010\u6b65\u5347\u7ea7\u7684\u53ef\u6e38\u620f\u73af\u5883\u7cfb\u5217\uff0c\u5e76\u53d1\u73b0\u9488\u5bf9\u65e9\u671f\u9636\u6bb5\u73af\u5883\u7684\u8b
ad\u7ec3\u4f1a\u5bfc\u81f4\u5728\u540e\u7eed\u73af\u5883\u4e2d\u51fa\u73b0\u66f4\u591a\u7684\u89c4\u683c\u6e38\u620f\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u4e00\u5c0f\u90e8\u5206\u4f46\u975e\u96f6\u7684LLMs\uff0c\u5728\u7ecf\u5386\u4e86\u5b8c\u6574\u8bad\u7ec3\u8bfe\u7a0b\u540e\uff0c\u80fd\u591f\u96f6\u6837\u672c\u5730\u76f4\u63a5\u4fee\u6539\u5176\u5956\u52b1\u51fd\u6570\u3002\u91cd\u65b0\u8bad\u7ec3LLMs\u4ee5\u907f\u514d\u65e9\u671f\u9636\u6bb5\u7684\u6e38\u620f\u884c\u4e3a\u53ef\u4ee5\u51cf\u8f7b\u4f46\u4e0d\u80fd\u5b8c\u5168\u6d88\u9664\u540e\u671f\u73af\u5883\u4e2d\u7684\u5956\u52b1\u7be1\u6539\u3002\u6b64\u5916\uff0c\u5bf9\u53ef\u6e38\u620f\u73af\u5883\u8fdb\u884c\u65e0\u5bb3\u6027\u8bad\u7ec3\u5e76\u4e0d\u80fd\u963b\u6b62\u5956\u52b1\u7be1\u6539\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u4ece\u5e38\u89c1\u7684\u89c4\u683c\u6e38\u620f\u7b56\u7565\u4e2d\u6cdb\u5316\u5230\u66f4\u6076\u52a3\u7684\u5956\u52b1\u7be1\u6539\u884c\u4e3a\uff0c\u5e76\u4e14\u8981\u6d88\u9664\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u5e76\u975e\u6613\u4e8b\u3002**|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov 
et.al.|[2406.10149](http://arxiv.org/abs/2406.10149)|**[link](https://github.com/booydar/babilong)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8f93\u5165\u4e0a\u4e0b\u6587\u957f\u5ea6\u663e\u8457\u589e\u52a0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u672a\u80fd\u5145\u5206\u8861\u91cf\u6a21\u578b\u5904\u7406\u957f\u7bc7\u6587\u672c\u4e2d\u7684\u4e8b\u5b9e\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86BABILong\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u6d4b\u8bd5\u6a21\u578b\u5728\u5206\u5e03\u5f0f\u957f\u6587\u6863\u4e2d\u8de8\u4e8b\u5b9e\u63a8\u7406\u7684\u80fd\u529b\u3002BABILong\u5305\u62ec20\u4e2a\u591a\u6837\u5316\u7684\u63a8\u7406\u4efb\u52a1\uff0c\u5982\u4e8b\u5b9e\u94fe\u3001\u7b80\u5355\u5f52\u7eb3\u3001\u6f14\u7ece\u3001\u8ba1\u6570\u4ee5\u53ca\u5904\u7406\u5217\u8868/\u96c6\u5408\u7b49\u3002\u8fd9\u4e9b\u4efb\u52a1\u672c\u8eab\u5c31\u5177\u6709\u6311\u6218\u6027\uff0c\u800c\u5f53\u6240\u9700\u4e8b\u5b9e\u5206\u6563\u5728\u957f\u7bc7\u81ea\u7136\u6587\u672c\u4e2d\u65f6\uff0c\u96be\u5ea6\u8fdb\u4e00\u6b65\u63d0\u5347\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u6d41\u884c\u7684LLMs\u5b9e\u9645\u4e0a\u53ea\u5229\u7528\u4e8610%-20%\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4e14\u968f\u7740\u63a8\u7406\u590d\u6742\u6027\u7684\u63d0\u9ad8\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u5bf9\u4e8e\u66ff\u4ee3\u7684\u4e0a\u4e0b\u6587\u63a8\u7406\u65b9\u6cd5\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\u5728\u5355\u4e8b\u5b9e\u95ee\u9898\u56de\u7b54\u4e0a\u7684\u51c6\u786e\u7387\u4ec5\u4e3a60%\uff0c\u4e0e\u4e0a\u4e0b\u6587\u957f\u5ea6\u65e0\u5173\u3002\u5728\u4e0a\u4e0b\u6587\u6269\u5c55\u65b9\u6cd5\u4e2d\uff0c\u5faa\u73af\u8bb0\u5fc6Transformer\u5c55\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u53ef\u5904\u7406\u957f\u8fbe1100\u4e07\u4e2a\u4ee4\u724c\u7684\u957f\u5ea6\u3002BABILong\u57fa\u51c6\u6d4b\u8bd5\u53ef\u4ee5\u6269\u5c55\u5230\u4efb\u610f\u9
57f\u5ea6\uff0c\u4ee5\u652f\u6301\u8bc4\u4f30\u5177\u6709\u66f4\u5f3a\u80fd\u529b\u7684\u65b0\u6a21\u578b\uff0c\u5e76\u63d0\u4f9b\u4e86\u957f\u8fbe100\u4e07\u4ee4\u724c\u7684\u5206\u9694\u3002|\n", "2406.11840": "|**2024-06-17**|**LLaNA: Large Language and NeRF Assistant**|Andrea Amaduzzi et.al.|[2406.11840](http://arxiv.org/abs/2406.11840)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u56fe\u50cf\u548c3D\u6570\u636e\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5168\u9762\u6355\u6349\u7269\u4f53\u7684\u5916\u89c2\u548c\u51e0\u4f55\u7279\u6027\u4e0a\u5b58\u5728\u5c40\u9650\u3002\u8fd1\u671f\uff0c\u795e\u7ecf\u8f90\u5c04\u573a\uff08Neural Radiance Fields\uff0c\u7b80\u79f0NeRF\uff09\u4f5c\u4e3a\u4e00\u79cd\u65b0\u5174\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u901a\u8fc7\u4e00\u4e2a\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u5668\uff08Multi-Layer Perceptron\uff0cMLP\uff09\u7684\u6743\u91cd\u7f16\u7801\u4e86\u7269\u4f53\u7684\u51e0\u4f55\u7ed3\u6784\u548c\u9ad8\u5ea6\u903c\u771f\u7684\u5916\u89c2\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5c06NeRF\u6574\u5408\u5230MLLM\u4e2d\u7684\u53ef\u884c\u6027\u548c\u6548\u679c\u3002\u6211\u4eec\u5f00\u53d1\u4e86LLaNA\uff0c\u8fd9\u662f\u9996\u4e2a\u901a\u7528\u7684NeRF-\u8bed\u8a00\u52a9\u624b\uff0c\u80fd\u591f\u6267\u884c\u65b0\u4efb\u52a1\uff0c\u5982NeRF\u63cf\u8ff0\u548c\u95ee\u7b54\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u76f4\u63a5\u5904\u7406NeRF 
MLP\u7684\u6743\u91cd\uff0c\u65e0\u9700\u6e32\u67d3\u56fe\u50cf\u6216\u6784\u5efa3D\u6570\u636e\u7ed3\u6784\uff0c\u5c31\u80fd\u63d0\u53d6\u6709\u5173\u4ee3\u8868\u5bf9\u8c61\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65e0\u987b\u4eba\u5de5\u5e72\u9884\u7684NeRF\u6587\u672c\u6807\u6ce8\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5404\u79cdNeRF-\u8bed\u8a00\u4efb\u52a1\uff0c\u5e76\u636e\u6b64\u5efa\u7acb\u4e86\u4e00\u4e2a\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u6211\u4eec\u7684\u6a21\u578b\u5bf9NeRF\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5904\u7406NeRF\u6743\u91cd\u7684\u65b9\u6cd5\u5728\u4e0e\u4eceNeRF\u4e2d\u63d0\u53d62D\u62163D\u8868\u793a\u8fdb\u884c\u6bd4\u8f83\u65f6\u8868\u73b0\u66f4\u4f18\u3002|\n", "2406.11839": "|**2024-06-17**|**mDPO: Conditional Preference Optimization for Multimodal Large Language Models**|Fei Wang et.al.|[2406.11839](http://arxiv.org/abs/2406.11839)|null|### \u80cc\u666f \u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5df2\u88ab\u8bc1\u660e\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6821\u51c6\u7684\u6709\u6548\u624b\u6bb5\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06DPO\u5e94\u7528\u4e8e\u591a\u6a21\u6001\u573a\u666f\uff0c\u4f46\u53d1\u73b0\u5b9e\u73b0\u6301\u7eed\u6539\u8fdb\u9887\u5177\u6311\u6218\u3002\u901a\u8fc7\u5bf9\u6bd4\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5ffd\u89c6\u4e86\u56fe\u50cf\u6761\u4ef6\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86mDPO\uff0c\u4e00\u4e2a\u65e8\u5728\u9632\u6b62\u8bed\u8a00\u504f\u597d\u8fc7\u5ea6\u4f18\u5148\u7684\u591a\u6a21\u6001DPO\u76ee\u6807\uff0c\u540c\u65f6\u4f18\u5316\u56fe\u50cf\u504f\u597d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5956\u52b1\u951a\u70b9\uff0c\u786e\u4fdd\u9009\u62e9\u7684\u54cd\u5e94\u5956\u52b1\u4f
dd\u6301\u6b63\u5411\uff0c\u4ece\u800c\u907f\u514d\u76f8\u5bf9\u504f\u597d\u4f18\u5316\u56fa\u6709\u7684\u53ef\u80fd\u6027\u964d\u4f4e\u95ee\u9898\u3002 ### \u4efb\u52a1 \u6211\u4eec\u5728\u4e24\u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u591a\u6a21\u6001LLM\u4ee5\u53ca\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0cmDPO\u6709\u6548\u89e3\u51b3\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5e76\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002|\n", "2406.11832": "|**2024-06-17**|**Unveiling Encoder-Free Vision-Language Models**|Haiwen Diao et.al.|[2406.11832](http://arxiv.org/abs/2406.11832)|**[link](https://github.com/baaivision/eve)**|**\u5f53\u524d\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u4e3b\u8981\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7f16\u7801\u5668\u6765\u63d0\u53d6\u89c6\u89c9\u7279\u5f81\uff0c\u7136\u540e\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u5728\u62bd\u8c61\u89c6\u89c9\u8868\u793a\u65b9\u9762\u8bbe\u5b9a\u4e86\u5f3a\u70c8\u7684\u5148\u9a8c\uff0c\u5982\u5206\u8fa8\u7387\u3001\u6bd4\u4f8b\u548c\u8bed\u4e49\u503e\u5411\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86VLM\u7684\u7075\u6d3b\u6027\u548c\u6548\u7387\u3002\u76f4\u63a5\u8bad\u7ec3\u65e0\u7f16\u7801\u5668\u7684\u7eafVLM\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u4e14\u9c9c\u6709\u63a2\u7d22\u3002\u5b9e\u8bc1\u7814\u7a76\u663e\u793a\uff0c\u8fd9\u79cd\u76f4\u63a5\u8bad\u7ec3\u65b9\u6cd5\u4f1a\u5bfc\u81f4\u6536\u655b\u7f13\u6162\u548c\u6027\u80fd\u5dee\u8ddd\u8f83\u5927\u3002\u672c\u6587\u65e8\u5728\u5f25\u5408\u7f16\u7801\u5668\u4f9d\u8d56\u578b\u548c\u65e0\u7f16\u7801\u5668\u6a21\u578b\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u
6548\u7684\u7eafVLM\u8bad\u7ec3\u7b56\u7565\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u6df1\u5165\u5b9e\u9a8c\u63ed\u793a\u4e86\u9ad8\u6548\u8bad\u7ec3\u65e0\u7f16\u7801\u5668VLM\u7684\u5173\u952e\u8981\u7d20\uff1a\uff081\uff09\u5728\u7edf\u4e00\u7684\u89e3\u7801\u5668\u5185\u878d\u5408\u89c6\u89c9\u4e0e\u8bed\u8a00\u8868\u793a\uff1b\uff082\uff09\u901a\u8fc7\u989d\u5916\u76d1\u7763\u63d0\u5347\u89c6\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e9b\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86EVE\uff0c\u4e00\u4e2a\u65e0\u7f16\u7801\u5668\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u65e2\u80fd\u9ad8\u6548\u8bad\u7ec3\u4e5f\u80fd\u5feb\u901f\u63a8\u7406\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4ec5\u4f7f\u75283500\u4e07\u516c\u5f00\u53ef\u7528\u7684\u6570\u636e\uff0cEVE\u5c31\u80fd\u5728\u591a\u4e2a\u89c6\u89c9\u8bed\u8a00\u57fa\u51c6\u4e0a\u4e0e\u7c7b\u4f3c\u5bb9\u91cf\u7684\u7f16\u7801\u5668\u4f9d\u8d56\u578bVLM\u5339\u654c\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u795e\u79d8\u3001\u6570\u636e\u672a\u516c\u5f00\u7684Fuyu-8B\u6a21\u578b\u3002\u6211\u4eec\u76f8\u4fe1\uff0cEVE\u4e3a\u8de8\u6a21\u6001\u5f00\u53d1\u7eaf\u7cb9\u7684\u89e3\u7801\u5668\u67b6\u6784\u63d0\u4f9b\u4e86\u4e00\u4e2a\u900f\u660e\u4e14\u9ad8\u6548\u7684\u8def\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/baaivision/EVE\u3002**|\n", "2406.11831": "|**2024-06-17**|**Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models**|Bingqi Ma 
et.al.|[2406.11831](http://arxiv.org/abs/2406.11831)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u57fa\u4e8e\u89e3\u7801\u5668-only\u53d8\u538b\u5668\u5728\u6587\u672c\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5982\u4f55\u5c06\u8fd9\u4e9b\u5148\u8fdb\u7684LLMs\u5e94\u7528\u4e8e\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\u4ecd\u662f\u4e00\u4e2a\u5f85\u63a2\u7d22\u7684\u95ee\u9898\u3002\u6211\u4eec\u53d1\u73b0\u76f4\u63a5\u4f7f\u7528LLM\u4f5c\u4e3a\u63d0\u793a\u7f16\u7801\u5668\u4f1a\u663e\u8457\u964d\u4f4e\u751f\u6210\u56fe\u50cf\u65f6\u7684\u63d0\u793a\u8ddf\u968f\u80fd\u529b\u3002\u4e3b\u8981\u5b58\u5728\u4e24\u4e2a\u95ee\u9898\uff1a\u4e00\u662fLLM\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u8bad\u7ec3\u4e0e\u6269\u6563\u6a21\u578b\u5bf9\u533a\u5206\u6027\u63d0\u793a\u7279\u5f81\u7684\u9700\u6c42\u4e0d\u5339\u914d\uff1b\u4e8c\u662f\u89e3\u7801\u5668\u67b6\u6784\u56fa\u6709\u7684\u4f4d\u7f6e\u504f\u89c1\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u6846\u67b6\uff0c\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4f7f\u7528\u6307\u5357\uff0c\u589e\u5f3aLLM\u7684\u6587\u672c\u8868\u793a\u80fd\u529b\uff0c\u6d88\u9664\u5176\u5185\u5728\u7684\u5b9a\u4f4d\u504f\u89c1\uff0c\u4ece\u800c\u7075\u6d3b\u5730\u5c06\u6700\u5148\u8fdb\u7684LLMs\u878d\u5165\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u878d\u5408\u591a\u4e2aLLMs\u7684\u65b9\u6cd5\u3002\u9274\u4e8eTransformer\u67b6\u6784\u7684\u5353\u8d8a\u6027\u80fd\u548c\u6269\u5c55\u80fd\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8bbe\u8ba1\u4e86\u57fa\u4e8e\u8be5\u6846\u67b6\u7684LLM-Infused Diffusion 
Transformer\uff08LI-DiT\uff09\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86LI-DiT\u5728\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u6570\u636e\u91cf\u4e0b\u7684\u6027\u80fd\u3002\u5f97\u76ca\u4e8eLLMs\u7684\u5185\u5728\u80fd\u529b\u53ca\u6211\u4eec\u7684\u521b\u65b0\u8bbe\u8ba1\uff0cLI-DiT\u7684\u63d0\u793a\u7406\u89e3\u6027\u80fd\u8f7b\u677e\u8d85\u8d8a\u5f00\u6e90\u7684\u6700\u65b0\u6a21\u578b\uff0c\u4ee5\u53ca\u5305\u62ecStable Diffusion 3\u3001DALL-E 3\u548cMidjourney V6\u5728\u5185\u7684\u4e3b\u6d41\u95ed\u6e90\u5546\u4e1a\u6a21\u578b\u3002\u5f3a\u5927\u7684LI-DiT-10B\u5c06\u5728\u8fdb\u4e00\u6b65\u4f18\u5316\u548c\u5b89\u5168\u68c0\u67e5\u540e\u63d0\u4f9b\u3002|\n", "2406.11827": "|**2024-06-17**|**WPO: Enhancing RLHF with Weighted Preference Optimization**|Wenxuan Zhou et.al.|[2406.11827](http://arxiv.org/abs/2406.11827)|**[link](https://github.com/wzhouad/wpo)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u66f4\u597d\u5730\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6709\u524d\u666f\u65b9\u6cd5\u3002\u7531\u4e8e\u6210\u672c\u6548\u76ca\u548c\u53ef\u6269\u5c55\u6027\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u2014\u2014\u901a\u8fc7\u5176\u4ed6\u6a21\u578b\u83b7\u53d6\u504f\u597d\u6570\u636e\u2014\u2014\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u7136\u800c\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u5e38\u53d7\u91c7\u6837\u7b56\u7565\u4e0e\u76ee\u6807\u7b56\u7565\u4e4b\u95f4\u5206\u5e03\u5dee\u5f02\u7684\u5f71\u54cd\uff0c\u5bfc\u81f4\u4f18\u5316\u6548\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7b56\u7565\u2014\u2014\u52a0\u6743\u504f\u597d\u4f18\u5316\uff08WPO\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u8c03\u6574\u504f\u597d\u8bc4\u5206\u5bf9\uff0c\u4f7f\u79bb\u7ebf\u6570\u636e\u66f4\u63a5\u8fd1\u4e8e\u5f53\u524d\u7b56\u7565\uff0c\u4ece\u800c\u7f13\u89e3\u8fd9\u4e0
0\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5206\u5e03\u5dee\u8ddd\u96be\u9898\uff0c\u8fd8\u63d0\u5347\u4e86\u4f18\u5316\u8fc7\u7a0b\uff0c\u65e0\u9700\u989d\u5916\u6210\u672c\u3002 \u6211\u4eec\u5728Alpaca Eval 2\u548cMT-bench\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u3002WPO\u5728Alpaca Eval 2\u4e0a\u7684\u6027\u80fd\u6bd4\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u63d0\u9ad8\u4e865.6%\u3002\u57fa\u4e8eLlama-3-8B-Instruct\uff0cWPO\u751a\u81f3\u5efa\u7acb\u4e86\u663e\u8457\u7684\u957f\u5ea6\u63a7\u5236\u80dc\u7387\uff0c\u8fbe\u523048.6%\uff0c\u572880\u4ebf\u53c2\u6570\u6a21\u578b\u6392\u884c\u699c\u4e0a\u6210\u4e3a\u6700\u5f3a\u52b2\u7684\u6a21\u578b\u3002\u6211\u4eec\u5c06\u5728\u4e0a\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002**|\n", "2406.11818": "|**2024-06-17**|**Embodied Instruction Following in Unknown Environments**|Zhenyu Wu et.al.|[2406.11818](http://arxiv.org/abs/2406.11818)|null|\u5728\u81ea\u4e3b\u5bb6\u5ead\u670d\u52a1\u7cfb\u7edf\u4e2d\uff0c\u4f7f\u5b9e\u4f53\u4ee3\u7406\u80fd\u6839\u636e\u81ea\u7136\u8bed\u8a00\u5b8c\u6210\u590d\u6742\u7684\u4eba\u7c7b\u6307\u4ee4\u81f3\u5173\u91cd\u8981\u3002\u4f20\u7edf\u65b9\u6cd5\u4ec5\u80fd\u5728\u6240\u6709\u4e92\u52a8\u5bf9\u8c61\u90fd\u63d0\u4f9b\u7ed9\u4ee3\u7406\u7684\u5df2\u77e5\u73af\u5883\u4e2d\u6267\u884c\u6307\u4ee4\uff0c\u76f4\u63a5\u5c06\u73b0\u6709\u65b9\u6cd5\u5e94\u7528\u4e8e\u672a\u77e5\u73af\u5883\u901a\u5e38\u4f1a\u4ea7\u751f\u64cd\u4f5c\u4e0d\u5b58\u5728\u7269\u4f53\u7684\u4e0d\u53ef\u884c\u8ba1\u5212\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u672a\u77e5\u73af\u5883\u7684\u590d\u6742\u4efb\u52a1\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\uff08Embodied Instruction 
Following\uff0cEIF\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u4f7f\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u63a2\u7d22\u73af\u5883\uff0c\u5229\u7528\u73b0\u6709\u7269\u4f53\u751f\u6210\u53ef\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u8fbe\u6210\u62bd\u8c61\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u62ec\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\u5668\u548c\u4f4e\u5c42\u63a2\u7d22\u63a7\u5236\u5668\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5c42\u6b21\u5316\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u52a8\u6001\u533a\u57df\u6ce8\u610f\u529b\u6784\u5efa\u573a\u666f\u7684\u8bed\u4e49\u8868\u793a\u5730\u56fe\uff0c\u4ee5\u5c55\u793a\u5df2\u77e5\u7684\u89c6\u89c9\u7ebf\u7d22\uff0c\u4f7f\u4efb\u52a1\u89c4\u5212\u548c\u573a\u666f\u63a2\u7d22\u4e0e\u4eba\u7c7b\u6307\u4ee4\u76ee\u6807\u4fdd\u6301\u4e00\u81f4\u3002\u5bf9\u4e8e\u4efb\u52a1\u89c4\u5212\u5668\uff0c\u6839\u636e\u4efb\u52a1\u5b8c\u6210\u8fc7\u7a0b\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\uff0c\u6211\u4eec\u751f\u6210\u6b65\u9aa4\u5f0f\u7684\u53ef\u884c\u8ba1\u5212\u3002\u5bf9\u4e8e\u63a2\u7d22\u63a7\u5236\u5668\uff0c\u6839\u636e\u751f\u6210\u7684\u6b65\u9aa4\u8ba1\u5212\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\u9884\u6d4b\u6700\u4f18\u7684\u5bfc\u822a\u6216\u7269\u4f53\u4ea4\u4e92\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5927\u578b\u623f\u5c4b\u7ea7\u573a\u666f\u4e2d\u7684204\u4e2a\u590d\u6742\u4eba\u7c7b\u6307\u4ee4\uff08\u5982\u505a\u65e9\u9910\u548c\u6574\u7406\u623f\u95f4\uff09\u4e0a\u5b9e\u73b0\u4e8645.09%\u7684\u6210\u529f\u7387\u3002|\n", "2406.11816": "|**2024-06-17**|**VideoLLM-online: Online Video Large Language Model for Streaming Video**|Joya Chen et.al.|[2406.11816](http://arxiv.org/abs/2406.11816)|null|
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u589e\u5f3a\u4e86\u89c6\u89c9\u529f\u80fd\uff0c\u80fd\u591f\u7406\u89e3\u56fe\u50cf\u3001\u89c6\u9891\u548c\u878d\u5408\u4e86\u89c6\u89c9\u4e0e\u8bed\u8a00\u7684\u5185\u5bb9\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u65b9\u6cd5\u901a\u5e38\u5c06\u89c6\u9891\u89c6\u4e3a\u9884\u5148\u526a\u8f91\u597d\u7684\u7247\u6bb5\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5728\u5904\u7406\u8fde\u7eed\u89c6\u9891\u6d41\u65f6\u6548\u679c\u4e0d\u4f73\u4e14\u6548\u7387\u4f4e\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cLearning-In-Video-Stream\u201d\uff08LIVE\uff09\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b9e\u65f6\u3001\u957f\u5e8f\u5217\u3001\u4e0e\u89c6\u9891\u6d41\u540c\u6b65\u7684\u5bf9\u8bdd\uff0c\u9002\u7528\u4e8e\u8fde\u7eed\u89c6\u9891\u8f93\u5165\u3002LIVE\u6846\u67b6\u5305\u62ec\u4ee5\u4e0b\u4e09\u4e2a\u65b9\u9762\uff1a\uff081\uff09\u4e00\u4e2a\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u8fde\u7eed\u6d41\u5f0f\u8f93\u5165\u7684\u8bed\u8a00\u5efa\u6a21\u76ee\u6807\uff1b\uff082\uff09\u4e00\u79cd\u6570\u636e\u751f\u6210\u7b56\u7565\uff0c\u5c06\u79bb\u7ebf\u65f6\u95f4\u6807\u6ce8\u8f6c\u6362\u4e3a\u9002\u5408\u6d41\u5f0f\u5bf9\u8bdd\u7684\u683c\u5f0f\uff1b\uff083\uff09\u4e00\u4e2a\u4f18\u5316\u7684\u63a8\u7406\u7ba1\u9053\uff0c\u4ee5\u63d0\u9ad8\u5728\u5b9e\u9645\u89c6\u9891\u6d41\u4e2d\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u57fa\u4e8eLlama-2/Llama-3\uff0c\u6211\u4eec\u6784\u5efa\u4e86VideoLLM-online\u6a21\u578b\uff0c\u5e76\u901a\u8fc7\u5b83\u5c55\u793a\u4e86\u5728\u5904\u7406\u89c6\u9891\u6d41\u5bf9\u8bdd\u65b9\u9762\u7684\u663e\u8457\u4f18\u52bf\uff0c\u4f8b\u5982\uff0c\u5728A100 
GPU\u4e0a\uff0c\u8be5\u6a21\u578b\u80fd\u57285\u5206\u949f\u89c6\u9891\u7247\u6bb5\u4e2d\u5b9e\u73b0\u8d85\u8fc710\u5e27\u6bcf\u79d2\u7684\u6d41\u5f0f\u5bf9\u8bdd\u3002\u6b64\u5916\uff0cVideoLLM-online\u8fd8\u5728\u516c\u5f00\u7684\u79bb\u7ebf\u89c6\u9891\u57fa\u51c6\u6d4b\u8bd5\uff08\u5982\u8bc6\u522b\u3001captioning\u548c\u9884\u6d4b\uff09\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u3001\u6a21\u578b\u3001\u6570\u636e\u548c\u6f14\u793a\u53d1\u5e03\u5728https://showlab.github.io/videollm-online\u4f9b\u4eba\u4f7f\u7528\u3002|\n", "2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813](http://arxiv.org/abs/2406.11813)|null|\u5c3d\u7ba1\u8fd1\u671f\u7814\u7a76\u8868\u660e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u5b58\u50a8\u5927\u91cf\u4e8b\u5b9e\u77e5\u8bc6\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u8fd9\u4e9b\u77e5\u8bc6\u7684\u673a\u5236\u5c1a\u4e0d\u660e\u786e\u3002\u672c\u7814\u7a76\u9488\u5bf9\u8fd9\u4e00\u7f3a\u53e3\uff0c\u63a2\u8ba8\u4e86LLMs\u5728\u9884\u8bad\u7ec3\u671f\u95f4\u5982\u4f55\u83b7\u53d6\u548c\u4fdd\u6301\u4e8b\u5b9e\u77e5\u8bc6\u3002\u7814\u7a76\u53d1\u73b0\u4e86\u4e00\u4e9b\u5173\u952e\u6d1e\u89c1\uff1a\u9996\u5148\uff0c\u51fa\u4e4e\u610f\u6599\u7684\u662f\uff0c\u66f4\u591a\u7684\u8bad\u7ec3\u6570\u636e\u5bf9\u6a21\u578b\u83b7\u53d6\u548c\u4fdd\u6301\u4e8b\u5b9e\u77e5\u8bc6\u7684\u80fd\u529b\u5e76\u65e0\u663e\u8457\u63d0\u5347\u3002\u5176\u6b21\uff0c\u8bad\u7ec3\u6b65\u6570\u4e0e\u8bb0\u5fc6\u9057\u5fd8\u548c\u4e8b\u5b9e\u77e5\u8bc6\u6cdb\u5316\u4e4b\u95f4\u5b58\u5728\u5e42\u5f8b\u5173\u7cfb\uff0c\u4f7f\u7528\u91cd\u590d\u8bad\u7ec3\u6570\u636e\u7684\u6a21\u578b\u9057\u5fd8\u901f\u5ea6\u66f4\u5feb\u3002\u7b2c\u4e09\uff0c\u589e\u5927\u6279\u91cf\u5927\u5c0f\u53ef\u4ee5\u63d0\u9ad8\u6a21\u578b\u62b5\u6297\u9057\u5fd8\u7684\u80fd\u529
b\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u89c2\u5bdf\u8868\u660e\uff0cLLMs\u5728\u9884\u8bad\u7ec3\u4e2d\u7684\u4e8b\u5b9e\u77e5\u8bc6\u83b7\u53d6\u662f\u901a\u8fc7\u9010\u6b65\u589e\u52a0\u6bcf\u4e00\u6b65\u4e2d\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\u4e8b\u5b9e\u77e5\u8bc6\u51fa\u73b0\u7684\u6982\u7387\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u589e\u52a0\u968f\u540e\u4f1a\u56e0\u9057\u5fd8\u800c\u7a00\u91ca\u3002\u57fa\u4e8e\u8fd9\u79cd\u7406\u89e3\uff0c\u6211\u4eec\u80fd\u591f\u89e3\u91ca\u4e00\u4e9b\u6700\u8fd1\u89c2\u5bdf\u5230\u7684LLM\u884c\u4e3a\uff0c\u5982\u957f\u5c3e\u77e5\u8bc6\u4e0a\u7684\u6027\u80fd\u4e0d\u4f73\uff0c\u4ee5\u53ca\u53bb\u91cd\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u597d\u5904\u3002|\n", "2406.11811": "|**2024-06-17**|**RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content**|Joao Monteiro et.al.|[2406.11811](http://arxiv.org/abs/2406.11811)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5927\u91cf\u4f9d\u8d56\u81ea\u52a8\u4ece\u4e92\u8054\u7f51\u6293\u53d6\u7684\u6570\u636e\uff0c\u5176\u4e2d\u5305\u62ec\u5305\u542b\u5927\u91cf\u901a\u7528\u77e5\u8bc6\u7684\u767e\u79d1\u5168\u4e66\uff08\u5982\u7ef4\u57fa\u767e\u79d1\uff09\uff0c\u4e5f\u53ef\u80fd\u4e0e\u7528\u4e8e\u8bc4\u4f30LLMs\u7684\u57fa\u51c6\u6570\u636e\u96c6\u91cd\u53e0\u3002\u56e0\u6b64\uff0c\u5982\u679c\u6d4b\u8bd5\u96c6\u53ef\u80fd\u5df2\u6cc4\u9732\u5230\u8bad\u7ec3\u96c6\u4e2d\uff0c\u5bf9\u6a21\u578b\u7684\u8bc4\u4f30\u53ef\u80fd\u4f1a\u4ea7\u751f\u8bef\u5bfc\u6027\u7684\u7ed3\u8bba\u3002\u4e3a\u4e86\u63a8\u52a8\u8bed\u8a00\u6a21\u578b\u7684\u516c\u6b63\u8bc4\u4f30\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6d4b\u8bd5\u6570\u636e\u96c6\u2014\u2014RepLiQA\uff0c\u9002\u7528\u4e8e\u95ee\u7b54\u548c\u4e3b\u9898\u68c0\u7d22\u4efb\u52a1\u3002RepLiQA\u662f\u4e00\u4e2a\u5305\u542b\u4e94\u4e2a\u5206\u7247\u7684\u6d4b\u8bd5\u96c6\uff0c\u5176\u4e2d\u56db\u4e2a\u5728\
u672c\u8bba\u6587\u53d1\u5e03\u524d\u672a\u516c\u5f00\u6216\u901a\u8fc7LLM API\u63d0\u4f9b\u3002RepLiQA\u7684\u6bcf\u4e2a\u6837\u672c\u7531\u4ee5\u4e0b\u56db\u90e8\u5206\u7ec4\u6210\uff1a\uff081\uff09\u7531\u4eba\u7c7b\u6807\u6ce8\u5458\u521b\u4f5c\u7684\u865a\u6784\u573a\u666f\u63cf\u8ff0\u6587\u6863\uff08\u4f8b\u5982\u65b0\u95fb\u6587\u7ae0\uff09\uff0c\u8fd9\u4e9b\u5185\u5bb9\u4e0d\u4f1a\u51fa\u73b0\u5728\u4e92\u8054\u7f51\u4e0a\uff1b\uff082\uff09\u5173\u4e8e\u6587\u6863\u4e3b\u9898\u7684\u95ee\u9898\uff1b\uff083\uff09\u76f4\u63a5\u6e90\u81ea\u6587\u6863\u4fe1\u606f\u7684\u6b63\u786e\u7b54\u6848\uff1b\uff084\uff09\u5305\u542b\u7b54\u6848\u7684\u6587\u6863\u6bb5\u843d\u3002\u8fd9\u610f\u5473\u7740\u53ea\u6709\u5f53\u6a21\u578b\u80fd\u5728\u63d0\u4f9b\u7684\u6587\u6863\u4e2d\u627e\u5230\u76f8\u5173\u5185\u5bb9\u65f6\uff0c\u624d\u80fd\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5927\u89c4\u6a21\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5305\u62ec\u591a\u4e2a\u6700\u5148\u8fdb\u7684LLM\uff0c\u4ee5\u63ed\u793a\u4e0d\u540c\u7c7b\u578b\u7684\u548c\u89c4\u6a21\u7684\u6a21\u578b\u5728\u6761\u4ef6\u8bed\u8a00\u5efa\u6a21\u8bbe\u7f6e\u4e0b\u7684\u6027\u80fd\u5dee\u5f02\u3002RepLiQA\u7684\u5df2\u53d1\u5e03\u5206\u7247\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u627e\u5230\uff1ahttps://huggingface.co/datasets/ServiceNow/repliqa\u3002|\n", "2406.11801": "|**2024-06-17**|**Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations**|Rima Hazra 
et.al.|[2406.11801](http://arxiv.org/abs/2406.11801)|**[link](https://github.com/declare-lab/safety-arithmetic)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ffb\u8bd1\u548c\u95ee\u7b54\u7b49\u5e94\u7528\u4e2d\u7684\u65e5\u76ca\u91cd\u8981\uff0c\u786e\u4fdd\u5b83\u4eec\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6b63\u786e\u5bfc\u5411\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u5bf9\u9f50\u65b9\u6cd5\u5728\u5904\u7406\u52a8\u6001\u7528\u6237\u610f\u56fe\u548c\u590d\u6742\u76ee\u6807\u65f6\u5b58\u5728\u56f0\u96be\uff0c\u4f7f\u5f97\u6a21\u578b\u5bb9\u6613\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u6846\u67b6\u2014\u2014\u5b89\u5168\u7b97\u672f\uff08Safety Arithmetic\uff09\uff0c\u65e8\u5728\u63d0\u5347LLMs\u5728\u4e0d\u540c\u573a\u666f\u4e0b\u7684\u5b89\u5168\u6027\uff0c\u5305\u62ec\u57fa\u7840\u6a21\u578b\u3001\u76d1\u7763\u5fae\u8c03\u6a21\u578b\uff08SFT\uff09\u548c\u7f16\u8f91\u540e\u7684\u6a21\u578b\u3002\u5b89\u5168\u7b97\u672f\u5305\u542b\u4e24\u90e8\u5206\uff1a\u6709\u5bb3\u5185\u5bb9\u6d88\u9664\uff08Harm Direction Removal\uff09\u4ee5\u907f\u514d\u4e0d\u826f\u8f93\u51fa\uff0c\u4ee5\u53ca\u5b89\u5168\u5bf9\u9f50\uff08Safety Alignment\uff09\u4ee5\u4fc3\u8fdb\u5b89\u5168\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86NoIntentEdit\u6570\u636e\u96c6\uff0c\u5b83\u63ed\u793a\u4e86\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u5b89\u5168\u98ce\u9669\u7684\u7f16\u8f91\u5b9e\u4f8b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5b89\u5168\u7b97\u672f\u663e\u8457\u589e\u5f3a\u4e86\u5b89\u5168\u63aa\u65bd\uff0c\u51cf\u5c11\u4e86\u8fc7\u5ea6\u5b89\u5168\u7684\u95ee\u9898\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6a21\u578b\u7684\u5b9e\u7528\u6027\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u65b9\u6cd5\u5728\u4fdd\u969c\u5185\u5bb9\u751f\u6210\u7684\u5b89\u5168\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002**|\n", 
"2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval Based Long Video Understanding**|Ziyu Ma et.al.|[2406.12846](http://arxiv.org/abs/2406.12846)|null|\u5f53\u524d\u7684\u957f\u89c6\u9891\u7406\u89e3\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u65f6\u957f\u4ec5\u5341\u51e0\u79d2\u7684\u89c6\u9891\uff0c\u5bf9\u5904\u7406\u66f4\u957f\u89c6\u9891\u7684\u6280\u672f\u63a2\u7d22\u6709\u9650\u3002\u957f\u89c6\u9891\u4e2d\u7684\u5927\u91cf\u5e27\u6570\u5e26\u6765\u4e86\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a\u96be\u4ee5\u5b9a\u4f4d\u5173\u952e\u4fe1\u606f\u548c\u8fdb\u884c\u957f\u671f\u63a8\u7406\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faDrVideo\uff0c\u4e00\u4e2a\u57fa\u4e8e\u6587\u6863\u68c0\u7d22\u7684\u7cfb\u7edf\uff0c\u4e13\u4e3a\u957f\u89c6\u9891\u7406\u89e3\u8bbe\u8ba1\u3002\u6211\u4eec\u7684\u6838\u5fc3\u601d\u60f3\u662f\u5c06\u957f\u89c6\u9891\u7406\u89e3\u95ee\u9898\u8f6c\u5316\u4e3a\u957f\u6587\u6863\u7406\u89e3\u4efb\u52a1\uff0c\u4ee5\u5145\u5206\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5f3a\u5927\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0cDrVideo\u5c06\u957f\u89c6\u9891\u8f6c\u6362\u4e3a\u6587\u672c\u5f62\u5f0f\u7684\u957f\u6587\u6863\uff0c\u9996\u5148\u68c0\u7d22\u5173\u952e\u5e27\u5e76\u589e\u5f3a\u8fd9\u4e9b\u5e27\u7684\u4fe1\u606f\uff0c\u4f5c\u4e3a\u7cfb\u7edf\u7684\u8d77\u70b9\u3002\u7136\u540e\uff0c\u5b83\u91c7\u7528\u57fa\u4e8e\u4ee3\u7406\u7684\u8fed\u4ee3\u5faa\u73af\uff0c\u6301\u7eed\u641c\u7d22\u7f3a\u5931\u4fe1\u606f\u3001\u8865\u5145\u76f8\u5173\u6570\u636e\uff0c\u5e76\u5728\u6536\u96c6\u5230\u8db3\u591f\u7684\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u4fe1\u606f\u540e\uff0c\u4ee5\u94fe\u5f0f\u601d\u8003\u7684\u65b9\u5f0f\u7ed9\u51fa\u6700\u7ec8\u9884\u6d4b\u3002\u5728\u591a\u4e2a\u957f\u89c6\u9891\u57fa\u51c6\u4e0a\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002DrVideo\u5728EgoSchema\uff083\u5206\u949f\uff09\u6d4b\u8bd5\u4e2d\u6bd4\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u9ad8\u5
1fa3.8\u4e2a\u767e\u5206\u70b9\uff0c\u5728MovieChat-1K\uff0810\u5206\u949f\uff09\u7684break\u6a21\u5f0f\u548cglobal\u6a21\u5f0f\u4e2d\u5206\u522b\u63d0\u9ad817.9\u548c38.0\u5206\uff0c\u4ee5\u53ca\u5728LLama-Vid QA\uff08\u8d85\u8fc760\u5206\u949f\uff09\u6570\u636e\u96c6\u4e0a\u63d0\u534730.2\u5206\u3002|\n", "2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang et.al.|[2406.12845](http://arxiv.org/abs/2406.12845)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u5df2\u7ecf\u6210\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u4eba\u7c7b\u504f\u597d\u5bf9\u9f50\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u4f20\u7edf\u4e0a\uff0c\u901a\u8fc7\u4f7f\u7528\u4eba\u7c7b\u504f\u597d\u6570\u636e\u8bad\u7ec3\u5956\u52b1\u6a21\u578b\uff08RM\uff09\uff0c\u8fc7\u7a0b\u901a\u5e38\u4ece\u6bd4\u8f83\u540c\u4e00\u7528\u6237\u8bf7\u6c42\u7684\u54cd\u5e94\u5f00\u59cb\uff0c\u76f8\u5bf9\u8bc4\u5206\u6307\u793a\u4eba\u7c7b\u66f4\u559c\u6b22\u54ea\u4e2a\u54cd\u5e94\u3002\u7136\u800c\uff0c\u7531\u4e8eRM\u7684\u9ed1\u76d2\u7279\u6027\uff0c\u5176\u8f93\u51fa\u7f3a\u4e4f\u53ef\u89e3\u91ca\u6027\uff0c\u4eba\u4eec\u96be\u4ee5\u7406\u89e3\u4e3a\u4ec0\u4e48RM\u8ba4\u4e3a\u67d0\u4e2a\u56de\u590d\u662f\u597d\u7684\u3002\u9274\u4e8eRM\u4f5c\u4e3a\u4eba\u7c7b\u504f\u597d\u7684\u4ee3\u7406\uff0c\u6211\u4eec\u63d0\u8bae\u91c7\u7528\u4e24\u9636\u6bb5\u65b9\u6cd5\u6765\u521b\u5efa\u53ef\u89e3\u91ca\u7684RM\uff1a\u9996\u5148\uff0c\u4f7f\u7528\u591a\u7ef4\u7edd\u5bf9\u8bc4\u5206\u6570\u636e\u8bad\u7ec3\u7edd\u5bf9\u8bc4\u7ea7\u591a\u76ee\u6807\u5956\u52b1\u6a21\u578b\uff08ArmoRM\uff09\uff0c\u6bcf\u4e2a\u7ef4\u5ea6\u5bf9\u5e94\u4e8e\u4eba\u7c7b\u53ef\u7406\u89e3\u7684\u76ee\u6807\uff08\u5982\u8bda\u5b9e\u3001\u8be6\u5c3d\u3001\u5b89\u5168\uff09\uff1b\u5176\u6b21\uff0c\u5229\u7528\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u7b56\u7565\uff0c\u7ed
3\u5408\u4e00\u4e2a\u95e8\u63a7\u7f51\u7edc\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u81ea\u52a8\u9009\u62e9\u6700\u5408\u9002\u7684\u5956\u52b1\u76ee\u6807\u3002\u6211\u4eec\u6210\u529f\u5730\u4f7f\u7528Llama-3 8B\u8bad\u7ec3\u4e86ArmoRM\uff0c\u5e76\u5728\u9876\u90e8\u6dfb\u52a0\u4e86\u4e00\u4e2a\u6d45\u5c42MLP\u4f5c\u4e3a\u95e8\u63a7\u7f51\u7edc\uff0c\u5f62\u6210\u4e86ArmoRM-Llama3-8B\u3002\u6211\u4eec\u7684\u6a21\u578b\u5728\u8bc4\u4f30RM\u7684\u8bed\u8a00\u5efa\u6a21\u6027\u80fd\u7684RewardBench\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6210\u7ee9\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u8d85\u8fc7\u4e86\u4f7f\u7528GPT-4\u6cd5\u5b98\u7684LLM\u4f5c\u4e3a\u8bc4\u5224\u8005\u7684\u65b9\u6cd5\uff0c\u5e76\u63a5\u8fd1\u4e8e\u89c4\u6a21\u66f4\u5927\u7684Nemotron-4 340B\u5956\u52b1\u6a21\u578b\u7684\u6c34\u5e73\u3002**|\n", "2406.12844": "|**2024-06-18**|**Synergizing Foundation Models and Federated Learning: A Survey**|Shenghui Li 
et.al.|[2406.12844](http://arxiv.org/abs/2406.12844)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3001\u89c6\u89c9Transformer\u548c\u591a\u6a21\u6001\u6a21\u578b\u7b49\u57fa\u7840\u6a21\u578b\uff08FMs\uff09\u7684\u53d1\u5c55\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u4ea7\u751f\u4e86\u663e\u8457\u5f71\u54cd\u3002\u4e0e\u5c0f\u578b\u6a21\u578b\u76f8\u6bd4\uff0cFMs\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u5bf9\u5927\u91cf\u6570\u636e\u7684\u9700\u6c42\u66f4\u5927\u3002\u5c3d\u7ba1\u901a\u7528FMs\u53ef\u4ee5\u4f7f\u7528\u4e92\u8054\u7f51\u4e0a\u7684\u516c\u5f00\u6570\u636e\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u4f46\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684FMs\u9700\u8981\u4e13\u6709\u6570\u636e\uff0c\u8fd9\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u56e0\u9690\u79c1\u95ee\u9898\u800c\u9762\u4e34\u6570\u636e\u53ef\u7528\u6027\u6311\u6218\u3002\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u4f5c\u4e3a\u4e00\u79cd\u534f\u4f5c\u5b66\u4e60\u8303\u5f0f\uff0c\u6253\u7834\u4e86\u6570\u636e\u5171\u4eab\u7684\u969c\u788d\uff0c\u4e3a\u5229\u7528\u5206\u5e03\u5f0f\u6570\u636e\u5b9a\u5236\u548c\u9002\u5e94\u5404\u79cd\u9886\u57df\u7279\u5b9a\u4efb\u52a1\u7684FMs\u63d0\u4f9b\u4e86\u524d\u666f\uff0c\u540c\u65f6\u4fdd\u62a4\u4e86\u6570\u636e\u9690\u79c1\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8bba\u6587\u63a2\u8ba8\u4e86FL\u4e0eFMs\u878d\u5408\u7684\u6f5c\u529b\u4e0e\u6311\u6218\uff0c\u603b\u7ed3\u4e86\u6838\u5fc3\u6280\u672f\u3001\u672a\u6765\u53d1\u5c55\u65b9\u5411\u4ee5\u53ca\u5e94\u7528\u573a\u666f\u3002\u5173\u4e8eFM-FL\u7684\u5b9a\u671f\u66f4\u65b0\u8bba\u6587\u96c6\u5408\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi 
et.al.|[2406.12832](http://arxiv.org/abs/2406.12832)|**[link](https://github.com/arminazizi98/lamda)**|**\u5728\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u9886\u57df\uff0c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u5df2\u7ecf\u6210\u4e3a\u6807\u51c6\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5b83\u663e\u8457\u51cf\u5c11\u4e86\u53ef\u8bad\u7ec3\u53c2\u6570\u3002\u7136\u800c\uff0c\u968f\u7740\u6a21\u578b\u5d4c\u5165\u7ef4\u5ea6\u7684\u589e\u52a0\uff0cLoRA\u6240\u9700\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u91cf\u4e5f\u968f\u4e4b\u4e0a\u5347\uff0c\u5bfc\u81f4\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u6b64\u5916\uff0c\u5176\u540e\u5411\u66f4\u65b0\u9700\u8981\u5b58\u50a8\u9ad8\u7ef4\u4e2d\u95f4\u6fc0\u6d3b\u548c\u4f18\u5316\u5668\u72b6\u6001\uff0c\u5bf9GPU\u5185\u5b58\u9700\u6c42\u8f83\u5927\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5927\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u65b9\u6cd5\u2014\u2014\u57fa\u4e8e\u8c31\u5206\u89e3\u7684\u4f4e\u7ef4\u9002\u5e94\uff08LaMDA\uff09\u3002LaMDA\u901a\u8fc7\u51bb\u7ed3\u7b2c\u4e00\u6295\u5f71\u77e9\u9635\uff08PMA\uff09\uff0c\u540c\u65f6\u5f15\u5165\u4e00\u4e2a\u4f4e\u7ef4\u53ef\u8bad\u7ec3\u7684\u5e73\u65b9\u77e9\u9635\uff0c\u5b9e\u73b0\u4e86\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u7684\u5927\u5e45\u51cf\u5c11\u3002\u5728\u65e9\u671f\u7684\u5fae\u8c03\u9636\u6bb5\uff0cLaMDA\u9010\u6b65\u51bb\u7ed3\u7b2c\u4e8c\u6295\u5f71\u77e9\u9635\uff08PMB\uff09\uff0c\u8fdb\u4e00\u6b65\u964d\u4f4e\u6743\u91cd\u66f4\u65b0\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u63d0\u9ad8\u53c2\u6570\u6548\u7387\u3002 
\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u589e\u5f3a\u7248LaMDA++\uff0c\u5b83\u901a\u8fc7\u89c4\u8303\u5316\u9884\u8bad\u7ec3\u6a21\u578b\u6743\u91cd\u7684\u8c31\u5206\u6790\uff0c\u5b9e\u73b0\u8f7b\u91cf\u7ea7\u7684LoRA\u8def\u5f84\u81ea\u9002\u5e94\u79e9\u5206\u914d\u3002\u6211\u4eec\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ecGLUE\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u57fa\u51c6\u3001\u6587\u672c\u6458\u8981\u3001\u81ea\u7136\u8bed\u8a00\u751f\u6210\u4ee5\u53ca\u590d\u6742\u63a8\u7406\uff0c\u5e94\u7528\u4e8e\u4e0d\u540c\u7c7b\u578b\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cLaMDA\u5728\u6027\u80fd\u4e0a\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u5f53\u6216\u8d85\u8d8a\uff0c\u4e14\u5728\u5fae\u8c03\u671f\u95f4\u53ef\u51cf\u5c11\u9ad8\u8fbe17.7\u500d\u7684\u53c2\u6570\u66f4\u65b0\u6b21\u6570\uff0c\u4ee5\u53ca1.32\u500d\u7684\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u3002\u6211\u4eec\u5c06\u516c\u5f00\u4ee3\u7801\u3002**|\n", "2406.12822": "|**2024-06-18**|**Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?**|Pinzhen Chen et.al.|[2406.12822](http://arxiv.org/abs/2406.12822)|null|
\u5927\u578b\u591a\u8bed\u8a00\u6a21\u578b\u65e8\u5728\u670d\u52a1\u4e0d\u540c\u8bed\u79cd\u7684\u6bcd\u8bed\u4f7f\u7528\u8005\u3002\u6211\u4eec\u63a8\u6d4b\uff0c\u5f53\u524d\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u5fae\u8c03\u548c\u8bc4\u4f30\u65b9\u6cd5\u53ef\u80fd\u4e0e\u5176\u521d\u8877\u4e0d\u7b26\uff0c\u539f\u56e0\u5728\u4e8e\u8fc7\u5ea6\u4f9d\u8d56\u7ffb\u8bd1\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ffb\u8bd1\u4e2d\u7684\u7455\u75b5\u3002\u5c1a\u4e0d\u6e05\u695a\u6307\u4ee4\u6570\u636e\u7684\u6027\u8d28\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u8f93\u51fa\uff0c\u540c\u65f6\u4e5f\u4e0d\u6e05\u695a\u7ffb\u8bd1\u6d4b\u8bd5\u96c6\u80fd\u5426\u6355\u6349\u8fd9\u4e9b\u7ec6\u5fae\u5dee\u522b\u3002\u7531\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\u9636\u6bb5\u5e38\u5e38\u7ed3\u5408\u4f7f\u7528\u7ffb\u8bd1\u6570\u636e\uff0c\u8fd9\u4e9b\u6f5c\u5728\u95ee\u9898\u53ef\u80fd\u88ab\u5ffd\u89c6\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5728\u6307\u4ee4\u8c03\u4f18\u548c\u8bc4\u4f30\u9636\u6bb5\u4f7f\u7528\u63a7\u5236\u6027\u7684\u6bcd\u8bed\u6216\u7ffb\u8bd1\u6570\u636e\uff0c\u6765\u63a2\u7a76\u8fd9\u4e9b\u95ee\u9898\uff0c\u5e76\u89c2\u5bdf\u6a21\u578b\u8868\u73b0\u3002\u6211\u4eec\u5728\u516b\u79cd\u57fa\u7840\u6a21\u578b\u548c\u516b\u4e2a\u4e0d\u540c\u57fa\u51c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6bcd\u8bed\u6216\u751f\u6210\u6027\u57fa\u51c6\u80fd\u591f\u4f53\u73b0\u6bcd\u8bed\u4e0e\u7ffb\u8bd1\u6307\u4ee4\u6570\u636e\u4e4b\u95f4\u7684\u663e\u8457\u5dee\u5f02\uff0c\u5728\u6a21\u578b\u6027\u80fd\u8f83\u9ad8\u65f6\u5c24\u4e3a\u660e\u663e\uff0c\u800c\u5176\u4ed6\u7c7b\u578b\u7684\u6d4b\u8bd5\u96c6\u5219\u65e0\u6cd5\u4f53\u73b0\u8fd9\u79cd\u5dee\u5f02\u3002\u6700\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u6b63\u5219\u5316\u6709\u52a9\u4e8e\u5728\u7ed3\u6784\u5316\u4efb\u52a1\u4e0a\u7f29\u5c0f\u8fd9\u4e00\u5dee\u8ddd\uff0c\u4f46\u5728\u751f\u6210\u6027\u4efb\u52a1\u4e0a\u5219\u4e0d\u7136\u3002|\n", "2406.12809": "|**2024-06-18**|**Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?**|Zhe Yang 
et.al.|[2406.12809](http://arxiv.org/abs/2406.12809)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u4e0d\u4e00\u81f4\u7684\u95ee\u9898\uff0c\u4f8b\u5982\u5bf9\u91cd\u8ff0\u6216\u5fae\u5c0f\u987a\u5e8f\u53d8\u5316\u7684\u53cd\u5e94\u4e0d\u4e00\u81f4\u3002\u9664\u4e86\u8fd9\u4e9b\u4e0d\u7a33\u5b9a\u6027\uff0c\u6211\u4eec\u8fd8\u89c2\u5bdf\u5230\u5c3d\u7ba1LLMs\u80fd\u591f\u89e3\u51b3\u96be\u9898\uff0c\u4f46\u5728\u76f8\u5bf9\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\u5374\u53ef\u80fd\u5931\u8d25\u3002\u4e3a\u4e86\u8bc4\u4f30\u8fd9\u79cd\u4ece\u96be\u5230\u6613\u7684\u4e0d\u4e00\u81f4\u6027\uff0c\u6211\u4eec\u521b\u5efa\u4e86ConsisEval\u57fa\u51c6\uff0c\u5176\u4e2d\u6bcf\u4e2a\u6761\u76ee\u5305\u542b\u4e24\u4e2a\u96be\u5ea6\u6709\u5e8f\u7684\u95ee\u9898\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u81f4\u6027\u5206\u6570\u7684\u6982\u5ff5\uff0c\u4ee5\u91cf\u5316\u8fd9\u79cd\u4e0d\u4e00\u81f4\u6027\uff0c\u5e76\u901a\u8fc7\u76f8\u5bf9\u4e00\u81f4\u6027\u5206\u6570\u5206\u6790\u4e00\u81f4\u6027\u7684\u6539\u8fdb\u6f5c\u529b\u3002\u901a\u8fc7\u5bf9\u73b0\u6709\u6a21\u578b\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u5f97\u51fa\u4ee5\u4e0b\u53d1\u73b0\uff1a(1) GPT-4\u83b7\u5f9792.2%\u7684\u6700\u9ad8\u4e00\u81f4\u6027\u5206\u6570\uff0c\u4f46\u4ecd\u56e0\u5197\u4f59\u4fe1\u606f\u7684\u5e72\u6270\u3001\u95ee\u9898\u8bef\u89e3\u7b49\u95ee\u9898\u5bf9\u7279\u5b9a\u95ee\u9898\u4e0d\u4e00\u81f4\uff1b(2) \u80fd\u529b\u66f4\u5f3a\u7684\u6a21\u578b\u901a\u5e38\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u4e00\u81f4\u6027\uff0c\u4f46\u4e5f\u5b58\u5728\u4f8b\u5916\u60c5\u51b5\uff1b(3) \u5bf9\u4e8e\u5fae\u8c03\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\u800c\u8a00\uff0c\u4f7f\u7528\u8f83\u96be\u7684\u6570\u636e\u53ef\u4ee5\u63d0\u9ad8\u4e00\u81f4\u6027\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u5728GitHub\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2406.12806": 
"|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**\u80cc\u666f**\uff1a\u914d\u7f6e\u8bbe\u7f6e\u5bf9\u4e8e\u8c03\u6574\u8f6f\u4ef6\u884c\u4e3a\u4ee5\u6ee1\u8db3\u7279\u5b9a\u6027\u80fd\u9700\u6c42\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9519\u8bef\u914d\u7f6e\u666e\u904d\u5b58\u5728\u3002\u7531\u4e8e\u914d\u7f6e\u9879\u4f17\u591a\u4e14\u590d\u6742\uff0c\u8bc6\u522b\u5f71\u54cd\u7cfb\u7edf\u6027\u80fd\u7684\u914d\u7f6e\u662f\u4e00\u9879\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51faPerfSense\uff0c\u8fd9\u662f\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9ad8\u6548\u5730\u8bc6\u522b\u6027\u80fd\u5173\u952e\u914d\u7f6e\uff0c\u540c\u65f6\u4fdd\u6301\u4f4e\u5f00\u9500\u3002PerfSense\u5229\u7528LLM\u4ee3\u7406\u6a21\u62df\u5f00\u53d1\u8005\u548c\u6027\u80fd\u5de5\u7a0b\u5e08\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u91c7\u7528\u5148\u8fdb\u7684\u63d0\u793a\u94fe\u6280\u672f\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b49\u6280\u672f\u3002 **\u65b9\u6cd5\u4e0e\u6210\u679c**\uff1a\u6211\u4eec\u5728\u4e03\u4e2a\u5f00\u6e90Java\u7cfb\u7edf\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\uff0cPerfSense\u5728\u5206\u7c7b\u6027\u80fd\u654f\u611f\u914d\u7f6e\u65b9\u9762\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a64.77%\uff0c\u4f18\u4e8e\u57fa\u4e8eLLM\u7684\u57fa\u7ebf\uff0850.36%\uff09\u548c\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\uff0861.75%\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u7684\u63d0\u793a\u94fe\u6280\u672f\u63d0\u9ad8\u4e86\u53ec\u56de\u738710%\u81f330%\uff0c\u800c\u4fdd\u6301\u4e86\u76f8\u4f3c\u7684\u7cbe\u786e\u5ea6\u3002\u8fdb\u4e00\u6b65\u7684\u624b\u52a8\u5206\u6790362\u4e2a\u8bef\u5206\u7c7b\u6848\u4f8b\uff0c\u53d1\u73b0\u5e38\u89c1\u95ee\u9898\u5305\u62ecLLMs\u5bf9\u9700\u6c42\u7684\u7406\u89e3\u504f\u5dee\uff08\u536026.8%\uff09\u3002 
**\u7ed3\u8bba**\uff1aPerfSense\u663e\u8457\u51cf\u5c11\u4e86\u624b\u52a8\u5206\u7c7b\u6027\u80fd\u5173\u952e\u914d\u7f6e\u7684\u5de5\u4f5c\u91cf\uff0c\u5e76\u4e3a\u672a\u6765\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u5206\u6790\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2406.12800": "|**2024-06-18**|**Supporting Human Raters with the Detection of Harmful Content using Large Language Models**|Kurt Thomas et.al.|[2406.12800](http://arxiv.org/abs/2406.12800)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u6216\u8f85\u52a9\u4eba\u7c7b\u5ba1\u9605\u8005\u68c0\u6d4b\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\uff0c\u5982\u4ec7\u6068\u8a00\u8bba\u3001\u9a9a\u6270\u3001\u6781\u7aef\u4e3b\u4e49\u548c\u9009\u4e3e\u8bef\u5bfc\u3002\u901a\u8fc750,000\u6761\u8bc4\u8bba\u7684\u6570\u636e\u96c6\uff0c\u6211\u4eec\u53d1\u73b0LLMs\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u76f8\u6bd4\u65f6\u80fd\u8fbe\u523090%\u7684\u51c6\u786e\u7387\u3002\u6211\u4eec\u63d0\u51fa\u4e94\u79cd\u8bbe\u8ba1\u6a21\u5f0f\uff0c\u4ee5\u6574\u5408LLMs\u4e0e\u4eba\u5de5\u8bc4\u7ea7\uff0c\u4f8b\u5982\u9884\u7b5b\u9009\u975e\u8fdd\u89c4\u5185\u5bb9\u3001\u68c0\u6d4b\u4eba\u7c7b\u8bc4\u7ea7\u53ef\u80fd\u7684\u9519\u8bef\uff0c\u6216\u8005\u63d0\u4f9b\u5173\u952e\u4e0a\u4e0b\u6587\u4ee5\u652f\u6301\u4eba\u5de5\u8bc4\u7ea7\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u4e00\u4e2a\u4f18\u5316\u7684\u63d0\u793a\u6765\u652f\u6301\u8fd9\u4e9b\u8bbe\u8ba1\u6a21\u5f0f\u3002\u5728\u5b9e\u9645\u5e94\u7528\u7684\u8bd5\u70b9\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4f18\u5316\u4eba\u529b\u8d44\u6e90\u6548\u7387\u65b9\u9762\u5b9e\u73b0\u4e8641.5%\u7684\u63d0\u5347\uff0c\u540c\u65f6\u5c06\u68c0\u6d4b\u8fdd\u89c4\u5185\u5bb9\u7684\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u63d0\u9ad8\u4e869%\u81f311%\u3002|\n", "2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B 
to GLM-4 All Tools**|Team GLM et.al.|[2406.12793](http://arxiv.org/abs/2406.12793)|**[link](https://github.com/thudm/chatglm-6b)**|We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report focuses primarily on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models, incorporating all the insights and lessons gained from the preceding three generations of ChatGLM. These models are pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus from 24 languages, and are aligned primarily for Chinese and English usage. High-quality alignment is achieved through a multi-stage post-training process that involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval; approaches GPT-4 Turbo in instruction following as measured by IFEval; matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks; and outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tools to use, such as a web browser, a Python interpreter, a text-to-image model, and user-defined functions, to complete complex tasks effectively. In practical applications, it matches and even surpasses GPT-4 All Tools on tasks such as accessing online information via web browsing and solving problems with the Python interpreter. To date, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, which attracted over 10 million downloads on Hugging Face in 2023 alone. These open models can be accessed through and .|\n", 
"2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang 
et.al.|[2406.12784](http://arxiv.org/abs/2406.12784)|**[link](https://github.com/Cyno2232/UBENCH)**|The rapid development of large language models (LLMs) has produced promising results in practical applications. However, because of their low interpretability, these models often make errors in unforeseen circumstances, which limits their usefulness. Although many studies have worked to build comprehensive evaluation systems, previous benchmarks focus mainly on problem-solving ability and neglect to assess the uncertainty of responses, which can lead to unreliability. Current methods for measuring LLM reliability are resource-intensive and struggle to test black-box models. 
To address these problems, we propose UBENCH, a comprehensive benchmark for evaluating LLM reliability. It contains 3,978 multiple-choice questions covering knowledge, language understanding, and reasoning abilities. Experimental results show that UBENCH achieves state-of-the-art performance, and its single-sampling method saves substantial computational resources compared with baseline methods that require multiple samplings. In addition, we used UBENCH to evaluate the reliability of 15 popular LLMs and found that GLM4 performed best, followed closely by GPT-4. We also explore how Chain-of-Thought prompts, role-playing prompts, option order, and temperature affect LLM reliability, analyzing their varying effects on different models.|\n", "2406.14563": "|**2024-06-20**|**Model Merging and Safety Alignment: One Bad Model Spoils the Bunch**|Hasan Abed Al Kader Hammoud et.al.|[2406.14563](http://arxiv.org/abs/2406.14563)|null|
Merging large language models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model that retains the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques and find that existing methods not only transfer domain expertise but also propagate misalignment. To address this, we propose a two-step approach: (1) generating synthetic safety and domain-specific data, and (2) incorporating these generated data into the optimization process of existing data-driven model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments demonstrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon 
et.al.|[2406.14562](http://arxiv.org/abs/2406.14562)|null|When faced with problems that involve visual thinking, humans naturally switch reasoning modalities, often forming mental images or drawing visual aids. Large language models have shown promising results in mathematical and symbolic reasoning by expressing intermediate reasoning steps in text as a chain of thought, yet they struggle with text queries that are easily solved by visual reasoning, even with extensive multimodal pretraining. We introduce a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities. Whiteboard-of-thought prompting gives the model a metaphorical whiteboard on which to draw out its reasoning steps as images, and then returns these images to the model for further processing. We find that this requires no demonstrations or specialized modules, instead leveraging the model's existing ability to write code with libraries such as Matplotlib and Turtle. This simple approach achieves state-of-the-art results on four difficult natural language tasks that involve visual and spatial reasoning. We identify multiple settings in which GPT-4o using chain-of-thought fails dramatically, including several where it achieves 0% accuracy, while whiteboard-of-thought prompting enables up to 92% accuracy in these same settings. We present a detailed exploration of where the technique succeeds as well as its sources of error.|\n", 
"2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556](http://arxiv.org/abs/2406.14556)|null|Although real-time planners perform well in autonomous driving, the rise of large language models (LLMs) has opened new avenues for enhancing the interpretability and controllability of motion planning. However, LLM-based planners still face significant challenges, including high resource consumption and long inference times, which hinder practical deployment. To address these challenges, we introduce AsyncDriver, a new asynchronous LLM-enhanced closed-loop framework that uses scene-associated instruction features generated by the LLM to guide a real-time planner toward precise and controllable trajectory predictions. AsyncDriver demonstrates the strength of LLMs in understanding and reasoning over vectorized scene data and a series of routing instructions, while its asynchronous design substantially reduces the computational cost introduced by the LLM with comparable performance. Experiments show that our approach achieves superior closed-loop evaluation performance in nuPlan's challenging scenarios.|\n", 
"2406.14550": "|**2024-06-20**|**GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models**|Shilong Li et.al.|[2406.14550](http://arxiv.org/abs/2406.14550)|null|Long-context capabilities are essential for large language models (LLMs) to tackle complex tasks. Despite numerous efforts to optimize LLMs for long inputs, challenges remain. This paper introduces GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and having an agent explore this graph autonomously. Upon receiving a question, the agent analyzes it step by step and formulates a rational plan, then invokes a set of predefined functions to read node content and neighbors, enabling coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on the current situation to optimize the process until it has gathered enough information to generate an answer. Experiments on the LV-Eval dataset show that GraphReader, using a 4k context window, significantly outperforms GPT-4-128k across context lengths from 16k to 256k. In addition, our approach performs well on four challenging single-hop and multi-hop benchmarks.|\n", 
"2406.14549": "|**2024-06-20**|**Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models**|Sunny Duan et.al.|[2406.14549](http://arxiv.org/abs/2406.14549)|null|The rise of large language models has revolutionized natural language processing tasks, but it also raises serious concerns about data privacy and security. These models are trained on vast corpora that may contain sensitive or proprietary information, and the risk of data leakage, where the model's responses reveal pieces of such information, remains poorly understood. This study examines memorization in machine learning models, focusing on how it evolves over the course of training. We investigate how the statistical characteristics of the training data influence the memories encoded within the model by evaluating the impact of duplication on memorization. We find that the probability of memorizing a sequence scales logarithmically with the number of times it appears in the data. Furthermore, we find that some sequences that do not appear to be memorized can be uncovered over the course of training even without subsequent exposure. These latent memorized sequences pose a challenge for data privacy because they may be hidden in the model's final checkpoint. To this end, we develop a diagnostic test that uncovers these latent memorized sequences by considering their cross-entropy loss.|\n", 
"2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein et.al.|[2406.14546](http://arxiv.org/abs/2406.14546)|**[link](https://github.com/choidami/inductive-oocr)**|One way to address safety risks from large language models (LLMs) is to remove dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. We ask whether LLMs can infer the censored knowledge by piecing together these implicit hints. To study this, we focus on inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning. Using a suite of five tasks, we demonstrate that frontier LLMs can perform inductive OOCR. For example, in one experiment we finetune an LLM on a corpus consisting only of distances between an unknown city and other known cities. Remarkably, without in-context examples or chain of thought, the LLM can verbalize that the unknown city is Paris and use this fact to answer downstream questions. Further experiments show that LLMs trained only on individual coin flip outcomes can verbalize whether the coin is biased, and those trained only on pairs $(x, f(x))$ can articulate a definition of $f$ and compute inverses. While OOCR succeeds in a range of cases, we also find that it is unreliable, particularly for smaller LLMs learning complex structures. Overall, the ability of LLMs to \u201cconnect the dots\u201d without explicit in-context learning poses a potential obstacle to monitoring and controlling the knowledge LLMs acquire.|\n", 
"2406.14545": "|**2024-06-20**|**Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in 
Text-to-SQL Systems**|\u0110or\u0111e Klisura et.al.|[2406.14545](http://arxiv.org/abs/2406.14545)|null|Relational databases are integral to modern information systems, serving as the foundation for storing, querying, and managing data. Advances in large language models have led to the emergence of text-to-SQL technologies, which greatly enhance the ability to retrieve information from databases but also raise concerns about privacy and security. Our research focuses on extracting the database schema elements that underlie text-to-SQL models. Knowledge of the schema can make attacks such as SQL injection easier. To this end, we develop a zero-knowledge framework that, without any direct knowledge of the database, uses carefully crafted questions to prompt these models to produce outputs that reveal the structure of the database schema. We apply this method to specialized text-to-SQL models fine-tuned on text-SQL pairs and to generative language models used for SQL generation. For fine-tuned models, we can reconstruct table names with an F1 score of nearly 0.75, and for generative models the score reaches 0.96.|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of 
VLMs**|Yuxuan Qiao et.al.|[2406.14544](http://arxiv.org/abs/2406.14544)|**[link](https://github.com/sparksjoe/prism)**|Vision language models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which requires strong perception and reasoning faculties. However, because perception and reasoning are intertwined in existing VLMs, assessing these two competencies independently is difficult. To tackle this, we present Prism, an innovative framework designed to disentangle the perception and reasoning processes involved in visual question solving. Prism comprises two distinct stages: a perception stage that uses a VLM to extract and articulate visual information in textual form, and a reasoning stage that formulates responses from the extracted visual information using a large language model (LLM). This modular design enables the systematic comparison and assessment of the perception and reasoning strengths of different VLMs. 
Our analytical framework provides several valuable insights, underscoring Prism's potential as a cost-effective solution for vision-language tasks. By combining a streamlined VLM focused on perception with a powerful LLM tailored for reasoning, Prism achieves superior results on general vision-language tasks while substantially cutting training and operational costs. Quantitative evaluations show that Prism, when configured with a vanilla 2B LLaVA VLM and the freely accessible GPT-3.5, delivers performance on par with VLMs ten times larger on the rigorous multimodal benchmark MMStar. The project is released at: https://github.com/SparksJoe/Prism.|\n", "2406.14541": "|**2024-06-21**|**Are LLMs Naturally Good at Synthetic Tabular Data Generation?**|Shengzhe Xu 
et.al.|[2406.14541](http://arxiv.org/abs/2406.14541)|**[link](https://github.com/anonymou9167/anonymouscode)**|Large language models (LLMs) excel at generating text and images, but their potential for generating tabular data, the most common data type, remains largely unexplored. This paper shows that LLMs, whether used as-is or after traditional fine-tuning, are severely inadequate as synthetic table generators. Because of the autoregressive nature of LLMs, fine-tuning with random-order permutations runs counter to the importance of modeling functional dependencies and leaves LLMs unable to model conditional mixtures of distributions, which are key to capturing real-world constraints. We show how making LLMs permutation-aware can overcome some of these deficiencies and improve their performance.|\n", "2406.14517": "|**2024-06-20**|**PostMark: A Robust Blackbox Watermark for Large Language Models**|Yapei Chang 
et.al.|[2406.14517](http://arxiv.org/abs/2406.14517)|**[link](https://github.com/lilakk/postmark)**|The most effective way to detect text generated by large language models (LLMs) is to insert a detectable signature, or watermark, during the decoding process. However, most existing methods require access to the LLM's logits, which LLM API providers are reluctant to share for fear of model distillation. As a result, these watermarks must be implemented independently by each provider. This paper introduces PostMark, a modular post-hoc watermarking procedure that inserts the watermark after generation is complete. PostMark does not require logit access, so it can be implemented by a third party. PostMark is also more robust to paraphrasing attacks than existing watermarking methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. Finally, we evaluate the impact of PostMark on text quality using both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. Our code, outputs, and annotations are released at https://github.com/lilakk/PostMark.|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu 
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|Recent advances in machine learning have significantly improved the identification of disease-associated genes from gene expression data. However, these processes often require extensive expertise and manual effort, which limits their scalability. Agents powered by large language models (LLMs) show promise for automating such tasks thanks to their increasingly strong problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark for the automatic exploration of gene expression data analysis, covering the tasks of dataset selection, preprocessing, and statistical analysis. GenoTEX provides a full analysis pipeline with annotations carefully curated by human bioinformaticians, who analyze the datasets in depth to ensure accuracy and reliability. 
To provide baselines for these tasks, we present GenoAgents, a team of LLM-based agents with context-aware planning, iterative correction, and domain expert consultation that collaboratively explore gene datasets. Our experiments demonstrate the potential of LLM-based approaches for genomics data analysis, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing AI-driven methods for genomics data analysis. Our benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.|\n", "2406.15330": "|**2024-06-21**|**Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance**|Haoling Li 
et.al.|[2406.15330](http://arxiv.org/abs/2406.15330)|null|Large language models (LLMs) have revolutionized many fields of research. Although it is widely known that fine-tuning is essential for enhancing the capabilities of LLMs, existing research suggests that the fine-tuning process may involve parameter redundancy, and some studies therefore propose updating only a subset of parameters. However, these methods fail to leverage task-specific information to identify important parameters during training. Based on the insight that gradients inherently contain information about task-specific data, we propose Gradient-Mask Tuning (GMT), a method that selectively updates parameters during training based on their gradient information. Specifically, we compute the absolute values of the gradients and apply masking to those with relatively small magnitudes. Our experiments show that GMT not only outperforms traditional fine-tuning methods but also raises the upper limit of LLM performance. Further analysis indicates that GMT is robust to the mask ratio and has computational efficiency comparable to vanilla supervised fine-tuning (SFT).|\n", "2406.15325": "|**2024-06-21**|**Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks**|Hokyung Lee 
et.al.|[2406.15325](http://arxiv.org/abs/2406.15325)|**[link](https://github.com/hamminghq/bug-in-the-code-stack)**|Recent research on Needle-in-a-Haystack (NIAH) benchmarks has explored the ability of large language models (LLMs) to retrieve contextual information from large text documents. As LLMs become increasingly integrated into software development processes, it is crucial to evaluate their performance in code-based environments. As LLMs are further developed for program synthesis, we need to ensure that they can understand syntax and write syntactically correct code. To this end, we designed the Bug In The Code Stack (BICS) benchmark to assess the ability of LLMs to identify simple syntax bugs within large source code. Our findings reveal three key insights: (1) code-based environments pose a significantly greater challenge than text-based environments for retrieval tasks; (2) there is a substantial performance disparity among different models; and (3) there is a notable correlation between longer context lengths and performance degradation, though the extent of this degradation varies between models.|\n", "2406.15264": "|**2024-06-21**|**Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics**|Weijia Zhang 
et.al.|[2406.15264](http://arxiv.org/abs/2406.15264)|null|Large language models (LLMs) often produce unreliable or hard-to-verify information, known as hallucinations. To address this, retrieval-augmented LLMs add citations that ground their content in verifiable sources. However, manually assessing whether a citation adequately supports the associated statement remains a major challenge. Prior work has tried to estimate citation support automatically with faithfulness metrics, but these approaches are limited to binary classification and overlook the fine-grained citation support that matters in practical scenarios. To investigate how effective faithfulness metrics are in fine-grained evaluation, we propose a comparative evaluation framework that tests how well the metrics distinguish among three levels of support: full support, partial support, and no support. Our framework uses correlation analysis, classification evaluation, and retrieval evaluation to comprehensively measure how well metric scores agree with human judgments. The results show that no single metric performs best across all evaluations, revealing the complexity of fine-grained support assessment. Based on these findings, we offer practical recommendations for developing more effective metrics.|\n", "2406.15231": "|**2024-06-21**|**Detecting Synthetic Lyrics with Few-Shot Inference**|Yanis Labrak et.al.|[2406.15231](http://arxiv.org/abs/2406.15231)|null|In recent years, generated music content has attracted growing attention. Large language models have been used effectively to write lyrics in diverse styles, themes, and linguistic structures. This supports artists' creative work, but it also raises problems such as copyright infringement, consumer satisfaction, and content spamming. Methods for detecting generated lyrics have therefore become essential. However, existing work has not focused on this specific domain or on detecting machine-generated creative text. To fill this gap, we curated the first high-quality dataset of synthetic lyrics and carried out a thorough quantitative evaluation of various few-shot detection methods, testing their generalization and complementing the results with human evaluation. Our best few-shot detector, based on LLM2Vec, outperforms stylistic and statistical methods that are strong in other domains. It successfully distinguishes human-written from machine-generated lyrics, generalizes well across artists and models, and also effectively detects human post-editing of generated lyrics. This study highlights the need for further research on creative-content detection, especially on generalization and on scaling to larger song catalogs. All datasets, preprocessing scripts, and code are publicly available on GitHub and Hugging Face under the Apache 2.0 license.|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga et.al.|[2406.15227](http://arxiv.org/abs/2406.15227)|null|The spread of misinformation and harmful speech online creates an urgent need for effective counter-narrative (CN) generation techniques. However, existing automatic evaluation methods often lack interpretability and fail to capture the complex relationship between generated CNs and human perception. To address this, we propose a novel approach to evaluating generated CNs that uses a large language model (LLM) as the evaluator. By comparing generated CNs pairwise in a tournament-style format, we build a model-ranking pipeline whose correlation with human preferences reaches 0.88. We also examine the ability of LLMs to generate CNs in a zero-shot (ZS) setting, comparing the performance and limitations of chat, instruct, and base models. Through detailed evaluation, including fine-tuning experiments, we reveal differences in their responses on domain-specific data. We conclude that chat-aligned ZS models may be the best choice for this task, provided they do not refuse to generate an answer due to safety concerns.|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214](http://arxiv.org/abs/2406.15214)|null|Dialogue policies are crucial for building task-oriented dialogue systems, yet developing and maintaining them often requires substantial effort from conversation-modeling experts. Although large amounts of conversational data are available in many situations, there are no efficient methods for extracting dialogue policies from this data. This paper fills that gap by showing how large language models (LLMs) can be used to convert conversations into a unified intermediate representation, namely canonical forms. We then propose a novel technique for generating dialogue policies with a controllable and interpretable graph-based method. By combining the canonical forms across conversations into a flow network, we find that running graph traversal algorithms helps extract dialogue flows. These flows reflect the underlying interactions better than flows extracted by LLMs alone. Our approach aims to give conversation designers greater control and to offer a tool that makes dialogue-policy development more efficient.|\n", "2406.15209": "|**2024-06-21**|**Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding**|Mohan Li et.al.|[2406.15209](http://arxiv.org/abs/2406.15209)|null|Zero-shot spoken language understanding (SLU) enables systems to understand user utterances in new domains without prior training data. Current approaches often rely on large language models (LLMs), which leads to heavy storage requirements and high complexity. This paper proposes using Whisper, a standalone speech processing model, for zero-shot end-to-end (E2E) SLU. To handle unseen semantic labels, we integrate SLU into a question-answering (QA) framework that prompts the Whisper decoder for semantic inference. The system is trained efficiently with prefix-tuning, which optimizes only a small number of parameters instead of the entire Whisper model. Experiments show that the proposed system improves the slot-filling score (SLU-F1) on SLURP by 40.7% over a recently introduced zero-shot benchmark. Furthermore, it performs comparably to a modular Whisper-GPT-2 system in both in-domain and cross-domain evaluation settings, while using 34.8% fewer model parameters.|\n", "2406.15198": "|**2024-06-21**|**Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms**|Santiago Berrezueta-Guzman 
et.al.|[2406.15198](http://arxiv.org/abs/2406.15198)|null|Attention-deficit/hyperactivity disorder (ADHD) is a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity, which can seriously affect an individual's daily functioning and quality of life. Occupational therapy plays a key role in managing ADHD by fostering the skills needed for daily living and enhancing an individual's ability to participate fully in school, home, and social settings. Recent studies highlight the potential of large language models such as ChatGPT, as well as socially assistive robots, in psychological treatment to address the limitations of existing therapies, provide tailored support, and adapt to individuals' unique needs. However, research on combining these advanced technologies in ADHD therapy remains largely unexplored. We therefore integrated two advanced language models, ChatGPT-4 Turbo and Claude-3 Opus, into a robotic assistant to examine their performance in robot-assisted interactions, and compared them with a clinically validated customized model in a simulated therapy scenario. The results show that ChatGPT-4 Turbo excels in performance and response speed, making it suitable for time-sensitive applications. Claude-3 Opus, in contrast, shows strengths in understanding, coherence, and ethical considerations, prioritizing safe and engaging interactions. Both models demonstrate innovation and adaptability, but ChatGPT-4 Turbo offers easier integration and broader language support. The choice between them depends on the specific requirements of the ADHD therapy.|\n", "2406.15187": "|**2024-06-21**|**UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis**|Yulong Hui et.al.|[2406.15187](http://arxiv.org/abs/2406.15187)|**[link](https://github.com/qinchuanhui/uda-benchmark)**|Although retrieval-augmented generation (RAG) has improved the ability of large language models (LLMs) to work with external data, it still faces many challenges in real-world scenarios. In domains such as academic literature and finance question answering, data often comes as long, intricately structured text and tables in HTML or PDF format. To address this, we propose a new benchmark called Unstructured Document Analysis (UDA), which contains 2,965 real-world documents and 29,590 expert-annotated question-answer pairs. We revisit the design choices of LLM- and RAG-based approaches to document analysis tasks and evaluate answer quality and strategies across multiple document domains and diverse query types. Our evaluation yields interesting findings and highlights the importance of data parsing and retrieval. We hope this benchmark can shed light on, and better serve, real-world document analysis applications. The benchmark suite and code are available at https://github.com/qinchuanhui/uda-benchmark.|\n", "2406.16858": "|**2024-06-24**|**EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees**|Yuhui Li 
et.al.|[2406.16858](http://arxiv.org/abs/2406.16858)|**[link](https://github.com/safeailab/eagle)**|Inference with modern large language models (LLMs) is expensive and time-consuming, and speculative sampling methods such as EAGLE have proven effective at reducing this cost. Conventional methods assume that the acceptance rate of draft tokens in the draft tree depends only on their position; however, we find that it also depends on context. We therefore propose EAGLE-2, which builds on EAGLE and introduces a new context-aware dynamic draft tree into drafting modeling. This improvement exploits the fact that EAGLE's draft model is well calibrated: its confidence scores approximate acceptance rates with small errors. We conducted extensive evaluations on three series of LLMs and six tasks. EAGLE-2 achieves speedup ratios of 3.05x to 4.26x, which is 20% to 40% faster than EAGLE-1. In addition, EAGLE-2 leaves the distribution of the generated text unchanged, making it a lossless acceleration algorithm.|\n", "2406.16838": "|**2024-06-24**|**From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models**|Sean Welleck 
et.al.|[2406.16838](http://arxiv.org/abs/2406.16838)|null|One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, relatively little attention has been paid to methods that operate at inference time. This survey focuses on these inference-time approaches. Using a unified mathematical formalism, we examine three areas: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, work by sampling one token at a time or by constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms operate on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and speed up generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika 
Marreddy et.al.|[2406.16833](http://arxiv.org/abs/2406.16833)|null|Identifying users' opinions and stances in long discussions on various topics is crucial for personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. However, manually annotating data to train such models is challenging: it is time-consuming and costly, long conversations can introduce noise, and subtle shifts in users' opinions make interpretation difficult. Given the strong performance of large language models (LLMs) on complex natural language processing tasks, we use Mistral Large and GPT-4 to automate the annotation process, with reasoning, for two key tasks: (1) user stance classification, which labels the stance of each of a user's posts in a conversation on a five-point scale; and (2) user dogmatism classification, which captures a user's overall opinion across the whole conversation on a four-point scale. By applying majority voting over zero-shot, one-shot, and few-shot annotations of 764 multi-user Reddit conversations, we create the USDC dataset. We then use this dataset to fine-tune and instruction-tune several small deployable language models for 5-class stance and 4-class dogmatism classification. We release the code and dataset at https://anonymous.4open.science/r/USDC-0F7F.|\n", "2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828](http://arxiv.org/abs/2406.16828)|**[link](https://github.com/castorini/ragnarok)**|Have you tried the new Bing Search or Google AI Overviews? Both reflect how search engines are evolving toward systems based on retrieval-augmented generation (RAG). These systems incorporate real-time data into large language models (LLMs) to deliver informative, attributed, and concise summaries, in contrast to the traditional approach of presenting a ranked list of documents. To foster innovation in the evaluation of RAG systems, we propose a RAG track at TREC 2024. This paper describes the steps we have taken toward that goal: we present the design of the reusable framework Ragnar\u00f6k, explain the choice of the MS MARCO V2.1 corpus, release the development topics for the track, and standardize the interface definitions to make participation easier. We then use Ragnar\u00f6k to present key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. We also provide a web interface for interactively comparing the performance of different RAG systems and evaluating them through crowdsourcing. We open-source the Ragnar\u00f6k framework and baselines to help establish a unified standard for future RAG systems.|\n", "2406.16801": "|**2024-06-24**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801](http://arxiv.org/abs/2406.16801)|**[link](https://github.com/qurrent-ai/res-q)**|The instruction-following ability of large language models (LLMs) has given rise to a class of systems that can handle complex tasks, such as making edits to large code repositories. Because LLMs are highly sensitive and unpredictable in response to prompt tweaks, robust evaluation tools are needed to drive the future development of these systems. We introduce RES-Q, a natural-language instruction benchmark for evaluating $\\textbf{R}$epository $\\textbf{E}$diting $\\textbf{S}$ystems, consisting of 100 repository editing tasks derived from 100 real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to gather information and construct an edit that satisfies the criteria set by the instruction. We argue that evaluating LLMs this way improves on traditional benchmarks and provides a more holistic assessment of model performance. We evaluate various state-of-the-art LLMs, such as Claude Sonnet 3.5 and GPT-4o, in a repository editing system built with Qurrent OS, a language agent development software. Despite a 1% pass@1 gap between these two models on HumanEval, Claude Sonnet 3.5 outperforms GPT-4o by 12% pass@1 on RES-Q, indicating that RES-Q can differentiate model capability as traditional benchmarks approach saturation. We further investigate token efficiency, performance relationships with existing benchmarks, and interesting differences between closed-source and open-source LLMs. Code and dataset are available at https://github.com/Qurrent-AI/RES-Q.|\n", "2406.16797": "|**2024-06-24**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797](http://arxiv.org/abs/2406.16797)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all model weights, causing destructive interference between tasks. This can lead to forgetting earlier tasks, making it hard to achieve good performance on multiple tasks at the same time. To address this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on challenging tasks such as instruction following, reasoning, math, and summarization. LoTA works by finding and optimizing lottery tickets (or sparse task vectors), and it outperforms full fine-tuning and low-rank adaptation (LoRA). It not only achieves better performance but also retains good performance after training on other tasks, thereby avoiding catastrophic forgetting. By extracting and fine-tuning over task-specific lottery tickets, LoTA also enables model merging across highly dissimilar tasks. Overall, LoTA is an effective sparse adaptation strategy that offers a new solution for multi-task adaptation of LLMs, maintaining stable and efficient performance across multiple tasks.|\n", "2406.16783": "|**2024-06-24**|**M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models**|Rishabh Maheshwary et.al.|[2406.16783](http://arxiv.org/abs/2406.16783)|null|Instruction finetuning (IFT) is critical for aligning large language models (LLMs) to follow instructions. Several effective IFT datasets have been proposed recently, but most focus on high-resource languages such as English. In this work, we propose M2Lingual, a fully synthetic, multilingual, multi-turn instruction finetuning dataset guided by an Evol taxonomy, aimed at improving LLM performance across diverse languages and tasks. M2Lingual contains 182,000 IFT pairs built from diverse seeds, covering 70 languages, 17 NLP tasks, and general instruction-response pairs. LLMs trained on M2Lingual substantially outperform those trained on most existing multilingual IFT datasets. More importantly, models finetuned on M2Lingual show robust cross-lingual capabilities across evaluation benchmarks, both on our multilingual, multi-turn translated evaluation benchmark and on a wide variety of multilingual tasks. We therefore contribute the two-step Evol taxonomy method and release the M2Lingual dataset at https://huggingface.co/datasets/ServiceNow-AI/M2Lingual.|\n", "2406.16779": "|**2024-06-24**|**It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension**|Sagi Shaier 
et.al.|[2406.16779](http://arxiv.org/abs/2406.16779)|null|Natural language processing has made remarkable progress over the past decade. However, some practices have become established without adequate evaluation. Focusing on reading comprehension, we first ask: (1) How does the order of the inputs (i.e., the question and the context) affect model performance? Given recent advances in input emphasis, we further ask: (2) Does emphasizing the question, the context, or both improve performance? Testing 9 large language models on 3 datasets, we find that presenting the context before the question improves model performance, with accuracy gains of up to 31%. Furthermore, emphasizing the context works better than emphasizing the question, and emphasizing parts of the input is particularly effective for questions the model lacks the parametric knowledge to answer. Experimenting with both prompt-based and attention-based emphasis methods, we find that the most effective approach is surprisingly simple: appending a few tokens to the input yields accuracy gains of up to 36%, allowing smaller models to outperform their significantly larger counterparts.|\n", "2406.16777": "|**2024-06-24**|**Blending LLMs into 
Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024**|Sai Koneru et.al.|[2406.16777](http://arxiv.org/abs/2406.16777)|null|Large language models (LLMs) are being widely investigated for tasks such as automatic speech recognition (ASR), machine translation (MT), and even end-to-end speech translation (ST). This paper describes KIT's offline submission to the constrained + LLM track, in which we improve a cascaded speech translation system by integrating recent techniques. Specifically, we incorporate the Mistral-7B model (mistralai/Mistral-7B-Instruct-v0.1) to enhance the system in two ways. First, we refine the ASR outputs using N-best lists generated by our system, fine-tuning the LLM to improve transcription accuracy. Second, we refine the MT outputs at the document level, using both the ASR and MT predictions to improve translation quality. Integrating the LLM reduces the ASR word error rate by 0.3% absolute and raises the MT COMET score by 0.65%. However, on a challenging test set with overlapping speakers and background noise, the LLM integration brings no clear benefit because ASR performance is poor. To recover context that may be missing in such cases, we use ASR with chunked long-form decoding.|\n", "2406.16768": "|**2024-06-24**|**WARP: On the Benefits of Weight Averaged Rewarded Policies**|Alexandre Ram\u00e9 et.al.|[2406.16768](http://arxiv.org/abs/2406.16768)|null|Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to achieve high rewards under a reward model trained on human preferences. To preserve pre-trained knowledge, RLHF usually applies KL-divergence regularization, but this constrains reward optimization. To address this, we propose a novel alignment strategy called Weight Averaged Rewarded Policies (WARP). WARP merges policies in weight space at three stages. First, it uses an exponential moving average of the policy as a dynamic anchor for the KL regularization. Second, it applies spherical interpolation to merge independently fine-tuned policies into an enhanced one. Third, it linearly interpolates between this merged model and the initial model to recover features from pre-training. The procedure is applied iteratively, with each iteration's final model serving as an improved initialization for the next, progressively refining the KL-reward trade-off and achieving higher rewards at fixed KL. Experiments with GEMMA policies validate WARP's benefits: it improves their quality and alignment, outperforming other open-source LLMs.|\n", "2406.17770": "|**2024-06-25**|**MG-LLaVA: Towards Multi-Granularity 
Visual Instruction Tuning**|Xiangyu Zhao et.al.|[2406.17770](http://arxiv.org/abs/2406.17770)|**[link](https://github.com/phoenixz810/mg-llava)**|**Multi-modal large language models (MLLMs) have made significant progress on visual understanding tasks. However, most of them are limited to low-resolution images, which restricts their effectiveness in perception tasks that require detailed visual information. In our study, we present MG-LLaVA, an innovative MLLM that enhances visual processing by incorporating a multi-granularity vision flow comprising low-resolution, high-resolution, and object-centric features. We add a high-resolution visual encoder to capture fine-grained details, which are fused with the base visual features through a Conv-Gate fusion network. To further improve object recognition, we incorporate object-level features derived from bounding boxes identified by offline detectors. Instruction-tuned solely on publicly available multimodal data, MG-LLaVA demonstrates exceptional perception skills. We instantiate MG-LLaVA with language encoders of various sizes, ranging from 3.8B to 34B parameters, to evaluate its performance comprehensively. Evaluations across multiple benchmarks show that MG-LLaVA outperforms existing MLLMs of comparable parameter size, demonstrating its efficacy. The code will be available at https://github.com/PhoenixZ810/MG-LLaVA.**|\n", "2406.17764": "|**2024-06-25**|**BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning**|Ercong Nie et.al.|[2406.17764](http://arxiv.org/abs/2406.17764)|null|Large language models (LLMs) possess extensive parametric knowledge, but updating it is difficult because retraining is costly and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution that updates the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods based on in-context learning (ICL) have shown great promise and allow LLMs to be treated as black boxes. So far, KE has mainly been studied in English, and the potential of cross-lingual KE in current English-centric LLMs remains underexplored. To foster research in this direction, we introduce the BMIKE-53 benchmark, which evaluates three types of KE tasks across 53 diverse languages. We also propose a gradient-free KE method, Multilingual In-context Knowledge Editing (MIKE), and evaluate it on BMIKE-53. Our evaluation covers the reliability, generality, locality, and portability of cross-lingual knowledge transfer, offering valuable insights and a framework for future research on cross-lingual KE. Our code and data are publicly available through the anonymous repository https://anonymous.4open.science/r/MIKE.|\n", "2406.17761": "|**2024-06-25**|**CaLMQA: Exploring culturally specific long-form question answering across 23 languages**|Shane Arora et.al.|[2406.17761](http://arxiv.org/abs/2406.17761)|**[link](https://github.com/2015aroras/calmqa)**|**Large language models (LLMs) are widely used for long-form question answering, which requires generating paragraph-length answers to complex questions. While long-form QA has been well studied in English, with many datasets and evaluation metrics, other languages remain underexplored. To close this gap, we introduce CaLMQA, a collection of 2,600 complex questions spanning 23 languages, including under-resourced, rarely studied languages such as Fijian and Kirundi. Our dataset includes both naturally occurring questions collected from community web forums and questions written by native speakers whom we hired for this purpose. This process yields diverse, complex questions that reflect cultural topics (e.g., traditions, laws, news) and the language usage of native speakers. We run automatic evaluations across a suite of open- and closed-source models using our novel metric CaLMScore, which detects incorrect language and token repetitions in answers, and observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. Human evaluation on a subset of models shows that model performance is significantly worse for culturally specific questions than for culturally agnostic ones. These findings highlight the need for further research on the multilingual capabilities of LLMs and on non-English long-form QA evaluation.**|\n", "2406.17755": "|**2024-06-25**|**Accelerating Clinical Evidence Synthesis with Large Language Models**|Zifeng Wang 
et.al.|[2406.17755](http://arxiv.org/abs/2406.17755)|null|Automatic medical discovery by AI is a dream of many. Toward this goal, we developed TrialMind, a generative AI pipeline for conducting medical systematic reviews that covers the study search, screening, and data extraction phases. The pipeline uses large language models (LLMs) to drive each stage and incorporates human expert oversight to minimize errors. To evaluate it, we created TrialReviewBench, a custom benchmark of 870 annotated clinical studies from 25 meta-analysis papers spanning various medical treatments. TrialMind significantly improves the literature review process, achieving high recall (0.897-1.000) when retrieving relevant studies from over 20 million PubMed studies. In screening, it outperforms traditional language-model embedding-based methods (recall of 0.227-0.246 vs. 0.000-0.102). It also surpasses direct use of GPT-4 in result extraction, with accuracy ranging from 0.65 to 0.84. TrialMind further supports clinical evidence synthesis in forest plots; eight human annotators validated this step and generally preferred TrialMind, with a winning rate of 62.5% to 100% across the reviews involved. These findings suggest that LLM-based approaches such as TrialMind enable reliable, high-quality clinical evidence synthesis and can improve the efficiency of clinical research.|\n", "2406.17753": "|**2024-06-25**|**Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language**|Amalie Brogaard Pauli 
et.al.|[2406.17753](http://arxiv.org/abs/2406.17753)|null|We are exposed to a great deal of information that tries to influence us, such as teaser messages, debates, politically framed news, and propaganda. This paper examines the ability of large language models (LLMs) to generate persuasive text. Unlike prior work that focuses on particular domains or types of persuasion, we conduct a general study to measure and benchmark the extent to which LLMs produce persuasive text, both when explicitly instructed to make a text more or less persuasive and when only instructed to paraphrase it. To this end, we construct a new dataset, Persuasive-Pairs, consisting of pairs of a short text and its LLM rewrite intended to amplify or diminish persuasive language. We multi-annotate the pairs on a relative scale for persuasive language. Besides being a valuable resource in itself, the dataset can be used to train a regression model that predicts a persuasiveness score between text pairs, making it possible to score and benchmark LLMs across domains. Finally, we discuss the effects of different system prompts on LLaMA3: notably, different personas in the system prompt substantially change the persuasive language of the text, even when the model is only asked to paraphrase. These findings underscore the importance of investigating persuasive language in LLM-generated text.|\n", "2406.17737": "|**2024-06-25**|**LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users**|Elinor Poole-Dayan et.al.|[2406.17737](http://arxiv.org/abs/2406.17737)|null|While state-of-the-art large language models (LLMs) show impressive performance on many tasks, there has been extensive research on their undesirable behaviors, such as hallucination and bias. This study investigates how the quality of LLM responses, in terms of information accuracy, truthfulness, and refusals, changes depending on three user traits: English proficiency, education level, and country of origin. We present extensive experiments on three state-of-the-art LLMs and two datasets targeting truthfulness and factuality. Our findings show that undesirable behaviors in state-of-the-art LLMs occur disproportionately more for users with lower English proficiency, lower education levels, and those from outside the US, making these models unreliable sources of information for their most vulnerable users.|\n", "2406.17706": "|**2024-06-25**|**FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model**|Feijie Wu 
et.al.|[2406.17706](http://arxiv.org/abs/2406.17706)|**[link](https://github.com/HarliWu/FedBiOT)**|Large language models (LLMs) show impressive performance on many tasks after fine-tuning on suitable domain-specific data. However, such data are often distributed across multiple owners, which raises the question of how to fine-tune LLMs in federated learning (FL). With limited computation and communication capacity, FL clients struggle to fine-tune an LLM effectively. To this end, we introduce FedBiOT, a resource-efficient LLM fine-tuning approach for FL. Specifically, the server generates a compressed LLM and aligns its performance with the full model. The clients then fine-tune a lightweight yet important part of the compressed model, referred to as an adapter. Notably, because the server has no access to the clients' private data, the data the server uses for alignment follow a different distribution from the data the clients use for fine-tuning. We formulate the problem as a bilevel optimization problem that accounts for this data discrepancy and derive update rules for the server and the clients. Extensive experiments on LLaMA-2 show that the adapter performs well when reintegrated into the global LLM. The results also show that FedBiOT significantly reduces resource consumption compared with existing baselines while achieving comparable performance.|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692](http://arxiv.org/abs/2406.17692)|**[link](https://github.com/thomlake/investigating-alignment)**|**This study examines how alignment changes the output distribution of large language models (LLMs). We first re-examine previous reports that alignment reduces response diversity and find that this decrease is largely explained by quality control and information aggregation. Alignment suppresses irrelevant and unhelpful content while shifting the output distribution toward longer responses that cover information spanning several responses from the base LLM, essentially presenting diverse information in a single response. Finding little evidence that alignment suppresses useful information, we then ask whether aligned models surface information that cannot be recovered from base models. Our second investigation shows that this is not the case: the behavior of aligned models can be recovered from base models without fine-tuning. Combining in-context examples with lower-resolution semantic hints about response content elicits responses from base LLMs that are about as similar to alignment-tuned responses as those responses are to one another. Taken together, these results support the Superficial Alignment Hypothesis: current alignment techniques capture, but do not extend, the useful subset of assistant-like base LLM behavior. They also show that in-context alignment is a surprisingly effective strategy for imitating aligned LLMs without fine-tuning. The code and data are available.**|\n", "2406.17681": "|**2024-06-25**|**VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation**|Kun Qian 
et.al.|[2406.17681](http://arxiv.org/abs/2406.17681)|**[link](https://github.com/qbetterk/VarBench)**|As large language models achieve increasingly impressive results on traditional benchmarks, a growing number of researchers are concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets and keep the test set labels closed. Anyone who wants to evaluate a language model must submit its predictions for centralized processing, after which the model's score is published on a leaderboard. However, this submission process is inefficient and prevents effective error analysis. To address this issue, we propose to variabilize benchmarks and evaluate language models dynamically. Specifically, we extract variables from each test case and define a value range for each variable. For each evaluation, we sample new values from these ranges to create unique test cases, ensuring a fresh evaluation every time. We apply this variable perturbation method to four datasets: GSM8K (math generation), ARC (multiple choice), CommonsenseQA (commonsense question answering), and TruthfulQA (truthful question answering). Experimental results show that this approach measures the true capabilities of language models more accurately and effectively mitigates the data contamination problem.|\n", "2406.17675": "|**2024-06-25**|**Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models**|Yuan Li et.al.|[2406.17675](http://arxiv.org/abs/2406.17675)|null|Large language models (LLMs) have demonstrated exceptional task-solving capabilities and increasingly act as human-like assistants. Their broader integration into society has sparked interest in whether they possess psychological attributes, and whether these attributes are stable and can help explain their behavior. Drawing on psychometrics, this paper presents a framework for investigating psychology in LLMs that covers psychological dimension identification, assessment dataset curation, and result validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs covering six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. The benchmark comprises thirteen datasets with diverse scenarios and item types. We find that LLMs exhibit a broad spectrum of psychological attributes. We also observe discrepancies between LLMs' self-reported traits and their actual behavior. The paper offers a thorough psychometric assessment of LLMs, providing insights into reliable evaluation and potential applications in AI and the social sciences.|\n", "2406.18532": "|**2024-06-26**|**Symbolic Learning Enables Self-Evolving Agents**|Wangchunshu Zhou et.al.|[2406.18532](http://arxiv.org/abs/2406.18532)|**[link](https://github.com/aiwaves-cn/agents)**|**The AI community has been exploring a path to artificial general intelligence (AGI) by building language agents, which are complex large language model (LLM) pipelines that combine prompting techniques and tool-use methods. Although language agents perform well on many real-world tasks, a key limitation of current language agent research is that it is model-centric, or engineering-centric: progress on prompts, tools, and pipelines relies on substantial manual design by human experts rather than on automatic learning from data. We believe that the transition from model-centric to data-centric, that is, enabling language agents to learn and evolve autonomously in their environments, is key to their progress toward AGI. To this end, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves in a data-driven way using symbolic optimizers. We treat agents as symbolic networks whose learnable weights are defined by prompts, tools, and the way they are combined. Agent symbolic learning mimics two fundamental algorithms of connectionist learning, back-propagation and gradient descent, but operates on natural-language counterparts of weights, losses, and gradients. Proof-of-concept experiments on standard benchmarks and complex real-world tasks show that agent symbolic learning enables language agents to update themselves after they are created and deployed, resulting in self-evolving agents.**|\n", "2406.18528": "|**2024-06-26**|**PrExMe! 
Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**|Christoph Leiter et.al.|[2406.18528](http://arxiv.org/abs/2406.18528)|null|Large language models (LLMs) have revolutionized natural language processing, and their in-context learning capabilities make them powerful tools for evaluating natural language generation, particularly in low-resource and time-constrained scenarios. This paper introduces PrExMe, a large-scale prompt exploration for metrics, in which we evaluate more than 720 prompt templates for open-source LLM-based metrics on machine translation (MT) and summarization, totalling about 6.6M evaluations. This extensive comparison (1) benchmarks the performance of recent open-source LLMs as evaluation metrics and (2) explores the stability and variability of different prompting strategies. On the one hand, we find scenarios in which prompts are stable: some LLMs show idiosyncratic preferences, favoring textual labels for grading, while others prefer to return numeric scores. On the other hand, the stability of prompts and model rankings can be affected by seemingly innocuous changes. For example, changing the requested output format from a 0 to 100 scale to a -1 to +1 scale can substantially change our evaluation results. Our study helps clarify how different prompting approaches affect LLM-based metrics for MT and summarization evaluation, highlighting the most stable prompting patterns and potential limitations.|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521](http://arxiv.org/abs/2406.18521)|**[link](https://github.com/princeton-nlp/CharXiv)**|Chart understanding is crucial when applying multimodal large language models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified, homogeneous charts with template-based questions, which can lead to an overly optimistic measure of performance. We find that although open-source models can appear to outperform strong proprietary models on existing benchmarks, a simple stress test with slightly different charts or questions can degrade their performance by up to 34.5%. In this work, we propose CharXiv, a comprehensive evaluation suite of 2,323 natural, challenging, and diverse charts from arXiv papers. CharXiv includes two types of questions: 1) descriptive questions that examine basic chart elements; 2) 
63a8\u7406\u95ee\u9898\uff0c\u9700\u8981\u7efc\u5408\u5206\u6790\u56fe\u8868\u4e2d\u7684\u590d\u6742\u89c6\u89c9\u5143\u7d20\u3002\u6240\u6709\u56fe\u8868\u548c\u95ee\u9898\u90fd\u7531\u4e13\u5bb6\u7cbe\u5fc3\u6311\u9009\u3001\u6574\u7406\u548c\u9a8c\u8bc1\u4ee5\u4fdd\u8bc1\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5f3a\u4e13\u6709\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\uff0c\u51c6\u786e\u7387\u4e3a47.1%\uff09\u4e0e\u6700\u5f3a\u5f00\u6e90\u6a21\u578b\uff08\u5982InternVL Chat V1.5\uff0c\u51c6\u786e\u7387\u4e3a29.2%\uff09\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u8ddd\uff0c\u800c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u5747\u8fdc\u4f4e\u4e8e\u4eba\u7c7b\u768480.5%\u6c34\u5e73\uff0c\u8fd9\u63ed\u793a\u4e86\u73b0\u6709MLLM\u5728\u56fe\u8868\u7406\u89e3\u80fd\u529b\u4e0a\u7684\u4e0d\u8db3\u3002\u6211\u4eec\u5e0c\u671bCharXiv\u80fd\u63a8\u52a8\u672a\u6765\u7684\u7814\u7a76\uff0c\u901a\u8fc7\u63d0\u4f9b\u66f4\u771f\u5b9e\u3001\u66f4\u5177\u4ee3\u8868\u6027\u7684\u8fdb\u6b65\u8861\u91cf\u6807\u51c6\uff0c\u4fc3\u8fdb\u56fe\u8868\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\u3002\u9879\u76ee\u9875\u9762\u548c\u6392\u884c\u699c\u53ef\u8bbf\u95ee\uff1ahttps://charxiv.github.io/\u3002|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512](http://arxiv.org/abs/2406.18512)|null|### \u6982\u8ff0 
Explanation is central to knowledge sharing and rests on principles of communication, social dynamics, and learning theory. We focus on conversational approaches to explanation because they are highly adaptive and interactive. Our work draws on the explanation acts framework, a tool for understanding how explainers and explainees use strategies to explain, understand, and engage with each other in conversation. We use the WIRED YouTube series dataset built by Wachsmuth et al. and annotated with explanation acts by Booshehri et al.; these annotations give us a basis for understanding how explainers structure their responses in conversation. With the rise of generative AI over the past year, we want to better understand the capabilities of large language models (LLMs) and how they can enhance expert explainers' conversational skills. To this end, we use the 5-Levels dataset annotated by Booshehri et al. (2023) to evaluate LLMs in explanatory dialogues. To assess how effectively LLMs generate explainer responses, we compare three strategies: the original human explainer response, a standard GPT-4 response, and a GPT-4 response that incorporates explanation moves. Human annotators were asked to evaluate these three strategies.|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505](http://arxiv.org/abs/2406.18505)|null|
Although modern language models show some reasoning ability and can in principle express any possible distribution over tokens, how they use the world knowledge accumulated during pretraining to understand agent behavior in the physical world remains underexplored. This study is the first to empirically examine whether large language models (LLMs) can build a mental model of an agent (agent mental modeling) by reasoning about the agent's behavior and its effect on states. This could reveal the potential of using LLMs to interpret the behavior of reinforcement learning (RL) agents, which bears on a key challenge in explainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics, test them on RL task datasets of varying complexity, and report findings on agent mental model formation. The results show that current LLMs cannot yet fully model agents mentally through inference alone, and further innovation is needed. This work therefore offers new insight into the capabilities and limitations of modern LLMs.|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501](http://arxiv.org/abs/2406.18501)|null|This paper examines the in-context learning (ICL) ability of large language models (LLMs) and diagnoses whether it is functionally equivalent to gradient-based learning. The authors propose a new method based on the inverse frequency effect (IFE). The IFE is the phenomenon that, under error-driven learning, a model makes larger updates in response to rare examples than to frequent ones. In psychology, humans show the IFE in structural priming (the tendency to repeat recently encountered sentence structures), which suggests that error-driven learning mechanisms may be involved. Experiments simulating structural priming within ICL find that LLMs also display the IFE, and that the effect is stronger in larger models. The results therefore support the hypothesis that ICL is essentially a form of gradient-based learning, in which gradients are computed implicitly during the ICL forward pass. The paper concludes that both humans and LLMs rely on gradient-based, error-driven processing mechanisms.|\n", "2406.18460": "|**2024-06-26**|**Role-Play Zero-Shot Prompting with 
Large Language Models for Open-Domain Human-Machine Conversation**|Ahmed Njifenjou et.al.|[2406.18460](http://arxiv.org/abs/2406.18460)|null|In recent years, a range of methods has been proposed to create large language models (LLMs) capable of open-domain conversation. These models can answer user questions, but only in a one-way question-and-answer format rather than a true conversation. Their conversational style is usually adjusted by fine-tuning on specific datasets, which is costly and limited to a few languages. This study explores role-play zero-shot prompting as an efficient and cost-effective solution for open-domain conversation, using capable multilingual instruction-following models (Beeching et al., 2023). We design a prompting system that, combined with an instruction-following model (here Vicuna; Chiang et al., 2023), produces conversational agents in French that even surpass fine-tuned models on two tasks and perform well in human evaluation.|\n", "2406.18449": "|**2024-06-26**|**Cascading Large Language Models for Salient Event Graph Generation**|Xingwei Tan 
et.al.|[2406.18449](http://arxiv.org/abs/2406.18449)|null|Generating event graphs from text is challenging because of the complexity of its subtasks: event detection and relation identification in long documents, and integrating unstructured input with structured graphs. Current work tends to treat all events as equally important and fails to single out the salient events that are crucial for understanding a narrative. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and avoids the need for costly human annotation. First, we identify salient events by prompting LLMs to generate summaries. Then, we develop an iterative code-refinement prompting strategy to generate event relation graphs, removing erroneous relations and recovering missing edges. Contextualized graph generation models fine-tuned on the LLM-generated graphs outperform models trained on CAEVO-generated data. Experiments on a human-annotated test set show that our method generates more salient and accurate graphs, outperforming competitive baselines.|\n", "2406.18440": "|**2024-06-26**|**New intelligent empowerment for digital transformation**|Peng 
Yifeng et.al.|[2406.18440](http://arxiv.org/abs/2406.18440)|null|This study proposes an innovative assessment method based on large language models (LLMs) for measuring the digital transformation (DT) of firms. By analyzing the annual reports of 4,407 companies listed on the New York Stock Exchange and Nasdaq between 2005 and 2022, we construct a comprehensive set of DT indicators. The results show that DT significantly improves firms' financial performance. However, different digital technologies affect financial performance differently, and blockchain technology has a relatively small positive effect. We also find that DT drives financial performance growth by improving operational efficiency and reducing costs. This study offers academia a new DT assessment tool and broadens the application of generative AI technology in economic research.|\n", "2406.18406": "|**2024-06-26**|**IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons**|Dan Shi 
et.al.|[2406.18406](http://arxiv.org/abs/2406.18406)|null|It is widely believed that large language models (LLMs) acquire rich knowledge from training on large-scale data. However, recent studies reveal knowledge conflicts in LLM generation: the parametric knowledge encoded in the model (its internal knowledge base) can contradict new knowledge provided in the context. To address this problem, we propose a novel framework, IRCAN (Identifying and Reweighting Context-Aware Neurons). IRCAN first identifies neurons that are crucial for processing the context, using a context-aware attribution score derived from integrated gradients. It then reweights these identified context-aware neurons to strengthen them, steering LLMs toward responses that are more consistent with the new knowledge in the context. Extensive experiments across a variety of models and tasks show that IRCAN substantially improves the handling of knowledge conflicts. It also offers a scalable, plug-and-play solution that integrates seamlessly with existing models.|\n", "2406.19392": "|**2024-06-27**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen 
et.al.|[2406.19392](http://arxiv.org/abs/2406.19392)|**[link](https://github.com/rextime/rextime)**|**We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. ReXTime focuses on reasoning across time, that is, human-like understanding when a question and its corresponding answer occur in different video segments. This form of reasoning requires an understanding of cause-and-effect relationships across video segments and poses significant challenges even to frontier multimodal large language models. To support this evaluation, we develop an automated pipeline for generating temporal-reasoning question-answer pairs, greatly reducing the need for labor-intensive manual annotation. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outperform academic models, they still trail human performance by a significant 14.3% accuracy gap. In addition, our pipeline creates a training dataset of 9,695 machine-generated samples without manual effort, and empirical studies suggest that fine-tuning on it can enhance across-time reasoning.**|\n", "2406.19384": "|**2024-06-27**|**The Remarkable Robustness of LLMs: Stages of Inference?**|Vedang Lad et.al.|[2406.19384](http://arxiv.org/abs/2406.19384)|**[link](https://github.com/vdlad/remarkable-robustness-of-llms)**|**We demonstrate and investigate the remarkable robustness of large language models by deleting and swapping adjacent layers. We find that, without fine-tuning, these interventions retain 72% to 95% of the original model's prediction accuracy, and that models with more layers are more robust. Based on layer-wise interventions and further experiments, we hypothesize four universal stages of inference across eight different models. The first is detokenization, which lifts raw token representations into higher-level contextual representations. The second is feature engineering, which iteratively refines task- and entity-specific features. In the third, in the second half of the model, specialized components drive a phase transition in which hidden representations align with the vocabulary space. Finally, the last layer sharpens the following-token distribution by eliminating obsolete features that add noise to the prediction.**|\n", "2406.19358": "|**2024-06-27**|**The Model Arena 
for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models**|Xiliang Zhu et.al.|[2406.19358](http://arxiv.org/abs/2406.19358)|null|Sentiment analysis plays a central role in natural language processing (NLP). The rise of multilingual pretrained models such as XLM-R and mT5 has drawn growing interest in cross-lingual sentiment analysis. The recent emergence of large language models (LLMs) has greatly advanced general NLP tasks, but their performance on cross-lingual sentiment analysis has not been fully explored. This study empirically compares public small multilingual language models (SMLMs) such as XLM-R with English-centric LLMs such as Llama-3 on zero-shot and few-shot cross-lingual transfer for sentiment analysis in English, Spanish, French, and Chinese. The results show that, among public models, SMLMs perform better in the zero-shot cross-lingual setting. In the few-shot setting, however, public LLMs show stronger adaptability. We also find that the proprietary GPT-3.5 and GPT-4 lead in zero-shot cross-lingual capability but are outperformed by public models in few-shot scenarios.|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for 
Math Multiple-choice Questions**|Nigel Fernandez et.al.|[2406.19356](http://arxiv.org/abs/2406.19356)|null|High-quality distractors are crucial to both the assessment and the pedagogical value of multiple-choice questions (MCQs), especially in mathematics. However, manually crafting distractors that reflect students' actual knowledge gaps or misconceptions is a difficult task. Large language models (LLMs) such as GPT-4 can help generate distractors, but subjects like math remain challenging. We therefore propose a new approach that learns interpretable representations of errors in order to generate distractors for math MCQs. This paper introduces DiVERT (Distractor Generation with Variational Errors Represented as Text), a variational approach built on an open-source LLM with 7B parameters. We evaluate it on a real-world math MCQ dataset of 1,434 questions used by hundreds of thousands of students. Compared with state-of-the-art GPT-4-based approaches, DiVERT performs well at distractor generation. We also conducted a human evaluation with math educators, which shows that the error labels generated by DiVERT are close in quality to human-authored ones.|\n", "2406.19349": "|**2024-06-27**|**IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language**|Lucky Susanto et.al.|[2406.19349](http://arxiv.org/abs/2406.19349)|null|Online hate speech poses a serious threat to social harmony, especially in countries such as Indonesia, where the rate of online hate speech has grown tenfold in recent years, creating an urgent need for effective detection mechanisms. Progress, however, is hindered by the lack of sufficient labeled data, particularly for Indonesian text. Marginalized minority groups such as Shia and LGBTQ communities face even greater challenges, because hate speech against them is underreported and poorly understood by existing detection tools. In addition, current datasets handle subjectivity inadequately, compounding the problem. To address these issues, we introduce IndoToxic2024, a comprehensive Indonesian dataset for hate speech and toxicity classification. It contains 43,692 entries annotated by 19 diverse individuals and focuses on texts targeting vulnerable groups in Indonesia during elections, such as specific groups during the presidential election. We fine-tune a BERT model (IndoBERTweet) to establish baselines for seven binary classification tasks, achieving a macro-F1 score of 0.78. We also show how incorporating demographic information improves the zero-shot performance of the large language model gpt-3.5-turbo. However, we caution that over-reliance on demographic information can degrade the performance of fine-tuned models because it fragments the data.|\n", "2406.19317": "|**2024-06-27**|**Jump Starting Bandits with LLM-Generated Prior Knowledge**|Parand A. Alamdari et.al.|[2406.19317](http://arxiv.org/abs/2406.19317)|null|We present substantial evidence of the benefits of integrating large language models (LLMs) with a contextual multi-armed bandit framework. Contextual bandits are widely used in recommender systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, trained on large corpora rich in human knowledge and preferences, can simulate human behavior well enough to jump-start contextual multi-armed bandits and reduce online learning regret. We propose an initialization algorithm that prompts LLMs to produce a pretraining dataset of approximate human preferences for the bandit to learn from. This significantly reduces online learning regret and data-gathering costs. We validate our approach with two sets of experiments: one using LLMs as an oracle, and a real-world experiment using data from a conjoint survey experiment.|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292](http://arxiv.org/abs/2406.19292)|null|Recent studies show that large language models (LLMs) struggle with information retrieval and reasoning when processing long-context inputs. To address this, we propose fine-tuning on a carefully designed synthetic dataset of numerical key-value retrieval tasks. Experiments on models such as GPT-3.5 Turbo and Mistral 7B show that fine-tuning on this dataset significantly improves their information retrieval and reasoning in long-context settings. Analysis of the fine-tuned models shows that these skills transfer from synthetic tasks to real evaluations, for example a 10.5% improvement on 20-document MDQA at position 10. We also find that LLMs fine-tuned on our synthetic dataset maintain their performance on general benchmarks. By contrast, LLMs fine-tuned on other long-context augmentation datasets can show more errors. For example, on TriviaQA, Mistral 7B fine-tuned on our synthetic data shows no noticeable performance drop, while other baseline data can cause drops of 2.33% to 6.19%. This study highlights the potential of fine-tuning on synthetic data to improve LLM performance on long-context tasks.|\n", "2406.19283": "|**2024-06-27**|**PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models**|Cathy Mengying Fang 
et.al.|[2406.19283](http://arxiv.org/abs/2406.19283)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPhysioLLM\u7684\u4e92\u52a8\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u53ef\u7a7f\u6234\u8bbe\u5907\u7684\u751f\u7406\u6570\u636e\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u5065\u5eb7\u7406\u89e3\u548c\u63a2\u7d22\u3002\u4e0e\u5546\u4e1a\u5065\u5eb7\u5e94\u7528\u4e0d\u540c\uff0cPhysioLLM\u5177\u5907\u5168\u9762\u7684\u7edf\u8ba1\u5206\u6790\u529f\u80fd\uff0c\u80fd\u53d1\u73b0\u7528\u6237\u6570\u636e\u4e2d\u7684\u5173\u8054\u548c\u8d8b\u52bf\u3002\u7528\u6237\u53ef\u4ee5\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u95ee\uff0c\u83b7\u53d6\u751f\u6210\u7684\u4e2a\u6027\u5316\u6d1e\u5bdf\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u4fe1\u606f\u5236\u5b9a\u884c\u52a8\u76ee\u6807\u3002\u4ee5\u6539\u5584\u7761\u7720\u8d28\u91cf\u4e3a\u4f8b\uff0c\u56e0\u4e3a\u5176\u53ef\u901a\u8fc7\u751f\u7406\u6570\u636e\u91cf\u5316\u4e14\u5bf9\u6574\u4f53\u5065\u5eb7\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u4e00\u9879\u6d89\u53ca24\u540dFitbit\u667a\u80fd\u624b\u8868\u7528\u6237\u7684\u7528\u6237\u7814\u7a76\uff0c\u6211\u4eec\u8bc1\u660e\u4e86PhysioLLM\u5728\u4fc3\u8fdb\u5bf9\u5065\u5eb7\u6570\u636e\u7684\u6df1\u5165\u4e2a\u6027\u5316\u7406\u89e3\uff0c\u4ee5\u53ca\u652f\u6301\u5b9e\u73b0\u4e2a\u4eba\u5065\u5eb7\u76ee\u6807\u65b9\u9762\uff0c\u4f18\u4e8eFitbit\u5e94\u7528\u548c\u901a\u7528LLM\u804a\u5929\u673a\u5668\u4eba\u3002|\n", "2406.19280": "|**2024-06-27**|**HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale**|Junying Chen 
et.al.|[2406.19280](http://arxiv.org/abs/2406.19280)|**[link](https://github.com/freedomintelligence/huatuogpt-vision)**|**\u968f\u7740\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u533b\u5b66\u591a\u6a21\u6001\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u7531\u4e8e\u533b\u5b66\u5f71\u50cf-\u6587\u672c\u6570\u636e\u7684\u6570\u91cf\u548c\u8d28\u91cf\u53d7\u9650\u4e8e\u6570\u636e\u9690\u79c1\u95ee\u9898\u548c\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4ecd\u9762\u4e34\u6311\u6218\u3002\u65e9\u671f\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528PubMed\u7684\u5927\u578b\u53bb\u6807\u8bc6\u5316\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\u6765\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u4ecd\u53d7\u5230\u6570\u636e\u566a\u97f3\u7684\u5f71\u54cd\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u4f18\u5316\u4e86PubMed\u4e2d\u7684\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\uff0c\u5e76\u5229\u7528GPT-4V\u5728\u201c\u975e\u76f2\u201d\u6a21\u5f0f\u4e0b\u8fdb\u884c\u6570\u636e\u6e05\u6d17\u548c\u683c\u5f0f\u8f6c\u6362\uff0c\u521b\u5efa\u4e86PubMedVision\u6570\u636e\u96c6\uff0c\u5305\u542b130\u4e07\u4efd\u533b\u5b66\u89c6\u89c9\u95ee\u7b54\u6837\u672c\u3002\u6211\u4eec\u7684\u9a8c\u8bc1\u8868\u660e\uff1a\uff081\uff09PubMedVision\u663e\u8457\u63d0\u5347\u4e86\u5f53\u524d\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u5728\u533b\u5b66\u9886\u57df\u7684\u6027\u80fd\uff0c\u5728\u8bf8\u5982MMMU Health & Medicine 
track\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u663e\u8457\u6539\u5584\uff1b\uff082\uff09\u533b\u5b66\u4e13\u5bb6\u7684\u624b\u52a8\u68c0\u67e5\u548c\u5b9e\u8bc1\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6211\u4eec\u7684\u6570\u636e\u96c6\u5728\u6570\u636e\u8d28\u91cf\u4e0a\u4f18\u4e8e\u5176\u4ed6\u6784\u5efa\u65b9\u6cd5\u3002\u5229\u7528PubMedVision\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u540d\u4e3aHuatuoGPT-Vision\u7684340\u4ebf\u53c2\u6570\u7684\u533b\u5b66\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u5728\u516c\u5f00\u6e90\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u533b\u5b66\u591a\u6a21\u6001\u573a\u666f\u4e2d\u663e\u793a\u51fa\u4f18\u8d8a\u6027\u80fd\u3002**|\n", "2406.19271": "|**2024-06-27**|**AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning**|Praneeth Vadlapati et.al.|[2406.19271](http://arxiv.org/abs/2406.19271)|**[link](https://github.com/Pro-GenAI/AutoPureData)**|**\u4eba\u4eec\u5bf9\u6700\u65b0\u7684\u548c\u53ef\u9760\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9700\u6c42\u6301\u7eed\u589e\u957f\u3002\u901a\u5e38\uff0cLLMs\u662f\u57fa\u4e8e\u56fa\u5b9a\u7684\u6570\u636e\u96c6\u8bad\u7ec3\u7136\u540e\u90e8\u7f72\u7684\u3002\u7136\u800c\uff0c\u8bad\u7ec3\u6570\u636e\u4f1a\u968f\u7740\u65f6\u95f4\u9010\u6e10\u8fc7\u65f6\u3002\u7814\u7a76\u5173\u6ce8\u5982\u4f55\u5229\u7528\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u66f4\u65b0AI\u6a21\u578b\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u6d89\u53ca\u6570\u636e\u8d28\u91cf\u4e0e\u5b89\u5168\u7684\u987e\u8651\uff0c\u5982\u504f\u89c1\u3001\u5783\u573e\u4fe1\u606f\u7b49\u3002\u786e\u4fdd\u6570\u636e\u7eaf\u51c0\u5bf9\u4e8e\u751f\u6210\u53ef\u9760\u7684\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5728\u4e0d\u7eaf\u6570\u636e\u4e0a\u8bad\u7ec3\u53ef\u80fd\u5bfc\u81f4\u4e0d\u826f\u7ed3\u679c\u3002\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u7cfb\u7edf\uff0c\u5b83\u6536\u96c6\u7f51\u7edc\u6570\u636e\uff0c\u5e76\u501f\u52a9\u73b0
\u6709\u53ef\u4fe1\u7684AI\u6a21\u578b\u81ea\u52a8\u7b5b\u9009\u51fa\u4e0d\u9700\u8981\u7684\u5185\u5bb9\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5904\u7406\u4e86\u4e00\u5c0f\u90e8\u5206\u7f51\u7edc\u6570\u636e\uff0c\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\u7684\u6570\u636e\u51c0\u5316\u6548\u679c\u3002**|\n", "2406.20098": "|**2024-06-28**|**Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs**|Sukmin Yun et.al.|[2406.20098](http://arxiv.org/abs/2406.20098)|**[link](https://github.com/mbzuai-llm/web2code)**|**\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u7b49\u591a\u79cd\u6a21\u6001\u7684\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7406\u89e3\u548c\u751f\u6210\u7f51\u9875\u622a\u56fe\u4ee5\u53ca\u76f8\u5e94\u7684HTML\u4ee3\u7801\u65b9\u9762\u7684\u80fd\u529b\u76f8\u5bf9\u8f83\u5f31\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faWeb2Code\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u62ec\u5927\u89c4\u6a21\u7f51\u9875\u5230\u4ee3\u7801\u7684\u65b0\u57fa\u51c6\uff0c\u7528\u4e8e\u6307\u4ee4\u8c03\u4f18\uff0c\u5e76\u8bc4\u4f30MLLM\u5728\u7f51\u9875\u7406\u89e3\u53caHTML\u4ee3\u7801\u8f6c\u6362\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u6784\u5efa\u6570\u636e\u96c6\u65f6\uff0c\u5229\u7528\u9884\u8bad\u7ec3\u7684LLMs\u589e\u5f3a\u73b0\u6709\u7684\u7f51\u9875\u5230\u4ee3\u7801\u6570\u636e\u96c6\uff0c\u5e76\u751f\u6210\u591a\u6837\u5316\u7684\u7f51\u9875\u56fe\u7247\uff0c\u4ee5\u4f9b\u6e32\u67d3\u3002\u8f93\u5165\u662f\u7f51\u9875\u56fe\u7247\u548c\u8bf4\u660e\uff0c\u8f93\u51fa\u662f\u7f51\u9875\u7684HTML\u4ee3\u7801\uff0c\u540c\u65f6\u52a0\u5165\u5173\u4e8e\u7f51\u9875\u5185\u5bb9\u7684\u4e30\u5bcc\u81ea\u7136\u8bed\u8a00\u95ee\u7b54\u5bf9\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u7f51\u9875\u5185\u5bb9\u7684\u5168\u9762\u7406\u89e3\u3002\u4e3a\u4e86\u8bc4\u4f30\u
6a21\u578b\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6d4b\u8bd5\u6846\u67b6\uff0c\u7528\u4e8e\u6d4b\u8bd5MLLM\u5728\u7f51\u9875\u7406\u89e3\u4e0e\u7f51\u9875\u5230\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6280\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0d\u4ec5\u6709\u76ca\u4e8e\u6211\u4eec\u63d0\u51fa\u7684\u4efb\u52a1\uff0c\u8fd8\u5728\u89c6\u89c9\u9886\u57df\u7684\u4e00\u822c\u6027\u80fd\u4e0a\u6709\u6240\u63d0\u5347\uff0c\u800c\u5148\u524d\u7684\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u63a8\u52a8\u901a\u7528MLLM\u7684\u53d1\u5c55\uff0c\u4f7f\u5176\u9002\u7528\u4e8e\u7f51\u7edc\u5185\u5bb9\u751f\u6210\u548c\u81ea\u52a8\u5316\u4efb\u52a1\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u5728\u4e0a\u516c\u5f00\u3002**|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li 
et.al.|[2406.20095](http://arxiv.org/abs/2406.20095)|**[link](https://github.com/lostxine/llara)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aLLaRA\uff08\u5927\u578b\u8bed\u8a00\u548c\u673a\u5668\u4eba\u52a9\u624b\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u8f6c\u5316\u4e3a\u5bf9\u8bdd\u5f62\u5f0f\uff0c\u901a\u8fc7\u7ed3\u5408\u989d\u5916\u7684\u6570\u636e\u8f85\u52a9\u5b66\u4e60\uff0c\u63d0\u5347\u54cd\u5e94\u8d28\u91cf\u3002\u5229\u7528\u5177\u5907\u89c6\u89c9\u8f93\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\uff0c\u5373\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5904\u7406\u72b6\u6001\u4fe1\u606f\uff0c\u4f5c\u4e3a\u89c6\u89c9-\u6587\u672c\u63d0\u793a\uff0c\u5e76\u751f\u6210\u6700\u4f18\u7684\u673a\u5668\u4eba\u51b3\u7b56\u7b56\u7565\u3002\u9996\u5148\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u5316\u65b9\u6cd5\uff0c\u4ece\u73b0\u6709\u7684\u884c\u4e3a\u514b\u9686\u6570\u636e\u4e2d\u751f\u6210\u591a\u6837\u4e14\u9ad8\u8d28\u91cf\u7684\u673a\u5668\u4eba\u6307\u4ee4\u6570\u636e\u96c6\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u5b9a\u5236\u7684\u5bf9\u8bdd\u5f0f\u683c\u5f0f\u5bf9VLM\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f7f\u5176\u80fd\u591f\u751f\u6210\u6709\u610f\u4e49\u7684\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLaRA\u6846\u67b6\u5728\u591a\u4e2a\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u9884\u8bad\u7ec3\u6a21\u578b\u5df2\u5728\u63d0\u4f9b\u3002**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094](http://arxiv.org/abs/2406.20094)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5185\u7684\u591a\u79cd\u89c6\u89d2\u6765\u751f\u6210\u591a\u6837\u5316\u7684\u4eba\u5de5\u5408\u6210\u6570\u636e\u3002\u4e3a\u4e86\u5728\u5927\u89c4\u6a21\u4e0a\u5145\u5206\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u5f15\u5165\u4e86Persona Hub\uff0c\u8fd9\u662f\u4e00\u4e2a\u4ece\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u6574\u7406\u51fa\u7684\u4e00\u4ebf\u4e2a\u591a\u5143\u5316\u4eba\u683c\u7684\u96c6\u5408\uff0c\u76f8\u5f53\u4e8e\u5168\u7403\u4eba\u53e3\u7684\u7ea613%\u3002\u8fd9\u4e9b\u4eba\u683c\u4f5c\u4e3a\u5206\u5e03\u5f0f\u4e16\u754c\u77e5\u8bc6\u8f7d\u4f53\uff0c\u51e0\u4e4e\u53ef\u4ee5\u8c03\u7528LLM\u5185\u5305\u542b\u7684\u5404\u7c7b\u89c2\u70b9\uff0c\u4ece\u800c\u63a8\u52a8\u5927\u89c4\u6a21\u3001\u591a\u6837\u5316\u7684\u5408\u6210\u6570\u636e\u521b\u5efa\uff0c\u9002\u7528\u4e8e\u5404\u79cd\u573a\u666f\u3002\u901a\u8fc7\u5c55\u793aPersona Hub\u5982\u4f55\u5728\u5927\u89c4\u6a21\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u548c\u903b\u8f91\u63a8\u7406\u95ee\u9898\u3001\u6307\u4ee4\uff08\u7528\u6237\u63d0\u793a\uff09\u3001\u5bcc\u542b\u77e5\u8bc6\u7684\u6587\u672c\u3001\u6e38\u620fNPC\u548c\u5de5\u5177\uff08\u51fd\u6570\uff09\u7b49\u65b9\u9762\u7684\u5e94\u7528\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u5177\u6709\u591a\u6837\u6027\u3001\u53ef\u6269\u5c55\u6027\u3001\u7075\u6d3b\u6027\u548c\u6613\u7528\u6027\uff0c\u53ef\u80fd\u5f15\u9886\u5408\u6210\u6570\u636e\u521b\u9020\u548c\u5b9e\u9645\u5e94\u7528\u7684\u65b0\u8303\u5f0f\uff0c\u5bf9LLM\u7684\u7814\u7a76\u548c\u53d1\u5c55\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002|\n", "2406.20092": "|**2024-06-28**|**LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context 
Compression**|Jieneng Chen et.al.|[2406.20092](http://arxiv.org/abs/2406.20092)|**[link](https://github.com/beckschen/llavolta)**|**\u5c3d\u7ba1\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6587\u672c\u5d4c\u5165\u538b\u7f29\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f46\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u4e2d\u7684\u89c6\u89c9\u4ee4\u724c\u538b\u7f29\u4ecd\u7136\u88ab\u5ffd\u89c6\u3002\u672c\u6587\u7814\u7a76\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u5197\u4f59\u6027\u4ee5\u53ca\u5728\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u6709\u6548\u8bad\u7ec3\u3002\u521d\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u6d4b\u8bd5\u9636\u6bb5\u901a\u8fc7\u7b80\u5355\u5e73\u5747\u6c60\u5316\u6d88\u9664\u9ad8\u8fbe70%\u7684\u89c6\u89c9\u4ee4\u724c\uff0cGQA\u57fa\u51c6\u7684\u89c6\u89c9\u95ee\u7b54\u51c6\u786e\u7387\u4ec5\u4e0b\u964d3%\uff0c\u8fd9\u663e\u793a\u51fa\u89c6\u89c9\u4e0a\u4e0b\u6587\u4e2d\u5b58\u5728\u5927\u91cf\u5197\u4f59\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Visual Context 
Compressor\uff0c\u5b83\u5728\u8bad\u7ec3\u9636\u6bb5\u51cf\u5c11\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u800c\u4e0d\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u4e3a\u4e86\u5728\u538b\u7f29\u89c6\u89c9\u4ee4\u724c\u65f6\u5c3d\u91cf\u51cf\u5c11\u4fe1\u606f\u635f\u5931\u5e76\u4fdd\u6301\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u8f7b\u91cf\u7ea7\u8bad\u7ec3\u65b9\u6848LLaVolta\u3002LLaVolta\u91c7\u7528\u5206\u9636\u6bb5\u7684\u89c6\u89c9\u4e0a\u4e0b\u6587\u538b\u7f29\u7b56\u7565\uff0c\u4ece\u91cd\u5ea6\u5230\u8f7b\u5ea6\u9010\u6e10\u538b\u7f29\uff0c\u6700\u7ec8\u5728\u8bad\u7ec3\u7ed3\u675f\u65f6\u5b8c\u5168\u4e0d\u8fdb\u884c\u538b\u7f29\uff0c\u4ece\u800c\u5728\u6d4b\u8bd5\u65f6\u4e0d\u4f1a\u4e22\u5931\u4efb\u4f55\u4fe1\u606f\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u5347\u4e86\u591a\u6a21\u6001\u6a21\u578b\u5728\u56fe\u50cf-\u8bed\u8a00\u548c\u89c6\u9891-\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5e76\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u6210\u672c\u3002\u4ee3\u7801\u5df2\u5728https://github.com/Beckschen/LLaVolta\u4e0a\u5f00\u6e90\u3002**|\n", "2406.20087": "|**2024-06-28**|**ProgressGym: Alignment with a Millennium of Moral Progress**|Tianyi Qiu 
et.al.|[2406.20087](http://arxiv.org/abs/2406.20087)|null|\u968f\u7740\u524d\u6cbf\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u77e5\u8bc6\u8bba\u4e2d\u7684\u5f71\u54cd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u5b83\u4eec\u53ef\u80fd\u5f3a\u5316\u793e\u4f1a\u666e\u904d\u7684\u4ef7\u503c\u89c2\uff0c\u8fdb\u800c\u52a0\u5267\u9519\u8bef\u9053\u5fb7\u89c2\u5ff5\u7684\u56fa\u5316\uff0c\u5bfc\u81f4\u5e7f\u6cdb\u7684\u793e\u4f1a\u95ee\u9898\u6301\u7eed\u5b58\u5728\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6f5c\u5728\u98ce\u9669\uff0c\u6211\u4eec\u63d0\u51fa\u8fdb\u6b65\u5bf9\u9f50\u4f5c\u4e3a\u4e00\u79cd\u6280\u672f\u89e3\u51b3\u65b9\u6848\u3002\u8fdb\u6b65\u5bf9\u9f50\u7b97\u6cd5\u65e8\u5728\u5b66\u4e60\u4eba\u7c7b\u9053\u5fb7\u8fdb\u6b65\u7684\u673a\u5236\uff0c\u4ece\u800c\u5f25\u8865\u73b0\u6709\u5bf9\u9f50\u65b9\u6cd5\u5bf9\u5f53\u4ee3\u9053\u5fb7\u76f2\u70b9\u7684\u654f\u611f\u6027\u3002\u4e3a\u4e86\u63a8\u52a8\u8fdb\u6b65\u5bf9\u9f50\u7684\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u4e86ProgressGym\uff0c\u4e00\u4e2a\u5b9e\u9a8c\u6027\u6846\u67b6\uff0c\u5b83\u4ece\u5386\u53f2\u4e2d\u5b66\u4e60\u9053\u5fb7\u8fdb\u6b65\u7684\u89c4\u5f8b\uff0c\u4ee5\u4fc3\u8fdb\u73b0\u5b9e\u4e16\u754c\u9053\u5fb7\u51b3\u7b56\u7684\u672a\u6765\u53d1\u5c55\u3002\u501f\u52a99\u4e2a\u4e16\u7eaa\u7684\u5386\u53f2\u6587\u672c\u548c18\u4e2a\u5386\u53f2LLMs\uff0cProgressGym\u5c06\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u8fdb\u6b65\u5bf9\u9f50\u6311\u6218\u8f6c\u5316\u4e3a\u5177\u4f53\u7684\u57fa\u51c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e09\u4e2a\u6838\u5fc3\u6311\u6218\uff1a\u8ffd\u8e2a\u6f14\u53d8\u7684\u4ef7\u503c\uff08PG-Follow\uff09\u3001\u9884\u6d4b\u9053\u5fb7\u8fdb\u6b65\uff08PG-Predict\uff09\u4ee5\u53ca\u8c03\u8282\u4eba\u4e0eAI\u4ef7\u503c\u53d8\u8fc1\u4e4b\u95f4\u7684\u53cd\u9988\u5faa\u73af\uff08PG-Coevolve\uff09\u3002\u8fd9\u4e9b\u4efb\u52a1\u9700\u8981\u65f6\u95f4\u7ef4\u5ea6\u7684\u65b9\u6cd5\uff0c\u800c\u4f20\u7edf\u768
4\u5bf9\u9f50\u7b56\u7565\u65e0\u6cd5\u80dc\u4efb\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u7ec8\u8eab\u5b66\u4e60\u548c\u5916\u63a8\u7b97\u6cd5\u4f5c\u4e3a\u8fdb\u6b65\u5bf9\u9f50\u7684\u57fa\u672c\u65b9\u6cd5\uff0c\u5e76\u5efa\u7acb\u4e86\u4e00\u4e2a\u5f00\u653e\u7684\u6392\u884c\u699c\uff0c\u9080\u8bf7\u521b\u65b0\u7b97\u6cd5\u548c\u65b0\u6311\u6218\u3002\u8be5\u6846\u67b6\u548c\u6392\u884c\u699c\u5206\u522b\u53ef\u5728https://github.com/PKU-Alignment/ProgressGym \u548c https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard \u83b7\u53d6\u3002|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen et.al.|[2406.20085](http://arxiv.org/abs/2406.20085)|null|\u57fa\u4e8e\u6269\u6563\u6a21\u578b\u7684\u751f\u6210\u65b9\u6cd5\u5df2\u7ecf\u5728\u751f\u6210\u5404\u79cd\u5e03\u5c40\u7684\u9ad8\u8d28\u91cf\u56fe\u50cf\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u4e0b\u6e38\u611f\u77e5\u4efb\u52a1\u5177\u6709\u663e\u8457\u76ca\u5904\u3002\u7136\u800c\uff0c\u4ec5\u4f9d\u8d56\u8bed\u8a00\u63cf\u8ff0\u548c\u4e00\u4e2a\u5408\u9002\u7684\u591a\u5b9e\u4f8b\u8bc4\u4f30\u6307\u6807\u6765\u5b9e\u73b0\u5168\u81ea\u52a8\u5e03\u5c40\u751f\u6210\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Auto 
Cherry-Picker\uff08ACP\uff09\uff0c\u65e8\u5728\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u591a\u6a21\u6001\u8bad\u7ec3\u6837\u672c\uff0c\u4ee5\u589e\u5f3a\u611f\u77e5\u548c\u591a\u6a21\u6001\u8bad\u7ec3\u6548\u679c\u3002\u901a\u8fc7\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u6982\u5ff5\u5217\u8868\uff0c\u6211\u4eec\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u8be6\u7ec6\u7684\u63cf\u8ff0\u5e76\u8bbe\u8ba1\u5408\u7406\u7684\u5e03\u5c40\u3002\u7136\u540e\uff0c\u4f7f\u7528\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u591a\u4e2a\u56fe\u7247\u3002\u63a5\u7740\uff0c\u6211\u4eec\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u8bc4\u4f30\u6307\u6807\u5bf9\u751f\u6210\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u70bc\uff0c\u786e\u4fdd\u8d28\u91cf\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u590d\u5408\u5e03\u5c40\u4e0e\u56fe\u50cf\u8bc4\u5206\uff08Composite Layout and Image Score\uff0cCLIS\uff09\u8fd9\u4e00\u65b0\u6307\u6807\uff0c\u7528\u4e8e\u516c\u6b63\u5730\u8bc4\u4f30\u751f\u6210\u7684\u56fe\u50cf\u3002\u6211\u4eec\u7684\u5408\u6210\u9ad8\u8d28\u793a\u4f8b\u5728\u5b9a\u5236\u521d\u59cb\u6982\u5ff5\u5217\u8868\u65f6\uff0c\u80fd\u591f\u6709\u6548\u63d0\u5347\u5404\u79cd\u573a\u666f\u4e0b\u7684\u6027\u80fd\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u957f\u5c3e\u5206\u5e03\u548c\u4e0d\u5e73\u8861\u6570\u636e\u96c6\u7684\u95ee\u9898\u4e0a\u3002\u4e0b\u6e38\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cACP\u663e\u8457\u63d0\u9ad8\u4e86\u73b0\u6709\u6a21\u578b\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86CLIS\u4e0e\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u7684\u5173\u8054\uff0c\u53d1\u73b0CLIS\u5206\u6570\u8d8a\u9ad8\uff0c\u6027\u80fd\u8d8a\u597d\u3002\u8fd9\u8868\u660e\u8bc4\u4f30\u6307\u6807\u5728\u89c6\u89c9\u611f\u77e5\u548c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4efb\u52a1\u4e2d\u53ef\u80fd\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u6211\u4eec\
u5c06\u63d0\u4f9b\u4ee3\u7801\u3002|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal et.al.|[2406.20079](http://arxiv.org/abs/2406.20079)|**[link](https://github.com/anisha2102/molecular_facts)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u5185\u5bb9\u7684\u81ea\u52a8\u4e8b\u5b9e\u6838\u67e5\u53d8\u5f97\u8d8a\u6765\u8d8a\u666e\u904d\uff0c\u4ee5\u5e94\u5bf9\u9519\u8bef\u53d9\u8ff0\u7684\u95ee\u9898\uff0c\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u7126\u70b9\u5728\u4e8e\u6838\u67e5\u7684\u7c92\u5ea6\uff1a\u8f83\u5927\u7684\u6587\u672c\u6bb5\u843d\u96be\u4ee5\u6838\u67e5\uff0c\u800c\u66f4\u539f\u5b50\u5316\u7684\u4e8b\u5b9e\uff08\u5982\u547d\u9898\uff09\u53ef\u80fd\u7f3a\u4e4f\u6b63\u786e\u7684\u4e0a\u4e0b\u6587\u89e3\u8bfb\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u8fd9\u4e9b\u539f\u5b50\u4e8b\u5b9e\u4e2d\u4e0a\u4e0b\u6587\u7684\u4f5c\u7528\u3002\u6211\u4eec\u8ba4\u4e3a\u5b8c\u5168\u539f\u5b50\u7684\u4e8b\u5b9e\u5e76\u975e\u6700\u4f73\u8868\u793a\u5f62\u5f0f\uff0c\u4e3a\u6b64\u6211\u4eec\u63d0\u51fa\u4e86\u5206\u5b50\u4e8b\u5b9e\u7684\u4e24\u4e2a\u6807\u51c6\uff1a\u53bb\u60c5\u5883\u5316\uff08decontextuality\uff09\uff0c\u5373\u5b83\u4eec\u80fd\u5426\u72ec\u7acb\u5b58\u5728\uff0c\u4ee5\u53ca\u6700\u5c0f\u5316\uff08minimality\uff09\uff0c\u5373\u6dfb\u52a0\u591a\u5c11\u989d\u5916\u4fe1\u606f\u624d\u80fd\u5b9e\u73b0\u53bb\u60c5\u5883\u5316\u3002\u6211\u4eec\u91cf\u5316\u4e86\u53bb\u60c5\u5883\u5316\u5bf9\u6700\u5c0f\u5316\u7684\u5f71\u54cd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u7840\u65b9\u6cd5\u6765\u81ea\u52a8\u751f\u6210\u5206\u5b50\u4e8b\u5b9e\uff0c\u76ee\u6807\u662f\u5728\u4fdd\u6301\u51c6\u786e\u6027\u7684\u540c\u65f6\u63d0\u4f9b\u9002\u91cf\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u65b9\u6cd5\u4e0e\u4e0d\u540c\u7684\u53bb\u60c5\u5883\u5316\u7b56\u7565\u8fdb\u884c\u4e86\u6bd4\u8f83\uff0c\u53d1\u73b0\u5206\u5b50\u4e8b\u5b9e\u80fd
\u591f\u5728\u6a21\u7cca\u573a\u666f\u4e2d\u5e73\u8861\u6700\u5c0f\u5316\u548c\u4e8b\u5b9e\u6838\u67e5\u7684\u51c6\u786e\u6027\u3002**|\n", "2406.20041": "|**2024-07-01**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford et.al.|[2406.20041](http://arxiv.org/abs/2406.20041)|null|\u81ea\u4e3b\u4ee3\u7406\u9a71\u52a8\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u793a\u4e86\u5de8\u5927\u7684\u81ea\u52a8\u5316\u6f5c\u529b\u3002\u65e9\u671f\u7684\u5c55\u793a\u8868\u660e\uff0c\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u89e3\u51b3\u590d\u6742\u4efb\u52a1\uff0c\u4e0e\u5916\u90e8\u7cfb\u7edf\u4ea4\u4e92\u4ee5\u589e\u5f3a\u77e5\u8bc6\uff0c\u5e76\u89e6\u53d1\u884c\u52a8\u3002\u7279\u522b\u662f\uff0c\u591a\u4e2a\u4ee3\u7406\u534f\u4f5c\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u7684\u5de5\u4f5c\u6d41\u8bc1\u660e\u4e86\u5b83\u4eec\u5728\u4e0d\u90a3\u4e48\u4e25\u683c\u548c\u5b9a\u4e49\u4e0d\u660e\u786e\u7684\u73af\u5883\u4e2d\u64cd\u4f5c\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u591a\u4ee3\u7406\u65b9\u6cd5\u6709\u5de8\u5927\u7684\u6f5c\u529b\u6210\u4e3a\u4f17\u591a\u5de5\u4e1a\u5e94\u7528\u7684\u6838\u5fc3\uff0c\u4ece\u590d\u6742\u7684\u77e5\u8bc6\u68c0\u7d22\u7cfb\u7edf\u5230\u4e0b\u4e00\u4ee3\u673a\u5668\u4eba\u8fc7\u7a0b\u81ea\u52a8\u5316\u3002\u9274\u4e8e\u5f53\u524dLLMs\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5904\u7406\u590d\u6742\u6d41\u7a0b\u9700\u8981\u5206\u6b65\u9aa4\u7684\u65b9\u6cd5\uff0c\u5305\u62ec\u8bbe\u8ba1\u660e\u786e\u4e14\u6a21\u5757\u5316\u7684\u4efb\u52a1\u8ba1\u5212\u3002\u6839\u636e\u590d\u6742\u7a0b\u5ea6\uff0c\u8fd9\u4e9b\u4efb\u52a1\u53ef\u4ee5\u7531\u5355\u4e2a\u4ee3\u7406\u6216\u4e00\u7ec4\u4ee3\u7406\u6267\u884c\u3002\u672c\u7814\u7a76\u4e13\u6ce8\u4e8e\u6784\u5efa\u4e00\u4e2a\u7075\u6d3b\u7684\u4ee3\u7406\u5de5\u7a0b\u6846\u67b6\uff0c\u91cd\u70b9\u5173\u6ce8\u89c4\u5212\u548c\u6267\u884c\uff0c\u65e8\u5728\u5e94\u5bf9\u4e0d\u540c\u9886\u57df\u7684\u590d\u6742\u5e94\u7528\u573a\u666f\u3002\u8
be5\u6846\u67b6\u4e3a\u5de5\u4e1a\u5e94\u7528\u63d0\u4f9b\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u51fa\u786e\u4fdd\u53ef\u6269\u5c55\u3001\u7075\u6d3b\u4e14\u534f\u4f5c\u7684\u5de5\u4f5c\u6d41\u7a0b\u6280\u672f\uff0c\u8ba9\u591a\u4e2a\u81ea\u4e3b\u4ee3\u7406\u534f\u540c\u89e3\u51b3\u95ee\u9898\u3002|\n", "2406.20030": "|**2024-06-28**|**LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models**|Renzhi Wang et.al.|[2406.20030](http://arxiv.org/abs/2406.20030)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e3a\u4e86\u8ddf\u4e0a\u4e0d\u65ad\u53d8\u5316\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u9700\u8981\u6301\u7eed\u8fdb\u884c\u6a21\u578b\u66f4\u65b0\uff0c\u8fd9\u50ac\u751f\u4e86\u7ec8\u751f\u6a21\u578b\u7f16\u8f91\u4efb\u52a1\u3002\u8fd1\u5e74\u6765\uff0c\u5c3d\u7ba1\u5df2\u7ecf\u5f00\u53d1\u51fa\u591a\u79cd\u5355\u6b21\u548c\u6279\u91cf\u7f16\u8f91\u7684\u6280\u672f\uff0c\u4f46\u5b83\u4eec\u5728\u9762\u5bf9\u7ec8\u751f\u7f16\u8f91\u65f6\u8981\u4e48\u65e0\u6cd5\u5e94\u7528\uff0c\u8981\u4e48\u6548\u679c\u4e0d\u4f73\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51faLEMoE\uff0c\u4e00\u4e2a\u4e13\u4e3a\u7ec8\u751f\u6a21\u578b\u7f16\u8f91\u8bbe\u8ba1\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u9002\u914d\u5668\u3002\u9996\u5148\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f71\u54cd\u4f20\u7edfMoE\u9002\u914d\u5668\u5728\u7ec8\u751f\u7f16\u8f91\u4e2d\u6709\u6548\u6027\u7684\u56e0\u7d20\uff0c\u5305\u62ec\u707e\u96be\u6027\u9057\u5fd8\u3001\u8def\u7531\u4e0d\u4e00\u81f4\u6027\u548c\u987a\u5e8f\u654f\u611f\u6027\u3002\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9a\u5236\u7684\u6a21\u5757\u63d2\u5165\u65b9\u6cd5\uff0c\u5f15\u5165\u4e86\u65b0\u9896\u7684\u952e\u503c\u5bf9\u951a\u5b9a\u8def\u7531\u4ee5\u589e\u5f3a\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u8def\u7531\u4e00\u81f4\u6027\uff0c\u540c\u65f6\u91c7\u7528\u4e86\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u805a\u
7c7b\u57fa\u7f16\u8f91\u987a\u5e8f\u89c4\u5212\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec8\u751f\u7f16\u8f91\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u5148\u524d\u7684\u6a21\u578b\u7f16\u8f91\u6280\u672f\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6279\u91cf\u7f16\u8f91\u4efb\u52a1\u4e2d\u7684\u4f18\u79c0\u6027\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u5f00\u6e90\u3002|\n", "2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang et.al.|[2406.20015](http://arxiv.org/abs/2406.20015)|**[link](https://github.com/toolbehonest/toolbehonest)**|**\u968f\u7740\u5de5\u5177\u589e\u5f3a\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fc5\u901f\u878d\u5165\u5b9e\u9645\u5e94\u7528\uff0c\u793e\u533a\u4e9f\u9700\u5168\u9762\u4e86\u89e3\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5168\u9762\u7684\u8bca\u65ad\u57fa\u51c6\u2014\u2014ToolBH\u3002\u6211\u4eec\u4ece\u6df1\u5ea6\u548c\u5e7f\u5ea6\u4e24\u4e2a\u7ef4\u5ea6\u8fdb\u884c\u8bc4\u4f30\uff1a\u5728\u6df1\u5ea6\u4e0a\uff0c\u8bbe\u8ba1\u4e86\u591a\u7ea7\u8bca\u65ad\u6d41\u7a0b\uff0c\u5305\u62ec\uff081\uff09\u53ef\u89e3\u6027\u68c0\u6d4b\u3001\uff082\uff09\u89e3\u51b3\u65b9\u6848\u89c4\u5212\u548c\uff083\uff09\u7f3a\u5931\u5de5\u5177\u5206\u6790\uff1b\u5728\u5e7f\u5ea6\u4e0a\uff0c\u8003\u8651\u4e86\u5de5\u5177\u96c6\u7279\u5f81\u4e0b\u7684\u4e09\u79cd\u573a\u666f\uff1a\u7f3a\u5c11\u5fc5\u8981\u5de5\u5177\u3001\u6f5c\u5728\u5de5\u5177\u548c\u529f\u80fd\u6709\u9650\u7684\u5de5\u5177\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e03\u4e2a\u4efb\u52a1\uff0c\u5e76\u901a\u8fc7\u591a\u6b21\u4eba\u5de5\u6807\u6ce8\u6536\u96c6\u4e86700\u4efd\u8bc4\u4f30\u6837\u672c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u5148\u8fdb\u7684\u6a21\u578bGemini-1.5-Pro\u548cGPT-4o\u5728\u8fd9\u9879\u57fa\u5
1c6\u4e0a\u7684\u603b\u5f97\u5206\u4e3a45.3\u548c37.0\uff0c\u6ee1\u5206100\u5206\u3002\u5728\u5de5\u5177\u589e\u5f3a\u7684LLM\u573a\u666f\u4e2d\uff0c\u66f4\u5927\u7684\u6a21\u578b\u53c2\u6570\u5e76\u4e0d\u4e00\u5b9a\u610f\u5473\u7740\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8bad\u7ec3\u6570\u636e\u548c\u56de\u590d\u7b56\u7565\u540c\u6837\u5173\u952e\u3002\u6211\u4eec\u7684\u8bca\u65ad\u5206\u6790\u6307\u51fa\uff0c\u6a21\u578b\u9519\u8bef\u7684\u4e3b\u8981\u539f\u56e0\u5728\u4e8e\u4efb\u52a1\u53ef\u89e3\u6027\u7684\u5224\u65ad\u3002\u5f00\u653e\u6e90\u7801\u6a21\u578b\u5728\u5197\u957f\u56de\u590d\u65f6\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u4e13\u6709\u6a21\u578b\u5728\u957f\u94fe\u63a8\u7406\u65b9\u9762\u8868\u73b0\u66f4\u4f18\u3002**|\n", "2407.02490": "|**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8ba1\u7b97\u6311\u6218\uff0c\u5c24\u5176\u662f\u968f\u7740\u63d0\u793a\u957f\u5ea6\u7684\u589e\u957f\uff0c\u5176\u5e7f\u6cdb\u5e94\u7528\u9762\u4e34\u969c\u788d\u3002\u7531\u4e8e\u6ce8\u610f\u529b\u8ba1\u7b97\u7684\u4e8c\u6b21\u590d\u6742\u6027\uff0c80\u4ebf\u53c2\u6570\u7684LLM\u5728\u5355\u4e2aA100 
GPU\u4e0a\u5904\u7406100\u4e07\u4e2a\u4ee4\u724c\uff08\u5373\u9884\u586b\u5145\u9636\u6bb5\uff09\u9700\u898130\u5206\u949f\u3002\u73b0\u6709\u7684\u52a0\u901f\u9884\u586b\u5145\u65b9\u6cd5\u5f80\u5f80\u5728\u9762\u5bf9\u957f\u5e8f\u5217LLMs\u65f6\u96be\u4ee5\u4fdd\u6301\u65e2\u9ad8\u6548\u53c8\u51c6\u786e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MInference\uff08\u767e\u4e07\u4ee4\u724c\u63a8\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u63d0\u5347\u957f\u5e8f\u5217\u5904\u7406\u9884\u586b\u5145\u9636\u6bb5\u901f\u5ea6\u7684\u7a00\u758f\u8ba1\u7b97\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\u4e86\u6ce8\u610f\u529b\u77e9\u9635\u4e2d\u7684\u4e09\u79cd\u72ec\u7279\u6a21\u5f0f\uff1aA\u5f62\u3001\u5782\u76f4\u659c\u7ebf\u548c\u5757\u7a00\u758f\uff0c\u8fd9\u4e9b\u6a21\u5f0f\u53ef\u5229\u7528GPU\u8fdb\u884c\u9ad8\u6548\u7684\u7a00\u758f\u8ba1\u7b97\u3002\u6211\u4eec\u5728\u79bb\u7ebf\u9636\u6bb5\u786e\u5b9a\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u7684\u6700\u4f73\u6a21\u5f0f\uff0c\u5e76\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u52a8\u6001\u6784\u5efa\u7a00\u758f\u7d22\u5f15\u3002\u901a\u8fc7\u4f18\u5316\u7684GPU\u5185\u6838\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u57fa\u4e8e\u6307\u5b9a\u6a21\u5f0f\u7684\u7a00\u758f\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u957f\u5e8f\u5217LLMs\u9884\u586b\u5145\u9636\u6bb5\u7684\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e0\u9700\u4fee\u6539\u9884\u8bad\u7ec3\u8bbe\u7f6e\u6216\u989d\u5916\u5fae\u8c03\u5373\u53ef\u76f4\u63a5\u5e94\u7528\u4e8e\u73b0\u6709LLMs\u3002\u6211\u4eec\u5728\u5305\u62ecInfiniteBench\u3001RULER\u3001PG-19\u548cNeedle In A 
Haystack\u5728\u5185\u7684\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4ee5\u53caLLaMA-3-1M\u3001GLM4-1M\u3001Yi-200K\u3001Phi-3-128K\u548cQwen2-128K\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cMInference\u5728A100\u4e0a\u6709\u6548\u964d\u4f4e\u4e86\u9884\u586b\u5145\u7684\u63a8\u7406\u5ef6\u8fdf\u9ad8\u8fbe10\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u5730\u5740\u4e3a\uff1ahttps://aka.ms/MInference\u3002**|\n", "2407.02486": "|**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aNeurocache\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6709\u6548\u4e0a\u4e0b\u6587\u8303\u56f4\uff0c\u901a\u8fc7\u5916\u90e8\u5411\u91cf\u7f13\u5b58\u5b58\u50a8\u5176\u8fc7\u53bb\u7684\u6a21\u578b\u72b6\u6001\u3002\u4e0e\u8fd1\u671f\u7684\u5411\u91cf\u68c0\u7d22\u65b9\u6cd5\u7c7b\u4f3c\uff0cNeurocache\u5229\u7528\u9ad8\u6548\u7684k\u8fd1\u90bb(kNN)\u7b97\u6cd5\u68c0\u7d22\u76f8\u5173\u7684\u5386\u53f2\u72b6\u6001\uff0c\u5e76\u5c06\u5176\u878d\u5165\u6ce8\u610f\u529b\u8fc7\u7a0b\u3002Neurocache\u5728\u6539\u8fdb\u73b0\u6709\u65b9\u6cd5\u65b9\u9762\u6709\u4ee5\u4e0b\u51e0\u70b9\uff1a(1) \u5b58\u50a8\u538b\u7f29\u7684\u72b6\u6001\uff0c\u51cf\u5c0f\u4e86\u7f13\u5b58\u5927\u5c0f\uff1b(2) \u6bcf\u4e2a\u4ee4\u724c\u6267\u884c\u4e00\u6b21\u68c0\u7d22\u64cd\u4f5c\uff0c\u63d0\u9ad8\u4e86\u63a8\u7406\u901f\u5ea6\uff1b(3) \u5c06\u68c0\u7d22\u7a97\u53e3\u6269\u5c55\u5230\u90bb\u8fd1\u72b6\u6001\uff0c\u63d0\u5347\u4e86\u8bed\u8a00\u5efa\u6a21\u548c\u4e0b\u6e38\u4efb\u52a1\u7684\u51c6\u786e\u6027\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u8fd8\u662f\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982Llama2-7B\u548cMistral-7B\uff09\u8fdb\u884c\u589e\u5f3a\uff0cNeurocache\u90fd\u80fd\u6709\u6548\u3002\u6211\u4eec\u8fd8\u5bf9\u6bd4\u4e86Neurocache\u4e0e\u5176\u4ed6\u6587\u672c\u68c0\u7d22\u65b9\u6cd5\uff0c\u5728\u5355\u6587\u6863\u95ee\u7b54\u548c\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u5176\u4f18\u52bf\u3002\u6e90\u4ee3\u7801\u5df2\u5728\u4ee5\u4e0b\u94fe\u63a5\u516c\u5f00\uff1ahttps://github.com/alisafaya/neurocache\u3002**|\n", "2407.02485": "|**2024-07-02**|**RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs**|Yue Yu et.al.|[2407.02485](http://arxiv.org/abs/2407.02485)|null|\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6307\u4ee4\u8c03\u4f18\u6846\u67b6RankRAG\uff0c\u65e8\u5728\u9488\u5bf9\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4e2d\u7684\u4e0a\u4e0b\u6587\u6392\u540d\u548c\u7b54\u6848\u751f\u6210\u53cc\u91cd\u4efb\u52a1\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u8c03\u4f18\u3002\u901a\u8fc7\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u52a0\u5165\u5c11\u91cf\u6392\u540d\u6570\u636e\uff0c\u6307\u4ee4\u8c03\u4f18\u7684\u5355\u4e2a\u8bed\u8a00\u6a21\u578b\u8868\u73b0\u51fa\u4ee4\u4eba\u60ca\u8bb6\u7684\u6548\u679c\uff0c\u8d85\u8d8a\u4e86\u4e13\u95e8\u4f7f\u7528\u5927\u91cf\u6392\u540d\u6570\u636e\u8fdb\u884c\u5355\u72ec\u8c03\u4f18\u7684\u73b0\u6709\u4e13\u5bb6\u6392\u540d\u6a21\u578b\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4e0e\u5305\u62ecGPT-4-0613\u3001GPT-4-turbo-2024-0409\u548c\u5f00\u653e\u6e90\u4ee3\u7801\u7684\u6700\u5148\u8fdb\u7684RAG\u6027\u80fd\u6a21\u578bChatQA-1.5\u5728\u5185\u7684\u591a\u4e2a\u5f3abaseline\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684Llama3-RankRAG\u5728\u4e5d\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u57fa\u51c6\u4e0a\u663e\u8457\u4f18\u4e8eLlama3-Cha
tQA-1.5\u548cGPT-4\u7cfb\u5217\u6a21\u578b\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u5728\u65e0\u9700\u9488\u5bf9\u751f\u7269\u533b\u5b66\u9886\u57df\u6570\u636e\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u4e94\u4e2a\u751f\u7269\u533b\u5b66\u9886\u57df\u7684RAG\u57fa\u51c6\u4e0a\u4e0eGPT-4\u6a21\u578b\u8868\u73b0\u76f8\u5f53\uff0c\u8fd9\u663e\u793a\u4e86\u5176\u5728\u65b0\u9886\u57df\u4e2d\u7684\u51fa\u8272\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u4e0d\u5982\u4e13\u4e1a\u6a21\u578b\u3002\u8fd1\u671f\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\uff0c\u901a\u8fc7\u7528\u6237\u8f93\u5165\u9009\u62e9\u5408\u9002\u7684\u4e13\u7528\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u533b\u7597\u9886\u57df\uff0c\u8fd9\u7c7b\u8fdb\u5c55\u7684\u5e94\u7528\u8fd8\u4e0d\u5e7f\u6cdb\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4e00\u79cd\u4e13\u4e3a\u533b\u7597\u8bbe\u8ba1\u7684\u4ee3\u7406\uff0c\u540d\u4e3a\\textbf{M}ulti-modal \\textbf{Med}ical 
\\textbf{Agent}\uff08MMedAgent\uff09\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u516d\u4e2a\u533b\u7597\u5de5\u5177\uff0c\u7528\u4e8e\u89e3\u51b3\u4e03\u9879\u4efb\u52a1\uff0c\u4f7f\u4ee3\u7406\u80fd\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u9002\u5b9c\u7684\u5de5\u5177\u3002\u5b9e\u9a8c\u5168\u9762\u5c55\u793a\u4e86MMedAgent\u5728\u5404\u79cd\u533b\u7597\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u5f00\u6e90\u65b9\u6cd5\uff0c\u751a\u81f3\u5305\u62ec\u5c01\u95ed\u6e90\u6a21\u578bGPT-4o\uff0c\u4e14\u5728\u5f15\u5165\u548c\u6574\u5408\u65b0\u533b\u7597\u5de5\u5177\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6548\u6027\u3002|\n", "2407.02477": "|**2024-07-02**|**Understanding Alignment in Multimodal LLMs: A Comprehensive Study**|Elmira Amirloo et.al.|[2407.02477](http://arxiv.org/abs/2407.02477)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6027\u80fd\u7684\u63d0\u5347\uff0c\u504f\u597d\u4e00\u81f4\u6027\u5df2\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u56e0\u7d20\uff0c\u4f46\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u5e94\u7528\u76f8\u5bf9\u8f83\u5c11\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u56fe\u50cf\u7406\u89e3\u4efb\u52a1\u4e2d\u4e5f\u4f1a\u9047\u5230\u8bf8\u5982\u9519\u8bef\u9648\u8ff0\u548c\u5185\u5bb9\u4e0d\u4e00\u81f4\uff08\u5373\u5e7b\u89c9\uff09\u7684\u95ee\u9898\u3002MLLMs\u7684\u504f\u597d\u5bf9\u9f50\u76ee\u6807\u662f\u4f7f\u6a21\u578b\u7684\u56de\u7b54\u66f4\u8d34\u8fd1\u56fe\u50cf\u4fe1\u606f\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u5f15\u5165\u4e86\u9488\u5bf9MLLM\u7684\u504f\u597d\u6570\u636e\u96c6\uff0c\u5e76\u5c1d\u8bd5\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u548cproximal policy 
optimization\uff08PPO\uff09\u7b49\u4e0d\u540c\u7684\u5bf9\u9f50\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6570\u636e\u96c6\u3001\u57fa\u7840\u6a21\u578b\u7c7b\u578b\u548c\u5bf9\u9f50\u7b56\u7565\u7684\u5dee\u5f02\uff0c\u54ea\u79cd\u65b9\u6cd5\u5bf9\u6027\u80fd\u63d0\u5347\u7684\u8d21\u732e\u6700\u5927\u5c1a\u4e0d\u6e05\u695a\u3002 \u672c\u6587\u72ec\u7acb\u5206\u6790\u4e86MLLM\u504f\u597d\u5bf9\u9f50\u7684\u5404\u4e2a\u65b9\u9762\u3002\u6211\u4eec\u5c06\u5bf9\u9f50\u7b97\u6cd5\u5206\u4e3a\u79bb\u7ebf\uff08\u5982DPO\uff09\u548c\u5728\u7ebf\uff08\u5982\u5728\u7ebf-DPO\uff09\u4e24\u7c7b\uff0c\u5e76\u8868\u660e\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u7ed3\u5408\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u56de\u987e\u4e86\u5404\u79cd\u5df2\u53d1\u8868\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u96c6\uff0c\u63a2\u8ba8\u4e86\u5b83\u4eec\u6784\u5efa\u7ec6\u8282\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u751f\u6210\u65b9\u6cd5\u2014\u2014\u504f\u89c1\u9a71\u52a8\u7684\u5e7b\u89c9\u91c7\u6837\uff08Bias-Driven Hallucination Sampling\uff0cBDHS\uff09\uff0c\u8fd9\u79cd\u65b9\u6cd5\u65e0\u9700\u989d\u5916\u6807\u6ce8\u6216\u5916\u90e8\u6a21\u578b\uff0c\u4e14\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u4e0e\u4e4b\u524d\u53d1\u8868\u7684\u5bf9\u9f50\u5de5\u4f5c\u76f8\u5f53\u7684\u7ade\u4e89\u6027\u80fd\u3002|\n", "2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo 
et.al.|[2407.02473](http://arxiv.org/abs/2407.02473)|null|\u5982\u4f55\u6784\u5efa\u80fd\u591f\u5728\u5f00\u653e\u4e16\u754c\u4e2d\u6267\u884c\u8bed\u4e49\u5bfc\u822a\u4efb\u52a1\u7684\u673a\u5668\u4eba\uff0c\u6bd4\u5982\u5728\u65b0\u573a\u666f\u4e2d\u5bfb\u627e\u76ee\u6807\u7269\u4f53\uff1f\u5c3d\u7ba1\u57fa\u7840\u6a21\u578b\u5177\u5907\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u6240\u9700\u7684\u4e30\u5bcc\u77e5\u8bc6\u548c\u6cdb\u5316\u80fd\u529b\uff0c\u4f46\u9700\u8981\u4e00\u79cd\u5408\u9002\u7684\u573a\u666f\u8868\u793a\u6765\u5c06\u5b83\u4eec\u6574\u5408\u5230\u5b8c\u6574\u7684\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u5f00\u653e\u573a\u666f\u56fe\uff08Open Scene Graphs\uff0cOSG\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u62d3\u6251\u8bed\u4e49\u8868\u793a\uff0c\u7528\u4e8e\u4fdd\u7559\u548c\u7ec4\u7ec7\u5f00\u653e\u96c6\u4e2d\u573a\u666f\u4fe1\u606f\uff0c\u4e14\u7ed3\u6784\u53ef\u9002\u5e94\u4e0d\u540c\u73af\u5883\u7c7b\u578b\u3002\u6211\u4eec\u5c06\u57fa\u7840\u6a21\u578b\u548cOSG\u6574\u5408\u5230OpenSearch\u7cfb\u7edf\u4e2d\uff0c\u8be5\u7cfb\u7edf\u4e13\u4e3a\u5f00\u653e\u4e16\u754c\u7684\u5bf9\u8c61\u76ee\u6807\u5bfc\u822a\u8bbe\u8ba1\uff0c\u80fd\u591f\u7406\u89e3\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u5e76\u5728\u591a\u53d8\u73af\u5883\u4e2d\u96f6\u6837\u672c\u6cdb\u5316\uff0c\u5bfb\u627e\u672a\u89c1\u8fc7\u7684\u7269\u4f53\u3002\u6211\u4eec\u7684OSG\u589e\u5f3a\u4e86\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5f97OpenSearch\u5728\u7269\u4f53\u76ee\u6807\u5bfc\u822a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u73b0\u6709\u7684LLM\u65b9\u6cd5\u3002\u901a\u8fc7\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u6d4b\u8bd5\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86OpenSearch\u5728\u5404\u79cd\u73af\u5883\u3001\u673a\u5668\u4eba\u548c\u65b0\u9896\u6307\u4ee4\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02464": "|**2024-07-02**|**Reliable 
Confidence Intervals for Information Retrieval Evaluation Using Generative A.I**|Harrie Oosterhuis et.al.|[2407.02464](http://arxiv.org/abs/2407.02464)|null|\u4f20\u7edf\u7684\u4fe1\u606f\u68c0\u7d22\uff08IR\uff09\u7cfb\u7edf\u8bc4\u4f30\u901a\u5e38\u6210\u672c\u9ad8\u6602\uff0c\u56e0\u4e3a\u9700\u8981\u4eba\u5de5\u4e13\u5bb6\u8fdb\u884c\u76f8\u5173\u6027\u6807\u6ce8\u3002\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u80fd\u591f\u4ee5\u76f8\u5bf9\u8f83\u4f4e\u7684\u8ba1\u7b97\u6210\u672c\u5927\u89c4\u6a21\u751f\u6210\u76f8\u5173\u6027\u6ce8\u91ca\uff0c\u53ef\u80fd\u51cf\u8f7bIR\u8bc4\u4f30\u7684\u4f20\u7edf\u6210\u672c\uff0c\u5e76\u4f7f\u5176\u9002\u7528\u4e8e\u4f17\u591a\u8d44\u6e90\u532e\u4e4f\u7684\u5e94\u7528\u573a\u666f\u3002\u7136\u800c\uff0c\u751f\u6210\u7684\u6ce8\u91ca\u5e76\u975e\u65e0\u8bef\uff0c\u76f4\u63a5\u7528\u4e8e\u8bc4\u4f30\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u53ef\u9760\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u63d0\u51fa\u4e24\u79cd\u65b9\u6cd5\uff0c\u5206\u522b\u662f\u57fa\u4e8e\u9884\u6d4b\u9a71\u52a8\u7684\u63a8\u65ad\u548c\u89c4\u8303\u98ce\u9669\u63a7\u5236\uff0c\u5229\u7528\u8ba1\u7b97\u673a\u751f\u6210\u7684\u76f8\u5173\u6027\u6ce8\u91ca\u4e3aIR\u8bc4\u4f30\u6307\u6807\u63d0\u4f9b\u53ef\u9760\u7684\u7f6e\u4fe1\u533a\u95f4\uff08CIs\uff09\u3002 
\u6211\u4eec\u7684\u65b9\u6cd5\u9700\u8981\u5c11\u91cf\u53ef\u9760\u7684\u6ce8\u91ca\uff0c\u901a\u8fc7\u7edf\u8ba1\u5206\u6790\u751f\u6210\u6ce8\u91ca\u4e2d\u7684\u9519\u8bef\uff0c\u4ece\u800c\u4e3a\u8bc4\u4f30\u6307\u6807\u8bbe\u7f6eCIs\uff0c\u5177\u6709\u575a\u5b9e\u7684\u7406\u8bba\u57fa\u7840\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u7684\u89c4\u8303\u98ce\u9669\u63a7\u5236\u65b9\u6cd5\u9002\u7528\u4e8e\u6392\u540d\u8bc4\u4f30\uff0c\u5e76\u4e14\u53ef\u4ee5\u6839\u636e\u67e5\u8be2\u548c\u6587\u6863\u81ea\u9002\u5e94\u8c03\u6574CIs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u7f6e\u4fe1\u533a\u95f4\u51c6\u786e\u6355\u6349\u4e86\u57fa\u4e8eLLM\u6ce8\u91ca\u7684\u8bc4\u4f30\u4e2d\u7684\u53d8\u5f02\u6027\u548c\u504f\u5dee\uff0c\u4f18\u4e8e\u4f20\u7edf\u7684Bootstrap\u4f30\u8ba1\u3002\u6211\u4eec\u671f\u671b\u8fd9\u4e9b\u8d21\u732e\u80fd\u4e3a\u90a3\u4e9b\u4f20\u7edf\u4e0a\u96be\u4ee5\u5b9e\u73b0\u53ef\u9760\u8bc4\u4f30\u7684\u4f17\u591aIR\u5e94\u7528\u5e26\u6765\u9769\u65b0\u3002|\n", "2407.02411": "|**2024-07-03**|**Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs**|Jinmin Li et.al.|[2407.02411](http://arxiv.org/abs/2407.02411)|null|\u968f\u7740\u89c6\u9891\u9a71\u52a8\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u89c6\u9891\u7406\u89e3\u80fd\u529b\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u4fdd\u62a4\u65b9\u9762\u7684\u62c5\u5fe7\uff0c\u56e0\u4e3a\u89c6\u9891\u66f4\u5bb9\u6613\u88ab\u65e0\u6388\u6743\u5730\u6807\u6ce8\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cVideo 
Watermarking\u201d\u7684\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u4fdd\u62a4\u89c6\u9891\u514d\u53d7\u672a\u7ecf\u6388\u6743\u7684\u89c6\u9891LLMs\uff0c\u7279\u522b\u662f\u9488\u5bf9\u5185\u5bb9\u548c\u63cf\u8ff0\u7684\u5904\u7406\u3002\u901a\u8fc7\u5728\u5173\u952e\u5e27\u4e2d\u5d4c\u5165\u96be\u4ee5\u5bdf\u89c9\u7684\u6c34\u5370\uff0c\u6211\u4eec\u5229\u7528\u591a\u6a21\u6001\u6d41\u635f\u5931\u4fdd\u6301\u89c2\u770b\u4f53\u9a8c\u7684\u540c\u65f6\uff0c\u9632\u6b62\u89c6\u9891\u88ab\u6ee5\u7528\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0cVideo Watermarking\u663e\u8457\u964d\u4f4e\u4e86\u89c6\u9891\u5728\u5404\u79cd\u89c6\u9891LLMs\u4e2d\u7684\u53ef\u7406\u89e3\u6027\uff0c\u8bc1\u660e\u4e86\u5176\u9690\u79d8\u6027\u548c\u9c81\u68d2\u6027\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u786e\u4fdd\u89c6\u9891\u5185\u5bb9\u7684\u5b89\u5168\u3001\u5b8c\u6574\u6027\u548c\u4fdd\u5bc6\u6027\u63d0\u4f9b\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5e94\u5bf9\u4e0d\u65ad\u53d1\u5c55\u7684\u89c6\u9891LLMs\u6280\u672f\u3002|\n", "2407.02408": "|**2024-07-02**|**CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models**|Song Wang 
et.al.|[2407.02408](http://arxiv.org/abs/2407.02408)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u5e94\u7528\u4e8e\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u5bf9\u5176\u751f\u6210\u5185\u5bb9\u53ef\u80fd\u4ea7\u751f\u7684\u8d1f\u9762\u793e\u4f1a\u5f71\u54cd\u7684\u62c5\u5fe7\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u4e3a\u4e86\u8bc4\u4f30LLMs\u7684\u504f\u89c1\uff0c\u7814\u7a76\u4eba\u5458\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u504f\u89c1\u8bc4\u4f30\u5de5\u4f5c\u5f80\u5f80\u53ea\u5173\u6ce8\u67d0\u79cd\u7c7b\u578b\u7684\u504f\u89c1\uff0c\u5e76\u4f7f\u7528\u4e0d\u4e00\u81f4\u7684\u8bc4\u4ef7\u6307\u6807\uff0c\u8fd9\u5bfc\u81f4\u4e0d\u540c\u6570\u636e\u96c6\u548cLLM\u4e4b\u95f4\u7684\u6bd4\u8f83\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u591a\u79cd\u7528\u4e8e\u8bc4\u4f30LLM\u504f\u89c1\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86CEB\uff08Compositional Evaluation Benchmark\uff09\uff0c\u5b83\u6db5\u76d6\u4e86\u4e0d\u540c\u793e\u4f1a\u7fa4\u4f53\u548c\u793e\u4f1a\u4efb\u52a1\u4e2d\u7684\u5404\u79cd\u7c7b\u578b\u504f\u89c1\u3002CEB\u7684\u6784\u5efa\u57fa\u4e8e\u6211\u4eec\u65b0\u63d0\u51fa\u7684\u6784\u6210\u6027\u5206\u7c7b\u4f53\u7cfb\uff0c\u4ece\u4e09\u4e2a\u7ef4\u5ea6\u5bf9\u6bcf\u4e2a\u6570\u636e\u96c6\u8fdb\u884c\u523b\u753b\uff1a\u504f\u89c1\u7c7b\u578b\u3001\u793e\u4f1a\u7fa4\u4f53\u548c\u4efb\u52a1\u3002\u901a\u8fc7\u7ed3\u5408\u8fd9\u4e09\u4e2a\u7ef4\u5ea6\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u5168\u9762\u7684LLM\u504f\u89c1\u8bc4\u4f30\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u504f\u89c1\u5728\u5404\u7ef4\u5ea6\u4e0a\u7684\u7a0b\u5ea6\u6709\u6240\u4e0d\u540c\uff0c\u4ece\u800c\u4e3a\u9488\u5bf9\u7279\u5b9a\u504f\u89c1\u7684\u7f13\u89e3\u65b9\u6cd5\u7684\u53d1\u5c55\u63d0\u4f9b\u4e86\u6307\u5bfc\u3002|\n", 
"2407.02402": "|**2024-07-02**|**Assessing the Code Clone Detection Capability of Large Language Models**|Zixian Zhang et.al.|[2407.02402](http://arxiv.org/abs/2407.02402)|null|\u8be5\u7814\u7a76\u65e8\u5728\u8bc4\u4f30\u4e24\u79cd\u5148\u8fdb\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cGPT-3.5\u548cGPT-4\uff0c\u5728\u4ee3\u7801\u514b\u9686\u68c0\u6d4b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u901a\u8fc7\u5728\u4e24\u4e2a\u6570\u636e\u96c6\u4e0a\u6d4b\u8bd5\u6a21\u578b\uff1aBigCloneBench\uff08\u4eba\u7c7b\u521b\u5efa\uff09\u548cGPTCloneBench\uff08LLM\u751f\u6210\uff09\u3002\u7814\u7a76\u53d1\u73b0\uff0cGPT-4\u5728\u6240\u6709\u7c7b\u578b\u7684\u4ee3\u7801\u514b\u9686\u68c0\u6d4b\u4e2d\u90fd\u660e\u663e\u4f18\u4e8eGPT-3.5\u3002\u7ed3\u679c\u663e\u793a\uff0cGPT\u6a21\u578b\u7684\u51c6\u786e\u5ea6\u4e0e\u5176\u8bc6\u522b\u4ee3\u7801\u514b\u9686\u7684\u80fd\u529b\u4e0e\u4ee3\u7801\u76f8\u4f3c\u5ea6\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u5b83\u4eec\u5728\u8bc6\u522b\u6700\u590d\u6742\u7684Type-4\u4ee3\u7801\u514b\u9686\u65f6\u6548\u679c\u8f83\u4f4e\u3002\u6b64\u5916\uff0cGPT\u6a21\u578b\u5728\u68c0\u6d4bLLM\u751f\u6210\u7684\u4ee3\u7801\u4e2d\u7684\u4ee3\u7801\u514b\u9686\u8868\u73b0\u4f18\u4e8e\u4eba\u7c7b\u751f\u6210\u7684\u4ee3\u7801\uff0c\u4f46\u6574\u4f53\u51c6\u786e\u6027\u4ecd\u4e0d\u663e\u8457\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347LLM\u5728\u4ee3\u7801\u514b\u9686\u8bc6\u522b\u80fd\u529b\u7684\u5fc5\u8981\u6027\uff0c\u7279\u522b\u662f\u9488\u5bf9\u81ea\u6211\u751f\u6210\u4ee3\u7801\u514b\u9686\u7684\u95ee\u9898\uff0c\u968f\u7740\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u8d8a\u6765\u8d8a\u591a\u5730\u4f7f\u7528\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u751f\u6210\u548c\u91cd\u6784\u5de5\u5177\uff0c\u8fd9\u53ef\u80fd\u4f1a\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002|\n", "2407.03310": "|**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou 
et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|**\u6458\u8981\uff1a** \u957f\u5ea6\u6cdb\u5316\u6307\u7684\u662f\u4ece\u7b80\u77ed\u7684\u8bad\u7ec3\u5e8f\u5217\u63a8\u65ad\u51fa\u957f\u6d4b\u8bd5\u5e8f\u5217\u7684\u80fd\u529b\uff0c\u8fd9\u5bf9\u4e8e\u5f53\u524d\u7684\u5927\u8bed\u8a00\u6a21\u578b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u4e9b\u67b6\u6784\u6216\u6570\u636e\u683c\u5f0f\u53d8\u5316\u6765\u5b9e\u73b0\u957f\u5ea6\u6cdb\u5316\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u901a\u5e38\u5c40\u9650\u4e8e\u7279\u5b9a\u4efb\u52a1\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u7ed3\u5408\u4e86\u64e6\u9664\u677f\u548c\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u6280\u672f\uff0c\u63d0\u51fa\u4e86Turing\u7a0b\u5e8f\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684CoT\u7b56\u7565\uff0c\u5b83\u5c06\u7b97\u6cd5\u6027\u4efb\u52a1\u5206\u89e3\u6210\u7c7b\u4f3c\u56fe\u7075\u673a\u8ba1\u7b97\u7684\u6b65\u9aa4\u3002\u8fd9\u4e2a\u6846\u67b6\u65e2\u901a\u7528\u53c8\u7b80\u5355\uff0c\u53ea\u9700\u8981\u5728\u4e0a\u4e0b\u6587\u4e2d\u7a0d\u4f5c\u4fee\u6539\u5730\u590d\u5236\u6587\u672c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528Turing\u7a0b\u5e8f\uff0c\u6211\u4eec\u5728\u52a0\u6cd5\u3001\u4e58\u6cd5\u4ee5\u53ca\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684SGD\u7b49\u7b97\u6cd5\u6027\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u7a33\u5065\u7684\u957f\u5ea6\u6cdb\u5316\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793aTransformer\u5728\u968f\u673aTuring\u7a0b\u5e8f\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u957f\u5ea6\u6cdb\u5316\uff0c\u8fd9\u8868\u660e\u5bf9\u4e8e\u4efb\u4f55\u7b97\u6cd5\u6027\u4efb\u52a1\uff0c\u957f\u5ea6\u6cdb\u5316\u90fd\u662f\u53ef\u80fd\u7684\u3002\u6700\u540e\uff0c\u6211\u4eec\u7406\u8bba\u8bc1\u660eTransformer\u80fd\u591f\u5b9e\u73b0Turing\u7a0b\u5e8f\uff0c\u6784\u9020\u4e86\u4e00\u4e2a\u7b80\u5355\u7684RASP\uff08Weiss\u7b49\u4eba\uff09\u7a0b\u5e8f\uff0c\u5b83\u6a21\u62df\u4efb\u610f\u56fe\u7075\u673a\
u3002|\n", "2407.03286": "|**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|## \u80cc\u666f \u534a\u7ed3\u6784\u5316\u6570\u636e\u683c\u5f0f\u5982JSON\u56e0\u5176\u5728\u5b58\u50a8\u6570\u636e\u65f6\u7684\u7075\u6d3b\u6027\u800c\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0cJSON\u6570\u636e\u901a\u5e38\u7f3a\u4e4f\u4e0e\u5173\u7cfb\u6570\u636e\u5e93\u4e2d\u7684\u8868\u5355\u7ed3\u6784\u76f8\u5bf9\u5e94\u7684\u89c4\u8303\uff08schema\uff09\u3002\u56e0\u6b64\uff0c\u51fa\u73b0\u4e86\u8bb8\u591a\u4ece\u6570\u636e\u96c6\u4e2d\u53d1\u73b0\u89c4\u8303\u7684\u5de5\u5177\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5de5\u5177\u5f88\u6709\u7528\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u6587\u6863\u7684\u8bed\u6cd5\uff0c\u800c\u5ffd\u89c6\u4e86\u8bed\u4e49\u4fe1\u606f\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u5982\u4f55\u81ea\u52a8\u4e3a\u53d1\u73b0\u7684\u89c4\u8303\u6dfb\u52a0\u6709\u610f\u4e49\u7684\u8bed\u4e49\u4fe1\u606f\uff0c\u4f7f\u5176\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u4f5c\u8005\u7f16\u5199\u7684\u89c4\u8303\u4e2d\u6240\u5305\u542b\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u4eba\u5de5\u7f16\u5199\u7684JSON Schema\u6587\u6863\u5e93\uff0c\u751f\u6210\u5143\u7d20\u7684\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u3001\u53ef\u91cd\u7528\u5b9a\u4e49\u7684\u6709\u610f\u4e49\u540d\u79f0\uff0c\u5e76\u8bc6\u522b\u51fa\u54ea\u4e9b\u53d1\u73b0\u7684\u5c5e\u6027\u6700\u6709\u7528\uff0c\u54ea\u4e9b\u53ef\u4ee5\u89c6\u4e3a\u201c\u566a\u58f0\u201d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5148\u524d\u5df2\u8bc1\u660e\u4e0e\u4eba\u7c7b\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u7684\u6587\u672c\u751f\u6210\u6307\u6807\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|null|## 
\u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7b\u89c9\u95ee\u9898\u4e25\u91cd\u5236\u7ea6\u4e86\u5b83\u4eec\u7684\u53ef\u9760\u6027\u548c\u53ef\u4fe1\u5ea6\u3002\u4eba\u7c7b\u5177\u6709\u81ea\u6211\u610f\u8bc6\u8fc7\u7a0b\uff0c\u80fd\u8bc6\u522b\u9762\u5bf9\u67e5\u8be2\u65f6\u7684\u672a\u77e5\u9886\u57df\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u8bba\u6587\u7814\u7a76\u4e86LLMs\u80fd\u5426\u5728\u751f\u6210\u54cd\u5e94\u4e4b\u524d\u81ea\u884c\u8bc4\u4f30\u5176\u5e7b\u89c9\u98ce\u9669\u3002\u6211\u4eec\u4ece\u8bad\u7ec3\u6570\u636e\u6e90\u548c15\u4e2a\u4e0d\u540c\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u89d2\u5ea6\u5e7f\u6cdb\u5206\u6790LLMs\u7684\u5185\u90e8\u673a\u5236\uff0c\u8fd9\u4e9b\u4efb\u52a1\u6db5\u76d6\u4e86\u8d85\u8fc7700\u4e2a\u6570\u636e\u96c6\u3002\u5b9e\u8bc1\u5206\u6790\u63ed\u793a\u4e86\u4e24\u4e2a\u5173\u952e\u53d1\u73b0\uff1a(1) LLM\u7684\u5185\u90e8\u72b6\u6001\u80fd\u591f\u6307\u793a\u5b83\u4eec\u662f\u5426\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u89c1\u8fc7\u67e5\u8be2\uff1b(2) LLM\u7684\u5185\u90e8\u72b6\u6001\u663e\u793a\u51fa\u5b83\u4eec\u5bf9\u67e5\u8be2\u53ef\u80fd\u4ea7\u751f\u5e7b\u89c9\u6216\u4e0d\u4ea7\u751f\u5e7b\u89c9\u7684\u98ce\u9669\u3002\u6211\u4eec\u7684\u7814\u7a76\u5173\u6ce8\u7279\u5b9a\u7684\u795e\u7ecf\u5143\u3001\u6fc0\u6d3b\u5c42\u548c\u4ee4\u724c\uff0c\u8fd9\u4e9b\u5728LLM\u5bf9\u4e0d\u786e\u5b9a\u6027\u548c\u5e7b\u89c9\u98ce\u9669\u7684\u8ba4\u8bc6\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\u3002\u901a\u8fc7\u4e00\u79cd\u63a2\u67e5\u4f30\u8ba1\u7b97\u6cd5\uff0c\u6211\u4eec\u5229\u7528LLM\u7684\u81ea\u6211\u8bc4\u4f30\u80fd\u529b\uff0c\u5728\u8fd0\u884c\u65f6\u5b9e\u73b0\u4e86\u5e73\u574784.32%\u7684\u5e7b\u89c9\u4f30\u8ba1\u51c6\u786e\u7387\u3002|\n", "2407.03227": "|**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning**|Zhili Shen 
et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|\u6211\u4eec\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u89d2\u5ea6\u63a2\u8ba8\u6587\u672c\u5230SQL\u7684\u8bed\u4e49\u89e3\u6790\u3002\u9274\u4e8e\u5546\u4e1a\u6570\u636e\u5e93\u6a21\u5f0f\u7684\u89c4\u6a21\u6311\u6218\u548c\u4e1a\u52a1\u667a\u80fd\u89e3\u51b3\u65b9\u6848\u7684\u90e8\u7f72\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5b83\u52a8\u6001\u83b7\u53d6\u8f93\u5165\u6570\u636e\u5e93\u4fe1\u606f\uff0c\u5e76\u5229\u7528\u62bd\u8c61\u8bed\u6cd5\u6811\u9009\u62e9\u5c11\u91cf\u793a\u4f8b\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u5982\u4f55\u5229\u7528\u5e76\u884c\u8bed\u4e49\u89e3\u6790\u5668\u751f\u6210SQL\u67e5\u8be2\u7684\u8fd1\u4f3c\u7248\u672c\uff0c\u4ee5\u652f\u6301\u6211\u4eec\u7684\u68c0\u7d22\u3002\u6211\u4eec\u751a\u81f3\u5c06\u8fd9\u79cd\u65b9\u6cd5\u63a8\u5411\u6781\u81f4\uff0c\u91c7\u7528\u4e0d\u52305\u4ebf\u53c2\u6570\u7684\u6a21\u578b\u4f5c\u4e3a\u9ad8\u6548\u8fd1\u4f3c\u5668\uff0c\u5e76\u8d4b\u4e88\u5176\u5e76\u884c\u5904\u7406\u6a21\u5f0f\u7684\u80fd\u529b\u3002\u6211\u4eec\u5728\u5355\u8bed\u548c\u8de8\u8bed\u8a00\u7684\u8bed\u4e49\u89e3\u6790\u57fa\u51c6\u4e0a\u5e94\u7528\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u7ed3\u679c\u4f18\u4e8e\u73b0\u6709\u6700\u4f73\u57fa\u7ebf\u3002\u5168\u9762\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u8fd9\u79cd\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u8bbe\u7f6e\u4e2d\u5404\u4e2a\u6a21\u5757\u7684\u8d21\u732e\uff0c\u4e3a\u672a\u6765\u5de5\u4f5c\u6307\u660e\u4e86\u6709\u8da3\u7684\u65b9\u5411\u3002|\n", "2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|## \u80cc\u666f 
\u91cf\u5316\u6280\u672f\u5728\u63d0\u5347\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u901f\u5ea6\u548c\u90e8\u7f72\u6548\u7387\u65b9\u9762\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u5c3d\u7ba1\u6709\u5927\u91cf\u7684\u7814\u7a76\u5173\u6ce8\u4e86\u91cf\u5316\u540e\u7684\u82f1\u8bed\u4efb\u52a1\u6a21\u578b\u6548\u679c\uff0c\u4f46\u5c1a\u65e0\u7814\u7a76\u9488\u5bf9\u591a\u8bed\u8a00\u573a\u666f\u3002\u6211\u4eec\u5bf9\u91cf\u5316\u591a\u8bed\u8a00LLM\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u91cd\u70b9\u5173\u6ce8\u5176\u8de8\u8bed\u8a00\u6027\u80fd\u53ca\u4e0d\u540c\u89c4\u6a21\u4e0b\u7684\u8868\u73b0\u3002\u6211\u4eec\u91c7\u7528\u81ea\u52a8\u57fa\u51c6\u6d4b\u8bd5\u3001LLM\u4f5c\u4e3a\u8bc4\u5224\u8005\u7684\u65b9\u6cd5\u4ee5\u53ca\u4eba\u7c7b\u8bc4\u4f30\uff0c\u53d1\u73b0\u4ee5\u4e0b\u51e0\u70b9\uff1a(1) \u91cf\u5316\u5bf9\u4eba\u7c7b\u8bc4\u4ef7\u7684\u5f71\u54cd\u662f\u8d1f\u9762\u7684\uff0c\u4e14\u81ea\u52a8\u6307\u6807\u4e25\u91cd\u4f4e\u4f30\u4e86\u8fd9\u79cd\u635f\u5bb3\uff1a\u81ea\u52a8\u4efb\u52a1\u4e2d\u5e73\u57471.7%\u7684\u6027\u80fd\u4e0b\u964d\u5bf9\u5e94\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u65e5\u672c\u4efb\u52a1\u768416.0%\u663e\u8457\u4e0b\u6ed1\uff1b(2) \u4e0d\u540c\u8bed\u8a00\u53d7\u5230\u91cf\u5316\u7684\u5f71\u54cd\u7a0b\u5ea6\u4e0d\u5747\uff0c\u975e\u62c9\u4e01\u5b57\u6bcd\u4f53\u7cfb\u7684\u8bed\u8a00\u53d7\u5f71\u54cd\u6700\u4e25\u91cd\uff1b(3) \u6bd4\u5982\u6570\u5b66\u63a8\u7406\u8fd9\u7c7b\u6311\u6218\u6027\u4efb\u52a1\uff0c\u5176\u6027\u80fd\u4e0b\u964d\u6700\u4e3a\u663e\u8457\u3002\u968f\u7740\u4f4e\u529f\u8017\u6a21\u578b\u670d\u52a1\u4e8e\u5168\u7403NLP\u6280\u672f\u7684\u666e\u53ca\u53d8\u5f97\u81f3\u5173\u91cd\u8981\uff0c\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5728\u8bc4\u4f30\u9ad8\u6548\u6a21\u578b\u65f6\uff0c\u591a\u8bed\u8a00\u6027\u80fd\u5e94\u4f5c\u4e3a\u5173\u952e\u6307\u6807\u3002|\n", "2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 
Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|**### \u7ffb\u8bd1 \u5728\u6570\u5b66\u8bc1\u660e\u7684\u8ba1\u7b97\u673a\u53ef\u9a8c\u8bc1\u5f62\u5f0f\u8bed\u8a00\uff08\u5982Lean\uff09\u9a8c\u8bc1\u4e2d\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u57fa\u4e8e\u81ea\u7136\u8bed\u8a00\uff08NL\uff09\u7684\u8bc1\u660e\u65b9\u6cd5\u5177\u6709\u91cd\u8981\u5f71\u54cd\u3002\u7136\u800c\uff0c\u7531\u4e8eNL\u4e0e\u5f62\u5f0f\u8bed\u8a00\uff08FL\uff09\u7684\u8bc1\u660e\u6570\u636e\u7a00\u7f3a\uff0c\u73b0\u4ee3LLMs\u5728\u751f\u6210\u5b8c\u6574\u8bc1\u660e\u65b9\u9762\u7684\u6027\u80fd\u6b20\u4f73\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a**TheoremLlama**\u7684\u7aef\u5230\u7aef\u6846\u67b6\uff0c\u65e8\u5728\u8bad\u7ec3\u901a\u7528LLM\u6210\u4e3aLean4\u4e13\u5bb6\u3002\u8be5\u6846\u67b6\u5305\u62ecNL-FL\u5bf9\u9f50\u6570\u636e\u96c6\u751f\u6210\u65b9\u6cd5\u3001LLM\u5f62\u5f0f\u5b9a\u7406\u8bc1\u660e\u5668\u7684\u8bad\u7ec3\u7b56\u7565\u4ee5\u53caLLM\u5728\u64b0\u5199Lean4\u8bc1\u660e\u4e2d\u7684\u6280\u672f\u3002 \u5173\u952e\u521b\u65b0\u5728\u4e8e\u6211\u4eec\u5f00\u53d1\u4e86NL-FL\u81ea\u4e3e\u65b9\u6cd5\uff0c\u5373\u5c06NL\u8bc1\u660e\u878d\u5165Lean4\u4ee3\u7801\uff0c\u5229\u7528LLMs\u7684\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u6b63\u5f0f\u63a8\u7406\u3002\u901a\u8fc7\u8fd9\u79cd\u6570\u636e\u96c6\u751f\u6210\u65b9\u5f0f\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86**Open Bootstrapped 
Theorems** (OBT), an aligned and bootstrapped NL-FL dataset. The **TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and Test sets, respectively, surpassing the GPT-4 baselines of 22.95% and 25.41%. We have released the model checkpoints and the generated dataset, and will open-source all code soon.**|\n", "2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|**This study proposes a novel method called Divergent CoT (DCoT), which further improves performance by requiring the model to compare multiple reasoning chains within a single inference step. The method improves performance even when instruction-tuning smaller, more accessible large language models. Extensive experiments on different types of reasoning tasks show that fine-tuning on the DCoT dataset consistently outperforms the basic CoT baseline across model scales (from 1.3B to 7B parameters). Experiments and human evaluation indicate that these gains come from the model generating multiple divergent reasoning paths within a single inference, which suggests that language models can self-correct. The code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.**|\n", "2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|Large language models (LLMs), with their strong reasoning, generalization, and fluency across domains, show great potential for improving speech-related tasks. This paper studies how to integrate decoder-only LLMs into speech-to-text translation (S2TT). We propose an architecture in which the LLM directly processes encoded speech representations and generates the text translation. We also study the effects of different parameter-efficient fine-tuning techniques and task formulations. Without using proprietary data, our model achieves state-of-the-art performance on the CoVoST 2 and FLEURS benchmarks. We further conduct in-depth analyses that validate our design choices and provide insights into integrating LLMs with S2TT.|\n", "2407.03160": "|**2024-07-03**|**SOS! Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are increasingly popular with the public and in industry because they are customizable, fine-tunable, and free to use. However, some open-source LLMs require approval before use, which has led third parties to release easily accessible versions, some of them fine-tuned or quantized to lower computational requirements. These convenient versions are attractive to users but also increase the risk of training-time attacks that threaten the integrity and security of LLMs. This paper proposes SOS, a novel training-time attack designed to have low computational demands, require neither clean data nor modification of the model weights, and preserve the model's utility. SOS covers security issues in various scenarios, including backdoor, jailbreak, and prompt stealing attacks. Experimental results show that the attack is effective against all evaluated targets. We also present the other side of the SOS technique, the copyright token: a novel method that lets users mark their copyrighted content and prevent models from using it.|\n", "2407.03157": 
"|**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|In this work, we investigate a common scenario in code generation: a developer edits existing code in real time and asks a code LLM to immediately re-predict the next token or the next line. The naive approach has the LLM re-encode the entire key-value (KV) cache to obtain accurate predictions, but this is computationally expensive, especially for long sequences. Encoding only the edited subsequence and merging it into the original KV cache instead causes temporal confusion and significantly degrades performance. To address this, we propose \\textbf{Positional Integrity Encoding} (PIE). Built on rotary positional encoding, PIE first removes the rotary matrices that introduce temporal confusion and then reapplies the correct ones. This restores the correct positional relationships between tokens with a single round of matrix multiplication. We conduct extensive experiments on the RepoBench-C-8k dataset using DeepSeek-Coder models with 1.3B, 6.7B, and 33B parameters, covering three real-world coding tasks: code insertion, code deletion, and multi-place code editing. Results show that, compared with the standard full-recomputation approach, PIE reduces computational overhead by more than 85% across all model sizes and tasks while closely approximating its performance.|\n"}}
\ No newline at end of file