C++ input_file support utf16 #1212
Merged
The current changes are based on logtail 1.6. They add support for UTF-16LE and UTF-16BE, on both Windows and Linux.
Test script:
# coding=utf-8
import os
import codecs
import shutil
import time

mode = "le"      # UTF-16 byte order: "le" or "be"
log = "Json"     # log format to generate: "Reg" or "Json"
nums = 1000000   # number of log entries to append
source_file = 'C:\\Users\\Administrator\\ilogtail\\bin\\log\\utf16'+mode+'.log'
target_file = 'C:\\Users\\Administrator\\ilogtail\\bin\\log\\utf16'+log+'.log'
# Start from a copy of the existing UTF-16 sample file, then give the collector time to pick it up.
shutil.copy2(source_file, target_file)
time.sleep(3)
print("start")
if log == "Reg":
    for i in range(0, nums):
        # Multi-line entry (stack trace plus Chinese text) for the regex multi-line case.
        log_message = """[2022-07-07T10:43:27.360266763] [INFO] java.lang.Exception: exception happened
"""+str(i)+""" at com.aliyun.sls.devops.logGenerator.type.RegexMultiLog.f2(RegexMultiLog.java:108)
at java.base/java.lang.Thread.run(Thread.java:833)
日志采集是整个日志基础设施中最基础最关键的组件之一,影响着企业内部数据的完整性以及实时性。采集器作为数据链路的前置环节,其可靠性、扩展性、灵活性以及资源(CPU 和内存)消耗等,往往是最被关注的核心技术点。目前开源的日志采集器比较多。各采集器官网上关于其产品特性的描述也都比较相似,基本上都包括日志搜集、转换、路由等功能,并且无一例外都会突出其为高性能而设计。如果单纯看产品文档,其实很难在前面提到的核心技术点上得出有区分度的结论,若直接在生产环境上使用,则无疑是高压线上走钢丝。
我所在的公司作为一家通信与信息服务类公司,线上存在海量日志采集的场景,对于采集效率要求极高。前段时间阿里将内部大规模部署的采集引擎 ilogtail 对外开源,其列举的性能数据和技术细节吸引了我的注意。但是如果在外部社区使用,其具体的性能数据如何。本文将 ilogtail 与其他四款广泛使用的日志采集器:filebeat(go 语言)、vector(rust 语言)、fluent-bit(c 语言)、rsyslog(c 语言)进行对比测试,重点关注他们在可靠性、采集、转换性能、以及功能上的差异。
""" + "\n"
        file_name = target_file
        if os.path.exists(file_name):
            # Append raw UTF-16 bytes (no BOM) to the existing file.
            with open(file_name, 'ab') as f:
                f.write(log_message.encode('utf-16-'+mode))
        else:
            with codecs.open(file_name, 'w', 'utf-16-'+mode) as f:
                f.write(log_message)
if log == "Json":
    for i in range(0, nums):
        # Single-line JSON entry.
        log_message = """{"url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1", "ip": "10.200.98.220", "user-agent": "aliyun-sdk-java", "request": {"status": "200", "latency": "18204"}, "time": "07/Jul/2022:10:30:28"}""" + "\n"
        file_name = target_file
        if os.path.exists(file_name):
            # Append raw UTF-16 bytes (no BOM) to the existing file.
            with open(file_name, 'ab') as f:
                f.write(log_message.encode('utf-16-'+mode))
        else:
            with codecs.open(file_name, 'w', 'utf-16-'+mode) as f:
                f.write(log_message)
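To check what a script like the one above actually wrote, it can help to read the file back and compare the decoded lines with what was sent. The following is only a minimal sketch, not part of this PR: `append_utf16_lines` and `read_utf16_lines` are hypothetical helper names, and it assumes the file starts with a byte order mark (BOM). Whether the sample files used in the PR carry a BOM is not stated.

```python
import codecs
import os


def append_utf16_lines(path, lines, mode="le"):
    """Append each line to path, encoded as UTF-16 with the given byte order.

    Hypothetical helper: `mode` is "le" or "be", as in the test script.
    """
    encoding = "utf-16-" + mode
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "ab") as f:
        if is_new:
            # Write the BOM once, at the start of a new or empty file.
            f.write(codecs.BOM_UTF16_LE if mode == "le" else codecs.BOM_UTF16_BE)
        for line in lines:
            # The explicit -le/-be codecs do not emit a BOM of their own.
            f.write((line + "\n").encode(encoding))


def read_utf16_lines(path):
    """Decode the whole file, taking the byte order from its BOM, and return its lines."""
    with open(path, "rb") as f:
        data = f.read()
    # The generic "utf-16" codec consumes the BOM and selects LE or BE from it.
    return data.decode("utf-16").splitlines()
```

Calling `append_utf16_lines(path, ['{"n": 1}'], "be")` and then `read_utf16_lines(path)` should return the same strings. A mismatch would point to a byte-order or BOM problem in the generated file rather than in the collector.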
messixukejia reviewed Nov 6, 2023
yyuuttaaoo requested changes Nov 6, 2023
yyuuttaaoo reviewed Nov 6, 2023