C++ input_file support utf16 #1212

Merged: 21 commits merged into alibaba:1.6 on Nov 7, 2023

Conversation

@quzard (Collaborator) commented Nov 1, 2023

No description provided.

@quzard (Collaborator, Author) commented Nov 1, 2023

The current changes are based on logtail 1.6. They support UTF-16 LE and UTF-16 BE, on both Windows and Linux.
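For readers unfamiliar with the two byte orders, the following is a minimal Python sketch of how a reader could tell UTF-16 LE from UTF-16 BE by the byte order mark (BOM) at the start of a file. It is an illustration only, not the logic in this PR's C++ code; the function names `detect_utf16_byte_order` and `decode_utf16` and the fallback to a configured default order are assumptions made for the example.

```python
import codecs
from typing import Optional


def detect_utf16_byte_order(head: bytes) -> Optional[str]:
    """Return 'le' or 'be' if `head` starts with a UTF-16 BOM, else None.

    Note: a UTF-32 LE BOM (FF FE 00 00) also starts with FF FE; this
    sketch assumes the input is known to be UTF-16.
    """
    if head.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "le"
    if head.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "be"
    return None


def decode_utf16(data: bytes, default_order: str = "le") -> str:
    """Decode UTF-16 bytes, using the BOM if present, else `default_order`."""
    order = detect_utf16_byte_order(data[:2])
    if order is None:
        order = default_order   # hypothetical fallback: a configured byte order
    else:
        data = data[2:]         # strip the BOM so it does not appear in the text
    return data.decode("utf-16-" + order)
```

A file without a BOM cannot be classified this way, which is why the sketch takes an explicit default.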

@quzard quzard changed the base branch from main to 1.6 November 2, 2023 07:40
@quzard quzard changed the title from "C++ input_file support utf16" to "C++ input_file support utf16_LE" Nov 4, 2023
@quzard (Collaborator, Author) commented Nov 4, 2023

Test script

# coding=utf-8
import os
import codecs
import shutil
import time

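mode = "le"      # UTF-16 byte order to test: "le" or "be"
log = "Json"     # log format to generate: "Reg" (multi-line) or "Json"
nums = 1000000   # number of log entries to append

# Seed the target file from an existing UTF-16 sample, then pause before appending.
source_file = 'C:\\Users\\Administrator\\ilogtail\\bin\\log\\utf16'+mode+'.log'
target_file = 'C:\\Users\\Administrator\\ilogtail\\bin\\log\\utf16'+log+'.log'
shutil.copy2(source_file, target_file)
time.sleep(3)
print("start")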


if log == "Reg":
    for i in range(0, nums):
        log_message = """[2022-07-07T10:43:27.360266763] [INFO] java.lang.Exception: exception happened
    """+str(i)+"""        at com.aliyun.sls.devops.logGenerator.type.RegexMultiLog.f2(RegexMultiLog.java:108)
        at java.base/java.lang.Thread.run(Thread.java:833)
    日志采集是整个日志基础设施中最基础最关键的组件之一,影响着企业内部数据的完整性以及实时性。采集器作为数据链路的前置环节,其可靠性、扩展性、灵活性以及资源(CPU 和内存)消耗等,往往是最被关注的核心技术点。目前开源的日志采集器比较多。各采集器官网上关于其产品特性的描述也都比较相似,基本上都包括日志搜集、转换、路由等功能,并且无一例外都会突出其为高性能而设计。如果单纯看产品文档,其实很难在前面提到的核心技术点上得出有区分度的结论,若直接在生产环境上使用,则无疑是高压线上走钢丝。

    我所在的公司作为一家通信与信息服务类公司,线上存在海量日志采集的场景,对于采集效率要求极高。前段时间阿里将内部大规模部署的采集引擎 ilogtail 对外开源,其列举的性能数据和技术细节吸引了我的注意。但是如果在外部社区使用,其具体的性能数据如何。本文将 ilogtail 与其他四款广泛使用的日志采集器:filebeat(go 语言)、vector(rust 语言)、fluent-bit(c 语言)、rsyslog(c 语言)进行对比测试,重点关注他们在可靠性、采集、转换性能、以及功能上的差异。
        """ + "\n"

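        file_name = target_file

        # Append raw UTF-16 bytes if the file already exists; otherwise create it.
        # Neither path writes a BOM, since the byte order is named explicitly.
        if os.path.exists(file_name):
            with open(file_name, 'ab') as f:
                f.write(log_message.encode('utf-16-'+mode))
        else:
            with codecs.open(file_name, 'w', 'utf-16-'+mode) as f:
                f.write(log_message)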
if log == "Json":
    for i in range(0, nums):
        log_message = """{"url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1", "ip": "10.200.98.220", "user-agent": "aliyun-sdk-java", "request": {"status": "200", "latency": "18204"}, "time": "07/Jul/2022:10:30:28"}""" + "\n"

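        file_name = target_file

        # Append raw UTF-16 bytes if the file already exists; otherwise create it.
        # Neither path writes a BOM, since the byte order is named explicitly.
        if os.path.exists(file_name):
            with open(file_name, 'ab') as f:
                f.write(log_message.encode('utf-16-'+mode))
        else:
            with codecs.open(file_name, 'w', 'utf-16-'+mode) as f:
                f.write(log_message)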
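
To check the file produced by a script like the one above, something along these lines could be used. This is a sketch and not part of the PR: the helper name `count_utf16_lines` and its `chunk_size` parameter are assumptions. It decodes the file incrementally, so a large file is never held in memory, and it raises `UnicodeDecodeError` if the file ends in the middle of a code unit.

```python
import codecs


def count_utf16_lines(path, mode="le", chunk_size=64 * 1024):
    """Count '\n' characters in a UTF-16 file of the given byte order.

    `mode` is "le" or "be", matching the `mode` variable in the test script.
    """
    decoder = codecs.getincrementaldecoder("utf-16-" + mode)()
    lines = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # The incremental decoder buffers a trailing partial code unit,
            # so chunk boundaries need not fall on 2-byte boundaries.
            lines += decoder.decode(chunk).count("\n")
    # Flush: raises UnicodeDecodeError if leftover bytes form a truncated unit.
    decoder.decode(b"", final=True)
    return lines
```

In "Json" mode each entry is a single line, so the result should equal the number of entries appended plus any lines already in the seed file. In "Reg" mode each entry spans several lines, so the count is a multiple of the entry count.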

@quzard quzard changed the title from "C++ input_file support utf16_LE" to "C++ input_file support utf16" Nov 4, 2023
Review threads:
core/common/EncodingConverter.cpp (resolved)
core/common/EncodingConverter.cpp (outdated, resolved)
core/reader/LogFileReader.cpp (outdated, resolved)
core/reader/LogFileReader.cpp (outdated, resolved)
core/reader/LogFileReader.cpp (resolved)
core/reader/LogFileReader.cpp (outdated, resolved)
@yyuuttaaoo yyuuttaaoo merged commit db1cc6b into alibaba:1.6 Nov 7, 2023
20 checks passed
@yyuuttaaoo yyuuttaaoo added the "enhancement" (Feature enhancement) label Nov 8, 2023
@yyuuttaaoo yyuuttaaoo added this to the v1.8 milestone Nov 8, 2023
@quzard quzard deleted the support-utf16 branch February 28, 2024 02:29
3 participants