Add access log json design doc #15
Conversation
Force-pushed from df1d839 to 8e4308d.
Initial questions
```
etime: ####,
notes: "",
tag: ####,
msg: "",
```
The RESULT may be extended by new logging (as described here), for example `sid="session_1234"`. Should it be part of a new field (`sid`), an extended field, or something else?
I saw your design; yes, this will have to be updated for those new key/value pairs. So to answer your question, yes, we should add new fields and not extend existing ones.
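For illustration only, a sketch of a RESULT entry carrying the session identifier as its own field (the surrounding fields follow the draft's placeholder style; the sid value is invented):

```
etime: ####,
sid: "session_1234",
tag: ####,
msg: "",
```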
When switching to a new logging format, the current log will be rotated.

You can also customize the "local_time" format using strftime conversion specifications. The default would be **%FT%TZ**.
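For example, with the default **%FT%TZ** specification a timestamp renders like the first line below, while a custom specification such as %d/%b/%Y:%H:%M:%S changes the layout accordingly (values invented for illustration):

```
local_time: "2024-01-15T10:30:45Z",    // default: %FT%TZ
local_time: "15/Jan/2024:10:30:45",    // custom:  %d/%b/%Y:%H:%M:%S
```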
Do you plan a tool to convert json <--> default, for compatibility with tools like logconv.pl?
I'd like to get this done before writing the new logconv, as logconv will need to support both formats.
And no, I don't see JSON being the default yet, as I think it's too late for this to land downstream.
There is already work underway to rewrite logconv.pl in Python. AFAIK it is only based on the standard format, @jchapma?
Yes, we have started work on logconv.py; we are currently finishing off verbose mode processing. The initial release will just replicate the functionality and output format of logconv.pl.
Force-pushed from 8e4308d to fc09f57.
Also, there is a header in the access log that could also be parsed.
Force-pushed from fc09f57 to f4966ec.
Force-pushed from f4966ec to a181db1.
Another idea: since we have JSON, would it make sense to have nicely grouped data? Something like this:

```
"operation": {
    "type": "BIND",
    "id": 1234,
    "method": "SIMPLE"
},
"connection": {
    "id": 5678,
    "client": {
        "ip": "192.168.1.100"
    },
    "server": {
        "ip": "10.0.0.1"
    }
},
"log": {
    "level": "INFO"
},
"result": {
    "code": 0,
    "message": "Authentication successful"
}
```

This structure provides better organization of the data, and it aligns with common log management tools.
```
{
    // Header
    version: "",
```
What is the `version` field? Is it the 389 DS version? If so, should we also have something like `format_version`, which could make future format changes easier to manage?
> What is the `version` field? Is it the 389 DS version? If so, should we also have something like `format_version`, which could make future format changes easier to manage?
This was to address Thierry's point about the header that the current logs have:
```
389-Directory/3.1.1 DEVELOPER BUILD B0000.000.0000
localhost.localdomain:636 (/etc/dirsrv/slapd-localhost)
```
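As a sketch of how the two could coexist in the header (format_version is hypothetical, not in the draft):

```
// Header
version: "389-Directory/3.1.1 DEVELOPER BUILD B0000.000.0000",
format_version: 1,
```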
Added a new configuration setting for access logging under **cn=config**

```
nsslapd-accesslog-json-format: default | json | json-pretty
```
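For example, an admin could switch to JSON output with a standard ldapmodify (a sketch; the attribute name is taken from this draft and may still change):

```
ldapmodify -D "cn=Directory Manager" -W <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-accesslog-json-format
nsslapd-accesslog-json-format: json
EOF
```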
I wonder if we should consider adding an option for selective JSON logging, allowing admins to choose which operations or fields are included in the JSON format (or an exclude list?). But it might be too much to handle (and not very needed?). I don't have a hard opinion here.
Do you mean filtering the log? Or a log with JSON and non-JSON entries mixed in it?
Personally, I don't think we should filter access log content, and I don't think we should mix JSON with non-JSON logging.
I mean filtering the log entries before printing them, so as to include some fields or not. That way log entries might look cleaner and consume less storage.
But I'm also not sure if it's worth it... Just an idea to consider here :)
> I mean filtering the log entries before printing them, so as to include some fields or not. That way log entries might look cleaner and consume less storage. But I'm also not sure if it's worth it... Just an idea to consider here :)
Well, it's only going to write keys that have values. If you want to filter out keys, it would not be hard to add that later, but it could have other side effects. If we allow setting filters on the fly, then we need a lock, and that can hurt performance. We could require a restart instead, but at this point I don't know if we need this. Again, it could be added later.
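To illustrate the "only keys with values" point (a sketch with invented values), an entry with no notes simply omits that key rather than printing an empty string:

```
{ op: "RESULT", etime: 0.001200, notes: "P" }
{ op: "RESULT", etime: 0.001200 }    // the notes key is simply absent
```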
```
local_time: <strftime output - customizable>,
gm_time: <gm time - uses a fixed format of **%FT%TZ**>,
conn_id: ####,
op_id: ####,
```
I'm not sure if it's already here, but I think we need a unique identifier for each log entry to help with deduplication and tracking, probably by combining a few fields.
Do you mean a single header? Or a unique value in every single log event? You could argue the timestamp is unique. I guess I'm not sure what you are trying to solve here.
I think the timestamp can be the same (though very rarely) for some concurrent events.
For log consumption, most log processing tools benefit from being able to distinguish events by just one field, a unique key.
It's a safe and future-proof option to add one, IMO...
I agree. I have recently noticed that opid+connid is not sufficient to identify all events associated with a given operation (as different connections may have the same connid). We probably need a 64-bit counter, or to combine the operation start timestamp + connid + opid. In any case, having a unique operation identifier will be helpful.
> I agree. I have recently noticed that opid+connid is not sufficient to identify all events associated with a given operation (as different connections may have the same connid). We probably need a 64-bit counter, or to combine the operation start timestamp + connid + opid. In any case, having a unique operation identifier will be helpful.
Sounds good, I'll add some kind of unique key for each connection and its ops.
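A sketch of what such a key might look like, combining the connection start timestamp with conn_id and op_id (the field name key and its layout are hypothetical, values invented):

```
conn_id: 5678,
op_id: 3,
key: "1705312245-5678-3",    // <conn start epoch>-<conn_id>-<op_id>
```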
```
// The rest of the log
op: (
    ABANDON, ADD, AUTOBIND, BIND/UNBIND, CLOSE, CMP, CONN, DEL, EXT,
    MOD, MODRDN, SEARCH, SORT, STAT, RESULT, VLV
```
I wonder if we would want to add a generic CTRL op to log incoming controls, with the OID and the criticality flag. (I am more dubious about logging the control data, because it may contain secrets; that said, we could have a whitelist of controls whose data is safe to log.)
And maybe another op for HTTP PROXY operations?
About controls: maybe a field with a sequence of "control" objects, each containing the criticality, the OID, and an optional string representing the data (probably cheaper than adding a new record for each control).
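A sketch of that sequence-of-controls field (the OIDs below are the paged-results and VLV request controls; everything here is illustrative, not part of the draft):

```
controls: [
    { oid: "1.2.840.113556.1.4.319", criticality: true },
    { oid: "2.16.840.1.113730.3.4.9", criticality: false, data: "..." }
],
```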
And reuse the same field/object for the response controls in RESULT records.
If that is the case, we may remove the need for the VLV/SORT tags, since the information will be provided by the generic format.
Sorry, I'm not sure I'm following your idea. Controls could be added to the operations they were attached to; I don't see how a control would be its own log "event". Sorry, I think I am just misunderstanding your suggestion.
...
I'm actually concerned this might overly complicate it. There will be overlap: a new connection, versus BIND, versus MOD, etc. The connection object/sub-key will be used very differently between some of these operations. In some cases the "connection" object should list the bind DN, but not for BIND operations. There are going to be corner cases like these that add complexity. While it does look nice, I think it might be complex/inconsistent across all operations/events. It will also add to the size of the logs, but that's not as important.

Your idea is similar to how I had originally wanted to do the audit log JSON format, but @progier389 suggested that once you know the "op" it doesn't really matter if you have these sub-keys or not. The parser still needs to know what to look for, and I do like the single layer of the event. I have to think about this. Other comments are welcome.
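For comparison, a flat single-layer equivalent of the grouped example above might look like this (field names are illustrative, not from the draft):

```
op: "BIND",
op_id: 1234,
method: "SIMPLE",
conn_id: 5678,
client_ip: "192.168.1.100",
server_ip: "10.0.0.1",
level: "INFO",
err: 0,
msg: "Authentication successful",
```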