
Add access log json design doc #15

Open
wants to merge 1 commit into main
Conversation

mreynolds389
Contributor

No description provided.

Contributor

@tbordaz left a comment


Initial questions

etime: ####,
notes: "",
tag: ####,
msg: "",
Contributor


The RESULT may be extended by new logging (like described here). For example 'sid="session_1234"'. Should it be part of a new field (sid), an extended field, or something else?

Contributor Author

@mreynolds389 Oct 21, 2024


I saw your design; yes, this will have to be updated for those new key/value pairs. So to answer your question: yes, we should add new fields and not extend existing ones.


When switching to a new logging format, the current log will be rotated.

You can also customize the "local_time" format using strftime conversion specifications. The default would be **%FT%TZ**.
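For illustration, **%FT%TZ** is C-library shorthand for an ISO 8601 UTC timestamp. A minimal sketch in Python (the expanded `%Y-%m-%dT%H:%M:%SZ` form is used because `%F`/`%T` are platform extensions; the function name is illustrative, not part of the design):

```python
import time

# %FT%TZ expands to %Y-%m-%dT%H:%M:%SZ, i.e. ISO 8601 in UTC.
# The long form is spelled out here for portability, since %F and %T
# are C-library extensions not available everywhere.
def format_gm_time(epoch_seconds):
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch_seconds))
```

So the epoch, for example, renders as `1970-01-01T00:00:00Z`.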
Contributor


Do you plan a tool to convert json <--> default,
for compatibility with tools like logconv.pl?

Contributor Author


I'd like to get this done before writing the new logconv, as logconv will need to support both formats.

Contributor Author

@mreynolds389 Oct 21, 2024


And no, I don't see JSON being the default yet, as I think it's too late for this to land downstream.

Contributor


There is already work started to rewrite logconv.pl in Python. AFAIK it is only based on the standard format, @jchapma?

Contributor


Yes we have started work on logconv.py, currently finishing off verbose mode processing. The initial release will just replicate the functionality and output format of logconv.pl.

Contributor

@tbordaz commented Oct 21, 2024

Also, there is a header in the access log that could also be parsed:

        389-Directory/3.1.1.2 B2024.285.0000
        localhost.localdomain:63601 (/etc/dirsrv/slapd-localhost)
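For illustration, a consumer could recover the version, host/port, and instance path from those two header lines with a couple of regexes. This is a hypothetical parser sketch; the patterns are assumptions based only on the sample above, not on the actual header grammar:

```python
import re

# Regexes modeled on the two sample header lines quoted above:
#   389-Directory/3.1.1.2 B2024.285.0000
#   localhost.localdomain:63601 (/etc/dirsrv/slapd-localhost)
VERSION_RE = re.compile(r"^\s*(?P<product>[\w.-]+)/(?P<version>[\d.]+)\s+(?P<build>\S+)")
HOST_RE = re.compile(r"^\s*(?P<host>[\w.-]+):(?P<port>\d+)\s+\((?P<instance>[^)]+)\)")

def parse_access_log_header(lines):
    """Return a dict of fields recovered from the two-line log header."""
    header = {}
    m = VERSION_RE.match(lines[0])
    if m:
        header.update(m.groupdict())
    m = HOST_RE.match(lines[1])
    if m:
        header.update(m.groupdict())
    return header
```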

Member

@droideck left a comment


Another idea. As we have JSON, would it make sense to have nicely grouped data? Something like this:

    "operation": {
      "type": "BIND",
      "id": 1234,
      "method": "SIMPLE"
    },
    "connection": {
      "id": 5678,
      "client": {
        "ip": "192.168.1.100"
      },
      "server": {
        "ip": "10.0.0.1"
      }
    },
    "log": {
      "level": "INFO"
    },
    "result": {
      "code": 0,
      "message": "Authentication successful"
    }

This structure provides better organization of the data. And it's aligned with common log management tools.


{
// Header
version: "",
Member


What is the version field? Is it the 389 DS version?
If so, should we also have something like format_version, which could make future format changes easier to manage?

Contributor Author


What is the version field? Is it the 389 DS version? If so, should we also have something like format_version, which could make future format changes easier to manage?

This was to address Thierry's point about the header that the current logs have:

    389-Directory/3.1.1 DEVELOPER BUILD B0000.000.0000
    localhost.localdomain:636 (/etc/dirsrv/slapd-localhost) 

Added a new configuration setting for access logging under **cn=config**

```
nsslapd-accesslog-json-format: default | json | json-pretty
```
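If the setting lands as proposed, switching a running instance to JSON output might look like the following ldapmodify input. The attribute name and values are taken from the proposal above; the exact syntax is an assumption until the feature ships:

```
dn: cn=config
changetype: modify
replace: nsslapd-accesslog-json-format
nsslapd-accesslog-json-format: json
```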
Member


I wonder if we should consider adding an option for selective JSON logging, allowing the admin to choose which operations or fields are included in the JSON format (or an exclude list?). But it might be too much to handle (and not very needed?). I don't have a strong opinion here.

Contributor Author


Do you mean filter the log? Or a log mixed with JSON and non-JSON logging in it?

Personally, I don't think we should filter access log content, and I don't think we should mix JSON with non-JSON logging.

Member


I mean filtering the log entries before printing them, so as to include some fields or not. That way log entries might look cleaner and consume less storage.
But I'm also not sure if it's worth it... Just an idea to consider here :)

Contributor Author


I mean filtering the log entries before printing them, so as to include some fields or not. That way log entries might look cleaner and consume less storage. But I'm also not sure if it's worth it... Just an idea to consider here :)

Well it's only going to write keys that have values. If you want to filter out keys it would not be hard to add that later, but it could have other side effects. If we allow setting filters on the fly, then we need a lock, and that can hurt performance. We could require a restart instead, but at this point I don't know if we need this? Again it could be added later.

docs/389ds/design/accesslog-json-design.md
local_time: <strftime output - customizable>,
gm_time: <gm time - uses a fixed format of **%FT%TZ**>,
conn_id: ####,
op_id: ####,
Member


I'm not sure if it's already here, but I think we need to have a unique identifier for each log entry to help with deduplication and tracking. Probably by combining a few fields.

Contributor Author


Do you mean a single header? Or a unique value in every single log event? You could argue the timestamp is unique. I guess I'm not sure what you are trying to solve here.

Member


I think the timestamp can be the same (though very rarely) for some concurrent events.

For log consumption by most log processing tools, it's beneficial to be able to distinguish between events with just one field - a unique key.
It's a safe and future-proof option to add one, IMO...

Contributor


I agree. I have recently noticed that opid+connid is not sufficient to identify all events associated with a given operation (as different connections may have the same connid). We probably need a 64-bit counter, or to associate the operation start timestamp + connid + opid. Anyway, having a unique operation identifier will be helpful.

Contributor Author


I agree. I have recently noticed that opid+connid is not sufficient to identify all events associated with a given operation (as different connections may have the same connid). We probably need a 64-bit counter, or to associate the operation start timestamp + connid + opid. Anyway, having a unique operation identifier will be helpful.

Sounds good, I'll add some kind of unique key for each connection and its ops.
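A minimal sketch of the two options discussed above, a process-wide 64-bit counter versus a composite start-time + connid + opid key. The function names are illustrative, not the actual server implementation:

```python
import itertools
import threading

# Option 1: a process-wide monotonic counter, unique for the life of
# the process. itertools.count with a lock keeps it safe across threads.
_counter = itertools.count(1)
_lock = threading.Lock()

def next_op_key():
    with _lock:
        return next(_counter)

# Option 2: a composite key. Including the operation start timestamp
# makes it survive connid reuse across connections.
def composite_op_key(start_time, conn_id, op_id):
    return f"{start_time}-{conn_id}-{op_id}"
```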

// The rest of the log
op: (
ABANDON, ADD, AUTOBIND, BIND/UNBIND, CLOSE, CMP, CONN, DEL, EXT,
MOD, MODRDN, SEARCH, SORT, STAT, RESULT, VLV
Contributor


I wonder if we would not want to add a generic CTRL op to log incoming controls
with the OID and the criticality flag (I am more dubious about logging the control data because it may contain secrets - that said, we may have a whitelist of controls whose data is safe to log).

And maybe another op for HTTP PROXY operations?

Contributor


About controls: maybe a field with a sequence of
"control" objects containing: criticality, oid, and an optional string representing the data
(probably cheaper than adding a new record for each control).
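For illustration, such a sequence might look like this in a request event. The field name `request_controls` and the layout are hypothetical, not part of the design yet; the OIDs shown are the standard paged-results and ManageDsaIT control OIDs:

```
"request_controls": [
  { "oid": "1.2.840.113556.1.4.319", "criticality": true, "data": "pagedResults size=100" },
  { "oid": "2.16.840.1.113730.3.4.2", "criticality": false }
]
```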

Contributor


And reuse the same field/object for the response controls in RESULT records.
If that is the case, we may remove the need for the VLV/SORT tags, since the information will be provided by the generic format.

Contributor Author


Sorry, I'm not sure I'm following your idea. Controls could be added to the operations to which they were attached. I don't see how a control would be its own log "event". I think I am just misunderstanding your suggestion.

@mreynolds389
Contributor Author

Another idea. As we have JSON, would it make sense to have nicely grouped data? Something like this:

...

This structure provides better organization of the data. And it's aligned with common log management tools.

I'm actually concerned this might overly complicate it. There will be overlap. A new connection, versus BIND, versus MOD, etc. The connection object/sub-key will be used very differently between some of these operations. In some cases the "connection" object should list the bind DN, but not for BIND operations. There are going to be these corner cases that will add complexity. While it does look nice, I think it might be complex/inconsistent across all operations/events. It will also add to the size of the logs, but that's not as important.

Your idea is similar to how I had originally wanted to do the audit log JSON format, but @progier389 suggested that once you know the "op" it doesn't really matter if you have these sub-keys or not. The parser still needs to know what to look for, and I do like the single layer of the event.

I have to think about this. Other comments are welcome.
