Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frrist/metrics/include instance #4450

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

frrist
Copy link
Member

@frrist frrist commented Sep 17, 2024

Decouples the metadata store from the repo, allow service that accept only a config to read the contents of metadata stored in the repo.
feat: client and server include ids in job meta

  • clients pass their instance and installation id in headers, this is
    added to the jobs metadata when submitted
  • server includes their instance and installation id in meta of job.
  • Now, given a job spec it may be determined which node submitted the job
    and which node receieved the submission

frrist added 3 commits September 17, 2024 14:11
- clients pass their instance and installation id in headers, this is
  added to the jobs metadata when submitted
- server includes their instance and installation id in meta of job.

Now, given a job spec it may be determined which node submitted the job
and which node receieved the submission
Comment on lines 254 to 261
installationID, err := metastore.ReadInstallationID()
if err != nil {
log.Trace().Err(err).Msg("failed to read installationID")
}
instanceID, err := metastore.ReadInstanceID()
if err != nil {
log.Trace().Err(err).Msg("failed to read instanceID")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace are mainly for more granular and request level logging. It might be useful to use debug level here

Copy link
Member Author

@frrist frrist Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing to debug

cmd/util/api.go Outdated
Comment on lines 39 to 46
installationID, err := metastore.ReadInstallationID()
if err != nil {
return nil, err
}
instanceID, err := metastore.ReadInstanceID()
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be best effort instead of failing the request

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will modify to log a debug message, and only include in headers if found and non-empty.

cmd/util/api.go Outdated
Comment on lines 54 to 55
apimodels.HTTPHeaderBacalhauBuildInstallationID: {installationID},
apimodels.HTTPHeaderBacalhauInstanceID: {instanceID},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be HTTPHeaderBacalhauInstallationID instead of HTTPHeaderBacalhauBuildInstallationID. Also should we always send the headers or only if they have values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will only include if value is present: #4450 (comment)

Comment on lines 56 to 60
type MetaStore interface {
ReadInstallationID() (string, error)
ReadInstanceID() (string, error)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why introducing an interface instead of just passing the actual metastore to NewRequesterNode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent the receivers of the type from calling methods that can modify the contents of the store - this is a read only store given this interface. It will also make testing easier in the future.

Comment on lines 218 to 225
instanceID, err := metaStore.ReadInstanceID()
if err != nil {
return nil, err
}
installationID, err := metaStore.ReadInstallationID()
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a common pattern I am seeing in different places in the code where we should treat these as best effort instead of failing. My recommendation is to have the metastore handle the failures internally by just logging and returning an empty id. If there are cases where you need to fail, then you can introduce MustReadInstanceID() methods.

What would be nice is:

  1. Have the metastore read the ids from disk only once the first time a getter is called, store the entries in-memory and then serve subsequent requests from memory
  2. If a failure happens while reading from disk, then it should log only once, and then all future calls will just get an empty id without more logging

Comment on lines 65 to 66
job.Meta[models.MetaClientInstallationID] = request.ClientInstallationID
job.Meta[models.MetaClientInstanceID] = request.ClientInstanceID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we only set these if the request has these metadata?

Comment on lines +118 to +133
func OrchestratorInstanceID(instanceID string) JobTransformer {
f := func(ctx context.Context, job *models.Job) error {
job.Meta[models.MetaServerInstanceID] = instanceID
return nil
}
return JobFn(f)
}

func OrchestratorInstallationID(installationID string) JobTransformer {
f := func(ctx context.Context, job *models.Job) error {
job.Meta[models.MetaServerInstallationID] = installationID
return nil
}
return JobFn(f)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here

Comment on lines 99 to 102
func (fsr *MetadataStore) ReadInstallationID() (string, error) {
sysmeta, err := fsr.readMetadata()
if err != nil {
return "", err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

installationID should no longer be in the meta store. It is fine to keep the method here if you prefer, but the source is not system_metadata.yaml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it in the store prevents it from being scattered in different files. This was the design we decided here: https://www.notion.so/expanso/Rethinking-Configuration-435fbe87419148b4bbc5119d413786eb?pvs=4#5d5f7298b9ef4ab4bcb1fd6a44ca8c30. Additionally, for users migrating from v1.4.0 to v1.5.0, the installationID will be moved to the metadata store from its previous location the config file.

I know we discussed this in our meeting; How would you like to proceed here?
We can continuing doing what we do now and keep it in the store, or we can modify the metadata store to check both locations - its internal file and the file produced by the installation script. The latter feels like unnecessary complexity and coupling with the implementation of the install script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants