Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use dataDir/config.yaml as the default config path #4466

Merged
merged 6 commits into from
Sep 19, 2024
Merged

Conversation

wdbaruni
Copy link
Member

@wdbaruni wdbaruni commented Sep 18, 2024

A recent change in behaviour has caused some confusion where the default config path changed from repo/config.yaml to ~/.config/bacalhau/config.yaml. This PR reverts this change and continue using a config.yaml under the repo/dataDir if present and if no explicit config file was provided by the user through --config flag.

The assumption that many users have is deleting the data directory (e.g. ~/.bacalhau), or pointing to a fresh directory using $BACALHAU_DIR env variable or --data-dir flag, is reseting the state and configuration of the node or client. However, this wasn't the case as the default config path was outside the data directory.

Before this PR

This is an example where issues could happen:

# user changes the api port 
# here the user didn't provide a --config file, so configurations 
# were written to a user global `~/.config/bacalhau/config.yaml` 
# and not local to the data dir
bacalhau config set api.port 2233

# user starts a node and expects 2233
# this works as the user didn't specify --config and the default path was picked
bacalhau serve --compute --orchestrator

# user decides to start another node, or just stop the previous one and start fresh again
export BACALHAU_DIR=$(mktemp -d -t bacalhau_dir)
bacalhau serve --compute --orchestrator

# here the user expects the node to start with default api.port of 1234
# but the default configuration was read again

After this PR

# user changes the api port 
# the configs are now written to the data dir `$BACALHAU_DIR/config.yaml`
bacalhau config set api.port 2233

# user starts a node and expects 2233
# this works as the user didn't specify --config and the data dir path was picked
bacalhau serve --compute --orchestrator

# user decides to start another node, or just stop the previous one and start fresh again
export BACALHAU_DIR=$(mktemp -d -t bacalhau_dir)
bacalhau serve --compute --orchestrator

# here the node starts with fresh config as the data dir has empty configuratoin

Different Combinations

with the upcoming v1.5.0 release, users are able to define explicit configuration files using --config flag, as well as the existing --data-dir flag to define where bacalhau stores its state. These are also configurable using env variables.

The default data-dir if none is specified is ~/.bacalhau, and the default config path is $(dataDir)/config.yaml. Lets explore what that means in action

Setting config

# will use ~/.bacalhau/config.yaml
bacalhau config set api.port 2233

# will use hello.yaml
bacalhau config set --config hello.yaml api.port 2233

# will use customDir/config.yaml
bacalhau config set --data-dir customDir api.port 2233

# will use hello.yaml
bacalhau config set --config hello.yaml --data-dir customDir api.port 2233

Applying config

bacalhau serve has the same flags as bacalhau config set, and the same logic and precedence applies

# will use ~/.bacalhau/config.yaml
bacalhau serve --orchestrator --compute

# will use hello.yaml
bacalhau serve --orchestrator --compute --config hello.yaml

# will use customDir/config.yaml
bacalhau serve --orchestrator --compute --data-dir customDir 

# will use hello.yaml instead of customDir/config.yaml
bacalhau serve --orchestrator --compute --config hello.yaml --data-dir customDir 

@wdbaruni wdbaruni marked this pull request as ready for review September 18, 2024 12:12
@seanmtracey
Copy link
Member

Looks good to me, with one question:

# will use ~/.bacalhau/config.yaml
bacalhau config set api.port 2233

# will use hello.yaml
bacalhau config set --config hello.yaml api.port 2233

# will use customDir/config.yaml
bacalhau config set --data-dir customDir api.port 2233

# will use hello.yaml
bacalhau config set --config hello.yaml --data-dir customDir api.port 2233

With this example at the end of the PR notes - if I were to run bacalhau serve --compute --orchestrator having entered all of the commands in the example in sequence - would the config be derived from ~/.bacalhau/config.yaml or would it be derived from hello.yaml?

What I'm really asking here is, if I run bacalhau config set --config hello.yaml will that set the hello.yaml file as the default config files for all subsequent executions?

@wdbaruni
Copy link
Member Author

wdbaruni commented Sep 18, 2024

What I'm really asking here is, if I run bacalhau config set --config hello.yaml will that set the hello.yaml file as the default config files for all subsequent executions?

No it wont. It will actually fail with requires at least 2 arg(s), only received 0 as you are missing the key and value to configure. What you are referring to is config contexts, but unfortunately that didn't make it through.

I have updated the examples in the PR description to make it clearer

@seanmtracey
Copy link
Member

OK, this makes sense to me.

My understanding is as follows:

  1. We have a default config.yaml file that Bacalhau will source if it's executed without being passed a different config file in the command (No --config flag)
  2. If there is a config.yaml in the active data directory of a Bacalhau installation, it will be read and override the default values.
  3. If I run bacalhau config set <KEY> <VALUE>, without passing a path to a config file, it will write that value to the default config.yaml at ~/.bacalhau/config.yaml unless there is a config.yaml file in the data directory, in which case it will write the value to that file.
  4. If run bacalhau config set --data-dir customDir <KEY> <VALUE> it will write a config.yaml file with that value to that specified directory, butttt.....
  5. If that config.yaml file is not in the data directory, it's values will not be read unless Bacalhau is executed and explicitely passed that config file with bacalhau serve --orchestrator --compute --config hello.yaml ....

So, data dir config.yaml supersedes default config.yaml unless a separate config.yaml is explicitly passed with the --config flag?

Have I got that right?

@aronchick
Copy link
Collaborator

this seems solid to me. Just please use whatever is standard for norms with Viper!

@wdbaruni
Copy link
Member Author

You lost me a little 😅

  1. If I run bacalhau config set , without passing a path to a config file, it will write that value to the default config.yaml at ~/.bacalhau/config.yaml unless there is a config.yaml file in the data directory, in which case it will write the value to that file.

~/.bacalhau is the default data directory. So if you call bacalhau config set with no config path and no data directory like you mentioned, then it will write to ~/.bacalhau/config.yaml . It will also create the config file if it doesn't exist.

  1. If that config.yaml file is not in the data directory, it's values will not be read unless Bacalhau is executed and explicitely passed that config file with bacalhau serve --orchestrator --compute --config hello.yaml

Not sure I understand this. If you run bacalhau serve with nothing, then it will attempt to read config.yaml from your data directory, which is ~/.bacalhau/config.yaml by default. If there is no config.yaml in that directory, which is the case if you never called bacalhau config set, then bacalhau will start with default configuration. It will not complain that the config file doesn't exist and will not attempt to create one.

So, data dir config.yaml supersedes default config.yaml unless a separate config.yaml is explicitly passed with the --config flag?

data dir config.yaml IS the default config file, unless by "default config.yaml" you mean the hardcoded default configurations, then you are right. So in order of presedence:

  • Default hardcoded configurations
  • config.yaml under your data directory
  • config.yaml that is passed explicitly through --config flag

What is worth clarifying is we DO NOT merge --config explicit.yaml with ~/.bacalhau/config.yaml. If a user provides a config file, then we use it

Copy link
Member

@frrist frrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, please address the change suggested to set.go before merging.

Comment on lines 29 to 77
RunE: func(cmd *cobra.Command, args []string) error {
// initialize a new or open an existing repo. We need to ensure a repo
// exists before we can create or modify a config file in it.
_, err := util.SetupRepoConfig(cmd)
// Get the value of the --config flag
configFlag, err := cmd.Flags().GetString("config")
if err != nil {
return fmt.Errorf("failed to setup repo: %w", err)
return fmt.Errorf("failed to get config flag: %w", err)
}
return setConfig(cmd.PersistentFlags().Lookup("config").Value.String(), args[0], args[1:]...)
},
// provide auto completion for arguments to the `set` command
ValidArgsFunction: setAutoComplete,
}

bacalhauCfgDir := "bacalhau"
bacalhauCfgFile := config.DefaultFileName
// If --config flag is set, use it to set the Viper key
if configFlag != "" {
viper.Set(cliflags.RootCommandConfigFiles, []string{configFlag})
}

usrCfgDir, err := os.UserConfigDir()
if err != nil {
log.Warn().Err(err).Msg("Failed to find user-specific configuration directory. Using current directory to write config.")
} else {
bacalhauCfgDir = filepath.Join(usrCfgDir, bacalhauCfgDir)
if err := os.MkdirAll(bacalhauCfgDir, util2.OS_USER_RWX); err != nil {
// This means we failed to create a directory either in the current directory, or the user config dir
// indicating a some-what serious misconfiguration of the system. We panic here to provide as much
// detail as possible.
log.Panic().Err(err).Msgf("Failed to create bacalhau configuration directory: %s", bacalhauCfgDir)
}
bacalhauCfgFile = filepath.Join(bacalhauCfgDir, bacalhauCfgFile)
var configPath string
// load configs to get the config file path
rawConfig, err := util.SetupConfigType(cmd)
if err != nil {
return fmt.Errorf("failed to setup config: %w", err)
}

configPath = rawConfig.ConfigFileUsed()
if configPath == "" {
// we fall back to the default config file path $BACALHAU_DIR/config.yaml
// this requires initializing a new or opening an existing data-dir
bacalhauConfig, err := util.DecodeBacalhauConfig(rawConfig)
if err != nil {
return fmt.Errorf("failed to decode bacalhau config: %w", err)
}
_, err = util.SetupRepo(bacalhauConfig)
if err != nil {
return fmt.Errorf("failed to setup repo: %w", err)
}
configPath = filepath.Join(bacalhauConfig.DataDir, config.DefaultFileName)

// create the config file if it doesn't exist
if _, err := os.Stat(configPath); os.IsNotExist(err) {
if err := os.WriteFile(configPath, []byte{}, util2.OS_USER_RWX); err != nil {
return fmt.Errorf("failed to create default config file %s: %w", configPath, err)
}
}
}

return setConfig(configPath, args[0], args[1:]...)
},
// Provide auto completion for arguments to the `set` command
ValidArgsFunction: setAutoComplete,
}

setCmd.PersistentFlags().String("config", bacalhauCfgFile, "Path to the config file")
setCmd.PersistentFlags().String("config", "", "Path to the config file (default is $BACALHAU_DIR/config.yaml)")
return setCmd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can all be simplified a fair bit. setConfig will create the file if it doesn't already exist so no need to create one here.

Suggested change
RunE: func(cmd *cobra.Command, args []string) error {
// initialize a new or open an existing repo. We need to ensure a repo
// exists before we can create or modify a config file in it.
_, err := util.SetupRepoConfig(cmd)
// Get the value of the --config flag
configFlag, err := cmd.Flags().GetString("config")
if err != nil {
return fmt.Errorf("failed to setup repo: %w", err)
return fmt.Errorf("failed to get config flag: %w", err)
}
return setConfig(cmd.PersistentFlags().Lookup("config").Value.String(), args[0], args[1:]...)
},
// provide auto completion for arguments to the `set` command
ValidArgsFunction: setAutoComplete,
}
bacalhauCfgDir := "bacalhau"
bacalhauCfgFile := config.DefaultFileName
// If --config flag is set, use it to set the Viper key
if configFlag != "" {
viper.Set(cliflags.RootCommandConfigFiles, []string{configFlag})
}
usrCfgDir, err := os.UserConfigDir()
if err != nil {
log.Warn().Err(err).Msg("Failed to find user-specific configuration directory. Using current directory to write config.")
} else {
bacalhauCfgDir = filepath.Join(usrCfgDir, bacalhauCfgDir)
if err := os.MkdirAll(bacalhauCfgDir, util2.OS_USER_RWX); err != nil {
// This means we failed to create a directory either in the current directory, or the user config dir
// indicating a some-what serious misconfiguration of the system. We panic here to provide as much
// detail as possible.
log.Panic().Err(err).Msgf("Failed to create bacalhau configuration directory: %s", bacalhauCfgDir)
}
bacalhauCfgFile = filepath.Join(bacalhauCfgDir, bacalhauCfgFile)
var configPath string
// load configs to get the config file path
rawConfig, err := util.SetupConfigType(cmd)
if err != nil {
return fmt.Errorf("failed to setup config: %w", err)
}
configPath = rawConfig.ConfigFileUsed()
if configPath == "" {
// we fall back to the default config file path $BACALHAU_DIR/config.yaml
// this requires initializing a new or opening an existing data-dir
bacalhauConfig, err := util.DecodeBacalhauConfig(rawConfig)
if err != nil {
return fmt.Errorf("failed to decode bacalhau config: %w", err)
}
_, err = util.SetupRepo(bacalhauConfig)
if err != nil {
return fmt.Errorf("failed to setup repo: %w", err)
}
configPath = filepath.Join(bacalhauConfig.DataDir, config.DefaultFileName)
// create the config file if it doesn't exist
if _, err := os.Stat(configPath); os.IsNotExist(err) {
if err := os.WriteFile(configPath, []byte{}, util2.OS_USER_RWX); err != nil {
return fmt.Errorf("failed to create default config file %s: %w", configPath, err)
}
}
}
return setConfig(configPath, args[0], args[1:]...)
},
// Provide auto completion for arguments to the `set` command
ValidArgsFunction: setAutoComplete,
}
setCmd.PersistentFlags().String("config", bacalhauCfgFile, "Path to the config file")
setCmd.PersistentFlags().String("config", "", "Path to the config file (default is $BACALHAU_DIR/config.yaml)")
return setCmd
// initialize a new or open an existing repo. We need to ensure a repo
// exists before we can create or modify a config file in it.
cfg, err := util.SetupRepoConfig(cmd)
if err != nil {
return fmt.Errorf("failed to setup repo: %w", err)
}
cfgFile := cmd.PersistentFlags().Lookup("config").Value.String()
if cfgFile == "" {
cfgFile = filepath.Join(cfg.DataDir, config.DefaultFileName)
}
return setConfig(cfgFile, args[0], args[1:]...)
},
// provide auto completion for arguments to the `set` command
ValidArgsFunction: setAutoComplete,
}
setCmd.PersistentFlags().String("config", "", "Path to the config file")
return setCmd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few issues with this approach:

  • a repo is going to be initialized and created unnecessarily even if the user define a config path. The repo can be well different than the one in user's config file as this --config flag is different than the one SetupRepoConfig() is basing on
  • we will create a config instance and log that a config was initialize using paths that can be different from one provided by the user as we are calling SetupRepoConfig() before parsing --config
  • setConfig is not creating the config file for us, and we don't want it to. We are only creating a config file if we are using the default path where we are creating config.yaml under the repo. Otherwise if the user provided a config file that doesn't exist, then we better fail instead of creating it to avoid undesired behavour with typos

Here are examples of this behaviour:

# config file with datadir setup
→ cat bacalhau_config.yaml
datadir: /Users/walid/test/config

# set a new entry in that config, you will see that repo is getting initialized
# at the default location instead of the one in the config file
→ bacalhau config set api.host 2.2.2.2 --config bacalhau_config.yaml
21:49:11.858 | INF ../../workspace/bacalhau/pkg/config/config.go:211 > Config loaded from: [], and with data-dir /Users/walid/.bacalhau
21:49:11.859 | INF ../../workspace/bacalhau/pkg/repo/fs.go:91 > Initializing repo at /Users/walid/.bacalhau
21:49:12.018 | INF ../../workspace/bacalhau/cmd/cli/config/set.go:49 > Writing config to bacalhau_config.yaml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're right, the creation of the repo should happen on if a flag isn't provided:

func newSetCmd() *cobra.Command {
	setCmd := &cobra.Command{
		Use:          "set",
		Args:         cobra.MinimumNArgs(2),
		Short:        "Set a value in the config.",
		PreRunE:      hook.ClientPreRunHooks,
		PostRunE:     hook.ClientPostRunHooks,
		SilenceUsage: true,
		RunE: func(cmd *cobra.Command, args []string) error {
			cfgFile := cmd.PersistentFlags().Lookup("config").Value.String()
			if cfgFile == "" {
				// if the user didn't provide a flag, then ensure a repo is initialize and write/create file there.
				cfg, err := util.SetupRepoConfig(cmd)
				if err != nil {
					return err
				}
				cfgFile = filepath.Join(cfg.DataDir, config.DefaultFileName)
			}
			return setConfig(cfgFile, args[0], args[1:]...)
		},
		// provide auto completion for arguments to the `set` command
		ValidArgsFunction: setAutoComplete,
	}

	setCmd.PersistentFlags().String("config", "", "Path to the config file")
	return setCmd
}

func setConfig(cfgFilePath, key string, value ...string) error {
	v := viper.New()
	v.SetConfigFile(cfgFilePath)
	if err := v.ReadInConfig(); err != nil {
		if !os.IsNotExist(err) {
			return err
		}
	}
	parsed, err := types.CastConfigValueForKey(key, value)
	if err != nil {
		return err
	}
	v.Set(key, parsed)
	if err := v.WriteConfigAs(cfgFilePath); err != nil {
		return err
	}

	return nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as 21:49:11.858 | INF ../../workspace/bacalhau/pkg/config/config.go:211 > Config loaded from: [], and with data-dir /Users/walid/.bacalhau

I think we should only produce this log line when we have loaded a config from a file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I don't like that we are forking how config path is resolved in set command compared to others. I tried to minimize the fork by still calling SetupConfig the same as how all other methods do. If in the future we introduce a new logic on how config path is resolved, such as introducing an env variable, we wont fall in bugs where we miss to update the code here as well. It is getting late here. I have an idea to reuse the same config flags that I'll push tomorrow.

Second, we still have to create the config file if it is coming from the repo

Last, I have to disagree about the logging. Even if configs are not loaded from a file because we couldn't fine one or the user didn't provide one, we should still log that to help them understand what is going on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, my goal with these comments is to get use to simplify things, my approval is sticky, so merge when you are satisfied.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made changes here to make sure we are using the same --config flag for all commands, both for reading and writing it. Config initialization is now as unified as it can get using util.SetupConfig() that relies on that flag

b985723

cmd/util/repo.go Outdated Show resolved Hide resolved
cmd/cli/root.go Show resolved Hide resolved
@wdbaruni wdbaruni merged commit ad6d795 into main Sep 19, 2024
3 of 4 checks passed
@wdbaruni wdbaruni deleted the default-config branch September 19, 2024 08:43
Comment on lines +55 to +60
if cf.isWriteMode {
// Check if a config file is already set in Viper
if viper.GetString(RootCommandConfigFiles) != "" {
return fmt.Errorf("single config file can be set")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this how we are enforcing that the flag may only be used once with the config set command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

Copy link
Collaborator

@aronchick aronchick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review code, but behavior lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants