
Add support for defining "maintenance windows" #55

Open
ebekker opened this issue Jan 24, 2017 · 7 comments

Comments

@ebekker (Collaborator) commented Jan 24, 2017

An interesting idea I ran across was the ability to define maintenance windows -- for example, to say that only between hours x and y on days a, b, c is Tug allowed to respond to a "GetConfiguration" request in a way that lets the node pull an updated MOF.
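As a rough sketch of the "hours x and y on days a, b, c" idea above, here is what the window check itself might look like. This is purely illustrative -- the names (`MAINTENANCE_WINDOWS`, `in_maintenance_window`) and the data shape are assumptions, not anything Tug defines:

```python
from datetime import datetime, time

# Hypothetical window table: (set of weekdays, start time, end time).
# Example: Monday/Wednesday/Friday, 01:00-04:00 (weekday: Monday=0 .. Sunday=6).
MAINTENANCE_WINDOWS = [
    ({0, 2, 4}, time(1, 0), time(4, 0)),
]

def in_maintenance_window(now: datetime) -> bool:
    """Return True if `now` falls inside any configured window."""
    for days, start, end in MAINTENANCE_WINDOWS:
        if now.weekday() in days and start <= now.time() < end:
            return True
    return False
```

A policy-driven variant could key this table by configuration name, so different Configs get different windows.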

This also sparks the concept of policy-driven behavior, perhaps associated with specific Config names.

CC: @concentrateddon

@thedevopsmachine commented:
You'll still have to reply with a Configuration, otherwise the LCM will bomb out and you'll fill your Event Log with errors. Also, I don't think that would truly put it in Maintenance Mode, since the LCM will still be doing consistency checks on the last retrieved MOF (the configuration downloads and consistency checks are not related).

I would recommend that during a maintenance window, GetConfiguration would return a "maintenance" configuration with a "Log" resource that writes to the Event Log saying "Server is in Maintenance Mode". This way the download manager will download the temporary MOF and will disassociate the old MOF, thus preventing the consistency checker from executing any of the resources.

Keep in mind that the default configuration refresh interval is 30 mins and the default consistency check interval is 15 mins, so even though the server thinks the machine is in a maintenance window, the LCM may still be applying the old configuration until the next time it checks for an updated config (which could be up to 30 minutes after the start of the window). If using this feature, RefreshFrequencyMins should be dropped to something really low, like 5 minutes, to minimize the impact of this delay.
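The worst-case lag described above can be stated as a tiny calculation (the function name and parameters here are illustrative, not part of any DSC API):

```python
def minutes_until_node_sees_window(refresh_mins: int, mins_since_last_refresh: int) -> int:
    """The LCM only learns that a window has opened on its next configuration
    refresh, so the stale MOF can keep being enforced until that refresh fires."""
    return refresh_mins - mins_since_last_refresh

# With the default RefreshFrequencyMins of 30, a node that refreshed just
# before the window opened keeps enforcing the old config for up to 30 minutes;
# dropping the interval to 5 shrinks that overlap proportionally.
```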

@ebekker (Collaborator, Author) commented Jan 31, 2017

That's a great point -- the consistency check on the local node will keep validating against the config in hand and correcting any discrepancies it finds, which may ultimately cause undesirable behavior like starting/stopping services, rebooting, etc.

This will take a little more thought to implement correctly without making it overly complicated.

In your example -- during the maintenance window is when we would want to send the real configuration so that the node can make any adjustments it might need. So perhaps we would need to dynamically inject the log resource you mention into the actual MOF, or perhaps we can play some trickery with assembling Partial Configs, say one that is dynamically computed by the pull server, and the other that is the real config.

@concentrateddon (Contributor) commented:
I'm not sure it gets THAT complex. Outside maintenance mode, server delivers a MOF that only logs something. Inside maintenance mode, server delivers real MOF. The node isn't going to undo itself outside of maintenance window, it'll just stop enforcing the actual desired configuration. But yeah - the pull server knows what the "real" config is, and just delivers a "placeholder" outside the maintenance window.

@concentrateddon (Contributor) commented:
But I'd implement this in the Provider, not Tug per se. I don't think you want a lot of business logic in the web server layer; the business logic is meant to live in the providers. So the provider gets a MOF request, and if it's in maintenance window, it delivers the real deal. If it's not, it delivers a fake. You'd just need to track the status on those so you knew whether to tell the node it actually had a new MOF or not (e.g., checksumming).
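The provider-side decision described above -- serve the real MOF inside the window, a placeholder outside it, and let checksums tell the node whether it has a "new" configuration -- might be sketched like this. Everything here (`select_mof`, `PLACEHOLDER_MOF`) is a hypothetical illustration, not Tug's provider interface; the only grounded detail is that DSC compares MOF checksums (SHA-256) to detect a changed configuration:

```python
import hashlib

# Hypothetical stand-in for a MOF containing only a Log resource that
# writes "Server is in Maintenance Mode" to the Event Log.
PLACEHOLDER_MOF = b"placeholder MOF with Log resource"

def select_mof(real_mof: bytes, in_window: bool) -> tuple[bytes, str]:
    """Pick which MOF to serve and compute the checksum the LCM compares
    against its cached one to decide whether it has a new configuration."""
    mof = real_mof if in_window else PLACEHOLDER_MOF
    return mof, hashlib.sha256(mof).hexdigest()
```

Because the placeholder and real MOFs have different content, their checksums differ, so the node naturally re-downloads at each window transition -- no extra state tracking is needed beyond serving the right document.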

@thedevopsmachine commented:
I misinterpreted your intent on maintenance windows. I assumed that DSC would be running all the time, and during the maintenance window DSC would NOT be running, so you could do manual stuff that you might not put in a DSC config on "pet" servers (e.g. enabling debug settings, installing Exchange CUs, restoring DBs, etc.).

Not really sure why you would want to only run DSC in a maintenance window. You can set the LCM to use ApplyAndMonitor mode; that will apply the MOF once and not reapply it unless the MOF changes. Just don't change the MOF (or at least don't let the agent know you've changed it) until you're in your "maintenance window" and your objective is achieved. That logic lives in your Provider, though.

@ebekker (Collaborator, Author) commented Jan 31, 2017

Yes, but that configuration would be applied on the Node, and it's not that you never want to update the Node -- just that you want to control when it's safe to update it without any fear of disruption or loss of service.

Actually, disregard my comment -- I misunderstood your remark. You're describing the same idea: putting some logic on the server side that controls when it's safe to serve up an updated MOF.

@edthedev commented:
Maintenance window support was a hot topic at PSH Summit 2017.

I think we could provide a good inroad for some early adopters by creating a reference provider implementation and linking to it from the README.

I'll take a crack at it during one of my Monday hack nights sometime in the next few weeks, and report back.
