Main review mods 2023 12 06 #55

Open
wants to merge 151 commits into base: main-review-2023-11-17
Changes from 6 commits
Commits
151 commits
afe0783
branch:HPCC-27615-original-folder-structure. Constructed the original…
Sep 21, 2023
5a51f03
branch:HPCC-27615-fixes-deploy-without-external-storage. Now can depl…
Sep 21, 2023
35b1a43
branch:HPCC-27615-my-local-has-flat-structure
Sep 21, 2023
f31f1c7
branch:HPCC-27615-easy-deploy. This is a merge of HPCC-27615 latest an…
Sep 22, 2023
c11ef09
branch: HPCC-27615-easy-deploy. This is a merge of HPCC-27615 and bra…
Sep 22, 2023
977e107
branch:new-variable-enable_thor. Now one must set 'enable_thor=true' …
Sep 26, 2023
1b54ae7
branch:fix-roxie-so-port-18002-is-used
Sep 27, 2023
eba6f63
branch:add-htpasswd-support
Sep 28, 2023
58268fa
branch:add-ecl-code-security. Plus, setup 'storage_data_lz'
Oct 4, 2023
fbafd5d
branch:add-ecl-code-security
Oct 4, 2023
32b918b
Merge pull request #1 from tlhumphrey2/add-ecl-code-security
Oct 5, 2023
226139f
branch:add-terraform-to-deploy-everything
Oct 6, 2023
982e692
Merge pull request #2 from tlhumphrey2/add-terraform-to-deploy-everyt…
Oct 6, 2023
347ceef
branch:add-terraform-to-deploy-everything
Oct 9, 2023
1e0e18f
Merge pull request #3 from tlhumphrey2/add-terraform-to-deploy-everyt…
Oct 9, 2023
e5f5853
branch:aks-is-now-using-easy-deploy-variables
Oct 12, 2023
9f93f16
Merge pull request #4 from tlhumphrey2/aks-is-now-using-easy-deploy-v…
Oct 12, 2023
7fb2c89
branch:aks-is-now-using-easy-deploy-variables
Oct 12, 2023
cde9f4f
Merge pull request #5 from tlhumphrey2/aks-is-now-using-easy-deploy-v…
Oct 12, 2023
0f56039
branch:aks-is-now-using-easy-deploy-variables. Changed scripts/deploy…
Oct 13, 2023
1bf7dc9
Merge pull request #6 from tlhumphrey2/aks-is-now-using-easy-deploy-v…
Oct 13, 2023
e14e9eb
branch:aks-is-now-using-easy-deploy-variables
Oct 17, 2023
3de0cc1
Merge pull request #7 from tlhumphrey2/aks-is-now-using-easy-deploy-v…
Oct 17, 2023
dbe5cef
branch:aks-is-now-using-easy-deploy-variables
Oct 18, 2023
ff58b6d
Merge pull request #8 from tlhumphrey2/aks-is-now-using-easy-deploy-v…
Oct 18, 2023
2203f42
branch:few-changes-20231020
Oct 20, 2023
71434db
Merge pull request #9 from tlhumphrey2/few-changes-20231020
Oct 20, 2023
c54dad8
branch:HPCC-27615-easy-deploy-bryan1
Oct 28, 2023
f077f15
branch:HPCC-27615-easy-deploy-bryan1-w-oss
Oct 28, 2023
1868452
branch:HPCC-27615-easy-deploy-bryan1-w-oss-zones
Oct 28, 2023
1523b64
branch:HPCC-27615-easy-deploy-bryan1-w-oss-zones. Now can optionally …
Oct 30, 2023
8e0e2f1
branch:HPCC-27615-easy-deploy-bryan1-w-oss-zones. Now can optionally …
Oct 30, 2023
d2ee650
branch: HPCC-27615-easy-deploy-bryan1-w-oss-zones
Oct 30, 2023
873a9f7
branch:HPCC-27615-easy-deploy-bryan1-w-oss-zones. merged with github …
Oct 30, 2023
50f7050
branch:HPCC-27615-easy-deploy-bryan2-root-sto-applied-initials-added.…
Oct 31, 2023
108641a
branch:HPCC-27615-easy-deploy-bryan2-root-sto-applied-initials-added.…
Oct 31, 2023
19e114a
branch:HPCC-27615-easy-deploy-bryan2-root-sto-applied-initials-added2
Nov 1, 2023
5e3a1ec
branch:HPCC-27615-easy-deploy-bryan3-roxiepool-optional
Nov 1, 2023
7065e89
branch:HPCC-27615-easy-deploy-bryan4-placing-auto.tfvars-files-aks-st…
Nov 2, 2023
ad678e4
branch:HPCC-27615-easy-deploy-bryan4-placing-auto.tfvars-files-aks-st…
Nov 2, 2023
aca0ced
branch:HPCC-27615-easy-deploy-bryan4-placing-auto.tfvars-files-aks-st…
Nov 2, 2023
1c346ee
branch:HPCC-27615-easy-deploy-bryan4-placing-auto.tfvars-files-aks-st…
Nov 2, 2023
e796355
branch:HPCC-27615-easy-deploy-bryan5-miscellaneous-changes
Nov 4, 2023
3588eec
branch:HPCC-27615-easy-deploy-bryan5-miscellaneous-changes. Added doc…
Nov 5, 2023
8ec2aab
branch:HPCC-27615-easy-deploy-bryan5-miscellaneous-changes. Updated d…
Nov 5, 2023
498d908
branch:HPCC-27615-easy-deploy-bryan6-restrict-hpcc-access
Nov 6, 2023
81afffd
branch:HPCC-27615-easy-deploy-bryan6-restrict-hpcc-access. Updated us…
Nov 6, 2023
902ec17
branch:HPCC-27615-easy-deploy-bryan6-restrict-hpcc-access. Updating U…
Nov 6, 2023
682612e
branch:HPCC-27615-easy-deploy-bryan6-restrict-hpcc-access. Updated Us…
Nov 6, 2023
2417b9a
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation
Nov 7, 2023
d8aa06c
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation
Nov 7, 2023
bf1a64d
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation
Nov 7, 2023
41be7fa
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation
Nov 7, 2023
02fe05b
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation. Update…
Nov 7, 2023
5bc6502
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation. Update…
Nov 7, 2023
a8a8170
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation. Update…
Nov 7, 2023
882a390
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation. Update…
Nov 7, 2023
345453e
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation. Update…
Nov 7, 2023
986595a
branch:HPCC-27615-easy-deploy-bryan8-cleanup-and-make-clearer
Nov 9, 2023
746eb3e
Merge pull request #36 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 9, 2023
fc879fe
branch:HPCC-27615-easy-deploy-bryan8-pods-assigned-nodepools
Nov 9, 2023
6ffcf2d
Merge pull request #37 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 9, 2023
9f522e2
branch:HPCC-27615-easy-deploy-bryan8-pods-assigned-nodepools. hpcc.tf…
Nov 10, 2023
409e1a5
Merge pull request #38 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 10, 2023
ba53432
branch:no-ephemeral-storage-when-external-used
Nov 10, 2023
dc46a6f
Merge pull request #39 from hpccsystems-solutions-lab/no-ephemeral-st…
tlhumphrey2 Nov 10, 2023
cf68e81
branch:no-ephemeral-storage-when-external-used. Fixed paths in script…
Nov 10, 2023
d7ea956
Merge pull request #40 from hpccsystems-solutions-lab/no-ephemeral-st…
tlhumphrey2 Nov 10, 2023
31396de
branch:HPCC-27615-easy-deploy-bryan7-developers-documentation. Update…
Nov 12, 2023
d42c883
Merge pull request #41 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 12, 2023
d17572e
branch:HPCC-27615-easy-deploy-bryan9-variable-eclwatch-a-record
Nov 16, 2023
b05a798
Merge pull request #42 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 16, 2023
fcc5365
Update hpcc.tf
tlhumphrey2 Nov 16, 2023
a76e524
branch:HPCC-27615-easy-deploy-bryan10-added-hpcc_version
Nov 16, 2023
c175fdb
Merge pull request #43 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 16, 2023
1dce33d
Tim's Modifications
dcamper Nov 17, 2023
b44fbcb
branch:HPCC-27615-easy-deploy-bryan10-added-hpcc_version_and_misc. Ad…
Nov 20, 2023
4d284cb
Merge branch 'main' into HPCC-27615-easy-deploy-bryan10-added-hpcc_ve…
tlhumphrey2 Nov 20, 2023
d89b49e
Merge pull request #45 from hpccsystems-solutions-lab/HPCC-27615-easy…
tlhumphrey2 Nov 20, 2023
8747f16
Update destroy
tlhumphrey2 Nov 20, 2023
822acd9
branch:main-review-mods-2023-11-17. The easy fixes
Nov 21, 2023
73f5982
main-review-mods-2023-11-17. Resolved conflicts in scripts/destroy
Nov 21, 2023
339bf98
Merge branch 'main-review-mods-2023-11-17-tlh-changes' into main-revi…
Nov 21, 2023
4d5526d
branch:main-review-mods-2023-11-17-activate-aks_node_size
Nov 21, 2023
5d71764
Merge pull request #46 from hpccsystems-solutions-lab/main-review-mod…
tlhumphrey2 Nov 21, 2023
3aab1f8
branch:main-review-mods-2023-11-17-activate-aks_node_size
Nov 21, 2023
1518d47
Merge pull request #48 from hpccsystems-solutions-lab/main-review-mod…
tlhumphrey2 Nov 21, 2023
2a6805b
branch:main-review-mods-2023-11-17-activate-aks_node_size. aks_node_s…
Nov 22, 2023
a97cb81
Merge pull request #50 from hpccsystems-solutions-lab/main-review-mod…
tlhumphrey2 Nov 22, 2023
6809ae2
branch:main-review-mods-2023-11-17-deploy-hpcc-depends-on-storage
Nov 22, 2023
a646d62
Merge pull request #52 from hpccsystems-solutions-lab/main-review-mod…
tlhumphrey2 Nov 22, 2023
2e5e4f4
branch:main-review-mods-2023-11-17-aks_node-sizes-now-object
Nov 24, 2023
f8fa470
Merge pull request #54 from hpccsystems-solutions-lab/main-review-mod…
tlhumphrey2 Nov 24, 2023
290d023
Remove input variables not used
Dec 5, 2023
801e279
Added column 'updatable' to table of options in README.md
Dec 5, 2023
075d3aa
Removed automation.tf from aks
Dec 5, 2023
fb61631
Increased values of 'managerResources'. NOTE: Dan's 'managerResources…
Dec 5, 2023
c5d78de
Increased values of 'workerResources' to match Dan's
Dec 5, 2023
99d7fcf
Increased cpu of hthor resources to 2. This is higher than what Dan has.
Dec 5, 2023
416b4e4
Eliminated metadata variables not used
Dec 5, 2023
0274223
Removed metadata from aks, hpcc, and vnet (not in storage). Now it is…
Dec 5, 2023
bfd5142
Removed metadata from aks, hpcc, storage, and vnet (not in storage). …
Dec 5, 2023
5617f98
Removed metadata from aks, hpcc, storage, and vnet. Now it is copied …
Dec 5, 2023
6471329
Added new variable, 'aks_capacity' which defines the minimum and maxi…
Dec 6, 2023
ab72960
In lite-variables.tf, I changed the descriptions of 'aks_node_sizes' …
Dec 6, 2023
96bbe2c
In all bash scripts, replaced with
Dec 6, 2023
4d88cc9
Removed referenced branch from all source statements, since the branc…
Dec 6, 2023
91266e9
In lite-variables.tf, line 54 added '\' before each " in description.
Dec 7, 2023
f45720c
In hpcc/outputs.tf, prefixed eclwatch url with 'https://'. In lite-va…
Dec 7, 2023
3405610
In README.md, documented outputs of hpcc, aks, vnet. There are no out…
Dec 7, 2023
6e6b6c3
Change default node size for spray pool from 2xlarge to large.
Dec 8, 2023
6ece380
Reduced the size of nodes in each node pool.
Dec 8, 2023
eae050f
Fixed value of output 'advisor_recommendations' for both aks/outputs.…
Dec 8, 2023
11554af
Calculates max capacity of thorpool. Set thor cpu and ram.
Dec 11, 2023
5c33a80
Calculates max capacity of thorpool. Set thor cpu and ram. Added to r…
Dec 12, 2023
8dd9ebf
In both hpcc/hpcc.tf and aks/aks.tf, changed source statements so val…
Dec 12, 2023
a00e254
In both hpcc/hpcc.tf and aks/aks.tf, removed prefix 'git@' from sourc…
Dec 12, 2023
9f48ece
Changed variable 'aks_node_sizes' to individual string variables: rox…
Dec 12, 2023
f4847f5
Changed 'source' in aks/aks.tf. Now it points to github repo
Dec 12, 2023
01f09ed
To the redone lite-locals.tf, added workerResources cpu and memory.
Dec 13, 2023
b30f1a5
Removed commented-out code from hpcc/main.tf
Dec 13, 2023
b83083c
Removed variable 'hpcc_namespace'
Dec 13, 2023
185c760
Removed 18010 from output of eclwatch URL. Also, changed opinionated …
Dec 13, 2023
b772483
Capitalized Kubernetes everywhere in README.md
Dec 13, 2023
3eb4e6e
Deleted paragraph 'This repo is a fork of the excellent work performe…
Dec 13, 2023
76386ea
Make sure all these are capitalized in README.md when used as product…
Dec 13, 2023
4f37c43
Throughout README.md changed 'terraform' to 'terraform code'
Dec 13, 2023
85b2bef
All fixes for Dan's comments about hpcc-tf-for-developers.md are in t…
Dec 13, 2023
e6ff10c
In lite-locals.tf, deleted all terraform code that was commented-out.
Dec 13, 2023
38ef877
All fixes for all Dan's review in 1:15pm email today.
Dec 13, 2023
6e366a7
In hpcc-tf-for-developers.md, capitalized all occurrences of 'Azure'
Dec 13, 2023
e74c3b6
Dan's review fixes in email dated 12/14/2023 7:38am
Dec 14, 2023
44cedc4
Removed azuread_group.subscription_owner from aks/aks.tf and aks/data…
Dec 15, 2023
97f7eef
Make 1 or 4 nodepools optional. Added aks_4nodepools
Dec 18, 2023
4ba65e2
Removed all error messages in 'thorpool_max_capacity' calculations an…
Dec 19, 2023
a8cef3e
Removed all occurrences of region restriction.
Dec 19, 2023
d899526
In lite-variables.tf, changed 8002 to 18002
Dec 19, 2023
2255093
In lite.auto.tfvars.example, changed 8002 to 18002
Dec 19, 2023
a5e3dc5
Removed all error messages in 'thorpool_max_capacity' calculations an…
Dec 19, 2023
abbb3bf
To metadata.tf of storage, added 'additional_tags'. Plus, removed 'de…
Dec 20, 2023
9cf1765
'workerResources' memory. Added 'G'. Caused thor container error.
Dec 27, 2023
f58b8b7
Added scripts/extract-aks-tfvars to properly extract 'aks_' variables f…
Jan 3, 2024
9cdfc88
In lite-variables.tf, no longer says REQUIRED for aks_enable_roxie.
Jan 3, 2024
81eca56
Changed workerMemory.query to same value as workerResources.memory.
Jan 3, 2024
ed598d8
Created 'aks_nodepools_max_capacity'. max_capacity of all hpcc nodepo…
Jan 4, 2024
05f46ff
Corrected 'aks_nodepools_max_capacity' code of lite-locals.tf and lit…
Jan 5, 2024
09cad2b
In README.md, minimum vCPU requirements given. In aks/locals.tf, min_…
Jan 5, 2024
4d1b78f
Added in README.md: 1) info about the directory, and what is in it, …
Jan 8, 2024
e4a8cc4
In scripts/needed-auto-tfvars-files/aks/aks.auto.tfvars.example, chan…
Jan 10, 2024
431fad0
In lite-locals.tf, increased 'helm_chart_timeout' from 300 to 600. Wh…
Jan 10, 2024
3c7c641
In README.md, said that 'jq' and 'kubelogin' are required (i.e. they …
Jan 10, 2024
3 changes: 2 additions & 1 deletion aks/aks.tf
@@ -1,6 +1,7 @@
module "aks" {
depends_on = [random_string.string]
source = "[email protected]:hpccsystems-solutions-lab/tlh-oss-terraform-azurerm-aks.git"
#source = "[email protected]:hpccsystems-solutions-lab/tlh-oss-terraform-azurerm-aks.git"
source = "https://github.com/hpccsystems-solutions-lab/tlh-oss-terraform-azurerm-aks.git"
#source = "/home/azureuser/temp/OSS/terraform-azurerm-aks"

providers = {
16 changes: 8 additions & 8 deletions aks/locals.tf
@@ -13,8 +13,8 @@ locals {
node_type_version = "v2"
node_size = var.aks_node_sizes.roxie
single_group = false
min_capacity = var.aks_capacity.roxie_min
max_capacity = var.aks_capacity.roxie_max
min_capacity = 1
Reviewer: This should be zero -- or the entire node pool not built at all -- if Roxie is disabled.

Author: All min_capacity values are 1 now because when I set them to zero, they still show up on the portal as one.

Reviewer: If it is appropriate for a node pool to be spun down to zero, we should set it that way, even if it winds up as a '1' in the portal afterwards. That could be a bug, or it could be an intentional behavior by an upstream module that needs to be corrected. We can at least do it correctly in our code.

Author: Fixed
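
For illustration, a minimal sketch of the suggested behavior, assuming the var.aks_enable_roxie flag used elsewhere in this repo and the var.aks_capacity object shown in the removed lines above (not the repo's final code):

# Sketch: spin the Roxie pool down to zero when Roxie is disabled.
min_capacity = var.aks_enable_roxie ? var.aks_capacity.roxie_min : 0
max_capacity = var.aks_enable_roxie ? var.aks_capacity.roxie_max : 0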

max_capacity = 3
Reviewer: (2) Shouldn't this be determined dynamically, based on the size of the cluster the user wants?

Author: TLH: Added new variable, 'aks_capacity', which defines the minimum and maximum number of nodes in each node pool.
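
A plausible shape for that variable, inferred from the pool names used in aks/locals.tf (roxie, thor, serv, spray); the actual declaration in lite-variables.tf may differ:

variable "aks_capacity" {
  description = "Minimum and maximum node count for each HPCC node pool."
  type = object({
    roxie_min = number
    roxie_max = number
    thor_min  = number
    thor_max  = number
    serv_min  = number
    serv_max  = number
    spray_min = number
    spray_max = number
  })
}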

Reviewer: This size of a Roxie pool is still hardwired. Is that correct?

Author: fixed.

dcamper (Dec 13, 2023): This is not fixed, as of this writing:

min_capacity = 1
max_capacity = 3

tlh response: I don't have any roxie variables I can use to calculate the roxiepool max_capacity.

Reviewer: Please do not edit anyone else's comments; reply instead.

Your comment is a good reply to my original question ("This size of a Roxie pool is still hardwired. Is that correct?").

Is a maximum capacity of 3 appropriate, though? Assuming that Roxie's VM requirements are hardcoded from the helm chart, do you need more than 3 nodes in the pool?

Elsewhere you correctly asserted that the maximum capacity costs no extra money, but it does impose a limit on the resources that can be allocated. If Roxie's helm requirements cause more than 3 nodes to be created, then everything will fail. This is worth double-checking. Related, if you can determine that you will never need 3 nodes, then why not reduce this 3 to whatever is really needed?

Author: Understood

Reviewer: Tim, please address the previous questions.

Author: The max_capacity of all HPCC nodepools is var.aks_nodepools_max_capacity (see commit ed598d8). By default the value is 400. My reasoning is that having a large max_capacity doesn't cost more; the cost increases only when you use nodes. For example, say your min_capacity = 1 but during the execution of HPCC it needs 10 more nodes, so the capacity is increased to 11, and you are then charged for 11.
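
A sketch of what commit ed598d8 appears to describe (the exact declaration and wiring may differ):

variable "aks_nodepools_max_capacity" {
  description = "Upper bound on node count for every HPCC node pool; only nodes actually provisioned incur cost."
  type        = number
  default     = 400
}

# Each HPCC pool in aks/locals.tf would then set:
#   max_capacity = var.aks_nodepools_max_capacity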

labels = {
"lnrs.io/tier" = "standard"
"workload" = "roxiepool"
@@ -31,8 +31,8 @@ locals {
node_type_version = "v2" # v1, v2
node_size = var.aks_node_sizes.thor
single_group = false
min_capacity = var.aks_capacity.thor_min
max_capacity = var.aks_capacity.thor_max
min_capacity = 1
max_capacity = var.aks_thorpool_max_capacity
labels = {
"lnrs.io/tier" = "standard"
"workload" = "thorpool"
@@ -48,8 +48,8 @@ locals {
node_type_version = "v1"
node_size = var.aks_node_sizes.serv
single_group = false
min_capacity = var.aks_capacity.serv_min
max_capacity = var.aks_capacity.serv_max
min_capacity = 1
max_capacity = 3
labels = {
"lnrs.io/tier" = "standard"
"workload" = "servpool"
@@ -65,8 +65,8 @@ locals {
node_type_version = "v1"
node_size = var.aks_node_sizes.spray
single_group = false
min_capacity = var.aks_capacity.spray_min
max_capacity = var.aks_capacity.spray_max
min_capacity = 0
max_capacity = 6
dcamper marked this conversation as resolved.
labels = {
"lnrs.io/tier" = "standard"
"workload" = "spraypool"
2 changes: 1 addition & 1 deletion aks/outputs.tf
@@ -1,6 +1,6 @@
output "advisor_recommendations" {
description = "Advisor recommendations or 'none'"
value = data.azurerm_advisor_recommendations.advisor.recommendations == tolist([])? "none" : data.azurerm_advisor_recommendations.advisor.recommendations
value = data.azurerm_advisor_recommendations.advisor.recommendations
}
output "aks_login" {
description = "Location of the aks credentials"
3 changes: 2 additions & 1 deletion hpcc/hpcc.tf
@@ -1,5 +1,6 @@
module "hpcc" {
source = "[email protected]:hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git"
#source = "[email protected]:hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git"
source = "https://github.com/hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git"

environment = local.metadata.environment
productname = local.metadata.product_name
175 changes: 169 additions & 6 deletions hpcc/lite-locals.tf → lite-locals.tf
100755 → 100644
@@ -1,5 +1,63 @@
output "thor_max_jobs" {
Reviewer: Is this output (and the next several lines) for debugging purposes or did you intend to have them remain for the final product? If the latter, please update the documentation.

Author: These outputs are for debugging. Regarding your comments about the min_capacity and max_capacity in aks/locals.tf: all min_capacity values are 1, and only the max_capacity for thorpool is variable because it is the only one that I can calculate.

Reviewer: OK. I will keep this conversation as unresolved until the debugging statements are removed.

value = var.thor_max_jobs
}
output "thor_num_workers" {
value = var.thor_num_workers
}
output "thor_node_size" {
value = var.aks_node_sizes.thor
}
output "thor_ns_spec" {
value = local.ns_spec[var.aks_node_sizes.thor]
}
output "thor_worker_cpus" {
value = var.thor_worker_cpus
}
output "thorWorkersPerNode" {
value = "local.ns_spec[${var.aks_node_sizes.thor}].cpu / var.thor_worker_cpus = ${local.thorWorkersPerNode}"
}
output "thor_worker_ram" {
value = "local.ns_spec[${var.aks_node_sizes.thor}].ram / local.thorWorkersPerNode = ${local.thor_worker_ram}"
}
output "nodesPer1Job" {
value = "var.thor_num_workers / local.thorWorkersPerNode = ${local.nodesPer1Job}"
}
output "thorpool_max_capacity" {
value = "local.nodesPer1Job * var.thor_max_jobs = ${local.thorpool_max_capacity}"
}
locals {
helm_chart_timeout=600
ns_spec = {
dcamper marked this conversation as resolved.
"large" = {
cpu = 2
ram = 8
}
"xlarge" = {
cpu = 4
ram = 16
}
"2xlarge" = {
cpu = 8
ram = 32
}
"4xlarge" = {
cpu = 16
ram = 64
}
}

twpn = "${ local.ns_spec[var.aks_node_sizes.thor].cpu / var.thor_worker_cpus }"
thorWorkersPerNode = ceil(local.twpn) == local.twpn? local.twpn : "local.thorWorkersPerNode, ${local.twpn}, is not an integer because local.ns_spec[${var.aks_node_sizes.thor}].cpu, ${local.ns_spec[var.aks_node_sizes.thor].cpu}, is not a multiple of var.thor_worker_cpus, ${var.thor_worker_cpus}."

twr = "${local.ns_spec[var.aks_node_sizes.thor].ram / local.thorWorkersPerNode }"
thor_worker_ram = ceil(local.twr) == local.twr? local.twr : "local.thor_worker_ram, ${local.twr}, is not an integer because local.ns_spec[${var.aks_node_sizes.thor}].ram, ${local.ns_spec[var.aks_node_sizes.thor].ram}, is not a multiple of local.thorWorkersPerNode, ${local.thorWorkersPerNode}."

np1j = "${var.thor_num_workers / local.thorWorkersPerNode }"
nodesPer1Job = ceil(local.np1j) == local.np1j? local.np1j : "local.nodesPer1Job, ${local.np1j}, is not an integer because var.thor_num_workers, ${var.thor_num_workers}, is not a multiple of local.thorWorkersPerNode, ${local.thorWorkersPerNode}."

thorpool_max_capacity = ceil("${ local.nodesPer1Job * var.thor_max_jobs }")
Reviewer: This does not work in all cases (one Thor worker, one max job); nodesPer1Job becomes 0.5 and fails validation:

│ Error: Invalid operand
│ 
│   on lite-locals.tf line 66, in locals:
│   66:   thorpool_max_capacity = ceil("${ local.nodesPer1Job * var.thor_max_jobs }")
│     ├────────────────
│     │ local.nodesPer1Job is "local.nodesPer1Job, 0.5, is not an integer because var.thor_num_workers, 1, is not a multiple of local.thorWorkersPerNode, 2."
│ 
│ Unsuitable value for left operand: a number is required.

Author: fixed

Author: I'm responding to your comment, "It is not fixed. See line 94, below, as a simple example." I'm not sure what you're talking about, but I believe it might be code that was commented out (you want me to delete it, and I have). Again, there isn't a reply box immediately after your comment.
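
For context, a sketch of the same sizing arithmetic kept numeric end to end, so the one-worker, one-job case rounds up instead of producing the type error above. This is not the repo's final code; it reuses the ns_spec map and thor_* variables defined in this file:

# Keep every intermediate value a number and round up where division is uneven.
twpn                  = local.ns_spec[var.aks_node_sizes.thor].cpu / var.thor_worker_cpus
thorWorkersPerNode    = max(1, floor(local.twpn))
nodesPer1Job          = ceil(var.thor_num_workers / local.thorWorkersPerNode)
thorpool_max_capacity = local.nodesPer1Job * var.thor_max_jobs
# Worked example matching the failure above: thorWorkersPerNode = 2 and
# thor_num_workers = 1 give nodesPer1Job = ceil(0.5) = 1; with thor_max_jobs = 1
# the pool max capacity is 1 rather than a string/number type error.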


helm_chart_timeout=300
#hpcc_version = "8.6.20"

owner = {
name = var.admin_username
@@ -8,17 +66,80 @@ locals {

owner_name_initials = lower(join("",[for x in split(" ",local.owner.name): substr(x,0,1)]))

tags = var.extra_tags
/*metadata = {
Reviewer: Please delete commented-out code.

Author: fixed

Reviewer: It is not fixed. See line 94, below, as a simple example.

project = format("%shpccplatform", local.owner_name_initials)
product_name = format("%shpccplatform", local.owner_name_initials)
business_unit = "commercial"
environment = "sandbox"
market = "us"
product_group = format("%shpcc", local.owner_name_initials)
resource_group_type = "app"
sre_team = format("%shpccplatform", local.owner_name_initials)
subscription_type = "dev"
additional_tags = { "justification" = "testing" }
location = var.aks_azure_region # Acceptable values: eastus, centralus
}

tags = merge(local.metadata.additional_tags, var.extra_tags)
*/

# # disable_naming_conventions - Disable naming conventions
# # disable_naming_conventions = true
disable_naming_conventions = false

# # auto_launch_eclwatch - Automatically launch ECLWatch web interface.
#auto_launch_eclwatch = true
auto_launch_svc = {
eclwatch = false
}

# azure_auth = {
# # AAD_CLIENT_ID = ""
# # AAD_CLIENT_SECRET = ""
# # AAD_TENANT_ID = ""
# # AAD_PRINCIPAL_ID = ""
# SUBSCRIPTION_ID = ""
# }

# hpcc_container = {
# version = "9.2.0"
# image_name = "platform-core-ln"
# image_root = "jfrog.com/glb-docker-virtual"
# # custom_chart_version = "9.2.0-rc1"
# # custom_image_version = "9.2.0-demo"
# }

# hpcc_container_registry_auth = {
# username = "value"
# password = "value"
# }

internal_domain = var.aks_dns_zone_name // Example: hpcczone.us-hpccsystems-dev.azure.lnrsg.io

external = {}
# external = {
# blob_nfs = [{
# container_id = ""
# container_name = ""
# id = ""
# resource_group_name = var.storage_account_resource_group_name
# storage_account_id = ""
# storage_account_name = var.storage_account_name
# }]
# # hpc_cache = [{
# # id = ""
# # path = ""
# # server = ""
# }]
# hpcc = [{
# name = ""
# planes = list(object({
# local = ""
# remote = ""
# }))
# service = ""
# }]
# }

admin_services_storage_account_settings = {
replication_type = "ZRS" #LRS only if using HPC Cache
@@ -43,6 +164,12 @@ locals {
delete_protection = false
}
}
# hpc_cache = {
# enabled = false
# size = "small"
# cache_update_frequency = "3h"
# storage_account_data_planes = null
# }
}
external = null
}
@@ -63,6 +190,36 @@ locals {
replicas = 6
dcamper marked this conversation as resolved.
nodeSelector = "spraypool"
}

# ldap = {
# ldap_server = "" //Server IP
# dali = {
# hpcc_admin_password = ""
# hpcc_admin_username = ""
# ldap_admin_password = ""
# ldap_admin_username = ""
# adminGroupName = "HPCC-Admins"
# filesBasedn = "ou=files,ou=eclHPCCSysUser,dc=z0lpf,dc=onmicrosoft,dc=com"
# groupsBasedn = "OU=AADDC Users,dc=z0lpf,dc=onmicrosoft,dc=com"
# resourcesBasedn = "ou=smc,ou=espservices,ou=eclHPCCSysUser,dc=z0lpf,dc=onmicrosoft,dc=com"
# systemBasedn = "OU=AADDC Users,dc=z0lpf,dc=onmicrosoft,dc=com"
# usersBasedn = "OU=AADDC Users,dc=z0lpf,dc=onmicrosoft,dc=com"
# workunitsBasedn = "ou=workunits,ou=eclHPCCSysUser,dc=z0lpf,dc=onmicrosoft,dc=com"
# }
# esp = {
# hpcc_admin_password = ""
# hpcc_admin_username = ""
# ldap_admin_password = ""
# ldap_admin_username = ""
# adminGroupName = "HPCC-Admins"
# filesBasedn = "ou=files,ou=eclHPCCSysUser,dc=z0lpf,dc=onmicrosoft,dc=com"
# groupsBasedn = "OU=AADDC Users,dc=z0lpf,dc=onmicrosoft,dc=com"
# resourcesBasedn = "ou=smc,ou=espservices,ou=eclHPCCSysUser,dc=z0lpf,dc=onmicrosoft,dc=com"
# systemBasedn = "OU=AADDC Users,dc=z0lpf,dc=onmicrosoft,dc=com"
# usersBasedn = "OU=AADDC Users,dc=z0lpf,dc=onmicrosoft,dc=com"
# workunitsBasedn = "ou=workunits,ou=eclHPCCSysUser,dc=z0lpf,dc=onmicrosoft,dc=com"
# }
# }

roxie_internal_service = {
name = "iroxie"
@@ -100,6 +257,7 @@ locals {
disabled = (var.aks_enable_roxie == true)? false : true
name = "roxie"
nodeSelector = { workload = "roxiepool" }
# tlh 20231109 numChannels = 2
numChannels = 1
prefix = "roxie"
replicas = 2
@@ -249,7 +407,7 @@ locals {
type = "hthor"
spillPlane = "spill"
resources = {
cpu = "2"
cpu = "1"
memory = "4G"
}
nodeSelector = { workload = "servpool" }
@@ -323,6 +481,7 @@ locals {
throttle = 0
retryinterval = 6
keepResultFiles = false
# egress = "engineEgress"
}

dfuwu-archiver = {
@@ -336,6 +495,7 @@ locals {
cutoff = 14
at = "* * * * *"
throttle = 0
# egress = "engineEgress"
}

dfurecovery-archiver = {
@@ -344,6 +504,7 @@ locals {
limit = 20
cutoff = 4
at = "* * * * *"
# egress = "engineEgress"
}

file-expiry = {
@@ -353,6 +514,7 @@ locals {
persistExpiryDefault = 7
expiryDefault = 4
user = "sasha"
# egress = "engineEgress"
}
}

@@ -385,15 +547,16 @@ locals {
maxGraphs = 2
maxGraphStartupTime = 172800
numWorkersPerPod = 1
#nodeSelector = {}
nodeSelector = { workload = "thorpool" }
egress = "engineEgress"
tolerations_value = "thorpool"
managerResources = {
cpu = 2
memory = "4G"
cpu = 1
memory = "2G"
}
workerResources = {
cpu = 4
cpu = 3
Reviewer: Is 3 the best number here? (I thought it was 2.)

Author: Now, both cpu and memory for 'workerResources' are variable.
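
A minimal sketch of what variable worker resources could look like, reusing the thor_worker_cpus variable and the thor_worker_ram local defined in lite-locals.tf (the repo's actual wiring may differ):

workerResources = {
  cpu    = var.thor_worker_cpus
  memory = format("%dG", local.thor_worker_ram)   # e.g. 4 becomes "4G"
}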

memory = "4G"
}
workerMemory = {
Expand Down