-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Main review mods 2023 12 06 #55
base: main-review-2023-11-17
Are you sure you want to change the base?
Conversation
… folder structure by replacing the subfolders of modules, i.e. aks, logging, storage, and vnet, with the folders in the root by the same names, i.e. aks, logging, storage, and vnet. Also, placed the contents of the folder, 'hpcc' in the root and deleted the folder.
…oy hpcc without external storage.
…d branch 'tlh-bug-fixes-and-personalize-easy-deploy-v0'
…nch: tlh-bug-fixes-and-personalize-easy-deploy-v0. Also, in hpcc/hpcc.tf module 'hpcc' now sources Godji's opinionated hpcc instead my my local clone of it. As of this date, 2023/09/22, Godji has not merged a change I made to my local opinionated. So, hpcc will not deploy until he merges or you switch to sourcing a local clone of my opnionated hpcc.
…if one wants a thor cluster
Add ecl code security
…hing branch:add-terraform-to-deploy-everything
…hing branch:add-terraform-to-deploy-everything
…ariables branch:aks-is-now-using-easy-deploy-variables
…ariables branch:aks-is-now-using-easy-deploy-variables
… so lite.auto.tfvars is correctly copied to hpcc and/or aks directories.
…ariables branch:aks-is-now-using-easy-deploy-variables. Changed scripts/deploy…
…ariables branch:aks-is-now-using-easy-deploy-variables
…ariables branch:aks-is-now-using-easy-deploy-variables
branch:few-changes-20231020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few new comments, but be aware that there are still older outstanding conversations as well.
aks/locals.tf
Outdated
node_type_version = "v2" # v1, v2 | ||
node_size = var.aks_serv_node_size | ||
single_group = false | ||
min_capacity = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify this min_capacity value. Is it really needed? I am somewhat suspicious of it, if we have trimmed the HPCC cluster down at all then we may not need 4 nodes at minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the nodepool called 'hpccpool'. This is the only nodepool used by the hpcc cluster when 'aks_4nodepools=false'. I set 'min_capacity=4' because the terraform was always using 4 nodepools before I added this new option. I assume you want to know if the hpcc cluster can use less than 4 nodes. I have verified that it can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify the previous comment.
lite-variables.tf
Outdated
variable "aks_azure_region" { | ||
type = string | ||
description = "REQUIRED. The Azure region abbreviation in which to create these resources.\nMust be one of [\"eastus\", \"eastus2\", \"centralus\"].\nExample entry: eastus2" | ||
validation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is noted elsewhere, but: We should remove the region constraints. What is there now, which was brought over from old code, pertains to our group and not to OSS. Users should be free to place their clusters anywhere in the world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
lite-variables.tf
Outdated
} | ||
|
||
variable "aks_enable_roxie" { | ||
description = "REQUIRED. Enable ROXIE?\nThis will also expose port 8002 on the cluster.\nExample entry: false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this port number correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Should have been 18002.
lite.auto.tfvars.example
Outdated
#------------------------------------------------------------------------------ | ||
|
||
# Enable ROXIE? | ||
# This will also expose port 8002 on the cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 8002 correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Changed to 18002.
…d replaced with floor values and '1' if result is 0.
…d replaced with floor values and '1' if result is 0.
…scription' from outputs.tf
@@ -1,6 +1,7 @@ | |||
module "hpcc" { | |||
#source = "[email protected]:hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git" | |||
source = "github.com/hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git" | |||
#source = "/home/azureuser/tlhumphrey2/RBA-terraform-azurerm-hpcc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented-out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some commented-out code I don't want to remove until we are fairly sure most everything works properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In aks/locals.tf, I have changed to min_capacity of hpccpool and servpool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this conversation re: commented-out code as unresolved until it is completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have spoke with Godji and Milind. Godji said that he has the role of Owner and has never tried a role with fewer permissions than Owner. Milind says he has always used TFE to deploy the RBA hpcc terraform, but, believes if you use the terraform outside of TFE you would need elevated permissions.
@@ -30,11 +30,11 @@ module "metadata" { | |||
project = local.metadata.project | |||
} | |||
|
|||
resource "null_resource" "delete_ephemeral_storage_accounts" { | |||
/*resource "null_resource" "delete_ephemeral_storage_accounts" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented-out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some commented-out code I don't want to delete until we are sure everything works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving unresolved until the commented-out code is removed.
lite-variables.tf
Outdated
} | ||
|
||
variable "aks_enable_roxie" { | ||
description = "REQUIRED. Enable ROXIE?\nThis will also expose port 8002 on the cluster.\nExample entry: false" | ||
description = "REQUIRED if you want roxie. Enable ROXIE?\nThis will also expose port 18002 on the cluster.\nExample entry: false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"if you want roxie" is incorrect, strictly speaking. If someone omitted the value entirely, because they don't want a Roxie, would the plan fail?
Also, note the capitalization of "Roxie"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. lite-variables.tf no longer says roxie is required.
lite-variables.tf
Outdated
} | ||
|
||
variable "aks_enable_roxie" { | ||
description = "REQUIRED. Enable ROXIE?\nThis will also expose port 8002 on the cluster.\nExample entry: false" | ||
description = "REQUIRED if you want roxie. Enable ROXIE?\nThis will also expose port 18002 on the cluster.\nExample entry: false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: ECL Watch now uses port 443 instead of 18010, correct? This same thing needs to happen with Roxie: It should use port 443 instead of 18002. 443 is the default https port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't changed to port 443, because when I did, the terraform didn't work properly. So, I'm waiting until we have most issues solved before trying again.
aks/locals.tf
Outdated
node_size = var.aks_node_sizes.roxie | ||
single_group = false | ||
min_capacity = 1 | ||
max_capacity = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tim, please address the previous questions.
aks/locals.tf
Outdated
node_type_version = "v2" # v1, v2 | ||
node_size = var.aks_serv_node_size | ||
single_group = false | ||
min_capacity = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify the previous comment.
@@ -0,0 +1,16 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has the previous issue been resolved? Are any storage accounts created and then destroyed if the user only wants ephemeral storage?
lite-locals.tf
Outdated
} | ||
workerMemory = { | ||
query = "3G" | ||
thirdParty = "500M" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query value is really workerResources.memory - thirdParty, rounded down. Please change the hardcoded 3G to that computation.
The goal of this block is to divide workResources.memory into "memory for Thor" and "memory for 3rd party tools".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed workerMemory.query to the same value as workerResources.memory. But, I don't know what you want thirdParty to be?????
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your commit 81eca56 you changed the query value to exactly the same value as workerResources.memory, which is not what was requested. Your change runs the risk of over-allocation of RAM, leading to an Out Of Memory exception and a killed pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit "ed598d8".
…ols are set to this. Also, fixed workerMemory.query and workerMemory.thirdParty.
lite-locals.tf
Outdated
query = format("%dG", local.thor_worker_ram) | ||
thirdParty = "500M" | ||
query = format(%dG", local.workerMemory_query) | ||
thirdParty = format(%dG", local.workerMemory_thirdParty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the same as the original. The hardcoded value was "500M" but your computed value is "500G" -- definitely an error or OOM condition if a Thor job starts.
…capacity changed for hpccpool & servpool. In lite-locals.tf, Corrected workerMemory values.
In lite-locals.tf, I corrected the calculation of workMemory.query.
From: Dan S. Camper ***@***.***>
Sent: Friday, January 5, 2024 6:29 AM
To: hpccsystems-solutions-lab/terraform-azurerm-hpcc-lite ***@***.***>
Cc: Humphrey, Timothy L (RIS-HBE) ***@***.***>; Author ***@***.***>
Subject: Re: [hpccsystems-solutions-lab/terraform-azurerm-hpcc-lite] Main review mods 2023 12 06 (PR #55)
*** External email: use caution ***
@dcamper commented on this pull request.
________________________________
In lite-locals.tf<#55 (comment)>:
@@ -432,8 +433,8 @@ locals {
memory = format("%dG", local.thor_worker_ram)
}
workerMemory = {
- query = format("%dG", local.thor_worker_ram)
- thirdParty = "500M"
+ query = format(%dG", local.workerMemory_query)
+ thirdParty = format(%dG", local.workerMemory_thirdParty)
This is not the same as the original. The hardcoded value was "500M" but your computed value is "500G" -- definitely an error or OOM condition if a Thor job starts.
-
Reply to this email directly, view it on GitHub<#55 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAIUZYF7GHALK4XKQDLMAR3YM752FAVCNFSM6AAAAABAJMATMSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQMBVHE3DQMJVGY>.
You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>
…________________________________
The information contained in this e-mail message is intended only for the personal and confidential use of the recipient(s) named above. This message may be an attorney-client communication and/or work product and as such is privileged and confidential. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail, and delete the original message.
|
@@ -1,6 +1,7 @@ | |||
module "hpcc" { | |||
#source = "[email protected]:hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git" | |||
source = "github.com/hpccsystems-solutions-lab/tlh-opinionated-terraform-azurerm-hpcc.git" | |||
#source = "/home/azureuser/tlhumphrey2/RBA-terraform-azurerm-hpcc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this conversation re: commented-out code as unresolved until it is completed.
@@ -30,11 +30,11 @@ module "metadata" { | |||
project = local.metadata.project | |||
} | |||
|
|||
resource "null_resource" "delete_ephemeral_storage_accounts" { | |||
/*resource "null_resource" "delete_ephemeral_storage_accounts" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving unresolved until the commented-out code is removed.
@@ -0,0 +1,16 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please continue to work on this issue; leaving unresolved.
…nd 2) the need for 28 vCPUs before deploying HPCC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No new comments, but there are still unresolved conversations. In addition, the issue of not being able to deploy into the eastus2 region is still unresolved.
OK. Now, I'm working on the issue of not being able to deploy into eastus2. As far as I know there shouldn't be unresolved conversations. How to I find any unresolved conversations?
From: Dan S. Camper ***@***.***>
Sent: Monday, January 8, 2024 9:22 AM
To: hpccsystems-solutions-lab/terraform-azurerm-hpcc-lite ***@***.***>
Cc: Humphrey, Timothy L (RIS-HBE) ***@***.***>; Author ***@***.***>
Subject: Re: [hpccsystems-solutions-lab/terraform-azurerm-hpcc-lite] Main review mods 2023 12 06 (PR #55)
*** External email: use caution ***
@dcamper requested changes on this pull request.
No new comments, but there are still unresolved conversations. In addition, the issue of not being able to deploy into the eastus2 region is still unresolved.
-
Reply to this email directly, view it on GitHub<#55 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAIUZYEKKDGXD7UIAPQ7RVTYNQMLDAVCNFSM6AAAAABAJMATMSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQMBZGUZTANRVGM>.
You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>
…________________________________
The information contained in this e-mail message is intended only for the personal and confidential use of the recipient(s) named above. This message may be an attorney-client communication and/or work product and as such is privileged and confidential. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail, and delete the original message.
|
…ged availability_zone from 1 to 2. Why? In eastus2, getting error message saying hpccpool could not be created because constrints too restrictive.
…y? 300 was not enough time for HPCC to fully deploy on my azure vm Standard_B2s
…must be installed).
No description provided.