Unable to launch managed postgres db in cluster #138

Closed
shrutebattlestargalactica opened this issue Oct 19, 2023 · 16 comments

Comments

@shrutebattlestargalactica

shrutebattlestargalactica commented Oct 19, 2023

Hello @rooftopcellist! I am having some issues with deploying EDA into our EKS cluster:

eda-manager.log

It is stuck on:

TASK [postgres : Wait for Database to initialize if managed DB]:

{"level":"info","ts":1697747113.740688,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

I deployed via the recommended operator with ArgoCD/Kustomize.

Are we not yet able to use an external DB, similar to what we do with AWX? Thanks for your help!

@shrutebattlestargalactica
Author

@rooftopcellist I am wondering if this is related to this similar issue in the awx-operator: ansible/awx-operator#1022

I forgot to mention that we run this EKS cluster on k8s v1.28.

@kurokobo
Contributor

Perhaps your PostgreSQL pod is not starting properly. Please check the status of your PostgreSQL pod and resolve whatever is causing it to fail to start.

In most cases for this issue (and the issue that was reported on the AWX Operator repo), the cause is in your environment or your configuration, not in the Operator.

@shrutebattlestargalactica
Author

shrutebattlestargalactica commented Oct 23, 2023

That's correct, the pod is crashing. I checked the pod logs and saw this:

mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

The eda-manager container shows this error after the DB task:

TASK [postgres : Wait for Database to initialize if managed DB]:

{"level":"info","ts":1697747113.740688,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

I have a very vanilla setup for the deployment:

eda.yaml

apiVersion: eda.ansible.com/v1alpha1
kind: EDA
metadata:
  name: awxnonprod-eda
  namespace: eda
spec:
  automation_server_url: https://awxdev.mydomain.com
  service_type: nodeport
  ingress_type: none
  service_labels: |
    environment: nonprod
    team: technology
  hostname: edadev.mydomain.com

The PV and PVC are created successfully, but I'm not sure why it's throwing this permission error when creating the userdata directory.

Ultimately I'd like to use an external RDS instance, but I am unsure how to set up the secret for EDA. I am already doing this with our AWX instance, so I am curious whether we can use the same settings.

@shrutebattlestargalactica
Author

shrutebattlestargalactica commented Oct 24, 2023

Alright, I ended up getting the external RDS Postgres working! I was mimicking my secret from the AWX configuration and had to remove the "type: unmanaged" entry from it. The working secret is below:


apiVersion: v1
kind: Secret
metadata:
  name: eda-nonprod-postgres-configuration
  namespace: eda-nonprod
stringData:
  host: "mydatabase.rds.amazonaws.com"
  port: "5432"
  database: "mydatabase"
  username: "supersecretusername"
  password: "supersecretpassword"
  sslmode: "prefer"
type: Opaque
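
A minimal sketch of how an EDA resource might reference a secret like this one. The `database.database_secret` field is assumed here rather than confirmed in this thread, and the resource name `eda-nonprod` is hypothetical, so verify both against the eda-server-operator documentation for your operator version.

```yaml
apiVersion: eda.ansible.com/v1alpha1
kind: EDA
metadata:
  name: eda-nonprod             # hypothetical resource name
  namespace: eda-nonprod        # same namespace as the Secret
spec:
  automation_server_url: https://awxdev.mydomain.com
  database:
    # Assumed field name: points the operator at the Secret that holds
    # host, port, database, username, password, and sslmode.
    database_secret: eda-nonprod-postgres-configuration
```

With an external database configured this way, the managed PostgreSQL pod (and its PV/PVC) should not be needed, which sidesteps the permission error on the data directory.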

@watsonb

watsonb commented Oct 24, 2023

I'm running into the same permission issue:

mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

The PV/PVC are created successfully. This worked for me in 0.0.4 of the operator, but it is not working in 0.0.7.

@watsonb

watsonb commented Oct 24, 2023

OK, weird, after just letting it sit and 7 crash backoff loops later, it was able to create the directory. Must be some timing issue.

@goldyfruit

goldyfruit commented Nov 29, 2023

I'm facing the same issue, but it does not resolve itself after a while:

NAME                                                     READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
eda-postgres-13-0                                        0/1     Pending   0          10s   <none>          <none>         <none>           <none>
eda-redis-99c7f6c8f-9qr98                                1/1     Running   0          68s   10.233.66.56    k8s-worker-4   <none>           <none>
eda-server-operator-controller-manager-8f4cb5d7d-k785p   2/2     Running   0          79m   10.233.98.129   k8s-worker-2   <none>           <none>
eda-postgres-13-0                                        0/1     Pending   0          27s   <none>          k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     ContainerCreating   0          27s   <none>          k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     ContainerCreating   0          44s   <none>          k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Running             0          46s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               0          47s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               1 (2s ago)   48s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    1 (2s ago)   49s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               2 (23s ago)   70s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    2 (5s ago)    74s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               3 (32s ago)   101s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    3 (2s ago)    103s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Running             4 (54s ago)   2m35s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               4 (55s ago)   2m36s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    4 (9s ago)    2m44s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Running             5 (84s ago)   3m59s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               5 (85s ago)   4m      10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    5 (2s ago)    4m1s    10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               6 (2m43s ago)   6m42s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    6 (2s ago)      6m43s   10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Running             7 (5m5s ago)    11m     10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               7 (5m7s ago)    11m     10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    7 (3s ago)      11m     10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               8 (5m12s ago)   16m     10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    8 (7s ago)      17m     10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     Error               9 (5m4s ago)    22m     10.233.66.34    k8s-worker-4   <none>           <none>
eda-postgres-13-0                                        0/1     CrashLoopBackOff    9 (3s ago)      22m     10.233.66.34    k8s-worker-4   <none>           <none>
# kubectl logs -f --tail 50 -n eda eda-postgres-13-0
mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

@goldyfruit

It seems to work with NFS but not with iSCSI.

@rchaud

rchaud commented Apr 4, 2024

I am having this issue in Operator 1.0.1. Why was this issue closed?

rooftopcellist reopened this Apr 4, 2024
@rooftopcellist
Member

Maybe this is the same issue that was resolved here...

The sclorg postgresql image uses UID 26, whereas the postgres:13 image from Docker Hub uses root (UID 0). That seems to create permissions issues when running on k8s. There are no issues on OpenShift, which is why I didn't notice it.

I haven't confirmed that this is the case here as well, but it's my best guess at the moment.

@msmagnanijr
Contributor

I'm creating some roles that could be quite handy for your test. They are a bit dated, so you'll need to go over the images, tags, and such. For example, the redis image is now quay.io/fedora/redis-6.

https://github.com/msmagnanijr/eda-server/tree/AAP-15757/automation

@rchaud

rchaud commented Apr 5, 2024

@rooftopcellist I have created a PR to allow users to extend the security context. I was able to get it working in K3s by setting the user and group to 1001.

#190
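
The generic Kubernetes mechanism behind a fix like this can be sketched as a pod-level `securityContext`. This is a plain Pod fragment for illustration only, not the operator's own setting from #190. The pod name, claim name, and image placeholder are hypothetical, and `fsGroup` is an added assumption beyond the user and group mentioned above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-permissions-demo        # hypothetical name
spec:
  securityContext:
    runAsUser: 1001      # UID the container processes run as
    runAsGroup: 1001     # primary GID of the container processes
    fsGroup: 1001        # assumption: asks the kubelet to make supported volumes group-owned by this GID
  containers:
    - name: postgres
      image: "REPLACE_WITH_POSTGRES_IMAGE"   # placeholder, not a real image reference
      volumeMounts:
        - name: data
          mountPath: /var/lib/pgsql/data     # path from the error message in this thread
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data             # hypothetical claim name
```

Whether `fsGroup` is honored depends on the volume plugin, so results can differ between storage backends.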

@rooftopcellist
Member

@chinochao Can you try out the changes in #193 (now merged into the main branch) and see if that solves the permissions issue for you?

@rooftopcellist
Member

I believe this is resolved now. Please open another issue if that is not the case.
