
How to Fix a Kubernetes Pod in CrashLoopBackOff: "too many open files" Error

by 해피ing, February 5, 2025

In each step below, the command to enter is given first, followed by its output.

Enter the following command to check the status of the Pods in the kubeflow namespace.
kubectl get pods -n kubeflow
The output shows two Pods, katib-controller and ml-pipeline, stuck in CrashLoopBackOff:
NAME                                                     READY   STATUS             RESTARTS        AGE
admission-webhook-deployment-5ff6bc6ddf-wnh4n            1/1     Running            0               8m51s
cache-server-7d48869657-zxpg7                            2/2     Running            0               8m50s
centraldashboard-6bd5bc75f4-wcfs7                        2/2     Running            0               8m53s
jupyter-web-app-deployment-757f5fd8c5-n4jgc              2/2     Running            0               8m51s
katib-controller-69cb7d8444-lkv87                        0/1     CrashLoopBackOff   5 (35s ago)     8m50s
katib-db-manager-695bd86c4f-gwgcl                        1/1     Running            5 (2m52s ago)   8m54s
katib-mysql-5dfcbbc87f-zxnv4                             1/1     Running            0               8m53s
katib-ui-785dd58497-m7z4w                                2/2     Running            0               8m52s
kserve-controller-manager-64fdb76d68-559zj               2/2     Running            0               8m51s
kserve-models-web-app-f7fcfc48b-2zfjd                    2/2     Running            0               8m50s
kubeflow-pipelines-profile-controller-68469d866f-f5w2h   1/1     Running            0               8m54s
metacontroller-0                                         1/1     Running            0               8m48s
metadata-envoy-deployment-677d8c6fb9-q5j7j               1/1     Running            0               8m52s
metadata-grpc-deployment-76d6fb49f8-bhmlv                2/2     Running            5 (2m10s ago)   8m52s
metadata-writer-589bc65748-fgmlq                         2/2     Running            1 (55s ago)     8m51s
minio-847b65dd88-s9t5n                                   2/2     Running            0               8m51s
ml-pipeline-5b85c7746f-jvffh                             1/2     CrashLoopBackOff   6 (88s ago)     8m50s
ml-pipeline-persistenceagent-7b5ffffc6c-vzkp9            2/2     Running            0               8m54s
ml-pipeline-scheduledworkflow-548ccdfb65-x7bk9           2/2     Running            0               8m53s
ml-pipeline-ui-64fb4d9ccd-gq9p8                          2/2     Running            0               8m52s
ml-pipeline-viewer-crd-5f9b548cdb-b9zmt                  2/2     Running            0               8m52s
ml-pipeline-visualizationserver-fc7dd6c75-f2g5j          2/2     Running            0               8m51s
mysql-767f4d9f9b-k8nk9                                   2/2     Running            0               8m51s
notebook-controller-deployment-54f9b8c88b-nlj8t          2/2     Running            0               8m51s
profiles-deployment-7555868994-w9l4c                     3/3     Running            0               8m51s
pvcviewer-controller-manager-b6b48785d-bh862             3/3     Running            0               8m54s
tensorboard-controller-deployment-6f8754b5ff-tr4x2       3/3     Running            0               8m53s
tensorboards-web-app-deployment-5fcf78b64-crd7r          2/2     Running            0               8m53s
training-operator-79cc5c4557-6m95r                       1/1     Running            0               8m53s
volumes-web-app-deployment-75d4d59b65-9lsgr              2/2     Running            0               8m52s
workflow-controller-55ff8d6489-2t9h8                     2/2     Running            0               8m52s
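Picking the failing Pods out of a long listing by eye is error-prone. The following is a minimal sketch that filters a table like the one above. It assumes the default kubectl get pods layout, in which STATUS is the third whitespace-separated column. crashloop_pods is a hypothetical helper name and is not part of kubectl.

```shell
# Print the names of Pods whose STATUS column is CrashLoopBackOff.
# Reads the default `kubectl get pods` table from stdin (NAME READY STATUS ...).
# The header row never matches because its third field is the literal "STATUS".
crashloop_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Example (requires cluster access):
#   kubectl get pods -n kubeflow | crashloop_pods
```

CrashLoopBackOff is a container waiting reason, not a Pod phase. Such Pods usually report the phase Running, so they cannot be selected with --field-selector status.phase. That is why the sketch parses the printed table instead.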
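For a Pod in CrashLoopBackOff, view its logs with the following command. The --all-containers flag includes every container in the Pod, and -f keeps streaming new lines.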
kubectl logs -f pod/ml-pipeline-5b85c7746f-jvffh -n kubeflow --all-containers
The log shows why the container keeps crashing: the process exits with "too many open files".
{"level":"info","ts":"2025-02-05T08:04:23Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
F0205 08:04:23.014066       1 config.go:46] config=main.Config{CertFile:"/etc/webhook/certs/tls.crt", KeyFile:"/etc/webhook/certs/tls.key"} Error: too many open files
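You can script the check for this particular error. The following is a minimal sketch. has_too_many_open_files is a hypothetical helper that reads log text on standard input and succeeds only if the message appears.

```shell
# Exit with status 0 if the log text on stdin contains the EMFILE message
# "too many open files"; exit with status 1 otherwise.
has_too_many_open_files() {
  grep -q 'too many open files'
}

# Example (requires cluster access; -f is omitted so kubectl exits after printing):
#   kubectl logs pod/ml-pipeline-5b85c7746f-jvffh -n kubeflow --all-containers \
#     | has_too_many_open_files && echo "inotify limits are likely exhausted"
```

If a container has just restarted, kubectl logs also accepts --previous (-p) to print the logs of that container's last terminated instance.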
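"too many open files" is the message for the EMFILE error. The log shows that the process was watching its TLS certificate files. File watchers on Linux are usually built on inotify, and creating a new inotify instance fails with EMFILE once the user's limit on inotify instances is reached. Containers share the node's kernel, so raise the inotify limits on the node where the Pods run. First, raise the maximum number of files and directories a single user can watch (fs.inotify.max_user_watches) to a very large value: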
sudo sysctl -w fs.inotify.max_user_watches=2099999999
sysctl -w prints the new value:
fs.inotify.max_user_watches = 2099999999
Next, raise the maximum number of inotify instances a single user can create (fs.inotify.max_user_instances):
sudo sysctl -w fs.inotify.max_user_instances=2099999999
Check the new maximum number of inotify instances per user:
fs.inotify.max_user_instances = 2099999999
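Before raising the limits further, it helps to see how many inotify instances are actually open on the node. The following is a minimal sketch, assuming a Linux node with /proc mounted. Each open inotify instance appears as a file descriptor whose link target is anon_inode:inotify, so counting those descriptors gives an approximate total across all users. The limit applies to each user separately, so this total is an upper bound for any single user. count_inotify_instances is a hypothetical name.

```shell
# Approximate the number of open inotify instances on this node by counting
# file descriptors under /proc whose symlink target is "anon_inode:inotify".
# Without root, descriptors of other users' processes are not readable and are
# silently skipped. On systems without /proc, the result is 0.
count_inotify_instances() {
  find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l | tr -d ' '
}

# Example (run as root on the node, then compare with the configured limit):
#   count_inotify_instances
#   sysctl fs.inotify.max_user_instances
```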
Finally, raise the maximum number of events that can be queued to a single inotify instance (fs.inotify.max_queued_events):
sudo sysctl -w fs.inotify.max_queued_events=2099999999
Check the new maximum number of queued events:
fs.inotify.max_queued_events = 2099999999
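Values set with sysctl -w apply only to the running kernel and are lost on reboot. On distributions that read /etc/sysctl.d/ at boot (for example, through systemd-sysctl), you can make the same limits persistent with a drop-in file. The following is a minimal sketch. inotify_sysctl_conf is a hypothetical helper that prints the drop-in, and the file name 99-inotify.conf is an arbitrary choice.

```shell
# Print a sysctl.d drop-in containing the three inotify limits set above.
# Takes an optional value for all three keys; defaults to 2099999999.
inotify_sysctl_conf() {
  local value="${1:-2099999999}"
  local key
  for key in max_user_watches max_user_instances max_queued_events; do
    printf 'fs.inotify.%s = %s\n' "$key" "$value"
  done
}

# Example (run on the node; 99-inotify.conf is a hypothetical file name):
#   inotify_sysctl_conf | sudo tee /etc/sysctl.d/99-inotify.conf >/dev/null
#   sudo sysctl --system
```

sudo sysctl --system reloads every system configuration file, so the new drop-in takes effect without a reboot.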
Run the Pod status command again:
kubectl get pods -n kubeflow
All Pods, including katib-controller and ml-pipeline, are now Running:
NAME                                                     READY   STATUS    RESTARTS   AGE
admission-webhook-deployment-5ff6bc6ddf-wnh4n            1/1     Running   0          8m51s
cache-server-7d48869657-zxpg7                            2/2     Running   0          8m50s
centraldashboard-6bd5bc75f4-wcfs7                        2/2     Running   0          8m53s
jupyter-web-app-deployment-757f5fd8c5-n4jgc              2/2     Running   0          8m51s
katib-controller-69cb7d8444-lkv87                        1/1     Running   8          8m50s
katib-db-manager-695bd86c4f-gwgcl                        1/1     Running   5          8m54s
katib-mysql-5dfcbbc87f-zxnv4                             1/1     Running   0          8m53s
katib-ui-785dd58497-m7z4w                                2/2     Running   0          8m52s
kserve-controller-manager-64fdb76d68-559zj               2/2     Running   0          8m51s
kserve-models-web-app-f7fcfc48b-2zfjd                    2/2     Running   0          8m50s
kubeflow-pipelines-profile-controller-68469d866f-f5w2h   1/1     Running   0          8m54s
metacontroller-0                                         1/1     Running   0          8m48s
metadata-envoy-deployment-677d8c6fb9-q5j7j               1/1     Running   0          8m52s
metadata-grpc-deployment-76d6fb49f8-bhmlv                2/2     Running   5          8m52s
metadata-writer-589bc65748-fgmlq                         2/2     Running   1          8m51s
minio-847b65dd88-s9t5n                                   2/2     Running   0          8m51s
ml-pipeline-5b85c7746f-jvffh                             2/2     Running   7          8m50s
ml-pipeline-persistenceagent-7b5ffffc6c-vzkp9            2/2     Running   0          8m54s
ml-pipeline-scheduledworkflow-548ccdfb65-x7bk9           2/2     Running   0          8m53s
ml-pipeline-ui-64fb4d9ccd-gq9p8                          2/2     Running   0          8m52s
ml-pipeline-viewer-crd-5f9b548cdb-b9zmt                  2/2     Running   0          8m52s
ml-pipeline-visualizationserver-fc7dd6c75-f2g5j          2/2     Running   0          8m51s
mysql-767f4d9f9b-k8nk9                                   2/2     Running   0          8m51s
notebook-controller-deployment-54f9b8c88b-nlj8t          2/2     Running   0          8m51s
profiles-deployment-7555868994-w9l4c                     3/3     Running   0          8m51s
pvcviewer-controller-manager-b6b48785d-bh862             3/3     Running   0          8m54s
tensorboard-controller-deployment-6f8754b5ff-tr4x2       3/3     Running   0          8m53s
tensorboards-web-app-deployment-5fcf78b64-crd7r          2/2     Running   0          8m53s
training-operator-79cc5c4557-6m95r                       1/1     Running   0          8m53s
volumes-web-app-deployment-75d4d59b65-9lsgr              2/2     Running   0          8m52s
workflow-controller-55ff8d6489-2t9h8                     2/2     Running   0          8m52s
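In the output above, the RESTARTS counts of katib-controller and ml-pipeline went up. This suggests the containers recovered on one of Kubernetes' automatic restarts. The kubelet restarts crashing containers with an exponentially increasing back-off delay capped at five minutes, so recovery can take a while. Restarting the owning Deployment creates new Pods immediately. The following is a minimal sketch, assuming standard Deployment-generated Pod names (Deployment name, ReplicaSet hash, random suffix). deployment_from_pod is a hypothetical helper.

```shell
# Derive a Deployment name from a Pod name by dropping the last two
# dash-separated fields (the ReplicaSet hash and the random Pod suffix).
# Valid only for Pods created by a Deployment whose names were not truncated.
deployment_from_pod() {
  local pod="$1"
  printf '%s\n' "${pod%-*-*}"
}

# Example (requires cluster access):
#   kubectl rollout restart "deployment/$(deployment_from_pod ml-pipeline-5b85c7746f-jvffh)" -n kubeflow
#   kubectl get pods -n kubeflow -w
```

This name-based shortcut does not apply to StatefulSet Pods such as metacontroller-0. For those Pods, or whenever a name looks unusual, read the owner from the Pod's metadata.ownerReferences instead.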