
Commands are shown in blue and their output in red.
Enter the following command to check the status of the Kubernetes pods:
kubectl get pods -n kubeflow
The output shows the pod status. Two pods, katib-controller and ml-pipeline, are stuck in CrashLoopBackOff:
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-5ff6bc6ddf-wnh4n 1/1 Running 0 8m51s
cache-server-7d48869657-zxpg7 2/2 Running 0 8m50s
centraldashboard-6bd5bc75f4-wcfs7 2/2 Running 0 8m53s
jupyter-web-app-deployment-757f5fd8c5-n4jgc 2/2 Running 0 8m51s
katib-controller-69cb7d8444-lkv87 0/1 CrashLoopBackOff 5 (35s ago) 8m50s
katib-db-manager-695bd86c4f-gwgcl 1/1 Running 5 (2m52s ago) 8m54s
katib-mysql-5dfcbbc87f-zxnv4 1/1 Running 0 8m53s
katib-ui-785dd58497-m7z4w 2/2 Running 0 8m52s
kserve-controller-manager-64fdb76d68-559zj 2/2 Running 0 8m51s
kserve-models-web-app-f7fcfc48b-2zfjd 2/2 Running 0 8m50s
kubeflow-pipelines-profile-controller-68469d866f-f5w2h 1/1 Running 0 8m54s
metacontroller-0 1/1 Running 0 8m48s
metadata-envoy-deployment-677d8c6fb9-q5j7j 1/1 Running 0 8m52s
metadata-grpc-deployment-76d6fb49f8-bhmlv 2/2 Running 5 (2m10s ago) 8m52s
metadata-writer-589bc65748-fgmlq 2/2 Running 1 (55s ago) 8m51s
minio-847b65dd88-s9t5n 2/2 Running 0 8m51s
ml-pipeline-5b85c7746f-jvffh 1/2 CrashLoopBackOff 6 (88s ago) 8m50s
ml-pipeline-persistenceagent-7b5ffffc6c-vzkp9 2/2 Running 0 8m54s
ml-pipeline-scheduledworkflow-548ccdfb65-x7bk9 2/2 Running 0 8m53s
ml-pipeline-ui-64fb4d9ccd-gq9p8 2/2 Running 0 8m52s
ml-pipeline-viewer-crd-5f9b548cdb-b9zmt 2/2 Running 0 8m52s
ml-pipeline-visualizationserver-fc7dd6c75-f2g5j 2/2 Running 0 8m51s
mysql-767f4d9f9b-k8nk9 2/2 Running 0 8m51s
notebook-controller-deployment-54f9b8c88b-nlj8t 2/2 Running 0 8m51s
profiles-deployment-7555868994-w9l4c 3/3 Running 0 8m51s
pvcviewer-controller-manager-b6b48785d-bh862 3/3 Running 0 8m54s
tensorboard-controller-deployment-6f8754b5ff-tr4x2 3/3 Running 0 8m53s
tensorboards-web-app-deployment-5fcf78b64-crd7r 2/2 Running 0 8m53s
training-operator-79cc5c4557-6m95r 1/1 Running 0 8m53s
volumes-web-app-deployment-75d4d59b65-9lsgr 2/2 Running 0 8m52s
workflow-controller-55ff8d6489-2t9h8 2/2 Running 0 8m52s
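To pick the problem pods out of a long listing like this, the output can be filtered with awk. This is a minimal sketch: the function name not_ready_pods is hypothetical, and it assumes the default kubectl get pods column order (NAME, READY, STATUS, ...). It prints every pod whose READY count is incomplete or whose STATUS is not Running.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the names of pods that are not fully ready or not in the Running status.
not_ready_pods() {
  awk 'NR > 1 {
    # READY looks like "1/2": ready containers / total containers
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage (requires access to the cluster):
#   kubectl get pods -n kubeflow | not_ready_pods
```

On the listing above, this would print katib-controller-69cb7d8444-lkv87 and ml-pipeline-5b85c7746f-jvffh.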
When a pod is in CrashLoopBackOff, view its logs with the following command (ml-pipeline is used here):
kubectl logs -f pod/ml-pipeline-5b85c7746f-jvffh -n kubeflow --all-containers
The log output shows why the container keeps crashing; the fatal line ends with "too many open files":
{"level":"info","ts":"2025-02-05T08:04:23Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
F0205 08:04:23.014066 1 config.go:46] config=main.Config{CertFile:"/etc/webhook/certs/tls.crt", KeyFile:"/etc/webhook/certs/tls.key"} Error: too many open files
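The important line is the one starting with F0205. It uses the glog format (Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg), where a leading F marks a fatal error that terminates the process, which is consistent with the restart loop. "too many open files" is the message text of the EMFILE error, and inotify_init(2) returns EMFILE when, among other causes, the per-user limit on inotify instances is reached. A sketch for pulling such lines out of a long log (the function name find_emfile_fatal is hypothetical):

```shell
# Hypothetical helper: reads log text on stdin and prints glog-style fatal
# lines (starting with "Fmmdd hh:mm:ss.uuuuuu ") that contain EMFILE's message.
# Exit status is 0 if at least one such line was found, non-zero otherwise.
find_emfile_fatal() {
  grep -E '^F[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ ' | grep 'too many open files'
}

# Usage: drop -f so that kubectl exits after printing the current logs.
#   kubectl logs pod/ml-pipeline-5b85c7746f-jvffh -n kubeflow --all-containers | find_emfile_fatal
```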
To raise the maximum number of files that can be watched through inotify, set fs.inotify.max_user_watches to a very large value:
sudo sysctl -w fs.inotify.max_user_watches=2099999999
The command echoes the new watch limit:
fs.inotify.max_user_watches = 2099999999
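The current values can be read back at any time from /proc/sys/fs/inotify/, which is where sysctl fs.inotify gets them. A small sketch that prints all three limits changed in this post; the function name show_inotify_limits is hypothetical, and the optional directory argument only exists so the function can be pointed at a test directory.

```shell
# Hypothetical helper: prints the three inotify limits in sysctl's
# "key = value" format. $1 optionally overrides the /proc directory.
show_inotify_limits() {
  dir=${1:-/proc/sys/fs/inotify}
  for key in max_user_watches max_user_instances max_queued_events; do
    printf 'fs.inotify.%s = %s\n' "$key" "$(cat "$dir/$key")"
  done
}

# Usage (reading the values does not require root):
#   show_inotify_limits
```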
To raise the maximum number of inotify instances a single user can create, set fs.inotify.max_user_instances to a very large value as well:
sudo sysctl -w fs.inotify.max_user_instances=2099999999
The command echoes the new per-user instance limit:
fs.inotify.max_user_instances = 2099999999
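Because "too many open files" from an inotify-based watcher often points at max_user_instances, it can help to see how many inotify instances are actually open. Each instance appears as a file descriptor in /proc/PID/fd whose link target is anon_inode:inotify. The sketch below (hypothetical function name count_inotify_instances) counts them across all processes. Note that it does not split the count by user, while the limit applies per user, and that it only sees other users' processes when run as root.

```shell
# Hypothetical helper: counts open inotify instances by finding fd symlinks
# whose target is "anon_inode:inotify". $1 optionally overrides /proc.
count_inotify_instances() {
  root=${1:-/proc}
  # Unreadable or vanished processes are skipped silently (2>/dev/null).
  find "$root"/[0-9]*/fd -maxdepth 1 -lname 'anon_inode:inotify' 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage (from a root shell, e.g. after `sudo -s`):
#   count_inotify_instances
```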
To raise the maximum number of events that can be queued on a single inotify instance, set fs.inotify.max_queued_events to a very large value too:
sudo sysctl -w fs.inotify.max_queued_events=2099999999
The command echoes the new queued-event limit:
fs.inotify.max_queued_events = 2099999999
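Values set with sysctl -w only last until the next reboot. To keep them, they are usually written to a file under /etc/sysctl.d/ and loaded with sudo sysctl --system. A sketch that generates such a file for the three settings above; the function name inotify_sysctl_conf and the file name 99-inotify.conf are hypothetical choices.

```shell
# Hypothetical helper: prints a sysctl.d-style config fragment that sets the
# three inotify limits used in this post to the value given in $1.
inotify_sysctl_conf() {
  value=$1
  for key in max_user_watches max_user_instances max_queued_events; do
    printf 'fs.inotify.%s = %s\n' "$key" "$value"
  done
}

# Usage: write the file, then reload all sysctl configuration files.
#   inotify_sysctl_conf 2099999999 | sudo tee /etc/sysctl.d/99-inotify.conf
#   sudo sysctl --system
```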
Run the pod status command again:
kubectl get pods -n kubeflow
Check the pod status again. katib-controller and ml-pipeline are now Running with all of their containers ready:
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-5ff6bc6ddf-wnh4n 1/1 Running 0 8m51s
cache-server-7d48869657-zxpg7 2/2 Running 0 8m50s
centraldashboard-6bd5bc75f4-wcfs7 2/2 Running 0 8m53s
jupyter-web-app-deployment-757f5fd8c5-n4jgc 2/2 Running 0 8m51s
katib-controller-69cb7d8444-lkv87 1/1 Running 8 8m50s
katib-db-manager-695bd86c4f-gwgcl 1/1 Running 5 8m54s
katib-mysql-5dfcbbc87f-zxnv4 1/1 Running 0 8m53s
katib-ui-785dd58497-m7z4w 2/2 Running 0 8m52s
kserve-controller-manager-64fdb76d68-559zj 2/2 Running 0 8m51s
kserve-models-web-app-f7fcfc48b-2zfjd 2/2 Running 0 8m50s
kubeflow-pipelines-profile-controller-68469d866f-f5w2h 1/1 Running 0 8m54s
metacontroller-0 1/1 Running 0 8m48s
metadata-envoy-deployment-677d8c6fb9-q5j7j 1/1 Running 0 8m52s
metadata-grpc-deployment-76d6fb49f8-bhmlv 2/2 Running 5 8m52s
metadata-writer-589bc65748-fgmlq 2/2 Running 1 8m51s
minio-847b65dd88-s9t5n 2/2 Running 0 8m51s
ml-pipeline-5b85c7746f-jvffh 2/2 Running 7 8m50s
ml-pipeline-persistenceagent-7b5ffffc6c-vzkp9 2/2 Running 0 8m54s
ml-pipeline-scheduledworkflow-548ccdfb65-x7bk9 2/2 Running 0 8m53s
ml-pipeline-ui-64fb4d9ccd-gq9p8 2/2 Running 0 8m52s
ml-pipeline-viewer-crd-5f9b548cdb-b9zmt 2/2 Running 0 8m52s
ml-pipeline-visualizationserver-fc7dd6c75-f2g5j 2/2 Running 0 8m51s
mysql-767f4d9f9b-k8nk9 2/2 Running 0 8m51s
notebook-controller-deployment-54f9b8c88b-nlj8t 2/2 Running 0 8m51s
profiles-deployment-7555868994-w9l4c 3/3 Running 0 8m51s
pvcviewer-controller-manager-b6b48785d-bh862 3/3 Running 0 8m54s
tensorboard-controller-deployment-6f8754b5ff-tr4x2 3/3 Running 0 8m53s
tensorboards-web-app-deployment-5fcf78b64-crd7r 2/2 Running 0 8m53s
training-operator-79cc5c4557-6m95r 1/1 Running 0 8m53s
volumes-web-app-deployment-75d4d59b65-9lsgr 2/2 Running 0 8m52s
workflow-controller-55ff8d6489-2t9h8 2/2 Running 0 8m52s
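Pods can take a while to leave CrashLoopBackOff because Kubernetes waits longer between each restart, so it can be convenient to poll instead of re-running the command by hand. A minimal sketch; the function name wait_until_all_ready is hypothetical, and it treats a pod as healthy when its READY column is complete (for example 2/2) and its STATUS is Running. The built-in kubectl wait --for=condition=Ready pod --all -n kubeflow --timeout=300s is an alternative.

```shell
# Hypothetical helper: runs the given command (which must print
# `kubectl get pods`-style output) up to $1 times, $2 seconds apart,
# and returns 0 as soon as every pod is fully ready and Running.
wait_until_all_ready() {
  attempts=$1
  interval=$2
  shift 2
  n=0
  while [ "$n" -lt "$attempts" ]; do
    # awk exits 1 if any pod row is not ready, 0 otherwise.
    if "$@" | awk 'NR > 1 {
         split($2, ready, "/")
         if (ready[1] != ready[2] || $3 != "Running") bad = 1
       }
       END { exit (bad ? 1 : 0) }'; then
      return 0
    fi
    n=$((n + 1))
    sleep "$interval"
  done
  return 1
}

# Usage: check every 10 seconds, for up to 30 attempts.
#   wait_until_all_ready 30 10 kubectl get pods -n kubeflow
```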
