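This demo walks through building performance and monitoring metrics on AWS EKS using only AWS-native services, without any third-party solution.
Configuration and setting values may differ depending on your actual environment, versions, and architecture.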
What is CloudWatch Container Insights?
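As container-based applications and microservices become more widely adopted, teams are introducing a variety of monitoring architectures and solutions.
CloudWatch Container Insights is a feature of Amazon CloudWatch, one of the AWS-native services. It is used to reliably collect monitoring data in container environments and to analyze performance and failures.
It gives immediate insight into compute utilization and errors for new and existing cluster infrastructure and containerized applications running on container platforms such as Kubernetes, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon ECS, and AWS Fargate. Based on the collected data, you can respond to the situations that arise in container environments and simplify what has to be managed, which improves developer productivity.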
※ EKS cluster architecture created in advance for this demo
In this hands-on, we install Fluent Bit as a DaemonSet on an existing EKS cluster to ship logs to CloudWatch Logs and collect the cluster's metrics, then configure CloudWatch Container Insights to display them as visual metrics.
Log architectures commonly use the ELK stack, but in this hands-on we use Fluent Bit in place of the L (Logstash).
※ The following blog post compares Logstash and Fluent Bit item by item, so it is shared here for reference.
Source: https://techblog.gccompany.co.kr/eks-환경에서의-efk-도입기-e8a92695e991
Installing the CloudWatch Agent and Fluent Bit
# Create a directory to manage the manifest files
mkdir -p manifests/cloudwatch-insight && cd manifests/cloudwatch-insight
# pwd
/home/hiid/kubernates/manifests/cloudwatch-insight
# Create a namespace named amazon-cloudwatch
kubectl create ns amazon-cloudwatch
# Set a few values, then install the CloudWatch agent and Fluent Bit
ClusterName=[Created ClusterName] # replace with the name of the cluster you created
RegionName=$AWS_REGION
FluentBitHttpPort='2020' # port used by the Fluent Bit HTTP server
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
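The two `[[ ... ]]` one-liners above derive dependent settings: the tail flag is the inverse of the head flag, and the HTTP server is switched on only when a port is set. A minimal sketch of the same logic as a function, which may be easier to check before running `sed`. The function name `derive_fluent_bit_toggles` is hypothetical and not part of the AWS quick start.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the two [[ ... ]] one-liners above.
# Args: $1 = FluentBitReadFromHead ('On' or 'Off'), $2 = FluentBitHttpPort (may be empty)
derive_fluent_bit_toggles() {
  local read_from_head="$1" http_port="$2"
  local read_from_tail http_server

  # Read_From_Tail is always the opposite of Read_from_Head.
  if [[ "${read_from_head}" == 'On' ]]; then
    read_from_tail='Off'
  else
    read_from_tail='On'
  fi

  # The built-in HTTP server is enabled only when a port is given.
  if [[ -z "${http_port}" ]]; then
    http_server='Off'
  else
    http_server='On'
  fi

  echo "tail=${read_from_tail} http_server=${http_server}"
}

# Values used in this hands-on: prints "tail=On http_server=On"
derive_fluent_bit_toggles 'Off' '2020'
```

With the values from this hands-on, Fluent Bit reads only new log lines (tail) and exposes its HTTP server on port 2020.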
The YAML file below is the CloudWatch agent Container Insights template distributed by AWS.
We download this file, modify some of its settings, and deploy the DaemonSets from it.
The detailed settings and commands are listed below, so you can follow along directly.
# Download the file
wget https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml
# Apply the environment variable values to the file
sed -i 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' cwagent-fluent-bit-quickstart.yaml
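Because the `sed` command edits the file in place, a placeholder that was missed is easy to overlook. The sketch below shows the same substitution idea on a two-line sample and then checks that no `{{...}}` placeholder remains. The function names and the sample values (`demo`, `ap-northeast-2`) are assumptions for illustration, and unlike the command above it uses the `g` flag and writes to stdout instead of using `-i`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill two of the template placeholders from stdin to stdout.
render_template() {
  local cluster="$1" region="$2"
  sed -e "s/{{cluster_name}}/${cluster}/g" \
      -e "s/{{region_name}}/${region}/g"
}

# Hypothetical helper: succeeds (exit 0) only when stdin has no {{placeholder}} left.
no_placeholders_left() {
  ! grep -q '{{[a-z_]*}}'
}

# Two lines shaped like the fluent-bit-cluster-info ConfigMap in the template.
sample='cluster.name: {{cluster_name}}
logs.region: {{region_name}}'

rendered="$(printf '%s\n' "${sample}" | render_template demo ap-northeast-2)"
printf '%s\n' "${rendered}"

if printf '%s\n' "${rendered}" | no_placeholders_left; then
  echo "all placeholders replaced"
fi
```

The same check can be pointed at the real file, for example `no_placeholders_left < cwagent-fluent-bit-quickstart.yaml`, before running `kubectl apply`.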
#-- cwagent-fluent-bit-quickstart.yaml--#
# create amazon-cloudwatch namespace
apiVersion: v1
kind: Namespace
metadata:
  name: amazon-cloudwatch
  labels:
    name: amazon-cloudwatch
---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cloudwatch-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets", "deployments"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cwagent-clusterleader"]
    verbs: ["get","update"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get", "list", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cloudwatch-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: cloudwatch-agent
    namespace: amazon-cloudwatch
roleRef:
  kind: ClusterRole
  name: cloudwatch-agent-role
  apiGroup: rbac.authorization.k8s.io
---
# create configmap for cwagent config
apiVersion: v1
data:
  # Configuration is in Json format. No matter what configure change you make,
  # please keep the Json blob valid.
  cwagentconfig.json: |
    {
      "agent": {
        "region": "northeast-2"
      },
      "logs": {
        "metrics_collected": {
          "kubernetes": {
            "cluster_name": "demo",
            "metrics_collection_interval": 60
          }
        },
        "force_flush_interval": 5
      }
    }
kind: ConfigMap
metadata:
  name: cwagentconfig
  namespace: amazon-cloudwatch
---
# deploy cwagent as daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      name: cloudwatch-agent
  template:
    metadata:
      labels:
        name: cloudwatch-agent
    spec:
      containers:
        - name: cloudwatch-agent
          image: public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300028.1b210
          #ports:
          #  - containerPort: 8125
          #    hostPort: 8125
          #    protocol: UDP
          resources:
            limits:
              cpu: 400m
              memory: 400Mi
            requests:
              cpu: 400m
              memory: 400Mi
          # Please don't change below envs
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CI_VERSION
              value: "k8s/1.3.17"
          # Please don't change the mountPath
          volumeMounts:
            - name: cwagentconfig
              mountPath: /etc/cwagentconfig
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
        - name: cwagentconfig
          configMap:
            name: cwagentconfig
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      terminationGracePeriodSeconds: 60
      serviceAccountName: cloudwatch-agent
---
# create configmap for cluster name and aws region for CloudWatch Logs
# need to replace the placeholders demo and northeast-2
# and need to replace "On" and "2020"
# and need to replace "Off" and "On"
apiVersion: v1
data:
  cluster.name: demo
  logs.region: northeast-2
  http.server: "On"
  http.port: "2020"
  read.head: "Off"
  read.tail: "On"
kind: ConfigMap
metadata:
  name: fluent-bit-cluster-info
  namespace: amazon-cloudwatch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-role
rules:
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
      - nodes
      - nodes/proxy
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-role
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: amazon-cloudwatch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush 5
        Grace 30
        Log_Level info
        Daemon off
        Parsers_File parsers.conf
        HTTP_Server ${HTTP_SERVER}
        HTTP_Listen 0.0.0.0
        HTTP_Port ${HTTP_PORT}
        storage.path /var/fluent-bit/state/flb-storage/
        storage.sync normal
        storage.checksum off
        storage.backlog.mem_limit 5M

    @INCLUDE application-log.conf
    @INCLUDE dataplane-log.conf
    @INCLUDE host-log.conf

  application-log.conf: |
    [INPUT]
        Name tail
        Tag application.*
        Exclude_Path /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        DB /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Rotate_Wait 30
        storage.type filesystem
        Read_from_Head ${READ_FROM_HEAD}

    [INPUT]
        Name tail
        Tag application.*
        Path /var/log/containers/fluent-bit*
        multiline.parser docker, cri
        DB /var/fluent-bit/state/flb_log.db
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Read_from_Head ${READ_FROM_HEAD}

    [INPUT]
        Name tail
        Tag application.*
        Path /var/log/containers/cloudwatch-agent*
        multiline.parser docker, cri
        DB /var/fluent-bit/state/flb_cwagent.db
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Read_from_Head ${READ_FROM_HEAD}

    [FILTER]
        Name kubernetes
        Match application.*
        Kube_URL https://kubernetes.default.svc:443
        Kube_Tag_Prefix application.var.log.containers.
        Merge_Log On
        Merge_Log_Key log_processed
        K8S-Logging.Parser On
        K8S-Logging.Exclude Off
        Labels Off
        Annotations Off
        Use_Kubelet On
        Kubelet_Port 10250
        Buffer_Size 0

    [OUTPUT]
        Name cloudwatch_logs
        Match application.*
        region ${AWS_REGION}
        log_group_name /aws/containerinsights/${CLUSTER_NAME}/application
        log_stream_prefix ${HOST_NAME}-
        auto_create_group true
        extra_user_agent container-insights

  dataplane-log.conf: |
    [INPUT]
        Name systemd
        Tag dataplane.systemd.*
        Systemd_Filter _SYSTEMD_UNIT=docker.service
        Systemd_Filter _SYSTEMD_UNIT=containerd.service
        Systemd_Filter _SYSTEMD_UNIT=kubelet.service
        DB /var/fluent-bit/state/systemd.db
        Path /var/log/journal
        Read_From_Tail ${READ_FROM_TAIL}

    [INPUT]
        Name tail
        Tag dataplane.tail.*
        Path /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        multiline.parser docker, cri
        DB /var/fluent-bit/state/flb_dataplane_tail.db
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Rotate_Wait 30
        storage.type filesystem
        Read_from_Head ${READ_FROM_HEAD}

    [FILTER]
        Name modify
        Match dataplane.systemd.*
        Rename _HOSTNAME hostname
        Rename _SYSTEMD_UNIT systemd_unit
        Rename MESSAGE message
        Remove_regex ^((?!hostname|systemd_unit|message).)*$

    [FILTER]
        Name aws
        Match dataplane.*
        imds_version v2

    [OUTPUT]
        Name cloudwatch_logs
        Match dataplane.*
        region ${AWS_REGION}
        log_group_name /aws/containerinsights/${CLUSTER_NAME}/dataplane
        log_stream_prefix ${HOST_NAME}-
        auto_create_group true
        extra_user_agent container-insights

  host-log.conf: |
    [INPUT]
        Name tail
        Tag host.dmesg
        Path /var/log/dmesg
        Key message
        DB /var/fluent-bit/state/flb_dmesg.db
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Read_from_Head ${READ_FROM_HEAD}

    [INPUT]
        Name tail
        Tag host.messages
        Path /var/log/messages
        Parser syslog
        DB /var/fluent-bit/state/flb_messages.db
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Read_from_Head ${READ_FROM_HEAD}

    [INPUT]
        Name tail
        Tag host.secure
        Path /var/log/secure
        Parser syslog
        DB /var/fluent-bit/state/flb_secure.db
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Refresh_Interval 10
        Read_from_Head ${READ_FROM_HEAD}

    [FILTER]
        Name aws
        Match host.*
        imds_version v2

    [OUTPUT]
        Name cloudwatch_logs
        Match host.*
        region ${AWS_REGION}
        log_group_name /aws/containerinsights/${CLUSTER_NAME}/host
        log_stream_prefix ${HOST_NAME}.
        auto_create_group true
        extra_user_agent container-insights

  parsers.conf: |
    [PARSER]
        Name syslog
        Format regex
        Regex ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key time
        Time_Format %b %d %H:%M:%S

    [PARSER]
        Name container_firstline
        Format regex
        Regex (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name cwagent_firstline
        Format regex
        Regex (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
    version: v1
    kubernetes.io/cluster-service: "true"
# Official DaemonSet documentation: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      # Added to the fluent-bit DaemonSet for this demo: do not schedule onto Fargate nodes.
      # affinity is a pod-level field, so it goes under spec.template.spec.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/compute-type
                    operator: NotIn
                    values:
                      - fargate
      containers:
        - name: fluent-bit
          image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
          imagePullPolicy: Always
          env:
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: logs.region
            - name: CLUSTER_NAME
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: cluster.name
            - name: HTTP_SERVER
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: http.server
            - name: HTTP_PORT
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: http.port
            - name: READ_FROM_HEAD
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: read.head
            - name: READ_FROM_TAIL
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: read.tail
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CI_VERSION
              value: "k8s/1.3.17"
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 500m
              memory: 100Mi
          volumeMounts:
            # Please don't change below read-only permissions
            - name: fluentbitstate
              mountPath: /var/fluent-bit/state
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
            - name: runlogjournal
              mountPath: /run/log/journal
              readOnly: true
            - name: dmesg
              mountPath: /var/log/dmesg
              readOnly: true
      terminationGracePeriodSeconds: 10
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
        - name: fluentbitstate
          hostPath:
            path: /var/fluent-bit/state
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
        - name: runlogjournal
          hostPath:
            path: /run/log/journal
        - name: dmesg
          hostPath:
            path: /var/log/dmesg
      serviceAccountName: fluent-bit
# If additional requirements call for other settings, adjust the DaemonSet values accordingly, then deploy the file.
kubectl apply -f cwagent-fluent-bit-quickstart.yaml
※ This hands-on aims only at setting up Container Insights with no special requirements. It will be updated later to cover Config settings for a wider range of requirements.
Verifying the deployment (kubectl)
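After `kubectl apply`, the pods take a short while to start on every node. One way to wait for that is `kubectl rollout status`, sketched below for both DaemonSets. The function name `wait_for_daemonsets`, the `KUBECTL` override variable, and the 120-second timeout are assumptions made for this sketch, not part of the AWS quick start.

```shell
#!/usr/bin/env bash
# Sketch: block until each named DaemonSet has finished rolling out.
# KUBECTL is a hypothetical override so the function can be tried with a stub command.
wait_for_daemonsets() {
  local namespace="$1"
  shift
  local ds
  for ds in "$@"; do
    # Stop at the first DaemonSet that fails or times out.
    "${KUBECTL:-kubectl}" rollout status "daemonset/${ds}" \
      --namespace "${namespace}" --timeout=120s || return 1
  done
}

# Usage against a real cluster:
#   wait_for_daemonsets amazon-cloudwatch cloudwatch-agent fluent-bit
```

The function returns a non-zero status as soon as one rollout fails, so it can gate later steps in a script.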
kubectl get po -n amazon-cloudwatch
# Three cloudwatch-agent pods and three fluent-bit pods are created (a DaemonSet runs one pod per node)
kubectl get daemonsets -n amazon-cloudwatch
# Confirm that the two DaemonSets exist
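Reading the `kubectl get daemonsets` table by eye works for a demo, but the DESIRED and READY columns can also be compared in a script. A minimal sketch, assuming the default column order of `kubectl get daemonsets --no-headers` (NAME, DESIRED, CURRENT, READY, ...). The function name `check_daemonsets_ready` and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get daemonsets --no-headers` output on stdin.
# Column 2 is DESIRED and column 4 is READY; any mismatch is reported and fails the check.
check_daemonsets_ready() {
  awk '$2 != $4 { printf "%s not ready: %s/%s\n", $1, $4, $2; bad = 1 }
       END { exit bad }'
}

# Sample rows shaped like the output for a three-node cluster (illustrative values).
sample='cloudwatch-agent 3 3 3 3 3 kubernetes.io/os=linux 5m
fluent-bit 3 3 3 3 3 <none> 5m'

if printf '%s\n' "${sample}" | check_daemonsets_ready; then
  echo "all DaemonSets ready"
fi

# Against a real cluster:
#   kubectl get daemonsets -n amazon-cloudwatch --no-headers | check_daemonsets_ready
```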
Management Console View
Once everything is deployed to the cluster, the AWS Management Console shows the current cluster state visually, along with performance metrics broken down by pod, namespace, and so on.
CloudWatch Container Insights Map View
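The map view renders the current cluster layout as a map, so you can see how the services relate and connect to each other.
Per-service performance monitoring for EKS shows performance metrics for each service, and you can drill down into the details of each pod.
Performance metrics are also available per pod, and you can sort the columns by a range of filters while monitoring.
The application and performance logs of each service can be queried in CloudWatch Logs Insights.
That concludes setting up Container Insights on EKS by deploying the AWS CloudWatch agent as a DaemonSet.
The hands-on steps themselves are very simple and take little time. With a Config that reflects the requirements of a real service, though, this setup can support cloud-native monitoring of microservices and a range of activities built on the collected performance metrics.
It should also be a workable monitoring option for smaller projects where the license cost of a third-party solution is a burden.
Thank you.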
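The same Logs Insights queries can be started from the AWS CLI. The sketch below assumes the CloudWatch agent writes its performance events to the `/aws/containerinsights/<cluster>/performance` log group, alongside the `application`, `dataplane`, and `host` groups configured for Fluent Bit above. The function names, the `AWS_CLI` override variable, and the 60-minute default window are hypothetical, and the query string is only an example of averaging node CPU utilization.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the performance log group name for a cluster.
performance_log_group() {
  printf '/aws/containerinsights/%s/performance\n' "$1"
}

# Hypothetical helper: start a Logs Insights query over the last N minutes.
# AWS_CLI is an assumed override so the call can be tried with a stub command.
start_node_cpu_query() {
  local cluster="$1" minutes="${2:-60}"
  local end_time start_time
  end_time="$(date +%s)"
  start_time="$(( end_time - minutes * 60 ))"
  "${AWS_CLI:-aws}" logs start-query \
    --log-group-name "$(performance_log_group "${cluster}")" \
    --start-time "${start_time}" \
    --end-time "${end_time}" \
    --query-string 'stats avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | sort avg_node_cpu_utilization desc'
}

# Prints the log group that would be queried for the demo cluster name.
performance_log_group demo

# Against a real account:
#   start_node_cpu_query demo 60
#   aws logs get-query-results --query-id <queryId returned above>
```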