Question
I am pretty desperate searching for a solution to this. I am running a Kubernetes Cluster (v1.16.7) on AWS.
Node specs: an Amazon EC2 t3.medium instance with 4GB RAM, AMI k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17, kernel 4.9.0-7-amd64.
My main problem is that I see increased memory usage in the kernel, which leads to faster memory starvation issues in my node. More specifically:
free -m:

              total        used        free      shared  buff/cache   available
Mem:           3895        3470         130           3         294         204
Swap:             0           0           0
This shows that my actual used memory (excluding cache and reclaimable memory) is currently around 3.4GB.
Also, the output of sudo smem -twk:

Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory          1.5G     184.1M       1.3G
userspace memory               2.2G     111.1M       2.1G
free memory                  125.5M     125.5M          0
----------------------------------------------------------
                               3.8G     420.7M       3.4G
This matches the output of free in the following way (a quick cross-check follows the list):

- used column in free = smem kernel Noncache + userspace Noncache = 3.4GB
- buff/cache column in free = smem kernel Cache + userspace Cache = 294MB
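The cross-check itself, assuming the same free and smem versions as above (smem's -t flag prints the totals row used here):

# free(1)'s "used" is total - free - buff/cache, so it should equal
# smem's kernel Noncache + userspace Noncache:
free -m | awk '/^Mem:/ {print $2 - $4 - $6 " MB used = total - free - buff/cache"}'
sudo smem -twk | tail -n1   # last line holds the Used / Cache / Noncache totals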
kubectl top node also matches the userspace memory in smem, showing around 2.2GB, and so do the totals from top and ps aux for the running processes.
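For completeness, this is the kind of cross-check meant above (the node name is a placeholder; kubectl top needs metrics-server installed):

# Sum the resident set size of all processes and compare with smem's
# "userspace memory" row (~2.2G here). RSS double-counts shared pages;
# smem's PSS column is the more precise figure.
ps aux | awk 'NR > 1 {rss += $6} END {printf "total RSS: %.1f MiB\n", rss/1024}'
sudo smem -tk | tail -n1
kubectl top node <node-name>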
However, my /proc/meminfo shows:
MemTotal: 3989436 kB
MemFree: 133272 kB
MemAvailable: 209416 kB
Buffers: 10472 kB
Cached: 255628 kB
SwapCached: 0 kB
Active: 2340712 kB
Inactive: 80612 kB
Active(anon): 2156712 kB
Inactive(anon): 1752 kB
Active(file): 184000 kB
Inactive(file): 78860 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 1404 kB
Writeback: 0 kB
AnonPages: 2155264 kB
Mapped: 111500 kB
Shmem: 3220 kB
Slab: 121856 kB
SReclaimable: 36260 kB
SUnreclaim: 85596 kB
KernelStack: 17440 kB
PageTables: 32972 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1994716 kB
Committed_AS: 8704948 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 518120 kB
DirectMap2M: 3614720 kB
DirectMap1G: 0 kB
shows total kernel memory usage of Slab + SReclaimable + SUnreclaim ≈ 238MB (and Slab already includes the other two fields, so the distinct total is even smaller), which is nowhere near the 1.3GB shown by smem and reflected in the used column of free.
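One way to size the gap is to total the kernel-side fields that /proc/meminfo does itemize and compare that with what smem attributes to the kernel (a rough sketch; field availability varies by kernel version):

# Kernel memory that /proc/meminfo accounts for explicitly.
# Note: Slab already includes SReclaimable and SUnreclaim.
awk '/^(Slab|KernelStack|PageTables|Bounce):/ {sum += $2}
     END {printf "itemized kernel memory: %.0f MiB\n", sum/1024}' /proc/meminfo
# Anything smem's "kernel dynamic memory" shows beyond this sum (for example
# socket buffers or other page allocations made outside the slab allocator)
# has no dedicated meminfo counter on a 4.9 kernel.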
So where is the extra kernel memory being spent?
Are there any other ways to check where kernel memory is used?
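For reference, the obvious generic places to look beyond /proc/meminfo (run as root; output formats vary by kernel version):

# Slab usage per cache, sorted by cache size:
sudo slabtop -o -s c | head -20
# Raw per-cache numbers:
sudo head /proc/slabinfo
# vmalloc allocations (VmallocUsed is reported as 0 on recent kernels,
# so sum the per-allocation sizes instead):
sudo awk '{sum += $2} END {printf "vmalloc: %.0f MiB\n", sum/1024/1024}' /proc/vmallocinfo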
Thanks!
UPDATE
After much trial and error with the configuration, the problem has been narrowed down to the FluentD logging system.
We have an in-app logging mechanism that targets a FluentD service via a TCP @type forward source, which then ships the records to ElasticSearch through an @type elasticsearch match. The same FluentD service also captures local logfiles and sends them to Elastic without any problem, so it seems to have something to do with the TCP communication...
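For reference, a minimal sketch of what such a forward-to-Elasticsearch pipeline typically looks like in FluentD terms; the port, host and match pattern here are illustrative assumptions, not the actual values from our chart:

<source>
  @type forward          # in-app loggers send records over TCP to this port
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type elasticsearch    # fluent-plugin-elasticsearch output
  host elasticsearch-client
  port 9200
  logstash_format true
</match>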
The image used is quay.io/fluentd_elasticsearch/fluentd:v3.1.0 from the v11.3.0 helm chart at https://github.com/kokuwaio/helm-charts/tree/main/charts/fluentd-elasticsearch
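Given the TCP suspicion, kernel socket-buffer memory is worth watching while the forwarder is under load; it is counted as kernel memory but never shows up in Slab (the TCP/UDP "mem" values below are in 4 KiB pages):

# Global socket memory accounting:
cat /proc/net/sockstat
# Per-socket send/receive buffer usage for established TCP connections:
ss -tm state established | head -40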
Source: https://stackoverflow.com/questions/65024698/high-kernel-memory-usage-in-kubernetes-node