how to speed/improve memcpy

nbari · Sep 7, 2021

I am running a Redis (version 6.2.5) "sentinel" cluster of 3 nodes (1 master, 2 replicas) on FreeBSD 13. The dataset is approximately 70GB, the system has 320GB RAM and SSD disks, and the servers (dedicated, not VMs) are in the same network/datacenter, so hardware is probably not the problem. However, I have started to notice that after BGSAVE finishes there is a lag of approximately 10 seconds, and because of this the applications randomly get this error: NOREPLICAS Not enough good replicas to write. Setting min-replicas-max-lag to 20 helps, but I would like to know if there is something I could fine-tune to speed up memcpy, which stands out in the flame graph (GitHub discussion here: https://github.com/redis/redis/discussions/9457).

The current /boot/loader.conf:

Code:

# Set Max size = 32GB
vfs.zfs.arc_max="34359738368"
# Min size = 4GB
vfs.zfs.arc_min="4294967296"


kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"

In /etc/sysctl.conf

Code:

kern.ipc.shm_use_phys=1

For building the kernel I have this in /etc/make.conf

Code:

CFLAGS=         -O2 -pipe -fno-strict-aliasing
COPTFLAGS=      -O2 -pipe -fno-strict-aliasing

BUILD_OPTIMIZED=        YES
BUILD_STATIC=           YES
OPTIMIZED_CFLAGS=       YES
WITHOUT_DEBUG=          YES
WITH_CPUFLAGS=          YES
WITH_OPTIMIZED_CFLAGS=  YES
MALLOC_PRODUCTION=      YES

In the flame graph, next to the Redis stack there is one from ZFS:

Here also I notice that ZFS is using memcpy:

Currently trying with vm.pmap.pg_ps_enabled=0

SirDice · Sep 7, 2021

nbari said:
For building the kernel I have this in /etc/make.conf

Remove those CFLAGS and COPTFLAGS.

diizzy · Sep 7, 2021

...and BUILD_STATIC probably doesn't help either, but it most likely won't fix your main issue.
You probably want to set/define CPUTYPE so you can take advantage of new(er) instructions.

Alain De Vos · Sep 7, 2021

Why this tuning ?

Code:

kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"

mark_j · Sep 7, 2021

That tuning's for things like databases communicating over ipc/shared memory - Oracle is big on all that stuff, particularly on Solaris.
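For a sense of scale, the values quoted above can be converted to more readable units. This is a minimal sketch in portable sh arithmetic. The helper name to_gib is made up for illustration, and the 4096-byte page size applied to kern.ipc.shmall (which, as far as I know, is counted in pages on FreeBSD) is an assumption about amd64.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole GiB (1 GiB = 1073741824 bytes).
to_gib() {
    echo $(( $1 / 1073741824 ))
}

PAGE_SIZE=4096   # assumed base page size on amd64

echo "vfs.zfs.arc_max: $(to_gib 34359738368) GiB"
echo "vfs.zfs.arc_min: $(to_gib 4294967296) GiB"
echo "kern.ipc.shmmax: $(to_gib 137438953472) GiB"
# shmall is assumed to be a page count, so multiply by the page size first.
echo "kern.ipc.shmall: $(to_gib $(( 33554432 * PAGE_SIZE ))) GiB"
```

Under that assumption both shared-memory limits work out to 128 GiB, and the ARC bounds match the 32GB/4GB comments in loader.conf.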

nbari · Sep 7, 2021

Alain De Vos said:
Why this tuning ?

Code:

kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"

As mark_j mentioned, I normally use those settings for PostgreSQL and thought they could help improve Redis performance (trying to keep everything in RAM).

diizzy, regarding BUILD_STATIC/CPUTYPE, is there any URL/doc or something specific?

Currently, I am rebuilding kernel & world with CFLAGS/COPTFLAGS removed, as SirDice advised.

SirDice · Sep 7, 2021

nbari said:
I normally use those settings for PostgreSQL and thought they could help improve Redis performance (trying to keep everything in RAM).

PostgreSQL uses shared memory (which is where those settings are for). Redis however does not.

diizzy · Sep 7, 2021

nbari
https://cgit.freebsd.org/src/tree/share/examples/etc/make.conf#n25 will probably help
Regarding static vs dynamic, I should probably rephrase that as "it depends"; ffmpeg, for instance, is (or at least used to be) a lot slower when compiled as a static binary. Dynamic linking is also the generally preferred way of linking binaries and libraries in FreeBSD.

mark_j · Sep 7, 2021

Back to the original question, "if there is something I could fine-tune to speed up memcpy": there probably isn't. Clang (and gcc) are very good at inlining the code to optimise it. I guess you could experiment with __SSE2__ or __SSE3__ code paths in your own version of memcpy, built with -fno-builtin-memcpy (thus restricting the optimisation to x86 chips)?
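Before experimenting along those lines, it may help to see which SIMD feature macros the compiler actually defines for a given set of flags. This is a minimal sketch, assuming cc is clang or gcc (both accept -dM -E to dump predefined macros). The function name simd_macros is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the SSE/AVX feature macros that the compiler
# predefines for the given flags. ${CC:-cc} falls back to the system compiler.
simd_macros() {
    ${CC:-cc} "$@" -dM -E -x c /dev/null | grep -E '__(SSE|AVX)' | sort
}

# Example invocations (output depends on the compiler and CPU):
#   simd_macros                 # baseline for the default target
#   simd_macros -march=native   # what the build host's CPU would enable
```

On amd64, __SSE2__ normally shows up in the baseline list already, because SSE2 is part of the base instruction set. Only newer extensions such as __SSE3__ or __AVX2__ depend on the -march/CPUTYPE setting.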

The only way to really speed it up is to get faster RAM or, where possible, to use pointers and move pointers rather than copying the data; but that assumes modifying this "redis" stuff.

If this stuff is all in memory, then judicious use of mmap would seem a better approach. Then again, I don't know what redis does, it might just do so.

_martin · Sep 7, 2021

I second mark_j's opinion. memcpy (memmove) comes from libc; for amd64 it is lib/libc/amd64/string/memmove.S. A note there says it uses no SIMD operations.
But writing your own memcpy (and making it better than the current one) could be a hard task.

Alain De Vos · Sep 8, 2021

when i run,

Code:

pkg info | awk '{print $1}' | xargs -I {} pkg info -D  {} | grep kern.ipc

I don't find anything on kern.ipc

_martin · Sep 8, 2021

Alain De Vos Have a look in here: Part II. Interprocess Communication to get some information about IPC.
Those tunables are accessible via standard sysctl, $ sysctl kern.ipc.

SirDice · Sep 8, 2021

Alain De Vos said:
when i run,

Code:

pkg info | awk '{print $1}' | xargs -I {} pkg info -D {} | grep kern.ipc

I don't find anything on kern.ipc

You know you can just do pkg info -aD right?

nbari · Sep 13, 2021

SirDice said:
Remove those CFLAGS and COPTFLAGS.

Is there any benefit from using something like the following, or why is it better to remove all CFLAGS/COPTFLAGS?

Code:

CFLAGS="-O3"

SirDice · Sep 13, 2021

nbari said:
Is there any benefit from using something like CFLAGS="-O3", or why is it better to remove all CFLAGS/COPTFLAGS?

The developers have already set the most optimal (for most people) options for every single part of the base and kernel. So unless you fully understand what all those options do with the compiler and the code it's best not to touch it. Just randomly adding options you found on the internet without actually understanding what they do typically makes things worse, not better.

covacat · Sep 13, 2021

memcpy/memmove is already assembly code (without SIMD instructions), so CFLAGS won't matter

hardworkingnewbie · Sep 13, 2021

For me the more interesting question is: how did you configure your Redis instance? This very important piece of information is still missing.

nbari · Sep 14, 2021

probably I am facing this issue:

Latency induced by transparent huge pages
Unfortunately when a Linux kernel has transparent huge pages enabled, Redis incurs to a big latency penalty after the fork call is used in order to persist on disk. Huge pages are the cause of the following issue:

Fork is called, two processes with shared huge pages are created.

In a busy instance, a few event loops runs will cause commands to target a few thousand of pages, causing the copy on write of almost the whole process memory.

This will result in big latency and big memory usage.

How can I, in FreeBSD, "disable transparent huge pages" as in Linux, or do something similar to improve these latency issues?
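To put rough numbers on the copy-on-write amplification described in the quoted passage: on Linux with transparent huge pages, each first write to a shared page after fork copies the whole page, so the page size multiplies the cost. This is a back-of-the-envelope sketch, assuming 4 KiB base pages, 2 MiB huge pages (the amd64 sizes) and an illustrative 5000 distinct pages touched. cow_mib is a hypothetical helper, and the figures are not measurements from this system.

```shell
#!/bin/sh
# Hypothetical helper: MiB copied when `pages` distinct pages of
# `page_kib` KiB each are written to for the first time after fork().
cow_mib() {
    pages=$1
    page_kib=$2
    echo $(( pages * page_kib / 1024 ))
}

TOUCHED=5000   # illustrative number of distinct pages written after fork

echo "4 KiB pages: $(cow_mib "$TOUCHED" 4) MiB copied"
echo "2 MiB pages: $(cow_mib "$TOUCHED" 2048) MiB copied"
```

With these assumptions the same write pattern copies about 19 MiB with base pages and 10000 MiB with huge pages. Whether FreeBSD superpages behave the same way on a copy-on-write fault is not established in this thread, so this only illustrates the Linux case from the quote.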

nbari · Sep 14, 2021

hardworkingnewbie said:
For me the more interesting question is: how did you configure your Redis instance? This very important piece of information is still missing.

redis halts/idle more than 10 seconds after BGSAVE finishes · redis/redis · Discussion #9457
https://github.com/redis/redis/discussions/9457

Code:

appendonly no
daemonize yes
databases 8
dbfilename dump.rdb
dir /var/db/redis
min-replicas-max-lag 20
min-replicas-to-write 1
pidfile /var/run/redis/redis.pid
protected-mode no

maxmemory 255092mb

save 900 1
save 300 10
save 60 10000

io-threads 13
io-threads-do-reads yes

client-output-buffer-limit replica 16gb 16gb 60
repl-backlog-size 4gb
repl-timeout 3600
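Given this configuration, one thing that could be checked is how long the fork for BGSAVE itself takes, since Redis reports it in the latest_fork_usec field of INFO. This is a small sketch. fork_ms is a hypothetical helper and the sample value is invented.

```shell
#!/bin/sh
# Hypothetical helper: read `redis-cli INFO` output on stdin and print the
# duration of the most recent fork in milliseconds.
# INFO lines end in CRLF; awk's numeric conversion ignores the trailing CR.
fork_ms() {
    awk -F: '$1 == "latest_fork_usec" { printf "%d\n", $2 / 1000 }'
}

# Example with a made-up value (1.5 seconds), not taken from this cluster:
printf 'latest_fork_usec:1500000\r\n' | fork_ms

# Against a live server (assumes redis-cli can reach it):
#   redis-cli INFO stats | fork_ms
```

A fork time far below 10 seconds would suggest the stall comes from somewhere other than the fork call itself, for example from copy-on-write activity afterwards.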

mark_j · Sep 14, 2021

nbari said:
probably I am facing this issue:

How can I, in FreeBSD, "disable transparent huge pages" as in Linux, or do something similar to improve these latency issues?

Read this.
You need to remember linux != freebsd. In some things like providing posix they're close but nearly everything else is different.
What I'm trying to say is that while both might implement superpages or hugepages or w^x or whatever, the actual implementation and functionality will likely be hugely disparate.
