How to speed up / improve memcpy

nbari

Member

Reaction score: 7
Messages: 84

I am running a Redis (version 6.2.5) "sentinel" cluster of 3 nodes (1 master, 2 replicas) on FreeBSD 13. The dataset is approximately 70 GB, the system has 320 GB RAM and SSD disks, and the servers (dedicated, not VMs) are in the same network/datacenter, so hardware is probably not the problem. However, I have started to notice that after BGSAVE finishes there is a lag of approximately 10 seconds, and because of this the applications randomly get this error: NOREPLICAS Not enough good replicas to write. Setting min-replicas-max-lag to 20 helps, but I would like to know if there is something I could fine-tune to speed up memcpy. From the flame graph (GitHub discussion here: https://github.com/redis/redis/discussions/9457):


[Attachment: flame graph screenshot]


The current /boot/loader.conf:

Code:
# Set Max size = 32GB
vfs.zfs.arc_max="34359738368"
# Min size = 4GB
vfs.zfs.arc_min="4294967296"


kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"

In /etc/sysctl.conf

Code:
kern.ipc.shm_use_phys=1

For building the kernel I have this in /etc/make.conf

Code:
CFLAGS=         -O2 -pipe -fno-strict-aliasing
COPTFLAGS=      -O2 -pipe -fno-strict-aliasing

BUILD_OPTIMIZED=        YES
BUILD_STATIC=           YES
OPTIMIZED_CFLAGS=       YES
WITHOUT_DEBUG=          YES
WITH_CPUFLAGS=          YES
WITH_OPTIMIZED_CFLAGS=  YES
MALLOC_PRODUCTION=      YES

Next to the Redis flame graph there is one from ZFS:

[Attachment: ZFS flame graph screenshot]


Here also I notice that ZFS is using memcpy:

[Attachment: screenshot showing ZFS spending time in memcpy]




Currently trying with vm.pmap.pg_ps_enabled=0
 

diizzy

Aspiring Daemon

Reaction score: 164
Messages: 536

...and BUILD_STATIC probably doesn't help either, but most likely it won't fix your main issue.
You probably want to set/define CPUTYPE so you can take advantage of new(er) instructions.
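For example, in /etc/make.conf (the value below is only an illustration; pick the one matching your actual CPU, see the CPUTYPE list in make.conf(5) and /usr/share/examples/etc/make.conf):

```
# /etc/make.conf -- example only; use the value for your own CPU
CPUTYPE?=skylake
```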
 

Alain De Vos

Daemon

Reaction score: 613
Messages: 2,074

Why this tuning ?
Code:
kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"
 

mark_j

Daemon

Reaction score: 682
Messages: 1,192

That tuning's for things like databases communicating over ipc/shared memory - Oracle is big on all that stuff, particularly on Solaris.
 
OP
nbari

nbari

Member

Reaction score: 7
Messages: 84

Why this tuning ?
Code:
kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"

Like mark_j mentioned, I have indeed normally used that for PostgreSQL, and I thought it could help to improve Redis performance (trying to keep everything in RAM).


diizzy, regarding BUILD_STATIC/CPUTYPE, is there any URL/doc or something specific?

Currently, I am rebuilding kernel & world as SirDice advised, removing CFLAGS/COPTFLAGS.
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 12,053
Messages: 38,509

I have indeed normally used that for PostgreSQL, and I thought it could help to improve Redis performance (trying to keep everything in RAM).
PostgreSQL uses shared memory (which is where those settings are for). Redis however does not.
 

mark_j

Daemon

Reaction score: 682
Messages: 1,192

Back to the original question, "if there is something I could fine-tune to speed up memcpy": there's probably not. Clang (and GCC) are very good at inlining the code to optimise it. I guess you could experiment with __SSE2__ or __SSE3__ (thus restricting possible optimisation to Intel-compatible chips) in your own version of memcpy, built with -fno-builtin-memcpy?

The only way to really speed it up is to get faster RAM or, where possible, move pointers rather than copying the data; but that assumes modifying this "redis" stuff.

If this stuff is all in memory, then judicious use of mmap would seem a better approach. Then again, I don't know what Redis does; it might already do so.
 

_martin

Daemon

Reaction score: 286
Messages: 1,080

I second mark_j's opinion. memcpy (memmove) comes from libc; for amd64 it is lib/libc/amd64/string/memmove.S. Note the comment there does say no SIMD operations are used.
But writing your own memcpy (and making it better than the current one) could be a hard task.
 

Alain De Vos

Daemon

Reaction score: 613
Messages: 2,074

When I run
Code:
pkg info | awk '{print $1}' | xargs -I {} pkg info -D  {} | grep kern.ipc
I don't find anything on kern.ipc
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 12,053
Messages: 38,509

When I run
Code:
pkg info | awk '{print $1}' | xargs -I {} pkg info -D  {} | grep kern.ipc
I don't find anything on kern.ipc
You know you can just do pkg info -aD, right?
 

SirDice

Administrator
Staff member
Administrator
Moderator

Reaction score: 12,053
Messages: 38,509

Is there any benefit from using something like this, or why is it better to remove all CFLAGS/COPTFLAGS?
The developers have already set the most optimal (for most people) options for every single part of the base and kernel. So unless you fully understand what all those options do with the compiler and the code it's best not to touch it. Just randomly adding options you found on the internet without actually understanding what they do typically makes things worse, not better.
 

covacat

Well-Known Member

Reaction score: 198
Messages: 424

memcpy/memmove is already assembly code (without SIMD instructions), so CFLAGS won't matter.
 

hardworkingnewbie

Active Member

Reaction score: 151
Messages: 154

For me the more interesting question is: how did you configure your Redis instance? This very important piece of information is still missing.
 
OP
nbari

nbari

Member

Reaction score: 7
Messages: 84

Probably I am facing this issue (from the Redis latency documentation):

Latency induced by transparent huge pages

Unfortunately, when a Linux kernel has transparent huge pages enabled, Redis incurs a big latency penalty after the fork call is used in order to persist on disk. Huge pages are the cause of the following issue:

  1. Fork is called, and two processes with shared huge pages are created.
  2. In a busy instance, a few event loop runs will cause commands to target a few thousand pages, causing the copy-on-write of almost the whole process memory.
  3. This will result in big latency and big memory usage.

How can I disable transparent huge pages (or the equivalent) on FreeBSD, like in Linux, to improve these latency issues?
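For what it's worth, FreeBSD's rough analogue here is transparent superpage promotion, controlled by the vm.pmap.pg_ps_enabled loader tunable (the same knob mentioned earlier in the thread); whether disabling it actually helps fork/copy-on-write latency in this workload is something to measure rather than assume:

```
# Check the current state
sysctl vm.pmap.pg_ps_enabled

# To disable superpage promotion, add to /boot/loader.conf and reboot
vm.pmap.pg_ps_enabled=0
```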
 
OP
nbari

nbari

Member

Reaction score: 7
Messages: 84

For me the more interesting question is: how did you configure your Redis instance? This very important piece of information is still missing.

Code:
appendonly no
daemonize yes
databases 8
dbfilename dump.rdb
dir /var/db/redis
min-replicas-max-lag 20
min-replicas-to-write 1
pidfile /var/run/redis/redis.pid
protected-mode no

maxmemory 255092mb

save 900 1
save 300 10
save 60 10000

io-threads 13
io-threads-do-reads yes

client-output-buffer-limit replica 16gb 16gb 60
repl-backlog-size 4gb
repl-timeout 3600
 

mark_j

Daemon

Reaction score: 682
Messages: 1,192

probably I am facing this issue:


How can I disable transparent huge pages (or the equivalent) on FreeBSD, like in Linux, to improve these latency issues?
Read this.
You need to remember Linux != FreeBSD. In some things, like providing POSIX, they're close, but nearly everything else is different.
What I'm trying to say is that while both might implement superpages or hugepages or W^X or whatever, the actual implementation and functionality will likely be hugely disparate.
 