Using “sysbench” to test memory performance

Sysbench is a powerful benchmarking tool for CPU, memory, MySQL, and more. Three years ago I used it to test MySQL performance.
Yesterday I used Sysbench to test the memory bandwidth of my server,
with this command:

sysbench --test=memory --memory-block-size=1M --memory-total-size=100G --num-threads=1 run

It reported that the memory bandwidth could reach 8.4GB/s, which made sense to me.
But after I decreased the block size (from 1M to 1K):

sysbench --test=memory --memory-block-size=1K --memory-total-size=100G --num-threads=1 run

The memory bandwidth reported by Sysbench dropped to only 2GB/s.
This regression in memory performance really confused me. Maybe the memory of modern machines has some kind of “maximum access frequency”, so that we can’t access memory too frequently?
After checking the Sysbench code, I found that the logic of its memory test is basically the same as this program (which I wrote myself):

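/* mytest.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

const long long DATA = 100LL * 1024 * 1048576; /* 100G of data */

int main(int argc, char *argv[]) {
    volatile int tmp = 0;
    int *buffer, *end, *begin;
    long long i, loop;
    long block_size;
    struct timeval before, after;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <block size in bytes>\n", argv[0]);
        return 1;
    }
    block_size = atol(argv[1]);
    if (block_size < (long)sizeof(int)) {
        return 1; /* block too small to hold a single int */
    }
    buffer = malloc(block_size);
    if (buffer == NULL) {
        return 1;
    }
    /* Only whole ints are written, so the loop never runs past the buffer. */
    end = buffer + block_size / sizeof(int);
    loop = DATA / block_size;
    gettimeofday(&before, NULL);
    /* Overwrite the same block again and again until 100G has been written. */
    for (i = 0; i < loop; i++) {
        for (begin = buffer; begin < end; begin++) {
            *begin = tmp;
        }
    }
    gettimeofday(&after, NULL);
    /* Elapsed time in microseconds. */
    printf("time: %lld\n", (long long)(after.tv_sec - before.tv_sec) * 1000000LL
        + (after.tv_usec - before.tv_usec));
    free(buffer);
    return 0;
}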

But this test program took only 14 seconds (Sysbench took 49 seconds). To find the root cause, we need a more powerful tool, perf:

# perf stat -e cache-misses,faults,branch-misses ./mytest 1048576
Performance counter stats for './my 1048576':
            90,395 cache-misses
               400 faults
           178,554 branch-misses
      14.825497139 seconds time elapsed
# perf stat -e cache-misses,faults,branch-misses sysbench --test=memory --memory-block-size=1K --memory-total-size=100G --num-threads=1 run
Performance counter stats for 'sysbench --test=memory --memory-block-size=1K --memory-total-size=100G --num-threads=1 run':
           739,223 cache-misses
               825 faults
           531,908 branch-misses
      49.264963322 seconds time elapsed

The two runs have very different numbers of CPU cache misses. The root cause is that Sysbench uses a complicated framework to support different test targets (MySQL, memory, ...), which needs to pass a structure named "request" and many other arguments in and out of the execution_request() function many times for each request (one request is a 1K memory access in our scenario). This overhead becomes significant when the block size is too small.
The conclusion: don't use Sysbench to test memory performance with a block size that is too small; preferably use 1MB or larger.
Ref: as Coly Li taught me, memory does have a "maximum access frequency" (link). Take DDR4-1866 for example: its data rate is 1866MT/s (MT = mega-transfers) and every transfer carries 8 bytes, so in theory we can access memory more than 1 billion times per second.

Robin on Linux
