Introduction
When working with Oracle RAC, it’s strongly advised to use Jumbo Frames for the network that provides the interconnect between nodes.
As the nodes tend to send a lot of blocks back and forth across this private network, the larger the size of the packets, the fewer of them there are to send.
The usual block size for an Oracle database is 8192 bytes.
The standard MTU (Maximum Transmission Unit) for IP frame size is 1500 bytes.
Sending an 8k Oracle block over a 1500 byte MTU therefore requires splitting it across six frames, which the receiving side must then reassemble.
However, if Jumbo Frames are used (9000 bytes), the entire block fits neatly into a single frame or packet.
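As a rough back-of-the-envelope illustration (my own sketch, assuming roughly 40 bytes of IPv4 and TCP headers per segment, and ignoring Ethernet framing and TCP options), the per-block frame counts work out as follows:

#!/usr/bin/env perl
# frames-per-block.pl - rough estimate of frames needed per 8 KiB Oracle block.
# Assumes ~40 bytes of IPv4 + TCP headers per segment; actual payload sizes
# vary slightly with TCP options such as timestamps.
use strict;
use warnings;
use POSIX qw(ceil);

my $blockSize = 8192;

for my $mtu (1500, 9000) {
   my $payload = $mtu - 40;                    # usable bytes per frame (the MSS)
   my $frames  = ceil($blockSize / $payload);
   printf "MTU %4d: %4d payload bytes per frame -> %d frame(s) per block\n",
      $mtu, $payload, $frames;
}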
—–
Note: Viewing the mechanical effects of MTU in action requires a fair amount of effort to set up a SPAN or Port Mirror and then capture the traffic from the wire on that port. That is not being done for these tests.
Why this explanation? Because, as shown below, the packet size will be ~8k, even though the MTU is set to 1500. Because we cannot see the effects of MTU directly on the client or server, these effects are inferred from other data.
“Frame” and “packet” are terms that seem to be used interchangeably. However, they are context-sensitive: they occupy different layers of the OSI model, with frames at Layer 2 (data link) and packets at Layer 3 (network).
—–
On with the story…
Recently, I was working with a two-node Oracle RAC system that runs in a VMWare ESXi 6.5 environment.
It was thought that, due to the optimizations performed by VMWare in the virtual network stack, Jumbo Frames were unnecessary.
However, that does not seem to be the case.
After some testing of throughput using both the standard 1500 byte MTU and the 9000 byte Jumbo Frame MTU, the larger MTU size resulted in a 22% increase in throughput.
Why did that happen? Well, keep reading to find out.
The Test Lab
Though the VMWare testing was done on Oracle Linux 7.8, the following experiments are being performed on Ubuntu 20.04 LTS.
As there was no need to run Oracle, Ubuntu works just fine for these tests.
Following are the two servers created:
ubuntu-test-mule-01: tcp test client - 192.168.1.226
ubuntu-test-mule-02: tcp test server - 192.168.1.227
Some version information:
root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '^(NAME=|VERSION=)' /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
Network Configuration
Other than having different IP addresses, ubuntu-test-mule-01 and ubuntu-test-mule-02 are set up exactly the same way.
Because this version of Ubuntu uses netplan to configure the interfaces, we modified the /etc/netplan/00-installer-config.yaml file to configure the two test interfaces.
The interfaces used for the testing are enp0s8 and enp0s9.
Then, netplan apply was used to enable the changes.
root@ubuntu-mule-01:~/perl-sockets/packet-test# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp0s3:
      dhcp4: true
    enp0s8:
      dhcp4: false
      addresses: [192.168.154.4/24]
      mtu: 9000
    enp0s9:
      dhcp4: false
      addresses: [192.168.199.35/24]
  version: 2
The results:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:b8:5c:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.227/24 brd 192.168.1.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feb8:5cdc/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:ab:d5:44 brd ff:ff:ff:ff:ff:ff
    inet 192.168.154.5/24 brd 192.168.154.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feab:d544/64 scope link
       valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:22:fc:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.199.36/24 brd 192.168.199.255 scope global enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe22:fca2/64 scope link
       valid_lft forever preferred_lft forever
Configure the TCP Test
Some time ago, I put together some Perl scripts for network throughput testing.
There were several reasons for this, including:
- You can copy and paste the code if necessary.
- It is easy to modify for different tests.
The following was run on each of the test mule servers:
# git clone https://github.com/jkstill/perl-sockets.git
On the ubuntu-mule-02 Server:
The following changes were made to server.pl:
First, disable the dot reporting. By default a “.” is printed every 256 packets that are received. You can disable this using the following line:
my $displayReport = 0; # set to 0 to disable reporting
Now, set the listening addresses.
Default code:
my $sock = IO::Socket::INET->new(
   #LocalAddr => '192.168.1.255', # uncomment and edit address if needed
   LocalPort => $port,
   Proto     => $proto,
   Listen    => 1,
   Reuse     => 1
) or die "Cannot create socket: $@";
We changed this to reflect the network interfaces that would be used for the testing on the server-side.
my $sock = IO::Socket::INET->new(
   LocalAddr => '192.168.154.5',   # MTU 9000
   #LocalAddr => '192.168.199.36', # MTU 1500
   LocalPort => $port,
   Proto     => $proto,
   Listen    => 1,
   Reuse     => 1
) or die "Cannot create socket: $@";
The appropriate interface was used for each test.
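As a side note, editing server.pl between runs could be avoided by accepting the listening address on the command line. This is purely a hypothetical variation, not part of the perl-sockets scripts; a minimal sketch using Getopt::Long might look like this:

#!/usr/bin/env perl
# Hypothetical variation (not in the perl-sockets repo): take the listening
# address and port as options instead of editing the script between tests.
use strict;
use warnings;
use Getopt::Long;
use IO::Socket::INET;

my $localAddr = '192.168.154.5';   # default: the MTU 9000 interface
my $port      = 4242;

GetOptions(
   'local-host=s' => \$localAddr,
   'port=i'       => \$port,
) or die "usage: $0 [--local-host ADDR] [--port PORT]\n";

my $sock = IO::Socket::INET->new(
   LocalAddr => $localAddr,
   LocalPort => $port,
   Proto     => 'tcp',
   Listen    => 1,
   Reuse     => 1
) or die "Cannot create socket: $@";

print "listening on $localAddr:$port\n";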
On the ubuntu-mule-01 Client:
Some test data was created. The use of /dev/urandom and gzip makes it unlikely that the data can be compressed any further. This is something I learned to do for quick throughput tests using ssh, as ssh compresses data. It’s probably not necessary in this case, but then again, it doesn’t hurt to ensure the test data is non-compressible.
root # cd perl-sockets/packet-test
root # dd if=/dev/urandom bs=1048576 count=101 | gzip - | dd iflag=fullblock bs=1048576 count=100 of=testdata-100M.dat
root # dd if=/dev/urandom bs=1048576 count=1025 | gzip - | dd iflag=fullblock bs=1048576 count=1024 of=testdata-1G.dat

root@ubuntu-mule-01:~/perl-sockets/packet-test# ls -l testdata-1*
-rw-r--r-- 1 root root  104857600 May 11 16:05 testdata-100M.dat
-rw-r--r-- 1 root root 1073741824 May 11 16:04 testdata-1G.dat
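If you want to verify that the generated data really is incompressible, a quick check like the following works (a hypothetical helper, not part of the repo); a ratio close to 1.0 means gzip cannot shrink the data any further:

#!/usr/bin/env perl
# compress-check.pl - hypothetical helper: gzip the first 10 MiB of a file
# and report the compression ratio. Incompressible data stays near ratio 1.0.
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

my $file = shift or die "usage: $0 <datafile>\n";

open my $fh, '<:raw', $file or die "cannot open $file: $!";
read $fh, my $sample, 10 * 1024 * 1024 or die "cannot read $file: $!";
close $fh;

gzip \$sample => \my $compressed or die "gzip failed: $GzipError\n";

printf "sample: %d bytes  compressed: %d bytes  ratio: %.3f\n",
   length($sample), length($compressed),
   length($compressed) / length($sample);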
The Driver Script
We used the packet-driver.sh script to run each of the tests from the client-side.
This script simply runs a throughput test 23 times in succession, using the specified MTU size.
#!/usr/bin/env bash

: ${1:?Call with 'packet-driver.sh <SIZE> '!}
: ${mtu:=$1}

if ( echo $mtu | grep -vE '1500|9000' ); then
   echo Please use 1500 or 9000
   exit 1
fi

declare -A localHosts
declare -A remoteHosts

localHosts[9000]=192.168.154.4
localHosts[1500]=192.168.199.35

remoteHosts[9000]=192.168.154.5
remoteHosts[1500]=192.168.199.36

blocksize=8192
testfile=testdata-1G.dat

cmd="./client.pl --remote-host ${remoteHosts[$mtu]} --local-host ${localHosts[$mtu]} --file $testfile --buffer-size $blocksize"

for i in {0..22}
do
   echo "executing: $cmd"
   $cmd
done
Perform the Tests
For each test, we enabled the correct interface in server.pl, and then started the server.
For the client-side, we called the packet-driver.sh script with the required MTU size.
The MTU size passed on the command line determines which interface is used on the client-side.
1500 MTU
On the server-side, make sure the address in server.pl is set for the 1500 byte MTU interface. Then, start the server:
root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '\s+LocalAddr' server.pl
   LocalAddr => '192.168.199.36', # MTU 1500

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./server.pl | tee mtu-1500.log
Initial Receive Buffer is 425984 bytes
Server is now listening ...
Initial Buffer size set to: 2048
On the client-side, run packet-driver.sh:
root@ubuntu-mule-01:~/perl-sockets/packet-test# ./packet-driver.sh 1500
executing: ./client.pl --remote-host 192.168.199.36 --local-host 192.168.199.35 --file testdata-1G.dat --buffer-size 8192
remote host: 192.168.199.36  port: 4242  bufsz: 8192  simulated latency: 0
bufsz: 8192
Send Buffer is 425984 bytes
Connected to 192.168.199.36 on port 4242
Sending data...
9000 MTU
On the server-side, make sure the address in server.pl is set for the 9000 byte MTU interface. Then, start the server:
root@ubuntu-mule-02:~/perl-sockets/packet-test# grep -E '\s+LocalAddr' server.pl
   LocalAddr => '192.168.154.5', # MTU 9000

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./server.pl | tee mtu-9000.log
Initial Receive Buffer is 425984 bytes
Server is now listening ...
Initial Buffer size set to: 2048
Now, run packet-driver.sh on the client:
root@ubuntu-mule-01:~/perl-sockets/packet-test# ./packet-driver.sh 9000
executing: ./client.pl --remote-host 192.168.154.5 --local-host 192.168.154.4 --file testdata-1G.dat --buffer-size 8192
remote host: 192.168.154.5  port: 4242  bufsz: 8192  simulated latency: 0
bufsz: 8192
Send Buffer is 425984 bytes
Connected to 192.168.154.5 on port 4242
Sending data...
Reporting
When all tests are complete, use packet-averages.pl to calculate the averages across all tests per MTU size.
root@ubuntu-mule-02:~/perl-sockets/packet-test# ./packet-averages.pl < mtu-1500.log
key/avg: Bytes Received            1073733637.000000
key/avg: Avg Packet Size                 7898.147391
key/avg: Packets Received              135948.304348
key/avg: Average milliseconds               0.043824
key/avg: Avg Megabytes/Second             172.000000
key/avg: Avg milliseconds/MiB               5.818500
key/avg: Total Elapsed Seconds              6.850447
key/avg: Network Elapsed Seconds            5.958098

root@ubuntu-mule-02:~/perl-sockets/packet-test# ./packet-averages.pl < mtu-9000.log
key/avg: Bytes Received            1073733637.000000
key/avg: Avg Packet Size                 7519.793478
key/avg: Packets Received              142790.217391
key/avg: Average milliseconds               0.033652
key/avg: Avg Megabytes/Second             213.165217
key/avg: Avg milliseconds/MiB               4.692753
key/avg: Total Elapsed Seconds              5.495095
key/avg: Network Elapsed Seconds            4.805343
The average Total Elapsed Seconds for the 9000 MTU tests is only about 80% of the time required for the 1500 MTU tests (5.495 seconds vs. 6.850 seconds).
From these results, it appears as if using Jumbo Frames is a pretty clear winner, even in a virtualized environment.
This result might seem somewhat surprising, as the tests are not sending any data over a physical wire.
In this case, the “network” is only composed of VirtualBox host network adapters.
So, why then are Jumbo Frames still so much faster than the standard 1500 MTU size?
Performance Profiling
This time, we’ll run a single test for each MTU size.
We’ll use the perf profiler to gather process profile information on the client-side.
First, the 1500 MTU size:
perf record --output perf-mtu-1500.data ./client.pl --remote-host 192.168.199.36 --local-host 192.168.199.35 --file testdata-1G.dat --buffer-size 8192
Now the 9000 MTU size:
perf record --output perf-mtu-9000.data ./client.pl --remote-host 192.168.154.5 --local-host 192.168.154.4 --file testdata-1G.dat --buffer-size 8192
Let’s see some reports from perf.
1500 MTU
root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-1500.data --stats | grep TOTAL
TOTAL events: 28648

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-1500.data | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 28K of event 'cpu-clock:pppH'
# Event count (approx.): 7137500000
#
# Overhead  Command  Shared Object       Symbol
# ........  .......  ..................  ....................................
#
    56.35%  perl     [kernel.kallsyms]   [k] e1000_xmit_frame
    21.00%  perl     [kernel.kallsyms]   [k] e1000_alloc_rx_buffers
     9.71%  perl     [kernel.kallsyms]   [k] e1000_clean
     2.22%  perl     [kernel.kallsyms]   [k] __softirqentry_text_start
     1.74%  perl     [kernel.kallsyms]   [k] __lock_text_start
     0.61%  perl     [kernel.kallsyms]   [k] copy_user_generic_string
     0.58%  perl     [kernel.kallsyms]   [k] clear_page_rep
     0.39%  perl     libpthread-2.31.so  [.] __libc_read
     0.30%  perl     [kernel.kallsyms]   [k] do_syscall_64
9000 MTU
root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-9000.data --stats | grep TOTAL
TOTAL events: 25259

root@ubuntu-mule-01:~/perl-sockets/packet-test# perf report -i perf-mtu-9000.data | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 25K of event 'cpu-clock:pppH'
# Event count (approx.): 6290500000
#
# Overhead  Command  Shared Object       Symbol
# ........  .......  ..................  ........................................
#
    64.15%  perl     [kernel.kallsyms]   [k] e1000_xmit_frame
    16.92%  perl     [kernel.kallsyms]   [k] e1000_alloc_jumbo_rx_buffers
     9.03%  perl     [kernel.kallsyms]   [k] e1000_clean
     1.79%  perl     [kernel.kallsyms]   [k] __softirqentry_text_start
     1.66%  perl     [kernel.kallsyms]   [k] __lock_text_start
     0.41%  perl     [kernel.kallsyms]   [k] clear_page_rep
     0.39%  perl     [kernel.kallsyms]   [k] copy_user_generic_string
     0.26%  perl     [kernel.kallsyms]   [k] do_syscall_64
     0.24%  perl     libpthread-2.31.so  [.] __libc_read
A conclusion we might draw from these reports: when using an MTU of 9000, the test program spent more of its time sending data (e1000_xmit_frame) and less time in overhead.
Note that 16.92% of the time was spent allocating Jumbo-sized receive buffers via e1000_alloc_jumbo_rx_buffers in the MTU 9000 test, versus the 21% spent in e1000_alloc_rx_buffers in the MTU 1500 test.
The reason for the performance increase, in this case, seems to be this: The use of Jumbo Frames simply requires less work for the server.
Rather than splitting each 8192-byte buffer across six 1500 byte frames and reassembling it on the other end, a Jumbo Frame can carry it all in one frame.
Though these tests were run using servers virtualized with VirtualBox, the results are quite similar to those seen in servers running in VMWare ESXi.
The fact that the servers are virtual does not reduce the need to ensure that RAC nodes get the fastest possible throughput on the private network used for the interconnect… And that means using Jumbo Frames.